Showing posts with label realtime search. Show all posts

Tuesday, December 8, 2009

Google Realtime Search? No! Call it: Google realtime ticker (with a filter)!

You might have read the recent announcement of Google introducing realtime search, and you may remember my previous post about the likelihood of Google's infrastructure facing a major challenge with realtime search.


OK, today's announcement has not yet proven me wrong. What Google presented today is not search as Google itself would define it. It is only a realtime ticker with a filter applied to it. No relevance rating is added to the ticker.

Realtime search is only a challenge if Google wants to sort realtime posts by relevance. The current solution does not do that. For this solution no central infrastructure is needed.

However, if you would like to add some relevance factor to it, things are different. Computing the relevance of realtime updates would require a central infrastructure. When I use the term relevance, think of a mechanism that ranks tweets from widely followed authors higher than others, where re-tweets push a result higher, and where a simple tweet with no links and no follow-ups loses relevance very quickly. This would be a realtime PageRank. And for this you need a central infrastructure different from the massively parallel Google server world. Let's wait and see.
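To make the idea of a realtime relevance rating a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Update` fields, the weights, the ten-minute base half-life and the `relevance` function itself are hypothetical, and none of it describes how Google or Twitter actually rank anything. It only shows the three ingredients mentioned above: followers and re-tweets push a result up, and a bare tweet without links or follow-ups decays fastest.

```python
import math
from dataclasses import dataclass


@dataclass
class Update:
    """A single realtime post. All fields are hypothetical, for illustration only."""
    text: str
    author_followers: int   # how widely the author is followed
    retweets: int           # how often the post was re-tweeted
    has_link: bool          # does the post link to something?
    replies: int            # number of follow-ups
    age_seconds: float      # time since the post was published


def relevance(update: Update, base_half_life: float = 600.0) -> float:
    """Toy relevance score: popularity multiplied by an exponential time decay.

    The weights and the 600 second base half-life are arbitrary assumptions.
    """
    # Popularity grows with followers and re-tweets; the logarithm keeps
    # very large accounts from dominating everything else.
    popularity = math.log1p(update.author_followers) + 2.0 * math.log1p(update.retweets)

    # Links and follow-ups stretch the half-life, so a bare tweet
    # (no link, no replies) decays fastest.
    half_life = base_half_life * (
        1.0 + (1.0 if update.has_link else 0.0) + math.log1p(update.replies)
    )

    # Exponential decay: the score halves every `half_life` seconds.
    decay = 0.5 ** (update.age_seconds / half_life)
    return popularity * decay
```

Note that every input here changes from second to second: each new re-tweet or reply changes the score of an older post. Keeping such scores current across all posts is exactly the part that seems to call for a central place where the counters live.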

OJ

Monday, November 30, 2009

Realtime Messages: Can Google include realtime data?

OK, we already pointed out the factors that will drive search results to be enriched with your social network data (remember: in our opinion this will be driven by smartphones, social address books and Universal Search). But what about realtime data? How can realtime data be included in general search results?

Realtime data is definitely getting more and more important on the net. Just think about the news of the Hudson plane crash in January. At that time it was still newsworthy that Twitter was the first to report the accident. By now, the new Google Chrome OS, the latest firmware update for your favorite device and so on are all naturally first reported and discussed on Twitter. Certainly realtime data can be found in many other places as well, such as Facebook and comments in blogs. Thinking about it, I suddenly get the feeling that quite a substantial amount of the new data on the web is entered as "realtime data" with a time component. This time component strongly influences the relevance of the data.

Lots of action is currently happening in this realtime search space. Please see TechCrunch and VentureBeat for excellent summaries of the current state of the art.

While the two kinds of data (social data and realtime data) pose a similar challenge when it comes to presenting search results to the user and weighting their relevance, realtime data is by nature much more complicated. Realtime data deeply affects the infrastructure that is needed to process it.

Google's infrastructure is clearly an "offline" architecture. By offline we mean that updates reach the Google search index only very slowly. The underlying reason is that this lets Google scale its systems with massive numbers of rather small and cheap servers. This is normally called horizontal scaling, in contrast to vertical scaling, where you need big and expensive machines to which you add processors, storage and memory as necessary. Across Google's fleet of thousands of small servers, the index is replicated for better performance. This replication is not a "realtime" thing. It takes a significant amount of time. Realtime replication is usually very costly. Software architectures with realtime update capabilities tend to be developed for large scale machines. So we have a natural contradiction between Google's way of computing (with massive numbers of small server machines) and the requirement of realtime updates for parts of the index.

A possible solution for Google would be to enrich the standard offline search results with realtime results that are produced by a new and different infrastructure. Most likely this infrastructure would be based on large, powerful and expensive servers, which might be a completely new world for Google. Certainly this is possible for Google, but scaling might be the "real" challenge of the realtime search game.
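The enrichment step itself could be quite simple. The following Python sketch shows one way it might look. It is only an illustration under our own assumptions: the `Result` type, the `blend` function and the `max_realtime` cap are hypothetical names, and the two input lists merely stand in for the answers of an offline index and a separate realtime system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Result:
    """One search hit. The fields are hypothetical, for illustration only."""
    url: str
    score: float   # only comparable to scores from the same backend
    source: str    # "offline" or "realtime"


def blend(offline: List[Result], realtime: List[Result],
          max_realtime: int = 3) -> List[Result]:
    """Put a small block of realtime hits in front of the offline ranking.

    Scores from the two systems are assumed not to be comparable, so the
    realtime hits are not interleaved by score. They are capped at
    `max_realtime` and shown first, and duplicate URLs are dropped.
    """
    # Best realtime hits first, limited to a small block.
    freshest = sorted(realtime, key=lambda r: r.score, reverse=True)[:max_realtime]

    merged: List[Result] = []
    seen = set()
    # The offline list is assumed to arrive already ranked.
    for result in freshest + list(offline):
        if result.url not in seen:
            seen.add(result.url)
            merged.append(result)
    return merged
```

The blending is cheap because it happens per query on two short lists. The expensive part remains producing the realtime list in the first place.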

Certainly we have described the world a bit simplistically here. As the world is neither black nor white, there are numerous new trends (e.g. virtualization) and technologies which blur the line between horizontal and vertical scaling and between offline and realtime architectures. Nevertheless, we see realtime search as a challenge for big old "offline" Google search.

OJ