Monday, November 30, 2009

Realtime Messages: Can Google include realtime data?

OK, we have already pointed out the factors that will drive search results to be enriched with your social network data (remember: in our opinion, this will be driven by smartphones, social address books and Universal Search). But what about realtime data? How can realtime data be included in general search results?

Realtime data is definitely becoming more and more important on the web. Just think about the news of the Hudson plane crash in January: at the time, it was still newsworthy that Twitter was the first to report the accident, but now news such as the new Google Chrome OS or the latest firmware update for your favorite device is, as a matter of course, first reported and discussed on Twitter. Realtime data can of course be found in many other places, such as Facebook, blog comments and so on. Thinking about it, I suddenly get the feeling that quite a substantial share of the new data on the web is entered as "realtime data" with a time component, and this time component strongly influences the relevance of the data.
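
To make the point about the time component concrete, here is a minimal sketch of one common way to let age influence relevance: multiplying a text-match score by an exponential decay. This is our own illustration, not how Google or Twitter actually rank anything; the `Message` type, the `text_score` values, the sample messages and the 6-hour half-life are all hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass
class Message:
    text: str
    text_score: float  # hypothetical query-match score in [0, 1]
    age_hours: float   # time elapsed since the message was posted


def recency_weighted_score(msg, half_life_hours=6.0):
    """Scale the text score by an exponential decay on the message's age.

    After one half-life the weight is 0.5, after two it is 0.25, and so on.
    The default half-life is an illustrative assumption, not a known value.
    """
    decay = 0.5 ** (msg.age_hours / half_life_hours)
    return msg.text_score * decay


# Made-up sample data: an older, better-matching message vs. a fresh one.
messages = [
    Message("Eyewitness report: plane down in the Hudson", text_score=0.9, age_hours=48.0),
    Message("New firmware update for my phone is out", text_score=0.7, age_hours=1.0),
]
ranked = sorted(messages, key=recency_weighted_score, reverse=True)
for m in ranked:
    print(f"{recency_weighted_score(m):.3f}  {m.text}")
```

With these numbers the fresh message ranks first even though the older one matches the query better, which is exactly the effect a time component has on relevance. The half-life controls how quickly "news" turns into "history".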

A lot is currently happening in the realtime search space. See TechCrunch and VentureBeat for excellent summaries of the current state of the art.

While the two challenges (social data and realtime data) raise similar problems in presenting search results to the user and in weighting their relevance, realtime data is by nature much more complicated, because it deeply affects the infrastructure needed to process it.

Google's infrastructure is clearly an "offline" architecture. By offline we mean that updates to the Google search index are incorporated only slowly. The underlying reason is that this lets Google scale its systems with massive numbers of rather small, cheap servers. This is usually called horizontal scaling, in contrast to vertical scaling, where you need big, expensive machines to which you add processors, storage and memory as necessary. Across Google's fleet of thousands of small servers, the index is replicated for better performance. This replication is not a "realtime" process: it takes a significant amount of time, and realtime replication is usually very costly. Software architectures with realtime update capabilities tend to be developed for large-scale machines. So there is a natural conflict between Google's way of computing (with massive numbers of small server machines) and the requirement of realtime updates for parts of the index.
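
The "offline" pattern described above can be illustrated with a toy model: new documents are collected continuously, but they only become searchable after a batch rebuild of the index, which then has to be copied to every replica. The sketch below is a deliberately simplified assumption of ours; `BatchIndex`, `rebuild_and_replicate` and the replica count are hypothetical names and do not describe Google's actual systems.

```python
from collections import defaultdict


class BatchIndex:
    """Toy "offline" inverted index (hypothetical, for illustration only)."""

    def __init__(self, num_replicas=3):
        self.corpus = {}  # every document received so far: doc_id -> text
        # The index snapshots that queries are actually served from.
        self.replicas = [{} for _ in range(num_replicas)]

    def add(self, doc_id, text):
        # The document is stored, but stays invisible to queries
        # until the next batch rebuild.
        self.corpus[doc_id] = text

    def rebuild_and_replicate(self):
        # Rebuild the whole inverted index from scratch (the batch step) ...
        index = defaultdict(set)
        for doc_id, text in self.corpus.items():
            for term in text.lower().split():
                index[term].add(doc_id)
        # ... then give every replica its own copy. On a real cluster this
        # copy is the slow, expensive part; here it is just a dict copy.
        self.replicas = [
            {term: set(ids) for term, ids in index.items()}
            for _ in self.replicas
        ]

    def search(self, term, replica=0):
        return self.replicas[replica].get(term.lower(), set())


idx = BatchIndex()
idx.add(1, "Plane down in the Hudson")
print(idx.search("hudson"))  # set(): not yet indexed
idx.rebuild_and_replicate()
print(idx.search("hudson"))  # {1}
```

The gap between `add` and `rebuild_and_replicate` is the freshness lag of an offline architecture. Shrinking it means rebuilding and copying more often, which is precisely the cost that grows with the number of small replicas.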

A possible solution for Google would be to enrich the standard offline search results with realtime results produced by a new and different infrastructure. Most likely this infrastructure will be based on large, powerful and expensive servers, which might be a completely new world for Google. This is certainly possible for Google, but scaling might be the "real" challenge of the realtime search game.
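
One way to picture "enriching" offline results with realtime ones is a merge step at query time: take the top hits from a separate realtime tier, fill the remaining slots from the offline index, and drop duplicates. The policy below (a fixed number of reserved realtime slots placed at the top) is only one hypothetical choice we picked for illustration; the `Result` type, `blend`, its parameters and the example URLs are all made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    url: str
    score: float
    source: str  # "offline" or "realtime"


def blend(offline, realtime, max_realtime=2, total=5):
    """Reserve up to max_realtime top slots for realtime hits, then fill
    the remaining slots from the offline results, skipping duplicate URLs."""
    by_score = lambda r: r.score
    chosen = sorted(realtime, key=by_score, reverse=True)[:min(max_realtime, total)]
    seen = {r.url for r in chosen}
    for r in sorted(offline, key=by_score, reverse=True):
        if len(chosen) >= total:
            break
        if r.url not in seen:
            chosen.append(r)
            seen.add(r.url)
    return chosen


offline = [
    Result("a.example.com/1", 0.9, "offline"),
    Result("a.example.com/2", 0.8, "offline"),
    Result("b.example.com/x", 0.5, "offline"),
]
realtime = [
    Result("twitter.example.com/t1", 0.6, "realtime"),
    Result("b.example.com/x", 0.7, "realtime"),
    Result("twitter.example.com/t2", 0.2, "realtime"),
]
for r in blend(offline, realtime, max_realtime=2, total=4):
    print(r.source, r.url)
```

The appeal of this split is that the offline index keeps its cheap, horizontally scaled design, while only the small, fast-changing realtime tier needs the expensive update path.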

Admittedly, we have described the world somewhat simplistically here. As the world is neither black nor white, there are numerous new trends (e.g. virtualization) and technologies that blur the line between horizontal and vertical scaling and between offline and realtime architectures. Nevertheless, we see realtime search as a challenge for big, old "offline" Google search.

OJ
