Thursday, January 7, 2010

Companies, be aware: Google could know your strategy!

OK, we now know that Xing can find out about your newly started project, but did you know that Google could know your company strategy and your next secret moves? They really could find out which company you will acquire and which brand-new technology you will use next. And we have indications that Google already applies similar algorithms.

For the experts: The prognosis mechanisms used in Google's flu prognosis can easily be used to detect search trends within the IP address ranges of companies. Clearly, search trends could point to your next major company activity. This might be a merger or acquisition, or just the planned use of a new technology. Or do you think that employees working on strategic moves do not search for their new topic?

So here are the details of a potential case study: Within your company you have a small number of employees who secretly evaluate potential mergers and acquisitions or plan the use of a specific new technology. To start their research, these employees will certainly use the web and search engines. So what does this look like on the Google end? Within a certain IP address range that can be linked to your company, all search terms are monitored by Google's standard mechanisms. This is something Google definitely does. Over time, specific search terms will follow a certain pattern, and these search terms can be matched to your next strategic move. Before your company starts its research, very few searches for the specific terms will be made. But as soon as your strategy evaluation starts, the volume for these search terms will explode. This is similar to the analysis Google is already doing for its flu prognosis.
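The pattern described above (almost no searches, then a sudden explosion) can be sketched in a few lines. This is only a minimal illustration under our own assumptions, not Google's actual method: the function name `find_trending_terms`, the data layout (daily search counts per term for one IP range), the baseline window and the threshold are all hypothetical.

```python
from statistics import mean, pstdev


def find_trending_terms(daily_counts, baseline_days=14, threshold=3.0):
    """Return terms whose recent search volume jumps far above their baseline.

    daily_counts maps a search term to a list of daily search counts
    observed from one (hypothetical) company IP range, oldest day first.
    """
    trending = []
    for term, counts in daily_counts.items():
        baseline = counts[:baseline_days]
        recent = counts[baseline_days:]
        if not baseline or not recent:
            continue  # not enough history to compare
        base_mean = mean(baseline)
        # A completely flat baseline has zero deviation; fall back to 1.0
        base_std = pstdev(baseline) or 1.0
        score = (mean(recent) - base_mean) / base_std
        if score >= threshold:
            trending.append((term, score))
    # Strongest spike first
    return sorted(trending, key=lambda item: item[1], reverse=True)


# Made-up counts: one week of baseline, then three days of research activity
example_counts = {
    "acme corp financials": [0, 1, 0, 0, 1, 0, 0, 9, 12, 15],
    "weather": [5, 6, 5, 6, 5, 6, 5, 6, 5, 6],
}
spikes = find_trending_terms(example_counts, baseline_days=7)
```

With these made-up numbers only the acquisition-related term is flagged, because the everyday term never leaves its normal range.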

With some high-level industry expertise, Google could easily base an investment strategy on this information. This does not sound good for keeping your strategy secret from Google and other search giants.
Will Google do this? We could not find anything in the terms of use that prevents them from doing it.

We know that companies put great effort into keeping M&A activity secret, but have they thought about this obvious security hole?

We know a way to protect yourself against this. One of our next posts will explain the details.
OJ

Wednesday, January 6, 2010

It is your wrong data that stays

Some time ago I tested a service called Plaxo. Plaxo copies your profile data to other people's address books and keeps it updated. When you update your telephone number in your Plaxo account, all of your friends' address books are updated.

I only tested this service and deleted my account pretty soon. In my view, Plaxo was spamming other people with mail requests to update their entry in my address book. I found the whole concept a bit too intrusive.

In the January after testing Plaxo, I received numerous birthday congratulations, months away from my real birthday. At first I was a bit confused. It turned out that I had never entered my birthday into my Plaxo profile, and the default birthday of 1st January had been replicated to a lot of my contacts.

This all happened a couple of years back. Ever since, I have been fending off numerous misplaced congratulations every 1st of January.

The general lesson: Wrong data about you will stay around and affect your life for a loooong time. Wasn't there a movie about this? (Remember: Buttle was wrongly arrested for Tuttle.)

A happy new year to all of you! And congratulations!
OJ

Sunday, December 20, 2009

Google Realtime Search


We were talking about the difficulties of creating a realtime search and how Google introduced a ticker in an attempt to solve this. You can find an interesting article about this here: Google Search RIP
OJ

Tuesday, December 8, 2009

Picture Processing: Google Goggles Update

Have you seen Google Goggles? Hey, this is even slightly more disturbing than our previous post. Somebody taking pictures of you with Google Goggles might get your name displayed directly on their mobile.
However, Goggles cannot do this quite yet.

Any guess how long it will take? I would say no longer than one or two years...
OJ

Google Realtime Search? No! Call it: Google realtime ticker (with a filter)!

You might have read the recent announcement of Google introducing realtime search, and you might remember my previous post about the likelihood of Google's infrastructure facing a major challenge with realtime search.


OK, today's announcement has not yet proven me wrong. What Google presented today is not search as Google itself would define it. It is only a realtime ticker with a filter applied to it. No relevance rating is added to the ticker.

Realtime search is only a challenge if Google wants to sort realtime posts by relevance. The current solution does not do that. For this solution, no central infrastructure is needed.

However, if you would like to add some relevance factor to it, things are different. Computing the relevance of realtime updates would require a central infrastructure. When I use the term relevance, think of a mechanism that ranks tweets from widely followed accounts higher than others, where re-tweets push a result higher, and where a simple tweet with no links and no follow-ups loses relevance very quickly. This would be realtime PageRank. And for this you need a central infrastructure different from the massively parallel Google server world. Let's wait and see.
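To make the idea of such a relevance factor more concrete, here is a minimal sketch. It is purely our own illustration and not anything Google or Twitter has published: the `Post` fields, the logarithmic weighting, the ten-minute half-life and the faster decay for plain posts are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Post:
    text: str
    followers: int      # how many people follow the author
    retweets: int       # how often the post was re-tweeted
    has_link: bool      # whether the post links to something
    age_seconds: float  # time since the post was published


def relevance(post, half_life_seconds=600.0):
    """Score a realtime post; higher means more relevant right now."""
    # Widely followed authors and re-tweets push a post up.
    # Logarithms keep huge accounts from drowning out everything else.
    authority = math.log1p(post.followers)
    engagement = 1.0 + math.log1p(post.retweets)
    # A simple post with no link and no follow-ups fades four times faster.
    if not post.has_link and post.retweets == 0:
        half_life_seconds /= 4
    decay = 0.5 ** (post.age_seconds / half_life_seconds)
    return authority * engagement * decay


posts = [
    Post("plain remark", followers=10_000, retweets=0, has_link=False, age_seconds=600),
    Post("breaking news", followers=10_000, retweets=50, has_link=True, age_seconds=600),
]
ranked = sorted(posts, key=relevance, reverse=True)
```

Note that the inputs are global numbers: follower and re-tweet counts have to be collected and kept current in one place before any post can be scored, which is exactly where the central infrastructure comes in.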

OJ

Sunday, December 6, 2009

Picture Processing: or John Q. Public kissing in Hawaii


I guess we all have realized that something is happening in the world of picture processing.
Services like Photosynth and Polar Rose, and even Automatic Photo Tagging, illustrate this trend. Not to forget the face recognition capabilities of Picasa and iPhoto.

What we see is basically that the computer starts to "understand" the content of pictures and its relation to the real world. We will not go into details about the mechanisms, but it is easy to grasp the idea. Just think of large amounts of publicly tagged photos, add cheap server processing power and online storage, add photo comparison, and finally some recognition algorithms for faces, buildings and so on.

Let's explore what this will mean for users and bystanders in the future, and think it through a bit:

We can safely assume that all faces in all public pictures will at some point be tagged with the real person's name. Yes, even your name. As is typical on the Internet, this tagging will not be 100% reliable, but a fair amount of the data will be correct. Even if you think you can avoid this, it will not help in the long run. Somebody somewhere will put a picture of you online and tag your face with your name. And once this information is in the wild, it can be used as a reference for all the other pictures of you.

I guess you might have known this. But extend the thought a bit. After face recognition comes building recognition (sorry, I do not have a link yet; however, it is possible and similar to Photosynth). The buildings in your pictures will be recognized and automatically geotagged. Other recognition algorithms will follow. (How difficult can it be to detect whether two faces are kissing each other?)

So let's put all this together in a single use case in the near future:

  • John Q. Public is on a holiday trip in Hawaii.
  • Somebody takes a random photo with his mobile that shows John in the background giving a goodbye kiss to his traveling acquaintance.
  • This photo is uploaded a day later to a public photo page.
  • Some service automatically detects John and tags this picture with "Hawaii" (the airport building), "John" (face recognition) and "kissing" (new algorithm).
  • As the Internet never really forgets anything, the picture now captures an eternal moment of John.
So you will stop kissing at airports from now on? It might be too late; your last goodbye could already be online... and it will resurface when you least expect it.

Who will do all this tagging and analysis? That's easy, don't you remember Google's mission? (Google's mission: to organize the world's information and make it universally accessible and useful.) You will be able to search for and find poor John with "John Q. Public"+"kissing"+"Hawaii".

You feel a bit of pity for John? Maybe this will result in a general increase in tolerance. Everybody might have their eternal moments online! So nobody can point the finger at anybody else.

What do you think? Is this story too absurd or did we hit some points?
Please comment!

OJ

Monday, November 30, 2009

Realtime Messages: Can Google include realtime data?

OK, we already pointed out the factors that will drive search results to be enriched with your social network data (remember: in our opinion this will be driven by smartphones, social address books and Universal Search). But what about realtime data? How can realtime data be included in general search results?

Realtime data is definitely becoming more and more important on the net. Just think about the news of the Hudson plane crash in January. At that time it was still newsworthy that Twitter was the first to report the accident, but by now the new Google Chrome OS, the latest firmware update for your favorite device and so on are all naturally first reported and discussed on Twitter. Of course, realtime data can be found in many other places, like Facebook, comments on blogs and so on. Thinking about it, I suddenly get the feeling that quite a substantial amount of new data on the web is entered as "realtime data" with a time component. This time component strongly influences the relevance of the data.

A lot is currently happening in this realtime search space. Please see TechCrunch and VentureBeat for excellent summaries of the current state of the art.

While the two kinds of data (social data and realtime data) pose a similar challenge when it comes to presenting the search results to the user and weighting the relevance of each result, realtime data is by nature much more complicated. Realtime data deeply affects the infrastructure that is needed to process it.

Google's infrastructure is clearly an "offline" architecture. By offline we mean that updates reach the Google search index only very slowly. The underlying reason is that this allows Google to scale its systems with massive numbers of rather small and cheap servers. This is normally called horizontal scaling, in contrast to vertical scaling, where you need big and expensive machines to which you add processors, storage and memory as necessary. In Google's park of thousands of small servers, the index is replicated for better performance. This replication is not a "realtime" thing. It takes a significant amount of time. Realtime replication is usually very costly. Software architectures with realtime update capabilities tend to be developed for large-scale machines. So we have a natural contradiction between Google's way of computing (with massive numbers of small server machines) and the requirement of realtime updates for parts of the index.

A possible solution for Google would be to enrich the standard offline search results with realtime results produced by a new and different infrastructure. Most likely this infrastructure would be based on large, powerful and expensive servers, which might be a completely new world for Google. Certainly this is possible for Google, but scaling might be the "real" challenge of the realtime search game.
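How such an enrichment could look on the result page can be sketched as follows. This is a minimal illustration under our own assumptions, not a description of Google's systems: the function `blend_results`, the dictionary fields and the limits are hypothetical, and the two input lists simply stand in for the slow offline index and a separate realtime infrastructure.

```python
def blend_results(offline_hits, realtime_hits, max_realtime=3, max_age_seconds=3600):
    """Put a few fresh realtime hits in front of the classic offline ranking.

    offline_hits:  dicts with "url" and "score" (from the slow, replicated index)
    realtime_hits: dicts with "url" and "age_seconds" (from a separate realtime system)
    """
    # Keep only realtime hits that are still fresh, newest first
    fresh = [hit for hit in realtime_hits if hit["age_seconds"] <= max_age_seconds]
    fresh.sort(key=lambda hit: hit["age_seconds"])
    fresh = fresh[:max_realtime]

    # The offline ranking stays as it is, minus URLs already shown as realtime hits
    shown = {hit["url"] for hit in fresh}
    ranked = sorted(offline_hits, key=lambda hit: hit["score"], reverse=True)
    return fresh + [hit for hit in ranked if hit["url"] not in shown]


offline = [
    {"url": "example.com/background", "score": 0.9},
    {"url": "example.com/news", "score": 0.7},
]
realtime = [
    {"url": "example.com/news", "age_seconds": 120},
    {"url": "example.com/stale", "age_seconds": 7200},
]
page = blend_results(offline, realtime)
```

The blending itself is cheap. The expensive part is producing the realtime list quickly and at Google's scale, which is the challenge described above.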

Admittedly, we have described the world a bit simplistically here. As the world is neither black nor white, there are numerous new trends (e.g. virtualization) and technologies which blur the line between horizontal and vertical scaling and between offline and realtime architectures. Nevertheless, we see realtime search as a challenge for big old "offline" Google search.

OJ