Friday, March 24, 2006

Funny Profiles on Zoominfo

These days, many people are working hard to improve search on the internet. Today’s internet content is so vast that any method helping people distinguish quality content from the ballast flooding the net would be extremely beneficial. Well, we already have one such method: it is called PageRank. This method is based on the “universal popularity” of a particular site, expressed by the links pointing to it. In other words, PageRank digs out semantic information about popularity from the only available syntactic tool: web links. The PageRank algorithm is well proven and fine-tuned to the best possible extent, and it is very hard to improve it any further.

Context digging

OK, so where can we go from here? There are just two ways forward:

  • to add an additional piece of syntax to the internet (one that would make its content more searchable), or
  • to work better with the existing unstructured content.

Zoominfo can serve as a typical application of the second approach. It tries to dig semantic information out of the context of keywords and automatically builds profiles of people from publicly available news sources. To do this, it attempts to uniquely identify a particular person by searching for his or her name in the context of other keywords that are automatically identified as relevant to this person. This is a highly non-trivial thing to do, indeed!

The Reality Check

Let me share some examples with you. If we search Zoominfo for the most popular Czech singer, Karel Gott, we find eight (!) different profiles. The good news is that all of them are somehow related to the singer; the bad news is that none of them is really correct, and seven of the eight don’t even mention that this person is a singer! Where is the problem? In its attempt to tell possible namesakes apart, the system splits information about one person into many different profiles. Of course, the right balance is difficult to strike. On the one hand, it is wise to assume that if there is a lot of information about a particular name, part of it should be attributed to namesakes. On the other hand, this does not always hold, particularly if the person is really popular.
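To see how an overcautious namesake filter can split one person into several profiles, here is a toy sketch. It is not Zoominfo’s actual algorithm (the post does not describe it); it assumes each news mention of a name comes with a set of context keywords, and it merges a mention into an existing profile only if the keyword overlap (Jaccard similarity) reaches a threshold. The mentions and threshold values are invented for illustration.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Keyword overlap |a & b| / |a | b| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_profiles(mentions: list[set[str]], threshold: float) -> list[set[str]]:
    """Greedily group name mentions into profiles by context-keyword overlap.

    A mention joins the most similar existing profile only if their Jaccard
    similarity reaches `threshold`; otherwise it is treated as a possible
    namesake and starts a new profile.
    """
    profiles: list[set[str]] = []
    for keywords in mentions:
        best, best_score = None, -1.0
        for profile in profiles:
            score = jaccard(keywords, profile)
            if score > best_score:
                best, best_score = profile, score
        if best is not None and best_score >= threshold:
            best |= keywords  # merge the mention's keywords into that profile
        else:
            profiles.append(set(keywords))  # copy, so the input stays untouched
    return profiles


# Four invented news mentions that all refer to the same singer.
mentions = [
    {"singer", "concert", "prague"},
    {"album", "singer", "record"},
    {"award", "television", "poll"},
    {"concert", "tour", "germany"},
]
print(len(build_profiles(mentions, threshold=0.5)))  # 4
print(len(build_profiles(mentions, threshold=0.1)))  # 2
```

With the strict threshold, four mentions of one singer become four “different people”; a looser threshold merges three of them, but would also start merging genuine namesakes. The third mention shares no keyword with the others and stays separate either way.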

From professor to journalist or landlord

However, this problem is even more general and is not limited to top celebrities. For example, Prof. Vorisek, Head of the Department of Information Technologies at the Prague Economic University, has 4 different profiles. Only profile No. 2 is more or less correct, but it is vastly incomplete, quoting just his name and school. We don’t even learn his position and have no idea about his other activities. In addition, some of the profiles are pretty funny. My favorite one identifies Jiri as a sort of landlord of Zofin Palace. In reality, Zofin Palace is just the venue of a regular annual conference organized by Jiri’s department.

The conclusion

I am not saying that the people at Zoominfo don’t try hard. They certainly do. The problem is a more serious one: processing the context of keywords exceeds the capabilities of today’s technologies, even if we limit the task to a particular context (e.g., searching for names and positions, as Zoominfo does). The idea itself is not bad, but it is too ambitious. Generally speaking, the complexity of this task is close to that of automatic text comprehension and translation. Zoominfo’s case just illustrates that we are not at this stage yet.

This is a very clear message that shouldn’t be overlooked. It is (still) very hard and even counterproductive to work automatically with unstructured information, even in very specific scenarios. On the other hand, the syntactic approach (PageRank) works well; the problem, however, is that its mechanism has already been “milked to death”.

The solution?

To get better search results, we will have to add some additional syntax to the web. We should do it smartly: we cannot expect too much work from users, but at the same time we should make this web extension a clear advantage for everybody who joins.

There are already many applications that tackle the internet search problem this way. Social networks are a good example: thanks to their growing popularity, they are in fact turning a significant part of the internet into a structured form! Another interesting example is the Friend of a Friend (FOAF) project.

We will, however, try to formulate a more general approach based on a Unique Personal Identificator (UPI). It is actually a nice paradox that Zoominfo (and not only Zoominfo) would greatly benefit from such a system. On the other hand, if the internet had a UPI, applications like Zoominfo would not be necessary at all...
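The paradox can be made concrete: if every mention carried a personal identifier, building profiles would be a simple grouping step instead of guesswork. A minimal sketch, assuming a hypothetical `upi` field on each mention (no such identifier exists on today’s web; the identifier values and texts are invented):

```python
from collections import defaultdict


def profiles_by_upi(mentions: list[dict]) -> dict[str, list[str]]:
    """Group mentions by their (hypothetical) UPI: one identifier, one profile."""
    profiles: dict[str, list[str]] = defaultdict(list)
    for mention in mentions:
        profiles[mention["upi"]].append(mention["text"])
    return dict(profiles)


mentions = [
    {"upi": "upi:example-0001", "text": "Karel Gott sings in Prague"},
    {"upi": "upi:example-0001", "text": "Karel Gott releases a new album"},
    {"upi": "upi:example-0002", "text": "A namesake wins a local chess tournament"},
]
profiles = profiles_by_upi(mentions)
print(len(profiles))                      # 2
print(len(profiles["upi:example-0001"]))  # 2
```

No keyword heuristics are involved: the singer’s mentions land in one profile and the namesake in another, which is exactly the part Zoominfo has to guess today.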

Monday, March 13, 2006

What Will Supersede PageRank?

Today we live in a world ruled by PageRank. Every web page has its specific rank that says whether it is valuable to the internet community or not. There is, however, one problem: there is no such thing as a “universal” internet community per se. There are just people with different priorities, interests, and expectations.

Although PageRank was a big success in its day (being able to distinguish valuable content from the “mess” of the web), more and more people understand that the “majority” approach, which fits broadcast media well, is not suitable for the internet, an interactive medium by nature, able to identify its users personally.

“I don’t want to only see the stories that most people are interested in, I want interesting stories.” (Dave’s Wordpress Blog)

OK, this is a reasonable expectation. But how do we move on? By replacing a “universal” PageRank with a “personalized” one?

A “personalized” PageRank

PageRank is a brilliant piece of thinking. It was able to make use of the only semantic information embedded in the web syntax (the links) to evaluate the quality of pages. By processing link statistics, we can find out which pages are linked to most, and this in fact gives us access to the vast amount of work done by people who already read and evaluated these pages and created links to those they considered valuable.
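The idea of links as votes can be sketched with the textbook power-iteration form of PageRank: each page passes its current rank, scaled by a damping factor (0.85 is the commonly cited value), evenly along its outgoing links, and every page also receives a small constant share. This is a simplified illustration on an invented four-page web, not Google’s production system.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Simplified PageRank by power iteration.

    `links` maps every page to the pages it links to (every page must be a key).
    Pages without outgoing links spread their rank evenly over all pages.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share  # a link passes on part of the rank
            else:  # dangling page: distribute its rank uniformly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


web = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(web)
print(max(ranks, key=ranks.get))  # c
```

Page `c` wins because three pages “vote” for it, and `a` comes second because it receives the single vote of the most valuable page. The ranks always sum to 1, so they can be read as the share of attention each page gets from a random surfer.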

But the links have already been “milked to death”, and there is nothing else in the web syntax that would give us an additional clue to the quality of web content. So any attempt to improve the quality of web search would require introducing some new piece of syntax to the web or, put simply, something that would make web content more structured. Yes, it is a tremendous task, but not an impossible one. And in fact, it is already happening.

Towards a more structured web

There are two possible approaches to adding more structure to the web:

  1. Growing popularity, and thus mass penetration, of structured applications such as social networks.
  2. Introducing a new piece of web syntax that would be seamlessly integrated into the existing web. My candidate: the Unique Personal Identificator (UPI).

These are quite different approaches: while the first one is based on mass adoption of structured applications, the second one is based on users adopting a simple additional syntax. Let’s start with the first one for now.

Social network as a search engine

A social network is in fact an application that consists of

  • a specialized web search engine coupled with
  • a specialized web hosting service.

This approach has a clear motivation: the specialized search engine benefits greatly from working with structured information defined up front. For example, if we can assume that a person’s name is always entered in a field called “name”, the company name in the field “company” (and, in addition, linked to its unique ticker symbol), and that the education degree and country are selected from a predefined list, we can deliver far better and far more relevant results for our predefined queries than any full-text approach can. So we are just porting the good old theory of traditional database systems to the internet. Ideally, the entire web would be structured this way!
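The difference between field-based and full-text matching shows up even on two records. A minimal sketch with invented profiles (the names, the company “Acme”, and the field layout are assumptions for illustration): a full-text search for “Deloitte” also returns someone who merely mentions the company in his free-text description, while a query on the “company” field returns only the person who actually works there.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    name: str
    company: str  # current employer, always entered in the "company" field
    country: str  # chosen from a predefined list of country codes
    about: str    # free text written by the user


profiles = [
    Profile("Jan Novak", "Deloitte", "CZ", "Consultant in Prague."),
    Profile("Petr Svoboda", "Acme", "CZ", "Left Deloitte in 2004, now at Acme."),
]


def full_text_search(term: str) -> list[str]:
    """Match the term anywhere in the profile text, as a full-text engine would."""
    term = term.lower()
    return [p.name for p in profiles
            if term in f"{p.name} {p.company} {p.country} {p.about}".lower()]


def structured_search(company: str, country: str) -> list[str]:
    """Match exact field values, as a database query would."""
    return [p.name for p in profiles if p.company == company and p.country == country]


print(full_text_search("Deloitte"))         # ['Jan Novak', 'Petr Svoboda']
print(structured_search("Deloitte", "CZ"))  # ['Jan Novak']
```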

Growing popularity of social networks

But now comes the interesting part. The web is in fact becoming more structured thanks to these applications. Because search in social networks really works (well, structured search has worked in traditional databases since the 1960s, so why not here?), these applications become useful and thus popular. The biggest social networks today contain tens of millions of users and put these users’ profiles on the web. Thanks to this development, a significant part of internet content is becoming structured in a very formal, traditional “database way”. We can even say that the web is becoming a more organized place.

Wider consequences of social networks

So there are now millions of users on the web who took the time to create personalized, structured profiles and who keep them updated. This amount of work cannot be overlooked. In fact, it can already be compared (at least to a certain extent) to the effort web users invested in linking their pages. This growing body of structured web content will serve as a special (and welcome!) input to universal web search engines. It can greatly improve their search capabilities in the areas where applications like social networks force people to use a “strict syntax”.

Vision

This in fact means nothing other than introducing new syntax rules to certain application areas of the web. It is fair to expect that more and more applications like social networks will appear over time. All of them will have one thing in common: they will motivate users to use the internet in a predefined, highly structured way. Whether the result is structured personal profiles, product descriptions, calendar events, or something else, all this information will turn the internet into a more structured base of data. The amount of structured content on the internet will grow and become a goldmine for any search engine of the future. As a result, traditional full-text web search will be complemented by more efficient tools wherever possible. Thanks to this development, search will certainly improve. But for a really significant improvement, we should dethrone PageRank from its role as the sole, universal judge of information relevance.

PageRank Replacement?

To do this, we should shift from evaluating pages to evaluating users. This would be a true revolution in web search, allowing us to search for personally relevant information.
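One known technique that moves in this direction is “personalized PageRank”: the same link analysis, except that the random jumps land only on pages a particular user cares about, so rank flows outward from that user’s interests. This is my illustration, not a concrete proposal from this post, and it still ranks pages rather than people; the page names and preference weights are invented.

```python
def personalized_pagerank(links: dict[str, list[str]],
                          preferences: dict[str, float],
                          damping: float = 0.85,
                          iterations: int = 100) -> dict[str, float]:
    """PageRank whose random jumps follow one user's interests.

    `preferences` gives non-negative weights (at least one positive) for the
    pages the user cares about; they are normalized into the jump distribution.
    Equal weights on every page give ordinary, "universal" PageRank.
    """
    pages = list(links)
    total = sum(preferences.get(p, 0.0) for p in pages)
    jump = {p: preferences.get(p, 0.0) / total for p in pages}
    rank = dict(jump)
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) * jump[p] for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:  # dangling page: jump according to the user's preferences
                for p in pages:
                    new_rank[p] += damping * rank[page] * jump[p]
        rank = new_rank
    return rank


web = {
    "pop_news": ["pop_chart"],
    "pop_chart": ["pop_news"],
    "fan_site": ["pop_chart"],
    "jazz_blog": ["jazz_club"],
    "jazz_club": ["jazz_blog"],
}
generic = personalized_pagerank(web, {page: 1.0 for page in web})
jazz_fan = personalized_pagerank(web, {"jazz_blog": 1.0})
print(max(generic, key=generic.get))    # pop_chart
print(max(jazz_fan, key=jazz_fan.get))  # jazz_blog
```

For the “universal” reader, the most linked-to pop page wins; for the jazz fan, the pop pages receive no rank at all, because nothing in that fan’s corner of the web points to them.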

However, as we already said, this would require introducing a new piece of syntax to the entire web. A very difficult concept, indeed! Could we find a way to persuade users and developers to adopt this new piece of web syntax? Let us think about it next time.

Live.com - too deep innovation

There are lots of areas where we should innovate web search, except one: the user interface. It was Google’s big contribution to the internet community to go the simplest way. No flashing banners, no “sexy” layouts. Just a very intuitive text list, and page navigation that uses our own browser’s functions. What could be nicer and more practical?

And now have a look at live.com. Its search results are displayed in a fancy window that ends some 5 cm above the bottom of the page. You intuitively want to scroll, but oops! There is no scrollbar. Just two strange and almost invisible arrows (light grey on a white background). Should we drag them? Click on them? Click on the bar?

This is not the way to go. As the Microsoft Monitor blog puts it: “I see the new doohickeys--slider and macros--as adding complexity without significantly improving search relevancy.”

Microsoft uses the extra white space below the search results for the message Help us improve. Interestingly enough, if they really do improve in this respect (and focus their innovation efforts on the right areas), the space for this message will disappear automatically...

Sunday, March 12, 2006

What Is Missing in Ray Ozzie’s Live Clipboard Concept?

Let me say a few words about Ray Ozzie’s new initiative, which proposes a universal clipboard for the internet. I would like to show a slightly broader approach that could be better positioned for mass adoption because, in my opinion, it better corresponds to the nature of today’s internet applications and users’ expectations.

Ray Ozzie’s Concept

Ray envisions a standard for interchanging structured information between web applications (e.g. web calendars and address books) and calls it an “extension of the clipboard user model to the web”.

Quote:

And what was the most fundamental technology enabling “mash-ups” of desktop applications?

The clipboard. And a set of common clipboard data formats.

Before the clipboard, individual applications (such as Lotus 1-2-3 with its Copy and Move operations) enabled intra-application data transfer – in a world largely designed around a single running application. But the advent of the multi-application user environment, combined with the simplicity of the Select/Cut/Copy/Paste/Clear model, suddenly empowered the user in ways they hadn’t previously experienced.
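To make “a set of common clipboard data formats” concrete: if two web applications agree on one format, a calendar entry copied from one can be pasted into the other without either knowing the other’s internals. The sketch below uses iCalendar (RFC 5545) as that shared format purely as an assumption for illustration; it is not the Live Clipboard specification, and `export_event`/`import_event` are hypothetical helpers.

```python
from datetime import datetime, timezone

ICAL_TIME = "%Y%m%dT%H%M%SZ"  # UTC date-time in iCalendar's basic format


def export_event(uid: str, summary: str, start: datetime, end: datetime) -> str:
    """Serialize one event as a minimal iCalendar document (times in UTC)."""
    stamp = datetime.now(timezone.utc)  # DTSTAMP: when this copy was created
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//Example//Clipboard Sketch//EN",
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"DTSTAMP:{stamp.strftime(ICAL_TIME)}",
        f"DTSTART:{start.strftime(ICAL_TIME)}",
        f"DTEND:{end.strftime(ICAL_TIME)}",
        f"SUMMARY:{summary}",
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    return "\r\n".join(lines) + "\r\n"  # iCalendar uses CRLF line endings


def import_event(text: str) -> dict[str, str]:
    """Read VEVENT properties back (simplified: no line folding, parameters or escaping)."""
    event: dict[str, str] = {}
    inside = False
    for line in text.splitlines():
        if line == "BEGIN:VEVENT":
            inside = True
        elif line == "END:VEVENT":
            inside = False
        elif inside:
            name, _, value = line.partition(":")
            event[name] = value
    return event


start = datetime(2006, 4, 1, 19, 30, tzinfo=timezone.utc)
end = datetime(2006, 4, 1, 22, 0, tzinfo=timezone.utc)
clip = export_event("concert-42@example.org", "Jazz concert", start, end)
event = import_event(clip)
print(event["SUMMARY"], event["DTSTART"])  # Jazz concert 20060401T193000Z
```

Note that the user still has to copy and paste: the shared format solves interoperability, not automation, which is the gap discussed below.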

Reading these lines, there is no doubt that the concept and its reasoning sound interesting. But when I watched the screencast of a Live Clipboard demo, a big question emerged in my head. Will Live Clipboard really succeed? Is this the right application for the internet world?

I don’t think so. The user’s perspective has changed significantly since the late 1980s and 1990s. These days, people expect more from internet applications than they expected from a PC running Windows. They want real automation, not just a tool for manually moving (even complex) data.

And this is why I think developers will not be too excited to implement this concept: it will not bring any real competitive advantage to their products.

Why the Clipboard Was Adopted

Now more from the developers’ point of view. Back in 1985, when Windows 1.0 first appeared on the market, the battle was not about pushing the clipboard; it was, of course, about pushing Windows and achieving its wide adoption by developers. The clipboard was just one (and certainly not the most important) “selling point” of Windows; the really important selling points were the GUI, the ability to run multiple graphical applications at the same time, virtual memory, and the system’s own device drivers. But as soon as developers decided to move to the Windows platform, implementing all Windows features (including the clipboard) made good sense for them, as it differentiated their products from their DOS competitors. And regarding the clipboard itself, they of course had no alternative. The platform was owned by Microsoft, and Microsoft also defined all the standards of data interchange.

The Difference

So, what are Live Clipboard’s chances of being adopted by developers?

Three things have changed since the 1980s:

  1. The platform is not owned by any single company.
  2. While it would still make sense for developers to implement “rich clipboard” functions, doing so would not bring them any competitive advantage (whereas seamless interoperability with other applications was a reasonable competitive advantage when developers decided to port their applications to Windows).
  3. User expectations have changed (we will cover this later).

Given these facts, developers’ motivation to implement Live Clipboard is very weak, and it is in fact a “chicken-and-egg” problem: the effort to implement this function pays off only when enough applications already support it. On the one hand, Microsoft is a strong company, so the standard is certainly not dead. But...

User’s perspective

But on the other hand, I always think we should strive for more: for something “sexier” that would make a clear difference for the user.

Well, back in the late 1980s, the clipboard was no doubt a big step forward. Instead of having to save a file, exit the application, launch another one, and then import the saved file, we got a very friendly, fast, and useful tool. But even this example clearly shows that we shouldn’t exaggerate the role of the clipboard alone: without the ability to run multiple applications at once, the clipboard would be of virtually no value to users.

And the same holds for the Live Clipboard concept. Something is still missing to make the concept really appealing.

We should strive for more!

So let us think about an idea that would be really “sexy” by itself; an idea that would excite developers and motivate them to extend it further. To me, the manual “Cut & Paste” model no longer falls into this category. It was fine in the DOS era, but expectations have changed since then. Today’s users expect something more automatic and more convenient than a tool for manually transferring appointments and business cards from one application to another (no matter that these items contain rich information).

So, what could make the real difference today?

The Vision

Imagine I find an interesting concert on a web page and want to attend. I would expect to find a simple button on the page that I can press in such a case. At the same time, the system identifies me (which is technically possible already today) and asks me to confirm payment for the ticket. It also contacts my (web) diary directly (which is, in turn, automatically synchronized with any personal device I use) and writes down the event. If there is a conflicting appointment, the system lets me know before requesting my payment. Sounds better than a simple cut and paste? Yes, indeed, because this is real automation. But the story isn’t over yet.
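The “warn about conflicts before taking payment” step is plain interval logic. A minimal sketch, assuming a diary is just a list of appointments and that payment is an opaque callback; `Appointment`, `attend`, and `pay` are hypothetical names, not any existing service’s API:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable


@dataclass(frozen=True)
class Appointment:
    title: str
    start: datetime
    end: datetime


def overlaps(a: Appointment, b: Appointment) -> bool:
    """Half-open intervals [start, end): back-to-back events do not conflict."""
    return a.start < b.end and b.start < a.end


def attend(diary: list[Appointment], event: Appointment,
           pay: Callable[[Appointment], None]) -> list[Appointment]:
    """Hypothetical 'attend' button: check the diary first, then pay and record.

    Returns the conflicting appointments; an empty list means the ticket was
    paid for and the event was written into the diary.
    """
    conflicts = [a for a in diary if overlaps(a, event)]
    if conflicts:
        return conflicts  # ask the user before any money changes hands
    pay(event)
    diary.append(event)
    return []


dinner = Appointment("Dinner", datetime(2006, 4, 1, 19, 0), datetime(2006, 4, 1, 21, 0))
concert = Appointment("Concert", datetime(2006, 4, 1, 19, 30), datetime(2006, 4, 1, 22, 0))
later = Appointment("Concert", datetime(2006, 4, 2, 19, 30), datetime(2006, 4, 2, 22, 0))
payments: list[Appointment] = []
diary = [dinner]

print([a.title for a in attend(diary, concert, payments.append)])  # ['Dinner']
print(attend(diary, later, payments.append))                        # []
print(len(diary), len(payments))                                    # 2 1
```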

Now suppose the concert is cancelled (musicians are just people, so this may happen even in the future :-)). Instead of my driving there and finding a closed hall with a crowd of angry people, the appointment is automatically removed from my diary (again, no technical problem: whoever records a particular piece of information is also allowed to change it), and I am informed about the change as soon as it occurs. The notification may simply be a function of my diary, so nobody needs to know my personal email, IM, or whatever channel I use, and nobody needs to learn how and when I would like to be informed about changes. This would be a good, useful application. And still no rocket science!
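The cancellation scenario is essentially the observer (publish/subscribe) pattern: the organizer keeps track of the diaries that recorded its event and tells them when it changes, while each diary alone decides how to reach its owner. A minimal in-memory sketch under those assumptions; the class and method names are hypothetical, and a real system would also need authentication and a network protocol.

```python
from typing import Callable


class Diary:
    """A user's web diary; only the diary knows how its owner wants to be told."""

    def __init__(self, notify: Callable[[str], None]):
        self.entries: dict[str, str] = {}  # event UID -> title
        self._notify = notify  # e.g. email, IM or SMS: private to the diary

    def on_cancel(self, uid: str) -> None:
        title = self.entries.pop(uid, None)
        if title is not None:
            self._notify(f"Cancelled: {title}")


class EventPublisher:
    """The organizer who recorded the event and is therefore allowed to change it."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Diary]] = {}

    def register(self, uid: str, title: str, diary: Diary) -> None:
        diary.entries[uid] = title
        self._subscribers.setdefault(uid, []).append(diary)

    def cancel(self, uid: str) -> None:
        # The organizer never sees how each owner is notified.
        for diary in self._subscribers.pop(uid, []):
            diary.on_cancel(uid)


inbox: list[str] = []
my_diary = Diary(notify=inbox.append)
organizer = EventPublisher()
organizer.register("concert-42", "Jazz concert", my_diary)
organizer.cancel("concert-42")
print(my_diary.entries, inbox)  # {} ['Cancelled: Jazz concert']
```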

Saturday, March 11, 2006

Why Are Social Networks Dying?

Dear Blogosphere:
Skip directly to the section labeled “Dying Socioware” if you want to start right away with my recent thinking about social networks. But as this is my first post here, let me start with a few words about myself. First, my motivation: there is one problem with today’s internet. We can quite easily find products, texts, even maps there, but it is very difficult to find people with similar interests and a similar way of thinking. To do that, we have to do some extra work, like the work I am starting now. In the best case, a fruitful discussion starts and the right people emerge from it.

By the way, I understand that you (the readers and discussion partners I hope to find) will read these lines only if I am persistent enough with my publishing and cover enough interesting topics in a way that is close to your own thinking; only then might you one day return to the beginning of the blog and read these lines. And this is a very interesting thing about the blogosphere itself: right now I am actually “broadcasting against the wall”, writing for my potential future readers. It depends only on my effort whether I manage to remove the wall between me and you one day. There is no shortcut today. But there might be one in the future, and that brings us to socioware visions :-). But allow me two more paragraphs before we get there...

Second, about myself. At the time of writing, I am a 42-year-old mathematician with a recently earned Ph.D. in Computer Science, who has lived and worked his entire life in Prague, Czech Republic. I am new to the English blogosphere, but not new to publishing. I have written two books (one called e-Business for managers, and a sci-fi novel, “Stab in the back on the information superhighway”) and one TV serial, “Man and computer” (produced and broadcast by Czech TV in 1992). All my publications cover my hobby and lifetime passion: trends in IT and the consequences of IT developments for various areas of human life and business. These days I write a regular column for Ekonom, the most popular Czech weekly economics magazine, and for 15 years I have been publishing regularly in the Czech edition of Chip, the most popular computer magazine here. There is, however, one BUT: all my publications (with a few exceptions) are in Czech.

So I am “Mr. Nobody” in the English-speaking blogosphere and am starting nearly from scratch here (well, I delivered some English presentations and gave some interviews when I was a Senior Manager at the “Big 4” consultancy Deloitte, and I have also put some articles on calresco.org). But most of the work you will find is in Czech.

These days I teach e-business and IT Management courses at the University of Northern Virginia in Prague (one interesting consequence of September 11th: it is now more difficult for students to get US student visas, which has led some private US schools to open campuses in other parts of the world); sometimes I also teach in Beijing for the same school. In addition, I have just found a producer for my new educational TV serial on future technologies, “Stepping Forward”. The entire serial is based on a story set in the future, and it will be in English, too. The heroes of the serial (in their “innovative professions”) have to go through lots of strange things, and I would love to discuss at least some of them with you in some of my next posts.

Enough about me for now. I hope I have conveyed my lifetime interest and hobby: visions of IT and internet applications. At the same time, we are not finished yet, as this will be the subject of this entire blog for as long as time, energy, and passion allow me to continue (well, I hope for at least one more post :-))...

Dying Socioware

Contemporary social networks like LinkedIn, OpenBC, or Orkut have a flawed business model. They all try to earn money by selling a so-called “premium membership”, which means access to those parts of their membership database that are not yet “linked to” us. This concept directly contradicts the original motivation of these applications, which was to establish close networks of trusted friends.

All these applications build a concept of “friendship” that is far too simplistic and does not correspond to any kind of real-life relationship. “Friendship” in these networks is established when two contacts agree via email to “connect”. By this agreement, they make their contacts mutually visible: if I connect to somebody, I can see his contacts, and I can search the contacts of his contacts. If I search elsewhere, I just get a result like “partner at Deloitte; if you want to know his name, buy our premium membership”.

What are the consequences of such a business model? To make the application work for me, I am motivated to be as open as possible to accepting and offering connections. As a result, the average number of connections in these networks constantly grows. “Hub” and “superhub” users appear who connect thousands or even tens of thousands of people. LinkedIn recently decided to stop publishing the actual number of connections of a particular user, but this is of course not a real solution to the problem; it is just a symptom of it.

The longer-term consequence is pretty clear and sad at the same time: one day everyone will be a connection (a “friend”) of everyone else. As this happens, the value of friendship, which was originally meant to be the essential information in social networks, degrades and will eventually be lost. We can even say that socioware is dying these days because of its flawed business model.

The Way Out?

So, if the business model is wrong, there should be a way to fix it. It would actually not be that difficult. First of all, let us look at the main characteristics of today’s concept of “friendship” in social networks.

“Friendship” is:
  • digital (yes/no: a person either is a friend or is not)
  • static (once agreed, we are friends; OK, in theory we can break up, but it would be too painful in the current implementation :-))

In addition, friends are our key to making the network’s search functions accessible to us.

I am pretty sure that the key to the survival of today’s social networks is to answer the following two questions frankly:
  1. How should friendship be re-defined (of course, in a “non-digital” and dynamic way)?
  2. What should its purpose be? (This implies another question: what should the business model of social networks be?)

Nobody is perfect in his reasoning, so I decided to consult this problem with some really good professionals in the Czech internet community. There is a server called Lupa.cz that focuses on technology and the internet but, more importantly, is also home to a very strong community of internet professionals. Practically everyone who matters in the Czech IT business reads articles and participates in discussions on this server.

eWorkshop

Six years ago, in spring 2000, I tried to launch an experimental format on this server. I called it “eWorkshop” and based it on a simple idea. On the first day of an eWorkshop, an article about an interesting topic is published on the server. This article formulates a problem (like the one above) and ends with some open questions. People are encouraged to participate in a discussion moderated by the author; in the evening, the discussion is summarized in a new article. These steps are then repeated three or four times, and eventually, at the end of the week, the community itself comes up with an interesting proposal.

I must say I was not sure whether this would ever work when I ran the first eWorkshop back in 2000. But eventually I was really surprised by how fruitful the discussion was and how much appreciation I got from the community. It is true: the more you give, the more you get. This discussion no doubt led to really innovative ideas and views I would have been unable to come up with myself. In some areas (suggestions on how to improve web search) we even came up with a solution that was eventually implemented (independently of us) by a commercial Israeli firm. I have run eWorkshop on this server six times since then, most recently in 2003.

The Outcome

This February I revived this format and asked the community for its ideas on how to salvage social networks. And I must say, the community doesn’t age :-). Even today it came up with very good answers and suggestions that would really solve some big issues of existing socioware. But the biggest surprise for me came later in the discussion. From the concept of a static network of nodes, which serves as the universal basis of today’s socioware, we moved to a much more general and interesting approach. We proposed a concept based on a UPI (Unique Personal Identificator) that would serve the original purpose of socioware better and more generally: finding people who are similar to us in their way of thinking, work, and behaviour. This concept could be implemented as a natural extension of existing search engines and would turn web search from a universal evaluation of web content quality (PageRank) into a personalized method.

If we were able to do that, we could find ideal candidates for our real friends. Such a nice change compared with the static applications called “networks of friends”, or socioware! Well, the internet is dynamic. And this will be our next topic, if you stay tuned.
