I want to start off by talking about a phenomenon that many will be familiar with. You may have thought of it, though as just an apocryphal anecdote or a popcultural phenomenon.
The Small World Phenomenon
The correct way to state the Small World problem is this: it's a claim that any two people in the world or some large population can be connected by some short chain of acquaintances. Obviously, we're thinking of the world as a network. The social world consists of some very large number of individuals who are connected by relationships or social ties. These ties could be business ties, friendship, family ties, and so on. Think of dots connected by lines. But what do we mean by "short chain?" For the moment, we'll stick with an intuitive notion that short means orders of magnitude smaller than the population of individuals. So if the world has six billion people in it, then six steps in a chain is a lot less than the total population or than say, 1,000 degrees of separation.
This is not the same phenomenon as meeting someone at a cocktail party and discovering that you have a mutual friend who went to your college but works at their company and saying "gee, it's a small world." That's different. That has to do with human psychology. We tend to pick out patterns in the world and highlight them more than other phenomenon. The problem is that we don't think about all the people we have met at cocktail parties who don't have previously undiscovered connections to mutual friends. That's a problem of getting the denominator wrong. Coincidence is the likelihood of unlikely things.
Even those times when you run into someone, with whom you don't share a mutual friend, you're connected by some other chain of acquaintances that is still short. This is supposed to hold true for everybody. First, today, I'll try to convince you that the small world phenomenon is very surprising and then I'll convince you that it isn't surprising at all, but very generic. You'll end up with a more sophisticated reason for believing what you already knew! [laughter]
Applications
The problem is actually a sociological problem with a long history, but it has relevancy to a surprising range of fields. Even if you don't care about it from a sociological point of view, it may be relevant in a field closer to heart. The small world problem is a way of thinking about the structure of social networks. It's amazing how little we know about social networks, but we've made progress in the last few years. Social networks are conduits through which things like information can pass, through which influence can be exerted upon others, and as a mechanism for learning about the world. One of the strategies that we seem to have evolved for handling uncertain environments is to look at what other people are doing. If you're driving down the road and everyone starts to pull over to the left, you begin to think, "what am I missing?" Even if you can't see any obstacle, you assume that there is something ahead that you can't see, so you get in the left lane as well. The small world problem is also relevant to diffusion of innovations or ideas, for winnertakeall markets, bubbles in financial markets and architectures for redistribution networks (airlines, the Internet).
Social networks are also valuable when you use the ties to initiate action. Finding a job is an example. My boss at Columbia is finishing a book on doormen in New York City. If you decide tomorrow that you're sick of trading stocks and you want to become a doorman, you can't. You can't apply to be a doorman in New York. There's no market. Everyone who is a doorman in New York City got the job through some connection. If you don't have that connection you can't have the job. In the home of capitalism we have this system that's anathema to free markets. [laughter] This is related to using networks to solve search problems. You know the scenario: your boss has just given you a project that's due tomorrow, you know nothing about it, and you have to find someone who can help you find the right information so you can solve the problem. How do we search through our social networks?
History of the Problem
The problem has been around since at least the 1920's. An Austrian writer named Karinthy wrote a short story called Chains where he speculates about how he can connect himself to anybody in the world. He says that he can do it with five intermediariesthe same of six degrees. He just made it up. It has never been translated. One of the guys who works on these problems is a physicist at Notre Dame who actually works on these problems and he translated the story for us.
Pool and Kochen started thinking about the problem at MIT in the 1950s. Pool approached Kochen because he was interested in how people mobilized political power via connections. Could the connections be used to generate real political influence? He realized there was a math problem in it somewhere and he needed Kochen's help. They created a working paper that circulated for some twenty years before it was finally published in the journal Social Networks.
Their work attracted the attention of Stanley Milgram, a psychologist and an experimentalist. You may have heard of his infamous experiments concerning obedience to authority that used electric shocks. He was very controversial, but very good at designing experiments. He wasn't a theoretical guy, though. He wanted to invent an experiment to test Pool and Kochen's thinking. So he created the small world method.
He picked a single target person, a stockbroker who lived in Massachusetts. Then he chosesupposedly randomlythe senders. But Milgram was a sneaky guy. He knew he was right, and wasn't too concerned about the nittygritty of what the results showed. He wasn't actually dishonest, but the details are a little shifty. But I'll just tell the classic story here. He picked almost three hundred senders from Boston and Nebraska. In his first report on this experiment, he claimed 300 people from Nebraska, neglecting the Boston part of the data. [laughter] Half of the ones in Nebraska were blue chip stockholders, so naturally they had stockbrokers. Nevertheless, these people received a letter from Milgram with the crest of the university on it (even though Harvard never actually approved the study). It looked like a university study. This was prehuman subjects regulations. The challenge was that each person had to get the letter to the target person. But they couldn't send it direct unless they were personal friends with him. They had to send it to someone who was a personal friend of theirs who they believed to be closer to the target than they were. The process repeats, creating chains. Milgram tracked these chains, and many of them petered out before they reached the target. About 64 did reach the target. The average length of the chains was about six.
Random Graphs and Real Social Networks
So, what's the big deal? Pool and Kochen had already preempted the result because it was trivial to show that short paths existed between chosen pairs even in very large populations when the social ties between the pairs are allocated at random (uncorrelated). The randomness is a key term. If you have 100 friends and each of them has 100 friends, in a random setup, each individual is choosing their friends from the whole population at random. The probability that any of my 100 friends are going to pick each other is roughly 100/6,000,000,000. This setup creates a random graph. The number of people you can reach in n steps goes up by 100^n. It increases exponentially. In five steps you could reach 10 billion people. It's trivial. What's all the fuss about?
Actually, the model is not very realistic, and Pool and Kochen pointed this out. Random ties are not realistic. If you have 100 best friends and you ask them who their 100 best friends are, there is likely a great deal of overlap. In other words, social networks exhibit homophily, or the tendency of like people to group together into clusters. You're likely to be friends with people who are near to you geographically or work in the same business or industry you do. There's another phenomenon at work called triadic closure. It's a dynamic process. Imagine you move to a new city and meet a few new people. Who are you likely to meet next? Practically, the friends of the new people you have just met. This creates a triad (you, your new friend, and your new friend's friend). An intermediary is required to lower the transaction costs. We just don't walk up to strangers and try to make friends. There's a screening and introduction process.
The interesting small world problem concerns how it is possible for all this clustering to exist and still be small in a global sense. The analysis becomes extremely difficult.
After Milgram, not much happened for about 30 years. These experiments are difficult to do. It's hard to get people to participate, even if it's as simple as sending an email. Largescale network data are hard to collect. Pool even recognized the problem of finding out how many friends the average person has. People forget their list of friends all the time. The results of such studies are all over the map. You simply couldn't collect a lot of network data until very recently with the advent of the Internet. AOL, for example, is a large enough social network to use, but there are also privacy issues and much of the data is proprietary. So, the theory was too hard to do without computers in Pool's day and good experiments were too expensive to run.
Simulating a Universe of Networks
Now that computing power is readily available, we have adopted a different approach. Instead of asking whether the actual world was "small", we decided to imagine the entire universe of networks. What are the conditions under which any network can be clustered and still be considered "small" with respect to the chains of relationships? Imagine starting off with a network at one extreme that's very highly ordered (like a crystal lattice, completely geometrical) and then systematically randomize it until at the other extreme there is a completely random network. What goes on in the middle of that continuum? Can we find conditions that allow networks to be highly clustered and still small?
We started with a onedimensional lattice with the nodes arranged in a ring. Each node was connected to its immediate neighbor and also to its neighbors two nodes away. Each node had four connections. Or we could assemble the ring so that each node had any number of immediate neighbors, not only two (in the case of ten immediate neighbors, each node would have 100 connections). This model served as our definition of order.
Then we applied to this perfect lattice a random rewiring process. We went through every link and rewired it or not based on some probability. We set "p" at some number between 0 and 1. If a randomly generated number for a connection was less than "p," then we changed the connection to a new randomly designated node. If p=1, every link in the network is rewired, and that simulates a random graph. Every network that has values of 0<p<1 is mostly ordered but has a few random ties.
What were the path links or typical separation between nodes in the network, and what was the probability that two of the nodes will "know" each otherhow redundant was each node's neighborhood? That's what we wanted to investigate. First, what happens at the extremes of the model? When p=0, the path lengths are long. Imagine 1,000 people in a ring where each person is connected to their four nearest neighbors. The only way to get a message to someone on the opposite side of the ring is to tell someone on your right or left and have them pass it on. There are about 250 steps around this particular ring. Two hundred and fifty degrees of separation in a network of only 1,000 people is pretty large. With billions of people, that would mean millions of degrees of separation. But the clustering is very high. The people next to you know one another and this degree of redundancy seems very familiar to us.
At the random extreme, there's no clustering but the path lengths are short. The short path length feels familiarwe expect that in the real worldbut the lack of clustering is unfamiliar to us. There's no redundancy. It would be a reasonable intuition, therefore, that networks could either have high redundancy and high separation or low redundancy and low separation. The same intuition would say that the small world phenomenon can't existthere can't be networks with high redundancy and low separation. But it turns out that for values of 0<p<1, path length and clustering change at different ratestheir behavior between p=0 and p=1 is nonlinear. If we plot length and clustering, when p=0.01 (a 1% chance of rewiring of each node), the length drops almost all the way to its asymptotic value, but the clustering has changed very little. This region around p=0.01 is the realm of small world networks.
How does this work? The intuition is quite simple. The behavior of the path length variable is governed by the number of shortcuts, or the number of random links. If you rewire just five lengths in a perfect lattice, it cuts the average path length of any pair of nodes in the network by half, regardless of how many nodes there are. It doesn't make any difference. That's quite surprising. The length is governed by p*N but the clustering is governed by the fraction of random shortcuts.
Prevalence of Small World Networks
When you have a large population, then a small fraction of shortcuts will have a very big effect on the length but leave the clustering the same. This is a very simple model. But the simplicity is a very powerful feature. There isn't anything in the model that has to do with real social networks. There was nothing based on an understanding of social process. But it does show that very little is required to generate a small world network. Any mechanism that generates clustering along with some tiny element of randomness will lead to a small world network.
We should expect to see this phenomenon in all kinds of networksit should be extremely widespread. So we looked at real networks to see if we could find evidence of this widespread nature.
We started with movie actors. There are about halfamillion actors and actresses and 150,000 movies recorded in an online movie database. It can be thought of as a proxy for a social network. If two actors acted in a movie together, they're considered connected. The clustering and lengths can be computed and they correspond very well to the prediction of a small world network. As a side note, one of the things that made this a pop culture phenomenon is what's called the Kevin Bacon game, which a few frat brothers at William & Mary college invented several years ago. They must have known a lot about movies because it's hard to play without a computer. You start by picking two actors and then see what chains are made to connect them. They concluded that any actor could be connected to Kevin Bacon in four steps or less. Suddenly he was the secret fulcrum of the movie universe. They could have picked anybody, and the same phenomenon exists. Kevin Bacon is not nearly the most connected actor in Hollywoodthat happens to be Christopher Lee. The difference in connectedness between the top thousand actors is negligible.
We also looked at the power grid of the Western United States, the neural network of a worm, the World Wide Web, ownership networks of German firms, metabolic networks, collaboration networks of scientists and boards of directors of Fortune 1000 companies. All these networks turn out to be small world networks: highly clustered with short pathways.
Solving a Different Problem: Global vs. Local Knowledge
What does this mean? Everything turns out to be a small world network. So what's interesting? Aren't we back to where we started? Obviously all these networks are not the same. We started looking more carefully at networks to see if there were different kinds of small world networks. In 1999, Jon Kleinberg identified the "Algorithmic SmallWorld Problem." His work indicated that the problem we had been working on was not the same problem that Milgram worked on. Kleinberg said that currently, small world networks are defined as those with short paths between any two individuals. It's possible to show that in a computer because computers have global knowledge of their networks, but Milgram didn't have any global knowledge of the network to be able to calculate all pathways between all points. His subjects just had some information about a target and they knew who their friends were. Beyond the circle of friends, they didn't know much about the world. All experience is local. You just know who you know. Occasionally you know something about someone you haven't met yetthat's sometimes part of triadic closure or prelude to an introduction. We have pretty good knowledge about our friends. We have hazier knowledge of the next ring out, and beyond thatbeyond two degreesit's a world of strangers. If anybody beyond two degrees is impossibly far away, then six is really a big number although it sounds small. Try calling up a friend of a friend of a friend of a friend of a friend and asking him for a job!
Nevertheless, Milgram's subjects were able to solve the problem with only local information. And they weren't rocket scientists. The algorithmic version of the small world problem is not satisfied, therefore, by the networks I just described. Here's why it's true. When we picked up a node and rewired it, we moved the other end at random. That uniform probability is a very special assumption. There is no relationship between your random ties and the lattice ties. If a node "a" is trying to use random links to get to "b", the first link might actually do a great job in getting closer to the target. But successive random shortcuts are just as likely to take you farther from the target as they are to take you closer. They don't really help much more than if you did not have the random links at all. Without perfect knowledge of the network, it's not possible to know if a random connection will get you closer to the target or not. It's the difference between broadcast search and directed search. Viruses that email themselves to everyone in your address book are like broadcast searches. Those viruses find short paths to everyone only because they swamped the entire network. But that can't be done in a practical way without saturating the channels. But what Milgram's subjects did was get the message to the target in only six steps and they didn't use a broadcast strategy. A particular subject didn't send the message to multiple recipients.
Kleinberg created a different version of the lattice. There were still local contacts and random contacts. But the random contacts aren't added uniformly at random. He used a formula that said the probability of connecting one node to any other node at some distance "r" goes down by some exponent gammait decreases with distance. When gamma is zero, the model is uniform random probability. When gamma is very large, the only people who you can connect to are the ones who are very close to you. At that extreme of the model, all contacts are local. There will be no short paths. Kleinberg investigated the entire range of gamma. He varied how correlated the randomness is to the underlying structure. In a geographic social structure, you're more likely to meet people who live close to you and not people who live far away (depending on transportation systems). As transport and information systems come into play, gamma decreases.
Across the entire range of gamma there are two regimes that divide at gamma=2, the critical value. This is also the dimension of the underlying lattice, so it's not a coincidence. In the "a" regime, where gamma is small, there are short paths but the random ties are not correlated with the underlying structure, so no one knows how to use the short paths and the chain lengths end up being long. At the other extreme, you can navigate well because everything is highly correlated with the social structure, but the short paths are missing. Right at the critical value where gamma=2, you get the tradeoff you're looking for: short paths exist between any two nodes and they can be found. At gamma=2, the curve cusps sharply. It's a very special state. This means that the world has to be a very particular way in order for networks to be searchablefor the Milgram results to work and for the world to be a small world. I'll take issue with this special state requirement later.
ScaleFree Distributions: the Power Law of Connected Nodes
There was another approach that came out soon after and took advantage of a different aspect of networks. Many real networks have a scalefree distribution. Plot the distribution of how many friends any given person has. The probability is "p" and "k" is the number of friends. The resulting curve is a normal distribution. But it turns out that a number of networks have a very different distribution. Many networks are composed of many nodes that have a few links coming in and out of them and a very few nodes that have a tremendous number of links. And it varies as a power law. The distribution of links on the World Wide Web varies as a power law. This is called a scalefree distribution. The consequence is a small fraction of very connected nodes. Some people know a lot more people than average. It's like the airline hubandspoke system. If you're traveling by air from a small town to another small town, you first fly to a hub. You jump along the hubs and then hop off the hub to get to your final destination. That kind of network is searchable if you can seek out the hubs. There was a book called The Tipping Point that claimed that some people are very connected and that if you want to get a message to someone, you need to get it to these connectors and they'll handle it.
How Do Real Social Networks Work?
I don't think this is how social networks work. There are some problems. First, consider the Kleinberg model. It assumes that social networks are based on geometrical lattices. There's no evidence of that. Now, he wasn't trying to make a realistic model of social networks, and if you were building a network from scratch, his model would be a good one to follow. But a social network doesn't come with a condition that the world has to be "just so" for the network to be searchable. There is no organizing mechanism to drive gamma to the sweet spot. We need a model that is much more forgiving.
Second, social networks are not likely to be scalefree. If you think about it, the limitation on the number of friends you can have is not a function of the size of the system. You only have so much time, energy and effort to devote to other individuals. The limitation is inherent then, and it's individual and not dependent on a network. Individuals are not websites. Hubs don't really work for social networks.
At this juncture, some very old ideas in sociology are applicable. Social structures are built by membership in groups. A group is any kind of context for interaction: job, hobby, where you grew up, where you travel, and so on. We can categorize groups. Some are more similar to one another than to others. Imagine a hierarchical organization. Teams are part of departments, which are part of divisions, which are part of firms, which are part of industries. The distance between two individuals in a hierarchy is the lowest common ancestor in the hierarchy (team, department, division and so on). We think about ourselves in terms of the groups to which we belong. We catalog people according to the groups to which they belong. and yet, hierarchy is only a cognitive device that defines similarities and differences between individuals. They do not actually comprise the network, but they do influence the way networks are formed. A network is generated as a function of social distance. There is some social structure underlying the network. The probability of two people being connected is expressed in terms of this structure.
There are some other important features to the social structure model. First, the groups can be based on anything. What groups determine how we interact socially? They all do. Our relationships have multiple dimensionsmultiple groups. Each one corresponds to its own hierarchy.
The final important point is that social distance is the minimum distance across all of the dimensionstaking all of our memberships into account. Similarities count more than differences. If you and I are connected within one dimension (like the workplace), that makes us close. If you work with someone who comes from a different country and practices a different religion, those differences are transcended by the fact that you work together every day in the same dimension (the company or team or department). That one dimension is enough to make you close. This claim has a big result. It violates the principle of triangle inequality. You can have friends in America that you work with and friends in Australia who you grew up with. Those two sets of friends exist in two different dimensions. They each think that they're close to you but distant from each other, when in reality, they're only two degrees separated from each other through you. This is at the crux of the whole problem. This distance we carry around in our heads about how close or distant we are to people violates the triangle inequality.
The Solution to the Problem of the Searchable Social Network
We have two kinds of information about the world. First, we have this hierarchical information we carry in our heads about how close or far we are to different people. We think of this as a distance but it violates the triangle inequality. Second, we have the network. It has perfectly well defined distances but only local information about paths. Neither model can tell us how to get information to a distant target.
But when these two kinds of information work together, we can resolve the problem. If we look at the world as multiple hierarchies and use the clustering (homophily) parameter, networks turn out to be searchable in a very wide swath of parameters. In other words, networks don't have to be formed around the very special Kleinberg condition. There can be a number of different kinds of hierarchies and as long as people have some tendency to associate with others like themselves (homophily), then the network is searchable.
Here are some consequences. In a world of one social dimension (only one network that people can belong to), the Kleinberg condition is required for a network to be searchable. But in a world of multiple social dimensions, homophilous networks work better. Or, we may conclude that in a homophilous world, multiple social dimensions are essential for searchability, but the world is highly forgiving. Searchability is a generic property of social networks.
We've been running a rerun of the Milgram experiment over the web for the last year. As a result, we have 48,000 senders and 19 targets. Participants come from 157 countries. At the end, when we correct for the attrition rate, the median chain length is six.
In conclusion, here are some notes. The small world problem is a particularly clean example of social search, or the ability to locate remote targets using local ties. Social search is a critical aspect of problem solving when the environment is uncertain or ambiguous or when a central database or directory is absent. A technological example is peertopeer networks. By extracting the essence of social search in human organizations, we may be able to better design protocols for smarter technological networks.
