October 22, 2002

File sharing programs found inefficient

Recent findings indicate that Internet peer-to-peer file sharing programs such as Gnutella and Kazaa, though incredible popular, may not be as efficient or well-designed as initially believed. University of Chicago researchers have found that these programs, so dangerous to both the recording artists and to the record industry as a whole, are programmed using a system that does not correspond with the underlying programming of the Internet.

The University's research team, which consisted of graduate student Matei Ripeanu, computer science professor Ian Foster, and Adriana Iamnitchi, a graduate student in computer science, wanted to research peer-to-peer systems because of the natures of these systems. "There are few technical or natural systems that integrate tens of thousands of elements," Ripeanu explained. "Most of those systems that do are static, and engineers have devised strict rules to bind them. In contrast, computers participating in Gnutella network are highly dynamic, share little information, and exist without a central authority that can control them. It is thus interesting to investigate the network properties that emerge from this apparently chaotic behavior."

According to Ripeanu, Gnutella nodes, called "servants" by developers, perform tasks usually associated with both servers and clients. "They provide client-side interface through which users can issue and view search results, and server-side services by accepting queries from other nodes, checking for matches against their local data set, and responding with corresponding results," he said.

In order to become a member of the network, a servant node has to open one or multiple TCP connections with nodes that are already in the network, meaning the nodes in the Gnutella network are actually the computers that run Gnutella programs. For example, each computer that runs Morpheus (a Gnutella-compliant application since spring 2001) acts as a node in the Gnutella network. The links in the systems are simply the open TCP connections between different computers.

To collect data about a system as diverse and complicated as the Gnutella network, the research team designed a crawler that would operate as a servant in the Gnutella programming, using the Gnutella protocol to collect topology information.

The researchers concluded that Gnutella's nodes are the primary cause of its inefficient Internet interface. The nodes simply do not fit the Internet's "tumblers," Ripeanu said, explaining why the system does not correspond with the underlying Internet in which it operates. "We have performed a number of experiments that confirmed that nodes which are close in the virtual Gnutella topology are generally far apart in the Internet's physical network," he said.

Although Gnutella might try to send information or programs by the fastest and most direct route, in reality it could be sending them by the longest and most time consuming way, according to the study. In doing so, the Gnutella network wastes not only time, but also bandwidth on each individual node, making each download take longer and use more bandwidth.

Since the team's findings were released, the Gnutella network has made several important changes to its system. "Today the network is organized in two tiers: a smaller subset of 'super-nodes' route most of the traffic and follows the original Gnutella protocol; the regular nodes, which make up about ninety percent of all nodes, act only as clients to the 'super-nodes' and do not route traffic," Ripeanu said. "A more aggressive bandwidth-saving alternative, called the Query Routing Protocol, is gradually being introduced to help Gnutella's expensive bandwidth mechanisms."

Ripeanu does not personally take credit for the improvements being made to the Gnutella network but believes that the study he was part of helped bring the need for such changes to light. "Although our study did not directly suggest these two improvements, it did channel research by stressing the importance of proper traffic management," he said.

According to Ripeanu, it is improbable that other peer-to-peer programs, such as Napster and Kazaa, are organized along the same lines as Gnutella and, consequently, do not suffer from the same inefficiency. "It is most likely that their structure is different from Gnutella's: both rely on central servers to speed up searches (Napster) or manage the network from outside (Kazaa). In contrast to this Gnutella is completely decentralized," Ripeanu said.

It is, however, impossible to know for sure how differently or similarly Napster and Kazaa's peer-to-peer servers operate since it would be very difficult to research the particulars of their systems. "Both Kazaa and Napster's peer-to-peer networks employ protocols making it difficult for a third party to investigate their structure," Ripeanu explained.