Slyck.com
 
Improving Gnutella Stats
February 25, 2004
Thomas Mennecke
No other P2P network has as many clients as the Gnutella community. Several clients, such as LimeWire, Shareaza, Morpheus and BearShare, have been among the more popular programs. Other, more obscure clients such as aBitCool, VPutella, and HelloWorld manage to float around among the nearly 30 different Gnutella iterations.

The LimeWire team is currently working on an improved network crawler. With so many different clients on the network, ascertaining an accurate user count is complex at best. We asked LimeWire CEO Greg Bildson to explain some of the details of the improved crawler:

"I'm currently trying to motivate Gnutella developers to conform to the standards that make accurate counting possible in the crawl. This simplifies reporting the Leaves and Peer headers when presented with the "Crawler: 0.1" header - clients that don't support that should at least ensure that true ultrapeers are only added to the X-TRY-ULTRAPEER header. Both BearShare and LimeWire originally forgot to include the UserAgent in the crawl feedback but that is starting to become universally prevalent now."

LimeWire's improved crawler, available here, gives an interesting perspective into the popularity of the various Gnutella clients. From the latest network crawl, it appears that LimeWire, Shareaza and Morpheus are the top three clients, followed closely by BearShare, Gnucleus and gIFT. The popularity of LimeWire and BearShare seems to be confirmed by Download.com, as both clients have approximately 19 million downloads.

Interesting as these figures are, many have questioned the relevance of Gnutella in recent months as more advanced networks have gained ground. Even "old skool" networks like DirectConnect have managed to find themselves in greater favor among the file-sharing community.

Gnutella's lack of cohesion in the past has fractured the network and hampered progress towards a unified community. However, it seems the leading players are making a great effort to remedy this situation. As communication and cooperation improve, so may the Gnutella network.

Greg Bildson also provided us with a detailed description of the network crawler:

Slyck.com: Please explain the "Unknown Row".

Greg Bildson: The UserAgent is whatever was encountered in the UserAgent header when connecting to a host. The Unknown row is for those hosts that did not report a UserAgent. LimeWire and BearShare used to not report a UserAgent when presented with the special "Crawler: 0.1" header that the crawler uses to get access to clients. So, most of the "Unknown" results are attributable to BearShare and LimeWire. Given that LimeWire and BearShare have both just put out new versions, we are hoping that the unknown count will go down.
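The bucketing described above can be sketched in a few lines of Python. This is only an illustration, not the LimeWire crawler's code: the function names are hypothetical, the response is assumed to be a text handshake with one "Name: value" header per line, and both the "UserAgent" spelling used in this article and the "User-Agent" spelling are accepted.

```python
def parse_handshake(raw):
    """Split a handshake response into its status line and (name, value) pairs.

    Assumes CRLF-separated lines and a blank line after the last header.
    """
    lines = raw.split("\r\n")
    status_line = lines[0]
    headers = []
    for line in lines[1:]:
        if not line:
            break  # blank line marks the end of the header block
        name, _, value = line.partition(":")
        headers.append((name.strip().lower(), value.strip()))
    return status_line, headers


def user_agent_bucket(headers):
    """Return the reported client name, or "Unknown" if none was reported."""
    for name, value in headers:
        if name in ("user-agent", "useragent") and value:
            return value
    return "Unknown"


# "ExampleClient/1.0" is a made-up client name used only for illustration.
_, with_agent = parse_handshake(
    "GNUTELLA/0.6 200 OK\r\nUser-Agent: ExampleClient/1.0\r\n\r\n"
)
_, without_agent = parse_handshake("GNUTELLA/0.6 200 OK\r\n\r\n")
print(user_agent_bucket(with_agent))
print(user_agent_bucket(without_agent))
```

A host that answers the crawler without any UserAgent header lands in the "Unknown" row, which is how older LimeWire and BearShare versions would end up there.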

Slyck.com: What is the difference between "Count" and "Unique Leaves"?

Greg Bildson: The Count Column is a raw count of how many true connections the crawler made to these clients. The intent of this crawl is mainly to connect to ultrapeers but some leaves do get crawled as well. However, there are too many leaves to crawl so the crawl tries to hit the ultrapeers and get leaf information indirectly.

This leads us to the Unique Leaves column. Simply put, this is a count of the leaves under the crawled ultrapeers. For BearShare and LimeWire, these leaves will mostly be their own clients.
(Details: Clients that support the Crawler specification (http://groups.yahoo.com/group/the_gdf/files/Development/Crawler%20Compatibility.html) will output the IP:port for each leaf connection. The number of these unique IP:ports is counted for an accurate representation of the number of leaves under the crawled ultrapeers. The leaves reported in this fashion are not crawled subsequently.)
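As a rough sketch of that counting step, the snippet below deduplicates leaf addresses across several crawled ultrapeers. It assumes each Leaves header value is a comma-separated list of IP:port entries (the exact wire format is defined by the Crawler specification and is not quoted in this article), and the function name and sample addresses are hypothetical.

```python
def count_unique_leaves(leaves_by_ultrapeer):
    """Count distinct leaf IP:port entries reported by crawled ultrapeers.

    leaves_by_ultrapeer maps an ultrapeer address to the list of Leaves
    header values it returned. The comma-separated layout is an assumption.
    """
    seen = set()
    for header_values in leaves_by_ultrapeer.values():
        for value in header_values:
            for endpoint in value.split(","):
                endpoint = endpoint.strip()
                if endpoint:
                    seen.add(endpoint)  # a set keeps each IP:port only once
    return len(seen)


# Sample data uses documentation-only address ranges.
crawl = {
    "198.51.100.1:6346": ["192.0.2.10:6346,192.0.2.11:6346"],
    "198.51.100.2:6346": ["192.0.2.11:6346", "192.0.2.12:6346"],
}
print(count_unique_leaves(crawl))
```

Using a set means a leaf that shows up under more than one ultrapeer is counted once.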

Slyck.com: Could you please define the Pongs Received, Peers Headers, Leaves Headers, Xtry Headers, Xtry Ultrapeer, Ultrapeers and Rejected Connections columns?

Greg Bildson: The Pongs Received column is a count of how many pongs came back from clients of this type. For clients that support the Crawler specification, this number should be zero. Given that LimeWire still reports a large number of pongs, this shows that there are a fair number of our older clients still out there. The old style of crawling support was to allow the crawler to connect temporarily and request pongs.

The Peers column is a count of how many Peers headers were received from a host. The Peers header is a feature of the Crawler specification that reports all the IP:ports of other connected ultrapeers. There is normally only one of these headers with LimeWire clients but other clients have been known to issue multiple headers.

Similarly, the Leaves column is a count of how many Leaves headers were received from a host. They are also part of the Crawler specification.

The Xtry Headers column is a count of how many X-TRY headers were received from a client. These headers are the standard way to report non-ultrapeer client addresses during a Gnutella connection.

The Xtry Ultrapeer column is a count of how many X-TRY-ULTRAPEER headers were received from a client. These headers are the standard way to report ultrapeer client addresses during a Gnutella connection.

The Ultrapeers column is a count of how many clients reported themselves as being an ultrapeer.

The Rejected Connections column is a count of rejected-connection (503) responses received upon connecting. I noticed today that BearShare tends to report 503 Busy when given the Crawler header, whereas LimeWire returns "404 Success." Regardless, they both still report the required leaves and peers information.
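To make the column definitions concrete, here is a minimal sketch of how one crawled response might be folded into a per-client row. It is not the crawler's actual code. The `Row` class, the `record` function, the comma-separated Leaves format, and the use of an "X-Ultrapeer: True" header to detect ultrapeers are all assumptions made for illustration; header names follow the spellings used in this article.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Row:
    """One table row per reported UserAgent (field names are hypothetical)."""
    count: int = 0
    unique_leaves: set = field(default_factory=set)
    pongs_received: int = 0
    peers_headers: int = 0
    leaves_headers: int = 0
    xtry_headers: int = 0
    xtry_ultrapeer: int = 0
    ultrapeers: int = 0
    rejected: int = 0


def record(rows, status_line, headers, pongs=0):
    """Fold one connection's status line, headers and pong count into rows."""
    lowered = [(name.lower(), value) for name, value in headers]
    names = [name for name, _ in lowered]
    values = dict(lowered)
    row = rows[values.get("user-agent") or "Unknown"]
    row.count += 1
    row.pongs_received += pongs
    row.peers_headers += names.count("peers")
    row.leaves_headers += names.count("leaves")
    row.xtry_headers += names.count("x-try")
    row.xtry_ultrapeer += names.count("x-try-ultrapeer")
    if values.get("x-ultrapeer", "").lower() == "true":
        row.ultrapeers += 1  # assumed way a client declares ultrapeer status
    parts = status_line.split()
    if len(parts) >= 2 and parts[1] == "503":
        row.rejected += 1  # a 503 still counts even if headers were returned
    for name, value in lowered:
        if name == "leaves":
            row.unique_leaves.update(
                e.strip() for e in value.split(",") if e.strip()
            )
    return row


rows = defaultdict(Row)
record(rows, "GNUTELLA/0.6 503 Busy", [
    ("User-Agent", "ExampleClient/1.0"),  # made-up client name
    ("X-Ultrapeer", "True"),
    ("Peers", "198.51.100.1:6346"),
    ("Leaves", "192.0.2.10:6346,192.0.2.11:6346"),
])
record(rows, "GNUTELLA/0.6 200 OK", [], pongs=3)  # old-style client, no headers
for agent, row in rows.items():
    print(agent, row.count, len(row.unique_leaves), row.pongs_received)
```

Note that a rejected connection is still mined for its Peers and Leaves headers, matching the observation that a 503 response can carry the required information.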

The hope for this extended crawl information is to make the first two columns tell the whole story of the crawl. Given perfect reporting of the ultrapeers and their leaves, the network crawl would be totally accurate. Given that BearShare and LimeWire mainly accept their own leaves as clients, a fairly accurate vendor count should be possible for many vendors as well. If full support for the Crawler specification is implemented on the network, the crawl can also be sped up. Currently, the crawl waits 40 seconds on each connection for pongs. If most of the network has Crawler specification support, each connection would be done nearly instantaneously.
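The effect of the 40-second pong wait on total crawl time can be estimated with simple arithmetic. The sketch below assumes the crawler works through hosts in fixed-size batches of simultaneous connections. That batching model, the function name, and the example host and connection counts are hypothetical. Only the 40-second wait comes from the interview.

```python
import math


def estimated_crawl_seconds(hosts, parallel_connections, wait_per_connection=40.0):
    """Estimate crawl time when every connection waits a fixed period.

    Assumes hosts are crawled in batches of `parallel_connections`, each
    batch taking `wait_per_connection` seconds (40 s per the interview).
    """
    if hosts < 0 or parallel_connections <= 0:
        raise ValueError("hosts must be >= 0 and parallel_connections > 0")
    batches = math.ceil(hosts / parallel_connections)
    return batches * wait_per_connection


# Hypothetical crawl: 1,000 ultrapeers, 50 connections at a time.
print(estimated_crawl_seconds(1000, 50))        # with the 40-second pong wait
print(estimated_crawl_seconds(1000, 50, 1.0))   # if each reply took about 1 s
```

Under this model the total time scales directly with the per-connection wait, which is why near-instant header replies would shorten the crawl so much.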



© 2001-2008 Slyck.com