BitTorrent Hash Value Question

Postby SlyckTom » Sat Apr 18, 2009 12:45 pm

Ok...another question! This one is about hash values and torrents.

A torrent file has a hash value, and each segment of the file being traded in a swarm has a hash value. The tracker examines the torrent file's hash value, and finds a swarm matching that hash. Before the client accepts a piece from the swarm, the value of that particular piece is verified by the client.

My question is: when I look at a torrent in an editor, I only see one hash value (the .torrent file's hash value, I presume). How is the hash value of each segment calculated? From what I've read, I think the value I see in a torrent is the mathematical total of every piece, and the value of every piece is calculated in a process I can't see... but I'm not 100% sure.
Follow us on Twitter @SlyckDotCom
Join our Facebook Fan page
SlyckTom
 
Posts: 5713
Joined: Fri Jul 26, 2002 7:22 pm
Location: New York City

Re: BitTorrent Hash Value Question

Postby TorrentMama » Sat Apr 18, 2009 4:17 pm

Lionel Hutz, court-appointed attorney. I'll be defending you on the charge of... Murder One! Wow! Even if I lose, I'll be famous!
TorrentMama
 
Posts: 2827
Joined: Wed Aug 16, 2006 3:42 pm

Re: BitTorrent Hash Value Question

Postby SlyckTom » Sat Apr 18, 2009 5:56 pm

Thank you TM, some good info there. I think this is the answer to my question: (from the BT protocol specification)

pieces maps to a string whose length is a multiple of 20. It is to be subdivided into strings of length 20, each of which is the SHA1 hash of the piece at the corresponding index.
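As a rough illustration of the spec sentence above, here is a minimal Python sketch of cutting a "pieces" string into 20-byte SHA-1 digests, one per piece. The helper name split_piece_hashes and the four made-up pieces are assumptions for the example, not anything taken from a real client or torrent.

```python
import hashlib

PIECE_HASH_LEN = 20  # a SHA-1 digest is always 20 bytes


def split_piece_hashes(pieces: bytes) -> list[bytes]:
    """Split the raw "pieces" string into one 20-byte SHA-1 digest per piece."""
    if len(pieces) % PIECE_HASH_LEN != 0:
        raise ValueError("pieces length must be a multiple of 20")
    return [
        pieces[i:i + PIECE_HASH_LEN]
        for i in range(0, len(pieces), PIECE_HASH_LEN)
    ]


# Hypothetical data: four tiny made-up pieces standing in for a real file.
example_pieces = [b"piece-0", b"piece-1", b"piece-2", b"piece-3"]

# What a torrent creator would store under "pieces": the digests back to back.
pieces_field = b"".join(hashlib.sha1(p).digest() for p in example_pieces)

hashes = split_piece_hashes(pieces_field)
for index, digest in enumerate(hashes):
    # The digest at position N belongs to piece N.
    print(index, digest.hex())
```

The digests are raw binary, which is probably why they look like noise rather than a readable hash when a .torrent is opened in a text editor.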


I'm not great at reading protocol specs, but this is how I understand it. Let's say you have "A great Movie.avi", and during the .torrent file creation process, the .torrent hash value is, for simplicity's sake: qwerty 123456 yuiopa 789012. The file is segmented into 4 pieces, and each of those pieces complies with the protocol.

Am I correct when I say that piece 1 hash = qwerty, piece 2 = 123456, piece 3 = yuiopa, piece 4 = 789012?

Thanks!

Re: BitTorrent Hash Value Question

Postby IneptVagrant » Sun Apr 19, 2009 9:15 am

You've described the attribute "pieces" in the metainfo file. I'll discuss it with regard to the two networks I'm familiar with, BitTorrent and eMule.

In eMule, you take this description of "pieces" and apply the same hash transform to it, generating a hash for the file.

In BitTorrent, you take the attribute "info" from the metainfo file (of which "pieces" is a sub-attribute) and hash that.

The difference between the two strategies is largely that in eMule, different file names can have the same hash. In a torrent, the file names and directory structure also have to match; only the announce URL(s) can be different.

the .torrent hash value is, for simplicity's sake: qwerty 123456 yuiopa 789012
For a torrent, you take a hash of that string, but that string is not all the data that is needed; see the other attributes of "info".

In eMule, a hash of that string would be the hash of the file. Also note that an eMule client doesn't know the "pieces" attribute when it begins; it obtains it from other clients with completed pieces as it runs.

**

Some implications.

Knowing a torrent hash tells you nothing about the torrent itself, because you can't reverse the process. The hash by itself does not even tell you if you have the right torrent. You must have the "info" attribute in its entirety from a metainfo file to confirm the hash. In eMule you must also know the file length (not to verify the hash, but only to be reasonably sure you are targeting the same file as other users with the same hash and file length).

Knowing a torrent hash in no way validates the data. You must know the "pieces" attribute from "info" to validate the file data: you validate each piece individually, and finally validate "pieces" itself via the file hash.

Knowing a piece hash and computing it from the file data does not guarantee a piece is not flawed; see hash collisions (though you can be reasonably sure, given the size of the hash and knowing the length of a piece has to be appropriate).
IneptVagrant
 
Posts: 1247
Joined: Tue Nov 15, 2005 5:07 am
Location: close the world . . . . . . . . . . . . . . txEn eht nepO

Re: BitTorrent Hash Value Question

Postby multivariable » Sun Apr 19, 2009 9:38 am

sweeeeet
multivariable
 
Posts: 27959
Joined: Sat Jan 21, 2006 11:28 am

Re: BitTorrent Hash Value Question

Postby SlyckTom » Tue Apr 21, 2009 8:55 am

Interesting...OK! Can't wait to get started on the eDonkey2000 section!

Your explanation is excellent...it's cleared many things up, and made me a bit more confused as well ;)

I'm really curious where the segment hash values come from. Again, let's say you have "A great movie.avi" - are the hash values of the segments derived from the actual movie file, or the torrent file? Here are the two competing thoughts in my mind: (although both could be wrong ;) )

1) You take the movie file, and derive the hash value of that movie. The movie is then chopped up, and a hash value for each segment is derived. The hash value of each of those pieces is then incorporated into the torrent file, and the hash value of the torrent is then derived (including the other info data)....

-OR-

2) You take the movie file, and start the torrent creation process. A hash value of the torrent is then derived. The hash values of the segments are then derived from the hash value of the torrent file.

-----

edit: Ok...I think I get it. You have the info dictionary, which has specific info on the torrent (piece length, number of pieces, and hash values of the segments). You then hash this information, which gives the hash of the torrent. The tracker interprets this hash value, and finds a swarm sharing a file with an identical hash.

Now, you say it can't be reversed, so I imagine that means the tracker doesn't know what information is being shared. Only the client can verify the hash data - correct?

Re: BitTorrent Hash Value Question

Postby IneptVagrant » Wed Apr 22, 2009 12:22 pm

1) is close, but drop the first sentence. A hash for the file(s) themselves isn't used in BT; the representative hash for the file is the hash of the .torrent "info" section. In eMule the representative hash is the hash of the concatenated pieces' hashes.

--

You divide a complete file into pieces, and hash each piece. From there you can build the "info" section in BT, and finally hash the "info" section to get a hash for the torrent.
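The order of operations just described can be sketched in Python roughly as follows. This is a simplified single-file illustration, not how any particular torrent creator is written: the bencode helper is a minimal hand-rolled encoder, build_info is a hypothetical name, and the 100-byte stand-in "file" and 32-byte piece length are made up so the example stays small.

```python
import hashlib


def bencode(value) -> bytes:
    """Minimal bencoder for ints, strings, byte strings, lists and dicts."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def build_info(data: bytes, name: str, piece_length: int) -> dict:
    """Step 1: hash every piece. Step 2: collect them into the info section."""
    pieces = b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    return {
        "name": name,
        "length": len(data),
        "piece length": piece_length,
        "pieces": pieces,
    }


data = b"x" * 100  # stand-in for the movie file's contents
info = build_info(data, "A great Movie.avi", piece_length=32)

# Step 3: the torrent's hash is the SHA-1 of the bencoded info section.
info_hash = hashlib.sha1(bencode(info)).hexdigest()
print(info_hash)

# Same data under a different name: identical piece hashes, different torrent hash.
renamed = build_info(data, "Another name.avi", piece_length=32)
renamed_hash = hashlib.sha1(bencode(renamed)).hexdigest()
print(renamed["pieces"] == info["pieces"], renamed_hash == info_hash)
```

Nothing here ever hashes the whole file in one go; the only hashes are the per-piece ones and the one taken over the info section.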

--

A tracker doesn't "interpret" a hash. It's just a number, like a serial number used to ID related transactions. A tracker knows other people looking for the same product, and helps put them in touch with each other.
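To make the "serial number" idea concrete, here is a hedged Python sketch of how a client might hand the hash to a tracker as part of an announce URL. The tracker address, the peer ID and the function name announce_url are placeholders invented for the example, and a real announce carries more parameters than the three shown.

```python
import hashlib
from urllib.parse import quote


def announce_url(tracker: str, info_hash: bytes, peer_id: bytes, port: int) -> str:
    """Build a simplified announce URL; the hash is passed as an opaque ID."""
    if len(info_hash) != 20:
        raise ValueError("info_hash must be the raw 20-byte SHA-1 digest")
    params = [
        # The raw digest bytes are percent-encoded; the tracker only compares them.
        ("info_hash", quote(info_hash, safe="")),
        ("peer_id", quote(peer_id, safe="")),
        ("port", str(port)),
    ]
    return tracker + "?" + "&".join(f"{k}={v}" for k, v in params)


# Stand-in digest; a real client would hash the bencoded info section.
info_hash = hashlib.sha1(b"stand-in for a bencoded info dictionary").digest()

url = announce_url(
    "http://tracker.example/announce",  # hypothetical tracker address
    info_hash,
    b"-XX0001-abcdefghijkl",            # made-up 20-byte peer ID
    6881,
)
print(url)
```

The tracker never sees file names or piece hashes in this request, only the 20-byte identifier.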

Only a Seed (a specialized form of client, one who has the complete file) can verify a hash.

If you only looked at a hash, then two products could, by coincidence or a difference in market regions, end up with the same serial number, or hash. But upon receipt they would be obviously different. That's why the piece hash is also used, along with other information, to verify. It's highly unlikely that two products with the same serial number but produced in different markets have all the same pieces, or even a similar set of pieces.

Re: BitTorrent Hash Value Question

Postby SlyckTom » Wed Apr 22, 2009 12:32 pm

Thank you. After reading for the last few days, it's becoming much clearer. I've included below a small segment from the book that describes torrent files and hashing. Now...the book isn't super-technical - it's designed so everyone can read and understand it...so some information is presented in layman's terms.

Torrent Files
In order for your BitTorrent client to do something useful, you need a .torrent file. A .torrent file is a small file that initiates communication between the client and the BitTorrent tracker (see figure 2-02). The .torrent also contains structured information to help the client coordinate with the tracker to locate the file desired. This data includes the tracker address and information that describes the torrent. The descriptive information (or info attribute) includes the file name, file length, piece length and the hash value of each of the pieces being shared. The hash value helps protect the end user by verifying that the pieces being traded in the swarm belong to the same file the end user wants. Every file (in this case the pieces) has a unique signature that can be expressed as an alphanumeric string of text called a hash value, or hash code. The info attribute expressed in the .torrent is hashed again, yielding the info hash (or the overall hash value) of the .torrent file.

When a BitTorrent client connects to a tracker, the tracker receives the .torrent’s info hash value. It then matches up this hash value to the appropriate swarm. Once the proper swarm is found, the tracker provides a list of peers to the client, which then connects to the swarm. Before the client receives a piece (or pieces), the data is error checked by examining the hash value of the piece it receives against the hash value documented in the .torrent file. Do the hash values of the pieces match up? Perhaps they do, but the hash values of the pieces aren’t enough to validate the data. The incoming data must also must also match the info attribute of the torrent – such as the size of the piece, file name, etc. For example, if the piece length doesn’t match up, but the hash value is correct, the segment will still be rejected. If the information is verified, the transfer can begin. As long as the .torrent data known by the client matches the value traded in the swarm, the client will accept information from other peers.

Re: BitTorrent Hash Value Question

Postby IneptVagrant » Thu Apr 23, 2009 7:21 am

Besides the typo "must also must also match", I'm not sure how correct you want to be. But to be nitpicky:

"Unique signature" is not true. It is a signature, and collectively with the other information it is unique; even then we can't really say it's unique, only that it's probably unique. But that's exactitude, not layman's terms. (Brute-forcing a set of data to fit a given hash is in the same realm of difficulty as brute-forcing an encryption key.)

Length and other info are verified before the hash, because the hash is expensive to verify. Also, you cannot verify the hash until the transfer has completed (you confused perspective there, seed vs. leech), hence rejected parts.

You may want to say something along the lines of: each part is verified as it completes. If you know all the parts match, then the file hash must also match. (eMule doesn't make that assumption, because piece hashes are received from the swarm during operation and hence can't be trusted, which is why it does a final file check before marking a file as complete.)

Re: BitTorrent Hash Value Question

Postby SlyckTom » Thu Apr 23, 2009 10:15 am

I see...I thought the hash was verified before receiving the piece.

I can definitely work that into the section, thank you again!

And to answer your question, I definitely want to be as accurate as possible - yet understandable to newcomers.

When a BitTorrent client connects to a tracker, the tracker receives the .torrent’s info hash value. It then matches up this hash value to the appropriate swarm. Once the proper swarm is found, the tracker provides a list of peers to the client, which then connects to the swarm. Each piece is verified as it completes. Do the info attributes (piece length, file name, etc.) of the pieces match up? Perhaps they do, but the hash value must also match. For example, if the info attributes match up, but the hash value of the piece does not, the segment will be rejected. If the information is verified, the data is accepted. As long as the .torrent data known by the client matches the value traded in the swarm, the client will accept information from other peers.

Re: BitTorrent Hash Value Question

Postby IneptVagrant » Thu Apr 23, 2009 2:12 pm

I see...I thought the hash was verified before receiving the piece.
The hash you start with as a leech is assumed to be true. You never actually verify the hash is correct, you only verify the data received by generating its hash and comparing to what you have. If the hash you start with wasn't correct, you'd never be able to join the swarm, since obviously everyone else is working with a different hash.
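In code terms, "generating its hash and comparing to what you have" might look like the Python sketch below. It assumes a single-file torrent whose "pieces" string, piece length and total length are already known from the .torrent; the function name piece_is_valid and the 10-byte example file are made up for illustration. Checking the cheap length first and hashing second is one possible ordering, not a claim about any specific client.

```python
import hashlib

HASH_LEN = 20  # size of one SHA-1 digest inside the "pieces" string


def piece_is_valid(index: int, data: bytes, pieces_field: bytes,
                   piece_length: int, total_length: int) -> bool:
    """Check downloaded piece data against the hash trusted from the .torrent."""
    num_pieces = len(pieces_field) // HASH_LEN
    if not 0 <= index < num_pieces:
        return False

    # Cheap check first: the last piece may be shorter than the others.
    if index == num_pieces - 1:
        expected_len = total_length - piece_length * (num_pieces - 1)
    else:
        expected_len = piece_length
    if len(data) != expected_len:
        return False

    # Expensive check: hash what was received and compare with the stored digest.
    expected_hash = pieces_field[index * HASH_LEN:(index + 1) * HASH_LEN]
    return hashlib.sha1(data).digest() == expected_hash


# Hypothetical 10-byte "file" with a 4-byte piece length: abcd, efgh, ij.
original = b"abcdefghij"
piece_length = 4
pieces_field = b"".join(
    hashlib.sha1(original[i:i + piece_length]).digest()
    for i in range(0, len(original), piece_length)
)

print(piece_is_valid(0, b"abcd", pieces_field, piece_length, len(original)))
print(piece_is_valid(0, b"abcX", pieces_field, piece_length, len(original)))
print(piece_is_valid(2, b"ij", pieces_field, piece_length, len(original)))
```

The stored digest itself is never checked here; only the received data is, which matches the point that the hash you start with is simply assumed to be true.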

What you wrote is good though, as it doesn't say "verify the hash" anywhere.

the tracker provides a list of peers to the client
Peer acquisition is its own can of worms. In BT there are 4 ways to acquire peers: DHT, peer exchange, local exchange, and the tracker. If you are on the DHT network, you don't have to be connected to a tracker. That is also why DHT is often disabled: to force clients onto at least one tracker so it can record statistics. Peer exchange (in eMule this is where most peers come from) and local exchange can only occur after you have met peers.

You don't have to have trackers to use BT, but trackers make it much faster during the run-up phase (and let you keep statistics).

Re: BitTorrent Hash Value Question

Postby SlyckTom » Thu Apr 23, 2009 6:42 pm

Thanks IV! Much appreciated...

Ah...I forgot about local exchange! But that's unique to uTorrent - at least natively...Vuze has a plug-in, I'm sure.


© 2001-2008 Slyck.com