The Newsgroups Intro
The Newsgroups, also known as Usenet, are a special part of the internet that predates the World Wide Web. Back when the internet was still young, special interest groups shared information and kept in touch using a bulletin-board-style system. This system was designed to take advantage of the internet in a way an old BBS couldn't: each location had a machine (a news server) that stored all the messages from the newsgroups its users wanted. Periodically, these servers connect to each other and exchange any messages that either one is missing. In this manner, a message posted by any user eventually gets distributed to every server that carries that newsgroup.
A short time later, users of certain newsgroups realized that this system would be ideal for sharing files with each other. However, the newsgroups were not designed to transfer binary files; they can only carry text. For a news server to act as a source of binary files, that major hurdle had to be overcome. There was no way around the text-only policy of a news server, so instead of trying to defeat the problem, clever people determined to build a file-sharing world worked with it. The solution: text-encoded binary files.
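The store-and-exchange idea above can be sketched as a toy model. This is a simplified illustration, not how real NNTP peering is implemented: the `NewsServer` class and `exchange` function are hypothetical names, and each server is assumed to keep articles keyed by Message-ID and to accept only the newsgroups it carries.

```python
from dataclasses import dataclass, field


@dataclass
class NewsServer:
    """Toy news server: the groups it carries and the articles it stores (hypothetical model)."""
    name: str
    groups: set[str]
    # Message-ID -> (newsgroup, article body)
    articles: dict[str, tuple[str, str]] = field(default_factory=dict)

    def post(self, message_id: str, group: str, body: str) -> None:
        if group in self.groups:
            self.articles[message_id] = (group, body)


def exchange(a: NewsServer, b: NewsServer) -> None:
    """One peering session: each server takes the articles it is missing,
    but only for newsgroups it carries."""
    # Snapshot both inventories first so articles copied during this
    # session are not re-examined in the second loop.
    a_ids, b_ids = set(a.articles), set(b.articles)
    for mid in a_ids - b_ids:
        group, body = a.articles[mid]
        if group in b.groups:
            b.articles[mid] = (group, body)
    for mid in b_ids - a_ids:
        group, body = b.articles[mid]
        if group in a.groups:
            a.articles[mid] = (group, body)


if __name__ == "__main__":
    east = NewsServer("east", {"alt.binaries.test"})
    hub = NewsServer("hub", {"alt.binaries.test", "comp.lang.c"})
    west = NewsServer("west", {"alt.binaries.test"})

    east.post("<1@east.example>", "alt.binaries.test", "hello from east")
    west.post("<2@west.example>", "alt.binaries.test", "hello from west")

    # Two rounds of peering along the chain east <-> hub <-> west
    for _ in range(2):
        exchange(east, hub)
        exchange(hub, west)

    print(sorted(west.articles))  # ['<1@east.example>', '<2@west.example>']
```

A post made on one server reaches servers it never talks to directly, one peering session at a time, which is the "eventually" in the description above.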
Overcoming News Server Limitations
A text-encoded binary file is information that has been encoded, or transformed, from binary format into plain text. Through this method, any file can then be uploaded to a news server. You might think to yourself, "Great, let's encode this ISO to text and upload that sucka to the newsgroups!" Unfortunately, it's not that easy. There's still another problem to overcome.
A news server limits the size of an article (news post) it will accept, and that limit is usually rather small. Each server is different, but generally a newsgroup article can be no longer than 5,000 lines of text. A 5 GB file converted to text would run to tens of millions, if not hundreds of millions, of lines. So how do uploaders get large files onto the newsgroups? The sly way.
The process of making a binary file available on the newsgroups is complex. Most people will never upload a file to the newsgroups, but the process is important to know and provides helpful insight into how the newsgroups work. Let's start off with a 700 Megabyte (MB) ISO file, which equates to around 15 million lines of text. The mission: upload this file to the alt.binaries.test newsgroup.
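To make "text-encoded binary" concrete, here is a minimal sketch using uuencode, one of the classic binary-to-text encodings, via Python's standard `binascii` module. The article doesn't say which encoding is used; uuencode is an assumption chosen because it ships with Python (yEnc, the common modern choice, is not in the standard library). The helper names are hypothetical.

```python
import binascii

BYTES_PER_LINE = 45  # uuencode carries at most 45 input bytes per text line


def encode_to_text_lines(data: bytes) -> list[str]:
    """Turn arbitrary binary data into lines of printable ASCII (uuencode body lines)."""
    lines = []
    for offset in range(0, len(data), BYTES_PER_LINE):
        chunk = data[offset:offset + BYTES_PER_LINE]
        # b2a_uu returns one encoded line, terminated by b"\n"
        lines.append(binascii.b2a_uu(chunk).decode("ascii").rstrip("\n"))
    return lines


def decode_text_lines(lines: list[str]) -> bytes:
    """Reverse the process: turn the text lines back into the original bytes."""
    return b"".join(binascii.a2b_uu(line) for line in lines)


if __name__ == "__main__":
    original = bytes(range(256))  # every byte value, including non-printable ones
    text_lines = encode_to_text_lines(original)
    print(len(text_lines), "lines")  # 256 bytes / 45 per line -> 6 lines
    print(text_lines[0])
    assert decode_text_lines(text_lines) == original
```

Every output character is printable ASCII, so a text-only news server can store it, and the downloader's software reverses the encoding byte for byte.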
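The line counts quoted above can be checked with a little arithmetic. This sketch assumes uuencode-style lines carrying 45 bytes each (the article doesn't name the encoding, but its figures line up with this assumption) and ignores headers; `encoded_line_count` is a hypothetical helper.

```python
BYTES_PER_LINE = 45  # assumption: uuencode-style lines carrying 45 bytes each
MB = 1024 * 1024
GB = 1024 * MB


def encoded_line_count(size_in_bytes: int, bytes_per_line: int = BYTES_PER_LINE) -> int:
    """Text lines needed to carry size_in_bytes of binary data (headers ignored)."""
    # Integer ceiling division: a partial final line still counts as a line
    return (size_in_bytes + bytes_per_line - 1) // bytes_per_line


if __name__ == "__main__":
    for label, size in [("700 MB ISO", 700 * MB), ("15 MB part", 15 * MB), ("5 GB file", 5 * GB)]:
        print(f"{label}: {encoded_line_count(size):,} lines")
```

Under this assumption the 700 MB ISO comes to roughly 16.3 million lines, a 15 MB part to 349,526 lines, and a 5 GB file to about 119 million lines, in line with the ballpark figures in the text. A denser encoding with longer lines would need fewer.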
How The Magic Happens
The first step in the process is to chop the 700 MB file into multiple parts. Once the 700 MB file has been split up, the resulting set of parts is known as an archive. The archiving is done with a utility such as WinRAR or HJ-Split. As a general standard for newsgroup posting, a 700 MB file is broken up into 15 MB chunks, as shown in Figure 1. There are no set rules for this; it's just a generalized approach that has proven to work.
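The splitting step can be sketched as plain byte-range copying, the way an HJ-Split-style splitter works (no compression). The `.001`, `.002`, ... naming mirrors what such splitters commonly produce but should be treated as an assumption, and `part_sizes`, `split_file`, and `join_parts` are hypothetical helpers, not any tool's actual API.

```python
import os
import tempfile

MB = 1024 * 1024


def part_sizes(total_size: int, part_size: int = 15 * MB) -> list[int]:
    """Sizes of the pieces a file of total_size bytes is cut into."""
    full_parts, remainder = divmod(total_size, part_size)
    return [part_size] * full_parts + ([remainder] if remainder else [])


def split_file(path: str, part_size: int = 15 * MB) -> list[str]:
    """Copy path into numbered, uncompressed pieces: path.001, path.002, ..."""
    part_paths = []
    with open(path, "rb") as src:
        index = 1
        while chunk := src.read(part_size):
            part_path = f"{path}.{index:03d}"
            with open(part_path, "wb") as dst:
                dst.write(chunk)
            part_paths.append(part_path)
            index += 1
    return part_paths


def join_parts(part_paths: list[str], out_path: str) -> None:
    """Reassemble the pieces, in order, into a single file."""
    with open(out_path, "wb") as dst:
        for part_path in part_paths:
            with open(part_path, "rb") as src:
                dst.write(src.read())


if __name__ == "__main__":
    sizes = part_sizes(700 * MB)
    print(len(sizes), "parts; last part is", sizes[-1] // MB, "MB")  # 47 parts; last part is 10 MB

    # Small round trip on a temporary file (tiny part size so it runs instantly)
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "demo.bin")
        data = os.urandom(1000)
        with open(original, "wb") as f:
            f.write(data)
        parts = split_file(original, part_size=300)
        rejoined = os.path.join(tmp, "rejoined.bin")
        join_parts(parts, rejoined)
        with open(rejoined, "rb") as f:
            assert f.read() == data
```

The arithmetic matches the text: 700 MB divides into forty-six 15 MB parts plus a 10 MB remainder.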
Figure 1: Splitting Up A Large File...
There are two types of files you'll find in a Usenet archive: RAR files and split files. WinRAR is an application used to split and compress files into an archive. Back in the day, files were uploaded and downloaded over dial-up, so if you could compress the archive and save a few MBs, tremendous amounts of time could be saved. Today, bandwidth is cheaper and speeds are faster, so saving a few MBs isn't as important. Hence, split files have become more prevalent. Programs such as HJ-Split or QuickPAR easily split large files into multiple chunks, but don't compress them.
At this point, there are forty-six 15 MB files and one 10 MB file (the remainder from creating the archive), and all of them are still too large to post to a news server. How can a news server accept a 15 MB file that equates to over 300,000 lines of text? As Figure 2 shows, we need to break it down again. Each archive file is further broken down into text messages (the encoding process), at which point the news server will accept the post. A 15 MB part, when posted to the newsgroups, is actually made up of 70 text messages of approximately 5,000 lines each (the 10 MB chunk needs fewer).
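The second split, from one encoded part into many articles, can be sketched as follows. It assumes the same 45-bytes-per-line encoding and the 5,000-line article limit quoted earlier; the `(n/total)` subject numbering is a common posting convention used here as an illustrative assumption, and all function names are hypothetical.

```python
BYTES_PER_LINE = 45            # assumption: uuencode-style 45 bytes per text line
MAX_LINES_PER_ARTICLE = 5000   # the typical per-article limit quoted in the text
MB = 1024 * 1024


def ceil_div(a: int, b: int) -> int:
    """Integer division rounding up."""
    return -(-a // b)


def articles_needed(part_size: int) -> int:
    """How many posts one archive part becomes once encoded and chopped up."""
    lines = ceil_div(part_size, BYTES_PER_LINE)
    return ceil_div(lines, MAX_LINES_PER_ARTICLE)


def chunk_lines(lines: list[str], max_lines: int = MAX_LINES_PER_ARTICLE) -> list[list[str]]:
    """Group encoded lines into article bodies no longer than max_lines each."""
    return [lines[i:i + max_lines] for i in range(0, len(lines), max_lines)]


def subject_lines(filename: str, total: int) -> list[str]:
    """Hypothetical subject format numbering each article as (n/total)."""
    width = len(str(total))
    return [f'"{filename}" ({n:0{width}d}/{total})' for n in range(1, total + 1)]


if __name__ == "__main__":
    print(articles_needed(15 * MB))  # 70
    print(articles_needed(10 * MB))  # 47
    print(subject_lines("example.iso.001", 70)[0])  # "example.iso.001" (01/70)
```

Numbering each article lets the downloader's software tell when a part is complete and put its pieces back in order before decoding.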
Figure 2: Making A Small File Even Smaller...
There are several reasons why large files are split into small files. We already know that news servers won't accept articles beyond a certain size. So why not skip the archive step and convert the 700 MB file straight into text? You could, but archives are easier to deal with from an organizational standpoint: it's easy to identify and grab a few parts when necessary. PAR2 recovery files also do a better job handling smaller archive parts than one huge 700 MB file. The practice is also partly rooted in tradition, as this is the method release groups use to distribute content. Using WinRAR allows for error-checking, which is easier when verifying an archive made of multiple parts. Mere downloaders are the eventual beneficiaries of this process.
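The idea behind recovery files can be shown with the simplest possible scheme: one XOR parity block that can rebuild any single missing part. This is a deliberately simplified stand-in, not PAR2 itself; PAR2 uses Reed-Solomon coding, which can rebuild as many missing blocks as there are recovery blocks. The sketch assumes the missing part's length is known (real PAR2 sets record file sizes), and the function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def pad(block: bytes, size: int) -> bytes:
    """Zero-pad a block so every part has the same length."""
    return block + b"\0" * (size - len(block))


def make_parity(parts: List[bytes]) -> bytes:
    """One recovery block: the byte-wise XOR of all (padded) parts."""
    size = max(len(p) for p in parts)
    return reduce(xor_bytes, (pad(p, size) for p in parts))


def recover_missing(parts: List[Optional[bytes]], parity: bytes, missing_len: int) -> bytes:
    """Rebuild the single part marked None: XOR the parity with every surviving part."""
    size = len(parity)
    survivors = (pad(p, size) for p in parts if p is not None)
    rebuilt = reduce(xor_bytes, survivors, parity)
    return rebuilt[:missing_len]  # drop the zero padding


if __name__ == "__main__":
    parts = [b"archive part one", b"part two", b"the third part!"]
    parity = make_parity(parts)
    damaged = [parts[0], None, parts[2]]  # part two never arrived
    print(recover_missing(damaged, parity, len(parts[1])))  # b'part two'
```

Because recovery works block by block, a damaged or missing part only costs the recovery data for that part, which is one reason smaller parts are easier to repair than one huge file.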
So is it worth it? You better believe it! With today's NZB files and indexing sites, this process is almost entirely transparent, unless you're into browsing content. The newsgroups are an awesome resource. If you're not using them and are ready to kick it up a level, check them out. You won't believe your eyes!
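What makes the process transparent is that an NZB file lists, for each posted file, the Message-IDs of the articles that make it up, so a newsreader can fetch and reassemble them without browsing. The sketch below parses a small hypothetical NZB with Python's standard `xml.etree.ElementTree`; the element layout and namespace follow the commonly used NZB format, but check them against the NZB specification before relying on them. The sample data and `download_plan` are illustrative only.

```python
import xml.etree.ElementTree as ET

NZB_NS = {"nzb": "http://www.newzbin.com/DTD/2003/nzb"}

# Hypothetical NZB: one file made of two segments, listed out of order on purpose.
SAMPLE_NZB = """<nzb xmlns="http://www.newzbin.com/DTD/2003/nzb">
  <file poster="uploader@example.invalid" date="1700000000" subject="example.iso.001 (1/2)">
    <groups><group>alt.binaries.test</group></groups>
    <segments>
      <segment bytes="384000" number="2">seg2.example@news.example.invalid</segment>
      <segment bytes="384000" number="1">seg1.example@news.example.invalid</segment>
    </segments>
  </file>
</nzb>"""


def download_plan(nzb_text: str) -> dict[str, list[str]]:
    """Map each file's subject to its article Message-IDs, in segment order."""
    root = ET.fromstring(nzb_text)
    plan = {}
    for file_el in root.findall("nzb:file", NZB_NS):
        segments = file_el.findall("nzb:segments/nzb:segment", NZB_NS)
        # Segments may appear in any order; the number attribute gives the true order
        segments.sort(key=lambda seg: int(seg.get("number")))
        plan[file_el.get("subject")] = [seg.text.strip() for seg in segments]
    return plan


if __name__ == "__main__":
    for subject, message_ids in download_plan(SAMPLE_NZB).items():
        print(subject, "->", message_ids)
```

A real downloader would then request each Message-ID from its news server, decode the articles, and join the parts.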