Linux System Administrator's Survival Guide

What Is USENET?
A Brief History of USENET
How USENET News Is Handled
Summary

Chapter 38

USENET and Netnews

If you have an Internet connection, eventually you are going to want to access USENET and its newsgroups. USENET is one of the most dynamic (and often controversial) aspects of the Internet. With access to the Internet, you can set up, access, and work with all kinds of newsgroups, but most Linux users will be interested in using USENET specifically. This chapter looks at the background of USENET and news services for UNIX in particular, as well as how the Linux news programs handle the news.

What Is USENET?

USENET is one of the most misunderstood aspects of the Internet. At the same time, it is one of the most popular and frequently used (with the possible exception of e-mail). To many users, especially those who don't use the Internet's mail facilities, USENET is the Internet, and vice versa.

USENET was originally developed to facilitate discussion groups (called newsgroups in USENET jargon). A newsgroup lets any user with access to the system participate in a public dialogue with everyone else. By the end of 1995, USENET carried over 9,000 different newsgroups totaling well over 100MB of information every day. USENET is carried on millions of networks in hundreds of countries and reaches hundreds of millions of users.

Despite what most people think, USENET is not a formal network or entity. Instead, it is a number of networked machines that exchange electronic mail (articles) tagged with predetermined subject headers for specific areas of interest (newsgroups). The articles are handled as electronic mail messages by most network machines; articles are processed as news information only by the applications called newsreaders that send and receive the messages.

Any machine that can attach itself to the Internet either directly, through a gateway, or through a forwarding service (such as an online service provider) can become part of USENET. All that is required to use USENET is the software that downloads and uploads the newsgroup mail and a reader package that lets users read and write articles.

The protocol used to pass USENET messages over TCP/IP networks from one machine to another is the Network News Transfer Protocol (NNTP). Using NNTP, your Linux machine can interact with any other machine that handles the news. NNTP software is an integral part of most Linux versions, so you don't need to purchase or look for additional software. Indeed, many people establish Linux machines just to access Internet services like USENET, e-mail, and the World Wide Web.

A Brief History of USENET

USENET was developed out of a UNIX release known as UNIX V7, which implemented UUCP (UNIX to UNIX CoPy) for the first time. As UUCP became popular for communications between machines, it was expanded with program extensions and supplementary programs. USENET began at the University of North Carolina, where Steve Bellovin used shell scripts to write the first version of news software. UNC and Duke used this software to pass messages and commentary between the two universities. Interest in the news software spread when the UNC system was described at a Usenix conference in 1980. Steve Daniel was the first to implement the news software in the C programming language. This version eventually became the first general release of the news software, which was called release A.

To cope with the increasing volume of messages as new news sites were added to the expanding informal network, two University of California students, Mark Horton and Matt Glickman, rewrote the software and added new functionality. After a further revision of their release B, the news software was generally released in 1982 as version 2.1. Rick Adams of the Center for Seismic Studies took over maintenance of the software in 1984, at which point it was up to release 2.10.2. One of his first additions was the capability for moderated newsgroups, resulting in release 2.11 in 1986.

Since then, several contributors have added features to the software, the most important of which was a complete rewrite of the software undertaken in 1987 by the University of Toronto's Geoff Collyer and Henry Spencer. Their rewrite greatly increased the speed with which message mail could be processed and was generally released under the name C News (from Release C). Over the next few years, the basic news package went through some minor revisions but has remained true to Collyer and Spencer's version. Important changes were made to the way machines transferred news messages, and a daemon was added to process incoming and outgoing postings.

All the versions of news software developed to this point had used UUCP as the transport. To allow transfer of messages over a network, a protocol called the Network News Transfer Protocol (NNTP) was developed in 1986. NNTP-based software began to be refined, and a widely used version was implemented in software written by Brian Barber and Phil Lapsley, called nntpd. An alternative NNTP system that is widely available is INN (InterNetNews), which provides a complete news package (user interface and underlying software).

Apart from the underlying mechanics for transferring messages for newsgroups, development also continued in the user interface area, where the newsreader exists. Newsreader software lets you read articles in newsgroups as they arrive. The original reader was called readnews, and it remains one of the most widely used newsreader packages, primarily because it is easy to use and is available on practically every UNIX system.

Several alternative newsreaders were developed, expanding on the features offered by readnews. Newsreaders such as rn (a more flexible version of readnews), trn (threaded readnews), and vnews (visual newsreader) are freely distributed now. All are character-based systems originally developed for UNIX and ported to many other operating systems. With the popularity of graphical user interfaces, newsreaders were also ported to these environments, resulting in software such as xrn (X Window System-based readnews). Most of the readnews variants share a basic command set, although each adds features that may appeal to some users.

How USENET News Is Handled

Two types of software are involved in making a news service work on a Linux machine. The transport software (usually C News for UUCP connections or NNTP for TCP connections) gets the newsgroups to your machine. The newsreader then assembles and presents the articles to the user. Newsreaders are involved only in the user interface; they simply pass news articles to the underlying software and receive articles from it. For that reason, you don't need to look at the mechanics of a newsreader to understand how Linux processes news. The original news system relied completely on UUCP, so much of the news software was designed for UUCP and then modified later to accommodate alternative methods.

To transfer news from one machine to another, a technique called flooding is used. One machine calls another and transfers all the news articles. The machine that just received the news calls another and transfers the articles again. The news articles flow across the networks by moving from machine to machine instead of all the machines polling a single main news source. Each machine maintains a list of other sites it can contact to transfer news. Each connection to another machine is called a newsfeed.

Each machine can generate new articles as the system's users interact with newsgroups. When new articles are created, the machine checks its list of newsfeeds and calls them to transfer the new articles. Because each article carries a list of the machines that it has passed through (called the Path), the local machine knows whether the remote sites on its newsfeed list have already seen the article. As articles move from machine to machine, each machine adds its own identifier to the article's Path field, using the UUCP bang-style notation.
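The Path check described above can be sketched in a few lines of Python. This is only an illustration, not the code any news package uses: the function name forward, the dictionary layout, and the site names are all hypothetical, and the sketch assumes that each machine puts its own name at the front of the Path.

```python
def forward(article, local_site, newsfeeds):
    """Stamp the article with our site name and pick the feeds that need it."""
    # Sites already listed in the Path have handled this article.
    seen = article["Path"].split("!")
    # Add our own name in front, bang-style (assumed ordering).
    stamped = dict(article, Path=local_site + "!" + article["Path"])
    # Only call newsfeeds that do not appear in the Path.
    targets = [site for site in newsfeeds if site not in seen]
    return stamped, targets


article = {"Path": "duke!unc", "Subject": "test posting"}
stamped, targets = forward(article, "mysite", ["duke", "bigfeed"])
print(stamped["Path"])  # mysite!duke!unc
print(targets)          # ['bigfeed']
```

Here the site duke is skipped because its name is already in the Path, so only bigfeed is called.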

An entry in the Distribution field of the header may restrict the machines to which an article can be sent. For example, if you write an article that you want to stay within your local area network, you can specify this in the Distribution field of the message when you write it. Then, when a newsfeed to a machine outside the local area network is processed, the Distribution field prevents the article from being sent.
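As a rough sketch of how such a restriction might be applied, the following Python function compares an article's Distribution field with the distributions a newsfeed accepts. The function name may_send, the per-feed sets, and the choice to treat a missing Distribution field as "world" are assumptions made for this example rather than the behavior of a particular news package.

```python
def may_send(article, feed_distributions):
    """Return True if the article's Distribution permits this newsfeed."""
    # Assumption: an article with no Distribution field goes everywhere.
    wanted = article.get("Distribution", "world")
    distributions = {name.strip() for name in wanted.split(",")}
    # Send only if the feed accepts at least one listed distribution.
    return bool(distributions & set(feed_distributions))


article = {"Subject": "printer is down", "Distribution": "local"}
print(may_send(article, {"local", "world"}))  # True  (machine on the LAN)
print(may_send(article, {"world"}))           # False (outside machine)
```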

To help prevent duplicates of articles moving around USENET, each article has a unique identifier called a message ID (which sits in the Message-ID field in the article header). The message ID is a combination of a unique number and the name of the machine on which the article was originally posted. Machines use these message IDs when a connection to a newsfeed is established. A history file on each system contains a list of all the message IDs that the local system has already received. When the two machines communicate with each other, they can check the history file to find out whether an article should be sent. This process is part of a news transfer protocol called ihave/sendme.
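The idea of a message ID and a history lookup can be illustrated with a short Python sketch. Everything here is an assumption for the sake of the example: the counter stands in for whatever unique number the posting software generates, an in-memory set stands in for the history file on disk, and mysite.example is a made-up host name.

```python
import itertools

_serial = itertools.count(1)  # hypothetical source of unique numbers


def make_message_id(hostname):
    """Combine a locally unique number with the posting host's name."""
    return "<%d@%s>" % (next(_serial), hostname)


def should_send(message_id, remote_history):
    """Send an article only if the other site has not recorded its ID."""
    return message_id not in remote_history


first = make_message_id("mysite.example")
second = make_message_id("mysite.example")
remote_history = {first}  # the remote site already has the first article

print(should_send(first, remote_history))   # False
print(should_send(second, remote_history))  # True
```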

With the ihave/sendme protocol, one machine sends a list of all the message IDs it currently has and waits for the other machine to identify the ones it wants. The requested articles are then transferred one at a time in response to sendme messages. Then the process can be reversed to update the other machine. This type of protocol works well, but it adds a lot of overhead to the communications process. For that reason (coupled with the generally slow lines used by UUCP modem links), ihave/sendme is not often used when a very large volume of news has to be transferred at regular intervals. You wouldn't want to use ihave/sendme to transfer 100MB of articles every day, for example.
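The exchange just described can be modeled in Python as follows. This is a simplified model, not the actual control-message format: each site is represented by a dictionary that maps message IDs to article text, and the function names are hypothetical.

```python
def ihave(sender_articles):
    """Sender side: advertise every message ID the sender holds."""
    return sorted(sender_articles)


def sendme(offered_ids, receiver_articles):
    """Receiver side: ask only for the IDs that are not held locally."""
    return [mid for mid in offered_ids if mid not in receiver_articles]


def exchange(sender_articles, receiver_articles):
    """Run one direction of the exchange; return the IDs transferred."""
    wanted = sendme(ihave(sender_articles), receiver_articles)
    for mid in wanted:  # articles move one at a time
        receiver_articles[mid] = sender_articles[mid]
    return wanted


site_a = {"<1@a>": "first article", "<2@a>": "second article"}
site_b = {"<2@a>": "second article", "<7@b>": "posted on b"}

print(exchange(site_a, site_b))  # ['<1@a>']
print(exchange(site_b, site_a))  # ['<7@b>'], the reverse direction
```

After both calls, each site holds the same three articles. Note that every message ID crosses the link in each direction even when few articles are wanted, which is the overhead mentioned above.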

An alternative method used for large transfers is batching of articles. In this method, one machine sends everything it has to another machine. The receiving machine then checks the newly arrived articles to see whether it already has them. By looking at the message ID, the machine can discard duplicates. This method tends to be faster for transferring, although it creates more processing overhead for the receiving machine when it deals with the newly arrived batch of articles.
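The receiving side of a batch transfer might look something like the following Python sketch. It assumes the batch has already been split into (message ID, article text) pairs and uses a set and a dictionary as stand-ins for the history file and the article spool; the name unbatch is hypothetical.

```python
def unbatch(batch, history, spool):
    """Store each new article from the batch; return the duplicates dropped."""
    duplicates = 0
    for message_id, text in batch:
        if message_id in history:
            duplicates += 1  # already seen, so discard it
            continue
        history.add(message_id)
        spool[message_id] = text
    return duplicates


history = {"<2@a>"}
spool = {"<2@a>": "second article"}
batch = [
    ("<1@a>", "first article"),
    ("<2@a>", "second article"),  # the receiver already has this one
    ("<3@a>", "third article"),
]

print(unbatch(batch, history, spool))  # 1
print(sorted(spool))                   # ['<1@a>', '<2@a>', '<3@a>']
```

The sender does no checking at all here; the whole cost of weeding out duplicates falls on the receiving machine.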

For network-based news access, there are three ways to get articles from another machine. Using NNTP, articles can be transferred with a technique called pushing the news, which is similar to the ihave/sendme protocol: the sending machine offers each article, and the receiving machine accepts only those it does not already have. Your machine can also request the articles in specific newsgroups that have arrived since a given date, which is called pulling the news. Alternatively, you can interact on an article-by-article basis with the remote, never downloading the articles to your local machine. This process is called interactive newsreading, and it works only when you have a newsfeed you can log in to (which is common these days).
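To make the three approaches a little more concrete, the following Python sketch builds the NNTP command lines that correspond to each one. The command names (IHAVE, NEWNEWS, GROUP, and ARTICLE) and the six-digit date format come from the original NNTP specification, RFC 977; the Python function names are hypothetical, and the sketch only formats the text of the commands without opening a connection to a server.

```python
from datetime import datetime


def push_command(message_id):
    """Pushing: the sending machine offers one article by its message ID."""
    return "IHAVE %s" % message_id


def pull_command(newsgroups, since):
    """Pulling: ask for IDs of articles that arrived after a date and time."""
    stamp = since.strftime("%y%m%d %H%M%S")  # YYMMDD HHMMSS
    return "NEWNEWS %s %s GMT" % (",".join(newsgroups), stamp)


def interactive_commands(newsgroup, article_number):
    """Interactive reading: select a group, then fetch a single article."""
    return ["GROUP %s" % newsgroup, "ARTICLE %d" % article_number]


print(push_command("<1@mysite.example>"))
print(pull_command(["comp.os.linux.announce"], datetime(1996, 1, 15)))
print(interactive_commands("comp.os.linux.announce", 42))
```

On a real connection, each command is sent as a line of text and the server answers with a numeric status code before any article text is transferred.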

Summary

This chapter looked at the basics of USENET and news. This information provides a foundation for the next two chapters, which look at NNTP (for network-based access to news) and C News (for UUCP and network access to news). Entire books are written on the subject of USENET and the protocols it uses. Check out one of them if you need more information on this subject.