[Chapter 1] Introduction

This is a book that needs no introduction. By this time, if you don't know what the World Wide Web is, then you probably haven't heard of Rollerblades, VCRs, or Boris Yeltsin either.

In this chapter, we give the world's quickest introduction to Web technology and the roles of the WebMasters who breathe life into each Web document. If you want to learn more about the history of the Web, or how to make your Web pages "cool," or the social impact of the Internet, or how to make money online, etc., etc., etc., well, we include a bibliography at the end of this chapter, and we can also recommend many of the books sitting next to this one on the bookstore shelf. But we don't get into those issues.

This is a book by impatient writers for impatient readers. We're less interested in the hype of the Web than we are in what makes it actually tick. We'll leave it to the pundits to predict the future of the Web, or to declare today's technology already outdated. Too much analysis makes our heads spin; we just want to get our Web sites online.

1.1 The Web in a Nutshell

We've organized this book in a roughly "outside-in" fashion--that is, with the outermost layer (HTML) first, and the innermost layer (the server itself) last. That way, the material most readers are interested in is immediately accessible, while the material of less general interest remains in the back. But since it's a good idea for all readers to know how everything fits together, let's take a minute to breeze through a description of the Web from the inside-out: no history, no analysis, just the technology basics.

Clients and Servers

The tool that most people use on the Web is a browser, such as Netscape Navigator, Internet Explorer, or Mosaic. Web browsers work by connecting over the Internet to remote machines, requesting specific documents, and then formatting the documents they receive for viewing on the local machine.

The language, or protocol, used for Web transactions is Hypertext Transfer Protocol, or HTTP. (HTTP is covered in Chapter 17, HTTP Overview, through Chapter 20, Media Types and Subtypes.) The remote machines containing the documents run HTTP servers that wait for requests from browsers and then return the specified documents. The browsers themselves are technically HTTP clients.
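To make the request-and-response exchange concrete, here is a minimal sketch in Python of the text a client might send and how it might read a reply. The function names build_get_request and parse_response are hypothetical, the HTTP/1.0 request line with a Host header is an assumption about what a simple client sends, and the sample response is made up for illustration; no network connection is opened.

```python
def build_get_request(host: str, path: str) -> str:
    """Return the text a simple HTTP client might send to request `path` from `host`."""
    lines = [
        f"GET {path} HTTP/1.0",  # request line: method, document path, protocol version
        f"Host: {host}",         # header naming the server being contacted
        "",                      # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines)


def parse_response(raw: str):
    """Split a raw HTTP response into (status code, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    code = int(status_line.split(" ", 2)[1])  # e.g. "HTTP/1.0 200 OK" -> 200
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return code, headers, body


if __name__ == "__main__":
    print(build_get_request("www.ora.com", "/products.html"))
    # A made-up server reply, for illustration only.
    sample = "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>"
    print(parse_response(sample))
```

A real browser sends more headers and handles many more cases; the point here is only the shape of the conversation: one request naming a document, one response carrying it.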

Several different Web server packages are available, both free and commercial. We cover the configuration of several of the most popular servers in Chapter 22, Server Configuration Overview, through Chapter 25, Netscape Server Configuration.

Uniform Resource Locators (URLs)

Now, let's take a short detour in this overview. One of the most important things to grasp when working on the Web is the format for URLs. A URL is basically an address on the Web, identifying each document uniquely (for example, http://www.ora.com/products.html). Since URLs are so intrinsic to the Web, we'll discuss them here in a little detail. The simple syntax for a URL is:

http://host/path

where:

host

is the host to connect to, for example www.ora.com or altavista.digital.com. (While many Web servers run on hosts beginning with www, the www prefix is just a convention.)

path

is the document requested on that server.

Most URLs you encounter follow this simple syntax. A more generalized syntax, however, is:

scheme://host/path/extra-path-info?query-info

where:

scheme

is the protocol used to connect to the site. For Web sites, the scheme is http. For FTP or Gopher sites, the scheme is (respectively) ftp or gopher.

extra-path-info

is optional extra path information (used by CGI programs). See Chapter 9, CGI Overview, for more information.

query-info

is optional query information (used by CGI programs). See Chapter 9, CGI Overview, for more information.
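As a rough illustration of the generalized syntax, the following Python sketch splits a URL into its parts with the standard urllib.parse module. The helper name describe_url and the sample CGI URL are hypothetical. Note one limitation: the URL alone does not reveal where the path ends and the extra path information begins (the server decides that), so the sketch reports them together as the path.

```python
from urllib.parse import urlsplit, parse_qs


def describe_url(url: str) -> dict:
    """Break a URL into the pieces named in the generalized syntax."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,          # protocol, e.g. http, ftp, gopher
        "host": parts.hostname,          # machine to connect to
        "path": parts.path,              # path plus any extra path information
        "query": parse_qs(parts.query),  # query information as name -> list of values
    }


if __name__ == "__main__":
    # Hypothetical CGI URL, used only to show the pieces.
    info = describe_url("http://www.ora.com/cgi-bin/search/books/web?title=webnut")
    for name, value in info.items():
        print(f"{name}: {value}")
```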

HTML documents also often use a "shorthand" for linking to other documents on the same server, called a relative URL. An example of a relative URL is images/webnut.gif. The browser knows to translate this into complete URL syntax before sending the request. For example, if the document with URL http://www.ora.com/books/webnut.html contains a reference to images/webnut.gif, the browser reconstructs the relative URL as a full (or absolute) URL, http://www.ora.com/books/images/webnut.gif, and requests that document independently (if needed).
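The translation from relative to absolute URL can be tried out with Python's standard urljoin function, which follows the same resolution rules a browser applies. This is a small sketch using the example URLs from the paragraph above; the helper name resolve is hypothetical.

```python
from urllib.parse import urljoin


def resolve(base_url: str, reference: str) -> str:
    """Resolve a relative URL against the URL of the document that contains it."""
    return urljoin(base_url, reference)


if __name__ == "__main__":
    base = "http://www.ora.com/books/webnut.html"
    print(resolve(base, "images/webnut.gif"))
    # prints: http://www.ora.com/books/images/webnut.gif
    print(resolve(base, "/products.html"))
    # prints: http://www.ora.com/products.html
```

The second call shows that a reference beginning with a slash is resolved from the top of the server rather than from the directory of the current document.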

Often in this book, you'll see us refer to a URI, not a URL. A URI (Uniform Resource Identifier) is a superset of URL, in anticipation of different resource naming conventions being developed for the Web. For the time being, however, the only URI syntax in practice is URL--so while purists might complain, you can safely assume that "URI" is synonymous with "URL" and not go wrong (yet).

Web Content: HTML, CGI, Java, and JavaScript

While Web documents can conceivably be in any format, the one that has been adopted as the standard is Hypertext Markup Language (HTML), a language for creating formatted text interspersed with images, sounds, animation, and hypertext links to other documents anywhere on the Web. Chapter 2, HTML Overview, through Chapter 8, Browser Comparison, cover the most current version of HTML.

When static documents aren't sufficient for a Web site's needs, the site can use tools such as CGI, Java, and JavaScript. CGI is a way for the Web server to call external programs instead of simply returning a static document. Chapter 9, CGI Overview, through Chapter 16, Other CGI Resources, are for CGI programmers. Java is an object-oriented language for writing all sorts of programs that can be downloaded over the Web, from animations to spreadsheets. This book does not cover the complexities of Java, but it does cover JavaScript, a scripting language (related to Java more in name than in design) that can be written directly into the HTML document. (For details on Java, we recommend Java in a Nutshell, by David Flanagan.)
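To give a feel for what "calling an external program" means, here is a minimal sketch of a CGI program in Python. It assumes the server passes the extra path information and query information in the standard CGI environment variables PATH_INFO and QUERY_STRING; the function name cgi_response and the "name" query parameter are hypothetical, chosen only for illustration.

```python
import os
from html import escape
from urllib.parse import parse_qs


def cgi_response(environ) -> str:
    """Build the text a CGI program writes to standard output for one request."""
    path_info = environ.get("PATH_INFO", "")           # extra path information, if any
    query = parse_qs(environ.get("QUERY_STRING", ""))  # query information, if any
    name = query.get("name", ["world"])[0]             # hypothetical "name" parameter
    body = (
        "<html><body>"
        f"<p>Hello, {escape(name)}!</p>"
        f"<p>Extra path: {escape(path_info)}</p>"
        "</body></html>"
    )
    # A header block, then a blank line, then the generated document.
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # When run by a Web server, the request details arrive in the environment.
    print(cgi_response(os.environ), end="")
```

Instead of reading a file from disk, the server runs the program and returns whatever it prints, so the document can differ on every request.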

