The Wide World of Webs

The Wide World of Webs

William (Toby) Gibson

The New Rage
It is the new rage of Cyberspace, and with it comes a facelift for the "text only" Internet, new horizons for information storage and retrieval, a new set of acronyms and more confusion than United States Tax Laws. It is the World Wide Web.

Even the name is confusing. It is called the Web, WWW and even W3, but rarely by its full name. To describe it is almost impossible. For instance, my favorite definition to date is, "It is kind of like Gophers only completely different."¹ Unfortunately, for the novice Web user this is probably the best definition going.

What sets the Web apart from all of cyber-entities is its ability to handle images and sound as well as the text documents that we have come to love. The Internet is now bursting at the seams with an explosion of sights and sounds. By the time you've finished reading this, you too will be able to explore the Web and with any luck join the ranks of literally hundreds of thousands of individuals who have their own little corner of the Internet.

The problem with the Web in particular and Internet in general is that those who work in and around these virtual worlds talk in a "thieves" cant, that is they will say to you: "you are el to h-t-t-p what-sit dot whosit dot wheresit slash tilde thisone slash" and expect it all to make sense. To make matters worse, when you say "What are you talking about?" you are shrugged off with the dreaded 'Read the FAQ' response.

What I just described is a worst case scenario of "Web Crawling." What follows is an attempt to avoid such circumstances. In order to get from the newbie web crawler, to the Web Expert takes a little time, a little effort and a lot of interest.

As I said before, the computer world talks in a thieve's cant. This means that if you want to survive in a computer world you need to learn the way the 'hackers' talk. Just about any good computer dictionary will give you background needed to survive in the world of the internet. Unfortunately, the Web is so new that many of the terms are not included in even the newest dictionaries, and books on the World Wide Web are just now hitting the book shelves. The most important terms are explained here in plain language in an attempt to make Web access easier.

The primary factor making the Web possible is speed. Previously, modems were too slow to allow the practical transfer of large graphics and sound. This is not to say it wasn't done before the Web explosion, only that it wasn't done often. The scientific community has been transferring such data for years and local area networks were also suited for such information transfer.

Unfortunately, trying to transfer large files on a modem slower than 9600 baud was tedious at best and in some cases impossible. Today's modems work at remarkably fast speeds, yet even the fastest modems still need help in order to operate on the Web. The most commonly used software for this purpose is Serial Line Internet Protocol (SLIP). A modem provides dial-up access to a larger computer that is connected to the Internet turning your computer into a dumb terminal. That is your personal computer loses all of its features and takes on the personality of the main frame you dialed into. In essence, what you are actually doing is dialing into an Internet address and using that computer to do your work.²

However, this is typically not enough to get complete access to the Web.³ SLIP and other programs such as Point-to-Point Protocol (PPP) "offers a bridge between dial-up access and the complete range of Internet connections provided by a dedicated access."⁴ Basically, SLIP fools the Internet into accepting your computer and providing it a network address and thus allowing full Web access.

It is possible to reach the Web and achieve good results using a 14.4 modem and a SLIP connection; however, the best results can usually be obtained using an Ethernet connection. An Ethernet connection is a hard line cable directly linked to the Internet. Such connections allow rapid transfer of data by allowing

117

the information to transfer from server to server along a completely digital line. In contrast, modems, by their nature, require the information signal to be translated from a digital signal to an analog signal and vice versa. Despite the increasing speeds of modems, there are speed limitations because of the need to translate signals. For the most part, however, this becomes an argument of what machine will be faster and establishing an Ethernet connection will not always be practical.

What Makes the Web Work
We know now that the Web is practical simply because computers operate much faster than they did just a few short years ago. This increased speed made the web a reality but does not explain really how the Web worked. In order to understand the Web, you need to be familiar with four crucial components of the Web and the simply complicated way these parts interact. The four crucial parts of the Web are: the Uniform Resource Locator, the Hypertext Markup Language, the client and the server.

The Uniform Resource Locator (URL)
Every server on the Web has a URL. It is a Web address, a place to go to. While most often this will be followed by http:// this will not always be the case. You often will find urls that begin ftp:// gopher://,mailto://, and other letter combinations. What is always part of a url is the :// The :// is a crucial part of the url, without the:// the client-server will not be able to connect to the Web site you are looking for. When discussing a site the address should be written: go to <url: http://www.uic.edu/~toby-g/tgp.html>. This is the preferred way to cite a URL. However, the actual address you type in when trying to find the site would only be: http://www.uic.edu/~toby-g/. In other words you leave out the "<, url:" and ">". It is important to remember that not all servers are URL servers. This means there are still plenty of Gophers, FTP sites and Telnet locations that can only be reached through Gopher, FTP and Telnet. Fortunately, the Web is designed to handle these other methods of information retrieval and access. However, there will be times when a Web client cannot reach certain addresses. It is also important to remember that the Web is not the complete Internet.

The HTTP in the URL
While most people are familiar with such terms as FTP, Telnet and Gopher, this new initialism, HTTP, needs some getting use to. HTTP stands for HyperText Transfer Protocol. It is almost always displayed in lower case (http) because Unix machines are case sensitive and most computer hacks do not bother with capitalization, thus cutting down on key-strokes and the possibility of errors. HTTP acts as a signal to the server to take you to a hypertext (Web) document rather than a Gopher, FTP site, Telnet or E-mail connections. Web servers can handle several types of Internet Protocols other than just HTTP, such as those just mentioned. However, only Web client software can actually open and run HTTP documents properly.

What makes HTTP possible is Hypertext Markup Language (HTML). While other non-HTML documents can be viewed on the Web, HTML allows the full potential of the Web to be exploited. HTML is the code that Web documents are written in. For example, if you go a web site and choose View Source from the menu bar in your client software, a text document with all sorts of little tags such as: <LI>, <H1>, <title>/</LI> will appear. These are the HTML markers, that explain to the client how the document display.

Occasionally, you will also hear the term SGML. This is a second cousin to HTML. It stands for Standardized General Markup Language. There will be times when you hear old timers (those people who were doing pages over a year ago) refer to SGML, Don't worry too much about it. Your life as a Web Crawler will not be endangered if you don't know what SGML is.

The client is a software program that is used by an individual to access a server. Some of the more popular clients are Mosaic, Netscape and MacWeb to name just a few. Often, the Web is mistakenly referred to by the name of the client software programs. (I was on Mossaic the other day ...) Don't let this throw you. The number of Web clients is growing daily. One important point to note is that not all Web clients work the same way. Most enable access to text, graphic and sound information. But there are some Web clients, such as Lynx, which allow text-only viewing.

The server is a computer that is networked in some manner to allow storage, access and retrieval of information from other locations. Thus, a client is used to access a server or computer. Servers are given URLs and can then be accessed by the web.

The simply complicated relationship that makes the Web work is sometimes referred to as a client-server-server-client relationship. In practice, it works like this: when you give your server the URL of another server, your server tries to contact it. Once it contacts the other server it will attempt to access the information you were looking for, the information that you find was stored on the other server via some other client using HTML. Thus the chain is complete.

118

Navigating the Web
There are several Web clients. The ones that will be discussed are those that are used with a Graphical User Interface (GUI). GUIs are interfaces, such as Macintosh or MS Windows for IBM Compatibles. In almost every instance using these types of clients necessitate the use of a mouse. The cursor is moved to a desired place on the screen and then the mouse button is depressed to choose a particular function. These functions will vary and in some instances the use of the mouse can be replaced by using specific commands issued from the keyboard. To keep these instructions simple we will disregard the use of command keys. The mouse will be adequate for the first time Web crawler.

The Menu Bar is the series of words that run across the top of the screen. If you move the cursor with the mouse to one of the choices and click and hold the mouse button down, a pull-down menu will display. These are all the related topics under a specific menu choice. To choose any menu choice, continue to hold the mouse button down with the pull-down menu displaying, and when you reach the desired choice release the mouse button. Menu bars will differ from client to client so there is no way to say what every Menu Bar will include. Suffice is to say that they all work about the same.

The Tool Bar usually shows up just below the menu bar. Tool Bars tend to consist of a series of icons and buttons. These are usually navigational tools as well as help aids and/or built-in links to interesting sites that the client provider deems worth knowing about. The Tool Bars also vary from client to client. There are, however, a few icons that are universal.

The Tool Bar in figure one is for Netscape 1.0. It is typical of most Tool Bars. Of special importance are the home, reload and stop icons. The home icon will take you back to the home page, which is designated by you. All clients have a preinstalled home page that you need to reset. Typically, this can be done by locating a preference pop-up menu. This is usually buried somewhere in the Menu Bar. For example, in Netscape you can find the Preferences under options in the Menu Bar.

The reload icon allows you to reload the page. Occasionally, you might attempt to go to a server only to have the page you are interested in fail to load properly. By clicking on the reload icon you will make the computer attempt to reconnect to the server.

Stop is the bail out icon. Sometimes your computer will end up in an endless loop as it attempts to contact another server. If you get tired of waiting, you can click on the stop icon and the machine will stop what it is doing. You can then proceed to try again or try some other site.

In order to navigate backwards and forwards one page at a time you can use the arrows. It is sort of like flipping backwards and forwards in a book. The only catch is you must go somewhere before you can begin using the arrow keys.

What really separates the Web from the rest of the Internet is the use of HTML. HTML provides the language that allows a person to navigate from site to site, despite the fact that different clients and servers are being used. As the first two letters in HTML suggests, the Web is arranged using a hypertext format, which means you do not need to travel in a straight line but you can go to just about anywhere. People familiar with the "seamless architecture" of Gopher or Macintosh's HyperCard program will be familiar with the implications of hypertext. Simply put, hypertext is the ability to go from point A to just about any other point without effort. Typically, this is achieved by "linking" files using key words or phrases located within a page.

According to Paul Gilster:
The notion so closely tied to the name Ted Nelson, who has for years been the guiding force behind a system called Xanadu, which focuses on creating hypertext by forging such links. Of course hypertext doesn't have to be limited to text. An encyclopedia might include links to stored audio or video; click on the right place in the entry on Richard Nixon and you might hear his resignation speech or see him boarding a helicopter for his final presidential ride. Multimedia CD-ROM implementations of this

119

technology provide such audio and video links today, and we're sure to see more as so called hypermedia matures.⁵

For example, a term paper will often have notes and a bibliography attached to it. When reading a term paper it is fairly easy to flip back and forth from the work to the footnote. The footnote contains all the necessary information to retrieve the original citation. Dealing with a dissertation with hundreds of notes becomes more difficult. If the dissertation existed on the Web, one would be able to jump straight from where the work is sighted to the footnote and then back again with the click of a button. Furthermore, if the original citation were online, one could also go directly to the original source and back again. One would not need to go to the library and dig through a journal to find the source, it would be a mere mouse click away. This is moving through hypertext in its simplest form.

Typically, the Web uses highlighted text or icons to act as links between pages. On a monochrome display the links tend to be underlined, and on color monitors different colors are used to mark the links. Using your mouse, you must move your cursor to the desired link and click on it. The client software then takes over and attempts to link you to a new location.

Of course "attempting" and "doing" are two separate things. Several things can happen when you attempt a hypertext jump. Most likely you will link to the desired location with little or no problem. Another possibility is that you will get a message screen that says "URL 404 not found" followed sometimes with a message for today. This means that something is wrong with the address you provided the server. We'll come back to what you can do if this happens. A third possibility is that you will receive an "Unable to Contact Host" message. There is no call waiting on the Internet; you've just received a busy signal.

If you receive a URL 404 not found message you should carefully check the address you are trying to contact. If you are typing the address in yourself, remember that the Web, like Unix, is case sensitive. This means you can only type X for X. A lower case x will not work. All punctuation must be exact. If the address is correct, you may have received a bad address. If possible, you should check the source of the address.

Knowing how to use the Web and getting anything substantial out of it is two different things. In order to accomplish anything on the Web you need to have a starting point. Fortunately, some of the participants on the Web are librarians. There are indexes of cataloged chaos all over the Web. One of the best known and most comprehensive to date is known as Yahoo. It is for the Web what the old Subject Card Catalog was for the library—well, almost.

There is no source that currently catalogs the entire Web and it is doubtful that one will exist anytime soon. However, with Yahoo you have a reliable site that was originally set up by Sanford University. It makes an attempt to provide several methods of searching possibilities. Therefore, if you are a novice and want to see what all this talk is about, then Yahoo is a good place to start.

The Web as a Reference Tool
The Web opens a vast new area for Reference Exploration. On the one hand, all the old options still exist. FTP and Gopher sites are still available through the Web but now all the HTTP sites are here. This leads to the first problem. Almost anyone on the Internet can maintain a Web page, which means that both good and bad information resides side by side. Quite often it is difficult to determine what is personal opinion and qualified facts. It can be difficult to determine which is an organizational site and which is a site that is run by an individual. This in no way means that the Web should be discounted as a reference source. It just means that with all reference work you must verify your source and qualify your answer.

The next step in reference is the building of the Virtual Reference Desk. Many libraries set up pages that are devoted to tracking and compiling high-use electronic sources, such as Britannica Online and the Oxford English Dictionary. Full text journals and directories and other online resources can round out the virtual library. This is nothing new. But now with the advent of the Web, you can also obtain audio clips from political speeches, images from world renown museums and even short movie clips, with a simple click of the mouse button.

Building a Virtual Library is not as easy as it seems. While a virtual library has no walls and no real books it does provide information. That information needs to be collected, cataloged, retrieved and maintained just like any other "real" library.

Therefore, all the considerations that go into building the traditional library must be considered when building the electronic equivalent. To compound the matter, other considerations, such as access versus ownership of materials philosophics, must be overcome. Additionally, all the networking and patron access questions must be answered.

While all these problems may seem more trouble than they are worth, what you wind up with is access.

120

and sometimes ownership to a variety of sources that are more accessible in many ways than traditional materials. What makes the Web such a valuable resource for libraries is the ability to answer what would normally be the unanswerable question by asking the experts in a particular field. For example, a person came in with a question concerning the removal of skunk odor from their cat. He made it clear that the tomato juice bath managed to produce an orange cat that still smelled like a skunk. The answer was found using the Web. A skunk enthusiast had mounted a page, and actually gave a documented cure for removing skunk odor.⁶

And You Can Do It Yourself

Setting up your own page opens up new horizons in information sharing. While the concept is quite simple, in reality there are a few things that the novice will need to do. In order for a page to be viewed you need to have a server with a URL address. If you do not have a URL you can still write a page and view it on your own computer. This is because client software can be used to open documents on your computer. Your system administrator should be able to tell you if there is a server that has a URL. There is also software that can make your networked PC act as a server, but this is beyond the scope of this paper. If you have a client running, such as Mosaic or Netscape, it will open to a particular "home" page. This home page can be changed to your own home page but you have to create it first. Following are brief instructions on how to establish a public HTML folder, which will allow you to setup a home page. These instructions are good for most Unix machines, but local variations may exist. It is best to contact your system administrator for complete details.

Instructions for creating your own Web page
All commands that need to be issued are given in bold face. Type these commands as is in plain text.

1. Log into your Unix account enter pwd. The term "pwd" stands for Print Working directory. The pwd will show up similar to /home/homel/netid. or something similar.

2. You need to create a new directory called "public_html" by entering: mkdir public_html.

3. Check to see if you've done this correctly by typing: Is -1. This will show a detailed list of files and directories.

4. Next, you need to change permissions for all your directories. Enter chmod a+rx public_html to make it accessible.

5. Now, back up one directory by entering cd.

6. Then enter chmod a+rx netid to open.

7. Finally, go into a script editor such as Pico and write this simple document
<html>
<title> my page</title>
<hl> my page </H1>
This is my page<P>
</html>

8. Save this document as home.html and place it in your directory called public_html.

You now have created a directory for all your Web activity and established a home page. Unfortunately, the page is devoid of any hypertext links. In order to learn how to establish links to other page and write HTML, one needs to leam the language. This can only be done through time and patience. For instance, if you wanted to establish a link to The Yahoo Index of the WordWide Web you would establish that link by typing

a href="http://www.yahoo.com/">Yahoo Index</A>

The information in the first set of <> tells the computer to take you to this particular URL. When you click on the linked text (Yahoo Index) The </A> is an html command that tells the computer the command is finished. If the </A> was missing, the computer would attempt to link all the following text to the particular address, in this case Yahoo.

Fortunately, there are literally hundreds of self-help tutorials that explain the ins and outs of correct HTML. Several are listed at the end of this paper. There are also "HTML Editors" that are available as shareware through the Web. These editors will assist the novice in writing HTML. An IBM version is called HTML Assist and a Macintosh Version is called HTML Editor. Both are available from the site listed at the end of this paper.

You are now ready to expand your Internet horizons. Do not be discouraged by the immensity of the Web; instead, tackle it one step at a time. Often, the difference between the novice and the expert is a mere three or four months of experience. Whether your goal is to set up a personal reference page for your own benefit or to put a museum exhibit⁷ online, there really is no better way to date than the World Wide Web. The only limits you face are your own imagination.

Thanks to: Mike Hall, Melissa Koenig, David Romito, Edward Proctor and Brett Sutton for their valuable assistance in this project.

121

A Cyberography

Places To Go on the Web to find out more about the Web.

1. Where to find HTML helper applications:

<url:http://home.netscape.com/assist/helper_apps/index.html>

2. An Introduction to HTML: <url:http://melmac.harris.atd.com/abouUitml.html>

3. How to write Web Documents: <url: http:// fire.clarkson.edu/doc/html/htut.html>

4. Web resources: <url:http://info.med.yale.edu/caim/M_Resources.HTML>

5. The Yahoo World Wide Web Index: <url: http:// www.yahoo.com/>

6. And of course my own comer of the Web. <url:http://www.uic.edu/~toby-g/tgp.html>

Footnotes

1. Anonymous

2. Paul Gilster, The Internet Navigator. (New York: John Wiley & Sons Inc., New York, 1993), 72.

3. In many case dial up access will give access to the Web, if not full at least partial, it depends entirely on what server you are dialing into. For instance, a person with an authorized account at the University of Illinois at Chicago have Dial access to CMI which in tact has www capabilities. A Charlotte (Charlotte's Web) client has been added to CMS which provides access to the Web. Unfortunately only the text of the documents can be read, and images and round files can not be accessed, due to client-server limitations.

4. Internet Navigator, n72

5. Internet Navigator page, 326

6. The Wonderful Skunk and Opossum Page <url: http://elvis.neep.wisc.edu/~firmiss/mephitis-didelphis.html>

7. DNA to Dinosaurs, <url: http://www.bvis.uic.edu/museum/> is an interactive exhibit of the Field Museum in Chicago. Several other Museums have also put selected exhibits on the Web for world wide access.

*William (Toby) Gibson, Professional Library Associate, University of Illinois at Chicago. ( His URL is http://www.uic.edu/~tobyg/tgp.html)

122

Illinois Periodicals Online (IPO) is a digital imaging project at the Northern Illinois University Libraries funded by the Illinois State Library