Zero to HTML in One Hour or Less

Introduction
Getting started
Paragraphs 'n' stuff
Italics 'n' stuff
"The Multi-Media Sector of the Internet"
Miscellany
Trivia
When you're done

Introduction

This is a simple guide to HTML. It assumes some user-side knowledge of the Web. If you've never used the Web, why are you writing a page? Go surfin', man!

It also assumes knowledge of the term "URL". If you don't know what that is, go poke around the Help menu of your preferred Web client for a while, then come back to this.

This was first written on 23 November 1995, and has since been slightly updated. It covers the essentials of HTML 2.0; a second guide to more advanced topics is planned.

Getting started

HTML documents are text files. On a Macintosh, you can create them with SimpleText; on a PC, you can use Notepad. Or, you can use a word processor or utility (like Microsoft Word, ClarisWorks, or WordPad), as long as you are sure to save the document as text.

The most basic concept in HTML is the container, which HTML calls an element. Every code in HTML signals the start or the end of an element. A start-tag marks the beginning of an element, and looks like this: <html>. An end-tag marks the end of an element, and looks like this: </html>. Note that they are nearly the same; the only difference is the slash in the end-tag. Elements can contain other elements or text; the exact rules vary depending on the kind of element.

Some elements do not need an explicit end-tag. The most prominent of these are the paragraph element (<p>) and the list-item element (<li>). Some elements have attributes, which provide additional information about the element. The most prominent of these are the hypertext anchor element (<a>) and the in-line image element (<img>). We'll discuss each of these elements and their attributes in detail later.

An HTML file contains a single <html> element, which contains the entire document. To start your HTML document, open a new file and add:

<html>
</html>

Everything in the document will go between the start and the end of the <html> element.

The <html> element contains exactly two elements: a <head> and a <body>. Add them to your new document:

<html>
<head>
</head>
<body>
</body>
</html>

The <head> can contain a few other elements. The only required one is the <title>. Add it:

<html>
<head>
<title></title>
</head>
<body>
</body>
</html>

The <title> element contains only text. Most Web clients use this content for the text in the title bar of their window. Add a title to your document:

<html>
<head>
<title>My First HTML Document!</title>
</head>
<body>
</body>
</html>

In the title, you should avoid the less-than (<) and ampersand (&) characters, as they have special meaning in HTML. There are ways to get around that in the body of the document, but many Web clients do not understand those ways in the <title>.

Now see what the sample looks like. Find where the <title> element is displayed in your Web client. Try loading your own file into your Web client.

Paragraphs 'n' stuff

Now for the <body>. Its content is the bulk of what will be seen by people looking at your Web page. There are two common mistakes that beginning Web authors make, which are reinforced by out-of-date author resources on the Web. One is "uncontained text". The <body> can only contain other elements - no text! These elements can contain text, but Web clients can only guess at what to do with text that isn't in its own element. The other common mistake we'll postpone discussing, to avoid confusion.

The basic <body> elements are the paragraph (<p>) and the six heading levels (<h1>, <h2>, <h3>, <h4>, <h5>, and <h6>). As already mentioned, <p> does not require an end-tag, but the headings do. You may not have a heading numbered higher than six, and you must always have a number. These are actually six different kinds of elements.

The majority of Web pages consist of headings and paragraphs. Let's add some to the sample page:

<html>
<head>
<title>My First HTML Document!</title>
</head>
<body>

<h1>Welcome to my Web page!</h1>

<p>This is a paragraph in a Web page.  Note that it begins with the
paragraph start-tag, but doesn't need an end-tag.

<h2>This is a lower-level heading</h2>

<p>Here is another paragraph, below the lower-level heading.

<p>And yet another paragraph.  Each paragraph signals the end of the
previous one, which is why you don't need to close them.  It isn't
wrong to, though.</p>

</body>
</html>

Now see what it looks like. Try loading your own file. Play around with it a bit: take out the <p> end-tag. Now take out one of the heading end-tags. What happened?

The <body> can also contain lists. There are three types of lists: ordered (numbered) lists, unordered (bulleted) lists, and definition lists.

The ordered list is <ol>; the unordered list is <ul>. Both lists need start- and end-tags. The lists contain list items: <li>. The list items do not need end-tags. See the sample document:

<html>
<head>
<title>My First HTML Document!</title>
</head>
<body>

<h1>Welcome to my Web page!</h1>

<p>This is a paragraph in a Web page.  Note that it begins with the
paragraph start-tag, but doesn't need an end-tag.

<h2>This is a lower-level heading</h2>

<p>Here is another paragraph, below the lower-level heading.

<p>And yet another paragraph.  Each paragraph signals the end of the
previous one, which is why you don't need to close them.  It isn't
wrong to, though.</p>

<h2>Lists</h2>

<p>This is an ordered, or numbered, list:

<ol>
<li>This is the first item.
<li>This is the second item.
</ol>

<ul>
<li>This is an unordered list.
<li>Most Web clients will show bullets in front of the items in this
list.
</ul>

</body>
</html>

Once again, see the sample in your client. Load your own. Try adding end-tags for the list items. Does it make a difference? It does, in some poorly-designed Web clients.

Definition lists do not contain items, but rather terms and definitions. The definition list is <dl>. The term is <dt> and the definition is <dd>. The sample document is getting too long, so here's an out-of-context example of a definition list:

<dl>
<dt>sugar<dd>a natural sweetener
<dt>geek<dd>someone who bites heads off of chickens
<dt>web<dd>something spun by spiders
</dl>

See what it looks like in your client. Play with your own document, etc. You get the gist.

Note that the <dt> and the <dd> do not need end-tags, but the <dl> does. Most Web clients will handle this list with each term and each definition starting a new line, with the <dd> indented. To keep them on the same line if possible, use the compact attribute.

Attributes go inside the start-tag for an element. The attribute name is followed by an equals sign (=) and the attribute value. For the definition list, there is an attribute named compact, which can take a value of compact: <dl compact="compact">. In certain cases, attributes can be abbreviated; in this case, you can simply give the attribute value: <dl compact>. That example again:

<dl compact>
<dt>sugar<dd>a natural sweetener
<dt>geek<dd>someone who bites heads off of chickens
<dt>web<dd>something spun by spiders
</dl>

You know what to do. Does your client display the compact and non-compact lists differently?

Those are the basic "block-level" elements; they are paragraphs or things like paragraphs. They generally cause line-breaks and have some space before and after them. Now we'll discuss the "flow-level" elements. These are the ones that go inside block-level elements; they cause things like italics, boldface, and hypertext links.

Italics 'n' stuff

The <em> element marks emphasis. In GUI clients, its content is usually presented on the screen in italics, but using <em> is better than simply saying "italic" (<i>) because it gives a reason why the text should be italic. Not everyone on the Web is even reading a screen; some are listening to speech synthesizers, and some are reading Braille terminals.

Use of the <em> element also distinguishes its content from the <cite> element, which is used for book, magazine, or movie titles. There are many services that index the content of the Web, and <cite> elements are indexed differently by some of these.

For the same reason, <strong> is preferred to simply saying "bold" (<b>), but it's harder to type, and there are no other boldface-inducing elements, so this rule is often ignored. Ninety percent of your audience will not notice the difference. Here's an out-of-context example:

<p>This is a paragraph.  It's <em>very</em> important that you read it
carefully.  <strong>This sentence is <em>especially</em>
critical.</strong> I just read a book called <cite>Outlander</cite>,
and watched a movie called <cite>My Neighbor Totoro</cite>.  For
contrast, my employer makes a product called <i>Dyna</i>Web, and in
this case the italics are purely presentational, with no inherent
meaning.</p>

Check it out.

Note that all of these elements require end-tags. If you forget one, the element's effect will apply to the rest of the document in many brain-damaged Web clients, including Netscape and Mosaic.

"The Multi-Media Sector of the Internet"

Now we come to the two elements that are the heart of the Web: <a> and <img>. The <a> element is the hypertext anchor. Its primary use is to create links to other documents. Here we have another use for attributes: <a> takes an attribute called href, whose value is the URL of the target document. The URL must be in quotes.

The URL can be fully specified: <a href="http://www.shore.net/%7Ecrism/index.html">. Often, though, groups of related documents are stored in the same directory. There is a useful shortcut for specifying the URL: if only a filename is given, the Web client will assume that the method, host, and directory are all the same as the current document. This is extremely useful if one is using a service provider, and doesn't know in advance what the exact URL for one's documents will be. <a href="index.html"> will find the same document as the example above, if it's in the same directory as the current document. This also makes it possible to move collections of documents without having to update all of the links.

Hypertext <a> links do not have to go only to HTML files; they can also link to graphics, text files, PostScript files, spreadsheets, compressed executables... Anything specifiable by a URL can be the target of a link.

The <a> element is a flow-level element. It should only occur in paragraphs, list-items, etc., and always requires an end-tag.
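
Here's an out-of-context sketch of both kinds of link inside a paragraph. The file name recipes.html is made up for illustration; substitute one of your own documents.

```html
<p>My home page is at
<a href="http://www.shore.net/%7Ecrism/index.html">a fully specified
URL</a>.  From a document in the same directory, I can also reach it
with <a href="index.html">just the filename</a>, and my
<a href="recipes.html">recipes</a> the same way.</p>
```

Note that each <a> sits inside a paragraph and has its end-tag; the text between the start-tag and the end-tag is what the reader selects to follow the link.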

The <img> element works similarly, except that the URL is given by the src attribute. The Web client will load this as an in-line image, so in this case, the target should be a graphic (usually a GIF or JPEG). One should also give an alt attribute, with text to display in the case that the client can't display graphics, or has inline graphic loading turned off. (Forty percent of Web users do not load graphics when first visiting a page, or at all.) The alt attribute should be short, but descriptive enough that the user can decide whether or not to look at the picture. It also cannot contain < or &. (There is a way that should make it possible to use these, but most Web clients will behave improperly if you try.) Also, since the attribute value starts with a double-quote ("), it will be ended by the next one, which means that it can't contain any. For example:

<img src="mypic.gif" alt="My smilin' face.">

The <img> element is a new concept: an empty element. It not only requires no end-tag, but it has no content! A picture is inserted wherever the <img> tag occurs. The <img> element is also unusual in that it can be either a block-level or a flow-level element.
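
As a rough sketch, here is an in-line image used at the flow level, inside a paragraph, next to a link that points at the same graphic. It reuses the mypic.gif file name from the example above; the surrounding sentence is only filler.

```html
<p><img src="mypic.gif" alt="My smilin' face."> That's me on a good
day.  You can also look at <a href="mypic.gif">the picture by
itself</a>.</p>
```

The <img> has no end-tag and no content, while the <a> has both. A reader with graphics turned off sees the alt text in place of the picture.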

Miscellany

Sometimes you need a line-break in the middle of a block-level element, but it's not really a new paragraph or list item or whatever. The <br> element provides that. Like <img>, it's an empty element: no end-tag, and no content.

Many Web pages have horizontal rules across them. This is done with the empty <hr> element.
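
Here's a small out-of-context sketch of both empty elements; the verse is just filler to show where the line-breaks fall.

```html
<p>Roses are red,<br>
Violets are blue,<br>
This is one paragraph,<br>
With line-breaks in it, too.

<hr>

<p>This paragraph sits below a horizontal rule.
```

The <br> goes inside the paragraph; the <hr> goes between block-level elements, not inside one.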

I mentioned before that there is a way around the inability of elements to contain less-than signs or ampersands. That way is to use an entity reference. That's a technical term; all it means is an ampersand (&), followed by the name of a character, followed by a semicolon (;). That's why you can't use an ampersand ordinarily; it signals the start of an entity reference. So, to get "AT&T" on your Web page, you'd type AT&amp;T; likewise, to get a less-than sign, you'd type &lt;. This is also useful for sample HTML - take a look at the source for this page.

You can also use entity references to get western European accented characters and some other characters, but &lt; and &amp; are the only ones you need to know about to get started.
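
Here's a short sketch of entity references in a paragraph. Besides the ampersand and less-than references, it uses &gt; (greater-than) and &eacute; (e with an acute accent), two names from the HTML 2.0 character set that aren't spelled out above; the sentences themselves are filler.

```html
<p>I used to work for AT&amp;T.  To start a paragraph, type
&lt;p&gt;.  My r&eacute;sum&eacute; is available on request.</p>
```

A Web client should display the second sentence with a literal <p> in it, rather than starting a new paragraph there.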

Most Web clients display pages in a proportional-width font, like Times or Helvetica. However, sometimes it's nice to be able to use a monospaced font (like Courier) for an entire block of text, like for the sample HTML pages in this document. This is when <pre> can be used - for "preformatted" text. Whereas carriage returns and spaces are collapsed in other kinds of paragraphs, and the text wrapped to fit the Web client window, in <pre> all carriage returns and spaces are honored, and the text is usually displayed in a monospaced font.
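
As an illustration, here is a made-up price list that depends on its spacing to line up; the items and prices are invented.

```html
<pre>
Item          Price
----          -----
widget        $1.50
big widget    $3.00
</pre>
```

In an ordinary paragraph, the runs of spaces would collapse and the lines would be joined; in <pre>, the columns should stay aligned as typed.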

The <blockquote> element can contain all of the same things that <body> can. This is so that chunks of an HTML document can be quoted directly within a <blockquote>. Usually, the display of those elements is exactly the same as if they weren't in a <blockquote>, but they are indented a bit, like a large quote in a textbook.
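
A minimal sketch, with filler text around the quotation:

```html
<p>As this guide says later on:

<blockquote>
<p>When starting from scratch, it's easier to start right.
</blockquote>

<p>I couldn't agree more.
```

The quoted text is in its own <p> inside the <blockquote>, for the same reason that text in the <body> needs its own element.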

Lists (the <ul>, <ol>, and <dl> mentioned previously) can be nested. List items (<li>) and def-list definitions (<dd>) can contain all block level elements, including other lists. For example:

<h3>Building a widget</h3>

<ol>
<li>Select the type of widget:
   <ul>
   <li>brown
   <li>green
   <li>purple
   </ul>
<li>Pick the type of paint:
   <dl compact>
   <dt>enamel<dd>shiny, good for attracting small animals
   <dt>matte<dd>duller, doesn't show dirt
   </dl>
<li>Etc....
</ol>

Your Web client will show this something like this (which gives me a chance to demonstrate a <blockquote>):

   Building a widget

      1. Select the type of widget:
            * brown
            * green
            * purple
      2. Pick the type of paint:
            enamel  shiny, good for attracting small animals
            matte   duller, doesn't show dirt
      3. Etc....

The <address> element is somewhat of a leftover from earlier days of the Web. It's a block-level element, but its visual behavior is unpredictable. Some Web clients indent it, some italicize it, some do both.
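
One plausible use is a signature at the bottom of a page, sketched here. The name and the mail address are made up.

```html
<hr>
<address>
A. N. Author<br>
<a href="mailto:author@example.com">author@example.com</a>
</address>
```

Try it in a couple of clients if you can; given the unpredictable display, it's best not to count on any particular look.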

That's quite a bit more than the basics. For more information, start at the World-Wide Web Consortium. I recommend playing around with the concepts above. Just create a text file and change it around. Most Web clients can open a file on your computer.

Below are some irrelevant details - the stuff above should be enough to get anyone going with a Web page. You can also skip to the end for a more advanced guide.

Trivia

The other common Web authoring mistake involves leftovers from the early Web. In those days, the tags didn't signal the start or finish of containers - they were just processing instructions. Then, the <p> tag acted as a carriage return. You can still see Web pages (and several prominent Web how-to pages!) that recommend this usage. It's wrong. It will still work, but as HTML evolves, new features may require Web clients to be stricter about the way they handle documents, and these old pages may very well break. When starting from scratch, it's easier to start right.
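
To make the difference concrete, here is a sketch of the same two paragraphs written both ways. The HTML comments are only labels for the two styles.

```html
<!-- Old style: <p> treated as a separator between chunks of text.
     The first chunk is uncontained text.  Don't do this. -->
This is the first paragraph.
<p>
This is the second paragraph.

<!-- Container style: each paragraph starts with its own <p>. -->
<p>This is the first paragraph.

<p>This is the second paragraph.
```

Most clients will likely display the two versions the same way today, which is exactly why the old habit persists.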

There is actually a very strict set of rules governing the relationships between elements - which ones may contain which others. There is a validation service on the Web for checking the conformance of an HTML document to these rules. However, its output is a bit esoteric.

Formal HTML has a document type declaration at the top: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> or <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> or something along those lines. However, this implies that the document below it is strictly valid HTML, according to the relationship rules mentioned above. Only include a line like this in documents that pass a validation test.
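
For placement, here is a sketch of the sample page with the first of those declarations on the line before the <html> start-tag. Treat it as an illustration only; run your own page through a validation test before adding the line.

```html
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<title>My First HTML Document!</title>
</head>
<body>
<h1>Welcome to my Web page!</h1>
<p>This page claims to be valid HTML, so it had better be.
</body>
</html>
```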

Sticking with strictly valid HTML 2.0, there is no way to get fancy features like centering, backgrounds, etc. I strongly recommend avoiding backgrounds; in some cases, they do make a page prettier for some users, but they add nothing to the content, increase download time, and can render a page completely unreadable for others.

Centering is usually unnecessary. If you must center something, do not use the <center> tag. It will screw stuff up on some Web clients, and look silly on others. Besides, there's a better way. It predates the existence of Netscape, is better from a semantic point of view, and doesn't break things on Web clients that don't know about it. And here's the kicker: Netscape knew about it, and implemented it, but decided (for some unknown reason) to add <center> - and document it, but not the other, cleaner way.

There is an attribute of align="center" available on nearly all block-level elements, which will center text the same as if <center> were used. E.g., <p align="center">.
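
A short sketch, reusing the heading from the sample page:

```html
<h1 align="center">Welcome to my Web page!</h1>

<p align="center">This paragraph is centered in clients that
understand the attribute, and simply left alone in those that don't.
```

Note that the attribute goes on each element you want centered, so it isn't valid in strict HTML 2.0 any more than <center> is; it just degrades more gracefully.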

When you're done

There will soon be another guide on more advanced topics, including:

tables
the Any Browser initiative
frames - just say no
Java, JavaScript, and ActiveX
more on validation

Last updated and validated 12 April 2001 by crism.