Copyright © 1995, 1997 Christopher R. Maden.
This is a simple guide to HTML. It assumes some user-side knowledge of the Web. If you've never used the Web, why are you writing a page? Go surfin', man!
It also assumes knowledge of the term "URL". If you don't know what that is, go poke around the Help menu of your preferred Web client for a while, then come back to this.
This was first written on 23 November 1995, and has since been slightly updated. It covers the essentials of HTML 2.0; a second guide to more advanced topics is planned.
HTML documents are text files. On a Macintosh, you can create them with SimpleText; on a PC, you can use Notepad. Or, you can use a word processor or utility (like Microsoft Word, ClarisWorks, or WordPad), as long as you are sure to save the document as text.
The most basic concept of HTML is containers, called elements in
HTML. All of the codes in HTML signal the start or the end of an
element. A start-tag marks the beginning of an element, and looks
like this: <html>. An end-tag marks the end of an element, and looks
like this: </html>. Note that they are nearly the same; the only
difference is the slash in the end-tag. Elements can contain other
elements or text; the exact rules vary depending on the kind of
element.
Some elements do not need the end-tags specified. The most prominent
ones are the paragraph element (<p>) and the list-item element
(<li>). Some elements have attributes, which provide additional
information about the element. The most prominent ones of these are
the hypertext anchor element (<a>) and the in-line image element
(<img>). We'll discuss each of these elements and their attributes in
detail later.
An HTML file contains a single <html> element, which contains the
entire document. To start your HTML document, open a new file and
add:

<html>
</html>

Everything in the document will go between the start and the end of
the <html> element.
The <html> element contains exactly two elements: a <head> and a
<body>. Add them to your new document:

<html>
<head>
</head>
<body>
</body>
</html>
The <head> can contain a few other elements. The only required one is
the <title>. Add it:

<html>
<head>
<title></title>
</head>
<body>
</body>
</html>
The <title> element contains only text. Most Web clients use this
content for the text in the title bar of their window. Add a title to
your document:

<html>
<head>
<title>My First HTML Document!</title>
</head>
<body>
</body>
</html>
You should avoid the less-than (<) and ampersand (&) characters, as
they have special meaning in HTML. There are ways to get around that
in the body of the document, but many Web clients do not understand
those ways in the <title>.
Now see what the sample looks like. Find where the <title> element is
displayed in your Web client. Try loading your own file into your Web
client.
Now for the <body>. There are two common mistakes that beginning Web
authors make, which are reinforced by out-of-date author resources on
the Web. One is "uncontained text". The <body> can only contain other
elements - no text! These elements can contain text, but Web clients
can only guess at what to do with text that isn't in its own element.
The other common mistake we'll postpone discussing, to avoid
confusion. The content of the <body> is the bulk of what will be seen
by people looking at your Web page.
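Here is a rough sketch of the difference, with a made-up sentence standing in for real content. The first fragment has uncontained text; the second puts the same text inside a paragraph element, which is the safer choice.

```html
<!-- Uncontained text: the sentence sits directly in the body. -->
<body>
Welcome to my page.
</body>

<!-- Contained text: the same sentence, inside a paragraph element. -->
<body>
<p>Welcome to my page.</p>
</body>
```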
The basic <body> elements are the paragraph (<p>) and the six heading
levels (<h1>, <h2>, <h3>, <h4>, <h5>, and <h6>). As already
mentioned, <p> does not require an end-tag, but the headings do. You
may not have a heading numbered higher than six, and you must always
have a number. These are actually six different kinds of elements.
The majority of Web pages consist of headings and paragraphs. Let's add some to the sample page:

<html>
<head>
<title>My First HTML Document!</title>
</head>
<body>
<h1>Welcome to my Web page!</h1>
<p>This is a paragraph in a Web page. Note that it begins with the
paragraph start-tag, but doesn't need an end-tag.
<h2>This is a lower-level heading</h2>
<p>Here is another paragraph, below the lower-level heading.
<p>And yet another paragraph. Each paragraph signals the end of the
previous one, which is why you don't need to close them. It isn't
wrong to, though.</p>
</body>
</html>
Now see what it looks like. Try loading your own file. Play around
with it a bit: take out the <p> end-tag. Now take out one of the
heading end-tags. What happened?
The other kind of thing that <body> can contain is lists. There are
three types of lists: ordered (numbered) lists, unordered (bulleted)
lists, and definition lists.
The ordered list is <ol>; the unordered list is <ul>. Both lists need
start- and end-tags. The lists contain list items: <li>. The list
items do not need end tags. See the sample document:

<html>
<head>
<title>My First HTML Document!</title>
</head>
<body>
<h1>Welcome to my Web page!</h1>
<p>This is a paragraph in a Web page. Note that it begins with the
paragraph start-tag, but doesn't need an end-tag.
<h2>This is a lower-level heading</h2>
<p>Here is another paragraph, below the lower-level heading.
<p>And yet another paragraph. Each paragraph signals the end of the
previous one, which is why you don't need to close them. It isn't
wrong to, though.</p>
<h2>Lists</h2>
<p>This is an ordered, or numbered, list:
<ol>
<li>This is the first item.
<li>This is the second item.
</ol>
<ul>
<li>This is an unordered list.
<li>Most Web clients will show bullets in front of the items in this
list.
</ul>
</body>
</html>
Once again, see the sample in your client. Load your own. Try adding end-tags for the list items. Does it make a difference? It does, in some poorly-designed Web clients.
Definition lists do not contain items, but rather terms and
definitions. The definition list is <dl>. The term is <dt> and the
definition is <dd>. The sample document is getting too long, so
here's an out-of-context example of a definition list:

<dl>
<dt>sugar<dd>a natural sweetener
<dt>geek<dd>someone who bites heads off of chickens
<dt>web<dd>something spun by spiders
</dl>
See what it looks like in your client. Play with your own document, etc. You get the gist.
Note that the <dt> and the <dd> do not need end-tags, but the <dl>
does. Most Web clients will handle this list with each term and each
definition starting a new line, with the <dd> indented. To keep them
on the same line if possible, use the compact attribute.
Attributes go inside the start-tag for an element. The attribute name
is followed by an equals sign (=) and the attribute value. For the
definition list, there is an attribute named compact, which can take
a value of compact: <dl compact="compact">. In certain cases,
attributes can be abbreviated; in this case, you can simply give the
attribute value: <dl compact>. That example again:

<dl compact>
<dt>sugar<dd>a natural sweetener
<dt>geek<dd>someone who bites heads off of chickens
<dt>web<dd>something spun by spiders
</dl>
You know what to do. Does your client display the compact and non-compact lists differently?
Those are the basic "block-level" elements; they are paragraphs or things like paragraphs. They generally cause line-breaks and have some space before and after them. Now we'll discuss the "flow-level" elements. These are the ones that go inside block-level elements; they cause things like italics, boldface, and hypertext links.
The <em> element is emphasis. In GUI clients, it's usually presented
on the screen in italics, but is better than simply saying "italic"
(<i>) because it gives a reason why the text should be italic. Not
everyone on the Web is even reading a screen; some are listening to
speech synthesizers, and some are reading Braille terminals.
Use of the <em> element also distinguishes its content from the
<cite> element, which is used for book, magazine, or movie titles.
There are many services that index the content of the Web, and <cite>
elements are indexed differently by some of these.
For the same reason, <strong> is preferred to simply saying "bold"
(<b>), but it's harder to type, and there are no other
boldface-inducing elements, so this rule is often ignored. Ninety
percent of your audience will not notice the difference. Here's an
out-of-context example:

<p>This is a paragraph. It's <em>very</em> important that you read it
carefully. <strong>This sentence is <em>especially</em>
critical.</strong> I just read a book called <cite>Outlander</cite>,
and watched a movie called <cite>My Neighbor Totoro</cite>. For
contrast, my employer makes a product called <i>Dyna</i>Web, and in
this case the italics are purely presentational, with no inherent
meaning.</p>
Note that all of these elements require end-tags. If you forget an end-tag, the element's formatting will apply to the rest of the document in many brain-damaged Web clients, including Netscape and Mosaic.
Now we come to the two elements that are the heart of the Web: <a>
and <img>. The <a> element is the hypertext anchor. Its primary use
is to create links to other documents. Here we have another use for
attributes: <a> takes an attribute called href, whose value is the
URL of the target document. The URL must be in quotes.
The URL can be fully specified:
<a href="http://www.shore.net/%7Ecrism/index.html">. Often, though,
groups of related documents are stored in the same directory. There
is a useful shortcut for specifying the URL: if only a filename is
given, the Web client will assume that the method, host, and
directory are all the same as the current document. This is extremely
useful if one is using a service provider, and doesn't know in
advance what the exact URL for one's documents will be.
<a href="index.html"> will find the same document as the example
above, if it's in the same directory as the current document. This
also makes it possible to move collections of documents without
having to update all of the links.
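As a small sketch, here are both forms used inside paragraphs. The link text and the surrounding sentences are invented for illustration; only the two href values come from the discussion above. Note the </a> end-tag that closes each link.

```html
<!-- Fully specified URL. -->
<p>Visit <a href="http://www.shore.net/%7Ecrism/index.html">my home
page</a>.</p>

<!-- Filename only: the method, host, and directory are taken from
     the current document. -->
<p>Visit <a href="index.html">my home page</a>.</p>
```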
Hypertext <a> links do not have to go only to HTML files; they can
also link to graphics, text files, PostScript files, spreadsheets,
compressed executables... Anything specifiable by a URL can be the
target of a link.
The <a> element is a flow-level element. It should only occur in
paragraphs, list-items, etc., and always requires an end-tag.
The <img> element works similarly, except that the URL is given by
the src attribute. The Web client will load this as an in-line image,
so in this case, the target should be a graphic (usually a GIF or
JPEG). One should also give an alt attribute, with text to display in
the case that the client can't display graphics, or has inline
graphic loading turned off. (Forty percent of Web users do not load
graphics when first visiting a page, or at all.) The alt attribute
should be short, but descriptive enough that the user can decide
whether or not to look at the picture. It also cannot contain < or &.
(There is a way that should make it possible to use these, but most
Web clients will behave improperly if you try.) Also, since the
attribute value starts with a double-quote ("), it will be ended by
the next one, which means that it can't contain any. For example:

<img src="mypic.gif" alt="My smilin' face.">
The <img> element is a new concept: an empty element. It not only
requires no end-tag, but it has no content! A picture is inserted
wherever the <img> tag occurs. The <img> element is also unusual in
that it can be either a block-level or a flow-level element.
Sometimes you need a line-break in the middle of a block-level
element, but it's not really a new paragraph or list item or
whatever. The <br> element provides that. Like <img>, it's an empty
element: no end-tag, and no content.
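For instance, a mailing address is one paragraph with several lines. This is only a sketch, and the name and address are made up:

```html
<p>Send fan mail to:<br>
Jane Q. Author<br>
123 Example Street<br>
Anytown</p>
```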
Many Web pages have horizontal rules across them. This is done with
the empty <hr> element.
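A rule usually goes between block-level elements, as in this sketch with placeholder headings and text:

```html
<h2>First topic</h2>
<p>Some text about the first topic.</p>
<hr>
<h2>Second topic</h2>
<p>Some text about the second topic.</p>
```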
I mentioned before that there are ways around the inability of
elements to contain less-than signs or ampersands. That way is to use
an entity reference. That's a technical term; all it means is an
ampersand (&), followed by the name of a character, followed by a
semicolon (;). That's why you can't use an ampersand ordinarily; it
signals the start of an entity reference. So, to get "AT&T" on your
Web page, you'd type AT&amp;T. This is also useful for sample HTML -
take a look at the source for this page.
You can also use entity references to get western European accented
characters and some other characters, but &lt; and &amp; are the only
ones you need to know about to get started.
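Here is a short sketch of entity references in use. The sentences are invented, and two of the entity names (gt for the greater-than sign and eacute for an accented e) aren't discussed above, though both are part of the HTML 2.0 entity set.

```html
<!-- Displays as: I work for AT&T. -->
<p>I work for AT&amp;T.</p>

<!-- Displays as: A paragraph begins with the <p> start-tag. -->
<p>A paragraph begins with the &lt;p&gt; start-tag.</p>

<!-- An accented character from the western European set. -->
<p>We ate at a little caf&eacute; downtown.</p>
```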
Most Web clients display pages in a proportional-width font, like
Times or Helvetica. However, sometimes it's nice to be able to use a
monospaced font (like Courier) for an entire block of text, like for
the sample HTML pages in this document. This is when <pre> can be
used - for "preformatted" text. Whereas carriage returns and spaces
are collapsed in other kinds of paragraphs, and the text wrapped to
fit the Web client window, in <pre> all carriage returns and spaces
are honored, and the text is usually displayed in a monospaced font.
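A simple column layout shows the effect. In this sketch (the items and prices are made up), the spaces and line-breaks typed in the file are exactly what the reader sees:

```html
<pre>
Item        Price
----        -----
Widget      $1.00
Gadget     $12.50
</pre>
```

In an ordinary paragraph, the same text would be run together onto one line with single spaces between the words.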
The <blockquote> element can contain all of the same things that
<body> can. This is so that chunks of an HTML document can be quoted
directly within a <blockquote>. Usually, the display of those
elements is exactly the same as if they weren't in a <blockquote>,
but they are indented a bit, like a large quote in a textbook.
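A minimal sketch, with placeholder text around a familiar proverb. Note that the quoted text is still in its own paragraph inside the <blockquote>, so there is no uncontained text:

```html
<p>As the old saying goes:</p>
<blockquote>
<p>A journey of a thousand miles begins with a single step.</p>
</blockquote>
<p>So start with a small page.</p>
```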
Lists (the <ul>, <ol>, and <dl> mentioned previously) can be nested.
List items (<li>) and def-list definitions (<dd>) can contain all
block level elements, including other lists. For example:

<h3>Building a widget</h3>
<ol>
<li>Select the type of widget:
  <ul>
  <li>brown
  <li>green
  <li>purple
  </ul>
<li>Pick the type of paint:
  <dl compact>
  <dt>enamel<dd>shiny, good for attracting small animals
  <dt>matte<dd>duller, doesn't show dirt
  </dl>
<li>Etc....
</ol>
Your Web client will show this something like this (which gives me a
chance to demonstrate a <blockquote>):

    Building a widget

    1. Select the type of widget:
          - brown
          - green
          - purple
    2. Pick the type of paint:
          enamel  shiny, good for attracting small animals
          matte   duller, doesn't show dirt
    3. Etc....
The <address> element is somewhat of a leftover from earlier days of
the Web. It's a block-level element, but its visual behavior is
unpredictable. Some Web clients indent it, some italicize it, some do
both.
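It's commonly used for a signature at the bottom of a page. A sketch, with a made-up name and e-mail address:

```html
<hr>
<address>
Jane Q. Author<br>
<a href="mailto:jane@example.com">jane@example.com</a>
</address>
```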
That's quite a bit more than the basics. For more information, start at the World-Wide Web Consortium. I recommend playing around with the concepts above. Just create a text file and change it around. Most Web clients can open a file on your computer.
Below are some irrelevant details - the stuff above should be enough to get anyone going with a Web page. You can also skip to the end for a more advanced guide.
The other common Web authoring mistake involves leftovers from the
early Web. In those days, the tags didn't signal containers' start or
finish - they were just processing instructions. Then, the <p> tag
acted as a carriage return. You can still see Web pages (and several
prominent Web how-to pages!) that recommend this usage. It's wrong.
It will still work, but as HTML evolves into the future, new features
may require Web clients to be stricter about the way they handle
documents, and these old pages may very well break. When starting
from scratch, it's easier to start right.
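To make the contrast concrete, here is a sketch of the same two paragraphs written both ways, with placeholder sentences:

```html
<!-- Old style: <p> used as a separator between chunks of text. -->
<body>
This is the first paragraph.
<p>
This is the second paragraph.
<p>
</body>

<!-- Container style: each paragraph starts with its own <p>. -->
<body>
<p>This is the first paragraph.
<p>This is the second paragraph.
</body>
```

The old style also leaves the first chunk of text uncontained, which is the first mistake discussed earlier.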
There is actually a very strict set of rules governing the relationships between elements - which ones may contain which others. There is a validation service on the Web for checking the conformance of an HTML document to these rules. However, its output is a bit esoteric.
Formal HTML has a document type declaration at the top:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> or
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> or something
along those lines. However, this implies that the document below it
is strictly valid HTML, according to the relationship rules mentioned
above. Only include a line like this in documents that pass a
validation test.
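As a sketch of where the declaration goes, here is a cut-down version of the earlier sample page with the first declaration above added as its very first line, before the <html> start-tag:

```html
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<title>My First HTML Document!</title>
</head>
<body>
<h1>Welcome to my Web page!</h1>
<p>This is a paragraph in a Web page.
</body>
</html>
```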
Sticking with strictly valid HTML 2.0, there is no way to get fancy features like centering, backgrounds, etc. I strongly recommend avoiding backgrounds; in some cases, they do make a page prettier for some users, but they add nothing to the content, increase download time, and can render a page completely unreadable for others.
Centering is usually unnecessary. If you must center something, do
not use the <center> tag. It will screw stuff up on some Web clients,
and look silly on others. Besides, there's a better way. It predates
the existence of Netscape, is better from a semantic point of view,
and doesn't break things on Web clients that don't know about it. And
here's the kicker: Netscape knew about it, and implemented it, but
decided (for some unknown reason) to add <center> - and document it,
but not the other, cleaner way.
There is an attribute of align="center" available on nearly all
block-level elements, which will center text the same as if <center>
were used. E.g., <p align="center">.
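A sketch of a centered heading and paragraph, reusing the heading from the sample page. The attribute is not part of strict HTML 2.0, so a page using it shouldn't carry the HTML 2.0 document type declaration.

```html
<h1 align="center">Welcome to my Web page!</h1>
<p align="center">This paragraph is centered in Web clients that
understand the attribute, and is an ordinary paragraph in the
rest.</p>
```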
There will soon be another guide on more advanced topics, including:
Copyright © 1995, 1997 Christopher R. Maden. All rights are reserved, but permission to quote or reproduce will almost certainly be granted if you ask first.
Last updated and validated 12 April 2001 by crism.