Converting HTML to XHTML

March 6, 2006 - 10:10am — Brandon

This tutorial will guide you through the steps needed to convert your existing website into one that meets the XHTML 1.0 compliance rules.

For the purpose of this tutorial, we'll assume that you're converting your pages to the XHTML 1.0 Transitional DTD.

Once you are comfortable with the XHTML 1.0 Transitional doctype, you're free to move on to the stricter variants: XHTML 1.0 Strict and XHTML 1.1. (XHTML 2.0 never progressed beyond a draft.)

This tutorial will look at these areas:

  1. The basic XHTML document
  2. Proper Nesting & Empty Tags
  3. Tag syntax
  4. Attributes & alt=""
  5. ID/Name
  6. Deprecated Elements

1. The Basic XHTML Document

While HTML was relatively lax about what it accepted as a well-formed document, XHTML is strict about the structure every web page must have. Every XHTML document is made up of three distinct parts:

  1. The Doctype
  2. The Head
  3. The Body

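Let's take a look at a minimal well-formed document. Note that the root <html> element declares the XHTML namespace with xmlns, which XHTML 1.0 requires, and that the character set goes inside the content attribute of the <meta> element.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
    <title>Title Goes Here</title>
  </head>
  <body>Content Here</body>
</html>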

The above is the minimal code for a valid XHTML 1.0 Transitional page. The <meta> element can be omitted, because when no encoding is declared the validator falls back to UTF-8, the default encoding for XML. To control the encoding yourself, include the <meta>.
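One way to check that a page has this basic shape is to parse it as XML. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the skeleton_parts helper is hypothetical, and it assumes the page declares the XHTML namespace (xmlns="http://www.w3.org/1999/xhtml") on its <html> element, as XHTML 1.0 requires. It only tests well-formedness and the head/body layout; checking against the DTD is still the validator's job.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

MINIMAL_PAGE = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
    <title>Title Goes Here</title>
  </head>
  <body>Content Here</body>
</html>"""

def skeleton_parts(document):
    """Parse the page as XML and return the local names of the root's children.

    Raises ET.ParseError if the page is not well-formed, and ValueError if the
    root is not an <html> element in the XHTML namespace.
    """
    root = ET.fromstring(document)
    if root.tag != XHTML_NS + "html":
        raise ValueError(f"unexpected root element: {root.tag}")
    # Element children only; whitespace between tags is not a child element.
    return [child.tag.replace(XHTML_NS, "") for child in root]

print(skeleton_parts(MINIMAL_PAGE))  # ['head', 'body']
```

A page whose <html> lacks the xmlns attribute parses, but its root tag is plain "html", so the helper raises ValueError.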

2. Proper Nesting & Empty Tags

In HTML, it was often acceptable to omit closing tags, for example:

<p>this was a paragraph.
<p>this is another paragraph.

Omitting the closing tag usually had no visible effect on how the HTML was rendered, because browsers inferred where each paragraph ended. You would only notice in certain cases, for example when you forgot to close an anchor tag <a> and the text after it became part of the link. In XHTML, ALL elements MUST be closed. To be XHTML compliant, the above would instead have to be:

<p>this was a paragraph.</p>
<p>this is another paragraph.</p>
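Because XHTML is XML, an XML parser rejects the HTML-style paragraphs outright. The minimal sketch below assumes Python's standard ElementTree parser and a hypothetical helper called first_error; it wraps a fragment in a <div> (XML allows only one root element) and reports where parsing fails.

```python
import xml.etree.ElementTree as ET

def first_error(fragment):
    """Return None if the fragment is well-formed XML, else the (line, column)
    where the parser gave up. The fragment is wrapped in a single <div> because
    an XML document may have only one root element."""
    try:
        ET.fromstring("<div>" + fragment + "</div>")
    except ET.ParseError as err:
        return err.position
    return None

html_style = "<p>this was a paragraph.\n<p>this is another paragraph."
xhtml_style = "<p>this was a paragraph.</p>\n<p>this is another paragraph.</p>"

print(first_error(html_style))   # a (line, column) tuple: </div> does not match the open <p>
print(first_error(xhtml_style))  # None
```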

Proper Nesting

In HTML you could also get away with improper nesting, for example <b><u>some bolded underlined text</b></u>. Here all tags are closed, but the nesting (the order in which they are closed) is incorrect. The correct nesting is <b><u>some bolded underlined text</u></b>. If it helps, think of the underline tag as a child of the bold tag: the child must be closed before you can close its "parent" bold tag.
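The parent/child rule is easy to check mechanically with a stack: each opening tag is pushed, and each closing tag must match the element on top. Below is a minimal sketch using Python's standard html.parser module, which reports tags even when the markup is badly nested; the NestingChecker class, the nesting_errors function and the EMPTY set (a partial list of empty elements) are hypothetical names for this illustration.

```python
from html.parser import HTMLParser

# Empty elements never hold content, so they never go on the stack (partial list).
EMPTY = {"br", "hr", "img", "input", "link", "meta"}

class NestingChecker(HTMLParser):
    """Report closing tags that don't match the most recently opened element."""

    def __init__(self):
        super().__init__()
        self.stack = []   # open elements, innermost last
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in EMPTY:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # <br /> and friends open and close in one step

    def handle_endtag(self, tag):
        if tag in EMPTY:
            return  # e.g. the </br> in <br></br>
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()  # the child closed before its parent: correct
        elif self.stack:
            self.errors.append(f"</{tag}> found, but </{self.stack[-1]}> was expected")
        else:
            self.errors.append(f"</{tag}> found, but no element is open")

def nesting_errors(markup):
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    return checker.errors + [f"<{tag}> was never closed" for tag in checker.stack]

print(nesting_errors("<b><u>some bolded underlined text</u></b>"))  # []
print(nesting_errors("<b><u>some bolded underlined text</b></u>"))
# ['</b> found, but </u> was expected', '<b> was never closed']
```

Mismatches are recorded without repairing the stack, so one mistake can produce a second message, as in the second call.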

Empty Elements

Some elements in HTML are empty: they have no content, so they never had a closing tag. Examples include <meta>, <link>, <img>, <br>, <hr> and <input>. Other elements, such as <p> and <option>, do have content but were often written without their optional closing tag. XHTML says that ALL elements must be closed. Elements with content get an explicit closing tag, as in <p>...</p>.

Empty elements could be written as <hr></hr>, and in XML <br /> means exactly the same as <br></br>, but use the shorthand notation instead, since the long form can confuse older browsers: <hr />, <br /> or <img src="image.jpg" alt="" />. Note that there is a space before the /, and the closing > follows directly after the /. Include that space to ensure compatibility with older browsers. Conversely, don't use the shorthand on elements that normally have content, such as <p />; write out the closing tag instead.
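A quick way to convince yourself that <br /> and <br></br> are the same element is to parse both with an XML parser and compare the results. This is a minimal sketch with Python's standard ElementTree module; note that ElementTree itself writes empty elements in the shorthand form, space included.

```python
import xml.etree.ElementTree as ET

shorthand = ET.fromstring("<p>line one<br />line two</p>")
longhand = ET.fromstring("<p>line one<br></br>line two</p>")

# Both parse to the same tree: a <br> element with no text and no children.
print(ET.tostring(shorthand) == ET.tostring(longhand))  # True
# When serializing, ElementTree uses the shorthand form for empty elements.
print(ET.tostring(longhand).decode())  # <p>line one<br />line two</p>

# The HTML habit of leaving <br> unclosed is not well-formed XML.
try:
    ET.fromstring("<p>line one<br>line two</p>")
except ET.ParseError as err:
    print("rejected:", err)
```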

3. Tag Syntax

In XHTML, ALL element names must be in LOWER CASE, whereas in HTML the case of tags didn't matter. XML is case-sensitive and XHTML defines its element names in lower case, so <TD>, <P>, <HR> etc. are NOT valid.
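The reason is that XML names are case-sensitive. A minimal sketch with Python's standard ElementTree parser shows both consequences: a mismatched <P>...</p> pair is rejected outright, and an upper-case element that does parse is simply not the lower-case element XHTML defines.

```python
import xml.etree.ElementTree as ET

# To an XML parser, <P> and </p> are different names, so this is a mismatched tag.
try:
    ET.fromstring("<P>Hello</p>")
    mismatch_rejected = False
except ET.ParseError:
    mismatch_rejected = True
print(mismatch_rejected)  # True

# Consistent upper case is well-formed, but the element is "TD", not "td",
# so it does not match anything in the XHTML DTD.
cell = ET.fromstring("<TD>cell</TD>")
print(cell.tag, cell.tag == "td")  # TD False
```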

4. Attributes & alt=""

In XHTML, there are several important changes where tag attributes are concerned. First, all attribute names must be in lower case. For example, <img SRC="image.jpg" ALT="some text" /> is NOT valid; <img src="image.jpg" alt="some text" /> is the proper way of coding an image.

Second, all attribute values must be enclosed in quotes. For example, <img src=image.jpg alt=some text> is invalid, whereas <img src="image.jpg" alt="some text" /> is valid.
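When converting many pages by hand gets tedious, these two rules can be applied by a script. The following minimal sketch leans on Python's standard html.parser module, which already reports tag and attribute names in lower case and strips whatever quotes (if any) surrounded a value; quote_attributes and StartTags are hypothetical names, and EMPTY is a partial list of empty elements that get the " />" ending.

```python
from html import escape
from html.parser import HTMLParser

EMPTY = {"br", "hr", "img", "input", "link", "meta"}  # partial list of empty elements

class StartTags(HTMLParser):
    """Collect (tag, attrs) for each start tag; names arrive already lower-cased."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        self.found.append((tag, attrs))

def quote_attributes(html_tag):
    """Rewrite a single HTML start tag with lower-case names and double-quoted values."""
    parser = StartTags()
    parser.feed(html_tag)
    parser.close()
    tag, attrs = parser.found[0]
    parts = [tag]
    for name, value in attrs:
        if value is None:  # a bare attribute such as "checked" has no value to quote
            raise ValueError(f"minimized attribute {name!r} needs an explicit value")
        parts.append(f'{name}="{escape(value, quote=True)}"')
    return "<" + " ".join(parts) + (" />" if tag in EMPTY else ">")

print(quote_attributes("<IMG SRC=image.jpg ALT='some text'>"))
# <img src="image.jpg" alt="some text" />
```

An unquoted value containing a space, such as alt=some text, can't be repaired this way: the parser reads it as alt="some" followed by a separate attribute named text, so the helper raises ValueError and that tag still needs a human eye.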

Third, attribute minimization is forbidden in XHTML. In HTML it was acceptable to write <input checked>, whereas in XHTML this must be written as <input checked="checked" />. Note the use of the shorthand notation to close the empty element.

Incorrect:

<input checked>
<input readonly>
<input disabled>
<option selected>
<frame noresize>

Correct:

<input checked="checked" />
<input readonly="readonly" />
<input disabled="disabled" />
<option selected="selected">Option text</option>
<frame noresize="noresize" />

This is a list of the minimized attributes in HTML, and their proper XHTML form.

HTML        XHTML
compact     compact="compact"
checked     checked="checked"
declare     declare="declare"
readonly    readonly="readonly"
disabled    disabled="disabled"
selected    selected="selected"
defer       defer="defer"
ismap       ismap="ismap"
nohref      nohref="nohref"
noshade     noshade="noshade"
nowrap      nowrap="nowrap"
multiple    multiple="multiple"
noresize    noresize="noresize"
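Since every minimized attribute simply takes its own name as its value, the table above can be applied mechanically. A minimal sketch, again assuming Python's standard html.parser module (which reports a bare attribute with the value None); expand_minimized and AttributeCollector are hypothetical names.

```python
from html import escape
from html.parser import HTMLParser

# The minimized attributes from the table above.
MINIMIZED = {
    "compact", "checked", "declare", "readonly", "disabled", "selected", "defer",
    "ismap", "nohref", "noshade", "nowrap", "multiple", "noresize",
}

class AttributeCollector(HTMLParser):
    """Collect the attributes of every start tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.attrs = []

    def handle_starttag(self, tag, attrs):
        self.attrs.extend(attrs)

def expand_minimized(html_tag):
    """Return the tag's attributes as name="value" strings, expanding minimized ones."""
    collector = AttributeCollector()
    collector.feed(html_tag)
    collector.close()
    result = []
    for name, value in collector.attrs:
        if value is None:
            if name not in MINIMIZED:
                raise ValueError(f"{name!r} has no value and is not a minimized attribute")
            value = name  # XHTML form: checked becomes checked="checked"
        result.append(f'{name}="{escape(value, quote=True)}"')
    return result

print(expand_minimized('<input type="checkbox" checked>'))
# ['type="checkbox"', 'checked="checked"']
print(expand_minimized("<OPTION SELECTED>"))  # ['selected="selected"']
```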

Required alt Attribute on Images

The alt attribute on images has long been misused by web designers. Some browsers (notably Internet Explorer) show the alt text as a tooltip on hover, but standards-compliant browsers don't; if you want a tooltip, use the title attribute. The alt text is meant as a text alternative for the image: screen readers read it, and browsers display it when the image can't be shown. In XHTML (as in HTML 4.01), the alt attribute is required on every image. Ideally each alt attribute contains a good description of the image; if there is nothing useful to say, as with a purely decorative image, still include the attribute but leave it empty, like this: <img src="image.jpg" alt="" />
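Once a page is well-formed, finding images without alt is a short loop. This is a minimal sketch using Python's standard ElementTree module; images_missing_alt is a hypothetical helper, the file names are made up, and the page is assumed to declare the XHTML namespace. An empty alt="" counts as present, because the requirement is that the attribute exists.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

def images_missing_alt(document):
    """Return the src of every <img> that has no alt attribute at all."""
    root = ET.fromstring(document)
    return [img.get("src", "?") for img in root.iter(XHTML + "img")
            if "alt" not in img.attrib]

page = """<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head><body>
<p><img src="logo.jpg" alt="Company logo" /><img src="spacer.gif" alt="" /><img src="photo.jpg" /></p>
</body></html>"""

print(images_missing_alt(page))  # ['photo.jpg']
```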

5. ID/Name

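In HTML, several elements (a, applet, form, frame, iframe, img and map) had a name attribute used to identify them. In XHTML 1.0 the name attribute on these elements is deprecated in favor of id. For example, <img src="image.jpg" alt="" name="picture1" /> should instead be written as <img src="image.jpg" alt="" id="picture1" />. To be compatible with older browsers, you can use both attributes with the same value, like so: <img src="image.jpg" alt="" name="picture1" id="picture1" />. Form controls such as <input>, <select> and <textarea> still need name, because it is what the browser sends when the form is submitted.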
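If you keep name for older browsers, the matching id can be added automatically. A minimal sketch using Python's standard ElementTree module; add_matching_ids is a hypothetical helper, the markup is assumed to be well-formed and in the XHTML namespace, and NAME_TO_ID lists the elements whose name attribute XHTML 1.0 deprecates. Form controls such as <input> are deliberately left alone, because their name is what gets submitted with the form.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
# Elements whose name attribute XHTML 1.0 deprecates in favour of id.
NAME_TO_ID = {"a", "applet", "form", "frame", "iframe", "img", "map"}

def add_matching_ids(root):
    """Give each affected element an id equal to its name; return the ids added.

    Elements that already have an id are left untouched.
    """
    added = []
    for elem in root.iter():
        local = elem.tag.replace(XHTML, "")
        name = elem.get("name")
        if local in NAME_TO_ID and name is not None and "id" not in elem.attrib:
            elem.set("id", name)  # keep name and id identical
            added.append(name)
    return added

body = ET.fromstring(
    '<body xmlns="http://www.w3.org/1999/xhtml">'
    '<img src="image.jpg" alt="" name="picture1" />'
    '<input type="text" name="q" />'
    "</body>"
)
added = add_matching_ids(body)
print(added)  # ['picture1']
```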

6. Deprecated Elements

Some tags and tag attributes were deprecated in HTML 4. They are not allowed in XHTML 1.0 Strict or XHTML 1.1. XHTML 1.0 Transitional still accepts them, but for future compatibility you should get into the habit of not using them. For the most part, these elements are deprecated because CSS style sheets now provide their functionality. Separation of content and presentation (through CSS) is a major goal of XHTML, so many so-called "presentation elements" are deprecated.

Deprecated Tags

The following are deprecated tags. In brackets you'll find how their functionality can be replaced in an XHTML compliant way.

  • <applet> (<object>)
  • <basefont> (CSS)
  • <center> (CSS)
  • <dir> & <menu> (<ul>)
  • <font> (CSS)
  • <isindex> (<form>)
  • <s> & <strike> (CSS)
  • <u> (CSS)
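Finding leftover deprecated elements in an old page is a good first pass before converting it. A minimal sketch using Python's standard html.parser module (which tolerates markup that isn't well-formed); DEPRECATED_TAGS holds the elements HTML 4.01 deprecates with a suggested replacement for each, and deprecated_tags is a hypothetical helper.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements deprecated in HTML 4.01, with a suggested replacement for each.
DEPRECATED_TAGS = {
    "applet": "<object>", "basefont": "CSS", "center": "CSS", "dir": "<ul>",
    "font": "CSS", "isindex": "<form>", "menu": "<ul>", "s": "CSS",
    "strike": "CSS", "u": "CSS",
}

class DeprecatedTagFinder(HTMLParser):
    """Count opening tags of deprecated elements (names arrive lower-cased)."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag in DEPRECATED_TAGS:
            self.counts[tag] += 1

def deprecated_tags(markup):
    finder = DeprecatedTagFinder()
    finder.feed(markup)
    finder.close()
    return dict(finder.counts)

report = deprecated_tags('<CENTER><font color="red">Sale!</font></CENTER><u>now</u>')
for tag, count in sorted(report.items()):
    print(f"<{tag}> x{count}: replace with {DEPRECATED_TAGS[tag]}")
```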

Deprecated Attributes

The following deprecated attributes can all be replaced by using CSS. In brackets are the tags to which the attributes commonly apply.

  • align (caption, img, table, hr, div, h1-h6, p)
  • alink (body)
  • background (body, td)
  • bgcolor (body, table, tr, td, th)
  • border (img, object)
  • clear (br)
  • color (font)
  • compact (ol, ul)
  • hspace (img, object)
  • link (body)
  • noshade (hr)
  • nowrap (td, th)
  • size (basefont, font, hr)
  • start (ol)
  • text (body)
  • type (li)
  • value (li)
  • vlink (body)
  • vspace (img, object)
  • width (hr, pre, td, th)
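Because several of these attributes are fine on some elements (border on <table>, for instance, is not in the list), a useful check has to look at the element and the attribute together. A minimal sketch with Python's standard html.parser module; DEPRECATED_ATTRS encodes the pairs listed above, and DeprecatedAttrFinder and deprecated_attributes are hypothetical names.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
# attribute -> elements on which it is deprecated, as listed above
DEPRECATED_ATTRS = {
    "align": {"caption", "img", "table", "hr", "div", "p"} | HEADINGS,
    "alink": {"body"}, "background": {"body", "td"},
    "bgcolor": {"body", "table", "tr", "td", "th"},
    "border": {"img", "object"}, "clear": {"br"}, "color": {"font"},
    "compact": {"ol", "ul"}, "hspace": {"img", "object"}, "link": {"body"},
    "noshade": {"hr"}, "nowrap": {"td", "th"}, "size": {"basefont", "font", "hr"},
    "start": {"ol"}, "text": {"body"}, "type": {"li"}, "value": {"li"},
    "vlink": {"body"}, "vspace": {"img", "object"}, "width": {"hr", "pre", "td", "th"},
}

class DeprecatedAttrFinder(HTMLParser):
    """Record element@attribute for each deprecated pair found in start tags."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        for name, _value in attrs:
            if tag in DEPRECATED_ATTRS.get(name, ()):
                self.hits.append(f"{tag}@{name}")

def deprecated_attributes(markup):
    finder = DeprecatedAttrFinder()
    finder.feed(markup)
    finder.close()
    return finder.hits

page = '<body bgcolor="#ffffff"><table border="1"><tr><td nowrap>cell</td></tr></table></body>'
print(deprecated_attributes(page))  # ['body@bgcolor', 'td@nowrap']
```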

Using the W3C Validator

The W3C's validator, located at http://validator.w3.org/, can and should be used to check for errors while you convert your pages to XHTML. Before using it, make sure the document you're validating has a DOCTYPE: the doctype tells the validator which standard (one of the XHTML 1.0 flavours, XHTML 1.1, etc.) to check your code against.

While converting your web pages, a quick tip is to add a temporary link to http://validator.w3.org/check?uri=referer somewhere on the page. Clicking the link sends the current page's address to the validator, which then checks that page. This only works if the page is publicly reachable and your browser sends the Referer header. Once the page passes validation, you can remove the link. The W3C validator is an invaluable resource and one you should become familiar with.
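If your browser doesn't send a Referer header, or you want to check a list of pages, you can build the same kind of validator address for a specific page instead. A minimal sketch with Python's standard urllib.parse module, assuming the validator accepts the page address in the uri parameter of its check URL, as the referer tip above does; validator_link is a hypothetical helper and the example address is made up.

```python
from urllib.parse import urlencode

VALIDATOR_CHECK = "http://validator.w3.org/check"

def validator_link(page_url):
    """Return a validator URL that checks page_url (percent-encoded into ?uri=)."""
    return VALIDATOR_CHECK + "?" + urlencode({"uri": page_url})

print(validator_link("http://www.example.com/about.html"))
# http://validator.w3.org/check?uri=http%3A%2F%2Fwww.example.com%2Fabout.html
```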

Conclusion

Converting your website to XHTML may seem like a daunting task, but in the long run you'll end up with cleaner code that makes your website more accessible and reduces the differences in how it renders across browsers. Learning the differences will also help you write XHTML compliant pages from the start, so conversion won't be necessary. Be very careful about using programs like Microsoft Word to create web pages, since Word tends to add a lot of junk code that often isn't standards compliant. The more you use XHTML, the more you'll appreciate the beauty of its design.

Links

W3Schools on XHTML
W3C Validator

Tags: Design, html, xhtml