HTML and XHTML: The differences explained

First things first: 
To understand the differences between HTML (HyperText Markup Language) and XHTML (eXtensible HyperText Markup Language), I think it's best to first understand what XML (eXtensible Markup Language) is. 

SGML (Standard Generalized Markup Language) is the mother of all markup languages. It is the international standard (ISO 8879) used to define the structure of documents. HTML is the web's application of SGML. However, SGML is very much more complicated than HTML, and HTML lacks a lot of what SGML offers.

One major thing that HTML lacks is the ability to tell the browser what sort of information is contained in the document. There are the limited capabilities of <meta> tags, but nothing more. 

XML is another branch from the SGML tree. However, XML allows web designers to define both the structure of the page, and the type of information being presented. 

XML produces a document that is very easy to search and manipulate, which is why XML is being called the greatest thing to happen to the internet since HTML.

XSL (eXtensible Stylesheet Language) converts an XML document into another type of document, such as an HTML document for display on the web. It can also define a page's output (what it looks like, etc.). However, it is in no way a replacement for CSS.
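To make that a little more concrete, here is a minimal sketch of an XSL stylesheet that turns an XML document into an HTML list. The element names (people, person, name and job) are my own assumptions for illustration, not part of any standard, so treat this as one possible approach rather than the definitive one:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Ask the XSL processor to write its result as HTML -->
  <xsl:output method="html" encoding="UTF-8"/>

  <!-- Match the hypothetical root element <people> and build the page around it -->
  <xsl:template match="/people">
    <html>
      <head><title>People</title></head>
      <body>
        <ul>
          <xsl:apply-templates select="person"/>
        </ul>
      </body>
    </html>
  </xsl:template>

  <!-- Each hypothetical <person> becomes one list item: "name (job)" -->
  <xsl:template match="person">
    <li>
      <xsl:value-of select="name"/>
      <xsl:text> (</xsl:text>
      <xsl:value-of select="job"/>
      <xsl:text>)</xsl:text>
    </li>
  </xsl:template>

</xsl:stylesheet>
```

The stylesheet only describes the conversion. The XML document itself stays free of any HTML.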

The basics of XML 
XML, like HTML, is made up of tags. In HTML, the tags mostly tell the browser how to render the document. XML, however, allows you to create your own tags, which describe the data itself. You create a DTD (Document Type Definition) to define these tags. The DTD describes which tags are allowed, what each particular tag may contain, and so on.

You might have a list of people, along with their corresponding jobs. To us humans, the difference between a person's name and job is very apparent. We know that his name is John Smith, and that his job is web designer. That is common sense to us. We realise that web designer isn't his name. However, the computer doesn't. All it sees is a long list of alphanumeric characters. We can use XML to give the items in the list a meaning.
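As a rough sketch, the list could be marked up like this. The tag names are ones I have made up for the example (that is the whole point of XML), and Jane Doe is an invented second entry:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical vocabulary: people, person, name and job are invented tag names -->
<people>
  <person>
    <name>John Smith</name>
    <job>Web designer</job>
  </person>
  <person>
    <name>Jane Doe</name>
    <job>Programmer</job>
  </person>
</people>
```

A program reading this no longer has to guess which string is the name and which is the job. The tags say so.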

Now, using the DTD and special tags, the browser can tell the difference between each item, and knows what sort of data it is. How the data is actually displayed is decided separately, by a stylesheet (CSS or XSL).
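Here is a minimal sketch of what such a DTD could look like for the list of people, written inside the document itself so the example is self-contained. The element names are again my own assumptions:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE people [
  <!-- A people list holds zero or more person entries -->
  <!ELEMENT people (person*)>
  <!-- Every person has exactly one name, followed by exactly one job -->
  <!ELEMENT person (name, job)>
  <!-- name and job contain plain text only -->
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT job (#PCDATA)>
]>
<people>
  <person>
    <name>John Smith</name>
    <job>Web designer</job>
  </person>
</people>
```

A validating parser would reject a person with no job, or with the job placed before the name, because neither matches the rules in the DTD.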

The real stuff 
All of the above was only an introduction to what this tutorial is about. Now we get to the proper part of this tutorial. 

The aim with XML and XSL is to free the structure of the page from its style. However, HTML is already hugely popular, and XML is a lot more complicated. There needs to be some intermediate technology to ease the transition.

Enter XHTML: 

XHTML is simply a hybrid of the latest version of HTML (HTML 4.01) and XML. When HTML was first thought of, it was meant to be a structured language. The early versions were structured. However, when Microsoft and Netscape were developing their browsers, they felt that the existing HTML tags and their associated attributes weren't enough, and so created their own. The W3C has slowly looked at these new tags and made some of them valid, so that they appear in the next official release of HTML. However, this leads to an unstructured language, which leads to hate, which leads to suffering, which leads to the dark side. Well OK, maybe not, but you know what I mean ;-)

XHTML is simply HTML, restructured. Many tags, attributes, etc. have been removed to make XHTML into a more structured language. XHTML follows the syntax rules of XML, but uses the same tags as the HTML DTD. This is why it is a hybrid language.

The end result of this merger is that a web designer can use the obvious strengths of XML, but the code will still work, even if the browser being used doesn't support XML. 

So what does the X in XHTML mean? 
The X stands for eXtensible. This means that it's a lot easier to add new capabilities to XHTML than it is to HTML. With HTML, the internal code of the browser describes how each tag is handled. With XHTML, an external DTD defines which tags and attributes exist and how they fit together. This makes XHTML more modular, as a change in the XHTML DTD will affect all XHTML documents. This means that the capabilities of XHTML can be altered and improved for future browsers and other internet-enabled devices. These changes will not come at the expense of backwards compatibility, however.

You need a DOCTYPE declaration at the top of each XHTML page for it to work properly (above the <html> tag). It tells the browser which DTD the page uses. This is the one that I use:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" 
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

You also need to add two attributes to the <html> tag, for the XML compatibility: 

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">

A standard language is needed 
There are more and more Palm Pilots, TV sets, mobile phones, etc. that can connect to the internet. If you think it's hard coding for the handful of browsers we have now, imagine how hard it would be to code for all the different devices that could soon be out. These devices will also have much less bandwidth than the average PC, so small file sizes are at a premium. Again, enter XHTML. It's a standards-compliant language, and can produce good, well-structured websites that still keep a small file size.

Converting from HTML to XHTML 
There really isn't a lot to the conversion. The first step is using the right DTD as shown above. You still have a <head> and a <body> tag. The head tag also needs to contain a <title> tag, which most HTML documents do anyway. 
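Putting the DOCTYPE, the two <html> attributes and the required tags together, a bare-bones XHTML page could look something like this. The title and paragraph text are just placeholders I have made up:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <!-- The title tag is required inside the head -->
    <title>My first XHTML page</title>
  </head>
  <body>
    <!-- Text sits inside a block tag such as a paragraph -->
    <p>Hello, world.</p>
  </body>
</html>
```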

A major rule is that you cannot overlap tags. So, this is bad nesting:

<p>this is <b>bad nesting</p></b>

While this is good nesting: 

<p><b>good</b> nesting</p>

Can you see the difference? You put all text in a paragraph tag. You can use the <b> tag, but you must close the <b> tag before the <p> tag, as the <b> tag was opened last. In other words, tags are closed in the reverse of the order in which they were opened. This might clarify it a bit more:

<p>
<b>
Good
</b>
nesting (and formatting)
</p>

All tags and attributes have to be lower-case. HTML isn't case-sensitive, but XML is. 
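As a small illustration (the class name and text are invented for the example), the old upper-case habit is shown in the comment, with the XHTML version underneath:

```xml
<!-- HTML habit: <P CLASS="intro">Welcome</P> -->
<p class="intro">Welcome</p>
```

To an XML parser, <P> and <p> are two different tags, and only the lower-case one is defined in the XHTML DTD.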

All tags must have an ending tag. This means that you cannot simply use a <p> to make a gap between your text. You must enclose the paragraph you're wanting to create with an opening and closing <p> tag. 
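Here is a rough sketch of the difference, using placeholder text of my own. The old style is in the comment, and the XHTML version follows it (the surrounding <div> is only there so the example is a single well-formed fragment):

```xml
<!-- HTML habit, with <p> used as a separator:
       First paragraph.<p>Second paragraph.
     In XHTML every paragraph is opened and closed: -->
<div>
  <p>First paragraph.</p>
  <p>Second paragraph.</p>
</div>
```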

For any tags that have no ending tag (like <br> or <hr>), you need to put a space and then a forward slash just before the closing chevron:

<br>

Becomes: 

<br />

Another rule is that you can't nest links. At all. So, the following wouldn't work: 

<a href="one.htm">
click 
<a href="two.htm">
me
</a>
</a>

However, I have no idea why you would want to do that...

All attribute values must be in quotes (I'd make them double quotes): 

<img src=myImg.gif />

Becomes:

<img src="myImg.gif" />

Many people hide the contents of a <script> tag inside HTML comments. You cannot do this in XHTML either: an XML parser is allowed to throw comments away, so the script could vanish along with them.
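One common workaround, sketched here under the assumption that the page also has to work in older HTML browsers, is to wrap the script in a CDATA section and hide the CDATA markers behind JavaScript comments. The script itself is a made-up example:

```xml
<script type="text/javascript">
//<![CDATA[
// Hypothetical script: the "<" and "&&" below would break an XML parser
// if they were not inside the CDATA section.
var count = 3;
if (count < 5 && count > 0) {
  document.title = "Count is " + count;
}
//]]>
</script>
```

Moving the script into an external file and linking to it with the src attribute avoids the problem altogether, which is often the simpler choice.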

The End.

Well, that's about it as far as XHTML goes. 

I hope you understood the tutorial. I've tried to write it as simply as I could, but if you don't understand something then it's likely other people don't either, so please visit my website and contact me, and I will try to clarify it.

Credit: Chris Poole / Homepage.

This information has been in the DesignerWiz.com Public Library.