Anne van Kesteren

Adding markup {1}

I want to do something for all those people who want to do markup and style the correct way, so I'm starting a new chapter on my weblog: learning. Learning will cover all the basics of building a better, backward- and forward-compatible site. Everything explained is based on XHTML 1.0, which differs only slightly from HTML 4.01. The lessons are for everyone who wants better search results and a more maintainable site. Your site's look will remain the same. All we do is add some extra attributes and elements and replace some with CSS, while everything keeps working. You can always improve markup, and you don't have to start with a tableless site (in fact, starting with a table-based design will give you more insight into how powerful good markup and style are).

(You don't know anything about (X)HTML? Start here: XHTML tutorial from the scratch or here: Getting started with HTML.)

Let's start with the first tag in your document, the <html> tag. It probably looks just like that, <html>, with no attributes at all. What do you think of changing it to this:

<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en" id="website-extension">

Not everything is being made smaller, you know ;). The first attribute is probably the hardest one of them all, but it is essential for letting a browsing device know that you use XHTML. If you just use HTML you can leave it out. Then we have the lang and xml:lang attributes. The first is for backward compatibility with older browsers (and is the only one available in HTML); the second, xml:lang, is there for forward compatibility. The last attribute is the id attribute, which you probably already know. It is there so that a user can write a style sheet specifically for your site; we call that accessibility ;). The language attributes are also very important for accessibility and for Google: a screen reader will know the primary language of your document, and Google can index your site under a specific language, which will improve the search results.

The <html> tag is always followed by the <head> tag. Within the head element you can specify many other elements. For now, the most important one is the title element. Every document should have exactly one. This element is also important to Google and should contain the website's or company's name and a short description of the content of that page. It is recommended that you don't put the following element within the head element: <meta http-equiv="content-type" content="text/html;charset=iso-8859-1" />. Instead, the character encoding should be sent in the HTTP Content-Type header, either from a server-side language like PHP or by the web server itself. In PHP you could put this at the top of your documents:

<?php
 // Must run before any output is sent, otherwise the header cannot be set.
 $charset   = "iso-8859-15";
 $mime_type = "text/html";

 header("Content-Type: $mime_type; charset=$charset");
?>

Change the value of $charset to the encoding your documents actually use. If you want to send XHTML as real XML, you can read "send application/xhtml+xml" on how to do that in PHP. You should know that sending XHTML 1.0 as text/html is allowed (as long as you follow the HTML compatibility guidelines in Appendix C of the XHTML 1.0 specification), but it is recommended that you send it as application/xhtml+xml to browsers that support it.
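Serving application/xhtml+xml only to browsers that support it is usually done by looking at the Accept header the browser sends. What follows is a minimal PHP sketch of that idea, not a complete implementation: the helper name choose_mime_type is hypothetical, and it deliberately ignores q-values, so a browser that lists application/xhtml+xml with q=0 would still get it. When the page's markup follows in the same file, close the PHP block with ?> before the markup, as in the example with PHP.

```php
<?php
// Hypothetical helper: choose the MIME type for an XHTML 1.0 page from the
// browser's Accept header. Simplification: it only checks whether
// application/xhtml+xml is mentioned and ignores q-values.
function choose_mime_type($accept)
{
    if (stripos($accept, "application/xhtml+xml") !== false) {
        return "application/xhtml+xml";
    }
    // Everything else (including a missing Accept header) gets text/html.
    return "text/html";
}

$accept    = isset($_SERVER["HTTP_ACCEPT"]) ? $_SERVER["HTTP_ACCEPT"] : "";
$charset   = "iso-8859-15";
$mime_type = choose_mime_type($accept);

// Like any header() call, these must run before any output is sent.
// Vary: Accept tells caches that the response depends on the Accept header.
header("Vary: Accept");
header("Content-Type: $mime_type; charset=$charset");
```

The Vary header matters because the same URL now returns different Content-Type values; without it, a shared cache could hand the XML version to a browser that cannot handle it.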

If you can't use a server-side language and you don't have access to your web server's configuration, you must specify the character encoding in the document itself, either with a meta element or with an XML declaration (the <?xml ...?> line, often loosely called a PI, or processing instruction). The meta element containing the character encoding should be the first element within the head element. The XML declaration must be the very first thing in the document, before the html element.

Example with PHP:

<?php
 // Must run before any output is sent, otherwise the header cannot be set.
 $charset   = "iso-8859-15";
 $mime_type = "text/html";

 header("Content-Type: $mime_type; charset=$charset");
?>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en" id="website-extension">
<head>
 <title>Example page with PHP</title>
</head>
<body>
 <h1>Example page with <abbr>PHP</abbr></h1>
</body>
</html>

Example with the meta element:

<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en" id="website-extension">
<head>
 <meta http-equiv="content-type" content="text/html;charset=iso-8859-1" />
 <title>Example page with the meta element</title>
</head>
<body>
 <h1>Example page with the meta element</h1>
</body>
</html>

Example with the XML declaration (the PI):

<?xml version="1.0" encoding="iso-8859-1"?>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en" id="website-extension">
<head>
 <title>Example page with the XML declaration</title>
</head>
<body>
 <h1>Example page with the <abbr>XML</abbr> declaration</h1>
</body>
</html>

Additional resources:

Comments

  1. Sorry, you can't have an id attribute in the html element. It sounds crazy, but that's the way it is. You'll just have to put it in the body tag instead (which is not entirely satisfactory, since we might need to style the html element as well in XHTML).

    If you don't believe me, try it with a validator.

    Posted by Bertilo Wennergren at

  2. There are ways to make it possible: ID on root element

    Posted by Bas Hamar de la Brethonière at

  3. BTW, it is allowed in XHTML 1.0 (Second Edition), and that's the specification these tutorials are about.

    which is not entirely satisfactory, since we might need to style the html element as well in XHTML

    We have to style the html element in XHTML, but that topic was covered a long time ago.

    Posted by Anne at

  4. For the sake of forward compatibility towards XHTML 1.1 it may be better to avoid the complex issue of id on the root element, as this is a tutorial. Styling the head element is advanced stuff anyway, for which I personally haven't found any use yet (although I can imagine it being used for debugging purposes). So id on the body element will do.

    A much more serious problem is the omission of the DOCTYPE declaration here. You will not have a valid XHTML document without one! It is not the namespace attribute which informs the browser that we are using XHTML, but the DOCTYPE declaration. I suppose the next tutorial will be about this issue…

    I would also drop the lang attribute. I really think xml:lang will do. If you are concerned about backwards compatibility I would prefer using a meta tag for this purpose.

    Posted by Ben de Groot at

  5. Ben,
    XHTML 1.1 is based on the modularization, which IMHO should be updated to include the id attribute. The point is also not to style the head element, but the html element; otherwise you can't specify the background-color properly. XHTML == XML, remember?

    You are 'also' wrong about the DOCTYPE. Including one can make a big difference to the current design. This tutorial is only about proper markup, not about making a completely valid site. That will come later; learning how to handle DOCTYPEs can be a big hurdle for newcomers.

    Since xml:lang isn't even supported in Mozilla, I include the lang attribute for compatibility. That's also one reason why I don't push anybody toward application/xhtml+xml.

    Proper markup is most important from my point of view. The next part will handle the basics of style sheets, so I can eliminate some rubbish attributes (maybe even elements) in the third.

    Posted by Anne at

  6. About the namespace: that is exactly the attribute which tells the browser what kind of markup language we use. Try this:

    1. Make two documents.
    2. Omit the doctype.
    3. Make both well-formed XHTML and put a heading element in both.
    4. Add the xmlns attribute with the appropriate value into one of the documents.
    5. Save both the documents with the extension '.xhtml'
    6. Take a look in Mozilla ;).

    Posted by Anne at

  7. I tried what you said in #6. If I save a test document as .xhtml my server will send it as application/xhtml+xml, causing it to be processed as XML by Mozilla Firebird. Without DOCTYPE declaration it chokes on a character entity, with and without the xmlns attribute. This tells me the namespace is not enough to tell the browser we are using XHTML. We do really need the DOCTYPE declaration. It is important to tell this to beginners as well!

    Posted by Ben at

  8. But you are right about styling html. We do need this, so we should be able to use id on the html element.

    Posted by Ben at

  9. The meta element which contains the character encoding should be the first element within the head element.

    Any specific reason why it should be the first element?

    Posted by David H at

  10. The META declaration must only be used when the character encoding is organized such that ASCII-valued bytes stand for ASCII characters (at least until the META element is parsed). META declarations should appear as early as possible in the HEAD element.

    This can be found within the first resource I specified.

    Ben,
    Now try it with a DOCTYPE, without the xmlns attribute. It is not about entities, although they are part of the specification. It is about how the h1 element is treated, for example, and how the document is handled when no style is applied to it.

    Posted by Anne at

  11. Thanks, didn't know that.

    Posted by David H at