Anne van Kesteren

Struikelblok and quality weighting

12 November 2003

Two totally separate items in one weblog entry. I hope you don't have a problem with it. First: Struikelblok, a website by Dutch professionals for Dutch professionals about accessibility. If you didn't know it, read it!

Second: quality weighting. This has to do with HTTP and such. Simon Jessey has an entry about it: Vary. My question to you: "Why would someone put 'application/xhtml+xml' in his accept header if he doesn't want it?"

Lots of strange stuff in that HTTP specification if you ask me. If your browser can handle it, accept it! (Should you use the 'vary' header if you are sending different MIME types to different browsers?)
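
On the 'vary' question: HTTP/1.1 expects a response chosen on the basis of the Accept header to say so, so that caches don't hand one browser's variant to another. A minimal sketch, assuming a hypothetical helper `negotiated_headers` (not from Simon's article) and a deliberately naive substring check for the MIME type:

```python
def negotiated_headers(accept_header: str) -> dict:
    """Build response headers for a page served under two MIME types.

    Hypothetical helper for illustration only. The substring check
    ignores q values entirely.
    """
    wants_xhtml = "application/xhtml+xml" in accept_header
    content_type = "application/xhtml+xml" if wants_xhtml else "text/html"
    return {
        "Content-Type": content_type + "; charset=utf-8",
        # Tell caches that the response depends on the request's Accept header.
        "Vary": "Accept",
    }


for name, value in negotiated_headers("text/html").items():
    print(name + ": " + value)
```

The `Vary: Accept` line is sent on both variants, since either response was selected by looking at the same request header.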

Comments

Mark Pilgrim mentioned the q settings thing in a mezzoblue discussion a while back (too lazy to link, sorry).
I made the same point then--there seems to be nothing in use anywhere that weights text/html above application/xhtml+xml. As it stands, I can't see it as an issue.
Posted by Sean at 1:56AM
Oh yes there is such an agent. Here's the accept header for Opera 7.20/Windows.
```
Accept: text/html, application/xml;q=0.9, application/xhtml+xml;q=0.9, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1
```
So since I also check the Q ratings on my site, Opera is being served text/html by me even though I know full well it could do application/xhtml+xml just fine. But I refuse to start browser sniffing.
Posted by Bill Mason at 2:28AM
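
The kind of q check Bill describes could look roughly like this. It is a sketch under stated assumptions, not his actual code: the function names are made up, and a bare `*/*` is deliberately not counted as support for application/xhtml+xml, on the assumption that a wildcard says little about real XHTML support.

```python
def parse_accept(header):
    """Map each media range in an Accept header to its q value (default 1.0)."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    pass  # malformed q: keep the default
        prefs[media_range] = q
    return prefs


def quality(prefs, media_type):
    """q for a concrete type, falling back to type/* and then */*."""
    main_type = media_type.split("/")[0]
    for candidate in (media_type, main_type + "/*", "*/*"):
        if candidate in prefs:
            return prefs[candidate]
    return 0.0


def choose_type(header):
    """Pick a MIME type for an XHTML page, respecting q values."""
    prefs = parse_accept(header)
    # Assumption: XHTML must be listed explicitly; wildcards do not count.
    xhtml_q = prefs.get("application/xhtml+xml", 0.0)
    html_q = quality(prefs, "text/html")
    if xhtml_q > 0 and xhtml_q >= html_q:
        return "application/xhtml+xml"
    return "text/html"


# The Opera 7.20/Windows header quoted above.
OPERA_720 = ("text/html, application/xml;q=0.9, application/xhtml+xml;q=0.9, "
             "image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1")
print(choose_type(OPERA_720))  # text/html
```

Opera lists text/html with the default q of 1.0 and application/xhtml+xml with q=0.9, so a server that honours the weighting sends text/html, which is exactly the outcome described in the comment.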
Struikelblok: the font size is too small. The question form doesn't work (action=#fout).
Otherwise a good initiative, of course.
Posted by Arthur! at 7:02AM
I don't think it's about what the browser prefers, but what it accepts.
W3C say that XHTML should be served as application/xhtml+xml. Thus, if the browser proclaims that it accepts that MIME-type, I'm going to serve the document as such—regardless of any q-value. It doesn't matter if the browser prefers text/html, because that is not the recommended MIME-type for any version of XHTML.
Posted by TOOLman at 1:49PM
Arthur: the text size is scalable, so "too small" is a relative notion in this case. ;-)
Beyond that, I am of course curious when you get the error message from the question form. I'll also send you an email with my question about this.
Many thanks for your feedback and for your compliment that it is a good initiative.
Posted by rjv at 3:04PM
I have to disagree. You shouldn't get to pick and choose what parts of the RFC you want to respect. If you're going to check the accept header, you can't just ignore the Q rating. Otherwise, why bother caring what's in the header at all? Or why worry about specifications if you're only worrying about the ones that you want to and disregarding the others?
Posted by Bill Mason at 3:46PM
I can see the logic in both arguments, but on the face of it I do believe that respecting the Q rating is probably the way to go. So user agents that can accept application/xhtml+xml SHOULD be fed XHTML with that type, unless the Q-rating suggests otherwise.
With that in mind, I intend to update the article on Keystone Websites in the near future.
Posted by Simon Jessey at 7:48PM
~~I made a small test case: MIME switch test. The source is available by adding a 's' to the extension.~~ No longer available.
Posted by Anne at 11:08PM
I think that if the document you're serving is an XHTML document, you should serve it as application/xhtml+xml, as long as the browser accepts documents of that type. The q value (as far as I know) only applies when documents of several different types can be sent. Packaging an XHTML document as text/html doesn't make it an HTML document, any more than packaging it as image/png makes it a PNG image. HTML and XHTML documents have different semantics, different abilities (you can embed other namespaces in XHTML, for example) and a different (though similar) syntax. That's why they have different MIME types. The fact that they look similar means that mislabeling XHTML as HTML is a hack that can be beneficial; it doesn't make it the right thing to do even if the browser would prefer to be sent text/html rather than application/xhtml+xml.
If, on the other hand, you are capable of transforming the content to a different type (assuming lossless transformation), then by all means, give the browser what it prefers by inspecting the q value. If the transformation is lossy (e.g. one could render an XHTML document as a JPEG and serve that to browsers which prefer JPEGs to XHTML) then you need to make an intelligent decision about the right course of action.
Posted by jgraham at 4:50AM
The q value (as far as I know) only applies when documents of several different types can be sent.

That is perhaps the most important thing to consider, because in this case we are offering different types of documents. In fact, we are basically letting the browser choose what it wishes to receive - proper XHTML or proper HTML. I now believe it is inherently wrong to send XHTML as text/html, so a third "hybrid" option is out of the question.
I have had a very busy week, so I haven't been able to devote any of my time to blogging or article updates. I will do my best to update the article to include an examination of the q rating, incorporating some code from Bill Mason.
My hope is that by making this technique freely available, correctly served XHTML will become more prevalent. By taking advantage of XHTML's extensibility, developers can offer more to the user with the better web browser, and perhaps cultivate a little envy in the IE user. It may even help to push IE development, but I won't hold my breath.
Posted by Simon Jessey at 9:00AM
in this case we are offering different types of documents

Just to emphasise my point, you're also assuming lossless conversion. In particular, you seem to be assuming that a valid XHTML1 document can be converted into a valid HTML4 document by changing the doctype, altering the shorttag form and removing the XML declaration. This is often but not always true. Obvious examples of things that might prevent this from being true include:
- Different CSS behaviours (specifically the behaviour of the body and html elements in each case)
- Different DOMs (I'm not actually very clear on this point; but I'm pretty sure that browsers are expected to behave in different ways)
- Ability to include namespaced content. In XHTML, you can include inline MathML or SVG, for example. This isn't possible in HTML4.
So, assuming that none of the non-syntactic differences between XHTML and HTML are a problem then, by all means, change the syntax and send the content in the form the browser prefers. If you're able to work around the above problems in some way (link different stylesheets to the two document types, automagically place namespaced content in external files referenced via object, or whatever) then send the type that the browser prefers. If neither of these things is true, then whether the q value of application/xhtml+xml is higher than that of text/html is irrelevant because all you have is an XHTML document.
Posted by jgraham at 5:12PM
Responding to jgraham post #12:
I think you can cover almost all differences between XHTML and HTML in some few string replacements using Regular Expressions. On top of that you might implement a caching system. This is something I'll be working on when I get time to look more closely at the WordPress hacks at my website.
In HTML, tag names for Element Nodes are uppercase, in XHTML they are lowercase. You can and actually should use oNode.tagName.toLowerCase() if you want to compare tag names to a string value. As far as I know this doesn't matter for document.getElementsByTagName, but perhaps (this sounds logical to me) document.getElementsByTagName for an uppercase tag name does not work in the truly XHTML-compliant browsers.
Posted by Mark Wubben at 12:57AM
Mark:
OK, here are some of the things that you would need to do to convert an XHTML document to an HTML document:
- Make sure all tags are lower case
- Remove the XML declaration
- Convert any XML PIs that reference CSS stylesheets to link elements
- Deal with any other PIs in an appropriate way
- Change the doctype
- Change the shorttag form
- Expand all entity references
- Deal with any character encoding issues
- Hope any scripts don't break (e.g. because you just removed something they relied on)
This list probably isn't comprehensive but let's pretend that, for a pure XHTML document, that's all you have to do. Most of those requirements could even be done with regular expressions. However you still haven't dealt with the issue of namespaced content. Since the ability to use content from non-XHTML namespaces is one of the big advantages that XHTML has over HTML, this really is a problem. Undoubtedly, there are ways to deal with this. One could, for example, render all MathML documents to PNG files then replace the MathML markup with the graphic. A similar approach would work with SVG. However, in neither case is the conversion lossless; an equation rendered as a PNG can't be read by a screen reader, nor can it be resized with the surrounding text, or read by a MathML supporting program (I don't know if this is actually possible, but one can imagine saving a MathML fragment from a web page and then reading it into, say, Mathematica, in order to graph the function, or carry out calculations based on it). However, we're now way beyond what can be achieved with "some few string replacements".
I'm not trying to discourage people from writing programs to convert XHTML to HTML 4. Indeed, I would very much like such a program myself. I'm just pointing out that, for cases where XHTML provides a substantial benefit over HTML4, these conversions are distinctly non-trivial and inherently lossy processes.
Posted by jgraham at 2:20AM
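
To give a feel for the "regular expressions" end of jgraham's list, here is a rough sketch of three of the purely syntactic steps (XML declaration, doctype, shorttag form). The function name is hypothetical, it only recognises the XHTML 1.0 Strict doctype, and the patterns are naive: they would also rewrite `/>` inside scripts or comments, and they do nothing about PIs, entities, encodings or namespaced content.

```python
import re

HTML4_STRICT = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
                '"http://www.w3.org/TR/html4/strict.dtd">')


def xhtml_to_html_sketch(source):
    """Apply a few syntactic rewrites; everything else is left untouched."""
    # Remove the XML declaration, if there is one at the start.
    html = re.sub(r"^\s*<\?xml[^>]*\?>\s*", "", source)
    # Change the doctype (only XHTML 1.0 Strict is recognised here).
    html = re.sub(
        r'<!DOCTYPE\s+html\s+PUBLIC\s+"-//W3C//DTD XHTML 1\.0 Strict//EN"[^>]*>',
        HTML4_STRICT,
        html,
    )
    # Change the shorttag form: <br /> becomes <br>.
    html = re.sub(r"\s*/>", ">", html)
    return html


SAMPLE = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    "<html><head><title>Test</title></head>"
    "<body><p>Line one<br />line two</p></body></html>"
)
print(xhtml_to_html_sketch(SAMPLE))
```

Any inline MathML or SVG in the input would pass straight through unchanged, which illustrates the point above: the string replacements are the easy part.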
jgraham,
How is the fact that a "MathML supporting program" might not be able to read a PNG file relevant? Isn't the point of all this that every medium is provided with what it supports?
Sure, you can't resize the text in a PNG like you can in MathML, and sure, you can make a list of things you will lose when converting a document from one technology to another...
The point is, the PNG (which should be read as a metaphor for all second-choice technologies) is to ONLY be served to media that DO NOT understand MathML.
Thus providing a second-choice (or further) equivalent. That's the point of the conversion in the first place, is it not?
Posted by ACJ at 5:24AM
Yes, I appreciate that. The point is this: One must consider these losses when converting from one format to another. The original point of this post was whether one should send XHTML to all applications which claim support for it, ignoring the quality of support they claim to offer, or if one should send text/html to applications that can support HTML and XHTML, but prefer HTML. My point is simply that the idea of a lossless conversion between the two formats is only possible in some simple cases and, in general, much harder than people seem to believe. Therefore when deciding which content type to send to which browser, one must consider the quality factor of the content as well as the browser preference. Obviously it is no use sending MathML to a browser that does not support it but, if a browser existed which supported MathML in XHTML documents, but claimed to prefer text/html to application/xhtml+xml according to its accept header, you might be better off sending the XHTML+MathML rather than HTML+PNG since that document is likely to be better for the user. A simplistic approach that only examines the quality factors in the accept header wouldn't necessarily do this. Therefore, the correct solution is somewhere between the two discrete options originally presented.
Posted by jgraham at 6:49AM
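
One way to make "somewhere between the two" concrete is to multiply the browser's q value by a server-side estimate of how much each variant preserves, and send the variant with the highest product. This is only a sketch: the function name and the source-quality numbers are invented for illustration, and the browser values are the ones from the Opera header quoted earlier.

```python
def best_variant(client_q, source_quality):
    """Return the MIME type with the highest (client q) x (source quality)."""
    scores = {
        mime: client_q.get(mime, 0.0) * quality
        for mime, quality in source_quality.items()
    }
    return max(scores, key=scores.get)


# Browser side: prefers text/html, but accepts XHTML at q=0.9.
client_q = {"text/html": 1.0, "application/xhtml+xml": 0.9}

# Server side (made-up numbers): the HTML variant is a lossy fallback
# in which the MathML has been rendered to PNG.
source_quality = {
    "application/xhtml+xml": 1.0,
    "text/html": 0.5,
}

print(best_variant(client_q, source_quality))
```

With a lossy HTML fallback the XHTML variant wins despite the browser's stated preference (0.9 against 0.5). If the conversion were lossless, so that both variants had source quality 1.0, the browser's preference for text/html would decide.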
Ah, I see your point. Thank you.
Posted by ACJ at 10:12AM