Sunday, November 06, 2005

MicroTags and microformats

I've been looking at HTML techniques this week. More accurately, at the indexing of HTML pages that contain structured tags called microformats. They seem to be cropping up everywhere. Instead of trying to invent new XML formats, people have started to use the old HTML tags in a structured way, so that they are recognisable to other, perhaps automated, systems. Information tagged in this way becomes available as a "search for" item. The tags might be harvested the way Google harvests pages, but the structure of the microformat makes it much easier to catalogue particular types of information. Semantic information can be inserted in several ways without interfering with the existing presentation of the pages.

Meta tags can contain a lot of data that describes the page.
Class attributes can be used to show the type of the data being shown.
Rel tags use the "rel" attribute on a link to describe its relationship to the page it points at.
Abbr and Span elements can extend their normal use to add information in a structured way.

These are largely ignored by current browsers when laying out a page, unless the class is involved in the CSS description. The information embedded in them is waiting for some more complicated searcher, one that is either spidering the pages or is built into the next generation of browsers.

For a simple example, we can look at the date-time design pattern.

The page would show,

The party is at 10 o'clock on the 10th.

But the HTML would contain,

The party is at <abbr class="dtstart" title="20051010T10:10:10-0100">10 o'clock on the 10th</abbr>.


where the date and time are specified in a fully machine-readable format. A browser could then read this tag and let the system add the event to a calendar, and so on. Page editors will have a plugin to add this HTML in just the same way as you now add the href to a link.
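To make the machine-readable part concrete, here is a minimal sketch of how a spider might pull the start time out of that snippet, using only Python's standard library. The names DtstartParser and extract_dtstart are my own for illustration, not part of any microformats tool, and the sketch assumes the title always uses the compact date format shown above.

```python
from datetime import datetime
from html.parser import HTMLParser


class DtstartParser(HTMLParser):
    """Collect the title of every <abbr class="dtstart"> element."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # class may hold several space-separated names, or be missing
        classes = (attr.get("class") or "").split()
        if tag == "abbr" and "dtstart" in classes and attr.get("title"):
            self.values.append(attr["title"])


def extract_dtstart(html):
    """Return a datetime for each dtstart found (hypothetical helper)."""
    parser = DtstartParser()
    parser.feed(html)
    parser.close()
    # Assumed format: YYYYMMDDTHH:MM:SS followed by a +hhmm/-hhmm offset
    return [datetime.strptime(v, "%Y%m%dT%H:%M:%S%z") for v in parser.values]


page = ('The party is at <abbr class="dtstart" '
        'title="20051010T10:10:10-0100">10 o\'clock on the 10th</abbr>.')

for start in extract_dtstart(page):
    print(start.isoformat())
```

The reader still sees only the friendly text, while the program gets a proper date object, offset included, that it could hand to a calendar.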

I had a look at the XHTML Friends Network (XFN), which tries to hold people's relationships in rel tags, but it's a complicated area. How they will cope with people who meet, then marry, then split up is too complicated for me. The links that they are laying down are held in pages that may not be retrieved in the same order, and I don't see how that might work. Friends and acquaintances, meetings and so on should be OK, but not reversible situations. Have a look at http://gmpg.org/xfn/ if you feel a bit more positive than I do.

The relTag idea is similar. Authors of pages can add particular labels to links, for example, to show the type of link that they are describing. The following link,

Animation facility

would allow a spider to search out animation links, rather than relying on pulling the idea from the text in the page. There will be a flood of specific tag labels for different industries and areas, and there will be facilities for people to search for a variety of these links across the web. Again, a simple HTML tag has become a much, much more useful piece of information, without any impact on the readability of the page. It will allow us to set up tag links to search through our web space, or the web as a whole. If users are allowed to add the tags, as Flickr do with the photos that people add, it allows others to come along and search in a more structured fashion, using the cross-site tagging that has been created. Adding the data at source is much more efficient than trying to add it later, and it's done by the users rather than being a load on the system.
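As a rough illustration of the spidering side, the sketch below collects every link marked rel="tag" from a page, again with Python's standard library only. The example.com URLs, the sample markup and the names RelTagParser and harvest_tags are assumptions of mine. It follows the relTag convention that the tag itself is the last segment of the link's path, rather than the visible link text.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class RelTagParser(HTMLParser):
    """Collect (tag, href) pairs from <a rel="tag"> links."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # rel may hold several space-separated values
        rels = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if tag == "a" and "tag" in rels and href:
            # relTag convention: the tag is the last path segment of the href
            path = urlparse(href).path.rstrip("/")
            self.tags.append((unquote(path.rsplit("/", 1)[-1]), href))


def harvest_tags(html):
    """Return the tag links found in one page (hypothetical helper)."""
    parser = RelTagParser()
    parser.feed(html)
    parser.close()
    return parser.tags


# Made-up markup; the URLs are placeholders
sample = ('<a href="http://example.com/tags/animation" rel="tag">'
          'Animation facility</a> '
          '<a href="http://example.com/about">About us</a>')

print(harvest_tags(sample))
```

Only the first link is picked up, because the second carries no rel attribute. A real spider would run this over every page it fetched and index the pages by tag.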

Another format that seems to be popular is hCard, an HTML version of contact information formatted so that a machine reader can convert it to a vCard contact block, which Outlook and other mail programs can use (not Thunderbird yet?). It allows people to give correctly formatted information while keeping the page readable. Quite why people want to put their email address into a public place is beyond me! I'm sure there's a reason.
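The conversion can be sketched in a few lines. This is a simplified illustration, not a complete converter: it handles only four hCard class names, takes each value from the element's text, and does no escaping. A valid vCard 3.0 also needs an N (structured name) property, which is left out here. The sample markup and the names HCardParser and hcard_to_vcard are my own.

```python
from html.parser import HTMLParser

# hCard class name -> vCard property name (a small subset only)
PROPERTIES = {"fn": "FN", "email": "EMAIL", "tel": "TEL", "org": "ORG"}


class HCardParser(HTMLParser):
    """Collect the text of elements whose class names an hCard property."""

    def __init__(self):
        super().__init__()
        self.stack = []   # (tag, index into fields or None) per open element
        self.fields = []  # [vCard property, text] in document order

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        prop = next((PROPERTIES[c] for c in classes if c in PROPERTIES), None)
        if prop:
            self.fields.append([prop, ""])
        self.stack.append((tag, len(self.fields) - 1 if prop else None))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray end tags
        if any(open_tag == tag for open_tag, _ in self.stack):
            while self.stack.pop()[0] != tag:
                pass

    def handle_data(self, data):
        # Text belongs to every property element that is currently open
        for _, index in self.stack:
            if index is not None:
                self.fields[index][1] += data


def hcard_to_vcard(html):
    """Build a bare-bones vCard string from hCard markup (hypothetical helper)."""
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    lines = ["BEGIN:VCARD", "VERSION:3.0"]
    for prop, text in parser.fields:
        value = " ".join(text.split())  # collapse runs of whitespace
        if value:
            lines.append(f"{prop}:{value}")
    lines.append("END:VCARD")
    return "\r\n".join(lines)


# Made-up contact details
sample_card = ('<div class="vcard"><span class="fn">Jane Doe</span> '
               '<a class="email" href="mailto:jane@example.com">'
               'jane@example.com</a></div>')

print(hcard_to_vcard(sample_card))
```

The page itself still reads as an ordinary name with an email link, and the same markup yields a contact block that a mail program could import once the missing pieces are filled in.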

More information from the microformats site
