Found an interesting problem with XML Sitemaps and the way Google seems to handle the lastmod time. The sitemap protocol uses W3C Time as the standard. I thought I would write this up and put this out there.
I was seeing errors in GWT for one of our sites for a recently updated sitemap. GWT was complaining and about errors in date time formatting. If we look at the code (original site redacted)
This all looks fine and validates both with w3c and xmllint – the only thing I could see that looked “hinky” is the use of the fractional seconds this sitemap index use last modified dates with fractional seconds to 6 decimal places i.e. to the nearest millionth of a second.
This is fine and is perfectly valid, unfortunately reading the small print of the time standard your supposed to use for last modified date – the fractional second part of is not strictly defined
“This profile does not specify how many digits may be used to represent the decimal fraction of a second. An adopting standard that permits fractions of a second must specify both the minimum number of digits (a number greater than or equal to one) and the maximum number of digits (the maximum may be stated to be “unlimited”).”
Unfortutely the sitemap protocol does not define the number of digits used for fractional seconds and Google is reporting invalid date errors when it parses the date time. In fact it doesn’t seem to like shorter usage of fractional seconds at all.
An example of sloppy work in defining the sitemap standard and to a lesser extent Google’s in not handling a unlimited number of digits – which as the sitemap standard is silent is the obvious fail safe assumption they should have made.
I do wonder how some of today’s crop of PFP (pimply Faced Programmers) would cope with full on seven layer OSI based systems – I wonder if Vint has any old OSI blue books to act as an example of how-to write more stringent standards . (and yes I am aware of the pitfalls of the OSI model of standards making)