Problems with Google's parsing of lastmod in Google sitemaps August 24, 2012
Posted by Neuromancer in SEO. Tags: enterprise-it, software
Found an interesting problem with XML Sitemaps and the way Google seems to handle the lastmod time. The sitemap protocol uses W3C Datetime as the standard. I thought I would write this up and put it out there.
I was seeing errors in GWT for one of our sites for a recently updated sitemap. GWT was complaining about errors in date time formatting. If we look at the code (original site redacted):
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>http://www.example.com/site-map-type-subtype-pages-1.xml</loc>
<lastmod>2012-08-17T10:39:18.6685044Z</lastmod>
</sitemap>
<sitemap>
.
.
.
This all looks fine and validates both with W3C and xmllint. The only thing I could see that looked “hinky” is the fractional seconds: this sitemap index uses last modified dates with fractional seconds to seven decimal places, i.e. to the nearest ten-millionth of a second.
2012-08-16T07:58:15.5396524Z
This is fine and perfectly valid. Unfortunately, reading the small print of the time standard you’re supposed to use for the last modified date, the fractional second part is not strictly defined:
“This profile does not specify how many digits may be used to represent the decimal fraction of a second. An adopting standard that permits fractions of a second must specify both the minimum number of digits (a number greater than or equal to one) and the maximum number of digits (the maximum may be stated to be “unlimited”).”
Unfortunately the sitemap protocol does not define the number of digits used for fractional seconds, and Google is reporting invalid date errors when it parses the date time. In fact it doesn’t seem to like even shorter fractional seconds at all.
This is an example of sloppy work in defining the sitemap standard and, to a lesser extent, on Google’s part in not handling an unlimited number of digits, which, as the sitemap standard is silent, is the obvious fail-safe assumption they should have made.
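One possible workaround, until the parsing is more forgiving, is to drop the fractional seconds before the sitemap is written out. The sketch below is only an illustration and not what we actually deployed: the function name `strip_fractional_seconds` is made up for this example, and it assumes the lastmod value is already a valid W3C Datetime string.

```python
import re

# Matches a W3C Datetime up to whole seconds, followed by a fractional part.
_FRACTION = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.\d+")


def strip_fractional_seconds(lastmod: str) -> str:
    """Drop the fractional-seconds part but keep any timezone designator.

    Hypothetical helper for illustration; values without a fractional
    part (including date-only values) are returned unchanged.
    """
    return _FRACTION.sub(r"\1", lastmod)


print(strip_fractional_seconds("2012-08-17T10:39:18.6685044Z"))
# prints 2012-08-17T10:39:18Z
```

Losing sub-second precision should not matter for a last modified date, so this trades nothing of value for a timestamp that is less likely to trip up a strict parser.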
I do wonder how some of today’s crop of PFPs (Pimply Faced Programmers) would cope with full-on seven-layer OSI-based systems. I wonder if Vint has any old OSI blue books to act as an example of how to write more stringent standards (and yes, I am aware of the pitfalls of the OSI model of standards making).
Link builder Chutzpah or Spearlink Spam Attack July 20, 2012
Posted by Neuromancer in SEO.
I work for RBI in the in-house SEO team and we recently received an email that was part of a link building campaign. They had obviously scraped one of our large sites, ICIS (a big chemical and energy industry site), and found that we had linked to a university article on biofuels which had subsequently been taken down.
They then produced a page with similar information to the now 404 page and suggested that we could replace the broken link with their link, which was on a car insurance site.
From: XXXXX XXXXXX [mailto:XXXX.XXXXX17@gmail.com]
Sent: 18 July 2012 12:43
To:
Subject: Broken link on your page

Hi ,

I came across your website and wanted to notify you about a broken link on your page in case you weren't aware of it. The link on http://www.icis.com/blogs/biofuels/archives/biodiesel which links to http://www.example.edu/p2/biodiesel/article_alge.html is no longer working. I've included a link to a useful page on biodiesel that you could replace the broken link with if you're interested in updating your site. Thanks for providing a great resource!

Link: http://www.example.org/algae-solutions

Best,
XXXX
Certainly this link builder wins an award for sheer chutzpah. This approach to link building is similar to a spear phishing attack, where an email is targeted directly at a specific individual and tailored to the interests of the victim/mark, which is why I dubbed this a Spearlink attack.
Unfortunately for them, the recipient immediately realised that this looked dodgy. Against a government site, or one with less savvy staff, this attack could easily have succeeded.
It’s also interesting that they used a blog and not an article on the main site. Maybe the RSS feed for the MT blog was used rather than a crawl of the main site.
Google's April Algorithm Changes: Panda 3.5 19th April and Penguin 24th April April 30, 2012
Posted by Neuromancer in SEO. Tags: google, internet
Penguin 24th April
Google’s latest update, the Penguin update, launched on April 24. It is a change to Google’s search results designed to remove pages that have been spamming Google. Spamming in this case is where people do things like “keyword stuffing”, “hiding text” or “cloaking” that violate Google’s guidelines.
Panda 3.5 19th April
On the 19th an update of the Panda algorithm was launched. Panda is an algorithm designed to promote higher quality pages over lower quality sites.
Parked Domains Problem April 17th
Google also made a rare admission that it made a mistake – on the 17th April they had a problem that was incorrectly identifying sites as parked domains. A parked domain is one that you own but has no content apart from a holding page.
This is an executive summary of a longer post at Search Engine Land here.
Best Adsense Fail or Scary Devil Nunnery Recruiting – and a SEO Fail on jobs.guardian.co.uk September 23, 2011
Posted by Neuromancer in SEO. Tags: SEO
Whilst perusing the Guardian’s job section to analyse the platform they use, I found both a number of ways to completely mess up that entire section of the site and what must be the strangest Adsense advert of all time.
Having managed to create arbitrary pages on the job site, I took a look at the Adsense served up at the base of the page, which is shown here (note the faked page I created was IT related).

Though I must say holding SEO audits in the style of the “Congregation for the Doctrine of the Faith” does appeal sometimes, especially when one comes across pages whose markup could best be described as “Your aving a laugh mate”. Though I suspect that HR might whinge if we took people down to the basement for the “shewing of the instruments”.
HPCC – High Performance Computing Cluster Open Sourced June 28, 2011
Posted by Neuromancer in HPCC.
I love my job. I get to play with large amounts of data and some cool new cutting edge cloud based toys such as MapReduce and Mahout, and some interesting Web 2.0 machine learning and AI type algorithms.
MapReduce is a software framework developed by Google to allow processing of large datasets on clusters of commodity computers. In an odd coincidence, the Map stage of MapReduce is effectively the same approach we used at Telecom Gold to handle processing the large logs in the Telecom Gold billing system, with a system called GLE (Generic Log Extract), written in PL/1.
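For anyone who has not met the idea before, here is a toy single-machine sketch of the map, shuffle and reduce stages. It is purely illustrative: it is not Hadoop's API and has nothing to do with how GLE was written, and the function names (`map_stage`, `shuffle`, `reduce_stage`, `map_reduce`) are made up for this example. A real cluster runs the map and reduce stages in parallel across many machines.

```python
from collections import defaultdict


def map_stage(record):
    """Map: turn one input record into (key, value) pairs, one per word."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all the values emitted for the same key together."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_stage(key, values):
    """Reduce: collapse the grouped values for one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    """Run the three stages in sequence on one machine."""
    pairs = (pair for record in records for pair in map_stage(record))
    return dict(reduce_stage(key, values) for key, values in shuffle(pairs).items())


counts = map_reduce(["big data big cluster", "big logs"])
print(counts)
# prints {'big': 3, 'data': 1, 'cluster': 1, 'logs': 1}
```

The point of splitting the work this way is that each record can be mapped independently of every other record, which is what lets the map stage be farmed out across a cluster.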
After some hacking I have got a small test cluster up and running to try out MapReduce for some interesting work on clustering documents, in this case web pages on some well known large websites.
I was having some difficulty in getting Mahout, which is an open source set of algorithms, to perform clustering of documents using MapReduce, when almost by chance I found that our parent company has its own system. HPCC (High Performance Computing Cluster) is a massively parallel processing computing platform that solves the same Big Data problems that MapReduce is used for.
HPCC used to be just an internal system developed by LexisNexis and has been used for LexisNexis customers for the past decade. But recently, i.e. last week, HPCC has been open sourced. As with Hadoop there is a web based interface.
There is also a Windows IDE which connects directly to an HPCC cluster to allow you to run ECL, the declarative non-procedural language used to program jobs to be run on your HPCC cluster.
There is a test virtual machine available for download here to allow people to test HPCC, and you can learn ECL here. Binaries for CentOS and Red Hat are available and source should be available in a few weeks.
Panda Update Hits the UK and All English Queries April 11, 2011
Posted by Neuromancer in SEO. Tags: Farmer, Panda, SEO
It looks like the infamous Panda Google update has arrived outside of the USA. According to Google the Panda update (sometimes called Farmer/Panda) is meant to better identify low-quality pages and sites.
These are the sort of pages (often seen on “content farms”) with text that is automatically tuned to match the query but may not provide the best user experience. (Google apparently calls it a “high quality sites algorithm”.)
I am due to help give a presentation on SEO to a group of RBI’s developers on Wednesday, so guess who’s going to be quickly revamping the presentation deck tomorrow, as well as grovelling in the Analytics data to see if any of our sites have been hit.
Though it’s my boss that will be fielding the calls from senior management, I am glad to say. Coverage here and Google’s own blog here.
New Functionality in GWT: Non Informative Title Tags and Non Indexable Content March 23, 2011
Posted by Neuromancer in SEO.
Google have just launched some new functionality in GWT (Google Webmaster Tools), with two new items in the HTML suggestions: Non Informative Title Tags and Non Indexable Content.
Could be useful in diagnosing problems in sites that need fixing, especially as a non informative title tag is a big low quality signal.
Steam Punk Sarah Palin February 9, 2011
Posted by Neuromancer in Uncategorized. Tags: graphic novels, sara palin, WTF
Comics, or graphic novels if we are being pretentious, have had some odd one-offs and crossovers, and recently a genre which mixes science with Jules Verne and HG Wells era SF, called steampunk, has become popular.
And what did I find…. Drum roll please! Ladies and Gentlemen I give you Steam Punk Sarah Palin.
One reviewer commented:
“Steampunk Palin defies classification into any literary genre, unless there’s a genre I’m unaware of simply called “WTF?!?””
It seems to be in so-bad-it’s-good territory. I can’t wait for the film. A review is here.
RIP Gladys Horton of The Marvelettes February 2, 2011
Posted by Neuromancer in Music. Tags: Marvelettes, Mowtown, Soul
Sad to see that Gladys Horton, one of the founders of the Marvelettes, has recently passed away. I saw her obit in the Guardian the other day. I thought I should post up a link to one of my favorite Marvelettes tracks, from an early Harlem Apollo show in 63.
As you can see they were doing the moonwalk years before Michael Jackson, and in high heels!
Bing Copying – Google Throws Toys out of Pram February 1, 2011
Posted by Neuromancer in SEO.
Oh dear, sounds like Google is getting upset over Bing using Google’s results to improve theirs, from the write up on Search Engine Land here.
Google has run a sting operation that it says proves Bing has been watching what people search for on Google, the sites they select from Google’s results, then uses that information to improve Bing’s own search listings. Bing doesn’t deny this.
Reverse engineering is legal, otherwise we would all still be using IBM PCs. Google should just man up and take it as a compliment.
Black Templar Space Marines WH40K January 27, 2011
Posted by Neuromancer in War Games.
I have been thinking about doing some WH40K gaming, and on a visit to the mother ship at Warhammer World I finally broke down and bought a Space Marine battle force, which is a basic starter set. I also looked at the various different Space Marine factions and was taken with the Black Templars.
The Black Templars are a non-standard Chapter of Space Marines who deviate in a number of ways from the standard. Most of the time they fight in Companies which are formed in an ad hoc manner. The individual squads and specialists fight side by side out of familiarity and comradeship rather than any imposed organisation.
They also have a stark black and white colour scheme which appealed, though when I looked at the difficulty of painting black armour I did wonder if I had bitten off more than I could chew. However I have made a start undercoating a 10-man unit and am in the process of painting them up. I’ll post some pics when I dig out my camera, though I might not get to this standard.
I am just at the stage of doing the white shoulder pads, which require multiple coats of grey as a second undercoat so that the white stands out against the black undercoat.
I also went to GW in London and Bedford a few days later and bought some more, and have some vehicle models which will be done later.
So left to right we have shots of a Rhino APC, a Razorback MICV and lastly a Predator tank as they look when made up and painted in the chapter colors. My local GW shop in Bedford is here.
Quora January 8, 2011
Posted by Neuromancer in SEO.
Just been playing with a new site, Quora, that is the new hotness.
Basically it’s a site where you can post and answer questions. They describe it as:
Quora is a continually improving collection of questions and answers created, edited, and organized by everyone who uses it.
You can see my Quora profile here
3000 Point AT43 Game August 15, 2010
Posted by Neuromancer in SEO.
Pics from my recent AT43 Game
Search Marketing Pro – SEP 2010 August 6, 2010
Posted by Neuromancer in SEO.
Beer + Search geeks = a good time!
As a stand-in for the on-hiatus SEO London, my boss is arranging a networking event for SEOs and people working in search in London. At the moment it’s in the advanced cat herding stage. There is an info page for Search Marketing Pro. Fill in the survey and we hope to see you there.
DR WHO S5 Trailer – Spitfires in Space March 20, 2010
Posted by Neuromancer in DrWho.
Just saw that IO9 had a link to the latest trailer for the new Dr Who series, and it looks like Dr Who Season 5 will be a fun season.
WW2 camo Daleks, Spitfires in space and the return of the Weeping Angels from Blink, the Hugo award winning episode from season 3.












