September 30, 2002

Tweney: Google should index blog RSS feeds

Google loves blogs. Blogs love Google. But is there trouble in paradise? When items slip off the front page of most blogs, there is, anecdotally, a two- to three-week delay before the archived items are reindexed. As Dylan Tweney points out, this is an artifact of the fact that Google's basic unit of indexing is the web page URL, whereas blogs are more fine-grained: the basic unit is the post, and there are usually multiple posts on a single page.

Permalinks arose to address this same issue, allowing links to target individual posts. This is generally implemented with named anchors within pages, although it's also possible to assign each entry its own page in the archives, even if several entries are aggregated at any one time on the blog's index page.

Dylan has a suggestion, though, to help the Googlesphere catch up with the blogosphere:

As it turns out, we do have a couple of data formats that understand the difference between a post and a page, include useful summary data, and even include handy pointers back to the exact archive location of a post. They're called RSS and RDF.

These syndication formats are used to aggregate news, but they could be useful indexing tools too. What if Google (or Daypop, once they can afford to buy a few new hard drives) collected RSS and RDF feeds — and then archived them in a searchable index?

Instead of news stories scrolling off into oblivion when they get to the bottom of a feed, they'd enter a permanent index where they could be used for information retrieval later.

It seems that the same approach would work when indexing an intranet or enterprise portal. Maybe part of the solution for turning k-logs into a true knowledge sharing system is to make sure the search implementation indexes RSS feeds from k-logs, making knowledge retrieval possible without discontinuities.
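
To make the idea concrete, here is a rough sketch of what indexing at the post level could look like. It is not how Google, Daypop, or any real search engine works. The FeedIndex class, the sample feed, and the example.com URLs are all made up for illustration, and the sketch assumes plain RSS 2.0 items without XML namespaces (RDF-style feeds would need namespace-aware lookups).

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical feed content; a real crawler would fetch this over HTTP.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>Google should index feeds</title>
      <link>http://example.com/archives/000123.html</link>
      <description>Posts are finer-grained than pages.</description>
    </item>
    <item>
      <title>Permalinks explained</title>
      <link>http://example.com/archives/000124.html</link>
      <description>Each post gets its own archive location.</description>
    </item>
  </channel>
</rss>"""


class FeedIndex:
    """Toy post-level index: one entry per feed item, keyed by its link."""

    def __init__(self):
        self.posts = {}                # permalink -> (title, summary)
        self.terms = defaultdict(set)  # word -> set of permalinks

    def add_feed(self, xml_text):
        """Archive every item in the feed that has not been seen before."""
        root = ET.fromstring(xml_text)
        for item in root.iter("item"):
            link = (item.findtext("link") or "").strip()
            if not link or link in self.posts:
                continue  # no permalink, or already archived on an earlier crawl
            title = item.findtext("title") or ""
            summary = item.findtext("description") or ""
            self.posts[link] = (title, summary)
            for word in re.findall(r"[a-z0-9]+", f"{title} {summary}".lower()):
                self.terms[word].add(link)

    def search(self, query):
        """Return the permalinks of posts containing every word in the query."""
        words = re.findall(r"[a-z0-9]+", query.lower())
        if not words:
            return []
        hits = set.intersection(*(self.terms.get(w, set()) for w in words))
        return sorted(hits)


index = FeedIndex()
index.add_feed(SAMPLE_FEED)
print(index.search("permalinks"))
```

Because entries are keyed by permalink and never dropped, a post stays searchable after it scrolls off the bottom of the feed, and crawling the same feed again adds only the items that are new.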

Posted by xian at September 30, 2002 3:31 PM