Wednesday 3 December 2008

Google Blog Search No Longer Indexes Feeds

Vanessa Fox reports that Google's blog search engine has changed the way it indexes blog posts. Until now, Google Blog Search indexed only feeds, so the results weren't very good for sites that offered partial feeds. It now indexes the entire content of each page, including comments, navigation links and blogrolls, which makes the search more comprehensive.

"We have changed the way we index blog posts to include the full content of the page. We've had occasional complaints about the use of the feed content, particularly the problem with partial feeds. The indexing change has improved the results for a lot of queries, both because we have the full content of the page and because we extract links that are missing from the feeds. The downside of this change is that we see more results that match only the blogroll and other parts of the page that are common to all of a blog's posts," explains Jeremy Hylton. He says that the algorithm will be improved to exclude "the content that isn't really part of the post" to make the results more useful.
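To make the blogroll problem concrete, here is a minimal sketch of one way to drop text that repeats across all of a blog's posts. It is only an illustration of the idea in the quote, not Google's algorithm, which isn't described in the source. The function name `strip_shared_lines`, the `threshold` parameter and the sample data are all hypothetical, and the sketch assumes each page has already been reduced to a list of text lines.

```python
from collections import Counter


def strip_shared_lines(pages, threshold=1.0):
    """Remove lines that appear on at least `threshold` of a blog's pages.

    `pages` maps a post identifier to the list of text lines extracted
    from that post's page. This is a hypothetical helper, not a real API.
    """
    # With fewer than two pages there is nothing to compare against,
    # so return the input unchanged (as copies).
    if len(pages) < 2:
        return {post: list(lines) for post, lines in pages.items()}

    # Count on how many pages each distinct line occurs.
    page_counts = Counter()
    for lines in pages.values():
        page_counts.update(set(lines))

    cutoff = threshold * len(pages)
    # Keep only the lines that occur on fewer pages than the cutoff.
    return {
        post: [line for line in lines if page_counts[line] < cutoff]
        for post, lines in pages.items()
    }


# Made-up sample data: two posts sharing a blogroll and a comment link.
sample_pages = {
    "post-1": ["Blogroll: Example Blog", "Google changes blog search", "Post a Comment"],
    "post-2": ["Blogroll: Example Blog", "A tip about sorting by date", "Post a Comment"],
}
cleaned = strip_shared_lines(sample_pages)
```

In this sketch, a line counts as shared only if it matches exactly, so a real system would need something more tolerant of small differences between pages.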

Here's an example of a comment from a Google OS post indexed by Google Blog Search:


Tip: if you want to find recent blog posts, don't sort the results by date. Just select "last 12 hours" or "last day" from the sidebar. This way, you'll get relevant results and you'll minimize the number of splogs (spam blogs) in the list of search results.
