Searching without PageRank

… implemented in an hour (!)

[ Discuss this post on Hacker News ]

I recently saw a nice Hacker News discussion about millionshort.com, an experimental search engine that makes it easy to search the web, minus the million most popular websites.

blekko’s search engine has a feature called slashtags, which can be used to either restrict a search to a list of websites, or remove that list of websites from the results. We typically use this feature for human curation, for example, picking out the best health websites. Hm, I thought, what an interesting hack! I’ll take that list of the most popular websites, and make slashtags which can be used to either search or exclude the most popular 10, 100, 1000, 10,000, or 100,000 websites. Our current effective limit to slashtag size is 100,000 websites, so I couldn’t do the most popular 1,000,000 sites.
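The idea of cutting a popularity-ranked site list into tiers, then using each tier to either restrict or exclude results, can be sketched in a few lines of Python. This is only an illustration under assumptions, not blekko's implementation: the function names, the input format (a list of hostnames ordered from most to least popular), and the host matching rule are all hypothetical.

```python
from urllib.parse import urlparse

# Tier sizes mentioned in the post; the 100,000 cap is the slashtag limit it cites.
TIER_SIZES = [10, 100, 1000, 10_000, 100_000]
MAX_SLASHTAG_SIZE = 100_000


def build_tiers(ranked_sites):
    """Map each tier size N to the set of the N most popular sites.

    `ranked_sites` is assumed to be a list of hostnames, most popular first.
    """
    tiers = {}
    for size in TIER_SIZES:
        if size > MAX_SLASHTAG_SIZE:
            continue  # a tier this large would not fit in one slashtag
        tiers[size] = set(ranked_sites[:size])
    return tiers


def filter_results(urls, sites, exclude=False):
    """Keep only URLs whose host is in `sites`, or drop them if exclude=True."""
    kept = []
    for url in urls:
        host = urlparse(url).hostname or ""
        if host.startswith("www."):
            host = host[4:]  # hypothetical normalization: ignore a leading "www."
        if (host in sites) != exclude:
            kept.append(url)
    return kept
```

Calling `filter_results(urls, tiers[10], exclude=True)` would then correspond to searching the web minus the ten most popular sites.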

The way you search with a slashtag on blekko is to add /slashtagname to your search. In this case the top-100,000 site slashtag is named /top10/top100k, and we use a minus sign to exclude those sites, as in -/top10/top100k.
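The query syntax just described can be sketched as a small string-building helper. The slashtag names come from this post; the function name `slashtag_query` and its parameters are made up for illustration and are not part of any blekko API.

```python
# Slashtag names used in this post, from the smallest tier to the largest.
TOP_SLASHTAGS = [
    "top10/top10",
    "top10/top100",
    "top10/top1000",
    "top10/top10k",
    "top10/top100k",
]


def slashtag_query(terms, slashtag, exclude=False):
    """Append a slashtag to the search terms; a leading minus sign excludes it."""
    tag = "/" + slashtag.strip("/")
    if exclude:
        tag = "-" + tag
    return f"{terms} {tag}"


# Build one "everything except the top N" query per tier.
queries = [slashtag_query("great concert", tag, exclude=True) for tag in TOP_SLASHTAGS]
```

Leaving out `exclude=True` gives the opposite query, which searches only within the listed sites.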

And, after about an hour of work, voilà! Here are a couple of examples of our new bottom-website search engine in action.

great concert

This query happens to show another blekko feature in action, autoboosted slashtags. When you search for great concert, we automatically figure out that you’re interested in music, and so we use our curated /music slashtag to improve the results:

great concert

But, for the purposes of this experiment, let’s turn off this autoboost feature by adding /web on the end of our query:

great concert /web

Now let’s look at the web, minus the top 10, 100, etc. websites:

great concert -/top10/top10 – Voilà! amazon.com disappears.

great concert -/top10/top100

great concert -/top10/top1000

great concert -/top10/top10k – mtv.com and emusic.com disappear.

great concert -/top10/top100k

hugo winner

The Hugo Awards are given annually for the best science fiction and fantasy works of the previous year, with Best Novel among the categories. Let’s see what the bottom of the web thinks about them:

hugo winner

hugo winner -/top10/top10

hugo winner -/top10/top100 – flickr and about.com disappear; davidbrin.com appears.

hugo winner -/top10/top1000 – goodreads disappears; librarything appears.

hugo winner -/top10/top10k – io9, boingboing, and Esquire disappear; dpsinfo, sfwriter.com, and mabfan.livejournal.com appear

hugo winner -/top10/top100k – no change

Try it for yourself!

Search the top 100,000:
Search all but the top 100,000:
Search the top 10,000:
Search all but the top 10,000:
Search the top 1000:
Search all but the top 1000:
Search the top 100:
Search all but the top 100:
Search the top 10:
Search all but the top 10:

About Greg

Greg is the CTO of blekko
This entry was posted in Technology.

3 Responses to Searching without PageRank

  1. dmitriy says:

    Interesting. Some of the results for python programming questions and bungee jumping are still the same. Maybe because those are less popular searches in the first place.

  2. brad says:

    interesting! still wrapping my brain around all of blekko’s unique features. i’ve been using it for a few months now as my go-to search and i must say i have been pleasantly surprised! to be sure, there is still a lot of work to do, and maybe one query out of twenty forces me back to google, but blekko is otherwise fast and accurate. i was a doubter, i wasn’t sure there was room in search for a new player…consider me converted

  3. How about an option to exclude sites with ads?