One of the advantages of having your own search engine is that you have access to all sorts of data that no one else does. Really, really cool data. And when you tell your friends you have all this data, you get lots of people asking you for stuff. Interesting questions like: “Can you give me a list of every site that uses Facebook Connect? In rank order?” Or: “Can you send me a list of sites that have the Google +1 button on them?”
Or, the most popular one: “Can you give me a list of every site that is running [insert name] ad network?” Obviously this is the most popular as it generates what is essentially a list of leads for someone who works for a competing ad network.
Anyway, we’re all too happy to give this info out. After all, one of blekko’s founding tenets is that the web should be open and transparent. We believe in a transparent web so much that we wanted to extend access to this valuable data to people outside our personal social circles. So today we’re excited to announce the launch of WebGrepper. WebGrepper is a simple way to mine the web for data that, frankly, you can’t get from any other search engine.
The way it works is simple. Every day, we will run 2 map jobs against our crawl of 4 billion pages. These will be greps for strings, patterns, and regular expressions that blekko users submit to us and decide are cool. Got a grep you want to run? Submit it here. If enough people agree with you that this grep is interesting (by voting it up), we’ll run it. And we’ll post the results here. We make the top 500 results for every grep available for free to anyone who wants them. Pretty cool, eh?
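For the curious, here is a rough sketch of what a single grep job might look like at toy scale. This is not blekko's actual code: the function name `grep_pages`, the sample pages, and the choice to rank hosts by how many of their pages match are all assumptions made for illustration. The Facebook script URL is just an example pattern.

```python
import re
from collections import Counter
from urllib.parse import urlparse


def grep_pages(pages, pattern, top_n=500):
    """Test each page's HTML source against a regex and count matching pages per host.

    `pages` is an iterable of (url, html) pairs. Ranking hosts by the number
    of matching pages is an assumption of this sketch, not a documented rule.
    """
    regex = re.compile(pattern)
    hits = Counter()
    for url, html in pages:
        # The "map" step: one independent regex test per crawled page.
        if regex.search(html):
            hits[urlparse(url).netloc] += 1
    # Keep only the top N hosts, mirroring the "top 500 results" cutoff.
    return hits.most_common(top_n)


# Hypothetical stand-in for a crawl: (url, raw HTML source) pairs.
sample_pages = [
    ("http://example.com/", '<script src="http://connect.facebook.net/en_US/all.js"></script>'),
    ("http://example.com/about", '<div id="fb-root"></div><script src="http://connect.facebook.net/en_US/all.js"></script>'),
    ("http://example.org/", "<p>We wrote a post about Facebook Connect.</p>"),
    ("http://example.net/", '<script src="https://connect.facebook.net/en_US/all.js"></script>'),
]

results = grep_pages(sample_pages, r"connect\.facebook\.net")
for host, count in results:
    print(host, count)
```

In this toy crawl, example.org only talks about Facebook Connect in its text, so it is left out. The two hosts that actually load the script are the ones that show up.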
We look at the web as one big, massive data set. Keyword matches (with a relevance filter) are certainly one way to pull data from that set. But it’s not the only way. There’s a ton of interesting data that lives within the source HTML that keywords just can’t get to. That’s where WebGrepper comes in – and now you have access to it.
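To make that concrete, here is a small sketch of the gap between what a keyword match sees and what a grep of the source sees. It assumes, purely for illustration, that a keyword index only covers the visible text of a page. The `VisibleText` class, the `visible_text` helper, and the sample page are all made up for this example.

```python
import re
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect only the text a reader would see, skipping script and style bodies."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def visible_text(html):
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so the result is one clean line of text.
    return " ".join(" ".join(parser.chunks).split())


# Hypothetical page: the +1 button lives entirely in the markup.
html = (
    "<html><body><h1>My recipes</h1>"
    '<script src="https://apis.google.com/js/plusone.js"></script>'
    '<div class="g-plusone"></div>'
    "</body></html>"
)

pattern = re.compile(r"apis\.google\.com/js/plusone\.js")
in_text = bool(pattern.search(visible_text(html)))  # what a keyword match would see
in_source = bool(pattern.search(html))              # what a grep of the source sees
print(in_text, in_source)
```

The visible text of this page is just "My recipes", so a search over the text finds nothing, while the same pattern run against the raw source finds the button script.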
So do us all a favor: think of some cool things to grep for and submit them to the community. Get out there and slash Grep the Web!
If you’re one of those guys who wears a t-shirt that says “Grep Me”, “Get a Grep”, “I Grep Therefore I am” or “who | grep -i blonde | date; cd”, chances are you can guess what blekko’s new Web Grepper does. For all the other folks out there who don’t speak UNIX or Klingon, Web Grepper “greps”, or searches, the lines of code within Web files to identify matching domains based on specific topics and search terms.