At blekko, we believe the web and search should be open and transparent; it’s number one in the blekko Bill of Rights. To make web data accessible, we give away our search results to innovative applications through our API. Today, we’re happy to announce the ongoing donation of our search engine ranking metadata for 140 million websites and 22 billion webpages to the Common Crawl Foundation.
Common Crawl has built an open crawl of the web that can be accessed and analyzed by everyone. The goal is to build a truly open web, with open access to information that enables more innovation in research, business, and education. Common Crawl will use blekko’s metadata to improve its crawl quality while avoiding webspam, porn, and the influence of excessive SEO (search engine optimization). This will ensure that Common Crawl’s resources and engineering time are spent on webpages that are written by, and are useful to, humans.
We’re putting our full support behind Common Crawl’s crawl and mission with this donation. We’re not doing this because it makes us feel good (OK, it makes us feel a little good) or because it makes us look good (OK, it makes us look a little good). We’re helping Common Crawl because it is taking strides toward our shared vision of an open and transparent Internet.
Just take a look at this excerpt from Common Crawl’s website:
“As the largest and most diverse collection of information in human history, the web grants us tremendous insight if we can only understand it better. For example, web crawl data can be used to spot trends and identify patterns in politics, economics, health, popular culture and many other aspects of life. It provides an immensely rich corpus for scientific research, technological advancement, and innovative new businesses. It is crucial for our information-based society that the web be openly accessible to anyone who desires to utilize it.”
Who could disagree with that?
MIT Technology Review: A Free Database of the Entire Web May Spawn the Next Google (Hey, isn’t blekko the next Google?!)