Blekko Upgrades SEO Data

Blekko has integrated the up-to-date crawl information that powers our search engine index directly into our SEO product offerings. Not only is the data more comprehensive, but our SEO pages now also update much closer to real time.

When it comes to pages crawled, the sweet spot for blekko is a little more than 4 billion pages. To keep our crawl fresh, we update at least 100 million pages each day. As soon as our crawler, Scoutjet, crawls a webpage, users have access to information about it through blekko’s SEO product. We want to enable people to see the Internet the way a search engine sees it, especially what the rest of the Internet is saying about a URL.

Scoutjet updates the top-ranked starting pages on the Internet around every hour, while other high-quality pages are checked at least every week. The continuous updates to blekko’s SEO data include page content, metadata, duplicate text, and inbound link counts. Staying up-to-date is as much about forgetting the old as finding the new. So, we eliminate inbound links that are no longer live and duplicate content that is no longer available.
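
To make the update cycle above a little more concrete, here is a minimal sketch in Python of tiered recrawl scheduling and inbound-link pruning. It is an illustration only, not Scoutjet's actual code: the tier names, the `Page` class, and the `is_due` and `prune_inbound` helpers are all hypothetical, and the only figures taken from this post are the "around every hour" and "at least every week" intervals.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical tiers. The intervals mirror the "around every hour" and
# "at least every week" figures from the post; nothing else is from blekko.
RECRAWL_INTERVAL = {
    "top": timedelta(hours=1),
    "high_quality": timedelta(weeks=1),
}


@dataclass
class Page:
    url: str
    tier: str
    last_crawled: datetime
    # Source URLs currently believed to link to this page.
    inbound_links: set = field(default_factory=set)


def is_due(page: Page, now: datetime) -> bool:
    """Return True once the page's tier interval has elapsed."""
    return now - page.last_crawled >= RECRAWL_INTERVAL[page.tier]


def prune_inbound(page: Page, live_sources: set) -> int:
    """Forget inbound links whose source no longer links here.

    Returns the updated inbound link count.
    """
    page.inbound_links &= live_sources
    return len(page.inbound_links)


# Usage: one top-ranked page crawled two hours ago, one other page
# crawled two days ago. Only the first is due for a recrawl.
now = datetime(2011, 1, 1, 12, 0)
home = Page("http://example.com/", "top", now - timedelta(hours=2),
            {"http://a.example/", "http://b.example/", "http://c.example/"})
about = Page("http://example.com/about", "high_quality",
             now - timedelta(days=2))

due = [p.url for p in (home, about) if is_due(p, now)]

# After the recrawl, b.example no longer links to the homepage, so it
# is dropped; a newly seen source is not added by this helper.
count = prune_inbound(home, {"http://a.example/", "http://c.example/",
                             "http://d.example/"})
```

The set intersection is the "forgetting the old" half of staying fresh: anything not confirmed live on the latest crawl simply drops out of the count.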

Since our traffic continues to grow rapidly, we are bringing more machines into service to keep our site humming. While we were upgrading our site to handle more traffic, we decided to leverage our highly customizable NoSQL database to make real-time access to our crawl publicly available. Our “combinator” abstraction proved critical in quickly making the right tradeoffs between crawl throughput and user request latency.

We hope you will be pleased with the new and improved performance for web search and SEO data!

About Greg

Greg is the CTO of blekko

11 Responses to Blekko Upgrades SEO Data

  1. Sweet. For about how many pages is data updated on a daily basis vs. weekly basis?

    • Robert Saliba says:

      The numbers are always in flux as we tweak our system. Currently, we are updating around 130 million pages weekly. Around a million pages are updated daily, mostly homepages and frequently updated directory pages. Also, duplicate text and anchor information are updated immediately upon discovery, even if the target page is not on a fast update cycle.

  2. WOW! Blekko SEO data just got a whole lot better

  3. IMNorth says:

    Thank you guys! We are many who appreciate your initiative to share your data with us! I am sure this will prove to be a winning concept in the long run.

    Keep up the good work!
    /the IMNorth staff

  4. Rama says:

    Appreciate your efforts, You guys rock

  5. Aidan Rogers says:

    Wow awesome guys – absolutely lovin Blekko keep it coming!

  6. Matt says:

    Thanks for all the hard work you folks at blekko put into your product. Keep up the awesome work! :)

  7. Will Bakhos says:

    Go Blekko… let’s try to coin the phrase ‘just blekko it’!

    Good stuff

  8. Kathrin says:

    Much-awaited update. Thanks guys

  9. What does avg. page latency mean and how come the White House doesn’t have it? Thanks!

    • Robert Saliba says:

      Avg page latency is the average time our crawler spends fetching pages from a host. We stopped tracking this information some time ago. Unfortunately, we did not eliminate the blank space on the SEO web page at the same time. We will fix the page soon. Thanks for asking about it!