With all the recent controversy over US Government data collection, it’s a good time to bring up a privacy concern that every Internet user should have: the information that search engines keep about their users. A large fraction of all website visits go through search engines — no one uses bookmarks or remembers domain names anymore. Even if you are not a criminal, you probably make searches that you don’t want your minister, boss, or spouse to know about. You don’t expect your doctor to keep a record of every question you ask at a checkup, but your search engine probably remembers every medical search you’ve ever made. And even if you’ve been careful to log out and clear your cookies, those searches are probably associated with your real name.
Violate my privacy for a good reason, eh?
One reason search engines keep search histories is to provide personalized ads and results. Search ads are mostly based on the keywords you type into the search box, and they are so lucrative that there’s not much extra benefit from knowing that you clicked on a Mercedes website last week. Choosing which news sources to show (New York Times or Fox News?) based on user history is good for the user, but doesn’t require remembering every single article that you’ve ever clicked on (“J-Lo Reveals: Space Aliens Tattooed My Baby!”). User click data is a great way to improve the order in which websites are presented in results, but most of the benefit comes from completely anonymized click data, not from my personal click history.
Recording very sensitive data because it might be useful someday is a bad idea.
“Big Data” is really popular with businesses these days, with the hope that it can provide great value, either to users, or to advertisers. Recording nearly everything that users do, which is what major search engines do, is bad for many reasons:
- The right thing to do is to keep just enough information to provide most of the benefit for the user, not all the information for a tiny additional benefit.
- The user really doesn’t benefit from helping advertisers. I like seeing more relevant ads, but not at the cost of having my search engine remember every embarrassing query I’ve ever made.
- Just for fun, bad guys might break in and publish search histories. You can read about these kinds of incidents every week; it’s never happened to Google, but it’s still a bad idea to keep all that data.
- As with public libraries, it is not the mission of a search engine to collect information for the government. OK, maybe in non-free countries that’s the mission of both search engines and the public library, but that’s not exactly the ideal that most of us hope for on the Internet.
What’s the right thing to do? Privacy by design.
- Don’t track anyone’s search history.
- Be careful that anonymized data really is anonymized, and is minimized to provide the most benefit with the least data.
- Keep nothing if users select the “Do not track” option in their browser.
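To make the list above concrete, here is a minimal sketch in Python of what those three rules could look like in a click logger. Everything in it is my own illustration, not how any real search engine works: `click_counts`, `record_click`, and `rankable_clicks` are hypothetical names, the in-memory counter stands in for real storage, and the `min_count` cutoff is an assumed way to keep rare (and therefore potentially identifying) queries out of the data used for ranking. The only real-world detail is the `DNT: 1` request header that browsers send when “Do not track” is switched on.

```python
from collections import Counter

# Hypothetical store: (query, clicked_url) -> count.
# Deliberately no user ID, cookie, IP address, or timestamp.
click_counts = Counter()


def do_not_track(headers):
    """Return True if the request headers carry the Do Not Track signal (DNT: 1)."""
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") == "1"


def record_click(headers, query, clicked_url):
    """Count an anonymized click, or keep nothing if the user opted out.

    Returns True if the click was counted, False if it was discarded.
    """
    if do_not_track(headers):
        return False
    click_counts[(query.strip().lower(), clicked_url)] += 1
    return True


def rankable_clicks(min_count=50):
    """Return only the (query, url) pairs seen at least min_count times.

    Assumption: a rare query can point back to one person even without
    a user ID, so it is left out of the data used to order results.
    """
    return {pair: n for pair, n in click_counts.items() if n >= min_count}
```

The design choice worth noticing is that the opt-out check happens before anything is written, so there is no per-user record to delete, leak, or hand over later.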
In the long run, consumers will only have privacy if they demand it. Thank you to all who have raised their voices about this issue!