Looking at America Online…literally

AOL Research recently released a collection of search queries that AOL users entered between March 1, 2006 and May 31, 2006. Attached to the data was the following description of the file contents:

This collection consists of ~20M web queries collected from ~650k users over
three months. The data is sorted by anonymous user ID and sequentially arranged.
The goal of this collection is to provide real query log data that is based on
real users. It could be used for personalization, query reformulation or other
types of search research.
The data set includes {AnonID, Query, QueryTime, ItemRank, ClickURL}.
AnonID - an anonymous user ID number.
Query  - the query issued by the user, case shifted with
most punctuation removed.
QueryTime - the time at which the query was submitted for search.
ItemRank  - if the user clicked on a search result, the rank of the
item on which they clicked is listed.
ClickURL  - if the user clicked on a search result, the domain portion of
the URL in the clicked result is listed.
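
For readers who do want to work with the raw file, here is a minimal Python sketch of reading records with the five fields described above. It assumes a tab-separated text file with a header row, which the description does not actually state. The `QueryRecord` and `read_records` names and the sample rows are made up for illustration and are not taken from the real data.

```python
import csv
import io
from typing import Iterator, NamedTuple, Optional


class QueryRecord(NamedTuple):
    anon_id: str
    query: str
    query_time: str
    item_rank: Optional[int]  # None when the user did not click a result
    click_url: Optional[str]  # None when the user did not click a result


def read_records(lines) -> Iterator[QueryRecord]:
    """Yield one record per data row (assumes tab-separated columns and a header row)."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader, None)  # skip the assumed header row
    for row in reader:
        if not row:
            continue
        row = row + [""] * (5 - len(row))  # pad rows that lack the click columns
        anon_id, query, query_time, rank, url = row[:5]
        yield QueryRecord(anon_id, query, query_time,
                          int(rank) if rank else None,
                          url or None)


# Made-up sample rows for illustration only
sample = io.StringIO(
    "AnonID\tQuery\tQueryTime\tItemRank\tClickURL\n"
    "1\tgolf courses\t2006-03-01 10:00:00\t2\thttp://www.example.com\n"
    "1\tlobster cooking\t2006-03-01 10:05:00\t\t\n"
)
records = list(read_records(sample))
```

Because the file is described as sorted by anonymous user ID, all of one user's rows should appear together, so a single pass like this is enough to follow a user's search history.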

In other words, this data trove is pretty much the same type of material that Google fought vigorously to keep out of the hands of the U.S. government. While AOL has since removed the data from its website, copies of the data file are still floating around the Internet. However, instead of downloading and processing the raw data yourself, which is no small feat considering the amount of data involved, you can use a website I’ve come across, appropriately called AOL Search Database, that lets you search through the AOL search query data by User ID, search keyword or website result.

Here’s a quick review of the site.

Easy to Use. Beats setting up a database and importing 36 million lines of data.


No Free Text Search. Keyword search is not the same as free text search. If I search for gps, no results are returned because gps is not on the keywords list. That’s restrictive…

AND/OR. When I search for stanley cup on Google, it assumes that I want to search with all of the words; i.e., an AND operator between the search terms. In contrast, AOL Search Database assumes an OR operator between the search terms. So, stanley cup returns stanley cup, as well as morgan stanley, stanley cleaners and stanley furniture, which aren’t exactly relevant to my search query.
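
The difference between the two behaviors can be sketched in a few lines of Python. This only illustrates AND versus OR term matching; it is not how Google or AOL Search Database is actually implemented, and the function names are hypothetical. The sample queries are the ones from the stanley cup example.

```python
def matches_all(search: str, candidate: str) -> bool:
    """AND semantics: every search term must appear in the candidate query."""
    words = set(candidate.lower().split())
    return all(term in words for term in search.lower().split())


def matches_any(search: str, candidate: str) -> bool:
    """OR semantics: a single matching search term is enough."""
    words = set(candidate.lower().split())
    return any(term in words for term in search.lower().split())


logged = ["stanley cup", "morgan stanley", "stanley cleaners", "stanley furniture"]

and_hits = [q for q in logged if matches_all("stanley cup", q)]
or_hits = [q for q in logged if matches_any("stanley cup", q)]
```

With AND matching only the stanley cup query survives, while OR matching keeps all four because each of them contains stanley.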

With those limitations in mind, let’s see what this data reveals.

ups tracking

http wwwapps.ups.com tracking tracking.cgi tracknum 1z800x050378624778

AOL Search is “enhanced” by Google’s search engine. If you enter a FedEx/UPS/USPS tracking number in Google, Google will identify it as a tracking number and will give you a link to the relevant shipper. In this case, 1z800x050378624778 shows a delivery to Vista, CA, which was signed for by Rosenbach. Of course, the searcher could be the sender or the recipient. If you do use a search engine to track your packages, just keep in mind that someone viewing this search query data will be able to associate your searches with your location, even without a corresponding IP address.

User ID: 9072185

golf courses in colorado, arrowhead golf club, bear creek golf club, broadlands golf course, lobster cooking

Based on the User ID from the above search, you can follow the user’s search patterns. This person apparently likes golfing and seafood. What a life!

drunk driving

drunk driving and cpa license

Now this is an interesting search. Is this person concerned that a drunk driving conviction may bar them from receiving a CPA license or may cause them to lose their CPA license?

User ID:

new jersey and alcotest, cops suck, dui expungement in new jersey, enterprise rent a car and dui, new jersey cpa license renewal, new jersey cpa and driving under the influence

Much easier to say cops suck to a search engine than to the face of a police officer. Good bet that someone in New Jersey got a drunk driving conviction. In light of this, you would be surprised by another search associated with this User ID: new jersey home beer delivery.

order rohypnol from mexico or chile

Rohypnol is NOT legal in the U.S. It is legal in Europe and Mexico and prescribed for sleep problems and as an anesthetic. It is used as a date rape drug and brought into the U.S. illegally. So, what else does someone searching for order rohypnol from mexico or chile look for online?

User ID:

alprazolam cod, cod psilocybin, cod salvia divinorum, drinking 10 – 15 drinks a day, internet forums for alcoholics, .14 bac, .21 bal, doing morphine and alcohol at the same time, smells like vodka at work, sheriff’s north jail, heavy alcohol use by patients with panic disorder, vodka addicted, is their fentanyl real, what is the best pharmacy to buy steroids from, what is the most euphoric prescription opiate, is codeine the same high as morphine, avoid middle man, avoid middle man drugbuyers, customs in miami, cash payments drugbuyers, drinking in the morning, possession of cocaine in south florida, how much codeine to get as high as 40mgs of oxycodone, buy vials of ketamine overseas no prescription, how to rob a pharmacy, how to call in prescriptions

Wow! Someone is looking to buy a lot of drugs online. And this doesn’t look like someone is just doing casual research for educational purposes either.

medical malpractice

medical malpractice law suits against valley medical center in renton

How exactly do people search for lawyers? Let’s follow this user and find out.

User ID:

medical malpractice lawsuits involving vaginal packing left, infections due to vaginal packing left in after hysterectomy, malpractice attourney, what is my medical malpractice case worth, lawsuit against valley medical center in renton washington, medical malpractice valley medical center in renton washington, malpractice attorneys in renton washington, valley medical center in renton medical malpractice

So, following this user’s search history, we can see that the searches involve combinations of these types of terms: (1) medical malpractice; (2) type of medical error (e.g., vaginal packing left in); (3) result of medical error (i.e., infection); (4) location of medical error (e.g., valley medical center in renton); and (5) legal services sought (e.g., malpractice attorneys in renton washington).

A lot of useful data may be extracted from the AOL Search Database. For attorneys looking to build their Internet presence, understanding exactly how individuals search for attorneys makes it possible to tailor a website to reach prospective clients most effectively.

