What Data Does Google Store About You and How to Download It with Takeout

As is well known, Google stores a massive amount of data about its users, which has led to ongoing criticism. In response, Google created a tool that allows you to download all your data. This service is called Takeout, and it has a variety of interesting uses, which we’ll discuss here. We’ll also take a detailed look at what you actually get when you request your data.

Why Download Your Google Data?

There are many reasons you might want to download your data. Maybe you want to migrate to another service and transfer your information (though some conversion will likely be needed). Or perhaps you want to build an analytics system for lifelogging or quantified self projects, and your Google data can help. Or maybe your country suddenly decides to block Google, leaving you without access. Unfortunately, that happens too.

Over the past twelve years, I’ve used many Google products, often for reviews, and I never disabled tracking features. Maybe you think that’s crazy, but Google has treated my data carefully so far, and I’m not advocating this approach for everyone. Think of it as me enduring this so you don’t have to!

How to Get Your Google Archives

To get your archives, go to takeout.google.com, check the boxes next to the services you’re interested in, and wait (it can take a while). When I requested my full archive, the download link arrived after more than 24 hours. The email from Google’s bot said my data covered 36 products, totaled 63.6 GB, and was split into three archives. Most of the data was in the first two, while the third was a zipped catalog page of everything included.

If you don’t want to download such large files, you can request a partial archive that includes only selected services. For example, if you exclude Photos, YouTube, Gmail, and Drive (assuming you keep more than a few test documents there), your archive might be just a few hundred megabytes. The smaller the archive, the sooner the download link arrives.

Search History

Let’s start with one of the most interesting things—your search history. It’s in the Searches folder, split into files by three-month periods (e.g., 2006-01-01 January 2006 to March 2006.json). Each entry contains just two things: the Unix timestamp and the search query.

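You can convert individual timestamps with an online converter, or use Python for batch conversion. Here timestamp stands for a value in seconds taken from the file; if your file stores microseconds rather than seconds, divide by 1,000,000 first:

import datetime
datetime.datetime.fromtimestamp(int(timestamp)).strftime('%d-%m-%Y %H:%M:%S')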

For fun, you can grep for specific keywords after flattening the JSON into one greppable line per value (for example, with the gron utility):

$ for F in *.json; do gron "${F}" | grep "keyword"; done

Try searching for “download” or the “@” symbol to find all email addresses and Twitter accounts you’ve looked up. Note that image and video searches aren’t included here, but you’ll find them in the My Activity folder.
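If gron is not installed, the same kind of keyword search can be approximated in Python. The sketch below walks any JSON structure and collects the string values that contain a keyword. It makes no assumption about the field names Takeout uses, and the helper names iter_strings and search_folder are made up for this example. Unlike gron piped into grep, it matches values only, not key names.

```python
import json
from pathlib import Path


def iter_strings(node):
    """Yield every string value found anywhere inside a parsed JSON document."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def search_folder(folder, keyword):
    """Return (file name, matching string) pairs for all *.json files in folder."""
    keyword = keyword.lower()
    hits = []
    for path in sorted(Path(folder).glob("*.json")):
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        # Case-insensitive substring match on every string value in the file.
        for text in iter_strings(data):
            if keyword in text.lower():
                hits.append((path.name, text))
    return hits
```

Calling search_folder("Searches", "@") from the directory where the archive was unpacked would then list every stored string containing an @ sign, together with the file it came from.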

Chats

If you want to add your Google Talk and Hangouts chats to your old ICQ logs, you can—but reading the exported chats from Takeout is tough. All messages are in a single JSON file with lots of metadata, and sender names are replaced with user IDs. The simplest approach is to extract just the text:

$ gron Hangouts.json | grep -F '.text'

This way, you can at least see the conversations, even if they’re anonymized.
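The same extraction can be sketched in Python without gron. The function below collects every string stored under a key named text, wherever it sits in the file. That key name is taken from the grep pattern above; nothing else about the file layout is assumed, and collect_key and hangouts_texts are hypothetical helper names.

```python
import json


def collect_key(node, key="text"):
    """Return all string values stored under `key`, at any nesting depth, in document order."""
    found = []
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key and isinstance(value, str):
                found.append(value)
            else:
                # Keep descending: the key may appear deeper in the structure.
                found.extend(collect_key(value, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_key(item, key))
    return found


def hangouts_texts(path="Hangouts.json"):
    """Load the export and return the message texts in file order."""
    with open(path, encoding="utf-8") as f:
        return collect_key(json.load(f))
```

Because the whole file is loaded into memory at once, this is only practical for exports that fit comfortably in RAM.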

Google+

If you ever used Google+, you might want to back up your posts. The data is split into four parts: Circles, Pages, Profile, and Google+ Stream.

  • Circles: Your contacts, organized by circles, in vCard (VCF) format. You can import these into any address book.
  • Pages: Only present if you had public pages. Usually just a userpic and cover photo.
  • Profile: A JSON file with all the info you filled out in your profile, including links to other social profiles and workplaces.
  • Google+ Stream: All your posts and comments as separate HTML files. You can extract just the post texts using Python and BeautifulSoup by targeting the entry-title and entry-content classes. Note: images from posts are not backed up—they remain as links to Google’s servers and require authentication to access.
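As a rough illustration of the post extraction mentioned above, here is a sketch that pulls the text out of elements with the entry-title and entry-content classes. It deliberately swaps BeautifulSoup for the standard library's html.parser so that it runs without extra packages. It assumes reasonably well-formed markup in which every non-void tag is closed, and the names PostTextParser and extract_post_text are invented for this example.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
TARGET_CLASSES = {"entry-title", "entry-content"}


class PostTextParser(HTMLParser):
    """Collect the text inside elements whose class is entry-title or entry-content."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0      # > 0 while inside a target element
        self.chunks = []    # text fragments of the element being read
        self.blocks = []    # finished text blocks, in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            if self.depth:
                self.chunks.append(" ")   # treat <br>, <img> etc. as a word break
            return
        classes = set((dict(attrs).get("class") or "").split())
        if self.depth or classes & TARGET_CLASSES:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Leaving the target element: normalise whitespace and store the block.
            self.blocks.append(" ".join("".join(self.chunks).split()))
            self.chunks = []

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_post_text(html_text):
    """Return the title and content blocks of one exported post as a list of strings."""
    parser = PostTextParser()
    parser.feed(html_text)
    parser.close()
    return parser.blocks
```

If BeautifulSoup is available, a selector on the same two class names should give an equivalent result with less code; the manual depth counter here only exists to keep the example dependency-free.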

Maps

Another major category is your location data.

  • MyMaps: Routes you created in Google Maps, each as a KMZ file (Google Earth format, which is basically a ZIP containing a KML XML file). You can convert these to GeoJSON using GeoConverter.
  • Maps (your places): A Saved Places.json file with all your Google Maps bookmarks. Each entry includes a title, date added, date modified, and a link. Coordinates may be stored in different fields, so some parsing is required.
  • Location History: A file with your entire location history from your mobile device. Each entry includes a Unix timestamp, latitude, longitude, and accuracy. Sometimes it also includes direction, altitude, and altitude accuracy. You can analyze this data with Python, R, or specialized tools like Location History Visualizer Pro (paid) or free services like They Know Where You’ve Been.
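Since a KMZ file is a ZIP archive with KML inside, the MyMaps routes can also be read directly with Python's standard library. The sketch below is one possible approach, not a replacement for a proper converter: it opens the archive, parses every .kml member, and returns the points from each coordinates element. It relies on the KML convention that coordinates are written as whitespace-separated lon,lat[,alt] tuples; the function name kmz_coordinates is made up for this example.

```python
import xml.etree.ElementTree as ET
import zipfile


def kmz_coordinates(kmz_path):
    """Return a list of point lists, one per <coordinates> element in the KMZ."""
    tracks = []
    with zipfile.ZipFile(kmz_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".kml"):
                continue
            root = ET.fromstring(archive.read(name))
            for element in root.iter():
                # Compare the local tag name so any KML namespace version matches.
                if element.tag.rsplit("}", 1)[-1] != "coordinates":
                    continue
                points = []
                for triple in (element.text or "").split():
                    parts = triple.split(",")
                    if len(parts) < 2:
                        continue  # skip malformed tuples
                    lon, lat = float(parts[0]), float(parts[1])
                    alt = float(parts[2]) if len(parts) > 2 else None
                    points.append((lon, lat, alt))
                tracks.append(points)
    return tracks
```

Note that KML stores longitude first, so the tuples come back as (lon, lat, alt), which is also the order GeoJSON expects.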

Google Maps also has a Timeline feature where you can view your data by day, along with analytics like place names and transportation modes.
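A rough local counterpart to that per-day view can be built from the Location History file. The sketch below groups points by calendar day. The field names locations, timestampMs, latitudeE7, and longitudeE7 are assumptions about the export layout rather than something confirmed above, so check them against your own file and adjust as needed; points_by_day and load_location_history are hypothetical helper names.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


def points_by_day(data):
    """Group (lat, lon) pairs by UTC calendar day.

    Assumes data["locations"] is a list of dicts with the keys
    timestampMs (milliseconds), latitudeE7 and longitudeE7 (degrees * 1e7).
    """
    days = defaultdict(list)
    for entry in data.get("locations", []):
        moment = datetime.fromtimestamp(int(entry["timestampMs"]) / 1000, tz=timezone.utc)
        lat = entry["latitudeE7"] / 1e7
        lon = entry["longitudeE7"] / 1e7
        days[moment.date().isoformat()].append((lat, lon))
    return dict(days)


def load_location_history(path):
    """Read the exported JSON file into memory."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

The days are computed in UTC to keep the example deterministic; for a diary-style view you would probably want to convert to your local time zone instead.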

Chrome

The Chrome folder contains your cloud-synced Chrome data (though possibly not all of it):

  • Bookmarks.html: Your bookmarks as an HTML list, with Unix timestamps for when they were added.
  • Dictionary.csv: Presumably custom spellcheck entries (mine was empty).
  • Extensions.json: Installed extensions.
  • SearchEngines.json: Custom search engines and their query rules.
  • SyncSettings.json: Chrome sync settings.
  • Autofill.json: Autofill data (mine was empty).
  • BrowserHistory.json: I expected a full browsing history, but only found 14 links from mobile Chrome. Desktop history was missing, possibly a Takeout bug. If you get a full file, it includes the type of visit (LINK or TYPED), page title, URL, client ID (to distinguish devices), and Unix timestamp.
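To turn Bookmarks.html into something easier to analyze, a small parser is enough. The sketch below assumes the file follows the common Netscape bookmark layout, where each bookmark is a link with an HREF attribute and an ADD_DATE attribute holding seconds since the Unix epoch. That layout is an assumption on my part, and BookmarkParser and parse_bookmarks are names invented for this example.

```python
from datetime import datetime, timezone
from html.parser import HTMLParser


class BookmarkParser(HTMLParser):
    """Collect (title, url, added) tuples from <a href=... add_date=...> links."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.current = None   # attributes of the link being read, if any
        self.title = []
        self.bookmarks = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.current = dict(attrs)   # attribute names arrive lowercased
            self.title = []

    def handle_data(self, data):
        if self.current is not None:
            self.title.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.current is not None:
            stamp = self.current.get("add_date")
            added = None
            if stamp and stamp.isdigit():
                # Assumption: ADD_DATE holds seconds since the Unix epoch.
                added = datetime.fromtimestamp(int(stamp), tz=timezone.utc)
            title = "".join(self.title).strip()
            self.bookmarks.append((title, self.current.get("href"), added))
            self.current = None


def parse_bookmarks(html_text):
    """Return all bookmarks found in the given HTML text, in document order."""
    parser = BookmarkParser()
    parser.feed(html_text)
    parser.close()
    return parser.bookmarks
```

Bookmarks without a usable timestamp come back with None in the third position, so they can be filtered out before any date-based analysis.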

My Activity

This is perhaps the most interesting folder, showing exactly how Google tracks you. You’ll find records of:

  • Visits to sites affiliated with Google AdWords
  • Books opened in Google Books
  • Sites visited via Chrome
  • APIs used (Developers folder)
  • Stock quotes viewed in Finance
  • Object searches in Goggles
  • Pages viewed in Google Play Store
  • Help requests (Help folder)
  • Image searches and link clicks
  • Map object views
  • Google News searches and article reads
  • Search queries and link clicks (Search folder)
  • Shopping searches and purchases (Shopping folder)
  • Trip views in Google Trips
  • Video searches and clicks (Video Search)
  • Voice searches (Voice and Audio folder)
  • YouTube searches and video views

My Chrome site history was as empty as in the Chrome folder, and Shopping barely tracked any real purchases. Still, the amount of information is impressive. The Voice and Audio folder even contains MP3 files of your own voice saying “OK Google…” and similar phrases.

You can also view and filter this data at myactivity.google.com, where you can delete individual records or disable tracking for certain activities.

The exported format is HTML with heavy Material Design markup. Here’s a quick Python script to extract clean text from a MyActivity.html file:

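import html
import re

# Read the export as UTF-8 (assumed) rather than the platform default encoding.
with open('MyActivity.html', encoding='utf-8') as f:
    text = f.read()

# Capture everything between a class ending in body-1 and the next </div>.
for block in re.findall(r'body-1">(.+?)</div>', text, flags=re.DOTALL):
    # Strip the remaining tags, then print every non-empty text fragment.
    for piece in re.split(r'<[^>]*>', block):
        if piece.strip():
            print(html.unescape(piece.strip()))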

This will give you clean text for further analysis.

Other Google Products

We won’t go into detail on all 40+ products, but here’s a quick overview:

  • +1: HTML list of pages you’ve liked via Google+.
  • Bookmarks: Bookmarks, mostly from Google Maps (starred places), in HTML format.
  • Calendar: Google Calendar events in iCalendar (.ics) format, importable into most calendar apps.
  • Photos: All your photos, organized by day, in their original formats (including RAW), with metadata in JSON files—even if you use the free version of Photos.
  • YouTube: All videos you’ve uploaded, in original formats, with metadata in JSON. Also includes playlists, subscriptions (OPML), watch and search history (HTML), and comments (HTML).
  • Classic Sites: Sites created with Google Sites. Cross-links work locally, but images remain on Google’s servers.
  • Drive: All Google Drive documents. Text as DOCX, spreadsheets as XLSX, comments as HTML. File names and folder structure are preserved.
  • Google Pay: Transaction history (CSV) and rewards/gift cards/offers (PDF), if any.
  • Mail: Full Gmail archive as a single Mbox file, including spam and trash. You can import this into most email clients, but it’s often easier to use IMAP directly with Gmail.
  • Google My Business: Basic account info in JSON.
  • Contacts: Gmail address book, organized by groups, with vCard files and userpics.
  • G Suite Marketplace: Plugins you’ve published (if any).
  • Tasks: Google Tasks data in a complex JSON structure.
  • Google Play Books: Folders named after your books, each containing HTML with the book’s title, author, last opened time, and a link. No actual book files are included.

Some products (Blogger, Classroom, Fit, Play Music, Groups, Handsfree, Hangouts on Air, Keep, Search Contributions, Voice) had no data in my archive because I never used them. The biggest omission is Keep, but third-party parsers for its exported data are available.

Conclusions

The most important archives, besides mail, photos, and documents, are Searches, Location History, BrowserHistory.json from Chrome, and My Activity.

Does Takeout give you all your personal data? Not quite. Some old services (like Google Reader and Wave) are missing, and I doubt Google has truly deleted that data—it’s probably just in cold storage. There are other gaps, too, like missing desktop Chrome data and the lack of location info tied to search queries in My Activity.

Still, Google deserves credit for making data portability and transparency a priority. Takeout is useful not just for taking your data and leaving, but also for all kinds of analytics and personal projects.
