
Docker for Data

How to make open data accessible & usable?

  • Governance Area: Global Governance
  • Institution Type: Nonprofit, Academic, Corporate/Business, Public Sector
  • Innovative Capability: Open data, Big data
  • Product Category: database, dataset, platform


Existing open data portals focus on browser views of single datasets. That works well until you need to do deep analysis or build a business on the data.

Downloading big datasets can take hours. Tracking down important small datasets can take many more, especially when they do not live on the same portal. Add a few more hours to load everything into a real database and figure out the relations, and you've spent days (or thousands of dollars) on something trivial.

It doesn’t have to be this hard.





Docker for Data is a cloud-based, open-source toolkit that speeds up the extraction and loading of large open datasets. It gives open data users a single command that installs a broad variety of datasets directly into a SQL database. A user can load data such as New York City’s deeds and mortgages into a powerful SQL database in minutes instead of hours.

Users no longer have to worry about schemas, transformations, load processes, or waiting for slow data portals. Data from many formats is standardized and ready to go, fast. Installation is a snap: users don’t even need a SQL database installed beforehand. Docker for Data can be installed with one line of code.
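To make the "schemas and load processes" point concrete, here is a minimal sketch of the manual work a user would otherwise do for a single small file: read a CSV export, derive a table schema from its header, and load the rows into a SQL database. This is not Docker for Data's own code. The function name `load_csv_into_sql`, the table name, and the sample columns and rows are all hypothetical, and SQLite is used only because it ships with Python.

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for a downloaded open-data CSV export.
# The column names and rows are invented for illustration only.
SAMPLE_CSV = """document_id,doc_type,amount
D001,DEED,500000
D002,MORTGAGE,350000
D003,DEED,725000
"""


def load_csv_into_sql(conn, table, csv_text):
    """Create `table` from the CSV header, insert every row, return the row count."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Every column is stored as TEXT; a real loader would also infer types.
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    rows = list(reader)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
loaded = load_csv_into_sql(conn, "deeds_sample", SAMPLE_CSV)

# Once loaded, the data can be queried with ordinary SQL.
deed_count = conn.execute(
    "SELECT COUNT(*) FROM deeds_sample WHERE doc_type = 'DEED'"
).fetchone()[0]
print(loaded, deed_count)
```

Even this toy version leaves out type inference, format conversion, and relations between tables, which is the kind of work the toolkit is described as handling for the user.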

Results & Impact

Because Docker for Data archives the extracted datasets, it also functions as a repository for usable open data. On Docker, you can find:


John Krauss