Using MongoDB

This lesson introduces MongoDB, a document-oriented database that departs from the relational model underlying SQL databases. Because SQL is so well established and prevalent as a database technology, the terms “relational database” and “SQL” are often used interchangeably. Likewise, the term “NoSQL” has come to encompass a whole class of database technologies that are non-relational in nature. Other “NoSQL” technologies, such as key-value stores and column stores, differ from MongoDB’s document/collection approach, but we will cover only MongoDB today.

MongoDB trades the strictness of SQL for a degree of simplicity and flexibility that suits scientists, who are often uncertain about the best schema design for new or changing data sets. Whereas SQL lends itself well to enforcing a schema, and thus to validating data at the database level, a system like MongoDB does not require you to define or migrate a schema before you start managing data. This means that application-level data validation (e.g. before adding to the database) is particularly important. You can still take advantage of the implicit schema in your data by creating indexes on the fields you query most often, which can speed up queries significantly for large data sets.
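To make "application-level data validation" concrete, here is a minimal Python sketch of a check you might run on a document before adding it to the database. The field names, expected types, and sample values are hypothetical and are not taken from the lesson's data files; a real project would choose its own rules.

```python
from numbers import Real

# Hypothetical required fields and their expected types.
# These names are illustrative only, not the lesson data's actual schema.
REQUIRED_FIELDS = {
    "formula": str,
    "elements": list,
    "band_gap": Real,
}


def validation_errors(doc):
    """Return a list of problems found in doc; an empty list means it passed."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in doc:
            errors.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected_type):
            found = type(doc[field]).__name__
            errors.append(f"wrong type for {field}: {found}")
    return errors


# Made-up example documents, for illustration only.
good = {"formula": "NaCl", "elements": ["Na", "Cl"], "band_gap": 5.0}
bad = {"formula": "NaCl", "band_gap": "wide"}

print(validation_errors(good))
print(validation_errors(bad))
```

Only documents for which `validation_errors` returns an empty list would then be passed on to the database insert step.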

This lesson was originally developed at the Lawrence Berkeley National Laboratory, where many scientists use MongoDB servers hosted by NERSC to manage workflows and process data on supercomputing clusters. The example data for this lesson is from the Materials Project, which hosts computed information for tens of thousands of known and predicted inorganic crystalline compounds.

Prerequisites

  • Familiarity with the command line, for running a MongoDB server locally and importing data.

Getting ready

You need to download two data files to follow this lesson:

  1. Make a new folder on your Desktop called data.
  2. Download mongo-novice-materials.json (~20 MB) (Right-click the link and “Save as…”) to this folder.
  3. Download periodic_table.json (~120 KB) to the same folder.
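
If you want to confirm that both files ended up where the lesson expects them, a small check along these lines may help. It is only a sketch: it assumes your Desktop folder lives at `~/Desktop`, which may not hold on every system, and the helper name `missing_files` is made up for this example.

```python
from pathlib import Path

# File names as given in the steps above.
EXPECTED_FILES = ["mongo-novice-materials.json", "periodic_table.json"]


def missing_files(folder, names=EXPECTED_FILES):
    """Return the names from `names` that are not regular files in `folder`."""
    folder = Path(folder)
    return [name for name in names if not (folder / name).is_file()]


if __name__ == "__main__":
    # Assumption: the Desktop folder is at ~/Desktop on this machine.
    data_dir = Path.home() / "Desktop" / "data"
    missing = missing_files(data_dir)
    if missing:
        print("Still missing from", data_dir, ":", ", ".join(missing))
    else:
        print("Both data files found in", data_dir)
```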

Topics

  1. Introduction/Setup
  2. Connect to / Import into a Database
  3. Insert Data
  4. Find Data
  5. Specify Conditions with Operators
  6. Sort Results
  7. Update Data
  8. Remove Data
  9. Data Aggregation
  10. Indexes

Other Resources