Using MongoDB
Connect to / Import into a Mock Database
Learning Objectives
- Use mongomock to put off needing a real database connection
- Import JSON data from a file
First, let’s connect to the (mock) server and get a handle for our client.
from mongomock import MongoClient
# To use a real server instead, import the client from pymongo:
# from pymongo import MongoClient
client = MongoClient()
A MongoDB instance can host multiple databases, which are created dynamically. Here, we supply a database name as an attribute of the client object; MongoDB will create a database with that name, if it doesn’t already exist, once data is first stored in it.
# Refer to the Software Carpentry ("swc") database
db = client.swc
print(db)
Database(mongomock.MongoClient('localhost', 27017), 'swc')
We see here that what we’re calling db is a Database with the name “swc” that we’re accessing through a Mongo client connected to localhost on port 27017.
print(db.materials)
Collection(Database(mongomock.MongoClient('localhost', 27017), 'swc'), 'materials')
A MongoDB database is organized as a set of collections, each of which contains a set of documents. As a first approximation, you can think of a collection as corresponding to a table in SQL and a document in a collection as corresponding to a table row.
Just as with databases themselves, collections are created dynamically in MongoDB. Above, we got a handle on a materials collection in our database simply by referring to it by name; MongoDB creates the collection itself when we first store a document in it.
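The idea of handing out named objects on attribute access can be illustrated with a toy class. This is only a sketch of the general Python mechanism (the __getattr__ hook); the LazyNamespace class and the toy_ variables are hypothetical names invented for this illustration and are not mongomock's or PyMongo's actual code.

```python
class LazyNamespace:
    """Toy container that creates a named child the first time it is requested."""

    def __init__(self, name):
        self.name = name
        self._children = {}

    def __getattr__(self, child_name):
        # Python calls this hook only when normal attribute lookup fails.
        if child_name.startswith("_"):
            raise AttributeError(child_name)
        return self[child_name]

    def __getitem__(self, child_name):
        # Create the child on first use, then keep returning the same object.
        if child_name not in self._children:
            self._children[child_name] = LazyNamespace(child_name)
        return self._children[child_name]


toy_client = LazyNamespace("client")
toy_db = toy_client.swc            # nothing named "swc" existed before this line
toy_collection = toy_db.materials  # likewise for "materials"

print(toy_db.name)                  # swc
print(toy_collection.name)          # materials
print(toy_client.swc is toy_db)     # True
print(toy_client["swc"] is toy_db)  # True
```

Bracket access, as in toy_client["swc"], is the usual fallback for names that are not valid Python attribute names; PyMongo and mongomock clients and databases accept it as well.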
Now, let’s load data from a file and import it as documents into our collection:
import json
We first import the json module from the Python standard library. JSON, which stands for “JavaScript Object Notation”, is a way to express simple data structures that is widely used in web-based applications. We’ll go over the format in the next topic when we construct a document to insert into our collection, but for now let’s focus on importing data that we’re given.
with open('data/mongo-novice-materials.json') as f:
    db.materials.insert_many(json.load(f))
We use a Python context manager to open the file and ensure that it is closed when we are done processing its contents. Here, the json module loads the file contents as Python-native data structures, which we then hand off to the collection’s insert_many method to insert all of the loaded documents.
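The open-parse-close pattern can be tried without a database at all. The following is a minimal sketch that assumes a small, made-up JSON file written to a temporary directory; the file name example.json and its contents are placeholders, not the lesson's data.

```python
import json
import os
import tempfile

# Hypothetical JSON text standing in for the lesson's data file.
json_text = '[{"name": "example-a"}, {"name": "example-b"}]'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.json")

    # Write the stand-in file so the sketch is self-contained.
    with open(path, "w") as f:
        f.write(json_text)

    # Same pattern as in the lesson: open, parse, and let the block close the file.
    with open(path) as f:
        documents = json.load(f)

    print(f.closed)        # True: the file was closed when the block ended
    print(len(documents))  # 2
```

The name f still exists after the with block, but the file it refers to is already closed, which is why the parsed data is kept in a separate variable.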
db.materials.count()
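To confirm that the data loaded, we call the count() method of the collection object and see that our materials collection contains more than zero documents. Newer PyMongo releases replace count() with count_documents({}), so use that form if count() is unavailable with a real connection.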
Loading JSON data
Reload the dataset from the JSON file and assign it to a variable dataset instead of immediately inserting it into the database collection. What is type(dataset), the type of the dataset? What is the type of dataset[0], the first (“zeroth”) member of the imported data?
collection and document
array and object
list and dict
set and member
Unknown collections
What happens when you run the following:
db.publications.count()
0
Error: no such collection "publications"
Error: no method "count" on Undefined