Using MongoDB

Sort Results

Learning Objectives

  • Understand sorting as a method chained to the cursor
  • Sorting by multiple keys

To specify an order for the result set, you can append the sort() method to a cursor. The argument to sort can be as simple as a key of interest:

filt = {"elasticity": {"$ne": None}}
proj = {"elasticity.poisson_ratio": 1}
proj.update(COMMON_PROJ)
cursor = db.materials.find(filt, proj).sort("elasticity.poisson_ratio")
for doc in cursor.limit(5):
    pprint(doc)
{'elasticity': {'poisson_ratio': -0.07595596751510682},
 'material_id': 'mp-771798',
 'pretty_formula': 'WO3',
 'spacegroup': {'number': 204}}
{'elasticity': {'poisson_ratio': 0.042582069532848744},
 'material_id': 'mp-87',
 'pretty_formula': 'Be',
 'spacegroup': {'number': 194}}
{'elasticity': {'poisson_ratio': 0.05989488523534276},
 'material_id': 'mp-765892',
 'pretty_formula': 'MnCoO4',
 'spacegroup': {'number': 10}}
{'elasticity': {'poisson_ratio': 0.07135395365323711},
 'material_id': 'mp-611426',
 'pretty_formula': 'C',
 'spacegroup': {'number': 194}}
{'elasticity': {'poisson_ratio': 0.07391837849830045},
 'material_id': 'mp-23703',
 'pretty_formula': 'LiH',
 'spacegroup': {'number': 225}}

If just a key is given, then the sorting is in ascending order. To specify descending-order sorting by a key:

cursor = db.materials.find(filt, proj).sort("elasticity.poisson_ratio", -1)
for doc in cursor.limit(5):
    pprint(doc)
{'elasticity': {'poisson_ratio': 0.46752282089107655},
 'material_id': 'mp-1387',
 'pretty_formula': 'AlV3',
 'spacegroup': {'number': 223}}
{'elasticity': {'poisson_ratio': 0.46347893160099135},
 'material_id': 'mp-361',
 'pretty_formula': 'Cu2O',
 'spacegroup': {'number': 224}}
{'elasticity': {'poisson_ratio': 0.4622062265482987},
 'material_id': 'mp-544',
 'pretty_formula': 'Ti3Ir',
 'spacegroup': {'number': 223}}
{'elasticity': {'poisson_ratio': 0.4461861025154294},
 'material_id': 'mp-22060',
 'pretty_formula': 'Nb3In',
 'spacegroup': {'number': 223}}
{'elasticity': {'poisson_ratio': 0.4450687838211702},
 'material_id': 'mp-75',
 'pretty_formula': 'Nb',
 'spacegroup': {'number': 229}}

What happens if we want secondary sorting? The argument to sort() is then a list of pairs, where each pair specifies a key to sort by and a direction, either ascending (1) or descending (-1) order. The first pair specifies the primary sorting, and the second pair the secondary sorting after the first is done.

cursor = db.materials.find(filt, proj).sort([
        ("nelements", -1),
        ("elasticity.poisson_ratio", -1),
    ])
for doc in cursor.limit(5):
    pprint(doc)
{'elasticity': {'poisson_ratio': 0.3419981312210572},
 'material_id': 'mp-556194',
 'pretty_formula': 'SrSbSe2F',
 'spacegroup': {'number': 129}}
{'elasticity': {'poisson_ratio': 0.34027796672596883},
 'material_id': 'mp-12532',
 'pretty_formula': 'KAg2PS4',
 'spacegroup': {'number': 121}}
{'elasticity': {'poisson_ratio': 0.3349886462699885},
 'material_id': 'mp-557862',
 'pretty_formula': 'BaAg2(HgO2)2',
 'spacegroup': {'number': 125}}
{'elasticity': {'poisson_ratio': 0.3050725032443602},
 'material_id': 'mp-11806',
 'pretty_formula': 'LiMgSnPt',
 'spacegroup': {'number': 216}}
{'elasticity': {'poisson_ratio': 0.3042911533524732},
 'material_id': 'mp-546011',
 'pretty_formula': 'YZnAsO',
 'spacegroup': {'number': 129}}

You can also pass a keyword argument to find() to specify sorting:

cursor = db.materials.find({
        "nelements": {"$lt": 3}
    }, {
        "_id": 0,
        "nelements": 1,
        "elasticity.K_VRH": 1,
        "pretty_formula": 1,
        "spacegroup.crystal_system": 1,
        "material_id": 1,
    }, sort=[
        ("nelements", 1),
        ("elasticity.K_VRH", -1)
    ])

for doc in cursor.limit(5):
    pprint(doc)
{'elasticity': {'K_VRH': 435.66148729813784},
 'material_id': 'mp-611426',
 'nelements': 1,
 'pretty_formula': 'C',
 'spacegroup': {'crystal_system': 'hexagonal'}}
{'elasticity': {'K_VRH': 401.3286154019227},
 'material_id': 'mp-49',
 'nelements': 1,
 'pretty_formula': 'Os',
 'spacegroup': {'crystal_system': 'hexagonal'}}
{'elasticity': {'K_VRH': 365.08592365295135},
 'material_id': 'mp-8',
 'nelements': 1,
 'pretty_formula': 'Re',
 'spacegroup': {'crystal_system': 'hexagonal'}}
{'elasticity': {'K_VRH': 346.3227612888041},
 'material_id': 'mp-101',
 'nelements': 1,
 'pretty_formula': 'Ir',
 'spacegroup': {'crystal_system': 'cubic'}}
{'elasticity': {'K_VRH': 307.5241654938898},
 'material_id': 'mp-33',
 'nelements': 1,
 'pretty_formula': 'Ru',
 'spacegroup': {'crystal_system': 'hexagonal'}}

Why a List of Pairs?

A query filter is specified as a dict of key-value pairs, and so is a query projection. Why can’t we just use, for example, {"key1": 1, "key2": -1} as an argument to sort() to specify a primary ascending sort by “key1” and a secondary descending sort by “key2”?

Take a Number

Each crystal system maps to a consecutive range of space group numbers. Verify this is the case by inspection through sorting. It may be helpful to use the distinct aggregation method on a collection to get a list of the crystal systems present in the collection:

# Filter out compounds with unknown crystal symmetry
systems = filter(None, db.materials.distinct("spacegroup.crystal_system"))
for s in systems:
   # do stuff