
Lawrence Berkeley National Laboratory

Apr 7, 2016

10:00 am - 4:00 pm

Instructors: Greg Wilson, Donny Winston

Helpers:

General Information

Software Carpentry's mission is to help scientists and engineers get more research done in less time and with less pain by teaching them basic lab skills for scientific computing. This hands-on workshop will cover basic concepts and tools for effective data management using SQL and MongoDB. Participants will be encouraged to help one another and to apply what they have learned to their own research problems.

For more information on what we teach and why, please see our paper "Best Practices for Scientific Computing".

Who: The course is aimed at graduate students and other researchers. You don't need to have any previous knowledge of the tools that will be presented at the workshop.

Where: Bldg 54 Rm 130, 1 Cyclotron Road, Berkeley, CA. Get directions with OpenStreetMap or Google Maps.

Requirements: Participants must bring a laptop with a few specific software packages installed (listed below). They are also required to abide by Software Carpentry's Code of Conduct.

Contact: Please mail dwinston@lbl.gov for more information.


Schedule

10:00 Managing data with MongoDB
11:15 Coffee
12:30 Lunch break
13:30 Managing data with SQL
14:45 Coffee
16:00 Wrap-up

Etherpad: http://pad.software-carpentry.org/swc-lbnl-2016-04-07
We will use this Etherpad for chatting, taking notes, and sharing URLs and bits of code.


Syllabus

Managing Data with MongoDB

  • Set up a local server
  • Connect via Python and a Jupyter Notebook
  • Import and Insert Data
  • Find Data; Specify Conditions with Operators
  • Sort Results
  • Update and Remove Data
  • Data Aggregation
  • Indexes
  • Reference...

Managing Data with SQL

  • Reading and sorting data
  • Filtering with where
  • Calculating new values on the fly
  • Handling missing values
  • Combining values using aggregation
  • Combining information from multiple tables using join
  • Creating, modifying, and deleting data
  • Programming with databases
  • Reference...
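To give a sense of the SQL topics above, here is a minimal sketch using Python's built-in sqlite3 module against a tiny in-memory database. The site and reading tables and their values are hypothetical, made up for illustration; the lessons themselves work with survey.db, which has its own schema.

```python
import sqlite3

# Hypothetical toy tables for illustration only; survey.db has its own schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE site (name TEXT PRIMARY KEY, lat REAL);
CREATE TABLE reading (site TEXT, quant TEXT, value REAL);
INSERT INTO site VALUES ('DR-1', -49.85), ('DR-3', -47.15);
INSERT INTO reading VALUES
    ('DR-1', 'rad', 9.82),
    ('DR-1', 'sal', 0.25),
    ('DR-3', 'rad', 7.80),
    ('DR-3', 'sal', NULL);
""")

# Reading, filtering with WHERE, and sorting with ORDER BY.
rows = conn.execute(
    "SELECT site, value FROM reading WHERE quant = 'rad' ORDER BY value DESC"
).fetchall()

# Calculating a new value on the fly, skipping missing (NULL) readings.
scaled = conn.execute(
    "SELECT site, value * 100 FROM reading "
    "WHERE quant = 'sal' AND value IS NOT NULL"
).fetchall()

# Aggregation: COUNT(*) counts every row, COUNT(value) skips NULLs.
counts = conn.execute("SELECT COUNT(*), COUNT(value) FROM reading").fetchone()

# Join: combine information from both tables.
joined = conn.execute(
    "SELECT site.name, site.lat, reading.value "
    "FROM site JOIN reading ON site.name = reading.site "
    "WHERE reading.quant = 'rad' ORDER BY site.name"
).fetchall()

conn.close()
print(rows, scaled, counts, joined, sep="\n")
```

Note how the missing salinity reading for DR-3 changes the two counts: NULL handling is one of the points the lesson covers.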

Setup

To participate in this workshop, you will need the software described below. In addition, you will need an up-to-date web browser and the following four files:

  1. mongo-novice-materials.json
  2. periodic_table.json
  3. survey.db
  4. gen-survey-database.sql
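Before the workshop, you may want to confirm that all four files ended up in one folder. The sketch below is one way to do that with Python's standard library; missing_files is a hypothetical helper, not part of the workshop materials, and it assumes you point it at (or run it from) the folder where you saved the files.

```python
from pathlib import Path

# The four workshop files listed above.
REQUIRED_FILES = [
    "mongo-novice-materials.json",
    "periodic_table.json",
    "survey.db",
    "gen-survey-database.sql",
]


def missing_files(folder="."):
    """Return the required file names that are not present in `folder`."""
    base = Path(folder)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All workshop files found.")
```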

We maintain a list of common installation issues on the Configuration Problems and Solutions wiki page. It was compiled as a reference for instructors, but you may find it useful too.

MongoDB

MongoDB is a document-oriented database system for storing, searching, and analyzing data, and it can handle large, complex data sets.

Windows

Go to https://www.mongodb.org/downloads and download the MSI installer. The official installation tutorial walks through how to determine which version of the installer will be best for your system. Run the installer, accept the license, and choose a “Complete” installation.

Once installed, open a PowerShell window and type

md \data\db

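to create the default data directory for MongoDB to use on your system. Then add the MongoDB binaries directory to your Path environment variable by typing the following (adjust 3.2 to match the version you installed; this change lasts only for the current PowerShell session):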

$env:Path += ";C:\Program Files\MongoDB\Server\3.2\bin"

and then start the server (called the mongo daemon) by typing in

mongod

Now open another PowerShell window, update its Path again, and this time run mongo (no d at the end) to open a connection to your running server:

$env:Path += ";C:\Program Files\MongoDB\Server\3.2\bin"
mongo

Type help and hit enter, and you should see a listing of commands. Type exit and hit enter to leave the MongoDB shell and return to the PowerShell prompt.

Mac OS X

Homebrew, “the missing package manager for OS X”, installs binary packages based on published “formulae”. To install Homebrew, open a Terminal window and enter the following (as a single line):

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

If you already have Homebrew installed, run brew update instead to update its package database.

To install the MongoDB binaries, run the following at a Terminal prompt:

brew install mongodb

Use brew services to launch the MongoDB daemon (mongod) and ensure it relaunches when your system restarts:

brew tap homebrew/services
brew services restart mongodb

Run mongo at a terminal prompt to open a connection to your running server. Type help and hit enter, and you should see a listing of commands. Type exit and hit enter to exit the MongoDB shell.

Linux

The procedure is broadly similar to the one for Mac OS X above. Consult https://docs.mongodb.org/manual/administration/install-on-linux/ for MongoDB setup instructions specific to your Linux distribution, then check that everything works as described in the Mac OS X instructions.
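In addition to the mongo shell check in the platform instructions, you can test from Python, on any platform and without extra packages, whether a server is accepting connections. This sketch assumes mongod is using its default port, 27017; server_is_listening is a hypothetical helper, and a successful result only shows that something is listening on that port, not that it is MongoDB.

```python
import socket


def server_is_listening(host="localhost", port=27017, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    27017 is mongod's default port. Success means *something* accepted
    the connection; it does not prove that the listener is MongoDB.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    if server_is_listening():
        print("Something is listening on localhost:27017; mongod is probably running.")
    else:
        print("Nothing on localhost:27017; is mongod running?")
```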

SQLite

SQL is a specialized programming language used with databases. We use a simple database manager called SQLite in our lessons.

Windows

The Software Carpentry Windows Installer installs SQLite for Windows. If you used the installer to configure nano, you don't need to run it again.

Mac OS X

SQLite comes pre-installed on Mac OS X.

Linux

SQLite is often pre-installed on Linux. If the sqlite3 command is not available, install it with your distribution's package manager.

If you installed Anaconda, note that its copy of SQLite lacks readline support, so line editing and command history may not work in the sqlite3 shell. Instructors will provide a workaround if needed.
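As an extra check that survey.db is readable, and as a preview of programming with databases, you can open it from Python's built-in sqlite3 module, which does not go through the interactive sqlite3 shell and so is unaffected by the readline issue. This is a minimal sketch; list_tables is a hypothetical helper, and it assumes survey.db sits in the current folder unless you pass another path.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path="survey.db"):
    """Return the sorted table names in an existing SQLite database file."""
    path = Path(db_path)
    # sqlite3.connect silently creates a new, empty database when the file
    # does not exist, so check for the file first.
    if not path.is_file():
        raise FileNotFoundError(db_path)
    conn = sqlite3.connect(str(path))
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    print("SQLite library version:", sqlite3.sqlite_version)
    try:
        print("Tables in survey.db:", list_tables())
    except FileNotFoundError:
        print("survey.db not found in the current folder.")
```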