
Lawrence Berkeley National Laboratory

Mar 2, 2016

9:00 am - 4:00 pm

Instructors: Greg Wilson, Donny Winston

Helpers: Joey Montoya, Shreyas Scholia

General Information

Software Carpentry's mission is to help scientists and engineers get more research done in less time and with less pain by teaching them basic lab skills for scientific computing. This hands-on workshop will cover basic concepts and tools for effective data management using SQL and MongoDB. Participants will be encouraged to help one another and to apply what they have learned to their own research problems.

For more information on what we teach and why, please see our paper "Best Practices for Scientific Computing".

Who: The course is aimed at graduate students and other researchers. You don't need to have any previous knowledge of the tools that will be presented at the workshop.

Where: Bldg 54 Rm 130, 1 Cyclotron Road, Berkeley, CA. Get directions with OpenStreetMap or Google Maps.

Requirements: Participants must bring a laptop with a few specific software packages installed (listed below). They are also required to abide by Software Carpentry's Code of Conduct.

Contact: Please email dwinston@lbl.gov for more information.


Schedule

09:00 Managing data with SQL
10:30 Coffee
12:00 Lunch break
13:00 Managing data with MongoDB
14:30 Coffee
16:00 Wrap-up

Etherpad: http://pad.software-carpentry.org/swc-lbnl-2016-03-02
We will use this Etherpad for chatting, taking notes, and sharing URLs and bits of code.


Syllabus

Managing Data with SQL

  • Reading and sorting data
  • Filtering with where
  • Calculating new values on the fly
  • Handling missing values
  • Combining values using aggregation
  • Combining information from multiple tables using join
  • Creating, modifying, and deleting data
  • Programming with databases
  • Reference...
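You can preview several of the SQL topics above from Python using the standard-library `sqlite3` module. The sketch below is a minimal illustration, not the lesson's own material. The `Person` and `Survey` tables and their values are invented for this example and are not the contents of `survey.db`. It filters with `WHERE`, sorts with `ORDER BY`, joins two tables, and aggregates with `AVG`, which skips the `NULL` (missing) reading.

```python
import sqlite3


def demo():
    """Run a few illustrative queries against a throwaway in-memory database."""
    # ":memory:" keeps everything in RAM; nothing is written to disk.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Hypothetical tables loosely modelled on a field-survey dataset.
    cur.execute("CREATE TABLE Person (id TEXT PRIMARY KEY, family TEXT)")
    cur.execute("CREATE TABLE Survey (person TEXT, quant TEXT, reading REAL)")
    cur.executemany("INSERT INTO Person VALUES (?, ?)",
                    [("p1", "Smith"), ("p2", "Jones")])
    cur.executemany("INSERT INTO Survey VALUES (?, ?, ?)",
                    [("p1", "rad", 9.82), ("p1", "rad", 7.8),
                     ("p2", "rad", 1.46), ("p2", "sal", None)])  # None -> NULL
    conn.commit()

    # Filtering with WHERE and sorting with ORDER BY; "?" placeholders
    # let sqlite3 substitute values safely instead of pasting strings.
    high = cur.execute(
        "SELECT person, reading FROM Survey "
        "WHERE quant = ? AND reading > ? ORDER BY reading DESC",
        ("rad", 5.0)).fetchall()

    # Combining tables with JOIN and aggregating with AVG (NULLs are ignored).
    averages = cur.execute(
        "SELECT Person.family, AVG(Survey.reading) FROM Survey "
        "JOIN Person ON Survey.person = Person.id "
        "GROUP BY Person.family ORDER BY Person.family").fetchall()

    conn.close()
    return high, averages
```

Calling `demo()` returns the two high radiation readings for `p1` and one average per family. The average for `Jones` uses only the non-missing value.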

Managing Data with MongoDB

  • Set up a local server
  • Connect via Python and a Jupyter Notebook
  • Import and Insert Data
  • Find Data; Specify Conditions with Operators
  • Sort Results
  • Update and Remove Data
  • Data Aggregation
  • Indexes
  • Reference...
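With pymongo, MongoDB queries are written as plain Python dictionaries and lists. The sketch below shows the general shape of a filter that uses an operator, a sort, an aggregation pipeline, and an index. It is an illustration under assumptions rather than the lesson's code. The document layout (`site`, `quant`, `reading` fields) and all function names are hypothetical. Only `run_queries` and `ensure_reading_index` need a live server.

```python
# Hypothetical document layout assumed throughout:
#   {"site": "DR-1", "quant": "rad", "reading": 9.82}


def readings_above(quant, threshold):
    """Filter document for find(): one quantity, reading greater than threshold."""
    return {"quant": quant, "reading": {"$gt": threshold}}


def mean_reading_by_site(quant):
    """Aggregation pipeline: average reading per site for one quantity."""
    return [
        {"$match": {"quant": quant}},
        {"$group": {"_id": "$site", "mean_reading": {"$avg": "$reading"}}},
        {"$sort": {"_id": 1}},  # 1 = ascending by site name
    ]


def run_queries(collection, quant="rad", threshold=5.0):
    """Run both queries against a pymongo Collection (needs a running server)."""
    # -1 sorts descending, the same value as pymongo.DESCENDING.
    high = list(collection.find(readings_above(quant, threshold))
                .sort("reading", -1))
    means = list(collection.aggregate(mean_reading_by_site(quant)))
    return high, means


def ensure_reading_index(collection):
    """Create a compound index that supports the filter and sort above."""
    # create_index returns the generated index name, e.g. "quant_1_reading_-1".
    return collection.create_index([("quant", 1), ("reading", -1)])
```

With a server running, `run_queries(MongoClient()["survey"]["readings"])` should execute both queries. The database name `survey` and the collection name `readings` are placeholders.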

Setup

To participate in this workshop, you will need the software described below. You will also need an up-to-date web browser and the following two files:

  1. survey.db
  2. gen-survey-database.sql
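After downloading `survey.db`, you can quickly confirm that the file opens and contains tables. The sketch below is a suggestion, not part of the official setup. The function name `list_tables` is our own, and it assumes the file is in the current working directory unless you pass another path. Because it opens the file read-only through an SQLite URI, a mistyped path raises an error instead of silently creating a new, empty database.

```python
import sqlite3


def list_tables(path="survey.db"):
    """Return the table names in an existing SQLite file, opened read-only."""
    # mode=ro makes connect() fail if the file is missing, rather than
    # creating an empty database the way a plain connect(path) would.
    conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    # Each row is a one-element tuple; unpack it to a plain string.
    return [name for (name,) in rows]
```

An empty list, or an `sqlite3.OperationalError`, suggests that the download did not complete or that the path is wrong.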

We maintain a list of common installation issues on the Configuration Problems and Solutions wiki page. It is intended as a reference for instructors, but you may also find it useful.

SQLite

SQL is a specialized programming language used with databases. We use a simple database manager called SQLite in our lessons.

Windows

The Software Carpentry Windows Installer installs SQLite for Windows. If you used the installer to configure nano, you don't need to run it again.

Mac OS X

SQLite comes pre-installed on Mac OS X.

Linux

SQLite comes pre-installed on most Linux distributions.

If you installed Anaconda, note that it also includes its own copy of SQLite, built without readline support, so command history and line editing may not work in it. Instructors will provide a workaround if needed.

MongoDB + Python

Python is a popular language for scientific computing, and great for general-purpose programming as well. Installing all of its scientific packages individually can be a bit difficult, so we recommend Anaconda, an all-in-one installer.

Regardless of how you choose to install it, please make sure you install Python version 3.x (e.g., 3.5 is fine).
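To confirm which major version of Python you are running, you can use the short check below. It is a sketch, and the function name `check_python` is our own.

```python
import sys


def check_python(min_major=3):
    """Return True if the running interpreter is Python min_major.x or newer."""
    return sys.version_info.major >= min_major


# Prints a warning only when the interpreter is older than Python 3.
if not check_python():
    print("This workshop needs Python 3.x; found", sys.version.split()[0])
```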

We will teach MongoDB through its official Python driver (pymongo) using the Jupyter (formerly IPython) Notebook, a programming environment that runs in a web browser. For this to work you will need a reasonably up-to-date browser. Current versions of Chrome, Safari and Firefox are all supported. Some older browsers, including Internet Explorer 9 and below, are not.
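Before the workshop, you can check in a notebook cell that pymongo is installed and that a local MongoDB server is answering. The sketch below assumes the server listens on the default `localhost:27017`. The function name `mongodb_ready` and the two-second timeout are our own choices. It uses the server's `ping` command.

```python
def mongodb_ready(uri="mongodb://localhost:27017/", timeout_ms=2000):
    """Return True if pymongo is importable and a MongoDB server answers a ping."""
    try:
        from pymongo import MongoClient
        from pymongo.errors import PyMongoError
    except ImportError:
        print("pymongo is not installed")
        return False

    # serverSelectionTimeoutMS bounds how long we wait for a reachable server.
    client = MongoClient(uri, serverSelectionTimeoutMS=timeout_ms)
    try:
        client.admin.command("ping")
        return True
    except PyMongoError as exc:
        print("could not reach MongoDB:", exc)
        return False
    finally:
        client.close()
```

If the server is running with default settings, `mongodb_ready()` should return `True`. Otherwise, the printed message indicates whether the driver or the server is missing.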

For detailed setup instructions for MongoDB and Python, including the packages needed for our lesson, click here.