The notes represent a distinctly practical, hands-on approach to learning both pandas and SQL. While these two topics may be touched upon in traditional database courses (or introduction to data science courses), delving deeply into how to efficiently write modern code using these tools is often a lower priority compared to other topics. These notes approach learning the basics of data management from a different angle — you need to be comfortable with directly manipulating data before you can easily internalize additional concepts.

This course has been taught at undergraduate, graduate, and executive certificate levels in a variety of course structures.

This page contains the most recent version of those notes, currently taught to students in the MS-CAPP Program at the University of Chicago.

The data used in this course can be found in the repository here. Below is the table of contents with links to specific chapters. Please note that some sections are works in progress or TBD.

A combined PDF with all the notes is available here. Specific chapters are listed below, but please be aware that links in the chapter-specific PDFs are not currently functional.

.

Table of Contents

Introduction and Errata Introduction
Relational Databases
Rows and Columns
  • What is a Relational Database
  • Selecting Columns
  • WHERE: Filtering rows
  • Null
  • ORDER BY and LIMIT
  • Column Numbering
  • Where are we: A Note on Scope
Chapter 1
Basic Manipulations
  • Types
  • Renaming a Column
  • Basic Mathematical Manipulations, ABS and LEAST/GREATEST
  • Queries without a FROM Clause and Singletons
  • String Functions: LEFT, RIGHT, LOWER, UPPER, LENGTH, TRIM and CONCAT
  • ROUND and Changing Types (CAST)
  • CAST and changing types
Chapter 2
Subqueries, Distinct & Case
  • Query Evaluation Order: SELECT and WHERE
  • Comparisons: BETWEEN, LIKE and ILIKE
  • CASE: Conditional Logic
  • The DISTINCT Operator
  • Subqueries (IN, ANY, ALL)
  • Correlated Subqueries
Chapter 3
Database Internals: Transactions
  • REDO / COMBINE NEXT SECTIONS
  • Table Creation and Deletion
  • Database Operations: CRUD
  • Creating Tables, Constraints and Deleting tables
  • Altering Tables
  • Inserting, Copying, Updating and Deleting
  • Transactions and ACID
  • Isolation Levels in Relational Databases
  • Why do we care (NoSQL)?
  • NoSQL
  • Transaction Implementations [TBD]
Chapter 4
Aggregations
  • Introduction to MTA data set
  • GROUP BY clause
  • Column numbering syntax
  • Aggregates and CASE Statements
  • Named Subqueries
Chapter 5
Dates and Types
  • Date Types
  • Date Functions
  • Hard GROUP BY problems
Chapter 6
Averages
  • The Trouble with Averages
  • HAVING
  • COALESCE and NVL
Chapter 7
Joins
  • Joins
  • UNION and UNION ALL
  • Best Practices when Combining Tables
  • Intermediate Joins
  • Statistical Analysis in SQL
Chapter 8
Advanced Joins
  • The Shape of Data
  • Revenue over time & Advanced Joins
Chapter 9
Analytic Functions & CTE's
  • Analytic Functions
  • Using Analytic Functions with Transaction Data
  • Common Table Expressions (``CTE")
  • CTEs with the transaction data
Chapter 10
Database Internals: Performance Evaluation
  • Normalization
  • Views
  • Information Schema
  • Performance Considerations
  • Index
  • Distributed Systems and the CAP Theorem
Chapter 11
Extensions [TBD]
  • More Advanced Joins
  • OLAP: Cube and Rollup
  • Schemas
  • Keys
  • Data Exploration Strategies
  • Query Strategies
Chapter 12
Interview Hints
  • Interview Hints
  • Example Interview #1
  • Example Interview #2
  • Example Interview #3
  • Example Interview #4
Chapter 13
Pandas
Introduction
  • What is Pandas
  • Data structures
  • Selecting Columns and Rows
  • Column Types Conversion
  • Dealing with NaN
  • Choosing the largest and smallest values
  • Manipulating Data & Method Chaining
  • Indexes: Creating and Dropping
  • Views and Copies
Chapter 14
More Manipulations and Types
  • Sorting DataFrames
  • Dealing with Duplicates
  • Using Type specific functions
  • CASE style statements and the ``isin" operator
  • Regex Pattern Matching
Chapter 15
Aggregations
  • Introduction to the MTA dataset
  • Simple Aggregations
  • GroupBy Objects
  • Advanced Index / Multiindex
  • If not indexes...
  • Indexing with aggregations, a big Gotcha
Chapter 16
Joins
  • Helpful Table / Review
  • Merging data in Pandas
  • Complex Join Conditions
  • Stacking Data
  • Lags and Leads
  • Apply, map and applymap: Advanced Transformations
Chapter 17
Window Functions
  • Window Functions in Pandas
  • Some gotchas
  • Reshaping Data: Transpose, Stack and Unstack
  • A Bunch of stuff to clean up
  • Combining with the original DataFrame
  • Moving the Window
  • Pivot / Melt
Chapter 18
Appendix
Data Dictionaries
  • Introduction
  • Iowa Fleet data
  • NY MTA Data
  • Daily Stock Data: s2010 and s2011
  • Annual Fundamental Financial information: fnd
  • Soap Transaction Data
Appendix A
Connecting SQL to Python or R
  • Connecting to any database: ODBC and JDBC
  • Connecting only to PostgreSQL
Appendix B
Assignments
Example Exams