Category Archives: Database

Building a Data Pipeline in Python – Part 5 of N – Database, ORM & SQLAlchemy

Adding data to your database

Many people focusing on ETL will eventually be utilizing a database. We will be examining a relational database, SQLite in this case, to store and process our data. If you are not a SQL expert, this can be a daunting task. Most relational databases require you to know keys, indices, relationships, data types, etc. While you still need an understanding of these to do things properly, you do not need to write the SQL when utilizing an object relational mapper (ORM) such as SQLAlchemy in Python.

While the ORM handles a lot of the operations, there is one other very important thing to keep in mind about an ORM, these types of tools can utilize many types of databases. In our case, we’re using SQLite, but if you needed to switch it over to MySQL or SQL Server, you wouldn’t have to change your code! *This is mostly true, some operations are available in certain databases but not in others.*

SQLAlchemy will write all of the SQL behind the scenes for you and this type of abstraction can be extremely powerful for those who do not need extremely high performance reads / writes.

In this example, we will take a look at a script that would take your data and insert it into a database utilizing SQLAlchemy. For simplicity, we are only going to utilize a few columns of data to create tables for: country, orders, status.

If you have not gone through the previous posts, please do so in order to understand where we are at in terms of functionality. For convenience, we moved all of the analysis into one folder within our repository to keep life simple.

Current thread posts on this topic:

  1. INTRODUCTION – PART 1
  2. DATA EXPLORATION – PART 2
  3. TESTING DATA – PART 3
  4. BASIC REPORTING – PART 4
  5. DATABASE, ORM & SQLALCHEMY – PART 5
Continue reading