Setup a MySQL + Python Docker Dev Stack

Uncategorized Apr 05, 2019

Learning Curve

The Pain

The first time I set up an app to connect to a MySQL database I spent at least a full hour fiddling with that connection string, before throwing my hands up in frustration and taking a walk. I find walking to be very therapeutic, so I calmed down and figured out that I was mixing up the ports. I had a MySQL container and a Node.js container. I already had something running on port 3306 on that computer, so I exposed the port on MySQL as 3307, but tried to connect to it in the Node.js container as localhost:3307.

Figuring it Out

Now I can say, well dummy, all the containers in a docker-compose stack talk to one another because docker does magic with networking, and the hostname is the same as the service name, and the port is the default internal port of the application. Hindsight and all that. 
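To make the port mix-up concrete, here is a minimal Python sketch of the two vantage points. The service name, credentials, database name, and the "3307:3306" port mapping are assumptions for illustration, not values from the original stack.

```python
# Hypothetical values for illustration only.
SERVICE_NAME = "mysql"   # service name in docker-compose.yml, which doubles as the hostname
INTERNAL_PORT = 3306     # port MySQL listens on inside its container
PUBLISHED_PORT = 3307    # host port mapped to 3306, e.g. ports: "3307:3306"


def mysql_url(user: str, password: str, database: str, inside_compose: bool) -> str:
    """Build a connection URL for a client inside or outside the compose stack."""
    if inside_compose:
        # Another container in the same stack: service name + internal port.
        host, port = SERVICE_NAME, INTERNAL_PORT
    else:
        # A client on the host machine: localhost + published port.
        host, port = "localhost", PUBLISHED_PORT
    return f"mysql://{user}:{password}@{host}:{port}/{database}"


print(mysql_url("app", "secret", "appdb", inside_compose=True))
print(mysql_url("app", "secret", "appdb", inside_compose=False))
```

The published port only matters to clients on the host. Inside a container, localhost is that container itself, which is why localhost:3307 found nothing listening.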

Onwards with an Example!

Docker-Compose networking MAGIC

If you read my learning curve shenanigans above you will generally...

Continue Reading...

Creating a Custom CellProfiler Docker Image

Uncategorized Mar 25, 2019


If you've ever worked with scientific software you will know that installing it is not necessarily straightforward. I think this is changing quite a bit with tools like conda and docker, but sometimes we need to just sit down and debug an installation. There is hope here, because if you can get it working just once, and put it in a docker container, you don't ever have to worry about getting it working on another server!


CellProfiler is a free open-source software designed to enable biologists without training in computer vision or programming to quantitatively measure phenotypes from thousands of images automatically. More information can be found in the CellProfiler Wiki
CellProfiler GitHub

Cool stuff. CellProfiler is an industry standard, and beyond useful for many scientists. It's totally worth packaging up into a docker container, and there is in fact one available from the wonderful people at CellProfiler. I needed to...

Continue Reading...

Setup PyCharm to use a Project Interpreter in a Docker Compose Service

Uncategorized Mar 22, 2019


If you are anything like me, you heavily rely on a debugger. Likewise, you install all your development environments into a docker container because installing things on your own computer is FOR THE BIRDS. I cannot tell you what a relief it is to have all my development environments safely in docker containers. I break operating systems frequently. It's a serious problem for me and I needed an intervention. For me, this intervention was docker. I do not care how often I break a docker container, because 1. it's versioned, and 2. I can always just rebuild it. MAGICAL.

The Problem

Many IDEs and editors assume you are working from a local environment. I think this is changing with so many people moving towards cloud computing and technologies like docker. I don't have an exact list of IDEs that can and cannot use an interpreter from a docker or docker-compose environment.

The Solution (Well, a solution anyways!)

The JetBrains IDEs support...

Continue Reading...

Apache Airflow Tutorial – Part 4 DAG Patterns


During the previous parts in this series, I introduced Apache Airflow in general, demonstrated my docker dev stack, and built out a simple linear DAG definition. I want to wrap up the series by showing a few other common DAG patterns I regularly use.

In order to follow along, get the source code!

Bring up your Airflow Development Environment

unzip
cd airflow-template
docker-compose up -d
docker-compose logs airflow_webserver

This will take a few minutes to get everything initialized, but once it's up you will see something like this:

DAG Patterns

I use 3 main DAG patterns: Simple (shown in Part 3), Linear, and Gather. Of course, once you master these patterns, you can combine them to make much more complex pipelines.

Simple DAG Pattern

What I call a simple pattern (and I have no idea if any of these patterns have official names) is a chain of tasks where each task depends upon the previous task. In this...

Continue Reading...

Apache Airflow Tutorial – Part 3 Start Building


If you've read this far you should have a reasonable understanding of the Apache Airflow layout and be up and running with your own docker dev environment. Well done!  This part in the series will cover building an actual simple pipeline in Airflow.

Start building by getting the source code!

Build a Simple DAG

The simplest DAG is a list of tasks, where each task depends upon the previous task. If you've spun up the airflow instance and taken a look, it looks like this:

Now, if you're asking why I would choose making an ice cream sundae as my DAG, you may need to reevaluate your priorities.

Generally, if you order ice cream, the lovely deliverer of the ice cream will first ask you what kind of cone (or cup, you heathen) you want, then your flavor (or flavors!), what toppings, and then will put them all together into sweet, creamy, cold, deliciousness.
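Before touching Airflow, the sundae ordering can be sketched as a plain dependency chain. This is not Airflow code: it is a standard-library illustration using graphlib, and the task names are hypothetical ones made up from the description above.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical task names based on the sundae description.
tasks = ["choose_cone", "choose_flavor", "choose_toppings", "make_sundae"]

# Map each task to the set of tasks it depends on: here, just the one before it.
dependencies = {task: {previous} for previous, task in zip(tasks, tasks[1:])}

# A topological sort yields an order in which every task runs after its dependency.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)
```

Because every task has exactly one upstream task, there is only one valid run order, which is what makes this the simplest kind of DAG.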

You would accomplish this awesomeness with the following Airflow code:


Continue Reading...

Apache Airflow Tutorial – Part 2 Install with Docker

Install Apache Airflow With Docker Overview

In this part of the series I will cover how to get a nice Apache Airflow instance up and running with docker. You won't need to have anything installed locally besides docker, which is fantastic, because configuring all these pieces individually would be kind of awful!

This is the exact same setup and configuration I use for my own Apache Airflow instances. When I run Apache Airflow in production I don't use Postgres in a docker container, as that is not recommended, but this setup is absolutely perfect for dev and will very closely match your production requirements!

Following along with a blog post is great, but the best way to learn is to just jump in and start building. Get the Apache Airflow Docker Dev Stack here.

Celery Job Queue

Getting an Apache Airflow instance up and running looks very similar to setting up a Celery instance. This is because Airflow uses Celery behind the scenes to execute tasks. Read more...

Continue Reading...

Apache Airflow Tutorial – Part 1 Introduction

What is Apache Airflow?

Briefly, Apache Airflow is a workflow management system (WMS). It groups tasks into analyses, and defines a logical template for when these analyses should be run. Then it gives you all kinds of amazing logging, reporting, and a nice graphical view of your analyses. I'll let you hear it directly from the folks at Apache Airflow:

Apache Airflow is a platform to programmatically author, schedule and monitor workflows.

Use airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Rich command line utilities make performing complex surgeries on DAGs a snap. The rich user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed.

When workflows are defined as code, they become more maintainable, versionable, testable, and collaborative.

Source - ...

Continue Reading...

Setting up a Local Spark Development Environment using Docker

Every time I want to get started with new tech I figure out how to get a stack up and running that resembles a real-world production instance as closely as possible.

This is a get up and running post. It does not get into the nitty gritty details of developing with Spark, since I am only just getting comfortable with Spark myself. Mostly I wanted to get up and running, and write a post about some of the issues that came up along the way.


What is Spark?

Spark is a distributed computing library with support for Java, Scala, Python, and R. It's what I refer to as a world domination technology, where you want to do lots of computations, and you want to do it fast. You can run anything from embarrassingly parallel computations, such as parallelizing a for loop, to complex workflows, and there is support for distributed machine learning as well. You can transparently scale out your computations to not only multiple cores, but even multiple machines by creating a spark cluster....

Continue Reading...

Deploy a Celery Job Queue With Docker – Part 2 Deploy with Docker Swarm on AWS


In Part 1 of this series we went over the Celery Architecture, how to separate out the components in a docker-compose file, and laid the groundwork for deployment.

Deploy With AWS CloudFormation

This portion of the blog post assumes you have an SSH key set up. If you don't, go to the AWS docs here.

What is CloudFormation?

AWS CloudFormation is an infrastructure design tool that allows users to design their infrastructure by defining file systems, compute requirements, networking, etc. If you have no interest in designing infrastructure, you probably don't need to worry. CloudFormation configurations are shareable through templates.

Docker AWS CloudFormation

Getting Started

Docker has come to our rescue here, with a Docker for AWS CloudFormation template. This will, with the click of a few buttons, deploy a docker swarm on AWS for us!! 

Click on the page, and scroll down to quick start. Under 'Stable Channel' select '...

Continue Reading...

Deploy a Celery Job Queue With Docker – Part 1 Develop


In this post I will hopefully show you how to organize a large docker-compose project, specifically a project related to a job queue. In this instance we will use Celery, but hopefully you can see how the concepts relate to any project with a job queue, or just a large number of moving pieces.

This post will be in two parts. The first will give a very brief overview of celery, the architecture of a celery job queue, and how to set up a celery task, worker, and celery flower interface with docker and docker-compose. Part 2 will go over deployment using docker-swarm.


What is Celery?

Celery is a distributed job queuing system that allows us to queue up oodles of tasks, and execute them as we have resources.
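The queue-then-execute idea can be sketched with nothing but the Python standard library. This is a stand-in for the concept, not Celery's API: the helper names (worker, run_jobs, square) are hypothetical, and a real Celery setup puts a message broker between the producer and workers that may live on other machines.

```python
import queue
import threading


def worker(jobs: queue.Queue, results: queue.Queue) -> None:
    """Pull jobs off the queue until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: no more work for this worker
            break
        func, arg = job
        results.put((arg, func(arg)))


def run_jobs(func, args, n_workers: int = 2) -> dict:
    """Queue one job per argument and let a fixed pool of workers drain the queue."""
    jobs: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(jobs, results))
        for _ in range(n_workers)
    ]
    for thread in threads:
        thread.start()
    for arg in args:
        jobs.put((func, arg))   # queue up the tasks
    for _ in threads:
        jobs.put(None)          # one sentinel per worker
    for thread in threads:
        thread.join()           # wait until every job has been processed
    collected = {}
    while not results.empty():
        arg, value = results.get()
        collected[arg] = value
    return collected


def square(x: int) -> int:
    return x * x


print(run_jobs(square, range(5)))
```

The number of workers caps how many tasks run at once, so extra tasks simply wait in the queue until a worker is free.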

From - 

Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.
The execution units, called tasks, are executed...
Continue Reading...
