Bioinformatics Solutions on AWS for Exploratory Analysis

This is part 1 of a series I have in the works about Bioinformatics Solutions on AWS. Each part of the series will stand on its own, so you won't be missing anything by reading one part and not the others.

So let's dive right in!

Lay of the Land

To keep things simple, I am going to say exploratory analyses are analyses that are completed in the shell, either with a programming language such as Python or R, or just at the terminal.

(Any other Bioinformaticians remember just how much we all used to use sed and awk? ;-) )
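Those one-liners still earn their keep. As a quick illustration (the file name seqs.fa and its contents are made up for the example), here's the classic count-the-sequences-in-a-FASTA pattern in both grep and awk:

```shell
# Create a tiny FASTA file to work with (hypothetical example data).
printf '>seq1\nACGT\n>seq2\nGGCC\n' > seqs.fa

# Count sequences: every FASTA record header starts with ">".
grep -c '^>' seqs.fa                      # prints 2
awk '/^>/ {n++} END {print n}' seqs.fa    # same count, the awk way
```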

There can be some visualization here, but we'll draw the line at anything that can't be displayed in a JupyterHub notebook or a JupyterLab instance.

In another post I'll discuss Production Analyses.

AWS Solutions for Exploratory Analyses

Common Infrastructure Components

It's always a good idea to look for terms such as "auto-scaling", "elastic" or "on demand" when using AWS. This means that there is some smart mechanism baked in that only uses the resources when you need...


Get a Fully Configured Apache Airflow Docker Dev Stack with Bitnami

apache airflow Aug 02, 2020

I've been using Apache Airflow for around 2 years now to build out custom workflow interfaces, like those used for Laboratory Information Management Systems (LIMS), computer vision pre- and post-processing pipelines, and to set and forget other genomics pipelines.

My favorite feature of Airflow is how completely agnostic it is to the work you are doing or where that work is taking place. It could take place locally, on a Docker image, on Kubernetes, on any number of AWS services, on an HPC system, etc. Using Airflow allows me to concentrate on the business logic of what I'm trying to accomplish without getting too bogged down in implementation details.

During that time I've adopted a set of systems that I use to quickly build out the main development stack with Docker and Docker Compose, using the Bitnami Apache Airflow stack. Generally, I deploy to production either with the same Docker Compose stack, if it's a small enough isolated instance, or with Kubernetes when I...
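As a rough sketch of how that dev stack comes up (assuming Docker and Docker Compose are installed, and that Bitnami still publishes its Airflow compose file at this path - check their repo if it has moved):

```shell
# Grab Bitnami's Apache Airflow Docker Compose file and start the stack.
curl -LO https://raw.githubusercontent.com/bitnami/containers/main/bitnami/airflow/docker-compose.yml
docker compose up -d

# Confirm the services (webserver, scheduler, worker, PostgreSQL, Redis) are running.
docker compose ps
```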


Deploy Dash with Helm on Kubernetes with AWS EKS

aws dash eks kubernetes python Jul 14, 2020

Today we are going to be talking about deploying a Dash app on Kubernetes with a Helm chart, using the AWS managed Kubernetes service, EKS.

For this post, I'm going to assume that you have an EKS cluster up and running because I want to focus more on the strategy behind a real-time data visualization platform. If you don't, please check out my detailed project template for building AWS EKS Clusters.
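With the cluster in place, the deploy itself is short. Here's a sketch, assuming a cluster named my-eks-cluster and a local Helm chart for the Dash app in ./dash-chart (both names are hypothetical):

```shell
# Point kubectl at the EKS cluster.
aws eks update-kubeconfig --region us-east-1 --name my-eks-cluster

# Install (or upgrade) the Dash app from the local chart.
helm upgrade --install dash-app ./dash-chart

# Watch the pods come up; the label follows the standard Helm chart convention.
kubectl get pods -l app.kubernetes.io/instance=dash-app
```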

Dash is a data visualization platform written in Python.

Dash is the most downloaded, trusted framework for building ML & data science web apps.

Dash empowers teams to build data science and ML apps that put the power of Python, R, and Julia in the hands of business users. Full stack apps that would typically require a front-end, backend, and dev ops team can now be built and deployed in hours by data scientists with Dash.

If you'd like to know what the Dash people say about Dash on Kubernetes you can read all about that here.

Pretty much though, Dash is...


RShiny Authentication with Polished on AWS Kubernetes

aws kubernetes r rshiny Jun 28, 2020

If you're looking for a hassle-free way to add authentication to your RShiny apps you should check out Polished. In their own words:

Polished is an R package that adds authentication, user management, and other goodies to your Shiny apps. The hosted service is the quickest and easiest way to use polished. It provides a hosted API and dashboard so you don't need to worry about setting up and maintaining the polished API yourself. With it you can stop worrying about boilerplate infrastructure and get back to building your core Shiny application logic. It is a hosted solution for adding authentication to your RShiny apps, and it is completely free for the first 10 users, which gives you plenty of freedom to play around with it.

In this post we'll go over how to:

  • Quickly and easily get started with the hosted solution and grab your API credentials
  • Test out your API credentials with a pre-built Docker container
  • Deploy an AWS Kubernetes (EKS) Cluster
  • ...

Manage High Content Screening CellProfiler Pipelines with Apache Airflow

If you are running a High Content Screening Pipeline you probably have a lot of moving pieces. As a non-exhaustive list, you need to:

  • Trigger CellProfiler analyses, whether from a LIMS system, by watching a filesystem, or through some other process.
  • Keep track of dependencies between CellProfiler analyses - first run an illumination correction, and then your analysis.
  • If you have a large dataset and you want it analyzed sometime this century, you need to split your analysis, run the splits, and then gather the results.
  • Once you have results you need to decide on a method of organization: put your data in a database and set up in-depth analysis pipelines.

These tasks are much easier to accomplish when you have a system or framework that is built for scientific workflows.

If you prefer to watch I have a video where I go through all the steps in this tutorial.

Enter Apache Airflow

Apache Airflow is:

Airflow is a platform created by the community to programmatically author, schedule and...


Deploy and Scale your Dask Cluster with Kubernetes

Dask is a parallel computing library for Python. I think of it as being like MPI without actually having to write MPI code, which I greatly appreciate!

Dask natively scales Python

Dask provides advanced parallelism for analytics, enabling performance at scale for the tools you love

One of the cooler aspects of Dask is that you can scale across computers/servers/nodes/pods/containers/etc. This is why I say it's like MPI.

Here's what we'll be talking about today:

  • Advantages to using Kubernetes
  • Disadvantages to using Kubernetes
  • Install the Dask Helm Chart
  • Scale your Dask Workers Up / Down with kubectl scale
  • Modify the Dask Helm Chart to add Extra Packages
  • Autoscale your Dask Workers with Horizontal Pod AutoScalers
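The install-and-scale parts of that list can be sketched in a few commands, using the Dask community Helm chart (the release name my-dask is hypothetical, and the exact worker deployment name depends on your release - check kubectl get deployments):

```shell
# Add the Dask Helm repository and install the chart.
helm repo add dask https://helm.dask.org
helm repo update
helm install my-dask dask/dask

# Scale the worker deployment up or down by hand.
kubectl scale deployment my-dask-worker --replicas=5
```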

Benefits of Dask on Kubernetes

Let's talk about some of the (many!) benefits to using Kubernetes!

Customizable Configuration

Another very important aspect of Dask, at least for me, is that I can set it up so that the infrastructure side of things is completely...


Setup a High Content Screening Imaging Platform with Label Studio

For a few years now I have been on a quest to find a tool I really like for annotating HCS images using a web interface. I've used several tools, including a desktop application called LabelImg, and I have finally found a tool that checks all the boxes: Label Studio!

Label Studio is an annotation tool for images, audio, and text. Here we'll be concentrating on images as our medium of choice.

I go through the process in this video.


Grab the data

You can, of course, use your own data, but for this tutorial I will be using a publicly available C. elegans dataset from the Broad BioImage Benchmark Collection.

mkdir data
cd data
wget
unzip

HCS images are often very dark when opened in a system viewer. To use them for the rest of the pipeline we will have to do a two-step conversion process, first using bftools to convert from tif -> png, and then using ImageMagick to do a levels...
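That two-step conversion might look like the following, assuming bftools and ImageMagick are installed, and using a hypothetical input file image.tif (the exact level percentages are just an illustration):

```shell
# Step 1: Bio-Formats' bfconvert turns the TIFF into a PNG.
bfconvert image.tif image.png

# Step 2: ImageMagick stretches the levels so the dark HCS image becomes visible.
convert image.png -level 0%,10% image_bright.png
```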


Deploy RShiny with Kubernetes using AWS EKS and Terraform


Today we will be deploying the rocker/shiny image on AWS's managed Kubernetes service, EKS.

If you're following along with the deploy RShiny on AWS Series, you'll know that I covered deploying RShiny with a helm chart. Today, I want to go deeper into deploying RShiny on EKS, along with some tips and tricks that I use for my everyday deployments.

If you'd like to learn more about deploying RShiny please consider checking out my FREE Deploy RShiny on AWS Guide!

An Extremely Brief Introduction to Kubernetes

Kubernetes is kind of a beast, and people constantly complain that it's extremely complicated to get started with. They would be correct, but I'm here to give the 2-minute rundown of what you need to know to deploy your RShiny (or Dash, Flask, Django, Ruby on Rails, etc.) application on Kubernetes. This is because Kubernetes is not magical, and it's not even that new. It's a very nice abstraction layer on...


Deploy RShiny with the Rocker/Shiny Docker Image

docker r rshiny Apr 13, 2020

If you are deploying RShiny with Docker you can roll your own image, or you can use the rocker image from Docker Hub.

Which solution you go for will depend upon your own needs. Normally, I am using a lot of bioinformatics and/or data science packages and nothing really beats the conda ecosystem for that in my opinion. On the other hand having an image with RShiny already installed is really nice! I also quite like the way that the rocker/shiny image is configured. You drop your files into a directory and as long as you have the packages you need your shiny app will start up directly!
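That drop-your-files-in workflow looks roughly like this (the ./myapp directory name is hypothetical; rocker/shiny serves whatever it finds under /srv/shiny-server):

```shell
# Run the rocker/shiny image with a local app directory mounted in.
docker run --rm -p 3838:3838 \
  -v "$(pwd)/myapp":/srv/shiny-server/myapp \
  rocker/shiny

# The app is then served at http://localhost:3838/myapp
```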


Deploy a simple shiny app using the rocker/shiny image

Here is our usual shiny app, mostly stolen from the R Shiny docs. ;-)

#!/usr/bin/env Rscript
# This example comes from the r-shiny examples github repo.
#

Deploy RShiny on Kubernetes with a Helm Chart

helm kubernetes r rshiny Apr 01, 2020

If you want a very robust solution to deploy your RShiny applications, AWS's managed Kubernetes service, EKS, is a great choice! Kubernetes has, for now, fully dominated the DevOps and deployment space when it comes to deploying containerized services.


Should I Use Kubernetes to Deploy my RShiny App?

As with all things, there are pros and cons to using Kubernetes.


Cons:

  • Kubernetes is a beast to get started with.
  • You need to have your RShiny App packaged with Docker.


Pros:

  • Kubernetes is incredibly well supported across a range of providers, including AWS, GCP, Azure, Digital Ocean, etc.
  • Kubernetes is complicated, but with complicated comes armies of engineers who want to abstract the complicated away into configuration files. So if you stick with the growing pains, it's likely you will get a lot of benefit and things will get less painful over time.
  • Kubernetes has...
