Manage High Content Screening CellProfiler Pipelines with Apache Airflow

If you are running a High Content Screening Pipeline you probably have a lot of moving pieces. As a non exhaustive list you need to:

  • Trigger CellProfiler Analyses, either from a LIMS system, by watching a filesystem, or some other process.
  • Keep track of dependencies of CellProfiler Analyses - first run an illumination correction and then your analysis.
  • If you have a large dataset and you want to get it analyzed sometime this century you need to split your analysis, run, and then gather the results.
  • Once you have results you need to decide on a method of organization. You need to put your data in a database and set up in depth analysis pipelines.

These tasks are much easier to accomplish when you have a system or framework that is built for scientific workflows.

If you prefer to watch I have a video where I go through all the steps in this tutorial.

Enter Apache Airflow

Apache Airflow is :

Airflow is a platform created by the community to programmatically author, schedule and...

Continue Reading...

50% Complete

DevOps for Data Scientists Weekly Tutorials

Subscribe to the newsletter! You'll get a weekly tutorial on all the DevOps you need to know as a Data Scientist. Build Python Apps with Docker, Design and Deploy complex analyses with Apache Airflow, build computer vision platforms, and more.