Deploy your RShiny App Locally with Docker

docker r rshiny Dec 10, 2019

My favorite way to deploy an RShiny app locally is to simply package it into a Docker image and run it. Running any application in Docker makes it easily transportable, and it's a generally accepted way of distributing applications in 2019.

This solution does not require Shiny Server or shinyapps.io. It's not that I have anything against either of those solutions; I just tend to stick to a few favorite deployment methods to keep my head from spinning straight off my body. ;-)

If you're not familiar with Docker, I have a FREE course available. The first module is plenty to get you up and running and can be completed in an hour or two. For the most part, if you can use a command line, you can use Docker.

Package your R Shiny App in Docker

Now that we've covered some housekeeping, let's get started building your Docker image. Like any project, you want to have all your relevant code in one directory. This way it is accessible to the docker build process.
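Before we get to the directory layout, here's the flavor of what we're building toward: a minimal sketch of a Dockerfile that runs the app straight from R, with no Shiny Server involved. The base image, port, and app/ path are illustrative assumptions, not necessarily what the full post uses.

    # Minimal sketch: serve the Shiny app directly with R (no shiny-server)
    FROM rocker/r-ver:3.6.1

    # Install shiny plus whatever packages your app needs
    RUN R -e "install.packages('shiny', repos = 'https://cran.rstudio.com')"

    # Copy your app directory (app.R, or ui.R/server.R) into the image
    COPY app /app

    EXPOSE 8080

    # Listen on all interfaces so the app is reachable from outside the container
    CMD ["R", "-e", "shiny::runApp('/app', host = '0.0.0.0', port = 8080)"]

From there, docker build -t my-shiny-app . followed by docker run -p 8080:8080 my-shiny-app should give you an app at http://localhost:8080.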

Project Directory Structure 

...
Continue Reading...

Learn Apache Airflow By Example – 4 Part Series

apacheairflow docker python Dec 02, 2019

Introduction

I've been having a blast the last few months learning Apache Airflow. It's become an indispensable tool in my getting-stuff-done toolbox.

Follow me on Twitter @jillianerowe, or send me a message with questions, topic suggestions, ice cream recommendations, or general shenanigans. 


Get the Source Code

...

Continue Reading...

Deploy your RShiny App on AWS Series - Lightsail

aws r rshiny Nov 24, 2019

If your head is spinning over which deployment scenario to choose for your RShiny app, look no further! I have a whole series planned out for you on various deployment options and why you should choose each one.

Why should you use AWS Lightsail for RShiny Deployment?

Deployment scenarios are like snowflakes. No two are exactly alike! ;-) You need different levels of power and control for different deployment scenarios. Here we are going to talk about RShiny deployments, but the same thinking applies to just about everything.

Lightsail is a relatively recent addition to the whole AWS deployment ecosystem. It makes deploying a web application much simpler and more streamlined than some of AWS's other, more powerful solutions.

Lightsail would be a good choice for deployment for you if:

  • You don't feel comfortable deploying web applications or configuring web servers (Apache, NGINX).
  • You are just fine with configuring web servers, but you need something quick and easy.
  • You have a smallish...
Continue Reading...

Set Up a Bioinformatics Demultiplex Server from Scratch

Install Demultiplex Software

Installing demultiplexing software such as bcl2fastq, CellRanger, LongRanger, demuxlet, and whatever else pops up holds a special place in the hearts (and potential support groups) of those who do Bioinformatics and Genomics. It has been enough of an issue in my professional life that I thought I would dedicate a series to setting up servers for different analysis types.

Don't install system packages

This is my big chance to go on a total rant about bioinformatics servers!

Don't install all kinds of software as system packages. Ok? Just don't do it. It may not backfire on you today, or tomorrow, but someday it will!

I'm going to add a few caveats to that. Things like zlib, openssl, and ssh are fine. I'll even cheat sometimes and yum install some development tools. Mostly, what I am talking about here is bioinformatics software. Don't bother installing bcl2fastq, blast, augustus, R, python, dask, or pretty much anything else as system dependencies.

There are better...
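To give a taste of one of those better ways (a sketch; bwa and samtools are just stand-ins for whatever your server actually needs), a conda environment pulled from the Bioconda and conda-forge channels keeps everything isolated from the system:

    # Create an isolated environment instead of polluting system packages
    conda create --name demux-tools -c conda-forge -c bioconda bwa samtools

    # Activate it only when you need it; the base system stays clean
    conda activate demux-tools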

Continue Reading...

AWS Elastic Compute Clusters for Genomics

aws bioinformatics hpc Oct 30, 2019

Amazon Web Services (AWS) gives you the ability to build out all kinds of cool infrastructure: databases, web servers, Kubernetes-backed applications, Spark clusters, machine learning models, and even High-Performance Computing Clusters with AWS ParallelCluster.


Not just clusters, but Elastic Clusters!

One of the cooler aspects of using a cloud provider like AWS is the ability to scale up and down based on requests or need. This is generally called Elastic, and it applies to a whole lot of services: storage, Kubernetes, load balancers, and compute clusters. This is, first of all, just awesome, because writing something yourself that scales up or down based on demand would be a major pain, and elasticity gives you the best of all worlds.
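As a concrete taste of that elasticity, here is roughly what it looks like in an AWS ParallelCluster config (v2-era syntax; the numbers and section name are illustrative assumptions): start with zero compute nodes and let the cluster grow to a cap as jobs hit the queue.

    [cluster default]
    scheduler = slurm
    # start with no compute nodes at all
    initial_queue_size = 0
    # scale out to at most 20 nodes under load
    max_queue_size = 20
    # let idle nodes terminate instead of keeping them around
    maintain_initial_size = false

Jobs landing in the queue spin nodes up, idle nodes get torn down, and you only pay for what you actually use.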

Example Genomic Analysis Computational Needs

Let's say you're running a genomics analysis. First, you run your alignment, which takes (for the sake of argument) 20 nodes. Then you do variant calling, which takes 5 compute nodes, haplotype...

Continue Reading...

Deploy HPC Modules From Bioconda Packages

The Struggle is Real

I have been working in Bioinformatics for nearly 10 years, mostly on the computational side of things. I have spent a lot of that time building and installing software. Some of those wounds will never heal! Luckily, along came Anaconda, the scientific distribution of Python, together with the awesome Bioconda, which took on the task of making bioinformatics software installable with relative ease! I don't know if Anaconda necessarily set out to make life easier for those installing software on HPC systems, but in any case they did.

(Disclaimer: I am technically a core team member of Bioconda, but I'm really kind of a slacker core member, and the real credit goes to the rest of the team!)

Deploy Modules with EasyBuild

One of my main goals in life is to deploy conda packages as HPC Modules. Deploying HPC Modules can be a bit of a pain. There are a lot of naming conventions, environment variables, file permissions, recursive file...
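To give a flavor of where this is going, here is a rough sketch of an easyconfig that leans on EasyBuild's Conda easyblock to turn a Bioconda package into a module. The package, versions, and fields here are illustrative; check the EasyBuild docs for the exact parameters your version supports.

    # samtools-1.9.eb -- illustrative easyconfig using the Conda easyblock
    easyblock = 'Conda'

    name = 'samtools'
    version = '1.9'

    homepage = 'https://www.htslib.org/'
    description = "Tools for working with high-throughput sequencing data"

    toolchain = SYSTEM

    # conda does the heavy lifting; EasyBuild wraps the result in a module
    requirements = 'samtools=1.9'
    channels = ['bioconda', 'conda-forge']

    dependencies = [('Miniconda3', '4.7.10')]

    moduleclass = 'bio'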

Continue Reading...

Dask Tips and Tricks – HighLevelGraphs

Uncategorized Oct 23, 2019

Dask is an open source project in Python that allows you to scale your code on your laptop or on a cluster. Not only does it have a very clear syntax, you can also declare your order of operations in a data structure. This is a feature I was very interested in, as this tends to be the use case I am tasked with most often. It's cool stuff!

For those of you who have written MPI, this is kind of like that, except you don't have to write MPI! If you would like to know more about basic Dask syntax, check out my blog post on Parallelizing For Loops with Dask.

Dask Syntax

Normally when using Dask you wrap dask.delayed around a function call, and then, once all those calls are queued up, you tell Dask to compute your results. This is great, and I really like this syntax, but what about when you are fed a list of tasks and need to somehow feed these to Dask?

That is where a HighLevelGraph comes in!

Dask HighLevelGraphs

Dask HighLevelGraphs allow you to define a data...
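As a taste (a toy sketch, not code from the post), here is roughly what building one by hand and handing it to a scheduler looks like. The layer names and functions are made up for illustration:

    import dask
    from dask.highlevelgraph import HighLevelGraph

    def load(name):
        return name.upper()

    def process(x):
        return x + "!"

    # Two layers of tasks; each process task depends on the matching load task
    layers = {
        "load": {("load", i): (load, n) for i, n in enumerate(["a", "b"])},
        "process": {("process", i): (process, ("load", i)) for i in range(2)},
    }
    dependencies = {"load": set(), "process": {"load"}}

    graph = HighLevelGraph(layers, dependencies)

    # A HighLevelGraph is still a plain task-graph mapping, so a scheduler can run it
    print(dask.get(graph, [("process", 0), ("process", 1)]))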

Continue Reading...

Dask Tips and Tricks – Parallelize a For Loop

Uncategorized Oct 20, 2019

If you're in the scientific computing space there's a good chance you've used Python. A relatively recent addition to the family of awesome libraries in Python for Scientific Computing is Dask. It is a super cool library that allows you to parallelize your code with a very simple and straightforward syntax. There are a few aspects of this library that especially call to me.

In no particular order, here they are!

  • It can be dropped into an existing codebase with little to no drama! Dask is meant to wrap around existing code and simply decide what can be executed asynchronously (see the sketch after this list).
  • Dask can scale from your laptop to an entire compute cluster, without writing MPI code! How cool is that?
  • Dask can parallelize data structures we already know and love, such as numpy arrays and data frames.
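Here's the flavor of that drop-in pattern (a toy sketch, not code from the post): take the function your loop already calls, wrap it in dask.delayed, and compute the whole batch at once.

    import dask

    @dask.delayed
    def score(x):
        # stand-in for whatever your loop body already does
        return x ** 2

    # Nothing runs yet; this just queues up lazy tasks
    lazy_results = [score(i) for i in range(10)]

    # Execute the whole batch in parallel
    results = dask.compute(*lazy_results)
    print(results)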

For those of you sitting here saying, but Spark can do all that: why yes, it can, but I don't find Spark nearly as easy to drop into an existing codebase as Dask. Also, I like to...

Continue Reading...

Dask on HPC

Recently I saw that Dask, a distributed Python library, created some really handy wrappers for running Dask projects on a High-Performance Computing (HPC) cluster.

Most people who use HPC are pretty well versed in technologies like MPI, and just generally abusing multiple compute nodes all at once, but I think technologies like Dask are really going to be game-changers in the way we all work. Because really, who wants to write MPI code or vectorize?

If you've never heard of Dask and its awesomeness before, I think the easiest way to get started is to look at their Embarrassingly Parallel Example, and don't listen to the haters who think speeding up for loops is lame. It's a superpower!

Onward with examples!

Client and Scheduler

Firstly, these examples are all pretty much borrowed from the Dask Job Queues page. Pretty much what you do is write your Python code as usual. Then, when you need to scale across nodes, you leverage your HPC scheduler to get you some...
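In the spirit of those docs, here's a minimal sketch assuming a SLURM cluster and the dask-jobqueue package; the resource numbers are placeholders:

    from dask.distributed import Client
    from dask_jobqueue import SLURMCluster

    # Each job submitted to SLURM becomes a Dask worker
    cluster = SLURMCluster(
        cores=8,
        memory="16GB",
        walltime="01:00:00",
    )

    # Ask the HPC scheduler for 10 workers
    cluster.scale(10)

    # Point a regular Dask client at the cluster; the rest of your code stays the same
    client = Client(cluster)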

Continue Reading...

Deploy Bioinformatics Modules on HPC

Uncategorized Sep 22, 2019

Deploying scientific software in an HPC environment can be challenging. Deploying bioinformatics software anywhere, let alone on a High Performance Computing (HPC) cluster, can be especially challenging!

Life, at least in this regard, has become so much better in the last few years. Anaconda, the scientific Python distro, along with Conda, the package manager and builder of awesomeness, made deploying software so much more streamlined. There are amazing groups contributing packages to conda. It's become a whole ecosystem of people working on infrastructure, software, and packaging. Bioconda and Conda-Forge are two great groups that have added a ton of value to communities that use scientific software. EasyBuild gives you some pretty great capabilities for deploying your software as modules, which makes it available to everybody!
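The payoff, once packages are built and deployed as modules, is the workflow HPC users already know. The module name and version here are hypothetical:

    # See what's available, load it, and go
    module avail samtools
    module load samtools/1.9
    samtools --version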

Disclaimer - I am a core team member of Bioconda, but I'm kind of a slacker member and they are awesome all on...

Continue Reading...

DevOps for Data Scientists Weekly Tutorials

Subscribe to the newsletter! You'll get a weekly tutorial on all the DevOps you need to know as a Data Scientist: build Python apps with Docker, design and deploy complex analyses with Apache Airflow, build computer vision platforms, and more.