Amazon Web Services (AWS) gives you the ability to build out all kinds of cool infrastructure: databases, web servers, Kubernetes-backed applications, Spark clusters, machine learning models, and even High-Performance Computing clusters with AWS ParallelCluster.
One of the cooler aspects of using a cloud provider like AWS is the ability to scale up and down based on requests or need. This is generally called elasticity, and it applies to a whole lot of services: storage, Kubernetes, load balancers, and compute clusters. This is first of all just awesome, because writing something yourself to scale up or down based on demand would be a major pain. It also gives you the best of both worlds: plenty of capacity when you need it, and no idle machines when you don't.
Let's say you're running a genomics analysis. First, you run your alignment, which takes (for the sake of argument) 20 nodes. Then you do variant calling, which takes 5 compute nodes, then haplotype calling, which takes a million (just kidding! mostly!), and then some manner of QC or reporting that brings you back down to 5.
Genomics analysis is especially well suited to elastic compute environments because the workloads tend to be very spiky. Data comes from the sequencer and BOOM, oodles of compute power are needed, followed by an iota of compute power, followed by just a tad, back to oodles, and so on and so forth. Nobody wants to keep track of all those nodes needing to be brought up or down. Systems that rely on humans to do tedious work and generally have their acts together are mostly doomed.
If you didn't have elastic capabilities, you would need to choose some fixed number between 5 and 20 compute nodes. The right number would depend on a lot of variables. How quickly do you need to deliver results? Is it worth the cost to have a bunch of nodes lying idle?
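To make that tradeoff concrete, here is a rough sketch in Python. The node counts (20 for alignment, 5 for variant calling, 5 for QC) come from the example above, but the stage durations are made up, and the model assumes perfectly linear scaling, which real pipelines rarely achieve. Haplotype calling is left out because the example gives no real number for it. Treat it as a back-of-the-envelope illustration, not a cost model.

```python
# Hypothetical stages: (name, nodes the stage can use, hours at that width).
# Node counts come from the example in the text; the durations are invented.
STAGES = [
    ("alignment", 20, 2.0),
    ("variant calling", 5, 4.0),
    ("qc/reporting", 5, 1.0),
]


def fixed_cluster(size):
    """Return (wall_hours, idle_node_hours) for a fixed cluster of `size` nodes.

    Assumes perfectly linear scaling: a stage that wants more nodes than
    the cluster has simply runs proportionally longer.
    """
    wall = 0.0
    idle = 0.0
    for _name, nodes, hours in STAGES:
        used = min(size, nodes)           # can't use more nodes than exist
        stage_hours = nodes * hours / used
        wall += stage_hours
        idle += (size - used) * stage_hours
    return wall, idle


for size in (5, 10, 20):
    wall, idle = fixed_cluster(size)
    print(f"{size:>2} nodes: {wall:.1f} h wall time, {idle:.1f} idle node-hours")
```

Under these assumptions a small cluster wastes nothing but takes the longest, while a large one finishes fastest and spends most of its time idle. That is exactly the choice a fixed-size cluster forces on you.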
If you go with a cloud provider that has elastic capabilities, you don't need to choose. It's perfect because it's very straightforward. You submit your job that requires 20 nodes, and 20 nodes spin up. Then, since variant calling only needs 5, 15 of those spin down. More nodes spin up for your haplotype calling, and then spin back down for QC.
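The spin-up and spin-down sequence can be sketched in a few lines of Python. This is only an illustration of the bookkeeping an elastic cluster does for you, not how AWS implements scaling. The 20 and 5 node figures come from the example above; the haplotype-calling figure (40) is a made-up stand-in, and `scaling_plan` is a hypothetical helper name.

```python
# Node demand per stage, in order. 20 and 5 come from the example in the
# text; the haplotype-calling figure (40) is an invented stand-in.
DEMAND = [
    ("alignment", 20),
    ("variant calling", 5),
    ("haplotype calling", 40),
    ("qc/reporting", 5),
]


def scaling_plan(demand, start=0):
    """Return (stage, delta) pairs: positive = spin up, negative = spin down."""
    plan = []
    current = start
    for stage, wanted in demand:
        plan.append((stage, wanted - current))
        current = wanted
    return plan


for stage, delta in scaling_plan(DEMAND):
    action = "spin up" if delta > 0 else "spin down" if delta < 0 else "hold"
    print(f"{stage}: {action} {abs(delta)} nodes")
```

The point is that nobody has to compute or act on those deltas by hand. The scheduler sees what the queued jobs need and the cluster follows.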
If you're interested in building out an elastic cluster for yourself, your team, or your startup, check out the AWS ParallelCluster tool. It allows you to create SLURM, PBS, or SGE clusters for traditional HPC, or AWS Batch clusters for containerized workloads. You can mount EFS storage, start cron jobs, run RStudio Server instances, add users, install bioconda packages, submit Dask worker processes, and do anything else you would do with a normal Linux server or an EC2 instance.
In my next post I will go through an example of building out a SLURM cluster on AWS.
As always, world domination and happy teching!