Set Up a Bioinformatics Demultiplex Server from Scratch

Install Demultiplex Software

Installing demultiplexing software such as bcl2fastq, CellRanger, LongRanger, demuxlet, and whatever else pops up holds a special place in the hearts (and potential support groups) of those who do bioinformatics and genomics. It has been enough of an issue in my professional life that I thought I would dedicate a series to setting up servers for different analysis types.

Don't install system packages

This is my big chance to go on a total rant about bioinformatics servers!

Don't install all kinds of software as system packages. Ok? Just don't do it. It may not backfire on you today, or tomorrow, but someday it will!

I'm going to make a few caveats to that. Things like zlib, openssl, and ssh are fine. I'll even cheat sometimes and yum install some development tools. Mostly, what I am talking about here is bioinformatics software. Don't bother installing bcl2fastq, blast, augustus, R, python, dask, or pretty much anything else as system dependencies.

There are better...

Continue Reading...

AWS Elastic Compute Clusters for Genomics

aws bioinformatics hpc Oct 30, 2019

Amazon Web Services (AWS) gives you the ability to build out all kinds of cool infrastructure. Databases, web servers, Kubernetes backed applications, Spark clusters, machine learning models, and even High-Performance Computing Clusters with AWS ParallelCluster.

 

Not just clusters, but Elastic Clusters!

One of the cooler aspects of using a cloud provider like AWS is the ability to scale up and down based on requests or need. This is generally called elasticity, and it applies to a whole lot of services: storage, Kubernetes, load balancers, and compute clusters. This is first of all just awesome, because writing something yourself to scale up or down based on demand would be a major pain, and elasticity can give you the best of all worlds.
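To make "scale up or down based on demand" a bit more concrete, here is a toy sketch of just the scaling decision. It is not how AWS ParallelCluster or any other AWS service works internally; the function name desired_nodes, the jobs_per_node parameter, and the default limits are all assumptions made up for illustration. The genuinely painful parts (provisioning nodes, draining them, handling failures) are left out entirely.

```python
import math


def desired_nodes(pending_jobs: int, jobs_per_node: int = 1,
                  min_nodes: int = 0, max_nodes: int = 20) -> int:
    """Return how many compute nodes a toy elastic cluster should be running.

    All parameters are hypothetical knobs for this sketch, not real AWS settings.
    """
    if jobs_per_node < 1:
        raise ValueError("jobs_per_node must be at least 1")
    # Enough nodes to hold every pending job...
    needed = math.ceil(pending_jobs / jobs_per_node)
    # ...but never below the floor or above the ceiling.
    return max(min_nodes, min(max_nodes, needed))


# Hypothetical queue depths over the life of one analysis run
for pending in (0, 5, 20, 100, 0):
    print(pending, "pending jobs ->", desired_nodes(pending), "nodes")
```

The clamp between min_nodes and max_nodes is the whole trick: the cluster grows when the queue fills up, shrinks back when it empties, and never runs away with your budget.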

Example Genomic Analysis Computational Needs

Let's say you're running a genomics analysis. First, you run your alignment, which takes (for the sake of argument) 20 nodes. Then you do variant calling, which takes 5 compute nodes, haplotype...

Continue Reading...

Deploy HPC Modules From Bioconda Packages

The Struggle is Real

I have been working in Bioinformatics for nearly 10 years, mostly on the computational side of things. I have spent a lot of that time building and installing software. Some of those wounds will never heal! Luckily, along came Anaconda, the scientific distribution of Python, along with the awesome BioConda, whose team took on the task of making bioinformatics software installable with relative ease! I don't know if Anaconda necessarily wanted to make life easier for those installing software on HPC systems, but in any case they did.

(Disclaimer: I am technically a core team member of BioConda, but I'm really kind of a slacker core member and the real credit goes to the rest of the team!)

Deploy Modules with EasyBuild

One of my main goals in life is to deploy conda packages as HPC Modules. Deploying HPC Modules can be a bit of a pain. There are a lot of naming conventions, environment variables, file permissions, recursive file...

Continue Reading...