Deploy HPC Modules From Bioconda Packages

bioinformatics easybuild hpc Oct 28, 2019

The Struggle is Real

I have been working in bioinformatics for nearly 10 years, mostly on the computational side of things, and I have spent a lot of that time building and installing software. Some of those wounds will never heal! Luckily, along came Anaconda, the scientific Python distribution, and the awesome BioConda project, which took on the task of making bioinformatics software relatively easy to install! I don't know if Anaconda set out to make life easier for those of us installing software on HPC systems, but either way, it did.

(Disclaimer: I am technically a core team member of BioConda, but I'm kind of a slacker core member, and the real credit goes to the rest of the team!)

Deploy Modules with EasyBuild

One of my main goals in life is to deploy conda packages as HPC Modules. Deploying HPC Modules can be a bit of a pain: there are naming conventions, environment variables, file permissions, recursive file permissions, and tons of other stuff I don't want to deal with.

In fact, I really shouldn't be dealing with it because any system that relies on me actually memorizing anything and having my act together is just doomed.

Anyway, I was introduced to EasyBuild a few years ago and have since abused it, mostly to install BioConda packages. It has so, so much more functionality than what I use it for, and I recommend you check it out!

Generating the Configs

EasyBuild works from templates called EasyConfigs, which get parsed by some awesome Python code and turned into installed software with ready-to-use HPC Modules.
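To make "parsed by Python code" concrete: an EasyConfig file is essentially a list of plain Python assignments. The toy reader below executes such text into a dictionary and checks for the parameters EasyBuild treats as mandatory (`name`, `version`, `homepage`, `description`, `toolchain`). It is only a sketch of the idea, not EasyBuild's real parser (which also handles templates, defaults, and validation); `read_easyconfig` is a hypothetical name, and the `SYSTEM` value mirrors EasyBuild's system toolchain constant as an assumption.

```python
# Toy reader for EasyConfig-style text. This is NOT EasyBuild's parser; it only
# shows that an EasyConfig is a set of plain Python assignments.
MANDATORY = ("name", "version", "homepage", "description", "toolchain")

# Constant exposed to the config text (assumed to mirror EasyBuild's SYSTEM toolchain).
CONSTANTS = {"SYSTEM": {"name": "system", "version": "system"}}


def read_easyconfig(text):
    """Run EasyConfig text and return its top-level assignments as a dict."""
    namespace = dict(CONSTANTS)
    # exec runs arbitrary code, so only use this on trusted files.
    exec(text, {}, namespace)
    params = {key: value for key, value in namespace.items() if key not in CONSTANTS}
    missing = [p for p in MANDATORY if p not in params]
    if missing:
        raise ValueError("missing mandatory parameters: " + ", ".join(missing))
    return params


example = """
easyblock = 'Conda'
name = 'trimmomatic'
version = '0.39'
homepage = 'https://example.org/trimmomatic'  # placeholder URL
description = 'Placeholder summary'
toolchain = SYSTEM
"""

config = read_easyconfig(example)
print(config["name"], config["version"], config["toolchain"]["name"])  # trimmomatic 0.39 system
```

Because the config is executed as code, real tooling needs far more care than this sketch; the point is only that the fields are ordinary Python values.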

For YEARS I have been meaning to build a tool that would let me easily spit out these configs, and I finally have! Woooo.


Use the Script

First of all, this script comes with the usual disclaimers. I wrote it mostly for my own use case and didn't consider other people's. I have tested and used it, but it isn't the most robust of tools! 😉 With all that out of the way, there are three basic ways you might want to use this script.

Generate an EasyConfig from a Single Conda or BioConda Package

If, for instance, you check the BioConda recipe for Trimmomatic, you will see it has all sorts of fun information, including a name, version, homepage, and summary (most of the time, anyway). As it turns out, this is exactly the information we need to build our EasyConfig, and it's available through the Anaconda Client API!
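As a rough illustration of how those four fields (plus the channel) could become an EasyConfig, here is a sketch that fills a text template. The layout is an assumption, not the author's script output: it targets EasyBuild's generic `Conda` easyblock with its `requirements` and `channels` parameters, adds `conda-forge` because BioConda packages generally depend on it, and takes the conda-providing build dependency as an argument whose name and version in the usage line are placeholders. `CondaPackage` and `render_easyconfig` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CondaPackage:
    """The metadata fields mentioned above, plus the channel the package lives in."""
    channel: str
    name: str
    version: str
    homepage: str
    summary: str


# Hypothetical EasyConfig layout. It assumes EasyBuild's generic 'Conda' easyblock
# with its 'requirements' and 'channels' parameters; adjust to your site's setup.
TEMPLATE = """easyblock = 'Conda'

name = {name!r}
version = {version!r}

homepage = {homepage!r}
description = {summary!r}

toolchain = SYSTEM

# EasyBuild fills in the %(name)s and %(version)s templates
requirements = "%(name)s=%(version)s"
channels = [{channel!r}, 'conda-forge']

builddependencies = [{conda_module!r}]

moduleclass = 'bio'
"""


def render_easyconfig(pkg, conda_module):
    """Return EasyConfig text for one conda package.

    conda_module is a (name, version) tuple for the module that provides conda.
    """
    return TEMPLATE.format(
        name=pkg.name,
        version=pkg.version,
        homepage=pkg.homepage,
        summary=pkg.summary,
        channel=pkg.channel,
        conda_module=conda_module,
    )


pkg = CondaPackage("bioconda", "trimmomatic", "0.39",
                   "https://example.org/trimmomatic", "Placeholder summary")
# ("Miniconda3", "4.7.10") is a placeholder for whatever conda module your site provides
print(render_easyconfig(pkg, ("Miniconda3", "4.7.10")))
```

Using `!r` in the template lets Python quote each value, so a summary containing quotes still yields a valid Python-syntax EasyConfig.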

# Create a single EasyConfig for Trimmomatic Version 0.39
python ./ module -p bioconda/trimmomatic/0.39

# Create a single EasyConfig for Trimmomatic with the Latest Version
python ./ module -p bioconda/trimmomatic

# Create EasyConfigs for Trimmomatic, FastQC, and Samtools
python ./ module -p bioconda/samtools/1.9 \
bioconda/trimmomatic/0.39 bioconda/fastqc
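The `-p` arguments above follow a `channel/name[/version]` pattern, where leaving off the version means "use the latest". A minimal sketch of splitting such a spec might look like this; `parse_package_spec` and `PackageSpec` are hypothetical names, not taken from the author's script.

```python
from typing import NamedTuple, Optional


class PackageSpec(NamedTuple):
    channel: str
    name: str
    version: Optional[str]  # None means "use the latest version"


def parse_package_spec(spec):
    """Split a 'channel/name[/version]' string such as 'bioconda/trimmomatic/0.39'."""
    parts = spec.split("/")
    # Exactly two or three non-empty parts are allowed.
    if len(parts) not in (2, 3) or not all(parts):
        raise ValueError(f"expected channel/name[/version], got {spec!r}")
    version = parts[2] if len(parts) == 3 else None
    return PackageSpec(parts[0], parts[1], version)


for spec in ["bioconda/samtools/1.9", "bioconda/trimmomatic/0.39", "bioconda/fastqc"]:
    print(parse_package_spec(spec))
```

A `None` version is the signal to ask the Anaconda API for the latest release before writing the EasyConfig.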

Generate a Bundle EasyConfig for Modules 

If you have existing modules, conda-based or not, and you would like to load them all with a single module load command, you can use the bundle subcommand. Note that this syntax is different from the package syntax used above, because we are not querying the Anaconda API.

When using this syntax, you must specify both the name and the version of every module you list.

python ./ bundle -n qc -v 1.0 \
-m trimmomatic=0.39 fastqc=0.11.8

This would create a qc/1.0 module, which when loaded would also load the trimmomatic/0.39 and the fastqc/0.11.8 modules. Less typing wooooo!
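Here is a sketch of what the bundle EasyConfig behind `qc/1.0` could look like. It assumes EasyBuild's `Bundle` easyblock, with the existing modules listed as `dependencies` so that loading the bundle module loads them too; the exact output of the author's script may differ. `parse_module_spec` and `render_bundle` are hypothetical names, the homepage in the usage line is a placeholder (EasyBuild requires one), and the module names must match your existing modules exactly.

```python
def parse_module_spec(spec):
    """Split 'name=version' (e.g. 'fastqc=0.11.8') into a (name, version) tuple."""
    name, sep, version = spec.partition("=")
    if not (sep and name and version):
        raise ValueError(f"expected name=version, got {spec!r}")
    return name, version


def render_bundle(name, version, modules, homepage):
    """Return a Bundle EasyConfig whose module loads every listed module."""
    description = "Bundle of " + ", ".join(modules)
    lines = [
        "easyblock = 'Bundle'",
        "",
        f"name = {name!r}",
        f"version = {version!r}",
        "",
        f"homepage = {homepage!r}",
        f"description = {description!r}",
        "",
        "toolchain = SYSTEM",
        "",
        "# Dependencies are loaded whenever the bundle module is loaded",
        "dependencies = [",
    ]
    lines += [f"    {parse_module_spec(m)!r}," for m in modules]
    lines += ["]", "", "moduleclass = 'bio'", ""]
    return "\n".join(lines)


print(render_bundle("qc", "1.0", ["trimmomatic=0.39", "fastqc=0.11.8"],
                    homepage="https://example.org/qc"))  # placeholder homepage
```

Since the bundle only declares dependencies, EasyBuild has nothing to compile for it; the resulting module file mainly loads the listed modules.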

Generate a Bundle EasyConfig from Conda Packages

You can also just jump straight into generating a Bundle from a list of conda packages. This command will create the EasyConfigs for each conda package and the EasyConfig for the bundle.

python ./ bundle -n qc -v 1.0 \
-p bioconda/trimmomatic/0.39 bioconda/fastqc
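Conceptually, this combined command has to settle a version for every package (asking for the latest when none is given), then write one EasyConfig per package plus one for the bundle that depends on them. The sketch below plans just that step under stated assumptions: `plan_conda_bundle` is a hypothetical name, the `latest_versions` dictionary is a stand-in for the Anaconda API lookup (its value here is pretend data), and file names follow EasyBuild's `<name>-<version>.eb` convention for the system toolchain.

```python
def plan_conda_bundle(bundle_name, bundle_version, package_specs, latest_versions):
    """Work out which EasyConfig files a conda-package bundle needs.

    package_specs use the 'channel/name[/version]' form; latest_versions is a
    hypothetical stand-in for the Anaconda API lookup used when no version is given.
    Returns (easyconfig file names, bundle dependency list).
    """
    dependencies = []
    for spec in package_specs:
        parts = spec.split("/")
        if len(parts) not in (2, 3):
            raise ValueError(f"expected channel/name[/version], got {spec!r}")
        name = parts[1]
        version = parts[2] if len(parts) == 3 else latest_versions[name]
        dependencies.append((name, version))
    # With the system toolchain, EasyBuild's file-name convention is <name>-<version>.eb
    files = [f"{name}-{version}.eb" for name, version in dependencies]
    files.append(f"{bundle_name}-{bundle_version}.eb")
    return files, dependencies


files, deps = plan_conda_bundle(
    "qc", "1.0",
    ["bioconda/trimmomatic/0.39", "bioconda/fastqc"],
    latest_versions={"fastqc": "0.11.8"},  # pretend API answer, not real data
)
print(files)  # ['trimmomatic-0.39.eb', 'fastqc-0.11.8.eb', 'qc-1.0.eb']
print(deps)   # [('trimmomatic', '0.39'), ('fastqc', '0.11.8')]
```

Pinning the resolved version into the bundle's dependency list means the bundle keeps pointing at the same package versions even after newer ones are published.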

Building the Modules

Once you have your EasyConfigs, just point EasyBuild at them and let it roll!

# Install the qc/1.0 module and all of its dependencies
eb --robot qc-1.0.eb
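The `--robot` flag turns on EasyBuild's dependency resolution, so the EasyConfigs that `qc-1.0.eb` depends on get built first. The ordering idea behind that is a topological sort, sketched below with Python's standard `graphlib` (3.9+). This is not EasyBuild's code: the real robot also locates EasyConfigs on its search path and skips modules that are already installed, and the dependency map (including the Miniconda3 version) is hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def build_order(dependency_map):
    """Order EasyConfig files so every dependency comes before what needs it.

    dependency_map maps an EasyConfig file name to the file names it depends on.
    """
    return list(TopologicalSorter(dependency_map).static_order())


# Hypothetical dependency map for the qc bundle; the Miniconda3 version is a placeholder.
qc_dependencies = {
    "qc-1.0.eb": ["trimmomatic-0.39.eb", "fastqc-0.11.8.eb"],
    "trimmomatic-0.39.eb": ["Miniconda3-4.7.10.eb"],
    "fastqc-0.11.8.eb": ["Miniconda3-4.7.10.eb"],
}

order = build_order(qc_dependencies)
print(order)  # Miniconda3-4.7.10.eb first, qc-1.0.eb last
```

`static_order` raises `graphlib.CycleError` if the dependencies form a loop, which is the same kind of problem a dependency resolver has to refuse.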
