Learn Apache Airflow By Example – 4 Part Series

apacheairflow docker python Dec 02, 2019

Introduction

I've been having a blast the last few months learning Apache Airflow. It's become an indispensable tool in my getting-stuff-done toolbox.

Follow me on Twitter @jillianerowe, or send me a message with questions, topic suggestions, ice cream recommendations, or general shenanigans. 

 

Get the Source Code

...

Continue Reading...

Deploy your RShiny App on AWS Series - Lightsail

aws r rshiny Nov 24, 2019
 

If your head is spinning over which deployment scenario to choose for your RShiny app look no further! I have a whole series planned out for you on various deployment options and why you should choose each one.

Why should you use AWS Lightsail for RShiny Deployment?

Deployment scenarios are like snowflakes. No two are exactly alike! ;-) You need different levels of power and control for different deployment scenarios. Here we are going to talk about RShiny deployments, but it applies to just about everything.

Lightsail is a relatively recent addition to the whole AWS deployment ecosystem.  It makes it much simpler and more streamlined to deploy a web application than some of their other, more powerful solutions.

Lightsail would be a good choice for deployment for you if:

  • You don't feel comfortable deploying web applications or configuring web servers (Apache, NGINX).
  • You are just fine with configuring web servers, but you need something quick and easy.
  • You have a smallish...
Continue Reading...

Drive Traffic to Your Freelance Tech Business by Starting a Blog

career Nov 21, 2019

If you're looking to start a freelance business in tech, or any business really, then you'll need to find a way to have the clients knocking down your doors. To make a long story short, you mostly do this by putting yourself out there, being helpful, and talking to people.

More specifically, you can start a blog to demonstrate your awesome knowledge (get the word out there) with awesome content (helping people!). 

I've been a software engineer for nearly 10 years (how time flies!) and about 6 months ago I started to seriously consider the idea that I wanted to venture off on my own. Be my own boss, be mistress of my own destiny, all that kind of thing. My biggest worry was, well, how do I find my own clients?

One of the first pieces of advice I got was to start a blog, and it's worked out fairly well for me, so here we are! 

I'm not affiliated with any of these products or services. These are simply my opinions.

Buy a Domain Name

Well, first of all, think of a name...

Continue Reading...

How To Become a Freelance Software or DevOps Engineer

career freelance Nov 18, 2019

Your first steps

Getting started freelancing is one of those things you can google for the rest of your life and it will just leave your head spinning. I know this because I've been working towards being my very own Jillian Inc for the last 6 months or so. Here are a few straightforward tasks you can set for yourself in order to get started.

Register a Business

I got a lot of mostly wishy-washy advice on this. There are people out there who say that if you're not making much, just don't bother. That may be a fair enough point, but I would say that if you truly want to pursue freelancing, even on a moonlighting basis, just register a business.

First of all, you can expense stuff to your business (legit business expenses only!) and it will probably save you the cost of registering the business. It will also make your taxes a lot more straightforward. As a software engineer who likes open source, this felt a bit too bureaucratic for me, but seriously, just do it.

...
Continue Reading...

Setup a Bioinformatics Demultiplex Server from Scratch

Install Demultiplex Software

Installing demultiplexing software such as bcl2fastq, CellRanger, LongRanger, demuxlet, and whatever else pops up holds a special place in the hearts, and potentially the therapy sessions, of those who do Bioinformatics and Genomics. It has been enough of an issue in my professional life that I thought I would dedicate a series to setting up servers for different analysis types.

Don't install system packages

This is my big chance to go on a total rant about bioinformatics servers!

Don't install all kinds of software as system packages. Ok? Just don't do it. It may not backfire on you today, or tomorrow, but someday it will!

I'm going to make a few caveats to that. Things like zlib, openssl, and ssh are fine. I'll even cheat sometimes and yum install some development tools. Mostly, what I am talking about here is bioinformatics software. Don't bother installing bcl2fastq, blast, augustus, R, python, dask, or pretty much anything else as system dependencies.

There are...

Continue Reading...

AWS Elastic Compute Clusters for Genomics

aws bioinformatics hpc Oct 30, 2019

Amazon Web Services (AWS) gives you the ability to build out all kinds of cool infrastructure: databases, web servers, Kubernetes backed applications, Spark clusters, machine learning models, and even High Performance Computing Clusters with AWS ParallelCluster.

 

Not just clusters, but Elastic Clusters!

One of the cooler aspects of using a cloud provider like AWS is the ability to scale up and down based on requests or need. This is generally called Elastic, and it applies to a whole lot of services: storage, Kubernetes, load balancers, and compute clusters. This is just awesome, because writing something yourself to scale up or down based on demand would be a major pain, and it gives you the best of all worlds.

Example Genomic Analysis Computational Needs

Let's say you're running a genomics analysis. First, you run your alignment, which takes (for the sake of argument) 20 nodes. Then you do variant calling, which takes 5 compute nodes, haplotype...
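To put rough numbers on why elasticity matters here, the sketch below compares node-hours for a cluster held at its peak size against one that resizes per stage. The node counts for alignment and variant calling come from the example above; the stage durations are invented purely for illustration, and the comparison ignores scale-up latency and pricing details.

```python
# Hypothetical stages: (name, nodes needed, hours). The node counts are from
# the text; the hours are made-up values for illustration only.
stages = [
    ("alignment", 20, 4.0),
    ("variant_calling", 5, 6.0),
]


def fixed_cluster_node_hours(stages):
    """Node-hours when the cluster stays at its peak size for the whole run."""
    peak_nodes = max(nodes for _, nodes, _ in stages)
    total_hours = sum(hours for _, _, hours in stages)
    return peak_nodes * total_hours


def elastic_cluster_node_hours(stages):
    """Node-hours when each stage only holds the nodes it actually needs."""
    return sum(nodes * hours for _, nodes, hours in stages)


fixed = fixed_cluster_node_hours(stages)
elastic = elastic_cluster_node_hours(stages)
print(f"fixed:   {fixed} node-hours")
print(f"elastic: {elastic} node-hours")
```

With these assumed durations the fixed cluster uses 200 node-hours and the elastic one 110, because the 15 idle nodes are released during variant calling.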

Continue Reading...

Deploy HPC Modules From Bioconda Packages

Uncategorized Oct 27, 2019

The Struggle is Real

I have been working in Bioinformatics for nearly 10 years, mostly on the computational side of things. I have spent a lot of that time building and installing software. Some of those wounds will never heal! Luckily, along came Anaconda, the scientific distribution of Python, along with the awesome BioConda who took on the task of installing bioinformatics software with relative ease! I don't know if Anaconda necessarily wanted to make life easier for those installing software on HPC systems, but in any case they did. 

(Disclaimer, I am technically a core team member of BioConda, but I'm really kind of a slacker core member and the real credit goes to the rest of the team!)

Deploy Modules with EasyBuild

One of my main goals in life is to deploy conda packages as HPC Modules. Deploying HPC Modules can be a bit of a pain. There...

Continue Reading...

Dask Tips and Tricks – HighLevelGraphs

Uncategorized Oct 23, 2019

Dask is an open source project in Python that allows you to scale your code on your laptop or on a cluster. Not only does it have a very clear syntax, you can also declare your order of operations in a data structure. This is a feature I was very interested in as this tends to be the use case I am tasked with most often. It's cool stuff!

For those of you who have written MPI, this is kind of like that, except you don't have to write MPI! If you would like to know more about basic Dask syntax, check out my blog post on Parallelizing For Loops with Dask.

Dask Syntax

Normally when using Dask you wrap dask.delayed around a function call, then, once all of those calls are queued up, tell Dask to compute your results. This is great, and I really like this syntax, but what about when you are fed a list of tasks and need to somehow feed these to Dask?

That is where a HighLevelGraph comes in!
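As a rough sketch of the two styles, the block below runs the same small computation twice: once with dask.delayed, and once declared up front as a data structure. The second half uses Dask's plain dict task graph rather than the HighLevelGraph class itself, as a simpler stand-in for the same idea. The `inc` and `add` functions and the key names are made up for illustration.

```python
import dask
from dask import delayed


def inc(x):
    return x + 1


def add(x, y):
    return x + y


# Style 1: wrap calls in delayed(), which records the task instead of
# running it, then ask Dask to compute the final result.
a = delayed(inc)(1)
b = delayed(inc)(2)
delayed_result = delayed(add)(a, b).compute()

# Style 2: declare the same work as a data structure. Keys name results;
# tuples are (function, *arguments), and an argument that matches a key
# refers to that task's output.
graph = {
    "a": (inc, 1),
    "b": (inc, 2),
    "total": (add, "a", "b"),
}
graph_result = dask.get(graph, "total")  # dask.get is the single-threaded scheduler

print(delayed_result, graph_result)
```

The dict form is handy when the task list arrives from somewhere else, such as a config file or a workflow definition, since it can be built in a loop without touching the functions themselves.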

Dask HighLevelGraphs

Dask HighLevelGraphs allow you to define a data...

Continue Reading...

Dask Tips and Tricks – Parallelize a For Loop

Uncategorized Oct 20, 2019

If you're in the scientific computing space there's a good chance you've used Python. A relatively recent addition to the family of awesome libraries in Python for Scientific Computing is Dask. It is a super cool library that allows you to parallelize your code with a very simple and straightforward syntax. There are a few aspects of this library that especially call to me.

In no particular order, here they are!

  • It can be dropped into an existing codebase with little to no drama! Dask is meant to wrap around existing code and simply decide what can be executed asynchronously.
  • Dask can scale either on your laptop or to an entire compute cluster. Without writing MPI code! How cool is that?
  • Dask can parallelize data structures we already know and love, such as numpy arrays and data frames.

For those of you sitting here saying, but Spark can do all that, why yes, it can, but I don't find Spark nearly as easy to drop into an existing codebase as Dask. Also, I have I like to...

Continue Reading...

Dask on HPC

Recently I saw that Dask, a distributed Python library, created some really handy wrappers for running Dask projects on a High Performance Computing (HPC) cluster.

Most people who use HPC are pretty well versed in technologies like MPI, and just generally abusing multiple compute nodes all at once, but I think technologies like Dask are really going to be game changers in the way we all work. Because really, who wants to write MPI code or vectorize?

If you've never heard of Dask and its awesomeness before, I think the easiest way to get started is to look at their Embarrassingly Parallel Example, and don't listen to the haters who think speeding up for loops is lame. It's a superpower!

Onward with examples!

Client and Scheduler

First, these examples are all pretty much borrowed from the Dask Job Queues page. Basically, you write your Python code as usual. Then, when you need to scale across nodes, you leverage your HPC scheduler to get you some...

Continue Reading...