Deploy and Scale your Dask Cluster with Kubernetes

Dask is a parallel computing library for Python. I think of it as being like MPI without actually having to write MPI code, which I greatly appreciate!

Dask natively scales Python

Dask provides advanced parallelism for analytics, enabling performance at scale for the tools you love

One of the cooler aspects of Dask is that you scale across computers/servers/nodes/pods/container/etc. This is why I say it's like MPI.

What we'll be talking about today are:

  • Advantages to using Kubernetes
  • Disadvantages to using Kubernetes
  • Install the Dask Helm Chart
  • Scale your Dask Workers Up / Down with kubectl scale
  • Modify the Dask Helm Chart to add Extra Packages
  • Autoscale your Dask Workers with Horizontal Pod AutoScalers

Benfits to Dask on Kubernetes

Let's talk about some of the (many!) benefits to using Kubernetes!

Customizable Configuration

Another very important aspect of Dask, at least for me, is that I can set it up so that the infrastructure side of things is completely...

Continue Reading...

Dask on HPC

Recently I saw that Dask, a distributed Python library, created some really handy wrappers for running Dask projects on a High-Performance Computing Cluster, HPC.

Most people who use HPC are pretty well versed in technologies like MPI, and just generally abusing multiple compute nodes all at once, but I think technologies like Dask are really going to be game-changers in the way we all work. Because really, who wants to write MPI code or vectorize?

If you've never heard of Dask and it's awesomeness before, I think the easiest way to get started is to look at their Embarrassingly Parallel Example, and don't listen to the haters who think speeding up for loops is lame. It's a superpower!

Onward with examples!

Client and Scheduler

Firstly, these are all pretty much borrowed from the Dask Job Queues page. Pretty much, what you do, is you write your Python code as usual. Then, when you need to scale across nodes you leverage your HPC scheduler to get you some...

Continue Reading...

50% Complete

DevOps for Data Scientists Weekly Tutorials

Subscribe to the newsletter! You'll get a weekly tutorial on all the DevOps you need to know as a Data Scientist. Build Python Apps with Docker, Design and Deploy complex analyses with Apache Airflow, build computer vision platforms, and more.