This is part 1 of a series I have in the works about Bioinformatics Solutions on AWS. Each part of the series will stand on its own, so you won't be missing anything by reading one part and not the others.
So let's dive right in!
To keep things simple, I'm going to define exploratory analyses as analyses that are completed in the shell, using either a programming language such as Python or R, or just the terminal.
(Any other Bioinformaticians remember just how much we all used to use sed and awk? ;-) )
There can be some visualization here, but we'll draw the line at anything that can't be displayed in a JupyterHub notebook or a JupyterLab instance.
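To make "exploratory analysis" concrete, here's a sketch of the kind of quick column-counting you might once have done with an awk one-liner, written in plain Python instead. The file name and column name are hypothetical placeholders.

```python
import csv
from collections import Counter

def count_column(path, column):
    """Count occurrences of each value in a named column of a TSV file."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return Counter(row[column] for row in reader)

# Roughly the same job as this classic awk one-liner:
#   awk -F'\t' 'NR>1 {c[$3]++} END {for (k in c) print k, c[k]}' variants.tsv
```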
In another post I'll discuss Production Analyses.
It's always a good idea to look for terms such as "auto-scaling", "elastic", or "on demand" when using AWS. This means that there is some smart mechanism baked in that only uses the resources when you need...
Today we are going to be talking about Deploying a Dash App on Kubernetes with a Helm Chart using the AWS Managed Kubernetes Service EKS.
For this post, I'm going to assume that you have an EKS cluster up and running because I want to focus more on the strategy behind a real-time data visualization platform. If you don't, please check out my detailed project template for building AWS EKS Clusters.
Dash is a data visualization platform written in Python.
Dash is the most downloaded, trusted framework for building ML & data science web apps.
Dash empowers teams to build data science and ML apps that put the power of Python, R, and Julia in the hands of business users. Full stack apps that would typically require a front-end, backend, and dev ops team can now be built and deployed in hours by data scientists with Dash. https://plotly.com/dash/
If you'd like to know what the Dash people say about Dash on Kubernetes you can read all about that here.
Pretty much though, Dash is...
If you're looking for a hassle-free way to add authentication to your RShiny Apps you should check out polished.tech. In their own words:
Polished is an R package that adds authentication, user management, and other goodies to your Shiny apps. Polished.tech is the quickest and easiest way to use polished.
Polished.tech provides a hosted API and dashboard so you don't need to worry about setting up and maintaining the polished API. With polished.tech you can stop worrying about boilerplate infrastructure and get back to building your core Shiny application logic.
Polished.tech is a hosted solution for adding authentication to your RShiny Apps. It is completely free for the first 10 users, which gives you plenty of freedom to play around with it.
In this post we'll go over how to:
Dask is a parallel computing library for Python. I think of it as being like MPI without actually having to write MPI code, which I greatly appreciate!
Dask natively scales Python
Dask provides advanced parallelism for analytics, enabling performance at scale for the tools you love
One of the cooler aspects of Dask is that you can scale across computers/servers/nodes/pods/containers/etc. This is why I say it's like MPI.
What we'll be talking about today are:
Let's talk about some of the (many!) benefits to using Kubernetes!
Another very important aspect of Dask, at least for me, is that I can set it up so that the infrastructure side of things is completely...
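Here's a tiny taste of that "MPI without writing MPI" feel: `dask.delayed` builds a task graph, and `.compute()` runs it in parallel — on your laptop with the default scheduler, or across a cluster of pods if you point it at a distributed scheduler. This is just a minimal local sketch, not tied to any particular cluster setup.

```python
from dask import delayed

@delayed
def square(x):
    return x * x

# Ten independent tasks plus a reduction; Dask figures out the
# scheduling, so there's no hand-written message passing anywhere.
total = delayed(sum)([square(i) for i in range(10)])
result = total.compute()  # sum of squares 0..9
```

The exact same graph runs unchanged on a multi-node cluster, which is the "infrastructure side is handled for you" point.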
If you're following along with the deploy RShiny on AWS Series, you'll know that I covered deploying RShiny with a Helm chart. Today, I want to go deeper into deploying RShiny on EKS, along with some tips and tricks that I use for my everyday deployments.
If you'd like to learn more about deploying RShiny please consider checking out my FREE Deploy RShiny on AWS Guide!
Kubernetes is kind of a beast to get started with, and people constantly complain that it's extremely complicated. They would be correct, but I'm here to give the 2 minute rundown of what you need to know to deploy your RShiny (or Dash, Flask, Django, Ruby on Rails, etc.) application on Kubernetes. I can do that because Kubernetes is not magical, and it's not even that new. It's a very nice abstraction layer on...
If you're deploying applications on AWS, one of the easier ways to get started is to simply deploy everything on EC2. If you're not familiar with AWS EC2, EC2 instances are compute instances. They are like your (Linux) desktop, except that you spin them up and kill them on demand with the AWS console (or the CLI or some other infrastructure automation tool).
If you're looking for a solution with some built-in scaling and the ability to throw a Load Balancer at it, EC2 is fine. It's also nice because it's just a computer. You don't have to find a workaround to get yourself a computer the way you do with Docker (docker run) or Kubernetes (kubectl exec). You deal with an EC2 instance the way you would any remote server, but it's on AWS so you get some additional niceness such as backups, Elastic IPs, and Load Balancers.
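Spinning instances up and down on demand is also scriptable. Here's a hedged sketch of what the boto3 call to launch an instance looks like — the AMI ID, key pair name, and instance type are all placeholders, and the actual API call is left commented out so nothing launches by accident.

```python
def run_instances_request(ami_id, instance_type="t3.micro", key_name=None):
    """Build the keyword arguments for an ec2.run_instances() call."""
    params = {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
    }
    if key_name:
        params["KeyName"] = key_name
    return params

# To actually launch (requires AWS credentials and a real AMI ID):
# import boto3
# ec2 = boto3.client("ec2")
# response = ec2.run_instances(**run_instances_request("ami-xxxxxxxx", key_name="my-key"))
```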
This is slightly more difficult than using LightSail. If you are completely new to AWS and on a tight deadline you may want to start there....
If your head is spinning over which deployment scenario to choose for your RShiny app look no further! I have a whole series planned out for you on various deployment options and why you should choose each one.
Deployment scenarios are like snowflakes. No two are exactly alike! ;-) You need different levels of power and control for different deployment scenarios. Here we are going to talk about RShiny deployments, but it applies to just about everything.
Lightsail is a relatively recent addition to the whole AWS deployment ecosystem. It makes it much simpler and more streamlined to deploy a web application than some of their other, more powerful solutions.
Lightsail would be a good choice for deployment for you if:
Amazon Web Services (AWS) gives you the ability to build out all kinds of cool infrastructure. Databases, web servers, Kubernetes backed applications, Spark clusters, machine learning models, and even High-Performance Computing Clusters with AWS ParallelCluster.
One of the cooler aspects of using a cloud provider like AWS is the ability to scale up and down based on requests or need. This is generally called elastic, and it applies to a whole lot of services: storage, Kubernetes, load balancers, and compute clusters. This is first of all just awesome, because writing something yourself to scale up or down based on demand would be a major pain, and it can give you the best of both worlds.
Let's say you're running a genomics analysis. First, you run your alignment, which takes (for the sake of argument) 20 nodes. Then you do variant calling, which takes 5 compute nodes, haplotype...
AWS Elastic Kubernetes Service (EKS) is a fully managed Kubernetes service that AWS launched recently. For elastic services in AWS, this means that the number of instances actually in use scales up or down based on demand. This is first of all seriously cool, and second of all can cut down on costs. Fewer requests? Fewer nodes!
I'm just getting started with Kubernetes myself, and going through this walkthrough was a great learning exercise.
I love deploying applications with Docker Swarm because it's fairly simple and I already know it. However, Swarm for AWS has some downsides. First, it is not elastic; second, in order to get sticky sessions you need to add an additional service such as Traefik. With built-in session affinity you can deploy RShiny and Python Dash applications with no other functionality besides what's built in, and that's amazing!
I also personally think the ecosystem is moving towards Kubernetes over Swarm. It even comes installed with the Mac version of Docker. Now is a great time to get...
I'm loving creating videos, and so here is a 3 part series on getting started with AWS and EC2 Instances. Once you understand launching an EC2 instance, absolutely every other part of AWS is going to make so much more sense. Behind the scenes we are all always just spinning up servers, installing all the things, and getting our stuff done. As always, AWS has excellent documentation and I really encourage you to check it out!
Learning about deployment is an excellent strategy for so many reasons. Probably number one is not having to wait around for people like me to get their acts together.
Not only that, there are just so many cool things that are possible with all the neat new deployment strategies and distributed computing libraries out there. It used to be that software engineers would need to put in a HUGE amount of time and effort in order to really optimize for speed.
These days the barrier to getting your application running fast is just so...
Subscribe to the newsletter! You'll get a weekly tutorial on all the DevOps you need to know as a Data Scientist. Build Python Apps with Docker, Design and Deploy complex analyses with Apache Airflow, build computer vision platforms, and more.