Setting up a Local Spark Development Environment using Docker

Introduction

This is a get up and running post. It does not get into the nitty gritty details of developing with Spark, since I am only just getting comfortable with Spark myself. Mostly I wanted to get up and running, and write a post about some of the issues that came up along the way.

 

What is Spark?

Spark is a distributed computing library with support for Java, Scala, Python, and R. It's what I refer to as a world domination technology, where you want to do lots of computations, and you want to do it fast. You can run computations from the embarrassingly parallel, such as parallelizing a for loop to complex workflows, and support for distributed machine learning as well. You can transparently scale out your computations to not only multiple cores, but even multiple machines by creating a spark cluster. How cool is that?

My favorite introduction to Spark and the Spark ecosystem is here at the mapr blog.

 

Why should I learn Spark?

Well, I'm not sure...

Continue Reading...
Close

50% Complete

Two Step

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.