This is a get up and running post. It does not get into the nitty gritty details of developing with Spark, since I am only just getting comfortable with Spark myself. Mostly I wanted to get up and running, and write a post about some of the issues that came up along the way.
Spark is a distributed computing library with support for Java, Scala, Python, and R. It's what I refer to as a world domination technology, where you want to do lots of computations, and you want to do it fast. You can run computations from the embarrassingly parallel, such as parallelizing a for loop to complex workflows, and support for distributed machine learning as well. You can transparently scale out your computations to not only multiple cores, but even multiple machines by creating a spark cluster. How cool is that?
My favorite introduction to Spark and the Spark ecosystem is here at the mapr blog.
Well, I'm not sure...