Colin Magdamo

Machine Learning Principal Mentor
Data Scientist at Mass General Hospital

1. What did you find as a result of working together to preconfigure data science environments and infrastructure?

I found the entire process to be seamless and a massive boost to my productivity. Having a DevOps/data engineer with extensive experience in setting up cloud computing environments is indispensable to my workflow as a data scientist. I had recognized this while working independently, but it became especially clear while working with Jillian on a new large-scale project where I took on a lead design role. Having a robust, scalable data science infrastructure informed every facet of project development and let us focus on the science and core product design instead of fumbling over software setup and scalability.

 

2. What specific feature did you like most about this workflow?

I liked the consistency and scope of Jillian’s environments the most. Quick and accessible data science toolkits at scale are easier said than done, but Jillian set up a straightforward and performant infrastructure that included every tool necessary for the STEM-Away projects. This also meant that individual teams were all guaranteed to be working within the same environments, which made project orchestration and technical collaboration significantly easier.

 

3. What would be three other benefits of the development of data science infrastructure?

One great benefit is that, since the environment is built on open-source tools such as Python and R as well as public Docker images, students can take away a foundational data science environment and build on it as they learn. From large tech companies to academic labs to independent consultants, having a foundational software environment to quickly get analyses off the ground is critical. Jillian has provided an excellent default setup as well as guidance on how to customize it for new workflows.

 

4. Would you recommend this setup? If so, why?

I would absolutely recommend this setup. It reduces the friction of both starting a machine learning project and collaborating with others to build complex software products that contain machine learning models. As I’ve grown into my own strengths as a data scientist, I’ve also become acutely aware of where I struggle. I think I’m in the majority when I say that setting up a complicated data science infrastructure is one of those areas. One of the best parts about working with Jillian is that she not only removed that overhead for our project but also provided tons of great resources and support so that I can catch up in this area. As this domain grows in complexity and expectations, it is becoming increasingly important to have at least a high-level understanding of these DevOps principles regardless of your title. This has been a great learning experience, and a collaboration that I hope continues.

 
