DataEDGE Interview: MailChimp’s John Foreman
Today, we have an interview with John Foreman, chief scientist at MailChimp and a speaker at our upcoming DataEDGE 2014 conference. Here, he talks to us about looking at data science realistically, blending quantitative and qualitative skills, and why most data visualizations are overrated.
Tell us a little bit about yourself. What role do you play at MailChimp? What is your background?
At university, I studied math and operations research (i.e., applied math focusing on optimization), but once I left school, I became a management consultant. Specifically, I focused on implementing large-scale analytics systems for Fortune 500 companies (e.g., revenue management systems for hotels and cruise lines, and juice blending optimization systems for beverage companies). The consulting experience taught me two important skills: how to move quickly in an analytics project and how to communicate complex data and math concepts to all parts of a business.
These experiences lent themselves well to being a data scientist at a start-up, so I took on the role of chief scientist for MailChimp. My role is both highly technical and highly creative. I know data science techniques (AI, data mining, optimization, forecasting, simulation, etc.) and I’ve made it my business to know all of MailChimp’s data. So when a problem or opportunity arises in the business, I am able to answer the question, “Can data and math solve this problem?” Perhaps the answer is to build a data product for the customer, or a forecast model for an internal team. Sometimes the answer is, in fact, no: this is a user experience problem or a design problem, and not a math problem at all!
Is yours a solitary or team role? How do you see data scientists working within their companies?
I lead a team of other data scientists and engineers. The data science team operates as an internal consultancy for the rest of the business. That inherently means that the data science team is not the most important part of the business. Data science is really hot right now, but we must keep in mind that most businesses earn their money through something else (and I’m not talking ad targeting à la Google or Facebook)!
In the case of MailChimp, we make our money by sending e-mail campaigns for clients. So my job then is to enrich that service, to make it more valuable for our customers than it already is. And I do this using data. The data science team at MailChimp leads from the back in this way — constantly finding ways to serve other teams and our customers using analytics and data-driven products.
What will you be speaking about at DataEDGE?
At DataEDGE, I’ll discuss what data science looks like when it’s grounded firmly in business reality. A lot of companies treat data science like a New Year’s resolution to get fit; they buy a bunch of exercise equipment (Hadoop-like substances) and maybe hire a personal trainer (consultants, data scientists), but then crap out once they have to stop spending money and actually do a little work. My talk will discuss how data science is used in practical, tangible ways at MailChimp to enrich the user’s experience of the product.
The theme of DataEDGE this year is “A New Vision for Data Science.” What skills and tools do you see being important for future data scientists?
All over the data science landscape, we see nothing but adventures in missing the point. People are focusing on data visualization without ever asking whether the result they’re visualizing is helpful or correct. We see folks concentrating on getting more and more accuracy out of their predictive models (hi, Kaggle!) without ever figuring out whether the problem they’re solving needs to be solved. Even Netflix has moved away from the star ratings predictions that kicked off this focus on model refinement.
I believe the future of data science relies on a perfect blend of quantitative skills and soft skills. The Ph.D. deep learning expert who can’t communicate with the business and understand its problems is worthless. That person is almost certainly doomed to gold-plate a model that doesn’t matter to the business.
Why? Oftentimes the problems that folks “throw over the fence” to a data science team are ill-posed or wholly misguided. The only way to know if the problem you’re solving is even the correct problem to solve is to get out of the basement and mingle with the business. It requires communication ability and business experience. Only once an opportunity is truly identified can the analytical firepower be brought to bear to solve it.
In this way, I believe that data science is part of a rebirth of the liberal arts education. Data scientists at their best are Renaissance men and women. I’ve seen them pull from literature and anthropology and philosophy and economics and sociology on top of statistics and math and computer science. The best data scientists are going to have a well-roundedness to them that no MOOC could ever provide with an introduction to Hadoop and R and a tutorial on random forests. Often, journalists write about the newfangled startup or analytics tool that will “replace the data scientist.” To replace good data scientists is to replace really good human thinkers and problem solvers, not just folks who can run a regression in Python.
“Big Data” means a lot of things to a lot of people. What does “Big Data” mean to you?
I prefer a flexible but functional definition of big data. Big data is when your business wants to use data to solve a problem, answer a question, produce a product, etc., but the standard, simple methods (maybe it’s SQL, maybe it’s k-means, maybe it’s a single server with a cron job) break down on the size of the data set, causing time, effort, creativity, and money to be spent crafting a solution to the problem that leverages the data without simply sampling or tossing out records.
The main consideration here, then, is to weigh the cost of using “all the data” in this complex (and potentially brittle) solution versus the benefits gained over using a smaller data set in a cheaper, faster, more stable way.
What trend or development in data science has you most excited about the field? What are you tired of hearing about?
I’m tired of hearing about data viz. Most of the data visualizations I see are from companies looking to justify the money they’ve spent on distributed storage systems, data scientists, etc. These visualizations are often just marketing, and they fail the sniff test of, “How is your business better because you’ve spent time doing this?” It’s all vanity.
Too often trends in data science and big data are driven by vendors, and for me, that’s a huge red flag. To have a vendor push a technique or a tool is to have folks wholly unconnected with the “business part of your business” trying to foist solutions on you to problems you likely don’t have.
Instead, I enjoy community-driven innovations — the development of analytics packages in Python and R, not to mention the birth of Julia for scientific programming.
We hear a lot about things like the Harvard Business Review’s “Sexiest Job of the 21st Century.” What do you think is the biggest myth or misnomer about data science? What would you say to set the record straight?
I wrote this article two years ago, in which I set the record straight on five myths about data science. Enjoy!