Structure Data: Training the Data Scientists of the Future

The following post explores just one of the many talks at Gigaom’s Structure Data. To find out more about Structure Data 2014, check out our recap here.

The final speaker at Structure Data 2014 was our very own UC Berkeley School of Information Dean AnnaLee (Anno) Saxenian. She took the stage Thursday with Gigaom Senior Writer Derrick Harris to answer questions about the future of data science education and reveal how top-notch training today will make a difference for the data scientists of tomorrow.

The first question concerned the scope of instruction. Harris asked, “What kind of skills are we talking about when you talk about training data scientists?” Dean Saxenian explained that while the data science “basics” (programming, statistics, and their ilk) are crucial, they are really only half the battle. A well-rounded “data scientist,” as opposed to a practitioner who specializes in machine learning, programming, or another single field, must learn everything from how to analyze data to how to communicate findings to people who don’t share the same technical background.

To answer this need, the I School has established a cross-disciplinary curriculum for the Master of Information and Data Science (MIDS) students, one that covers the traditional programming and statistics skills while also exploring machine learning, data visualization, data storage, and even the ethics and privacy debate inherent in the data science domain.

Is this really necessary? The 70+ speakers at Structure Data drew from a wide range of industries and skill sets, which shows just how broad the field of data analytics truly is. Data, after all, is an elastic concept. In the past, specialists picked and chose skills to match their current projects, but there’s no room to skimp when training future data scientists. As Saxenian put it,

The world of tech keeps changing and education has to change as well.

In the case of MIDS, this means offering the degree online so that the best students can attend even if their location would rule out an on-campus program. The students also tend to skew older, Saxenian said, with an average of 10 years of experience versus five for applicants to other I School programs. This level of experience illustrates the demand for this type of degree, particularly from mid-career professionals who want to keep abreast of the changing opportunities in data science.

The UC Berkeley I School has also created a technical advisory board to keep the program’s curriculum ahead of the curve as technology changes. In the Dean’s view, the holistic future of data science means that organizational and people skills are as important as hard skills. For this reason, the MIDS program has been designed as a full-service approach, one that does not shy away from either the technical or the soft skills a data science position requires.

Dean Saxenian’s talk highlighted some of the key distinctions in training the data scientists of the future. It is no longer enough to offer specialty courses that prepare a student in just one area; the future of data science is holistic and will require widespread data literacy. Saxenian and the UC Berkeley I School have worked hard to offer a “higher touch” education, one that can draw the best students from all around the world and give them a full-service program covering everything from research design and teamwork to ethics and visualization.

Watch: How Will We Train Data Scientists of the Future.
