datascience@berkeley Students Discuss Big Data

All the information available in the world, from medical records and music and movie downloads to phone logs and more, represents big data, which data scientists work every day to organize and put to use. With this in mind, we asked datascience@berkeley students:

What variables or elements make up big data? Is big data a matter of velocity, variety, or volume, or is it some mix of these?

“It can be one or all of these, depending on the problem or application you are dealing with. For me, working in traditional consumer packaged goods, big data is an issue of volume — almost every transaction from every grocery store or mass merchandiser from the past five years is important to me.”

– Elizabeth Peters, datascience@berkeley Graduate


“There are definitely practical implications that come from working with high-velocity or high-volume data, but when companies or the media talk about big data, I think what we’re really interested in is data that lets us learn or accomplish something that hasn’t been possible before. In this sense, I don’t think any one of the ‘Vs’ is strictly necessary for the data to be ‘big’; it just often happens that the most exciting data-driven projects have data with one or more of those attributes. If I had to pick, I think ‘variety’ might be the most important criterion for big data going forward. Having huge amounts of data is going to be increasingly common, but the real opportunities are going to be for the researchers who find ways to answer interesting questions by combining varied data sources in unexpected ways.”

– Carson Forter, datascience@berkeley Student


“Some mix, but I’d lean toward volume. I can see pretty clearly where volume, and perhaps even velocity, can affect health care right now. Health care data has always been varied, and I haven’t yet seen a good way to integrate the data we get from the bench, the bedside, and the steps in between.”

– Nihar Patel, datascience@berkeley Graduate


“Any or all of the above. I’m imagining some businessperson, overwhelmed with a data-centric problem, reading answers to this question that emphasize velocity and volume, for example, and saying, ‘Oh, but I only have variety! I guess I don’t have a big data problem.’ For me, big data is a buzzword that resonates with so many people because of its inclusivity. It doesn’t matter which of the three Vs is plaguing you (maybe it’s all three); you’re still faced with the same set of challenges: lots of data, vague promises of insights contained within, and the challenge of bringing together the right sets of skills to get those insights.”

– Sam Zaiss, datascience@berkeley Graduate

