Fundamentals of Data Engineering
Analytics Solution Architectures / Data at Scale Concerns and Tradeoffs / Distributed Data Processing / Relational Databases / Graph Databases / Streaming Data Applications / Cube Technology
Python / Relational databases / Hadoop / MapReduce / Spark / Cloud Computing (AWS)
Mark Mims and Taylor Martin
Storing, managing, and processing datasets are foundational to both applied computer science and data science. Indeed, successful deployment of data science in any organization is closely tied to how data is stored and processed. This course introduces the fundamentals of data storage, retrieval, and processing systems in the context of common data analytics needs. As these fundamentals are introduced, representative technologies will be used to illustrate how to construct storage and processing architectures. This course aims to provide a set of “building blocks” from which one can construct a complete architecture for storing and processing data. The course will examine how technical architectures vary depending on the problem to be solved and on the required reliability and freshness of the result.
The course considers the complete breadth of technology choices. The content spans from traditional databases and business warehouse architectures, through so-called big-data architectures, to streaming analytics solutions and graph processing. Students will consider both small and large datasets because both are equally important and each justifies different trade-offs. Exercises and examples will consider both simple and complex data structures, as well as data ranging from clean and structured to dirty and unstructured.