
BigData: IntroToBigData

CS 567



Course description:

The field of computer science is experiencing a transition from computation-intensive to data-intensive problems, wherein data is produced in massive amounts by large sensor networks, new data acquisition techniques, simulations, and social networks. Efficiently extracting, interpreting, and learning from very large datasets requires a new generation of scalable algorithms as well as new data management technologies.

In this course we explore key data analysis and management techniques which, when applied to massive datasets, form the cornerstone of real-time decision making in distributed environments, business intelligence on the Web, and scientific discovery at large scale. In particular, we examine the MapReduce parallel computing paradigm and associated technologies such as distributed file systems, NoSQL databases, and stream computing engines. Additionally, we review machine learning methods that make possible the efficient analysis of large volumes of data in near real time.

This course is highly interactive and follows a problem-based learning philosophy; students are expected to use these technologies to design highly scalable systems that can process and analyze Big Data for a variety of scientific, social, and environmental challenges.

Core topics:

Course objectives:

By the end of this course, students will be familiar with the fundamental concepts of Big Data management and analytics; will be competent in recognizing the challenges faced by applications dealing with very large volumes of data and in proposing scalable solutions for them; and will understand how Big Data impacts business intelligence, scientific discovery, and our day-to-day life.

For more information, see the Syllabus.


Hortonworks Academic Partner

Supported by AWS in Education Grant award


Poster session BigData 2013

Page last modified on August 22, 2016, at 09:40 AM EST