BigData.Syllabus History


Changed line 71 from:
The student can expect to have simple exercises and quizzes every meeting. Some of these daily assignments will be done in groups specified by the instructor and they will account for the participation grade of the course.
to:
The student can expect to have simple exercises and quizzes every meeting. Some of these daily assignments will be done in groups specified by the instructor and they will account for the participation grade of the course. Make-up assignments will be allowed only if the instructor or TA was informed of a documented absence before the quiz took place.
Changed line 85 from:
Exams are this course's formal evaluation tool. In the exams students will be tested with respect to the learning goals of this course. Exams will comprise a mix of practical exercises and concepts. There will be only one midterm exam at around 3/4 of the semester
to:
Exams are this course's formal evaluation tool. In the exams students will be tested with respect to the learning goals of this course. Exams will comprise a mix of practical exercises and concepts. There will be only one midterm exam at around 3/4 of the semester. The exam is '''open notes''' but only handwritten notes are allowed.
Changed lines 52-71 from:
!!! Challenges

This course is designed to be a hands-on learning experience. I believe that students learn better by doing. Thus, by providing concrete, practical experience I expect that students will be better prepared to apply their new knowledge to real-life, data-intensive research situations.

As part of this philosophy, there will be monthly Big Data challenges. Every 3 weeks, a challenge will be released to the students who will compete with each other to design and implement the best solution. Full credit will be obtained regardless of the particular rank of a student's solution. The goal of these challenges is to expose the student to the use of learning algorithms and infrastructure used by Big Data technologies. Released problems will reflect as much as possible real challenges in fields such as astronomy, bioinformatics, and analysis of social media.


Challenge schedule is as follows:
* Challenge 0: Warming up (Aug 27 - due Sep 3rd)
* 1st challenge (Sep 8 - due Sep 25)
* Challenge 2 (Sep 29 - due Oct 15)
* Challenge 3 (Nov 3 - due Nov 19)

Challenges will be done in teams of 3 to 4 students

!!! Final project

Projects are one of the most important learning tools of this class. The final project is entirely at the discretion of the student (upon instructor approval). Students are free to explore a problem of their interest and propose their own solution. The project has the following deliverables:

* '''Proposal.''' Maximum 1 page of project proposal, why the problem is important, what has been done so far in the field, and what are the expected outcomes
to:
!! Labs

We will have multiple labs during the semester. These labs are based on the Hortonworks material and we will use their virtual machine for most of them. Labs are due exactly one week after they are assigned.

!!! Class project

The final project is entirely at the discretion of the student (upon instructor approval). Students are free to explore a problem of their interest and propose their own solution. The project has the following deliverables:

* '''Proposal.''' Maximum 2 pages of project proposal, why the problem is important, what has been done so far in the field, and what are the expected outcomes
* '''Presentations''' Expect 3 presentations during the semester, each one will detail different aspects of your project and preliminary results are expected.
Changed lines 66-68 from:
Projects will be done individually.

to:
Projects will be done in teams of 3 grad students or 4 students if they include at least 1 undergraduate student.

Added line 94:
Changed lines 97-101 from:
In order to facilitate interaction between students and to promote a broader participation, I created a %target=_blank%[[https://piazza.com|Piazza group]]. This is a discussion forum for the class and members are expected to conduct themselves with respect by posting comments and replies only in the context of the course. Use the Piazza group to ask general questions about the homework, exams, and lectures. You can also paste small snippets of code to clarify an idea. Students are encouraged to answer each other's questions. Recall that your thoughtful participation in this forum counts toward your final grade.



to:
In order to facilitate interaction between students and to promote a broader participation, I created a %target=_blank%[[https://piazza.com/class/is67dzarfg426j|Piazza group]]. This is a discussion forum for the class and members are expected to conduct themselves with respect by posting comments and replies only in the context of the course. Use the Piazza group to ask general questions about the homework, exams, and lectures. You can also paste small snippets of code to clarify an idea. Students are encouraged to answer each other's questions. Recall that your thoughtful participation in this forum counts toward your final grade.



Changed lines 110-113 from:
* '''Challenge 1''' 10 pts
* '''Challenge 2''' 10 pts
* '''Challenge 3''' 10 pts
* '''Project reports''' 10 pts
to:
* '''Labs''' 25 pts
* '''Project presentations''' 30 pts
Changed lines 113-115 from:
* '''Class project''' 10 pts
* '''Midterm Exam''' 15 pts
* '''Final Exam''' 15 pts
to:
* '''Final report''' 10 pts
* '''Exam''' 15 pts
Changed lines 15-16 from:
-->  https://marketplace.mimeo.com/studentmaterials
to:
-->  https://marketplace.mimeo.com/studentmaterials (Recommended HDP Analyst: Data Science-Lab Guide)
Added lines 14-15:
* Hortonworks Material
-->  https://marketplace.mimeo.com/studentmaterials
Changed line 15 from:
'''Supported by AWS in Education Grant award'''
to:
'''Supported by AWS in Education Grant award and Hortonworks University'''
Changed line 3 from:
* [[https://piazza.com/class/idestq8f3bg75u|Piazza link]]
to:
* [[https://piazza.com/class/is67dzarfg426j|Piazza link]]
Deleted lines 133-135:

!! Title IX:
In an effort to meet obligations under Title IX, UNM faculty, Teaching Assistants, and Graduate Assistants are considered “responsible employees” by the Department of Education (see pg 15 - http://www2.ed.gov/about/offices/list/ocr/docs/qa-201404-title-ix.pdf).  This designation requires that any report of gender discrimination which includes sexual harassment, sexual misconduct and sexual violence made to a faculty member, TA, or GA must be reported to the Title IX Coordinator at the Office of Equal Opportunity (oeo.unm.edu). For more information on the campus policy regarding sexual misconduct, see: https://policy.unm.edu/university-policies/2000/2740.html
Deleted line 0:
Changed lines 134-135 from:
!! SPECIAL ACCOMMODATIONS
If you need special accommodations or assistance, please contact the Accessibility Resource Center (%target=_blank%http://as2.unm.edu/)
to:

!! Title IX:
In an effort to meet obligations under Title IX, UNM faculty, Teaching Assistants, and Graduate Assistants are considered “responsible employees” by the Department of Education (see pg 15 - http://www2.ed.gov/about/offices/list/ocr/docs/qa-201404-title-ix.pdf).  This designation requires that any report of gender discrimination which includes sexual harassment, sexual misconduct and sexual violence made to a faculty member, TA, or GA must be reported to the Title IX Coordinator at the Office of Equal Opportunity (oeo.unm.edu). For more information on the campus policy regarding sexual misconduct, see: https://policy.unm.edu/university-policies/2000/2740.html
 

!! ADA:
In accordance with University Policy 2310 and the Americans with Disabilities Act (ADA), academic accommodations may be made for any student who notifies the instructor of the need for an accommodation. If you have a disability, either permanent or temporary, contact Accessibility Resource Center at 277-3506 for additional information.

Deleted lines 0-5:
!!! Instructor
* '''[[http://www.cs.unm.edu/~estrada|Trilce Estrada]]''', Assistant Professor
* Email: '''estrada@cs.unm.edu'''
* Office: ''' CARC 2004A '''

------
Changed line 4 from:
* Office: ''' FEC 325 '''
to:
* Office: ''' CARC 2004A '''
Added lines 212-216:


!!! Kafka
* [[https://kafka.apache.org/documentation.html#quickstart | Kafka documentation]]
* [[http://blog.cloudera.com/blog/2014/09/apache-kafka-for-beginners/| Kafka for beginners]]
Changed line 177 from:
to:
* [[http://www.toptal.com/spark/introduction-to-apache-spark | Introduction to Apache Spark with Examples and Use Cases]]
Changed lines 163-171 from:
!!!  Hadoop ecosystem and EC2 practice & Performance considerations and best practices HRW Ch. 3
* [[https://hadoopecosystemtable.github.io/ | Hadoop ecosystem]]
* [[http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-get-started-count-words.html]]
* [[http://hortonworks.com/blog/deploying-hadoop-cluster-amazon-ec2-hortonworks/]]
* [[http://hortonworks.com/products/hortonworks-sandbox/#install]]
* [[http://www.revelytix.com/?q=content/hadoop-ecosystem | Hadoop ecosystem]]
* [[http://cloudera.com/content/cloudera/en/training/library/apache-hadoop-ecosystem.html | Cloudera videos on Hadoop ecosystem]]

to:
Added lines 169-172:
!!! Challenge 1

!!! Page rank

Changed line 174 from:
to:
* [[https://databricks.com/spark/developer-resources | Very good set of resources from databricks]]
Changed lines 178-180 from:
!!! Challenge 1

to:
Changed lines 194-195 from:

to:
!!!  Hadoop ecosystem and EC2 practice & Performance considerations and best practices HRW Ch. 3
* [[https://hadoopecosystemtable.github.io/ | Hadoop ecosystem]]
* [[http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-get-started-count-words.html]]
* [[http://hortonworks.com/blog/deploying-hadoop-cluster-amazon-ec2-hortonworks/]]
* [[http://hortonworks.com/products/hortonworks-sandbox/#install]]
* [[http://www.revelytix.com/?q=content/hadoop-ecosystem | Hadoop ecosystem]]
* [[http://cloudera.com/content/cloudera/en/training/library/apache-hadoop-ecosystem.html | Cloudera videos on Hadoop ecosystem]]


Changed line 219 from:
!!! Page rank
to:
Added line 164:
* [[https://hadoopecosystemtable.github.io/ | Hadoop ecosystem]]
Changed line 179 from:
to:
* [[http://zdatainc.com/2014/08/real-time-streaming-apache-spark-streaming/ | Real time streaming]]
Changed line 164 from:
* [[http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-get-started-count-words.html]]]]
to:
* [[http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-get-started-count-words.html]]
Added line 174:
* [[http://train.ed.psu.edu/WFED-543/SocNet_TheoryApp.pdf |SNA Theory and Applications]]
Deleted lines 162-168:
!!! Searching, indexing, and their implications to memory management HRW Ch 1
* [[http://nlp.stanford.edu/IR-book/html/htmledition/irbook.html | Intro to Information Retrieval]]
** From Boolean Retrieval to Scoring, term weighting and the vector space model

!!! Challenge 1

Added lines 171-182:
!! Social media analysis
* [[http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html | Sentiment analysis]]
* [[http://www.analytictech.com/networks.pdf | Social network analysis]]

!!Apache Spark

* [[http://kukuruku.co/hub/algorithms/social-network-analysis-spark-graphx | SNA with Spark and GraphX]]


!!! Challenge 1

Deleted line 184:
Added lines 191-199:


!!! Searching, indexing, and their implications to memory management HRW Ch 1
* [[http://nlp.stanford.edu/IR-book/html/htmledition/irbook.html | Intro to Information Retrieval]]
** From Boolean Retrieval to Scoring, term weighting and the vector space model



Changed lines 210-211 from:
!!! Spark
to:
Deleted line 212:
!!! Network analysis
Changed lines 163-169 from:
to:
!!! Searching, indexing, and their implications to memory management HRW Ch 1
* [[http://nlp.stanford.edu/IR-book/html/htmledition/irbook.html | Intro to Information Retrieval]]
** From Boolean Retrieval to Scoring, term weighting and the vector space model

!!! Challenge 1

Changed lines 187-189 from:
to:
!!! Challenge 2

Added lines 203-204:
!!! Challenge 3
Changed lines 207-212 from:
!!! Searching, indexing, and their implications to memory management HRW Ch 1
* [[http://nlp.stanford.edu/IR-book/html/htmledition/irbook.html | Intro to Information Retrieval]]
** From Boolean Retrieval to Scoring, term weighting and the vector space model


to:
Changed line 234 from:
!!! Visualization as a preliminary data analysis tool HRW Ch 1
to:
!!! Visualization as a complementary data analysis tool HRW Ch 1
Changed lines 64-68 from:
* [[Challenge 0: Warming up]] (Aug 27 - due Sep 3rd)
* [[1st challenge|Challenge 1: The medicaid challenge (visualization, Hadoop practice)]] (Sep 8 - due Sep 25)
* [[Challenge 2: Plagiarism detection (Amazon EC2, Hadoop, LSH)]] (Sep 29 - due Oct 15)
* [[Challenge 3: Netflix recommender system (Mahout, Giraph)]] (Nov 3 - due Nov 19)
to:
* Challenge 0: Warming up (Aug 27 - due Sep 3rd)
* 1st challenge (Sep 8 - due Sep 25)
* Challenge 2 (Sep 29 - due Oct 15)
* Challenge 3 (Nov 3 - due Nov 19)
Changed lines 148-150 from:
!!! (Aug 18) Introduction & Overview of available infrastructure (CARC Galles, Amazon EC2)

!!! (Aug 20) Big Data applications MMD Ch. 1
to:
!!! Introduction & Overview of available infrastructure (CARC Galles, Amazon EC2)


!!! Big Data applications MMD Ch. 1
Changed lines 155-167 from:
!!! (Aug 25) Visualization as a preliminary data analysis tool HRW Ch 1
* [[http://www.visualisingdata.com/index.php/resources/ | Important tools for visualising and communicating data]]
* [[http://www.cs.unm.edu/~estrada/files/04-visualization/ | Visualization with R]]
* [[http://www.cs.unm.edu/~estrada/files/04-visualization/ | visualization with Google charts]]

!!! (Aug 27) Searching, indexing, and their implications to memory management HRW Ch 1
* [[http://nlp.stanford.edu/IR-book/html/htmledition/irbook.html | Intro to Information Retrieval]]
** From Boolean Retrieval to Scoring, term weighting and the vector space model

!!! (Sep 3) The learning problem
* [[http://www.cs.unm.edu/~estrada/files/01-the-learning-problem.pdf | The learning problem]]

!!! (Sep 8 - 10) The MapReduce paradigm & Hadoop and HDFS overview MMD Ch. 2
to:

!!! The MapReduce paradigm & Hadoop and HDFS overview MMD Ch. 2
Changed lines 163-168 from:
!!! (Sep 15 - 17) Architecting for the cloud HRW Ch 2 & 9
*[[http://cs.sfsu.edu/ccls/cloud/Amazon_EC2_Tutorial.pdf | EC2 tutorial from SFSU]]
*[[http://aws.amazon.com/documentation/ec2/ | Amazon EC2 documentation]]
*[[http://d36cz9buwru1tt.cloudfront.net/AWS_Cloud_Best_Practices.pdf | Architecting for the cloud: best practices]]

!!! (Sep 22 - 24) Hadoop ecosystem and EC2 practice & Performance considerations and best practices HRW Ch. 3
to:

!!!  Hadoop ecosystem and EC2 practice & Performance considerations and best practices HRW Ch. 3
Changed lines 171-189 from:
!!! (Sep 29 - Oct 1) BigTable, Hive and Pig HRW Ch. 4
* [[http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf | BigTable]]
* [[http://www.cs.rutgers.edu/~pxk/417/notes/content/bigtable.html | BigTable from Rutgers]]
* [[https://cwiki.apache.org/confluence/display/Hive/Tutorial | Hive tutorial]]
* [[http://pig.apache.org/docs/r0.7.0/tutorial.html | Pig tutorial from Apache]]
* [[http://hortonworks.com/hadoop-tutorial/how-to-process-data-with-apache-pig/ | How to process data with Apache pig]]

!!! (Oct 6)  Page rank Ch. 5

!!! (Oct 8) Project presentations
* '''Project proposal due Oct 8'''

!!! (Oct 13) Finding similar items MMD Ch. 3
* [[http://infolab.stanford.edu/~ullman/mmds/ch3.pdf | Mining of Massive Datasets Ch 3]]

!!! (Oct 15) Recommender systems MMD Ch. 9


!!! (Oct 20 - 22) Mahout, clustering, and classification MMD Ch. 12
to:

!!! Recommender systems MMD Ch. 9


!!! Mahout, clustering, and classification MMD Ch. 12
Changed lines 181-185 from:
!!! (Oct 27 - 29) Minhashing & Locality Sensitive Hashing (LSH) MMD Ch 6

!!! (Nov 3 - 5) Frequent itemsets & Mining data streams MMD Ch. 4 & 6

!!! (Nov 10 - 12) Database evolution & NoSQL databases and MongoDB HRW Ch. 2
to:

!!! Finding similar items MMD Ch. 3
* [[http://infolab.stanford.edu/~ullman/mmds/ch3.pdf | Mining of Massive Datasets Ch 3]]

!!! Minhashing & Locality Sensitive Hashing (LSH) MMD Ch 6

!!! Frequent itemsets & Mining data streams MMD Ch. 4 & 6

!!! Spark

!!! Storm

!!! Network analysis

!!! Page rank

!!! Searching, indexing, and their implications to memory management HRW Ch 1
* [[http://nlp.stanford.edu/IR-book/html/htmledition/irbook.html | Intro to Information Retrieval]]
** From Boolean Retrieval to Scoring, term weighting and the vector space model



!!! BigTable, Hive and Pig HRW Ch. 4
* [[http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf | BigTable]]
* [[http://www.cs.rutgers.edu/~pxk/417/notes/content/bigtable.html | BigTable from Rutgers]]
* [[https://cwiki.apache.org/confluence/display/Hive/Tutorial | Hive tutorial]]
* [[http://pig.apache.org/docs/r0.7.0/tutorial.html | Pig tutorial from Apache]]
* [[http://hortonworks.com/hadoop-tutorial/how-to-process-data-with-apache-pig/ | How to process data with Apache pig]]




!!! Architecting for the cloud HRW Ch 2 & 9
*[[http://cs.sfsu.edu/ccls/cloud/Amazon_EC2_Tutorial.pdf | EC2 tutorial from SFSU]]
*[[http://aws.amazon.com/documentation/ec2/ | Amazon EC2 documentation]]
*[[http://d36cz9buwru1tt.cloudfront.net/AWS_Cloud_Best_Practices.pdf | Architecting for the cloud: best practices]]



!!! Database evolution & NoSQL databases and MongoDB HRW Ch. 2
Changed lines 227-233 from:
!!! (Nov 17 - 19) Review, projects discussion, and Midterm exam
* Nov 12 Midterm exam
* [[Poster session guidelines]]

!!! (Nov 24 - 26) Other approaches for performance improvement: MPI

!!! (Dec 1 -3) Other approaches for performance improvement: CUDA
to:


!!! Visualization as a preliminary data analysis tool HRW Ch 1
* [[http://www.visualisingdata.com/index.php/resources/ | Important tools for visualising and communicating data]]
* [[http://www.cs.unm.edu/~estrada/files/04-visualization/ | Visualization with R]]
* [[http://www.cs.unm.edu/~estrada/files/04-visualization/ | visualization with Google charts]]




!!!  Other approaches for performance improvement: MPI

!!! Other approaches for performance improvement: CUDA
Changed lines 126-130 from:
* '''Challenge 4''' 10 pts
* '''Final project''' 20 pts
* '''Poster        ''' 10 pts
* '''Exam''' 15 pts
* '''Extra credit''' 3 pts
to:
* '''Project reports''' 10 pts
* '''Poster        ''' 5 pts
* '''Class project''' 10 pts
* '''Midterm Exam''' 15 pts
* '''Final Exam''' 15 pts

Changed line 10 from:
* [[https://piazza.com/class/hyajszkt2aa4cr|Piazza link]]
to:
* [[https://piazza.com/class/idestq8f3bg75u|Piazza link]]
Deleted line 4:
* Office hours: '''M 11:30-12:30 AM''' and '''F 10:00-12:00 AM'''
Deleted lines 10-14:
* Class Time: '''MW 10:00-11:15 AM'''
* Building and Room: '''Centennial Engineering Center 1026'''
* Prerequisites: Fluent in at least one of the following programming languages: '''Python, Java, or C'''
* Preferred: background in '''data mining, machine learning or statistics'''
* UNM Learn: CS-591-001 (Fall 2014)
Added line 207:
Added line 209:
*[[https://code.google.com/p/unresyst/wiki/CreateMahoutRecommender | Mahout hands on]]
Changed line 184 from:
* [[* [[http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-get-started-count-words.html]]]]
to:
* [[http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-get-started-count-words.html]]]]
Changed lines 165-167 from:
to:
* [[http://nlp.stanford.edu/IR-book/html/htmledition/irbook.html | Intro to Information Retrieval]]
** From Boolean Retrieval to Scoring, term weighting and the vector space model

Added line 172:
* [[http://www.cs.rutgers.edu/~pxk/417/notes/content/mapreduce.html | MapReduce]]
Changed line 133 from:
* '''Final project''' 30 pts
to:
* '''Final project''' 20 pts
Changed lines 135-136 from:
* '''Extra credit''' 5 pts
to:
* '''Exam''' 15 pts
* '''Extra credit''' 3 pts
Added line 156:
* [[http://www.planet-data.eu/sites/default/files/presentations/Big_Data_Tutorial_part4.pdf | Big Data tutorial from Marko Grobelnik]]
Changed lines 157-162 from:
!!! (Aug 25 - 27) Architecting for the cloud HRW Ch 2 & 9
*[[http://cs.sfsu.edu/ccls/cloud/Amazon_EC2_Tutorial.pdf | EC2 tutorial from SFSU]]
*[[http://aws.amazon.com/documentation/ec2/ | Amazon EC2 documentation]]
*[[http://d36cz9buwru1tt.cloudfront.net/AWS_Cloud_Best_Practices.pdf | Architecting for the cloud: best practices]]

!!! (Sep 3) Visualization as a preliminary data analysis tool HRW Ch 1
to:
!!! (Aug 25) Visualization as a preliminary data analysis tool HRW Ch 1
Changed lines 162-164 from:
!!! (Sep 8) Searching, indexing, and their implications to memory management HRW Ch 1

!!! (Sep 10) The learning problem
to:
!!! (Aug 27) Searching, indexing, and their implications to memory management HRW Ch 1

!!! (Sep 3) The learning problem
Changed line 167 from:
!!! (Sep 15 - 17) The MapReduce paradigm & Hadoop and HDFS overview MMD Ch. 2
to:
!!! (Sep 8 - 10) The MapReduce paradigm & Hadoop and HDFS overview MMD Ch. 2
Added lines 173-177:
!!! (Sep 15 - 17) Architecting for the cloud HRW Ch 2 & 9
*[[http://cs.sfsu.edu/ccls/cloud/Amazon_EC2_Tutorial.pdf | EC2 tutorial from SFSU]]
*[[http://aws.amazon.com/documentation/ec2/ | Amazon EC2 documentation]]
*[[http://d36cz9buwru1tt.cloudfront.net/AWS_Cloud_Best_Practices.pdf | Architecting for the cloud: best practices]]

Added line 179:
* [[* [[http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-get-started-count-words.html]]]]
Added line 180:
* [[http://hortonworks.com/products/hortonworks-sandbox/#install]]
Added line 179:
* [[http://hortonworks.com/blog/deploying-hadoop-cluster-amazon-ec2-hortonworks/]]
Changed line 231 from:
!!! The schedule is under construction!
to:
!!! The schedule is under construction!
Changed line 28 from:
to:
'''Supported by AWS in Education Grant award'''
Changed lines 5-6 from:
* Office hours: '''M 10:00-11:00 AM''' and '''T 9:00-11:00 AM'''
to:
* Office hours: '''M 11:30-12:30 AM''' and '''F 10:00-12:00 AM'''
Changed line 11 from:
* Class Time: '''MW 10:30-11:15 AM'''
to:
* Class Time: '''MW 10:00-11:15 AM'''
Changed lines 13-14 from:
* Prerequisites: Fluent in at least one of the following programming languages: '''Python, Java, C, or Matlab'''
to:
* Prerequisites: Fluent in at least one of the following programming languages: '''Python, Java, or C'''
* Preferred: background in '''data mining, machine learning or statistics'''
July 31, 2014, at 04:13 PM EST by 64.106.39.101 -
Changed line 132 from:
* '''Final project''' 20 pts
to:
* '''Final project''' 30 pts
July 31, 2014, at 04:12 PM EST by 64.106.39.101 -
Changed lines 226-230 from:




to:
!!! (TBD) Poster presentations

Deleted lines 229-288:

!!! Week 1
# Introduction


## [[Suggested datasets]]

!!! Week 2
# [[http://www.cs.unm.edu/~estrada/files/03-visualization.pdf | Introduction to visualization]]


!!! Week 3
# Database evolution

!!! Week 4
#  MapReduce
## Suggested readings [[http://infolab.stanford.edu/~ullman/mmds/ch2.pdf | Mining of Massive Datasets Ch 2]]
# [[http://www.cs.unm.edu/~estrada/files/07-hadoop.zip| Hadoop installation]]

!!! Week 5
# [[http://www.cs.unm.edu/~estrada/files/08-intro_to_hadoop.pdf | Introduction to Hadoop]]
# [[http://www.cs.unm.edu/~estrada/files/09-map-reduce-algorithms.pdf | MapReduce algorithms]]
# '''[[1st challenge]]''' due Sep 27th



!! Week 7
# LSH
# [[Visit to CARC and use of the Galles cluster]]

!! Week 8
# Association rules 1
# Association rules 2

!! Week 9
# Page rank
# Page rank 2

!! Week 10
# Recommender systems

!! Week 11

# Hadoop ecosystem

# Mahout


!!! Week 12
# [[Challenge 2]]
# Out of town: work on your projects

!!! Week 13


!! Poster session


!! Latter
## Suggested readings [[http://infolab.stanford.edu/~ullman/mmds/ch4.pdf | Mining of Massive Datasets Ch 4]]
July 31, 2014, at 04:10 PM EST by 64.106.39.101 -
Changed line 70 from:
* [[Challenge 1: The medicaid challenge (visualization, Hadoop practice)]] (Sep 8 - due Sep 25)
to:
* [[1st challenge|Challenge 1: The medicaid challenge (visualization, Hadoop practice)]] (Sep 8 - due Sep 25)
Changed lines 178-179 from:
**[[http://www.revelytix.com/?q=content/hadoop-ecosystem | Hadoop ecosystem]]
**[[http://cloudera.com/content/cloudera/en/training/library/apache-hadoop-ecosystem.html | Cloudera videos on Hadoop ecosystem]]
to:
* [[http://www.revelytix.com/?q=content/hadoop-ecosystem | Hadoop ecosystem]]
* [[http://cloudera.com/content/cloudera/en/training/library/apache-hadoop-ecosystem.html | Cloudera videos on Hadoop ecosystem]]
July 31, 2014, at 04:07 PM EST by 64.106.39.101 -
Changed line 161 from:
!!! (Aug 3) Visualization as a preliminary data analysis tool HRW Ch 1
to:
!!! (Sep 3) Visualization as a preliminary data analysis tool HRW Ch 1
Changed lines 166-189 from:
!!! (Aug 8) Searching, indexing, and their implications to memory management HRW Ch 1



4 The MapReduce paradigm & Hadoop and HDFS overview MMD Ch. 2
5 Hadoop in EC2 practice & Performance considerations and best practices HRW Ch. 3
6 Finding similar items MMD Ch. 3
7 Frequent itemsets & Mining data streams MMD Ch. 4 & 6
8 Minhashing & Locality Sensitive Hashing (LSH) MMD Ch 6
9 Hadoop ecosystem, Hive and Pig HRW Ch. 4
10 Recommender systems MMD Ch. 9
11 Mahout, clustering, and classification MMD Ch. 12
13 Midterm exam
Projects overview
12 Graph analysis & Apache Giraph HWR Ch 7
14 Page rank & BigTable MMD Ch. 5
15 Database evolution & NoSQL databases and MongoDB HRW Ch. 2
16 Review of poster design and report guidelines
Poster presentation





to:
!!! (Sep 8) Searching, indexing, and their implications to memory management HRW Ch 1

!!! (Sep 10) The learning problem
* [[http://www.cs.unm.edu/~estrada/files/01-the-learning-problem.pdf | The learning problem]]

!!! (Sep 15 - 17) The MapReduce paradigm & Hadoop and HDFS overview MMD Ch. 2
* [[http://infolab.stanford.edu/~ullman/mmds/ch2.pdf | Mining of Massive Datasets Ch 2]]
* [[http://www.cs.unm.edu/~estrada/files/07-hadoop.zip| Hadoop installation]]
* [[http://www.cs.unm.edu/~estrada/files/08-intro_to_hadoop.pdf | Introduction to Hadoop]]
* [[http://www.cs.unm.edu/~estrada/files/09-map-reduce-algorithms.pdf | MapReduce algorithms]]

!!! (Sep 22 - 24) Hadoop ecosystem and EC2 practice & Performance considerations and best practices HRW Ch. 3
**[[http://www.revelytix.com/?q=content/hadoop-ecosystem | Hadoop ecosystem]]
**[[http://cloudera.com/content/cloudera/en/training/library/apache-hadoop-ecosystem.html | Cloudera videos on Hadoop ecosystem]]

!!! (Sep 29 - Oct 1) BigTable, Hive and Pig HRW Ch. 4
* [[http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf | BigTable]]
* [[http://www.cs.rutgers.edu/~pxk/417/notes/content/bigtable.html | BigTable from Rutgers]]
* [[https://cwiki.apache.org/confluence/display/Hive/Tutorial | Hive tutorial]]
* [[http://pig.apache.org/docs/r0.7.0/tutorial.html | Pig tutorial from Apache]]
* [[http://hortonworks.com/hadoop-tutorial/how-to-process-data-with-apache-pig/ | How to process data with Apache pig]]

!!! (Oct 6)  Page rank Ch. 5

!!! (Oct 8) Project presentations
* '''Project proposal due Oct 8'''

!!! (Oct 13) Finding similar items MMD Ch. 3
* [[http://infolab.stanford.edu/~ullman/mmds/ch3.pdf | Mining of Massive Datasets Ch 3]]

!!! (Oct 15) Recommender systems MMD Ch. 9

!!! (Oct 20 - 22) Mahout, clustering, and classification MMD Ch. 12
*[[http://www.ibm.com/developerworks/java/library/j-mahout/ | Very good Mahout tutorial]]
*[[http://girlincomputerscience.blogspot.com/2010/11/apache-mahout.html | Another Mahout tutorial]]
*[[http://www.slideshare.net/Cataldo/tutoria-mahout-recommendation | Mahout slides]]

!!! (Oct 27 - 29) Minhashing & Locality Sensitive Hashing (LSH) MMD Ch 6

!!! (Nov 3 - 5) Frequent itemsets & Mining data streams MMD Ch. 4 & 6

!!! (Nov 10 - 12) Database evolution & NoSQL databases and MongoDB HRW Ch. 2
* [[http://research.ijais.org/volume5/number4/ijais12-450888.pdf | Comparison between SQL and NoSQL DBs]]
*[[http://www.infoq.com/articles/mongodb-java-php-python | MongoDB for java, php, and python developers]]
*[[http://api.mongodb.org/wiki/current/Tutorial.html | MongoDB tutorial]]
*[[http://martinfowler.com/articles/nosql-intro-original.pdf | Polyglot Persistence]]
*[[http://readwrite.com/2009/02/12/is-the-relational-database-doomed#awesm=~ogp7vsMQNIgj8k | Is the relational DB doomed]]

!!! (Nov 17 - 19) Review, projects discussion, and Midterm exam
* Nov 12 Midterm exam
* [[Poster session guidelines]]

!!! (Nov 24 - 26) Other approaches for performance improvement: MPI

!!! (Dec 1 -3) Other approaches for performance improvement: CUDA
* [[http://www.nvidia.com/content/GTC-2010/pdfs/2131_GTC2010.pdf | CUDA slides]]
* [[http://devblogs.nvidia.com/parallelforall/easy-introduction-cuda-c-and-c/ | Easy introduction]]
* [[http://www.techpowerup.com/119073/nvidia-cuda-emulator-for-every-pc.html | CUDA emulator]]






Changed lines 235-236 from:
# [[http://www.cs.unm.edu/~estrada/files/01-the-learning-problem.pdf | The learning problem]]
to:

Changed lines 245-251 from:
# [[http://www.cs.unm.edu/~estrada/files/05-nosql.pdf | NoSQL databases and MongoDB]]
** [[http://research.ijais.org/volume5/number4/ijais12-450888.pdf | Comparison between SQL and NoSQL DBs]]
**[[http://www.infoq.com/articles/mongodb-java-php-python | MongoDB for java, php, and python developers]]
**[[http://api.mongodb.org/wiki/current/Tutorial.html | MongoDB tutorial]]
**[[http://martinfowler.com/articles/nosql-intro-original.pdf | Polyglot Persistence]]
**[[http://readwrite.com/2009/02/12/is-the-relational-database-doomed#awesm=~ogp7vsMQNIgj8k | Is the relational DB doomed]]

to:
Changed lines 256-261 from:
!!! Week 6
# Finding similar items
## Suggested readings [[http://infolab.stanford.edu/~ullman/mmds/ch3.pdf | Mining of Massive Datasets Ch 3]]
# Exercises with hadoop
# Student presentations

to:

Changed lines 276-277 from:
**[[http://www.revelytix.com/?q=content/hadoop-ecosystem | Hadoop ecosystem]]
**[[http://cloudera.com/content/cloudera/en/training/library/apache-hadoop-ecosystem.html | Cloudera videos on Hadoop ecosystem]]
to:
Changed lines 278-281 from:
**[[http://www.ibm.com/developerworks/java/library/j-mahout/ | Very good Mahout tutorial]]
**[[http://girlincomputerscience.blogspot.com/2010/11/apache-mahout.html | Another Mahout tutorial]]
**[[http://www.slideshare.net/Cataldo/tutoria-mahout-recommendation | Mahout slides]]

to:

Changed lines 285-297 from:
# CUDA
** [[http://www.nvidia.com/content/GTC-2010/pdfs/2131_GTC2010.pdf | CUDA slides]]
** [[http://devblogs.nvidia.com/parallelforall/easy-introduction-cuda-c-and-c/ | Easy introduction]]
** [[http://www.techpowerup.com/119073/nvidia-cuda-emulator-for-every-pc.html | CUDA emulator]]

!! Week 14
# BigTable, Hive, and Pig
** [[http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf | BigTable]]
** [[http://www.cs.rutgers.edu/~pxk/417/notes/content/bigtable.html | BigTable from Rutgers]]
** [[https://cwiki.apache.org/confluence/display/Hive/Tutorial | Hive tutorial]]
** [[http://pig.apache.org/docs/r0.7.0/tutorial.html | Pig tutorial from Apache]]
** [[http://hortonworks.com/hadoop-tutorial/how-to-process-data-with-apache-pig/ | How to process data with Apache pig]]

to:

Changed line 288 from:
* [[Poster session guidelines]]
to:
July 31, 2014, at 03:40 PM EST by 64.106.39.101 -
Added lines 151-189:
!!! (Aug 18) Introduction & Overview of available infrastructure (CARC Galles, Amazon EC2)

!!! (Aug 20) Big Data applications MMD Ch. 1
* [[http://prezi.com/e2xnc3-nrbja/applications-of-big-data/ | Big Data applications]]

!!! (Aug 25 - 27) Architecting for the cloud HRW Ch 2 & 9
*[[http://cs.sfsu.edu/ccls/cloud/Amazon_EC2_Tutorial.pdf | EC2 tutorial from SFSU]]
*[[http://aws.amazon.com/documentation/ec2/ | Amazon EC2 documentation]]
*[[http://d36cz9buwru1tt.cloudfront.net/AWS_Cloud_Best_Practices.pdf | Architecting for the cloud: best practices]]

!!! (Aug 3) Visualization as a preliminary data analysis tool HRW Ch 1
* [[http://www.visualisingdata.com/index.php/resources/ | Important tools for visualising and communicating data]]
* [[http://www.cs.unm.edu/~estrada/files/04-visualization/ | Visualization with R]]
* [[http://www.cs.unm.edu/~estrada/files/04-visualization/ | visualization with Google charts]]

!!! (Aug 8) Searching, indexing, and their implications to memory management HRW Ch 1



4 The MapReduce paradigm & Hadoop and HDFS overview MMD Ch. 2
5 Hadoop in EC2 practice & Performance considerations and best practices HRW Ch. 3
6 Finding similar items MMD Ch. 3
7 Frequent itemsets & Mining data streams MMD Ch. 4 & 6
8 Minhashing & Locality Sensitive Hashing (LSH) MMD Ch 6
9 Hadoop ecosystem, Hive and Pig HRW Ch. 4
10 Recommender systems MMD Ch. 9
11 Mahout, clustering, and classification MMD Ch. 12
13 Midterm exam
Projects overview
12 Graph analysis & Apache Giraph HRW Ch 7
14 Page rank & BigTable MMD Ch. 5
15 Database evolution & NoSQL databases and MongoDB HRW Ch. 2
16 Review of poster design and report guidelines
Poster presentation




Changed line 195 from:
# [[http://prezi.com/e2xnc3-nrbja/applications-of-big-data/ | Big Data applications]]
to:
Changed lines 200-203 from:
## Suggested readings: [[http://www.visualisingdata.com/index.php/resources/ | Important tools for visualising and communicating data]]
# [[http://www.cs.unm.edu/~estrada/files/04-visualization/ | Visualization with R]]
# Informal presentation of projects and [[http://www.cs.unm.edu/~estrada/files/04-visualization/ | visualization with Google charts]]

to:

Changed lines 243-246 from:
# Cloud computing
**[[http://cs.sfsu.edu/ccls/cloud/Amazon_EC2_Tutorial.pdf | EC2 tutorial from SFSU]]
**[[http://aws.amazon.com/documentation/ec2/ | Amazon EC2 documentation]]
**[[http://d36cz9buwru1tt.cloudfront.net/AWS_Cloud_Best_Practices.pdf | Architecting for the cloud: best practices]]
to:
July 31, 2014, at 02:47 PM EST by 64.106.39.101 -
Changed line 12 from:
* Building and Room: '''TBD'''
to:
* Building and Room: '''Centennial Engineering Center 1026'''
July 31, 2014, at 02:43 PM EST by 64.106.39.101 -
Changed lines 14-15 from:
* UNM Learn: CS-591-001 (Fall 2013)
to:
* UNM Learn: CS-591-001 (Fall 2014)
* [[https://piazza.com/class/hyajszkt2aa4cr|Piazza link]]
Deleted line 16:
July 31, 2014, at 02:35 PM EST by 64.106.39.101 -
Added lines 150-151:

!!! The schedule is under construction!
July 31, 2014, at 02:32 PM EST by 64.106.39.101 -
Changed line 2 from:
* '''Trilce Estrada''', Assistant Professor
to:
* '''[[http://www.cs.unm.edu/~estrada|Trilce Estrada]]''', Assistant Professor
July 31, 2014, at 02:30 PM EST by 64.106.39.101 -
Changed lines 57-58 from:
!! PROJECTS:
to:
!! ASSIGNMENTS:
Added lines 86-91:


!!! Daily assignments and quizzes:

The student can expect to have simple exercises and quizzes every meeting. Some of these daily assignments will be done in groups specified by the instructor and they will account for the participation grade of the course.

July 31, 2014, at 02:24 PM EST by 64.106.39.101 -
Deleted line 0:
Changed line 59 from:
Projects are the most important learning tool of this class. There will be [[small projects | monthly challenges]] assigned by the instructor during the semester, and one final project defined by the student.
to:
Projects are the most important learning tool of this class. There will be a series of small projects (challenges) assigned by the instructor during the semester, and one final project defined by the student.
July 31, 2014, at 02:22 PM EST by 64.106.39.101 -
Changed lines 1-2 from:
!! '''Please note that this syllabus is under construction'''
to:

!!! Instructor
* '''Trilce Estrada''', Assistant Professor
* Email: '''estrada@cs.unm.edu'''
* Office: ''' FEC 325 '''
* Office hours: '''M 10:00-11:00 AM''' and '''T 9:00-11:00 AM'''

------

Changed lines 28-35 from:
!!! Instructor
* '''Trilce Estrada''', Assistant Professor
* Email: '''estrada@cs.unm.edu'''
* Office: ''' FEC 325 '''
* Office hours: '''M 10:00-11:00 AM''' and '''T 9:00-11:00 AM'''


to:

Changed lines 34-37 from:
The field of computer science is experiencing a transition from computation-intensive to data-intensive problems, wherein data is produced in massive amounts by large sensor networks, new data acquisition techniques, simulations, and social networks. Efficiently extracting, interpreting, and learning from very large datasets requires a new generation of scalable algorithms as well as new data management technologies.

In this course  we explore key data analysis and management techniques, which applied to massive datasets are the cornerstone that enables real-time decision making in distributed environments, business intelligence in the Web, and scientific discovery at large scale. In particular, we examine the map-reduce parallel computing paradigm and associated technologies such as distributed file systems, no-SQL databases, and stream computing engines. Additionally we review machine learning  methods that make possible the efficient analysis of large volumes of data in near real time. 
to:
The field of computer science is experiencing a transition from computation-intensive to data-intensive problems, wherein data is produced in massive amounts by large sensor networks, new data acquisition techniques, simulations, and social networks. Efficiently extracting, interpreting, and learning from very large datasets requires a new generation of scalable algorithms as well as new data management technologies.

In this course we explore key data analysis and management techniques which, applied to massive datasets, are the cornerstone that enables real-time decision making in distributed environments, business intelligence on the Web, and scientific discovery at large scale. In particular, we examine the map-reduce parallel computing paradigm and associated technologies such as distributed file systems, NoSQL databases, and stream computing engines. Additionally, we review machine learning methods that make possible the efficient analysis of large volumes of data in near real time.
Added line 41:
Changed lines 44-50 from:
* Large databases and their evolution.
* Big Data technology and trends, special consideration made to the Map-Reduce paradigm.
* Searching, indexing, and their implications to memory management.
* Information extraction and feature selection.
* Supervised-, unsupervised-learning, and stream mining.

to:
The course is divided into three main core topics:
* Introduction to the Big Data problem. Current challenges, trends, and applications.
* Algorithms for Big Data analysis. Mining and learning algorithms that have been developed specifically to deal with large datasets.
* Technologies for Big Data management. Big Data technology and tools, with special consideration given to the Map-Reduce paradigm and the Hadoop ecosystem.


Changed lines 53-55 from:
At the end of this course, the student will become familiar with the fundamental concepts of Big Data management an analytics; will become competent in recognizing challenges faced by applications dealing with very large volumes of data as well as in proposing scalable solutions for them; and will be able to understand how Big Data impacts business intelligence, scientific discovery, and our day-to-day life.

to:
At the end of this course, the student will become familiar with the fundamental concepts of Big Data management and analytics; will become competent in recognizing challenges faced by applications dealing with very large volumes of data as well as in proposing scalable solutions for them; and will be able to understand how Big Data impacts business intelligence, scientific discovery, and our day-to-day life.

Changed lines 66-67 from:
As part of this philosophy, there will be monthly Big Data challenges. Every month, a challenge will be released to the students who will compete with each other to design and implement the best solution. Full credit will be obtained regardless of the particular rank of a student's solution. The goal of these challenges is to expose the student to the use of learning algorithms and infrastructure used by Big Data technologies. Released problems will reflect as much as possible real challenges in fields such as astronomy, bioinformatics, and analysis of social media.
to:
As part of this philosophy, there will be regular Big Data challenges. Every 3 weeks, a challenge will be released to the students, who will compete with each other to design and implement the best solution. Full credit will be obtained regardless of the particular rank of a student's solution. The goal of these challenges is to expose the student to the learning algorithms and infrastructure used by Big Data technologies. Released problems will reflect, as much as possible, real challenges in fields such as astronomy, bioinformatics, and analysis of social media.


Challenge schedule is as follows:
* [[Challenge 0: Warming up]] (Aug 27 - due Sep 3rd)
* [[Challenge 1: The medicaid challenge (visualization, Hadoop practice)]] (Sep 8 - due Sep 25)
* [[Challenge 2: Plagiarism detection (Amazon EC2, Hadoop, LSH)]] (Sep 29 - due Oct 15)
* [[Challenge 3: Netflix recommender system (Mahout, Giraph)]] (Nov 3 - due Nov 19)

Challenges will be done in teams of 3 to 4 students

Changed lines 79-82 from:
The final project is entirely to the discretion of the student (upon instructor approval). Students would be free to explore a problem of their interest and propose their own solution. During the course we will hold weekly brainstorming sessions to discuss and strengthen every proposed project.

Projects will be done individually. You must turn in only code written by you. Under no circumstance you should use code downloaded from Internet since this violation will result in serious penalties.
to:
Projects are one of the most important learning tools of this class. The final project is entirely at the discretion of the student (upon instructor approval). Students are free to explore a problem of their interest and propose their own solution. The project has the following deliverables:

* '''Proposal.''' Maximum 1 page describing the project: why the problem is important, what has been done so far in the field, and what the expected outcomes are
* '''Poster and report.''' Maximum 10-page report consisting of the traditional sections of introduction, motivation, method, results, and conclusion

During the course we will hold bi-weekly brainstorming sessions to discuss and strengthen every proposed project.

Projects will be done individually.
Changed lines 97-101 from:
!!! Exams

There won't be exams for this course

to:
!!! Exam

Exams are this course's formal evaluation tool. In the exams students will be tested with respect to the learning goals of this course. Exams will comprise a mix of practical exercises and concepts. There will be only one midterm exam at around 3/4 of the semester

Changed lines 123-126 from:
* '''Challenge 1''' 15 pts
* '''Challenge 2''' 15 pts
* '''Challenge 3''' 15 pts
* '''Final project''' 40 pts
to:
* '''Challenge 1''' 10 pts
* '''Challenge 2''' 10 pts
* '''Challenge 3''' 10 pts
* '''Challenge 4''' 10 pts
* '''Final project''' 20 pts
* '''Poster''' 10 pts
July 31, 2014, at 02:03 PM EST by 64.106.39.101 -
Changed line 9 from:
* Textbook: '''[[http://infolab.stanford.edu/~ullman/mmds/book.pdf | Mining of Massive Datasets]]'''
to:
* Textbooks:
July 31, 2014, at 02:03 PM EST by 64.106.39.101 -
Changed lines 5-6 from:
* Class Time: '''MWF 9:00-9:50 AM'''
* Building and Room: '''CEC 1026'''
to:
* Class Time: '''MW 10:30-11:15 AM'''
* Building and Room: '''TBD'''
Added lines 10-19:

->'''[[http://infolab.stanford.edu/~ullman/mmds/book.pdf | Mining of Massive Datasets]]'''
--> by Anand Rajaraman and Jeffrey David Ullman
-->Publication Date: December 30, 2011 | ISBN-10: 1107015359 | ISBN-13: 978-1107015357
->'''[[http://lintool.github.io/MapReduceAlgorithms/index.html | Data-Intensive Text Processing with MapReduce]]'''
-->by Jimmy Lin and Chris Dyer
-->Morgan & Claypool Publishers, 2010.
->'''Hadoop Real World Solutions Cookbook'''
-->by Jonathan R. Owens, Brian Femiano, and Jon Lentz
-->Publication Date: February 7, 2013 | ISBN-10: 1849519129 | ISBN-13: 978-1849519120
Changed lines 9-10 from:
* Facebook group: https://www.facebook.com/groups/207533772733004/
to:
* Textbook: '''[[http://infolab.stanford.edu/~ullman/mmds/book.pdf | Mining of Massive Datasets]]'''
Changed lines 80-92 from:
Participation accounts for 15% of your final grade and won't be given for granted. You are required to participate either in class or electronically (through our %target=_blank%[[https://www.facebook.com/groups/207533772733004/ |facebook group]]).

!!! Facebook group:

In order to facilitate interaction between students and to promote a broader participation, I created a %target=_blank%[[https://www.facebook.com/groups/207533772733004/|Facebook group]]. This is a discussion forum for the class and members are expected to conduct themselves with respect by posting comments and replies only in the context of the course. Use the FB group to ask general questions about the homework, exams, and lectures. You can also paste small snippets of code to clarify an idea. Students are encouraged to answer each others questions. Recall that your thoughtful participation in this forum accounts through your final grade.

If you don't have a facebook account don't worry, you can gain your participation points during class. This is more an experiment to encourage interaction, spark discussion, and identify possible misconceptions. Additionally, if I identify an interesting conversation happening in FB, I will post it in this Web site to make sure that everybody can benefit from it.

Please be aware that the social media interaction between me (the instructor) and students is only limited to this FB group and under no circumstance I am allowed to befriend students directly.




to:
Participation accounts for 15% of your final grade and will not be awarded automatically. You are required to participate either in class or electronically (through Piazza).

!!! Piazza:

In order to facilitate interaction between students and to promote broader participation, I created a %target=_blank%[[https://piazza.com|Piazza group]]. This is a discussion forum for the class, and members are expected to conduct themselves with respect by posting comments and replies only in the context of the course. Use the Piazza group to ask general questions about the homework, exams, and lectures. You can also paste small snippets of code to clarify an idea. Students are encouraged to answer each other's questions. Recall that your thoughtful participation in this forum counts toward your final grade.



Changed lines 91-92 from:
I value student's opinions regarding the course and I will take them in consideration to make this course as exciting and engaging as possible. Thus, through the semester I will ask students formal and informal feedback. Formal feedback includes short surveys on my teaching effectiveness, preferred teaching methods, and pace of the class. Informal feedback will be in the form of FB polls or in-class questions regarding learning preferences. You can also leave anonymous feedback in the form of a note in my departmental mail box, or %target=_blank%[[https://docs.google.com/spreadsheet/viewform?fromEmail=true&formkey=dHZvUElpWXJsTXV6RTAwZGlCTkYtQVE6MQ | using this form]]. Remember that it is in the best interest of the class if you bring up to my attention if something is not working properly (e.g the pace of the class is too slow, the projects are boring, my teaching style is not effective) so that I can make the corrective steps.
to:
I value students' opinions regarding the course and I will take them into consideration to make this course as exciting and engaging as possible. Thus, throughout the semester I will ask students for formal and informal feedback. Formal feedback includes short surveys on my teaching effectiveness, preferred teaching methods, and pace of the class. Informal feedback will be in the form of polls or in-class questions regarding learning preferences. You can also leave anonymous feedback in the form of a note in my departmental mailbox, or %target=_blank%[[https://docs.google.com/spreadsheet/viewform?fromEmail=true&formkey=dHZvUElpWXJsTXV6RTAwZGlCTkYtQVE6MQ | using this form]]. Remember that it is in the best interest of the class that you bring to my attention anything that is not working properly (e.g., the pace of the class is too slow, the projects are boring, my teaching style is not effective) so that I can take corrective steps.
Changed line 111 from:
!! SPECIAL ACCOMODATIONS
to:
!! SPECIAL ACCOMMODATIONS
Changed lines 205-207 from:
to:
!! Poster session
* [[Poster session guidelines]]

Changed lines 197-205 from:
to:
!! Week 14
# BigTable, Hive, and Pig
** [[http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf | BigTable]]
** [[http://www.cs.rutgers.edu/~pxk/417/notes/content/bigtable.html | BigTable from Rutgers]]
** [[https://cwiki.apache.org/confluence/display/Hive/Tutorial | Hive tutorial]]
** [[http://pig.apache.org/docs/r0.7.0/tutorial.html | Pig tutorial from Apache]]
** [[http://hortonworks.com/hadoop-tutorial/how-to-process-data-with-apache-pig/ | How to process data with Apache pig]]

Added lines 191-197:
!!! Week 13
# CUDA
** [[http://www.nvidia.com/content/GTC-2010/pdfs/2131_GTC2010.pdf | CUDA slides]]
** [[http://devblogs.nvidia.com/parallelforall/easy-introduction-cuda-c-and-c/ | Easy introduction]]
** [[http://www.techpowerup.com/119073/nvidia-cuda-emulator-for-every-pc.html | CUDA emulator]]

Changed lines 183-184 from:
**[[http://www.ibm.com/developerworks/java/library/j-mahout/ } Very good Mahout tutorial]]
to:
**[[http://www.ibm.com/developerworks/java/library/j-mahout/ | Very good Mahout tutorial]]
**[[http://girlincomputerscience.blogspot.com/2010/11/apache-mahout.html | Another Mahout tutorial]]
Changed lines 182-185 from:
to:
# Mahout
**[[http://www.ibm.com/developerworks/java/library/j-mahout/ } Very good Mahout tutorial]]
**[[http://www.slideshare.net/Cataldo/tutoria-mahout-recommendation | Mahout slides]]

Changed lines 162-163 from:
!!! Week 8
# Mining data streams
to:

!! Week 8
# Association rules 1
# Association rules 2

!! Week 9
# Page rank
# Page rank 2

!! Week 10
# Recommender systems

!! Week 11
# Cloud computing
**[[http://cs.sfsu.edu/ccls/cloud/Amazon_EC2_Tutorial.pdf | EC2 tutorial from SFSU]]
**[[http://aws.amazon.com/documentation/ec2/ | Amazon EC2 documentation]]
**[[http://d36cz9buwru1tt.cloudfront.net/AWS_Cloud_Best_Practices.pdf | Architecting for the cloud: best practices]]
# Hadoop ecosystem
**[[http://www.revelytix.com/?q=content/hadoop-ecosystem | Hadoop ecosystem]]
**[[http://cloudera.com/content/cloudera/en/training/library/apache-hadoop-ecosystem.html | Cloudera videos on Hadoop ecosystem]]

!!! Week 12
# [[Challenge 2]]
# Out of town: work on your projects

!! Later
Changed lines 159-162 from:
!!! Week 7
to:
!! Week 7
# LSH
# [[Visit to CARC and use of the Galles cluster]]
!!! Week 8
Changed lines 151-152 from:
# '''[[1st challenge]]''' due Oct 27th
to:
# '''[[1st challenge]]''' due Sep 27th
Changed lines 151-152 from:
# [[1st challenge]]
to:
# '''[[1st challenge]]''' due Oct 27th
Changed lines 151-152 from:
# Visit to CARC and Hadoop on demand
to:
# [[1st challenge]]
Changed lines 146-148 from:
# Hadoop installation
# Presentation of related work
to:
# [[http://www.cs.unm.edu/~estrada/files/07-hadoop.zip| Hadoop installation]]
Added lines 149-153:
# [[http://www.cs.unm.edu/~estrada/files/08-intro_to_hadoop.pdf | Introduction to Hadoop]]
# [[http://www.cs.unm.edu/~estrada/files/09-map-reduce-algorithms.pdf | MapReduce algorithms]]
# Visit to CARC and Hadoop on demand

!!! Week 6
Changed line 159 from:
!!! Week 6
to:
!!! Week 7
Changed line 136 from:
# NoSQL databases
to:
# [[http://www.cs.unm.edu/~estrada/files/05-nosql.pdf | NoSQL databases and MongoDB]]
Changed lines 137-143 from:
## Suggested readings; ]]
***[[http://research.ijais.org/volume5/number4/ijais12-450888.pdf | Comparison between SQL and NoSQL DBs]]
***[[http://www.infoq.com/articles/mongodb-java-php-python | MongoDB for java, php, and python developers]]
***[[http://api.mongodb.org/wiki/current/Tutorial.html | MongoDB tutorial]]
***[[http://martinfowler.com/articles/nosql-intro-original.pdf | Polyglot Persistence]]
***[[http://readwrite.com/2009/02/12/is-the-relational-database-doomed#awesm=~ogp7vsMQNIgj8k | Is the relational DB doomed]]
to:
** [[http://research.ijais.org/volume5/number4/ijais12-450888.pdf | Comparison between SQL and NoSQL DBs]]
**[[http://www.infoq.com/articles/mongodb-java-php-python | MongoDB for java, php, and python developers]]
**[[http://api.mongodb.org/wiki/current/Tutorial.html | MongoDB tutorial]]
**[[http://martinfowler.com/articles/nosql-intro-original.pdf | Polyglot Persistence]]
**[[http://readwrite.com/2009/02/12/is-the-relational-database-doomed#awesm=~ogp7vsMQNIgj8k | Is the relational DB doomed]]
Changed lines 137-138 from:
## Suggested readings; 
*** [[http://readwrite.com/2013/03/25/when-nosql-databases-are-good-for-you#awesm=~oey6QhweaxIfj0 | When NoSQL Databases Are — Yes — Good For You And Your Company]]
to:
## Suggested readings; ]]
Changed lines 139-143 from:
to:
***[[http://www.infoq.com/articles/mongodb-java-php-python | MongoDB for java, php, and python developers]]
***[[http://api.mongodb.org/wiki/current/Tutorial.html | MongoDB tutorial]]
***[[http://martinfowler.com/articles/nosql-intro-original.pdf | Polyglot Persistence]]
***[[http://readwrite.com/2009/02/12/is-the-relational-database-doomed#awesm=~ogp7vsMQNIgj8k | Is the relational DB doomed]]

Changed line 158 from:
## Suggested readings [[http://infolab.stanford.edu/~ullman/mmds/ch4.pdf | Mining of Massive Datasets Ch 4]]
to:
## Suggested readings [[http://infolab.stanford.edu/~ullman/mmds/ch4.pdf | Mining of Massive Datasets Ch 4]]
Changed lines 137-138 from:
## Suggested readings; [[http://readwrite.com/2013/03/25/when-nosql-databases-are-good-for-you#awesm=~oey6QhweaxIfj0 | When NoSQL Databases Are — Yes — Good For You And Your Company ]]
to:
## Suggested readings;
*** [[http://readwrite.com/2013/03/25/when-nosql-databases-are-good-for-you#awesm=~oey6QhweaxIfj0 | When NoSQL Databases Are — Yes — Good For You And Your Company ]]
*** [[http://research.ijais.org/volume5/number4/ijais12-450888.pdf | Comparison between SQL and NoSQL DBs]]
Deleted line 155:
Changed lines 126-127 from:
to:
## [[Suggested datasets]]
Changed lines 135-153 from:
# NoSQL databases
to:
# NoSQL databases
## Suggested readings; [[http://readwrite.com/2013/03/25/when-nosql-databases-are-good-for-you#awesm=~oey6QhweaxIfj0 | When NoSQL Databases Are — Yes — Good For You And Your Company ]]

!!! Week 4
#  MapReduce
## Suggested readings [[http://infolab.stanford.edu/~ullman/mmds/ch2.pdf | Mining of Massive Datasets Ch 2]]
# Hadoop installation
# Presentation of related work

!!! Week 5
# Finding similar items
## Suggested readings [[http://infolab.stanford.edu/~ullman/mmds/ch3.pdf | Mining of Massive Datasets Ch 3]]
# Exercises with hadoop
# Student presentations

!!! Week 6
# Mining data streams
## Suggested readings [[http://infolab.stanford.edu/~ullman/mmds/ch4.pdf | Mining of Massive Datasets Ch 4]]

Added line 129:
## Suggested readings: [[http://www.visualisingdata.com/index.php/resources/ | Important tools for visualising and communicating data]]
Changed lines 122-123 from:

TBD
to:
!!! Week 1
# Introduction
# [[http://www.cs.unm.edu/~estrada/files/01-the-learning-problem.pdf | The learning problem]]
# [[http://prezi.com/e2xnc3-nrbja/applications-of-big-data/ | Big Data applications]]

!!! Week 2
# [[http://www.cs.unm.edu/~estrada/files/03-visualization.pdf | Introduction to visualization]]
# [[http://www.cs.unm.edu/~estrada/files/04-visualization/ | Visualization with R]]
# Informal presentation of projects and [[http://www.cs.unm.edu/~estrada/files/04-visualization/ | visualization with Google charts]]

!!! Week 3
# Database evolution
# NoSQL databases
Changed lines 7-8 from:
* Prerequisites: '''Fluent in at least one of the following programming languages: Python, Java, C, or Matlab'''
* Sakai?:
to:
* Prerequisites: Fluent in at least one of the following programming languages: '''Python, Java, C, or Matlab'''
* UNM Learn: CS-591-001 (Fall 2013)
Changed lines 6-7 from:
* Building and Room: '''TBD'''
* Prerequisites: '''TBD'''
to:
* Building and Room: '''CEC 1026'''
* Prerequisites: '''Fluent in at least one of the following programming languages: Python, Java, C, or Matlab'''
Changed lines 14-18 from:
* Office: ''' TBD '''
* Office hours: '''MW 10:00-11:00 AM'''


to:
* Office: ''' FEC 325 '''
* Office hours: '''M 10:00-11:00 AM''' and '''T 9:00-11:00 AM'''


Changed line 13 from:
* Email: '''TBD'''
to:
* Email: '''estrada@cs.unm.edu'''
Changed lines 25-27 from:
This course explores key data analysis and management techniques, which applied to massive datasets are the cornerstone that enables real-time decision making in distributed environments, business intelligence in the Web, and scientific discovery at large scale. In particular, we examine the map-reduce parallel computing paradigm and associated technologies such as distributed file systems, no-SQL databases, and stream computing engines. Additionally we review machine learning  methods that make possible the efficient analysis of large volumes of data in near real time. Finally, this course is highly interactive and based on the problem-based learning philosophy; students are expected to make use of said technologies to design highly scalable systems that can process and analyze Big Data for a variety of scientific, social, and environmental challenges.

to:
In this course  we explore key data analysis and management techniques, which applied to massive datasets are the cornerstone that enables real-time decision making in distributed environments, business intelligence in the Web, and scientific discovery at large scale. In particular, we examine the map-reduce parallel computing paradigm and associated technologies such as distributed file systems, no-SQL databases, and stream computing engines. Additionally we review machine learning  methods that make possible the efficient analysis of large volumes of data in near real time.

This course is highly interactive and based on the problem-based learning philosophy; students are expected to make use of said technologies to design highly scalable systems that can process and analyze Big Data for a variety of scientific, social, and environmental challenges.

July 08, 2013, at 10:10 PM EST by 173.75.227.68 -
Changed lines 25-27 from:
This course explores key data analysis and management techniques, which applied to massive datasets are the cornerstone that enables real-time decision making in distributed environments, business intelligence in the Web, and scientific discovery at large scale. In particular, we examine the map-reduce parallel computing paradigm and associated technologies such as distributed file systems, no-SQL databases, and stream computing engines. In the second section of the course we review machine learning  methods that make possible the efficient analysis of large volumes of data in near real time. Finally, in the third section of the course students are expected to make use of said technologies to design highly scalable systems that can process and analyze Big Data for a variety of scientific, social, and environmental challenges.

to:
This course explores key data analysis and management techniques, which applied to massive datasets are the cornerstone that enables real-time decision making in distributed environments, business intelligence in the Web, and scientific discovery at large scale. In particular, we examine the map-reduce parallel computing paradigm and associated technologies such as distributed file systems, no-SQL databases, and stream computing engines. Additionally we review machine learning  methods that make possible the efficient analysis of large volumes of data in near real time. Finally, this course is highly interactive and based on the problem-based learning philosophy; students are expected to make use of said technologies to design highly scalable systems that can process and analyze Big Data for a variety of scientific, social, and environmental challenges.

Added lines 42-59:
------

!! PROJECTS:

Projects are the most important learning tool of this class. There will be [[small projects | monthly challenges]] assigned by the instructor during the semester, and one final project defined by the student.

!!! Challenges

This course is designed to be a hands-on learning experience. I believe that students learn better by doing. Thus, by providing concrete, practical experience, I expect that students will be better prepared to apply their new knowledge to real-life, data-intensive research situations.

As part of this philosophy, there will be monthly Big Data challenges. Every month, a challenge will be released to the students who will compete with each other to design and implement the best solution. Full credit will be obtained regardless of the particular rank of a student's solution. The goal of these challenges is to expose the student to the use of learning algorithms and infrastructure used by Big Data technologies. Released problems will reflect as much as possible real challenges in fields such as astronomy, bioinformatics, and analysis of social media.

!!! Final project

The final project is entirely at the discretion of the student (upon instructor approval). Students are free to explore a problem of their interest and propose their own solution. During the course we will hold weekly brainstorming sessions to discuss and strengthen every proposed project.

Projects will be done individually. You must turn in only code written by you. Under no circumstances should you use code downloaded from the Internet, since this violation will result in serious penalties.

Changed lines 66-79 from:
Attendance to class is expected and note taking encouraged. Important information (about assignments, projects, policies) may be communicated only in the lectures. We may also cover additional material (not available in the book) during the lecture. If you miss a lecture, you should find what material was covered and if any announcement was made.


!!! Projects:

Projects are the most important learning tool of this class. There will be three [[small projects]] assigned by the instructor during the semester, and one final project defined by the student.

The 3 small projects will expose the student to the use of learning algorithms and infrastructure used by Big Data technologies. These projects will reflect as much as possible real challenges in scientific fields, such as astronomy, bioinformatics, and analysis of social media.

The final project is entirely to the discretion of the student (upon instructor approval). Students would be free to explore a problem of their interest and propose their own solution. During the course we will hold weekly brainstorming sessions to discuss and strengthen every proposed project.

Projects will be done individually. You must turn in only code written by you. Under no circumstance you should use code downloaded from Internet since this violation will result in serious penalties.


to:
Class attendance is expected and note taking is encouraged. Important information (about assignments, projects, policies) may be communicated only in the lectures. We may also cover additional material (not available in the notes) during the lecture. If you miss a lecture, you should find out what material was covered and whether any announcements were made.

Changed lines 99-101 from:
* '''Project 1''' 15 pts
* '''Project 2''' 15 pts
* '''Project 3''' 15 pts
to:
* '''Challenge 1''' 15 pts
* '''Challenge 2''' 15 pts
* '''Challenge 3''' 15 pts
July 02, 2013, at 01:47 PM EST by 128.4.236.117 -
Changed lines 55-56 from:
The 3 small projects will expose the student to the use of algorithms and infrastructure used by Big Data technologies. These projects will reflect as much as possible real challenges in scientific fields, such as astronomy, bioinformatics, and analysis of social media.
to:
The 3 small projects will expose the student to the use of learning algorithms and infrastructure used by Big Data technologies. These projects will reflect as much as possible real challenges in scientific fields, such as astronomy, bioinformatics, and analysis of social media.
July 02, 2013, at 01:44 PM EST by 128.4.236.117 -
Changed lines 96-97 from:
to:
* '''Extra credit''' 5 pts
July 02, 2013, at 01:42 PM EST by 128.4.236.117 -
Changed lines 91-96 from:
* [[Participation]] 15 pts
* [[Project 1]] 15 pts
* [[Project 2]] 15 pts
* [[Project 3]] 15 pts
* [[Final project]] 40 pts
to:
* '''Participation''' 15 pts
* '''Project 1''' 15 pts
* '''Project 2''' 15 pts
* '''Project 3''' 15 pts
* '''Final project''' 40 pts
July 02, 2013, at 01:41 PM EST by 128.4.236.117 -
Changed lines 71-72 from:
Participation accounts for 15% of your final grade and will not be awarded automatically. You are required to participate either in class or electronically (through our %target=_blank%[[https:// |Facebook group]]).
to:
Participation accounts for 15% of your final grade and will not be awarded automatically. You are required to participate either in class or electronically (through our %target=_blank%[[https://www.facebook.com/groups/207533772733004/ |Facebook group]]).
July 02, 2013, at 01:41 PM EST by 128.4.236.117 -
Changed lines 9-10 from:
* Facebook group:
to:
* Facebook group: https://www.facebook.com/groups/207533772733004/
July 02, 2013, at 01:40 PM EST by 128.4.236.117 -
Changed lines 75-76 from:
In order to facilitate interaction between students and to promote broader participation, I created a %target=_blank%[[https://|Facebook group]]. This is a discussion forum for the class, and members are expected to conduct themselves with respect by posting comments and replies only in the context of the course. Use the FB group to ask general questions about the homework, exams, and lectures. You can also paste small snippets of code to clarify an idea. Students are encouraged to answer each other's questions. Remember that your thoughtful participation in this forum counts toward your final grade.
to:
In order to facilitate interaction between students and to promote broader participation, I created a %target=_blank%[[https://www.facebook.com/groups/207533772733004/|Facebook group]]. This is a discussion forum for the class, and members are expected to conduct themselves with respect by posting comments and replies only in the context of the course. Use the FB group to ask general questions about the homework, exams, and lectures. You can also paste small snippets of code to clarify an idea. Students are encouraged to answer each other's questions. Remember that your thoughtful participation in this forum counts toward your final grade.
July 02, 2013, at 01:36 PM EST by 128.4.236.117 -
Changed lines 15-18 from:
* Office hours: '''TBD'''


to:
* Office hours: '''MW 10:00-11:00 AM'''


July 02, 2013, at 01:23 PM EST by 128.4.236.117 -
Changed line 5 from:
* Class Time: '''TBD'''
to:
* Class Time: '''MWF 9:00-9:50 AM'''
July 02, 2013, at 11:55 AM EST by 128.4.236.117 -
Changed lines 62-63 from:

to:
!!! Exams

There won't be exams for this course


July 02, 2013, at 11:49 AM EST by 128.4.236.117 -
Added lines 55-58:
The 3 small projects will expose the student to the use of algorithms and infrastructure used by Big Data technologies. These projects will reflect as much as possible real challenges in scientific fields, such as astronomy, bioinformatics, and analysis of social media.

The final project is entirely at the discretion of the student (subject to instructor approval). Students are free to explore a problem that interests them and propose their own solution. During the course we will hold weekly brainstorming sessions to discuss and strengthen every proposed project.

June 21, 2013, at 11:21 AM EST by 128.4.236.117 -
Changed lines 1-2 from:
!! '''Note that this syllabus is under construction'''
to:
!! '''Please note that this syllabus is under construction'''
June 21, 2013, at 11:20 AM EST by 128.4.236.117 -
Changed lines 1-2 from:
!! '''Note that this syllabus is under construction''''
to:
!! '''Note that this syllabus is under construction'''
June 21, 2013, at 11:20 AM EST by 128.4.236.117 -
Changed lines 1-3 from:

'''Note that this syllabus is under construction''''
to:
!! '''Note that this syllabus is under construction''''
June 21, 2013, at 11:16 AM EST by 128.4.236.117 -
Deleted line 1:
Changed lines 18-21 from:
If you have problems understanding the course material or you have questions regarding exams or grades, you can always ask through email, come to my office hours, or make an appointment with me.


to:

June 21, 2013, at 11:14 AM EST by 128.4.236.117 -
Changed lines 1-2 from:
! Introduction to Big Data Fall of 2013
to:

June 21, 2013, at 11:11 AM EST by 128.4.236.117 -
Changed lines 89-92 from:
* [[Homework]] 30 pts
* [[Projects]] 30 pts
* [[Exams]] 25 pts
to:
* [[Project 1]] 15 pts
* [[Project 2]] 15 pts
* [[Project 3]] 15 pts
* [[Final project]] 40 pts
June 21, 2013, at 11:06 AM EST by 128.4.236.117 -
Added lines 3-4:
'''Note that this syllabus is under construction''''
June 21, 2013, at 11:05 AM EST by 128.4.236.117 -
Changed lines 1-2 from:
!! Course description:
to:
! Introduction to Big Data Fall of 2013

!! COURSE INFORMATION

* Class Time: '''TBD'''
* Building and Room: '''TBD'''
* Prerequisites: '''TBD'''
* Sakai?:
* Facebook group:

!!! Instructor
* '''Trilce Estrada''', Assistant Professor
* Email: '''TBD'''
* Office: ''' TBD '''
* Office hours: '''TBD'''

If you have problems understanding the course material or you have questions regarding exams or grades, you can always ask through email, come to my office hours, or make an appointment with me.



-------

!! COURSE DESCRIPTION:

Changed lines 30-31 from:
!! Core topics:
to:
!!! Core topics:
Changed lines 39-41 from:
!! Course objectives:

At the end of this course, the student will become familiar with the fundamental concepts of Big Data management and analytics; will become competent in recognizing challenges faced by applications dealing with very large volumes of data as well as in proposing scalable solutions for them; and will be able to understand how Big Data impacts business intelligence, scientific discovery, and our day-to-day life.
to:
!!! Course objectives:

At the end of this course, the student will become familiar with the fundamental concepts of Big Data management and analytics; will become competent in recognizing challenges faced by applications dealing with very large volumes of data as well as in proposing scalable solutions for them; and will be able to understand how Big Data impacts business intelligence, scientific discovery, and our day-to-day life.


---------
!! POLICIES


!!! Class attendance:

Class attendance is expected and note taking is encouraged. Important information (about assignments, projects, policies) may be communicated only in the lectures. We may also cover additional material (not available in the book) during the lecture. If you miss a lecture, you should find out what material was covered and whether any announcements were made.


!!! Projects:

Projects are the most important learning tool of this class. There will be three [[small projects]] assigned by the instructor during the semester, and one final project defined by the student.

Projects will be done individually. You must turn in only code written by you. Under no circumstances should you use code downloaded from the Internet; doing so is a violation that will result in serious penalties.




!!! Participation

Participation is the barometer of the class. It lets me determine whether the pace of the course is too fast or too slow, it helps me spot pitfalls and misconceptions, and it helps you reinforce the material you learned.

Participation accounts for 15% of your final grade and will not be awarded automatically. You are required to participate either in class or electronically (through our %target=_blank%[[https:// |Facebook group]]).

!!! Facebook group:

In order to facilitate interaction between students and to promote broader participation, I created a %target=_blank%[[https://|Facebook group]]. This is a discussion forum for the class, and members are expected to conduct themselves with respect by posting comments and replies only in the context of the course. Use the FB group to ask general questions about the homework, exams, and lectures. You can also paste small snippets of code to clarify an idea. Students are encouraged to answer each other's questions. Remember that your thoughtful participation in this forum counts toward your final grade.

If you don't have a Facebook account, don't worry: you can earn your participation points during class. This is more of an experiment to encourage interaction, spark discussion, and identify possible misconceptions. Additionally, if I identify an interesting conversation happening on FB, I will post it on this Web site to make sure that everybody can benefit from it.

Please be aware that social media interaction between me (the instructor) and students is limited to this FB group, and under no circumstances am I allowed to befriend students directly.




!!! Feedback:

I value students' opinions regarding the course and I will take them into consideration to make this course as exciting and engaging as possible. Thus, throughout the semester I will ask students for formal and informal feedback. Formal feedback includes short surveys on my teaching effectiveness, preferred teaching methods, and pace of the class. Informal feedback will be in the form of FB polls or in-class questions regarding learning preferences. You can also leave anonymous feedback in the form of a note in my departmental mailbox, or %target=_blank%[[https://docs.google.com/spreadsheet/viewform?fromEmail=true&formkey=dHZvUElpWXJsTXV6RTAwZGlCTkYtQVE6MQ | using this form]]. Remember that it is in the best interest of the class that you bring it to my attention if something is not working properly (e.g., the pace of the class is too slow, the projects are boring, my teaching style is not effective) so that I can take corrective steps.

----------

!! GRADING
* [[Participation]] 15 pts
* [[Homework]] 30 pts
* [[Projects]] 30 pts
* [[Exams]] 25 pts

Grades will be based on your earned points, following this grade scale. You need to get the specified number of points or more to obtain the grade from the same column. Scores will be rounded to the closest integer value.
[@
Grade:       A   A-  B+  B   B-  C+  C   C-  D+  D   D-  F
Min. points: 95  90  87  83  80  77  73  70  67  63  60  <60
@]
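
As an illustration only, the scale above can be read as a lookup from rounded points to a letter. The sketch below is not an official grading tool: the function name @@letter_grade@@ is hypothetical, and it assumes that a score ending in exactly .5 rounds up, which the syllabus does not specify.

```python
import math

# Thresholds copied from the grade scale above: (minimum points, letter).
GRADE_SCALE = [
    (95, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"), (67, "D+"), (63, "D"), (60, "D-"),
]

def letter_grade(points):
    """Return the letter grade for a point total (hypothetical helper)."""
    # Round to the closest integer; assumption: exact halves round up.
    rounded = math.floor(points + 0.5)
    # The scale is sorted from highest to lowest threshold, so the first
    # threshold that the rounded score reaches determines the letter.
    for minimum, letter in GRADE_SCALE:
        if rounded >= minimum:
            return letter
    return "F"  # anything below 60 points
```

For example, under these assumptions a total of 89.6 points rounds to 90 and would map to an A-.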

* Incomplete can be assigned only for a documented medical reason

!! SPECIAL ACCOMMODATIONS
If you need special accommodations or assistance, please contact the Accessibility Resource Center (%target=_blank%http://as2.unm.edu/)

----------

!! SCHEDULE


TBD
June 21, 2013, at 10:51 AM EST by 128.4.236.117 -
Added lines 1-19:
!! Course description:

The field of computer science is experiencing a transition from computation-intensive to data-intensive problems, wherein data is produced in massive amounts by large sensor networks, new data acquisition techniques, simulations, and social networks. Efficiently extracting, interpreting, and learning from very large datasets requires a new generation of scalable algorithms as well as new data management technologies.

This course explores key data analysis and management techniques which, applied to massive datasets, are the cornerstone that enables real-time decision making in distributed environments, business intelligence on the Web, and scientific discovery at large scale. In particular, we examine the map-reduce parallel computing paradigm and associated technologies such as distributed file systems, NoSQL databases, and stream computing engines. In the second section of the course we review machine learning methods that make possible the efficient analysis of large volumes of data in near real time. Finally, in the third section of the course students are expected to use these technologies to design highly scalable systems that can process and analyze Big Data for a variety of scientific, social, and environmental challenges.


!! Core topics:

* Large databases and their evolution.
* Big Data technology and trends, with special attention to the Map-Reduce paradigm.
* Searching, indexing, and their implications to memory management.
* Information extraction and feature selection.
* Supervised learning, unsupervised learning, and stream mining.


!! Course objectives:

At the end of this course, the student will become familiar with the fundamental concepts of Big Data management and analytics; will become competent in recognizing challenges faced by applications dealing with very large volumes of data as well as in proposing scalable solutions for them; and will be able to understand how Big Data impacts business intelligence, scientific discovery, and our day-to-day life.