Data Mining SENG 474 / CSC 578D


Course websites on Heat for SENG 474 and CSC 578D: there will be no further updates on these two sites. Up-to-date content will be posted on this site.

Tentative topics: <ul> <li>Finding similar items</li> <li>Frequent itemsets</li> <li>Classification</li> <li>Regression</li> <li>Clustering</li> <li>Recommender Systems</li> <li>Mining Social-Network Graphs</li> <li>Link Analysis</li> <li>Advertising on the Web</li> <li>A/B Testing</li> </ul>

Textbook: Class materials will be drawn mostly from the free (and great) book Mining of Massive Datasets (MMDS) by Jure Leskovec, Anand Rajaraman, and Jeff Ullman.

Teaching Staff


Instructor: Hung Le. <ul> <li>Email: hungle@uvic.ca</li> <li>Office: ECS 621</li> <li>Weekly Office Hours: Friday 10:30 am - 12:30 pm</li> </ul>

TAs:

  • Sajjad Azami (Email: sajjadaazami@gmail.com)
  • Cole Peterson (Email: colpeterson@gmail.com)
  • Jasbir Singh (Email: jasbircheema96@gmail.com)
  • Weekly Office Hours: Monday 11:00 am - 12:30 pm and Tuesday 1:30 pm - 3:00 pm, all in ECS 253

Announcements



The final exam solution is posted. Check it out here.

Practice problems for the final exam are posted. Check them out here.

Written Assignment 4 is posted. Check it out here. See also the solution written by Jasbir Singh.

Programming Assignment 4: The description for Programming Assignment 4 is online. Please check it out here. Ask me or the TAs if anything is unclear.

The midterm solution is posted. Check it out here.

The project guideline is posted. Check it out here.

Written Assignment 3 is released. Check it out here. See also the solution written by Jasbir Singh.

Programming Assignment 3: The description for Programming Assignment 3 is online. Please check it out here. Ask me or the TAs if anything is unclear.

Written Assignment 2 is released. Check it out here. See also the solution. The write-ups for Q2, Q4b, and Q5b are by Hung Le; the rest are by Sajjad Azami.

Programming Assignment 2: The description for Programming Assignment 2 is online. Please check it out here. Ask me or the TAs if anything is unclear.

Written Assignment 1 is released. Check it out here. See also the solution written by Jasbir Singh.

Programming Assignment 1: The description for Programming Assignment 1 is online. Please check it out here. Ask me or the TAs if anything is unclear. Here is a sample solution.

Lectures


  1. Tue 08/01: Introduction. See Chapter 1 of the MMDS book and my own notes. There are many interesting Big Data case studies on Wikipedia.

  2. Wed 09/01: Review of hashing. Guest lecture by Dr. Nishant Mehta. See this note by Jeff Erickson.

  3. Fri 11/01: Finding Similar Items. See Chapter 3 of the MMDS book and my own notes. See here for the Amazon.com recommendation paper mentioned in class. For similarity measures (or distances) besides the one we saw in class, check out here or there.

  4. Tue 15/01: Finding Similar Items (Continued). See the references above.

  5. Wed 16/01: Frequent Itemsets. See Chapter 6 of the MMDS book and my own notes. See also the diapers and beer story.

  6. Fri 18/01: Frequent Itemsets (Continued). See the references above.

  7. Tue 22/01: Frequent Itemsets (Continued). See the references above.

  8. Wed 23/01: Linear Regression. See Andrew Ng's notes and my own notes. For a review of linear algebra, see this note. See also Stochastic Gradient Descent Tricks by Léon Bottou.

  9. Fri 25/01: Linear Regression (Continued). See the references above.

  10. Tue 29/01: Linear Regression (Continued). See the references above.

  11. Wed 30/01: Support Vector Machine. See Chapter 12 of the MMDS book and my own notes.

  12. Fri 01/02: Support Vector Machine (Continued). See the references above.

  13. Tue 05/02: Link Analysis. See Chapter 5 of the MMDS book and my own notes. There are many good materials on link analysis from the Harvard Amazing project. See also the PageRank paper and the HITS paper.

  14. Wed 06/02: Link Analysis (Continued). See the references above.

  15. Fri 08/02: Link Analysis (Continued). See the references above.

  16. Tue 12/02: Class canceled due to bad weather; the university was shut down.

  17. Wed 13/02: Midterm.

  18. Fri 15/02: Link Analysis (Continued). See the references above.

  19. Tue 19/02: Reading break.

  20. Wed 20/02: Reading break.

  21. Fri 22/02: Reading break.

  22. Tue 26/02: Clustering. See Chapter 7 of the MMDS book and my own notes.

  23. Wed 27/02: Clustering (Continued). See the references above.

  24. Fri 01/03: Advertising on the Web. See Chapter 8 of the MMDS book and my own notes.

  25. Tue 05/03: Advertising on the Web (Continued). See the references above.

  26. Wed 06/03: Advertising on the Web (Continued). See the references above.

  27. Fri 08/03: Recommendation System. See Chapter 9 of the MMDS book and my own notes.

  28. Tue 12/03: Recommendation System (Continued). See the references above.

  29. Wed 13/03: Mining Social Network Graphs. See Chapter 10 of the MMDS book and my own notes.

  30. Fri 15/03: Mining Social Network Graphs (Continued). See the references above.

  31. Tue 19/03: Mining Social Network Graphs (Continued). See the references above.

  32. Wed 20/03: Mining Social Network Graphs (Continued). See the references above.

  33. Fri 22/03: Mining Social Network Graphs (Continued). See the references above.

  34. Tue 26/03: Dimensionality Reduction. See Chapter 11 of the MMDS book and my own notes.