Hung Le / Home Page

CMPSCI 611 : Advanced Algorithms

2023-04-10T00:00:00-07:00

Last Updated: August 27 2023.

Credit Hours: 3

Prerequisites: Students are expected to have mathematical maturity and knowledge of COMPSCI 311 or equivalent.

Teaching Staff:

Instructor: Hung Le.
- Email: hungle@cs.umass.edu
- Office: 332 CS Building
- Office hours: Tuesday 11:00am-12:00pm, and Thursday 4:00pm-5:00pm
TAs:
- An La, email: anla@umass.edu, office hours: 4:00pm-5:00pm, Friday.
- Hasnain Heickal, email: hheickal@cs.umass.edu, office hours: 10:00am - 11:00am, Wed.
- Ojaswi Acharya, email: oacharya@umass.edu, office hours: 1:30 pm -2:30 pm Monday.
Graders:
- Snigdha Viswanathan, email: sviswanathan@umass.edu
- Sai Vineeth Kumar Dara, email: sdara@umass.edu
- Suraj Jain, email: surajjain@umass.edu
- Edward Annatone, email: eannatone@umass.edu
- Om Prakash Prajapath, email: oprajapath@umass.edu
- Aditi Baskar, email: abaskar@umass.edu

Class Meetings: Tue and Thu, from 2:30pm-3:45pm.

Location: Hasbrouck Lab Room 134

Objectives: This course provides students with skills in designing efficient algorithms. We will go through a variety of algorithm design techniques, including greedy, divide and conquer, dynamic programming, network flow, linear programming, randomized algorithms, and approximation algorithms. We will illustrate these design techniques in solving different algorithmic problems. The emphasis of this course is on the mathematical aspects of designing algorithms.

Learning Outcomes: After completing this course, students are expected to be able to formulate an algorithmic problem, design an algorithm for the problem, prove the correctness, and analyze the running time.

Required Textbook: Lectures will be based on Jeff Erickson's notes. Slides will be posted on Moodle.

Optional Textbooks:

Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein.
Algorithm Design by Kleinberg and Tardos (KT).
Algorithms by Dasgupta, Papadimitriou, Vazirani (DPV).
Randomized Algorithms by Motwani and Raghavan (MR).
Probability and Computing by Mitzenmacher and Upfal (MU).
Approximation Algorithms by Vazirani.

Tentative topics:

Divide and Conquer (2 lectures)
Dynamic Programming (2 lectures)
Greedy Algorithms (3 lectures)
Randomized Algorithms (2 lectures)
Network Flow (3 lectures)
Linear Programming (3 lectures)
NP-Completeness (2 lectures)
Approximation Algorithms (3 lectures)

Schedule:

Date	Topics	Readings
05 Sep	Intro, Master theorem, Mergesort	Erickson’s note on recursion
07 Sep	Closest Pair, Matrix Multiplication	DPV’s chapter 2
12 Sep	Problem Solving Session
14 Sep	Intro Greedy, Job Scheduling	Erickson’s note on greedy algs
19 Sep	Minimum Spanning Tree	Erickson’s note on MST
21 Sep	Matroid	Erickson’s note on matroid
26 Sep	Subset Sum, Optimal BST	Erickson’s note on DP
28 Sep	SSSP and TSP	Erickson’s note on SSSP and APSP
03 Oct	Problem Solving Session
05 Oct	Balls and Bins	Erickson’s note on Hashing
12 Oct	Midterm 1	Covering D&C, DP, and Greedy
17 Oct	Bloom Filter	Erickson’s note on filtering and streaming
19 Oct	Randomized Mincut	Erickson’s note on randomized mincut
24 Oct	Maxflow-Mincut	Erickson’s note on Maxflow
26 Oct	Maxflow in Strongly PolyTime	Erickson’s note on Maxflow
31 Oct	Applications of Maxflow	Erickson’s note on Applications of Maxflow
02 Nov	Problem Solving Session
07 Nov	Introduction to Linear Programming	Erickson’s note on LP
09 Nov	LP Duality	Erickson’s note on LP
14 Nov	P vs NP	Erickson’s note on NP-hardness
16 Nov	Midterm 2	Covering Randomized Algorithms, Maxflow, and LP
21 Nov	NP-complete Problems	Erickson’s note on NP-hardness
28 Nov	Vertex Cover, Set Cover	Erickson’s note on approximation algorithms
30 Nov	TSP	Erickson’s note on approximation algorithms
05 Dec	Problem Solving Session
07 Dec	Review
14 Dec	Final exam from 3:30 PM - 5:30 PM in the classroom	Covering everything

Grading

Homework (40%): Homework is bi-weekly and includes 6 assignments. The lowest assignment will be dropped.
Weekly Quizzes (8%): We will have 11 quizzes total, and the lowest quiz will be dropped.
Attendance (2%).
Midterms 1 + 2 (30%): Your higher midterm score will count for 20% and your lower midterm score for 10%.
Final (20%): Scheduled by the university and will be comprehensive.

Grading Scale: A (100-90), A- (89-84), B+ (83-78), B (77-72), B- (71-66), C+ (65-60), C (59-54), F (53-0)

Late Policy: You have one late day to use on any HW of your choice. For other HWs, each hour late, up to 24 hours, incurs a 2-point penalty. Submissions more than 24 hours late will not be graded unless you have a good medical reason. Try your best to honor the deadlines.

Exam Make-up Policies: If you have an exam conflict with another class, you should contact the University Registrar’s Office. If you cannot attend the exam for a medical reason, please notify the instructor at least one week before the exam. If you have a medical emergency, contact the instructor as soon as possible. You need to provide documentation of the medical reason.

Platforms: We will use Moodle for general logistics, Campuswire for discussion, and Gradescope for homework assignments.

Communication Policy: Questions regarding homework assignments/class materials should be posted on Campuswire. All questions will be answered within 24 hours, except over weekends. Other questions should be sent by email to the instructor and/or TAs.

Posting Policy: You are not allowed to post any material in this course to public websites without the permission of the instructor.

Academic Honesty and Collaboration Policy:

You must do exams and quizzes on your own. No collaboration is allowed.
You may collaborate with at most 2 other students on homework. You must name everyone you collaborated with in your submissions. Collaboration is verbal only. The write-up must be your own. You are NOT allowed to talk about the homework with anyone outside your group (except TAs and the instructor). You are NOT allowed to consult any material on the Internet to do your homework.
You are allowed to bring at most 2 A4 pages of cheat sheets to the exams. NO other materials are allowed.
DO ask if you have any questions regarding academic honesty.

As members of the College of Information and Computer Sciences at UMass Amherst, we expect everyone to behave responsibly and honorably. In particular, we expect each of you not to give, receive, or use aid in examinations, nor to give, receive, or use unpermitted aid in any academic work. Doing your part in observing this code, and ensuring that others do likewise is essential for having a community of respect, integrity, fairness, and trust. If you cheat in a course, you are taking away from your own opportunity to learn and develop as a professional. You also hurt your colleagues, and this will hurt people you will work with in the future, who expect an honest and responsible professional.

As faculty, we pledge to use academic policies designed for fairness, avoiding situations that are conducive to violating academic honesty, as well as unreasonable or unusual procedures that assume dishonesty. We will follow the university’s Academic Honesty Policy and Procedures. This means we will report instances of dishonesty, which may lead to formal sanction and/or failing the course.

Attendance Policies: Attendance is not optional. If you do not attend a lecture, you are responsible for learning the material covered in that lecture yourself. A small percentage of the grade is awarded for attending lectures.

Accommodations for Disabilities: The University of Massachusetts Amherst is committed to providing an equal educational opportunity for all students. If you have a documented physical, psychological, or learning disability on file with Disability Services (DS), you may be eligible for reasonable academic accommodations to help you succeed in this course. If you have a documented disability that requires an accommodation, please notify the instructor within the first two weeks of the semester so that we can make appropriate arrangements. For more information, consult the Disability Services website at https://www.umass.edu/disability/.

Equity and Inclusion Statement: We are committed to fostering a culture of diversity and inclusion, where everyone is treated with dignity and respect. This course is for everyone. This course is for you, regardless of your age, background, citizenship, disability, sex, education, ethnicity, family status, gender, gender identity, geographical origin, language, military experience, political views, race, religion, sexual orientation, socioeconomic status, or work experience. Because of that, we should realize that we will be bringing different skills to the course, and we will all be learning from and with each other. We may have different backgrounds and skills in courses taken, mathematical, algorithmic, coding or testing background, ways to communicate orally and in writing, working alone or in groups, or plans for professional careers.

Please be kind and courteous. There’s no need to be mean or rude. Respect that people have differences of opinion, and work and approach problems differently. There is seldom a single right answer to complicated questions. Please keep unstructured critique to a minimum; any criticism should be constructive.

Disruptive behavior is not welcome, and insulting, demeaning, or harassing anyone is unacceptable. In particular, we don’t tolerate behavior that excludes people in socially marginalized groups. If you feel you have been or are being harassed or made uncomfortable by someone in this class, please contact a member of the course staff immediately, or if you feel uncomfortable doing so, contact the Dean of Students office.

This course is for all of us. We will all learn from each other. Welcome!

Names & Pronouns: Everyone has the right to be addressed by the name and pronouns that they use for themselves. You can indicate your preferred/chosen first name and pronouns on SPIRE, which appear on class rosters. I am committed to ensuring that I address you with your chosen name and pronouns. Please let me know what name and pronouns I should use for you if they are not on the roster. Please remember: A student’s chosen name and pronouns are to be respected at all times in the classroom.

Title IX Statement: UMass is committed to fostering a safe learning environment by responding promptly and effectively to complaints of all kinds of sexual misconduct. If you have been the victim of sexual violence, gender discrimination, or sexual harassment, the university can provide you with a variety of support resources and accommodations. If you experience or witness sexual misconduct and wish to report the incident, please contact the UMass Amherst Equal Opportunity (EO) Office (413-545-3464, equalopportunity@admin.umass.edu) to request an intake meeting with EO staff. Members of the CICS community can also contact Erika Lynn Dawson Head, director of diversity and inclusive community development (erikahead@cics.umass.edu, 860-770-4770).

CMPSCI 611 : Advanced Algorithms

2023-03-16T00:00:00-07:00

Last Updated: March 16 2024.

Credit Hours: 3

Prerequisites: Students are expected to have mathematical maturity and knowledge of COMPSCI 311 or equivalent.

Teaching Staff:

Instructor: Hung Le.
- Email: hungle@cs.umass.edu
- Office: 332 CS Building
- Office hours: TBA
TAs: TBA
Graders: TBA

Class Meetings: Tue and Thu, from 2:30pm-3:45pm.

Location: Hasbrouck Lab Room 134

Required Textbook: Lectures will be based on Jeff Erickson's notes. Slides will be posted on Moodle.

Optional Textbooks:

Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein.
Algorithm Design by Kleinberg and Tardos (KT).
Algorithms by Dasgupta, Papadimitriou, Vazirani (DPV).
Randomized Algorithms by Motwani and Raghavan (MR).
Probability and Computing by Mitzenmacher and Upfal (MU).
Approximation Algorithms by Vazirani.

Tentative topics:

Divide and Conquer (2 lectures)
Dynamic Programming (2 lectures)
Greedy Algorithms (3 lectures)
Randomized Algorithms (2 lectures)
Network Flow (3 lectures)
Linear Programming (3 lectures)
NP-Completeness (2 lectures)
Approximation Algorithms (3 lectures)

Schedule:

Date	Topics	Readings
05 Sep	Intro, Master theorem, Mergesort	Erickson’s note on recursion
07 Sep	Closest Pair, Matrix Multiplication	DPV’s chapter 2
12 Sep	Problem Solving Session
14 Sep	Intro Greedy, Job Scheduling	Erickson’s note on greedy algs
19 Sep	Minimum Spanning Tree	Erickson’s note on MST
21 Sep	Matroid	Erickson’s note on matroid
26 Sep	Subset Sum, Optimal BST	Erickson’s note on DP
28 Sep	SSSP and TSP	Erickson’s note on SSSP and APSP
03 Oct	Problem Solving Session
05 Oct	Balls and Bins	Erickson’s note on Hashing
12 Oct	Midterm 1	Covering D&C, DP, and Greedy
17 Oct	Bloom Filter	Erickson’s note on filtering and streaming
19 Oct	Randomized Mincut	Erickson’s note on randomized mincut
24 Oct	Maxflow-Mincut	Erickson’s note on Maxflow
26 Oct	Maxflow in Strongly PolyTime	Erickson’s note on Maxflow
31 Oct	Applications of Maxflow	Erickson’s note on Applications of Maxflow
02 Nov	Problem Solving Session
07 Nov	Introduction to Linear Programming	Erickson’s note on LP
09 Nov	LP Duality	Erickson’s note on LP
14 Nov	P vs NP	Erickson’s note on NP-hardness
16 Nov	Midterm 2	Covering Randomized Algorithms, Maxflow, and LP
21 Nov	NP-complete Problems	Erickson’s note on NP-hardness
28 Nov	Vertex Cover, Set Cover	Erickson’s note on approximation algorithms
30 Nov	TSP	Erickson’s note on approximation algorithms
05 Dec	Problem Solving Session
07 Dec	Review
14 Dec	Final exam from 3:30 PM - 5:30 PM in the classroom	Covering everything

Grading

Homework (40%): Homework is bi-weekly and includes 6 assignments. The lowest assignment will be dropped.
Weekly Quizzes (8%): We will have 11 quizzes total, and the lowest quiz will be dropped.
Attendance (2%).
Midterms 1 + 2 (30%): Your higher midterm score will count for 20% and your lower midterm score for 10%.
Final (20%): Scheduled by the university and will be comprehensive.

Grading Scale: A (100-90), A- (89-84), B+ (83-78), B (77-72), B- (71-66), C+ (65-60), C (59-54), F (53-0)

Platforms: We will use Moodle for general logistics, Campuswire for discussion, and Gradescope for homework assignments.

Posting Policy: You are not allowed to post any material in this course to public websites without the permission of the instructor.

Academic Honesty and Collaboration Policy:

You must do exams and quizzes on your own. No collaboration is allowed.
You may collaborate with at most 2 other students on homework. You must name everyone you collaborated with in your submissions. Collaboration is verbal only. The write-up must be your own. You are NOT allowed to talk about the homework with anyone outside your group (except TAs and the instructor). You are NOT allowed to consult any material on the Internet to do your homework.
You are allowed to bring at most 2 A4 pages of cheat sheets to the exams. NO other materials are allowed.
DO ask if you have any questions regarding academic honesty.

This course is for all of us. We will all learn from each other. Welcome!

CMPSCI 611 : Advanced Algorithms Spring 2023

2022-10-21T00:00:00-07:00

Last Updated: January 31 2023.

Credit Hours: 3

Prerequisites: Students are expected to have mathematical maturity and knowledge of COMPSCI 311 or equivalent.

Teaching Staff:

Instructor: Hung Le.
- Email: hungle@cs.umass.edu
- Office: 332 CS Building
- Office hours: TBA
TAs: TBA

Class Meetings: Tue/Thu 10:00 AM - 11:15 AM from Feb 06 - May 17

Location: Hasbrouck Lab Room 124

Required Textbook: Lectures will be based on Jeff Erickson's notes. Slides will be posted on Moodle.

Optional Textbooks:

Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein.
Algorithm Design by Kleinberg and Tardos (KT).
Algorithms by Dasgupta, Papadimitriou, Vazirani (DPV).
Randomized Algorithms by Motwani and Raghavan (MR).
Probability and Computing by Mitzenmacher and Upfal (MU).
Approximation Algorithms by Vazirani.

Tentative topics:

Divide and Conquer (2 lectures)
Dynamic Programming (2 lectures)
Greedy Algorithms (3 lectures)
Randomized Algorithms (3 lectures)
Network Flow (3 lectures)
Linear Programming (2 lectures)
NP-Completeness (2 lectures)
Approximation Algorithms (3 lectures)

Schedule:

Date	Topics	Readings
07 Feb	Intro, Master theorem, Mergesort	Erickson’s note on recursion
09 Feb	Closest Pair, Matrix Multiplication	DPV’s chapter 2
14 Feb	Problem Solving Session
16 Feb	Intro Greedy, Job Scheduling	Erickson’s note on greedy algs
21 Feb	Minimum Spanning Tree	Erickson’s note on MST
23 Feb	Matroid	Erickson’s note on matroid
28 Feb	Subset Sum, Optimal BST	Erickson’s note on DP
02 March	SSSP and TSP	Erickson’s note on SSSP and APSP
07 March	Problem Solving Session
09 March	Balls and Bins	Erickson’s note on Hashing
21 March	Midterm 1	Covering D&C, DP, and Greedy
23 March	Bloom Filter	Erickson’s note on filtering and streaming
28 March	Randomized Mincut	Erickson’s note on randomized mincut
30 March	Maxflow-Mincut	Erickson’s note on Maxflow
04 April	Maxflow in Strongly PolyTime	Erickson’s note on Maxflow
06 April	Applications of Maxflow	Erickson’s note on Applications of Maxflow
11 April	Problem Solving Session
13 April	Introduction to Linear Programming	Erickson’s note on LP
20 April	LP Duality	Erickson’s note on LP
25 April	P vs NP	Erickson’s note on NP-hardness
27 April	Midterm 2	Covering Randomized Algorithms, Maxflow, and LP
02 May	NP-complete Problems	Erickson’s note on NP-hardness
04 May	Vertex Cover, Set Cover	Erickson’s note on approximation algorithms
09 May	TSP	Erickson’s note on approximation algorithms
11 May	Problem Solving Session
16 May	Review
19 May	Final exam from 8:00 AM - 10:00 AM in the classroom	Covering everything

Grading

Homework (40%): Homework is bi-weekly and includes 5 assignments and 1 bonus assignment. The grade of the bonus assignment can be used to replace the lowest grade among the 5 regular assignments.
Weekly Quizzes (8%): We will have 4 quizzes and 2 bonus quizzes. The grades of the two bonus quizzes can be used to replace the two lowest grades among the other quizzes.
Attendance (2%)
Midterm 1 (15%)
Midterm 2 (15%)
Final (20%): Scheduled by the university and will be comprehensive.

Grading Scale: A (100-90), A- (89-84), B+ (83-78), B (77-72), B- (71-66), C+ (65-60), C (59-54), F (53-0)

Late Policy: You have one late day to use on any HW of your choice, and you must decide to apply the late day to a homework before its deadline. For other HWs, each hour late, up to 24 hours, incurs a 2-point penalty. Submissions more than 24 hours late will not be graded unless you have a good medical reason. Try your best to honor the deadlines.

SAT/UNSAT: Any request for SAT/UNSAT must be made before the final exam. SAT/UNSAT option will not be given to anyone committing academic dishonesty.

Platforms: We will use Moodle for general logistics, Campuswire for discussion, and Gradescope for homework assignments.

Posting Policy: You are not allowed to post any material in this course to public websites without the permission of the instructor.

Academic Honesty and Collaboration Policy:

You must do exams and quizzes on your own. No collaboration is allowed.
You may collaborate with at most 2 other students on homework. You must name everyone you collaborated with in your submissions. Collaboration is verbal only. The write-up must be your own. You are NOT allowed to talk about the homework with anyone outside your group (except TAs and the instructor). You are NOT allowed to consult any material on the Internet to do your homework.
You are allowed to bring at most 2 A4 pages of cheat sheets to the exams. NO other materials are allowed.
DO ask if you have any questions regarding academic honesty.

This course is for all of us. We will all learn from each other. Welcome!

Prospective students

2022-08-09T00:00:00-07:00

Thank you for your interest in working with me. I enjoy working with students, and yes, I am looking for PhD students starting in Fall 2023 (applications are due by December 15, 2022). Feel free to send your CV and transcript to hungle@cs.umass.edu. Check out here for the general requirements; the GRE is NOT required for PhD admission. A good math background is appreciated. Note that I cannot answer questions regarding your chances of being admitted. This year, our college, Manning CICS, provides support for Iranian PhD applicants.

CMPSCI 611 : Advanced Algorithms

2022-03-24T00:00:00-07:00

Last Updated: August 14 2022.

Credit Hours: 3

Prerequisites: Students are expected to have mathematical maturity and knowledge of COMPSCI 311 or equivalent.

Teaching Staff:

Instructor: Hung Le.
- Email: hungle@cs.umass.edu
- Office: 332 CS Building
- Office hours: Monday 11:00 AM - 12:00 PM, and Friday 3:00 PM - 4:00 PM, CS Building Room 332.
TAs:
- Cuong Than, email: cthan@cs.umass.edu, office hours: TBA
- Samer Nashed, email: snashed@cs.umass.edu, office hours: TBA
Graders:
- Roshitha Bezawada, email: rbezawada@umass.edu
- Akhila Jetty, email: ajetty@umass.edu
- Vinitha Maheswaran, email: vmaheswaran@umass.edu
- Veda Sree Bojanapally, email: vbojanapally@umass.edu

Class Meetings: Tue/Thu 2:30 PM - 3:45 PM every week at Hasbrouck Lab Room 134.

Location: Hasbrouck Lab Room 134

Required Textbook: Lectures will be based on Jeff Erickson's notes. Slides will be posted on Moodle.

Optional Textbooks:

Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein.
Algorithm Design by Kleinberg and Tardos (KT).
Algorithms by Dasgupta, Papadimitriou, Vazirani (DPV).
Randomized Algorithms by Motwani and Raghavan (MR).
Probability and Computing by Mitzenmacher and Upfal (MU).
Approximation Algorithms by Vazirani.

Tentative topics:

Divide and Conquer (2 lectures)
Dynamic Programming (2 lectures)
Greedy Algorithms (3 lectures)
Randomized Algorithms (2 lectures)
Network Flow (3 lectures)
Linear Programming (3 lectures)
NP-Completeness (2 lectures)
Approximation Algorithms (3 lectures)

Schedule:

The following tentative schedule is subject to change.

Date	Topics	Readings
06 Sept	Intro, Master theorem, Mergesort	Erickson’s note on recursion
08 Sept	Closest Pair, Matrix Multiplication	DPV’s chapter 2
13 Sept	Problem Solving Session
15 Sept	Intro Greedy, Job Scheduling	Erickson’s note on greedy algs
20 Sept	Minimum Spanning Tree	Erickson’s note on MST
22 Sept	Matroid	Erickson’s note on matroid
27 Sept	Subset Sum, Optimal BST	Erickson’s note on DP
29 Sept	SSSP and TSP	Erickson’s note on SSSP and APSP
04 Oct	Problem Solving Session
06 Oct	Balls and Bins	Erickson’s note on Hashing
11 Oct	Bloom Filter	Erickson’s note on filtering and streaming
13 Oct	Midterm 1	Covering D&C, DP, and Greedy
18 Oct	Maxflow-Mincut	Erickson’s note on Maxflow
20 Oct	Maxflow in Strongly PolyTime	Erickson’s note on Maxflow
25 Oct	Applications of Maxflow	Erickson’s note on Applications of Maxflow
27 Oct	Problem Solving Session
01 Nov	Introduction to Linear Programming	Erickson’s note on LP
03 Nov	LP Duality	Erickson’s note on LP
08 Nov	Simplex Algorithm	Erickson’s note on Simplex Algorithm
10 Nov	P vs NP	Erickson’s note on NP-hardness
15 Nov	NP-complete Problems	Erickson’s note on NP-hardness
17 Nov	Midterm 2	Covering Randomized Algorithms, Maxflow, and LP
22 Nov	Holiday
24 Nov	Vertex Cover, Set Cover	Erickson’s note on approximation algorithms
29 Nov	TSP, $k$-Center	Erickson’s note on approximation algorithms
01 Dec	Subset Sum	Erickson’s note on approximation algorithms
06 Dec	Problem Solving Session
08 Dec	Review
14 Dec	Final Exam 3:30-5:30 PM (in the classroom)	Covering everything

Grading

Homework (40%): Homework is bi-weekly and includes 6 assignments. The lowest assignment will be dropped.
Weekly Quizzes (10%): We will have 11 quizzes total, and the lowest quiz will be dropped.
Midterm 1 (15%)
Midterm 2 (15%)
Final (20%): Scheduled by the university and will be comprehensive.

Grading Scale: A (100-90), A- (89-84), B+ (83-78), B (77-72), B- (71-66), C+ (65-60), C (59-54), F (53-0)

Platforms: We will use Moodle for general logistics, Campuswire for discussion, and Gradescope for homework assignments.

Posting Policy: You are not allowed to post any material in this course to public websites without the permission of the instructor.

Academic Honesty and Collaboration Policy:

You must do exams and quizzes on your own. No collaboration is allowed.
You may collaborate with at most 2 other students on homework. You must name everyone you collaborated with in your submissions. Collaboration is verbal only. The write-up must be your own. You are NOT allowed to talk about the homework with anyone outside your group (except TAs and the instructor). You are NOT allowed to consult any material on the Internet to do your homework.
You are allowed to bring at most 2 A4 pages of cheat sheets to the exams. NO other materials are allowed.
DO ask if you have any questions regarding academic honesty.

This course is for all of us. We will all learn from each other. Welcome!

CMPSCI 611 : Advanced Algorithms

2021-03-18T00:00:00-07:00

Objectives: This course provides students with skills in designing efficient algorithms. After completing this course, students are expected to be able to formulate an algorithmic problem, design an algorithm for the problem, prove the correctness, and analyze the running time. This course will illustrate these skills through various algorithmic problems and important design techniques.

Prerequisites: Students are expected to have mathematical maturity and knowledge of COMPSCI 311 or equivalent.

Location: Agricultural Engineering Building, Room 119.

Teaching Staff:

Instructor: Hung Le.
- Email: hungle@cs.umass.edu
- Office: 332 CS Building
- Weekly Office Hours: Monday 11am -12 pm, Friday 3pm-4pm.
If my office hours do not work for you and you want to see me, you could either talk to me right after the class (preferred) or set up an appointment by email.
TAs:
- Hamid Mozaffari (hamid@cs.umass.edu), office hours: TBA
Graders:
- Fenil Manish Doshi (fdoshi@umass.edu)
- Shanmukh Swaroop Srinivas (shanmukhswar@umass.edu)
- Divya Katkam (dkatkam@umass.edu)
- Mohith Akhilesh Dhulipalla (mdhulipalla@umass.edu)

Grading

Homework (40%): Homework is bi-weekly and includes 6 assignments. The lowest assignment will be weighted at only 50%.
Weekly Quizzes (10%): We will have 11 quizzes total, and the lowest quiz will be dropped.
Midterm 1 (15%): Thu, Oct 07. Midterm 1 will cover divide and conquer, greedy algorithms, and dynamic programming.
Midterm 2 (15%): Tue, Nov 16. Midterm 2 will cover randomized algorithms, network flow, and linear programming.
Final (20%): Scheduled by the university and will be comprehensive.

Attendance policies: Attendance is not optional. If you do not attend a lecture, you are responsible for learning the material covered in that lecture yourself.

Academic Honesty and Collaboration Policy:

You must do exams and quizzes on your own. No collaboration is allowed.
You may collaborate with at most 2 other students on homework. You must name everyone you collaborated with in your submissions. Collaboration is verbal only. The write-up must be your own. You are NOT allowed to talk about the homework with anyone outside your group (except TAs and the instructor). You are NOT allowed to consult any material on the Internet to do your homework.
You are allowed to bring at most 2 A4 pages of cheat sheets to the exams. NO other materials are allowed.
DO ask if you have any questions regarding academic honesty.

Late Policy: You have one late day on any HW of your choice. Late submissions otherwise will not be graded unless you have a good medical reason. Try your best to honor the deadlines.

Exam Make-up Policies: If you have an exam conflict with another class, you should contact the University Registrar’s Office. If you cannot attend the exam for a medical reason, please notify the instructor at least one week before the exam. If you have a medical emergency, contact the instructor as soon as possible. You need to provide documentation of the medical reason.

Posting Policy: You are not allowed to post any material in this course to public websites without the permission of the instructor.

Tentative topics:

Divide and Conquer (3 lectures)
Dynamic Programming (3 lectures)
Greedy Algorithms (3 lectures)
Randomized Algorithms (3 lectures)
Network Flow (3 lectures)
Linear Programming (3 lectures)
NP-Completeness (2 lectures)
Approximation Algorithms (3 lectures)

Required Textbook: Lectures will be based on Jeff Erickson's notes. Slides will be posted on Moodle.

Optional Textbooks:

Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein.
Algorithm Design by Kleinberg and Tardos (KT).
Algorithms by Dasgupta, Papadimitriou, Vazirani (DPV).
Randomized Algorithms by Motwani and Raghavan (MR).
Probability and Computing by Mitzenmacher and Upfal (MU).
Approximation Algorithms by Vazirani.

Schedule:

The following tentative schedule is subject to change.

Date	Topics	Readings
02 Sept	Intro, Master theorem, Mergesort	Erickson’s note on recursion
07 Sept	Closest Pair, Matrix Multiplication	DPV’s chapter 2
09 Sept	Fast Fourier Transform	Erickson’s note on FFT
14 Sept	Intro Greedy, Job Scheduling	Erickson’s note on greedy algs
16 Sept	Minimum Spanning Tree	Erickson’s note on MST
21 Sept	Matroid	Erickson’s note on matroid
23 Sept	Subset Sum, Optimal BST	Erickson’s note on DP
28 Sept	SSSP and APSP	Erickson’s note on SSSP and APSP
30 Sept	TSP and Independent Set on Trees	DPV’s chapter 6 and Erickson’s note on DP
05 Oct	Nuts and Bolts, Quicksort	Erickson’s note on Randomized Algs
07 Oct	Midterm 1	Covering D&C, DP, and Greedy
12 Oct	Balls and Bins, Chernoff’s Bounds	Erickson’s note on Hashing
14 Oct	Bloom Filter	Erickson’s note on filtering and streaming
19 Oct	Maxflow-Mincut	Erickson’s note on Maxflow
21 Oct	Applications of Maxflow	Erickson’s note on Applications of Maxflow
26 Oct	Maxflow in Strongly PolyTime	Erickson’s note on Maxflow
28 Oct	Introduction to Linear Programming	Erickson’s note on LP
02 Nov	LP Duality	Erickson’s note on LP
04 Nov	Simplex Algorithm	Erickson’s note on Simplex Algorithm
09 Nov	P vs NP	Erickson’s note on NP-hardness
11 Nov	Veterans Day
16 Nov	Midterm 2	Covering Randomized Algorithms, Maxflow, and LP
18 Nov	NP-complete Problems	Erickson’s note on NP-hardness
23 Nov	Vertex Cover, Set Cover	Erickson’s note on approximation algorithms
25 Nov	Thanksgiving
30 Nov	TSP, $k$-Center	Erickson’s note on approximation algorithms
02 Dec	Subset Sum	Erickson’s note on approximation algorithms
07 Dec	Review
10 Dec - 16 Dec	Final Exam (exact date will be announced later)	Covering everything

Platforms: We will use Moodle for general logistics, Campuswire for discussion, and Gradescope for homework assignments.

This course is for all of us. We will all learn from each other. Welcome!

Accommodations for Disabilities: The University of Massachusetts Amherst is committed to making reasonable, effective and appropriate accommodations to meet the needs of students with disabilities and help create a barrier-free campus. If you have a disability and require accommodations, please register with Disability Services, located in 161 Whitmore Hall, (413) 545-0892, to have an accommodation letter sent to your faculty. Information on services and materials for registering is available on the University of Massachusetts Amherst Disability Services page.

Programming Assignment 1 Instructions

2020-07-16T00:00:00-07:00

Due by Jan 28, 2019 11:55 pm

Note: written homework 1 is up.

Problem Specification

Goal: In this assignment, we will apply the locality sensitive hashing technique covered in lecture to a question dataset. The goal is, for each question X, to find the set of questions Y in the dataset such that Sim(X,Y) ⩾ 0.6, where Sim is the Jaccard similarity.
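As a reminder of the similarity measure, the following is a minimal sketch of Jaccard similarity on word sets. The assignment text does not say how a question is turned into a set, so the `tokenize` helper (lowercased, whitespace-separated words) is an assumption made only for illustration, as are the two example questions.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as 0.0 here."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return len(a & b) / union


def tokenize(question: str) -> set:
    # Assumption: a question is represented by its set of lowercase words.
    return set(question.lower().split())


# Two hypothetical questions sharing 5 of 7 distinct words.
x = tokenize("How do I learn Python quickly")
y = tokenize("How do I learn Python fast")
sim = jaccard(x, y)
print(sim >= 0.6)  # the pair counts as similar when this is True
```

Here the two sets share 5 words out of 7 distinct words overall, so the similarity is 5/7, which is above the 0.6 threshold.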

Input Format: The datasets are given in tsv (tab-separated values) format. Each file contains two columns: qid and question. The four datasets, provided in a single zip-compressed file, are:

question_4k.tsv: This dataset contains 4,000 questions.
question_50k.tsv: This dataset contains 50,000 questions.
question_150k.tsv: This dataset contains 150,000 questions.
question_290k.tsv: This dataset contains 290,000 questions.

The dataset can be downloaded from here.

Output Format: The output must be given in tsv format, with two columns: qid and similar-qids, where qid is the qid of the queried question and similar-qids is the set of similar questions given by their qids. The similar-qids column is comma-separated. If a question has no similar question, this column is empty. Below is an example of the output format:

qid	similar-qids
11
13	145970
15	229098,280602,6603,204128,164826,238609,65667,139632,265843,143673,217736,38330

The way to interpret the above sample output is: the question of qid 11 has no similar question, the question of qid 13 has 1 similar question of qid 145970 and the question of qid 15 has 12 similar questions. You can download a sample output tsv file here. The name of the output file must be question_sim_[*].tsv where [*] is replaced by the size of the dataset. For example, the output of the 4k question data set must be question_sim_4k.tsv.

There are two questions in this assignment. The first question is worth 15 points and the second is worth 35 points, for a total of 50 points.

Question 1 (15 points): Implement the naive algorithm that, for each question, loops through the dataset, computes the Jaccard similarity, and outputs the questions with similarity at least 0.6. For full score, your algorithm must run in less than 3 minutes on the dataset question_4k.tsv.
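As a rough illustration of Question 1, the brute-force comparison could be sketched as follows. The function names `jaccard` and `naive_similar`, the dictionary-based input, and the choice to treat two empty word sets as dissimilar are assumptions made for this sketch, not part of the assignment.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|.

    Two empty sets are treated as dissimilar (an assumption of this sketch).
    """
    if not a and not b:
        return 0.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def naive_similar(questions: dict, threshold: float = 0.6) -> dict:
    """questions maps qid -> question text; returns qid -> list of similar qids."""
    # Split each question into a set of words once, up front.
    words = {qid: set(text.strip().split()) for qid, text in questions.items()}
    result = {qid: [] for qid in questions}
    qids = list(questions)
    # Compare every unordered pair exactly once; similarity is symmetric.
    for i, x in enumerate(qids):
        for y in qids[i + 1:]:
            if jaccard(words[x], words[y]) >= threshold:
                result[x].append(y)
                result[y].append(x)
    return result


# Toy input (made up for illustration).
toy = {
    1: "what is a hash table",
    2: "what is a hash function",
    3: "how do birds fly",
}
print(naive_similar(toy))
```

The double loop is quadratic in the number of questions, which is why this approach is only expected to be practical on the 4k dataset.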

Question 2 (35 points): Implement the locality sensitive hashing algorithm we learned in class, with x = 0.6, s = 14 and r = 6, where s is the number of hash tables (called b in the lecture slides) and r is the size of the minhash signature. For full score, your algorithm must run in less than 10 minutes on the dataset question_150k.tsv.

Note 1: As you may recall from the lecture, two non-similar questions may be mapped to the same location in the locality sensitive data structure. This is called a false positive. You must remove all false positives before writing to the output file.
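The hash-table idea in Question 2, together with the false-positive filter from Note 1, might look roughly like the sketch below. It assumes each question has already been converted to a non-empty set of integer word ids. The names `make_hash` and `lsh_similar` are hypothetical, and the linear hash with the prime from the FAQ is one possible choice rather than a prescribed one.

```python
import random
from collections import defaultdict

P = 15373875993579943603  # prime quoted in the FAQ of this assignment


def make_hash(rng):
    """Return one random linear hash function x -> (a*x + b) mod P."""
    a, b = rng.getrandbits(64), rng.getrandbits(64)
    return lambda x: (a * x + b) % P


def lsh_similar(word_ids, s=14, r=6, threshold=0.6, seed=0):
    """word_ids maps qid -> non-empty set of integer word ids."""
    rng = random.Random(seed)
    candidates = defaultdict(set)
    for _ in range(s):  # one hash table per iteration
        hashes = [make_hash(rng) for _ in range(r)]
        buckets = defaultdict(list)
        for qid, ids in word_ids.items():
            # The minhash signature of size r is the bucket key.
            signature = tuple(min(h(x) for x in ids) for h in hashes)
            buckets[signature].append(qid)
        for bucket in buckets.values():
            for q in bucket:
                candidates[q].update(bucket)
    result = {}
    for qid, ids in word_ids.items():
        similar = []
        for other in candidates[qid]:
            if other == qid:
                continue
            # Remove false positives by checking the exact Jaccard similarity.
            inter = len(ids & word_ids[other])
            union = len(ids) + len(word_ids[other]) - inter
            if inter / union >= threshold:
                similar.append(other)
        result[qid] = sorted(similar)
    return result


print(lsh_similar({1: {1, 2, 3}, 2: {1, 2, 3}, 3: {100, 200}}))
```

Questions with identical word sets always share a bucket, while pairs near the threshold may be missed in some runs; that is the expected trade-off of the technique.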

Note 2: Submit your code and output data to Connex.

FAQ

Q1: Will 50k and 290k question datasets be graded?
Answer: No. They are provided for learning purposes.

Q2: How can we generate a random number in Python3?
Answer: Here is an example code that I use for generating a random 64-bit integer in my implementation.
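The linked example is not reproduced on this page. A minimal stand-in using only the standard library, which may differ from the author's code, could be the following; `random_64bit` is a hypothetical name.

```python
import random


def random_64bit(rng=random):
    """Return a uniformly random integer in [0, 2**64)."""
    return rng.getrandbits(64)


print(random_64bit())                    # unseeded: varies from run to run
print(random_64bit(random.Random(611)))  # seeded generator: reproducible
```

Passing a seeded `random.Random` instance makes runs repeatable, which can help when debugging.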

Q3: What kind of hash function do you recommend for computing the minHash signature?
Answer: In my implementation, I use the linear hash function h(x) = (a*x + b) mod p, where a and b are two random 64-bit integers and p is a 64-bit prime. I set p = 15373875993579943603 for all hash functions.

Q4: How can I map a string (and a word specifically for this homework) to an integer so that I can feed it to the linear hash function in Q3?
Answer: I recommend the FNV hash function. You can download and install it by following the instructions here. However, I use this library in a slightly different way. Here are the steps: I download the library, look for the file named “__init__.py” in the downloaded package, rename it to “fnv.py”, put it in the source code folder, and import it into my code. Here is an example of how to import it. You may notice that there are three different hash functions in the example. I use the function hash(data, bits=64) in my implementation.
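For readers who prefer not to install a package, 64-bit FNV-1a is short enough to write by hand. The sketch below is a stand-in, not the library mentioned above, and it is an assumption that the FNV-1a variant is the one wanted; `fnv1a_64` and `word_to_int` are hypothetical names.

```python
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR each byte into the hash, then multiply by the prime."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) % 2**64  # keep the value within 64 bits
    return h


def word_to_int(word: str) -> int:
    """Map a word to a 64-bit integer via its UTF-8 bytes."""
    return fnv1a_64(word.encode("utf-8"))


print(word_to_int("what"), word_to_int("What"))
```

Because the hash works on raw bytes, differently capitalized words map to different integers unless the text is normalized first.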

Q5: If I don’t use Python, where can I find an implementation of the FNV function in other languages?
Answer: You can visit this site. It might have what you want.

Q6: Do you apply any advanced processing technique to normalize the datasets?
Answer: I don’t. I want to keep the implementation as simple as possible for learning purposes. I do use question.strip() to remove any whitespace characters at the ends of each question. Then I use Python3’s split function, question.split(), to break a question into words. You may notice that in this implementation, “what” and “What” would be regarded as different words because I do not handle capitalization. You are welcome to use any technique that helps improve the correctness of your algorithm, but keep the running time constraint in mind.

Q7: If the outputs of my implementation and another group’s implementation are different, is this a problem?
Answer: No. Because of the randomness inherent in locality sensitive hashing, I expect differences in the output. The assignment will mainly be graded on speed and on your understanding of the algorithm as reflected in your code. And don’t forget the discussion policy that I specified in class.

Programming Assignment 2 Instructions

2020-07-16T00:00:00-07:00

Due by February 11, 2019 11:55 pm

Please note that written homework 2 is up.

Problem Specification

Goal: In this assignment, we will experiment with three different algorithms for training a linear regression model: solving the normal equations, batch gradient descent, and stochastic gradient descent.

Input Format: The datasets are given in tsv (tab-separated) format. The file format is:

1st row: the number of data points N.
2nd row: the number of features D.
3rd row: the first column is the label, and the following columns are feature names.
N following rows: each has (D+1) columns, where the first column is the label and the following D columns are features.
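A reader for the layout above could be sketched as follows. The function name `load_dataset` and the tiny in-memory sample are made up for illustration; the real files would be opened with `open(path)` instead of `io.StringIO`.

```python
import io


def load_dataset(stream):
    """Parse the N / D / header / data layout from a text stream.

    Returns (features, labels, feature_names) as plain Python lists.
    """
    n = int(stream.readline())
    d = int(stream.readline())
    header = stream.readline().rstrip("\n").split("\t")
    features, labels = [], []
    for _ in range(n):
        values = [float(v) for v in stream.readline().split("\t")]
        if len(values) != d + 1:
            raise ValueError("expected %d columns, got %d" % (d + 1, len(values)))
        labels.append(values[0])       # first column is the label
        features.append(values[1:])    # remaining D columns are features
    return features, labels, header[1:]


# Hypothetical two-point, two-feature sample in the described layout.
sample = io.StringIO("2\n2\ny\tf1\tf2\n1.0\t2.0\t3.0\n4.0\t5.0\t6.0\n")
X, y, names = load_dataset(sample)
print(X, y, names)
```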

An example file can be found here. There are two datasets that we will work with in this assignment.

data_10k_100.tsv: This dataset contains 10,000 points, each with 100 features.
data_100k_300.tsv: This dataset contains 100,000 points, each with 300 features.

The dataset can be downloaded from here.

Output Format: output must be given in tsv format, with (D+1) columns and two rows:

The first row is the coefficient names of the linear regression model. The first D columns contain w1, w2 up to wD, where wi is the coefficient of the i-th feature. The bias term, named w0, is in the last column.
The second row contains the values of the corresponding coefficients of the regression model.

The sample output for the sample dataset above can be downloaded here.

There are three questions in this assignment. The first and second questions are worth 10 points each and the third question is worth 30 points, for a total of 50 points.

Question 1 (10 points): Implement the algorithm that solves the normal equations to learn a linear regression model. For full score, your algorithm must run in less than 1 minute on the dataset data_100k_300.tsv, with a loss function value less than 70.
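One way the normal-equation approach might be sketched with NumPy is shown below. The bias column of ones is appended last so the result is ordered w1, ..., wD, w0 like the output format; `fit_normal_equation` is a hypothetical name, and the sketch assumes the matrix XᵀX is invertible.

```python
import numpy as np


def fit_normal_equation(X, y):
    """Solve (Xb^T Xb) w = Xb^T y, where Xb is X with a trailing column of ones."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    # Solving the linear system avoids forming an explicit matrix inverse.
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)


# Toy data generated from y = 2*x + 1 (made up for illustration).
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 * X[:, 0] + 1.0
print(fit_normal_equation(X, y))
```

Using `np.linalg.solve` instead of inverting the matrix is a design choice of this sketch; explicit inversion also works but tends to be less numerically stable.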

Question 2 (10 points): Implement the batch gradient descent algorithm with T = 200 epochs and learning rate η = 0.000001 (this is 10^-6). For full score, your algorithm must run in less than 5 minutes on the dataset data_10k_100.tsv with a loss value less than 270,000 (this is 27×10⁴).
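A minimal batch gradient descent sketch follows. It assumes the loss is half the sum of squared errors, which may differ from the loss function used in the course, so the gradient line would need adjusting for a different loss. The name `batch_gradient_descent` and the toy data are illustrative only.

```python
import numpy as np


def batch_gradient_descent(X, y, epochs=200, lr=1e-6, seed=0):
    """Full-batch gradient descent for linear regression.

    Assumed loss: 0.5 * sum((Xb @ w - y)**2), with the bias stored last in w.
    """
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w = rng.random(Xb.shape[1])  # random start in [0, 1)
    for _ in range(epochs):
        gradient = Xb.T @ (Xb @ w - y)  # gradient of the assumed loss over all points
        w -= lr * gradient
    return w


# Toy data from y = 2*x + 1; a larger learning rate suits this tiny problem.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 * X[:, 0] + 1.0
print(batch_gradient_descent(X, y, epochs=2000, lr=0.05))
```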

Question 3 (30 points): Implement the stochastic gradient descent algorithm with:

T = 20 epochs, learning rate η = 0.000001 (this is 10^-6) and batch size m = 1 on the dataset data_10k_100.tsv. For full score, your algorithm must run in less than 1 minute with a loss value less than 30.
T = 12 epochs, learning rate η = 0.0000001 (this is 10^-7) and batch size m = 1 on the dataset data_100k_300.tsv. For full score, your algorithm must run in less than 10 minutes with a loss value less than 70.

Each part in question 3 is worth 15 points.
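With batch size m = 1, each update uses a single data point. The sketch below assumes a per-point squared-error loss and a fresh random shuffle every epoch; both are assumptions rather than requirements stated in the assignment, and `stochastic_gradient_descent` is a hypothetical name.

```python
import numpy as np


def stochastic_gradient_descent(X, y, epochs=20, lr=1e-6, seed=0):
    """SGD with batch size 1 for linear regression; the bias is stored last in w.

    Assumed per-point loss: 0.5 * (x_i . w - y_i)**2.
    """
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w = rng.random(Xb.shape[1])  # random start in [0, 1)
    for _ in range(epochs):
        # Visit the points in a new random order each epoch.
        for i in rng.permutation(Xb.shape[0]):
            error = Xb[i] @ w - y[i]
            w -= lr * error * Xb[i]  # gradient of the assumed loss at point i
    return w


# Toy data from y = 2*x + 1; a larger learning rate suits this tiny problem.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 * X[:, 0] + 1.0
print(stochastic_gradient_descent(X, y, epochs=2000, lr=0.05))
```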

Note 1: Submit your code and output data to Connex.

FAQ

Q1: Can I use a library for computing the matrix inverse in Question 1?
Answer: Yes. You are allowed to use NumPy in Question 1. You can also use NumPy for the other questions.

Q2: How do I initialize the weight vector for gradient descent?
Answer: I initialize the weight vector randomly, drawing each component uniformly from [0,1) using numpy.random.random_sample().

Q3: What loss function should I use?
Answer: For all questions, you should use this loss function:

Programming Assignment 3 Instructions

2020-07-16T00:00:00-07:00

Due by March 04, 2019 11:55 pm

Please note that written homework 3 is up.

Problem Specification

Goal: In this assignment, we will compute PageRank scores for the web dataset provided by Google for a programming contest in 2002.

Input Format: The datasets are given in txt format. The file format is:

Rows from 1 to 4: Metadata. They give information about the dataset and are self-explanatory.
Following rows: each row consists of two values representing a link from the web page in the 1st column to the web page in the 2nd column. For example, the row 0 11342 means there is a directed link from page id 0 to page id 11342.

There are two datasets that we will work with in this assignment.

web-Google_10k.txt: This dataset contains 10,000 web pages and 78,323 links. The dataset can be downloaded from here. DO NOT assume that page ids are from 0 to 10,000.
web-Google.txt: This dataset contains 875,713 web pages and 5,105,039 links. The dataset can be downloaded from here. DO NOT assume that page ids are from 0 to 875,713.

Also, it’s helpful to test your algorithm with this toy dataset.

Output Format: the output format for each question will be specified below.

There are two questions in this assignment, worth 50 points total.

Question 1 (20 points): Find all dead ends. A node is a dead end if it has no outgoing edges or all its outgoing edges point to dead ends. For example, consider the graph A->B->C->D. All nodes A, B, C, D are dead ends by this definition. D is a dead end because it has no outgoing edge. C is a dead end because its only outgoing neighbor, D, is a dead end. B is a dead end for the same reason, and so is A.

(10 points) Find all dead ends of the dataset web-Google_10k.txt. For full score, your algorithm must run in less than 15 seconds. The output must be written to a file named deadends_10k.tsv.
(10 points) Find all dead ends of the dataset web-Google.txt. For full score, your algorithm must run in less than 1 minute. The output must be written to a file named deadends_800k.tsv.

The output format for Question 1 is a single column, where each row is the id of a dead end. See here for a sample output for the toy dataset.
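The recursive definition of a dead end can be evaluated with a queue in time linear in the number of edges. The sketch below is one such approach and not necessarily the method the course has in mind; `find_dead_ends` is a hypothetical name, and edges are given as (source, target) pairs.

```python
from collections import defaultdict, deque


def find_dead_ends(edges):
    """Return the dead ends in the order in which they are removed."""
    edges = set(edges)  # ignore duplicate links
    out_degree = defaultdict(int)
    predecessors = defaultdict(list)
    nodes = set()
    for u, v in edges:
        out_degree[u] += 1
        predecessors[v].append(u)
        nodes.update((u, v))
    # Start from the nodes that have no outgoing edge at all.
    queue = deque(n for n in nodes if out_degree[n] == 0)
    removal_order = []
    while queue:
        v = queue.popleft()
        removal_order.append(v)
        # Removing v deletes one outgoing edge from each of its predecessors.
        for u in predecessors[v]:
            out_degree[u] -= 1
            if out_degree[u] == 0:
                queue.append(u)
    return removal_order


print(find_dead_ends([("A", "B"), ("B", "C"), ("C", "D")]))
```

Keeping the removal order is convenient because the PageRank scores of dead ends are later filled in by walking that order backwards.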

Question 2 (30 points): Implement the PageRank algorithm for both datasets. The taxation parameter for both datasets is β = 0.85 and the number of PageRank iterations is T = 10.

(15 points) Run your algorithm on the web-Google_10k.txt dataset. For full score, your algorithm must run in less than 30 seconds. The output must be written to a file named PR_10k.tsv.
(15 points) Run your algorithm on the web-Google.txt dataset. For full score, your algorithm must run in less than 2 minutes. The output must be written to a file named PR_800k.tsv.

The output format for Question 2 is two-column:

The first column is the PageRank score.
The second column is the corresponding web page id.

The output must be sorted by descending order of the PageRank scores.

Here is a sample output for the toy dataset above.

Note 1: Submit your code and output data to Connex.

FAQ

Q1: How do I deal with dead ends?
Answer: I deal with dead ends by recursively removing them from the graph until none remain. Then I calculate the PageRank for the remaining nodes. Once I have the PageRank scores, I update the scores of the dead ends in the reverse of the removal order. I stress that the update order is the reverse.
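The answer does not spell out the update rule for dead ends. One common convention, assumed here, is that a removed node receives the sum of score(p) / out-degree(p) over its predecessors p, with out-degrees taken from the full graph. The function name `score_dead_ends` and the toy graph are hypothetical.

```python
def score_dead_ends(scores, removal_order, predecessors, out_degree):
    """Fill in scores for dead ends, visiting them in reverse removal order.

    scores: PageRank of the nodes that survived dead-end removal.
    removal_order: dead ends in the order they were removed.
    predecessors: node -> list of nodes linking to it (full graph).
    out_degree: node -> number of outgoing links (full graph).
    """
    scores = dict(scores)  # leave the caller's dictionary untouched
    for node in reversed(removal_order):
        # Every predecessor already has a score: it is either a surviving
        # node or a dead end that was removed later than this node.
        scores[node] = sum(
            scores[p] / out_degree[p] for p in predecessors.get(node, [])
        )
    return scores


# Toy graph: 1 <-> 2, 2 -> 3, 3 -> 4. Nodes 4 and then 3 are removed.
core_scores = {1: 0.5, 2: 0.5}
print(score_dead_ends(core_scores, [4, 3], {3: [2], 4: [3]}, {1: 1, 2: 2, 3: 1}))
```

The reverse order matters because a dead end removed early may depend on the score of a dead end removed after it.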

Q2: How do I initialize the PageRank score?
Answer: You should initialize the PageRank score of every page to the same value. Remember that we only run the actual PageRank after removing dead ends. If the number of pages after removing dead ends is Np, then each node should be initialized with a PageRank score of 1.0/Np. It does not matter how you initialize the PageRank scores of dead ends because they are not involved in the actual PageRank calculation.

Q3: How do I know that my calculation is correct?
Answer: Run your algorithm on the sample input and make sure that the order of the pages by PageRank score matches that of the sample output. There may be slight differences in the PageRank scores themselves (because of round-off error), but the order of the pages should be unaffected.

Also, check against the following outputs, which list the 10 pages with the highest PageRank scores for each dataset:

web-Google_10k.txt: here is a sample output. This data has 1544 dead ends total.
web-Google.txt: here is a sample output. This data has 181057 dead ends total.

Q4: What do I do if I get an out-of-memory error on the 800K dataset?
Answer: It’s probably because you construct a transition matrix to do the PageRank computation. This matrix takes about 5TB (not GB) of memory, so it is natural that you run out of memory. The way around this is to use an adjacency list, say L, together with the algorithm on page 21 of my note. For node i, L[i] is the set of nodes that link to i. You should also use a degree array D, where D[i] is the out-degree of i, that is, the number of links from i to other nodes.
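A memory-friendly iteration along these lines might be sketched as below, using dictionaries in place of the list L and the array D. It is a generic PageRank update with taxation and not necessarily the algorithm on page 21 of the note; it assumes dead ends have already been removed, and `pagerank` is a hypothetical name.

```python
def pagerank(in_links, out_degree, beta=0.85, iterations=10):
    """PageRank with taxation on a graph that has no dead ends.

    in_links[i]: nodes that link to i (must have a key for every node,
                 with an empty list for nodes nobody links to).
    out_degree[i]: number of links leaving i.
    """
    nodes = list(in_links)
    n = len(nodes)
    score = {i: 1.0 / n for i in nodes}  # uniform start, 1.0/Np
    for _ in range(iterations):
        # Each node collects beta * score[j] / out_degree[j] from every
        # in-neighbor j, plus the taxation share (1 - beta) / n.
        score = {
            i: (1.0 - beta) / n
            + beta * sum(score[j] / out_degree[j] for j in in_links[i])
            for i in nodes
        }
    return score


# Toy graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
in_links = {0: [2], 1: [0], 2: [0, 1]}
out_degree = {0: 2, 1: 1, 2: 1}
print(pagerank(in_links, out_degree))
```

Memory use is proportional to the number of links rather than to the square of the number of pages, which is the point of avoiding the transition matrix.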

Q5: How do I find dead ends efficiently?
Answer: You probably want to check this out.

Programming Assignment 4 Instructions

2020-07-16T00:00:00-07:00

Due by March 25, 2019 11:55 pm

Problem Specification

Goal: In this assignment, we learn how to factorize the utility matrix to build recommender systems. We will use the MovieLens 100k Dataset. This dataset contains about 100k ratings from n = 943 users on m = 1682 movies. We will factorize the utility matrix into two matrices U and V of dimensions n×d and d×m, respectively, where d = 20.

Input File: Download the file ml-100k.zip and look for the file named u.data. We only use the data in this file to do the factorization. DO NOT assume that users and movies are indexed from 0 to n and m, respectively.

Input Format: Each row has four tab-separated columns of the form:

UserId MovieId Rating Timestamp

For example, the first line is:

196 242 3 881250949

which means that user 196 gave a rating of 3 to movie 242 at timestamp 881250949. For the matrix factorization approach, we will ignore the timestamp feature. It may be helpful to look at the toy dataset.
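Since ids are not guaranteed to be contiguous, one option is to map them to matrix row and column indices while reading the file. In the sketch below, `load_ratings` is a hypothetical name, and every sample row other than the first line quoted above is made up for illustration.

```python
def load_ratings(lines):
    """Parse tab-separated UserId, MovieId, Rating, Timestamp rows.

    Returns (ratings, user_index, movie_index), where ratings holds
    (user_row, movie_col, rating) triples and the two dictionaries map
    the original ids to contiguous 0-based indices.
    """
    user_index, movie_index, ratings = {}, {}, []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        user, movie, rating, _timestamp = line.rstrip("\n").split("\t")
        # Assign the next free index the first time an id is seen.
        u = user_index.setdefault(user, len(user_index))
        m = movie_index.setdefault(movie, len(movie_index))
        ratings.append((u, m, float(rating)))  # timestamp is ignored
    return ratings, user_index, movie_index


sample = [
    "196\t242\t3\t881250949\n",
    "7\t9000\t5\t1\n",     # made-up row
    "196\t9000\t4\t2\n",   # made-up row
]
print(load_ratings(sample))
```

The index dictionaries are also what is needed later to write the original UserId and MovieId values back into the output files.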

Output Format: Two files, named UT.tsv and VT.tsv, corresponding to the two matrices U and V:

UT.tsv: Each row of the file corresponds to a row of the matrix U, where the first column is the UserId and the following d (20 in this assignment) columns are the corresponding row of U.
VT.tsv: Each row of the file corresponds to a column of the matrix V, where the first column is the MovieId and the following d (20 in this assignment) columns are the corresponding column of V.

See UT.tsv and VT.tsv for sample outputs of the toy dataset with d = 2.

There is only one question worth 50 points.

Question (50 points): Factorize the utility matrix into two matrices U and V. You should run your algorithm with T = 20 iterations. For full score, your algorithm must run in less than 5 minutes with an RMSE less than 0.62.
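To check progress against the RMSE target, the error can be computed over the known ratings only. The sketch below assumes that convention (missing entries of the utility matrix are ignored) and uses nested lists for U (n×d) and V (d×m); `rmse` and the tiny matrices are illustrative.

```python
import math


def rmse(ratings, U, V):
    """Root-mean-square error over observed ratings.

    ratings: list of (user_row, movie_col, rating) triples.
    U: n x d nested list; V: d x m nested list.
    """
    d = len(V)
    total = 0.0
    for u, m, r in ratings:
        # Predicted rating is the dot product of row u of U and column m of V.
        prediction = sum(U[u][k] * V[k][m] for k in range(d))
        total += (r - prediction) ** 2
    return math.sqrt(total / len(ratings))


U = [[1.0, 0.0], [0.0, 1.0]]
V = [[3.0, 4.0], [5.0, 1.0]]
print(rmse([(0, 0, 3.0), (1, 1, 1.0)], U, V))
print(rmse([(0, 1, 5.0)], U, V))
```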

Note 1: Submit your code and output data to Connex.

FAQ

Q1: How do I initialize matrices U and V?
Answer: I initialize the entries of U and V by randomly selecting numbers from [0,1) using numpy.random.random_sample().