Programming Assignment 4 Instructions

Due by March 25, 2019 11:55 pm



Goal: In this assignment, we learn how to factorize the utility matrix to build recommender systems. We will use the MovieLens 100k Dataset. This dataset contains about 100k ratings given by n = 943 users to m = 1682 movies. We will factorize the utility matrix into two matrices U and V of dimensions n x d and d x m, respectively, where d = 20.

Input File: Download the file ml-100k.zip and look for the file named u.data. We use only the data in this file for the factorization. DO NOT assume that users and movies are indexed from 0 to n and m, respectively.

Input Format: Each row has four tab-separated columns of the form:
UserId MovieId Rating Timestamp
For example, the first line is:
196 242 3 881250949
which means that user 196 gave a rating of 3 to movie 242 at timestamp 881250949. For the matrix factorization approach, we will ignore the timestamp feature. It may be helpful to look at the toy dataset.
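Because the ids cannot be assumed to be contiguous, one possible way to read the input is to map each raw UserId and MovieId to a 0-based row or column index as it is first seen. The following is a minimal sketch, not a required implementation; the function name load_ratings and the dictionary names are hypothetical.

```python
def load_ratings(lines):
    """Parse tab-separated 'UserId MovieId Rating Timestamp' rows.

    Returns (triples, user_index, movie_index), where triples holds
    (row, col, rating) with contiguous 0-based row and column indices.
    """
    user_index = {}   # raw UserId -> row of the utility matrix
    movie_index = {}  # raw MovieId -> column of the utility matrix
    triples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user_id, movie_id, rating, _timestamp = line.split("\t")
        # setdefault assigns the next free index the first time an id appears.
        row = user_index.setdefault(user_id, len(user_index))
        col = movie_index.setdefault(movie_id, len(movie_index))
        triples.append((row, col, float(rating)))  # the timestamp is ignored
    return triples, user_index, movie_index


# The example line quoted above maps user 196 to row 0 and movie 242 to column 0.
triples, user_index, movie_index = load_ratings(["196\t242\t3\t881250949"])
```

For the real data, the same function could be called as load_ratings(open("u.data")), assuming u.data has been extracted into the working directory.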


Output Format: Two files, named UT.tsv and VT.tsv, corresponding to the two matrices U and V. See UT.tsv and VT.tsv for sample outputs on the toy dataset with d = 2.
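As a rough sketch of how a matrix could be written as tab-separated values, numpy.savetxt accepts a delimiter argument. The helper name write_tsv, the six-decimal number format, and the toy values are assumptions made for illustration. Whether each file should hold the matrix itself or its transpose, and how rows should be ordered, is not stated here, so compare against the sample UT.tsv and VT.tsv before relying on this layout.

```python
import os
import tempfile

import numpy as np


def write_tsv(path, matrix):
    """Write a 2-D array with one matrix row per line, columns separated by tabs."""
    # The "%.6f" format is an assumption; the samples may use another precision.
    np.savetxt(path, matrix, delimiter="\t", fmt="%.6f")


# Toy 3 x 2 matrix with made-up values, written to a temporary directory.
U = np.array([[0.5, 1.0], [1.5, 2.0], [0.25, 0.75]])
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "UT.tsv")
    write_tsv(path, U)
    with open(path) as f:
        first_line = f.readline().rstrip("\n")
    restored = np.loadtxt(path, delimiter="\t")  # read back to check the round trip
```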


There is only one question worth 50 points.

Question (50 points): Factorize the utility matrix into two matrices U and V. You should run your algorithm with T = 20 iterations. For full credit, your algorithm must run in under 5 minutes and achieve an RMSE below 0.62.
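To check progress against the RMSE threshold, the error can be measured after each iteration. The sketch below assumes that the RMSE is taken over the observed ratings only (the entries that appear in u.data) and that ratings are stored as (row, col, rating) triples with 0-based indices; both are assumptions, and the function name rmse is hypothetical.

```python
import numpy as np


def rmse(triples, U, V):
    """Root-mean-square error over the observed ratings only.

    triples: list of (row, col, rating); U has shape (n, d), V has shape (d, m).
    """
    rows = np.array([t[0] for t in triples])
    cols = np.array([t[1] for t in triples])
    ratings = np.array([t[2] for t in triples], dtype=float)
    # The predicted rating for (i, j) is the dot product of row i of U
    # and column j of V, computed here for all observed pairs at once.
    predictions = np.sum(U[rows] * V[:, cols].T, axis=1)
    return float(np.sqrt(np.mean((ratings - predictions) ** 2)))


# Toy check: with all-ones factors and d = 2, every prediction equals 2.
U = np.ones((2, 2))
V = np.ones((2, 3))
error = rmse([(0, 0, 2.0), (1, 2, 4.0)], U, V)  # errors are 0 and 2
```

Computing the error only on the observed pairs avoids building the full n x m product, which keeps the per-iteration check cheap.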





Note 1: Submit your code and output data to Connex.




FAQ


Q1: How do I initialize matrices U and V?
Answer: I initialize the entries of U and V by sampling uniformly at random from [0, 1) using numpy.random.random_sample().
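A minimal sketch of this initialization with the dimensions given in the Goal section follows. The fixed seed is an assumption added only to make runs repeatable; it is not part of the answer above.

```python
import numpy as np

n, m, d = 943, 1682, 20  # users, movies, and latent dimension from the Goal section

np.random.seed(0)  # assumption: fixed seed for reproducible runs
U = np.random.random_sample((n, d))  # n x d, entries uniform in [0, 1)
V = np.random.random_sample((d, m))  # d x m, entries uniform in [0, 1)
```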

