Programming Assignment 4 Instructions
Published:
Due by March 25, 2019 11:55 pm
Problem Specification
Goal: In this assignment, we learn how to factorize the utility matrix to build recommender systems. We will use the MovieLens 100k Dataset. This dataset contains about 100k ratings from n = 943 users and m = 1682 movies. We will factorize the utility matrix into two matrices U, V of dimensions nxd and dxm, respectively, where d = 20.
Input File: Dowload file ml-100k.zip, look for the file name u.data. We only use data in this file to do factorization. DO NOT assume that users and movies are indexed from 0 to n and m, respectively.
Input Format: Each row has four tab-separated columns of the form:
which means that user 196 gave a rating of 3 to movie 242 at timestamp 881250949. For the matrix factorization approach, we will ignore the timestamp feature. It may be helpful to look at the toy dataset.
Output Format: Two files, named UT.tsv and VT.tsv, correspond to two matrices U and V:
- UT.tsv: Each row of the file correspond to each row of the matrix U where the first column is the UserId and d (20 in this assignment) following columns represent the corresponding row of the user in U.
- VT.tsv: Each row of the file correspond to each column of the matrix V where the first column is the MovieId and d (20 in this assignment) following columns represent the corresponding column of the movie in V.
See UT.tsv and VT.tsv for sample outputs of the toy dataset with d = 2.
There is only one question worth 50 points.
Question (50 points): Factorize the utility matrix into two matrix U and V. You should run your algorithm with T = 20 iterations. For full score, your algorithm must run in less than 5 minutes with RMSE less than 0.62.
Note 1: Submit your code and output data to the Connex
FAQ
Q1: How do I initialize matrices U and V?
Answer: I initialize entries of U and V by randomly selecting numbers from [0,1] using numpy.random.random_sample().