Programming Assignment 4 Instructions

2 minute read

Published:

Due by March 25, 2019 11:55 pm

Problem Specification

Goal: In this assignment, we learn how to factorize the utility matrix to build recommender systems. We will use the MovieLens 100k Dataset. This dataset contains about 100k ratings from n = 943 users and m = 1682 movies. We will factorize the utility matrix into two matrices U, V of dimensions nxd and dxm, respectively, where d = 20.

Input File: Dowload file ml-100k.zip, look for the file name u.data. We only use data in this file to do factorization. DO NOT assume that users and movies are indexed from 0 to n and m, respectively.

Input Format: Each row has four tab-separated columns of the form:

UserId MovieId Rating Timestamp
For example, the first line is:

196 242 3 881250949

which means that user 196 gave a rating of 3 to movie 242 at timestamp 881250949. For the matrix factorization approach, we will ignore the timestamp feature. It may be helpful to look at the toy dataset.


Output Format: Two files, named UT.tsv and VT.tsv, correspond to two matrices U and V:

  • UT.tsv: Each row of the file correspond to each row of the matrix U where the first column is the UserId and d (20 in this assignment) following columns represent the corresponding row of the user in U.
  • VT.tsv: Each row of the file correspond to each column of the matrix V where the first column is the MovieId and d (20 in this assignment) following columns represent the corresponding column of the movie in V.

See UT.tsv and VT.tsv for sample outputs of the toy dataset with d = 2.


There is only one question worth 50 points.

Question (50 points): Factorize the utility matrix into two matrix U and V. You should run your algorithm with T = 20 iterations. For full score, your algorithm must run in less than 5 minutes with RMSE less than 0.62.





Note 1: Submit your code and output data to the Connex

FAQ

Q1: How do I initialize matrices U and V?
Answer: I initialize entries of U and V by randomly selecting numbers from [0,1] using numpy.random.random_sample().