MovieLens Dataset¶
The MovieLens Dataset was collected by GroupLens Research. The data set contains some user information, movie information, and many movie ratings from [1-5]. The data sets have many version depending on the size of set. We use MovieLens 1M Dataset as a demo dataset, which contains 1 million ratings from 6000 users on 4000 movies. Released 2/2003.
Dataset Features¶
In ml-1m Dataset, there are many features in these dataset. The data files (which have ”.dat” extension) in ml-1m Dataset is basically CSV file that delimiter is ”::”. The description in README we quote here.
RATINGS FILE DESCRIPTION(ratings.dat)¶
All ratings are contained in the file “ratings.dat” and are in the following format:
UserID::MovieID::Rating::Timestamp
- UserIDs range between 1 and 6040
 - MovieIDs range between 1 and 3952
 - Ratings are made on a 5-star scale (whole-star ratings only)
 - Timestamp is represented in seconds since the epoch as returned by time(2)
 - Each user has at least 20 ratings
 
USERS FILE DESCRIPTION(users.dat)¶
User information is in the file “users.dat” and is in the following format:
UserID::Gender::Age::Occupation::Zip-code
All demographic information is provided voluntarily by the users and is not checked for accuracy. Only users who have provided some demographic information are included in this data set.
- Gender is denoted by a “M” for male and “F” for female
 - Age is chosen from the following ranges:
- 1: “Under 18”
 - 18: “18-24”
 - 25: “25-34”
 - 35: “35-44”
 - 45: “45-49”
 - 50: “50-55”
 - 56: “56+”
 
 - Occupation is chosen from the following choices:
- 0: “other” or not specified
 - 1: “academic/educator”
 - 2: “artist”
 - 3: “clerical/admin”
 - 4: “college/grad student”
 - 5: “customer service”
 - 6: “doctor/health care”
 - 7: “executive/managerial”
 - 8: “farmer”
 - 9: “homemaker”
 - 10: “K-12 student”
 - 11: “lawyer”
 - 12: “programmer”
 - 13: “retired”
 - 14: “sales/marketing”
 - 15: “scientist”
 - 16: “self-employed”
 - 17: “technician/engineer”
 - 18: “tradesman/craftsman”
 - 19: “unemployed”
 - 20: “writer”
 
 
MOVIES FILE DESCRIPTION(movies.dat)¶
Movie information is in the file “movies.dat” and is in the following format:
MovieID::Title::Genres
- Titles are identical to titles provided by the IMDB (including year of release)
 - Genres are pipe-separated and are selected from the following genres:
- Action
 - Adventure
 - Animation
 - Children’s
 - Comedy
 - Crime
 - Documentary
 - Drama
 - Fantasy
 - Film-Noir
 - Horror
 - Musical
 - Mystery
 - Romance
 - Sci-Fi
 - Thriller
 - War
 - Western
 
 - Some MovieIDs do not correspond to a movie due to accidental duplicate entries and/or test entries
 - Movies are mostly entered by hand, so errors and inconsistencies may exist
 
