  <div class="section" id="movielens-dataset">
<span id="movielens-dataset"></span><h1>MovieLens Dataset<a class="headerlink" href="#movielens-dataset" title="Permalink to this headline"></a></h1>
<p>The <a class="reference external" href="">MovieLens Dataset</a> was collected by GroupLens Research.
It contains user information, movie information, and movie ratings on a scale of 1 to 5.
The dataset comes in several versions that differ in size.
We use the <a class="reference external" href="">MovieLens 1M Dataset</a> as a demo dataset. It contains about
1 million ratings from about 6,000 users on about 4,000 movies and was released in February 2003.</p>
<div class="section" id="dataset-features">
<span id="dataset-features"></span><h2>Dataset Features<a class="headerlink" href="#dataset-features" title="Permalink to this headline"></a></h2>
<p>The <a class="reference external" href="">ml-1m Dataset</a> provides several features.
Its data files (which have the &#8220;.dat&#8221; extension) are essentially CSV files
that use &#8220;::&#8221; as the delimiter. We quote the descriptions from the README here.</p>
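<p>Python's <code>csv</code> module accepts only single-character delimiters, so a two-character
delimiter such as &#8220;::&#8221; is usually handled with a plain string split. The following is a minimal
sketch of that idea. The helper name <code>split_records</code> and the sample lines are hypothetical
and are not taken from the dataset.</p>

```python
from typing import Iterable, Iterator, List

DELIMITER = "::"  # field separator described in the text above


def split_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the fields of each non-empty line of a '::'-delimited file."""
    for line in lines:
        line = line.rstrip("\r\n")
        if line:  # skip blank lines
            yield line.split(DELIMITER)


# Made-up sample lines, used only to show the splitting behaviour.
sample = ["1::2::3\n", "\n", "4::five::6\n"]
records = list(split_records(sample))
print(records)
```

<p>Because the function accepts any iterable of strings, an open file object can be passed in place
of the sample list.</p>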
<div class="section" id="ratings-file-description-ratings-dat">
<span id="ratings-file-description-ratings-dat"></span><h3>RATINGS FILE DESCRIPTION(ratings.dat)<a class="headerlink" href="#ratings-file-description-ratings-dat" title="Permalink to this headline"></a></h3>
<p>All ratings are contained in the file &#8220;ratings.dat&#8221; and are in the
following format: <code>UserID::MovieID::Rating::Timestamp</code></p>
<ul class="simple">
<li>UserIDs range between 1 and 6040</li>
<li>MovieIDs range between 1 and 3952</li>
<li>Ratings are made on a 5-star scale (whole-star ratings only)</li>
<li>Timestamp is represented in seconds since the epoch as returned by time(2)</li>
<li>Each user has at least 20 ratings</li>
</ul>
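<p>The rating fields listed above could be parsed as in the sketch below. It assumes the fields appear
in the order UserID, MovieID, Rating, Timestamp, and it interprets the timestamp as UTC. The
<code>Rating</code> type, the <code>parse_rating</code> helper, and the sample line are hypothetical.</p>

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Rating(NamedTuple):
    user_id: int
    movie_id: int
    rating: int          # whole stars, 1 to 5
    timestamp: datetime  # seconds since the epoch, converted to UTC


def parse_rating(line: str) -> Rating:
    """Parse one line, assuming the order UserID::MovieID::Rating::Timestamp."""
    user_id, movie_id, rating, seconds = line.strip().split("::")
    stars = int(rating)
    if not 1 <= stars <= 5:
        raise ValueError(f"rating out of range: {stars}")
    when = datetime.fromtimestamp(int(seconds), tz=timezone.utc)
    return Rating(int(user_id), int(movie_id), stars, when)


# Made-up sample line; 978307200 is 2001-01-01 00:00:00 UTC.
example = parse_rating("12::345::4::978307200")
print(example.timestamp.isoformat())
```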
</div>
<div class="section" id="users-file-description-users-dat">
<span id="users-file-description-users-dat"></span><h3>USERS FILE DESCRIPTION(users.dat)<a class="headerlink" href="#users-file-description-users-dat" title="Permalink to this headline"></a></h3>
<p>User information is in the file &#8220;users.dat&#8221; and is in the following
format: <code>UserID::Gender::Age::Occupation::Zip-code</code></p>
<p>All demographic information is provided voluntarily by the users and is
not checked for accuracy.  Only users who have provided some demographic
information are included in this data set.</p>
<ul class="simple">
<li>Gender is denoted by a &#8220;M&#8221; for male and &#8220;F&#8221; for female</li>
<li>Age is chosen from the following ranges:<ul>
<li>1:  &#8220;Under 18&#8221;</li>
<li>18:  &#8220;18-24&#8221;</li>
<li>25:  &#8220;25-34&#8221;</li>
<li>35:  &#8220;35-44&#8221;</li>
<li>45:  &#8220;45-49&#8221;</li>
<li>50:  &#8220;50-55&#8221;</li>
<li>56:  &#8220;56+&#8221;</li>
</ul>
</li>
<li>Occupation is chosen from the following choices:<ul>
<li>0:  &#8220;other&#8221; or not specified</li>
<li>1:  &#8220;academic/educator&#8221;</li>
<li>2:  &#8220;artist&#8221;</li>
<li>3:  &#8220;clerical/admin&#8221;</li>
<li>4:  &#8220;college/grad student&#8221;</li>
<li>5:  &#8220;customer service&#8221;</li>
<li>6:  &#8220;doctor/health care&#8221;</li>
<li>7:  &#8220;executive/managerial&#8221;</li>
<li>8:  &#8220;farmer&#8221;</li>
<li>9:  &#8220;homemaker&#8221;</li>
<li>10:  &#8220;K-12 student&#8221;</li>
<li>11:  &#8220;lawyer&#8221;</li>
<li>12:  &#8220;programmer&#8221;</li>
<li>13:  &#8220;retired&#8221;</li>
<li>14:  &#8220;sales/marketing&#8221;</li>
<li>15:  &#8220;scientist&#8221;</li>
<li>16:  &#8220;self-employed&#8221;</li>
<li>17:  &#8220;technician/engineer&#8221;</li>
<li>18:  &#8220;tradesman/craftsman&#8221;</li>
<li>19:  &#8220;unemployed&#8221;</li>
<li>20:  &#8220;writer&#8221;</li>
</ul>
</li>
</ul>
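<p>The age and occupation codes above can be turned into readable labels with two lookup tables, as
in the sketch below. It assumes each line starts with UserID, Gender, Age, and Occupation in that
order and ignores any trailing fields. The <code>decode_user</code> helper and the sample line are
hypothetical.</p>

```python
# Age code -> label, copied from the list above.
AGE_GROUPS = {
    1: "Under 18", 18: "18-24", 25: "25-34", 35: "35-44",
    45: "45-49", 50: "50-55", 56: "56+",
}

# Occupation labels; the list index is the occupation code (0 to 20).
OCCUPATIONS = [
    "other or not specified", "academic/educator", "artist",
    "clerical/admin", "college/grad student", "customer service",
    "doctor/health care", "executive/managerial", "farmer", "homemaker",
    "K-12 student", "lawyer", "programmer", "retired", "sales/marketing",
    "scientist", "self-employed", "technician/engineer",
    "tradesman/craftsman", "unemployed", "writer",
]

GENDERS = {"M": "male", "F": "female"}


def decode_user(line: str) -> dict:
    """Decode one line; assumes UserID::Gender::Age::Occupation come first."""
    user_id, gender, age, occupation, *_rest = line.strip().split("::")
    return {
        "user_id": int(user_id),
        "gender": GENDERS[gender],
        "age_group": AGE_GROUPS[int(age)],
        "occupation": OCCUPATIONS[int(occupation)],
    }


# Made-up sample line, not taken from the dataset.
user = decode_user("7::F::25::12::00000")
print(user)
```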
</div>
<div class="section" id="movies-file-description-movies-dat">
<span id="movies-file-description-movies-dat"></span><h3>MOVIES FILE DESCRIPTION(movies.dat)<a class="headerlink" href="#movies-file-description-movies-dat" title="Permalink to this headline"></a></h3>
<p>Movie information is in the file &#8220;movies.dat&#8221; and is in the following
format: <code>MovieID::Title::Genres</code></p>
<ul class="simple">
<li>Titles are identical to titles provided by the IMDB (including
year of release)</li>
<li>Genres are pipe-separated and are selected from a fixed list of genres given in the README</li>
<li>Some MovieIDs do not correspond to a movie due to accidental duplicate
entries and/or test entries</li>
<li>Movies are mostly entered by hand, so errors and inconsistencies may exist</li>
</ul>
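<p>The movie fields described above could be parsed as in the sketch below. It assumes the field order
MovieID, Title, Genres and that the release year appears in parentheses at the end of the title;
both are assumptions for illustration. Because titles are entered by hand, the year is left as
<code>None</code> when the pattern does not match. The <code>parse_movie</code> helper and the sample
lines are hypothetical.</p>

```python
import re
from typing import Optional

# Assumed title layout: "Some Title (1999)".
TITLE_WITH_YEAR = re.compile(r"^(?P<name>.*)\s\((?P<year>\d{4})\)$")


def parse_movie(line: str) -> dict:
    """Parse one line, assuming the order MovieID::Title::Genres."""
    movie_id, title, genres = line.strip().split("::")
    match = TITLE_WITH_YEAR.match(title)
    name = match.group("name") if match else title
    year: Optional[int] = int(match.group("year")) if match else None
    return {
        "movie_id": int(movie_id),
        "title": name,
        "year": year,
        "genres": genres.split("|"),  # genres are pipe-separated
    }


# Made-up sample lines, not taken from the dataset.
movie = parse_movie("42::Example Movie (1999)::Comedy|Drama")
no_year = parse_movie("43::Untitled Entry::Drama")
print(movie)
```

<p>Since some MovieIDs do not correspond to a movie, a lookup keyed by <code>movie_id</code> should
tolerate missing keys, for example by using <code>dict.get</code>.</p>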

