Commit aff3321e authored by daminglu

demo yelp

Parent 50eff4ac
{"review_id":"v0i_UHJMo_hPBq9bxWvW4w","user_id":"bv2nCi5Qv5vroFiqKGopiw","business_id":"0W4lkclzZThpx3V65bVgig","stars":5,"date":"2016-05-28","text":"Love the staff, love the meat, love the place. Prepare for a long line around lunch or dinner hours. \n\nThey ask you how you want you meat, lean or something maybe, I can't remember. Just say you don't want it too fatty. \n\nGet a half sour pickle and a hot pepper. Hand cut french fries too.","useful":0,"funny":0,"cool":0}
{"review_id":"vkVSCC7xljjrAI4UGfnKEQ","user_id":"bv2nCi5Qv5vroFiqKGopiw","business_id":"AEx2SYEUJmTxVVB18LlCwA","stars":5,"date":"2016-05-28","text":"Super simple place but amazing nonetheless. It's been around since the 30's and they still serve the same thing they started with: a bologna and salami sandwich with mustard. \n\nStaff was very helpful and friendly.","useful":0,"funny":0,"cool":0}
{"review_id":"n6QzIUObkYshz4dz2QRJTw","user_id":"bv2nCi5Qv5vroFiqKGopiw","business_id":"VR6GpWIda3SfvPC-lg9H3w","stars":5,"date":"2016-05-28","text":"Small unassuming place that changes their menu every so often. Cool decor and vibe inside their 30 seat restaurant. Call for a reservation. \n\nWe had their beef tartar and pork belly to start and a salmon dish and lamb meal for mains. Everything was incredible! I could go on at length about how all the listed ingredients really make their dishes amazing but honestly you just need to go. \n\nA bit outside of downtown montreal but take the metro out and it's less than a 10 minute walk from the station.","useful":0,"funny":0,"cool":0}
{"review_id":"MV3CcKScW05u5LVfF6ok0g","user_id":"bv2nCi5Qv5vroFiqKGopiw","business_id":"CKC0-MOWMqoeWf6s-szl8g","stars":5,"date":"2016-05-28","text":"Lester's is located in a beautiful neighborhood and has been there since 1951. They are known for smoked meat which most deli's have but their brisket sandwich is what I come to montreal for. They've got about 12 seats outside to go along with the inside. \n\nThe smoked meat is up there in quality and taste with Schwartz's and you'll find less tourists at Lester's as well.","useful":0,"funny":0,"cool":0}
{"review_id":"IXvOzsEMYtiJI0CARmj77Q","user_id":"bv2nCi5Qv5vroFiqKGopiw","business_id":"ACFtxLv8pGrrxMm6EgjreA","stars":4,"date":"2016-05-28","text":"Love coming here. Yes the place always needs the floor swept but when you give out peanuts in the shell how won't it always be a bit dirty. \n\nThe food speaks for itself, so good. Burgers are made to order and the meat is put on the grill when you order your sandwich. Getting the small burger just means 1 patty, the regular is a 2 patty burger which is twice the deliciousness. \n\nGetting the Cajun fries adds a bit of spice to them and whatever size you order they always throw more fries (a lot more fries) into the bag.","useful":0,"funny":0,"cool":0}
{"review_id":"L_9BTb55X0GDtThi6GlZ6w","user_id":"bv2nCi5Qv5vroFiqKGopiw","business_id":"s2I_Ni76bjJNK9yG60iD-Q","stars":4,"date":"2016-05-28","text":"Had their chocolate almond croissant and it was amazing! So light and buttery and oh my how chocolaty.\n\nIf you're looking for a light breakfast then head out here. Perfect spot for a coffee\/latté before heading out to the old port","useful":0,"funny":0,"cool":0}
{"review_id":"HRPm3vEZ_F-33TYVT7Pebw","user_id":"_4iMDXbXZ1p1ONG297YEAQ","business_id":"8QWPlVQ6D-OExqXoaD2Z1g","stars":5,"date":"2014-09-24","text":"Cycle Pub Las Vegas was a blast! Got a groupon and rented the bike for 11 of us for an afternoon tour. Each bar was more fun than the last. Downtown Las Vegas has changed so much and for the better. We had a wide age range in this group from early 20's to mid 50's and everyone had so much fun! Our driver Tony was knowledgable , friendly and just plain fun! Would recommend this to anyone looking to do something different away from the strip. You won't be disappointed!","useful":1,"funny":0,"cool":0}
{"review_id":"ymAUG8DZfQcFTBSOiaNN4w","user_id":"u0LXt3Uea_GidxRW1xcsfg","business_id":"9_CGhHMz8698M9-PkVf0CQ","stars":4,"date":"2012-05-11","text":"Who would have guess that you would be able to get fairly decent Vietnamese restaurant in East York? \n\nNot quite the same as Chinatown in terms of pricing (slightly higher) but definitely one of the better Vietnamese restaurants outside of the neighbourhood. When I don't have time to go to Chinatown, this is the next best thing as it is down the street from me.\n\nSo far the only items I have tried are the phos (beef, chicken & vegetarian) - and they have not disappointed me! Especially the chicken pho.\n\nNext time I go back, I'm going to try the banh cuon (steamed rice noodle) and the vermicelli!","useful":0,"funny":0,"cool":2}
{"review_id":"8UIishPUD92hXtScSga_gw","user_id":"u0LXt3Uea_GidxRW1xcsfg","business_id":"gkCorLgPyQLsptTHalL61g","stars":4,"date":"2015-10-27","text":"Always drove past this coffee house and wondered about it. BF and I finally made the stop to try this place out.\n\nCute, quaint coffee shop with nice muskoka chairs outside. \n\nBF ordered an ice coffee and really enjoyed it! Guess we will be back again!","useful":1,"funny":0,"cool":0}
{"review_id":"w41ZS9shepfO3uEyhXEWuQ","user_id":"u0LXt3Uea_GidxRW1xcsfg","business_id":"5r6-G9C4YLbC7Ziz57l3rQ","stars":3,"date":"2013-02-09","text":"Not bad!! Love that there is a gluten-free, vegan version of the cheese curds and gravy!!\n\nHaven't done the poutine taste test yet with smoke's but Im excited to see which is better. However poutini's might win as they are vegan and gluten-free","useful":1,"funny":0,"cool":0}
# Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Yelp dataset.
This module downloads IMDB dataset from
http://ai.stanford.edu/%7Eamaas/data/sentiment/. This dataset contains a set
of 25,000 highly polar movie reviews for training, and 25,000 for testing.
Besides, this module also provides API for building dictionary.
"""
import collections
import string
import re
import json
import unicodedata


def lazy_read(filename):
    with open(filename) as fp:
        for line in fp:
            parsed_json = json.loads(line)
            if 'text' not in parsed_json:
                continue
            # We do not learn neutral ratings, only negative (1-2 stars)
            # and positive (4-5 stars).
            if 'stars' not in parsed_json or parsed_json['stars'] == 3:
                continue
            label = 1 if parsed_json['stars'] > 3 else 0
            text = unicodedata.normalize('NFKD', parsed_json['text']).encode(
                'ascii', 'ignore')
            tokenized = text.rstrip("\n\r").translate(
                None, string.punctuation).lower().split()
            yield (tokenized, label)


def build_dict(filename, cutoff=1):
    """
    Build a word dictionary from the corpus. Keys of the dictionary are words,
    and values are zero-based IDs of these words.
    """
    word_freq = collections.defaultdict(int)
    for doc in lazy_read(filename):
        for word in doc[0]:
            word_freq[word] += 1

    # Prune words that occur no more than `cutoff` times, then sort by
    # descending frequency (ties broken alphabetically).
    word_freq = filter(lambda x: x[1] > cutoff, word_freq.items())
    dictionary = sorted(word_freq, key=lambda x: (-x[1], x[0]))
    words, _ = list(zip(*dictionary))
    word_idx = dict(zip(words, xrange(len(words))))
    word_idx['<unk>'] = len(words)
    return word_idx


def word_dict(filename):
    """
    Build a word dictionary from the corpus.

    :param filename: path to the review JSON file
    :type filename: str
    :return: Word dictionary
    :rtype: dict
    """
    return build_dict(filename)


def reader_creator(word_idx, filename):
    UNK = word_idx['<unk>']
    INS = []

    def load(filename, out):
        for doc in lazy_read(filename):
            out.append(([word_idx.get(w, UNK) for w in doc[0]], doc[1]))

    load(filename, INS)

    def reader():
        for doc, label in INS:
            yield doc, label

    return reader


def train(word_idx, filename):
    """
    Yelp training set creator.

    It returns a reader creator; each sample in the reader is a zero-based ID
    sequence paired with a label in [0, 1].

    :param word_idx: word dictionary
    :type word_idx: dict
    :return: Training reader creator
    :rtype: callable
    """
    return reader_creator(word_idx, filename)


def test(word_idx, filename):
    """
    Yelp test set creator.

    It returns a reader creator; each sample in the reader is a zero-based ID
    sequence paired with a label in [0, 1].

    :param word_idx: word dictionary
    :type word_idx: dict
    :return: Test reader creator
    :rtype: callable
    """
    return reader_creator(word_idx, filename)
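The tokenize-and-build-dictionary pipeline above can be sketched in a self-contained way. The snippet below is a minimal Python 3 re-implementation (not the module itself, which targets Python 2) that works on an in-memory corpus instead of a JSON file; `tokenize` and `docs` are hypothetical names introduced for illustration:

```python
import collections
import string


def tokenize(text):
    # Strip punctuation and lowercase, mirroring lazy_read's tokenization.
    table = str.maketrans('', '', string.punctuation)
    return text.rstrip('\n\r').translate(table).lower().split()


def build_dict(docs, cutoff=1):
    # Count word frequencies over all tokenized documents.
    word_freq = collections.defaultdict(int)
    for doc in docs:
        for word in doc:
            word_freq[word] += 1
    # Keep words occurring more than `cutoff` times, most frequent first
    # (ties broken alphabetically), then assign zero-based IDs.
    kept = [item for item in word_freq.items() if item[1] > cutoff]
    dictionary = sorted(kept, key=lambda x: (-x[1], x[0]))
    word_idx = {w: i for i, (w, _) in enumerate(dictionary)}
    word_idx['<unk>'] = len(word_idx)
    return word_idx


docs = [
    tokenize("Love the staff, love the meat, love the place."),
    tokenize("Love coming here. The food speaks for itself."),
]
vocab = build_dict(docs, cutoff=1)
print(vocab)  # → {'love': 0, 'the': 1, '<unk>': 2}
```

With `cutoff=1`, only "love" and "the" (four occurrences each) survive pruning; every other word maps to `<unk>`, just as rare words do in the real reader.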
# Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
import time
start = time.time()
import os
import paddle
import paddle.fluid as fluid
from functools import partial
import numpy as np
CLASS_DIM = 2
EMB_DIM = 128
HID_DIM = 512
BATCH_SIZE = 100000
TRAIN_FILE = '/paddle/daming_paddle_lab/book/06.understand_sentiment/100k_reviews.json'
TEST_FILE = '/paddle/daming_paddle_lab/book/06.understand_sentiment/100k_reviews_test.json'


def convolution_net(data, input_dim, class_dim, emb_dim, hid_dim):
emb = fluid.layers.embedding(
input=data, size=[input_dim, emb_dim], is_sparse=True)
conv_3 = fluid.nets.sequence_conv_pool(
input=emb,
num_filters=hid_dim,
filter_size=3,
act="tanh",
pool_type="sqrt")
conv_4 = fluid.nets.sequence_conv_pool(
input=emb,
num_filters=hid_dim,
filter_size=4,
act="tanh",
pool_type="sqrt")
prediction = fluid.layers.fc(
input=[conv_3, conv_4], size=class_dim, act="softmax")
return prediction


def inference_program(word_dict):
data = fluid.layers.data(
name="words", shape=[1], dtype="int64", lod_level=1)
dict_dim = len(word_dict)
net = convolution_net(data, dict_dim, CLASS_DIM, EMB_DIM, HID_DIM)
return net


def train_program(word_dict):
prediction = inference_program(word_dict)
label = fluid.layers.data(name="label", shape=[1], dtype="int64")
cost = fluid.layers.cross_entropy(input=prediction, label=label)
avg_cost = fluid.layers.mean(cost)
accuracy = fluid.layers.accuracy(input=prediction, label=label)
return [avg_cost, accuracy]


def optimizer_func():
return fluid.optimizer.Adagrad(learning_rate=0.002)


def train(use_cuda, train_program, params_dirname):
    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()

    print("Loading Yelp word dict....")
    word_dict = paddle.dataset.yelp.word_dict(TRAIN_FILE)
print("Reading training data....")
    train_reader = paddle.batch(
        paddle.dataset.yelp.train(word_dict, TRAIN_FILE),
        batch_size=BATCH_SIZE)
print("Reading testing data....")
test_reader = paddle.batch(
paddle.dataset.yelp.test(word_dict, TEST_FILE), batch_size=BATCH_SIZE)
trainer = fluid.Trainer(
train_func=partial(train_program, word_dict),
place=place,
optimizer_func=optimizer_func)
feed_order = ['words', 'label']
def event_handler(event):
if isinstance(event, fluid.EndStepEvent):
if event.step % 10 == 0:
avg_cost, acc = trainer.test(
reader=test_reader, feed_order=feed_order)
print('Step {0}, Test Loss {1:0.2}, Acc {2:0.2}'.format(
event.step, avg_cost, acc))
print("Step {0}, Epoch {1} Metrics {2}".format(
event.step, event.epoch, map(np.array, event.metrics)))
elif isinstance(event, fluid.EndEpochEvent):
trainer.save_params(params_dirname)
trainer.train(
num_epochs=1,
event_handler=event_handler,
reader=train_reader,
feed_order=feed_order)


def infer(use_cuda, inference_program, params_dirname=None):
    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
    # The dictionary must be the one built during training, otherwise the
    # word IDs fed to the model will not match the trained embedding.
    word_dict = paddle.dataset.yelp.word_dict(TRAIN_FILE)
inferencer = fluid.Inferencer(
infer_func=partial(inference_program, word_dict),
param_path=params_dirname,
place=place)
# Setup input by creating LoDTensor to represent sequence of words.
# Here each word is the basic element of the LoDTensor and the shape of
# each word (base_shape) should be [1] since it is simply an index to
# look up for the corresponding word vector.
# Suppose the length_based level of detail (lod) info is set to [[3, 4, 2]],
# which has only one lod level. Then the created LoDTensor will have only
# one higher level structure (sequence of words, or sentence) than the basic
# element (word). Hence the LoDTensor will hold data for three sentences of
# length 3, 4 and 2, respectively.
# Note that lod info should be a list of lists.
reviews_str = [
'Happy to find this hidden gem near my office.', # 5
'I bought a photo book as a gift for my mom, and was told it would take 2 weeks to ship.', # 1
]
reviews = [c.split() for c in reviews_str]
UNK = word_dict['<unk>']
    lod = []
    for c in reviews:
        lod.append([word_dict.get(word, UNK) for word in c])

    base_shape = [[len(c) for c in lod]]
tensor_words = fluid.create_lod_tensor(lod, base_shape, place)
results = inferencer.infer({'words': tensor_words})
    # lazy_read labels positive reviews as 1, so index 1 of the softmax
    # output is the positive-class probability.
    for i, r in enumerate(results[0]):
        print("Predict probability of ", r[1], " to be positive and ", r[0],
              " to be negative for review \'", reviews_str[i], "\'")


def main(use_cuda):
    if use_cuda and not fluid.core.is_compiled_with_cuda():
        return
    params_dirname = "understand_sentiment_conv.inference.model"
    train(use_cuda, train_program, params_dirname)
    infer(use_cuda, inference_program, params_dirname)
finish = time.time()
elapsed = finish - start
print(elapsed)


if __name__ == '__main__':
    use_cuda = True  # set to False to train on CPU
main(use_cuda)
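The length-based LoD layout described in the comments inside `infer()` can be illustrated without Paddle. The sketch below (plain Python, with the hypothetical helper name `split_by_lengths`) shows how a single LoD level such as `[[3, 4, 2]]` partitions a flat list of word IDs into three sentences of length 3, 4 and 2:

```python
def split_by_lengths(flat_ids, lengths):
    # Partition a flat list of word IDs into sequences, the way a
    # length-based LoD level like [3, 4, 2] groups a LoDTensor's rows.
    seqs, start = [], 0
    for n in lengths:
        seqs.append(flat_ids[start:start + n])
        start += n
    return seqs


flat = [11, 22, 33, 44, 55, 66, 77, 88, 99]
print(split_by_lengths(flat, [3, 4, 2]))
# → [[11, 22, 33], [44, 55, 66, 77], [88, 99]]
```

This mirrors what `fluid.create_lod_tensor(lod, base_shape, place)` does in `infer()`: the data is stored flat, and `base_shape = [[len(c) for c in lod]]` records where each review's word sequence begins and ends.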