Implement Dataset Reader in Paddle Book (#1419)

Created by: reyoung

This issue is a part of #1392 (closed).

The current Paddle Book has eight chapters. We need to write a reader creator for the dataset used in each one. The chapters are:

Getting Started
Recognize Digits
- sklearn.datasets, see http://scikit-learn.org/stable/datasets/
Image Classification
- cifar https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
Word Vectors
- By @wangkuiyi
- It seems we could use nltk.corpus.treebank to fetch the exact dataset used in the Book.
Sentiment Analysis
- By @wen-bo-yang
- The IMDB dataset is not included in nltk.corpus, but the Amazon reviews dataset is. So could we change the dataset used in the Book?
Semantic Role Labeling
- By @reyoung
- The Book uses the CoNLL 2005 dataset. nltk.corpus contains the CoNLL 2000, CoNLL 2002, and CoNLL 2007 datasets. Could we change the dataset to CoNLL 2007?
Machine Translation
- nltk.corpus contains a WMT dataset.
Personalized Recommendation
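
For reference, here is a minimal sketch of what a reader creator for any of the datasets above might look like. It assumes the convention that a reader is a zero-argument function yielding one sample at a time, and that a reader creator is a function returning such a reader. The names `reader_creator` and `toy_samples` are hypothetical, and the in-memory list only stands in for a downloaded dataset; this is not the final API.

```python
def reader_creator(samples):
    """Return a reader over `samples`, an iterable of (features, label) pairs.

    Hypothetical helper for illustration: a real creator would download
    and parse the dataset (digits, CIFAR, treebank, ...) instead of
    taking the samples as an argument.
    """
    def reader():
        # Each call starts a fresh pass over the data, so the same reader
        # can be reused for several training passes.
        for features, label in samples:
            yield features, label
    return reader


# Toy data standing in for a real dataset (assumption, not Book data).
toy_samples = [([0.0, 1.0], 0), ([1.0, 0.0], 1)]
train = reader_creator(toy_samples)

for features, label in train():
    print(features, label)
```

Keeping the reader a plain generator function would let each dataset module expose something like `train()` and `test()` with the same shape, whichever library the data comes from.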