Implement dataset readers for Paddle Book
Created by: reyoung
This issue is a part of #1392 (closed).
The current Paddle Book has eight books (chapters), so we need to write eight reader creators, one for each book's dataset. They are:
- Getting Started
- Recognize Digits
  - http://scikit-learn.org/stable/datasets/ (`sklearn.datasets`)
- Image Classification
- Word Vectors
  - By @wangkuiyi
  - It seems we could use `nltk.corpus.treebank` to fetch exactly the dataset used in the book.
- Sentiment Analysis
  - By @wen-bo-yang
  - The IMDB dataset is not included in `nltk.corpus`, but the Amazon reviews dataset is. Could we change the dataset used in the book?
- Semantic Role Labeling
  - By @reyoung
  - The book uses the CoNLL 2005 dataset, but `nltk.corpus` contains only the CoNLL 2000, CoNLL 2002, and CoNLL 2007 datasets. Could we change the dataset to CoNLL 2007?
- Machine Translation
  - `nltk.corpus` contains a WMT dataset.
- Personalized Recommendation
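For reference, here is a minimal, framework-independent sketch of what one such reader creator could look like. It assumes the PaddlePaddle convention that a *reader* is a zero-argument function returning an iterable of samples and a *reader creator* is a function that returns a reader. `conll_reader_creator` is a hypothetical name, and the input layout (one token per line, whitespace-separated columns, a blank line between sentences) is an assumed approximation of CoNLL-style files, not the exact format of any specific CoNLL year. The sample text is toy data.

```python
import io
from typing import Callable, Iterator, List, TextIO

# Assumed convention: a reader is a zero-argument callable returning an
# iterator of samples; a reader creator builds and returns such a reader.
Sample = List[List[str]]
Reader = Callable[[], Iterator[Sample]]


def conll_reader_creator(open_file: Callable[[], TextIO]) -> Reader:
    """Hypothetical reader creator for CoNLL-style column files.

    Assumes one token per line, whitespace-separated columns, and a blank
    line between sentences. Each yielded sample is one sentence, given as a
    list of rows (one row of column values per token).
    """

    def reader() -> Iterator[Sample]:
        # Re-open the source on every call so the same reader can be
        # iterated once per pass (e.g. once per training epoch).
        with open_file() as f:
            sentence: Sample = []
            for line in f:
                columns = line.split()
                if columns:
                    sentence.append(columns)
                elif sentence:
                    # A blank line ends the current sentence.
                    yield sentence
                    sentence = []
            if sentence:
                # The last sentence may not be followed by a blank line.
                yield sentence

    return reader


# Usage with toy in-memory data standing in for a real file.
toy_text = "He PRP B-NP\nreckons VBZ B-VP\n\nThe DT B-NP\n"
toy_reader = conll_reader_creator(lambda: io.StringIO(toy_text))
for sample in toy_reader():
    # prints [['He', 'PRP', 'B-NP'], ['reckons', 'VBZ', 'B-VP']]
    # then   [['The', 'DT', 'B-NP']]
    print(sample)
```

Passing a zero-argument `open_file` callable instead of an already-open file keeps the reader reusable: each call to `reader()` starts a fresh pass over the data. For a real dataset, `open_file` could be something like `lambda: open(path)` for a locally downloaded file.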