Some thoughts about data reader
Created by: wangkuiyi
After reading this good reference on iterators, generators, and yield, http://stackoverflow.com/questions/231767/what-does-the-yield-keyword-do-in-python, I wrote the following code snippet for your reference. @qingqing01 @helinwang
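```python
import itertools


def create_text_reader(filename):
    # Yield one instance per line: the list of whitespace-separated words.
    # The with-block closes the file when the generator finishes or is closed.
    with open(filename, "r") as f:
        for line in f:
            yield line.split()


def make_lookup_reader(reader, word_to_id):
    # Replace every word with its ID; words missing from the dict become None.
    for l in reader:
        yield [word_to_id.get(w) for w in l]


def make_concat_reader(readers):
    # Exhaust the readers one after another.
    for r in readers:
        for i in r:
            yield i


def make_minibatch_reader(reader, minibatch_size):
    # Group consecutive instances into lists of minibatch_size items;
    # the last minibatch may be smaller.
    reader = iter(reader)
    while True:
        minibatch = list(itertools.islice(reader, minibatch_size))
        if not minibatch:
            return
        yield minibatch


print("Read word IDs")
for l in make_lookup_reader(create_text_reader("hello.txt"),
                            {'first': 100,
                             'second': 200,
                             'third': 300,
                             'line': 1000}):
    print(l)

print("Multi-pass reading")
for p in range(3):
    for l in create_text_reader("hello.txt"):
        print(l)

print("Concatenate reading")
for l in make_concat_reader([create_text_reader("hello.txt"),
                             create_text_reader("hello.txt")]):
    print(l)

print("Zip reading")
for l in zip(create_text_reader("hello.txt"),
             create_text_reader("hello.txt")):
    print(l)

print("Reading minibatches")
for l in make_minibatch_reader(create_text_reader("hello.txt"), 2):
    print(l)
```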
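A generator object such as the one returned by create_text_reader("hello.txt") can be iterated only once; the multi-pass demo works because it calls create_text_reader again on every pass. One way to package that idea is sketched below, assuming a reader is passed around as a zero-argument function that returns a fresh generator on each call. The names text_reader_creator and multi_pass are hypothetical, and a temporary file stands in for hello.txt.

```python
import os
import tempfile


def text_reader_creator(filename):
    # Hypothetical helper: return a zero-argument function; each call
    # opens the file afresh and returns a new generator over its lines.
    def reader():
        with open(filename, "r") as f:
            for line in f:
                yield line.split()
    return reader


def multi_pass(reader_creator, num_passes):
    # Hypothetical helper: chain num_passes fresh passes into one stream.
    for _ in range(num_passes):
        for instance in reader_creator():
            yield instance


# Hypothetical sample data standing in for hello.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first line\nsecond line\nthird line\n")
    path = tmp.name

gen = text_reader_creator(path)()
first_pass = list(gen)
second_pass = list(gen)  # the same generator object is already exhausted
all_passes = list(multi_pass(text_reader_creator(path), 3))
os.remove(path)

print(len(first_pass), len(second_pass), len(all_passes))  # 3 0 9
```

Passing the creator rather than the generator lets any wrapper (concatenation, minibatching, multiple passes) decide for itself when to start a new pass.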
These demos answer the questions I listed in Hi yesterday:
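- How do you write a data synthesizer reader?
- The text reader reads the data only once. How do you write one that reads it multiple times?
- The text reader reads only one instance at a time. How do you write one that reads a minibatch?
- The text reader ignores the dictionary and returns sequences of strings. How do you write one that returns word IDs?
- The text reader reads only one data stream. How do you write one that reads multiple streams (e.g. image and label)?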
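None of the demos above covers the first question, but a data synthesizer reader can be written the same way: a generator that produces instances instead of reading them from a file. A minimal sketch, assuming a synthetic linear-regression task; the function name make_synthetic_reader and its parameters w, b, noise and seed are hypothetical.

```python
import random


def make_synthetic_reader(num_instances, w=2.0, b=1.0, noise=0.1, seed=0):
    # Yield num_instances synthetic (x, y) pairs drawn from
    # y = w * x + b + Gaussian noise. A private, seeded RNG makes
    # every call produce the same sequence of instances.
    rng = random.Random(seed)
    for _ in range(num_instances):
        x = rng.uniform(-1.0, 1.0)
        y = w * x + b + rng.gauss(0.0, noise)
        yield x, y


data = list(make_synthetic_reader(5))
print(len(data))  # 5
```

Each instance is an (x, y) tuple, so the two fields travel together much like the (image, label) pairs in the zip demo. Because it is an ordinary generator, it should also compose with the wrappers above, e.g. make_minibatch_reader(make_synthetic_reader(100), 10).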