diff --git a/language_model/README.md b/language_model/README.md
deleted file mode 100644
index 35350f10adea819ee2a26084ff7931ff82c87880..0000000000000000000000000000000000000000
--- a/language_model/README.md
+++ /dev/null
@@ -1,248 +0,0 @@
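-# Language Model
-## Introduction
-A language model (LM) is a probability distribution over word sequences; put simply, it is a model that computes the probability of a sentence. Given a sentence (a sequence of words):
-$$S = (w_1, w_2, \ldots, w_n)$$
-its probability can be written as:
-$$P(S) = P(w_1, w_2, \ldots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \ldots, w_{i-1}) \qquad \text{(Eq. 1)}$$
-
-A language model can compute P(S) in (Eq. 1) as well as its intermediate results. **It can be used to decide which of several word sequences is more likely, or, given some words, to predict the most likely next word.**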
-
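-## Applications
-**Language models are used in many areas**, for example:
-
-* **Automatic writing**: a language model can generate the next word from the preceding text; applied repeatedly, this produces whole sentences, paragraphs, and documents.
-* **QA**: a language model can generate an Answer from a Question.
-* **Machine translation**: most current mainstream machine translation models follow the Encoder-Decoder pattern, in which the Decoder is a language model that generates the target language.
-* **Spell checking**: a language model can compute the probability of a word sequence. The probability usually drops sharply at a spelling error, which can be used to detect the error and propose candidate corrections.
-* **Part-of-speech tagging, syntactic parsing, speech recognition, and more.**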
-
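-## About this example
-Common ways to implement a language model include N-Gram, RNN, and seq2seq. This example implements N-Gram and RNN language models. **The files in this example are organized as follows**:
-
-* data_util.py: reads the corpus and builds, saves, and loads the vocabulary.
-* lm_rnn.py: defines, trains, and runs prediction with the RNN-based language model.
-* lm_ngram.py: defines, trains, and runs prediction with the n-gram-based language model.
-
-**Note:** *N-Gram language models generally perform worse than RNN language models, so the RNN model is recommended in practice. This example therefore focuses on the RNN model and covers the N-Gram model only briefly.*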
-
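-## RNN language model
-### Introduction
-
-An RNN is a sequence model. The basic idea: at time step t, the hidden-layer output h_{t-1} from the previous step t-1 and the word vector x_t of step t are fed together into the hidden layer to produce the feature representation h_t of step t; this representation is then used to produce the predicted output ŷ for step t. The process recurs along the time dimension, as shown in the figure below: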
-
-
-
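-As this shows, an RNN is good at using preceding context and history, which gives it a form of "memory". In theory an RNN can capture long-range dependencies (that is, use information from far back in the sequence), but in practice the results are often unsatisfactory. Many RNN variants have therefore been proposed, such as the widely used LSTM and GRU, which modify the cell of the traditional RNN to compensate for its weaknesses. The figure below is a schematic of an LSTM:
-
-
-
-This example uses LSTM and GRU.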
-
-### Model structure
-
-The lm() function in lm_rnn.py defines the model structure. It works as follows:
-
-* 1. First, the model parameters are defined in \_\_main\_\_.
-
- ```python
- # -- config : model --
- rnn_type = 'gru' # or 'lstm'
- emb_dim = 200
- hidden_size = 200
- num_passs = 2
- num_layer = 2
-
- ```
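- Here rnn\_type selects the RNN cell type and can be 'lstm' or 'gru'; hidden\_size sets the number of hidden units; num\_layer sets the number of RNN layers; num\_passs sets the number of training passes; emb_dim sets the embedding dimension.
-
-* 2. Map the input word (or character) sequence to vectors, that is, the embedding.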
-
- ```python
- data = paddle.layer.data(name="word", type=paddle.data_type.integer_value_sequence(vocab_size))
- target = paddle.layer.data("label", paddle.data_type.integer_value_sequence(vocab_size))
- emb = paddle.layer.embedding(input=data, size=emb_dim)
-
- ```
-* 3. Build the RNN layers according to the configuration, taking the embedding vector sequence from the previous step as input.
-
- ```python
- if rnn_type == 'lstm':
-     rnn_cell = paddle.networks.simple_lstm(
-         input=emb, size=hidden_size)
-     for _ in range(num_layer - 1):
-         rnn_cell = paddle.networks.simple_lstm(
-             input=rnn_cell, size=hidden_size)
- elif rnn_type == 'gru':
-     rnn_cell = paddle.networks.simple_gru(
-         input=emb, size=hidden_size)
-     for _ in range(num_layer - 1):
-         rnn_cell = paddle.networks.simple_gru(
-             input=rnn_cell, size=hidden_size)
- ```
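-* 4. Build the output layer (softmax normalization gives the probability of each word, and the output is returned) and define the model cost (multi-class cross-entropy loss).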
-
- ```python
- # fc and output layer
- output = paddle.layer.fc(input=[rnn_cell], size=vocab_size, act=paddle.activation.Softmax())
-
- # loss
- cost = paddle.layer.classification_cost(input=output, label=target)
- ```
-
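-### Training the model
-
-The train() method in lm\_rnn.py trains the model. The steps are:
-
-* 1. Prepare the input data: this example uses the standard PTB data. Call build\_vocab() in data\_util.py to build the vocabulary, and save\_vocab() to persist it for reuse (building the vocabulary is slow on a large corpus, so the vocabulary generated on the first run is saved and reused). Then use train\_data() and test\_data() in data\_util.py to create train\_reader and test\_reader, which read the training and test data.
-
-* 2. Initialize the model: its structure, parameters, optimizer (the demo uses Adam), and trainer, as follows: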
-
- ```python
- # network config
- cost, _ = lm(len(word_id_dict), emb_dim, rnn_type, hidden_size, num_layer)
-
- # create parameters
- parameters = paddle.parameters.create(cost)
-
- # create optimizer
- adam_optimizer = paddle.optimizer.Adam(
-     learning_rate=1e-3,
-     regularization=paddle.optimizer.L2Regularization(rate=1e-3),
-     model_average=paddle.optimizer.ModelAverage(average_window=0.5))
-
- # create trainer
- trainer = paddle.trainer.SGD(
-     cost=cost, parameters=parameters, update_equation=adam_optimizer)
-
- ```
-
-* 3. Define the event_handler callback, which tracks how the loss changes during training and saves the model parameters at the end of each pass:
-
- ```python
- # define event_handler callback
- def event_handler(event):
-     if isinstance(event, paddle.event.EndIteration):
-         if event.batch_id % 100 == 0:
-             print("\nPass %d, Batch %d, Cost %f, %s" % (
-                 event.pass_id, event.batch_id, event.cost,
-                 event.metrics))
-         else:
-             sys.stdout.write('.')
-             sys.stdout.flush()
-
-     # save model each pass
-     if isinstance(event, paddle.event.EndPass):
-         result = trainer.test(reader=ptb_reader)
-         print("\nTest with Pass %d, %s" % (event.pass_id, result.metrics))
-         with gzip.open(model_file_name_prefix + str(event.pass_id) + '.tar.gz', 'w') as f:
-             parameters.to_tar(f)
- ```
-
-* 4. Start training the model:
-
- ```python
- trainer.train(
-     reader=ptb_reader, event_handler=event_handler, num_passes=num_passs)
- ```
-
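-### Generating text
-The predict() method in lm\_rnn.py runs prediction and generates text. The steps are:
-
-* 1. First load and cache the vocabulary and the model. The trained model parameters are loaded as follows: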
-
- ```python
- parameters = paddle.parameters.Parameters.from_tar(gzip.open(model_file_name))
- ```
-
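-* 2. Generate text. This example generates text with beam search, a heuristic graph search algorithm, implemented by the \_generate\_with\_beamSearch() method in lm\_rnn.py.
-
-### Using this demo
-
-This example uses the standard PTB data. To train a model on your own data, only the following adaptations are needed:
-
-#### Adapting the corpus
-* Clean the corpus: remove stray spaces, tabs, and garbled characters, and, as needed, digits, punctuation, and special symbols.
-* Encoding: utf-8. This example has already been adapted for Chinese.
-* Content format: one sentence per line, with the words on each line separated by a single space.
-* Adjust the data configuration in \_\_main\_\_ of lm\_rnn.py as needed: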
-
- ```python
- # -- config : data --
- train_file = 'data/ptb.train.txt'
- test_file = 'data/ptb.test.txt'
- vocab_file = 'data/vocab_cn.txt' # the file to save vocab
- vocab_max_size = 3000
- min_sentence_length = 3
- max_sentence_length = 60
-
- ```
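- Here vocab\_max\_size defines the maximum size of the vocabulary. If the corpus contains more distinct words than this value, the words are sorted by frequency in descending order and the top vocab\_max\_size words are kept.
-
- *Note: a larger vocabulary produces richer output but takes longer to train. After Chinese word segmentation, a corpus typically contains tens of thousands to hundreds of thousands of distinct words. If vocab\_max\_size is too small, the proportion of unknown-word tokens becomes too high; if it is large, training slows down severely (and accuracy is also affected). For this reason, models are sometimes trained per character, treating each Chinese character as a word. There are only a few thousand common characters, so the vocabulary stays small and little information is lost. However, the same character can mean very different things in different words, which sometimes hurts model quality. Try both and choose between word-level and character-level training based on your data.*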
-
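-#### Adapting the model
-
-Adjust the parameters defined in the model's \_\_main\_\_ as needed for the size of the corpus.
-
-Then run `python lm_rnn.py` to train the model and make predictions.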
-
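-## n-gram language model
-
-
-
-An n-gram model is also known as an (n-1)-th order Markov model. It makes a limited-history assumption: the probability of the current word depends only on the preceding n-1 words. (Eq. 1) can therefore be approximated as:
-$$P(S) \approx \prod_{i=1}^{n_S} P(w_i \mid w_{i-n+1}, \ldots, w_{i-1})$$
-where n_S is the number of words in the sentence. The model parameters are usually estimated by Maximum Likelihood Estimation (MLE). For n = 1, 2, 3 the n-gram model is called a unigram, bigram, and trigram language model respectively. In general, the larger n is and the larger the training corpus, the more reliable the parameter estimates. However, because the model is simple, has limited expressive power, and suffers from data sparsity, an n-gram language model generally performs worse than RNN or seq2seq models.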
-
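-### Model structure
-
-lm() in lm\_ngram.py defines the model structure, roughly as follows:
-
-* 1. The demo uses n = 5: each of the four preceding words is embedded, and the embeddings are concatenated into a feature vector.
-* 2. This is followed by the hidden layers of a DNN.
-* 3. The DNN output goes through a softmax layer for classification, giving the probability distribution of the next word over the vocabulary.
-* 4. The model loss is cross-entropy, optimized with the Adam optimizer.
-
-The structure is illustrated below: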
-
-
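-### Training the model
-
-The train() method in lm\_ngram.py trains the model. The process is similar to that of the RNN LM; in brief:
-
-* 1. Prepare the input data: the standard PTB data is used. Call build\_vocab() in data\_util.py to build the vocabulary and save\_vocab() to persist it, then use train\_data() and test\_data() in data\_util.py to create train\_reader and test\_reader, which read the training and test data.
-* 2. Initialize the model: its structure, parameters, optimizer (the demo uses Adam), and trainer.
-* 3. Define the event_handler callback, which tracks how the loss changes during training and saves the model parameters at the end of each pass.
-* 4. Start training the model with the trainer.
-
-### Generating text
-The \_\_main\_\_ section of lm\_ngram.py contains a simple implementation of prediction (text generation). The steps are:
-
-* 1. First load the vocabulary and the model: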
-
- ```python
- # prepare model
- word_id_dict = reader.load_vocab(vocab_file) # load word dictionary
- _, output_layer = lm(len(word_id_dict), emb_dim, hidden_size, num_layer) # network config
- model_file_name = model_file_name_prefix + str(num_passs - 1) + '.tar.gz'
- parameters = paddle.parameters.Parameters.from_tar(gzip.open(model_file_name)) # load parameters
- ```
-
-* 2. Predict the next word from a context of 4 (that is, n-1) words and print it:
-
- ```python
- # generate
- text = 'the end of the' # use 4 words to predict the 5th word
- input = [[word_id_dict.get(w, word_id_dict['']) for w in text.split()]]
- predictions = paddle.infer(
-     output_layer=output_layer,
-     parameters=parameters,
-     input=input,
-     field=['value'])
- id_word_dict = dict([(v, k) for k, v in word_id_dict.items()]) # dictionary with type {id : word}
- predictions[-1][word_id_dict['']] = -1 # filter
- next_word = id_word_dict[np.argmax(predictions[-1])]
- print(next_word.encode('utf-8'))
- ```
-
- *Note: this shows another way to run prediction, using the paddle.infer method. The RNN example uses the paddle.inference.Inference interface.*
\ No newline at end of file
diff --git a/language_model/data/vocab_ptb.txt b/language_model/data/vocab_ptb.txt
new file mode 100644
index 0000000000000000000000000000000000000000..7b03f36e5f1fd31c97612e3e7e0f2c544ef2bfaa
--- /dev/null
+++ b/language_model/data/vocab_ptb.txt
@@ -0,0 +1,3002 @@
+limited 791
+consolidated 2482
+four 347
+facilities 1200
+asian 2798
+controversial 2177
+whose 623
+votes 2089
+founder 2229
+paris 1721
+adviser 1759
+edward 2090
+voted 1935
+under 125
+worth 977
+placed 1677
+merchant 2565
+pact 2130
+risk 647
+rise 498
+sellers 2851
+handling 2476
+every 539
+jack 1722
+reforms 2309
+affect 1968
+bringing 2469
+lehman 1238
+believed 1542
+school 722
+calif 2386
+companies 102
+wednesday 910
+van 2897
+announced 412
+pilson 2915
+expanded 2427
+force 534
+leaders 818
+miller 2247
+guidelines 1795
+estimates 784
+japanese 174
+elections 1720
+second 335
+street 323
+estimated 453
+machines 753
+even 114
+established 1755
+disk 2826
+pace 1888
+panama 1852
+contributed 1263
+nec 2551
+asia 2310
+spokesman 301
+above 626
+dr. 982
+new 36
+net 136
+increasing 987
+ever 823
+seeks 2454
+told 549
+specialist 2519
+never 575
+here 348
+hundreds 1689
+reported 221
+protection 955
+china 638
+brooks 2353
+active 1027
+balance 1687
+auction 968
+items 1470
+employees 457
+climbed 1323
+reports 658
+credit 355
+analysts 166
+chrysler 1969
+military 756
+poverty 2838
+changes 515
+criticism 2288
+golden 1750
+campaign 879
+reagan 1195
+peabody 2432
+highly 1329
+brought 1130
+opportunities 2661
+total 344
+unit 168
+swings 2036
+would 43
+army 2135
+hospital 1833
+m. 1664
+negative 1330
+noting 2958
+call 787
+asset 1302
+strike 1106
+type 2136
+until 315
+b.a.t 1873
+hahn 2992
+supporters 2678
+composite 431
+hurt 1005
+phone 1690
+berlin 2737
+hold 838
+must 405
+me 812
+word 2057
+room 1505
+rights 535
+pursue 2248
+work 222
+plunged 1604
+movies 1970
+henry 2960
+already 294
+merely 2444
+revenues 2521
+my 406
+example 469
+wang 1958
+estate 460
+give 438
+cited 1376
+india 2595
+involve 2801
+currency 820
+foods 2389
+woman 1128
+caution 2981
+ual 409
+want 358
+drive 1524
+times 421
+attract 2236
+totaled 1297
+guarantee 1982
+end 237
+recovery 1654
+turn 837
+provide 577
+travel 1487
+damage 516
+machine 1042
+how 243
+hot 2766
+interview 1220
+widespread 2410
+resignation 2178
+badly 2455
+regional 1486
+minority 1398
+lufkin 2979
+after 79
+damaged 1608
+modest 1441
+president 72
+mesa 2754
+law 279
+types 2203
+las 2785
+purchase 496
+attempt 1036
+third 277
+amid 1646
+headquarters 1203
+maintain 1346
+green 1997
+suggest 1894
+democratic 961
+order 529
+ec 2125
+wine 2831
+operations 223
+senators 1994
+office 284
+over 95
+expects 407
+london 410
+japan 203
+mayor 1819
+before 158
+fit 2767
+personal 629
+expectations 1418
+better 502
+production 360
+weeks 422
+easier 2329
+damages 2063
+then 224
+dec. 932
+affected 1709
+combination 2802
+lambert 1987
+weakness 1827
+safe 1998
+break 1784
+effects 1670
+they 39
+schools 1632
+silver 1988
+bank 105
+structural 1989
+represents 1651
+30-year 1456
+detroit 2961
+affiliate 2357
+victory 2820
+reasonable 2852
+each 216
+went 789
+side 1078
+bond 272
+financial 142
+suspended 1944
+fairly 1953
+series 442
+carolina 2099
+carry 1536
+currencies 2403
+trading 77
+impossible 2487
+substantially 1777
+temporary 1907
+saturday 2070
+burnham 1880
+t. 2213
+network 673
+crucial 2935
+tomorrow 1588
+semiconductor 2204
+encourage 2422
+daniel 2483
+got 596
+newly 2193
+millions 2159
+sluggish 2609
+gop 2456
+foundation 2583
+sept. 628
+turning 2442
+written 2032
+veto 1052
+u.s. 54
+threatened 2522
+little 327
+free 813
+standard 738
+estimate 884
+wanted 1030
+enormous 2291
+created 1053
+days 172
+pence 1347
+oppose 2993
+1970s 2205
+uses 1851
+r. 1170
+industrial 439
+suspension 2903
+economists 1119
+primary 1723
+hearing 1803
+adopted 2224
+another 206
+electronic 1284
+ 0
+rated 1609
+service 337
+top 525
+approximately 2994
+needed 911
+rates 173
+too 307
+percentage 761
+john 401
+ranging 2708
+urban 1922
+ceiling 2109
+collapse 1633
+serve 2110
+took 454
+rejected 1392
+direct 1201
+western 650
+somewhat 2739
+shortly 2626
+toronto 2091
+renewed 2853
+target 1313
+showed 988
+likely 372
+nations 1606
+project 802
+matter 1196
+greenspan 2542
+feeling 2982
+acquisition 470
+bridge 1481
+fashion 2411
+sees 2498
+ran 2457
+boston 652
+modern 2390
+mind 1714
+mine 2120
+talking 1923
+seen 920
+seem 1058
+seek 983
+relatively 1303
+forced 1172
+abroad 1909
+strength 1531
+concrete 2911
+responsible 1845
+sound 2100
+recommended 2898
+client 1786
+luxury 1999
+forces 1252
+unsecured 2433
+shipments 2292
+blue 2753
+nobody 2148
+philadelphia 2253
+though 329
+wells 1940
+involving 1571
+germany 705
+letter 1107
+competing 2791
+germans 2995
+consumers 1134
+antitrust 1990
+medical 765
+flow 1647
+competitors 1414
+points 395
+principle 2860
+after-tax 2451
+voting 2259
+consumer 582
+dow 448
+came 545
+reserve 717
+d. 674
+saying 715
+meetings 1926
+ending 1555
+showing 1659
+radio 1665
+poison 2936
+hungary 2330
+judges 1853
+finally 1383
+proposed 499
+representing 2098
+delays 2567
+unemployment 2413
+sugar 1971
+rico 2293
+bush 257
+rich 1991
+announce 2624
+resulting 2712
+do 88
+exports 1002
+de 990
+stop 1147
+preferred 1009
+coast 1828
+lenders 2331
+despite 486
+report 238
+du 2723
+volatility 929
+hall 2517
+runs 1678
+jaguar 574
+countries 676
+fields 2186
+high-yield 2138
+bay 759
+twice 1983
+bad 890
+release 1700
+prudential-bache 2691
+mergers 2963
+secretary 718
+headed 1658
+disaster 1352
+fair 2161
+w. 1820
+testing 2938
+decided 1104
+result 411
+discussions 1963
+resigned 1234
+taiwan 2793
+best 598
+subject 966
+brazil 2803
+said 16
+capacity 1137
+away 601
+irs 1049
+compensation 2619
+machinists 2106
+pressures 2768
+future 508
+cooperation 2340
+approach 1402
+co. 96
+profitable 1629
+we 65
+men 944
+terms 520
+extend 2369
+nature 2162
+wo 418
+ask 1906
+handful 2916
+weak 1520
+however 229
+retirement 1427
+extent 2206
+news 311
+convertible 956
+debt 208
+improve 1153
+suggested 1943
+received 517
+protect 1865
+met 1317
+country 359
+over-the-counter 1241
+against 175
+players 1578
+else 1394
+supplies 1802
+games 2111
+planned 950
+faces 1772
+studio 2318
+argue 1992
+asked 613
+prospect 2424
+tough 1401
+appeared 2008
+royal 2725
+offerings 2272
+represented 2623
+tons 1068
+initiative 2804
+trust 458
+telecommunications 1821
+conference 881
+puts 1847
+basis 854
+union 319
+anc 2756
+three 133
+been 59
+quickly 914
+commission 475
+beer 2358
+much 122
+interest 132
+basic 2273
+expected 171
+entered 2139
+containers 2625
+life 391
+families 1881
+mci 2741
+eastern 734
+drugs 1387
+republicans 1666
+worker 2523
+mca 1972
+enterprises 2282
+child 2250
+ogilvy 2261
+worked 1370
+slowdown 1653
+applied 2484
+commerce 1204
+has 31
+publicly 1377
+air 452
+ventures 1715
+near 958
+appeals 1920
+aid 870
+property 810
+study 936
+launched 1610
+seven 913
+changed 973
+metropolitan 2742
+mexico 1433
+is 14
+it 15
+expenses 1167
+ii 1917
+player 2524
+experts 1403
+world-wide 1393
+in 8
+victims 1468
+confident 2883
+turner 2588
+if 67
+grown 2391
+hong 794
+patent 2069
+things 607
+make 139
+linked 2895
+complex 1724
+split 1568
+several 249
+couple 1547
+european 568
+independent 829
+pick 2319
+hand 1054
+ownership 1443
+constitution 2696
+opportunity 1473
+kept 1785
+scenario 2918
+programs 561
+settled 1621
+savings 746
+materials 1318
+rey 2879
+mother 2320
+claims 595
+the 2
+corporate 353
+investments 744
+left 697
+quoted 1182
+yen 255
+mills 2718
+expanding 2640
+ideas 2966
+identify 2861
+human 1224
+campbell 2627
+yet 397
+previous 558
+adding 1344
+buyers 846
+hills 2375
+phillips 1895
+ease 1751
+had 51
+intends 2227
+spread 1437
+board 147
+easy 1540
+prison 2584
+east 519
+gave 1227
+municipal 1488
+possible 492
+possibly 2370
+buy-out 589
+judge 403
+replace 1986
+advanced 992
+desire 2649
+county 1223
+exxon 1691
+hunt 2459
+securities 126
+offices 1154
+officer 290
+night 1069
+security 648
+delmed 2689
+attorney 752
+right 382
+old 394
+deal 497
+people 109
+dead 2610
+consultants 1954
+donald 2460
+election 1725
+short-term 967
+specific 1228
+for 12
+bottom 2525
+comments 2302
+p.m. 2163
+when 68
+continue 388
+denied 2027
+steps 1652
+christmas 2321
+core 1866
+marketing 610
+corn 1579
+conventional 1883
+discount 1179
+restructuring 654
+plc 957
+packages 2664
+losing 1810
+brokerage 857
+post 1096
+manufacturing 737
+properties 1298
+georgia-pacific 1660
+chapter 1457
+dollars 604
+months 127
+costs 271
+magazine 781
+plus 1726
+afternoon 1736
+efforts 702
+slightly 743
+nixon 2526
+raised 748
+managers 585
+publishing 1584
+formerly 2392
+facility 1541
+civil 1229
+maxwell 2471
+marshall 2901
+son 2423
+down 119
+explain 2680
+magazines 1444
+dean 2805
+reducing 2294
+defendants 1811
+crowd 2983
+support 415
+initial 1074
+legislation 803
+cosmetics 2659
+per-share 1891
+why 763
+joseph 1955
+editor 1716
+way 228
+resulted 2527
+music 2289
+was 25
+war 978
+interest-rate 2629
+head 642
+economics 2800
+form 945
+manufacturers 931
+becoming 2341
+differences 2322
+ford 480
+failure 1638
+heat 2239
+hear 2568
+syndicate 2939
+sustained 2964
+stand 1419
+true 1550
+analyst 365
+nov. 408
+counsel 2628
+inside 2207
+bids 1404
+maximum 2538
+devices 1672
+tell 1345
+jan. 2081
+ 3
+stronger 1934
+one-third 2681
+evidence 1043
+promised 2597
+accounting 1410
+ship 2351
+program-trading 2844
+check 2323
+negotiations 1391
+regime 2822
+floor 1014
+phelan 2180
+stake 320
+generally 925
+credibility 2924
+successful 1698
+interested 1378
+role 795
+holding 477
+digital 1044
+test 895
+developers 2587
+bailout 2663
+roll 2332
+picture 2151
+'s 10
+brothers 1001
+delivered 2930
+models 1901
+surprise 2608
+felt 1918
+utilities 1717
+'d 1131
+invested 2295
+authorities 1892
+'m 827
+aware 2662
+weekend 1854
+died 2228
+jones 459
+reorganization 1411
+longer 921
+glass 2647
+assume 2312
+italy 1977
+connecticut 2959
+together 1190
+liquidity 1841
+premiums 2940
+time 104
+push 1812
+serious 1175
+profits 740
+concept 2786
+managed 1855
+chain 1273
+global 1551
+alternatives 2461
+focus 1247
+manager 611
+battle 1118
+creative 2919
+s.a. 2146
+certainly 1532
+everything 1498
+father 2393
+environment 1692
+charge 527
+asking 2026
+e. 1514
+marks 754
+suffered 2187
+circumstances 2290
+division 540
+supported 2316
+mixte 1733
+keeping 1929
+choice 2164
+liability 1756
+drexel 875
+lynch 834
+10-year 2216
+join 2149
+trouble 1513
+corp. 66
+governments 2352
+level 462
+did 144
+turns 2434
+proposals 1737
+democrat 2862
+standards 1405
+leave 1235
+settle 1927
+team 769
+quick 2414
+speculation 1311
+round 2863
+lloyd 1509
+prevent 1371
+says 45
+trend 1458
+gasoline 2439
+telerate 2073
+sign 1215
+mich. 2823
+cost 363
+aggressive 2210
+adds 872
+appear 1299
+hewlett-packard 2082
+assistance 1965
+shares 71
+current 265
+goes 1321
+international 198
+falling 1752
+principal 1510
+boost 962
+filled 2443
+paribas 1557
+transportation 847
+genes 2262
+french 897
+agreement 332
+water 1092
+baseball 1804
+groups 723
+address 2834
+alone 1325
+along 703
+earthquake 427
+change 429
+wait 2555
+canadian 741
+institute 760
+shift 1738
+guilty 1623
+trial 1162
+usually 1026
+corp 587
+bob 2612
+navigation 1667
+retired 2598
+defensive 2769
+extra 2092
+lending 1710
+mobil 2920
+crisis 2071
+market 48
+everybody 2613
+indicated 1296
+working 670
+prove 2000
+positive 1778
+psyllium 2721
+visit 2275
+third-quarter 349
+france 1003
+live 1636
+opposed 1779
+stearns 2281
+memory 2513
+francs 1012
+australian 1773
+household 2249
+today 326
+club 2058
+apparent 2770
+fuel 2225
+cautious 2577
+downturn 2549
+cases 685
+effort 677
+behalf 2771
+fly 2683
+organizations 2599
+valued 1216
+ibm 690
+tokyo 606
+car 602
+abortion 886
+believes 1095
+districts 2884
+ms. 487
+values 1673
+can 90
+growing 776
+making 424
+interstate 1936
+newspapers 2376
+claim 1535
+citizens 2371
+figure 1187
+predict 2499
+december 924
+chip 2093
+agent 2276
+1980s 2123
+heard 2378
+dropped 557
+council 1334
+allowed 1141
+requirements 1562
+winter 2650
+secured 2965
+bonds 129
+chemical 720
+beat 2816
+sunday 2500
+s. 2050
+fourth 915
+ensure 2283
+subsidiaries 2569
+economy 354
+product 521
+huge 828
+may 94
+southern 1277
+applications 2529
+membership 2140
+produce 979
+mae 1596
+designed 1098
+date 1326
+such 89
+data 451
+grow 1556
+man 898
+natural 1007
+johnson 1244
+maybe 1688
+futures 288
+borrowing 2864
+gap 2518
+so 106
+deposit 1856
+increase 244
+pulled 2473
+talk 1197
+typical 2600
+exclusive 2755
+no. 1656
+acts 2869
+seeing 2865
+sell-off 2772
+indeed 1010
+mainly 1871
+consulting 1611
+years 73
+ended 328
+experiments 2835
+cuts 1287
+argued 2133
+statements 2585
+cold 2263
+still 148
+stock-index 974
+group 97
+monitor 2941
+procedures 2734
+presence 2238
+troubles 2474
+forms 2726
+offers 1338
+policy 374
+mail 1617
+main 1112
+decades 2985
+texas 484
+happened 2152
+finance 573
+views 2141
+introduce 2539
+nation 507
+records 2023
+half 389
+not 64
+now 100
+provision 1207
+discuss 2124
+nor 1412
+term 1080
+attorneys 1857
+name 819
+january 1135
+drop 471
+rock 2866
+quarter 110
+el 2727
+square 2083
+significantly 1775
+latin 2827
+revised 1582
+s&p 835
+begun 2360
+year 41
+happen 2048
+worried 2084
+tried 1590
+canada 730
+living 2033
+shown 2028
+inventories 2418
+opened 1239
+space 971
+profit 178
+factory 2042
+looking 735
+investigation 1339
+indicating 2991
+shows 1133
+exactly 2675
+earlier 134
+theory 2570
+cars 701
+million 22
+incentives 2799
+possibility 2251
+quite 1583
+california 296
+besides 2709
+obligation 2264
+marine 2787
+card 2303
+care 858
+advance 1639
+training 1993
+language 2142
+ministry 1245
+discussing 2885
+wrong 1780
+british 304
+thing 899
+place 666
+massive 2441
+promotion 2644
+think 316
+first 75
+merrill 814
+revenue 187
+one 55
+opec 2552
+americans 1018
+one-time 2049
+directly 1085
+vote 839
+corporations 1640
+message 2601
+fight 1447
+open 671
+george 1031
+size 1070
+city 286
+given 732
+sheet 2998
+district 806
+caught 2436
+trillion 1744
+plastic 2571
+anyone 1453
+indicate 2165
+returns 984
+white 445
+friend 2788
+gives 1499
+hud 2043
+acquisitions 1491
+mining 2037
+mostly 1425
+that 11
+pittsburgh 2343
+season 1580
+moscow 1495
+alan 1745
+released 1625
+specialists 2166
+surged 1730
+than 56
+population 2507
+wide 1834
+television 758
+effective 1140
+rival 2117
+require 1214
+spokeswoman 1046
+officials 161
+venture 816
+were 47
+published 2398
+and 9
+mountain 2355
+san 305
+investors 117
+remained 1849
+turned 951
+argument 2774
+say 118
+plunge 1232
+allen 2836
+sells 1626
+saw 1463
+any 107
+accounted 2722
+offering 357
+regular 2035
+efficient 2743
+offer 209
+aside 2530
+note 952
+equipment 550
+mr. 24
+potential 649
+take 210
+performance 766
+wonder 2744
+registered 2783
+channel 2556
+begin 852
+sure 1138
+normal 1796
+track 2274
+price 116
+enter 2882
+paid 490
+icahn 2736
+nomura 2997
+america 541
+pages 1957
+honecker 2531
+manville 1637
+operate 1890
+especially 860
+surprising 2492
+payable 2694
+considered 994
+average 197
+later 467
+steady 2472
+sale 227
+federal 101
+professional 1757
+senior 446
+mass. 1701
+typically 1449
+filing 963
+laws 1489
+shop 2682
+rating 1384
+shot 2684
+surplus 2775
+show 465
+german 770
+delta 2806
+allegations 2508
+commitments 2018
+discovered 2059
+rep. 824
+soviets 1396
+fifth 2728
+ground 1840
+slow 1188
+ratio 2304
+gulf 1482
+title 2126
+daily 859
+enough 543
+crime 1563
+only 87
+going 325
+black 755
+treasury 282
+thompson 2528
+watching 2974
+congressional 912
+dispute 1702
+get 188
+contracts 646
+assistant 1634
+employers 2445
+nearly 441
+secondary 2395
+prime 953
+regarding 2630
+yield 236
+morning 1211
+miles 1357
+predicted 1693
+scott 2906
+where 252
+husband 2666
+salomon 1278
+declared 2009
+corry 2710
+committed 2266
+seat 2685
+elected 1240
+j. 891
+college 1500
+stanley 1348
+concern 289
+mortgage 583
+farmers 1202
+ways 1139
+jumped 1008
+review 1282
+representatives 1858
+forecast 1474
+weapons 2886
+outside 849
+bureau 1464
+between 179
+import 2179
+reading 2557
+across 1177
+jobs 1142
+august 398
+parent 731
+blame 2667
+article 1055
+cities 1806
+come 435
+reaction 2112
+acquiring 2240
+many 98
+trader 1281
+trades 1246
+according 215
+contract 317
+prompted 2486
+buy-back 2668
+senator 2967
+holders 675
+traded 926
+comes 1097
+among 212
+cancer 1019
+color 2652
+roman 2639
+period 341
+insist 2631
+confirmed 1964
+learning 2789
+moreover 1208
+poll 2824
+two-year 2324
+considering 1434
+save 1973
+unusual 1739
+west 380
+airlines 699
+mark 1061
+hutton 1340
+combined 1545
+hardly 2825
+mary 2942
+disclosed 871
+wants 821
+direction 1984
+shopping 1974
+offered 494
+formed 2252
+observers 2437
+wake 2010
+minister 1047
+former 338
+those 151
+pilot 2509
+case 308
+developing 1589
+these 145
+consultant 1373
+cash 268
+n't 33
+warning 2366
+policies 1312
+newspaper 1148
+situation 942
+shops 2632
+margin 1797
+region 1648
+eventually 1415
+metric 2265
+health-care 2218
+engaged 2837
+telephone 733
+quiet 2757
+middle 2686
+someone 1362
+attributed 1400
+technology 503
+worry 2001
+par 1164
+develop 1285
+pay 256
+same 313
+dealer 2586
+speech 2396
+grain 2200
+insurers 1668
+events 1483
+week 124
+buy-outs 2926
+oil 269
+singapore 2085
+boosted 1813
+drives 2072
+producers 855
+running 1122
+harris 2711
+intended 1951
+changing 2510
+anticipated 2344
+complained 2540
+costa 2211
+theater 2669
+largely 892
+charges 593
+no 103
+constitutional 2305
+roughly 1597
+mortgages 1537
+severe 2541
+without 350
+relief 1746
+model 1807
+researchers 1248
+charged 1657
+summer 976
+asset-backed 2687
+being 214
+money 162
+rest 1115
+kill 2633
+speed 1787
+weekly 2052
+announcement 1040
+death 1237
+rose 120
+seems 900
+except 1627
+improvement 1467
+westinghouse 2968
+setting 2150
+bloc 2634
+treatment 1355
+plenty 2962
+tuesday 474
+ross 2196
+scheduled 832
+negotiating 2017
+around 385
+read 1331
+papers 2695
+virginia 2698
+early 267
+inflation 620
+traffic 1703
+using 665
+accepted 1747
+ruled 2379
+intel 2167
+nissan 1271
+rivals 2616
+'ve 578
+annually 1758
+chamber 2615
+benefit 1279
+either 707
+retailers 2314
+fully 1145
+output 1450
+tower 2416
+reduced 901
+nikkei 2635
+competition 874
+loyalty 2970
+bigger 1869
+thinks 2086
+provided 1294
+earth 2821
+operators 2638
+recorded 2489
+legal 656
+conservative 1280
+critical 2101
+deficit 822
+provides 1304
+newport 2943
+moderate 2446
+football 2359
+assembly 2479
+scientific 2118
+power 339
+airways 2807
+equivalent 2428
+broker 1862
+broken 2808
+leadership 1902
+aide 2738
+manufacturer 2188
+on 17
+central 877
+package 1184
+of 5
+industry 154
+thousands 1567
+fell 204
+airline 767
+sachs 1679
+act 757
+mixed 1764
+mean 1465
+or 37
+confidence 1818
+tape 2658
+barrels 1850
+outlook 2219
+coupon 1661
+instruments 1867
+image 1341
+accounts 907
+determine 2636
+parties 1546
+operator 2127
+your 483
+pharmaceutical 1945
+fast 2512
+her 200
+area 449
+there 84
+alleged 1601
+start 903
+appears 1314
+low 615
+lot 580
+valley 1272
+billion 49
+complete 1490
+saatchi 2087
+delayed 2782
+sophisticated 2501
+brain 2975
+succeeded 2913
+two-thirds 2603
+technologies 2212
+trying 547
+with 23
+buying 361
+faster 2362
+volume 396
+october 522
+circulation 2923
+sears 1719
+default 2380
+wholesale 2699
+agree 1842
+strongly 2759
+gone 1843
+vehicles 1576
+ad 695
+ag 1674
+certain 426
+totaling 2637
+moved 1266
+sales 82
+deep 2945
+an 32
+cbs 1023
+britain 774
+at 19
+file 2012
+aids 2137
+politics 2113
+moves 1103
+film 1475
+fill 2946
+again 556
+consensus 2713
+personnel 2103
+storage 2490
+event 1938
+field 1105
+you 111
+poor 888
+a$ 2589
+congress 287
+separate 1127
+students 1358
+a. 943
+n.j. 1848
+important 590
+massachusetts 2537
+coverage 2002
+planners 2653
+brands 1335
+stocks 150
+building 482
+assets 297
+calls 724
+wife 1985
+invest 2114
+having 716
+directors 751
+mass 2654
+overseas 1060
+starting 1801
+original 1525
+represent 2013
+all 74
+sci 1781
+consider 930
+chinese 1564
+caused 989
+lack 1439
+dollar 333
+month 169
+mccaw 1947
+talks 711
+follow 2019
+settlement 917
+decisions 1868
+children 1028
+causes 2904
+reluctant 2947
+tv 659
+thursday 939
+shall 2832
+to 6
+program 157
+spain 2480
+health 468
+lawmakers 1388
+activities 1230
+calif. 850
+premium 1257
+returned 2053
+divisions 2954
+very 253
+resistance 2285
+worst 2325
+decide 1882
+fall 619
+sony 1155
+difference 1574
+condition 2119
+cable 1264
+louis 2128
+list 1407
+joined 2024
+large 381
+circuit 2199
+small 300
+webster 2984
+past 225
+rate 159
+arizona 1599
+design 1521
+lawyer 1359
+pass 2168
+nbc 1808
+further 393
+investment 146
+what 115
+abc 2189
+richard 867
+investing 2181
+sun 1581
+section 1236
+resume 2497
+brief 2700
+ 1
+noriega 1006
+version 1349
+scientists 1466
+certificates 1822
+learned 2604
+public 275
+contrast 1928
+movement 2038
+turmoil 2730
+full 632
+editorial 2488
+answers 2854
+hours 887
+citicorp 1995
+operating 299
+excess 2094
+november 1445
+strong 404
+thrift 1077
+publisher 2194
+prosecutors 1598
+ahead 970
+extraordinary 2147
+losses 392
+experience 1526
+prior 1874
+amount 542
+advertising 627
+social 1342
+action 588
+narrow 2688
+options 657
+via 645
+followed 1159
+family 617
+requiring 2855
+africa 1946
+thatcher 1533
+put 336
+aimed 1516
+establish 2559
+donaldson 2809
+shareholders 591
+eye 2190
+takes 1254
+petroleum 1422
+two 76
+generate 2760
+taken 640
+markets 191
+minor 1896
+more 46
+flat 1231
+israel 2241
+door 2054
+knows 2326
+fast-food 2543
+jr. 1209
+company 38
+broke 2856
+particular 1476
+known 709
+producing 1884
+town 1741
+jim 2773
+none 1919
+lilly 2874
+hour 1477
+science 2905
+des 2746
+remain 706
+sudden 2558
+nine 511
+sent 998
+morgan 905
+strategies 2230
+history 964
+purchases 1336
+processing 2306
+brown 1448
+pont 2888
+share 63
+accept 1527
+states 599
+pushed 2221
+minimum 1079
+numbers 1618
+purchased 1662
+sense 1116
+sharp 1123
+f. 2095
+information 532
+needs 1198
+answer 2440
+court 213
+advantage 1924
+rather 644
+hugo 1065
+conducted 2810
+earnings 137
+portfolios 2548
+plant 402
+plans 232
+advice 2747
+different 772
+reflect 1897
+fe 2003
+coming 762
+response 1124
+a 7
+short 694
+brady 2400
+departure 2889
+coal 2354
+broadcasting 1528
+responsibility 2068
+media 1034
+banks 248
+egg 2602
+playing 2447
+turnover 2701
+played 1839
+help 334
+september 312
+developed 1260
+soon 641
+trade 220
+held 417
+paper 399
+through 149
+committee 373
+signs 1694
+suffer 2948
+its 28
+developer 2969
+style 2074
+rapidly 2214
+actually 1071
+late 292
+systems 419
+conn. 2051
+stephen 1898
+inquiry 2999
+might 280
+tentatively 2907
+good 264
+return 504
+seeking 817
+food 622
+reflected 1886
+association 447
+easily 2145
+holiday 2763
+always 851
+stopped 1885
+eager 2990
+found 552
+heavy 771
+sterling 1641
+everyone 1385
+england 1492
+generation 2198
+house 165
+energy 773
+hard 712
+reduce 713
+idea 1165
+police 1399
+extended 2313
+expect 524
+advertisers 1704
+operation 1143
+beyond 1300
+insurance 240
+really 840
+deals 1306
+funding 1173
+carriers 2339
+blacks 1875
+robert 513
+since 156
+douglas 2381
+research 318
+participants 2315
+safety 1267
+hill 2075
+fujitsu 2707
+issue 186
+highway 1844
+reporting 2192
+risen 2908
+lawrence 2401
+friday 283
+houses 1332
+reason 878
+base 980
+members 450
+backed 1319
+beginning 937
+guy 2481
+director 276
+owners 1275
+benefits 991
+launch 2222
+just 152
+computers 565
+excluding 2317
+american 141
+threat 1705
+pilots 1045
+fallen 2154
+lawsuits 2160
+copper 1478
+major 138
+slipped 1903
+feel 1113
+number 295
+feet 1350
+done 927
+fees 792
+miss 2925
+causing 2470
+stage 2258
+story 1363
+heads 2532
+leading 815
+st. 1612
+kidder 1086
+least 298
+station 1887
+expand 1708
+statement 682
+dealing 2554
+compromise 1975
+store 1365
+listed 1605
+selling 400
+passed 1364
+relationship 1904
+behind 1120
+hotel 1727
+park 1518
+immediate 1930
+blue-chip 2729
+profitability 2566
+part 202
+favorable 2870
+believe 624
+hollywood 2007
+king 2242
+kind 948
+grew 1572
+rebound 2900
+double 1870
+pennsylvania 2361
+determined 1959
+risks 1484
+elaborate 2520
+messrs. 2402
+toward 811
+aug. 2039
+outstanding 548
+imports 949
+substantial 1315
+orders 456
+option 1389
+sell 183
+ratings 1573
+built 1099
+trip 2887
+gorbachev 1166
+officers 2670
+targets 2611
+majority 908
+internal 1327
+chairman 143
+finding 1899
+frequently 2387
+play 1117
+added 285
+electric 940
+goldman 1602
+eggs 2811
+measures 1322
+reach 1366
+freddie 2605
+most 91
+hired 2493
+shareholder 1032
+plan 176
+significant 893
+services 324
+extremely 1976
+approved 680
+soared 1910
+compaq 2690
+dealers 669
+clear 726
+sometimes 1276
+cover 1506
+rockefeller 2731
+traditional 1522
+three-month 2578
+clean 2790
+usual 1771
+institutions 826
+painewebber 1628
+sector 1075
+thomas 1269
+particularly 783
+gold 660
+commissions 2277
+nasdaq 1343
+session 1082
+businesses 434
+jury 1372
+fine 1829
+find 678
+impact 883
+gen. 2812
+giant 1100
+regulations 2060
+nevertheless 2955
+northern 1431
+justice 1011
+heavily 1440
+distributed 2871
+failed 778
+flights 2544
+pretty 1760
+equity 621
+giants 2169
+begins 2679
+his 50
+hit 777
+gains 554
+meanwhile 672
+express 1020
+financing 605
+collection 2878
+b 2327
+actions 1622
+closely 1102
+reporters 2170
+during 199
+him 302
+merchandise 2450
+appeal 1682
+doubled 2813
+six-month 2014
+banking 569
+common 247
+activity 807
+switzerland 2096
+coors 2909
+river 1996
+wrote 1565
+restaurants 2748
+set 386
+art 1459
+achieved 2761
+declines 1256
+sex 2986
+culture 2345
+see 379
+defense 536
+sec 1132
+are 27
+sea 1876
+tender 1765
+close 321
+arm 2890
+declined 413
+filings 2932
+# 687
+spirits 2776
+movie 1603
+century 1742
+currently 488
+won 1072
+various 1186
+probably 681
+conditions 1087
+supposed 2449
+available 679
+korea 1592
+recently 376
+creating 2363
+initially 2115
+dividends 1435
+sold 239
+attention 1420
+aircraft 1496
+succeed 2284
+coffee 2143
+opposition 1305
+franchise 2575
+dividend 704
+both 180
+prospects 2171
+last 70
+appropriations 1367
+annual 367
+foreign 245
+sensitive 2732
+connection 2591
+became 985
+long-term 688
+let 1025
+whole 1180
+baltimore 2749
+point 375
+reasons 1748
+loan 501
+community 922
+simply 946
+church 1960
+throughout 1766
+expensive 1619
+decline 461
+described 2182
+raise 630
+monthly 1504
+create 1288
+political 390
+due 260
+strategy 750
+convicted 2830
+whom 1713
+reduction 1501
+maintenance 2545
+meeting 476
+walter 2438
+firm 192
+partly 1110
+fire 1782
+gas 538
+convert 2794
+N 4
+fund 293
+whatever 2671
+lives 2129
+brokers 960
+bidding 1494
+demand 437
+prices 113
+plants 865
+georgia 2076
+look 714
+solid 2950
+judicial 2987
+bill 261
+budget 570
+governor 2672
+technical 1586
+while 121
+mainframe 2927
+ought 2546
+fleet 2928
+mitchell 2346
+guide 2792
+engineers 2762
+real 309
+pound 1066
+costly 2183
+voters 1683
+cents 108
+motors 1328
+stations 1740
+disappointing 2462
+itself 683
+ready 1788
+fannie 1967
+coca-cola 2910
+chase 2088
+underwriters 1718
+suggests 2045
+rules 906
+virtually 1753
+widely 1283
+grand 1426
+survey 1108
+dozen 1671
+higher 207
+development 444
+used 263
+lawyers 691
+d.c. 2988
+affairs 1699
+comprehensive 2655
+yesterday 123
+moment 1859
+levels 788
+moving 1408
+purpose 2617
+tobacco 2477
+recent 182
+lower 231
+task 2015
+older 1908
+studies 1956
+poland 1221
+spent 1149
+person 1442
+machinery 2511
+ltd. 555
+swiss 1416
+organization 1178
+spend 1270
+coup 2226
+one-year 2560
+junk-bond 1767
+networks 2464
+u.k. 1168
+competitive 1650
+quarters 2311
+questions 1093
+world 219
+alternative 1978
+wage 1158
+cut 378
+helping 2116
+$ 13
+also 60
+advisers 2044
+workers 432
+deputy 1809
+guaranteed 2268
+attractive 2467
+source 1076
+stock-market 2382
+parents 1877
+location 2777
+violations 2576
+guarantees 2004
+administrative 2233
+remaining 1428
+surprised 2478
+build 848
+customers 526
+australia 1249
+march 618
+emergency 1171
+demands 2648
+big 130
+bid 258
+matters 2104
+game 1088
+aerospace 1931
+bit 1893
+projects 868
+moody 995
+breeden 2364
+success 1395
+follows 1860
+signal 2383
+toyota 2929
+separately 1261
+communications 779
+arthur 2891
+individuals 1324
+yields 923
+popular 1429
+healthy 1805
+privately 2297
+often 518
+senate 463
+spring 1205
+b. 1814
+some 58
+back 193
+trends 2673
+economic 234
+pricing 1861
+apply 2465
+nicaragua 2503
+facing 2397
+scale 2750
+decision 531
+transactions 1083
+audience 2144
+per 1038
+eliminate 2561
+be 26
+run 612
+lose 1517
+continuing 1021
+fed 566
+refused 2077
+step 1210
+santa 1250
+served 2066
+at&t 1789
+by 18
+pipeline 2365
+goods 804
+anything 997
+truck 1792
+mrs. 662
+range 882
+ounce 1921
+duties 2917
+block 1035
+pollution 2951
+repair 2839
+steinhardt 2692
+into 92
+within 530
+retailer 2751
+nothing 1033
+primarily 1548
+sports 1259
+pentagon 1472
+bankruptcy 1029
+statistics 1939
+spending 509
+question 801
+long 352
+ordered 1823
+amr 2989
+suit 633
+himself 1056
+elsewhere 1731
+collapsed 2347
+vehicle 2061
+specialty 2269
+hoped 2872
+atlantic 2254
+pacific 689
+filed 528
+hopes 1101
+subsidiary 663
+line 464
+considerable 2714
+raising 1421
+posted 634
+up 53
+us 505
+maturity 1529
+'re 278
+exploration 2105
+viacom 2562
+similar 710
+called 342
+bell 1310
+associated 1669
+metal 2485
+influence 1905
+metals 1790
+engineering 1293
+associates 1121
+rally 975
+amounts 1356
+peace 2892
+fears 1711
+teams 2921
+yeargin 2494
+afford 2902
+politicians 2131
+reputation 2693
+income 185
+department 235
+manhattan 1430
+users 2367
+gross 2011
+problems 356
+prepared 1575
+william 782
+allowing 2579
+formal 2592
+sides 1783
+structure 1761
+ago 196
+urged 2223
+land 1217
+vice 233
+age 1067
+required 972
+bankers 1022
+responded 2873
+far 291
+fresh 2593
+requires 2384
+leveraged 1146
+once 500
+code 2372
+issued 902
+results 343
+existing 1251
+oct. 314
+ge 1889
+broader 2778
+go 387
+gm 844
+contributions 2463
+centers 1585
+issues 251
+seemed 1815
+concerned 1493
+young 841
+send 2779
+suits 2399
+citing 1961
+stable 2745
+quarterly 1090
+include 537
+friendly 2385
+resources 1059
+garden 2255
+automotive 1620
+continues 1050
+wave 2857
+putting 1663
+cellular 2448
+telling 2931
+continued 609
+entire 1436
+eased 2894
+sen. 969
+real-estate 1446
+positions 1380
+notes 377
+michael 798
+fewer 1732
+try 876
+race 2475
+noted 667
+guber 1183
+concluded 2244
+smaller 1048
+cds 1774
+crop 2296
+jump 2132
+video 2533
+expense 2458
+makers 768
+index 217
+edison 2814
+business 86
+chicago 485
+giving 1454
+expressed 2173
+practices 1798
+access 1455
+paying 1125
+waiting 1630
+indian 2840
+volatile 2030
+five-year 2651
+capital 181
+firms 366
+exercise 1743
+body 2875
+led 661
+lee 1816
+exchange 112
+pushing 2514
+commercial 478
+jointly 2040
+following 572
+northeast 2553
+them 128
+others 495
+great 592
+credits 3001
+receive 934
+involved 831
+larger 1307
+leaving 1835
+engine 2758
+merger 1037
+products 190
+opinion 1817
+residents 1754
+gene 1614
+makes 559
+maker 340
+fourth-quarter 2016
+named 523
+writer 2971
+apple 1538
+heart 1655
+win 1255
+manage 2933
+private 551
+fraud 1552
+names 1863
+motor 996
+scandal 1948
+standing 2715
+use 270
+from 21
+p&g 2217
+consumption 2618
+& 83
+remains 780
+illegal 2121
+cray 1553
+next 131
+few 262
+doubt 1878
+year-ago 1519
+themselves 873
+consecutive 2243
+reflects 1712
+usx 1502
+sort 1566
+parliament 2580
+started 889
+becomes 2934
+factor 2245
+benchmark 1824
+occurred 2841
+carrying 2590
+sharply 1051
+allianz 2867
+mitsubishi 1615
+appointed 2495
+women 941
+customer 1353
+account 636
+us$ 1539
+effectively 2876
+this 40
+challenge 1879
+clients 764
+recession 965
+thin 2429
+island 2868
+meet 896
+closing 1039
+n.y. 2208
+control 351
+beijing 2062
+slid 2780
+weaker 2716
+engelken 2703
+process 796
+a.m. 2859
+daiwa 2733
+tax 218
+purposes 2842
+high 241
+professor 2022
+reserves 825
+something 785
+sought 1432
+stories 2466
+voice 1981
+rape 2972
+sir 1836
+educational 2534
+united 562
+usair 2029
+democracy 2404
+recalls 2572
+six 364
+hampshire 2279
+arrangement 2795
+traders 281
+forest 2877
+instead 597
+stock 61
+buildings 1642
+farm 2097
+watch 2237
+tied 2256
+ties 2215
+boeing 1962
+light 1063
+lines 805
+commodities 2373
+chief 153
+road 2102
+allow 894
+executives 493
+martin 2417
+houston 1933
+holds 1199
+hanover 2547
+producer 1587
+institutional 1258
+move 330
+produced 1109
+alliance 2056
+including 211
+looks 2155
+quake 1064
+year-earlier 797
+industries 553
+delay 1932
+la 2041
+labor 700
+whites 2952
+willing 1289
+orange 2515
+covered 2156
+criminal 1479
+spot 2209
+pending 1503
+crash 1041
+greater 1225
+auto 616
+practice 1360
+earn 2937
+cutting 1949
+h. 2184
+hands 1316
+front 1911
+bar 2740
+republican 1409
+investor 506
+day 273
+capital-gains 1386
+successor 2704
+february 1799
+l. 1577
+warned 2280
+university 594
+covering 2724
+identified 2504
+morris 1979
+rising 1126
+bills 614
+warner 576
+doing 842
+strip 2944
+related 866
+society 1423
+books 1800
+measure 853
+our 230
+margins 1242
+agriculture 2717
+special 655
+out 85
+merc 2849
+' 135
+entertainment 1016
+defend 2815
+critics 1825
+electronics 1301
+cause 1213
+integrated 2286
+red 1013
+thrifts 1837
+disclose 2021
+shut 2912
+frank 1643
+ban 2065
+regulators 1084
+york 93
+regulatory 1374
+indicates 2596
+philip 1696
+navy 2034
+hostile 1791
+could 80
+florida 1864
+mac 2496
+keep 586
+ltd 1480
+davis 2781
+retain 2452
+retail 909
+south 436
+respond 2833
+plastics 2953
+succeeds 2031
+powerful 1616
+owned 1015
+strategic 1768
+owner 1507
+reached 698
+awarded 2409
+quality 1192
+nyse 2335
+legislative 2657
+management 266
+stands 2702
+los 639
+system 274
+relations 1560
+recapitalization 2880
+priority 2581
+their 52
+attack 2348
+intelligence 1649
+final 1111
+interests 775
+enforcement 2535
+shell 2973
+completed 790
+acquire 869
+environmental 1091
+chemicals 1591
+reflecting 1308
+branches 2415
+july 567
+institution 2706
+steel 600
+colleagues 2174
+hearings 2287
+commodity 1406
+patients 1952
+individual 736
+providing 2064
+creditors 1004
+projections 2996
+unchanged 1073
+partnership 1218
+lin 1613
+unlikely 2356
+have 35
+need 479
+apparently 1081
+clearly 1508
+rjr 2220
+documents 1675
+dallas 1830
+agency 303
+able 584
+purchasing 1941
+instance 1222
+concerns 830
+which 42
+campeau 1680
+coke 2956
+unless 1150
+who 57
+eight 959
+preliminary 1980
+device 2843
+segment 1569
+payment 1212
+so-called 1354
+request 1681
+face 664
+looked 2828
+proceedings 2195
+lowered 2176
+pictures 2267
+normally 2191
+fact 668
+goals 2764
+agreed 346
+charles 1379
+bring 1144
+planning 1351
+democrats 1156
+portfolio 684
+fear 1262
+economist 1189
+debate 1561
+decade 1219
+staff 861
+litigation 1485
+partners 918
+based 306
+earned 1017
+controls 1558
+should 205
+unable 2260
+candidates 1925
+employee 1624
+communist 1769
+local 579
+hope 1160
+meant 2845
+dinkins 1185
+handle 2435
+means 863
+fellow 2505
+familiar 1253
+overall 1268
+bear 1644
+reinsurance 2656
+joint 799
+ones 1460
+words 1749
+exchanges 1469
+buyer 2005
+kong 845
+chips 1600
+areas 880
+trucks 1684
+course 904
+numerous 2574
+taxes 954
+calling 2405
+she 164
+ohio 1570
+fixed 1424
+conduct 2819
+view 808
+europe 546
+temporarily 2122
+downward 3000
+acquired 747
+national 163
+accord 1770
+operates 1912
+edition 2846
+computer 226
+subcommittee 2430
+closer 2516
+nationwide 2594
+reform 1471
+nuclear 1676
+tend 2134
+favor 1697
+state 140
+closed 250
+crude 1900
+progress 2453
+neither 1397
+bought 686
+comparable 2006
+brewing 2425
+ability 1157
+opening 1461
+deliver 2234
+agencies 1169
+job 793
+takeover 423
+key 864
+approval 727
+precious 2858
+lawsuit 2536
+distribution 1793
+declining 1631
+david 693
+restrictions 1530
+limits 2307
+career 2349
+goal 1966
+taking 836
+equal 1950
+drug 510
+pulp 2563
+april 856
+figures 653
+jersey 1762
+otherwise 2406
+comment 472
+adjusted 1706
+english 2765
+co 571
+lang 2333
+agents 2172
+wall 331
+ca 563
+cd 2055
+packaging 2976
+qintex 1243
+table 2573
+oakland 1937
+industrials 2607
+addition 489
+genetic 2336
+permanent 2893
+agreements 2078
+proposal 491
+waste 2847
+faced 2412
+controlled 1763
+c. 1707
+league 2308
+am 1635
+sufficient 2977
+otc 1413
+essentially 2620
+c$ 1511
+bulk 2752
+finished 1831
+graphics 2257
+improved 1309
+atlanta 2431
+general 189
+present 1838
+homes 1515
+troubled 1543
+abandoned 2896
+unlike 1728
+sotheby 2848
+restaurant 2278
+harder 2550
+as 20
+value 310
+will 34
+owns 833
+wild 2978
+uncertainty 1686
+almost 512
+blood 2502
+thus 1114
+site 2491
+helped 745
+claimed 2606
+partner 1062
+shearson 938
+halt 2665
+tumbled 2407
+perhaps 986
+began 544
+administration 345
+cross 2079
+member 1094
+retailing 2419
+parts 935
+largest 414
+units 603
+party 728
+gets 1523
+difficult 1024
+material 1695
+columbia 809
+nekoosa 1916
+upon 2025
+effect 692
+forecasts 2299
+student 2246
+rumors 1914
+kkr 2614
+single 1136
+transaction 564
+off 170
+center 721
+i 69
+approve 2426
+well 201
+fighting 2641
+thought 947
+banker 2377
+sets 2235
+position 533
+soviet 384
+inc. 81
+latest 466
+stores 643
+less 246
+increasingly 1554
+executive 155
+domestic 708
+obtain 2157
+sources 1534
+underlying 2107
+rooms 2374
+seats 1832
+paul 1549
+rapid 2642
+ads 1417
+supply 933
+smith 1129
+deposits 1734
+realize 2298
+simple 2231
+add 1226
+other 62
+subordinated 1544
+match 2408
+boom 2582
+tests 1645
+increased 369
+provisions 1292
+government 99
+chancellor 2300
+increases 696
+five 259
+know 481
+press 916
+immediately 1451
+loss 254
+lincoln 1735
+necessary 1381
+like 167
+lost 635
+miami 2829
+taxpayers 2201
+lawson 1390
+payments 729
+james 560
+become 428
+works 1320
+soft 2674
+amendment 1846
+exceed 2643
+because 78
+arbitrage 981
+authority 1286
+growth 242
+export 1462
+cleveland 2796
+home 322
+peter 1163
+employment 1872
+line-item 2350
+lead 786
+broad 1593
+avoid 1265
+hurricane 1151
+slide 2270
+does 177
+york-based 2645
+chains 2621
+leader 862
+schedule 2719
+journal 919
+monetary 1452
+expansion 1594
+beach 2394
+pressure 843
+expire 2817
+although 368
+offset 1193
+includes 800
+loans 362
+vs. 2660
+panel 1559
+gained 725
+about 44
+actual 1174
+carried 2420
+debentures 1295
+freedom 2676
+shipping 2158
+surge 2080
+angeles 749
+holdings 885
+carries 2468
+carrier 1368
+introduced 1191
+software 993
+own 195
+letters 2175
+previously 719
+warrants 2046
+washington 514
+commitment 2337
+billions 2622
+getting 637
+malcolm 2922
+included 651
+guard 2881
+promise 2047
+managing 1274
+banco 2338
+utility 1913
+accused 1794
+additional 581
+krenz 1729
+transfer 2506
+housing 999
+secret 2949
+peters 1206
+continental 1776
+biggest 742
+pretax 1176
+fiscal 416
+buy 184
+north 739
+stadium 2899
+triggered 2368
+insurer 2564
+funds 194
+brand 1089
+akzo 2957
+but 30
+delivery 1000
+insured 2108
+construction 608
+gain 430
+courts 1942
+highest 1595
+ltv 2705
+he 29
+made 160
+places 2735
+whether 370
+cells 2388
+official 455
+signed 1194
+record 440
+below 631
+limit 1057
+ruling 1233
+problem 433
+piece 2185
+minutes 1497
+supreme 1152
+deaths 2818
+wcrs 2980
+slowing 1607
+flight 2020
+education 1382
+proceeds 1337
+worse 2197
+inc 443
+aetna 2720
+mutual 1369
+compared 371
+'ll 928
+variety 2784
+corporation 2271
+illinois 2646
+book 1291
+compares 2914
+details 1512
+branch 2202
+compete 2850
+gonzalez 2797
+junk 473
+francisco 420
+star 1826
+monday 383
+class 1361
+june 625
+ultimately 2153
+contends 2328
+stay 1333
+chance 1375
+bellsouth 2697
+priced 425
+friends 2421
+exposure 2301
+resolution 2067
+baker 1290
+factors 1438
+rule 1161
+ortega 2677
+portion 1685
+write 2342
+status 2334
+pension 1181
+understand 2232
+frankfurt 1915
diff --git a/language_model/img/lstm.png b/language_model/images/lstm.png
similarity index 100%
rename from language_model/img/lstm.png
rename to language_model/images/lstm.png
diff --git a/language_model/img/ngram.png b/language_model/images/ngram.png
similarity index 100%
rename from language_model/img/ngram.png
rename to language_model/images/ngram.png
diff --git a/language_model/img/ps.png b/language_model/images/ps.png
similarity index 100%
rename from language_model/img/ps.png
rename to language_model/images/ps.png
diff --git a/language_model/img/ps2.png b/language_model/images/ps2.png
similarity index 100%
rename from language_model/img/ps2.png
rename to language_model/images/ps2.png
diff --git a/language_model/img/rnn.png b/language_model/images/rnn.png
similarity index 100%
rename from language_model/img/rnn.png
rename to language_model/images/rnn.png
diff --git a/language_model/img/rnn_str.png b/language_model/images/rnn_str.png
similarity index 100%
rename from language_model/img/rnn_str.png
rename to language_model/images/rnn_str.png
diff --git a/language_model/img/s.png b/language_model/images/s.png
similarity index 100%
rename from language_model/img/s.png
rename to language_model/images/s.png
diff --git a/language_model/lm_ngram.py b/language_model/lm_ngram.py
index 4607da3c6f02a7ae1a85f06b7dd370983092c9b6..23cca1c828576608e414d95b84dca062082410d6 100644
--- a/language_model/lm_ngram.py
+++ b/language_model/lm_ngram.py
@@ -5,6 +5,7 @@ import data_util as reader
import gzip
import numpy as np
+
def lm(vocab_size, emb_dim, hidden_size, num_layer):
"""
ngram language model definition.
@@ -135,7 +136,6 @@ def train():
if __name__ == '__main__':
-
# -- config : model --
emb_dim = 200
hidden_size = 200
@@ -145,9 +145,9 @@ if __name__ == '__main__':
model_file_name_prefix = 'lm_ngram_pass_'
# -- config : data --
- train_file = 'data/chinese.txt'
- test_file = 'data/chinese.txt'
- vocab_file = 'data/vocab_cn.txt' # the file to save vocab
+ train_file = 'data/ptb.train.txt'
+ test_file = 'data/ptb.test.txt'
+ vocab_file = 'data/vocab_ptb.txt' # the file to save vocab
vocab_max_size = 3000
min_sentence_length = 3
max_sentence_length = 60
@@ -163,7 +163,7 @@ if __name__ == '__main__':
# prepare model
word_id_dict = reader.load_vocab(vocab_file) # load word dictionary
_, output_layer = lm(len(word_id_dict), emb_dim, hidden_size, num_layer) # network config
- model_file_name = model_file_name_prefix + str(num_passs - 1) + '.tar.gz'
+ model_file_name = model_file_name_prefix + str(num_passs - 1) + '.tar.gz'
parameters = paddle.parameters.Parameters.from_tar(gzip.open(model_file_name)) # load parameters
# generate
input = [[word_id_dict.get(w, word_id_dict['']) for w in text.split()]]
@@ -176,4 +176,3 @@ if __name__ == '__main__':
predictions[-1][word_id_dict['']] = -1 # filter
next_word = id_word_dict[np.argmax(predictions[-1])]
print(next_word.encode('utf-8'))
-
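Note on the generation step in `lm_ngram.py`: the hunk above sets the score of the unknown-word entry to -1 and then takes `np.argmax` over the last prediction, so the placeholder can never be returned as the next word. A minimal sketch of that idea follows. The helper name `pick_next_word`, the toy vocabulary, and the placeholder spelling `'<unk>'` are assumptions made for illustration. The patch itself does not show which key is used.

```python
import numpy as np


def pick_next_word(probs, id_word_dict, unk_id):
    """Return the most probable word, never the unknown-word placeholder."""
    scores = np.array(probs, dtype=float)  # copy, so the caller's data is untouched
    scores[unk_id] = -1.0  # probabilities are >= 0, so -1 can never win the argmax
    return id_word_dict[int(np.argmax(scores))]


# Toy vocabulary with hypothetical ids (not taken from the vocab file).
id_word_dict = {0: '<unk>', 1: 'stock', 2: 'prices'}
next_word = pick_next_word([0.5, 0.2, 0.3], id_word_dict, unk_id=0)
print(next_word)
```

Without the mask, the placeholder would win in this toy case because it carries the largest probability. With the mask, the best real word is chosen instead.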
diff --git a/language_model/lm_rnn.py b/language_model/lm_rnn.py
index 6072d599cabd273cb48e26f2ea17f5f1d75ee707..5a9721bbca009a3f6ef572a7d44ee860689e74c5 100644
--- a/language_model/lm_rnn.py
+++ b/language_model/lm_rnn.py
@@ -6,6 +6,7 @@ import gzip
import os
import numpy as np
+
def lm(vocab_size, emb_dim, rnn_type, hidden_size, num_layer):
"""
rnn language model definition.
@@ -63,8 +64,8 @@ def train():
# prepare word dictionary
print('prepare vocab...')
- word_id_dict = reader.build_vocab(train_file, vocab_max_size) # build vocab
- reader.save_vocab(word_id_dict, vocab_file) # save vocab
+ word_id_dict = reader.build_vocab(train_file, vocab_max_size) # build vocab
+ reader.save_vocab(word_id_dict, vocab_file) # save vocab
# define data reader
train_reader = paddle.batch(
@@ -188,7 +189,7 @@ def predict():
if os.path.isfile(vocab_file):
word_id_dict = reader.load_vocab(vocab_file) # load word dictionary
else:
- word_id_dict = reader.build_vocab(train_file, vocab_max_size) # build vocab
+ word_id_dict = reader.build_vocab(train_file, vocab_max_size) # build vocab
reader.save_vocab(word_id_dict, vocab_file) # save vocab
# prepare and cache model
@@ -209,10 +210,10 @@ def predict():
print('prob: ', prob)
print('-------')
-if __name__ == '__main__':
+if __name__ == '__main__':
# -- config : model --
- rnn_type = 'gru' # or 'lstm'
+ rnn_type = 'gru' # or 'lstm'
emb_dim = 200
hidden_size = 200
num_passs = 2
@@ -232,4 +233,4 @@ if __name__ == '__main__':
train()
# -- predict --
- predict()
\ No newline at end of file
+ predict()
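Note on the vocabulary file: the lines added to the vocab file in this patch each hold a token followed by an integer id (for example `its 28` or `$ 13`). The body of `reader.load_vocab` in `data_util.py` is not part of this patch, so the following is only a sketch of how such lines could be parsed. The function name `parse_vocab_lines` is hypothetical, and the whitespace-separated format is inferred from the added lines.

```python
def parse_vocab_lines(lines):
    """Build a word -> id dict from lines of the form '<word> <id>'."""
    word_id_dict = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        word, idx = parts
        word_id_dict[word] = int(idx)
    return word_id_dict


# A few entries copied from the vocab lines in this patch, plus a blank line.
sample = ["its 28", "are 27", "N 4", "$ 13", ""]
word_id_dict = parse_vocab_lines(sample)

# Reverse mapping, as needed when turning a predicted id back into a word.
id_word_dict = {idx: word for word, idx in word_id_dict.items()}
print(word_id_dict["its"], id_word_dict[4])
```

To read a real file, pass the open file object instead of the sample list, since iterating over a file yields its lines.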