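We feed in test cases. Currently we use the analogical-reasoning task: given the structure A - B = C - D, we compute A - B + D and find the nearest C by cosine distance; when computing accuracy, candidates that are A, B, or D themselves are removed. We then compute the cosine similarity between the candidate and every word in the embedding, and print the top K (K is set by the `--rank_num` parameter; the default is 4).
For example:
For: boy - girl + aunt = uncle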
0 nearest aunt:0.89
1 nearest uncle:0.70
2 nearest grandmother:0.67
3 nearest father:0.64
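
The ranking step can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code in `infer.py`: the helper names `cosine` and `analogy_top_k`, the `exclude_inputs` flag, and the 2-d toy vectors are all made up for the example. It also shows why the input words are dropped when scoring accuracy, since an input word such as `aunt` can otherwise rank near the top, as in the output above.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy_top_k(emb, w1, w2, w3, k=4, exclude_inputs=False):
    """Rank words by cosine similarity to emb[w1] - emb[w2] + emb[w3]."""
    query = [a - b + c for a, b, c in zip(emb[w1], emb[w2], emb[w3])]
    # Optionally skip the three input words, as done when computing accuracy.
    skip = {w1, w2, w3} if exclude_inputs else set()
    scored = [(word, cosine(query, vec))
              for word, vec in emb.items() if word not in skip]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy embedding (values invented for illustration only).
toy_emb = {
    "boy": [1.0, 0.0],
    "girl": [-1.0, 0.0],
    "aunt": [-1.0, 1.0],
    "uncle": [1.0, 1.0],
    "father": [1.0, 0.2],
    "grandmother": [-1.0, 0.8],
}

results = analogy_top_k(toy_emb, "boy", "girl", "aunt", k=4, exclude_inputs=True)
for rank, (word, score) in enumerate(results):
    print("{} nearest {}:{:.2f}".format(rank, word, score))
```

With `exclude_inputs=False` the function mirrors the printed top-K list; with `exclude_inputs=True` it mirrors the accuracy computation, where A, B, and D are removed from the candidates.
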
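You can also add your own tests in the `build_test_case` method, following the examples given there.
To run test cases from test files, download the test files into the `test` directory.
Each case in a test file has the following structure: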
`word1 word2 word3 word4`
so it can be turned into `word1 - word2 + word3 = word4`.
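
As a rough sketch of that conversion, the snippet below splits one test line into its four words and renders the analogy. The helper names `parse_analogy_line` and `format_analogy` are hypothetical and do not come from `infer.py`; the sketch also assumes the words are separated by whitespace and that every line holds exactly four of them.

```python
def parse_analogy_line(line):
    """Split 'word1 word2 word3 word4' into ((word1, word2, word3), word4)."""
    words = line.split()
    if len(words) != 4:
        raise ValueError(
            "expected 4 whitespace-separated words, got {}".format(len(words)))
    w1, w2, w3, w4 = words
    return (w1, w2, w3), w4


def format_analogy(line):
    """Render a test line as 'word1 - word2 + word3 = word4'."""
    (w1, w2, w3), w4 = parse_analogy_line(line)
    return "{} - {} + {} = {}".format(w1, w2, w3, w4)


print(format_analogy("boy girl aunt uncle"))  # boy - girl + aunt = uncle
```

The first three words form the query and the fourth is the expected answer, so a prediction counts as correct when the nearest remaining candidate equals `word4`.
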
Inference during training:
```bash
python infer.py --infer_during_train 2>&1 | tee infer.log
```
Offline inference with a specific model:
```bash
python infer.py --infer_once --model_output_dir ./models/[specific model directory] 2>&1 | tee infer.log
```
In `infer.py`, the `build_test_case` method constructs some test cases to evaluate the quality of the word embeddings:
We feed in test cases. Currently we use the analogical-reasoning task: given the structure A - B = C - D, we compute A - B + D and find the nearest C by cosine distance; when computing accuracy, candidates that are A, B, or D themselves are removed. We then compute the cosine similarity between the candidate and every word in the embedding, and print the top K (K is set by the `--rank_num` parameter; the default is 4).
For example:
For: boy - girl + aunt = uncle
0 nearest aunt: 0.89
1 nearest uncle: 0.70
2 nearest grandmother: 0.67
3 nearest father:0.64
You can also add your own tests by mimicking the examples given in the `build_test_case` method.
To run test cases from test files, download the test files into the `test` directory.
Each case in a test file has the following structure:
`word1 word2 word3 word4`
so it can be turned into `word1 - word2 + word3 = word4`.
Inference during training:
```bash
python infer.py --infer_during_train 2>&1 | tee infer.log
```
1. Please prepare some CPU machines on Baidu Cloud following the steps in [train_on_baidu_cloud](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/user_guides/howto/training/train_on_baidu_cloud_cn.rst)
1. Prepare the dataset using `preprocess.py`.
1. Split `train.txt` into `trainer_num` parts and put them on the machines.
1. Run cluster training using the command in the `Distributed Train` section above.