Commit a1af98a5 authored by: Hai Liang Wang

Support customized dict and model

Parent 792750c5
......@@ -11,7 +11,7 @@ Chinese Synonyms for Natural Language Processing and Understanding.
* [Install](https://github.com/huyingxi/Synonyms#welcome)
* [Usage](https://github.com/huyingxi/Synonyms#usage)
* [Demo](https://github.com/huyingxi/Synonyms#demo)
* [Quick Get Start](https://github.com/huyingxi/Synonyms#quick-get-start)
* [Valuation](https://github.com/huyingxi/Synonyms#valuation)
* [Benchmark](https://github.com/huyingxi/Synonyms#benchmark)
* [Statement](https://github.com/huyingxi/Synonyms#statement)
......@@ -24,7 +24,9 @@ Chinese Synonyms for Natural Language Processing and Understanding.
```
pip install -U synonyms
```
Compatible with py2 and py3; the current stable release is [v2.x](https://github.com/huyingxi/Synonyms/releases). **Meanwhile, Node.js users can now use [node-synonyms](https://www.npmjs.com/package/node-synonyms).**
Compatible with py2 and py3; the current stable release is [v2.x](https://github.com/huyingxi/Synonyms/releases).
**Node.js users can now use [node-synonyms](https://www.npmjs.com/package/node-synonyms).**
```
npm install node-synonyms
......@@ -32,12 +34,17 @@ npm install node-synonyms
![](./assets/3.gif)
## Samples
![](assets/2.png)
The configuration and API notes in this document target the Python package; for the Node version, see the [project](https://www.npmjs.com/package/node-synonyms).
## Usage
The word segmentation dictionary and the word2vec vector file can be configured through environment variables.

| Environment Variable | Description |
| --- | --- |
| *SYNONYMS_WORD2VEC_BIN_MODEL_ZH_CN* | Word vector file trained with word2vec, in binary format. |
| *SYNONYMS_WORDSEG_DICT* | [**Main dictionary**](https://github.com/fxsjy/jieba#%E5%BB%B6%E8%BF%9F%E5%8A%A0%E8%BD%BD%E6%9C%BA%E5%88%B6) for Chinese word segmentation; for the file format and usage, see the [reference](https://github.com/fxsjy/jieba#%E8%BD%BD%E5%85%A5%E8%AF%8D%E5%85%B8). |
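A minimal sketch of setting these variables from Python before the package is imported; both file paths below are placeholders, not files shipped with the project:

```python
import os

# Placeholder paths: point these at your own trained word2vec binary
# and your own jieba-format dictionary file.
os.environ["SYNONYMS_WORD2VEC_BIN_MODEL_ZH_CN"] = "/path/to/words.vector"
os.environ["SYNONYMS_WORDSEG_DICT"] = "/path/to/user_dict.txt"

# Set the variables BEFORE `import synonyms`: the package reads
# os.environ once at import time.
```

Setting the variables in the shell (`export SYNONYMS_WORDSEG_DICT=...`) before launching Python works the same way.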
### synonyms#nearby
```
import synonyms
......@@ -96,7 +103,11 @@ synonyms.nearby(人脸) = [
![](assets/1.png)
## Demo
## Samples
![](assets/2.png)
## Quick Get Start
```
$ pip install -r Requirements.txt
$ python demo.py
......
......@@ -13,7 +13,7 @@ Welcome
setup(
name='synonyms',
version='2.5',
version='2.6',
description='Chinese Synonyms for Natural Language Processing and Understanding',
long_description=LONGDOC,
author='Hai Liang Wang, Hu Ying Xi',
......
......@@ -41,6 +41,9 @@ if sys.version_info[0] < 3:
else:
    PLT = 3
# Get Environment variables
ENVIRON = os.environ.copy()
import json
import gzip
import shutil
......@@ -66,6 +69,17 @@ lambda fns
_similarity_smooth = lambda x, y, z: (x * y) + z
_sim_molecule = lambda x: np.sum(x, axis=0) # numerator
'''
tokenizer settings
'''
if "SYNONYMS_WORDSEG_DICT" in ENVIRON:
    tokenizer_dict = ENVIRON["SYNONYMS_WORDSEG_DICT"]
    if os.path.exists(tokenizer_dict):
        jieba.set_dictionary(tokenizer_dict)
        print("info: set wordseg dict with %s" % tokenizer_dict)
    else:
        print("warning: can not find dict at [%s]" % tokenizer_dict)
'''
nearby
'''
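The environment-variable guard introduced in this commit can be exercised standalone. In this sketch, `configure_tokenizer` is a hypothetical helper, and a stub callback stands in for `jieba.set_dictionary` so the code runs without jieba installed:

```python
import os

def configure_tokenizer(environ, set_dictionary):
    # Mirror the guard: only override the default dictionary when the
    # variable is set AND the file actually exists on disk.
    if "SYNONYMS_WORDSEG_DICT" not in environ:
        return "info: using default dict"
    tokenizer_dict = environ["SYNONYMS_WORDSEG_DICT"]
    if os.path.exists(tokenizer_dict):
        set_dictionary(tokenizer_dict)
        return "info: set wordseg dict with %s" % tokenizer_dict
    return "warning: can not find dict at [%s]" % tokenizer_dict

print(configure_tokenizer({}, lambda p: None))
# info: using default dict
print(configure_tokenizer({"SYNONYMS_WORDSEG_DICT": "/no/such/file"}, lambda p: None))
# warning: can not find dict at [/no/such/file]
```

Checking existence before calling `set_dictionary` keeps a typo in the variable from silently breaking segmentation at import time.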
......@@ -133,13 +147,15 @@ def _segment_words(sen):
# vectors
_f_model = os.path.join(curdir, 'data', 'words.vector')
if "SYNONYMS_WORD2VEC_BIN_MODEL_ZH_CN" in ENVIRON:
_f_model = ENVIRON["SYNONYMS_WORD2VEC_BIN_MODEL_ZH_CN"]
def _load_w2v(model_file=_f_model, binary=True):
    '''
    load word2vec model
    '''
    if not os.path.exists(model_file):
        print("os.path : ", os.path)
        raise Exception("Model file does not exist.")
        raise Exception("Model file [%s] does not exist." % model_file)
    return KeyedVectors.load_word2vec_format(
        model_file, binary=binary, unicode_errors='ignore')
print(">> Synonyms on loading vectors ...")
......
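The patched exception above embeds the offending path in the message. The behavior can be sketched without gensim; `load_w2v_path` is a hypothetical stand-in that performs only the existence check, not the actual `KeyedVectors` load:

```python
import os

def load_w2v_path(model_file):
    # Fail fast and name the missing file, as the patched message does.
    if not os.path.exists(model_file):
        raise Exception("Model file [%s] does not exist." % model_file)
    return model_file

try:
    load_w2v_path("/missing/words.vector")
except Exception as e:
    print(e)  # Model file [/missing/words.vector] does not exist.
```

Including the resolved path makes misconfigured `SYNONYMS_WORD2VEC_BIN_MODEL_ZH_CN` values obvious from the traceback alone.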