IK Analysis for ElasticSearch
==================================

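Update notes:
For ES clusters that use IK as the tokenizer plugin and frequently modify custom dictionaries, remote dictionary loading has been added. The dictionary is reloaded on every update, so there is no need to restart the ES service.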

The IK Analysis plugin integrates the Lucene IK analyzer into Elasticsearch and supports customized dictionaries.

Tokenizer: `ik`

Version
-------------

IK version | ES version
-----------|-----------
master | 1.5.0 -> master
1.4.0 | 1.6.0
1.3.0 | 1.5.0
1.2.9 | 1.4.0
1.2.8 | 1.3.2
1.2.7 | 1.2.1
1.2.6 | 1.0.0
1.2.5 | 0.90.2
1.2.3 | 0.90.2
1.2.0 | 0.90.0
1.1.3 | 0.20.2
1.1.2 | 0.19.x
1.0.0 | 0.16.2 -> 0.19.0

Thanks
-------------
YourKit supports IK Analysis for ElasticSearch project with its full-featured Java Profiler.
YourKit, LLC is the creator of innovative and intelligent tools for profiling
Java and .NET applications. Take a look at YourKit's leading software products:
<a href="http://www.yourkit.com/java/profiler/index.jsp">YourKit Java Profiler</a> and
<a href="http://www.yourkit.com/.net/profiler/index.jsp">YourKit .NET Profiler</a>.

Install
-------------
You can download this plugin from the RTF project (https://github.com/medcl/elasticsearch-rtf):
https://github.com/medcl/elasticsearch-rtf/tree/master/plugins/analysis-ik
https://github.com/medcl/elasticsearch-rtf/tree/master/config/ik

<del>also remember to download the dict files,unzip these dict file into your elasticsearch's config folder,such as: your-es-root/config/ik</del>

You need to restart the Elasticsearch service after that.

Dict Configuration (es-root/config/ik/IKAnalyzer.cfg.xml)
-------------

https://github.com/medcl/elasticsearch-analysis-ik/blob/master/config/ik/IKAnalyzer.cfg.xml

<pre>

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
	<comment>IK Analyzer extension configuration</comment>
	<!-- Configure your own extension dictionaries here -->
	<entry key="ext_dict">custom/mydict.dic;custom/single_word_low_freq.dic</entry>
	<!-- Configure your own extension stopword dictionaries here -->
	<entry key="ext_stopwords">custom/ext_stopword.dic</entry>
	<!-- Configure a remote extension dictionary here -->
	<entry key="remote_ext_dict">location</entry>
	<!-- Configure a remote extension stopword dictionary here -->
	<entry key="remote_ext_stopwords">location</entry>
</properties>

</pre>
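
To double-check which dictionaries a given `IKAnalyzer.cfg.xml` references, the file can be read with any XML parser. The following is a minimal Python sketch and not part of the plugin: the helper name `list_dictionaries` is made up for illustration, and it assumes that multiple dictionaries are separated by `;`, as in the `ext_dict` entry above.

```python
import xml.etree.ElementTree as ET


def list_dictionaries(xml_bytes):
    """Return {entry key: [paths or URLs]} from an IKAnalyzer.cfg.xml document.

    Hypothetical helper for inspection only; the plugin does its own parsing.
    """
    root = ET.fromstring(xml_bytes)
    result = {}
    for entry in root.findall("entry"):
        value = (entry.text or "").strip()
        # Assumption: several dictionaries are separated by ';' as in ext_dict.
        result[entry.get("key")] = [p.strip() for p in value.split(";") if p.strip()]
    return result


SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
	<entry key="ext_dict">custom/mydict.dic;custom/single_word_low_freq.dic</entry>
	<entry key="ext_stopwords">custom/ext_stopword.dic</entry>
</properties>
"""

for key, paths in list_dictionaries(SAMPLE).items():
    print(key, paths)
```

To inspect a real installation, read the file in binary mode and pass its contents to the same function.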

Analysis Configuration (elasticsearch.yml)
-------------

<pre>
index:
  analysis:                   
    analyzer:      
      ik:
          alias: [ik_analyzer]
          type: org.elasticsearch.index.analysis.IkAnalyzerProvider
      ik_max_word:
          type: ik
          use_smart: false
      ik_smart:
          type: ik
          use_smart: true
</pre>
Or
<pre>
index.analysis.analyzer.ik.type : "ik"
</pre>

You can choose your preferred segmentation mode; `use_smart` defaults to false.
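
One way to see the difference between the two modes is to run the same text through both analyzers with Elasticsearch's `_analyze` API. The sketch below is an illustration only. It assumes an ES 1.x-style `_analyze` endpoint that takes the analyzer name as a query parameter and the raw text as the request body, and an index (here called `index`) whose settings define `ik_smart` and `ik_max_word` as shown above. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def analyze_url(base_url, index, analyzer):
    """Build an ES 1.x-style _analyze URL for the given index and analyzer."""
    query = urllib.parse.urlencode({"analyzer": analyzer})
    return "{}/{}/_analyze?{}".format(base_url.rstrip("/"), index, query)


def analyze(base_url, index, analyzer, text):
    """Send text to _analyze and return the list of token strings.

    Assumption: the response is JSON with a "tokens" list whose items
    carry a "token" field.
    """
    request = urllib.request.Request(
        analyze_url(base_url, index, analyzer), data=text.encode("utf-8")
    )
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    return [t["token"] for t in body.get("tokens", [])]
```

Calling `analyze("http://localhost:9200", "index", "ik_smart", text)` and then the same call with `"ik_max_word"` against a running node should show how the two modes split the same sentence.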

Mapping Configuration
-------------

Here is a quick example:
1. Create an index

<pre>

curl -XPUT http://localhost:9200/index

</pre>

2. Create a mapping

<pre>

curl -XPOST http://localhost:9200/index/fulltext/_mapping -d'
{
    "fulltext": {
        "_all": {
            "indexAnalyzer": "ik",
            "searchAnalyzer": "ik",
            "term_vector": "no",
            "store": "false"
        },
        "properties": {
            "content": {
                "type": "string",
                "store": "no",
                "term_vector": "with_positions_offsets",
                "indexAnalyzer": "ik",
                "searchAnalyzer": "ik",
                "include_in_all": "true",
                "boost": 8
            }
        }
    }
}'
</pre>

3. Index some documents

<pre>

curl -XPOST http://localhost:9200/index/fulltext/1 -d'
{"content":"美国留给伊拉克的是个烂摊子吗"}
'

curl -XPOST http://localhost:9200/index/fulltext/2 -d'
{"content":"公安部:各地校车将享最高路权"}
'

curl -XPOST http://localhost:9200/index/fulltext/3 -d'
{"content":"中韩渔警冲突调查:韩警平均每天扣1艘中国渔船"}
'

curl -XPOST http://localhost:9200/index/fulltext/4 -d'
{"content":"中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}
'
</pre>

4. Query with highlighting

<pre>

curl -XPOST http://localhost:9200/index/fulltext/_search  -d'
{
    "query" : { "term" : { "content" : "中国" }},
    "highlight" : {
        "pre_tags" : ["<tag1>", "<tag2>"],
        "post_tags" : ["</tag1>", "</tag2>"],
        "fields" : {
            "content" : {}
        }
    }
}
'
</pre>

Here is the query result:

<pre>

{
    "took": 14,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 2,
        "hits": [
            {
                "_index": "index",
                "_type": "fulltext",
                "_id": "4",
                "_score": 2,
                "_source": {
                    "content": "中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"
                },
                "highlight": {
                    "content": [
                        "<tag1>中国</tag1>驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首 "
                    ]
                }
            },
            {
                "_index": "index",
                "_type": "fulltext",
                "_id": "3",
                "_score": 2,
                "_source": {
                    "content": "中韩渔警冲突调查:韩警平均每天扣1艘中国渔船"
                },
                "highlight": {
                    "content": [
                        "均每天扣1艘<tag1>中国</tag1>渔船 "
                    ]
                }
            }
        ]
    }
}

</pre>

Have fun.

Hot-updating the IK dictionary
----------
The plugin now supports hot updates of the IK dictionary through the following entries in the IK configuration file mentioned above:

<pre>
 	<!-- Configure a remote extension dictionary here -->
	<entry key="remote_ext_dict">location</entry>
 	<!-- Configure a remote extension stopword dictionary here -->
	<entry key="remote_ext_stopwords">location</entry>
</pre>

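Here `location` is a URL, for example `http://yoursite.com/getCustomDict`. The request only needs to meet the following two requirements for hot updates to work.

1. The HTTP response must return two headers, `Last-Modified` and `ETag`. Both are strings; whenever either one changes, the plugin fetches the new word list and updates its dictionary.

2. The response body must contain one word per line, with `\n` as the line separator.

Once both requirements are met, the dictionary is hot-updated without restarting the ES instance.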
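
The two requirements above can be met by almost any web server. As an illustration, here is a minimal sketch using only the Python standard library; it is not part of the plugin. The function name `make_dict_server`, the host, and the port are assumptions, and the handler answers `HEAD` as well as `GET` in case the poller requests only the headers.

```python
import hashlib
import os
from email.utils import formatdate
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_dict_server(dict_path, host="127.0.0.1", port=8000):
    """Create an HTTP server that serves dict_path on every request path.

    Hypothetical helper: dict_path should be a UTF-8 text file with one
    word per line, separated by "\n".
    """

    class DictHandler(BaseHTTPRequestHandler):
        def _send(self, include_body):
            with open(dict_path, "rb") as f:
                body = f.read()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            # Derived from the file's mtime and content, so an edit to the
            # file changes at least one of the two headers.
            self.send_header(
                "Last-Modified", formatdate(os.path.getmtime(dict_path), usegmt=True)
            )
            self.send_header("ETag", '"{}"'.format(hashlib.sha256(body).hexdigest()))
            self.end_headers()
            if include_body:
                self.wfile.write(body)

        def do_GET(self):
            self._send(include_body=True)

        def do_HEAD(self):
            self._send(include_body=False)

    return HTTPServer((host, port), DictHandler)
```

Running `make_dict_server("mydict.dic").serve_forever()` would expose the file at `http://127.0.0.1:8000/`, and that URL could then be used as `location`. Editing the file changes the headers, which is the signal the plugin uses to reload.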

FAQ
-------------
1. Why is my custom dictionary not taking effect?

Make sure your extension dictionary file is encoded in UTF-8.
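
A quick way to verify the encoding is to try decoding the file as UTF-8. The snippet below is a small sketch with a hypothetical helper name, `is_utf8`; it only checks that the bytes are valid UTF-8 and does not convert anything.

```python
def is_utf8(path):
    """Return True if the file at path decodes cleanly as UTF-8."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

For example, `is_utf8("config/ik/custom/mydict.dic")` returning `False` would suggest the file was saved in another encoding (such as GBK) and should be re-saved as UTF-8.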

2. How do I install the plugin manually, using 1.3.0 as an example? (See: https://github.com/medcl/elasticsearch-analysis-ik/issues/46)

`git clone https://github.com/medcl/elasticsearch-analysis-ik`
`cd elasticsearch-analysis-ik`
`mvn compile`
`mvn package`
`plugin --install analysis-ik --url file:///#{project_path}/elasticsearch-analysis-ik/target/releases/elasticsearch-analysis-ik-1.3.0.zip`