myguguang / elasticsearch-analysis-ik
Commit 3d47fa60
Authored Oct 31, 2015 by weixin_43283383

update to support es 2.0

Parent: a60059f8
Showing 18 changed files with 309 additions and 190 deletions (+309 −190)
README.md  (+22 −89)
pom.xml  (+13 −13)
src/main/assemblies/plugin.xml  (+7 −0)
src/main/config/ik.yaml  (+0 −0)
src/main/java/org/elasticsearch/index/analysis/IkAnalysisBinderProcessor.java  (+6 −5)
src/main/java/org/elasticsearch/index/analysis/IkAnalyzerProvider.java  (+4 −3)
src/main/java/org/elasticsearch/index/analysis/IkTokenizerFactory.java  (+7 −9)
src/main/java/org/elasticsearch/indices/analysis/IKIndicesAnalysis.java  (+78 −0)
src/main/java/org/elasticsearch/indices/analysis/IKIndicesAnalysisModule.java  (+32 −0)
src/main/java/org/elasticsearch/plugin/analysis/ik/AnalysisIkPlugin.java  (+29 −6)
src/main/java/org/wltea/analyzer/cfg/Configuration.java  (+13 −10)
src/main/java/org/wltea/analyzer/core/IKSegmenter.java  (+3 −23)
src/main/java/org/wltea/analyzer/lucene/IKAnalyzer.java  (+8 −22)
src/main/java/org/wltea/analyzer/lucene/IKTokenizer.java  (+3 −4)
src/main/java/org/wltea/analyzer/query/SWMCQueryBuilder.java  (+2 −2)
src/main/java/org/wltea/analyzer/sample/LuceneIndexAndSearchDemo.java  (+2 −2)
src/main/resources/es-plugin.properties  (+0 −2)
src/main/resources/plugin-descriptor.properties  (+80 −0)
README.md

````diff
@@ -3,16 +3,15 @@ IK Analysis for ElasticSearch
 The IK Analysis plugin integrates Lucene IK analyzer into elasticsearch, support customized dictionary.
-Tokenizer: `ik`
-Update: users who run an ES cluster with IK as the word-segmentation plugin and frequently modify custom dictionaries can load dictionaries remotely; every update reloads the dictionary without restarting the ES service.
+Analyzer: `ik_smart`, `ik_max_word`, Tokenizer: `ik_smart`, `ik_max_word`
 
 Versions
 --------
 IK version | ES version
 -----------|-----------
-master | 1.5.0 -> master
+master | 2.0.0 -> master
+1.5.0 | 2.0.0
 1.4.1 | 1.7.2
 1.4.0 | 1.6.0
 1.3.0 | 1.5.0
@@ -30,108 +29,42 @@ master | 1.5.0 -> master
 Install
 -------
 you can download this plugin from RTF project(https://github.com/medcl/elasticsearch-rtf)
 https://github.com/medcl/elasticsearch-rtf/tree/master/plugins/analysis-ik
 https://github.com/medcl/elasticsearch-rtf/tree/master/config/ik
 <del>also remember to download the dict files, unzip these dict files into your elasticsearch's config folder, such as: your-es-root/config/ik</del>
 you need a service restart after that!
-Configuration
--------------
-### Analysis Configuration
-#### `elasticsearch.yml`
-```yaml
-index:
-  analysis:
-    analyzer:
-      ik:
-        alias: [ik_analyzer]
-        type: org.elasticsearch.index.analysis.IkAnalyzerProvider
-      ik_max_word:
-        type: ik
-        use_smart: false
-      ik_smart:
-        type: ik
-        use_smart: true
-```
-Or
-```yaml
-index.analysis.analyzer.ik.type: "ik"
-```
-#### The difference between the two configurations above:
-1. The second form defines only a single analyzer, named ik, whose use_smart takes the default value false.
-2. The first form defines three analyzers — ik, ik_max_word and ik_smart — where ik_max_word and ik_smart are defined on top of the ik analyzer, each explicitly setting a different use_smart value.
-3. In fact, ik_max_word is equivalent to ik. ik_max_word splits the text at the finest granularity, e.g. "中华人民共和国国歌" is split into "中华人民共和国, 中华人民, 中华, 华人, 人民共和国, 人民, 人, 民, 共和国, 共和, 和, 国国, 国歌", exhausting every possible combination; ik_smart splits at the coarsest granularity, e.g. "中华人民共和国国歌" is split into "中华人民共和国, 国歌".
-Therefore, when defining a mapping it is recommended to use the ik analyzer, so that documents are matched by search conditions as broadly as possible.
-However, if you want to use the /index_name/_analyze RESTful API as a tokenizer to extract the key terms of a piece of text, the ik_smart analyzer is recommended:
-```
-POST /hailiang/_analyze?analyzer=ik_smart HTTP/1.1
-Host: localhost:9200
-Cache-Control: no-cache
-
-中华人民共和国国歌
-```
-Response:
-```json
-{
-  "tokens": [
-    {
-      "token": "中华人民共和国",
-      "start_offset": 0,
-      "end_offset": 7,
-      "type": "CN_WORD",
-      "position": 1
-    },
-    {
-      "token": "国歌",
-      "start_offset": 7,
-      "end_offset": 9,
-      "type": "CN_WORD",
-      "position": 2
-    }
-  ]
-}
-```
-Tips: you can also add the following line to elasticsearch.yml to set the default analyzer to ik:
-```yaml
-index.analysis.analyzer.default.type: "ik"
-```
+1. compile `mvn package`
+copy and unzip `target/release/ik**.zip` to `your-es-root/plugins/ik`
+2. config files: download the dict files, unzip these dict files into your elasticsearch's config folder, such as: `your-es-root/config/ik`
+3. restart elasticsearch
+ik_max_word: splits the text at the finest granularity, e.g. "中华人民共和国国歌" is split into "中华人民共和国, 中华人民, 中华, 华人, 人民共和国, 人民, 人, 民, 共和国, 共和, 和, 国国, 国歌", exhausting every possible combination;
+ik_smart: splits at the coarsest granularity, e.g. "中华人民共和国国歌" is split into "中华人民共和国, 国歌".
 ### Mapping Configuration
 #### Quick Example
 1. create a index
 ```bash
 curl -XPUT http://localhost:9200/index
 ```
 2. create a mapping
 ```bash
 curl -XPOST http://localhost:9200/index/fulltext/_mapping -d'
 {
     "fulltext": {
         "_all": {
-            "indexAnalyzer": "ik",
-            "searchAnalyzer": "ik",
+            "indexAnalyzer": "ik_max_word",
+            "searchAnalyzer": "ik_max_word",
             "term_vector": "no",
             "store": "false"
         },
@@ -140,8 +73,8 @@ curl -XPOST http://localhost:9200/index/fulltext/_mapping -d'
             "type": "string",
             "store": "no",
             "term_vector": "with_positions_offsets",
-            "indexAnalyzer": "ik",
-            "searchAnalyzer": "ik",
+            "indexAnalyzer": "ik_max_word",
+            "searchAnalyzer": "ik_max_word",
             "include_in_all": "true",
             "boost": 8
         }
@@ -150,7 +83,7 @@ curl -XPOST http://localhost:9200/index/fulltext/_mapping -d'
 }'
 ```
 3. index some docs
 ```bash
 curl -XPOST http://localhost:9200/index/fulltext/1 -d'
@@ -176,7 +109,7 @@ curl -XPOST http://localhost:9200/index/fulltext/4 -d'
 '
 ```
 4. query with highlighting
 ```bash
 curl -XPOST http://localhost:9200/index/fulltext/_search -d'
@@ -193,7 +126,7 @@ curl -XPOST http://localhost:9200/index/fulltext/_search -d'
 '
 ```
 #### Result
 ```json
 {
@@ -257,7 +190,7 @@ curl -XPOST http://localhost:9200/index/fulltext/_search -d'
 	<!-- users can configure a remote extension dictionary here -->
 	<entry key="remote_ext_dict">location</entry>
 	<!-- users can configure a remote extension stopword dictionary here -->
-	<entry key="remote_ext_stopwords">location</entry>
+	<entry key="remote_ext_stopwords">http://xxx.com/xxx.dic</entry>
 </properties>
 ```
````
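The token list in the `_analyze` response shown in the removed README section is easy to consume programmatically. A minimal sketch in Python — the JSON literal is copied from the example response above, not fetched from a live cluster:

```python
import json

# Example _analyze response for "中华人民共和国国歌" with the ik_smart analyzer,
# copied verbatim from the README example above.
response = json.loads("""
{
  "tokens": [
    {"token": "中华人民共和国", "start_offset": 0, "end_offset": 7,
     "type": "CN_WORD", "position": 1},
    {"token": "国歌", "start_offset": 7, "end_offset": 9,
     "type": "CN_WORD", "position": 2}
  ]
}
""")

# Extract the surface forms in position order.
tokens = [t["token"] for t in sorted(response["tokens"], key=lambda t: t["position"])]
print(tokens)  # → ['中华人民共和国', '国歌']
```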
pom.xml

```diff
@@ -6,10 +6,21 @@
 	<modelVersion>4.0.0</modelVersion>
 	<groupId>org.elasticsearch</groupId>
 	<artifactId>elasticsearch-analysis-ik</artifactId>
-	<version>1.4.1</version>
+	<version>1.5.0</version>
 	<packaging>jar</packaging>
 	<description>IK Analyzer for ElasticSearch</description>
 	<inceptionYear>2009</inceptionYear>
+	<properties>
+		<elasticsearch.version>2.0.0</elasticsearch.version>
+		<elasticsearch.assembly.descriptor>${project.basedir}/src/main/assemblies/plugin.xml</elasticsearch.assembly.descriptor>
+		<elasticsearch.plugin.classname>org.elasticsearch.plugin.analysis.ik.AnalysisIkPlugin</elasticsearch.plugin.classname>
+		<elasticsearch.plugin.jvm>true</elasticsearch.plugin.jvm>
+		<tests.rest.load_packaged>false</tests.rest.load_packaged>
+		<skip.unit.tests>true</skip.unit.tests>
+	</properties>
 	<licenses>
 		<license>
 			<name>The Apache Software License, Version 2.0</name>
@@ -17,6 +28,7 @@
 			<distribution>repo</distribution>
 		</license>
 	</licenses>
 	<scm>
 		<connection>scm:git:git@github.com:medcl/elasticsearch-analysis-ik.git</connection>
 		<developerConnection>scm:git:git@github.com:medcl/elasticsearch-analysis-ik.git
@@ -30,10 +42,6 @@
 		<version>7</version>
 	</parent>
-	<properties>
-		<elasticsearch.version>1.7.2</elasticsearch.version>
-	</properties>
 	<repositories>
 		<repository>
 			<id>oss.sonatype.org</id>
@@ -84,11 +92,6 @@
 			<version>4.10</version>
 			<scope>test</scope>
 		</dependency>
-		<dependency>
-			<groupId>org.apache.lucene</groupId>
-			<artifactId>lucene-core</artifactId>
-			<version>4.10.4</version>
-		</dependency>
 	</dependencies>
 	<build>
@@ -137,9 +140,6 @@
 					<mainClass>fully.qualified.MainClass</mainClass>
 				</manifest>
 			</archive>
-			<descriptorRefs>
-				<descriptorRef>jar-with-dependencies</descriptorRef>
-			</descriptorRefs>
 		</configuration>
 		<executions>
 			<execution>
```
src/main/assemblies/plugin.xml

```diff
@@ -5,6 +5,13 @@
 		<format>zip</format>
 	</formats>
 	<includeBaseDirectory>false</includeBaseDirectory>
+	<files>
+		<file>
+			<source>${project.basedir}/src/main/resources/plugin-descriptor.properties</source>
+			<outputDirectory></outputDirectory>
+			<filtered>true</filtered>
+		</file>
+	</files>
 	<dependencySets>
 		<dependencySet>
 			<outputDirectory>/</outputDirectory>
```
src/main/config/ik.yaml (new file, mode 100644, empty)
src/main/java/org/elasticsearch/index/analysis/IkAnalysisBinderProcessor.java

```diff
@@ -3,20 +3,21 @@ package org.elasticsearch.index.analysis;
 public class IkAnalysisBinderProcessor extends AnalysisModule.AnalysisBinderProcessor {
 
     @Override
     public void processTokenFilters(TokenFiltersBindings tokenFiltersBindings) {
     }
 
     @Override
     public void processAnalyzers(AnalyzersBindings analyzersBindings) {
         analyzersBindings.processAnalyzer("ik", IkAnalyzerProvider.class);
+        super.processAnalyzers(analyzersBindings);
     }
 
     @Override
     public void processTokenizers(TokenizersBindings tokenizersBindings) {
         tokenizersBindings.processTokenizer("ik", IkTokenizerFactory.class);
-        tokenizersBindings.processTokenizer("ik_tokenizer", IkTokenizerFactory.class);
+        super.processTokenizers(tokenizersBindings);
     }
 }
```
src/main/java/org/elasticsearch/index/analysis/IkAnalyzerProvider.java

```diff
 package org.elasticsearch.index.analysis;
 
 import org.elasticsearch.common.inject.Inject;
 import org.elasticsearch.common.inject.assistedinject.Assisted;
 import org.elasticsearch.common.settings.Settings;
 import org.elasticsearch.env.Environment;
 import org.elasticsearch.index.Index;
@@ -12,12 +11,14 @@ import org.wltea.analyzer.lucene.IKAnalyzer;
 public class IkAnalyzerProvider extends AbstractIndexAnalyzerProvider<IKAnalyzer> {
 
     private final IKAnalyzer analyzer;
+    private boolean useSmart = false;
 
-    @Inject
-    public IkAnalyzerProvider(Index index, @IndexSettings Settings indexSettings, Environment env, @Assisted String name, @Assisted Settings settings) {
+    public IkAnalyzerProvider(Index index, @IndexSettings Settings indexSettings, Environment env, String name, Settings settings) {
         super(index, indexSettings, name, settings);
         Dictionary.initial(new Configuration(env));
-        analyzer = new IKAnalyzer(indexSettings, settings, env);
+        useSmart = settings.get("use_smart", "false").equals("true");
+        analyzer = new IKAnalyzer(useSmart);
     }
 
     @Override
     public IKAnalyzer get() {
```
src/main/java/org/elasticsearch/index/analysis/IkTokenizerFactory.java

```diff
@@ -11,23 +11,21 @@ import org.wltea.analyzer.cfg.Configuration;
 import org.wltea.analyzer.dic.Dictionary;
 import org.wltea.analyzer.lucene.IKTokenizer;
-import java.io.Reader;
 
 public class IkTokenizerFactory extends AbstractTokenizerFactory {
 
-    private Environment environment;
-    private Settings settings;
+    private final Settings settings;
+    private boolean useSmart = false;
 
     @Inject
     public IkTokenizerFactory(Index index, @IndexSettings Settings indexSettings, Environment env, @Assisted String name, @Assisted Settings settings) {
         super(index, indexSettings, name, settings);
-        this.environment = env;
         this.settings = settings;
         Dictionary.initial(new Configuration(env));
     }
 
     @Override
-    public Tokenizer create(Reader reader) {
-        return new IKTokenizer(reader, settings, environment);
-    }
+    public Tokenizer create() {
+        this.useSmart = settings.get("use_smart", "false").equals("true");
+        return new IKTokenizer(useSmart);
+    }
 }
```
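Both the analyzer provider and the tokenizer factory above now read the flag via `settings.get("use_smart", "false").equals("true")`. A small Python sketch of that lookup semantics (the dict standing in for ES `Settings` is hypothetical) shows why any value other than the exact string "true" — including the absent/default case — yields fine-grained segmentation:

```python
def use_smart(settings: dict) -> bool:
    # Mirrors settings.get("use_smart", "false").equals("true") from the Java diff:
    # only the exact string "true" enables smart (coarse-grained) mode.
    return settings.get("use_smart", "false") == "true"

print(use_smart({}))                     # missing key, default "false" → False
print(use_smart({"use_smart": "true"}))  # exact match → True
print(use_smart({"use_smart": "True"}))  # not an exact string match → False
```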
src/main/java/org/elasticsearch/indices/analysis/IKIndicesAnalysis.java (new file, mode 100644)

```java
package org.elasticsearch.indices.analysis;

import org.apache.lucene.analysis.Tokenizer;
import org.elasticsearch.common.component.AbstractComponent;
import org.elasticsearch.common.inject.Inject;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.index.analysis.AnalyzerScope;
import org.elasticsearch.index.analysis.PreBuiltAnalyzerProviderFactory;
import org.elasticsearch.index.analysis.PreBuiltTokenizerFactoryFactory;
import org.elasticsearch.index.analysis.TokenizerFactory;
import org.wltea.analyzer.lucene.IKAnalyzer;
import org.wltea.analyzer.lucene.IKTokenizer;

/**
 * Registers indices level analysis components so, if not explicitly configured,
 * will be shared among all indices.
 */
public class IKIndicesAnalysis extends AbstractComponent {

    private boolean useSmart = false;

    @Inject
    public IKIndicesAnalysis(final Settings settings,
                             IndicesAnalysisService indicesAnalysisService) {
        super(settings);
        this.useSmart = settings.get("use_smart", "false").equals("true");

        indicesAnalysisService.analyzerProviderFactories().put("ik",
                new PreBuiltAnalyzerProviderFactory("ik", AnalyzerScope.INDICES,
                        new IKAnalyzer(useSmart)));
        indicesAnalysisService.analyzerProviderFactories().put("ik_smart",
                new PreBuiltAnalyzerProviderFactory("ik_smart", AnalyzerScope.INDICES,
                        new IKAnalyzer(true)));
        indicesAnalysisService.analyzerProviderFactories().put("ik_max_word",
                new PreBuiltAnalyzerProviderFactory("ik_max_word", AnalyzerScope.INDICES,
                        new IKAnalyzer(false)));

        indicesAnalysisService.tokenizerFactories().put("ik",
                new PreBuiltTokenizerFactoryFactory(new TokenizerFactory() {
                    @Override
                    public String name() {
                        return "ik";
                    }

                    @Override
                    public Tokenizer create() {
                        return new IKTokenizer(false);
                    }
                }));
        indicesAnalysisService.tokenizerFactories().put("ik_smart",
                new PreBuiltTokenizerFactoryFactory(new TokenizerFactory() {
                    @Override
                    public String name() {
                        return "ik_smart";
                    }

                    @Override
                    public Tokenizer create() {
                        return new IKTokenizer(true);
                    }
                }));
        indicesAnalysisService.tokenizerFactories().put("ik_max_word",
                new PreBuiltTokenizerFactoryFactory(new TokenizerFactory() {
                    @Override
                    public String name() {
                        return "ik_max_word";
                    }

                    @Override
                    public Tokenizer create() {
                        return new IKTokenizer(false);
                    }
                }));
    }
}
```
\ No newline at end of file
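The new class pre-registers three analyzer names that differ only in the `useSmart` flag: `ik` follows the node-level `use_smart` setting (default false), while `ik_smart` and `ik_max_word` pin the flag. That registration table can be sketched in Python (the function and its return shape are illustrative, not part of the plugin):

```python
def registered_analyzers(node_use_smart: bool = False) -> dict:
    # name -> useSmart flag passed to new IKAnalyzer(...) in IKIndicesAnalysis
    return {
        "ik": node_use_smart,   # follows the node-level use_smart setting
        "ik_smart": True,       # always coarse-grained
        "ik_max_word": False,   # always fine-grained
    }

flags = registered_analyzers()
print(flags)  # with the default setting, "ik" behaves like "ik_max_word"
```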
src/main/java/org/elasticsearch/indices/analysis/IKIndicesAnalysisModule.java (new file, mode 100644)

```java
/*
 * Licensed to Elasticsearch under one or more contributor
 * license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright
 * ownership. Elasticsearch licenses this file to you under
 * the Apache License, Version 2.0 (the "License"); you may
 * not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */
package org.elasticsearch.indices.analysis;

import org.elasticsearch.common.inject.AbstractModule;

/**
 */
public class IKIndicesAnalysisModule extends AbstractModule {

    @Override
    protected void configure() {
        bind(IKIndicesAnalysis.class).asEagerSingleton();
    }
}
```
\ No newline at end of file
src/main/java/org/elasticsearch/plugin/analysis/ik/AnalysisIkPlugin.java

```diff
 package org.elasticsearch.plugin.analysis.ik;
 
+import org.elasticsearch.common.inject.AbstractModule;
 import org.elasticsearch.common.inject.Module;
 import org.elasticsearch.common.logging.ESLogger;
 import org.elasticsearch.common.logging.ESLoggerFactory;
 import org.elasticsearch.common.settings.Settings;
 import org.elasticsearch.index.analysis.AnalysisModule;
 import org.elasticsearch.index.analysis.IkAnalysisBinderProcessor;
-import org.elasticsearch.plugins.AbstractPlugin;
+import org.elasticsearch.indices.analysis.IKIndicesAnalysisModule;
+import org.elasticsearch.plugins.Plugin;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.logging.Logger;
+import static java.rmi.Naming.bind;
 
-public class AnalysisIkPlugin extends AbstractPlugin {
+public class AnalysisIkPlugin extends Plugin {
 
     private final Settings settings;
 
     public AnalysisIkPlugin(Settings settings) {
         this.settings = settings;
     }
 
     @Override
     public String name() {
         return "analysis-ik";
@@ -17,11 +33,18 @@ public class AnalysisIkPlugin extends AbstractPlugin {
         return "ik analysis";
     }
 
+    @Override
+    public Collection<Module> nodeModules() {
+        return Collections.<Module>singletonList(new IKIndicesAnalysisModule());
+    }
 
-    @Override
-    public void processModule(Module module) {
-        if (module instanceof AnalysisModule) {
-            AnalysisModule analysisModule = (AnalysisModule) module;
-            analysisModule.addProcessor(new IkAnalysisBinderProcessor());
-        }
-    }
+    public static class ConfiguredExampleModule extends AbstractModule {
+        @Override
+        protected void configure() {
+        }
+    }
+
+    public void onModule(AnalysisModule module) {
+        module.addProcessor(new IkAnalysisBinderProcessor());
+    }
 }
```
src/main/java/org/wltea/analyzer/cfg/Configuration.java

```diff
@@ -3,16 +3,17 @@
  */
 package org.wltea.analyzer.cfg;
 
+import org.elasticsearch.common.inject.Inject;
+import org.elasticsearch.common.logging.ESLogger;
+import org.elasticsearch.common.logging.Loggers;
+import org.elasticsearch.env.Environment;
+
 import java.io.*;
 import java.util.ArrayList;
 import java.util.InvalidPropertiesFormatException;
 import java.util.List;
 import java.util.Properties;
-
-import org.elasticsearch.common.logging.ESLogger;
-import org.elasticsearch.common.logging.Loggers;
-import org.elasticsearch.env.Environment;
 
 public class Configuration {
 
     private static String FILE_NAME = "ik/IKAnalyzer.cfg.xml";
@@ -20,16 +21,18 @@ public class Configuration {
     private static final String REMOTE_EXT_DICT = "remote_ext_dict";
     private static final String EXT_STOP = "ext_stopwords";
     private static final String REMOTE_EXT_STOP = "remote_ext_stopwords";
-    private static ESLogger logger = null;
+    private static ESLogger logger = Loggers.getLogger("ik-analyzer");
     private Properties props;
     private Environment environment;
 
+    @Inject
     public Configuration(Environment env) {
-        logger = Loggers.getLogger("ik-analyzer");
         props = new Properties();
         environment = env;
 
-        File fileConfig = new File(environment.configFile(), FILE_NAME);
+        File fileConfig = new File(environment.configFile().toFile(), FILE_NAME);
 
         InputStream input = null;
         try {
@@ -41,9 +44,9 @@ public class Configuration {
             try {
                 props.loadFromXML(input);
             } catch (InvalidPropertiesFormatException e) {
-                e.printStackTrace();
+                logger.error("ik-analyzer", e);
             } catch (IOException e) {
-                e.printStackTrace();
+                logger.error("ik-analyzer", e);
             }
         }
@@ -123,6 +126,6 @@ public class Configuration {
     }
 
     public File getDictRoot() {
-        return environment.configFile();
+        return environment.configFile().toFile();
     }
 }
```
src/main/java/org/wltea/analyzer/core/IKSegmenter.java

```diff
@@ -41,8 +41,6 @@ public final class IKSegmenter {
 	// character stream reader
 	private Reader input;
-	// segmenter configuration
-	private Configuration cfg;
 	// segmentation context
 	private AnalyzeContext context;
 	// list of sub-segmenters
@@ -56,35 +54,17 @@
 	 * IK segmenter constructor
 	 * @param input
 	 */
-	public IKSegmenter(Reader input, Settings settings, Environment environment) {
+	public IKSegmenter(Reader input, boolean useSmart) {
 		this.input = input;
-		this.cfg = new Configuration(environment);
-		this.useSmart = settings.get("use_smart", "false").equals("true");
+		this.useSmart = useSmart;
 		this.init();
 	}
 
-	public IKSegmenter(Reader input) {
-		new IKSegmenter(input, null, null);
-	}
 
 //	/**
 //	 * IK segmenter constructor
 //	 * @param input
 //	 * @param cfg construct the segmenter with a custom Configuration
 //	 *
 //	 */
 //	public IKSegmenter(Reader input, Configuration cfg) {
 //		this.input = input;
 //		this.cfg = cfg;
 //		this.init();
 //	}
 
 	/**
 	 * initialization
 	 */
 	private void init() {
-		// initialize the dictionary singleton
-		Dictionary.initial(this.cfg);
 		// initialize the segmentation context
 		this.context = new AnalyzeContext(useSmart);
 		// load the sub-segmenters
```
src/main/java/org/wltea/analyzer/lucene/IKAnalyzer.java

```diff
@@ -24,13 +24,8 @@
  */
 package org.wltea.analyzer.lucene;
 
-import java.io.Reader;
 import org.apache.lucene.analysis.Analyzer;
 import org.apache.lucene.analysis.Tokenizer;
-import org.elasticsearch.common.settings.ImmutableSettings;
-import org.elasticsearch.common.settings.Settings;
-import org.elasticsearch.env.Environment;
 
 /**
  * IK analyzer, implementation of the Lucene Analyzer interface
@@ -39,8 +34,8 @@
 public final class IKAnalyzer extends Analyzer {
 
     private boolean useSmart;
 
     public boolean useSmart() {
         return useSmart;
     }
@@ -54,35 +49,26 @@ public final class IKAnalyzer extends Analyzer {
      * default fine-grained segmentation
      */
     public IKAnalyzer() {
         this(false);
     }
 
     /**
      * IK analyzer, implementation of the Lucene Analyzer interface
     *
      * @param useSmart when true, the segmenter performs smart segmentation
      */
     public IKAnalyzer(boolean useSmart) {
         super();
         this.useSmart = useSmart;
     }
 
-    Settings settings = ImmutableSettings.EMPTY;
-    Environment environment = new Environment();
-
-    public IKAnalyzer(Settings indexSetting, Settings settings, Environment environment) {
-        super();
-        this.settings = settings;
-        this.environment = environment;
-    }
 
     /**
      * overrides Analyzer to construct the analysis components
      */
     @Override
-    protected TokenStreamComponents createComponents(String fieldName, final Reader in) {
-        Tokenizer _IKTokenizer = new IKTokenizer(in, settings, environment);
+    protected TokenStreamComponents createComponents(String fieldName) {
+        Tokenizer _IKTokenizer = new IKTokenizer(useSmart);
         return new TokenStreamComponents(_IKTokenizer);
     }
 }
```
src/main/java/org/wltea/analyzer/lucene/IKTokenizer.java

```diff
@@ -66,14 +66,14 @@ public final class IKTokenizer extends Tokenizer {
 	 * Lucene 4.0 Tokenizer adapter class constructor
 	 * @param in
 	 */
-	public IKTokenizer(Reader in, Settings settings, Environment environment) {
-		super(in);
+	public IKTokenizer(boolean useSmart) {
+		super();
 		offsetAtt = addAttribute(OffsetAttribute.class);
 		termAtt = addAttribute(CharTermAttribute.class);
 		typeAtt = addAttribute(TypeAttribute.class);
 		posIncrAtt = addAttribute(PositionIncrementAttribute.class);
-		_IKImplement = new IKSegmenter(input, settings, environment);
+		_IKImplement = new IKSegmenter(input, useSmart);
 	}
 
 	/* (non-Javadoc)
@@ -95,7 +95,6 @@ public final class IKTokenizer extends Tokenizer {
 		// set the lexeme length
 		termAtt.setLength(nextLexeme.getLength());
 		// set the lexeme offset
-		// offsetAtt.setOffset(nextLexeme.getBeginPosition(), nextLexeme.getEndPosition());
 		offsetAtt.setOffset(correctOffset(nextLexeme.getBeginPosition()), correctOffset(nextLexeme.getEndPosition()));
 		// record the last position of segmentation
```
src/main/java/org/wltea/analyzer/query/SWMCQueryBuilder.java

```diff
@@ -71,7 +71,7 @@ public class SWMCQueryBuilder {
 	private static List<Lexeme> doAnalyze(String keywords) {
 		List<Lexeme> lexemes = new ArrayList<Lexeme>();
-		IKSegmenter ikSeg = new IKSegmenter(new StringReader(keywords));
+		IKSegmenter ikSeg = new IKSegmenter(new StringReader(keywords), true);
 		try {
 			Lexeme l = null;
 			while ((l = ikSeg.next()) != null) {
@@ -125,7 +125,7 @@ public class SWMCQueryBuilder {
 		}
 		// generate the SWMC Query with the help of the lucene queryparser
-		QueryParser qp = new QueryParser(Version.LUCENE_40, fieldName, new StandardAnalyzer(Version.LUCENE_40));
+		QueryParser qp = new QueryParser(fieldName, new StandardAnalyzer());
 		qp.setDefaultOperator(QueryParser.AND_OPERATOR);
 		qp.setAutoGeneratePhraseQueries(true);
```
src/main/java/org/wltea/analyzer/sample/LuceneIndexAndSearchDemo.java

```diff
@@ -86,7 +86,7 @@ public class LuceneIndexAndSearchDemo {
 		directory = new RAMDirectory();
 		// configure IndexWriterConfig
-		IndexWriterConfig iwConfig = new IndexWriterConfig(Version.LUCENE_40, analyzer);
+		IndexWriterConfig iwConfig = new IndexWriterConfig(analyzer);
 		iwConfig.setOpenMode(OpenMode.CREATE_OR_APPEND);
 		iwriter = new IndexWriter(directory, iwConfig);
 		// write the index
@@ -104,7 +104,7 @@ public class LuceneIndexAndSearchDemo {
 		String keyword = "中文分词工具包";
 		// construct a Query object with the QueryParser
-		QueryParser qp = new QueryParser(Version.LUCENE_40, fieldName, analyzer);
+		QueryParser qp = new QueryParser(fieldName, analyzer);
 		qp.setDefaultOperator(QueryParser.AND_OPERATOR);
 		Query query = qp.parse(keyword);
 		System.out.println("Query = " + query);
```
src/main/resources/es-plugin.properties (deleted, 100644 → 0, parent a60059f8)

```properties
plugin=org.elasticsearch.plugin.analysis.ik.AnalysisIkPlugin
version=${project.version}
```
\ No newline at end of file
src/main/resources/plugin-descriptor.properties
0 → 100644
浏览文件 @
3d47fa60
# Elasticsearch plugin descriptor file
# This file must exist as 'plugin-descriptor.properties' at
# the root directory of all plugins.
#
# A plugin can be 'site', 'jvm', or both.
#
### example site plugin for "foo":
#
# foo.zip <-- zip file for the plugin, with this structure:
# _site/ <-- the contents that will be served
# plugin-descriptor.properties <-- example contents below:
#
# site=true
# description=My cool plugin
# version=1.0
#
### example jvm plugin for "foo"
#
# foo.zip <-- zip file for the plugin, with this structure:
# <arbitrary name1>.jar <-- classes, resources, dependencies
# <arbitrary nameN>.jar <-- any number of jars
# plugin-descriptor.properties <-- example contents below:
#
# jvm=true
# classname=foo.bar.BazPlugin
# description=My cool plugin
# version=2.0.0-rc1
# elasticsearch.version=2.0
# java.version=1.7
#
### mandatory elements for all plugins:
#
# 'description': simple summary of the plugin
description
=
${project.description}
#
# 'version': plugin's version
version
=
${project.version}
#
# 'name': the plugin name
name
=
${elasticsearch.plugin.name}
### mandatory elements for site plugins:
#
# 'site': set to true to indicate contents of the _site/
# directory in the root of the plugin should be served.
site
=
${elasticsearch.plugin.site}
#
### mandatory elements for jvm plugins :
#
# 'jvm': true if the 'classname' class should be loaded
# from jar files in the root directory of the plugin.
# Note that only jar files in the root directory are
# added to the classpath for the plugin! If you need
# other resources, package them into a resources jar.
jvm
=
${elasticsearch.plugin.jvm}
#
# 'classname': the name of the class to load, fully-qualified.
classname
=
${elasticsearch.plugin.classname}
#
# 'java.version' version of java the code is built against
# use the system property java.specification.version
# version string must be a sequence of nonnegative decimal integers
# separated by "."'s and may have leading zeros
java.version
=
${maven.compiler.target}
#
# 'elasticsearch.version' version of elasticsearch compiled against
# You will have to release a new version of the plugin for each new
# elasticsearch release. This version is checked when the plugin
# is loaded so Elasticsearch will refuse to start in the presence of
# plugins with the incorrect elasticsearch.version.
elasticsearch.version
=
${elasticsearch.version}
#
### deprecated elements for jvm plugins :
#
# 'isolated': true if the plugin should have its own classloader.
# passing false is deprecated, and only intended to support plugins
# that have hard dependencies against each other. If this is
# not specified, then the plugin is isolated by default.
isolated
=
${elasticsearch.plugin.isolated}
#
\ No newline at end of file
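A quick way to sanity-check a rendered descriptor is to parse it and assert the mandatory JVM-plugin keys are present. A minimal Python sketch — the sample values are hypothetical stand-ins for the Maven-filtered `${...}` placeholders above:

```python
def parse_properties(text: str) -> dict:
    # Minimal .properties parser: skips comments/blank lines, splits on first '='.
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

# Hypothetical rendered output for this plugin (values as Maven filtering might fill them).
descriptor = """
# Elasticsearch plugin descriptor file
description=IK Analyzer for ElasticSearch
version=1.5.0
name=analysis-ik
jvm=true
classname=org.elasticsearch.plugin.analysis.ik.AnalysisIkPlugin
java.version=1.7
elasticsearch.version=2.0.0
"""

props = parse_properties(descriptor)
missing = {"description", "version", "name", "jvm", "classname",
           "java.version", "elasticsearch.version"} - props.keys()
print(missing)  # → set()
```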