Microservices: Elasticsearch
FANSEA Elasticsearch
Project Highlights
- Implemented search-box autocompletion for keywords and pinyin with a custom pinyin analyzer in ES
- Implemented efficient multi-condition search (keyword, distance, city, price range, etc.) with keyword highlighting
- Aggregated index data to build filters by hotel review score and count, rendering search conditions dynamically
- Kept the ES index and the database in sync via RabbitMQ
- Deployed an ES cluster combined with Ribbon for load balancing
Inverted Index Concept
Split the data into terms and store a mapping from each term to the document ids that contain it.
A MySQL like query scans rows one by one and is slow; for a simple lookup by id, however, the database is actually fast.
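The term-to-id mapping can be sketched in plain Java — a toy model for intuition only, not how Lucene actually stores postings (the class and method names here are made up for illustration):

```java
import java.util.*;

// Toy inverted index: tokenize each document, then map every term
// to the sorted set of document ids containing it.
public class InvertedIndex {
    private final Map<String, Set<Integer>> index = new HashMap<>();

    public void add(int docId, String text) {
        // Naive whitespace tokenizer; a real engine uses an analyzer (e.g. IK).
        for (String term : text.toLowerCase().split("\\s+")) {
            index.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    public Set<Integer> search(String term) {
        // One hash lookup instead of scanning every row like a SQL `like`.
        return index.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        InvertedIndex idx = new InvertedIndex();
        idx.add(1, "hotel near the river");
        idx.add(2, "cheap hotel downtown");
        idx.add(3, "downtown apartment");
        System.out.println(idx.search("hotel"));    // [1, 2]
        System.out.println(idx.search("downtown")); // [2, 3]
    }
}
```

Search cost is one map lookup per term, which is why term queries stay fast as the data grows.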
Elasticsearch architecture:
ES is mainly used for searching data (e.g. product queries); MySQL emphasizes persistence and consistency (e.g. order systems, blog systems).

IK Chinese Analyzer Plugin
IK analysis modes:
- ik_smart: coarsest segmentation (fewest tokens)
- ik_max_word: finest-grained segmentation (most tokens)
```
POST /_analyze
{
  "text": "我是黑马java程序员哈哈!",
  "analyzer": "ik_max_word"
}
```
IK dictionary management:
Some internet slang is missing from the built-in dictionary, while some words that should not appear need to be blocked; both are configured in IK's configuration file.
Edit IKAnalyzer.cfg.xml:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
  <comment>IK Analyzer extended configuration</comment>
  <entry key="ext_dict">ext.dic</entry>
  <entry key="ext_stopwords">stopword.dic</entry>
</properties>
```

Words written into ext.dic extend the dictionary; words written into stopword.dic become stop words.
Index Mapping Properties
mapping properties:
The index attribute defaults to true, meaning the field is added to the inverted index and is searchable.

Creating an index (analogous to a database table):
Like a MySQL table, you define the fields and their data types.

```
PUT /heima
{
  "mappings": {
    "properties": {
      "info": {
        "type": "text",
        "analyzer": "ik_smart"
      },
      "email": {
        "type": "keyword",
        "index": false
      },
      "name": {
        "properties": {
          "firstName": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
```
DSL Syntax
Index operations (database table)
Existing fields cannot be modified, but new fields can be added.

```
# Create the index
PUT /heima
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "ik_smart"
      },
      "email": {
        "type": "keyword",
        "index": false
      }
    }
  }
}

# Get the index
GET /heima

# Add a field to the mapping
PUT /heima/_mapping
{
  "properties": {
    "isRoot": {
      "type": "boolean",
      "index": false
    }
  }
}

# Delete the index
DELETE /heima
```
Document operations (table rows)
Essentially plain CRUD.

```
# Add a document
POST /heima/_doc/1
{
  "name": "赵云",
  "email": "3065941239@qq.cm",
  "isRoot": true
}

# Get a document
GET /heima/_doc/1

# Delete a document
DELETE /heima/_doc/1

# Update 1: full replacement
PUT /heima/_doc/1
{
  "name": "赵云",
  "email": "3065941239@qq.cm",
  "isRoot": false
}

# Update 2: partial update of a single field
POST /heima/_update/1
{
  "doc": {
    "isRoot": true
  }
}
```
Querying

```
# Full-text match on a single field (input is analyzed)
GET /hotel/_search
{
  "query": {
    "match": {
      "name": "酒店"
    }
  }
}

# Multi-field query
GET /hotel/_search
{
  "query": {
    "multi_match": {
      "query": "酒店",
      "fields": ["brand", "business", "name"]
    }
  }
}

# Range query
GET /hotel/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 10,
        "lte": 20
      }
    }
  }
}

# Exact-value query on a keyword field
GET /hotel/_search
{
  "query": {
    "term": {
      "business": {
        "value": "外滩"
      }
    }
  }
}

# Geo query: documents within a circle around a point
GET /hotel/_search
{
  "query": {
    "geo_distance": {
      "distance": "15km",
      "location": "31.21,121.5"
    }
  }
}

# Geo query: documents within a bounding box
GET /hotel/_search
{
  "query": {
    "geo_bounding_box": {
      "location": {
        "top_left": {
          "lat": 31.1,
          "lon": 121.5
        },
        "bottom_right": {
          "lat": 30.9,
          "lon": 121.7
        }
      }
    }
  }
}
```
Operating ES from Java (RestClient)
RestClient documentation
Design the index mapping from the database table

```
PUT /hotel
{
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "name": {
        "type": "text",
        "analyzer": "ik_max_word",
        "copy_to": "all"
      },
      "address": {
        "type": "keyword",
        "index": false
      },
      "price": {
        "type": "integer"
      },
      "score": {
        "type": "integer"
      },
      "brand": {
        "type": "keyword",
        "copy_to": "all"
      },
      "city": {
        "type": "keyword"
      },
      "starName": {
        "type": "keyword"
      },
      "business": {
        "type": "keyword",
        "copy_to": "all"
      },
      "location": {
        "type": "geo_point"
      },
      "pic": {
        "type": "keyword",
        "index": false
      },
      "all": {
        "type": "text",
        "analyzer": "ik_max_word"
      }
    }
  }
}
```
copy_to copies the field's value into all, so a single query against all searches every copied field at once.
JavaRestClient quick start
- Add the dependency
```xml
<properties>
  <java.version>1.8</java.version>
  <elasticsearch.version>7.12.1</elasticsearch.version>
</properties>

<dependency>
  <groupId>org.elasticsearch.client</groupId>
  <artifactId>elasticsearch-rest-high-level-client</artifactId>
</dependency>
```
- Basic usage
```java
RestHighLevelClient client;

@BeforeEach
void initialize() {
    this.client = new RestHighLevelClient(
            RestClient.builder(HttpHost.create("http://192.168.25.80:9200"))
    );
}

@AfterEach
void close() throws IOException {
    client.close();
}

@Test
void testClient() {
    System.out.println(client);
}
```
Index operations (indices)

```java
RestHighLevelClient client;

@Test
void testCreate() throws IOException {
    CreateIndexRequest request = new CreateIndexRequest("hotel");
    // HOTEL_INDEX is a constant holding the mapping JSON shown above
    request.source(HOTEL_INDEX, XContentType.JSON);
    client.indices().create(request, RequestOptions.DEFAULT);
}

@Test
void testDelete() throws IOException {
    DeleteIndexRequest request = new DeleteIndexRequest("hotel");
    client.indices().delete(request, RequestOptions.DEFAULT);
}

@Test
void testExists() throws IOException {
    GetIndexRequest request = new GetIndexRequest("hotel");
    boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
    System.out.println(exists);
}
```
Document operations (index)
```java
@Test
void testIndexAdd() throws IOException {
    Hotel hotel = hotelMapper.selectById(36934L);
    HotelDoc hotelDoc = new HotelDoc(hotel);
    IndexRequest request = new IndexRequest("hotel").id(hotelDoc.getId().toString());
    request.source(JSON.toJSONString(hotelDoc), XContentType.JSON);
    client.index(request, RequestOptions.DEFAULT);
}

@Test
void testIndexGet() throws IOException {
    GetRequest request = new GetRequest("hotel", "36934");
    GetResponse response = client.get(request, RequestOptions.DEFAULT);
    String json = response.getSourceAsString();
    System.out.println(json);
}

@Test
void testIndexUpdate() throws IOException {
    Hotel hotel = hotelMapper.selectById(36934L);
    HotelDoc hotelDoc = new HotelDoc(hotel);
    UpdateRequest request = new UpdateRequest("hotel", hotelDoc.getId().toString());
    request.doc("city", "北京");
    client.update(request, RequestOptions.DEFAULT);
}

@Test
void testIndexDelete() throws IOException {
    DeleteRequest request = new DeleteRequest("hotel", "36934");
    client.delete(request, RequestOptions.DEFAULT);
}

@Test
void testIndexBulk() throws IOException {
    List<Hotel> hotels = hotelMapper.selectList(null);
    BulkRequest request = new BulkRequest();
    for (Hotel hotel : hotels) {
        HotelDoc hotelDoc = new HotelDoc(hotel);
        request.add(new IndexRequest("hotel")
                .id(hotelDoc.getId().toString())
                .source(JSON.toJSONString(hotelDoc), XContentType.JSON));
    }
    client.bulk(request, RequestOptions.DEFAULT);
}
```
Relevance Scoring
- TF: scores a document purely by how often the query term appears in it.
- TF-IDF: keeps TF but additionally weights each term by its rarity across all documents, so common terms count for less.
- BM25: the default since Elasticsearch 5.0; refines TF-IDF so that the score saturates as term frequency grows instead of increasing without bound.

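To make the TF/TF-IDF distinction concrete, here is a minimal TF-IDF calculation in plain Java. This is a simplified sketch: real Lucene/BM25 scoring adds length normalization and saturation terms not shown here, and all names are illustrative.

```java
import java.util.*;

// Toy TF-IDF: tf = term count in the document, idf = ln(N / df),
// where N is corpus size and df is how many documents contain the term.
public class TfIdf {
    public static double score(String term, List<String> doc, List<List<String>> corpus) {
        long tf = doc.stream().filter(term::equals).count();
        long df = corpus.stream().filter(d -> d.contains(term)).count();
        if (tf == 0 || df == 0) return 0.0;
        double idf = Math.log((double) corpus.size() / df);
        return tf * idf;
    }

    public static void main(String[] args) {
        List<String> d1 = Arrays.asList("如家", "酒店", "酒店");
        List<String> d2 = Arrays.asList("汉庭", "酒店");
        List<String> d3 = Arrays.asList("民宿");
        List<List<String>> corpus = Arrays.asList(d1, d2, d3);
        // "如家" appears in only 1 of 3 docs, so its idf (and score) is higher
        // than "酒店", which appears in 2 of 3 docs despite tf = 2 in d1.
        System.out.println(score("如家", d1, corpus)); // 1 * ln(3/1)
        System.out.println(score("酒店", d1, corpus)); // 2 * ln(3/2)
    }
}
```

The point of the idf factor: a term found in every document scores zero, no matter how often it repeats.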

Modifying relevance scores with function_score

Compound queries (bool query)

```
GET /hotel/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "brand": {
              "value": "如家"
            }
          }
        }
      ],
      "must_not": [
        {
          "range": {
            "price": {
              "gt": 300
            }
          }
        }
      ],
      "filter": [
        {
          "geo_distance": {
            "distance": "10km",
            "location": {
              "lat": 31.3,
              "lon": 121.4
            }
          }
        }
      ]
    }
  }
}
```
Processing Search Results
The sort clause sits at the same level as query.
Sorting
```
# If score values tie, results fall back to price ascending
GET /hotel/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    { "score": "desc" },
    { "price": "asc" }
  ]
}
```
Pagination
Specify from and size.
```
GET /hotel/_search
{
  "query": {
    "match_all": {}
  },
  "from": 0,
  "size": 2,
  "sort": [
    {
      "price": {
        "order": "asc"
      }
    }
  ]
}
```
Search with RestClient
Single-field query
```java
@Test
void testSearch() throws IOException {
    SearchRequest request = new SearchRequest("hotel");
    request.source().query(QueryBuilders.rangeQuery("price").lte(500).gte(300));
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    handleResponse(response);
}
```
Compound query
```java
@Test
void testSearch() throws IOException {
    SearchRequest request = new SearchRequest("hotel");
    BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
    boolQuery.must(QueryBuilders.termQuery("brand", "如家"));
    boolQuery.mustNot(QueryBuilders.rangeQuery("price").gt(1000));
    boolQuery.filter(QueryBuilders.geoDistanceQuery("location")
            .point(31.1D, 121.5D)
            .distance("100km"));
    request.source().query(boolQuery);
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    handleResponse(response);
}
```
Pagination and sorting
```java
@Test
void testOrder() throws IOException {
    int page = 1;
    int size = 5;
    SearchRequest request = new SearchRequest("hotel");
    request.source().query(QueryBuilders.matchAllQuery());
    request.source().sort("price", SortOrder.ASC);
    request.source().from((page - 1) * size).size(size);
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    handleResponse(response);
}
```
Highlighting
requireFieldMatch is a HighlightBuilder setting that controls whether highlighting applies only to fields that matched the query. With requireFieldMatch set to true, Elasticsearch highlights matching text only in fields explicitly mentioned in the query: if you query field A but not field B, text in B is not highlighted even when it matches.
With requireFieldMatch set to false, Elasticsearch tries to highlight matching text in every field listed in the highlight request, not just the fields the query mentions.
```java
@Test
void testHighlight() throws IOException {
    SearchRequest request = new SearchRequest("hotel");
    request.source().query(QueryBuilders.matchQuery("all", "如家"));
    request.source().highlighter(new HighlightBuilder()
            .field("name")
            .requireFieldMatch(false));
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    handleResponse(response);
}
```
Heima Travel Case Study

Pinning ads to the top

- Add an ad flag
```java
public class HotelDoc {
    private Boolean isAD;
}
```
- Use a functionScore query to re-score documents where isAD is true
```java
FunctionScoreQueryBuilder scoreQuery = QueryBuilders.functionScoreQuery(
        boolQuery,
        new FunctionScoreQueryBuilder.FilterFunctionBuilder[]{
                new FunctionScoreQueryBuilder.FilterFunctionBuilder(
                        QueryBuilders.termQuery("isAD", true),
                        ScoreFunctionBuilders.weightFactorFunction(10)
                )
        });
request.source().query(scoreQuery);
```
Rendering filter options with aggregations


```java
public Map<String, List<String>> getMultiAggregation() {
    Map<String, List<String>> map = new HashMap<>();
    SearchRequest request = new SearchRequest("hotel");
    request.source().size(0);
    request.source().aggregation(AggregationBuilders.terms("brandAgg").field("brand").size(20));
    request.source().aggregation(AggregationBuilders.terms("cityAgg").field("city").size(20));
    request.source().aggregation(AggregationBuilders.terms("starAgg").field("starName").size(20));

    SearchResponse response = null;
    try {
        response = client.search(request, RequestOptions.DEFAULT);
    } catch (IOException e) {
        e.printStackTrace();
    }

    Aggregations aggregations = response.getAggregations();
    map.put("品牌", getAggList(aggregations, "brandAgg"));
    map.put("城市", getAggList(aggregations, "cityAgg"));
    map.put("星级", getAggList(aggregations, "starAgg"));
    return map;
}

private List<String> getAggList(Aggregations aggregations, String aggName) {
    Terms termsAgg = aggregations.get(aggName);
    List<String> list = new ArrayList<>();
    for (Terms.Bucket bucket : termsAgg.getBuckets()) {
        list.add(bucket.getKeyAsString());
    }
    return list;
}
```
Aggregations
Query documentation
Aggregations can be understood as classifying documents and then computing statistics per class, for example:
- bucketing by brand
- average, maximum, and minimum score per brand
Aggregation types

Aggregation syntax
```
GET /hotel/_search
{
  # Limit the scope of the aggregation
  "query": {
    "range": {
      "price": {
        "lte": 1000
      }
    }
  },
  # Return 0 hits so only the aggregation results are shown
  "size": 0,
  "aggs": {
    "brandAgg": {
      "terms": {
        "field": "brand",
        "size": 10,
        # order: sort buckets by a sub-aggregation value
        "order": {
          "scoreAgg.avg": "desc"
        }
      },
      # Sub-aggregation on score; each bucket carries its score stats
      "aggs": {
        "scoreAgg": {
          "stats": {
            "field": "score"
          }
        }
      }
    }
  }
}
```
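What a terms aggregation computes can be mimicked with Java streams — a toy sketch of bucketing and counting only, not how ES executes aggregations across shards (the class and method names are illustrative):

```java
import java.util.*;
import java.util.stream.*;

// Sketch of a terms aggregation: group values into buckets, count each
// bucket, and order buckets by document count descending.
public class TermsAggSketch {
    public static LinkedHashMap<String, Long> termsAgg(List<String> brands) {
        return brands.stream()
                .collect(Collectors.groupingBy(b -> b, Collectors.counting()))
                .entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        List<String> brands = Arrays.asList("如家", "汉庭", "如家", "如家", "汉庭", "希尔顿");
        System.out.println(termsAgg(brands)); // {如家=3, 汉庭=2, 希尔顿=1}
    }
}
```

The "size": 10 in the DSL above corresponds to keeping only the top buckets of this sorted map.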
Aggregations via the REST API


```java
@Test
void testAggregation() throws IOException {
    SearchRequest request = new SearchRequest("hotel");
    request.source().size(0);
    request.source().aggregation(AggregationBuilders.terms("brandAgg").field("brand").size(20));
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    System.out.println("response = " + response);
    Aggregations aggregations = response.getAggregations();
    Terms brandAgg = aggregations.get("brandAgg");
    List<? extends Terms.Bucket> buckets = brandAgg.getBuckets();
    for (Terms.Bucket bucket : buckets) {
        String key = bucket.getKeyAsString();
        System.out.println("key = " + key);
    }
}
```
Autocomplete
Pinyin analyzer
Download the pinyin analyzer plugin
Custom analyzer

analyzer: the analyzer used at indexing time when building the inverted index.
search_analyzer: the analyzer applied to the search input; its tokens are then matched against the inverted index.
```
PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "ik_max_word",
          "filter": "py"
        }
      },
      "filter": {
        "py": {
          "type": "pinyin",
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "remove_duplicated_term": true,
          "none_chinese_pinyin_tokenize": false
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "my_analyzer",
        "search_analyzer": "ik_max_word"
      },
      "id": {
        "type": "keyword"
      }
    }
  }
}
```
This filter block is a custom configuration of the pinyin token filter for Elasticsearch. The pinyin filter converts Chinese text into pinyin, which is very useful for fuzzy search and pinyin search over Chinese content.
Breakdown of each field in this filter configuration:
- type
  - Value: "pinyin"
  - Meaning: sets this token filter's type to the pinyin filter.
- keep_full_pinyin
  - Value: false
  - Meaning: whether to keep the full pinyin of each individual character as separate tokens. false means per-character full pinyin is not kept; only first letters or other configured forms are.
- keep_joined_full_pinyin
  - Value: true
  - Meaning: whether to keep the joined full pinyin of the whole term (e.g. "xiaomi").
- keep_original
  - Value: true
  - Meaning: whether to keep the original Chinese characters in the token stream.
- limit_first_letter_length
  - Value: 16
  - Meaning: caps the length of the first-letter token. Here at most 16 first letters are kept.
- remove_duplicated_term
  - Value: true
  - Meaning: whether to remove duplicate tokens from the analysis result.
- none_chinese_pinyin_tokenize
  - Value: false
  - Meaning: whether non-Chinese characters should be split by the pinyin tokenizer; false leaves them unchanged.
Analysis result:

```
POST /test/_analyze
{
  "text": ["小米手机"],
  "analyzer": "my_analyzer"
}
```

Tokens produced: 小米, xiaomi, xm, 手机, shouji, sj
Implementing autocomplete with the REST API
The completion field type
```
PUT test
{
  "mappings": {
    "properties": {
      "title": {
        "type": "completion"
      }
    }
  }
}

POST test/_doc
{
  "title": ["Sony", "WH-1000XM3"]
}

POST test/_doc
{
  "title": ["SK-II", "PITERA"]
}

POST test/_doc
{
  "title": ["Nintendo", "switch"]
}
```
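Conceptually, the completion suggester returns stored suggestion terms that start with the typed prefix. A toy Java sketch of that idea (illustrative only — ES actually builds an in-memory FST rather than scanning a sorted set; the names here are made up):

```java
import java.util.*;

// Toy autocomplete: store suggestion terms in a sorted set, then walk the
// range starting at the prefix until terms no longer match it.
public class PrefixSuggest {
    private final TreeSet<String> terms = new TreeSet<>();

    public void add(String... suggestions) {
        terms.addAll(Arrays.asList(suggestions));
    }

    public List<String> suggest(String prefix, int limit) {
        List<String> out = new ArrayList<>();
        // tailSet jumps straight to the first term >= prefix.
        for (String t : terms.tailSet(prefix)) {
            if (!t.startsWith(prefix) || out.size() == limit) break;
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        PrefixSuggest s = new PrefixSuggest();
        s.add("sony", "switch", "sk-ii", "shouji", "sj");
        System.out.println(s.suggest("s", 3));  // [shouji, sj, sk-ii]
        System.out.println(s.suggest("sw", 3)); // [switch]
    }
}
```

This is why the suggestion field is analyzed with keyword + pinyin above: each stored token (hanzi, full pinyin, first letters) becomes an independently matchable prefix.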
Sending the request
Redesign the index:
```
PUT /hotel
{
  "settings": {
    "analysis": {
      "analyzer": {
        "text_anlyzer": {
          "tokenizer": "ik_max_word",
          "filter": "py"
        },
        "completion_analyzer": {
          "tokenizer": "keyword",
          "filter": "py"
        }
      },
      "filter": {
        "py": {
          "type": "pinyin",
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "remove_duplicated_term": true,
          "none_chinese_pinyin_tokenize": false
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "name": {
        "type": "text",
        "analyzer": "text_anlyzer",
        "search_analyzer": "ik_smart",
        "copy_to": "all"
      },
      "address": {
        "type": "keyword",
        "index": false
      },
      "price": {
        "type": "integer"
      },
      "score": {
        "type": "integer"
      },
      "brand": {
        "type": "keyword",
        "copy_to": "all"
      },
      "city": {
        "type": "keyword"
      },
      "starName": {
        "type": "keyword"
      },
      "business": {
        "type": "keyword",
        "copy_to": "all"
      },
      "location": {
        "type": "geo_point"
      },
      "pic": {
        "type": "keyword",
        "index": false
      },
      "all": {
        "type": "text",
        "analyzer": "text_anlyzer",
        "search_analyzer": "ik_smart"
      },
      "suggestion": {
        "type": "completion",
        "analyzer": "completion_analyzer"
      }
    }
  }
}
```

Handling the result

Data Synchronization

Option 2: asynchronous notification

Option 3: listening to the MySQL binlog

Implementing data sync with a message queue

Setting Up an ES Cluster

Cluster split-brain

Shard storage
