Elasticsearch

Project Highlights

  1. Search-box autocomplete for both keywords and pinyin, built on a custom ES pinyin analyzer
  2. Efficient multi-condition search on ES by keyword, distance, city, price range, etc., with keyword highlighting
  3. Aggregations over the index data to render search filter options dynamically, using hotel reputation and document counts
  4. ES index and database kept in sync via RabbitMQ
  5. An ES cluster deployed with Ribbon for load balancing

The Inverted Index

The data is split into terms, and a mapping from each term to the ids of the documents containing it is stored.

A MySQL like query scans rows one by one and is slow; a simple lookup by id, on the other hand, is actually fast in the database.
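The idea can be sketched in plain Java. This is a toy illustration (a whitespace tokenizer and hypothetical document ids), not how Lucene actually stores its index:

```java
import java.util.*;

public class InvertedIndexDemo {
    // term -> sorted ids of the documents containing that term
    static Map<String, Set<Integer>> build(Map<Integer, String> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (Map.Entry<Integer, String> doc : docs.entrySet()) {
            for (String term : doc.getValue().split("\\s+")) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(doc.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> index = build(Map.of(
            1, "hilton hotel shanghai",
            2, "home inn shanghai",
            3, "hilton garden inn"));
        // a term lookup is now one map access instead of a scan over every row
        System.out.println(index.get("hilton"));   // [1, 3]
        System.out.println(index.get("shanghai")); // [1, 2]
    }
}
```

This is why a keyword search beats a like '%...%' scan: the term is looked up directly instead of being tested against every row.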

Elasticsearch architecture:

ES is mainly used for search (e.g., querying products); MySQL emphasizes data persistence and consistency (e.g., order systems, blog systems).


The IK Chinese Analysis Plugin

IK tokenization modes:

  • ik_smart: coarsest-grained splitting (fewest tokens)

  • ik_max_word: finest-grained splitting

POST /_analyze
{
  "text": "我是黑马java程序员哈哈!",
  "analyzer": "ik_max_word"
}

IK dictionary management:

Some internet slang is missing from the built-in dictionary, and some words should be blocked; both are handled through the plugin's configuration file.

Edit the IKAnalyzer.cfg.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
  <comment>IK Analyzer extension configuration</comment>
  <!-- extension dictionary -->
  <entry key="ext_dict">ext.dic</entry>
  <!-- extension stopword dictionary -->
  <entry key="ext_stopwords">stopword.dic</entry>
  <!-- remote extension dictionary -->
  <!-- <entry key="remote_ext_dict">words_location</entry> -->
  <!-- remote extension stopword dictionary -->
  <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>

Words added to ext.dic extend the dictionary; words added to stopword.dic become stopwords and are filtered out.

Index Mappings

mapping properties:

The index property defaults to true, meaning the field participates in the inverted index and is searchable.


Creating an index (analogous to a database table):

Like a MySQL table, an index needs its fields and data types defined.

PUT /heima
{
  "mappings": {
    "properties": {
      "info":{
        "type": "text",
        "analyzer": "ik_smart"
      },
      "email":{
        "type": "keyword",
        "index": "false"
      },
      "name":{
        "properties": {
          "firstName": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

DSL syntax:

Index operations (analogous to database tables)

Existing fields cannot be modified, but new fields can be added.


# Create an index
PUT /heima
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "ik_smart"
      },
      "email": {
        "type": "keyword",
        "index": false
      }
    }
  }
}

# Get an index
GET /heima

# Add a field to the mapping
PUT /heima/_mapping
{
  "properties": {
    "isRoot": {
      "type": "boolean",
      "index": false
    }
  }
}

# Delete an index
DELETE /heima

Document operations (analogous to table rows)

Essentially plain CRUD.

# Add a document
POST /heima/_doc/1
{
  "name": "赵云",
  "email": "3065941239@qq.com",
  "isRoot": true
}

# Get a document
GET /heima/_doc/1

# Delete a document
DELETE /heima/_doc/1

# Update 1: full replacement
PUT /heima/_doc/1
{
  "name": "赵云",
  "email": "3065941239@qq.com",
  "isRoot": false
}

# Update 2: partial (single-field) update
POST /heima/_update/1
{
  "doc": {
    "isRoot": true
  }
}

Queries


# match: analyzed full-text query on a single field
GET /hotel/_search
{
  "query": {
    "match": {
      "name": "酒店"
    }
  }
}

# multi_match: query several fields at once
GET /hotel/_search
{
  "query": {
    "multi_match": {
      "query": "酒店",
      "fields": ["brand", "business", "name"]
    }
  }
}

# range query
GET /hotel/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 10,
        "lte": 20
      }
    }
  }
}

# term: exact match on a keyword field
GET /hotel/_search
{
  "query": {
    "term": {
      "business": {
        "value": "外滩"
      }
    }
  }
}

# geo_distance: documents within a radius of a point
GET /hotel/_search
{
  "query": {
    "geo_distance": {
      "distance": "15km",
      "location": "31.21,121.5"
    }
  }
}

# geo_bounding_box: documents inside a rectangle
GET /hotel/_search
{
  "query": {
    "geo_bounding_box": {
      "location": {
        "top_left": {
          "lat": 31.1,
          "lon": 121.5
        },
        "bottom_right": {
          "lat": 30.9,
          "lon": 121.7
        }
      }
    }
  }
}

Operating ES from Java (RestClient)

RestClient documentation

Designing the index from the database table


PUT /hotel
{
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "name": {
        "type": "text",
        "analyzer": "ik_max_word",
        "copy_to": "all"
      },
      "address": {
        "type": "keyword",
        "index": false
      },
      "price": {
        "type": "integer"
      },
      "score": {
        "type": "integer"
      },
      "brand": {
        "type": "keyword",
        "copy_to": "all"
      },
      "city": {
        "type": "keyword"
      },
      "starName": {
        "type": "keyword"
      },
      "business": {
        "type": "keyword",
        "copy_to": "all"
      },
      "location": {
        "type": "geo_point"
      },
      "pic": {
        "type": "keyword",
        "index": false
      },
      "all": {
        "type": "text",
        "analyzer": "ik_max_word"
      }
    }
  }
}

copy_to copies a field's tokens into the all field, so a single query against all searches every field copied into it at once.
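Conceptually, copy_to merges the tokens of several fields into one extra searchable field at index time. A rough plain-Java analogy (the Hotel record and field names here are hypothetical, and a whitespace split stands in for the analyzer):

```java
import java.util.*;

public class CopyToDemo {
    record Hotel(String name, String brand, String business) {}

    // emulate copy_to: the virtual "all" field holds the tokens of every copied field
    static Set<String> allField(Hotel h) {
        Set<String> all = new HashSet<>();
        for (String field : new String[]{h.name(), h.brand(), h.business()}) {
            all.addAll(Arrays.asList(field.split("\\s+")));
        }
        return all;
    }

    public static void main(String[] args) {
        Hotel h = new Hotel("hilton hotel", "hilton", "the bund");
        Set<String> all = allField(h);
        // one query against "all" covers name, brand and business at once
        System.out.println(all.contains("bund"));   // true
        System.out.println(all.contains("hilton")); // true
    }
}
```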

JavaRestClient Quick Start

  1. Add the dependency
<properties>
  <java.version>1.8</java.version>
  <elasticsearch.version>7.12.1</elasticsearch.version>
</properties>

<dependency>
  <groupId>org.elasticsearch.client</groupId>
  <artifactId>elasticsearch-rest-high-level-client</artifactId>
</dependency>
  2. Basic usage
RestHighLevelClient client;

@BeforeEach
void initialize() {
    this.client = new RestHighLevelClient(
        RestClient.builder(HttpHost.create("http://192.168.25.80:9200"))
    );
}

@AfterEach
void close() throws IOException {
    client.close();
}

@Test
void testClient() {
    System.out.println(client);
}

Index operations (indices)


RestHighLevelClient client;

@Test
void testCreate() throws IOException {
    CreateIndexRequest request = new CreateIndexRequest("hotel");
    request.source(HOTEL_INDEX, XContentType.JSON); // HOTEL_INDEX holds the mapping JSON above
    client.indices().create(request, RequestOptions.DEFAULT);
}

@Test
void testDelete() throws IOException {
    DeleteIndexRequest request = new DeleteIndexRequest("hotel");
    client.indices().delete(request, RequestOptions.DEFAULT);
}

@Test
void testExists() throws IOException {
    GetIndexRequest request = new GetIndexRequest("hotel");
    boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
    System.out.println(exists);
}

Document operations (the index APIs)

// Add a document (also serves as a full-document update)
@Test
void testIndexAdd() throws IOException {
    Hotel hotel = hotelMapper.selectById(36934L);
    HotelDoc hotelDoc = new HotelDoc(hotel);
    IndexRequest request = new IndexRequest("hotel").id(hotelDoc.getId().toString());
    request.source(JSON.toJSONString(hotelDoc), XContentType.JSON);
    client.index(request, RequestOptions.DEFAULT);
}

// Get a document
@Test
void testIndexGet() throws IOException {
    GetRequest request = new GetRequest("hotel", "36934");
    GetResponse response = client.get(request, RequestOptions.DEFAULT);
    String json = response.getSourceAsString();
    System.out.println(json);
}

// Update a document: partial (single-field) update
@Test
void testIndexUpdate() throws IOException {
    Hotel hotel = hotelMapper.selectById(36934L);
    HotelDoc hotelDoc = new HotelDoc(hotel);
    UpdateRequest request = new UpdateRequest("hotel", hotelDoc.getId().toString());
    request.doc("city", "北京");
    client.update(request, RequestOptions.DEFAULT);
}

// Delete a document
@Test
void testIndexDelete() throws IOException {
    Hotel hotel = hotelMapper.selectById(36934L);
    HotelDoc hotelDoc = new HotelDoc(hotel);
    DeleteRequest request = new DeleteRequest("hotel", hotelDoc.getId().toString());
    client.delete(request, RequestOptions.DEFAULT);
}

// Bulk-add documents
@Test
void testIndexBulk() throws IOException {
    List<Hotel> hotels = hotelMapper.selectList(null);
    BulkRequest request = new BulkRequest();
    for (Hotel hotel : hotels) {
        HotelDoc hotelDoc = new HotelDoc(hotel);
        request.add(new IndexRequest("hotel")
            .id(hotelDoc.getId().toString())
            .source(JSON.toJSONString(hotelDoc), XContentType.JSON));
    }
    client.bulk(request, RequestOptions.DEFAULT);
}

Relevance Scoring

  1. The TF algorithm

Scores a document purely by how often the term occurs in it.

  2. The TF-IDF algorithm

Builds on TF, additionally weighing each term by how rare it is across the whole corpus.

  3. The BM25 algorithm

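The three scoring ideas can be sketched numerically. This is a simplified illustration, not Elasticsearch's exact implementation: the IDF here is the classic log(N/(n+1)) form rather than Lucene's BM25 IDF, and k1/b use their common default values:

```java
public class ScoringDemo {
    // TF: how often the term appears relative to document length
    static double tf(int termFreq, int docLen) {
        return (double) termFreq / docLen;
    }

    // IDF: rarer terms across the corpus weigh more (simplified form)
    static double idf(int docCount, int docsWithTerm) {
        return Math.log((double) docCount / (docsWithTerm + 1));
    }

    static double tfIdf(int termFreq, int docLen, int docCount, int docsWithTerm) {
        return tf(termFreq, docLen) * idf(docCount, docsWithTerm);
    }

    // BM25 per-term score: term frequency saturates, document length is normalized
    static double bm25(int termFreq, int docLen, double avgDocLen,
                       int docCount, int docsWithTerm) {
        double k1 = 1.2, b = 0.75; // common defaults
        double norm = termFreq + k1 * (1 - b + b * docLen / avgDocLen);
        return idf(docCount, docsWithTerm) * termFreq * (k1 + 1) / norm;
    }

    public static void main(String[] args) {
        // 1000 docs, the term occurs in 10 of them, 3 times in this 100-word doc
        System.out.printf("tf-idf = %.4f%n", tfIdf(3, 100, 1000, 10));
        System.out.printf("bm25   = %.4f%n", bm25(3, 100, 120.0, 1000, 10));
    }
}
```

Note how in bm25 a huge termFreq no longer grows the score linearly, which is BM25's main improvement over TF-IDF.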

Adjusting scores with function_score


Compound queries (bool query)


GET /hotel/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "brand": {
              "value": "如家"
            }
          }
        }
      ],
      "must_not": [
        {
          "range": {
            "price": {
              "gt": 300
            }
          }
        }
      ],
      "filter": [
        {
          "geo_distance": {
            "distance": "10km",
            "location": {
              "lat": 31.3,
              "lon": 121.4
            }
          }
        }
      ]
    }
  }
}

Processing Search Results

The sort clause sits at the same level as query.

Sorting

# When score values tie, results fall back to price ascending
GET /hotel/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "score": "desc"
    },
    {
      "price": "asc"
    }
  ]
}

Pagination

Specify from and size:

GET /hotel/_search
{
  "query": {
    "match_all": {}
  },
  "from": 0,
  "size": 2,
  "sort": [
    {
      "price": {
        "order": "asc"
      }
    }
  ]
}

Searching with RestClient

Single-field query

@Test
void testSearch() throws IOException {
    SearchRequest request = new SearchRequest("hotel");
    // request.source().query(QueryBuilders.matchQuery("all", "如家"));
    // request.source().query(QueryBuilders.multiMatchQuery("如家", "brand", "business", "name"));
    // request.source().query(QueryBuilders.termQuery("brand", "如家"));
    request.source().query(QueryBuilders.rangeQuery("price").lte(500).gte(300));
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    handleResponse(response);
}

Compound query

@Test
void testBoolSearch() throws IOException {
    SearchRequest request = new SearchRequest("hotel");
    BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
    boolQuery.must(QueryBuilders.termQuery("brand", "如家"));
    boolQuery.mustNot(QueryBuilders.rangeQuery("price").gt(1000));
    boolQuery.filter(QueryBuilders.geoDistanceQuery("location").point(31.1D, 121.5D).distance("100km"));
    request.source().query(boolQuery);
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    handleResponse(response);
}

Pagination and sorting

@Test
void testOrder() throws IOException {
    int page = 1;
    int size = 5;
    SearchRequest request = new SearchRequest("hotel");
    request.source().query(QueryBuilders.matchAllQuery());
    request.source().sort("price", SortOrder.ASC);
    request.source().from((page - 1) * size).size(size);
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    handleResponse(response);
}

Highlighting

requireFieldMatch is a HighlightBuilder setting that controls whether highlighting applies only to fields that actually matched the query. With requireFieldMatch set to true, Elasticsearch highlights matching text only in the fields the query explicitly targets: if you query field A but not field B, text in B is never highlighted even when it matches.

With requireFieldMatch set to false, Elasticsearch tries to highlight matching text in all fields named in the highlight request, not just the fields the query mentions.

@Test
void testHighlight() throws IOException {
    SearchRequest request = new SearchRequest("hotel");
    request.source().query(QueryBuilders.matchQuery("all", "如家"));
    request.source().highlighter(new HighlightBuilder().field("name").requireFieldMatch(false));
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    handleResponse(response);
}

The Heima Travel (黑马旅游) Case

Pinning ads to the top

  1. Add an ad flag to the document
public class HotelDoc {
    private Boolean isAD;
}
  2. Use a function_score query to re-score documents whose isAD is true
FunctionScoreQueryBuilder scoreQuery = QueryBuilders.functionScoreQuery(boolQuery,
    new FunctionScoreQueryBuilder.FilterFunctionBuilder[]{
        new FunctionScoreQueryBuilder.FilterFunctionBuilder(
            QueryBuilders.termQuery("isAD", true),
            ScoreFunctionBuilders.weightFactorFunction(10)
        )
    });
request.source().query(scoreQuery);

Rendering Page Filter Options with Aggregations

public Map<String, List<String>> getMultiAggregation() {
    Map<String, List<String>> map = new HashMap<>();
    SearchRequest request = new SearchRequest("hotel");
    request.source().size(0);
    request.source().aggregation(AggregationBuilders.terms("brandAgg").field("brand").size(20));
    request.source().aggregation(AggregationBuilders.terms("cityAgg").field("city").size(20));
    request.source().aggregation(AggregationBuilders.terms("starAgg").field("starName").size(20));
    SearchResponse response = null;
    try {
        response = client.search(request, RequestOptions.DEFAULT);
    } catch (IOException e) {
        e.printStackTrace();
    }
    Aggregations aggregations = response.getAggregations();
    map.put("品牌", getAggList(aggregations, "brandAgg"));
    map.put("城市", getAggList(aggregations, "cityAgg"));
    map.put("星级", getAggList(aggregations, "starAgg"));
    return map;
}

private List<String> getAggList(Aggregations aggregations, String aggName) {
    Terms termsAgg = aggregations.get(aggName);
    List<String> list = new ArrayList<>();
    for (Terms.Bucket bucket : termsAgg.getBuckets()) {
        list.add(bucket.getKeyAsString());
    }
    return list;
}

Aggregations

Querying documents

An aggregation can be understood as grouping documents into buckets and computing statistics over them, for example:

  1. bucketing hotels by brand
  2. the average, maximum, and minimum score per brand
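What a terms aggregation with a stats sub-aggregation computes can be reproduced with plain Java streams (a hypothetical Hotel record and toy data, purely for illustration):

```java
import java.util.*;
import java.util.stream.*;

public class AggDemo {
    record Hotel(String brand, int score) {}

    // the "terms" aggregation: bucket documents by brand
    static Map<String, List<Hotel>> termsAgg(List<Hotel> hotels) {
        return hotels.stream().collect(Collectors.groupingBy(Hotel::brand));
    }

    // the "stats" sub-aggregation: count/min/avg/max score inside one bucket
    static IntSummaryStatistics statsAgg(List<Hotel> bucket) {
        return bucket.stream().mapToInt(Hotel::score).summaryStatistics();
    }

    public static void main(String[] args) {
        List<Hotel> hotels = List.of(
            new Hotel("如家", 46), new Hotel("如家", 44), new Hotel("希尔顿", 48));
        termsAgg(hotels).forEach((brand, docs) -> {
            IntSummaryStatistics s = statsAgg(docs);
            System.out.println(brand + ": count=" + s.getCount()
                + " avg=" + s.getAverage() + " max=" + s.getMax() + " min=" + s.getMin());
        });
    }
}
```

ES does the same bucketing, but over the inverted index and distributed across shards.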

Aggregation types

Aggregation syntax

GET /hotel/_search
{
  # limit the scope of the aggregation
  "query": {
    "range": {
      "price": {
        "lte": 1000
      }
    }
  },
  # return no hits, only aggregation results
  "size": 0,
  "aggs": {
    "brandAgg": {
      "terms": {
        "field": "brand",
        "size": 10,
        # order buckets by the sub-aggregation's average score
        "order": {
          "scoreAgg.avg": "desc"
        }
      },
      # sub-aggregation: score stats included with each bucket
      "aggs": {
        "scoreAgg": {
          "stats": {
            "field": "score"
          }
        }
      }
    }
  }
}

Aggregations via the RestClient

@Test
void testAggregation() throws IOException {
    // build the aggregation
    SearchRequest request = new SearchRequest("hotel");
    request.source().size(0);
    request.source().aggregation(AggregationBuilders.terms("brandAgg").field("brand").size(20));
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    System.out.println("response = " + response);
    // parse the response
    Aggregations aggregations = response.getAggregations();
    Terms brandAgg = aggregations.get("brandAgg");
    List<? extends Terms.Bucket> buckets = brandAgg.getBuckets();
    for (Terms.Bucket bucket : buckets) {
        String key = bucket.getKeyAsString();
        System.out.println("key = " + key);
    }
}

Autocomplete

The Pinyin Analyzer

Download the pinyin analyzer plugin

Custom analyzer

analyzer: the analyzer used when building the inverted index, i.e. at indexing time.

search_analyzer: the analyzer applied to the search terms before matching them against the inverted index.

// Custom pinyin analyzer
PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "ik_max_word",
          "filter": "py"
        }
      },
      "filter": {
        "py": {
          "type": "pinyin",
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "remove_duplicated_term": true,
          "none_chinese_pinyin_tokenize": false
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "my_analyzer",
        "search_analyzer": "ik_max_word"
      },
      "id": {
        "type": "keyword"
      }
    }
  }
}

This filter block is a custom configuration of the Elasticsearch pinyin token filter, which converts Chinese text into pinyin; that is useful for fuzzy search and pinyin-based search over Chinese content.

The fields in the filter configuration:

  1. type
    • Value: "pinyin"
    • Sets the filter type to the pinyin filter.
  2. keep_full_pinyin
    • Value: false
    • Whether to keep each character's full pinyin as separate tokens; false means individual full-pinyin syllables are not kept.
  3. keep_joined_full_pinyin
    • Value: true
    • Whether to keep the joined full pinyin of the whole term (e.g. xiaomi).
  4. keep_original
    • Value: true
    • Whether to keep the original Chinese term in the token stream.
  5. limit_first_letter_length
    • Value: 16
    • Caps the length of the first-letter abbreviation token; here at most 16 first letters are kept.
  6. remove_duplicated_term
    • Value: true
    • Removes duplicate tokens from the result.
  7. none_chinese_pinyin_tokenize
    • Value: false
    • Whether non-Chinese text should be split by the pinyin tokenizer; false leaves non-Chinese text untouched.

Tokenization result:

POST /test/_analyze
{
  "text": ["小米手机"],
  "analyzer": "my_analyzer"
}

// resulting tokens
小米, xiaomi, xm, 手机, shouji, sj

Autocomplete with the RestClient

The completion field type

// Index for autocompletion
PUT test
{
  "mappings": {
    "properties": {
      "title": {
        "type": "completion"
      }
    }
  }
}

// sample data
POST test/_doc
{
  "title": ["Sony", "WH-1000XM3"]
}
POST test/_doc
{
  "title": ["SK-II", "PITERA"]
}
POST test/_doc
{
  "title": ["Nintendo", "switch"]
}

Sending the request

Redesigned index:

// Hotel index with autocomplete support
PUT /hotel
{
  "settings": {
    "analysis": {
      "analyzer": {
        "text_anlyzer": {
          "tokenizer": "ik_max_word",
          "filter": "py"
        },
        "completion_analyzer": {
          "tokenizer": "keyword",
          "filter": "py"
        }
      },
      "filter": {
        "py": {
          "type": "pinyin",
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "remove_duplicated_term": true,
          "none_chinese_pinyin_tokenize": false
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "name": {
        "type": "text",
        "analyzer": "text_anlyzer",
        "search_analyzer": "ik_smart",
        "copy_to": "all"
      },
      "address": {
        "type": "keyword",
        "index": false
      },
      "price": {
        "type": "integer"
      },
      "score": {
        "type": "integer"
      },
      "brand": {
        "type": "keyword",
        "copy_to": "all"
      },
      "city": {
        "type": "keyword"
      },
      "starName": {
        "type": "keyword"
      },
      "business": {
        "type": "keyword",
        "copy_to": "all"
      },
      "location": {
        "type": "geo_point"
      },
      "pic": {
        "type": "keyword",
        "index": false
      },
      "all": {
        "type": "text",
        "analyzer": "text_anlyzer",
        "search_analyzer": "ik_smart"
      },
      "suggestion": {
        "type": "completion",
        "analyzer": "completion_analyzer"
      }
    }
  }
}

Processing the result

Data Synchronization

Method 1: synchronous call

Method 2: asynchronous notification


Method 3: listen to the MySQL binlog


Data synchronization with a message queue


Building an ES Cluster

Cluster split-brain

Shard storage