
Installing ES for Second-Level Queries on PB-Scale Data

趣学程序-shaofeer
Published on 2019-09-18 02:49:11

What is ES? Elasticsearch is a search server built on Lucene. It provides a distributed, multitenant-capable full-text search engine exposed through a RESTful web interface. Elasticsearch is written in Java and released as open source under the Apache License, and it is a popular enterprise search engine. It is widely used in cloud environments because it offers real-time search and is stable, reliable, fast, and easy to install and use.

  • This tutorial uses version 6.4.3 of the software.
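
As a quick illustration of that RESTful interface: once a node is running, documents can be indexed and fetched with plain curl. This is only a sketch — the index name test and the sample document are placeholders, and it assumes a node listening on localhost:9200, which is not set up until later in this tutorial.

# index a document (ES 6.x requires the Content-Type header)
curl -H 'Content-Type: application/json' -X PUT 'http://localhost:9200/test/_doc/1' -d '{"title": "hello es"}'
# fetch it back
curl 'http://localhost:9200/test/_doc/1'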

Download

ElasticSearch:

https://www.elastic.co/cn/downloads/elasticsearch

Kibana:

https://www.elastic.co/cn/downloads/kibana

Upload to the Linux server

Extract and edit the configuration file

# NOTE: ES must be run as a regular user; do not run it as root.

# As root, create a regular user
# Add-user commands:
[root@hadoop137 ~]# useradd shaofei
[root@hadoop137 ~]# passwd shaofei
Changing password for user shaofei.
New password:
BAD PASSWORD: it is too simplistic/systematic
BAD PASSWORD: is too simple
Retype new password:
passwd: all authentication tokens updated successfully.

# Switch to the shaofei user:
[root@hadoop137 ~]# su shaofei
[shaofei@hadoop137 root]$ cd
[shaofei@hadoop137 ~]$

# Extract elasticsearch
[shaofei@hadoop137 softwear]$ tar -zxvf elasticsearch-6.4.3.tar.gz -C ../module/
[shaofei@hadoop137 elasticsearch-6.4.3]$ pwd
/opt/module/elasticsearch-6.4.3

# Edit the configuration file
[shaofei@hadoop137 elasticsearch-6.4.3]$ vim config/elasticsearch.yml
# Note: in the yml file, each setting needs a space after the colon

elasticsearch.yml

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
# Set the cluster name
cluster.name: myes
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
# Set this node's name -- it must be changed on each host after distribution in the next step
node.name: hadoop137
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data
#
# Path to log files:
#
#path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
# This node's IP address; I have mapped the hosts file, so a hostname is used here -- after distribution, change it to the address of each target host
network.host: hadoop137
#
# Set a custom port for HTTP:
# Port on which ES serves external requests
http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.zen.ping.unicast.hosts: ["host1", "host2"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
#discovery.zen.minimum_master_nodes:
#
# For more information, consult the zen discovery module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true

# Add the cluster's host IP addresses
discovery.zen.ping.unicast.hosts: ["192.168.23.137", "192.168.23.138", "192.168.23.139"]
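
The commented-out split-brain safeguard in the template above is also worth noting: for a three-node cluster like this one the majority is 3 / 2 + 1 = 2, so a setting along the following lines could be added as well (a suggestion based on the template comment, not part of the original configuration):

discovery.zen.minimum_master_nodes: 2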

Distribute to the other hosts in the cluster

[shaofei@hadoop137 module]$ rsync -rvl elasticsearch-6.4.3 shaofei@hadoop138:`pwd`
[shaofei@hadoop137 module]$ rsync -rvl elasticsearch-6.4.3 shaofei@hadoop139:`pwd`
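
After the copy, remember to adjust the per-node settings called out in the config comments above (node.name and network.host) on each target host, for example:

[shaofei@hadoop138 elasticsearch-6.4.3]$ vim config/elasticsearch.yml
# node.name: hadoop138
# network.host: hadoop138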

Start ES on each node in the cluster

[shaofei@hadoop137 elasticsearch-6.4.3]$ ./bin/elasticsearch

[shaofei@hadoop138 elasticsearch-6.4.3]$ ./bin/elasticsearch

[shaofei@hadoop139 elasticsearch-6.4.3]$ ./bin/elasticsearch

Problems encountered during startup:

ERROR: [4] bootstrap checks failed
[1]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]
[2]: max number of threads [1024] for user [shaofei] is too low, increase to at least [4096]
[3]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

For problem [1]:
[root@hadoop139 shaofei]# vi /etc/security/limits.conf
Append at the end of the file:
# End of file

* soft nofile 65536

* hard nofile 131072

* soft nproc 2048

* hard nproc 4096

For problem [2]:
[root@hadoop139 shaofei]# vim /etc/security/limits.d/90-nproc.conf
After editing:
[root@hadoop139 shaofei]# cat /etc/security/limits.d/90-nproc.conf
# Default limit for number of user's processes to prevent
# accidental fork bombs.
# See rhbz #432903 for reasoning.


*          soft    nproc     10240
root       soft    nproc     unlimited


# Log in again as the user for the new limits to take effect

For problem [3]:
vi /etc/sysctl.conf
Add the following setting:
vm.max_map_count=655360

Then run:
sysctl -p
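
You can confirm the kernel parameter took effect with a quick check (not part of the original post):

sysctl vm.max_map_count
# expected: vm.max_map_count = 655360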

java.lang.UnsupportedOperationException: seccomp unavailable: CONFIG_SECCOMP not compiled into kernel, CONFIG_SECCOMP and CONFIG_SECCOMP_FILTER are needed

# Edit elasticsearch.yml and add the following settings

bootstrap.memory_lock: false
bootstrap.system_call_filter: false

Once these problems are resolved, simply restart ES.
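
If you prefer not to keep a foreground terminal open on each node, Elasticsearch can also be started as a daemon; a minimal sketch using its standard -d/-p options (the pid-file name es.pid is just an example):

[shaofei@hadoop137 elasticsearch-6.4.3]$ ./bin/elasticsearch -d -p es.pid
# stop it later with:
[shaofei@hadoop137 elasticsearch-6.4.3]$ kill `cat es.pid`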

  • Startup succeeded

Visit: http://ip:9200
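
For a quick check over the REST API, the cluster health endpoint should report all three nodes (a minimal sketch assuming the hostnames used above):

curl 'http://hadoop137:9200/_cluster/health?pretty'
# expect "status" : "green" (or "yellow") and "number_of_nodes" : 3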

Install Kibana

  • Download the package
  • Extract it (see the sketch below)
  • Edit the configuration file
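
A minimal sketch of the download/extract step, assuming the 6.4.3 Linux tarball and the same /opt/module layout used for ES above:

[shaofei@hadoop137 softwear]$ tar -zxvf kibana-6.4.3-linux-x86_64.tar.gz -C ../module/
[shaofei@hadoop137 softwear]$ cd ../module/kibana-6.4.3-linux-x86_64/
[shaofei@hadoop137 kibana-6.4.3-linux-x86_64]$ vim config/kibana.yml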

kibana.yml

# Kibana is served by a back end server. This setting specifies the port to use.
# Configure the port
server.port: 5601

# Specifies the address to which the Kibana server will bind. IP addresses and host names are both valid values.
# The default is 'localhost', which usually means remote machines will not be able to connect.
# To allow connections from remote users, set this parameter to a non-loopback address.
# Configure the host IP
server.host: "hadoop137"

# Enables you to specify a path to mount Kibana at if you are running behind a proxy.
# Use the `server.rewriteBasePath` setting to tell Kibana if it should remove the basePath
# from requests it receives, and to prevent a deprecation warning at startup.
# This setting cannot end in a slash.
#server.basePath: ""

# Specifies whether Kibana should rewrite requests that are prefixed with
# `server.basePath` or require that they are rewritten by your reverse proxy.
# This setting was effectively always `false` before Kibana 6.3 and will
# default to `true` starting in Kibana 7.0.
#server.rewriteBasePath: false

# The maximum payload size in bytes for incoming server requests.
#server.maxPayloadBytes: 1048576

# The Kibana server's name.  This is used for display purposes.
#server.name: "your-hostname"

# The URL of the Elasticsearch instance to use for all your queries.
# Configure the Elasticsearch endpoint
elasticsearch.url: "http://hadoop137:9200"

# When this setting's value is true Kibana uses the hostname specified in the server.host
# setting. When the value of this setting is false, Kibana uses the hostname of the host
# that connects to this Kibana instance.
#elasticsearch.preserveHost: true

# Kibana uses an index in Elasticsearch to store saved searches, visualizations and
# dashboards. Kibana creates a new index if the index doesn't already exist.
#kibana.index: ".kibana"

# The default application to load.
#kibana.defaultAppId: "home"

# If your Elasticsearch is protected with basic authentication, these settings provide
# the username and password that the Kibana server uses to perform maintenance on the Kibana
# index at startup. Your Kibana users still need to authenticate with Elasticsearch, which
# is proxied through the Kibana server.
#elasticsearch.username: "user"
#elasticsearch.password: "pass"

# Enables SSL and paths to the PEM-format SSL certificate and SSL key files, respectively.
# These settings enable SSL for outgoing requests from the Kibana server to the browser.
#server.ssl.enabled: false
#server.ssl.certificate: /path/to/your/server.crt
#server.ssl.key: /path/to/your/server.key

# Optional settings that provide the paths to the PEM-format SSL certificate and key files.
# These files validate that your Elasticsearch backend uses the same key files.
#elasticsearch.ssl.certificate: /path/to/your/client.crt
#elasticsearch.ssl.key: /path/to/your/client.key

# Optional setting that enables you to specify a path to the PEM file for the certificate
# authority for your Elasticsearch instance.
#elasticsearch.ssl.certificateAuthorities: [ "/path/to/your/CA.pem" ]

# To disregard the validity of SSL certificates, change this setting's value to 'none'.
#elasticsearch.ssl.verificationMode: full

# Time in milliseconds to wait for Elasticsearch to respond to pings. Defaults to the value of
# the elasticsearch.requestTimeout setting.
#elasticsearch.pingTimeout: 1500

# Time in milliseconds to wait for responses from the back end or Elasticsearch. This value
# must be a positive integer.
#elasticsearch.requestTimeout: 30000

# List of Kibana client-side headers to send to Elasticsearch. To send *no* client-side
# headers, set this value to [] (an empty list).
#elasticsearch.requestHeadersWhitelist: [ authorization ]

# Header names and values that are sent to Elasticsearch. Any custom headers cannot be overwritten
# by client-side headers, regardless of the elasticsearch.requestHeadersWhitelist configuration.
#elasticsearch.customHeaders: {}

# Time in milliseconds for Elasticsearch to wait for responses from shards. Set to 0 to disable.
#elasticsearch.shardTimeout: 30000

# Time in milliseconds to wait for Elasticsearch at Kibana startup before retrying.
#elasticsearch.startupTimeout: 5000

# Logs queries sent to Elasticsearch. Requires logging.verbose set to true.
#elasticsearch.logQueries: false

# Specifies the path where Kibana creates the process ID file.
#pid.file: /var/run/kibana.pid

# Enables you specify a file where Kibana stores log output.
#logging.dest: stdout

# Set the value of this setting to true to suppress all logging output.
#logging.silent: false

# Set the value of this setting to true to suppress all logging output other than error messages.
#logging.quiet: false

# Set the value of this setting to true to log all events, including system usage information
# and all requests.
#logging.verbose: false

# Set the interval in milliseconds to sample system and process performance
# metrics. Minimum is 100ms. Defaults to 5000.
#ops.interval: 5000

# The default locale. This locale can be used in certain circumstances to substitute any missing
# translations.
#i18n.defaultLocale: "en"
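
With kibana.yml in place, start Kibana. A minimal sketch follows; running it under nohup so it survives logout is just one option, not something the original post prescribes:

[shaofei@hadoop137 kibana-6.4.3-linux-x86_64]$ nohup ./bin/kibana > kibana.log 2>&1 &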

Visit http://hadoop137:5601/ to verify that Kibana is up.
