My use case is simple. I use the AWS Data Catalog (Glue) as the metastore and have deployed an EMR cluster running Presto through CDK. The cluster will only have the default user running queries. By default the master user is hadoop, and I can use it to connect to the cluster over JDBC and run queries. However, I can make that connection without any password. I have read the Presto documentation, and it mentions LDAP, Kerberos, and file-based authentication. I just want it to behave like, say, a MySQL database, where I have to pass both a username and a password in order to connect. Yet for the life of me I cannot find a configuration setting for a password. Here is my setup so far:
{
  classification: 'spark-hive-site',
  configurationProperties: {
    'hive.metastore.client.factory.class': 'com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory',
  },
},
{
  classification: 'emrfs-site',
  configurationProperties: {
    'fs.s3.maxConnections': '5000',
    'fs.s3.maxRetries': '200',
  },
},
{
  classification: 'presto-connector-hive',
  configurationProperties: {
    'hive.metastore.glue.datacatalog.enabled': 'true',
    'hive.parquet.use-column-names': 'true',
    'hive.max-partitions-per-writers': '7000000',
    'hive.table-statistics-enabled': 'true',
    'hive.metastore.glue.max-connections': '20',
    'hive.metastore.glue.max-error-retries': '10',
    'hive.s3.use-instance-credentials': 'true',
    'hive.s3.max-error-retries': '200',
    'hive.s3.max-client-retries': '100',
    'hive.s3.max-connections': '5000',
  },
},

Which setting can I use to set a password for the hadoop user? Kerberos, LDAP, and file-based authentication all seem far too complicated for this simple use case. Am I missing something obvious?
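For context, this is roughly what the passwordless access looks like today. The CLI, port, and catalog name are just what my EMR defaults happen to be, so treat this as a sketch rather than anything universal:

# From the master node: connects and runs a query with no password at all
presto-cli --server localhost:8889 --catalog hive --schema default --user hadoop --execute 'SELECT 1'
# The JDBC equivalent behaves the same way (again, no password required):
#   jdbc:presto://<master-node-dns>:8889/hive/default?user=hadoop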
EDIT: After reading countless pages of documentation and talking to AWS Support, I decided to switch to Trino, but ran into even more problems. Here is the current configuration of my CDK deployment:
configurations: [
  {
    classification: 'spark-hive-site',
    configurationProperties: {
      'hive.metastore.client.factory.class': 'com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory',
    },
  },
  {
    classification: 'emrfs-site',
    configurationProperties: {
      'fs.s3.maxConnections': '5000',
      'fs.s3.maxRetries': '200',
    },
  },
  {
    classification: 'presto-connector-hive',
    configurationProperties: {
      'hive.metastore.glue.datacatalog.enabled': 'true',
      'hive.parquet.use-column-names': 'true',
      'hive.max-partitions-per-writers': '7000000',
      'hive.table-statistics-enabled': 'true',
      'hive.metastore.glue.max-connections': '20',
      'hive.metastore.glue.max-error-retries': '10',
      'hive.s3.use-instance-credentials': 'true',
      'hive.s3.max-error-retries': '200',
      'hive.s3.max-client-retries': '100',
      'hive.s3.max-connections': '5000',
    },
  },
  {
    classification: 'trino-config',
    configurationProperties: {
      'query.max-memory-per-node': `${instanceMemory * 0.15}GB`, // 25% of a node
      'query.max-total-memory-per-node': `${instanceMemory * 0.5}GB`, // 50% of a node
      'query.max-memory': `${instanceMemory * 0.5 * coreInstanceGroupNodeCount}GB`, // 50% of the cluster
      'query.max-total-memory': `${instanceMemory * 0.8 * coreInstanceGroupNodeCount}GB`, // 80% of the cluster
      'query.low-memory-killer.policy': 'none',
      'task.concurrency': vcpuCount.toString(),
      'task.max-worker-threads': (vcpuCount * 4).toString(),
      'http-server.authentication.type': 'PASSWORD',
      'http-server.http.enabled': 'false',
      'internal-communication.shared-secret': 'abcdefghijklnmopqrstuvwxyz',
      'http-server.https.enabled': 'true',
      'http-server.https.port': '8443',
      'http-server.https.keystore.path': '/home/hadoop/fullCert.pem',
    },
  },
  {
    classification: 'trino-password-authenticator',
    configurationProperties: {
      'password-authenticator.name': 'file',
      'file.password-file': '/home/hadoop/password.db',
      'file.refresh-period': '5s',
      'file.auth-token-cache.max-size': '1000',
    },
  },
],

I started from here: https://trino.io/docs/current/security/tls.html
The approach I went with is:
"Secure the Trino server directly. This requires you to obtain a valid certificate and add it to the Trino coordinator's configuration."
I have already obtained an internal wildcard certificate from my company. Going by https://trino.io/docs/current/security/inspect-pem.html, it seems I need to combine these three pieces into a single file, which has to look like this:
-----BEGIN RSA PRIVATE KEY-----
Content of private key
-----END RSA PRIVATE KEY-----
-----BEGIN CERTIFICATE-----
Content of certificate text
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
First content of chain
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
Second content of chain
-----END CERTIFICATE-----然后,通过引导操作,我将文件放入所有节点。这样我就可以完成这个:https://trino.io/docs/current/security/tls.html#configure-the-coordinator和这些吐露:
'http-server.https.enabled': 'true',
'http-server.https.port': '8443',
'http-server.https.keystore.path': '/home/hadoop/fullCert.pem',

I am sure the file is deployed to the nodes.
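For reference, this is roughly how I build the combined PEM before staging it for the bootstrap action; the input file names are placeholders for the pieces of my company's wildcard certificate:

# Key first, then the leaf certificate, then the intermediate chain:
cat wildcard-private-key.pem wildcard-cert.pem intermediate-chain.pem > fullCert.pem
# Quick sanity checks before shipping it to the nodes:
openssl rsa -in fullCert.pem -check -noout
openssl x509 -in wildcard-cert.pem -noout -subject -enddate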
Then I continued with https://trino.io/docs/current/security/password-file.html. I also know that particular part works, because if I get the password wrong when using the Trino CLI directly on the master node, I get a credentials error.
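For completeness, the password file referenced by file.password-file was created roughly like this, following the steps in that page (htpasswd comes from the httpd-tools package on Amazon Linux; the user name here is just my example):

# Create the file and add a bcrypt-hashed entry for the hadoop user
touch /home/hadoop/password.db
htpasswd -B -C 10 /home/hadoop/password.db hadoop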
Now, this is where I am stuck:
[hadoop@ip-10-0-10-245 ~]$ trino-cli --server https://localhost:8446 --catalog awsdatacatalog --user hadoop --password --insecure
trino> select 1;
Query 20220701_201620_00001_9nksi failed: Insufficient active worker nodes. Waited 5.00m for at least 1 workers, but only 0 workers are active

In /var/log/trino/server.log I see:
2022-07-01T21:30:12.966Z WARN http-client-node-manager-51 io.trino.metadata.RemoteNodeState Error fetching node state from https://ip-10-0-10-245.ec2.internal:8446/v1/info/state: Failed communicating with server: https://ip-10-0-10-245.ec2.internal:8446/v1/info/state
2022-07-01T21:30:13.902Z ERROR Announcer-0 io.airlift.discovery.client.Announcer Service announcement failed after 8.11ms. Next request will happen within 1000.00ms
2022-07-01T21:30:14.913Z ERROR Announcer-1 io.airlift.discovery.client.Announcer Service announcement failed after 10.35ms. Next request will happen within 1000.00ms
2022-07-01T21:30:15.921Z ERROR Announcer-3 io.airlift.discovery.client.Announcer Service announcement failed after 8.40ms. Next request will happen within 1000.00ms
2022-07-01T21:30:16.930Z ERROR Announcer-0 io.airlift.discovery.client.Announcer Service announcement failed after 8.59ms. Next request will happen within 1000.00ms
2022-07-01T21:30:17.938Z ERROR Announcer-1 io.airlift.discovery.client.Announcer Service announcement failed after 8.36ms. Next request will happen within 1000.00ms

The same thing happens with this:
[hadoop@ip-10-0-10-245 ~]$ trino-cli --server https://localhost:8446 --catalog awsdatacatalog --user hadoop --password
trino> select 1;
Error running command: javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
trino>

This is even though I followed this command to upload the .pem file as an asset to S3:
Am I wrong in saying that something this simple should not be this complicated? I would really appreciate any help here.
Posted on 2022-07-06 18:00:58
Based on the message you are getting from Trino, Insufficient active worker nodes, the authentication system is working and your problem is now with secure internal communication. Specifically, the machines are having trouble talking to each other. I would first disable internal TLS, verify that everything works, and only then enable it (assuming you even need it in your environment). To disable it, use:
internal-communication.shared-secret=<secret>
internal-communication.https.required=false
discovery.uri=http://<coordinator ip address>:<http port>

That should bring all of your machines back. You should not see Service announcement failed. There may be a few error messages while the machines start up, but once they establish communication with each other, the errors should stop.
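As a quick sanity check once the nodes are back, you can ask the coordinator which workers it can see; every node should show up with state active and the count should match your cluster size. This is only a sketch that reuses the server URL and flags from your own commands above; adjust them to whatever works for you:

trino-cli --server https://localhost:8446 --user hadoop --password --insecure \
  --execute 'SELECT node_id, http_uri, state FROM system.runtime.nodes;'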
https://stackoverflow.com/questions/72793713