Hive Metadata Database

Last updated: 2024-01-12 11:12:27

Feature Overview

When you deploy the optional Hive component in a newly created EMR cluster, the system offers two ways to store Hive metadata so that it can be managed in one place. The first is the cluster default: Hive metadata is stored in a MetaDB that is purchased automatically and separately for the cluster. The second is to associate an external Hive metadata database, either an EMR-MetaDB or a self-built MySQL database. With the second method, the metadata is stored in the associated database and is not destroyed with the cluster.

Preparations

Cluster Default: A MetaDB cloud database instance is purchased automatically and separately to serve as the metadata storage location. Hive metadata is stored there together with the metadata of other components, and it is destroyed along with the MetaDB cloud database when the cluster is destroyed. To preserve the metadata, you must manually save it from the cloud database in advance.
Hive metadata is stored together with the metadata of the Hue, Ranger, Oozie, Presto, Druid, and Superset components.
A separate MetaDB must be purchased for the cluster as its metadata storage unit.
The MetaDB is destroyed along with the cluster, so the metadata is destroyed with the cluster as well.
Associated EMR-MetaDB: When the cluster is created, the system looks up the available MetaDB in the cloud, and the new cluster's Hive component stores its metadata there. No separate MetaDB purchase is needed, which saves costs, and the Hive metadata is not destroyed with the current cluster.
An available MetaDB instance ID is the ID of an existing MetaDB in an EMR cluster under the same account.
If one or more of the Hue, Ranger, Oozie, Druid, or Superset components are selected, the system automatically purchases a MetaDB to store the metadata of those components. Hive metadata is not stored in it.
An associated EMR-MetaDB must be destroyed through the cloud database. Once it is destroyed, the Hive metadata database cannot be recovered.
Make sure that the associated EMR-MetaDB and the newly created cluster are in the same network environment.
Associated self-built MySQL database: Using your own self-built MySQL database as Hive metadata storage also removes the need to purchase a separate MetaDB, which saves costs. You must accurately fill in the local address starting with "jdbc:mysql://", the database name, and the database login password, and make sure that the database's network is connected to the current cluster's network.
Make sure that the self-built database and the EMR cluster are in the same network.
Fill in the database username and password accurately.
If one or more of the Hue, Ranger, Oozie, Druid, or Superset components are selected, the system automatically purchases a MetaDB to store the metadata of those components. Hive metadata is not stored in it.
Make sure that the Hive metadata version in the self-built database is greater than or equal to the Hive version in the new cluster.
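The restrictions for a self-built MySQL database can be checked before you fill in the form. The following is a minimal Python sketch, not part of the EMR product: the function names, the sample address, and the version strings are hypothetical, and the version comparison is only a literal reading of the "greater than or equal to" rule above, assuming purely numeric dotted versions. The metadata version would have to be read from your own database (Hive metastore schemas commonly record it in a VERSION table), which this sketch does not do.

```python
import socket
from urllib.parse import urlsplit

JDBC_PREFIX = "jdbc:mysql://"
DEFAULT_MYSQL_PORT = 3306  # standard MySQL port, used when the address omits one


def parse_jdbc_mysql_url(url):
    """Split 'jdbc:mysql://host[:port]/[database]' into (host, port, database)."""
    if not url.startswith(JDBC_PREFIX):
        raise ValueError("address must start with " + JDBC_PREFIX)
    # Re-use the generic URL parser by dropping the leading 'jdbc:' part.
    parts = urlsplit("mysql://" + url[len(JDBC_PREFIX):])
    if not parts.hostname:
        raise ValueError("address has no host")
    return parts.hostname, parts.port or DEFAULT_MYSQL_PORT, parts.path.lstrip("/")


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def version_tuple(version, width=3):
    """Turn '3.1' or '3.1.2' into a fixed-width tuple of integers."""
    numbers = [int(part) for part in version.split(".")]
    numbers += [0] * (width - len(numbers))  # pad short versions with zeros
    return tuple(numbers[:width])


def metastore_is_compatible(metadata_version, hive_version):
    """Literal reading of the rule: metadata version >= cluster Hive version."""
    return version_tuple(metadata_version) >= version_tuple(hive_version)


if __name__ == "__main__":
    # Hypothetical address; replace it with your own database address.
    host, port, database = parse_jdbc_mysql_url("jdbc:mysql://10.0.0.8:3306/hivemeta")
    print(host, port, database)
    print("metadata version compatible:", metastore_is_compatible("3.1.0", "2.3.7"))
```

Run is_reachable(host, port) from a machine inside the cluster's network, since a successful connection from elsewhere says nothing about whether the cluster itself can reach the database.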

Instructions

Create cluster

1. Log in to your Tencent Cloud account, click Purchase Now, and in the Optional Components section of the Available Zone and Software Configuration page, select the Hive component.
2. For Hive metadata storage, choose the option that fits your needs: cluster default, associated EMR-MetaDB, or associated self-built MySQL database.
3. Configure the option you selected, following the restrictions listed under Preparations.

Install the Hive component afterwards

1. After the cluster has been created, log in to the EMR Console, go to the Cluster List page, and click the ID/Name of the cluster you want to manage.
2. In Cluster Services, select Add Component and install the Hive component.
3. For Hive metadata storage, choose the option that fits your needs: cluster default, associated EMR-MetaDB, or associated self-built MySQL database.
4. Configure the option you selected, following the restrictions listed under Preparations.