
ElasticSearch Data Source

Last updated: 2024-09-06 15:05:49

DataInLong supports real-time writing to Elasticsearch. This article describes the current support for real-time data synchronization to Elasticsearch.

Supported Versions

DataInLong currently supports real-time whole-database writing to Elasticsearch. The following version restrictions apply to real-time writing:
Elasticsearch: 5.x, 6.x, 7.x

Usage Limits

Currently, only real-time whole-database synchronization from TDSQL-C MySQL to Elasticsearch is supported.
The document primary key (_id) in Elasticsearch is generated by default and cannot be modified.
Currently, only the document's _id is supported as the partition column.
Target indexes are created using Elasticsearch dynamic field mapping. The DDL change strategy therefore does not support automatically changing column types, deleting columns, or renaming columns (a renamed column is added as a new column).
Elasticsearch clusters with Kerberos authentication enabled are not supported.
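
The dynamic-mapping limitation above follows from how Elasticsearch builds mappings on the fly: a field is added the first time a document contains it, and later documents never change or remove an existing field's type. The Python sketch below is a simplified model of the default dynamic-mapping rules (date detection and numeric detection are ignored). It is not DataInLong's implementation, and the function names are hypothetical.

```python
from typing import Any, Dict, Optional


def infer_es_type(value: Any) -> Optional[str]:
    """Simplified default dynamic-mapping rules (date detection is ignored)."""
    if value is None:
        return None  # null values do not add a field
    if isinstance(value, bool):  # test bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # An array takes the type of its first non-null element
        items = [v for v in value if v is not None]
        return infer_es_type(items[0]) if items else None
    return "text"  # strings map to text (with a .keyword sub-field)


def merge_dynamic_mapping(mapping: Dict[str, str], doc: Dict[str, Any]) -> Dict[str, str]:
    """Return the mapping after indexing doc: unseen fields are added,
    existing fields are never retyped or removed."""
    merged = dict(mapping)
    for name, value in doc.items():
        es_type = infer_es_type(value)
        if es_type is not None and name not in merged:
            merged[name] = es_type
    return merged


# A source rename (name -> full_name) shows up as a new field; "name" stays.
before = merge_dynamic_mapping({}, {"id": 1, "name": "a"})
after = merge_dynamic_mapping(before, {"id": 2, "full_name": "b"})
print(after)
```

This is why a renamed source column appears as an additional field while the old field remains in the index mapping, and why a column type change is not reflected automatically.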

Real-time Whole-Database Write Configuration

Supported Data Sources

The following source types are currently supported for real-time whole-database synchronization to an Elasticsearch target:

Data Target Configuration

The configuration differs between Elasticsearch 6.x and later and versions earlier than 6.x. The parameters below apply to both; version-specific differences are noted in the parameter descriptions.

Parameter descriptions:
Data Flow Direction: Select the target data source for synchronization.
ES Version: Displayed automatically based on the selected data source.
Write mode: Upsert (update and write), which updates all fields of each record. This is currently the only supported mode.
Index/Type matching strategy: Determines the index and type names in Elasticsearch.
By default, for versions 6.x and later, the index has the same name as the source table, and the type is _doc and cannot be modified.
By default, for versions earlier than 6.x, the index has the same name as the source database, and the type has the same name as the source table.
Custom Definition: Combine built-in parameters and strings to generate target index/type names.
Note: For example, if the source table name is table1 and the mapping rule is ${table_name_di_src}_inlong, data from table1 is mapped to and written into table1_inlong.
Primary key value method: Currently, only default generation of _id values is supported.
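
The Index/Type matching strategy above can be sketched in Python as follows. This is a minimal illustration, assuming the version threshold and defaults described in the parameter descriptions; the helper names are hypothetical, and ${table_name_di_src} is the only built-in parameter taken from this page.

```python
import re
from typing import Dict, Tuple


def default_index_and_type(es_major: int, source_db: str, source_table: str) -> Tuple[str, str]:
    """Default (index, type) names per the rules above (hypothetical helper)."""
    if es_major >= 6:
        return source_table, "_doc"  # 6.x and later: type is fixed to _doc
    return source_db, source_table  # earlier than 6.x: index = database, type = table


def render_rule(rule: str, params: Dict[str, str]) -> str:
    """Replace ${name} placeholders in a custom mapping rule with their values."""
    def substitute(match: "re.Match[str]") -> str:
        key = match.group(1)
        if key not in params:
            raise KeyError(f"unknown built-in parameter: {key}")
        return params[key]

    return re.sub(r"\$\{(\w+)\}", substitute, rule)


print(default_index_and_type(7, "db1", "table1"))
print(default_index_and_type(5, "db1", "table1"))
print(render_rule("${table_name_di_src}_inlong", {"table_name_di_src": "table1"}))
```

Raising on an unknown placeholder, rather than leaving it in the name, is a design choice of this sketch; how DataInLong treats unrecognized parameters is not described here.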

Log Collection Write Node Configuration


Parameter descriptions:
Data Source: Select an available Elasticsearch data source in the current project.
Index: The name of the index in the Elasticsearch data source.
Type: Identified automatically from the index. For Elasticsearch 7.x, the default type is _doc.
Write mode: Elasticsearch supports only row-level updates, in which all fields of each record are updated.
Primary key value method: Three methods are supported. Source Table Primary Key: the document _id uses the primary key of the source table. Composite Primary Key: the document _id is determined by multiple columns of the source table. No Primary Key: the _id value is generated automatically.
Enable routing: Enables routing for index data in Elasticsearch. With routing enabled, you can control which shard (partition) stores each document.
Advanced Settings (optional): Configure parameters based on business requirements.
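
The primary key value methods and the routing option above map naturally onto an Elasticsearch Bulk API "index" action. The Python sketch below assumes the 7.x bulk format (no _type). The method names ("source_pk", "composite_pk", "none") and the "_" separator for composite keys are hypothetical, because this page does not say how DataInLong joins composite key columns.

```python
import json
from typing import Any, Dict, List, Optional


def document_id(record: Dict[str, Any], method: str,
                pk_columns: Optional[List[str]] = None, separator: str = "_") -> Optional[str]:
    """Build a document _id. Method names and the "_" separator are hypothetical."""
    if method == "source_pk":
        if not pk_columns or len(pk_columns) != 1:
            raise ValueError("source_pk expects exactly one primary-key column")
        return str(record[pk_columns[0]])
    if method == "composite_pk":
        if not pk_columns:
            raise ValueError("composite_pk expects at least one column")
        return separator.join(str(record[c]) for c in pk_columns)
    if method == "none":
        return None  # no _id in the request, so Elasticsearch generates one
    raise ValueError(f"unknown method: {method}")


def bulk_index_lines(index: str, record: Dict[str, Any],
                     doc_id: Optional[str], routing: Optional[str] = None) -> str:
    """Two NDJSON lines for a Bulk API "index" action (7.x format, no _type)."""
    meta: Dict[str, Any] = {"_index": index}
    if doc_id is not None:
        meta["_id"] = doc_id
    if routing is not None:
        # Documents with the same routing value are stored on the same shard
        meta["routing"] = routing
    return json.dumps({"index": meta}) + "\n" + json.dumps(record) + "\n"


row = {"id": 1, "region": "eu", "name": "a"}
doc_id = document_id(row, "composite_pk", ["id", "region"])
print(bulk_index_lines("orders", row, doc_id, routing="eu"), end="")
```

Because an "index" action with an existing _id overwrites the whole document, it matches the write mode described above, in which all fields of each record are updated. The Bulk API also requires every line, including the last, to end with a newline.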

Supported Data Type Conversions

Internal type → JSON type
CHAR / VARCHAR / STRING → string
BOOLEAN → boolean
BINARY / VARBINARY → string (base64-encoded)
DECIMAL → number
TINYINT → number
SMALLINT → number
INT → number
BIGINT → number
FLOAT → number
DOUBLE → number
DATE → string (format: date)
TIME → string (format: time)
TIMESTAMP → string (format: date-time)
TIMESTAMP_WITH_LOCAL_TIME_ZONE → string (format: date-time, UTC time zone)
INTERVAL → number
ARRAY → array
MAP / MULTISET → object
ROW → object
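
A minimal Python sketch of the conversion table above, assuming Python values stand in for the internal types. The exact string formats DataInLong emits are not stated on this page; this sketch assumes ISO 8601 via isoformat() and converts DECIMAL to a float, which can lose precision. INTERVAL and MULTISET are not modeled. The function name is hypothetical.

```python
import base64
import datetime as dt
import json
from decimal import Decimal
from typing import Any


def to_json_value(value: Any) -> Any:
    """Convert a Python value to a JSON-compatible value following the table above."""
    if isinstance(value, (bytes, bytearray)):
        return base64.b64encode(value).decode("ascii")  # BINARY/VARBINARY -> base64 string
    if isinstance(value, dt.datetime):  # check datetime before date (it is a subclass)
        if value.tzinfo is not None:
            value = value.astimezone(dt.timezone.utc)  # zone-aware timestamp -> UTC
        return value.isoformat()  # TIMESTAMP -> date-time string
    if isinstance(value, (dt.date, dt.time)):
        return value.isoformat()  # DATE -> date string, TIME -> time string
    if isinstance(value, Decimal):
        return float(value)  # DECIMAL -> number (may lose precision)
    if isinstance(value, dict):
        return {str(k): to_json_value(v) for k, v in value.items()}  # MAP/ROW -> object
    if isinstance(value, (list, tuple)):
        return [to_json_value(v) for v in value]  # ARRAY -> array
    return value  # str, bool, int, float and None pass through unchanged


row = {"payload": b"\x00\xff", "day": dt.date(2024, 9, 6), "price": Decimal("1.5")}
print(json.dumps(to_json_value(row)))
```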