Scenarios

Last updated: 2023-12-21 15:33:34

Elastic MapReduce (EMR) is fundamentally a cluster service for Hadoop and Spark, so it can accommodate any scenario that Hadoop and Spark support. The typical EMR application scenarios are described below.

Offline Data Analysis

Massive logs from game servers, web applications, mobile apps, and other business servers can be synchronized to EMR data nodes or to Cloud Object Storage (COS). With tools such as Hue, you can use mainstream computing frameworks such as Hive, Spark, and Presto to gain data insights quickly. Tools such as Sqoop can load data scattered across TencentDB instances or other storage engines, and the analyzed data can be synchronized back to TencentDB to support data visualization products such as RayData.

Stream Data Processing

Real-time data generated on business servers can be pushed to the CMQ message middleware through APIs or SDKs from your programs or tools. You can then select a suitable stream processing engine in EMR to analyze the data and raise real-time alerts on business changes. The analysis results can also be synchronized in real time to storage engines such as TencentDB, so that business status can be monitored in real time through data visualization products such as RayData.

Analyzing COS Data

Massive data stored in COS can be analyzed quickly with EMR, which fully separates storage from compute. With this design, you can take full advantage of the rich data synchronization tools that COS provides. It also allows multiple Hadoop clusters of different versions to analyze the same data, which addresses cases where several Hadoop clusters must coexist for data consistency or historical reasons.