
Overview

Last updated: 2024-08-22 09:52:52

WeData is a one-stop, cloud-based data development and governance platform. It integrates full-chain DataOps capabilities, including data integration (DataInLong), data development, and task operations and maintenance, together with data governance and operations capabilities such as data map, data quality, and data security, helping enterprises reduce costs, improve efficiency, and maximize data value throughout data construction and application.

Positioning

Target Industries and Users

Suitable for industries such as government, finance, Internet, industrial, energy, transportation, education, culture and tourism, real estate, retail, healthcare, and media. The target audience includes but is not limited to:
Technical personnel engaged in data development, algorithm development, or data operations and maintenance.
Business personnel engaged in data analysis or product operations.
Management personnel responsible for data security and compliance.
Management personnel in charge of the company's core data assets.

Business Challenges and Pain Points

With the information technology revolution, the rapid development of the mobile Internet, and the continued adoption of the Internet+ concept, enterprises across industries have accumulated ever more data, creating an urgent need for data processing and data application. This process, however, presents many problems and challenges:
Complex infrastructure construction: Big data technologies such as Hadoop and Spark are numerous, and building and maintaining them is complex.
Weak resistance to technical risk: Development and testing are disconnected, data error rates are high, data tasks are numerous with complex dependencies, and effective change control is lacking.
Complex data links: An open-source project often solves only a specific scenario, so multiple open-source projects must be combined to build a complete data link.
Difficult data management: Cross-departmental and cross-team collaboration involves complex team roles and high communication costs.
Difficult data governance implementation: Data quality and data security cannot be guaranteed, making upper-level applications hesitant to use the data.
Long business build cycles: Data warehouse construction takes six months to a year, and responses to data requirements are slow, with delays of two to three days.

Core Capabilities

WeData provides comprehensive product services for data production and consumption. The core service capabilities are as follows:

Collaboration

Based on a collaborative space built around the data value chain, WeData enables different roles in a data team to work together effectively, breaking down silos between teams and shortening the path from raw data to data value.
DataOps Concept
In large-scale, high-concurrency task development scenarios, development and orchestration are handled by separate roles:
Developers focus on task development and unit testing, without having to learn the orchestration business logic.
Orchestration personnel focus on task orchestration and scheduling configuration; this specialization shortens the delivery cycle.
In agile development scenarios, development and orchestration are integrated to improve efficiency:
Data task development is completed in the course of implementing the orchestration business logic.
Data logic and business logic can be tested simultaneously.
Implementation Process
Develop first, then orchestrate: Workflow design does not block development work, and developers do not need to understand the orchestration logic.
After development is completed in the development space, tasks are imported into the orchestration space for dedicated orchestration.
Suitable for central teams handling large-scale, high-concurrency development tasks.
Orchestrate first, then develop: Developers who understand the business logic design the workflow first, then develop the tasks.
Tasks are orchestrated directly, and development and testing are carried out in the orchestration space, which is more agile.
Suitable for teams handling small-scale or incremental tasks in an agile development model.
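Either way, orchestration ultimately hands a workflow of interdependent tasks to a scheduler that must run each task only after its upstream dependencies finish. The sketch below is purely illustrative (it is not WeData's actual API); it shows the core idea of resolving a task DAG into an execution order with Kahn's algorithm:

```python
from collections import defaultdict, deque

def schedule(tasks, deps):
    """Return an execution order in which every task runs after its dependencies.

    tasks: iterable of task names
    deps:  list of (upstream, downstream) pairs
    """
    indegree = {t: 0 for t in tasks}
    downstream = defaultdict(list)
    for up, down in deps:
        downstream[up].append(down)
        indegree[down] += 1

    # Tasks with no unmet dependencies are ready to run.
    ready = deque(t for t, d in indegree.items() if d == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for nxt in downstream[task]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(indegree):
        raise ValueError("cyclic dependency detected")  # a workflow must be a DAG
    return order

# Hypothetical workflow: ingest -> clean -> {aggregate, report}; report also needs aggregate.
order = schedule(
    ["ingest", "clean", "aggregate", "report"],
    [("ingest", "clean"), ("clean", "aggregate"),
     ("clean", "report"), ("aggregate", "report")],
)
print(order)  # ['ingest', 'clean', 'aggregate', 'report']
```

A production scheduler adds concurrency limits, retries, and calendar triggers on top of this ordering, but the dependency resolution itself is the same.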

Efficiency

Based on the DataOps concept, agile iteration, automated processes, and tooling improve data reliability and speed up the data generation and analysis pipeline.
Agile and easy to use: Supports incremental code development and release, code auto-completion, visual drag-and-drop process design, and online code debugging and log viewing.
Flexible development: Development modes adapt to multiple scenarios, supporting both develop-first-then-orchestrate and orchestrate-first-then-develop.
High performance and scalability: A high-performance scheduling engine supports millions of task schedules per day; the platform integrates with multiple compute engines and supports engine extensions, with default support for EMR, DLC, TBDS, RDS, and more than 20 other engines that provide JDBC interfaces.
DataOps Concept
Version management supports submission, comparison, and rollback, enabling gray release of tasks.
Supports incremental release of tasks, events, parameters, and functions, replacing traditional full cyclical releases.
Agile development and rapid iteration shorten the overall data assetization cycle.
Implementation Process
After a data task is developed, a version must be submitted before it takes effect in the workflow.
Different task versions can be quickly debugged within the same workflow.
Gray release is implemented by running different task versions in different workflows of the same project.
Releases can be made incrementally by date in release management, enabling fast iteration.
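The submit/compare/rollback cycle described above can be pictured with a minimal version store. This is an illustrative sketch only, not WeData's actual data model; the class and method names are assumptions made for the example:

```python
class TaskVersions:
    """Minimal illustration of submit / compare / rollback for one task's code."""

    def __init__(self):
        self._versions = []   # committed code, v1 at index 0
        self._current = None  # version number currently live in the workflow

    def submit(self, code):
        """Commit a new version and make it the live one (v1, v2, ...)."""
        self._versions.append(code)
        self._current = len(self._versions)
        return self._current

    def compare(self, v_a, v_b):
        """True if two committed versions have identical code."""
        return self._versions[v_a - 1] == self._versions[v_b - 1]

    def rollback(self, version):
        """Point the workflow back at an earlier committed version."""
        if not 1 <= version <= len(self._versions):
            raise ValueError("unknown version")
        self._current = version

    @property
    def current(self):
        return self._current, self._versions[self._current - 1]

tv = TaskVersions()
tv.submit("SELECT * FROM sales")                            # v1
tv.submit("SELECT * FROM sales WHERE dt = '${run_date}'")   # v2
tv.rollback(1)                                              # revert workflow to v1
print(tv.current)  # (1, 'SELECT * FROM sales')
```

Gray release follows the same idea: two workflows in one project point at different version numbers of the same task, and promotion is just moving the remaining workflow's pointer.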

Integrated

Serving the multiple roles involved in enterprise data management, data production, data application, and data operations, WeData provides an integrated product experience from different perspectives.
Full-link production governance: Provides quality and security guarantees for data production and consumption through pre-event planning, in-process exception blocking, post-event quality and cost analysis, and data flow security control.
One-stop operational governance: Based on data self-service and data democratization, and built on a secure and stable foundation, data maps, data insights, and data sharing make data easier to find, understand, analyze, and share.

Quality

Data quality control covers the pre-event, in-process, and post-event stages and is embedded in the DataOps pipeline-style development process, comprehensively improving data quality.
DataOps Concept
Shifts from post-event quality scoring to in-process quality monitoring, integrating code testing and data testing to ensure high-quality data for analysis.
Shifts from post-event standard benchmarking to pre-event standard implementation, ensuring data quality and consistent statistical metrics during analysis.
Implementation Process
Data tasks and workflows must pass online debugging before submission; online debugging automatically triggers the corresponding quality monitoring tasks for the data tables involved.
The agile data warehouse modeling tool supports directly referencing pre-defined data standards, ensuring standards are implemented at the source.
For tables that adhere to data standards, a zero-tolerance threshold for dirty data can be set in DataInLong tasks to ensure standard compliance.
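A zero-tolerance dirty-data rule is conceptually simple: count the rows that violate the standard during ingestion and block the batch if the count exceeds the threshold. The sketch below is illustrative only, not WeData's actual configuration; the function name, "required fields" standard, and sample batch are assumptions for the example:

```python
def check_dirty_rows(rows, required_fields, tolerance=0):
    """Count rows violating the standard; fail the batch when the count exceeds tolerance.

    rows:            list of dicts, one per ingested record
    required_fields: fields the data standard declares non-empty
    tolerance:       0 implements the zero-tolerance threshold
    """
    dirty = [r for r in rows
             if any(r.get(f) in (None, "") for f in required_fields)]
    if len(dirty) > tolerance:
        raise ValueError(f"{len(dirty)} dirty row(s) exceed tolerance {tolerance}")
    return len(dirty)

batch = [
    {"user_id": "u1", "amount": 10.0},
    {"user_id": None, "amount": 5.0},  # violates the standard: user_id missing
]
try:
    check_dirty_rows(batch, required_fields=["user_id", "amount"], tolerance=0)
except ValueError as err:
    print("batch blocked:", err)  # in-process exception blocking
```

Running the check inside the ingestion task, rather than scoring quality afterwards, is what moves governance from post-event to in-process, as described above.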