Core features of WeData include the following:
Project Management
Provides project isolation at the system/tenant level, giving administrators the ability to manage user (member) permissions, underlying compute engine configuration, and execution resources for WeData users.
Data Planning
Note:
The Data Planning feature is not yet available. We look forward to sharing it with you in the near future, and thank you for your patience.
Provides holistic data planning design capabilities, including data warehouse layering, logical model design, metric dimension definition, and data standards. This helps enterprises unify data warehouse specification design and standard definition, achieving automated transition from design to development.
Data Warehouse Specification: Plans data around global business objects and standard definitions, manages the layered design of models, and organizes them into classifications and domains by business theme, forming a hierarchy of business tags.
Model Design: Definition and design of logical models and entity relationships, including definition, copying, modification, deletion, import/export, and version management, plus associative mapping to physical models and metric dimensions for automatic synchronization from design to development.
Standard Management: Includes standard content management and benchmark task management. By designing standard rules and configuring tasks, it standardizes data at multiple levels: values, databases, table structures, table names, and metric/dimension tags.
Business Definition: Maintains a metric/dimension dictionary, manages the lifecycle of base and derived metrics and of dimension criteria (common dimensions, business constraints, time cycles, degenerate dimensions), and links them to models to automatically generate metric production code.
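As a rough illustration of how a metric definition can drive automatic production-code generation, the sketch below renders a SQL statement from a derived-metric definition. The schema of the definition (field names like `base_column` and `dimensions`) and the table/column names are hypothetical, not WeData's actual metric format.

```python
# Hypothetical sketch: generating production SQL from a metric definition.
# The metric schema and table/column names below are illustrative only.

def generate_metric_sql(metric: dict) -> str:
    """Render a SELECT statement for a derived-metric definition."""
    dims = ", ".join(metric["dimensions"])
    expr = f'{metric["aggregation"]}({metric["base_column"]})'
    where = f' WHERE {metric["filter"]}' if metric.get("filter") else ""
    return (
        f'SELECT {dims}, {expr} AS {metric["name"]} '
        f'FROM {metric["source_table"]}{where} '
        f'GROUP BY {dims}'
    )

# Example: a paid-order count metric over the "dt" and "city" dimensions.
metric = {
    "name": "paid_order_cnt",
    "source_table": "dwd_order_detail",
    "base_column": "order_id",
    "aggregation": "COUNT",
    "dimensions": ["dt", "city"],
    "filter": "pay_status = 'PAID'",
}
print(generate_metric_sql(metric))
```

The point of the design is that the metric dictionary stays the single source of truth: changing the definition regenerates the production code, so design and development never drift apart.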
DataInLong
DataInLong offers lightweight operation, visualized processes, and open capabilities, supporting high-speed, stable synchronization of massive data between heterogeneous data sources in complex network environments.
Full-scenario Synchronization: Includes real-time synchronization and offline synchronization.
Multi-type Heterogeneous Data Sources: Supports 30+ data source types in a star-style topology, so any supported source can be matched with any supported target for reading and writing.
Transformation
Data Level: Perform content transformation on synchronized data, such as data filtering, Join, etc.
Field Level: Provides single-field transformations, including custom fields, format conversion, date format conversion, etc.
Task and Data Monitoring
Read and Write Metrics: Supports real-time statistics on task read/write metrics, including total read/write volume, speed, throughput, and dirty-data counts.
Monitoring and Alarm: Supports task and resource monitoring, with multi-channel alarms including SMS, email, and HTTP.
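To make the field-level transformation step concrete, here is a minimal sketch of a date-format conversion applied to one field of a synchronized row. The row layout, field name, and formats are assumptions for illustration; DataInLong's own transform configuration is visual, not hand-written code.

```python
# Hypothetical sketch of a field-level transform in a sync pipeline:
# reformat one date field from the source format to the target format.
from datetime import datetime

def convert_date_field(row: dict, field: str,
                       src_fmt: str = "%d/%m/%Y",
                       dst_fmt: str = "%Y-%m-%d") -> dict:
    """Return a copy of the row with the given date field reformatted."""
    out = dict(row)
    out[field] = datetime.strptime(row[field], src_fmt).strftime(dst_fmt)
    return out

row = {"order_id": 42, "order_date": "31/01/2024"}
print(convert_date_field(row, "order_date"))  # order_date -> "2024-01-31"
```

A row that fails to parse would raise here; in a real pipeline such rows are typically routed to the dirty-data statistics mentioned above rather than aborting the task.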
Data Development
Rigorous CI/CD process specifications, together with automated testing, release, and operations capabilities, shorten the path from raw data to business-ready data, improving efficiency while ensuring data quality.
Online Development: Supports drag-and-drop orchestration of task workflows, with visual presentation of large-scale task orchestration.
Code Development: Support online code development, debugging, and version management for tasks such as HiveSQL, SparkSQL, JDBCSQL, Spark, Shell, MapReduce, PySpark, Python, TBase, DLC SQL, DLCSpark, TCHouse-P, Impala, etc.
Task Testing: Support task and workflow testing and version management.
Development Assistance: Provide parameter configuration at three levels of granularity: project, workflow, and task, supporting time parameter operations and function parameters.
Version Management: Support version management of events, functions, tasks, and parameters.
Code Management: Provide unified code management, import, and export.
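The parameter configuration described above typically works by substituting time placeholders into task code at scheduling time. The sketch below shows the general mechanism with `${...}` placeholders; the placeholder names and syntax are illustrative, not WeData's exact parameter grammar.

```python
# Hypothetical sketch of scheduling-time parameter substitution:
# replace ${...} placeholders in task code with values derived from the
# instance's data date. Placeholder names here are illustrative only.
import re
from datetime import date, timedelta

def render_params(sql: str, run_date: date) -> str:
    """Substitute known time parameters; leave unknown placeholders as-is."""
    values = {
        "yyyymmdd": run_date.strftime("%Y%m%d"),
        "yyyymmdd-1": (run_date - timedelta(days=1)).strftime("%Y%m%d"),
    }
    return re.sub(r"\$\{([^}]+)\}",
                  lambda m: values.get(m.group(1), m.group(0)), sql)

sql = "INSERT INTO dws_daily PARTITION (dt='${yyyymmdd-1}') SELECT 1"
print(render_params(sql, date(2024, 2, 1)))
```

Because the same placeholder resolves differently per scheduled instance, one task definition can safely process a different data partition on every run.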
Orchestration and Scheduling: Perform flow orchestration and submission scheduling of tasks.
Scheduling Method: Support periodic, one-time, and event-triggered scheduling, with periodic scheduling configured in a crontab manner.
Dependency Strategy: Support task self-dependency and workflow self-dependency.
Cross-cycle Dependency Configuration: Provides cross-cycle and self-defined dependency configuration; the range of upstream and downstream dependent instances can be customized as needed.
Batch Orchestration: Supports batch creation of tasks and dependencies via Excel, improving task dependency orchestration efficiency.
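Since periodic scheduling is configured in a crontab manner, a simplified evaluator shows how a five-field expression (minute, hour, day of month, month, day of week) decides whether an instance fires at a given time. This sketch supports only `*` and exact numbers; real cron syntax also allows ranges, lists, and steps.

```python
# Simplified crontab matcher: "minute hour day-of-month month day-of-week".
# Supports only "*" and exact numbers (real cron adds ranges/steps/lists).
from datetime import datetime

def cron_matches(expr: str, ts: datetime) -> bool:
    fields = expr.split()
    # Day-of-week uses cron's convention: 0 = Sunday.
    actual = [ts.minute, ts.hour, ts.day, ts.month, ts.isoweekday() % 7]
    return all(f == "*" or int(f) == a for f, a in zip(fields, actual))

# "0 2 * * *" = every day at 02:00, a common daily-batch schedule.
print(cron_matches("0 2 * * *", datetime(2024, 5, 1, 2, 0)))  # True
print(cron_matches("0 2 * * *", datetime(2024, 5, 1, 3, 0)))  # False
```

In a scheduler, this check runs once per minute; each matching minute produces one task instance, which dependency strategies then gate against upstream instances.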
Release and Operation: Publish completed development tasks to the production environment as needed, and provide unified monitoring and operation for the tasks.
Task Release: Support releasing the development outcomes online.
Monitoring and Operation: Provides unified monitoring and operations for tasks running in the production environment.
Analysis and Exploration: Improves collaborative development efficiency through intelligent, user-friendly data development methods, helping users clearly see each task's processing steps and significantly speeding up ad-hoc data exploration.
Online Editing: Provide a visual interactive analysis IDE.
Run: Offer visualized execution information.
Development Assistance: Provide efficiency tools for development assistance.
Data Quality
Comprehensive data quality auditing is provided through flexible rule configuration, complete task management, and multi-dimensional quality assessment across all stages of the data lifecycle, from ingestion and integration to processing and consumption.
Multi-source Data Monitoring: Support monitoring data sources and engine types including EMR Hive, Spark, DLC (public cloud), TCHouse-P, TBDS, Gbase (private cloud), etc., offering the ability to perform full-scale data validation across multiple sources.
Rich Rule Templates: Provides 56 industry-standard built-in table-level and field-level rule templates across 6 dimensions, offering out-of-the-box usability, significantly improving quality-control efficiency, and helping users perceive data changes and issues arising during ETL from multiple dimensions.
Flexible Quality Control Configuration: Support three rule creation modes — system quality rule templates, self-defined templates, and self-defined SQL. Parameters can be adjusted according to business needs, task execution strategies can be configured, and full-link quality control validation can be easily achieved.
Global Link Guarantee: Supports two execution modes, associated production scheduling and offline periodic detection, providing before-, during-, and after-the-fact full-link data assurance. Timely alerts and blocking interception prevent dirty data from spreading downstream.
Multi-dimensional Governance Visualization: The Quality Overview and Quality Report modules provide users with a global perspective, allowing them to fully understand the status of quality tasks, alarm blockage trends, and quality scores across various dimensions. This helps quickly identify and locate issues, and understand quality improvement effects.
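A self-defined SQL rule usually boils down to a query that computes a statistic plus a threshold check that decides pass or block. The sketch below illustrates a null-rate rule; the table name, field, and 1% threshold are all assumptions for illustration.

```python
# Hypothetical self-defined SQL quality rule: compute the null rate of a
# key field and compare it against a threshold. Names are illustrative.

RULE_SQL = """
SELECT COUNT(*) AS total,
       SUM(CASE WHEN user_id IS NULL THEN 1 ELSE 0 END) AS nulls
FROM dwd_order_detail
"""

def evaluate_null_rate(total: int, nulls: int,
                       threshold: float = 0.01) -> bool:
    """Return True if the rule passes (null rate at or below threshold)."""
    rate = nulls / total if total else 0.0
    return rate <= threshold

# Suppose the rule's query returned total=10_000 and nulls=37 (0.37%):
print(evaluate_null_rate(10_000, 37))   # True:  0.0037 <= 0.01
print(evaluate_null_rate(10_000, 250))  # False: 0.025  >  0.01
```

When such a rule is associated with production scheduling, a `False` result is what triggers the alert or blocks the downstream instance, stopping dirty data from propagating.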
Data Security
Provides centralized data security control and a collaborative mechanism to ensure effective data flow under secure conditions.
Unified Data Security Management: Deeply integrates security policies with bound computational storage engines, unifying data access and simplifying data use processes.
Permission Approval: Integrates with the Ranger permission policy system, enabling accountability down to individuals and table-level permission control. Provides channels for permission application and approval, opening up data access in a controlled, secure manner.
Data Operations
Based on powerful underlying metadata capabilities, provides data directory, lineage analysis, popularity analysis, asset rating, business classification, tag management, and other data asset services, effectively enhancing users' ability to understand, control, and collaborate on enterprise-scale data.
Data Discovery: Unified metadata collection and management.
Data Overview: Provides an overview of data assets, including basic information such as projects, tables, storage volume, and data type coverage, plus features like a data panorama and popularity rankings.
Data Directory: Supports fast search and location of global table-level and field-level data; table details provide comprehensive technical and business information, along with data lineage, popularity, quality, production and change records, preview, and other features.
Database Table Management: Supports management of global database tables.
Business Classification: Supports creating and managing topic categories, data warehouse layering, and business tags according to business needs, and bulk classification and layering operations on data tables.
Data Service
Provides capabilities covering the full lifecycle of APIs, including API production, API management, and API market, helping enterprises unify the management of internal and external API services and build a unified data service bus.
Quick API Production.
API Management and Operation.
API Secure Invocation.
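To show what secure invocation of a published data-service API generally looks like, the sketch below builds an authenticated HTTP GET request. The endpoint URL, parameter names, and bearer-token header are purely hypothetical, not WeData's actual API contract.

```python
# Hypothetical sketch: building an authenticated request to a published
# data-service API. Endpoint, parameters, and auth scheme are illustrative.
import urllib.parse
import urllib.request

def build_request(base_url: str, token: str,
                  params: dict) -> urllib.request.Request:
    """Build an authenticated GET request for a data-service API."""
    query = urllib.parse.urlencode(sorted(params.items()))
    return urllib.request.Request(
        f"{base_url}?{query}",
        headers={"Authorization": f"Bearer {token}"},  # token-based auth
    )

req = build_request("https://example.com/api/v1/orders", "my-token",
                    {"dt": "20240131", "limit": 100})
print(req.full_url)
print(req.get_header("Authorization"))
# A caller would then send it with urllib.request.urlopen(req).
```

Centralizing construction like this (one place for the base URL, credentials, and parameter encoding) is what makes a unified data service bus practical for both internal and external consumers.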