Pupa’s Data Synchronization Tool: Enabling Bi-directional Data Flow for Multiple Scenarios
By: Pupa Technology Platform Team
Background
As Pupa’s business rapidly expands, the impact of business disruptions becomes increasingly severe, and enhancing disaster recovery capabilities has become a top priority. To address regional disasters, the company has launched a dual-active construction plan that aims to recover core business within minutes. Dual-active construction requires both data centers to handle read and write operations simultaneously. To ensure data consistency between the two, real-time bi-directional synchronization is essential.
Currently, the company uses various data sources, including MySQL, Elasticsearch, Kafka, Redis, and Doris. However, existing open-source tools generally support only unidirectional synchronization, so they fail to meet the bi-directional requirement, let alone support multiple data sources at once. Therefore, we decided to develop a data synchronization tool that provides real-time bi-directional synchronization between multiple data sources, ensuring the smooth progress of dual-active construction.
Architecture Design
Design Goals
- Support Bi-directional Synchronization: The initial purpose of this tool is to support dual-active scenarios, in which the databases in each available unit accept reads and writes at the same time. To guarantee data consistency, bi-directional synchronization functionality is needed. Additionally, the tool should support unidirectional synchronization for other scenarios such as disaster recovery, data migration, and heterogeneous synchronization.
- Support Multi-Data Source Synchronization: The company’s business uses a wide range of data sources. To unify internal data synchronization tools and reduce operational costs, the tool needs to support synchronization between various data sources, including uni- and bi-directional synchronization between data sources of the same type and unidirectional synchronization between heterogeneous data sources.
- High Scalability: As business complexity increases and the tool is widely adopted, new data sources may be added continuously. Therefore, the tool needs a highly scalable architecture to reduce the cost of integrating new data sources.
- Support Data Consistency Verification: Whether synchronization is uni- or bi-directional, data consistency must be addressed during the synchronization process. The tool should support data consistency verification between source and target databases as of a specific point in time. This includes consistency verification for both homogeneous and heterogeneous data.
- Support High Availability: The data synchronization tool will serve different businesses, and data latency and consistency are critical metrics for these businesses. High availability of the tool reduces the risk of data latency and inconsistency. In the event of a failure, the tool should quickly recover and resume transmission, minimizing the impact of data inconsistency on business operations.
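The consistency verification goal above can be illustrated with a small sketch. It is not the tool's actual implementation: it assumes both sides can be read as `(primary key, column values)` pairs taken at the same point in time, and the names `row_digest` and `diff_snapshots` are hypothetical. It hashes each row and reports keys that are missing, extra, or different on the target.

```python
import hashlib
from typing import Dict, Hashable, Iterable, List, Tuple

# Assumed row shape for this sketch: (primary key, tuple of column values).
Row = Tuple[Hashable, tuple]


def row_digest(values: tuple) -> str:
    """Hash a row's column values into a stable hex digest."""
    return hashlib.sha256(repr(values).encode("utf-8")).hexdigest()


def diff_snapshots(source: Iterable[Row], target: Iterable[Row]) -> Dict[str, List[Hashable]]:
    """Compare two snapshots and list the keys that disagree."""
    src = {key: row_digest(values) for key, values in source}
    dst = {key: row_digest(values) for key, values in target}
    return {
        # Present in the source but never arrived at the target.
        "missing_in_target": sorted(k for k in src if k not in dst),
        # Present in the target but absent from the source.
        "extra_in_target": sorted(k for k in dst if k not in src),
        # Present on both sides with different content.
        "mismatched": sorted(k for k in src if k in dst and src[k] != dst[k]),
    }


if __name__ == "__main__":
    source_rows = [(1, ("alice", 10)), (2, ("bob", 20)), (3, ("carol", 30))]
    target_rows = [(1, ("alice", 10)), (2, ("bob", 25)), (4, ("dave", 40))]
    print(diff_snapshots(source_rows, target_rows))
```

For heterogeneous pairs, the two sides would first need to be normalized into a comparable row shape; that mapping is outside this sketch.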
Overall Design
Figure 1: System Architecture Diagram
Figure 1 depicts the system architecture of the tool, divided into four layers from top to bottom: Control Layer, Common Capability Layer, Execution Layer, and Storage Layer.
- Control Layer: The user-facing layer, responsible for centralized management and control of the entire data synchronization tool. Besides basic data synchronization configuration, it also provides permission management and log auditing.
- Common Capability Layer: Provides common capabilities such as task scheduling and allocation, monitoring metric collection, and workflow management.
- Execution Layer: Responsible for executing data synchronization tasks, including data extraction, transformation, and loading. It supports different data sources and synchronization modes.
- Storage Layer: Responsible for storing data synchronization metadata, logs, and configuration information. It ensures data persistence and reliability.
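One way an Execution Layer like the one described above could keep the cost of adding data sources low is a connector registry behind a common extract/transform/load interface. The sketch below is illustrative only: `Source`, `Sink`, `register_source`, `register_sink`, and `run_task` are hypothetical names, and the in-memory connectors stand in for real ones such as MySQL or Kafka.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterable, Iterator, List, Type

Record = dict


class Source(ABC):
    @abstractmethod
    def extract(self) -> Iterator[Record]:
        """Yield records read from the data source."""


class Sink(ABC):
    @abstractmethod
    def load(self, records: Iterable[Record]) -> int:
        """Write records to the target and return how many were written."""


# Registries map a connector name to its class, so a new data source
# only needs to register itself; the pipeline code stays unchanged.
SOURCES: Dict[str, Type[Source]] = {}
SINKS: Dict[str, Type[Sink]] = {}


def register_source(name: str):
    def decorator(cls):
        SOURCES[name] = cls
        return cls
    return decorator


def register_sink(name: str):
    def decorator(cls):
        SINKS[name] = cls
        return cls
    return decorator


@register_source("memory")
class MemorySource(Source):
    def __init__(self, records: Iterable[Record]):
        self.records: List[Record] = list(records)

    def extract(self) -> Iterator[Record]:
        yield from self.records


@register_sink("memory")
class MemorySink(Sink):
    def __init__(self):
        self.rows: List[Record] = []

    def load(self, records: Iterable[Record]) -> int:
        before = len(self.rows)
        self.rows.extend(records)
        return len(self.rows) - before


def run_task(source: Source, sink: Sink,
             transform: Callable[[Record], Record] = lambda r: r) -> int:
    """Extract from the source, transform each record, load into the sink."""
    return sink.load(transform(record) for record in source.extract())


if __name__ == "__main__":
    src = SOURCES["memory"]([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
    dst = SINKS["memory"]()
    run_task(src, dst, lambda r: {**r, "name": r["name"].upper()})
    print(dst.rows)
```

With this shape, a heterogeneous task is just a pairing of one registered source with a different registered sink plus a transform.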
Key Features
- Bi-directional Synchronization: Keeps the databases in both data centers consistent while each one handles read and write operations at the same time.
- Multi-Data Source Support: Supports synchronization between various data sources, including MySQL, Elasticsearch, Kafka, Redis, and Doris.
- High Scalability: Designed with a highly scalable architecture, enabling easy integration of new data sources.
- Data Consistency Verification: Supports data consistency verification between source and target databases at a specific time, ensuring data integrity.
- High Availability: Provides high availability to minimize data latency and inconsistency, with rapid recovery and resumption of transmission in case of failures.
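The "recover and resume transmission" behavior listed above is often built on checkpoints. The following is a minimal sketch under that assumption, not a description of how the tool actually does it: `FileCheckpoint` and `sync_from_checkpoint` are hypothetical names, and a local JSON file stands in for whatever the Storage Layer would persist. The offset is saved only after a batch has been applied, so a restart replays at most one batch.

```python
import json
import os
from typing import Callable, Sequence


class FileCheckpoint:
    """Persist the last fully applied offset in a small JSON file."""

    def __init__(self, path: str):
        self.path = path

    def load(self) -> int:
        try:
            with open(self.path, "r", encoding="utf-8") as f:
                return int(json.load(f)["offset"])
        except FileNotFoundError:
            return 0  # No checkpoint yet: start from the beginning.

    def save(self, offset: int) -> None:
        # Write to a temp file and rename it, so a crash during the write
        # cannot leave a half-written checkpoint behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump({"offset": offset}, f)
        os.replace(tmp, self.path)


def sync_from_checkpoint(events: Sequence, apply: Callable[[object], None],
                         checkpoint: FileCheckpoint, batch_size: int = 2) -> int:
    """Apply events starting at the saved offset; return the final offset."""
    offset = checkpoint.load()
    while offset < len(events):
        batch = events[offset:offset + batch_size]
        for event in batch:
            apply(event)
        offset += len(batch)
        checkpoint.save(offset)  # Commit progress only after the batch succeeds.
    return offset
```

Because a failed batch is replayed from its start, delivery is at-least-once, and `apply` should be idempotent (for example, an upsert keyed by primary key) for the target to converge.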
Conclusion
Pupa’s data synchronization tool provides real-time uni- and bi-directional synchronization across multiple data sources, with built-in consistency verification and high availability. Its layered architecture supports the dual-active construction plan and also covers disaster recovery, data migration, and heterogeneous synchronization scenarios.