
Apache Kafka MirrorMaker 2: Multi-Cluster Topic Aggregation Guide

Learn how to configure Apache Kafka MirrorMaker 2 to merge topics from multiple source clusters into a single target topic. Best practices and performance optimization included.


How to use Apache Kafka MirrorMaker 2 to pull topic data from multiple source Kafka clusters and merge that data into a single local topic? Specifically, if I have multiple data centers with Kafka clusters, each containing a topic with the same name, can MirrorMaker 2 aggregate data from these source clusters into a single topic on the target cluster? Additionally, is this approach a best practice, or would using a custom consumer on the target cluster be more suitable for merging topic data?

Apache Kafka MirrorMaker 2 provides a robust solution for replicating and aggregating data across multiple Kafka clusters, allowing you to pull topic data from multiple source clusters and merge that data into a single target topic. This capability is particularly valuable for organizations with distributed data centers that need to consolidate data from identical topics across clusters for centralized analytics, reporting, or disaster recovery scenarios. By leveraging MirrorMaker 2’s enhanced replication features, you can maintain data consistency while simplifying your multi-cluster architecture.




Introduction to Apache Kafka MirrorMaker 2

Apache Kafka MirrorMaker 2 represents a significant evolution from the original MirrorMaker, offering enhanced capabilities for data replication between Apache Kafka clusters. Unlike its predecessor, MirrorMaker 2 is built on the Kafka Connect framework and is designed specifically for cross-cluster data movement, providing improved performance, better checkpointing mechanisms, and, in recent Kafka versions, support for exactly-once semantics. This makes it particularly suitable for complex multi-cluster scenarios where data needs to be aggregated or replicated across different data centers or environments.

The primary purpose of MirrorMaker 2 is to consume messages from source clusters and produce them to target clusters, maintaining data consistency and enabling various architectural patterns. When configured properly, it can handle the specific use case you’re describing—pulling topic data from multiple source Kafka clusters and merging that data into a single target topic. This approach is especially valuable when you have multiple data centers, each running their own Apache Kafka cluster, and each containing topics with the same name that need to be consolidated.

MirrorMaker 2 addresses several limitations of the original MirrorMaker, including:

  • Improved performance through parallel processing
  • Better offset management and checkpointing, including consumer group offset translation
  • Support for exactly-once semantics (dedicated mode, Kafka 3.5+)
  • Enhanced monitoring and metrics
  • Simplified configuration through a single properties file
  • Automatic failover and recovery capabilities

These improvements make MirrorMaker 2 a compelling option for multi-cluster topic aggregation, though there are important considerations to evaluate when comparing it to custom consumer solutions.


Setting Up MirrorMaker 2 for Multi-Cluster Topic Aggregation

To configure MirrorMaker 2 for multi-cluster topic aggregation, you’ll need to properly set up your Apache Kafka environment and MirrorMaker 2 application. The process involves several key components:

Prerequisites

Before you begin, ensure you have:

  • Multiple Kafka clusters running (your source clusters)
  • A target Kafka cluster where you want to consolidate data
  • Network connectivity between MirrorMaker 2 and all clusters
  • Proper authentication and authorization configured for all clusters

Basic Architecture

For multi-cluster topic aggregation, MirrorMaker 2 typically runs as a separate application that connects to all source clusters and produces to the target cluster. The basic architecture involves:

text
// MirrorMaker 2 connects to all source clusters and one target cluster
Source Cluster 1 → MirrorMaker 2 → Target Cluster
Source Cluster 2 → MirrorMaker 2 → Target Cluster
Source Cluster N → MirrorMaker 2 → Target Cluster
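
Each arrow in the diagram corresponds to a replication flow that is declared and enabled in the MirrorMaker 2 properties file. A minimal sketch, assuming cluster aliases source1, source2, and target:

properties
# Declare every cluster alias, including the target
clusters = source1, source2, target

# Each source -> target arrow becomes an enabled flow
source1->target.enabled = true
source2->target.enabled = true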

Installation

To install MirrorMaker 2:

  1. Download an Apache Kafka distribution (MirrorMaker 2 ships with Kafka 2.4 and later)
  2. Extract the distribution to your preferred location
  3. Ensure all required JAR files are in the classpath
  4. Set up the MirrorMaker 2 properties file

MirrorMaker 2 is included in standard Kafka distributions. It is typically run in dedicated mode with the bin/connect-mirror-maker.sh script and a single properties file, or deployed as a set of connectors on an existing Kafka Connect cluster.

Initial Configuration

Your basic configuration will need to specify:

  • Connection details for all source clusters
  • Connection details for the target cluster
  • Topics to replicate (using regex patterns)
  • Replication policies and filters

A minimal configuration file might look like:

properties
# Cluster aliases (the target alias must be listed too)
clusters = source1, source2, source3, target

# Source 1 configuration
source1.bootstrap.servers = dc1-kafka-1:9092,dc1-kafka-2:9092
source1.security.protocol = SASL_SSL
source1.sasl.mechanism = SCRAM-SHA-512
source1.sasl.jaas.config = ...

# Source 2 configuration
source2.bootstrap.servers = dc2-kafka-1:9092,dc2-kafka-2:9092
source2.security.protocol = SASL_SSL
source2.sasl.mechanism = SCRAM-SHA-512
source2.sasl.jaas.config = ...

# Source 3 configuration
source3.bootstrap.servers = dc3-kafka-1:9092,dc3-kafka-2:9092
source3.security.protocol = SASL_SSL
source3.sasl.mechanism = SCRAM-SHA-512
source3.sasl.jaas.config = ...

# Target cluster configuration
target.bootstrap.servers = local-kafka-1:9092,local-kafka-2:9092
target.security.protocol = SASL_SSL
target.sasl.mechanism = SCRAM-SHA-512
target.sasl.jaas.config = ...

# Enable each source-to-target replication flow
source1->target.enabled = true
source2->target.enabled = true
source3->target.enabled = true

# Topics and consumer groups to replicate (regex)
topics = .*
groups = .*

emit.checkpoints.interval.seconds = 60
sync.topic.configs.enabled = true
This configuration pulls the matching topics from all three source clusters into the target cluster. Note that with the default replication policy, each replicated topic arrives under a name prefixed with its source cluster alias (for example, source1.orders); to merge same-named topics into a single unprefixed target topic, you must also set replication.policy.class = org.apache.kafka.connect.mirror.IdentityReplicationPolicy.
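
As a minimal illustration, the single property below, added to the configuration above, switches MirrorMaker 2 from prefixed topic names to true merging; IdentityReplicationPolicy ships with recent Kafka releases:

properties
# Preserve original topic names on the target so that same-named
# topics from all source clusters merge into one topic
replication.policy.class = org.apache.kafka.connect.mirror.IdentityReplicationPolicy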


Configuration for Merging Multiple Source Topics

The key to successfully merging topics from multiple source clusters lies in the replication policy. By default, MirrorMaker 2 keeps topics from different clusters separate; only with an identity-style replication policy are same-named source topics written into a single topic on the target cluster. Here's how to configure this effectively:

Topic Merging Strategy

By default, MirrorMaker 2 uses DefaultReplicationPolicy, which renames each replicated topic with its source cluster alias as a prefix (orders from dc1 becomes dc1.orders), precisely so that data from different clusters cannot collide. To merge instead, configure IdentityReplicationPolicy (available in recent Kafka releases), which preserves the original topic name: records from every source are then appended to the same target topic, with ordering preserved within each source partition but no defined interleaving order across sources. Use this policy with care in bidirectional or active-active topologies, since MirrorMaker 2 can no longer distinguish replicated topics from local ones and replication loops become possible.

To enable this behavior, you need to configure the following properties:

properties
# Replicate source topics to identically named target topics
# (no source-cluster prefix), so same-named topics merge
replication.policy.class = org.apache.kafka.connect.mirror.IdentityReplicationPolicy

# Keep replicated topic configuration in sync
sync.topic.configs.enabled = true
sync.topic.configs.interval.seconds = 600

# Optionally replicate topic ACLs as well
sync.topic.acls.enabled = true

Handling Topic Name Conflicts

With the default DefaultReplicationPolicy, topics with the same name on different clusters never actually conflict: each arrives under a cluster-prefixed name such as dc1.orders and dc2.orders, and merging only happens once you opt into an identity-style policy. Depending on how much merging you want, several strategies are available:

  1. Prefix-based naming (the default): Keep the source cluster alias as a prefix so each cluster's data stays in its own topic
properties
# DefaultReplicationPolicy prefixes topics with the source alias,
# e.g. orders from dc1 becomes dc1.orders; the separator is configurable
replication.policy.separator = .
  2. Topic whitelisting: Only replicate (and therefore merge) specific topics
properties
# Only replicate specific topics (comma-separated list of regexes)
topics = orders, customers, products
  3. Topic blacklisting: Exclude certain topics from replication
properties
# Exclude internal/system topics
topics.exclude = __.*
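
Replication settings can also be scoped to an individual flow, which is useful when different data centers should contribute different topics to the merged view. A sketch, assuming the dc1/dc2 cluster aliases used elsewhere in this guide:

properties
# Per-flow overrides: dc1 contributes orders, dc2 contributes payments
dc1->target.topics = orders
dc2->target.topics = payments

A flow-scoped setting such as dc1->target.topics takes precedence over the global topics property for that flow.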

Replication Factor and Partition Management

MirrorMaker 2 creates missing topics on the target automatically and mirrors each source topic's partition count, so you normally do not choose a partition count yourself. You can, however, control the replication factor of the topics it creates:

properties
# Replication factor for topics MirrorMaker 2 creates on the target
replication.factor = 3

Settings such as min.insync.replicas are ordinary topic or broker configurations on the target cluster, and topic-level settings can also be propagated from the sources when sync.topic.configs.enabled = true.

Offset Management

Proper offset management is crucial for reliable merging:

properties
# Offset management settings
emit.checkpoints.interval.seconds = 60
checkpoints.topic.replication.factor = 3
emit.heartbeats.interval.seconds = 30
offset-syncs.topic.replication.factor = 3

Security Configuration

For production environments, configure security for all clusters:

properties
# Security configuration for all clusters
security.protocol = SASL_SSL
sasl.mechanism = SCRAM-SHA-512
sasl.jaas.config = org.apache.kafka.common.security.scram.ScramLoginModule required username="mm2-user" password="mm2-password";

# SSL configuration
ssl.truststore.location = /path/to/truststore.jks
ssl.truststore.password = truststore-password

Full Configuration Example

Here’s a comprehensive configuration for multi-cluster topic merging:

properties
# Cluster definitions
clusters = dc1, dc2, dc3, target

# Data Center 1 configuration
dc1.bootstrap.servers = dc1-kafka-1:9092,dc1-kafka-2:9092
dc1.security.protocol = SASL_SSL
dc1.sasl.mechanism = SCRAM-SHA-512
dc1.sasl.jaas.config = org.apache.kafka.common.security.scram.ScramLoginModule required username="mm2-dc1" password="dc1-password";
dc1.ssl.truststore.location = /etc/kafka/dc1-truststore.jks
dc1.ssl.truststore.password = truststore-password

# Data Center 2 configuration
dc2.bootstrap.servers = dc2-kafka-1:9092,dc2-kafka-2:9092
dc2.security.protocol = SASL_SSL
dc2.sasl.mechanism = SCRAM-SHA-512
dc2.sasl.jaas.config = org.apache.kafka.common.security.scram.ScramLoginModule required username="mm2-dc2" password="dc2-password";
dc2.ssl.truststore.location = /etc/kafka/dc2-truststore.jks
dc2.ssl.truststore.password = truststore-password

# Data Center 3 configuration
dc3.bootstrap.servers = dc3-kafka-1:9092,dc3-kafka-2:9092
dc3.security.protocol = SASL_SSL
dc3.sasl.mechanism = SCRAM-SHA-512
dc3.sasl.jaas.config = org.apache.kafka.common.security.scram.ScramLoginModule required username="mm2-dc3" password="dc3-password";
dc3.ssl.truststore.location = /etc/kafka/dc3-truststore.jks
dc3.ssl.truststore.password = truststore-password

# Target cluster configuration
target.bootstrap.servers = local-kafka-1:9092,local-kafka-2:9092
target.security.protocol = SASL_SSL
target.sasl.mechanism = SCRAM-SHA-512
target.sasl.jaas.config = org.apache.kafka.common.security.scram.ScramLoginModule required username="mm2-target" password="target-password";
target.ssl.truststore.location = /etc/kafka/target-truststore.jks
target.ssl.truststore.password = truststore-password

# Replication flows
dc1->target.enabled = true
dc2->target.enabled = true
dc3->target.enabled = true

# Topics and consumer groups to replicate (regex)
topics = .*
groups = .*

# Merge same-named topics into a single unprefixed target topic
replication.policy.class = org.apache.kafka.connect.mirror.IdentityReplicationPolicy

# Topic configuration and ACL sync
sync.topic.configs.enabled = true
sync.topic.configs.interval.seconds = 600
sync.topic.acls.enabled = true

# Replicated topic creation (partition counts are mirrored from the sources)
replication.factor = 3

# Offset management
emit.checkpoints.interval.seconds = 60
checkpoints.topic.replication.factor = 3
offset-syncs.topic.replication.factor = 3

# Heartbeats and metadata refresh
emit.heartbeats.interval.seconds = 30
heartbeats.topic.replication.factor = 3
refresh.topics.interval.seconds = 60
refresh.groups.interval.seconds = 60

This configuration will enable MirrorMaker 2 to pull topics with the same name from multiple source clusters and merge them into single topics on the target cluster while maintaining proper security, replication, and performance characteristics.


Best Practices for MirrorMaker 2 Deployment

When deploying MirrorMaker 2 for multi-cluster topic aggregation, following best practices ensures reliability, performance, and maintainability. Let’s explore the key considerations:

High Availability and Fault Tolerance

MirrorMaker 2 should be deployed with high availability in mind:

  1. Run multiple instances: Deploy at least two MirrorMaker 2 instances across different availability zones or servers to prevent single points of failure.

  2. Use proper checkpointing: Configure consistent checkpointing to enable failover without data loss:

properties
emit.checkpoints.interval.seconds = 60
checkpoints.topic.replication.factor = 3
  3. Monitor cluster health: Set up alerts for MirrorMaker 2 health and Kafka cluster connectivity issues.
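
Checkpointing and failover depend on MirrorMaker 2's internal topics, so it is worth pinning durable replication factors for them explicitly. A sketch using the standard MirrorMaker 2 properties (values are illustrative):

properties
# Durable internal topics so a restarted instance can recover state
checkpoints.topic.replication.factor = 3
heartbeats.topic.replication.factor = 3
offset-syncs.topic.replication.factor = 3

# Enough tasks that surviving instances can absorb a failed one's load
tasks.max = 8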

Network Optimization

For cross-cluster replication, network considerations are critical:

  1. Bandwidth planning: Account for the combined data throughput from all source clusters when planning network bandwidth and MirrorMaker 2 resources.

  2. Compression: Enable compression to reduce network traffic:

properties
producer.compression.type = lz4
  3. Batching: Configure appropriate batching for producers:
properties
producer.batch.size = 16384
producer.linger.ms = 5

Security Best Practices

  1. Authentication: Use SASL or SSL for all cluster connections:
properties
security.protocol = SASL_SSL
sasl.mechanism = SCRAM-SHA-512
  2. Authorization: Configure proper ACLs for MirrorMaker 2 service accounts on all clusters.

  3. Encryption: Use SSL/TLS for all data in transit.

Performance Tuning

  1. Consumer configuration: Tune consumer settings for optimal performance:
properties
max.poll.records = 500
fetch.max.bytes = 1048576
  2. Producer configuration: Optimize producer settings for the target cluster:
properties
acks = all
retries = 2147483647
  3. Memory management: Configure appropriate JVM heap size based on expected data volumes.

Monitoring and Observability

  1. Metrics collection: MirrorMaker 2 exposes its metrics over JMX; scrape them with your monitoring stack (for example, a Prometheus JMX exporter), or register a custom reporter class:
properties
# Optional: plug in a custom metrics reporter
# (com.example.MyMetricsReporter is a placeholder for your own class)
metric.reporters = com.example.MyMetricsReporter
  2. Logging: Configure appropriate logging levels and log aggregation.

  3. Alerting: Set up alerts for:

  • Lag between source and target clusters
  • MirrorMaker 2 process failures
  • Network connectivity issues
  • Resource utilization

Data Consistency and Ordering

  1. Delivery semantics: MirrorMaker 2 provides at-least-once delivery by default, so duplicates are possible after restarts. On recent Kafka versions (3.5+), dedicated-mode MirrorMaker 2 can be configured for exactly-once processing:
properties
exactly.once.source.support = enabled
  2. Topic ordering: Understand that message ordering across source clusters is not guaranteed in the merged topic; ordering is preserved only within each source partition.

Maintenance Procedures

  1. Upgrade planning: Plan MirrorMaker 2 and Kafka upgrades with proper testing.

  2. Configuration management: Use configuration management tools to maintain consistency across instances.

  3. Documentation: Maintain documentation of configurations, dependencies, and procedures.

Disaster Recovery Considerations

  1. Replication direction: Consider bidirectional replication if needed for disaster recovery.

  2. Failover testing: Regularly test failover procedures to ensure they work as expected.

  3. Data verification: Implement processes to verify data consistency between clusters.

By following these best practices, you can ensure that your MirrorMaker 2 deployment for multi-cluster topic aggregation is reliable, performant, and maintainable over the long term.


Alternative Approaches: Custom Consumer vs MirrorMaker 2

When considering how to merge topic data from multiple Apache Kafka clusters, it’s important to evaluate MirrorMaker 2 against alternative approaches, particularly using a custom consumer application on the target cluster. Each approach has its strengths and weaknesses depending on your specific requirements.

MirrorMaker 2 Approach

MirrorMaker 2 provides a dedicated, purpose-built solution for cross-cluster replication with several advantages:

Advantages:

  1. Simplified deployment and maintenance: MirrorMaker 2 is a pre-built component that handles many complex aspects of replication automatically.
  2. Built-in monitoring and metrics: Comprehensive metrics and health checks are included out of the box.
  3. Exactly-once support: Native support for exactly-once semantics when properly configured.
  4. Automatic failover: Built-in mechanisms for handling cluster failures and recovery.
  5. Topic configuration replication: Automatically replicates topic configuration changes from source to target clusters.
  6. Reduced development effort: No need to write, test, and maintain custom replication logic.
  7. Community and vendor support: Backed by the Apache Kafka community and commercial vendors like Confluent.

Limitations:

  1. Less flexibility: Limited customization options compared to a custom solution.
  2. Potential overkill: For simple use cases, it might be more complex than necessary.
  3. Resource requirements: Requires dedicated resources to run the MirrorMaker 2 application.
  4. Learning curve: Understanding all configuration options and behavior requires time.

Custom Consumer Approach

A custom consumer application gives you complete control over the replication process:

Advantages:

  1. Full control: Complete control over the replication logic, filtering, and transformation.
  2. Flexibility: Can implement complex business logic during replication.
  3. Optimized for specific use case: Can be tailored exactly to your requirements.
  4. Integration with existing systems: Can be integrated more easily with other components in your architecture.
  5. Potentially lower resource usage: Can be optimized for your specific data patterns.

Limitations:

  1. Development overhead: Requires significant development, testing, and maintenance effort.
  2. No built-in monitoring: Need to implement monitoring and alerting from scratch.
  3. Error handling complexity: Must implement robust error handling and recovery mechanisms.
  4. Exactly-once complexity: Implementing exactly-once semantics is challenging and requires careful design.
  5. Offset management responsibility: Full responsibility for managing offsets across multiple source clusters.
  6. No automatic failover: Need to implement custom failover and recovery logic.
  7. Topic configuration management: Must handle topic configuration replication manually.

Comparison Based on Key Factors

Factor                   | MirrorMaker 2                | Custom Consumer
-------------------------|------------------------------|----------------------------------------
Development effort       | Low (configuration-based)    | High (custom code)
Maintenance overhead     | Medium                       | High
Flexibility              | Medium                       | High
Performance              | Good for standard use cases  | Can be optimized for specific patterns
Monitoring               | Built-in                     | Requires custom implementation
Failover handling        | Automatic                    | Custom implementation required
Exactly-once support     | Built-in (recent versions)   | Complex to implement
Topic config replication | Automatic                    | Manual implementation needed
Resource requirements    | Medium                       | Can be optimized

Recommendations

Based on your specific use case of merging topics from multiple Kafka clusters:

Choose MirrorMaker 2 when:

  • You need a straightforward solution for replicating and merging topics
  • You value reduced development and maintenance overhead
  • You need built-in monitoring and exactly-once semantics
  • You want automatic handling of topic configuration changes
  • You’re not implementing complex business logic during replication

Choose a custom consumer when:

  • You need complex transformations or filtering during replication
  • You have very specific performance requirements that can’t be met by MirrorMaker 2
  • You need to integrate with other systems in complex ways
  • You have the resources to develop and maintain a custom solution
  • You need fine-grained control over the replication process

Hybrid Approach

In some cases, a hybrid approach might be beneficial:

  • Use MirrorMaker 2 for basic replication
  • Use a custom consumer to handle complex processing on the target cluster
  • Or use MirrorMaker 2 to replicate to an intermediate cluster, then use a custom consumer for final processing

For most use cases involving multi-cluster topic aggregation, MirrorMaker 2 represents a good balance of functionality, simplicity, and reliability. However, if you have very specific requirements that can’t be met by MirrorMaker 2, a custom consumer solution might be justified despite the additional complexity.


Troubleshooting Common Issues

When implementing Apache Kafka MirrorMaker 2 for multi-cluster topic aggregation, you may encounter several common issues. Understanding how to troubleshoot these problems is essential for maintaining a stable replication environment.

Connectivity Issues

Symptom: MirrorMaker 2 cannot connect to source or target clusters.

Possible causes:

  • Network connectivity problems between MirrorMaker 2 and Kafka clusters
  • Incorrect bootstrap server addresses
  • Firewall rules blocking connections
  • Authentication or authorization failures

Solutions:

  1. Verify network connectivity using tools like telnet or nc:
bash
telnet dc1-kafka-1 9092
  2. Check bootstrap server configurations in your MirrorMaker 2 configuration.

  3. Verify firewall rules allow connections on the brokers' listener ports (9092 in the examples here).

  4. Check authentication configurations, including SASL mechanisms and credentials:

properties
# Verify SASL configuration
security.protocol = SASL_SSL
sasl.mechanism = SCRAM-SHA-512
sasl.jaas.config = org.apache.kafka.common.security.scram.ScramLoginModule required username="mm2-user" password="correct-password";

Topic Replication Failures

Symptom: Topics are not being replicated or merged properly between clusters.

Possible causes:

  • Topic name conflicts or misconfigurations
  • Insufficient permissions on source or target clusters
  • Topic configuration mismatches
  • MirrorMaker 2 regex pattern mismatches

Solutions:

  1. Check MirrorMaker 2 topic configuration:
properties
# Verify topic regex patterns and that each flow is enabled
topics = .*
groups = .*
source1->target.enabled = true
  2. Verify that MirrorMaker 2 can create topics on the target cluster and that the configured replication factor is achievable:
properties
replication.factor = 3
  3. Check ACLs and permissions for MirrorMaker 2 service accounts.

  4. Monitor MirrorMaker 2 logs for specific error messages related to topic replication.
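
A common reason topics appear "not merged" is that the default replication policy is still active, so records land in prefixed topics such as dc1.orders and dc2.orders rather than a single orders topic. If merging is the goal, the fix is the replication policy override:

properties
# Replace the default prefixing policy with the identity policy
# so same-named topics from all sources merge on the target
replication.policy.class = org.apache.kafka.connect.mirror.IdentityReplicationPolicy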

Offset Management Problems

Symptom: Messages are duplicated or lost during replication.

Possible causes:

  • Improper offset management configuration
  • Checkpoint topic issues
  • MirrorMaker 2 instance failures during replication
  • Network interruptions causing consumer group rebalances

Solutions:

  1. Verify offset-related configurations:
properties
# Offset management settings
emit.checkpoints.interval.seconds = 60
checkpoints.topic.replication.factor = 3
offset-syncs.topic.replication.factor = 3
  2. Check the health of the checkpoints topic on the target cluster.

  3. Monitor for consumer group rebalances that might cause offset resets.

  4. Consider enabling exactly-once semantics (dedicated mode, Kafka 3.5+) to prevent message duplication:

properties
exactly.once.source.support = enabled

Performance Issues

Symptom: MirrorMaker 2 is not keeping up with the replication load or showing high latency.

Possible causes:

  • Insufficient resources (CPU, memory, network)
  • Inefficient batching or compression settings
  • Consumer group configuration issues
  • Network bottlenecks between clusters

Solutions:

  1. Tune MirrorMaker 2 performance parameters:
properties
# Producer tuning
producer.batch.size = 16384
producer.linger.ms = 5
producer.compression.type = lz4

# Consumer tuning
max.poll.records = 500
fetch.max.bytes = 1048576
  2. Monitor resource usage and scale up MirrorMaker 2 instances if needed.

  3. Check network bandwidth and latency between clusters.

  4. Increase tasks.max to run more replication tasks in parallel.

Security Configuration Problems

Symptom: MirrorMaker 2 fails to authenticate or authorize with Kafka clusters.

Possible causes:

  • Incorrect security protocol configuration
  • Expired or invalid credentials
  • SSL/TLS certificate issues
  • SASL mechanism mismatches

Solutions:

  1. Verify security protocol configurations:
properties
security.protocol = SASL_SSL
sasl.mechanism = SCRAM-SHA-512
  2. Check expiration dates of certificates and credentials.

  3. Verify SSL truststore configurations:

properties
ssl.truststore.location = /path/to/truststore.jks
ssl.truststore.password = password
  4. Ensure SASL configurations match between MirrorMaker 2 and Kafka brokers.

MirrorMaker 2 Process Failures

Symptom: MirrorMaker 2 process crashes or becomes unresponsive.

Possible causes:

  • Out of memory errors
  • Classpath issues
  • Configuration file errors
  • Network timeouts

Solutions:

  1. Increase JVM heap size if experiencing memory issues:
bash
export KAFKA_HEAP_OPTS="-Xmx2g -Xms2g"
  2. Verify the classpath includes all required JAR files.

  3. Check configuration file syntax and validity.

  4. Increase timeout settings for slow networks:

properties
request.timeout.ms = 60000
default.api.timeout.ms = 60000

Monitoring and Debugging

Key metrics to monitor:

  • Replication lag between source and target clusters
  • MirrorMaker 2 process CPU and memory usage
  • Network throughput between clusters
  • Error rates in replication

Debugging tools:

  • MirrorMaker 2 JMX metrics
  • Kafka tools (kafka-consumer-groups, kafka-topics)
  • Application logs
  • Network monitoring tools

By systematically addressing these common issues, you can ensure your MirrorMaker 2 deployment for multi-cluster topic aggregation remains stable and reliable.


Performance Considerations and Optimization

When implementing Apache Kafka MirrorMaker 2 for multi-cluster topic aggregation, performance optimization is crucial to ensure efficient data replication and minimal latency between source and target clusters. Let’s explore the key performance considerations and optimization strategies.

Understanding Performance Factors

Several factors impact MirrorMaker 2 performance in a multi-cluster setup:

  1. Network bandwidth: The combined throughput from all source clusters
  2. Message volume: The number and size of messages being replicated
  3. Cluster distance: Network latency between data centers
  4. Resource allocation: CPU, memory, and I/O resources for MirrorMaker 2
  5. Topic configuration: Partition count, replication factor, and retention policies
  6. Consumer and producer settings: Batching, compression, and acknowledgment configurations

Optimizing MirrorMaker 2 Configuration

Producer Configuration

The producer (writing to the target cluster) significantly impacts performance:

properties
# Producer configuration for target cluster
# Enable batching to improve throughput
producer.batch.size = 16384
producer.linger.ms = 5

# Use compression to reduce network traffic
producer.compression.type = lz4

# Tune for performance while maintaining durability
# (acks=1 trades durability for throughput; use all for stronger guarantees)
producer.acks = 1
producer.retries = 2147483647
producer.max.in.flight.requests.per.connection = 5

# Buffer memory
producer.buffer.memory = 33554432

Consumer Configuration

The consumer (reading from source clusters) affects how efficiently data is pulled:

properties
# Consumer configuration for source clusters
# Increase batch size for better throughput
max.poll.records = 500
fetch.max.bytes = 1048576
fetch.max.wait.ms = 500

# Offset management (MirrorMaker 2 commits offsets itself,
# so auto-commit stays disabled)
auto.offset.reset = earliest
enable.auto.commit = false

# Session and heartbeat settings
session.timeout.ms = 30000
heartbeat.interval.ms = 10000

MirrorMaker 2 Specific Settings

properties
# MirrorMaker 2 performance tuning
refresh.topics.interval.seconds = 60
refresh.groups.interval.seconds = 60

# Checkpoint management
emit.checkpoints.interval.seconds = 60
emit.heartbeats.interval.seconds = 30

# Number of replication tasks to run in parallel
tasks.max = 4

Resource Planning and Scaling

Hardware Requirements

For a multi-cluster MirrorMaker 2 deployment, consider:

  1. CPU: Allocate sufficient cores for concurrent replication streams
  2. Memory: Enough heap for message buffering and state management
bash
export KAFKA_HEAP_OPTS="-Xmx4g -Xms4g"
  3. Network: High bandwidth, low latency connections to all clusters
  4. Storage: SSD storage for better I/O performance

Horizontal Scaling

When MirrorMaker 2 cannot keep up with the replication load:

  1. Run multiple instances: Deploy additional MirrorMaker 2 instances and partition the work
properties
# Instance 1 config file: clusters = source1, source2, target
# Instance 2 config file: clusters = source3, source4, target
  2. Topic partitioning: Design topics with appropriate partition counts to enable parallel processing

Network Optimization

For cross-cluster replication, network considerations are paramount:

  1. Compression: Always use compression to reduce data transfer
properties
producer.compression.type = lz4
  2. Batching: Optimize batch sizes to maximize throughput
properties
# 16 KB batches, with up to 5 ms linger to fill them
producer.batch.size = 16384
producer.linger.ms = 5
  3. Connection pooling: Reuse connections to reduce overhead
properties
# Keep idle connections open for 9 minutes
connections.max.idle.ms = 540000

Monitoring and Performance Tuning

Key Metrics to Monitor

  1. Replication lag: Time difference between source and target clusters
  2. Throughput: Messages per second and bytes per second
  3. Error rates: Failed replication attempts
  4. Resource utilization: CPU, memory, and network usage
  5. Consumer lag: How far behind consumers are on source clusters

Performance Testing

Before production deployment, conduct performance testing:

  1. Load testing: Simulate production workloads
  2. Failover testing: Test behavior during cluster failures
  3. Network degradation testing: Test performance under poor network conditions

Advanced Optimization Techniques

Exactly-Once Processing

When data consistency is critical, recent Kafka versions (3.5+) support exactly-once processing for dedicated-mode MirrorMaker 2:

properties
# Enable exactly-once semantics (dedicated mode, Kafka 3.5+)
exactly.once.source.support = enabled

# Keep checkpointing frequent and durable
emit.checkpoints.interval.seconds = 60
checkpoints.topic.replication.factor = 3

Asynchronous Replication

For improved throughput at the cost of some latency:

properties
# Favor throughput over the strongest durability guarantees
producer.acks = 1
producer.max.in.flight.requests.per.connection = 5

Dynamic Configuration

Adjust configurations based on observed performance:

  1. Monitor metrics: Use JMX or Prometheus to collect performance data
  2. Automate adjustments: Implement dynamic configuration based on load
  3. Scale horizontally: Add more MirrorMaker 2 instances when needed

Performance Trade-offs

When optimizing MirrorMaker 2, consider these trade-offs:

Optimization          | Benefit                  | Cost
----------------------|--------------------------|--------------------
Higher batch sizes    | Better throughput        | Increased latency
More frequent commits | Lower risk of data loss  | Higher overhead
Compression           | Reduced network traffic  | CPU overhead
Exactly-once semantics| Data consistency         | Performance impact
Parallel processing   | Better throughput        | Resource usage

Production Deployment Checklist

Before deploying MirrorMaker 2 in production:

  1. [ ] Configure appropriate resource allocation
  2. [ ] Set up monitoring and alerting
  3. [ ] Test failover scenarios
  4. [ ] Validate network bandwidth requirements
  5. [ ] Configure proper security settings
  6. [ ] Implement backup and recovery procedures
  7. [ ] Document configurations and procedures
  8. [ ] Conduct performance testing under load

By carefully considering these performance factors and implementing appropriate optimizations, you can ensure your MirrorMaker 2 deployment for multi-cluster topic aggregation delivers the performance and reliability your organization needs.


Conclusion

Apache Kafka MirrorMaker 2 provides a robust, purpose-built solution for pulling topic data from multiple source Kafka clusters and merging that data into a single target topic. This approach is particularly valuable when you have multiple data centers, each running their own Apache Kafka cluster with topics of the same name that need to be consolidated for centralized analytics, reporting, or disaster recovery scenarios.

When properly configured, MirrorMaker 2 can seamlessly handle the aggregation of identical topics across multiple clusters, maintaining data consistency while providing built-in monitoring, failover capabilities, and exactly-once semantics. The configuration involves specifying connections to all source clusters, defining the target cluster, and setting up appropriate topic replication policies with proper security, offset management, and performance tuning.

For most use cases involving multi-cluster topic aggregation, MirrorMaker 2 represents a best practice approach due to its reduced development overhead, built-in reliability features, and comprehensive support. However, if your requirements include complex business logic during replication, very specific performance optimizations, or tight integration with other systems, a custom consumer application might be justified despite the additional complexity.

Ultimately, the choice between MirrorMaker 2 and a custom consumer solution depends on your specific requirements, available resources, and long-term maintenance considerations. By following the configuration guidance, best practices, and optimization strategies outlined in this guide, you can implement an effective multi-cluster topic aggregation solution that meets your organization’s needs for data consolidation and cross-cluster replication.
