Apache Kafka

Apache Kafka is an open-source distributed event streaming platform for high-performance data pipelines, streaming analytics, and mission-critical applications. It handles trillions of events per day and stores them durably in a fault-tolerant cluster.

Apache Kafka is the technological foundation for an always-on world in which businesses are increasingly software-defined and automated. It serves as both a distributed event store and a stream-processing platform.

Key capabilities

Publish and subscribe

Kafka lets applications publish (write) and subscribe to (read) streams of events through its Producer and Consumer APIs. Kafka Connect builds on these to continuously import and export data from other systems.
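To make the publish/subscribe model concrete, here is a toy, in-memory sketch of a single-partition topic as an append-only log: each published event gets a sequential offset, and each consumer tracks its own read position. This is not the Kafka client API; `ToyTopic`, `ToyConsumer`, `publish`, and `read` are hypothetical names chosen for illustration, and real consumers also commit their offsets back to the cluster.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ToyTopic:
    """Single-partition topic modeled as an append-only list (illustrative only)."""

    log: List[Tuple[str, str]] = field(default_factory=list)

    def publish(self, key: str, value: str) -> int:
        """Append an event and return its offset, i.e. its position in the log."""
        self.log.append((key, value))
        return len(self.log) - 1


@dataclass
class ToyConsumer:
    """Reads from a ToyTopic; reading moves only this consumer's position."""

    topic: ToyTopic
    position: int = 0

    def read(self, max_records: int = 10) -> List[Tuple[int, str, str]]:
        """Return up to max_records (offset, key, value) tuples from the current position."""
        end = min(self.position + max_records, len(self.topic.log))
        batch = [(off, *self.topic.log[off]) for off in range(self.position, end)]
        self.position = end  # events stay in the log; only the position advances
        return batch


topic = ToyTopic()
for i in range(3):
    topic.publish("sensor-1", f"reading-{i}")

consumer = ToyConsumer(topic)
print(consumer.read(2))  # [(0, 'sensor-1', 'reading-0'), (1, 'sensor-1', 'reading-1')]
print(consumer.read(2))  # [(2, 'sensor-1', 'reading-2')]
```

Reading advances the consumer's position, but the log still holds all three events, so a second consumer starting at offset 0 would see every one of them.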

Storage

Events are stored durably in a distributed, fault-tolerant cluster for as long as needed: each topic's retention can be limited by age or by size, or disabled to keep events indefinitely. Unlike in traditional message queues, events are not deleted after they are consumed.
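Retention, not consumption, decides when events disappear. The sketch below assumes a purely time-based policy; `StoredEvent` and `apply_retention` are hypothetical, and unlike this per-event filter, Kafka removes whole log segments once they age past the topic's `retention.ms` (7 days by default). Which consumers have read an event plays no part in the decision.

```python
from dataclasses import dataclass
from typing import List

DAY_MS = 24 * 60 * 60 * 1000
SEVEN_DAYS_MS = 7 * DAY_MS  # same value as Kafka's default retention.ms (604800000)


@dataclass(frozen=True)
class StoredEvent:
    offset: int
    timestamp_ms: int
    value: str


def apply_retention(log: List[StoredEvent], now_ms: int, retention_ms: int) -> List[StoredEvent]:
    """Drop events older than the retention window.

    Consumer read positions are deliberately not an input: retention is
    decided by age alone. (Simplification: Kafka deletes whole log
    segments rather than individual events.)
    """
    cutoff = now_ms - retention_ms
    return [e for e in log if e.timestamp_ms >= cutoff]


log = [
    StoredEvent(offset=0, timestamp_ms=0, value="a"),
    StoredEvent(offset=1, timestamp_ms=5 * DAY_MS, value="b"),
    StoredEvent(offset=2, timestamp_ms=9 * DAY_MS, value="c"),
]
kept = apply_retention(log, now_ms=10 * DAY_MS, retention_ms=SEVEN_DAYS_MS)
print([e.offset for e in kept])  # [1, 2]
```

Evaluated at day 10 with a 7-day window, the day-0 event is dropped while the surviving events keep offsets 1 and 2; offsets are never reassigned.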

Processing

The Kafka Streams API enables real-time stream processing, including transformations, aggregations, joins, and windowing, and can be configured for exactly-once processing guarantees.
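Windowing groups events by time before aggregating them. As a rough illustration of a tumbling (fixed-size, non-overlapping) window count, here is a plain-Python sketch. It is not the Kafka Streams API, which is a Java library, and it ignores late-arriving events, state stores, and fault tolerance; the function name and the millisecond timestamps are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[str, int]], window_ms: int
) -> Dict[Tuple[str, int], int]:
    """Count events per (key, window start) using fixed, non-overlapping windows."""
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    for key, ts_ms in events:
        # Each timestamp belongs to exactly one window, aligned to multiples of window_ms.
        window_start = ts_ms - (ts_ms % window_ms)
        counts[(key, window_start)] += 1
    return dict(counts)


events = [("press-7", 1_000), ("press-7", 59_000), ("press-7", 61_000), ("press-9", 30_000)]
print(tumbling_window_counts(events, window_ms=60_000))
# {('press-7', 0): 2, ('press-7', 60000): 1, ('press-9', 0): 1}
```

Hopping and session windows differ only in how a timestamp maps to windows: a hopping window can place one event in several overlapping windows, and a session window grows with activity instead of having a fixed size.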

Architecture

Brokers

Brokers form the storage layer of a distributed cluster that can span multiple datacenters or cloud regions. The Kafka project cites production clusters of up to a thousand brokers.

Topics

Events are organized into topics, similar to folders in a filesystem. Topics are partitioned across brokers for horizontal scalability.
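Partitioning is what spreads a topic across brokers: records with the same key go to the same partition, which preserves their relative order. A minimal sketch of key-to-partition mapping follows. Kafka's default partitioner hashes keyed records with murmur2; this sketch swaps in CRC-32 from the standard library for brevity, so `choose_partition` (a hypothetical name) will not reproduce the partition numbers of a real cluster.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Deterministically map a record key to a partition number.

    Assumption: CRC-32 stands in for the murmur2 hash that Kafka's default
    partitioner uses, so results will not match a real cluster.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key) % num_partitions


# Every event keyed "machine-42" lands on the same partition, so their order is kept.
p = choose_partition(b"machine-42", 6)
print(p, choose_partition(b"machine-42", 6) == p)  # the second value is always True
```

Because the mapping depends on the partition count, adding partitions later moves keys to different partitions, which is one reason partition counts are usually chosen up front.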

Producers and consumers

Producers are the client applications that publish events, and consumers are the ones that subscribe to them. The two are fully decoupled: events persist after they are read, so they can be consumed as often as needed.
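Decoupling shows up most clearly in consumer groups: consumers that share a group id divide a topic's partitions so that each partition is read by one member of the group, while a different group reads the same partitions independently with its own offsets. The round-robin sketch below is a simplification of that idea, not a reimplementation of any of Kafka's built-in assignors; `round_robin_assign` is a hypothetical name.

```python
from typing import Dict, List


def round_robin_assign(partitions: List[int], members: List[str]) -> Dict[str, List[int]]:
    """Give each partition to exactly one group member, cycling through members in sorted order."""
    if not members:
        raise ValueError("a consumer group needs at least one member")
    ordered = sorted(members)
    assignment: Dict[str, List[int]] = {m: [] for m in ordered}
    for i, partition in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(partition)
    return assignment


partitions = list(range(6))
# Two independent groups read the same six partitions; neither affects the other.
print(round_robin_assign(partitions, ["c1", "c2"]))
# {'c1': [0, 2, 4], 'c2': [1, 3, 5]}
print(round_robin_assign(partitions, [f"a{i}" for i in range(1, 8)])["a7"])
# [] (seven members, six partitions: one member sits idle)
```

The second call shows why a group gains no parallelism from having more members than the topic has partitions.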

Replication

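Each partition is replicated across multiple brokers for fault tolerance and high availability, and topics can additionally be replicated across datacenters or regions, for example with MirrorMaker.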
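How much failure replication absorbs depends on the replication factor and the topic's `min.insync.replicas` setting: a producer using `acks=all` gets an error once fewer in-sync replicas remain than that minimum. The arithmetic is sketched below under the simplifying assumption that every failed broker held an in-sync replica of the partition; the function names are hypothetical.

```python
def can_accept_acks_all_write(in_sync_replicas: int, min_insync_replicas: int) -> bool:
    """acks=all writes are rejected once the in-sync replica set shrinks below the minimum."""
    return in_sync_replicas >= min_insync_replicas


def write_failure_tolerance(replication_factor: int, min_insync_replicas: int) -> int:
    """Broker failures a partition can absorb while still accepting acks=all writes.

    Assumes every failed broker held an in-sync replica of this partition.
    """
    return max(replication_factor - min_insync_replicas, 0)


rf, min_isr = 3, 2
print(write_failure_tolerance(rf, min_isr))        # 1
print(can_accept_acks_all_write(rf - 1, min_isr))  # True: one broker down
print(can_accept_acks_all_write(rf - 2, min_isr))  # False: two brokers down
```

With the common combination of replication factor 3 and `min.insync.replicas=2`, one broker can fail without blocking `acks=all` writes, which is one reason production clusters usually start at three brokers.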

Five core APIs

  • Admin API: Manage topics, brokers, and other objects
  • Producer API: Publish events to topics
  • Consumer API: Subscribe to and process events
  • Kafka Streams API: Stream processing with stateful operations
  • Kafka Connect API: Build reusable data import/export connectors

Integration ecosystem

Kafka Connect provides hundreds of pre-built connectors for systems including PostgreSQL, JMS, Elasticsearch, AWS S3, and more.

For more complex event processing, Kafka also integrates with stream processors such as Apache Flink, Spark Streaming, and Apache Samza.

Deployment options

  • KRaft Mode: ZooKeeper-less deployment (production-ready since Kafka 3.3, and the only mode in Kafka 4.0 and later, which removed ZooKeeper)
  • Docker: Official images available
  • Kubernetes: Strimzi and Confluent operators available
  • Cloud: Managed services from AWS (MSK), Azure, GCP, and Confluent Cloud
  • Multi-datacenter: MirrorMaker for cross-cluster replication
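In KRaft mode, cluster metadata is managed by a quorum of controllers running a Raft-based protocol, which needs a majority of voters to make progress. The arithmetic below (hypothetical helper names) shows why controller quorums are usually sized at three or five nodes.

```python
def quorum_majority(voters: int) -> int:
    """Smallest number of controllers that forms a majority."""
    return voters // 2 + 1


def tolerated_controller_failures(voters: int) -> int:
    """Controllers that can fail while a majority remains available."""
    return voters - quorum_majority(voters)


for n in (1, 3, 4, 5):
    print(n, quorum_majority(n), tolerated_controller_failures(n))
# 1 1 0
# 3 2 1
# 4 3 1
# 5 3 2
```

A fourth controller raises the majority to three without tolerating any additional failure, so odd quorum sizes are the usual choice.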

Manufacturing use cases

  • IoT data ingestion: High-throughput ingestion of sensor data from factory floors
  • Predictive maintenance: Real-time analysis of equipment telemetry
  • Supply chain visibility: Tracking shipments and inventory across the value chain
  • Quality monitoring: Stream processing for defect detection
  • Digital twins: Feeding real-time data to digital twin platforms
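Predictive maintenance and quality monitoring usually come down to keeping per-key state over a stream of readings. The sketch below keeps an exponentially weighted moving average per machine and flags machines whose smoothed value crosses a threshold. The smoothing factor, threshold, and machine ids are hypothetical, and in practice this logic would run in Kafka Streams, Flink, or a similar engine rather than in a plain loop.

```python
from typing import Dict, Iterable, List, Tuple


def ewma_alerts(
    readings: Iterable[Tuple[str, float]],
    alpha: float = 0.5,
    threshold: float = 80.0,
) -> List[Tuple[str, float]]:
    """Flag readings whose per-machine smoothed value exceeds the threshold.

    Keeps one exponentially weighted moving average per machine id,
    the kind of per-key state a stream processor would hold.
    """
    state: Dict[str, float] = {}
    alerts: List[Tuple[str, float]] = []
    for machine_id, value in readings:
        previous = state.get(machine_id)
        # The first reading seeds the average; later readings blend in with weight alpha.
        smoothed = value if previous is None else alpha * value + (1 - alpha) * previous
        state[machine_id] = smoothed
        if smoothed > threshold:
            alerts.append((machine_id, smoothed))
    return alerts


readings = [("m1", 70.0), ("m1", 90.0), ("m1", 95.0), ("m2", 60.0)]
print(ewma_alerts(readings))  # [('m1', 87.5)]
```

Smoothing keeps a single spike from triggering an alert: the 90.0 reading alone only lifts the average for m1 to 80.0, and the alert fires once the elevated readings persist.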

Limitations

  • Requires significant operational expertise for self-managed deployments (broker tuning, partition management, KRaft migration)
  • High resource overhead: production clusters typically run at least three brokers, each needing substantial RAM and disk I/O
  • Not designed for low-latency request-reply patterns; use a message broker such as RabbitMQ or NATS for RPC
  • No built-in MQTT support: direct IoT device integration requires a separate MQTT broker or Confluent's MQTT Proxy
  • Exactly-once processing adds complexity, because consumer offsets must be committed atomically with the output they produced (via transactions)
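The offset-management pitfall can be seen in a small simulation of a read-process-write loop that crashes once, after producing output but before committing its offset. Committing the offset separately replays the record (at-least-once); committing output and offset together avoids the duplicate, which is the role Kafka transactions play (the producer's `sendOffsetsToTransaction` together with downstream consumers reading at `isolation.level=read_committed`). The `process` function and its parameters are hypothetical and model neither Kafka's protocol nor its APIs.

```python
from typing import List


def process(records: List[str], crash_at: int, atomic: bool) -> List[str]:
    """Simulate a read-process-write loop that crashes once and restarts.

    `sink` stands in for a downstream topic and `committed` for the consumer's
    committed offset. At index `crash_at` the process dies after producing
    output but before committing the offset.
    """
    sink: List[str] = []
    committed = 0
    crashed = False
    while committed < len(records):
        i = committed
        output = records[i].upper()  # the "processing" step
        if not crashed and i == crash_at:
            crashed = True
            if not atomic:
                sink.append(output)  # output is already visible; the offset commit never happens
            # When atomic, output and offset commit together, so the crash discards both.
            continue  # restart from the last committed offset
        sink.append(output)
        committed = i + 1  # commit the offset (together with the output when atomic)
    return sink


print(process(["a", "b", "c"], crash_at=1, atomic=False))  # ['A', 'B', 'B', 'C']
print(process(["a", "b", "c"], crash_at=1, atomic=True))   # ['A', 'B', 'C']
```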

Kind: Platform
License: Open Source (Apache License 2.0)
Website: kafka.apache.org