Apache Kafka

Apache Kafka is an open-source distributed event streaming platform for high-performance data pipelines, streaming analytics, and mission-critical applications. It handles trillions of events per day with durable, fault-tolerant storage.

Apache Kafka is a technological foundation for an always-on world in which businesses are increasingly software-defined and automated. It serves as a distributed event store and stream-processing platform.

Key capabilities

Publish and subscribe

Kafka's producer and consumer APIs let applications publish (write) and subscribe to (read) streams of events, including continuous import and export of data from other systems.
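
The publish/subscribe model can be sketched with a minimal in-memory log (illustrative only; real clients such as the Java client or kafka-python talk to a broker over the network, and `MiniBroker` is a hypothetical stand-in):

```python
from collections import defaultdict

# Minimal in-memory sketch of Kafka's publish/subscribe semantics.
class MiniBroker:
    def __init__(self):
        self.topics = defaultdict(list)  # topic -> append-only event log

    def publish(self, topic, event):
        """Producer side: append an event to the topic's log."""
        self.topics[topic].append(event)

    def subscribe(self, topic, offset=0):
        """Consumer side: read events from a given offset onward."""
        return self.topics[topic][offset:]

broker = MiniBroker()
broker.publish("orders", {"id": 1, "sku": "A-42"})
broker.publish("orders", {"id": 2, "sku": "B-7"})

assert len(broker.subscribe("orders")) == 2        # a new consumer sees everything
assert broker.subscribe("orders", offset=1)[0]["id"] == 2  # resuming mid-log
```

The key property mirrored here is that reading is a non-destructive seek into a log, not a destructive pop from a queue.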

Storage

Events are stored durably and reliably for as long as needed in a distributed, fault-tolerant cluster. Unlike traditional message queues, events are not deleted after consumption.

Processing

The Kafka Streams API enables real-time stream processing, including transformations, aggregations, joins, and windowing operations, with exactly-once processing guarantees.
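
Kafka Streams itself is a Java library; the windowing concept can be sketched in a few lines of Python. This tumbling-window count (fixed, non-overlapping time buckets) is illustrative, not the Streams API:

```python
from collections import Counter

def tumbling_window_counts(events, window_ms):
    """Count events per key within fixed, non-overlapping time windows.

    `events` is a list of (timestamp_ms, key) pairs, e.g. from a
    hypothetical "page-views" topic.
    """
    windows = {}
    for timestamp, key in events:
        window_start = (timestamp // window_ms) * window_ms
        windows.setdefault(window_start, Counter())[key] += 1
    return windows

events = [(100, "home"), (450, "home"), (600, "cart"), (1200, "home")]
result = tumbling_window_counts(events, window_ms=1000)
assert result[0] == Counter({"home": 2, "cart": 1})   # window [0, 1000)
assert result[1000] == Counter({"home": 1})           # window [1000, 2000)
```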

Architecture

Brokers

Brokers form the storage layer in a distributed cluster that can span multiple datacenters or cloud regions. A single cluster supports up to 1,000 brokers.

Topics

Events are organized into topics, similar to folders in a filesystem. Topics are partitioned across brokers for horizontal scalability.
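
Partitioning is keyed: events with the same key always land in the same partition, which preserves per-key ordering. Kafka's default partitioner uses a murmur2 hash of the key; the sketch below substitutes `zlib.crc32` purely for illustration:

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Map an event key to a partition (simplified; Kafka uses murmur2)."""
    return zlib.crc32(key) % num_partitions

p = partition_for(b"customer-42", 6)
assert partition_for(b"customer-42", 6) == p  # deterministic: same key, same partition
assert 0 <= p < 6                             # always within the partition count
```

Because the mapping is deterministic, all events for `customer-42` are consumed in order by whichever consumer owns that partition.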

Producers and consumers

Producers and consumers are the client applications that publish and subscribe to events. They are fully decoupled: events persist in the log and can be read as often as needed, by any number of consumers.
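
The decoupling can be sketched as independent consumers each tracking their own offset into the same durable log, analogous to committed offsets for Kafka consumer groups (the consumer names are hypothetical):

```python
# One partition's append-only log, shared by all consumers.
log = ["evt-0", "evt-1", "evt-2"]
offsets = {"analytics": 0, "billing": 0}   # per-consumer committed offsets

def poll(consumer, max_records=2):
    """Return the next batch for this consumer and advance its offset."""
    start = offsets[consumer]
    batch = log[start:start + max_records]
    offsets[consumer] = start + len(batch)
    return batch

assert poll("analytics") == ["evt-0", "evt-1"]
assert poll("billing") == ["evt-0", "evt-1"]   # same events, independent progress
assert poll("analytics") == ["evt-2"]          # analytics is further ahead
```

Neither consumer's reads affect the other's, and neither removes anything from the log.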

Replication

Data is replicated across brokers and regions for fault tolerance and high availability.

Five core APIs

  • Admin API: Manage topics, brokers, and other objects
  • Producer API: Publish events to topics
  • Consumer API: Subscribe to and process events
  • Kafka Streams API: Stream processing with stateful operations
  • Kafka Connect API: Build reusable data import/export connectors

Integration ecosystem

Kafka Connect provides hundreds of pre-built connectors for systems including PostgreSQL, JMS, Elasticsearch, AWS S3, and more.
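
A connector is configured declaratively via the Connect REST API. As a sketch, here is the kind of JSON payload used for the FileStreamSource example connector that ships with Kafka (the connector name, file path, and topic here are illustrative):

```json
{
  "name": "demo-file-source",
  "config": {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "/var/log/app/events.log",
    "topic": "app-events"
  }
}
```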

Stream Processing integrates with Apache Flink, Spark Streaming, and Samza for complex event processing.

Deployment options

  • KRaft Mode: ZooKeeper-less deployment (recommended since Kafka 3.3)
  • Docker: Official images available
  • Kubernetes: Strimzi and Confluent operators available
  • Cloud: Managed services from AWS (MSK), Azure, GCP, and Confluent Cloud
  • Multi-datacenter: MirrorMaker for cross-cluster replication
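
A KRaft deployment is driven by a `server.properties` file in which the node takes on broker and/or controller roles. A minimal single-node sketch (values are illustrative, not production settings):

```properties
# KRaft mode: no ZooKeeper; this node is both broker and controller.
process.roles=broker,controller
node.id=1
controller.quorum.voters=1@localhost:9093
listeners=PLAINTEXT://:9092,CONTROLLER://:9093
controller.listener.names=CONTROLLER
log.dirs=/tmp/kraft-logs
```

A production cluster would instead run separate controller quorum nodes and multiple brokers.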

Manufacturing use cases

  • IoT data ingestion: High-throughput ingestion of sensor data from factory floors
  • Predictive maintenance: Real-time analysis of equipment telemetry
  • Supply chain visibility: Tracking shipments and inventory across the value chain
  • Quality monitoring: Stream processing for defect detection
  • Digital twins: Feeding real-time data to digital twin platforms

Limitations

  • Requires significant operational expertise for self-managed deployments (broker tuning, partition management, KRaft migration)
  • High resource overhead: minimum 3 brokers for production, requiring substantial RAM and disk I/O
  • Not designed for low-latency request-reply patterns — use traditional message brokers (RabbitMQ, NATS) for RPC
  • No built-in MQTT support — requires a separate MQTT broker or Confluent's MQTT Proxy for direct IoT device integration
  • Consumer offset management adds complexity for exactly-once processing guarantees
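
The offset-management caveat above can be sketched concretely: committing the offset after processing yields at-least-once delivery, so a crash between processing and commit replays the event and downstream processing must be idempotent (the crash flag here is a simulation device):

```python
log = ["a", "b"]     # one partition's log
processed = []       # downstream side effects
committed = 0        # last committed offset

def poll_and_process(crash_before_commit=False):
    """Process the next event, then commit the offset (at-least-once order)."""
    global committed
    event = log[committed]
    processed.append(event)          # side effect happens first
    if crash_before_commit:
        return                       # simulated crash: offset never committed
    committed += 1                   # commit only after successful processing

poll_and_process(crash_before_commit=True)   # "a" processed, commit lost
poll_and_process()                           # restart re-reads "a"
assert processed == ["a", "a"]               # duplicate delivery: at-least-once
```

Committing before processing inverts the trade-off to at-most-once (a crash then loses the event instead of duplicating it); exactly-once requires transactional producers and careful consumer configuration.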

Kind: Platform
Vendor: Apache Software Foundation
License: Open Source
Website: kafka.apache.org