Samza

Real-time data processing at scale with Apache Samza, a distributed stream processing framework built on Apache Kafka and Apache Hadoop YARN.

Open Source Infrastructure

Samza is an open-source distributed stream processing framework from the Apache Software Foundation. It uses Apache Kafka for messaging and Apache Hadoop YARN for resource management, and it provides real-time data processing at scale with high availability, fault tolerance, and a choice of programming models.

About Samza

Samza is a top-level project of the Apache Software Foundation that develops a distributed stream processing framework. It utilizes Apache Kafka for messaging and Apache Hadoop YARN for fault tolerance, processor isolation, security, and resource management.

Key features of Samza include:

  • Uses Apache Kafka for messaging
  • Leverages Apache Hadoop YARN for distributed processing
  • Provides real-time data processing at scale
  • Offers high availability and fault tolerance
  • Supports multiple programming models, including a high-level Streams API with windowing (such as session windows) and triggers, and a low-level Task API
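
To make the session-window idea from the feature list concrete, here is a minimal sketch in plain Java. It does not use Samza's API. The class name SessionWindowSketch, the method sessionize, and the gap value are hypothetical and chosen only for illustration. The sketch assumes event timestamps arrive already sorted and closes a session whenever the gap since the previous event exceeds a threshold.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: plain Java, not Samza's windowing API.
class SessionWindowSketch {

    /**
     * Groups sorted event timestamps (in milliseconds) into sessions.
     * A new session starts when the gap since the previous event
     * is greater than gapMs.
     */
    static List<List<Long>> sessionize(List<Long> sortedTimestamps, long gapMs) {
        List<List<Long>> sessions = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        for (long ts : sortedTimestamps) {
            boolean gapExceeded = !current.isEmpty()
                    && ts - current.get(current.size() - 1) > gapMs;
            if (gapExceeded) {
                // Close the current session and start a new one.
                sessions.add(current);
                current = new ArrayList<>();
            }
            current.add(ts);
        }
        if (!current.isEmpty()) {
            sessions.add(current);
        }
        return sessions;
    }

    public static void main(String[] args) {
        // Hypothetical timestamps; the 7.5 s gap before 10_000 splits the sessions.
        List<Long> events = List.of(0L, 1_000L, 2_500L, 10_000L, 10_400L);
        System.out.println(sessionize(events, 3_000L));
        // prints [[0, 1000, 2500], [10000, 10400]]
    }
}
```

In a real stream processor the input is unbounded and may arrive out of order, so a framework also has to decide when a session is complete enough to emit; that is the role triggers play.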

Samza is built with Gradle, and the project accepts contributions from the developer community that follow its contribution guidelines. It is an open-source project maintained under the Apache Software Foundation.