Samza
Empowering real-time data processing at scale through Apache Samza's distributed framework utilizing Apache Kafka and Apache Hadoop YARN.
Samza is an open-source distributed stream processing framework under the Apache Software Foundation, utilizing Apache Kafka for messaging and Apache Hadoop YARN for resource management. It provides real-time data processing at scale with features like high availability, fault tolerance, and support for various programming models.
About Samza
Samza is a top-level project of the Apache Software Foundation that develops a distributed stream processing framework. It utilizes Apache Kafka for messaging and Apache Hadoop YARN for fault tolerance, processor isolation, security, and resource management.
Key features of Samza include:
- Uses Apache Kafka for messaging
- Leverages Apache Hadoop YARN for distributed processing
- Provides real-time data processing at scale
- Offers high availability and fault tolerance
- Supports various programming models such as MapReduce, Session Windows, and Triggers
Samza can be built using Gradle and supports contribution from the developer community following specific guidelines. It is a popular open-source project under the Apache umbrella with a large and active user base.