AI - Hudi

Manage large analytics datasets efficiently with Hudi, an open-source project offering Merge-on-Read tables and incremental updates on distributed file systems.


About Hudi

Hudi is an open-source data management project of the Apache Software Foundation, developed primarily to manage the storage of large analytical datasets on distributed file systems such as HDFS and cloud object stores. The name "Hudi" is derived from "Hadoop Upserts Deletes and Incrementals."

Hudi offers the following core functionalities:

  • Manages data lake metadata such as tables, partitions, and columns.
  • Provides Merge-on-Read (MOR) tables, which record incoming updates in log files next to the existing data files and merge them at read time or during compaction, keeping write latency low for streaming use cases.
  • Supports deleting individual records or entire partitions without rewriting the whole table.
  • Supports incremental queries that read only the records changed since a given commit, enabling near-real-time processing.
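The Merge-on-Read, delete, and incremental-query behaviors listed above can be illustrated with a small in-memory model. This is a hypothetical toy, not Hudi's API or storage format: the class `ToyMorTable`, its methods, and the integer commit counter are assumptions made for illustration. Writes append to a log instead of rewriting base data, deletes are recorded as tombstones, reads merge base and log at query time, and an incremental read returns only records changed after a given commit.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical toy model of a Merge-on-Read table (NOT Hudi's API).
# Each write gets an increasing integer commit number.


@dataclass
class ToyMorTable:
    # "Base file": record key -> (commit, row)
    base: Dict[str, Tuple[int, dict]] = field(default_factory=dict)
    # "Delta log": (commit, key, row), where row=None marks a delete
    log: List[Tuple[int, str, Optional[dict]]] = field(default_factory=list)
    next_commit: int = 1

    def _new_commit(self) -> int:
        commit = self.next_commit
        self.next_commit += 1
        return commit

    def upsert(self, rows: Dict[str, dict]) -> int:
        """Append inserts/updates to the log; base data is left untouched."""
        commit = self._new_commit()
        for key, row in rows.items():
            self.log.append((commit, key, row))
        return commit

    def delete(self, keys: List[str]) -> int:
        """Record deletes as tombstones in the log."""
        commit = self._new_commit()
        for key in keys:
            self.log.append((commit, key, None))
        return commit

    def _merged(self) -> Dict[str, Tuple[int, Optional[dict]]]:
        """Merge base and log; later log entries win because the log is in commit order."""
        view: Dict[str, Tuple[int, Optional[dict]]] = dict(self.base)
        for commit, key, row in self.log:
            view[key] = (commit, row)
        return view

    def snapshot(self) -> Dict[str, dict]:
        """Snapshot read: latest value per key, with deleted keys dropped."""
        return {k: r for k, (_, r) in self._merged().items() if r is not None}

    def incremental(self, begin_commit: int) -> Dict[str, dict]:
        """Incremental read: live records whose latest change is after begin_commit."""
        return {
            k: r
            for k, (c, r) in self._merged().items()
            if c > begin_commit and r is not None
        }

    def compact(self) -> None:
        """Fold the log into the base data, dropping deleted keys."""
        self.base = {k: (c, r) for k, (c, r) in self._merged().items() if r is not None}
        self.log.clear()


if __name__ == "__main__":
    t = ToyMorTable()
    c1 = t.upsert({"a": {"v": 1}, "b": {"v": 2}})
    c2 = t.upsert({"a": {"v": 10}})
    t.delete(["b"])
    print(t.snapshot())        # {'a': {'v': 10}}
    print(t.incremental(c1))   # {'a': {'v': 10}}
    t.compact()
    print(t.snapshot())        # {'a': {'v': 10}}
```

The trade-off the model shows is the one MOR tables make: writes are cheap appends, while reads pay the merge cost until compaction folds the log back into the base data.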

Hudi publishes pre-built bundle jars for different Spark and Flink versions, and developers can produce custom builds by passing different Maven options. The project is under active development, with frequent commits. Users are encouraged to join the Hudi community on Slack and follow the official Twitter account for updates and discussions.
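Once a Spark bundle jar matching the cluster's Spark version is on the classpath, reads and writes are configured through Spark datasource options. The sketch below collects commonly used option keys from Hudi's Spark datasource configuration in plain dictionaries. The helper names, parameter names, and example values (`trips`, `uuid`, `ts`, `city`, the path and instant time) are hypothetical, and option keys can change between releases, so check them against the configuration reference for the Hudi version in use.

```python
from typing import Dict

# Hypothetical helpers; the "hoodie.*" keys are Hudi Spark datasource options.
VALID_TABLE_TYPES = ("COPY_ON_WRITE", "MERGE_ON_READ")


def hudi_write_options(
    table_name: str,
    record_key: str,
    precombine_field: str,
    partition_field: str,
    table_type: str = "MERGE_ON_READ",
    operation: str = "upsert",  # e.g. "insert", "delete", "delete_partition"
) -> Dict[str, str]:
    """Build write options for df.write.format("hudi")."""
    if table_type not in VALID_TABLE_TYPES:
        raise ValueError(f"unsupported table type: {table_type}")
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        # Picks the winning record when several writes share a key
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.table.type": table_type,
        "hoodie.datasource.write.operation": operation,
    }


def hudi_incremental_read_options(begin_instant: str) -> Dict[str, str]:
    """Build options for an incremental query starting after begin_instant."""
    return {
        "hoodie.datasource.query.type": "incremental",
        "hoodie.datasource.read.begin.instanttime": begin_instant,
    }


# Usage sketch with PySpark (requires Spark plus a Hudi bundle jar):
# opts = hudi_write_options("trips", "uuid", "ts", "city")
# df.write.format("hudi").options(**opts).mode("append").save("/tmp/hudi/trips")
# changes = (spark.read.format("hudi")
#            .options(**hudi_incremental_read_options("20240101000000"))
#            .load("/tmp/hudi/trips"))
```

Keeping the options in plain dictionaries lets the same settings be reused across jobs and checked without starting a Spark session.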

Hudi is licensed under the Apache License, Version 2.0, which permits use, modification, and redistribution in both open-source and commercial projects.


More companies

IBM

Innovating for industries with IBM's portfolio of AI-powered solutions, collaborative services, and industry-specific technologies that drive impactful transformation.


Backblaze

Empowering individuals and businesses with cost-effective, scalable, and durable cloud storage solutions through Backblaze's S3-compatible B2 Cloud Storage and seamless backup services.


Plaid

Empowering businesses with seamless access to financial institutions' data and payments through Plaid's comprehensive suite of APIs and services.



Our Hubs

London, United Kingdom

London is a global AI hotspot that thrives on innovation, diverse talent, and a dynamic tech ecosystem, offering unparalleled opportunities for AI engineers.

Munich, Germany

Munich is a vibrant AI hub that merges cutting-edge technology with rich cultural experiences, creating an inspiring environment for AI engineers.