AI - Orc

A self-describing, type-aware columnar file format for large-scale data processing, with Java and C++ libraries developed under the Apache umbrella.

About Orc

Orc (Apache ORC) is a file format project under the Apache umbrella that provides Java and C++ libraries for reading and writing the Optimized Row Columnar (ORC) file format. The format is designed for Hadoop workloads: it is optimized for large streaming reads while still supporting quick searches for specific rows. ORC is self-describing and type-aware, and it stores internal indexes alongside the data, so a reader can process only the values a query actually needs.

The ORC project includes a Java reader and writer and a C++ reader and writer. The two libraries are completely independent of each other, and each can read all versions of ORC files. Bugs are tracked in Apache Jira, and releases can be downloaded from Maven Central or from Apache's release pages.

To build ORC, you'll need Java 17 or higher, Maven 3.9.6 or higher, and cmake 3.12 or higher. The build instructions cover several variants: a release version with debug information, a debug version, a release version without debug information, only the Java library, or only the C++ library.
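The build variants can be summarized as a small lookup of commands. The following is a minimal sketch, not the project's own tooling: the helper `build_commands` and the variant names are hypothetical, and the cmake flags (`CMAKE_BUILD_TYPE`, `BUILD_JAVA`) and the `make package` / `make test-out` / `./mvnw package` steps are given as recalled from the ORC README, so check them against the README in your checkout before relying on them.

```python
# Extra cmake flags per build variant (assumed from the ORC README; verify locally).
CMAKE_VARIANTS = {
    "release-with-debug-info": [],                  # assumed default build type
    "debug": ["-DCMAKE_BUILD_TYPE=DEBUG"],
    "release": ["-DCMAKE_BUILD_TYPE=RELEASE"],
    "cpp-only": ["-DBUILD_JAVA=OFF"],
}


def build_commands(variant):
    """Return the shell commands for one variant, to be run from the repo root."""
    if variant == "java-only":
        # The Java library is built with the Maven wrapper, without cmake.
        return ["cd java", "./mvnw package"]
    flags = CMAKE_VARIANTS[variant]  # raises KeyError for an unknown variant
    return [
        "mkdir -p build",
        "cd build",
        " ".join(["cmake", "..", *flags]),
        "make package",
        "make test-out",
    ]


# Print the commands for a debug build without executing anything.
for line in build_commands("debug"):
    print(line)
```

The helper only produces command strings; running them is left to the reader so the sketch has no side effects.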

Type-awareness and internal indexes are what make selective reads possible: a reader can read, decompress, and process only the values required for the current query and skip the rest. The format supports the complete set of Hive types, including the complex types struct, list, map, and union.
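The idea of reading only the required values can be illustrated with a toy model. This is a simplified sketch in plain Python, not ORC's actual on-disk layout or library API: the `Stripe` class, the `scan` function, and the sample data are all hypothetical. Each chunk of rows stores its columns separately and keeps a per-column min/max, which stands in for the format's internal indexes.

```python
from dataclasses import dataclass, field


@dataclass
class Stripe:
    """Toy chunk of rows stored column by column, with min/max per column."""
    columns: dict                      # column name -> list of values
    stats: dict = field(init=False)    # column name -> (min, max), built once

    def __post_init__(self):
        # Computed at "write time", like an index stored next to the data.
        self.stats = {name: (min(v), max(v)) for name, v in self.columns.items()}


def scan(stripes, wanted, filter_col, lo, hi):
    """Return (rows, stripes_read) for rows where lo <= filter_col <= hi."""
    rows, stripes_read = [], 0
    for stripe in stripes:
        col_min, col_max = stripe.stats[filter_col]
        if col_max < lo or col_min > hi:
            continue  # the index proves no row can match: skip the whole chunk
        stripes_read += 1
        values = stripe.columns[filter_col]
        keep = [i for i, v in enumerate(values) if lo <= v <= hi]
        # Only the filter column and the requested columns are touched.
        rows.extend(tuple(stripe.columns[c][i] for c in wanted) for i in keep)
    return rows, stripes_read


stripes = [
    Stripe({"id": [1, 2, 3], "name": ["a", "b", "c"], "payload": ["x"] * 3}),
    Stripe({"id": [10, 11, 12], "name": ["j", "k", "l"], "payload": ["y"] * 3}),
    Stripe({"id": [20, 21, 22], "name": ["t", "u", "v"], "payload": ["z"] * 3}),
]

rows, stripes_read = scan(stripes, wanted=["name"], filter_col="id", lo=11, hi=12)
print(rows, stripes_read)  # [('k',), ('l',)] 1
```

Two of the three chunks are skipped from their min/max alone, and the `payload` column is never read, which is the behavior the paragraph above describes in miniature.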

Additionally, the ORC project offers optional AVX512 support, which enables SIMD optimization on hardware that provides it. It is controlled at two points: the BUILD_ENABLE_AVX512 cmake option decides at compile time whether AVX512 code is built into the binaries, and the ORC_USER_SIMD_LEVEL environment variable selects the SIMD level at run time.
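The two AVX512 switches can be sketched as a pair of small helpers. This is an illustrative sketch under assumptions: the helper names `cmake_avx512_flag` and `runtime_env` are hypothetical, and the accepted values ("ON"/"OFF" for the cmake option, "AVX512"/"NONE" for the environment variable) are as recalled from the ORC README rather than stated on this page, so confirm them against the README.

```python
import os

# Assumed accepted run-time values for ORC_USER_SIMD_LEVEL (verify in the README).
SIMD_LEVELS = {"AVX512", "NONE"}


def cmake_avx512_flag(enabled):
    """Compile-time switch: the cmake argument that turns AVX512 code on or off."""
    return "-DBUILD_ENABLE_AVX512=" + ("ON" if enabled else "OFF")


def runtime_env(level, base=None):
    """Run-time switch: a copy of the environment with the SIMD level set."""
    if level not in SIMD_LEVELS:
        raise ValueError(f"unsupported SIMD level: {level!r}")
    env = dict(os.environ if base is None else base)
    env["ORC_USER_SIMD_LEVEL"] = level
    return env


print(cmake_avx512_flag(True))                                # -DBUILD_ENABLE_AVX512=ON
print(runtime_env("AVX512", base={})["ORC_USER_SIMD_LEVEL"])  # AVX512
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when launching a program linked against the C++ library; the caller's own environment is left unchanged.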
