AI - Orc

Apache ORC is a self-describing, type-aware columnar file format for large-scale data processing, with independent Java and C++ libraries.

About ORC

ORC is a file format project under the Apache umbrella that provides Java and C++ libraries for reading and writing the Optimized Row Columnar (ORC) file format. The format is designed for Hadoop workloads: it is optimized for large streaming reads while also supporting quick searches for required rows. ORC is self-describing and type-aware, and because it stores data by column and keeps internal indexes, a reader can process only the values a query actually needs.

The ORC project includes a C++ reader and writer and a Java reader and writer, which are completely independent of each other. Each library can read all versions of ORC files. Bugs are tracked in Apache Jira, and releases are available from Maven Central or from the Apache download page.

To build ORC, you need Java 17 or higher, Maven 3.9.6 or higher, and cmake 3.12 or higher. The build instructions cover a release version with debug information, a debug version, a release version without debug information, and builds of only the Java library or only the C++ library.
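The minimum versions above can be checked before starting a build. The following is a minimal sketch, not part of the ORC project: the helper names (parse_version, meets_minimum, check_tools) are hypothetical, and the installed versions are assumed to have been collected already as plain strings.

```python
import re

# Minimum tool versions as stated in the text above.
MINIMUMS = {"java": "17", "maven": "3.9.6", "cmake": "3.12"}


def parse_version(text):
    """Turn the leading dotted number of a string into a tuple of ints.

    '3.9.6' -> (3, 9, 6); '17.0.10' -> (17, 0, 10).
    """
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(found, required):
    """Return True when version string `found` is at least `required`."""
    a, b = parse_version(found), parse_version(required)
    # Pad the shorter tuple with zeros so '17' compares equal to '17.0.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_tools(found_versions):
    """Map each required tool to True or False.

    `found_versions` is a dict such as {'java': '21', 'cmake': '3.28.1'};
    a tool missing from the dict is reported as False.
    """
    return {
        tool: tool in found_versions
        and meets_minimum(found_versions[tool], minimum)
        for tool, minimum in MINIMUMS.items()
    }
```

Comparing tuples of integers rather than raw strings matters here: as strings, "3.10" would sort before "3.9", while as tuples (3, 10) is correctly greater than (3, 9).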

Because ORC files are type-aware, the writer can choose an encoding suited to each column's type and build an internal index as the file is written. At read time, the columnar layout and that index let a reader read, decompress, and process only the values required for the current query. The format supports the complete set of Hive types, including the complex types struct, list, map, and union.
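The idea of reading only the required values can be illustrated with a toy model. The sketch below is not the ORC layout or either library's API: RowGroup and scan are hypothetical names, the data lives in memory, and the "index" is just a minimum and maximum per column per group. It only shows the principle of skipping whole groups by their statistics and touching only the columns a query names.

```python
from dataclasses import dataclass, field


@dataclass
class RowGroup:
    """A toy row group: one list of values per column, plus min/max stats."""

    columns: dict  # column name -> list of values (all lists the same length)
    stats: dict = field(init=False)  # column name -> (min, max)

    def __post_init__(self):
        # Statistics are computed once, when the group is "written".
        self.stats = {
            name: (min(values), max(values))
            for name, values in self.columns.items()
        }


def scan(row_groups, select, where_column, lower_bound):
    """Return `select` values for rows where where_column >= lower_bound.

    Also returns the number of row groups skipped using statistics alone.
    """
    selected, skipped = [], 0
    for group in row_groups:
        _, column_max = group.stats[where_column]
        if column_max < lower_bound:
            # No row in this group can match, so its values are never read.
            skipped += 1
            continue
        # Only the filter column and the selected column are touched.
        keys = group.columns[where_column]
        values = group.columns[select]
        selected.extend(v for k, v in zip(keys, values) if k >= lower_bound)
    return selected, skipped


# Illustrative data: two row groups of three rows each.
groups = [
    RowGroup({"id": [1, 2, 3], "name": ["a", "b", "c"], "price": [5.0, 7.5, 9.0]}),
    RowGroup({"id": [4, 5, 6], "name": ["d", "e", "f"], "price": [20.0, 35.0, 12.0]}),
]
names, skipped = scan(groups, select="name", where_column="price", lower_bound=10.0)
```

With this data the first group is skipped because its largest price is 9.0, so names holds the three names from the second group and the id column is never read at all.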

ORC also offers optional AVX512 support, controlled in two places: the BUILD_ENABLE_AVX512 cmake option enables it at compile time, and the ORC_USER_SIMD_LEVEL environment variable selects the SIMD level at run time. Together these enable SIMD optimization on hardware that supports it.
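The two AVX512 switches can be sketched as a pair of small helpers. The option and variable names come from the text above; everything else is an assumption to verify against the ORC build documentation: the ON/OFF and AVX512/NONE values, the choice to run cmake from a build directory one level below the source root, and the helper names themselves, which are hypothetical.

```python
import os

# Assumption: the run-time values accepted by ORC_USER_SIMD_LEVEL.
SIMD_LEVELS = {"AVX512", "NONE"}


def cmake_avx512_args(enabled):
    """Build a cmake command line that toggles the compile-time option.

    Assumes cmake is invoked from a build directory below the source root.
    """
    return ["cmake", "..", f"-DBUILD_ENABLE_AVX512={'ON' if enabled else 'OFF'}"]


def runtime_env(level, base=None):
    """Return a copy of the environment with ORC_USER_SIMD_LEVEL set.

    `base` defaults to the current process environment; the original
    mapping is never modified.
    """
    level = level.upper()
    if level not in SIMD_LEVELS:
        raise ValueError(f"unsupported SIMD level: {level!r}")
    env = dict(os.environ if base is None else base)
    env["ORC_USER_SIMD_LEVEL"] = level
    return env
```

Both results are meant to be handed to subprocess.run: the argument list as the command, and the dictionary as the env parameter when launching a program linked against the C++ library. Returning a copy keeps the setting scoped to that one child process.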
