AI - Orc

Optimizing large-scale data processing with the self-describing, type-aware ORC file format and its Java and C++ libraries, developed under the Apache umbrella.

About Orc

ORC is a file format project under the Apache umbrella that develops Java and C++ libraries for reading and writing the Optimized Row Columnar (ORC) file format. The format is designed for Hadoop workloads and is optimized for large streaming reads, with integrated support for finding required rows quickly. It is a self-describing, type-aware columnar format: because data is stored by column and the file carries internal indexes, a reader can process only the values a query actually needs.
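The idea of reading only the required values can be sketched with a toy model. This is not ORC's actual on-disk layout or API; the `ColumnChunk` class, the `scan_greater_than` function, and the sample data are all hypothetical and only illustrate the general technique: each stripe stores every column separately along with precomputed minimum and maximum statistics, so a reader can touch one column and skip stripes that cannot match.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ColumnChunk:
    """Values of one column within one stripe, plus stored statistics."""
    values: List[int]
    minimum: int
    maximum: int

    @classmethod
    def from_values(cls, values: List[int]) -> "ColumnChunk":
        # Statistics are computed once, at write time, and stored with the data.
        return cls(values=values, minimum=min(values), maximum=max(values))


# A toy "file": a list of stripes, each mapping a column name to its chunk.
Stripe = Dict[str, ColumnChunk]


def scan_greater_than(
    stripes: List[Stripe], column: str, threshold: int
) -> Tuple[List[int], int]:
    """Return the matching values and the number of stripes actually scanned."""
    matches: List[int] = []
    scanned = 0
    for stripe in stripes:
        chunk = stripe[column]          # other columns are never touched
        if chunk.maximum <= threshold:  # statistics rule this stripe out
            continue
        scanned += 1
        matches.extend(v for v in chunk.values if v > threshold)
    return matches, scanned


stripes = [
    {"id": ColumnChunk.from_values([1, 2, 3]),
     "price": ColumnChunk.from_values([10, 12, 11])},
    {"id": ColumnChunk.from_values([4, 5, 6]),
     "price": ColumnChunk.from_values([50, 55, 52])},
    {"id": ColumnChunk.from_values([7, 8, 9]),
     "price": ColumnChunk.from_values([20, 90, 15])},
]

matches, scanned = scan_greater_than(stripes, "price", 40)
print(matches, scanned)  # [50, 55, 52, 90] 2
```

In this run the first stripe is skipped using its stored maximum alone, and the `id` column is never read.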

The ORC project includes a Java reader and writer and a C++ reader and writer. The two libraries are completely independent of each other, and each can read all versions of ORC files. Bugs are tracked in Apache Jira, and releases are available from the Apache download page and from Maven Central.

To build ORC, you need Java 17 or higher, Maven 3.9.6 or higher, and cmake 3.12 or higher. The build instructions cover a release version with debug information, a debug version, a release version without debug information, and builds of only the Java library or only the C++ library.
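As a rough sketch of how these build variants differ, the snippet below assembles the cmake configure command for each one. `CMAKE_BUILD_TYPE` is a standard CMake variable; the `BUILD_JAVA` option name and the assumption that the default configuration is a release with debug information are recalled from the project's README and should be checked against the version you are building. The `cmake_configure` helper is hypothetical.

```python
from typing import Dict, List, Optional


def cmake_configure(
    build_type: Optional[str] = None, build_java: bool = True
) -> List[str]:
    """Return the argv for configuring ORC from inside a `build/` directory."""
    argv = ["cmake", ".."]
    if build_type is not None:
        argv.append(f"-DCMAKE_BUILD_TYPE={build_type}")
    if not build_java:
        # Assumed option name for skipping the Java build.
        argv.append("-DBUILD_JAVA=OFF")
    return argv


variants: Dict[str, List[str]] = {
    "release with debug information": cmake_configure(),
    "debug": cmake_configure("DEBUG"),
    "release without debug information": cmake_configure("RELEASE"),
    "C++ library only": cmake_configure(build_java=False),
}

for name, argv in variants.items():
    print(f"{name}: {' '.join(argv)}")
```

The Java-only build is left out because it goes through Maven rather than cmake.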

Because the format is type-aware and carries internal indexes, a reader can read, decompress, and process only the values required for the current query. ORC supports the complete set of Hive types, including complex types such as structs, lists, maps, and unions.
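To make the complex types concrete, here is a small sketch that composes a Hive-style type string of the kind ORC tooling generally uses to describe a schema (lists are written as `array<...>` and unions as `uniontype<...>`). The helper functions and the field names are hypothetical and exist only for illustration; nothing here calls the ORC libraries.

```python
def struct(**fields: str) -> str:
    """Render named fields as struct<name:type,...>."""
    body = ",".join(f"{name}:{type_}" for name, type_ in fields.items())
    return f"struct<{body}>"


def array(element: str) -> str:
    """Render a list type; the type string spells it `array`."""
    return f"array<{element}>"


def map_(key: str, value: str) -> str:
    """Render a map from a key type to a value type."""
    return f"map<{key},{value}>"


def uniontype(*alternatives: str) -> str:
    """Render a union of alternative types."""
    return f"uniontype<{','.join(alternatives)}>"


# A nested schema that uses all four complex types.
schema = struct(
    id="bigint",
    tags=array("string"),
    attributes=map_("string", "double"),
    payload=uniontype("int", "string"),
    address=struct(city="string", postcode="string"),
)
print(schema)
```

Because every helper returns a plain string, the types nest freely: a struct can hold a list of maps, a map can hold structs as values, and so on.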

The ORC project also offers optional AVX512 support, controlled at two levels: the BUILD_ENABLE_AVX512 cmake option at compile time and the ORC_USER_SIMD_LEVEL environment variable at run time. Together these enable SIMD optimization on hardware that supports it.
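The two switches might be wired together along these lines. The option and variable names come from the text above; the values `ON`/`OFF` and `AVX512` are assumptions recalled from the project's README, and both helper functions are hypothetical.

```python
import os
from typing import Dict, List, Mapping, Optional


def avx512_configure_argv(enabled: bool) -> List[str]:
    """Compile time: cmake argv with the AVX512 option switched on or off."""
    flag = "ON" if enabled else "OFF"  # assumed option values
    return ["cmake", "..", f"-DBUILD_ENABLE_AVX512={flag}"]


def runtime_env(
    simd_level: str, base: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Run time: a copy of the environment with the SIMD level set."""
    env = dict(os.environ if base is None else base)
    env["ORC_USER_SIMD_LEVEL"] = simd_level
    return env


print(" ".join(avx512_configure_argv(True)))
env = runtime_env("AVX512")  # assumed value for requesting AVX512
print(env["ORC_USER_SIMD_LEVEL"])
```

The resulting `env` can be passed to `subprocess.run(..., env=env)` when launching a program built against the library, which keeps the setting scoped to that one process.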
