Skip to main content
Version: dev

Row Format

This page covers the row-based serialization format for high-performance, cache-friendly data access.

Overview

Apache Fory™ Row Format is a binary format optimized for:

  • Random Access: Read any field without deserializing the entire object
  • Zero-copy: Direct memory access without data copying
  • Cache-Friendly: Contiguous memory layout for CPU cache efficiency
  • Columnar Conversion: Easy conversion to Apache Arrow format
  • Partial Serialization: Serialize only needed fields

When to Use Row Format

Use CaseRow FormatObject Graph
Analytics/OLAP
Random field access
Full object serialization
Complex object graphs
Reference tracking
Cross-language (simple types)

Quick Start

#include "fory/encoder/row_encoder.h"
#include "fory/row/writer.h"

using namespace fory::row;
using namespace fory::row::encoder;

struct Person {
int32_t id;
std::string name;
float score;
FORY_STRUCT(Person, id, name, score);
};

int main() {
// Create encoder
RowEncoder<Person> encoder;

// encode a person
Person person{1, "Alice", 95.5f};
encoder.encode(person);

// get the encoded row
auto row = encoder.get_writer().to_row();

// Random access to fields
int32_t id = row->get_int32(0);
std::string name = row->get_string(1);
float score = row->get_float(2);

assert(id == 1);
assert(name == "Alice");
assert(score == 95.5f);

return 0;
}

Row Encoder

Basic Usage

The RowEncoder<T> template class provides type-safe encoding:

#include "fory/encoder/row_encoder.h"

struct Point {
double x;
double y;
FORY_STRUCT(Point, x, y);
};

// Create encoder
RowEncoder<Point> encoder;

// Access schema (for inspection)
const Schema& schema = encoder.get_schema();
std::cout << "Fields: " << schema.field_names().size() << std::endl;

// encode value
Point p{1.0, 2.0};
encoder.encode(p);

// get result as Row
auto row = encoder.get_writer().to_row();

Nested Structs

struct Address {
std::string city;
std::string country;
FORY_STRUCT(Address, city, country);
};

struct Person {
std::string name;
Address address;
FORY_STRUCT(Person, name, address);
};

// encode nested struct
RowEncoder<Person> encoder;
Person person{"Alice", {"New York", "USA"}};
encoder.encode(person);

auto row = encoder.get_writer().to_row();
std::string name = row->get_string(0);

// Access nested struct
auto address_row = row->get_struct(1);
std::string city = address_row->get_string(0);
std::string country = address_row->get_string(1);

Arrays / Lists

struct Record {
std::vector<int32_t> values;
std::string label;
FORY_STRUCT(Record, values, label);
};

RowEncoder<Record> encoder;
Record record{{1, 2, 3, 4, 5}, "test"};
encoder.encode(record);

auto row = encoder.get_writer().to_row();
auto array = row->get_array(0);

int count = array->num_elements();
for (int i = 0; i < count; i++) {
int32_t value = array->get_int32(i);
}

Encoding Arrays Directly

// encode a vector directly (not inside a struct)
std::vector<Person> people{
{"Alice", {"NYC", "USA"}},
{"Bob", {"London", "UK"}}
};

RowEncoder<decltype(people)> encoder;
encoder.encode(people);

// get array data
auto array = encoder.get_writer().copy_to_array_data();
auto first_person = array->get_struct(0);
std::string first_name = first_person->get_string(0);

Row Data Access

Row Class

The Row class provides random access to struct fields:

class Row {
public:
// Null check
bool is_null_at(int i) const;

// Primitive getters
bool get_boolean(int i) const;
int8_t get_int8(int i) const;
int16_t get_int16(int i) const;
int32_t get_int32(int i) const;
int64_t get_int64(int i) const;
float get_float(int i) const;
double get_double(int i) const;

// String/binary getter
std::string get_string(int i) const;
std::vector<uint8_t> get_binary(int i) const;

// Nested types
std::shared_ptr<Row> get_struct(int i) const;
std::shared_ptr<ArrayData> get_array(int i) const;
std::shared_ptr<MapData> get_map(int i) const;

// Metadata
int num_fields() const;
SchemaPtr schema() const;

// Debug
std::string to_string() const;
};

ArrayData Class

The ArrayData class provides access to list/array elements:

class ArrayData {
public:
// Null check
bool is_null_at(int i) const;

// Element count
int num_elements() const;

// Primitive getters (same as Row)
int32_t get_int32(int i) const;
// ... other primitives

// String getter
std::string get_string(int i) const;

// Nested types
std::shared_ptr<Row> get_struct(int i) const;
std::shared_ptr<ArrayData> get_array(int i) const;
std::shared_ptr<MapData> get_map(int i) const;

// Type info
ListTypePtr type() const;
};

MapData Class

The MapData class provides access to map key-value pairs:

class MapData {
public:
// Element count
int num_elements();

// Access keys and values as arrays
std::shared_ptr<ArrayData> keys_array();
std::shared_ptr<ArrayData> values_array();

// Type info
MapTypePtr type();
};

Schema and Types

Schema Definition

Schemas define the structure of row data:

#include "fory/row/schema.h"

using namespace fory::row;

// Create schema programmatically
auto person_schema = schema({
field("id", int32()),
field("name", utf8()),
field("score", float32()),
field("active", boolean())
});

// Access schema info
for (const auto& f : person_schema->fields()) {
std::cout << f->name() << ": " << f->type()->name() << std::endl;
}

Type System

Available types for row format:

// Primitive types
DataTypePtr boolean(); // bool
DataTypePtr int8(); // int8_t
DataTypePtr int16(); // int16_t
DataTypePtr int32(); // int32_t
DataTypePtr int64(); // int64_t
DataTypePtr float32(); // float
DataTypePtr float64(); // double

// String and binary
DataTypePtr utf8(); // std::string
DataTypePtr binary(); // std::vector<uint8_t>

// Complex types
DataTypePtr list(DataTypePtr element_type);
DataTypePtr map(DataTypePtr key_type, DataTypePtr value_type);
DataTypePtr struct_(std::vector<FieldPtr> fields);

Type Inference

The RowEncodeTrait template automatically infers types:

// Type inference for primitives
RowEncodeTrait<int32_t>::Type(); // Returns int32()
RowEncodeTrait<float>::Type(); // Returns float32()
RowEncodeTrait<std::string>::Type(); // Returns utf8()

// Type inference for collections
RowEncodeTrait<std::vector<int32_t>>::Type(); // Returns list(int32())

// Type inference for maps
RowEncodeTrait<std::map<std::string, int32_t>>::Type();
// Returns map(utf8(), int32())

// Type inference for structs (requires FORY_STRUCT)
RowEncodeTrait<Person>::Type(); // Returns struct_({...})
RowEncodeTrait<Person>::Schema(); // Returns schema({...})

Row Writer

RowWriter

For manual row construction:

#include "fory/row/writer.h"

// Create schema
auto my_schema = schema({
field("x", int32()),
field("y", float64()),
field("name", utf8())
});

// Create writer
RowWriter writer(my_schema);
writer.reset();

// write fields
writer.write(0, 42); // x = 42
writer.write(1, 3.14); // y = 3.14
writer.write_string(2, "test"); // name = "test"

// get result
auto row = writer.to_row();

ArrayWriter

For manual array construction:

// Create array type
auto array_type = list(int32());

// Create writer
ArrayWriter writer(array_type);
writer.reset(5); // 5 elements

// write elements
for (int i = 0; i < 5; i++) {
writer.write(i, i * 10);
}

// get result
auto array = writer.copy_to_array_data();

Null Values

// Set null at specific index
writer.set_null_at(2); // Field 2 is null

// Check null when reading
if (!row->is_null_at(2)) {
std::string value = row->get_string(2);
}

Memory Layout

Row Layout

+------------------+--------------------+--------------------+
| Null Bitmap | Fixed-Size Data | Variable-Size Data |
+------------------+--------------------+--------------------+
| ceil(n/8) B | 8 * n bytes | variable |
+------------------+--------------------+--------------------+
  • Null Bitmap: One bit per field, indicates null values
  • Fixed-Size Data: 8 bytes per field (primitives stored directly, offset+size for variable)
  • Variable-Size Data: Strings, arrays, nested structs

Array Layout

+------------+------------------+--------------------+--------------------+
| Num Elems | Null Bitmap | Fixed-Size Data | Variable-Size Data |
+------------+------------------+--------------------+--------------------+
| 8 bytes | ceil(n/8) bytes | elem_size * n | variable |
+------------+------------------+--------------------+--------------------+

Map Layout

+------------------+------------------+
| Keys Array | Values Array |
+------------------+------------------+

Performance Tips

1. Reuse Encoders

RowEncoder<Person> encoder;

// encode multiple records
for (const auto& person : people) {
encoder.encode(person);
auto row = encoder.get_writer().to_row();
// Process row...
}

2. Pre-allocate Buffer

// get buffer reference for pre-allocation
auto& buffer = encoder.get_writer().buffer();
buffer->reserve(expected_size);

3. Batch Processing

// Process in batches for better cache utilization
std::vector<Person> batch;
batch.reserve(BATCH_SIZE);

while (has_more()) {
batch.clear();
fill_batch(batch);

for (const auto& person : batch) {
encoder.encode(person);
process(encoder.get_writer().to_row());
}
}

4. Zero-copy Reading

// Point to existing buffer (zero-copy)
Row row(schema);
row.point_to(buffer, offset, size);

// Access fields directly from buffer
int32_t id = row.get_int32(0);

Supported Types Summary

C++ TypeRow TypeFixed Size
boolboolean()1 byte
int8_tint8()1 byte
int16_tint16()2 bytes
int32_tint32()4 bytes
int64_tint64()8 bytes
floatfloat32()4 bytes
doublefloat64()8 bytes
std::stringutf8()Variable
std::vector<T>list(T)Variable
std::map<K,V>map(K,V)Variable
std::optional<T>Inner typeNullable
Struct (FORY_STRUCT)struct_({...})Variable