Version: 0.14

Row Format

This page covers the row-based serialization format for high-performance, cache-friendly data access.

Overview

Apache Fory™ Row Format is a binary format optimized for:

  • Random Access: Read any field without deserializing the entire object
  • Zero-Copy: Direct memory access without data copying
  • Cache-Friendly: Contiguous memory layout for CPU cache efficiency
  • Columnar Conversion: Easy conversion to Apache Arrow format
  • Partial Serialization: Serialize only needed fields

When to Use Row Format

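+-------------------------------+------------+--------------+
| Use Case                      | Row Format | Object Graph |
+-------------------------------+------------+--------------+
| Analytics/OLAP                | Yes        | No           |
| Random field access           | Yes        | No           |
| Full object serialization     | No         | Yes          |
| Complex object graphs         | No         | Yes          |
| Reference tracking            | No         | Yes          |
| Cross-language (simple types) | Yes        | Yes          |
+-------------------------------+------------+--------------+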

Quick Start

#include <cassert>
#include <cstdint>
#include <string>

#include "fory/encoder/row_encoder.h"
#include "fory/row/writer.h"

using namespace fory::row;
using namespace fory::row::encoder;

// Define a struct
struct Person {
  int32_t id;
  std::string name;
  float score;
};

// Register field metadata (required for row encoding)
FORY_FIELD_INFO(Person, id, name, score);

int main() {
  // Create encoder
  RowEncoder<Person> encoder;

  // Encode a person
  Person person{1, "Alice", 95.5f};
  encoder.Encode(person);

  // Get the encoded row
  auto row = encoder.GetWriter().ToRow();

  // Random access to fields by index (declaration order)
  int32_t id = row->GetInt32(0);
  std::string name = row->GetString(1);
  float score = row->GetFloat(2);

  assert(id == 1);
  assert(name == "Alice");
  assert(score == 95.5f);  // 95.5 is exactly representable as a float

  return 0;
}

Row Encoder

Basic Usage

The RowEncoder<T> template class provides type-safe encoding:

#include <iostream>

#include "fory/encoder/row_encoder.h"

// Define struct with FORY_FIELD_INFO
struct Point {
  double x;
  double y;
};
FORY_FIELD_INFO(Point, x, y);

// Create encoder
RowEncoder<Point> encoder;

// Access schema (for inspection)
const Schema& schema = encoder.GetSchema();
std::cout << "Fields: " << schema.field_names().size() << std::endl;

// Encode value
Point p{1.0, 2.0};
encoder.Encode(p);

// Get result as Row
auto row = encoder.GetWriter().ToRow();

Nested Structs

struct Address {
  std::string city;
  std::string country;
};
FORY_FIELD_INFO(Address, city, country);

struct Person {
  std::string name;
  Address address;
};
FORY_FIELD_INFO(Person, name, address);

// Encode nested struct
RowEncoder<Person> encoder;
Person person{"Alice", {"New York", "USA"}};
encoder.Encode(person);

auto row = encoder.GetWriter().ToRow();
std::string name = row->GetString(0);

// Access nested struct
auto address_row = row->GetStruct(1);
std::string city = address_row->GetString(0);
std::string country = address_row->GetString(1);

Arrays / Lists

struct Record {
  std::vector<int32_t> values;
  std::string label;
};
FORY_FIELD_INFO(Record, values, label);

RowEncoder<Record> encoder;
Record record{{1, 2, 3, 4, 5}, "test"};
encoder.Encode(record);

auto row = encoder.GetWriter().ToRow();
auto array = row->GetArray(0);

int count = array->num_elements();
for (int i = 0; i < count; i++) {
  int32_t value = array->GetInt32(i);
}

Encoding Arrays Directly

// Encode a vector directly (not inside a struct)
std::vector<Person> people{
  {"Alice", {"NYC", "USA"}},
  {"Bob", {"London", "UK"}}
};

RowEncoder<decltype(people)> encoder;
encoder.Encode(people);

// Get array data
auto array = encoder.GetWriter().CopyToArrayData();
auto first_person = array->GetStruct(0);
std::string first_name = first_person->GetString(0);

Row Data Access

Row Class

The Row class provides random access to struct fields:

class Row {
 public:
  // Null check
  bool IsNullAt(int i) const;

  // Primitive getters
  bool GetBoolean(int i) const;
  int8_t GetInt8(int i) const;
  int16_t GetInt16(int i) const;
  int32_t GetInt32(int i) const;
  int64_t GetInt64(int i) const;
  float GetFloat(int i) const;
  double GetDouble(int i) const;

  // String/binary getters
  std::string GetString(int i) const;
  std::vector<uint8_t> GetBinary(int i) const;

  // Nested types
  std::shared_ptr<Row> GetStruct(int i) const;
  std::shared_ptr<ArrayData> GetArray(int i) const;
  std::shared_ptr<MapData> GetMap(int i) const;

  // Metadata
  int num_fields() const;
  SchemaPtr schema() const;

  // Debug
  std::string ToString() const;
};

ArrayData Class

The ArrayData class provides access to list/array elements:

class ArrayData {
 public:
  // Null check
  bool IsNullAt(int i) const;

  // Element count
  int num_elements() const;

  // Primitive getters (same as Row)
  int32_t GetInt32(int i) const;
  // ... other primitives

  // String getter
  std::string GetString(int i) const;

  // Nested types
  std::shared_ptr<Row> GetStruct(int i) const;
  std::shared_ptr<ArrayData> GetArray(int i) const;
  std::shared_ptr<MapData> GetMap(int i) const;

  // Type info
  ListTypePtr type() const;
};

MapData Class

The MapData class provides access to map key-value pairs:

class MapData {
 public:
  // Element count
  int num_elements();

  // Access keys and values as arrays
  std::shared_ptr<ArrayData> keys_array();
  std::shared_ptr<ArrayData> values_array();

  // Type info
  MapTypePtr type();
};
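
Because MapData exposes its keys and values as two arrays, a lookup by key amounts to a linear scan over the keys. The sketch below illustrates that pattern. It is written as a template so it compiles without the Fory headers; `FindInt32`, `FakeArray`, and `FakeMap` are hypothetical names that only mirror the accessor names listed above, and the pairing of the i-th key with the i-th value is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: find the int32 value stored under a string key.
// Assumption: keys_array()[i] pairs with values_array()[i].
template <typename MapLike>
std::optional<int32_t> FindInt32(MapLike& map, const std::string& key) {
  auto keys = map.keys_array();
  auto values = map.values_array();
  const int n = map.num_elements();
  for (int i = 0; i < n; ++i) {
    if (keys->GetString(i) == key) {
      if (values->IsNullAt(i)) return std::nullopt;  // key present, value null
      return values->GetInt32(i);
    }
  }
  return std::nullopt;  // key not present
}

// Stand-ins so the sketch is self-contained; these are NOT Fory classes.
struct FakeArray {
  std::vector<std::string> strings;
  std::vector<int32_t> ints;
  bool IsNullAt(int) const { return false; }
  std::string GetString(int i) const {
    assert(i >= 0 && i < static_cast<int>(strings.size()));
    return strings[i];
  }
  int32_t GetInt32(int i) const {
    assert(i >= 0 && i < static_cast<int>(ints.size()));
    return ints[i];
  }
};

struct FakeMap {
  std::shared_ptr<FakeArray> keys = std::make_shared<FakeArray>();
  std::shared_ptr<FakeArray> values = std::make_shared<FakeArray>();
  int num_elements() { return static_cast<int>(keys->strings.size()); }
  std::shared_ptr<FakeArray> keys_array() { return keys; }
  std::shared_ptr<FakeArray> values_array() { return values; }
};
```

With a real row, the same helper would presumably be called as `FindInt32(*row->GetMap(i), "key")`. The scan is O(n) per lookup, so for repeated lookups on a large map it may be worth copying the entries into a `std::unordered_map` once.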

Schema and Types

Schema Definition

Schemas define the structure of row data:

#include <iostream>

#include "fory/row/schema.h"

using namespace fory::row;

// Create schema programmatically
auto person_schema = schema({
  field("id", int32()),
  field("name", utf8()),
  field("score", float32()),
  field("active", boolean())
});

// Access schema info
for (const auto& f : person_schema->fields()) {
  std::cout << f->name() << ": " << f->type()->name() << std::endl;
}

Type System

Available types for row format:

// Primitive types
DataTypePtr boolean(); // bool
DataTypePtr int8(); // int8_t
DataTypePtr int16(); // int16_t
DataTypePtr int32(); // int32_t
DataTypePtr int64(); // int64_t
DataTypePtr float32(); // float
DataTypePtr float64(); // double

// String and binary
DataTypePtr utf8(); // std::string
DataTypePtr binary(); // std::vector<uint8_t>

// Complex types
DataTypePtr list(DataTypePtr element_type);
DataTypePtr map(DataTypePtr key_type, DataTypePtr value_type);
DataTypePtr struct_(std::vector<FieldPtr> fields);

Type Inference

The RowEncodeTrait template automatically infers types:

// Type inference for primitives
RowEncodeTrait<int32_t>::Type(); // Returns int32()
RowEncodeTrait<float>::Type(); // Returns float32()
RowEncodeTrait<std::string>::Type(); // Returns utf8()

// Type inference for collections
RowEncodeTrait<std::vector<int32_t>>::Type(); // Returns list(int32())

// Type inference for maps
RowEncodeTrait<std::map<std::string, int32_t>>::Type();
// Returns map(utf8(), int32())

// Type inference for structs (requires FORY_FIELD_INFO)
RowEncodeTrait<Person>::Type(); // Returns struct_({...})
RowEncodeTrait<Person>::Schema(); // Returns schema({...})
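
Compile-time inference of this kind is usually built from template specializations that recurse into element types. The following is a simplified sketch of the technique, not Fory's implementation: `TypeNameTrait` is a hypothetical trait that returns a type name string instead of a `DataTypePtr`.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Primary template is declared but not defined, so an unsupported type
// produces a compile-time error instead of a silent fallback.
template <typename T>
struct TypeNameTrait;

// Primitive and string types: one full specialization each.
template <> struct TypeNameTrait<bool> {
  static std::string Name() { return "boolean"; }
};
template <> struct TypeNameTrait<int32_t> {
  static std::string Name() { return "int32"; }
};
template <> struct TypeNameTrait<float> {
  static std::string Name() { return "float32"; }
};
template <> struct TypeNameTrait<double> {
  static std::string Name() { return "float64"; }
};
template <> struct TypeNameTrait<std::string> {
  static std::string Name() { return "utf8"; }
};

// Collections: partial specializations that recurse into the element type.
template <typename T>
struct TypeNameTrait<std::vector<T>> {
  static std::string Name() {
    return "list<" + TypeNameTrait<T>::Name() + ">";
  }
};

template <typename K, typename V>
struct TypeNameTrait<std::map<K, V>> {
  static std::string Name() {
    return "map<" + TypeNameTrait<K>::Name() + ", " +
           TypeNameTrait<V>::Name() + ">";
  }
};
```

Nesting works without extra code: `TypeNameTrait<std::vector<std::vector<float>>>::Name()` resolves through the vector specialization twice before reaching the `float` case.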

Row Writer

RowWriter

For manual row construction:

#include "fory/row/writer.h"

// Create schema
auto my_schema = schema({
  field("x", int32()),
  field("y", float64()),
  field("name", utf8())
});

// Create writer
RowWriter writer(my_schema);
writer.Reset();

// Write fields
writer.Write(0, 42); // x = 42
writer.Write(1, 3.14); // y = 3.14
writer.WriteString(2, "test"); // name = "test"

// Get result
auto row = writer.ToRow();

ArrayWriter

For manual array construction:

// Create array type
auto array_type = list(int32());

// Create writer
ArrayWriter writer(array_type);
writer.Reset(5); // 5 elements

// Write elements
for (int i = 0; i < 5; i++) {
  writer.Write(i, i * 10);
}

// Get result
auto array = writer.CopyToArrayData();

Null Values

// Set null at specific index
writer.SetNullAt(2); // Field 2 is null

// Check null when reading
if (!row->IsNullAt(2)) {
  std::string value = row->GetString(2);
}
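
The check-then-read pattern can be folded into a small helper that returns `std::optional`, so a caller cannot forget the null check. This is a minimal sketch: `GetStringOrNull` is a hypothetical helper, and `FakeRow` is a stand-in that only mirrors the `IsNullAt`/`GetString` accessor names so the example compiles without the Fory headers.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: returns std::nullopt for a null field, otherwise the
// field's string value. Works with any type exposing IsNullAt/GetString.
template <typename RowLike>
std::optional<std::string> GetStringOrNull(const RowLike& row, int i) {
  if (row.IsNullAt(i)) return std::nullopt;
  return row.GetString(i);
}

// Stand-in for a row; NOT a Fory class.
struct FakeRow {
  std::vector<std::optional<std::string>> fields;

  bool IsNullAt(int i) const {
    assert(i >= 0 && i < static_cast<int>(fields.size()));
    return !fields[i].has_value();
  }
  std::string GetString(int i) const {
    assert(!IsNullAt(i));  // reading a null field is a caller error here
    return *fields[i];
  }
};
```

With a real row the call would presumably look like `GetStringOrNull(*row, 2)`, and `value_or("")` on the result supplies a default for null fields.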

Memory Layout

Row Layout

+------------------+--------------------+--------------------+
| Null Bitmap      | Fixed-Size Data    | Variable-Size Data |
+------------------+--------------------+--------------------+
| ceil(n/8) bytes  | 8 * n bytes        | variable           |
+------------------+--------------------+--------------------+
  • Null Bitmap: One bit per field, indicates null values
  • Fixed-Size Data: 8 bytes per field (primitives stored directly, offset+size for variable)
  • Variable-Size Data: Strings, arrays, nested structs
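
The offsets implied by this layout can be worked out with a few lines of arithmetic. The sketch below follows the table literally for a row with n fields. All function names are hypothetical, and the bit order inside the bitmap (least significant bit first) is an assumption; a real implementation may also pad the bitmap for alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Bytes used by the null bitmap for n fields: ceil(n / 8).
inline std::size_t BitmapBytes(std::size_t num_fields) {
  return (num_fields + 7) / 8;
}

// Byte offset of field i's 8-byte slot, measured from the start of the row.
inline std::size_t FieldOffset(std::size_t num_fields, std::size_t i) {
  assert(i < num_fields);
  return BitmapBytes(num_fields) + 8 * i;
}

// Size of bitmap plus fixed-size region; variable-size data would follow.
inline std::size_t FixedRegionBytes(std::size_t num_fields) {
  return BitmapBytes(num_fields) + 8 * num_fields;
}

// Assumption: field i maps to bit (i % 8) of bitmap byte (i / 8).
inline void SetNull(std::vector<uint8_t>& row, std::size_t i) {
  row[i / 8] = static_cast<uint8_t>(row[i / 8] | (1u << (i % 8)));
}

inline bool IsNull(const std::vector<uint8_t>& row, std::size_t i) {
  return ((row[i / 8] >> (i % 8)) & 1u) != 0;
}

// memcpy is used because a slot is not guaranteed to be aligned for int32_t.
inline void WriteInt32(std::vector<uint8_t>& row, std::size_t num_fields,
                       std::size_t i, int32_t value) {
  std::memcpy(row.data() + FieldOffset(num_fields, i), &value, sizeof(value));
}

inline int32_t ReadInt32(const std::vector<uint8_t>& row,
                         std::size_t num_fields, std::size_t i) {
  int32_t value;
  std::memcpy(&value, row.data() + FieldOffset(num_fields, i), sizeof(value));
  return value;
}
```

For a three-field row this gives a 1-byte bitmap, slots at offsets 1, 9, and 17, and 25 bytes before any variable-size data. Reading field 2 touches only its own slot, which is what makes random access cheap.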

Array Layout

+------------+------------------+--------------------+--------------------+
| Num Elems  | Null Bitmap      | Fixed-Size Data    | Variable-Size Data |
+------------+------------------+--------------------+--------------------+
| 8 bytes    | ceil(n/8) bytes  | elem_size * n      | variable           |
+------------+------------------+--------------------+--------------------+
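
As a worked example of the array layout, the sketch below computes where each fixed-size element sits. `ArrayLayout` is a hypothetical helper that applies the table literally; it does not account for any alignment padding a real implementation may add.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper describing an array of fixed-size elements.
struct ArrayLayout {
  std::size_t num_elements;
  std::size_t elem_size;  // bytes per element, e.g. 4 for int32

  // The element count is stored in the first 8 bytes.
  std::size_t HeaderBytes() const { return 8; }

  // One null bit per element: ceil(n / 8) bytes.
  std::size_t BitmapBytes() const { return (num_elements + 7) / 8; }

  // Offset of element i from the start of the array.
  std::size_t ElementOffset(std::size_t i) const {
    assert(i < num_elements);
    return HeaderBytes() + BitmapBytes() + elem_size * i;
  }

  // Bytes before the variable-size region begins.
  std::size_t FixedBytes() const {
    return HeaderBytes() + BitmapBytes() + elem_size * num_elements;
  }
};
```

For five int32 elements this yields an 8-byte count, a 1-byte bitmap, elements at offsets 9, 13, 17, 21, and 25, and 29 bytes in total. Unlike a row, which uses 8 bytes per field, the array packs elements at their natural size.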

Map Layout

+------------------+------------------+
| Keys Array       | Values Array     |
+------------------+------------------+
Performance Tips

1. Reuse Encoders

RowEncoder<Person> encoder;

// Encode multiple records
for (const auto& person : people) {
  encoder.Encode(person);
  auto row = encoder.GetWriter().ToRow();
  // Process row...
}

2. Pre-allocate Buffer

// Get buffer reference for pre-allocation
auto& buffer = encoder.GetWriter().buffer();
buffer->Reserve(expected_size);
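
To see why reserving up front helps, the sketch below counts how often a growable buffer has to grow while rows are appended. It uses `std::vector<uint8_t>` as a stand-in for the writer's buffer, and `CountReallocations` is a hypothetical helper; how Fory's own buffer grows is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends `count` zero-filled rows of `row_bytes` bytes each and returns how
// many times the buffer's capacity changed (each change is a reallocation).
inline int CountReallocations(std::vector<uint8_t>& buffer, std::size_t count,
                              std::size_t row_bytes) {
  assert(row_bytes > 0);
  int reallocations = 0;
  for (std::size_t i = 0; i < count; ++i) {
    const std::size_t before = buffer.capacity();
    buffer.insert(buffer.end(), row_bytes, uint8_t{0});
    if (buffer.capacity() != before) ++reallocations;
  }
  return reallocations;
}
```

A vector that was given `reserve(count * row_bytes)` beforehand reports zero reallocations, since the standard guarantees no reallocation while the size stays within the reserved capacity. Without the reserve, the count depends on the library's growth policy.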

3. Batch Processing

// Process in batches for better cache utilization
std::vector<Person> batch;
batch.reserve(BATCH_SIZE);

// hasMore(), fillBatch(), and process() stand for application code
while (hasMore()) {
  batch.clear();  // keeps the reserved capacity
  fillBatch(batch);

  for (const auto& person : batch) {
    encoder.Encode(person);
    process(encoder.GetWriter().ToRow());
  }
}

4. Zero-Copy Reading

// Point to existing buffer (zero-copy)
Row row(schema);
row.PointTo(buffer, offset, size);

// Access fields directly from buffer
int32_t id = row.GetInt32(0);

Supported Types Summary

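+--------------------------+----------------+------------+
| C++ Type                 | Row Type       | Fixed Size |
+--------------------------+----------------+------------+
| bool                     | boolean()      | 1 byte     |
| int8_t                   | int8()         | 1 byte     |
| int16_t                  | int16()        | 2 bytes    |
| int32_t                  | int32()        | 4 bytes    |
| int64_t                  | int64()        | 8 bytes    |
| float                    | float32()      | 4 bytes    |
| double                   | float64()      | 8 bytes    |
| std::string              | utf8()         | Variable   |
| std::vector<T>           | list(T)        | Variable   |
| std::map<K,V>            | map(K,V)       | Variable   |
| std::optional<T>         | Inner type     | Nullable   |
| Struct (FORY_FIELD_INFO) | struct_({...}) | Variable   |
+--------------------------+----------------+------------+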