Version: 1.0.0

Python Serialization Guide

Apache Fory™ is a blazing fast multi-language serialization framework powered by JIT compilation and zero-copy techniques, delivering ultra-fast performance while maintaining ease of use and safety.

pyfory provides the Python implementation of Apache Fory™, offering xlang mode for cross-language payloads, native mode for Python-only object serialization, and advanced row-format capabilities for data processing tasks.

Key Features

Flexible Serialization Modes

  • Xlang mode: Default cross-language wire format with compatible schema evolution
  • Python native mode: Same-language mode and drop-in replacement for pickle/cloudpickle
  • Row Format: Zero-copy row format for analytics workloads
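The row format bullet is easiest to picture with a fixed-width layout: every field sits at a known byte offset, so a reader can pull one field straight out of the buffer without decoding the rest of the record. The sketch below shows only that general idea using the standard-library struct module; the layout, write_rows, and read_field are hypothetical and are not Fory's actual row format.

```python
import struct

# Hypothetical fixed-width row layout (NOT Fory's row format):
# int64 id | float64 score | int32 age, little-endian, no padding.
ROW = struct.Struct("<qdi")
FIELD_OFFSETS = {"id": 0, "score": 8, "age": 16}
FIELD_FORMATS = {"id": "<q", "score": "<d", "age": "<i"}


def write_rows(rows):
    """Pack (id, score, age) tuples back to back into one bytearray."""
    buf = bytearray(ROW.size * len(rows))
    for i, row in enumerate(rows):
        ROW.pack_into(buf, i * ROW.size, *row)
    return buf


def read_field(buf, row_index, field):
    """Read a single field in place, without decoding the rest of the row."""
    offset = row_index * ROW.size + FIELD_OFFSETS[field]
    return struct.unpack_from(FIELD_FORMATS[field], buf, offset)[0]


buf = write_rows([(1, 0.5, 30), (2, 1.25, 41)])
print(read_field(buf, 1, "age"))
```

Because read_field only computes an offset and reads those bytes, it works equally on a bytearray or a memoryview over a larger buffer, which is the property a zero-copy row format relies on.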

Versatile Serialization Features

  • Reference tracking for shared xlang schema objects and Python native-mode circular graphs
  • Polymorphism support for customized types with automatic type dispatching
  • Schema evolution support for backward/forward compatibility when using dataclasses in xlang mode
  • Out-of-band buffer support for zero-copy serialization of large data structures like NumPy arrays and Pandas DataFrames, compatible with pickle protocol 5
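Reference tracking (ref=True in the examples on this page) is what allows shared objects and circular graphs to survive a round trip. As a rough illustration of the general technique, and not of pyfory's wire format, the hypothetical encode/decode pair below assigns each list an index the first time it is seen and writes a back-reference on every later visit.

```python
from typing import Any, Dict, List, Optional


def encode(obj: Any, seen: Optional[Dict[int, int]] = None) -> Any:
    """Encode nested lists as tagged tuples, replacing repeats with back-references."""
    if seen is None:
        seen = {}
    if isinstance(obj, list):
        if id(obj) in seen:
            # Already written once: emit a reference to its index instead of recursing.
            return ("ref", seen[id(obj)])
        # Assign the index BEFORE visiting children so a cycle back to obj resolves.
        seen[id(obj)] = len(seen)
        return ("list", [encode(item, seen) for item in obj])
    return ("val", obj)


def decode(node: Any, table: Optional[List[list]] = None) -> Any:
    """Rebuild the graph; indices are assigned in the same pre-order as encode()."""
    if table is None:
        table = []
    tag, payload = node
    if tag == "ref":
        return table[payload]
    if tag == "list":
        result: list = []
        table.append(result)  # register before children, mirroring encode()
        for child in payload:
            result.append(decode(child, table))
        return result
    return payload


cyclic = [1, 2]
cyclic.append(cyclic)  # the list now contains itself
restored = decode(encode(cyclic))
print(restored[2] is restored)
```

Assigning the index before visiting the children is the key step: without it, a list that contains itself would recurse forever during encoding.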

Blazing Fast Performance

  • Extremely fast performance compared to other serialization frameworks
  • Runtime code generation and Cython-accelerated core implementation for optimal performance

Compact Data Size

  • Compact object graph protocol with minimal space overhead: up to 3× size reduction compared to pickle/cloudpickle
  • Meta packing and sharing to minimize the space overhead of type metadata needed for forward/backward compatibility

Security & Safety

  • Strict mode prevents deserialization of untrusted types through type registration and type checks
  • Reference tracking for handling circular references safely
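Type registration works as an allow-list: anything not registered is refused at load time. Python's own pickle exposes the same idea through Unpickler.find_class, which the sketch below overrides. This is a standard-library illustration of the concept, not pyfory's strict-mode code; REGISTERED, RegisteredOnlyUnpickler, and strict_loads are hypothetical names.

```python
import collections
import datetime
import io
import pickle

# Hypothetical allow-list playing the role of "registered types".
REGISTERED = {("datetime", "date")}


class RegisteredOnlyUnpickler(pickle.Unpickler):
    """Refuse to load any class that was not registered up front."""

    def find_class(self, module, name):
        if (module, name) in REGISTERED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"{module}.{name} is not registered")


def strict_loads(data: bytes):
    return RegisteredOnlyUnpickler(io.BytesIO(data)).load()


print(strict_loads(pickle.dumps(datetime.date(2024, 1, 1))))  # 2024-01-01

try:
    strict_loads(pickle.dumps(collections.Counter("ab")))
except pickle.UnpicklingError as exc:
    print("rejected:", exc)
```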

Installation

Basic Installation

pip install pyfory

Optional Dependencies

# Install with row format support (requires Apache Arrow)
pip install "pyfory[format]"

# Install from source for development
git clone https://github.com/apache/fory.git
cd fory/python
pip install -e ".[dev,format]"

Requirements

  • Python: 3.8 or higher
  • OS: Linux, macOS, Windows

Thread Safety

pyfory provides ThreadSafeFory for thread-safe serialization using a pooled wrapper:

import pyfory
import threading
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

# Create a thread-safe xlang Fory instance
fory = pyfory.ThreadSafeFory(xlang=True, ref=True)
fory.register(Person)

# Use in multiple threads safely
def serialize_in_thread(thread_id):
    person = Person(name=f"User{thread_id}", age=25 + thread_id)
    data = fory.serialize(person)
    result = fory.deserialize(data)
    print(f"Thread {thread_id}: {result}")

threads = [threading.Thread(target=serialize_in_thread, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

Key Features:

  • Instance Pool: Maintains a pool of Fory instances protected by a lock for thread safety
  • Shared Configuration: All registrations must be done upfront and are applied to all instances
  • Same API: Drop-in replacement for Fory class with identical methods
  • Registration Safety: Prevents registration after first use to ensure consistency
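The Instance Pool and Registration Safety bullets describe a common pattern for sharing non-thread-safe objects across threads. The sketch below is a minimal version of that pattern under stated assumptions: PooledSerializer and PickleStandIn are hypothetical names, the stand-in uses pickle instead of Fory, and this is not ThreadSafeFory's actual implementation.

```python
import pickle
import threading
from typing import Any, Callable, List


class PickleStandIn:
    """Hypothetical stand-in for one non-thread-safe serializer instance."""

    def __init__(self) -> None:
        self.registered: List[type] = []

    def register(self, cls: type) -> None:
        self.registered.append(cls)

    def serialize(self, obj: Any) -> bytes:
        return pickle.dumps(obj)

    def deserialize(self, data: bytes) -> Any:
        return pickle.loads(data)


class PooledSerializer:
    """Hypothetical lock-protected pool of instances (not pyfory code)."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._idle: List[Any] = []  # instances not currently in use
        self._registered: List[type] = []
        self._lock = threading.Lock()
        self._used = False

    def register(self, cls: type) -> None:
        with self._lock:
            if self._used:
                # Registration safety: every instance must see the same types.
                raise RuntimeError("register() must be called before first use")
            self._registered.append(cls)

    def _acquire(self) -> Any:
        with self._lock:
            self._used = True  # freezes the registration list from now on
            if self._idle:
                return self._idle.pop()
        # Pool is empty: build a fresh instance and replay every registration.
        inst = self._factory()
        for cls in self._registered:
            inst.register(cls)
        return inst

    def _release(self, inst: Any) -> None:
        with self._lock:
            self._idle.append(inst)

    def serialize(self, obj: Any) -> bytes:
        inst = self._acquire()
        try:
            return inst.serialize(obj)
        finally:
            self._release(inst)

    def deserialize(self, data: bytes) -> Any:
        inst = self._acquire()
        try:
            return inst.deserialize(data)
        finally:
            self._release(inst)
```

Each instance is handed to one thread at a time, so the instances themselves need no internal locking; only the idle list and the registration flag are guarded by the lock.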

When to Use:

  • Multi-threaded Applications: Web servers, concurrent workers, parallel processing
  • Shared Fory Instances: When multiple threads need to serialize/deserialize data
  • Thread Pools: Applications using thread pools or concurrent.futures

Quick Start

import pyfory
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

# Create an xlang Fory instance
fory = pyfory.Fory(xlang=True, ref=True)
fory.register(Person)

person = Person("Alice", 30)
data = fory.serialize(person)
result = fory.deserialize(data)
print(result) # Person(name='Alice', age=30)

Xlang Mode And Native Mode

Use xlang mode for cross-language payloads and for dataclass schemas shared with other Fory runtimes. Xlang mode is the default Python wire mode; the Python examples in this guide still pass xlang=True explicitly so the mode choice is visible.

Use native mode for Python-only traffic. Native mode is selected with xlang=False and uses schema-consistent payloads unless compatible mode is enabled. It handles pickle/cloudpickle-style behavior such as functions, lambdas, classes, methods, __reduce__, __getstate__, and out-of-band pickle protocol 5 buffers. Because it is optimized for Python's type system and supports a broader range of Python objects than xlang mode, use it when replacing pickle or cloudpickle.
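Out-of-band buffers are a pickle protocol 5 feature: large buffers are handed to a callback instead of being copied into the byte stream, and are supplied again at load time. The snippet below demonstrates the mechanism with the standard-library pickle module only (a bytearray stands in for a NumPy or Pandas buffer); it does not show pyfory's own API for out-of-band buffers.

```python
import pickle

# Stand-in for a large array buffer.
payload = bytearray(b"x" * 1_000_000)

buffers = []
# list.append returns None (a false value), which tells pickle to keep each
# buffer out-of-band instead of copying its bytes into the stream.
data = pickle.dumps(
    pickle.PickleBuffer(payload), protocol=5, buffer_callback=buffers.append
)

# The in-band stream stays tiny; the megabyte travels separately in `buffers`.
print(len(data), len(buffers))

# At load time the same buffers are supplied back, in the same order.
restored = pickle.loads(data, buffers=buffers)
print(bytes(restored) == bytes(payload))
```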

See Native Serialization for Python-only serialization details and Xlang Serialization for Python xlang registration and interoperability rules.

Next Steps