Xlang Serialization Format
Cross-language Serialization Specification
Apache Fory™ xlang serialization enables automatic cross-language object serialization with support for shared references, circular references, and polymorphism. Unlike traditional serialization frameworks that require IDL definitions and schema compilation, Fory serializes objects directly without any intermediate steps.
Key characteristics:
- Automatic: No IDL definition, no schema compilation, no manual object-to-protocol conversion
- Cross-language: Same binary format works seamlessly across Java, Python, C++, Rust, Go, JavaScript, and more
- Reference-aware: Handles shared references and circular references without duplication or infinite recursion
- Polymorphic: Supports object polymorphism with runtime type resolution
This specification defines the Fory xlang binary format. The format is dynamic rather than static, which enables flexibility and ease of use at the cost of additional complexity in the wire format.
Type Systems
Data Types
- bool: a boolean value (true or false).
- int8: an 8-bit signed integer.
- int16: a 16-bit signed integer.
- int32: a 32-bit signed integer.
- var32: a 32-bit signed integer which uses Fory variable-length encoding.
- int64: a 64-bit signed integer.
- var64: a 64-bit signed integer which uses Fory PVL encoding.
- hybrid64: a 64-bit signed integer which uses Fory Hybrid encoding.
- uint8: an 8-bit unsigned integer.
- uint16: a 16-bit unsigned integer.
- uint32: a 32-bit unsigned integer.
- varu32: a 32-bit unsigned integer which uses Fory variable-length encoding.
- uint64: a 64-bit unsigned integer.
- varu64: a 64-bit unsigned integer which uses Fory PVL encoding.
- hybridu64: a 64-bit unsigned integer which uses Fory Hybrid encoding.
- float16: a 16-bit floating point number.
- float32: a 32-bit floating point number.
- float64: a 64-bit floating point number including NaN and Infinity.
- string: a text string encoded using Latin1/UTF16/UTF-8 encoding.
- enum: a data type consisting of a set of named values. Rust enums with non-predefined field values are not supported as enums.
- named_enum: an enum whose value will be serialized as the registered name.
- struct: a dynamic(final) type serialized by the Fory Struct serializer, i.e. it doesn't have subclasses. Suppose we're deserializing List<SomeClass>; we can save dynamic serializer dispatch since SomeClass is dynamic(final).
- compatible_struct: a dynamic(final) type serialized by the Fory compatible Struct serializer.
- named_struct: a struct whose type mapping will be encoded as a name.
- named_compatible_struct: a compatible_struct whose type mapping will be encoded as a name.
- ext: a type which will be serialized by a customized serializer.
- named_ext: an ext type whose type mapping will be encoded as a name.
- list: a sequence of objects.
- set: an unordered set of unique elements.
- map: a map of key-value pairs. Mutable types such as list/map/set/array are not allowed as map keys.
- duration: an absolute length of time, independent of any calendar/timezone, as a count of nanoseconds.
- timestamp: a point in time, independent of any calendar/timezone, as a count of nanoseconds. The count is relative to an epoch at UTC midnight on January 1, 1970.
- local_date: a naive date without timezone. The count is days relative to an epoch at UTC midnight on Jan 1, 1970.
- decimal: exact decimal value represented as an integer value in two's complement.
- binary: a variable-length array of bytes.
- array: only 1D numeric components are allowed. Other arrays will be treated as List. Implementations should support interoperability between array and list.
- bool_array: one dimensional bool array.
- int8_array: one dimensional int8 array.
- int16_array: one dimensional int16 array.
- int32_array: one dimensional int32 array.
- int64_array: one dimensional int64 array.
- float16_array: one dimensional float16 array.
- float32_array: one dimensional float32 array.
- float64_array: one dimensional float64 array.
- union: a tagged union type that can hold one of several alternative types. The active alternative is identified by an index.
- none: represents an empty/unit value with no data (e.g., for empty union alternatives).
Note:
- Unsigned integer types use the same byte sizes as their signed counterparts; the difference is in value interpretation. See Type mapping for language-specific type mappings.
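The logical values behind duration, timestamp, and local_date can be illustrated with the standard java.time API. The following is only a sketch of the counts described in the list above: TimeCounts and its methods are hypothetical names, not part of Fory, and the sketch says nothing about how these counts are laid out on the wire.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Hypothetical helper (not a Fory API): computes the logical counts only.
final class TimeCounts {
    // timestamp: nanoseconds relative to 1970-01-01T00:00:00Z.
    // Throws ArithmeticException if the count does not fit in a long.
    static long timestampNanos(Instant instant) {
        return ChronoUnit.NANOS.between(Instant.EPOCH, instant);
    }

    // local_date: days relative to 1970-01-01.
    static long localDateDays(LocalDate date) {
        return date.toEpochDay();
    }

    // duration: a length of time as a count of nanoseconds.
    static long durationNanos(Duration duration) {
        return duration.toNanos();
    }

    public static void main(String[] args) {
        System.out.println(timestampNanos(Instant.ofEpochSecond(1, 5))); // 1000000005
        System.out.println(localDateDays(LocalDate.of(1970, 1, 2)));     // 1
        System.out.println(durationNanos(Duration.ofMillis(2)));         // 2000000
    }
}
```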
Polymorphisms
For polymorphism, if a non-final class is registered and only one subclass is registered, then we can assume that all elements in a List/Map have the same type, which reduces the cost of runtime type checks.
Collection/Array polymorphism is not fully supported, since some languages such as Go have only one collection type. If users want to get back exactly the type they passed, they must pass that type when deserializing or annotate the struct field with that type.
Type disambiguation
Due to differences between the type systems of languages, types can't be mapped one-to-one between languages. When deserializing, Fory uses the target data structure type and the data type in the data jointly to determine how to deserialize and populate the target data structure. For example:
class Foo {
    int[] intArray;
    Object[] objects;
    List<Object> objectList;
}
class Foo2 {
    int[] intArray;
    List<Object> objects;
    List<Object> objectList;
}
intArray has an int32_array type. But both the objects and objectList fields in the serialized data have the list data type. When deserializing, the implementation will create an Object array for objects, but create an ArrayList for objectList to populate its elements. The serialized data of Foo can also be deserialized into Foo2.
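The way the target field type decides how list data is materialized can be sketched as follows. This is a simplified illustration and not Fory's implementation: ListMaterializer and materialize are hypothetical names, and the elements are assumed to have been decoded already.

```java
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: choose the container from the declared target type.
final class ListMaterializer {
    static Object materialize(List<Object> decodedElements, Class<?> targetType) {
        if (targetType.isArray()) {
            Class<?> component = targetType.getComponentType();
            if (component.isPrimitive()) {
                // Primitive arrays such as int[] are not covered by this sketch.
                throw new IllegalArgumentException("primitive arrays are outside this sketch");
            }
            // Target such as Object[]: copy the elements into a new array.
            Object[] array = (Object[]) Array.newInstance(component, decodedElements.size());
            return decodedElements.toArray(array);
        }
        // Target such as List<Object>: populate an ArrayList.
        return new ArrayList<>(decodedElements);
    }

    public static void main(String[] args) {
        List<Object> decoded = Arrays.asList("a", 1, 2.5);
        System.out.println(materialize(decoded, Object[].class).getClass().getSimpleName()); // Object[]
        System.out.println(materialize(decoded, List.class).getClass().getSimpleName());     // ArrayList
    }
}
```

With this shape, the same decoded elements can populate either the Object[] objects field of Foo or the List<Object> objects field of Foo2.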
Users can also provide meta hints for fields of a type, or for the type as a whole. Here is an example in Java which uses annotations to provide such information.
@ForyObject(fieldsNullable = false, trackingRef = false)
class Foo {
    @ForyField(trackingRef = false)
    int[] intArray;
    @ForyField(polymorphic = true)
    Object object;
    @ForyField(tagId = 1, nullable = true)
    List<Object> objectList;
}
Such information can be provided in other languages too:
- cpp: use macros and templates.
- golang: use struct tags.
- python: use type hints.
- rust: use macros.
Type ID
All internal data types are expressed using an ID in range 0~64. Users can use IDs in range 0~8192 for registering their custom types (struct/ext/enum). User type IDs are in a separate namespace and combined with internal type IDs via bit shifting:
(user_type_id << 8) | internal_type_id.
Internal Type ID Table
| Type ID | Name | Description |
|---|---|---|
| 0 | UNKNOWN | Unknown type, used for dynamic typing |
| 1 | BOOL | Boolean value |
| 2 | INT8 | 8-bit signed integer |
| 3 | INT16 | 16-bit signed integer |
| 4 | INT32 | 32-bit signed integer |
| 5 | VARINT32 | Variable-length encoded 32-bit signed integer |
| 6 | INT64 | 64-bit signed integer |
| 7 | VARINT64 | Variable-length encoded 64-bit signed integer |
| 8 | TAGGED_INT64 | Hybrid encoded 64-bit signed integer |
| 9 | UINT8 | 8-bit unsigned integer |
| 10 | UINT16 | 16-bit unsigned integer |
| 11 | UINT32 | 32-bit unsigned integer |
| 12 | VAR_UINT32 | Variable-length encoded 32-bit unsigned integer |
| 13 | UINT64 | 64-bit unsigned integer |
| 14 | VAR_UINT64 | Variable-length encoded 64-bit unsigned integer |
| 15 | TAGGED_UINT64 | Hybrid encoded 64-bit unsigned integer |
| 16 | FLOAT16 | 16-bit floating point (half precision) |
| 17 | FLOAT32 | 32-bit floating point (single precision) |
| 18 | FLOAT64 | 64-bit floating point (double precision) |
| 19 | STRING | UTF-8/UTF-16/Latin1 encoded string |
| 20 | LIST | Ordered collection (List, Array, Vector) |
| 21 | SET | Unordered collection of unique elements |
| 22 | MAP | Key-value mapping |
| 23 | ENUM | Enum registered by numeric ID |
| 24 | NAMED_ENUM | Enum registered by namespace + type name |
| 25 | STRUCT | Struct registered by numeric ID (schema consistent) |
| 26 | COMPATIBLE_STRUCT | Struct with schema evolution support (by ID) |
| 27 | NAMED_STRUCT | Struct registered by namespace + type name |
| 28 | NAMED_COMPATIBLE_STRUCT | Struct with schema evolution (by name) |
| 29 | EXT | Extension type registered by numeric ID |
| 30 | NAMED_EXT | Extension type registered by namespace + type name |
| 31 | UNION | Tagged union type (one of several alternatives) |
| 32 | NONE | Empty/unit type (no data) |
| 33 | DURATION | Time duration (seconds + nanoseconds) |
| 34 | TIMESTAMP | Point in time (nanoseconds since epoch) |
| 35 | LOCAL_DATE | Date without timezone (days since epoch) |
| 36 | DECIMAL | Arbitrary precision decimal |
| 37 | BINARY | Raw binary data |
| 38 | ARRAY | Generic array type |
| 39 | BOOL_ARRAY | 1D boolean array |
| 40 | INT8_ARRAY | 1D int8 array |
| 41 | INT16_ARRAY | 1D int16 array |
| 42 | INT32_ARRAY | 1D int32 array |
| 43 | INT64_ARRAY | 1D int64 array |
| 44 | UINT8_ARRAY | 1D uint8 array |
| 45 | UINT16_ARRAY | 1D uint16 array |
| 46 | UINT32_ARRAY | 1D uint32 array |
| 47 | UINT64_ARRAY | 1D uint64 array |
| 48 | FLOAT16_ARRAY | 1D float16 array |
| 49 | FLOAT32_ARRAY | 1D float32 array |
| 50 | FLOAT64_ARRAY | 1D float64 array |
Type ID Encoding for User Types
When registering user types (struct/ext/enum), the full type ID combines user ID and internal type ID:
Full Type ID = (user_type_id << 8) | internal_type_id
Examples:
| User ID | Type | Internal ID | Full Type ID | Decimal |
|---|---|---|---|---|
| 0 | STRUCT | 25 | (0 << 8) \| 25 | 25 |
| 0 | ENUM | 23 | (0 << 8) \| 23 | 23 |
| 1 | STRUCT | 25 | (1 << 8) \| 25 | 281 |
| 1 | COMPATIBLE_STRUCT | 26 | (1 << 8) \| 26 | 282 |
| 2 | NAMED_STRUCT | 27 | (2 << 8) \| 27 | 539 |
When reading type IDs:
- Extract internal type: internal_type_id = full_type_id & 0xFF
- Extract user type ID: user_type_id = full_type_id >> 8
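The composition and extraction formulas above can be checked with a few lines of Java. This is a minimal sketch: TypeIds and its method names are hypothetical and not a Fory API, and the range checks are an assumption added for illustration (the low 8 bits are reserved for the internal type ID).

```java
// Hypothetical helper mirroring the formulas in this section; not a Fory API.
final class TypeIds {
    static final int STRUCT = 25;            // from the internal type ID table
    static final int COMPATIBLE_STRUCT = 26;
    static final int NAMED_STRUCT = 27;

    // Full Type ID = (user_type_id << 8) | internal_type_id
    static int fullTypeId(int userTypeId, int internalTypeId) {
        if (userTypeId < 0) {
            throw new IllegalArgumentException("user type ID must be non-negative");
        }
        if (internalTypeId < 0 || internalTypeId > 0xFF) {
            throw new IllegalArgumentException("internal type ID must fit in 8 bits");
        }
        return (userTypeId << 8) | internalTypeId;
    }

    // internal_type_id = full_type_id & 0xFF
    static int internalTypeId(int fullTypeId) {
        return fullTypeId & 0xFF;
    }

    // user_type_id = full_type_id >> 8
    static int userTypeId(int fullTypeId) {
        return fullTypeId >> 8;
    }

    public static void main(String[] args) {
        int full = fullTypeId(1, STRUCT);
        System.out.println(full);                 // 281
        System.out.println(internalTypeId(full)); // 25
        System.out.println(userTypeId(full));     // 1
    }
}
```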
Type mapping
See Type mapping
Spec overview
Here is the overall format:
| fory header | object ref meta | object type meta | object value data |
All data is serialized using little-endian byte order, regardless of type.
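To make the byte-order rule concrete, here is a small Java sketch that writes and reads a fixed-width int32 in little-endian order using the standard java.nio.ByteBuffer. LittleEndianDemo is a hypothetical name; the sketch covers byte order only, not the Fory header, the meta sections, or any variable-length encoding.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical demo of little-endian layout for a fixed-width int32.
final class LittleEndianDemo {
    // Least significant byte comes first in the output.
    static byte[] writeInt32(int value) {
        return ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(value).array();
    }

    // Reads the first four bytes as a little-endian int32.
    static int readInt32(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    public static void main(String[] args) {
        byte[] encoded = writeInt32(0x01020304);
        System.out.println(Arrays.toString(encoded)); // [4, 3, 2, 1]
        System.out.println(readInt32(encoded) == 0x01020304); // true
    }
}
```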