# Xlang Implementation Guide

## Implementation guidelines

### How to reduce memory read/write code
- Try to merge multiple byte writes into one int/long write to reduce memory IO and bounds-check cost.
- Read multiple bytes as one int/long, then split it back into individual bytes, for the same reason.
- Try to use one varint/long to encode flags and length together, saving a byte and reducing memory IO.
- Conditional branches are cheaper than memory IO, unless there are too many branches.
### Fast deserialization for static languages without runtime codegen support
For type evolution, the serializer encodes the type meta into the serialized data. The deserializer compares this meta with the class meta in the current process, and uses the diff to determine how to deserialize the data.

For Java/JavaScript/Python, we can use the diff to generate serializer code at runtime and load it as a class/function for deserialization. In this way, type evolution will be as fast as the type-consistent mode.

For C++/Rust, we can't generate serializer code at runtime, so we must generate it at compile time using meta-programming. But at compile time we don't know the type schema in other processes, so we can't generate serializer code for such inconsistent types. Instead, we generate code that loops over the received fields and compares field names one by one to decide whether to deserialize and assign a field or skip its value.
One fast approach is to optimize the string comparison into jump instructions:

- Assume the current type has `n` fields, and the peer type has `n1` fields.
- At compile time, generate an auto-growing `field id` starting from `0` for every sorted field in the current type.
- Compare the received type meta with the current type: generate the same id if the field name is the same, otherwise generate an auto-growing id starting from `n`. Cache this meta at runtime.
- Iterate over the fields of the received type meta and use a `switch` on the `field id` to deserialize data and assign/skip the field value. Contiguous field ids will be optimized into a `jump` table in the `switch` block, so it will be very fast.
Here is an example. Suppose process A has a class `Foo` with version 1, defined as `Foo1`, and process B has the same class `Foo` with version 2, defined as `Foo2`:
```cpp
// class Foo with version 1
class Foo1 {
  int32_t v1;      // id 0
  std::string v2;  // id 1
};

// class Foo with version 2
class Foo2 {
  // id 0, but will have id 2 in process A
  bool v0;
  // id 1, but will have id 0 in process A
  int32_t v1;
  // id 2, but will have id 3 in process A
  int64_t long_value;
  // id 3, but will have id 1 in process A
  std::string v2;
  // id 4, but will have id 4 in process A
  std::vector<std::string> list;
};
```