Abstract
The new edition of the C++ standard, expected to be released in 2026, will add reflection to C++ [1]. Reflection lets developers write code that examines the properties of classes, types, functions, and other program elements and generates new code based on them. At the time of writing, reflection is not supported by the mainline compilers, but it is available in experimental branches of Clang [2] and GCC, which users can build themselves. For quick experiments, one can also use Compiler Explorer [A].
Review articles and videos on this topic most often use the example of printing arbitrary structures or serializing them to JSON. In this article, the author invites the reader to practice with a slightly more complex example: deserializing protobuf messages.
Overview
The Goals and Objectives section states what we want to achieve, and the Instrumentarium section gives a brief overview of the reflection features needed for our exercise. The Protobuf section briefly describes the serialization format. In the Design Rationales section, we formulate the design decisions; in the Implementation section, we write the code; and in the Verification section, we outline the testing methodology. The Conclusions section summarizes the results of our exercise, and the Discussion section adds details on the intentions and motivations behind some aspects of the design and implementation.
Goals and Objectives
The purpose of this article is to write a solution for deserializing arbitrary protobuf messages that can also be used in projects without dynamic memory allocation (malloc-free). The main goal is to master the new C++ features and to familiarize readers with examples of their practical use. A secondary goal is to understand the Protobuf serialization format.
Instrumentarium
- ^^expr: the reflection operator (prefix ^^) produces a reflection value from a grammatical construct, its operand expr. Reflection values (reflections for short) are values of the opaque type std::meta::info, which can be passed to metafunctions to analyze or alter properties of grammatical constructs [1].
- [: r :]: the splicer produces an expression evaluating to the entity represented by the reflection r, in grammatical contexts that permit expressions. For example [A]:
int main() {
    int x = 1;
    constexpr auto r = ^^x;
    return [: r :]; // same as: return x;
}
- nonstatic_data_members_of returns a vector of reflections representing the direct non-static data members of the class type represented by its first argument. In some cases, its result can only be used after converting it to a static array with the function define_static_array [3].
- access_context is a non-aggregate type that represents a namespace, class, or function from which queries pertaining to access rules may be performed, as well as the designating class. The function access_context::unchecked returns an access_context with no access restrictions. For example [B]:
struct foo { int x; };
int main() {
    foo bar {12};
    constexpr auto refl = define_static_array(
        nonstatic_data_members_of(^^foo, std::meta::access_context::unchecked()))[0];
    return bar.[: refl :]; // reads bar.x
}
- type_of returns a reflection of the type of the entity represented by its argument.
- extract<T> returns the value of type T represented by the reflection passed as its argument.
- template for is an expansion statement [4]: it unrolls the loop at compile time, repeating its body for each element of its range. The important difference from a plain for is that the type of the loop variable is determined separately for each iteration, which allows iterating over tuple elements. For example [C]:
template for (auto i : std::make_tuple(0, 'a')) {
    std::println("{}", i);
}
- [[=v]]: an annotation is a compile-time value that can be associated with a construct to which attributes can appertain [5]. Annotations differ from attributes by the = prefix. Annotation values may be of user-defined types, provided they are constant expressions. annotations_of returns all annotations of an entity. For example [D]:
enum [[=11]] foo {};
int main() {
    constexpr auto bar = annotations_of(^^foo)[0];
    return extract<int>(bar); // returns 11
}
- data_member_spec returns a reflection of a data member description for a data member of the given type.
- define_aggregate takes the reflection of an incomplete class, struct, or union type and a range of reflections of data member descriptions, and completes the given type with data members as described.
- consteval { stmt; } designates a scope (a consteval block) in which reflection functions with side effects, such as define_aggregate, can be called. For example [E]:
struct foo;
consteval {
    define_aggregate(^^foo, { data_member_spec(^^int, {"value"}) });
}
int main() {
    foo bar { 10 };
    return bar.value;
}
Protobuf
Protobuf (Protocol Buffers) is a data serialization format [6] from Google for encoding structured data into a compact binary format for exchanging data (messages) between different systems. The message format is described in a .proto file [7], from which C++ (or other) code for serialization and deserialization can then be generated. Protobuf supports basic field types (arithmetic, string, boolean), user-defined types (enumerations and structures), sequences (repeated), unions (oneof), and maps (map).
Integer types specify their bit width explicitly (int32, int64) and may also carry an encoding subtype, fixed or signed (fixed32, sint64). Fixed types are always stored in a fixed number of bytes (4 or 8), while the other integer types are stored in a variable number of bytes, only as many as the encoded value requires.
In the binary format, fields are identified by their number from the .proto file together with a wire type, and they can appear in any order or be omitted [8]. Although proto2 allows fields to be marked as required, this is discouraged (the guide says "required: Do not use"), and in proto3 all fields are optional.
Design Rationales
Decisions
- The Protobuf message type for C++ code will be created by the user, or translated from .proto, using C++ data structures, the supported data types, and annotations.
- Field numbers will be assigned by the user by adding an annotation of type int with the field number.
- For the Protobuf scalar types and their subtypes, we will define analogues, enums with the corresponding underlying types: int32, sint32, uint32, fixed32, sfixed32, int64, sint64, uint64, fixed64, sfixed64.
- We will also support some native C++ types: bool, int32_t, int64_t, uint32_t, uint64_t, float, and double, which will represent the Protobuf types bool, sint32, sint64, uint32, uint64, float, and double, respectively.
- For Protobuf enumeration types, the user will define analogues in C++, and our solution will allow their use in message structures.
- Protobuf strings will be supported as std::string, char[N], or std::array<char, N>, at the user's discretion.
- We will support nested messages as nested structures or as std::unique_ptr to such structures.
- For repeated fields, we will support static arrays and containers with a push_back method (std::vector, std::list).
- For all types except containers, we will allow std::optional wrappers.
- Protobuf map: std::map or std::unordered_map, at the user's choice.
- oneof: an anonymous union or std::variant, at the user's choice.
- Sequences can additionally be marked as packed or unpacked with the packed annotation.
- We will use std::istream as the input data stream.
- We will ignore unrecognized or excessive elements in the input stream.
- To deserialize fields, we will use a dispatch table with functions for reading message fields.
- We will fill the table using reflection for all message fields.
- The result of the message deserialization function will be a set of flags indicating the presence of values in the fields.
Definitions
Based on the decisions made, we can make the following definitions:
- A field is a non-static member of a structure with a number in the annotation.
- A message is a structure (class) that has at least one field.
Principle of operation
- The user creates a message structure or translates it from a .proto file.
- The user code calls the deserialize function with the input stream and an instance of the message structure.
- At compile time, our code fills in a dispatch table with functions for reading fields.
- During deserialization, our code:
  - for each element in the input stream, tries the field reading functions until one succeeds;
  - skips the element if it is not recognized;
  - upon completion, returns a set of flags indicating which fields are present.
Translation example
Protobuf:
enum Enum {
    NOTOK = 0;
    OK = 42;
}
message Example {
    sint32 signed_int = 1;
    uint32 unsigned_int = 2;
    fixed32 fixed_32 = 3;
    string text = 4;
    Enum status = 5;
    repeated double array_of_double = 6;
    map<sint32, string> map_int_to_text = 7;
    oneof Union {
        sint64 int_alternative = 8;
        double double_alternative = 9;
    }
}
C++:
enum class Enum {
    NOTOK = 0,
    OK = 42
};
struct Example {
    sint32 signed_int [[=1]];
    uint32 unsigned_int [[=2]];
    fixed32 fixed_32 [[=3]];
    std::string text [[=4]];
    Enum status [[=5]];
    std::vector<double> array_of_double [[=6]];
    std::map<sint32, std::string> map_int_to_text [[=7]];
    union {
        sint64 int_alternative [[=8]];
        double double_alternative [[=9]];
    };
};
Implementation
Skeleton of the message deserialization function
Implementing the decisions made in the previous section and abstracting from the details, let's write a first approximation of the deserialization function's skeleton:
template<message Message>
auto deserialize(std::istream &input, Message &msg) {
// Fill the dispatch table at compile time
static constexpr auto field_readers = make_field_readers<Message>();
deserialize_result_type<Message> result { };
while (input.good()) { // While reading is possible
const auto [num, type] = read_id_type(input); // reading data type and id
if (! input.good()) break;
deserialize_result_type<Message> field_success { };
for (auto reader : field_readers) { // Trying to apply every item from the dispatch table
if (const auto attempt_success = reader(input, msg, num, type); attempt_success) {
field_success = attempt_success; // Stop trying on success
break;
}
}
if (field_success) {
result |= field_success;
} else {
skip(input, type);
}
}
return result;
}
The functions and types used here (message, make_field_readers, deserialize_result_type, read_id_type, skip) are details that we have abstracted away and will gradually implement in the following sections.
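One possible shape of read_id_type, sketched here under assumptions rather than taken from the article's code: every record starts with a varint key holding (field_number << 3) | wire_type, so the function reads the key and splits it. The enumerator names of data_type follow those used by skip below, their values are assumed to be the wire-type numbers from the encoding guide [8], and read_varint is a local stand-in for the read_variant function shown later.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <utility>

// Assumed numbering: the wire-type values from the protobuf encoding guide
// (the deprecated group types 3 and 4 are not modeled).
enum class data_type : std::uint8_t {
    variant = 0,  // varint
    fixed64 = 1,  // 8 bytes, little-endian
    lengthy = 2,  // length-delimited: strings, nested messages, packed sequences
    fixed32 = 5   // 4 bytes, little-endian
};

// Local stand-in for read_variant: 7 payload bits per byte, least significant
// group first; a cleared high bit marks the last byte.
inline std::uint64_t read_varint(std::istream& input) {
    std::uint64_t value = 0;
    char chr{};
    for (unsigned shift = 0; shift < 64 && input.get(chr); shift += 7) {
        const auto byte = static_cast<unsigned char>(chr);
        value |= std::uint64_t(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) break;
    }
    return value;
}

// Hypothetical read_id_type: splits the key (field_number << 3) | wire_type.
inline std::pair<std::uint32_t, data_type> read_id_type(std::istream& input) {
    const std::uint64_t key = read_varint(input);
    return { static_cast<std::uint32_t>(key >> 3), static_cast<data_type>(key & 0x07) };
}
```

For the bytes 08 96 01 this yields field 1 with wire type varint, and the following read_varint returns 150.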
An online example of this function can be found at [F]. So far, this is standard C++ code and does not require any features from C++26.
The dispatch table generation function make_field_readers
The make_field_readers template function builds the dispatch table: it iterates over all fields of a message and, for each one, instantiates the field_reader template function (example [G]). The splice operator ([:field:]) will come in handy in these reader functions. To use a field reflection as a template argument or in a splice inside a loop, the loop variable must be constexpr, which is only possible in a template for expansion statement. Accordingly, the range of the expansion statement must be a static constant, i.e. the fields_of function must return its result through define_static_array, just like make_field_readers does.
template<message Message>
consteval auto make_field_readers() {
std::vector<field_reader_type<Message>> result {};
template for(constexpr auto field : fields_of<Message>()) {
result.push_back(field_reader<field, num_of(field)>);
}
return std::define_static_array(result);
}
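To make the table's contents concrete, here is a hand-written equivalent of what make_field_readers produces, for a hypothetical message point with two uint32 fields numbered 1 and 2. It is plain C++ without reflection, and all names in it (point, reader_fn, read_uint32_field, point_readers, skip_value, deserialize_point) are illustrative. Each entry plays the role of one field_reader instantiation: it consumes the value only when the field number and wire type match, and reports whether it did.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <ios>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical message; with reflection, 1 and 2 would come from [[=1]] and [[=2]].
struct point {
    std::uint32_t x;
    std::uint32_t y;
};

// Same varint rule as read_variant.
inline std::uint64_t read_varint(std::istream& in) {
    std::uint64_t value = 0;
    char chr{};
    for (unsigned shift = 0; shift < 64 && in.get(chr); shift += 7) {
        const auto byte = static_cast<unsigned char>(chr);
        value |= std::uint64_t(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) break;
    }
    return value;
}

// Skips one value: wire type 0 varint, 1 fixed64, 2 length-delimited, 5 fixed32.
inline void skip_value(std::istream& in, std::uint64_t wire_type) {
    switch (wire_type) {
    case 0: read_varint(in); break;
    case 1: in.ignore(8); break;
    case 2: in.ignore(static_cast<std::streamsize>(read_varint(in))); break;
    case 5: in.ignore(4); break;
    default: in.setstate(std::ios::failbit);  // groups or garbage: stop reading
    }
}

// Common signature of all table entries (the role of field_reader_type<Message>).
using reader_fn = bool (*)(std::istream&, point&, std::uint64_t num, std::uint64_t wire_type);

// One entry per field: reads the value only if the number and wire type match.
template<std::uint64_t Num, std::uint32_t point::*Member>
bool read_uint32_field(std::istream& in, point& msg, std::uint64_t num, std::uint64_t wire_type) {
    if (num != Num || wire_type != 0) return false;
    msg.*Member = static_cast<std::uint32_t>(read_varint(in));
    return true;
}

// Hand-written counterpart of make_field_readers<point>().
constexpr std::array<reader_fn, 2> point_readers{
    &read_uint32_field<1, &point::x>,
    &read_uint32_field<2, &point::y>,
};

// Same loop as the skeleton above, minus the result flags.
inline void deserialize_point(std::istream& in, point& msg) {
    while (true) {
        const std::uint64_t key = read_varint(in);
        if (!in) break;  // no more records
        bool recognized = false;
        for (auto reader : point_readers) {
            if (reader(in, msg, key >> 3, key & 0x07)) {
                recognized = true;
                break;
            }
        }
        if (!recognized) skip_value(in, key & 0x07);
    }
}
```

The table itself is just an array of function pointers; what reflection saves is writing one entry per field by hand and keeping the field numbers in sync with the structure.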
Field reader function field_reader
This template function checks whether the received field number and wire type match the expected ones and, if they do, calls the deserialize function for the field's type. Since many field types must be handled, deserialize is overloaded, and one of the overloads is the message deserialization function we wrote earlier.
Function num_of
This function finds the first annotation of type int on the member represented by its reflection argument and returns its value, or invalid_num if there is none. To implement it, we use annotations_of to get the list of annotations, type_of to find the data type of each annotation, and extract<int> to get the value (example [H]).
consteval auto num_of(std::meta::info member) {
auto annotations = std::meta::annotations_of(member);
auto found = std::ranges::find_if(annotations, [](auto annotation) {
return std::meta::type_of(annotation) == ^^int;
} );
return found == annotations.end() ? invalid_num : std::meta::extract<int>(*found);
}
Function fields_of
This function returns a list of fields of the class passed as a template parameter (example [H]). Here, the reflection operator ^^Class and nonstatic_data_members_of are used to obtain a list of non-static data members and filter them by the presence of a number in the annotation.
template<class Class>
consteval auto fields_of() {
constexpr auto ctx = std::meta::access_context::unchecked();
std::vector<std::meta::info> fields{};
template for(constexpr auto member : define_static_array(nonstatic_data_members_of(^^Class, ctx))) {
if constexpr (num_of(member) != invalid_num )
fields.push_back(member);
}
return std::define_static_array(fields);
}
Concept message
Now we have enough to define the message concept:
template<class Message> concept message = std::is_class_v<Message> && (fields_of<Message>().size() != 0);
Deserialization result type deserialize_result_type
We will create this type as a set of bit fields, each of which will correspond to a field in the message. To make it easier to use, we will make the names of the bit fields identical to the names of the message fields. To do this, we will use define_aggregate and data_member_spec to create a definition for our incomplete type deserialize_result_type (example [I]), and identifier_of to get the message field identifier.
template<message Message>
struct deserialize_result_type;
template<typename Message>
consteval auto define_result() {
return define_aggregate(^^deserialize_result_type<Message>,
std::views::transform(fields_of<Message>(), [](auto field) {
return data_member_spec(^^bool, {identifier_of(field), {}, 1U});
}));
}
To complete the definition, this function should be called in a consteval block.
struct foo {
int num1 [[=1]];
long num2 [[=2]];
};
consteval { define_result<foo>(); }
This is a workable solution, but it puts the responsibility for calling define_result for each message type on the user. To make this happen automatically whenever needed, the consteval block must be placed in a context where Message is available as a template parameter, i.e. in an auxiliary template structure. In addition, for field_reader to return a result with the bit of the field it has read, we need a function that sets this bit given the reflection of the field. Let's do all this in an auxiliary template (example [J]).
template<message Message>
struct deserialize_helper {
struct result_type;
static consteval auto define_result() {
return std::meta::define_aggregate(^^result_type, std::views::transform(fields_of<Message>(), [](auto field) {
return std::meta::data_member_spec(^^bool, {std::meta::identifier_of(field), {}, 1U});
}));
}
consteval { define_result(); }
static consteval result_type set_by_name_of(auto Info) {
struct result_type result {};
template for(constexpr auto bit : members_of<result_type>()) {
if (identifier_of(Info) == identifier_of(bit)) {
result.[:bit:] = true;
}
}
return result;
}
template<std::meta::info Member>
requires (std::same_as<Message, class_of<Member>>)
static constexpr result_type deserialized() {
template for(constexpr auto field : fields_of<class_of<Member>>()) {
if constexpr(field == Member) {
return set_by_name_of(field);
}
}
return {};
}
};
template<message Message>
using deserialize_result_type = deserialize_helper<Message>::result_type;
template<std::meta::info Field>
constexpr auto deserialized() {
return deserialize_helper<class_of<Field>>::template deserialized<Field>();
}
(Un)packed sequences
In a packed Protobuf sequence, the elements follow one another inside a single length-delimited record, so the field number is written only once. In an unpacked sequence, each element is written as a separate record preceded by its own field number, and these records can be mixed with other fields. In proto3, sequences of primitive (numeric) types are packed by default, but can be marked as unpacked. To express this, we define the following type:
enum class packed {
default_,
packed,
unpacked
};
Users will use this data type to denote unpacked sequences of scalar types:
struct foo {
int unp[2] [[=1, =packed::unpacked]];
long pack[2] [[=2, =packed::packed]];
long norm[2] [[=3]];
};
To use this annotation in our code, let's define a structure with the proto attributes supported by our implementation (only packed for now):
struct attributes {
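::packed packed; // qualified: an unqualified 'packed packed;' would change the meaning of 'packed' inside the class (ill-formed, no diagnostic required)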
};
And the function of reading attributes using reflection (example [K]):
consteval auto attrs_of(std::meta::info member) {
return std::ranges::fold_left(std::meta::annotations_of(member), attributes { },
[](attributes attrs, auto annotation) {
if (type_of(annotation) == ^^packed)
attrs.packed = std::meta::extract<packed>(annotation);
return attrs;
});
}
We will use the obtained attributes in the field_reader function.
This concludes the need for reflection: from this point on, our implementation uses only good old C++23.
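Reading a packed sequence needs no reflection: the record is length-delimited, and its payload is simply the elements written back to back. The sketch below reads a packed sequence of varints into a std::vector; read_packed_varints is a hypothetical helper, and this read_varint additionally counts the bytes it consumes so that the loop can stop exactly at the end of the payload. For example, field 4 holding 3, 270 and 86942 is encoded as 22 06 03 8E 02 9E A7 05.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads one varint and adds the number of bytes it occupied to `consumed`.
inline std::uint64_t read_varint(std::istream& in, std::uint64_t& consumed) {
    std::uint64_t value = 0;
    char chr{};
    for (unsigned shift = 0; shift < 64 && in.get(chr); shift += 7) {
        ++consumed;
        const auto byte = static_cast<unsigned char>(chr);
        value |= std::uint64_t(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) break;
    }
    return value;
}

// Hypothetical helper: reads the length prefix and then the packed payload,
// appending each element to `out`.
template<class T>
bool read_packed_varints(std::istream& in, std::vector<T>& out) {
    std::uint64_t prefix_bytes = 0;
    const std::uint64_t length = read_varint(in, prefix_bytes);  // payload size
    if (!in) return false;
    std::uint64_t consumed = 0;
    while (consumed < length) {
        const std::uint64_t value = read_varint(in, consumed);
        if (!in) return false;  // truncated input
        out.push_back(static_cast<T>(value));
    }
    return consumed == length;  // false if the last element overran the payload
}
```

An unpacked sequence needs no such loop: each element arrives as an ordinary record and is appended by the regular field reader.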
Deserialization of primitive types
Protobuf encodes most integer values as varints [8], a variable-length encoding in which each byte of the input stream carries seven payload bits and one continuation bit (the most significant one): a zero continuation bit marks the last byte of the value, and a one means that more bytes follow. The 7-bit groups are stored least significant first. To read such values, we write the following function:
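inline std::uint64_t read_variant(std::istream &input) {
    static constexpr unsigned char high = 0x80; // continuation bit
    std::uint64_t val { };
    char chr { };
    // Accumulate 7 payload bits per byte, least significant group first;
    // a varint is at most 10 bytes long, so stop once 64 bits are covered.
    for (unsigned count = 0; count < 64 && input.get(chr); count += 7) {
        const auto byte = static_cast<unsigned char>(chr);
        val |= std::uint64_t(byte & 0x7F) << count;
        if ((byte & high) == 0) break; // last byte of the value
    }
    return val;
}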
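For preparing test vectors by hand, the inverse operation is convenient. Below is a minimal sketch of a hypothetical write_variant (serialization is not part of this article's solution) placed next to the decoding rule above, with a round trip; 150, the example used in the encoding guide [8], encodes as 96 01.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical encoder: emits 7 bits at a time, least significant group first,
// setting the continuation bit on every byte except the last.
inline void write_variant(std::ostream& output, std::uint64_t value) {
    while (value >= 0x80) {
        output.put(static_cast<char>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    output.put(static_cast<char>(value));
}

// Decoder following the same rule as read_variant above.
inline std::uint64_t read_variant(std::istream& input) {
    std::uint64_t val = 0;
    char chr{};
    for (unsigned count = 0; count < 64 && input.get(chr); count += 7) {
        const auto byte = static_cast<unsigned char>(chr);
        val |= std::uint64_t(byte & 0x7F) << count;
        if ((byte & 0x80) == 0) break;
    }
    return val;
}
```

The largest 64-bit value needs ten bytes, which is why the decoder stops after 64 bits.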
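For the signed types sint32 and sint64, ZigZag encoding is applied in addition, so that values of small magnitude, including negative ones, have few non-zero high bits and therefore need few varint bytes (0 → 0, -1 → 1, 1 → 2, -2 → 3, ...). ZigZag can be decoded with the following function: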
template<typename T>
constexpr auto decode_zigzag(std::uint64_t value) noexcept {
return static_cast<T>((value >> 1) ^ (-(value & 1)));
}
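As a sanity check, here is decode_zigzag next to its inverse. encode_zigzag is a hypothetical helper computing (n << 1) ^ (n >> 63) on the unsigned bit pattern, so that no signed shifts are involved; for values in the 32-bit range it yields the same result as the 32-bit formula (n << 1) ^ (n >> 31), e.g. 2147483647 maps to 4294967294 and -2147483648 to 4294967295.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// As in the article: shift out the sign bit, then XOR with an all-ones mask
// if the original value was negative (lowest bit set).
template<typename T>
constexpr auto decode_zigzag(std::uint64_t value) noexcept {
    return static_cast<T>((value >> 1) ^ (-(value & 1)));
}

// Hypothetical inverse, computed on the unsigned bit pattern.
constexpr std::uint64_t encode_zigzag(std::int64_t n) noexcept {
    const auto u = static_cast<std::uint64_t>(n);
    return (u << 1) ^ (0 - (u >> 63));  // mask is all ones for negative n
}
```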
One of the features of protobuf is the ability to skip a data element based solely on data from the input stream. The need to skip something may arise due to the presence of new fields in the input stream that have no counterparts in the local message type definitions. The function for skipping such elements may look like this:
inline auto skip(std::istream &input, data_type type) {
switch (type) {
case data_type::fixed32:
input.ignore(sizeof(fixed32));
break;
case data_type::fixed64:
input.ignore(sizeof(fixed64));
break;
case data_type::variant:
read_variant(input);
break;
case data_type::lengthy:
input.ignore(read_variant(input));
}
}
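The following sketch exercises skip on two unknown fields (a length-delimited one and a fixed32 one) followed by a known varint field. So that the block compiles on its own, it assumes a scoped data_type whose values are the wire-type numbers of the encoding guide [8], and it repeats the fixed32/fixed64 enums and a condensed read_variant; skip itself is the function above with an explicit break in the last case.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

enum fixed32 : std::uint32_t {};
enum fixed64 : std::uint64_t {};
// Assumed numbering: wire-type values from the encoding guide.
enum class data_type { variant = 0, fixed64 = 1, lengthy = 2, fixed32 = 5 };

inline std::uint64_t read_variant(std::istream& input) {
    std::uint64_t val = 0;
    char chr{};
    for (unsigned count = 0; count < 64 && input.get(chr); count += 7) {
        const auto byte = static_cast<unsigned char>(chr);
        val |= std::uint64_t(byte & 0x7F) << count;
        if ((byte & 0x80) == 0) break;
    }
    return val;
}

// The article's skip: decides how many bytes to drop from the wire type alone.
inline auto skip(std::istream &input, data_type type) {
    switch (type) {
    case data_type::fixed32: input.ignore(sizeof(fixed32)); break;
    case data_type::fixed64: input.ignore(sizeof(fixed64)); break;
    case data_type::variant: read_variant(input); break;
    case data_type::lengthy: input.ignore(read_variant(input)); break;
    }
}
```

In the test input, 12 02 "hi" is field 2 (length-delimited), 1D plus four bytes is field 3 (fixed32), and 08 96 01 is field 1 with the value 150.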
Data types and concepts
Let's define C++ types corresponding to the Protobuf integer types:
enum int32 : std::int32_t {};
enum sint32 : std::int32_t {};
enum uint32 : std::uint32_t {};
enum fixed32 : std::uint32_t {};
enum sfixed32 : std::int32_t {};
enum int64 : std::int64_t {};
enum sint64 : std::int64_t {};
enum uint64 : std::uint64_t {};
enum fixed64 : std::uint64_t {};
enum sfixed64 : std::int64_t {};
and group them by encoding method using concepts:
template<template<typename A, typename B> class Pred, typename A, typename ... B>
consteval auto anyof() {return std::disjunction_v<Pred<A, B>...>;}
template<typename Type>
concept enumeration = std::is_enum_v<Type>
&& !anyof<std::is_same, Type, int32, sint32, uint32, fixed32, sfixed32, int64, sint64, uint64, fixed64, sfixed64>();
template<typename Type>
concept variant_integral = anyof<std::is_same, Type, bool, std::uint32_t, int32, uint32, std::uint64_t, int64, uint64>()
|| enumeration<Type>;
template<typename Type>
concept zigzag_integral = anyof<std::is_same, Type, std::int32_t, sint32, std::int64_t, sint64>();
template<typename Type>
concept fixed_arithmetic = anyof<std::is_same, Type, fixed32, sfixed32, fixed64, sfixed64, float, double>();
and write deserialization functions for each of these concepts:
template<variant_integral Type>
void deserialize(std::istream& input, Type& value) {
value = static_cast<Type>(read_variant(input));
}
void deserialize(std::istream& input, fixed_arithmetic auto& value) {
// Fixed-width values are little-endian on the wire; this raw copy assumes a little-endian host
input.read(reinterpret_cast<char*>(&value), sizeof(value));
}
template<zigzag_integral Type>
void deserialize(std::istream& input, Type& value) {
value = decode_zigzag<Type>(read_variant(input));
}
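The fixed_arithmetic overload copies the bytes straight into the value, which is correct only on little-endian hosts, since fixed-width values are little-endian on the wire [8]. If big-endian targets matter, one option, sketched below as a hypothetical read_fixed helper, is to assemble the integer from bytes explicitly and then copy its bits into the destination, which also covers float and double (assuming IEEE 754 representation).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <istream>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical endianness-independent reader for 4- and 8-byte values
// (fixed32, sfixed32, float, fixed64, sfixed64, double).
template<typename T>
bool read_fixed(std::istream& input, T& value) {
    static_assert(sizeof(T) == 4 || sizeof(T) == 8);
    static_assert(std::is_trivially_copyable_v<T>);
    using bits_t = std::conditional_t<sizeof(T) == 4, std::uint32_t, std::uint64_t>;
    unsigned char bytes[sizeof(T)];
    if (!input.read(reinterpret_cast<char*>(bytes), sizeof(T))) return false;
    bits_t bits = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        bits |= static_cast<bits_t>(bytes[i]) << (8 * i);  // byte 0 is least significant
    }
    std::memcpy(&value, &bits, sizeof(T));  // reinterpret the bits as T
    return true;
}
```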
We will fill dynamic containers using push_back or resize and by moving elements into them, so let's define the relevant concepts:
template<typename Container>
concept resizable = requires(Container c) {
c.resize(1);
};
template<typename Container>
concept back_insertable = requires(Container container) {
container.push_back(std::declval<typename Container::value_type>());
};
and a concept for associative containers:
template<typename Container>
concept associative = requires(Container c, Container::key_type k, Container::mapped_type v) {
c[k] = v;
};
Static arrays satisfy neither back_insertable nor resizable, so we define a bounded_array concept for them:
template<typename Array>
concept bounded_array =
(std::is_bounded_array_v<Array> && elementary<std::remove_extent_t<Array>>) ||
(std::ranges::output_range<Array, typename Array::value_type> && !resizable<Array>
&& elementary<typename Array::value_type>);
For each of these concepts, we will write a deserialize function.
Oneof support
At the time of writing, the Clang branch with reflection support did not allow members of anonymous unions [9] to be accessed as regular fields. Instead, access had to go through a reflection of the anonymous union itself: obj.[:outer:].[:Field:]. To work around this, we add a separate union_reader function.
A field of type std::variant needs a field number for each alternative, and the reader has to call emplace<Index>() before accessing the alternative. For this, we also write a separate function, oneof_reader. This slightly complicates make_field_readers, which now has to choose one of three reading functions.
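The runtime part of this dispatch, choosing the alternative by field number and calling emplace<Index>(), can be sketched in plain C++17 without reflection. In the sketch, emplace_by_number is a hypothetical helper and the numbers array stands for the [[=4, =5, =6]] annotations; once it returns true, a reader would deserialize into std::get<Index>(v).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <variant>

// Emplaces the alternative whose field number equals num;
// numbers[I] is the field number of alternative I.
template<class Variant, std::size_t N, std::size_t... I>
bool emplace_by_number_impl(Variant& v, int num, const std::array<int, N>& numbers,
                            std::index_sequence<I...>) {
    // The || fold stops at the first matching alternative.
    return ((numbers[I] == num
                 ? (static_cast<void>(v.template emplace<I>()), true)
                 : false) || ...);
}

template<class... Alternatives, std::size_t N>
bool emplace_by_number(std::variant<Alternatives...>& v, int num,
                       const std::array<int, N>& numbers) {
    static_assert(N == sizeof...(Alternatives), "one field number per alternative");
    return emplace_by_number_impl(v, num, numbers, std::index_sequence_for<Alternatives...>{});
}
```

An unknown number leaves the variant untouched, so the caller can fall back to skipping the record.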
Nested messages
The length of a nested message is written before its data, and the reading function must not read beyond it. The root message, however, has no length prefix, and its reading is limited only by the end of the input stream. To support both cases without duplicating the body of the deserialize function, we add a parameter: a pointer to a function that reads the length of nested messages. For root messages, we pass a pointer to the unlimited function, which simply returns the max_length constant.
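A simplified sketch of the idea, using a hypothetical read_varint_fields that handles only varint fields: it is given the number of bytes it may consume, a nested message passes its length prefix, and the root passes max_length. Unlike the article's solution, which passes a pointer to a length-reading function, the limit is passed directly here, and the consumed byte count is measured with tellg, which assumes a seekable stream such as std::istringstream.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <limits>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

inline std::uint64_t read_varint(std::istream& in) {
    std::uint64_t value = 0;
    char chr{};
    for (unsigned shift = 0; shift < 64 && in.get(chr); shift += 7) {
        const auto byte = static_cast<unsigned char>(chr);
        value |= std::uint64_t(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) break;
    }
    return value;
}

constexpr std::uint64_t max_length = std::numeric_limits<std::uint64_t>::max();

// Reads (field number, value) pairs of varint fields until `limit` bytes
// have been consumed or the input ends.
inline std::vector<std::pair<std::uint64_t, std::uint64_t>>
read_varint_fields(std::istream& in, std::uint64_t limit) {
    std::vector<std::pair<std::uint64_t, std::uint64_t>> fields;
    const std::streampos start = in.tellg();
    auto consumed = [&] { return static_cast<std::uint64_t>(in.tellg() - start); };
    while (consumed() < limit) {
        const std::uint64_t key = read_varint(in);
        if (!in) break;  // end of the root input
        fields.emplace_back(key >> 3, read_varint(in));
    }
    return fields;
}
```

With the input 0A 02 08 05 10 07, the nested message (field 1, two bytes long) yields only its own field 1 = 5, and the remaining field 2 = 7 is left for the enclosing message.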
The final version of our deserialization solution is presented in example [L].
Verification
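The protoc compiler can encode a message of any type described in a .proto file: with --encode, it reads the message in text format from standard input and writes it in binary format to standard output [10].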
protoc --encode=tests.numeric.SInt_32 numeric.proto
Let's use this capability to prepare test vectors for functional tests of our solution. Unfortunately, the compilation time limits of Compiler Explorer allow only a small number of tests to be compiled online. The offline solution contains more than 400 tests and will be available on GitHub.
Conclusions
Reflection in C++26, even in its initial form, is a powerful tool that will allow us to do without external code generation in many applications that are currently impossible without it. Reflection tools are fairly easy to understand and learn, and their use will reduce boilerplate code, simplify serialization and deserialization implementations, and enhance metaprogramming capabilities.
Discussion
Is a dispatch table necessary?
All functions that operate on reflections must be executed in a consteval context. A constexpr dispatch table guarantees such a context and is fairly simple to create and use. The author admits that solutions without such a table may exist.
Between reflection and the type system
In principle, reflection functions can solve all the tasks on types that were previously performed with, for example, template specialization. However, the author did not find a simple reflection-based replacement for the following specialization:
template<typename>
struct member_traits;
template<class Class, typename Type>
struct member_traits<std::optional<Type> Class::*> {
using value_type = Type;
};
Perhaps template specialization is and will remain a simpler solution than writing functions in the reflection space.
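For reference, this is how such a trait is applied: given decltype of a pointer to an optional data member, it recovers the wrapped type. The struct sample is a hypothetical message used only for illustration.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <type_traits>

template<typename>
struct member_traits;

// Matches pointers to data members of type std::optional<Type> in any class.
template<class Class, typename Type>
struct member_traits<std::optional<Type> Class::*> {
    using value_type = Type;
};

struct sample {
    std::optional<int> id;
    std::optional<std::string> name;
};

static_assert(std::is_same_v<member_traits<decltype(&sample::id)>::value_type, int>);
static_assert(std::is_same_v<member_traits<decltype(&sample::name)>::value_type, std::string>);
```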
Subtype differentiation
Protobuf defines several subtypes for integer types: fixed32, sfixed32, sint32, int32, uint32, and similarly for 64-bit types. The subtype affects how the value is serialized: fixed32 and sfixed32 are always encoded in 4 bytes, the others use variable-length encoding, and sint32 and sint64 additionally use ZigZag encoding.
It would be possible to define type aliases for the subtypes and rely on the reflection operator returning different reflections for different aliases, but in class member declarations these types appear with the aliases already resolved.
For these experiments, the author used simple enums. However, it should be noted that enum “breaks” standard type property queries, such as numeric_limits, is_integral, is_signed, is_unsigned, etc. Only some of them can be specialized (numeric_limits), so using message structures with such field types in other templates or libraries may be difficult.
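A small illustration of the problem, using the same enum-based definition of sint32 as in the implementation: the standard traits see an enumeration rather than an integer, the underlying type is still reachable through std::underlying_type_t, and std::numeric_limits is the one trait here that a program may specialize. The specialization below is a hypothetical minimal one that reuses the int32_t limits.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// The article's representation of the Protobuf sint32 subtype.
enum sint32 : std::int32_t {};

// Hypothetical minimal specialization: inherit the int32_t limits and
// convert the value-returning members used here (min, max, lowest) to sint32.
namespace std {
template<>
struct numeric_limits<sint32> : numeric_limits<int32_t> {
    static constexpr sint32 min() noexcept { return sint32(numeric_limits<int32_t>::min()); }
    static constexpr sint32 max() noexcept { return sint32(numeric_limits<int32_t>::max()); }
    static constexpr sint32 lowest() noexcept { return min(); }
};
}  // namespace std
```

Note the mismatch this produces: std::is_signed_v<sint32> stays false (enums are not arithmetic types), while the specialized std::numeric_limits<sint32>::is_signed reports true.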
Indicators of value presence
Since all message fields are optional, we need to decide how to inform the client which fields have been read and which have not. One option is to use std::optional. However, this can lead to unwanted “bloating” of the structure. Another option is to return a set of flags from the deserialization operation as an indication of the presence/absence of a field.
However, implementing this approach for nested structures can be non-trivial and inconvenient to use.
The default field value can also indicate the absence of a value, since protobuf skips fields with default values during serialization, with the exception of packed arrays.
This article implements support for std::optional and a set of flags for the root message.
oneof
The protobuf documentation specifies that fields combined into a oneof are deserialized as if they were placed directly in the message. This corresponds to anonymous unions in C++. However, unions have certain safety issues, so the author also implemented support for std::variant, with one field number per alternative in the annotation:
std::variant<int, double, std::string> oneof [[=4, =5, =6]];
Building clang with reflection
The Clang sources with reflection support are available at [2]. The author used the following commands to build the compiler:
cmake -S llvm -B build -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS="clang" -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi;libunwind"
cmake --build build --parallel 5
protoc
The protobuf compiler is available as a standard package in Linux distributions. On Ubuntu, it can be installed with the command:
sudo apt-get install protobuf-compiler
Possible development
- Validation of message types for:
  - non-unique field numbers;
  - more than one integer annotation per field;
  - a mismatch between the number of field numbers and the number of alternatives in a variant, etc.
- Serialization
- Abstraction from iostream
List of references
- [1] Reflection for C++26: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p2996r13.html
- [2] LLVM fork for P2996: https://github.com/bloomberg/clang-p2996
- [3] define_static_{string,object,array}: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p3491r3.html
- [4] Expansion Statements: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p1306r5.html
- [5] Annotations for Reflection: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p3394r4.html
- [6] Protocol Buffers: https://uk.wikipedia.org/wiki/Protocol_Buffers
- [7] Language Guide (proto 3): https://protobuf.dev/programming-guides/proto3/
- [8] Encoding: https://protobuf.dev/programming-guides/encoding/
- [9] Reflection of an anonymous union member is not giving expected results: https://github.com/bloomberg/clang-p2996/issues/251
- [10] Protocol buffer compiler: https://manpages.debian.org/testing/protobuf-compiler/protoc.1.en.html
List of online examples
- [A] https://compiler-explorer.com/z/511q4rPPY
- [B] https://compiler-explorer.com/z/ccaxa79WY
- [C] https://compiler-explorer.com/z/aePE7Wvhh
- [D] https://compiler-explorer.com/z/hjG56Wqoj
- [E] https://compiler-explorer.com/z/43aaeMK4q
- [F] https://compiler-explorer.com/z/dqv4874Ee
- [G] https://compiler-explorer.com/z/sEh8T7qcj
- [H] https://compiler-explorer.com/z/vqdarbqMe
- [I] https://compiler-explorer.com/z/Teesnh4db
- [J] https://compiler-explorer.com/z/nrn4srWfs
- [K] https://compiler-explorer.com/z/hGv5azh69
- [L] https://compiler-explorer.com/z/oPcTEh9vv