Serializing your data with Protobuf
You have probably worked on a project that needed to exchange information between processes, or even across machines with different processor architectures. One well-known technique for this scenario is serialization: translating data structures or object state into a format that can be stored and later reconstructed by both sides.
In this blog post, we will discuss Protobuf (Protocol Buffers), a project that goes beyond a simple serialization library. The entire example presented here is available on GitHub.
What is Protobuf?
Protocol Buffers is an open source project developed by Google and released under the popular BSD 3-Clause license. It provides a language-neutral, platform-neutral and extensible mechanism for serializing structured data. It officially supports many popular languages, such as C++, C#, Dart, Go, Java and Python, and unofficial add-ons cover other languages, such as C. You can find the source code on GitHub, where its popularity reaches almost 32K stars!
The neutral language used by Protobuf allows you to model messages in a structured format through .proto files:
In the example above we use a structure that represents a person's information, with mandatory attributes, such as name and age, plus the optional email field. Mandatory fields, as the name says, must be filled in when a new message is constructed; otherwise, a runtime error will occur.
But why not XML?
But why another language and serialization mechanism if we can use something already available, like XML? The answer is performance.
Protobuf has many advantages for serialization that go beyond what XML offers. Its message descriptions are simpler than the XML equivalent. Even for small messages, once several nested messages are required, XML becomes hard for humans to read.
Another advantage is size: because the Protobuf format is compact, messages can be up to 10 times smaller than the equivalent XML. But the greatest benefit is speed, which can be up to 100 times faster than standard XML serialization, thanks to its optimized mechanism. In addition to size and speed, Protobuf has a compiler that processes a .proto file and generates code for multiple supported languages, unlike the traditional approach, where the same structure must be written by hand in multiple source files.
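To make the size argument concrete, here is a minimal sketch in Python that hand-encodes one small record in the Protobuf wire format and compares it with a plain XML rendering of the same data. It assumes a hypothetical Person message with name, age and email as fields 1, 2 and 3 (the field numbers and sample values are illustrative, not taken from the original example), and it implements only the two wire types this record needs. Real projects should rely on the code generated by the Protobuf compiler instead.

```python
import xml.etree.ElementTree as ET


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    """Wire type 0 (varint): tag followed by the value."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Wire type 2 (length-delimited): tag, byte length, UTF-8 payload."""
    payload = text.encode("utf-8")
    tag = encode_varint((field_number << 3) | 2)
    return tag + encode_varint(len(payload)) + payload


# Hypothetical sample values and field numbers (assumptions for illustration).
name, age, email = "Ada", 36, "ada@example.com"

binary = (
    encode_string_field(1, name)
    + encode_int_field(2, age)
    + encode_string_field(3, email)
)

# The same record as a minimal XML document, without attributes or whitespace.
person = ET.Element("person")
ET.SubElement(person, "name").text = name
ET.SubElement(person, "age").text = str(age)
ET.SubElement(person, "email").text = email
xml_bytes = ET.tostring(person)

print(len(binary), "bytes in Protobuf wire format")
print(len(xml_bytes), "bytes as XML")
```

The ratio depends heavily on the data (long strings dominate both encodings), so treat the numbers as an illustration rather than a benchmark.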
That sounds good, but how do I use it in real life?
To illustrate the use of Protocol Buffers, we will exchange messages across different architectures and different languages. We will compile C++ code for the armv7hf architecture, serialize an object to a file, and retrieve it with a Python script. This is a useful model for anyone who needs to exchange messages between different architectures through IPC techniques, even on embedded systems.
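Part of what makes this kind of cross-architecture exchange safe is that the Protobuf wire format fixes the byte layout independently of the machine that produced it; a double, for instance, is always written as 8 little-endian bytes. The sketch below, in Python with only the standard library, contrasts that with a raw dump in big-endian byte order. The temperature value is a hypothetical sample.

```python
import struct
import sys

temperature = 23.5  # hypothetical sensor reading

# What a raw memory dump of the value looks like for each byte order.
little_endian_dump = struct.pack("<d", temperature)
big_endian_dump = struct.pack(">d", temperature)

# The Protobuf wire format always stores a double as 8 little-endian bytes,
# whatever the byte order of the host that serializes it.
wire_bytes = struct.pack("<d", temperature)

print("host byte order:", sys.byteorder)
print("wire format    :", wire_bytes.hex())
print("big-endian dump:", big_endian_dump.hex())

# A reader on any architecture decodes the wire bytes the same way.
decoded = struct.unpack("<d", wire_bytes)[0]
print("decoded value  :", decoded)
```

A raw struct dump only works when both sides agree on byte order and padding; a defined wire format removes that assumption.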
For our example, we will use a message that holds the readings of several sensors. The file sensor.proto, which represents the message, is described below:
The syntax variable refers to the version of Protobuf used, which can be proto2 or proto3. Versions 2 and 3 have important differences, but we will only address version 2 in this post. For more information about version 3, see the official documentation. In addition to the declared attributes, highlighted previously, there is the enumerator SwitchLevel, which represents the state of a port. We could also include new messages, or even lists for multiple ports, for example. For a complete description of the syntax used in proto version 2, see the language guide.
Protobuf serialization is driven by the protoc application. This compiler parses the .proto file and generates source files for the language selected by its arguments, in this case C++. You can find more information in the compiler invocation section of the documentation.
The protoc compiler will generate the sensor.pb.h and sensor.pb.cc files, which contain the getters and setters needed to access the attributes, as well as methods for serializing and parsing. The files work only as a stub, and it is necessary to include the headers distributed by Protobuf. Without this compiler, we would have to describe all the steps of object serialization in our code, and for any new change we would need to update both the C++ and Python files.
Now that we have the stubs, we can implement an example to serialize the data collected by a sensor. The file main.cpp is described below:
The Sensor object can be serialized through methods inherited from the Message class. For example, we can serialize to a string with the SerializeAsString method.
Note that this reconstruction can be performed by other languages also supported by Protobuf, and on other architectures. For the transmission to occur between different processes, it is necessary to use IPC techniques; for this, Google provides the gRPC project, a universal RPC framework that supports Protobuf directly. However, our intention in this post is just to talk about Protobuf, so we will use only a file as the means to exchange messages between processes:
To perform serialization through a file, we use the SerializeToOstream method.
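Because the serialized message is binary rather than text, the file should be opened in binary mode on both the writing and the reading side. Here is a minimal Python sketch of that round trip, using a placeholder byte string in place of a real serialized Sensor; the payload and the file name are hypothetical.

```python
import os
import tempfile

# Placeholder for the bytes a serializer would produce (hypothetical payload).
serialized = b"\x0a\x07Kitchen\x18\x44"

path = os.path.join(tempfile.mkdtemp(), "sensor.data")

# Writer side: binary mode keeps every byte exactly as produced.
with open(path, "wb") as output:
    output.write(serialized)

# Reader side: binary mode again, so no newline or encoding translation occurs.
with open(path, "rb") as source:
    restored = source.read()

print(restored == serialized)
```

The same care applies on the C++ side, where a stream opened without binary mode may translate line endings on some platforms.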
Building the project
For the next step, we will describe how to build the project with CMake:
This recipe searches for the modules, libraries, and macros provided by the Protobuf project when calling find_package. Once it is found, the protobuf_generate macros become available for use. The protobuf_generate_cpp function is responsible for executing protoc and populating the source and header variables (such as PROTO_HDRS) with the generated files. Without this functionality, you would need to manually add the protoc command and the required arguments. The subsequent lines follow the usual conventions of CMake projects. Because the generated files will be in the build directory, you need to include it through target_include_directories so that main.cc can resolve the generated header.
It is also possible to observe that we are using Conan to resolve Protobuf as a dependency. The conan_basic_setup function is in charge of configuring all the necessary variables, besides generating the ...
In addition, you must also declare the conanfile.txt file with the following dependencies:
Since Protobuf can be divided into two parts, the protoc installer and the libraries, there are two separate packages. Thus, it is possible to install protoc for the host architecture and the libraries for a target architecture. As we are using CMake for this project, we need to declare the CMake generator.
Now just run the commands to build the project:
So far so good, but how is it done when cross-compiling? In this case, it is necessary to specify the compiler and the target platform:
In the above commands, we have installed only the prebuilt Protobuf libraries for armv7hf. The protoc package is still installed for amd64, because it ignores arch and uses only the host architecture given by arch_build in your profile. CMake needs to be told which compiler to use, so we define it through CMAKE_CXX_COMPILER. Once ready, we can copy our application directly to the target platform.
Parsing with Python
Now we get to the second step: reading the file and retrieving the object using Python. For this, we only need to update the CMake script so that it generates the Python stub in addition to the C++ files:
The protobuf_generate_python function has the same goal as protobuf_generate_cpp but will generate the file sensor_pb2.py. The proto_python virtual target was added to force CMake to call the generator for Python.
The next step is to develop the script that will read the file with the serialized data and parse it through the script generated in the previous step:
The script is fairly straightforward, just like the C++ code, and can be copied together with the sensor_pb2.py file directly to the target platform.
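To give an idea of what a generated parser does under the hood, here is a minimal, standard-library-only sketch of a wire-format reader in Python. It is not the script from this post and not the API of the generated sensor_pb2 module; the field layout (name = 1, temperature = 2, humidity = 3, door = 5), the SwitchLevel values, and the sample bytes are all assumptions made up for illustration.

```python
import struct
from enum import IntEnum


class SwitchLevel(IntEnum):
    """Hypothetical values for the SwitchLevel enumerator."""
    CLOSED = 0
    OPEN = 1


def read_varint(data: bytes, pos: int):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def parse_fields(data: bytes) -> dict:
    """Return {field_number: raw_value} for the three wire types used here."""
    fields = {}
    pos = 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:  # varint (int32, enum, ...)
            value, pos = read_varint(data, pos)
        elif wire_type == 1:  # 64-bit fixed, read here as a little-endian double
            value = struct.unpack_from("<d", data, pos)[0]
            pos += 8
        elif wire_type == 2:  # length-delimited (string, bytes, ...)
            length, pos = read_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields[field_number] = value
    return fields


# Sample payload built by hand for the assumed layout.
payload = (
    b"\x0a\x07Kitchen"                   # field 1, string "Kitchen"
    + b"\x11" + struct.pack("<d", 23.5)  # field 2, double 23.5
    + b"\x18\x44"                        # field 3, varint 68
    + b"\x28\x01"                        # field 5, varint 1
)

raw = parse_fields(payload)
sensor = {
    "name": raw[1].decode("utf-8"),
    "temperature": raw[2],
    "humidity": raw[3],
    "door": SwitchLevel(raw[5]),
}
print(sensor)
```

In practice the generated module handles all of this, along with validation and every other field type, so a real script only needs to read the file and hand the bytes to the generated class.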
Transferring data between processes, serializing objects, and even storing data are techniques that are widely used in all scenarios, but they require a lot of effort to implement and are often not the goal of the project under development. Serialization can be handled by several available projects, such as Protobuf, without having to delve into the low-level details required to process all the data.
The success of Protobuf lies not only in serializing the data but in the mechanism as a whole: from the neutral language, which is flexible and easy to understand, to the compiler with support for multiple languages, and even integration with other products, such as gRPC, which provides direct communication between processes without much effort.
This blog post was a tutorial to demonstrate how tasks that could take hours to complete, such as library development, can be solved in a few steps, using only what is ready and without the need to build from sources.
Interested in knowing more or commenting on the subject? Please do not hesitate to open a new issue.