
ROS 2 Message Research

This article captures the research done regarding the serialization component, including an overview of the current implementation in ROS 1 and the alternatives for ROS 2.

Authors: Dirk Thomas and Esteve Fernandez

Date Written: 2013-12

Last Modified: 2019-05

This document pre-dates the decision to build ROS 2 on top of DDS.

This is an exploration of possible message interfaces and their relation to the underlying message serialization. This paper is focused on specifying the message API and designing the integration with the serialization, with performance as well as flexibility in mind. It is expected that there are one or more message serialization implementations which can be used, such as Protobuf, MessagePack and Thrift.

Background

Messages are a crucial part of ROS since they are the interface between functional components and are exposed in all userland code. A future change of this API would require a significant amount of work, so a very important goal is to make the message interface flexible enough to be future-proof.

Existing Implementations

ROS 1 messages are data objects which use member-based access to the message fields. While the message specification is not very feature-rich, the serializer is pretty fast. The ROS distribution contains message serializers implemented in C++, Python and Lisp. In addition, the community has provided implementations for other languages, such as C, Ruby, MATLAB, etc.

Other existing serialization libraries provide more features and are tailored for specific needs, ranging from a small memory footprint, over a compact wire format, to low performance impact. For use on a small embedded device, the constraints regarding the programming language and the available resources are very different from those on a desktop computer. Likewise, depending on the network connectivity, the importance of the size of the wire format varies. The needs might even differ between entities within a single ROS graph.

Areas for Improvement

Future-proof API

Due to the broad domains where ROS is being (and will be) used and their different requirements and constraints, we could not identify a single serialization library which matches all of them perfectly well. It is also likely that in the near future more libraries with new advantages (and disadvantages) will come up. So in order to design the message API in a future-proof way, it should not expose the serialization library used, but make the actually used serialization library an implementation detail.

Serialization should be optional

With the goal of dynamically choosing between the former node and nodelet styles of composing a system, the number of scenarios where messages are actually serialized (rather than passed by reference) is likely to decrease. Therefore it would be good if no serialization library needs to be linked when the functionality is not used at all (e.g. on a self-contained product without external connections). This approach would also encourage a clean modular design.
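A minimal C++ sketch of this idea, assuming a hypothetical `Publisher` class (not a ROS API): same-process subscribers receive a `std::shared_ptr` to one immutable message, and a serializer is only installed when a remote channel exists. In a real build that serializer would come from a separately linked library, so a self-contained product would never reference it.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical message type, for illustration only.
struct Chatter {
  std::string data;
};

// Hypothetical publisher. Local (same-process) subscribers get a shared
// pointer to the same immutable message, so nothing is serialized for them.
template <typename Msg>
class Publisher {
 public:
  using LocalCallback = std::function<void(std::shared_ptr<const Msg>)>;
  using Serializer = std::function<std::vector<std::uint8_t>(const Msg&)>;

  void add_local_subscriber(LocalCallback cb) { local_.push_back(std::move(cb)); }

  // Only called when a remote channel is set up; a self-contained product
  // never calls it and never needs a serialization library.
  void set_remote_serializer(Serializer s) { serializer_ = std::move(s); }

  // Delivers msg and returns how many serialized payloads were produced.
  int publish(const std::shared_ptr<const Msg>& msg) {
    for (auto& cb : local_) cb(msg);  // message is shared, not copied
    if (!serializer_) return 0;       // no remote channel: skip serialization
    last_payload_ = serializer_(*msg);
    return 1;
  }

  const std::vector<std::uint8_t>& last_payload() const { return last_payload_; }

 private:
  std::vector<LocalCallback> local_;
  Serializer serializer_;  // empty unless a remote channel exists
  std::vector<std::uint8_t> last_payload_;
};
```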

Use existing library

In order to reduce the future maintenance effort, existing libraries should be used rather than specifying and implementing yet another wire protocol.

Support more features

Fixed-length messages

An optional variant of a message which avoids dynamic memory allocation, e.g. for real-time systems. Since this use case implies severe constraints that are not optimal for scenarios where dynamic memory allocation is viable, it should not limit the general solution but should be provided as an alternative implementation.

Optional fields, default values

In ROS 1, messages and services require all data members and arguments to be specified. By using optional fields and default values, we can define simpler APIs so that users’ code can be more succinct and more readable. At the same time we could also provide sane values for certain APIs, such as for sensors.
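One possible C++ mapping, sketched under the assumption that default values become default member initializers and optional fields become C++17 `std::optional` members. The names `RangeReading` and `variance_or` and the chosen defaults are hypothetical and not part of any ROS IDL.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical range-sensor message. Default member initializers provide
// sane defaults, so userland code only sets the fields it cares about.
struct RangeReading {
  std::string frame_id = "base_link";  // default value
  float min_range = 0.02f;             // default value, meters
  float max_range = 4.0f;              // default value, meters
  float range = 0.0f;                  // the actual measurement
  std::optional<float> variance;       // optional field, absent by default
};

// Reads the optional field, falling back to a caller-supplied value.
// A serializer could likewise skip the field entirely when it is absent.
inline float variance_or(const RangeReading& r, float fallback) {
  return r.variance.value_or(fallback);
}
```

With this mapping, constructing a reading only requires setting `range`; everything else either has a default or is explicitly absent.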

Additional field types: dictionary

Dictionaries or maps are widely used in several programming languages (e.g. Python), thanks to being built-in data types. However, in order to support dictionaries in as many languages as we can, we have to take into consideration whether all languages provide mechanisms for supporting them. Certain semantics will have to be considered in the IDL, such as which data types can be used as keys. This feature would also imply backwards-incompatible changes to the ROS IDL.

Considerations

Member-based vs. method-based access

A message interface which uses member-based access to the message fields is a straightforward API. By definition each message object stores its data directly in its members, which implies the lowest overhead at runtime. These fields are then serialized directly into the wire format. (This does not imply that the message is a POD: depending on the field types used, it might not be mem-copyable.)

When considering existing libraries for serialization, this approach implies a performance overhead, since the message must either be copied into the object provided by the serialization library (implying an additional copy) or custom code must be developed which serializes the message fields directly into the wire format (bypassing the message class of the serialization library). Furthermore, member-based access makes it problematic to add support for optional fields.

On the other hand, a method-based interface allows implementing arbitrary storage paradigms behind the API. The methods could either simply access a private member variable directly or delegate the data storage to a separate entity, which could e.g. be the serialization library specific data object. Normally the methods can be inlined in languages like C++, so they don’t pose a significant performance hit; but depending on the storage used, the API might not expose mutable access to the fields, which can imply an overhead when modifying data in place.
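The difference between the two access styles can be sketched in C++ with two hypothetical versions of the same message (`PoseMember` and `PoseMethod` are illustrative names, not generated ROS types):

```cpp
#include <cassert>
#include <string>
#include <utility>

// Member-based access: fields are public data members, so the storage
// layout itself is the API.
struct PoseMember {
  double x = 0.0;
  double y = 0.0;
  std::string frame_id;
};

// Method-based access: the same fields behind inline accessors. Storage is
// private, so it could later be replaced (e.g. by a serialization library's
// own object) without changing any caller.
class PoseMethod {
 public:
  double x() const { return x_; }
  void set_x(double v) { x_ = v; }
  double y() const { return y_; }
  void set_y(double v) { y_ = v; }
  // A const reference avoids copying "large" fields on read.
  const std::string& frame_id() const { return frame_id_; }
  void set_frame_id(std::string v) { frame_id_ = std::move(v); }

 private:
  double x_ = 0.0;
  double y_ = 0.0;
  std::string frame_id_;
};
```

With `PoseMethod`, appending to `frame_id` cannot be done through the const reference: the caller has to copy, modify and call `set_frame_id`, which is the in-place modification overhead described above unless a mutable accessor is added.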

Each serialization library has certain pros and cons depending on the scenario. The features of a serialization library can be extrinsic (functionality exposed through the API, e.g. optional fields) or intrinsic (e.g. compactness of the wire format). Conceptually, only the intrinsic features can be exploited when a serialization library is used internally, e.g. behind a method-based message interface.

Support pluggable serialization library

The possible approaches to selecting the serialization library range from a compile-time decision to being able to dynamically select the serialization library for each communication channel. Especially when a ROS graph spans multiple devices and networks, the needs are likely to differ even within a single network. Therefore a dynamic solution is preferred.

TODO: add more benefits from the whiteboard picture "Serialization Pluggability"
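A sketch of per-channel selection at runtime, assuming a hypothetical `SerializerRegistry` that maps format names to factories. The two toy formats stand in for real serialization libraries; none of these names come from ROS.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <initializer_list>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical message, for illustration only.
struct Point {
  std::int32_t x = 0;
  std::int32_t y = 0;
};

// Abstract serializer: the message type itself never names a library.
class PointSerializer {
 public:
  virtual ~PointSerializer() = default;
  virtual Bytes serialize(const Point& p) const = 0;
};

// Toy format 1: human-readable text such as "3,4".
class TextSerializer : public PointSerializer {
 public:
  Bytes serialize(const Point& p) const override {
    const std::string s = std::to_string(p.x) + "," + std::to_string(p.y);
    return Bytes(s.begin(), s.end());
  }
};

// Toy format 2: two 32-bit integers, little-endian, 8 bytes in total.
class LittleEndianSerializer : public PointSerializer {
 public:
  Bytes serialize(const Point& p) const override {
    Bytes out;
    for (std::int32_t v : {p.x, p.y}) {
      const auto u = static_cast<std::uint32_t>(v);
      for (int i = 0; i < 4; ++i) out.push_back(static_cast<std::uint8_t>(u >> (8 * i)));
    }
    return out;
  }
};

// Maps a format name to a factory; each channel picks its format at runtime.
class SerializerRegistry {
 public:
  using Factory = std::function<std::unique_ptr<PointSerializer>()>;

  void add(const std::string& name, Factory f) { factories_[name] = std::move(f); }

  std::unique_ptr<PointSerializer> create(const std::string& name) const {
    auto it = factories_.find(name);
    if (it == factories_.end()) throw std::invalid_argument("unknown format: " + name);
    return it->second();
  }

 private:
  std::map<std::string, Factory> factories_;
};
```

Because only the registry knows the concrete classes, a library that is never registered never has to be linked.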

Possible message storage and serialization processes

Pipeline A

The message used by the userland code stores its data directly. For each communication channel, the message data is then copied into the serialization-specific message representation, from which the serialization library performs the serialization into the wire format.
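A hedged C++ sketch of pipeline A. `Image` plays the role of the userland message, and `LibImage` is a hypothetical stand-in for a class generated by a serialization library; its little-endian, length-prefixed wire format is invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Userland ROS message: stores its data directly in members.
struct Image {
  std::uint32_t width = 0;
  std::uint32_t height = 0;
  std::vector<std::uint8_t> data;
};

// Hypothetical stand-in for a class generated by a serialization library.
// It owns the wire format: width, height and data length as little-endian
// uint32 values, followed by the raw data.
class LibImage {
 public:
  void set_width(std::uint32_t w) { width_ = w; }
  void set_height(std::uint32_t h) { height_ = h; }
  void set_data(const std::uint8_t* p, std::size_t n) { data_.assign(p, p + n); }

  Bytes serialize() const {
    Bytes out;
    put_u32(out, width_);
    put_u32(out, height_);
    put_u32(out, static_cast<std::uint32_t>(data_.size()));
    out.insert(out.end(), data_.begin(), data_.end());
    return out;
  }

 private:
  static void put_u32(Bytes& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
  }

  std::uint32_t width_ = 0;
  std::uint32_t height_ = 0;
  std::vector<std::uint8_t> data_;
};

// Pipeline A: copy the ROS message into the library representation, then
// let the library produce the wire format.
inline Bytes serialize_pipeline_a(const Image& msg) {
  LibImage lib;
  lib.set_width(msg.width);
  lib.set_height(msg.height);
  lib.set_data(msg.data.data(), msg.data.size());  // the extra data copy
  return lib.serialize();
}
```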

Pipeline B

The message fields can be serialized directly into the wire format using custom code. While this avoids the extra data copy, it requires significant effort to implement the custom serialization routine.
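For comparison, a sketch of pipeline B using the same hypothetical `Image` message and the same invented wire format: the fields are written straight into the output buffer with no intermediate library object, but a routine like `serialize_pipeline_b` (a hypothetical name) would be needed once per supported format.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Userland ROS message: stores its data directly in members.
struct Image {
  std::uint32_t width = 0;
  std::uint32_t height = 0;
  std::vector<std::uint8_t> data;
};

// Appends v as four little-endian bytes.
inline void put_u32(Bytes& out, std::uint32_t v) {
  for (int i = 0; i < 4; ++i) out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

// Pipeline B: hand-written routine that writes the message fields straight
// into the (invented) wire format: width, height, data length, then data.
// No intermediate object, hence no extra copy before serialization.
inline Bytes serialize_pipeline_b(const Image& msg) {
  Bytes out;
  out.reserve(12 + msg.data.size());
  put_u32(out, msg.width);
  put_u32(out, msg.height);
  put_u32(out, static_cast<std::uint32_t>(msg.data.size()));
  out.insert(out.end(), msg.data.begin(), msg.data.end());
  return out;
}
```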

Pipeline C

The message delegates the data storage to an internally held storage backend, e.g. the serialization library specific message representation. Since the data is stored directly in the serialization library specific representation, copying the data before serializing it is no longer necessary.

This assumes that the API of the serialization library specific representation can be wrapped inside the ROS message API (see Technical Issues -> Variances in field types).
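A minimal sketch of pipeline C, assuming a hypothetical library class `LibString` whose accessors happen to match the ROS-facing API exactly, which is precisely the assumption stated above:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical stand-in for a serialization-library-generated class.
class LibString {
 public:
  const std::string& value() const { return value_; }
  void set_value(const std::string& v) { value_ = v; }
  Bytes serialize() const { return Bytes(value_.begin(), value_.end()); }

 private:
  std::string value_;
};

// Pipeline C: the ROS-facing message keeps no fields of its own; every
// accessor forwards to the library object, so serializing needs no copy.
class StringMsg {
 public:
  const std::string& data() const { return backend_.value(); }
  void set_data(const std::string& v) { backend_.set_value(v); }
  Bytes serialize() const { return backend_.serialize(); }

 private:
  LibString backend_;
};
```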

Select message storage

Under the assumption that we want to avoid implementing the serialization process from a custom message class into each supported serialization format (pipeline B), the process will either require one extra copy of the data (pipeline A), or the message must directly store its data in the specific message representation of the used serialization library (pipeline C).

For the latter approach the decision can be made transparent to the userland code. Each publisher can act as a factory for message instances and create messages for the userland code which best fit the currently used communication channels and their serialization formats. If multiple communication channels use different serialization formats, the publisher should still choose one of them as the storage format for the created message instance, to avoid at least one of the necessary storage conversions.
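The factory role of the publisher could look roughly like this sketch. `Format`, `Channel`, `Message` and `Publisher` are hypothetical types, and choosing the format used by the most channels is just one plausible policy, not something this article specifies.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Placeholder serialization formats (hypothetical names).
enum class Format { FormatX, FormatY };

// A communication channel and the serialization format it uses.
struct Channel {
  std::string name;
  Format format;
};

// Hypothetical message whose storage representation is fixed at creation.
struct Message {
  Format storage;
  std::string payload;
};

class Publisher {
 public:
  void add_channel(const Channel& c) { channels_.push_back(c); }

  // Factory method: store the message in the format used by the most
  // channels, so that as many channels as possible avoid a conversion.
  // With no channels FormatX is an arbitrary default; ties go to the
  // lower enum value.
  Message create_message() const {
    std::map<Format, int> counts;
    for (const Channel& c : channels_) ++counts[c.format];
    Format best = Format::FormatX;
    int best_count = 0;
    for (const auto& entry : counts) {
      if (entry.second > best_count) {
        best = entry.first;
        best_count = entry.second;
      }
    }
    return Message{best, std::string()};
  }

  // Number of channels that must convert the message before serializing.
  int conversions_needed(const Message& m) const {
    int n = 0;
    for (const Channel& c : channels_) {
      if (c.format != m.storage) ++n;
    }
    return n;
  }

 private:
  std::vector<Channel> channels_;
};
```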

Binary compatibility of message revisions

When message objects are used in nodelets, one problem is that two nodelets which run in the same process might have been linked against different definitions of a message. E.g. if we add optional fields to the message IDL, one nodelet might contain the version without the optional field while the other contains the extended version of the message.

The two different binary representations break the ability to exchange messages using a shared pointer.
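The problem can be made concrete with two hypothetical revisions of the same message, modeled here as two namespaces because defining both as `Foo` in one program would itself violate C++'s one-definition rule. The optional field is assumed to be inserted before `b`, together with a presence flag:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Revision 1 of a hypothetical message, as compiled into nodelet A.
namespace rev1 {
struct Foo {
  std::int32_t a;
  double b;
};
}  // namespace rev1

// Revision 2 adds an optional field c (with a presence flag), as compiled
// into nodelet B. Here the new field is assumed to sit before b.
namespace rev2 {
struct Foo {
  std::int32_t a;
  bool has_c;
  std::int32_t c;
  double b;
};
}  // namespace rev2

// The layouts disagree, so a pointer to a rev2::Foo cannot safely be handed
// to code compiled against rev1::Foo (and vice versa).
static_assert(sizeof(rev1::Foo) != sizeof(rev2::Foo), "revisions differ in size");
```

Code compiled against `rev1::Foo` that received a pointer to a `rev2::Foo` would read `b` from the wrong offset and copy too few bytes.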

This can be the case for any of the pipelines. In the case of pipeline A, where only the ROS message is part of the nodelet library (the serialization specific code is only part of the nodelet manager), both revisions must be binary compatible. In the case of pipeline C, some serialization libraries (e.g. Protobuf) are definitely not binary compatible when features like optional fields are being used (check this assumption).

Minimal code around existing libraries

Both pipeline A and pipeline C can be implemented using a small layer around external serialization libraries, to adapt them to the message API and make them pluggable into the ROS message system.

Generate POD messages for embedded / real-time use

Generate a special message class which acts as a POD, which is mem-copyable and free of any dynamic memory allocation. Although this can be done in any language, one particularly useful case is portable C99, for use in everything from microcontrollers to soft-core processors on FPGAs to hard real-time environments.

For each set of max size constraints the message class would require a “mangled” name: e.g. in C99, Foo_10_20 would represent a message Foo where the two dynamically sized fields have max sizes of 10 and 20. For each type a default max size could be provided, and each field could also have a specific custom override in the message IDL. See https://github.com/ros2/prototypes/tree/master/c_fixed_msg for a prototype illustrating the concept.
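A sketch of what a generated `Foo_10_20` could look like, written in C++ for consistency with the other examples but using only C-compatible constructs in the struct itself. The field names `ids` and `name`, the size-plus-array layout and the helper `Foo_10_20_push_id` are assumptions, not taken from the linked prototype.

```cpp
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <type_traits>

// Hypothetical generated message "Foo" whose two dynamically sized fields
// are bounded to 10 and 20 elements. The struct uses only C-compatible
// constructs, so the same definition could appear in a C99 header.
typedef struct {
  uint32_t ids_size;   // number of valid entries in ids
  int32_t ids[10];     // bounded sequence, max size 10
  uint32_t name_size;  // number of valid chars in name
  char name[20];       // bounded string, max size 20
} Foo_10_20;

// C++-only compile-time checks: no pointers or heap storage, so the whole
// message can be copied with memcpy.
static_assert(std::is_trivially_copyable<Foo_10_20>::value, "must be mem-copyable");
static_assert(std::is_standard_layout<Foo_10_20>::value, "must have a C-compatible layout");

// Appends an id; returns 0 on success or -1 if the bound would be exceeded.
static inline int Foo_10_20_push_id(Foo_10_20* msg, int32_t id) {
  if (msg->ids_size >= 10) return -1;
  msg->ids[msg->ids_size++] = id;
  return 0;
}
```

A C99 version would drop the two `static_assert` lines, which rely on C++ type traits.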

Performance Evaluation

Member/method-based access and message copy vs. serialization

Serializing messages (small as well as large ones) is at least two orders of magnitude slower than accessing message fields and copying messages in memory. Therefore performing a message copy (from one data representation to the serialization library data representation) can be considered a negligible overhead, since the serialization is the clear performance bottleneck. The choice of serialization library has a much higher impact on performance. See https://github.com/ros2/prototypes/tree/master/c_fixed_msg for benchmark results of serialization libraries. Choosing pipeline A over pipeline B or C should therefore not impose any significant performance hit.

Member-based vs. method-based access (with storage in member variables)

As the results of produce_consume_struct and produce_consume_method show, the performance difference is not measurable. Therefore a method-based interface is preferred, as it allows future customizations (e.g. changing the way the data is stored internally).

Storage in message vs. storage in templated backend (both using method-based access)

As the results of produce_consume_method and produce_consume_backend_plain show, the performance difference is again not measurable. Therefore a templated backend is preferred, as it allows customizations (e.g. adding value range validation, deferring storage, implementing thread safety, custom logging for debugging/introspection purposes) and enables dropping in custom storage backends (e.g. any message class of an existing serialization library which suits our API).
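A sketch of a templated storage backend in C++ with hypothetical names (`PlainBackend`, `RangeCheckedBackend`, `Ratio`); it is not the benchmark code referenced above. The message class forwards to whatever backend it is instantiated with, so value range validation can be added without touching userland code:

```cpp
#include <cassert>
#include <stdexcept>

// Plain storage backend: holds the value in a member.
struct PlainBackend {
  double value = 0.0;
  double get() const { return value; }
  void set(double v) { value = v; }
};

// Alternative backend with the same interface that rejects out-of-range
// values. The range [0, 1] is an arbitrary choice for illustration.
struct RangeCheckedBackend {
  double value = 0.0;
  double get() const { return value; }
  void set(double v) {
    if (v < 0.0 || v > 1.0) throw std::out_of_range("value outside [0, 1]");
    value = v;
  }
};

// Message with method-based access whose storage is a template parameter.
// The accessors are trivial forwards that an optimizing compiler would
// typically inline, so the plain backend should add no measurable cost.
template <typename Backend>
class Ratio {
 public:
  double value() const { return backend_.get(); }
  void set_value(double v) { backend_.set(v); }

 private:
  Backend backend_;
};
```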

Technical Issues

Variances in field types

Different serialization libraries specify different field types when, e.g., generating C++ code for the messages. The major problem is mapping between those types efficiently. From a performance point of view the message interface should expose const references (especially to “large” fields). But those can only be mapped to the specific API of the serialization library if the types are interchangeable, and, for example, a byte array is represented differently in C++ by the various serialization libraries:

Protobuf:
- dynamic array<T>: RepeatedField<T> (STL-like interface)
- fixed array<T>: not available
- binary/string: std::string
Thrift:
- dynamic array<T>: std::vector<T>
- fixed array<T>: not available
- binary/string: std::string
ROS:
- array<T>: std::vector<T>
- fixed array<T, N>: boost::array<T, N>
- string: std::string

Furthermore, the serialization library specific message API might not expose mutable access, which therefore could not be provided by ROS either when using pipeline C.
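The mismatch can be sketched with a hypothetical library class `LibBlob` that stores binary data in a `std::string`, as listed for Protobuf and Thrift above, wrapped by a hypothetical ROS-style adapter that promises `std::vector<std::uint8_t>`:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a library-generated class that stores binary
// data in a std::string.
class LibBlob {
 public:
  const std::string& data() const { return data_; }
  void set_data(const std::string& d) { data_ = d; }

 private:
  std::string data_;
};

// ROS-style interface promising std::vector<uint8_t> for a uint8 array.
// Because the stored type differs, the adapter cannot hand out a const
// reference: every read builds a new vector (a copy), every write converts
// back (another copy), and no mutable std::vector reference can be offered.
class BlobMsgAdapter {
 public:
  std::vector<std::uint8_t> data() const {
    const std::string& s = backend_.data();
    return std::vector<std::uint8_t>(s.begin(), s.end());  // copy on read
  }
  void set_data(const std::vector<std::uint8_t>& d) {
    backend_.set_data(std::string(d.begin(), d.end()));  // copy on write
  }

 private:
  LibBlob backend_;
};
```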

Preliminary Decision

Serialization Pipeline

Due to the mentioned issues and added complexity, pipeline C is not viable. Since the amount of time necessary for the memory copy before serialization is orders of magnitude smaller than the actual serialization, pipeline A is selected for further prototyping. This still allows us to implement the optimization described as pipeline B, e.g. for the default serialization library, if needed.

Message Interface

Under the assumption that method-based access does not significantly impact performance, it is preferred over member-based access in order to enable changing the storage backend in the future and to allow overriding it with a custom implementation.

Update

With the decision to build ROS 2 on top of DDS, pipeline B will be used. The previous decision to switch from member-based to method-based access has been revisited: since we no longer need to make the storage backend exchangeable, we might prefer keeping member-based access to messages to stay similar to ROS 1.

Source: https://design.ros2.org/articles/serialization.html