Interoperable Internet-Enabled DDS Applications

Middleware News Brief (MNB) features news and technical information about Open Source middleware technologies.

Introduction

DDS (Data Distribution Service) is an OMG standard and a potentially excellent technology for the (Industrial) Internet of Things (IIoT), since it allows developers to create robust, secure, and interoperable distributed applications by providing publish/subscribe semantics to a shared cache.

Interoperability for DDS applications is achieved by using the Real-Time Publish Subscribe (RTPS) protocol. However, the discovery protocols in RTPS assume that the underlying network supports multicast and that datagrams are not subject to network address translation (NAT). NAT is a technique employed by many firewalls to allow a number of hosts to share the same public IP address: the firewall translates packets sent from an internal IP address and port combination to one of its external IP addresses and ports, and vice versa. Both of these assumptions are violated in a typical IIoT application. Thus, in order for DDS and RTPS to become viable technologies for IIoT, we must expand the reach of RTPS to include networks where multicast is not supported and packets are subject to NAT.

This article presents a solution to these problems by introducing a service that relays RTPS messages.

To illustrate the aforementioned challenges, suppose you are a manufacturer of industrial equipment. You have already selected DDS as a technology, perhaps due to its QoS policies, and you have committed to using RTPS with security to ensure interoperability with other vendors and equipment.

In terms of IIoT, you would like to monitor the equipment for predictive maintenance and R&D, provide a dashboard to customers, etc. Furthermore, suppose that you would like the cloud-hosted or datacenter-hosted part of the solution to be based on DDS/RTPS for similar reasons (QoS, interoperability, and security) instead of simpler and less efficient protocols like HTTP and MQTT.

When your equipment deployed in your customer's network attempts to discover services in your cloud or datacenter, it will certainly run into problems stemming from a lack of multicast on the public Internet. It may also run into problems with your customer's firewall performing NAT.

Before describing the problems in detail, some terminology is needed.

In DDS, a writer writes samples on its respective topic, and a reader reads samples from its respective topic.

A publisher is a collection of writers, and a subscriber is a collection of readers.

A participant is a collection of publishers and subscribers.

RTPS defines two protocols for discovery: the Simple Participant Discovery Protocol (SPDP) and Simple Endpoint Discovery Protocol (SEDP). An endpoint in RTPS terminology corresponds to a DDS reader or writer.

SPDP relies on multicast, which is not supported on the public Internet. We acknowledge that various DDS implementations, including OpenDDS, can be configured with the SPDP unicast addresses of the other participants. However, we feel this is not a viable solution, since a DDS deployment that spans the public Internet will most likely not have control over the IP address assignments of the participants. Thus, the first barrier to overcome is the exchange of SPDP messages.

SPDP and SEDP messages contain locators (IP address and port pairs), which encode the addresses of endpoints. RTPS uses these locators to send messages to particular endpoints. If the locators refer to IP addresses that are behind a firewall that performs NAT, the locators are useless, as they are not valid on the public side of the firewall. The second barrier, then, is to provide a way for DDS applications to tunnel through firewalls that perform NAT.

For more detail on these challenges, see our previous article, in which we describe how to run interoperable DDS applications in cloud environments that don't support multicast.

In the tutorial below, we use OpenDDS as the implementation of DDS, but other implementations of DDS could be extended to take advantage of the proposed solution.

The RtpsRelay

 +-------------+     +-------------+
 |  RtpsRelay  |<--->|  RtpsRelay  |
 +-------------+     +-------------+
        ^                   ^
        |                   |
        v                   v
 +-------------+     +-------------+
 |   Firewall  |     |   Firewall  |
 +-------------+     +-------------+
        ^                   ^
        |                   |
        v                   v
 +-------------+     +-------------+
 | Participant |     | Participant |
 +-------------+     +-------------+

The RtpsRelay is a distributed, scalable, fault-tolerant service that forwards RTPS messages among virtual multicast groups that are defined by the application. The relay depends on unicast, which is supported on the public Internet and works as expected when subject to NAT.

When an application sends an RTPS message to the relay, the relay sends it to all of the other participants that are in the same group. It does this by maintaining a map of participants and their addresses and a separate map of participants and their groups. These maps are built from the source address of each incoming message, the GUID in the header of each RTPS message, and a property in SPDP messages that declares the groups.
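The bookkeeping described above can be sketched in a few lines of C++. This is an illustration only and not the RtpsRelay source: RelayTables, learn, and targets are hypothetical names, and participants, addresses, and groups are represented as plain strings where a real relay would use RTPS GUIDs and socket addresses.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins: a real relay would use RTPS GUIDs and socket
// addresses instead of strings.
typedef std::string ParticipantId;
typedef std::string Address;
typedef std::string Group;

struct RelayTables {
  std::map<ParticipantId, Address> addresses;        // participant -> return address
  std::map<ParticipantId, std::set<Group> > groups;  // participant -> declared groups

  // Record what one SPDP message reveals about its sender.
  void learn(const ParticipantId& who, const Address& from,
             const std::set<Group>& declared) {
    assert(!from.empty());  // the relay needs a return address to deliver to
    addresses[who] = from;
    groups[who] = declared;
  }

  // Addresses of all other participants that share at least one group
  // with the sender.
  std::vector<Address> targets(const ParticipantId& sender) const {
    std::vector<Address> out;
    const auto mine = groups.find(sender);
    if (mine == groups.end()) return out;  // unknown sender: nothing to forward
    for (const auto& entry : groups) {
      if (entry.first == sender) continue;  // never echo back to the sender
      bool shared = false;
      for (const auto& g : mine->second) {
        if (entry.second.count(g) != 0) { shared = true; break; }
      }
      if (shared) out.push_back(addresses.at(entry.first));
    }
    return out;
  }
};
```

With participants A and B both declaring the Messenger group, targets("A") would return only the address of B. A participant that shares no group with the sender is never a target.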

Groups are a concept of the RtpsRelay, not of DDS or RTPS.

To use a relay, a participant must:

  1. Send all relevant messages to the relay
  2. Include its set of groups in its SPDP messages
  3. Maintain NAT bindings

A participant must send SPDP messages to the relay on a periodic basis. This has the effect of informing the relay of the groups for that participant and keeping alive the NAT bindings so the relay can deliver SPDP messages to that participant. A participant may choose not to send SEDP messages and ordinary RTPS messages if it knows that none of the recipients are using the relay.

The NAT binding problem also applies to endpoints that do not send messages periodically, such as a best-effort reader. For these cases, the participant should send an empty RTPS message periodically to maintain NAT bindings and inform the relay of the return address of the endpoint. The relay silently discards the empty RTPS messages.
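To make the keep-alive concrete, the following sketch builds an RTPS message that consists only of the 20-byte RTPS header (the "RTPS" protocol identifier, a 2-byte protocol version, a 2-byte vendor ID, and a 12-byte GUID prefix) with no submessages. We assume here that this is what "empty" means to the relay. The function name make_empty_rtps_message is hypothetical, and the version and vendor ID are left as parameters rather than hard-coded.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// 12-byte prefix that identifies the sending participant.
typedef std::array<std::uint8_t, 12> GuidPrefix;

// Build an RTPS message that contains the header and nothing else.
// Hypothetical helper: the caller supplies the version and vendor ID.
std::vector<std::uint8_t> make_empty_rtps_message(const GuidPrefix& prefix,
                                                  std::uint8_t version_major,
                                                  std::uint8_t version_minor,
                                                  std::uint8_t vendor_hi,
                                                  std::uint8_t vendor_lo) {
  std::vector<std::uint8_t> msg;
  msg.reserve(20);
  // Protocol identifier.
  msg.push_back('R');
  msg.push_back('T');
  msg.push_back('P');
  msg.push_back('S');
  // Protocol version and vendor ID.
  msg.push_back(version_major);
  msg.push_back(version_minor);
  msg.push_back(vendor_hi);
  msg.push_back(vendor_lo);
  // GUID prefix, from which a relay can identify the sender.
  msg.insert(msg.end(), prefix.begin(), prefix.end());
  assert(msg.size() == 20);  // header only, no submessages follow
  return msg;
}
```

An application would send the resulting bytes to the relay from the endpoint's own socket on a timer, so that the NAT binding for that socket stays alive.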

The RtpsRelay does not participate in discovery (although we have considered this for future work). That is, the relay does not consider the associations between readers and writers when deciding if an RTPS message should be relayed.

The size and activity of a group, then, determine the outgoing message rate of the relay, which in turn dictates its efficiency.

Applications that make use of the Partition QoS will often use the partitions as groups. The groups are specified using a comma-separated list of strings passed via the "OpenDDS.RtpsRelay.Groups" property.
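As a rough illustration of the comma-separated format, the helpers below convert between a set of group names and a property value. The names join_groups and parse_groups are hypothetical, and the handling of edge cases (empty entries are skipped, whitespace is not trimmed) is our assumption, not documented relay behavior.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Build a property value such as "Alerts,Messenger" from a set of groups.
std::string join_groups(const std::set<std::string>& groups) {
  std::string value;
  for (const auto& g : groups) {
    // A comma inside a name would be read back as two groups.
    assert(g.find(',') == std::string::npos);
    if (!value.empty()) value += ',';
    value += g;
  }
  return value;
}

// Split a property value back into individual group names.
std::set<std::string> parse_groups(const std::string& value) {
  std::set<std::string> groups;
  std::istringstream in(value);
  std::string item;
  while (std::getline(in, item, ',')) {
    if (!item.empty()) groups.insert(item);  // assumption: skip empty entries
  }
  return groups;
}
```

An application that already uses partitions could pass its partition names to join_groups and use the result as the value of the property.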

The RtpsRelay can be used with or without security, since it does not participate in discovery. Obviously, an application running over the public Internet should use security; however, not using security may be useful for internal deployments and debugging.

Even when using security, SPDP messages are sent in the clear to bootstrap discovery and verified through a secure exchange. This represents a potential security threat, since a man-in-the-middle could modify the groups for a participant or learn sensitive information encoded in the groups. While we have not addressed this problem in the current implementation of the relay, we have plans to support DTLS to secure the SPDP messages.

The RtpsRelay itself is a distributed, scalable, fault-tolerant DDS application. Each RtpsRelay process has a set of ports, called the vertical ports, for exchanging application RTPS messages with the participants.

The relays gossip about the participants' groups and addresses using DDS topics. Each RtpsRelay process also has a set of ports, called the horizontal ports, for exchanging application RTPS messages with other relays.

As implied above, each participant should use a single relay server due to the way NAT bindings work. Most firewalls will only forward packets received from the destination address that was originally used to create the NAT binding. That is, if participant A is interacting with relay A, and participant B is interacting with relay B, then a message from A to B must go from A to relay A, to relay B, and finally to B. Relay A cannot send directly to B because that packet will not be accepted by the firewall.

In the best case, two participants share the same relay, resulting in two hops per message. In the worst (common) case, two participants use different relays, resulting in three hops per message.
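The two cases can be expressed as a small path computation. This is a sketch with hypothetical names (route, hops); nodes are identified by plain strings purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Return the nodes a message visits, starting with the sender and ending
// with the receiver. Each participant talks only to its own relay.
std::vector<std::string> route(const std::string& sender,
                               const std::string& sender_relay,
                               const std::string& receiver,
                               const std::string& receiver_relay) {
  assert(sender != receiver);
  std::vector<std::string> path;
  path.push_back(sender);
  path.push_back(sender_relay);
  // A second relay is visited only when the participants use different
  // relays; the sender's relay cannot reach the receiver directly.
  if (receiver_relay != sender_relay) path.push_back(receiver_relay);
  path.push_back(receiver);
  return path;
}

// Number of hops is one less than the number of nodes visited.
std::size_t hops(const std::vector<std::string>& path) {
  return path.empty() ? 0 : path.size() - 1;
}
```

For participants A and B on the same relay, the path is A, relay, B (two hops). When A uses relay A and B uses relay B, the path is A, relay A, relay B, B (three hops).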

Tutorial

In this section, we'll use the relay to exchange RTPS messages when direct connectivity cannot be established. We will construct the following system with Docker containers:

 +-------------------------------+
 |             relay             |
 +-------------------------------+
       ^                  ^
       |                  |
       v                  v
 +------------+     +------------+
 |   pubnet   |     |   subnet   |
 +------------+     +------------+
       ^                  ^
       |                  |
       v                  v
 +------------+     +------------+
 | publisher  |     | subscriber |
 +------------+     +------------+

The relay container runs an RtpsRelay, the publisher container runs a publisher, and the subscriber container runs a subscriber.

pubnet and subnet are two networks that will isolate the publisher and subscriber.

Docker will not perform network address translation in this scenario. The relay will have a network interface on both pubnet and subnet.

  1. Open a new terminal for the relay and execute
        # Create a network for the publisher
        docker network create pubnet
        # Create a network for the subscriber
        docker network create subnet
        # Create a container for the relay, connect it to both networks, and start it
        docker create --name=relay --rm -ti --net=pubnet -w /opt/OpenDDS/tools/rtpsrelay objectcomputing/opendds:relay ./RtpsRelay -DCPSConfigFile rtps.ini
        docker network connect subnet relay
        docker start relay
  2. Open a new terminal for the publisher and execute
        docker run --name=publisher --rm -ti --net=pubnet -w /opt/OpenDDS/tests/DCPS/Messenger -e LD_LIBRARY_PATH=. objectcomputing/opendds:relay publisher -DCPSConfigFile relay_rtps.ini
  3. Open a new terminal for the subscriber and execute
        docker run --name=subscriber --rm -ti --net=subnet -w /opt/OpenDDS/tests/DCPS/Messenger -e LD_LIBRARY_PATH=. objectcomputing/opendds:relay subscriber -DCPSConfigFile relay_rtps.ini
  4. Wait for the publisher and subscriber to discover each other (up to a minute)
  5. Verify the output of the publisher
        Starting publisher
        Starting publisher with 1 args
        Reliable DataWriter
        Creating Writer
        Starting Writer
        Writer finished
        Writer wait small time
        deleting DW
        deleting contained entities
        deleting participant
        shutdown
  6. Verify the output of the subscriber
        Transport is RELIABLE
        Reliable DataReader
        /opt/OpenDDS/tests/DCPS/Messenger/DataReaderListener.cpp:146: INFO: on_subscription_matched()
        /opt/OpenDDS/tests/DCPS/Messenger/DataReaderListener.cpp:139: INFO: on_liveliness_changed()
        SampleInfo.sample_rank = 0
        SampleInfo.instance_state = 1
        Message: subject    = Review
        subject_id = 99
        from       = Comic Book Guy
        count      = 0
        text       = Worst. Movie. Ever.
        ...
        SampleInfo.sample_rank = 0
        SampleInfo.instance_state = 1
        Message: subject    = Review
        subject_id = 99
        from       = Comic Book Guy
        count      = 39
        text       = Worst. Movie. Ever.
        SampleInfo.sample_rank = 0
        SampleInfo.instance_state = 2
        /opt/OpenDDS/tests/DCPS/Messenger/DataReaderListener.cpp:94: INFO: instance is disposed
        /opt/OpenDDS/tests/DCPS/Messenger/DataReaderListener.cpp:139: INFO: on_liveliness_changed()
        /opt/OpenDDS/tests/DCPS/Messenger/DataReaderListener.cpp:146: INFO: on_subscription_matched()
  7. Clean up
        docker kill relay
        docker network remove pubnet
        docker network remove subnet

Discussion

The relay is started with the name relay, so that we can refer to it in the configuration of the publisher and subscriber. (Naming a container with --name=NAME causes a DNS record to be generated for the container.)

By default, the relay listens on 0.0.0.0:4444 for SPDP messages, 0.0.0.0:4445 for SEDP messages, and 0.0.0.0:4446 for data messages.

The relay_rtps.ini file used by the publisher and subscriber contains the configuration necessary to use the relay:

    [common]
    DCPSGlobalTransportConfig=$file
 
    [domain/4]
    DiscoveryConfig=rtps
 
    [rtps_discovery/rtps]
    SpdpRtpsRelayAddress=relay:4444
    SedpMulticast=0
    SedpRtpsRelayAddress=relay:4445
 
    [transport/the_rtps_transport]
    transport_type=rtps_udp
    use_multicast=0
    DataRtpsRelayAddress=relay:4446

The SedpMulticast=0 and use_multicast=0 lines are optional and can be omitted.

The relay distributes messages among the publisher and subscriber because they are both in the Messenger group.

The following code snippet shows how the groups are declared to OpenDDS:

    DDS::DomainParticipantFactory_var dpf = TheParticipantFactoryWithArgs(argc, argv);
    DDS::DomainParticipantQos part_qos;
    dpf->get_default_participant_qos(part_qos);
    DDS::PropertySeq& props = part_qos.property.value;
 
    const DDS::Property_t prop = { "OpenDDS.RtpsRelay.Groups", "Messenger", true /* propagate */ };
    const unsigned int len = props.length();
    props.length(len + 1);
    props[len] = prop;
 
    DDS::DomainParticipant_var participant =
      dpf->create_participant(DOMAIN,
                              part_qos,
                              0,
                              0);

A property with propagate set to true is included in SPDP messages, which is necessary for communicating the "OpenDDS.RtpsRelay.Groups" property to the relay.

Deployment Considerations

The set of vertical (application-facing) ports can be controlled with the -VerticalAddress option.

The set of horizontal (relay-facing) ports can be controlled with the -HorizontalAddress option.

The ports are separate because horizontal messages are processed differently than vertical messages. This separation also provides added deployment flexibility and allows dual network interface configurations where one interface services the vertical while the other services the horizontal.

A dual network interface configuration may be useful in pursuing different security and/or performance objectives.

The RtpsRelay is a DDS application whose domain can be configured with the -Domain option. As such, it is possible to write other DDS applications that use the group and routing information maintained by the relays. This information may be useful for monitoring and control. For example, one could use it to determine if a particular group is too large, empty, etc.

The RtpsRelay purges inactive participants. This behavior is controlled by the -RenewAfter option and the -Lifespan option.

The -Lifespan option sets the lifespan, in seconds, of the group and routing information that is shared among the relays; it defaults to 300 seconds (5 minutes). The group and routing information must be renewed within this amount of time; otherwise, the participant is purged.

Instead of renewing the group and routing information for every received message, which would be excessively chatty, the relay waits a certain amount of time between renewals. This duration is specified by the -RenewAfter option, which defaults to 1 minute.

The -Lifespan option should always be larger than the -RenewAfter option.
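The interplay between the two options can be sketched as a pair of time comparisons. This is a simplified model under our own assumptions, not the relay's implementation: RenewalPolicy, should_renew, and expired are hypothetical names, and the real relay shares this information through DDS rather than checking timestamps by hand.

```cpp
#include <cassert>
#include <chrono>

typedef std::chrono::steady_clock Clock;
typedef std::chrono::seconds Seconds;

struct RenewalPolicy {
  Seconds renew_after;  // models the -RenewAfter option
  Seconds lifespan;     // models the -Lifespan option

  RenewalPolicy(Seconds renew, Seconds life)
    : renew_after(renew), lifespan(life) {
    // Otherwise information could expire before it is ever renewed.
    assert(renew_after < lifespan);
  }

  // True when enough time has passed since the last renewal that a newly
  // received message should trigger another one.
  bool should_renew(Clock::time_point last_renewal, Clock::time_point now) const {
    return now - last_renewal >= renew_after;
  }

  // True when the information was not renewed in time and the participant
  // should be purged.
  bool expired(Clock::time_point last_renewal, Clock::time_point now) const {
    return now - last_renewal >= lifespan;
  }
};
```

With the defaults described above (60 and 300 seconds), a message received 30 seconds after the last renewal does not trigger a new one, while a participant that has been silent for 300 seconds is purged.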

Running a relay cluster with RTPS in the cloud leads to a bootstrapping issue because multicast is not supported in the cloud. Possible solutions for this include:

  • Use a discovery mechanism that does not rely on multicast
  • Run a single well-known relay that allows the other relays to discover each other
  • Use the multicast repeater to form a virtual multicast group among all the relays

RTPS uses UDP, which typically cannot be load balanced effectively due to the way NAT bindings work. Consequently, each RtpsRelay server must have a public IP address.

Load balancing can be achieved by having the participants choose a relay according to a load balancing policy.

To illustrate, each relay could also run an HTTP server that does nothing but serve the public IP address of the relay. These simple web servers would be exposed via a centralized load balancer. A participant, then, could access the HTTP load balancer to select a relay.
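One possible policy for such a load balancer is simple round-robin over the known relays. The class below is a hypothetical sketch of that policy only; it does not include the HTTP layer, and RoundRobinRelayPicker is not part of OpenDDS.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hands out relay addresses in rotation so that participants are spread
// evenly across the relays.
class RoundRobinRelayPicker {
 public:
  explicit RoundRobinRelayPicker(std::vector<std::string> relay_addresses)
    : relays_(std::move(relay_addresses)), next_(0) {
    assert(!relays_.empty());  // there must be at least one relay to pick
  }

  // Return the next relay address and advance the rotation.
  const std::string& pick() {
    const std::string& chosen = relays_[next_];
    next_ = (next_ + 1) % relays_.size();
    return chosen;
  }

 private:
  std::vector<std::string> relays_;
  std::size_t next_;
};
```

A participant would call the balancer once at startup and then keep using the relay it was given, since switching relays would invalidate its NAT bindings.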

Conclusion

The lack of multicast on the public Internet and the possibility of NAT preclude the development of interoperable DDS applications that are distributed over the public Internet and limit the applicability of DDS in IIoT. The solution described in this article introduces an external service called the RtpsRelay that repeats RTPS messages among a group of application-defined OpenDDS participants. The RtpsRelay is itself a distributed and scalable DDS application.