In IoT applications, the data transferred between devices and the cloud is a major cost factor. Data transfer is especially expensive when the devices are connected over a mobile network. Moreover, with a large number of devices, even the traffic on the cloud side can have an impact on costs. It is therefore crucial to think about the message payload size.

The first step is to find a suitable and compact communication protocol. MQTT does a good job here, as its overhead is very small compared to HTTP, for example. However, the payload itself is not defined by the communication protocol. In this article we look at payload sizes only and neglect the protocol overhead. There are two common options for IoT message payloads: JSON and a low-level byte protocol. We do not look into XML or other even more verbose formats. While JSON is a standardized and fairly compact format, a low-level byte protocol is usually designed for the specific use case and is therefore the most efficient option, which on the other hand has some drawbacks. In this article we compare message sizes for an example using JSON and a specific byte protocol, each with and without compression.

Light example

We will use a light as an example IoT device. The light has to transfer its serial number, its temperature, its power, its dim level and its current on/off state. So we have the following 5 values:

  • Serial number: String with 12 characters (e.g. LXA34-691E90)
  • Temperature: float (e.g. 23.5 °C)
  • Power: float (e.g. 20W)
  • Dim level: unsigned short (e.g. 50%)
  • On: boolean (e.g. true)

The JSON representation could look like this:

{
   "serialNumber": "LXA34-691E90",
   "temperature": 23.5,
   "power": 50.5,
   "dimLevel": 75,
   "on": true
}

When removing all spaces and line breaks this leads to an overall message size of 87 bytes.
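The figure can be checked with a few lines of Python. This is a minimal sketch and not the original generator code: the variable names are our own, and it assumes the compact form is produced simply by dropping the whitespace around the separators.

```python
import json

# Example light from the article; dict order is preserved on serialization.
light = {
    "serialNumber": "LXA34-691E90",
    "temperature": 23.5,
    "power": 50.5,
    "dimLevel": 75,
    "on": True,
}

# separators=(",", ":") removes the spaces json.dumps inserts by default.
compact_json = json.dumps(light, separators=(",", ":"))
json_size = len(compact_json.encode("utf-8"))

print(compact_json)
print(json_size)  # 87
```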

In contrast to the JSON representation, we can also design a specific byte protocol. We generally recommend using Google Protocol Buffers, which generates a parser and serializer from a specification. For simplicity, in this blog post we manually define our own, smallest possible protocol. The first 12 bytes are reserved for the characters of the serial number, followed by 4 bytes for the temperature float, another 4 bytes for the power float, 2 bytes for the dim level unsigned short and 1 byte for the boolean, where true is encoded as 1 and false as 0.

The hex-encoded bytes for the same values as in the JSON are given below:

4C 58 41 33 34 2D 36 39 31 45 39 30 41 BC 00 00 42 4A 00 00 00 4B 01

In the binary version the message is only 23 bytes long. This means the byte payload is approx. 4 times smaller than the JSON payload.
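The layout described above maps directly onto Python's struct module. The following is a sketch under assumptions rather than the original implementation: the function names are hypothetical, and big-endian byte order is assumed because the hex dump encodes the dim level 75 as 00 4B.

```python
import struct

# Assumed layout, big-endian (network byte order), no padding:
# 12s = 12-byte serial, f = 32-bit float (twice), H = unsigned short, ? = bool
LIGHT_FORMAT = ">12sffH?"

def encode_light(serial, temperature, power, dim_level, on):
    """Pack one light into the fixed 23-byte layout."""
    return struct.pack(LIGHT_FORMAT, serial.encode("ascii"),
                       temperature, power, dim_level, on)

def decode_light(payload):
    """Inverse of encode_light; returns a tuple of the five values."""
    serial, temperature, power, dim_level, on = struct.unpack(LIGHT_FORMAT, payload)
    return serial.decode("ascii"), temperature, power, dim_level, on

payload = encode_light("LXA34-691E90", 23.5, 50.5, 75, True)
print(len(payload))              # 23
print(payload.hex(" ").upper())  # space-separated hex dump (Python 3.8+)
print(decode_light(payload))
```

With 87 bytes for the JSON and 23 bytes here, the ratio is 87 / 23 ≈ 3.8, which is where the "approx. 4 times" comes from.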

Of course this mainly results from the JSON keys like "temperature", "serialNumber" etc. The big advantage of the JSON payload is that it is human readable and can be extended with new properties without threatening backward compatibility. A compromise could be to use shorter keys, for example abbreviations:

{
   "sn": "LXA34-691E90",
   "temp": 23.5,
   "w": 50.5,
   "dim": 75,
   "on": true
}

When again removing all spaces and line breaks we get a size of 61 bytes, which is a reduction of approx. 30%. But it is still about 2.7 times bigger than the plain byte payload, and it is no longer as self-explanatory as the first JSON. For further calculations we use the original and more verbose variant.
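The effect of the shorter keys can be reproduced as follows. This is only an illustrative sketch: the SHORT_KEYS mapping and the compact_size helper are hypothetical names introduced here, not part of any library.

```python
import json

# Hypothetical mapping from the verbose keys to the abbreviations above.
SHORT_KEYS = {
    "serialNumber": "sn",
    "temperature": "temp",
    "power": "w",
    "dimLevel": "dim",
    "on": "on",
}

def compact_size(obj):
    """Size in bytes of obj serialized as JSON without whitespace."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))

verbose = {"serialNumber": "LXA34-691E90", "temperature": 23.5,
           "power": 50.5, "dimLevel": 75, "on": True}
short = {SHORT_KEYS[key]: value for key, value in verbose.items()}

verbose_size = compact_size(verbose)  # 87
short_size = compact_size(short)      # 61
reduction = 1 - short_size / verbose_size
print(verbose_size, short_size, f"{reduction:.0%}")  # 87 61 30%
```

The price of this approach is that sender and receiver both have to know the mapping, which is part of what makes the short form less self-explanatory.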

Remark: Numbers are serialized as characters in JSON. That means the value 23.5 takes just 4 bytes, the same as the float byte representation. But a 32-bit float offers more precision than that, so the same 4 bytes can also encode 23.54356, which would consume an additional 4 bytes in JSON. On the other hand, a "0" value is just 1 byte in JSON, whereas it is still 4 bytes in the byte format. So the resulting payload size also depends on the concrete data and the precision that is required.
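The three cases from the remark can be compared directly. The helper names below are our own, and the sketch assumes Python's default number formatting in json.dumps, which other JSON libraries may not share.

```python
import json
import struct

def json_number_size(value):
    """Number of bytes the value occupies as JSON text."""
    return len(json.dumps(value))

def float32_size(value):
    """Number of bytes the value occupies as a big-endian 32-bit float."""
    return len(struct.pack(">f", value))

for value in (23.5, 23.54356, 0):
    print(value, json_number_size(value), float32_size(value))
# 23.5      -> 4 bytes as JSON, 4 bytes as float
# 23.54356  -> 8 bytes as JSON, 4 bytes as float
# 0         -> 1 byte as JSON,  4 bytes as float
```

Keep in mind that a 32-bit float only holds roughly 7 significant decimal digits, so longer values are rounded in the byte format while JSON keeps every digit that is written out.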

Compression

Compression can usually reduce the size of the payload without any loss of information. It basically works by searching for repetitions in the data and replacing them with a shorter reference. There are many different compression algorithms, all with their own strengths and weaknesses. In this article we use gzip, as it is widely used and gives good results for different types of data.

If we look at our light example data, we can see that there is almost no repetition in the whole message, neither in the keys nor in the values. Because of that, the compression effectively cannot replace anything and only adds overhead. Gzipping the payload results in 101 bytes (14 bytes more) for the JSON payload and 43 bytes (20 bytes more) for the byte payload.
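A quick way to observe this overhead is to gzip both single-light payloads. This is a sketch using Python's standard gzip module with its default settings; the exact compressed sizes can differ by a few bytes between gzip implementations and compression levels, so none are hard-coded here.

```python
import gzip
import json
import struct

light = {"serialNumber": "LXA34-691E90", "temperature": 23.5,
         "power": 50.5, "dimLevel": 75, "on": True}

json_payload = json.dumps(light, separators=(",", ":")).encode("utf-8")
# Assumed byte layout: big-endian, 12-byte serial, two floats, ushort, bool.
byte_payload = struct.pack(">12sffH?", b"LXA34-691E90", 23.5, 50.5, 75, True)

# Maps payload name -> (raw size, gzipped size).
sizes = {}
for name, raw in (("json", json_payload), ("bytes", byte_payload)):
    zipped = gzip.compress(raw)
    sizes[name] = (len(raw), len(zipped))
    # A gzip stream carries a 10-byte header and an 8-byte trailer in
    # addition to the deflate data, which dominates for inputs this small.
    print(name, len(raw), len(zipped), len(zipped) - len(raw))
```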

So one major conclusion already is that compression does not bring any benefit when a single message carries many different values for one device. Compression also adds overhead for producing and processing the payload. While this might be negligible on the cloud side, the device is usually more constrained, and compressing a big payload consumes a lot of CPU and memory. So you might have to discuss whether it is worth spending a few bucks more on the hardware in order to reduce traffic costs.

Multiple lights in one message

A quite typical use case is sending values for multiple lights within one message. This results from the fact that usually not every device in an IoT application is directly connected to the Internet. Instead, a gateway is used, which connects the devices over a more low-level protocol like ZigBee or KNX. The gateway connects to the Internet and to the IoT cloud and transfers the data for all lights. That's how Philips Hue works, for example. The gateway might collect the data for all lights and send one message which contains all values.

In JSON we can create an array of the previously created JSON objects, which looks like this:

[
   {
      "serialNumber": "LXA34-691E90",
      "temperature": 23.67,
      "power": 40.32,
      "dimLevel": 44,
      "on": true
   },
   {
      "serialNumber": "LXA34-691E91",
      "temperature": 24.59,
      "power": 10.12,
      "dimLevel": 3,
      "on": true
   },
   // [...]
]

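If we assume one message contains 10 lights, this results in a message size of approx. 902 bytes for random values: roughly 10 * 87 bytes, plus 9 commas and the two square brackets, plus a few extra bytes because the randomized values have more decimal places than in the single-light example.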
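The composition of the array size can be illustrated with randomly generated lights. Everything about the generator is an assumption made for this sketch (value ranges, two decimal places, six random hexadecimal characters in the serial number), so the total will be close to, but not exactly, the figure quoted above.

```python
import json
import random

def random_light(rng):
    """One light with randomized values; all value ranges are assumptions."""
    return {
        "serialNumber": "LXA34-" + "".join(rng.choices("0123456789ABCDEF", k=6)),
        "temperature": round(rng.uniform(15.0, 35.0), 2),
        "power": round(rng.uniform(0.0, 60.0), 2),
        "dimLevel": rng.randint(0, 100),
        "on": rng.choice([True, False]),
    }

rng = random.Random(42)  # fixed seed so the run is repeatable
lights = [random_light(rng) for _ in range(10)]

array_json = json.dumps(lights, separators=(",", ":"))
array_size = len(array_json.encode("utf-8"))

# The array adds 9 commas between the objects plus "[" and "]": 11 bytes.
object_sizes = [len(json.dumps(light, separators=(",", ":"))) for light in lights]
print(array_size, sum(object_sizes) + 11)
```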

For the byte payload we need to add one unsigned short (2 bytes) up front, which tells the parser how many lights follow. Apart from that the payload is exactly the same, which results in 23 bytes * 10 + 2 bytes = 232 bytes. So the ratio stays the same: the byte payload is about a quarter (26%) of the size of the JSON.
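A length-prefixed list of fixed-size records could be encoded and parsed as sketched below. The function names are hypothetical and big-endian byte order is again an assumption; the point is only that the parser reads the count first and then steps through the payload 23 bytes at a time.

```python
import struct

LIGHT_FORMAT = ">12sffH?"  # 23 bytes per light, big-endian, no padding
COUNT_FORMAT = ">H"        # 2-byte unsigned short: number of lights

def encode_lights(lights):
    """lights is a list of (serial, temperature, power, dim_level, on) tuples."""
    parts = [struct.pack(COUNT_FORMAT, len(lights))]
    for serial, temperature, power, dim_level, on in lights:
        parts.append(struct.pack(LIGHT_FORMAT, serial.encode("ascii"),
                                 temperature, power, dim_level, on))
    return b"".join(parts)

def decode_lights(payload):
    """Read the count, then that many fixed-size records."""
    (count,) = struct.unpack_from(COUNT_FORMAT, payload, 0)
    offset = struct.calcsize(COUNT_FORMAT)
    record_size = struct.calcsize(LIGHT_FORMAT)
    lights = []
    for _ in range(count):
        serial, temperature, power, dim_level, on = struct.unpack_from(
            LIGHT_FORMAT, payload, offset)
        lights.append((serial.decode("ascii"), temperature, power, dim_level, on))
        offset += record_size
    return lights

ten_lights = [(f"LXA34-691E{90 + i}", 23.5, 50.5, 75, True) for i in range(10)]
payload = encode_lights(ten_lights)
print(len(payload))  # 232 = 2 + 10 * 23
```

Because every record has the same size, the payload length is always 2 + 23 * n bytes, independent of the values being sent.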

Compression of multiple lights

But if we now apply compression we get a totally different result. The JSON payload contains a lot of repetition due to the keys, which are now repeated 10 times. Moreover the way we build up the serial number also gives some compression potential, as half of the serial number specifies the product in this example (LXA34-) and just the rest is randomized. So when gzipping both messages we get the following result:

  • JSON payload with 10 lights gzipped: 305 bytes (34% of the original size)
  • Bytes payload with 10 lights gzipped: 190 bytes (82% of the original size)

Moreover, the gzipped JSON is just 73 bytes (31%) bigger than the uncompressed byte payload and 115 bytes (61%) bigger than the gzipped byte payload. Within the byte payload only half of the serial number can be replaced, because all other values are randomized. That is why the compression of the byte payload is not as good as for the JSON payload.
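The difference in compressibility can be reproduced with a small experiment. This is a sketch under assumptions: the random generator, the value ranges and the byte layout are our own choices, so the absolute numbers will deviate from the ones listed above, while the overall pattern (JSON compresses far better than the byte payload) should remain visible.

```python
import gzip
import json
import random
import struct

KEYS = ("serialNumber", "temperature", "power", "dimLevel", "on")

def random_light(rng):
    """Assumed value ranges; only the 'LXA34-' prefix is shared between lights."""
    serial = "LXA34-" + "".join(rng.choices("0123456789ABCDEF", k=6))
    return (serial, round(rng.uniform(15.0, 35.0), 2),
            round(rng.uniform(0.0, 60.0), 2), rng.randint(0, 100),
            rng.choice([True, False]))

rng = random.Random(7)  # fixed seed so repeated runs give the same payloads
lights = [random_light(rng) for _ in range(10)]

json_payload = json.dumps([dict(zip(KEYS, light)) for light in lights],
                          separators=(",", ":")).encode("utf-8")
byte_payload = struct.pack(">H", len(lights)) + b"".join(
    struct.pack(">12sffH?", light[0].encode("ascii"), *light[1:])
    for light in lights)

def ratio(raw):
    """Compressed size as a fraction of the original size."""
    return len(gzip.compress(raw)) / len(raw)

json_ratio = ratio(json_payload)
byte_ratio = ratio(byte_payload)
print(len(json_payload), f"{json_ratio:.0%}")
print(len(byte_payload), f"{byte_ratio:.0%}")
```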

Equal light values with compression

In the previous example we had fully randomized values. If we now remove the randomization and assume all lights have the same values, the result is quite interesting again. One could argue that this is an artificial case, but in our example it is likely that all lights are turned off, which would result in zero values for dim level, power and on state. So for the following samples only the temperature value and, as before, half of the serial number are randomized. Let's look at the results:

  • JSON payload with 10 lights: 868 bytes
  • JSON payload with 10 lights gzipped: 167 bytes
  • JSON payload with 50 lights: 4385 bytes
  • JSON payload with 50 lights gzipped: 374 bytes
  • Bytes payload with 10 lights: 232 bytes
  • Bytes payload with 10 lights gzipped: 66 bytes
  • Bytes payload with 50 lights: 1152 bytes
  • Bytes payload with 50 lights gzipped: 151 bytes

It is obvious that this example is perfect for compression, because gzip can replace big parts of the overall message, so the compression ratio is very high. As a result, the compressed JSON is even smaller than the equivalent uncompressed byte payload. And in contrast to the randomized values, compressing the byte payload also brings an enormous benefit when the values are equal: for 50 lights, the compressed payload is roughly 13% of the original byte payload.
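The same experiment can be repeated for lights that are all switched off. Again this is only a sketch: the helper name, the random generator and the temperature range are assumptions, and because our generated data contains more randomness than the original samples may have had, the compressed sizes will not match the list above exactly.

```python
import gzip
import json
import random
import struct

KEYS = ("serialNumber", "temperature", "power", "dimLevel", "on")

def switched_off_payloads(count, rng):
    """Build JSON and byte payloads for `count` lights that are all off."""
    lights = []
    for _ in range(count):
        serial = "LXA34-" + "".join(rng.choices("0123456789ABCDEF", k=6))
        temperature = round(rng.uniform(15.0, 35.0), 2)
        lights.append((serial, temperature, 0.0, 0, False))
    as_json = json.dumps([dict(zip(KEYS, light)) for light in lights],
                         separators=(",", ":")).encode("utf-8")
    as_bytes = struct.pack(">H", count) + b"".join(
        struct.pack(">12sffH?", light[0].encode("ascii"), *light[1:])
        for light in lights)
    return as_json, as_bytes

rng = random.Random(1)  # fixed seed so the run is repeatable
results = {}
for count in (10, 50):
    as_json, as_bytes = switched_off_payloads(count, rng)
    results[count] = {
        "json": len(as_json),
        "json.gzip": len(gzip.compress(as_json)),
        "bytes": len(as_bytes),
        "bytes.gzip": len(gzip.compress(as_bytes)),
    }
    print(count, results[count])
```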

Raw results

All calculated payloads:

Payload name                  Bytes
light.bytes                      23
light.bytes.gzip                 43
light.compact.json               61
light.compact.json.gzip          76
light.json                       87
light.json.gzip                 101
lights.equal.10.bytes           232
lights.equal.10.bytes.gzip       66
lights.equal.10.json            868
lights.equal.10.json.gzip       167
lights.equal.50.bytes          1152
lights.equal.50.bytes.gzip      151
lights.equal.50.json           4385
lights.equal.50.json.gzip       374
lights.random.10.bytes          232
lights.random.10.bytes.gzip     190
lights.random.10.json           902
lights.random.10.json.gzip      305

Conclusion

So what is the conclusion? Should I use JSON or pure bytes as payload, and should I compress it? Right, you guessed it: it depends.

If every byte counts, you should go with a specific byte protocol, as there is nothing more efficient when it comes to payload size. Compression only brings a benefit if lots of values are equal; otherwise you should skip the overhead of compressing a byte payload. But we highly recommend not designing and implementing a protocol and parser on your own. There are great tools like Google Protocol Buffers which are designed for this specific purpose.

The big advantage of JSON is its human readability and that the protocol can be extended without breaking backward compatibility. If other developers are going to use your communication protocol as an API, it is also easier to work with JSON than with a specific byte payload. When you have lists of values and repeating keys, you can benefit greatly from compressing JSON payloads, which brings you close to the size of byte payloads.

Basically, look at your case and see whether you are willing to pay the overhead of JSON.

Final remark: We only looked at payload sizes. Of course there is also a difference when it comes to serializing and deserializing JSON compared to raw byte protocols. It is quite likely that processing JSON is more time-consuming and resource-intensive than processing a byte protocol. Please keep that in mind, too, when deciding which payload to use in your project.
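If you want a first impression of the processing cost, a micro-benchmark along the following lines can help. It is a rough sketch with hypothetical helper names, and timings taken with Python on a desktop machine say little about a constrained device, so treat the output as a starting point for your own measurements on the target hardware.

```python
import json
import struct
import timeit

LIGHT_FORMAT = ">12sffH?"  # assumed big-endian layout, 23 bytes
light = {"serialNumber": "LXA34-691E90", "temperature": 23.5,
         "power": 50.5, "dimLevel": 75, "on": True}

def encode_json():
    return json.dumps(light, separators=(",", ":")).encode("utf-8")

def encode_bytes():
    return struct.pack(LIGHT_FORMAT, light["serialNumber"].encode("ascii"),
                       light["temperature"], light["power"],
                       light["dimLevel"], light["on"])

json_payload = encode_json()
byte_payload = encode_bytes()

runs = 20_000
# Total seconds for `runs` calls of each operation.
timings = {
    "json encode": timeit.timeit(encode_json, number=runs),
    "json decode": timeit.timeit(lambda: json.loads(json_payload), number=runs),
    "bytes encode": timeit.timeit(encode_bytes, number=runs),
    "bytes decode": timeit.timeit(
        lambda: struct.unpack(LIGHT_FORMAT, byte_payload), number=runs),
}
for name, seconds in timings.items():
    print(f"{name}: {seconds / runs * 1e6:.2f} microseconds per call")
```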

The source code for generating the different payloads can be found on GitHub.