More efficient BinIo implementation

MarvinW · 08-11-2023

In BinIo, most of the methods are implemented as a CASE structure with a single case for each IO. As far as I understand, the compiler turns this into very long chains of IF-ELSIF-ELSIF-ELSIF-ELSIF..., so for each IO every condition has to be checked until the right IoIndex is found. For stations with several hundred IOs this becomes costly.
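For readers who have not seen the generated code, the pattern described above looks roughly like the following sketch. All names here (xStart, xStop, xLamp, the IDX_ constants and GetStateCase) are hypothetical placeholders, not the actual BinIo code.

```iecst
// Hypothetical sketch of the CASE-per-IO pattern; not the generated BinIo code.
VAR_GLOBAL CONSTANT
    IDX_xStart : DINT := 1;
    IDX_xStop  : DINT := 2;
    IDX_xLamp  : DINT := 3;
END_VAR

VAR_GLOBAL
    xStart : BOOL;
    xStop  : BOOL;
    xLamp  : BOOL;
END_VAR

FUNCTION GetStateCase : BOOL
VAR_INPUT
    IoIndex : DINT;
END_VAR

// One branch per IO; a real station would have hundreds of these.
CASE IoIndex OF
    IDX_xStart: GetStateCase := xStart;
    IDX_xStop:  GetStateCase := xStop;
    IDX_xLamp:  GetStateCase := xLamp;
ELSE
    GetStateCase := FALSE; // unknown index
END_CASE
END_FUNCTION
```

If the compiler lowers this to sequential comparisons, a lookup of the last IO has to pass every earlier branch first.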

A faster implementation is to create an array holding all the data that is currently accessed through CASE structures: the variable itself (I used a pointer to the original BOOL), the two event numbers and the additional text. Each method then uses the IDX_ variables to index into the array and fetch the requested data directly. The compiler turns this into an address calculation, which is basically one multiplication and one addition. A single access therefore has complexity O(1) instead of the O(n) of the current implementation. Since a station with n IOs will probably also access them about n times per PLC cycle, the total per-cycle complexity is O(n) for my proposal versus O(n²) for the current implementation.

I modified the BinIo export template to test this. See attachment. It also needs a new STRUCT like this:

TYPE BinIoItemStruct :
STRUCT
  pVariable : POINTER TO BOOL;
  EventS0   : DINT;
  EventS1   : DINT;
  AddText   : EVENTADDLTEXT_T;
END_STRUCT
END_TYPE
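A minimal sketch of how the array lookup could work with this struct is shown below. It is not the code from the attachment: the IO variables, IO_COUNT, aItems, InitBinIoItems and GetStateTable are assumed names for illustration, and only pVariable is filled in (the event numbers and additional text would be set the same way).

```iecst
// Hypothetical sketch; only BinIoItemStruct and its members come from the post.
VAR_GLOBAL CONSTANT
    IDX_xStart : DINT := 1;
    IDX_xStop  : DINT := 2;
    IO_COUNT   : DINT := 2;
END_VAR

VAR_GLOBAL
    xStart : BOOL;
    xStop  : BOOL;
    aItems : ARRAY[1..IO_COUNT] OF BinIoItemStruct;
END_VAR

// Run once at startup: store the address of each original BOOL.
PROGRAM InitBinIoItems
aItems[IDX_xStart].pVariable := ADR(xStart);
aItems[IDX_xStop].pVariable  := ADR(xStop);
END_PROGRAM

FUNCTION GetStateTable : BOOL
VAR_INPUT
    IoIndex : DINT;
END_VAR

GetStateTable := FALSE;
// Nested checks so the array is never indexed out of range
// and a null pointer is never dereferenced.
IF IoIndex >= 1 AND IoIndex <= IO_COUNT THEN
    IF aItems[IoIndex].pVariable <> 0 THEN
        GetStateTable := aItems[IoIndex].pVariable^;
    END_IF
END_IF
END_FUNCTION
```

The cost of GetStateTable does not depend on which index is requested: one range check, one address calculation and one dereference.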

My station only has 258 IOs in total (inputs, outputs and flags) and doesn't use them very much. My CPU usage went down from 48 to 47. When I simulated a bigger station by adding some more calls to BinIo.GetState and SetState, the difference was 51 to 48. So it does make a difference.

Are there any reasons that speak against this solution? I guess how big the effect on speed is also depends on how the most commonly used objects are implemented, e.g. whether they use GetState/SetState or just grab the addresses with GetAddress and then use these directly. And of course what the station programmer is doing in the application directly.

I also tested a version where the BOOL variables were stored directly in the array of structures instead of behind a pointer, because that should be even faster, but the difference was not noticeable compared to the pointer approach. That would also no longer be a compatible change. This is the V2 in the attached zip.

MaBi · 10-18-2023

Hi MarvinW.

Some thousand I/Os should be absolutely no problem for a standard PLC (e.g. Rexroth VPB40).

Are you sure that the compiler really creates many IF-comparisons in the machine code?

All compilers I know create an internal jump table for CASE instructions. Maybe you are right and TwinCAT (CoDeSys) is just not intelligent enough.

I wanted to know for sure, so I created a project with two methods, the first using CASE, the second using ELSIF.

The result for 1000000 CASEs/IFs per PLC cycle:

CASE ~80µs

ELSIF ~550µs
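How the test project was actually built is not stated in this post. As a rough sketch of how such a comparison could be set up, the two variants below do the same lookup once with CASE and once with ELSIF and are called in a loop; the execution time of the task is then read from the runtime's own task diagnostics. All names, the number of branches and the loop count are assumptions.

```iecst
// Hypothetical benchmark sketch; not the project used for the numbers above.
FUNCTION LookupCase : DINT
VAR_INPUT
    Index : DINT;
END_VAR
CASE Index OF
    0: LookupCase := 10;
    1: LookupCase := 11;
    2: LookupCase := 12;
    3: LookupCase := 13;
ELSE
    LookupCase := -1;
END_CASE
END_FUNCTION

FUNCTION LookupElsif : DINT
VAR_INPUT
    Index : DINT;
END_VAR
IF Index = 0 THEN
    LookupElsif := 10;
ELSIF Index = 1 THEN
    LookupElsif := 11;
ELSIF Index = 2 THEN
    LookupElsif := 12;
ELSIF Index = 3 THEN
    LookupElsif := 13;
ELSE
    LookupElsif := -1;
END_IF
END_FUNCTION

PROGRAM BenchCaseVsElsif
VAR
    i       : DINT;
    Sum     : DINT;          // keeps the calls from being optimized away
    UseCase : BOOL := TRUE;  // toggle online to compare the two variants
END_VAR
Sum := 0;
FOR i := 0 TO 999999 DO
    IF UseCase THEN
        Sum := Sum + LookupCase(i MOD 4);
    ELSE
        Sum := Sum + LookupElsif(i MOD 4);
    END_IF
END_FOR
END_PROGRAM
```

With only four branches the two variants will be close; the gap should widen as the number of branches grows, since that is where a chain of comparisons and a table lookup differ.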

MarvinW · 10-19-2023

That's really interesting to know. But what would such a jump table look like for a switch statement whose cases are 1, 50 and 1000? It surely won't build a table with 1000 entries just so it can look up the jump target, so it has to check the values individually again. As far as I can tell, a binary search on a sorted table is the best the compiler can do in that situation. Curious to learn more.
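To make the sparse case concrete, here is a sketch of a binary search over the sorted labels 1, 50 and 1000. It only illustrates the idea of what a compiler could generate; nothing here is known to be what TwinCAT actually does, and FindCaseSlot is a made-up name.

```iecst
// Hypothetical sketch: map a sparse CASE label to a branch slot in O(log n).
FUNCTION FindCaseSlot : DINT
VAR_INPUT
    Key : DINT;
END_VAR
VAR
    Labels : ARRAY[0..2] OF DINT := [1, 50, 1000]; // sorted case labels
    Lo     : DINT := 0;
    Hi     : DINT := 2;
    Mid    : DINT;
END_VAR

FindCaseSlot := -1; // -1 means "no matching case", i.e. the ELSE branch
WHILE Lo <= Hi DO
    Mid := Lo + (Hi - Lo) / 2;
    IF Labels[Mid] = Key THEN
        FindCaseSlot := Mid; // slot of the matching branch
        EXIT;
    ELSIF Labels[Mid] < Key THEN
        Lo := Mid + 1;       // continue in the upper half
    ELSE
        Hi := Mid - 1;       // continue in the lower half
    END_IF
END_WHILE
END_FUNCTION
```

For Key = 50 the first probe hits slot 1 directly; for Key = 1000 it takes two probes and returns slot 2; any other value falls through to -1. The returned slot would then index a dense table of jump targets, so only three entries are stored instead of 1000.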

In general I don't trust the TwinCAT compiler to do a lot of optimizations, because basically every program is running in super debug mode all the time. But I don't know anything for a fact on this topic.

If your test shows that CASE is faster than ELSIF and my test shows that a self-made table is still faster than CASE, even in a case where the jump targets are all adjacent, then it seems the compiler really does create a table, but not one that is as efficient as it could be?

My main concern is not that a PLC cannot handle many thousand IOs. My concern is that we should keep an eye on performance every now and then, especially with software that runs on many instances. If you have 1000 Nexeed machines with BinIo in the field and you save just 1% CPU load on each of them, then you have saved 10 PLCs' worth of computing power, just by changing an export template that ships with your engineering. BinIo is probably not the biggest impact BCI can achieve, but it's one where I was able to run experiments myself.

nexidator · 10-19-2023

This is just speculation because I don't know either what exactly the compiler is doing, but maybe something like a sorted lookup table or a hashtable is used internally. A binary search over a sorted table has a complexity of O(log n) (a hashtable lookup would be O(1) on average), which seems a possible explanation for the times measured.

MarvinW · 12-07-2023

There was a very subtle bug in the template: the event numbers were missing a minus sign. For throwing events this has no consequences (maybe the HMI looks up events by absolute value?). But in my application I needed to calculate the event hash the way the PlcError DDL event does, and this led to incorrect hash numbers. I updated the attachment.
