This meshes well with my own observation while working on flow-based programming.

Our implementation was effectively single-threaded (green threads on Racket), and the engine should be able to optimize many connection and communication patterns down to simple function calls.

Profiling the fractalide engine, whose load code is itself built as an FBP graph, showed that most of the time was spent sending and receiving messages. Many services are written as "read IN, do thing, write OUT", while the consumer is written as "write SERVICE_OUT, read SERVICE_IN", wired as consumer.SERVICE_OUT -> service.IN and service.OUT -> consumer.SERVICE_IN. In the end, that is just a very expensive synchronous function call.
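
To make the pattern concrete, here is a rough Python sketch of that wiring. This is not fractalide's API: `service`, `call_via_ports`, and `double` are names I made up for illustration, and it uses OS threads with `queue.Queue` standing in for ports, where our engine used green threads on Racket.

```python
import queue
import threading


def service(inp, out, fn):
    """Service component: read IN, do thing, write OUT (None stops it)."""
    while True:
        msg = inp.get()
        if msg is None:  # hypothetical shutdown sentinel
            break
        out.put(fn(msg))


def call_via_ports(service_out, service_in, value):
    """Consumer side: write SERVICE_OUT, then block reading SERVICE_IN."""
    service_out.put(value)
    return service_in.get()


def double(x):
    """The 'thing' the service does."""
    return x * 2


# consumer.SERVICE_OUT -> service.IN, service.OUT -> consumer.SERVICE_IN
to_service = queue.Queue()
from_service = queue.Queue()

worker = threading.Thread(target=service, args=(to_service, from_service, double))
worker.start()

via_ports = call_via_ports(to_service, from_service, 21)  # two sends, two receives
direct = double(21)                                       # what it could collapse to

to_service.put(None)
worker.join()
```

Both `via_ports` and `direct` hold the same value; the first just pays for two queue operations in each direction and a scheduler round trip to get it, which is the overhead an engine could in principle remove by recognising this shape in the graph.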