Parallel Execution¶
Sitar supports shared-memory parallel simulation via OpenMP. The two-phase execution model maps naturally onto a parallel loop: all modules are independent within each phase (no module reads what another writes in the same phase), so every module in a phase can run on a separate thread. A single barrier between phases is all that is needed for correctness.
This page covers how to enable parallel execution, how to measure speedup, and how to customize the mapping of modules to threads.
How Parallelism Works in Sitar¶
The default simulation loop (in sitar_default_main.cpp) runs as follows in parallel mode:
for each (cycle, phase):
#pragma omp for -- each thread runs a subset of modules
#pragma omp barrier -- all threads synchronize before next phase
The flattenHierarchy function collects all modules into a flat list. OpenMP distributes that list across threads using a static schedule, which assigns each thread a contiguous block of the list. The barrier after each phase enforces the read/write discipline: no module begins the next phase until all modules have completed the current one.
Because modules are independent within a phase by construction (the two-phase rule prohibits same-phase read-write conflicts), no locks or shared state are needed inside the loop.
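The loop above can be sketched as a minimal stand-alone program. The module type and its run method here are hypothetical stand-ins for Sitar's generated classes, not the real API:

```cpp
#include <vector>

// Hypothetical stand-in for a Sitar module: runs one phase of behaviour.
struct module {
    int phases_run = 0;
    void run(int /*phase*/) { ++phases_run; }
};

void simulate(std::vector<module*>& modules, int num_cycles) {
    #pragma omp parallel                         // one thread team for the whole run
    for (int cycle = 0; cycle < num_cycles; ++cycle) {
        for (int phase = 0; phase < 2; ++phase) {
            #pragma omp for schedule(static)     // each thread runs a subset of modules
            for (int i = 0; i < static_cast<int>(modules.size()); ++i)
                modules[i]->run(phase);
            // Implicit barrier at the end of the omp for: no module starts
            // the next phase until every module has finished this one.
        }
    }
}
```

The default main uses an explicit #pragma omp barrier; in this sketch the implicit barrier at the end of the worksharing loop plays the same role. Without -fopenmp the pragmas are ignored and the same code runs serially.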
A Simple Example¶
The following model has four modules connected in a clique. Each module burns approximately 1 ms of CPU time per phase using a busy-wait loop, and sends a token to a randomly chosen neighbour every COMM_INTERVAL cycles.
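The full listing is not reproduced here, but the busy-wait idiom it depends on can be sketched as follows (busy_wait_ms is a hypothetical helper, not part of the Sitar API):

```cpp
#include <chrono>

// Burn roughly `ms` milliseconds of CPU time. Unlike sleeping, this keeps
// the thread busy, so the work shows up as real load in parallel runs.
void busy_wait_ms(long ms) {
    const auto deadline = std::chrono::steady_clock::now()
                        + std::chrono::milliseconds(ms);
    volatile unsigned long sink = 0;   // volatile: the loop cannot be optimized away
    while (std::chrono::steady_clock::now() < deadline)
        ++sink;
}
```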
The communication structure:
flowchart LR
subgraph TOP
subgraph sys["sys (System)"]
A["a (Node)"]
B["b (Node)"]
C["c (Node)"]
D["d (Node)"]
A <-->|"ab / ba"| B
A <-->|"ac / ca"| C
A <-->|"ad / da"| D
B <-->|"bc / cb"| C
B <-->|"bd / db"| D
C <-->|"cd / dc"| D
end
end
Compiling and Running¶
Compile without OpenMP for a serial baseline:
Then compile with OpenMP and compare:
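Assuming a GCC toolchain and a generated model.cpp (the file names here are hypothetical), the two builds might look like:

```shell
# Serial baseline: without -fopenmp the OpenMP pragmas are ignored
g++ -O2 -o model_serial model.cpp
time ./model_serial

# Parallel build: -fopenmp enables the pragmas and links the OpenMP runtime
g++ -O2 -fopenmp -o model_parallel model.cpp
export OMP_NUM_THREADS=4
time ./model_parallel
```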
The time command reports wall-clock elapsed time. With 4 modules each doing ~2 ms of work per cycle (1 ms per phase), the serial run takes approximately 20 cycles x 4 modules x 2 ms = 160 ms. With 4 threads you should see close to 4x speedup, approaching 40 ms.
Setting the Number of Threads¶
The number of threads is controlled by the OMP_NUM_THREADS environment variable:
export OMP_NUM_THREADS=1 # effectively serial
export OMP_NUM_THREADS=2
export OMP_NUM_THREADS=4 # one thread per module for this example
Set OMP_NUM_THREADS to the number of modules (or a divisor of it) for best load balance with the default static schedule.
Customizing Module-to-Thread Mapping¶
By default, Sitar flattens the entire hierarchy and distributes modules across threads in contiguous blocks of the flattened list. For some models you may want a specific static assignment: for example, placing communicating modules on the same thread to reduce synchronization overhead, or isolating a heavy module on its own thread.
To do this, supply a custom main.cpp at compile time:
The key function in the custom main is the module list construction. Instead of calling flattenHierarchy, you build the list explicitly:
vector<module*> modules_to_run;
// Group 0: a, b, c -- will land on thread 0 with OMP_NUM_THREADS=2
modules_to_run.push_back(&TOP->sys.a);
modules_to_run.push_back(&TOP->sys.b);
modules_to_run.push_back(&TOP->sys.c);
// Group 1: d -- will land on thread 1
modules_to_run.push_back(&TOP->sys.d);
With OMP_NUM_THREADS=2, schedule(static) splits the list into contiguous blocks, one per thread; to realize the 3-and-1 grouping above, the chunk size must be given explicitly (for example schedule(static, 3), which hands the first three entries to thread 0 and the fourth to thread 1). Apart from the schedule clause, the parallel loop is identical to the default.
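The mapping can be checked with a little arithmetic. This is a sketch of OpenMP's static partition rule; the exact split for uneven divisions without an explicit chunk size is implementation-defined:

```cpp
// Which thread runs iteration i under schedule(static, chunk) with
// num_threads threads? Chunks are dealt out to threads round-robin.
int static_owner(int i, int chunk, int num_threads) {
    return (i / chunk) % num_threads;
}
```

With 4 modules, 2 threads and chunk size 3, indices 0-2 (a, b, c) map to thread 0 and index 3 (d) to thread 1; a chunk of 2 would instead split the list 2-and-2.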
The full custom main for this example is at docs/sitar_examples/5_parallel_custom_main.cpp.
Important Considerations¶
Logging in Parallel Mode¶
In parallel execution, multiple modules run concurrently. Writing to a shared output stream (such as std::cout) from multiple threads simultaneously will interleave log lines unpredictably. Sitar handles this by assigning each module its own log file in parallel mode:
string log_name = modules_to_run[i]->hierarchicalId() + "_log.txt";
logstreams[i]->open(log_name.c_str());
modules_to_run[i]->log.setOstream(logstreams[i]);
This produces one log file per module (e.g. TOP.sys.a_log.txt, TOP.sys.b_log.txt, etc.), each written exclusively by one module. The files can be inspected individually or merged and sorted by timestamp after simulation.
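If each log line begins with the cycle number (an assumption about your logging format, not something Sitar enforces), the per-module files can be merged back into simulation order with a stable numeric sort:

```shell
# Demo logs standing in for real per-module output
printf '1 a: sent token\n3 a: sent token\n' > TOP.sys.a_log.txt
printf '2 b: received token\n'              > TOP.sys.b_log.txt

# -n sorts by the leading cycle number, -s keeps equal-cycle lines in file order
sort -n -s TOP.sys.*_log.txt > merged_log.txt
cat merged_log.txt
```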
Warning
Never share a single output stream across modules in parallel mode. Even if each individual << call is atomic, multi-field log lines will interleave across threads, producing unreadable output.
Random Number Generation¶
If your modules use random number generation, each module must use its own independent random number generator. Sharing a single generator across threads without locking causes data races and non-deterministic (and incorrect) results.
The recommended pattern is to declare a generator as a member of each module and seed it uniquely before simulation starts:
Then in the main, before the parallel loop, assign a unique seed to each module:
Inside the module behavior, use the stored seed (combined with this_cycle for additional variation if needed) to initialize and draw from the module's own generator:
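Putting the three steps together, a minimal sketch follows. The class and member names are hypothetical illustrations, not Sitar's generated code:

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical module: owns its generator, so threads never share RNG state.
struct Node {
    unsigned seed = 0;
    std::mt19937 rng;                  // one generator per module

    void set_seed(unsigned s) { seed = s; rng.seed(s); }

    // Behaviour code draws from the module's own generator only.
    int pick_neighbour(int n_neighbours) {
        std::uniform_int_distribution<int> dist(0, n_neighbours - 1);
        return dist(rng);
    }
};

// In main, before the parallel loop: one distinct seed per module.
void seed_all(const std::vector<Node*>& modules, unsigned base_seed) {
    for (std::size_t i = 0; i < modules.size(); ++i)
        modules[i]->set_seed(base_seed + static_cast<unsigned>(i));
}
```

Because each module owns its generator, sequences are reproducible for a given seed and there is no cross-thread state to race on.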
Warning
Do not use a global srand call or a shared rand() in parallel simulation. Each execution thread must have its own generator state, seeded independently.
What's Next¶
Return to the Language and Examples section to learn the full Sitar modeling language, or jump directly to Advanced Examples for complete working models.