Processor and memory model

This example models a simple processor-memory system. The processor is a probabilistic instruction trace generator: it generates a stream of instructions, each of which unconditionally performs an instruction fetch and optionally performs a data load, a data store, or both (an atomic load-store). The memory is a byte-addressable array that responds to each request after a fixed latency.

The two modules communicate through a request-response token protocol: the processor sends a request token and blocks until the memory returns a response token. This models blocking memory access — a natural starting point before introducing features such as out-of-order issue or caches.

What this example demonstrates:

  • Request-response protocol with two directed nets
  • Typed multi-field token payloads with sitar::pack / sitar::unpack
  • Blocking communication: wait until (port.peek()) suspends the module until a response arrives
  • Probabilistic behavior using rand() in code blocks
  • Module-level initialization of a data array in init

System overview

flowchart LR
    proc["Processor\n(trace generator)"]
    mem["Memory\n(1 KB, latency=LATENCY)"]
    proc -->|"req  token<12>"| mem
    mem  -->|"resp token<8>"| proc

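Both nets have capacity 1. Because the processor blocks on every response, at most one token is in flight at a time, so requests and responses strictly alternate: the processor issues one request and the memory returns one response before the next request can proceed.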
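As a rough mental model of a capacity-1 net, the sketch below uses a hypothetical C++ class, SingleSlotNet, that is not part of Sitar. It assumes that push fails when the slot is occupied and that pull fails when the slot is empty, which matches how the modules in this example test the return values of push and pull.

```cpp
// Hypothetical single-slot channel; illustrative only, not Sitar's net implementation.
template <typename T>
class SingleSlotNet {
public:
    // Store a token if the slot is free; report failure (backpressure) otherwise.
    bool push(const T& tok) {
        if (full_) return false;
        value_ = tok;
        full_ = true;
        return true;
    }

    // True when a token is waiting to be pulled.
    bool peek() const { return full_; }

    // Remove the waiting token, if any, and hand it to the caller.
    bool pull(T& out) {
        if (!full_) return false;
        out = value_;
        full_ = false;
        return true;
    }

private:
    T value_{};
    bool full_ = false;
};
```

With a blocking requester, a second push is never attempted before the previous response has been pulled, so one slot per direction is enough.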


Token formats

Request  token<12>:  [ type : 4B | addr : 4B | data : 4B ]
Response token<8> :  [ error : 4B | data : 4B ]

The type field encodes the operation:

Value  Name       Description
0      IFETCH     Instruction fetch, issued for every instruction
1      LOAD       Data load only
2      STORE      Data store only
3      ATOMIC_LS  Combined load and store

The error field in the response is 1 if the address is out of bounds; 0 otherwise.
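The layouts above can be pictured as consecutive 4-byte fields in a byte buffer. The sketch below is plain C++ with hypothetical helper names (pack_request, unpack_request, pack_response, unpack_response). It assumes fields are copied in declaration order and in host byte order; it is not a description of how sitar::pack and sitar::unpack are implemented.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins for token<12> and token<8>: raw byte buffers.
using RequestBytes  = std::array<unsigned char, 12>;
using ResponseBytes = std::array<unsigned char, 8>;

// Request layout: [ type : 4B | addr : 4B | data : 4B ]
RequestBytes pack_request(std::int32_t type, std::int32_t addr, std::int32_t data) {
    RequestBytes buf{};
    std::memcpy(buf.data() + 0, &type, 4);
    std::memcpy(buf.data() + 4, &addr, 4);
    std::memcpy(buf.data() + 8, &data, 4);
    return buf;
}

void unpack_request(const RequestBytes& buf,
                    std::int32_t& type, std::int32_t& addr, std::int32_t& data) {
    std::memcpy(&type, buf.data() + 0, 4);
    std::memcpy(&addr, buf.data() + 4, 4);
    std::memcpy(&data, buf.data() + 8, 4);
}

// Response layout: [ error : 4B | data : 4B ]
ResponseBytes pack_response(std::int32_t error, std::int32_t data) {
    ResponseBytes buf{};
    std::memcpy(buf.data() + 0, &error, 4);
    std::memcpy(buf.data() + 4, &data, 4);
    return buf;
}

void unpack_response(const ResponseBytes& buf, std::int32_t& error, std::int32_t& data) {
    std::memcpy(&error, buf.data() + 0, 4);
    std::memcpy(&data, buf.data() + 4, 4);
}
```

Packing and unpacking must list the fields in the same order on both sides; swapping addr and data on one side would still produce a valid 12-byte token, just with the wrong meaning.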


Top-level structure

The system is parameterized at the System level. The processor parameters LOAD_PCT and STORE_PCT are independent integer percentages (0-100): an instruction has probability LOAD_PCT% of including a load and independently STORE_PCT% of including a store. An instruction with both becomes an ATOMIC_LS.

module Top
    submodule sys : System
end module

module System
    // Processor parameters: load%, store%, instruction count
    submodule proc : Processor<40, 30, 5>
    // Memory parameters: latency in cycles
    submodule mem  : Memory<3>

    net req  : capacity 1 width 12   // Processor -> Memory
    net resp : capacity 1 width 8    // Memory    -> Processor

    proc.req_port  => req
    mem.req_port   <= req
    mem.resp_port  => resp
    proc.resp_port <= resp
end module
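Because the load and store draws are independent, the two percentages fix the expected instruction mix. The helper below, instruction_mix (a hypothetical name, not part of the example), works in units of 1/10000 to stay in integer arithmetic and ignores the slight bias of rand() % 100.

```cpp
// Expected share of each instruction class, in units of 1/10000.
struct InstructionMix {
    int fetch_only;  // neither load nor store
    int load_only;
    int store_only;
    int atomic_ls;   // both load and store
};

// load_pct and store_pct are integer percentages in the range 0-100.
InstructionMix instruction_mix(int load_pct, int store_pct) {
    InstructionMix mix;
    mix.atomic_ls  = load_pct * store_pct;
    mix.load_only  = load_pct * (100 - store_pct);
    mix.store_only = (100 - load_pct) * store_pct;
    mix.fetch_only = (100 - load_pct) * (100 - store_pct);
    return mix;
}
```

For the values used in System (40 and 30), this gives 12% ATOMIC_LS, 28% LOAD, 18% STORE, and 42% of instructions with no data access, so an instruction issues 1.58 memory requests on average.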

Processor

The processor behavior loops over instructions. For each instruction:

  1. IFETCH — always send an instruction fetch to the memory at address pc * 4 and wait for the response.
  2. Probabilistic decision — independently decide whether to load and/or store, using rand() against the percentage thresholds.
  3. Data access — if a data access is needed, pack the appropriate request token, push it (with retry on backpressure), and wait for the response.
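The three steps can be summarized as a function from the per-instruction decisions to the requests issued. The sketch below is plain C++ outside the simulator; requests_for and Request are hypothetical names, and the address and data formulas are the ones used in the Processor listing.

```cpp
#include <vector>

enum MemOpType { IFETCH = 0, LOAD = 1, STORE = 2, ATOMIC_LS = 3 };

struct Request {
    int type;
    int addr;
    int data;
};

// Requests issued for the instruction at index pc, in issue order.
std::vector<Request> requests_for(int pc, bool do_load, bool do_store) {
    std::vector<Request> reqs;
    // Step 1: every instruction starts with a fetch at byte address pc * 4.
    reqs.push_back({IFETCH, pc * 4, 0});
    // Step 3: at most one data access, whose type combines the two decisions.
    if (do_load || do_store) {
        int type = (do_load && do_store) ? ATOMIC_LS : (do_load ? LOAD : STORE);
        reqs.push_back({type, (pc * 4 + 64) % 1024, pc * 10});
    }
    return reqs;
}
```

Each instruction therefore produces either one or two request-response round trips, never more.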

The processor uses member variables (declared in decl) for all state so that values persist between code blocks within the same behavior iteration.

module Processor
    parameter int LOAD_PCT  = 40   // % of instructions with a load  (0-100)
    parameter int STORE_PCT = 30   // % of instructions with a store (0-100)
    parameter int NUM_INSTR = 5    // total instructions to simulate

    outport req_port  : width 12
    inport  resp_port : width 8

    include $#include <cstdlib>$

    decl $
    enum MemOpType { IFETCH = 0, LOAD = 1, STORE = 2, ATOMIC_LS = 3 };

    int      pc;           // program counter (instruction index)
    int      instr_count;  // instructions completed

    // Pending memory operation fields
    int      req_type;
    int      req_addr;
    int      req_data;
    int      resp_error;
    int      resp_rdata;
    token<12> req_tok;
    token<8>  resp_tok;
    bool     ok;
    bool     pulled;

    // Decoded access decisions for current instruction
    bool     do_load;
    bool     do_store;
    $
    init $
    pc = 0;  instr_count = 0;
    srand(42);   // fixed seed for reproducibility
    $

    behavior
        do
            // ============================================================
            // Step 1: Instruction fetch (every instruction)
            // ============================================================
            $
            req_type = IFETCH;
            req_addr = pc * 4;   // byte address of instruction word
            req_data = 0;
            sitar::pack(req_tok, req_type, req_addr, req_data);
            ok = false;
            $;
            do
                wait until (this_phase == 1);
                $ok = req_port.push(req_tok);$;
                if (not ok) then wait end if;
            while (not ok) end do;
            $log << endl << "IFETCH  pc=" << pc << "  addr=" << req_addr;$;

            wait until (resp_port.peek());
            $
            pulled = resp_port.pull(resp_tok);
            sitar::unpack(resp_tok, resp_error, resp_rdata);
            if (resp_error)
                log << endl << "  IFETCH ERROR (out of bounds)";
            else
                log << endl << "  IFETCH OK  data=" << resp_rdata;
            $;

            // ============================================================
            // Step 2: Decide data memory accesses for this instruction
            // ============================================================
            $
            do_load  = (rand() % 100) < LOAD_PCT;
            do_store = (rand() % 100) < STORE_PCT;
            int data_addr  = (pc * 4 + 64) % 1024;   // example data address
            int write_data = pc * 10;

            req_addr = data_addr;
            req_data = write_data;

            if (do_load && do_store)
                req_type = ATOMIC_LS;
            else if (do_load)
                req_type = LOAD;
            else if (do_store)
                req_type = STORE;
            $;

            // ============================================================
            // Step 3: Issue data access (if any)
            // ============================================================
            if (do_load || do_store) then
                $
                sitar::pack(req_tok, req_type, req_addr, req_data);
                ok = false;
                $;
                do
                    wait until (this_phase == 1);
                    $ok = req_port.push(req_tok);$;
                    if (not ok) then wait end if;
                while (not ok) end do;

                $
                const char* op_name =
                    (req_type == ATOMIC_LS) ? "ATOMIC_LS" :
                    (req_type == LOAD)      ? "LOAD"      : "STORE";
                log << endl << op_name
                    << "  addr=" << req_addr
                    << "  wdata=" << req_data;
                $;

                wait until (resp_port.peek());
                $
                pulled = resp_port.pull(resp_tok);
                sitar::unpack(resp_tok, resp_error, resp_rdata);
                if (resp_error)
                    log << endl << "  ERROR (out of bounds)";
                else
                    log << endl << "  OK  rdata=" << resp_rdata;
                $;
            end if;

            $pc++;  instr_count++;$;
            $log << endl << "--- instr " << instr_count << " complete ---";$;
        while (instr_count < NUM_INSTR) end do;

        $log << endl << "Processor done: " << NUM_INSTR << " instructions executed";$;
        stop simulation;
    end behavior
end module

Blocking communication

wait until (resp_port.peek()) suspends the processor in place until the memory places a response token on resp. The processor makes no forward progress while waiting. This models a blocking memory interface: one outstanding request at a time.


Memory

The memory module holds a 1 KB byte array, mem_array, initialized in init with the pattern mem_array[i] = i % 256. It processes one request at a time: pull the request, wait LATENCY cycles, perform the read or write, and push the response.

module Memory
    parameter int LATENCY = 3      // response latency in cycles
    parameter int SIZE    = 1024   // memory size in bytes

    inport  req_port  : width 12
    outport resp_port : width 8

    include $#include <cstring>$

    decl $
    enum MemOpType { IFETCH = 0, LOAD = 1, STORE = 2, ATOMIC_LS = 3 };

    unsigned char mem_array[SIZE];

    int      req_type;
    int      req_addr;
    int      req_data;
    int      resp_error;
    int      resp_rdata;
    token<12> req_tok;
    token<8>  resp_tok;
    bool     pulled;
    bool     ok;
    $
    init $
    // Initialize: mem[i] = i % 256 (a simple known pattern)
    for (int i = 0; i < SIZE; i++)
        mem_array[i] = (unsigned char)(i % 256);
    $

    behavior
        do
            // ---- Wait for a request ----
            // Pull in phase 0 for phase discipline
            $pulled = false;$;
            do
                wait until (this_phase == 0);
                $pulled = req_port.pull(req_tok);$;
                if (not pulled) then wait end if;
            while (not pulled) end do;

            // ---- Simulate access latency ----
            wait(LATENCY, 0);

            // ---- Process the request ----
            $
            sitar::unpack(req_tok, req_type, req_addr, req_data);
            resp_error  = 0;
            resp_rdata  = 0;

            // Compare against SIZE - 4 so that a huge req_addr cannot overflow in req_addr + 4
            bool in_bounds = (req_addr >= 0 && req_addr <= SIZE - 4);

            if (!in_bounds) {
                resp_error = 1;
                log << endl << "MEM ERROR OOB  addr=" << req_addr
                    << "  type=" << req_type;
            } else {
                // Write path (STORE or ATOMIC_LS)
                if (req_type == STORE || req_type == ATOMIC_LS) {
                    memcpy(&mem_array[req_addr], &req_data, 4);
                    log << endl << "MEM WRITE  addr=" << req_addr
                        << "  data=" << req_data;
                }
                // Read path (IFETCH, LOAD, or ATOMIC_LS)
                if (req_type == IFETCH || req_type == LOAD || req_type == ATOMIC_LS) {
                    memcpy(&resp_rdata, &mem_array[req_addr], 4);
                    log << endl << "MEM READ   addr=" << req_addr
                        << "  data=" << resp_rdata;
                }
            }
            sitar::pack(resp_tok, resp_error, resp_rdata);
            ok = false;
            $;

            // ---- Send response in phase 1 ----
            do
                wait until (this_phase == 1);
                $ok = resp_port.push(resp_tok);$;
                if (not ok) then wait end if;
            while (not ok) end do;
        while (1) end do;
    end behavior
end module

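For a write (STORE or ATOMIC_LS), the memory copies 4 bytes from the request data field into the array. For a read (IFETCH, LOAD, or ATOMIC_LS), it copies 4 bytes from the array into the response data field. Because the write path runs before the read path, an ATOMIC_LS response carries the value that was just stored, not the previous contents. An out-of-bounds address (any 4-byte access that would start before or extend past the array) sets error=1 in the response without modifying the array.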
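The access rules can be exercised in isolation with a small C++ model. SimpleMemory and its access method are hypothetical names introduced for illustration; the sketch mirrors the bounds check and the write-then-read order of the Memory listing, assumes a 4-byte int, and omits latency and tokens.

```cpp
#include <cstring>

enum MemOpType { IFETCH = 0, LOAD = 1, STORE = 2, ATOMIC_LS = 3 };

// Hypothetical standalone model of the memory array; no latency, no tokens.
struct SimpleMemory {
    static constexpr int kSize = 1024;
    unsigned char bytes[kSize];

    // Same initial pattern as the example: byte i holds i % 256.
    SimpleMemory() {
        for (int i = 0; i < kSize; ++i)
            bytes[i] = static_cast<unsigned char>(i % 256);
    }

    // Returns the error flag (0 or 1); rdata receives the value read, or 0.
    int access(int type, int addr, int wdata, int& rdata) {
        rdata = 0;
        if (addr < 0 || addr > kSize - 4)
            return 1;                               // out of bounds: array untouched
        if (type == STORE || type == ATOMIC_LS)
            std::memcpy(&bytes[addr], &wdata, 4);   // write path runs first
        if (type == IFETCH || type == LOAD || type == ATOMIC_LS)
            std::memcpy(&rdata, &bytes[addr], 4);   // read path sees the new bytes
        return 0;
    }
};
```

On a little-endian host, a read at address 0 of the freshly initialized array returns 0x03020100, which is 50462976 in decimal. The last valid word address is 1020; address 1021 is rejected because the access would run past the end of the array.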


Expected output

With LOAD_PCT=40, STORE_PCT=30, NUM_INSTR=5, LATENCY=3, a representative run might look like:

(0,1)  TOP.sys.proc : IFETCH  pc=0  addr=0
(3,0)  TOP.sys.mem  : MEM READ   addr=0  data=50462976
(3,0)  TOP.sys.proc : IFETCH OK  data=50462976
(3,1)  TOP.sys.proc : LOAD  addr=64  wdata=0
(6,0)  TOP.sys.mem  : MEM READ   addr=64  data=1128415552
(6,0)  TOP.sys.proc : OK  rdata=1128415552
(6,0)  TOP.sys.proc : --- instr 1 complete ---
(6,1)  TOP.sys.proc : IFETCH  pc=1  addr=4
...
(29,0) TOP.sys.proc : Processor done: 5 instructions executed
Simulation stopped at time (29,0)

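The load/store pattern depends on the random seed. With srand(42), the output is identical from run to run on a given platform; because the sequence produced by rand() is implementation-defined, the pattern may differ between C libraries.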
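The effect of the fixed seed can be checked outside the simulator. The sketch below replays the two rand() draws per instruction for a given seed; data_ops is a hypothetical helper, not part of the example. The concrete sequence depends on the C library, so only same-seed equality and the 0% and 100% corner cases are relied on.

```cpp
#include <cstdlib>
#include <vector>

enum MemOpType { IFETCH = 0, LOAD = 1, STORE = 2, ATOMIC_LS = 3 };

// Data access chosen for each instruction; IFETCH marks "no data access".
std::vector<int> data_ops(unsigned seed, int load_pct, int store_pct, int num_instr) {
    std::srand(seed);
    std::vector<int> ops;
    for (int i = 0; i < num_instr; ++i) {
        // Two separate draws, load first, in the same order as the Processor code.
        bool do_load  = (std::rand() % 100) < load_pct;
        bool do_store = (std::rand() % 100) < store_pct;
        if (do_load && do_store) ops.push_back(ATOMIC_LS);
        else if (do_load)        ops.push_back(LOAD);
        else if (do_store)       ops.push_back(STORE);
        else                     ops.push_back(IFETCH);
    }
    return ops;
}
```

Calling data_ops(42, 40, 30, 5) twice yields the same five decisions both times. Changing the seed, or running against a different rand() implementation, may yield a different trace.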

Varying the parameters

  • Set LOAD_PCT=0 and STORE_PCT=0 for a pure instruction-fetch workload.
  • Increase NUM_INSTR to generate longer traces.
  • Increase LATENCY to observe the processor spending more cycles waiting for memory responses.
  • To model a non-blocking processor, separate request issue and response collection into two concurrent branches using a parallel block or a second module.