Square wave
A synthesizable square wave is two states that toggle on every clock. The left panel is the cocoa coroutine you write directly; the right panel is what the compiler generates from it.
Coroutines are modelled as initial processes that drive outputs and wait for events. Since a state machine cycles forever, the process must either loop with forever or idle after completion.
Combinational outputs are driven with blocking assignments and work identically to always_comb blocks. Each @(posedge clk) transitions to a new state; the lines above it drive the previous state's outputs. When the clock ticks, control moves past the @ and the next chunk "runs".
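As a rough sketch of the two forms described above, the first module below is the coroutine and the second is a hand-written two-state FSM with the same behaviour. The module names, the rst port, and the state encoding are assumptions for illustration; the second module is not the tool's actual output.

```systemverilog
// Coroutine form: two states, one per @(posedge clk).
module square_wave (
  input  logic clk,
  output logic o
);
  initial forever begin
    o = 1'b1;        // state 0: drive o high
    @(posedge clk);  // clock edge: move to state 1
    o = 1'b0;        // state 1: drive o low
    @(posedge clk);  // clock edge: loop back to state 0
  end
endmodule

// Hand-written equivalent (assumed shape, not generated output).
module square_wave_fsm (
  input  logic clk,
  input  logic rst,  // assumption: synchronous, active-high reset
  output logic o
);
  typedef enum logic {S0, S1} state_t;
  state_t state_q, state_d;

  // Outputs and next state are purely combinational.
  always_comb begin
    state_d = state_q;
    o       = 1'b0;
    unique case (state_q)
      S0: begin o = 1'b1; state_d = S1; end
      S1: begin o = 1'b0; state_d = S0; end
    endcase
  end

  // The only flop is the state register.
  always_ff @(posedge clk) begin
    if (rst) state_q <= S0;
    else     state_q <= state_d;
  end
endmodule
```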
Adjusting the duty cycle
To make the wave 75% duty, the state machine should cycle every four clocks and hold o high for three of them. In the cocoa source we just add more @(posedge clk); statements — each one inserts a new state.
Notice we don't reassign o = 1'b1 in the new states. Outputs are held across states until reassigned; the transpiler tracks held outputs and drives them combinationally in every state. No latch is inferred.
Held outputs are propagated combinationally, not latched. Inputs that need to survive a clock edge — covered later — do require a flop.
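A minimal sketch of the 75% variant, assuming the same hypothetical ports as a plain square-wave module. The two extra waits add two states in which o is not reassigned and therefore stays high.

```systemverilog
module square_wave_75 (
  input  logic clk,
  output logic o
);
  initial forever begin
    o = 1'b1;        // state 0: o driven high
    @(posedge clk);  // state 1: o held high, no reassignment
    @(posedge clk);  // state 2: o held high, no reassignment
    @(posedge clk);
    o = 1'b0;        // state 3: o driven low
    @(posedge clk);  // back to state 0
  end
endmodule
```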
Waiting on a request
State machines rarely advance unconditionally. A guarded event — @(posedge clk iff cond) — only fires when the condition is true on the clock edge. In source terms, the process suspends in the current state until the guard is satisfied.
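A small sketch of a guarded wait, using hypothetical req and ack signals: the process sits in its first state until req is high on a clock edge, then pulses ack for one cycle.

```systemverilog
module wait_req (
  input  logic clk,
  input  logic req,  // hypothetical request input
  output logic ack   // hypothetical one-cycle acknowledge
);
  initial forever begin
    ack = 1'b0;              // state 0: idle, ack low
    @(posedge clk iff req);  // stay in state 0 until req is high on an edge
    ack = 1'b1;              // state 1: acknowledge for one cycle
    @(posedge clk);          // return to state 0
  end
endmodule
```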
Capture, then emit
Sample data_in when valid_in arrives, hold it one cycle, then drive the captured byte out with valid_out high.
data_q is written in one state and read in the next. Cocoa notices the cross-edge read and turns it into a flop in the generated module. Locals that are only touched within a single state stay as combinational temporaries; no flop is inferred.
A local that lives across a clock edge becomes a flop automatically. There's no need to declare it separately or write a manual always_ff block.
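One way the capture-and-emit coroutine might look is sketched below. The signal names follow the description above; the module name, port widths, and the exact point at which data_in is sampled are assumptions, and the sketch follows plain SystemVerilog simulation semantics rather than any documented cocoa rule.

```systemverilog
module capture_emit (
  input  logic       clk,
  input  logic       valid_in,
  input  logic [7:0] data_in,
  output logic       valid_out,
  output logic [7:0] data_out
);
  logic [7:0] data_q;  // local written in one state and read in the next

  initial forever begin
    valid_out = 1'b0;
    data_out  = 8'h00;
    @(posedge clk iff valid_in);  // wait for a valid input
    data_q = data_in;             // capture: written in this state
    @(posedge clk);               // hold for one cycle
    data_out  = data_q;           // read after the edge, so data_q needs a flop
    valid_out = 1'b1;
    @(posedge clk);               // back to waiting
  end
endmodule
```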
Branching outputs
After go, drive a different value on q depending on mode. An if/else between two clock waits selects what gets driven during the second state.
This is the simplest branch form: an if/else with no @ inside. Both branches drive the same signals to different values and the state's output expression becomes a ternary. Branches can contain their own @ — see the reference for the diverging form — but the no-wait variant is enough here.
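A sketch of the no-wait branch form, with hypothetical output values 8'hAA and 8'h55 and an assumed 8-bit q. In the generated state the two assignments would reduce to something like q = mode ? 8'hAA : 8'h55.

```systemverilog
module branch_out (
  input  logic       clk,
  input  logic       go,
  input  logic       mode,
  output logic [7:0] q
);
  initial forever begin
    q = 8'h00;              // first state: default output while waiting
    @(posedge clk iff go);  // wait for go
    if (mode) q = 8'hAA;    // second state: both branches drive q,
    else      q = 8'h55;    // and neither contains a clock wait
    @(posedge clk);         // back to waiting
  end
endmodule
```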
N-cycle delay
After go, wait N clocks then pulse done. Useful for fixed-latency sequences — config-register write windows, bus turnaround timing, and the like.
repeat (N) @(posedge clk); collapses to a single FSM state with an auto-generated counter (cyc0_q in the output). The state self-loops while the counter is below N-1 and exits when it hits the bound. No unrolling: the cost is one state plus a $clog2(N)-bit counter regardless of N.
N can be a parameter; the counter width follows from it.
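The delay described above can be sketched as follows, assuming a hypothetical module name and a default of N = 8. The repeat line is the one that would become the single self-looping state with its counter.

```systemverilog
module delay_n #(
  parameter int N = 8  // assumed default; any positive value
) (
  input  logic clk,
  input  logic go,
  output logic done
);
  initial forever begin
    done = 1'b0;
    @(posedge clk iff go);      // wait for go
    repeat (N) @(posedge clk);  // one state, counts N clocks
    done = 1'b1;                // pulse done for one cycle
    @(posedge clk);
  end
endmodule
```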
Burst write
After go, drive N consecutive bus writes — w_addr walks 0..N-1 and w_data mirrors the index. The kind of inner loop you'd write to initialise a small register file.
The counter i is a module-level local, so the transpiler exposes it as a real flop you can reference in the body — here driving both w_addr and w_data. The loop body must contain a clock wait on every path, which is why @(posedge clk) sits inside the begin/end.
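A sketch of the burst loop under a few assumptions: w_en is a hypothetical write strobe not named in the text, the bus is 8 bits wide, and N is at most 255 so that the 8-bit counter cannot wrap before the loop exits.

```systemverilog
module burst_write #(
  parameter int N = 8  // assumption: N <= 255 for the 8-bit counter
) (
  input  logic       clk,
  input  logic       go,
  output logic       w_en,    // hypothetical write strobe
  output logic [7:0] w_addr,
  output logic [7:0] w_data
);
  logic [7:0] i;  // module-level local: becomes a real flop

  initial forever begin
    w_en   = 1'b0;
    w_addr = 8'h00;
    w_data = 8'h00;
    @(posedge clk iff go);  // wait for go
    for (i = 0; i < N; i++) begin
      w_en   = 1'b1;
      w_addr = i;           // address walks 0..N-1
      w_data = i;           // data mirrors the index
      @(posedge clk);       // the clock wait every loop path must contain
    end
  end
endmodule
```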
Busy wait
After start, spin while busy_in is high. When it clears, pulse ready for one cycle.
while (cond) @ produces a state that self-loops while the condition holds and falls through when it clears. No counter is generated — the exit condition isn't bounded by an iteration count. To bound it, write while (cond && cyc < TIMEOUT) and pair it with a counter.
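A minimal sketch of the unbounded busy wait, using the signal names from the description and an assumed module name.

```systemverilog
module busy_wait (
  input  logic clk,
  input  logic start,
  input  logic busy_in,
  output logic ready
);
  initial forever begin
    ready = 1'b0;
    @(posedge clk iff start);        // wait for start
    while (busy_in) @(posedge clk);  // self-loop while busy_in is high
    ready = 1'b1;                    // pulse ready for one cycle
    @(posedge clk);
  end
endmodule
```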
Tasks for reuse
A bus write is two cycles: drive bus_req, bus_addr, and bus_data for one clock, then deassert. Doing it twice inline duplicates four lines. Wrap it in a task automatic and call it twice.
Tasks are inlined at compile time with arguments substituted in. The two call sites in the generated FSM appear as states S1..S4 — the same code laid out back to back with the literal arguments in place. Tasks accept input formals only, and the call graph must be acyclic.
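The task-based version might look like the sketch below. The go trigger, the 8-bit widths, and the literal addresses and data values are assumptions for illustration; the state labels in the comments mirror the S1..S4 layout described above.

```systemverilog
module bus_writer (
  input  logic       clk,
  input  logic       go,   // assumption: trigger not named in the text
  output logic       bus_req,
  output logic [7:0] bus_addr,
  output logic [7:0] bus_data
);
  // One two-cycle bus write: assert for a clock, then deassert.
  task automatic bus_write(input logic [7:0] addr, input logic [7:0] data);
    bus_req  = 1'b1;
    bus_addr = addr;
    bus_data = data;
    @(posedge clk);
    bus_req  = 1'b0;
    @(posedge clk);
  endtask

  initial forever begin
    bus_req  = 1'b0;
    bus_addr = 8'h00;
    bus_data = 8'h00;
    @(posedge clk iff go);    // S0: wait for go
    bus_write(8'h10, 8'hA5);  // S1, S2 (hypothetical values)
    bus_write(8'h14, 8'h5A);  // S3, S4 (hypothetical values)
  end
endmodule
```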
Boot init sequence
After boot asserts, write four (addr, data) pairs to the config bus and pulse init_done when the writes finish. The bring-up sequence you'd otherwise build out of a hand-coded state machine plus a counter and a ROM lookup.
The source stays linear top-to-bottom; the generated FSM grows one block per call. Cost scales with the number of phases, not with surrounding control logic.
Init sequences are the second place coroutines pay off, after protocols. The hand-written equivalent needs an explicit counter, an output mux, and a termination test. Here the writes appear in the order they happen on the bus.
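A sketch of such a bring-up sequence follows. The cfg_* port names, the 8-bit widths, the four (addr, data) pairs, and the use of forever @(posedge clk) to idle after completion are all assumptions; the text does not show cocoa's preferred idle idiom.

```systemverilog
module boot_init (
  input  logic       clk,
  input  logic       boot,
  output logic       cfg_req,   // hypothetical config-bus signals
  output logic [7:0] cfg_addr,
  output logic [7:0] cfg_data,
  output logic       init_done
);
  // One config write: assert for a clock, then deassert.
  task automatic cfg_write(input logic [7:0] addr, input logic [7:0] data);
    cfg_req  = 1'b1;
    cfg_addr = addr;
    cfg_data = data;
    @(posedge clk);
    cfg_req  = 1'b0;
    @(posedge clk);
  endtask

  initial begin
    cfg_req   = 1'b0;
    cfg_addr  = 8'h00;
    cfg_data  = 8'h00;
    init_done = 1'b0;
    @(posedge clk iff boot);   // wait for boot
    cfg_write(8'h00, 8'h01);   // the writes appear in bus order
    cfg_write(8'h04, 8'h80);   // (all four pairs are made-up values)
    cfg_write(8'h08, 8'h3F);
    cfg_write(8'h0C, 8'hFF);
    init_done = 1'b1;          // pulse init_done for one cycle
    @(posedge clk);
    init_done = 1'b0;
    forever @(posedge clk);    // idle after completion (assumed idiom)
  end
endmodule
```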
UART transmitter
Send a UART frame: start bit (0), eight data bits LSB-first, stop bit (1). Each bit holds for CLKS_PER_BIT cycles. The body reads top-to-bottom as exactly that — send_bit(0), eight calls for data_reg[0] through data_reg[7], then send_bit(1).
send_bit is called ten times across the frame. Naively inlining would give ten distinct states with nearly identical contents. The transpiler collapses repeated call sites of the same task into a single state controlled by a small program-counter register pc_q. The generated FSM has just two states — S0 (idle) and S1 (sending). S1's tx output is a case (pc_q) mux selecting the bit for each call site. The bit-period delay (repeat (CLKS_PER_BIT) @) lives inside send_bit and produces the inner counter cyc0_q.
That collapse is why the generated module fits on screen. Without it, the generated module would be roughly 5× the size.
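For reference, the transmitter source described above could be sketched like this. The send and data_in ports, the default CLKS_PER_BIT, and the capture of data_in into data_reg at the start of a frame are assumptions; send_bit, data_reg, and the ten-call frame layout come from the description.

```systemverilog
module uart_tx #(
  parameter int CLKS_PER_BIT = 16  // assumed default
) (
  input  logic       clk,
  input  logic       send,     // hypothetical start-of-frame request
  input  logic [7:0] data_in,  // hypothetical byte to transmit
  output logic       tx
);
  logic [7:0] data_reg;

  // Drive one bit and hold it for a full bit period.
  task automatic send_bit(input logic b);
    tx = b;
    repeat (CLKS_PER_BIT) @(posedge clk);
  endtask

  initial forever begin
    tx = 1'b1;                // idle: line high
    @(posedge clk iff send);  // wait for a send request
    data_reg = data_in;       // latch the byte for the whole frame
    send_bit(1'b0);           // start bit
    send_bit(data_reg[0]);    // eight data bits, LSB first
    send_bit(data_reg[1]);
    send_bit(data_reg[2]);
    send_bit(data_reg[3]);
    send_bit(data_reg[4]);
    send_bit(data_reg[5]);
    send_bit(data_reg[6]);
    send_bit(data_reg[7]);
    send_bit(1'b1);           // stop bit
  end
endmodule
```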
Further reading
tests/uart/: adds an RX coroutine running concurrently. Two independent initial blocks share clk and rst; each compiles to its own state machine.
tests/axi_lite_write/: the full AXI-Lite write handshake. Same set of constructs at protocol scale.