Primordial Soup: a Friday-night digital evolution

May 23, 2026 · build, evolution, code

This is a build writeup. Louis wanted a Friday-night fun project on self-improving code and emergent behavior — “something that rolls on its own after we pack it up for the night.” We landed on an Avida-style digital evolution simulator with three twists stacked on top of the canonical experiment.

The result runs locally on one machine, simulates a 120×120 grid of self-replicating bytecode creatures, and starts producing the canonical Lenski result — logical-task evolution — within a few minutes of wall-clock time.

The substrate

The world is a 2D toroidal grid. Each cell either holds one creature or is empty. A creature is a list of bytes (a genome), each byte interpreted modulo 19 as one of 19 opcodes. The opcodes are simple: register arithmetic, NAND, conditional jumps, push/pop, a copy/divide pair for self-replication, and INPUT/OUTPUT for interacting with the environment’s task system.
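
Here is a minimal sketch of the decoding and wrap-around rules described above, not the simulator's actual code. The opcode names and their ordering are hypothetical (the post only fixes the count at 19 and the rough categories), and the 8-cell neighborhood is an assumption.

```python
GRID_W = GRID_H = 120  # grid size from the post

# Hypothetical names and ordering; only the count (19) comes from the post.
OPCODES = (
    "NOP", "INC", "DEC", "ADD", "SUB", "NAND", "SHL", "SHR", "SWAP",
    "JMP", "JZ", "JNZ", "PUSH", "POP",
    "ALLOC", "COPY", "DIVIDE", "INPUT", "OUTPUT",
)
assert len(OPCODES) == 19


def decode(genome: bytes) -> list[str]:
    """Interpret each byte modulo 19, so every byte value is a valid opcode."""
    return [OPCODES[b % len(OPCODES)] for b in genome]


def neighbors(x: int, y: int, w: int = GRID_W, h: int = GRID_H) -> list[tuple[int, int]]:
    """Eight surrounding cells on a torus: coordinates wrap at every edge."""
    return [
        ((x + dx) % w, (y + dy) % h)
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    ]


print(decode(bytes([14, 33, 255])))  # ['ALLOC', 'ALLOC', 'SWAP']
print(neighbors(0, 0)[0])            # (119, 119): the corner wraps around
```

One consequence of the modulo mapping: 256 is not a multiple of 19, so under any ordering the first nine opcodes are reachable from 14 byte values each and the remaining ten from 13. If mutations draw uniform random bytes, that is a slight bias toward the low-numbered opcodes.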

The ancestor is hand-designed: [ALLOC, COPY × 20, DIVIDE]. With the program counter (PC) wrapping at the end of the genome, it replicates itself in about two passes through its code per reproduction cycle. Mutations happen on COPY (per-byte probability ~1%) and as background drift.
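
A rough sketch of the ancestor and the copy-time mutation, under a few assumptions that are not in the simulator's code as described: the opcode numbers are hypothetical, a mutated byte is replaced by a uniformly random byte, and each COPY moves exactly one byte.

```python
import math
import random

ALLOC, COPY, DIVIDE = 14, 15, 16  # hypothetical opcode numbers
COPY_MUTATION_RATE = 0.01         # ~1% per copied byte, from the post


def make_ancestor() -> bytes:
    """[ALLOC, COPY x 20, DIVIDE]: 22 bytes in total."""
    return bytes([ALLOC] + [COPY] * 20 + [DIVIDE])


def copy_with_mutation(genome: bytes, rng: random.Random,
                       rate: float = COPY_MUTATION_RATE) -> bytes:
    """Copy byte by byte; each byte is independently replaced by a
    random byte with probability `rate` (assumed mutation model)."""
    return bytes(rng.randrange(256) if rng.random() < rate else b for b in genome)


ancestor = make_ancestor()
# Assuming one byte per COPY, 20 COPYs per pass cannot cover 22 bytes,
# which is consistent with "about two passes" per reproduction cycle.
passes = math.ceil(len(ancestor) / ancestor.count(COPY))
child = copy_with_mutation(ancestor, random.Random(0))
print(len(ancestor), passes, len(child))  # 22 2 22
```

At a 1% per-byte rate, a 22-byte copy comes out unchanged with probability 0.99^22, roughly 80%, so about one offspring in five carries at least one copy mutation under this model.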

That’s the entire substrate. Everything else is environment.

Three twists

The canonical Avida experiment is interesting but slow. The unconstrained soup grinds for hours before anything visually distinct happens. I stacked three modifications that, in combination, produce visibly interesting dynamics within tens of seconds and the Lenski-style result within minutes.

1. Biomes. The grid has three horizontal bands. The bottom is lush — ambient energy is generous, ancestors thrive, no selection pressure. The middle is moderate — a trickle of ambient energy, not enough alone for reproduction. The top is harsh — zero ambient energy. Survival in the harsh band requires creatures that have evolved to do something other than just replicate.

2. Lenski-style task rewards. When a creature executes the OUTPUT opcode, the environment checks whether its accumulator value matches the output of any of the nine canonical logical functions (NOT, NAND, AND, ORN, OR, ANDN, NOR, XOR, EQU) computed on its recent INPUT values. Matches earn energy, with rewards scaled by complexity. Each creature can claim each task at most once per lifetime — so the reward is a lifetime-of-discovery bonus, not an ongoing income stream. Task rewards are 3× in the harsh biome, 1× in moderate, 0.5× in lush. This makes computation worth more where ambient calories are scarcer.

3. Environmental shocks. Every ~70k ticks, a meteor strikes a random patch and clears it. Every ~140k ticks, a mutation storm raises the base mutation rate 10× for several thousand ticks. The combination produces punctuated equilibrium without me having to design for it.
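
The three twists can be sketched together in a few dozen lines. This is an illustration under stated assumptions, not the simulator's code: values are treated as 32-bit, the bands are equal height with row 0 at the top, the base rewards (1 to 5) are hypothetical stand-ins for "scaled by complexity", an output that matches several tasks claims the first unclaimed one, and shocks are strictly periodic with a 5,000-tick storm. Only the nine task names, the biome multipliers, and the approximate shock periods come from the description above.

```python
GRID_H = 120
MASK = 0xFFFFFFFF  # assumption: 32-bit values

# Task-reward multipliers per biome, from the post.
BIOME_MULTIPLIER = {"harsh": 3.0, "moderate": 1.0, "lush": 0.5}


def biome_for_row(y: int, height: int = GRID_H) -> str:
    """Three horizontal bands; row 0 is the top (harsh) edge."""
    band = height // 3
    if y < band:
        return "harsh"
    if y < 2 * band:
        return "moderate"
    return "lush"


# (name, function of two inputs, base reward). Base rewards are hypothetical.
TASKS = [
    ("NOT",  lambda a, b: ~a & MASK,       1),
    ("NAND", lambda a, b: ~(a & b) & MASK, 1),
    ("AND",  lambda a, b: a & b,           2),
    ("ORN",  lambda a, b: (a | ~b) & MASK, 2),
    ("OR",   lambda a, b: a | b,           3),
    ("ANDN", lambda a, b: a & ~b & MASK,   3),
    ("NOR",  lambda a, b: ~(a | b) & MASK, 4),
    ("XOR",  lambda a, b: a ^ b,           4),
    ("EQU",  lambda a, b: ~(a ^ b) & MASK, 5),
]


def claim_reward(output: int, a: int, b: int, claimed: set[str], biome: str) -> float:
    """Energy earned by an OUTPUT, given the two most recent inputs.

    Each task pays at most once per lifetime: `claimed` is the creature's
    own record and is updated in place.
    """
    for name, fn, base in TASKS:
        if name in claimed:
            continue
        # Try both input orders so asymmetric tasks (NOT, ORN, ANDN) match either way.
        if output in (fn(a, b), fn(b, a)):
            claimed.add(name)
            return base * BIOME_MULTIPLIER[biome]
    return 0.0


METEOR_PERIOD = 70_000   # "~70k ticks" in the post
STORM_PERIOD = 140_000   # "~140k ticks" in the post
STORM_DURATION = 5_000   # "several thousand ticks"; exact value assumed
STORM_FACTOR = 10


def meteor_due(tick: int) -> bool:
    """True on the ticks where a meteor should clear a random patch."""
    return tick > 0 and tick % METEOR_PERIOD == 0


def mutation_multiplier(tick: int) -> int:
    """10 while a mutation storm is active, otherwise 1."""
    in_storm = tick >= STORM_PERIOD and tick % STORM_PERIOD < STORM_DURATION
    return STORM_FACTOR if in_storm else 1


claimed: set[str] = set()
print(claim_reward(0b0110, 0b1100, 0b1010, claimed, "harsh"))  # 12.0 (XOR, 4 x 3)
print(claim_reward(0b0110, 0b1100, 0b1010, claimed, "harsh"))  # 0.0 (already claimed)
```

Passing the creature's own `claimed` set into the check is what keeps the reward a one-time discovery bonus: the second identical OUTPUT earns nothing, so a lineage has to find new tasks or fall back on ambient energy.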

What surprised me

I ran the simulation headless for 3000 ticks as a smoke test. By tick 2500, the task counters were already incrementing: NOT=4, NAND=0, AND=0, ORN=0, OR=2, ANDN=0, NOR=1, XOR=0, EQU=0.

This is significantly faster than I expected. The ancestor has no INPUT or OUTPUT in its genome — for any task to be claimed, a mutation has to introduce those opcodes in an order that produces a matching output. Seven such mutation events happened in fewer than 3000 ticks across a population that grew from 1 to ~6000.

Two readings of this:

The 19-opcode encoding is dense enough that a sizable fraction of random mutations produce some INPUT/OUTPUT combination. With a larger opcode alphabet (Avida runs 26–30), random mutations would more often produce no-ops, and emergence would take longer.
The Lenski tasks reward any match, including the simple ones (NOT on a single input). The complex tasks (XOR, EQU) are much rarer — they didn’t appear in the smoke test at all. Those are the ones that would take an overnight run to see, and they’re the more interesting end of the result.

The dynamics I haven’t verified yet, but expect to see on a longer run: the moderate band gets colonized by mutated lineages that can do simple tasks; complex tasks (XOR, EQU) start ticking up only after several intermediate building blocks have evolved; the harsh band slowly fills with creatures that have stacked enough task-doing to survive there.

The dynamics I’m unsure will appear: meaningful spatial heterogeneity beyond the biome bands. The current setup has no MOVE opcode — creatures spread only by reproduction — and no communication. I’d expect lineages to cluster by color but not to develop interesting cross-creature dynamics.

What this doesn’t do

Worth being explicit about the limits, since digital evolution work often gets oversold:

It does not produce “intelligence” in any meaningful sense. Creatures that evolve XOR have evolved a specific arithmetic pattern that the environment rewards. They have not evolved anything that generalizes outside the reward structure.
It does not produce open-ended novelty. The space of behaviors is fundamentally bounded by what 19 opcodes can express. Once the population has explored most of that space, dynamics flatten.
It does not test theories about real biological evolution. The abstraction throws away too much.

What it does do, which is the part I care about: produce visibly emergent behavior with clear selection pressures, in a system simple enough that you can read all the code in an afternoon. That’s a useful kind of toy.

What I’d try next

A few extensions that would change the system’s character without requiring more compute:

Co-evolutionary opcodes. Add an ATTACK or PARASITIZE opcode that lets a creature steal energy from neighbors or execute instructions from a neighbor’s genome. This is the move that produced the most interesting dynamics in the original Tierra experiments — parasites, immunity, hyper-parasites.
Sexual recombination. Let two adjacent creatures cross over their genomes instead of copying clonally. Should accelerate complex task evolution noticeably.
A more elaborate task ladder. The current 9 tasks are too shallow — once a lineage has them all, there’s no further reward gradient. Adding compound tasks (functions of three or four inputs, sequential tasks that require state) would extend the system’s headroom.
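
Of the three extensions, sexual recombination is the easiest to sketch. The function below is one possible shape for it, assuming single-point crossover between two parents; the name `crossover` and the choice of cut point are hypothetical, and nothing like this exists in the current build.

```python
import random


def crossover(parent_a: bytes, parent_b: bytes, rng: random.Random) -> bytes:
    """Single-point crossover: the head of one parent joined to the tail of the other.

    The cut falls strictly inside the shorter genome, so the child always
    inherits at least one byte from each parent.
    """
    shortest = min(len(parent_a), len(parent_b))
    if shortest < 2:
        return parent_a  # too short to recombine; fall back to a clonal copy
    cut = rng.randrange(1, shortest)
    return parent_a[:cut] + parent_b[cut:]


child = crossover(bytes([1] * 22), bytes([2] * 22), random.Random(0))
print(len(child), child[0], child[-1])  # 22 1 2
```

With equal-length parents the child keeps the parental length. With unequal lengths it takes its length from the second parent, which may or may not be desirable once genomes start to grow and shrink.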

The code is small, ~500 lines of pure Python with pygame for the view. I’ll get it on GitHub and link from this post once I push.

The build itself was about 90 minutes. The most useful thing I learned was the calibration on emergence speed: with a small enough opcode alphabet and an aggressive enough reward gradient, the Lenski result starts happening within minutes, not hours. If you ever want to demonstrate digital evolution to someone in real time, that calibration matters: you don’t need them to wait days.