The Prospero Challenge

2025-03-24 05:04:00
www.mattkeeter.com

The Prospero Challenge

prospero.vm
is a plain-text file containing 7866 math expressions.

It begins as follows:

# Text of a monologue from The Tempest
_0 const 2.95
_1 var-x
_2 const 8.13008
_3 mul _1 _2
_4 add _0 _3

…and continues in that vein for many, many lines.

If you evaluate the expression with (x,y) values in the ±1
square, then color pixels black or white based on their sign, you get the
following image:

The “challenge” is simple: render the image as quickly as possible.

A basic renderer is 28 lines of Python, using Numpy for pixel parallelism:

import numpy as np

with open('prospero.vm') as f:
    text = f.read().strip()

image_size = 1024
space = np.linspace(-1, 1, image_size)
(x, y) = np.meshgrid(space, -space)
v = {}

for line in text.split('\n'):
    if line.startswith('#'):
        continue
    [out, op, *args] = line.split()
    match op:
        case "var-x": v[out] = x
        case "var-y": v[out] = y
        case "const": v[out] = float(args[0])
        case "add": v[out] = v[args[0]] + v[args[1]]
        case "sub": v[out] = v[args[0]] - v[args[1]]
        case "mul": v[out] = v[args[0]] * v[args[1]]
        case "max": v[out] = np.maximum(v[args[0]], v[args[1]])
        case "min": v[out] = np.minimum(v[args[0]], v[args[1]])
        case "neg": v[out] = -v[args[0]]
        case "square": v[out] = v[args[0]] * v[args[0]]
        case "sqrt": v[out] = np.sqrt(v[args[0]])
        case _: raise Exception(f"unknown opcode '{op}'")
out = v[out]

with open('out.ppm', 'wb') as f: # write the image out
    f.write(f'P5\n{image_size} {image_size}\n255\n'.encode())
    f.write(((out

On my machine, this takes about 15 seconds to produce a 1024×1024 image.

(Spoilers: it also allocates 60+ GB of RAM for intermediate
results, so consider reducing the image size before running it on your own
machine)

“But wait!”, you say. “There’s so much obvious room for improvement! You
could pre-parse the expression, or use numba, or run it on
the GPU, or use LLVM …”

Yes. I would like you to do those things, and then tell me about
them!

This page has been seeded with a few of my projects, but I’d like for
it to grow to contain different experiments by other people.

A few ideas to think about:

The Python implementation above is a simple interpreter. What would
an optimized interpreter look like? How about a compiler?
How can we take advantage of the expression’s mathematical
structure?
Start-up time versus steady-state performance. What kinds of
precomputation help, and how long does it take? (Obviously, you
could precompute the entire image, but that’s against the spirit of
the challenge)
Along those lines, what happens if the user is making changes to the
model? Strategies which rely on expensive precomputation may not be
suitable for interactive use.
How does rendering performance change with image size? A 1024×1024
image has 4× as many pixels as 512×512; does it necessarily
take 4× longer to render?
Panning and zooming is a simple transform of the incoming
(x,y) coordinates. How does your system handle
interactive viewing of the model?

Submitting your experiments

Feel free to submit both successful and unsuccessful experiments via
email to

matt.j.keeter@gmail.com,
and I will add them to this page.

The ideal submission would be a blog post or code repository, along
with a few-sentence description that you’d like included on this page. I’d
also be happy to include hero images or video embeds.

The description should call out any particularly promising results,
clever ideas, or interesting tools! Don’t get too fixated on
absolute speed; the goal is to explore ideas, and that can absolutely be
done without breaking any performance records!

Is there a prize?

Besides eternal fame and glory?

If you submit an interesting write-up, I will buy you a coffee or drink
if we happen to be in the same city. It will be particularly easy for
Cambridge / Boston / Somerville residents to receive their prize; I make no
guarantees for submissions from farther afield.

Background material

Interval Arithmetic and Recursive Subdivision for Implicit Functions and
Constructive Solid Geometry (Duff ’92) is a great introduction to common
techniques. In particular, the combination of interval arithmetic, spatial
subdivision, and expression simplification remains the basis for modern
renderers.

These ideas are also presented in §3.7 of my thesis,
Hierarchical Volumetric Object Representations
for Digital Fabrication Workflows (2013),
although my personal implementations have evolved since then (see below for
examples).

Finally, I gave a talk titled
Implicit Surfaces & Independent Research
(2025), which covers all of this material and hews closely to my
current implementations. The audience was CS undergraduates in their senior
year, so it’s intended to be approachable!

Gallery

Massively Parallel Rendering of Complex Closed-Form Implicit Surfaces

Matt Keeter, 2020

Paper homepage

The MPR implementation compiles the expression down to a register-based
bytecode, then executes it with a small interpreter running in a CUDA
kernel. The interpreter evaluates the expression using
interval arithmetic
on many spatial regions in parallel, then simplifies the expression,
subdivides the active regions, and recurses. Below a certain size, we swap
over to per-pixel evaluation, also in parallel on the GPU.

Expression simplification is particularly important for performance: by the
time we evaluate individual pixels, the expression is 200× smaller!

Most of the hard work went into making a deeply recursive algorithm (with
heterogeneous workloads in each branch) into something more GPU-friendly.

Rendering a 1024×1024 image takes 3.9 milliseconds using a Tesla
V100, which was a high-end GPU back in 2020 (and is still absurdly
powerful). Performance is basically flat through 4096×4096, indicating that
we’re far from saturating the GPU for this kind of 2D rendering (the paper
also describes a similar strategy for 3D rendering).

Unfortunately, I don’t have timing numbers for setup time, but the
kernels are generic interpreters, i.e. not specialized to a
particular expression.

Fidget

Matt Keeter, 2025

Project homepage

Fidget uses the same broad strategy of interval evaluation and expression
simplification, but runs on the CPU and compiles expressions down to
machine code with a JIT compiler.

Running on an M1 Max CPU (2021), it renders a 1024×1024 image in
6.3 milliseconds, with about 11 ms of setup time:

$ cargo run -pfidget-cli --release -- render2d -i models/prospero.vm \
    -s1024 --eval=jit -N500 --mode mono
[2025-03-23T21:26:44Z INFO  fidget_cli] Loaded file in 4.994333ms
[2025-03-23T21:26:44Z INFO  fidget_cli] Built shape in 6.04525ms
[2025-03-23T21:26:47Z INFO  fidget_cli] Rendered 500x at 6.331644 ms/frame

Scaling up to 4096×4096, it takes 57 milliseconds, which is about
9× slower; this is sublinear scaling, because it’s 16× more pixels.

There are a few interesting aspects to this project:

The JIT compiler’s architecture is specialized for very fast
compilation, because it’s running many times to evaluate a single
frame!
There’s also a VM interpreter, which is almost as fast: 7.5
ms for a 1024×1024 image
Everything except the JIT compiler can be run in WebAssembly, albeit
with a performance penalty

Prospero challenge, now with more garbage collection

Max Bernstein, 2025

Blog post

Max addresses the “60 GB of intermediate results” issue by adding garbage
collection to the Python interpreter, seeing a quick 4× speedup (and a
dramatic reduction in RAM usage, down to about 1 GB).

The post also shows off CuPy, a drop-in
replacement for Numpy which offloads computation to the GPU. Simply
replacing the import statement yields a further 6.6× speedup!

That’s all for now; but you can help by submitting your own experiments!

Source Link

Keep your files stored safely and securely with the SanDisk 2TB Extreme Portable SSD. With over 69,505 ratings and an impressive 4.6 out of 5 stars, this product has been purchased over 8K+ times in the past month. At only $129.99, this Amazon’s Choice product is a must-have for secure file storage.

Help keep private content private with the included password protection featuring 256-bit AES hardware encryption. Order now for just $129.99 on Amazon!

Start your free Amazon Prime trial
today and unlock unlimited streaming and more!

Help Power Techcratic’s Future – Scan To Support

If Techcratic’s content and insights have helped you, consider giving back by supporting the platform with crypto. Every contribution makes a difference, whether it’s for high-quality content, server maintenance, or future updates. Techcratic is constantly evolving, and your support helps drive that progress.

As a solo operator who wears all the hats, creating content, managing the tech, and running the site, your support allows me to stay focused on delivering valuable resources. Your support keeps everything running smoothly and enables me to continue creating the content you love. I’m deeply grateful for your support, it truly means the world to me! Thank you!

BITCOIN

Bitcoin Logo