2024-12-12 13:41:00
richiejp.com
One Dimensional Reversible Automata
I created a demo for the GFXPrim library. It
implements and displays a nearest-neighbor, one-dimensional, binary
cell automata. Additionally it implements a reversible automata,
which is almost identical except for a small change to make it
reversible. The automata is displayed over time in two dimensions,
time travels from top to bottom. Although in the reversible case
time could be played backwards.
The automata works as follows:
- Each cell has a state, which is on or off, black or white,
boolean etc. - At each time step, the state of a cell in the next step is
chosen by a rule. - The rule looks at a cell’s current value and the values of its
left and right neighbors. - There are 23 = 8
possible state combinations (patterns) for 3 binary cells. - A rule states which patterns result in a black cell in the next
time step. - There are 28 = 256
possible rules. That is, 256 unique combinations of patterns.
So a pattern is a 3 digit binary number, where each digit
corresponds to a cell. The middle digit is the center cell, the high
order bit the left cell, the low order bit the right cell. A rule
can be display by showing a row of patterns and a row of next
states.
Above is rule 110
, 0x6e
or
01101110
. It essentially says to match patterns
110
, 101
, 011
,
010
, 001
. Where a pattern match results in
the cell being set to 1 at the next time step. If no pattern is
matched or equivalently, an inactive pattern is matched, then the
cell will be set to 0.
Again note that each pattern resembles a 3bit binary number. Also
the values of the active patterns resemble an 8bit binary number. We
can use this to perform efficient matching of the patterns using
binary operations.
Let’s assume our CPU natively operates on 64bit integers (called
words). We can pack a 64 cell automata into a single 64bit
integer. Each bit corresponds to a cell. If a bit is 1
then it is a black cell and 0
for white. In this case
we are using integers as bit fields. We don’t care about the integer
number the bits can represent.
The CPU can perform bitwise operations on all 64bits in parallel
and without branching. This means we can perform a single operation
64 times in parallel.
If we rotate (wrapped >>
) all bits to the right by one,
then we get a new integer where the left neighbor of a bit is now in
its position. Likewise if we shift all bits to the left, then we get
an integer representing the right neighbors. This gives us 3
integers where the left, center and right bits are in the same
position. For example, using only 8bits:
left: | 0100 1011 | >> |
center: | 1001 0110 | |
right: | 0010 1101 |
|
Each pattern can be represented as a 3bit number, plus a 4th bit
to say whether it is active in a given rule. As we want to operate
on all 64bits at once in the left, right and center bit fields. We
can generate 64bit long masks from the value of each bit in
a given pattern.
So if we have a pattern where the left cell should be one, then
we can create a 64bit mask of all ones. If it should be
zero, then all zeroes. Likewise for the center and right cells. The
masks can be xor’ed (^
)
with the corresponding cell fields to show if no match occurred.
That is, if the pattern is one and the cell is zero or the cell is
one and the pattern is zero. We can invert this (~
) to
give one when a match occurs.
To see whether all components (left, right, center) of a pattern
matches we can bitwise and (&
) them
together. We can then bitwise or
(|
) the result of the pattern matches together to
produce the final values.
If we wish to operate on an automata larger than 64 cells, then
we can combine multiple integers into an array. After performing the
left and right shifts, we get the high or low bit from the next or
previous integers in the array. Then set the low and high bits of
the right and left bit fields. In other words we chain them together
using the end bits of the left and right bit fields.
For illustration purposes, below is the kernel of the
the automata algorithm.
/* If bit n is 1 then make all bits 1 otherwise 0 */
#define BIT_TO_MAX(b, n) (((b >> n) & 1) * ~0UL)
/* Numeric representation of the current update rule */
static uint8_t rule = 110;
/* Apply the current rule to a 64bit segment of a row */
static inline uint64_t ca1d_rule_apply(uint64_t c_prev,
uint64_t c,
uint64_t c_next,
uint64_t c_prev_step)
{
int i;
/* These are wrapping shifts when c_prev == c or c_next == c */
uint64_t l = (c >> 1) ^ (c_prev 63);
uint64_t r = (c 1) ^ (c_next >> 63);
uint64_t c_next_step = 0;
for (i = 0; i 8; i++) {
uint64_t active = BIT_TO_MAX(rule, i);
uint64_t left = BIT_TO_MAX(i, 2);
uint64_t center = BIT_TO_MAX(i, 1);
uint64_t right = BIT_TO_MAX(i, 0);
|=
c_next_step & ~(left ^ l) & ~(center ^ c) & ~(right ^ r);
active }
/* The automata becomes reversible when we use c_prev_state */
return c_next_step ^ c_prev_step;
}
To make the automata “reversible” an extra step can be added. We
look at a cell’s previous (in addition to the current, left and
right) and if it was one then invert the next value. This
is equivalent to xor’ring the previous value with the next.
It is not entirely clear to me what the mathematical implications
are of being reversible. However it is important to physics and
makes some really cool patterns which mimic nature. Also entropy and
the second rule of themodynamics, yada, yada…
The automata definition is taken from Stephen Wolfram’s “A new
kind of science”. He proposes at least one obvious C implementation using arrays of
cells. He also provides a table of binary expressions for each rule.
E.g. rule 90 reduces to just the l^r
binary expression.
It may be possible for the compiler to automatically reduce my
implementation to these minimal expressions.
To see why, let’s consider rule 90 for each pattern.
01011010 = 90
First for pattern 000
.
& ~(left ^ l) & ~(center ^ c) & ~(right ^ r);
active = 0 & ~(0 ^ l) & ~(0 ^ c) & ~(0 ^ r);
= 0.`
Active is zero so the whole line reduces to zero. Now for
001
. Note that 1
here actually means
~0UL
, that is 64bit integer max.
1 & ~(0 ^ l) & ~(0 ^ c) & ~(1 ^ r);
= ~l & ~c & r.
As expected pattern 001
matches
l=0, c=0, r=1
. Let’s just list the remaining patterns
or’ed together in their reduced state. Then reduce that further.
Note that the for
loop in ca1d_rule_apply
will be unrolled by the compiler when optimising for
performance. It’s also quite clear that c_next_step
is
dependant on an expression from the previous iteration or zero. So
all the pattern match results will get or’ed together.
& c & ~r | l & ~c & ~r | ~l & c & r | ~l & ~c & r;
l = l & ~r | ~l & r;
= l ^ r.
See on the top row that
(l & c & ~r | l & ~c & ~r)
or’s
together c
and not c
. So we can remove it.
Then we get an expression equivalent to xor’ring l
and
r
.
In theory at least, the compiler can see that rule
only has 256 values and create a reduced version of
ca1d_rule_apply
for each value. Whether it actually
does is not of much practical concern when the rendering code is the
bottle neck. However it’s interesting to see if the compiler can
deduce the best solution or whether anything trips it up.
Judging from the disassembly from
gcc -O3 -mcpu=native -mtune=native
, it may actually do
this. Additionally it vectorizes the code packing four
64bit ints at a time into 256bit registers and operating on those. I
don’t know which part of the code it is vectorising or how. It’s
possible that what I think is the rule being reduced is something
related to vectorisation.
To render the automata we take the approach of iterating over
each pixel in the image. We calculate which cell the pixel falls
inside and set the color of the pixel to that of the cell. That’s
it.
/* Draws a pixel */
static inline void shade_pixel(gp_pixmap *p, gp_coord x, gp_coord y,
, gp_pixel fg)
gp_pixel bg{
;
gp_pixel pxsize_t i = (x * (64 * width)) / p->w;
size_t j = (y * height) / p->h;
size_t k = 63 - (i & 63);
uint64_t c = steps[gp_matrix_idx(width, j, i >> 6)];
= BIT_TO_MAX(c, k);
c = (fg & c) | (bg & ~c);
px
(p, x, y, px);
gp_putpixel_raw}
GFXPrim makes drawing very simple. The above code is fast enough
for my purpsoses, but a significant improvement can be had. Integer
division is much slower than floating point multiplication on most
newer CPUs. It’s actually much faster (2x at least) on my CPU to
calculate a pair of ratios in floating point, then convert them back
to integers.
However, you may ask why we are even drawing on the CPU in the
first place? This is because GFXPrim targets embedded systems with
no graphics processor. Additionally the CPU may not even support
floating point natively. So integer division may actually be faster
in this case. Still better would be to limit the size of the pixmap
to be 2x larger
than the dimensions of the automata, where x ∈ ℕ then we can use shifts
instead of division.
Keep your files stored safely and securely with the SanDisk 2TB Extreme Portable SSD. With over 69,505 ratings and an impressive 4.6 out of 5 stars, this product has been purchased over 8K+ times in the past month. At only $129.99, this Amazon’s Choice product is a must-have for secure file storage.
Help keep private content private with the included password protection featuring 256-bit AES hardware encryption. Order now for just $129.99 on Amazon!
Support Techcratic
If you find value in Techcratic’s insights and articles, consider supporting us with Bitcoin. Your support helps me, as a solo operator, continue delivering high-quality content while managing all the technical aspects, from server maintenance to blog writing, future updates, and improvements. Support Innovation! Thank you.
Bitcoin Address:
bc1qlszw7elx2qahjwvaryh0tkgg8y68enw30gpvge
Please verify this address before sending funds.
Bitcoin QR Code
Simply scan the QR code below to support Techcratic.
Please read the Privacy and Security Disclaimer on how Techcratic handles your support.
Disclaimer: As an Amazon Associate, Techcratic may earn from qualifying purchases.