2025-06-23 14:10:00
veitner.bearblog.dev
In this blogpost we will show how to perform very fast calculation of the Fibonacci sequence using GPU programming. In this blogpost we will employ Thrust an NVIDIA library which uses concepts from modern C++ to make GPU programming easy.
Introduction
Scan is one of the fundamental examples for parallelizable algorithms.
If you are interested in the foundations of the algorithm I refer you to a previous blogpost where I implemented scan in CUDA.
The scan is the operation which transforms
to
We call the inclusive iff
and exclusive iff
where is an associative binary operator.
Simple exlusive scan in thrust
This is how to perform an exlusive scan using thrust.
thrust::device_vectorint> in{1,2,3};
thrust::device_vectorint> out(3);
By default we apply the plus operator. Here we use 0
as the initial element .
thrust::exclusive_scan(thrust::device,
in.begin(),
in.end(),
out.begin(),
0
);
This will output
We can similary apply the multiply operator:
thrust::exclusive_scan(thrust::device,
in.begin(),
in.end(),
out.begin(),
1,
[=]__host__ __device__(int cur, int prev) { return cur * prev; }
);
where we use a simple lambda function to define the binary operation.
This will output
Matrix scan operation
Thrust is very flexible and the matrix multiplication is associative. This allows us to perform a scan using matrices!
// 2 x 2 Matrix
// a00, a01, a10, a11
using Matrix = thrust::tupleint, int, int, int>;
...
Matrix scale_two{thrust::make_tuple(2, 0, 0, 2)};
Matrix identity_matrix{thrust::make_tuple(1, 0, 0, 1)};
...
thrust::exclusive_scan(thrust::device,
in.begin(),
in.end(),
out.begin(),
identity_matrix,
matmul_lambda
);
...
This will output
out:
[1, 0, 0, 1]
[2, 0, 0, 2]
[4, 0, 0, 4]
[8, 0, 0, 8]
[16, 0, 0, 16]
At each step our scan operation is simply a multiplication by the diagonal matrix with 2
on the diagonal. The result at step n simply contains the corresponding power of 2
. It is verify on paper that this is what we expect.
The matmul can be easily defined as a lambda as follows:
// Mult the current matrix from left to the previous one as scan op.
auto matmul_lambda = [=]__host__ __device__(const Matrix &x, const Matrix &y) {
auto a00 = thrust::get0>(x); auto a01 = thrust::get1>(x);
auto a10 = thrust::get2>(x); auto a11 = thrust::get3>(x);
auto b00 = thrust::get0>(y); auto b01 = thrust::get1>(y);
auto b10 = thrust::get2>(y); auto b11 = thrust::get3>(y);
auto c00 = a00 * b00 + a01 * b10;
auto c01 = a00 * b01 + a01 * b11;
auto c10 = a10 * b00 + a11 * b10;
auto c11 = a10 * b01 + a11 * b11;
return thrust::make_tuple(
c00,
c01,
c10,
c11
);
};
Fibonacci in terms of matrices
We are now ready to calculate the Fibonacci numbers on GPU.
Note the following identity which you can convince yourself of by calculating it on a piece of paper.
That means we can simply calculate by performing the scan with the matrix and obtain the fibonacci number by taking one of the antidiagonal entries.
const int N = 32;
...
Matrix fibonacci{thrust::make_tuple(1, 1, 1, 0)};
Matrix identity_matrix{thrust::make_tuple(1, 0, 0, 1)};
thrust::device_vectorMatrix> in(N, fibonacci);
thrust::device_vectorMatrix> out(N);
...
thrust::exclusive_scan(thrust::device,
in.begin(),
in.end(),
out.begin(),
identity_matrix,
matmul_lambda
);
...
This will output
out:
0;1;1;2;3;5;8;13;21;34;55;89;144;233;377;610;987;1597;2584;4181;6765;10946;17711;28657;46368;75025;121393;196418;317811;514229;832040;1346269;
which are indeed the correct results which we can verify for example using Wolfram Alpha:
Going large
To calculate extremly large fibonacci numbers we need to avoid integer overflow.
A simple way to avoid that is to just take everything modulo a given number in the matrix multiplication.
We can than verify that we obtain the correct result modulo this number using Wolfram Alpha.
Using the power of GPUs we are able to calculate in only 17 milliseconds on my Consumer GPU NVIDIA GeForce RTX 3060 Mobile
.
You can verify the correctness yourself by checking out my code on Github.
I obtain as a result
Last element of out (F(99999999) mod 9837):
7558
which can again be verified using Wolfram Alpha.
Conclusion
I hoped this blogpost illustrated what a powerful operation Scan is and how easy it is to implement quiet complicated operations using Thrust.
The idea to calculate the Fibonacci numbers using Scan is from an exercise given by Guy Blelloch. I suggest to check this paper out for a deeper dive into the Scan operation.
Keep your files stored safely and securely with the SanDisk 2TB Extreme Portable SSD. With over 69,505 ratings and an impressive 4.6 out of 5 stars, this product has been purchased over 8K+ times in the past month. At only $129.99, this Amazon’s Choice product is a must-have for secure file storage.
Help keep private content private with the included password protection featuring 256-bit AES hardware encryption. Order now for just $129.99 on Amazon!
Help Power Techcratic’s Future – Scan To Support
If Techcratic’s content and insights have helped you, consider giving back by supporting the platform with crypto. Every contribution makes a difference, whether it’s for high-quality content, server maintenance, or future updates. Techcratic is constantly evolving, and your support helps drive that progress.
As a solo operator who wears all the hats, creating content, managing the tech, and running the site, your support allows me to stay focused on delivering valuable resources. Your support keeps everything running smoothly and enables me to continue creating the content you love. I’m deeply grateful for your support, it truly means the world to me! Thank you!
BITCOIN bc1qlszw7elx2qahjwvaryh0tkgg8y68enw30gpvge Scan the QR code with your crypto wallet app |
DOGECOIN D64GwvvYQxFXYyan3oQCrmWfidf6T3JpBA Scan the QR code with your crypto wallet app |
ETHEREUM 0xe9BC980DF3d985730dA827996B43E4A62CCBAA7a Scan the QR code with your crypto wallet app |
Please read the Privacy and Security Disclaimer on how Techcratic handles your support.
Disclaimer: As an Amazon Associate, Techcratic may earn from qualifying purchases.