2024-12-30 11:23:00
blog.lohr.dev
So, initially, I just wanted to see what the smallest binary size for a ‘Hello World’ program written in Rust would be. Why? Out of curiosity – it’s probably just a simple compiler flag anyway, right? Well, turns out there are some that help, but you need a lot more work to get a truly minimal binary. Much of it is not even related to Rust! Of course, there are many drawbacks when optimizing for a minimal executable, but there are valid use cases where space or transfer size is crucial.
As a first step, I want to see what the lowest general limit for a ‘Hello World’ program is. To have the most control and be sure that there is no overhead from a compiler, I will develop it in assembly. With that baseline, I can then compare the resulting binary with one written in Rust (or even Zig and C) in the future.
Let’s first establish some rules for our ‘Hello World’ program:
- The program has to be executable on any modern 64-bit x86 Linux machine
- It should be able to execute directly without passing to any other programs first (so no decompression)
- It should be a ‘proper’ executable binary according to the spec
- It should print ‘Hello World’ to the standard output and exit with code 0 (success)
- Performance does not matter, but it should show the text fast
Now, to write the x86 assembly: A normal ‘Hello World’ program is actually not as trivial as it sounds, since we need to interact with our operating system to print to the terminal. We can craft the syscall ourselves, but typically developers would use libc to call the printf function. However, since we are on our quest for a minimal binary, this won’t be an option: printf does quite a bit more than just print to stdout, and we would have to link against libc, which comes with a lot of overhead!
This is the most minimal assembly I was able to come up with:
msg: db 'Hello, World!', 0xA
global _start
_start:
mov rax, 1 ; syscall: sys_write
mov rdi, 1 ; file descriptor: stdout
lea rsi, [rel msg] ; pointer to message
mov rdx, 14 ; message length
syscall
mov rax, 60 ; syscall: sys_exit
xor edi, edi ; exit code 0
syscall
To give a short explanation: First, we write the bytes of the ‘Hello, World!’ string statically into the assembly, followed by a line feed (0xA). The string is not null-terminated; instead, we pass its length (14 bytes) to the syscall explicitly. We expose our application’s entry point to the linker by defining the label _start as global.
To actually print something, we want to call the sys_write syscall. In the Linux source code, it is defined here as:
SYSCALL_DEFINE3(write, unsigned int, fd, const char __user *, buf, size_t, count)
{
    return ksys_write(fd, buf, count);
}
So it is defined by some C macro and just calls another helper function. The signature will expand to something like:
long sys_write(unsigned int fd, const char __user *buf, size_t count);
The first argument identifies the file descriptor (stdout in our case, which is 1). The second one is a pointer to the data; finally, we have the length of the data.
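To make the mapping of these three arguments a bit more tangible, here is a small illustration in Python (it is not part of the build, just a sketch). os.write is a thin wrapper around the same write syscall; the helper name say_hello is made up for this example. Python derives the count argument from the length of the buffer, so only fd and the data are passed.

```python
import os

# 13 visible characters plus a line feed (0x0A): 14 bytes in total.
MESSAGE = b"Hello, World!\n"


def say_hello(fd: int = 1) -> int:
    """Write MESSAGE to the given file descriptor (1 is stdout).

    os.write(fd, data) issues the write syscall with buf = data and
    count = len(data) and returns the number of bytes actually written.
    """
    return os.write(fd, MESSAGE)


if __name__ == "__main__":
    say_hello()
```

The return value is the number of bytes written, which is also what the syscall leaves in rax in the assembly version.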
We still need the final syscall to exit cleanly with code 0; then we are done. I avoid using the .data section as this will introduce additional section headers, metadata and alignment bytes.
To assemble this, I use the NASM assembler and link with ld:
$ nasm -f elf64 -o hello.o hello.asm
$ ld -o hello hello.o
$ chmod +x hello
Now let’s see if it works and print the size:
$ ./hello
Hello, World!
$ wc -c hello
4728 hello
So 4728 bytes. Not bad, eh? But can we get it smaller!?
Well, first we could use a 32-bit architecture instead because instructions are encoded using fewer bytes, pointers only use 4 bytes and the binary is 4-byte aligned (instead of 8 bytes). I tried it and it got the binary size down to 4548 bytes, a saving of 180 bytes. But remember the rules? We restricted ourselves to a modern 64-bit architecture!
But why do we need so many bytes for something so simple in the first place? Remember the two commands we used to build the executable? Let’s print the size after every step:
$ wc -c hello.o
640 hello.o
$ wc -c hello
4728 hello
WAIT A MINUTE?! Why does our binary get more than 7 times bigger during the linking process with ld? And why do we need to link anyway?!
In short: The GNU linker (ld) takes one or more object files (e.g., hello.o) produced by assemblers and combines them into an executable binary or a shared library (.so). It resolves symbols (like the _start label) and hardcodes their final memory addresses inside the binary. It also lays out code and data in segments and inserts padding bytes so that these segments are aligned for loading into memory. If we were to use shared libraries, it would also set those up for us. Additionally, it keeps a symbol table in the output, which makes debugging easier. Finally, it records the address of the application entry point in the ELF header so the system can directly execute the file. Since this process involves moving around many things, optimizing the input object file (removing padding/alignment bytes or symbols) won’t decrease the file size of the final executable but might even break the linking process – believe me, I tried.
Instead, we can try to remove some of that information from the binary. For example, symbols will help us debug the application – but we don’t need that since our code always works perfectly first try. Let’s have a look at them:
$ nm hello
0000000000402000 T __bss_start
0000000000402000 T _edata
0000000000402000 T _end
0000000000401000 t msg
000000000040100e T _start
Some of them might sound familiar from our assembly code, others are built-ins. Using strip we can get rid of them:
$ wc -c hello
4728 hello
$ strip --strip-all hello
$ wc -c hello
4352 hello
$ ./hello
Hello, World!
Down to 4352 bytes! So we removed a bunch of stuff and the executable still works just fine.
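So where do those 4352 bytes go? One plausible accounting, which is an assumption and not something the tools above report directly: the linker places the code at file offset 0x1000 (one page), followed by the 47 bytes of message and instructions (the size of the flat binary we measure later), a small section name string table, and three section headers of 64 bytes each (the null entry, .text and .shstrtab). A quick sketch of that arithmetic in Python:

```python
PAGE_SIZE = 0x1000        # assumed file offset of the code (one 4 KiB page)
CODE_AND_MESSAGE = 47     # message plus instructions
SECTION_HEADER_SIZE = 64  # size of one ELF64 section header entry

# Section names assumed to remain in .shstrtab after stripping.
section_names = [".shstrtab", ".text"]
# The table starts with an empty name (one NUL byte); each name is NUL-terminated.
shstrtab_size = 1 + sum(len(name) + 1 for name in section_names)


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


# Assumed section headers: the null entry, .text and .shstrtab.
section_header_count = 3

offset = PAGE_SIZE + CODE_AND_MESSAGE + shstrtab_size
offset = align_up(offset, 8)  # the section header table is 8-byte aligned
total = offset + section_header_count * SECTION_HEADER_SIZE

print(shstrtab_size)  # 17
print(total)          # 4352
```

Under these assumptions the numbers add up to exactly the size reported by wc, which suggests that most of the stripped file is padding in front of the code. Running readelf -S hello on a real build would be the way to confirm the section layout.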
This is not bad at all, but to go further we have to understand every single byte of the binary. The format of binaries on Linux is called ELF. But who is this magic elf 🧝? It stands for “Executable and Linkable Format” and describes the format of an assembled binary. You can read through the spec here but I will summarize quickly. The spec contains this nice figure which describes the layout of our binary before (the hello.o file) and after linking:
So while the small binary before linking also adheres to the ELF format, most of the information is only added to the executable. We will have a more detailed look at the header later. The program header table describes the segments, that is, which parts of the file need to be loaded into memory for execution. The section header table describes the sections (such as .text and .data), which matter for linking and debugging. This figure from Wikipedia visualizes the execution view quite well, but you might have to zoom in.
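If you want to poke at these headers yourself, a short Python sketch can decode the fixed-size ELF64 file header. The field names follow the ELF specification; the function name parse_elf64_header is made up for illustration, and the sketch only handles little-endian 64-bit files.

```python
import struct
import sys

# ELF64 file header, little-endian: 16 bytes of e_ident followed by 13 fields.
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")
FIELDS = (
    "e_ident", "e_type", "e_machine", "e_version", "e_entry", "e_phoff",
    "e_shoff", "e_flags", "e_ehsize", "e_phentsize", "e_phnum",
    "e_shentsize", "e_shnum", "e_shstrndx",
)


def parse_elf64_header(data: bytes) -> dict:
    """Decode the first 64 bytes of a little-endian ELF64 file into a dict."""
    if len(data) < ELF64_HEADER.size:
        raise ValueError("too short for an ELF64 header")
    header = dict(zip(FIELDS, ELF64_HEADER.unpack_from(data)))
    ident = header["e_ident"]
    if ident[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic")
    if ident[4] != 2 or ident[5] != 1:
        raise ValueError("only 64-bit little-endian files are handled here")
    return header


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: pass the path of an ELF file, e.g. the hello binary built above.
    with open(sys.argv[1], "rb") as f:
        for name, value in parse_elf64_header(f.read(64)).items():
            print(name, value if isinstance(value, bytes) else hex(value))
```

The interesting fields for our purposes are e_entry (where execution starts), e_phoff and e_phnum (where the program header table lives and how many entries it has), and e_shoff and e_shnum (the same for the section header table).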
We can assemble our executable without the ELF format as a so-called ‘flat binary‘ and get only the bytes for our hello-world code:
$ nasm -f bin -o hello.o hello.asm
$ wc -c hello.o
47 hello.o
$ ld -o hello hello.o
ld:hello.o: file format not recognized; treating as linker script
ld:hello.o:1: syntax error
So the raw assembled binary is now 47 bytes. However, we cannot link and execute it inside of our operating system anymore. This is because we don’t have the ELF header anymore! A flat binary like this is useful if we wanted to build our own BIOS or system kernel.
But in this case, we still want an executable! Since we don’t rely on any of the other features of the linker or ELF format in this case, we can just create the ELF header ourselves. Using assembly we can write the required bytes directly into the code:
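As a sketch of which bytes such a hand-written header has to contain, here is the same idea expressed as a Python script that emits the file directly. This is not the elf.asm used in this post: the load address 0x400000, the single PT_LOAD segment, the function names and the hand-encoded instruction bytes are assumptions made for illustration. The layout is a 64-byte ELF64 header, one 56-byte program header, then the message and the code.

```python
import struct
import sys

BASE = 0x400000                # assumed load address
MESSAGE = b"Hello, World!\n"
EHDR_SIZE, PHDR_SIZE = 64, 56  # ELF64 file header and program header sizes


def build_code(message_len: int) -> bytes:
    """Hand-encoded x86-64 instructions mirroring the assembly shown earlier."""
    head = bytes([0xB8, 1, 0, 0, 0,   # mov eax, 1   (sys_write)
                  0xBF, 1, 0, 0, 0])  # mov edi, 1   (stdout)
    # lea rsi, [rip + disp]: rip is the address right after this 7-byte
    # instruction, and the message sits directly in front of the code.
    disp = -(message_len + len(head) + 7)
    lea = bytes([0x48, 0x8D, 0x35]) + struct.pack("<i", disp)
    tail = (bytes([0xBA]) + struct.pack("<I", message_len)  # mov edx, len
            + bytes([0x0F, 0x05,          # syscall
                     0xB8, 60, 0, 0, 0,   # mov eax, 60  (sys_exit)
                     0x31, 0xFF,          # xor edi, edi (exit code 0)
                     0x0F, 0x05]))        # syscall
    return head + lea + tail


def build_image() -> bytes:
    """Return the complete file: ELF header, program header, message, code."""
    payload = MESSAGE + build_code(len(MESSAGE))
    headers = EHDR_SIZE + PHDR_SIZE
    size = headers + len(payload)
    entry = BASE + headers + len(MESSAGE)  # first instruction after the message

    # Magic, 64-bit class, little-endian, version 1, System V ABI, padding.
    e_ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    ehdr = struct.pack(
        "<16sHHIQQQIHHHHHH",
        e_ident, 2, 0x3E, 1,       # ET_EXEC, x86-64, version 1
        entry, EHDR_SIZE, 0, 0,    # e_entry, e_phoff, e_shoff, e_flags
        EHDR_SIZE, PHDR_SIZE, 1,   # e_ehsize, e_phentsize, e_phnum
        64, 0, 0,                  # e_shentsize, e_shnum, e_shstrndx
    )
    # One PT_LOAD segment (type 1), readable and executable (flags 5),
    # mapping the whole file at BASE.
    phdr = struct.pack("<IIQQQQQQ", 1, 5, 0, BASE, BASE, size, size, 0x1000)
    return ehdr + phdr + payload


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: pass an output path, then chmod +x the resulting file.
    with open(sys.argv[1], "wb") as f:
        f.write(build_image())
```

The image comes out at 64 + 56 + 47 = 167 bytes. On an x86-64 Linux machine the resulting file should be directly executable after chmod +x, but treat this as an illustration of the layout and not as a tested build.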
And now assemble and execute it:
$ nasm -f bin -o elf elf.asm; chmod +x elf
$ ./elf
Hello, World!
$ wc -c elf
167 elf
Great! It took me way too long to get this to work, but I can recommend the documentation on Wikipedia if you want to give it a shot yourself (or just copy my headers).
Finally, we are now down to 167 bytes! Targeting a 32-bit architecture I got it down to 129 bytes. Now, there are some ways we can get this number down even further, but I consider them violating our “according to spec” requirement. For example, not all ELF header bytes are actually used (or the system just might not care), so we could start our program earlier by reusing some of the ELF header bytes, as Brian Raiter demonstrated here. With that technique, we would probably get it below 100 bytes.
What a journey! If you want to go really deep on executables, I can recommend the blog series “Making our own executable packer” by fasterthanlime. Have a great one!