2024-11-18 10:19:00
jvernay.fr
Publication: 2024-11-17
Packing 2D rectangles into bigger fixed-size rectangles is a need for most multimedia projects.
In GPU programming, changing textures (“binding”) is expensive. Thus, when rendering text,
you don’t want to have one texture per glyph. Instead, it is desirable to pack the glyphs in a single texture, called the “atlas”.
In 2D-based games, atlas containing sprites are called spritesheets.
Spritesheets are also used for websites, because a single, bigger file download is better than one file per icon / logo.
Initially, I thought this was quite a specific problem, but it happens to have industrial impacts too.
How many ads can I fit on this newspaper page? How many shapes can I cut in this piece of wood?
How many packages can I fit in the back of a delivery van? Thus, the 2D packing problem has been studied academically too.
The most valuable source I have found is this excellent survey from Jukka Jylänki.
It presents 4 kinds of algorithms, and evaluates them in practice. Two of them stands out:
MAXRECTS
if you know before-hand the rectangles to pack (“offline packings”)SKYLINE
if you don’t know (“online packings”)
Offline packing is more optimal, because sorting the rectangles before packing helps considerably.
It is however impossible for some scenarios. In my scenario, I want to cache the glyphs rasterized from a font,
but I don’t know before-hand all the glyphs which will be used with each font and each format (bold/italic/size…).
That’s why I have settled on the skyline algorithm. That is also what uses stb_rect_pack.h,
fontstash and thus nanovg.
This article explains the skyline algorithm and provides an implementation.
The implementation is available online, in the public domain (UNLICENSE).
The API to be implemented
The API I propose is freestanding, and I’ve tried to be the most minimalist possible.
Thus, the API consists of a single structure and a single function. The caller is responsible to initialize the structure.
struct JvPack2d {
// Give an array capable of holding '2 * maxWidth' uint16_t
uint16_t* pSkyline;
// Size of the atlas.
uint16_t maxWidth;
uint16_t maxHeight;
// Private state to be zero-initialized.
bool _bInitialized;
uint16_t _skylineCount;
};
bool jvPack2dAdd(JvPack2d* pP2D, uint16_t const size[2], uint16_t pos[2]);
maxWidth
and maxHeight
are straight-forward: these are the dimensions of the target rectangle, eg. your texture.
I’ve settled for uint16_t
coordinates, because I aim for GPU textures, which are limited to 16384 pixels per dimension.
The pSkyline
array makes the user responsible for the allocation.
The required capacity 2 * maxWidth
is the worst case scenario if you have 1-pixel large rectangles.
If you want to spare a few kilobytes, and you know (or can ensure) all rectangles’ widths will be multiple of 4,
you can divide all widths and maxWidth
by 4, and multiplying by 4 the horizontal positions given by jvPack2dAdd()
.
How the Skyline algorithm works
The Skyline algorithm packs rectangles from the bottom to the top of the atlas.
It only keeps track of the upper-most position for each column of the atlas.
Conceptually, the algorithm’s state is equivalent to a 1D height map,
vaguely ressembling the silhouette of buildings, thus the algorithm is named “skyline”.
Because the algorithm only manipulates the silhouette, the layering of rectangles
can create untracked gaps, ie. wasted space. This can be mitigated with a freelist of such gaps,
but it requires to compromise on performance.
Then, the algorithm can be divided in two steps:
- We follow the skyline to find the lowest spot where the rectangle can be inserted.
Given the same vertical position, we choose the left-most position. - We update the skyline data structure with the silhouette of the new rectangle.
The data structure
In my implementation, I represent the skyline as an array of coordinates,
each one being the left point of an horizontal segment, sorted from left to right.
In the above image, the skyline is represented by four points:
\[\texttt{skyline} = [ (0,8), (2,5), (5,3), (7,4) ]\]
In C, this array is represented by the buffer pSkyline
which holds alternatively the X and Y positions of the points,
and the number of points stored in the buffer: skylineCount
. There are maxWidth
pixels on the horizontal axis,
thus the skyline may at most have maxWidth
segments, where each segment is 1-pixel long, for instance
in the shape of a castle’s crenellations. Thus, the skyline array contains at most maxWidth
two-dimensional coordinates.
This is why, in the API, pSkyline
must be able to store 2 * maxWidth
numbers.
bool jvPack2dAdd(JvPack2d* pP2D, uint16_t const size[2], uint16_t pos[2])
{
// Makes no sense to pack 0-width/0-height rectangles: early exit.
uint16_t width = size[0];
uint16_t height = size[1];
if (width == 0 || height == 0)
return false;
// Utility structure to make the array indexing more readable.
typedef struct Point {
uint16_t x;
uint16_t y;
} Point;
Point* pSkyline = (Point*)pP2D->pSkyline;
// Initial state contains the bottom-left point.
if (!pP2D->_bInitialized) {
_jvASSERT(pP2D->maxWidth, >, 0);
_jvASSERT(pP2D->maxHeight, >, 0);
pP2D->_bInitialized = true;
pP2D->_skylineCount = 1;
pSkyline[0].x = 0;
pSkyline[0].y = 0;
}
Searching for the insertion spot
The algorithm loops through the skyline in order to find the best candidate location,
ie. the lower row, then the left-most column. It doesn’t need to loop through every column.
It only goes through each point of the skyline array, because of the “left-most column” preference:
if another suitable position is found, a better position is obtained by moving it to the left,
until reaching one of the encoding point.
Given a rectangle of size \((w,h)\) we try to fit for each candidate position \((x,y)\),
we find which skyline points \((x_2,y_2)\) are horizontally overlapping with the rectangle,
ie. \(x \le x_2 , for two reasons:
-
If \(y_2 > y\), the rectangle collides with the skyline. In which case,
the rectangle is raised upper to prevent the collision: \(y \gets y_2\). -
These overlapping points will be removed from the skyline array,
because they will be shadowed by the silhouette of the new rectangle.
In the code, the range of overlapping points for the best candidate so far
is stored as the two indices idxBest
(inclusive) and idxBest2
(exclusive).
These overlapping points are always a consecutive range, because the array
is sorted by increasing horizontal position.
// Aliases to make the code less verbose.
uint16_t maxWidth = pP2D->maxWidth;
uint16_t maxHeight = pP2D->maxHeight;
uint16_t skylineCount = pP2D->_skylineCount;
// Stores the best candidate so far.
uint16_t idxBest = UINT16_MAX;
uint16_t idxBest2 = UINT16_MAX;
uint16_t bestX = UINT16_MAX;
uint16_t bestY = UINT16_MAX;
// Search loop for best location.
for (uint16_t idx = 0; idx maxWidth - x)
break; // We have reached the atlas' right boundary.
if (y >= bestY)
continue; // We will not beat the current best.
// Raise 'y' such that we are above all intersecting points.
uint16_t xMax = x + width;
uint16_t idx2;
for (idx2 = idx + 1; idx2 = bestY)
continue; // We did not beat the current best.
if (height > maxHeight - y)
continue; // We have reached the atlas' top boundary.
idxBest = idx;
idxBest2 = idx2;
bestX = x;
bestY = y;
}
if (idxBest == UINT16_MAX)
return false; // Did not find any space available.
_jvASSERT(idxBest, , 0);
Updating the skyline data structure
One we have found the best position, we need to update the skyline array.
The overlapping points identified in the previous steps are removed,
and replaced by one or two points corresponding to the silhouette of the added rectangle.
The following image illustrates three possible cases.
-
The new rectangle C overlaps two points, which will be removed.
Two points are added: TL (top-left) and BR (bottom-right).
Note that BR is lowered to the same row as the last overlapping point. -
The new rectangle D overlaps a single position (the rectangle’s origin), which will be removed.
The top-left point TL is added. The bottom-right point BR is not added,
because it would correspond to the right boundary of the atlas. -
The new rectangle G overlaps a single position (the rectangle’s origin), which will be removed.
The top-left point TL is added. The bottom-right point BR is not added,
because it would correspond to the same column as the next point in the skyline array,
and we want to maintain at most one point per column, to guarantee worst-case performance and memory usage.
The decision corresponds to the following code.
// We replace the points overshadowed by the current rect, by new points.
uint16_t removedCount = idxBest2 - idxBest;
Point newTL, newBR; // TopLeft, BottomRight
newTL.x = bestX;
newTL.y = bestY + height;
newBR.x = bestX + width;
newBR.y = pSkyline[idxBest2 - 1].y;
bool bBottomRightPoint =
(idxBest2
Then we need to either shrink or expand the array to accomodate the removal and insertion.
With a standard library providing array manipulation utilities like std::vector
from C++,
this can be simply one call to .erase()
, then .insert()
.
But I prefer for this article to express the algorithm independently of a standard library,
thus the implementation shifts elements explicitly:
// Logic for inserting and erasing in an array.
// It can be done in two lines if we use a standard library,
// eg. in C++ it would be std::vector erase() + insert()
// but I preferred the algorithm to be explicit and freestanding.
if (insertedCount > removedCount) {
// Expansion. Shift elements to the right. We need to iterate backwards.
uint16_t idx = skylineCount - 1;
uint16_t idx2 = idx + (insertedCount - removedCount);
for (; idx >= idxBest2; --idx, --idx2)
pSkyline[idx2] = pSkyline[idx];
pP2D->_skylineCount = skylineCount + (insertedCount - removedCount);
}
else if (insertedCount _skylineCount = skylineCount - (removedCount - insertedCount);
}
pSkyline[idxBest] = newTL;
if (bBottomRightPoint)
pSkyline[idxBest + 1] = newBR;
pos[0] = bestX;
pos[1] = bestY;
return true;
}
Results
Here is a GIF animation showing how the skyline algorithm solves the initial problem:
In terms of performance, I don’t have a proper benchmark, but as part of my unit tests,
I’ve written four “worst-case” scenarios where we fill an atlas with different patterns,
and it was easy enough to get timing measurements from these. Technically, it also measures the assertions,
but after profiling the code, the assertions were not part of the hot path, so the measurement seems representative.
The scenarios are parameterized by a single parameter ATLAS_SIZE
.
-
WorstCaseWidth
: the atlas’ size is(ATLAS_SIZE, 2)
, and we insert rectangles of size(1,1)
.
2 * ATLAS_SIZE
rectangles are packed. Requires4 * ATLAS_SIZE
bytes. -
WorstCaseHeight
: the atlas’ size is(2, ATLAS_SIZE)
, and we insert rectangles of size(1,1)
.
2 * ATLAS_SIZE
rectangles are packed. Requires8
bytes. -
WorstCaseDiagonalVertical
: the atlas’ size is(ATLAS_SIZE, ATLAS_SIZE)
,
and we insert 1-width rectangles to form a diagonal pattern.
2 * ATLAS_SIZE - 1
rectangles are packed. Requires4 * ATLAS_SIZE
bytes. -
WorstCaseDiagonalHorizontal
: the atlas’ size is(ATLAS_SIZE, ATLAS_SIZE)
,
and we insert 1-height rectangles to form a diagonal pattern.
2 * ATLAS_SIZE - 1
rectangles are packed. Requires4 * ATLAS_SIZE
bytes.
Here is a comparison of the execution time of the corresponding unit test.
For each row, we double the ATLAS_SIZE
. Thus the number of packed rectangles is also doubled.
The measure was done on a laptop with CPU AMD Ryzen 5 7520U 2.80 GHz,
thus the raw times are not that relevant, but the evolution regarding ATLAS_SIZE
is interesting.
ATLAS_SIZE | WorstCaseWidth | WorstCaseHeight | WorstCaseDiagonalVertical | WorstCaseDiagonalHorizontal |
---|---|---|---|---|
512 | 0.7 ms | 0.0 ms | 0.6 ms | 0.0 ms |
1024 | 2.9 ms | 0.0 ms | 3.0 ms | 0.0 ms |
2048 | 10.0 ms | 0.0 ms | 10.2 ms | 0.0 ms |
4096 | 37.2 ms | 0.1 ms | 39.8 ms | 0.2 ms |
8192 | 122.3 ms | 0.2 ms | 166.7 ms | 0.4 ms |
16384 | 524.4 ms | 0.6 ms | 775.2 ms | 0.6 ms |
32768 | 2322.1 ms | 1.1 ms | 2355.7 ms | 1.1 ms |
65535 | 7935.0 ms | 2.4 ms | 9549.4 ms | 1.9 ms |
-
WorstCaseHeight
andWorstCaseDiagonalHorizontal
are composed of 2 rectangles per line.
The insertion time is fast, and linear inATLAS_SIZE
; the cost per added rectangle is fixed. -
WorstCaseWidth
andWorstCaseDiagonalVertical
are composed of 2 rectangles per column.
The insertion time is much slower, and quadratic inATLAS_SIZE
; the cost per added rectangle is itself linear inATLAS_SIZE
.
Note that the algorithm contains two nested loops, thus the cost per added rectangle could theoretically be quadratic in ATLAS_SIZE
,
but I did not measure it. Without doing a mathematical analysis, my intuition is that the inner loop checking collisions
with the skyline is proportional to the rectangle’s width; but the bigger the width is, the less points
there will be in the skyline, thus the outer loop does less work next time.
My implementation is available in the public domain. If you find it useful, feel free to send me an email or share this article. 😊
Thanks for reading, have a nice day!
Support Techcratic
If you find value in Techcratic’s insights and articles, consider supporting us with Bitcoin. Your support helps me, as a solo operator, continue delivering high-quality content while managing all the technical aspects, from server maintenance to blog writing, future updates, and improvements. Support Innovation! Thank you.
Bitcoin Address:
bc1qlszw7elx2qahjwvaryh0tkgg8y68enw30gpvge
Please verify this address before sending funds.
Bitcoin QR Code
Simply scan the QR code below to support Techcratic.
Please read the Privacy and Security Disclaimer on how Techcratic handles your support.
Disclaimer: As an Amazon Associate, Techcratic may earn from qualifying purchases.