Monday, November 18, 2024

2024-11-18 10:19:00
jvernay.fr

Publication: 2024-11-17

Packing 2D rectangles into bigger fixed-size rectangles is a need for most multimedia projects.
In GPU programming, changing textures (“binding”) is expensive. Thus, when rendering text,
you don’t want to have one texture per glyph. Instead, it is desirable to pack the glyphs in a single texture, called the “atlas”.
In 2D-based games, atlas containing sprites are called spritesheets.
Spritesheets are also used for websites, because a single, bigger file download is better than one file per icon / logo.

Example of 2D packing problem

Initially, I thought this was quite a specific problem, but it happens to have industrial impacts too.
How many ads can I fit on this newspaper page? How many shapes can I cut in this piece of wood?
How many packages can I fit in the back of a delivery van? Thus, the 2D packing problem has been studied academically too.

The most valuable source I have found is this excellent survey from Jukka Jylänki.
It presents 4 kinds of algorithms, and evaluates them in practice. Two of them stands out:

  • MAXRECTS if you know before-hand the rectangles to pack (“offline packings”)
  • SKYLINE if you don’t know (“online packings”)

Offline packing is more optimal, because sorting the rectangles before packing helps considerably.
It is however impossible for some scenarios. In my scenario, I want to cache the glyphs rasterized from a font,
but I don’t know before-hand all the glyphs which will be used with each font and each format (bold/italic/size…).

That’s why I have settled on the skyline algorithm. That is also what uses stb_rect_pack.h,
fontstash and thus nanovg.

This article explains the skyline algorithm and provides an implementation.
The implementation is available online, in the public domain (UNLICENSE).


The API to be implemented

The API I propose is freestanding, and I’ve tried to be the most minimalist possible.
Thus, the API consists of a single structure and a single function. The caller is responsible to initialize the structure.

struct JvPack2d {
    // Give an array capable of holding '2 * maxWidth' uint16_t
    uint16_t* pSkyline;

    // Size of the atlas.
    uint16_t maxWidth;
    uint16_t maxHeight;

    // Private state to be zero-initialized.
    bool _bInitialized;
    uint16_t _skylineCount;
};

bool jvPack2dAdd(JvPack2d* pP2D, uint16_t const size[2], uint16_t pos[2]);

maxWidth and maxHeight are straight-forward: these are the dimensions of the target rectangle, eg. your texture.
I’ve settled for uint16_t coordinates, because I aim for GPU textures, which are limited to 16384 pixels per dimension.

The pSkyline array makes the user responsible for the allocation.
The required capacity 2 * maxWidth is the worst case scenario if you have 1-pixel large rectangles.
If you want to spare a few kilobytes, and you know (or can ensure) all rectangles’ widths will be multiple of 4,
you can divide all widths and maxWidth by 4, and multiplying by 4 the horizontal positions given by jvPack2dAdd().


How the Skyline algorithm works

The Skyline algorithm packs rectangles from the bottom to the top of the atlas.
It only keeps track of the upper-most position for each column of the atlas.
Conceptually, the algorithm’s state is equivalent to a 1D height map,
vaguely ressembling the silhouette of buildings, thus the algorithm is named “skyline”.

Because the algorithm only manipulates the silhouette, the layering of rectangles
can create untracked gaps, ie. wasted space. This can be mitigated with a freelist of such gaps,
but it requires to compromise on performance.

Illustration of the skyline and wasted space

Then, the algorithm can be divided in two steps:

  1. We follow the skyline to find the lowest spot where the rectangle can be inserted.
    Given the same vertical position, we choose the left-most position.
  2. We update the skyline data structure with the silhouette of the new rectangle.

The data structure

In my implementation, I represent the skyline as an array of coordinates,
each one being the left point of an horizontal segment, sorted from left to right.
In the above image, the skyline is represented by four points:

\[\texttt{skyline} = [ (0,8), (2,5), (5,3), (7,4) ]\]

In C, this array is represented by the buffer pSkyline which holds alternatively the X and Y positions of the points,
and the number of points stored in the buffer: skylineCount. There are maxWidth pixels on the horizontal axis,
thus the skyline may at most have maxWidth segments, where each segment is 1-pixel long, for instance
in the shape of a castle’s crenellations. Thus, the skyline array contains at most maxWidth two-dimensional coordinates.
This is why, in the API, pSkyline must be able to store 2 * maxWidth numbers.

bool jvPack2dAdd(JvPack2d* pP2D, uint16_t const size[2], uint16_t pos[2])
{
    // Makes no sense to pack 0-width/0-height rectangles: early exit.
    uint16_t width = size[0];
    uint16_t height = size[1];
    if (width == 0 || height == 0)
        return false;

    // Utility structure to make the array indexing more readable.
    typedef struct Point {
        uint16_t x;
        uint16_t y;
    } Point;
    Point* pSkyline = (Point*)pP2D->pSkyline;

    // Initial state contains the bottom-left point.
    if (!pP2D->_bInitialized) {
        _jvASSERT(pP2D->maxWidth, >, 0);
        _jvASSERT(pP2D->maxHeight, >, 0);
        pP2D->_bInitialized = true;
        pP2D->_skylineCount = 1;
        pSkyline[0].x = 0;
        pSkyline[0].y = 0;
    }

Searching for the insertion spot

The algorithm loops through the skyline in order to find the best candidate location,
ie. the lower row, then the left-most column. It doesn’t need to loop through every column.
It only goes through each point of the skyline array, because of the “left-most column” preference:
if another suitable position is found, a better position is obtained by moving it to the left,
until reaching one of the encoding point.

Illustration of the search for best candidate

Given a rectangle of size \((w,h)\) we try to fit for each candidate position \((x,y)\),
we find which skyline points \((x_2,y_2)\) are horizontally overlapping with the rectangle,
ie. \(x \le x_2 , for two reasons:

  1. If \(y_2 > y\), the rectangle collides with the skyline. In which case,
    the rectangle is raised upper to prevent the collision: \(y \gets y_2\).

  2. These overlapping points will be removed from the skyline array,
    because they will be shadowed by the silhouette of the new rectangle.

In the code, the range of overlapping points for the best candidate so far
is stored as the two indices idxBest (inclusive) and idxBest2 (exclusive).
These overlapping points are always a consecutive range, because the array
is sorted by increasing horizontal position.

    // Aliases to make the code less verbose.
    uint16_t maxWidth = pP2D->maxWidth;
    uint16_t maxHeight = pP2D->maxHeight;
    uint16_t skylineCount = pP2D->_skylineCount;

    // Stores the best candidate so far.
    uint16_t idxBest = UINT16_MAX;
    uint16_t idxBest2 = UINT16_MAX;
    uint16_t bestX = UINT16_MAX;
    uint16_t bestY = UINT16_MAX;

    // Search loop for best location.
    for (uint16_t idx = 0; idx  maxWidth - x)
            break; // We have reached the atlas' right boundary.
        if (y >= bestY)
            continue; // We will not beat the current best.

        // Raise 'y' such that we are above all intersecting points.
        uint16_t xMax = x + width;
        uint16_t idx2;
        for (idx2 = idx + 1; idx2 = bestY)
            continue; // We did not beat the current best.
        if (height > maxHeight - y)
            continue; // We have reached the atlas' top boundary.

        idxBest = idx;
        idxBest2 = idx2;
        bestX = x;
        bestY = y;
    }

    if (idxBest == UINT16_MAX)
        return false; // Did not find any space available.
    _jvASSERT(idxBest, , 0);

Updating the skyline data structure

One we have found the best position, we need to update the skyline array.
The overlapping points identified in the previous steps are removed,
and replaced by one or two points corresponding to the silhouette of the added rectangle.
The following image illustrates three possible cases.

Illustration of the skyline array updates

  1. The new rectangle C overlaps two points, which will be removed.
    Two points are added: TL (top-left) and BR (bottom-right).
    Note that BR is lowered to the same row as the last overlapping point.

  2. The new rectangle D overlaps a single position (the rectangle’s origin), which will be removed.
    The top-left point TL is added. The bottom-right point BR is not added,
    because it would correspond to the right boundary of the atlas.

  3. The new rectangle G overlaps a single position (the rectangle’s origin), which will be removed.
    The top-left point TL is added. The bottom-right point BR is not added,
    because it would correspond to the same column as the next point in the skyline array,
    and we want to maintain at most one point per column, to guarantee worst-case performance and memory usage.

The decision corresponds to the following code.

    // We replace the points overshadowed by the current rect, by new points.

    uint16_t removedCount = idxBest2 - idxBest;

    Point newTL, newBR; // TopLeft, BottomRight
    newTL.x = bestX;
    newTL.y = bestY + height;
    newBR.x = bestX + width;
    newBR.y = pSkyline[idxBest2 - 1].y;
    bool bBottomRightPoint =
        (idxBest2 

Then we need to either shrink or expand the array to accomodate the removal and insertion.
With a standard library providing array manipulation utilities like std::vector from C++,
this can be simply one call to .erase(), then .insert().
But I prefer for this article to express the algorithm independently of a standard library,
thus the implementation shifts elements explicitly:

    // Logic for inserting and erasing in an array.
    // It can be done in two lines if we use a standard library,
    // eg. in C++ it would be std::vector erase() + insert()
    // but I preferred the algorithm to be explicit and freestanding.

    if (insertedCount > removedCount) {
        // Expansion. Shift elements to the right. We need to iterate backwards.
        uint16_t idx = skylineCount - 1;
        uint16_t idx2 = idx + (insertedCount - removedCount);
        for (; idx >= idxBest2; --idx, --idx2)
            pSkyline[idx2] = pSkyline[idx];
        pP2D->_skylineCount = skylineCount + (insertedCount - removedCount);
    }
    else if (insertedCount _skylineCount = skylineCount - (removedCount - insertedCount);
    }

    pSkyline[idxBest] = newTL;
    if (bBottomRightPoint)
        pSkyline[idxBest + 1] = newBR;

    pos[0] = bestX;
    pos[1] = bestY;
    return true;
}

Results

Here is a GIF animation showing how the skyline algorithm solves the initial problem:

Animation of the skyline algorithm applied on the initial problem

In terms of performance, I don’t have a proper benchmark, but as part of my unit tests,
I’ve written four “worst-case” scenarios where we fill an atlas with different patterns,
and it was easy enough to get timing measurements from these. Technically, it also measures the assertions,
but after profiling the code, the assertions were not part of the hot path, so the measurement seems representative.
The scenarios are parameterized by a single parameter ATLAS_SIZE.

Illustration of the four scenarios generated by the unit tests

  1. WorstCaseWidth: the atlas’ size is (ATLAS_SIZE, 2), and we insert rectangles of size (1,1).
    2 * ATLAS_SIZE rectangles are packed. Requires 4 * ATLAS_SIZE bytes.

  2. WorstCaseHeight: the atlas’ size is (2, ATLAS_SIZE), and we insert rectangles of size (1,1).
    2 * ATLAS_SIZE rectangles are packed. Requires 8 bytes.

  3. WorstCaseDiagonalVertical: the atlas’ size is (ATLAS_SIZE, ATLAS_SIZE),
    and we insert 1-width rectangles to form a diagonal pattern.
    2 * ATLAS_SIZE - 1 rectangles are packed. Requires 4 * ATLAS_SIZE bytes.

  4. WorstCaseDiagonalHorizontal: the atlas’ size is (ATLAS_SIZE, ATLAS_SIZE),
    and we insert 1-height rectangles to form a diagonal pattern.
    2 * ATLAS_SIZE - 1 rectangles are packed. Requires 4 * ATLAS_SIZE bytes.

Here is a comparison of the execution time of the corresponding unit test.
For each row, we double the ATLAS_SIZE. Thus the number of packed rectangles is also doubled.
The measure was done on a laptop with CPU AMD Ryzen 5 7520U 2.80 GHz,
thus the raw times are not that relevant, but the evolution regarding ATLAS_SIZE is interesting.

ATLAS_SIZE WorstCaseWidth WorstCaseHeight WorstCaseDiagonalVertical WorstCaseDiagonalHorizontal
512 0.7 ms 0.0 ms 0.6 ms 0.0 ms
1024 2.9 ms 0.0 ms 3.0 ms 0.0 ms
2048 10.0 ms 0.0 ms 10.2 ms 0.0 ms
4096 37.2 ms 0.1 ms 39.8 ms 0.2 ms
8192 122.3 ms 0.2 ms 166.7 ms 0.4 ms
16384 524.4 ms 0.6 ms 775.2 ms 0.6 ms
32768 2322.1 ms 1.1 ms 2355.7 ms 1.1 ms
65535 7935.0 ms 2.4 ms 9549.4 ms 1.9 ms
  • WorstCaseHeight and WorstCaseDiagonalHorizontal are composed of 2 rectangles per line.
    The insertion time is fast, and linear in ATLAS_SIZE; the cost per added rectangle is fixed.

  • WorstCaseWidth and WorstCaseDiagonalVertical are composed of 2 rectangles per column.
    The insertion time is much slower, and quadratic in ATLAS_SIZE; the cost per added rectangle is itself linear in ATLAS_SIZE.

Note that the algorithm contains two nested loops, thus the cost per added rectangle could theoretically be quadratic in ATLAS_SIZE,
but I did not measure it. Without doing a mathematical analysis, my intuition is that the inner loop checking collisions
with the skyline is proportional to the rectangle’s width; but the bigger the width is, the less points
there will be in the skyline, thus the outer loop does less work next time.

My implementation is available in the public domain. If you find it useful, feel free to send me an email or share this article. 😊

Thanks for reading, have a nice day!

Source Link

Support Techcratic

If you find value in Techcratic’s insights and articles, consider supporting us with Bitcoin. Your support helps me, as a solo operator, continue delivering high-quality content while managing all the technical aspects, from server maintenance to blog writing, future updates, and improvements. Support Innovation! Thank you.

Bitcoin Address:

bc1qlszw7elx2qahjwvaryh0tkgg8y68enw30gpvge

Please verify this address before sending funds.

Bitcoin QR Code

Simply scan the QR code below to support Techcratic.

Bitcoin QR code for donations

Please read the Privacy and Security Disclaimer on how Techcratic handles your support.

Disclaimer: As an Amazon Associate, Techcratic may earn from qualifying purchases.

Hacker News

Hacker News

Stay updated with Hacker News, where technology meets entrepreneurial spirit. Get the latest on tech trends, startup news, and discussions from the tech community. Read the latest updates here at Techcratic.

Related Posts

Next Post

Leave a Reply

Your email address will not be published. Required fields are marked *