2024-11-02 15:57:00
railsatscale.com
In 2023, I wrote about how we’ve tuned Ruby’s garbage collector for Shopify’s monolith,
including how we implemented out-of-band garbage collection to reduce the impact of major collection on latency.
While the latency improvements were massive, we weren’t entirely satisfied with the heuristics used to trigger out-of-band
garbage collection. They were purely based on averages, so we had to trade latency for capacity.
More importantly, they didn’t fully eliminate major collection from request cycles; they only made it very rare.
But in December 2023, while discussing this with Koichi Sasada, we came up with a new idea.
Disabling Major GC Entirely
If we want major GC to never trigger during a request cycle, why not disable it entirely?
In March 2024, during our annual Ruby Infrastructure team gathering, we fleshed out the details of the new feature we wanted,
and Matthew Valentine-House started working on a proof of concept, which we then deployed to a small percentage of our production servers to see how effective it could be.
First, we needed a way to entirely prevent the garbage collector from automatically performing a major collection, and
also to stop it from promoting objects to the old generation. Ideally, in a web application, aside from some in-memory caches, no object allocated as part of a request should survive longer than the request itself.
Any object that does is probably something that should be eagerly loaded during boot, or some state that is leaking between requests.
As such, any object promoted to the old generation during a request cycle is very unlikely to be immortal, so promoting it is wasteful.
We also needed a way to ask the GC whether it would have run a major collection, so that we could manually trigger it outside
of the request cycle, and only as often as needed.
After some back and forth with other Ruby committers, it became a single new method: GC.config(rgengc_allow_full_mark: true/false).
We also exposed a new key in GC.latest_gc_info, :need_major_by, for use in checking whether a major GC needs to run: GC.latest_gc_info(:need_major_by).
This new feature was released as part of Ruby 3.4.0-preview2.
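As a minimal sketch of how these two calls fit together outside of any particular web server, the helpers below wrap them with a feature check. The helper names (major_gc_controllable?, disable_automatic_major_gc, major_gc_pending?) are hypothetical and not part of Ruby; only GC.config and GC.latest_gc_info come from the language, and the sketch assumes GC.config is only present on Ruby 3.4 or newer.

```ruby
# Hypothetical helpers, for illustration only.

# GC.config is assumed to exist only on Ruby 3.4+, so feature-detect it.
def major_gc_controllable?
  GC.respond_to?(:config)
end

# Stop the GC from starting major collections on its own.
# Returns true if the setting was applied, false on older Rubies.
def disable_automatic_major_gc
  return false unless major_gc_controllable?

  GC.config(rgengc_allow_full_mark: false)
  true
end

# True when the GC has flagged that it wanted to run a major collection.
def major_gc_pending?
  return false unless major_gc_controllable?

  !GC.latest_gc_info(:need_major_by).nil?
end
```

On a Ruby without GC.config, both helpers simply return false, so calling code can stay the same across versions.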
Effectiveness
Since Shopify’s monolith runs on Ruby’s master branch, we don’t have to wait for the December release to use these new features,
so recently I went to work on enabling the new out-of-band GC implementation on 50% of production servers, and the results are amazing on all metrics.
First, as we anticipated, the time spent in GC during request cycles at the very tail end (p95/p99/p99.99) dropped very significantly.
However, more surprisingly, it also improved median latency.
The overall impact on service latency is of course more modest, but still very nice, with a 5% reduction of average latency and a 10% reduction of p99 latency.
The impact on capacity, however, is less significant than we had hoped for. During the day, when there are frequent deploys, this doesn’t make much of a difference.
However, when deploys pause for a few hours, the new out-of-band collector runs much less often than the old implementation.
Implementation
In addition to being more effective, this new implementation is also radically simple, thanks to the hooks provided by Pitchfork:
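    # pitchfork.conf.rb

    # Once a worker has forked, stop the GC from starting major collections on its own.
    after_worker_fork do |_server, _worker|
      GC.config(rgengc_allow_full_mark: false)
    end

    # Between requests, run the major collection the GC asked for, if any.
    after_request_complete do |_server, _worker, _rack_env|
      if GC.latest_gc_info(:need_major_by)
        GC.start
      end
    end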
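To judge how much work actually moves out of band, it can help to time each manual collection. The following is only a sketch under assumptions: run_out_of_band_major_gc is a hypothetical helper name, and what is done with the measured duration (logging, metrics) is left open. It could be called from the after_request_complete hook in place of the bare GC.start.

```ruby
# Hypothetical helper, for illustration only.
# Runs a major GC if one was requested and returns how long it took in
# seconds, or nil when nothing needed to run (or GC.config is unavailable).
def run_out_of_band_major_gc
  return nil unless GC.respond_to?(:config)
  return nil unless GC.latest_gc_info(:need_major_by)

  started_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  GC.start # full mark by default, so this performs the major collection
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started_at
end
```

Returning nil when no collection ran makes it easy to count how many requests were followed by an out-of-band major GC.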
Next Steps?
Now that major collection is out of the picture, the next step is to look at minor collections.
We can’t disable minor collection, as otherwise large requests that allocate a lot would run out of memory. However, we could
additionally try to use heuristics from GC.stat to eagerly trigger minor garbage collection out-of-band, so that the majority of requests don’t have
to spend any time at all in GC.
But the potential gains are much smaller because minor collection is quite fast even on our monolith.
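One possible shape for such a heuristic is sketched below. Everything in it is an assumption made for illustration: the article does not say which GC.stat keys or thresholds would be used, so the free-slot ratio, the 10% threshold, and the helper names are hypothetical. Only GC.stat and GC.start(full_mark: false) are real Ruby APIs.

```ruby
# Hypothetical threshold: treat the heap as "nearly full" below 10% free slots.
FREE_SLOT_RATIO_THRESHOLD = 0.10

# Guess whether a minor GC is likely to trigger soon, based on the share of
# free slots in the heap. Accepts a stat hash so it can be checked in isolation.
def minor_gc_likely_soon?(stat = GC.stat)
  available = stat[:heap_available_slots]
  return false if available.nil? || available.zero?

  stat[:heap_free_slots].to_f / available < FREE_SLOT_RATIO_THRESHOLD
end

# Run a minor collection between requests when the heuristic fires.
# Returns true if a collection was started, false otherwise.
def maybe_run_minor_gc_out_of_band
  return false unless minor_gc_likely_soon?

  GC.start(full_mark: false) # minor collection only
  true
end
```

A real heuristic would need to be validated against production data, since collecting too eagerly would cost capacity for little latency gain.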