Monday, June 16, 2025
Techcratic

RDNA 4’s “Out-of-Order” Memory Accesses

Hacker News by Hacker News
March 23, 2025
2025-03-23 18:18:00
chipsandcheese.com

AMD’s RDNA 4 brings a variety of memory subsystem enhancements. Among those, one slide stood out because it dealt with out-of-order memory accesses. According to the slide, RDNA 4 allows requests from different shaders to be satisfied out-of-order, and adds new out-of-order queues for memory requests.

AMD apparently had a false dependency case in the memory subsystem prior to RDNA 4. One wave could wait for memory loads made by another wave. A “wavefront”, “wave”, or “warp” on a GPU is the rough equivalent of a CPU thread. It has its own register state, and can run out of sync with other waves. Each wave’s instructions are independent from those in other waves with very few exceptions (like atomic operations).

In RDNA 3, there was a strict ordering on the return of data, such that effectively a request that was made later in time was not permitted to pass a request made earlier in time, even if the data for it was ready much sooner.

Navi 4 Architecture Deep Dive, Andrew Pomianowski, CVP, Silicon Design Engineering (AMD)

A fundamental tenet of multithreaded programming is that you get no ordering guarantees between threads unless you make it happen via locks or other mechanisms. That’s what makes multithreaded performance scaling work. AMD’s slide took me by surprise because there’s no reason memory reads should be an exception. I re-watched the video several times and stared at the slide for a while to see if that’s really what they meant. They clearly meant it, but I still didn’t believe my eyes and ears. So I took time to craft a test for it.

AMD’s slide describes a scenario where one wave’s cache misses prevent another wave from quickly consuming data from cache hits. Causing cache misses is easy. I can pointer chase through a large array with a random pattern (“wave Y”). Similarly, I can keep accesses within a small memory footprint to get cache hits (“wave X”). But doing both at the same time is problematic. Wave Y may evict data used by wave X, causing cache misses on wave X.

Focusing on this scenario, and trying to create a Wave X and Wave Y that might hold each other up

Instead of going for cache hits and misses, I tested by seeing whether waiting on memory accesses in one wave would falsely wait on memory accesses made by another. My “wave Y” is basically a memory latency test, and makes a fixed number of accesses. Each access depends on the previous one’s result, and I have the wave pointer chase through a 1 GB array to ensure cache misses. My “wave X” makes four independent memory accesses per loop iteration. It then consumes the load data, which means waiting for data to arrive from memory.
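As a rough illustration of the pointer-chasing pattern (not the test code used for this article), the sketch below builds an array where each element holds the index of the next element to visit. It uses Sattolo's algorithm, which is an assumption on my part: it guarantees a single cycle, so the chase touches every element before repeating. The names build_chase_array and chase are hypothetical.

```python
import random


def build_chase_array(n, seed=0):
    """Return a list where arr[i] is the next index to visit.

    Sattolo's algorithm yields a permutation with exactly one cycle, so a
    chase starting anywhere visits all n elements before it repeats.
    """
    rng = random.Random(seed)
    arr = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, so an element never maps to itself
        arr[i], arr[j] = arr[j], arr[i]
    return arr


def chase(arr, steps, start=0):
    """Follow the chain; each load address depends on the previous load."""
    idx = start
    for _ in range(steps):
        idx = arr[idx]
    return idx
```

Because every access needs the previous result, the accesses cannot overlap, which is what makes this pattern a latency test. A real GPU test would size the array far beyond cache capacity; the sketch only shows the access pattern.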

Once wave Y completes all of its accesses, it sets a flag in local memory. Wave X makes as many memory accesses as it can until it sees the flag set, after which it writes out its “score” and terminates. I run both waves in the same workgroup to ensure they share a WGP, and therefore share as much of the memory subsystem as possible. Keeping both waves in the same workgroup also lets me place the “finished” flag in local memory. Wave X has to check that flag every iteration, and it’s best to have flag checks not go through the same caches that wave Y is busy contaminating.

If each wave X access gets delayed by a wave Y one, I should see approximately the same number of accesses from both. Instead, on RDNA 3, I see wave X make more accesses than wave Y by exactly the loop unroll factor on wave X. AMD’s compiler statically schedules instructions and sends out all four accesses before waiting on data. It then waits on load completion with s_waitcnt vmcnt(...) instructions.

Annotated RDNA 3 assembly generated by AMD’s compiler for wave X. Note that unrolling the loop to use four memory accesses per iteration lets the compiler issue those four accesses before waiting on them

Accesses tracked by vmcnt always return in-order, letting the compiler wait on specific accesses by waiting until vmcnt decrements to a certain value or lower. In wave Y, I make all accesses dependent so the compiler only waits for vmcnt to reach 0.
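The counter semantics can be sketched with a toy model. This is a simplification under stated assumptions, not AMD's hardware: the class VmcntModel and its methods are hypothetical names, and the model only tracks one wave whose loads return strictly in issue order.

```python
from collections import deque


class VmcntModel:
    """Toy model of one wave's vmcnt counter with in-order returns."""

    def __init__(self):
        self.pending = deque()  # outstanding loads, oldest first

    def issue(self, name):
        self.pending.append(name)  # the counter goes up by one

    @property
    def vmcnt(self):
        return len(self.pending)

    def wait(self, n):
        """Model s_waitcnt vmcnt(n): block until at most n loads are outstanding.

        Returns the loads that had to complete, oldest first.
        """
        done = []
        while len(self.pending) > n:
            done.append(self.pending.popleft())
        return done


wave = VmcntModel()
for name in ["a", "b", "c", "d"]:
    wave.issue(name)
first = wave.wait(3)  # only the oldest load, "a", has to return
rest = wave.wait(0)   # the remaining three loads have to return
```

With four loads in flight, waiting for the counter to reach 3 is enough to use the first load's data, because in-order returns guarantee the oldest load is the one that completed.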

Annotated RDNA 3 assembly for wave Y, for completeness

On RDNA 3, s_waitcnt vmcnt(...) seems to wait for requests to complete not only from its wave, but from other waves too. That explains why wave X makes exactly four accesses for each access that wave Y makes. If I unroll the loop more, letting the compiler schedule more independent accesses before waiting, the ratio goes up to match the unroll factor.

On RDNA 4, the two waves don’t care what the other is doing. That’s the way it should be. RDNA 4 also displays more run-to-run variation, which is also expected because cache behavior is highly unpredictable in this test. I’m surprised by the results, but it’s convincing evidence that AMD indeed had false cross-wave memory delays on RDNA 3 and older GPU architectures. I also tested on Renoir’s Vega iGPU, and saw the same behavior as RDNA 3.

At a simplistic level, you can imagine that requests from the shaders go into a queue to be serviced, and many of those requests can be in flight

Navi 4 Architecture Deep Dive, Andrew Pomianowski, CVP, Silicon Design Engineering (AMD)

AMD’s presentation hints that RDNA 3 and older GPUs had multiple waves sharing a memory access queue. As mentioned above, AMD GPUs since GCN handle memory dependencies with hardware counters that software waits on. By keeping vmcnt returns in-order, the compiler can wait on the specific load that produces data needed by the next instruction, without also waiting on every other load the wave has pending. RDNA 3 and prior AMD GPUs possibly had a shared memory access queue, with each entry tagged with its wave’s ID. As each memory access leaves the queue in-order, hardware decrements the counter for its wave.
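To see why a shared in-order queue would produce the observed ratio, here is a minimal cycle-level sketch. Everything in it is an assumption for illustration: the function simulate, the latencies (100 cycles for a miss, 10 for a hit), and the rule that a wave starts its next iteration as soon as its counter reaches zero. It is not a model of the real hardware.

```python
from collections import deque


def simulate(shared_queue, y_accesses=10, unroll=4,
             miss_latency=100, hit_latency=10):
    """Return how many loads wave X completes before wave Y finishes.

    shared_queue=True : both waves feed one FIFO that retires strictly in order.
    shared_queue=False: each wave has its own FIFO.
    """
    # Per-wave settings: (loads issued per iteration, latency of each load).
    # Wave Y is listed first, so it issues first within a cycle.
    waves = {"Y": (1, miss_latency), "X": (unroll, hit_latency)}
    queue_of = {w: ("shared" if shared_queue else w) for w in waves}
    queues = {q: deque() for q in set(queue_of.values())}
    outstanding = {w: 0 for w in waves}  # per-wave counter, like vmcnt
    completed = {w: 0 for w in waves}

    cycle = 0
    while True:
        # Retire: only the entry at the head of a queue may return data.
        for q in queues.values():
            while q and q[0][1] <= cycle:
                wave, _ = q.popleft()
                outstanding[wave] -= 1
                completed[wave] += 1
        if completed["Y"] >= y_accesses:
            return completed["X"]
        # Issue: a wave starts its next iteration once its counter hits zero.
        for wave, (count, latency) in waves.items():
            if outstanding[wave] == 0:
                for _ in range(count):
                    queues[queue_of[wave]].append((wave, cycle + latency))
                outstanding[wave] = count
        cycle += 1


shared_score = simulate(shared_queue=True)
separate_score = simulate(shared_queue=False)
```

With these made-up numbers, the shared queue lets wave X finish 40 loads while wave Y finishes 10, which is exactly the unroll factor of four per wave Y access. With separate queues, wave X finishes 400. Raising unroll to 8 in the shared case doubles wave X's score, matching the behavior described for RDNA 3.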

Perhaps RDNA 4 divides the shared queue into per-thread queues. That would align with the point on AMD’s slide saying RDNA 4 introduces “additional out-of-order queues” for memory requests. Or perhaps RDNA 4 retains a shared queue, but can drain entries out-of-order. That would require tracking extra info, like whether a memory access is the oldest one for its wave.

Sharing a memory access queue and returning data in-order seems like a natural hardware simplification. That raises the question of whether GPU architectures from Intel and Nvidia had similar limitations.

Intel’s Xe-LPG does not have false cross-wave memory dependencies. Running the same test on Meteor Lake’s iGPU shows variation depending on where the two waves end up. If wave X and wave Y run on XVEs with shared instruction control logic, wave X’s performance is lower than in other cases. Regardless, it’s clear Xe-LPG doesn’t force a wave to wait on another’s accesses. Intel’s subsequent Battlemage (Xe2) architecture shows similar behavior, and the same applies to Intel’s Gen 9 (Skylake) graphics from a while ago.

I also checked generated assembly to ensure Intel’s compiler wasn’t unrolling the loop further.

Generated assembly on Meteor Lake’s iGPU for Wave X. UGM = untyped global memory, SLM = shared local memory. The rest is trivial, just remember that Intel GPUs have registers that are full of registers…never mind

Nvidia’s Pascal has varying behavior depending on where waves are located within an SM. Each Pascal SM has four partitions, which are arranged in pairs that share a texture unit and a 24 KB texture cache. Waves are assigned to partitions within a pair first. It’s as if the partitions are numbered [0,1]-> tex, [2,3]-> tex. Waves in the same sub-partition pair have the false dependency issue. Evidently they share some general load/store logic besides the texture unit, because I don’t touch textures in this test.

If a wave is not offset from another one by a multiple of four or a multiple of four plus one, it doesn’t have the false dependency problem. Turing, as tested on the GTX 1660 Ti, doesn’t have a problem either.

Besides removing false cross-wave delays, AMD also improved memory request handling within a wave. Much like in-order CPU cores, like Arm’s Cortex A510, GPUs can execute independent instructions while waiting on memory access. A thread only stalls when it tries to use the memory access’s result. GPUs have done this for decades, though the implementation details differ. Intel and Nvidia’s GPUs use a software managed scoreboard. AMD used pending request counters from GCN onward.

RDNA 4 uses the same scheme but splits out the vmcnt category into several counters. A thread can interleave global memory, texture sampling, and raytracing intersection test requests, and wait on them separately. That gives the compiler more flexibility to move work ahead of a wait for memory access completion. Another interpretation of AMD’s slide is that each counter corresponds to a separate queue, each of which has out-of-order behavior across waves (but may have in-order behavior within a wave).

Example of RDNA 4 assembly from 3DMark’s raytracing feature test, showing a basic block separately waiting on global memory loads and texture sampling requests issued by other basic blocks

Similarly, lgkmcnt gets separated into kmcnt for scalar memory loads and dscnt for LDS accesses. Scalar memory loads are out-of-order, which means the compiler must wait for all scalar memory loads to complete (kmcnt=0 or lgkmcnt=0) before using results from any pending scalar memory load. On RDNA 4, the compiler can interleave scalar memory and LDS accesses without having to wait for lgkmcnt=0.
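The effect of splitting the counter can be sketched as a small function that lists which pending requests must finish before one result is usable. This is a hedged illustration: must_complete is a hypothetical helper, and it assumes LDS accesses return in order relative to each other, which the text above does not state explicitly.

```python
def must_complete(pending, wanted, split_counters):
    """Return the set of pending requests that finish before `wanted` is usable.

    pending: list of (name, kind) in issue order; kind is "scalar" or "lds".
    split_counters: True models separate kmcnt/dscnt, False a single lgkmcnt.
    """
    kind = dict(pending)[wanted]
    # Requests that share a counter with the wanted one.
    if split_counters:
        group = [(n, k) for n, k in pending if k == kind]
    else:
        group = list(pending)
    # Assumption: a counter is only "in order" if it tracks LDS accesses alone.
    if all(k == "lds" for _, k in group):
        names = [n for n, _ in group]
        return set(names[: names.index(wanted) + 1])
    # Any out-of-order request on the counter forces a wait for zero.
    return {n for n, _ in group}


pending = [("s0", "scalar"), ("l0", "lds"), ("s1", "scalar"), ("l1", "lds")]
combined = must_complete(pending, "l0", split_counters=False)
split = must_complete(pending, "l0", split_counters=True)
```

In this example, using the first LDS result with a combined counter means waiting for all four requests, while the split counters only require the one LDS access to finish.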

Intel and Nvidia’s GPUs use software managed scoreboards. A scoreboard entry can be set and waited on by any instruction, regardless of memory access type. Therefore RDNA 4’s optimization isn’t applicable to those other GPU architectures. A cost to Intel/Nvidia’s approach is that utilizing a big memory request queue would require a correspondingly large scoreboard. AMD can extend a counter by one bit and double the number of queue entries a wave can use.

RDNA 4’s memory subsystem enhancements are exciting and improve performance across a variety of workloads compared to RDNA 3. AMD specifically calls out benefits in raytracing workloads, where traversal and result handling may occur simultaneously on the same WGP. Traversal involves pointer chasing, while result handling might involve more cache friendly data lookups and texture sampling. Breaking cross-wave memory dependencies would prevent different memory access patterns in those tasks from delaying each other.

Likely this wasn’t an issue with rasterization because waves assigned to a WGP probably work on pixels in close proximity. Those waves may sample the same textures, and even take samples in close proximity to each other within the same texture. If one wave misses in cache, the others likely do too.

Breaking up vmcnt and lgkmcnt probably helps raytracing too. Raytracing shaders make BVH intersection and LDS stack management requests during traversal. Then they might sample textures or access global memory buffers during result handling. Giving the compiler flexibility to interleave those request types and still wait on a specific request is a good thing.

Radeon logo on the RX 9070 graciously provided by AMD for review

But RDNA 4’s scheme for handling memory dependencies isn’t fundamentally different from that of GCN many years ago. While the implementation details differ, RDNA 4, GCN, and Intel and Nvidia’s GPUs can all absorb cache misses without immediately stalling a thread. Each GPU maker has improved their ability to do so, whether it’s with more scoreboard tokens or more counters. RDNA 4 indeed can do Cortex A510 style nonblocking loads, but it’s far from a new feature in the world of GPUs.

Resolving false cross-wave dependencies isn’t new either. Nvidia had “out-of-order” cross-wave memory access handling in Turing, and presumably their newer architectures too. Intel had the same at least as far back as Gen 9 (Skylake) graphics. Therefore RDNA 4’s “out-of-order” memory subsystem enhancements are best seen as generational tweaks, rather than new game changing techniques.

Still, AMD’s engineers deserve credit for making them happen. RDNA 4 arguably makes the most significant change to AMD’s GPU memory subsystem since RDNA launched in 2019. I’m glad to see the company continue to improve their GPU architecture and make it better suited to emerging workloads like raytracing.

Source Link



