Monday, May 19, 2025
Techcratic

Diffusion models explained simply | sean goedecke

Hacker News by Hacker News
May 19, 2025
in Hacker News

2025-05-19 09:06:00
www.seangoedecke.com

Transformer-based large language models are relatively easy to understand. You break language down into a finite set of “tokens” (words or sub-word components), then train a neural network on millions of token sequences so it can predict the next token based on all the previous ones. Despite some clever tricks (mainly about how the model processes the previous tokens in the sequence), the core mechanism is relatively simple.

It’s harder to build the same kind of intuition about diffusion models (in part because the papers are much harder to read). But diffusion models are almost as big a part of the AI revolution as transformers. High-quality image generation has driven a lot of user interest in AI, particularly ChatGPT’s recently upgraded image generation.

Even if you don’t care much about images, there are also some fairly capable text-based diffusion models – not yet competitive with frontier transformer models, but it’s certainly possible that we’d someday see a diffusion language model that’s state-of-the-art in its niche.

The core intuition

So what are diffusion models? How are they different from transformers? What is the animating intuition that makes sense of how diffusion models work?

Imagine a picture of a dog. You could slowly add randomly-colored pixels to that picture – the visual equivalent of “white noise” – until it just looks like noise. You could do the same for any possible image. All those possible images look very different, but the eventual noise looks the same. That means that for any possible image, there is a gradient of steps between that image and “pure noise”.

[Image: Gaussian noise]

What if you could train a model to understand that gradient?
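The gradient between an image and noise can be sketched in a few lines of Python. This is a simplified illustration rather than the formulation real diffusion models use: the four-value `image`, the linear mixing rule, and the name `blend_with_noise` are all assumptions made for the example.

```python
import random

def blend_with_noise(pixels, noise, t):
    """Linear mix: t=0.0 returns the image, t=1.0 returns pure noise."""
    return [(1.0 - t) * p + t * n for p, n in zip(pixels, noise)]

rng = random.Random(0)
image = [0.2, 0.8, 0.5, 0.9]                      # hypothetical 4-pixel grayscale "image"
noise = [rng.gauss(0.0, 1.0) for _ in image]      # Gaussian noise, one value per pixel

# Five points along the path from the clean image to pure noise.
steps = [blend_with_noise(image, noise, t / 4) for t in range(5)]
```

The first entry of `steps` is the untouched image and the last is pure noise. Everything in between is a partially noised version of the same picture.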

Training and inference

To train a diffusion model, you take a large set of images, each expressed as a big tensor, and a caption for each image, each expressed as a normal text-model embedding. At each step in the training, for the current image, you add a little bit of random noise. Then you pass that noisy image and caption to the model, and ask it to predict exactly what noise was added to the image (e.g. which pixels changed from what color to what color). Unlike a language model, there are no “tokens” – every model step takes a full image as input and produces a “noise report” as output. Finally, you reward the model based on how close its prediction was to the noise that was actually added.

It’s important to train on noisy images, all the way from a little bit of noise to images that are indistinguishable from static. Typically that’s done by adding increasing amounts of noise to images in the training set during training (on a fixed schedule). Eventually your model gets really good at identifying the last layer of noise, even from images that just look like the “pure noise” image above.
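As a rough sketch of the training data described above, the snippet below builds (noisy image, added noise) pairs at several noise levels and scores a prediction against the true noise. The `schedule` values, the mean-squared-error scoring, and the placeholder predictor are assumptions for illustration. A real model would be a neural network that also receives the caption embedding.

```python
import random

def make_training_example(image, noise_level, rng):
    """Return (noisy_image, added_noise); the added noise is what the model must predict."""
    noise = [rng.gauss(0.0, noise_level) for _ in image]
    noisy = [p + n for p, n in zip(image, noise)]
    return noisy, noise

def mean_squared_error(predicted, target):
    """Average squared gap between the predicted noise and the true noise."""
    return sum((a - b) ** 2 for a, b in zip(predicted, target)) / len(target)

rng = random.Random(0)
image = [0.2, 0.8, 0.5, 0.9]        # hypothetical 4-pixel image
schedule = [0.05, 0.2, 0.5, 1.0]    # hypothetical noise levels, light to heavy

losses = []
for noise_level in schedule:
    noisy, added_noise = make_training_example(image, noise_level, rng)
    # Stand-in for model(noisy, caption_embedding): always guesses "no noise".
    predicted_noise = [0.0] * len(noisy)
    losses.append(mean_squared_error(predicted_noise, added_noise))
```

A perfect prediction scores zero. Training nudges the model's weights so that its score moves toward zero across every noise level in the schedule.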

At inference time, that’s exactly what you do: start with pure noise and a user-provided caption, then run the model to identify the “top” layer of noise. Remove that layer, then keep running the model and removing layers until you’re left with the “original” image. In reality, that image was entirely generated by the model. This process of identifying a layer of noise and reversing it is called “denoising”.
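The denoising loop can be sketched as follows. Note the large assumption: `oracle_noise_predictor` is a hypothetical stand-in that cheats by knowing the target image, whereas a trained model has to infer the noise from the noisy image and the caption alone. The even per-step removal rule is also a simplification.

```python
import random

def oracle_noise_predictor(current, target):
    """Hypothetical stand-in for a trained model: reports all the noise still present."""
    return [c - t for c, t in zip(current, target)]

def denoise(start, target, num_steps):
    """Repeatedly predict the remaining noise and strip away one layer of it."""
    current = list(start)
    for step in range(num_steps):
        predicted = oracle_noise_predictor(current, target)
        remaining = num_steps - step
        # Remove one "layer": an equal share of the noise that is left.
        current = [c - p / remaining for c, p in zip(current, predicted)]
    return current

rng = random.Random(0)
target = [0.2, 0.8, 0.5, 0.9]                        # the image the "model" has in mind
pure_noise = [rng.gauss(0.0, 1.0) for _ in target]   # the starting canvas
result = denoise(pure_noise, target, num_steps=10)
```

After the final step, `result` matches `target` up to floating-point error. The loop itself never sees an original image, only the predictor's noise reports.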

There are lots of tricks that get used in this process, but the two most important ones are variational auto-encoders and classifier-free guidance.

Variational auto-encoders

Expressing an image (or a video) as a big tensor is very expensive. Images have a lot of pixels! In practice, diffusion models operate on a compressed version of each image, kind of like how text models operate on strings of tokens rather than individual letters or bytes. How is that compressed version generated?

Typically with a variational autoencoder (VAE) model that is trained first. That model learns to turn a big image tensor into a smaller random-looking tensor, while still being able to convert it back into the original image. Why use a VAE rather than an existing well-known compression like JPEG?

  • It’s important that the compressed representation be random-looking (i.e. Gaussian-shaped) so the denoising process works properly. JPEG compression is highly structured.
  • The compressed representation must always be the same size, which existing compression algorithms like JPEG don’t guarantee.
  • It’s OK for the VAE to discard some details (e.g. camera noise) that JPEG compression would retain.

So the usual strategy for training and inference is to run a VAE over your image tensor, add noise, denoise on that, and then decode it back to a full-size image. Note that some models, like DALLE-3, don’t use a VAE, but that approach is much slower and more expensive.
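To get a feel for the savings, here is some back-of-the-envelope arithmetic. The shapes are assumptions and do not come from this post: a 512×512 RGB image and a 64×64 latent with 4 channels, which are the kind of sizes used by Stable Diffusion-style latent diffusion models.

```python
def tensor_size(height, width, channels):
    """Number of values in a height x width x channels tensor."""
    return height * width * channels

pixel_values = tensor_size(512, 512, 3)    # full-size RGB image (assumed shape)
latent_values = tensor_size(64, 64, 4)     # compressed VAE latent (assumed shape)
compression_ratio = pixel_values / latent_values

print(pixel_values, latent_values, compression_ratio)   # 786432 16384 48.0
```

Under these assumed shapes, every denoising step touches 48 times fewer values than it would on raw pixels.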

Classifier-free guidance

There’s a common trick to make sure the model is actually learning to generate images based on the caption, instead of just any possible image. During training, you zero out the caption for some images, so the model learns two functions: not just how to remove the noise for a caption, but how to remove the noise for any possible image. During inference, you run once with a caption and once without, and blend the predictions (magnifying the difference between those two vectors). That makes sure the model is paying a lot of attention to the caption.
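Both halves of the trick can be sketched in Python. The function names, the `None` placeholder for a zeroed-out caption, and the example numbers are assumptions for illustration. The blending rule shown, the unconditional prediction plus a scaled difference, is the standard way of magnifying the gap between the two predictions.

```python
import random

def maybe_drop_caption(caption, drop_probability, rng):
    """Training side: occasionally blank the caption so the model also learns unconditional denoising."""
    return None if rng.random() < drop_probability else caption

def guided_prediction(with_caption, without_caption, guidance_scale):
    """Inference side: push the prediction further in the direction the caption points."""
    return [u + guidance_scale * (c - u) for c, u in zip(with_caption, without_caption)]

with_caption = [1.0, 2.0]       # hypothetical noise prediction given the caption
without_caption = [0.5, 1.0]    # hypothetical noise prediction with no caption
guided = guided_prediction(with_caption, without_caption, guidance_scale=3.0)

print(guided)   # [2.0, 4.0]
```

A `guidance_scale` of 1.0 reproduces the captioned prediction unchanged. Larger values exaggerate whatever the caption contributed.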

Key differences from transformers

The fundamental operation here is totally different from transformer-based language models, so many of your intuitions about transformers won’t apply. For instance:

  • At each inference step, transformers keep generating new tokens, while diffusion models go from (for example) one 256×256 pixel image to a different 256×256 pixel image.
  • Transformers start with nothing but the prompt, but diffusion models need a “blank canvas” of pure noise to work from.
  • Transformers don’t “edit” previously generated tokens – once they’re outputted, they’re locked in – but diffusion models can and do change previous output as they go.
  • If you stop a transformer early, you probably don’t get the answer you were looking for. If you stop a diffusion model early, you get a noisy version of the image you wanted.

That last point indicates an interesting capability that diffusion models have: you get a kind of built-in quality knob. If you want fast inference at the cost of quality, you can just run the model for less time and end up with more noise in the final output. If you want high quality and you’re happy to take your time getting there, you can keep running the model until it’s finished removing noise.

Why does it work?

Transformers work because (as it turns out) the structure of human language contains a functional model of the world. If you train a system to predict the next word in a sentence, you therefore get a system that “understands” how the world works at a surprisingly high level. All kinds of exciting capabilities fall out of that – long-term planning, human-like conversation, tool use, programming, and so on.

What is the equivalent animating intuition for diffusion models? I don’t really know, but it’s probably something about the relationship between noise and data – if you can train a system to tell the difference between them, you’re necessarily encoding a model of the world into that system? I bet there’s a much nicer way of articulating this, or a better intuition that could be teased out here.

The same principles that work for images work for other kinds of data: video, audio, and even text.

Diffusion video models

So far this has all been about image diffusion models. What about diffusion models that generate video? As far as I can tell, there are lots of different approaches, but the simplest one is to treat the entire video as a single noisy input. Instead of having your input be a tensor that represents a single picture, your input is a (much larger) tensor that represents all the frames in a video clip. As the model learns to identify noise, it’s also learning how each frame relates to the other frames in the clip (object permanence, cause and effect, and so on).
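A quick size comparison shows how much larger that input gets. The clip length, frame rate, and resolution below are hypothetical values chosen for the example.

```python
def image_tensor_size(height, width, channels=3):
    """Number of values in a single RGB frame."""
    return height * width * channels

def video_tensor_size(seconds, frames_per_second, height, width, channels=3):
    """A clip is one tensor with an extra time axis: frames x height x width x channels."""
    frames = seconds * frames_per_second
    return frames * image_tensor_size(height, width, channels)

single_frame = image_tensor_size(256, 256)        # 196608 values
clip = video_tensor_size(5, 24, 256, 256)         # 120 frames
print(clip, clip // single_frame)                 # 23592960 120
```

Because the whole clip is denoised as one tensor, its size grows in proportion to the clip length.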

I find it very cool that you can run effectively the same approach for video that you do for single images. It suggests that the fundamental mechanism here is very powerful. It also sheds some light on why the current video diffusion models (like OpenAI’s Sora or Google’s Veo) only generate clips and can’t just “keep going” like a text-based transformer model can.

Incidentally, audio generation works the same way, just with a big audio tensor instead of a big video tensor.

Diffusion text models

What about diffusion models that generate text? Text-based diffusion models are really strange, because you can’t just add noisy pixels to text in the same way that you can to images or video. The main strategy seems to be adding noise to the text embeddings. At inference time, you start with a big block of pure-noise embeddings (presumably just random numbers) then denoise until it becomes actual decodable text.

How do you turn embeddings back into text? There’s no obvious way. If you just try and look up the “closest” token to each embedding, you often end up with gibberish. If you use a separate decoder model to translate the embeddings, that works but feels a bit like cheating – at that point your diffusion model is really just generating a plan for your real text-generation model.
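The "closest token" lookup can be sketched like this. The three-word vocabulary and its two-dimensional embeddings are made up for the example. Real embedding tables have tens of thousands of tokens and hundreds or thousands of dimensions.

```python
import math

# Hypothetical token embeddings (2-D so the geometry is easy to picture).
VOCAB = {"cat": (1.0, 0.0), "dog": (0.0, 1.0), "the": (-1.0, 0.0)}

def nearest_token(vector):
    """Return the vocabulary token whose embedding is closest in Euclidean distance."""
    return min(VOCAB, key=lambda token: math.dist(VOCAB[token], vector))

def decode(embeddings):
    """Snap each denoised embedding to its nearest token."""
    return [nearest_token(vector) for vector in embeddings]

denoised = [(-0.9, 0.1), (0.8, 0.2), (0.45, 0.55)]   # pretend output of a denoising run
print(decode(denoised))   # ['the', 'cat', 'dog']
```

The third vector sits almost midway between "cat" and "dog" but still gets snapped to a single token. With a large vocabulary, many such forced choices in a row are one way the output can drift into gibberish.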

Summary

  • Diffusion models are trained to identify small amounts of noise in images, based on a caption embedding
  • That means you can start with pure noise and a user-provided caption and just keep chipping away layers of noise until you get to what the model thinks the original image should look like
  • The operating model is very different from transformers: not sequence-based, operates on previous outputs, and can in principle be sped up or stopped early
  • Video diffusion works the same way as image diffusion, but it’s harder for the model to learn because it requires tracking consistency over time
  • Text diffusion is weird because you can’t easily add noise to language, and if you convert to embeddings before adding noise it’s hard to reliably convert back

