Sunday, June 15, 2025
Techcratic

Visual Reasoning is Coming Soon

by Hacker News
April 9, 2025
in Hacker News
Reading Time: 13 mins read

2025-04-09 11:58:00
arcturus-labs.com

I gotta say, I love living in exponential times. I can just wish that something existed, and within a month it does! This time it happened with OpenAI’s 4o image generation release. In this blog post I’ll briefly cover the release and why I think it’s pretty cool. Then I’ll dive into a new opportunity that I think is even more exciting: visual reasoning.


Rather watch than read?

Hey, I get it – sometimes you just want to kick back and watch! Check out this quick video where I walk through everything in this post. Same great content, just easier on the eyes!

Why Image Manipulation with LLMs Stinks

Working with images in multimodal LLMs has been a mostly one-sided affair. On the one hand, it’s really cool that you can drop an image into an LLM conversation and get the model to reason about it. On the other hand, when you ask the model to generate an image, there is a disconnect: all the model can do is describe the image in text and then call an external image generation tool to render an image from that text. Text is a poor medium for communicating images, and the result is often far from what you expected, because the short description the LLM hands to the image generation tool rarely captures the full context of the conversation.

The problem is most pronounced when you go back and forth refining an idea for an image. You can show the LLM an image of your cat and then say “make this cat wear a detective hat and a monocle”. The best the model can do is put a detective hat and monocle on some cat, not the one in your image. To make matters worse, the model can’t even see the image it has just created. So if you ask for a modification to the first attempt, each subsequent generation is really just starting over from scratch, hoping that a more detailed description for the image generation tool will make things better… it won’t.

Left: OpenAI’s pet cat. Right: The best a traditional LLM can do when asked to give it a detective hat and monocle.

OpenAI’s Release Brings True Image Manipulation to LLMs

OpenAI’s image generation release changes all of this. The newest GPT-4o model carries the full context of the conversation into image generation, and rather than calling out to an external model, the very same GPT-4o model generates the image. Because the full context of the conversation, including previous images, is carried into the generation of each new image, you can generate much more consistent imagery.

Thus, stealing OpenAI’s example, you can show it a photo of your cat, tell it to give your cat a detective hat, et voilà! It works!

Left: OpenAI’s pet cat. Right: GPT-4o’s generation when asked to give the same cat a detective hat and monocle. Much more consistent!

Now if all we could do with this technology is stick silly costumes on our cats, this would honestly be a pretty lame achievement. But think about what we’re angling for here. You’ll be able to turn hand-drawn garbage into sophisticated infographics and marketing material. You’ll be able to take crappy matplotlib charts and reformat them as professional-looking plots for your slides. You’ll be able to try on clothes virtually before you buy them. You’ll be able to turn your photos into collages, game assets, and 3D models. You’ll be able to edit your vacation photos to remove that one person who blinked. You’ll be able to take a photo of your living room and try out different furniture arrangements and color schemes. You’ll be able to take a photo of your garden and get suggestions for where to plant new flowers. The possibilities are endless when you can manipulate images with the full context of a conversation.

Visual Reasoning will be the Real Innovation

All of the above is but a prologue to what comes next. Models perform better when they take time to reason, because models have no internal monologue: whereas humans pause and think about a question before answering, models have to generate tokens in order to think. Until now, though, models have only been able to reason in the space of words, and words are ill-suited to reasoning through many kinds of real-world problems. Soon, the models’ new image manipulation capabilities will power a new form of reasoning. Right now this ability is mostly useful for image editing, but soon models will be able to visualize different scenarios and reason visually about the world around them.

Demoing Visual Reasoning

Let’s try out an example right now and get a glimpse of how visual reasoning will work. For our experiment we’ll use Matthew Berman’s marble in a cup problem – it’s one of his go-to reasoning challenges when playing around with a new model:

I’ve put a marble in a glass. The glass is then turned upside down and put on a table. Then the glass is picked up and put into the microwave. Where is the marble?

It’s a great problem because it requires common-sense knowledge of how objects work in the world, and it’s especially suited to this post because it requires spatial reasoning to extrapolate to a new scenario. I asked the new GPT-4o a variant of this problem in text:

I set a plate on a table, and glass next to it. I set a marble on the plate. Then I pick up the marble, drop it in the glass. Then I turn the glass upside down and set it on the plate. Then, I pick up the glass and put it in the microwave. Where is the marble?

I’ve added an extra item, a plate, for just a bit more complexity. Here’s the answer it gave me:

The marble is in the microwave, inside the glass.

Here’s a breakdown of what happened:

You placed a marble on the plate.

You picked up the marble and dropped it in the glass — now the marble is in the glass.

You turned the glass upside down and set it on the plate — unless the glass is sealed or the marble is somehow stuck, the marble would fall out onto the plate.

Then you picked up the glass (now empty) and put it in the microwave.

So at this point:

The glass is in the microwave.

The marble is on the plate, outside the microwave.

But you asked: Where is the marble? You didn’t say you picked it up again, so based on what you said, the marble is on the plate, not in the microwave.

Want to run it back or add any twists?

It’s actually a pretty weird answer. It immediately answers (incorrectly), then performs post-hoc chain-of-thought reasoning, arrives at the correct answer, and ignores the fact that it has completely contradicted itself.
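For contrast, the ground truth for this puzzle is easy to compute mechanically. The following is a minimal illustrative sketch (not part of the original experiment) that tracks the marble through each step of the plate variant. It assumes a deliberately simple world model: a marble inside an upright glass moves with the glass, and lifting an upside-down glass leaves the marble behind on the plate. The `Scene` class, the `apply` function, and the action strings are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    """Toy state for the plate/glass/marble puzzle (an illustrative model only)."""
    marble: str = "plate"  # one of "plate", "hand", "glass", "under glass"
    glass: str = "table"   # where the glass currently is


def apply(scene: Scene, action: str) -> None:
    """Update the scene in place for one action from the prompt."""
    if action == "pick up marble":
        scene.marble = "hand"
    elif action == "drop marble in glass":
        scene.marble = "glass"
    elif action == "turn glass over onto plate":
        scene.glass = "plate"
        if scene.marble == "glass":
            scene.marble = "under glass"  # trapped between the glass and the plate
    elif action == "pick up glass":
        scene.glass = "hand"
        if scene.marble == "under glass":
            scene.marble = "plate"  # an upside-down glass cannot carry the marble
    elif action == "put glass in microwave":
        scene.glass = "microwave"
    else:
        raise ValueError(f"unknown action: {action}")


def marble_location(scene: Scene) -> str:
    """A marble inside an upright glass is wherever the glass is."""
    return scene.glass if scene.marble == "glass" else scene.marble


scene = Scene()
for action in ["pick up marble", "drop marble in glass", "turn glass over onto plate",
               "pick up glass", "put glass in microwave"]:
    apply(scene, action)
    print(f"{action:28} -> marble: {marble_location(scene)}")
```

Run as-is, the last line reports the marble on the plate. The point is not that a hand-written rule system is smarter than the model, but that puzzles like this have outcomes that can be checked automatically.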

Let’s now try a vision-enhanced version of this chain-of-thought reasoning and see how spatial reasoning might improve the model’s reasoning capabilities.

(Image: problem statement)

I prompted it with the following (including extra requirements to think visually):

I’m going to pick up the marble, put it in the glass, turn the glass over onto the plate, pick the glass up, and then put the glass in the microwave.

The question is, where is the marble?

Solve this one step at a time using images to imagine what happens at each step. For each step, first state the step, and then generate an image corresponding to what the scene will look like after that step. And then summarize the image in English, highlighting the important aspects for solving the problem.

Here are the reasoning responses from the model:

(Image: Step 1)

Step 1: Pick up the marble.

In the image, the marble has not yet been moved, but we are about to pick it up and place it in the glass.

(Image: Step 2)

Step 2: Put the marble in the glass.

In the image, the marble is now inside the glass, and the plate is empty.

(Image: Step 3)

Step 3: Turn the glass over onto the plate.

In the image, the glass is now upside down on the plate, trapping the marble inside.

(Image: Step 4)

Step 4: Pick the glass up.

In the image, the hand is lifting the glass off the plate. The marble remains on the plate, as it is no longer contained.

(Image: Step 5)

Step 5: Put the glass in the microwave.

In the image, the glass is now inside the microwave with the marble at the bottom.

Final answer: The marble is inside the microwave, at the bottom of the glass.

So… uh… notably, we made some amazingly consistent images, and then completely got the answer wrong. And we were so close! At Step 4, the model acknowledged that the marble was no longer contained in the glass. Unfortunately, by the time the glass was in the microwave, the model imagined that the marble had somehow followed it.

Training Models to Think Visually

I think the model fails to use visual reasoning correctly because it hasn’t yet been trained for visual reasoning.

Who knows how OpenAI trained these models, but based on their announcement post, the training to date is almost certainly unrelated to visual reasoning. The examples in that post, while extremely impressive, are about better text rendering and better instruction following, and all of the instruction-following examples involve image manipulation rather than reasoning. Instead, we need to start training models specifically to perform visual reasoning.

For visual reasoning practice, we can do supervised fine-tuning on sequences similar to the marble example above. For instance, to teach the model more about the physical world, we can show it sequential pictures of Slinkys going down stairs, basketball players shooting 3-pointers, or people hammering birdhouses together. We can make the model more socially aware by letting it see sequences of text and images from social interactions and having it predict the body language and facial expressions in the next scene. We can train for spatial reasoning by having models solve tangrams or visualize what a contraption will look like when rotated by 90 degrees. If you think about it for long, it’s easy to come up with tons of ideas like these; visual reasoning is central to most of our lives.
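To make the rotation idea a little more concrete, here is a minimal sketch that generates spatial-reasoning practice items whose answers are computed, and therefore checkable. It assumes a small text grid as a stand-in for a rendered image of the contraption; the record fields (`scene`, `instruction`, `target`) and the function names are hypothetical choices for illustration.

```python
import random


def rotate_clockwise(grid: list[str]) -> list[str]:
    """Rotate a rectangular character grid 90 degrees clockwise."""
    # Row i of the result is column i of the original, read from bottom to top.
    return ["".join(row[i] for row in reversed(grid)) for i in range(len(grid[0]))]


def make_example(rng: random.Random, size: int = 4) -> dict:
    """Build one training record: a random shape, an instruction, and the exact answer."""
    grid = ["".join(rng.choice("#.") for _ in range(size)) for _ in range(size)]
    return {
        "scene": grid,
        "instruction": "Rotate this shape 90 degrees clockwise.",
        "target": rotate_clockwise(grid),
    }


def is_correct(prediction: list[str], example: dict) -> bool:
    """The answer is computed, so grading a model's prediction is an exact match."""
    return prediction == example["target"]


rng = random.Random(0)
example = make_example(rng)
print("\n".join(example["scene"]))
print()
print("\n".join(example["target"]))
```

In a real pipeline the grid would presumably be rendered as an image, but the key property carries over: every example comes with an answer that can be verified without a human.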

But where will we get all this training data? For spatial and physical reasoning tasks, we can leverage computer graphics to generate synthetic data. This approach is particularly valuable because simulations provide a controlled environment where we can create scenarios with known outcomes, making it easy to verify the model’s predictions. But we’ll also need real-world examples. Fortunately, there’s an abundance of video content online that we can tap into. While initial datasets might require human annotation, soon models themselves will be able to process videos and their transcripts to extract training examples automatically. Consider all the how-to videos online – creators typically explain what they’re going to do before demonstrating it. We can use these to create training pairs: given the current scene and verbal description, have models predict what happens next.
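As a rough illustration of the simulation idea, the sketch below drops a ball, records its trajectory, and slices it into prediction pairs: given a few recent frames, predict a later one. It assumes a one-dimensional bouncing ball integrated with semi-implicit Euler steps, with numeric frames standing in for rendered images; all names and parameter values (restitution 0.7, 10 ms steps) are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    t: float   # time in seconds
    y: float   # height of the ball above the floor, in metres
    vy: float  # vertical velocity in m/s (positive is up)


def simulate_drop(height: float, restitution: float = 0.7, dt: float = 0.01,
                  steps: int = 300, g: float = 9.81) -> list[Frame]:
    """Drop a ball from rest and let it bounce (semi-implicit Euler integration)."""
    y, vy = height, 0.0
    frames = [Frame(0.0, y, vy)]
    for i in range(1, steps + 1):
        vy -= g * dt   # update velocity first...
        y += vy * dt   # ...then position, using the new velocity
        if y < 0.0:    # hit the floor: clamp and bounce, losing some energy
            y = 0.0
            vy = -vy * restitution
        frames.append(Frame(i * dt, y, vy))
    return frames


def make_pairs(frames: list[Frame], context: int = 5,
               stride: int = 10) -> list[tuple[list[Frame], Frame]]:
    """Pair `context` frames (sampled every `stride` steps) with the next sampled frame."""
    pairs = []
    for end in range((context - 1) * stride, len(frames) - stride, stride):
        history = frames[end - (context - 1) * stride:end + 1:stride]
        pairs.append((history, frames[end + stride]))
    return pairs


frames = simulate_drop(height=1.0)
pairs = make_pairs(frames)
history, target = pairs[0]
print(f"{len(pairs)} pairs; first target at t={target.t:.2f}s, y={target.y:.3f}m")
```

Because the simulator knows the true next frame, a model's prediction can be scored automatically, for example by its error in `y`.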

As an aside, I think that videos are going to be our richest source of new training data, and we’re only now starting to really tap into it. But who owns most of the video content on the web? YouTube… which is to say, Google. I’d say that bodes really well for the future of the Gemini models. Wouldn’t you?

From Chain-of-Thought to Reasoning Models

As early as 2022, we had recognized that LLMs benefit from chain-of-thought reasoning. If a model was making a snap judgment and rushing to an incorrect solution, it was commonplace to add “let’s think step by step”, hope for a better output, and often get it. Soon, deeper reasoning approaches were discovered. In particular, Tree of Thoughts had the model think through multiple possible approaches to a problem and rank them by perceived feasibility, while the surrounding application steered the model through the most promising paths toward a solution.
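The search loop behind Tree of Thoughts can be sketched in a few lines. In the real method, both proposing the next “thoughts” and rating how promising they look are done by prompting the LLM; here, to keep the example self-contained, both are replaced by toy functions on a made-up puzzle (reach a target number using +3, *2 and -1). The beam-limited breadth-first search shown is one of the search strategies the Tree of Thoughts paper describes; the function names are hypothetical.

```python
import heapq
from typing import Optional

# A state is (current value, moves taken so far).
State = tuple[int, tuple[str, ...]]


def propose(state: State) -> list[State]:
    """Stand-in for the LLM proposing next thoughts: three possible moves."""
    value, path = state
    return [(value + 3, path + ("+3",)),
            (value * 2, path + ("*2",)),
            (value - 1, path + ("-1",))]


def evaluate(state: State, target: int) -> float:
    """Stand-in for the LLM rating a partial solution: closer to the target is better."""
    return -abs(target - state[0])


def tree_of_thoughts_bfs(start: int, target: int, beam: int = 3,
                         depth: int = 8) -> Optional[tuple[str, ...]]:
    """Expand every frontier state, keep the `beam` best-rated children, repeat."""
    frontier: list[State] = [(start, ())]
    for _ in range(depth):
        candidates = [child for state in frontier for child in propose(state)]
        for value, path in candidates:
            if value == target:  # the goal check plays the role of a verifier
                return path
        frontier = heapq.nlargest(beam, candidates, key=lambda s: evaluate(s, target))
    return None


print(tree_of_thoughts_bfs(start=2, target=11))
```

Pruning to the best few candidates at each level is what keeps the tree from exploding; the price is that a promising-looking but wrong branch can crowd out the right one.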

With OpenAI’s o1 model and several models that followed soon thereafter, this sort of tree-of-thoughts reasoning is baked into the model itself. Before providing a final solution, the model may now think through the problem within special “thinking” tags. According to DeepSeek (which has been much more transparent than OpenAI), their R1 model was trained through a form of reinforcement learning on verifiable problems, so that the model is rewarded for efficiently thinking its way to the verifiably correct solution. Interestingly, and reminiscent of tree-of-thoughts, the models can think themselves into a corner, recognize that they have messed up, backtrack, and find a better course toward solving the problem.
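To make “verifiable” concrete: when the correct answer is known, the reward can be computed by a simple rule rather than a learned judge. The sketch below is loosely modeled on the rule-based accuracy and format rewards described in the DeepSeek-R1 report, but the tag handling, string normalization, and weights (0.1 for format, 1.0 for accuracy) are assumptions for illustration, not DeepSeek’s implementation.

```python
import re

# Assumed response format: reasoning inside <think>...</think>, followed by the final answer.
RESPONSE_RE = re.compile(r"^\s*<think>(.*?)</think>\s*(.*?)\s*$", re.DOTALL)


def reward(response: str, expected: str) -> float:
    """Rule-based reward: a small bonus for the format, a larger one for a correct answer."""
    match = RESPONSE_RE.match(response)
    if match is None or not match.group(1).strip():
        return 0.0  # no (or empty) thinking section: treat as malformed
    answer = match.group(2).lower()
    format_reward = 0.1
    accuracy_reward = 1.0 if answer == expected.strip().lower() else 0.0
    return format_reward + accuracy_reward


print(reward("<think>The upside-down glass leaves the marble behind.</think> on the plate",
             "on the plate"))
print(reward("<think>The marble follows the glass.</think> in the microwave", "on the plate"))
print(reward("on the plate", "on the plate"))
```

Exact string matching is the crudest possible verifier; real setups would need answer normalization or task-specific checkers, but the principle is the same: no human has to grade each attempt.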

Enter visual reasoning. In the coming year, models will undergo supervised fine-tuning to develop scene prediction abilities – given a current scene and a proposed change, they’ll learn to visualize the outcome. For example, they’ll be able to mentally rotate objects in 3D space or anticipate how people might react in social situations. As training progresses using scenarios with verifiable outcomes, models will develop more sophisticated reasoning capabilities. They’ll be able to observe their environment, formulate plans, mentally simulate the consequences of different actions, and learn from comparing their predictions to real-world results. This advancement will be particularly transformative for robotics, where physical interaction with the environment requires robust visual understanding and planning.

Currently, there is one obvious problem with visual reasoning: image creation is a very slow process. However, like everything else in our world right now, this will soon become faster and more feasible. And even if image generation remains slow, the very act of training the models to reason about images will improve their text reasoning on visual and spatial problems. What’s more, the internal representation of the images seems to be created at several levels of granularity: a small blurry image, and then a refined high-def image. The image you see when playing with ChatGPT is the latter, high-def image. Perhaps visual reasoning can make use of the blurry image, which would be quicker to generate and could still help with the reasoning process.

Conclusion

I’m excited about what’s coming! Just as at every moment over the past four years, we’re on the cusp of a radical discovery that will change our world, one that will soon be realized. I don’t think this one is going to let us down. Ever more capable visual reasoning models will be able to make better sense of our world: not only by understanding the mechanics of physical objects, but also by reading social cues, and really in anything else we do where vision is useful to us! And then 2026 will be the year of the robots…


Hey, and if you liked this post, then maybe we should be friends!
