Sunday, July 6, 2025
Techcratic

Visual Reasoning is Coming Soon

Hacker News by Hacker News
April 9, 2025
in Hacker News

2025-04-09 11:58:00
arcturus-labs.com

I gotta say – I love living in exponential times. I can just wish that something existed and then within a month it does! This time it happened with OpenAI’s 4o image generation release. In this blog post I’ll briefly cover the release and why I think it’s pretty cool. Then I’ll dive into a new opportunity that I think is even more exciting – visual reasoning.

Visual Reasoning Robot

Rather watch than read?

Hey, I get it – sometimes you just want to kick back and watch! Check out this quick video where I walk through everything in this post. Same great content, just easier on the eyes!

Why Image Manipulation with LLMs Stinks

Working with images in Multimodal LLMs has been a mostly one-sided affair. On one hand, it’s really cool that you can drop an image into an LLM conversation and get the model to reason about it. But when you ask the model to generate an image, there is a disconnect, because all the model can do is describe the image in text and then call out to an external image generation tool to generate the image based on that text. Text is a poor communication medium for images, and the resulting image is often quite disconnected from the expected result because the short description that the LLM provides to the image generation tool will rarely capture the full context of the conversation.

The problem is most pronounced when attempting to go back and forth working on an idea for an image. You can show the LLM an image of your cat and then say “make this cat wear a detective hat and a monocle”. The best the model can do is to put a detective hat and monocle on some cat, not the one in your image. To make matters worse, the model can’t even see the image that it has just created. So if you ask for a modification to the first generation attempt, the subsequent generations are really just starting over from scratch and hoping that a more detailed description to the image generation tool will make things better… it won’t.

Your Cat
Bad Cat Generation
Left: OpenAI’s pet cat. Right: The best a traditional LLM can do when asked to give it a detective hat and monocle.

OpenAI’s Release Brings True Image Manipulation to LLMs

OpenAI’s image generation release changes all of this. OpenAI’s newest GPT-4o model carries the full context of the conversation into the image generation, and rather than call out to an external model, it is the very same GPT-4o model that does the image generation. By carrying the full context of the conversation – including the previous images – into the generation of the new image you can generate much more consistent imagery.

Thus, stealing OpenAI’s example, you can show it a photo of your cat, tell it to give your cat a detective hat, et voilà! It works!

Your Cat
Good Cat Generation
Left: OpenAI’s pet cat. Right: GPT-4o’s generation when asked to give the same cat a detective hat and monocle. Much more consistent!

Now if all we could do with this technology is stick silly costumes on our cats, then this would honestly be a pretty lame achievement. But think about what we’re angling for here. You’ll be able to turn hand-drawn garbage into sophisticated infographics and marketing material. You’ll be able to take crappy matplotlib charts and reformat them as professional-looking plots that go into slides. You’ll be able to try on clothes virtually before you buy them. You’ll be able to create photo collages from your photos, game assets, and 3D models. You’ll be able to edit your vacation photos to remove that one person who blinked. You’ll be able to take a photo of your living room and try out different furniture arrangements and color schemes. You’ll be able to take a photo of your garden and get suggestions for where to plant new flowers. The possibilities are endless when you can manipulate images with the full context of a conversation.

Visual Reasoning will be the Real Innovation

All of the above is but prologue for what will come next. Models perform better when they take time to reason. This is because models have no internal monologue. Whereas humans pause and think about a question prior to answering, models have to generate tokens in order to think. But to this point models have only been able to reason in the space of words, and words are ill-suited for reasoning through many types of real world problems. But soon the models’ new image manipulation capabilities will power a new form of reasoning. Right now, this ability is mostly useful for image editing. Soon, models will have the ability to visualize different scenarios and visually reason about the world around them.

Demoing Visual Reasoning

Let’s try out an example right now and get a glimpse of how visual reasoning will work. For our experiment we’ll use Matthew Berman’s marble in a cup problem – it’s one of his go-to reasoning challenges when playing around with a new model:

I’ve put a marble in a glass. The glass is then turned upside down and put on a table. Then the glass is picked up and put into the microwave. Where is the marble?

It’s a great problem because it requires common sense knowledge of how objects work in the world, and it’s especially suited for this post because it requires spatial reasoning to extrapolate to a new scenario. I asked the new GPT-4o a variant of this problem in text:

I set a plate on a table, and glass next to it. I set a marble on the plate. Then I pick up the marble, drop it in the glass. Then I turn the glass upside down and set it on the plate. Then, I pick up the glass and put it in the microwave. Where is the marble?

I’ve added an extra item, a plate, for just a bit more complexity. Here’s the answer it gave me:

The marble is in the microwave, inside the glass.

Here’s a breakdown of what happened:

You placed a marble on the plate.

You picked up the marble and dropped it in the glass — now the marble is in the glass.

You turned the glass upside down and set it on the plate — unless the glass is sealed or the marble is somehow stuck, the marble would fall out onto the plate.

Then you picked up the glass (now empty) and put it in the microwave.

So at this point:

The glass is in the microwave.

The marble is on the plate, outside the microwave.

But you asked: Where is the marble? You didn’t say you picked it up again, so based on what you said, the marble is on the plate, not in the microwave.

Want to run it back or add any twists?

It’s actually a pretty weird answer. It immediately answers – incorrectly – then performs post-hoc chain-of-thought reasoning and comes to the correct answer but ignores the fact that it completely contradicted itself.

Let’s now do a vision-enhanced version of this chain-of-thought reasoning and demonstrate how spatial reasoning will improve the model’s reasoning capabilities.

Problem Statement
Problem Statement Image

I prompted it with the following (including extra requirements to think visually):

I’m going to pick up the marble, put it in the glass, turn the glass over onto the plate, pick the glass up, and then put the glass in the microwave.

The question is, where is the marble?

Solve this one step at a time using images to imagine what happens at each step. For each step, first state the step, and then generate an image corresponding to what the scene will look like after that step. And then summarize the image in English, highlighting the important aspects for solving the problem.

Here are the reasoning responses from the model:

Step 1

Step 1: Pick up the marble.

In the image, the marble has not yet been moved, but we are about to pick it up and place it in the glass.

Step 2

Step 2: Put the marble in the glass.

In the image, the marble is now inside the glass, and the plate is empty.

Step 3

Step 3: Turn the glass over onto the plate.

In the image, the glass is now upside down on the plate, trapping the marble inside.

Step 4

Step 4: Pick the glass up.

In the image, the hand is lifting the glass off the plate. The marble remains on the plate, as it is no longer contained.

Step 5

Step 5: Put the glass in the microwave.

In the image, the glass is now inside the microwave with the marble at the bottom.

Final answer: The marble is inside the microwave, at the bottom of the glass.

So… uh… notably, we made some amazingly consistent images, and then completely got the answer wrong. And we were so close! At Step 4, the model acknowledged that the marble was no longer contained in the glass. Unfortunately, by the time the glass was in the microwave, the model imagined that the marble had somehow followed it.
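The step-by-step protocol from the prompt above (state the step, generate an image, summarize the image) can be sketched as a simple loop. This is only an illustration: `render` and `summarize` are hypothetical stand-ins for calls to an image-generating model, and the "images" here are plain strings so the sketch runs without any model at all.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class VisualStep:
    instruction: str  # the step, stated in words
    image: str        # stand-in for the generated image
    summary: str      # English summary of what the image shows


def visual_chain_of_thought(
    steps: List[str],
    first_image: str,
    render: Callable[[str, str], str],
    summarize: Callable[[str], str],
) -> List[VisualStep]:
    """Run each step against the previous image and record the trace."""
    trace: List[VisualStep] = []
    current = first_image
    for instruction in steps:
        # Each new image is generated from the previous one, so context carries over.
        current = render(current, instruction)
        trace.append(VisualStep(instruction, current, summarize(current)))
    return trace


# Hypothetical stand-ins: a real system would call a multimodal model here.
def fake_render(image: str, instruction: str) -> str:
    return f"{image} -> [{instruction}]"


def fake_summarize(image: str) -> str:
    last_step = image.split("[")[-1].rstrip("]")
    return f"Scene after: {last_step}"


trace = visual_chain_of_thought(
    ["Pick up the marble", "Put the marble in the glass"],
    "start",
    fake_render,
    fake_summarize,
)
for step in trace:
    print(step.instruction, "|", step.summary)
```

The point of the structure is that every step sees the previous image rather than a fresh text description, which is exactly what the older tool-calling approach could not do.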

Training Models to Think Visually

I think the failure to correctly use visual reasoning is because the model hasn’t yet been trained for visual reasoning.

Who knows how OpenAI trained these models, but based on their announcement post, the training to date is almost certainly unrelated to visual reasoning. The examples in that post – while extremely impressive – are about better text rendering and better instruction following. But all of the instruction-following examples relate to image manipulation rather than reasoning. Instead, we need to start training models specifically to perform visual reasoning.

For visual reasoning practice, we can do supervised fine-tuning on sequences similar to the marble example above. For instance, to understand more about the physical world, we can show the model sequential pictures of Slinkys going down stairs, or basketball players shooting 3-pointers, or people hammering birdhouses together. We can make the model become more socially aware by letting it see sequences of text and images from social interactions, and having it predict body language and facial expressions for the next scene. We can train for spatial reasoning by having models solve tangrams or having them visualize what a contraption will look like when rotated by 90 degrees. If you think about it for a while, it’s easy to come up with tons of ideas like these – visual reasoning is central to most of our lives.

But where will we get all this training data? For spatial and physical reasoning tasks, we can leverage computer graphics to generate synthetic data. This approach is particularly valuable because simulations provide a controlled environment where we can create scenarios with known outcomes, making it easy to verify the model’s predictions. But we’ll also need real-world examples. Fortunately, there’s an abundance of video content online that we can tap into. While initial datasets might require human annotation, soon models themselves will be able to process videos and their transcripts to extract training examples automatically. Consider all the how-to videos online – creators typically explain what they’re going to do before demonstrating it. We can use these to create training pairs: given the current scene and verbal description, have models predict what happens next.
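To make the "simulations with known outcomes" idea concrete, here is a toy sketch of a marble-world simulator that generates action sequences together with their ground-truth answers. Everything in it is an assumption made for illustration: the action names, the `Scene` fields, and the hand-written "physics" are invented, and a real pipeline would use a graphics or physics engine and render images for each step.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Scene:
    marble_at: str = "plate"    # "plate", "table", or "glass"
    glass_at: str = "table"     # "table", "plate", or "microwave"
    glass_upright: bool = True


ACTIONS = ("put_marble_in_glass", "turn_glass_over_onto_plate", "put_glass_in_microwave")


def step(scene: Scene, action: str) -> Scene:
    """Apply one action using hand-written toy physics."""
    if action == "put_marble_in_glass":
        # Only an upright glass can hold the marble.
        return replace(scene, marble_at="glass") if scene.glass_upright else scene
    if action == "turn_glass_over_onto_plate":
        # A marble inside the glass ends up resting on the plate.
        marble_at = "plate" if scene.marble_at == "glass" else scene.marble_at
        return replace(scene, marble_at=marble_at, glass_at="plate", glass_upright=False)
    if action == "put_glass_in_microwave":
        # The marble only travels along if it is still held by the glass.
        return replace(scene, glass_at="microwave")
    raise ValueError(f"unknown action: {action}")


def run(actions) -> Scene:
    scene = Scene()
    for action in actions:
        scene = step(scene, action)
    return scene


def marble_location(scene: Scene) -> str:
    """Ground-truth answer to 'where is the marble?'."""
    return scene.glass_at if scene.marble_at == "glass" else scene.marble_at


def make_example(rng: random.Random, length: int = 3) -> dict:
    """One synthetic training example: an action sequence plus its verified answer."""
    actions = [rng.choice(ACTIONS) for _ in range(length)]
    return {"actions": actions, "answer": marble_location(run(actions))}


blog_sequence = ["put_marble_in_glass", "turn_glass_over_onto_plate", "put_glass_in_microwave"]
print(marble_location(run(blog_sequence)))  # plate
```

Because the simulator knows the true state after every action, each intermediate scene can also serve as a label for the image the model is supposed to imagine at that step.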

As an aside – I think that videos are going to be our richest source of new training data, and we’re just now starting to really tap into it. But who owns all of the video content on the web? YouTube… Google. I’d say that bodes really well for the future of the Gemini models. Wouldn’t you?

From Chain-of-Thought to Reasoning Models

As early as 2022, we recognized that LLMs benefited from chain-of-thought reasoning. If a model was making a snap judgment and rushing to an incorrect solution, it was commonplace to add a “let’s think step by step”, hope for a better output, and often get it. Soon, deeper reasoning approaches were discovered. In particular, Tree of Thoughts encouraged the model to think through multiple possible approaches for solving a problem and rank them according to perceived feasibility; the application would then steer the model through the most promising paths toward a solution.
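The division of labor in that last sentence can be sketched in a few lines. This is a toy beam search, not the published Tree of Thoughts implementation: `propose` and `score` stand in for what would be LLM calls (generate candidate next thoughts, rate their feasibility), and the number puzzle is invented purely so the example runs on its own.

```python
def tree_of_thoughts(start, propose, score, is_solution,
                     beam_width=2, max_depth=5):
    """Search over partial reasoning paths, keeping only the best few.

    propose(state)     -> candidate next states (the model's "approaches")
    score(state)       -> higher means more promising (perceived feasibility)
    is_solution(state) -> True when the problem is solved
    """
    frontier = [[start]]
    for _ in range(max_depth):
        # Expand every surviving path by every proposed next step.
        candidates = [path + [nxt]
                      for path in frontier
                      for nxt in propose(path[-1])]
        for path in candidates:
            if is_solution(path[-1]):
                return path
        # The application, not the model, decides which paths survive.
        candidates.sort(key=lambda p: score(p[-1]), reverse=True)
        frontier = candidates[:beam_width]
        if not frontier:
            return None
    return None


if __name__ == "__main__":
    # Toy problem: reach 10 from 1 using "+3" or "*2" steps.
    target = 10
    path = tree_of_thoughts(
        start=1,
        propose=lambda n: [n + 3, n * 2],
        score=lambda n: -abs(target - n),
        is_solution=lambda n: n == target,
    )
    print(path)  # [1, 4, 7, 10]
```

Swap the lambdas for prompts to a model and you have the basic shape of the approach: the model proposes and evaluates, and ordinary application code does the bookkeeping.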

With OpenAI’s o1 model and several models that followed soon thereafter, the ability to do this sort of tree-of-thoughts reasoning is baked into the model itself. Prior to providing a final solution, the model may now think through the problem within special “thinking” tags. According to DeepSeek (who has been much more transparent than OpenAI), their R1 model has been trained through a form of reinforcement learning with verifiable problems so that the model can be rewarded for efficiently thinking its way to the verifiably correct solution. Interestingly, and reminiscent of tree-of-thoughts, the models can think themselves into a corner, recognize that they have messed up, backtrack, and get back on a better course toward solving the problem.
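To give a feel for what "rewarded for efficiently thinking its way to the verifiable correct solution" could look like, here is a minimal sketch of a reward function. It is not DeepSeek's actual reward: the `<think>` tag format, the exact-match check, and the per-word length penalty are all assumptions chosen to keep the example short.

```python
import re

# Assumed output format: <think>reasoning</think> followed by the final answer.
THINK_RE = re.compile(r"^<think>(.*?)</think>\s*(.*)$", re.DOTALL)


def reward(completion, expected, length_penalty=0.001):
    """Score a completion against a known-correct answer.

    Returns 0.0 for malformed or wrong outputs; otherwise 1.0 minus a small
    penalty per word of thinking (a made-up stand-in for "efficiency").
    """
    match = THINK_RE.match(completion.strip())
    if match is None:
        return 0.0  # no thinking block, so the format is wrong
    thinking, answer = match.group(1), match.group(2).strip()
    if answer != expected:
        return 0.0  # verifiably wrong
    return max(0.0, 1.0 - length_penalty * len(thinking.split()))


if __name__ == "__main__":
    good = "<think>2 + 2 is 4</think> 4"
    wrong = "<think>2 + 2 is 5</think> 5"
    print(reward(good, "4"))
    print(reward(wrong, "4"))
```

Notice that nothing here grades the reasoning itself. Only the final answer is checked, which leaves the model free to wander, backtrack, and recover inside the thinking block as long as it lands on the right answer.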

Enter visual reasoning. In the coming year, models will undergo supervised fine-tuning to develop scene prediction abilities – given a current scene and a proposed change, they’ll learn to visualize the outcome. For example, they’ll be able to mentally rotate objects in 3D space or anticipate how people might react in social situations. As training progresses using scenarios with verifiable outcomes, models will develop more sophisticated reasoning capabilities. They’ll be able to observe their environment, formulate plans, mentally simulate the consequences of different actions, and learn from comparing their predictions to real-world results. This advancement will be particularly transformative for robotics, where physical interaction with the environment requires robust visual understanding and planning.

Currently, there is one obvious problem with visual reasoning – image creation is a very slow process. However, like everything else in our world right now, this will soon become faster and more feasible. But even if the visual processing remains slow, the very act of training the models to reason about images will improve the text reasoning for visual/spatial problems. What’s more, the internal representation of the images seems to be created at several levels of granularity – a small blurry image, and then a refined high-def image. The image you see when playing with ChatGPT is the latter, high-def image. Perhaps visual reasoning can make use of the blurry image – it will be quicker to generate and will help with the reasoning process.

Conclusion

I’m excited about what’s coming! Just like every moment for the past 4 years, we’re on the cusp of a radical discovery that will change our world – a radical discovery which is soon realized. I don’t think this one is going to let us down. The ever-more capable visual reasoning models will be able to make better sense of our world – not only in terms of understanding the mechanics of physical objects, but also in reading social cues, and really in anything else that we do where vision is of use to us! And then 2026 will be the year of the robots…

Hey, and if you liked this post, then maybe we should be friends!


Tags: Hacker News