Sunday, July 6, 2025
Techcratic

AI learns how vision and sound are connected, without human intervention | MIT News

May 22, 2025

Adam Zewe | MIT News
2025-05-22 00:00:00
news.mit.edu

Humans naturally learn by making connections between sight and sound. For instance, we can watch someone playing the cello and recognize that the cellist’s movements are generating the music we hear.

A new approach developed by researchers from MIT and elsewhere improves an AI model’s ability to learn in this same fashion. This could be useful in applications such as journalism and film production, where the model could help with curating multimodal content through automatic video and audio retrieval.

In the longer term, this work could be used to improve a robot’s ability to understand real-world environments, where auditory and visual information are often closely connected.

Improving upon prior work from their group, the researchers created a method that helps machine-learning models align corresponding audio and visual data from video clips without the need for human labels.

They adjusted how their original model is trained so it learns a finer-grained correspondence between a particular video frame and the audio that occurs in that moment. The researchers also made some architectural tweaks that help the system balance two distinct learning objectives, which improves performance.

Taken together, these relatively simple improvements boost the accuracy of their approach in video retrieval tasks and in classifying the action in audiovisual scenes. For instance, the new method could automatically and precisely match the sound of a door slamming with the visual of it closing in a video clip.

“We are building AI systems that can process the world like humans do, in terms of having both audio and visual information coming in at once and being able to seamlessly process both modalities. Looking forward, if we can integrate this audio-visual technology into some of the tools we use on a daily basis, like large language models, it could open up a lot of new applications,” says Andrew Rouditchenko, an MIT graduate student and co-author of a paper on this research.

He is joined on the paper by lead author Edson Araujo, a graduate student at Goethe University in Germany; Yuan Gong, a former MIT postdoc; Saurabhchand Bhati, a current MIT postdoc; Samuel Thomas, Brian Kingsbury, and Leonid Karlinsky of IBM Research; Rogerio Feris, principal scientist and manager at the MIT-IBM Watson AI Lab; James Glass, senior research scientist and head of the Spoken Language Systems Group in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL); and senior author Hilde Kuehne, professor of computer science at Goethe University and an affiliated professor at the MIT-IBM Watson AI Lab. The work will be presented at the Conference on Computer Vision and Pattern Recognition.

Syncing up

This work builds upon a machine-learning method the researchers developed a few years ago, which provided an efficient way to train a multimodal model to simultaneously process audio and visual data without the need for human labels.

The researchers feed this model, called CAV-MAE, unlabeled video clips, and it encodes the visual and audio data separately into representations called tokens. Using the natural audio from the recording, the model automatically learns to map corresponding pairs of audio and visual tokens close together within its internal representation space.
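To make the idea of mapping matching audio and visual representations "close together" more concrete, here is a minimal sketch of a contrastive loss of the InfoNCE type. It is not the researchers' code: the function names, the toy two-dimensional embeddings, and the `temperature` value are all assumptions chosen for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def contrastive_loss(audio_embs, video_embs, temperature=0.1):
    """InfoNCE-style loss; item i in each list is assumed to come from the same clip."""
    audio = [normalize(a) for a in audio_embs]
    video = [normalize(v) for v in video_embs]
    total = 0.0
    for i, a in enumerate(audio):
        # Similarity of this audio embedding to every video embedding.
        logits = [dot(a, v) / temperature for v in video]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability assigned to the true (matching) pair.
        total += log_denominator - logits[i]
    return total / len(audio)

# Hypothetical toy embeddings: matching pairs point in similar directions.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]])
# Same embeddings, but the video list is swapped so the pairs no longer match.
shuffled = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]])
```

In this toy setup the loss is lower when each audio embedding sits next to the video embedding from the same clip, which is the behavior a contrastive objective rewards during training.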

They found that using two learning objectives balances the model’s learning process, which enables CAV-MAE to understand the corresponding audio and visual data while improving its ability to recover video clips that match user queries.

But CAV-MAE treats audio and visual samples as one unit, so a 10-second video clip and the sound of a door slamming are mapped together, even if that audio event happens in just one second of the video.

In their improved model, called CAV-MAE Sync, the researchers split the audio into smaller windows before the model computes its representations of the data, so it generates separate representations that correspond to each smaller window of audio.

During training, the model learns to associate one video frame with the audio that occurs during just that frame.
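The windowing step described above can be sketched roughly as follows. This is an illustration only: the function name, the window length, the sample rate, and the choice to center each window on the frame's timestamp are assumptions, not details taken from the paper.

```python
def audio_windows_for_frames(audio, sample_rate, frame_times, window_seconds):
    """Return one (timestamp, audio slice) pair per video frame.

    Each slice is centered on the frame's timestamp and clipped to the
    bounds of the recording.
    """
    half = int(window_seconds * sample_rate / 2)
    pairs = []
    for t in frame_times:
        center = int(t * sample_rate)
        start = max(0, center - half)
        end = min(len(audio), center + half)
        pairs.append((t, audio[start:end]))
    return pairs

# Toy stand-in for 10 seconds of audio at 100 samples per second.
audio = list(range(1000))
pairs = audio_windows_for_frames(
    audio,
    sample_rate=100,
    frame_times=[0.5, 5.0, 9.5],  # hypothetical frame timestamps in seconds
    window_seconds=1.0,
)
```

Each frame is now paired with about one second of surrounding audio rather than the whole 10-second clip, so a brief event such as a door slam is associated only with the frames near the moment it happens.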

“By doing that, the model learns a finer-grained correspondence, which helps with performance later when we aggregate this information,” Araujo says.

They also incorporated architectural improvements that help the model balance its two learning objectives.

Adding “wiggle room”

The model incorporates a contrastive objective, where it learns to associate similar audio and visual data, and a reconstruction objective, which aims to recover specific audio and visual data based on user queries.

In CAV-MAE Sync, the researchers introduced two new types of data representations, or tokens, to improve the model’s learning ability.

They include dedicated “global tokens” that help with the contrastive learning objective and dedicated “register tokens” that help the model focus on important details for the reconstruction objective.
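One way to picture the extra tokens is as additional slots placed alongside the ordinary patch tokens in the model's input sequence. The sketch below is an assumption about the layout rather than the authors' implementation: the ordering, the number of register tokens, and the zero-valued placeholders (which would be learned vectors in a real model) are all hypothetical.

```python
def build_token_sequence(patch_tokens, num_registers, dim):
    """Prepend one global token and several register tokens to the patch tokens."""
    # Placeholders only; in a trained model these would be learned vectors.
    global_token = [0.0] * dim
    register_tokens = [[0.0] * dim for _ in range(num_registers)]
    sequence = [global_token] + register_tokens + patch_tokens
    layout = {
        "global": 0,
        "registers": range(1, 1 + num_registers),
        "patches": range(1 + num_registers, len(sequence)),
    }
    return sequence, layout

# Hypothetical input: six patch tokens of dimension 4, plus four registers.
patches = [[float(i)] * 4 for i in range(6)]
sequence, layout = build_token_sequence(patches, num_registers=4, dim=4)
```

Under this assumed layout, a contrastive loss could read the output at the global position while a reconstruction loss reads the patch positions, which is one plausible way to give the two objectives the separate "wiggle room" the researchers describe.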

“Essentially, we add a bit more wiggle room to the model so it can perform each of these two tasks, contrastive and reconstructive, a bit more independently. That benefitted overall performance,” Araujo adds.

While the researchers had some intuition that these enhancements would improve the performance of CAV-MAE Sync, it took a careful combination of strategies to shift the model in the direction they wanted it to go.

“Because we have multiple modalities, we need a good model for both modalities by themselves, but we also need to get them to fuse together and collaborate,” Rouditchenko says.

In the end, their enhancements improved the model’s ability to retrieve videos based on an audio query and predict the class of an audio-visual scene, like a dog barking or an instrument playing.
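Retrieval with an audio query can be sketched as a nearest-neighbor search in the shared representation space. The example below is a simplified illustration: the file names and three-dimensional embeddings are made up, and ranking by cosine similarity is an assumption about how such a search might be done, not a description of the researchers' evaluation code.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(query_audio_emb, video_library, top_k=2):
    """Rank videos by cosine similarity to the audio query embedding."""
    ranked = sorted(
        video_library.items(),
        key=lambda item: cosine(query_audio_emb, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]

# Hypothetical video embeddings, as if produced by a trained encoder.
library = {
    "door_slam.mp4": [0.9, 0.1, 0.0],
    "dog_bark.mp4": [0.1, 0.9, 0.1],
    "cello.mp4": [0.0, 0.2, 0.9],
}
# Hypothetical embedding of an audio clip of a door slamming.
results = retrieve([0.8, 0.2, 0.1], library)
```

Because training pulls matching audio and video embeddings together, the video whose embedding lies closest to the query's embedding is returned first.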

Its results were more accurate than their prior work, and it also performed better than more complex, state-of-the-art methods that require larger amounts of training data.

“Sometimes, very simple ideas or little patterns you see in the data have big value when applied on top of a model you are working on,” Araujo says.

In the future, the researchers want to incorporate new models that generate better data representations into CAV-MAE Sync, which could improve performance. They also want to enable their system to handle text data, which would be an important step toward generating an audiovisual large language model.

This work is funded, in part, by the German Federal Ministry of Education and Research and the MIT-IBM Watson AI Lab.
