Sunday, July 6, 2025
Techcratic

The latest AI scaling graph

Hacker News by Hacker News
May 4, 2025
in Hacker News

2025-05-04 03:01:00
garymarcus.substack.com

Seen this? It (and the update to it) have been all the rage lately, among “AI forecasters”, on social media, and it has even made it into mainstream media like the Financial Times.

[Image: the METR scaling graph]

Some of it makes sense, and some of it doesn’t.

The first version dropped on March 19, when METR (Model Evaluation and Threat Research, a non-profit research lab that was created in December 2023) published a report on a study they had done, measuring the ability of large language models to carry out software-related tasks. They also tweeted the following:

[Image: METR’s tweet]

The tweet took off, and then on April 16, OpenAI released its new models o3 and o4-mini. METR carried out their tests on these models, added them to the graph, and found that “AI” was improving even faster than it had reported in March.

Extrapolating from neatly drawn graphs, people are tweeting things like, “Within 12 months, AI will do a reasonable job on most >1hr messiest cognitive tasks” – a claim that in our view doesn’t even pass the “circle the r’s in Strawberry” sniff test.

Abject failure on a task that many adults could solve in a minute

§

Some of the work that went into the METR graph is actually terrific, carried out with exemplary care, adhering scrupulously to best practices. The technical report was, for the most part, responsibly written, carefully describing the scope of the findings and giving suitable, appropriate warnings about potential gaps.

METR’s blog and its tweets (perhaps written by publicists or by generative AI rather than by scientists), however, dropped the careful qualifications and nuance of the technical paper and made claims that go far beyond anything that the study actually supports. These were further amplified in the Twitter echo chamber.

Alas, the whole thing was based on a flawed premise.

§

Everything started with good intentions. The scientists at METR set themselves the task of studying how the performance of large language models on software-oriented tasks has improved over time – certainly an important question. They put together a collection of 107 problems related to programming and software engineering. The problem collection was constructed with admirable care; the problems were proposed, vetted, and edited by experts in a multicycle, laborious process. Most of the problems in the datasets are unpublished, to prevent their being used for training future models. However, METR has published some examples on their GitHub site, and judging by those, they have created a diverse collection of interesting, often challenging, problems of high quality. Here, for instance, is the summary of one:

The task is to implement functions to process payments and avoid duplicate transactions when they are coming in asynchronously from different time zones and currencies. Two payments have to be matched based on fuzzy rules such as time difference across time zones and multi step currency conversions.

There are many edge cases and tricky details to get right, and the most difficult version (“full”) involves setting up all rules related to handling timezones.

The complete problem specification is 3000 words long – well worth looking at to get a sense of what is really involved in a moderately complex software engineering undertaking.

To make a quantitative analysis of their study, METR needed to have a quantitative measure of the difficulty of the problems that a model can address – the y-axis on the graphs at the start of the article.

That’s where things started to get weird. First, they decided to characterize the difficulty of each problem in terms of the time that a human expert takes to solve it; for instance, it is recorded in the dataset that human experts take an average of 23 hours 37 minutes to write the payment processing code. Then they decided to measure the quality of an AI system in terms of the time-demands of problems that the system gets 50% correct. For instance, according to the graph, GPT-4 scores 50% correct on problems that people take 4 minutes to solve. So the y-axis value for GPT-4 is 4 minutes.
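As a rough illustration of the measure just described, the sketch below estimates the human-time value at which a model's success rate crosses 50%. The function name `fifty_percent_horizon`, the binned success rates, and the choice to interpolate linearly in log-time are all assumptions made for illustration; the text above does not say how METR fits its curves, and the numbers are invented, not METR data.

```python
import math

def fifty_percent_horizon(bins):
    """Estimate the human time (minutes) at which success crosses 50%.

    bins: list of (human_minutes, success_rate), sorted by human_minutes.
    Interpolates linearly in log-time between the two adjacent bins that
    straddle 50%. Returns None if the rate never crosses 50%.
    """
    for (t_lo, p_lo), (t_hi, p_hi) in zip(bins, bins[1:]):
        if p_lo >= 0.5 >= p_hi and p_lo != p_hi:
            # Fraction of the way from the easier bin to the harder bin.
            frac = (p_lo - 0.5) / (p_lo - p_hi)
            log_t = math.log(t_lo) + frac * (math.log(t_hi) - math.log(t_lo))
            return math.exp(log_t)
    return None

# Hypothetical success rates for one model (illustrative only).
toy_bins = [(1, 0.9), (2, 0.7), (8, 0.3), (30, 0.1)]
horizon = fifty_percent_horizon(toy_bins)
print(f"Estimated 50% horizon: {horizon:.1f} minutes")
```

With these made-up bins the crossing falls halfway (in log-time) between the 2-minute and 8-minute bins, so the single number reported for the model is about 4 minutes. Everything else about which tasks it passes or fails is discarded.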

This combination of duration and accuracy leads to a very weird and arbitrary measure. For instance, the first graph says that the task “question answering” takes 15 seconds, “count words in passage” takes 2 minutes, and “find fact on web” takes 11 minutes. Presumably, METR doesn’t mean that these tasks in general take that long; those are the average human times for the particular instances of those tasks in the dataset. However, if we consider these tasks in general, then the time for humans to carry them out depends on a zillion different factors. The time that people take to count words in their native language depends almost purely on the length of the passage. If they are asked to count words written in an alphabet with unfamiliar conventions for word breaks, it takes much longer. The time that people take to answer a question depends on all kinds of things: how long the question is, how complex the reasoning process needed to solve it is, how well the people know the subject matter, how experienced they are in answering this particular kind of question, what resources or tools they have access to, and so on. It is not credible that the 50% success rate tracks these combinations of factors with any precision. In other words, the 4-minute mark for GPT-4 is completely arbitrary; you could probably put together one reasonable collection of word counting and question answering tasks with an average human time of 30 seconds and another collection with an average human time of 20 minutes where GPT-4 would hit 50% accuracy on each. And it would be absurd to say that o3 can do all tasks that humans can do in 1.7 hours or that GPT-3 could do anything that a person can do in 15 seconds.
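The point about arbitrary collections can be made concrete with a toy example. In the sketch below, the task kinds, the timings, and the rule in `model_succeeds` are all hypothetical, chosen only so that success depends on the kind of task and not on how long a person would take. It is not a model of any real system.

```python
from statistics import mean

def model_succeeds(kind):
    # Toy assumption: the model answers every "question" task correctly
    # and fails every "count" task, regardless of human time.
    return kind == "question"

# Two hypothetical collections of (kind, human_seconds) tasks.
short_set = [("question", 20), ("count", 40), ("question", 25), ("count", 35)]
long_set = [("question", 900), ("count", 1500), ("question", 1100), ("count", 1300)]

def summarize(tasks):
    """Return (average human seconds, model accuracy) for a collection."""
    avg_seconds = mean(seconds for _, seconds in tasks)
    accuracy = mean(1 if model_succeeds(kind) else 0 for kind, _ in tasks)
    return avg_seconds, accuracy

for name, tasks in [("short", short_set), ("long", long_set)]:
    avg_seconds, accuracy = summarize(tasks)
    print(f"{name}: avg human time {avg_seconds} s, accuracy {accuracy:.0%}")
```

Both collections give 50% accuracy, one at an average human time of 30 seconds and the other at 20 minutes, so the "time at 50%" reflects how the collection was assembled rather than anything about the model.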

Bluntly, the Y-axis simply doesn’t make much sense. And needless to say, if the Y-axis doesn’t make sense, you can’t meaningfully use the graph to make predictions. Computers can answer some questions reliably now, for example, and some not, and the graph tells us nothing about which is which or when any specific question will be solved. Or consider songwriting; Dylan wrote some in an afternoon; Leonard Cohen took half a decade on and off to write Hallelujah. Should we average the two figures? Should we sample Dylan songs more heavily because he wrote more of them? Where should songwriting go on the figure? The whole thing strikes us as absurd.

Finally, the only thing METR looked at was “software tasks”. Software might be very different from other domains, in which case the graph (even if it did make sense) might not apply. In the technical paper, the authors actually get this right: they discuss carefully the possibility that the tasks used for testing might not be representative of real-world software engineering tasks. They certainly don’t claim that the findings of the paper apply to tasks in general. But the social media posts make that unwarranted leap.

That giant leap seems especially unwarranted given that there has likely been a lot of recent data augmentation directed towards software benchmarks in particular (where this is feasible). In other domains where direct, verifiable augmentation is less feasible, results might be quite different. (Witness the failed letter ‘r’ labeling task depicted above.) Unfortunately, literally none of the tweets we saw even considered the possibility that a problematic graph specific to software tasks might not generalize to literally all other aspects of cognition.

We can only shake our heads.

§

The datasets of software-engineering problems that METR has created are potentially very valuable. Some of the qualitative discussions of the limitations of the current technology on these problems are interesting. The paper is worth reading.

But attempting to use the graph to make predictions about the capacities of future AI is misguided, and the fact that it went viral is a sign that in AI (as in so many other domains) people are repeating things that they want to believe, based on their alleged conclusions rather than their validity.

Gary Marcus and Ernest Davis miss the days in which peer review was taken seriously.
