Will Douglas Heaven
2025-01-23 13:00:00
www.technologyreview.com
Like Anthropic’s Computer Use and Google DeepMind’s Mariner, Operator takes screenshots of a computer screen and scans the pixels to figure out what actions it can take. CUA, the model behind it, is trained to interact with the same graphical user interfaces—buttons, text boxes, menus—that people use when they do things online. It scans the screen, takes an action, scans the screen again, takes another action, and so on. That lets the model carry out tasks on most websites that a person can use.
“Traditionally the way models have used software is through specialized APIs,” says Reiichiro Nakano, a scientist at OpenAI. (An API, or application programming interface, is a piece of code that acts as a kind of connector, allowing different bits of software to be hooked up to one another.) That puts a lot of apps and most websites off limits, he says: “But if you create a model that can use the same interface that humans use on a daily basis, it opens up a whole new range of software that was previously inaccessible.”
CUA also breaks tasks down into smaller steps and tries to work through them one by one, backtracking when it gets stuck. OpenAI says CUA was trained with techniques similar to those used for its so-called reasoning models, o1 and o3.
OpenAI has tested CUA against a number of industry benchmarks designed to assess the ability of an agent to carry out tasks on a computer. The company claims that its model beats Computer Use and Mariner in all of them.
For example, on OSWorld, which tests how well an agent performs tasks such as merging PDF files or manipulating an image, CUA scores 38.1% to Computer Use’s 22.0% In comparison, humans score 72.4%. On a benchmark called WebVoyager, which tests how well an agent performs tasks in a browser, CUA scores 87%, Mariner 83.5%, and Computer Use 56%. (Mariner can only carry out tasks in a browser and therefore does not score on OSWorld.)
For now, Operator can also only carry out tasks in a browser. OpenAI plans to make CUA’s wider abilities available in the future via an API that other developers can use to build their own apps. This is how Anthropic released Computer Use in December.
OpenAI says it has tested CUA’s safety, using red teams to explore what happens when users ask it to do unacceptable tasks (such as research how to make a bioweapon), when websites contain hidden instructions designed to derail it, and when the model itself breaks down. “We’ve trained the model to stop and ask the user for information before doing anything with external side effects,” says Casey Chu, another researcher on the team.
Look! No hands
To use Operator, you simply type instructions into a text box. But instead of calling up the browser on your computer, Operator sends your instructions to a remote browser running on an OpenAI server. OpenAI claims that this makes the system more efficient. It’s another key difference between Operator, Computer Use and Mariner (which runs inside Google’s Chrome browser on your own computer).
Upgrade your audio game with the Logitech for Creators Blue Yeti USB Microphone. With over 33,730 ratings and an impressive 4.6 out of 5 stars, it’s no wonder this is an Amazon’s Choice product. Recently, 5K+ units were purchased in the past month.
Available in five stunning colors: Teal, Silver, Pink Dawn, Midnight Blue, and Blackout, this microphone is perfect for creators looking to produce exceptional audio. Priced at only $84.99, it’s a deal you can’t afford to miss.
Elevate your recordings with clear broadcast-quality sound and explore your creativity with enhanced effects, advanced modulation, and HD audio samples. Order now for just $84.99 on Amazon!
Support Techcratic
If you find value in Techcratic’s insights and articles, consider supporting us with Bitcoin. Your support helps me, as a solo operator, continue delivering high-quality content while managing all the technical aspects, from server maintenance to blog writing, future updates, and improvements. Support Innovation! Thank you.
Bitcoin Address:
bc1qlszw7elx2qahjwvaryh0tkgg8y68enw30gpvge
Please verify this address before sending funds.
Bitcoin QR Code
Simply scan the QR code below to support Techcratic.
Please read the Privacy and Security Disclaimer on how Techcratic handles your support.
Disclaimer: As an Amazon Associate, Techcratic may earn from qualifying purchases.