Benj Edwards
2025-01-23 17:24:00
arstechnica.com
While it’s working, Operator shows a miniature browser window of its actions.
However, the technology behind Operator is still relatively new and far from perfect. The model reportedly performs best at repetitive web tasks like creating shopping lists or playlists. It struggles more with unfamiliar interfaces like tables and calendars, and does poorly with complex text editing (with a 40 percent success rate), according to OpenAI’s internal testing data.
OpenAI reported the system achieved an 87 percent success rate on the WebVoyager benchmark, which tests live sites like Amazon and Google Maps. On WebArena, which uses offline test sites for training autonomous agents, Operator’s success rate dropped to 58.1 percent. For computer operating system tasks, CUA set an apparent record of 38.1 percent success on the OSWorld benchmark, surpassing previous models but still falling short of human performance at 72.4 percent.
With this imperfect research preview, OpenAI hopes to gather user feedback and refine the system’s capabilities. The company acknowledges CUA won’t perform reliably in all scenarios but plans to improve its reliability across a wider range of tasks through user testing.
Safety and privacy concerns
For any AI model that can see how you operate your computer and even control some aspects of it, privacy and safety are very important. OpenAI says it built multiple safety controls into Operator, requiring user confirmation before completing sensitive actions like sending emails or making purchases. Operator also has limits on what it can browse, set by OpenAI. It cannot access certain website categories, including gambling and adult content.
Traditionally, AI models based on large language model-style Transformer technology like Operator have been relatively easy to fool with jailbreaks and prompt injections.
To catch attempts at subverting Operator, which might hypothetically be embedded in websites that the AI model browses, OpenAI says it has implemented real-time moderation and detection systems. OpenAI reports the system recognized all but one case of prompt injection attempts during an early internal red-teaming session.
Enhance your driving experience with the P12 Pro 4K Mirror Dash Cam Smart Driving Assistant, featuring Front and Rear Cameras, Voice Control, Night Vision, and Parking Monitoring. With a 4.3/5-star rating from 2,070 reviews and over 1,000 units sold in the past month, it’s a top-rated choice for drivers. The dash cam comes with a 32GB Memory Card included, making it ready to use out of the box. Available now for just $119.99, plus a $20 coupon at checkout. Don’t miss out on this smart driving essential from Amazon!
Support Techcratic
If you find value in Techcratic’s insights and articles, consider supporting us with Bitcoin. Your support helps me, as a solo operator, continue delivering high-quality content while managing all the technical aspects, from server maintenance to blog writing, future updates, and improvements. Support Innovation! Thank you.
Bitcoin Address:
bc1qlszw7elx2qahjwvaryh0tkgg8y68enw30gpvge
Please verify this address before sending funds.
Bitcoin QR Code
Simply scan the QR code below to support Techcratic.
Please read the Privacy and Security Disclaimer on how Techcratic handles your support.
Disclaimer: As an Amazon Associate, Techcratic may earn from qualifying purchases.