Anthropic has announced upgrades to its AI portfolio, including an enhanced Claude 3.5 Sonnet model and the introduction of Claude 3.5 Haiku, alongside a “computer control” feature in a public beta.
The upgraded Claude 3.5 Sonnet demonstrates substantial improvements across all metrics, with particularly notable advances in coding capabilities. The model achieved an impressive 49.0% on the SWE-bench Verified benchmark, surpassing all publicly available models, including OpenAI’s offerings and specialist coding systems.
In a pioneering development, Anthropic has introduced computer use functionality that enables Claude to interact with computers similarly to humans: viewing screens, controlling cursors, clicking, and typing. This capability, currently in public beta, marks Claude 3.5 Sonnet as the first frontier AI model to offer such functionality.
Several major technology firms have already begun implementing these new capabilities.
“The upgraded Claude 3.5 Sonnet represents a significant leap for AI-powered coding,” reports GitLab, which noted up to 10% stronger reasoning across use cases without additional latency.
The new Claude 3.5 Haiku model, set for release later this month, matches the performance of the previous Claude 3 Opus whilst maintaining cost-effectiveness and speed. It notably achieved 40.6% on SWE-bench Verified, outperforming many competitive models including the original Claude 3.5 Sonnet and GPT-4o.
Regarding computer control capabilities, Anthropic has taken a measured approach, acknowledging current limitations whilst highlighting potential. On the OSWorld benchmark, which evaluates computer interface navigation, Claude 3.5 Sonnet achieved 14.9% in screenshot-only tests, significantly outperforming the next-best system’s 7.8%.
The developments have undergone rigorous safety evaluations, with pre-deployment testing conducted in partnership with both the US and UK AI Safety Institutes. Anthropic maintains that the ASL-2 Standard, as detailed in their Responsible Scaling Policy, remains appropriate for these models.
Source: AI NMEWS
Recent Comments