Apple researchers have developed a new artificial intelligence system that can understand ambiguous references to on-screen entities as well as conversational and background context, enabling more natural interactions with voice assistants, according to a paper published Friday.
The system, called ReALM (Reference Resolution as Language Modeling), uses large language models to convert the complex task of reference resolution, including understanding references to visual elements on a screen, into a pure language modeling problem. This allows ReALM to achieve substantial performance gains over existing methods.
“It is essential for conversational assistants to be able to understand context, including references,” Apple's research team wrote. “Enabling users to query what they see on screen is a critical step in ensuring a truly hands-free experience with voice assistants.”
Enhancing conversational assistants
ReALM's key innovation for tackling screen-based references is to reconstruct the screen from parsed on-screen entities and their positions, generating a textual representation that captures the visual layout. The researchers demonstrated that combining this approach with a language model fine-tuned specifically for reference resolution can outperform GPT-4 on the task.
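The article does not spell out how the screen layout is turned into text, so the following is only a minimal sketch of the general idea under stated assumptions, not Apple's implementation. It assumes each parsed entity arrives with its text and a normalized center point, groups entities whose vertical centers fall within a small margin onto one line, orders lines top to bottom and entities left to right, and then wraps the result with the user's request in a prompt. The `Entity` class, `screen_to_text`, `build_prompt`, the `line_margin` value and the prompt format are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """A parsed on-screen element (hypothetical structure for illustration)."""
    text: str
    x_center: float  # assumed normalized: 0.0 = left edge, 1.0 = right edge
    y_center: float  # assumed normalized: 0.0 = top edge, 1.0 = bottom edge


def screen_to_text(entities, line_margin=0.02):
    """Render entities as text that roughly preserves the on-screen layout.

    Entities are visited top to bottom; one whose vertical center lies within
    `line_margin` of the first entity in the current row joins that row,
    otherwise it starts a new row. Rows are joined with newlines and the
    entities inside a row are sorted left to right and joined with tabs.
    """
    rows = []
    for ent in sorted(entities, key=lambda e: (e.y_center, e.x_center)):
        if rows and abs(ent.y_center - rows[-1][0].y_center) <= line_margin:
            rows[-1].append(ent)
        else:
            rows.append([ent])
    return "\n".join(
        "\t".join(e.text for e in sorted(row, key=lambda e: e.x_center))
        for row in rows
    )


def build_prompt(entities, user_query):
    """Frame reference resolution as plain text for a language model.

    Hypothetical prompt format: a fine-tuned model would be asked to
    complete the final line with the entity the user is referring to.
    """
    return (
        f"Screen:\n{screen_to_text(entities)}\n\n"
        f"User: {user_query}\n"
        "Referenced entity:"
    )


# Example: a small contact page with two label/number pairs.
screen = [
    Entity("Contact us", 0.5, 0.10),
    Entity("555-0100", 0.7, 0.31),
    Entity("Main office", 0.2, 0.30),
    Entity("555-0199", 0.7, 0.41),
    Entity("Support", 0.2, 0.40),
]
print(build_prompt(screen, "Call the support number"))
```

Here the screen renders as three rows, `Contact us`, then `Main office` and `555-0100` separated by a tab, then `Support` and `555-0199`, so a phrase like "the support number" can be resolved from text alone. A real system would need a better-tuned margin and richer entity metadata; the fixed threshold here is simply the easiest way to show the layout-to-text idea.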
![](https://venturebeat.com/wp-content/uploads/2024/04/Screenshot-2024-04-01-at-11.30.06%E2%80%AFAM.png?resize=1304%2C1402&strip=all)
“We demonstrate significant improvements over an existing system with similar functionality across different types of references, with our smallest model achieving absolute gains of more than 5% for on-screen references,” the researchers wrote. “Our larger models substantially outperform GPT-4.”
Practical uses and limitations
The research highlights the potential of focused language models to handle tasks such as reference resolution in production systems, where latency and compute constraints rule out large end-to-end models. By publishing it, Apple signals its continued investment in making Siri and its other products more conversational and context-aware.
Still, the researchers caution that relying on automated screen analysis has its limits. Handling more complex visual references, such as distinguishing between multiple images, may require incorporating computer vision and multimodal techniques.
Apple rushes to close AI gap as rivals proliferate
Apple is quietly making significant strides in artificial intelligence research, even though it lags behind its tech rivals in the race to dominate the fast-moving AI landscape.
From multimodal models that combine vision and language, to AI-powered animation tools, to techniques for building high-performance specialized AI on a budget, a steady drumbeat of breakthroughs from the company's labs suggests its AI ambitions are rapidly escalating.
But the notoriously secretive tech giant faces stiff competition from companies such as Google, Microsoft, Amazon and OpenAI, which are aggressively commercializing generative AI in search, office software, cloud services and more.
Apple has long been a fast follower rather than a first mover, but it now faces a market being transformed at breakneck speed by artificial intelligence. At its closely watched Worldwide Developers Conference in June, the company is expected to unveil a new large language model framework, an “Apple GPT” chatbot and other AI-powered features across its ecosystem.
“We are excited to share more details about our ongoing work in AI later this year,” CEO Tim Cook hinted on a recent earnings call. Despite the company's characteristic opacity, it's clear that Apple's AI efforts are far-reaching.
But as the battle for AI supremacy intensifies, the iPhone maker's late arrival leaves it in an unusually vulnerable position. Deep pockets, brand loyalty, elite engineering and a tightly integrated product portfolio give it a fighting chance, but there are no guarantees in this high-stakes contest.
A new era of ubiquitous, truly intelligent computing is upon us. We'll find out in June whether Apple did enough to help shape it.