Hear about the new AMEX?
So essentially,
AMEX Dataset allows AI agents to train on phone usage 📱
Paper:
AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents (20 Pages)
Github:
https://yuxiangchai.github.io/AMEX/
Researchers from MMLab, Shanghai AI Lab, and vivo AI Lab are interested in developing better agents that can perform tasks on any app by directly interacting with user interface elements, similar to the "Large Action Model" idea popularized by the Rabbit R1.
Hmm.. What's the background?
The increasing prevalence of AI assistants on mobile devices, such as Siri and Bixby, has led to research on AI agents that can interact with mobile GUIs (Graphical User Interfaces) like humans.
Researchers are developing Mobile GUI-Control Agents, or GUI Agents, that take screenshots and natural-language instructions as input and manipulate UI elements directly. However, existing GUI agents struggle with real-world tasks due to their limited understanding of page layouts and element functionalities.
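To make that interaction loop concrete, here is a minimal sketch (in Python) of what a single GUI-agent step could look like; the Action type and agent_step signature are illustrative assumptions, not an interface defined in the paper.

```python
# Hypothetical sketch of one GUI-agent step: screenshot + instruction -> next action.
# Names and signatures are assumptions for illustration only.
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Action:
    kind: str                                                # e.g. "tap", "type", "scroll"
    target_bbox: Optional[Tuple[int, int, int, int]] = None  # UI element location on the screen
    text: Optional[str] = None                               # text to enter when kind == "type"


def agent_step(screenshot_png: bytes, instruction: str) -> Action:
    """Map the current screenshot plus a natural-language instruction to the next UI action."""
    # A real agent would run a vision-language model here; this placeholder just taps a fixed box.
    return Action(kind="tap", target_bbox=(120, 840, 360, 920))
```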
Ok, so what is proposed in the research paper?
To address the limitations of existing datasets for training AI agents to interact with mobile GUIs, this project introduces AMEX (Android Multi-annotation EXpo), a comprehensive, large-scale dataset for generalist mobile GUI-control agents.
It contains multi-level annotations on over 104K high-resolution screenshots, providing a deeper understanding of the smartphone UI environment.
AMEX includes a set of complex instructions (averaging 13 steps per instruction) with annotated action sequences required to complete them on various third-party apps.
Recognizing that most user interactions happen on third-party apps, AMEX prioritizes these applications over system-built apps in its data collection.
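To give a feel for what such multi-level annotations might look like in code, below is a hedged sketch of one possible record layout; the field and class names are assumptions for illustration, not the actual AMEX schema, which is documented on the project page.

```python
# Hypothetical representation of an AMEX-style sample; field names are illustrative only.
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class UIElement:
    bbox: Tuple[int, int, int, int]     # interactive element location on the screenshot
    functionality: str                  # e.g. "opens the in-app search page"


@dataclass
class Step:
    screenshot_path: str                # high-resolution screenshot for this step
    action: str                         # e.g. "tap", "type", "swipe"
    elements: List[UIElement] = field(default_factory=list)


@dataclass
class Episode:
    app: str                            # third-party app the task runs in
    instruction: str                    # complex natural-language task (~13 steps on average)
    steps: List[Step] = field(default_factory=list)
```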
What's next?
The researchers have identified several areas for potential future work related to the AMEX project:
AMEX primarily focuses on English-language instructions and apps. Expanding the dataset to include multi-lingual screenshots, functionalities, and instructions would enhance the agent's applicability in diverse linguistic contexts.
The current evaluation method, based on action prediction, is considered inadequate for real-world scenarios: it doesn't account for factors like page loading times or dynamic content changes.
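Action-prediction evaluation usually means scoring the agent by matching its predicted actions step-by-step against the annotated sequence; the rough sketch below (an assumption about the general idea, not the paper's exact metric) shows why such offline matching cannot reflect loading delays or dynamically changing pages.

```python
# Rough sketch of step-wise action-prediction scoring; illustrative only.
def step_accuracy(predicted: list, reference: list) -> float:
    """Fraction of reference steps whose predicted action matches exactly."""
    if not reference:
        return 0.0
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Example: a single mismatch (say, caused by a slow-loading page) lowers the score,
# even if the agent would have recovered on a real device.
print(step_accuracy(["tap:search", "type:weather", "tap:first_result"],
                    ["tap:search", "type:weather", "tap:top_result"]))  # ~0.67
```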
So essentially,
AMEX Dataset allows AI agents to train on phone usage 📱