So essentially,
The GUI Odyssey dataset allows AI to use mobile apps like humans
Paper: GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices (16 pages)
Researchers from multiple organizations, including OpenGVLab at Shanghai AI Laboratory, The University of Hong Kong, Nanjing University, Harbin Institute of Technology, and Shanghai Jiao Tong University, have developed a new, large-scale dataset designed to help autonomous agents learn how to navigate between multiple apps on mobile devices.
Hmm.. What's the background?
Existing GUI navigation datasets primarily consist of simple tasks that can be completed within a single app. In the real world, many tasks require navigating across multiple apps. Constructing a dataset for cross-app GUI navigation presents two main challenges, especially for real-world scenarios:
- Capturing the complexity and diversity of tasks across multiple applications
- Ensuring consistent and accurate annotation across multiple applications
Ok, so what is proposed in the research paper?
GUI Odyssey is the first dataset focused on cross-app navigation, which is a more realistic use case than the single-app navigation tasks that previous datasets are limited to. Previous datasets are also limited in their number of episodes, user instructions, and applications.
While the AITW dataset claims to have a large number of episodes and unique instructions, GUI Odyssey surpasses it with 3.1 times more of each, in addition to having 2.9 times more apps than AITW. The episodes have an average of 15.4 steps, which is 2.1 times longer than AITZ, the previous dataset with the longest average.
Finally, GUI Odyssey is annotated by human demonstrations on a variety of mobile devices in an Android emulator, which helps to improve the quality and diversity of the data.
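To make the idea of a cross-app episode more concrete, here is a minimal Python sketch. It is not the dataset's actual schema: the `Step` and `Episode` classes, their field names, and the two toy episodes are all hypothetical, made up purely for illustration. It only shows how an episode can be seen as one instruction plus a sequence of steps, how to tell whether it spans more than one app, and how an average step count like the one quoted above would be computed.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class Step:
    # Hypothetical fields, not the real GUI Odyssey annotation format.
    app: str      # app in the foreground at this step
    action: str   # e.g. "click", "type", "scroll"


@dataclass
class Episode:
    instruction: str
    steps: List[Step]

    def apps_used(self) -> List[str]:
        """Return the distinct apps in order of first appearance."""
        seen: List[str] = []
        for step in self.steps:
            if step.app not in seen:
                seen.append(step.app)
        return seen

    def is_cross_app(self) -> bool:
        """An episode is cross-app if it touches more than one app."""
        return len(self.apps_used()) > 1


def average_steps(episodes: List[Episode]) -> float:
    """Mean number of steps per episode."""
    return mean(len(e.steps) for e in episodes)


# Toy data, invented for this example only.
episodes = [
    Episode(
        "Look up a cafe and text the address to a friend",
        [
            Step("Maps", "click"),
            Step("Maps", "type"),
            Step("Messages", "type"),
            Step("Messages", "click"),
        ],
    ),
    Episode(
        "Turn on dark mode",
        [Step("Settings", "click"), Step("Settings", "click")],
    ),
]

for ep in episodes:
    print(ep.instruction, "| apps:", ep.apps_used(), "| cross-app:", ep.is_cross_app())
print("average steps:", average_steps(episodes))
```

In this toy example the first episode is the cross-app kind that GUI Odyssey focuses on, while the second is the single-app kind that earlier datasets mostly contain.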
Source: https://huggingface.co/papers/2406.08451
What's next?
While GUI Odyssey is an effective dataset, it has some limitations that point to areas for future work:
- Simulating real-world operations: Some operations, such as making payments, can't be completed in a simulator, so tasks had to be simplified for data collection.
- Simulator limitations: GUI Odyssey only supports Google devices because it relies on the Android Studio emulator. Gathering data from other manufacturers and operating systems is difficult.
- Task openness: Data collection captures only one way to complete each task, but users might do it in many different ways.
- Offline environments: Evaluation currently runs offline. An online evaluation platform is in development to better assess how well an agent can handle cross-app tasks.
So essentially,
The GUI Odyssey dataset allows AI to use mobile apps like humans