GeoGuessr Rainbolt beaten by PIGEON 🕊️

May 27, 2024

So essentially,

The PIGEON model can estimate your coordinates from a picture of your surroundings, and it beat Rainbolt by a large margin in GeoGuessr.

Paper: PIGEON: Predicting Image Geolocations (26 Pages)

Researchers from Stanford set out to solve planet-scale image geolocalization. The idea for this paper was initially conceived in an independent class project as part of CS 330: Deep Multi-Task and Meta Learning, taught by Professor Finn at Stanford University. The paper details two novel models, PIGEON and PIGEOTTO, which achieve state-of-the-art performance on image geolocalization tasks.

Source: https://lexica.art/prompt/758f4227-c076-42a3-810d-80a9e24272b5

Hmm... What’s the background?

The paper centers on the challenge of image geolocalization, which is the task of determining the geographical coordinates (latitude and longitude) of where an image was taken. PIGEON was designed to compete in the game GeoGuessr, where more than 50 million players guess their location from random Street View images.

The sheer size and diversity of the Earth's surface make it difficult to create models that generalize well. Additionally, the appearance of a location can change drastically depending on the time of day, weather conditions, and season.

OK, so what is proposed in the research paper?

The paper proposes several technical innovations:

Semantic Geocell Creation: Instead of dividing the Earth into arbitrary rectangular regions ("geocells"), the authors propose using meaningful administrative and political boundaries (countries, regions, cities) to create these cells. This approach aims to capture location-specific characteristics and improve the model's understanding of geographical context.
Multi-Task Contrastive Pretraining: Recognizing the power of pretrained vision transformers, the researchers employ OpenAI's CLIP model as their backbone. However, they go a step further by continuing the pretraining of CLIP in a multi-task fashion. They augment the training data with auxiliary information like climate data, compass directions, and traffic patterns, creating synthetic captions that incorporate this information.
Distance-Based Label Smoothing (Haversine Smoothing): Instead of treating geolocalization as a pure classification problem, the authors introduce a novel loss function that incorporates the Haversine distance between locations. This approach "smooths" the labels, allowing the model to learn relationships between nearby geocells and implicitly understand geographical proximity.
Refinement via Location Cluster Retrieval: The paper introduces a hierarchical retrieval mechanism inspired by prototypical networks. After an initial geocell prediction, the model refines its guess by comparing the image embedding with representations of location clusters within the predicted geocell. This technique adds an additional layer of granularity, improving accuracy, especially at the street and city levels.
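
To make the Haversine smoothing idea a bit more concrete, here is a minimal sketch in plain Python. It is not the authors' code. It assumes each geocell is represented by a centroid, that the weight of a cell decays exponentially with how much farther its centroid is than the nearest one, and that the weights are normalized to sum to 1. The names smoothed_labels and tau_km, and the value 75.0, are made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def smoothed_labels(true_latlon, centroids, tau_km=75.0):
    """Soft target over geocells instead of a one-hot label.

    Assumption: the nearest centroid stands in for the "correct" geocell,
    and tau_km (hypothetical value) controls how quickly weight falls off.
    """
    dists = [haversine_km(true_latlon[0], true_latlon[1], lat, lon) for lat, lon in centroids]
    nearest = min(dists)
    weights = [math.exp(-(d - nearest) / tau_km) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]


# Toy example: three "geocells" centered on Paris, London, and New York
centroids = [(48.86, 2.35), (51.51, -0.13), (40.71, -74.01)]
labels = smoothed_labels((48.85, 2.29), centroids)
print(labels)
```

For a photo taken in Paris, the Paris cell gets almost all of the weight, London gets a small share, and New York gets practically nothing. A one-hot label would treat London and New York as equally wrong, which is exactly what the smoothing is meant to avoid.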

The resulting model places over 40% of its guesses within 25 km of the target location globally and ranks in the top 0.01% of players.
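
The last item in the list above, refinement via location cluster retrieval, is easiest to picture as a nearest-prototype lookup. The sketch below is a toy illustration, not the paper's implementation. It assumes each cluster is summarized by the mean of its member embeddings and that Euclidean distance is the similarity measure. The function names and the 3-dimensional embeddings are invented for the example; real embeddings would come from the vision backbone.

```python
import math


def prototype(embeddings):
    """Mean embedding of a cluster (assumption: clusters are summarized by their mean)."""
    n = len(embeddings)
    return [sum(dim) / n for dim in zip(*embeddings)]


def refine_guess(image_embedding, clusters):
    """Pick the cluster in the predicted geocell whose prototype is closest to the image.

    clusters: list of (member_embeddings, (lat, lon)) pairs, all inside one geocell.
    Returns the (lat, lon) attached to the best-matching cluster.
    """
    best = min(clusters, key=lambda c: math.dist(image_embedding, prototype(c[0])))
    return best[1]


# Two toy clusters inside one predicted geocell
clusters_in_cell = [
    ([[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]], (48.8584, 2.2945)),
    ([[0.1, 0.9, 0.2], [0.2, 0.8, 0.1]], (48.8867, 2.3431)),
]
guess = refine_guess([0.2, 0.8, 0.1], clusters_in_cell)
print(guess)
```

The geocell classifier only has to get the model into the right region. The lookup then moves the guess to a specific spot inside that region, which is where the finer-grained accuracy would come from.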

Here’s an awesome video of their collaboration with Trevor Rainbolt

And what’s next?

The authors acknowledge several potential avenues for future research:

Safety Concerns: The authors emphasize the ethical implications of increasingly accurate image geolocalization technologies. They argue that future research should prioritize mitigating potential harm and misuse, particularly considering potential military applications.
Incorporating Land Cover and Overhead Imagery: The paper focuses on ground-level images, but the authors acknowledge the potential of integrating other data sources, such as satellite imagery.
Exploring Street-Level Prediction Limits: The paper notes a current limitation: the difficulty of achieving consistently accurate street-level predictions. Future research could explore methods to overcome this challenge, potentially by incorporating additional data sources or refining existing techniques.

So essentially,

The PIGEON model can estimate your coordinates from a picture of your surroundings, and it beat Rainbolt by a large margin in GeoGuessr.
