Research Overview

Advancing Visual Geolocation Through AI Research

Our research focuses on developing novel computer vision and machine learning techniques for geographic location inference from visual data.

Core Technology
Status: In Training

NaviSense Model

NaviSense is our proprietary transformer-based architecture, currently in development, that combines visual embeddings with geospatial priors. Training targets state-of-the-art location prediction; the model learns hierarchical representations of architectural styles, environmental features, and urban patterns.

Multi-Scale Feature Extraction

Captures both fine-grained architectural details and broad environmental context

Geospatial Attention Mechanism

Learns spatial relationships between visual features and geographic coordinates

Transfer Learning Pipeline

Pre-trained on millions of geotagged images for robust generalization

Model Architecture

Input: 224×224 RGB Image
Backbone: Vision Transformer (ViT-L/16)
Embedding Dim: 1024
Attention Heads: 16
Layers: 24
Parameters: 307M
Output: Lat/Lng + Confidence
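
The figures above can be sanity-checked with a little arithmetic. The sketch below is illustrative only: it assumes a standard ViT layout (one class token, an MLP expansion ratio of 4, biases on all linear layers), none of which is stated here, and the names `ViTConfig`, `num_tokens`, `head_dim`, and `backbone_params` are hypothetical. Under those assumptions the backbone alone comes to roughly 303M parameters, in the neighborhood of the 307M listed; the exact total depends on the output heads and other details not given in this overview.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViTConfig:
    """Values taken from the spec list; mlp_ratio is an assumption."""
    image_size: int = 224
    patch_size: int = 16
    channels: int = 3
    embed_dim: int = 1024
    heads: int = 16
    layers: int = 24
    mlp_ratio: int = 4  # assumption: standard ViT MLP expansion


def num_tokens(cfg: ViTConfig) -> int:
    """Number of patch tokens plus one class token (assumed)."""
    per_side = cfg.image_size // cfg.patch_size
    return per_side * per_side + 1


def head_dim(cfg: ViTConfig) -> int:
    """Width of each attention head."""
    return cfg.embed_dim // cfg.heads


def backbone_params(cfg: ViTConfig) -> int:
    """Approximate parameter count of the encoder, excluding output heads."""
    d = cfg.embed_dim
    hidden = cfg.mlp_ratio * d
    attention = 4 * d * d + 4 * d                # Q, K, V, output projections
    mlp = d * hidden + hidden + hidden * d + d   # two linear layers with biases
    norms = 2 * 2 * d                            # two LayerNorms per block
    per_layer = attention + mlp + norms
    patch_embed = cfg.patch_size ** 2 * cfg.channels * d + d
    pos_embed = num_tokens(cfg) * d
    cls_token = d
    final_norm = 2 * d
    return cfg.layers * per_layer + patch_embed + pos_embed + cls_token + final_norm


if __name__ == "__main__":
    cfg = ViTConfig()
    print(num_tokens(cfg), head_dim(cfg), backbone_params(cfg))
```

A 224×224 input with 16×16 patches yields a 14×14 grid, so each image becomes 197 tokens of width 1024 under these assumptions.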

Research Focus Areas

Visual Geolocation

Developing algorithms that predict geographic coordinates from single images using deep learning and computer vision techniques.

  • Street-level localization
  • Cross-view geo-localization
  • Uncertainty quantification
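
Predicted coordinates are usually scored by great-circle distance to the true location. The following is a minimal sketch of that evaluation step, not a description of our evaluation code: the function names `haversine_km` and `accuracy_within` are hypothetical, and the thresholds a caller would pass (for example 1 km or 25 km) are illustrative choices.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a spherical approximation


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against rounding pushing a slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def accuracy_within(errors_km: list[float], threshold_km: float) -> float:
    """Fraction of predictions whose error is at most threshold_km."""
    if not errors_km:
        raise ValueError("errors_km must not be empty")
    return sum(e <= threshold_km for e in errors_km) / len(errors_km)


if __name__ == "__main__":
    # Hypothetical prediction vs. ground truth, in (lat, lng) degrees.
    error = haversine_km(48.8584, 2.2945, 48.8606, 2.3376)
    print(f"error: {error:.2f} km")
```

Reporting the fraction of predictions within several distance thresholds gives a fuller picture than a single mean error, since a few predictions on the wrong continent can dominate an average.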

Dataset Curation

Building large-scale, diverse datasets of geotagged imagery for training and evaluating location recognition models.

  • Global coverage datasets
  • Temporal consistency
  • Privacy-preserving collection
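
One simple way to reason about global coverage is to bucket geotags into a latitude/longitude grid and see how many cells contain any images. The sketch below assumes a fixed-angle grid and is not our curation pipeline; `cell_of`, `coverage_fraction`, and the 10-degree default are hypothetical. Fixed-angle cells are not equal in area, so this is only a rough indicator.

```python
from collections import Counter


def cell_of(lat: float, lng: float, cell_deg: float = 10.0) -> tuple[int, int]:
    """Map a coordinate to a (row, col) grid cell.

    Assumes cell_deg evenly divides 180. Latitude 90 is clamped into the
    top row; longitude 180 wraps to the same column as -180.
    """
    rows = int(180.0 / cell_deg)
    row = min(int((lat + 90.0) // cell_deg), rows - 1)
    col = int(((lng + 180.0) % 360.0) // cell_deg)
    return row, col


def cell_counts(points: list[tuple[float, float]], cell_deg: float = 10.0) -> Counter:
    """Count images per grid cell; points are (lat, lng) pairs in degrees."""
    return Counter(cell_of(lat, lng, cell_deg) for lat, lng in points)


def coverage_fraction(points: list[tuple[float, float]], cell_deg: float = 10.0) -> float:
    """Fraction of all grid cells that contain at least one image."""
    total_cells = int(180.0 / cell_deg) * int(360.0 / cell_deg)
    return len(cell_counts(points, cell_deg)) / total_cells


if __name__ == "__main__":
    sample = [(0.0, 0.0), (1.0, 1.0), (50.0, 50.0)]  # hypothetical geotags
    print(cell_counts(sample))
    print(coverage_fraction(sample))
```

The per-cell counts also expose imbalance: a dataset can touch many cells while most of its images sit in only a few of them.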

Model Optimization

Improving model efficiency, accuracy, and robustness through novel architectures and training strategies.

  • Efficient transformers
  • Few-shot learning
  • Domain adaptation

Recent Publications

NaviSense: Transformer-Based Visual Geolocation with Spatial Priors

A novel architecture combining vision transformers with geospatial attention mechanisms for improved location prediction accuracy.

2024 · Computer Vision and Pattern Recognition

Large-Scale Geotagged Image Dataset for Urban Environment Analysis

Introducing a comprehensive dataset of 10M+ geotagged images spanning 195 countries for training location recognition models.

2024 · International Conference on Computer Vision

Zero-Shot Landmark Recognition via Contrastive Learning

Enabling identification of landmarks without landmark-specific training examples by using contrastive vision-language models.

2023 · Neural Information Processing Systems

Collaborate With Us

Interested in research collaboration or accessing our datasets? Get in touch with our team.