ArcheoVLM

Phase 5: Georeferencing & Multi-Modal Verification

Convert pixel-based findings into geographic data and assemble comprehensive verification packages

Goal

Convert pixel-based findings into real-world geographic data and automatically assemble a rich, multi-modal verification package for each potential archaeological site.

Georeferencing Process
Converting pixel coordinates to real-world locations

Coordinate Transformation

Pixel to Geographic

Convert YOLO bounding box coordinates to WGS84 latitude/longitude using the GeoTIFF's geotransform and coordinate reference system (CRS) metadata
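A minimal sketch of the pixel-to-geographic step. It assumes the detector reports boxes in YOLO's normalized (x_center, y_center, width, height) format, and that the raster's geotransform is available as the GDAL-style six-tuple (for example from GDAL's `GetGeoTransform()` or rasterio's `dataset.transform.to_gdal()`). It also assumes the GeoTIFF is already in WGS84 (EPSG:4326). For a projected CRS such as UTM, the resulting x/y would still need reprojecting, e.g. with pyproj's `Transformer.from_crs(src_crs, "EPSG:4326", always_xy=True)`. The function names are hypothetical.

```python
from typing import List, NamedTuple, Sequence, Tuple


class GeoPoint(NamedTuple):
    lon: float
    lat: float


def yolo_to_pixel_corners(box: Sequence[float], img_w: int, img_h: int) -> Tuple[float, float, float, float]:
    """Normalized YOLO box (x_center, y_center, w, h) -> pixel-edge coordinates
    (col_min, row_min, col_max, row_max), with (0, 0) at the top-left image corner."""
    xc, yc, w, h = box
    return ((xc - w / 2) * img_w, (yc - h / 2) * img_h,
            (xc + w / 2) * img_w, (yc + h / 2) * img_h)


def pixel_to_geo(col: float, row: float, gt: Sequence[float]) -> GeoPoint:
    """Apply a GDAL-style affine geotransform to pixel-edge coordinates."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    # Only valid as lon/lat when the raster CRS is EPSG:4326 (assumption of this sketch).
    return GeoPoint(lon=x, lat=y)


def yolo_box_to_wgs84(box: Sequence[float], img_w: int, img_h: int, gt: Sequence[float]):
    """Return the detection's geographic center and (min_lon, min_lat, max_lon, max_lat)."""
    c0, r0, c1, r1 = yolo_to_pixel_corners(box, img_w, img_h)
    # Transform all four corners so the extent stays correct even for rotated rasters.
    corners: List[GeoPoint] = [pixel_to_geo(c, r, gt) for c in (c0, c1) for r in (r0, r1)]
    center = pixel_to_geo((c0 + c1) / 2, (r0 + r1) / 2, gt)
    bbox = (min(p.lon for p in corners), min(p.lat for p in corners),
            max(p.lon for p in corners), max(p.lat for p in corners))
    return center, bbox
```

For a 1000 x 1000 tile whose top-left corner sits at lon -60.0, lat -3.0 with 1e-5 degree pixels, the box (0.5, 0.5, 0.1, 0.1) centers near lon -59.995, lat -3.005.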

Spatial Accuracy

Sub-meter Precision

Maintain high spatial accuracy by applying correct transformations between the source raster's coordinate reference system and WGS84
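Whether "sub-meter" actually holds depends on both the pixel size and how many decimal places the stored degrees keep. The sketch below checks both using a spherical-Earth approximation (IUGG mean radius), which stays within about 1% of the WGS84 ellipsoid; the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

# Spherical-Earth approximation; within about 1% of the WGS84 ellipsoid values.
EARTH_MEAN_RADIUS_M = 6_371_008.8


def meters_per_degree(lat_deg: float) -> Tuple[float, float]:
    """Approximate ground length (m) of one degree of latitude and of longitude at lat_deg."""
    m_per_deg_lat = math.pi * EARTH_MEAN_RADIUS_M / 180.0
    m_per_deg_lon = m_per_deg_lat * math.cos(math.radians(lat_deg))
    return m_per_deg_lat, m_per_deg_lon


def pixel_size_m(gt: Sequence[float], lat_deg: float) -> Tuple[float, float]:
    """Ground size (width_m, height_m) of one pixel of a north-up raster whose geotransform is in degrees."""
    m_lat, m_lon = meters_per_degree(lat_deg)
    return abs(gt[1]) * m_lon, abs(gt[5]) * m_lat


def decimals_for_precision(target_m: float) -> int:
    """Fewest decimal places of degrees whose worst-case rounding error
    (half a unit in the last place) stays within target_m."""
    # On a sphere a degree of longitude is never longer than a degree of latitude,
    # so latitude is the worst case.
    m_lat, _ = meters_per_degree(0.0)
    d = 0
    while 0.5 * 10.0 ** -d * m_lat > target_m:
        d += 1
    return d
```

At about 3 degrees south, a 1e-5 degree pixel is roughly 1.1 m on a side, so it is not sub-meter on its own. `decimals_for_precision(0.5)` returns 6, i.e. six decimal places keep rounding error near 5.6 cm.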

Multi-Modal Data Sources
Comprehensive verification data integration

Optical Satellite Imagery

Google Earth Engine API
Sentinel-2

Recent, low-cloud-cover imagery with true-color composites plus NDVI (Normalized Difference Vegetation Index) and NDMI (Normalized Difference Moisture Index) products
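Both indices are normalized differences of Sentinel-2 bands: NDVI uses near-infrared B8 and red B4, and NDMI uses B8 and shortwave-infrared B11. The sketch computes them per pixel value and picks the most recent scene under a cloud threshold from already-fetched metadata. The `Scene` record and the 20% threshold are assumptions. In the Earth Engine Python API the equivalent steps would typically be `ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE", 20)` on a Sentinel-2 collection and `image.normalizedDifference(["B8", "B4"])`.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


def normalized_difference(a: float, b: float) -> Optional[float]:
    """(a - b) / (a + b); None where the denominator is zero (e.g. no-data pixels)."""
    total = a + b
    return None if total == 0 else (a - b) / total


def ndvi(b8_nir: float, b4_red: float) -> Optional[float]:
    return normalized_difference(b8_nir, b4_red)


def ndmi(b8_nir: float, b11_swir1: float) -> Optional[float]:
    return normalized_difference(b8_nir, b11_swir1)


@dataclass(frozen=True)
class Scene:  # hypothetical metadata record for an already-queried Sentinel-2 scene
    scene_id: str
    acquired: date
    cloudy_pixel_percentage: float


def pick_scene(scenes: Iterable[Scene], max_cloud: float = 20.0) -> Optional[Scene]:
    """Most recent scene whose cloud cover is below max_cloud percent, or None."""
    clear = [s for s in scenes if s.cloudy_pixel_percentage < max_cloud]
    return max(clear, key=lambda s: s.acquired, default=None)
```

Because each index is a ratio, a common scale factor shared by both bands (such as the 10,000 used for Sentinel-2 surface reflectance) cancels out.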

SAR Imagery

ALOS-PALSAR
NISAR

L-band SAR data to detect near-surface soil moisture anomalies that may indicate buried earthworks
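As a deliberately simple illustration, not necessarily the project's method, backscatter anomalies can be flagged by converting calibrated sigma-nought values from linear power to decibels and marking pixels whose dB value deviates strongly from the scene statistics. The global z-score and the threshold of 2 are assumptions; a practical workflow would normally add speckle filtering and compare against a local neighborhood or a multi-temporal baseline.

```python
import math
import statistics
from typing import List, Sequence


def to_db(sigma0_linear: float) -> float:
    """Convert linear backscatter (sigma-nought) to decibels."""
    return 10.0 * math.log10(sigma0_linear)


def flag_anomalies(sigma0_values: Sequence[float], z_threshold: float = 2.0) -> List[int]:
    """Indices of pixels whose dB backscatter lies more than z_threshold
    population standard deviations from the mean of all pixels."""
    db = [to_db(v) for v in sigma0_values]
    mean = statistics.fmean(db)
    std = statistics.pstdev(db)
    if std == 0:
        return []  # uniform scene: nothing stands out
    return [i for i, v in enumerate(db) if abs(v - mean) / std > z_threshold]
```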

Historical Text Analysis

GeospaCy/GeoNorm

Two-stage toponym resolution pipeline to geoparse historical Amazonian texts and extract relevant archaeological mentions
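GeospaCy and GeoNorm have their own models and interfaces, which are not reproduced here. The sketch below only illustrates the shape of a two-stage pipeline, swapping in plain gazetteer string matching for recognition and nearest-candidate disambiguation (distance to the study area) for resolution. The gazetteer entries, their coordinates, and the function names are hypothetical.

```python
import math
import re
from typing import Dict, List, Tuple

LatLon = Tuple[float, float]

# Hypothetical toy gazetteer with illustrative coordinates; ambiguous names have several candidates.
GAZETTEER: Dict[str, List[LatLon]] = {
    "Vila Nova": [(-3.20, -52.20), (-29.70, -53.80)],
    "Rio Xingu": [(-3.50, -52.00)],
}


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two (lat, lon) points on a spherical Earth."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0088 * math.asin(math.sqrt(h))


def recognize(text: str, gazetteer: Dict[str, List[LatLon]]) -> List[str]:
    """Stage 1 (stand-in for toponym recognition): gazetteer names found in text, in reading order."""
    hits = []
    for name in gazetteer:
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            hits.append((m.start(), name))
    return [name for _, name in sorted(hits)]


def resolve(name: str, gazetteer: Dict[str, List[LatLon]], context: LatLon) -> LatLon:
    """Stage 2 (stand-in for toponym resolution): candidate nearest the study-area context point."""
    return min(gazetteer[name], key=lambda c: haversine_km(c, context))


def geoparse(text: str, gazetteer: Dict[str, List[LatLon]], context: LatLon) -> List[Tuple[str, LatLon]]:
    return [(name, resolve(name, gazetteer, context)) for name in recognize(text, gazetteer)]
```

Resolving against the study area is what lets the ambiguous "Vila Nova" land near the Xingu rather than at the distant second candidate.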

Historical Aerial Photography

IBGE
INPE
USGS EarthExplorer

Historical aerial photos from Brazilian and international archives
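Each archive exposes its own catalog format and search interface, so the record type below is hypothetical, and footprints are simplified to axis-aligned WGS84 bounding boxes. The sketch shows only the per-site selection step: keep photos whose footprint contains the site and order them oldest first.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat), WGS84


@dataclass(frozen=True)
class AerialPhoto:  # hypothetical catalog record
    photo_id: str
    archive: str  # e.g. "IBGE", "INPE", "USGS EarthExplorer"
    year: int
    footprint: BBox


def covers(footprint: BBox, lon: float, lat: float) -> bool:
    """True if the point lies inside (or on the edge of) the footprint box."""
    min_lon, min_lat, max_lon, max_lat = footprint
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def photos_for_site(photos: Iterable[AerialPhoto], lon: float, lat: float) -> List[AerialPhoto]:
    """Photos whose footprint contains the site, oldest first."""
    return sorted((p for p in photos if covers(p.footprint, lon, lat)), key=lambda p: p.year)
```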

Verification Package Components
Comprehensive site documentation

LiDAR Products

• Original RVT (Relief Visualization Toolbox) visualizations
• DEM data
• Detection overlays

Satellite Data

• Optical imagery
• SAR analysis
• Vegetation indices

Historical Context

• Aerial photographs
• Text references
• Temporal analysis

AI Analysis

• VLM descriptions
• Confidence scores
• Feature classifications
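One way to keep the four component groups together is a per-site manifest that records where each layer lives. The class and field names below are hypothetical, chosen to mirror the component list; values are file paths or small scalar results, and the manifest round-trips through JSON so it can sit next to the data files.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class VerificationPackage:
    """Hypothetical per-site manifest grouping the four component families."""
    site_id: str
    lon: float
    lat: float
    lidar: Dict[str, str] = field(default_factory=dict)        # RVT visualizations, DEM, detection overlays
    satellite: Dict[str, str] = field(default_factory=dict)    # optical imagery, SAR analysis, vegetation indices
    historical: Dict[str, Any] = field(default_factory=dict)   # aerial photographs, text references, temporal analysis
    ai_analysis: Dict[str, Any] = field(default_factory=dict)  # VLM description, confidence score, feature class

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "VerificationPackage":
        return cls(**json.loads(text))
```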
Data Processing Pipeline
Automated verification package generation

Processing Steps:

  • Georeferencing: Convert all pixel coordinates to WGS84 latitude/longitude
  • Data Acquisition: Query external APIs and archives for each site location
  • Image Processing: Georeference and crop relevant imagery
  • Text Mining: Extract and correlate historical text references
  • Package Assembly: Compile all data layers into structured verification packages
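For the image-processing step, cropping around a site amounts to running the geotransform in reverse to find the pixel window that covers a buffer around the site. A minimal sketch, assuming a north-up raster (no rotation terms), a buffer expressed in the raster CRS's units, and hypothetical function names; rasterio users would more likely reach for `rasterio.windows.from_bounds`.

```python
import math
from typing import Optional, Sequence, Tuple


def geo_to_pixel(x: float, y: float, gt: Sequence[float]) -> Tuple[float, float]:
    """Invert a north-up GDAL-style geotransform: map coordinates -> (col, row)."""
    if gt[2] != 0 or gt[4] != 0:
        raise ValueError("rotated geotransforms are not handled in this sketch")
    return (x - gt[0]) / gt[1], (y - gt[3]) / gt[5]


def crop_window(x: float, y: float, half_size: float, gt: Sequence[float],
                width: int, height: int) -> Optional[Tuple[int, int, int, int]]:
    """(col_off, row_off, n_cols, n_rows) covering x/y +/- half_size, clipped to the raster,
    or None if the buffer does not overlap the raster at all."""
    c0, r0 = geo_to_pixel(x - half_size, y + half_size, gt)  # upper-left corner
    c1, r1 = geo_to_pixel(x + half_size, y - half_size, gt)  # lower-right corner
    col_off = max(0, math.floor(min(c0, c1)))
    row_off = max(0, math.floor(min(r0, r1)))
    col_end = min(width, math.ceil(max(c0, c1)))
    row_end = min(height, math.ceil(max(r0, r1)))
    if col_end <= col_off or row_end <= row_off:
        return None
    return col_off, row_off, col_end - col_off, row_end - row_off
```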
Execution Checklist
Phase 5 tasks and deliverables
  • Develop the generate_verification_package.py script
  • Implement the georeferencing function that converts pixel coordinates to WGS84
  • Set up API access to Google Earth Engine (Sentinel-2)
  • Set up access to L-band SAR data archives
  • Digitize and compile the historical text corpus
  • Implement the GeospaCy/GeoNorm toponym resolution pipeline
  • Identify and query historical aerial photography archives
  • Run generate_verification_package.py for all sites
  • Verify that every verification package is complete
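The final checklist item can be automated with a small completeness check over each site's manifest. The layer keys below are hypothetical names mirroring the Verification Package Components list; the sketch treats a layer as missing when its entry is absent or empty.

```python
from typing import Any, Dict, List, Mapping, Sequence

# Hypothetical layer keys mirroring the Verification Package Components list.
REQUIRED_LAYERS: Dict[str, Sequence[str]] = {
    "lidar": ("rvt_visualizations", "dem", "detection_overlays"),
    "satellite": ("optical", "sar", "vegetation_indices"),
    "historical": ("aerial_photographs", "text_references", "temporal_analysis"),
    "ai_analysis": ("vlm_description", "confidence_score", "feature_classification"),
}


def _is_empty(value: Any) -> bool:
    return value is None or (isinstance(value, (str, list, dict)) and len(value) == 0)


def missing_layers(manifest: Mapping[str, Mapping[str, Any]],
                   required: Mapping[str, Sequence[str]] = REQUIRED_LAYERS) -> List[str]:
    """'section/layer' entries that are absent or empty in one site's manifest."""
    missing: List[str] = []
    for section, layers in required.items():
        present = manifest.get(section) or {}
        missing.extend(f"{section}/{layer}" for layer in layers if _is_empty(present.get(layer)))
    return missing


def incomplete_sites(manifests: Mapping[str, Mapping[str, Mapping[str, Any]]]) -> Dict[str, List[str]]:
    """Map site_id -> missing layers, listing only sites that are not complete."""
    report = {site: missing_layers(m) for site, m in manifests.items()}
    return {site: gaps for site, gaps in report.items() if gaps}
```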
Expected Outputs

Master Site Database

potential_sites.geojson with all georeferenced discoveries
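GeoJSON (RFC 7946) stores positions as [longitude, latitude] in WGS84, the reverse of the usual lat/lon reading order, which makes swapped points an easy mistake. A minimal standard-library sketch for building potential_sites.geojson; the property names are hypothetical. Coordinates are rounded to 6 decimal places, which RFC 7946 notes corresponds to about 10 cm.

```python
import json
from typing import Any, Dict, Iterable


def site_feature(site_id: str, lon: float, lat: float, **properties: Any) -> Dict[str, Any]:
    """One detected site as a GeoJSON Point feature."""
    return {
        "type": "Feature",
        "id": site_id,
        # RFC 7946 order is [longitude, latitude]; 6 decimals is about 10 cm.
        "geometry": {"type": "Point", "coordinates": [round(lon, 6), round(lat, 6)]},
        "properties": {"site_id": site_id, **properties},
    }


def feature_collection(features: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    return {"type": "FeatureCollection", "features": list(features)}


def write_geojson(path: str, collection: Dict[str, Any]) -> None:
    with open(path, "w", encoding="utf-8") as f:
        json.dump(collection, f, indent=2)
```

Usage would look like `write_geojson("potential_sites.geojson", feature_collection([site_feature("S1", -59.995, -3.005, confidence=0.91)]))`.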

Verification Packages

Complete multi-modal data packages for each high-confidence site