ArcheoVLM
Comprehensive Project Execution Checklist
Complete task breakdown for all project phases
Phase 1: Cloud Foundation & Data Ingestion
Create new Google Cloud Project
Write and test the setup_gcp.sh script to create the GCS bucket and folder structure (see the bucket-setup sketch after this list)
Execute setup_gcp.sh script
Verify GCS bucket and folder structure are correct
Develop the ingest_data.py script with parallel uploads, checkpointing, and logging (see the ingestion sketch after this list)
Test ingest_data.py on a small subset of LiDAR files
Execute full data ingestion process
Verify all raw LiDAR files are successfully stored in /00_raw_lidar/
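
For the setup_gcp.sh task above, a minimal Python equivalent of the bucket and folder setup is sketched below, assuming the google-cloud-storage client library. The project ID, bucket name, and region are placeholders; only the folder names come from this checklist.

    from google.cloud import storage

    PROJECT_ID = "archeovlm-project"   # placeholder project ID
    BUCKET_NAME = "archeovlm-data"     # placeholder bucket name
    FOLDERS = ["00_raw_lidar/", "01_inventory_and_gis_data/", "02_filtered_tiles/"]

    client = storage.Client(project=PROJECT_ID)
    bucket = client.bucket(BUCKET_NAME)
    if not bucket.exists():
        bucket = client.create_bucket(bucket, location="EU")  # region is an assumption

    # GCS has no real directories; zero-byte placeholder objects mark the layout.
    for folder in FOLDERS:
        bucket.blob(folder).upload_from_string("")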
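A minimal sketch of the ingest_data.py step follows, again assuming google-cloud-storage. The staging directory, worker count, and checkpoint file name are assumptions; checkpointing here is just a JSON list of uploaded file names so an interrupted run can resume where it left off.

    import json
    import logging
    from concurrent.futures import ThreadPoolExecutor, as_completed
    from pathlib import Path

    from google.cloud import storage

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("ingest_data")

    SOURCE_DIR = Path("/data/lidar")             # placeholder local staging directory
    CHECKPOINT = Path("ingest_checkpoint.json")  # placeholder checkpoint file
    BUCKET_NAME = "archeovlm-data"               # placeholder bucket name

    done = set(json.loads(CHECKPOINT.read_text())) if CHECKPOINT.exists() else set()
    bucket = storage.Client().bucket(BUCKET_NAME)

    def upload(laz_path: Path) -> str:
        """Upload one .laz tile into 00_raw_lidar/ and return its file name."""
        bucket.blob(f"00_raw_lidar/{laz_path.name}").upload_from_filename(str(laz_path))
        return laz_path.name

    pending = [p for p in SOURCE_DIR.glob("*.laz") if p.name not in done]
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = {pool.submit(upload, p): p for p in pending}
        for fut in as_completed(futures):
            try:
                done.add(fut.result())
                log.info("uploaded %s", futures[fut].name)
            except Exception:
                log.exception("failed %s", futures[fut].name)
            CHECKPOINT.write_text(json.dumps(sorted(done)))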
Phase 2: Automated Triage & Prioritization
Acquire and pre-process all necessary GIS data layers
Store GIS data in /01_inventory_and_gis_data/
Develop the triage_tiles.py script for automated spatial cross-referencing (see the triage sketch after this list)
Implement automated prioritization algorithms
Run final triage_tiles.py script to categorize all tiles
Verify that the raw .laz files are correctly copied to the /02_filtered_tiles/ subdirectories
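
For the triage_tiles.py task above, a minimal sketch of the spatial cross-referencing idea is shown below, assuming geopandas and a projected CRS in metres. The layer file names, distance thresholds, and scoring rule are illustrative assumptions, not the project's actual criteria.

    import geopandas as gpd

    tiles = gpd.read_file("tile_index.gpkg")         # placeholder LiDAR tile footprints
    rivers = gpd.read_file("rivers.gpkg")            # placeholder GIS layer
    known_sites = gpd.read_file("known_sites.gpkg")  # placeholder GIS layer

    # Reproject the GIS layers to the tile CRS before measuring distances.
    rivers = rivers.to_crs(tiles.crs)
    known_sites = known_sites.to_crs(tiles.crs)

    # Illustrative prioritization: tiles near water or known sites rank higher.
    tiles["dist_river_m"] = tiles.geometry.apply(lambda g: rivers.distance(g).min())
    tiles["dist_site_m"] = tiles.geometry.apply(lambda g: known_sites.distance(g).min())
    tiles["high_priority"] = (tiles["dist_river_m"] < 500) | (tiles["dist_site_m"] < 2000)

    tiles[tiles["high_priority"]].to_file("high_priority_tiles.gpkg", driver="GPKG")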
Phase 3: Scalable LiDAR Processing
Set up GKE Autopilot cluster environment
Containerize the process_lidar_tile.py script and dependencies
Develop and fine-tune the Point Transformer V3 model
Develop and parameter-tune the Cloth Simulation Filter (CSF) ground-classification step (see the ground-filtering sketch after this list)
Implement the adaptive interpolation logic in the script
Implement the rvt-py visualization generation logic
Test the full process_lidar_tile.py job on a single high-potential tile
Execute the batch processing job on all prioritized tiles in GKE Autopilot
Verify that all output Cloud-Optimized GeoTIFFs (COGs) are correctly generated (see the COG-writing sketch after this list)
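
For the CSF task above, a minimal sketch of the ground-filtering step is given below, assuming the laspy and CSF (cloth-simulation-filter) Python packages; reading .laz additionally requires a laszip or lazrs backend. The input file name and parameter values are starting points for tuning, not the project's final settings.

    import laspy
    import numpy as np
    import CSF

    las = laspy.read("tile.laz")                        # placeholder input tile
    xyz = np.vstack((las.x, las.y, las.z)).transpose()

    csf = CSF.CSF()
    csf.params.bSloopSmooth = False     # keep detail on steep terrain
    csf.params.cloth_resolution = 0.5   # metres; tune per landscape
    csf.params.rigidness = 3
    csf.params.class_threshold = 0.5
    csf.setPointCloud(xyz)

    ground_idx, non_ground_idx = CSF.VecInt(), CSF.VecInt()
    csf.do_filtering(ground_idx, non_ground_idx)

    ground_points = xyz[np.array(list(ground_idx))]
    print(f"{len(ground_points)} ground points of {len(xyz)} total")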
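A minimal sketch of writing one output raster as a Cloud-Optimized GeoTIFF follows, assuming rasterio with a GDAL build that includes the COG driver (GDAL >= 3.1); the array, transform, and CRS are placeholders standing in for the rvt-py visualization and DTM rasters. If the installed stack cannot write the COG driver directly, rio-cogeo's cog_translate is the usual fallback.

    import numpy as np
    import rasterio
    from rasterio.transform import from_origin

    dtm = np.random.rand(1024, 1024).astype("float32")      # placeholder raster
    transform = from_origin(500000.0, 4649000.0, 0.5, 0.5)  # placeholder georeferencing

    profile = {
        "driver": "COG",
        "dtype": "float32",
        "count": 1,
        "height": dtm.shape[0],
        "width": dtm.shape[1],
        "crs": "EPSG:32633",    # placeholder CRS
        "transform": transform,
        "compress": "deflate",
    }
    with rasterio.open("tile_dtm_cog.tif", "w", **profile) as dst:
        dst.write(dtm, 1)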
Phase 4: Hybrid Intelligence Detection
Set up the annotation environment (e.g., Labelbox, CVAT)
Have an expert archaeologist annotate the initial seed set of ~200 RVT images
Configure and train the initial YOLOv9 model on the seed set (see the training sketch after this list)
Develop the VLM analysis script with chain-of-thought (CoT) prompting (see the prompt sketch after this list)
Develop the Active Learning module script (see the uncertainty-sampling sketch after this list)
Begin Active Learning Loop iterations
Run YOLO and VLM inference on the unlabeled pool
Consolidate final detections from both YOLO and VLM
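
For the YOLOv9 training task above, a minimal sketch is given below, assuming the ultralytics package (which ships YOLOv9 checkpoints). The dataset YAML, image size, and other hyperparameters are assumptions.

    from ultralytics import YOLO

    model = YOLO("yolov9c.pt")           # pretrained YOLOv9-C checkpoint
    model.train(
        data="rvt_seed_set.yaml",        # placeholder dataset config for the seed set
        epochs=100,
        imgsz=1024,                      # assumed tile size for the RVT images
        batch=8,
    )
    metrics = model.val()                # evaluate on the held-out split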
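A minimal sketch of the chain-of-thought analysis step follows. The checklist does not name a specific VLM or API, so the query_vlm wrapper below is hypothetical and the prompt wording is illustrative only.

    from pathlib import Path

    COT_PROMPT = """You are assisting an archaeological LiDAR survey.
    Examine the attached relief-visualization (RVT) image and reason step by step:
    1. Describe the terrain and any regular geometric patterns you can see.
    2. For each pattern, state whether it is more consistent with natural processes,
       modern land use, or a potential archaeological feature, and explain why.
    3. End with one line per candidate: VERDICT: <none | possible | likely>,
       a short justification, and its approximate pixel location.
    """

    def query_vlm(image_bytes: bytes, prompt: str) -> str:
        """Hypothetical wrapper around whichever vision-language model is used."""
        raise NotImplementedError("plug in the project's VLM client here")

    image = Path("tile_0421_svf.png").read_bytes()   # placeholder RVT image
    analysis = query_vlm(image, COT_PROMPT)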
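For the Active Learning module, a minimal sketch of one common selection rule (least-confidence sampling) over the unlabeled pool is shown below, reusing the ultralytics inference API from the training sketch. The checkpoint path, pool directory, and batch of 50 images are assumptions, and the project's module may score uncertainty differently.

    from pathlib import Path
    from ultralytics import YOLO

    model = YOLO("runs/detect/train/weights/best.pt")   # placeholder checkpoint path
    unlabeled = sorted(Path("unlabeled_pool").glob("*.png"))

    def uncertainty(result) -> float:
        """1 minus the highest box confidence; images with no boxes score highest."""
        confs = result.boxes.conf
        return 1.0 if confs is None or len(confs) == 0 else 1.0 - float(confs.max())

    scores = []
    for img in unlabeled:
        result = model.predict(str(img), verbose=False)[0]
        scores.append((uncertainty(result), img))

    # The 50 most uncertain images go back to the expert annotator.
    for score, img in sorted(scores, key=lambda s: s[0], reverse=True)[:50]:
        print(f"{score:.2f}  {img.name}")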
Phase 5: Georeferencing & Verification
Develop generate_verification_package.py script
Implement the georeferencing function to convert pixel coordinates to geographic coordinates (see the sketch after this list)
Set up Google Earth Engine API access for Sentinel-2 imagery (see the query sketch after this list)
Set up access to L-band SAR data archives
Digitize and create the historical text corpus
Implement the GeospaCy/GeoNorm toponym resolution pipeline
Identify and query historical aerial photography archives
Run the generate_verification_package.py script for all sites
Verify all verification packages are complete
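
For the georeferencing task above, a minimal sketch of the pixel-to-coordinate conversion is given below, assuming rasterio and the georeferenced COGs from Phase 3; the file name and detection pixel location are placeholders.

    import rasterio
    from rasterio.warp import transform as warp_transform

    with rasterio.open("tile_0421_dtm_cog.tif") as src:   # placeholder COG
        row, col = 812, 455                               # detection centre in pixels
        x, y = src.xy(row, col)                           # projected map coordinates
        (lon,), (lat,) = warp_transform(src.crs, "EPSG:4326", [x], [y])

    print(f"Candidate site at {lat:.6f}, {lon:.6f} (WGS84)")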
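A minimal sketch of a Sentinel-2 availability check through the Earth Engine Python API follows; the point coordinates, date range, and cloud threshold are placeholders, and ee.Authenticate() must have been run once beforehand.

    import ee

    ee.Initialize()                                  # assumes prior ee.Authenticate()

    site = ee.Geometry.Point([16.3725, 48.2082])     # placeholder lon/lat
    scenes = (
        ee.ImageCollection("COPERNICUS/S2_SR_HARMONIZED")
        .filterBounds(site)
        .filterDate("2023-01-01", "2023-12-31")
        .filter(ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE", 20))
    )
    print("Sentinel-2 scenes available:", scenes.size().getInfo())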
Phase 6: Collaborative Review & Dissemination
Establish the formal collaborative review committee
Schedule and conduct review meetings to evaluate verification packages
Create a final, validated list of newly discovered sites
Develop the interactive dashboard template
Generate a unique, secure dashboard for each validated site (see the dashboard sketch after this list)
Draft scientific publications with partners as co-authors
Submit publications for peer review
Distribute project reports and data to partners
Archive project data and code according to the governance plan
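
For the dashboard tasks above, a minimal sketch of per-site dashboard generation is shown below, assuming folium for the interactive map. The site records are placeholders, and the unguessable file-name token stands in for whatever access control the real dashboards use.

    import uuid
    from pathlib import Path

    import folium

    validated_sites = [                                  # placeholder site records
        {"id": "site_001", "lat": 48.2082, "lon": 16.3725, "type": "enclosure"},
    ]

    out_dir = Path("dashboards")
    out_dir.mkdir(exist_ok=True)

    for site in validated_sites:
        m = folium.Map(location=[site["lat"], site["lon"]], zoom_start=16)
        folium.Marker(
            [site["lat"], site["lon"]],
            popup=f'{site["id"]}: {site["type"]}',
        ).add_to(m)
        token = uuid.uuid4().hex                         # unguessable file-name token
        m.save(str(out_dir / f'{site["id"]}_{token}.html'))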
Project Completion Metrics
Key indicators of successful project execution
Data Processing: ~3,154 LiDAR tiles processed
AI Performance: detection accuracy optimized through the active learning loop
Archaeological Discovery: newly discovered sites verified and validated by the review committee