ArcheoVLM

Phase 1: Cloud Foundation & Data Ingestion

Goal

Establish the GCP environment and efficiently ingest the raw LiDAR data for processing.

Infrastructure Setup
Google Cloud Platform configuration

GCS Bucket Structure

/00_raw_lidar/
/01_inventory_and_gis_data/
/02_filtered_tiles/
/03_processed_visualizations/
/04_analysis_outputs/
/05_verification_packages/

Setup Script

setup_gcp.sh

Provisions the GCS bucket with the predefined folder structure
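
The provisioning step can be sketched in Python as follows. This is a minimal illustration, not the contents of setup_gcp.sh: the function names `placeholder_names` and `provision`, the `location` default, and the use of the google-cloud-storage client library are all assumptions. GCS has no real directories, so the sketch makes each folder visible by creating a zero-byte object whose name ends in a slash.

```python
# Folder prefixes taken from the bucket structure listed above.
FOLDERS = [
    "00_raw_lidar",
    "01_inventory_and_gis_data",
    "02_filtered_tiles",
    "03_processed_visualizations",
    "04_analysis_outputs",
    "05_verification_packages",
]


def placeholder_names(folders=FOLDERS):
    """Return the zero-byte object names that make each folder visible."""
    return [f"{name.strip('/')}/" for name in folders]


def provision(bucket_name, location="US"):
    """Create the bucket and its folder placeholders (hypothetical helper).

    Assumption: the google-cloud-storage package is installed and
    application default credentials are configured.
    """
    from google.cloud import storage  # third-party; imported lazily

    client = storage.Client()
    # create_bucket raises an error if the bucket already exists.
    bucket = client.create_bucket(bucket_name, location=location)
    for name in placeholder_names():
        bucket.blob(name).upload_from_string("")
    return bucket
```

Keeping the folder list in one constant means the verification step can reuse it to confirm that every expected prefix exists.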

Data Ingestion

ingest_data.py

Scalable transfer with parallelization and checkpointing
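
One way to combine parallelization with checkpointing is sketched below. It is an illustration under stated assumptions, not the actual ingest_data.py: the `transfer` callable (which would download one file and upload it to /00_raw_lidar/), the JSON checkpoint format, and the default of 8 workers are all hypothetical. Files already recorded in the checkpoint are skipped, so an interrupted run can be restarted without repeating completed transfers.

```python
import json
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path

log = logging.getLogger("ingest_data")


def load_checkpoint(path):
    """Return the set of file names already transferred."""
    p = Path(path)
    return set(json.loads(p.read_text())) if p.exists() else set()


def save_checkpoint(path, done):
    """Write to a temp file, then rename, so a crash cannot corrupt it."""
    tmp = Path(str(path) + ".tmp")
    tmp.write_text(json.dumps(sorted(done)))
    tmp.replace(path)


def ingest(files, transfer, checkpoint_path, workers=8):
    """Transfer every file not yet in the checkpoint; return the failures."""
    done = load_checkpoint(checkpoint_path)
    pending = [f for f in files if f not in done]
    failed = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(transfer, f): f for f in pending}
        # Results are handled in the main thread, so only one writer
        # ever touches the checkpoint file.
        for future in as_completed(futures):
            name = futures[future]
            try:
                future.result()
            except Exception as exc:
                log.error("transfer failed for %s: %s", name, exc)
                failed.append(name)
            else:
                done.add(name)
                save_checkpoint(checkpoint_path, done)
    return failed
```

Threads are used here because the work is I/O-bound. Rewriting the checkpoint after every file is simple and is likely acceptable for a few thousand files; a larger job might batch the writes instead.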

Data Source
LiDAR dataset specifications

Dataset

ORNL DAAC LiDAR Surveys

2008-2018

File Count

Raw LiDAR files

~3,154 .laz files

Coverage

Brazilian Amazon

Forest Research Sites

Execution Checklist
Phase 1 tasks and deliverables

Create new Google Cloud Project
Write and test setup_gcp.sh script to create GCS bucket and folder structure
Execute setup_gcp.sh script
Verify GCS bucket and folder structure are correct
Develop ingest_data.py script with parallelization, checkpointing, and logging
Test ingest_data.py on a small subset of LiDAR files
Execute full data ingestion process
Verify all raw LiDAR files are successfully stored in /00_raw_lidar/
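
The final verification task in the checklist above could be approached as in the sketch below. The function name `verify_ingestion` and the shape of its report are assumptions made for illustration. It compares the source file list against the object names found under /00_raw_lidar/ and ignores the folder placeholder itself.

```python
def verify_ingestion(expected_names, stored_names, prefix="00_raw_lidar/"):
    """Compare the source file list against object names in the bucket."""
    expected = set(expected_names)
    # Keep only objects under the prefix, dropping the placeholder object
    # and stripping the prefix so names are comparable.
    stored = {
        name[len(prefix):]
        for name in stored_names
        if name.startswith(prefix) and name != prefix
    }
    return {
        "missing": sorted(expected - stored),
        "unexpected": sorted(stored - expected),
        "complete": expected <= stored,
    }
```

With the google-cloud-storage client, the stored names could be collected from `client.list_blobs(bucket_name, prefix="00_raw_lidar/")`. This check compares names only; comparing sizes or checksums would be a stronger test of transfer integrity.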

Expected Outputs

GCS Infrastructure

Fully provisioned bucket with organized folder structure

Raw Data Storage

All LiDAR files successfully transferred and verified