ArcheoVLM
Phase 1: Cloud Foundation & Data Ingestion
Goal
Establish the GCP environment and efficiently ingest the raw LiDAR data for processing.
Infrastructure Setup
Google Cloud Platform configuration
GCS Bucket Structure
/00_raw_lidar/
/01_inventory_and_gis_data/
/02_filtered_tiles/
/03_processed_visualizations/
/04_analysis_outputs/
/05_verification_packages/
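GCS has a flat namespace, so these "folders" are shared object-name prefixes rather than real directories. The sketch below keeps the stage prefixes in one place and builds object URIs from them, so a mistyped stage name fails loudly instead of creating a new top-level prefix. The bucket name in the usage example and the helper name `stage_uri` are hypothetical.

```python
# Stage prefixes from the bucket layout above.
# GCS has no real directories; each "folder" is only a shared object-name prefix.
STAGES = (
    "00_raw_lidar",
    "01_inventory_and_gis_data",
    "02_filtered_tiles",
    "03_processed_visualizations",
    "04_analysis_outputs",
    "05_verification_packages",
)


def stage_uri(bucket: str, stage: str, filename: str) -> str:
    """Return the gs:// URI for `filename` under a known stage prefix.

    Hypothetical helper: unknown stages raise ValueError so that typos do not
    silently create an extra top-level prefix in the bucket.
    """
    if stage not in STAGES:
        raise ValueError(f"unknown stage {stage!r}; expected one of {STAGES}")
    return f"gs://{bucket}/{stage}/{filename.lstrip('/')}"
```

For example, `stage_uri("archeovlm-data", "00_raw_lidar", "tile_001.laz")` gives `gs://archeovlm-data/00_raw_lidar/tile_001.laz`.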
Setup Script
setup_gcp.sh
Provisions the GCS bucket and creates the predefined folder structure
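For reference, the same provisioning step can be sketched in Python. This is not the contents of setup_gcp.sh. It assumes the `google-cloud-storage` client library (`Client.lookup_bucket`, `Client.create_bucket`, `Blob.exists`, `Blob.upload_from_string`), and the function name `provision_bucket`, the bucket name, and the location are placeholders. It writes a zero-byte object per prefix, a common convention for making empty "folders" visible in the Cloud Console, and skips anything that already exists so the script can be rerun safely.

```python
from typing import Any, List

# One zero-byte marker object per stage prefix (trailing slash marks a "folder").
FOLDERS = (
    "00_raw_lidar/",
    "01_inventory_and_gis_data/",
    "02_filtered_tiles/",
    "03_processed_visualizations/",
    "04_analysis_outputs/",
    "05_verification_packages/",
)


def provision_bucket(client: Any, bucket_name: str, location: str = "US") -> List[str]:
    """Create the bucket if missing, then add any missing folder markers.

    `client` is expected to behave like google.cloud.storage.Client; a fake
    object can be passed to test the logic without GCP credentials.
    Returns the prefixes that were created on this run.
    """
    bucket = client.lookup_bucket(bucket_name)  # None if the bucket does not exist
    if bucket is None:
        bucket = client.create_bucket(bucket_name, location=location)
    created = []
    for prefix in FOLDERS:
        blob = bucket.blob(prefix)
        if not blob.exists():
            blob.upload_from_string(b"")  # zero-byte "folder" marker
            created.append(prefix)
    return created
```

With real credentials this would be called as `provision_bucket(storage.Client(project="your-project-id"), "archeovlm-data")` after `from google.cloud import storage`; a second run returns an empty list.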
Data Ingestion
ingest_data.py
Transfers the raw .laz files into /00_raw_lidar/ in parallel, with checkpointing so that an interrupted run can resume where it stopped
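A minimal sketch of parallel transfer with checkpointing, assuming the checkpoint is a local JSON file listing completed files and that the copy itself is done by a caller-supplied `transfer(name)` function (for example, a wrapper around `Blob.upload_from_filename`). The names `ingest`, `transfer`, and `checkpoint_path` are hypothetical, not the actual interface of ingest_data.py.

```python
import json
import logging
import os
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, Set

log = logging.getLogger("ingest")


def load_checkpoint(path: str) -> Set[str]:
    """Return the set of file names already transferred (empty on a first run)."""
    if not os.path.exists(path):
        return set()
    with open(path) as fh:
        return set(json.load(fh))


def save_checkpoint(path: str, done: Set[str]) -> None:
    """Write the checkpoint via a temp file + rename so a crash cannot truncate it."""
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        json.dump(sorted(done), fh)
    os.replace(tmp, path)


def ingest(names: Iterable[str], transfer: Callable[[str], None],
           checkpoint_path: str, workers: int = 8) -> Set[str]:
    """Transfer every name not already checkpointed; return the names that failed."""
    done = load_checkpoint(checkpoint_path)
    pending = [n for n in names if n not in done]
    log.info("%d already done, %d pending", len(done), len(pending))
    failed = set()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(transfer, n): n for n in pending}
        # Results are collected in the main thread, so `done` needs no lock.
        for fut in as_completed(futures):
            name = futures[fut]
            try:
                fut.result()
            except Exception:
                log.exception("transfer failed: %s", name)
                failed.add(name)
            else:
                done.add(name)
                save_checkpoint(checkpoint_path, done)  # persist after each success
    return failed
```

Rerunning `ingest` with the same checkpoint path retries only the files that failed or never started, which is what makes it practical to test on a small subset first and then launch the full run.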
Data Source
LiDAR dataset specifications
Dataset: ORNL DAAC LiDAR Surveys, 2008-2018
File count: ~3,154 raw LiDAR files (.laz)
Coverage: Brazilian Amazon forest research sites
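Before uploading, it can help to record exactly what was downloaded. The sketch below builds a manifest mapping each .laz file's relative path to its size in bytes; it assumes the files sit in a local download directory, and the function name `build_manifest` is hypothetical.

```python
import os
from typing import Dict


def build_manifest(root: str) -> Dict[str, int]:
    """Map each .laz file under `root` (relative, '/'-separated path) to its size in bytes."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for fn in filenames:
            if fn.lower().endswith(".laz"):  # match .laz and .LAZ
                full = os.path.join(dirpath, fn)
                rel = os.path.relpath(full, root).replace(os.sep, "/")
                manifest[rel] = os.path.getsize(full)
    return manifest
```

`len(build_manifest(download_dir))` should come out close to the ~3,154 files listed for the dataset; a large gap points to an incomplete download.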
Execution Checklist
Phase 1 tasks and deliverables
Create a new Google Cloud project
Write and test the setup_gcp.sh script to create the GCS bucket and folder structure
Run setup_gcp.sh
Verify that the GCS bucket and folder structure are correct
Develop the ingest_data.py script with parallelization, checkpointing, and logging
Test ingest_data.py on a small subset of the LiDAR files
Run the full data ingestion
Verify that all raw LiDAR files are stored in /00_raw_lidar/
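The final verification can be reduced to comparing two name-to-size mappings: one for the source files and one for the objects under /00_raw_lidar/. With the `google-cloud-storage` client, the second mapping can be built from `client.list_blobs(bucket_name, prefix="00_raw_lidar/")`, whose blobs expose `name` and `size`. The function name `verify_transfer` and the report fields are hypothetical.

```python
from typing import Dict, List, NamedTuple


class VerifyReport(NamedTuple):
    missing: List[str]        # in the source, absent from the bucket
    size_mismatch: List[str]  # present in both, but byte counts differ
    unexpected: List[str]     # in the bucket, not in the source


def verify_transfer(source: Dict[str, int], uploaded: Dict[str, int],
                    prefix: str = "00_raw_lidar/") -> VerifyReport:
    """Compare source sizes with uploaded object sizes keyed by full object name."""
    # Strip the stage prefix so both mappings use the same relative names,
    # and ignore objects outside the prefix and the zero-byte folder marker.
    remote = {
        name[len(prefix):]: size
        for name, size in uploaded.items()
        if name.startswith(prefix) and name != prefix
    }
    missing = sorted(set(source) - set(remote))
    unexpected = sorted(set(remote) - set(source))
    mismatch = sorted(n for n in set(source) & set(remote) if source[n] != remote[n])
    return VerifyReport(missing, mismatch, unexpected)
```

An empty report means every source file arrived with the expected size. Sizes catch missing or truncated uploads; comparing the objects' CRC32C or MD5 hashes against locally computed values would be a stronger check.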
Expected Outputs
GCS Infrastructure
A fully provisioned bucket with the organized folder structure
Raw Data Storage
All raw LiDAR files transferred to /00_raw_lidar/ and verified