ArcheoVLM

Phase 1: Cloud Foundation & Data Ingestion

Goal

Establish the GCP environment and efficiently ingest the raw LiDAR data for processing.

Infrastructure Setup

Google Cloud Platform configuration

/00_raw_lidar/

/01_inventory_and_gis_data/

/02_filtered_tiles/

/03_processed_visualizations/

/04_analysis_outputs/

/05_verification_packages/

setup_gcp.sh: Provisions the GCS bucket with the folder structure listed above.
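
A minimal sketch of the provisioning logic, written in Python for illustration rather than as the shell script itself. It assumes the bucket uses the default flat namespace, where a "folder" is only an object-name prefix and an empty object whose name ends in "/" makes that prefix visible. The names placeholder_names, provision, and upload_empty are hypothetical; only the folder names come from the plan.

```python
from typing import Callable, List

# Folder names taken from the plan; everything else is illustrative.
FOLDERS = [
    "00_raw_lidar",
    "01_inventory_and_gis_data",
    "02_filtered_tiles",
    "03_processed_visualizations",
    "04_analysis_outputs",
    "05_verification_packages",
]


def placeholder_names(folders: List[str] = FOLDERS) -> List[str]:
    """Return one object name per folder, each ending in a single '/'."""
    return [f"{name.strip('/')}/" for name in folders]


def provision(upload_empty: Callable[[str], None]) -> List[str]:
    """Create one empty placeholder object per folder.

    upload_empty is a caller-supplied function (an assumption) that writes
    a zero-byte object with the given name into the target bucket.
    """
    created = []
    for name in placeholder_names():
        upload_empty(name)
        created.append(name)
    return created
```

Passing the uploader in as a function keeps the folder logic testable without cloud credentials; a real run would supply a function backed by whichever storage client or CLI the project uses.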

ingest_data.py: Transfers the raw LiDAR files to the bucket in parallel, with checkpointing so an interrupted run can resume.
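
One way parallel transfer with checkpointing could be structured is sketched below. This is not the actual ingest_data.py: the function names, the JSON checkpoint format, and the transfer callable (which would copy one file into /00_raw_lidar/) are all assumptions made for illustration.

```python
import json
import logging
import os
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, Set

log = logging.getLogger(__name__)


def load_checkpoint(path: str) -> Set[str]:
    """Return the names already transferred, or an empty set on a first run."""
    if not os.path.exists(path):
        return set()
    with open(path, "r", encoding="utf-8") as fh:
        return set(json.load(fh))


def save_checkpoint(path: str, done: Set[str]) -> None:
    """Write to a temp file, then rename, so a crash cannot truncate the checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(sorted(done), fh)
    os.replace(tmp, path)


def ingest(files: Iterable[str], transfer: Callable[[str], None],
           checkpoint_path: str, max_workers: int = 8) -> Set[str]:
    """Transfer files not yet in the checkpoint; return the names that failed."""
    done = load_checkpoint(checkpoint_path)
    pending = [name for name in files if name not in done]
    failed: Set[str] = set()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(transfer, name): name for name in pending}
        # Results are handled in this thread only, so the checkpoint needs no lock.
        for future in as_completed(futures):
            name = futures[future]
            try:
                future.result()
            except Exception:
                log.warning("transfer failed: %s", name)
                failed.add(name)
                continue
            done.add(name)
            save_checkpoint(checkpoint_path, done)
    return failed
```

Re-running ingest with the same checkpoint path skips every file recorded as done, so only failed or new files are retried. Threads suit this workload because transfers are I/O-bound; the worker count of 8 is an arbitrary default, not a tuned value.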

Data Source

LiDAR dataset specifications

Source: ORNL DAAC LiDAR Surveys, 2008-2018

Raw LiDAR files: ~3,154 .laz files

Coverage: Brazilian Amazon forest research sites

Execution Checklist

Phase 1 tasks and deliverables

Create new Google Cloud Project

Write and test setup_gcp.sh script to create GCS bucket and folder structure

Execute setup_gcp.sh script

Verify GCS bucket and folder structure are correct

Develop ingest_data.py script with parallelization, checkpointing, and logging

Test ingest_data.py on a small subset of LiDAR files

Execute full data ingestion process

Verify all raw LiDAR files are successfully stored in /00_raw_lidar/
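
The final verification step can be sketched as a comparison between the expected file names and the object names listed under the raw-data prefix. The function name verify_ingestion and its inputs are hypothetical; how the bucket listing is obtained is left to the caller. This check compares names only, so a stricter version would also compare sizes or checksums.

```python
from typing import Dict, Iterable, List

RAW_PREFIX = "00_raw_lidar/"  # folder name from the plan


def verify_ingestion(expected: Iterable[str], listed: Iterable[str],
                     prefix: str = RAW_PREFIX) -> Dict[str, List[str]]:
    """Report files that are missing from, or unexpected under, the prefix.

    expected: source file names, e.g. "tile_001.laz" (illustrative).
    listed:   full object names returned by a bucket listing.
    """
    expected_set = set(expected)
    # Keep objects under the prefix and skip the folder placeholder itself.
    present = {
        name[len(prefix):]
        for name in listed
        if name.startswith(prefix) and name != prefix
    }
    return {
        "missing": sorted(expected_set - present),
        "unexpected": sorted(present - expected_set),
    }
```

An empty "missing" list means every expected file has a matching object name under the prefix.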

Expected Outputs

Fully provisioned bucket with organized folder structure

All LiDAR files successfully transferred and verified