The CECE Stacking Engine¶
The Stacking Engine is the computational core of CECE, responsible for combining multiple emission data layers into final emission fields for each chemical species. It implements a sophisticated hierarchy-based processing system that mimics and extends the HEMCO (Harmonized Emissions Component) approach while providing significant performance optimizations for modern HPC architectures.
Overview¶
The Stacking Engine processes emission data through several key phases:
- Configuration Analysis - Parse and validate the emission layer configuration
- Field Binding - Connect to data sources (TIDE streams, ESMF fields)
- Hierarchy Processing - Apply priority-based layer combination rules
- Kernel Fusion - Generate optimized compute kernels for performance
- Temporal Scaling - Apply time-dependent scaling factors
- Vertical Distribution - Map 2D emissions to 3D atmospheric grids
- Provenance Tracking - Record the complete emission calculation history
Architecture and Performance¶
Kernel Fusion Optimization¶
Unlike traditional approach of processing each layer sequentially, CECE uses kernel fusion to generate a single, optimized compute kernel per species. This provides several key advantages:
- Reduced Memory Bandwidth: Minimizes data movement between GPU/CPU memory and compute units
- Lower Kernel Launch Overhead: Single kernel launch per species instead of multiple
- Better Cache Utilization: Maximizes data reuse within compute kernels
- Improved Parallelization: Better load balancing across computational resources
Hierarchy-Based Processing¶
The engine processes layers according to a two-level hierarchy system:
- Categories: Logical groupings (e.g.,
anthropogenic,biomass_burning,biogenic) - Hierarchy Levels: Numerical priorities within categories (higher numbers take precedence)
species:
co:
- field: "global_co_inventory"
category: "anthropogenic"
hierarchy: 1 # Base layer
operation: "add"
- field: "regional_co_override"
category: "anthropogenic"
hierarchy: 10 # Higher priority - replaces base
operation: "replace"
Layer Operations¶
Basic Operations¶
The Stacking Engine supports four fundamental operations for combining emission layers:
| Operation | Description | Behavior |
|---|---|---|
add |
Accumulative | Adds layer emissions to current accumulated value |
multiply |
Multiplicative | Multiplies current accumulated value by layer |
replace |
Override | Replaces accumulated value with layer emissions |
set |
Initialization | Sets initial value (typically used for first layer) |
Masking and Scaling¶
Each layer can be modified through multiple mechanisms applied in sequence:
- Base Scale Factor: Simple numerical multiplier applied to raw emission data
- Scale Fields: Dynamic field-based scaling (e.g., temperature dependence)
- Geographical Masks: 2D or 3D fields that spatially restrict emissions
- Temporal Profiles: Time-varying scale factors (diurnal, weekly, seasonal)
species:
isoprene:
- field: "base_isoprene"
scale: 1.5 # Base scale factor
mask: "vegetation_mask" # Apply only over vegetation
scale_fields: ["temperature", "lai"] # Scale with temp and leaf area
diurnal_cycle: "biogenic_diurnal" # Apply diurnal variation
operation: "add"
Vertical Distribution¶
CECE supports multiple algorithms for distributing 2D surface emissions into 3D atmospheric volumes:
Distribution Methods¶
| Method | Description | Use Cases |
|---|---|---|
SINGLE |
All emissions in one vertical level | Aircraft emissions, elevated sources |
RANGE |
Even distribution over layer range | Industrial stacks, biomass burning |
PRESSURE |
Distribution based on pressure bounds | Free troposphere sources |
HEIGHT |
Distribution based on altitude bounds | Topography-dependent sources |
PBL |
Distribution within planetary boundary layer | Surface-based anthropogenic sources |
Mass Conservation¶
All vertical distribution methods ensure strict mass conservation:
Configuration Example¶
species:
nox:
- field: "aircraft_nox"
vdist_method: "HEIGHT"
vdist_h_start: 8000.0 # 8 km altitude
vdist_h_end: 12000.0 # 12 km altitude
operation: "add"
- field: "surface_nox"
vdist_method: "PBL" # Distribute in boundary layer
operation: "add"
Temporal Scaling¶
The Stacking Engine applies time-dependent scale factors to account for temporal emission variability:
Cycle Types¶
- Diurnal: 24-hour cycle (hourly scale factors)
- Weekly: 7-day cycle (daily scale factors)
- Seasonal: 12-month cycle (monthly scale factors)
Implementation¶
Temporal profiles are defined globally and referenced by layers:
temporal_profiles:
traffic_diurnal: [0.5, 0.3, 0.2, 0.3, 0.6, 1.2, 1.8, 1.5, 1.2, 1.0, 1.1, 1.2,
1.3, 1.2, 1.3, 1.5, 1.8, 2.0, 1.8, 1.5, 1.2, 1.0, 0.8, 0.6]
weekday_pattern: [1.2, 1.3, 1.3, 1.3, 1.3, 0.8, 0.7] # Mon-Sun
species:
co:
- field: "traffic_co"
diurnal_cycle: "traffic_diurnal"
weekly_cycle: "weekday_pattern"
operation: "add"
Performance Considerations¶
Memory Layout¶
The Stacking Engine uses Kokkos Views with optimized memory layouts:
// Device-optimized layout for GPU execution
using DeviceView3D = Kokkos::View<double***, Kokkos::LayoutLeft, Kokkos::DefaultExecutionSpace>;
// Host mirror for CPU-GPU data transfer
using HostMirror3D = Kokkos::View<double***, Kokkos::LayoutLeft>::HostMirror;
Execution Patterns¶
- Parallel Field Operations: All grid points processed simultaneously
- Hierarchical Layer Processing: Layers processed in priority order
- Fused Kernel Execution: Single kernel per species combines all operations
- Asynchronous Data Transfer: Overlapped CPU-GPU memory transfers
Scaling Performance¶
Typical performance characteristics on modern HPC systems:
| Grid Size | CPU Cores | GPU | Throughput |
|---|---|---|---|
| 1440×721×72 | 40 (Intel Xeon) | - | ~50 species/sec |
| 1440×721×72 | - | V100 | ~200 species/sec |
| 3600×1801×72 | - | A100 | ~150 species/sec |
Provenance Tracking¶
The Stacking Engine provides complete scientific traceability through its provenance system:
Tracked Information¶
- Layer Contributions: Which fields contribute to each species
- Hierarchy Application: Order and priority of layer processing
- Scaling Factors: All applied temporal and spatial scale factors
- Operation History: Complete record of mathematical operations
- Field Sources: Data source identification (TIDE/ESMF/computed)
Provenance Output¶
# Example provenance report excerpt
species: CO
time_context: hour=14 day_of_week=2 month=7
contributing_layers:
- field: global_co_base
operation: add
hierarchy: 1
category: anthropogenic
effective_scale: 1.25 # Base scale × temporal factors
masks: [land_mask]
- field: regional_co_override
operation: replace
hierarchy: 10
effective_scale: 0.85
geographic_bounds: [lon_min: -125, lon_max: -65, lat_min: 25, lat_max: 50]
Advanced Configuration¶
Multi-Scale Processing¶
Complex emission scenarios can use multiple scale factors simultaneously:
species:
biogenic_voc:
- field: "base_biogenic"
scale: 2.0 # Literature adjustment factor
scale_fields: ["temperature", "par", "lai"] # Environmental dependencies
masks: ["vegetation_mask", "growing_season"] # Geographic/temporal masks
operation: "add"
Category-Based Hierarchies¶
Organize related emission sources using categories:
species:
nox:
# Transportation category
- field: "road_transport"
category: "transportation"
hierarchy: 1
- field: "aviation"
category: "transportation"
hierarchy: 2
- field: "shipping"
category: "transportation"
hierarchy: 3
# Industrial category
- field: "power_plants"
category: "industrial"
hierarchy: 1
- field: "cement_production"
category: "industrial"
hierarchy: 2
Integration with Physics Schemes¶
The Stacking Engine coordinates with CECE physics schemes:
- Base Field Processing: Stacking Engine processes static/inventory emissions
- Physics Augmentation: Physics schemes modify or add to stacked emissions
- Final Assembly: Combined results synchronized to ESMF export state
Execution Order¶
1. Parse Configuration → Validate layers and hierarchy
2. Bind Data Fields → Connect to TIDE and ESMF data sources
3. Stack Base Emissions → Apply hierarchy and scaling rules
4. Execute Physics Schemes → Run active emission parameterizations
5. Apply Diagnostics → Capture intermediate and final results
6. Synchronize Output → Transfer to ESMF for coupled model use
This architecture ensures that both inventory-based and process-based emissions are properly integrated while maintaining high computational performance and complete scientific traceability.