The CECE Stacking Engine¶

The Stacking Engine is the computational core of CECE. It combines multiple emission data layers into a final emission field for each chemical species, using a hierarchy-based processing scheme that is modeled on HEMCO (Harmonized Emissions Component) and extended with performance optimizations for modern HPC architectures.

Overview¶

The Stacking Engine processes emission data through several key phases:

Configuration Analysis - Parse and validate the emission layer configuration
Field Binding - Connect to data sources (TIDE streams, ESMF fields)
Hierarchy Processing - Apply priority-based layer combination rules
Kernel Fusion - Generate optimized compute kernels for performance
Temporal Scaling - Apply time-dependent scaling factors
Vertical Distribution - Map 2D emissions to 3D atmospheric grids
Provenance Tracking - Record the complete emission calculation history

Architecture and Performance¶

Kernel Fusion Optimization¶

Unlike the traditional approach of processing each layer sequentially, CECE uses kernel fusion to generate a single optimized compute kernel per species. This provides several advantages:

Reduced Memory Bandwidth: Minimizes data movement between GPU/CPU memory and compute units
Lower Kernel Launch Overhead: One kernel launch per species instead of one per layer
Better Cache Utilization: Maximizes data reuse within compute kernels
Improved Parallelization: Better load balancing across computational resources
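
To make the contrast concrete, the sketch below compares an unfused accumulation (one sweep over the grid per layer) with a fused one (one sweep that applies every layer to each cell). It is plain serial C++, not CECE's generated Kokkos kernels; `Layer`, `stack_unfused`, and `stack_fused` are hypothetical names, and only `add` layers with a constant scale are modeled.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical layer: a flattened emission field plus a constant scale factor.
struct Layer {
    std::vector<double> data;
    double scale;
};

// Unfused: one full sweep over the output array per layer, so the
// accumulator is read from and written to memory once for every layer.
std::vector<double> stack_unfused(const std::vector<Layer>& layers, std::size_t n) {
    std::vector<double> out(n, 0.0);
    for (const Layer& layer : layers) {
        for (std::size_t i = 0; i < n; ++i) {
            out[i] += layer.scale * layer.data[i];
        }
    }
    return out;
}

// Fused: a single sweep; each cell's total is built in a local variable
// and written to memory once.
std::vector<double> stack_fused(const std::vector<Layer>& layers, std::size_t n) {
    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        double acc = 0.0;
        for (const Layer& layer : layers) {
            acc += layer.scale * layer.data[i];
        }
        out[i] = acc;
    }
    return out;
}
```

Both functions return the same field. The fused version touches the output array once per cell rather than once per layer per cell, which is where the memory-bandwidth saving listed above comes from.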

Hierarchy-Based Processing¶

The engine processes layers according to a two-level hierarchy system:

Categories: Logical groupings (e.g., anthropogenic, biomass_burning, biogenic)
Hierarchy Levels: Numerical priorities within categories (higher numbers take precedence)

species:
  co:
    - field: "global_co_inventory"
      category: "anthropogenic"
      hierarchy: 1              # Base layer
      operation: "add"
    - field: "regional_co_override"
      category: "anthropogenic"
      hierarchy: 10             # Higher priority - replaces base
      operation: "replace"
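
The following sketch shows one way the two-level rule could be evaluated for a single grid cell. It assumes, in the spirit of HEMCO, that layers within a category are applied in ascending hierarchy order (so the highest number acts last) and that category totals are then summed. The cross-category summation, the `covers` flag, and all type and function names are assumptions of this example, not CECE's documented behavior.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

enum class Op { Add, Replace };

// Hypothetical per-cell view of one configured layer.
struct LayerAtCell {
    std::string category;
    int hierarchy;
    Op op;
    double value;  // emission value of this layer at the cell
    bool covers;   // false where the layer has no data (e.g. outside a regional inventory)
};

// Assumed rule: within a category, apply layers in ascending hierarchy order,
// so higher numbers act last and take precedence; then sum the category totals.
double stack_cell(std::vector<LayerAtCell> layers) {
    std::stable_sort(layers.begin(), layers.end(),
                     [](const LayerAtCell& a, const LayerAtCell& b) {
                         return a.hierarchy < b.hierarchy;
                     });
    std::map<std::string, double> per_category;
    for (const LayerAtCell& layer : layers) {
        if (!layer.covers) continue;
        double& acc = per_category[layer.category];  // value-initialized to 0.0
        acc = (layer.op == Op::Replace) ? layer.value : acc + layer.value;
    }
    double total = 0.0;
    for (const auto& entry : per_category) total += entry.second;
    return total;
}
```

With the `co` configuration above, a cell covered by `regional_co_override` ends up with the regional value, while a cell outside its coverage keeps the `global_co_inventory` value.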

Layer Operations¶

Basic Operations¶

The Stacking Engine supports four fundamental operations for combining emission layers:

Operation	Description	Behavior
`add`	Accumulative	Adds layer emissions to current accumulated value
`multiply`	Multiplicative	Multiplies current accumulated value by the layer value
`replace`	Override	Replaces accumulated value with layer emissions
`set`	Initialization	Sets initial value (typically used for first layer)
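
The table can be read as a per-cell update rule. A minimal sketch follows, assuming the operation arrives as the string used in the configuration; `apply_operation` is a hypothetical helper, and `set` and `replace` are given the same arithmetic here because the table does not describe any further difference between them.

```cpp
#include <stdexcept>
#include <string>

// Applies one layer value to the running total for a grid cell.
// Operation names come from the table above; the function is illustrative.
double apply_operation(const std::string& op, double accumulated, double layer_value) {
    if (op == "add")      return accumulated + layer_value;
    if (op == "multiply") return accumulated * layer_value;
    if (op == "replace")  return layer_value;  // discard what was accumulated
    if (op == "set")      return layer_value;  // same arithmetic; intended for the first layer
    throw std::invalid_argument("unknown operation: " + op);
}
```

For example, `apply_operation("multiply", 2.0, 3.0)` yields `6.0`, while `apply_operation("replace", 2.0, 3.0)` yields `3.0`.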

Masking and Scaling¶

Each layer can be modified through multiple mechanisms applied in sequence:

Base Scale Factor: Simple numerical multiplier applied to raw emission data
Scale Fields: Dynamic field-based scaling (e.g., temperature dependence)
Geographical Masks: 2D or 3D fields that spatially restrict emissions
Temporal Profiles: Time-varying scale factors (diurnal, weekly, seasonal)

species:
  isoprene:
    - field: "base_isoprene"
      scale: 1.5                           # Base scale factor
      mask: "vegetation_mask"              # Apply only over vegetation
      scale_fields: ["temperature", "lai"] # Scale with temp and leaf area
      diurnal_cycle: "biogenic_diurnal"    # Apply diurnal variation
      operation: "add"
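
As a rough illustration of how these modifiers might compose for one grid cell, the sketch below treats every mechanism as a multiplicative factor applied in the order listed. That all four are multiplicative, and the names `LayerInputs` and `effective_emission`, are assumptions of this example.

```cpp
#include <vector>

// Hypothetical per-cell inputs for one layer.
struct LayerInputs {
    double raw;                        // raw emission value from the field
    double base_scale;                 // the `scale` entry
    std::vector<double> scale_fields;  // value of each scale field at this cell
    double mask;                       // 0.0 outside the mask, 1.0 inside
    double temporal;                   // temporal profile factor for the current time
};

// Applies base scale, scale fields, mask, and temporal factor in that order.
double effective_emission(const LayerInputs& in) {
    double value = in.raw * in.base_scale;
    for (double factor : in.scale_fields) value *= factor;
    value *= in.mask;
    value *= in.temporal;
    return value;
}
```

A cell outside the mask (`mask = 0.0`) contributes nothing regardless of the other factors, which is how a mask such as `vegetation_mask` would restrict emissions spatially under this reading.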

Vertical Distribution¶

CECE supports multiple algorithms for distributing 2D surface emissions into 3D atmospheric volumes:

Distribution Methods¶

Method	Description	Use Cases
`SINGLE`	All emissions in one vertical level	Aircraft emissions, elevated sources
`RANGE`	Even distribution over layer range	Industrial stacks, biomass burning
`PRESSURE`	Distribution based on pressure bounds	Free troposphere sources
`HEIGHT`	Distribution based on altitude bounds	Topography-dependent sources
`PBL`	Distribution within planetary boundary layer	Surface-based anthropogenic sources

Mass Conservation¶

All vertical distribution methods ensure strict mass conservation:

∑(emissions_3d[i,j,:]) = emissions_2d[i,j]  ∀ i,j
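
The identity can be checked directly for the simplest case. The sketch below implements an even split in the style of `RANGE` for one column and sums the result; the zero-based level indices, the inclusive range, and the function names are assumptions of this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Spreads a 2D column total evenly over levels k_start..k_end (inclusive).
std::vector<double> distribute_range(double emis_2d, std::size_t nlev,
                                     std::size_t k_start, std::size_t k_end) {
    assert(k_start <= k_end && k_end < nlev);
    std::vector<double> column(nlev, 0.0);
    const double share = emis_2d / static_cast<double>(k_end - k_start + 1);
    for (std::size_t k = k_start; k <= k_end; ++k) column[k] = share;
    return column;
}

// Sum over the vertical index: the left-hand side of the conservation identity.
double column_sum(const std::vector<double>& column) {
    double total = 0.0;
    for (double v : column) total += v;
    return total;
}
```

In floating-point arithmetic the sum may differ from the 2D value by rounding when the share is not exactly representable, so a practical check would compare within a small tolerance rather than for exact equality.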

Configuration Example¶

species:
  nox:
    - field: "aircraft_nox"
      vdist_method: "HEIGHT"
      vdist_h_start: 8000.0    # 8 km altitude
      vdist_h_end: 12000.0     # 12 km altitude
      operation: "add"
    - field: "surface_nox"
      vdist_method: "PBL"      # Distribute in boundary layer
      operation: "add"
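
For a `HEIGHT` entry such as `aircraft_nox`, one plausible scheme is to weight each model layer by how much of its depth overlaps the interval from `vdist_h_start` to `vdist_h_end`. The sketch below assumes emissions are uniform in height across that interval and that layer interface heights are available in metres; `height_weights` and `z_edges` are hypothetical names, and CECE's actual weighting may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// z_edges holds nlev+1 layer interface heights in metres, in ascending order.
// Returns the fraction of the column total assigned to each layer.
std::vector<double> height_weights(const std::vector<double>& z_edges,
                                   double h_start, double h_end) {
    assert(z_edges.size() >= 2 && h_end > h_start);
    const std::size_t nlev = z_edges.size() - 1;
    std::vector<double> weights(nlev, 0.0);
    const double depth = h_end - h_start;
    for (std::size_t k = 0; k < nlev; ++k) {
        // Overlap of layer k with the emission interval, if any.
        const double lo = std::max(z_edges[k], h_start);
        const double hi = std::min(z_edges[k + 1], h_end);
        if (hi > lo) weights[k] = (hi - lo) / depth;
    }
    return weights;
}
```

The weights sum to one only when the interval lies entirely inside the model column; an interval that extends above the model top would need renormalizing to preserve mass.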

Temporal Scaling¶

The Stacking Engine applies time-dependent scale factors to account for temporal emission variability:

Cycle Types¶

Diurnal: 24-hour cycle (hourly scale factors)
Weekly: 7-day cycle (daily scale factors)
Seasonal: 12-month cycle (monthly scale factors)

Implementation¶

Temporal profiles are defined globally and referenced by layers:

temporal_profiles:
  traffic_diurnal: [0.5, 0.3, 0.2, 0.3, 0.6, 1.2, 1.8, 1.5, 1.2, 1.0, 1.1, 1.2,
                    1.3, 1.2, 1.3, 1.5, 1.8, 2.0, 1.8, 1.5, 1.2, 1.0, 0.8, 0.6]

  weekday_pattern: [1.2, 1.3, 1.3, 1.3, 1.3, 0.8, 0.7]  # Mon-Sun

species:
  co:
    - field: "traffic_co"
      diurnal_cycle: "traffic_diurnal"
      weekly_cycle: "weekday_pattern"
      operation: "add"

Performance Considerations¶

Memory Layout¶

The Stacking Engine uses Kokkos Views with optimized memory layouts:

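#include <Kokkos_Core.hpp>

// Column-major (LayoutLeft) view in the default execution space's memory;
// on GPU backends the contiguous first index allows coalesced access
using DeviceView3D = Kokkos::View<double***, Kokkos::LayoutLeft, Kokkos::DefaultExecutionSpace>;

// Host-accessible mirror with the same layout, used for CPU-GPU data transfer
using HostMirror3D = DeviceView3D::HostMirror;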
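
`Kokkos::LayoutLeft` stores the leftmost index contiguously (column-major order). The sketch below spells out the resulting offset arithmetic for an unpadded three-dimensional array in plain C++, without Kokkos; `layout_left_offset` is a hypothetical helper for illustration only.

```cpp
#include <cstddef>

// Linear offset of element (i, j, k) in an nx × ny × nz array whose
// leftmost index varies fastest in memory.
std::size_t layout_left_offset(std::size_t i, std::size_t j, std::size_t k,
                               std::size_t nx, std::size_t ny) {
    return i + nx * (j + ny * k);
}
```

Neighbouring values of `i` map to neighbouring memory locations, so GPU threads that are assigned consecutive `i` values read and write adjacent addresses.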

Execution Patterns¶

Parallel Field Operations: Grid points are processed in parallel
Hierarchical Layer Processing: Layers processed in priority order
Fused Kernel Execution: Single kernel per species combines all operations
Asynchronous Data Transfer: Overlapped CPU-GPU memory transfers

Scaling Performance¶

Typical performance characteristics on modern HPC systems:

Grid Size	CPU Cores	GPU	Throughput
1440×721×72	40 (Intel Xeon)	-	~50 species/sec
1440×721×72	-	V100	~200 species/sec
3600×1801×72	-	A100	~150 species/sec
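
Because the rows use different grid sizes, the species/sec figures are not directly comparable. One way to put them on a common footing is to convert to grid-cell updates per second, under the assumption that each "species" in the table corresponds to one full pass over the grid. That interpretation and the helper names are assumptions of this sketch.

```cpp
#include <cstdint>

// Number of grid cells in an nx × ny × nz grid.
std::uint64_t grid_cells(std::uint64_t nx, std::uint64_t ny, std::uint64_t nz) {
    return nx * ny * nz;
}

// Cell updates per second, assuming one species equals one full grid pass.
std::uint64_t cell_updates_per_sec(std::uint64_t cells, std::uint64_t species_per_sec) {
    return cells * species_per_sec;
}
```

Under this reading, the 1440×721×72 grid has 74,753,280 cells, giving about 1.5 × 10^10 cell updates per second for the V100 row, while the 3600×1801×72 grid has 466,819,200 cells, giving about 7.0 × 10^10 for the A100 row.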

Provenance Tracking¶

The Stacking Engine provides complete scientific traceability through its provenance system:

Tracked Information¶

Layer Contributions: Which fields contribute to each species
Hierarchy Application: Order and priority of layer processing
Scaling Factors: All applied temporal and spatial scale factors
Operation History: Complete record of mathematical operations
Field Sources: Data source identification (TIDE/ESMF/computed)

Provenance Output¶

# Example provenance report excerpt
species: CO
time_context: hour=14 day_of_week=2 month=7
contributing_layers:
  - field: global_co_base
    operation: add
    hierarchy: 1
    category: anthropogenic
    effective_scale: 1.25  # Base scale × temporal factors
    masks: [land_mask]
  - field: regional_co_override
    operation: replace
    hierarchy: 10
    effective_scale: 0.85
    geographic_bounds: {lon_min: -125, lon_max: -65, lat_min: 25, lat_max: 50}
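
A record like the ones in this excerpt could be represented and rendered along the following lines. The struct, its field selection, and the split of temporal factors into diurnal, weekly, and seasonal terms are assumptions made for illustration; they are not CECE's actual provenance types.

```cpp
#include <sstream>
#include <string>

// Hypothetical record mirroring a subset of the fields in the report excerpt.
struct LayerProvenance {
    std::string field;
    std::string operation;
    int hierarchy;
    double effective_scale;
};

// Base scale × temporal factors, as the comment in the excerpt describes.
double effective_scale(double base_scale, double diurnal, double weekly, double seasonal) {
    return base_scale * diurnal * weekly * seasonal;
}

// Renders one record in the indented key/value style of the excerpt.
std::string to_report(const LayerProvenance& p) {
    std::ostringstream out;
    out << "  - field: " << p.field << "\n"
        << "    operation: " << p.operation << "\n"
        << "    hierarchy: " << p.hierarchy << "\n"
        << "    effective_scale: " << p.effective_scale << "\n";
    return out.str();
}
```

Storing the already-combined `effective_scale` alongside the operation and hierarchy lets a reader reconstruct what was applied to a layer at a given time step without re-evaluating the temporal profiles.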

Advanced Configuration¶

Multi-Scale Processing¶

Complex emission scenarios can use multiple scale factors simultaneously:

species:
  biogenic_voc:
    - field: "base_biogenic"
      scale: 2.0                           # Literature adjustment factor
      scale_fields: ["temperature", "par", "lai"]  # Environmental dependencies
      masks: ["vegetation_mask", "growing_season"] # Geographic/temporal masks
      operation: "add"

Category-Based Hierarchies¶

Organize related emission sources using categories:

species:
  nox:
    # Transportation category
    - field: "road_transport"
      category: "transportation"
      hierarchy: 1
    - field: "aviation"
      category: "transportation"
      hierarchy: 2
    - field: "shipping"
      category: "transportation"
      hierarchy: 3

    # Industrial category
    - field: "power_plants"
      category: "industrial"
      hierarchy: 1
    - field: "cement_production"
      category: "industrial"
      hierarchy: 2

Integration with Physics Schemes¶

The Stacking Engine coordinates with CECE physics schemes:

Base Field Processing: Stacking Engine processes static/inventory emissions
Physics Augmentation: Physics schemes modify or add to stacked emissions
Final Assembly: Combined results synchronized to ESMF export state

Execution Order¶

1. Parse Configuration → Validate layers and hierarchy
2. Bind Data Fields → Connect to TIDE and ESMF data sources
3. Stack Base Emissions → Apply hierarchy and scaling rules
4. Execute Physics Schemes → Run active emission parameterizations
5. Apply Diagnostics → Capture intermediate and final results
6. Synchronize Output → Transfer to ESMF for coupled model use

This ordering combines inventory-based emissions (stacked in step 3) with process-based emissions (computed in step 4) before output, while retaining the performance benefits of kernel fusion and the traceability provided by provenance tracking.