Physics Scheme Development Guide¶

Overview¶

This guide provides a comprehensive workflow for developing new physics schemes in CECE. Physics schemes are pluggable modules that compute or modify emissions based on meteorological inputs. CECE provides a scientist-friendly framework that abstracts away ESMF complexity and enables focus on physics implementation.

Physics Scheme Lifecycle¶

Every physics scheme in CECE follows a three-phase lifecycle:

1. Initialize Phase¶

Called once during model initialization. Use this phase to: - Read configuration parameters from YAML - Validate parameters and check for errors - Pre-compute lookup tables or coefficients - Register diagnostic fields - Allocate any internal data structures

Key Points: - Called before the first Run phase - Configuration is immutable after Initialize - Errors should be reported with descriptive messages - Pre-computation here improves Run phase performance

2. Run Phase¶

Called once per time step. Use this phase to: - Resolve input fields from meteorology - Resolve output fields to be computed or modified - Execute Kokkos kernels for physics computation - Mark modified fields for device-to-host synchronization

Key Points: - Must be efficient (called many times) - All computation should use Kokkos for performance portability - Avoid I/O or blocking operations - Field handles are cached to avoid redundant lookups

3. Finalize Phase¶

Called once during model shutdown. Use this phase to: - Release allocated resources - Flush any pending I/O operations - Clean up temporary data structures

Key Points: - Called after the last Run phase - Should be quick (model is shutting down) - BasePhysicsScheme provides a default no-op implementation

Getting Started¶

Prerequisites¶

Before developing a physics scheme, ensure you have: - CECE source code checked out - JCSDA Docker environment set up (see AGENTS.md) - Python 3.6+ with jinja2 and pyyaml installed - Basic understanding of Kokkos and ESMF concepts

Quick Start Example¶

Here's a minimal physics scheme that computes emissions from temperature:

// include/cece/physics/cece_my_scheme.hpp
#ifndef CECE_MY_SCHEME_HPP
#define CECE_MY_SCHEME_HPP

#include "cece/physics_scheme.hpp"

namespace cece {

class MyScheme : public BasePhysicsScheme {
public:
    void Initialize(const YAML::Node& config, CeceDiagnosticManager* diag_manager) override;
    void Run(CeceImportState& import_state, CeceExportState& export_state) override;
};

}  // namespace cece
#endif

// src/physics/cece_my_scheme.cpp
#include "cece/physics/cece_my_scheme.hpp"
#include "cece/cece_physics_factory.hpp"

namespace cece {

static PhysicsRegistration<MyScheme> register_my_scheme("my_scheme");

void MyScheme::Initialize(const YAML::Node& config, CeceDiagnosticManager* diag_manager) {
    BasePhysicsScheme::Initialize(config, diag_manager);
    // Read parameters from config
}

void MyScheme::Run(CeceImportState& import_state, CeceExportState& export_state) {
    auto temperature = ResolveImport("temperature", import_state);
    auto emissions = ResolveExport("emissions", export_state);

    if (temperature.data() == nullptr || emissions.data() == nullptr) return;

    int nx = emissions.extent(0);
    int ny = emissions.extent(1);
    int nz = emissions.extent(2);

    Kokkos::parallel_for(
        "MySchemeKernel",
        Kokkos::MDRangePolicy<Kokkos::DefaultExecutionSpace, Kokkos::Rank<3>>(
            {0, 0, 0}, {nx, ny, nz}),
        KOKKOS_LAMBDA(int i, int j, int k) {
            emissions(i, j, k) = temperature(i, j, k) * 0.001;
        });

    Kokkos::fence();
    MarkModified("emissions", export_state);
}

}  // namespace cece

Scheme Generator Workflow¶

The CECE scheme generator automates the creation of physics scheme scaffolding from YAML configuration files.

Step 1: Create Configuration File¶

Create a YAML file describing your scheme (e.g., my_scheme_config.yaml):

scheme:
  name: MyEmissionScheme
  scheme_name: my_emission_scheme
  description: "Computes emissions from temperature and solar radiation"
  language: cpp  # or fortran

imports:
  - name: temperature
    var_name: temp
    units: K
    dimensions: 3D
  - name: solar_radiation
    var_name: solar
    units: W/m2
    dimensions: 3D

exports:
  - name: emissions
    var_name: emis
    units: kg/m2/s
    dimensions: 2D

options:
  - name: base_emission_factor
    type: double
    default: 1.0e-6
    description: "Base emission factor"
  - name: temperature_ref
    type: double
    default: 298.15
    description: "Reference temperature"
  - name: q10
    type: double
    default: 2.0
    description: "Q10 temperature sensitivity"

diagnostics:
  - name: temperature_factor
    units: dimensionless
    description: "Temperature scaling factor"
  - name: solar_factor
    units: dimensionless
    description: "Solar radiation scaling factor"

Step 2: Run Scheme Generator¶

./scripts/generate_physics_scheme.py my_scheme_config.yaml

This generates: - include/cece/physics/cece_my_emission_scheme.hpp - src/physics/cece_my_emission_scheme.cpp - src/physics/my_emission_scheme_kernel.F90 (if language=fortran)

Step 3: Implement Physics Logic¶

Edit the generated files and implement the physics computation in the Run method.

Step 4: Update CMakeLists.txt¶

Add the new scheme to CMakeLists.txt:

add_library(cece_physics_my_emission_scheme src/physics/cece_my_emission_scheme.cpp)
target_link_libraries(cece_physics_my_emission_scheme PUBLIC cece_core)
target_link_libraries(cece_core PUBLIC cece_physics_my_emission_scheme)

Step 5: Configure in YAML¶

Add the scheme to your CECE configuration file:

physics_schemes:
  - name: my_emission_scheme
    options:
      base_emission_factor: 1.0e-6
      temperature_ref: 298.15
      q10: 2.0

Step 6: Build and Test¶

cd build
cmake ..
make -j4
ctest --output-on-failure

Writing Kokkos Kernels¶

Kokkos kernels are the core of physics computation in CECE. They enable single-source code that runs efficiently on CPUs and GPUs.

Basic Kernel Structure¶

Kokkos::parallel_for(
    "KernelName",
    Kokkos::MDRangePolicy<Kokkos::DefaultExecutionSpace, Kokkos::Rank<3>>(
        {0, 0, 0}, {nx, ny, nz}),
    KOKKOS_LAMBDA(int i, int j, int k) {
        // Kernel body: executed for each grid point
        // Access fields through Kokkos::View handles
        output(i, j, k) = input(i, j, k) * scale;
    });

Kokkos::fence();  // Ensure all threads complete

Key Kokkos Concepts¶

1. Execution Spaces¶

Kokkos::DefaultExecutionSpace automatically selects the appropriate backend: - Serial (debugging) - OpenMP (CPU parallelism) - CUDA (NVIDIA GPU) - HIP (AMD GPU)

// Automatic dispatch to appropriate backend
Kokkos::parallel_for(
    "KernelName",
    Kokkos::MDRangePolicy<Kokkos::DefaultExecutionSpace, Kokkos::Rank<3>>(
        {0, 0, 0}, {nx, ny, nz}),
    KOKKOS_LAMBDA(int i, int j, int k) {
        // Code runs on CPU or GPU automatically
    });

2. Memory Layouts¶

Use Kokkos::LayoutLeft for Fortran-style column-major layout (matches ESMF):

Kokkos::View<double***, Kokkos::LayoutLeft, Kokkos::DefaultExecutionSpace> field;
// Access: field(i, j, k) where i varies fastest

3. Parallel Patterns¶

Parallel For Loop:

Kokkos::parallel_for(
    "LoopName",
    Kokkos::MDRangePolicy<Kokkos::DefaultExecutionSpace, Kokkos::Rank<3>>(
        {0, 0, 0}, {nx, ny, nz}),
    KOKKOS_LAMBDA(int i, int j, int k) {
        // Loop body
    });

Parallel Reduce (Global Reduction):

double total = 0.0;
Kokkos::parallel_reduce(
    "ReductionName",
    Kokkos::MDRangePolicy<Kokkos::DefaultExecutionSpace, Kokkos::Rank<3>>(
        {0, 0, 0}, {nx, ny, nz}),
    KOKKOS_LAMBDA(int i, int j, int k, double& local_sum) {
        local_sum += field(i, j, k);
    }, total);

Atomic Operations (Race-Condition-Free Accumulation):

Kokkos::parallel_for(
    "AtomicName",
    Kokkos::MDRangePolicy<Kokkos::DefaultExecutionSpace, Kokkos::Rank<3>>(
        {0, 0, 0}, {nx, ny, nz}),
    KOKKOS_LAMBDA(int i, int j, int k) {
        // Multiple threads can safely accumulate to the same location
        Kokkos::atomic_add(&output(i, j, 0), input(i, j, k));
    });

Performance Tips¶

1. Minimize Floating-Point Operations¶

Use Horner's method for polynomial evaluation:

// Bad: 4 multiplications
double result = a*x*x*x + b*x*x + c*x + d;

// Good: 3 multiplications (Horner's method)
double result = ((a*x + b)*x + c)*x + d;

2. Avoid Function Calls in Kernels¶

Pre-compute values outside the kernel:

// Bad: Expensive function called for every grid point
Kokkos::parallel_for(..., KOKKOS_LAMBDA(int i, int j, int k) {
    output(i, j, k) = expensive_function(input(i, j, k));
});

// Good: Pre-compute lookup table
std::vector<double> lookup(256);
for (int i = 0; i < 256; ++i) {
    lookup[i] = expensive_function(i / 256.0);
}
auto lookup_device = Kokkos::create_mirror_view_and_copy(
    Kokkos::DefaultExecutionSpace(), lookup);

Kokkos::parallel_for(..., KOKKOS_LAMBDA(int i, int j, int k) {
    int idx = (int)(input(i, j, k) * 256);
    output(i, j, k) = lookup_device(idx);
});

3. Optimize Memory Access Patterns¶

Access memory in order of layout:

// Good: Coalesced memory access (LayoutLeft)
Kokkos::parallel_for(..., KOKKOS_LAMBDA(int i, int j, int k) {
    output(i, j, k) = input(i, j, k);  // i varies fastest
});

// Bad: Strided memory access
Kokkos::parallel_for(..., KOKKOS_LAMBDA(int i, int j, int k) {
    output(k, j, i) = input(i, j, k);  // k varies fastest
});

4. Use Local Variables to Cache Values¶

// Good: Cache frequently accessed values
Kokkos::parallel_for(..., KOKKOS_LAMBDA(int i, int j, int k) {
    double temp = temperature(i, j, k);
    double solar = solar_radiation(i, j, k);
    double factor = temp * solar;
    output(i, j, k) = base_value * factor;
});

Calling Fortran Kernels from C++¶

CECE supports calling Fortran kernels from C++ schemes using Fortran-C interoperability.

Fortran Kernel Example¶

! src/physics/my_kernel.F90
module my_kernel_mod
    use iso_c_binding
    implicit none
    private
    public :: run_my_kernel

contains

    subroutine run_my_kernel(nx, ny, nz, temperature_ptr, emissions_ptr) &
        bind(C, name="run_my_kernel_fortran")
        integer(c_int), intent(in), value :: nx, ny, nz
        real(c_double), intent(in) :: temperature_ptr(nx, ny, nz)
        real(c_double), intent(inout) :: emissions_ptr(nx, ny, nz)

        integer :: i, j, k

        !$omp parallel do collapse(3)
        do k = 1, nz
            do j = 1, ny
                do i = 1, nx
                    emissions_ptr(i, j, k) = temperature_ptr(i, j, k) * 0.001_c_double
                end do
            end do
        end do
        !$omp end parallel do

    end subroutine run_my_kernel

end module my_kernel_mod

C++ Wrapper¶

// Declare Fortran function
extern "C" void run_my_kernel_fortran(int nx, int ny, int nz,
                                      double* temperature_ptr,
                                      double* emissions_ptr);

// Call from C++ scheme
void MyScheme::Run(CeceImportState& import_state, CeceExportState& export_state) {
    auto temperature = ResolveImport("temperature", import_state);
    auto emissions = ResolveExport("emissions", export_state);

    if (temperature.data() == nullptr || emissions.data() == nullptr) return;

    int nx = emissions.extent(0);
    int ny = emissions.extent(1);
    int nz = emissions.extent(2);

    // Sync device to host before calling Fortran
    Kokkos::fence();

    // Call Fortran kernel
    run_my_kernel_fortran(nx, ny, nz,
                         const_cast<double*>(temperature.data()),
                         emissions.data());

    // Mark modified for device-to-host sync
    MarkModified("emissions", export_state);
}

Important Notes¶

Fortran arrays are 1-indexed; C++ arrays are 0-indexed
Use iso_c_binding for proper type compatibility
Sync device to host before calling Fortran
Fortran kernels run on CPU only (no GPU support)
Use OpenMP for parallelization in Fortran

Configuration and Parameters¶

All physics parameters must be configurable via YAML. Never hardcode physical constants.

Reading Parameters¶

void MyScheme::Initialize(const YAML::Node& config, CeceDiagnosticManager* diag_manager) {
    BasePhysicsScheme::Initialize(config, diag_manager);

    // Read scalar parameters
    if (config["emission_factor"]) {
        emission_factor_ = config["emission_factor"].as<double>();
    }

    // Read array parameters
    if (config["diurnal_cycle"]) {
        for (int hour = 0; hour < 24; ++hour) {
            diurnal_cycle_[hour] = config["diurnal_cycle"][hour].as<double>();
        }
    }

    // Read boolean flags
    if (config["apply_mask"]) {
        apply_mask_ = config["apply_mask"].as<bool>();
    }
}

YAML Configuration Example¶

physics_schemes:
  - name: my_scheme
    options:
      emission_factor: 1.0e-6
      temperature_ref: 298.15
      q10: 2.0
      apply_mask: true
      diurnal_cycle:
        - 0.5
        - 0.6
        - 0.7
        # ... 24 values total

Parameter Validation¶

void MyScheme::Initialize(const YAML::Node& config, CeceDiagnosticManager* diag_manager) {
    BasePhysicsScheme::Initialize(config, diag_manager);

    if (config["emission_factor"]) {
        emission_factor_ = config["emission_factor"].as<double>();
        if (emission_factor_ < 0.0) {
            throw std::invalid_argument("emission_factor must be non-negative");
        }
    }

    if (config["temperature_ref"]) {
        temperature_ref_ = config["temperature_ref"].as<double>();
        if (temperature_ref_ < 200.0 || temperature_ref_ > 330.0) {
            throw std::out_of_range("temperature_ref must be between 200 and 330 K");
        }
    }
}

Field Resolution and Caching¶

BasePhysicsScheme provides efficient field resolution with automatic caching.

Resolving Import Fields¶

// Resolve meteorological input field
auto temperature = ResolveImport("temperature", import_state);

// Field handle is cached; subsequent calls return cached value
auto temperature_again = ResolveImport("temperature", import_state);
// No ESMF lookup performed; returns cached handle

Resolving Export Fields¶

// Resolve emission output field
auto emissions = ResolveExport("emissions", export_state);

// Modify the field
Kokkos::parallel_for(..., KOKKOS_LAMBDA(int i, int j, int k) {
    emissions(i, j, k) = compute_emission(i, j, k);
});

// Mark as modified for device-to-host sync
MarkModified("emissions", export_state);

Resolving from Both States¶

// Useful for schemes that depend on previously computed fields
auto base_emissions = ResolveInput("emissions", import_state, export_state);
// Checks import state first, then export state

Field Name Mapping¶

Map internal names to external field names via YAML:

physics_schemes:
  - name: my_scheme
    input_mapping:
      temp: temperature
      solar: solar_radiation
    output_mapping:
      emis: emissions

Then use internal names in code:

auto temperature = ResolveImport("temp", import_state);  // Maps to "temperature"
auto emissions = ResolveExport("emis", export_state);    // Maps to "emissions"

Diagnostic Fields¶

Diagnostic fields are optional intermediate variables useful for validation and debugging.

Registering Diagnostic Fields¶

void MyScheme::Initialize(const YAML::Node& config, CeceDiagnosticManager* diag_manager) {
    BasePhysicsScheme::Initialize(config, diag_manager);

    if (diag_manager != nullptr) {
        temperature_factor_ = ResolveDiagnostic("temperature_factor", nx, ny, nz,
                                                "dimensionless",
                                                "Temperature scaling factor");
    }
}

Writing to Diagnostic Fields¶

void MyScheme::Run(CeceImportState& import_state, CeceExportState& export_state) {
    // ... resolve fields ...

    Kokkos::parallel_for(..., KOKKOS_LAMBDA(int i, int j, int k) {
        double temp_factor = compute_temperature_factor(temperature(i, j, k));
        emissions(i, j, k) = base_emission * temp_factor;

        // Store diagnostic value
        temperature_factor_(i, j, k) = temp_factor;
    });
}

Diagnostic Output¶

Diagnostic fields are automatically written to NetCDF output if enabled in configuration:

output:
  diagnostics: true
  fields:
    - emissions
    - temperature_factor

Performance Optimization¶

Profiling¶

Use Kokkos profiling tools to identify bottlenecks:

# Enable Kokkos profiling
export KOKKOS_PROFILE_LIBRARY=/path/to/libkokkos_profiling.so
./cece_driver

Common Optimization Techniques¶

1. Kernel Fusion¶

Combine multiple kernels into one to reduce memory bandwidth:

// Bad: Two separate kernels
Kokkos::parallel_for(..., KOKKOS_LAMBDA(int i, int j, int k) {
    temp_factor(i, j, k) = compute_temp_factor(temperature(i, j, k));
});
Kokkos::parallel_for(..., KOKKOS_LAMBDA(int i, int j, int k) {
    emissions(i, j, k) = base_emission * temp_factor(i, j, k);
});

// Good: Fused kernel
Kokkos::parallel_for(..., KOKKOS_LAMBDA(int i, int j, int k) {
    double temp_factor = compute_temp_factor(temperature(i, j, k));
    emissions(i, j, k) = base_emission * temp_factor;
});

2. Lookup Tables¶

Pre-compute expensive functions:

// Pre-compute lookup table in Initialize
std::vector<double> lookup(256);
for (int i = 0; i < 256; ++i) {
    lookup[i] = expensive_function(i / 256.0);
}
auto lookup_device = Kokkos::create_mirror_view_and_copy(
    Kokkos::DefaultExecutionSpace(), lookup);

// Use in kernel
Kokkos::parallel_for(..., KOKKOS_LAMBDA(int i, int j, int k) {
    int idx = (int)(input(i, j, k) * 256);
    output(i, j, k) = lookup_device(idx);
});

3. Vectorization¶

Structure code to enable compiler vectorization:

// Good: Compiler can vectorize
Kokkos::parallel_for(..., KOKKOS_LAMBDA(int i, int j, int k) {
    output(i, j, k) = input(i, j, k) * scale;
});

// Bad: Compiler cannot vectorize (data dependency)
Kokkos::parallel_for(..., KOKKOS_LAMBDA(int i, int j, int k) {
    output(i, j, k) = output(i-1, j, k) + input(i, j, k);
});

Testing and Validation¶

Unit Tests¶

Create unit tests for your scheme:

// tests/test_my_scheme.cpp
#include <gtest/gtest.h>
#include "cece/physics/cece_my_scheme.hpp"

class MySchemeTest : public ::testing::Test {
protected:
    void SetUp() override {
        // Initialize scheme
        scheme_.Initialize(config_, nullptr);
    }

    MyScheme scheme_;
    YAML::Node config_;
};

TEST_F(MySchemeTest, EmissionComputation) {
    // Create mock import/export states
    // Call scheme_.Run()
    // Verify results
}

Property-Based Testing¶

Test universal properties with random inputs:

// Generate random valid inputs
// Execute scheme
// Verify properties (e.g., mass conservation, physical bounds)

Integration Testing¶

Test scheme in full CECE context:

# Build with scheme
cd build && cmake .. && make -j4

# Run single-model driver
./cece_nuopc_single_driver --config cece_config.yaml

# Verify output
ncdump -h cece_output.nc

Troubleshooting¶

Common Issues¶

1. Segmentation Fault in Kernel¶

Cause: Accessing invalid memory (out-of-bounds array access)

Solution: - Check array dimensions match grid size - Verify field handles are not null before use - Use bounds checking in debug builds

if (temperature.data() == nullptr) {
    return;  // Field not found
}

2. Incorrect Results¶

Cause: Logic error or incorrect field resolution

Solution: - Add diagnostic fields to inspect intermediate values - Compare with reference implementation - Check parameter values in YAML config

3. Poor Performance¶

Cause: Inefficient kernel implementation

Solution: - Profile with Kokkos profiling tools - Check memory access patterns - Fuse multiple kernels - Use lookup tables for expensive functions

4. Device-Host Synchronization Issues¶

Cause: Forgetting to call MarkModified()

Solution: - Always call MarkModified() after modifying export fields - Ensure Kokkos::fence() is called before CPU access

Kokkos::parallel_for(..., KOKKOS_LAMBDA(...) { ... });
Kokkos::fence();
MarkModified("emissions", export_state);

Best Practices¶

1. Always Use BasePhysicsScheme Helpers¶

// Good: Use helper methods
auto temperature = ResolveImport("temperature", import_state);
auto emissions = ResolveExport("emissions", export_state);

// Bad: Direct ESMF calls
// ESMC_Field field = ESMC_StateGetField(import_state, "temperature", rc);

2. Never Hardcode Physical Constants¶

// Good: Read from YAML
if (config["q10"]) {
    q10_ = config["q10"].as<double>();
}

// Bad: Hardcoded constant
double q10 = 2.0;

3. Use Kokkos for All Compute¶

// Good: Kokkos kernel
Kokkos::parallel_for(..., KOKKOS_LAMBDA(...) { ... });

// Bad: CPU-only loop
for (int i = 0; i < nx; ++i) {
    for (int j = 0; j < ny; ++j) {
        output(i, j, 0) = input(i, j, 0) * scale;
    }
}

4. Validate Configuration Parameters¶

// Good: Validate parameters
if (emission_factor_ < 0.0) {
    throw std::invalid_argument("emission_factor must be non-negative");
}

// Bad: No validation
// emission_factor_ = config["emission_factor"].as<double>();

5. Document Your Scheme¶

/**
 * @class MyScheme
 * @brief Computes emissions from temperature and solar radiation
 *
 * This scheme implements the Q10 temperature parameterization:
 * emission = base_factor * Q10^((T-Tref)/10) * solar_factor
 *
 * Configuration parameters:
 * - base_emission_factor: Base emission rate [kg/m2/s]
 * - temperature_ref: Reference temperature [K]
 * - q10: Q10 temperature sensitivity factor
 * - solar_factor: Solar radiation scaling coefficient
 */
class MyScheme : public BasePhysicsScheme { ... };

6. Test on Both CPU and GPU¶

# Test on CPU (OpenMP)
export KOKKOS_DEVICES=OpenMP
ctest --output-on-failure

# Test on GPU (CUDA)
export KOKKOS_DEVICES=Cuda
ctest --output-on-failure

7. Use Meaningful Kernel Names¶

// Good: Descriptive kernel name
Kokkos::parallel_for(
    "MySchemeEmissionComputationKernel",
    ...);

// Bad: Generic kernel name
Kokkos::parallel_for(
    "Kernel",
    ...);

Example: Complete Physics Scheme¶

See the following example schemes in the CECE repository:

ExampleEmissionGeneration (include/cece/physics/cece_example_emission_generation.hpp)
Demonstrates emission generation pattern
Shows Q10 temperature parameterization
Includes diagnostic field output
ExampleEmissionModification (include/cece/physics/cece_example_emission_modification.hpp)
Demonstrates emission modification pattern
Shows diurnal cycle application
Shows in-place field modification
ExampleDiagnosticComputation (include/cece/physics/cece_example_diagnostic_computation.hpp)
Demonstrates diagnostic field computation
Shows validation and quality flagging
Shows reading from export state

Additional Resources¶

ESMF User Guide: https://earthsystemmodeling.org/docs/release/latest/ESMF_usrdoc
Kokkos Documentation: https://kokkos.github.io/kokkos-core-wiki/
CECE Architecture: See docs/api.md
Physics Scheme Examples: See include/cece/physics/cece_example_*.hpp

Getting Help¶

If you encounter issues:

Check the troubleshooting section above
Review example schemes in the repository
Check CECE documentation in docs/
Run tests to verify your implementation
Use diagnostic fields to inspect intermediate values
Profile with Kokkos profiling tools

Good luck with your physics scheme development!