chromasurr.surrogate module

surrogate.py – Gaussian-process replacement of CADET simulations.

This module provides the Surrogate class, which can

  1. sample the CADET parameter space with Saltelli’s method,

  2. train one Gaussian-process (GP) emulator per KPI in log-space,

  3. perform cheap Sobol sensitivity analysis on the GP,

  4. keep only the most important parameters (user-defined criterion), and

  5. retrain without rerunning CADET.

It also exposes Surrogate.predict() and Surrogate.predict_var() so you can propagate uncertainty later (e.g. with chromasurr.uq.perform_monte_carlo_uq).

class chromasurr.surrogate.Surrogate(process, param_config, bounds, metrics, n_train=128, *, kernel=None, seed=0)

Bases: object

Gaussian-process surrogate for a CADET chromatography workflow.

Parameters:
  • process (CADETProcess.processModel.process.Process) – Fully configured process model to emulate.

  • param_config (dict[str, str]) – Mapping from parameter names to attribute paths on process, e.g. {"ads_rate_A": "flow_sheet.column.binding_model.adsorption_rate[0]"}.

  • bounds (dict[str, Sequence[float]]) – Lower and upper bounds for each parameter, with the same keys in the same order as param_config.

  • metrics (list[str]) – Keys returned by chromasurr.metrics.extract() to emulate.

  • n_train (int, default=128) – Saltelli base sample size. Total simulations ≈ n_train × (2D + 2), where D is the number of parameters.

  • kernel (Kernel | type | None, optional) – scikit-learn kernel instance or class. Defaults to a Matern kernel (nu=1.5) plus a white-noise kernel.

  • seed (int, default=0) – RNG seed (NumPy and scikit-learn).
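As a rough illustration of how param_config, bounds, and n_train relate, the sketch below builds matching dictionaries and estimates the simulation budget. The second parameter name, its attribute path, and all bound values are hypothetical placeholders, not values taken from chromasurr or CADET-Process.

```python
# Hypothetical configuration: only the first attribute path appears in the
# parameter description above; the second entry and all bounds are made up.
param_config = {
    "ads_rate_A": "flow_sheet.column.binding_model.adsorption_rate[0]",
    "ax_disp": "flow_sheet.column.axial_dispersion",
}
bounds = {
    "ads_rate_A": (1e-3, 1e-1),
    "ax_disp": (1e-8, 1e-6),
}

# bounds is documented to use the same keys, in the same order, as param_config.
assert list(bounds) == list(param_config)

n_train = 128
n_params = len(param_config)  # D in the budget formula
total_sims = n_train * (2 * n_params + 2)
print(total_sims)  # 128 * (2*2 + 2) = 768
```

Because the budget grows linearly in both n_train and D, it can be worth checking this number before launching a training run.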

analyze_sensitivity(*, n_samples=1024, metric=None, log_space=True, print_to_console=False)

Compute global Sobol indices via the trained GP.

Parameters:
  • n_samples (int, default=1024) – Saltelli base sample size for the surrogate evaluation.

  • metric (str or None, optional) – If None, analyse all metrics and store the results in the sensitivity attribute; otherwise analyse only the given metric.

  • log_space (bool, default=True) – Analyse on the GP training scale (True) or exponentiate the predictions first (False).

  • print_to_console (bool, default=False) – Whether to print SALib’s textual summary to stdout.

Raises:

RuntimeError – If the surrogate has not been trained yet.

Return type:

None
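To give a feel for what a total-order Sobol index measures, the following minimal sketch estimates ST for a cheap toy function using the Jansen estimator with plain pseudo-random sampling. It is a stand-in for illustration only: it does not use SALib or a Saltelli/Sobol sequence, and cheap_model and total_sobol_indices are hypothetical names that are not part of chromasurr.

```python
import random
from statistics import pvariance


def cheap_model(x):
    # Stand-in for a trained GP mean: x[0] dominates, x[1] barely matters.
    return x[0] + 0.1 * x[1]


def total_sobol_indices(f, n_params, n_samples, seed=0):
    """Jansen estimator of total-order indices on the unit hypercube."""
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(n_params)] for _ in range(n_samples)]
    B = [[rng.random() for _ in range(n_params)] for _ in range(n_samples)]
    f_A = [f(row) for row in A]
    f_B = [f(row) for row in B]
    var_y = pvariance(f_A + f_B)  # output variance from both sample sets

    indices = []
    for i in range(n_params):
        sq_diff = 0.0
        for a_row, b_row, fa in zip(A, B, f_A):
            ab_row = list(a_row)
            ab_row[i] = b_row[i]  # resample only parameter i
            sq_diff += (fa - f(ab_row)) ** 2
        indices.append(sq_diff / (2.0 * n_samples * var_y))
    return indices


ST = total_sobol_indices(cheap_model, n_params=2, n_samples=2000)
# Analytic values for this toy model: ST[0] = 1/1.01, ST[1] = 0.01/1.01.
print(ST)
```

Each index only requires evaluations of the cheap function, which is why running the analysis on the surrogate rather than on CADET is inexpensive.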

predict(metric, X)

Predict mean of a KPI on the original scale.

Parameters:
  • metric (str) – Which KPI (metric name) to return.

  • X (ndarray, shape (n_samples, n_params)) – Input design.

Returns:

ndarray – Mean predictions on the original (exp) scale.

predict_var(metric, X)

Predict the variance of a KPI on the original scale.

The GP is trained on log-responses. For a log-normal variable Y = exp(Z), with Z ~ N(μ, σ²), the variance is

\[\operatorname{Var}[Y] = (e^{\sigma^2} - 1)\, e^{2\mu + \sigma^2}\]

Parameters:
  • metric (str) – Which KPI (metric name) to return variance for.

  • X (ndarray, shape (n_samples, n_params)) – Input design.

Returns:

ndarray – Variance predictions on the original scale.
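The log-normal variance formula follows from Var[Y] = E[Y²] − (E[Y])², with E[Y] = exp(μ + σ²/2) and E[Y²] = exp(2μ + 2σ²). A minimal sketch of the back-transformation for a single point is shown below; lognormal_variance is a hypothetical helper for illustration and is not claimed to be how predict_var is implemented.

```python
import math


def lognormal_variance(mu: float, sigma2: float) -> float:
    """Variance of Y = exp(Z) for Z ~ N(mu, sigma2)."""
    # expm1 computes exp(sigma2) - 1 accurately for small sigma2.
    return math.expm1(sigma2) * math.exp(2.0 * mu + sigma2)


# Cross-check against Var[Y] = E[Y^2] - E[Y]^2 for one example point.
mu, sigma2 = 0.3, 0.25
mean_y = math.exp(mu + sigma2 / 2.0)
second_moment = math.exp(2.0 * mu + 2.0 * sigma2)
assert math.isclose(lognormal_variance(mu, sigma2), second_moment - mean_y**2)
```

Note that a zero log-space variance gives a zero variance on the original scale, as expected.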

retrain()

Retrain without new CADET runs, using only top_params.

Return type:

None

select_important_params(*, metric=None, threshold=0.05, n_top=None)

Pick key parameters based on total Sobol indices.

Exactly one of threshold or n_top may be given. The default keeps parameters with ST >= 0.05 for the first metric.

Parameters:
  • metric (str or None, optional) – Which metric’s indices to use; defaults to the first metric.

  • threshold (float or None, default=0.05) – Keep parameters with ST >= threshold.

  • n_top (int or None, optional) – If provided, keep exactly the top-n_top parameters by ST.

Returns:

list[str] – The retained parameter names.
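The two selection rules can be sketched on a plain dictionary of total Sobol indices. This is an illustrative helper under stated assumptions, not the class's actual code: select_by_total_index and the index values are hypothetical, and the sketch assumes that n_top, when provided, takes the place of the threshold rule.

```python
from typing import Mapping, Optional


def select_by_total_index(
    st: Mapping[str, float],
    *,
    threshold: Optional[float] = 0.05,
    n_top: Optional[int] = None,
) -> list:
    """Return parameter names kept by a threshold rule or a top-n rule."""
    if n_top is not None:
        # Assumption: an explicit n_top replaces the threshold rule.
        ranked = sorted(st, key=st.get, reverse=True)
        return ranked[:n_top]
    if threshold is None:
        raise ValueError("Provide either threshold or n_top.")
    return [name for name, value in st.items() if value >= threshold]


# Made-up total indices for three parameters.
st_example = {"ads_rate_A": 0.62, "porosity": 0.03, "ax_disp": 0.31}
kept_by_threshold = select_by_total_index(st_example)
kept_top_one = select_by_total_index(st_example, n_top=1)
print(kept_by_threshold, kept_top_one)
```

The threshold rule keeps a variable number of parameters, while n_top fixes the dimensionality of the retrained surrogate in advance.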

Raises:

train()

Run CADET on a Saltelli design and fit one GP per metric.

The method

  1. draws a Saltelli sample X_full with n_train base samples,

  2. simulates CADET for each sample and extracts metrics,

  3. builds a single sklearn.preprocessing.StandardScaler on the intersection of rows valid for all metrics, and

  4. fits a sklearn.gaussian_process.GaussianProcessRegressor on log(y) for each metric.

Raises:
  • RuntimeError – If no valid simulations are available to train on.

  • ValueError – If a particular metric has no finite data.

Return type:

None
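Steps 3 and 4 of train() can be approximated with scikit-learn alone. The sketch below replaces CADET with toy positive responses, marks two runs as failed, and assumes that "valid" means finite; the KPI names, the toy data, and the use of normalize_y are illustrative assumptions rather than details taken from chromasurr.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_full = rng.uniform(0.0, 1.0, size=(40, 2))  # stand-in for the Saltelli design

# Toy positive KPIs standing in for metrics extracted from CADET runs.
Y = {
    "kpi_a": np.exp(1.0 + X_full[:, 0] - 0.5 * X_full[:, 1]),
    "kpi_b": 2.0 + X_full[:, 1] ** 2,
}
Y["kpi_a"][3] = np.nan  # pretend these simulations failed
Y["kpi_b"][7] = np.nan

# Keep only rows that are valid (here: finite) for every metric.
valid = np.ones(len(X_full), dtype=bool)
for y in Y.values():
    valid &= np.isfinite(y)

# One shared scaler, fitted on the common valid rows.
scaler = StandardScaler().fit(X_full[valid])
X_scaled = scaler.transform(X_full[valid])

# One GP per metric, fitted on log(y).
models = {}
for name, y in Y.items():
    kernel = Matern(nu=1.5) + WhiteKernel()
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
    models[name] = gp.fit(X_scaled, np.log(y[valid]))

# Log-space predictive mean and standard deviation for a few points.
mu, sigma = models["kpi_a"].predict(scaler.transform(X_full[:5]), return_std=True)
```

Sharing a single scaler across metrics means every GP sees the same input transformation, so one design matrix can be passed to all of them at prediction time.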