chromasurr.surrogate module

surrogate.py – Gaussian-process replacement of CADET simulations.

This module provides the Surrogate class, which can:

sample the CADET parameter space with Saltelli’s method,
train one Gaussian-process (GP) emulator per KPI in log-space,
perform cheap Sobol sensitivity analysis on the GP,
keep only the most important parameters (user-defined criterion), and
retrain without rerunning CADET.

It also exposes Surrogate.predict() and Surrogate.predict_var() so you can propagate uncertainty later (e.g. with chromasurr.uq.perform_monte_carlo_uq).

class chromasurr.surrogate.Surrogate(process, param_config, bounds, metrics, n_train=128, *, kernel=None, seed=0)

Bases: object

Gaussian-process surrogate for a CADET chromatography workflow.

Parameters:

process (CADETProcess.processModel.process.Process) – Fully configured process model to emulate.
param_config (dict[str, str]) – Mapping from parameter names to attribute paths on process, e.g. {"ads_rate_A": "flow_sheet.column.binding_model.adsorption_rate[0]"}.
bounds (dict[str, Sequence[float]]) – Lower/upper bounds for each parameter – same keys/order as param_config.
metrics (list[str]) – Keys returned by chromasurr.metrics.extract() to emulate.
n_train (int, default=128) – Saltelli base sample size. Total sims ≈ n_train × (2D + 2), where D is the number of parameters.
kernel (Kernel | type | None, optional) – scikit-learn kernel instance or class. Defaults to Matern(1.5)+White.
seed (int, default=0) – RNG seed (NumPy and scikit-learn).
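
To get a feel for the simulation budget implied by n_train, the n_train × (2D + 2) relation given for that parameter can be evaluated directly. The helper below is hypothetical (it is not part of chromasurr) and assumes D is the number of entries in param_config.

```python
def saltelli_total_runs(n_train: int, n_params: int) -> int:
    """Approximate number of CADET runs for a Saltelli design.

    Evaluates n_train * (2D + 2) with D = n_params. Illustrative helper,
    not part of chromasurr.
    """
    if n_train <= 0 or n_params <= 0:
        raise ValueError("n_train and n_params must be positive")
    return n_train * (2 * n_params + 2)


# Default n_train=128 with three varied parameters (D = 3).
print(saltelli_total_runs(128, 3))  # 1024
```

Because the count grows linearly in both n_train and D, dropping unimportant parameters before any further CADET sampling reduces the cost proportionally.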

analyze_sensitivity(*, n_samples=1024, metric=None, log_space=True, print_to_console=False)

Compute global Sobol indices via the trained GP.

Parameters:

n_samples (int, default=1024) – Saltelli base sample size for the surrogate evaluation.
metric (str or None, optional) – If None, analyse all metrics and store the results under the sensitivity attribute; otherwise analyse only the given metric.
log_space (bool, default=True) – Analyse on the GP training scale (True) or exponentiate the predictions first (False).
print_to_console (bool, default=False) – Whether to forward SALib’s textual summary to stdout.

Raises:

RuntimeError – If the surrogate has not been trained yet.

Return type:

None
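
As a rough illustration of what variance-based Sobol indices measure, the sketch below estimates first-order (S1) and total (ST) indices for a cheap stand-in model. It is a minimal pure-Python version under stated assumptions: inputs are uniform on [0, 1], sampling is plain pseudo-random rather than a Sobol sequence, and the function names are hypothetical. chromasurr itself delegates this step to SALib.

```python
import random


def sobol_indices(f, n_params, n_base=10000, seed=0):
    """Estimate first-order (S1) and total (ST) Sobol indices of f on [0, 1]^D.

    Uses the Saltelli (2010) estimator for S1 (with centred outputs) and the
    Jansen estimator for ST. Teaching sketch only, not the SALib routine.
    """
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(n_params)] for _ in range(n_base)]
    B = [[rng.random() for _ in range(n_params)] for _ in range(n_base)]
    fA = [f(row) for row in A]
    fB = [f(row) for row in B]

    # Output mean and variance estimated from the pooled A and B evaluations.
    pooled = fA + fB
    mean = sum(pooled) / len(pooled)
    var = sum((y - mean) ** 2 for y in pooled) / (len(pooled) - 1)

    s1, st = [], []
    for i in range(n_params):
        # AB_i: rows of A with column i replaced by the same column of B.
        fAB = [f(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        s1.append(
            sum((yb - mean) * (yab - ya) for ya, yb, yab in zip(fA, fB, fAB))
            / n_base / var
        )
        st.append(
            sum((ya - yab) ** 2 for ya, yab in zip(fA, fAB)) / (2 * n_base) / var
        )
    return s1, st


def toy_model(x):
    """Stand-in for a cheap surrogate; analytic S1 = ST = [0.2, 0.8]."""
    return x[0] + 2.0 * x[1]


s1, st = sobol_indices(toy_model, n_params=2)
print([round(v, 2) for v in s1], [round(v, 2) for v in st])
```

The design costs n_base × (D + 2) model evaluations here, which is only affordable because the model being evaluated is cheap; that is the reason for running the analysis on the GP rather than on CADET.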

predict(metric, X)

Predict the mean of a KPI on the original scale.

Parameters:

metric (str) – Which KPI (metric name) to return.
X (ndarray, shape (n_samples, n_params)) – Input design.

Returns:

ndarray – Mean predictions on the original (exp) scale.

predict_var(metric, X)

Predict the variance of a KPI on the original scale.

The GP is trained on log-responses. For a log-normal variable Y = exp(Z), with Z ~ N(μ, σ²), the variance is

\[\operatorname{Var}[Y] = (e^{\sigma^2} - 1)\, e^{2\mu + \sigma^2}\]
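
The formula can be checked numerically. The sketch below evaluates it for one made-up pair (μ, σ²), standing in for a log-space GP prediction at a single design point, and compares the result with a Monte Carlo estimate. The function name is hypothetical and not part of chromasurr.

```python
import math
import random


def lognormal_variance(mu, sigma2):
    """Var[exp(Z)] for Z ~ N(mu, sigma2), following the formula above."""
    return (math.exp(sigma2) - 1.0) * math.exp(2.0 * mu + sigma2)


# Hypothetical log-space GP output at a single design point.
mu, sigma2 = 0.3, 0.25
closed_form = lognormal_variance(mu, sigma2)

# Monte Carlo cross-check: sample Z, exponentiate, take the sample variance.
rng = random.Random(0)
draws = [math.exp(rng.gauss(mu, math.sqrt(sigma2))) for _ in range(100_000)]
mean = sum(draws) / len(draws)
sampled = sum((y - mean) ** 2 for y in draws) / (len(draws) - 1)

print(f"closed form: {closed_form:.4f}, sampled: {sampled:.4f}")
```

Note that the original-scale variance depends on μ as well as σ², so two points with the same GP uncertainty in log-space can have very different variances after back-transformation.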

Parameters:

metric (str) – Which KPI (metric name) to return variance for.
X (ndarray, shape (n_samples, n_params)) – Input design.

Returns:

ndarray – Variance predictions on the original scale.

retrain()

Retrain without new CADET runs, using only top_params.

Return type:

None
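
Retraining without new simulations implies reusing the stored design and restricting it to the retained parameters. A minimal sketch of that column selection follows; it assumes the training inputs are available as rows ordered like the parameter names. The helper, the extra parameter names, and the numbers are all hypothetical, and this is not a description of chromasurr's internals.

```python
def keep_columns(X, param_names, top_params):
    """Return X restricted to the columns named in top_params (in that order)."""
    idx = [param_names.index(name) for name in top_params]
    return [[row[j] for j in idx] for row in X]


# Made-up stored design with three parameters per row.
param_names = ["ads_rate_A", "axial_dispersion", "porosity"]
X_train = [[0.1, 1e-7, 0.35], [0.2, 2e-7, 0.40]]

X_reduced = keep_columns(X_train, param_names, ["ads_rate_A", "porosity"])
print(X_reduced)  # [[0.1, 0.35], [0.2, 0.4]]
```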

select_important_params(*, metric=None, threshold=0.05, n_top=None)

Pick key parameters based on total Sobol indices.

Exactly one of threshold or n_top must be given. The default keeps parameters with ST ≥ 0.05 for the first metric.

Parameters:

metric (str or None, optional) – Which metric’s indices to use; defaults to the first metric.
threshold (float or None, default=0.05) – Keep parameters with ST >= threshold.
n_top (int or None, optional) – If provided, keep exactly the top-n_top parameters by ST.

Returns:

list[str] – The retained parameter names.

Raises:

ValueError – If both or neither of threshold and n_top are provided.
RuntimeError – If analyze_sensitivity() has not been called yet.
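
The two selection modes can be sketched as follows. This is a hypothetical stand-alone helper that mirrors the documented behaviour, not chromasurr's code. It assumes the total indices are available as a name-to-ST mapping, uses None as the default for both options (unlike the method's threshold=0.05 default), and returns names ranked by ST; the ST values are made up.

```python
def select_by_total_index(st, threshold=None, n_top=None):
    """Return parameter names chosen by total Sobol index (ST).

    st maps parameter name -> ST. Exactly one of threshold / n_top must be
    set; names are returned in order of decreasing ST.
    """
    if (threshold is None) == (n_top is None):
        raise ValueError("provide exactly one of threshold or n_top")
    ranked = sorted(st, key=st.get, reverse=True)
    if n_top is not None:
        return ranked[:n_top]
    return [name for name in ranked if st[name] >= threshold]


# Made-up total indices for three parameters.
st_example = {"ads_rate_A": 0.62, "axial_dispersion": 0.03, "porosity": 0.31}

print(select_by_total_index(st_example, threshold=0.05))  # ['ads_rate_A', 'porosity']
print(select_by_total_index(st_example, n_top=1))         # ['ads_rate_A']
```

A threshold adapts the number of retained parameters to the problem, while n_top fixes the dimensionality of the retrained surrogate in advance.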

train()

Run CADET on a Saltelli design and fit one GP per metric.

The method:

draws a Saltelli sample X_full with n_train base samples,
simulates CADET for each sample and extracts metrics,
builds a single sklearn.preprocessing.StandardScaler on the intersection of rows valid for all metrics, and
fits a sklearn.gaussian_process.GaussianProcessRegressor on log(y) for each metric.
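
The data preparation in the last two steps can be sketched without CADET or scikit-learn. The example below assumes a row is "valid" when every metric is finite and strictly positive (positivity is an assumption made here because log(y) is taken next), and it standardises inputs with the population standard deviation, as StandardScaler does. The helper names, the metric names, and the data are hypothetical.

```python
import math


def valid_row_mask(metric_values):
    """True for rows where every metric is finite and strictly positive."""
    n_rows = len(next(iter(metric_values.values())))
    return [
        all(math.isfinite(vals[i]) and vals[i] > 0.0 for vals in metric_values.values())
        for i in range(n_rows)
    ]


def standardize(X):
    """Column-wise zero mean and unit variance (population std, ddof=0)."""
    n = len(X)
    cols = list(zip(*X))
    means = [sum(c) / n for c in cols]
    # Constant columns get a scale of 1.0 to avoid division by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


# Made-up design and simulation results; nan/inf mark failed simulations.
X_full = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
results = {
    "yield": [0.9, float("nan"), 0.7, 0.6],
    "purity": [0.99, 0.95, float("inf"), 0.9],
}

mask = valid_row_mask(results)  # rows 0 and 3 are valid for both metrics
X_scaled = standardize([row for row, ok in zip(X_full, mask) if ok])
log_y = {
    name: [math.log(v) for v, ok in zip(vals, mask) if ok]
    for name, vals in results.items()
}
print(mask, X_scaled)
```

Using one shared mask keeps the same scaled inputs for every metric, at the cost of discarding rows where only a single metric failed.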

Raises:

RuntimeError – If no valid simulations are available to train on.
ValueError – If a particular metric has no finite data.

Return type:

None