chromasurr.surrogate module
surrogate.py – Gaussian-process replacement of CADET simulations.
This module provides the Surrogate class that can
sample the CADET parameter space with Saltelli’s method,
train one Gaussian-process (GP) emulator per KPI in log-space,
perform cheap Sobol sensitivity analysis on the GP,
keep only the most important parameters (user-defined criterion), and
retrain without rerunning CADET.
It also exposes Surrogate.predict() and
Surrogate.predict_var() so you can propagate uncertainty later
(e.g. with chromasurr.uq.perform_monte_carlo_uq).
- class chromasurr.surrogate.Surrogate(process, param_config, bounds, metrics, n_train=128, *, kernel=None, seed=0)
Bases: object
Gaussian-process surrogate for a CADET chromatography workflow.
- Parameters:
process (CADETProcess.processModel.process.Process) – Fully configured process model to emulate.
param_config (dict[str, str]) – Mapping from parameter names to attribute paths on process, e.g. {"ads_rate_A": "flow_sheet.column.binding_model.adsorption_rate[0]"}.
bounds (dict[str, Sequence[float]]) – Lower/upper bounds for each parameter, with the same keys and order as param_config.
metrics (list[str]) – Keys returned by chromasurr.metrics.extract() to emulate.
n_train (int, default=128) – Saltelli base sample size. Total sims ≈ n_train × (2D + 2), where D is the number of parameters.
kernel (Kernel | type | None, optional) – scikit-learn kernel instance or class. Defaults to Matern(1.5)+White.
seed (int, default=0) – RNG seed (NumPy and scikit-learn).
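As a rough illustration of how the constructor arguments fit together, the sketch below lays out a param_config/bounds pair and estimates the simulation budget from the n_train × (2D + 2) rule stated above. The second parameter name, its attribute path, all bound values, and the helper saltelli_budget are hypothetical and are not part of chromasurr.

```python
# Attribute paths on the process, keyed by a short parameter name.
param_config = {
    "ads_rate_A": "flow_sheet.column.binding_model.adsorption_rate[0]",
    # Hypothetical second entry, included only to make D = 2.
    "des_rate_A": "flow_sheet.column.binding_model.desorption_rate[0]",
}

# Placeholder (lower, upper) bounds; same keys and order as param_config.
bounds = {
    "ads_rate_A": (1e-3, 1e-1),
    "des_rate_A": (1e-2, 1.0),
}


def saltelli_budget(n_train: int, n_params: int) -> int:
    """Approximate number of CADET runs: n_train * (2D + 2)."""
    return n_train * (2 * n_params + 2)


# The two dicts must agree on keys and ordering.
assert list(bounds) == list(param_config)

n_sims = saltelli_budget(128, len(param_config))
print(n_sims)  # 128 * (2*2 + 2) = 768
```

Because the budget grows linearly in D, trimming unimportant parameters before retraining directly reduces the design size.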
- analyze_sensitivity(*, n_samples=1024, metric=None, log_space=True, print_to_console=False)
Compute global Sobol indices via the trained GP.
- Parameters:
n_samples (int, default=1024) – Saltelli base sample size for the surrogate evaluation.
metric (str or None, optional) – If None, analyse all metrics and store the results under the sensitivity attribute; otherwise analyse only the given metric.
log_space (bool, default=True) – Analyse on the GP training scale (True) or exponentiate the predictions first (False).
print_to_console (bool, default=False) – Whether to forward SALib’s textual summary to stdout.
- Raises:
RuntimeError – If the surrogate has not been trained yet.
- Return type:
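To make the idea of Sobol analysis on a cheap emulator concrete, here is a minimal NumPy sketch of a total-order index estimate (Jansen's estimator) for any vectorised function f. It is not the SALib-based code this method uses: the function total_sobol_indices and its arguments are assumptions for illustration, f stands in for a GP prediction function, and plain pseudo-random sampling replaces the Saltelli/Sobol sequence.

```python
import numpy as np


def total_sobol_indices(f, lower, upper, n_samples=1024, seed=0):
    """Estimate total-order Sobol indices of f on a box (Jansen estimator)."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    d = lower.size

    # Two independent base designs A and B, each of shape (n_samples, d).
    A = lower + (upper - lower) * rng.random((n_samples, d))
    B = lower + (upper - lower) * rng.random((n_samples, d))
    f_A = f(A)
    f_B = f(B)

    # Output variance estimated from both base designs.
    var = np.var(np.concatenate([f_A, f_B]))

    st = np.empty(d)
    for i in range(d):
        # AB_i equals A except that column i is taken from B.
        AB = A.copy()
        AB[:, i] = B[:, i]
        st[i] = 0.5 * np.mean((f_A - f(AB)) ** 2) / var
    return st


# Toy stand-in for a surrogate: additive, so the indices should be close to
# 0.2 and 0.8 (variance shares of x0 and 2*x1 on the unit square).
st = total_sobol_indices(lambda X: X[:, 0] + 2.0 * X[:, 1], [0, 0], [1, 1], 4096)
print(st)
```

This variant needs n_samples × (D + 2) function evaluations, which is affordable precisely because each evaluation is a GP prediction rather than a CADET run.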
- predict(metric, X)
Predict the mean of a KPI on the original scale.
- Parameters:
metric (str) – Which KPI (metric name) to return.
X (ndarray, shape (n_samples, n_params)) – Input design.
- Returns:
ndarray – Mean predictions on the original (exp) scale.
- predict_var(metric, X)
Predict the variance of a KPI on the original scale.
The GP is trained on log-responses. For a log-normal variable Y = exp(Z), with Z ~ N(μ, σ²), the variance is
\[\operatorname{Var}[Y] = (e^{\sigma^2} - 1)\, e^{2\mu + \sigma^2}\]
- Parameters:
metric (str) – Which KPI (metric name) to return variance for.
X (ndarray, shape (n_samples, n_params)) – Input design.
- Returns:
ndarray – Variance predictions on the original scale.
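The log-normal variance formula above can be checked numerically with a few lines of standard-library Python. The helper name lognormal_variance is hypothetical; it takes the log-space mean μ and variance σ² (as a GP would report them for one input point) and is only a sketch of the back-transformation, not the method's actual code.

```python
import math


def lognormal_variance(mu: float, sigma2: float) -> float:
    """Var[Y] for Y = exp(Z), Z ~ N(mu, sigma2): (e^s2 - 1) * e^(2*mu + s2)."""
    # expm1 keeps precision when sigma2 is very small.
    return math.expm1(sigma2) * math.exp(2.0 * mu + sigma2)


# With mu = 0 and sigma2 = ln 2: (2 - 1) * 2 = 2.
print(lognormal_variance(0.0, math.log(2.0)))

# Zero log-space variance means zero variance on the original scale.
print(lognormal_variance(1.5, 0.0))
```

For array inputs the same expression carries over element-wise with np.expm1 and np.exp.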
- select_important_params(*, metric=None, threshold=0.05, n_top=None)
Pick key parameters based on total Sobol indices.
Exactly one of threshold or n_top must be given. The default keeps parameters with ST ≥ 0.05 for the first metric.
- Parameters:
- Returns:
list[str] – The retained parameter names.
- Raises:
ValueError – If both or neither of threshold and n_top are provided.
RuntimeError – If analyze_sensitivity() has not been called yet.
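The selection rule described above can be sketched in plain Python. The function select_params and the example index values are hypothetical. The sketch assumes the total indices are available as a name-to-ST mapping, and that choosing n_top means passing threshold=None explicitly, so that exactly one criterion is active.

```python
def select_params(total_indices, *, threshold=0.05, n_top=None):
    """Keep parameters by ST threshold or by rank, but never both."""
    # Exactly one of the two criteria must be set.
    if (threshold is None) == (n_top is None):
        raise ValueError("provide exactly one of threshold or n_top")

    if n_top is not None:
        # Rank by total index, largest first, and keep the first n_top names.
        ranked = sorted(total_indices, key=total_indices.get, reverse=True)
        return ranked[:n_top]

    # Threshold mode: keep names whose total index is at least the threshold.
    return [name for name, st in total_indices.items() if st >= threshold]


# Made-up total Sobol indices for three parameters.
st = {"a": 0.62, "b": 0.03, "c": 0.31}

print(select_params(st))                           # ['a', 'c']
print(select_params(st, threshold=None, n_top=1))  # ['a']
```

Threshold mode preserves the original parameter order, while rank mode returns names sorted by importance.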
- train()
Run CADET on a Saltelli design and fit one GP per metric.
The method
draws a Saltelli sample X_full with n_train base samples,
simulates CADET for each sample and extracts metrics,
builds a single sklearn.preprocessing.StandardScaler on the intersection of rows valid for all metrics, and
fits a sklearn.gaussian_process.GaussianProcessRegressor on log(y) for each metric.
- Raises:
RuntimeError – If no valid simulations are available to train on.
ValueError – If a particular metric has no finite data.
- Return type:
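The last two training steps (a shared scaler on rows valid for every metric, then one GP per metric on log(y)) might look roughly like the scikit-learn sketch below. It is not the module's implementation: the design X, the two toy metrics, the injected failure, and the normalize_y setting are all assumptions, and a random design replaces the Saltelli sample and the CADET runs.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Stand-in for the design and the simulated metrics (toy positive responses).
X = rng.uniform(0.5, 2.0, size=(40, 2))
y = {"m1": X[:, 0] ** 2 * X[:, 1], "m2": X[:, 0] + X[:, 1]}
y["m1"][3] = np.nan  # pretend one simulation failed for metric m1

# Rows usable for every metric: finite and positive, so log(y) is defined.
valid = np.all([np.isfinite(v) & (v > 0) for v in y.values()], axis=0)
if not valid.any():
    raise RuntimeError("no valid simulations to train on")

# One scaler shared by all metrics, fitted on the valid rows only.
scaler = StandardScaler().fit(X[valid])
Xs = scaler.transform(X[valid])

models = {}
for name, values in y.items():
    # Matern(nu=1.5) plus a white-noise term, as in the documented default.
    kernel = Matern(nu=1.5) + WhiteKernel()
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
    gp.fit(Xs, np.log(values[valid]))
    models[name] = gp

# Log-space mean and standard deviation for a few scaled inputs.
mu, std = models["m1"].predict(Xs[:5], return_std=True)
print(mu.shape, std.shape)
```

Sharing one scaler keeps every GP on the same input scale, so a single design matrix can be passed to all of them at prediction time.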