cleands.Prediction.ensemble module

Ensemble models for prediction tasks.

This module implements bootstrap aggregating (bagging) methods for regression and classification, extending base learners such as least squares, logistic regression, and recursive partitioning (decision trees). It also defines convenience wrappers for random forest–style ensembles.

Classes:

bootstrap_model:: Base container for storing bootstrap resamples of a supervised model.
bagging_least_squares_regressor:: Bagging ensemble of least squares regressors. Produces averaged parameter estimates and predictions with bootstrap variance estimation.
bagging_logistic_regressor:: Bagging ensemble of logistic regressors. Averages coefficients across bootstrap samples to improve stability and predictive performance.
bagging_recursive_partitioning_regressor:: Bagging ensemble of recursive partitioning regressors (decision trees). Supports optional feature subsampling to mimic random forests. Selects the bootstrap tree closest to the full-sample fit for final structure, while predictions are aggregated across bootstraps.

Functions:

random_forest_regressor:: Partial constructor of bagging_recursive_partitioning_regressor with feature subsampling enabled (random_x=True), equivalent to a regression random forest.
BaggingLogisticRegressor:: Partial wrapper that constructs a PredictionModel around bagging_logistic_regressor for formula-notation usage.
BaggingRecursivePartitioningRegressor:: Partial wrapper for bagging_recursive_partitioning_regressor.
RandomForestRegressor:: Partial wrapper for random_forest_regressor.

Notes

All ensembles rely on the bootstrap utility from utils.py, which generates bootstrap resamples of the training data.
These models are intended as drop-in replacements for their base learners, with aggregation added to improve stability.

class cleands.Prediction.ensemble.bootstrap_model(x, y, model_type, seed=None, bootstraps=1000, *args, **kwargs)[source]

Bases: ABC

Generic bootstrap wrapper for supervised models.

Creates and stores a collection of bootstrap-fitted models for a given model_type, providing a common structure you can extend.

Variables:

model_type (Type[supervised_model]) – The underlying model class to fit.
model (supervised_model) – The model fit on the original (non-bootstrapped) data.
seed (Optional[int]) – Seed for reproducibility of bootstrap sampling.
n_boot (int) – Number of bootstrap resamples.
bootstraps (list[supervised_model]) – List of models fit on bootstrap samples.

Parameters:

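x (ndarray) – Training feature matrix.
y (ndarray) – Training response vector.
model_type (Type[supervised_model]) – The underlying model class to fit.
seed (int | None) – Seed for reproducibility of bootstrap sampling.
bootstraps (int) – Number of bootstrap resamples to draw.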
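
The container pattern described for bootstrap_model can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the library's implementation: BootstrapFits and MeanModel are hypothetical names, the resampling is done inline with NumPy rather than with the utility from utils.py, and model_type is assumed to be constructible as model_type(x, y).

```python
import numpy as np


class MeanModel:
    """Toy stand-in for a supervised model: predicts the training mean of y."""

    def __init__(self, x, y):
        self.mean_ = float(np.mean(y))

    def predict(self, newx):
        return np.full(len(newx), self.mean_)


class BootstrapFits:
    """Hypothetical container: one full-sample fit plus fits on resamples."""

    def __init__(self, x, y, model_type, seed=None, bootstraps=1000):
        rng = np.random.default_rng(seed)
        n = x.shape[0]
        self.model_type = model_type
        self.seed = seed
        self.n_boot = bootstraps
        self.model = model_type(x, y)  # fit on the original data
        self.bootstraps = []
        for _ in range(bootstraps):
            # Draw n row indices with replacement; x and y stay paired.
            idx = rng.integers(0, n, size=n)
            self.bootstraps.append(model_type(x[idx], y[idx]))


x = np.arange(6, dtype=float).reshape(6, 1)
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
fits = BootstrapFits(x, y, MeanModel, seed=1, bootstraps=50)
```

In this sketch the attribute names mirror the Variables list above (model_type, model, seed, n_boot, bootstraps); a subclass would then aggregate over fits.bootstraps in whatever way suits the base learner.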

class cleands.Prediction.ensemble.bagging_least_squares_regressor(x, y, seed=None, bootstraps=1000)[source]

Bases: least_squares_regressor

Bagged OLS regressor.

Trains many OLS models on bootstrap resamples and aggregates:

- params is set to the mean of bootstrap coefficient vectors.
- Predictions are the mean of bootstrap predictions.
- vcov_params is the bootstrap covariance of coefficients.

Parameters:: y (ndarray)
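
The aggregation described for this class can be sketched with plain NumPy. This is an illustration under stated assumptions, not the library's code: the data are simulated, np.linalg.lstsq stands in for least_squares_regressor, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Simulated design matrix: an intercept column plus one feature.
x = np.column_stack([np.ones(n), rng.normal(size=n)])
y = x @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)

n_boot = 500
coefs = np.empty((n_boot, x.shape[1]))
for b in range(n_boot):
    idx = rng.integers(0, n, size=n)  # resample rows with replacement
    coefs[b] = np.linalg.lstsq(x[idx], y[idx], rcond=None)[0]

params = coefs.mean(axis=0)                # mean of bootstrap coefficients
vcov_params = np.cov(coefs, rowvar=False)  # (p x p) bootstrap covariance
pred_mean = (x @ coefs.T).mean(axis=1)     # mean of bootstrap predictions
```

Because OLS predictions are linear in the coefficients, averaging the bootstrap predictions gives the same result as predicting with the averaged coefficients, up to floating-point error.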

predict(newx)[source]

Predict by averaging bootstrap model predictions.

Parameters:: newx (np.ndarray) – Feature matrix for prediction.
Returns:: Mean prediction across bootstraps.
Return type:: np.ndarray

property vcov_params

Bootstrap covariance of parameter estimates.

Returns:: (p x p) covariance matrix from bootstrap params.
Return type:: np.ndarray

class cleands.Prediction.ensemble.bagging_logistic_regressor(x, y, seed=None, bootstraps=1000)[source]

Bases: logistic_regressor

Bagged logistic regressor.

Trains many logistic models on bootstrap resamples and aggregates:

- params is the mean of bootstrap coefficient vectors.
- Predictions are the mean of bootstrap probability predictions.
- vcov_params is the bootstrap covariance of coefficients.

Parameters:: y (ndarray)
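
Averaging probabilities and averaging coefficients are different operations for a logistic model. The sketch below illustrates this with hand-picked stand-in coefficient vectors; no model is actually fitted, and the names coefs, newx, and sigmoid are hypothetical rather than part of the library.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Stand-in bootstrap coefficient vectors, one row per resample.
# In practice each row would come from a logistic fit on one resample.
coefs = np.array([[0.0, 1.0], [0.5, 2.0], [-0.5, 3.0]])
newx = np.array([[1.0, 0.0], [1.0, 1.0]])  # intercept column + one feature

# Mean of the per-model probabilities (the prediction rule described above).
prob_mean = sigmoid(newx @ coefs.T).mean(axis=1)
# Probability evaluated at the averaged coefficients, for comparison.
prob_at_mean_coef = sigmoid(newx @ coefs.mean(axis=0))
```

For the second row of newx the two quantities differ, since the logistic link is nonlinear. Predictions from the averaged params therefore need not match the averaged probability predictions.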

predict(target)[source]

Predict by averaging bootstrap probability predictions.

Parameters:: target (np.ndarray) – Feature matrix for prediction.
Returns:: Mean predicted probabilities across bootstraps.
Return type:: np.ndarray

property vcov_params

Bootstrap covariance of parameter estimates.

Returns:: (p x p) covariance matrix from bootstrap params.
Return type:: np.ndarray

class cleands.Prediction.ensemble.bagging_recursive_partitioning_regressor(x, y, seed=None, bootstraps=1000, sign_level=0.95, max_level=None, random_x=False)[source]

Bases: recursive_partitioning_regressor

Bagged regression trees (recursive partitioning).

Fits multiple trees on bootstrap samples, then selects the single tree whose fitted values are closest, in mean squared error (MSE), to the bagged fit. The selected tree’s learned structure is copied onto self for efficient prediction.

Notes

If random_x=True (as set through random_forest_regressor), each tree considers only a random subset of features at each split, as in random forests.
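
The tree-selection step described for this class can be sketched as an argmin over per-tree MSE. This is a minimal illustration, assuming the bagged fit is the plain average of the trees' fitted values; the fitted matrix below is made-up stand-in data and the variable names are hypothetical.

```python
import numpy as np

# Stand-in fitted values: one row per bootstrap tree,
# one column per training observation.
fitted = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [1.5, 2.5, 2.5, 3.5],
    [0.0, 2.0, 4.0, 6.0],
])

bagged_fit = fitted.mean(axis=0)                           # ensemble average
mse_to_bagged = ((fitted - bagged_fit) ** 2).mean(axis=1)  # one MSE per tree
best = int(np.argmin(mse_to_bagged))                       # index of closest tree
```

Here the first tree (index 0) lies closest to the average, so its structure would be the one retained.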

predict(newx, fitted=False)[source]

Predict by averaging over bootstrap trees. The fitted flag controls whether newx is treated as already column-aligned (see Parameters).

Parameters:

newx (np.ndarray) – Feature matrix for prediction.
fitted (bool) – If True, newx is assumed already aligned to the selected tree’s internal column order; otherwise, alignment is handled.

Returns:

Mean prediction across bootstrap trees.

Return type:

np.ndarray

class cleands.Prediction.ensemble.random_forest_regressor(x, y, seed=None, bootstraps=1000, sign_level=0.95, max_level=None, random_x=False)[source]

Bases: bagging_recursive_partitioning_regressor

Random Forest regressor using recursive partitioning trees.

This class implements a random forest by combining bootstrap resampling with feature subsampling across multiple recursive partitioning trees. At each split, only a random subset of predictors is considered, which reduces correlation between trees.

Parameters:

x (np.ndarray) – Training feature matrix of shape (n_obs, n_feat).
y (np.ndarray) – Training response vector of shape (n_obs,).
seed (int, optional) – Random seed for reproducibility.
bootstraps (int, optional) – Number of bootstrap resamples. Defaults to 1000.
sign_level (float, optional) – Significance level for splitting. Defaults to 0.95.
max_level (int, optional) – Maximum tree depth. If None, grows until no valid splits remain.
random_x (bool, optional) – Whether each split considers only a random subset of features. Listed as False in the signature above.

Inherits:: bagging_recursive_partitioning_regressor: Provides the ensemble logic and base tree structure.
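
Per-split feature subsampling can be sketched as drawing a fresh subset of column indices every time a split is evaluated. This is an illustration only: candidate_features is a hypothetical helper, and the subset size n_feat // 3 is a common heuristic for regression forests, assumed here because the rule this library uses is not stated.

```python
import numpy as np


def candidate_features(n_feat, n_candidates, rng):
    """Return sorted column indices to consider at one split (no repeats)."""
    return np.sort(rng.choice(n_feat, size=n_candidates, replace=False))


rng = np.random.default_rng(0)
n_feat = 9
n_candidates = max(1, n_feat // 3)  # assumption: heuristic, not the library's rule

# Each split draws its own subset, so different splits see different features.
subsets = [candidate_features(n_feat, n_candidates, rng) for _ in range(4)]
```

Restricting each split to a subset keeps a few strong predictors from dominating every tree, which is what lowers the correlation between trees.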

class cleands.Prediction.ensemble.BaggingLogisticRegressor(formula, data, *args, **kwargs)[source]

Bases: PredictionModel

Convenience wrapper for bagging logistic regression.

This class applies the unified PredictionModel interface to the bagging_logistic_regressor, enabling construction from formulas and DataFrames.

Variables:

MODEL_TYPE (ClassVar[Type[cleands.base.supervised_model]]) – Underlying model type, fixed to bagging_logistic_regressor.

Parameters:

formula (str)
data (DataFrame)

Example

>>> model = BaggingLogisticRegressor.from_formula("y ~ x1 + x2", data=df)
>>> model.predict(df[["x1", "x2"]])

class cleands.Prediction.ensemble.BaggingRecursivePartitioningRegressor(formula, data, *args, **kwargs)[source]

Bases: PredictionModel

Convenience wrapper for bagging recursive partitioning regression.

Provides a formula/DataFrame interface for the bagging_recursive_partitioning_regressor.

Variables:

MODEL_TYPE (ClassVar[Type[cleands.base.supervised_model]]) – Underlying model type, fixed to bagging_recursive_partitioning_regressor.

Parameters:

formula (str)
data (DataFrame)

Example

>>> model = BaggingRecursivePartitioningRegressor.from_formula("y ~ x1 + x2", data=df)
>>> model.predict(df[["x1", "x2"]])

class cleands.Prediction.ensemble.RandomForestRegressor(formula, data, *args, **kwargs)[source]

Bases: PredictionModel

Convenience wrapper for random forest regression.

Provides a formula/DataFrame interface for the random_forest_regressor, which implements an ensemble of recursive partitioning trees with random feature sub-sampling.

Variables:

MODEL_TYPE (ClassVar[Type[cleands.base.supervised_model]]) – Underlying model type, fixed to random_forest_regressor.

Parameters:

formula (str)
data (DataFrame)

Example

>>> model = RandomForestRegressor.from_formula("y ~ x1 + x2 + x3", data=df)
>>> model.predict(df[["x1", "x2", "x3"]])