cleands.Classification.ensemble module

Ensemble classification models.

This module provides ensemble classifiers for improved predictive accuracy and stability, using bootstrap aggregation (bagging) and randomization strategies.

Classes:

bagging_logistic_classifier – Ensemble of logistic classifiers fit on bootstrap samples.
bagging_recursive_partitioning_classifier – Ensemble of recursive partitioning classifiers (decision trees).

Factory Aliases:

random_forest_classifier – Partially applied constructor for a randomized recursive partitioning (random-forest-style) classifier.
BaggingLogisticClassifier – Wrapper for constructing a bagged logistic classifier via ClassificationModel.
RandomForestClassifier – Wrapper for constructing a random forest classifier via ClassificationModel.
BaggingRecursivePartitioningClassifier – Wrapper for constructing a bagged tree classifier via ClassificationModel.

class cleands.Classification.ensemble.bagging_logistic_classifier(x, y, probability=0.5, seed=None, bootstraps=1000)[source]

Bases: logistic_classifier

Bagged ensemble of logistic classifiers.

Trains multiple logistic classifiers on bootstrap resamples of the data, then aggregates their predictions for improved robustness and variance reduction.

Variables:

seed (int, optional) – Random seed for reproducibility.
n_boot (int) – Number of bootstrap samples.
bootstraps (list[logistic_classifier]) – List of fitted base classifiers.
bootstrap_params (np.ndarray) – Matrix of parameter estimates across bootstraps.
params (np.ndarray) – Averaged parameter estimates across bootstraps.
model (abstract_logistic_regressor) – Aggregated logistic regression model.

Parameters:

probability (float)
seed (int)
bootstraps (int)

predict_proba(newx)[source]

Predict class probabilities by majority vote from bootstrapped models.

Parameters:

newx (np.ndarray or pd.DataFrame) – Feature matrix of shape (n_samples, n_features).

Returns:

Predicted probabilities of shape (n_samples, 2). Column 0 = probability of class 0; column 1 = probability of class 1.

Return type:

np.ndarray
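
The bagging idea described for this class can be illustrated with a minimal, library-independent sketch. It does not use cleands and is not the package's implementation: the helper names (fit_logistic, bagged_logistic, predict_proba), the gradient-ascent fit, and the choice to aggregate by averaging coefficients are all assumptions made for illustration. It resamples rows with replacement, fits one logistic model per resample, averages the coefficient vectors, and returns two probability columns per row.

```python
import math
import random


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(x, y, steps=200, lr=0.5):
    """Fit intercept and slopes by plain gradient ascent on the log-likelihood."""
    n, k = len(x), len(x[0])
    beta = [0.0] * (k + 1)  # beta[0] is the intercept
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for row, label in zip(x, y):
            p = sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], row)))
            err = label - p
            grad[0] += err
            for j, v in enumerate(row):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def bagged_logistic(x, y, bootstraps=25, seed=None):
    """Return (averaged parameters, list of per-bootstrap parameters)."""
    rng = random.Random(seed)
    n = len(x)
    all_params = []
    for _ in range(bootstraps):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        all_params.append(fit_logistic([x[i] for i in idx], [y[i] for i in idx]))
    k = len(all_params[0])
    mean_params = [sum(p[j] for p in all_params) / bootstraps for j in range(k)]
    return mean_params, all_params


def predict_proba(params, newx):
    """Two columns per row: [P(class 0), P(class 1)]."""
    out = []
    for row in newx:
        p1 = sigmoid(params[0] + sum(b * v for b, v in zip(params[1:], row)))
        out.append([1.0 - p1, p1])
    return out


# Toy data: class 1 becomes more likely as the single feature grows.
x = [[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 1, 0, 1, 1, 1]
params, boot_params = bagged_logistic(x, y, bootstraps=25, seed=0)
proba = predict_proba(params, [[-2.0], [2.0]])
```

Fixing the seed makes the resampling, and therefore the averaged parameters, reproducible. The sketch uses far fewer resamples than the documented default of 1000 only to keep it quick to run.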

class cleands.Classification.ensemble.bagging_recursive_partitioning_classifier(x, y, seed=None, bootstraps=1000, sign_level=0.95, max_level=2, random_x=False, weights=None)[source]

Bases: recursive_partitioning_classifier

Bagged ensemble of recursive partitioning classifiers (decision trees).

Builds multiple trees on bootstrap resamples and selects the best-performing model (closest to ensemble fit) for stability. Can also be randomized to emulate random forest behavior.

Variables:

seed (int, optional) – Random seed for reproducibility.
n_boot (int) – Number of bootstrap samples.
bootstraps (list[recursive_partitioning_classifier]) – List of fitted base trees.

Parameters:

seed (int)
bootstraps (int)
sign_level (float)
max_level (int)
random_x (bool)
weights (ndarray)

predict_proba(target, fitted=False)[source]

Predict class probabilities by aggregating bootstrap trees.

Parameters:

target (np.ndarray or pd.DataFrame) – Feature matrix of shape (n_samples, n_features).
fitted (bool, optional) – If True, use fitted model attributes. Defaults to False.

Returns:

Predicted class probabilities of shape (n_samples, n_classes).

Return type:

np.ndarray
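
The class description mentions two steps: aggregating trees fit on bootstrap resamples, and picking the single tree closest to the ensemble fit. The following sketch shows one plausible reading of both steps. It is not the cleands implementation: it uses one-feature decision stumps with a squared-error split criterion in place of recursive partitioning with a significance test, handles the binary case only, and every function name in it is hypothetical.

```python
import random


def fit_stump(x, y):
    """Best single-threshold split on one feature, scored by squared error."""
    best = None
    for threshold in sorted(set(x)):
        left = [lab for v, lab in zip(x, y) if v <= threshold]
        right = [lab for v, lab in zip(x, y) if v > threshold]
        p_left = sum(left) / len(left)  # never empty: threshold is an observed value
        p_right = sum(right) / len(right) if right else p_left
        sse = (sum((lab - p_left) ** 2 for lab in left)
               + sum((lab - p_right) ** 2 for lab in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, p_left, p_right)
    return best[1:]  # (threshold, P(class 1) on the left, P(class 1) on the right)


def stump_proba(stump, x):
    """P(class 1) for each value in x under one stump."""
    threshold, p_left, p_right = stump
    return [p_left if v <= threshold else p_right for v in x]


def bagged_stumps(x, y, bootstraps=50, seed=None):
    """Fit one stump per bootstrap resample."""
    rng = random.Random(seed)
    n = len(x)
    stumps = []
    for _ in range(bootstraps):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        stumps.append(fit_stump([x[i] for i in idx], [y[i] for i in idx]))
    return stumps


def ensemble_proba(stumps, x):
    """Average P(class 1) across all stumps."""
    per_tree = [stump_proba(s, x) for s in stumps]
    return [sum(col) / len(stumps) for col in zip(*per_tree)]


def closest_to_ensemble(stumps, x):
    """Pick the stump whose predictions are nearest the ensemble average."""
    target = ensemble_proba(stumps, x)

    def distance(stump):
        return sum((a - b) ** 2 for a, b in zip(stump_proba(stump, x), target))

    return min(stumps, key=distance)


x = [0.1, 0.4, 0.9, 1.3, 1.8, 2.2, 2.7, 3.1]
y = [0, 0, 0, 1, 0, 1, 1, 1]
stumps = bagged_stumps(x, y, bootstraps=50, seed=1)
avg = ensemble_proba(stumps, x)
chosen = closest_to_ensemble(stumps, x)
```

Here avg holds only the class-1 probability per row; a two-column result in the documented (n_samples, n_classes) layout would pair each value p with 1 - p.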

class cleands.Classification.ensemble.random_forest_classifier(x, y, seed=None, bootstraps=1000, sign_level=0.95, max_level=2, random_x=False, weights=None)[source]

Bases: bagging_recursive_partitioning_classifier

Random Forest classifier using recursive partitioning trees.

This class implements a random forest by combining multiple recursive partitioning classifiers with bootstrapping and random feature sub-sampling. At each split, only a random subset of predictors is considered, reducing correlation between trees.

Parameters:

x (np.ndarray) – Training feature matrix of shape (n_obs, n_feat).
y (np.ndarray) – Training class labels of shape (n_obs,).
seed (int, optional) – Random seed for reproducibility.
bootstraps (int, optional) – Number of bootstrap resamples. Defaults to 1000.
sign_level (float, optional) – Significance level for splitting. Defaults to 0.95.
max_level (int, optional) – Maximum tree depth. Defaults to 2.
weights (np.ndarray, optional) – Optional observation weights.
random_x (bool)

Inherits: bagging_recursive_partitioning_classifier – Provides the ensemble logic and base classification tree structure.
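
Random feature sub-sampling at a split can be sketched as follows. This is an illustration under stated assumptions, not the cleands code: the square-root rule for the number of candidate predictors is a common heuristic that the source does not confirm, Gini impurity stands in for the significance-based splitting controlled by sign_level, and the function names are hypothetical.

```python
import math
import random


def candidate_features(n_features, rng, n_candidates=None):
    """Draw the feature indices that one split is allowed to consider."""
    if n_candidates is None:
        # Assumed heuristic: about sqrt(n_features) candidates per split.
        n_candidates = max(1, int(math.sqrt(n_features)))
    return sorted(rng.sample(range(n_features), n_candidates))


def gini(group):
    """Gini impurity of a list of 0/1 labels."""
    if not group:
        return 0.0
    p = sum(group) / len(group)
    return 2.0 * p * (1.0 - p)


def best_split(rows, labels, features):
    """Search only the given features for the lowest weighted impurity."""
    best = None
    for j in features:
        for threshold in sorted({row[j] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[j] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[j] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, j, threshold)
    return best  # (weighted impurity, feature index, threshold)


rng = random.Random(42)
rows = [
    [0.2, 5.0, 1.0, 7.0],
    [0.8, 4.0, 1.0, 3.0],
    [1.5, 6.0, 0.0, 8.0],
    [2.1, 5.5, 0.0, 2.0],
]
labels = [0, 0, 1, 1]
features = candidate_features(n_features=4, rng=rng)  # 2 of the 4 columns
score, feature, threshold = best_split(rows, labels, features)
```

Because each split sees a different random subset, two trees grown on similar resamples are less likely to choose the same dominant predictor, which is the decorrelation effect the description refers to.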

class cleands.Classification.ensemble.BaggingLogisticClassifier(formula, data, *args, **kwargs)[source]

Bases: ClassificationModel

Convenience wrapper for bagging logistic classification.

Applies the unified ClassificationModel interface to the bagging_logistic_classifier, enabling construction from formulas and DataFrames.

Variables:

MODEL_TYPE (ClassVar[Type[cleands.base.supervised_model]]) – Underlying model type, fixed to bagging_logistic_classifier.

Parameters:

formula (str)
data (DataFrame)

Example

>>> model = BaggingLogisticClassifier.from_formula("y ~ x1 + x2", data=df)
>>> model.classify(df[["x1", "x2"]])

class cleands.Classification.ensemble.BaggingRecursivePartitioningClassifier(formula, data, *args, **kwargs)[source]

Bases: ClassificationModel

Convenience wrapper for bagging recursive partitioning classification.

Provides a formula/DataFrame interface for the bagging_recursive_partitioning_classifier.

Variables:

MODEL_TYPE (ClassVar[Type[cleands.base.supervised_model]]) – Underlying model type, fixed to bagging_recursive_partitioning_classifier.

Parameters:

formula (str)
data (DataFrame)

Example

>>> model = BaggingRecursivePartitioningClassifier.from_formula("y ~ x1 + x2", data=df, max_level=3)
>>> model.classify(df[["x1", "x2"]])

class cleands.Classification.ensemble.RandomForestClassifier(formula, data, *args, **kwargs)[source]

Bases: ClassificationModel

Convenience wrapper for random forest classification.

Provides a formula/DataFrame interface for the random_forest_classifier, which implements an ensemble of recursive partitioning classifiers with random feature sub-sampling.

Variables:

MODEL_TYPE (ClassVar[Type[cleands.base.supervised_model]]) – Underlying model type, fixed to random_forest_classifier.

Parameters:

formula (str)
data (DataFrame)

Example

>>> model = RandomForestClassifier.from_formula("y ~ x1 + x2 + x3", data=df)
>>> model.classify(df[["x1", "x2", "x3"]])