cleands.Classification.ensemble module

Ensemble classification models.

This module provides ensemble classifiers for improved predictive accuracy and stability, using bootstrap aggregation (bagging) and randomization strategies.

Classes:
  • bagging_logistic_classifier – Ensemble of logistic classifiers fit on bootstrap samples.

  • bagging_recursive_partitioning_classifier – Ensemble of recursive partitioning classifiers (decision trees).

Factory Aliases:
  • random_forest_classifier – Partially-applied constructor for a randomized recursive partitioning (random forest-style) classifier.

  • BaggingLogisticClassifier – Wrapper for constructing a bagged logistic classifier via ClassificationModel.

  • RandomForestClassifier – Wrapper for constructing a random forest classifier via ClassificationModel.

  • BaggingRecursivePartitioningClassifier – Wrapper for constructing a bagged tree classifier via ClassificationModel.
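
The "partially-applied constructor" idea can be illustrated with functools.partial. The sketch below uses a hypothetical stand-in function, build_tree_ensemble, rather than the real cleands constructor. It also assumes the alias presets random_x=True, which this page does not confirm (the documented signature shows random_x=False).

```python
from functools import partial

def build_tree_ensemble(x, y, seed=None, bootstraps=1000, random_x=False):
    """Hypothetical stand-in for a bagged-tree constructor (not the cleands API)."""
    return {"n_obs": len(x), "seed": seed,
            "bootstraps": bootstraps, "random_x": random_x}

# Assumed behaviour: the alias fixes random_x=True and leaves everything else adjustable.
build_random_forest = partial(build_tree_ensemble, random_x=True)

forest = build_random_forest([[0.1], [0.9]], [0, 1], seed=1, bootstraps=50)
```

A partial keeps the original signature available, so callers can still override the preset keyword if they need plain bagging.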

class cleands.Classification.ensemble.bagging_logistic_classifier(x, y, probability=0.5, seed=None, bootstraps=1000)[source]

Bases: logistic_classifier

Bagged ensemble of logistic classifiers.

Trains multiple logistic classifiers on bootstrap resamples of the data, then aggregates their predictions for improved robustness and variance reduction.

Variables:
  • seed (int, optional) – Random seed for reproducibility.

  • n_boot (int) – Number of bootstrap samples.

  • bootstraps (list[logistic_classifier]) – List of fitted base classifiers.

  • bootstrap_params (np.ndarray) – Matrix of parameter estimates across bootstraps.

  • params (np.ndarray) – Averaged parameter estimates across bootstraps.

  • model (abstract_logistic_regressor) – Aggregated logistic regression model.

Parameters:
  • probability (float)

  • seed (int)

  • bootstraps (int)
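
The relationship between bootstrap_params and params described above can be sketched with NumPy alone. This is a minimal illustration, not the cleands implementation: fit_logistic and bagged_logistic_params are hypothetical helpers, and the Newton-Raphson fit is an assumed stand-in for whatever estimator the library uses.

```python
import numpy as np

def fit_logistic(x, y, steps=25, ridge=1e-6):
    """Newton-Raphson logistic fit; x must already contain an intercept column."""
    beta = np.zeros(x.shape[1])
    for _ in range(steps):
        eta = np.clip(x @ beta, -30.0, 30.0)      # guard against overflow in exp
        p = 1.0 / (1.0 + np.exp(-eta))
        w = p * (1.0 - p)
        # Small ridge term keeps the Hessian invertible (an assumption of this sketch).
        hessian = x.T @ (x * w[:, None]) + ridge * np.eye(x.shape[1])
        beta = beta + np.linalg.solve(hessian, x.T @ (y - p))
    return beta

def bagged_logistic_params(x, y, n_boot=200, seed=None):
    """Fit one model per bootstrap resample; return all estimates and their mean."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    boot = np.empty((n_boot, x.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)          # sample rows with replacement
        boot[b] = fit_logistic(x[idx], y[idx])
    return boot, boot.mean(axis=0)

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
x = np.column_stack([np.ones(300), rng.normal(size=300)])
y = (rng.random(300) < 1.0 / (1.0 + np.exp(-(0.5 + 1.5 * x[:, 1])))).astype(float)

boot_params, avg_params = bagged_logistic_params(x, y, n_boot=50, seed=1)
```

Here boot_params plays the role of bootstrap_params (one row per resample) and avg_params the role of params.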

predict_proba(newx)[source]

Predict class probabilities by aggregating votes across the bootstrapped models.

Parameters:

newx (np.ndarray or pd.DataFrame) – Feature matrix of shape (n_samples, n_features).

Returns:

Predicted probabilities of shape (n_samples, 2).

Column 0 = probability of class 0, Column 1 = probability of class 1.

Return type:

np.ndarray
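
One way to turn votes into a two-column probability matrix is to report the share of bootstrap models that vote for each class. The sketch below assumes this interpretation, and assumes the probability argument acts as the per-model decision threshold; neither is confirmed by this page, and vote_proba is a hypothetical helper.

```python
import numpy as np

def vote_proba(boot_params, newx, threshold=0.5):
    """Share of bootstrap models voting for each class; shape (n_samples, 2)."""
    scores = 1.0 / (1.0 + np.exp(-(newx @ boot_params.T)))  # (n_samples, n_boot)
    votes = scores > threshold                               # each model casts one vote
    p1 = votes.mean(axis=1)                                  # fraction voting for class 1
    return np.column_stack([1.0 - p1, p1])

# Three hypothetical fitted models (intercept, slope) and two new observations.
boot_params = np.array([[0.0, 1.0], [0.2, 0.8], [-3.0, 1.0]])
newx = np.array([[1.0, 2.0], [1.0, -2.0]])   # intercept column plus one feature
proba = vote_proba(boot_params, newx)
# First row: two of three models vote for class 1; second row: none do.
```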

class cleands.Classification.ensemble.bagging_recursive_partitioning_classifier(x, y, seed=None, bootstraps=1000, sign_level=0.95, max_level=2, random_x=False, weights=None)[source]

Bases: recursive_partitioning_classifier

Bagged ensemble of recursive partitioning classifiers (decision trees).

Builds multiple trees on bootstrap resamples and, for stability, selects the tree whose fit is closest to the ensemble fit. The trees can also be randomized to emulate random forest behavior.

Variables:
  • seed (int, optional) – Random seed for reproducibility.

  • n_boot (int) – Number of bootstrap samples.

  • bootstraps (list[recursive_partitioning_classifier]) – List of fitted base trees.

Parameters:
  • seed (int)

  • bootstraps (int)

  • sign_level (float)

  • max_level (int)

  • random_x (bool)

  • weights (ndarray)
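
Selecting the model "closest to ensemble fit" can be sketched as follows. The distance measure (sum of squared differences from the mean fitted probability) and the helper name closest_to_ensemble are assumptions made for illustration; the criterion used by cleands may differ.

```python
import numpy as np

def closest_to_ensemble(fits):
    """fits: (n_models, n_obs) fitted probabilities.

    Returns the index of the model nearest the ensemble mean, and that mean.
    """
    ensemble = fits.mean(axis=0)                       # average fit across models
    distance = ((fits - ensemble) ** 2).sum(axis=1)    # squared distance per model
    return int(np.argmin(distance)), ensemble

# Fitted probabilities from three hypothetical trees on three observations.
fits = np.array([[0.9, 0.1, 0.8],
                 [0.6, 0.4, 0.5],
                 [0.3, 0.7, 0.2]])
best, ensemble = closest_to_ensemble(fits)
```

In this toy input the second tree coincides with the ensemble mean, so it is the one selected.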

predict_proba(target, fitted=False)[source]

Predict class probabilities by aggregating bootstrap trees.

Parameters:
  • target (np.ndarray or pd.DataFrame) – Feature matrix of shape (n_samples, n_features).

  • fitted (bool, optional) – If True, use fitted model attributes. Defaults to False.

Returns:

Predicted class probabilities of shape (n_samples, n_classes).

Return type:

np.ndarray

class cleands.Classification.ensemble.random_forest_classifier(x, y, seed=None, bootstraps=1000, sign_level=0.95, max_level=2, random_x=False, weights=None)[source]

Bases: bagging_recursive_partitioning_classifier

Random Forest classifier using recursive partitioning trees.

This class implements a random forest by combining multiple recursive partitioning classifiers with bootstrapping and random feature sub-sampling. At each split, only a random subset of predictors is considered, reducing correlation between trees.

Parameters:
  • x (np.ndarray) – Training feature matrix of shape (n_obs, n_feat).

  • y (np.ndarray) – Training class labels of shape (n_obs,).

  • seed (int, optional) – Random seed for reproducibility.

  • bootstraps (int, optional) – Number of bootstrap resamples. Defaults to 1000.

  • sign_level (float, optional) – Significance level for splitting. Defaults to 0.95.

  • max_level (int, optional) – Maximum tree depth. Defaults to 2.

  • weights (np.ndarray, optional) – Optional observation weights.

  • random_x (bool)

Inherits:

bagging_recursive_partitioning_classifier: Provides the ensemble logic and base classification tree structure.
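
The per-split feature sub-sampling described above can be sketched like this. The square-root rule for the number of candidates is a common random forest convention and only an assumption here; candidate_features is a hypothetical helper, not part of cleands.

```python
import numpy as np

def candidate_features(n_features, rng, n_candidates=None):
    """Draw the predictor indices that one split is allowed to consider."""
    if n_candidates is None:
        # Assumed default: roughly sqrt(p) candidates, at least one.
        n_candidates = max(1, int(np.sqrt(n_features)))
    return np.sort(rng.choice(n_features, size=n_candidates, replace=False))

rng = np.random.default_rng(42)
subset = candidate_features(9, rng)   # three distinct indices out of 0..8
```

Because each split sees a different random subset, strong predictors do not dominate every tree, which is what reduces correlation between trees.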

class cleands.Classification.ensemble.BaggingLogisticClassifier(formula, data, *args, **kwargs)[source]

Bases: ClassificationModel

Convenience wrapper for bagging logistic classification.

Applies the unified ClassificationModel interface to the bagging_logistic_classifier, enabling construction from formulas and DataFrames.

Variables:

MODEL_TYPE (ClassVar[Type[cleands.base.supervised_model]]) – Underlying model type, fixed to bagging_logistic_classifier.

Parameters:
  • formula (str)

  • data (DataFrame)
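
To show what a formula such as "y ~ x1 + x2" conveys, the sketch below splits it into a response name and predictor names. It is a deliberately simplified, hypothetical parser (split_formula), handling only additive terms; it is not the parser that ClassificationModel uses.

```python
def split_formula(formula):
    """Split 'y ~ x1 + x2' into ('y', ['x1', 'x2']); additive terms only."""
    response, _, rhs = formula.partition("~")
    predictors = [term.strip() for term in rhs.split("+") if term.strip()]
    return response.strip(), predictors

response, predictors = split_formula("y ~ x1 + x2")
```

The response name would pick the label column from the DataFrame and the predictor names the feature columns.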

Example

>>> model = BaggingLogisticClassifier.from_formula("y ~ x1 + x2", data=df)
>>> model.classify(df[["x1", "x2"]])

class cleands.Classification.ensemble.BaggingRecursivePartitioningClassifier(formula, data, *args, **kwargs)[source]

Bases: ClassificationModel

Convenience wrapper for bagging recursive partitioning classification.

Provides a formula/DataFrame interface for the bagging_recursive_partitioning_classifier.

Variables:

MODEL_TYPE (ClassVar[Type[cleands.base.supervised_model]]) – Underlying model type, fixed to bagging_recursive_partitioning_classifier.

Parameters:
  • formula (str)

  • data (DataFrame)

Example

>>> model = BaggingRecursivePartitioningClassifier.from_formula("y ~ x1 + x2", data=df, max_level=3)
>>> model.classify(df[["x1", "x2"]])

class cleands.Classification.ensemble.RandomForestClassifier(formula, data, *args, **kwargs)[source]

Bases: ClassificationModel

Convenience wrapper for random forest classification.

Provides a formula/DataFrame interface for the random_forest_classifier, which implements an ensemble of recursive partitioning classifiers with random feature sub-sampling.

Variables:

MODEL_TYPE (ClassVar[Type[cleands.base.supervised_model]]) – Underlying model type, fixed to random_forest_classifier.

Parameters:
  • formula (str)

  • data (DataFrame)

Example

>>> model = RandomForestClassifier.from_formula("y ~ x1 + x2 + x3", data=df)
>>> model.classify(df[["x1", "x2", "x3"]])