cleands.Classification.ensemble module
Ensemble classification models.
This module provides ensemble classifiers for improved predictive accuracy and stability, using bootstrap aggregation (bagging) and randomization strategies.
- Classes:
  - bagging_logistic_classifier: Ensemble of logistic classifiers fit on bootstrap samples.
  - bagging_recursive_partitioning_classifier: Ensemble of recursive partitioning classifiers (decision trees).
- Factory Aliases:
  - random_forest_classifier: Partially applied constructor for a randomized recursive partitioning (random forest-style) classifier.
  - BaggingLogisticClassifier: Wrapper for constructing a bagged logistic classifier via ClassificationModel.
  - RandomForestClassifier: Wrapper for constructing a random forest classifier via ClassificationModel.
  - BaggingRecursivePartitioningClassifier: Wrapper for constructing a bagged tree classifier via ClassificationModel.
- class cleands.Classification.ensemble.bagging_logistic_classifier(x, y, probability=0.5, seed=None, bootstraps=1000)
Bases: logistic_classifier
Bagged ensemble of logistic classifiers.
Trains multiple logistic classifiers on bootstrap resamples of the data, then aggregates their predictions for improved robustness and variance reduction.
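A minimal NumPy sketch of the bagging step, assuming a plain logistic regression fit by Newton-Raphson (IRLS) on each bootstrap resample. The helpers `fit_logistic` and `bagged_logistic` are hypothetical illustrations rather than the cleands implementation, and the design matrix is assumed to carry its own intercept column.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit logistic-regression coefficients by Newton-Raphson (IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # current fitted probabilities
        w = p * (1.0 - p)                     # IRLS weights
        # Tiny ridge term keeps the Hessian invertible on awkward resamples
        hessian = X.T @ (X * w[:, None]) + 1e-8 * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hessian, X.T @ (y - p))
    return beta

def bagged_logistic(X, y, bootstraps=200, seed=None):
    """Refit on `bootstraps` resamples drawn with replacement; one row per fit."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    fits = []
    for _ in range(bootstraps):
        idx = rng.integers(0, n, size=n)      # bootstrap row indices
        fits.append(fit_logistic(X[idx], y[idx]))
    return np.vstack(fits)                    # shape (bootstraps, n_features)

# Synthetic data: intercept column plus two predictors
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(300), rng.normal(size=(300, 2))])
true_beta = np.array([-0.5, 1.0, -2.0])
y = (rng.random(300) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

boot_params = bagged_logistic(X, y, bootstraps=100, seed=1)
print(boot_params.shape)  # (100, 3)
print(boot_params.mean(axis=0))
```

Passing the same `seed` reproduces the same resamples, which mirrors the role the `seed` argument plays in the class signature.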
- Variables:
seed (int, optional) – Random seed for reproducibility.
n_boot (int) – Number of bootstrap samples.
bootstraps (list[logistic_classifier]) – List of fitted base classifiers.
bootstrap_params (np.ndarray) – Matrix of parameter estimates across bootstraps.
params (np.ndarray) – Averaged parameter estimates across bootstraps.
model (abstract_logistic_regressor) – Aggregated logistic regression model.
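The relationship between `bootstrap_params` and `params` can be illustrated with a hand-made matrix: one row per bootstrap fit, one column per coefficient, averaged column-wise. The numbers are made up, and computing the aggregated model's probabilities through the logistic link of the averaged coefficients is an assumption about how `model` is built.

```python
import numpy as np

# Made-up bootstrap_params: one row per bootstrap fit, one column per coefficient
bootstrap_params = np.array([
    [-0.40, 0.90, -1.80],
    [-0.60, 1.10, -2.20],
    [-0.50, 1.00, -2.00],
])

# Column-wise average, analogous to the `params` attribute: -0.5, 1.0, -2.0
params = bootstrap_params.mean(axis=0)

def aggregated_proba(newx, params):
    """Class-1 probability from the averaged coefficients (newx includes an intercept column)."""
    return 1.0 / (1.0 + np.exp(-newx @ params))

newx = np.array([[1.0, 0.0, 0.0],
                 [1.0, 1.0, -1.0]])
print(aggregated_proba(newx, params))
```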
- Parameters:
probability (float)
seed (int)
bootstraps (int)
- predict_proba(newx)
Predict class probabilities by majority vote from bootstrapped models.
- Parameters:
newx (np.ndarray or pd.DataFrame) – Feature matrix of shape (n_samples, n_features).
- Returns:
- Predicted probabilities of shape (n_samples, 2), where column 0 is the probability of class 0 and column 1 is the probability of class 1.
- Return type:
np.ndarray
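One way to read "majority vote" here is that the class-1 probability is the share of bootstrapped models predicting class 1. The sketch below assumes 0/1 votes are already available and, as a further assumption, treats the constructor's `probability=0.5` as the cut-off that turns probabilities into labels; neither detail is confirmed by this page, and `vote_proba` is a hypothetical helper.

```python
import numpy as np

def vote_proba(votes):
    """votes: (n_models, n_samples) array of 0/1 class predictions."""
    p1 = votes.mean(axis=0)                  # share of models voting for class 1
    return np.column_stack([1.0 - p1, p1])   # column 0 = class 0, column 1 = class 1

votes = np.array([[1, 0, 1],
                  [1, 0, 0],
                  [0, 0, 1],
                  [1, 1, 1]])
proba = vote_proba(votes)                    # [[0.25, 0.75], [0.75, 0.25], [0.25, 0.75]]
labels = (proba[:, 1] >= 0.5).astype(int)    # assumed threshold; gives [1, 0, 1]
```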
- class cleands.Classification.ensemble.bagging_recursive_partitioning_classifier(x, y, seed=None, bootstraps=1000, sign_level=0.95, max_level=2, random_x=False, weights=None)
Bases: recursive_partitioning_classifier
Bagged ensemble of recursive partitioning classifiers (decision trees).
Builds multiple trees on bootstrap resamples and, for stability, selects the best-performing tree, defined as the one whose fit is closest to the ensemble fit. The trees can also be randomized to emulate random forest behavior.
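The selection rule can be sketched as follows, assuming each bootstrap tree's fitted class-1 probabilities on the training data are stacked into a matrix. This page does not say how "closest to ensemble fit" is measured, so the mean squared distance to the ensemble average is an assumption, and `closest_to_ensemble` is a hypothetical helper.

```python
import numpy as np

def closest_to_ensemble(fitted_probs):
    """fitted_probs: (n_models, n_samples) class-1 probabilities on the training data.
    Returns the index of the model closest to the ensemble average, plus all distances."""
    ensemble = fitted_probs.mean(axis=0)                   # ensemble fit per observation
    dist = ((fitted_probs - ensemble) ** 2).mean(axis=1)   # mean squared distance per model
    return int(np.argmin(dist)), dist

fitted_probs = np.array([[0.9, 0.2, 0.6],
                         [0.7, 0.4, 0.5],
                         [0.8, 0.3, 0.5],
                         [0.6, 0.1, 0.8]])
best, dist = closest_to_ensemble(fitted_probs)  # best == 2
```

Keeping a single representative tree trades some of the ensemble's variance reduction for a model that stays interpretable as one tree.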
- Variables:
seed (int, optional) – Random seed for reproducibility.
n_boot (int) – Number of bootstrap samples.
bootstraps (list[recursive_partitioning_classifier]) – List of fitted base trees.
- Parameters:
seed (int)
bootstraps (int)
sign_level (float)
max_level (int)
random_x (bool)
weights (ndarray)
- predict_proba(target, fitted=False)
Predict class probabilities by aggregating bootstrap trees.
- Parameters:
target (np.ndarray or pd.DataFrame) – Feature matrix of shape (n_samples, n_features).
fitted (bool, optional) – If True, use fitted model attributes. Defaults to False.
- Returns:
Predicted class probabilities of shape (n_samples, n_classes).
- Return type:
np.ndarray
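How the trees are aggregated is not spelled out here. A common choice, shown as an assumption, is to average each tree's per-class probabilities, which keeps the documented (n_samples, n_classes) shape; `aggregate_tree_proba` is a hypothetical helper.

```python
import numpy as np

def aggregate_tree_proba(tree_probas):
    """tree_probas: list of (n_samples, n_classes) arrays, one per bootstrap tree."""
    stacked = np.stack(tree_probas)   # (n_trees, n_samples, n_classes)
    return stacked.mean(axis=0)       # (n_samples, n_classes)

tree_probas = [
    np.array([[0.8, 0.2], [0.1, 0.9]]),
    np.array([[0.6, 0.4], [0.3, 0.7]]),
]
proba = aggregate_tree_proba(tree_probas)  # [[0.7, 0.3], [0.2, 0.8]]
```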
- class cleands.Classification.ensemble.random_forest_classifier(x, y, seed=None, bootstraps=1000, sign_level=0.95, max_level=2, random_x=False, weights=None)
Bases: bagging_recursive_partitioning_classifier
Random Forest classifier using recursive partitioning trees.
This class implements a random forest by combining multiple recursive partitioning classifiers with bootstrapping and random feature sub-sampling. At each split, only a random subset of predictors is considered, reducing correlation between trees.
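The per-split sub-sampling can be sketched as drawing a fresh set of candidate predictor indices, without replacement, before each split. The subset size used by cleands is not documented on this page; the floor(sqrt(n_feat)) default below is a common random-forest heuristic adopted as an assumption, and `candidate_features` is a hypothetical helper.

```python
import numpy as np

def candidate_features(n_feat, rng, m=None):
    """Draw the predictor indices a single split may consider.
    m defaults to floor(sqrt(n_feat)), with at least one predictor."""
    if m is None:
        m = max(1, int(np.sqrt(n_feat)))
    return np.sort(rng.choice(n_feat, size=m, replace=False))

rng = np.random.default_rng(42)
# Each split sees its own random subset of 3 out of 9 predictors
splits = [candidate_features(9, rng) for _ in range(3)]
```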
- Parameters:
x (np.ndarray) – Training feature matrix of shape (n_obs, n_feat).
y (np.ndarray) – Training class labels of shape (n_obs,).
seed (int, optional) – Random seed for reproducibility.
bootstraps (int, optional) – Number of bootstrap resamples. Defaults to 1000.
sign_level (float, optional) – Significance level for splitting. Defaults to 0.95.
max_level (int, optional) – Maximum tree depth. Defaults to 2.
weights (np.ndarray, optional) – Optional observation weights.
random_x (bool)
- Inherits:
bagging_recursive_partitioning_classifier: Provides the ensemble logic and base classification tree structure.
- class cleands.Classification.ensemble.BaggingLogisticClassifier(formula, data, *args, **kwargs)
Bases: ClassificationModel
Convenience wrapper for bagging logistic classification.
Applies the unified ClassificationModel interface to the bagging_logistic_classifier, enabling construction from formulas and DataFrames.
- Variables:
MODEL_TYPE (ClassVar[Type[cleands.base.supervised_model]]) – Underlying model type, fixed to bagging_logistic_classifier.
- Parameters:
formula (str)
data (DataFrame)
Example
>>> model = BaggingLogisticClassifier.from_formula("y ~ x1 + x2", data=df)
>>> model.classify(df[["x1", "x2"]])
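Conceptually, the formula string names the response to the left of `~` and the predictors to the right. A hypothetical `parse_formula` helper that handles only plain column names joined by `+` shows the split into arrays; the actual cleands parser may do more (for example intercepts or transformations) and is not reproduced here.

```python
import pandas as pd

def parse_formula(formula, data):
    """Split 'y ~ x1 + x2' into a response vector, a feature matrix and feature names.
    Handles only plain column names joined by '+' (no transforms or interactions)."""
    lhs, rhs = (side.strip() for side in formula.split("~"))
    features = [term.strip() for term in rhs.split("+")]
    return data[lhs].to_numpy(), data[features].to_numpy(), features

df = pd.DataFrame({"y": [0, 1, 1, 0],
                   "x1": [0.5, 1.2, 2.3, 0.1],
                   "x2": [1.0, 0.0, 1.0, 0.0]})
y, x, names = parse_formula("y ~ x1 + x2", df)  # x has shape (4, 2)
```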
- class cleands.Classification.ensemble.BaggingRecursivePartitioningClassifier(formula, data, *args, **kwargs)
Bases: ClassificationModel
Convenience wrapper for bagging recursive partitioning classification.
Provides a formula/DataFrame interface for the bagging_recursive_partitioning_classifier.
- Variables:
MODEL_TYPE (ClassVar[Type[cleands.base.supervised_model]]) – Underlying model type, fixed to bagging_recursive_partitioning_classifier.
- Parameters:
formula (str)
data (DataFrame)
Example
>>> model = BaggingRecursivePartitioningClassifier.from_formula("y ~ x1 + x2", data=df, max_level=3)
>>> model.classify(df[["x1", "x2"]])
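The `MODEL_TYPE` class variable together with the `*args, **kwargs` signature suggests a forwarding pattern: the wrapper turns the formula and DataFrame into arrays and passes any remaining options, such as `max_level=3` in the example, to the underlying model. The toy classes below are hypothetical stand-ins that illustrate that pattern only, not the cleands classes.

```python
from typing import ClassVar, Type

import pandas as pd

class ToyTreeEnsemble:
    """Stand-in for an array-based model such as bagging_recursive_partitioning_classifier."""
    def __init__(self, x, y, bootstraps=1000, max_level=2):
        self.x, self.y = x, y
        self.bootstraps, self.max_level = bootstraps, max_level

class ToyFormulaWrapper:
    """Hypothetical wrapper mirroring the (formula, data, *args, **kwargs) signature."""
    MODEL_TYPE: ClassVar[Type] = ToyTreeEnsemble

    def __init__(self, formula, data, *args, **kwargs):
        lhs, rhs = (side.strip() for side in formula.split("~"))
        self.x_vars = [term.strip() for term in rhs.split("+")]
        # Remaining positional and keyword options go straight to the model
        self.model = self.MODEL_TYPE(
            data[self.x_vars].to_numpy(), data[lhs].to_numpy(), *args, **kwargs
        )

df = pd.DataFrame({"y": [0, 1, 1], "x1": [0.2, 1.5, 2.1], "x2": [1.0, 0.0, 1.0]})
wrapped = ToyFormulaWrapper("y ~ x1 + x2", df, max_level=3)
print(wrapped.model.max_level, wrapped.model.bootstraps)  # 3 1000
```

Fixing the model class in a class variable lets each wrapper (logistic, bagged tree, random forest) share the same formula-handling code while differing only in `MODEL_TYPE`.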
- class cleands.Classification.ensemble.RandomForestClassifier(formula, data, *args, **kwargs)
Bases: ClassificationModel
Convenience wrapper for random forest classification.
Provides a formula/DataFrame interface for the random_forest_classifier, which implements an ensemble of recursive partitioning classifiers with random feature sub-sampling.
- Variables:
MODEL_TYPE (ClassVar[Type[cleands.base.supervised_model]]) – Underlying model type, fixed to random_forest_classifier.
- Parameters:
formula (str)
data (DataFrame)
Example
>>> model = RandomForestClassifier.from_formula("y ~ x1 + x2 + x3", data=df)
>>> model.classify(df[["x1", "x2", "x3"]])