cleands.Classification.tree module
Decision-tree–style classifiers (recursive partitioning and random forest).
This module implements a univariate-split recursive partitioning classifier that chooses splits by maximizing validation accuracy and uses a likelihood ratio test (multinomial) to control splitting with a significance threshold. It also provides a simple random-forest–style variant via feature subsampling.
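A minimal sketch of the univariate split search for one feature. It rests on several assumptions: integer class labels 0..K−1, rows with `x <= threshold` going left, candidate thresholds at midpoints between consecutive distinct values, and each child predicting its (weighted) majority class. The helper name `best_threshold_split` is hypothetical. It scores accuracy on the samples it is given, so it does not reproduce whatever validation scheme the module uses.

```python
import numpy as np


def best_threshold_split(x, y, n_classes, weights=None):
    """Return (threshold, accuracy) of the best split x <= t vs. x > t.

    Hypothetical helper: each child predicts its weighted majority class, and
    accuracy is the weighted share of the given samples classified correctly.
    Returns (nan, -inf) when x has a single distinct value (no split possible).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=int)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    values = np.unique(x)
    best_t, best_acc = np.nan, -np.inf
    # Candidate thresholds: midpoints between consecutive distinct values,
    # so both children are always non-empty.
    for t in (values[:-1] + values[1:]) / 2:
        left = x <= t
        correct = 0.0
        for mask in (left, ~left):
            # Weighted class counts in this child; its majority class is predicted.
            counts = np.bincount(y[mask], weights=w[mask], minlength=n_classes)
            correct += counts.max()
        acc = correct / w.sum()
        if acc > best_acc:  # strict ">" keeps the first (smallest) best threshold
            best_t, best_acc = t, acc
    return best_t, best_acc
```

For example, `best_threshold_split([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1], n_classes=2)` separates the two classes perfectly at the midpoint 6.5. A full tree would repeat this search over every (possibly subsampled) column and keep the best feature.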
- Classes:
  - recursive_partitioning_classifier: Greedy tree that recursively splits features to maximize accuracy, with optional weights and a split-significance test.
- Factory Aliases:
  - random_forest_classifier: Partially applied constructor that randomly selects √p columns at each node, i.e., random-forest–style splits.
  - RecursivePartitioningClassifier: Wrapper for constructing a recursive_partitioning_classifier via ClassificationModel.
  - RandomForestClassifier: Wrapper for constructing the random-forest–style variant via ClassificationModel.
- Typical usage example:
>>> from cleands.Classification.tree import recursive_partitioning_classifier
>>> model = recursive_partitioning_classifier(x, y, max_level=3)
>>> model.tidy; model.glance  # via classification_model mixins
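The random-forest factory alias can be pictured with `functools.partial`, which pre-binds `random_x=True` onto a constructor. This sketch uses a hypothetical stand-in class `TinyTree` rather than the real `recursive_partitioning_classifier`. It assumes √p is rounded down (with at least one column kept) and that columns are drawn without replacement. The module's exact rounding and sampling scheme are not documented here.

```python
import math
from functools import partial

import numpy as np


class TinyTree:
    """Hypothetical stand-in that only records which columns it would use."""

    def __init__(self, x, y, random_x=False, rng=None):
        p = x.shape[1]
        if random_x:
            # Assumption: floor(sqrt(p)) columns, at least one, sampled without replacement.
            k = max(1, math.isqrt(p))
            rng = np.random.default_rng() if rng is None else rng
            self._col_indx = np.sort(rng.choice(p, size=k, replace=False))
        else:
            # Without subsampling, every column is available to the split search.
            self._col_indx = np.arange(p)


# Analogue of the random_forest_classifier alias: same constructor, random_x pre-set.
random_forest_tree = partial(TinyTree, random_x=True)
```

With 16 columns, `random_forest_tree(np.zeros((10, 16)), np.zeros(10))` keeps 4 distinct column indices. The plain constructor keeps all 16. Using `partial` means the alias accepts exactly the same arguments as the underlying class.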
- class cleands.Classification.tree.recursive_partitioning_classifier(x, y, sign_level=0.95, max_level=None, random_x=False, level='', classes=None, weights=None)
Bases: classification_model
Recursive partitioning (decision tree) classifier.
Builds a binary tree by recursively selecting a single feature and a split (threshold or binary split) that maximizes classification accuracy on the current node. Splits are accepted only if a multinomial likelihood-ratio test is sufficiently significant (controlled by sign_level), unless max_level stops recursion earlier. Optionally subsamples features (random_x=True) by taking √p columns at each node (random-forest style).
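A minimal sketch of a multinomial likelihood-ratio split test. It assumes the test compares one multinomial fitted to the parent node against separate multinomials for the two children. Under that assumption the statistic G = 2 Σ_child Σ_k n_ck log(p̂_ck / p̂_k) is referred to a χ² distribution with K − 1 degrees of freedom. As a further assumption about how `sign_level` is read, a split is accepted when the p-value is below 1 − sign_level (0.05 at the default 0.95). The helper names are hypothetical. The χ² tail uses closed forms for integer degrees of freedom, so the sketch needs only the standard library.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: Q(m, y) = exp(-y) * sum_{i<m} y^i / i!
        return math.exp(-y) * sum(y**i / math.factorial(i) for i in range(df // 2))
    # Odd df: Q(m + 1/2, y) = erfc(sqrt(y)) + exp(-y) * sum_{j=1..m} y^(j-1/2) / Gamma(j+1/2)
    m = df // 2
    tail = sum(y ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, m + 1))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * tail


def split_lr_test(left_counts, right_counts):
    """LR test of one parent multinomial vs. separate child multinomials.

    Arguments are per-class counts in each child; returns (G statistic, p-value).
    """
    parent = [l + r for l, r in zip(left_counts, right_counts)]
    n = sum(parent)
    g = 0.0
    for child in (left_counts, right_counts):
        n_child = sum(child)
        for c, total in zip(child, parent):
            if c > 0:
                # n_ck * log(child class share / parent class share)
                g += c * math.log((c / n_child) / (total / n))
    g *= 2.0
    df = len(parent) - 1  # 2(K-1) child parameters minus (K-1) parent parameters
    return g, chi2_sf(g, df)


def accept_split(left_counts, right_counts, sign_level=0.95):
    # Assumption: the split is kept when the p-value falls below 1 - sign_level.
    return split_lr_test(left_counts, right_counts)[1] < 1 - sign_level
```

A perfectly separating split such as `accept_split([10, 0], [0, 10])` is accepted. A split whose children mirror the parent, such as `accept_split([5, 5], [5, 5])`, gives G = 0 and is rejected.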
- Variables:
_col_indx (np.ndarray) – Column indices used at this fit (subsampled when random_x=True), applied to both training and prediction.
max_level (Optional[int]) – Maximum tree depth. With None there is no depth cap, and growth stops only when the significance test rejects further splits.
sign_level (float) – Significance level used to gate new splits.
_level (str) – String encoding of the path from the root (e.g., ‘LLR’).
weights (Optional[np.ndarray]) – Optional per-sample weights.
_split_variable (float|int) – Index of the chosen split feature (w.r.t. the possibly subsampled columns), or np.nan if terminal.
_split_value (float) – Threshold for the chosen feature, or np.nan for binary splits.
_p_value (float) – p-value from the split-significance test at the node.
_left, _right – Child nodes (left and right subtrees).
_terminal_prediction (np.ndarray) – Class probability vector at a leaf (estimated by a multinomial model on node samples).
- Parameters:
classes (int | None)
weights (ndarray | None)
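The leaf's `_terminal_prediction` can be illustrated as the maximum-likelihood estimate of a multinomial on the node's samples. That estimate is simply the (optionally weighted) class-frequency vector. The helper `terminal_prediction` is hypothetical, and the module may smooth or otherwise adjust these estimates.

```python
import numpy as np


def terminal_prediction(y, n_classes, weights=None):
    """Multinomial MLE of class probabilities at a leaf: weighted class shares."""
    y = np.asarray(y, dtype=int)
    w = np.ones(len(y)) if weights is None else np.asarray(weights, dtype=float)
    # Weighted count per class, padded so absent classes get probability 0.
    counts = np.bincount(y, weights=w, minlength=n_classes)
    return counts / counts.sum()
```

For labels `[0, 0, 1, 2]` this gives `[0.5, 0.25, 0.25]`. Weighting the first of two samples three times as heavily shifts `[0, 1]` to `[0.75, 0.25]`.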
- predict_proba(newx, fitted=False)
Predict class probabilities for new samples by tree traversal.
- Parameters:
newx (np.ndarray) – Feature matrix (n_new, n_feat).
fitted (bool, optional) – If True, assumes newx is already aligned with the column subset _col_indx. If False, applies the same feature selection used at fit time. Defaults to False.
- Returns:
Class probabilities of shape (n_new, n_classes).
- Return type:
np.ndarray
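Tree traversal can be sketched with a small hypothetical `Node` dataclass that mirrors the documented attributes. The sketch makes three assumptions that the documentation does not state:

- rows with `x[split_variable] <= split_value` go left;
- a NaN `split_value` (binary split) sends rows whose feature equals 0 to the left;
- a single `col_indx` is applied once, at the root, when `fitted=False`.

With `random_x=True` the real class subsamples columns at each node, so each node would need its own column map. The sketch covers only the single-subset case.

```python
import math
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class Node:
    """Hypothetical tree node mirroring the documented attributes."""

    split_variable: float = math.nan  # NaN marks a terminal node
    split_value: float = math.nan  # NaN marks a binary split (assumed 0/1 feature)
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    terminal_prediction: Optional[np.ndarray] = None  # set on leaves only


def _route(node: Node, row: np.ndarray) -> np.ndarray:
    # Walk down until a terminal node (NaN split_variable) is reached.
    while not math.isnan(node.split_variable):
        value = row[int(node.split_variable)]
        if math.isnan(node.split_value):
            go_left = value == 0  # assumed binary-split convention
        else:
            go_left = value <= node.split_value  # assumed threshold convention
        node = node.left if go_left else node.right
    return node.terminal_prediction


def predict_proba(root: Node, newx, col_indx, fitted: bool = False) -> np.ndarray:
    newx = np.asarray(newx, dtype=float)
    if not fitted:
        # Apply the fit-time column subset; fitted=True means newx is already aligned.
        newx = newx[:, col_indx]
    # One probability vector per row, stacked into shape (n_new, n_classes).
    return np.vstack([_route(root, row) for row in newx])
```

Passing `fitted=True` skips the column selection, which matches the documented meaning of that flag: the caller has already aligned `newx` with `_col_indx`.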
- classify(target, fitted=False)
Predict hard class labels (argmax over probabilities).
- Parameters:
target (np.ndarray) – Feature matrix (n_new, n_feat).
fitted (bool, optional) – See predict_proba for column handling.
- Returns:
Integer class labels of shape (n_new,).
- Return type:
np.ndarray
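Turning a probability matrix into hard labels is a row-wise argmax. This minimal NumPy illustration assumes the labels are the column positions 0..K−1. Ties resolve to the lowest index, because `np.argmax` returns the first maximum.

```python
import numpy as np

proba = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.4, 0.4, 0.2]])
# Hard labels: index of the largest probability in each row.
labels = proba.argmax(axis=1)
```

Here `labels` is `[0, 2, 0]`. The last row is a tie between classes 0 and 1, which resolves to 0.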
- property fitted
Return hard labels for the training data (in-sample classification).
- Returns:
Integer class labels of shape (n_obs,).
- Return type:
np.ndarray
- class cleands.Classification.tree.RecursivePartitioningClassifier(formula, data, *args, **kwargs)
Bases: ClassificationModel
Convenience wrapper for recursive partitioning classification.
Fits a decision tree classifier by recursively splitting the feature space to maximize classification accuracy. Provides a formula/DataFrame interface for recursive_partitioning_classifier.
- Variables:
MODEL_TYPE (ClassVar[Type[cleands.base.supervised_model]]) – Underlying model type, fixed to recursive_partitioning_classifier.
- Parameters:
formula (str)
data (DataFrame)
Example
>>> model = RecursivePartitioningClassifier.from_formula("y ~ x1 + x2", data=df, max_level=3)
>>> model.classify(df[["x1", "x2"]])
>>> model.predict_proba(df[["x1", "x2"]])
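The wrapper pattern, a class-level `MODEL_TYPE` plus a formula/DataFrame front end, can be sketched without pandas or the real `ClassificationModel`. Everything here is hypothetical:

- `FormulaWrapper`, `parse_formula` and the toy `MajorityModel` are illustrative names only;
- the parser handles only `y ~ a + b` with plain column names;
- a dict of lists stands in for the DataFrame.

```python
from typing import ClassVar, Dict, List, Sequence, Tuple, Type

import numpy as np


def parse_formula(formula: str) -> Tuple[str, List[str]]:
    """Split 'y ~ x1 + x2' into ('y', ['x1', 'x2']); no transforms or interactions."""
    lhs, rhs = formula.split("~")
    return lhs.strip(), [term.strip() for term in rhs.split("+")]


class MajorityModel:
    """Toy array-level model: always predicts the most frequent training label."""

    def __init__(self, x: np.ndarray, y: np.ndarray, *args, **kwargs):
        self.label = int(np.bincount(y.astype(int)).argmax())

    def classify(self, newx: np.ndarray) -> np.ndarray:
        return np.full(newx.shape[0], self.label)


class FormulaWrapper:
    # Array-level model the wrapper builds (mirrors the documented MODEL_TYPE ClassVar).
    MODEL_TYPE: ClassVar[Type[MajorityModel]] = MajorityModel

    def __init__(self, formula: str, data: Dict[str, Sequence[float]], *args, **kwargs):
        self.y_name, self.x_names = parse_formula(formula)
        y = np.asarray(data[self.y_name])
        # Forward any extra arguments (e.g. max_level) to the underlying model.
        self.model = self.MODEL_TYPE(self._design(data), y, *args, **kwargs)

    def _design(self, data: Dict[str, Sequence[float]]) -> np.ndarray:
        # Stack the right-hand-side columns into an (n_obs, n_feat) matrix.
        return np.column_stack([np.asarray(data[n], dtype=float) for n in self.x_names])

    def classify(self, data: Dict[str, Sequence[float]]) -> np.ndarray:
        return self.model.classify(self._design(data))
```

Keeping the model type in a class variable lets a second wrapper, for example one for the random-forest variant, reuse the same formula handling by overriding only `MODEL_TYPE`.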