cleands.Classification.tree module

Decision-tree–style classifiers (recursive partitioning and random forest).

This module implements a univariate-split recursive partitioning classifier. At each node it chooses the split that maximizes classification accuracy, and it accepts that split only if a multinomial likelihood-ratio test meets a significance threshold. It also provides a simple random-forest–style variant that subsamples features at each node.

Classes:

recursive_partitioning_classifier:: Greedy tree that recursively splits the data on one feature at a time to maximize accuracy, with optional class weights and a split-significance test.

Factory Aliases:

random_forest_classifier:: Partially applied constructor that enables randomized feature selection at each node (√p columns), i.e., random-forest–style splits.
RecursivePartitioningClassifier:: Wrapper for constructing a recursive_partitioning_classifier via ClassificationModel.
RandomForestClassifier:: Wrapper for constructing the random-forest–style variant via ClassificationModel.
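
The idea of a partially applied constructor with √p feature subsampling can be illustrated with a small standard-library sketch. Everything here is hypothetical: fit_tree is a stand-in, not the library's constructor, and rounding √p down with math.isqrt is an assumption about how the column count is chosen.

```python
import math
import random
from functools import partial


def fit_tree(x, y, random_x=False, seed=None):
    """Hypothetical stand-in for a tree constructor; returns the columns it would use."""
    n_cols = len(x[0])
    if not random_x:
        return list(range(n_cols))
    # Random-forest style: keep roughly sqrt(p) of the p columns (floor is assumed).
    k = max(1, math.isqrt(n_cols))
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_cols), k))


# Partially applied constructor: the same function with random_x fixed to True.
fit_random_tree = partial(fit_tree, random_x=True)

x = [[0.1] * 9, [0.2] * 9]  # two samples, nine features
y = [0, 1]
cols = fit_random_tree(x, y, seed=0)  # three distinct column indices out of nine
```

Fixing the flag with functools.partial keeps a single implementation while exposing the randomized variant under its own name.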

Typical usage example:

>>> from cleands.Classification.tree import recursive_partitioning_classifier
>>> model = recursive_partitioning_classifier(x, y, max_level=3)
>>> model.tidy; model.glance  # via classification_model mixins

class cleands.Classification.tree.recursive_partitioning_classifier(x, y, sign_level=0.95, max_level=None, random_x=False, level='', classes=None, weights=None)[source]

Bases: classification_model

Recursive partitioning (decision tree) classifier.

Builds a binary tree by recursively selecting a single feature and a split (threshold or binary split) that maximizes classification accuracy on the current node. Splits are accepted only if a multinomial likelihood-ratio test is sufficiently significant (controlled by sign_level), unless max_level stops recursion earlier. Optionally subsamples features (random_x=True) by taking √p columns at each node (random-forest style).
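
The accuracy-maximizing split search described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's code: the helper names are hypothetical, candidate thresholds are taken as midpoints between consecutive distinct values, and samples with a value at or below the threshold go left.

```python
from collections import Counter


def majority_hits(labels):
    """Number of samples classified correctly when predicting the majority class."""
    return max(Counter(labels).values()) if labels else 0


def best_threshold_split(x, y):
    """Return (feature, threshold, accuracy) for the single split with highest node accuracy."""
    best = (None, None, majority_hits(y) / len(y))  # baseline: no split at all
    for j in range(len(x[0])):
        values = sorted({row[j] for row in x})
        # Candidate thresholds: midpoints between consecutive distinct values (assumed).
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(x, y) if row[j] <= t]
            right = [yi for row, yi in zip(x, y) if row[j] > t]
            acc = (majority_hits(left) + majority_hits(right)) / len(y)
            if acc > best[2]:
                best = (j, t, acc)
    return best


x = [[1.0, 5.0], [2.0, 3.0], [3.0, 4.0], [4.0, 6.0]]
y = [0, 0, 1, 1]
feature, threshold, accuracy = best_threshold_split(x, y)
# feature 0 at threshold 2.5 separates the two classes perfectly
```

A full tree would repeat this search on the left and right subsets until the significance test or max_level stops the recursion.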

Variables:

_col_indx (np.ndarray) – Column indices used at this fit (subsampled when random_x=True), applied to both training and prediction.
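max_level (Optional[int]) – Maximum tree depth. None imposes no depth limit, so growth stops only when the split-significance test rejects further splits.
sign_level (float) – Significance threshold used to gate new splits. Defaults to 0.95.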
_level (str) – String encoding of the path from the root (e.g., ‘LLR’).
weights (Optional[np.ndarray]) – Optional per-sample weights.
_split_variable (float|int) – Index of the chosen split feature (w.r.t. the possibly subsampled columns), or np.nan if terminal.
_split_value (float) – Threshold for the chosen feature, or np.nan for binary splits.
_p_value (float) – p-value from the split-significance test at the node.
_left, _right – Child nodes (left and right subtrees).
_terminal_prediction (np.ndarray) – Class probability vector at a leaf (estimated by a multinomial model on node samples).
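
To make the roles of _p_value and sign_level concrete, here is a hedged sketch of a multinomial likelihood-ratio test for one candidate split. It is restricted to the two-class case, where the reference chi-square distribution has one degree of freedom and its survival function reduces to erfc. The function names and the decision rule p < 1 - sign_level are assumptions, not taken from the library.

```python
import math
from collections import Counter


def multinomial_loglik(labels):
    """Maximized multinomial log-likelihood, using observed class shares as probabilities."""
    n = len(labels)
    return sum(c * math.log(c / n) for c in Counter(labels).values())


def split_p_value(left, right):
    """Likelihood-ratio statistic and p-value for a two-class split (chi-square, 1 df)."""
    stat = 2.0 * (multinomial_loglik(left) + multinomial_loglik(right)
                  - multinomial_loglik(left + right))
    stat = max(stat, 0.0)  # guard against tiny negative rounding error
    # Survival function of chi-square(1): P(Z**2 > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


left = [0] * 9 + [1] * 1   # left child: mostly class 0
right = [0] * 2 + [1] * 8  # right child: mostly class 1
stat, p_value = split_p_value(left, right)

sign_level = 0.95
accept_split = p_value < 1.0 - sign_level  # assumed decision rule
```

With more than two classes the degrees of freedom grow with the number of classes, so a general chi-square survival function would be needed in place of erfc.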

Parameters:

classes (int | None)
weights (ndarray | None)

predict_proba(newx, fitted=False)[source]

Predict class probabilities for new samples by tree traversal.

Parameters:

newx (np.ndarray) – Feature matrix (n_new, n_feat).
fitted (bool, optional) – If True, assumes newx is already aligned with the column subset _col_indx. If False, applies the same feature selection used at fit time. Defaults to False.

Returns:

Class probabilities of shape (n_new, n_classes).

Return type:

np.ndarray
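
Tree traversal and the effect of the fitted flag can be sketched as follows. The dictionary-based nodes are a hypothetical stand-in for the class's _split_variable, _split_value, _left, _right and _terminal_prediction attributes, and sending values at or below the threshold to the left child is an assumption.

```python
def predict_proba_row(node, row):
    """Walk from the root to a leaf and return that leaf's class-probability vector."""
    while "proba" not in node:  # internal node: keep descending
        go_left = row[node["split_variable"]] <= node["split_value"]
        node = node["left"] if go_left else node["right"]
    return node["proba"]


def predict_proba(tree, col_indx, newx, fitted=False):
    """Apply the fit-time column subset unless the caller says it is already applied."""
    if not fitted:
        newx = [[row[j] for j in col_indx] for row in newx]
    return [predict_proba_row(tree, row) for row in newx]


# Toy tree fitted on columns 2 and 0 of the original matrix, in that order.
col_indx = [2, 0]
tree = {
    "split_variable": 0,  # index into the subsampled columns, i.e. original column 2
    "split_value": 1.5,
    "left": {"proba": [0.9, 0.1]},
    "right": {"proba": [0.2, 0.8]},
}
newx = [[7.0, 7.0, 1.0], [7.0, 7.0, 2.0]]
probs = predict_proba(tree, col_indx, newx)
```

Passing fitted=True with rows that are already reduced to the selected columns skips the subsetting step and reaches the same leaves.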

classify(target, fitted=False)[source]

Predict hard class labels (argmax over probabilities).

Parameters:

target (np.ndarray) – Feature matrix (n_new, n_feat).
fitted (bool, optional) – See predict_proba for column handling.

Returns:

Integer class labels of shape (n_new,).

Return type:

np.ndarray
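
The argmax step can be illustrated with a short sketch on plain lists. The function name is hypothetical, and resolving ties in favor of the lowest class index is an assumption (it matches what np.argmax does on arrays).

```python
def classify_from_proba(probs):
    """Hard labels: index of the largest probability in each row (first index wins ties)."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


probs = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
labels = classify_from_proba(probs)
```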

property fitted

Return hard labels for the training data (in-sample classification).

Returns:

Integer class labels of shape (n_obs,).

Return type:

np.ndarray

class cleands.Classification.tree.RecursivePartitioningClassifier(formula, data, *args, **kwargs)[source]

Bases: ClassificationModel

Convenience wrapper for recursive partitioning classification.

Fits a decision tree classifier by recursively splitting the feature space to maximize classification accuracy. Provides a formula/DataFrame interface for the recursive_partitioning_classifier.

Variables:

MODEL_TYPE (ClassVar[Type[cleands.base.supervised_model]]) – Underlying model type, fixed to recursive_partitioning_classifier.

Parameters:

formula (str)
data (DataFrame)

Example

>>> model = RecursivePartitioningClassifier.from_formula("y ~ x1 + x2", data=df, max_level=3)
>>> model.classify(df[["x1", "x2"]])
>>> model.predict_proba(df[["x1", "x2"]])