cleands.Classification.tree module

Decision-tree–style classifiers (recursive partitioning and random forest).

This module implements a univariate-split recursive partitioning classifier that chooses splits by maximizing validation accuracy and uses a likelihood ratio test (multinomial) to control splitting with a significance threshold. It also provides a simple random-forest–style variant via feature subsampling.
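The split-significance test is only named here, not spelled out. As a rough sketch, a multinomial likelihood-ratio statistic for one candidate binary split could be computed as below, assuming the null hypothesis is that both children share the parent's class distribution. The helper names `multinomial_loglik` and `lr_statistic` are hypothetical and not part of cleands; the library's actual test may differ in detail.

```python
import math
from collections import Counter

def multinomial_loglik(labels):
    """Maximized multinomial log-likelihood of a list of class labels."""
    n = len(labels)
    return sum(c * math.log(c / n) for c in Counter(labels).values())

def lr_statistic(left_labels, right_labels):
    """Likelihood-ratio statistic and degrees of freedom for one binary split.

    Null: both children share the parent's class distribution.
    Alternative: each child has its own class distribution.
    """
    parent = list(left_labels) + list(right_labels)
    stat = 2.0 * (
        multinomial_loglik(left_labels)
        + multinomial_loglik(right_labels)
        - multinomial_loglik(parent)
    )
    dof = len(set(parent)) - 1  # extra free parameters under the alternative
    return stat, dof

# A split that separates the classes perfectly versus one that changes nothing.
pure_stat, pure_dof = lr_statistic([0, 0, 0, 0], [1, 1, 1, 1])
null_stat, null_dof = lr_statistic([0, 1, 0, 1], [0, 1, 0, 1])
```

Under the usual asymptotic argument the statistic is compared against a chi-square distribution with `dof` degrees of freedom to obtain a p-value; how that p-value is compared with sign_level is not stated in this reference.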

Classes:
recursive_partitioning_classifier:

Greedy tree that recursively splits features to maximize accuracy, with optional class weights and a split-significance test.

Factory Aliases:
random_forest_classifier:

Partially-applied constructor for randomized feature selection at each node (√p columns), i.e., random-forest–style splits.

RecursivePartitioningClassifier:

Wrapper for constructing a recursive_partitioning_classifier via ClassificationModel.

RandomForestClassifier:

Wrapper for constructing the random-forest–style variant via ClassificationModel.
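To illustrate what a "partially-applied constructor" with √p feature subsampling might look like, here is a minimal sketch using `functools.partial`. The class `toy_tree`, the helper `subsample_columns`, and the `seed` argument are hypothetical stand-ins, and rounding √p down is an assumption; none of this is taken from the cleands source.

```python
import math
import random
from functools import partial

def subsample_columns(n_features, rng):
    """Pick about sqrt(p) distinct column indices out of p features."""
    k = max(1, int(math.sqrt(n_features)))  # assumption: round sqrt(p) down
    return sorted(rng.sample(range(n_features), k))

class toy_tree:
    """Hypothetical stand-in for a tree class with a random_x switch."""

    def __init__(self, x, y, random_x=False, seed=0):
        n_features = len(x[0])
        if random_x:
            self.col_indx = subsample_columns(n_features, random.Random(seed))
        else:
            self.col_indx = list(range(n_features))

# Partially applied constructor: same class, with random_x fixed to True.
toy_forest = partial(toy_tree, random_x=True)

x = [[0.0] * 9 for _ in range(4)]  # 4 rows, 9 features
y = [0, 1, 0, 1]
full = toy_tree(x, y)     # uses all 9 columns
sub = toy_forest(x, y)    # uses 3 randomly chosen columns
```

The partial object is called exactly like the class itself, which is why such an alias can stand in wherever the plain constructor is expected.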

Typical usage example:
>>> from cleands.Classification.tree import recursive_partitioning_classifier
>>> model = recursive_partitioning_classifier(x, y, max_level=3)
>>> model.tidy; model.glance  # via classification_model mixins
class cleands.Classification.tree.recursive_partitioning_classifier(x, y, sign_level=0.95, max_level=None, random_x=False, level='', classes=None, weights=None)[source]

Bases: classification_model

Recursive partitioning (decision tree) classifier.

Builds a binary tree by recursively selecting a single feature and a split (threshold or binary split) that maximizes classification accuracy on the current node. Splits are accepted only if a multinomial likelihood-ratio test is sufficiently significant (controlled by sign_level), unless max_level stops recursion earlier. Optionally subsamples features (random_x=True) by taking √p columns at each node (random-forest style).
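The accuracy-maximizing split search on a single numeric feature can be sketched as follows. This is an illustration only: the function names are hypothetical, the `value <= threshold` convention is an assumption, and each child is scored by majority vote, which may not match how the library scores candidate splits.

```python
from collections import Counter

def majority_correct(labels):
    """Number of samples a majority-vote leaf classifies correctly."""
    return max(Counter(labels).values()) if labels else 0

def best_threshold(values, labels):
    """Return (threshold, accuracy) of the best split `value <= threshold`."""
    best = (None, majority_correct(labels) / len(labels))  # no-split baseline
    # Skip the largest value: it would leave the right child empty.
    for t in sorted(set(values))[:-1]:
        left = [c for v, c in zip(values, labels) if v <= t]
        right = [c for v, c in zip(values, labels) if v > t]
        acc = (majority_correct(left) + majority_correct(right)) / len(labels)
        if acc > best[1]:
            best = (t, acc)
    return best

values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
labels = [0, 0, 0, 1, 1, 1]
threshold, accuracy = best_threshold(values, labels)  # (3.0, 1.0)
```

A full tree would repeat this search over every candidate feature at the node, keep the best one, and then apply the significance test before recursing into the two children.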

Variables:
  • _col_indx (np.ndarray) – Column indices used at this fit (subsampled when random_x=True), applied to both training and prediction.

  • max_level (Optional[int]) – Maximum tree depth. None imposes no depth cap, leaving the split-significance test to stop growth.

  • sign_level (float) – Level of the split-significance test used to gate new splits (default 0.95).

  • _level (str) – String encoding of the path from the root (e.g., ‘LLR’).

  • weights (Optional[np.ndarray]) – Optional per-sample weights.

  • _split_variable (float|int) – Index of the chosen split feature (w.r.t. the possibly subsampled columns), or np.nan if terminal.

  • _split_value (float) – Threshold for the chosen feature, or np.nan for binary splits.

  • _p_value (float) – p-value from the split-significance test at the node.

  • _left, _right – Child nodes.

  • _terminal_prediction (np.ndarray) – Class probability vector at a leaf (estimated by a multinomial model on node samples).

Parameters:
  • classes (int | None)

  • weights (ndarray | None)

predict_proba(newx, fitted=False)[source]

Predict class probabilities for new samples by tree traversal.

Parameters:
  • newx (np.ndarray) – Feature matrix (n_new, n_feat).

  • fitted (bool, optional) – If True, assumes newx is already aligned with the column subset _col_indx. If False, applies the same feature selection used at fit time. Defaults to False.

Returns:

Class probabilities of shape (n_new, n_classes).

Return type:

np.ndarray
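Prediction by tree traversal can be pictured with the sketch below. It uses nested dictionaries as a hypothetical node representation (the library stores nodes as objects with _split_variable, _split_value, _left, and _right), and it assumes rows with `value <= threshold` go left.

```python
def predict_row(node, row):
    """Walk from the root to a leaf and return that leaf's class probabilities."""
    # Internal nodes carry a split; leaves carry a probability vector.
    while "proba" not in node:
        go_left = row[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["proba"]

def predict_proba(tree, rows):
    """Class probabilities per row, as nested lists of shape (n_new, n_classes)."""
    return [predict_row(tree, row) for row in rows]

tree = {
    "feature": 0, "threshold": 3.0,
    "left": {"proba": [0.9, 0.1]},
    "right": {
        "feature": 1, "threshold": 0.5,
        "left": {"proba": [0.4, 0.6]},
        "right": {"proba": [0.0, 1.0]},
    },
}
probs = predict_proba(tree, [[2.0, 9.9], [5.0, 0.2], [5.0, 0.8]])
```

Each row ends at exactly one leaf, so every output row is a copy of some leaf's probability vector.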

classify(target, fitted=False)[source]

Predict hard class labels (argmax over probabilities).

Parameters:
  • target (np.ndarray) – Feature matrix (n_new, n_feat).

  • fitted (bool, optional) – See predict_proba for column handling.

Returns:

Integer class labels of shape (n_new,).

Return type:

np.ndarray
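The argmax step amounts to picking the index of the largest probability in each row. A minimal sketch with plain lists, where the free function `classify` is a hypothetical stand-in for the method and ties resolve to the lowest class index:

```python
def classify(probabilities):
    """Hard labels: index of the largest probability in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]

labels = classify([[0.9, 0.1], [0.4, 0.6], [0.5, 0.5]])  # [0, 1, 0]
```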

property fitted

Return hard labels for the training data (in-sample classification).

Returns:

Integer class labels of shape (n_obs,).

Return type:

np.ndarray

class cleands.Classification.tree.RecursivePartitioningClassifier(formula, data, *args, **kwargs)[source]

Bases: ClassificationModel

Convenience wrapper for recursive partitioning classification.

Fits a decision tree classifier by recursively splitting the feature space to maximize classification accuracy. Provides a formula/DataFrame interface for the recursive_partitioning_classifier.

Variables:

MODEL_TYPE (ClassVar[Type[cleands.base.supervised_model]]) – Underlying model type, fixed to recursive_partitioning_classifier.

Parameters:
  • formula (str)

  • data (DataFrame)

Example

>>> model = RecursivePartitioningClassifier("y ~ x1 + x2", data=df, max_level=3)
>>> model.classify(df[["x1", "x2"]])
>>> model.predict_proba(df[["x1", "x2"]])