cleands.Prediction.tree module

tree.py

Implements recursive partitioning regression trees for prediction tasks.

This module provides:

recursive_partitioning_regressor: A regression tree learner that recursively splits predictors to minimize residual sum of squares, with optional statistical stopping rules.
Factory partials (RecursivePartitioningRegressor, RandomForestRegressor) to wrap models into the PredictionModel interface for pandas/formula compatibility.

Features:

Greedy splitting based on residual sum of squares (RSS).
Statistical significance pruning with p-value thresholds.
Depth control via max_level.
Support for random feature subsets at each split (for random forest–style trees).
Integration with the broader framework (PredictionModel) for .tidy and .glance.

Example

>>> model = RecursivePartitioningRegressor(
...     x_vars=["x1", "x2"], y_var="y", data=df, sign_level=0.95, max_level=3
... )
>>> predictions = model.predict(df)
>>> summary = model.glance
>>> frame = model.tidy

class cleands.Prediction.tree.recursive_partitioning_regressor(x, y, sign_level=0.95, max_level=None, random_x=False, level='')[source]

Bases: prediction_model

Recursive partitioning regression tree.

Splits data into subgroups to minimize residual sum of squares (RSS). Supports significance-based stopping and random feature subsets.
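
The greedy RSS criterion described above can be illustrated with a minimal sketch. This is not the library's implementation: rss and best_split are hypothetical helper names, and the exhaustive search over midpoints between sorted unique values is an assumed candidate-threshold scheme.

```python
import numpy as np

def rss(y):
    """Residual sum of squares of y around its own mean (0.0 if empty)."""
    return float(np.sum((y - y.mean()) ** 2)) if y.size else 0.0

def best_split(x, y):
    """Return (feature index, threshold, total RSS) for the best single split.

    Hypothetical helper, not part of cleands. Candidate thresholds are the
    midpoints between consecutive unique values of each column. Returns None
    when every column is constant and no split is possible.
    """
    best = None
    for j in range(x.shape[1]):
        values = np.unique(x[:, j])  # sorted unique values of column j
        for t in (values[:-1] + values[1:]) / 2.0:
            left = x[:, j] <= t
            total = rss(y[left]) + rss(y[~left])
            if best is None or total < best[2]:
                best = (j, float(t), total)
    return best

x = np.array([[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0]])
y = np.array([1.0, 1.0, 5.0, 5.0])
feature, threshold, total = best_split(x, y)
print(feature, threshold, total)
```

Here the first column separates the two groups of y values perfectly at 2.5, so the combined RSS of the two children is zero.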

Parameters:

x (np.ndarray) – Predictor matrix of shape (n_obs, n_features).
y (np.ndarray) – Target vector of shape (n_obs,).
sign_level (float, optional) – Significance threshold for stopping. Default 0.95.
max_level (int | None, optional) – Maximum depth of the tree. Default None.
random_x (bool, optional) – If True, use random feature subsets (RF-style). Default False.
level (str, optional) – Internal recursion marker (L/R path string). Default ''.
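
How max_level, random_x, and level might interact during recursion can be sketched as follows. This is an illustration under stated assumptions, not the library's code: grow is a hypothetical function, depth is assumed to equal the length of the L/R path string, the random subset size is assumed to be the square root of the feature count, and the significance test controlled by sign_level is omitted.

```python
import numpy as np

def grow(x, y, max_level=None, random_x=False, level="", rng=None):
    """Hypothetical recursive builder that returns the tree as nested dicts."""
    rng = np.random.default_rng(0) if rng is None else rng
    node = {"level": level, "n": int(y.size), "mean": float(y.mean())}
    # Assumption: depth is the length of the path string ("" is the root).
    too_deep = max_level is not None and len(level) >= max_level
    if too_deep or y.size < 2 or np.all(y == y[0]):
        return node
    n_features = x.shape[1]
    if random_x:
        # Assumption: sample about sqrt(n_features) columns per split.
        k = max(1, int(np.sqrt(n_features)))
        columns = rng.choice(n_features, size=k, replace=False)
    else:
        columns = np.arange(n_features)
    best = None
    for j in columns:
        values = np.unique(x[:, j])
        for t in (values[:-1] + values[1:]) / 2.0:
            left = x[:, j] <= t
            total = sum(
                float(np.sum((part - part.mean()) ** 2))
                for part in (y[left], y[~left])
            )
            if best is None or total < best[0]:
                best = (total, int(j), float(t))
    if best is None:  # every candidate column was constant
        return node
    _, j, t = best
    left = x[:, j] <= t
    node.update(feature=j, threshold=t)
    # Children extend the path string with "L" or "R".
    node["left"] = grow(x[left], y[left], max_level, random_x, level + "L", rng)
    node["right"] = grow(x[~left], y[~left], max_level, random_x, level + "R", rng)
    return node

x = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([1.0, 1.0, 5.0, 5.0])
tree = grow(x, y, max_level=1)
print(tree["left"]["level"], tree["right"]["level"])
```

With max_level=1 the root splits once and both children become leaves, carrying the path strings "L" and "R".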

predict(newx, fitted=False)[source]

Predict target values for new data.

Parameters:

newx (np.ndarray) – New predictor data.
fitted (bool) – If True, assume newx is already restricted to the column subset used during training. Default False.

Returns:

Predicted values.

Return type:

np.ndarray
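
Prediction in a regression tree amounts to routing each row to a leaf and returning that leaf's mean. The sketch below assumes a hand-built nested-dict tree and a hypothetical predict_tree function; the library's internal node representation may differ.

```python
import numpy as np

# Hand-built example tree: internal nodes hold a feature index and threshold,
# leaves hold only the mean of the training targets that reached them.
tree = {
    "feature": 0,
    "threshold": 2.5,
    "left": {"mean": 1.0},
    "right": {
        "feature": 1,
        "threshold": 4.0,
        "left": {"mean": 3.0},
        "right": {"mean": 5.0},
    },
}

def predict_tree(node, newx):
    """Route each row of newx down the tree and return the leaf means."""
    newx = np.asarray(newx, dtype=float)
    out = np.empty(newx.shape[0])
    if "feature" not in node:  # leaf: constant prediction
        out[:] = node["mean"]
        return out
    left = newx[:, node["feature"]] <= node["threshold"]
    out[left] = predict_tree(node["left"], newx[left])
    out[~left] = predict_tree(node["right"], newx[~left])
    return out

newx = np.array([[1.0, 0.0], [3.0, 2.0], [3.0, 9.0]])
predictions = predict_tree(tree, newx)
print(predictions)
```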

property fitted

Return fitted values (in-sample predictions).

Returns:

Predictions for training data.

Return type:

np.ndarray

property tidy

Structured node summary of the tree.

Returns:

Node-wise information including levels, splits, and statistics.

Return type:

pd.DataFrame

property glance

One-row summary of overall tree performance.

Returns:

Contains RSS, MSE, RMSE, R², and degrees of freedom.

Return type:

pd.DataFrame

class cleands.Prediction.tree.RecursivePartitioningRegressor(formula, data, *args, **kwargs)[source]

Bases: PredictionModel

Convenience wrapper for recursive partitioning regression.

This model fits a regression tree by recursively splitting the predictor space to minimize residual sum of squares. It provides a formula/DataFrame interface for the recursive_partitioning_regressor.

Variables:

MODEL_TYPE (ClassVar[Type[cleands.base.supervised_model]]) – Underlying model type, fixed to recursive_partitioning_regressor.

Parameters:

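formula (str) – Model formula naming the target and the predictors, e.g. "y ~ x1 + x2".
data (DataFrame) – pandas DataFrame containing the variables referenced in formula.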

Example

>>> model = RecursivePartitioningRegressor.from_formula("y ~ x1 + x2", data=df, max_level=3)
>>> model.predict(df[["x1", "x2"]])