cleands.Prediction.tree module

tree.py

Implements recursive partitioning regression trees for prediction tasks.

This module provides:

  • recursive_partitioning_regressor: A regression tree learner that recursively splits predictors to minimize residual sum of squares, with optional statistical stopping rules.

  • Factory partials (RecursivePartitioningRegressor, RandomForestRegressor) to wrap models into the PredictionModel interface for pandas/formula compatibility.

Features:
  • Greedy splitting based on residual sum of squares (RSS).

  • Significance-based stopping with p-value thresholds.

  • Depth control via max_level.

  • Support for random feature subsets at each split (for random forest–style trees).

  • Integration with the broader framework (PredictionModel) for .tidy and .glance.
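
The greedy RSS criterion listed above can be illustrated with a small, dependency-free sketch. This is not the module's implementation: the helper names `rss` and `best_split` are hypothetical, plain lists stand in for NumPy arrays, and the choice of midpoints between consecutive distinct values as candidate thresholds is an assumption.

```python
def rss(values):
    """Residual sum of squares of `values` around their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x_col, y):
    """Find the threshold on one feature that minimizes left RSS + right RSS.

    Hypothetical helper: candidate thresholds are midpoints between
    consecutive distinct feature values (an assumption, not taken from
    the library). Returns (None, rss(y)) if no split improves on the parent.
    """
    distinct = sorted(set(x_col))
    best_threshold, best_rss = None, rss(y)
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [yi for xi, yi in zip(x_col, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x_col, y) if xi > threshold]
        total = rss(left) + rss(right)
        if total < best_rss:
            best_threshold, best_rss = threshold, total
    return best_threshold, best_rss


# Toy data with an obvious break between x = 3 and x = 10.
x_col = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
threshold, split_rss = best_split(x_col, y)
print(threshold, split_rss)
```

A full learner would repeat this search over every available feature (or over a random subset of features when building random forest–style trees) and keep the best feature and threshold pair.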

Example

>>> model = RecursivePartitioningRegressor(
...     x_vars=["x1", "x2"], y_var="y", data=df, sign_level=0.95, max_level=3
... )
>>> predictions = model.predict(df)
>>> summary = model.glance
>>> frame = model.tidy

class cleands.Prediction.tree.recursive_partitioning_regressor(x, y, sign_level=0.95, max_level=None, random_x=False, level='')

Bases: prediction_model

Recursive partitioning regression tree.

Splits data into subgroups to minimize residual sum of squares (RSS). Supports significance-based stopping and random feature subsets.

Parameters:
  • x (np.ndarray) – Predictor matrix of shape (n_obs, n_features).

  • y (np.ndarray) – Target vector of shape (n_obs,).

  • sign_level (float, optional) – Significance threshold for stopping. Default 0.95.

  • max_level (int | None, optional) – Maximum depth of the tree. Default None.

  • random_x (bool, optional) – If True, use random feature subsets (RF-style). Default False.

  • level (str, optional) – Internal recursion marker (L/R path string). Default ''.
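
How max_level and the L/R path string in level might interact can be sketched as follows. This is an illustrative stand-in, not the class's actual code: the function `grow`, the nested-dict node layout, and the rule that depth equals the length of the path string are all assumptions, and the significance test and random feature subsets are left out.

```python
def mean(values):
    return sum(values) / len(values)


def rss(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def grow(rows, y, max_level=None, level=""):
    """Grow a tree as nested dicts (hypothetical layout).

    `level` is the path from the root: "" for the root, "L" for its left
    child, "LR" for that child's right child, and so on. Depth is taken
    to be len(level), which is an assumption of this sketch.
    """
    node = {"level": level, "n": len(y), "value": mean(y)}
    if max_level is not None and len(level) >= max_level:
        return node  # depth limit reached: keep this node as a leaf

    best = None  # (total_rss, feature, threshold, left_idx, right_idx)
    for j in range(len(rows[0])):
        distinct = sorted({row[j] for row in rows})
        for lo, hi in zip(distinct, distinct[1:]):
            threshold = (lo + hi) / 2
            left = [i for i, row in enumerate(rows) if row[j] <= threshold]
            right = [i for i, row in enumerate(rows) if row[j] > threshold]
            total = rss([y[i] for i in left]) + rss([y[i] for i in right])
            if best is None or total < best[0]:
                best = (total, j, threshold, left, right)

    if best is None or best[0] >= rss(y):
        return node  # no split reduces RSS: keep this node as a leaf

    _, j, threshold, left, right = best
    node["feature"], node["threshold"] = j, threshold
    node["left"] = grow([rows[i] for i in left], [y[i] for i in left],
                        max_level, level + "L")
    node["right"] = grow([rows[i] for i in right], [y[i] for i in right],
                         max_level, level + "R")
    return node


rows = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [10.0, 0.0], [11.0, 0.0], [12.0, 0.0]]
y = [1.0, 1.0, 3.0, 5.0, 5.0, 5.0]
shallow = grow(rows, y, max_level=1)  # root split only
deep = grow(rows, y)                  # split until RSS stops improving
print(shallow["left"]["level"], deep["left"]["left"]["level"])
```

With max_level=1 the left child stays a leaf holding three observations, while the unrestricted tree splits it once more and produces nodes with paths "LL" and "LR".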

predict(newx, fitted=False)

Predict target values for new data.

Parameters:
  • newx (np.ndarray) – New predictor data.

  • fitted (bool) – If True, assume newx already uses the training column subset. Default False.

Returns:

Predicted values.

Return type:

np.ndarray
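
Prediction from a fitted regression tree amounts to walking each row from the root to a leaf and returning the value stored there. The sketch below shows that traversal on a hand-built tree; the nested-dict layout and the names `predict_one` and `predict_rows` are hypothetical and do not reflect how this class stores its nodes.

```python
# Hand-built tree (hypothetical layout): internal nodes carry a feature
# index and threshold, leaves carry the mean response of their training rows.
tree = {
    "feature": 0,
    "threshold": 6.5,
    "left": {"value": 1.0},
    "right": {
        "feature": 1,
        "threshold": 0.5,
        "left": {"value": 4.0},
        "right": {"value": 6.0},
    },
}


def predict_one(node, row):
    """Follow the split rules from the root until a leaf is reached."""
    while "feature" in node:
        if row[node["feature"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["value"]


def predict_rows(node, rows):
    """Predict every row independently."""
    return [predict_one(node, row) for row in rows]


new_rows = [[2.0, 0.0], [9.0, 0.0], [9.0, 1.0]]
predictions = predict_rows(tree, new_rows)
print(predictions)
```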

property fitted

Return fitted values (in-sample predictions).

Returns:

Predictions for training data.

Return type:

np.ndarray

property tidy

Structured node summary of the tree.

Returns:

Node-wise information including levels, splits, and statistics.

Return type:

pd.DataFrame

property glance

One-row summary of overall tree performance.

Returns:

One-row frame containing RSS, MSE, RMSE, R², and degrees of freedom.

Return type:

pd.DataFrame
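
The statistics named above follow standard definitions, sketched here for reference. The function `glance_stats` is hypothetical, and two conventions are assumptions that may differ from the library: MSE is computed as RSS divided by the number of observations, and degrees of freedom as observations minus the number of leaves.

```python
import math


def glance_stats(y, fitted, n_leaves):
    """Compute summary statistics from observed and fitted values.

    Assumed conventions (not confirmed by the library):
      mse = rss / n
      df  = n - n_leaves
    """
    n = len(y)
    y_bar = sum(y) / n
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares
    mse = rss / n
    return {
        "rss": rss,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "r_squared": 1 - rss / tss,
        "df": n - n_leaves,
    }


# Two-leaf tree: rows 0-1 are fitted with 1.5, rows 2-3 with 3.5.
y = [1.0, 2.0, 3.0, 4.0]
fitted = [1.5, 1.5, 3.5, 3.5]
stats = glance_stats(y, fitted, n_leaves=2)
print(stats)
```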

class cleands.Prediction.tree.RecursivePartitioningRegressor(formula, data, *args, **kwargs)

Bases: PredictionModel

Convenience wrapper for recursive partitioning regression.

This model fits a regression tree by recursively splitting the predictor space to minimize residual sum of squares. It provides a formula/DataFrame interface for the recursive_partitioning_regressor.

Variables:

MODEL_TYPE (ClassVar[Type[cleands.base.supervised_model]]) – Underlying model type, fixed to recursive_partitioning_regressor.

Parameters:
  • formula (str) – Model formula, e.g. "y ~ x1 + x2".

  • data (DataFrame) – Data containing the variables named in the formula.

Example

>>> model = RecursivePartitioningRegressor.from_formula("y ~ x1 + x2", data=df, max_level=3)
>>> model.predict(df[["x1", "x2"]])