cleands.Prediction.tree module
tree.py
Implements recursive partitioning regression trees for prediction tasks.
This module provides:
recursive_partitioning_regressor: A regression tree learner that recursively splits predictors to minimize residual sum of squares, with optional statistical stopping rules.
Factory partials (RecursivePartitioningRegressor, RandomForestRegressor) to wrap models into the PredictionModel interface for pandas/formula compatibility.
- Features:
Greedy splitting based on residual sum of squares (RSS).
Statistical significance pruning with p-value thresholds.
Depth control via max_level.
Support for random feature subsets at each split (for random forest–style trees).
Integration with the broader framework (PredictionModel) for .tidy and .glance.
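The greedy RSS criterion listed above can be illustrated with a minimal sketch. This is not the cleands implementation: the helper names `rss` and `best_split` are hypothetical, and the exhaustive search over every observed value is an assumption made for clarity.

```python
import numpy as np

def rss(y):
    """Residual sum of squares of y around its own mean."""
    return float(np.sum((y - y.mean()) ** 2)) if y.size else 0.0

def best_split(x, y):
    """Return (feature_index, threshold, total_rss) for the best single split.

    Hypothetical helper for illustration only; not part of cleands.
    """
    best = (None, None, rss(y))  # baseline: RSS of the unsplit node
    for j in range(x.shape[1]):
        # Skip the largest value so that both children are non-empty.
        for t in np.unique(x[:, j])[:-1]:
            left = x[:, j] <= t
            total = rss(y[left]) + rss(y[~left])
            if total < best[2]:
                best = (j, float(t), total)
    return best

x = np.array([[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0]])
y = np.array([1.0, 1.0, 5.0, 5.0])
feature, threshold, split_rss = best_split(x, y)
```

In this toy data the first column separates the two groups of targets exactly, so the search picks feature 0 with threshold 2.0 and a total RSS of zero.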
Example
>>> model = RecursivePartitioningRegressor(
... x_vars=["x1", "x2"], y_var="y", data=df, sign_level=0.95, max_level=3
... )
>>> predictions = model.predict(df)
>>> summary = model.glance
>>> frame = model.tidy
- class cleands.Prediction.tree.recursive_partitioning_regressor(x, y, sign_level=0.95, max_level=None, random_x=False, level='')
Bases: prediction_model
Recursive partitioning regression tree.
Splits data into subgroups to minimize residual sum of squares (RSS). Supports significance-based stopping and random feature subsets.
- Parameters:
x (np.ndarray) – Predictor matrix of shape (n_obs, n_features).
y (np.ndarray) – Target vector of shape (n_obs,).
sign_level (float, optional) – Significance threshold for stopping. Default 0.95.
max_level (int | None, optional) – Maximum depth of the tree. Default None.
random_x (bool, optional) – If True, use random feature subsets (RF-style). Default False.
level (str, optional) – Internal recursion marker (L/R path string). Default ''.
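How max_level and the level path string might interact during recursion can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's code: the function `grow`, the nested-dict node layout, and the rule "depth equals the length of the path string" are all hypothetical, and the significance test controlled by sign_level and the random_x feature subsetting are omitted.

```python
import numpy as np

def _rss(y):
    """Residual sum of squares of y around its own mean."""
    return float(np.sum((y - y.mean()) ** 2)) if y.size else 0.0

def grow(x, y, max_level=None, level=""):
    """Grow a toy regression tree as nested dicts (illustrative only)."""
    node = {"level": level, "n": int(y.size), "mean": float(y.mean())}
    # Assumption: depth is the length of the L/R path string.
    if max_level is not None and len(level) >= max_level:
        return node
    best_rss, best_j, best_t = _rss(y), None, None
    for j in range(x.shape[1]):
        for t in np.unique(x[:, j])[:-1]:
            mask = x[:, j] <= t
            total = _rss(y[mask]) + _rss(y[~mask])
            if total < best_rss:
                best_rss, best_j, best_t = total, j, float(t)
    if best_j is None:  # no split lowers RSS, so this node stays a leaf
        return node
    mask = x[:, best_j] <= best_t
    node.update(feature=best_j, threshold=best_t)
    # Children extend the path string with "L" or "R".
    node["left"] = grow(x[mask], y[mask], max_level, level + "L")
    node["right"] = grow(x[~mask], y[~mask], max_level, level + "R")
    return node

def leaf_paths(node):
    """Collect the path strings of all leaves, left to right."""
    if "left" not in node:
        return [node["level"]]
    return leaf_paths(node["left"]) + leaf_paths(node["right"])

x = np.arange(8, dtype=float).reshape(-1, 1)
y = np.array([0.0, 0.0, 1.0, 1.0, 4.0, 4.0, 9.0, 9.0])
tree = grow(x, y, max_level=2)
paths = leaf_paths(tree)
```

With max_level=2 no leaf path is longer than two characters. The right branch stops early because its two targets are identical and no split can reduce its RSS.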
- predict(newx, fitted=False)
Predict target values for new data.
- Parameters:
newx (np.ndarray) – New predictor data.
fitted (bool) – If True, assume newx already contains only the column subset used in training. Default False.
- Returns:
Predicted values.
- Return type:
np.ndarray
- property fitted
Return fitted values (in-sample predictions).
- Returns:
Predictions for training data.
- Return type:
np.ndarray
- property tidy
Structured node summary of the tree.
- Returns:
Node-wise information including levels, splits, and statistics.
- Return type:
pd.DataFrame
- property glance
One-row summary of overall tree performance.
- Returns:
Contains RSS, MSE, RMSE, R², and degrees of freedom.
- Return type:
pd.DataFrame
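The fit statistics named above follow from the residuals in the usual way. The sketch below computes them with NumPy; the function name `fit_summary` and the dictionary keys are hypothetical, and MSE is assumed to divide by the number of observations (the library may instead divide by residual degrees of freedom, which this sketch does not attempt to reproduce).

```python
import numpy as np

def fit_summary(y, fitted):
    """Compute RSS, MSE, RMSE and R-squared from targets and fitted values.

    Hypothetical helper; key names are illustrative, not the cleands columns.
    """
    resid = y - fitted
    rss = float(np.sum(resid ** 2))
    tss = float(np.sum((y - y.mean()) ** 2))  # total sum of squares
    mse = rss / y.size  # assumption: divide by n, not by degrees of freedom
    return {
        "rss": rss,
        "mse": mse,
        "rmse": mse ** 0.5,
        "r_squared": 1.0 - rss / tss,
    }

y = np.array([1.0, 2.0, 3.0, 4.0])
fitted = np.array([1.5, 1.5, 3.5, 3.5])  # two leaves, each predicting its mean
metrics = fit_summary(y, fitted)
```

Wrapping the result as `pd.DataFrame([metrics])` would give a one-row frame of the kind glance is documented to return.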
- class cleands.Prediction.tree.RecursivePartitioningRegressor(formula, data, *args, **kwargs)
Bases: PredictionModel
Convenience wrapper for recursive partitioning regression.
This model fits a regression tree by recursively splitting the predictor space to minimize residual sum of squares. It provides a formula/DataFrame interface for recursive_partitioning_regressor.
- Variables:
MODEL_TYPE (ClassVar[Type[cleands.base.supervised_model]]) – Underlying model type, fixed to recursive_partitioning_regressor.
- Parameters:
formula (str)
data (DataFrame)
Example
>>> model = RecursivePartitioningRegressor.from_formula("y ~ x1 + x2", data=df, max_level=3)
>>> model.predict(df[["x1", "x2"]])