cleands.Prediction.glm module

class cleands.Prediction.glm.linear_model(x, y, *args, **kwargs)[source]

Bases: prediction_model, prediction_likelihood_model

Ordinary least squares linear regression.

Inherits from:
  • prediction_model: supervised regression base.

  • prediction_likelihood_model: provides log-likelihood evaluation.

Variables:

params (np.ndarray) – Estimated regression coefficients.

Parameters:

y (ndarray)

predict(newdata)[source]

Predict responses for new data.

Parameters:

newdata (np.ndarray) – Design matrix for prediction.

Returns:

Predicted values.

Return type:

np.ndarray

evaluate_lnL(pred)[source]

Evaluate log-likelihood of predictions under Gaussian errors.

Parameters:

pred (np.ndarray) – Predicted values.

Returns:

Log-likelihood value.

Return type:

float
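The following is a minimal sketch of the two operations documented above: a least-squares fit followed by a Gaussian log-likelihood evaluation. It uses NumPy only and is not the package's implementation. The helper names `fit_ols` and `gaussian_lnL` are hypothetical, and plugging in the maximum-likelihood variance estimate is an assumption about how the likelihood is concentrated.

```python
import numpy as np


def fit_ols(x, y):
    """Return coefficients minimizing the sum of squared residuals."""
    params, *_ = np.linalg.lstsq(x, y, rcond=None)
    return params


def gaussian_lnL(y, pred):
    """Gaussian log-likelihood with the error variance at its ML estimate."""
    n = y.shape[0]
    resid = y - pred
    sigma2 = resid @ resid / n  # ML estimate: divides by n, not n - k
    return float(-0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0))


# Simulated data: intercept column plus one predictor (illustrative only).
rng = np.random.default_rng(0)
x = np.column_stack([np.ones(50), rng.normal(size=50)])
y = x @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=50)

params = fit_ols(x, y)       # analogue of the params attribute
pred = x @ params            # analogue of predict(newdata) with newdata = x
lnL = gaussian_lnL(y, pred)  # analogue of evaluate_lnL(pred)
```

Because the coefficients minimize the residual sum of squares, any other prediction in the column space of `x` yields a lower log-likelihood under this formula.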

class cleands.Prediction.glm.logistic_regressor(*args, **kwargs)[source]

Bases: linear_model, variance_model

Logistic regression with Newton–Raphson estimation.

Provides likelihood-based fit, variance-covariance matrix, and pseudo-R² measures (McFadden, Ben-Akiva–Lerman).

property vcov_params: ndarray

Variance-covariance matrix of parameters.

evaluate_lnL(pred)[source]

Log-likelihood for Bernoulli outcomes.

Parameters:

pred (np.ndarray) – Predicted probabilities.

Returns:

Log-likelihood value.

Return type:

float
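As an illustration, the Bernoulli log-likelihood of a vector of predicted probabilities can be sketched as below. The function name `bernoulli_lnL` and the clipping constant `eps` are assumptions introduced here to avoid `log(0)`; the package may handle boundary probabilities differently.

```python
import numpy as np


def bernoulli_lnL(y, pred, eps=1e-12):
    """Sum of y*log(p) + (1 - y)*log(1 - p) over all observations."""
    p = np.clip(pred, eps, 1.0 - eps)  # guard against log(0)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


y = np.array([1.0, 0.0, 1.0, 1.0])
pred = np.array([0.9, 0.2, 0.7, 0.6])
lnL = bernoulli_lnL(y, pred)  # equals log(0.9 * 0.8 * 0.7 * 0.6)
```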

gradient(coefs)[source]

Gradient of the log-likelihood.

Parameters:

coefs (ndarray)

Return type:

ndarray

hessian(coefs)[source]

Hessian matrix of the log-likelihood.

Parameters:

coefs (ndarray)

Return type:

ndarray
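The gradient and Hessian above are the ingredients of a Newton–Raphson update. A minimal sketch under stated assumptions follows: the functions here take `x` and `y` explicitly (the documented methods take only `coefs`), the starting value is zero, and the stopping rule is a guess, not the package's. The last line shows the usual inverse-negative-Hessian estimate of the parameter covariance, which is one common way to obtain a quantity like `vcov_params`.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gradient(coefs, x, y):
    """Score vector: X'(y - p)."""
    return x.T @ (y - sigmoid(x @ coefs))


def hessian(coefs, x):
    """Hessian of the log-likelihood: -X' diag(p(1 - p)) X."""
    p = sigmoid(x @ coefs)
    return -(x.T * (p * (1.0 - p))) @ x


def newton_raphson(x, y, tol=1e-10, max_iter=50):
    coefs = np.zeros(x.shape[1])
    for _ in range(max_iter):
        # Newton step: solve H @ step = g, then move against it.
        step = np.linalg.solve(hessian(coefs, x), gradient(coefs, x, y))
        coefs = coefs - step
        if np.max(np.abs(step)) < tol:
            break
    return coefs


# Simulated binary data (illustrative only).
rng = np.random.default_rng(1)
x = np.column_stack([np.ones(500), rng.normal(size=500)])
y = (rng.uniform(size=500) < sigmoid(x @ np.array([-0.5, 1.0]))).astype(float)

coefs = newton_raphson(x, y)
vcov = np.linalg.inv(-hessian(coefs, x))  # inverse observed information
```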

predict(target)[source]

Predict probabilities for new data.

Parameters:

target (ndarray)

Return type:

ndarray

property mcfaddens_r_squared: float

McFadden’s pseudo-R².

property ben_akiva_lerman_r_squared: float

Ben-Akiva–Lerman pseudo-R².
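For reference, the textbook definitions of the two pseudo-R² measures can be sketched as follows. McFadden's measure compares the fitted log-likelihood with that of an intercept-only model, and the Ben-Akiva–Lerman measure averages the probability assigned to the outcome that actually occurred. The function signatures are hypothetical (the documented versions are properties with no arguments), and whether the package computes the null model this way is an assumption.

```python
import numpy as np


def bernoulli_lnL(y, p, eps=1e-12):
    p = np.clip(p, eps, 1.0 - eps)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def mcfaddens_r_squared(y, p):
    """1 - lnL(model) / lnL(null), with an intercept-only null model."""
    p_null = np.full_like(p, y.mean())
    return 1.0 - bernoulli_lnL(y, p) / bernoulli_lnL(y, p_null)


def ben_akiva_lerman_r_squared(y, p):
    """Mean predicted probability of the observed outcome."""
    return float(np.mean(y * p + (1.0 - y) * (1.0 - p)))


y = np.array([1.0, 0.0, 1.0, 0.0])
p = np.array([0.8, 0.3, 0.6, 0.4])

mcf = mcfaddens_r_squared(y, p)
bal = ben_akiva_lerman_r_squared(y, p)  # mean of 0.8, 0.7, 0.6, 0.6
```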

marginal_effects(newx=None, average=True)[source]

Compute marginal effects of predictors.

Parameters:
  • newx (np.ndarray | pd.DataFrame, optional) – New design matrix. If None, use training data.

  • average (bool) – If True, return average marginal effects. If False, return case-specific effects.

Returns:

Marginal effects.

Return type:

np.ndarray
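For a logit model, the marginal effect of a continuous predictor j on the predicted probability is p(1 − p)·β_j. The sketch below computes case-specific and average effects from that formula. The function name `logit_marginal_effects` and its explicit `coefs` argument are assumptions for illustration; the documented method may treat intercepts or discrete predictors differently.

```python
import numpy as np


def logit_marginal_effects(coefs, x, average=True):
    """Derivative of the predicted probability with respect to each column of x."""
    p = 1.0 / (1.0 + np.exp(-(x @ coefs)))
    # One row per case, one column per coefficient.
    effects = (p * (1.0 - p))[:, None] * coefs[None, :]
    return effects.mean(axis=0) if average else effects


coefs = np.array([0.0, 2.0])
x = np.array([[1.0, 0.0], [1.0, 0.0]])  # linear predictor is 0, so p = 0.5

ame = logit_marginal_effects(coefs, x)                 # average effects
case = logit_marginal_effects(coefs, x, average=False)  # case-specific effects
```

Note that this sketch also reports a value for the intercept column, which has no meaningful marginal effect.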

class cleands.Prediction.glm.least_squares_regressor(x, y, white=False, hc=3, *args, **kwargs)[source]

Bases: linear_model, variance_model

Ordinary least squares regression with optional heteroskedasticity-robust (White) standard errors.

Parameters:
  • y (ndarray)

  • white (bool)

  • hc (int)

property vcov_params

Variance-covariance matrix of parameters.
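The sketch below shows one plausible reading of the `white` and `hc` arguments: the classical covariance when `white=False`, and a White sandwich estimator of type HC0 to HC3 otherwise. This mapping is an assumption based on the standard HC naming, not a description of the package's code, and `ols_vcov` is a hypothetical helper.

```python
import numpy as np


def ols_vcov(x, y, white=False, hc=3):
    """Classical or heteroskedasticity-consistent covariance of OLS coefficients."""
    n, k = x.shape
    xtx_inv = np.linalg.inv(x.T @ x)
    resid = y - x @ (xtx_inv @ x.T @ y)
    if not white:
        return xtx_inv * (resid @ resid / (n - k))
    h = np.einsum("ij,jk,ik->i", x, xtx_inv, x)  # leverages h_i
    if hc == 0:
        omega = resid**2
    elif hc == 1:
        omega = resid**2 * n / (n - k)
    elif hc == 2:
        omega = resid**2 / (1.0 - h)
    elif hc == 3:
        omega = resid**2 / (1.0 - h) ** 2
    else:
        raise ValueError("hc must be 0, 1, 2, or 3")
    meat = (x.T * omega) @ x
    return xtx_inv @ meat @ xtx_inv  # sandwich: bread @ meat @ bread


# Simulated heteroskedastic data (illustrative only).
rng = np.random.default_rng(2)
x = np.column_stack([np.ones(200), rng.normal(size=200)])
y = x @ np.array([1.0, 0.5]) + rng.normal(size=200) * (1.0 + np.abs(x[:, 1]))

v_classic = ols_vcov(x, y)
v_hc0 = ols_vcov(x, y, white=True, hc=0)
v_hc3 = ols_vcov(x, y, white=True, hc=3)
```

HC3 inflates each squared residual by 1/(1 − h_i)², so its variances are never smaller than the HC0 ones.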

class cleands.Prediction.glm.poisson_regressor(x, y, *args, **kwargs)[source]

Bases: linear_model

Poisson regression for count data.

Parameters:

y (ndarray)

property vcov_params: ndarray

Variance-covariance matrix of parameters.

evaluate_lnL(pred)[source]

Log-likelihood for Poisson-distributed outcomes.

Parameters:

pred (ndarray)

Return type:

float
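As an illustration, the Poisson log-likelihood of a vector of expected counts is the sum of y·log(μ) − μ − log(y!). The helper name `poisson_lnL` is hypothetical, and whether the package keeps or drops the constant log(y!) term is not stated in this reference.

```python
import math

import numpy as np


def poisson_lnL(y, pred):
    """Sum of y*log(mu) - mu - log(y!) over all observations."""
    log_fact = np.array([math.lgamma(v + 1.0) for v in y])  # log(y!)
    return float(np.sum(y * np.log(pred) - pred - log_fact))


y = np.array([0.0, 1.0, 2.0])
pred = np.array([1.0, 1.0, 2.0])
lnL = poisson_lnL(y, pred)  # equals log(2) - 4
```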

gradient(coefs)[source]

Gradient of the log-likelihood.

Parameters:

coefs (ndarray)

Return type:

ndarray

hessian(coefs)[source]

Hessian matrix of the log-likelihood.

Parameters:

coefs (ndarray)

Return type:

ndarray
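With a log link, the Poisson score is X'(y − μ) and the Hessian is −X' diag(μ) X, where μ = exp(Xβ). A minimal Newton–Raphson sketch built from these follows. Passing `x` and `y` explicitly, the zero starting value, and the stopping rule are assumptions made for a self-contained example; the reference does not say which optimizer the package uses for this class.

```python
import numpy as np


def gradient(coefs, x, y):
    """Score vector: X'(y - mu) with mu = exp(X @ coefs)."""
    return x.T @ (y - np.exp(x @ coefs))


def hessian(coefs, x):
    """Hessian of the log-likelihood: -X' diag(mu) X."""
    mu = np.exp(x @ coefs)
    return -(x.T * mu) @ x


def fit_poisson(x, y, tol=1e-10, max_iter=100):
    coefs = np.zeros(x.shape[1])
    for _ in range(max_iter):
        step = np.linalg.solve(hessian(coefs, x), gradient(coefs, x, y))
        coefs = coefs - step
        if np.max(np.abs(step)) < tol:
            break
    return coefs


# Simulated count data (illustrative only).
rng = np.random.default_rng(3)
x = np.column_stack([np.ones(300), rng.normal(size=300)])
y = rng.poisson(np.exp(x @ np.array([0.5, 0.3]))).astype(float)

coefs = fit_poisson(x, y)
pred = np.exp(x @ coefs)                  # expected counts
vcov = np.linalg.inv(-hessian(coefs, x))  # inverse observed information
```

When the design matrix contains an intercept column, the first-order condition forces the fitted counts to sum to the observed counts.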

predict(target)[source]

Predict expected counts for new data.

Parameters:

target (ndarray)

Return type:

ndarray

cleands.Prediction.glm.backward_stepwise(model, criterion='aic', keep_vars=None, min_features=1, verbose=False)[source]

Perform backward stepwise feature selection.

Iteratively removes features to optimize model fit according to a selection criterion (e.g., AIC, BIC, MSE).

Parameters:
  • model (Any) – Model object: either a raw supervised_model with .x and .y, or a SupervisedModel wrapper with .x_vars, .y_var, .data, and .model_type.

  • criterion (str) – Model selection criterion (“aic”, “bic”, “mse”, etc.).

  • keep_vars (list[str]) – Variable names that must not be dropped.

  • min_features (int) – Minimum number of features to retain.

  • verbose (bool) – If True, print progress messages.

Returns:

A dictionary with:
  • “model”: The best fitted model.

  • “selected_features”: List of selected feature names.

  • “history”: pd.DataFrame with stepwise history.

Return type:

Dict[str, Any]
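The general idea of backward elimination can be sketched with plain NumPy and an OLS-based AIC. This is a simplified stand-in, not the package's algorithm: `ols_aic` and `backward_stepwise_sketch` are hypothetical helpers, the history is a list of tuples (the documented function returns a pd.DataFrame), and stopping as soon as no removal improves the criterion is an assumption.

```python
import numpy as np


def ols_aic(x, y):
    """AIC of an OLS fit, counting only the regression coefficients."""
    n, k = x.shape
    params, *_ = np.linalg.lstsq(x, y, rcond=None)
    resid = y - x @ params
    sigma2 = resid @ resid / n
    lnL = -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
    return 2.0 * k - 2.0 * lnL


def backward_stepwise_sketch(x, y, names, keep_vars=(), min_features=1):
    col = {name: i for i, name in enumerate(names)}
    selected = list(names)
    best = ols_aic(x[:, [col[v] for v in selected]], y)
    history = [(tuple(selected), best)]
    while len(selected) > min_features:
        # Score every model obtained by dropping one removable feature.
        trials = []
        for v in selected:
            if v in keep_vars:
                continue
            rest = [col[s] for s in selected if s != v]
            trials.append((ols_aic(x[:, rest], y), v))
        if not trials:
            break
        score, drop = min(trials)
        if score >= best:  # no removal improves the criterion
            break
        selected.remove(drop)
        best = score
        history.append((tuple(selected), best))
    return {"selected_features": selected, "aic": best, "history": history}


# Simulated data: one informative predictor and one pure-noise column.
rng = np.random.default_rng(4)
n = 200
x1, noise = rng.normal(size=n), rng.normal(size=n)
x = np.column_stack([np.ones(n), x1, noise])
y = 1.0 + 2.0 * x1 + rng.normal(scale=0.5, size=n)

result = backward_stepwise_sketch(x, y, ["const", "x1", "noise"], keep_vars=("const",))
```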

cleands.Prediction.glm.forward_stepwise(model, criterion='aic', keep_vars=None, max_features=None, prefer_intercept=True, verbose=False)[source]

Perform forward stepwise feature selection.

Iteratively adds features to optimize a model according to a selection criterion (e.g., AIC, BIC, MSE). Supports both raw models and SupervisedModel wrappers, with optional intercept preference.

Parameters:
  • model (Any) – Model object: either a raw supervised_model with .x and .y, or a SupervisedModel wrapper with .x_vars, .y_var, .data, and .model_type.

  • criterion (str) – Model selection criterion (“aic”, “bic”, “mse”, etc.).

  • keep_vars (List[str] | None) – Variables that must always be included.

  • max_features (int | None) – Maximum number of features allowed to be selected.

  • prefer_intercept (bool) – If True, attempt to start with an intercept term (if detected).

  • verbose (bool) – If True, print progress messages.

Returns:

A dictionary with:
  • “model”: The best fitted model.

  • “selected_features”: List of selected feature names.

  • “history”: pd.DataFrame with stepwise history.

Return type:

Dict[str, Any]
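Forward selection mirrors the procedure in the opposite direction: start from the variables that must be kept and add the single feature that most improves the criterion until nothing helps or `max_features` is reached. The sketch below is a simplified NumPy stand-in with hypothetical helper names. It does not reproduce the package's intercept detection (`prefer_intercept`); here the intercept is simply passed through `keep_vars`.

```python
import numpy as np


def ols_aic(x, y):
    """AIC of an OLS fit, counting only the regression coefficients."""
    n, k = x.shape
    params, *_ = np.linalg.lstsq(x, y, rcond=None)
    resid = y - x @ params
    sigma2 = resid @ resid / n
    lnL = -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
    return 2.0 * k - 2.0 * lnL


def forward_stepwise_sketch(x, y, names, keep_vars=(), max_features=None):
    col = {name: i for i, name in enumerate(names)}
    selected = [v for v in names if v in keep_vars]
    remaining = [v for v in names if v not in selected]
    limit = len(names) if max_features is None else max_features
    # With nothing selected yet, any first feature counts as an improvement.
    best = ols_aic(x[:, [col[v] for v in selected]], y) if selected else np.inf
    history = [(tuple(selected), best)]
    while remaining and len(selected) < limit:
        # Score every model obtained by adding one remaining feature.
        trials = [
            (ols_aic(x[:, [col[s] for s in selected + [v]]], y), v)
            for v in remaining
        ]
        score, add = min(trials)
        if score >= best:  # no addition improves the criterion
            break
        selected.append(add)
        remaining.remove(add)
        best = score
        history.append((tuple(selected), best))
    return {"selected_features": selected, "aic": best, "history": history}


# Simulated data: one informative predictor and one pure-noise column.
rng = np.random.default_rng(5)
n = 200
x1, noise = rng.normal(size=n), rng.normal(size=n)
x = np.column_stack([np.ones(n), x1, noise])
y = 1.0 + 2.0 * x1 + rng.normal(scale=0.5, size=n)
names = ["const", "x1", "noise"]

result = forward_stepwise_sketch(x, y, names, keep_vars=("const",))
capped = forward_stepwise_sketch(x, y, names, keep_vars=("const",), max_features=1)
```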

cleands.Prediction.glm.stepwise(model, direction='both', criterion='aic', keep_vars=None, min_features=1, max_features=None, prefer_intercept=True, verbose=False)[source]

Unified stepwise selection wrapper.

Routes to forward, backward, or both directions and returns the best model by a vote across metrics when direction=“both”.

Parameters:
  • model (Any) – Model object: either a raw supervised_model with .x and .y, or a SupervisedModel wrapper with .x_vars, .y_var, .data, and .model_type.

  • direction (str) – Stepwise direction: “forwards” (forward selection), “backwards” (backward elimination), or “both” (run both and select the better).

  • criterion (str) – Model selection criterion (“aic”, “bic”, “mse”, etc.).

  • keep_vars (List[str] | None) – Variable names that must always be included.

  • min_features (int) – Minimum number of features (for backward).

  • max_features (int | None) – Maximum number of features (for forward).

  • prefer_intercept (bool) – If True, prefer/include an intercept where applicable.

  • verbose (bool) – If True, print selection progress.

Returns:

A dictionary with:
  • “model”: Best fitted model.

  • “selected_features”: List of chosen features.

  • “history”: pd.DataFrame of the chosen direction’s history.

  • “direction_chosen”: One of {“forwards”, “backwards”}.

  • “comparison”: Dict of per-metric winners (only if direction=“both”).

Return type:

Dict[str, Any]
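The reference does not spell out how the vote across metrics is taken. One plausible reading, shown purely as an assumption, is a per-metric comparison followed by a majority vote. In the sketch below, `choose_direction` is a hypothetical helper, every metric is assumed to be lower-is-better, and ties go to the forward result; none of these details are confirmed by the package.

```python
def choose_direction(forward_metrics, backward_metrics):
    """Pick a direction by majority vote over lower-is-better metrics."""
    comparison = {}
    for metric, fwd_value in forward_metrics.items():
        bwd_value = backward_metrics[metric]
        comparison[metric] = "forwards" if fwd_value <= bwd_value else "backwards"
    votes = list(comparison.values())
    # Tie-breaking in favour of "forwards" is an arbitrary choice for this sketch.
    if votes.count("forwards") >= votes.count("backwards"):
        chosen = "forwards"
    else:
        chosen = "backwards"
    return {"direction_chosen": chosen, "comparison": comparison}


# Made-up metric values for two candidate models (illustrative only).
forward_metrics = {"aic": 310.2, "bic": 321.5, "mse": 0.27}
backward_metrics = {"aic": 308.9, "bic": 323.0, "mse": 0.25}

outcome = choose_direction(forward_metrics, backward_metrics)
```

Here the backward model wins on AIC and MSE and the forward model on BIC, so the vote selects “backwards”.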

class cleands.Prediction.glm.LeastSquaresRegressor(formula, data, *args, **kwargs)[source]

Bases: PredictionModel

Ordinary least squares (OLS) regression.

A high-level wrapper around least_squares_regressor that provides a formula interface and pandas-aware prediction methods. Fits a linear model by minimizing the sum of squared residuals.

This class inherits from PredictionModel, which handles parsing the formula, extracting variables from a DataFrame, and exposing tidy/glance summaries consistent with the rest of the package.

Examples

Fit an OLS regression from a formula:

>>> model = LeastSquaresRegressor("y ~ x1 + x2", data=df)
>>> model.tidy         # coefficient table
>>> model.glance       # model summary
>>> preds = model.predict(df)

Variables:
  • MODEL_TYPE (Type[supervised_model]) – The underlying implementation (least_squares_regressor).

  • formula (str) – Formula string used to specify the model.

  • x_vars (list[str]) – Predictor variable names.

  • y_var (str) – Response variable name.

  • data (pd.DataFrame) – Parsed DataFrame containing predictors and response.

  • model (least_squares_regressor) – Fitted underlying OLS model.

Parameters:
  • formula (str)

  • data (DataFrame)

class cleands.Prediction.glm.LogisticRegressor(formula, data, *args, **kwargs)[source]

Bases: PredictionModel

Logistic regression for binary outcomes.

A high-level wrapper around logistic_regressor that provides a formula interface and pandas-aware prediction methods. Fits a generalized linear model with a logit link, estimating probabilities for binary response variables.

This class inherits from PredictionModel, which handles parsing the formula, extracting variables from a DataFrame, and exposing tidy/glance summaries consistent with the rest of the package.

Examples

Fit a logistic regression model from a formula:

>>> model = LogisticRegressor("y ~ x1 + x2", data=df)
>>> model.tidy          # coefficient table with log-odds
>>> model.glance        # model fit summary (AIC, log-likelihood, etc.)
>>> probs = model.predict(df)   # predicted probabilities

Variables:
  • MODEL_TYPE (Type[supervised_model]) – The underlying implementation (logistic_regressor).

  • formula (str) – Formula string used to specify the model.

  • x_vars (list[str]) – Predictor variable names.

  • y_var (str) – Response variable name.

  • data (pd.DataFrame) – Parsed DataFrame containing predictors and response.

  • model (logistic_regressor) – Fitted underlying logistic regression model.

Parameters:
  • formula (str)

  • data (DataFrame)

class cleands.Prediction.glm.PoissonRegressor(formula, data, *args, **kwargs)[source]

Bases: PredictionModel

Poisson regression for count outcomes.

A high-level wrapper around poisson_regressor that provides a formula interface and pandas-aware prediction methods. Fits a generalized linear model with a log link, appropriate for count data where the variance is equal to the mean.

This class inherits from PredictionModel, which handles parsing the formula, extracting variables from a DataFrame, and exposing tidy/glance summaries consistent with the rest of the package.

Examples

Fit a Poisson regression model from a formula:

>>> model = PoissonRegressor("y ~ x1 + x2", data=df)
>>> model.tidy          # coefficient table with log incidence rate ratios
>>> model.glance        # model summary (deviance, AIC, etc.)
>>> rates = model.predict(df)   # expected counts

Variables:
  • MODEL_TYPE (Type[supervised_model]) – The underlying implementation (poisson_regressor).

  • formula (str) – Formula string used to specify the model.

  • x_vars (list[str]) – Predictor variable names.

  • y_var (str) – Response variable name.

  • data (pd.DataFrame) – Parsed DataFrame containing predictors and response.

  • model (poisson_regressor) – Fitted underlying Poisson regression model.

Parameters:
  • formula (str)

  • data (DataFrame)