cleands.Prediction.glm module
- class cleands.Prediction.glm.linear_model(x, y, *args, **kwargs)[source]
Bases: prediction_model, prediction_likelihood_model
Ordinary least squares linear regression.
- Inherits from:
prediction_model: supervised regression base.
prediction_likelihood_model: provides log-likelihood evaluation.
- Variables:
params (np.ndarray) – Estimated regression coefficients.
- Parameters:
y (ndarray)
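As a rough illustration of what estimating params involves, the sketch below fits least-squares coefficients with NumPy. The helper name fit_ols is hypothetical and is not part of cleands; the library's own fitting routine may differ.

```python
import numpy as np

def fit_ols(x, y):
    """Least-squares coefficients for y ~ x (hypothetical helper, not the cleands API).

    x is assumed to already contain an intercept column if one is wanted.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # lstsq minimizes ||y - x @ params||^2 and tolerates rank-deficient x.
    params, *_ = np.linalg.lstsq(x, y, rcond=None)
    return params

# Noise-free data, so the fit recovers the generating coefficients.
x = np.column_stack([np.ones(5), np.arange(5.0)])
y = 2.0 + 3.0 * np.arange(5.0)
params = fit_ols(x, y)
```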
- class cleands.Prediction.glm.logistic_regressor(*args, **kwargs)[source]
Bases: linear_model, variance_model
Logistic regression with Newton–Raphson estimation.
Provides likelihood-based fit, variance-covariance matrix, and pseudo-R² measures (McFadden, Ben-Akiva–Lerman).
- property vcov_params: ndarray
Variance-covariance matrix of parameters.
- evaluate_lnL(pred)[source]
Log-likelihood for Bernoulli outcomes.
- Parameters:
pred (np.ndarray) – Predicted probabilities.
- Returns:
Log-likelihood value.
- Return type:
float
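A minimal sketch of a Bernoulli log-likelihood, assuming binary outcomes y coded 0/1. The documented method takes only pred and reads the outcomes from the fitted model; here y is passed explicitly, and the clipping constant eps is an assumption added to avoid log(0).

```python
import numpy as np

def bernoulli_lnL(y, pred, eps=1e-12):
    """Sum of y*log(p) + (1-y)*log(1-p) (illustrative, not the cleands source)."""
    y = np.asarray(y, dtype=float)
    # Clip probabilities away from 0 and 1 so the logs stay finite.
    p = np.clip(np.asarray(pred, dtype=float), eps, 1.0 - eps)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

y = np.array([1, 0, 1, 1])
pred = np.array([0.5, 0.5, 0.5, 0.5])
lnL = bernoulli_lnL(y, pred)  # equals 4 * log(0.5)
```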
- gradient(coefs)[source]
Gradient of the log-likelihood.
- Parameters:
coefs (ndarray)
- Return type:
ndarray
- hessian(coefs)[source]
Hessian matrix of the log-likelihood.
- Parameters:
coefs (ndarray)
- Return type:
ndarray
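The gradient and Hessian above are the ingredients of a Newton–Raphson update. The sketch below shows the textbook logit versions and one way to iterate them; the explicit x and y arguments, the starting values, and the stopping rule are assumptions, since the documented methods take only coefs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient(coefs, x, y):
    # Score of the logit log-likelihood: X'(y - p).
    return x.T @ (y - sigmoid(x @ coefs))

def hessian(coefs, x):
    # Second derivative: -X' diag(p(1-p)) X, negative definite for full-rank X.
    p = sigmoid(x @ coefs)
    w = p * (1.0 - p)
    return -(x * w[:, None]).T @ x

def newton_raphson(x, y, tol=1e-10, max_iter=50):
    coefs = np.zeros(x.shape[1])  # assumed starting point
    for _ in range(max_iter):
        # Newton step: coefs_new = coefs - H^{-1} g.
        step = np.linalg.solve(hessian(coefs, x), gradient(coefs, x, y))
        coefs = coefs - step
        if np.max(np.abs(step)) < tol:
            break
    return coefs

# Simulated data for illustration only.
rng = np.random.default_rng(0)
x = np.column_stack([np.ones(200), rng.normal(size=200)])
y = (rng.uniform(size=200) < sigmoid(x @ np.array([-0.5, 1.0]))).astype(float)
coefs = newton_raphson(x, y)
# A common estimate of the parameter covariance is the inverse of the negative Hessian.
vcov = np.linalg.inv(-hessian(coefs, x))
```

Whether vcov_params is computed exactly this way in cleands is not stated in this reference; the inverse negative Hessian is simply the standard maximum-likelihood choice.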
- predict(target)[source]
Predict probabilities for new data.
- Parameters:
target (ndarray)
- Return type:
ndarray
- property mcfaddens_r_squared: float
McFadden’s pseudo-R².
- property ben_akiva_lerman_r_squared: float
Ben-Akiva–Lerman pseudo-R².
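For orientation, the two pseudo-R² measures can be sketched from their usual definitions: McFadden's compares the fitted log-likelihood with that of an intercept-only model, and Ben-Akiva–Lerman averages the probability assigned to the observed outcome. The function signatures below are hypothetical; the library exposes these as properties on a fitted model.

```python
import numpy as np

def bernoulli_lnL(y, p):
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def mcfaddens_r_squared(y, p):
    # Null model: every case gets the sample mean of y as its probability.
    null_p = np.full_like(p, y.mean())
    return 1.0 - bernoulli_lnL(y, p) / bernoulli_lnL(y, null_p)

def ben_akiva_lerman_r_squared(y, p):
    # Mean predicted probability of the outcome that actually occurred.
    return float(np.mean(y * p + (1.0 - y) * (1.0 - p)))

y = np.array([1.0, 0.0, 1.0, 0.0])
p = np.array([0.8, 0.2, 0.6, 0.4])
mcf = mcfaddens_r_squared(y, p)
bal = ben_akiva_lerman_r_squared(y, p)  # mean of 0.8, 0.8, 0.6, 0.6
```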
- marginal_effects(newx=None, average=True)[source]
Compute marginal effects of predictors.
- Parameters:
newx (np.ndarray | pd.DataFrame, optional) – New design matrix. If None, use training data.
average (bool) – If True, return average marginal effects. If False, return case-specific effects.
- Returns:
Marginal effects.
- Return type:
np.ndarray
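A sketch of logit marginal effects under the usual derivative formula, dp/dx_j = p(1 - p) * beta_j. It treats every column as continuous (including any intercept column, whose "effect" is not meaningful), and the function name and explicit coefs argument are assumptions rather than the cleands signature.

```python
import numpy as np

def logit_marginal_effects(x, coefs, average=True):
    """Case-specific or average marginal effects for a logit model (illustrative)."""
    x = np.asarray(x, dtype=float)
    coefs = np.asarray(coefs, dtype=float)
    p = 1.0 / (1.0 + np.exp(-(x @ coefs)))
    # One row per case, one column per coefficient.
    effects = (p * (1.0 - p))[:, None] * coefs[None, :]
    return effects.mean(axis=0) if average else effects

x = np.array([[1.0, 0.0], [1.0, 2.0]])
coefs = np.array([0.0, 1.0])
ame = logit_marginal_effects(x, coefs)                    # average=True
per_case = logit_marginal_effects(x, coefs, average=False)
```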
- class cleands.Prediction.glm.least_squares_regressor(x, y, white=False, hc=3, *args, **kwargs)[source]
Bases: linear_model, variance_model
Ordinary least squares regression with optional robust standard errors.
- Parameters:
y (ndarray)
white (bool)
hc (int)
- property vcov_params
Variance-covariance matrix of parameters.
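The sketch below contrasts the classical OLS covariance with a White-style sandwich estimator. Reading hc as the HC0 to HC3 variants is an assumption based on common usage; this reference only says that hc is an integer, so the mapping in cleands may differ.

```python
import numpy as np

def ols_vcov(x, y, white=False, hc=3):
    """Classical or heteroskedasticity-robust covariance (illustrative sketch)."""
    n, k = x.shape
    xtx_inv = np.linalg.inv(x.T @ x)
    params = xtx_inv @ x.T @ y
    resid = y - x @ params
    if not white:
        # Classical estimator: s^2 (X'X)^{-1}.
        return (resid @ resid / (n - k)) * xtx_inv
    # Leverages h_ii = diag(X (X'X)^{-1} X').
    h = np.einsum("ij,jk,ik->i", x, xtx_inv, x)
    if hc == 0:
        w = resid**2
    elif hc == 1:
        w = resid**2 * n / (n - k)
    elif hc == 2:
        w = resid**2 / (1.0 - h)
    elif hc == 3:
        w = resid**2 / (1.0 - h) ** 2
    else:
        raise ValueError("hc must be 0, 1, 2, or 3 in this sketch")
    meat = (x * w[:, None]).T @ x
    return xtx_inv @ meat @ xtx_inv

# Simulated heteroskedastic data for illustration only.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x = np.column_stack([np.ones(50), x1])
y = 1.0 + 2.0 * x1 + rng.normal(size=50) * np.abs(x1)
v_classic = ols_vcov(x, y)
v_hc0 = ols_vcov(x, y, white=True, hc=0)
v_hc3 = ols_vcov(x, y, white=True, hc=3)
```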
- class cleands.Prediction.glm.poisson_regressor(x, y, *args, **kwargs)[source]
Bases: linear_model
Poisson regression for count data.
- Parameters:
y (ndarray)
- property vcov_params: ndarray
Variance-covariance matrix of parameters.
- evaluate_lnL(pred)[source]
Log-likelihood for Poisson-distributed outcomes.
- Parameters:
pred (ndarray)
- Return type:
float
- gradient(coefs)[source]
Gradient of the log-likelihood.
- Parameters:
coefs (ndarray)
- Return type:
ndarray
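A minimal sketch of the Poisson log-likelihood and its gradient under a log link, where the mean is exp(x @ coefs). Passing x and y explicitly is an assumption; the documented methods read them from the fitted model.

```python
import numpy as np
from math import lgamma

def poisson_lnL(y, pred):
    """Sum of y*log(mu) - mu - log(y!) (illustrative, not the cleands source)."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(pred, dtype=float)
    log_factorial = np.array([lgamma(v + 1.0) for v in y])
    return float(np.sum(y * np.log(mu) - mu - log_factorial))

def poisson_gradient(coefs, x, y):
    # Score under a log link: X'(y - exp(X @ coefs)).
    return x.T @ (y - np.exp(x @ coefs))

y = np.array([0.0, 1.0, 2.0])
x = np.ones((3, 1))
coefs = np.array([0.0])             # implies a mean of 1 for every case
lnL = poisson_lnL(y, np.exp(x @ coefs))
grad = poisson_gradient(coefs, x, y)
```

With an intercept-only design and a sample mean of 1, the gradient is zero at coefs = 0, which is what the maximum-likelihood conditions require.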
- cleands.Prediction.glm.backward_stepwise(model, criterion='aic', keep_vars=None, min_features=1, verbose=False)[source]
Perform backward stepwise feature selection.
Iteratively removes features to optimize a model fit according to an information criterion (e.g., AIC, BIC, MSE).
- Parameters:
model (Any) – Model object: either a raw supervised_model with .x and .y, or a SupervisedModel wrapper with .x_vars, .y_var, .data, and .model_type.
criterion (str) – Model selection criterion (“aic”, “bic”, “mse”, etc.).
keep_vars (list[str]) – Variable names that must not be dropped.
min_features (int) – Minimum number of features to retain.
verbose (bool) – If True, print progress messages.
- Returns:
- A dictionary with:
“model”: The best fitted model.
“selected_features”: List of selected feature names.
“history”: pd.DataFrame with stepwise history.
- Return type:
Dict[str, Any]
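The elimination loop can be sketched independently of the library. The example below uses a Gaussian AIC (up to an additive constant) on a plain NumPy design matrix; the helper names, the list-of-tuples history, and the stopping rule are assumptions and are not taken from the cleands implementation.

```python
import numpy as np

def ols_aic(x, y):
    # Gaussian AIC up to a constant: n*log(RSS/n) + 2k.
    n, k = x.shape
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    rss = float(np.sum((y - x @ beta) ** 2))
    return n * np.log(rss / n) + 2 * k

def backward_by_aic(x, y, names, keep_vars=(), min_features=1):
    selected = list(names)
    best = ols_aic(x, y)
    history = [(tuple(selected), best)]
    while len(selected) > min_features:
        candidates = []
        for name in selected:
            if name in keep_vars:
                continue  # protected variables are never dropped
            trial = [v for v in selected if v != name]
            cols = [names.index(v) for v in trial]
            candidates.append((ols_aic(x[:, cols], y), trial))
        if not candidates:
            break
        score, trial = min(candidates, key=lambda c: c[0])
        if score >= best:
            break  # no single removal improves the criterion
        best, selected = score, trial
        history.append((tuple(selected), best))
    return {"selected_features": selected, "history": history}

# Simulated data: y depends on x1 only; "noise" is irrelevant.
rng = np.random.default_rng(1)
x1, noise = rng.normal(size=100), rng.normal(size=100)
x = np.column_stack([np.ones(100), x1, noise])
y = 1.0 + 2.0 * x1 + 0.1 * rng.normal(size=100)
names = ["const", "x1", "noise"]
result = backward_by_aic(x, y, names, keep_vars=("const",))
```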
- cleands.Prediction.glm.forward_stepwise(model, criterion='aic', keep_vars=None, max_features=None, prefer_intercept=True, verbose=False)[source]
Perform forward stepwise feature selection.
Iteratively adds features to optimize a model according to a selection criterion (e.g., AIC, BIC, MSE). Supports both raw models and SupervisedModel wrappers, with optional intercept preference.
- Parameters:
model (Any) – Model object: either a raw supervised_model with .x and .y, or a SupervisedModel wrapper with .x_vars, .y_var, .data, and .model_type.
criterion (str) – Model selection criterion (“aic”, “bic”, “mse”, etc.).
keep_vars (List[str] | None) – Variables that must always be included.
max_features (int | None) – Maximum number of features allowed to be selected.
prefer_intercept (bool) – If True, attempt to start with an intercept term (if detected).
verbose (bool) – If True, print progress messages.
- Returns:
- A dictionary with:
“model”: The best fitted model.
“selected_features”: List of selected feature names.
“history”: pd.DataFrame with stepwise history.
- Return type:
Dict[str, Any]
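Forward selection mirrors the backward loop: start small and add the feature that most improves the criterion. The sketch below assumes a Gaussian AIC and detects an intercept as a constant column; both choices, and all helper names, are illustrative assumptions rather than the cleands logic.

```python
import numpy as np

def ols_aic(x, y):
    # Gaussian AIC up to a constant: n*log(RSS/n) + 2k.
    n, k = x.shape
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    rss = float(np.sum((y - x @ beta) ** 2))
    return n * np.log(rss / n) + 2 * k

def forward_by_aic(x, y, names, max_features=None, prefer_intercept=True):
    selected = []
    if prefer_intercept:
        # Assumed detection rule: the first column whose values are all equal.
        for j, name in enumerate(names):
            if np.all(x[:, j] == x[0, j]):
                selected.append(name)
                break
    limit = len(names) if max_features is None else max_features
    if selected:
        best = ols_aic(x[:, [names.index(v) for v in selected]], y)
    else:
        best = np.inf
    while len(selected) < limit:
        candidates = []
        for name in names:
            if name in selected:
                continue
            trial = selected + [name]
            cols = [names.index(v) for v in trial]
            candidates.append((ols_aic(x[:, cols], y), trial))
        if not candidates:
            break
        score, trial = min(candidates, key=lambda c: c[0])
        if score >= best:
            break  # no single addition improves the criterion
        best, selected = score, trial
    return selected

# Simulated data: y depends on x1 only; "noise" is irrelevant.
rng = np.random.default_rng(1)
x1, noise = rng.normal(size=100), rng.normal(size=100)
x = np.column_stack([np.ones(100), x1, noise])
y = 1.0 + 2.0 * x1 + 0.1 * rng.normal(size=100)
names = ["const", "x1", "noise"]
selected = forward_by_aic(x, y, names, max_features=2)
```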
- cleands.Prediction.glm.stepwise(model, direction='both', criterion='aic', keep_vars=None, min_features=1, max_features=None, prefer_intercept=True, verbose=False)[source]
Unified stepwise selection wrapper.
Routes to forward, backward, or both directions and returns the best model by a vote across metrics when direction=“both”.
- Parameters:
model (Any) – Model object: either a raw supervised_model with .x and .y, or a SupervisedModel wrapper with .x_vars, .y_var, .data, and .model_type.
direction (str) – Stepwise direction: “forwards” (forward selection), “backwards” (backward elimination), or “both” (run both and select the better).
criterion (str) – Model selection criterion (“aic”, “bic”, “mse”, etc.).
keep_vars (List[str] | None) – Variable names that must always be included.
min_features (int) – Minimum number of features (for backward).
max_features (int | None) – Maximum number of features (for forward).
prefer_intercept (bool) – If True, prefer/include an intercept where applicable.
verbose (bool) – If True, print selection progress.
- Returns:
- A dictionary with:
“model”: Best fitted model.
“selected_features”: List of chosen features.
“history”: pd.DataFrame of the chosen direction’s history.
“direction_chosen”: One of {“forwards”, “backwards”}.
“comparison”: Dict of per-metric winners (only if direction=“both”).
- Return type:
Dict[str, Any]
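One plausible reading of the "vote across metrics" is sketched below: each metric names a winner, and the direction with more wins is chosen. The lower-is-better convention, the tie-breaking in favor of forward selection, and the function name are all assumptions; the reference does not specify them.

```python
def vote_between_directions(forward_metrics, backward_metrics):
    """Pick a direction by per-metric majority (hypothetical helper)."""
    comparison = {}
    # Only metrics reported by both runs take part in the vote.
    for metric in sorted(forward_metrics.keys() & backward_metrics.keys()):
        forward_better = forward_metrics[metric] <= backward_metrics[metric]
        comparison[metric] = "forwards" if forward_better else "backwards"
    forward_wins = sum(w == "forwards" for w in comparison.values())
    backward_wins = len(comparison) - forward_wins
    direction = "forwards" if forward_wins >= backward_wins else "backwards"
    return {"direction_chosen": direction, "comparison": comparison}

result = vote_between_directions(
    {"aic": 100.0, "bic": 110.0, "mse": 2.0},
    {"aic": 98.0, "bic": 105.0, "mse": 2.1},
)
```

Here the backward run wins on AIC and BIC while the forward run wins on MSE, so the vote selects "backwards".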
- class cleands.Prediction.glm.LeastSquaresRegressor(formula, data, *args, **kwargs)[source]
Bases: PredictionModel
Ordinary least squares (OLS) regression.
A high-level wrapper around least_squares_regressor that provides a formula interface and pandas-aware prediction methods. Fits a linear model by minimizing the sum of squared residuals.
This class inherits from PredictionModel, which handles parsing the formula, extracting variables from a DataFrame, and exposing tidy/glance summaries consistent with the rest of the package.
Examples
Fit an OLS regression from a formula:
>>> model = LeastSquaresRegressor("y ~ x1 + x2", data=df)
>>> model.tidy  # coefficient table
>>> model.glance  # model summary
>>> preds = model.predict(df)
- Variables:
MODEL_TYPE (Type[supervised_model]) – The underlying implementation (least_squares_regressor).
formula (str) – Formula string used to specify the model.
x_vars (list[str]) – Predictor variable names.
y_var (str) – Response variable name.
data (pd.DataFrame) – Parsed DataFrame containing predictors and response.
model (least_squares_regressor) – Fitted underlying OLS model.
- Parameters:
formula (str)
data (DataFrame)
- class cleands.Prediction.glm.LogisticRegressor(formula, data, *args, **kwargs)[source]
Bases: PredictionModel
Logistic regression for binary outcomes.
A high-level wrapper around logistic_regressor that provides a formula interface and pandas-aware prediction methods. Fits a generalized linear model with a logit link, estimating probabilities for binary response variables.
This class inherits from PredictionModel, which handles parsing the formula, extracting variables from a DataFrame, and exposing tidy/glance summaries consistent with the rest of the package.
Examples
Fit a logistic regression model from a formula:
>>> model = LogisticRegressor("y ~ x1 + x2", data=df)
>>> model.tidy  # coefficient table with log-odds
>>> model.glance  # model fit summary (AIC, log-likelihood, etc.)
>>> probs = model.predict(df)  # predicted probabilities
- Variables:
MODEL_TYPE (Type[supervised_model]) – The underlying implementation (logistic_regressor).
formula (str) – Formula string used to specify the model.
x_vars (list[str]) – Predictor variable names.
y_var (str) – Response variable name.
data (pd.DataFrame) – Parsed DataFrame containing predictors and response.
model (logistic_regressor) – Fitted underlying logistic regression model.
- Parameters:
formula (str)
data (DataFrame)
- class cleands.Prediction.glm.PoissonRegressor(formula, data, *args, **kwargs)[source]
Bases: PredictionModel
Poisson regression for count outcomes.
A high-level wrapper around poisson_regressor that provides a formula interface and pandas-aware prediction methods. Fits a generalized linear model with a log link, appropriate for count data whose variance is assumed equal to the mean.
This class inherits from PredictionModel, which handles parsing the formula, extracting variables from a DataFrame, and exposing tidy/glance summaries consistent with the rest of the package.
Examples
Fit a Poisson regression model from a formula:
>>> model = PoissonRegressor("y ~ x1 + x2", data=df)
>>> model.tidy  # coefficient table with log incidence rate ratios
>>> model.glance  # model summary (deviance, AIC, etc.)
>>> rates = model.predict(df)  # expected counts
- Variables:
MODEL_TYPE (Type[supervised_model]) – The underlying implementation (poisson_regressor).
formula (str) – Formula string used to specify the model.
x_vars (list[str]) – Predictor variable names.
y_var (str) – Response variable name.
data (pd.DataFrame) – Parsed DataFrame containing predictors and response.
model (poisson_regressor) – Fitted underlying Poisson regression model.
- Parameters:
formula (str)
data (DataFrame)