cleands.Prediction.glm module

class cleands.Prediction.glm.linear_model(x, y, *args, **kwargs)

Bases: prediction_model, prediction_likelihood_model

Ordinary least squares linear regression.

Inherits from:

prediction_model: supervised regression base.
prediction_likelihood_model: provides log-likelihood evaluation.

Variables:: params (np.ndarray) – Estimated regression coefficients.
Parameters:

x – Design matrix of predictors.
y (ndarray) – Response vector.

predict(newdata)

Predict responses for new data.

Parameters:: newdata (np.ndarray) – Design matrix for prediction.
Returns:: Predicted values.
Return type:: np.ndarray

evaluate_lnL(pred)

Evaluate log-likelihood of predictions under Gaussian errors.

Parameters:: pred (np.ndarray) – Predicted values.
Returns:: Log-likelihood value.
Return type:: float
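As a rough illustration of what linear_model estimates and evaluates, the sketch below fits OLS coefficients with NumPy and computes a Gaussian log-likelihood. It assumes the error variance is set to its maximum-likelihood estimate RSS/n; whether cleands uses this or a degrees-of-freedom-corrected estimate is not stated here. The helpers ols_fit and gaussian_lnL are hypothetical names, not part of the package.

```python
import numpy as np

def ols_fit(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares coefficients; x is assumed to include an intercept column."""
    params, *_ = np.linalg.lstsq(x, y, rcond=None)
    return params

def gaussian_lnL(y: np.ndarray, pred: np.ndarray) -> float:
    """Gaussian log-likelihood with the variance set to its ML estimate RSS / n (assumption)."""
    n = y.shape[0]
    resid = y - pred
    sigma2 = resid @ resid / n
    # Plugging sigma2 = RSS / n into the normal density gives -n/2 * (log(2*pi*sigma2) + 1)
    return float(-0.5 * n * (np.log(2 * np.pi * sigma2) + 1))

# Simulated data: intercept 1, slope 2, noise standard deviation 0.5
rng = np.random.default_rng(0)
x = np.column_stack([np.ones(50), rng.normal(size=50)])
y = x @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=50)

params = ols_fit(x, y)
lnL = gaussian_lnL(y, x @ params)
```

Because OLS minimizes the residual sum of squares, this concentrated log-likelihood is maximized at the OLS coefficients.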

class cleands.Prediction.glm.logistic_regressor(*args, **kwargs)

Bases: linear_model, variance_model

Logistic regression with Newton–Raphson estimation.

Provides a likelihood-based fit, the variance-covariance matrix of the estimates, and pseudo-R² measures (McFadden, Ben-Akiva–Lerman).

property vcov_params: ndarray: Variance-covariance matrix of parameters.

evaluate_lnL(pred)

Log-likelihood for Bernoulli outcomes.

Parameters:: pred (np.ndarray) – Predicted probabilities.
Returns:: Log-likelihood value.
Return type:: float
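The Bernoulli log-likelihood sums y·log(p) + (1 − y)·log(1 − p) over observations. A minimal NumPy sketch follows; bernoulli_lnL is a hypothetical helper, and the clipping constant eps is an assumption added only to keep the logarithms finite.

```python
import numpy as np

def bernoulli_lnL(y: np.ndarray, pred: np.ndarray, eps: float = 1e-12) -> float:
    """Sum of y*log(p) + (1 - y)*log(1 - p) over observations."""
    p = np.clip(pred, eps, 1 - eps)  # keep log() finite when p is exactly 0 or 1
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

y = np.array([1, 0, 1, 1])
p = np.array([0.9, 0.2, 0.7, 0.6])
ll = bernoulli_lnL(y, p)
```

For these four observations the result is the log of the product of the probabilities assigned to the observed outcomes, log(0.9 · 0.8 · 0.7 · 0.6).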

gradient(coefs)

Gradient of the log-likelihood.

Parameters:: coefs (ndarray)
Return type:: ndarray

hessian(coefs)

Hessian matrix of the log-likelihood.

Parameters:: coefs (ndarray)
Return type:: ndarray
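The gradient and Hessian are what a Newton–Raphson fit iterates on. The sketch below shows the standard logit forms, gradient X'(y − p) and Hessian −X' diag(p(1 − p)) X, plus the update β ← β − H⁻¹g. Function names, the zero starting values, and the stopping rule are assumptions, not the package's code. Inverting the negative Hessian at the optimum is one common way to obtain something like vcov_params; the package's actual choice is not stated here.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def logit_gradient(coefs: np.ndarray, x: np.ndarray, y: np.ndarray) -> np.ndarray:
    # d lnL / d beta = X'(y - p)
    return x.T @ (y - sigmoid(x @ coefs))

def logit_hessian(coefs: np.ndarray, x: np.ndarray) -> np.ndarray:
    # d^2 lnL / d beta d beta' = -X' diag(p(1 - p)) X
    p = sigmoid(x @ coefs)
    return -(x.T * (p * (1 - p))) @ x

def newton_raphson_logit(x, y, tol=1e-10, max_iter=100):
    coefs = np.zeros(x.shape[1])  # assumed starting point
    for _ in range(max_iter):
        # Newton update: beta <- beta - H^{-1} g
        step = np.linalg.solve(logit_hessian(coefs, x), logit_gradient(coefs, x, y))
        coefs = coefs - step
        if np.max(np.abs(step)) < tol:
            break
    return coefs

# Simulated binary outcomes with true coefficients (-0.5, 1.0)
rng = np.random.default_rng(1)
x = np.column_stack([np.ones(500), rng.normal(size=500)])
y = (rng.random(500) < sigmoid(x @ np.array([-0.5, 1.0]))).astype(float)

coefs = newton_raphson_logit(x, y)
vcov = np.linalg.inv(-logit_hessian(coefs, x))  # inverse observed information
```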

predict(target)

Predict probabilities for new data.

Parameters:: target (ndarray)
Return type:: ndarray

property mcfaddens_r_squared: float: McFadden’s pseudo-R².

property ben_akiva_lerman_r_squared: float: Ben-Akiva–Lerman pseudo-R².

marginal_effects(newx=None, average=True)

Compute marginal effects of predictors.

Parameters:

newx (np.ndarray | pd.DataFrame, optional) – New design matrix. If None, use training data.
average (bool) – If True, return average marginal effects. If False, return case-specific effects.

Returns:

Marginal effects.

Return type:

np.ndarray
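For a logit model the marginal effect of predictor j for observation i is p_i(1 − p_i)·β_j. Averaging over rows gives average marginal effects, and keeping the rows gives case-specific effects, which is analogous to the average flag. This is a sketch under that standard formula. logit_marginal_effects is hypothetical, and the intercept column's "effect" has no substantive meaning.

```python
import numpy as np

def logit_marginal_effects(coefs: np.ndarray, x: np.ndarray, average: bool = True) -> np.ndarray:
    """dp/dx_j = p * (1 - p) * beta_j for each observation and predictor."""
    p = 1.0 / (1.0 + np.exp(-(x @ coefs)))
    effects = np.outer(p * (1 - p), coefs)  # row i: effects for observation i
    return effects.mean(axis=0) if average else effects

rng = np.random.default_rng(2)
x = np.column_stack([np.ones(200), rng.normal(size=200)])
coefs = np.array([-0.5, 1.0])

ame = logit_marginal_effects(coefs, x)                 # shape (2,)
cme = logit_marginal_effects(coefs, x, average=False)  # shape (200, 2)
```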

class cleands.Prediction.glm.least_squares_regressor(x, y, white=False, hc=3, *args, **kwargs)

Bases: linear_model, variance_model

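Ordinary least squares regression with optional heteroskedasticity-robust (White) standard errors.

Parameters:

x – Design matrix of predictors.
y (ndarray) – Response vector.
white (bool) – If True, compute White heteroskedasticity-consistent standard errors instead of the classical ones.
hc (int) – Heteroskedasticity-consistent (HC) estimator variant used when white=True; defaults to 3 (HC3).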

property vcov_params: Variance-covariance matrix of parameters.

class cleands.Prediction.glm.poisson_regressor(x, y, *args, **kwargs)[source]

Bases: linear_model

Poisson regression for count data.

Parameters:

x – Design matrix of predictors.
y (ndarray) – Non-negative count response.

property vcov_params: ndarray: Variance-covariance matrix of parameters.

evaluate_lnL(pred)

Log-likelihood for Poisson-distributed outcomes.

Parameters:: pred (ndarray) – Predicted expected counts.
Returns:: Log-likelihood value.
Return type:: float
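The Poisson log-likelihood is Σ[y·log(μ) − μ − log(y!)]. A minimal sketch, using math.lgamma(y + 1) for log(y!) and assuming every predicted mean is strictly positive; poisson_lnL is a hypothetical helper.

```python
import math
import numpy as np

def poisson_lnL(y: np.ndarray, pred: np.ndarray) -> float:
    """Sum of y*log(mu) - mu - log(y!) with mu = pred (assumed strictly positive)."""
    log_factorial = np.array([math.lgamma(v + 1.0) for v in y])  # log(y!) = lgamma(y + 1)
    return float(np.sum(y * np.log(pred) - pred - log_factorial))

y = np.array([0, 1, 2])
mu = np.array([1.0, 1.0, 1.0])
ll = poisson_lnL(y, mu)
```

With every mean equal to 1, each term reduces to −1 − log(y!), so the total is −3 − log 2.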

gradient(coefs)

Gradient of the log-likelihood.

Parameters:: coefs (ndarray)
Return type:: ndarray

hessian(coefs)

Hessian matrix of the log-likelihood.

Parameters:: coefs (ndarray)
Return type:: ndarray
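With a log link, μ = exp(Xβ), the gradient is X'(y − μ) and the Hessian is −X' diag(μ) X. The sketch below plugs these into a plain Newton–Raphson loop. Starting the intercept at log(ȳ), assuming column 0 is the intercept, is an assumption chosen for stability, and the function names are hypothetical.

```python
import numpy as np

def poisson_gradient(coefs: np.ndarray, x: np.ndarray, y: np.ndarray) -> np.ndarray:
    # d lnL / d beta = X'(y - mu), mu = exp(X beta)
    return x.T @ (y - np.exp(x @ coefs))

def poisson_hessian(coefs: np.ndarray, x: np.ndarray) -> np.ndarray:
    # d^2 lnL / d beta d beta' = -X' diag(mu) X
    mu = np.exp(x @ coefs)
    return -(x.T * mu) @ x

def fit_poisson(x, y, tol=1e-10, max_iter=100):
    coefs = np.zeros(x.shape[1])
    coefs[0] = np.log(y.mean())  # assumes column 0 is the intercept
    for _ in range(max_iter):
        step = np.linalg.solve(poisson_hessian(coefs, x), poisson_gradient(coefs, x, y))
        coefs = coefs - step  # Newton update
        if np.max(np.abs(step)) < tol:
            break
    return coefs

# Simulated counts with true coefficients (0.5, 0.3)
rng = np.random.default_rng(4)
n = 1000
x = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.poisson(np.exp(x @ np.array([0.5, 0.3]))).astype(float)

coefs = fit_poisson(x, y)
```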

predict(target)

Predict expected counts for new data.

Parameters:: target (ndarray)
Return type:: ndarray

cleands.Prediction.glm.backward_stepwise(model, criterion='aic', keep_vars=None, min_features=1, verbose=False)

Perform backward stepwise feature selection.

Iteratively removes features to optimize the model fit according to a selection criterion (e.g., AIC, BIC, MSE).

Parameters:

model (Any) – Model object, either a raw supervised_model exposing .x and .y, or a SupervisedModel wrapper exposing .x_vars, .y_var, .data, and .model_type.
criterion (str) – Model selection criterion (“aic”, “bic”, “mse”, etc.).
keep_vars (List[str] | None) – Variable names that must not be dropped.
min_features (int) – Minimum number of features to retain.
verbose (bool) – If True, print progress messages.

Returns:

A dictionary with:

“model”: The best fitted model.
“selected_features”: List of selected feature names.
“history”: pd.DataFrame with stepwise history.

Return type:

Dict[str, Any]
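Backward elimination can be sketched as a greedy loop: try dropping each removable column, keep the drop that lowers the criterion most, and stop when no drop helps or min_features is reached. The sketch below works on a raw NumPy design matrix with an OLS/Gaussian AIC and keeps columns named in keep. ols_aic and backward_aic are hypothetical helpers, and the package's tie-breaking and stopping details may differ.

```python
import numpy as np

def ols_aic(x: np.ndarray, y: np.ndarray) -> float:
    """AIC = 2k - 2 lnL for an OLS fit with Gaussian errors (variance at its ML estimate)."""
    n, k = x.shape
    params, *_ = np.linalg.lstsq(x, y, rcond=None)
    resid = y - x @ params
    sigma2 = resid @ resid / n
    lnL = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return float(2 * k - 2 * lnL)

def backward_aic(x, y, names, keep=(), min_features=1):
    """Greedy backward elimination: drop the column whose removal lowers AIC most."""
    selected = list(range(x.shape[1]))
    best = ols_aic(x, y)
    while len(selected) > min_features:
        candidates = [j for j in selected if names[j] not in keep]
        if not candidates:
            break
        scores = {j: ols_aic(x[:, [c for c in selected if c != j]], y) for j in candidates}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:
            break  # no single removal improves AIC, so stop
        best = scores[j_best]
        selected.remove(j_best)
    return [names[j] for j in selected], best

rng = np.random.default_rng(5)
n = 200
names = ["const", "x1", "x2", "x3"]
x = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = 1.0 + 2.0 * x[:, 1] + rng.normal(size=n)  # only x1 carries signal
selected, aic = backward_aic(x, y, names, keep=("const",))
```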

cleands.Prediction.glm.forward_stepwise(model, criterion='aic', keep_vars=None, max_features=None, prefer_intercept=True, verbose=False)

Perform forward stepwise feature selection.

Iteratively adds features to optimize a model according to a selection criterion (e.g., AIC, BIC, MSE). Supports both raw models and SupervisedModel wrappers, with optional intercept preference.

Parameters:

model (Any) – Model object, either a raw supervised_model exposing .x and .y, or a SupervisedModel wrapper exposing .x_vars, .y_var, .data, and .model_type.
criterion (str) – Model selection criterion (“aic”, “bic”, “mse”, etc.).
keep_vars (List[str] | None) – Variables that must always be included.
max_features (int | None) – Maximum number of features allowed to be selected.
prefer_intercept (bool) – If True, attempt to start with an intercept term (if detected).
verbose (bool) – If True, print progress messages.

Returns:

A dictionary with:

“model”: The best fitted model.
“selected_features”: List of selected feature names.
“history”: pd.DataFrame with stepwise history.

Return type:

Dict[str, Any]
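Forward selection mirrors the backward loop. Start from a base set, loosely analogous to prefer_intercept starting from an intercept. Then repeatedly add the column that lowers the criterion most, stopping when nothing helps or max_features is reached. This NumPy sketch uses the same hypothetical OLS/Gaussian AIC helper and is not the package's implementation.

```python
import numpy as np

def ols_aic(x: np.ndarray, y: np.ndarray) -> float:
    """AIC = 2k - 2 lnL for an OLS fit with Gaussian errors (variance at its ML estimate)."""
    n, k = x.shape
    params, *_ = np.linalg.lstsq(x, y, rcond=None)
    resid = y - x @ params
    sigma2 = resid @ resid / n
    lnL = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return float(2 * k - 2 * lnL)

def forward_aic(x, y, names, start=("const",), max_features=None):
    """Greedy forward selection: add the column that lowers AIC most."""
    selected = [names.index(s) for s in start]
    best = ols_aic(x[:, selected], y) if selected else np.inf
    limit = x.shape[1] if max_features is None else max_features
    while len(selected) < limit:
        remaining = [j for j in range(x.shape[1]) if j not in selected]
        if not remaining:
            break
        scores = {j: ols_aic(x[:, selected + [j]], y) for j in remaining}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:
            break  # no single addition improves AIC, so stop
        best = scores[j_best]
        selected.append(j_best)
    return [names[j] for j in selected], best

rng = np.random.default_rng(6)
n = 200
names = ["const", "x1", "x2", "x3"]
x = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = 1.0 + 2.0 * x[:, 1] + rng.normal(size=n)  # only x1 carries signal
selected, aic = forward_aic(x, y, names)
```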

cleands.Prediction.glm.stepwise(model, direction='both', criterion='aic', keep_vars=None, min_features=1, max_features=None, prefer_intercept=True, verbose=False)

Unified stepwise selection wrapper.

Routes to forward selection, backward elimination, or both; when direction=“both”, it returns the better of the two models as decided by a vote across metrics.

Parameters:

model (Any) – Model object, either a raw supervised_model exposing .x and .y, or a SupervisedModel wrapper exposing .x_vars, .y_var, .data, and .model_type.
direction (str) – Stepwise direction: “forwards” for forward selection, “backwards” for backward elimination, or “both” to run both and keep the better result.
criterion (str) – Model selection criterion (“aic”, “bic”, “mse”, etc.).
keep_vars (List[str] | None) – Variable names that must always be included.
min_features (int) – Minimum number of features (for backward).
max_features (int | None) – Maximum number of features (for forward).
prefer_intercept (bool) – If True, prefer/include an intercept where applicable.
verbose (bool) – If True, print selection progress.

Returns:

A dictionary with:

“model”: Best fitted model.
“selected_features”: List of chosen features.
“history”: pd.DataFrame of the chosen direction’s history.
“direction_chosen”: One of {“forwards”, “backwards”}.
“comparison”: Dict of per-metric winners (only if direction=“both”).

Return type:

Dict[str, Any]
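The "vote across metrics" for direction="both" could work roughly like the sketch below. Each metric shared by the two candidate models awards a win to the direction with the lower value, and the direction with more wins is chosen, producing a per-metric dict shaped like the “comparison” entry. The metric names, lower-is-better convention, and tie rule favoring forwards are all assumptions, and the input numbers are purely illustrative.

```python
def vote(forward_metrics: dict, backward_metrics: dict) -> tuple:
    """Majority vote over metrics where lower is better; ties go to 'forwards' (arbitrary choice)."""
    comparison = {}
    for metric in sorted(forward_metrics.keys() & backward_metrics.keys()):
        better_fwd = forward_metrics[metric] <= backward_metrics[metric]
        comparison[metric] = "forwards" if better_fwd else "backwards"
    fwd_wins = sum(w == "forwards" for w in comparison.values())
    chosen = "forwards" if fwd_wins >= len(comparison) - fwd_wins else "backwards"
    return chosen, comparison

# Illustrative metric values for the two candidate models
chosen, comparison = vote(
    {"aic": 410.2, "bic": 425.0, "mse": 1.02},
    {"aic": 411.5, "bic": 422.3, "mse": 1.01},
)
```

Here forwards wins on AIC but loses on BIC and MSE, so the backward model is chosen two votes to one.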

class cleands.Prediction.glm.LeastSquaresRegressor(formula, data, *args, **kwargs)

Bases: PredictionModel

Ordinary least squares (OLS) regression.

A high-level wrapper around least_squares_regressor that provides a formula interface and pandas-aware prediction methods. Fits a linear model by minimizing the sum of squared residuals.

This class inherits from PredictionModel, which handles parsing the formula, extracting variables from a DataFrame, and exposing tidy/glance summaries consistent with the rest of the package.
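To illustrate what "parsing the formula" involves, here is a deliberately minimal sketch that turns "y ~ x1 + x2" into a response vector and a design matrix with a leading intercept column. It supports only additive terms, with no transforms or interactions. simple_design is a hypothetical helper, and PredictionModel's real parser is likely more capable.

```python
import numpy as np

def simple_design(formula: str, data, intercept: bool = True):
    """Split 'y ~ a + b' into a response vector and a design matrix (no transforms or interactions)."""
    lhs, rhs = (side.strip() for side in formula.split("~"))
    x_vars = [term.strip() for term in rhs.split("+") if term.strip()]
    # data may be a pandas DataFrame or any mapping from column name to a sequence
    x = np.column_stack([np.asarray(data[v], dtype=float) for v in x_vars])
    if intercept:
        x = np.column_stack([np.ones(x.shape[0]), x])
    y = np.asarray(data[lhs], dtype=float)
    return x, y, x_vars, lhs

data = {"y": [1.0, 2.0, 3.0], "x1": [0.0, 1.0, 2.0], "x2": [5.0, 6.0, 8.0]}
x, y, x_vars, y_var = simple_design("y ~ x1 + x2", data)
```

The returned x_vars and y_var correspond loosely to the wrapper's attributes of the same names.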

Examples

Fit an OLS regression from a formula:

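>>> from cleands.Prediction.glm import LeastSquaresRegressor
>>> # df: a pandas DataFrame with columns y, x1, x2
>>> model = LeastSquaresRegressor("y ~ x1 + x2", data=df)
>>> model.tidy         # coefficient table
>>> model.glance       # model summary
>>> preds = model.predict(df)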

Variables:

MODEL_TYPE (Type[supervised_model]) – The underlying implementation (least_squares_regressor).
formula (str) – Formula string used to specify the model.
x_vars (list[str]) – Predictor variable names.
y_var (str) – Response variable name.
data (pd.DataFrame) – Parsed DataFrame containing predictors and response.
model (least_squares_regressor) – Fitted underlying OLS model.

Parameters:

formula (str)
data (DataFrame)

class cleands.Prediction.glm.LogisticRegressor(formula, data, *args, **kwargs)

Bases: PredictionModel

Logistic regression for binary outcomes.

A high-level wrapper around logistic_regressor that provides a formula interface and pandas-aware prediction methods. Fits a generalized linear model with a logit link, estimating probabilities for binary response variables.

This class inherits from PredictionModel, which handles parsing the formula, extracting variables from a DataFrame, and exposing tidy/glance summaries consistent with the rest of the package.

Examples

Fit a logistic regression model from a formula:

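>>> from cleands.Prediction.glm import LogisticRegressor
>>> # df: a pandas DataFrame with a binary column y and predictors x1, x2
>>> model = LogisticRegressor("y ~ x1 + x2", data=df)
>>> model.tidy          # coefficient table with log-odds
>>> model.glance        # model fit summary (AIC, log-likelihood, etc.)
>>> probs = model.predict(df)   # predicted probabilities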

Variables:

MODEL_TYPE (Type[supervised_model]) – The underlying implementation (logistic_regressor).
formula (str) – Formula string used to specify the model.
x_vars (list[str]) – Predictor variable names.
y_var (str) – Response variable name.
data (pd.DataFrame) – Parsed DataFrame containing predictors and response.
model (logistic_regressor) – Fitted underlying logistic regression model.

Parameters:

formula (str)
data (DataFrame)

class cleands.Prediction.glm.PoissonRegressor(formula, data, *args, **kwargs)

Bases: PredictionModel

Poisson regression for count outcomes.

A high-level wrapper around poisson_regressor that provides a formula interface and pandas-aware prediction methods. Fits a generalized linear model with a log link, appropriate for count data whose conditional variance equals its mean.

This class inherits from PredictionModel, which handles parsing the formula, extracting variables from a DataFrame, and exposing tidy/glance summaries consistent with the rest of the package.

Examples

Fit a Poisson regression model from a formula:

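>>> from cleands.Prediction.glm import PoissonRegressor
>>> # df: a pandas DataFrame with a count column y and predictors x1, x2
>>> model = PoissonRegressor("y ~ x1 + x2", data=df)
>>> model.tidy          # coefficient table (log incidence-rate ratios)
>>> model.glance        # model summary (deviance, AIC, etc.)
>>> rates = model.predict(df)   # expected counts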

Variables:

MODEL_TYPE (Type[supervised_model]) – The underlying implementation (poisson_regressor).
formula (str) – Formula string used to specify the model.
x_vars (list[str]) – Predictor variable names.
y_var (str) – Response variable name.
data (pd.DataFrame) – Parsed DataFrame containing predictors and response.
model (poisson_regressor) – Fitted underlying Poisson regression model.

Parameters:

formula (str)
data (DataFrame)