cleands.Prediction.ldv module

ldv.py — Limited Dependent Variable (LDV) Models

This module implements models for limited dependent variables where the outcome is only partially observed due to censoring, truncation, or selection processes. These models extend standard regression by explicitly accounting for restricted observability of the dependent variable.

Models currently implemented

Tobit regression (two-limit censored normal)

Latent variable:

y* = Xβ + ε, ε ~ N(0, σ²)

Observed variable:

y = L if y* ≤ L (left-censored)
y = y* if L < y* < R (uncensored)
y = R if y* ≥ R (right-censored)

Features:

Supports left- and/or right-censoring (finite or infinite).
Fits parameters by maximum likelihood (L-BFGS-B).
Returns estimates of β and σ with variance-covariance matrix.
Provides log-likelihood, AIC, BIC, deviance, convergence info.
Includes:
- predict() for latent mean μ = Xβ
- expected_observed() for E[y | X] under censoring
- censoring_probs() for P_left, P_uncensored, P_right.
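
The likelihood behind the feature list above can be sketched in a few lines. This is an illustrative stand-alone version, not the cleands implementation: the helper names tobit_negloglik and fit_tobit, the log-σ parameterisation, the OLS starting values, and the use of np.inf for "no limit" are all assumptions made for this sketch.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def tobit_negloglik(params, X, y, L=0.0, R=np.inf):
    """Negative log-likelihood of a two-limit Tobit model.

    params = [beta..., log(sigma)]. Optimising log(sigma) keeps sigma > 0;
    this is a choice of the sketch, not necessarily what cleands does.
    """
    beta, sigma = params[:-1], np.exp(params[-1])
    mu = X @ beta
    left = y <= L            # observations piled up at the lower limit
    right = y >= R           # observations piled up at the upper limit
    mid = ~(left | right)    # fully observed outcomes
    ll = np.zeros(y.shape, dtype=float)
    ll[left] = norm.logcdf((L - mu[left]) / sigma)    # log P(y* <= L)
    ll[right] = norm.logsf((R - mu[right]) / sigma)   # log P(y* >= R)
    ll[mid] = norm.logpdf((y[mid] - mu[mid]) / sigma) - np.log(sigma)
    return -ll.sum()


def fit_tobit(X, y, L=0.0, R=np.inf):
    """Maximise the likelihood with L-BFGS-B, starting from OLS estimates."""
    beta0, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma0 = np.std(y - X @ beta0) + 1e-8
    start = np.append(beta0, np.log(sigma0))
    res = minimize(tobit_negloglik, start, args=(X, y, L, R), method="L-BFGS-B")
    return res.x[:-1], float(np.exp(res.x[-1])), res


# Simulated data: latent outcome left-censored at 0.
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y_star = X @ np.array([0.5, 1.0]) + rng.normal(size=n)
y = np.maximum(y_star, 0.0)

beta_hat, sigma_hat, res = fit_tobit(X, y, L=0.0)
print(beta_hat, sigma_hat)
```

Censored observations contribute a probability mass (a normal CDF or survival term) and uncensored observations contribute a density, which is what distinguishes this likelihood from ordinary least squares on the observed y.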

Planned models

Truncated regression

Similar to Tobit but assumes data outside [L, R] are unobserved (not censored).
Log-likelihood excludes truncated cases entirely.
Useful for survey data where only responses in a restricted range are collected.
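
Because truncated regression is only planned, the following is a sketch of the standard truncated-normal log-likelihood rather than anything shipped in this module. The function name truncated_loglik and the np.inf convention for a missing limit are hypothetical.

```python
import numpy as np
from scipy.stats import norm


def truncated_loglik(y, mu, sigma, L=0.0, R=np.inf):
    """Log-likelihood of outcomes observed only when L < y < R.

    Each density is renormalised by the probability mass inside (L, R);
    there are no censored terms because out-of-range cases never appear.
    """
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    a = (L - mu) / sigma
    b = (R - mu) / sigma
    log_density = norm.logpdf((y - mu) / sigma) - np.log(sigma)
    log_mass = np.log(norm.cdf(b) - norm.cdf(a))
    return float(np.sum(log_density - log_mass))


# Three in-range observations with their latent means.
print(truncated_loglik([0.5, 1.2, 2.0], [0.3, 0.8, 1.0], sigma=1.5, L=0.0, R=3.0))
```

Compared with the Tobit likelihood, the only change for in-range observations is the extra normalising term; the censored contributions disappear entirely.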

Heckman selection model (two-step / full MLE)

Jointly models outcome and selection equations to correct for sample selection bias.
Outcome observed only if selection variable exceeds threshold.
Widely applied in labor economics, health economics, and marketing.

Classes

tobit_regressor: Core implementation of two-limit Tobit regression with MLE fitting, prediction, and inference utilities.

Factory Aliases

TobitRegressor: Partial wrapper that exposes tobit_regressor through the PredictionModel interface for pandas DataFrame/formula use.

Examples

>>> import numpy as np, pandas as pd
>>> from cleands.Prediction.ldv import TobitRegressor
>>> df = pd.DataFrame({"x1": np.random.randn(100), "y": np.maximum(np.random.randn(100), 0.0)})
>>> model = TobitRegressor(x_vars=["x1"], y_var="y", data=df, L=0.0)
>>> model.glance

class cleands.Prediction.ldv.tobit_regressor(x, y, L=0.0, R=None, add_intercept=False, start=None, tol=1e-8)

Bases: prediction_model, prediction_likelihood_model, variance_model

Two-limit Tobit (censored normal) regression model.

Latent model:

y* = Xβ + ε, ε ~ N(0, σ²)

Observed:

y = L if y* ≤ L (left-censored)
y = y* if L < y* < R (uncensored)
y = R if y* ≥ R (right-censored)

Parameters are stored as:

params = [β, σ] with σ > 0

predict(X) returns the latent mean μ = Xβ. Use expected_observed(X) to compute E[y | X] under censoring.

Parameters:

x (ndarray)
y (ndarray)
L (float | None)
R (float | None)
add_intercept (bool)
start (Tuple[ndarray, float] | None)
tol (float)

predict(target)

Predict latent mean μ = Xβ.

Parameters: target (np.ndarray) – New design matrix.
Returns: Latent means.
Return type: np.ndarray

expected_observed(target)

Predict expected observed y under censoring.

Parameters: target (np.ndarray) – New design matrix.
Returns: Expected observed values, E[y | X].
Return type: np.ndarray
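
For a censored normal outcome, E[y | X] has a closed form. With a = (L − μ)/σ and b = (R − μ)/σ, it is L·Φ(a) + R·(1 − Φ(b)) + μ·(Φ(b) − Φ(a)) + σ·(φ(a) − φ(b)). The sketch below evaluates that formula directly; the function name expected_observed_sketch and the np.inf convention for an absent limit are assumptions, and the method above may be implemented differently.

```python
import numpy as np
from scipy.stats import norm


def expected_observed_sketch(mu, sigma, L=0.0, R=np.inf):
    """E[y | X] for a two-limit censored normal with latent mean mu."""
    mu = np.asarray(mu, dtype=float)
    a = (L - mu) / sigma
    b = (R - mu) / sigma
    p_left = norm.cdf(a)                  # mass sitting at L
    p_right = norm.sf(b)                  # mass sitting at R
    p_mid = norm.cdf(b) - norm.cdf(a)     # mass strictly between the limits
    # Skip a point-mass term when its limit is infinite (avoids inf * 0).
    left_term = L * p_left if np.isfinite(L) else 0.0
    right_term = R * p_right if np.isfinite(R) else 0.0
    return left_term + right_term + mu * p_mid + sigma * (norm.pdf(a) - norm.pdf(b))


mu = np.array([-1.0, 0.0, 2.0])
print(expected_observed_sketch(mu, sigma=1.0, L=0.0))
```

With left-censoring at 0, the expected observed value lies above the latent mean whenever μ is near or below the limit, which is why predict() and expected_observed() can differ substantially.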

censoring_probs(target)

Compute probabilities of being left-censored, uncensored, or right-censored.

Parameters: target (np.ndarray) – New design matrix.
Returns: (P_left, P_uncensored, P_right) for each observation.
Return type: Tuple[np.ndarray, np.ndarray, np.ndarray]
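
Under the latent normal model these three probabilities follow from the normal CDF evaluated at the standardised limits. A minimal sketch, assuming a hypothetical helper censoring_probs_sketch that takes the latent mean and σ directly (the method above takes a design matrix instead):

```python
import numpy as np
from scipy.stats import norm


def censoring_probs_sketch(mu, sigma, L=0.0, R=np.inf):
    """Return (P_left, P_uncensored, P_right) for latent means mu."""
    mu = np.asarray(mu, dtype=float)
    a = (L - mu) / sigma
    b = (R - mu) / sigma
    p_left = norm.cdf(a)                    # P(y* <= L)
    p_right = norm.sf(b)                    # P(y* >= R)
    p_uncensored = norm.cdf(b) - norm.cdf(a)
    return p_left, p_uncensored, p_right


p_left, p_unc, p_right = censoring_probs_sketch(np.array([-1.0, 0.0, 1.0]), 1.0, L=0.0, R=2.0)
print(p_left, p_unc, p_right)
```

The three arrays sum to one element-wise, so any one of them can be recovered from the other two.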

evaluate_lnL(pred)

Evaluate the log-likelihood at a given latent mean μ.

Parameters: pred (np.ndarray) – Latent mean predictions μ = Xβ.
Returns: Log-likelihood value.
Return type: float

property vcov_params: ndarray

Variance-covariance matrix of parameter estimates.

Returns: (r+1) × (r+1) covariance matrix for [β, σ], where r is the number of coefficients in β.
Return type: np.ndarray
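
The documentation does not say how this matrix is computed. One common approach for maximum-likelihood estimators is to invert the Hessian of the negative log-likelihood at the optimum; the sketch below illustrates that idea only and is not taken from cleands. To keep the result checkable it uses the limiting case with no censoring, where the answer is known in closed form. The helpers numerical_hessian and normal_negloglik are hypothetical.

```python
import numpy as np


def numerical_hessian(f, x, h=1e-4):
    """Central finite-difference Hessian of a scalar function f at x."""
    x = np.asarray(x, dtype=float)
    k = x.size
    H = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            ei = np.zeros(k)
            ej = np.zeros(k)
            ei[i] = h
            ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4.0 * h * h)
    return H


def normal_negloglik(theta, X, y):
    """Negative log-likelihood (up to a constant) with theta = [beta, sigma]."""
    beta, sigma = theta[:-1], theta[-1]
    resid = y - X @ beta
    return y.size * np.log(sigma) + 0.5 * np.sum(resid ** 2) / sigma ** 2


rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, -2.0]) + 0.7 * rng.normal(size=n)

# Without censoring the MLE is OLS for beta and sqrt(RSS / n) for sigma.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
sigma_hat = np.sqrt(np.mean((y - X @ beta_hat) ** 2))
theta_hat = np.append(beta_hat, sigma_hat)

H = numerical_hessian(lambda t: normal_negloglik(t, X, y), theta_hat)
vcov = np.linalg.inv(H)             # (r+1) x (r+1), ordered as [beta, sigma]
std_err = np.sqrt(np.diag(vcov))
print(std_err)
```

In this uncensored case the β block matches σ²(XᵀX)⁻¹ and the σ entry matches σ²/(2n); with censoring the same recipe applies to the Tobit likelihood, but no closed form is available.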

class cleands.Prediction.ldv.TobitRegressor(formula, data, *args, **kwargs)

Bases: PredictionModel

Convenience wrapper for Tobit regression.

The Tobit model is used for censored dependent variables, where observations below (or above) a threshold are censored rather than fully observed. This wrapper provides a formula/DataFrame interface for the tobit_regressor.

Variables:

MODEL_TYPE (ClassVar[Type[cleands.base.supervised_model]]) – Underlying model type, fixed to tobit_regressor.

Parameters:

formula (str)
data (DataFrame)

Example

>>> model = TobitRegressor.from_formula("y ~ x1 + x2", data=df)
>>> model.predict(df[["x1", "x2"]])