cleands.Prediction.knn module
k-Nearest Neighbors (kNN) regression models.
This module implements kNN regression using squared Euclidean distance. Predictions are made by averaging the target values of the k nearest training samples. A cross-validation variant is also included to automatically select the optimal number of neighbors.
- Classes:
- k_nearest_neighbors_regressor: Basic kNN regressor that predicts by averaging the outcomes of the k closest training samples.
- k_nearest_neighbors_cross_validation_regressor: Extension of the kNN regressor that selects the best k (up to k_max) using k-fold cross-validation to minimize mean squared prediction error.
Notes
Distance computation uses precomputed squared L2 norms for efficiency.
The cross-validation model defaults to 5 folds and evaluates k from 1 up to k_max (default 25).
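The note on precomputed norms relies on the identity ‖a − b‖² = ‖a‖² + ‖b‖² − 2 a·b, which lets all pairwise distances be computed with a single matrix product. The following is a minimal NumPy sketch of that idea, not the module's actual code; the function name squared_distances and its arguments are hypothetical.

```python
import numpy as np

def squared_distances(x_train, x_new, norms_train=None):
    """Pairwise squared Euclidean distances, shape (n_new, n_train).

    Hypothetical helper for illustration; not part of cleands.
    """
    if norms_train is None:
        # Squared L2 norm of each training row; can be computed once and reused.
        norms_train = np.sum(x_train ** 2, axis=1)
    norms_new = np.sum(x_new ** 2, axis=1)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, evaluated for all pairs at once.
    d2 = norms_new[:, None] + norms_train[None, :] - 2.0 * (x_new @ x_train.T)
    # Floating-point rounding can push tiny distances slightly below zero.
    return np.maximum(d2, 0.0)

x_train = np.array([[0.0, 0.0], [3.0, 4.0]])
x_new = np.array([[0.0, 0.0]])
d2 = squared_distances(x_train, x_new)
```

Because the training norms do not depend on the query points, they only need to be computed once at fit time, which is presumably why the regressor stores them as norms_train.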
- class cleands.Prediction.knn.k_nearest_neighbors_regressor(x, y, k=1)
Bases: prediction_model
k-Nearest Neighbors (kNN) regressor.
Predicts values by averaging the outcomes of the k closest points (neighbors) in the training set, using squared Euclidean distance.
- Variables:
k (int) – Number of neighbors to use.
norms_train (np.ndarray) – Precomputed squared L2 norms of training samples.
- Parameters:
x (ndarray)
y (ndarray)
k (int)
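The prediction rule described for this class (average the outcomes of the k closest training points) can be sketched in a few lines of NumPy. This is an illustrative sketch under assumptions, not the class's implementation: the function knn_predict is hypothetical, and using np.argpartition to find the neighbors is a choice made here for brevity.

```python
import numpy as np

def knn_predict(x_train, y_train, x_new, k=1):
    """Average the targets of the k nearest training rows (hypothetical helper)."""
    norms_train = np.sum(x_train ** 2, axis=1)
    norms_new = np.sum(x_new ** 2, axis=1)
    # Squared Euclidean distances, shape (n_new, n_train).
    d2 = norms_new[:, None] + norms_train[None, :] - 2.0 * (x_new @ x_train.T)
    # Column indices of the k smallest distances in each row (unordered within the k).
    idx = np.argpartition(d2, k - 1, axis=1)[:, :k]
    # Mean of the neighbors' outcomes is the prediction.
    return y_train[idx].mean(axis=1)

x_train = np.array([[0.0], [1.0], [2.0], [10.0]])
y_train = np.array([0.0, 1.0, 2.0, 10.0])
x_new = np.array([[0.9], [9.0]])
pred = knn_predict(x_train, y_train, x_new, k=2)
```

For the first query point the two nearest training points are at 1.0 and 0.0, so the prediction is the mean of their outcomes; ranking by squared distance gives the same neighbors as ranking by the distance itself, so no square root is needed.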
- class cleands.Prediction.knn.k_nearest_neighbors_cross_validation_regressor(x, y, k_max=25, folds=5, seed=None)
Bases: k_nearest_neighbors_regressor
Cross-validated kNN regressor.
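Automatically selects the best value of k (1 ≤ k ≤ k_max) by minimizing the mean squared prediction error (MSPE) estimated with cross-validation. The number of folds is set by the folds parameter and is unrelated to the number of neighbors k.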
- Inherits from:
k_nearest_neighbors_regressor
- Variables:
k (int) – Optimal number of neighbors chosen by cross-validation.
- Parameters:
x (ndarray)
y (ndarray)
k_max (int)
folds (int)
seed (int | None)
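The selection procedure can be sketched as follows: split the data into folds, predict each held-out fold from the remaining data for every candidate k, and keep the k with the lowest MSPE. This is a sketch under assumptions rather than the class's actual code; the function select_k, the random fold assignment, and the cumulative-mean trick for evaluating all k at once are choices made here for illustration.

```python
import numpy as np

def select_k(x, y, k_max=25, folds=5, seed=None):
    """Pick k in 1..k_max by cross-validated MSPE (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    # Random, roughly equal-sized held-out index sets.
    parts = np.array_split(rng.permutation(n), folds)
    # k cannot exceed the smallest training set size.
    k_max = min(k_max, n - max(len(p) for p in parts))
    k_values = np.arange(1, k_max + 1)
    sq_err = np.zeros(k_max)
    for test_idx in parts:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        x_tr, y_tr = x[train_mask], y[train_mask]
        x_te, y_te = x[test_idx], y[test_idx]
        # Squared Euclidean distances from held-out rows to training rows.
        d2 = (np.sum(x_te ** 2, axis=1)[:, None]
              + np.sum(x_tr ** 2, axis=1)[None, :]
              - 2.0 * (x_te @ x_tr.T))
        # Training indices sorted by distance, truncated to the k_max nearest.
        order = np.argsort(d2, axis=1)[:, :k_max]
        # Column j holds the prediction using the j + 1 nearest neighbors.
        preds = np.cumsum(y_tr[order], axis=1) / k_values
        sq_err += np.sum((preds - y_te[:, None]) ** 2, axis=0)
    mspe = sq_err / n
    return int(k_values[np.argmin(mspe)]), mspe

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(x[:, 0]) + rng.normal(scale=0.3, size=200)
best_k, mspe = select_k(x, y, k_max=25, folds=5, seed=0)
```

Passing a seed makes the fold assignment, and therefore the selected k, reproducible across runs.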
- class cleands.Prediction.knn.kNearestNeighborsRegressor(formula, data, *args, **kwargs)
Bases: PredictionModel
Convenience wrapper for k-nearest neighbors regression.
Provides a formula/DataFrame interface for the k_nearest_neighbors_regressor, which predicts continuous outcomes by averaging the responses of the nearest neighbors.
- Variables:
MODEL_TYPE (ClassVar[Type[cleands.base.supervised_model]]) – Underlying model type, fixed to k_nearest_neighbors_regressor.
- Parameters:
formula (str)
data (DataFrame)
Example
>>> model = kNearestNeighborsRegressor.from_formula("y ~ x1 + x2", data=df, k=5)
>>> model.predict(df[["x1", "x2"]])
- class cleands.Prediction.knn.kNearestNeighborsCrossValidationRegressor(formula, data, *args, **kwargs)
Bases: PredictionModel
Convenience wrapper for cross-validated k-nearest neighbors regression.
Selects the optimal number of neighbors via k-fold cross-validation and provides a formula/DataFrame interface for the resulting model.
- Variables:
MODEL_TYPE (ClassVar[Type[cleands.base.supervised_model]]) – Underlying model type, fixed to k_nearest_neighbors_cross_validation_regressor.
- Parameters:
formula (str)
data (DataFrame)
Example
>>> model = kNearestNeighborsCrossValidationRegressor.from_formula("y ~ x1 + x2", data=df)
>>> model.predict(df[["x1", "x2"]])