cleands.Prediction.knn module

k-Nearest Neighbors (kNN) regression models.

This module implements kNN regression using squared Euclidean distance. Predictions are made by averaging the target values of the k nearest training samples. A cross-validation variant is also included to automatically select the optimal number of neighbors.

Classes:
  • k_nearest_neighbors_regressor – Basic kNN regressor that predicts by averaging the outcomes of the k closest training samples.

  • k_nearest_neighbors_cross_validation_regressor – Extension of the kNN regressor that selects the best k (from 1 up to k_max) by cross-validation, minimizing mean squared prediction error. The number of folds is set separately by folds.

Notes

  • Distance computation uses precomputed squared L2 norms for efficiency.

  • The cross-validation model defaults to 5 folds and evaluates k from 1 up to k_max (default 25).
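The norm trick mentioned in the notes relies on the identity ‖a − b‖² = ‖a‖² + ‖b‖² − 2 a·b, so the training norms can be computed once and reused for every query. The following is a minimal NumPy sketch of that idea, not the module's actual code; the function name squared_distances and its arguments are illustrative assumptions.

```python
import numpy as np

def squared_distances(train, target, norms_train=None):
    """Return an (m, n) matrix of squared Euclidean distances.

    Hypothetical helper: `norms_train` stands in for the precomputed
    squared L2 norms of the training rows.
    """
    if norms_train is None:
        norms_train = np.sum(train ** 2, axis=1)      # shape (n,)
    norms_target = np.sum(target ** 2, axis=1)        # shape (m,)
    # ||t - x||^2 = ||t||^2 + ||x||^2 - 2 t.x, evaluated for all pairs at once
    d2 = norms_target[:, None] + norms_train[None, :] - 2.0 * (target @ train.T)
    # Rounding can produce tiny negative values; clip them to zero
    return np.maximum(d2, 0.0)

train = np.array([[0.0, 0.0], [3.0, 4.0]])
target = np.array([[0.0, 0.0]])
d2 = squared_distances(train, target)  # [[0., 25.]]
```

Because only the ordering of distances matters for neighbor selection, the square root is never needed.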

class cleands.Prediction.knn.k_nearest_neighbors_regressor(x, y, k=1)

Bases: prediction_model

k-Nearest Neighbors (kNN) regressor.

Predicts values by averaging the outcomes of the k closest points (neighbors) in the training set, using squared Euclidean distance.

Variables:
  • k (int) – Number of neighbors to use.

  • norms_train (np.ndarray) – Precomputed squared L2 norms of training samples.

Parameters:
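  • x (ndarray) – Training feature matrix of shape (n, n_features).

  • y (ndarray) – Training target values of shape (n,).

  • k (int) – Number of neighbors to average over (default 1).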

neighbors(target, k)

Find the k nearest neighbors of each target sample.

Parameters:
  • target (np.ndarray) – Query feature matrix of shape (m, n_features).

  • k (int) – Number of neighbors to return.

Returns:

Indices of nearest neighbors with shape (m, k).

Return type:

np.ndarray
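As an illustration of the documented return shape, the sketch below computes neighbor indices for a small one-dimensional example. It is an assumption about how such a lookup can be written, not the method's source; nearest_indices is a hypothetical standalone function, and it uses a direct pairwise distance and a full sort for clarity.

```python
import numpy as np

def nearest_indices(train, target, k):
    """Return an (m, k) array of indices into `train`, nearest first."""
    # Pairwise squared Euclidean distances, shape (m, n)
    d2 = ((target[:, None, :] - train[None, :, :]) ** 2).sum(axis=2)
    # Sort each row by distance and keep the first k column indices
    return np.argsort(d2, axis=1, kind="stable")[:, :k]

train = np.array([[0.0], [1.0], [5.0]])
target = np.array([[0.9], [4.0]])
idx = nearest_indices(train, target, k=2)
# idx has shape (2, 2): one row of neighbor indices per query sample
```

For large training sets, np.argpartition could replace the full sort when the order among the k neighbors does not matter.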

predict(target)

Predict by averaging target values of nearest neighbors.

Parameters:

target (np.ndarray) – Query feature matrix of shape (m, n_features).

Returns:

Predicted values of shape (m,).

Return type:

np.ndarray
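The averaging step can be sketched in a few lines of NumPy. This is a simplified stand-in under stated assumptions, not the class's implementation: knn_predict is a hypothetical function that takes the training data explicitly instead of reading it from a fitted model.

```python
import numpy as np

def knn_predict(train_x, train_y, target, k):
    """Predict each query row as the mean target of its k nearest training rows."""
    # Pairwise squared Euclidean distances, shape (m, n)
    d2 = ((target[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    idx = np.argsort(d2, axis=1, kind="stable")[:, :k]   # shape (m, k)
    # Fancy indexing gathers the neighbors' targets; average across each row
    return train_y[idx].mean(axis=1)                      # shape (m,)

train_x = np.array([[0.0], [1.0], [5.0]])
train_y = np.array([10.0, 20.0, 60.0])
pred = knn_predict(train_x, train_y, np.array([[0.9], [4.0]]), k=2)
# The query 0.9 averages the targets at x=1 and x=0; the query 4.0 averages x=5 and x=1
```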

class cleands.Prediction.knn.k_nearest_neighbors_cross_validation_regressor(x, y, k_max=25, folds=5, seed=None)

Bases: k_nearest_neighbors_regressor

Cross-validated kNN regressor.

Automatically selects the value of k (1 ≤ k ≤ k_max) that minimizes the cross-validated mean squared prediction error (MSPE). The number of folds is set by the folds parameter and is unrelated to the neighbor count k.

Variables:

k (int) – Optimal number of neighbors chosen by cross-validation.

Parameters:
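  • x (ndarray) – Training feature matrix of shape (n, n_features).

  • y (ndarray) – Training target values of shape (n,).

  • k_max (int) – Largest number of neighbors to evaluate (default 25).

  • folds (int) – Number of cross-validation folds (default 5).

  • seed (int | None) – Random seed (default None).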
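The selection procedure can be sketched as follows. This is an illustrative reconstruction, not the class's code: select_k and knn_predict are hypothetical helpers, and the random fold assignment, the use of seed to shuffle the samples, and the cap on k_max for small training folds are all assumptions.

```python
import numpy as np

def knn_predict(train_x, train_y, target, k):
    """Average the targets of the k nearest training rows (see predict)."""
    d2 = ((target[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    idx = np.argsort(d2, axis=1, kind="stable")[:, :k]
    return train_y[idx].mean(axis=1)

def select_k(x, y, k_max=25, folds=5, seed=None):
    """Return (best_k, mspe), where mspe[k-1] is the cross-validated MSPE for k."""
    rng = np.random.default_rng(seed)
    # Assumption: samples are shuffled, then split into roughly equal folds
    parts = np.array_split(rng.permutation(len(y)), folds)
    # k cannot exceed the smallest training set left after holding out a fold
    k_max = min(k_max, len(y) - max(len(p) for p in parts))
    sq_err = np.zeros(k_max)
    for test_idx in parts:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        for k in range(1, k_max + 1):
            pred = knn_predict(x[train_mask], y[train_mask], x[test_idx], k)
            sq_err[k - 1] += np.sum((y[test_idx] - pred) ** 2)
    mspe = sq_err / len(y)          # every sample is held out exactly once
    return int(np.argmin(mspe)) + 1, mspe

# Synthetic data for illustration only
rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=(60, 1))
y = np.sin(x[:, 0]) + 0.1 * rng.normal(size=60)
best_k, mspe = select_k(x, y, k_max=25, folds=5, seed=0)
```

The sketch recomputes distances for every k to stay short; sorting the neighbors once per fold and taking running means over the first k would avoid the repeated work.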

class cleands.Prediction.knn.kNearestNeighborsRegressor(formula, data, *args, **kwargs)

Bases: PredictionModel

Convenience wrapper for k-nearest neighbors regression.

Provides a formula/DataFrame interface for the k_nearest_neighbors_regressor, which predicts continuous outcomes by averaging the responses of the nearest neighbors.

Variables:

MODEL_TYPE (ClassVar[Type[cleands.base.supervised_model]]) – Underlying model type, fixed to k_nearest_neighbors_regressor.

Parameters:
  • formula (str)

  • data (DataFrame)

Example

>>> model = kNearestNeighborsRegressor.from_formula("y ~ x1 + x2", data=df, k=5)
>>> model.predict(df[["x1", "x2"]])

class cleands.Prediction.knn.kNearestNeighborsCrossValidationRegressor(formula, data, *args, **kwargs)

Bases: PredictionModel

Convenience wrapper for cross-validated k-nearest neighbors regression.

Selects the optimal number of neighbors via k-fold cross-validation and provides a formula/DataFrame interface for the resulting model.

Variables:

MODEL_TYPE (ClassVar[Type[cleands.base.supervised_model]]) – Underlying model type, fixed to k_nearest_neighbors_cross_validation_regressor.

Parameters:
  • formula (str)

  • data (DataFrame)

Example

>>> model = kNearestNeighborsCrossValidationRegressor.from_formula("y ~ x1 + x2", data=df)
>>> model.predict(df[["x1", "x2"]])