cleands.Classification.knn module

k-Nearest Neighbors (kNN) classification models.

This module provides:

A standard kNN classifier that estimates class probabilities from the label frequencies of the k nearest training points.
A cross-validated kNN classifier that selects k by maximizing average accuracy across K folds.

Both classes integrate with the cleands classification framework and expose the usual classification API (for example, predict_proba, accuracy).

class cleands.Classification.knn.k_nearest_neighbors_classifier(x, y, k=1)[source]

Bases: classification_model, k_nearest_neighbors_regressor

k-Nearest Neighbors (kNN) classifier.

Combines the classification interface with the kNN neighbor search implemented for regression. Class probabilities are computed as the empirical frequency of labels among the k nearest training samples.

Variables:

k (int) – Number of neighbors used for prediction.
norms_train (np.ndarray) – Precomputed squared norms of training rows for fast distance computation.

Parameters:

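x (ndarray) – Training feature matrix of shape (n_samples, n_features).
y (ndarray) – Training class labels, one per row of x.
k (int) – Number of neighbors used for prediction. Defaults to 1.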

predict_proba(target)[source]

Predict class probabilities for new samples.

Uses Euclidean distance in the original feature space. For each sample, the class probabilities are the label frequencies among the k nearest neighbors.

Parameters: target (np.ndarray) – Feature matrix of shape (n_samples, n_features).
Returns: Predicted probabilities of shape (n_samples, n_classes).
Return type: np.ndarray
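
The frequency-based probability estimate described above can be illustrated with a minimal NumPy sketch. This is not the cleands source: the function name knn_predict_proba, its return values, and the tie-breaking by index order are assumptions made for illustration. It uses the identity ||t - x||^2 = ||t||^2 + ||x||^2 - 2 t·x, which is why squared norms of the training rows are worth precomputing.

```python
import numpy as np


def knn_predict_proba(x_train, y_train, target, k=1):
    """Illustrative kNN class-probability estimate (hypothetical helper)."""
    x_train = np.asarray(x_train, dtype=float)
    target = np.asarray(target, dtype=float)
    y_train = np.asarray(y_train).ravel()

    # Map labels to integer codes 0..n_classes-1 (classes are sorted).
    classes, y_idx = np.unique(y_train, return_inverse=True)

    # Squared norms of the training rows; computable once and reusable.
    norms_train = np.sum(x_train ** 2, axis=1)
    norms_target = np.sum(target ** 2, axis=1)

    # Pairwise squared Euclidean distances, shape (n_samples, n_train).
    sq_dist = (norms_target[:, None] + norms_train[None, :]
               - 2.0 * target @ x_train.T)

    # Indices of the k nearest training rows for each target row.
    nearest = np.argsort(sq_dist, axis=1, kind="stable")[:, :k]

    # Fraction of the k neighbors that carry each class label.
    neighbor_labels = y_idx[nearest]
    proba = np.zeros((target.shape[0], classes.size))
    for c in range(classes.size):
        proba[:, c] = np.mean(neighbor_labels == c, axis=1)
    return classes, proba


# Two tight groups of training points, one per class.
x_train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y_train = np.array([0, 0, 1, 1])
target = np.array([[0.0, 0.5], [5.0, 5.5]])

classes, proba = knn_predict_proba(x_train, y_train, target, k=3)
print(classes)
print(proba)
```

With k=3, each query point has two neighbors from its own group and one from the other, so each row of the result splits 2/3 to 1/3. Ranking by squared distance gives the same neighbors as ranking by Euclidean distance, so the square root is never needed.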

class cleands.Classification.knn.k_nearest_neighbors_cross_validation_classifier(x, y, k_max=25, folds=5, seed=None)[source]

Bases: k_nearest_neighbors_classifier

kNN classifier with cross-validated k.

Selects the number of neighbors k by K-fold cross-validation, maximizing mean accuracy over the validation folds, and then fits a final kNN classifier using the selected k.

Variables:

k (int) – Selected number of neighbors after cross-validation.

Parameters:

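x (ndarray) – Training feature matrix of shape (n_samples, n_features).
y (ndarray) – Training class labels, one per row of x.
k_max (int) – Largest number of neighbors considered during selection. Defaults to 25.
folds (int) – Number of cross-validation folds K. Defaults to 5.
seed (int | None) – Seed for the random number generator used in cross-validation. Defaults to None.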
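
The selection procedure described above can be sketched in plain NumPy. This is an illustration under stated assumptions rather than the cleands implementation: the helper names knn_accuracy and select_k are hypothetical, and the shuffled fold assignment, the candidate range 1..k_max, the cap on k for small training splits, and the tie-break in favor of the smallest k are all choices made here for the example.

```python
import numpy as np


def knn_accuracy(x_tr, y_tr, x_va, y_va, k):
    """Validation accuracy of a majority-vote kNN (hypothetical helper)."""
    # Pairwise squared Euclidean distances, shape (n_val, n_train).
    d = ((x_va[:, None, :] - x_tr[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=1, kind="stable")[:, :k]
    classes, y_idx = np.unique(y_tr, return_inverse=True)
    # Vote counts per class among the k neighbors of each validation row.
    votes = np.stack(
        [(y_idx[nearest] == c).sum(axis=1) for c in range(classes.size)],
        axis=1,
    )
    pred = classes[np.argmax(votes, axis=1)]
    return float(np.mean(pred == y_va))


def select_k(x, y, k_max=25, folds=5, seed=None):
    """Pick k in 1..k_max by mean accuracy over the folds (a sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y).ravel()
    rng = np.random.default_rng(seed)
    # Assumption: rows are shuffled, then split into near-equal folds.
    fold_idx = np.array_split(rng.permutation(x.shape[0]), folds)
    # Assumption: k is capped by the smallest training split.
    k_cap = min(k_max, x.shape[0] - max(len(f) for f in fold_idx))

    mean_acc = []
    for k in range(1, k_cap + 1):
        accs = []
        for i in range(folds):
            val = fold_idx[i]
            train = np.concatenate(
                [fold_idx[j] for j in range(folds) if j != i]
            )
            accs.append(knn_accuracy(x[train], y[train], x[val], y[val], k))
        mean_acc.append(np.mean(accs))

    mean_acc = np.array(mean_acc)
    # np.argmax returns the first maximum, so ties go to the smallest k.
    best_k = int(np.argmax(mean_acc)) + 1
    return best_k, mean_acc


# Two well-separated clusters, 20 points each.
data_rng = np.random.default_rng(0)
x = np.vstack([
    data_rng.normal(0.0, 0.5, size=(20, 2)),
    data_rng.normal(10.0, 0.5, size=(20, 2)),
])
y = np.repeat([0, 1], 20)

best_k, mean_acc = select_k(x, y, k_max=10, folds=5, seed=1)
print(best_k, mean_acc)
```

After selection, a final classifier would be fit on all of x and y with the chosen k, as the class description states. On data this cleanly separated, every candidate k classifies all validation points correctly, so the tie-break decides the result.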

class cleands.Classification.knn.kNearestNeighborsClassifier(formula, data, *args, **kwargs)[source]

Bases: ClassificationModel

Convenience wrapper for k-nearest neighbors classification.

Provides a formula/DataFrame interface for k_nearest_neighbors_classifier, which predicts each class label by majority vote among the k nearest training samples.

Variables:

MODEL_TYPE (ClassVar[Type[cleands.base.supervised_model]]) – Underlying model type, fixed to k_nearest_neighbors_classifier.

Parameters:

formula (str)
data (DataFrame)

Example

>>> model = kNearestNeighborsClassifier.from_formula("y ~ x1 + x2", data=df, k=5)
>>> model.classify(df[["x1", "x2"]])
>>> model.predict_proba(df[["x1", "x2"]])

class cleands.Classification.knn.kNearestNeighborsCrossValidationClassifier(formula, data, *args, **kwargs)[source]

Bases: ClassificationModel

Convenience wrapper for cross-validated k-nearest neighbors classification.

Provides a formula/DataFrame interface for k_nearest_neighbors_cross_validation_classifier, which selects the number of neighbors k by K-fold cross-validation.

Variables:

MODEL_TYPE (ClassVar[Type[cleands.base.supervised_model]]) – Underlying model type, fixed to k_nearest_neighbors_cross_validation_classifier.

Parameters:

formula (str)
data (DataFrame)

Example

>>> model = kNearestNeighborsCrossValidationClassifier.from_formula("y ~ x1 + x2", data=df)
>>> model.classify(df[["x1", "x2"]])
>>> model.predict_proba(df[["x1", "x2"]])