cleands.Classification.nb module

Naive Bayes classifiers (Gaussian and Multinomial).

This module provides a small hierarchy for Naive Bayes:

naive_bayes: abstract base implementing common plumbing: priors (optionally weighted), log-prior caching, input checks, and predict_proba via log-sum-exp stabilization.
gaussian_naive_bayes: continuous features modeled as independent univariate Gaussians per class, with an optional additive variance regularizer (reg).
multinomial_naive_bayes: count/nonnegative features with Laplace smoothing (alpha) for per-class feature probabilities.

Factory Aliases:

GaussianNaiveBayes: Wrapper for constructing gaussian_naive_bayes via ClassificationModel.
MultinomialNaiveBayes: Wrapper for constructing multinomial_naive_bayes via ClassificationModel.

Typical usage example:

>>> from cleands.Classification.nb import GaussianNaiveBayes
>>> model = GaussianNaiveBayes(X, y)
>>> model.tidy; model.glance

class cleands.Classification.nb.naive_bayes(X, y, priors=None, sample_weight=None)[source]

Bases: classification_model

Abstract Naive Bayes classifier.

Handles class priors (with optional sample weights), caches log-priors, validates inputs, and provides stable predict_proba. Subclasses must implement log_likelihood(target) returning class-wise log-likelihoods.

Variables:

priors (np.ndarray) – Class priors of shape (K,), normalized to sum to 1.
_log_priors (np.ndarray) – Cached log(priors + ε), shape (K,); the small ε keeps the logarithm finite when a prior is zero.

Parameters:

X (ndarray)
y (ndarray)
priors (ndarray | None)
sample_weight (ndarray | None)
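
As a rough illustration of the prior handling described above, the sketch below computes empirical class priors, optionally weighted, and their cached logs. It is not the library's code: the helper name weighted_class_priors, the eps value, and the assumption that labels are integers in 0..K-1 are all hypothetical.

```python
import numpy as np

def weighted_class_priors(y, n_classes, sample_weight=None, eps=1e-12):
    """Hypothetical helper: (optionally weighted) class priors and their logs."""
    y = np.asarray(y, dtype=int)
    # Assumption: with no weights, every observation counts equally.
    if sample_weight is None:
        w = np.ones(y.shape[0])
    else:
        w = np.asarray(sample_weight, dtype=float)
    # Sum the weights falling into each class, then normalize to sum to 1.
    totals = np.bincount(y, weights=w, minlength=n_classes)
    priors = totals / totals.sum()
    # eps keeps the log finite for classes with zero prior mass.
    log_priors = np.log(priors + eps)
    return priors, log_priors

y = np.array([0, 0, 1, 2])
priors, log_priors = weighted_class_priors(y, n_classes=3)
weighted, _ = weighted_class_priors(y, n_classes=3, sample_weight=[1, 1, 2, 4])
```

With unit weights the priors are plain class frequencies; with weights, each class receives its share of the total weight.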

log_likelihood(target)[source]

Compute per-class log-likelihoods log p(x | y=k) for each row.

Parameters: target (np.ndarray) – Feature matrix of shape (n, p) or (p,).
Returns: Log-likelihoods of shape (n, K).
Return type: np.ndarray
Raises: NotImplementedError – Must be implemented by subclasses.

predict_proba(target)[source]

Predict class posterior probabilities.

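Implements a stabilized softmax over the unnormalized log posterior: log p(y=k | x) = log p(x | y=k) + log π_k + const, where the constant is shared by all classes and cancels on normalization.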

Parameters:

target (np.ndarray) – Feature matrix of shape (n, p) or (p,).

Returns:

Posterior probabilities of shape (n, K), with rows summing to 1.

Return type:

np.ndarray
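
The log-sum-exp stabilization mentioned above can be sketched as follows. This is a minimal standalone version, assuming log-likelihoods of shape (n, K) and log-priors of shape (K,); the function name posterior_from_logs is hypothetical and not part of cleands.

```python
import numpy as np

def posterior_from_logs(log_lik, log_priors):
    """Hypothetical helper: posteriors from log-likelihoods and log-priors."""
    # Unnormalized log posterior, shape (n, K).
    joint = log_lik + log_priors
    # Subtract the row-wise maximum so the largest exponent is exp(0) = 1,
    # which avoids underflow to an all-zero row.
    joint = joint - joint.max(axis=1, keepdims=True)
    probs = np.exp(joint)
    # Normalize each row to sum to 1.
    return probs / probs.sum(axis=1, keepdims=True)

# The first row would underflow to 0/0 with a naive exp-then-normalize.
log_lik = np.array([[-1000.0, -1001.0], [-3.0, -3.0]])
log_priors = np.log(np.array([0.5, 0.5]))
probs = posterior_from_logs(log_lik, log_priors)
```

Subtracting the row maximum leaves the normalized result unchanged, because the same factor appears in the numerator and the denominator.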

class cleands.Classification.nb.gaussian_naive_bayes(X, y, priors=None, reg=1e-9, sample_weight=None)[source]

Bases: naive_bayes

Gaussian Naive Bayes classifier.

Assumes conditional independence across features with per-class Gaussian likelihoods.

Parameters:

X (np.ndarray) – Feature matrix of shape (n_samples, n_features).
y (np.ndarray) – Integer class labels of shape (n_samples,).
priors (np.ndarray | None, optional) – Class prior probabilities. If None, uses empirical class frequencies. Defaults to None.
reg (float, optional) – Small additive term applied to the variances for numerical stability. Defaults to 1e-9.
sample_weight (ndarray | None)

Variables:

means (np.ndarray) – Per-class feature means, shape (n_classes, n_features).
variances (np.ndarray) – Per-class feature variances, same shape as means.
priors (np.ndarray) – Class prior probabilities, shape (n_classes,).
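
For orientation, here is one way the per-class means and regularized variances could be estimated. It is a sketch under stated assumptions, not the library's implementation: it ignores sample weights, uses the population variance (ddof=0), assumes integer labels 0..K-1, and the name fit_gaussian_params is hypothetical.

```python
import numpy as np

def fit_gaussian_params(X, y, n_classes, reg=1e-9):
    """Hypothetical helper: per-class feature means and regularized variances."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    means = np.zeros((n_classes, X.shape[1]))
    variances = np.zeros_like(means)
    for k in range(n_classes):
        Xk = X[y == k]                      # rows belonging to class k
        means[k] = Xk.mean(axis=0)
        # Adding reg keeps every variance strictly positive, even for a
        # feature that is constant within a class.
        variances[k] = Xk.var(axis=0) + reg
    return means, variances

X = np.array([[1.0, 5.0], [3.0, 5.0], [10.0, 0.0], [14.0, 2.0]])
y = np.array([0, 0, 1, 1])
means, variances = fit_gaussian_params(X, y, n_classes=2)
```

The second feature is constant within class 0, so without the additive term its variance would be exactly zero and the Gaussian log-density would be undefined.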

log_likelihood(target)[source]

Compute log p(x | y=k) under the Gaussian NB model.

Uses vectorized per-class, per-feature Gaussian log-densities with precomputed constants for efficiency.

Parameters: target (np.ndarray) – Feature matrix of shape (n, p) or (p,).
Returns: Log-likelihoods of shape (n, K).
Return type: np.ndarray
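
A vectorized Gaussian log-likelihood with a precomputed per-class constant might look like the sketch below. It assumes means and variances arrays of shape (K, p), as in the Variables list; the function name gaussian_log_likelihood is hypothetical, and the actual implementation may organize the computation differently.

```python
import numpy as np

def gaussian_log_likelihood(target, means, variances):
    """Hypothetical helper: log p(x | y=k) for every row and class."""
    # Promote a single observation of shape (p,) to shape (1, p).
    T = np.atleast_2d(np.asarray(target, dtype=float))
    # Per-class normalizing constant: -0.5 * sum_j log(2 * pi * var_kj), shape (K,).
    const = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
    # Broadcast to (n, K, p): each row against each class mean.
    diff = T[:, None, :] - means[None, :, :]
    # Quadratic term summed over features, shape (n, K).
    quad = -0.5 * np.sum(diff ** 2 / variances[None, :, :], axis=2)
    return const[None, :] + quad

means = np.array([[0.0, 0.0], [3.0, 3.0]])
variances = np.ones((2, 2))
ll = gaussian_log_likelihood([[0.0, 0.0], [3.0, 3.0]], means, variances)
single = gaussian_log_likelihood([0.0, 0.0], means, variances)
```

Because the constant depends only on the fitted variances, it can be computed once at fit time and reused for every prediction call.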

class cleands.Classification.nb.multinomial_naive_bayes(X, y, priors=None, alpha=1.0, sample_weight=None)[source]

Bases: naive_bayes

Multinomial Naive Bayes classifier.

Suitable for count-like, sparse, or term-frequency features (nonnegative). Uses per-class multinomial parameters with Laplace smoothing.

Variables:

alpha (float) – Laplace smoothing parameter (≥ 0).
feature_counts (np.ndarray) – Per-class feature totals, shape (K, p).
feature_probs (np.ndarray) – Per-class feature probabilities, shape (K, p).
_log_feature_probs (np.ndarray) – Cached log probabilities, shape (K, p).

Parameters:

X (ndarray)
y (ndarray)
priors (ndarray | None)
alpha (float)
sample_weight (ndarray | None)
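
The Laplace-smoothed feature probabilities can be illustrated with a short sketch. The assumptions are labeled: it ignores sample weights, assumes integer labels 0..K-1, and the name fit_multinomial_params is hypothetical rather than part of cleands.

```python
import numpy as np

def fit_multinomial_params(X, y, n_classes, alpha=1.0):
    """Hypothetical helper: per-class feature totals and smoothed probabilities."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    feature_counts = np.zeros((n_classes, X.shape[1]))
    # Accumulate each row of X into the row of its class (unbuffered add).
    np.add.at(feature_counts, y, X)
    # Laplace smoothing: add alpha to every count before normalizing, so
    # features unseen in a class keep a nonzero probability.
    smoothed = feature_counts + alpha
    feature_probs = smoothed / smoothed.sum(axis=1, keepdims=True)
    return feature_counts, feature_probs

X = np.array([[2, 0, 1], [1, 0, 0], [0, 3, 1]])
y = np.array([0, 0, 1])
feature_counts, feature_probs = fit_multinomial_params(X, y, n_classes=2)
```

With alpha = 0 a feature never observed in a class gets probability zero, and its log would be negative infinity; any positive alpha avoids that.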

log_likelihood(target)[source]

Compute log p(x | y=k) under the Multinomial NB model.

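Treats each sample row as a bag of feature counts (or TF-like weights). The log-likelihood equals x · log θ_k up to an additive term that does not depend on the class k (the multinomial coefficient), so that term cancels when posteriors are normalized.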

Parameters: target (np.ndarray) – Nonnegative features, shape (n, p) or (p,).
Returns: Log-likelihoods of shape (n, K).
Return type: np.ndarray
Raises: ValueError – If target has negative entries.
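
The x · log θ_k computation reduces to a single matrix product, sketched below. It assumes a cached array of log probabilities with shape (K, p); the function name multinomial_log_likelihood and the error message are hypothetical.

```python
import numpy as np

def multinomial_log_likelihood(target, log_feature_probs):
    """Hypothetical helper: x . log(theta_k) for every row and class."""
    # Promote a single observation of shape (p,) to shape (1, p).
    T = np.atleast_2d(np.asarray(target, dtype=float))
    if np.any(T < 0):
        raise ValueError("target must be nonnegative")
    # (n, p) @ (p, K) -> (n, K); the class-independent multinomial
    # coefficient is omitted because it cancels in the posterior.
    return T @ log_feature_probs.T

log_feature_probs = np.log(np.array([[4.0, 1.0, 2.0], [1.0, 4.0, 2.0]]) / 7.0)
ll = multinomial_log_likelihood([2, 0, 1], log_feature_probs)
```

Here the observation puts most of its mass on the first feature, so the first class, which assigns that feature the higher probability, receives the larger log-likelihood.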

class cleands.Classification.nb.GaussianNaiveBayes(formula, data, *args, **kwargs)[source]

Bases: ClassificationModel

Convenience wrapper for Gaussian Naive Bayes classification.

Assumes features are conditionally independent given the class label and normally distributed within each class. Provides a formula/DataFrame interface to gaussian_naive_bayes.

Variables:

MODEL_TYPE (ClassVar[Type[cleands.base.supervised_model]]) – Underlying model type, fixed to gaussian_naive_bayes.

Parameters:

formula (str)
data (DataFrame)

Example

>>> model = GaussianNaiveBayes.from_formula("y ~ x1 + x2", data=df)
>>> model.classify(df[["x1", "x2"]])
>>> model.predict_proba(df[["x1", "x2"]])

class cleands.Classification.nb.MultinomialNaiveBayes(formula, data, *args, **kwargs)[source]

Bases: ClassificationModel

Convenience wrapper for Multinomial Naive Bayes classification.

Assumes features are conditionally independent given the class label and follow a multinomial distribution within each class. Often used for discrete count data, such as word counts in text classification. Provides a formula/DataFrame interface to multinomial_naive_bayes.

Variables:

MODEL_TYPE (ClassVar[Type[cleands.base.supervised_model]]) – Underlying model type, fixed to multinomial_naive_bayes.

Parameters:

formula (str)
data (DataFrame)

Example

>>> model = MultinomialNaiveBayes.from_formula("y ~ x1 + x2 + x3", data=df)
>>> model.classify(df[["x1", "x2", "x3"]])
>>> model.predict_proba(df[["x1", "x2", "x3"]])