cleands.Classification.nb module

Naive Bayes classifiers (Gaussian and Multinomial).

This module provides a small hierarchy for Naive Bayes:

  • naive_bayes: abstract base implementing common plumbing: priors (optionally weighted), log-prior caching, input checks, and predict_proba via log-sum-exp stabilization.

  • gaussian_naive_bayes: continuous features modeled as independent univariate Gaussians per class, with an optional additive variance regularizer (reg).

  • multinomial_naive_bayes: count/nonnegative features with Laplace smoothing (alpha) for per-class feature probabilities.

Factory aliases:

  • GaussianNaiveBayes: wrapper for constructing gaussian_naive_bayes via ClassificationModel.

  • MultinomialNaiveBayes: wrapper for constructing multinomial_naive_bayes via ClassificationModel.

Typical usage example:
>>> from cleands.Classification.nb import GaussianNaiveBayes
>>> model = GaussianNaiveBayes("y ~ x1 + x2", data=df)
>>> model.tidy; model.glance

class cleands.Classification.nb.naive_bayes(X, y, priors=None, sample_weight=None)[source]

Bases: classification_model

Abstract Naive Bayes classifier.

Handles class priors (with optional sample weights), caches log-priors, validates inputs, and provides stable predict_proba. Subclasses must implement log_likelihood(target) returning class-wise log-likelihoods.

Variables:
  • priors (np.ndarray) – Class priors of shape (K,), normalized to sum to 1.

  • _log_priors (np.ndarray) – Cached log(priors + ε); the small ε guards against log(0) when a prior is zero.

Parameters:
  • X (ndarray)

  • y (ndarray)

  • priors (ndarray | None)

  • sample_weight (ndarray | None)
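
The prior handling described above can be sketched in NumPy. This is a minimal illustration, not the library's code: the helper name weighted_log_priors, the eps value, and the assumption that labels are integers 0..K-1 are all hypothetical.

```python
import numpy as np

def weighted_log_priors(y, n_classes, sample_weight=None, eps=1e-12):
    """Hypothetical helper: class priors from (optionally weighted) label counts."""
    y = np.asarray(y)
    if sample_weight is None:
        w = np.ones(len(y))
    else:
        w = np.asarray(sample_weight, dtype=float)
    # Sum the weights that fall in each class, then normalize to sum to 1.
    totals = np.bincount(y, weights=w, minlength=n_classes)
    priors = totals / totals.sum()
    # eps (an assumed value) keeps the log finite if a class has zero weight.
    return priors, np.log(priors + eps)

priors, log_priors = weighted_log_priors(
    [0, 0, 1, 2], n_classes=3, sample_weight=[1.0, 1.0, 2.0, 4.0]
)
```

With these weights the class totals are 2, 2, and 4, so the priors come out as 0.25, 0.25, and 0.5.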

log_likelihood(target)[source]

Compute per-class log-likelihoods log p(x | y=k) for each row.

Parameters:

target (np.ndarray) – Feature matrix of shape (n, p) or (p,).

Returns:

Log-likelihoods of shape (n, K).

Return type:

np.ndarray

Raises:

NotImplementedError – Must be implemented by subclasses.

predict_proba(target)[source]

Predict class posterior probabilities.

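Applies a numerically stabilized softmax (log-sum-exp) to the unnormalized log posterior:

log p(y=k | x) = log p(x | y=k) + log π_k − log p(x)

The last term does not depend on k and is removed by the normalization.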

Parameters:

target (np.ndarray) – Feature matrix of shape (n, p) or (p,).

Returns:

Posterior probabilities of shape (n, K), with rows summing to 1.

Return type:

np.ndarray
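
A minimal sketch of the log-sum-exp stabilization, assuming the class-wise log-likelihoods and log-priors are already available as arrays. The function name posterior_from_logs is hypothetical and is not part of cleands.

```python
import numpy as np

def posterior_from_logs(log_lik, log_priors):
    """Hypothetical helper: log_lik is (n, K), log_priors is (K,)."""
    joint = log_lik + log_priors  # unnormalized log posterior, shape (n, K)
    # Subtract each row's maximum so the largest exponent is exactly 0,
    # which prevents underflow to an all-zero row.
    joint = joint - joint.max(axis=1, keepdims=True)
    probs = np.exp(joint)
    return probs / probs.sum(axis=1, keepdims=True)

# Log-likelihoods near -1000 would underflow to 0 under a naive exp().
log_lik = np.array([[-1000.0, -1001.0], [-3.0, -1.0]])
log_priors = np.log(np.array([0.5, 0.5]))
probs = posterior_from_logs(log_lik, log_priors)
```

Shifting by the row maximum leaves the normalized result unchanged, because the shift cancels between numerator and denominator.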

class cleands.Classification.nb.gaussian_naive_bayes(X, y, priors=None, reg=1e-9, sample_weight=None)[source]

Bases: naive_bayes

Gaussian Naive Bayes classifier.

Assumes conditional independence across features with per-class Gaussian likelihoods.

Parameters:
  • X (np.ndarray) – Feature matrix of shape (n_samples, n_features).

  • y (np.ndarray) – Integer class labels of shape (n_samples,).

  • priors (np.ndarray | None, optional) – Class prior probabilities. If None, uses empirical class frequencies. Defaults to None.

  • reg (float, optional) – Small additive term to variances for numerical stability. Defaults to 1e-9.

  • sample_weight (ndarray | None)

Variables:
  • means (np.ndarray) – Per-class feature means, shape (n_classes, n_features).

  • variances (np.ndarray) – Per-class feature variances, same shape as means.

  • priors (np.ndarray) – Class prior probabilities, shape (n_classes,).
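
As a rough illustration of how the means and variances above could be estimated, the sketch below computes unweighted per-class statistics and adds reg to every variance. The helper name fit_gaussian_params and the assumption of integer labels 0..K-1 are hypothetical; sample weights are left out for brevity.

```python
import numpy as np

def fit_gaussian_params(X, y, n_classes, reg=1e-9):
    """Hypothetical helper: per-class feature means and regularized variances."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # One row per class, one column per feature: shape (K, p).
    means = np.stack([X[y == k].mean(axis=0) for k in range(n_classes)])
    # Adding reg keeps variances strictly positive, even for constant features.
    variances = np.stack([X[y == k].var(axis=0) for k in range(n_classes)]) + reg
    return means, variances

X = np.array([[0.0, 1.0], [2.0, 1.0], [5.0, 5.0]])
y = np.array([0, 0, 1])
means, variances = fit_gaussian_params(X, y, n_classes=2)
```

Here the second feature is constant within class 0 and class 1 has a single sample, so without reg several variances would be exactly zero and the Gaussian log-density would be undefined.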

log_likelihood(target)[source]

Compute log p(x | y=k) under the Gaussian NB model.

Uses vectorized per-class, per-feature Gaussian log-densities with precomputed constants for efficiency.

Parameters:

target (np.ndarray) – Feature matrix (n, p) or (p,).

Returns:

Log-likelihoods of shape (n, K).

Return type:

np.ndarray
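
The vectorized computation can be sketched with NumPy broadcasting. This is an assumed layout rather than the library's implementation: gaussian_log_likelihood is a hypothetical name, and means and variances are taken to have shape (K, p).

```python
import numpy as np

def gaussian_log_likelihood(target, means, variances):
    """Hypothetical helper: returns log p(x | y=k) with shape (n, K)."""
    target = np.atleast_2d(np.asarray(target, dtype=float))  # (n, p)
    # Constant that depends only on the class; it can be precomputed once.
    const = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (K,)
    # Broadcast samples against classes: (n, 1, p) - (1, K, p) -> (n, K, p).
    diff = target[:, None, :] - means[None, :, :]
    quad = -0.5 * np.sum(diff ** 2 / variances[None, :, :], axis=2)  # (n, K)
    return const[None, :] + quad

means = np.array([[0.0, 0.0], [1.0, 1.0]])
variances = np.ones((2, 2))
ll = gaussian_log_likelihood([0.0, 0.0], means, variances)
```

Summing over the feature axis is where the conditional independence assumption enters: the joint log-density is the sum of the per-feature log-densities.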

class cleands.Classification.nb.multinomial_naive_bayes(X, y, priors=None, alpha=1.0, sample_weight=None)[source]

Bases: naive_bayes

Multinomial Naive Bayes classifier.

Suitable for count-like, sparse, or term-frequency features (nonnegative). Uses per-class multinomial parameters with Laplace smoothing.

Variables:
  • alpha (float) – Laplace smoothing parameter (≥ 0).

  • feature_counts (np.ndarray) – Per-class feature totals, shape (K, p).

  • feature_probs (np.ndarray) – Per-class feature probabilities, shape (K, p).

  • _log_feature_probs (np.ndarray) – Cached log probabilities, shape (K, p).

Parameters:
  • X (ndarray)

  • y (ndarray)

  • priors (ndarray | None)

  • alpha (float)

  • sample_weight (ndarray | None)

log_likelihood(target)[source]

Compute log p(x | y=k) under the Multinomial NB model.

Treats each sample row as a bag of feature counts (or TF-like weights). Up to an additive term that does not depend on the class, the log-likelihood is x · log θ_k; that term cancels when posteriors are normalized.

Parameters:

target (np.ndarray) – Nonnegative features, shape (n, p) or (p,).

Returns:

Log-likelihoods of shape (n, K).

Return type:

np.ndarray

Raises:

ValueError – If target has negative entries.
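
A minimal sketch of Laplace-smoothed feature probabilities and the resulting x · log θ_k score. Both function names are hypothetical, labels are assumed to be integers 0..K-1, and sample weights are left out.

```python
import numpy as np

def fit_log_feature_probs(X, y, n_classes, alpha=1.0):
    """Hypothetical helper: log of smoothed per-class feature probabilities."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Per-class feature totals, shape (K, p).
    counts = np.stack([X[y == k].sum(axis=0) for k in range(n_classes)])
    # Adding alpha to every count keeps unseen features away from probability 0.
    smoothed = counts + alpha
    return np.log(smoothed / smoothed.sum(axis=1, keepdims=True))

def multinomial_log_likelihood(target, log_feature_probs):
    """Hypothetical helper: class-dependent part of log p(x | y=k), shape (n, K)."""
    target = np.atleast_2d(np.asarray(target, dtype=float))
    if np.any(target < 0):
        raise ValueError("target must be nonnegative")
    return target @ log_feature_probs.T

X = np.array([[2, 0, 1], [1, 0, 0], [0, 3, 1]])
y = np.array([0, 0, 1])
log_theta = fit_log_feature_probs(X, y, n_classes=2)
scores = multinomial_log_likelihood([1, 0, 0], log_theta)
```

With alpha = 1, class 0 has smoothed counts (4, 1, 2) and class 1 has (1, 4, 2), so a sample containing only the first feature scores higher under class 0.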

class cleands.Classification.nb.GaussianNaiveBayes(formula, data, *args, **kwargs)[source]

Bases: ClassificationModel

Convenience wrapper for Gaussian Naive Bayes classification.

Assumes features are conditionally independent given the class label and normally distributed within each class. Provides a formula/DataFrame interface to gaussian_naive_bayes.

Variables:

MODEL_TYPE (ClassVar[Type[cleands.base.supervised_model]]) – Underlying model type, fixed to gaussian_naive_bayes.

Parameters:
  • formula (str)

  • data (DataFrame)

Example

>>> model = GaussianNaiveBayes.from_formula("y ~ x1 + x2", data=df)
>>> model.classify(df[["x1", "x2"]])
>>> model.predict_proba(df[["x1", "x2"]])

class cleands.Classification.nb.MultinomialNaiveBayes(formula, data, *args, **kwargs)[source]

Bases: ClassificationModel

Convenience wrapper for Multinomial Naive Bayes classification.

Assumes features are conditionally independent given the class label and follow a multinomial distribution. Often used for discrete count data, as in text classification. Provides a formula/DataFrame interface to multinomial_naive_bayes.

Variables:

MODEL_TYPE (ClassVar[Type[cleands.base.supervised_model]]) – Underlying model type, fixed to multinomial_naive_bayes.

Parameters:
  • formula (str)

  • data (DataFrame)

Example

>>> model = MultinomialNaiveBayes.from_formula("y ~ x1 + x2 + x3", data=df)
>>> model.classify(df[["x1", "x2", "x3"]])
>>> model.predict_proba(df[["x1", "x2", "x3"]])