cleands.Classification.nb module
Naive Bayes classifiers (Gaussian and Multinomial).
This module provides a small hierarchy for Naive Bayes:
naive_bayes: abstract base implementing common plumbing: priors (optionally weighted), log-prior caching, input checks, and predict_proba via log-sum-exp stabilization.
gaussian_naive_bayes: continuous features modeled as independent univariate Gaussians per class, with optional ridge-style regularization of the variances (reg).
multinomial_naive_bayes: count/nonnegative features with Laplace smoothing (alpha) for per-class feature probabilities.
- Factory Aliases:
- GaussianNaiveBayes:
Wrapper for constructing gaussian_naive_bayes via ClassificationModel.
- MultinomialNaiveBayes:
Wrapper for constructing multinomial_naive_bayes via ClassificationModel.
- Typical usage example:
>>> from cleands.Classification.nb import GaussianNaiveBayes
>>> model = GaussianNaiveBayes(X, y)
>>> model.tidy; model.glance
- class cleands.Classification.nb.naive_bayes(X, y, priors=None, sample_weight=None)[source]
Bases:
classification_model
Abstract Naive Bayes classifier.
Handles class priors (with optional sample weights), caches log-priors, validates inputs, and provides stable predict_proba. Subclasses must implement log_likelihood(target) returning class-wise log-likelihoods.
- Variables:
priors (np.ndarray) – Class priors of shape (K,), normalized to sum to 1.
_log_priors (np.ndarray) – log(priors + ε), cached at fit time; the small ε guards against log(0) for empty classes.
- Parameters:
X (ndarray)
y (ndarray)
priors (ndarray | None)
sample_weight (ndarray | None)
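The prior handling described above can be sketched in NumPy. This is a minimal illustration, not the library's code: the helper name weighted_class_priors, the assumption that labels are integers 0..K-1, and the epsilon value 1e-12 are all hypothetical.

```python
import numpy as np

def weighted_class_priors(y, n_classes, sample_weight=None):
    """Hypothetical helper: normalized class priors from integer labels 0..K-1."""
    y = np.asarray(y)
    if sample_weight is None:
        # Unweighted case: every observation counts once.
        sample_weight = np.ones(y.shape[0], dtype=float)
    # Sum the weights that fall in each class, then normalize to sum to 1.
    totals = np.bincount(y, weights=sample_weight, minlength=n_classes)
    return totals / totals.sum()

priors = weighted_class_priors([0, 0, 1, 2], n_classes=3,
                               sample_weight=[1.0, 1.0, 2.0, 4.0])
# Cache the logs once; the epsilon (an assumed value) avoids log(0).
log_priors = np.log(priors + 1e-12)
```

With these weights the class totals are 2, 2, and 4, so the priors come out as one quarter, one quarter, and one half.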
- log_likelihood(target)[source]
Compute per-class log-likelihoods log p(x | y=k) for each row.
- Parameters:
target (np.ndarray) – Feature matrix of shape (n, p) or (p,).
- Returns:
Log-likelihoods of shape (n, K).
- Return type:
np.ndarray
- Raises:
NotImplementedError – Must be implemented by subclasses.
- predict_proba(target)[source]
Predict class posterior probabilities.
- Implements stabilized softmax over log posterior:
log p(y=k | x) ∝ log p(x | y=k) + log π_k
- Parameters:
target (np.ndarray) – Feature matrix of shape (n, p) or (p,).
- Returns:
Posterior probabilities of shape (n, K), rows summing to 1.
- Return type:
np.ndarray
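The stabilized softmax mentioned for predict_proba can be sketched as follows. This is an illustration under assumptions: posterior_from_logs is a hypothetical helper, not a function of this module, and it takes the (n, K) log-likelihoods and (K,) log-priors as plain arrays.

```python
import numpy as np

def posterior_from_logs(log_lik, log_priors):
    """Hypothetical helper: class posteriors from log-likelihoods of shape (n, K)."""
    # Unnormalized log posterior: log p(x | y=k) + log pi_k
    joint = np.atleast_2d(log_lik) + log_priors
    # Subtract the row maximum so the largest exponent is exactly 0.
    joint = joint - joint.max(axis=1, keepdims=True)
    probs = np.exp(joint)
    # Normalize each row to sum to 1.
    return probs / probs.sum(axis=1, keepdims=True)

# Very negative log-likelihoods would underflow to 0 without the shift.
log_lik = np.array([[-1000.0, -1001.0], [-3.0, -3.0]])
post = posterior_from_logs(log_lik, np.log(np.array([0.5, 0.5])))
```

Subtracting the row maximum leaves the normalized result unchanged, because the same factor cancels between numerator and denominator, but it keeps np.exp from returning all zeros.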
- class cleands.Classification.nb.gaussian_naive_bayes(X, y, priors=None, reg=1e-9, sample_weight=None)[source]
Bases:
naive_bayes
Gaussian Naive Bayes classifier.
Assumes conditional independence across features with per-class Gaussian likelihoods.
- Parameters:
X (np.ndarray) – Feature matrix of shape (n_samples, n_features).
y (np.ndarray) – Integer class labels of shape (n_samples,).
priors (np.ndarray | None, optional) – Class prior probabilities. If None, uses empirical class frequencies. Defaults to None.
reg (float, optional) – Small additive term to variances for numerical stability; the docstring refers to it as var_smoothing. Defaults to 1e-9.
sample_weight (ndarray | None)
- Variables:
means (np.ndarray) – Per-class feature means, shape (n_classes, n_features).
variances (np.ndarray) – Per-class feature variances, same shape as means.
priors (np.ndarray) – Class prior probabilities, shape (n_classes,).
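As a rough sketch of how per-class means and regularized variances of these shapes could be estimated: the helper fit_gaussian_params is hypothetical, and it assumes integer labels 0..K-1, no sample weights, and the population variance (ddof=0). The library may differ on any of these points.

```python
import numpy as np

def fit_gaussian_params(X, y, n_classes, reg=1e-9):
    """Hypothetical helper: per-class feature means and variances, shape (K, p)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    means = np.zeros((n_classes, X.shape[1]))
    variances = np.zeros((n_classes, X.shape[1]))
    for k in range(n_classes):
        Xk = X[y == k]                       # rows belonging to class k
        means[k] = Xk.mean(axis=0)
        # Adding reg keeps constant features from producing zero variance.
        variances[k] = Xk.var(axis=0) + reg
    return means, variances

X = np.array([[1.0, 2.0], [3.0, 2.0], [10.0, 0.0], [10.0, 4.0]])
y = np.array([0, 0, 1, 1])
means, variances = fit_gaussian_params(X, y, n_classes=2)
```

In this toy data the second feature is constant within class 0 and the first is constant within class 1, so the additive term is what keeps those variances strictly positive.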
- log_likelihood(target)[source]
Compute log p(x | y=k) under the Gaussian NB model.
Uses vectorized per-class, per-feature Gaussian log-densities with precomputed constants for efficiency.
- Parameters:
target (np.ndarray) – Feature matrix (n, p) or (p,).
- Returns:
Log-likelihoods of shape (n, K).
- Return type:
np.ndarray
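The vectorized computation can be illustrated with broadcasting. This is a sketch under assumptions, not the module's implementation: gaussian_log_likelihood is a hypothetical free function that takes means and variances of shape (K, p) as arguments.

```python
import numpy as np

def gaussian_log_likelihood(target, means, variances):
    """Hypothetical helper: log p(x | y=k), summed over independent features."""
    x = np.atleast_2d(np.asarray(target, dtype=float))       # (n, p)
    # Class-only constant, computable once: -0.5 * sum_j log(2*pi*var_kj)
    const = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)   # (K,)
    # Broadcast (n, 1, p) against (1, K, p) to get all class/row pairs.
    diff = x[:, None, :] - means[None, :, :]
    quad = -0.5 * np.sum(diff ** 2 / variances[None, :, :], axis=2)  # (n, K)
    return const[None, :] + quad

means = np.array([[0.0, 0.0], [3.0, 3.0]])
variances = np.ones((2, 2)) + 1e-9   # 1e-9 mirrors the documented default of reg
ll = gaussian_log_likelihood([0.0, 0.0], means, variances)
```

A single (p,) row is promoted to shape (1, p), so the result is always (n, K). Here the point at the origin scores higher under the first class, whose mean it matches.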
- class cleands.Classification.nb.multinomial_naive_bayes(X, y, priors=None, alpha=1.0, sample_weight=None)[source]
Bases:
naive_bayes
Multinomial Naive Bayes classifier.
Suitable for count-like, sparse, or term-frequency features (nonnegative). Uses per-class multinomial parameters with Laplace smoothing.
- Variables:
alpha (float) – Laplace smoothing parameter (≥ 0).
feature_counts (np.ndarray) – Per-class feature totals, shape (K, p).
feature_probs (np.ndarray) – Per-class feature probabilities, shape (K, p).
_log_feature_probs (np.ndarray) – Cached log probabilities, shape (K, p).
- Parameters:
X (ndarray)
y (ndarray)
priors (ndarray | None)
alpha (float)
sample_weight (ndarray | None)
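The relationship between feature_counts, alpha, and feature_probs can be sketched as below. The helper fit_feature_probs is hypothetical and assumes integer labels 0..K-1 with no sample weights; the library's own fitting code may differ.

```python
import numpy as np

def fit_feature_probs(X, y, n_classes, alpha=1.0):
    """Hypothetical helper: per-class feature totals and smoothed probabilities."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    onehot = np.eye(n_classes)[y]             # (n, K) class indicator matrix
    feature_counts = onehot.T @ X             # (K, p) totals per class
    # Laplace smoothing: add alpha to every count before normalizing.
    smoothed = feature_counts + alpha
    feature_probs = smoothed / smoothed.sum(axis=1, keepdims=True)
    return feature_counts, feature_probs

X = np.array([[2, 0, 1], [1, 0, 0], [0, 3, 1]])
y = np.array([0, 0, 1])
counts, probs = fit_feature_probs(X, y, n_classes=2, alpha=1.0)
log_probs = np.log(probs)   # cached once, analogous to _log_feature_probs
```

With alpha = 1.0, the second feature never appears in class 0 but still receives probability 1/7 rather than 0, which keeps its logarithm finite.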
- log_likelihood(target)[source]
Compute log p(x | y=k) under the Multinomial NB model.
Treats each sample row as a bag of feature counts (or TF-like weights). Up to an additive term that does not depend on the class (the multinomial coefficient, which cancels in the posterior), the log-likelihood is x · log θ_k.
- Parameters:
target (np.ndarray) – Nonnegative features, shape (n, p) or (p,).
- Returns:
Log-likelihoods of shape (n, K).
- Return type:
np.ndarray
- Raises:
ValueError – If target has negative entries.
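The x · log θ_k computation reduces to one matrix product. A minimal sketch, assuming a hypothetical free function multinomial_log_likelihood that receives the cached (K, p) log probabilities as an argument:

```python
import numpy as np

def multinomial_log_likelihood(target, log_feature_probs):
    """Hypothetical helper: x . log(theta_k) for every row and class."""
    x = np.atleast_2d(np.asarray(target, dtype=float))    # (n, p)
    if np.any(x < 0):
        raise ValueError("target must contain nonnegative entries")
    # (n, p) @ (p, K) -> (n, K); class-independent terms are omitted.
    return x @ log_feature_probs.T

log_theta = np.log(np.array([[4 / 7, 1 / 7, 2 / 7],
                             [1 / 7, 4 / 7, 2 / 7]]))
ll = multinomial_log_likelihood([2, 0, 1], log_theta)
```

The row [2, 0, 1] puts its weight on the first feature, which is more probable under the first class, so the first column of ll is the larger one.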
- class cleands.Classification.nb.GaussianNaiveBayes(formula, data, *args, **kwargs)[source]
Bases:
ClassificationModel
Convenience wrapper for Gaussian Naive Bayes classification.
Assumes features are conditionally independent given the class label and normally distributed within each class. Provides a formula/DataFrame interface for gaussian_naive_bayes.
- Variables:
MODEL_TYPE (ClassVar[Type[cleands.base.supervised_model]]) – Underlying model type, fixed to gaussian_naive_bayes.
- Parameters:
formula (str)
data (DataFrame)
Example
>>> model = GaussianNaiveBayes.from_formula("y ~ x1 + x2", data=df)
>>> model.classify(df[["x1", "x2"]])
>>> model.predict_proba(df[["x1", "x2"]])
- class cleands.Classification.nb.MultinomialNaiveBayes(formula, data, *args, **kwargs)[source]
Bases:
ClassificationModel
Convenience wrapper for Multinomial Naive Bayes classification.
Assumes features are conditionally independent given the class label and follow a multinomial distribution. Often used for discrete count data such as text classification. Provides a formula/DataFrame interface for multinomial_naive_bayes.
- Variables:
MODEL_TYPE (ClassVar[Type[cleands.base.supervised_model]]) – Underlying model type, fixed to multinomial_naive_bayes.
- Parameters:
formula (str)
data (DataFrame)
Example
>>> model = MultinomialNaiveBayes.from_formula("y ~ x1 + x2 + x3", data=df)
>>> model.classify(df[["x1", "x2", "x3"]])
>>> model.predict_proba(df[["x1", "x2", "x3"]])