pupil.sampling package

pupil.sampling.cluster_based module

class pupil.sampling.cluster_based.ClusteringSampler(clustering_model: pupil.models.clustering.Clustering)

Bases: object

Clustering sampling:

1. Get the data points closest to the centroids.
2. Get the outliers in each cluster.
3. Randomly sample from each cluster.
4. Combine them all.

fit(X: NDArray2D) → None

predict(X: NDArray2D) → Tuple[numpy.ndarray, numpy.ndarray]

Parameters: X (NDArray2D) – 2D array of input samples, one row per data point
Returns: tuple(distances , cluster_ids)
Return type: Tuple[NDArray2D, NDArray2D]
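
The four steps in the ClusteringSampler description can be sketched with plain NumPy. This is an illustration under stated assumptions, not the library's implementation: the function name combine_cluster_samples, the n_each and seed parameters, and the use of 1D distance and cluster-id arrays (one entry per row) are all hypothetical, and "outliers" is interpreted here as the members farthest from their centroid.

```python
import numpy as np


def combine_cluster_samples(distances, cluster_ids, n_each=1, seed=0):
    """Pick central, outlying, and random members of every cluster.

    distances   -- 1D array, distance of each row to its assigned centroid
    cluster_ids -- 1D array, cluster label of each row
    Returns a sorted array of unique row indices.
    """
    rng = np.random.default_rng(seed)
    selected = set()
    for cluster in np.unique(cluster_ids):
        members = np.flatnonzero(cluster_ids == cluster)
        # Order this cluster's members from nearest to farthest from the centroid.
        ordered = members[np.argsort(distances[members])]
        selected.update(ordered[:n_each].tolist())   # 1. closest to the centroid
        selected.update(ordered[-n_each:].tolist())  # 2. outliers (farthest away)
        # 3. random picks among the members not taken by steps 1 and 2
        rest = ordered[n_each:-n_each] if len(ordered) > 2 * n_each else ordered[:0]
        if len(rest) > 0:
            picks = rng.choice(rest, size=min(n_each, len(rest)), replace=False)
            selected.update(picks.tolist())
    # 4. combine everything into one index array
    return np.array(sorted(selected), dtype=int)


# Made-up distances and labels for seven rows in two clusters.
distances = np.array([0.1, 0.9, 0.5, 0.3, 0.2, 0.8, 0.4])
cluster_ids = np.array([0, 0, 0, 0, 1, 1, 1])
indices = combine_cluster_samples(distances, cluster_ids)
print(indices)
```

Collecting the picks in a set keeps the combined result free of duplicates when a small cluster yields the same row in more than one step.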

pupil.sampling.model_based module

class pupil.sampling.model_based.LinearInterpolationTransformer

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

fit(X, y=None)

fit_transform(X, y=None)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters

X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.

Returns

X_new – Transformed array.

Return type

ndarray array of shape (n_samples, n_features_new)

transform(X)

class pupil.sampling.model_based.ModelBasedSampler(ranker)

Bases: object

fit(X: NDArray2D)

classmethod from_strategy(strategy: Literal['rank', 'quantile', 'linear'] = 'linear') → pupil.sampling.model_based.ModelBasedSampler

predict(X: NDArray2D)

class pupil.sampling.model_based.RankTransformer

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

fit(X, y=None)

fit_transform(X, y=None)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters

X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.

Returns

X_new – Transformed array.

Return type

ndarray array of shape (n_samples, n_features_new)

transform(X)

pupil.sampling.uncertainty module

class pupil.sampling.uncertainty.UncertaintySampler(sampling_strategy: Callable[[numpy.ndarray], numpy.ndarray])

Bases: object

Uncertainty sampling is a set of techniques for identifying unlabeled items that are near a decision boundary in your current machine learning model.

fit(prob_dist: NDArray2D) → None

Get the 2D numpy array of model predictions and return an array of indices ordered from highest to lowest uncertainty.

Parameters: prob_dist (NDArray2D) – 2D array of predicted probabilities; each row is a data point and each column is a class

classmethod from_strategy(strategy: str) → pupil.sampling.uncertainty.UncertaintySampler

Class method that helps pick the sampling strategy by name.

Parameters: strategy (str) – Should be one of: ['least_confidence', 'margin_confidence', 'ratio_confidence', 'entropy_based']
Raises: ValueError – If strategy is not in the valid list
Return type: UncertaintySampler
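
A minimal sketch of the two behaviours described for UncertaintySampler: looking up a strategy by name (raising ValueError for unknown names) and ordering row indices from highest to lowest uncertainty. The names STRATEGIES, pick_strategy and rank_by_uncertainty are hypothetical, and the single registered scoring function is an unnormalised placeholder rather than the library's own.

```python
import numpy as np

# Hypothetical registry mapping strategy names to scoring callables.
# The placeholder score is simply 1 minus the top probability of each row.
STRATEGIES = {
    "least_confidence": lambda prob_dist: 1.0 - prob_dist.max(axis=1),
}


def pick_strategy(name):
    """Return the scoring callable for `name`, or raise ValueError."""
    try:
        return STRATEGIES[name]
    except KeyError:
        valid = ", ".join(sorted(STRATEGIES))
        raise ValueError(f"Unknown strategy {name!r}; expected one of: {valid}") from None


def rank_by_uncertainty(prob_dist, strategy):
    """Return row indices ordered from most to least uncertain."""
    scores = pick_strategy(strategy)(prob_dist)
    # argsort is ascending, so negate the scores for a descending order.
    return np.argsort(-scores)


prob_dist = np.array([
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.60, 0.30, 0.10],
])
order = rank_by_uncertainty(prob_dist, "least_confidence")
print(order)  # [1 2 0]
```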

pupil.sampling.uncertainty.entropy_based(prob_dist: NDArray2D) → numpy.ndarray

Returns the uncertainty score of an array using entropy-based sampling in a 0-1 range where 1 is most uncertain.

Example:

Assumes the probability distribution is a numpy array, like: np.array([[0.0321, 0.6439, 0.0871, 0.2369]]). The result will be:

0 – SUM(P(y|x) × log2(P(y|x))) = 0 – SUM(–0.159, –0.409, –0.307, –0.492) = 1.367

1.367 / log2(n_classes = 4) = 0.684

Parameters

prob_dist (NDArray2D) – a 2D numpy array of real numbers between 0 and 1; each row is a data point and each column shows the probability of that class

Returns

shape(n_rows)

Return type

np.ndarray
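
The arithmetic in the entropy_based example can be reproduced with a few lines of NumPy. This is a sketch of the formula as written in the example, not the library's source; the name entropy_score is hypothetical, and treating 0 × log2(0) as 0 is an assumption made here to avoid NaN values.

```python
import numpy as np


def entropy_score(prob_dist):
    """Shannon entropy of each row, divided by log2(n_classes)."""
    prob_dist = np.asarray(prob_dist, dtype=float)
    n_classes = prob_dist.shape[1]
    # Only take the log where p > 0; other entries stay at 0 (assumption).
    logs = np.log2(prob_dist, out=np.zeros_like(prob_dist), where=prob_dist > 0)
    raw_entropy = -np.sum(prob_dist * logs, axis=1)
    return raw_entropy / np.log2(n_classes)


scores = entropy_score([[0.0321, 0.6439, 0.0871, 0.2369]])
print(scores.round(3))  # [0.684]
```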

pupil.sampling.uncertainty.least_confidence(prob_dist: NDArray2D) → numpy.ndarray

Returns the uncertainty score of an array using least confidence sampling in a 0-1 range where 1 is most uncertain.

Example:

Assumes the probability distribution is a numpy array, like: np.array([[0.0321, 0.6439, 0.0871, 0.2369]]). The result will be (1 – 0.6439) × (4 / 3) = 0.4748

Parameters

prob_dist (NDArray2D) – a 2D numpy array of real numbers between 0 and 1; each row is a data point and each column shows the probability of that class

Returns

shape(n_rows)

Return type

np.ndarray
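
The least_confidence example follows the formula (1 – max probability) × n_classes / (n_classes – 1). A short NumPy sketch of that formula, with the hypothetical name least_confidence_score (this is not the library's source):

```python
import numpy as np


def least_confidence_score(prob_dist):
    """(1 - top probability) rescaled so the maximum possible score is 1."""
    prob_dist = np.asarray(prob_dist, dtype=float)
    n_classes = prob_dist.shape[1]
    top = prob_dist.max(axis=1)
    return (1.0 - top) * n_classes / (n_classes - 1)


scores = least_confidence_score([[0.0321, 0.6439, 0.0871, 0.2369]])
print(scores.round(4))  # [0.4748]
```

The factor n_classes / (n_classes – 1) stretches the score so that a uniform distribution, the most uncertain case, maps to 1.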

pupil.sampling.uncertainty.margin_confidence(prob_dist: NDArray2D) → numpy.ndarray

Returns the uncertainty score of an array using margin of confidence sampling in a 0-1 range where 1 is most uncertain.

Example:

Assumes the probability distribution is a numpy array, like: np.array([[0.0321, 0.6439, 0.0871, 0.2369]]). The result will be 1.0 – (0.6439 – 0.2369) = 0.5930

Parameters

prob_dist (NDArray2D) – a 2D numpy array of real numbers between 0 and 1; each row is a data point and each column shows the probability of that class

Returns

shape(n_rows)

Return type

np.ndarray
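
The margin_confidence example subtracts the gap between the two largest probabilities from 1. A NumPy sketch of that calculation, using the hypothetical name margin_confidence_score (not the library's source):

```python
import numpy as np


def margin_confidence_score(prob_dist):
    """1 minus the difference between the top two probabilities per row."""
    prob_dist = np.asarray(prob_dist, dtype=float)
    # Sort each row ascending; the last two columns hold the top two values.
    ordered = np.sort(prob_dist, axis=1)
    return 1.0 - (ordered[:, -1] - ordered[:, -2])


scores = margin_confidence_score([[0.0321, 0.6439, 0.0871, 0.2369]])
print(scores.round(4))  # [0.593]
```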

pupil.sampling.uncertainty.ratio_confidence(prob_dist: NDArray2D) → numpy.ndarray

Returns the uncertainty score of an array using ratio of confidence sampling in a 0-1 range where 1 is most uncertain.

Example:

Assumes the probability distribution is a numpy array, like: np.array([[0.0321, 0.6439, 0.0871, 0.2369]]). The result will be 0.6439 / 0.2369 ≈ 2.718

Parameters

prob_dist (NDArray2D) – a 2D numpy array of real numbers between 0 and 1; each row is a data point and each column shows the probability of that class

Returns

shape(n_rows)

Return type

np.ndarray
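
The ratio_confidence example divides the largest probability by the second largest, which gives a value above 1 for this input. The sketch below reproduces that arithmetic and also shows the reciprocal, which does fall in the 0-1 range with 1 as most uncertain. Which of the two the library returns is not stated here, so both the name top_two_ratio and the reciprocal step are assumptions.

```python
import numpy as np


def top_two_ratio(prob_dist):
    """Largest probability divided by the second largest, per row."""
    prob_dist = np.asarray(prob_dist, dtype=float)
    # Sort each row ascending; the last two columns hold the top two values.
    ordered = np.sort(prob_dist, axis=1)
    return ordered[:, -1] / ordered[:, -2]


ratio = top_two_ratio([[0.0321, 0.6439, 0.0871, 0.2369]])
print(ratio.round(3))  # [2.718]

# Assumption: the reciprocal maps the ratio into the 0-1 range.
inverse = 1.0 / ratio
print(inverse.round(3))  # [0.368]
```

The sketch does not guard against a second-largest probability of 0, which would make the ratio infinite.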