Don’t use pickle to deserialize objects from this package, as it’s likely to fail. Use cloudpickle or dill instead, which have the same syntax and are able to serialize more types of objects.
Selects a proportion of actions according to an active learning heuristic based on gradients.
Works only for differentiable and preferably smooth functions.
Note
Here, the rounds on which an action is chosen according to the active learning heuristic
are selected at random (just as in Epsilon-Greedy), and the guiding heuristic
is the gradient that the observation, under either label (either weighted by the estimated
probability, or taking the maximum or minimum), would produce on each model that
predicts a class, given the current coefficients for that model. This of course requires
being able to calculate gradients - the package comes with pre-defined gradient functions for
linear and logistic regression, and allows passing custom functions for others.
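As a rough illustration of the kind of custom gradient function that can be passed (see the ‘f_grad_norm’ parameter below, with signature function(base_algorithm, X, pred) -> array(n_samples, 2)), a minimal sketch for un-regularized logistic regression might look as follows; the function name is hypothetical and ‘pred’ is assumed to hold the predicted probabilities of the positive class:

    import numpy as np

    def logistic_grad_norm(base_algorithm, X, pred):
        # Sketch only: 'base_algorithm' is unused here. For log-loss, the
        # gradient of one observation x with label y and predicted probability
        # p is (p - y) * [x, 1] (the trailing 1 accounts for the intercept),
        # so its norm is |p - y| * sqrt(||x||^2 + 1).
        row_norm = np.sqrt(np.einsum("ij,ij->i", X, X) + 1.0)
        out = np.empty((X.shape[0], 2))
        out[:, 0] = np.abs(pred - 0.0) * row_norm  # norm if the label were negative
        out[:, 1] = np.abs(pred - 1.0) * row_norm  # norm if the label were positive
        return out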
Parameters:
base_algorithm (obj) – Base binary classifier for which each sample for each class will be fit.
Will look for, in this order:
A ‘predict_proba’ method with outputs (n_samples, 2), values in [0,1], rows summing to 1
A ‘decision_function’ method with unbounded outputs (n_samples,) to which it will apply a sigmoid function.
A ‘predict’ method with outputs (n_samples,) with values in [0,1].
Can also pass a list with a different (or already-fit) classifier for each arm.
nchoices (int or list-like) – Number of arms/labels to choose from. Can also pass a list, array, or Series with arm names, in which case
the outputs from predict will follow these names and arms can be dropped by name, and new ones added with a
custom name.
f_grad_norm (str ‘auto’ or function(base_algorithm, X, pred) -> array (n_samples, 2)) – Function that calculates the row-wise norm of the gradient from observations in X if their class were
negative (first column) or positive (second column).
Can also use different functions for each arm, in which case it
accepts them as a list of functions with length equal to nchoices.
The option ‘auto’ will only work with scikit-learn’s ‘LogisticRegression’, ‘SGDClassifier’ (log-loss only), and ‘RidgeClassifier’;
with stochQN’s ‘StochasticLogisticRegression’;
and with this package’s ‘LinearRegression’.
case_one_class (str ‘auto’, ‘zero’, None, or function(X, n_pos, n_neg, rng) -> array(n_samples, 2)) – If some arm/choice/class has only rewards of one type, many models will fail to fit, and consequently the gradients
will be undefined. Likewise, if the model has not been fit, the gradient might also be undefined, and this requires a workaround.
If passing ‘None’, will assume that base_algorithm can be fit to
data of only-positive or only-negative class without problems, and that
it can calculate gradients and predictions with a base_algorithm
object that has not been fitted. Be aware that the methods ‘predict’,
‘predict_proba’, and ‘decision_function’ in base_algorithm might be
overwritten with another method that wraps it in a try-catch block, so
don’t rely on it producing errors when unfitted.
If passing a function, will take the output of it as the row-wise
gradient norms when it compares them against other arms/classes, with
the first column having the values if the observations were of negative
class, and the second column if they were of positive class. The other
inputs to this function are the number of positive and negative examples
that have been observed, and a Generator object from NumPy to use
for generating random numbers.
If passing a list, will assume each entry is a function as described
above, to be used with each corresponding arm.
If passing ‘zero’, it will output zero whenever models have not been fitted.
Note that the theoretically correct approach for a logistic regression would
be to assume models with all-zero coefficients, in which case the gradient
is defined in the absence of any data, but this tends to produce bad end
results.
active_choice (str in {‘min’, ‘max’, ‘weighted’}) – How to calculate the gradient that an observation would have on the loss
function for each classifier, given that it could be either class (positive or negative)
for the classifier that predicts each arm. If weighted, they are weighted by the same
probability estimates from the base algorithm.
explore_prob (float (0,1)) – Probability of selecting an action according to active learning criteria.
decay (float (0,1)) – After each prediction, the probability of selecting an arm according to active
learning criteria is set to p = p*decay
beta_prior (str ‘auto’, None, tuple ((a,b), n), or list[tuple((a,b), n)]) – If not ‘None’, when there are less than ‘n’ samples with and without
a reward from a given arm, it will predict the score for that class as a
random number drawn from a beta distribution with the prior
specified by ‘a’ and ‘b’. If set to “auto”, will be calculated as:
beta_prior = ((2/log2(nchoices), 4), 2)
Can also pass different priors per arm, in which case they should be passed
as a list of tuples.
This parameter can have a very large impact on the end results, and it’s
recommended to tune it accordingly - scenarios with low expected reward rates
should have priors that result in drawing small random numbers, whereas
scenarios with large expected reward rates should have stronger priors and
tend towards larger random numbers. Also, the more arms there are, the smaller
the optimal expected value for these random numbers.
Recommended to use only one of beta_prior or smoothing.
smoothing (None, tuple (a,b), or list) – If not None, predictions will be smoothed as yhat_smooth = (yhat*n + a)/(n + b),
where ‘n’ is the number of times each arm was chosen in the training data.
Can also pass it as a list of tuples with different ‘a’ and ‘b’ parameters for each arm
(e.g. if there are arm features, these parameters can be determined through a different model).
Recommended to use only one of beta_prior or smoothing.
noise_to_smooth (bool) – If passing smoothing, whether to add a small amount of random
noise \(\sim Uniform(0, 10^{-12})\) in order to break ties at random instead of
choosing the smallest arm index.
Ignored when passing smoothing=None.
batch_train (bool) – Whether the base algorithm will be fit to the data in batches as it comes (streaming),
or to the whole dataset each time it is refit. Requires a classifier with a
‘partial_fit’ method.
refit_buffer (int or None) – Number of observations per arm to keep as a reserve for passing to
‘partial_fit’. If passing it, up until the moment there are at least this
number of observations for a given arm, that arm will keep the observations
when calling ‘fit’ and ‘partial_fit’, and will translate calls to
‘partial_fit’ to calls to ‘fit’ with the new plus stored observations.
After the reserve number is reached, calls to ‘partial_fit’ will enlarge
the data batch with the stored observations, and old stored observations
will be gradually replaced with the new ones (at random, not on a FIFO
basis). This technique can greatly enhance the performance when fitting
the data in batches, but memory consumption can grow quite large.
If passing sparse CSR matrices as input to ‘fit’ and ‘partial_fit’,
these will be converted to dense once they go into this reserve, and
then converted back to CSR to augment the new data.
Calls to ‘fit’ will override this reserve.
Ignored when passing ‘batch_train=False’.
deep_copy_buffer (bool) – Whether to make deep copies of the data that is stored in the
reserve for refit_buffer. If passing ‘False’, when the reserve is
not yet full, these will only store shallow copies of the data, which
is faster but will not let Python’s garbage collector free memory
after deleting the data, and if the original data is overwritten, so will
this buffer.
Ignored when not using refit_buffer.
assume_unique_reward (bool) – Whether to assume that only one arm has a reward per observation. If set to ‘True’,
whenever an arm receives a reward, the classifiers for all other arms will be
fit to that observation too, having negative label.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a
Generator object for random number generation, a RandomState
object (from NumPy) from which to draw an integer, or a Generator
object (from NumPy), which will be used directly.
While this controls random number generation for this metaheuristic,
there can still be other sources of variations upon re-runs, such as
data aggregations in parallel (e.g. from OpenMP or BLAS functions).
njobs (int or None) – Number of parallel jobs to run. If passing None will set it to 1. If passing -1 will
set it to the number of CPU cores. Note that if the base algorithm is itself parallelized,
this might result in a slowdown as both compete for available threads, so don’t set
parallelization in both. The parallelization uses shared memory, thus you will only
see a speed up if your base classifier releases the Python GIL, and will
otherwise result in slower runs.
arm_name (object) – Name for this arm. Only applicable when using named arms. If None, will use the name of the last
arm plus 1 (will only work when the names are integers).
fitted_classifier (object) – If a classifier has already been fit to rewards coming from this arm, you can pass it here, otherwise,
will be started from the same ‘base_classifier’ as the initial arms. If using bootstrapped methods or methods from this module which do not
accept arbitrary classifiers as input,
don’t pass a classifier here (unless using the classes like e.g. utils._BootstrappedClassifierBase).
If the constructor was called with different base_algorithm per arm, must pass a
base classifier here. Not applicable for the classes that do not take a base_algorithm.
n_w_rew (int) – Number of trials/rounds with rewards coming from this arm (only used when using a beta prior or smoothing).
n_wo_rew (int) – Number of trials/rounds without rewards coming from this arm (only used when using a beta prior or smoothing).
smoothing (None, tuple (a,b), or list) – Smoothing parameters to use for this arm (see documentation of the class constructor
for details). If None and if the smoothing passed to the constructor didn’t have
separate entries per arm, will use the same smoothing as was passed in the constructor.
If no smoothing was passed to the constructor, the smoothing here will be ignored.
Must pass a smoothing here if the constructor was passed a smoothing with different entries per arm.
beta_prior (None or tuple((a,b), n)) – Beta prior to use for this arm. See the class’ documentation for details.
Must be passed if the constructor was provided different beta priors per arm.
If None and the constructor had a single beta_prior, will use that same
beta_prior for this new arm.
Note that n_w_rew and n_wo_rew will be counted towards the threshold ‘n’
in here.
Cannot be passed if the constructor did not have a beta_prior.
refit_buffer_X (array(m, n) or None) – Refit buffer of ‘X’ data to use for the new arm. Ignored when using
‘batch_train=False’ or ‘refit_buffer=None’.
refit_buffer_r (array(m,) or None) – Refit buffer of rewards data to use for the new arm. Ignored when using
‘batch_train=False’ or ‘refit_buffer=None’.
f_grad_norm (function) – Gradient calculation function to use for this arm. This is only
for the policies that make choices according to active learning
criteria, and only for situations in which the policy was passed
different functions for each arm.
case_one_class (function) – Gradient workaround function for single-class data. This is only
for the policies that make choices according to active learning
criteria, and only for situations in which the policy was passed
different functions for each arm.
Drops (removes/deletes) an arm from the set of choices available to the policy.
Note
The available arms, if named, are stored in attribute ‘choice_names’.
Parameters:
arm_name (int or object) – Arm to drop. If passing an integer, will drop at that index (starting at zero). Otherwise,
will drop the arm matching this name (argument must be of the same type as the individual entries
passed to ‘nchoices’ in the initialization).
Fits the base algorithm (one per class [and per sample if bootstrapped]) to partially labeled data.
Parameters:
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
continue_from_last (bool) – If the policy was previously fit to data, whether to assume that
this new call to ‘fit’ will continue from the exact same dataset as before
plus new rows appended at the end of ‘X’, ‘a’, ‘r’. In this case,
will only refit the models that have new data according to ‘a’.
Note that the bootstrapped policies will still benefit from extra refits.
This option should not be used when there are calls to ‘partial_fit’ between
calls to fit.
Ignored if using assume_unique_reward=True.
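A minimal end-to-end sketch of ‘fit’ followed by ‘predict’ on synthetic data; the import path ‘contextualbandits.online’ is an assumption about where this class lives, and the data here is purely illustrative:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from contextualbandits.online import ActiveExplorer  # assumed import path

    rng = np.random.default_rng(0)
    nchoices = 5
    X = rng.normal(size=(1000, 10))                  # covariates
    a = rng.integers(nchoices, size=1000)            # arms that were chosen
    r = (rng.random(size=1000) < 0.3).astype(int)    # observed binary rewards

    policy = ActiveExplorer(LogisticRegression(), nchoices=nchoices)
    policy.fit(X, a, r)                              # one classifier per arm
    actions = policy.predict(X[:10])                 # actions for new observations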
Fits the base algorithm (one per class) to partially labeled data in batches.
Note
In order to use this method, the base classifier must have a ‘partial_fit’ method,
such as ‘sklearn.linear_model.SGDClassifier’. This method is not available
for ‘LogisticUCB’, ‘LogisticTS’, ‘PartitionedUCB’, ‘PartitionedTS’.
Parameters:
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
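For incremental fitting, a sketch along the following lines could be used, reusing the synthetic X, a, r from the previous example; the import path is again an assumption, and the base classifier must expose ‘partial_fit’:

    from sklearn.linear_model import SGDClassifier
    from contextualbandits.online import ActiveExplorer  # assumed import path

    online_policy = ActiveExplorer(
        SGDClassifier(loss="log_loss"),  # use loss="log" on older scikit-learn versions
        nchoices=nchoices,
        batch_train=True,                # required in order to use 'partial_fit'
        refit_buffer=50,                 # keep up to 50 observations per arm in reserve
    )
    online_policy.partial_fit(X[:100], a[:100], r[:100])           # first batch
    online_policy.partial_fit(X[100:200], a[100:200], r[100:200])  # next batch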
Selects actions according to this policy for new data.
Parameters:
X (array (n_samples, n_features)) – New observations for which to choose an action according to this policy.
exploit (bool) – Whether to make a prediction according to the policy, or to just choose the
arm with the highest expected reward according to current models.
output_score (bool) – Whether to output the score that this method predicted, in case it is desired to use
it with this package’s offpolicy and evaluation modules.
Returns:
pred – Actions chosen by the policy. If passing output_score=True, it will be a dictionary
with the chosen arm and the score that the arm got following this policy with the classifiers used.
Return type:
array (n_samples,) or dict(“choice” : array(n_samples,), “score” : array(n_samples,))
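A short sketch of the documented output_score behaviour, assuming ‘policy’ has already been fit as in the earlier example:

    res = policy.predict(X[:5], output_score=True)
    chosen = res["choice"]   # array (5,) with the selected arm per row
    scores = res["score"]    # array (5,) with the score behind each choice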
active_choice (str in {‘min’, ‘max’, ‘weighted’}) – How to calculate the gradient that an observation would have on the loss
function for each classifier, given that it could be either class (positive or negative)
for the classifier that predicts each arm. If weighted, they are weighted by the same
probability estimates from the base algorithm.
This method will rank choices/arms according to what the policy
dictates - it is not an exploitation-mode rank, so if e.g. there are
random choices for some observations, there will be random ranks in here.
Parameters:
X (array (n_samples, n_features)) – New observations for which to rank actions according to this policy.
n (int) – Number of top-ranked actions to output
Returns:
topN – The top-ranked actions for each observation
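A sketch of the ranking behaviour described above; the method name ‘topN’ is an assumption taken from the name of the return value, and ‘policy’ is the fitted object from the earlier example:

    top3 = policy.topN(X[:5], n=3)   # array (5, 3): the 3 top-ranked arms per observation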
Takes the action with the highest estimated reward, unless that estimate falls below a certain
threshold, in which case it takes an action either at random or according to an active learning
heuristic (in the same way as ActiveExplorer).
Note
The hyperparameters here can make a large impact on the quality of the choices. Be sure
to tune the threshold (or percentile), decay, and prior (or smoothing parameters).
Note
The threshold for the reward probabilities can be set to a hard-coded number, or
to be calculated dynamically by keeping track of the predictions it makes, and taking
a fixed percentile of that distribution to be the threshold.
In the second case, these are calculated in separate batches rather than in a sliding window.
It can also be set to make choices in the same way as
‘ActiveExplorer’ rather than at random (see the ‘active_choice’ parameter).
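A hedged construction sketch with a dynamic, percentile-based threshold; the import path is assumed and the hyperparameter values are illustrative only (see the parameters below):

    from sklearn.linear_model import LogisticRegression
    from contextualbandits.online import AdaptiveGreedy  # assumed import path

    policy = AdaptiveGreedy(
        LogisticRegression(),
        nchoices=5,
        window_size=500,          # recompute the threshold every 500 predictions
        percentile=35,            # threshold = 35th percentile of recent predictions
        decay=0.9998,             # multiplicative decay applied after each prediction
        decay_type="percentile",  # decay the percentile rather than the threshold itself
    )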
Parameters:
base_algorithm (obj) – Base binary classifier for which each sample for each class will be fit.
Will look for, in this order:
A ‘predict_proba’ method with outputs (n_samples, 2), values in [0,1], rows summing to 1
A ‘decision_function’ method with unbounded outputs (n_samples,) to which it will apply a sigmoid function.
A ‘predict’ method with outputs (n_samples,) with values in [0,1].
Can also pass a list with a different (or already-fit) classifier for each arm.
nchoices (int or list-like) – Number of arms/labels to choose from. Can also pass a list, array, or Series with arm names, in which case
the outputs from predict will follow these names and arms can be dropped by name, and new ones added with a
custom name.
window_size (int) – Number of predictions after which the threshold will be updated to the desired percentile.
percentile (int in [0,100] or None) – Percentile of the predictions sample to set as threshold, below which actions are random.
If None, will not take percentiles; will instead use the initial threshold and apply decay to it.
decay (float (0,1) or None) –
After each prediction, either the threshold or the percentile gets adjusted to:
val_{t+1} = val_t*decay
decay_type (str, either ‘percentile’ or ‘threshold’) – Whether to decay the threshold itself or the percentile of the predictions to take after
each prediction. Ignored when using ‘decay=None’. If passing ‘percentile=None’ and ‘decay_type=percentile’,
will be forced to ‘threshold’.
initial_thr (str ‘auto’ or float (0,1)) – Initial threshold for the prediction below which a random action is taken.
If set to ‘auto’, will be calculated as initial_thr = 1 / (2 * sqrt(nchoices)).
Note that if ‘base_algorithm’ has a ‘decision_function’ method, it will first apply a sigmoid function to the
output, and then compare it to the threshold, so the threshold should lie between zero and one.
beta_prior (str ‘auto’, None, tuple ((a,b), n), or list[tuple((a,b), n)]) – If not ‘None’, when there are less than ‘n’ samples with and without
a reward from a given arm, it will predict the score for that class as a
random number drawn from a beta distribution with the prior
specified by ‘a’ and ‘b’. If set to “auto”, will be calculated as:
beta_prior = ((3/nchoices, 4), 2)
Can also pass different priors per arm, in which case they should be passed
as a list of tuples.
This parameter can have a very large impact on the end results, and it’s
recommended to tune it accordingly - scenarios with low expected reward rates
should have priors that result in drawing small random numbers, whereas
scenarios with large expected reward rates should have stronger priors and
tend towards larger random numbers. Also, the more arms there are, the smaller
the optimal expected value for these random numbers.
Note that the default value for AdaptiveGreedy is different from that of the
other methods in this module, and it’s recommended to experiment with different
values of this hyperparameter.
Recommended to use only one of beta_prior or smoothing.
smoothing (None, tuple (a,b), or list) – If not None, predictions will be smoothed as yhat_smooth = (yhat*n + a)/(n + b),
where ‘n’ is the number of times each arm was chosen in the training data.
Can also pass it as a list of tuples with different ‘a’ and ‘b’ parameters for each arm
(e.g. if there are arm features, these parameters can be determined through a different model).
This will not work well with non-probabilistic classifiers such as SVM, in which case you might
want to define a class that embeds it with some recalibration built-in.
Recommended to use only one of beta_prior or smoothing.
noise_to_smooth (bool) – If passing smoothing, whether to add a small amount of random
noise \(\sim Uniform(0, 10^{-12})\) in order to break ties at random instead of
choosing the smallest arm index.
Ignored when passing smoothing=None.
batch_train (bool) – Whether the base algorithm will be fit to the data in batches as it comes (streaming),
or to the whole dataset each time it is refit. Requires a classifier with a
‘partial_fit’ method.
refit_buffer (int or None) – Number of observations per arm to keep as a reserve for passing to
‘partial_fit’. If passing it, up until the moment there are at least this
number of observations for a given arm, that arm will keep the observations
when calling ‘fit’ and ‘partial_fit’, and will translate calls to
‘partial_fit’ to calls to ‘fit’ with the new plus stored observations.
After the reserve number is reached, calls to ‘partial_fit’ will enlarge
the data batch with the stored observations, and old stored observations
will be gradually replaced with the new ones (at random, not on a FIFO
basis). This technique can greatly enhance the performance when fitting
the data in batches, but memory consumption can grow quite large.
If passing sparse CSR matrices as input to ‘fit’ and ‘partial_fit’,
these will be converted to dense once they go into this reserve, and
then converted back to CSR to augment the new data.
Calls to ‘fit’ will override this reserve.
Ignored when passing ‘batch_train=False’.
deep_copy_buffer (bool) – Whether to make deep copies of the data that is stored in the
reserve for refit_buffer. If passing ‘False’, when the reserve is
not yet full, these will only store shallow copies of the data, which
is faster but will not let Python’s garbage collector free memory
after deleting the data, and if the original data is overwritten, so will
this buffer.
Ignored when not using refit_buffer.
assume_unique_reward (bool) – Whether to assume that only one arm has a reward per observation. If set to ‘True’,
whenever an arm receives a reward, the classifiers for all other arms will be
fit to that observation too, having negative label.
active_choice (None or str in {‘min’, ‘max’, ‘weighted’}) – How to select arms when predictions are below the threshold. If passing None, selects them at random (default).
If passing ‘min’, ‘max’ or ‘weighted’, selects them in the same way as ‘ActiveExplorer’.
Non-random active selection requires being able to calculate gradients; gradients for logistic regression and for this package’s linear regression
are already defined through the ‘auto’ option below.
f_grad_norm (None, str ‘auto’, list, or function(base_algorithm, X, pred) -> array (n_samples, 2)) – (When passing active_choice)
Function that calculates the row-wise norm of the gradient from observations in X if their class were
negative (first column) or positive (second column).
Can also use different functions for each arm, in which case it
accepts them as a list of functions with length equal to nchoices.
The option ‘auto’ will only work with scikit-learn’s ‘LogisticRegression’, ‘SGDClassifier’, and ‘RidgeClassifier’;
with stochQN’s ‘StochasticLogisticRegression’;
and with this package’s ‘LinearRegression’.
case_one_class (str ‘auto’, ‘zero’, None, list, or function(X, n_pos, n_neg, rng) -> array(n_samples, 2)) – (When passing active_choice)
If some arm/choice/class has only rewards of one type, many models will fail to fit, and consequently the gradients
will be undefined. Likewise, if the model has not been fit, the gradient might also be undefined, and this requires a workaround.
If passing ‘None’, will assume that base_algorithm can be fit to
data of only-positive or only-negative class without problems, and that
it can calculate gradients and predictions with a base_algorithm
object that has not been fitted. Be aware that the methods ‘predict’,
‘predict_proba’, and ‘decision_function’ in base_algorithm might be
overwritten with another method that wraps it in a try-catch block, so
don’t rely on it producing errors when unfitted.
If passing a function, will take the output of it as the row-wise
gradient norms when it compares them against other arms/classes, with
the first column having the values if the observations were of negative
class, and the second column if they were of positive class. The other inputs to this
function (signature described above) are the number of positive and negative examples
that have been observed, and a Generator object from NumPy to use
for generating random numbers.
If passing a list, will assume each entry is a function as described
above, to be used with each corresponding arm.
If passing ‘zero’, it will output zero whenever models have not been fitted.
Note that the theoretically correct approach for a logistic regression would
be to assume models with all-zero coefficients, in which case the gradient
is defined in the absence of any data, but this tends to produce bad end
results.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a
Generator object for random number generation, a RandomState
object (from NumPy) from which to draw an integer, or a Generator
object (from NumPy), which will be used directly.
While this controls random number generation for this metaheuristic,
there can still be other sources of variations upon re-runs, such as
data aggregations in parallel (e.g. from OpenMP or BLAS functions).
njobs (int or None) – Number of parallel jobs to run. If passing None will set it to 1. If passing -1 will
set it to the number of CPU cores. Note that if the base algorithm is itself parallelized,
this might result in a slowdown as both compete for available threads, so don’t set
parallelization in both. The parallelization uses shared memory, thus you will only
see a speed up if your base classifier releases the Python GIL, and will
otherwise result in slower runs.
arm_name (object) – Name for this arm. Only applicable when using named arms. If None, will use the name of the last
arm plus 1 (will only work when the names are integers).
fitted_classifier (object) – If a classifier has already been fit to rewards coming from this arm, you can pass it here, otherwise,
will be started from the same ‘base_classifier’ as the initial arms. If using bootstrapped methods or methods from this module which do not
accept arbitrary classifiers as input,
don’t pass a classifier here (unless using the classes like e.g. utils._BootstrappedClassifierBase).
If the constructor was called with different base_algorithm per arm, must pass a
base classifier here. Not applicable for the classes that do not take a base_algorithm.
n_w_rew (int) – Number of trials/rounds with rewards coming from this arm (only used when using a beta prior or smoothing).
n_wo_rew (int) – Number of trials/rounds without rewards coming from this arm (only used when using a beta prior or smoothing).
smoothing (None, tuple (a,b), or list) – Smoothing parameters to use for this arm (see documentation of the class constructor
for details). If None and if the smoothing passed to the constructor didn’t have
separate entries per arm, will use the same smoothing as was passed in the constructor.
If no smoothing was passed to the constructor, the smoothing here will be ignored.
Must pass a smoothing here if the constructor was passed a smoothing with different entries per arm.
beta_prior (None or tuple((a,b), n)) – Beta prior to use for this arm. See the class’ documentation for details.
Must be passed if the constructor was provided different beta priors per arm.
If None and the constructor had a single beta_prior, will use that same
beta_prior for this new arm.
Note that n_w_rew and n_wo_rew will be counted towards the threshold ‘n’
in here.
Cannot be passed if the constructor did not have a beta_prior.
refit_buffer_X (array(m, n) or None) – Refit buffer of ‘X’ data to use for the new arm. Ignored when using
‘batch_train=False’ or ‘refit_buffer=None’.
refit_buffer_r (array(m,) or None) – Refit buffer of rewards data to use for the new arm. Ignored when using
‘batch_train=False’ or ‘refit_buffer=None’.
f_grad_norm (function) – Gradient calculation function to use for this arm. This is only
for the policies that make choices according to active learning
criteria, and only for situations in which the policy was passed
different functions for each arm.
case_one_class (function) – Gradient workaround function for single-class data. This is only
for the policies that make choices according to active learning
criteria, and only for situations in which the policy was passed
different functions for each arm.
Drops (removes/deletes) an arm from the set of choices available to the policy.
Note
The available arms, if named, are stored in attribute ‘choice_names’.
Parameters:
arm_name (int or object) – Arm to drop. If passing an integer, will drop at that index (starting at zero). Otherwise,
will drop the arm matching this name (argument must be of the same type as the individual entries
passed to ‘nchoices’ in the initialization).
Fits the base algorithm (one per class [and per sample if bootstrapped]) to partially labeled data.
Parameters:
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
continue_from_last (bool) – If the policy was previously fit to data, whether to assume that
this new call to ‘fit’ will continue from the exact same dataset as before
plus new rows appended at the end of ‘X’, ‘a’, ‘r’. In this case,
will only refit the models that have new data according to ‘a’.
Note that the bootstrapped policies will still benefit from extra refits.
This option should not be used when there are calls to ‘partial_fit’ between
calls to fit.
Ignored if using assume_unique_reward=True.
Fits the base algorithm (one per class) to partially labeled data in batches.
Note
In order to use this method, the base classifier must have a ‘partial_fit’ method,
such as ‘sklearn.linear_model.SGDClassifier’. This method is not available
for ‘LogisticUCB’, ‘LogisticTS’, ‘PartitionedUCB’, ‘PartitionedTS’.
Parameters:
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
Selects actions according to this policy for new data.
Parameters:
X (array (n_samples, n_features)) – New observations for which to choose an action according to this policy.
exploit (bool) – Whether to make a prediction according to the policy, or to just choose the
arm with the highest expected reward according to current models.
active_choice (str in {‘min’, ‘max’, ‘weighted’}) – How to calculate the gradient that an observation would have on the loss
function for each classifier, given that it could be either class (positive or negative)
for the classifier that predicts each arm. If weighted, they are weighted by the same
probability estimates from the base algorithm.
threshold (float or “auto”) – New threshold to use. If passing “auto”, will set it
to 1.5/nchoices. Note that this threshold will still be
decayed if the object was initialized with decay_type="threshold",
and will still be updated if initialized with percentile!=None.
This method will rank choices/arms according to what the policy
dictates - it is not an exploitation-mode rank, so if e.g. there are
random choices for some observations, there will be random ranks in here.
Parameters:
X (array (n_samples, n_features)) – New observations for which to rank actions according to this policy.
n (int) – Number of top-ranked actions to output
Returns:
topN – The top-ranked actions for each observation
Performs Thompson Sampling by fitting several models per class on bootstrapped samples,
then makes predictions by taking one of them at random for each class.
Note
When fitting the algorithm to data in batches (online), it’s not possible to take an
exact bootstrapped sample, as the sample is not known in advance. In theory, as the sample size
grows to infinity, the number of times that an observation appears in a bootstrapped sample is
distributed \(\sim Poisson(1)\). However, assigning random gamma-distributed weights to observations
produces a more stable effect, so it also has the option to assign weights randomly \(\sim Gamma(1,1)\).
Note
If you plan to make only one call to ‘predict’ between calls to ‘fit’ and have
sample_unique=False, you can pass nsamples=1 without losing any precision.
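A minimal construction sketch for this Thompson-sampling policy; the class name ‘BootstrappedTS’ and the import path are assumptions based on the description above, and the hyperparameter values are illustrative:

    from sklearn.linear_model import LogisticRegression
    from contextualbandits.online import BootstrappedTS  # assumed name and import path

    policy = BootstrappedTS(
        LogisticRegression(),
        nchoices=5,
        nsamples=10,           # 10 bootstrap resamples (classifiers) per arm
        sample_unique=False,   # reuse one resample per arm within a 'predict' call
    )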
Parameters:
base_algorithm (obj) – Base binary classifier for which each sample for each class will be fit.
Will look for, in this order:
A ‘predict_proba’ method with outputs (n_samples, 2), values in [0,1], rows summing to 1
A ‘decision_function’ method with unbounded outputs (n_samples,) to which it will apply a sigmoid function.
A ‘predict’ method with outputs (n_samples,) with values in [0,1].
Can also pass a list with a different (or already-fit) classifier for each arm.
nchoices (int or list-like) – Number of arms/labels to choose from. Can also pass a list, array, or Series with arm names, in which case
the outputs from predict will follow these names and arms can be dropped by name, and new ones added with a
custom name.
nsamples (int) – Number of bootstrapped samples per class to take.
beta_prior (str ‘auto’, None, tuple ((a,b), n), or list[tuple((a,b), n)]) – If not ‘None’, when there are less than ‘n’ samples with and without
a reward from a given arm, it will predict the score for that class as a
random number drawn from a beta distribution with the prior
specified by ‘a’ and ‘b’. If set to “auto”, will be calculated as:
beta_prior = ((2/log2(nchoices), 4), 2)
Can also pass different priors per arm, in which case they should be passed
as a list of tuples.
This parameter can have a very large impact on the end results, and it’s
recommended to tune it accordingly - scenarios with low expected reward rates
should have priors that result in drawing small random numbers, whereas
scenarios with large expected reward rates should have stronger priors and
tend towards larger random numbers. Also, the more arms there are, the smaller
the optimal expected value for these random numbers.
Recommended to use only one of beta_prior or smoothing.
smoothing (None, tuple (a,b), or list) – If not None, predictions will be smoothed as yhat_smooth = (yhat*n + a)/(n + b),
where ‘n’ is the number of times each arm was chosen in the training data.
Can also pass it as a list of tuples with different ‘a’ and ‘b’ parameters for each arm
(e.g. if there are arm features, these parameters can be determined through a different model).
This will not work well with non-probabilistic classifiers such as SVM, in which case you might
want to define a class that embeds it with some recalibration built-in.
Recommended to use only one of beta_prior or smoothing.
noise_to_smooth (bool) – If passing smoothing, whether to add a small amount of random
noise \(\sim Uniform(0, 10^{-12})\) in order to break ties at random instead of
choosing the smallest arm index.
Ignored when passing smoothing=None.
sample_unique (bool) – Whether to use a different bootstrapped classifier per row at each arm when
calling ‘predict’. If passing ‘False’, will take the same bootstrapped
classifier within an arm for all the rows passed in a single call to ‘predict’.
Passing ‘False’ is a faster alternative, but the theoretically correct way
is using a different one per row.
Forced to ‘True’ when passing sample_weighted=True.
sample_weighted (bool) – Whether to take a weighted average from the predictions from each bootstrapped
classifier at a given arm, with random weights. This will make the predictions
more variable (i.e. more randomness in exploration). The alternative (and
default) is to take a prediction from a single classifier each time.
batch_train (bool) – Whether the base algorithm will be fit to the data in batches as it comes (streaming),
or to the whole dataset each time it is refit. Requires a classifier with a
‘partial_fit’ method.
refit_buffer (int or None) – Number of observations per arm to keep as a reserve for passing to
‘partial_fit’. If passing it, up until the moment there are at least this
number of observations for a given arm, that arm will keep the observations
when calling ‘fit’ and ‘partial_fit’, and will translate calls to
‘partial_fit’ to calls to ‘fit’ with the new plus stored observations.
After the reserve number is reached, calls to ‘partial_fit’ will enlarge
the data batch with the stored observations, and old stored observations
will be gradually replaced with the new ones (at random, not on a FIFO
basis). This technique can greatly enhance the performance when fitting
the data in batches, but memory consumption can grow quite large.
If passing sparse CSR matrices as input to ‘fit’ and ‘partial_fit’,
these will be converted to dense once they go into this reserve, and
then converted back to CSR to augment the new data.
Calls to ‘fit’ will override this reserve.
Ignored when passing ‘batch_train=False’.
deep_copy_buffer (bool) – Whether to make deep copies of the data that is stored in the
reserve for refit_buffer. If passing ‘False’, when the reserve is
not yet full, these will only store shallow copies of the data, which
is faster but will not let Python’s garbage collector free memory
after deleting the data, and if the original data is overwritten, so will
this buffer.
Ignored when not using refit_buffer.
assume_unique_reward (bool) – Whether to assume that only one arm has a reward per observation. If set to ‘True’,
whenever an arm receives a reward, the classifiers for all other arms will be
fit to that observation too, having negative label.
batch_sample_method (str, either ‘gamma’ or ‘poisson’) – How to simulate bootstrapped samples when training in batch mode (online).
See Note.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a
Generator object for random number generation, a RandomState
object (from NumPy) from which to draw an integer, or a Generator
object (from NumPy), which will be used directly.
While this controls random number generation for this metaheuristic,
there can still be other sources of variations upon re-runs, such as
data aggregations in parallel (e.g. from OpenMP or BLAS functions).
njobs_arms (int or None) – Number of parallel jobs to run (for dividing work across arms). If passing None will set it to 1.
If passing -1 will set it to the number of CPU cores. Note that if the base algorithm is itself
parallelized, this might result in a slowdown as both compete for available threads, so don’t set
parallelization in both. The total number of parallel jobs will be njobs_arms * njobs_samples.
The parallelization uses shared memory, thus you will only
see a speed up if your base classifier releases the Python GIL, and will
otherwise result in slower runs.
njobs_samples (int or None) – Number of parallel jobs to run (for dividing work across samples within one arm). If passing None
will set it to 1. If passing -1 will set it to the number of CPU cores. The total number of parallel
jobs will be njobs_arms * njobs_samples.
The parallelization uses shared memory, thus you will only
see a speed up if your base classifier releases the Python GIL, and will
otherwise result in slower runs.
arm_name (object) – Name for this arm. Only applicable when using named arms. If None, will use the name of the last
arm plus 1 (will only work when the names are integers).
fitted_classifier (object) – If a classifier has already been fit to rewards coming from this arm, you can pass it here, otherwise,
will be started from the same ‘base_classifier’ as the initial arms. If using bootstrapped methods or methods from this module which do not
accept arbitrary classifiers as input,
don’t pass a classifier here (unless using the classes like e.g. utils._BootstrappedClassifierBase).
If the constructor was called with different base_algorithm per arm, must pass a
base classifier here. Not applicable for the classes that do not take a base_algorithm.
n_w_rew (int) – Number of trials/rounds with rewards coming from this arm (only used when using a beta prior or smoothing).
n_wo_rew (int) – Number of trials/rounds without rewards coming from this arm (only used when using a beta prior or smoothing).
smoothing (None, tuple (a,b), or list) – Smoothing parameters to use for this arm (see documentation of the class constructor
for details). If None and if the smoothing passed to the constructor didn’t have
separate entries per arm, will use the same smoothing as was passed in the constructor.
If no smoothing was passed to the constructor, the smoothing here will be ignored.
Must pass a smoothing here if the constructor was passed a smoothing with different entries per arm.
beta_prior (None or tuple((a,b), n)) – Beta prior to use for this arm. See the class’ documentation for details.
Must be passed if the constructor was provided different beta priors per arm.
If None and the constructor had a single beta_prior, will use that same
beta_prior for this new arm.
Note that n_w_rew and n_wo_rew will be counted towards the threshold ‘n’
in here.
Cannot be passed if the constructor did not have a beta_prior.
refit_buffer_X (array(m, n) or None) – Refit buffer of ‘X’ data to use for the new arm. Ignored when using
‘batch_train=False’ or ‘refit_buffer=None’.
refit_buffer_r (array(m,) or None) – Refit buffer of rewards data to use for the new arm. Ignored when using
‘batch_train=False’ or ‘refit_buffer=None’.
f_grad_norm (function) – Gradient calculation function to use for this arm. This is only
for the policies that make choices according to active learning
criteria, and only for situations in which the policy was passed
different functions for each arm.
case_one_class (function) – Gradient workaround function for single-class data. This is only
for the policies that make choices according to active learning
criteria, and only for situations in which the policy was passed
different functions for each arm.
Drops (removes/deletes) an arm from the set of choices available to the policy.
Note
The available arms, if named, are stored in attribute ‘choice_names’.
Parameters:
arm_name (int or object) – Arm to drop. If passing an integer, will drop at that index (starting at zero). Otherwise,
will drop the arm matching this name (argument must be of the same type as the individual entries
passed to ‘nchoices’ in the initialization).
Fits the base algorithm (one per class [and per sample if bootstrapped]) to partially labeled data.
Parameters:
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
continue_from_last (bool) – If the policy was previously fit to data, whether to assume that
this new call to ‘fit’ will continue from the exact same dataset as before
plus new rows appended at the end of ‘X’, ‘a’, ‘r’. In this case,
will only refit the models that have new data according to ‘a’.
Note that the bootstrapped policies will still benefit from extra refits.
This option should not be used when there are calls to ‘partial_fit’ between
calls to fit.
Ignored if using assume_unique_reward=True.
Fits the base algorithm (one per class) to partially labeled data in batches.
Note
In order to use this method, the base classifier must have a ‘partial_fit’ method,
such as ‘sklearn.linear_model.SGDClassifier’. This method is not available
for ‘LogisticUCB’, ‘LogisticTS’, ‘PartitionedUCB’, ‘PartitionedTS’.
Parameters:
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
Selects actions according to this policy for new data.
Parameters:
X (array (n_samples, n_features)) – New observations for which to choose an action according to this policy.
exploit (bool) – Whether to make a prediction according to the policy, or to just choose the
arm with the highest expected reward according to current models.
output_score (bool) – Whether to output the score that this method predicted, in case it is desired to use
it with this package’s offpolicy and evaluation modules.
Returns:
pred – Actions chosen by the policy. If passing output_score=True, it will be a dictionary
with the chosen arm and the score that the arm got following this policy with the classifiers used.
Return type:
array (n_samples,) or dict(“choice” : array(n_samples,), “score” : array(n_samples,))
This method will rank choices/arms according to what the policy
dictates - it is not an exploitation-mode rank, so if e.g. there are
random choices for some observations, there will be random ranks in here.
Parameters:
X (array (n_samples, n_features)) – New observations for which to rank actions according to this policy.
n (int) – Number of top-ranked actions to output
Returns:
topN – The top-ranked actions for each observation
Obtains an upper confidence bound by taking the percentile of the predictions from a
set of classifiers, all fit with different bootstrapped samples (multiple samples per arm).
Note
When fitting the algorithm to data in batches (online), it’s not possible to take an
exact bootstrapped sample, as the sample is not known in advance. In theory, as the sample size
grows to infinity, the number of times that an observation appears in a bootstrapped sample is
distributed \(\sim Poisson(1)\). However, assigning random gamma-distributed weights to observations
produces a more stable effect, so it also has the option to assign weights randomly \(\sim Gamma(1,1)\).
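The two resampling schemes mentioned in the note can be illustrated with plain NumPy draws; this is only meant to show the distributions involved, not the package’s internal implementation:

    import numpy as np

    rng = np.random.default_rng(123)
    batch_size = 8
    # How many times each row of a batch would appear in an approximate
    # bootstrap resample (a Poisson(1) draw is zero roughly 37% of the time,
    # so some observations are dropped entirely):
    poisson_counts = rng.poisson(lam=1.0, size=batch_size)
    # Continuous Gamma(1,1) sample weights with the same mean, which keep
    # every observation and, per the note, tend to behave more stably:
    gamma_weights = rng.gamma(shape=1.0, scale=1.0, size=batch_size)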
Parameters:
base_algorithm (obj or list) – Base binary classifier for which each sample for each class will be fit.
Will look for, in this order:
A ‘predict_proba’ method with outputs (n_samples, 2), values in [0,1], rows summing to 1
A ‘decision_function’ method with unbounded outputs (n_samples,) to which it will apply a sigmoid function.
A ‘predict’ method with outputs (n_samples,) with values in [0,1].
Can also pass a list with a different (or already-fit) classifier for each arm.
nchoices (int or list-like) – Number of arms/labels to choose from. Can also pass a list, array, or Series with arm names, in which case
the outputs from predict will follow these names and arms can be dropped by name, and new ones added with a
custom name.
nsamples (int) – Number of bootstrapped samples per class to take.
percentile (int [0,100]) – Percentile of the predictions sample to take
beta_prior (str ‘auto’, None, tuple ((a,b), n), or list[tuple((a,b), n)]) – If not ‘None’, when there are less than ‘n’ samples with and without
a reward from a given arm, it will predict the score for that class as a
random number drawn from a beta distribution with the prior
specified by ‘a’ and ‘b’. If set to “auto”, will be calculated as:
beta_prior = ((3/log2(nchoices), 4), 2)
Can also pass different priors per arm, in which case they should be passed
as a list of tuples.
Note that it will only generate one random number per arm, so the ‘a’
parameter should be higher than for other methods.
This parameter can have a very large impact on the end results, and it’s
recommended to tune it accordingly - scenarios with low expected reward rates
should have priors that result in drawing small random numbers, whereas
scenarios with large expected reward rates should have stronger priors and
tend towards larger random numbers. Also, the more arms there are, the smaller
the optimal expected value for these random numbers.
Recommended to use only one of beta_prior or smoothing.
smoothing (None, tuple (a,b), or list) – If not None, predictions will be smoothed as yhat_smooth = (yhat*n + a)/(n + b),
where ‘n’ is the number of times each arm was chosen in the training data.
Can also pass it as a list of tuples with different ‘a’ and ‘b’ parameters for each arm
(e.g. if there are arm features, these parameters can be determined through a different model).
This will not work well with non-probabilistic classifiers such as SVM, in which case you might
want to define a class that embeds it with some recalibration built-in.
Recommended to use only one of beta_prior or smoothing.
noise_to_smooth (bool) – If passing smoothing, whether to add a small amount of random
noise \(\sim Uniform(0, 10^{-12})\) in order to break ties at random instead of
choosing the smallest arm index.
Ignored when passing smoothing=None.
batch_train (bool) – Whether the base algorithm will be fit to the data in batches as it comes (streaming),
or to the whole dataset each time it is refit. Requires a classifier with a
‘partial_fit’ method.
refit_buffer (int or None) – Number of observations per arm to keep as a reserve for passing to
‘partial_fit’. If passing it, up until the moment there are at least this
number of observations for a given arm, that arm will keep the observations
when calling ‘fit’ and ‘partial_fit’, and will translate calls to
‘partial_fit’ to calls to ‘fit’ with the new plus stored observations.
After the reserve number is reached, calls to ‘partial_fit’ will enlarge
the data batch with the stored observations, and old stored observations
will be gradually replaced with the new ones (at random, not on a FIFO
basis). This technique can greatly enhance the performance when fitting
the data in batches, but memory consumption can grow quite large.
If passing sparse CSR matrices as input to ‘fit’ and ‘partial_fit’,
these will be converted to dense once they go into this reserve, and
then converted back to CSR to augment the new data.
Calls to ‘fit’ will override this reserve.
Ignored when passing ‘batch_train=False’.
deep_copy_buffer (bool) – Whether to make deep copies of the data that is stored in the
reserve for refit_buffer. If passing ‘False’, when the reserve is
not yet full, these will only store shallow copies of the data, which
is faster but will not let Python’s garbage collector free memory
after deleting the data, and if the original data is overwritten, so will
this buffer.
Ignored when not using refit_buffer.
assume_unique_reward (bool) – Whether to assume that only one arm has a reward per observation. If set to ‘True’,
whenever an arm receives a reward, the classifiers for all other arms will be
fit to that observation too, having negative label.
batch_sample_method (str, either ‘gamma’ or ‘poisson’) – How to simulate bootstrapped samples when training in batch mode (online).
See Note.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a
Generator object for random number generation, a RandomState
object (from NumPy) from which to draw an integer, or a Generator
object (from NumPy), which will be used directly.
While this controls random number generation for this metaheuristic,
there can still be other sources of variations upon re-runs, such as
data aggregations in parallel (e.g. from OpenMP or BLAS functions).
njobs_arms (int or None) – Number of parallel jobs to run (for dividing work across arms). If passing None will set it to 1.
If passing -1 will set it to the number of CPU cores. Note that if the base algorithm is itself
parallelized, this might result in a slowdown as both compete for available threads, so don’t set
parallelization in both. The total number of parallel jobs will be njobs_arms * njobs_samples. The parallelization uses shared memory, thus you will only
see a speed up if your base classifier releases the Python GIL, and will
otherwise result in slower runs.
njobs_samples (int or None) – Number of parallel jobs to run (for dividing work across samples within one arm). If passing None
will set it to 1. If passing -1 will set it to the number of CPU cores. The total number of parallel
jobs will be njobs_arms * njobs_samples.
The parallelization uses shared memory, thus you will only
see a speed up if your base classifier releases the Python GIL, and will
otherwise result in slower runs.
arm_name (object) – Name for this arm. Only applicable when using named arms. If None, will use the name of the last
arm plus 1 (will only work when the names are integers).
fitted_classifier (object) – If a classifier has already been fit to rewards coming from this arm, you can pass it here, otherwise,
will be started from the same ‘base_classifier’ as the initial arms. If using bootstrapped methods or methods from this module which do not
accept arbitrary classifiers as input,
don’t pass a classifier here (unless using the classes like e.g. utils._BootstrappedClassifierBase).
If the constructor was called with different base_algorithm per arm, must pass a
base classifier here. Not applicable for the classes that do not take a base_algorithm.
n_w_rew (int) – Number of trials/rounds with rewards coming from this arm (only used when using a beta prior or smoothing).
n_wo_rew (int) – Number of trials/rounds without rewards coming from this arm (only used when using a beta prior or smoothing).
smoothing (None, tuple (a,b), or list) – Smoothing parameters to use for this arm (see documentation of the class constructor
for details). If None and if the smoothing passed to the constructor didn’t have
separate entries per arm, will use the same smoothing as was passed in the constructor.
If no smoothing was passed to the constructor, the smoothing here will be ignored.
Must pass a smoothing here if the constructor was passed a smoothing with different entries per arm.
beta_prior (None or tuple((a,b), n)) – Beta prior to use for this arm. See the class’ documentation for details.
Must be passed if the constructor was provided different beta priors per arm.
If None and the constructor had a single beta_prior, will use that same
beta_prior for this new arm.
Note that n_w_rew and n_wo_rew will be counted towards the threshold ‘n’
in here.
Cannot be passed if the constructor did not have a beta_prior.
refit_buffer_X (array(m, n) or None) – Refit buffer of ‘X’ data to use for the new arm. Ignored when using
‘batch_train=False’ or ‘refit_buffer=None’.
refit_buffer_r (array(m,) or None) – Refit buffer of rewards data to use for the new arm. Ignored when using
‘batch_train=False’ or ‘refit_buffer=None’.
f_grad_norm (function) – Gradient calculation function to use for this arm. This is only
for the policies that make choices according to active learning
criteria, and only for situations in which the policy was passed
different functions for each arm.
case_one_class (function) – Gradient workaround function for single-class data. This is only
for the policies that make choices according to active learning
criteria, and only for situations in which the policy was passed
different functions for each arm.
Drops (removes/deletes) an arm from the set of choices available to the policy.
Note
The available arms, if named, are stored in attribute ‘choice_names’.
Parameters:
arm_name (int or object) – Arm to drop. If passing an integer, will drop at that index (starting at zero). Otherwise,
will drop the arm matching this name (argument must be of the same type as the individual entries
passed to ‘nchoices’ in the initialization).
Fits the base algorithm (one per class [and per sample if bootstrapped]) to partially labeled data.
Parameters:
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
continue_from_last (bool) – If the policy was previously fit to data, whether to assume that
this new call to ‘fit’ will continue from the exact same dataset as before
plus new rows appended at the end of ‘X’, ‘a’, ‘r’. In this case,
will only refit the models that have new data according to ‘a’.
Note that the bootstrapped policies will still benefit from extra refits.
This option should not be used when there are calls to ‘partial_fit’ between
calls to fit.
Ignored if using assume_unique_reward=True.
Fits the base algorithm (one per class) to partially labeled data in batches.
Note
In order to use this method, the base classifier must have a ‘partial_fit’ method,
such as ‘sklearn.linear_model.SGDClassifier’. This method is not available
for ‘LogisticUCB’, ‘LogisticTS’, ‘PartitionedUCB’, ‘PartitionedTS’.
Parameters:
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
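For the batch/streaming variant, a sketch along the same lines; SGDClassifier with log loss is one of the bases mentioned above as having ‘partial_fit’ (on older scikit-learn versions the loss is spelled ‘log’), and the batch sizes are illustrative assumptions.
```python
# Hedged streaming sketch for 'partial_fit'; settings are illustrative, not recommendations.
import numpy as np
from sklearn.linear_model import SGDClassifier
from contextualbandits.online import EpsilonGreedy   # import path is an assumption

rng = np.random.default_rng(0)
nchoices, n_features = 4, 10

base = SGDClassifier(loss="log_loss")                # has 'partial_fit', as required above
policy = EpsilonGreedy(base, nchoices=nchoices, batch_train=True)

for _ in range(20):                                  # 20 incoming mini-batches
    X = rng.standard_normal((50, n_features))
    a = rng.integers(nchoices, size=50)
    r = rng.integers(2, size=50)
    policy.partial_fit(X, a, r)
```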
Selects actions according to this policy for new data.
Parameters:
X (array (n_samples, n_features)) – New observations for which to choose an action according to this policy.
exploit (bool) – Whether to make a prediction according to the policy, or to just choose the
arm with the highest expected reward according to current models.
output_score (bool) – Whether to output the score that this method predicted, in case it is desired to use
it with this package’s offpolicy and evaluation modules.
Returns:
pred – Actions chosen by the policy. If passing output_score=True, it will be a dictionary
with the chosen arm and the score that the arm got following this policy with the classifiers used.
Return type:
array (n_samples,) or dict(“choice” : array(n_samples,), “score” : array(n_samples,))
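Continuing from the fitted policy in the fit sketch earlier (the names policy, rng, and n_features come from that sketch, not from the package), the two output modes of the prediction method look like this:
```python
# Hedged sketch of 'predict', reusing 'policy', 'rng' and 'n_features' from the fit sketch above.
X_new = rng.standard_normal((5, n_features))

actions = policy.predict(X_new)                         # array (n_samples,) of chosen arms
with_scores = policy.predict(X_new, output_score=True)  # dict with "choice" and "score"
print(with_scores["choice"], with_scores["score"])

# Pure exploitation: pick the arm with the highest expected reward under the current models.
greedy = policy.predict(X_new, exploit=True)
```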
This method will rank choices/arms according to what the policy
dictates - it is not an exploitation-mode rank, so if e.g. there are
random choices for some observations, there will be random ranks in here.
Parameters:
X (array (n_samples, n_features)) – New observations for which to rank actions according to this policy.
n (int) – Number of top-ranked actions to output
Returns:
topN – The top-ranked actions for each observation
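And a short sketch of the ranking method (the method name topN is an assumption taken from the return value documented above), again continuing from the earlier sketches:
```python
# Hedged sketch of ranking arms per observation; 'topN' as the method name is an assumption.
top3 = policy.topN(X_new, n=3)   # the 3 top-ranked actions for each row of X_new
print(top3)
```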
Takes a random action with probability p, or the action with highest
estimated reward with probability 1-p.
Parameters:
base_algorithm (obj) – Base binary classifier for which each sample for each class will be fit.
Will look for, in this order:
A ‘predict_proba’ method with outputs (n_samples, 2), values in [0,1], rows summing to 1
A ‘decision_function’ method with unbounded outputs (n_samples,) to which it will apply a sigmoid function.
A ‘predict’ method with outputs (n_samples,) with values in [0,1].
Can also pass a list with a different (or already-fit) classifier for each arm.
nchoices (int or list-like) – Number of arms/labels to choose from. Can also pass a list, array, or Series with arm names, in which case
the outputs from predict will follow these names and arms can be dropped by name, and new ones added with a
custom name.
explore_prob (float (0,1)) – Probability of taking a random action at each round.
decay (float (0,1)) –
After each prediction, the explore probability reduces to
p = p*decay
beta_prior (str ‘auto’, None, tuple ((a,b), n), or list[tuple((a,b), n)]) – If not ‘None’, when there are less than ‘n’ samples with and without
a reward from a given arm, it will predict the score for that class as a
random number drawn from a beta distribution with the prior
specified by ‘a’ and ‘b’. If set to “auto”, will be calculated as:
beta_prior = ((2/log2(nchoices), 4), 2)
Can also pass different priors per arm, in which case they should be passed
as a list of tuples.
The impact of beta_prior for EpsilonGreedy is not as high as for other
policies in this module.
Recommended to use only one of beta_prior or smoothing.
smoothing (None, tuple (a,b), or list) – If not None, predictions will be smoothed as yhat_smooth = (yhat*n + a)/(n + b),
where ‘n’ is the number of times each arm was chosen in the training data.
Can also pass it as a list of tuples with different ‘a’ and ‘b’ parameters for each arm
(e.g. if there are arm features, these parameters can be determined through a different model).
This will not work well with non-probabilistic classifiers such as SVM, in which case you might
want to define a class that embeds it with some recalibration built-in.
Recommended to use only one of beta_prior or smoothing.
noise_to_smooth (bool) – If passing smoothing, whether to add a small amount of random
noise ~ Uniform(0, 10^-12) in order to break ties at random instead of
choosing the smallest arm index.
Ignored when passing smoothing=None.
batch_train (bool) – Whether the base algorithm will be fit to the data in batches as it comes (streaming),
or to the whole dataset each time it is refit. Requires a classifier with a
‘partial_fit’ method.
refit_buffer (int or None) – Number of observations per arm to keep as a reserve for passing to
‘partial_fit’. If passing it, up until the moment there are at least this
number of observations for a given arm, that arm will keep the observations
when calling ‘fit’ and ‘partial_fit’, and will translate calls to
‘partial_fit’ to calls to ‘fit’ with the new plus stored observations.
After the reserve number is reached, calls to ‘partial_fit’ will enlarge
the data batch with the stored observations, and old stored observations
will be gradually replaced with the new ones (at random, not on a FIFO
basis). This technique can greatly enhance the performance when fitting
the data in batches, but memory consumption can grow quite large.
If passing sparse CSR matrices as input to ‘fit’ and ‘partial_fit’,
these will be converted to dense once they go into this reserve, and
then converted back to CSR to augment the new data.
Calls to ‘fit’ will override this reserve.
Ignored when passing ‘batch_train=False’.
deep_copy_buffer (bool) – Whether to make deep copies of the data that is stored in the
reserve for refit_buffer. If passing ‘False’, when the reserve is
not yet full, these will only store shallow copies of the data, which
is faster but will not let Python’s garbage collector free memory
after deleting the data, and if the original data is overwritten, this buffer
will be overwritten too.
Ignored when not using refit_buffer.
assume_unique_reward (bool) – Whether to assume that only one arm has a reward per observation. If set to ‘True’,
whenever an arm receives a reward, the classifiers for all other arms will be
fit to that observation too, having negative label.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a
Generator object for random number generation, a RandomState
object (from NumPy) from which to draw an integer, or a Generator
object (from NumPy), which will be used directly.
While this controls random number generation for this metaheuristic,
there can still be other sources of variation across re-runs, such as
data aggregations in parallel (e.g. from OpenMP or BLAS functions).
njobs (int or None) – Number of parallel jobs to run. If passing None will set it to 1. If passing -1 will
set it to the number of CPU cores. Note that if the base algorithm is itself parallelized,
this might result in a slowdown as both compete for available threads, so don’t set
parallelization in both. The parallelization uses shared memory, so you will only
see a speed-up if your base classifier releases the Python GIL; otherwise it will
result in slower runs.
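Putting the constructor parameters above together, a hedged construction sketch for this epsilon-greedy policy; the hyperparameter values are illustrative, not recommendations, and the import path is an assumption.
```python
# Hedged sketch of constructing the epsilon-greedy policy with the parameters documented above.
from sklearn.linear_model import LogisticRegression
from contextualbandits.online import EpsilonGreedy

policy = EpsilonGreedy(
    base_algorithm=LogisticRegression(),
    nchoices=5,
    explore_prob=0.2,       # probability of taking a random action at each round
    decay=0.9997,           # after each prediction, p <- p * decay
    beta_prior="auto",      # ((2/log2(nchoices), 4), 2), as documented above
    smoothing=None,
    batch_train=False,
    random_state=1,
    njobs=1,
)
```
With a decay this close to 1, the exploration probability shrinks slowly; the right value depends on how many predictions will be made between refits.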
arm_name (object) – Name for this arm. Only applicable when using named arms. If None, will use the name of the last
arm plus 1 (will only work when the names are integers).
fitted_classifier (object) – If a classifier has already been fit to rewards coming from this arm, you can pass it here; otherwise,
the arm will be started from the same ‘base_classifier’ as the initial arms. If using bootstrapped methods or methods from this module which do not
accept arbitrary classifiers as input,
don’t pass a classifier here (unless using the classes like e.g. utils._BootstrappedClassifierBase).
If the constructor was called with different base_algorithm per arm, must pass a
base classifier here. Not applicable for the classes that do not take a base_algorithm.
n_w_rew (int) – Number of trials/rounds with rewards coming from this arm (only used when using a beta prior or smoothing).
n_wo_rew (int) – Number of trials/rounds without rewards coming from this arm (only used when using a beta prior or smoothing).
smoothing (None, tuple (a,b), or list) – Smoothing parameters to use for this arm (see documentation of the class constructor
for details). If None and if the smoothing passed to the constructor didn’t have
separate entries per arm, will use the same smoothing as was passed in the constructor.
If no smoothing was passed to the constructor, the smoothing here will be ignored.
Must pass a smoothing here if the constructor was passed a smoothing with different entries per arm.
beta_prior (None or tuple((a,b), n)) – Beta prior to use for this arm. See the class’ documentation for details.
Must be passed if the constructor was provided different beta priors per arm.
If None and the constructor had a single beta_prior, will use that same
beta_prior for this new arm.
Note that n_w_rew and n_wo_rew will be counted towards the threshold ‘n’
in here.
Cannot be passed if the constructor did not have a beta_prior.
refit_buffer_X (array(m, n) or None) – Refit buffer of ‘X’ data to use for the new arm. Ignored when using
‘batch_train=False’ or ‘refit_buffer=None’.
refit_buffer_r (array(m,) or None) – Refit buffer of rewards data to use for the new arm. Ignored when using
‘batch_train=False’ or ‘refit_buffer=None’.
f_grad_norm (function) – Gradient calculation function to use for this arm. This is only
for the policies that make choices according to active learning
criteria, and only for situations in which the policy was passed
different functions for each arm.
case_one_class (function) – Gradient workaround function for single-class data. This is only
for the policies that make choices according to active learning
criteria, and only for situations in which the policy was passed
different functions for each arm.
Drops (removes/deletes) an arm from the set of choices available to the policy.
Note
The available arms, if named, are stored in attribute ‘choice_names’.
Parameters:
arm_name (int or object) – Arm to drop. If passing an integer, will drop at that index (starting at zero). Otherwise,
will drop the arm matching this name (argument must be of the same type as the individual entries
passed to ‘nchoices’ in the initialization).
Fits the base algorithm (one per class [and per sample if bootstrapped]) to partially labeled data.
Parameters:
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
continue_from_last (bool) – If the policy was previously fit to data, whether to assume that
this new call to ‘fit’ will continue from the exact same dataset as before
plus new rows appended at the end of ‘X’, ‘a’, ‘r’. In this case,
will only refit the models that have new data according to ‘a’.
Note that the bootstrapped policies will still benefit from extra refits.
This option should not be used when there are calls to ‘partial_fit’ between
calls to fit.
Ignored if using assume_unique_reward=True.
Fits the base algorithm (one per class) to partially labeled data in batches.
Note
In order to use this method, the base classifier must have a ‘partial_fit’ method,
such as ‘sklearn.linear_model.SGDClassifier’. This method is not available
for ‘LogisticUCB’, ‘LogisticTS’, ‘PartitionedUCB’, ‘PartitionedTS’.
Parameters:
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
Selects actions according to this policy for new data.
Parameters:
X (array (n_samples, n_features)) – New observations for which to choose an action according to this policy.
exploit (bool) – Whether to make a prediction according to the policy, or to just choose the
arm with the highest expected reward according to current models.
output_score (bool) – Whether to output the score that this method predicted, in case it is desired to use
it with this package’s offpolicy and evaluation modules.
Returns:
pred – Actions chosen by the policy. If passing output_score=True, it will be a dictionary
with the chosen arm and the score that the arm got following this policy with the classifiers used.
Return type:
array (n_samples,) or dict(“choice” : array(n_samples,), “score” : array(n_samples,))
This method will rank choices/arms according to what the policy
dictates - it is not an exploitation-mode rank, so if e.g. there are
random choices for some observations, there will be random ranks in here.
Parameters:
X (array (n_samples, n_features)) – New observations for which to rank actions according to this policy.
n (int) – Number of top-ranked actions to output
Returns:
topN – The top-ranked actions for each observation
Selects random actions for the first N predictions, after which it selects the
best arm only, according to its estimates.
Parameters:
base_algorithm (obj) – Base binary classifier for which each sample for each class will be fit.
Will look for, in this order:
A ‘predict_proba’ method with outputs (n_samples, 2), values in [0,1], rows summing to 1
A ‘decision_function’ method with unbounded outputs (n_samples,) to which it will apply a sigmoid function.
A ‘predict’ method with outputs (n_samples,) with values in [0,1].
Can also pass a list with a different (or already-fit) classifier for each arm.
nchoices (int or list-like) – Number of arms/labels to choose from. Can also pass a list, array, or Series with arm names, in which case
the outputs from predict will follow these names and arms can be dropped by name, and new ones added with a
custom name.
explore_rounds (int) – Number of rounds to wait before exploitation mode.
Will switch after making N predictions.
prob_active_choice (float (0, 1)) – Probability of choosing explore-mode actions according to active
learning criteria. Pass zero to choose everything at random.
active_choice (str, one of ‘weighted’, ‘max’ or ‘min’) – How to calculate the gradient that an observation would have on the loss
function for each classifier, given that it could be either class (positive or negative)
for the classifier that predicts each arm. If weighted, they are weighted by the same
probability estimates from the base algorithm.
f_grad_norm (None, str ‘auto’ or function(base_algorithm, X, pred) -> array (n_samples, 2)) – (When passing active_choice)
Function that calculates the row-wise norm of the gradient from observations in X if their class were
negative (first column) or positive (second column).
Can also use different functions for each arm, in which case it
accepts them as a list of functions with length equal to nchoices.
The option ‘auto’ will only work with scikit-learn’s ‘LogisticRegression’, ‘SGDClassifier’ (log-loss only), and ‘RidgeClassifier’;
with stochQN’s ‘StochasticLogisticRegression’;
and with this package’s ‘LinearRegression’.
Ignored when passing prob_active_choice=0.
case_one_class (str ‘auto’, ‘zero’, None, list, or function(X, n_pos, n_neg, rng) -> array(n_samples, 2)) – (When passing active_choice)
If some arm/choice/class has only rewards of one type, many models will fail to fit, and consequently the gradients
will be undefined. Likewise, if the model has not been fit, the gradient might also be undefined, and this requires a workaround.
If passing ‘None’, will assume that base_algorithm can be fit to
data of only-positive or only-negative class without problems, and that
it can calculate gradients and predictions with a base_algorithm
object that has not been fitted. Be aware that the methods ‘predict’,
‘predict_proba’, and ‘decision_function’ in base_algorithm might be
overwritten with another method that wraps it in a try-catch block, so
don’t rely on it producing errors when unfitted.
If passing a function, will take the output of it as the row-wise
gradient norms when it compares them against other arms/classes, with
the first column having the values if the observations were of negative
class, and the second column if they were positive class. The inputs to this
function (signature described above) are the number of positive and negative examples
that have been observed, and a Generator object from NumPy to use
for generating random numbers.
If passing a list, will assume each entry is a function as described
above, to be used with each corresponding arm.
If passing ‘zero’, it will output zero whenever models have not been fitted.
Note that the theoretically correct approach for a logistic regression would
be to assume models with all-zero coefficients, in which case the gradient
is defined in the absence of any data, but this tends to produce bad end
results.
Ignored when passing prob_active_choice=0.
beta_prior (str ‘auto’, None, tuple ((a,b), n), or list[tuple((a,b), n)]) – If not ‘None’, when there are less than ‘n’ samples with and without
a reward from a given arm, it will predict the score for that class as a
random number drawn from a beta distribution with the prior
specified by ‘a’ and ‘b’. If set to “auto”, will be calculated as:
beta_prior = ((2/log2(nchoices), 4), 2)
Can also pass different priors per arm, in which case they should be passed
as a list of tuples.
Recommended to use only one of beta_prior or smoothing.
smoothing (None, tuple (a,b), or list) – If not None, predictions will be smoothed as yhat_smooth = (yhat*n + a)/(n + b),
where ‘n’ is the number of times each arm was chosen in the training data.
Can also pass it as a list of tuples with different ‘a’ and ‘b’ parameters for each arm
(e.g. if there are arm features, these parameters can be determined through a different model).
This will not work well with non-probabilistic classifiers such as SVM, in which case you might
want to define a class that embeds it with some recalibration built-in.
Recommended to use only one of beta_prior or smoothing.
noise_to_smooth (bool) – If passing smoothing, whether to add a small amount of random
noise ~ Uniform(0, 10^-12) in order to break ties at random instead of
choosing the smallest arm index.
Ignored when passing smoothing=None.
batch_train (bool) – Whether the base algorithm will be fit to the data in batches as it comes (streaming),
or to the whole dataset each time it is refit. Requires a classifier with a
‘partial_fit’ method.
refit_buffer (int or None) – Number of observations per arm to keep as a reserve for passing to
‘partial_fit’. If passing it, up until the moment there are at least this
number of observations for a given arm, that arm will keep the observations
when calling ‘fit’ and ‘partial_fit’, and will translate calls to
‘partial_fit’ to calls to ‘fit’ with the new plus stored observations.
After the reserve number is reached, calls to ‘partial_fit’ will enlarge
the data batch with the stored observations, and old stored observations
will be gradually replaced with the new ones (at random, not on a FIFO
basis). This technique can greatly enhance the performance when fitting
the data in batches, but memory consumption can grow quite large.
If passing sparse CSR matrices as input to ‘fit’ and ‘partial_fit’,
these will be converted to dense once they go into this reserve, and
then converted back to CSR to augment the new data.
Calls to ‘fit’ will override this reserve.
Ignored when passing ‘batch_train=False’.
deep_copy_buffer (bool) – Whether to make deep copies of the data that is stored in the
reserve for refit_buffer. If passing ‘False’, when the reserve is
not yet full, these will only store shallow copies of the data, which
is faster but will not let Python’s garbage collector free memory
after deleting the data, and if the original data is overwritten, this buffer
will be overwritten too.
Ignored when not using refit_buffer.
assume_unique_reward (bool) – Whether to assume that only one arm has a reward per observation. If set to ‘True’,
whenever an arm receives a reward, the classifiers for all other arms will be
fit to that observation too, having negative label.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a
Generator object for random number generation, a RandomState
object (from NumPy) from which to draw an integer, or a Generator
object (from NumPy), which will be used directly.
While this controls random number generation for this metaheuristic,
there can still be other sources of variation across re-runs, such as
data aggregations in parallel (e.g. from OpenMP or BLAS functions).
njobs (int or None) – Number of parallel jobs to run. If passing None will set it to 1. If passing -1 will
set it to the number of CPU cores. Note that if the base algorithm is itself parallelized,
this might result in a slowdown as both compete for available threads, so don’t set
parallelization in both. The parallelization uses shared memory, so you will only
see a speed-up if your base classifier releases the Python GIL; otherwise it will
result in slower runs.
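A hedged sketch of the explore-then-exploit policy described above (class name assumed to be ExploreFirst), including the active-learning options. The custom gradient-norm function is purely illustrative of the documented f_grad_norm signature; passing ‘auto’ would normally suffice for LogisticRegression, and the pred argument is assumed to be the predicted probability of the positive class.
```python
# Hedged sketch: explore-then-exploit with a fraction of explore-mode actions chosen
# by the gradient heuristic. Class name and import path are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from contextualbandits.online import ExploreFirst

def grad_norm_logreg(base_algorithm, X, pred):
    """Illustrative row-wise gradient norms for log-loss: the gradient w.r.t. the
    coefficients for a row x is (p - y) * x, so its norm is |p - y| * ||x||,
    evaluated at y=0 (first column) and y=1 (second column)."""
    row_norms = np.linalg.norm(X, axis=1)
    return np.c_[pred * row_norms, (1.0 - pred) * row_norms]

policy = ExploreFirst(
    base_algorithm=LogisticRegression(),
    nchoices=5,
    explore_rounds=2000,            # random/active choices for the first N predictions
    prob_active_choice=0.25,        # fraction of explore-mode actions picked by the heuristic
    active_choice="weighted",
    f_grad_norm=grad_norm_logreg,   # 'auto' would also work for LogisticRegression
    case_one_class="auto",
    random_state=2,
)
```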
arm_name (object) – Name for this arm. Only applicable when using named arms. If None, will use the name of the last
arm plus 1 (will only work when the names are integers).
fitted_classifier (object) – If a classifier has already been fit to rewards coming from this arm, you can pass it here; otherwise,
the arm will be started from the same ‘base_classifier’ as the initial arms. If using bootstrapped methods or methods from this module which do not
accept arbitrary classifiers as input,
don’t pass a classifier here (unless using the classes like e.g. utils._BootstrappedClassifierBase).
If the constructor was called with different base_algorithm per arm, must pass a
base classifier here. Not applicable for the classes that do not take a base_algorithm.
n_w_rew (int) – Number of trials/rounds with rewards coming from this arm (only used when using a beta prior or smoothing).
n_wo_rew (int) – Number of trials/rounds without rewards coming from this arm (only used when using a beta prior or smoothing).
smoothing (None, tuple (a,b), or list) – Smoothing parameters to use for this arm (see documentation of the class constructor
for details). If None and if the smoothing passed to the constructor didn’t have
separate entries per arm, will use the same smoothing as was passed in the constructor.
If no smoothing was passed to the constructor, the smoothing here will be ignored.
Must pass a smoothing here if the constructor was passed a smoothing with different entries per arm.
beta_prior (None or tuple((a,b), n)) – Beta prior to use for this arm. See the class’ documentation for details.
Must be passed if the constructor was provided different beta priors per arm.
If None and the constructor had a single beta_prior, will use that same
beta_prior for this new arm.
Note that n_w_rew and n_wo_rew will be counted towards the threshold ‘n’
in here.
Cannot be passed if the constructor did not have a beta_prior.
refit_buffer_X (array(m, n) or None) – Refit buffer of ‘X’ data to use for the new arm. Ignored when using
‘batch_train=False’ or ‘refit_buffer=None’.
refit_buffer_r (array(m,) or None) – Refit buffer of rewards data to use for the new arm. Ignored when using
‘batch_train=False’ or ‘refit_buffer=None’.
f_grad_norm (function) – Gradient calculation function to use for this arm. This is only
for the policies that make choices according to active learning
criteria, and only for situations in which the policy was passed
different functions for each arm.
case_one_class (function) – Gradient workaround function for single-class data. This is only
for the policies that make choices according to active learning
criteria, and only for situations in which the policy was passed
different functions for each arm.
Drops (removes/deletes) an arm from the set of choices available to the policy.
Note
The available arms, if named, are stored in attribute ‘choice_names’.
Parameters:
arm_name (int or object) – Arm to drop. If passing an integer, will drop at that index (starting at zero). Otherwise,
will drop the arm matching this name (argument must be of the same type as the individual entries
passed to ‘nchoices’ in the initialization).
Fits the base algorithm (one per class [and per sample if bootstrapped]) to partially labeled data.
Parameters:
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
continue_from_last (bool) – If the policy was previously fit to data, whether to assume that
this new call to ‘fit’ will continue from the exact same dataset as before
plus new rows appended at the end of ‘X’, ‘a’, ‘r’. In this case,
will only refit the models that have new data according to ‘a’.
Note that the bootstrapped policies will still benefit from extra refits.
This option should not be used when there are calls to ‘partial_fit’ between
calls to fit.
Ignored if using assume_unique_reward=True.
Fits the base algorithm (one per class) to partially labeled data in batches.
Note
In order to use this method, the base classifier must have a ‘partial_fit’ method,
such as ‘sklearn.linear_model.SGDClassifier’. This method is not available
for ‘LogisticUCB’, ‘LogisticTS’, ‘PartitionedUCB’, ‘PartitionedTS’.
Parameters:
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
Selects actions according to this policy for new data.
Parameters:
X (array (n_samples, n_features)) – New observations for which to choose an action according to this policy.
exploit (bool) – Whether to make a prediction according to the policy, or to just choose the
arm with the highest expected reward according to current models.
active_choice (str in {‘min’, ‘max’, ‘weighted’}) – How to calculate the gradient that an observation would have on the loss
function for each classifier, given that it could be either class (positive or negative)
for the classifier that predicts each arm. If weighted, they are weighted by the same
probability estimates from the base algorithm.
This method will rank choices/arms according to what the policy
dictates - it is not an exploitation-mode rank, so if e.g. there are
random choices for some observations, there will be random ranks in here.
Parameters:
X (array (n_samples, n_features)) – New observations for which to rank actions according to this policy.
n (int) – Number of top-ranked actions to output
Returns:
topN – The top-ranked actions for each observation
This strategy requires each fitted model to store a square matrix with
dimension equal to the number of features. Thus, memory consumption can grow
very high with this method.
Note
The ‘X’ data (covariates) should ideally be centered before passing them
to ‘fit’, ‘partial_fit’, ‘predict’.
Note
Be aware that sampling coefficients is an operation that scales poorly with
the number of columns/features/variables. For wide datasets, it might be
slower than a bootstrapped approach, especially when using sample_unique=True.
Parameters:
nchoices (int or list-like) – Number of arms/labels to choose from. Can also pass a list, array, or Series with arm names, in which case
the outputs from predict will follow these names and arms can be dropped by name, and new ones added with a
custom name.
lambda_ (float > 0) – Regularization parameter. References assumed this would always be equal to 1, but this
implementation allows changing it.
fit_intercept (bool) – Whether to add an intercept term to the coefficients.
v_sq (float) – Parameter by which to multiply the covariance matrix (more means higher variance).
It is recommended to decrease it from the default value of 1.
sample_from (str, one of “coef”, “ci”) – Whether to make predictions by sampling the model coefficients or by
sampling the predicted value from an interval centered around the coefficients.
If sampling from the coefficients, it’s highly recommended to use method="chol"
as it will be faster and more precise.
n_presampled (None or int) – If sampling from coefficients, this denotes a number of coefficients to pre-sample
after calling ‘fit’ and/or ‘partial_fit’, which will be used later in the predictions. Pre-sampling
a large number of coefficients can help to speed up predictions at the expense
of longer fitting times, and is recommended if there is a large number of predictions
between calls to ‘fit’ or ‘partial_fit’.
If passing ‘None’ (the default), will not pre-sample a finite number of the coefficients
at fitting time, but will rather sample (different) coefficients in calls to ‘predict’.
Ignored when passing sample_from="ci".
sample_unique (bool) – Whether to sample different coefficients each time a prediction is to
be made. If passing ‘False’, when calling ‘predict’, it will sample
the same coefficients for all the observations in the same call to
‘predict’, whereas if passing ‘True’, will use a different set of
coefficients for each observation. Passing ‘False’ leads to an
approach which is theoretically wrong, but as sampling coefficients
can be very slow, using ‘False’ can provide a reasonable speed up
without much of a performance penalty.
Ignored when passing sample_from="ci" or n_presampled.
use_float (bool) – Whether to use C ‘float’ type for the required matrices. If passing ‘False’,
will use C ‘double’. Be aware that memory usage for this model can grow
very large, and that it is more prone to suffer from numeric precision
problems compared to its UCB counterpart.
method (str, one of ‘chol’ or ‘sm’) – Method used to fit the model. Options are:
'chol':
Uses the Cholesky decomposition to solve the linear system from the
least-squares closed-form each time ‘fit’ or ‘partial_fit’ is called.
This is likely to be faster when fitting the model to a large number
of observations at once, and is able to better exploit multi-threading.
'sm':
Starts with an inverse diagonal matrix and updates it as each
new observation comes using the Sherman-Morrison formula, thus
never explicitly solving the linear system, nor needing to calculate
a matrix inverse. This is likely to be faster when fitting the model
to small batches of observations. Be aware that with this method, it
will add regularization to the intercept if passing ‘fit_intercept=True’.
Note that, even when using “sm” here, if sampling from the coefficients, it
will need to calculate the eigenvalues of the covariance or inverse covariance
matrix after each update, so it won’t be as fast as LinUCB.
beta_prior (str ‘auto’, None, tuple ((a,b), n), or list[tuple((a,b), n)]) – If not ‘None’, when there are less than ‘n’ samples with and without
a reward from a given arm, it will predict the score for that class as a
random number drawn from a beta distribution with the prior
specified by ‘a’ and ‘b’. If set to “auto”, will be calculated as:
beta_prior = ((2/log2(nchoices), 4), 2)
Can also pass different priors per arm, in which case they should be passed
as a list of tuples.
This parameter can have a very large impact on the end results, and it’s
recommended to tune it accordingly - scenarios with low expected reward rates
should have priors that result in drawing small random numbers, whereas
scenarios with large expected reward rates should have stronger priors and
tend towards larger random numbers. Also, the more arms there are, the smaller
the optimal expected value for these random numbers.
smoothing (None, tuple (a,b), or list) – If not None, predictions will be smoothed as yhat_smooth = (yhat*n + a)/(n + b),
where ‘n’ is the number of times each arm was chosen in the training data.
Can also pass it as a list of tuples with different ‘a’ and ‘b’ parameters for each arm
(e.g. if there are arm features, these parameters can be determined through a different model).
Recommended to use only one of beta_prior or smoothing.
Note that it is technically incorrect to apply smoothing like this (because
the predictions from models are not bounded between zero and one), but
if neither beta_prior nor smoothing is passed, the policy can get
stuck in situations in which it will only choose actions from the first batch
of observations to which it is fit.
noise_to_smooth (bool) – If passing smoothing, whether to add a small amount of random
noise ~ Uniform(0, 10^-12) in order to break ties at random instead of
choosing the smallest arm index.
Ignored when passing smoothing=None.
assume_unique_reward (bool) – Whether to assume that only one arm has a reward per observation. If set to ‘True’,
whenever an arm receives a reward, the classifiers for all other arms will be
fit to that observation too, having negative label.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a
Generator object for random number generation, a RandomState
object (from NumPy) from which to draw an integer, or a Generator
object (from NumPy), which will be used directly.
While this controls random number generation for this metaheuristic,
there can still be other sources of variation across re-runs, such as
data aggregations in parallel (e.g. from OpenMP or BLAS functions).
njobs (int or None) – Number of parallel jobs to run. If passing None will set it to 1. If passing -1 will
set it to the number of CPU cores. Be aware that the algorithm will use BLAS function calls,
and if these have multi-threading enabled, it might result in a slow-down
as both functions compete for available threads.
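A hedged construction sketch for this linear Thompson sampling policy (class name assumed to be LinTS; the values are illustrative, not tuned recommendations). As noted above, the covariates should ideally be centered before fitting or predicting.
```python
# Hedged sketch of the linear Thompson sampling policy documented above.
from contextualbandits.online import LinTS   # class name and import path are assumptions

policy = LinTS(
    nchoices=5,
    lambda_=1.0,          # regularization; the references assume 1
    fit_intercept=True,
    v_sq=0.25,            # scaled-down covariance, since the docs suggest lowering it from 1
    sample_from="coef",
    method="chol",        # recommended above when sampling from the coefficients
    n_presampled=None,
    sample_unique=True,
    beta_prior="auto",
    random_state=3,
)
```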
arm_name (object) – Name for this arm. Only applicable when using named arms. If None, will use the name of the last
arm plus 1 (will only work when the names are integers).
fitted_classifier (object) – If a classifier has already been fit to rewards coming from this arm, you can pass it here; otherwise,
the arm will be started from the same ‘base_classifier’ as the initial arms. If using bootstrapped methods or methods from this module which do not
accept arbitrary classifiers as input,
don’t pass a classifier here (unless using the classes like e.g. utils._BootstrappedClassifierBase).
If the constructor was called with different base_algorithm per arm, must pass a
base classifier here. Not applicable for the classes that do not take a base_algorithm.
n_w_rew (int) – Number of trials/rounds with rewards coming from this arm (only used when using a beta prior or smoothing).
n_wo_rew (int) – Number of trials/rounds without rewards coming from this arm (only used when using a beta prior or smoothing).
smoothing (None, tuple (a,b), or list) – Smoothing parameters to use for this arm (see documentation of the class constructor
for details). If None and if the smoothing passed to the constructor didn’t have
separate entries per arm, will use the same smoothing as was passed in the constructor.
If no smoothing was passed to the constructor, the smoothing here will be ignored.
Must pass a smoothing here if the constructor was passed a smoothing with different entries per arm.
beta_prior (None or tuple((a,b), n)) – Beta prior to use for this arm. See the class’ documentation for details.
Must be passed if the constructor was provided different beta priors per arm.
If None and the constructor had a single beta_prior, will use that same
beta_prior for this new arm.
Note that n_w_rew and n_wo_rew will be counted towards the threshold ‘n’
in here.
Cannot be passed if the constructor did not have a beta_prior.
refit_buffer_X (array(m, n) or None) – Refit buffer of ‘X’ data to use for the new arm. Ignored when using
‘batch_train=False’ or ‘refit_buffer=None’.
refit_buffer_r (array(m,) or None) – Refit buffer of rewards data to use for the new arm. Ignored when using
‘batch_train=False’ or ‘refit_buffer=None’.
f_grad_norm (function) – Gradient calculation function to use for this arm. This is only
for the policies that make choices according to active learning
criteria, and only for situations in which the policy was passed
different functions for each arm.
case_one_class (function) – Gradient workaround function for single-class data. This is only
for the policies that make choices according to active learning
criteria, and only for situations in which the policy was passed
different functions for each arm.
Drops (removes/deletes) an arm from the set of choices available to the policy.
Note
The available arms, if named, are stored in attribute ‘choice_names’.
Parameters:
arm_name (int or object) – Arm to drop. If passing an integer, will drop at that index (starting at zero). Otherwise,
will drop the arm matching this name (argument must be of the same type as the individual entries
passed to ‘nchoices’ in the initialization).
Fits the base algorithm (one per class [and per sample if bootstrapped]) to partially labeled data.
Parameters:
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
continue_from_last (bool) – If the policy was previously fit to data, whether to assume that
this new call to ‘fit’ will continue from the exact same dataset as before
plus new rows appended at the end of ‘X’, ‘a’, ‘r’. In this case,
will only refit the models that have new data according to ‘a’.
Note that the bootstrapped policies will still benefit from extra refits.
This option should not be used when there are calls to ‘partial_fit’ between
calls to fit.
Ignored if using assume_unique_reward=True.
Fits the base algorithm (one per class) to partially labeled data in batches.
Note
In order to use this method, the base classifier must have a ‘partial_fit’ method,
such as ‘sklearn.linear_model.SGDClassifier’. This method is not available
for ‘LogisticUCB’, ‘LogisticTS’, ‘PartitionedUCB’, ‘PartitionedTS’.
Parameters:
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
Selects actions according to this policy for new data.
Parameters:
X (array (n_samples, n_features)) – New observations for which to choose an action according to this policy.
exploit (bool) – Whether to make a prediction according to the policy, or to just choose the
arm with the highest expected reward according to current models.
output_score (bool) – Whether to output the score that this method predicted, in case it is desired to use
it with this package’s offpolicy and evaluation modules.
Returns:
pred – Actions chosen by the policy. If passing output_score=True, it will be a dictionary
with the chosen arm and the score that the arm got following this policy with the classifiers used.
Return type:
array (n_samples,) or dict(“choice” : array(n_samples,), “score” : array(n_samples,))
This method will rank choices/arms according to what the policy
dictates - it is not an exploitation-mode rank, so if e.g. there are
random choices for some observations, there will be random ranks in here.
Parameters:
X (array (n_samples, n_features)) – New observations for which to rank actions according to this policy.
n (int) – Number of top-ranked actions to output
Returns:
topN – The top-ranked actions for each observation
This strategy requires each fitted model to store a square matrix with
dimension equal to the number of features. Thus, memory consumption can grow
very high with this method.
Note
The ‘X’ data (covariates) should ideally be centered before passing them
to ‘fit’, ‘partial_fit’, ‘predict’.
Note
The default hyperparameters here are meant to match the original reference, but
it’s recommended to change them. Particularly: use beta_prior instead of
ucb_from_empty, decrease alpha, and maybe increase lambda_.
Parameters:
nchoices (int or list-like) – Number of arms/labels to choose from. Can also pass a list, array, or Series with arm names, in which case
the outputs from predict will follow these names and arms can be dropped by name, and new ones added with a
custom name.
alpha (float) – Parameter to control the upper confidence bound (more is higher).
lambda_ (float > 0) – Regularization parameter. References assumed this would always be equal to 1, but this
implementation allows changing it.
fit_intercept (bool) – Whether to add an intercept term to the coefficients.
use_float (bool) – Whether to use C ‘float’ type for the required matrices. If passing ‘False’,
will use C ‘double’. Be aware that memory usage for this model can grow
very large.
method (str, one of ‘chol’ or ‘sm’) – Method used to fit the model. Options are:
'chol':
Uses the Cholesky decomposition to solve the linear system from the
least-squares closed-form each time ‘fit’ or ‘partial_fit’ is called.
This is likely to be faster when fitting the model to a large number
of observations at once, and is able to better exploit multi-threading.
'sm':
Starts with an inverse diagonal matrix and updates it as each
new observation comes using the Sherman-Morrison formula, thus
never explicitly solving the linear system, nor needing to calculate
a matrix inverse. This is likely to be faster when fitting the model
to small batches of observations. Be aware that with this method, it
will add regularization to the intercept if passing ‘fit_intercept=True’.
ucb_from_empty (bool) – Whether to make upper confidence bounds on arms with no observations according
to the formula, as suggested in the references (ties are broken at random for
them). Choosing this option leads to policies that usually start making random
predictions until having sampled from all arms, and as such, it’s not
recommended when the number of arms is large relative to the number of rounds.
Instead, it’s recommended to use beta_prior, which acts in the same way
as for the other policies in this library.
beta_prior (str ‘auto’, None, tuple ((a,b), n), or list[tuple((a,b), n)]) – If not ‘None’, when there are less than ‘n’ samples with and without
a reward from a given arm, it will predict the score for that class as a
random number drawn from a beta distribution with the prior
specified by ‘a’ and ‘b’. If set to “auto”, will be calculated as:
beta_prior = ((3/log2(nchoices), 4), 2).
Can also pass different priors per arm, in which case they should be passed
as a list of tuples.
This parameter can have a very large impact on the end results, and it’s
recommended to tune it accordingly - scenarios with low expected reward rates
should have priors that result in drawing small random numbers, whereas
scenarios with large expected reward rates should have stronger priors and
tend towards larger random numbers. Also, the more arms there are, the smaller
the optimal expected value for these random numbers.
Ignored when passing ucb_from_empty=True.
smoothing (None, tuple (a,b), or list) – If not None, predictions will be smoothed as yhat_smooth = (yhat*n + a)/(n + b),
where ‘n’ is the number of times each arm was chosen in the training data.
Can also pass it as a list of tuples with different ‘a’ and ‘b’ parameters for each arm
(e.g. if there are arm features, these parameters can be determined through a different model).
Recommended to use only one of beta_prior or smoothing.
Note that it is technically incorrect to apply smoothing like this (because
the predictions from models are not bounded between zero and one), but
if neither beta_prior nor smoothing is passed, the policy can get
stuck in situations in which it will only choose actions from the first batch
of observations to which it is fit (if using ucb_from_empty=False), or
only from the first arms that show rewards (if using ucb_from_empty=True).
noise_to_smooth (bool) – If passing smoothing, whether to add a small amount of random
noise ~ Uniform(0, 10^-12) in order to break ties at random instead of
choosing the smallest arm index.
Ignored when passing smoothing=None.
assume_unique_reward (bool) – Whether to assume that only one arm has a reward per observation. If set to ‘True’,
whenever an arm receives a reward, the classifiers for all other arms will be
fit to that observation too, having negative label.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a
Generator object for random number generation, a RandomState
object (from NumPy) from which to draw an integer, or a Generator
object (from NumPy), which will be used directly.
While this controls random number generation for this metaheuristic,
there can still be other sources of variation across re-runs, such as
data aggregations in parallel (e.g. from OpenMP or BLAS functions).
njobs (int or None) – Number of parallel jobs to run. If passing None will set it to 1. If passing -1 will
set it to the number of CPU cores. Be aware that the algorithm will use BLAS function calls,
and if these have multi-threading enabled, it might result in a slow-down
as both functions compete for available threads.
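Following the note above about departing from the reference defaults, a hedged LinUCB construction sketch; the concrete numbers are illustrative, not tuned recommendations, and the import path is an assumption.
```python
# Hedged sketch of a LinUCB configuration in the spirit of the note above:
# beta_prior instead of ucb_from_empty, a smaller alpha, and a larger lambda_.
from contextualbandits.online import LinUCB

policy = LinUCB(
    nchoices=5,
    alpha=0.1,              # smaller than the reference value, per the note above
    lambda_=10.0,           # illustrative larger regularization
    fit_intercept=True,
    ucb_from_empty=False,   # let beta_prior handle arms with no observations instead
    beta_prior="auto",      # ((3/log2(nchoices), 4), 2), as documented above
    method="chol",
    random_state=4,
)
```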
arm_name (object) – Name for this arm. Only applicable when using named arms. If None, will use the name of the last
arm plus 1 (will only work when the names are integers).
fitted_classifier (object) – If a classifier has already been fit to rewards coming from this arm, you can pass it here; otherwise,
the arm will be started from the same ‘base_classifier’ as the initial arms. If using bootstrapped methods or methods from this module which do not
accept arbitrary classifiers as input,
don’t pass a classifier here (unless using the classes like e.g. utils._BootstrappedClassifierBase).
If the constructor was called with different base_algorithm per arm, must pass a
base classifier here. Not applicable for the classes that do not take a base_algorithm.
n_w_rew (int) – Number of trials/rounds with rewards coming from this arm (only used when using a beta prior or smoothing).
n_wo_rew (int) – Number of trials/rounds without rewards coming from this arm (only used when using a beta prior or smoothing).
smoothing (None, tuple (a,b), or list) – Smoothing parameters to use for this arm (see documentation of the class constructor
for details). If None and if the smoothing passed to the constructor didn’t have
separate entries per arm, will use the same smoothing as was passed in the constructor.
If no smoothing was passed to the constructor, the smoothing here will be ignored.
Must pass a smoothing here if the constructor was passed a smoothing with different entries per arm.
beta_prior (None or tuple((a,b), n)) – Beta prior to use for this arm. See the class’ documentation for details.
Must be passed if the constructor was provided different beta priors per arm.
If None and the constructor had a single beta_prior, will use that same
beta_prior for this new arm.
Note that n_w_rew and n_wo_rew will be counted towards the threshold ‘n’
in here.
Cannot be passed if the constructor did not have a beta_prior.
refit_buffer_X (array(m, n) or None) – Refit buffer of ‘X’ data to use for the new arm. Ignored when using
‘batch_train=False’ or ‘refit_buffer=None’.
refit_buffer_r (array(m,) or None) – Refit buffer of rewards data to use for the new arm. Ignored when using
‘batch_train=False’ or ‘refit_buffer=None’.
f_grad_norm (function) – Gradient calculation function to use for this arm. This is only
for the policies that make choices according to active learning
criteria, and only for situations in which the policy was passed
different functions for each arm.
case_one_class (function) – Gradient workaround function for single-class data. This is only
for the policies that make choices according to active learning
criteria, and only for situations in which the policy was passed
different functions for each arm.
Drops (removes/deletes) an arm from the set of choices available to the policy.
Note
The available arms, if named, are stored in attribute ‘choice_names’.
Parameters:
arm_name (int or object) – Arm to drop. If passing an integer, will drop at that index (starting at zero). Otherwise,
will drop the arm matching this name (argument must be of the same type as the individual entries
passed to ‘nchoices’ in the initialization).
Fits the base algorithm (one per class [and per sample if bootstrapped]) to partially labeled data.
Parameters:
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
continue_from_last (bool) – If the policy was previously fit to data, whether to assume that
this new call to ‘fit’ will continue from the exact same dataset as before
plus new rows appended at the end of ‘X’, ‘a’, ‘r’. In this case,
will only refit the models that have new data according to ‘a’.
Note that the bootstrapped policies will still benefit from extra refits.
This option should not be used when there are calls to ‘partial_fit’ between
calls to fit.
Ignored if using assume_unique_reward=True.
Fits the base algorithm (one per class) to partially labeled data in batches.
Note
In order to use this method, the base classifier must have a ‘partial_fit’ method,
such as ‘sklearn.linear_model.SGDClassifier’. This method is not available
for ‘LogisticUCB’, ‘LogisticTS’, ‘PartitionedUCB’, ‘PartitionedTS’.
Parameters:
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
Selects actions according to this policy for new data.
Parameters:
X (array (n_samples, n_features)) – New observations for which to choose an action according to this policy.
exploit (bool) – Whether to make a prediction according to the policy, or to just choose the
arm with the highest expected reward according to current models.
output_score (bool) – Whether to output the score that this method predicted, in case it is desired to use
it with this package’s offpolicy and evaluation modules.
Returns:
pred – Actions chosen by the policy. If passing output_score=True, it will be a dictionary
with the chosen arm and the score that the arm got following this policy with the classifiers used.
Return type:
array (n_samples,) or dict(“choice” : array(n_samples,), “score” : array(n_samples,))
This method will rank choices/arms according to what the policy
dictates - it is not an exploitation-mode rank, so if e.g. there are
random choices for some observations, there will be random ranks in here.
Parameters:
X (array (n_samples, n_features)) – New observations for which to rank actions according to this policy.
n (int) – Number of top-ranked actions to output
Returns:
topN – The top-ranked actions for each observation
Logistic regression classifier which either samples its coefficients using
the variance-covariance matrix of the fitted (non-sampled) coefficients, or,
as a faster alternative, samples predicted values from a confidence interval
built from that same variance-covariance matrix.
Note
This strategy is implemented for comparison purposes only and it’s not
recommended to rely on it, particularly not for large datasets. Performance
tends to be very bad compared to the other methods provided here.
Note
This strategy does not support fitting the data in batches (‘partial_fit’
will not be available), nor does it support using any other classifier.
See ‘BootstrappedTS’ for a more generalizable version.
Note
This strategy requires each fitted model to store a square matrix with
dimension equal to the number of features. Thus, memory consumption can grow
very high with this method.
Note
Be aware that sampling coefficients is an operation that scales poorly with
the number of columns/features/variables. For wide datasets, it might be
slower than a bootstrapped approach, especially when using sample_unique=True.
Parameters:
nchoices (int or list-like) – Number of arms/labels to choose from. Can also pass a list, array, or Series with arm names, in which case
the outputs from predict will follow these names and arms can be dropped by name, and new ones added with a
custom name.
sample_from (str, one of “coef”, “ci”) – Whether to make predictions by sampling the model coefficients or by
sampling the predicted value from a confidence interval around the best-fit
coefficients.
ci_from_empty (bool) – Whether to construct a confidence interval on arms with no observations
according to a variance-covariance matrix given by the regularization
parameter alone.
Ignored when passing sample_from='coef'.
multiplier (float) – Multiplier for the covariance matrix. Pass 1 to take it as-is.
Ignored when passing sample_from='ci'.
n_presampled (None or int) – If sampling from coefficients, this denotes a number of coefficients to pre-sample
after calling ‘fit’, which will be used later in the predictions. Pre-sampling
a large number of coefficients can help to speed up predictions at the expense
of longer fitting times, and is recommended if there is a large number of predictions
between calls to ‘fit’.
If passing ‘None’ (the default), will not pre-sample a finite number of the coefficients
at fitting time, but will rather sample (different) coefficients in calls to ‘predict’.
Ignored when passing sample_from="ci".
fit_intercept (bool) – Whether to add an intercept term to the models.
lambda_ (float) – Strength of the L2 regularization. Must be greater than zero.
sample_unique (bool) – Whether to sample different coefficients each time a prediction is to
be made. If passing ‘False’, when calling ‘predict’, it will sample
the same coefficients for all the observations in the same call to
‘predict’, whereas if passing ‘True’, will use a different set of
coefficients for each observation/row. Passing ‘False’ leads to an
approach which is theoretically wrong, but as sampling coefficients
can be very slow, using ‘False’ can provide a reasonable speed up
without much of a performance penalty.
Ignored when passing sample_from='ci' or n_presampled.
beta_prior (str ‘auto’, None, tuple ((a,b), n), or list[tuple((a,b), n)]) – If not ‘None’, when there are less than ‘n’ samples with and without
a reward from a given arm, it will predict the score for that class as a
random number drawn from a beta distribution with the prior
specified by ‘a’ and ‘b’. If set to “auto”, will be calculated as:
beta_prior = ((2/log2(nchoices), 4), 2)
Can also pass different priors per arm, in which case they should be passed
as a list of tuples.
This parameter can have a very large impact on the end results, and it’s
recommended to tune it accordingly - scenarios with low expected reward rates
should have priors that result in drawing small random numbers, whereas
scenarios with large expected reward rates should have stronger priors and
tend towards larger random numbers. Also, the more arms there are, the smaller
the optimal expected value for these random numbers.
Recommended to use only one of beta_prior, smoothing, ci_from_empty.
Ignored when passing ci_from_empty=True.
smoothing (None, tuple (a,b), or list) – If not None, predictions will be smoothed as yhat_smooth = (yhat*n + a)/(n + b),
where ‘n’ is the number of times each arm was chosen in the training data.
Can also pass it as a list of tuples with different ‘a’ and ‘b’ parameters for each arm
(e.g. if there are arm features, these parameters can be determined through a different model).
Recommended to use only one of beta_prior, smoothing, ci_from_empty.
noise_to_smooth (bool) – If passing smoothing, whether to add a small amount of random
noise ~ Uniform(0, 10^-12) in order to break ties at random instead of
choosing the smallest arm index.
Ignored when passing smoothing=None.
assume_unique_reward (bool) – Whether to assume that only one arm has a reward per observation. If set to ‘True’,
whenever an arm receives a reward, the classifiers for all other arms will be
fit to that observation too, having negative label.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a
Generator object for random number generation, a RandomState
object (from NumPy) from which to draw an integer, or a Generator
object (from NumPy), which will be used directly.
While this controls random number generation for this metaheuristic,
there can still be other sources of variation upon re-runs, such as
data aggregations in parallel (e.g. from OpenMP or BLAS functions).
njobs (int or None) – Number of parallel jobs to run. If passing None will set it to 1. If passing -1 will
set it to the number of CPU cores. Be aware that the algorithm will use BLAS function calls,
and if these have multi-threading enabled, it might result in a slow-down
as both functions compete for available threads.
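As a worked illustration of the smoothing formula above, using made-up numbers (these are not defaults from the library):

```python
# yhat_smooth = (yhat*n + a) / (n + b), with hypothetical values
yhat = 0.6    # raw score predicted for one arm
n = 10        # times that arm was chosen in the training data
a, b = 1, 2   # smoothing parameters

yhat_smooth = (yhat * n + a) / (n + b)
print(yhat_smooth)  # 0.5833..., pulled towards a/b when n is small
```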
arm_name (object) – Name for this arm. Only applicable when using named arms. If None, will use the name of the last
arm plus 1 (will only work when the names are integers).
fitted_classifier (object) – If a classifier has already been fit to rewards coming from this arm, you can pass it here, otherwise,
will be started from the same ‘base_classifier’ as the initial arms. If using bootstrapped methods or methods from this module which do not
accept arbitrary classifiers as input,
don’t pass a classifier here (unless using the classes like e.g. utils._BootstrappedClassifierBase).
If the constructor was called with different base_algorithm per arm, must pass a
base classifier here. Not applicable for the classes that do not take a base_algorithm.
n_w_rew (int) – Number of trials/rounds with rewards coming from this arm (only used when using a beta prior or smoothing).
n_wo_rew (int) – Number of trials/rounds without rewards coming from this arm (only used when using a beta prior or smoothing).
smoothing (None, tuple (a,b), or list) – Smoothing parameters to use for this arm (see documentation of the class constructor
for details). If None and if the smoothing passed to the constructor didn’t have
separate entries per arm, will use the same smoothing as was passed in the constructor.
If no smoothing was passed to the constructor, the smoothing here will be ignored.
Must pass a smoothing here if the constructor was passed a smoothing with different entries per arm.
beta_prior (None or tuple((a,b), n)) – Beta prior to use for this arm. See the class’ documentation for details.
Must be passed if the constructor was provided different beta priors per arm.
If None and the constructor had a single beta_prior, will use that same
beta_prior for this new arm.
Note that n_w_rew and n_wo_rew will be counted towards the threshold ‘n’
in here.
Cannot be passed if the constructor did not have a beta_prior.
refit_buffer_X (array(m, n) or None) – Refit buffer of ‘X’ data to use for the new arm. Ignored when using
‘batch_train=False’ or ‘refit_buffer=None’.
refit_buffer_r (array(m,) or None) – Refit buffer of rewards data to use for the new arm. Ignored when using
‘batch_train=False’ or ‘refit_buffer=None’.
f_grad_norm (function) – Gradient calculation function to use for this arm. This is only
for the policies that make choices according to active learning
criteria, and only for situations in which the policy was passed
different functions for each arm.
case_one_class (function) – Gradient workaround function for single-class data. This is only
for the policies that make choices according to active learning
criteria, and only for situations in which the policy was passed
different functions for each arm.
Drops (removes/deletes) an arm from the set of available choices to the policy.
Note
The available arms, if named, are stored in attribute ‘choice_names’.
Parameters:
arm_name (int or object) – Arm to drop. If passing an integer, will drop at that index (starting at zero). Otherwise,
will drop the arm matching this name (argument must be of the same type as the individual entries
passed to ‘nchoices’ in the initialization).
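A minimal usage sketch of dropping an arm by name. The class name LogisticTS is an assumption based on the constructor parameters listed above (sample_from, n_presampled, multiplier); any policy class from this module would behave the same way:

```python
from contextualbandits.online import LogisticTS  # assumed class for this doc block

# Named arms: nchoices can be a list of arm names instead of an integer
policy = LogisticTS(nchoices=["arm_a", "arm_b", "arm_c"])
policy.drop_arm("arm_b")      # drop by name (or by index when arms are unnamed)
print(policy.choice_names)    # remaining arm names, e.g. ['arm_a', 'arm_c']
```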
Fits the base algorithm (one per class [and per sample if bootstrapped]) to partially labeled data.
Parameters:
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
continue_from_last (bool) – If the policy was previously fit to data, whether to assume that
this new call to ‘fit’ will continue from the exact same dataset as before
plus new rows appended at the end of ‘X’, ‘a’, ‘r’. In this case,
will only refit the models that have new data according to ‘a’.
Note that the bootstrapped policies will still benefit from extra refits.
This option should not be used when there are calls to ‘partial_fit’ between
calls to fit.
Ignored if using assume_unique_reward=True.
Fits the base algorithm (one per class) to partially labeled data in batches.
Note
In order to use this method, the base classifier must have a ‘partial_fit’ method,
such as ‘sklearn.linear_model.SGDClassifier’. This method is not available
for ‘LogisticUCB’, ‘LogisticTS’, ‘PartitionedUCB’, ‘PartitionedTS’.
Parameters:
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
Selects actions according to this policy for new data.
Parameters:
X (array (n_samples, n_features)) – New observations for which to choose an action according to this policy.
exploit (bool) – Whether to make a prediction according to the policy, or to just choose the
arm with the highest expected reward according to current models.
output_score (bool) – Whether to output the score that this method predicted, in case it is desired to use
it with this package’s offpolicy and evaluation modules.
Returns:
pred – Actions chosen by the policy. If passing output_score=True, it will be a dictionary
with the chosen arm and the score that the arm got following this policy with the classifiers used.
Return type:
array (n_samples,) or dict(“choice” : array(n_samples,), “score” : array(n_samples,))
This method will rank choices/arms according to what the policy
dictates - it is not an exploitation-mode rank, so if e.g. there are
random choices for some observations, there will be random ranks in here.
Parameters:
X (array (n_samples, n_features)) – New observations for which to rank actions according to this policy.
n (int) – Number of top-ranked actions to output
Returns:
topN – The top-ranked actions for each observation
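A minimal end-to-end sketch of the fit / predict / topN cycle described above, on synthetic data. The class name LogisticTS is again an assumption; the methods and arguments follow the descriptions in this section:

```python
import numpy as np
from contextualbandits.online import LogisticTS  # assumed class for this doc block

rng = np.random.default_rng(123)
X = rng.normal(size=(500, 10))        # covariates
a = rng.integers(0, 3, size=500)      # arms that were chosen (3 arms)
r = rng.integers(0, 2, size=500)      # binary rewards observed for those arms

policy = LogisticTS(nchoices=3, sample_from="ci", random_state=1)
policy.fit(X, a, r)

X_new = rng.normal(size=(5, 10))
print(policy.predict(X_new))                # arm chosen by the policy per row
print(policy.predict(X_new, exploit=True))  # highest expected reward, no exploration
print(policy.topN(X_new, n=2))              # top-2 ranked arms per row
```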
Logistic regression classifier which constructs an upper bound on the
predicted probabilities through a confidence interval calculated from
the variance-covariance matrix of the fitted coefficients.
Note
This strategy is implemented for comparison purposes only and it’s not
recommended to rely on it, particularly not for large datasets.
Note
This strategy does not support fitting the data in batches (‘partial_fit’
will not be available), nor does it support using any other classifier.
See ‘BootstrappedUCB’ for a more generalizable version.
Note
This strategy requires each fitted classifier to store a square matrix with
dimension equal to the number of features. Thus, memory consumption can grow
very high with this method.
Parameters:
nchoices (int or list-like) – Number of arms/labels to choose from. Can also pass a list, array, or Series with arm names, in which case
the outputs from predict will follow these names and arms can be dropped by name, and new ones added with a
custom name.
percentile (int [0,100]) – Percentile of the confidence interval to take.
fit_intercept (bool) – Whether to add an intercept term to the models.
lambda_ (float) – Strength of the L2 regularization. Must be greater than zero.
ucb_from_empty (bool) – Whether to make upper confidence bounds on arms with no observations according
to the formula (ties are broken at random for
them). Choosing this option leads to policies that usually start making random
predictions until having sampled from all arms, and as such, it’s not
recommended when the number of arms is large relative to the number of rounds.
Instead, it’s recommended to use beta_prior, which acts in the same way
as for the other policies in this library.
beta_prior (str ‘auto’, None, tuple ((a,b), n), or list[tuple((a,b), n)]) – If not ‘None’, when there are less than ‘n’ samples with and without
a reward from a given arm, it will predict the score for that class as a
random number drawn from a beta distribution with the prior
specified by ‘a’ and ‘b’. If set to “auto”, will be calculated as:
beta_prior = ((3/log2(nchoices), 4), 2)
Can also pass different priors per arm, in which case they should be passed
as a list of tuples.
This parameter can have a very large impact on the end results, and it’s
recommended to tune it accordingly - scenarios with low expected reward rates
should have priors that result in drawing small random numbers, whereas
scenarios with large expected reward rates should have stronger priors and
tend towards larger random numbers. Also, the more arms there are, the smaller
the optimal expected value for these random numbers.
Note that this method calculates upper bounds rather than expectations, so the ‘a’
parameter should be higher than for other methods.
Recommended to use only one of beta_prior or smoothing. Ignored when
passing ucb_from_empty=True.
smoothing (None, tuple (a,b), or list) – If not None, predictions will be smoothed as yhat_smooth = (yhat*n + a)/(n + b),
where ‘n’ is the number of times each arm was chosen in the training data.
Can also pass it as a list of tuples with different ‘a’ and ‘b’ parameters for each arm
(e.g. if there are arm features, these parameters can be determined through a different model).
Recommended to use only one of beta_prior or smoothing.
noise_to_smooth (bool) – If passing smoothing, whether to add a small amount of random
noise \(\sim Uniform(0, 10^{-12})\) in order to break ties at random instead of
choosing the smallest arm index.
Ignored when passing smoothing=None.
assume_unique_reward (bool) – Whether to assume that only one arm has a reward per observation. If set to ‘True’,
whenever an arm receives a reward, the classifiers for all other arms will be
fit to that observation too, having negative label.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a
Generator object for random number generation, a RandomState
object (from NumPy) from which to draw an integer, or a Generator
object (from NumPy), which will be used directly.
While this controls random number generation for this metaheuristic,
there can still be other sources of variation upon re-runs, such as
data aggregations in parallel (e.g. from OpenMP or BLAS functions).
njobs (int or None) – Number of parallel jobs to run. If passing None will set it to 1. If passing -1 will
set it to the number of CPU cores. Be aware that the algorithm will use BLAS function calls,
and if these have multi-threading enabled, it might result in a slow-down
as both functions compete for available threads.
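A hedged construction sketch for the logistic-regression UCB policy described above. The class name LogisticUCB is taken from the notes earlier in this document; the parameter values are illustrative only:

```python
from contextualbandits.online import LogisticUCB  # name taken from the notes above

# UCB policy over per-arm logistic regressions: the score for each arm is the
# chosen percentile of a confidence interval around the predicted probability.
policy = LogisticUCB(
    nchoices=5,
    percentile=80,         # take the 80th percentile of the interval
    fit_intercept=True,
    lambda_=1.0,           # strength of the L2 regularization
    ucb_from_empty=False,  # rely on beta_prior for arms with little data instead
    beta_prior="auto",
    random_state=42,
)
```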
arm_name (object) – Name for this arm. Only applicable when using named arms. If None, will use the name of the last
arm plus 1 (will only work when the names are integers).
fitted_classifier (object) – If a classifier has already been fit to rewards coming from this arm, you can pass it here, otherwise,
will be started from the same ‘base_classifier’ as the initial arms. If using bootstrapped methods or methods from this module which do not
accept arbitrary classifiers as input,
don’t pass a classifier here (unless using the classes like e.g. utils._BootstrappedClassifierBase).
If the constructor was called with different base_algorithm per arm, must pass a
base classifier here. Not applicable for the classes that do not take a base_algorithm.
n_w_rew (int) – Number of trials/rounds with rewards coming from this arm (only used when using a beta prior or smoothing).
n_wo_rew (int) – Number of trials/rounds without rewards coming from this arm (only used when using a beta prior or smoothing).
smoothing (None, tuple (a,b), or list) – Smoothing parameters to use for this arm (see documentation of the class constructor
for details). If None and if the smoothing passed to the constructor didn’t have
separate entries per arm, will use the same smoothing as was passed in the constructor.
If no smoothing was passed to the constructor, the smoothing here will be ignored.
Must pass a smoothing here if the constructor was passed a smoothing with different entries per arm.
beta_prior (None or tuple((a,b), n)) – Beta prior to use for this arm. See the class’ documentation for details.
Must be passed if the constructor was provided different beta priors per arm.
If None and the constructor had a single beta_prior, will use that same
beta_prior for this new arm.
Note that n_w_rew and n_wo_rew will be counted towards the threshold ‘n’
in here.
Cannot be passed if the constructor did not have a beta_prior.
refit_buffer_X (array(m, n) or None) – Refit buffer of ‘X’ data to use for the new arm. Ignored when using
‘batch_train=False’ or ‘refit_buffer=None’.
refit_buffer_r (array(m,) or None) – Refit buffer of rewards data to use for the new arm. Ignored when using
‘batch_train=False’ or ‘refit_buffer=None’.
f_grad_norm (function) – Gradient calculation function to use for this arm. This is only
for the policies that make choices according to active learning
criteria, and only for situations in which the policy was passed
different functions for each arm.
case_one_class (function) – Gradient workaround function for single-class data. This is only
for the policies that make choices according to active learning
criteria, and only for situations in which the policy was passed
different functions for each arm.
Drops (removes/deletes) an arm from the set of available choices to the policy.
Note
The available arms, if named, are stored in attribute ‘choice_names’.
Parameters:
arm_name (int or object) – Arm to drop. If passing an integer, will drop at that index (starting at zero). Otherwise,
will drop the arm matching this name (argument must be of the same type as the individual entries
passed to ‘nchoices’ in the initialization).
Fits the base algorithm (one per class [and per sample if bootstrapped]) to partially labeled data.
Parameters:
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
continue_from_last (bool) – If the policy was previously fit to data, whether to assume that
this new call to ‘fit’ will continue from the exact same dataset as before
plus new rows appended at the end of ‘X’, ‘a’, ‘r’. In this case,
will only refit the models that have new data according to ‘a’.
Note that the bootstrapped policies will still benefit from extra refits.
This option should not be used when there are calls to ‘partial_fit’ between
calls to fit.
Ignored if using assume_unique_reward=True.
Fits the base algorithm (one per class) to partially labeled data in batches.
Note
In order to use this method, the base classifier must have a ‘partial_fit’ method,
such as ‘sklearn.linear_model.SGDClassifier’. This method is not available
for ‘LogisticUCB’, ‘LogisticTS’, ‘PartitionedUCB’, ‘PartitionedTS’.
Parameters:
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
Selects actions according to this policy for new data.
Parameters:
X (array (n_samples, n_features)) – New observations for which to choose an action according to this policy.
exploit (bool) – Whether to make a prediction according to the policy, or to just choose the
arm with the highest expected reward according to current models.
output_score (bool) – Whether to output the score that this method predicted, in case it is desired to use
it with this package’s offpolicy and evaluation modules.
Returns:
pred – Actions chosen by the policy. If passing output_score=True, it will be a dictionary
with the chosen arm and the score that the arm got following this policy with the classifiers used.
Return type:
array (n_samples,) or dict(“choice” : array(n_samples,), “score” : array(n_samples,))
This method will rank choices/arms according to what the policy
dictates - it is not an exploitation-mode rank, so if e.g. there are
random choices for some observations, there will be random ranks in here.
Parameters:
X (array (n_samples, n_features)) – New observations for which to rank actions according to this policy.
n (int) – Number of top-ranked actions to output
Returns:
topN – The top-ranked actions for each observation
Performs Thompson sampling using a beta distribution, with parameters given
by the predicted probability from the base algorithm multiplied by the number
of observations seen from each arm.
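A rough illustration of the sampling mechanism described above (this is not the library's internal code; the prior and the counts are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

p_hat = 0.3        # probability predicted by the base classifier for one arm
n_seen = 40        # observations seen so far for that arm
a0, b0 = 1.0, 1.0  # hypothetical beta_prior_ts

# Thompson-sampling score: a draw from a beta distribution whose parameters
# scale the predicted probability by the number of observations, plus the prior.
score = rng.beta(a0 + p_hat * n_seen, b0 + (1.0 - p_hat) * n_seen)
print(score)  # the arm with the highest sampled score would be chosen
```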
Parameters:
base_algorithm (obj) – Base binary classifier for which each sample for each class will be fit.
Will look for, in this order:
A ‘predict_proba’ method with outputs (n_samples, 2), values in [0,1], rows summing to 1
A ‘decision_function’ method with unbounded outputs (n_samples,) to which it will apply a sigmoid function.
A ‘predict’ method with outputs (n_samples,) with values in [0,1].
Can also pass a list with a different (or already-fit) classifier for each arm.
nchoices (int or list-like) – Number of arms/labels to choose from. Can also pass a list, array, or Series with arm names, in which case
the outputs from predict will follow these names and arms can be dropped by name, and new ones added with a
custom name.
beta_prior (str ‘auto’, None, tuple ((a,b), n), or list[tuple((a,b), n)]) – If not ‘None’, when there are less than ‘n’ samples with and without
a reward from a given arm, it will predict the score for that class as a
random number drawn from a beta distribution with the prior
specified by ‘a’ and ‘b’. If set to “auto”, will be calculated as:
beta_prior = ((2/log2(nchoices), 4), 2)
Can also pass different priors per arm, in which case they should be passed
as a list of tuples.
This parameter can have a very large impact on the end results, and it’s
recommended to tune it accordingly - scenarios with low expected reward rates
should have priors that result in drawing small random numbers, whereas
scenarios with large expected reward rates should have stronger priors and
tend towards larger random numbers. Also, the more arms there are, the smaller
the optimal expected value for these random numbers.
Recommended to use only one of beta_prior or smoothing.
beta_prior_ts (tuple(float, float)) – Beta prior used for the distribution from which to draw probabilities given
the base algorithm’s estimates. This is independent of beta_prior, and
they will not be used together under the same arm. Pass ‘(0,0)’ for no prior.
smoothing (None, tuple (a,b), or list) – If not None, predictions will be smoothed as yhat_smooth = (yhat*n + a)/(n + b),
where ‘n’ is the number of times each arm was chosen in the training data.
Can also pass it as a list of tuples with different ‘a’ and ‘b’ parameters for each arm
(e.g. if there are arm features, these parameters can be determined through a different model).
This will not work well with non-probabilistic classifiers such as SVM, in which case you might
want to define a class that embeds it with some recalibration built-in.
Recommended to use only one of beta_prior or smoothing.
noise_to_smooth (bool) – If passing smoothing, whether to add a small amount of random
noise \(\sim Uniform(0, 10^{-12})\) in order to break ties at random instead of
choosing the smallest arm index.
Ignored when passing smoothing=None.
batch_train (bool) – Whether the base algorithm will be fit to the data in batches as it comes (streaming),
or to the whole dataset each time it is refit. Requires a classifier with a
‘partial_fit’ method.
refit_buffer (int or None) – Number of observations per arm to keep as a reserve for passing to
‘partial_fit’. If passing it, up until the moment there are at least this
number of observations for a given arm, that arm will keep the observations
when calling ‘fit’ and ‘partial_fit’, and will translate calls to
‘partial_fit’ to calls to ‘fit’ with the new plus stored observations.
After the reserve number is reached, calls to ‘partial_fit’ will enlarge
the data batch with the stored observations, and old stored observations
will be gradually replaced with the new ones (at random, not on a FIFO
basis). This technique can greatly enhance the performance when fitting
the data in batches, but memory consumption can grow quite large.
If passing sparse CSR matrices as input to ‘fit’ and ‘partial_fit’,
these will be converted to dense once they go into this reserve, and
then converted back to CSR to augment the new data.
Calls to ‘fit’ will override this reserve.
Ignored when passing ‘batch_train=False’.
deep_copy_buffer (bool) – Whether to make deep copies of the data that is stored in the
reserve for refit_buffer. If passing ‘False’, when the reserve is
not yet full, these will only store shallow copies of the data, which
is faster but will not let Python’s garbage collector free memory
after deleting the data, and if the original data is overwritten, so will
this buffer.
Ignored when not using refit_buffer.
assume_unique_reward (bool) – Whether to assume that only one arm has a reward per observation. If set to ‘True’,
whenever an arm receives a reward, the classifiers for all other arms will be
fit to that observation too, having negative label.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a
Generator object for random number generation, a RandomState
object (from NumPy) from which to draw an integer, or a Generator
object (from NumPy), which will be used directly.
While this controls random number generation for this metaheuristic,
there can still be other sources of variation upon re-runs, such as
data aggregations in parallel (e.g. from OpenMP or BLAS functions).
njobs (int or None) – Number of parallel jobs to run. If passing None will set it to 1. If passing -1 will
set it to the number of CPU cores. Note that if the base algorithm is itself parallelized,
this might result in a slowdown as both compete for available threads, so don’t set
parallelization in both. The parallelization uses shared memory, thus you will only
see a speed up if your base classifier releases the Python GIL, and will
otherwise result in slower runs.
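A streaming-mode construction sketch under the assumption that the class described here is ParametricTS from this package's online module; the base classifier must expose ‘partial_fit’, as noted elsewhere in this section:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from contextualbandits.online import ParametricTS  # assumed class for this doc block

# SGDClassifier has 'partial_fit', as required for batch_train
# (use loss="log" instead on older scikit-learn versions).
base = SGDClassifier(loss="log_loss")
policy = ParametricTS(
    base_algorithm=base,
    nchoices=4,
    batch_train=True,
    refit_buffer=50,       # keep a small per-arm reserve for later partial_fit calls
    beta_prior_ts=(1, 1),  # illustrative prior for the beta draws
    random_state=0,
)

rng = np.random.default_rng(0)
for _ in range(10):                   # simulate data arriving in batches
    X = rng.normal(size=(100, 8))
    a = rng.integers(0, 4, size=100)
    r = rng.integers(0, 2, size=100)
    policy.partial_fit(X, a, r)
```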
arm_name (object) – Name for this arm. Only applicable when using named arms. If None, will use the name of the last
arm plus 1 (will only work when the names are integers).
fitted_classifier (object) – If a classifier has already been fit to rewards coming from this arm, you can pass it here, otherwise,
will be started from the same ‘base_classifier’ as the initial arms. If using bootstrapped methods or methods from this module which do not
accept arbitrary classifiers as input,
don’t pass a classifier here (unless using the classes like e.g. utils._BootstrappedClassifierBase).
If the constructor was called with different base_algorithm per arm, must pass a
base classifier here. Not applicable for the classes that do not take a base_algorithm.
n_w_rew (int) – Number of trials/rounds with rewards coming from this arm (only used when using a beta prior or smoothing).
n_wo_rew (int) – Number of trials/rounds without rewards coming from this arm (only used when using a beta prior or smoothing).
smoothing (None, tuple (a,b), or list) – Smoothing parameters to use for this arm (see documentation of the class constructor
for details). If None and if the smoothing passed to the constructor didn’t have
separate entries per arm, will use the same smoothing as was passed in the constructor.
If no smoothing was passed to the constructor, the smoothing here will be ignored.
Must pass a smoothing here if the constructor was passed a smoothing with different entries per arm.
beta_prior (None or tuple((a,b), n)) – Beta prior to use for this arm. See the class’ documentation for details.
Must be passed if the constructor was provided different beta priors per arm.
If None and the constructor had a single beta_prior, will use that same
beta_prior for this new arm.
Note that n_w_rew and n_wo_rew will be counted towards the threshold ‘n’
in here.
Cannot be passed if the constructor did not have a beta_prior.
refit_buffer_X (array(m, n) or None) – Refit buffer of ‘X’ data to use for the new arm. Ignored when using
‘batch_train=False’ or ‘refit_buffer=None’.
refit_buffer_r (array(m,) or None) – Refit buffer of rewards data to use for the new arm. Ignored when using
‘batch_train=False’ or ‘refit_buffer=None’.
f_grad_norm (function) – Gradient calculation function to use for this arm. This is only
for the policies that make choices according to active learning
criteria, and only for situations in which the policy was passed
different functions for each arm.
case_one_class (function) – Gradient workaround function for single-class data. This is only
for the policies that make choices according to active learning
criteria, and only for situations in which the policy was passed
different functions for each arm.
Drops (removes/deletes) an arm from the set of available choices to the policy.
Note
The available arms, if named, are stored in attribute ‘choice_names’.
Parameters:
arm_name (int or object) – Arm to drop. If passing an integer, will drop at that index (starting at zero). Otherwise,
will drop the arm matching this name (argument must be of the same type as the individual entries
passed to ‘nchoices’ in the initialization).
Fits the base algorithm (one per class [and per sample if bootstrapped]) to partially labeled data.
Parameters:
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
continue_from_last (bool) – If the policy was previously fit to data, whether to assume that
this new call to ‘fit’ will continue from the exact same dataset as before
plus new rows appended at the end of ‘X’, ‘a’, ‘r’. In this case,
will only refit the models that have new data according to ‘a’.
Note that the bootstrapped policies will still benefit from extra refits.
This option should not be used when there are calls to ‘partial_fit’ between
calls to fit.
Ignored if using assume_unique_reward=True.
Fits the base algorithm (one per class) to partially labeled data in batches.
Note
In order to use this method, the base classifier must have a ‘partial_fit’ method,
such as ‘sklearn.linear_model.SGDClassifier’. This method is not available
for ‘LogisticUCB’, ‘LogisticTS’, ‘PartitionedUCB’, ‘PartitionedTS’.
Parameters:
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
Selects actions according to this policy for new data.
Parameters:
X (array (n_samples, n_features)) – New observations for which to choose an action according to this policy.
exploit (bool) – Whether to make a prediction according to the policy, or to just choose the
arm with the highest expected reward according to current models.
output_score (bool) – Whether to output the score that this method predicted, in case it is desired to use
it with this package’s offpolicy and evaluation modules.
Returns:
pred – Actions chosen by the policy. If passing output_score=True, it will be a dictionary
with the chosen arm and the score that the arm got following this policy with the classifiers used.
Return type:
array (n_samples,) or dict(“choice” : array(n_samples,), “score” : array(n_samples,))
beta_prior_ts (tuple(float, float)) – Beta prior used for the distribution from which to draw probabilities given
the base algorithm’s estimates. This is independent of beta_prior, and
they will not be used together under the same arm. Pass ‘(0,0)’ for no prior.
This method will rank choices/arms according to what the policy
dictates - it is not an exploitation-mode rank, so if e.g. there are
random choices for some observations, there will be random ranks in here.
Parameters:
X (array (n_samples, n_features)) – New observations for which to rank actions according to this policy.
n (int) – Number of top-ranked actions to output
Returns:
topN – The top-ranked actions for each observation
Fits decision trees having non-contextual multi-armed Thompson-sampling
bandits at each leaf.
This corresponds to the ‘TreeHeuristic’ in the reference paper.
Note
This method fits only one tree per arm. As such, it’s not recommended for
high-dimensional data.
Note
The default values for the beta prior are those suggested in the reference paper.
It is nevertheless recommended to tune them.
Parameters:
nchoices (int or list-like) – Number of arms/labels to choose from. Can also pass a list, array, or Series with arm names, in which case
the outputs from predict will follow these names and arms can be dropped by name, and new ones added with a
custom name.
beta_prior (str ‘auto’, tuple ((a,b), n), or list[tuple((a,b), n)]) – When there are less than ‘n’ samples with and without a reward from
a given arm, it will predict the score
for that class as a random number drawn from a beta distribution with the prior
specified by ‘a’ and ‘b’.
If passing ‘auto’ (which is not the default), will use the same default as for
the other policies in this library:
beta_prior = ((2/log2(nchoices), 4), 2)
Can also pass different priors per arm, in which case they should be passed
as a list of tuples.
Additionally, will use (a,b) as prior when sampling from the MAB at a given node.
smoothing (None, tuple (a,b), or list) – If not None, predictions will be smoothed as yhat_smooth = (yhat*n + a)/(n + b),
where ‘n’ is the number of times each arm was chosen in the training data.
Can also pass it as a list of tuples with different ‘a’ and ‘b’ parameters for each arm
(e.g. if there are arm features, these parameters can be determined through a different model).
Not recommended for this method.
noise_to_smooth (bool) – If passing smoothing, whether to add a small amount of random
noise \(\sim Uniform(0, 10^{-12})\) in order to break ties at random instead of
choosing the smallest arm index.
Ignored when passing smoothing=None.
assume_unique_reward (bool) – Whether to assume that only one arm has a reward per observation. If set to ‘True’,
whenever an arm receives a reward, the classifiers for all other arms will be
fit to that observation too, having negative label.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a
Generator object for random number generation, a RandomState
object (from NumPy) from which to draw an integer, or a Generator
object (from NumPy), which will be used directly.
njobs (int or None) – Number of parallel jobs to run. If passing None will set it to 1. If passing -1 will
set it to the number of CPU cores. Note that it will not achieve a large
degree of parallelization due to needing many Python computations with
shared memory and no GIL releasing.
*args (tuple) – Additional arguments to pass to the decision tree model (this policy uses
scikit-learn’s DecisionTreeClassifier - see their docs for more details).
Note that passing random_state for DecisionTreeClassifier will have
no effect as it will be set independently.
**kwargs (dict) – Additional keyword arguments to pass to the decision tree model (this policy uses
scikit-learn’s DecisionTreeClassifier - see their docs for more details).
Note that passing random_state for DecisionTreeClassifier will have
no effect as it will be set independently.
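A construction sketch for the tree-based Thompson-sampling policy above. The class name PartitionedTS is taken from the partial_fit notes earlier in this document; extra keyword arguments are forwarded to scikit-learn's DecisionTreeClassifier, with max_depth shown here purely as an illustration:

```python
from contextualbandits.online import PartitionedTS  # name taken from the notes above

# One decision tree per arm, with a non-contextual Thompson-sampling
# bandit at each leaf; extra kwargs go to DecisionTreeClassifier.
policy = PartitionedTS(
    nchoices=3,
    beta_prior=((1, 1), 1),  # illustrative prior; see the note on the paper's defaults
    random_state=7,
    max_depth=4,             # forwarded to DecisionTreeClassifier
)
```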
arm_name (object) – Name for this arm. Only applicable when using named arms. If None, will use the name of the last
arm plus 1 (will only work when the names are integers).
fitted_classifier (object) – If a classifier has already been fit to rewards coming from this arm, you can pass it here, otherwise,
will be started from the same ‘base_classifier’ as the initial arms. If using bootstrapped methods or methods from this module which do not
accept arbitrary classifiers as input,
don’t pass a classifier here (unless using the classes like e.g. utils._BootstrappedClassifierBase).
If the constructor was called with different base_algorithm per arm, must pass a
base classifier here. Not applicable for the classes that do not take a base_algorithm.
n_w_rew (int) – Number of trials/rounds with rewards coming from this arm (only used when using a beta prior or smoothing).
n_wo_rew (int) – Number of trials/rounds without rewards coming from this arm (only used when using a beta prior or smoothing).
smoothing (None, tuple (a,b), or list) – Smoothing parameters to use for this arm (see documentation of the class constructor
for details). If None and if the smoothing passed to the constructor didn’t have
separate entries per arm, will use the same smoothing as was passed in the constructor.
If no smoothing was passed to the constructor, the smoothing here will be ignored.
Must pass a smoothing here if the constructor was passed a smoothing with different entries per arm.
beta_prior (None or tuple((a,b), n)) – Beta prior to use for this arm. See the class’ documentation for details.
Must be passed if the constructor was provided different beta priors per arm.
If None and the constructor had a single beta_prior, will use that same
beta_prior for this new arm.
Note that n_w_rew and n_wo_rew will be counted towards the threshold ‘n’
in here.
Cannot be passed if the constructor did not have a beta_prior.
refit_buffer_X (array(m, n) or None) – Refit buffer of ‘X’ data to use for the new arm. Ignored when using
‘batch_train=False’ or ‘refit_buffer=None’.
refit_buffer_r (array(m,) or None) – Refit buffer of rewards data to use for the new arm. Ignored when using
‘batch_train=False’ or ‘refit_buffer=None’.
f_grad_norm (function) – Gradient calculation function to use for this arm. This is only
for the policies that make choices according to active learning
criteria, and only for situations in which the policy was passed
different functions for each arm.
case_one_class (function) – Gradient workaround function for single-class data. This is only
for the policies that make choices according to active learning
criteria, and only for situations in which the policy was passed
different functions for each arm.
Drops (removes/deletes) an arm from the set of available choices to the policy.
Note
The available arms, if named, are stored in attribute ‘choice_names’.
Parameters:
arm_name (int or object) – Arm to drop. If passing an integer, will drop at that index (starting at zero). Otherwise,
will drop the arm matching this name (argument must be of the same type as the individual entries
passed to ‘nchoices’ in the initialization).
Fits the base algorithm (one per class [and per sample if bootstrapped]) to partially labeled data.
Parameters:
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
continue_from_last (bool) – If the policy was previously fit to data, whether to assume that
this new call to ‘fit’ will continue from the exact same dataset as before
plus new rows appended at the end of ‘X’, ‘a’, ‘r’. In this case,
will only refit the models that have new data according to ‘a’.
Note that the bootstrapped policies will still benefit from extra refits.
This option should not be used when there are calls to ‘partial_fit’ between
calls to fit.
Ignored if using assume_unique_reward=True.
Fits the base algorithm (one per class) to partially labeled data in batches.
Note
In order to use this method, the base classifier must have a ‘partial_fit’ method,
such as ‘sklearn.linear_model.SGDClassifier’. This method is not available
for ‘LogisticUCB’, ‘LogisticTS’, ‘PartitionedUCB’, ‘PartitionedTS’.
Parameters:
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
Selects actions according to this policy for new data.
Parameters:
X (array (n_samples, n_features)) – New observations for which to choose an action according to this policy.
exploit (bool) – Whether to make a prediction according to the policy, or to just choose the
arm with the highest expected reward according to current models.
output_score (bool) – Whether to output the score that this method predicted, in case it is desired to use
it with this package’s offpolicy and evaluation modules.
Returns:
pred – Actions chosen by the policy. If passing output_score=True, it will be a dictionary
with the chosen arm and the score that the arm got following this policy with the classifiers used.
Return type:
array (n_samples,) or dict(“choice” : array(n_samples,), “score” : array(n_samples,))
This method will rank choices/arms according to what the policy
dictates - it is not an exploitation-mode rank, so if e.g. there are
random choices for some observations, there will be random ranks in here.
Parameters:
X (array (n_samples, n_features)) – New observations for which to rank actions according to this policy.
n (int) – Number of top-ranked actions to output
Returns:
topN – The top-ranked actions for each observation
Fits decision trees having non-contextual multi-armed UCB bandits at each leaf.
Uses the standard approximation for confidence interval of a proportion
(mean + c * sqrt(mean * (1-mean) / n)).
This is similar to the ‘TreeHeuristic’ in the reference paper, but uses UCB as a
MAB policy instead of Thompson sampling.
Note
This method fits only one tree per arm. As such, it’s not recommended for
high-dimensional data.
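The confidence bound above can be computed directly; a small numeric sketch with hypothetical reward counts at one leaf:

```python
import numpy as np

pos, neg = 30, 70   # hypothetical reward counts at a tree leaf
n = pos + neg
mean = pos / n      # observed proportion of rewards
c = 1.96            # critical value for the chosen percentile (here ~97.5%)

# Standard approximation for the upper confidence bound of a proportion
ucb = mean + c * np.sqrt(mean * (1.0 - mean) / n)
print(ucb)  # ~0.39 for these numbers
```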
Parameters:
nchoices (int or list-like) – Number of arms/labels to choose from. Can also pass a list, array, or Series with arm names, in which case
the outputs from predict will follow these names and arms can be dropped by name, and new ones added with a
custom name.
percentile (int [0,100]) – Percentile of the confidence interval to take.
ucb_prior (tuple(float, float)) – Prior for the upper confidence bounds generated at each tree leaf. First
number will be added to the number of positives, and second number to
the number of negatives. If passing beta_prior=None, will use these alone
to generate an upper confidence bound and will break ties at random.
beta_prior (str ‘auto’, None, tuple ((a,b), n), or list[tuple((a,b), n)]) – If not ‘None’, when there are less than ‘n’ samples with and without
a reward from a given arm, it will predict the score for that class as a
random number drawn from a beta distribution with the prior
specified by ‘a’ and ‘b’. If set to “auto”, will be calculated as:
beta_prior = ((3/log2(nchoices), 4), 2)
Can also pass different priors per arm, in which case they should be passed
as a list of tuples.
This parameter can have a very large impact on the end results, and it’s
recommended to tune it accordingly - scenarios with low expected reward rates
should have priors that result in drawing small random numbers, whereas
scenarios with large expected reward rates should have stronger priors and
tend towards larger random numbers. Also, the more arms there are, the smaller
the optimal expected value for these random numbers.
Note that this method calculates upper bounds rather than expectations, so the ‘a’
parameter should be higher than for other methods.
Recommended to use only one of beta_prior or smoothing.
smoothing (None, tuple (a,b), or list) – If not None, predictions will be smoothed as yhat_smooth = (yhat*n + a)/(n + b),
where ‘n’ is the number of times each arm was chosen in the training data.
Can also pass it as a list of tuples with different ‘a’ and ‘b’ parameters for each arm
(e.g. if there are arm features, these parameters can be determined through a different model).
Not recommended for this method.
noise_to_smooth (bool) – If passing smoothing, whether to add a small amount of random
noise \(\sim Uniform(0, 10^{-12})\) in order to break ties at random instead of
choosing the smallest arm index.
Ignored when passing smoothing=None.
assume_unique_reward (bool) – Whether to assume that only one arm has a reward per observation. If set to ‘True’,
whenever an arm receives a reward, the classifiers for all other arms will be
fit to that observation too, having negative label.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a
Generator object for random number generation, a RandomState
object (from NumPy) from which to draw an integer, or a Generator
object (from NumPy), which will be used directly.
njobs (int or None) – Number of parallel jobs to run. If passing None will set it to 1. If passing -1 will
set it to the number of CPU cores. Note that it will not achieve a large
degree of parallelization due to needing many Python computations with
shared memory and no GIL releasing.
*args (tuple) – Additional arguments to pass to the decision tree model (this policy uses
scikit-learn’s DecisionTreeClassifier - see their docs for more details).
Note that passing random_state for DecisionTreeClassifier will have
no effect as it will be set independently.
**kwargs (dict) – Additional keyword arguments to pass to the decision tree model (this policy uses
scikit-learn’s DecisionTreeClassifier - see their docs for more details).
Note that passing random_state for DecisionTreeClassifier will have
no effect as it will be set independently.
arm_name (object) – Name for this arm. Only applicable when using named arms. If None, will use the name of the last
arm plus 1 (will only work when the names are integers).
fitted_classifier (object) – If a classifier has already been fit to rewards coming from this arm, you can pass it here, otherwise,
will be started from the same ‘base_classifier’ as the initial arms. If using bootstrapped methods or methods from this module which do not
accept arbitrary classifiers as input,
don’t pass a classifier here (unless using the classes like e.g. utils._BootstrappedClassifierBase).
If the constructor was called with different base_algorithm per arm, must pass a
base classifier here. Not applicable for the classes that do not take a base_algorithm.
n_w_rew (int) – Number of trials/rounds with rewards coming from this arm (only used when using a beta prior or smoothing).
n_wo_rew (int) – Number of trials/rounds without rewards coming from this arm (only used when using a beta prior or smoothing).
smoothing (None, tuple (a,b), or list) – Smoothing parameters to use for this arm (see documentation of the class constructor
for details). If None and if the smoothing passed to the constructor didn’t have
separate entries per arm, will use the same smoothing as was passed in the constructor.
If no smoothing was passed to the constructor, the smoothing here will be ignored.
Must pass a smoothing here if the constructor was passed a smoothing with different entries per arm.
beta_prior (None or tuple((a,b), n)) – Beta prior to use for this arm. See the class’ documentation for details.
Must be passed if the constructor was provided different beta priors per arm.
If None and the constructor had a single beta_prior, will use that same
beta_prior for this new arm.
Note that n_w_rew and n_wo_rew will be counted towards the threshold ‘n’
in here.
Cannot be passed if the constructor did not have a beta_prior.
refit_buffer_X (array(m, n) or None) – Refit buffer of ‘X’ data to use for the new arm. Ignored when using
‘batch_train=False’ or ‘refit_buffer=None’.
refit_buffer_r (array(m,) or None) – Refit buffer of rewards data to use for the new arm. Ignored when using
‘batch_train=False’ or ‘refit_buffer=None’.
f_grad_norm (function) – Gradient calculation function to use for this arm. This is only
for the policies that make choices according to active learning
criteria, and only for situations in which the policy was passed
different functions for each arm.
case_one_class (function) – Gradient workaround function for single-class data. This is only
for the policies that make choices according to active learning
criteria, and only for situations in which the policy was passed
different functions for each arm.
Drops (removes/deletes) an arm from the set of available choices to the policy.
Note
The available arms, if named, are stored in attribute ‘choice_names’.
Parameters:
arm_name (int or object) – Arm to drop. If passing an integer, will drop at that index (starting at zero). Otherwise,
will drop the arm matching this name (argument must be of the same type as the individual entries
passed to ‘nchoices’ in the initialization).
Fits the base algorithm (one per class [and per sample if bootstrapped]) to partially labeled data.
Parameters:
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
continue_from_last (bool) – If the policy was previously fit to data, whether to assume that
this new call to ‘fit’ will continue from the exact same dataset as before
plus new rows appended at the end of ‘X’, ‘a’, ‘r’. In this case,
will only refit the models that have new data according to ‘a’.
Note that the bootstrapped policies will still benefit from extra refits.
This option should not be used when there are calls to ‘partial_fit’ between
calls to fit.
Ignored if using assume_unique_reward=True.
Fits the base algorithm (one per class) to partially labeled data in batches.
Note
In order to use this method, the base classifier must have a ‘partial_fit’ method,
such as ‘sklearn.linear_model.SGDClassifier’. This method is not available
for ‘LogisticUCB’, ‘LogisticTS’, ‘PartitionedUCB’, ‘PartitionedTS’.
Parameters:
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
Selects actions according to this policy for new data.
Parameters:
X (array (n_samples, n_features)) – New observations for which to choose an action according to this policy.
exploit (bool) – Whether to make a prediction according to the policy, or to just choose the
arm with the highest expected reward according to current models.
output_score (bool) – Whether to output the score that this method predicted, in case it is desired to use
it with this package’s offpolicy and evaluation modules.
Returns:
pred – Actions chosen by the policy. If passing output_score=True, it will be a dictionary
with the chosen arm and the score that the arm got following this policy with the classifiers used.
Return type:
array (n_samples,) or dict(“choice” : array(n_samples,), “score” : array(n_samples,))
Sets the upper confidence bound prior to a custom tuple.
Parameters:
ucb_prior (tuple(float, float)) – Prior for the upper confidence bounds generated at each tree leaf. First
number will be added to the number of positives, and second number to
the number of negatives. If passing beta_prior=None, will use these alone
to generate an upper confidence bound and will break ties at random.
This method will rank choices/arms according to what the policy
dictates - it is not an exploitation-mode rank, so if e.g. there are
random choices for some observations, there will be random ranks in here.
Parameters:
X (array (n_samples, n_features)) – New observations for which to rank actions according to this policy.
n (int) – Number of top-ranked actions to output
Returns:
topN – The top-ranked actions for each observation
Fits one classifier per arm using only the data on which that arm was chosen.
Predicts as One-Vs-Rest, plus the usual metaheuristics from beta_prior
and smoothing.
Parameters:
base_algorithm (obj) – Base binary classifier for which each sample for each class will be fit.
Will look for, in this order:
A ‘predict_proba’ method with outputs (n_samples, 2), values in [0,1], rows summing to 1
A ‘decision_function’ method with unbounded outputs (n_samples,) to which it will apply a sigmoid function.
A ‘predict’ method with outputs (n_samples,) with values in [0,1].
Can also pass a list with a different (or already-fit) classifier for each arm.
nchoices (int or list-like) – Number of arms/labels to choose from. Can also pass a list, array, or Series with arm names, in which case
the outputs from predict will follow these names and arms can be dropped by name, and new ones added with a
custom name.
beta_prior (str ‘auto’, None, tuple ((a,b), n), or list[tuple((a,b), n)]) – If not ‘None’, when there are less than ‘n’ samples with and without
a reward from a given arm, it will predict the score for that class as a
random number drawn from a beta distribution with the prior
specified by ‘a’ and ‘b’. If set to “auto”, will be calculated as:
beta_prior = ((2/log2(nchoices), 4), 2)
Can also pass different priors per arm, in which case they should be passed
as a list of tuples.
This parameter can have a very large impact on the end results, and it’s
recommended to tune it accordingly - scenarios with low expected reward rates
should have priors that result in drawing small random numbers, whereas
scenarios with large expected reward rates should have stronger priors and
tend towards larger random numbers. Also, the more arms there are, the smaller
the optimal expected value for these random numbers.
Recommended to use only one of beta_prior or smoothing.
smoothing (None, tuple (a,b), or list) – If not None, predictions will be smoothed as yhat_smooth = (yhat*n + a)/(n + b),
where ‘n’ is the number of times each arm was chosen in the training data.
Can also pass it as a list of tuples with different ‘a’ and ‘b’ parameters for each arm
(e.g. if there are arm features, these parameters can be determined through a different model).
This will not work well with non-probabilistic classifiers such as SVM, in which case you might
want to define a class that embeds it with some recalibration built-in.
Recommended to use only one of beta_prior or smoothing.
noise_to_smooth (bool) – If passing smoothing, whether to add a small amount of random
noise \(\sim Uniform(0, 10^{-12})\) in order to break ties at random instead of
choosing the smallest arm index.
Ignored when passing smoothing=None.
batch_train (bool) – Whether the base algorithm will be fit to the data in batches as it comes (streaming),
or to the whole dataset each time it is refit. Requires a classifier with a
‘partial_fit’ method.
refit_buffer (int or None) – Number of observations per arm to keep as a reserve for passing to
‘partial_fit’. If passing it, up until the moment there are at least this
number of observations for a given arm, that arm will keep the observations
when calling ‘fit’ and ‘partial_fit’, and will translate calls to
‘partial_fit’ to calls to ‘fit’ with the new plus stored observations.
After the reserve number is reached, calls to ‘partial_fit’ will enlarge
the data batch with the stored observations, and old stored observations
will be gradually replaced with the new ones (at random, not on a FIFO
basis). This technique can greatly enhance the performance when fitting
the data in batches, but memory consumption can grow quite large.
If passing sparse CSR matrices as input to ‘fit’ and ‘partial_fit’,
these will be converted to dense once they go into this reserve, and
then converted back to CSR to augment the new data.
Calls to ‘fit’ will override this reserve.
Ignored when passing ‘batch_train=False’.
deep_copy_buffer (bool) – Whether to make deep copies of the data that is stored in the
reserve for refit_buffer. If passing ‘False’, when the reserve is
not yet full, these will only store shallow copies of the data, which
is faster but will not let Python’s garbage collector free memory
after deleting the data, and if the original data is overwritten, so will
this buffer.
Ignored when not using refit_buffer.
assume_unique_reward (bool) – Whether to assume that only one arm has a reward per observation. If set to ‘True’,
whenever an arm receives a reward, the classifiers for all other arms will be
fit to that observation too, having negative label.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a
Generator object for random number generation, a RandomState
object (from NumPy) from which to draw an integer, or a Generator
object (from NumPy), which will be used directly.
While this controls random number generation for this metaheuristic,
there can still be other sources of variation upon re-runs, such as
data aggregations in parallel (e.g. from OpenMP or BLAS functions).
njobs (int or None) – Number of parallel jobs to run. If passing None will set it to 1. If passing -1 will
set it to the number of CPU cores. Note that if the base algorithm is itself parallelized,
this might result in a slowdown as both compete for available threads, so don’t set
parallelization in both. The parallelization uses shared memory, thus you will only
see a speed up if your base classifier releases the Python GIL, and will
otherwise result in slower runs.
arm_name (object) – Name for this arm. Only applicable when using named arms. If None, will use the name of the last
arm plus 1 (will only work when the names are integers).
fitted_classifier (object) – If a classifier has already been fit to rewards coming from this arm, you can pass it here, otherwise,
will be started from the same ‘base_classifier’ as the initial arms. If using bootstrapped methods or methods from this module which do not
accept arbitrary classifiers as input,
don’t pass a classifier here (unless using the classes like e.g. utils._BootstrappedClassifierBase).
If the constructor was called with different base_algorithm per arm, must pass a
base classifier here. Not applicable for the classes that do not take a base_algorithm.
n_w_rew (int) – Number of trials/rounds with rewards coming from this arm (only used when using a beta prior or smoothing).
n_wo_rew (int) – Number of trials/rounds without rewards coming from this arm (only used when using a beta prior or smoothing).
smoothing (None, tuple (a,b), or list) – Smoothing parameters to use for this arm (see documentation of the class constructor
for details). If None and if the smoothing passed to the constructor didn’t have
separate entries per arm, will use the same smoothing as was passed in the constructor.
If no smoothing was passed to the constructor, the smoothing here will be ignored.
Must pass a smoothing here if the constructor was passed a smoothing with different entries per arm.
beta_prior (None or tuple((a,b), n)) – Beta prior to use for this arm. See the class’ documentation for details.
Must be passed if the constructor was provided different beta priors per arm.
If None and the constructor had a single beta_prior, will use that same
beta_prior for this new arm.
Note that n_w_rew and n_wo_rew will be counted towards the threshold ‘n’
in here.
Cannot be passed if the constructor did not have a beta_prior.
refit_buffer_X (array(m, n) or None) – Refit buffer of ‘X’ data to use for the new arm. Ignored when using
‘batch_train=False’ or ‘refit_buffer=None’.
refit_buffer_r (array(m,) or None) – Refit buffer of rewards data to use for the new arm. Ignored when using
‘batch_train=False’ or ‘refit_buffer=None’.
f_grad_norm (function) – Gradient calculation function to use for this arm. This is only
for the policies that make choices according to active learning
criteria, and only for situations in which the policy was passed
different functions for each arm.
case_one_class (function) – Gradient workaround function for single-class data. This is only
for the policies that make choices according to active learning
criteria, and only for situations in which the policy was passed
different functions for each arm.
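For illustration, a minimal sketch of adding a new arm at runtime, assuming this section documents the gradient-guided active-learning policy (ActiveExplorer in contextualbandits.online); the data, arm count, and base classifier below are arbitrary placeholders, not part of these docs:
import numpy as np
from sklearn.linear_model import LogisticRegression
from contextualbandits.online import ActiveExplorer

rng = np.random.default_rng(123)
X = rng.normal(size=(500, 10))          # contexts
a = rng.integers(0, 5, size=500)        # arms that were chosen
r = rng.integers(0, 2, size=500)        # binary rewards observed

policy = ActiveExplorer(LogisticRegression(), nchoices=5)
policy.fit(X, a, r)

# Register a sixth arm; with no 'arm_name' given and integer arm identifiers,
# it should be appended as the next integer. A 'fitted_classifier' could be
# passed instead if one had already been trained on this arm's data.
policy.add_arm()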
Get the predicted “probabilities” from each arm from the classifier that predicts it,
standardized to sum up to 1 (note that these are no longer probabilities).
Parameters:
X (array (n_samples, n_features)) – Data for which to obtain decision function scores for each arm.
Returns:
scores – Scores following this policy for each arm.
Drops (removes/deletes) an arm from the set of available choices to the policy.
Note
The available arms, if named, are stored in attribute ‘choice_names’.
Parameters:
arm_name (int or object) – Arm to drop. If passing an integer, will drop at that index (starting at zero). Otherwise,
will drop the arm matching this name (argument must be of the same type as the individual entries
passed to ‘nchoices’ in the initialization).
Fits the base algorithm (one per class [and per sample if bootstrapped]) to partially labeled data.
Parameters:
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
continue_from_last (bool) – If the policy was previously fit to data, whether to assume that
this new call to ‘fit’ will continue from the exact same dataset as before
plus new rows appended at the end of ‘X’, ‘a’, ‘r’. In this case,
will only refit the models that have new data according to ‘a’.
Note that the bootstrapped policies will still benefit from extra refits.
This option should not be used when there are calls to ‘partial_fit’ between
calls to fit.
Ignored if using assume_unique_reward=True.
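A short sketch of the incremental-refit pattern described above (using ActiveExplorer as a stand-in policy on synthetic data; these choices are assumptions for demonstration):
import numpy as np
from sklearn.linear_model import LogisticRegression
from contextualbandits.online import ActiveExplorer

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
a = rng.integers(0, 4, size=1000)
r = rng.integers(0, 2, size=1000)

policy = ActiveExplorer(LogisticRegression(), nchoices=4)
policy.fit(X, a, r)

# Later on, the same arrays with 200 new rows appended at the end:
X2 = np.vstack([X, rng.normal(size=(200, 8))])
a2 = np.concatenate([a, rng.integers(0, 4, size=200)])
r2 = np.concatenate([r, rng.integers(0, 2, size=200)])

# Only the arms that actually received new data get refit
policy.fit(X2, a2, r2, continue_from_last=True)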
Fits the base algorithm (one per class) to partially labeled data in batches.
Note
In order to use this method, the base classifier must have a ‘partial_fit’ method,
such as ‘sklearn.linear_model.SGDClassifier’. This method is not available
for ‘LogisticUCB’, ‘LogisticTS’, ‘PartitionedUCB’, ‘PartitionedTS’.
Parameters:
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
Selects actions according to this policy for new data.
Parameters:
X (array (n_samples, n_features)) – New observations for which to choose an action according to this policy.
output_score (bool) – Whether to output the score that this method predicted, in case it is desired to use
it with this package’s offpolicy and evaluation modules.
Returns:
pred – Actions chosen by the policy. If passing output_score=True, it will be a dictionary
with the chosen arm and the score that the arm got following this policy with the classifiers used.
Return type:
array (n_samples,) or dict(“choice” : array(n_samples,), “score” : array(n_samples,))
This method will rank choices/arms according to what the policy
dictates - it is not an exploitation-mode rank, so if e.g. there are
random choices for some observations, there will be random ranks in here.
Parameters:
X (array (n_samples, n_features)) – New observations for which to rank actions according to this policy.
n (int) – Number of top-ranked actions to output
Returns:
topN – The top-ranked actions for each observation
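For example, obtaining actions, scores, and ranked actions from a fitted policy (again using ActiveExplorer as an illustrative policy on synthetic data):
import numpy as np
from sklearn.linear_model import LogisticRegression
from contextualbandits.online import ActiveExplorer

rng = np.random.default_rng(1)
X = rng.normal(size=(800, 8))
a = rng.integers(0, 4, size=800)
r = rng.integers(0, 2, size=800)

policy = ActiveExplorer(LogisticRegression(), nchoices=4)
policy.fit(X, a, r)

X_new = rng.normal(size=(10, 8))
chosen = policy.predict(X_new)                     # array (10,) of arm choices
scored = policy.predict(X_new, output_score=True)  # dict with 'choice' and 'score'
top2 = policy.topN(X_new, 2)                       # two highest-ranked arms per row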
Selects an action according to probabilities determined by a softmax transformation
on the scores from the decision function that predicts each class.
Note
Will apply an inverse sigmoid transformation to the probabilities that come from the base algorithm
before applying the softmax function.
Parameters:
base_algorithm (obj) – Base binary classifier for which each sample for each class will be fit.
Will look for, in this order:
A ‘predict_proba’ method with outputs (n_samples, 2), values in [0,1], rows summing to 1, to which it
will apply an inverse sigmoid function.
A ‘decision_function’ method with unbounded outputs (n_samples,).
A ‘predict’ method outputting (n_samples,), values in [0,1], to which it will apply an inverse sigmoid function.
Can also pass a list with a different (or already-fit) classifier for each arm.
nchoices (int or list-like) – Number of arms/labels to choose from. Can also pass a list, array, or Series with arm names, in which case
the outputs from predict will follow these names and arms can be dropped by name, and new ones added with a
custom name.
multiplier (float or None) – Number by which to multiply the outputs from the base algorithm before applying the softmax function
(i.e. will take softmax(yhat * multiplier)).
inflation_rate (float or None) – Number by which to multiply the multiplier after every prediction, i.e. after making
‘t’ predictions, the multiplier will be ‘multiplier_t = multiplier * inflation_rate^t’.
beta_prior (str ‘auto’, None, tuple ((a,b), n), or list[tuple((a,b), n)]) – If not ‘None’, when there are less than ‘n’ samples with and without
a reward from a given arm, it will predict the score for that class as a
random number drawn from a beta distribution with the prior
specified by ‘a’ and ‘b’. If set to “auto”, will be calculated as:
beta_prior = ((2/log2(nchoices), 4), 2)
Can also pass different priors per arm, in which case they should be passed
as a list of tuples.
This parameter can have a very large impact on the end results, and it’s
recommended to tune it accordingly - scenarios with low expected reward rates
should have priors that result in drawing small random numbers, whereas
scenarios with large expected reward rates should have stronger priors and
tend towards larger random numbers. Also, the more arms there are, the smaller
the optimal expected value for these random numbers.
Recommended to use only one of beta_prior or smoothing.
smoothing (None, tuple (a,b), or list) – If not None, predictions will be smoothed as yhat_smooth = (yhat*n + a)/(n + b),
where ‘n’ is the number of times each arm was chosen in the training data.
Can also pass it as a list of tuples with different ‘a’ and ‘b’ parameters for each arm
(e.g. if there are arm features, these parameters can be determined through a different model).
This will not work well with non-probabilistic classifiers such as SVM, in which case you might
want to define a class that embeds it with some recalibration built-in.
Recommended to use only one of beta_prior or smoothing.
noise_to_smooth (bool) – If passing smoothing, whether to add a small amount of random
noise \(\sim Uniform(0, 10^{-12})\) in order to break ties at random instead of
choosing the smallest arm index.
Ignored when passing smoothing=None.
batch_train (bool) – Whether the base algorithm will be fit to the data in batches as it comes (streaming),
or to the whole dataset each time it is refit. Requires a classifier with a
‘partial_fit’ method.
refit_buffer (int or None) – Number of observations per arm to keep as a reserve for passing to
‘partial_fit’. If passing it, up until the moment there are at least this
number of observations for a given arm, that arm will keep the observations
when calling ‘fit’ and ‘partial_fit’, and will translate calls to
‘partial_fit’ to calls to ‘fit’ with the new plus stored observations.
After the reserve number is reached, calls to ‘partial_fit’ will enlarge
the data batch with the stored observations, and old stored observations
will be gradually replaced with the new ones (at random, not on a FIFO
basis). This technique can greatly enhance the performance when fitting
the data in batches, but memory consumption can grow quite large.
If passing sparse CSR matrices as input to ‘fit’ and ‘partial_fit’,
these will be converted to dense once they go into this reserve, and
then converted back to CSR to augment the new data.
Calls to ‘fit’ will override this reserve.
Ignored when passing ‘batch_train=False’.
deep_copy_buffer (bool) – Whether to make deep copies of the data that is stored in the
reserve for refit_buffer. If passing ‘False’, when the reserve is
not yet full, these will only store shallow copies of the data, which
is faster but will not let Python’s garbage collector free memory
after deleting the data, and if the original data is overwritten, so will
this buffer.
Ignored when not using refit_buffer.
assume_unique_reward (bool) – Whether to assume that only one arm has a reward per observation. If set to ‘True’,
whenever an arm receives a reward, the classifiers for all other arms will be
fit to that observation too, having negative label.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a
Generator object for random number generation, a RandomState
object (from NumPy) from which to draw an integer, or a Generator
object (from NumPy), which will be used directly.
While this controls random number generation for this metaheuristic,
there can still be other sources of variation upon re-runs, such as
data aggregations in parallel (e.g. from OpenMP or BLAS functions).
njobs (int or None) – Number of parallel jobs to run. If passing None will set it to 1. If passing -1 will
set it to the number of CPU cores. Note that if the base algorithm is itself parallelized,
this might result in a slowdown as both compete for available threads, so don’t set
parallelization in both. The parallelization uses shared memory, thus you will only
see a speed up if your base classifier releases the Python GIL, and will
otherwise result in slower runs.
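A minimal construction-and-fit sketch for this softmax-based policy (the SoftmaxExplorer class in contextualbandits.online; the data and parameter values below are placeholders):
import numpy as np
from sklearn.linear_model import LogisticRegression
from contextualbandits.online import SoftmaxExplorer

rng = np.random.default_rng(2)
nchoices = 5
X = rng.normal(size=(1000, 10))
a = rng.integers(0, nchoices, size=1000)
r = rng.integers(0, 2, size=1000)

policy = SoftmaxExplorer(LogisticRegression(), nchoices=nchoices,
                         multiplier=1.0, inflation_rate=1.0004)
policy.fit(X, a, r)
actions = policy.predict(rng.normal(size=(10, 10)))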
arm_name (object) – Name for this arm. Only applicable when using named arms. If None, will use the name of the last
arm plus 1 (will only work when the names are integers).
fitted_classifier (object) – If a classifier has already been fit to rewards coming from this arm, you can pass it here, otherwise,
will be started from the same ‘base_classifier’ as the initial arms. If using bootstrapped methods or methods from this module which do not
accept arbitrary classifiers as input,
don’t pass a classifier here (unless using the classes like e.g. utils._BootstrappedClassifierBase).
If the constructor was called with different base_algorithm per arm, must pass a
base classifier here. Not applicable for the classes that do not take a base_algorithm.
n_w_rew (int) – Number of trials/rounds with rewards coming from this arm (only used when using a beta prior or smoothing).
n_wo_rew (int) – Number of trials/rounds without rewards coming from this arm (only used when using a beta prior or smoothing).
smoothing (None, tuple (a,b), or list) – Smoothing parameters to use for this arm (see documentation of the class constructor
for details). If None and if the smoothing passed to the constructor didn’t have
separate entries per arm, will use the same smoothing as was passed in the constructor.
If no smoothing was passed to the constructor, the smoothing here will be ignored.
Must pass a smoothing here if the constructor was passed a smoothing with different entries per arm.
beta_prior (None or tuple((a,b), n)) – Beta prior to use for this arm. See the class’ documentation for details.
Must be passed if the constructor was provided different beta priors per arm.
If None and the constructor had a single beta_prior, will use that same
beta_prior for this new arm.
Note that n_w_rew and n_wo_rew will be counted towards the threshold ‘n’
in here.
Cannot be passed if the constructor did not have a beta_prior.
refit_buffer_X (array(m, n) or None) – Refit buffer of ‘X’ data to use for the new arm. Ignored when using
‘batch_train=False’ or ‘refit_buffer=None’.
refit_buffer_r (array(m,) or None) – Refit buffer of rewards data to use for the new arm. Ignored when using
‘batch_train=False’ or ‘refit_buffer=None’.
f_grad_norm (function) – Gradient calculation function to use for this arm. This is only
for the policies that make choices according to active learning
criteria, and only for situations in which the policy was passed
different functions for each arm.
case_one_class (function) – Gradient workaround function for single-class data. This is only
for the policies that make choices according to active learning
criteria, and only for situations in which the policy was passed
different functions for each arm.
Drops (removes/deletes) an arm from the set of available choices to the policy.
Note
The available arms, if named, are stored in attribute ‘choice_names’.
Parameters:
arm_name (int or object) – Arm to drop. If passing an integer, will drop at that index (starting at zero). Otherwise,
will drop the arm matching this name (argument must be of the same type as the individual entries
passed to ‘nchoices’ in the initialization).
Fits the base algorithm (one per class [and per sample if bootstrapped]) to partially labeled data.
Parameters:
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
continue_from_last (bool) – If the policy was previously fit to data, whether to assume that
this new call to ‘fit’ will continue from the exact same dataset as before
plus new rows appended at the end of ‘X’, ‘a’, ‘r’. In this case,
will only refit the models that have new data according to ‘a’.
Note that the bootstrapped policies will still benefit from extra refits.
This option should not be used when there are calls to ‘partial_fit’ between
calls to fit.
Ignored if using assume_unique_reward=True.
Fits the base algorithm (one per class) to partially labeled data in batches.
Note
In order to use this method, the base classifier must have a ‘partial_fit’ method,
such as ‘sklearn.linear_model.SGDClassifier’. This method is not available
for ‘LogisticUCB’, ‘LogisticTS’, ‘PartitionedUCB’, ‘PartitionedTS’.
Parameters:
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
Selects actions according to this policy for new data.
Parameters:
X (array (n_samples, n_features)) – New observations for which to choose an action according to this policy.
exploit (bool) – Whether to make a prediction according to the policy, or to just choose the
arm with the highest expected reward according to current models.
output_score (bool) – Whether to output the score that this method predicted, in case it is desired to use
it with this package’s offpolicy and evaluation modules.
Returns:
pred – Actions chosen by the policy. If passing output_score=True, it will be a dictionary
with the chosen arm and the score that the arm got following this policy with the classifiers used.
Return type:
array (n_samples,) or dict(“choice” : array(n_samples,), “score” : array(n_samples,))
multiplier (float) – New multiplier for the numbers going to the softmax function.
Note that it will still apply the inflation rate after this
parameter is reset.
This method will rank choices/arms according to what the policy
dictates - it is not an exploitation-mode rank, so if e.g. there are
random choices for some observations, there will be random ranks in here.
Parameters:
X (array (n_samples, n_features)) – New observations for which to rank actions according to this policy.
n (int) – Number of top-ranked actions to output
Returns:
topN – The top-ranked actions for each observation
Estimates the expected reward for each arm, applies a correction for the actions that
were chosen, and converts the problem to cost-sensitive classification, on which the
base algorithm is then fit.
Note
If following these docs to the letter about what to pass under each argument,
this implementation will be theoretically incorrect as this whole library
doesn’t follow the paradigm of producing probabilities of choosing actions, nor of
estimating probabilities of a previous policy.
Instead, it uses estimated expected rewards (that is, the rows of the estimations
don’t sum to 1), but nevertheless, this is likely to still produce an improvement
over a naive approach. One may still supply post-hoc estimated probabilities if
feasible though.
Note
This technique converts the problem into a cost-sensitive classification problem
by calculating a matrix of expected rewards and turning it into costs. The base
algorithm is then fit to this data, using either the Weighted All-Pairs approach,
which requires a binary classifier with sample weights as base algorithm, or the
Regression One-Vs-Rest approach, which requires a regressor as base algorithm.
In the Weighted All-Pairs approach, this technique will fail if there are actions that
were never taken by the exploration policy, as it cannot construct a model for them.
The expected rewards are estimated with the imputer algorithm passed here, which should
output a number in the range \([0,1]\).
This technique is meant for the case of continuous rewards in the \([0,1]\) interval,
but here it is used for the case of discrete rewards \(\{0,1\}\), under which it performs
poorly. It is not recommended to use, but provided for comparison purposes.
Also important: this method requires forming reward estimates of all arms for each observation. In order to
do so, you can either provide estimates as an array (see Parameters), or pass a model.
One method to obtain reward estimates is to fit a model to the data and use its predictions as
reward estimates. You can do so by passing an object of class
contextualbandits.online.SeparateClassifiers which should be already fitted, or by passing a
classifier with a ‘predict_proba’ method, which will be put into a ‘SeparateClassifiers’
object and fit to the same data passed to this function to obtain reward estimates.
The estimates can make invalid predictions if there are some arms for which every time
they were chosen they resulted in a reward, or never resulted in a reward. In such cases,
this function includes the option to impute the “predictions” for them (which would otherwise
always be exactly zero or one regardless of the context) by replacing them with random
numbers \(\sim \text{Beta}(3,1)\) or \(\sim \text{Beta}(1,3)\) for the cases of
always good and always bad.
This is just a wild idea though, and doesn’t guarantee reasonable results in such situations.
Note that, if you are using the ‘SeparateClassifiers’ class from the online module in this
same package, it comes with a method ‘predict_proba_separate’ that can be used to get reward
estimates. It still can suffer from the same problem of always-one and always-zero predictions though.
Parameters:
base_algorithm (obj) – Base algorithm to be used for cost-sensitive classification.
reward_estimator (obj or array (n_samples, n_choices)) –
One of the following:
An array with the first column corresponding to the reward estimates for the action chosen
by the new policy, and the second column corresponding to the reward estimates for the
action chosen in the data (see Note for details).
An already-fit object of class ‘contextualbandits.online.SeparateClassifiers’, which will
be used to make predictions on the actions chosen and the actions that the new
policy would choose.
A classifier with a ‘predict_proba’ method, which will be fit to the same test data
passed here in order to obtain reward estimates (see Note 2 for details).
nchoices (int) – Number of arms/labels to choose from.
Only used when passing a classifier object to ‘reward_estimator’.
method (str, either ‘rovr’ or ‘wap’) – Whether to use Regression One-Vs-Rest or Weighted All-Pairs (see Note 1)
handle_invalid (bool) – Whether to replace 0/1 estimated rewards with randomly-generated numbers (see Note 2)
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a
Generator object for random number generation, a RandomState
object (from NumPy) from which to draw an integer, or a Generator
object (from NumPy), which will be used directly.
This is used when passing handle_invalid=True or beta_prior!=None.
c (None or float) – Constant by which to multiply all scores from the exploration policy.
pmin (None or float) – Scores (from the exploration policy) will be converted to the minimum between
pmin and the original estimate.
beta_prior (tuple((a, b), n), str “auto”, or None) – Beta prior to pass to ‘SeparateClassifiers’. Only used when passing to
‘reward_estimator’ a classifier with ‘predict_proba’. See the documentation
of ‘SeparateClassifiers’ for details about it.
smoothing (tuple(a, b), list, or None) – Smoothing parameter to pass to SeparateClassifiers. Only used when passing to ‘reward_estimator’
a classifier with ‘predict_proba’. See the documentation of SeparateClassifiers for details.
njobs (int or None) – Number of parallel jobs to run. If passing None will set it to 1. If passing -1 will
set it to the number of CPU cores.
kwargs_costsens – Additional keyword arguments to pass to the cost-sensitive classifier.
Fits the Doubly-Robust estimator to partially-labeled data collected from a different policy.
Parameters:
X (array (n_samples, n_features)) – Matrix of covariates for the available data.
a (array (n_samples), int type) – Arms or actions that were chosen for each observations.
r (array (n_samples), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
p (array (n_samples)) – Reward estimates for the actions that were chosen by the policy.
Note that, in theory, this should be an estimate of the probabilities that the
actions in a would have been taken under the policy that chose these actions,
but passing reward estimates in its place might still produce reasonable results.
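Putting the above together, a hedged sketch of fitting this doubly-robust learner to logged data (the DoublyRobustEstimator class in contextualbandits.offpolicy; the Regression One-Vs-Rest variant needs a regressor as base algorithm, hence Ridge here; the data is synthetic):
import numpy as np
from sklearn.linear_model import Ridge, LogisticRegression
from contextualbandits.offpolicy import DoublyRobustEstimator

rng = np.random.default_rng(3)
nchoices = 4
X = rng.normal(size=(2000, 10))
a = rng.integers(0, nchoices, size=2000)   # arms chosen by the logging policy
r = rng.integers(0, 2, size=2000)          # observed binary rewards
p = rng.uniform(0.1, 1.0, size=2000)       # scores/reward estimates from that policy

# A classifier with 'predict_proba' passed as reward_estimator gets wrapped
# into a SeparateClassifiers object internally, as described in the notes above.
dr = DoublyRobustEstimator(base_algorithm=Ridge(),
                           reward_estimator=LogisticRegression(),
                           nchoices=nchoices, method="rovr")
dr.fit(X, a, r, p=p)
chosen = dr.predict(X[:5])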
base_algorithm (obj) – Binary classifier to be used for each classification sub-problem in the tree.
nchoices (int) – Number of arms/labels to choose from.
c (None or float) – Constant by which to multiply all scores from the exploration policy.
pmin (None or float) – Scores (from the exploration policy) will be converted to the minimum between
pmin and the original estimate.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a
Generator object for random number generation, a RandomState
object (from NumPy) from which to draw an integer, or a Generator
object (from NumPy), which will be used directly.
This is used when predictions need to be done for an arm with no data.
njobs (int or None) – Number of parallel jobs to run. If passing None will set it to 1. If passing -1 will
set it to the number of CPU cores. Note that if the base algorithm is itself parallelized,
this might result in a slowdown as both compete for available threads, so don’t set
parallelization in both.
While in theory, making predictions from this algorithm should be faster than from others,
the implementation here uses a Python loop for each observation, which is slow compared to
NumPy array lookups, so the predictions will be slower to calculate than those from other algorithms.
Parameters:
X (array (n_samples, n_features)) – New observations for which to choose an action.
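A corresponding sketch, assuming the class described above is OffsetTree from contextualbandits.offpolicy and that its ‘fit’ takes the same (X, a, r, p) arguments as the estimator above (all data is synthetic):
import numpy as np
from sklearn.linear_model import LogisticRegression
from contextualbandits.offpolicy import OffsetTree

rng = np.random.default_rng(4)
X = rng.normal(size=(2000, 10))
a = rng.integers(0, 4, size=2000)
r = rng.integers(0, 2, size=2000)
p = rng.uniform(0.1, 1.0, size=2000)   # scores from the logging policy for the chosen arms

ot = OffsetTree(base_algorithm=LogisticRegression(), nchoices=4)
ot.fit(X, a, r, p=p)
chosen = ot.predict(X[:5])   # note: prediction loops over rows, so it is comparatively slow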
contextualbandits.evaluation.evaluateRejectionSampling(policy, X, a, r, online=True, partial_fit=False, start_point_online='random', random_state=1, batch_size=10)
Evaluate a policy using rejection sampling on test data.
Note
In order for this method to be unbiased, the actions on the test sample must have been
collected at random and not according to some other policy.
Parameters:
policy (obj) – Policy to be evaluated (already fitted to data). Must have a ‘predict’ method.
If it is an online policy, it must also have a ‘fit’ method.
X (array (n_samples, n_features)) – Matrix of covariates for the available data.
a (array (n_samples), int type) – Arms or actions that were chosen for each observation.
r (array (n_samples), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
online (bool) – Whether this is an online policy to be evaluated by refitting it to the data
as it makes choices on it.
partial_fit (bool) – Whether to use ‘partial_fit’ when fitting the policy to more data.
Ignored if passing online=False.
start_point_online (either str ‘random’ or int in [0, n_samples-1]) – Point at which to start evaluating cases in the sample.
Only used when passing online=True.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a
Generator object for random number generation, a RandomState
object (from NumPy) from which to draw an integer, or a Generator
object (from NumPy), which will be used directly.
This is only used when passing start_point_online='random'.
batch_size (int) – Size of batches of data to take for making predictions and adding
observations to the history. Note that usually most of the samples
are rejected, thus the actual size of the batches to which the models
are refit are usually smaller than this number.
Only used when passing online=True.
Returns:
result – Estimated mean reward and number of observations taken.
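A usage sketch (the policy class, the train/evaluation split, and the uniformly random logged actions are assumptions for demonstration):
import numpy as np
from sklearn.linear_model import LogisticRegression
from contextualbandits.online import EpsilonGreedy
from contextualbandits.evaluation import evaluateRejectionSampling

rng = np.random.default_rng(5)
nchoices = 4
X = rng.normal(size=(4000, 10))
a = rng.integers(0, nchoices, size=4000)   # actions must have been logged at random
r = rng.integers(0, 2, size=4000)

# Fit a policy on one half, evaluate it on the held-out half
policy = EpsilonGreedy(LogisticRegression(), nchoices=nchoices)
policy.fit(X[:2000], a[:2000], r[:2000])

# Expected to return the estimated mean reward and the number of matched rows
mean_reward, n_used = evaluateRejectionSampling(
    policy, X[2000:], a[2000:], r[2000:], online=False)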
Evaluates rewards of arm choices of a policy from data collected by another policy, using a reward estimator along with the historical probabilities
(hence the name).
Note
This method requires forming reward estimates of the arms that were chosen and of the arms
that the policy to be evaluated would choose. In order to do so, you can either provide
estimates as an array (see Parameters), or pass a model.
One method to obtain reward estimates is to fit a model to both the training and test data
and use its predictions as reward estimates. You can do so by passing an object of class
contextualbandits.online.SeparateClassifiers which should be already fitted.
Another method is to fit a model to the test data, in which case you can pass a classifier
with a ‘predict_proba’ method here, which will be fit to the same test data passed to this
function to obtain reward estimates.
The last two options can suffer from invalid predictions if there are some arms for which every time
they were chosen they resulted in a reward, or never resulted in a reward. In such cases,
this function includes the option to impute the “predictions” for them (which would otherwise
always be exactly zero or one regardless of the context) by replacing them with random
numbers \(\sim \text{Beta}(3,1)\) or \(\sim \text{Beta}(1,3)\) for the cases of always good and always bad.
This is just a wild idea though, and doesn’t guarantee reasonable results in such situations.
Note that, if you are using the ‘SeparateClassifiers’ class from the online module in this
same package, it comes with a method ‘predict_proba_separate’ that can be used to get reward
estimates. It still can suffer from the same problem of always-one and always-zero predictions though.
Parameters:
pred (array (n_samples,)) – Arms that would be chosen by the policy to evaluate.
X (array (n_samples, n_features)) – Matrix of covariates for the available data.
a (array (n_samples), int type) – Arms or actions that were chosen for each observation.
r (array (n_samples), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
p (array (n_samples)) – Scores or reward estimates from the policy that generated the data for the actions
that were chosen by it.
reward_estimator (obj or array (n_samples, 2)) –
One of the following:
An array with the first column corresponding to the reward estimates for the action chosen
by the new policy, and the second column corresponding to the reward estimates for the
action chosen in the data (see Note for details).
An already-fit object of class ‘contextualbandits.online.SeparateClassifiers’, which will
be used to make predictions on the actions chosen and the actions that the new
policy would choose.
A classifier with a ‘predict_proba’ method, which will be fit to the same test data
passed here in order to obtain reward estimates (see Note for details).
nchoices (int) – Number of arms/labels to choose from.
Only used when passing a classifier object to ‘reward_estimator’.
handle_invalid (bool) – Whether to replace 0/1 estimated rewards with randomly-generated numbers (see Note)
c (None or float) – Constant by which to multiply all scores from the exploration policy.
pmin (None or float) – Scores (from the exploration policy) will be converted to the minimum between
pmin and the original estimate.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a
Generator object for random number generation, a RandomState
object (from NumPy) from which to draw an integer, or a Generator
object (from NumPy), which will be used directly.
Returns:
est – The estimated mean reward that the new policy would obtain on the ‘X’ data.
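A hedged sketch of this evaluator, with the reward estimator passed as a plain classifier so that it is fit to the same test data (argument names follow the parameter list above; the data and models are placeholders):
import numpy as np
from sklearn.linear_model import LogisticRegression
from contextualbandits.online import SeparateClassifiers
from contextualbandits.evaluation import evaluateDoublyRobust

rng = np.random.default_rng(6)
nchoices = 4
X = rng.normal(size=(3000, 10))
a = rng.integers(0, nchoices, size=3000)
r = rng.integers(0, 2, size=3000)
p = rng.uniform(0.1, 1.0, size=3000)   # logging policy's scores for the chosen arms

# Arms that the policy to be evaluated would pick on this test data
new_policy = SeparateClassifiers(LogisticRegression(), nchoices)
new_policy.fit(X, a, r)
pred = new_policy.predict(X)

est = evaluateDoublyRobust(pred, X, a, r, p,
                           reward_estimator=LogisticRegression(),
                           nchoices=nchoices)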
policy (obj) – Policy to be evaluated (already fitted to data). Must have a ‘predict’ method.
If it is an online policy, it must also have a ‘fit’ method.
X (array (n_samples, n_features)) – Covariates for each observation.
y_onehot (array (n_samples, n_arms)) – Labels (zero or one) for each class for each observation.
online (bool) – Whether the algorithm should be fit to batches of data with a ‘partial_fit’ method,
or to all historical data each time.
shuffle (bool) – Whether to shuffle the data (X and y_onehot) before passing through it.
Be aware that data is shuffled in-place.
update_freq (int) – Batch size - how many observations to predict before refitting the model.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a
Generator object for random number generation, a RandomState
object (from NumPy) from which to draw an integer, or a Generator
object (from NumPy), which will be used directly.
This is used when shuffling and when selecting actions at random for
the first batch.
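These parameters appear to describe the fully-labeled evaluation routine; assuming it is contextualbandits.evaluation.evaluateFullyLabeled, a minimal sketch (data and policy are placeholders):
import numpy as np
from sklearn.linear_model import LogisticRegression
from contextualbandits.online import EpsilonGreedy
from contextualbandits.evaluation import evaluateFullyLabeled

rng = np.random.default_rng(7)
nchoices = 4
X = rng.normal(size=(2000, 10))
# Fully-labeled data: the reward each arm would have produced for each row
y_onehot = (rng.uniform(size=(2000, nchoices)) < 0.3).astype(np.float64)

policy = EpsilonGreedy(LogisticRegression(), nchoices=nchoices)
# Expected to return the observed rewards aggregated per batch of 'update_freq' rows
result = evaluateFullyLabeled(policy, X, y_onehot, online=False, update_freq=100)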
Evaluates rewards of arm choices of a policy from data collected by another policy,
making corrections according to the difference between the estimations of the
new and old policy over the actions that were chosen.
Note
This implementation is theoretically incorrect as this whole library
doesn’t follow the paradigm of producing probabilities of choosing actions
(it is theoretically possible for many of the methods in the online
section, but computationally inefficient and not supported by the library).
Instead, it uses estimated expected rewards (that is, the rows of the estimations
don’t sum to 1), which is not what this method expects, but nevertheless, the
ratio of these estimations between the old and new policy should be highly related
to the ratio of the probabilities of choosing those actions, and as such, this
function is likely to still produce an improvement over a naive average of the
expected rewards across actions that were chosen by a different policy.
Note
Unlike the other functions in this module, this function doesn’t take the indices
of the chosen actions, but rather takes the predictions directly (see the
‘Parameters’ section for details).
Parameters:
est (array (n_samples,)) – Scores or reward estimates from the policy being evaluated on the actions
that were chosen by the old policy for each row of ‘X’.
r (array (n_samples), {0,1}) – Rewards that were observed for the chosen actions.
p (array (n_samples)) – Scores or reward estimates from the policy that generated the data for the actions
that were chosen by it. Must be in the same scale as ‘est’.
cmin (float) – Minimum value for the ratio between estimations to assign to observations.
If any ratio is below this number, it will be assigned this value (i.e.
will be clipped).
cmax (float) – Maximum value of the ratio between estimations that will be taken.
Observations with ratios higher than this will be discarded rather
than clipped.
Returns:
est – The estimated mean reward that the new policy would obtain on the ‘X’ data.
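Assuming this describes contextualbandits.evaluation.evaluateNCIS (normalized capped importance sampling), a minimal sketch; the arrays below are placeholders and, per the parameter descriptions above, ‘est’ and ‘p’ should be on the same scale:
import numpy as np
from contextualbandits.evaluation import evaluateNCIS

rng = np.random.default_rng(8)
n = 3000
r = rng.integers(0, 2, size=n).astype(np.float64)  # observed rewards
p = rng.uniform(0.1, 1.0, size=n)    # old policy's estimates for the chosen actions
est = rng.uniform(0.1, 1.0, size=n)  # new policy's estimates for those same actions

value = evaluateNCIS(est, r, p, cmin=1e-8, cmax=1000)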
The package offers non-stochastic linear regression procedures with exact “partial_fit” solutions, which are recommended for use alongside the online policies for better incremental updates.
Typical Linear Regression model, which keeps track of the aggregated data
needed to obtain the closed-form solution in a way that calling ‘partial_fit’
multiple times would be equivalent to a single call to ‘fit’ with all the data.
This is an exact method rather than a stochastic optimization procedure.
Also provides functionality for making predictions according to upper confidence
bound (UCB) and to Thompson sampling criteria.
Note
Doing linear regression this way requires both memory and computation time
which scale quadratically with the number of columns/features/variables. As
such, the class will by default use C ‘float’ types (typically np.float32)
instead of C ‘double’ (np.float64), in order to save memory.
Parameters:
lambda_ (float) – Strength of the L2 regularization.
fit_intercept (bool) – Whether to add an intercept term to the formula. If passing ‘True’, it will
be the last entry in the coefficients.
method (str, one of ‘chol’ or ‘sm’) – Method used to fit the model. Options are:
'chol':
Uses the Cholesky decomposition to solve the linear system from the
least-squares closed-form each time ‘fit’ or ‘partial_fit’ is called.
This is likely to be faster when fitting the model to a large number
of observations at once, and is able to better exploit multi-threading.
'sm':
Starts with an inverse diagonal matrix and updates it as each
new observation comes using the Sherman-Morrison formula, thus
never explicitly solving the linear system, nor needing to calculate
a matrix inverse. This is likely to be faster when fitting the model
to small batches of observations. Be aware that with this method, it
will add regularization to the intercept if passing ‘fit_intercept=True’.
Note that it is possible to change the method after the object has
already been fit (e.g. if you want a non-regularized intercept
with fast online updates, you might use Cholesky first and then switch
to Sherman-Morrison).
calc_inv (bool) – When using method='chol', whether to also produce a matrix inverse, which
is required for using the LinUCB prediction mode. Ignored when
passing method='sm' (the default). Note that it is possible to change
the method after the object has already been fit.
precompute_ts (bool) – Whether to pre-compute the necessary matrices to accelerate the Thompson
sampling prediction mode (method predict_thompson). If you plan to use
predict_thompson, it’s recommended to pass “True”.
Note that this will make the Sherman-Morrison updates (method="sm")
much slower as it will calculate eigenvalues after every update.
Can be changed after the object is already initialized or fitted.
precompute_ts_multiplier (float) – Multiplier for the covariance matrix to use when using precompute_ts.
Calling predict_thompson with this same multiplier will be faster than
with a different one. Calling it with a different multiplier with
precompute_ts will still be faster than without it, unless using
also n_presampled.
Ignored when passing precompute_ts=False.
n_presampled (None or int) – When passing precompute_ts, this denotes a number of coefficients to pre-sample
after calling ‘fit’ and/or ‘partial_fit’, which will be used later
when calling predict_thompson with the same multiplier as in precompute_ts_multiplier.
Pre-sampling a large number of coefficients can help to speed up Thompson-sampled predictions
at the expense of longer fitting times, and is recommended if there is a large number of
predictions between calls to ‘fit’ or ‘partial_fit’.
If passing ‘None’ (the default), will not pre-sample a finite number of the coefficients
at fitting time, but will rather sample (different) coefficients in calls to
predict_thompson.
The pre-sampled coefficients will not be used if calling predict_thompson with
a different multiplier than what was passed to precompute_ts_multiplier.
rng_presample (None, int, RandomState, or Generator) – Random number generator to use for pre-sampling coefficients.
If passing an integer, will use it as a random seed for initialization. If passing
a RandomState, will use it to draw an integer to use as seed. If passing a
Generator, will use it directly. If passing ‘None’, will initialize a Generator
without random seed.
Ignored if passing precompute_ts=False or n_presampled=None (the defaults).
use_float (bool) – Whether to use C ‘float’ type for the required matrices. If passing ‘False’,
will use C ‘double’. Be aware that memory usage for this model can grow
very large. Can be changed after initialization.
Variables:
coef (array(n) or array(n+1)) – The obtained coefficients. If passing ‘fit_intercept=True’, the intercept
will be at the last entry.
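A minimal fit/partial_fit sketch for this class (assuming it lives in contextualbandits.linreg; the data and parameter values are synthetic placeholders):
import numpy as np
from contextualbandits.linreg import LinearRegression

rng = np.random.default_rng(9)
X = rng.normal(size=(500, 20))
y = X @ rng.normal(size=20) + rng.normal(scale=0.1, size=500)

# Cholesky-based fitting, keeping the matrix inverse so predict_ucb works later
lr = LinearRegression(lambda_=1., fit_intercept=True, method="chol", calc_inv=True)
lr.fit(X[:400], y[:400])
lr.partial_fit(X[400:], y[400:])   # equivalent to one 'fit' on all 500 rows
y_hat = lr.predict(X[:5])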
Make a prediction on new data with coefficients sampled from their
estimated distribution.
Note
If using this method, it’s recommended to center the ‘X’ data passed
to ‘fit’ and ‘partial_fit’. If not centered, it is advisable to
lower the v_sq value.
Parameters:
X (array(m,n) or CSR matrix(m, n)) – The covariates.
v_sq (float > 0) – The multiplier for the covariance matrix. Larger values lead to
more variable results.
sample_unique (bool) – Whether to sample different coefficients each time a prediction is to
be made. If passing ‘False’, when calling ‘predict’, it will sample
the same coefficients for all the observations in the same call to
‘predict’, whereas if passing ‘True’, will use a different set of
coefficients for each observation. Passing ‘False’ leads to an
approach which is theoretically wrong, but as sampling coefficients
can be very slow, using ‘False’ can provide a reasonable speed up
without much of a performance penalty.
random_state (None, np.random.Generator, or np.random.RandomState) – A NumPy ‘Generator’ or ‘RandomState’ object instance to use for generating
random numbers. If passing ‘None’, will use NumPy’s random
module directly (which can be made reproducible through
np.random.seed).
Returns:
y_hat – The predicted guess on ‘y’ given ‘X’ and v_sq.
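For instance, a sketch under the same assumptions as above, with the data centered as the note recommends:
import numpy as np
from contextualbandits.linreg import LinearRegression

rng = np.random.default_rng(10)
X = rng.normal(size=(300, 10))
X -= X.mean(axis=0)                 # centering as suggested in the note above
y = X @ rng.normal(size=10) + rng.normal(scale=0.1, size=300)

lr = LinearRegression(lambda_=1., method="chol", calc_inv=True)
lr.fit(X, y)
y_ts = lr.predict_thompson(X[:5], v_sq=1.0, sample_unique=True,
                           random_state=np.random.default_rng(0))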
Make a prediction on new data with an upper bound given by the LinUCB
formula (be aware that it’s not probabilistic like a regular CI).
Note
If using this method, it’s recommended to center the ‘X’ data passed
to ‘fit’ and ‘partial_fit’. If not centered, it is advisable to
lower the alpha value.
Parameters:
X (array(m,n) or CSR matrix(m, n)) – The covariates.
alpha (float > 0 or array(m, ) > 0) – The multiplier for the width of the bound. Can also pass an array
with different values for each row.
add_unfit_noise (bool) – When making predictions with an unfit model (in this case they are
given by empty zero matrices except for the inverse diagonal matrix
based on the regularization parameter), whether to add a very small
amount of random noise \(\sim Uniform(0, 10^{-12})\) to it. This is useful in
order to break ties at random when using multiple models.
random_state (None, np.random.Generator, or np.random.RandomState) – A NumPy ‘Generator’ or ‘RandomState’ object instance to use for generating
random numbers. If passing ‘None’, will use NumPy’s random
module directly (which can be made reproducible through
np.random.seed). Only used when passing add_unfit_noise=True
and calling this method on a model that has not been fit to data.
Returns:
y_hat – The predicted upper bound on ‘y’ given ‘X’ and alpha.
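And similarly for the upper-confidence-bound mode (a sketch; calc_inv=True is passed so the required matrix inverse is available):
import numpy as np
from contextualbandits.linreg import LinearRegression

rng = np.random.default_rng(11)
X = rng.normal(size=(300, 10))
X -= X.mean(axis=0)                 # centering as suggested in the note above
y = X @ rng.normal(size=10) + rng.normal(scale=0.1, size=300)

lr = LinearRegression(lambda_=1., method="chol", calc_inv=True)
lr.fit(X, y)
upper = lr.predict_ucb(X[:5], alpha=1.0)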
Note that this method is only relevant if
enable_metadata_routing=True (see sklearn.set_config()).
Please see User Guide on how the routing
mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the
existing request. This allows you to change the request for some
parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a
sub-estimator of a meta-estimator, e.g. used inside a
Pipeline. Otherwise it has no effect.
Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.
The method works on simple estimators as well as on nested objects
(such as Pipeline). The latter have
parameters of the form <component>__<parameter> so that it’s
possible to update each component of a nested object.
Request metadata passed to the partial_fit method.
Note that this method is only relevant if
enable_metadata_routing=True (see sklearn.set_config()).
Please see User Guide on how the routing
mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to partial_fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to partial_fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the
existing request. This allows you to change the request for some
parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a
sub-estimator of a meta-estimator, e.g. used inside a
Pipeline. Otherwise it has no effect.
Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in partial_fit.
ElasticNet regression (with penalization on the l1 and l2 norms of
the coefficients), which keeps track of the aggregated data
needed to obtain the optimal coefficients in such a way that calling ‘partial_fit’
multiple times would be equivalent to a single call to ‘fit’ with all the data.
This is an exact method rather than a stochastic optimization procedure.
Note
This ElasticNet regression is fit through a reduction to non-negative
least squares with twice the number of variables, which is in turn solved
through a coordinate descent procedure. This is typically slower than the
lasso paths used in GLMNet and SciKit-Learn, and scales much worse with
the number of features/columns, but allows for faster incremental updates
through ‘partial_fit’, which will give the same result as calls to fit.
Note
This model will not standardize the input data in any way.
Note
By default, will set the l1 and l2 regularization in the same way as
GLMNet and SciKit-Learn - that is, the regularizations increase along
with the number of rows in the data, which means they will be different
after each call to ‘fit’ or ‘partial_fit’. It is nevertheless possible
to specify the l1 and l2 regularization directly, and both will remain
constant that way, but be careful about the choice for such hyperparameters.
Note
Doing regression this way requires both memory and computation time
which scale quadratically with the number of columns/features/variables. As
such, the class will by default use C ‘float’ types (typically np.float32)
instead of C ‘double’ (np.float64), in order to save memory.
Parameters:
alpha (float) – Strength of the regularization.
l1_ratio (float [0,1]) – Proportion of the regularization that will be applied to the l1 norm of
the coefficients (remainder will be applied to the l2 norm). Must be a
number between zero and one. If passing l1_ratio=0, it’s recommended
instead to use the LinearRegression class which uses more efficient
procedures.
Using higher l1 regularization is more likely to result in some of the
obtained coefficients being exactly zero, which is oftentimes desirable.
fit_intercept (bool) – Whether to add an intercept term to the formula. If passing ‘True’, it will
be the last entry in the coefficients.
l1 (None or float) – Strength of the l1 regularization. If passing it, will bypass the values
set through alpha and l1_ratio, and will remain constant inbetween
calls to fit and partial_fit. If passing this, should also pass
l2 or otherwise will assume that it is zero.
l2 (None or float) – Strength of the l2 regularization. If passing it, will bypass the values
set through alpha and l1_ratio, and will remain constant inbetween
calls to fit and partial_fit. If passing this, should also pass
l1 or otherwise will assume that it is zero.
use_float (bool) – Whether to use C ‘float’ type for the required matrices. If passing ‘False’,
will use C ‘double’. Be aware that memory usage for this model can grow
very large. Can be changed after initialization.
Variables:
coef (array(n) or array(n+1)) – The obtained coefficients. If passing ‘fit_intercept=True’, the intercept
will be at the last entry.
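A minimal sketch for this class (assuming it is contextualbandits.linreg.ElasticNet; the fixed l1/l2 values below are placeholders that keep the penalties constant across incremental updates, as described in the note above):
import numpy as np
from contextualbandits.linreg import ElasticNet

rng = np.random.default_rng(12)
X = rng.normal(size=(400, 15))
y = X @ rng.normal(size=15) + rng.normal(scale=0.1, size=400)

enet = ElasticNet(fit_intercept=True, l1=0.1, l2=1.0)
enet.fit(X[:300], y[:300])
enet.partial_fit(X[300:], y[300:])   # same result as a single 'fit' on all rows
y_hat = enet.predict(X[:5])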
Note that this method is only relevant if
enable_metadata_routing=True (see sklearn.set_config()).
Please see User Guide on how the routing
mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the
existing request. This allows you to change the request for some
parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a
sub-estimator of a meta-estimator, e.g. used inside a
Pipeline. Otherwise it has no effect.
Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in fit.
The method works on simple estimators as well as on nested objects
(such as Pipeline). The latter have
parameters of the form <component>__<parameter> so that it’s
possible to update each component of a nested object.
Request metadata passed to the partial_fit method.
Note that this method is only relevant if
enable_metadata_routing=True (see sklearn.set_config()).
Please see User Guide on how the routing
mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to partial_fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to partial_fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the
existing request. This allows you to change the request for some
parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a
sub-estimator of a meta-estimator, e.g. used inside a
Pipeline. Otherwise it has no effect.
Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in partial_fit.