Contextual Bandits
This is the documentation page for the Python package contextualbandits. For more details, see the project’s GitHub page:
Installation
The package is available on PyPI and can be installed with
pip install contextualbandits
If installation fails because the C code cannot be compiled, an earlier pure-Python version can be installed with
pip install contextualbandits==0.1.8.5
Getting started
You can find user guides with detailed examples in the following links:
Serializing (pickling) objects
Don’t use pickle to serialize objects from this package, as it’s likely to fail. Use dill instead, which has the same syntax and can serialize more types of objects.
Online Contextual Bandits
Hint: if in doubt about where to start or which method to choose, the safest bet is BootstrappedUCB.
Policy classes (the first one in each group is the recommended one to use):
Randomized:
Active choices:
AdaptiveGreedy (with active_choice != None)
ExploreFirst (with prob_active_choice > 0)
Thompson sampling:
Upper confidence bound:
Naive:
ActiveExplorer
- class contextualbandits.online.ActiveExplorer(base_algorithm, nchoices, f_grad_norm='auto', case_one_class='auto', active_choice='weighted', explore_prob=0.15, decay=0.9997, beta_prior='auto', smoothing=None, noise_to_smooth=True, batch_train=False, refit_buffer=None, deep_copy_buffer=True, assume_unique_reward=False, random_state=None, njobs=-1)
Active Explorer
Selects a proportion of actions according to an active learning heuristic based on gradients. Works only for differentiable and preferably smooth functions.
Note
Here, the observations whose action is chosen by the active learning heuristic are selected at random, just like in Epsilon-Greedy. For those, the guiding heuristic is the gradient that the observation would produce, if it had either label, on each model that predicts a class, given that model’s current coefficients (the two possible gradients are either weighted by the estimated probability, or reduced by taking the maximum or minimum). This of course requires being able to calculate gradients: the package comes with pre-defined gradient functions for linear and logistic regression, and allows passing custom functions for others.
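To make the heuristic concrete, the sketch below assumes a plain logistic regression per arm with coefficient vector w and no intercept. For log-loss, the gradient with respect to w is (p - y) * x, so its norm is |p - y| * ||x||; the two candidate norms are then combined as ‘min’, ‘max’, or ‘weighted’. The function names and the rule "explore the arm with the largest combined norm" are illustrative assumptions, not the package’s internal code.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def grad_norms_logistic(w, x):
    """Return (norm if label were 0, norm if label were 1, p) for one row x.

    For log-loss the gradient w.r.t. w is (p - y) * x, so its norm is
    |p - y| * ||x||, where p = sigmoid(w . x).
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    x_norm = math.sqrt(sum(xi * xi for xi in x))
    return p * x_norm, (1.0 - p) * x_norm, p


def active_score(w, x, active_choice="weighted"):
    """Combine the two candidate gradient norms into one score for an arm."""
    neg, pos, p = grad_norms_logistic(w, x)
    if active_choice == "min":
        return min(neg, pos)
    if active_choice == "max":
        return max(neg, pos)
    # 'weighted': weight each label's norm by its estimated probability
    return (1.0 - p) * neg + p * pos


def choose_arm(coefs_per_arm, x, active_choice="weighted"):
    """Assumed rule: explore the arm whose model would get the largest gradient."""
    scores = [active_score(w, x, active_choice) for w in coefs_per_arm]
    return max(range(len(scores)), key=scores.__getitem__)
```

For example, with x = [1.0, 1.0], an arm with all-zero coefficients (p = 0.5) gets a much larger ‘weighted’ score than an arm with w = [5.0, 5.0], whose prediction is already confident, so the former is the one chosen for exploration.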
- Parameters
base_algorithm (obj) – Base binary classifier for which each sample for each class will be fit. Will look for, in this order:
A ‘predict_proba’ method with outputs (n_samples, 2), values in [0,1], rows summing to 1
A ‘decision_function’ method with unbounded outputs (n_samples,) to which it will apply a sigmoid function.
A ‘predict’ method with outputs (n_samples,) with values in [0,1].
Can also pass a list with a different (or already-fit) classifier for each arm.
nchoices (int or list-like) – Number of arms/labels to choose from. Can also pass a list, array, or Series with arm names, in which case the outputs from predict will follow these names and arms can be dropped by name, and new ones added with a custom name.
f_grad_norm (str ‘auto’, list, or function(base_algorithm, X, pred) -> array (n_samples, 2)) – Function that calculates the row-wise norm of the gradient from observations in X if their class were negative (first column) or positive (second column). Can also use different functions for each arm, in which case it accepts them as a list of functions with length equal to nchoices. The option ‘auto’ will only work with scikit-learn’s ‘LogisticRegression’, ‘SGDClassifier’ (log-loss only), and ‘RidgeClassifier’; with stochQN’s ‘StochasticLogisticRegression’; and with this package’s ‘LinearRegression’.
case_one_class (str ‘auto’, ‘zero’, None, list, or function(X, n_pos, n_neg, rng) -> array(n_samples, 2)) – If some arm/choice/class has only rewards of one type, many models will fail to fit, and consequently the gradients will be undefined. Likewise, if the model has not been fit, the gradient might also be undefined, and this requires a workaround.
If passing ‘None’, will assume that base_algorithm can be fit to data of only-positive or only-negative class without problems, and that it can calculate gradients and predictions with a base_algorithm object that has not been fitted. Be aware that the methods ‘predict’, ‘predict_proba’, and ‘decision_function’ in base_algorithm might be overwritten with another method that wraps them in a try-catch block, so don’t rely on them producing errors when unfitted.
If passing a function, will take its output as the row-wise gradient norms when comparing them against other arms/classes, with the first column having the values if the observations were of negative class, and the second column if they were of positive class. The other inputs to this function are the number of positive and negative examples that have been observed, and a Generator object from NumPy to use for generating random numbers.
If passing a list, will assume each entry is a function as described above, to be used with each corresponding arm.
If passing ‘auto’, will generate random numbers:
negative: ~ Gamma(log10(n_features) / ((n_pos+1)/(n_pos+n_neg+2)), log10(n_features)).
positive: ~ Gamma(log10(n_features) * (n_pos+1)/(n_pos+n_neg+2), log10(n_features)).
If passing ‘zero’, it will output zero whenever models have not been fitted.
Note that the theoretically correct approach for a logistic regression would be to assume models with all-zero coefficients, in which case the gradient is defined in the absence of any data, but this tends to produce bad end results.
active_choice (str in {‘min’, ‘max’, ‘weighted’}) – How to calculate the gradient that an observation would have on the loss function for each classifier, given that it could be either class (positive or negative) for the classifier that predicts each arm. If weighted, they are weighted by the same probability estimates from the base algorithm.
explore_prob (float (0,1)) – Probability of selecting an action according to active learning criteria.
decay (float (0,1)) – After each prediction, the probability of selecting an arm according to active learning criteria is set to p = p*decay
beta_prior (str ‘auto’, None, tuple ((a,b), n), or list[tuple((a,b), n)]) – If not ‘None’, when there are fewer than ‘n’ samples with and without a reward from a given arm, it will predict the score for that class as a random number drawn from a beta distribution with the prior specified by ‘a’ and ‘b’. If set to “auto”, will be calculated as:
beta_prior = ((2/log2(nchoices), 4), 2)
Can also pass different priors per arm, in which case they should be passed as a list of tuples. This parameter can have a very large impact on the end results, and it’s recommended to tune it accordingly: scenarios with low expected reward rates should have priors that result in drawing small random numbers, whereas scenarios with large expected reward rates should have stronger priors and tend towards larger random numbers. Also, the more arms there are, the smaller the optimal expected value for these random numbers. Recommended to use only one of beta_prior or smoothing.
smoothing (None, tuple (a,b), or list) – If not None, predictions will be smoothed as yhat_smooth = (yhat*n + a)/(n + b), where ‘n’ is the number of times each arm was chosen in the training data. Can also pass it as a list of tuples with different ‘a’ and ‘b’ parameters for each arm (e.g. if there are arm features, these parameters can be determined through a different model). Recommended to use only one of beta_prior or smoothing.
noise_to_smooth (bool) – If passing smoothing, whether to add a small amount of random noise \(\sim Uniform(0, 10^{-12})\) in order to break ties at random instead of choosing the smallest arm index. Ignored when passing smoothing=None.
batch_train (bool) – Whether the base algorithm will be fit to the data in batches as it comes (streaming), or to the whole dataset each time it is refit. Requires a classifier with a ‘partial_fit’ method.
refit_buffer (int or None) – Number of observations per arm to keep as a reserve for passing to ‘partial_fit’. If passing it, up until the moment there are at least this number of observations for a given arm, that arm will keep the observations when calling ‘fit’ and ‘partial_fit’, and will translate calls to ‘partial_fit’ to calls to ‘fit’ with the new plus stored observations. After the reserve number is reached, calls to ‘partial_fit’ will enlarge the data batch with the stored observations, and old stored observations will be gradually replaced with the new ones (at random, not on a FIFO basis). This technique can greatly enhance performance when fitting the data in batches, but memory consumption can grow quite large. If passing sparse CSR matrices as input to ‘fit’ and ‘partial_fit’, these will be converted to dense once they go into this reserve, and then converted back to CSR to augment the new data. Calls to ‘fit’ will override this reserve. Ignored when passing ‘batch_train=False’.
deep_copy_buffer (bool) – Whether to make deep copies of the data that is stored in the reserve for refit_buffer. If passing ‘False’, when the reserve is not yet full, these will only store shallow copies of the data, which is faster but will not let Python’s garbage collector free memory after deleting the data, and if the original data is overwritten, so will this buffer. Ignored when not using refit_buffer.
assume_unique_reward (bool) – Whether to assume that only one arm has a reward per observation. If set to ‘True’, whenever an arm receives a reward, the classifiers for all other arms will be fit to that observation too, having negative label.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a Generator object for random number generation, a RandomState object (from NumPy) from which to draw an integer, or a Generator object (from NumPy), which will be used directly. While this controls random number generation for this metaheuristic, there can still be other sources of variation upon re-runs, such as data aggregations in parallel (e.g. from OpenMP or BLAS functions).
njobs (int or None) – Number of parallel jobs to run. If passing None, will set it to 1. If passing -1, will set it to the number of CPU cores. Note that if the base algorithm is itself parallelized, this might result in a slowdown as both compete for available threads, so don’t set parallelization in both. The parallelization uses shared memory, thus you will only see a speed-up if your base classifier releases the Python GIL; otherwise it will result in slower runs.
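The ‘auto’ option of case_one_class can be sketched as below. This is a minimal illustration under assumptions: the two Gamma arguments are read as shape and scale (as in NumPy’s Generator.gamma), and the negative-class shape is read as log10(n_features) divided by the smoothed positive proportion (n_pos+1)/(n_pos+n_neg+2). The function name is hypothetical; it uses Python’s random.gammavariate(alpha, beta), whose second argument is also a scale.

```python
import math
import random


def auto_case_one_class(n_rows, n_features, n_pos, n_neg, rng):
    """Hypothetical re-creation of the 'auto' workaround for one arm.

    Returns n_rows pairs [negative, positive] of random "gradient norms".
    Assumes a Gamma(shape, scale) parameterization.
    """
    if n_features <= 1:
        raise ValueError("log10(n_features) must be positive")
    magic = math.log10(n_features)  # used as scale and base shape
    prop_pos = (n_pos + 1) / (n_pos + n_neg + 2)  # smoothed positive proportion
    shape_neg = magic / prop_pos
    shape_pos = magic * prop_pos
    return [
        [rng.gammavariate(shape_neg, magic), rng.gammavariate(shape_pos, magic)]
        for _ in range(n_rows)
    ]
```

With no observations at all (n_pos = n_neg = 0) and 100 features, the negative column has mean 4 * 2 = 8 and the positive column mean 1 * 2 = 2; as positives accumulate, the two distributions move towards each other.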
References
- [1] Cortes, David. “Adapting multi-armed bandits policies to contextual bandits scenarios.” arXiv preprint arXiv:1811.04383 (2018).
- add_arm(arm_name=None, fitted_classifier=None, n_w_rew=0, n_wo_rew=0, smoothing=None, beta_prior=None, refit_buffer_X=None, refit_buffer_r=None, f_grad_norm=None, case_one_class=None)
Adds a new arm to the pool of choices
- Parameters
arm_name (object) – Name for this arm. Only applicable when using named arms. If None, will use the name of the last arm plus 1 (will only work when the names are integers).
fitted_classifier (object) – If a classifier has already been fit to rewards coming from this arm, you can pass it here; otherwise, it will be started from the same ‘base_classifier’ as the initial arms. If using bootstrapped methods or methods from this module which do not accept arbitrary classifiers as input, don’t pass a classifier here (unless using classes such as utils._BootstrappedClassifierBase). If the constructor was called with a different base_algorithm per arm, a base classifier must be passed here. Not applicable for the classes that do not take a base_algorithm.
n_w_rew (int) – Number of trials/rounds with rewards coming from this arm (only used when using a beta prior or smoothing).
n_wo_rew (int) – Number of trials/rounds without rewards coming from this arm (only used when using a beta prior or smoothing).
smoothing (None, tuple (a,b), or list) – Smoothing parameters to use for this arm (see documentation of the class constructor for details). If None and if the smoothing passed to the constructor didn’t have separate entries per arm, will use the same smoothing as was passed in the constructor. If no smoothing was passed to the constructor, the smoothing here will be ignored. Must pass a smoothing here if the constructor was passed a smoothing with different entries per arm.
beta_prior (None or tuple((a,b), n)) – Beta prior to use for this arm. See the class’ documentation for details. Must be passed if the constructor was provided different beta priors per arm. If None and the constructor had a single beta_prior, will use that same beta_prior for this new arm. Note that n_w_rew and n_wo_rew will be counted towards the threshold ‘n’ in here. Cannot be passed if the constructor did not have a beta_prior.
refit_buffer_X (array(m, n) or None) – Refit buffer of ‘X’ data to use for the new arm. Ignored when using ‘batch_train=False’ or ‘refit_buffer=None’.
refit_buffer_r (array(m,) or None) – Refit buffer of rewards data to use for the new arm. Ignored when using ‘batch_train=False’ or ‘refit_buffer=None’.
f_grad_norm (function) – Gradient calculation function to use for this arm. This is only for the policies that make choices according to active learning criteria, and only for situations in which the policy was passed different functions for each arm.
case_one_class (function) – Gradient workaround function for single-class data. This is only for the policies that make choices according to active learning criteria, and only for situations in which the policy was passed different functions for each arm.
- Returns
self – This object
- Return type
object
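The sketch below illustrates how beta_prior and smoothing change one arm’s score, and how counts such as the n_w_rew and n_wo_rew passed to add_arm feed into them. It is an assumption-laden illustration, not the package’s code: the helper names are hypothetical, and “fewer than ‘n’ samples with and without a reward” is read as “either count is below n”.

```python
import math
import random


def auto_beta_prior(nchoices):
    # ActiveExplorer's documented 'auto' value: ((2/log2(nchoices), 4), 2); needs nchoices >= 2
    return ((2.0 / math.log2(nchoices), 4.0), 2)


def arm_score(yhat, n_w_rew, n_wo_rew, rng, beta_prior=None, smoothing=None):
    """Hypothetical post-processing of one arm's predicted reward probability."""
    if beta_prior is not None:
        (a, b), n = beta_prior
        # Assumed reading: draw from the prior while either count is below n
        if n_w_rew < n or n_wo_rew < n:
            return rng.betavariate(a, b)
    if smoothing is not None:
        a, b = smoothing
        n = n_w_rew + n_wo_rew  # number of times this arm was chosen
        return (yhat * n + a) / (n + b)
    return yhat
```

With smoothing=(1, 10), an arm that was never chosen scores 1/10 regardless of its model, while after 90 rounds a prediction of 0.9 becomes (81 + 1)/100 = 0.82.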
- decision_function(X)
Get the scores for each arm following this policy’s action-choosing criteria.
- Parameters
X (array (n_samples, n_features)) – Data for which to obtain decision function scores for each arm.
- Returns
scores – Scores following this policy for each arm.
- Return type
array (n_samples, n_choices)
- drop_arm(arm_name)
Drop an arm/choice
Drops (removes/deletes) an arm from the set of available choices to the policy.
Note
The available arms, if named, are stored in attribute ‘choice_names’.
- Parameters
arm_name (int or object) – Arm to drop. If passing an integer, will drop at that index (starting at zero). Otherwise, will drop the arm matching this name (argument must be of the same type as the individual entries passed to ‘nchoices’ in the initialization).
- Returns
self – This object
- Return type
object
- fit(X, a, r, warm_start=False, continue_from_last=False)
Fits the base algorithm (one per class [and per sample if bootstrapped]) to partially labeled data.
- Parameters
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
warm_start (bool) – Whether to use the results of previous calls to ‘fit’ as a start for fitting to the ‘X’ data passed here. This will only be available if the base classifier has a property warm_start too and that property is also set to ‘True’. You can double-check that it’s recognized as such by checking this object’s property has_warm_start. Passing ‘True’ when the classifier doesn’t support warm start despite having the property might slow things down. Dropping arms will make this functionality unavailable. This option is not available for ‘BootstrappedUCB’, nor for ‘BootstrappedTS’.
continue_from_last (bool) – If the policy was previously fit to data, whether to assume that this new call to ‘fit’ will continue from the exact same dataset as before plus new rows appended at the end of ‘X’, ‘a’, ‘r’. In this case, will only refit the models that have new data according to ‘a’. Note that the bootstrapped policies will still benefit from extra refits. This option should not be used when there are calls to ‘partial_fit’ between calls to fit. Ignored if using assume_unique_reward=True.
- Returns
self – This object
- Return type
obj
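Since fit trains one binary classifier per arm, each arm only sees the rows where it was the chosen action. The hypothetical helper below sketches that split, plus the extra negatives that assume_unique_reward=True adds (a rewarded observation becomes a negative example for every other arm). It mirrors the behavior described in the parameter list; it is not the package’s implementation.

```python
def split_by_arm(X, a, r, nchoices, assume_unique_reward=False):
    """Return {arm: (rows, labels)} for one-binary-classifier-per-arm fitting."""
    data = {k: ([], []) for k in range(nchoices)}
    for x, arm, reward in zip(X, a, r):
        rows, labels = data[arm]
        rows.append(x)
        labels.append(reward)
        if assume_unique_reward and reward == 1:
            # Only one arm can be rewarded: every other arm gets a negative label
            for other in range(nchoices):
                if other != arm:
                    data[other][0].append(x)
                    data[other][1].append(0)
    return data
```

For example, with a = [0, 1, 0] and r = [1, 0, 0], arm 1 is trained on a single row by default, but on two rows (both negative) when assume_unique_reward=True.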
- partial_fit(X, a, r)
Fits the base algorithm (one per class) to partially labeled data in batches.
Note
In order to use this method, the base classifier must have a ‘partial_fit’ method, such as ‘sklearn.linear_model.SGDClassifier’. This method is not available for ‘LogisticUCB’, ‘LogisticTS’, ‘PartitionedUCB’, ‘PartitionedTS’.
- Parameters
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
- Returns
self – This object
- Return type
obj
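When batch_train=True and refit_buffer is set, each arm’s partial_fit batches are augmented with a reserve of past observations. The hypothetical class below models one arm’s reserve under these assumptions: while the reserve holds fewer than refit_buffer rows, the call is treated as a full refit on new plus stored rows; afterwards, new rows overwrite randomly chosen stored rows (not the oldest ones). Deep copies and the CSR conversion are ignored.

```python
import random


class RefitBuffer:
    """Hypothetical per-arm reserve mimicking the documented refit_buffer."""

    def __init__(self, size, rng):
        self.size = size
        self.rng = rng
        self.X, self.y = [], []

    def augment(self, new_X, new_y):
        """Return (batch_X, batch_y, refit_from_scratch) for one partial_fit call."""
        refit_from_scratch = len(self.X) < self.size
        # New rows plus the stored reserve form the training batch
        batch_X = list(new_X) + self.X
        batch_y = list(new_y) + self.y
        for x, label in zip(new_X, new_y):
            if len(self.X) < self.size:
                self.X.append(x)
                self.y.append(label)
            else:
                # Full reserve: overwrite a random slot (not FIFO)
                k = self.rng.randrange(self.size)
                self.X[k], self.y[k] = x, label
        return batch_X, batch_y, refit_from_scratch
```

The random replacement keeps the reserve a mix of old and recent observations, which is the trade-off the parameter description points to: better batches at the cost of memory.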
- predict(X, exploit=False, output_score=False)
Selects actions according to this policy for new data.
- Parameters
X (array (n_samples, n_features)) – New observations for which to choose an action according to this policy.
exploit (bool) – Whether to make a prediction according to the policy, or to just choose the arm with the highest expected reward according to current models.
output_score (bool) – Whether to output the score that this method predicted, in case it is desired to use it with this package’s offpolicy and evaluation modules.
- Returns
pred – Actions chosen by the policy. If passing output_score=True, it will be a dictionary with the chosen arm and the score that the arm got following this policy with the classifiers used.
- Return type
array (n_samples,) or dict(“choice” : array(n_samples,), “score” : array(n_samples,))
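With exploit=True, the choice reduces to the arm with the highest estimated reward. The hypothetical helper below sketches that choice on a precomputed score matrix of shape (n_samples, n_choices) and returns the dictionary layout documented for output_score=True. It uses plain lists where the package returns arrays, and does not model the exploration step of non-exploit predictions.

```python
def exploit_predict(scores, output_score=False):
    """Pick the argmax arm per row; optionally also return the winning score."""
    choice = [max(range(len(row)), key=row.__getitem__) for row in scores]
    if not output_score:
        return choice
    score = [row[c] for row, c in zip(scores, choice)]
    return {"choice": choice, "score": score}
```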
- reset_active_choice(active_choice='weighted')
Set the active gradient criteria to a custom form
- Parameters
active_choice (str in {‘min’, ‘max’, ‘weighted’}) – How to calculate the gradient that an observation would have on the loss function for each classifier, given that it could be either class (positive or negative) for the classifier that predicts each arm. If weighted, they are weighted by the same probability estimates from the base algorithm.
- Returns
self – This object
- Return type
obj
- reset_explore_prob(explore_prob=0.2)
Set the active exploration probability to a custom number
- Parameters
explore_prob (float between 0 and 1) – The new exploration probability. Note that it will still apply decay on it after being reset.
- Returns
self – This object
- Return type
obj
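Because the exploration probability is multiplied by decay after each prediction, it shrinks geometrically: after t predictions it equals explore_prob * decay**t, and a reset restarts that schedule from the new value. The snippet below assumes the decay is applied once per predicted observation. Under that assumption, with the ActiveExplorer defaults (explore_prob=0.15, decay=0.9997), the probability falls below half of its starting value after 2311 predictions.

```python
import math


def explore_prob_after(p0, decay, t):
    # p_t = p0 * decay**t when decay is applied once per prediction
    return p0 * decay ** t


def steps_to_fraction(decay, fraction):
    # Smallest integer t with decay**t <= fraction
    return math.ceil(math.log(fraction) / math.log(decay))
```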
- topN(X, n)
Get top-N ranked actions for each observation
Note
This method will rank choices/arms according to what the policy dictates - it is not an exploitation-mode rank, so if e.g. there are random choices for some observations, there will be random ranks in here.
- Parameters
X (array (n_samples, n_features)) – New observations for which to rank actions according to this policy.
n (int) – Number of top-ranked actions to output
- Returns
topN – The top-ranked actions for each observation
- Return type
array(n_samples, n)
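Ranking by the policy amounts to sorting each row of a (n_samples, n_choices) score matrix in descending order and keeping the first n arm indices. A minimal sketch, assuming such policy scores are already available (for instance, shaped like the output of decision_function) and that ties keep the lower arm index first; the helper name is hypothetical.

```python
def top_n(scores, n):
    """Return, per row, the indices of the n highest-scoring arms."""
    ranked = []
    for row in scores:
        # sorted() is stable, so equal scores keep the lower index first
        order = sorted(range(len(row)), key=lambda k: -row[k])
        ranked.append(order[:n])
    return ranked
```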
AdaptiveGreedy
- class contextualbandits.online.AdaptiveGreedy(base_algorithm, nchoices, window_size=500, percentile=30, decay=0.9998, decay_type='percentile', initial_thr='auto', beta_prior='auto', smoothing=None, noise_to_smooth=True, batch_train=False, refit_buffer=None, deep_copy_buffer=True, assume_unique_reward=False, active_choice=None, f_grad_norm='auto', case_one_class='auto', random_state=None, njobs=-1)
Adaptive Greedy
Takes the action with the highest estimated reward, unless that estimate falls below a certain threshold, in which case it takes an action either at random or according to an active learning heuristic (in the same way as ActiveExplorer).
Note
The hyperparameters here can make a large impact on the quality of the choices. Be sure to tune the threshold (or percentile), decay, and prior (or smoothing parameters).
Note
The threshold for the reward probabilities can be set to a hard-coded number, or to be calculated dynamically by keeping track of the predictions it makes, and taking a fixed percentile of that distribution to be the threshold. In the second case, these are calculated in separate batches rather than in a sliding window.
Can also be set to make choices in the same way as ‘ActiveExplorer’ rather than at random (see the ‘active_choice’ parameter).
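The batch-wise percentile threshold described in the note above can be sketched as follows. This is a simplified model under several assumptions: the tracked value is each observation’s highest predicted score, the percentile uses linear interpolation between order statistics (NumPy’s default method), decay_type=‘percentile’ multiplies the percentile by decay once per prediction, and the class and method names are hypothetical. The fixed-threshold mode (percentile=None) is not modeled.

```python
import math


def interp_percentile(values, q):
    """Percentile with linear interpolation between order statistics (q in [0, 100])."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


class PercentileThreshold:
    """Hypothetical tracker refreshing the threshold once per batch of predictions."""

    def __init__(self, nchoices, window_size=500, percentile=30, decay=0.9998):
        self.thr = 1.0 / (2.0 * math.sqrt(nchoices))  # documented 'auto' initial_thr
        self.window_size = window_size
        self.percentile = percentile
        self.decay = decay
        self.batch = []

    def observe(self, best_score):
        """Record one prediction's best score; return True if it would explore."""
        explore = best_score < self.thr
        self.batch.append(best_score)
        self.percentile *= self.decay  # decay_type='percentile'
        if len(self.batch) >= self.window_size:
            # Separate batches rather than a sliding window
            self.thr = interp_percentile(self.batch, self.percentile)
            self.batch = []
        return explore
```

Refreshing only at the end of each batch means the threshold stays fixed for window_size predictions at a time, which matches the note’s “separate batches rather than a sliding window”.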
- Parameters
base_algorithm (obj) – Base binary classifier for which each sample for each class will be fit. Will look for, in this order:
A ‘predict_proba’ method with outputs (n_samples, 2), values in [0,1], rows summing to 1
A ‘decision_function’ method with unbounded outputs (n_samples,) to which it will apply a sigmoid function.
A ‘predict’ method with outputs (n_samples,) with values in [0,1].
Can also pass a list with a different (or already-fit) classifier for each arm.
nchoices (int or list-like) – Number of arms/labels to choose from. Can also pass a list, array, or series with arm names, in which case the outputs from predict will follow these names and arms can be dropped by name, and new ones added with a custom name.
window_size (int) – Number of predictions after which the threshold will be updated to the desired percentile.
percentile (int in [0,100] or None) – Percentile of the predictions sample to set as threshold, below which actions are random. If None, will not take percentiles, and will instead use the initial threshold and apply decay to it.
decay (float (0,1) or None) – After each prediction, either the threshold or the percentile gets adjusted to:
val_{t+1} = val_t*decay
decay_type (str, either ‘percentile’ or ‘threshold’) – Whether to decay the threshold itself or the percentile of the predictions to take after each prediction. Ignored when using ‘decay=None’. If passing ‘percentile=None’ and ‘decay_type=percentile’, will be forced to ‘threshold’.
initial_thr (str ‘auto’ or float (0,1)) – Initial threshold for the prediction below which a random action is taken. If set to ‘auto’, will be calculated as initial_thr = 1 / (2 * sqrt(nchoices)). Note that if ‘base_algorithm’ has a ‘decision_function’ method, it will first apply a sigmoid function to the output, and then compare it to the threshold, so the threshold should lie between zero and one.
beta_prior (str ‘auto’, None, tuple ((a,b), n), or list[tuple((a,b), n)]) – If not ‘None’, when there are fewer than ‘n’ samples with and without a reward from a given arm, it will predict the score for that class as a random number drawn from a beta distribution with the prior specified by ‘a’ and ‘b’. If set to “auto”, will be calculated as:
beta_prior = ((3/nchoices, 4), 2)
Can also pass different priors per arm, in which case they should be passed as a list of tuples. This parameter can have a very large impact on the end results, and it’s recommended to tune it accordingly: scenarios with low expected reward rates should have priors that result in drawing small random numbers, whereas scenarios with large expected reward rates should have stronger priors and tend towards larger random numbers. Also, the more arms there are, the smaller the optimal expected value for these random numbers. Note that the default value for AdaptiveGreedy is different from that of the other methods in this module, and it’s recommended to experiment with different values of this hyperparameter. Recommended to use only one of beta_prior or smoothing.
smoothing (None, tuple (a,b), or list) – If not None, predictions will be smoothed as yhat_smooth = (yhat*n + a)/(n + b), where ‘n’ is the number of times each arm was chosen in the training data. Can also pass it as a list of tuples with different ‘a’ and ‘b’ parameters for each arm (e.g. if there are arm features, these parameters can be determined through a different model). This will not work well with non-probabilistic classifiers such as SVM, in which case you might want to define a class that embeds it with some recalibration built-in. Recommended to use only one of beta_prior or smoothing.
noise_to_smooth (bool) – If passing smoothing, whether to add a small amount of random noise \(\sim Uniform(0, 10^{-12})\) in order to break ties at random instead of choosing the smallest arm index. Ignored when passing smoothing=None.
batch_train (bool) – Whether the base algorithm will be fit to the data in batches as it comes (streaming), or to the whole dataset each time it is refit. Requires a classifier with a ‘partial_fit’ method.
refit_buffer (int or None) – Number of observations per arm to keep as a reserve for passing to ‘partial_fit’. If passing it, up until the moment there are at least this number of observations for a given arm, that arm will keep the observations when calling ‘fit’ and ‘partial_fit’, and will translate calls to ‘partial_fit’ to calls to ‘fit’ with the new plus stored observations. After the reserve number is reached, calls to ‘partial_fit’ will enlarge the data batch with the stored observations, and old stored observations will be gradually replaced with the new ones (at random, not on a FIFO basis). This technique can greatly enhance performance when fitting the data in batches, but memory consumption can grow quite large. If passing sparse CSR matrices as input to ‘fit’ and ‘partial_fit’, these will be converted to dense once they go into this reserve, and then converted back to CSR to augment the new data. Calls to ‘fit’ will override this reserve. Ignored when passing ‘batch_train=False’.
deep_copy_buffer (bool) – Whether to make deep copies of the data that is stored in the reserve for refit_buffer. If passing ‘False’, when the reserve is not yet full, these will only store shallow copies of the data, which is faster but will not let Python’s garbage collector free memory after deleting the data, and if the original data is overwritten, so will this buffer. Ignored when not using refit_buffer.
assume_unique_reward (bool) – Whether to assume that only one arm has a reward per observation. If set to ‘True’, whenever an arm receives a reward, the classifiers for all other arms will be fit to that observation too, having negative label.
active_choice (None or str in {‘min’, ‘max’, ‘weighted’}) – How to select arms when predictions are below the threshold. If passing None, selects them at random (default). If passing ‘min’, ‘max’ or ‘weighted’, selects them in the same way as ‘ActiveExplorer’. Non-random active selection requires being able to calculate gradients (gradients for logistic regression, and for this package’s linear regression, are already defined through the option ‘auto’ of f_grad_norm below).
f_grad_norm (None, str ‘auto’, list, or function(base_algorithm, X, pred) -> array (n_samples, 2)) – (When passing active_choice) Function that calculates the row-wise norm of the gradient from observations in X if their class were negative (first column) or positive (second column). Can also use different functions for each arm, in which case it accepts them as a list of functions with length equal to nchoices. The option ‘auto’ will only work with scikit-learn’s ‘LogisticRegression’, ‘SGDClassifier’, and ‘RidgeClassifier’; with stochQN’s ‘StochasticLogisticRegression’; and with this package’s ‘LinearRegression’.
case_one_class (str ‘auto’, ‘zero’, None, list, or function(X, n_pos, n_neg, rng) -> array(n_samples, 2)) – (When passing active_choice) If some arm/choice/class has only rewards of one type, many models will fail to fit, and consequently the gradients will be undefined. Likewise, if the model has not been fit, the gradient might also be undefined, and this requires a workaround.
If passing ‘None’, will assume that base_algorithm can be fit to data of only-positive or only-negative class without problems, and that it can calculate gradients and predictions with a base_algorithm object that has not been fitted. Be aware that the methods ‘predict’, ‘predict_proba’, and ‘decision_function’ in base_algorithm might be overwritten with another method that wraps them in a try-catch block, so don’t rely on them producing errors when unfitted.
If passing a function, will take its output as the row-wise gradient norms when comparing them against other arms/classes, with the first column having the values if the observations were of negative class, and the second column if they were of positive class. Besides X, the inputs to this function (signature described above) are the number of positive and negative examples that have been observed, and a Generator object from NumPy to use for generating random numbers.
If passing a list, will assume each entry is a function as described above, to be used with each corresponding arm.
If passing ‘auto’, will generate random numbers:
negative: ~ Gamma(log10(n_features) / ((n_pos+1)/(n_pos+n_neg+2)), log10(n_features)).
positive: ~ Gamma(log10(n_features) * (n_pos+1)/(n_pos+n_neg+2), log10(n_features)).
If passing ‘zero’, it will output zero whenever models have not been fitted.
Note that the theoretically correct approach for a logistic regression would be to assume models with all-zero coefficients, in which case the gradient is defined in the absence of any data, but this tends to produce bad end results.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a Generator object for random number generation, a RandomState object (from NumPy) from which to draw an integer, or a Generator object (from NumPy), which will be used directly. While this controls random number generation for this metaheuristic, there can still be other sources of variation upon re-runs, such as data aggregations in parallel (e.g. from OpenMP or BLAS functions).
njobs (int or None) – Number of parallel jobs to run. If passing None, will set it to 1. If passing -1, will set it to the number of CPU cores. Note that if the base algorithm is itself parallelized, this might result in a slowdown as both compete for available threads, so don’t set parallelization in both. The parallelization uses shared memory, thus you will only see a speed-up if your base classifier releases the Python GIL; otherwise it will result in slower runs.
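Putting the documented defaults together, a single AdaptiveGreedy decision could be sketched as below: decision_function outputs go through a sigmoid so they can be compared against a threshold in (0, 1), the ‘auto’ initial threshold is 1 / (2 * sqrt(nchoices)), and the ‘auto’ prior is ((3/nchoices, 4), 2). The helper names are hypothetical, and the uniform random fallback corresponds to active_choice=None.

```python
import math
import random


def auto_initial_thr(nchoices):
    return 1.0 / (2.0 * math.sqrt(nchoices))


def auto_beta_prior_greedy(nchoices):
    return ((3.0 / nchoices, 4.0), 2)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def adaptive_greedy_choice(raw_scores, thr, rng, from_decision_function=False):
    """Greedy arm, unless its score is below thr (then a uniformly random arm)."""
    if from_decision_function:
        scores = [sigmoid(s) for s in raw_scores]  # map unbounded outputs to (0, 1)
    else:
        scores = list(raw_scores)
    best = max(range(len(scores)), key=scores.__getitem__)
    if scores[best] < thr:
        return rng.randrange(len(scores))  # active_choice=None
    return best
```

With four arms the ‘auto’ threshold is 0.25, so a decision_function output of -2 for the best arm (sigmoid of about 0.12) would trigger a random choice.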
References
- [1] Chakrabarti, Deepayan, et al. “Mortal multi-armed bandits.” Advances in Neural Information Processing Systems (2009).
- [2] Cortes, David. “Adapting multi-armed bandits policies to contextual bandits scenarios.” arXiv preprint arXiv:1811.04383 (2018).
- add_arm(arm_name=None, fitted_classifier=None, n_w_rew=0, n_wo_rew=0, smoothing=None, beta_prior=None, refit_buffer_X=None, refit_buffer_r=None, f_grad_norm=None, case_one_class=None)
Adds a new arm to the pool of choices
- Parameters
arm_name (object) – Name for this arm. Only applicable when using named arms. If None, will use the name of the last arm plus 1 (will only work when the names are integers).
fitted_classifier (object) – If a classifier has already been fit to rewards coming from this arm, you can pass it here; otherwise, it will be started from the same ‘base_classifier’ as the initial arms. If using bootstrapped methods or methods from this module which do not accept arbitrary classifiers as input, don’t pass a classifier here (unless using classes such as utils._BootstrappedClassifierBase). If the constructor was called with a different base_algorithm per arm, a base classifier must be passed here. Not applicable for the classes that do not take a base_algorithm.
n_w_rew (int) – Number of trials/rounds with rewards coming from this arm (only used when using a beta prior or smoothing).
n_wo_rew (int) – Number of trials/rounds without rewards coming from this arm (only used when using a beta prior or smoothing).
smoothing (None, tuple (a,b), or list) – Smoothing parameters to use for this arm (see documentation of the class constructor for details). If None and if the smoothing passed to the constructor didn’t have separate entries per arm, will use the same smoothing as was passed in the constructor. If no smoothing was passed to the constructor, the smoothing here will be ignored. Must pass a smoothing here if the constructor was passed a smoothing with different entries per arm.
beta_prior (None or tuple((a,b), n)) – Beta prior to use for this arm. See the class’ documentation for details. Must be passed if the constructor was provided different beta priors per arm. If None and the constructor had a single beta_prior, will use that same beta_prior for this new arm. Note that n_w_rew and n_wo_rew will be counted towards the threshold ‘n’ in here. Cannot be passed if the constructor did not have a beta_prior.
refit_buffer_X (array(m, n) or None) – Refit buffer of ‘X’ data to use for the new arm. Ignored when using ‘batch_train=False’ or ‘refit_buffer=None’.
refit_buffer_r (array(m,) or None) – Refit buffer of rewards data to use for the new arm. Ignored when using ‘batch_train=False’ or ‘refit_buffer=None’.
f_grad_norm (function) – Gradient calculation function to use for this arm. This is only for the policies that make choices according to active learning criteria, and only for situations in which the policy was passed different functions for each arm.
case_one_class (function) – Gradient workaround function for single-class data. This is only for the policies that make choices according to active learning criteria, and only for situations in which the policy was passed different functions for each arm.
- Returns
self – This object
- Return type
object
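The default naming rule for ‘arm_name=None’ (“the name of the last arm plus 1”, integer names only) can be written as a tiny sketch. The helper name ‘default_arm_name’ is hypothetical and only mirrors the documented behavior; it is not the package’s code.

```python
from numbers import Integral


def default_arm_name(choice_names):
    """Hypothetical helper: name given to a new arm when arm_name=None.

    Implements the documented rule "name of the last arm plus 1", which
    only works when the existing arm names are integers.
    """
    last = choice_names[-1]
    # bool is an Integral subclass in Python, so exclude it explicitly.
    if isinstance(last, bool) or not isinstance(last, Integral):
        raise TypeError("arm_name=None requires integer arm names")
    return last + 1


print(default_arm_name([0, 1, 2]))  # 3
print(default_arm_name([10, 20]))   # 21
```

With string names such as ["a", "b"], an explicit ‘arm_name’ has to be passed instead.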
- decision_function(X)¶
Get the scores for each arm following this policy’s action-choosing criteria.
- Parameters
X (array (n_samples, n_features)) – Data for which to obtain decision function scores for each arm.
- Returns
scores – Scores following this policy for each arm.
- Return type
array (n_samples, n_choices)
- drop_arm(arm_name)¶
Drop an arm/choice
Drops (removes/deletes) an arm from the set of available choices to the policy.
Note
The available arms, if named, are stored in attribute ‘choice_names’.
- Parameters
arm_name (int or object) – Arm to drop. If passing an integer, will drop at that index (starting at zero). Otherwise, will drop the arm matching this name (argument must be of the same type as the individual entries passed to ‘nchoices’ in the initialization).
- Returns
self – This object
- Return type
object
- fit(X, a, r, warm_start=False, continue_from_last=False)¶
Fits the base algorithm (one per class [and per sample if bootstrapped]) to partially labeled data.
- Parameters
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
warm_start (bool) – Whether to use the results of previous calls to ‘fit’ as a start for fitting to the ‘X’ data passed here. This will only be available if the base classifier has a property ‘warm_start’ too and that property is also set to ‘True’. You can double-check that it’s recognized as such by checking this object’s property ‘has_warm_start’. Passing ‘True’ when the classifier doesn’t support warm start despite having the property might slow things down. Dropping arms will make this functionality unavailable. This option is not available for ‘BootstrappedUCB’, nor for ‘BootstrappedTS’.
continue_from_last (bool) – If the policy was previously fit to data, whether to assume that this new call to ‘fit’ will continue from the exact same dataset as before plus new rows appended at the end of ‘X’, ‘a’, ‘r’. In this case, will only refit the models that have new data according to ‘a’. Note that the bootstrapped policies will still benefit from extra refits. This option should not be used when there are calls to ‘partial_fit’ between calls to ‘fit’. Ignored if using ‘assume_unique_reward=True’.
- Returns
self – This object
- Return type
obj
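With ‘continue_from_last=True’, the arms that need refitting are those appearing in the newly appended part of ‘a’. A minimal sketch of that bookkeeping, assuming (as the option requires) that the first ‘n_previous’ rows are exactly the data from the previous call; the helper name is hypothetical.

```python
def arms_to_refit(a_all, n_previous):
    """Hypothetical helper: arms that received new rows since the last fit.

    Assumes the first 'n_previous' entries of 'a_all' are the data from the
    previous call to 'fit' and new rows were appended at the end.
    """
    new_actions = a_all[n_previous:]
    return sorted(set(new_actions))


# The previous fit saw 5 rows; 3 new rows were appended afterwards.
a_all = [0, 2, 1, 0, 2, 1, 1, 3]
print(arms_to_refit(a_all, n_previous=5))  # [1, 3]
```

Arms 0 and 2 got no new rows here, so under this option their models would be left as they were.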
- partial_fit(X, a, r)¶
Fits the base algorithm (one per class) to partially labeled data in batches.
Note
In order to use this method, the base classifier must have a ‘partial_fit’ method, such as ‘sklearn.linear_model.SGDClassifier’. This method is not available for ‘LogisticUCB’, ‘LogisticTS’, ‘PartitionedUCB’, ‘PartitionedTS’.
- Parameters
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
- Returns
self – This object
- Return type
obj
- predict(X, exploit=False)¶
Selects actions according to this policy for new data.
- Parameters
X (array (n_samples, n_features)) – New observations for which to choose an action according to this policy.
exploit (bool) – Whether to make a prediction according to the policy, or to just choose the arm with the highest expected reward according to current models.
- Returns
pred – Actions chosen by the policy.
- Return type
array (n_samples,)
- reset_active_choice(active_choice='weighted')¶
Set the active gradient criterion to a custom form
- Parameters
active_choice (str in {‘min’, ‘max’, ‘weighted’}) – How to calculate the gradient that an observation would have on the loss function for each classifier, given that it could be either class (positive or negative) for the classifier that predicts each arm. If weighted, they are weighted by the same probability estimates from the base algorithm.
- Returns
self – This object
- Return type
obj
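To make the ‘min’, ‘max’ and ‘weighted’ options concrete, the sketch below assumes a plain logistic regression without intercept or regularization, where the log-loss gradient with respect to the coefficients for an observation \(x\) with label \(y\) is \((p - y)x\), so its norm is \(|p - y|\,\lVert x \rVert\). This is a derivation for illustration, not the package’s built-in gradient function, and choosing the arm with the largest score is also an assumption of the sketch.

```python
import math


def logistic_grad_norms(p, x):
    """Norm of the log-loss gradient w.r.t. the coefficients of a logistic
    regression (no intercept, no regularization) for one observation 'x'
    with predicted probability 'p', under each hypothetical label.

    Returns (norm if y = 0, norm if y = 1).
    """
    x_norm = math.sqrt(sum(v * v for v in x))
    # The gradient is (p - y) * x, so its norm is |p - y| * ||x||.
    return p * x_norm, (1.0 - p) * x_norm


def active_score(p, x, active_choice="weighted"):
    """Combine both hypothetical gradients as 'min', 'max' or 'weighted'."""
    g0, g1 = logistic_grad_norms(p, x)
    if active_choice == "min":
        return min(g0, g1)
    if active_choice == "max":
        return max(g0, g1)
    if active_choice == "weighted":
        # Weight each label's gradient by its estimated probability.
        return (1.0 - p) * g0 + p * g1
    raise ValueError("active_choice must be 'min', 'max' or 'weighted'")


x = [3.0, 4.0]            # ||x|| = 5
probs = [0.1, 0.5, 0.9]   # estimated reward probability per arm
scores = [active_score(p, x) for p in probs]
print(scores)  # approximately [0.9, 2.5, 0.9]
print(max(range(len(scores)), key=scores.__getitem__))  # 1
```

Under ‘weighted’ the score reduces to \(2p(1-p)\lVert x \rVert\), which is largest for the arm whose model is least certain (here \(p = 0.5\)).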
- reset_percentile(percentile=30)¶
Set the moving percentile to a custom number
- Parameters
percentile (int between 0 and 100) – The new percentile to set. Note that it will still apply decay to it after being set through this method.
- Returns
self – This object
- Return type
obj
- reset_threshold(threshold='auto')¶
Set the adaptive threshold to a custom number
- Parameters
threshold (float or “auto”) – New threshold to use. If passing “auto”, will set it to 1.5/nchoices. Note that this threshold will still be decayed if the object was initialized with ‘decay_type="threshold"’, and will still be updated if initialized with ‘percentile != None’.
- Returns
self – This object
- Return type
obj
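The ‘threshold="auto"’ rule is a one-line formula. The sketch below (hypothetical helper name) shows that with 10 arms the automatic threshold is 0.15, i.e. 1.5 times the probability 1/nchoices that a uniformly random policy assigns to each arm.

```python
def resolve_threshold(threshold, nchoices):
    """Hypothetical helper: value set by reset_threshold.

    threshold="auto" resolves to 1.5 / nchoices, as documented above;
    any other value is taken as a float.
    """
    if threshold == "auto":
        return 1.5 / nchoices
    return float(threshold)


print(resolve_threshold("auto", nchoices=10))  # 0.15
print(resolve_threshold("auto", nchoices=3))   # 0.5
print(resolve_threshold(0.3, nchoices=10))     # 0.3
```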
- topN(X, n)¶
Get top-N ranked actions for each observation
Note
This method will rank choices/arms according to what the policy dictates - it is not an exploitation-mode rank, so if e.g. there are random choices for some observations, there will be random ranks in here.
- Parameters
X (array (n_samples, n_features)) – New observations for which to rank actions according to this policy.
n (int) – Number of top-ranked actions to output
- Returns
topN – The top-ranked actions for each observation
- Return type
array(n_samples, n)
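Ranking by policy scores amounts to sorting each row of a (n_samples, n_choices) score matrix, such as the output of ‘decision_function’, in descending order. A minimal sketch assuming plain nested lists; the package’s own tie-breaking may differ.

```python
def top_n(scores, n):
    """Rank arms per row by descending score and keep the first n indices."""
    return [sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:n]
            for row in scores]


scores = [[0.2, 0.7, 0.1, 0.5],
          [0.9, 0.3, 0.6, 0.4]]
print(top_n(scores, 2))  # [[1, 3], [0, 2]]
```

Since the scores follow the policy rather than pure exploitation, a randomized policy can produce a different ranking on each call.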
BootstrappedTS¶
- class contextualbandits.online.BootstrappedTS(base_algorithm, nchoices, nsamples=10, beta_prior='auto', smoothing=None, noise_to_smooth=True, sample_unique=True, sample_weighted=False, batch_train=False, refit_buffer=None, deep_copy_buffer=True, assume_unique_reward=False, batch_sample_method='gamma', random_state=None, njobs_arms=-1, njobs_samples=1)¶
Bootstrapped Thompson Sampling
Performs Thompson Sampling by fitting several models per class on bootstrapped samples, then makes predictions by taking one of them at random for each class.
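The selection step described above can be sketched as follows, assuming the predictions of every bootstrapped model are already available as nested lists indexed as ‘preds[arm][sample][row]’. It also shows the ‘sample_unique’ switch: a fresh random model per row, or one random model per arm shared by all rows in the call. This is an illustration, not the package’s implementation; ties go to the lowest arm index here.

```python
import random


def bootstrapped_ts_choice(preds, rng, sample_unique=True):
    """Thompson-sampling choice from bootstrapped predictions.

    preds[arm][sample][row] is the predicted reward probability of the
    'sample'-th bootstrapped model of 'arm' for observation 'row'.
    Returns the chosen arm index for each row.
    """
    n_arms, n_samples, n_rows = len(preds), len(preds[0]), len(preds[0][0])
    picks = None
    if not sample_unique:
        # One random bootstrapped model per arm, shared by all rows.
        picks = [rng.randrange(n_samples) for _ in range(n_arms)]
    choices = []
    for row in range(n_rows):
        draws = []
        for arm in range(n_arms):
            # sample_unique=True: draw a fresh model for every row.
            s = rng.randrange(n_samples) if sample_unique else picks[arm]
            draws.append(preds[arm][s][row])
        choices.append(max(range(n_arms), key=draws.__getitem__))
    return choices


rng = random.Random(123)
# 2 arms x 3 bootstrapped models x 4 rows (identical rows for simplicity).
preds = [
    [[0.2] * 4, [0.4] * 4, [0.3] * 4],  # arm 0
    [[0.6] * 4, [0.1] * 4, [0.5] * 4],  # arm 1
]
print(bootstrapped_ts_choice(preds, rng))                       # may vary per row
print(bootstrapped_ts_choice(preds, rng, sample_unique=False))  # same arm for all rows
```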
Note
When fitting the algorithm to data in batches (online), it’s not possible to take an exact bootstrapped sample, as the sample is not known in advance. In theory, as the sample size grows to infinity, the number of times that an observation appears in a bootstrapped sample is distributed \(\sim Poisson(1)\). However, assigning random gamma-distributed weights to observations produces a more stable effect, so it also has the option to assign weights randomly \(\sim Gamma(1,1)\).
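Both weighting schemes have mean 1 and variance 1, so on average each observation counts once. The practical difference is that Poisson(1) counts are integers and drop roughly \(e^{-1} \approx 37\%\) of rows from each batch, while Gamma(1,1) weights are continuous and never exactly zero. The sketch below uses only the standard library (Poisson drawn with Knuth’s method, since the ‘random’ module has no Poisson sampler); feeding such weights to a classifier, e.g. as a sample weight in ‘partial_fit’, is an assumption about how they would be used.

```python
import math
import random


def poisson1(rng):
    """Draw from Poisson(1) with Knuth's multiplication method."""
    limit, k, prod = math.exp(-1.0), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def bootstrap_weights(n_rows, rng, method="gamma"):
    """Per-observation weights emulating a bootstrapped resample when the
    full sample is not known in advance (batch / streaming training)."""
    if method == "poisson":
        return [poisson1(rng) for _ in range(n_rows)]  # integer counts, may be 0
    if method == "gamma":
        return [rng.gammavariate(1.0, 1.0) for _ in range(n_rows)]  # always > 0
    raise ValueError("method must be 'gamma' or 'poisson'")


rng = random.Random(0)
w_p = bootstrap_weights(10_000, rng, "poisson")
w_g = bootstrap_weights(10_000, rng, "gamma")
print(sum(w_p) / len(w_p), sum(w_g) / len(w_g))  # both close to 1.0
print(sum(w == 0 for w in w_p) / len(w_p))       # close to exp(-1), about 0.37
```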
Note
If you plan to make only one call to ‘predict’ between calls to ‘fit’ and have ‘sample_unique=False’, you can pass ‘nsamples=1’ without losing any precision.
- Parameters
base_algorithm (obj) – Base binary classifier for which each sample for each class will be fit. Will look for, in this order:
A ‘predict_proba’ method with outputs (n_samples, 2), values in [0,1], rows summing to 1
A ‘decision_function’ method with unbounded outputs (n_samples,) to which it will apply a sigmoid function.
A ‘predict’ method with outputs (n_samples,) with values in [0,1].
Can also pass a list with a different (or already-fit) classifier for each arm.
nchoices (int or list-like) – Number of arms/labels to choose from. Can also pass a list, array, or Series with arm names, in which case the outputs from predict will follow these names and arms can be dropped by name, and new ones added with a custom name.
nsamples (int) – Number of bootstrapped samples per class to take.
beta_prior (str ‘auto’, None, tuple ((a,b), n), or list[tuple((a,b), n)]) – If not ‘None’, when there are less than ‘n’ samples with and without a reward from a given arm, it will predict the score for that class as a random number drawn from a beta distribution with the prior specified by ‘a’ and ‘b’. If set to “auto”, will be calculated as:
beta_prior = ((2/log2(nchoices), 4), 2)
Can also pass different priors per arm, in which case they should be passed as a list of tuples. This parameter can have a very large impact on the end results, and it’s recommended to tune it accordingly: scenarios with low expected reward rates should have priors that result in drawing small random numbers, whereas scenarios with large expected reward rates should have stronger priors and tend towards larger random numbers. Also, the more arms there are, the smaller the optimal expected value for these random numbers. Recommended to use only one of ‘beta_prior’ or ‘smoothing’.
smoothing (None, tuple (a,b), or list) – If not None, predictions will be smoothed as yhat_smooth = (yhat*n + a)/(n + b), where ‘n’ is the number of times each arm was chosen in the training data. Can also pass it as a list of tuples with different ‘a’ and ‘b’ parameters for each arm (e.g. if there are arm features, these parameters can be determined through a different model). This will not work well with non-probabilistic classifiers such as SVM, in which case you might want to define a class that embeds it with some recalibration built in. Recommended to use only one of ‘beta_prior’ or ‘smoothing’.
noise_to_smooth (bool) – If passing ‘smoothing’, whether to add a small amount of random noise \(\sim Uniform(0, 10^{-12})\) in order to break ties at random instead of choosing the smallest arm index. Ignored when passing ‘smoothing=None’.
sample_unique (bool) – Whether to use a different bootstrapped classifier per row at each arm when calling ‘predict’. If passing ‘False’, will take the same bootstrapped classifier within an arm for all the rows passed in a single call to ‘predict’. Passing ‘False’ is a faster alternative, but the theoretically correct way is using a different one per row. Forced to ‘True’ when passing ‘sample_weighted=True’.
sample_weighted (bool) – Whether to take a weighted average from the predictions from each bootstrapped classifier at a given arm, with random weights. This will make the predictions more variable (i.e. more randomness in exploration). The alternative (and default) is to take a prediction from a single classifier each time.
batch_train (bool) – Whether the base algorithm will be fit to the data in batches as it comes (streaming), or to the whole dataset each time it is refit. Requires a classifier with a ‘partial_fit’ method.
refit_buffer (int or None) – Number of observations per arm to keep as a reserve for passing to ‘partial_fit’. If passing it, up until the moment there are at least this number of observations for a given arm, that arm will keep the observations when calling ‘fit’ and ‘partial_fit’, and will translate calls to ‘partial_fit’ to calls to ‘fit’ with the new plus stored observations. After the reserve number is reached, calls to ‘partial_fit’ will enlarge the data batch with the stored observations, and old stored observations will be gradually replaced with the new ones (at random, not on a FIFO basis). This technique can greatly enhance the performance when fitting the data in batches, but memory consumption can grow quite large. If passing sparse CSR matrices as input to ‘fit’ and ‘partial_fit’, these will be converted to dense once they go into this reserve, and then converted back to CSR to augment the new data. Calls to ‘fit’ will override this reserve. Ignored when passing ‘batch_train=False’.
deep_copy_buffer (bool) – Whether to make deep copies of the data that is stored in the reserve for ‘refit_buffer’. If passing ‘False’, when the reserve is not yet full, these will only store shallow copies of the data, which is faster but will not let Python’s garbage collector free memory after deleting the data, and if the original data is overwritten, so will this buffer. Ignored when not using ‘refit_buffer’.
assume_unique_reward (bool) – Whether to assume that only one arm has a reward per observation. If set to ‘True’, whenever an arm receives a reward, the classifiers for all other arms will be fit to that observation too, with a negative label.
batch_sample_method (str, either ‘gamma’ or ‘poisson’) – How to simulate bootstrapped samples when training in batch mode (online). See Note.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a ‘Generator’ object for random number generation, a ‘RandomState’ object (from NumPy) from which to draw an integer, or a ‘Generator’ object (from NumPy), which will be used directly. While this controls random number generation for this metaheuristic, there can still be other sources of variation upon re-runs, such as data aggregations in parallel (e.g. from OpenMP or BLAS functions).
njobs_arms (int or None) – Number of parallel jobs to run (for dividing work across arms). If passing None, will set it to 1. If passing -1, will set it to the number of CPU cores. Note that if the base algorithm is itself parallelized, this might result in a slowdown as both compete for available threads, so don’t set parallelization in both. The total number of parallel jobs will be njobs_arms * njobs_samples. The parallelization uses shared memory, thus you will only see a speed up if your base classifier releases the Python GIL, and will otherwise result in slower runs.
njobs_samples (int or None) – Number of parallel jobs to run (for dividing work across samples within one arm). If passing None will set it to 1. If passing -1 will set it to the number of CPU cores. The total number of parallel jobs will be njobs_arms * njobs_samples. The parallelization uses shared memory, thus you will only see a speed up if your base classifier releases the Python GIL, and will otherwise result in slower runs.
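The two alternatives for handling rarely-chosen arms, ‘beta_prior’ and ‘smoothing’, both come down to simple formulas. The sketch below computes the documented ‘beta_prior="auto"’ values (2/log2(nchoices) for BootstrappedTS, 3/log2(nchoices) for BootstrappedUCB) and applies the documented smoothing formula. The helper names are hypothetical, and the guard for a single arm is added only because log2(1) = 0.

```python
import math


def auto_beta_prior(nchoices, policy="ts"):
    """beta_prior='auto' as documented: ((2/log2(nchoices), 4), 2) for
    BootstrappedTS and ((3/log2(nchoices), 4), 2) for BootstrappedUCB."""
    if nchoices < 2:
        raise ValueError("need at least two arms (log2(1) = 0)")
    numerator = 2.0 if policy == "ts" else 3.0
    return ((numerator / math.log2(nchoices), 4.0), 2)


def smooth(yhat, n, a, b):
    """smoothing=(a, b): yhat_smooth = (yhat*n + a) / (n + b)."""
    return (yhat * n + a) / (n + b)


(a, b), n_thr = auto_beta_prior(nchoices=4)
print(a, b, n_thr)   # 1.0 4.0 2
print(a / (a + b))   # 0.2, the mean of a Beta(1, 4) draw

# An arm never chosen (n = 0) is scored a / b; a heavily explored arm keeps
# a value close to its raw prediction.
print(smooth(0.9, n=0, a=1, b=10))     # 0.1
print(smooth(0.9, n=1000, a=1, b=10))  # about 0.892
```

The prior mean a/(a+b) shrinks as the number of arms grows, consistent with the advice above that more arms call for smaller random numbers.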
References
- 1
Cortes, David. “Adapting multi-armed bandits policies to contextual bandits scenarios.” arXiv preprint arXiv:1811.04383 (2018).
- 2
Chapelle, Olivier, and Lihong Li. “An empirical evaluation of Thompson sampling.” Advances in neural information processing systems. 2011.
- add_arm(arm_name=None, fitted_classifier=None, n_w_rew=0, n_wo_rew=0, smoothing=None, beta_prior=None, refit_buffer_X=None, refit_buffer_r=None, f_grad_norm=None, case_one_class=None)¶
Adds a new arm to the pool of choices
- Parameters
arm_name (object) – Name for this arm. Only applicable when using named arms. If None, will use the name of the last arm plus 1 (will only work when the names are integers).
fitted_classifier (object) – If a classifier has already been fit to rewards coming from this arm, you can pass it here; otherwise, it will be started from the same ‘base_algorithm’ as the initial arms. If using bootstrapped methods or methods from this module which do not accept arbitrary classifiers as input, don’t pass a classifier here (unless using classes such as utils._BootstrappedClassifierBase). If the constructor was called with a different ‘base_algorithm’ per arm, a base classifier must be passed here. Not applicable for the classes that do not take a ‘base_algorithm’.
n_w_rew (int) – Number of trials/rounds with rewards coming from this arm (only used when using a beta prior or smoothing).
n_wo_rew (int) – Number of trials/rounds without rewards coming from this arm (only used when using a beta prior or smoothing).
smoothing (None, tuple (a,b), or list) – Smoothing parameters to use for this arm (see documentation of the class constructor for details). If None, and if the ‘smoothing’ passed to the constructor didn’t have separate entries per arm, will use the same ‘smoothing’ as was passed in the constructor. If no ‘smoothing’ was passed to the constructor, the ‘smoothing’ here will be ignored. Must pass a ‘smoothing’ here if the constructor was passed a ‘smoothing’ with different entries per arm.
beta_prior (None or tuple((a,b), n)) – Beta prior to use for this arm. See the class’ documentation for details. Must be passed if the constructor was provided different beta priors per arm. If None, and the constructor had a single ‘beta_prior’, will use that same ‘beta_prior’ for this new arm. Note that ‘n_w_rew’ and ‘n_wo_rew’ will be counted towards the threshold ‘n’ in here. Cannot be passed if the constructor did not have a ‘beta_prior’.
refit_buffer_X (array(m, n) or None) – Refit buffer of ‘X’ data to use for the new arm. Ignored when using ‘batch_train=False’ or ‘refit_buffer=None’.
refit_buffer_r (array(m,) or None) – Refit buffer of rewards data to use for the new arm. Ignored when using ‘batch_train=False’ or ‘refit_buffer=None’.
f_grad_norm (function) – Gradient calculation function to use for this arm. This is only for the policies that make choices according to active learning criteria, and only for situations in which the policy was passed different functions for each arm.
case_one_class (function) – Gradient workaround function for single-class data. This is only for the policies that make choices according to active learning criteria, and only for situations in which the policy was passed different functions for each arm.
- Returns
self – This object
- Return type
object
- decision_function(X)¶
Get the scores for each arm following this policy’s action-choosing criteria.
- Parameters
X (array (n_samples, n_features)) – Data for which to obtain decision function scores for each arm.
- Returns
scores – Scores following this policy for each arm.
- Return type
array (n_samples, n_choices)
- drop_arm(arm_name)¶
Drop an arm/choice
Drops (removes/deletes) an arm from the set of available choices to the policy.
Note
The available arms, if named, are stored in attribute ‘choice_names’.
- Parameters
arm_name (int or object) – Arm to drop. If passing an integer, will drop at that index (starting at zero). Otherwise, will drop the arm matching this name (argument must be of the same type as the individual entries passed to ‘nchoices’ in the initialization).
- Returns
self – This object
- Return type
object
- fit(X, a, r, warm_start=False, continue_from_last=False)¶
Fits the base algorithm (one per class [and per sample if bootstrapped]) to partially labeled data.
- Parameters
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
warm_start (bool) – Whether to use the results of previous calls to ‘fit’ as a start for fitting to the ‘X’ data passed here. This will only be available if the base classifier has a property ‘warm_start’ too and that property is also set to ‘True’. You can double-check that it’s recognized as such by checking this object’s property ‘has_warm_start’. Passing ‘True’ when the classifier doesn’t support warm start despite having the property might slow things down. Dropping arms will make this functionality unavailable. This option is not available for ‘BootstrappedUCB’, nor for ‘BootstrappedTS’.
continue_from_last (bool) – If the policy was previously fit to data, whether to assume that this new call to ‘fit’ will continue from the exact same dataset as before plus new rows appended at the end of ‘X’, ‘a’, ‘r’. In this case, will only refit the models that have new data according to ‘a’. Note that the bootstrapped policies will still benefit from extra refits. This option should not be used when there are calls to ‘partial_fit’ between calls to ‘fit’. Ignored if using ‘assume_unique_reward=True’.
- Returns
self – This object
- Return type
obj
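The ‘assume_unique_reward=True’ option changes what each arm’s classifier is trained on: the chosen arm still learns from its observed reward, and every rewarded row additionally becomes a negative example for all other arms. A minimal sketch of that data expansion on plain lists, with a hypothetical helper name; the package builds these per-arm datasets internally and may do so differently.

```python
def expand_unique_reward(X, a, r, nchoices):
    """Hypothetical helper: per-arm training sets under
    assume_unique_reward=True.

    The chosen arm is trained on its observed reward; every row with r == 1
    is also added as a negative example for all the other arms.
    """
    per_arm = {arm: ([], []) for arm in range(nchoices)}
    for x, arm, reward in zip(X, a, r):
        per_arm[arm][0].append(x)
        per_arm[arm][1].append(reward)
        if reward == 1:
            for other in range(nchoices):
                if other != arm:
                    per_arm[other][0].append(x)
                    per_arm[other][1].append(0)
    return per_arm


X = [[0.1], [0.2], [0.3]]
a = [0, 1, 0]
r = [1, 0, 0]
data = expand_unique_reward(X, a, r, nchoices=3)
print(data[1])  # ([[0.1], [0.2]], [0, 0])
print(data[2])  # ([[0.1]], [0])
```

Arm 2 was never chosen, yet it still gets a negative example from the row rewarded under arm 0.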
- partial_fit(X, a, r)¶
Fits the base algorithm (one per class) to partially labeled data in batches.
Note
In order to use this method, the base classifier must have a ‘partial_fit’ method, such as ‘sklearn.linear_model.SGDClassifier’. This method is not available for ‘LogisticUCB’, ‘LogisticTS’, ‘PartitionedUCB’, ‘PartitionedTS’.
- Parameters
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
- Returns
self – This object
- Return type
obj
- predict(X, exploit=False, output_score=False)¶
Selects actions according to this policy for new data.
- Parameters
X (array (n_samples, n_features)) – New observations for which to choose an action according to this policy.
exploit (bool) – Whether to make a prediction according to the policy, or to just choose the arm with the highest expected reward according to current models.
output_score (bool) – Whether to output the score that this method predicted, in case it is desired to use it with this package’s ‘offpolicy’ and ‘evaluation’ modules.
- Returns
pred – Actions chosen by the policy. If passing output_score=True, it will be a dictionary with the chosen arm and the score that the arm got following this policy with the classifiers used.
- Return type
array (n_samples,) or dict(“choice” : array(n_samples,), “score” : array(n_samples,))
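The dictionary returned with ‘output_score=True’ pairs each chosen arm with the score it received. Because Thompson sampling scores are random, a separate call to ‘decision_function’ would generally draw different scores, so returning them together keeps choice and score consistent. A minimal sketch of the output layout from a score matrix, with a hypothetical helper name and integer arm indices instead of arm names.

```python
def predict_with_score(scores):
    """Hypothetical helper: build {'choice': ..., 'score': ...} from a
    (n_samples, n_choices) matrix of policy scores."""
    choice = [max(range(len(row)), key=row.__getitem__) for row in scores]
    score = [row[j] for row, j in zip(scores, choice)]
    return {"choice": choice, "score": score}


out = predict_with_score([[0.2, 0.7, 0.1], [0.6, 0.3, 0.4]])
print(out)  # {'choice': [1, 0], 'score': [0.7, 0.6]}
```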
- topN(X, n)¶
Get top-N ranked actions for each observation
Note
This method will rank choices/arms according to what the policy dictates - it is not an exploitation-mode rank, so if e.g. there are random choices for some observations, there will be random ranks in here.
- Parameters
X (array (n_samples, n_features)) – New observations for which to rank actions according to this policy.
n (int) – Number of top-ranked actions to output
- Returns
topN – The top-ranked actions for each observation
- Return type
array(n_samples, n)
BootstrappedUCB¶
- class contextualbandits.online.BootstrappedUCB(base_algorithm, nchoices, nsamples=10, percentile=80, beta_prior='auto', smoothing=None, noise_to_smooth=True, batch_train=False, refit_buffer=None, deep_copy_buffer=True, assume_unique_reward=False, batch_sample_method='gamma', random_state=None, njobs_arms=-1, njobs_samples=1)¶
Bootstrapped Upper Confidence Bound
Obtains an upper confidence bound by taking the percentile of the predictions from a set of classifiers, all fit with different bootstrapped samples (multiple samples per arm).
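For a single observation, the upper confidence bound of an arm is a high percentile (‘percentile=80’ by default) of its bootstrapped models’ predictions, and the arm with the largest bound is chosen. The sketch below uses linear-interpolation percentiles (the same convention as NumPy’s default) on hand-written predictions; it illustrates the idea and is not the package’s code.

```python
import math


def percentile(values, q):
    """Linear-interpolation percentile (NumPy's default convention)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def bootstrapped_ucb_choice(preds, q=80):
    """preds[arm][sample]: predictions of each arm's bootstrapped models for
    one observation. Choose the arm with the highest q-th percentile."""
    ucb = [percentile(samples, q) for samples in preds]
    return max(range(len(preds)), key=ucb.__getitem__), ucb


preds = [
    [0.30, 0.31, 0.29, 0.30, 0.32],  # arm 0: mean 0.304, low spread
    [0.10, 0.50, 0.20, 0.45, 0.15],  # arm 1: mean 0.28, high spread
]
arm, ucb = bootstrapped_ucb_choice(preds, q=80)
print(arm)  # 1
print(ucb)  # approximately [0.312, 0.46]
```

Arm 1 has the lower mean but the wider spread across bootstrapped models, so its upper bound wins: this is how the method favors exploring arms whose estimates are still uncertain.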
Note
When fitting the algorithm to data in batches (online), it’s not possible to take an exact bootstrapped sample, as the sample is not known in advance. In theory, as the sample size grows to infinity, the number of times that an observation appears in a bootstrapped sample is distributed \(\sim Poisson(1)\). However, assigning random gamma-distributed weights to observations produces a more stable effect, so it also has the option to assign weights randomly \(\sim Gamma(1,1)\).
- Parameters
base_algorithm (obj or list) – Base binary classifier for which each sample for each class will be fit. Will look for, in this order:
A ‘predict_proba’ method with outputs (n_samples, 2), values in [0,1], rows summing to 1
A ‘decision_function’ method with unbounded outputs (n_samples,) to which it will apply a sigmoid function.
A ‘predict’ method with outputs (n_samples,) with values in [0,1].
Can also pass a list with a different (or already-fit) classifier for each arm.
nchoices (int or list-like) – Number of arms/labels to choose from. Can also pass a list, array, or Series with arm names, in which case the outputs from predict will follow these names and arms can be dropped by name, and new ones added with a custom name.
nsamples (int) – Number of bootstrapped samples per class to take.
percentile (int [0,100]) – Percentile of the predictions sample to take
beta_prior (str ‘auto’, None, tuple ((a,b), n), or list[tuple((a,b), n)]) – If not ‘None’, when there are less than ‘n’ samples with and without a reward from a given arm, it will predict the score for that class as a random number drawn from a beta distribution with the prior specified by ‘a’ and ‘b’. If set to “auto”, will be calculated as:
beta_prior = ((3/log2(nchoices), 4), 2)
Can also pass different priors per arm, in which case they should be passed as a list of tuples. Note that it will only generate one random number per arm, so the ‘a’ parameter should be higher than for other methods. This parameter can have a very large impact on the end results, and it’s recommended to tune it accordingly: scenarios with low expected reward rates should have priors that result in drawing small random numbers, whereas scenarios with large expected reward rates should have stronger priors and tend towards larger random numbers. Also, the more arms there are, the smaller the optimal expected value for these random numbers. Recommended to use only one of ‘beta_prior’ or ‘smoothing’.
smoothing (None, tuple (a,b), or list) – If not None, predictions will be smoothed as yhat_smooth = (yhat*n + a)/(n + b), where ‘n’ is the number of times each arm was chosen in the training data. Can also pass it as a list of tuples with different ‘a’ and ‘b’ parameters for each arm (e.g. if there are arm features, these parameters can be determined through a different model). This will not work well with non-probabilistic classifiers such as SVM, in which case you might want to define a class that embeds it with some recalibration built in. Recommended to use only one of ‘beta_prior’ or ‘smoothing’.
noise_to_smooth (bool) – If passing ‘smoothing’, whether to add a small amount of random noise \(\sim Uniform(0, 10^{-12})\) in order to break ties at random instead of choosing the smallest arm index. Ignored when passing ‘smoothing=None’.
batch_train (bool) – Whether the base algorithm will be fit to the data in batches as it comes (streaming), or to the whole dataset each time it is refit. Requires a classifier with a ‘partial_fit’ method.
refit_buffer (int or None) – Number of observations per arm to keep as a reserve for passing to ‘partial_fit’. If passing it, up until the moment there are at least this number of observations for a given arm, that arm will keep the observations when calling ‘fit’ and ‘partial_fit’, and will translate calls to ‘partial_fit’ to calls to ‘fit’ with the new plus stored observations. After the reserve number is reached, calls to ‘partial_fit’ will enlarge the data batch with the stored observations, and old stored observations will be gradually replaced with the new ones (at random, not on a FIFO basis). This technique can greatly enhance the performance when fitting the data in batches, but memory consumption can grow quite large. If passing sparse CSR matrices as input to ‘fit’ and ‘partial_fit’, these will be converted to dense once they go into this reserve, and then converted back to CSR to augment the new data. Calls to ‘fit’ will override this reserve. Ignored when passing ‘batch_train=False’.
deep_copy_buffer (bool) – Whether to make deep copies of the data that is stored in the reserve for ‘refit_buffer’. If passing ‘False’, when the reserve is not yet full, these will only store shallow copies of the data, which is faster but will not let Python’s garbage collector free memory after deleting the data, and if the original data is overwritten, so will this buffer. Ignored when not using ‘refit_buffer’.
assume_unique_reward (bool) – Whether to assume that only one arm has a reward per observation. If set to ‘True’, whenever an arm receives a reward, the classifiers for all other arms will be fit to that observation too, with a negative label.
batch_sample_method (str, either ‘gamma’ or ‘poisson’) – How to simulate bootstrapped samples when training in batch mode (online). See Note.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a ‘Generator’ object for random number generation, a ‘RandomState’ object (from NumPy) from which to draw an integer, or a ‘Generator’ object (from NumPy), which will be used directly. While this controls random number generation for this metaheuristic, there can still be other sources of variation upon re-runs, such as data aggregations in parallel (e.g. from OpenMP or BLAS functions).
njobs_arms (int or None) – Number of parallel jobs to run (for dividing work across arms). If passing None, will set it to 1. If passing -1, will set it to the number of CPU cores. Note that if the base algorithm is itself parallelized, this might result in a slowdown as both compete for available threads, so don’t set parallelization in both. The total number of parallel jobs will be njobs_arms * njobs_samples. The parallelization uses shared memory, thus you will only see a speed up if your base classifier releases the Python GIL, and will otherwise result in slower runs.
njobs_samples (int or None) – Number of parallel jobs to run (for dividing work across samples within one arm). If passing None will set it to 1. If passing -1 will set it to the number of CPU cores. The total number of parallel jobs will be njobs_arms * njobs_samples. The parallelization uses shared memory, thus you will only see a speed up if your base classifier releases the Python GIL, and will otherwise result in slower runs.
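The ‘refit_buffer’ behavior described above can be sketched for a single arm: while the reserve is not full, new rows are appended and the batch (new plus stored rows) goes to ‘fit`; once full, the batch goes to ‘partial_fit’ and each new row overwrites a randomly chosen stored row. The class below is hypothetical and illustrative only; in particular, deciding the fit/partial_fit mode before adding the new rows is an assumption of this sketch.

```python
import random


class RefitBuffer:
    """Hypothetical per-arm reserve of past observations (illustration only).

    Until 'size' rows are stored, new rows are appended and the arm would be
    refit from scratch ('fit') on new plus stored rows. Once full, the
    enlarged batch would go to 'partial_fit', and each new row overwrites a
    randomly chosen stored row (not FIFO).
    """

    def __init__(self, size, rng):
        self.size = size
        self.rng = rng
        self.X, self.r = [], []

    def next_batch(self, X_new, r_new):
        mode = "partial_fit" if len(self.X) >= self.size else "fit"
        # The batch passed to the classifier: new rows plus the reserve.
        X_batch = list(X_new) + self.X
        r_batch = list(r_new) + self.r
        for x, reward in zip(X_new, r_new):
            if len(self.X) < self.size:
                self.X.append(x)
                self.r.append(reward)
            else:
                i = self.rng.randrange(self.size)  # random slot, not FIFO
                self.X[i], self.r[i] = x, reward
        return mode, X_batch, r_batch


buf = RefitBuffer(size=3, rng=random.Random(0))
for X_new, r_new in [([[1], [2]], [0, 1]), ([[3], [4]], [1, 0]), ([[5]], [1])]:
    mode, X_batch, r_batch = buf.next_batch(X_new, r_new)
    print(mode, len(X_batch), len(buf.X))
# fit 2 2
# fit 4 3
# partial_fit 4 3
```

The reserve never grows past ‘size’ rows per arm, but every stored row is kept in memory for each arm, which is where the memory cost mentioned above comes from.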
References
- 1
Cortes, David. “Adapting multi-armed bandits policies to contextual bandits scenarios.” arXiv preprint arXiv:1811.04383 (2018).
- add_arm(arm_name=None, fitted_classifier=None, n_w_rew=0, n_wo_rew=0, smoothing=None, beta_prior=None, refit_buffer_X=None, refit_buffer_r=None, f_grad_norm=None, case_one_class=None)¶
Adds a new arm to the pool of choices
- Parameters
arm_name (object) – Name for this arm. Only applicable when using named arms. If None, will use the name of the last arm plus 1 (will only work when the names are integers).
fitted_classifier (object) – If a classifier has already been fit to rewards coming from this arm, you can pass it here; otherwise, it will be started from the same ‘base_algorithm’ as the initial arms. If using bootstrapped methods or methods from this module which do not accept arbitrary classifiers as input, don’t pass a classifier here (unless using classes such as utils._BootstrappedClassifierBase). If the constructor was called with a different ‘base_algorithm’ per arm, a base classifier must be passed here. Not applicable for the classes that do not take a ‘base_algorithm’.
n_w_rew (int) – Number of trials/rounds with rewards coming from this arm (only used when using a beta prior or smoothing).
n_wo_rew (int) – Number of trials/rounds without rewards coming from this arm (only used when using a beta prior or smoothing).
smoothing (None, tuple (a,b), or list) – Smoothing parameters to use for this arm (see documentation of the class constructor for details). If None, and if the ‘smoothing’ passed to the constructor didn’t have separate entries per arm, will use the same ‘smoothing’ as was passed in the constructor. If no ‘smoothing’ was passed to the constructor, the ‘smoothing’ here will be ignored. Must pass a ‘smoothing’ here if the constructor was passed a ‘smoothing’ with different entries per arm.
beta_prior (None or tuple((a,b), n)) – Beta prior to use for this arm. See the class’ documentation for details. Must be passed if the constructor was provided different beta priors per arm. If None, and the constructor had a single ‘beta_prior’, will use that same ‘beta_prior’ for this new arm. Note that ‘n_w_rew’ and ‘n_wo_rew’ will be counted towards the threshold ‘n’ in here. Cannot be passed if the constructor did not have a ‘beta_prior’.
refit_buffer_X (array(m, n) or None) – Refit buffer of ‘X’ data to use for the new arm. Ignored when using ‘batch_train=False’ or ‘refit_buffer=None’.
refit_buffer_r (array(m,) or None) – Refit buffer of rewards data to use for the new arm. Ignored when using ‘batch_train=False’ or ‘refit_buffer=None’.
f_grad_norm (function) – Gradient calculation function to use for this arm. This is only for the policies that make choices according to active learning criteria, and only for situations in which the policy was passed different functions for each arm.
case_one_class (function) – Gradient workaround function for single-class data. This is only for the policies that make choices according to active learning criteria, and only for situations in which the policy was passed different functions for each arm.
- Returns
self – This object
- Return type
object
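A small standalone sketch illustrates the default naming rule for ‘arm_name=None’. It assumes the arm names are kept in a plain list, similar to the ‘choice_names’ attribute mentioned under ‘drop_arm’. ‘next_arm_name’ is a hypothetical helper, not part of the package.

```python
def next_arm_name(choice_names, arm_name=None):
    """Hypothetical helper mirroring the documented default of add_arm.

    If arm_name is None, the new arm is named after the last arm plus 1,
    which only works when the existing names are integers.
    """
    if arm_name is not None:
        return arm_name
    last = choice_names[-1]
    if not isinstance(last, int):
        raise TypeError("automatic naming needs integer arm names; pass arm_name")
    return last + 1


names = [0, 1, 2]
names.append(next_arm_name(names))
print(names)  # [0, 1, 2, 3]
```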
- decision_function(X)¶
Get the scores for each arm following this policy’s action-choosing criteria.
- Parameters
X (array (n_samples, n_features)) – Data for which to obtain decision function scores for each arm.
- Returns
scores – Scores following this policy for each arm.
- Return type
array (n_samples, n_choices)
- drop_arm(arm_name)¶
Drop an arm/choice
Drops (removes/deletes) an arm from the set of available choices to the policy.
Note
The available arms, if named, are stored in attribute ‘choice_names’.
- Parameters
arm_name (int or object) – Arm to drop. If passing an integer, will drop at that index (starting at zero). Otherwise, will drop the arm matching this name (argument must be of the same type as the individual entries passed to ‘nchoices’ in the initialization).
- Returns
self – This object
- Return type
object
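The lookup rule of ‘drop_arm’ treats integers as positions and anything else as a name. The sketch below shows that rule on a plain list of names. ‘drop_arm’ here is a hypothetical standalone function that returns a new list; the real method modifies the policy in place and returns ‘self’.

```python
def drop_arm(choice_names, arm_name):
    """Hypothetical sketch of drop_arm's lookup rule; returns a new list.

    An int is treated as a zero-based position; anything else is matched
    against the arm names themselves.
    """
    if isinstance(arm_name, int):
        idx = arm_name
        if not 0 <= idx < len(choice_names):
            raise IndexError(f"no arm at index {idx}")
    else:
        idx = choice_names.index(arm_name)  # raises ValueError if absent
    return choice_names[:idx] + choice_names[idx + 1:]


names = ["news", "sports", "weather"]
print(drop_arm(names, 1))          # ['news', 'weather']
print(drop_arm(names, "weather"))  # ['news', 'sports']
```

Integers are always read as positions. With integer arm names such as [10, 20, 30], passing 1 therefore removes 20, not an arm named 1.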
- fit(X, a, r, warm_start=False, continue_from_last=False)¶
Fits the base algorithm (one per class [and per sample if bootstrapped]) to partially labeled data.
- Parameters
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
warm_start (bool) – Whether to use the results of previous calls to ‘fit’ as a starting point for fitting to the ‘X’ data passed here. This is only available if the base classifier also has a ‘warm_start’ property and that property is set to ‘True’. You can check that it is recognized as such through this object’s ‘has_warm_start’ property. Passing ‘True’ when the classifier doesn’t support warm start despite having the property might slow things down. Dropping arms makes this functionality unavailable. This option is not available for ‘BootstrappedUCB’ nor for ‘BootstrappedTS’.
continue_from_last (bool) – If the policy was previously fit to data, whether to assume that this new call to ‘fit’ continues from the exact same dataset as before, plus new rows appended at the end of ‘X’, ‘a’, ‘r’. In this case, only the models that have new data according to ‘a’ will be refit. Note that the bootstrapped policies will still benefit from extra refits. This option should not be used when there are calls to ‘partial_fit’ between calls to ‘fit’. Ignored if using ‘assume_unique_reward=True’.
- Returns
self – This object
- Return type
obj
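The sketch below shows the bookkeeping behind ‘continue_from_last=True’: only arms that appear among the newly appended rows of ‘a’ need their classifiers refit. ‘arms_to_refit’ is a hypothetical helper. The sketch assumes the number of rows used in the previous call to ‘fit’ is known.

```python
def arms_to_refit(a, n_rows_previous_fit):
    """Return the arms whose classifiers need refitting.

    Assumes `a` is the full action history passed to the new call to fit:
    the exact rows from the previous call followed by newly appended rows.
    """
    if n_rows_previous_fit > len(a):
        raise ValueError("continue_from_last requires the old rows to be kept")
    new_actions = a[n_rows_previous_fit:]
    return sorted(set(new_actions))


a_old = [0, 2, 1, 0]
a_new = a_old + [2, 2, 0]
print(arms_to_refit(a_new, len(a_old)))  # [0, 2]
```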
- partial_fit(X, a, r)¶
Fits the base algorithm (one per class) to partially labeled data in batches.
Note
In order to use this method, the base classifier must have a ‘partial_fit’ method, such as ‘sklearn.linear_model.SGDClassifier’. This method is not available for ‘LogisticUCB’, ‘LogisticTS’, ‘PartitionedUCB’, ‘PartitionedTS’.
- Parameters
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
- Returns
self – This object
- Return type
obj
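For policies built with ‘batch_train=True’ and a ‘refit_buffer’, ‘partial_fit’ relies on a per-arm reserve, described under the ‘refit_buffer’ constructor parameter. The sketch below models that reserve under stated assumptions:

- ‘ArmRefitBuffer’ is hypothetical.
- Rows are plain lists instead of arrays.
- Every new row overwrites a uniformly chosen slot once the reserve is full. This is one possible reading of "replaced at random, not on a FIFO basis".

```python
import random


class ArmRefitBuffer:
    """Hypothetical per-arm reserve sketching the documented 'refit_buffer'.

    Before the reserve holds `capacity` rows, a partial_fit call is turned into
    a full fit on stored + new rows. Afterwards, each incoming batch is enlarged
    with the stored rows, and stored rows are overwritten at random (not FIFO).
    """

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.X, self.r = [], []
        self.rng = random.Random(seed)

    def prepare_batch(self, X_new, r_new):
        """Return (mode, X_batch, r_batch) and update the reserve."""
        X_new, r_new = list(X_new), list(r_new)
        if len(self.X) < self.capacity:
            # Reserve not yet full: refit from scratch on stored + new rows.
            X_batch, r_batch = self.X + X_new, self.r + r_new
            self._store(X_new, r_new)
            return "fit", X_batch, r_batch
        # Reserve full: enlarge the incoming batch with the stored rows.
        X_batch, r_batch = X_new + self.X, r_new + self.r
        self._store(X_new, r_new)
        return "partial_fit", X_batch, r_batch

    def _store(self, X_new, r_new):
        for x, r in zip(X_new, r_new):
            if len(self.X) < self.capacity:
                self.X.append(x)
                self.r.append(r)
            else:
                # Overwrite a random slot rather than the oldest one.
                i = self.rng.randrange(self.capacity)
                self.X[i], self.r[i] = x, r


buf = ArmRefitBuffer(capacity=3, seed=0)
print(buf.prepare_batch([[1.0], [2.0]], [0, 1])[0])  # fit
```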
- predict(X, exploit=False, output_score=False)¶
Selects actions according to this policy for new data.
- Parameters
X (array (n_samples, n_features)) – New observations for which to choose an action according to this policy.
exploit (bool) – Whether to make a prediction according to the policy, or to just choose the arm with the highest expected reward according to current models.
output_score (bool) – Whether to also output the score that this method predicted, e.g. for use with this package’s ‘offpolicy’ and ‘evaluation’ modules.
- Returns
pred – Actions chosen by the policy. If passing output_score=True, it will be a dictionary with the chosen arm and the score that the arm got following this policy with the classifiers used.
- Return type
array (n_samples,) or dict(“choice” : array(n_samples,), “score” : array(n_samples,))
- reset_percentile(percentile=80)¶
Set the upper confidence bound percentile to a custom number
- Parameters
percentile (int [0,100]) – Percentile of the confidence interval to take.
- Returns
self – This object
- Return type
obj
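The sketch below shows what the percentile controls. It assumes, as the method name suggests, that each arm is scored by the given percentile of the predictions from its bootstrapped classifiers, and that the arm with the highest score is chosen. The input format and both helper functions are hypothetical. ‘percentile’ uses linear interpolation, like NumPy’s default.

```python
def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 100]."""
    if not 0 <= q <= 100:
        raise ValueError("percentile must be in [0, 100]")
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def ucb_choice(bootstrap_preds, q=80):
    """Pick the arm whose q-th percentile of bootstrapped predictions is highest.

    `bootstrap_preds` maps arm -> predictions from that arm's bootstrapped
    classifiers for a single observation (hypothetical input format).
    """
    scores = {arm: percentile(p, q) for arm, p in bootstrap_preds.items()}
    return max(scores, key=scores.get), scores


preds = {"a": [0.1, 0.2, 0.3, 0.4, 0.5], "b": [0.3, 0.3, 0.3, 0.3, 0.3]}
print(ucb_choice(preds, q=80)[0])  # a
print(ucb_choice(preds, q=20)[0])  # b
```

Raising the percentile makes arms with widely spread predictions look more attractive, which increases exploration.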
- topN(X, n)¶
Get top-N ranked actions for each observation
Note
This method ranks choices/arms according to what the policy dictates; it is not an exploitation-mode rank, so if, e.g., some observations get random choices, their ranks will be random as well.
- Parameters
X (array (n_samples, n_features)) – New observations for which to rank actions according to this policy.
n (int) – Number of top-ranked actions to output
- Returns
topN – The top-ranked actions for each observation
- Return type
array(n_samples, n)
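The sketch below shows the ranking step, assuming ‘topN’ sorts in descending order the same per-arm scores that ‘decision_function’ returns. ‘top_n’ is a hypothetical helper that works on nested lists rather than arrays.

```python
def top_n(scores, n):
    """Rank arms per row by descending policy score and keep the first n.

    `scores` is a list of rows with one score per arm, shaped like
    decision_function's (n_samples, n_choices) output; returns arm indices.
    """
    n_choices = len(scores[0])
    if not 1 <= n <= n_choices:
        raise ValueError("n must be between 1 and the number of arms")
    return [
        sorted(range(n_choices), key=lambda j: row[j], reverse=True)[:n]
        for row in scores
    ]


print(top_n([[0.1, 0.7, 0.4], [0.9, 0.2, 0.5]], 2))  # [[1, 2], [0, 2]]
```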
EpsilonGreedy¶
- class contextualbandits.online.EpsilonGreedy(base_algorithm, nchoices, explore_prob=0.2, decay=0.9999, beta_prior='auto', smoothing=None, noise_to_smooth=True, batch_train=False, refit_buffer=None, deep_copy_buffer=True, assume_unique_reward=False, random_state=None, njobs=-1)¶
Epsilon Greedy
Takes a random action with probability p, or the action with highest estimated reward with probability 1-p.
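The sketch below shows this selection rule combined with the multiplicative decay described for the ‘decay’ parameter. Fixed reward estimates stand in for the per-arm classifiers. ‘EpsilonGreedySketch’ is a hypothetical toy class, not the package’s implementation, and it handles one observation per call.

```python
import random


class EpsilonGreedySketch:
    """Toy epsilon-greedy chooser over fixed reward estimates (no classifiers).

    With probability p a random arm is chosen, otherwise the arm with the
    highest estimated reward; after each prediction p is multiplied by decay.
    """

    def __init__(self, n_choices, explore_prob=0.2, decay=0.9999, seed=None):
        self.n_choices = n_choices
        self.p = explore_prob
        self.decay = decay
        self.rng = random.Random(seed)

    def choose(self, estimates):
        if self.rng.random() < self.p:
            arm = self.rng.randrange(self.n_choices)
        else:
            arm = max(range(self.n_choices), key=lambda j: estimates[j])
        self.p *= self.decay  # the decay is applied after every prediction
        return arm


policy = EpsilonGreedySketch(n_choices=3, explore_prob=0.2, decay=0.9999, seed=123)
picks = [policy.choose([0.1, 0.6, 0.3]) for _ in range(1000)]
print(picks.count(1) / len(picks))  # mostly arm 1, plus some random exploration
```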
- Parameters
base_algorithm (obj) – Base binary classifier for which each sample for each class will be fit. Will look for, in this order:
A ‘predict_proba’ method with outputs (n_samples, 2), values in [0,1], rows summing to 1
A ‘decision_function’ method with unbounded outputs (n_samples,) to which it will apply a sigmoid function.
A ‘predict’ method with outputs (n_samples,) with values in [0,1].
Can also pass a list with a different (or already-fit) classifier for each arm.
nchoices (int or list-like) – Number of arms/labels to choose from. Can also pass a list, array, or Series with arm names, in which case the outputs from predict will follow these names and arms can be dropped by name, and new ones added with a custom name.
explore_prob (float (0,1)) – Probability of taking a random action at each round.
decay (float (0,1)) – After each prediction, the explore probability is reduced to p = p*decay.
beta_prior (str ‘auto’, None, tuple ((a,b), n), or list[tuple((a,b), n)]) – If not ‘None’, when there are fewer than ‘n’ samples with and without a reward from a given arm, the score for that arm will be predicted as a random number drawn from a beta distribution with the prior specified by ‘a’ and ‘b’. If set to “auto”, it will be calculated as:
beta_prior = ((2/log2(nchoices), 4), 2)
Different priors can also be passed per arm, as a list of tuples. The impact of ‘beta_prior’ on ‘EpsilonGreedy’ is not as high as on other policies in this module. It is recommended to use only one of ‘beta_prior’ or ‘smoothing’.
smoothing (None, tuple (a,b), or list) – If not None, predictions will be smoothed as yhat_smooth = (yhat*n + a)/(n + b), where ‘n’ is the number of times each arm was chosen in the training data. Can also be passed as a list of tuples with different ‘a’ and ‘b’ parameters for each arm (e.g. if there are arm features, these parameters can be determined through a different model). This will not work well with non-probabilistic classifiers such as SVM, in which case you might want to define a class that wraps it with some built-in recalibration. It is recommended to use only one of ‘beta_prior’ or ‘smoothing’.
noise_to_smooth (bool) – If passing ‘smoothing’, whether to add a small amount of random noise \(\sim Uniform(0, 10^{-12})\) in order to break ties at random instead of choosing the smallest arm index. Ignored when passing ‘smoothing=None’.
batch_train (bool) – Whether the base algorithm will be fit to the data in batches as it comes (streaming), or to the whole dataset each time it is refit. Requires a classifier with a ‘partial_fit’ method.
refit_buffer (int or None) – Number of observations per arm to keep as a reserve for passing to ‘partial_fit’. If passed, until a given arm has accumulated at least this number of observations, that arm will keep the observations when calling ‘fit’ and ‘partial_fit’, and will translate calls to ‘partial_fit’ into calls to ‘fit’ with the new plus stored observations. Once the reserve is full, calls to ‘partial_fit’ will enlarge the data batch with the stored observations, and old stored observations will be gradually replaced with new ones (at random, not on a FIFO basis). This technique can greatly enhance performance when fitting the data in batches, but memory consumption can grow quite large. If sparse CSR matrices are passed as input to ‘fit’ and ‘partial_fit’, they will be converted to dense once they go into this reserve, and then converted back to CSR to augment the new data. Calls to ‘fit’ will override this reserve. Ignored when passing ‘batch_train=False’.
deep_copy_buffer (bool) – Whether to make deep copies of the data that is stored in the reserve for ‘refit_buffer’. If ‘False’, while the reserve is not yet full, only shallow copies of the data are stored. This is faster, but it prevents Python’s garbage collector from freeing memory after the data is deleted, and if the original data is overwritten, so will be this buffer. Ignored when not using ‘refit_buffer’.
assume_unique_reward (bool) – Whether to assume that only one arm has a reward per observation. If set to ‘True’, whenever an arm receives a reward, the classifiers for all other arms will also be fit to that observation, with a negative label.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a ‘Generator’ object for random number generation, a ‘RandomState’ object (from NumPy) from which to draw an integer, or a ‘Generator’ object (from NumPy), which will be used directly. While this controls random number generation for this metaheuristic, there can still be other sources of variation upon re-runs, such as data aggregations in parallel (e.g. from OpenMP or BLAS functions).
njobs (int or None) – Number of parallel jobs to run. If None, it will be set to 1; if -1, it will be set to the number of CPU cores. Note that if the base algorithm is itself parallelized, this might result in a slowdown as both compete for available threads, so don’t set parallelization in both. The parallelization uses shared memory, so you will only see a speed-up if your base classifier releases the Python GIL; otherwise, runs will be slower.
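The sketch below shows the ‘beta_prior’ and ‘smoothing’ fallbacks for a single arm and observation. It is a toy sketch under these assumptions:

- ‘arm_score’ is hypothetical, and ‘yhat’ stands for the classifier’s estimate.
- The threshold check reflects one reading of "fewer than ‘n’ samples with and without a reward": the prior is used until both counts reach ‘n’.
- The ‘auto’ prior needs nchoices >= 2, since log2(1) = 0.

```python
import math
import random


def auto_beta_prior(nchoices):
    """The documented 'auto' default: ((2/log2(nchoices), 4), 2)."""
    return ((2 / math.log2(nchoices), 4), 2)


def arm_score(yhat, n_w_rew, n_wo_rew, beta_prior=None, smoothing=None, rng=random):
    """Hypothetical per-arm score combining the two documented fallbacks.

    beta_prior ((a, b), n): while the arm has fewer than n rounds with a
    reward or fewer than n without one, return a Beta(a, b) draw instead.
    smoothing (a, b): (yhat*n + a) / (n + b), with n = times the arm was chosen.
    The documentation recommends passing only one of the two.
    """
    if beta_prior is not None:
        (a, b), thr = beta_prior
        if n_w_rew < thr or n_wo_rew < thr:
            return rng.betavariate(a, b)
    if smoothing is not None:
        a, b = smoothing
        n = n_w_rew + n_wo_rew
        return (yhat * n + a) / (n + b)
    return yhat


print(auto_beta_prior(4))  # ((1.0, 4), 2)
print(arm_score(0.9, 1, 1, smoothing=(1, 2)))  # (0.9*2 + 1) / (2 + 2), about 0.7
```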
References
- 1
Cortes, David. “Adapting multi-armed bandits policies to contextual bandits scenarios.” arXiv preprint arXiv:1811.04383 (2018).
- 2
Yue, Yisong, et al. “The k-armed dueling bandits problem.” Journal of Computer and System Sciences 78.5 (2012): 1538-1556.
- add_arm(arm_name=None, fitted_classifier=None, n_w_rew=0, n_wo_rew=0, smoothing=None, beta_prior=None, refit_buffer_X=None, refit_buffer_r=None, f_grad_norm=None, case_one_class=None)¶
Adds a new arm to the pool of choices
- Parameters
arm_name (object) – Name for this arm. Only applicable when using named arms. If None, will use the name of the last arm plus 1 (will only work when the names are integers).
fitted_classifier (object) – If a classifier has already been fit to rewards coming from this arm, you can pass it here; otherwise, the arm will be started from the same base classifier (‘base_algorithm’) as the initial arms. If using bootstrapped methods or methods from this module which do not accept arbitrary classifiers as input, don’t pass a classifier here (unless using classes such as utils._BootstrappedClassifierBase). If the constructor was called with a different ‘base_algorithm’ per arm, a base classifier must be passed here. Not applicable to the classes that do not take a ‘base_algorithm’.
n_w_rew (int) – Number of trials/rounds with rewards coming from this arm (only used when using a beta prior or smoothing).
n_wo_rew (int) – Number of trials/rounds without rewards coming from this arm (only used when using a beta prior or smoothing).
smoothing (None, tuple (a,b), or list) – Smoothing parameters to use for this arm (see the documentation of the class constructor for details). If None and the ‘smoothing’ passed to the constructor did not have separate entries per arm, the same ‘smoothing’ as was passed to the constructor will be used. If no ‘smoothing’ was passed to the constructor, the ‘smoothing’ here will be ignored. A ‘smoothing’ must be passed here if the constructor was passed a ‘smoothing’ with different entries per arm.
beta_prior (None or tuple((a,b), n)) – Beta prior to use for this arm. See the class’ documentation for details. Must be passed if the constructor was provided different beta priors per arm. If None and the constructor had a single ‘beta_prior’, that same ‘beta_prior’ will be used for this new arm. Note that ‘n_w_rew’ and ‘n_wo_rew’ will be counted towards the threshold ‘n’ here. Cannot be passed if the constructor did not have a ‘beta_prior’.
refit_buffer_X (array(m, n) or None) – Refit buffer of ‘X’ data to use for the new arm. Ignored when using ‘batch_train=False’ or ‘refit_buffer=None’.
refit_buffer_r (array(m,) or None) – Refit buffer of rewards data to use for the new arm. Ignored when using ‘batch_train=False’ or ‘refit_buffer=None’.
f_grad_norm (function) – Gradient calculation function to use for this arm. This is only for the policies that make choices according to active learning criteria, and only for situations in which the policy was passed different functions for each arm.
case_one_class (function) – Gradient workaround function for single-class data. This is only for the policies that make choices according to active learning criteria, and only for situations in which the policy was passed different functions for each arm.
- Returns
self – This object
- Return type
object
- decision_function(X)¶
Get the scores for each arm following this policy’s action-choosing criteria.
- Parameters
X (array (n_samples, n_features)) – Data for which to obtain decision function scores for each arm.
- Returns
scores – Scores following this policy for each arm.
- Return type
array (n_samples, n_choices)
- drop_arm(arm_name)¶
Drop an arm/choice
Drops (removes/deletes) an arm from the set of available choices to the policy.
Note
The available arms, if named, are stored in attribute ‘choice_names’.
- Parameters
arm_name (int or object) – Arm to drop. If passing an integer, will drop at that index (starting at zero). Otherwise, will drop the arm matching this name (argument must be of the same type as the individual entries passed to ‘nchoices’ in the initialization).
- Returns
self – This object
- Return type
object
- fit(X, a, r, warm_start=False, continue_from_last=False)¶
Fits the base algorithm (one per class [and per sample if bootstrapped]) to partially labeled data.
- Parameters
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
warm_start (bool) – Whether to use the results of previous calls to ‘fit’ as a starting point for fitting to the ‘X’ data passed here. This is only available if the base classifier also has a ‘warm_start’ property and that property is set to ‘True’. You can check that it is recognized as such through this object’s ‘has_warm_start’ property. Passing ‘True’ when the classifier doesn’t support warm start despite having the property might slow things down. Dropping arms makes this functionality unavailable. This option is not available for ‘BootstrappedUCB’ nor for ‘BootstrappedTS’.
continue_from_last (bool) – If the policy was previously fit to data, whether to assume that this new call to ‘fit’ continues from the exact same dataset as before, plus new rows appended at the end of ‘X’, ‘a’, ‘r’. In this case, only the models that have new data according to ‘a’ will be refit. Note that the bootstrapped policies will still benefit from extra refits. This option should not be used when there are calls to ‘partial_fit’ between calls to ‘fit’. Ignored if using ‘assume_unique_reward=True’.
- Returns
self – This object
- Return type
obj
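The sketch below shows what ‘assume_unique_reward=True’ (a constructor parameter) means for the data each arm’s classifier sees during ‘fit’. ‘expand_unique_reward’ is a hypothetical helper that returns, for each arm, the row indices and labels it would be trained on.

```python
def expand_unique_reward(a, r, nchoices):
    """Per-arm training rows implied by assume_unique_reward=True.

    Every observation trains the chosen arm with its observed reward; rows
    with a reward of 1 also train every other arm as negatives (label 0).
    Returns {arm: [(row_index, label), ...]}.
    """
    per_arm = {k: [] for k in range(nchoices)}
    for i, (arm, reward) in enumerate(zip(a, r)):
        per_arm[arm].append((i, reward))
        if reward == 1:
            for other in range(nchoices):
                if other != arm:
                    per_arm[other].append((i, 0))
    return per_arm


print(expand_unique_reward([0, 1, 2], [1, 0, 0], 3))
# {0: [(0, 1)], 1: [(0, 0), (1, 0)], 2: [(0, 0), (2, 0)]}
```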
- partial_fit(X, a, r)¶
Fits the base algorithm (one per class) to partially labeled data in batches.
Note
In order to use this method, the base classifier must have a ‘partial_fit’ method, such as ‘sklearn.linear_model.SGDClassifier’. This method is not available for ‘LogisticUCB’, ‘LogisticTS’, ‘PartitionedUCB’, ‘PartitionedTS’.
- Parameters
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
- Returns
self – This object
- Return type
obj
- predict(X, exploit=False, output_score=False)¶
Selects actions according to this policy for new data.
- Parameters
X (array (n_samples, n_features)) – New observations for which to choose an action according to this policy.
exploit (bool) – Whether to make a prediction according to the policy, or to just choose the arm with the highest expected reward according to current models.
output_score (bool) – Whether to also output the score that this method predicted, e.g. for use with this package’s ‘offpolicy’ and ‘evaluation’ modules.
- Returns
pred – Actions chosen by the policy. If passing output_score=True, it will be a dictionary with the chosen arm and the score that the arm got following this policy with the classifiers used.
- Return type
array (n_samples,) or dict(“choice” : array(n_samples,), “score” : array(n_samples,))
- reset_epsilon(explore_prob=0.2)¶
Set the exploration probability to a custom number
- Parameters
explore_prob (float between 0 and 1) – The exploration probability to set. Note that the decay will still be applied after resetting it.
- Returns
self – This object
- Return type
obj
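The decay keeps applying after ‘reset_epsilon’. As a result, the exploration probability k predictions after a reset is p_reset * decay**k, and the number of predictions needed to halve it is ceil(ln 0.5 / ln decay). Both helpers below are hypothetical and only do this arithmetic.

```python
import math


def explore_prob_after(p_reset, decay, k):
    """Exploration probability after k more predictions following a reset."""
    return p_reset * decay ** k


def rounds_to_halve(decay):
    """Smallest number of predictions after which the probability has halved."""
    if not 0 < decay < 1:
        raise ValueError("decay must be in (0, 1)")
    return math.ceil(math.log(0.5) / math.log(decay))


print(rounds_to_halve(0.9999))  # 6932
```

With the default ‘decay=0.9999’, resetting to 0.2 therefore takes roughly 6,900 predictions to fall back to 0.1.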
- topN(X, n)¶
Get top-N ranked actions for each observation
Note
This method ranks choices/arms according to what the policy dictates; it is not an exploitation-mode rank, so if, e.g., some observations get random choices, their ranks will be random as well.
- Parameters
X (array (n_samples, n_features)) – New observations for which to rank actions according to this policy.
n (int) – Number of top-ranked actions to output
- Returns
topN – The top-ranked actions for each observation
- Return type
array(n_samples, n)
ExploreFirst¶
- class contextualbandits.online.ExploreFirst(base_algorithm, nchoices, explore_rounds=2500, prob_active_choice=0.0, active_choice='weighted', f_grad_norm='auto', case_one_class='auto', beta_prior=None, smoothing=None, noise_to_smooth=True, batch_train=False, refit_buffer=None, deep_copy_buffer=True, assume_unique_reward=False, random_state=None, njobs=-1)¶
Explore First, a.k.a. Explore-Then-Exploit
Selects random actions for the first N predictions, after which it selects the best arm only, according to its estimates.
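The sketch below shows the explore-then-exploit switch, with fixed reward estimates standing in for the per-arm classifiers. ‘ExploreFirstSketch’ is a hypothetical toy class. It assumes every observation passed to ‘predict’ counts as one prediction toward ‘explore_rounds’. It also ignores ‘prob_active_choice’, so all explore-mode choices are uniformly random.

```python
import random


class ExploreFirstSketch:
    """Toy explore-then-exploit chooser over fixed reward estimates.

    The first `explore_rounds` predictions pick arms uniformly at random;
    every later prediction picks the arm with the highest estimate.
    """

    def __init__(self, n_choices, explore_rounds=2500, seed=None):
        self.n_choices = n_choices
        self.explore_rounds = explore_rounds
        self.n_predicted = 0
        self.rng = random.Random(seed)

    def choose(self, estimates):
        exploring = self.n_predicted < self.explore_rounds
        self.n_predicted += 1
        if exploring:
            return self.rng.randrange(self.n_choices)
        return max(range(self.n_choices), key=lambda j: estimates[j])


policy = ExploreFirstSketch(n_choices=3, explore_rounds=5, seed=0)
picks = [policy.choose([0.1, 0.9, 0.2]) for _ in range(10)]
print(picks[5:])  # [1, 1, 1, 1, 1]
```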
- Parameters
base_algorithm (obj) – Base binary classifier for which each sample for each class will be fit. Will look for, in this order:
A ‘predict_proba’ method with outputs (n_samples, 2), values in [0,1], rows summing to 1
A ‘decision_function’ method with unbounded outputs (n_samples,) to which it will apply a sigmoid function.
A ‘predict’ method with outputs (n_samples,) with values in [0,1].
Can also pass a list with a different (or already-fit) classifier for each arm.
nchoices (int or list-like) – Number of arms/labels to choose from. Can also pass a list, array, or Series with arm names, in which case the outputs from predict will follow these names and arms can be dropped by name, and new ones added with a custom name.
explore_rounds (int) – Number of rounds to wait before switching to exploitation mode. The switch happens after making this many predictions.
prob_active_choice (float (0, 1)) – Probability of choosing explore-mode actions according to active learning criteria. Pass zero for choosing everything at random.
active_choice (str, one of ‘weighted’, ‘max’ or ‘min’) – How to calculate the gradient that an observation would have on the loss function for each classifier, given that it could be either class (positive or negative) for the classifier that predicts each arm. If weighted, they are weighted by the same probability estimates from the base algorithm.
f_grad_norm (None, str ‘auto’ or function(base_algorithm, X, pred) -> array (n_samples, 2)) – (When passing ‘active_choice’) Function that calculates the row-wise norm of the gradient from observations in X if their class were negative (first column) or positive (second column). Different functions can also be used for each arm, in which case they are passed as a list of functions with length equal to ‘nchoices’. The option ‘auto’ will only work with scikit-learn’s ‘LogisticRegression’, ‘SGDClassifier’ (log-loss only), and ‘RidgeClassifier’; with stochQN’s ‘StochasticLogisticRegression’; and with this package’s ‘LinearRegression’. Ignored when passing ‘prob_active_choice=0’.
case_one_class (str ‘auto’, ‘zero’, None, list, or function(X, n_pos, n_neg, rng) -> array(n_samples, 2)) – (When passing ‘active_choice’) If some arm/choice/class has only rewards of one type, many models will fail to fit, and consequently the gradients will be undefined. Likewise, if the model has not been fit, the gradient might also be undefined, and this requires a workaround.
If passing ‘None’, it is assumed that ‘base_algorithm’ can be fit to data of only-positive or only-negative class without problems, and that it can calculate gradients and predictions with a ‘base_algorithm’ object that has not been fitted. Be aware that the methods ‘predict’, ‘predict_proba’, and ‘decision_function’ in ‘base_algorithm’ might be overwritten with another method that wraps them in a try-catch block, so don’t rely on them producing errors when unfitted.
If passing a function, its output will be taken as the row-wise gradient norms when comparing against other arms/classes, with the first column holding the values if the observations were of the negative class, and the second column if they were of the positive class. The inputs to this function (signature described above) are the data ‘X’, the number of positive and negative examples that have been observed, and a ‘Generator’ object from NumPy to use for generating random numbers.
If passing a list, each entry is assumed to be a function as described above, to be used with the corresponding arm.
If passing ‘auto’, will generate random numbers:
negative: ~ Gamma(log10(n_features) / ((n_pos+1)/(n_pos+n_neg+2)), log10(n_features)).
positive: ~ Gamma(log10(n_features) * (n_pos+1)/(n_pos+n_neg+2), log10(n_features)).
If passing ‘zero’, it will output zero whenever models have not been fitted.
Note that the theoretically correct approach for a logistic regression would be to assume models with all-zero coefficients, in which case the gradient is defined in the absence of any data, but this tends to produce bad end results. Ignored when passing ‘prob_active_choice=0’.
beta_prior (str ‘auto’, None, tuple ((a,b), n), or list[tuple((a,b), n)]) – If not ‘None’, when there are fewer than ‘n’ samples with and without a reward from a given arm, the score for that arm will be predicted as a random number drawn from a beta distribution with the prior specified by ‘a’ and ‘b’. If set to “auto”, it will be calculated as:
beta_prior = ((2/log2(nchoices), 4), 2)
Different priors can also be passed per arm, as a list of tuples. It is recommended to use only one of ‘beta_prior’ or ‘smoothing’.
smoothing (None, tuple (a,b), or list) – If not None, predictions will be smoothed as yhat_smooth = (yhat*n + a)/(n + b), where ‘n’ is the number of times each arm was chosen in the training data. Can also be passed as a list of tuples with different ‘a’ and ‘b’ parameters for each arm (e.g. if there are arm features, these parameters can be determined through a different model). This will not work well with non-probabilistic classifiers such as SVM, in which case you might want to define a class that wraps it with some built-in recalibration. It is recommended to use only one of ‘beta_prior’ or ‘smoothing’.
noise_to_smooth (bool) – If passing ‘smoothing’, whether to add a small amount of random noise \(\sim Uniform(0, 10^{-12})\) in order to break ties at random instead of choosing the smallest arm index. Ignored when passing ‘smoothing=None’.
batch_train (bool) – Whether the base algorithm will be fit to the data in batches as it comes (streaming), or to the whole dataset each time it is refit. Requires a classifier with a ‘partial_fit’ method.
refit_buffer (int or None) – Number of observations per arm to keep as a reserve for passing to ‘partial_fit’. If passed, until a given arm has accumulated at least this number of observations, that arm will keep the observations when calling ‘fit’ and ‘partial_fit’, and will translate calls to ‘partial_fit’ into calls to ‘fit’ with the new plus stored observations. Once the reserve is full, calls to ‘partial_fit’ will enlarge the data batch with the stored observations, and old stored observations will be gradually replaced with new ones (at random, not on a FIFO basis). This technique can greatly enhance performance when fitting the data in batches, but memory consumption can grow quite large. If sparse CSR matrices are passed as input to ‘fit’ and ‘partial_fit’, they will be converted to dense once they go into this reserve, and then converted back to CSR to augment the new data. Calls to ‘fit’ will override this reserve. Ignored when passing ‘batch_train=False’.
deep_copy_buffer (bool) – Whether to make deep copies of the data that is stored in the reserve for ‘refit_buffer’. If ‘False’, while the reserve is not yet full, only shallow copies of the data are stored. This is faster, but it prevents Python’s garbage collector from freeing memory after the data is deleted, and if the original data is overwritten, so will be this buffer. Ignored when not using ‘refit_buffer’.
assume_unique_reward (bool) – Whether to assume that only one arm has a reward per observation. If set to ‘True’, whenever an arm receives a reward, the classifiers for all other arms will also be fit to that observation, with a negative label.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a ‘Generator’ object for random number generation, a ‘RandomState’ object (from NumPy) from which to draw an integer, or a ‘Generator’ object (from NumPy), which will be used directly. While this controls random number generation for this metaheuristic, there can still be other sources of variation upon re-runs, such as data aggregations in parallel (e.g. from OpenMP or BLAS functions).
njobs (int or None) – Number of parallel jobs to run. If None, it will be set to 1; if -1, it will be set to the number of CPU cores. Note that if the base algorithm is itself parallelized, this might result in a slowdown as both compete for available threads, so don’t set parallelization in both. The parallelization uses shared memory, so you will only see a speed-up if your base classifier releases the Python GIL; otherwise, runs will be slower.
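For a logistic regression, the gradient of the log-loss with respect to the coefficients at an observation (x, y) is (p - y) * x, where p is the predicted probability. Its norm is therefore p * ||x|| if the label were negative and (1 - p) * ||x|| if it were positive. The sketch below combines these two values according to ‘active_choice’ and picks the arm whose model would receive the largest gradient. It rests on these assumptions:

- It assumes that largest-gradient rule.
- It ignores the intercept.
- Its hypothetical helpers use a simpler signature than the documented ‘f_grad_norm(base_algorithm, X, pred)’.

```python
import math


def logistic_grad_norms(p, x):
    """Gradient norms of the log-loss for one logistic-regression prediction.

    The gradient at (x, y) is (p - y) * x, so its norm is p * ||x|| for y = 0
    and (1 - p) * ||x|| for y = 1 (intercept ignored). Returns
    (negative_case, positive_case), like the two documented columns.
    """
    norm_x = math.sqrt(sum(v * v for v in x))
    return p * norm_x, (1 - p) * norm_x


def active_arm(probs, x, active_choice="weighted"):
    """Pick the arm whose model would receive the largest gradient from x.

    `probs[k]` is arm k's estimated reward probability for x (hypothetical input).
    """
    def criterion(p):
        g_neg, g_pos = logistic_grad_norms(p, x)
        if active_choice == "weighted":
            return (1 - p) * g_neg + p * g_pos
        if active_choice == "max":
            return max(g_neg, g_pos)
        if active_choice == "min":
            return min(g_neg, g_pos)
        raise ValueError("active_choice must be 'weighted', 'max' or 'min'")

    return max(range(len(probs)), key=lambda k: criterion(probs[k]))


probs, x = [0.1, 0.5, 0.95], [1.0, 2.0]
print(active_arm(probs, x))         # 1
print(active_arm(probs, x, "max"))  # 2
```

With ‘weighted’, the criterion simplifies to 2 * p * (1 - p) * ||x||. Arms whose predicted probability is closest to 0.5 are therefore favored.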
References
- 1
Cortes, David. “Adapting multi-armed bandits policies to contextual bandits scenarios.” arXiv preprint arXiv:1811.04383 (2018).
- add_arm(arm_name=None, fitted_classifier=None, n_w_rew=0, n_wo_rew=0, smoothing=None, beta_prior=None, refit_buffer_X=None, refit_buffer_r=None, f_grad_norm=None, case_one_class=None)¶
Adds a new arm to the pool of choices
- Parameters
arm_name (object) – Name for this arm. Only applicable when using named arms. If None, will use the name of the last arm plus 1 (will only work when the names are integers).
fitted_classifier (object) – If a classifier has already been fit to rewards coming from this arm, you can pass it here; otherwise, the arm will be started from the same base classifier (‘base_algorithm’) as the initial arms. If using bootstrapped methods or methods from this module which do not accept arbitrary classifiers as input, don’t pass a classifier here (unless using classes such as utils._BootstrappedClassifierBase). If the constructor was called with a different ‘base_algorithm’ per arm, a base classifier must be passed here. Not applicable to the classes that do not take a ‘base_algorithm’.
n_w_rew (int) – Number of trials/rounds with rewards coming from this arm (only used when using a beta prior or smoothing).
n_wo_rew (int) – Number of trials/rounds without rewards coming from this arm (only used when using a beta prior or smoothing).
smoothing (None, tuple (a,b), or list) – Smoothing parameters to use for this arm (see the documentation of the class constructor for details). If None and the ‘smoothing’ passed to the constructor did not have separate entries per arm, the same ‘smoothing’ as was passed to the constructor will be used. If no ‘smoothing’ was passed to the constructor, the ‘smoothing’ here will be ignored. A ‘smoothing’ must be passed here if the constructor was passed a ‘smoothing’ with different entries per arm.
beta_prior (None or tuple((a,b), n)) – Beta prior to use for this arm. See the class’ documentation for details. Must be passed if the constructor was provided different beta priors per arm. If None and the constructor had a single ‘beta_prior’, that same ‘beta_prior’ will be used for this new arm. Note that ‘n_w_rew’ and ‘n_wo_rew’ will be counted towards the threshold ‘n’ here. Cannot be passed if the constructor did not have a ‘beta_prior’.
refit_buffer_X (array(m, n) or None) – Refit buffer of ‘X’ data to use for the new arm. Ignored when using ‘batch_train=False’ or ‘refit_buffer=None’.
refit_buffer_r (array(m,) or None) – Refit buffer of rewards data to use for the new arm. Ignored when using ‘batch_train=False’ or ‘refit_buffer=None’.
f_grad_norm (function) – Gradient calculation function to use for this arm. This is only for the policies that make choices according to active learning criteria, and only for situations in which the policy was passed different functions for each arm.
case_one_class (function) – Gradient workaround function for single-class data. This is only for the policies that make choices according to active learning criteria, and only for situations in which the policy was passed different functions for each arm.
- Returns
self – This object
- Return type
object
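A custom ‘case_one_class’ function follows the signature function(X, n_pos, n_neg, rng) -> array(n_samples, 2). The sketch below reproduces the ‘auto’ rule in that shape. It reads the negative-case formula as log10(n_features) divided by (n_pos+1)/(n_pos+n_neg+2). To stay dependency-free, it returns nested lists and calls Python’s random.Random.gammavariate(shape, scale). The policy itself passes a NumPy Generator, whose equivalent call is rng.gamma(shape, scale, size). Both helper names are hypothetical. At least two features are required so that log10(n_features) is positive.

```python
import math
import random


def auto_gamma_params(n_features, n_pos, n_neg):
    """Gamma shape/scale parameters of the documented 'auto' rule.

    With smooth_prop = (n_pos+1)/(n_pos+n_neg+2):
    negative ~ Gamma(log10(n_features) / smooth_prop, log10(n_features)),
    positive ~ Gamma(log10(n_features) * smooth_prop, log10(n_features)).
    """
    if n_features < 2:
        raise ValueError("need at least 2 features so that log10(n_features) > 0")
    c = math.log10(n_features)
    smooth_prop = (n_pos + 1) / (n_pos + n_neg + 2)
    return c / smooth_prop, c * smooth_prop, c


def case_one_class_auto(X, n_pos, n_neg, rng):
    """Row-wise stand-in gradient norms: [negative_case, positive_case] per row.

    `rng` is a random.Random here, standing in for the NumPy Generator
    that the policy would pass.
    """
    shape_neg, shape_pos, scale = auto_gamma_params(len(X[0]), n_pos, n_neg)
    return [[rng.gammavariate(shape_neg, scale), rng.gammavariate(shape_pos, scale)]
            for _ in X]


print(auto_gamma_params(10, 0, 0))  # (2.0, 0.5, 1.0)
```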
- decision_function(X)¶
Get the scores for each arm following this policy’s action-choosing criteria.
- Parameters
X (array (n_samples, n_features)) – Data for which to obtain decision function scores for each arm.
- Returns
scores – Scores following this policy for each arm.
- Return type
array (n_samples, n_choices)
- drop_arm(arm_name)¶
Drop an arm/choice
Drops (removes/deletes) an arm from the set of available choices to the policy.
Note
The available arms, if named, are stored in attribute ‘choice_names’.
- Parameters
arm_name (int or object) – Arm to drop. If passing an integer, will drop at that index (starting at zero). Otherwise, will drop the arm matching this name (argument must be of the same type as the individual entries passed to ‘nchoices’ in the initialization).
- Returns
self – This object
- Return type
object
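The lookup rule for arm_name (integers are positions, anything else is matched by name) can be sketched on a plain list standing in for the ‘choice_names’ attribute. drop_arm_sketch is a hypothetical helper written for illustration; the real method also discards the model fitted for that arm, which is not shown here.

```python
from typing import Hashable, List


def drop_arm_sketch(choice_names: List[Hashable], arm_name) -> List[Hashable]:
    """Return a copy of 'choice_names' without the requested arm (hypothetical helper)."""
    if isinstance(arm_name, int) and not isinstance(arm_name, bool):
        # Integers are read as zero-based positions, per the documented rule.
        if not 0 <= arm_name < len(choice_names):
            raise IndexError("arm index out of range")
        idx = arm_name
    else:
        # Anything else is matched against the stored names (ValueError if absent).
        idx = choice_names.index(arm_name)
    return choice_names[:idx] + choice_names[idx + 1:]
```

Under this rule, arms whose names are themselves integers would be dropped by position, not by label.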
- fit(X, a, r, warm_start=False, continue_from_last=False)¶
Fits the base algorithm (one per class [and per sample if bootstrapped]) to partially labeled data.
- Parameters
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
warm_start (bool) – Whether to use the results of previous calls to ‘fit’ as a start for fitting to the ‘X’ data passed here. This will only be available if the base classifier has a property warm_start too and that property is also set to ‘True’. You can double-check that it is recognized as such by checking this object’s property has_warm_start. Passing ‘True’ when the classifier doesn’t support warm start despite having the property might slow things down. Dropping arms will make this functionality unavailable. This option is not available for ‘BootstrappedUCB’, nor for ‘BootstrappedTS’.
continue_from_last (bool) – If the policy was previously fit to data, whether to assume that this new call to ‘fit’ will continue from the exact same dataset as before plus new rows appended at the end of ‘X’, ‘a’, ‘r’. In this case, only the models that have new data according to ‘a’ will be refit. Note that the bootstrapped policies will still benefit from extra refits. This option should not be used when there are calls to ‘partial_fit’ between calls to ‘fit’. Ignored if using assume_unique_reward=True.
- Returns
self – This object
- Return type
obj
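Since fit trains one model per arm on partially labeled data, each arm only sees the rows where it was the chosen action, plus, with assume_unique_reward=True, the rows where a different arm was rewarded, labeled as negatives. The sketch below builds those per-arm training sets with NumPy, assuming arms are encoded as integers 0..nchoices-1. split_by_arm is a hypothetical helper, not the package's internal routine, and it ignores bootstrapping and refit buffers.

```python
import numpy as np


def split_by_arm(X, a, r, nchoices, assume_unique_reward=False):
    """Return a list of (X_arm, y_arm) pairs, one per arm (hypothetical helper)."""
    X, a, r = np.asarray(X), np.asarray(a), np.asarray(r)
    datasets = []
    for arm in range(nchoices):
        chosen = a == arm                     # rows where this arm was played
        X_arm, y_arm = X[chosen], r[chosen]   # its observed rewards are the labels
        if assume_unique_reward:
            # A reward for another arm implies no reward for this one.
            others_rewarded = (a != arm) & (r == 1)
            X_arm = np.vstack([X_arm, X[others_rewarded]])
            y_arm = np.concatenate(
                [y_arm, np.zeros(int(others_rewarded.sum()), dtype=r.dtype)]
            )
        datasets.append((X_arm, y_arm))
    return datasets
```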
- partial_fit(X, a, r)¶
Fits the base algorithm (one per class) to partially labeled data in batches.
Note
In order to use this method, the base classifier must have a ‘partial_fit’ method, such as ‘sklearn.linear_model.SGDClassifier’. This method is not available for ‘LogisticUCB’, ‘LogisticTS’, ‘PartitionedUCB’, ‘PartitionedTS’.
- Parameters
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
- Returns
self – This object
- Return type
obj
- predict(X, exploit=False)¶
Selects actions according to this policy for new data.
- Parameters
X (array (n_samples, n_features)) – New observations for which to choose an action according to this policy.
exploit (bool) – Whether to make a prediction according to the policy, or to just choose the arm with the highest expected reward according to current models.
- Returns
pred – Actions chosen by the policy.
- Return type
array (n_samples,)
- reset_active_choice(active_choice='weighted')¶
Set the active gradient criteria to a custom form
- Parameters
active_choice (str in {‘min’, ‘max’, ‘weighted’}) – How to calculate the gradient that an observation would have on the loss function for each classifier, given that it could be either class (positive or negative) for the classifier that predicts each arm. If weighted, they are weighted by the same probability estimates from the base algorithm.
- Returns
self – This object
- Return type
obj
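For a logistic-regression base model without intercept, the gradient of the log-loss for one observation x with label y is (p - y)·x, where p is the predicted probability. Its norm is therefore (1 - p)·||x|| if y = 1 and p·||x|| if y = 0. The sketch below turns that into the three active_choice criteria. It assumes plain unregularized coefficients and that the arm with the largest criterion is the one picked in active-learning rounds. active_scores is a hypothetical function; the package ships its own gradient functions, which may differ in details such as regularization.

```python
import numpy as np


def active_scores(X, coefs, active_choice="weighted"):
    """Gradient-norm criterion per (row, arm) for logistic-regression arms.

    X: (n_samples, n_features); coefs: (n_arms, n_features).
    """
    X = np.asarray(X, dtype=float)
    p = 1.0 / (1.0 + np.exp(-(X @ np.asarray(coefs, dtype=float).T)))  # (n, n_arms)
    x_norm = np.linalg.norm(X, axis=1, keepdims=True)                    # (n, 1)
    grad_if_pos = (1.0 - p) * x_norm  # ||(p - 1) x|| if the label were 1
    grad_if_neg = p * x_norm          # ||(p - 0) x|| if the label were 0
    if active_choice == "min":
        return np.minimum(grad_if_pos, grad_if_neg)
    if active_choice == "max":
        return np.maximum(grad_if_pos, grad_if_neg)
    if active_choice == "weighted":
        # Each label's gradient weighted by the model's own probability of it.
        return p * grad_if_pos + (1.0 - p) * grad_if_neg
    raise ValueError("active_choice must be 'min', 'max' or 'weighted'")
```

An uncertain arm (p near 0.5) gets a large "weighted" score, while a confident one gets a score near zero, so the active rounds favor arms whose models would change the most.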
- reset_count()¶
Resets the counter for exploitation mode
- Return type
self
- topN(X, n)¶
Get top-N ranked actions for each observation
Note
This method will rank choices/arms according to what the policy dictates - it is not an exploitation-mode rank, so if e.g. there are random choices for some observations, there will be random ranks in here.
- Parameters
X (array (n_samples, n_features)) – New observations for which to rank actions according to this policy.
n (int) – Number of top-ranked actions to output
- Returns
topN – The top-ranked actions for each observation
- Return type
array(n_samples, n)
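If the policy's scores for each arm are available as a matrix (as returned by decision_function), a top-N ranking amounts to sorting each row in descending order. This sketch assumes that correspondence and returns column indices; with named arms, those indices would still need to be mapped to ‘choice_names’. top_n_from_scores is a hypothetical name.

```python
import numpy as np


def top_n_from_scores(scores, n):
    """Indices of the n highest-scoring arms per row, best first."""
    scores = np.asarray(scores)
    # Sorting the negated scores gives descending order; keep the first n columns.
    return np.argsort(-scores, axis=1, kind="stable")[:, :n]
```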
LinTS¶
- class contextualbandits.online.LinTS(nchoices, lambda_=1.0, fit_intercept=True, v_sq=1.0, sample_from='coef', n_presampled=None, sample_unique=True, use_float=False, method='chol', beta_prior=None, smoothing=None, noise_to_smooth=True, assume_unique_reward=False, random_state=None, njobs=1)¶
Linear Thompson Sampling
Note
This strategy requires each fitted model to store a square matrix with dimension equal to the number of features. Thus, memory consumption can grow very high with this method.
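To put the memory note in numbers: with d features, one d x d matrix per arm costs nchoices * d^2 entries. The helper below is a lower-bound estimate under two assumptions: that fit_intercept=True adds one dimension, and that only one such matrix is stored per arm (the implementation may keep more). Entry sizes follow use_float: 4 bytes for C float, 8 for C double. lin_model_matrix_bytes is hypothetical.

```python
def lin_model_matrix_bytes(nchoices, n_features, fit_intercept=True, use_float=False):
    """Lower-bound bytes for one d x d matrix per arm (hypothetical helper)."""
    d = n_features + (1 if fit_intercept else 0)  # assumed: intercept adds one dimension
    bytes_per_entry = 4 if use_float else 8       # C float vs C double
    return nchoices * d * d * bytes_per_entry
```

For example, 100 arms with 10,000 features and no intercept in double precision need at least 8e10 bytes (about 80 GB) for these matrices alone.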
Note
The ‘X’ data (covariates) should ideally be centered before passing them to ‘fit’, ‘partial_fit’, ‘predict’.
Note
Be aware that sampling coefficients is an operation that scales poorly with the number of columns/features/variables. For wide datasets, it might be slower than a bootstrapped approach, especially when using sample_unique=True.
- Parameters
nchoices (int or list-like) – Number of arms/labels to choose from. Can also pass a list, array, or Series with arm names, in which case the outputs from predict will follow these names and arms can be dropped by name, and new ones added with a custom name.
lambda_ (float > 0) – Regularization parameter. References assumed this would always be equal to 1, but this implementation allows changing it.
fit_intercept (bool) – Whether to add an intercept term to the coefficients.
v_sq (float) – Parameter by which to multiply the covariance matrix (more means higher variance). It is recommended to decrease it from the default value of 1.
sample_from (str, one of “coef”, “ci”) – Whether to make predictions by sampling the model coefficients or by sampling the predicted value from an interval centered around the coefficients. If sampling from the coefficients, it’s highly recommended to use method="chol", as it will be faster and more precise.
n_presampled (None or int) – If sampling from coefficients, this denotes a number of coefficients to pre-sample after calling ‘fit’ and/or ‘partial_fit’, which will be used later in the predictions. Pre-sampling a large number of coefficients can help to speed up predictions at the expense of longer fitting times, and is recommended if there is a large number of predictions between calls to ‘fit’ or ‘partial_fit’. If passing ‘None’ (the default), it will not pre-sample a finite number of coefficients at fitting time, but will rather sample (different) coefficients in calls to ‘predict’. Ignored when passing sample_from="ci".
sample_unique (bool) – Whether to sample different coefficients each time a prediction is to be made. If passing ‘False’, when calling ‘predict’, it will sample the same coefficients for all the observations in the same call to ‘predict’, whereas if passing ‘True’, it will use a different set of coefficients for each observation. Passing ‘False’ leads to an approach which is theoretically wrong, but as sampling coefficients can be very slow, using ‘False’ can provide a reasonable speed-up without much of a performance penalty. Ignored when passing sample_from="ci" or n_presampled.
use_float (bool) – Whether to use C ‘float’ type for the required matrices. If passing ‘False’, will use C ‘double’. Be aware that memory usage for this model can grow very large, and that it is more prone to suffer from numeric precision problems compared to its UCB counterpart.
method (str, one of ‘chol’ or ‘sm’) – Method used to fit the model. Options are:
'chol'
:Uses the Cholesky decomposition to solve the linear system from the least-squares closed-form each time ‘fit’ or ‘partial_fit’ is called. This is likely to be faster when fitting the model to a large number of observations at once, and is able to better exploit multi-threading.
'sm'
:Starts with an inverse diagonal matrix and updates it as each new observation comes using the Sherman-Morrison formula, thus never explicitly solving the linear system, nor needing to calculate a matrix inverse. This is likely to be faster when fitting the model to small batches of observations. Be aware that with this method, it will add regularization to the intercept if passing ‘fit_intercept=True’.
Note that, even when using “sm” here, if sampling from the coefficients, it will need to calculate eigenvalues of the covariance or inverse covariance matrix after each update, so it won’t be as fast as LinUCB.
beta_prior (str ‘auto’, None, tuple ((a,b), n), or list[tuple((a,b), n)]) – If not ‘None’, when there are fewer than ‘n’ samples with and without a reward from a given arm, it will predict the score for that class as a random number drawn from a beta distribution with the prior specified by ‘a’ and ‘b’. If set to “auto”, will be calculated as:
beta_prior = ((2/log2(nchoices), 4), 2)
Can also pass different priors per arm, in which case they should be passed as a list of tuples. This parameter can have a very large impact in the end results, and it’s recommended to tune it accordingly - scenarios with low expected reward rates should have priors that result in drawing small random numbers, whereas scenarios with large expected reward rates should have stronger priors and tend towards larger random numbers. Also, the more arms there are, the smaller the optimal expected value for these random numbers.
smoothing (None, tuple (a,b), or list) – If not None, predictions will be smoothed as yhat_smooth = (yhat*n + a)/(n + b), where ‘n’ is the number of times each arm was chosen in the training data. Can also pass it as a list of tuples with different ‘a’ and ‘b’ parameters for each arm (e.g. if there are arm features, these parameters can be determined through a different model). It is recommended to use only one of beta_prior or smoothing. Note that it is technically incorrect to apply smoothing like this (because the predictions from models are not bounded between zero and one), but if neither beta_prior nor smoothing is passed, the policy can get stuck in situations in which it will only choose actions from the first batch of observations to which it is fit.
noise_to_smooth (bool) – If passing smoothing, whether to add a small amount of random noise \(\sim Uniform(0, 10^{-12})\) in order to break ties at random instead of choosing the smallest arm index. Ignored when passing smoothing=None.
assume_unique_reward (bool) – Whether to assume that only one arm has a reward per observation. If set to ‘True’, whenever an arm receives a reward, the classifiers for all other arms will be fit to that observation too, with a negative label.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a Generator object for random number generation, a RandomState object (from NumPy) from which to draw an integer, or a Generator object (from NumPy), which will be used directly. While this controls random number generation for this metaheuristic, there can still be other sources of variation upon re-runs, such as data aggregations in parallel (e.g. from OpenMP or BLAS functions).
njobs (int or None) – Number of parallel jobs to run. If passing None, it will be set to 1. If passing -1, it will be set to the number of CPU cores. Be aware that the algorithm will use BLAS function calls, and if these have multi-threading enabled, it might result in a slow-down as both compete for available threads.
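The coefficient-sampling path (sample_from="coef" with method="chol") can be sketched for a single arm as follows, assuming the ridge posterior from the Agrawal and Goyal reference: precision A = lambda_*I + X'X, mean mu = A^{-1} X'r, and covariance v_sq*A^{-1}. With the Cholesky factor A = LL', a draw mu + sqrt(v_sq)*L'^{-1}z (z standard normal) has exactly that covariance. This is an illustrative NumPy version without intercept, beta_prior, or pre-sampling, not the package's implementation. It also shows why sample_unique=True is costly: it draws one d-dimensional coefficient vector per row. lints_sample_scores is hypothetical.

```python
import numpy as np


def lints_sample_scores(X_train, r_train, X_new, lambda_=1.0, v_sq=1.0,
                        sample_unique=True, rng=None):
    """Thompson-sampling scores for ONE arm under a ridge-regression posterior."""
    rng = np.random.default_rng(rng)
    X_train = np.asarray(X_train, dtype=float)
    X_new = np.asarray(X_new, dtype=float)
    d = X_train.shape[1]
    A = lambda_ * np.eye(d) + X_train.T @ X_train        # posterior precision
    b = X_train.T @ np.asarray(r_train, dtype=float)
    L = np.linalg.cholesky(A)                             # A = L @ L.T
    mu = np.linalg.solve(L.T, np.linalg.solve(L, b))      # mu = A^{-1} b
    n_draws = X_new.shape[0] if sample_unique else 1
    z = rng.standard_normal((d, n_draws))
    # L^{-T} z has covariance A^{-1}; scaling by sqrt(v_sq) gives v_sq * A^{-1}.
    coefs = mu[:, None] + np.sqrt(v_sq) * np.linalg.solve(L.T, z)  # (d, n_draws)
    if sample_unique:
        return np.einsum("ij,ji->i", X_new, coefs)       # row i uses draw i
    return X_new @ coefs[:, 0]                            # one draw shared by all rows
```

Setting v_sq=0 collapses every draw onto the ridge solution, which is a convenient sanity check. In a full policy, each arm would produce such a score and the arm with the highest one would be chosen.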
References
- [1] Agrawal, Shipra, and Navin Goyal. “Thompson sampling for contextual bandits with linear payoffs.” International Conference on Machine Learning. 2013.
- add_arm(arm_name=None, fitted_classifier=None, n_w_rew=0, n_wo_rew=0, smoothing=None, beta_prior=None, refit_buffer_X=None, refit_buffer_r=None, f_grad_norm=None, case_one_class=None)¶
Adds a new arm to the pool of choices
- Parameters
arm_name (object) – Name for this arm. Only applicable when using named arms. If None, will use the name of the last arm plus 1 (will only work when the names are integers).
fitted_classifier (object) – If a classifier has already been fit to rewards coming from this arm, you can pass it here; otherwise, it will be started from the same ‘base_classifier’ as the initial arms. If using bootstrapped methods or methods from this module which do not accept arbitrary classifiers as input, don’t pass a classifier here (unless using classes such as utils._BootstrappedClassifierBase). If the constructor was called with a different base_algorithm per arm, a base classifier must be passed here. Not applicable for the classes that do not take a base_algorithm.
n_w_rew (int) – Number of trials/rounds with rewards coming from this arm (only used when using a beta prior or smoothing).
n_wo_rew (int) – Number of trials/rounds without rewards coming from this arm (only used when using a beta prior or smoothing).
smoothing (None, tuple (a,b), or list) – Smoothing parameters to use for this arm (see the documentation of the class constructor for details). If None and the smoothing passed to the constructor did not have separate entries per arm, the same smoothing as was passed to the constructor will be used. If no smoothing was passed to the constructor, the smoothing passed here will be ignored. A smoothing must be passed here if the constructor was passed a smoothing with different entries per arm.
beta_prior (None or tuple((a,b), n)) – Beta prior to use for this arm. See the class documentation for details. Must be passed if the constructor was provided different beta priors per arm. If None and the constructor had a single beta_prior, that same beta_prior will be used for this new arm. Note that n_w_rew and n_wo_rew will be counted towards the threshold ‘n’ here. Cannot be passed if the constructor did not have a beta_prior.
refit_buffer_X (array(m, n) or None) – Refit buffer of ‘X’ data to use for the new arm. Ignored when using ‘batch_train=False’ or ‘refit_buffer=None’.
refit_buffer_r (array(m,) or None) – Refit buffer of rewards data to use for the new arm. Ignored when using ‘batch_train=False’ or ‘refit_buffer=None’.
f_grad_norm (function) – Gradient calculation function to use for this arm. This is only for the policies that make choices according to active learning criteria, and only for situations in which the policy was passed different functions for each arm.
case_one_class (function) – Gradient workaround function for single-class data. This is only for the policies that make choices according to active learning criteria, and only for situations in which the policy was passed different functions for each arm.
- Returns
self – This object
- Return type
object
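The smoothing and beta_prior arguments of add_arm defer to the formulas in the class constructor. As a quick numeric check of those documented formulas, the sketch below evaluates the “auto” beta prior (numerator 2 for LinTS, 3 for LinUCB) and the smoothing rule yhat_smooth = (yhat*n + a)/(n + b). Both function names are hypothetical; they are not part of the package.

```python
import math


def auto_beta_prior(nchoices, numerator):
    """'auto' beta_prior as documented: ((numerator / log2(nchoices), 4), 2)."""
    return ((numerator / math.log2(nchoices), 4), 2)


def smooth(yhat, n, a, b):
    """Documented smoothing: (yhat * n + a) / (n + b)."""
    return (yhat * n + a) / (n + b)


print(auto_beta_prior(8, 2))   # LinTS default for 8 arms
print(auto_beta_prior(8, 3))   # LinUCB default for 8 arms
print(smooth(0.9, 0, 1.0, 2.0), smooth(0.9, 100, 1.0, 2.0))
```

With n = 0 the smoothed score is a/b regardless of the model output, and it moves towards yhat as n grows. The “auto” formula is undefined for nchoices=1, since log2(1) = 0.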
- decision_function(X)¶
Get the scores for each arm following this policy’s action-choosing criteria.
- Parameters
X (array (n_samples, n_features)) – Data for which to obtain decision function scores for each arm.
- Returns
scores – Scores following this policy for each arm.
- Return type
array (n_samples, n_choices)
- drop_arm(arm_name)¶
Drop an arm/choice
Drops (removes/deletes) an arm from the set of available choices to the policy.
Note
The available arms, if named, are stored in attribute ‘choice_names’.
- Parameters
arm_name (int or object) – Arm to drop. If passing an integer, will drop at that index (starting at zero). Otherwise, will drop the arm matching this name (argument must be of the same type as the individual entries passed to ‘nchoices’ in the initialization).
- Returns
self – This object
- Return type
object
- fit(X, a, r, warm_start=False, continue_from_last=False)¶
Fits the base algorithm (one per class [and per sample if bootstrapped]) to partially labeled data.
- Parameters
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
warm_start (bool) – Whether to use the results of previous calls to ‘fit’ as a start for fitting to the ‘X’ data passed here. This will only be available if the base classifier has a property warm_start too and that property is also set to ‘True’. You can double-check that it is recognized as such by checking this object’s property has_warm_start. Passing ‘True’ when the classifier doesn’t support warm start despite having the property might slow things down. Dropping arms will make this functionality unavailable. This option is not available for ‘BootstrappedUCB’, nor for ‘BootstrappedTS’.
continue_from_last (bool) – If the policy was previously fit to data, whether to assume that this new call to ‘fit’ will continue from the exact same dataset as before plus new rows appended at the end of ‘X’, ‘a’, ‘r’. In this case, only the models that have new data according to ‘a’ will be refit. Note that the bootstrapped policies will still benefit from extra refits. This option should not be used when there are calls to ‘partial_fit’ between calls to ‘fit’. Ignored if using assume_unique_reward=True.
- Returns
self – This object
- Return type
obj
- partial_fit(X, a, r)¶
Fits the base algorithm (one per class) to partially labeled data in batches.
Note
In order to use this method, the base classifier must have a ‘partial_fit’ method, such as ‘sklearn.linear_model.SGDClassifier’. This method is not available for ‘LogisticUCB’, ‘LogisticTS’, ‘PartitionedUCB’, ‘PartitionedTS’.
- Parameters
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
- Returns
self – This object
- Return type
obj
- predict(X, exploit=False, output_score=False)¶
Selects actions according to this policy for new data.
- Parameters
X (array (n_samples, n_features)) – New observations for which to choose an action according to this policy.
exploit (bool) – Whether to make a prediction according to the policy, or to just choose the arm with the highest expected reward according to current models.
output_score (bool) – Whether to output the score that this method predicted, in case it is desired to use it with this package’s off-policy and evaluation modules.
- Returns
pred – Actions chosen by the policy. If passing output_score=True, it will be a dictionary with the chosen arm and the score that the arm got following this policy with the classifiers used.
- Return type
array (n_samples,) or dict(“choice” : array(n_samples,), “score” : array(n_samples,))
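When output_score=True, the return value pairs each chosen arm with its score. Assuming the chosen arm is the highest-scoring column of the policy's score matrix and the reported score is that column's value, the dictionary can be reproduced from a score matrix as below. choice_and_score is a hypothetical helper; for LinTS the scores would be the sampled values for that call.

```python
import numpy as np


def choice_and_score(scores):
    """Build a {"choice", "score"} dict from an (n_samples, n_choices) score matrix."""
    scores = np.asarray(scores)
    choice = scores.argmax(axis=1)                                  # best arm per row
    score = np.take_along_axis(scores, choice[:, None], axis=1)[:, 0]
    return {"choice": choice, "score": score}
```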
- reset_alpha(alpha=1.0)¶
Set the upper confidence bound parameter to a custom number
Note
This method is only for LinUCB, not for LinTS.
- Parameters
alpha (float) – Parameter to control the upper confidence bound (more is higher).
- Returns
self – This object
- Return type
obj
- reset_v_sq(v_sq=1.0)¶
Set the covariance multiplier to a custom number
- Parameters
v_sq (float) – Parameter by which to multiply the covariance matrix (more means higher variance).
- Returns
self – This object
- Return type
obj
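Because v_sq multiplies the covariance, the spread of sampled scores scales with sqrt(v_sq). The sketch below shows this for the sample_from="ci" style of draw, assuming the predicted value for a row x is sampled from a normal with mean x'A^{-1}b and variance v_sq*x'A^{-1}x, where A is the arm's precision matrix (lambda_*I + X'X) and b = X'r. lints_ci_scores is hypothetical.

```python
import numpy as np


def lints_ci_scores(A, b, X_new, v_sq=1.0, rng=None):
    """Sample predicted values (not coefficients) for one arm."""
    rng = np.random.default_rng(rng)
    X_new = np.asarray(X_new, dtype=float)
    A_inv = np.linalg.inv(A)
    mean = X_new @ np.linalg.solve(A, b)
    var = v_sq * np.einsum("ij,jk,ik->i", X_new, A_inv, X_new)  # x' A^{-1} x per row
    return mean + np.sqrt(var) * rng.standard_normal(X_new.shape[0])
```

With the same random draws, quadrupling v_sq doubles every deviation from the mean.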
- topN(X, n)¶
Get top-N ranked actions for each observation
Note
This method will rank choices/arms according to what the policy dictates - it is not an exploitation-mode rank, so if e.g. there are random choices for some observations, there will be random ranks in here.
- Parameters
X (array (n_samples, n_features)) – New observations for which to rank actions according to this policy.
n (int) – Number of top-ranked actions to output
- Returns
topN – The top-ranked actions for each observation
- Return type
array(n_samples, n)
LinUCB¶
- class contextualbandits.online.LinUCB(nchoices, alpha=1.0, lambda_=1.0, fit_intercept=True, use_float=True, method='sm', ucb_from_empty=True, beta_prior=None, smoothing=None, noise_to_smooth=True, assume_unique_reward=False, random_state=None, njobs=1)¶
Note
This strategy requires each fitted model to store a square matrix with dimension equal to the number of features. Thus, memory consumption can grow very high with this method.
Note
The ‘X’ data (covariates) should ideally be centered before passing them to ‘fit’, ‘partial_fit’, ‘predict’.
Note
The default hyperparameters here are meant to match the original reference, but it’s recommended to change them. Particularly: use beta_prior instead of ucb_from_empty, decrease alpha, and maybe increase lambda_.
- Parameters
nchoices (int or list-like) – Number of arms/labels to choose from. Can also pass a list, array, or Series with arm names, in which case the outputs from predict will follow these names and arms can be dropped by name, and new ones added with a custom name.
alpha (float) – Parameter to control the upper confidence bound (more is higher).
lambda_ (float > 0) – Regularization parameter. References assumed this would always be equal to 1, but this implementation allows changing it.
fit_intercept (bool) – Whether to add an intercept term to the coefficients.
use_float (bool) – Whether to use C ‘float’ type for the required matrices. If passing ‘False’, will use C ‘double’. Be aware that memory usage for this model can grow very large.
method (str, one of ‘chol’ or ‘sm’) – Method used to fit the model. Options are:
'chol'
:Uses the Cholesky decomposition to solve the linear system from the least-squares closed-form each time ‘fit’ or ‘partial_fit’ is called. This is likely to be faster when fitting the model to a large number of observations at once, and is able to better exploit multi-threading.
'sm'
:Starts with an inverse diagonal matrix and updates it as each new observation comes using the Sherman-Morrison formula, thus never explicitly solving the linear system, nor needing to calculate a matrix inverse. This is likely to be faster when fitting the model to small batches of observations. Be aware that with this method, it will add regularization to the intercept if passing ‘fit_intercept=True’.
ucb_from_empty (bool) – Whether to make upper confidence bounds on arms with no observations according to the formula, as suggested in the references (ties are broken at random for them). Choosing this option leads to policies that usually start making random predictions until having sampled from all arms, and as such, it’s not recommended when the number of arms is large relative to the number of rounds. Instead, it’s recommended to use beta_prior, which acts in the same way as for the other policies in this library.
beta_prior (str ‘auto’, None, tuple ((a,b), n), or list[tuple((a,b), n)]) – If not ‘None’, when there are fewer than ‘n’ samples with and without a reward from a given arm, it will predict the score for that class as a random number drawn from a beta distribution with the prior specified by ‘a’ and ‘b’. If set to “auto”, will be calculated as:
beta_prior = ((3/log2(nchoices), 4), 2)
Can also pass different priors per arm, in which case they should be passed as a list of tuples. This parameter can have a very large impact in the end results, and it’s recommended to tune it accordingly - scenarios with low expected reward rates should have priors that result in drawing small random numbers, whereas scenarios with large expected reward rates should have stronger priors and tend towards larger random numbers. Also, the more arms there are, the smaller the optimal expected value for these random numbers. Ignored when passing ucb_from_empty=True.
smoothing (None, tuple (a,b), or list) – If not None, predictions will be smoothed as yhat_smooth = (yhat*n + a)/(n + b), where ‘n’ is the number of times each arm was chosen in the training data. Can also pass it as a list of tuples with different ‘a’ and ‘b’ parameters for each arm (e.g. if there are arm features, these parameters can be determined through a different model). It is recommended to use only one of beta_prior or smoothing. Note that it is technically incorrect to apply smoothing like this (because the predictions from models are not bounded between zero and one), but if neither beta_prior nor smoothing is passed, the policy can get stuck in situations in which it will only choose actions from the first batch of observations to which it is fit (if using ucb_from_empty=False), or only from the first arms that show rewards (if using ucb_from_empty=True).
noise_to_smooth (bool) – If passing smoothing, whether to add a small amount of random noise \(\sim Uniform(0, 10^{-12})\) in order to break ties at random instead of choosing the smallest arm index. Ignored when passing smoothing=None.
assume_unique_reward (bool) – Whether to assume that only one arm has a reward per observation. If set to ‘True’, whenever an arm receives a reward, the classifiers for all other arms will be fit to that observation too, with a negative label.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a Generator object for random number generation, a RandomState object (from NumPy) from which to draw an integer, or a Generator object (from NumPy), which will be used directly. While this controls random number generation for this metaheuristic, there can still be other sources of variation upon re-runs, such as data aggregations in parallel (e.g. from OpenMP or BLAS functions).
njobs (int or None) – Number of parallel jobs to run. If passing None, it will be set to 1. If passing -1, it will be set to the number of CPU cores. Be aware that the algorithm will use BLAS function calls, and if these have multi-threading enabled, it might result in a slow-down as both compete for available threads.
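The LinUCB score from the references is the ridge estimate plus alpha times a confidence width, x'theta + alpha*sqrt(x'A^{-1}x), with A = lambda_*I + X'X and theta = A^{-1}X'r. The class below sketches one arm with method="sm"-style updates: it starts from the inverse of the diagonal matrix lambda_*I and applies the Sherman-Morrison formula per observation, never solving a linear system. LinUCBArmSketch is hypothetical; it omits intercepts, beta_prior, and ucb_from_empty, and a policy built from it would pick the arm with the largest score.

```python
import numpy as np


class LinUCBArmSketch:
    """One LinUCB arm updated with Sherman-Morrison (illustrative only)."""

    def __init__(self, n_features, alpha=1.0, lambda_=1.0):
        self.alpha = alpha
        self.A_inv = np.eye(n_features) / lambda_  # inverse of lambda_ * I
        self.b = np.zeros(n_features)

    def update(self, x, r):
        x = np.asarray(x, dtype=float)
        Ax = self.A_inv @ x
        # (A + x x')^{-1} = A^{-1} - (A^{-1} x)(A^{-1} x)' / (1 + x' A^{-1} x),
        # using the symmetry of A^{-1}.
        self.A_inv -= np.outer(Ax, Ax) / (1.0 + x @ Ax)
        self.b += r * x

    def ucb(self, X):
        X = np.asarray(X, dtype=float)
        theta = self.A_inv @ self.b
        width = np.sqrt(np.einsum("ij,jk,ik->i", X, self.A_inv, X))
        return X @ theta + self.alpha * width
```

If an intercept were handled by appending a column of ones to x, the lambda_*I starting point would regularize it as well, which is the caveat noted above for method='sm'.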
References
- [1] Chu, Wei, et al. “Contextual bandits with linear payoff functions.” Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics. 2011.
- [2] Li, Lihong, et al. “A contextual-bandit approach to personalized news article recommendation.” Proceedings of the 19th international conference on World wide web. ACM, 2010.
- add_arm(arm_name=None, fitted_classifier=None, n_w_rew=0, n_wo_rew=0, smoothing=None, beta_prior=None, refit_buffer_X=None, refit_buffer_r=None, f_grad_norm=None, case_one_class=None)¶
Adds a new arm to the pool of choices
- Parameters
arm_name (object) – Name for this arm. Only applicable when using named arms. If None, will use the name of the last arm plus 1 (will only work when the names are integers).
fitted_classifier (object) – If a classifier has already been fit to rewards coming from this arm, you can pass it here; otherwise, it will be started from the same ‘base_classifier’ as the initial arms. If using bootstrapped methods or methods from this module which do not accept arbitrary classifiers as input, don’t pass a classifier here (unless using classes such as utils._BootstrappedClassifierBase). If the constructor was called with a different base_algorithm per arm, a base classifier must be passed here. Not applicable for the classes that do not take a base_algorithm.
n_w_rew (int) – Number of trials/rounds with rewards coming from this arm (only used when using a beta prior or smoothing).
n_wo_rew (int) – Number of trials/rounds without rewards coming from this arm (only used when using a beta prior or smoothing).
smoothing (None, tuple (a,b), or list) – Smoothing parameters to use for this arm (see the documentation of the class constructor for details). If None and the smoothing passed to the constructor did not have separate entries per arm, the same smoothing as was passed to the constructor will be used. If no smoothing was passed to the constructor, the smoothing passed here will be ignored. A smoothing must be passed here if the constructor was passed a smoothing with different entries per arm.
beta_prior (None or tuple((a,b), n)) – Beta prior to use for this arm. See the class documentation for details. Must be passed if the constructor was provided different beta priors per arm. If None and the constructor had a single beta_prior, that same beta_prior will be used for this new arm. Note that n_w_rew and n_wo_rew will be counted towards the threshold ‘n’ here. Cannot be passed if the constructor did not have a beta_prior.
refit_buffer_X (array(m, n) or None) – Refit buffer of ‘X’ data to use for the new arm. Ignored when using ‘batch_train=False’ or ‘refit_buffer=None’.
refit_buffer_r (array(m,) or None) – Refit buffer of rewards data to use for the new arm. Ignored when using ‘batch_train=False’ or ‘refit_buffer=None’.
f_grad_norm (function) – Gradient calculation function to use for this arm. This is only for the policies that make choices according to active learning criteria, and only for situations in which the policy was passed different functions for each arm.
case_one_class (function) – Gradient workaround function for single-class data. This is only for the policies that make choices according to active learning criteria, and only for situations in which the policy was passed different functions for each arm.
- Returns
self – This object
- Return type
object
- decision_function(X)¶
Get the scores for each arm following this policy’s action-choosing criteria.
- Parameters
X (array (n_samples, n_features)) – Data for which to obtain decision function scores for each arm.
- Returns
scores – Scores following this policy for each arm.
- Return type
array (n_samples, n_choices)
- drop_arm(arm_name)¶
Drop an arm/choice
Drops (removes/deletes) an arm from the set of available choices to the policy.
Note
The available arms, if named, are stored in attribute ‘choice_names’.
- Parameters
arm_name (int or object) – Arm to drop. If passing an integer, will drop at that index (starting at zero). Otherwise, will drop the arm matching this name (argument must be of the same type as the individual entries passed to ‘nchoices’ in the initialization).
- Returns
self – This object
- Return type
object
- fit(X, a, r, warm_start=False, continue_from_last=False)¶
Fits the base algorithm (one per class [and per sample if bootstrapped]) to partially labeled data.
- Parameters
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
warm_start (bool) – Whether to use the results of previous calls to ‘fit’ as a start for fitting to the ‘X’ data passed here. This will only be available if the base classifier has a property warm_start too and that property is also set to ‘True’. You can double-check that it is recognized as such by checking this object’s property has_warm_start. Passing ‘True’ when the classifier doesn’t support warm start despite having the property might slow things down. Dropping arms will make this functionality unavailable. This option is not available for ‘BootstrappedUCB’, nor for ‘BootstrappedTS’.
continue_from_last (bool) – If the policy was previously fit to data, whether to assume that this new call to ‘fit’ will continue from the exact same dataset as before plus new rows appended at the end of ‘X’, ‘a’, ‘r’. In this case, only the models that have new data according to ‘a’ will be refit. Note that the bootstrapped policies will still benefit from extra refits. This option should not be used when there are calls to ‘partial_fit’ between calls to ‘fit’. Ignored if using assume_unique_reward=True.
- Returns
self – This object
- Return type
obj
- partial_fit(X, a, r)¶
Fits the base algorithm (one per class) to partially labeled data in batches.
Note
In order to use this method, the base classifier must have a ‘partial_fit’ method, such as ‘sklearn.linear_model.SGDClassifier’. This method is not available for ‘LogisticUCB’, ‘LogisticTS’, ‘PartitionedUCB’, ‘PartitionedTS’.
- Parameters
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
- Returns
self – This object
- Return type
obj
- predict(X, exploit=False, output_score=False)
Selects actions according to this policy for new data.
- Parameters
X (array (n_samples, n_features)) – New observations for which to choose an action according to this policy.
exploit (bool) – Whether to make a prediction according to the policy, or to just choose the arm with the highest expected reward according to current models.
output_score (bool) – Whether to output the score that this method predicted, in case it is desired to use it with this package’s ‘offpolicy’ and ‘evaluation’ modules.
- Returns
pred – Actions chosen by the policy. If passing output_score=True, it will be a dictionary with the chosen arm and the score that the arm got following this policy with the classifiers used.
- Return type
array (n_samples,) or dict(“choice” : array(n_samples,), “score” : array(n_samples,))
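The two return shapes of predict can be illustrated with a small pure-Python stand-in. It assumes, for illustration only, that the per-arm scores of the policy are already computed (for example by decision_function) and that the chosen arm is the highest-scoring one; choose_arms is a hypothetical name, and the exploit flag is not modelled.

```python
from typing import Dict, List, Sequence, Union

def choose_arms(
    scores: Sequence[Sequence[float]], output_score: bool = False
) -> Union[List[int], Dict[str, List]]:
    """Hypothetical stand-in: pick the best-scoring arm for each row."""
    # Assumption: the policy picks the argmax of its own per-arm scores.
    choice = [max(range(len(row)), key=row.__getitem__) for row in scores]
    if not output_score:
        return choice
    # Mirror the documented dict layout: chosen arm plus its score.
    score = [row[arm] for row, arm in zip(scores, choice)]
    return {"choice": choice, "score": score}

scores = [[0.2, 0.7, 0.1], [0.6, 0.3, 0.5]]
print(choose_arms(scores))                     # [1, 0]
print(choose_arms(scores, output_score=True))  # {'choice': [1, 0], 'score': [0.7, 0.6]}
```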
- reset_alpha(alpha=1.0)
Set the upper confidence bound parameter to a custom number
Note
This method is only for LinUCB, not for LinTS.
- Parameters
alpha (float) – Parameter controlling the upper confidence bound (larger values give a higher bound).
- Returns
self – This object
- Return type
obj
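For intuition about what alpha scales, the textbook LinUCB rule scores an arm as θᵀx + α·sqrt(xᵀA⁻¹x): the second term is an exploration bonus that is large where the arm's data is thin. The sketch below assumes that textbook formula (the package's internals may differ in detail) and uses a hypothetical linucb_score helper with plain lists instead of NumPy.

```python
import math
from typing import Sequence

def linucb_score(
    theta: Sequence[float],
    A_inv: Sequence[Sequence[float]],
    x: Sequence[float],
    alpha: float = 1.0,
) -> float:
    """Hypothetical helper: textbook LinUCB upper bound for one arm (assumed formula)."""
    mean = sum(t * xi for t, xi in zip(theta, x))
    # Quadratic form x^T A^{-1} x measures how uncertain this arm is at x.
    Ax = [sum(A_inv[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]
    width = math.sqrt(sum(xi * axi for xi, axi in zip(x, Ax)))
    return mean + alpha * width

theta = [0.5, 0.0]
A_inv = [[0.25, 0.0], [0.0, 1.0]]
x = [1.0, 1.0]
for alpha in (0.0, 1.0, 2.0):
    print(alpha, linucb_score(theta, A_inv, x, alpha))
```

With alpha=0 the score collapses to the point estimate, which is why lowering alpha through reset_alpha moves the policy towards pure exploitation.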
- topN(X, n)
Get top-N ranked actions for each observation
Note
This method will rank choices/arms according to what the policy dictates - it is not an exploitation-mode rank, so if e.g. there are random choices for some observations, there will be random ranks in here.
- Parameters
X (array (n_samples, n_features)) – New observations for which to rank actions according to this policy.
n (int) – Number of top-ranked actions to output
- Returns
topN – The top-ranked actions for each observation
- Return type
array(n_samples, n)
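Given a row of per-arm scores from the policy, a top-N ranking amounts to sorting arm indices by descending score and keeping the first n. A minimal sketch with a hypothetical top_n helper; in this sketch ties keep the lower arm index first, which may not match the package.

```python
import heapq
from typing import List, Sequence

def top_n(scores: Sequence[Sequence[float]], n: int) -> List[List[int]]:
    """Hypothetical helper: indices of the n highest-scoring arms per row."""
    # heapq.nlargest behaves like sorted(..., reverse=True)[:n], so ties keep index order.
    return [heapq.nlargest(n, range(len(row)), key=row.__getitem__) for row in scores]

print(top_n([[0.2, 0.7, 0.1], [0.6, 0.3, 0.5]], n=2))  # [[1, 0], [0, 2]]
```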
LogisticTS
- class contextualbandits.online.LogisticTS(nchoices, sample_from='ci', ci_from_empty=False, multiplier=0.25, n_presampled=None, fit_intercept=True, lambda_=1.0, sample_unique=True, beta_prior='auto', smoothing=None, noise_to_smooth=True, assume_unique_reward=False, random_state=None, njobs=-1)
Logistic Regression with Thompson Sampling
Logistic regression classifier which either samples its coefficients using the variance-covariance matrix of the fitted non-sampled coefficients, or which samples predicted values from a confidence interval built from the same variance-covariance matrix as a faster alternative.
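A sketch of the faster "sample the predicted value" idea: instead of drawing a whole coefficient vector, draw the linear predictor from a normal distribution centred at the best-fit value xᵀw with variance xᵀΣx, then map it through the sigmoid. This is an illustrative assumption about how such an interval can be used, not the package's exact code; sample_ci_score is hypothetical.

```python
import math
import random
from typing import Sequence

def sample_ci_score(
    w: Sequence[float],
    cov: Sequence[Sequence[float]],
    x: Sequence[float],
    rng: random.Random,
) -> float:
    """Hypothetical helper: Thompson-style draw of P(reward | x) for one arm.

    Assumes a normal approximation on the linear predictor x^T w.
    """
    d = len(x)
    mean = sum(wi * xi for wi, xi in zip(w, x))
    # Variance of x^T w under the coefficients' variance-covariance matrix.
    var = sum(x[i] * cov[i][j] * x[j] for i in range(d) for j in range(d))
    z = rng.gauss(mean, math.sqrt(var))
    return 1.0 / (1.0 + math.exp(-z))

rng = random.Random(0)
w, x = [0.3, -0.2], [1.0, 2.0]
cov = [[0.04, 0.0], [0.0, 0.01]]
draws = [sample_ci_score(w, cov, x, rng) for _ in range(5)]
print(draws)
```

Only a d × d quadratic form is evaluated per prediction, whereas sampling coefficients requires a draw from a d-dimensional multivariate normal, which is the cost the notes below warn about.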
Note
This strategy is implemented for comparison purposes only and it’s not recommended to rely on it, particularly not for large datasets. Performance tends to be very bad compared to the other methods provided here.
Note
This strategy does not support fitting the data in batches (‘partial_fit’ will not be available), nor does it support using any other classifier. See ‘BootstrappedTS’ for a more generalizable version.
Note
This strategy requires each fitted model to store a square matrix with dimension equal to the number of features. Thus, memory consumption can grow very high with this method.
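To put the memory note in numbers: assuming one dense float64 matrix of size d × d per arm (8 bytes per entry), these matrices alone take nchoices · d² · 8 bytes. A quick back-of-the-envelope helper (hypothetical, not part of the package):

```python
def covariance_memory_gb(nchoices: int, n_features: int, bytes_per_entry: int = 8) -> float:
    """Hypothetical helper: GB for one n_features x n_features matrix per arm (float64 assumed)."""
    return nchoices * n_features ** 2 * bytes_per_entry / 1e9

print(covariance_memory_gb(10, 1_000))   # 0.08
print(covariance_memory_gb(10, 10_000))  # 8.0
```

Because the cost is quadratic in the number of features, going from 1,000 to 10,000 features multiplies it by 100.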
Note
Be aware that sampling coefficients is an operation that scales poorly with the number of columns/features/variables. For wide datasets, it might be slower than a bootstrapped approach, especially when using sample_unique=True.
- Parameters
nchoices (int or list-like) – Number of arms/labels to choose from. Can also pass a list, array, or Series with arm names, in which case the outputs from predict will follow these names and arms can be dropped by name, and new ones added with a custom name.
sample_from (str, one of “coef”, “ci”) – Whether to make predictions by sampling the model coefficients or by sampling the predicted value from a confidence interval around the best-fit coefficients.
ci_from_empty (bool) – Whether to construct a confidence interval on arms with no observations according to a variance-covariance matrix given by the regularization parameter alone. Ignored when passing sample_from='coef'.
multiplier (float) – Multiplier for the covariance matrix. Pass 1 to take it as-is. Ignored when passing sample_from='ci'.
n_presampled (None or int) – If sampling from coefficients, this denotes a number of coefficients to pre-sample after calling ‘fit’, which will be used later in the predictions. Pre-sampling a large number of coefficients can help to speed up predictions at the expense of longer fitting times, and is recommended if there is a large number of predictions between calls to ‘fit’. If passing ‘None’ (the default), it will not pre-sample a finite number of the coefficients at fitting time, but will rather sample (different) coefficients in calls to ‘predict’. Ignored when passing sample_from="ci".
fit_intercept (bool) – Whether to add an intercept term to the models.
lambda_ (float) – Strength of the L2 regularization. Must be greater than zero.
sample_unique (bool) – Whether to sample different coefficients each time a prediction is to be made. If passing ‘False’, when calling ‘predict’, it will sample the same coefficients for all the observations in the same call to ‘predict’, whereas if passing ‘True’, it will use a different set of coefficients for each observation/row. Passing ‘False’ leads to an approach which is theoretically wrong, but as sampling coefficients can be very slow, using ‘False’ can provide a reasonable speed-up without much of a performance penalty. Ignored when passing sample_from='ci' or n_presampled.
beta_prior (str ‘auto’, None, tuple ((a,b), n), or list[tuple((a,b), n)]) – If not ‘None’, when there are fewer than ‘n’ samples with and without a reward from a given arm, it will predict the score for that class as a random number drawn from a beta distribution with the prior specified by ‘a’ and ‘b’. If set to “auto”, will be calculated as:
beta_prior = ((2/log2(nchoices), 4), 2)
Can also pass different priors per arm, in which case they should be passed as a list of tuples. This parameter can have a very large impact on the end results, and it’s recommended to tune it accordingly: scenarios with low expected reward rates should have priors that result in drawing small random numbers, whereas scenarios with large expected reward rates should have stronger priors and tend towards larger random numbers. Also, the more arms there are, the smaller the optimal expected value for these random numbers. Recommended to use only one of beta_prior, smoothing, ci_from_empty. Ignored when passing ci_from_empty=True.
smoothing (None, tuple (a,b), or list) – If not None, predictions will be smoothed as yhat_smooth = (yhat*n + a)/(n + b), where ‘n’ is the number of times each arm was chosen in the training data. Can also pass it as a list of tuples with different ‘a’ and ‘b’ parameters for each arm (e.g. if there are arm features, these parameters can be determined through a different model). Recommended to use only one of beta_prior, smoothing, ci_from_empty.
noise_to_smooth (bool) – If passing smoothing, whether to add a small amount of random noise \(\sim Uniform(0, 10^{-12})\) in order to break ties at random instead of choosing the smallest arm index. Ignored when passing smoothing=None.
assume_unique_reward (bool) – Whether to assume that only one arm has a reward per observation. If set to ‘True’, whenever an arm receives a reward, the classifiers for all other arms will be fit to that observation too, having negative label.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a Generator object for random number generation, a RandomState object (from NumPy) from which to draw an integer, or a Generator object (from NumPy), which will be used directly. While this controls random number generation for this metaheuristic, there can still be other sources of variation upon re-runs, such as data aggregations in parallel (e.g. from OpenMP or BLAS functions).
njobs (int or None) – Number of parallel jobs to run. If passing None, will set it to 1. If passing -1, will set it to the number of CPU cores. Be aware that the algorithm will use BLAS function calls, and if these have multi-threading enabled, it might result in a slow-down as both functions compete for available threads.
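The ‘auto’ beta prior is a direct function of the number of arms. The sketch below (hypothetical helper auto_beta_prior) evaluates the documented formula with the numerator 2 used by this class; LogisticUCB documents a numerator of 3 instead. The formula divides by log2(nchoices), so the sketch rejects a single arm rather than guessing what the package does there.

```python
import math
from typing import Tuple

def auto_beta_prior(nchoices: int, numerator: float = 2.0) -> Tuple[Tuple[float, float], int]:
    """Hypothetical helper: the documented 'auto' prior ((numerator/log2(k), 4), 2)."""
    if nchoices < 2:
        raise ValueError("the formula needs at least two arms (log2(1) == 0)")
    a = numerator / math.log2(nchoices)
    return ((a, 4.0), 2)

for k in (2, 4, 16, 256):
    (a, b), n = auto_beta_prior(k)
    # Prior mean of Beta(a, b) is a / (a + b); it shrinks as arms are added.
    print(k, round(a, 3), round(a / (a + b), 3))
```

The shrinking prior mean is the "more arms, smaller random numbers" behaviour recommended in the beta_prior description.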
References
- [1] Cortes, David. “Adapting multi-armed bandits policies to contextual bandits scenarios.” arXiv preprint arXiv:1811.04383 (2018).
- add_arm(arm_name=None, fitted_classifier=None, n_w_rew=0, n_wo_rew=0, smoothing=None, beta_prior=None, refit_buffer_X=None, refit_buffer_r=None, f_grad_norm=None, case_one_class=None)
Adds a new arm to the pool of choices
- Parameters
arm_name (object) – Name for this arm. Only applicable when using named arms. If None, will use the name of the last arm plus 1 (will only work when the names are integers).
fitted_classifier (object) – If a classifier has already been fit to rewards coming from this arm, you can pass it here; otherwise, the arm will be started from the same ‘base_classifier’ as the initial arms. If using bootstrapped methods or methods from this module which do not accept arbitrary classifiers as input, don’t pass a classifier here (unless using classes such as utils._BootstrappedClassifierBase). If the constructor was called with a different base_algorithm per arm, a base classifier must be passed here. Not applicable for the classes that do not take a base_algorithm.
n_w_rew (int) – Number of trials/rounds with rewards coming from this arm (only used when using a beta prior or smoothing).
n_wo_rew (int) – Number of trials/rounds without rewards coming from this arm (only used when using a beta prior or smoothing).
smoothing (None, tuple (a,b), or list) – Smoothing parameters to use for this arm (see documentation of the class constructor for details). If None and the smoothing passed to the constructor didn’t have separate entries per arm, will use the same smoothing as was passed in the constructor. If no smoothing was passed to the constructor, the smoothing here will be ignored. Must pass a smoothing here if the constructor was passed a smoothing with different entries per arm.
beta_prior (None or tuple((a,b), n)) – Beta prior to use for this arm. See the class’ documentation for details. Must be passed if the constructor was provided different beta priors per arm. If None and the constructor had a single beta_prior, will use that same beta_prior for this new arm. Note that n_w_rew and n_wo_rew will be counted towards the threshold ‘n’ in here. Cannot be passed if the constructor did not have a beta_prior.
refit_buffer_X (array(m, n) or None) – Refit buffer of ‘X’ data to use for the new arm. Ignored when using ‘batch_train=False’ or ‘refit_buffer=None’.
refit_buffer_r (array(m,) or None) – Refit buffer of rewards data to use for the new arm. Ignored when using ‘batch_train=False’ or ‘refit_buffer=None’.
f_grad_norm (function) – Gradient calculation function to use for this arm. This is only for the policies that make choices according to active learning criteria, and only for situations in which the policy was passed different functions for each arm.
case_one_class (function) – Gradient workaround function for single-class data. This is only for the policies that make choices according to active learning criteria, and only for situations in which the policy was passed different functions for each arm.
- Returns
self – This object
- Return type
object
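The remark that n_w_rew and n_wo_rew count towards the threshold ‘n’ can be pictured as follows: while an arm lacks enough rewarded or non-rewarded observations, its score is a random draw from Beta(a, b) rather than a model prediction. The sketch assumes the prior applies while either count is below ‘n’ (the exact comparison used by the package is an assumption), and arm_score is hypothetical.

```python
import random
from typing import Callable, Tuple

def arm_score(
    n_w_rew: int,
    n_wo_rew: int,
    beta_prior: Tuple[Tuple[float, float], int],
    model_score: Callable[[], float],
    rng: random.Random,
) -> float:
    """Hypothetical helper: prior draw until both counts reach the threshold n (assumed rule)."""
    (a, b), n = beta_prior
    if n_w_rew < n or n_wo_rew < n:
        # Too little data for this arm: fall back to a Beta(a, b) draw.
        return rng.betavariate(a, b)
    return model_score()

rng = random.Random(123)
prior = ((1.0, 4.0), 2)
print(arm_score(5, 1, prior, lambda: 0.9, rng))  # random draw from Beta(1, 4)
print(arm_score(5, 3, prior, lambda: 0.9, rng))  # 0.9, the model's score
```

Passing n_w_rew=5 and n_wo_rew=3 to add_arm would therefore, under this reading, let a new arm skip the prior phase when n is 2.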
- decision_function(X)
Get the scores for each arm following this policy’s action-choosing criteria.
- Parameters
X (array (n_samples, n_features)) – Data for which to obtain decision function scores for each arm.
- Returns
scores – Scores following this policy for each arm.
- Return type
array (n_samples, n_choices)
- drop_arm(arm_name)
Drop an arm/choice
Drops (removes/deletes) an arm from the set of available choices to the policy.
Note
The available arms, if named, are stored in attribute ‘choice_names’.
- Parameters
arm_name (int or object) – Arm to drop. If passing an integer, will drop at that index (starting at zero). Otherwise, will drop the arm matching this name (argument must be of the same type as the individual entries passed to ‘nchoices’ in the initialization).
- Returns
self – This object
- Return type
object
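The index-or-name rule for arm_name can be sketched as below: integers are treated as positions, anything else is looked up among the arm names (the list stored in ‘choice_names’). resolve_arm is a hypothetical helper that follows the documented rule literally; it is not the package's code.

```python
from typing import Hashable, List

def resolve_arm(arm_name: Hashable, choice_names: List[Hashable]) -> int:
    """Hypothetical helper: map an index or an arm name to a position."""
    if isinstance(arm_name, int):
        # Integers are positions, starting at zero (per the documented rule).
        if not 0 <= arm_name < len(choice_names):
            raise IndexError(f"no arm at index {arm_name}")
        return arm_name
    # Non-integers must match an entry of choice_names exactly.
    return choice_names.index(arm_name)

names = ["banner_a", "banner_b", "banner_c"]
pos = resolve_arm("banner_b", names)
remaining = names[:pos] + names[pos + 1:]
print(pos, remaining)  # 1 ['banner_a', 'banner_c']
```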
- fit(X, a, r, warm_start=False, continue_from_last=False)
Fits the base algorithm (one per class [and per sample if bootstrapped]) to partially labeled data.
- Parameters
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
warm_start (bool) – Whether to use the results of previous calls to ‘fit’ as a start for fitting to the ‘X’ data passed here. This will only be available if the base classifier has a property warm_start too and that property is also set to ‘True’. You can double-check that it’s recognized as such by checking this object’s property has_warm_start. Passing ‘True’ when the classifier doesn’t support warm start despite having the property might slow things down. Dropping arms will make this functionality unavailable. This option is not available for ‘BootstrappedUCB’, nor for ‘BootstrappedTS’.
continue_from_last (bool) – If the policy was previously fit to data, whether to assume that this new call to ‘fit’ will continue from the exact same dataset as before plus new rows appended at the end of ‘X’, ‘a’, ‘r’. In this case, it will only refit the models that have new data according to ‘a’. Note that the bootstrapped policies will still benefit from extra refits. This option should not be used when there are calls to ‘partial_fit’ between calls to ‘fit’. Ignored if using assume_unique_reward=True.
- Returns
self – This object
- Return type
obj
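The saving from continue_from_last comes from refitting only the arms that appear in the newly appended rows. A sketch, assuming the previous call saw exactly the first n_prev rows (arms_to_refit is hypothetical):

```python
from typing import List, Sequence

def arms_to_refit(a: Sequence[int], n_prev: int) -> List[int]:
    """Hypothetical helper: arms with at least one new row after index n_prev."""
    # Rows before n_prev are assumed identical to the previous call's data.
    return sorted(set(a[n_prev:]))

a_old = [0, 1, 0, 2]
a_new = a_old + [2, 2, 0]  # same rows as before, new rows appended at the end
print(arms_to_refit(a_new, n_prev=len(a_old)))  # [0, 2]; arm 1 keeps its model
```

This is also why the option must not be mixed with partial_fit: rows passed through partial_fit would break the assumption that the first n_prev rows are unchanged.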
- partial_fit(X, a, r)
Fits the base algorithm (one per class) to partially labeled data in batches.
Note
In order to use this method, the base classifier must have a ‘partial_fit’ method, such as ‘sklearn.linear_model.SGDClassifier’. This method is not available for ‘LogisticUCB’, ‘LogisticTS’, ‘PartitionedUCB’, ‘PartitionedTS’.
- Parameters
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
- Returns
self – This object
- Return type
obj
- predict(X, exploit=False, output_score=False)
Selects actions according to this policy for new data.
- Parameters
X (array (n_samples, n_features)) – New observations for which to choose an action according to this policy.
exploit (bool) – Whether to make a prediction according to the policy, or to just choose the arm with the highest expected reward according to current models.
output_score (bool) – Whether to output the score that this method predicted, in case it is desired to use it with this package’s ‘offpolicy’ and ‘evaluation’ modules.
- Returns
pred – Actions chosen by the policy. If passing output_score=True, it will be a dictionary with the chosen arm and the score that the arm got following this policy with the classifiers used.
- Return type
array (n_samples,) or dict(“choice” : array(n_samples,), “score” : array(n_samples,))
- topN(X, n)
Get top-N ranked actions for each observation
Note
This method will rank choices/arms according to what the policy dictates - it is not an exploitation-mode rank, so if e.g. there are random choices for some observations, there will be random ranks in here.
- Parameters
X (array (n_samples, n_features)) – New observations for which to rank actions according to this policy.
n (int) – Number of top-ranked actions to output
- Returns
topN – The top-ranked actions for each observation
- Return type
array(n_samples, n)
LogisticUCB
- class contextualbandits.online.LogisticUCB(nchoices, percentile=80, fit_intercept=True, lambda_=1.0, ucb_from_empty=False, beta_prior='auto', smoothing=None, noise_to_smooth=True, assume_unique_reward=False, random_state=None, njobs=-1)
Logistic Regression with Confidence Interval
Logistic regression classifier which constructs an upper bound on the predicted probabilities through a confidence interval calculated from the variance-covariance matrix of the fitted coefficients.
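One way to picture such a bound: take the linear predictor xᵀw, add the percentile-quantile of a standard normal times its standard error sqrt(xᵀΣx), and apply the sigmoid. The sketch assumes this normal approximation and is illustrative only; logistic_ucb is hypothetical, statistics.NormalDist comes from the Python standard library (3.8+), and the quantile is only finite for percentiles strictly between 0 and 100.

```python
import math
from statistics import NormalDist
from typing import Sequence

def logistic_ucb(
    w: Sequence[float],
    cov: Sequence[Sequence[float]],
    x: Sequence[float],
    percentile: float = 80,
) -> float:
    """Hypothetical helper: upper bound on P(reward | x) for one arm (normal approximation)."""
    d = len(x)
    mean = sum(wi * xi for wi, xi in zip(w, x))
    se = math.sqrt(sum(x[i] * cov[i][j] * x[j] for i in range(d) for j in range(d)))
    z = NormalDist().inv_cdf(percentile / 100)  # about 0.84 for the 80th percentile
    return 1.0 / (1.0 + math.exp(-(mean + z * se)))

w, x = [0.3, -0.2], [1.0, 2.0]
cov = [[0.04, 0.0], [0.0, 0.01]]
for p in (50, 80, 95):
    print(p, round(logistic_ucb(w, cov, x, p), 4))
```

At the 50th percentile the bound equals the plain logistic prediction; raising percentile (for instance via reset_percentile) makes uncertain arms look more attractive.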
Note
This strategy is implemented for comparison purposes only and it’s not recommended to rely on it, particularly not for large datasets.
Note
This strategy does not support fitting the data in batches (‘partial_fit’ will not be available), nor does it support using any other classifier. See ‘BootstrappedUCB’ for a more generalizable version.
Note
This strategy requires each fitted classifier to store a square matrix with dimension equal to the number of features. Thus, memory consumption can grow very high with this method.
- Parameters
nchoices (int or list-like) – Number of arms/labels to choose from. Can also pass a list, array, or Series with arm names, in which case the outputs from predict will follow these names and arms can be dropped by name, and new ones added with a custom name.
percentile (int [0,100]) – Percentile of the confidence interval to take.
fit_intercept (bool) – Whether to add an intercept term to the models.
lambda_ (float) – Strength of the L2 regularization. Must be greater than zero.
ucb_from_empty (bool) – Whether to make upper confidence bounds on arms with no observations according to the formula (ties are broken at random for them). Choosing this option leads to policies that usually start making random predictions until having sampled from all arms, and as such, it’s not recommended when the number of arms is large relative to the number of rounds. Instead, it’s recommended to use beta_prior, which acts in the same way as for the other policies in this library.
beta_prior (str ‘auto’, None, tuple ((a,b), n), or list[tuple((a,b), n)]) – If not ‘None’, when there are fewer than ‘n’ samples with and without a reward from a given arm, it will predict the score for that class as a random number drawn from a beta distribution with the prior specified by ‘a’ and ‘b’. If set to “auto”, will be calculated as:
beta_prior = ((3/log2(nchoices), 4), 2)
Can also pass different priors per arm, in which case they should be passed as a list of tuples. This parameter can have a very large impact on the end results, and it’s recommended to tune it accordingly: scenarios with low expected reward rates should have priors that result in drawing small random numbers, whereas scenarios with large expected reward rates should have stronger priors and tend towards larger random numbers. Also, the more arms there are, the smaller the optimal expected value for these random numbers. Note that this method calculates upper bounds rather than expectations, so the ‘a’ parameter should be higher than for other methods. Recommended to use only one of beta_prior or smoothing. Ignored when passing ucb_from_empty=True.
smoothing (None, tuple (a,b), or list) – If not None, predictions will be smoothed as yhat_smooth = (yhat*n + a)/(n + b), where ‘n’ is the number of times each arm was chosen in the training data. Can also pass it as a list of tuples with different ‘a’ and ‘b’ parameters for each arm (e.g. if there are arm features, these parameters can be determined through a different model). Recommended to use only one of beta_prior or smoothing.
noise_to_smooth (bool) – If passing smoothing, whether to add a small amount of random noise \(\sim Uniform(0, 10^{-12})\) in order to break ties at random instead of choosing the smallest arm index. Ignored when passing smoothing=None.
assume_unique_reward (bool) – Whether to assume that only one arm has a reward per observation. If set to ‘True’, whenever an arm receives a reward, the classifiers for all other arms will be fit to that observation too, having negative label.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a Generator object for random number generation, a RandomState object (from NumPy) from which to draw an integer, or a Generator object (from NumPy), which will be used directly. While this controls random number generation for this metaheuristic, there can still be other sources of variation upon re-runs, such as data aggregations in parallel (e.g. from OpenMP or BLAS functions).
njobs (int or None) – Number of parallel jobs to run. If passing None, will set it to 1. If passing -1, will set it to the number of CPU cores. Be aware that the algorithm will use BLAS function calls, and if these have multi-threading enabled, it might result in a slow-down as both functions compete for available threads.
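The smoothing formula yhat_smooth = (yhat*n + a)/(n + b) pulls the score of a rarely chosen arm towards a/b and fades out as ‘n’ grows. A minimal sketch (hypothetical smooth_score helper) that also adds the tie-breaking noise described for noise_to_smooth:

```python
import random
from typing import Optional

def smooth_score(
    yhat: float,
    n: int,
    a: float,
    b: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Hypothetical helper: documented smoothing, with optional tiny tie-break noise."""
    smoothed = (yhat * n + a) / (n + b)
    if rng is not None:
        # Mirrors noise_to_smooth: noise ~ Uniform(0, 1e-12) to break ties.
        smoothed += rng.uniform(0.0, 1e-12)
    return smoothed

for n in (0, 10, 1000):
    print(n, smooth_score(0.9, n, a=1.0, b=10.0))
```

With a=1 and b=10, an arm never chosen scores 0.1, one chosen 10 times sits halfway at 0.5, and after 1000 choices the score is close to the model's 0.9.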
References
- [1] Cortes, David. “Adapting multi-armed bandits policies to contextual bandits scenarios.” arXiv preprint arXiv:1811.04383 (2018).
- add_arm(arm_name=None, fitted_classifier=None, n_w_rew=0, n_wo_rew=0, smoothing=None, beta_prior=None, refit_buffer_X=None, refit_buffer_r=None, f_grad_norm=None, case_one_class=None)
Adds a new arm to the pool of choices
- Parameters
arm_name (object) – Name for this arm. Only applicable when using named arms. If None, will use the name of the last arm plus 1 (will only work when the names are integers).
fitted_classifier (object) – If a classifier has already been fit to rewards coming from this arm, you can pass it here; otherwise, the arm will be started from the same ‘base_classifier’ as the initial arms. If using bootstrapped methods or methods from this module which do not accept arbitrary classifiers as input, don’t pass a classifier here (unless using classes such as utils._BootstrappedClassifierBase). If the constructor was called with a different base_algorithm per arm, a base classifier must be passed here. Not applicable for the classes that do not take a base_algorithm.
n_w_rew (int) – Number of trials/rounds with rewards coming from this arm (only used when using a beta prior or smoothing).
n_wo_rew (int) – Number of trials/rounds without rewards coming from this arm (only used when using a beta prior or smoothing).
smoothing (None, tuple (a,b), or list) – Smoothing parameters to use for this arm (see documentation of the class constructor for details). If None and the smoothing passed to the constructor didn’t have separate entries per arm, will use the same smoothing as was passed in the constructor. If no smoothing was passed to the constructor, the smoothing here will be ignored. Must pass a smoothing here if the constructor was passed a smoothing with different entries per arm.
beta_prior (None or tuple((a,b), n)) – Beta prior to use for this arm. See the class’ documentation for details. Must be passed if the constructor was provided different beta priors per arm. If None and the constructor had a single beta_prior, will use that same beta_prior for this new arm. Note that n_w_rew and n_wo_rew will be counted towards the threshold ‘n’ in here. Cannot be passed if the constructor did not have a beta_prior.
refit_buffer_X (array(m, n) or None) – Refit buffer of ‘X’ data to use for the new arm. Ignored when using ‘batch_train=False’ or ‘refit_buffer=None’.
refit_buffer_r (array(m,) or None) – Refit buffer of rewards data to use for the new arm. Ignored when using ‘batch_train=False’ or ‘refit_buffer=None’.
f_grad_norm (function) – Gradient calculation function to use for this arm. This is only for the policies that make choices according to active learning criteria, and only for situations in which the policy was passed different functions for each arm.
case_one_class (function) – Gradient workaround function for single-class data. This is only for the policies that make choices according to active learning criteria, and only for situations in which the policy was passed different functions for each arm.
- Returns
self – This object
- Return type
object
- decision_function(X)
Get the scores for each arm following this policy’s action-choosing criteria.
- Parameters
X (array (n_samples, n_features)) – Data for which to obtain decision function scores for each arm.
- Returns
scores – Scores following this policy for each arm.
- Return type
array (n_samples, n_choices)
- drop_arm(arm_name)
Drop an arm/choice
Drops (removes/deletes) an arm from the set of available choices to the policy.
Note
The available arms, if named, are stored in attribute ‘choice_names’.
- Parameters
arm_name (int or object) – Arm to drop. If passing an integer, will drop at that index (starting at zero). Otherwise, will drop the arm matching this name (argument must be of the same type as the individual entries passed to ‘nchoices’ in the initialization).
- Returns
self – This object
- Return type
object
- fit(X, a, r, warm_start=False, continue_from_last=False)
Fits the base algorithm (one per class [and per sample if bootstrapped]) to partially labeled data.
- Parameters
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
warm_start (bool) – Whether to use the results of previous calls to ‘fit’ as a start for fitting to the ‘X’ data passed here. This will only be available if the base classifier has a property warm_start too and that property is also set to ‘True’. You can double-check that it’s recognized as such by checking this object’s property has_warm_start. Passing ‘True’ when the classifier doesn’t support warm start despite having the property might slow things down. Dropping arms will make this functionality unavailable. This option is not available for ‘BootstrappedUCB’, nor for ‘BootstrappedTS’.
continue_from_last (bool) – If the policy was previously fit to data, whether to assume that this new call to ‘fit’ will continue from the exact same dataset as before plus new rows appended at the end of ‘X’, ‘a’, ‘r’. In this case, it will only refit the models that have new data according to ‘a’. Note that the bootstrapped policies will still benefit from extra refits. This option should not be used when there are calls to ‘partial_fit’ between calls to ‘fit’. Ignored if using assume_unique_reward=True.
- Returns
self – This object
- Return type
obj
- partial_fit(X, a, r)
Fits the base algorithm (one per class) to partially labeled data in batches.
Note
In order to use this method, the base classifier must have a ‘partial_fit’ method, such as ‘sklearn.linear_model.SGDClassifier’. This method is not available for ‘LogisticUCB’, ‘LogisticTS’, ‘PartitionedUCB’, ‘PartitionedTS’.
- Parameters
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
- Returns
self – This object
- Return type
obj
- predict(X, exploit=False, output_score=False)
Selects actions according to this policy for new data.
- Parameters
X (array (n_samples, n_features)) – New observations for which to choose an action according to this policy.
exploit (bool) – Whether to make a prediction according to the policy, or to just choose the arm with the highest expected reward according to current models.
output_score (bool) – Whether to output the score that this method predicted, in case it is desired to use it with this package’s ‘offpolicy’ and ‘evaluation’ modules.
- Returns
pred – Actions chosen by the policy. If passing output_score=True, it will be a dictionary with the chosen arm and the score that the arm got following this policy with the classifiers used.
- Return type
array (n_samples,) or dict(“choice” : array(n_samples,), “score” : array(n_samples,))
- reset_percentile(percentile=80)
Set the upper confidence bound percentile to a custom number
- Parameters
percentile (int [0,100]) – Percentile of the confidence interval to take.
- Returns
self – This object
- Return type
obj
- topN(X, n)
Get top-N ranked actions for each observation
Note
This method will rank choices/arms according to what the policy dictates - it is not an exploitation-mode rank, so if e.g. there are random choices for some observations, there will be random ranks in here.
- Parameters
X (array (n_samples, n_features)) – New observations for which to rank actions according to this policy.
n (int) – Number of top-ranked actions to output
- Returns
topN – The top-ranked actions for each observation
- Return type
array(n_samples, n)
ParametricTS
- class contextualbandits.online.ParametricTS(base_algorithm, nchoices, beta_prior=None, beta_prior_ts=(0.0, 0.0), smoothing=None, noise_to_smooth=True, batch_train=False, refit_buffer=None, deep_copy_buffer=True, assume_unique_reward=False, random_state=None, njobs=-1)
Parametric Thompson Sampling
Performs Thompson sampling using a beta distribution, with parameters given by the predicted probability from the base algorithm multiplied by the number of observations seen from each arm.
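Read literally, this means an arm with predicted probability p and n observations is scored by a draw from Beta(p·n + a₀, (1 − p)·n + b₀), where (a₀, b₀) is beta_prior_ts. The sketch below follows that reading; parametric_ts_draw is hypothetical, and the small floor on the parameters (needed because random.betavariate rejects non-positive values, e.g. with the default (0, 0) prior and p = 0) is a choice of this sketch, not documented package behaviour.

```python
import random
from typing import Tuple

def parametric_ts_draw(
    p: float,
    n_obs: int,
    rng: random.Random,
    beta_prior_ts: Tuple[float, float] = (0.0, 0.0),
    eps: float = 1e-6,
) -> float:
    """Hypothetical helper: Thompson draw around a classifier's probability."""
    alpha = p * n_obs + beta_prior_ts[0]
    beta = (1.0 - p) * n_obs + beta_prior_ts[1]
    # betavariate needs strictly positive parameters; flooring them is a sketch-only choice.
    return rng.betavariate(max(alpha, eps), max(beta, eps))

rng = random.Random(7)
few = [parametric_ts_draw(0.3, 5, rng) for _ in range(2000)]
many = [parametric_ts_draw(0.3, 5000, rng) for _ in range(2000)]
# Both centre near 0.3, but draws for the well-observed arm are far more concentrated.
print(sum(few) / len(few), min(few), max(few))
print(sum(many) / len(many), min(many), max(many))
```

The number of observations acts as a confidence weight: the same predicted probability yields wide exploration for a rarely played arm and near-deterministic scores for a heavily played one.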
- Parameters
base_algorithm (obj) – Base binary classifier for which each sample for each class will be fit. Will look for, in this order:
A ‘predict_proba’ method with outputs (n_samples, 2), values in [0,1], rows summing to 1
A ‘decision_function’ method with unbounded outputs (n_samples,) to which it will apply a sigmoid function.
A ‘predict’ method with outputs (n_samples,) with values in [0,1].
Can also pass a list with a different (or already-fit) classifier for each arm.
nchoices (int or list-like) – Number of arms/labels to choose from. Can also pass a list, array, or Series with arm names, in which case the outputs from predict will follow these names and arms can be dropped by name, and new ones added with a custom name.
beta_prior (str ‘auto’, None, tuple ((a,b), n), or list[tuple((a,b), n)]) – If not ‘None’, when there are fewer than ‘n’ samples with and without a reward from a given arm, it will predict the score for that class as a random number drawn from a beta distribution with the prior specified by ‘a’ and ‘b’. If set to “auto”, will be calculated as:
beta_prior = ((2/log2(nchoices), 4), 2)
Can also pass different priors per arm, in which case they should be passed as a list of tuples. This parameter can have a very large impact on the end results, and it’s recommended to tune it accordingly: scenarios with low expected reward rates should have priors that result in drawing small random numbers, whereas scenarios with large expected reward rates should have stronger priors and tend towards larger random numbers. Also, the more arms there are, the smaller the optimal expected value for these random numbers. Recommended to use only one of beta_prior or smoothing.
beta_prior_ts (tuple(float, float)) – Beta prior used for the distribution from which to draw probabilities given the base algorithm’s estimates. This is independent of beta_prior, and the two will not be used together under the same arm. Pass ‘(0,0)’ for no prior.
smoothing (None, tuple (a,b), or list) – If not None, predictions will be smoothed as yhat_smooth = (yhat*n + a)/(n + b), where ‘n’ is the number of times each arm was chosen in the training data. Can also pass it as a list of tuples with different ‘a’ and ‘b’ parameters for each arm (e.g. if there are arm features, these parameters can be determined through a different model). This will not work well with non-probabilistic classifiers such as SVM, in which case you might want to define a class that embeds it with some recalibration built in. Recommended to use only one of beta_prior or smoothing.
noise_to_smooth (bool) – If passing smoothing, whether to add a small amount of random noise \(\sim Uniform(0, 10^{-12})\) in order to break ties at random instead of choosing the smallest arm index. Ignored when passing smoothing=None.
batch_train (bool) – Whether the base algorithm will be fit to the data in batches as it comes (streaming), or to the whole dataset each time it is refit. Requires a classifier with a ‘partial_fit’ method.
refit_buffer (int or None) – Number of observations per arm to keep as a reserve for passing to ‘partial_fit’. If passing it, until there are at least this number of observations for a given arm, that arm will keep the observations when calling ‘fit’ and ‘partial_fit’, and will translate calls to ‘partial_fit’ to calls to ‘fit’ with the new plus stored observations. After the reserve number is reached, calls to ‘partial_fit’ will enlarge the data batch with the stored observations, and old stored observations will be gradually replaced with the new ones (at random, not on a FIFO basis). This technique can greatly enhance performance when fitting the data in batches, but memory consumption can grow quite large. If passing sparse CSR matrices as input to ‘fit’ and ‘partial_fit’, these will be converted to dense once they go into this reserve, and then converted back to CSR to augment the new data. Calls to ‘fit’ will override this reserve. Ignored when passing ‘batch_train=False’.
deep_copy_buffer (bool) – Whether to make deep copies of the data that is stored in the reserve for refit_buffer. If passing ‘False’, when the reserve is not yet full, it will only store shallow copies of the data, which is faster but will not let Python’s garbage collector free memory after deleting the data, and if the original data is overwritten, the buffer will be overwritten too. Ignored when not using refit_buffer.
assume_unique_reward (bool) – Whether to assume that only one arm has a reward per observation. If set to ‘True’, whenever an arm receives a reward, the classifiers for all other arms will be fit to that observation too, having negative label.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a Generator object for random number generation, a RandomState object (from NumPy) from which to draw an integer, or a Generator object (from NumPy), which will be used directly. While this controls random number generation for this metaheuristic, there can still be other sources of variation upon re-runs, such as data aggregations in parallel (e.g. from OpenMP or BLAS functions).
njobs (int or None) – Number of parallel jobs to run. If passing None, it will be set to 1. If passing -1, it will be set to the number of CPU cores. Note that if the base algorithm is itself parallelized, this might result in a slowdown as both compete for available threads, so don’t set parallelization in both. The parallelization uses shared memory, thus you will only see a speedup if your base classifier releases the Python GIL; otherwise it will result in slower runs.
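The “auto” beta_prior formula, the smoothing formula, and the cold-start rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the package’s code: arm_score is a hypothetical helper, and reading “fewer than ‘n’ samples with and without a reward” as “fewer than ‘n’ with a reward, or fewer than ‘n’ without one” is our interpretation.

```python
import math
import random


def auto_beta_prior(nchoices):
    """Prior used when beta_prior='auto', per the documented formula."""
    return ((2 / math.log2(nchoices), 4), 2)


def smoothed_score(yhat, n, a, b):
    """yhat_smooth = (yhat*n + a) / (n + b), n = times the arm was chosen."""
    return (yhat * n + a) / (n + b)


def arm_score(yhat, n_w_rew, n_wo_rew, beta_prior, rng):
    """Hypothetical helper: while the arm has fewer than `thr` observations
    with a reward, or fewer than `thr` without one (our reading of the rule),
    return a Beta(a, b) draw instead of the model's estimate."""
    (a, b), thr = beta_prior
    if n_w_rew < thr or n_wo_rew < thr:
        return rng.betavariate(a, b)
    return yhat
```

With nchoices=4, the “auto” prior is ((1.0, 4), 2), so under this reading an arm with too few observations is scored by a Beta(1, 4) draw, whose mean is 1/(1+4) = 0.2.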
- add_arm(arm_name=None, fitted_classifier=None, n_w_rew=0, n_wo_rew=0, smoothing=None, beta_prior=None, refit_buffer_X=None, refit_buffer_r=None, f_grad_norm=None, case_one_class=None)
Adds a new arm to the pool of choices
- Parameters
arm_name (object) – Name for this arm. Only applicable when using named arms. If None, will use the name of the last arm plus 1 (will only work when the names are integers).
fitted_classifier (object) – If a classifier has already been fit to rewards coming from this arm, you can pass it here; otherwise, it will be started from the same ‘base_classifier’ as the initial arms. If using bootstrapped methods or methods from this module which do not accept arbitrary classifiers as input, don’t pass a classifier here (unless using classes such as utils._BootstrappedClassifierBase). If the constructor was called with a different base_algorithm per arm, a base classifier must be passed here. Not applicable for the classes that do not take a base_algorithm.
n_w_rew (int) – Number of trials/rounds with rewards coming from this arm (only used when using a beta prior or smoothing).
n_wo_rew (int) – Number of trials/rounds without rewards coming from this arm (only used when using a beta prior or smoothing).
smoothing (None, tuple (a,b), or list) – Smoothing parameters to use for this arm (see documentation of the class constructor for details). If None and if the smoothing passed to the constructor didn’t have separate entries per arm, will use the same smoothing as was passed in the constructor. If no smoothing was passed to the constructor, the smoothing here will be ignored. Must pass a smoothing here if the constructor was passed a smoothing with different entries per arm.
beta_prior (None or tuple((a,b), n)) – Beta prior to use for this arm. See the class’ documentation for details. Must be passed if the constructor was provided different beta priors per arm. If None and the constructor had a single beta_prior, will use that same beta_prior for this new arm. Note that n_w_rew and n_wo_rew will be counted towards the threshold ‘n’ in here. Cannot be passed if the constructor did not have a beta_prior.
refit_buffer_X (array(m, n) or None) – Refit buffer of ‘X’ data to use for the new arm. Ignored when using ‘batch_train=False’ or ‘refit_buffer=None’.
refit_buffer_r (array(m,) or None) – Refit buffer of rewards data to use for the new arm. Ignored when using ‘batch_train=False’ or ‘refit_buffer=None’.
f_grad_norm (function) – Gradient calculation function to use for this arm. This is only for the policies that make choices according to active learning criteria, and only for situations in which the policy was passed different functions for each arm.
case_one_class (function) – Gradient workaround function for single-class data. This is only for the policies that make choices according to active learning criteria, and only for situations in which the policy was passed different functions for each arm.
- Returns
self – This object
- Return type
object
- decision_function(X)
Get the scores for each arm following this policy’s action-choosing criteria.
- Parameters
X (array (n_samples, n_features)) – Data for which to obtain decision function scores for each arm.
- Returns
scores – Scores following this policy for each arm.
- Return type
array (n_samples, n_choices)
- drop_arm(arm_name)
Drop an arm/choice
Drops (removes/deletes) an arm from the set of available choices to the policy.
Note
The available arms, if named, are stored in attribute ‘choice_names’.
- Parameters
arm_name (int or object) – Arm to drop. If passing an integer, will drop at that index (starting at zero). Otherwise, will drop the arm matching this name (argument must be of the same type as the individual entries passed to ‘nchoices’ in the initialization).
- Returns
self – This object
- Return type
object
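The naming rules described for add_arm and drop_arm can be mimicked on a plain list of arm names. The helpers below are hypothetical and only model the documented bookkeeping (the default new name is the last name plus 1; an integer drops by position, anything else by name); they do not touch any fitted models.

```python
def add_arm_name(choice_names, arm_name=None):
    """Hypothetical: default name is the last name plus 1 (integer names only)."""
    if arm_name is None:
        arm_name = choice_names[-1] + 1
    return choice_names + [arm_name]


def drop_arm_name(choice_names, arm_name):
    """Hypothetical: an int drops the arm at that index, otherwise by name."""
    if isinstance(arm_name, int):
        return choice_names[:arm_name] + choice_names[arm_name + 1:]
    return [name for name in choice_names if name != arm_name]
```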
- fit(X, a, r, warm_start=False, continue_from_last=False)
Fits the base algorithm (one per class [and per sample if bootstrapped]) to partially labeled data.
- Parameters
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
warm_start (bool) – Whether to use the results of previous calls to ‘fit’ as a start for fitting to the ‘X’ data passed here. This will only be available if the base classifier has a property warm_start too and that property is also set to ‘True’. You can double-check that it’s recognized as such by checking this object’s property has_warm_start. Passing ‘True’ when the classifier doesn’t support warm start despite having the property might slow things down. Dropping arms will make this functionality unavailable. This option is not available for ‘BootstrappedUCB’, nor for ‘BootstrappedTS’.
continue_from_last (bool) – If the policy was previously fit to data, whether to assume that this new call to ‘fit’ will continue from the exact same dataset as before plus new rows appended at the end of ‘X’, ‘a’, ‘r’. In this case, will only refit the models that have new data according to ‘a’. Note that the bootstrapped policies will still benefit from extra refits. This option should not be used when there are calls to ‘partial_fit’ between calls to ‘fit’. Ignored if using assume_unique_reward=True.
- Returns
self – This object
- Return type
obj
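With continue_from_last=True, the models to refit are the ones whose arms appear in the rows appended to ‘a’ since the previous call. A minimal sketch of that selection, with a hypothetical helper name and assuming arms are indexed 0..nchoices-1:

```python
def arms_to_refit(a_full, n_previous, nchoices):
    """Hypothetical: arms present in the rows appended after the first
    `n_previous` rows are the only ones whose models need refitting."""
    touched = set(a_full[n_previous:])
    return [arm for arm in range(nchoices) if arm in touched]
```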
- partial_fit(X, a, r)
Fits the base algorithm (one per class) to partially labeled data in batches.
Note
In order to use this method, the base classifier must have a ‘partial_fit’ method, such as ‘sklearn.linear_model.SGDClassifier’. This method is not available for ‘LogisticUCB’, ‘LogisticTS’, ‘PartitionedUCB’, ‘PartitionedTS’.
- Parameters
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
- Returns
self – This object
- Return type
obj
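The refit_buffer mechanism that partial_fit relies on can be sketched as a per-arm reserve. The class below is hypothetical and simplified: once the reserve is full, each new observation overwrites a randomly chosen slot, which is one simple way to realize the documented “at random, not on a FIFO basis” replacement; the package’s exact replacement scheme may differ.

```python
import random


class RefitBuffer:
    """Hypothetical per-arm reserve mimicking the documented refit_buffer."""

    def __init__(self, size, seed=None):
        self.size = size
        self.X, self.r = [], []
        self.rng = random.Random(seed)

    def is_full(self):
        return len(self.X) >= self.size

    def batch_for_partial_fit(self, X_new, r_new):
        """Return (X, r, use_fit): the batch enlarged with stored rows, and
        whether this call should be translated into a full 'fit'."""
        use_fit = not self.is_full()
        X_batch = self.X + list(X_new)
        r_batch = self.r + list(r_new)
        for x, reward in zip(X_new, r_new):
            if not self.is_full():
                # Reserve still filling up: keep every new observation.
                self.X.append(x)
                self.r.append(reward)
            else:
                # Reserve full: overwrite a random slot (not FIFO).
                idx = self.rng.randrange(self.size)
                self.X[idx], self.r[idx] = x, reward
        return X_batch, r_batch, use_fit
```

This sketch stores references to the incoming rows rather than copies, so modifying the original rows would also change the reserve; that is the situation deep_copy_buffer=True guards against.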
- predict(X, exploit=False, output_score=False)
Selects actions according to this policy for new data.
- Parameters
X (array (n_samples, n_features)) – New observations for which to choose an action according to this policy.
exploit (bool) – Whether to make a prediction according to the policy, or to just choose the arm with the highest expected reward according to current models.
output_score (bool) – Whether to output the score that this method predicted, in case it is desired to use it with this package’s offpolicy and evaluation modules.
- Returns
pred – Actions chosen by the policy. If passing output_score=True, it will be a dictionary with the chosen arm and the score that the arm got following this policy with the classifiers used.
- Return type
array (n_samples,) or dict(“choice” : array(n_samples,), “score” : array(n_samples,))
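For a policy whose choice is the arm with the highest decision_function score, the two return shapes of predict can be sketched as follows. predict_from_scores is a hypothetical helper operating on an already computed score matrix; rows where a policy makes a random choice would not follow this arg-max rule.

```python
def predict_from_scores(scores, output_score=False):
    """Hypothetical: pick the highest-scoring arm per row; with
    output_score=True, return the documented dict shape instead."""
    choice = [max(range(len(row)), key=row.__getitem__) for row in scores]
    if not output_score:
        return choice
    score = [row[c] for row, c in zip(scores, choice)]
    return {"choice": choice, "score": score}
```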
- reset_beta_prior_ts(beta_prior_ts=(0.0, 0.0))
Set the Thompson prior to a custom tuple
- Parameters
beta_prior_ts (tuple(float, float)) – Beta prior used for the distribution from which to draw probabilities given the base algorithm’s estimates. This is independent of beta_prior, and they will not be used together under the same arm. Pass ‘(0,0)’ for no prior.
- Returns
self – This object
- Return type
obj
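One way to turn a point estimate from the base algorithm into a Thompson-sampling draw is to treat it as the mean of a beta distribution and add beta_prior_ts as pseudo-counts. The parameterization below, Beta(a + p·n, b + (1 − p)·n) with n the arm’s observation count, is our assumption for illustration and is not taken from the package source; ts_draw is hypothetical.

```python
import random


def ts_draw(p_hat, n_obs, beta_prior_ts=(0.0, 0.0), rng=random, eps=1e-6):
    """Hypothetical: treat p_hat as p_hat*n_obs successes out of n_obs
    trials, add the (a, b) pseudo-counts, and sample from that beta."""
    a, b = beta_prior_ts
    # Clip so both parameters stay positive, e.g. with the (0, 0) prior
    # and an estimate of exactly 0 or 1.
    alpha = max(a + p_hat * n_obs, eps)
    beta = max(b + (1.0 - p_hat) * n_obs, eps)
    return rng.betavariate(alpha, beta)
```

The more observations an arm has, the more tightly the draws concentrate around the model’s estimate; a larger beta_prior_ts pulls draws for rarely chosen arms towards a / (a + b).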
- topN(X, n)
Get top-N ranked actions for each observation
Note
This method will rank choices/arms according to what the policy dictates - it is not an exploitation-mode rank, so if e.g. there are random choices for some observations, there will be random ranks in here.
- Parameters
X (array (n_samples, n_features)) – New observations for which to rank actions according to this policy.
n (int) – Number of top-ranked actions to output
- Returns
topN – The top-ranked actions for each observation
- Return type
array(n_samples, n)
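Given a score matrix such as the one returned by decision_function, a top-N ranking per observation is a descending sort of the arm indices. A minimal sketch (top_n is hypothetical; ties keep the lower arm index first because Python’s sort is stable, also with reverse=True):

```python
def top_n(scores, n):
    """Hypothetical: indices of the n highest-scoring arms per row."""
    return [
        sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:n]
        for row in scores
    ]
```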
PartitionedTS
- class contextualbandits.online.PartitionedTS(nchoices, beta_prior=((1, 1), 1), smoothing=None, noise_to_smooth=True, assume_unique_reward=False, random_state=None, njobs=-1, *args, **kwargs)
Tree-partitioned Thompson Sampling
Fits decision trees having non-contextual multi-armed Thompson-sampling bandits at each leaf.
This corresponds to the ‘TreeHeuristic’ in the reference paper.
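At prediction time, the per-leaf bandit can be sketched as follows, assuming each arm’s tree has already routed the new observation to a leaf (in scikit-learn, DecisionTreeClassifier.apply returns leaf indices) and that each leaf keeps counts of rewarded and unrewarded observations for its arm. choose_arm_at_leaves is a hypothetical helper; using the beta_prior (a, b) as pseudo-counts follows the description of beta_prior below.

```python
import random


def choose_arm_at_leaves(leaf_counts, prior=(1.0, 1.0), rng=random):
    """Hypothetical per-leaf Thompson sampling.

    leaf_counts[k] = (successes, failures) in the leaf of arm k's tree that
    the new observation falls into. Each arm gets a draw from
    Beta(a + successes, b + failures); the highest draw wins."""
    a, b = prior
    draws = [rng.betavariate(a + s, b + f) for s, f in leaf_counts]
    return max(range(len(draws)), key=draws.__getitem__)
```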
Note
This method fits only one tree per arm. As such, it’s not recommended for high-dimensional data.
Note
The default values for the beta prior are as suggested in the reference paper. It is nevertheless recommended to change them.
- Parameters
nchoices (int or list-like) – Number of arms/labels to choose from. Can also pass a list, array, or Series with arm names, in which case the outputs from predict will follow these names and arms can be dropped by name, and new ones added with a custom name.
beta_prior (str ‘auto’, tuple ((a,b), n), or list[tuple((a,b), n)]) – When there are fewer than ‘n’ samples with and without a reward from a given arm, it will predict the score for that class as a random number drawn from a beta distribution with the prior specified by ‘a’ and ‘b’. If passing ‘auto’ (which is not the default), will use the same default as for the other policies in this library:
beta_prior = ((2/log2(nchoices), 4), 2)
Can also pass different priors per arm, in which case they should be passed as a list of tuples. Additionally, will use (a,b) as prior when sampling from the MAB at a given node.
smoothing (None, tuple (a,b), or list) – If not None, predictions will be smoothed as yhat_smooth = (yhat*n + a)/(n + b), where ‘n’ is the number of times each arm was chosen in the training data. Can also pass it as a list of tuples with different ‘a’ and ‘b’ parameters for each arm (e.g. if there are arm features, these parameters can be determined through a different model). Not recommended for this method.
noise_to_smooth (bool) – If passing smoothing, whether to add a small amount of random noise \(\sim Uniform(0, 10^{-12})\) in order to break ties at random instead of choosing the smallest arm index. Ignored when passing smoothing=None.
assume_unique_reward (bool) – Whether to assume that only one arm has a reward per observation. If set to ‘True’, whenever an arm receives a reward, the classifiers for all other arms will be fit to that observation too, having negative label.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a Generator object for random number generation, a RandomState object (from NumPy) from which to draw an integer, or a Generator object (from NumPy), which will be used directly.
njobs (int or None) – Number of parallel jobs to run. If passing None, it will be set to 1. If passing -1, it will be set to the number of CPU cores. Note that it will not achieve a large degree of parallelization, as it needs many Python computations with shared memory and does not release the GIL.
*args (tuple) – Additional arguments to pass to the decision tree model (this policy uses scikit-learn’s DecisionTreeClassifier; see their docs for more details). Note that passing random_state for DecisionTreeClassifier will have no effect, as it will be set independently.
**kwargs (dict) – Additional keyword arguments to pass to the decision tree model (this policy uses scikit-learn’s DecisionTreeClassifier; see their docs for more details). Note that passing random_state for DecisionTreeClassifier will have no effect, as it will be set independently.
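The assume_unique_reward option amounts to augmenting each arm’s training data: a reward for one arm becomes a negative example for every other arm. A minimal sketch with a hypothetical helper that returns, per arm, the (observation, label) pairs its model would be fit to:

```python
def expand_unique_reward(X, a, r, nchoices):
    """Hypothetical: with assume_unique_reward=True, every observation with
    reward 1 for arm a[i] is also added as a label-0 example for all other
    arms. Returns {arm: [(x, label), ...]}."""
    per_arm = {arm: [] for arm in range(nchoices)}
    for x, arm, reward in zip(X, a, r):
        per_arm[arm].append((x, reward))
        if reward == 1:
            for other in range(nchoices):
                if other != arm:
                    per_arm[other].append((x, 0))
    return per_arm
```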
References
- [1] Elmachtoub, Adam N., et al. “A practical method for solving contextual bandit problems using decision trees.” arXiv preprint arXiv:1706.04687 (2017).
- [2] scikit-learn, sklearn.tree.DecisionTreeClassifier documentation: https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
- add_arm(arm_name=None, fitted_classifier=None, n_w_rew=0, n_wo_rew=0, smoothing=None, beta_prior=None, refit_buffer_X=None, refit_buffer_r=None, f_grad_norm=None, case_one_class=None)
Adds a new arm to the pool of choices
- Parameters
arm_name (object) – Name for this arm. Only applicable when using named arms. If None, will use the name of the last arm plus 1 (will only work when the names are integers).
fitted_classifier (object) – If a classifier has already been fit to rewards coming from this arm, you can pass it here; otherwise, it will be started from the same ‘base_classifier’ as the initial arms. If using bootstrapped methods or methods from this module which do not accept arbitrary classifiers as input, don’t pass a classifier here (unless using classes such as utils._BootstrappedClassifierBase). If the constructor was called with a different base_algorithm per arm, a base classifier must be passed here. Not applicable for the classes that do not take a base_algorithm.
n_w_rew (int) – Number of trials/rounds with rewards coming from this arm (only used when using a beta prior or smoothing).
n_wo_rew (int) – Number of trials/rounds without rewards coming from this arm (only used when using a beta prior or smoothing).
smoothing (None, tuple (a,b), or list) – Smoothing parameters to use for this arm (see documentation of the class constructor for details). If None and if the smoothing passed to the constructor didn’t have separate entries per arm, will use the same smoothing as was passed in the constructor. If no smoothing was passed to the constructor, the smoothing here will be ignored. Must pass a smoothing here if the constructor was passed a smoothing with different entries per arm.
beta_prior (None or tuple((a,b), n)) – Beta prior to use for this arm. See the class’ documentation for details. Must be passed if the constructor was provided different beta priors per arm. If None and the constructor had a single beta_prior, will use that same beta_prior for this new arm. Note that n_w_rew and n_wo_rew will be counted towards the threshold ‘n’ in here. Cannot be passed if the constructor did not have a beta_prior.
refit_buffer_X (array(m, n) or None) – Refit buffer of ‘X’ data to use for the new arm. Ignored when using ‘batch_train=False’ or ‘refit_buffer=None’.
refit_buffer_r (array(m,) or None) – Refit buffer of rewards data to use for the new arm. Ignored when using ‘batch_train=False’ or ‘refit_buffer=None’.
f_grad_norm (function) – Gradient calculation function to use for this arm. This is only for the policies that make choices according to active learning criteria, and only for situations in which the policy was passed different functions for each arm.
case_one_class (function) – Gradient workaround function for single-class data. This is only for the policies that make choices according to active learning criteria, and only for situations in which the policy was passed different functions for each arm.
- Returns
self – This object
- Return type
object
- decision_function(X)
Get the scores for each arm following this policy’s action-choosing criteria.
- Parameters
X (array (n_samples, n_features)) – Data for which to obtain decision function scores for each arm.
- Returns
scores – Scores following this policy for each arm.
- Return type
array (n_samples, n_choices)
- drop_arm(arm_name)
Drop an arm/choice
Drops (removes/deletes) an arm from the set of available choices to the policy.
Note
The available arms, if named, are stored in attribute ‘choice_names’.
- Parameters
arm_name (int or object) – Arm to drop. If passing an integer, will drop at that index (starting at zero). Otherwise, will drop the arm matching this name (argument must be of the same type as the individual entries passed to ‘nchoices’ in the initialization).
- Returns
self – This object
- Return type
object
- fit(X, a, r, warm_start=False, continue_from_last=False)
Fits the base algorithm (one per class [and per sample if bootstrapped]) to partially labeled data.
- Parameters
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
warm_start (bool) – Whether to use the results of previous calls to ‘fit’ as a start for fitting to the ‘X’ data passed here. This will only be available if the base classifier has a property warm_start too and that property is also set to ‘True’. You can double-check that it’s recognized as such by checking this object’s property has_warm_start. Passing ‘True’ when the classifier doesn’t support warm start despite having the property might slow things down. Dropping arms will make this functionality unavailable. This option is not available for ‘BootstrappedUCB’, nor for ‘BootstrappedTS’.
continue_from_last (bool) – If the policy was previously fit to data, whether to assume that this new call to ‘fit’ will continue from the exact same dataset as before plus new rows appended at the end of ‘X’, ‘a’, ‘r’. In this case, will only refit the models that have new data according to ‘a’. Note that the bootstrapped policies will still benefit from extra refits. This option should not be used when there are calls to ‘partial_fit’ between calls to ‘fit’. Ignored if using assume_unique_reward=True.
- Returns
self – This object
- Return type
obj
- partial_fit(X, a, r)
Fits the base algorithm (one per class) to partially labeled data in batches.
Note
In order to use this method, the base classifier must have a ‘partial_fit’ method, such as ‘sklearn.linear_model.SGDClassifier’. This method is not available for ‘LogisticUCB’, ‘LogisticTS’, ‘PartitionedUCB’, ‘PartitionedTS’.
- Parameters
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
- Returns
self – This object
- Return type
obj
- predict(X, exploit=False, output_score=False)
Selects actions according to this policy for new data.
- Parameters
X (array (n_samples, n_features)) – New observations for which to choose an action according to this policy.
exploit (bool) – Whether to make a prediction according to the policy, or to just choose the arm with the highest expected reward according to current models.
output_score (bool) – Whether to output the score that this method predicted, in case it is desired to use it with this package’s offpolicy and evaluation modules.
- Returns
pred – Actions chosen by the policy. If passing output_score=True, it will be a dictionary with the chosen arm and the score that the arm got following this policy with the classifiers used.
- Return type
array (n_samples,) or dict(“choice” : array(n_samples,), “score” : array(n_samples,))
- topN(X, n)
Get top-N ranked actions for each observation
Note
This method will rank choices/arms according to what the policy dictates - it is not an exploitation-mode rank, so if e.g. there are random choices for some observations, there will be random ranks in here.
- Parameters
X (array (n_samples, n_features)) – New observations for which to rank actions according to this policy.
n (int) – Number of top-ranked actions to output
- Returns
topN – The top-ranked actions for each observation
- Return type
array(n_samples, n)
PartitionedUCB
- class contextualbandits.online.PartitionedUCB(nchoices, percentile=80, ucb_prior=(1, 1), beta_prior='auto', smoothing=None, noise_to_smooth=True, assume_unique_reward=False, random_state=None, njobs=-1, *args, **kwargs)
Tree-partitioned Upper Confidence Bound
Fits decision trees having non-contextual multi-armed UCB bandits at each leaf. Uses the standard approximation for the confidence interval of a proportion (mean + c * sqrt(mean * (1-mean) / n)).
This is similar to the ‘TreeHeuristic’ in the reference paper, but uses UCB as a MAB policy instead of Thompson sampling.
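The leaf-level bound from the description above, with ucb_prior added as pseudo-counts, can be sketched as below. Taking c as the standard-normal quantile of percentile is our assumption about how the percentile maps to c; leaf_ucb is hypothetical, and percentile must lie strictly between 0 and 100 for NormalDist.inv_cdf.

```python
import math
from statistics import NormalDist


def leaf_ucb(n_pos, n_neg, percentile=80, ucb_prior=(1, 1)):
    """Hypothetical upper bound at one leaf: mean + c*sqrt(mean*(1-mean)/n).

    The first ucb_prior number is added to the positives and the second to
    the negatives; c is assumed to be the normal quantile of `percentile`."""
    pos = n_pos + ucb_prior[0]
    neg = n_neg + ucb_prior[1]
    n = pos + neg
    mean = pos / n
    c = NormalDist().inv_cdf(percentile / 100)
    return mean + c * math.sqrt(mean * (1 - mean) / n)
```

For example, 8 positives and 2 negatives become 9 and 3 with the default prior, so mean = 0.75, sqrt(0.75 × 0.25 / 12) = 0.125, and the bound is 0.75 + 0.125·c; raising the percentile raises c and widens the bound.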
Note
This method fits only one tree per arm. As such, it’s not recommended for high-dimensional data.
- Parameters
nchoices (int or list-like) – Number of arms/labels to choose from. Can also pass a list, array, or Series with arm names, in which case the outputs from predict will follow these names and arms can be dropped by name, and new ones added with a custom name.
percentile (int [0,100]) – Percentile of the confidence interval to take.
ucb_prior (tuple(float, float)) – Prior for the upper confidence bounds generated at each tree leaf. The first number will be added to the number of positives, and the second number to the number of negatives. If passing beta_prior=None, will use these alone to generate an upper confidence bound and will break ties at random.
beta_prior (str ‘auto’, None, tuple ((a,b), n), or list[tuple((a,b), n)]) – If not ‘None’, when there are fewer than ‘n’ samples with and without a reward from a given arm, it will predict the score for that class as a random number drawn from a beta distribution with the prior specified by ‘a’ and ‘b’. If set to “auto”, will be calculated as:
beta_prior = ((3/log2(nchoices), 4), 2)
Can also pass different priors per arm, in which case they should be passed as a list of tuples. This parameter can have a very large impact on the end results, and it’s recommended to tune it accordingly: scenarios with low expected reward rates should have priors that result in drawing small random numbers, whereas scenarios with high expected reward rates should have stronger priors and tend towards larger random numbers. Also, the more arms there are, the smaller the optimal expected value for these random numbers. Note that this method calculates upper bounds rather than expectations, so the ‘a’ parameter should be higher than for other methods. It is recommended to use only one of beta_prior or smoothing.
smoothing (None, tuple (a,b), or list) – If not None, predictions will be smoothed as yhat_smooth = (yhat*n + a)/(n + b), where ‘n’ is the number of times each arm was chosen in the training data. Can also pass it as a list of tuples with different ‘a’ and ‘b’ parameters for each arm (e.g. if there are arm features, these parameters can be determined through a different model). Not recommended for this method.
noise_to_smooth (bool) – If passing smoothing, whether to add a small amount of random noise \(\sim Uniform(0, 10^{-12})\) in order to break ties at random instead of choosing the smallest arm index. Ignored when passing smoothing=None.
assume_unique_reward (bool) – Whether to assume that only one arm has a reward per observation. If set to ‘True’, whenever an arm receives a reward, the classifiers for all other arms will be fit to that observation too, having negative label.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a Generator object for random number generation, a RandomState object (from NumPy) from which to draw an integer, or a Generator object (from NumPy), which will be used directly.
njobs (int or None) – Number of parallel jobs to run. If passing None, it will be set to 1. If passing -1, it will be set to the number of CPU cores. Note that it will not achieve a large degree of parallelization, as it needs many Python computations with shared memory and does not release the GIL.
*args (tuple) – Additional arguments to pass to the decision tree model (this policy uses scikit-learn’s DecisionTreeClassifier; see their docs for more details). Note that passing random_state for DecisionTreeClassifier will have no effect, as it will be set independently.
**kwargs (dict) – Additional keyword arguments to pass to the decision tree model (this policy uses scikit-learn’s DecisionTreeClassifier; see their docs for more details). Note that passing random_state for DecisionTreeClassifier will have no effect, as it will be set independently.
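Breaking ties at random, as described for noise_to_smooth, can be sketched by adding \(Uniform(0, 10^{-12})\) noise to each score before taking the arg-max. The helper is hypothetical; whether the ucb_prior-only case uses this exact mechanism is not stated above.

```python
import random


def argmax_with_noise(scores, rng=random, scale=1e-12):
    """Hypothetical: add Uniform(0, scale) noise so that exactly tied scores
    are resolved at random instead of always picking the smallest index."""
    noisy = [s + rng.uniform(0, scale) for s in scores]
    return max(range(len(noisy)), key=noisy.__getitem__)
```

The noise is far smaller than any meaningful score difference, so it only changes the outcome when two arms are tied.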
References
- [1] Elmachtoub, Adam N., et al. “A practical method for solving contextual bandit problems using decision trees.” arXiv preprint arXiv:1706.04687 (2017).
- [2] scikit-learn, sklearn.tree.DecisionTreeClassifier documentation: https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
- add_arm(arm_name=None, fitted_classifier=None, n_w_rew=0, n_wo_rew=0, smoothing=None, beta_prior=None, refit_buffer_X=None, refit_buffer_r=None, f_grad_norm=None, case_one_class=None)
Adds a new arm to the pool of choices
- Parameters
arm_name (object) – Name for this arm. Only applicable when using named arms. If None, will use the name of the last arm plus 1 (will only work when the names are integers).
fitted_classifier (object) – If a classifier has already been fit to rewards coming from this arm, you can pass it here; otherwise, it will be started from the same ‘base_classifier’ as the initial arms. If using bootstrapped methods or methods from this module which do not accept arbitrary classifiers as input, don’t pass a classifier here (unless using classes such as utils._BootstrappedClassifierBase). If the constructor was called with a different base_algorithm per arm, a base classifier must be passed here. Not applicable for the classes that do not take a base_algorithm.
n_w_rew (int) – Number of trials/rounds with rewards coming from this arm (only used when using a beta prior or smoothing).
n_wo_rew (int) – Number of trials/rounds without rewards coming from this arm (only used when using a beta prior or smoothing).
smoothing (None, tuple (a,b), or list) – Smoothing parameters to use for this arm (see documentation of the class constructor for details). If None and if the smoothing passed to the constructor didn’t have separate entries per arm, will use the same smoothing as was passed in the constructor. If no smoothing was passed to the constructor, the smoothing here will be ignored. Must pass a smoothing here if the constructor was passed a smoothing with different entries per arm.
beta_prior (None or tuple((a,b), n)) – Beta prior to use for this arm. See the class’ documentation for details. Must be passed if the constructor was provided different beta priors per arm. If None and the constructor had a single beta_prior, will use that same beta_prior for this new arm. Note that n_w_rew and n_wo_rew will be counted towards the threshold ‘n’ in here. Cannot be passed if the constructor did not have a beta_prior.
refit_buffer_X (array(m, n) or None) – Refit buffer of ‘X’ data to use for the new arm. Ignored when using ‘batch_train=False’ or ‘refit_buffer=None’.
refit_buffer_r (array(m,) or None) – Refit buffer of rewards data to use for the new arm. Ignored when using ‘batch_train=False’ or ‘refit_buffer=None’.
f_grad_norm (function) – Gradient calculation function to use for this arm. This is only for the policies that make choices according to active learning criteria, and only for situations in which the policy was passed different functions for each arm.
case_one_class (function) – Gradient workaround function for single-class data. This is only for the policies that make choices according to active learning criteria, and only for situations in which the policy was passed different functions for each arm.
- Returns
self – This object
- Return type
object
- decision_function(X)
Get the scores for each arm following this policy’s action-choosing criteria.
- Parameters
X (array (n_samples, n_features)) – Data for which to obtain decision function scores for each arm.
- Returns
scores – Scores following this policy for each arm.
- Return type
array (n_samples, n_choices)
- drop_arm(arm_name)
Drop an arm/choice
Drops (removes/deletes) an arm from the set of available choices to the policy.
Note
The available arms, if named, are stored in attribute ‘choice_names’.
- Parameters
arm_name (int or object) – Arm to drop. If passing an integer, will drop at that index (starting at zero). Otherwise, will drop the arm matching this name (argument must be of the same type as the individual entries passed to ‘nchoices’ in the initialization).
- Returns
self – This object
- Return type
object
- fit(X, a, r, warm_start=False, continue_from_last=False)
Fits the base algorithm (one per class [and per sample if bootstrapped]) to partially labeled data.
- Parameters
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
warm_start (bool) – Whether to use the results of previous calls to ‘fit’ as a start for fitting to the ‘X’ data passed here. This will only be available if the base classifier has a property warm_start too and that property is also set to ‘True’. You can double-check that it’s recognized as such by checking this object’s property has_warm_start. Passing ‘True’ when the classifier doesn’t support warm start despite having the property might slow things down. Dropping arms will make this functionality unavailable. This option is not available for ‘BootstrappedUCB’, nor for ‘BootstrappedTS’.
continue_from_last (bool) – If the policy was previously fit to data, whether to assume that this new call to ‘fit’ will continue from the exact same dataset as before plus new rows appended at the end of ‘X’, ‘a’, ‘r’. In this case, will only refit the models that have new data according to ‘a’. Note that the bootstrapped policies will still benefit from extra refits. This option should not be used when there are calls to ‘partial_fit’ between calls to ‘fit’. Ignored if using assume_unique_reward=True.
- Returns
self – This object
- Return type
obj
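As a rough illustration of what ‘continue_from_last=True’ assumes, the sketch below finds which arms received new rows when the new ‘a’ is the old one with rows appended at the end. The helper ‘arms_with_new_data’ is hypothetical and not part of contextualbandits; the package’s internal bookkeeping may differ.

```python
import numpy as np


def arms_with_new_data(a_prev, a_new):
    """Hypothetical helper: arms that appear in rows appended since the last fit.

    Assumes 'a_new' repeats 'a_prev' exactly and only adds rows at the end,
    which is the precondition that continue_from_last=True relies on.
    """
    a_prev = np.asarray(a_prev)
    a_new = np.asarray(a_new)
    n_prev = a_prev.shape[0]
    if a_new.shape[0] < n_prev or not np.array_equal(a_new[:n_prev], a_prev):
        raise ValueError("new data must start with the previously fitted rows")
    # Only these arms have new observations, so only their models need a refit
    return np.unique(a_new[n_prev:])


a_first = np.array([0, 1, 1, 2])
a_second = np.array([0, 1, 1, 2, 2, 0, 2])
print(arms_with_new_data(a_first, a_second).tolist())  # [0, 2]
```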
- partial_fit(X, a, r)¶
Fits the base algorithm (one per class) to partially labeled data in batches.
Note
In order to use this method, the base classifier must have a ‘partial_fit’ method, such as ‘sklearn.linear_model.SGDClassifier’. This method is not available for ‘LogisticUCB’, ‘LogisticTS’, ‘PartitionedUCB’, ‘PartitionedTS’.
- Parameters
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
- Returns
self – This object
- Return type
obj
- predict(X, exploit=False, output_score=False)¶
Selects actions according to this policy for new data.
- Parameters
X (array (n_samples, n_features)) – New observations for which to choose an action according to this policy.
exploit (bool) – Whether to make a prediction according to the policy, or to just choose the arm with the highest expected reward according to current models.
output_score (bool) – Whether to output the score that this method predicted, in case it is desired to use it with this package’s ‘offpolicy’ and ‘evaluation’ modules.
- Returns
pred – Actions chosen by the policy. If passing output_score=True, it will be a dictionary with the chosen arm and the score that the arm got following this policy with the classifiers used.
- Return type
array (n_samples,) or dict(“choice” : array(n_samples,), “score” : array(n_samples,))
- reset_percentile(percentile=80)¶
Set the upper confidence bound percentile to a custom number
- Parameters
percentile (int [0,100]) – Percentile of the confidence interval to take.
- Returns
self – This object
- Return type
obj
- reset_ucb_prior(ucb_prior=(1, 1))¶
Set the upper confidence bound prior to a custom tuple
- Parameters
ucb_prior (tuple(float, float)) – Prior for the upper confidence bounds generated at each tree leaf. The first number will be added to the number of positives, and the second number to the number of negatives. If passing ‘beta_prior=None’, will use these alone to generate an upper confidence bound and will break ties at random.
- Returns
self – This object
- Return type
obj
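To make the roles of ‘ucb_prior’ and ‘reset_percentile’ more concrete, here is a Monte Carlo sketch of an upper confidence bound for a single tree leaf: the prior pseudo-counts are added to the leaf’s positive and negative counts, and the bound is taken as a percentile of the resulting Beta distribution. Using a Beta distribution and sampling it (rather than computing an exact quantile) are assumptions of this sketch, the package’s exact computation may differ, and ‘leaf_ucb’ is a hypothetical name.

```python
import numpy as np


def leaf_ucb(n_pos, n_neg, ucb_prior=(1, 1), percentile=80, n_draws=100_000, seed=0):
    """Hypothetical helper: percentile of Beta(n_pos + a, n_neg + b), estimated by sampling."""
    a, b = ucb_prior
    rng = np.random.default_rng(seed)
    # First prior number is added to the positives, second to the negatives
    draws = rng.beta(n_pos + a, n_neg + b, size=n_draws)
    return float(np.percentile(draws, percentile))


# A leaf with mostly positives gets a higher bound than one with mostly negatives
print(round(leaf_ucb(8, 2), 3), round(leaf_ucb(2, 8), 3))
```

Raising the percentile widens the bound for every leaf, which makes the policy more exploratory.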
- topN(X, n)¶
Get top-N ranked actions for each observation
Note
This method will rank choices/arms according to what the policy dictates - it is not an exploitation-mode rank, so if e.g. there are random choices for some observations, there will be random ranks in here.
- Parameters
X (array (n_samples, n_features)) – New observations for which to rank actions according to this policy.
n (int) – Number of top-ranked actions to output
- Returns
topN – The top-ranked actions for each observation
- Return type
array(n_samples, n)
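Given a matrix of per-arm scores for each observation, ranking the top N arms amounts to a row-wise descending sort. The sketch below shows that step with NumPy using a hypothetical ‘top_n’ helper; it ranks a fixed score matrix and does not reproduce any exploration randomness that the policy’s own ‘topN’ may include.

```python
import numpy as np


def top_n(scores, n):
    """Hypothetical helper: indices of the n highest-scoring arms per row."""
    scores = np.asarray(scores, dtype=float)
    # Sorting the negated scores gives descending order; a stable sort
    # breaks ties in favor of the lower arm index
    return np.argsort(-scores, axis=1, kind="stable")[:, :n]


scores = np.array([[0.1, 0.7, 0.3],
                   [0.9, 0.2, 0.5]])
print(top_n(scores, 2).tolist())  # [[1, 2], [0, 2]]
```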
SeparateClassifiers¶
- class contextualbandits.online.SeparateClassifiers(base_algorithm, nchoices, beta_prior=None, smoothing=None, noise_to_smooth=True, batch_train=False, refit_buffer=None, deep_copy_buffer=True, assume_unique_reward=False, random_state=None, njobs=-1)¶
Separate Classifiers per arm
Fits one classifier per arm using only the data on which that arm was chosen. Predicts as One-Vs-Rest, plus the usual metaheuristics from ‘beta_prior’ and ‘smoothing’.
- Parameters
base_algorithm (obj) – Base binary classifier for which each sample for each class will be fit. Will look for, in this order:
A ‘predict_proba’ method with outputs (n_samples, 2), values in [0,1], rows summing to 1
A ‘decision_function’ method with unbounded outputs (n_samples,) to which it will apply a sigmoid function.
A ‘predict’ method with outputs (n_samples,) with values in [0,1].
Can also pass a list with a different (or already-fit) classifier for each arm.
nchoices (int or list-like) – Number of arms/labels to choose from. Can also pass a list, array, or Series with arm names, in which case the outputs from predict will follow these names and arms can be dropped by name, and new ones added with a custom name.
beta_prior (str ‘auto’, None, tuple ((a,b), n), or list[tuple((a,b), n)]) – If not ‘None’, when there are fewer than ‘n’ samples with and without a reward from a given arm, it will predict the score for that class as a random number drawn from a beta distribution with the prior specified by ‘a’ and ‘b’. If set to “auto”, will be calculated as:
beta_prior = ((2/log2(nchoices), 4), 2)
Can also pass different priors per arm, in which case they should be passed as a list of tuples. This parameter can have a very large impact on the end results, and it’s recommended to tune it accordingly: scenarios with low expected reward rates should have priors that result in drawing small random numbers, whereas scenarios with large expected reward rates should have stronger priors and tend towards larger random numbers. Also, the more arms there are, the smaller the optimal expected value for these random numbers. Recommended to use only one of ‘beta_prior’ or ‘smoothing’.
smoothing (None, tuple (a,b), or list) – If not None, predictions will be smoothed as yhat_smooth = (yhat*n + a)/(n + b), where ‘n’ is the number of times each arm was chosen in the training data. Can also pass it as a list of tuples with different ‘a’ and ‘b’ parameters for each arm (e.g. if there are arm features, these parameters can be determined through a different model). This will not work well with non-probabilistic classifiers such as SVM, in which case you might want to define a class that embeds it with some recalibration built-in. Recommended to use only one of ‘beta_prior’ or ‘smoothing’.
noise_to_smooth (bool) – If passing ‘smoothing’, whether to add a small amount of random noise \(\sim Uniform(0, 10^{-12})\) in order to break ties at random instead of choosing the smallest arm index. Ignored when passing ‘smoothing=None’.
batch_train (bool) – Whether the base algorithm will be fit to the data in batches as it comes (streaming), or to the whole dataset each time it is refit. Requires a classifier with a ‘partial_fit’ method.
refit_buffer (int or None) – Number of observations per arm to keep as a reserve for passing to ‘partial_fit’. If passing it, up until the moment there are at least this number of observations for a given arm, that arm will keep the observations when calling ‘fit’ and ‘partial_fit’, and will translate calls to ‘partial_fit’ to calls to ‘fit’ with the new plus stored observations. After the reserve number is reached, calls to ‘partial_fit’ will enlarge the data batch with the stored observations, and old stored observations will be gradually replaced with the new ones (at random, not on a FIFO basis). This technique can greatly enhance the performance when fitting the data in batches, but memory consumption can grow quite large. If passing sparse CSR matrices as input to ‘fit’ and ‘partial_fit’, these will be converted to dense once they go into this reserve, and then converted back to CSR to augment the new data. Calls to ‘fit’ will override this reserve. Ignored when passing ‘batch_train=False’.
deep_copy_buffer (bool) – Whether to make deep copies of the data that is stored in the reserve for ‘refit_buffer’. If passing ‘False’, when the reserve is not yet full, these will only store shallow copies of the data, which is faster but will not let Python’s garbage collector free memory after deleting the data, and if the original data is overwritten, so will this buffer. Ignored when not using ‘refit_buffer’.
assume_unique_reward (bool) – Whether to assume that only one arm has a reward per observation. If set to ‘True’, whenever an arm receives a reward, the classifiers for all other arms will be fit to that observation too, having negative label.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a ‘Generator’ object for random number generation, a ‘RandomState’ object (from NumPy) from which to draw an integer, or a ‘Generator’ object (from NumPy), which will be used directly. While this controls random number generation for this metaheuristic, there can still be other sources of variation upon re-runs, such as data aggregations in parallel (e.g. from OpenMP or BLAS functions).
njobs (int or None) – Number of parallel jobs to run. If passing None, will set it to 1. If passing -1, will set it to the number of CPU cores. Note that if the base algorithm is itself parallelized, this might result in a slowdown as both compete for available threads, so don’t set parallelization in both. The parallelization uses shared memory, thus you will only see a speed-up if your base classifier releases the Python GIL; otherwise it will result in slower runs.
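The two formulas in the parameter list above, the “auto” value of ‘beta_prior’ and the smoothing rule yhat_smooth = (yhat*n + a)/(n + b), can be checked numerically. The helper names are hypothetical; the sketch only evaluates the documented expressions and does not call the package.

```python
import math

import numpy as np


def auto_beta_prior(nchoices):
    """((a, b), n) for beta_prior='auto', following the documented formula."""
    return ((2 / math.log2(nchoices), 4), 2)


def smooth(yhat, n, a, b):
    """yhat_smooth = (yhat * n + a) / (n + b), with n = times the arm was chosen."""
    yhat = np.asarray(yhat, dtype=float)
    n = np.asarray(n, dtype=float)
    return (yhat * n + a) / (n + b)


print(auto_beta_prior(4))  # ((1.0, 4), 2)

# The same raw prediction of 0.9 for an arm chosen once vs. 1000 times
print(smooth([0.9, 0.9], [1, 1000], a=1, b=2))
```

With few observations the smoothed score is pulled toward a/b (here 0.5); with many, it stays close to the classifier’s own prediction.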
References
- 1
Cortes, David. “Adapting multi-armed bandits policies to contextual bandits scenarios.” arXiv preprint arXiv:1811.04383 (2018).
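The effect of ‘assume_unique_reward=True’ on each arm’s training data can be sketched as follows: an arm’s classifier sees the rows where that arm was chosen, plus, as negatives, the rows where a different arm received a reward. ‘per_arm_training_sets’ is a hypothetical helper written for illustration, not the package’s internal code.

```python
import numpy as np


def per_arm_training_sets(a, r, nchoices, assume_unique_reward=False):
    """Hypothetical helper: (row indices, labels) used to fit each arm's classifier."""
    a = np.asarray(a)
    r = np.asarray(r)
    sets = {}
    for arm in range(nchoices):
        chosen = a == arm
        if assume_unique_reward:
            # Rows where another arm was rewarded become negatives for this arm
            extra_negatives = (a != arm) & (r == 1)
            rows = np.flatnonzero(chosen | extra_negatives)
            labels = np.where(chosen[rows], r[rows], 0)
        else:
            rows = np.flatnonzero(chosen)
            labels = r[rows]
        sets[arm] = (rows, labels)
    return sets


a = np.array([0, 1, 2, 0])
r = np.array([1, 0, 1, 0])
for arm, (rows, labels) in per_arm_training_sets(a, r, 3, assume_unique_reward=True).items():
    print(arm, rows.tolist(), labels.tolist())
# 0 [0, 2, 3] [1, 0, 0]
# 1 [0, 1, 2] [0, 0, 0]
# 2 [0, 2] [0, 1]
```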
- add_arm(arm_name=None, fitted_classifier=None, n_w_rew=0, n_wo_rew=0, smoothing=None, beta_prior=None, refit_buffer_X=None, refit_buffer_r=None, f_grad_norm=None, case_one_class=None)¶
Adds a new arm to the pool of choices
- Parameters
arm_name (object) – Name for this arm. Only applicable when using named arms. If None, will use the name of the last arm plus 1 (will only work when the names are integers).
fitted_classifier (object) – If a classifier has already been fit to rewards coming from this arm, you can pass it here; otherwise, the arm will be started from the same base algorithm as the initial arms. If using bootstrapped methods or methods from this module which do not accept arbitrary classifiers as input, don’t pass a classifier here (unless using classes such as ‘utils._BootstrappedClassifierBase’). If the constructor was called with a different ‘base_algorithm’ per arm, a base classifier must be passed here. Not applicable for the classes that do not take a ‘base_algorithm’.
n_w_rew (int) – Number of trials/rounds with rewards coming from this arm (only used when using a beta prior or smoothing).
n_wo_rew (int) – Number of trials/rounds without rewards coming from this arm (only used when using a beta prior or smoothing).
smoothing (None, tuple (a,b), or list) – Smoothing parameters to use for this arm (see documentation of the class constructor for details). If ‘None’ and the ‘smoothing’ passed to the constructor didn’t have separate entries per arm, will use the same ‘smoothing’ as was passed in the constructor. If no ‘smoothing’ was passed to the constructor, the ‘smoothing’ here will be ignored. Must pass a ‘smoothing’ here if the constructor was passed a ‘smoothing’ with different entries per arm.
beta_prior (None or tuple((a,b), n)) – Beta prior to use for this arm. See the class’ documentation for details. Must be passed if the constructor was provided different beta priors per arm. If ‘None’ and the constructor had a single ‘beta_prior’, will use that same ‘beta_prior’ for this new arm. Note that ‘n_w_rew’ and ‘n_wo_rew’ will be counted towards the threshold ‘n’ in here. Cannot be passed if the constructor did not have a ‘beta_prior’.
refit_buffer_X (array(m, n) or None) – Refit buffer of ‘X’ data to use for the new arm. Ignored when using ‘batch_train=False’ or ‘refit_buffer=None’.
refit_buffer_r (array(m,) or None) – Refit buffer of rewards data to use for the new arm. Ignored when using ‘batch_train=False’ or ‘refit_buffer=None’.
f_grad_norm (function) – Gradient calculation function to use for this arm. This is only for the policies that make choices according to active learning criteria, and only for situations in which the policy was passed different functions for each arm.
case_one_class (function) – Gradient workaround function for single-class data. This is only for the policies that make choices according to active learning criteria, and only for situations in which the policy was passed different functions for each arm.
- Returns
self – This object
- Return type
object
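When adding an arm with ‘n_w_rew’ and ‘n_wo_rew’, those counts decide whether the arm is still below the beta prior threshold ‘n’. The sketch below assumes the prior is used while either count is below ‘n’; that reading of “fewer than ‘n’ samples with and without a reward” is an assumption, and ‘arm_score’ is a hypothetical helper rather than a package function.

```python
import numpy as np


def arm_score(classifier_score, n_w_rew, n_wo_rew, beta_prior, rng):
    """Hypothetical helper: score for one arm under a ((a, b), n) beta prior."""
    (a, b), n = beta_prior
    # Assumption: fall back to a prior draw while either count is below 'n'
    if n_w_rew < n or n_wo_rew < n:
        return float(rng.beta(a, b))
    return classifier_score


rng = np.random.default_rng(123)
prior = ((1.0, 4.0), 2)
print(arm_score(0.42, n_w_rew=5, n_wo_rew=3, beta_prior=prior, rng=rng))  # 0.42
print(arm_score(0.42, n_w_rew=1, n_wo_rew=3, beta_prior=prior, rng=rng))  # a Beta(1, 4) draw
```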
- decision_function(X)¶
Get the scores for each arm following this policy’s action-choosing criteria.
- Parameters
X (array (n_samples, n_features)) – Data for which to obtain decision function scores for each arm.
- Returns
scores – Scores following this policy for each arm.
- Return type
array (n_samples, n_choices)
- decision_function_std(X)¶
Get the predicted “probabilities” from each arm from the classifier that predicts it, standardized to sum up to 1 (note that these are no longer probabilities).
- Parameters
X (array (n_samples, n_features)) – Data for which to obtain decision function scores for each arm.
- Returns
scores – Scores following this policy for each arm.
- Return type
array (n_samples, n_choices)
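Since each arm’s classifier is fit on different data, the raw per-arm probabilities need not sum to 1, and ‘decision_function_std’ rescales them so that each row does. A minimal NumPy sketch of that row-wise rescaling, assuming a plain division by the row sum (the package may handle edge cases such as all-zero rows differently); ‘standardize_rows’ is a hypothetical name.

```python
import numpy as np


def standardize_rows(probs):
    """Divide each row of per-arm probabilities by its sum so rows add up to 1."""
    probs = np.asarray(probs, dtype=float)
    return probs / probs.sum(axis=1, keepdims=True)


raw = np.array([[0.5, 1.0, 0.5],
                [0.5, 0.5, 1.0]])
print(standardize_rows(raw).tolist())  # [[0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
```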
- drop_arm(arm_name)¶
Drop an arm/choice
Drops (removes/deletes) an arm from the set of available choices to the policy.
Note
The available arms, if named, are stored in attribute ‘choice_names’.
- Parameters
arm_name (int or object) – Arm to drop. If passing an integer, will drop at that index (starting at zero). Otherwise, will drop the arm matching this name (argument must be of the same type as the individual entries passed to ‘nchoices’ in the initialization).
- Returns
self – This object
- Return type
object
- fit(X, a, r, warm_start=False, continue_from_last=False)¶
Fits the base algorithm (one per class [and per sample if bootstrapped]) to partially labeled data.
- Parameters
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
warm_start (bool) – Whether to use the results of previous calls to ‘fit’ as a start for fitting to the ‘X’ data passed here. This will only be available if the base classifier has a property ‘warm_start’ too and that property is also set to ‘True’. You can double-check that it’s recognized as such by checking this object’s property ‘has_warm_start’. Passing ‘True’ when the classifier doesn’t support warm start despite having the property might slow things down. Dropping arms will make this functionality unavailable. This option is not available for ‘BootstrappedUCB’, nor for ‘BootstrappedTS’.
continue_from_last (bool) – If the policy was previously fit to data, whether to assume that this new call to ‘fit’ will continue from the exact same dataset as before plus new rows appended at the end of ‘X’, ‘a’, ‘r’. In this case, will only refit the models that have new data according to ‘a’. Note that the bootstrapped policies will still benefit from extra refits. This option should not be used when there are calls to ‘partial_fit’ between calls to ‘fit’. Ignored if using ‘assume_unique_reward=True’.
- Returns
self – This object
- Return type
obj
- partial_fit(X, a, r)¶
Fits the base algorithm (one per class) to partially labeled data in batches.
Note
In order to use this method, the base classifier must have a ‘partial_fit’ method, such as ‘sklearn.linear_model.SGDClassifier’. This method is not available for ‘LogisticUCB’, ‘LogisticTS’, ‘PartitionedUCB’, ‘PartitionedTS’.
- Parameters
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
- Returns
self – This object
- Return type
obj
- predict(X, output_score=False)¶
Selects actions according to this policy for new data.
- Parameters
X (array (n_samples, n_features)) – New observations for which to choose an action according to this policy.
output_score (bool) – Whether to output the score that this method predicted, in case it is desired to use it with this package’s ‘offpolicy’ and ‘evaluation’ modules.
- Returns
pred – Actions chosen by the policy. If passing output_score=True, it will be a dictionary with the chosen arm and the score that the arm got following this policy with the classifiers used.
- Return type
array (n_samples,) or dict(“choice” : array(n_samples,), “score” : array(n_samples,))
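With ‘output_score=True’, ‘predict’ returns a dictionary with the chosen arm and that arm’s score for each row. The sketch below shows the shape of that output when actions are picked greedily from a score matrix; greedy selection is a simplification for illustration (the policy’s own scores can include prior-based random draws), and ‘choose_with_scores’ is hypothetical.

```python
import numpy as np


def choose_with_scores(scores):
    """Pick the highest-scoring arm per row and report the score it got."""
    scores = np.asarray(scores, dtype=float)
    choice = scores.argmax(axis=1)
    # Fancy indexing picks scores[i, choice[i]] for every row i
    score = scores[np.arange(scores.shape[0]), choice]
    return {"choice": choice, "score": score}


out = choose_with_scores([[0.1, 0.7, 0.3],
                          [0.9, 0.2, 0.5]])
print(out["choice"].tolist(), out["score"].tolist())  # [1, 0] [0.7, 0.9]
```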
- predict_proba_separate(X)¶
Get the predicted probabilities from each arm from the classifier that predicts it.
Note
Classifiers are all fit on different data, so the probabilities will not add up to 1.
- Parameters
X (array (n_samples, n_features)) – Data for which to obtain decision function scores for each arm.
- Returns
scores – Scores following this policy for each arm.
- Return type
array (n_samples, n_choices)
- topN(X, n)¶
Get top-N ranked actions for each observation
Note
This method will rank choices/arms according to what the policy dictates - it is not an exploitation-mode rank, so if e.g. there are random choices for some observations, there will be random ranks in here.
- Parameters
X (array (n_samples, n_features)) – New observations for which to rank actions according to this policy.
n (int) – Number of top-ranked actions to output
- Returns
topN – The top-ranked actions for each observation
- Return type
array(n_samples, n)
SoftmaxExplorer¶
- class contextualbandits.online.SoftmaxExplorer(base_algorithm, nchoices, multiplier=1.0, inflation_rate=1.0004, beta_prior='auto', smoothing=None, noise_to_smooth=True, batch_train=False, refit_buffer=None, deep_copy_buffer=True, assume_unique_reward=False, random_state=None, njobs=-1)¶
SoftMax Explorer
Selects an action according to probabilities determined by a softmax transformation on the scores from the decision function that predicts each class.
Note
Will apply an inverse sigmoid transformation to the probabilities that come from the base algorithm before applying the softmax function.
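Following the note above, a probability p from the base algorithm is first mapped back to an unbounded score with the inverse sigmoid (logit) log(p / (1 - p)), and the scores are then passed through softmax(yhat * multiplier). Below is a NumPy sketch of that transformation for one decision step; the clipping constant ‘eps’ and the helper name are assumptions, not package internals.

```python
import numpy as np


def softmax_choice_probs(probs, multiplier=1.0, eps=1e-8):
    """Hypothetical helper: per-arm probabilities -> softmax selection probabilities."""
    # Clip so the logit stays finite for predictions of exactly 0 or 1
    p = np.clip(np.asarray(probs, dtype=float), eps, 1 - eps)
    logits = np.log(p) - np.log1p(-p)  # inverse sigmoid
    z = logits * multiplier
    z -= z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


probs = np.array([[0.2, 0.8]])
print(softmax_choice_probs(probs))                  # approx [[1/17, 16/17]]
print(softmax_choice_probs(probs, multiplier=2.0))  # approx [[1/257, 256/257]]

# Sample one action per row from the softmax probabilities
rng = np.random.default_rng(0)
sel = softmax_choice_probs(probs)
actions = [int(rng.choice(sel.shape[1], p=row)) for row in sel]
print(actions)
```

A larger multiplier sharpens the distribution toward the best-scoring arm, which is why the multiplier controls the amount of exploration.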
- Parameters
base_algorithm (obj) – Base binary classifier for which each sample for each class will be fit. Will look for, in this order:
A ‘predict_proba’ method with outputs (n_samples, 2), values in [0,1], rows summing to 1, to which it will apply an inverse sigmoid function.
A ‘decision_function’ method with unbounded outputs (n_samples,).
A ‘predict’ method outputting (n_samples,), values in [0,1], to which it will apply an inverse sigmoid function.
Can also pass a list with a different (or already-fit) classifier for each arm.
nchoices (int or list-like) – Number of arms/labels to choose from. Can also pass a list, array, or Series with arm names, in which case the outputs from predict will follow these names and arms can be dropped by name, and new ones added with a custom name.
multiplier (float or None) – Number by which to multiply the outputs from the base algorithm before applying the softmax function (i.e. will take softmax(yhat * multiplier)).
inflation_rate (float or None) – Number by which to multiply the multiplier after every prediction, i.e. after making ‘t’ predictions, the multiplier will be ‘multiplier_t = multiplier * inflation_rate^t’.
beta_prior (str ‘auto’, None, tuple ((a,b), n), or list[tuple((a,b), n)]) – If not ‘None’, when there are fewer than ‘n’ samples with and without a reward from a given arm, it will predict the score for that class as a random number drawn from a beta distribution with the prior specified by ‘a’ and ‘b’. If set to “auto”, will be calculated as:
beta_prior = ((2/log2(nchoices), 4), 2)
Can also pass different priors per arm, in which case they should be passed as a list of tuples. This parameter can have a very large impact on the end results, and it’s recommended to tune it accordingly: scenarios with low expected reward rates should have priors that result in drawing small random numbers, whereas scenarios with large expected reward rates should have stronger priors and tend towards larger random numbers. Also, the more arms there are, the smaller the optimal expected value for these random numbers. Recommended to use only one of ‘beta_prior’ or ‘smoothing’.
smoothing (None, tuple (a,b), or list) – If not None, predictions will be smoothed as yhat_smooth = (yhat*n + a)/(n + b), where ‘n’ is the number of times each arm was chosen in the training data. Can also pass it as a list of tuples with different ‘a’ and ‘b’ parameters for each arm (e.g. if there are arm features, these parameters can be determined through a different model). This will not work well with non-probabilistic classifiers such as SVM, in which case you might want to define a class that embeds it with some recalibration built-in. Recommended to use only one of ‘beta_prior’ or ‘smoothing’.
noise_to_smooth (bool) – If passing ‘smoothing’, whether to add a small amount of random noise \(\sim Uniform(0, 10^{-12})\) in order to break ties at random instead of choosing the smallest arm index. Ignored when passing ‘smoothing=None’.
batch_train (bool) – Whether the base algorithm will be fit to the data in batches as it comes (streaming), or to the whole dataset each time it is refit. Requires a classifier with a ‘partial_fit’ method.
refit_buffer (int or None) – Number of observations per arm to keep as a reserve for passing to ‘partial_fit’. If passing it, up until the moment there are at least this number of observations for a given arm, that arm will keep the observations when calling ‘fit’ and ‘partial_fit’, and will translate calls to ‘partial_fit’ to calls to ‘fit’ with the new plus stored observations. After the reserve number is reached, calls to ‘partial_fit’ will enlarge the data batch with the stored observations, and old stored observations will be gradually replaced with the new ones (at random, not on a FIFO basis). This technique can greatly enhance the performance when fitting the data in batches, but memory consumption can grow quite large. If passing sparse CSR matrices as input to ‘fit’ and ‘partial_fit’, these will be converted to dense once they go into this reserve, and then converted back to CSR to augment the new data. Calls to ‘fit’ will override this reserve. Ignored when passing ‘batch_train=False’.
deep_copy_buffer (bool) – Whether to make deep copies of the data that is stored in the reserve for ‘refit_buffer’. If passing ‘False’, when the reserve is not yet full, these will only store shallow copies of the data, which is faster but will not let Python’s garbage collector free memory after deleting the data, and if the original data is overwritten, so will this buffer. Ignored when not using ‘refit_buffer’.
assume_unique_reward (bool) – Whether to assume that only one arm has a reward per observation. If set to ‘True’, whenever an arm receives a reward, the classifiers for all other arms will be fit to that observation too, having negative label.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a ‘Generator’ object for random number generation, a ‘RandomState’ object (from NumPy) from which to draw an integer, or a ‘Generator’ object (from NumPy), which will be used directly. While this controls random number generation for this metaheuristic, there can still be other sources of variation upon re-runs, such as data aggregations in parallel (e.g. from OpenMP or BLAS functions).
njobs (int or None) – Number of parallel jobs to run. If passing None, will set it to 1. If passing -1, will set it to the number of CPU cores. Note that if the base algorithm is itself parallelized, this might result in a slowdown as both compete for available threads, so don’t set parallelization in both. The parallelization uses shared memory, thus you will only see a speed-up if your base classifier releases the Python GIL; otherwise it will result in slower runs.
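The ‘refit_buffer’ behavior described in the parameter list above (train on the new plus stored rows, fill the reserve, then overwrite randomly chosen stored rows rather than the oldest ones) can be sketched for a single arm as follows. The ‘RefitBuffer’ class and its exact replacement rule are a hypothetical simplification; the package’s actual bookkeeping, including its CSR-to-dense handling, is not reproduced.

```python
import numpy as np


class RefitBuffer:
    """Hypothetical per-arm reserve of past observations for batch training."""

    def __init__(self, size, n_features, seed=0):
        self.size = size
        self.X = np.empty((0, n_features))
        self.r = np.empty(0)
        self.rng = np.random.default_rng(seed)

    def update(self, X_new, r_new):
        """Return the enlarged training batch, then update the stored reserve."""
        X_new = np.asarray(X_new, dtype=float)
        r_new = np.asarray(r_new, dtype=float)
        # The batch passed to the model: new rows plus everything stored so far
        X_batch = np.vstack([X_new, self.X])
        r_batch = np.concatenate([r_new, self.r])

        # Fill any free slots in the reserve first
        take = min(self.size - self.r.shape[0], r_new.shape[0])
        self.X = np.vstack([self.X, X_new[:take]])
        self.r = np.concatenate([self.r, r_new[:take]])

        # Once full, overwrite randomly chosen stored rows (not FIFO)
        k = min(r_new.shape[0] - take, self.size)
        if k > 0:
            slots = self.rng.choice(self.size, size=k, replace=False)
            self.X[slots] = X_new[take:take + k]
            self.r[slots] = r_new[take:take + k]
        return X_batch, r_batch


buf = RefitBuffer(size=3, n_features=2)
Xb, rb = buf.update(np.ones((2, 2)), [1, 0])   # reserve not full yet: 2 rows stored
Xb, rb = buf.update(np.zeros((2, 2)), [0, 1])  # batch of 4 rows, reserve now full
print(Xb.shape, buf.X.shape)  # (4, 2) (3, 2)
```

Replacing random slots means some older rows can survive many updates, unlike a FIFO queue.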
References
- 1
Cortes, David. “Adapting multi-armed bandits policies to contextual bandits scenarios.” arXiv preprint arXiv:1811.04383 (2018).
- add_arm(arm_name=None, fitted_classifier=None, n_w_rew=0, n_wo_rew=0, smoothing=None, beta_prior=None, refit_buffer_X=None, refit_buffer_r=None, f_grad_norm=None, case_one_class=None)¶
Adds a new arm to the pool of choices
- Parameters
arm_name (object) – Name for this arm. Only applicable when using named arms. If None, will use the name of the last arm plus 1 (will only work when the names are integers).
fitted_classifier (object) – If a classifier has already been fit to rewards coming from this arm, you can pass it here; otherwise, the arm will be started from the same base algorithm as the initial arms. If using bootstrapped methods or methods from this module which do not accept arbitrary classifiers as input, don’t pass a classifier here (unless using classes such as ‘utils._BootstrappedClassifierBase’). If the constructor was called with a different ‘base_algorithm’ per arm, a base classifier must be passed here. Not applicable for the classes that do not take a ‘base_algorithm’.
n_w_rew (int) – Number of trials/rounds with rewards coming from this arm (only used when using a beta prior or smoothing).
n_wo_rew (int) – Number of trials/rounds without rewards coming from this arm (only used when using a beta prior or smoothing).
smoothing (None, tuple (a,b), or list) – Smoothing parameters to use for this arm (see documentation of the class constructor for details). If ‘None’ and the ‘smoothing’ passed to the constructor didn’t have separate entries per arm, will use the same ‘smoothing’ as was passed in the constructor. If no ‘smoothing’ was passed to the constructor, the ‘smoothing’ here will be ignored. Must pass a ‘smoothing’ here if the constructor was passed a ‘smoothing’ with different entries per arm.
beta_prior (None or tuple((a,b), n)) – Beta prior to use for this arm. See the class’ documentation for details. Must be passed if the constructor was provided different beta priors per arm. If ‘None’ and the constructor had a single ‘beta_prior’, will use that same ‘beta_prior’ for this new arm. Note that ‘n_w_rew’ and ‘n_wo_rew’ will be counted towards the threshold ‘n’ in here. Cannot be passed if the constructor did not have a ‘beta_prior’.
refit_buffer_X (array(m, n) or None) – Refit buffer of ‘X’ data to use for the new arm. Ignored when using ‘batch_train=False’ or ‘refit_buffer=None’.
refit_buffer_r (array(m,) or None) – Refit buffer of rewards data to use for the new arm. Ignored when using ‘batch_train=False’ or ‘refit_buffer=None’.
f_grad_norm (function) – Gradient calculation function to use for this arm. This is only for the policies that make choices according to active learning criteria, and only for situations in which the policy was passed different functions for each arm.
case_one_class (function) – Gradient workaround function for single-class data. This is only for the policies that make choices according to active learning criteria, and only for situations in which the policy was passed different functions for each arm.
- Returns
self – This object
- Return type
object
- decision_function(X, output_score=False, apply_sigmoid_score=True)¶
Get the scores for each arm following this policy’s action-choosing criteria.
- Parameters
X (array (n_samples, n_features)) – Data for which to obtain decision function scores for each arm.
- Returns
scores – Scores following this policy for each arm.
- Return type
array (n_samples, n_choices)
- drop_arm(arm_name)¶
Drop an arm/choice
Drops (removes/deletes) an arm from the set of available choices to the policy.
Note
The available arms, if named, are stored in attribute ‘choice_names’.
- Parameters
arm_name (int or object) – Arm to drop. If passing an integer, will drop at that index (starting at zero). Otherwise, will drop the arm matching this name (argument must be of the same type as the individual entries passed to ‘nchoices’ in the initialization).
- Returns
self – This object
- Return type
object
- fit(X, a, r, warm_start=False, continue_from_last=False)¶
Fits the base algorithm (one per class [and per sample if bootstrapped]) to partially labeled data.
- Parameters
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
warm_start (bool) – Whether to use the results of previous calls to ‘fit’ as a start for fitting to the ‘X’ data passed here. This will only be available if the base classifier has a property ‘warm_start’ too and that property is also set to ‘True’. You can double-check that it’s recognized as such by checking this object’s property ‘has_warm_start’. Passing ‘True’ when the classifier doesn’t support warm start despite having the property might slow things down. Dropping arms will make this functionality unavailable. This option is not available for ‘BootstrappedUCB’, nor for ‘BootstrappedTS’.
continue_from_last (bool) – If the policy was previously fit to data, whether to assume that this new call to ‘fit’ will continue from the exact same dataset as before plus new rows appended at the end of ‘X’, ‘a’, ‘r’. In this case, will only refit the models that have new data according to ‘a’. Note that the bootstrapped policies will still benefit from extra refits. This option should not be used when there are calls to ‘partial_fit’ between calls to ‘fit’. Ignored if using ‘assume_unique_reward=True’.
- Returns
self – This object
- Return type
obj
- partial_fit(X, a, r)¶
Fits the base algorithm (one per class) to partially labeled data in batches.
Note
In order to use this method, the base classifier must have a ‘partial_fit’ method, such as ‘sklearn.linear_model.SGDClassifier’. This method is not available for ‘LogisticUCB’, ‘LogisticTS’, ‘PartitionedUCB’, ‘PartitionedTS’.
- Parameters
X (array(n_samples, n_features) or CSR(n_samples, n_features)) – Matrix of covariates for the available data.
a (array(n_samples, ), int type) – Arms or actions that were chosen for each observation.
r (array(n_samples, ), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
- Returns
self – This object
- Return type
obj
- predict(X, exploit=False, output_score=False)¶
Selects actions according to this policy for new data.
- Parameters
X (array (n_samples, n_features)) – New observations for which to choose an action according to this policy.
exploit (bool) – Whether to make a prediction according to the policy, or to just choose the arm with the highest expected reward according to current models.
output_score (bool) – Whether to output the score that this method predicted, in case it is desired to use it with this package’s ‘offpolicy’ and ‘evaluation’ modules.
- Returns
pred – Actions chosen by the policy. If passing output_score=True, it will be a dictionary with the chosen arm and the score that the arm got following this policy with the classifiers used.
- Return type
array (n_samples,) or dict(“choice” : array(n_samples,), “score” : array(n_samples,))
- reset_multiplier(multiplier=1.0)¶
Set the multiplier to a custom number
- Parameters
multiplier (float) – New multiplier for the numbers going to the softmax function. Note that the inflation rate will still be applied after this parameter is reset.
- Returns
self – This object
- Return type
obj
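The ‘inflation_rate’ schedule, multiplier_t = multiplier * inflation_rate^t, and its interaction with ‘reset_multiplier’ can be sketched with a small tracker class. Whether ‘t’ counts calls to ‘predict’ or individual rows is not stated here, so this hypothetical ‘MultiplierSchedule’ simply inflates once per ‘step’ call.

```python
class MultiplierSchedule:
    """Hypothetical tracker for multiplier_t = multiplier * inflation_rate**t."""

    def __init__(self, multiplier=1.0, inflation_rate=1.0004):
        self.multiplier = multiplier
        self.inflation_rate = inflation_rate

    def step(self):
        """Return the multiplier for the current prediction, then inflate it."""
        current = self.multiplier
        self.multiplier *= self.inflation_rate
        return current

    def reset(self, multiplier=1.0):
        """Like reset_multiplier: set a new base value; inflation keeps applying."""
        self.multiplier = multiplier


sched = MultiplierSchedule(multiplier=1.0, inflation_rate=2.0)
print([sched.step() for _ in range(3)])  # [1.0, 2.0, 4.0]
sched.reset(0.5)
print(sched.step(), sched.multiplier)  # 0.5 1.0
```

With the default inflation_rate=1.0004, the multiplier grows by a factor of roughly 55 over 10,000 inflation steps, so the softmax selection becomes increasingly greedy over time.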
- topN(X, n)¶
Get top-N ranked actions for each observation
Note
This method will rank choices/arms according to what the policy dictates - it is not an exploitation-mode rank, so if e.g. there are random choices for some observations, there will be random ranks in here.
- Parameters
X (array (n_samples, n_features)) – New observations for which to rank actions according to this policy.
n (int) – Number of top-ranked actions to output
- Returns
topN – The top-ranked actions for each observation
- Return type
array(n_samples, n)
Off-policy learning¶
Hint: if in doubt, use OffsetTree or SeparateClassifiers (the latter is from the online module).
DoublyRobustEstimator¶
- class contextualbandits.offpolicy.DoublyRobustEstimator(base_algorithm, reward_estimator, nchoices, method='rovr', handle_invalid=True, random_state=1, c=None, pmin=1e-05, beta_prior=None, smoothing=(1.0, 2.0), njobs=-1, **kwargs_costsens)¶
Doubly-Robust Estimator
Estimates the expected reward for each arm, applies a correction for the actions that were chosen, and converts the problem to cost-sensitive classification, on which the base algorithm is then fit.
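The correction step can be sketched with the doubly-robust formula from Dudík et al.: for each observation with logged arm \(a\), reward \(r\) and score \(p\), the estimate for arm \(a'\) is \(\hat{r}(x,a') + \mathbb{1}[a'=a]\,(r-\hat{r}(x,a))/p\). The helpers below are hypothetical illustrations, not the package's internals; turning rewards into costs by negation is an assumption (any constant shift leaves the cheapest arm unchanged).

```python
import numpy as np

def doubly_robust_rewards(r_hat, a, r, p):
    """Build a doubly-robust reward matrix (hypothetical helper).

    r_hat : (n_samples, n_choices) estimated rewards for every arm
    a     : (n_samples,) arms chosen by the logging policy
    r     : (n_samples,) observed rewards for those arms
    p     : (n_samples,) scores/probabilities of the chosen arms
    """
    r_hat = np.asarray(r_hat, dtype=float)
    a = np.asarray(a)
    rows = np.arange(r_hat.shape[0])
    dr = r_hat.copy()
    # Only the logged arm receives the importance-weighted residual correction.
    dr[rows, a] += (np.asarray(r, dtype=float) - r_hat[rows, a]) / np.asarray(p, dtype=float)
    return dr

def rewards_to_costs(rewards):
    # Assumed conversion: higher reward means lower cost.
    return -rewards
```

The resulting cost matrix is what a cost-sensitive learner (Weighted All-Pairs or Regression One-Vs-Rest) would then be fit on.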
Note
This technique converts the problem into a cost-sensitive classification problem by calculating a matrix of expected rewards and turning it into costs. The base algorithm is then fit to this data, using either the Weighted All-Pairs approach, which requires a binary classifier with sample weights as base algorithm, or the Regression One-Vs-Rest approach, which requires a regressor as base algorithm.
In the Weighted All-Pairs approach, this technique will fail if there are actions that were never taken by the exploration policy, as it cannot construct a model for them.
The expected rewards are estimated with the imputer algorithm passed here, which should output a number in the range \([0,1]\).
This technique is meant for the case of continuous rewards in the \([0,1]\) interval, but here it is used for the case of discrete rewards \(\{0,1\}\), under which it performs poorly. It is not recommended for use, and is provided only for comparison purposes.
Also important: this method requires reward estimates for all arms for each observation. These can either be provided as an array (see Parameters), or obtained from a model.
One way to obtain reward estimates is to fit a model to the data and use its predictions as reward estimates. You can do so by passing an already-fitted object of class contextualbandits.online.SeparateClassifiers, or by passing a classifier with a ‘predict_proba’ method, which will be put into a ‘SeparateClassifiers’ object and fit to the same data passed to this function to obtain reward estimates.
The estimates can be invalid if there are arms for which every time they were chosen they resulted in a reward, or never resulted in a reward. In such cases, this function includes the option to impute the “predictions” for them (which would otherwise always be exactly zero or one regardless of the context) by replacing them with random numbers \(\sim \text{Beta}(3,1)\) or \(\sim \text{Beta}(1,3)\) for the always-good and always-bad cases, respectively.
This is just a wild idea, though, and doesn’t guarantee reasonable results in such situations.
Note that, if you are using the ‘SeparateClassifiers’ class from the online module in this same package, it comes with a method ‘predict_proba_separate’ that can be used to get reward estimates. It still can suffer from the same problem of always-one and always-zero predictions though.
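The imputation idea for always-good and always-bad arms can be sketched as follows. This is a minimal illustration assuming the check is made on the logged rewards of each arm; the helper name is hypothetical, and arms that were never chosen are simply left untouched here, which is an assumption rather than documented behavior.

```python
import numpy as np

def impute_invalid_estimates(r_hat, a, r, rng=None):
    """Replace estimates for arms whose logged rewards were all 1 or all 0.

    Hypothetical sketch: arms that always got a reward receive draws from
    Beta(3, 1); arms that never did receive draws from Beta(1, 3).
    """
    rng = np.random.default_rng(rng)
    r_hat = np.array(r_hat, dtype=float)  # copy so the input is not modified
    a = np.asarray(a)
    r = np.asarray(r)
    for arm in range(r_hat.shape[1]):
        rewards_arm = r[a == arm]
        if rewards_arm.size == 0:
            continue  # never chosen: nothing to judge from (assumption)
        if np.all(rewards_arm == 1):
            r_hat[:, arm] = rng.beta(3.0, 1.0, size=r_hat.shape[0])
        elif np.all(rewards_arm == 0):
            r_hat[:, arm] = rng.beta(1.0, 3.0, size=r_hat.shape[0])
    return r_hat
```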
- Parameters
base_algorithm (obj) – Base algorithm to be used for cost-sensitive classification.
reward_estimator (obj or array (n_samples, n_choices)) –
- One of the following:
An array with the reward estimates for every arm (one column per arm) for each observation (see Note for details).
An already-fit object of class ‘contextualbandits.online.SeparateClassifiers’, which will be used to make predictions on the actions chosen and the actions that the new policy would choose.
A classifier with a ‘predict_proba’ method, which will be fit to the same data passed to ‘fit’ in order to obtain reward estimates (see Note for details).
nchoices (int) – Number of arms/labels to choose from. Only used when passing a classifier object to ‘reward_estimator’.
method (str, either ‘rovr’ or ‘wap’) – Whether to use Regression One-Vs-Rest or Weighted All-Pairs (see Note).
handle_invalid (bool) – Whether to replace 0/1 estimated rewards with randomly-generated numbers (see Note).
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a Generator object for random number generation, a RandomState object (from NumPy) from which to draw an integer, or a Generator object (from NumPy), which will be used directly. This is used when passing handle_invalid=True or beta_prior != None.
c (None or float) – Constant by which to multiply all scores from the exploration policy.
pmin (None or float) – Scores (from the exploration policy) will be converted to the maximum between pmin and the original estimate, i.e. pmin acts as a lower bound on the scores.
beta_prior (tuple((a, b), n), str “auto”, or None) – Beta prior to pass to ‘SeparateClassifiers’. Only used when passing to ‘reward_estimator’ a classifier with ‘predict_proba’. See the documentation of ‘SeparateClassifiers’ for details about it.
smoothing (tuple(a, b), list, or None) – Smoothing parameter to pass to SeparateClassifiers. Only used when passing to ‘reward_estimator’ a classifier with ‘predict_proba’. See the documentation of SeparateClassifiers for details.
njobs (int or None) – Number of parallel jobs to run. If passing None will set it to 1. If passing -1 will set it to the number of CPU cores.
kwargs_costsens – Additional keyword arguments to pass to the cost-sensitive classifier.
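The c and pmin parameters (shared with OffsetTree and evaluateDoublyRobust) preprocess the logged scores before they are used as importance weights. The sketch below assumes c rescales every score and pmin acts as a floor, which is how a minimum-probability threshold keeps 1/p bounded; the helper name is hypothetical.

```python
import numpy as np

def preprocess_scores(p, c=None, pmin=1e-5):
    """Hypothetical sketch of applying `c` and `pmin` to logged scores."""
    p = np.asarray(p, dtype=float)
    if c is not None:
        p = c * p  # rescale every score by the same constant
    if pmin is not None:
        # Floor very small scores so that the weights 1/p stay bounded.
        p = np.maximum(p, pmin)
    return p
```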
References
- 1
Dudík, Miroslav, John Langford, and Lihong Li. “Doubly robust policy evaluation and learning.” arXiv preprint arXiv:1103.4601 (2011).
- 2
Dudík, Miroslav, et al. “Doubly robust policy evaluation and optimization.” Statistical Science 29.4 (2014): 485-511.
- decision_function(X)¶
Get score distribution for the arm’s rewards
Note
For details on how this is calculated, see the documentation of the RegressionOneVsRest and WeightedAllPairs classes in the costsensitive package.
- Parameters
X (array (n_samples, n_features)) – New observations for which to evaluate actions.
- Returns
pred – Score assigned to each arm for each observation (see Note).
- Return type
array (n_samples, n_choices)
- fit(X, a, r, p)¶
Fits the Doubly-Robust estimator to partially-labeled data collected from a different policy.
- Parameters
X (array (n_samples, n_features)) – Matrix of covariates for the available data.
a (array (n_samples), int type) – Arms or actions that were chosen for each observation.
r (array (n_samples), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
p (array (n_samples)) – Reward estimates for the actions that were chosen by the policy.
- predict(X)¶
Predict best arm for new data.
- Parameters
X (array (n_samples, n_features)) – New observations for which to choose an action.
- Returns
pred – Actions chosen by this technique.
- Return type
array (n_samples,)
OffsetTree¶
- class contextualbandits.offpolicy.OffsetTree(base_algorithm, nchoices, c=None, pmin=1e-05, random_state=1, njobs=-1)¶
Offset Tree
- Parameters
base_algorithm (obj) – Binary classifier to be used for each classification sub-problem in the tree.
nchoices (int) – Number of arms/labels to choose from.
c (None or float) – Constant by which to multiply all scores from the exploration policy.
pmin (None or float) – Scores (from the exploration policy) will be converted to the maximum between pmin and the original estimate, i.e. pmin acts as a lower bound on the scores.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a Generator object for random number generation, a RandomState object (from NumPy) from which to draw an integer, or a Generator object (from NumPy), which will be used directly. This is used when predictions need to be done for an arm with no data.
njobs (int or None) – Number of parallel jobs to run. If passing None will set it to 1. If passing -1 will set it to the number of CPU cores. Note that if the base algorithm is itself parallelized, this might result in a slowdown as both compete for available threads, so don’t set parallelization in both.
References
- 1
Beygelzimer, Alina, and John Langford. “The offset tree for learning with partial labels.” Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2009.
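For two arms, the offset tree reduces to a single importance-weighted binary classification problem. The sketch below follows the binary "offset" reduction from the cited paper with an offset of 1/2, assuming rewards in [0, 1]; the helper name is hypothetical, and the full tree that combines such nodes for more than two arms is omitted.

```python
import numpy as np

def binary_offset_examples(a, r, p, offset=0.5):
    """Turn logged 2-arm bandit data into weighted binary classification data.

    a : (n,) chosen arm, 0 or 1
    r : (n,) observed reward in [0, 1]
    p : (n,) probability/score with which the logged arm was chosen
    Returns (labels, weights) for a binary classifier with sample weights.
    """
    a = np.asarray(a)
    r = np.asarray(r, dtype=float)
    p = np.asarray(p, dtype=float)
    # Rewards above the offset argue for the chosen arm, below it for the other one.
    labels = np.where(r >= offset, a, 1 - a)
    # Importance weight: distance from the offset, corrected by the logging score.
    weights = np.abs(r - offset) / p
    return labels, weights
```

The labels and weights could then be passed to any binary classifier that accepts sample weights, such as scikit-learn's LogisticRegression via fit(X, labels, sample_weight=weights).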
- fit(X, a, r, p)¶
Fits the Offset Tree estimator to partially-labeled data collected from a different policy.
- Parameters
X (array (n_samples, n_features)) – Matrix of covariates for the available data.
a (array (n_samples), int type) – Arms or actions that were chosen for each observation.
r (array (n_samples), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
p (array (n_samples)) – Reward estimates for the actions that were chosen by the policy.
- predict(X)¶
Predict best arm for new data.
Note
While, in theory, making predictions with this algorithm should be faster than with others, the implementation here uses a Python loop for each observation, which is slow compared to NumPy array lookups, so predictions will be slower to calculate than those from other algorithms.
- Parameters
X (array (n_samples, n_features)) – New observations for which to choose an action.
- Returns
pred – Actions chosen by this technique.
- Return type
array (n_samples,)
Policy Evaluation¶
evaluateRejectionSampling¶
- contextualbandits.evaluation.evaluateRejectionSampling(policy, X, a, r, online=True, partial_fit=False, start_point_online='random', random_state=1, batch_size=10)¶
Evaluate a policy using rejection sampling on test data.
Note
In order for this method to be unbiased, the actions on the test sample must have been collected at random and not according to some other policy.
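For the non-refitting case (online=False), rejection sampling keeps only the rows where the evaluated policy agrees with the logged action and averages their rewards. The sketch below assumes the policy's choices were already computed; the helper name is hypothetical, and the online variant, which also refits the policy on accepted rows batch by batch, is not shown.

```python
import numpy as np

def rejection_sampling_estimate(chosen_by_policy, a, r):
    """Offline rejection-sampling estimate of a policy's mean reward (sketch).

    chosen_by_policy : (n,) arms the evaluated policy picks for each row
    a, r             : (n,) logged arms (chosen uniformly at random) and rewards
    Returns (mean reward over accepted rows, number of accepted rows).
    """
    chosen_by_policy = np.asarray(chosen_by_policy)
    a = np.asarray(a)
    r = np.asarray(r, dtype=float)
    # Only rows where the policy matches the logged action reveal its reward.
    accepted = chosen_by_policy == a
    n_accepted = int(accepted.sum())
    if n_accepted == 0:
        return float("nan"), 0
    return float(r[accepted].mean()), n_accepted
```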
- Parameters
policy (obj) – Policy to be evaluated (already fitted to data). Must have a ‘predict’ method. If it is an online policy, it must also have a ‘fit’ method.
X (array (n_samples, n_features)) – Matrix of covariates for the available data.
a (array (n_samples), int type) – Arms or actions that were chosen for each observation.
r (array (n_samples), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
online (bool) – Whether this is an online policy to be evaluated by refitting it to the data as it makes choices on it.
partial_fit (bool) – Whether to use ‘partial_fit’ when fitting the policy to more data. Ignored if passing online=False.
start_point_online (either str ‘random’ or int in [0, n_samples-1]) – Point at which to start evaluating cases in the sample. Only used when passing online=True.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a Generator object for random number generation, a RandomState object (from NumPy) from which to draw an integer, or a Generator object (from NumPy), which will be used directly. This is only used when passing start_point_online='random'.
batch_size (int) – Size of batches of data to take for making predictions and adding observations to the history. Note that usually most of the samples are rejected, thus the actual size of the batches to which the models are refit is usually smaller than this number. Only used when passing online=True.
- Returns
result – Estimated mean reward and number of observations taken.
- Return type
tuple (float, int)
References
- 1
Li, Lihong, et al. “A contextual-bandit approach to personalized news article recommendation.” Proceedings of the 19th international conference on World wide web. ACM, 2010.
evaluateDoublyRobust¶
- contextualbandits.evaluation.evaluateDoublyRobust(pred, X, a, r, p, reward_estimator, nchoices=None, handle_invalid=True, c=None, pmin=1e-05, random_state=1)¶
Doubly-Robust Policy Evaluation
Evaluates rewards of arm choices of a policy from data collected by another policy, using a reward estimator along with the historical probabilities (hence the name).
Note
This method requires reward estimates for the arms that were chosen and for the arms that the policy to be evaluated would choose. These can either be provided as an array (see Parameters), or obtained from a model.
One method to obtain reward estimates is to fit a model to both the training and test data and use its predictions as reward estimates. You can do so by passing an object of class contextualbandits.online.SeparateClassifiers which should be already fitted.
Another method is to fit a model to the test data, in which case you can pass a classifier with a ‘predict_proba’ method here, which will be fit to the same test data passed to this function to obtain reward estimates.
The last two options can suffer from invalid predictions if there are some arms for which every time they were chosen they resulted in a reward, or never resulted in a reward. In such cases, this function includes the option to impute the “predictions” for them (which would otherwise always be exactly zero or one regardless of the context) by replacing them with random numbers ~Beta(3,1) or ~Beta(1,3) for the cases of always good and always bad.
This is just a wild idea, though, and doesn’t guarantee reasonable results in such situations.
Note that, if you are using the ‘SeparateClassifiers’ class from the online module in this same package, it comes with a method ‘predict_proba_separate’ that can be used to get reward estimates. It still can suffer from the same problem of always-one and always-zero predictions though.
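The estimate itself can be sketched as the average, over rows, of the model-based reward for the new policy's arm plus an importance-weighted residual on the rows where the new policy matches the logged arm. The two estimate arrays below correspond to the two columns of the array form of reward_estimator; the function name is hypothetical.

```python
import numpy as np

def doubly_robust_value(pred, a, r, p, r_hat_pred, r_hat_logged):
    """Doubly-robust estimate of the mean reward of a new policy (sketch).

    pred         : (n,) arms the new policy would choose
    a, r, p      : (n,) logged arms, observed rewards, logging scores
    r_hat_pred   : (n,) estimated rewards for the arms in `pred`
    r_hat_logged : (n,) estimated rewards for the logged arms `a`
    """
    pred, a = np.asarray(pred), np.asarray(a)
    r = np.asarray(r, dtype=float)
    p = np.asarray(p, dtype=float)
    # Model-based term for every row...
    value = np.asarray(r_hat_pred, dtype=float).copy()
    # ...plus an importance-weighted residual where the new policy matches the log.
    match = pred == a
    value[match] += (r[match] - np.asarray(r_hat_logged, dtype=float)[match]) / p[match]
    return float(value.mean())
```

If the reward model is accurate, the residual term is close to zero; if the logging scores are accurate, the residual corrects the model's bias, hence "doubly" robust.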
- Parameters
pred (array (n_samples,)) – Arms that would be chosen by the policy to evaluate.
X (array (n_samples, n_features)) – Matrix of covariates for the available data.
a (array (n_samples), int type) – Arms or actions that were chosen for each observation.
r (array (n_samples), {0,1}) – Rewards that were observed for the chosen actions. Must be binary rewards 0/1.
p (array (n_samples)) – Scores or reward estimates from the policy that generated the data for the actions that were chosen by it.
reward_estimator (obj or array (n_samples, 2)) –
- One of the following:
An array with the first column corresponding to the reward estimates for the action chosen by the new policy, and the second column corresponding to the reward estimates for the action chosen in the data (see Note for details).
An already-fit object of class ‘contextualbandits.online.SeparateClassifiers’, which will be used to make predictions on the actions chosen and the actions that the new policy would choose.
A classifier with a ‘predict_proba’ method, which will be fit to the same test data passed here in order to obtain reward estimates (see Note for details).
nchoices (int) – Number of arms/labels to choose from. Only used when passing a classifier object to ‘reward_estimator’.
handle_invalid (bool) – Whether to replace 0/1 estimated rewards with randomly-generated numbers (see Note)
c (None or float) – Constant by which to multiply all scores from the exploration policy.
pmin (None or float) – Scores (from the exploration policy) will be converted to the maximum between pmin and the original estimate, i.e. pmin acts as a lower bound on the scores.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a Generator object for random number generation, a RandomState object (from NumPy) from which to draw an integer, or a Generator object (from NumPy), which will be used directly.
- Returns
est – The estimated mean reward that the new policy would obtain on the ‘X’ data.
- Return type
float
References
- 1
Dudík, Miroslav, John Langford, and Lihong Li. “Doubly robust policy evaluation and learning.” arXiv preprint arXiv:1103.4601 (2011).
evaluateFullyLabeled¶
- contextualbandits.evaluation.evaluateFullyLabeled(policy, X, y_onehot, online=False, shuffle=True, update_freq=50, random_state=1)¶
Evaluates a policy on fully-labeled data
- Parameters
policy (obj) – Policy to be evaluated (already fitted to data). Must have a ‘predict’ method. If it is an online policy, it must also have a ‘fit’ method.
X (array (n_samples, n_features)) – Covariates for each observation.
y_onehot (array (n_samples, n_arms)) – Labels (zero or one) for each class for each observation.
online (bool) – Whether the algorithm should be fit to batches of data with a ‘partial_fit’ method, or to all historical data each time.
shuffle (bool) – Whether to shuffle the data (X and y_onehot) before passing through it. Be aware that the data is shuffled in-place.
update_freq (int) – Batch size - how many observations to predict before refitting the model.
random_state (int, None, RandomState, or Generator) – Either an integer which will be used as seed for initializing a Generator object for random number generation, a RandomState object (from NumPy) from which to draw an integer, or a Generator object (from NumPy), which will be used directly. This is used when shuffling and when selecting actions at random for the first batch.
- Returns
mean_rew – Mean reward obtained at each batch.
- Return type
array (n_samples,)
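With fully-labeled data, the reward of any arm the policy picks is known, so evaluation can be simulated batch by batch. The sketch below covers the online=False behavior (refit on all history after each batch) with random actions for the first batch; shuffling is omitted. `GreedyMeanPolicy` and `evaluate_fully_labeled` are hypothetical toy names assuming a policy interface of fit(X, a, r) and predict(X).

```python
import numpy as np

class GreedyMeanPolicy:
    """Hypothetical toy policy: picks the arm with the highest observed mean reward."""

    def __init__(self, n_arms):
        self.means = np.zeros(n_arms)

    def fit(self, X, a, r):
        for arm in range(self.means.size):
            mask = a == arm
            self.means[arm] = r[mask].mean() if mask.any() else 0.0
        return self

    def predict(self, X):
        return np.full(X.shape[0], int(np.argmax(self.means)))

def evaluate_fully_labeled(policy, X, y_onehot, update_freq=50, seed=1):
    """Sketch of evaluation on fully-labeled data, refitting on all history."""
    rng = np.random.default_rng(seed)
    n, n_arms = y_onehot.shape
    actions, rewards, batch_means = [], [], []
    for start in range(0, n, update_freq):
        Xb = X[start:start + update_freq]
        m = Xb.shape[0]
        if start == 0:
            # No history yet: choose actions at random for the first batch.
            ab = rng.integers(n_arms, size=m)
        else:
            ab = policy.predict(Xb)
        # Full labels reveal the reward of whichever arm was chosen.
        rb = y_onehot[start:start + m][np.arange(m), ab]
        actions.append(ab)
        rewards.append(rb)
        batch_means.append(rb.mean())
        # Refit on all data seen so far.
        policy.fit(X[:start + m], np.concatenate(actions), np.concatenate(rewards))
    return np.array(batch_means)
```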
evaluateNCIS¶
- contextualbandits.evaluation.evaluateNCIS(est, r, p, cmin=1e-08, cmax=1000.0)¶
Normalized Capped Importance Sampling
Evaluates rewards of arm choices of a policy from data collected by another policy, making corrections according to the difference between the estimations of the new and old policy over the actions that were chosen.
Note
This implementation is theoretically incorrect, as this whole library doesn’t follow the paradigm of producing probabilities of choosing actions (it is theoretically possible for many of the methods in the online section, but computationally inefficient and not supported by the library). Instead, it uses estimated expected rewards (that is, the rows of the estimations don’t sum to 1), which is not what this method expects. Nevertheless, the ratio of these estimations between the old and new policy should be highly related to the ratio of the probabilities of choosing those actions, and as such, this function is likely to still produce an improvement over a naive average of the expected rewards across actions that were chosen by a different policy.
Note
Unlike the other functions in this module, this function doesn’t take the indices of the chosen actions, but rather takes the predictions directly (see the ‘Parameters’ section for details).
- Parameters
est (array (n_samples,)) – Scores or reward estimates from the policy being evaluated on the actions that were chosen by the old policy for each row of ‘X’.
r (array (n_samples), {0,1}) – Rewards that were observed for the chosen actions.
p (array (n_samples)) – Scores or reward estimates from the policy that generated the data for the actions that were chosen by it. Must be in the same scale as ‘est’.
cmin (float) – Minimum value for the ratio between estimations to assign to observations. If any ratio is below this number, it will be assigned this value (i.e. will be clipped).
cmax (float) – Maximum value of the ratio between estimations that will be taken. Observations with ratios higher than this will be discarded rather than clipped.
- Returns
est – The estimated mean reward that the new policy would obtain on the ‘X’ data.
- Return type
float
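Following the parameter descriptions above, the estimator can be sketched as: form the ratio est/p, discard rows whose ratio exceeds cmax, clip the remaining ratios from below at cmin, and normalize by the total weight instead of the row count. The function name is hypothetical.

```python
import numpy as np

def ncis_estimate(est, r, p, cmin=1e-8, cmax=1000.0):
    """Normalized capped importance sampling, as described above (sketch)."""
    est = np.asarray(est, dtype=float)
    r = np.asarray(r, dtype=float)
    p = np.asarray(p, dtype=float)
    w = est / p
    # Ratios above cmax are discarded, not clipped.
    keep = w <= cmax
    if not keep.any():
        return float("nan")
    # Ratios below cmin are clipped up to cmin.
    w = np.maximum(w[keep], cmin)
    # Dividing by the sum of weights (rather than n) is the "normalized" part.
    return float(np.sum(w * r[keep]) / np.sum(w))
```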
References
- 1
Gilotte, Alexandre, et al. “Offline a/b testing for recommender systems.” Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. 2018.
Linear Regression¶
The package offers non-stochastic linear regression procedures with exact “partial_fit” solutions, which are recommended to use alongside the online policies for better incremental updates.
Linear Regression¶
- class contextualbandits.linreg.LinearRegression(lambda_=1.0, fit_intercept=True, method='sm', calc_inv=True, precompute_ts=False, precompute_ts_multiplier=1.0, n_presampled=None, rng_presample=None, use_float=True)¶
Linear Regression
Typical Linear Regression model, which keeps track of the aggregated data needed to obtain the closed-form solution in a way that calling ‘partial_fit’ multiple times would be equivalent to a single call to ‘fit’ with all the data. This is an exact method rather than a stochastic optimization procedure.
Also provides functionality for making predictions according to upper confidence bound (UCB) and to Thompson sampling criteria.
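The reason repeated calls to ‘partial_fit’ match a single ‘fit’ is that the ridge solution depends on the data only through \(X^T X\) and \(X^T y\), which are sums over rows. The sketch below assumes no intercept and uses numpy.linalg.solve instead of the package's Cholesky routine; the class name is hypothetical.

```python
import numpy as np

class RidgeSufficientStats:
    """Hypothetical sketch: exact incremental ridge regression via X^T X and X^T y."""

    def __init__(self, n_features, lambda_=1.0):
        self.XtX = lambda_ * np.eye(n_features)  # regularization added once
        self.Xty = np.zeros(n_features)

    def partial_fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Accumulating these sums is all the closed-form solution needs.
        self.XtX += X.T @ X
        self.Xty += X.T @ y
        self.coef_ = np.linalg.solve(self.XtX, self.Xty)
        return self
```

Fitting on 30 rows at once or in chunks of 7 yields the same coefficients up to floating-point error.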
Note
Doing linear regression this way requires both memory and computation time which scale quadratically with the number of columns/features/variables. As such, the class will by default use C ‘float’ types (typically np.float32) instead of C ‘double’ (np.float64), in order to save memory.
- Parameters
lambda_ (float) – Strength of the L2 regularization.
fit_intercept (bool) – Whether to add an intercept term to the formula. If passing ‘True’, it will be the last entry in the coefficients.
method (str, one of ‘chol’ or ‘sm’) – Method used to fit the model. Options are:
'chol': Uses the Cholesky decomposition to solve the linear system from the least-squares closed-form each time ‘fit’ or ‘partial_fit’ is called. This is likely to be faster when fitting the model to a large number of observations at once, and is able to better exploit multi-threading.
'sm': Starts with an inverse diagonal matrix and updates it as each new observation comes using the Sherman-Morrison formula, thus never explicitly solving the linear system, nor needing to calculate a matrix inverse. This is likely to be faster when fitting the model to small batches of observations. Be aware that with this method, it will add regularization to the intercept if passing ‘fit_intercept=True’.
Note that it is possible to change the method after the object has already been fit (e.g. if you want a non-regularized intercept with fast online updates, you might use Cholesky first and then switch to Sherman-Morrison).
calc_inv (bool) – When using method='chol', whether to also produce a matrix inverse, which is required for using the LinUCB prediction mode. Ignored when passing method='sm' (the default). Note that it is possible to change the method after the object has already been fit.
precompute_ts (bool) – Whether to pre-compute the necessary matrices to accelerate the Thompson sampling prediction mode (method predict_thompson). If you plan to use predict_thompson, it’s recommended to pass “True”. Note that this will make the Sherman-Morrison updates (method="sm") much slower as it will calculate eigenvalues after every update. Can be changed after the object is already initialized or fitted.
precompute_ts_multiplier (float) – Multiplier for the covariance matrix to use when using precompute_ts. Calling predict_thompson with this same multiplier will be faster than with a different one. Calling it with a different multiplier with precompute_ts will still be faster than without it, unless also using n_presampled. Ignored when passing precompute_ts=False.
n_presampled (None or int) – When passing precompute_ts, this denotes a number of coefficients to pre-sample after calling ‘fit’ and/or ‘partial_fit’, which will be used later when calling predict_thompson with the same multiplier as in precompute_ts_multiplier. Pre-sampling a large number of coefficients can help to speed up Thompson-sampled predictions at the expense of longer fitting times, and is recommended if there is a large number of predictions between calls to ‘fit’ or ‘partial_fit’. If passing ‘None’ (the default), will not pre-sample a finite number of the coefficients at fitting time, but will rather sample (different) coefficients in calls to predict_thompson. The pre-sampled coefficients will not be used if calling predict_thompson with a different multiplier than what was passed to precompute_ts_multiplier.
rng_presample (None, int, RandomState, or Generator) – Random number generator to use for pre-sampling coefficients. If passing an integer, will use it as a random seed for initialization. If passing a RandomState, will use it to draw an integer to use as seed. If passing a Generator, will use it directly. If passing ‘None’, will initialize a Generator without random seed. Ignored if passing precompute_ts=False or n_presampled=None (the defaults).
use_float (bool) – Whether to use C ‘float’ type for the required matrices. If passing ‘False’, will use C ‘double’. Be aware that memory usage for this model can grow very large. Can be changed after initialization.
- Variables
coef (array(n) or array(n+1)) – The obtained coefficients. If passing ‘fit_intercept=True’, the intercept will be at the last entry.
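The 'sm' method can be sketched with the Sherman-Morrison identity \((A + w\,xx^T)^{-1} = A^{-1} - w\,A^{-1}xx^TA^{-1}/(1 + w\,x^TA^{-1}x)\), starting from \(A^{-1} = I/\lambda\). This is a minimal illustration without an intercept (appending a column of ones would show why the intercept also gets regularized); the class and helper names are hypothetical.

```python
import numpy as np

def sherman_morrison_update(A_inv, x, w=1.0):
    """Return (A + w x x^T)^{-1} given a symmetric A^{-1}, with no linear solve."""
    Ax = A_inv @ x
    return A_inv - w * np.outer(Ax, Ax) / (1.0 + w * x @ Ax)

class RidgeSM:
    """Hypothetical sketch of the 'sm' method: rank-one updates of the inverse."""

    def __init__(self, n_features, lambda_=1.0):
        # Start from the inverse of the diagonal regularization matrix.
        self.A_inv = np.eye(n_features) / lambda_
        self.b = np.zeros(n_features)

    def partial_fit(self, X, y, sample_weight=None):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        w = np.ones(len(y)) if sample_weight is None else np.asarray(sample_weight, float)
        for xi, yi, wi in zip(X, y, w):
            # One O(n^2) update per observation.
            self.A_inv = sherman_morrison_update(self.A_inv, xi, wi)
            self.b += wi * yi * xi
        self.coef_ = self.A_inv @ self.b
        return self
```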
- property calc_inv¶
- fit(X, y, sample_weight=None)¶
Fit model to data
Note
Calling ‘fit’ will reset whatever previous data was there. For fitting the model incrementally to new data, use ‘partial_fit’ instead.
- Parameters
X (array(m,n) or CSR matrix(m, n)) – The covariates.
y (array-like(m)) – The target variable.
sample_weight (None or array-like(m)) – Observation weights for each row.
- Return type
self
- get_params(deep=True)¶
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params – Parameter names mapped to their values.
- Return type
dict
- property method¶
- property n_presampled¶
- partial_fit(X, y, sample_weight=None, *args, **kwargs)¶
Fit model incrementally to new data
- Parameters
X (array(m,n) or CSR matrix(m, n)) – The covariates.
y (array-like(m)) – The target variable.
sample_weight (None or array-like(m)) – Observation weights for each row.
- Return type
self
- property precompute_ts¶
- property precompute_ts_multiplier¶
- predict(X)¶
Make predictions on new data
- Parameters
X (array(m,n) or CSR matrix(m, n)) – The covariates.
- Returns
y_hat – The predicted values given ‘X’.
- Return type
array(m)
- predict_thompson(X, v_sq=1.0, sample_unique=False, random_state=None)¶
Make a guess prediction on new data
Make a prediction on new data with coefficients sampled from their estimated distribution.
Note
If using this method, it’s recommended to center the ‘X’ data passed to ‘fit’ and ‘partial_fit’. If not centered, it’s advisable to lower the v_sq value.
- Parameters
X (array(m,n) or CSR matrix(m, n)) – The covariates.
v_sq (float > 0) – The multiplier for the covariance matrix. Larger values lead to more variable results.
sample_unique (bool) – Whether to sample different coefficients each time a prediction is to be made. If passing ‘False’, it will sample the same coefficients for all the observations in the same call to ‘predict_thompson’, whereas if passing ‘True’, it will use a different set of coefficients for each observation. Passing ‘False’ leads to an approach which is theoretically wrong, but as sampling coefficients can be very slow, using ‘False’ can provide a reasonable speed-up without much of a performance penalty.
random_state (None, np.random.Generator, or np.random.RandomState) – A NumPy ‘Generator’ or ‘RandomState’ object instance to use for generating random numbers. If passing ‘None’, will use NumPy’s random module directly (which can be made reproducible through np.random.seed).
- Returns
y_hat – The predicted guess on ‘y’ given ‘X’ and v_sq.
- Return type
array(m)
References
- 1
Agrawal, Shipra, and Navin Goyal. “Thompson sampling for contextual bandits with linear payoffs.” International Conference on Machine Learning. 2013.
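In the cited scheme, coefficients are drawn from a normal distribution centered at the ridge estimate with covariance \(v^2 A^{-1}\), where \(A = X^TX + \lambda I\); here v_sq plays the role of \(v^2\). The sketch below assumes the coefficients and \(A^{-1}\) are already available and mirrors the sample_unique trade-off; the function name is hypothetical.

```python
import numpy as np

def predict_thompson_sketch(X, coef, A_inv, v_sq=1.0, sample_unique=False, rng=None):
    """Predict with coefficients drawn from N(coef, v_sq * A_inv) (sketch).

    coef  : (n,) ridge coefficients
    A_inv : (n, n) inverse of X^T X + lambda * I
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    cov = v_sq * A_inv
    if sample_unique:
        # One coefficient draw per row (theoretically correct, slower).
        betas = rng.multivariate_normal(coef, cov, size=X.shape[0])
        return np.einsum("ij,ij->i", X, betas)
    # A single draw shared by all rows in this call (faster approximation).
    beta = rng.multivariate_normal(coef, cov)
    return X @ beta
```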
- predict_ucb(X, alpha=1.0, add_unfit_noise=False, random_state=None)¶
Make an upper-bound prediction on new data
Make a prediction on new data with an upper bound given by the LinUCB formula (be aware that it’s not probabilistic like a regular CI).
Note
If using this method, it’s recommended to center the ‘X’ data passed to ‘fit’ and ‘partial_fit’. If not centered, it’s advisable to lower the alpha value.
- Parameters
X (array(m,n) or CSR matrix(m, n)) – The covariates.
alpha (float > 0 or array(m, ) > 0) – The multiplier for the width of the bound. Can also pass an array with different values for each row.
add_unfit_noise (bool) – When making predictions with an unfit model (in this case they are given by empty zero matrices except for the inverse diagonal matrix based on the regularization parameter), whether to add a very small amount of random noise ~ Uniform(0, 10^-12) to it. This is useful in order to break ties at random when using multiple models.
random_state (None, np.random.Generator, or np.random.RandomState) – A NumPy ‘Generator’ or ‘RandomState’ object instance to use for generating random numbers. If passing ‘None’, will use NumPy’s random module directly (which can be made reproducible through np.random.seed). Only used when passing add_unfit_noise=True and calling this method on a model that has not been fit to data.
- Returns
y_hat – The predicted upper bound on ‘y’ given ‘X’ and alpha.
- Return type
array(m)
References
- 1
Chu, Wei, et al. “Contextual bandits with linear payoff functions.” Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics. 2011.
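The LinUCB bound for a row \(x\) is \(x^T\hat{\beta} + \alpha\sqrt{x^TA^{-1}x}\). The sketch below assumes the coefficients and \(A^{-1}\) are already available and accepts a scalar alpha or one value per row, as in the parameter description; the function name is hypothetical.

```python
import numpy as np

def predict_ucb_sketch(X, coef, A_inv, alpha=1.0):
    """LinUCB upper bound: x^T coef + alpha * sqrt(x^T A_inv x) for each row."""
    X = np.asarray(X, dtype=float)
    mean = X @ coef
    # Row-wise quadratic form x_i^T A_inv x_i without building an (m, m) matrix.
    width = np.sqrt(np.einsum("ij,jk,ik->i", X, A_inv, X))
    # alpha may be a scalar or an array with one value per row.
    return mean + np.asarray(alpha, dtype=float) * width
```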
- set_params(**params)¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
- property use_float¶
ElasticNet¶
- class contextualbandits.linreg.ElasticNet(alpha=0.1, l1_ratio=0.5, fit_intercept=True, l1=None, l2=None, use_float=True)¶
ElasticNet Regression
ElasticNet regression (with penalization on the l1 and l2 norms of the coefficients), which keeps track of the aggregated data needed to obtain the optimal coefficients in such a way that calling ‘partial_fit’ multiple times would be equivalent to a single call to ‘fit’ with all the data. This is an exact method rather than a stochastic optimization procedure.
Note
This ElasticNet regression is fit through a reduction to non-negative least squares with twice the number of variables, which is in turn solved through a coordinate descent procedure. This is typically slower than the lasso paths used in GLMNet and SciKit-Learn, and scales much worse with the number of features/columns, but allows for faster incremental updates through ‘partial_fit’, which will give the same result as calls to fit.
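The reduction can be sketched under stated assumptions: the objective is taken as \(\frac{1}{2}\|y - Xb\|^2 + l_1\|b\|_1 + \frac{l_2}{2}\|b\|^2\) with l1 and l2 given directly (as with the l1 and l2 parameters, without the default GLMNet-style row scaling), the coefficients are split as \(b = b^+ - b^-\) with both parts non-negative, and the resulting problem is solved with the sequential coordinate-wise update of Franc et al. The function name and fixed sweep count are hypothetical. Because only \(X^TX\) and \(X^Ty\) enter, a partial fit only needs to add to those two quantities.

```python
import numpy as np

def elastic_net_nnls(XtX, Xty, l1, l2, n_sweeps=500):
    """Elastic net via non-negative least squares on z = [b_plus, b_minus] (sketch)."""
    G = XtX + l2 * np.eye(XtX.shape[0])
    # Non-negative quadratic program in z: minimize 0.5 z^T H z + f^T z, z >= 0.
    H = np.block([[G, -G], [-G, G]])
    f = np.concatenate([l1 - Xty, l1 + Xty])
    z = np.zeros(H.shape[0])
    grad = f.copy()  # gradient H z + f evaluated at z = 0
    for _ in range(n_sweeps):
        for k in range(z.size):
            # Exact minimization along coordinate k, projected onto z_k >= 0.
            new_zk = max(0.0, z[k] - grad[k] / H[k, k])
            if new_zk != z[k]:
                grad += (new_zk - z[k]) * H[:, k]
                z[k] = new_zk
    n = XtX.shape[0]
    return z[:n] - z[n:]
```

With l1=0 this reduces to ridge regression, and a large enough l1 drives every coefficient to exactly zero.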
Note
This model will not standardize the input data in any way.
Note
By default, will set the l1 and l2 regularization in the same way as GLMNet and SciKit-Learn - that is, the regularizations increase along with the number of rows in the data, which means they will be different after each call to ‘fit’ or ‘partial_fit’. It is nevertheless possible to specify the l1 and l2 regularization directly, and both will remain constant that way, but be careful about the choice for such hyperparameters.
Note
Doing regression this way requires both memory and computation time which scale quadratically with the number of columns/features/variables. As such, the class will by default use C ‘float’ types (typically np.float32) instead of C ‘double’ (np.float64), in order to save memory.
- Parameters
alpha (float) – Strength of the regularization.
l1_ratio (float [0,1]) – Proportion of the regularization that will be applied to the l1 norm of the coefficients (remainder will be applied to the l2 norm). Must be a number between zero and one. If passing l1_ratio=0, it’s recommended instead to use the LinearRegression class, which uses more efficient procedures. Using higher l1 regularization is more likely to result in some of the obtained coefficients being exactly zero, which is oftentimes desirable.
fit_intercept (bool) – Whether to add an intercept term to the formula. If passing ‘True’, it will be the last entry in the coefficients.
l1 (None or float) – Strength of the l1 regularization. If passing it, will bypass the values set through alpha and l1_ratio, and will remain constant in between calls to fit and partial_fit. If passing this, should also pass l2, or otherwise will assume that it is zero.
l2 (None or float) – Strength of the l2 regularization. If passing it, will bypass the values set through alpha and l1_ratio, and will remain constant in between calls to fit and partial_fit. If passing this, should also pass l1, or otherwise will assume that it is zero.
use_float (bool) – Whether to use C ‘float’ type for the required matrices. If passing ‘False’, will use C ‘double’. Be aware that memory usage for this model can grow very large. Can be changed after initialization.
- Variables
coef (array(n) or array(n+1)) – The obtained coefficients. If passing ‘fit_intercept=True’, the intercept will be at the last entry.
References
- 1
Franc, Vojtech, Vaclav Hlavac, and Mirko Navara. “Sequential coordinate-wise algorithm for the non-negative least squares problem.” International Conference on Computer Analysis of Images and Patterns. Springer, Berlin, Heidelberg, 2005.
- fit(X, y, sample_weight=None)¶
Fit model to data
Note
Calling ‘fit’ will reset whatever previous data was there. For fitting the model incrementally to new data, use ‘partial_fit’ instead.
- Parameters
X (array(m,n) or CSR matrix(m, n)) – The covariates.
y (array-like(m)) – The target variable.
sample_weight (None or array-like(m)) – Observation weights for each row.
- Return type
self
- get_params(deep=True)¶
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params – Parameter names mapped to their values.
- Return type
dict
- partial_fit(X, y, sample_weight=None, *args, **kwargs)¶
Fit model incrementally to new data
- Parameters
X (array(m,n) or CSR matrix(m, n)) – The covariates.
y (array-like(m)) – The target variable.
sample_weight (None or array-like(m)) – Observation weights for each row.
- Return type
self
- predict(X)¶
Make predictions on new data
- Parameters
X (array(m,n) or CSR matrix(m, n)) – The covariates.
- Returns
y_hat – The predicted values given ‘X’.
- Return type
array(m)
- set_params(**params)¶
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
- property use_float¶