src.fairreckitlib.model.algorithms.surprise.surprise_predictor

This module contains the surprise predictor and creation functions.

Classes:

SurprisePredictor: predictor implementation for surprise.

Functions:

create_baseline_only_als: create BaselineOnly ALS predictor (factory creation compatible).
create_baseline_only_sgd: create BaselineOnly SGD predictor (factory creation compatible).
create_co_clustering: create CoClustering predictor (factory creation compatible).
create_knn_basic: create KNNBasic predictor (factory creation compatible).
create_knn_baseline_als: create KNNBaseline ALS predictor (factory creation compatible).
create_knn_baseline_sgd: create KNNBaseline SGD predictor (factory creation compatible).
create_knn_with_means: create KNNWithMeans predictor (factory creation compatible).
create_knn_with_zscore: create KNNWithZScore predictor (factory creation compatible).
create_nmf: create NMF predictor (factory creation compatible).
create_normal_predictor: create NormalPredictor predictor (factory creation compatible).
create_slope_one: create SlopeOne predictor (factory creation compatible).
create_svd: create SVD predictor (factory creation compatible).
create_svd_pp: create SVDpp predictor (factory creation compatible).

This program has been developed by students from the bachelor Computer Science at Utrecht University within the Software Project course. © Copyright Utrecht University (Department of Information and Computing Sciences)
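"Factory creation compatible" is not defined further on this page. A plausible reading is that every creation function shares the `(name, params, **kwargs)` signature, so a factory can store them by name and call them uniformly. The sketch below illustrates that idea only; `StubPredictor`, `create_stub_svd`, `create_stub_slope_one`, `CREATORS` and `create_predictor` are hypothetical names that are not part of this module, and the stub replaces `SurprisePredictor` so the example runs without Surprise installed.

```python
from typing import Any, Callable, Dict


class StubPredictor:
    """Hypothetical stand-in for SurprisePredictor (not the real class)."""

    def __init__(self, algo_name: str, name: str, params: Dict[str, Any], **kwargs):
        self.algo_name = algo_name
        self.name = name
        self.params = params
        self.num_threads = kwargs['num_threads']


def create_stub_svd(name: str, params: Dict[str, Any], **kwargs) -> StubPredictor:
    """Mirror the (name, params, **kwargs) signature of the creation functions."""
    return StubPredictor('SVD', name, params, **kwargs)


def create_stub_slope_one(name: str, params: Dict[str, Any], **kwargs) -> StubPredictor:
    """Same signature, different algorithm."""
    return StubPredictor('SlopeOne', name, params, **kwargs)


# Hypothetical registry: the shared signature lets creators be looked up by name.
CREATORS: Dict[str, Callable[..., StubPredictor]] = {
    'SVD': create_stub_svd,
    'SlopeOne': create_stub_slope_one,
}


def create_predictor(name: str, params: Dict[str, Any], **kwargs) -> StubPredictor:
    """Dispatch to the registered creation function for the given name."""
    return CREATORS[name](name, params, **kwargs)


predictor = create_predictor('SVD', {'factors': 100}, num_threads=1)
print(predictor.algo_name, predictor.params)
```

How the real library registers these functions is not shown here, so treat the registry above as an assumption.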

"""This module contains the surprise predictor and creation functions.

Classes:

    SurprisePredictor: predictor implementation for surprise.

Functions:

    create_baseline_only_als: create BaselineOnly ALS predictor (factory creation compatible).
    create_baseline_only_sgd: create BaselineOnly SGD predictor (factory creation compatible).
    create_co_clustering: create CoClustering predictor (factory creation compatible).
    create_knn_basic: create KNNBasic predictor (factory creation compatible).
    create_knn_baseline_als: create KNNBaseline ALS predictor (factory creation compatible).
    create_knn_baseline_sgd: create KNNBaseline SGD predictor (factory creation compatible).
    create_knn_with_means: create KNNWithMeans predictor (factory creation compatible).
    create_knn_with_zscore: create KNNWithZScore predictor (factory creation compatible).
    create_nmf: create NMF predictor (factory creation compatible).
    create_normal_predictor: create NormalPredictor predictor (factory creation compatible).
    create_slope_one: create SlopeOne predictor (factory creation compatible).
    create_svd: create SVD predictor (factory creation compatible).
    create_svd_pp: create SVDpp predictor (factory creation compatible).

This program has been developed by students from the bachelor Computer Science at
Utrecht University within the Software Project course.
© Copyright Utrecht University (Department of Information and Computing Sciences)
"""

import math
import time
from typing import Any, Dict

import surprise
from surprise.prediction_algorithms import AlgoBase
from surprise.prediction_algorithms import BaselineOnly
from surprise.prediction_algorithms import CoClustering
from surprise.prediction_algorithms import KNNBasic, KNNBaseline, KNNWithMeans, KNNWithZScore
from surprise.prediction_algorithms import NMF
from surprise.prediction_algorithms import NormalPredictor
from surprise.prediction_algorithms import SlopeOne
from surprise.prediction_algorithms import SVD, SVDpp

from ..base_predictor import Predictor


class SurprisePredictor(Predictor):
    """Predictor implementation for the Surprise package."""

    def __init__(self, algo: AlgoBase, name: str, params: Dict[str, Any], **kwargs):
        """Construct the surprise predictor.

        Args:
            algo: the surprise prediction algorithm.
            name: the name of the predictor.
            params: the parameters of the predictor.

        Keyword Args:
            num_threads(int): the max number of threads the predictor can use.
        """
        Predictor.__init__(self, name, params, kwargs['num_threads'])
        self.algo = algo

    def on_train(self, train_set: surprise.Trainset) -> None:
        """Train the algorithm on the train set.

        The predictor should be trained with a matrix that is
        compatible with the surprise package.

        Args:
            train_set: the set to train the predictor with.

        Raises:
            ArithmeticError: possibly raised by an algorithm on training.
            MemoryError: possibly raised by an algorithm on training.
            RuntimeError: possibly raised by an algorithm on training.
            TypeError: when the train set is not a surprise.Trainset.
        """
        if not isinstance(train_set, surprise.Trainset):
            raise TypeError('Expected predictor to be trained with a surprise compatible matrix')

        self.algo.fit(train_set)

    def on_predict(self, user: int, item: int) -> float:
        """Compute a prediction for the specified user and item.

        Surprise predictors clip the predicted ratings by default to the original rating scale
        that is provided during training. It is turned off to conform with the expected interface.

        Args:
            user: the user ID.
            item: the item ID.

        Raises:
            ArithmeticError: possibly raised by a predictor on testing.
            MemoryError: possibly raised by a predictor on testing.
            RuntimeError: when the predictor is not trained yet.

        Returns:
            the predicted rating.
        """
        prediction = self.algo.predict(user, item, clip=False)
        return math.nan if prediction.details['was_impossible'] else prediction.est


def create_baseline_only_als(name: str, params: Dict[str, Any], **kwargs) -> SurprisePredictor:
    """Create the BaselineOnly ALS predictor.

    Args:
        name: the name of the algorithm.
        params: containing the following name-value pairs:
            epochs(int): the number of iterations of the ALS procedure.
            reg_i(int): the regularization parameter for items.
            reg_u(int): the regularization parameter for users.

    Returns:
        the SurprisePredictor wrapper of BaselineOnly with method 'als'.
    """
    algo = BaselineOnly(
        bsl_options={
            'method': 'als',
            'reg_i': params['reg_i'],
            'reg_u': params['reg_u'],
            'n_epochs': params['epochs']
        },
        verbose=False
    )

    return SurprisePredictor(algo, name, params, **kwargs)


def create_baseline_only_sgd(name: str, params: Dict[str, Any], **kwargs) -> SurprisePredictor:
    """Create the BaselineOnly SGD predictor.

    Args:
        name: the name of the algorithm.
        params: containing the following name-value pairs:
            epochs(int): the number of iterations of the SGD procedure.
            regularization(float): the regularization parameter
                of the cost function that is optimized.
            learning_rate(float): the learning rate of SGD.

    Returns:
        the SurprisePredictor wrapper of BaselineOnly with method 'sgd'.
    """
    algo = BaselineOnly(
        bsl_options={
            'method': 'sgd',
            'reg': params['regularization'],
            'learning_rate': params['learning_rate'],
            'n_epochs': params['epochs']
        },
        verbose=False
    )

    return SurprisePredictor(algo, name, params, **kwargs)


def create_co_clustering(name: str, params: Dict[str, Any], **kwargs) -> SurprisePredictor:
    """Create the CoClustering predictor.

    Args:
        name: the name of the algorithm.
        params: containing the following name-value pairs:
            epochs(int): the number of iterations of the optimization loop.
            user_clusters(int): the number of user clusters.
            item_clusters(int): the number of item clusters.
            random_seed(int): the random seed or None for the current time as seed.

    Returns:
        the SurprisePredictor wrapper of CoClustering.
    """
    if params['random_seed'] is None:
        params['random_seed'] = int(time.time())

    algo = CoClustering(
        n_cltr_u=params['user_clusters'],
        n_cltr_i=params['item_clusters'],
        n_epochs=params['epochs'],
        random_state=params['random_seed'],
        verbose=False
    )

    return SurprisePredictor(algo, name, params, **kwargs)


def create_knn_basic(name: str, params: Dict[str, Any], **kwargs) -> SurprisePredictor:
    """Create the KNNBasic predictor.

    Args:
        name: the name of the algorithm.
        params: containing the following name-value pairs:
            max_k(int): the maximum number of neighbors to take into account for aggregation.
            min_k(int): the minimum number of neighbors to take into account for aggregation.
            user_based(bool): whether similarities will be computed between users or between
                items; this has a huge impact on performance.
            min_support(int): the minimum number of common items or users, depending on the
                user_based parameter.
            similarity(str): the name of the similarity to use ('MSD', 'cosine' or 'pearson').

    Returns:
        the SurprisePredictor wrapper of KNNBasic.
    """
    algo = KNNBasic(
        k=params['max_k'],
        min_k=params['min_k'],
        sim_options={
            'name': params['similarity'],
            'user_based': params['user_based'],
            'min_support': params['min_support']
        },
        verbose=False
    )

    return SurprisePredictor(algo, name, params, **kwargs)


def create_knn_baseline_als(name: str, params: Dict[str, Any], **kwargs) -> SurprisePredictor:
    """Create the KNNBaseline ALS predictor.

    Args:
        name: the name of the algorithm.
        params: containing the following name-value pairs:
            max_k(int): the maximum number of neighbors to take into account for aggregation.
            min_k(int): the minimum number of neighbors to take into account for aggregation.
            user_based(bool): whether similarities will be computed between users or between
                items; this has a huge impact on performance.
            min_support(int): the minimum number of common items or users, depending on the
                user_based parameter.
            shrinkage(int): shrinkage parameter to apply.
            epochs(int): the number of iterations of the ALS procedure.
            reg_i(int): the regularization parameter for items.
            reg_u(int): the regularization parameter for users.

    Returns:
        the SurprisePredictor wrapper of KNNBaseline with method 'als'.
    """
    algo = KNNBaseline(
        k=params['max_k'],
        min_k=params['min_k'],
        bsl_options={
            # Surprise reads the baseline method from the 'method' key, not 'name'.
            'method': 'als',
            'reg_i': params['reg_i'],
            'reg_u': params['reg_u'],
            'n_epochs': params['epochs']
        },
        sim_options={
            'name': 'pearson_baseline',
            'user_based': params['user_based'],
            'min_support': params['min_support'],
            'shrinkage': params['shrinkage']
        },
        verbose=False
    )

    return SurprisePredictor(algo, name, params, **kwargs)


def create_knn_baseline_sgd(name: str, params: Dict[str, Any], **kwargs) -> SurprisePredictor:
    """Create the KNNBaseline SGD predictor.

    Args:
        name: the name of the algorithm.
        params: containing the following name-value pairs:
            max_k(int): the maximum number of neighbors to take into account for aggregation.
            min_k(int): the minimum number of neighbors to take into account for aggregation.
            user_based(bool): whether similarities will be computed between users or between
                items; this has a huge impact on performance.
            min_support(int): the minimum number of common items or users, depending on the
                user_based parameter.
            shrinkage(int): shrinkage parameter to apply.
            epochs(int): the number of iterations of the SGD procedure.
            regularization(float): the regularization parameter
                of the cost function that is optimized.
            learning_rate(float): the learning rate of SGD.

    Returns:
        the SurprisePredictor wrapper of KNNBaseline with method 'sgd'.
    """
    algo = KNNBaseline(
        k=params['max_k'],
        min_k=params['min_k'],
        bsl_options={
            'method': 'sgd',
            'reg': params['regularization'],
            'learning_rate': params['learning_rate'],
            'n_epochs': params['epochs']
        },
        sim_options={
            'name': 'pearson_baseline',
            'user_based': params['user_based'],
            'min_support': params['min_support'],
            'shrinkage': params['shrinkage']
        },
        verbose=False
    )

    return SurprisePredictor(algo, name, params, **kwargs)


def create_knn_with_means(name: str, params: Dict[str, Any], **kwargs) -> SurprisePredictor:
    """Create the KNNWithMeans predictor.

    Args:
        name: the name of the algorithm.
        params: containing the following name-value pairs:
            max_k(int): the maximum number of neighbors to take into account for aggregation.
            min_k(int): the minimum number of neighbors to take into account for aggregation.
            user_based(bool): whether similarities will be computed between users or between
                items; this has a huge impact on performance.
            min_support(int): the minimum number of common items or users, depending on the
                user_based parameter.
            similarity(str): the name of the similarity to use ('MSD', 'cosine' or 'pearson').

    Returns:
        the SurprisePredictor wrapper of KNNWithMeans.
    """
    algo = KNNWithMeans(
        k=params['max_k'],
        min_k=params['min_k'],
        sim_options={
            'name': params['similarity'],
            'user_based': params['user_based'],
            'min_support': params['min_support']
        },
        verbose=False
    )

    return SurprisePredictor(algo, name, params, **kwargs)


def create_knn_with_zscore(name: str, params: Dict[str, Any], **kwargs) -> SurprisePredictor:
    """Create the KNNWithZScore predictor.

    Args:
        name: the name of the algorithm.
        params: containing the following name-value pairs:
            max_k(int): the maximum number of neighbors to take into account for aggregation.
            min_k(int): the minimum number of neighbors to take into account for aggregation.
            user_based(bool): whether similarities will be computed between users or between
                items; this has a huge impact on performance.
            min_support(int): the minimum number of common items or users, depending on the
                user_based parameter.
            similarity(str): the name of the similarity to use ('MSD', 'cosine' or 'pearson').

    Returns:
        the SurprisePredictor wrapper of KNNWithZScore.
    """
    algo = KNNWithZScore(
        k=params['max_k'],
        min_k=params['min_k'],
        sim_options={
            'name': params['similarity'],
            'user_based': params['user_based'],
            'min_support': params['min_support']
        },
        verbose=False
    )

    return SurprisePredictor(algo, name, params, **kwargs)


def create_nmf(name: str, params: Dict[str, Any], **kwargs) -> SurprisePredictor:
    """Create the NMF predictor.

    Args:
        name: the name of the algorithm.
        params: containing the following name-value pairs:
            factors(int): the number of factors.
            epochs(int): the number of iterations of the SGD procedure.
            reg_pu(float): the regularization term for users.
            reg_qi(float): the regularization term for items.
            init_low(int): lower bound for random initialization of factors.
            init_high(int): higher bound for random initialization of factors.
            random_seed(int): the random seed or None for the current time as seed.

    Returns:
        the SurprisePredictor wrapper of NMF.
    """
    if params['random_seed'] is None:
        params['random_seed'] = int(time.time())

    algo = NMF(
        n_factors=params['factors'],
        n_epochs=params['epochs'],
        biased=False,
        reg_pu=params['reg_pu'],
        reg_qi=params['reg_qi'],
        init_low=params['init_low'],
        init_high=params['init_high'],
        random_state=params['random_seed'],
        verbose=False
    )

    return SurprisePredictor(algo, name, params, **kwargs)


def create_normal_predictor(name: str, params: Dict[str, Any], **kwargs) -> SurprisePredictor:
    """Create the NormalPredictor.

    Args:
        name: the name of the algorithm.
        params: there are no parameters for this algorithm.

    Returns:
        the SurprisePredictor wrapper of NormalPredictor.
    """
    return SurprisePredictor(NormalPredictor(), name, params, **kwargs)


def create_slope_one(name: str, params: Dict[str, Any], **kwargs) -> SurprisePredictor:
    """Create the SlopeOne predictor.

    Args:
        name: the name of the algorithm.
        params: there are no parameters for this algorithm.

    Returns:
        the SurprisePredictor wrapper of SlopeOne.
    """
    return SurprisePredictor(SlopeOne(), name, params, **kwargs)


def create_svd(name: str, params: Dict[str, Any], **kwargs) -> SurprisePredictor:
    """Create the SVD predictor.

    Args:
        name: the name of the algorithm.
        params: containing the following name-value pairs:
            factors(int): the number of factors.
            epochs(int): the number of iterations of the SGD procedure.
            biased(bool): whether to use baselines (or biases).
            init_mean(int): the mean of the normal distribution for factor vectors initialization.
            init_std_dev(float): the standard deviation of the normal distribution for
                factor vectors initialization.
            learning_rate(float): the learning rate for users and items.
            regularization(float): the regularization term for users and items.
            random_seed(int): the random seed or None for the current time as seed.

    Returns:
        the SurprisePredictor wrapper of SVD.
    """
    if params['random_seed'] is None:
        params['random_seed'] = int(time.time())

    algo = SVD(
        n_factors=params['factors'],
        n_epochs=params['epochs'],
        biased=params['biased'],
        init_mean=params['init_mean'],
        init_std_dev=params['init_std_dev'],
        lr_all=params['learning_rate'],
        reg_all=params['regularization'],
        lr_bu=None, lr_bi=None, lr_pu=None, lr_qi=None,
        reg_bu=None, reg_bi=None, reg_pu=None, reg_qi=None,
        random_state=params['random_seed'],
        verbose=False
    )

    return SurprisePredictor(algo, name, params, **kwargs)


def create_svd_pp(name: str, params: Dict[str, Any], **kwargs) -> SurprisePredictor:
    """Create the SVDpp predictor.

    Args:
        name: the name of the algorithm.
        params: containing the following name-value pairs:
            factors(int): the number of factors.
            epochs(int): the number of iterations of the SGD procedure.
            init_mean(int): the mean of the normal distribution for factor vectors initialization.
            init_std_dev(float): the standard deviation of the normal distribution for
                factor vectors initialization.
            learning_rate(float): the learning rate for users and items.
            regularization(float): the regularization term for users and items.
            random_seed(int): the random seed or None for the current time as seed.

    Returns:
        the SurprisePredictor wrapper of SVDpp.
    """
    if params['random_seed'] is None:
        params['random_seed'] = int(time.time())

    algo = SVDpp(
        n_factors=params['factors'],
        n_epochs=params['epochs'],
        init_mean=params['init_mean'],
        init_std_dev=params['init_std_dev'],
        lr_all=params['learning_rate'],
        reg_all=params['regularization'],
        lr_bu=None, lr_bi=None, lr_pu=None, lr_qi=None, lr_yj=None,
        reg_bu=None, reg_bi=None, reg_pu=None, reg_qi=None, reg_yj=None,
        random_state=params['random_seed'],
        verbose=False
    )

    return SurprisePredictor(algo, name, params, **kwargs)
class SurprisePredictor(src.fairreckitlib.model.algorithms.base_predictor.Predictor):

Predictor implementation for the Surprise package.

SurprisePredictor( algo: surprise.prediction_algorithms.algo_base.AlgoBase, name: str, params: Dict[str, Any], **kwargs)

Construct the surprise predictor.

Args:
    algo: the surprise prediction algorithm.
    name: the name of the predictor.
    params: the parameters of the predictor.

Keyword Args:
    num_threads(int): the max number of threads the predictor can use.
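Although `num_threads` is documented as a keyword argument, the module source reads it with `kwargs['num_threads']`, so in practice it appears to be required. The sketch below reproduces only that lookup; `StubBase` and `StubSurprisePredictor` are hypothetical stand-ins, not the library's classes, and the real `Predictor` base class may do more.

```python
from typing import Any, Dict


class StubBase:
    """Hypothetical stand-in for the Predictor base class."""

    def __init__(self, name: str, params: Dict[str, Any], num_threads: int):
        self.name = name
        self.params = params
        self.num_threads = num_threads


class StubSurprisePredictor(StubBase):
    """Hypothetical stand-in that copies the constructor's keyword lookup."""

    def __init__(self, algo: Any, name: str, params: Dict[str, Any], **kwargs):
        # Indexing kwargs directly means a missing key raises KeyError.
        StubBase.__init__(self, name, params, kwargs['num_threads'])
        self.algo = algo


ok = StubSurprisePredictor(object(), 'SVD', {}, num_threads=4)
print(ok.num_threads)

try:
    StubSurprisePredictor(object(), 'SVD', {})
    missing_raises = False
except KeyError:
    missing_raises = True
print(missing_raises)
```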

def on_train(self, train_set: surprise.trainset.Trainset) -> None:

Train the algorithm on the train set.

The predictor should be trained with a matrix that is compatible with the surprise package.

Args:
    train_set: the set to train the predictor with.

Raises:
    ArithmeticError: possibly raised by an algorithm on training.
    MemoryError: possibly raised by an algorithm on training.
    RuntimeError: possibly raised by an algorithm on training.
    TypeError: when the train set is not a surprise.Trainset.

def on_predict(self, user: int, item: int) -> float:

Compute a prediction for the specified user and item.

Surprise predictors clip the predicted ratings by default to the original rating scale that is provided during training. It is turned off to conform with the expected interface.

Args:
    user: the user ID.
    item: the item ID.

Raises:
    ArithmeticError: possibly raised by a predictor on testing.
    MemoryError: possibly raised by a predictor on testing.
    RuntimeError: when the predictor is not trained yet.

Returns:
    the predicted rating.
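In the module source, `on_predict` returns `math.nan` when the prediction's `details['was_impossible']` flag is set, and otherwise returns the unclipped estimate. The sketch below mimics that rule with a hypothetical `StubPrediction` tuple that carries only the two fields the method reads. It does not call Surprise, and `to_rating` is an illustrative helper name.

```python
import math
from collections import namedtuple

# Hypothetical stand-in for a Surprise prediction: only 'est' and 'details' are modelled.
StubPrediction = namedtuple('StubPrediction', ['est', 'details'])


def to_rating(prediction: StubPrediction) -> float:
    """Apply the same rule as on_predict: NaN for impossible predictions."""
    return math.nan if prediction.details['was_impossible'] else prediction.est


# With clipping off, an estimate may fall outside the rating scale (5.7 on a 1-5 scale).
possible = StubPrediction(est=5.7, details={'was_impossible': False})
impossible = StubPrediction(est=3.2, details={'was_impossible': True})

ratings = [to_rating(possible), to_rating(impossible)]

# NaN never compares equal to itself, so filter with math.isnan rather than ==.
valid = [rating for rating in ratings if not math.isnan(rating)]
print(valid)
```

Callers that aggregate predictions would likely need a similar `math.isnan` filter, since a single NaN propagates through sums and means.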

def create_baseline_only_als( name: str, params: Dict[str, Any], **kwargs) -> src.fairreckitlib.model.algorithms.surprise.surprise_predictor.SurprisePredictor:

Create the BaselineOnly ALS predictor.

Args:
    name: the name of the algorithm.
    params: containing the following name-value pairs:
        epochs(int): the number of iterations of the ALS procedure.
        reg_i(int): the regularization parameter for items.
        reg_u(int): the regularization parameter for users.

Returns:
    the SurprisePredictor wrapper of BaselineOnly with method 'als'.
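The creation function translates the library's parameter names into the `bsl_options` dictionary that Surprise expects, for example `epochs` becomes `n_epochs`. The sketch below isolates that mapping so it can be inspected without Surprise installed. `als_baseline_options` is a hypothetical helper name and the parameter values are illustrative only.

```python
from typing import Any, Dict


def als_baseline_options(params: Dict[str, Any]) -> Dict[str, Any]:
    """Build the bsl_options mapping used for the ALS baseline, as in the source."""
    return {
        'method': 'als',
        'reg_i': params['reg_i'],      # regularization for items
        'reg_u': params['reg_u'],      # regularization for users
        'n_epochs': params['epochs'],  # 'epochs' is renamed to Surprise's 'n_epochs'
    }


# Illustrative values, not recommendations.
params = {'epochs': 10, 'reg_i': 10, 'reg_u': 15}
options = als_baseline_options(params)
print(options)
```

In the real function this dictionary is passed as `BaselineOnly(bsl_options=..., verbose=False)`.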

def create_baseline_only_sgd( name: str, params: Dict[str, Any], **kwargs) -> src.fairreckitlib.model.algorithms.surprise.surprise_predictor.SurprisePredictor:

Create the BaselineOnly SGD predictor.

Args:
    name: the name of the algorithm.
    params: containing the following name-value pairs:
        epochs(int): the number of iterations of the SGD procedure.
        regularization(float): the regularization parameter of the cost function that is optimized.
        learning_rate(float): the learning rate of SGD.

Returns:
    the SurprisePredictor wrapper of BaselineOnly with method 'sgd'.

def create_co_clustering( name: str, params: Dict[str, Any], **kwargs) -> src.fairreckitlib.model.algorithms.surprise.surprise_predictor.SurprisePredictor:

Create the CoClustering predictor.

Args:
    name: the name of the algorithm.
    params: containing the following name-value pairs:
        epochs(int): the number of iterations of the optimization loop.
        user_clusters(int): number of user clusters.
        item_clusters(int): number of item clusters.
        random_seed(int): the random seed or None for the current time as seed.

Returns: the SurprisePredictor wrapper of CoClustering.
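
The seed fallback used by this function (and by the other seeded creation functions) writes the chosen seed back into the params dictionary. A minimal sketch of that behaviour, assuming a hypothetical helper name resolve_random_seed and illustrative parameter values:

```python
import time
from typing import Any, Dict


def resolve_random_seed(params: Dict[str, Any]) -> int:
    """Hypothetical helper mirroring the None check in the creation function."""
    if params['random_seed'] is None:
        # Whole seconds since the epoch: calls within the same second share a seed.
        params['random_seed'] = int(time.time())
    return params['random_seed']


# No seed supplied: the dictionary is updated in place with the seed that is used.
unseeded = {'epochs': 20, 'user_clusters': 3, 'item_clusters': 3, 'random_seed': None}
used_seed = resolve_random_seed(unseeded)

# Explicit seed: returned unchanged.
seeded = {'random_seed': 42}
fixed_seed = resolve_random_seed(seeded)
```

Because the dictionary is mutated, a caller that stores params after creation can reproduce the run later with the same seed.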

def create_knn_basic(name: str, params: Dict[str, Any], **kwargs) -> src.fairreckitlib.model.algorithms.surprise.surprise_predictor.SurprisePredictor:
def create_knn_basic(name: str, params: Dict[str, Any], **kwargs) -> SurprisePredictor:
    """Create the KNNBasic predictor.

    Args:
        name: the name of the algorithm.
        params: containing the following name-value pairs:
            max_k(int): the maximum number of neighbors to take into account for aggregation.
            min_k(int): the minimum number of neighbors to take into account for aggregation.
            user_based(bool): whether similarities will be computed between users or between
                items; this has a huge impact on the performance.
            min_support(int): the minimum number of common items or users, depending on the
                user_based parameter.
            similarity(str): the name of the similarity to use ('MSD', 'cosine' or 'pearson').

    Returns:
        the SurprisePredictor wrapper of KNNBasic.
    """
    algo = KNNBasic(
        k=params['max_k'],
        min_k=params['min_k'],
        sim_options={
            'name': params['similarity'],
            'user_based': params['user_based'],
            'min_support': params['min_support']
        },
        verbose=False
    )

    return SurprisePredictor(algo, name, params, **kwargs)

Create the KNNBasic predictor.

Args:
    name: the name of the algorithm.
    params: containing the following name-value pairs:
        max_k(int): the maximum number of neighbors to take into account for aggregation.
        min_k(int): the minimum number of neighbors to take into account for aggregation.
        user_based(bool): whether similarities will be computed between users or between items; this has a huge impact on the performance.
        min_support(int): the minimum number of common items or users, depending on the user_based parameter.
        similarity(str): the name of the similarity to use ('MSD', 'cosine' or 'pearson').

Returns: the SurprisePredictor wrapper of KNNBasic.
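
To illustrate how max_k and min_k interact during aggregation, the following is a simplified sketch of a similarity-weighted neighbor average. It is not the Surprise implementation; the function name knn_basic_estimate, the (similarity, rating) input format and the None return value are assumptions made for this example.

```python
import heapq
from typing import List, Optional, Tuple


def knn_basic_estimate(neighbors: List[Tuple[float, float]],
                       max_k: int,
                       min_k: int) -> Optional[float]:
    """Weighted average over the max_k most similar neighbors.

    neighbors holds (similarity, rating) pairs. Returns None when fewer than
    min_k neighbors with positive similarity are available.
    """
    k_nearest = heapq.nlargest(max_k, neighbors, key=lambda pair: pair[0])

    sum_sim = 0.0
    sum_weighted = 0.0
    actual_k = 0
    for sim, rating in k_nearest:
        if sim > 0:
            sum_sim += sim
            sum_weighted += sim * rating
            actual_k += 1

    if actual_k < min_k or sum_sim == 0:
        return None
    return sum_weighted / sum_sim


ratings = [(0.9, 4.0), (0.1, 2.0), (0.5, 5.0)]
estimate = knn_basic_estimate(ratings, max_k=2, min_k=1)
too_few = knn_basic_estimate(ratings, max_k=2, min_k=3)
```

With max_k=2 only the two most similar neighbors contribute, and with min_k=3 the same input yields no estimate because at most two neighbors are considered.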

def create_knn_baseline_als(name: str, params: Dict[str, Any], **kwargs) -> src.fairreckitlib.model.algorithms.surprise.surprise_predictor.SurprisePredictor:
def create_knn_baseline_als(name: str, params: Dict[str, Any], **kwargs) -> SurprisePredictor:
    """Create the KNNBaseline ALS predictor.

    Args:
        name: the name of the algorithm.
        params: containing the following name-value pairs:
            max_k(int): the maximum number of neighbors to take into account for aggregation.
            min_k(int): the minimum number of neighbors to take into account for aggregation.
            user_based(bool): whether similarities will be computed between users or between
                items; this has a huge impact on the performance.
            min_support(int): the minimum number of common items or users, depending on the
                user_based parameter.
            shrinkage(int): shrinkage parameter to apply.
            epochs(int): the number of iterations of the ALS procedure.
            reg_i(int): the regularization parameter for items.
            reg_u(int): the regularization parameter for users.

    Returns:
        the SurprisePredictor wrapper of KNNBaseline with method 'als'.
    """
    algo = KNNBaseline(
        k=params['max_k'],
        min_k=params['min_k'],
        bsl_options={
            # Surprise selects the baseline method via the 'method' key.
            'method': 'als',
            'reg_i': params['reg_i'],
            'reg_u': params['reg_u'],
            'n_epochs': params['epochs']
        },
        sim_options={
            'name': 'pearson_baseline',
            'user_based': params['user_based'],
            'min_support': params['min_support'],
            'shrinkage': params['shrinkage']
        },
        verbose=False
    )

    return SurprisePredictor(algo, name, params, **kwargs)

Create the KNNBaseline ALS predictor.

Args:
    name: the name of the algorithm.
    params: containing the following name-value pairs:
        max_k(int): the maximum number of neighbors to take into account for aggregation.
        min_k(int): the minimum number of neighbors to take into account for aggregation.
        user_based(bool): whether similarities will be computed between users or between items; this has a huge impact on the performance.
        min_support(int): the minimum number of common items or users, depending on the user_based parameter.
        shrinkage(int): shrinkage parameter to apply.
        epochs(int): the number of iterations of the ALS procedure.
        reg_i(int): the regularization parameter for items.
        reg_u(int): the regularization parameter for users.

Returns: the SurprisePredictor wrapper of KNNBaseline with method 'als'.
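
The creation function indexes params directly, including params['shrinkage'], so a missing key only surfaces as a KeyError at creation time. A caller could check the dictionary up front, as in this sketch; the helper missing_params, the constant name and the example values are hypothetical and not part of the module.

```python
from typing import Any, Dict, Iterable, List

# Keys read by create_knn_baseline_als according to the source above.
KNN_BASELINE_ALS_KEYS = (
    'max_k', 'min_k', 'user_based', 'min_support',
    'shrinkage', 'epochs', 'reg_i', 'reg_u',
)


def missing_params(params: Dict[str, Any], required: Iterable[str]) -> List[str]:
    """Hypothetical helper: list the required keys that params does not contain."""
    return [key for key in required if key not in params]


# Illustrative values only.
als_params = {
    'max_k': 40, 'min_k': 1, 'user_based': True, 'min_support': 1,
    'epochs': 10, 'reg_i': 10, 'reg_u': 15,
}
missing = missing_params(als_params, KNN_BASELINE_ALS_KEYS)
print(missing)
```

Here the check reports 'shrinkage', which would otherwise fail only once the similarity options are built.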

def create_knn_baseline_sgd(name: str, params: Dict[str, Any], **kwargs) -> src.fairreckitlib.model.algorithms.surprise.surprise_predictor.SurprisePredictor:
def create_knn_baseline_sgd(name: str, params: Dict[str, Any], **kwargs) -> SurprisePredictor:
    """Create the KNNBaseline SGD predictor.

    Args:
        name: the name of the algorithm.
        params: containing the following name-value pairs:
            max_k(int): the maximum number of neighbors to take into account for aggregation.
            min_k(int): the minimum number of neighbors to take into account for aggregation.
            user_based(bool): whether similarities will be computed between users or between
                items; this has a huge impact on the performance.
            min_support(int): the minimum number of common items or users, depending on the
                user_based parameter.
            shrinkage(int): shrinkage parameter to apply.
            epochs(int): the number of iterations of the SGD procedure.
            regularization(float): the regularization parameter
                of the cost function that is optimized.
            learning_rate(float): the learning rate of SGD.

    Returns:
        the SurprisePredictor wrapper of KNNBaseline with method 'sgd'.
    """
    algo = KNNBaseline(
        k=params['max_k'],
        min_k=params['min_k'],
        bsl_options={
            'method': 'sgd',
            'reg': params['regularization'],
            'learning_rate': params['learning_rate'],
            'n_epochs': params['epochs']
        },
        sim_options={
            'name': 'pearson_baseline',
            'user_based': params['user_based'],
            'min_support': params['min_support'],
            'shrinkage': params['shrinkage']
        },
        verbose=False
    )

    return SurprisePredictor(algo, name, params, **kwargs)

Create the KNNBaseline SGD predictor.

Args:
    name: the name of the algorithm.
    params: containing the following name-value pairs:
        max_k(int): the maximum number of neighbors to take into account for aggregation.
        min_k(int): the minimum number of neighbors to take into account for aggregation.
        user_based(bool): whether similarities will be computed between users or between items; this has a huge impact on the performance.
        min_support(int): the minimum number of common items or users, depending on the user_based parameter.
        shrinkage(int): shrinkage parameter to apply.
        epochs(int): the number of iterations of the SGD procedure.
        regularization(float): the regularization parameter of the cost function that is optimized.
        learning_rate(float): the learning rate of SGD.

Returns: the SurprisePredictor wrapper of KNNBaseline with method 'sgd'.

def create_knn_with_means(name: str, params: Dict[str, Any], **kwargs) -> src.fairreckitlib.model.algorithms.surprise.surprise_predictor.SurprisePredictor:
def create_knn_with_means(name: str, params: Dict[str, Any], **kwargs) -> SurprisePredictor:
    """Create the KNNWithMeans predictor.

    Args:
        name: the name of the algorithm.
        params: containing the following name-value pairs:
            max_k(int): the maximum number of neighbors to take into account for aggregation.
            min_k(int): the minimum number of neighbors to take into account for aggregation.
            user_based(bool): whether similarities will be computed between users or between
                items; this has a huge impact on the performance.
            min_support(int): the minimum number of common items or users, depending on the
                user_based parameter.
            similarity(str): the name of the similarity to use ('MSD', 'cosine' or 'pearson').

    Returns:
        the SurprisePredictor wrapper of KNNWithMeans.
    """
    algo = KNNWithMeans(
        k=params['max_k'],
        min_k=params['min_k'],
        sim_options={
            'name': params['similarity'],
            'user_based': params['user_based'],
            'min_support': params['min_support']
        },
        verbose=False
    )

    return SurprisePredictor(algo, name, params, **kwargs)

Create the KNNWithMeans predictor.

Args:
    name: the name of the algorithm.
    params: containing the following name-value pairs:
        max_k(int): the maximum number of neighbors to take into account for aggregation.
        min_k(int): the minimum number of neighbors to take into account for aggregation.
        user_based(bool): whether similarities will be computed between users or between items; this has a huge impact on the performance.
        min_support(int): the minimum number of common items or users, depending on the user_based parameter.
        similarity(str): the name of the similarity to use ('MSD', 'cosine' or 'pearson').

Returns: the SurprisePredictor wrapper of KNNWithMeans.

def create_knn_with_zscore(name: str, params: Dict[str, Any], **kwargs) -> src.fairreckitlib.model.algorithms.surprise.surprise_predictor.SurprisePredictor:
def create_knn_with_zscore(name: str, params: Dict[str, Any], **kwargs) -> SurprisePredictor:
    """Create the KNNWithZScore predictor.

    Args:
        name: the name of the algorithm.
        params: containing the following name-value pairs:
            max_k(int): the maximum number of neighbors to take into account for aggregation.
            min_k(int): the minimum number of neighbors to take into account for aggregation.
            user_based(bool): whether similarities will be computed between users or between
                items; this has a huge impact on the performance.
            min_support(int): the minimum number of common items or users, depending on the
                user_based parameter.
            similarity(str): the name of the similarity to use ('MSD', 'cosine' or 'pearson').

    Returns:
        the SurprisePredictor wrapper of KNNWithZScore.
    """
    algo = KNNWithZScore(
        k=params['max_k'],
        min_k=params['min_k'],
        sim_options={
            'name': params['similarity'],
            'user_based': params['user_based'],
            'min_support': params['min_support']
        },
        verbose=False
    )

    return SurprisePredictor(algo, name, params, **kwargs)

Create the KNNWithZScore predictor.

Args:
    name: the name of the algorithm.
    params: containing the following name-value pairs:
        max_k(int): the maximum number of neighbors to take into account for aggregation.
        min_k(int): the minimum number of neighbors to take into account for aggregation.
        user_based(bool): whether similarities will be computed between users or between items; this has a huge impact on the performance.
        min_support(int): the minimum number of common items or users, depending on the user_based parameter.
        similarity(str): the name of the similarity to use ('MSD', 'cosine' or 'pearson').

Returns: the SurprisePredictor wrapper of KNNWithZScore.

def create_nmf(name: str, params: Dict[str, Any], **kwargs) -> src.fairreckitlib.model.algorithms.surprise.surprise_predictor.SurprisePredictor:
def create_nmf(name: str, params: Dict[str, Any], **kwargs) -> SurprisePredictor:
    """Create the NMF predictor.

    Args:
        name: the name of the algorithm.
        params: containing the following name-value pairs:
            factors(int): the number of factors.
            epochs(int): the number of iterations of the SGD procedure.
            reg_pu(float): the regularization term for users.
            reg_qi(float): the regularization term for items.
            init_low(int): lower bound for random initialization of factors.
            init_high(int): higher bound for random initialization of factors.
            random_seed(int): the random seed or None for the current time as seed.

    Returns:
        the SurprisePredictor wrapper of NMF.
    """
    # Fall back to the current time (whole seconds) and store it back in params.
    if params['random_seed'] is None:
        params['random_seed'] = int(time.time())

    algo = NMF(
        n_factors=params['factors'],
        n_epochs=params['epochs'],
        biased=False,
        reg_pu=params['reg_pu'],
        reg_qi=params['reg_qi'],
        init_low=params['init_low'],
        init_high=params['init_high'],
        random_state=params['random_seed'],
        verbose=False
    )

    return SurprisePredictor(algo, name, params, **kwargs)

Create the NMF predictor.

Args:
    name: the name of the algorithm.
    params: containing the following name-value pairs:
        factors(int): the number of factors.
        epochs(int): the number of iterations of the SGD procedure.
        reg_pu(float): the regularization term for users.
        reg_qi(float): the regularization term for items.
        init_low(int): lower bound for random initialization of factors.
        init_high(int): higher bound for random initialization of factors.
        random_seed(int): the random seed or None for the current time as seed.

Returns: the SurprisePredictor wrapper of NMF.

def create_normal_predictor(name: str, params: Dict[str, Any], **kwargs) -> src.fairreckitlib.model.algorithms.surprise.surprise_predictor.SurprisePredictor:
def create_normal_predictor(name: str, params: Dict[str, Any], **kwargs) -> SurprisePredictor:
    """Create the NormalPredictor.

    Args:
        name: the name of the algorithm.
        params: there are no parameters for this algorithm.

    Returns:
        the SurprisePredictor wrapper of NormalPredictor.
    """
    return SurprisePredictor(NormalPredictor(), name, params, **kwargs)

Create the NormalPredictor.

Args:
    name: the name of the algorithm.
    params: there are no parameters for this algorithm.

Returns: the SurprisePredictor wrapper of NormalPredictor.

def create_slope_one(name: str, params: Dict[str, Any], **kwargs) -> src.fairreckitlib.model.algorithms.surprise.surprise_predictor.SurprisePredictor:
def create_slope_one(name: str, params: Dict[str, Any], **kwargs) -> SurprisePredictor:
    """Create the SlopeOne predictor.

    Args:
        name: the name of the algorithm.
        params: there are no parameters for this algorithm.

    Returns:
        the SurprisePredictor wrapper of SlopeOne.
    """
    return SurprisePredictor(SlopeOne(), name, params, **kwargs)

Create the SlopeOne predictor.

Args:
    name: the name of the algorithm.
    params: there are no parameters for this algorithm.

Returns: the SurprisePredictor wrapper of SlopeOne.
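
All creation functions share the (name, params, **kwargs) signature, including the parameterless ones, which is what makes them interchangeable behind a lookup table. The sketch below shows one way such a table could dispatch by name. How the library's own factory works is not shown here, so the registry, the stub creation function and the num_threads keyword are assumptions for illustration only.

```python
from typing import Any, Callable, Dict


def create_stub_predictor(name: str, params: Dict[str, Any], **kwargs) -> Dict[str, Any]:
    """Stand-in for a creation function; returns a plain dict instead of a predictor."""
    return {'name': name, 'params': params, 'kwargs': kwargs}


# Hypothetical registry mapping algorithm names to creation functions.
registry: Dict[str, Callable[..., Any]] = {
    'NormalPredictor': create_stub_predictor,
    'SlopeOne': create_stub_predictor,
}


def create_from_registry(name: str, params: Dict[str, Any], **kwargs) -> Any:
    """Look up the creation function by name and forward all arguments unchanged."""
    return registry[name](name, params, **kwargs)


# Parameterless algorithms still receive a params dictionary (here empty).
predictor = create_from_registry('SlopeOne', {}, num_threads=1)
print(predictor)
```

A uniform signature means the dispatcher never needs to know which algorithms take parameters and which do not.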

def create_svd(name: str, params: Dict[str, Any], **kwargs) -> src.fairreckitlib.model.algorithms.surprise.surprise_predictor.SurprisePredictor:
def create_svd(name: str, params: Dict[str, Any], **kwargs) -> SurprisePredictor:
    """Create the SVD predictor.

    Args:
        name: the name of the algorithm.
        params: containing the following name-value pairs:
            factors(int): the number of factors.
            epochs(int): the number of iterations of the SGD procedure.
            biased(bool): whether to use baselines (or biases).
            init_mean(int): the mean of the normal distribution for factor vectors initialization.
            init_std_dev(float): the standard deviation of the normal distribution for
                factor vectors initialization.
            learning_rate(float): the learning rate for users and items.
            regularization(float): the regularization term for users and items.
            random_seed(int): the random seed or None for the current time as seed.

    Returns:
        the SurprisePredictor wrapper of SVD.
    """
    # Fall back to the current time (whole seconds) and store it back in params.
    if params['random_seed'] is None:
        params['random_seed'] = int(time.time())

    algo = SVD(
        n_factors=params['factors'],
        n_epochs=params['epochs'],
        biased=params['biased'],
        init_mean=params['init_mean'],
        init_std_dev=params['init_std_dev'],
        lr_all=params['learning_rate'],
        reg_all=params['regularization'],
        # Per-parameter rates stay None so lr_all and reg_all apply everywhere.
        lr_bu=None, lr_bi=None, lr_pu=None, lr_qi=None,
        reg_bu=None, reg_bi=None, reg_pu=None, reg_qi=None,
        random_state=params['random_seed'],
        verbose=False
    )

    return SurprisePredictor(algo, name, params, **kwargs)

Create the SVD predictor.

Args:
    name: the name of the algorithm.
    params: containing the following name-value pairs:
        factors(int): the number of factors.
        epochs(int): the number of iterations of the SGD procedure.
        biased(bool): whether to use baselines (or biases).
        init_mean(int): the mean of the normal distribution for factor vectors initialization.
        init_std_dev(float): the standard deviation of the normal distribution for factor vectors initialization.
        learning_rate(float): the learning rate for users and items.
        regularization(float): the regularization term for users and items.
        random_seed(int): the random seed or None for the current time as seed.

Returns: the SurprisePredictor wrapper of SVD.
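
The call above passes a single learning_rate as lr_all and leaves lr_bu, lr_bi, lr_pu and lr_qi at None, so the shared rate applies to every parameter group (Surprise documents the per-parameter values as taking precedence over lr_all when set). That fallback rule can be sketched as follows; the helper name effective_learning_rates is hypothetical and the code is an illustration of the rule, not the library's implementation.

```python
from typing import Dict, Optional

RATE_NAMES = ('lr_bu', 'lr_bi', 'lr_pu', 'lr_qi')


def effective_learning_rates(lr_all: float,
                             **overrides: Optional[float]) -> Dict[str, float]:
    """Use the per-parameter rate when given, otherwise fall back to lr_all."""
    rates = {}
    for rate_name in RATE_NAMES:
        override = overrides.get(rate_name)
        rates[rate_name] = lr_all if override is None else override
    return rates


# As in create_svd: every per-parameter rate is None, so lr_all is used throughout.
shared = effective_learning_rates(0.005, lr_bu=None, lr_bi=None, lr_pu=None, lr_qi=None)

# A single override replaces the shared rate for that parameter group only.
mixed = effective_learning_rates(0.005, lr_bu=0.01)
```

The same rule applies to reg_all and the reg_* arguments.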

def create_svd_pp(name: str, params: Dict[str, Any], **kwargs) -> src.fairreckitlib.model.algorithms.surprise.surprise_predictor.SurprisePredictor:
def create_svd_pp(name: str, params: Dict[str, Any], **kwargs) -> SurprisePredictor:
    """Create the SVDpp predictor.

    Args:
        name: the name of the algorithm.
        params: containing the following name-value pairs:
            factors(int): the number of factors.
            epochs(int): the number of iterations of the SGD procedure.
            init_mean(int): the mean of the normal distribution for factor vectors initialization.
            init_std_dev(float): the standard deviation of the normal distribution for
                factor vectors initialization.
            learning_rate(float): the learning rate for users and items.
            regularization(float): the regularization term for users and items.
            random_seed(int): the random seed or None for the current time as seed.

    Returns:
        the SurprisePredictor wrapper of SVDpp.
    """
    # Fall back to the current time (whole seconds) and store it back in params.
    if params['random_seed'] is None:
        params['random_seed'] = int(time.time())

    algo = SVDpp(
        n_factors=params['factors'],
        n_epochs=params['epochs'],
        init_mean=params['init_mean'],
        init_std_dev=params['init_std_dev'],
        lr_all=params['learning_rate'],
        reg_all=params['regularization'],
        # Per-parameter rates stay None so lr_all and reg_all apply everywhere.
        lr_bu=None, lr_bi=None, lr_pu=None, lr_qi=None, lr_yj=None,
        reg_bu=None, reg_bi=None, reg_pu=None, reg_qi=None, reg_yj=None,
        random_state=params['random_seed'],
        verbose=False
    )

    return SurprisePredictor(algo, name, params, **kwargs)

Create the SVDpp predictor.

Args:
    name: the name of the algorithm.
    params: containing the following name-value pairs:
        factors(int): the number of factors.
        epochs(int): the number of iterations of the SGD procedure.
        init_mean(int): the mean of the normal distribution for factor vectors initialization.
        init_std_dev(float): the standard deviation of the normal distribution for factor vectors initialization.
        learning_rate(float): the learning rate for users and items.
        regularization(float): the regularization term for users and items.
        random_seed(int): the random seed or None for the current time as seed.

Returns: the SurprisePredictor wrapper of SVDpp.