src.fairreckitlib.model.algorithms.lenskit.lenskit_predictor
This module contains the lenskit predictor and creation functions.
Classes:
LensKitPredictor: predictor implementation for lenskit.
Functions:
create_biased_mf: create BiasedMF predictor (factory creation compatible).
create_implicit_mf: create ImplicitMF predictor (factory creation compatible).
create_item_item: create ItemItem predictor (factory creation compatible).
create_pop_score: create PopScore predictor (factory creation compatible).
create_user_user: create UserUser predictor (factory creation compatible).
This program has been developed by students from the bachelor Computer Science at Utrecht University within the Software Project course. © Copyright Utrecht University (Department of Information and Computing Sciences)
"""This module contains the lenskit predictor and creation functions.

Classes:

    LensKitPredictor: predictor implementation for lenskit.

Functions:

    create_biased_mf: create BiasedMF predictor (factory creation compatible).
    create_implicit_mf: create ImplicitMF predictor (factory creation compatible).
    create_item_item: create ItemItem predictor (factory creation compatible).
    create_pop_score: create PopScore predictor (factory creation compatible).
    create_user_user: create UserUser predictor (factory creation compatible).

This program has been developed by students from the bachelor Computer Science at
Utrecht University within the Software Project course.
© Copyright Utrecht University (Department of Information and Computing Sciences)
"""

from typing import Any, Dict

import lenskit
from lenskit import batch
import pandas as pd

from ..base_predictor import Predictor
from . import lenskit_algorithms


class LensKitPredictor(Predictor):
    """Predictor implementation for the LensKit framework."""

    def __init__(self, algo: lenskit.Predictor, name: str, params: Dict[str, Any], **kwargs):
        """Construct the lenskit predictor.

        Args:
            algo: the lenskit prediction algorithm.
            name: the name of the predictor.
            params: the parameters of the predictor.

        Keyword Args:
            num_threads(int): the max number of threads the predictor can use.
        """
        Predictor.__init__(self, name, params, kwargs['num_threads'])
        self.algo = algo

    def on_train(self, train_set: pd.DataFrame) -> None:
        """Fit the lenskit algorithm on the train set.

        The predictor should be trained with a dataframe matrix.

        Args:
            train_set: the set to train the predictor with.

        Raises:
            ArithmeticError: possibly raised by an algorithm on training.
            MemoryError: possibly raised by an algorithm on training.
            RuntimeError: possibly raised by an algorithm on training.
            TypeError: when the train set is not a pandas dataframe.
        """
        if not isinstance(train_set, pd.DataFrame):
            raise TypeError('Expected predictor to be trained with a dataframe matrix')

        self.algo.fit(train_set)

    def on_predict(self, user: int, item: int) -> float:
        """Compute a prediction for the specified user and item.

        Lenskit predictors allow for predicting multiple items at the same time.
        To conform with the interface only one item needs to be predicted and all
        the extra data that it generates needs to be excluded.

        Args:
            user: the user ID.
            item: the item ID.

        Raises:
            ArithmeticError: possibly raised by a predictor on testing.
            MemoryError: possibly raised by a predictor on testing.
            RuntimeError: when the predictor is not trained yet.

        Returns:
            the predicted rating.
        """
        prediction = self.algo.predict_for_user(user, [item])
        return prediction[item]

    def on_predict_batch(self, user_item_pairs: pd.DataFrame) -> pd.DataFrame:
        """Compute the predictions for each of the specified user and item pairs.

        Lenskit predictors have a batch implementation available that allows for
        predicting ratings using multiple 'jobs'.

        Args:
            user_item_pairs: with at least two columns: 'user', 'item'.

        Raises:
            ArithmeticError: possibly raised by a predictor on testing.
            MemoryError: possibly raised by a predictor on testing.
            RuntimeError: when the predictor is not trained yet.

        Returns:
            dataframe with the columns: 'user', 'item', 'prediction'.
        """
        n_jobs = self.num_threads if self.num_threads > 0 else None
        predictions = batch.predict(self.algo, user_item_pairs, n_jobs=n_jobs)
        return predictions[['user', 'item', 'prediction']]


def create_biased_mf(name: str, params: Dict[str, Any], **kwargs) -> LensKitPredictor:
    """Create the BiasedMF predictor.

    Args:
        name: the name of the algorithm.
        params: containing the following name-value pairs:
            features(int): the number of features to train.
            iterations(int): the number of iterations to train.
            user_reg(float): the regularization factor for users.
            item_reg(float): the regularization factor for items.
            damping(float): damping factor for the underlying bias.
            method(str): the solver to use ('cd' or 'lu').
            random_seed(int): the random seed or None for the current time as seed.

    Keyword Args:
        num_threads(int): the max number of threads the algorithm can use.

    Returns:
        the LensKitPredictor wrapper of BiasedMF.
    """
    algo = lenskit_algorithms.create_biased_mf(params)
    return LensKitPredictor(algo, name, params, **kwargs)


def create_implicit_mf(name: str, params: Dict[str, Any], **kwargs) -> LensKitPredictor:
    """Create the ImplicitMF predictor.

    Args:
        name: the name of the algorithm.
        params: containing the following name-value pairs:
            features(int): the number of features to train.
            iterations(int): the number of iterations to train.
            reg(float): the regularization factor.
            weight(float): the scaling weight for positive samples.
            use_ratings(bool): whether to use the rating column or treat
                every rated user-item pair as having a rating of 1.
            method(str): the training method ('cg' or 'lu').
            random_seed(int): the random seed or None for the current time as seed.

    Keyword Args:
        num_threads(int): the max number of threads the algorithm can use.

    Returns:
        the LensKitPredictor wrapper of ImplicitMF.
    """
    algo = lenskit_algorithms.create_implicit_mf(params)
    return LensKitPredictor(algo, name, params, **kwargs)


def create_item_item(name: str, params: Dict[str, Any], **kwargs) -> LensKitPredictor:
    """Create the ItemItem predictor.

    Args:
        name: the name of the algorithm.
        params: containing the following name-value pairs:
            max_neighbors(int): the maximum number of neighbors for scoring each item.
            min_neighbors(int): the minimum number of neighbors for scoring each item.
            min_similarity(float): minimum similarity threshold for considering a neighbor.

    Keyword Args:
        num_threads(int): the max number of threads the algorithm can use.
        rating_type(str): the rating type on how feedback should be interpreted.

    Returns:
        the LensKitPredictor wrapper of ItemItem.
    """
    algo = lenskit_algorithms.create_item_item(params, kwargs['rating_type'])
    return LensKitPredictor(algo, name, params, **kwargs)


def create_pop_score(name: str, params: Dict[str, Any], **kwargs) -> LensKitPredictor:
    """Create the PopScore predictor.

    Args:
        name: the name of the algorithm.
        params: containing the following name-value pairs:
            score_method(str): for computing popularity scores ('quantile', 'rank' or 'count').

    Keyword Args:
        num_threads(int): the max number of threads the algorithm can use.

    Returns:
        the LensKitPredictor wrapper of PopScore.
    """
    algo = lenskit_algorithms.create_pop_score(params)
    return LensKitPredictor(algo, name, params, **kwargs)


def create_user_user(name: str, params: Dict[str, Any], **kwargs) -> LensKitPredictor:
    """Create the UserUser predictor.

    Args:
        name: the name of the algorithm.
        params: containing the following name-value pairs:
            max_neighbors(int): the maximum number of neighbors for scoring each item.
            min_neighbors(int): the minimum number of neighbors for scoring each item.
            min_similarity(float): minimum similarity threshold for considering a neighbor.

    Keyword Args:
        num_threads(int): the max number of threads the algorithm can use.
        rating_type(str): the rating type on how feedback should be interpreted.

    Returns:
        the LensKitPredictor wrapper of UserUser.
    """
    algo = lenskit_algorithms.create_user_user(params, kwargs['rating_type'])
    return LensKitPredictor(algo, name, params, **kwargs)
Predictor implementation for the LensKit framework.
Construct the lenskit predictor.
Args:
    algo: the lenskit prediction algorithm.
    name: the name of the predictor.
    params: the parameters of the predictor.

Keyword Args:
    num_threads(int): the max number of threads the predictor can use.
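The constructor reads the thread count with kwargs['num_threads'], so in practice the keyword has to be supplied. The following is a minimal sketch of that lookup only. SketchPredictor is a hypothetical stand-in, not part of fairreckitlib, so the example runs without LensKit installed.

```python
from typing import Any, Dict


class SketchPredictor:
    """Hypothetical stand-in that mirrors only the constructor's keyword handling."""

    def __init__(self, algo: Any, name: str, params: Dict[str, Any], **kwargs):
        # Same lookup as in the source: a missing key raises KeyError.
        self.num_threads = kwargs['num_threads']
        self.algo = algo
        self.name = name
        self.params = params


# Supplying the keyword works as expected.
predictor = SketchPredictor(algo=None, name='PopScore', params={}, num_threads=2)
print(predictor.num_threads)

# Leaving it out fails at construction time.
try:
    SketchPredictor(algo=None, name='PopScore', params={})
except KeyError as err:
    missing_key = err.args[0]
    print('missing keyword:', missing_key)
```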
Fit the lenskit algorithm on the train set.
The predictor should be trained with a dataframe matrix.
Args:
    train_set: the set to train the predictor with.

Raises:
    ArithmeticError: possibly raised by an algorithm on training.
    MemoryError: possibly raised by an algorithm on training.
    RuntimeError: possibly raised by an algorithm on training.
    TypeError: when the train set is not a pandas dataframe.
Compute a prediction for the specified user and item.
Lenskit predictors can predict multiple items at the same time. To conform to the interface, only one item is predicted per call and any extra data in the result is discarded.
Args:
    user: the user ID.
    item: the item ID.

Raises:
    ArithmeticError: possibly raised by a predictor on testing.
    MemoryError: possibly raised by a predictor on testing.
    RuntimeError: when the predictor is not trained yet.

Returns:
    the predicted rating.
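The single-item prediction described above can be sketched as follows. StubAlgo is a hypothetical stand-in for a trained LensKit algorithm: it returns a plain dict with made-up scores, where the real predict_for_user call in the source returns a result indexed by item.

```python
from typing import Dict, List


class StubAlgo:
    """Hypothetical stand-in for a trained LensKit algorithm."""

    def predict_for_user(self, user: int, items: List[int]) -> Dict[int, float]:
        # Made-up scores keyed by item; only the access pattern matters here.
        return {item: 0.5 * item for item in items}


def predict_single(algo: StubAlgo, user: int, item: int) -> float:
    # Ask for a one-item list, then pick that item's score out of the result.
    prediction = algo.predict_for_user(user, [item])
    return prediction[item]


score = predict_single(StubAlgo(), user=1, item=5)
print(score)
```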
Compute the predictions for each of the specified user and item pairs.
Lenskit predictors have a batch implementation available that allows for predicting ratings using multiple 'jobs'.
Args:
    user_item_pairs: a dataframe with at least two columns: 'user', 'item'.

Raises:
    ArithmeticError: possibly raised by a predictor on testing.
    MemoryError: possibly raised by a predictor on testing.
    RuntimeError: when the predictor is not trained yet.

Returns:
    a dataframe with the columns: 'user', 'item', 'prediction'.
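In the source, the number of jobs handed to batch.predict is derived from num_threads. The sketch below isolates that expression in a hypothetical helper (resolve_n_jobs is not part of the library). How LensKit interprets None is decided by LensKit and is not covered here.

```python
from typing import Optional


def resolve_n_jobs(num_threads: int) -> Optional[int]:
    # Same expression as in on_predict_batch: a positive thread count is
    # passed through unchanged, anything else becomes None.
    return num_threads if num_threads > 0 else None


for threads in (4, 1, 0):
    print(threads, '->', resolve_n_jobs(threads))
```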
Create the BiasedMF predictor.
Args:
    name: the name of the algorithm.
    params: containing the following name-value pairs:
        features(int): the number of features to train.
        iterations(int): the number of iterations to train.
        user_reg(float): the regularization factor for users.
        item_reg(float): the regularization factor for items.
        damping(float): damping factor for the underlying bias.
        method(str): the solver to use ('cd' or 'lu').
        random_seed(int): the random seed or None for the current time as seed.

Keyword Args:
    num_threads(int): the max number of threads the algorithm can use.

Returns:
    the LensKitPredictor wrapper of BiasedMF.
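As an illustration of the params argument, the sketch below builds a dictionary with the documented names and checks that none are missing. The values are placeholders chosen for the example, not recommended settings, and EXPECTED_KEYS is a hypothetical helper.

```python
from typing import Any, Dict

# Parameter names come from the documentation above; the values are
# illustrative only and are not recommended settings.
biased_mf_params: Dict[str, Any] = {
    'features': 10,
    'iterations': 20,
    'user_reg': 0.1,
    'item_reg': 0.1,
    'damping': 5.0,
    'method': 'cd',
    'random_seed': None,
}

# Hypothetical sanity check: every documented name should be present.
EXPECTED_KEYS = {
    'features', 'iterations', 'user_reg', 'item_reg',
    'damping', 'method', 'random_seed',
}
missing = EXPECTED_KEYS - biased_mf_params.keys()
print('missing parameters:', sorted(missing))
```

With fairreckitlib and LensKit installed, a dictionary like this would be passed along the lines of create_biased_mf('BiasedMF', biased_mf_params, num_threads=1), where the name string is an arbitrary label chosen for this example.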
Create the ImplicitMF predictor.
Args:
    name: the name of the algorithm.
    params: containing the following name-value pairs:
        features(int): the number of features to train.
        iterations(int): the number of iterations to train.
        reg(float): the regularization factor.
        weight(float): the scaling weight for positive samples.
        use_ratings(bool): whether to use the rating column or treat every rated user-item pair as having a rating of 1.
        method(str): the training method ('cg' or 'lu').
        random_seed(int): the random seed or None for the current time as seed.

Keyword Args:
    num_threads(int): the max number of threads the algorithm can use.

Returns:
    the LensKitPredictor wrapper of ImplicitMF.
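The use_ratings flag can be pictured with a small pure-Python sketch. This is not how LensKit implements it. effective_ratings is a hypothetical helper that only illustrates the documented meaning: either keep the rating column or treat every rated pair as a rating of 1.

```python
from typing import List, Tuple

Interaction = Tuple[int, int, float]  # (user, item, rating)


def effective_ratings(interactions: List[Interaction], use_ratings: bool) -> List[Interaction]:
    if use_ratings:
        # Keep the rating column as provided.
        return list(interactions)
    # Ignore the rating column: every rated pair counts as 1.
    return [(user, item, 1.0) for user, item, _ in interactions]


sample = [(1, 10, 4.5), (1, 11, 2.0)]
print(effective_ratings(sample, use_ratings=True))
print(effective_ratings(sample, use_ratings=False))
```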
Create the ItemItem predictor.
Args:
    name: the name of the algorithm.
    params: containing the following name-value pairs:
        max_neighbors(int): the maximum number of neighbors for scoring each item.
        min_neighbors(int): the minimum number of neighbors for scoring each item.
        min_similarity(float): minimum similarity threshold for considering a neighbor.

Keyword Args:
    num_threads(int): the max number of threads the algorithm can use.
    rating_type(str): the rating type on how feedback should be interpreted.

Returns:
    the LensKitPredictor wrapper of ItemItem.
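In the source, create_item_item looks up rating_type in the keyword arguments for the algorithm and then forwards all keyword arguments to the wrapper. The sketch below mimics that flow with hypothetical stand-ins (make_stub_algo and create_item_item_sketch are not library functions). The value 'explicit' is a placeholder, since the accepted rating types are not listed here.

```python
from typing import Any, Dict, Tuple


def make_stub_algo(params: Dict[str, Any], rating_type: str) -> Tuple[str, str]:
    """Hypothetical stand-in for lenskit_algorithms.create_item_item."""
    return ('ItemItem', rating_type)


def create_item_item_sketch(name: str, params: Dict[str, Any], **kwargs) -> Dict[str, Any]:
    # rating_type is looked up directly, so the caller must supply it.
    algo = make_stub_algo(params, kwargs['rating_type'])
    # All keyword arguments, including rating_type, travel on to the wrapper.
    return {'algo': algo, 'name': name, 'params': params, 'kwargs': kwargs}


knn_params = {'max_neighbors': 10, 'min_neighbors': 1, 'min_similarity': 0.0}
wrapper = create_item_item_sketch(
    'ItemItem', knn_params, num_threads=1, rating_type='explicit'
)
print(wrapper['algo'], wrapper['kwargs'])
```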
Create the PopScore predictor.
Args:
    name: the name of the algorithm.
    params: containing the following name-value pairs:
        score_method(str): for computing popularity scores ('quantile', 'rank' or 'count').

Keyword Args:
    num_threads(int): the max number of threads the algorithm can use.

Returns:
    the LensKitPredictor wrapper of PopScore.
Create the UserUser predictor.
Args:
    name: the name of the algorithm.
    params: containing the following name-value pairs:
        max_neighbors(int): the maximum number of neighbors for scoring each item.
        min_neighbors(int): the minimum number of neighbors for scoring each item.
        min_similarity(float): minimum similarity threshold for considering a neighbor.

Keyword Args:
    num_threads(int): the max number of threads the algorithm can use.
    rating_type(str): the rating type on how feedback should be interpreted.

Returns:
    the LensKitPredictor wrapper of UserUser.
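All five creation functions share the signature (name, params, **kwargs), which is presumably what "factory creation compatible" refers to. The sketch below shows one way such functions could be stored and called by name. The registry, the stub creators and create_by_name are hypothetical and do not reflect the actual fairreckitlib factory.

```python
from typing import Any, Callable, Dict


def create_stub_pop_score(name: str, params: Dict[str, Any], **kwargs) -> Dict[str, Any]:
    """Hypothetical stand-in for create_pop_score."""
    return {'name': name, 'params': params, 'num_threads': kwargs['num_threads']}


def create_stub_user_user(name: str, params: Dict[str, Any], **kwargs) -> Dict[str, Any]:
    """Hypothetical stand-in for create_user_user."""
    return {
        'name': name,
        'params': params,
        'num_threads': kwargs['num_threads'],
        'rating_type': kwargs['rating_type'],
    }


# Hypothetical registry: a shared signature lets every creator be called the same way.
registry: Dict[str, Callable[..., Dict[str, Any]]] = {
    'PopScore': create_stub_pop_score,
    'UserUser': create_stub_user_user,
}


def create_by_name(algo_name: str, params: Dict[str, Any], **kwargs) -> Dict[str, Any]:
    # The same keyword arguments go to every creator; each reads what it needs.
    return registry[algo_name](algo_name, params, **kwargs)


pop = create_by_name('PopScore', {'score_method': 'quantile'},
                     num_threads=1, rating_type='explicit')
knn = create_by_name('UserUser', {'max_neighbors': 10, 'min_neighbors': 1,
                                  'min_similarity': 0.0},
                     num_threads=1, rating_type='explicit')
print(pop['name'], knn['name'])
```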