src.fairreckitlib.model.algorithms.lenskit.lenskit_predictor

This module contains the lenskit predictor and creation functions.

Classes:

LensKitPredictor: predictor implementation for lenskit.

Functions:

create_biased_mf: create BiasedMF predictor (factory creation compatible).
create_implicit_mf: create ImplicitMF predictor (factory creation compatible).
create_item_item: create ItemItem predictor (factory creation compatible).
create_pop_score: create PopScore predictor (factory creation compatible).
create_user_user: create UserUser predictor (factory creation compatible).
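
"Factory creation compatible" refers to the signature shared by all five creation functions: `(name, params, **kwargs)`. The following is a minimal sketch of how functions with that signature could be registered and invoked by name. The `registry` dictionary, `register`, `create` and `create_stub` are hypothetical names used for illustration only; the library's own factory may be organised differently.

```python
from typing import Any, Callable, Dict

# Hypothetical registry; the real factory in the library may look different.
CreateFn = Callable[..., Dict[str, Any]]
registry: Dict[str, CreateFn] = {}


def register(name: str, create_fn: CreateFn) -> None:
    """Store a creation function under an algorithm name."""
    registry[name] = create_fn


def create(name: str, params: Dict[str, Any], **kwargs) -> Dict[str, Any]:
    """Look up the creation function and call it with the shared signature."""
    return registry[name](name, params, **kwargs)


def create_stub(name: str, params: Dict[str, Any], **kwargs) -> Dict[str, Any]:
    """Stand-in for create_pop_score and friends; returns a plain dict."""
    return {'name': name, 'params': params, 'num_threads': kwargs['num_threads']}


register('PopScore', create_stub)
predictor = create('PopScore', {'score_method': 'quantile'}, num_threads=1)
```

Because every creation function accepts the same arguments, the caller does not need to know which algorithm it is constructing.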

This program has been developed by students from the bachelor Computer Science at Utrecht University within the Software Project course. © Copyright Utrecht University (Department of Information and Computing Sciences)

"""This module contains the lenskit predictor and creation functions.

Classes:

    LensKitPredictor: predictor implementation for lenskit.

Functions:

    create_biased_mf: create BiasedMF predictor (factory creation compatible).
    create_implicit_mf: create ImplicitMF predictor (factory creation compatible).
    create_item_item: create ItemItem predictor (factory creation compatible).
    create_pop_score: create PopScore predictor (factory creation compatible).
    create_user_user: create UserUser predictor (factory creation compatible).

This program has been developed by students from the bachelor Computer Science at
Utrecht University within the Software Project course.
© Copyright Utrecht University (Department of Information and Computing Sciences)
"""

from typing import Any, Dict

import lenskit
from lenskit import batch
import pandas as pd

from ..base_predictor import Predictor
from . import lenskit_algorithms


class LensKitPredictor(Predictor):
    """Predictor implementation for the LensKit framework."""

    def __init__(self, algo: lenskit.Predictor, name: str, params: Dict[str, Any], **kwargs):
        """Construct the lenskit predictor.

        Args:
            algo: the lenskit prediction algorithm.
            name: the name of the predictor.
            params: the parameters of the predictor.

        Keyword Args:
            num_threads(int): the max number of threads the predictor can use.
        """
        Predictor.__init__(self, name, params, kwargs['num_threads'])
        self.algo = algo

    def on_train(self, train_set: pd.DataFrame) -> None:
        """Fit the lenskit algorithm on the train set.

        The predictor should be trained with a dataframe matrix.

        Args:
            train_set: the set to train the predictor with.

        Raises:
            ArithmeticError: possibly raised by an algorithm on training.
            MemoryError: possibly raised by an algorithm on training.
            RuntimeError: possibly raised by an algorithm on training.
            TypeError: when the train set is not a pandas dataframe.
        """
        if not isinstance(train_set, pd.DataFrame):
            raise TypeError('Expected predictor to be trained with a dataframe matrix')

        self.algo.fit(train_set)

    def on_predict(self, user: int, item: int) -> float:
        """Compute a prediction for the specified user and item.

        Lenskit predictors can predict multiple items at the same time.
        To conform to the interface, only one item is requested and only
        the prediction for that item is returned.

        Args:
            user: the user ID.
            item: the item ID.

        Raises:
            ArithmeticError: possibly raised by a predictor on testing.
            MemoryError: possibly raised by a predictor on testing.
            RuntimeError: when the predictor is not trained yet.

        Returns:
            the predicted rating.
        """
        prediction = self.algo.predict_for_user(user, [item])
        return prediction[item]

    def on_predict_batch(self, user_item_pairs: pd.DataFrame) -> pd.DataFrame:
        """Compute the predictions for each of the specified user and item pairs.

        Lenskit predictors have a batch implementation available that allows for
        predicting ratings using multiple 'jobs'.

        Args:
            user_item_pairs: with at least two columns: 'user', 'item'.

        Raises:
            ArithmeticError: possibly raised by a predictor on testing.
            MemoryError: possibly raised by a predictor on testing.
            RuntimeError: when the predictor is not trained yet.

        Returns:
            DataFrame with the columns: 'user', 'item', 'prediction'.
        """
        n_jobs = self.num_threads if self.num_threads > 0 else None
        predictions = batch.predict(self.algo, user_item_pairs, n_jobs=n_jobs)
        return predictions[['user', 'item', 'prediction']]


def create_biased_mf(name: str, params: Dict[str, Any], **kwargs) -> LensKitPredictor:
    """Create the BiasedMF predictor.

    Args:
        name: the name of the algorithm.
        params: containing the following name-value pairs:
            features(int): the number of features to train.
            iterations(int): the number of iterations to train.
            user_reg(float): the regularization factor for users.
            item_reg(float): the regularization factor for items.
            damping(float): damping factor for the underlying bias.
            method(str): the solver to use ('cd' or 'lu').
            random_seed(int): the random seed or None for the current time as seed.

    Keyword Args:
        num_threads(int): the max number of threads the algorithm can use.

    Returns:
        the LensKitPredictor wrapper of BiasedMF.
    """
    algo = lenskit_algorithms.create_biased_mf(params)
    return LensKitPredictor(algo, name, params, **kwargs)


def create_implicit_mf(name: str, params: Dict[str, Any], **kwargs) -> LensKitPredictor:
    """Create the ImplicitMF predictor.

    Args:
        name: the name of the algorithm.
        params: containing the following name-value pairs:
            features(int): the number of features to train.
            iterations(int): the number of iterations to train.
            reg(float): the regularization factor.
            weight(float): the scaling weight for positive samples.
            use_ratings(bool): whether to use the rating column or treat
                every rated user-item pair as having a rating of 1.
            method(str): the training method ('cg' or 'lu').
            random_seed(int): the random seed or None for the current time as seed.

    Keyword Args:
        num_threads(int): the max number of threads the algorithm can use.

    Returns:
        the LensKitPredictor wrapper of ImplicitMF.
    """
    algo = lenskit_algorithms.create_implicit_mf(params)
    return LensKitPredictor(algo, name, params, **kwargs)


def create_item_item(name: str, params: Dict[str, Any], **kwargs) -> LensKitPredictor:
    """Create the ItemItem predictor.

    Args:
        name: the name of the algorithm.
        params: containing the following name-value pairs:
            max_neighbors(int): the maximum number of neighbors for scoring each item.
            min_neighbors(int): the minimum number of neighbors for scoring each item.
            min_similarity(float): minimum similarity threshold for considering a neighbor.

    Keyword Args:
        num_threads(int): the max number of threads the algorithm can use.
        rating_type(str): the rating type on how feedback should be interpreted.

    Returns:
        the LensKitPredictor wrapper of ItemItem.
    """
    algo = lenskit_algorithms.create_item_item(params, kwargs['rating_type'])
    return LensKitPredictor(algo, name, params, **kwargs)


def create_pop_score(name: str, params: Dict[str, Any], **kwargs) -> LensKitPredictor:
    """Create the PopScore predictor.

    Args:
        name: the name of the algorithm.
        params: containing the following name-value pairs:
            score_method(str): for computing popularity scores ('quantile', 'rank' or 'count').

    Keyword Args:
        num_threads(int): the max number of threads the algorithm can use.

    Returns:
        the LensKitPredictor wrapper of PopScore.
    """
    algo = lenskit_algorithms.create_pop_score(params)
    return LensKitPredictor(algo, name, params, **kwargs)


def create_user_user(name: str, params: Dict[str, Any], **kwargs) -> LensKitPredictor:
    """Create the UserUser predictor.

    Args:
        name: the name of the algorithm.
        params: containing the following name-value pairs:
            max_neighbors(int): the maximum number of neighbors for scoring each item.
            min_neighbors(int): the minimum number of neighbors for scoring each item.
            min_similarity(float): minimum similarity threshold for considering a neighbor.

    Keyword Args:
        num_threads(int): the max number of threads the algorithm can use.
        rating_type(str): the rating type on how feedback should be interpreted.

    Returns:
        the LensKitPredictor wrapper of UserUser.
    """
    algo = lenskit_algorithms.create_user_user(params, kwargs['rating_type'])
    return LensKitPredictor(algo, name, params, **kwargs)

Predictor implementation for the LensKit framework.

LensKitPredictor(algo: lenskit.algorithms.Predictor, name: str, params: Dict[str, Any], **kwargs)

Construct the lenskit predictor.

Args:
    algo: the lenskit prediction algorithm.
    name: the name of the predictor.
    params: the parameters of the predictor.

Keyword Args:
    num_threads(int): the max number of threads the predictor can use.

def on_train(self, train_set: pandas.core.frame.DataFrame) -> None:

Fit the lenskit algorithm on the train set.

The predictor should be trained with a dataframe matrix.

Args:
    train_set: the set to train the predictor with.

Raises:
    ArithmeticError: possibly raised by an algorithm on training.
    MemoryError: possibly raised by an algorithm on training.
    RuntimeError: possibly raised by an algorithm on training.
    TypeError: when the train set is not a pandas dataframe.

def on_predict(self, user: int, item: int) -> float:

Compute a prediction for the specified user and item.

Lenskit predictors can predict multiple items at the same time. To conform to the interface, only one item is requested and only the prediction for that item is returned.

Args:
    user: the user ID.
    item: the item ID.

Raises:
    ArithmeticError: possibly raised by a predictor on testing.
    MemoryError: possibly raised by a predictor on testing.
    RuntimeError: when the predictor is not trained yet.

Returns:
    the predicted rating.
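
The single-item pattern described above can be sketched without LensKit installed. `StubAlgo` and `predict_single` are hypothetical stand-ins: a real LensKit predictor returns a pandas Series indexed by item ID from `predict_for_user`, whereas this sketch uses a plain dict so that it stays dependency-free, and the scores it produces are made up.

```python
from typing import Dict, List


class StubAlgo:
    """Hypothetical stand-in for a fitted LensKit predictor."""

    def predict_for_user(self, user: int, items: List[int]) -> Dict[int, float]:
        # A real LensKit predictor returns a pandas Series indexed by item ID;
        # a dict keyed by item ID is used here to avoid the dependency.
        return {item: 3.0 + 0.5 * item for item in items}


def predict_single(algo: StubAlgo, user: int, item: int) -> float:
    """Request scores for a one-element item list and keep only that score."""
    prediction = algo.predict_for_user(user, [item])
    return prediction[item]


score = predict_single(StubAlgo(), user=7, item=2)
```

Wrapping the item in a list and indexing the result by the same item ID is what lets a multi-item API serve a single-pair interface.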

def on_predict_batch(self, user_item_pairs: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame:

Compute the predictions for each of the specified user and item pairs.

Lenskit predictors have a batch implementation available that allows for predicting ratings using multiple 'jobs'.

Args:
    user_item_pairs: with at least two columns: 'user', 'item'.

Raises:
    ArithmeticError: possibly raised by a predictor on testing.
    MemoryError: possibly raised by a predictor on testing.
    RuntimeError: when the predictor is not trained yet.

Returns:
    DataFrame with the columns: 'user', 'item', 'prediction'.
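
The method translates the predictor's thread limit into the `n_jobs` argument of `batch.predict`. A minimal sketch of that mapping in isolation, where `resolve_n_jobs` is a hypothetical helper name that does not exist in the module:

```python
from typing import Optional


def resolve_n_jobs(num_threads: int) -> Optional[int]:
    """Map a thread limit to the n_jobs value passed to batch.predict."""
    # Zero or negative limits become None, which is assumed here to leave
    # the choice of job count to LensKit's own default.
    return num_threads if num_threads > 0 else None


explicit_jobs = resolve_n_jobs(4)
default_jobs = resolve_n_jobs(0)
```

Only a positive thread count is forwarded as-is; any other value falls back to `None`.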

def create_biased_mf(name: str, params: Dict[str, Any], **kwargs) -> src.fairreckitlib.model.algorithms.lenskit.lenskit_predictor.LensKitPredictor:

Create the BiasedMF predictor.

Args:
    name: the name of the algorithm.
    params: containing the following name-value pairs:
        features(int): the number of features to train.
        iterations(int): the number of iterations to train.
        user_reg(float): the regularization factor for users.
        item_reg(float): the regularization factor for items.
        damping(float): damping factor for the underlying bias.
        method(str): the solver to use ('cd' or 'lu').
        random_seed(int): the random seed or None for the current time as seed.

Keyword Args:
    num_threads(int): the max number of threads the algorithm can use.

Returns:
    the LensKitPredictor wrapper of BiasedMF.

def create_implicit_mf(name: str, params: Dict[str, Any], **kwargs) -> src.fairreckitlib.model.algorithms.lenskit.lenskit_predictor.LensKitPredictor:

Create the ImplicitMF predictor.

Args:
    name: the name of the algorithm.
    params: containing the following name-value pairs:
        features(int): the number of features to train.
        iterations(int): the number of iterations to train.
        reg(float): the regularization factor.
        weight(float): the scaling weight for positive samples.
        use_ratings(bool): whether to use the rating column or treat every rated user-item pair as having a rating of 1.
        method(str): the training method ('cg' or 'lu').
        random_seed(int): the random seed or None for the current time as seed.

Keyword Args:
    num_threads(int): the max number of threads the algorithm can use.

Returns:
    the LensKitPredictor wrapper of ImplicitMF.

def create_item_item(name: str, params: Dict[str, Any], **kwargs) -> src.fairreckitlib.model.algorithms.lenskit.lenskit_predictor.LensKitPredictor:

Create the ItemItem predictor.

Args:
    name: the name of the algorithm.
    params: containing the following name-value pairs:
        max_neighbors(int): the maximum number of neighbors for scoring each item.
        min_neighbors(int): the minimum number of neighbors for scoring each item.
        min_similarity(float): minimum similarity threshold for considering a neighbor.

Keyword Args:
    num_threads(int): the max number of threads the algorithm can use.
    rating_type(str): the rating type on how feedback should be interpreted.

Returns:
    the LensKitPredictor wrapper of ItemItem.
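
In the source, `create_item_item` reads `kwargs['rating_type']` and the `LensKitPredictor` constructor reads `kwargs['num_threads']`, so a missing keyword surfaces as a `KeyError`. As a minimal sketch, a caller could check for the required keys first. The helper `missing_kwargs` and the example values are hypothetical and not part of the module.

```python
from typing import Any, Dict, Iterable, List


def missing_kwargs(required: Iterable[str], kwargs: Dict[str, Any]) -> List[str]:
    """Return the required keyword names that are absent from kwargs."""
    return [key for key in required if key not in kwargs]


# Hypothetical call-site arguments: 'rating_type' has been forgotten.
call_kwargs = {'num_threads': 2}
missing = missing_kwargs(['num_threads', 'rating_type'], call_kwargs)
```

A non-empty result identifies the keywords to supply before calling the creation function.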

def create_pop_score(name: str, params: Dict[str, Any], **kwargs) -> src.fairreckitlib.model.algorithms.lenskit.lenskit_predictor.LensKitPredictor:

Create the PopScore predictor.

Args:
    name: the name of the algorithm.
    params: containing the following name-value pairs:
        score_method(str): for computing popularity scores ('quantile', 'rank' or 'count').

Keyword Args:
    num_threads(int): the max number of threads the algorithm can use.

Returns:
    the LensKitPredictor wrapper of PopScore.

def create_user_user(name: str, params: Dict[str, Any], **kwargs) -> src.fairreckitlib.model.algorithms.lenskit.lenskit_predictor.LensKitPredictor:

Create the UserUser predictor.

Args:
    name: the name of the algorithm.
    params: containing the following name-value pairs:
        max_neighbors(int): the maximum number of neighbors for scoring each item.
        min_neighbors(int): the minimum number of neighbors for scoring each item.
        min_similarity(float): minimum similarity threshold for considering a neighbor.

Keyword Args:
    num_threads(int): the max number of threads the algorithm can use.
    rating_type(str): the rating type on how feedback should be interpreted.

Returns:
    the LensKitPredictor wrapper of UserUser.