amazon recommendation system

 
(original article) 
http://spectrum.ieee.org/computing/software/deconstructing-recommender-systems 
 
Recommender systems matter for revenue. some say they increase it by 10-30%.
In a way, it is a behavior prediction system. 
 
Amazon, Netflix, Pandora, and YouTube all use one.
 
e.g.
- online retail (Amazon): "people who viewed/bought this also viewed/bought these."
- universities: steer students to courses.
- cell phone companies: predict which users are likely to switch to other carriers.
- conference organizers: assign papers to appropriate reviewers.
- Facebook: friend suggestions
 

#  basic idea of the recommender system 

 
row = user 
column = attributes about the user, e.g. pages clicked, purchase records, ratings of particular movies/music/books, which images you enlarged and how many times, which items you put on wish lists, etc.
 
===> helps the recommender system to determine what to suggest/email you. 
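
A minimal sketch of that table, assuming the only attribute is an explicit rating. The event log, user names, and item names are made up for illustration; a real system would have many more attribute columns (clicks, wish lists, etc.).

```python
from collections import defaultdict

# Hypothetical event log: (user, item, rating). All values are made up.
events = [
    ("alice", "book_a", 5),
    ("alice", "movie_b", 3),
    ("bob", "book_a", 4),
    ("bob", "album_c", 2),
]

def build_matrix(events):
    """Return (users, items, rows); rows[u][i] is a rating, or None if unrated."""
    users = sorted({user for user, _, _ in events})
    items = sorted({item for _, item, _ in events})
    lookup = defaultdict(dict)
    for user, item, rating in events:
        lookup[user][item] = rating
    # One row per user, one column per item.
    rows = [[lookup[user].get(item) for item in items] for user in users]
    return users, items, rows

users, items, rows = build_matrix(events)
```

Most cells end up as None, which is the sparsity that the algorithms below have to deal with.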
 
 
(1) user-user algo 
 
find users whose attributes are similar, then recommend to a user the products that similar users bought but that user has not. 
 
====> not enough data: few users have enough items in common to measure a meaningful distance between them. 
====> also, everyone gives a blockbuster movie a high rating, so that rating says little about taste. 
====> the attributes have to be updated and the distances recalculated in real time. 
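
A rough sketch of the user-user idea, assuming cosine similarity over co-rated items as the distance measure (the article does not say which measure is used in practice). The ratings and the names `cosine` and `recommend_user_user` are hypothetical.

```python
import math

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}.
ratings = {
    "alice": {"A": 5, "B": 4, "C": 1},
    "bob":   {"A": 5, "B": 5, "D": 4},
    "carol": {"A": 1, "C": 5, "E": 5},
}

def cosine(u, v):
    """Cosine similarity over the items both users rated (0.0 if none)."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def recommend_user_user(target, ratings, k=1, min_rating=4):
    """Items the k most similar users rated highly and the target has not rated."""
    scored = [
        (cosine(ratings[target], rated), name)
        for name, rated in ratings.items()
        if name != target
    ]
    neighbours = sorted(scored, reverse=True)[:k]
    seen = set(ratings[target])
    suggestions = []
    for _, name in neighbours:
        for item, score in ratings[name].items():
            if item not in seen and score >= min_rating and item not in suggestions:
                suggestions.append(item)
    return suggestions
```

With this toy data, `recommend_user_user("alice", ratings)` returns `["D"]`, because bob is alice's closest neighbour. Note that the similarity rests on only two co-rated items, which is exactly the "not enough data" problem.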
 
 
(2) item-item algo 
 
distance between particular books, music, or movies, based on ratings by millions of users. 
 
e.g. people who rate movie A highly also rate movie B highly. 
 
===> data is abundant 
===> distance can be pre-computed 
(both Amazon & Netflix use this algo, though the details are not disclosed.) 
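
A sketch of the item-item variant, again assuming cosine similarity; this is not how Amazon or Netflix actually do it, since their details are not public. The point is the split between an offline step (`precompute_similarities`) and a cheap online lookup (`most_similar`). All names and ratings are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}.
ratings = {
    "u1": {"A": 5, "B": 5, "C": 1},
    "u2": {"A": 4, "B": 5, "C": 2},
    "u3": {"A": 1, "B": 2, "C": 5},
}

def item_vectors(ratings):
    """Invert user -> item ratings into item -> {user: rating}."""
    vectors = {}
    for user, rated in ratings.items():
        for item, score in rated.items():
            vectors.setdefault(item, {})[user] = score
    return vectors

def precompute_similarities(ratings):
    """Offline step: cosine similarity for every item pair with common raters."""
    vectors = item_vectors(ratings)
    sims = {}
    for a, b in combinations(sorted(vectors), 2):
        common = set(vectors[a]) & set(vectors[b])
        if not common:
            continue
        dot = sum(vectors[a][u] * vectors[b][u] for u in common)
        norm_a = math.sqrt(sum(vectors[a][u] ** 2 for u in common))
        norm_b = math.sqrt(sum(vectors[b][u] ** 2 for u in common))
        sims[(a, b)] = sims[(b, a)] = dot / (norm_a * norm_b)
    return sims

def most_similar(item, sims):
    """Online step: look up the closest item in the pre-computed table."""
    candidates = [(score, other) for (first, other), score in sims.items() if first == item]
    return max(candidates)[1] if candidates else None

sims = precompute_similarities(ratings)
```

Here `most_similar("A", sims)` gives `"B"`: the users who rated A highly also rated B highly.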
 
 
=====> both user-user & item-item algos share a problem: user preferences (ratings) change over time. 
=====> that's why you sometimes get asked to rate stuff again. 
 
 

# dimensionality reduction 

 
instead of one column per movie/book for each user's ratings, group the columns into genres, e.g. action, horror, history, etc. 
 
==> better abstraction, but computationally intensive. 
==> there is a mathematical approach to this categorization (dimensionality reduction) 
 
(quote) 
"technique called singular value decomposition to compute the dimensions. The technique involves factoring the original giant matrix into two taste matrices, one that includes all the users and the 100 taste dimensions and another that includes all the foods and the 100 taste dimensions plus a third matrix that, when multiplied by either of the other two, re-creates the original matrix." 
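
A small sketch of the factorization with NumPy, assuming a tiny, fully filled-in ratings matrix (real rating matrices are mostly empty and need extra handling that is not shown here). In the standard SVD the original matrix is the product of all three factors; keeping only the k largest singular values gives the "taste dimensions". The matrix values and the name `truncated_svd` are made up.

```python
import numpy as np

# Hypothetical ratings matrix: rows are users, columns are items.
R = np.array([
    [5.0, 4.0, 1.0, 1.0],
    [4.0, 5.0, 1.0, 2.0],
    [1.0, 1.0, 5.0, 4.0],
    [2.0, 1.0, 4.0, 5.0],
])

def truncated_svd(matrix, k):
    """Keep only the k largest singular values and their vectors."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

U_k, s_k, Vt_k = truncated_svd(R, k=2)
user_factors = U_k                      # one row per user, k taste dimensions
item_factors = Vt_k.T                   # one row per item, k taste dimensions
R_approx = U_k @ np.diag(s_k) @ Vt_k    # rank-k approximation of R
```

With k equal to the full rank, the product reproduces `R` up to rounding error; smaller k trades accuracy for a more compact description of each user and item.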
 
 
 

#  goals 

 
(1) figuring out what users may buy 
(2) increasing profits while meeting business requirements/rules 
 
- browser cookies let Amazon link an anonymous shopper's activity to the account he later creates. 
- cannot increase prices for users who would likely buy products anyway (Amazon got hit for this in 2000 and has stopped since.) 
- certain exclusion rules are needed. cannot just recommend what everyone likes: anyone who searches for gangster movies has likely already seen The Godfather. 
- recommend products that are abundant in stock or high-margin.  ---- is it a good strategy? (if you do this too much, you lose customer trust) 
- something you bought as a "gift for others" shouldn't count toward your own preferences. 
- feedback is extremely useful. let users click "not interested" or "I own a copy already". 
 
- evaluating a recommender algo is hard: even if users buy recommended items, they might have bought them anyway, in which case they didn't need the recommendation. 
--- one more objective way is to measure the predicted rating against the user's actual rating. 
----- accuracy on highly rated items matters most (users might buy them), while low-rated items don't matter much. 
 
- serendipity/diversity (still an open topic in recommender systems) 
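
The evaluation idea above (predicted rating vs. actual rating) can be sketched with root-mean-square error. RMSE is an assumption here: it is one common choice, and the notes do not name a specific metric. The threshold of 4.0 for "highly rated" and the sample pairs are also made up.

```python
import math

def rmse(pairs):
    """Root-mean-square error over (predicted, actual) rating pairs."""
    if not pairs:
        raise ValueError("need at least one (predicted, actual) pair")
    return math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))

def rmse_high_rated(pairs, threshold=4.0):
    """RMSE restricted to items the user actually rated at or above threshold."""
    return rmse([(p, a) for p, a in pairs if a >= threshold])

# Hypothetical (predicted, actual) ratings on a 1-5 scale.
pairs = [(4.5, 5.0), (3.0, 4.0), (2.0, 1.0), (1.5, 3.0)]

overall_error = rmse(pairs)
high_rated_error = rmse_high_rated(pairs)
```

Comparing `overall_error` with `high_rated_error` shows whether the model is more accurate on the items that matter most, i.e. the ones a user is likely to buy.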
 

  1. 2014-10-28 20:03:13 |
  2. Category : misc