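Recommendation algorithms fall into two broad families: collaborative filtering (Collaborative Filtering), which can be user-based or item-based, and content-based filtering (Content Based).

This post implements item-based collaborative filtering on movie rating data.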

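Inspecting the data file u.data

Only the first three columns are used: the user ID (user_id), the movie ID (item_id), and the score the user gave the movie (score).

This file is used to build the item similarity matrix.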

liuzhimin@ubuntu-2:~/workspace/jupyter_project/recommendation$ head  ./data/u.data
196	242	3	881250949
186	302	3	891717742
22	377	1	878887116
244	51	2	880606923
166	346	1	886397596
298	474	4	884182806
115	265	2	881171488
253	465	5	891628467
305	451	3	886324817
6	86	3	883603013
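To make the column layout concrete, here is a minimal sketch that parses a few of the rows shown above from an in-memory string rather than from the file. The names `SAMPLE` and `parse_ratings` are made up for this illustration and are not part of the original script.

```python
# Three rows copied from the `head` output above (tab-separated).
SAMPLE = "196\t242\t3\t881250949\n186\t302\t3\t891717742\n22\t377\t1\t878887116\n"


def parse_ratings(lines):
    """Build {user_id: {item_id: score}} from tab-separated rating lines."""
    user_movies = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Only the first three columns are needed; the fourth is ignored.
        user_id, item_id, score = line.split("\t")[:3]
        user_movies.setdefault(user_id, {})[item_id] = int(score)
    return user_movies


ratings = parse_ratings(SAMPLE.splitlines())
print(ratings)
```

The IDs are kept as strings, so later lookups must also use strings such as "196".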

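Inspecting the data file u.item

Only the first two columns are used: the first is the movie ID (item_id) and the second is the movie title.

This file is used to display the predicted recommendations by title.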

ubuntu@ubuntu-2:~/workspace/jupyter_project/recommendation$ head  ./data/u.item 
1|Toy Story (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)|0|0|0|1|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0
2|GoldenEye (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?GoldenEye%20(1995)|0|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0
3|Four Rooms (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Four%20Rooms%20(1995)|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0
4|Get Shorty (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Get%20Shorty%20(1995)|0|1|0|0|0|1|0|0|1|0|0|0|0|0|0|0|0|0|0
5|Copycat (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Copycat%20(1995)|0|0|0|0|0|0|1|0|1|0|0|0|0|0|0|0|1|0|0
6|Shanghai Triad (Yao a yao yao dao waipo qiao) (1995)|01-Jan-1995||http://us.imdb.com/Title?Yao+a+yao+yao+dao+waipo+qiao+(1995)|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0
7|Twelve Monkeys (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Twelve%20Monkeys%20(1995)|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|1|0|0|0
8|Babe (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Babe%20(1995)|0|0|0|0|1|1|0|0|1|0|0|0|0|0|0|0|0|0|0
9|Dead Man Walking (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Dead%20Man%20Walking%20(1995)|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0
10|Richard III (1995)|22-Jan-1996||http://us.imdb.com/M/title-exact?Richard%20III%20(1995)|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|1|0
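As a small illustration of the ID-to-title lookup, the sketch below splits two of the rows shown above on `|` and keeps only the first two fields. `SAMPLE_ITEMS` and `parse_titles` are hypothetical names used just for this example.

```python
# Two rows copied from the `head` output above (pipe-separated).
SAMPLE_ITEMS = [
    "1|Toy Story (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)|0|0|0|1|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0",
    "2|GoldenEye (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?GoldenEye%20(1995)|0|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0",
]


def parse_titles(lines):
    """Build {item_id: title} from pipe-separated movie lines."""
    movies = {}
    for line in lines:
        # Split at most twice: the remaining columns are not needed here.
        item_id, title = line.split("|", 2)[:2]
        movies[item_id] = title
    return movies


titles = parse_titles(SAMPLE_ITEMS)
print(titles["1"])
```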

 

The code is as follows:

# -*- coding: utf-8 -*-
import math


# Read the data and build user_movies[user_id][movie_id] = score
def read_data(udata, uitem):
    user_movies = {}
    movies = {}
    with open(udata) as f:
        for line in f:
            user_id, movie_id, score = line.split("\t")[0:3]
            user_movies.setdefault(user_id, {})
            user_movies[user_id][movie_id] = int(score)
    # u.item is read as ISO-8859-1
    with open(uitem, encoding="ISO-8859-1") as f:
        for line in f:
            movie_id, movie_name = line.split("|")[:2]
            movies[movie_id] = movie_name
    return user_movies, movies

# Build the item similarity matrix from user_movies[user_id][movie_id] = score
def item_similarity(user_movies):
    C = {}  # co-occurrence counts: C[i][j] = number of users who rated both i and j
    N = {}  # N[i] = number of users who rated movie i
    for user, items in user_movies.items():
        for i in items.keys():
            N.setdefault(i, 0)
            N[i] += 1
            C.setdefault(i, {})
            for j in items.keys():
                if i == j:
                    continue
                C[i].setdefault(j, 0)
                C[i][j] += 1
    W = {}  # final item-item cosine similarity matrix
    for i, related_items in C.items():
        W.setdefault(i, {})
        for j, cij in related_items.items():
            W[i][j] = cij / math.sqrt(N[i] * N[j])
    return W

# Compute the recommendations.
# K: number of most similar movies considered for each movie the user has rated
# N: number of recommendations to return
def Recommend(user, user_movies, W, K, N):
    rank = {}  # recommendation scores
    action_item = user_movies[user]  # movies the user has rated, with their scores
    for item, score in action_item.items():
        # j is a movie similar to item and wj is its similarity; take the K most similar
        for j, wj in sorted(W[item].items(), key=lambda x: x[1], reverse=True)[0:K]:
            if j in action_item:  # skip movies the user has already rated
                continue
            rank.setdefault(j, 0)
            # contribution = user's rating of item * similarity between item and j
            rank[j] += float(score * wj)
    return dict(sorted(rank.items(), key=lambda x: x[1], reverse=True)[0:N])


if __name__ == "__main__":
    # Load the data
    user_movies, movies = read_data("./data/u.data", "./data/u.item")
    # Compute the movie similarities
    W = item_similarity(user_movies)
    # Compute the recommendations for user "1"
    result = Recommend("1", user_movies, W, 5, 5)
    for i, rating in result.items():
        print(movies[i], rating)
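To see what the similarity and scoring steps compute, it can help to run them on a dataset small enough to check by hand. The sketch below is a compact re-statement of the same idea, not the original script: the similarity of movies i and j is the number of users who rated both, divided by sqrt(N[i] * N[j]), and a candidate's score is the sum of (user's rating * similarity) over the movies the user has rated. The users A, B, C, the movies x, y, z, and the lowercase function names are invented for this example.

```python
import math

# Toy ratings: user -> {movie: score}. Entirely made-up data.
toy = {
    "A": {"x": 5, "y": 3},
    "B": {"x": 4, "y": 2, "z": 1},
    "C": {"y": 5, "z": 4},
}


def item_similarity(user_movies):
    """Co-occurrence count normalised by sqrt(N[i] * N[j])."""
    co = {}  # co[i][j] = users who rated both i and j
    n = {}   # n[i] = users who rated i
    for items in user_movies.values():
        for i in items:
            n[i] = n.get(i, 0) + 1
            co.setdefault(i, {})
            for j in items:
                if i != j:
                    co[i][j] = co[i].get(j, 0) + 1
    return {
        i: {j: c / math.sqrt(n[i] * n[j]) for j, c in related.items()}
        for i, related in co.items()
    }


def recommend(user, user_movies, w, k, n):
    """Score unseen movies via the k nearest neighbours of each rated movie."""
    seen = user_movies[user]
    rank = {}
    for item, score in seen.items():
        neighbours = sorted(w[item].items(), key=lambda x: x[1], reverse=True)[:k]
        for j, wj in neighbours:
            if j not in seen:
                rank[j] = rank.get(j, 0.0) + score * wj
    return dict(sorted(rank.items(), key=lambda x: x[1], reverse=True)[:n])


w = item_similarity(toy)
result = recommend("A", toy, w, k=2, n=5)
print(w["x"])
print(result)
```

Here x and z are rated together only by B, and each has two raters, so their similarity is 1 / sqrt(2 * 2) = 0.5. User A has not rated z, so z is the only candidate, and its score is 5 * 0.5 + 3 * 2 / sqrt(6).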

 

 

  
