前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >SPARK框架下实现CPM(派系协同过滤算法)

SPARK框架下实现CPM(派系协同过滤算法)

原创
作者头像
Donuts_choco
修改2020-05-16 14:54:04
7490
修改2020-05-16 14:54:04
举报
文章被收录于专栏:Django系统Django系统

社群发现算法实现:CPM,基于SPARK+SCALA+MAVEN+Hadoop

选择此框架实现原因:

(1)SPARK的Graphx对于图操作较为便捷。

(2)SPARK下的并行处理,便于对以后大型数据集进行扩充。

详情见我的Github:传送门

代码和数据集都在里面,需要的自己clone。

ps:对我的算法或结果有什么疑问,或者搬代码。请评论留言。

以下是我的Readme陈述算法思路,还没写完,先发上来,增加浏览量,之后部分我近几天补充。

💡 The latest and final version of this algorithm, its route is BK\src\main\scala\BBB\Final.scala ✂️ I'm so sorry that I donn't have sufficient time to sort up my code. I'll gradually update it in the next few days after my first Commit.😝

Community-detection

use [spark+scala] to implement CPM(Cluster Percolation Method)😃 As we konw, the CPM algorithm has three steps on the whole, but I have made a little alteration for the dataset and my project. My algorithm process is shown below.

1.Find all kliques of size k(or larger than k) in the graph, As a result, Bron-kerbosch algorithm is an efficient choice for this step. ps: There are two versions of BK that are most commonly used: (1)the basic form; (2)with pivoting. For my experience, the second one has a better performance 😏

2.Fliter noisy nodes to make the final evaluation(use the modularity) valuable, so I create a new subgraph based on the initial graph. ps:the noisy nodes are refer to the ones who has few connections with others (such as: the isolated points)

3.Statistic whether these two maximal cliques are 'related'(if they contain the same nodes more than n), If they are 'related', they could be merged together.

😕ps: (1)the author of the paper use the matrix to finally merge all the 'related' maximal cliques together, I use the set to store which maximal cliques are 'related' with the current one. However, this alteratin doesn't make any difference essentially. (2)n is just a variable, you can adjust it with your result. ☺️

4.Use the ans of step 3 to merge the 'related' maximal cliques into one community.

5.Use the modularity to evaluate the outcome.

''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' 👊 Now I generalize my function in this project (1)bronKerboschl the basic form of bronkerbosch algorithm (2)bronkerbosch2 the bronkerbosch algorithm with pivoting

原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。

如有侵权,请联系 cloudcommunity@tencent.com 删除。

原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。

如有侵权,请联系 cloudcommunity@tencent.com 删除。

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • Community-detection
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档