SPARK框架下实现CPM（派系协同过滤算法）

原创

Donuts_choco

修改于 2020-05-16 14:54:04

7540

修改于 2020-05-16 14:54:04

文章被收录于专栏：Django系统

社群发现算法实现：CPM，基于SPARK+SCALA+MAVEN+Hadoop

选择此框架实现原因：

（1）SPARK的Graphx对于图操作较为便捷。

（2）SPARK下的并行处理，便于对以后大型数据集进行扩充。

详情见我的Github：传送门

代码和数据集都在里面，需要的自己clone。

ps:对我的算法或结果有什么疑问，或者搬代码。请评论留言。

以下是我的Readme陈述算法思路，还没写完，先发上来，增加浏览量，之后部分我近几天补充。

💡 The latest and final version of this algorithm, its route is BK\src\main\scala\BBB\Final.scala ✂️ I'm so sorry that I donn't have sufficient time to sort up my code. I'll gradually update it in the next few days after my first Commit.😝

Community-detection

use [spark+scala] to implement CPM(Cluster Percolation Method)😃 As we konw, the CPM algorithm has three steps on the whole, but I have made a little alteration for the dataset and my project. My algorithm process is shown below.

1.Find all kliques of size k(or larger than k) in the graph, As a result, Bron-kerbosch algorithm is an efficient choice for this step. ps: There are two versions of BK that are most commonly used: (1)the basic form; (2)with pivoting. For my experience, the second one has a better performance 😏

2.Fliter noisy nodes to make the final evaluation(use the modularity) valuable, so I create a new subgraph based on the initial graph. ps:the noisy nodes are refer to the ones who has few connections with others (such as: the isolated points)

3.Statistic whether these two maximal cliques are 'related'(if they contain the same nodes more than n), If they are 'related', they could be merged together.

😕ps: (1)the author of the paper use the matrix to finally merge all the 'related' maximal cliques together, I use the set to store which maximal cliques are 'related' with the current one. However, this alteratin doesn't make any difference essentially. (2)n is just a variable, you can adjust it with your result. ☺️

4.Use the ans of step 3 to merge the 'related' maximal cliques into one community.

5.Use the modularity to evaluate the outcome.

''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' 👊 Now I generalize my function in this project (1)bronKerboschl the basic form of bronkerbosch algorithm (2)bronkerbosch2 the bronkerbosch algorithm with pivoting

原创声明：本文系作者授权腾讯云开发者社区发表，未经许可，不得转载。

如有侵权，请联系 cloudcommunity@tencent.com 删除。

黑石物理服务器