Implementing CPM (the Clique Percolation Method) on the Spark Framework

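A community detection implementation: CPM, built on Spark + Scala + Maven + Hadoop.

Reasons for choosing this stack:

(1) Spark's GraphX makes graph operations convenient.

(2) Spark's parallel processing makes it easier to scale to larger datasets later.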

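For details, see my GitHub repository.

The code and the datasets are both there; clone it if you need them.

P.S. If you have questions about my algorithm or results, or want to reuse the code, please leave a comment.

Below is my README, which describes the idea behind the algorithm. It is not finished yet; I am posting it now to get more views and will fill in the remaining parts over the next few days.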

💡 The latest and final version of this algorithm is at BK\src\main\scala\BBB\Final.scala ✂️ I'm sorry that I don't have enough time to tidy up my code. I'll update it gradually over the next few days after my first commit. 😝

Community-detection

Use [spark+scala] to implement CPM (Clique Percolation Method) 😃 As we know, the CPM algorithm has three main steps, but I have made a few small changes to suit the dataset and my project. My version of the process is shown below.

1. Find all cliques of size k (or larger) in the graph. The Bron-Kerbosch algorithm is an efficient choice for this step. P.S. Two versions of BK are most commonly used: (1) the basic form; (2) the form with pivoting. In my experience, the second one performs better 😏

2. Filter out noisy nodes so that the final evaluation (by modularity) is meaningful. To do this, I build a new subgraph from the initial graph. P.S. Noisy nodes are those with few connections to other nodes (for example, isolated nodes).

3. For each pair of maximal cliques, check whether they are 'related', that is, whether they share more than n nodes. If they are 'related', they can be merged.

😕 P.S. (1) The authors of the paper use a matrix to merge all the 'related' maximal cliques; I use a set to record which maximal cliques are 'related' to the current one. This change makes no essential difference. (2) n is just a parameter; you can tune it according to your results. ☺️

4. Use the result of step 3 to merge each group of 'related' maximal cliques into one community.

5. Use modularity to evaluate the outcome.
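Steps 2 to 5 above can be sketched with plain Scala collections. This is a minimal illustration under stated assumptions, not the code from the repository: the object name `CpmStepsSketch` and the functions `dropLowDegree`, `mergeRelated` and `modularity` are hypothetical, the `minDegree` threshold is an assumed way of defining "noisy" nodes, and the modularity function uses Newman's standard formula, which assumes the communities do not overlap.

```scala
object CpmStepsSketch {
  // Undirected graph as adjacency sets; every vertex is expected to appear as a key.
  type Graph = Map[Int, Set[Int]]

  /** Step 2 (assumed rule): keep vertices with at least minDegree neighbours,
    * then restrict each adjacency set to the kept vertices. Single pass only. */
  def dropLowDegree(g: Graph, minDegree: Int): Graph = {
    val keep: Set[Int] = g.collect { case (v, ns) if ns.size >= minDegree => v }.toSet
    keep.iterator.map(v => v -> (g(v) & keep)).toMap
  }

  /** Steps 3-4: cliques sharing more than n nodes are 'related'; every group of
    * cliques connected through 'related' links is unioned into one community. */
  def mergeRelated(cliques: Vector[Set[Int]], n: Int): Set[Set[Int]] = {
    // For each clique index, the set of indices of cliques 'related' to it.
    val related: Map[Int, Set[Int]] = cliques.indices.map { i =>
      i -> cliques.indices.filter(j => j != i && (cliques(i) & cliques(j)).size > n).toSet
    }.toMap

    // All clique indices reachable from the frontier through 'related' links.
    @annotation.tailrec
    def reach(frontier: Set[Int], seen: Set[Int]): Set[Int] =
      if (frontier.isEmpty) seen
      else {
        val next = frontier.flatMap(i => related(i)) -- seen
        reach(next, seen ++ next)
      }

    val groups: Set[Set[Int]] = cliques.indices.map(i => reach(Set(i), Set(i))).toSet
    groups.map(group => group.flatMap(i => cliques(i)))
  }

  /** Step 5: Q = sum over communities c of [ L_c / m - (d_c / 2m)^2 ], where m is the
    * number of edges, L_c the edges inside c and d_c the total degree of c.
    * Assumes non-overlapping communities. */
  def modularity(g: Graph, communities: Iterable[Set[Int]]): Double = {
    val m = g.values.map(_.size).sum / 2.0
    if (m == 0) 0.0
    else communities.iterator.map { c =>
      val inside = c.iterator.map(v => (g.getOrElse(v, Set.empty[Int]) & c).size).sum / 2.0
      val degree = c.iterator.map(v => g.getOrElse(v, Set.empty[Int]).size).sum
      inside / m - math.pow(degree / (2 * m), 2)
    }.sum
  }
}
```

The pairwise comparison in `mergeRelated` is quadratic in the number of cliques, which is fine for a small example but would need a distributed formulation on Spark for large graphs. If merged communities overlap, the modularity value from this sketch should be read with care, since the standard formula is defined for partitions.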

'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

👊 Here is a summary of the functions in this project:

(1) bronKerboschl: the basic form of the Bron-Kerbosch algorithm

(2) bronkerbosch2: the Bron-Kerbosch algorithm with pivoting
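To make the difference between the two forms concrete, here is a minimal sketch of both in plain Scala. It is not the repository's code: the names `BronKerboschSketch`, `fromEdges`, `basic`, `withPivot` and `maximalCliques` are hypothetical, the graph is an in-memory adjacency map rather than a GraphX graph, and the pivot rule (the vertex of P ∪ X with the most neighbours in P) is one common choice that I am assuming here.

```scala
object BronKerboschSketch {
  // Undirected graph as adjacency sets; assumes no self-loops.
  type Graph = Map[Int, Set[Int]]

  /** Build the adjacency map from an undirected edge list. */
  def fromEdges(edges: Seq[(Int, Int)]): Graph =
    edges.flatMap { case (a, b) => Seq(a -> b, b -> a) }
      .groupBy(_._1)
      .map { case (v, es) => v -> es.map(_._2).toSet }

  // Shared loop: try each candidate v, recurse on its neighbourhood,
  // then move v from P (still to try) to X (already handled).
  private def expand(g: Graph, r: Set[Int], p: Set[Int], x: Set[Int], candidates: Set[Int],
                     recurse: (Set[Int], Set[Int], Set[Int]) => List[Set[Int]]): List[Set[Int]] =
    candidates.foldLeft((p, x, List.empty[Set[Int]])) {
      case ((pCur, xCur, acc), v) =>
        val found = recurse(r + v, pCur & g(v), xCur & g(v))
        (pCur - v, xCur + v, acc ++ found)
    }._3

  /** Basic form: every vertex of P is tried as a candidate. */
  def basic(g: Graph, r: Set[Int], p: Set[Int], x: Set[Int]): List[Set[Int]] =
    if (p.isEmpty && x.isEmpty) List(r) // r cannot be extended, so it is maximal
    else expand(g, r, p, x, p, basic(g, _, _, _))

  /** With pivoting: neighbours of the pivot are skipped as candidates, because any
    * maximal clique contains either the pivot or one of its non-neighbours. */
  def withPivot(g: Graph, r: Set[Int], p: Set[Int], x: Set[Int]): List[Set[Int]] =
    if (p.isEmpty && x.isEmpty) List(r)
    else {
      val pivot = (p | x).maxBy(u => (g(u) & p).size)
      expand(g, r, p, x, p -- g(pivot), withPivot(g, _, _, _))
    }

  /** Maximal cliques with at least k vertices, using the pivoting form. */
  def maximalCliques(g: Graph, k: Int): List[Set[Int]] =
    withPivot(g, Set.empty, g.keySet, Set.empty).filter(_.size >= k)
}
```

Both forms return the same maximal cliques; pivoting only prunes recursive calls that could not produce a new maximal clique, which is the usual explanation for the better performance noted in step 1.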

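Original-content notice: this article is published on 云+社区 (Tencent Cloud+ Community) with the author's authorization and may not be reproduced without permission.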
