# Apriori算法的Python实现

Apriori算法是数据挖掘中频发模式挖掘的鼻祖，从60年代就开始流行，其算法思想也十分简单朴素，首先挖掘出长度为1的频繁模式，然后k=2

```__author__ = 'linfuyuan'
file_name = raw_input('please input the transaction file:')
transactions = []

def has_infrequent_subset(candidate, Lk):
for i in range(len(candidate)):
subset = candidate[:-1]
subset.sort()
if not ''.join(subset) in Lk:
return False
lastitem = candidate.pop()
candidate.insert(0, lastitem)
return True

def countFrequency(candidate, transactions):
count = 0
for transaction in transactions:
if transaction.issuperset(candidate):
count += 1
return count

with open(file_name) as f:
line = line.strip()
tokens = line.split(',')
if len(tokens) > 0:
transaction = set(tokens)
transactions.append(transaction)
currentFrequencySet = {}
for transaction in transactions:
for item in transaction:
time = currentFrequencySet.get(item, 0)
currentFrequencySet[item] = time + 1
Lk = set()
for (itemset, count) in currentFrequencySet.items():
if count >= min_frequency:
print ', '.join(Lk)

while len(Lk) > 0:
newLk = set()
for itemset1 in Lk:
for itemset2 in Lk:
cancombine = True
for i in range(len(itemset1)):
if i < len(itemset1) - 1:
cancombine = itemset1[i] == itemset2[i]
if not cancombine:
break
else:
cancombine = itemset1[i] < itemset2[i]
if not cancombine:
break
if cancombine:
newitemset = []
for char in itemset1:
newitemset.append(char)
newitemset.append(itemset2[-1])
if has_infrequent_subset(newitemset, Lk) and countFrequency(newitemset, transactions) >= min_frequency:
print ', '.join(newLk)
Lk = newLk```

104 篇文章40 人订阅

0 条评论

## 相关文章

### Python3插入排序

Python3插入排序 前言 为什么要开始写Python3算法系列呢？ 一是很长很长时间没专门练习练习这种基本功 二是想把这个系列以基本代码的方式给写出来，提...

27360

33870

9540

25260

7610

45540

383100

21830

36560

### 求两个等长有序数组的中位数的logN算法 分治法

http://blog.csdn.net/yangliuy/article/details/7194199

62320