NOTE 1: Data Mining: Concepts and Techniques

Source: Penguin Account (企鹅号) - S石子

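The pen does not follow the hand, the hand does not follow the heart, and the heart does not follow its circumstances.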

This note covers four topics:

Translating (or rather, imagining) "pattern" as 规律 ("regularity")

An outline of pattern mining

OLTP vs. OLAP

The decision tree algorithm

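Translating (or rather, imagining) "pattern" as 规律 ("regularity")

In pattern mining, "pattern" is usually rendered in Chinese as 模式 ("mode" or "template").

To my ear that rendering sounds awkward and stiff.

A pattern is something that repeats in a predictable way.

From Wikipedia:

Data mining is the process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems.

In other words, data mining is the search for patterns in large data sets, and that search draws on methods from machine learning, statistics, and database systems.

For now, let us read "patterns" as 规律 (regularities): the regularities hidden in a large data set.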

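An outline of pattern mining

Pattern mining (规律挖掘, "regularity mining") has three main parts:

Kinds of patterns and rules

Mining methods

Applications

See the figure below for details: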

OLTP vs. OLAP

OLTP (online transaction processing) is oriented toward transactions and trades.

OLAP (online analytical processing) is oriented toward information.
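To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. The orders table, its columns, and the helper names place_order and revenue_by_region are assumptions made up for illustration; they do not come from the book or from this note. A real OLAP system would normally run on a separate warehouse rather than on the transactional database itself.

```python
import sqlite3

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT, region TEXT, amount REAL)"
)


def place_order(customer, region, amount):
    """OLTP-style work: a short transaction that writes a single row."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "INSERT INTO orders (customer, region, amount) VALUES (?, ?, ?)",
            (customer, region, amount),
        )
    return cur.lastrowid


def revenue_by_region():
    """OLAP-style work: a read-only aggregate that scans many rows."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    ).fetchall()
    return dict(rows)


place_order("alice", "north", 120.0)
place_order("bob", "south", 80.0)
place_order("carol", "north", 30.0)
summary = revenue_by_region()
print(summary)
```

The write path touches one record at a time and cares about atomicity, while the read path summarizes the whole table and cares about the answer, not about any single order.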

The main differences are shown in the figure below:

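The basic decision tree algorithm

The key question is which attribute node N splits on, that is, the splitting criterion, which is determined by the attribute_selection_method in the algorithm below.

Three attribute selection measures are commonly used:

Information gain

Gain ratio

Gini index

All three are, at bottom, ways of measuring information.

The measurement of information goes back to Shannon's "A Mathematical Theory of Communication".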
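As a rough illustration of the three measures, the sketch below computes each of them for one candidate attribute on a toy data set. The function names and the outlook/play data are made up for this example. Two simplifications are assumed: the attribute is categorical and split multiway on every distinct value, and the Gini figure is reported as a reduction in impurity (higher is better) so that all three scores point the same way.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of values."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def partition(values, labels):
    """Group the class labels by the attribute value of each tuple."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return list(groups.values())


def information_gain(values, labels):
    """Entropy before the split minus the weighted entropy after it."""
    n = len(labels)
    after = sum(len(g) / n * entropy(g) for g in partition(values, labels))
    return entropy(labels) - after


def gain_ratio(values, labels):
    """Information gain normalized by the entropy of the split itself."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0


def gini_reduction(values, labels):
    """Gini impurity before the split minus the weighted impurity after it."""
    n = len(labels)
    after = sum(len(g) / n * gini(g) for g in partition(values, labels))
    return gini(labels) - after


# Toy data, invented for this example.
outlook = ["sunny", "sunny", "overcast", "rain", "rain", "overcast"]
play = ["no", "no", "yes", "yes", "no", "yes"]

print(information_gain(outlook, play))
print(gain_ratio(outlook, play))
print(gini_reduction(outlook, play))
```

Gain ratio divides by the entropy of the attribute's own value distribution, which penalizes attributes that fragment the data into many small partitions.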

More on this in the next note.

The decision tree algorithm proceeds as shown in the figure below:
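The recursive procedure can also be sketched in code. What follows is a minimal sketch, not the textbook's exact pseudocode: it assumes categorical attributes, multiway splits, tuples stored as dictionaries, and information gain as the attribute_selection_method. The names build_tree, predict, and info_gain_selection, and the toy data, are assumptions made for this example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def info_gain_selection(rows, labels, attributes):
    """One possible attribute_selection_method: highest information gain."""
    def gain(attr):
        groups = {}
        for row, y in zip(rows, labels):
            groups.setdefault(row[attr], []).append(y)
        after = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
        return entropy(labels) - after
    return max(attributes, key=gain)


def build_tree(rows, labels, attributes,
               attribute_selection_method=info_gain_selection):
    """Return a class label (leaf) or a dict describing an internal node."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:   # every tuple has the same class: leaf
        return labels[0]
    if not attributes:          # nothing left to split on: majority-vote leaf
        return majority
    best = attribute_selection_method(rows, labels, attributes)
    remaining = [a for a in attributes if a != best]
    node = {"attribute": best, "branches": {}, "default": majority}
    # Only observed values are branched on, so no partition is ever empty;
    # "default" covers values first seen at prediction time.
    for value in sorted({row[best] for row in rows}):
        subset = [(r, y) for r, y in zip(rows, labels) if r[best] == value]
        node["branches"][value] = build_tree(
            [r for r, _ in subset],
            [y for _, y in subset],
            remaining,
            attribute_selection_method,
        )
    return node


def predict(tree, row):
    """Walk from the root to a leaf, following the row's attribute values."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attribute"]], tree["default"])
    return tree


# Toy data, invented for this example.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "overcast", "windy": "no"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
    {"outlook": "overcast", "windy": "yes"},
]
labels = ["no", "no", "yes", "yes", "no", "yes"]

tree = build_tree(rows, labels, ["outlook", "windy"])
print(tree["attribute"])
print(predict(tree, {"outlook": "rain", "windy": "yes"}))
```

Passing the selection function as a parameter mirrors the role of attribute_selection_method: gain ratio or the Gini index could be swapped in without touching the recursion.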

Cover image source:

https://yourshot.nationalgeographic.com/daily-dozen/

References:

Jiawei Han, Micheline Kamber, and Jian Pei. Data Mining: Concepts and Techniques, 3rd Edition.

https://www.vocabulary.com/dictionary/pattern

https://en.wikipedia.org/wiki/Data_mining#Pattern_mining

https://www.amazon.com/Data-Mining-Concepts-Techniques-Management/dp/0123814790

Published: 2018-09-15 21:14:03
Original link: https://kuaibao.qq.com/s/20180915G1L4CY00?refer=cp_1026