Dataset Resources for Data Mining

钱塘数据 (Qiantang Data)
Published 2018-03-01 14:26:05
Collected in the column: 钱塘大数据 (Qiantang Big Data)

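The UCI repository is a widely used collection of standard benchmark datasets. It can be downloaded from:

http://www.ics.uci.edu/~mlearn/MLRepository.html

A prepared set of UCI datasets (in ARFF format):

http://lamda.nju.edu.cn/yuy/files/download/UCI_arff.zip

As for source code, many open-source algorithm packages are available online, the best known being Weka and MLC++. Weka's algorithms are still being updated. It can be downloaded from: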

http://www.cs.waikato.ac.nz/ml/weka/

It includes many classic machine learning algorithms, and because the source code is published, it is easy to modify.

For downloading papers, access to an electronic library is the best option. Many universities subscribe to IEEE, Elsevier, Kluwer and others, and their journals are of good quality. Some very good journals are free, such as JAIR and JMLR:

http://www.cs.washington.edu/research/jair/home.html
http://www.jmlr.org/

If too few free journals are accessible to you, you can search CiteSeer (http://citeseer.ist.psu.edu/), which indexes many freely available papers (note that their quality varies widely), or search with Google.

Dataset resources for data mining:

People doing data mining research often struggle to find suitable data. KDNuggets has a Datasets section that lists a number of datasets:

http://www.kdnuggets.com/datasets/

Another very good resource is:

http://kdd.ics.uci.edu/

It contains the following datasets (grouped by application domain):

Direct Marketing: KDD CUP 1998 Data
GIS: Forest CoverType
Indexing: Corel Image Features; Pseudo Periodic Synthetic Time Series
Intrusion Detection: KDD CUP 1999 Data
Process Control: Synthetic Control Chart Time Series
Recommendation Systems: Entree Chicago Recommendation Data
Robots: Pioneer-1 Mobile Robot Data; Robot Execution Failures
Sign Language Recognition: Australian Sign Language Data; High-quality Australian Sign Language Data
Text Categorization: 20 Newsgroups Data; Reuters-21578 Text Categorization Collection; NSF Research Awards Abstracts 1990-2003
World Wide Web: Microsoft Anonymous Web Data; MSNBC Anonymous Web Data; Syskill Webert Web Data

The following is reposted from: http://blogger.org.cn/blog/more.asp?name=DMman&id=24043

1. Climate monitoring datasets:

http://cdiac.ornl.gov/ftp/ndp026b

2. Several useful sites for downloading test datasets:

http://www.cs.toronto.edu/~roweis/data.html
http://kdd.ics.uci.edu/summary.task.type.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
http://www.phys.uni.torun.pl/~duch/software.html

The Reuters dataset can be found at:

http://www.research.att.com/~lewis/reuters21578.html

A variety of datasets are listed at:

http://kdd.ics.uci.edu/summary.data.type.html

For text classification, another usable dataset is the rainbow dataset:

http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html

3. A large collection of test datasets. Anyone writing papers will need these, at the very least to evaluate how well an algorithm performs. Some links may no longer be reachable, but others should still work:

Machine learning datasets collected by UCI:
ftp://pami.sjtu.edu.cn/
http://www.ics.uci.edu/~mlearn//MLRepository.htm

statlib:
http://liama.ia.ac.cn/SCILAB/scilabindexgb.htm
http://lib.stat.cmu.edu/

Sample databases:
http://kdd.ics.uci.edu/
http://www.ics.uci.edu/~mlearn/MLRepository.html

Sites on data mining for investment funds:
http://www.gotofund.com/index.asp
http://lans.ece.utexas.edu/~strehl/

Reuters dataset:
http://www.research.att.com/~lewis/reuters21578.html

Various datasets:
http://kdd.ics.uci.edu/summary.data.type.html
http://www.mlnet.org/cgi-bin/mlnetois.pl/?File=datasets.html
http://lib.stat.cmu.edu/datasets/
http://dctc.sjtu.edu.cn/adaptive/datasets/
http://fimi.cs.helsinki.fi/data/
http://www.almaden.ibm.com/software/quest/Resources/index.shtml
http://miles.cnuce.cnr.it/~palmeri/datam/DCI/

Text classification and Web data:
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html
http://www.w3.org/TR/WD-logfile-960221.html
http://www.w3.org/Daemon/User/Config/Logging.html#AccessLog
http://www.w3.org/1998/11/05/WC-workshop/Papers/bala2.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
http://www.web-caching.com/traces-logs.html
http://www-2.cs.cmu.edu/webkb
http://www.cs.auc.dk/research/DP/tdb/TimeCenter/TimeCenterPublications/TR-75.pdf
http://www.cs.cornell.edu/projects/kddcup/index.html

Time series data:
http://www.stat.wisc.edu/~reinsel/bjr-data/

Test data for the Apriori algorithm:
http://www.almaden.ibm.com/cs/quest/syndata.html

Data generators:
http://www.cse.cuhk.edu.hk/~kdd/data_collection.html
http://www.almaden.ibm.com/cs/quest/syndata.html

Association:
http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar
http://www.almaden.ibm.com/software/quest/Resources/datasets/syndata.html#assocSynData

WEKA:
http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar
1. A jarfile containing 37 classification problems, originally obtained from the UCI repository: http://prdownloads.sourceforge.net/weka/datasets-UCI.jar
2. A jarfile containing 37 regression problems, obtained from various sources: http://prdownloads.sourceforge.net/weka/datasets-numeric.jar
3. A jarfile containing 30 regression datasets collected by Luis Torgo: http://prdownloads.sourceforge.net/weka/regression-datasets.jar

Cancer gene data:
http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi

Financial data:
http://lisp.vse.cz/pkdd99/Challenge/chall.htm

Provided by another contributor:
http://www.cs.toronto.edu/~roweis/data.html
http://kdd.ics.uci.edu/summary.task.type.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
http://www.phys.uni.torun.pl/~duch/software.html

The Reuters dataset can be found at:
http://www.research.att.com/~lewis/reuters21578.html

A variety of datasets are listed at:
http://kdd.ics.uci.edu/summary.data.type.html

For text classification, another usable dataset is the rainbow dataset:
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html

Download the Financial Data (~17.5M zipped file, ~67M unzipped data) and the Medical Data (~2M zipped file, ~6M unzipped data):
http://lisp.vse.cz/pkdd99/Challenge/chall.htm

KDnuggets dataset links:
http://www.kdnuggets.com/datasets/index.html

A detailed introduction to the KDnuggets dataset resources is also available at:
http://blogger.org.cn/blog/more.asp?name=idmer&id=24017

Data mining competitions and their datasets:

2005 University of California data mining contest, predicting bad accounts and their churn date using real-world CRM data, deadline June 30, 2005.
ILP 2005 Challenge, on the prediction of functional classes of genes.
KDD Cup 2005, on classifying internet user search queries, deadline July 8.
Data Mining Cup 2005 (Chemnitz, Germany), for students; topic: how data mining can ascertain the risk of loss of payments and reduce this risk.
KDD Cup 2004, focuses on data mining for several performance criteria using datasets from bioinformatics and quantum physics.
InfoVis 2004 Contest, The History of InfoVis.
DATA MINING CUP 2004 (Chemnitz, Germany), for students.
InfoVis 2003 Contest: Visualization and Pair Wise Comparison of Trees, results announced Sep 5, 2003.
KDD Cup 2003, focuses on problems motivated by network mining and the analysis of usage logs.
DATA MINING CUP 2003 (Chemnitz, Germany). The task is to identify spam emails before they reach the user's mailbox.
KDD Cup 2002, focus on data mining in molecular biology.
Student Data Mining Cup (2002), Chemnitz University and Prudential Systems.

Quoted from: http://blog.csdn.net/yaoyepeng/article/details/6282171
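
Several of the resources above (the prepared UCI archive and the Weka jarfiles) distribute data in Weka's ARFF format. For readers who want to inspect such a file outside Weka, the following is a minimal sketch of a reader in plain Python. It is an illustration only, not part of Weka or of any listed resource: the function name `parse_arff` and the sample data are made up here, and the sketch assumes dense rows, attribute names without spaces, and values without embedded commas or quotes.

```python
import io


def parse_arff(text):
    """Parse a small, dense ARFF document into (attribute_names, rows).

    Hypothetical helper for illustration: it ignores attribute types and
    does not handle sparse rows, quoted names, or quoted values.
    """
    attributes = []
    rows = []
    in_data = False
    for raw in io.StringIO(text):
        line = raw.strip()
        # Skip blank lines and '%' comment lines.
        if not line or line.startswith("%"):
            continue
        if not in_data:
            lower = line.lower()
            if lower.startswith("@attribute"):
                # "@attribute <name> <type>": keep only the name.
                attributes.append(line.split(None, 2)[1])
            elif lower.startswith("@data"):
                in_data = True
            continue
        # After @data, each line is one comma-separated instance.
        rows.append([field.strip() for field in line.split(",")])
    return attributes, rows


# Made-up sample in ARFF layout, used only to exercise the sketch.
SAMPLE = """\
% toy example
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}
@data
sunny,85,no
overcast,83,yes
"""

names, data = parse_arff(SAMPLE)
print(names)
print(data)
```

All values are returned as strings; converting numeric columns, or handling the full ARFF syntax, is better left to an established ARFF reader.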

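This article is shared from the WeChat official account 钱塘大数据 (Qiantang Big Data) as part of the Tencent Cloud self-media sharing program.
Originally published: 2016-08-14. In case of infringement, contact cloudcommunity@tencent.com for removal.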
