Top 20 Python Machine Learning Open-Source Projects

We analyze the top 20 Python machine learning projects on GitHub and find that scikit-learn, Pylearn2 and NuPIC are the most actively contributed projects, with the largest numbers of contributors. Explore these popular projects on GitHub!

Fig. 1: Python machine learning projects on GitHub, colored by the ratio of commits to contributors. Bob, IEPY, Nilearn and NuPIC have the highest ratios.
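As a hedged check on the caption, the short script below recomputes commits per contributor from the counts given in the list of projects (a snapshot from the time of writing, not live GitHub data). The dictionary name `PROJECTS` and the helper `commits_per_contributor` are illustrative. With the counts as listed, Bob, IEPY and Nilearn come out on top, while MILK (about 76.3) narrowly edges out NuPIC (about 73.2) for fourth place.

```python
# Commit and contributor counts as listed in this article (snapshot, not live GitHub data).
PROJECTS = {
    "scikit-learn": (18845, 404),
    "Pylearn2": (7027, 117),
    "NuPIC": (4392, 60),
    "Nilearn": (2742, 28),
    "PyBrain": (969, 27),
    "Pattern": (943, 20),
    "Fuel": (497, 12),
    "Bob": (5080, 11),
    "skdata": (441, 10),
    "MILK": (687, 9),
    "IEPY": (1758, 9),
    "Quepy": (131, 9),
    "Hebel": (244, 5),
    "mlxtend": (135, 5),
    "nolearn": (192, 4),
    "Ramp": (179, 4),
    "Feature Forge": (219, 3),
    "REP": (50, 3),
    "Python Machine Learning Samples": (15, 3),
    "Python-ELM": (17, 1),
}


def commits_per_contributor(projects):
    """Return (name, ratio) pairs sorted from highest to lowest commits per contributor."""
    ratios = [(name, commits / contributors) for name, (commits, contributors) in projects.items()]
    return sorted(ratios, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Print the five projects with the most commits per contributor.
    for name, ratio in commits_per_contributor(PROJECTS)[:5]:
        print(f"{name:15s} {ratio:7.1f}")
```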

  1. scikit-learn, 18845 commits, 404 contributors, www.github.com/scikit-learn/scikit-learn
     scikit-learn is a Python module for machine learning built on top of SciPy. It features various classification, regression and clustering algorithms, including support vector machines, logistic regression, naive Bayes, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
  2. Pylearn2, 7027 commits, 117 contributors, www.github.com/lisa-lab/pylearn2
     Pylearn2 is a library designed to make machine learning research easy. It is built on Theano.
  3. NuPIC, 4392 commits, 60 contributors, www.github.com/numenta/nupic
     The Numenta Platform for Intelligent Computing (NuPIC) is a machine intelligence platform that implements the Hierarchical Temporal Memory (HTM) learning algorithms. HTM is a detailed computational theory of the neocortex. At the core of HTM are time-based continuous learning algorithms that store and recall spatial and temporal patterns. NuPIC is suited to a variety of problems, particularly anomaly detection and prediction of streaming data sources.
  4. Nilearn, 2742 commits, 28 contributors, www.github.com/nilearn/nilearn
     Nilearn is a Python module for fast and easy statistical learning on neuroimaging data. It leverages the scikit-learn Python toolbox for multivariate statistics, with applications such as predictive modeling, classification, decoding and connectivity analysis.
  5. PyBrain, 969 commits, 27 contributors, www.github.com/pybrain/pybrain
     PyBrain is short for Python-Based Reinforcement Learning, Artificial Intelligence and Neural Network Library. Its goal is to offer flexible, easy-to-use yet still powerful algorithms for machine learning tasks, and a variety of predefined environments to test and compare your algorithms.
  6. Pattern, 943 commits, 20 contributors, www.github.com/clips/pattern
     Pattern is a web mining module for Python. It has tools for data mining, natural language processing, network analysis and machine learning. It supports the vector space model, clustering, and classification using k-NN, SVM and Perceptron.
  7. Fuel, 497 commits, 12 contributors, www.github.com/mila-udem/fuel
     Fuel provides your machine learning models with the data they need to learn. It has interfaces to common datasets such as MNIST and CIFAR-10 (image datasets) and Google's One Billion Words (text). It lets you iterate over your data in a variety of ways, such as in minibatches of shuffled or sequential examples.
  8. Bob, 5080 commits, 11 contributors, www.github.com/idiap/bob
     Bob is a free signal-processing and machine learning toolbox. The toolbox is written in a mix of Python and C++ and is designed both to be efficient and to reduce development time. It is composed of a reasonably large number of packages that implement tools for image, audio and video processing, machine learning and pattern recognition.
  9. skdata, 441 commits, 10 contributors, www.github.com/jaberg/skdata
     Skdata is a library of data sets for machine learning and statistics. This module provides standardized Python access to toy problems as well as popular computer vision and natural language processing data sets.
  10. MILK, 687 commits, 9 contributors, www.github.com/luispedro/milk
      Milk is a machine learning toolkit in Python. Its focus is on supervised classification, with several classifiers available: SVMs, k-NN, random forests and decision trees. It also performs feature selection. These classifiers can be combined in many ways to form different classification systems. For unsupervised learning, milk supports k-means clustering and affinity propagation.
  11. IEPY, 1758 commits, 9 contributors, www.github.com/machinalis/iepy
      IEPY is an open-source tool for Information Extraction focused on Relation Extraction. It is aimed at users who need to perform Information Extraction on a large dataset and at scientists who want to experiment with new IE algorithms.
  12. Quepy, 131 commits, 9 contributors, www.github.com/machinalis/quepy
      Quepy is a Python framework for transforming natural language questions into queries in a database query language. It can be easily customized to different kinds of natural language questions and database queries, so with little coding you can build your own system for natural language access to your database. Currently Quepy supports the SPARQL and MQL query languages, with plans to extend it to other database query languages.
  13. Hebel, 244 commits, 5 contributors, www.github.com/hannes-brt/hebel
      Hebel is a library for deep learning with neural networks in Python using GPU acceleration with CUDA through PyCUDA. It implements the most important types of neural network models and offers a variety of different activation functions and training methods such as momentum, Nesterov momentum, dropout and early stopping.
  14. mlxtend, 135 commits, 5 contributors, www.github.com/rasbt/mlxtend
      mlxtend is a library of useful tools and extensions for day-to-day data science tasks.
  15. nolearn, 192 commits, 4 contributors, www.github.com/dnouri/nolearn
      This package contains a number of utility modules that are helpful with machine learning tasks. Most of the modules work together with scikit-learn; others are more generally useful.
  16. Ramp, 179 commits, 4 contributors, www.github.com/kvh/ramp
      Ramp is a Python library for rapid prototyping of machine learning solutions. It is a lightweight, pandas-based machine learning framework that plugs into existing Python machine learning and statistics tools (scikit-learn, rpy2, etc.). Ramp provides a simple, declarative syntax for exploring features, algorithms and transformations quickly and efficiently.
  17. Feature Forge, 219 commits, 3 contributors, www.github.com/machinalis/featureforge
      A set of tools for creating and testing machine learning features, with a scikit-learn-compatible API. The library can be useful in many machine learning applications (classification, clustering, regression, etc.) and is particularly helpful if you use scikit-learn, although it can also work with a different algorithm.
  18. REP, 50 commits, 3 contributors, www.github.com/yandex/rep
      REP is an environment for conducting data-driven research in a consistent and reproducible way. It has a unified classifier wrapper for a variety of implementations such as TMVA, Sklearn, XGBoost and uBoost. It can train classifiers in parallel on a cluster and supports interactive plots.
  19. Python Machine Learning Samples, 15 commits, 3 contributors, www.github.com/awslabs/machine-learning-samples
      A collection of sample applications built using Amazon Machine Learning.
  20. Python-ELM, 17 commits, 1 contributor, www.github.com/dclambert/Python-ELM
      This is an implementation of the Extreme Learning Machine in Python, based on scikit-learn.
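Several entries above mention k-nearest-neighbour (k-NN) classification (Pattern's classifiers and MILK's k-NN). As an illustration of the technique only, not of either library's API, here is a minimal pure-Python sketch; the function name `knn_predict`, the use of Euclidean distance and simple majority voting are assumptions chosen for clarity.

```python
import math
from collections import Counter
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def knn_predict(train: Sequence[Tuple[Point, str]], query: Point, k: int = 3) -> str:
    """Label `query` by majority vote among its k nearest training points (Euclidean distance)."""
    if not 0 < k <= len(train):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort all training points by distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # most_common keeps first-encountered order on ties, so the closer neighbour's label wins.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    train = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
             ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b")]
    print(knn_predict(train, (0.5, 0.5)))  # → a
    print(knn_predict(train, (5.5, 5.5)))  # → b
```

Sorting every training point costs O(n log n) per query, which is fine for a toy example; libraries typically use spatial indexes such as k-d trees when the data grows.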
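Fuel's entry mentions iterating over data in minibatches of shuffled or sequential examples. The sketch below shows what those two schemes amount to in plain Python; it does not use Fuel's own API, and the generator name `iterate_minibatches` and its `seed` parameter are hypothetical.

```python
import random
from typing import Iterator, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def iterate_minibatches(
    examples: Sequence[T],
    batch_size: int,
    shuffle: bool = False,
    seed: Optional[int] = None,
) -> Iterator[List[T]]:
    """Yield consecutive minibatches; the last batch may be smaller than batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    order = list(range(len(examples)))
    if shuffle:
        # Shuffled scheme: visit the examples in a (seeded) random permutation.
        random.Random(seed).shuffle(order)
    # Sequential scheme (shuffle=False): indices stay in their original order.
    for start in range(0, len(order), batch_size):
        yield [examples[i] for i in order[start:start + batch_size]]


if __name__ == "__main__":
    print(list(iterate_minibatches(list(range(7)), 3)))  # → [[0, 1, 2], [3, 4, 5], [6]]
    print(list(iterate_minibatches(list(range(7)), 3, shuffle=True, seed=0)))
```

With `shuffle=True` every example still appears exactly once per pass, only in a different order; passing a different `seed` (or `None`) for each epoch gives a fresh order on every pass over the data.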
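Hebel's entry lists momentum and Nesterov momentum among its training methods. As a hedged illustration of the difference (not Hebel's code, which runs on the GPU through PyCUDA), here is a one-dimensional sketch using one common formulation of each update; the function names, the learning rate `lr`, the momentum coefficient `mu` and the toy objective f(x) = x² are assumptions for illustration.

```python
from typing import Callable, Tuple

Grad = Callable[[float], float]
Step = Callable[[float, float, Grad, float, float], Tuple[float, float]]


def momentum_step(x: float, v: float, grad: Grad, lr: float, mu: float) -> Tuple[float, float]:
    """Classical momentum: the gradient is evaluated at the current point x."""
    v = mu * v - lr * grad(x)
    return x + v, v


def nesterov_step(x: float, v: float, grad: Grad, lr: float, mu: float) -> Tuple[float, float]:
    """Nesterov momentum: the gradient is evaluated at the look-ahead point x + mu * v."""
    v = mu * v - lr * grad(x + mu * v)
    return x + v, v


def minimize(step: Step, x0: float, grad: Grad, lr: float = 0.1, mu: float = 0.9,
             steps: int = 500) -> float:
    """Run `steps` updates from x0 with zero initial velocity and return the final x."""
    x, v = x0, 0.0
    for _ in range(steps):
        x, v = step(x, v, grad, lr, mu)
    return x


if __name__ == "__main__":
    grad = lambda x: 2.0 * x  # gradient of f(x) = x**2, whose minimum is at x = 0
    print(minimize(momentum_step, 5.0, grad), minimize(nesterov_step, 5.0, grad))
```

On this toy problem both variants drive x toward the minimum at 0; the only difference between them is the point at which the gradient is evaluated.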

Originally published on the WeChat official account 智能计算时代 (intelligentinterconn).

Original publication date: 2017-03-23
