Studying for what, and

Studying what

How to study and

How to study applications

across the various candidates in the Python family, and even R as an alternative

Pandas overview


pandas is an open source, BSD-licensed library providing easy-to-use data structures and data analysis tools for the Python programming language.

pandas consists of the following elements:

A set of [labeled] [array] data structures, the primary of which are [Series] and [DataFrame].

[Index] objects enabling both simple [axis] indexing and [multi-level] / [hierarchical axis] indexing.

An integrated [group by] engine for [aggregating] and [transforming] data sets.

[Date range] generation ([date_range]) and custom [date offsets] enabling the implementation of customized [frequencies].

Input/Output tools: loading [tabular data] from flat files ([CSV], delimited, [Excel 2003]), and saving and loading pandas objects from the fast and efficient PyTables/HDF5 format.

Memory-efficient “sparse” versions of the standard data structures for storing data that is mostly missing or mostly constant (some fixed value).

Moving window statistics ([rolling] mean, rolling standard deviation, etc.).
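
A few of the elements listed above can be sketched together in one short example. This is only an illustration: the city labels and temperature values are made up, and the column names are assumptions rather than anything from the pandas documentation.

```python
import pandas as pd

# Hypothetical daily readings; the values are invented for illustration.
dates = pd.date_range("2024-01-01", periods=6, freq="D")  # date range generation
df = pd.DataFrame(
    {
        "city": ["A", "B", "A", "B", "A", "B"],
        "temp": [1.0, 3.0, 2.0, 5.0, 6.0, 7.0],
    },
    index=dates,  # the dates become the labeled row index
)

# Group by engine: aggregate one mean temperature per city.
mean_by_city = df.groupby("city")["temp"].mean()

# Moving window statistics: 3-row rolling mean over the whole column.
# The first two entries are NaN because the window is not yet full.
rolling_mean = df["temp"].rolling(window=3).mean()

print(mean_by_city)
print(rolling_mean)
```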

Data Structures

[Series]: [1D] [labeled] [homogeneously-typed] [array]

[DataFrame]: General [2D] [labeled], size-mutable [tabular] structure with potentially [heterogeneously-typed] [column]

The best way to think about the pandas data structures is as flexible [containers] for lower-dimensional data.

For example, [DataFrame] is a container for [Series], and Series is a container for [scalars].

We would like to be able to insert and remove objects from these containers in a [dictionary-like] fashion.
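
The container idea can be tried out directly. The following is a minimal sketch with a made-up three-row frame; the column names `a`, `b` and `c` are placeholders chosen for the example.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# A DataFrame holds Series: selecting a column returns a Series,
# and selecting one element of that Series returns a scalar.
col = df["a"]
first = col[0]

# Insert a new column the way you would set a dict key.
df["c"] = df["a"] * 2

# Remove columns dict-style: del discards, pop returns the removed Series.
del df["b"]
popped = df.pop("c")

remaining = list(df.columns)
print(remaining, list(popped))
```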

With tabular data (DataFrame) it is more semantically helpful to think of the [index] (the rows) and the [columns] rather than axis 0 and axis 1.

Iterating through the columns of the DataFrame thus results in more readable code:

for col in df.columns:
    series = df[col]
    # do something with series

Mutability and copying of data

All pandas data structures are [value-mutable] (the values they contain can be altered) but not always [size-mutable]. The length of a Series cannot be changed, but, for example, columns can be inserted into a DataFrame. However, the vast majority of methods produce new objects and leave the input data untouched. In general we like to favor immutability where sensible.
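
A small sketch of what this means in practice, using invented values: an element is overwritten in place, while `sort_values` and `drop` hand back new objects and leave their inputs as they were.

```python
import pandas as pd

s = pd.Series([3, 1, 2], index=["x", "y", "z"])

# Value-mutable: an existing element can be overwritten in place.
s["x"] = 10

# Most methods return a new object and leave the input untouched.
sorted_s = s.sort_values()

df = pd.DataFrame({"a": [1, 2]})

# Columns can be inserted into an existing DataFrame.
df.insert(1, "b", [3, 4])

# drop() returns a new DataFrame; df itself still has both columns.
without_a = df.drop(columns="a")

print(list(s), list(sorted_s))
print(list(df.columns), list(without_a.columns))
```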

pandas Cookbook

The goal of this 2015 cookbook (by Julia Evans) is to give you some concrete examples for getting started with pandas. These are examples with real-world data, and all the bugs and weirdness that entails.

A quick tour of the IPython Notebook:

Shows off IPython’s awesome tab completion and magic functions.

Chapter 1: [Reading your data] into pandas is pretty much the easiest thing. Even when the encoding is wrong!

Chapter 2: It’s not totally obvious how to select data from a pandas [dataframe]. Here we explain the basics (how to take slices and get columns).

Chapter 3: Here we get into serious [slicing and dicing] and learn how to [filter] dataframes in complicated ways, really fast.

Chapter 4: [Groupby/aggregate] is seriously my favorite thing about pandas and I use it all the time. You should probably read this.

Chapter 5: Here you get to find out if it’s cold in Montreal in the winter (spoiler: yes). [Web scraping] with pandas is fun! Here we combine dataframes.

Chapter 6: [Strings] with pandas are great. It has all these [vectorized string] operations and they’re the best. We will turn a bunch of strings containing “Snow” into vectors of numbers in a trice.

Chapter 7: [Cleaning] up messy data is never a joy, but with pandas it’s easier.

Chapter 8: Parsing Unix [timestamps] is confusing at first but it turns out to be really easy.

Chapter 9: Reading data from SQL [databases].
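
To give a flavour of Chapters 6 and 8, here is a rough sketch that is not taken from the cookbook itself. The weather strings and the timestamps are invented, and the timestamps are assumed to be in seconds since the Unix epoch.

```python
import pandas as pd

# Hypothetical weather descriptions, one per observation.
weather = pd.Series(["Snow", "Fog", "Snow Showers", "Clear"])

# Vectorized string operation: test every element for "Snow" at once,
# then turn the True/False results into 1/0.
is_snowing = weather.str.contains("Snow")
snow_flags = is_snowing.astype(int)

# Unix timestamps, assumed here to be seconds since 1970-01-01.
stamps = pd.Series([0, 86400])
parsed = pd.to_datetime(stamps, unit="s")

print(list(snow_flags))
print(parsed)
```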

So, what are the strengths and potential uses of pandas?

Ref: HZH

Pandas can do

Reading data into dataframe

Select data in dataframe

Filter data in dataframe

Groupby/aggregate data in dataframe

String data

Time data

Data cleaning

Dataframe and database
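
The first few items of this list fit in a handful of lines. The sketch below reads a small CSV from an in-memory string so that it runs without any file on disk. The names, teams and scores are made up.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would be a file path.
csv_text = """name,team,score
ann,red,10
bob,blue,7
cat,red,4
dan,blue,9
"""

# Reading data into a dataframe.
df = pd.read_csv(io.StringIO(csv_text))

# Select one column.
names = df["name"]

# Filter rows with a boolean mask.
high = df[df["score"] > 8]

# Group by team and aggregate the scores.
total_by_team = df.groupby("team")["score"].sum()

print(list(high["name"]))
print(total_by_team)
```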

Pandas is really

A data container (DataFrame, Series, scalar data) that reads from and writes to various data sources (files and databases)

A data preprocessing tool (read, select, filter, write)

Especially good at data cleaning

A tool for simple statistics, time series analysis, computation and visualization

A high-performance tool for handling tabular data
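
The "from and to a database" part can be sketched with Python's built-in `sqlite3` module and an in-memory database. The table name `weather` and its contents are assumptions made up for this example.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical data; the temperatures are invented.
df = pd.DataFrame({"city": ["Montreal", "Toronto"], "temp": [-10.5, -4.0]})

# Write the dataframe to a table, then read a filtered subset back.
df.to_sql("weather", conn, index=False)
cold = pd.read_sql_query(
    "SELECT city, temp FROM weather WHERE temp < -5", conn
)
conn.close()

print(cold)
```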
