What to study for,
how to study, and
how to study applications
among the various candidates in the Python family, with even R as an alternative
pandas is an open source, BSD-licensed library
providing data analysis tools
for the Python programming language.
pandas consists of the following elements:
A set of [labeled] [array] data structures, the primary of which are [Series] and [DataFrame].
[Index] objects enabling both simple [axis] indexing and [multi-level] / [hierarchical axis] indexing.
An integrated [group by] engine for [aggregating] and [transforming] data sets.
[Date range] generation ([date_range]) and custom [date offsets] enabling the implementation of customized [frequencies].
Input/Output tools: loading [tabular data] from flat files ([CSV], delimited, [Excel 2003]), and saving and loading pandas objects from the fast and efficient PyTables/HDF5 format.
Memory-efficient “sparse” versions of the standard data structures for storing data that is mostly missing or mostly constant (some fixed value).
Moving window statistics ([rolling] mean, rolling standard deviation, etc.).
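A few of the elements listed above (group by, date ranges and offsets, moving window statistics) can be sketched as follows. This is only a minimal illustration: the store and sales data and all variable names are made up for the example.

```python
import pandas as pd

# Hypothetical sample data: six days of sales for two stores.
dates = pd.date_range("2024-01-01", periods=6, freq="D")
df = pd.DataFrame(
    {
        "store": ["a", "b", "a", "b", "a", "b"],
        "sales": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
    },
    index=dates,
)

# Group by, aggregating: one value per group.
totals = df.groupby("store")["sales"].sum()

# Group by, transforming: one value per original row, aligned to df's index.
group_means = df.groupby("store")["sales"].transform("mean")

# Date offset: move a timestamp forward to the end of its month.
month_end = pd.Timestamp("2024-01-15") + pd.offsets.MonthEnd(1)

# Moving window statistic: 3-day rolling mean (the first two values are NaN).
rolling_mean = df["sales"].rolling(window=3).mean()

print(totals)
print(month_end)
print(rolling_mean)
```

Aggregating reduces each group to one row, while transforming keeps the original shape, which is handy when the group statistic has to sit next to each row.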
[Series]: [1D] [labeled] [homogeneously-typed] [array]
[DataFrame]: General [2D] [labeled], size-mutable [tabular] structure with potentially [heterogeneously-typed] [columns]
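To make the two structures concrete, here is a small sketch. The labels and values are invented for illustration.

```python
import pandas as pd

# Series: one-dimensional, labeled, with a single dtype for all values.
s = pd.Series([1.5, 2.5, 3.5], index=["a", "b", "c"])

# DataFrame: two-dimensional, labeled; each column may have its own dtype.
df = pd.DataFrame(
    {"name": ["x", "y", "z"], "count": [1, 2, 3], "score": [0.1, 0.2, 0.3]},
    index=["r1", "r2", "r3"],
)

print(s.dtype)    # one dtype for the whole Series
print(df.dtypes)  # one dtype per column

# Label-based access on both structures.
print(s["b"], df.loc["r2", "count"])
```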
The best way to think about the pandas data structures is as flexible [containers] for lower dimensional data.
For example, [DataFrame] is a container for [Series], and Series is a container for [scalars].
We would like to be able to insert and remove objects from these containers in a [dictionary-like] fashion.
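The dictionary-like behavior described above might look like this in practice. It is a minimal sketch with made-up column names.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Insert a new column, as with dict[key] = value.
df["c"] = df["a"] + df["b"]

# Membership tests look at the column labels, like dict keys.
has_c = "c" in df

# Remove columns, as with del dict[key] and dict.pop(key).
del df["a"]
b = df.pop("b")  # returns the removed column as a Series

remaining = list(df.columns)
print(has_c, remaining)
```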
With tabular data (DataFrame) it is more semantically helpful to think of the [index] (the rows) and the [columns] rather than axis 0 and axis 1.
Iterating through the columns of the DataFrame thus results in more readable code:
for col in df.columns:
    series = df[col]
    # do something with series
Mutability and copying of data
All pandas data structures are [value-mutable] (the values they contain can be altered) but not always [size-mutable]. The length of a Series cannot be changed, but, for example, columns can be inserted into a DataFrame. However, the vast majority of methods produce new objects and leave the input data untouched. In general we like to favor immutability where sensible.
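A short sketch of what this means in practice, using a made-up DataFrame: values can be changed in place, columns can be added, and typical methods return new objects instead of modifying their input.

```python
import pandas as pd

df = pd.DataFrame({"a": [3, 1, 2]})

# Value-mutable: an existing value can be changed in place.
df.loc[0, "a"] = 10

# Size-mutable (DataFrame): a column can be inserted.
df["b"] = [1.0, 2.0, 3.0]

# Most methods return a new object and leave the input untouched.
sorted_df = df.sort_values("a")
dropped = df.drop(columns="b")

original_order = df["a"].tolist()       # df keeps its original row order
sorted_order = sorted_df["a"].tolist()  # only the new object is sorted
print(original_order, sorted_order, list(df.columns), list(dropped.columns))
```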
The goal of this 2015 cookbook (by Julia Evans) is to give you some concrete examples for getting started with pandas. These are examples with real-world data, and all the bugs and weirdness that entails.
A quick tour of the IPython Notebook:
Shows off IPython’s awesome tab completion and magic functions.
Chapter 1: [Reading your data] into pandas is pretty much the easiest thing. Even when the encoding is wrong!
Chapter 2: It’s not totally obvious how to select data from a pandas [dataframe]. Here we explain the basics (how to take slices and get columns).
Chapter 3: Here we get into serious [slicing and dicing] and learn how to [filter] dataframes in complicated ways, really fast.
Chapter 4: [Groupby/aggregate] is seriously my favorite thing about pandas and I use it all the time. You should probably read this.
Chapter 5: Here you get to find out if it’s cold in Montreal in the winter (spoiler: yes). [Web scraping] with pandas is fun! Here we combine dataframes.
Chapter 6: [Strings] with pandas are great. It has all these [vectorized string] operations and they’re the best. We will turn a bunch of strings containing “Snow” into vectors of numbers in a trice.
Chapter 7: [Cleaning] up messy data is never a joy, but with pandas it’s easier.
Chapter 8: Parsing Unix [timestamps] is confusing at first but it turns out to be really easy.
Chapter 9: Reading data from SQL [databases].
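As a taste of chapters 6, 8 and 9, the sketch below shows a vectorized string operation, Unix timestamp parsing, and a round trip through a SQL database. It is not the cookbook's own code: the tiny weather table, its column names, and the in-memory SQLite database are assumptions made for illustration.

```python
import sqlite3

import pandas as pd

# Hypothetical weather rows; the cookbook works with real data instead.
weather = pd.DataFrame(
    {
        "unix_time": [1325376000, 1325379600, 1325383200],
        "conditions": ["Snow", "Clear", "Snow Showers"],
    }
)

# Vectorized string operation: which rows mention "Snow"?
is_snowing = weather["conditions"].str.contains("Snow")
snow_flags = is_snowing.astype(int)  # booleans as 0/1 numbers

# Unix timestamps (seconds since the epoch) to datetimes.
weather["time"] = pd.to_datetime(weather["unix_time"], unit="s")

# Write to and read back from a SQL database (in-memory SQLite here).
con = sqlite3.connect(":memory:")
weather[["unix_time", "conditions"]].to_sql("weather", con, index=False)
from_sql = pd.read_sql("SELECT * FROM weather WHERE conditions LIKE 'Snow%'", con)
con.close()

print(snow_flags.tolist())
print(weather["time"])
print(from_sql)
```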
So, what are the strengths and potential uses of Pandas?
Pandas can:
Read data into a dataframe
Select data in a dataframe
Filter data in a dataframe
Groupby/aggregate data in a dataframe
Move data between a dataframe and a database
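The first four capabilities in this list can be sketched in a few lines. The CSV text, column names and temperatures below are invented for the example; a real script would pass a file path to read_csv.

```python
import io

import pandas as pd

# Hypothetical CSV content, standing in for a file on disk.
csv_text = """city,month,temp_c
Montreal,Jan,-10.0
Montreal,Jul,21.0
Toronto,Jan,-5.0
Toronto,Jul,22.0
"""

# Read: read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Select: one column gives a Series, a list of columns gives a DataFrame.
temps = df["temp_c"]
subset = df[["city", "temp_c"]]

# Filter: keep the rows where a boolean condition holds.
cold = df[df["temp_c"] < 0]

# Groupby/aggregate: mean temperature per city.
mean_by_city = df.groupby("city")["temp_c"].mean()

print(cold)
print(mean_by_city)
```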
Pandas is really:
A data container (dataframe, series, scalar data) that reads from and writes to various data sources (files and databases)
A data preprocessing tool (read, select, filter, write)
Especially good for data cleaning
A tool for simple statistics, time series analysis, computation and visualization
A high-performance tool for handling tabular data