The goal of this JustPython is to give you some concrete examples for getting started with pandas.
I want to start using pandas
In [1]: import pandas as pd
To load the pandas package and start working with it, import the package. The community-agreed alias for pandas is pd, so importing pandas as pd is standard practice throughout the pandas documentation.
I want to store passenger data of the Titanic. For a number of passengers, I know the name (characters), age (integers) and sex (male/female) data.
In [2]:
import numpy as np

df = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnell, Miss. Elizabeth",
            "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
            "Heikkinen, Miss. Laina",
            "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
            "Sandstrom, Miss. Marguerite Rut",
            "Moran, Mr. James",
            "McCarthy, Mr. Timothy J",
        ],
        "Age": [22, 35, 58, 38, 26, 35, 4, np.nan, 54],
        "Sex": ["male", "male", "female", "female", "female", "female", "female", "male", "male"],
    }
)
_________________________________________________________________________________________________
In [3]: df
Out[3]:
   Name                          Age  Sex
0  Braund, Mr. Owen Harris      22.0  male
1  Allen, Mr. William Henry     35.0  male
2  Bonnell, Miss. Elizabeth     58.0  female
3  Cumings, Mrs. John B...      38.0  female
4  Heikkinen, Miss. Laina       26.0  female
5  Futrelle, Mrs. J...          35.0  female
6  Sandstrom, Miss. Marguer...   4.0  female
7  Moran, Mr. James              NaN  male
8  McCarthy, Mr. Timothy J      54.0  male
To manually store data in a table, create a DataFrame. When using a Python dictionary of lists, the dictionary keys will be used as column headers and the values in each list as columns of the DataFrame.
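As a minimal sketch of the mapping just described, the following builds a small table from a dictionary of lists and checks that the keys became column labels. The variable name small and the three-row sample are illustrative choices, not part of the tutorial's dataset.

```python
import pandas as pd

# Hypothetical three-row sample, reusing names from the table above.
small = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnell, Miss. Elizabeth",
        ],
        "Age": [22, 35, 58],
        "Sex": ["male", "male", "female"],
    }
)

# Dictionary keys become the column labels, in insertion order.
column_labels = list(small.columns)

# Each list becomes one column; selecting a column returns a pandas Series.
ages = small["Age"]

# shape is (number of rows, number of columns).
n_rows, n_cols = small.shape
```

Every list must have the same length, since each position in the lists corresponds to one row of the table.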
➤ Purpose: Shows the first rows of your DataFrame (5 by default); pass a number to choose how many rows to display.
In [4]: df.head()
__________________________________________________________________________________________________
Out[4]:
   Name                       Age  Sex
0  Braund, Mr. Owen Harris   22.0  male
1  Allen, Mr. William Henry  35.0  male
2  Bonnell, Miss. Elizabeth  58.0  female
3  Cumings, Mrs. John B...   38.0  female
4  Heikkinen, Miss. Laina    26.0  female
In [5]: df.head(2)
__________________________________________________________________________________________________
Out[5]:
   Name                       Age  Sex
0  Braund, Mr. Owen Harris   22.0  male
1  Allen, Mr. William Henry  35.0  male
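The row-selection behavior of head can be checked directly. The sketch below uses a stand-in frame holding only the nine ages from the table above (an assumption made to keep it short), and also shows tail, the counterpart of head for the last rows, which this tutorial does not otherwise cover.

```python
import numpy as np
import pandas as pd

# Stand-in frame with the same nine ages as the tutorial's table.
df = pd.DataFrame({"Age": [22, 35, 58, 38, 26, 35, 4, np.nan, 54]})

first_five = df.head()   # no argument: the first 5 rows
first_two = df.head(2)   # the first 2 rows
last_two = df.tail(2)    # counterpart of head: the last 2 rows
```

Both methods return a new DataFrame that keeps the original row labels, so first_two carries the labels 0 and 1 while last_two carries 7 and 8.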
➤ Purpose: Summarizes your DataFrame—columns, data types, and missing values.
In [6]: df.info()
__________________________________________________________________________________________________
Out[6]:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9 entries, 0 to 8
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Name    9 non-null      object
 1   Age     8 non-null      float64
 2   Sex     9 non-null      object
dtypes: float64(1), object(2)
memory usage: 348.0+ bytes
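The same facts that info() prints can also be obtained as values you can work with. The following is a sketch on a stand-in frame with just the Age and Sex columns (an assumption for brevity): isna().sum() counts the missing entries per column, and dtype reports a column's data type.

```python
import numpy as np
import pandas as pd

# Stand-in frame with the Age and Sex values from the tutorial's table.
df = pd.DataFrame(
    {
        "Age": [22, 35, 58, 38, 26, 35, 4, np.nan, 54],
        "Sex": ["male", "male", "female", "female", "female",
                "female", "female", "male", "male"],
    }
)

# isna() marks missing cells as True; summing counts them per column.
missing_per_column = df.isna().sum()

# The single np.nan makes pandas store the whole Age column as floats.
age_dtype = df["Age"].dtype
```

This explains why Age shows 8 non-null values out of 9 entries and why its dtype is float64 even though the known ages were entered as integers.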
➤ Purpose: Provides summary statistics for all numeric columns.
In [7]: df.describe()
__________________________________________________________________________________________________
Out[7]:
             Age
count   8.000000
mean   34.000000
std    17.328754
min     4.000000
25%    25.000000
50%    35.000000
75%    42.000000
max    58.000000
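Each row of the describe() output corresponds to a method you can call on the column yourself. The sketch below recomputes them for the nine ages used above, held in a standalone Series named ages (an illustrative name), to show that the missing value is skipped rather than counted.

```python
import numpy as np
import pandas as pd

# The nine ages from the tutorial's table, including the missing one.
ages = pd.Series([22, 35, 58, 38, 26, 35, 4, np.nan, 54], name="Age")

summary = ages.describe()

count = ages.count()   # counts non-missing values only
mean = ages.mean()     # NaN is skipped by default
std = ages.std()       # sample standard deviation (divides by n - 1)
q25, q50, q75 = ages.quantile([0.25, 0.5, 0.75])  # the 25%, 50%, 75% rows
```

Because the missing age is excluded, count is 8 rather than 9, and the mean is taken over the eight known ages.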