Pandas Indexing and Slicing
In Pandas, indexing refers to accessing rows and columns of data from a DataFrame, whereas slicing refers to accessing a range of rows and columns.
We can access data or a range of data from a DataFrame using different methods.
Access Columns of a DataFrame
We can access columns of a DataFrame using the bracket ([]) operator. For example,
import pandas as pd
# create a DataFrame
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': [25, 32, 18, 47, 33],
'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney']
}
df = pd.DataFrame(data)
# access the Name column
names = df['Name']
print(names)
Output
0 Alice
1 Bob
2 Charlie
3 David
4 Eve
Name: Name, dtype: object
In this example, we accessed the Name column of df using the [] operator. It returned a Series containing the values of the Name column.
We can also access multiple columns by passing a list of column names to the [] operator. For example,
import pandas as pd
# create a DataFrame
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': [25, 32, 18, 47, 33],
'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney']
}
df = pd.DataFrame(data)
# access multiple columns
name_city = df[['Name','City']]
print(name_city)
Output
Name City
0 Alice New York
1 Bob Paris
2 Charlie London
3 David Tokyo
4 Eve Sydney
In this example, we accessed the Name and City columns of df using the [] operator. It returned a DataFrame containing the values from the Name and City columns of df.
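The number of brackets decides what comes back. The minimal sketch below rebuilds the same df so it runs on its own. The variable names age_series and age_frame are only illustrative. It compares a single label with a one-item list:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 32, 18, 47, 33],
    'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney'],
})

# single brackets with one column label -> a Series
age_series = df['Age']

# double brackets (a list holding one label) -> a one-column DataFrame
age_frame = df[['Age']]

print(type(age_series).__name__)  # Series
print(type(age_frame).__name__)   # DataFrame
print(age_frame.shape)            # (5, 1)
```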
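The [] operator, however, provides limited functionality. It cannot select rows and columns in a single step, and a single label inside [] is always looked up among the column labels. As a result, basic operations like selecting a row by its label or picking out an individual element are awkward with the [] operator alone.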
So we use the .loc and .iloc properties for indexing and slicing DataFrames. They provide much more flexibility than the [] operator.
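The limits of [] can be shown with the same df. In the small sketch below, a slice or a boolean Series inside [] selects rows. A single integer, however, is looked up as a column label and fails. The names rows, older and raised_key_error are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 32, 18, 47, 33],
    'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney'],
})

# a slice inside [] selects rows by position
rows = df[1:3]
print(rows)

# a boolean Series inside [] filters rows
older = df[df['Age'] > 30]
print(older)

# a single integer inside [] is treated as a column label;
# there is no column named 1, so pandas raises KeyError
raised_key_error = False
try:
    df[1]
except KeyError:
    raised_key_error = True
print("df[1] raised KeyError:", raised_key_error)
```

Reaching a single value therefore takes two chained steps with [] (df['Age'][1]), while .loc does it in one call: df.loc[1, 'Age'].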
Pandas .loc
In Pandas, we use the .loc property to access and modify data within a DataFrame using label-based indexing. It allows us to select specific rows and columns based on their labels.
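Because .loc can appear on the left-hand side of an assignment, it is also how values are modified. This is a minimal sketch. The new values (33 and 'Unknown') are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 32, 18, 47, 33],
    'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney'],
})

# overwrite one cell: row label 1, column label 'Age'
df.loc[1, 'Age'] = 33

# overwrite 'City' in the rows labeled 2 and 4
df.loc[[2, 4], 'City'] = 'Unknown'

print(df)
```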
Syntax
The syntax of .loc in Pandas is:
df.loc[row_indexer, column_indexer]
Here,
row_indexer - selects rows by their labels; can be a single label, a list of labels, a slice of labels, or a boolean array
column_indexer - selects columns; can also be a single label, a list of labels, a slice of labels, or a boolean array
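Label-based means that .loc looks rows up by their index labels. These labels match the positions only when the DataFrame uses the default 0, 1, 2, ... index. The sketch below makes the labels strings by moving Name into the index with set_index. The variable name people is illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 32, 18, 47, 33],
    'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney'],
})

# use the Name column as the row labels
people = df.set_index('Name')

# row label 'Charlie', column label 'Age'
charlie_age = people.loc['Charlie', 'Age']
print(charlie_age)  # 18

# a list of row labels together with a list of column labels
subset = people.loc[['Alice', 'Eve'], ['City']]
print(subset)
```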
Example: Indexing Using .loc
We can use .loc to access data from a DataFrame using its row and column labels.
import pandas as pd
# create a DataFrame
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': [25, 32, 18, 47, 33],
'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney']
}
df = pd.DataFrame(data)
# access a single row
single_row = df.loc[2]
print("Single row:")
print(single_row)
print()
# access rows 0, 3 and 4
row_list = df.loc[[0, 3, 4]]
print("List of Rows:")
print(row_list)
print()
# access a list of columns
column_list = df.loc[:,['Name', 'Age']]
print("List of Columns:")
print(column_list)
print()
# access second row of 'Name' column
specific_value = df.loc[1, 'Name']
print("Specific Value:")
print(specific_value)
Output
Single row:
Name Charlie
Age 18
City London
Name: 2, dtype: object
List of Rows:
Name Age City
0 Alice 25 New York
3 David 47 Tokyo
4 Eve 33 Sydney
List of Columns:
Name Age
0 Alice 25
1 Bob 32
2 Charlie 18
3 David 47
4 Eve 33
Specific Value:
Bob
Here, we used .loc to access a row, a list of rows, a list of columns and a specific value using the respective labels.
In the line,
column_list = df.loc[:,['Name', 'Age']]
the : (an empty slice) indicates that all the rows are to be selected.
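Since : as the row indexer selects every row, df.loc[:, columns] gives the same result as df[columns]. Likewise, leaving out the column indexer is the same as passing : for it. A quick check, with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 32, 18, 47, 33],
    'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney'],
})

# ':' for rows + a list of columns, versus plain [] with the same list
via_loc = df.loc[:, ['Name', 'Age']]
via_brackets = df[['Name', 'Age']]
print(via_loc.equals(via_brackets))  # True

# omitting the column indexer selects all columns
print(df.loc[2].equals(df.loc[2, :]))  # True
```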
Example: Slicing Using .loc
We can also use .loc to access a range of rows and columns. Selecting a contiguous range like this (say, from label 1 to label 3) is called slicing.
import pandas as pd
# create a DataFrame
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': [25, 32, 18, 47, 33],
'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney']
}
df = pd.DataFrame(data)
# slice rows from index 1 to 3
slice_rows = df.loc[1:3]
print("Sliced Rows:")
print(slice_rows)
print()
# slicing columns from 'Name' to 'Age'
slice_columns = df.loc[:, 'Name':'Age']
print("Sliced Columns:")
print(slice_columns)
Output
Sliced Rows:
Name Age City
1 Bob 32 Paris
2 Charlie 18 London
3 David 47 Tokyo
Sliced Columns:
Name Age
0 Alice 25
1 Bob 32
2 Charlie 18
3 David 47
4 Eve 33
Here, we sliced rows and columns using .loc and the : slice syntax.
Notice that with .loc both endpoints are inclusive, i.e. the rows labeled 1 and 3 are both included in df.loc[1:3], and the Age column is included in df.loc[:, 'Name':'Age'].
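Inclusive endpoints are easiest to see with non-numeric labels. In this sketch the Name column becomes the index via set_index, so the slice runs from the label 'Bob' to the label 'David'. The variable name people is illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 32, 18, 47, 33],
    'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney'],
})

people = df.set_index('Name')

# both 'Bob' and 'David' are part of the result
middle = people.loc['Bob':'David']
print(middle)

# column slices are inclusive too: 'Age' through 'City'
print(people.loc['Bob':'David', 'Age':'City'].columns.tolist())  # ['Age', 'City']
```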
Example: Boolean Indexing With .loc
We can use boolean indexing with .loc to filter rows based on a condition.
import pandas as pd
# create a DataFrame
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': [25, 32, 18, 47, 33],
'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney']
}
df = pd.DataFrame(data)
# boolean indexing with .loc
boolean_index = df.loc[df['Age'] > 30]
print("Filtered DataFrame: ")
print(boolean_index)
Output
Filtered DataFrame:
Name Age City
1 Bob 32 Paris
3 David 47 Tokyo
4 Eve 33 Sydney
In this example, we selected all the rows where the value of Age is greater than 30.
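Conditions can be combined with & (and), | (or) and ~ (not), and a column indexer can be passed alongside the boolean mask. Each comparison needs its own parentheses because Python's & and | bind more tightly than > or ==. A sketch, with illustrative variable names such as mask:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 32, 18, 47, 33],
    'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney'],
})

# older than 30 AND not living in Paris
mask = (df['Age'] > 30) & (df['City'] != 'Paris')

# boolean mask for rows, a single label for the column
names = df.loc[mask, 'Name']
print(names.tolist())  # ['David', 'Eve']

# younger than 20 OR living in London
young_or_london = df.loc[(df['Age'] < 20) | (df['City'] == 'London')]
print(young_or_london)

# ~ negates a mask: everyone not in Paris, only two columns
not_paris = df.loc[~(df['City'] == 'Paris'), ['Name', 'City']]
print(not_paris)
```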
Pandas .iloc
In Pandas, the .iloc property is used to access and modify data within a DataFrame using integer-based (positional) indexing. It allows us to select specific rows and columns based on their integer positions.
Syntax
The syntax of .iloc in Pandas is:
df.iloc[row_indexer, column_indexer]
Here,
row_indexer - selects rows by their integer position; can be a single integer, a list of integers, a slice of integers, or a boolean array
column_indexer - selects columns; can also be a single integer, a list of integers, a slice of integers, or a boolean array
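Because .iloc works with positions, it ignores what the row labels are. Negative integers count from the end, as with Python lists. The sketch below moves Name into the index. The variable name people and the others are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 32, 18, 47, 33],
    'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney'],
})

people = df.set_index('Name')

# position 0 is the first row, whatever its label is
first = people.iloc[0]
print(first.name)  # Alice

# negative positions count from the end
last_row = people.iloc[-1]
print(last_row.name)  # Eve

# last column (City, since Name is now the index)
last_col = people.iloc[:, -1]
print(last_col.tolist())
```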
Example: Indexing Using .iloc
import pandas as pd
# create a DataFrame
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': [25, 32, 18, 47, 33],
'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney']
}
df = pd.DataFrame(data)
# access single row
single_row = df.iloc[2]
# access rows 0, 3 and 4
row_list = df.iloc[[0, 3, 4]]
# access columns 0 and 2
column_list = df.iloc[:,[0,2]]
# access a specific value
specific_value = df.iloc[1, 0]
# display result
print("Single Row:")
print(single_row)
print()
print("List of Rows:")
print(row_list)
print()
print("List of Columns:")
print(column_list)
print()
print("Specific Value:")
print(specific_value)
Output
Single Row:
Name Charlie
Age 18
City London
Name: 2, dtype: object
List of Rows:
Name Age City
0 Alice 25 New York
3 David 47 Tokyo
4 Eve 33 Sydney
List of Columns:
Name City
0 Alice New York
1 Bob Paris
2 Charlie London
3 David Tokyo
4 Eve Sydney
Specific Value:
Bob
Here, we used .iloc to access a row, a list of rows, a list of columns and a specific value using their integer positions.
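The row and column indexers can also be combined in one call. A list holding a single integer keeps the result as a DataFrame rather than a Series. A short sketch (subset is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 32, 18, 47, 33],
    'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney'],
})

# rows at positions 0, 3, 4 and columns at positions 0 and 2
subset = df.iloc[[0, 3, 4], [0, 2]]
print(subset)

# single integer -> Series; list with one integer -> one-row DataFrame
print(type(df.iloc[2]).__name__)    # Series
print(type(df.iloc[[2]]).__name__)  # DataFrame
```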
Example: Slicing Using .iloc
import pandas as pd
# create a sample DataFrame
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': [25, 32, 18, 47, 33],
'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney']
}
df = pd.DataFrame(data)
# slice rows from position 1 to 3
slice_rows = df.iloc[1:4]
# slice columns from position 0 to 1
slice_columns = df.iloc[:, 0:2]
# display results
print("Sliced Rows:")
print(slice_rows)
print()
print("Sliced Columns:")
print(slice_columns)
Output
Sliced Rows:
Name Age City
1 Bob 32 Paris
2 Charlie 18 London
3 David 47 Tokyo
Sliced Columns:
Name Age
0 Alice 25
1 Bob 32
2 Charlie 18
3 David 47
4 Eve 33
Notice that the end position is exclusive: position 4 is not included in df.iloc[1:4], just as in ordinary Python slicing.
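Since .iloc slices follow Python's slicing rules, steps and negative positions work as they do on lists. A sketch with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 32, 18, 47, 33],
    'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney'],
})

# a step of 2 takes every other row: positions 0, 2, 4
every_other = df.iloc[::2]
print(every_other)

# a negative start counts from the end: the last two rows
last_two = df.iloc[-2:]
print(last_two)

# a step of -1 reverses the column order
reversed_cols = df.iloc[:, ::-1]
print(reversed_cols.columns.tolist())  # ['City', 'Age', 'Name']
```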
.loc vs .iloc
The main differences between .loc and .iloc are as follows:
Basis | .loc | .iloc |
---|---|---|
Indexing | Label-based indexing | Integer position-based indexing |
Slice endpoint | Endpoint is included | Endpoint is not included |
Boolean indexing | Accepts a boolean array or a boolean Series | Accepts a boolean array (e.g. a list or NumPy array), but not a boolean Series |
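The differences only become visible when row labels and positions disagree. This sketch therefore gives the df a custom integer index of 10 to 50. Those labels, and names like by_label and mask, are illustrative choices. In the pandas versions we are aware of, passing a boolean Series to .iloc raises an error: NotImplementedError for an integer index and ValueError otherwise. The sketch catches both rather than relying on either one:

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
        'Age': [25, 32, 18, 47, 33],
        'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney'],
    },
    index=[10, 20, 30, 40, 50],  # labels deliberately differ from positions
)

# .loc reads 20:40 as labels and includes the end label -> 3 rows
by_label = df.loc[20:40]
# .iloc reads 1:4 as positions and excludes position 4 -> the same 3 rows
by_position = df.iloc[1:4]
print(by_label.equals(by_position))  # True

mask = df['Age'] > 30
# .loc accepts the boolean Series directly
filtered_loc = df.loc[mask]
# .iloc needs a plain boolean array; to_numpy() drops the index
filtered_iloc = df.iloc[mask.to_numpy()]
print(filtered_loc.equals(filtered_iloc))  # True

# passing the boolean Series itself to .iloc is rejected
try:
    df.iloc[mask]
except (ValueError, NotImplementedError) as err:
    print(type(err).__name__, err)
```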