Jul 9, 2023

Pandas Indexing and Slicing

In Pandas, indexing refers to accessing rows and columns of data from a DataFrame, whereas slicing refers to accessing a range of rows and columns.

We can access a single item or a range of data from a DataFrame in several ways.

Access Columns of a DataFrame

We can access columns of a DataFrame using the bracket ([]) operator. For example,

import pandas as pd

# create a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 32, 18, 47, 33],
    'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney']
}
df = pd.DataFrame(data)

# access the Name column
names = df['Name']

print(names)

Output

0      Alice
1        Bob
2    Charlie
3      David
4        Eve
Name: Name, dtype: object

In this example, we accessed the Name column of df using the [] operator. It returned a Series containing the values of the Name column.

We can also access multiple columns using the [] operator. For example,

import pandas as pd

# create a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 32, 18, 47, 33],
    'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney']
}
df = pd.DataFrame(data)

# access multiple columns
name_city = df[['Name','City']]

print(name_city)

Output

      Name      City
0    Alice  New York
1      Bob     Paris
2  Charlie    London
3    David     Tokyo
4      Eve    Sydney

In this example, we accessed the Name and the City columns of df using the [] operator. It returned a DataFrame containing the values from Name and City of df.

The [] operator, however, provides limited functionality. Even basic operations like selecting rows, slicing DataFrames and selecting individual elements are quite tricky using the [] operator only.
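To make this limitation concrete, here is a minimal sketch of how the [] operator changes meaning depending on what we pass to it. It reuses the same sample data as above; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 32, 18, 47, 33],
    'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney']
})

# a string inside [] selects a column
ages = df['Age']

# a slice inside [] selects rows instead of columns
middle_rows = df[1:3]

# a single integer inside [] is looked up as a column label,
# so it fails here because there is no column named 2
try:
    df[2]
    integer_lookup_failed = False
except KeyError:
    integer_lookup_failed = True

print(middle_rows)
print(integer_lookup_failed)
```

The same brackets select columns in one case and rows in another, which is easy to misread.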

So we use the .loc and .iloc properties for indexing and slicing DataFrames. They provide much more flexibility compared to the [] operator.

Pandas .loc

In Pandas, we use the .loc property to access and modify data within a DataFrame using label-based indexing. It allows us to select specific rows and columns based on their labels.

Syntax

The syntax of .loc in Pandas is:

df.loc[row_indexer, column_indexer]

Here,

row_indexer - selects rows by their labels; can be a single label, a list of labels, a slice of labels, or a boolean array
column_indexer - selects columns by their labels; can also be a single label, a list of labels, a slice of labels, or a boolean array

Example: Indexing Using .loc

We can use .loc to access data from a DataFrame by its row and column labels.

import pandas as pd

# create a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 32, 18, 47, 33],
    'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney']
}
df = pd.DataFrame(data)

# access a single row
single_row = df.loc[2]

print("Single row:")
print(single_row)
print()

# access rows 0, 3 and 4
row_list = df.loc[[0, 3, 4]]

print("List of Rows:")
print(row_list)
print()

# access a list of columns
column_list = df.loc[:,['Name', 'Age']]

print("List of Columns:")
print(column_list)
print()

# access second row of 'Name' column
specific_value = df.loc[1, 'Name']

print("Specific Value:")
print(specific_value)

Output

Single row:
Name    Charlie
Age          18
City     London
Name: 2, dtype: object

List of Rows:
    Name  Age      City
0  Alice   25  New York
3  David   47     Tokyo
4    Eve   33    Sydney

List of Columns:
      Name  Age
0    Alice   25
1      Bob   32
2  Charlie   18
3    David   47
4      Eve   33

Specific Value:
Bob

Here, we used .loc to access a row, a list of rows, a list of columns and a specific value using the respective labels.

In the line,

column_list = df.loc[:,['Name', 'Age']]

The : operator indicates that all the rows are to be selected.
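As mentioned earlier, .loc can modify data as well as read it. The following is a minimal sketch of assignment through .loc, using the same sample data; the replacement values ('Berlin' and 20) are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 32, 18, 47, 33],
    'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney']
})

# overwrite a single cell: row label 1, column 'City'
df.loc[1, 'City'] = 'Berlin'

# overwrite 'Age' in every row where the condition is True
df.loc[df['Age'] < 20, 'Age'] = 20

print(df)
```

Selecting the rows and the column in a single .loc call means the assignment is applied to df itself rather than to a temporary copy.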

Example: Slicing Using .loc

We can also use .loc to access a range of rows and columns. Accessing a contiguous range of a DataFrame (say, from label 1 to label 3) is called slicing.

import pandas as pd

# create a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 32, 18, 47, 33],
    'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney']
}

df = pd.DataFrame(data)

# slice rows from index 1 to 3
slice_rows = df.loc[1:3]

print("Sliced Rows:")
print(slice_rows)
print()

# slice columns from 'Name' to 'Age'
slice_columns = df.loc[:, 'Name':'Age']

print("Sliced Columns:")
print(slice_columns)

Output

Sliced Rows:
      Name  Age    City
1      Bob   32   Paris
2  Charlie   18  London
3    David   47   Tokyo

Sliced Columns:
      Name  Age
0    Alice   25
1      Bob   32
2  Charlie   18
3    David   47
4      Eve   33

Here, we sliced rows and columns using .loc and the : operator.

Notice that with .loc both endpoints are inclusive, i.e. the rows labeled 1 and 3 are both included in df.loc[1:3].
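Because .loc slices by label rather than by position, the inclusive endpoint is easier to see with non-numeric labels. Here is a small sketch, assuming a hypothetical DataFrame that uses the names as its index.

```python
import pandas as pd

# use the names as row labels instead of the default 0..4
df = pd.DataFrame(
    {'Age': [25, 32, 18, 47, 33]},
    index=['Alice', 'Bob', 'Charlie', 'David', 'Eve']
)

# slice from the label 'Bob' up to and including the label 'David'
subset = df.loc['Bob':'David']

print(subset)
```

Both 'Bob' and 'David' appear in the result, since a label slice has no natural "one past the end" value to stop at.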

Example: Boolean Indexing With .loc

We can use boolean indexing to filter the data with a condition.

import pandas as pd

# create a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 32, 18, 47, 33],
    'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney']
}
df = pd.DataFrame(data)

# boolean indexing with .loc
boolean_index = df.loc[df['Age'] > 30]

print("Filtered DataFrame: ")
print(boolean_index)

Output

Filtered DataFrame:
      Name  Age    City
1      Bob   32   Paris
3    David   47   Tokyo
4      Eve   33  Sydney

In this example, we selected all the rows where the value of Age is greater than 30.
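Conditions can also be combined, and the filter can be paired with a column selection in the same .loc call. The sketch below is one way to do it with the same sample data; the particular condition is only an example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 32, 18, 47, 33],
    'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney']
})

# combine two conditions with & (each one wrapped in parentheses)
# and keep only the Name and Age columns
result = df.loc[(df['Age'] > 30) & (df['City'] != 'Paris'), ['Name', 'Age']]

print(result)
```

The parentheses around each condition are required because & binds more tightly than the comparison operators in Python.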

Pandas .iloc

In Pandas, the .iloc property is used to access and modify data within a DataFrame using integer-based indexing. It allows us to select specific rows and columns based on their integer locations.

Syntax

The syntax of .iloc in Pandas is:

df.iloc[row_indexer, column_indexer]

Here,

row_indexer - selects rows by their integer position; can be a single integer, a list of integers, a slice of integers, or a boolean array
column_indexer - selects columns by their integer position; can also be a single integer, a list of integers, a slice of integers, or a boolean array

Example: Indexing Using .iloc

import pandas as pd

# create a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 32, 18, 47, 33],
    'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney']
}
df = pd.DataFrame(data)

# access single row
single_row = df.iloc[2]

# access rows 0, 3 and 4
row_list = df.iloc[[0, 3, 4]]

# access columns 0 and 2
column_list = df.iloc[:,[0,2]]

# access a specific value
specific_value = df.iloc[1, 0]

# display result
print("Single Row:")
print(single_row)
print()
print("List of Rows:")
print(row_list)
print()
print("List of Columns:")
print(column_list)
print()
print("Specific Value:")
print(specific_value)

Output

Single Row:
Name    Charlie
Age          18
City     London
Name: 2, dtype: object

List of Rows:
    Name  Age      City
0  Alice   25  New York
3  David   47     Tokyo
4    Eve   33    Sydney

List of Columns:
      Name      City
0    Alice  New York
1      Bob     Paris
2  Charlie    London
3    David     Tokyo
4      Eve    Sydney

Specific Value:
Bob

Here, we used .iloc to access a row, a list of rows, a list of columns and a specific value using the respective integer values.

Example: Slicing Using .iloc

import pandas as pd

# create a sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 32, 18, 47, 33],
    'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney']
}
df = pd.DataFrame(data)

# slice rows from position 1 to 3
slice_rows = df.iloc[1:4]

# slice columns from position 0 to 1
slice_columns = df.iloc[:, 0:2]

# display results
print("Sliced Rows:")
print(slice_rows)
print()
print("Sliced Columns:")
print(slice_columns)

Output

Sliced Rows:
      Name  Age    City
1      Bob   32   Paris
2  Charlie   18  London
3    David   47   Tokyo

Sliced Columns:
      Name  Age
0    Alice   25
1      Bob   32
2  Charlie   18
3    David   47
4      Eve   33

Notice that the end position is exclusive: df.iloc[1:4] returns the rows at positions 1, 2 and 3, but not the row at position 4.
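Since .iloc works on positions, it follows the same rules as slicing a Python list, including negative positions and steps. Here is a brief sketch with the same sample data; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 32, 18, 47, 33],
    'City': ['New York', 'Paris', 'London', 'Tokyo', 'Sydney']
})

# negative positions count from the end
last_row = df.iloc[-1]

# the last two rows
last_two = df.iloc[-2:]

# every second row, starting from position 0
every_other = df.iloc[::2]

print(last_row)
print(last_two)
print(every_other)
```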

.loc vs .iloc

The main differences between .loc and .iloc are as follows:

Basis	.loc	.iloc
Indexing	Label-based indexing	Integer position-based indexing
Slice endpoint	Endpoint is included	Endpoint is not included
Boolean indexing	Accepts a boolean array or a boolean Series	Accepts a boolean array or list, but not a boolean Series
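These differences are easiest to see when the row labels are not 0, 1, 2 and so on. The following is a minimal sketch, assuming a hypothetical index of 10, 20 and 30. It also shows that a boolean mask can be used with .iloc once it is converted to a NumPy array with to_numpy(); it is the boolean Series itself that .iloc rejects.

```python
import pandas as pd

# row labels deliberately differ from row positions
df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 32, 18]},
    index=[10, 20, 30]
)

# .loc looks up the row labeled 20 (the second row)
by_label = df.loc[20, 'Name']

# .iloc looks up the row at position 0 (the first row)
by_position = df.iloc[0, 0]

# a boolean Series works directly with .loc
mask = df['Age'] > 20
with_loc = df.loc[mask]

# for .iloc, convert the mask to a plain NumPy array first
with_iloc = df.iloc[mask.to_numpy()]

print(by_label)
print(by_position)
print(with_iloc)
```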