Jul 9, 2023

Pandas Index

In Pandas, an index refers to the labeled array that identifies rows or columns in a DataFrame or a Series. For example,

    Name  Age      City
0   John   25  New York
1  Alice   28    London
2    Bob   32     Paris

In the above DataFrame, the numbers 0, 1, and 2 represent the index, providing unique labels to each row.

Indexes let us identify rows by label and select, align, and look up data efficiently.

Create Indexes in Pandas

Pandas offers several ways to create indexes. Some common methods are as follows:

Default Index
Setting Index
Creating a Range Index

Default Index

When we create a DataFrame or Series without specifying an index explicitly, Pandas assigns a default integer index starting from 0. For example,

import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 28, 32],
    'City': ['New York', 'London', 'Paris']
}

df = pd.DataFrame(data)
print(df)

Output

    Name  Age      City
0   John   25  New York
1  Alice   28    London
2    Bob   32     Paris

In this example, the default index [0, 1, 2] is automatically assigned to the rows.
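The same default applies to a Series. The following is a minimal sketch (the variable name ages is only illustrative) that creates a Series without an index and inspects the index Pandas assigns.

```python
import pandas as pd

# create a Series without specifying an index
ages = pd.Series([25, 28, 32])

# the default index is a RangeIndex that starts at 0
print(ages.index)

# convert the index to a plain list to see its labels
print(list(ages.index))
```

The first print shows the index object itself, and the second shows the labels it contains.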

Setting Index

We can set an existing column as the index using the set_index() method. For example,

import pandas as pd

# create dataframe
data = {
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 28, 32],
    'City': ['New York', 'London', 'Paris']
}

df = pd.DataFrame(data)

# set the 'Name' column as index
df.set_index('Name', inplace=True)

print(df)

Output

       Age      City
Name                
John    25  New York
Alice   28    London
Bob     32     Paris

In this example, the Name column is set as the index, replacing the default integer index.

Here, inplace=True modifies the original DataFrame directly instead of returning a new one. Without it, set_index() leaves the original unchanged and returns a modified copy.
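To illustrate the difference, here is a short sketch that calls set_index() without inplace=True and stores the result in a new variable (indexed_df is an illustrative name, not part of the original example).

```python
import pandas as pd

data = {
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 28, 32],
    'City': ['New York', 'London', 'Paris']
}

df = pd.DataFrame(data)

# without inplace=True, set_index() returns a new DataFrame
indexed_df = df.set_index('Name')

# the original DataFrame still has 'Name' as a regular column
print(list(df.columns))

# the new DataFrame uses the names as its index
print(list(indexed_df.index))
```

Assigning the result to a variable like this is often easier to follow than modifying the object in place, since the original data stays available.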

Creating a Range Index

We can create a range index with specific start and stop values using the pd.RangeIndex() constructor. For example,

import pandas as pd

# create dataframe
data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [25, 28, 32],
        'City': ['New York', 'London', 'Paris']}

# create the dataframe with a range index
df = pd.DataFrame(data, index=pd.RangeIndex(5, 8, name='Index'))

print(df)

Output

         Name  Age      City
Index                       
5        John   25  New York
6       Alice   28    London
7         Bob   32     Paris

Here, a range index from 5 to 8 (exclusive) is created with the name Index.
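RangeIndex follows the same start, stop, and step convention as Python's built-in range(). The sketch below (with illustrative variable names) shows the labels produced by the index used above and by one with a step of 2.

```python
import pandas as pd

# same range index as in the example above: stop value is excluded
idx = pd.RangeIndex(start=5, stop=8, name='Index')
print(list(idx))      # [5, 6, 7]

# an optional step controls the spacing between labels
stepped = pd.RangeIndex(start=0, stop=10, step=2)
print(list(stepped))  # [0, 2, 4, 6, 8]
```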

Modifying Indexes in Pandas

Pandas allows us to make changes to indexes easily. Some common modification operations are:

Renaming Index
Resetting Index

Renaming Index

We can rename index labels using the rename() method. For example,

import pandas as pd

# create a dataframe
data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [25, 28, 32],
        'City': ['New York', 'London', 'Paris']}
        
df = pd.DataFrame(data)

# display original dataframe
print('Original DataFrame:')
print(df)
print()

# rename index
df.rename(index={0: 'A', 1: 'B', 2: 'C'}, inplace=True)

# display dataframe after index is renamed
print('Modified DataFrame:')
print(df)

Output

Original DataFrame:
    Name  Age      City
0   John   25  New York
1  Alice   28    London
2    Bob   32     Paris

Modified DataFrame:
    Name  Age      City
A   John   25  New York
B  Alice   28    London
C    Bob   32     Paris

In this example, we renamed the index labels 0, 1, and 2 to 'A', 'B', and 'C' respectively.
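Besides a dictionary, rename() also accepts a function that is applied to every index label. The following sketch assumes a made-up naming scheme (row_0, row_1, ...) purely for illustration.

```python
import pandas as pd

data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [25, 28, 32],
        'City': ['New York', 'London', 'Paris']}

df = pd.DataFrame(data)

# apply a function to each index label; returns a new DataFrame
renamed = df.rename(index=lambda label: f'row_{label}')

print(list(renamed.index))
```

A function is convenient when the new labels follow a pattern, since we do not have to list every old label in a dictionary.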

Resetting Index

We can reset the index to the default integer index using the reset_index() method. By default, the old index is kept as a new column named index. For example,

import pandas as pd

data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [25, 28, 32],
        'City': ['New York', 'London', 'Paris']}

# create a dataframe
df = pd.DataFrame(data)

# rename index
df.rename(index={0: 'A', 1: 'B', 2: 'C'}, inplace=True)

# display dataframe
print('Original DataFrame:')
print(df)
print()

# reset index
df.reset_index(inplace=True)

# display dataframe after index is reset
print('Modified DataFrame:')
print(df)

Output

Original DataFrame:
    Name  Age      City
A   John   25  New York
B  Alice   28    London
C    Bob   32     Paris

Modified DataFrame:
  index   Name  Age      City
0     A   John   25  New York
1     B  Alice   28    London
2     C    Bob   32     Paris
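If we do not want to keep the old index as a column, we can pass drop=True to reset_index(). Here is a minimal sketch of that variant (the variable name reset_df is illustrative).

```python
import pandas as pd

data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [25, 28, 32],
        'City': ['New York', 'London', 'Paris']}

# create a dataframe with custom index labels
df = pd.DataFrame(data, index=['A', 'B', 'C'])

# drop=True discards the old index instead of adding it as a column
reset_df = df.reset_index(drop=True)

print(list(reset_df.columns))
print(list(reset_df.index))
```

In this case the result has only the original three columns, with the default integer index restored.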

Access Rows by Index

We can access rows of a DataFrame by their integer position using the .iloc property. For example,

import pandas as pd

# create a dataframe
data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [25, 28, 32],
        'City': ['New York', 'London', 'Paris']}

df = pd.DataFrame(data)

second_row = df.iloc[1]

print(second_row)

Output

Name     Alice
Age         28
City    London
Name: 1, dtype: object

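In this example, we displayed the second row of the df DataFrame by its integer position (1) using the .iloc property. Because df uses the default integer index, the position and the index label happen to be the same. To select rows by index label instead, we can use the .loc property.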
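The distinction between position and label becomes visible once the index is not the default one. The sketch below assumes custom labels 'A', 'B', and 'C' (chosen only for illustration) and selects the same row in both ways.

```python
import pandas as pd

data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [25, 28, 32],
        'City': ['New York', 'London', 'Paris']}

# create a dataframe with custom index labels
df = pd.DataFrame(data, index=['A', 'B', 'C'])

# .iloc selects by integer position (0-based)
by_position = df.iloc[1]

# .loc selects by index label
by_label = df.loc['B']

print(by_position['Name'])
print(by_label['Name'])
```

Both selections return the row for Alice, because 'B' is the label of the row at position 1.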

Get DataFrame Index

We can access the DataFrame Index using the index attribute. For example,

import pandas as pd

# create a dataframe
data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [25, 28, 32],
        'City': ['New York', 'London', 'Paris']}

df = pd.DataFrame(data)

# return index object
print(df.index)

# return index values
print(df.index.values)

Output

RangeIndex(start=0, stop=3, step=1)
[0 1 2]

Here,

df.index - returns the index object
df.index.values - returns the index values as a NumPy array
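If a plain Python list is needed rather than an array, the index can be converted with tolist(). A short sketch, with illustrative variable names:

```python
import pandas as pd

data = {'Name': ['John', 'Alice', 'Bob'],
        'Age': [25, 28, 32],
        'City': ['New York', 'London', 'Paris']}

df = pd.DataFrame(data)

# .values gives a NumPy array of the index labels
index_array = df.index.values
print(type(index_array).__name__)

# .tolist() gives a regular Python list
index_list = df.index.tolist()
print(index_list)
```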

Types of Indexes

Pandas supports different types of indexes that offer various functionalities based on the data requirements. A few notable types are listed in the table below.

Type	Description	Examples
Range Index (RangeIndex)	It represents a sequence of integers within a specified range. It is of type `int64`. The range index `[0, 1, 2, ...]` is used as the default index when creating a DataFrame.	`[0, 1, 2, 3, 4, 5, 6]` `[100, 101, 102, 103, 104]`
Categorical Index (CategoricalIndex)	It is used when dealing with categorical data. Its labels may repeat, but each one must belong to a fixed set of categories.	`['Red', 'Green', 'Blue', 'Red', 'Blue']` `['Category A', 'Category B', 'Category C', 'Category A', 'Category B']`
Datetime Index (DatetimeIndex)	It is used when working with time series data. It is of type `datetime64`.	`['2023-06-01', '2023-06-02', '2023-06-03', '2023-06-04', '2023-06-05']` `['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05']`

In addition to these, there are other types of indexes:

Multi Index (MultiIndex)	It allows us to have multiple levels of indexing on one or more axes of a DataFrame or a Series object.
Interval Index (IntervalIndex)	It is used to represent intervals or ranges of values in pandas.
Timedelta Index (TimedeltaIndex)	It represents a sequence of time durations. Each element in the index represents a specific duration of time, such as hours, minutes, seconds, or a combination of these.
Period Index (PeriodIndex)	It represents a sequence of time periods. Each element in the index represents a specific time period, such as a day, month, quarter, or year.
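As a rough illustration of the index types listed above, the following sketch builds one small index of each kind. The sample values and variable names are made up for this example; each constructor or helper shown is one of several ways to create that index type.

```python
import pandas as pd

# categorical index: labels drawn from a fixed set of categories
cat_idx = pd.CategoricalIndex(['Red', 'Green', 'Blue', 'Red', 'Blue'])

# datetime index: five consecutive days
date_idx = pd.date_range(start='2023-06-01', periods=5, freq='D')

# multi index: two levels, city and year
multi_idx = pd.MultiIndex.from_tuples(
    [('London', 2022), ('London', 2023), ('Paris', 2023)],
    names=['City', 'Year']
)

# interval index: the intervals (0, 1], (1, 2], (2, 3]
interval_idx = pd.interval_range(start=0, end=3)

# timedelta index: durations of 1, 2, and 3 days
delta_idx = pd.timedelta_range(start='1 day', periods=3, freq='D')

# period index: three consecutive months
period_idx = pd.period_range(start='2023-01', periods=3, freq='M')

# print the type of each index
for idx in [cat_idx, date_idx, multi_idx, interval_idx, delta_idx, period_idx]:
    print(type(idx).__name__)
```

Any of these can be passed as the index argument when creating a DataFrame or Series, provided its length matches the number of rows.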