Jul 9, 2023

Pandas DataFrame Manipulation

DataFrame manipulation in Pandas means modifying an existing DataFrame. Some common DataFrame manipulation operations are:

Adding rows/columns
Removing rows/columns
Renaming rows/columns

Add a New Column to a Pandas DataFrame

We can add a new column to an existing Pandas DataFrame by assigning a list to a new column label. The list must contain one value per row. For example,

import pandas as pd

# define a dictionary containing student data
data = {
    'Name': ['John', 'Emma', 'Michael', 'Sophia'],
    'Height': [5.5, 6.0, 5.8, 5.3],
    'Qualification': ['BSc', 'BBA', 'MBA', 'BSc']
}

# convert the dictionary into a DataFrame
df = pd.DataFrame(data)

# declare a new list
address = ['New York', 'London', 'Sydney', 'Toronto']

# assign the list as a column
df['Address'] = address

print(df)

Output

      Name  Height Qualification   Address
0     John     5.5           BSc  New York
1     Emma     6.0           BBA    London
2  Michael     5.8           MBA    Sydney
3   Sophia     5.3           BSc   Toronto

In this example, we assign the list address to the Address column in the DataFrame.
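Direct assignment is not the only option. As a minimal sketch, assign() returns a new DataFrame with the extra column, while insert() adds the column in place at a chosen position. The Country and ID columns and their values are made up for illustration and are not part of the example above.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma', 'Michael', 'Sophia'],
    'Height': [5.5, 6.0, 5.8, 5.3],
})

# assign() returns a new DataFrame and leaves df unchanged
# ('Country' and its values are illustrative assumptions)
df_with_country = df.assign(Country=['USA', 'UK', 'Australia', 'Canada'])

# insert() modifies df in place and puts the new column at position 0
# ('ID' and its values are illustrative assumptions)
df.insert(0, 'ID', [101, 102, 103, 104])

print(df_with_country.columns.tolist())
print(df.columns.tolist())

# a list with the wrong number of values is rejected
try:
    df['Address'] = ['New York', 'London']
except ValueError as err:
    print("ValueError:", err)
```

assign() is convenient when the original DataFrame should stay untouched, and insert() when the column order matters.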

Add a New Row to a Pandas DataFrame

Adding rows to a DataFrame is not quite as straightforward as adding columns in Pandas. One way is to use the .loc indexer: assigning to an index label that does not exist yet adds a new row.

For example,

import pandas as pd

# define a dictionary containing student data
data = {'Name': ['John', 'Emma', 'Michael', 'Sophia'],
        'Height': [5.5, 6.0, 5.8, 5.3],
        'Qualification': ['BSc', 'BBA', 'MBA', 'BSc']}

# convert the dictionary into a DataFrame
df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)
print()

# add a new row
df.loc[len(df.index)] = ['Amy', 5.2, 'BIT']
df.loc[len(df.index)] = ['Ahm', 5.9, 'BIT']

print("Modified DataFrame:")
print(df)

Output

Original DataFrame:
      Name  Height Qualification
0     John     5.5           BSc
1     Emma     6.0           BBA
2  Michael     5.8           MBA
3   Sophia     5.3           BSc

Modified DataFrame:
      Name  Height Qualification
0     John     5.5           BSc
1     Emma     6.0           BBA
2  Michael     5.8           MBA
3   Sophia     5.3           BSc
4      Amy     5.2           BIT
5      Ahm     5.9           BIT

In this example, we added two rows, ['Amy', 5.2, 'BIT'] and ['Ahm', 5.9, 'BIT'], to the df DataFrame.

Here,

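len(df.index): returns the number of rows in df, which is the next unused label when the index is the default 0, 1, 2, ...
df.loc[...]: accesses the row whose index label is given in the square brackets; assigning to a label that does not exist yet adds a new row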
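The len(df.index) trick relies on the index being the default 0, 1, 2, ... sequence. The sketch below, which uses a shortened version of the student data, shows what can happen after a row has been dropped and how pd.concat() with ignore_index=True appends a row regardless of the existing labels. The variable names overwritten, new_rows and appended are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma', 'Michael'],
    'Height': [5.5, 6.0, 5.8],
})

# drop the first row, so the remaining labels are 1 and 2
df = df.drop(0)

# len(df.index) is now 2, which is an existing label:
# this assignment overwrites Michael's row instead of adding a new one
overwritten = df.copy()
overwritten.loc[len(overwritten.index)] = ['Amy', 5.2]
print(overwritten)

# pd.concat() with ignore_index=True appends the row and renumbers the index
new_rows = pd.DataFrame({'Name': ['Amy'], 'Height': [5.2]})
appended = pd.concat([df, new_rows], ignore_index=True)
print(appended)
```

pd.concat() returns a new DataFrame, so the result has to be assigned to a variable to be kept.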

Remove Rows/Columns from a Pandas DataFrame

We can use drop() to delete rows and columns from a DataFrame.

Example: Delete Rows

import pandas as pd

# create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Felipe', 'Rita'],
        'Age': [25, 30, 35, 40, 22, 29],
        'City': ['New York', 'London', 'Paris', 'Tokyo', 'Bogota', 'Bangalore']}
df = pd.DataFrame(data)

# display the original DataFrame
print("Original DataFrame:")
print(df)
print()

# delete row with index 4
df.drop(4, axis=0, inplace=True)

# delete row with index 5
df.drop(index=5, inplace=True)

# delete rows with index 1 and 3
df.drop([1, 3], axis=0, inplace=True)

# display the modified DataFrame after deleting rows
print("Modified DataFrame:")
print(df)

Output

Original DataFrame:
      Name  Age       City
0    Alice   25   New York
1      Bob   30     London
2  Charlie   35      Paris
3    David   40      Tokyo
4   Felipe   22     Bogota
5     Rita   29  Bangalore

Modified DataFrame:
      Name  Age      City
0    Alice   25  New York
2  Charlie   35     Paris

In this example, we deleted single rows by passing 4 as the labels argument (with axis=0) and by using the index=5 keyword. We also deleted multiple rows at once by passing the list [1, 3] as labels.

Here,

axis=0: indicates that rows are to be deleted
inplace=True: indicates that the changes are to be made in the original DataFrame
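As a minimal sketch of two related behaviours: without inplace=True, drop() leaves the original DataFrame alone and returns a modified copy, and errors='ignore' skips labels that do not exist instead of raising a KeyError. The variable names trimmed and safe, and the missing label 9, are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
})

# without inplace=True, drop() returns a new DataFrame and df keeps all rows
trimmed = df.drop(index=[1, 3])
print(len(df), len(trimmed))

# label 9 does not exist; errors='ignore' skips it instead of raising KeyError
safe = df.drop(index=[0, 9], errors='ignore')
print(safe.index.tolist())
```

Returning a copy makes it easier to keep the original data around for comparison.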

Example: Delete Columns

import pandas as pd

# create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        'City': ['New York', 'London', 'Paris', 'Tokyo'],
        'Height': ['165', '178', '185', '171'],
        'Profession': ['Engineer', 'Entrepreneur', 'Unemployed', 'Actor'],
        'Marital Status': ['Single', 'Married', 'Divorced', 'Engaged']}
df = pd.DataFrame(data)

# display the original DataFrame
print("Original DataFrame:")
print(df)
print()

# delete age column
df.drop('Age', axis=1, inplace=True)

# delete marital status column
df.drop(columns='Marital Status', inplace=True)

# delete height and profession columns
df.drop(['Height', 'Profession'], axis=1, inplace=True)

# display the modified DataFrame after deleting columns
print("Modified DataFrame:")
print(df)

Output

Original DataFrame:
      Name  Age      City Height    Profession Marital Status
0    Alice   25  New York    165      Engineer         Single
1      Bob   30    London    178  Entrepreneur        Married
2  Charlie   35     Paris    185    Unemployed       Divorced
3    David   40     Tokyo    171         Actor        Engaged

Modified DataFrame:
      Name      City
0    Alice  New York
1      Bob    London
2  Charlie     Paris
3    David     Tokyo

In this example, we deleted single columns by passing 'Age' as the labels argument (with axis=1) and by using the columns='Marital Status' keyword. We also deleted multiple columns at once by passing the list ['Height', 'Profession'] as labels.

Here,

axis=1: indicates that columns are to be deleted
inplace=True: indicates that the changes are to be made in the original DataFrame
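Rows and columns can also be removed in a single call by giving both the index and columns keywords, and pop() removes one column while returning it as a Series. The following is a small sketch under those assumptions; the variable names result and ages are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'London', 'Paris', 'Tokyo'],
})

# drop row 1 and the 'City' column in one call; df itself is not changed
result = df.drop(index=[1], columns=['City'])
print(result)

# pop() removes the 'Age' column from df in place and returns it as a Series
ages = df.pop('Age')
print(ages.tolist())
print(df.columns.tolist())
```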

Rename Labels in a DataFrame

We can rename column and row labels in a Pandas DataFrame using the rename() method.

Example: Rename Columns

import pandas as pd

# create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        'City': ['New York', 'London', 'Paris', 'Tokyo']}
df = pd.DataFrame(data)

# display the original DataFrame
print("Original DataFrame:")
print(df)
print()

# rename column 'Name' to 'First_Name'
df.rename(columns={'Name': 'First_Name'}, inplace=True)

# rename columns 'Age' and 'City'
df.rename(mapper={'Age': 'Number', 'City': 'Address'}, axis=1, inplace=True)

# display the DataFrame after renaming columns
print("Modified DataFrame:")
print(df)

Output

Original DataFrame:
      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   35     Paris
3    David   40     Tokyo

Modified DataFrame:
  First_Name  Number   Address
0      Alice     25  New York
1        Bob     30    London
2    Charlie     35     Paris
3      David     40     Tokyo

In this example, we renamed a single column using the columns={'Name': 'First_Name'} argument. We also renamed multiple columns by passing mapper={'Age': 'Number', 'City': 'Address'} together with axis=1.

Here,

axis=1: indicates that columns are to be renamed
inplace=True: indicates that the changes are to be made in the original DataFrame
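Besides a dictionary, rename() also accepts a function that is applied to every label. The sketch below additionally shows that keys missing from the DataFrame are silently ignored unless errors='raise' is passed. The 'Salary' and 'Pay' names are hypothetical and chosen only because no such column exists.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30],
})

# a function is applied to every column label
lowered = df.rename(columns=str.lower)
print(lowered.columns.tolist())

# 'Salary' is not a column, so by default nothing is renamed and no error occurs
unchanged = df.rename(columns={'Salary': 'Pay'})
print(unchanged.columns.tolist())

# errors='raise' turns the missing label into a KeyError
raised = False
try:
    df.rename(columns={'Salary': 'Pay'}, errors='raise')
except KeyError as err:
    raised = True
    print("KeyError:", err)
```

Passing errors='raise' can help catch misspelled column names early.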

Example: Rename Row Labels

import pandas as pd

# create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        'City': ['New York', 'London', 'Paris', 'Tokyo']}
df = pd.DataFrame(data)

# display the original DataFrame
print("Original DataFrame:")
print(df)
print()

# rename one index label
df.rename(index={0: 7}, inplace=True)

# rename multiple index labels
df.rename(mapper={1: 10, 2: 100}, axis=0, inplace=True)

# display the DataFrame after renaming row labels
print("Modified DataFrame:")
print(df)

Output

Original DataFrame:
      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   35     Paris
3    David   40     Tokyo

Modified DataFrame:
        Name  Age      City
7      Alice   25  New York
10       Bob   30    London
100  Charlie   35     Paris
3      David   40     Tokyo

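In this example, we renamed a single row label using the index={0: 7} argument. We also renamed multiple row labels by passing mapper={1: 10, 2: 100} together with axis=0. Labels that do not appear in the mapping, such as 3, are left unchanged.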

Here,

axis=0: indicates that rows are to be renamed
inplace=True: indicates that the changes are to be made in the original DataFrame
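Row labels can be renamed with a function as well. As a brief sketch, the lambda below prefixes every label with 'row_', and reset_index(drop=True) restores the default 0, 1, 2, ... numbering afterwards. The 'row_' prefix and the variable names are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
})

# the function is called once per index label
relabeled = df.rename(index=lambda label: f"row_{label}")
print(relabeled.index.tolist())

# reset_index(drop=True) discards the custom labels and renumbers from 0
restored = relabeled.reset_index(drop=True)
print(restored.index.tolist())
```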