Pandas DataFrame Manipulation
DataFrame manipulation in Pandas means modifying an existing DataFrame. Some common DataFrame manipulation operations are:
- Adding rows/columns
- Removing rows/columns
- Renaming rows/columns
Add a New Column to a Pandas DataFrame
We can add a new column to an existing Pandas DataFrame by assigning a list to a new column label. The list must contain one value for each row. For example,
import pandas as pd
# define a dictionary containing student data
data = {
    'Name': ['John', 'Emma', 'Michael', 'Sophia'],
    'Height': [5.5, 6.0, 5.8, 5.3],
    'Qualification': ['BSc', 'BBA', 'MBA', 'BSc']
}
# convert the dictionary into a DataFrame
df = pd.DataFrame(data)
# declare a new list
address = ['New York', 'London', 'Sydney', 'Toronto']
# assign the list as a column
df['Address'] = address
print(df)
Output
      Name  Height Qualification   Address
0     John     5.5           BSc  New York
1     Emma     6.0           BBA    London
2  Michael     5.8           MBA    Sydney
3   Sophia     5.3           BSc   Toronto
In this example, we assign the list address to the Address column of the DataFrame.
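Plain assignment always appends the new column at the end and changes the DataFrame in place. As a minimal sketch of two related options, the snippet below uses insert() to place a column at a chosen position and assign() to get a new DataFrame while leaving the original untouched. It also shows that a list with the wrong number of values is rejected. The Age values and the variable names df_with_qual and length_error are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma', 'Michael', 'Sophia'],
    'Height': [5.5, 6.0, 5.8, 5.3],
})

# insert() puts the new column at a chosen position (here, position 1)
# and modifies df in place
df.insert(1, 'Address', ['New York', 'London', 'Sydney', 'Toronto'])

# assign() leaves df unchanged and returns a new DataFrame with the extra column
df_with_qual = df.assign(Qualification=['BSc', 'BBA', 'MBA', 'BSc'])

# a list whose length differs from the number of rows raises ValueError
# (the Age values are illustrative only)
try:
    df['Age'] = [20, 21]
    length_error = None
except ValueError as exc:
    length_error = exc

print(df_with_qual)
print("Error type on length mismatch:", type(length_error).__name__)
```

Whether to use plain assignment, insert(), or assign() mostly depends on whether the column position matters and whether the original DataFrame should stay unchanged.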
Add a New Row to a Pandas DataFrame
Adding rows to a DataFrame is not as straightforward as adding columns. We can use the .loc property to add a new row by assigning values to an index label that does not exist yet.
For example,
import pandas as pd
# define a dictionary containing student data
data = {'Name': ['John', 'Emma', 'Michael', 'Sophia'],
        'Height': [5.5, 6.0, 5.8, 5.3],
        'Qualification': ['BSc', 'BBA', 'MBA', 'BSc']}
# convert the dictionary into a DataFrame
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
print()
# add a new row
df.loc[len(df.index)] = ['Amy', 5.2, 'BIT']
df.loc[len(df.index)] = ['Ahm', 5.9, 'BIT']
print("Modified DataFrame:")
print(df)
Output
Original DataFrame:
      Name  Height Qualification
0     John     5.5           BSc
1     Emma     6.0           BBA
2  Michael     5.8           MBA
3   Sophia     5.3           BSc

Modified DataFrame:
      Name  Height Qualification
0     John     5.5           BSc
1     Emma     6.0           BBA
2  Michael     5.8           MBA
3   Sophia     5.3           BSc
4      Amy     5.2           BIT
5      Ahm     5.9           BIT
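In this example, we added two rows, ['Amy', 5.2, 'BIT'] and ['Ahm', 5.9, 'BIT'], to the df DataFrame.
Here,
- len(df.index): returns the number of rows in df. With the default index 0, 1, 2, ..., this is the next unused label.
- df.loc[...]: accesses the row whose index label is given in the square brackets. Assigning to a label that does not exist creates a new row.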
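When several rows need to be added at once, pd.concat() is a common alternative to repeated .loc assignments. The sketch below is one possible approach, not the only one. It also illustrates a pitfall of the len(df.index) idiom: if the index is not the default 0, 1, 2, ..., the computed label may already exist, and the assignment overwrites that row instead of adding one. The names new_rows, combined, and gapped, and the row ['Zoe', 5.4, 'MSc'], are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma'],
    'Height': [5.5, 6.0],
    'Qualification': ['BSc', 'BBA'],
})

# collect the new rows in their own DataFrame with the same column names
new_rows = pd.DataFrame({
    'Name': ['Amy', 'Ahm'],
    'Height': [5.2, 5.9],
    'Qualification': ['BIT', 'BIT'],
})

# concat() returns a new DataFrame; ignore_index=True renumbers rows from 0
combined = pd.concat([df, new_rows], ignore_index=True)
print(combined)

# pitfall: after dropping row 0, the only remaining label is 1,
# and len(gapped.index) is also 1, so the existing row is overwritten
gapped = df.drop(0)
gapped.loc[len(gapped.index)] = ['Zoe', 5.4, 'MSc']  # illustrative values
print(gapped)
```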
Remove Rows/Columns from a Pandas DataFrame
We can use the drop() method to delete rows and columns from a DataFrame.
Example: Delete Rows
import pandas as pd
# create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Felipe', 'Rita'],
        'Age': [25, 30, 35, 40, 22, 29],
        'City': ['New York', 'London', 'Paris', 'Tokyo', 'Bogota', 'Banglore']}
df = pd.DataFrame(data)
# display the original DataFrame
print("Original DataFrame:")
print(df)
print()
# delete row with index 4
df.drop(4, axis=0, inplace=True)
# delete row with index 5
df.drop(index=5, inplace=True)
# delete rows with index 1 and 3
df.drop([1, 3], axis=0, inplace=True)
# display the modified DataFrame after deleting rows
print("Modified DataFrame:")
print(df)
Output
Original DataFrame:
      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   35     Paris
3    David   40     Tokyo
4   Felipe   22    Bogota
5     Rita   29  Banglore

Modified DataFrame:
      Name  Age      City
0    Alice   25  New York
2  Charlie   35     Paris
In this example, we deleted single rows by passing the label 4 as the first argument (labels) with axis=0, and by using the index=5 parameter. We also deleted multiple rows at once by passing the list [1, 3] as labels.
Here,
- axis=0: indicates that rows are to be deleted
- inplace=True: indicates that the changes are to be made in the original DataFrame
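Without inplace=True, drop() leaves the original DataFrame alone and returns a modified copy. The following sketch shows that behavior. It also shows that the remaining rows keep their old index labels unless we renumber them, here with reset_index(drop=True). The variable names trimmed and renumbered are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
})

# without inplace=True, drop() returns a new DataFrame and df is unchanged
trimmed = df.drop([1, 3])

# the surviving rows keep labels 0 and 2; reset_index(drop=True)
# renumbers them from 0 and discards the old labels
renumbered = trimmed.reset_index(drop=True)

print(trimmed)
print(renumbered)
```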
Example: Delete columns
import pandas as pd
# create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        'City': ['New York', 'London', 'Paris', 'Tokyo'],
        'Height': ['165', '178', '185', '171'],
        'Profession': ['Engineer', 'Entrepreneur', 'Unemployed', 'Actor'],
        'Marital Status': ['Single', 'Married', 'Divorced', 'Engaged']}
df = pd.DataFrame(data)
# display the original DataFrame
print("Original DataFrame:")
print(df)
print()
# delete age column
df.drop('Age', axis=1, inplace=True)
# delete marital status column
df.drop(columns='Marital Status', inplace=True)
# delete height and profession columns
df.drop(['Height', 'Profession'], axis=1, inplace=True)
# display the modified DataFrame after deleting columns
print("Modified DataFrame:")
print(df)
Output
Original DataFrame:
      Name  Age      City Height    Profession Marital Status
0    Alice   25  New York    165      Engineer         Single
1      Bob   30    London    178  Entrepreneur        Married
2  Charlie   35     Paris    185    Unemployed       Divorced
3    David   40     Tokyo    171         Actor        Engaged

Modified DataFrame:
      Name      City
0    Alice  New York
1      Bob    London
2  Charlie     Paris
3    David     Tokyo
In this example, we deleted single columns by passing the label 'Age' as the first argument (labels) with axis=1, and by using the columns='Marital Status' parameter. We also deleted multiple columns at once by passing the list ['Height', 'Profession'] as labels.
Here,
- axis=1: indicates that columns are to be deleted
- inplace=True: indicates that the changes are to be made in the original DataFrame
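By default, drop() raises a KeyError when a requested label does not exist. As a small sketch, the snippet below first triggers that error and then uses the errors='ignore' option to skip missing labels. The 'Salary' column is a made-up label that is deliberately absent from the data, and missing_error and slim are illustrative variable names.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30],
    'City': ['New York', 'London'],
})

# 'Salary' is a hypothetical column that does not exist in df,
# so drop() raises KeyError by default
try:
    df.drop(columns='Salary')
    missing_error = None
except KeyError as exc:
    missing_error = exc

# errors='ignore' skips missing labels and drops only the existing ones
slim = df.drop(columns=['Age', 'Salary'], errors='ignore')

print("Error type for a missing label:", type(missing_error).__name__)
print(slim)
```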
Rename Labels in a DataFrame
We can rename the column labels and row (index) labels of a Pandas DataFrame using the rename() method.
Example: Rename Columns
import pandas as pd
# create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        'City': ['New York', 'London', 'Paris', 'Tokyo']}
df = pd.DataFrame(data)
# display the original DataFrame
print("Original DataFrame:")
print(df)
print()
# rename column 'Name' to 'First_Name'
df.rename(columns={'Name': 'First_Name'}, inplace=True)
# rename columns 'Age' and 'City'
df.rename(mapper={'Age': 'Number', 'City': 'Address'}, axis=1, inplace=True)
# display the DataFrame after renaming columns
print("Modified DataFrame:")
print(df)
Output
Original DataFrame:
      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   35     Paris
3    David   40     Tokyo

Modified DataFrame:
  First_Name  Number   Address
0      Alice      25  New York
1        Bob      30    London
2    Charlie      35     Paris
3      David      40     Tokyo
In this example, we renamed a single column using the columns={'Name': 'First_Name'} parameter. We also renamed multiple columns by passing mapper={'Age': 'Number', 'City': 'Address'} together with axis=1.
Here,
- axis=1: indicates that columns are to be renamed
- inplace=True: indicates that the changes are to be made in the original DataFrame
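Besides a dictionary, rename() also accepts a function, which is applied to every label. The sketch below lowercases all column names this way. It also shows that dictionary keys that match no column are ignored by default, so a typo in a column name fails silently. The 'Country' key is a made-up label that is not in the data, and lowered and unchanged are illustrative variable names.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30],
    'City': ['New York', 'London'],
})

# a function passed as the mapper is applied to every column label
lowered = df.rename(columns=str.lower)

# 'Country' is a hypothetical label that matches no column;
# by default rename() ignores it and returns the labels unchanged
unchanged = df.rename(columns={'Country': 'Nation'})

print(lowered.columns.tolist())
print(unchanged.columns.tolist())
```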
Example: Rename Row Labels
import pandas as pd
# create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        'City': ['New York', 'London', 'Paris', 'Tokyo']}
df = pd.DataFrame(data)
# display the original DataFrame
print("Original DataFrame:")
print(df)
print()
# rename one index label
df.rename(index={0: 7}, inplace=True)
# rename multiple index labels
df.rename(mapper={1: 10, 2: 100}, axis=0, inplace=True)
# display the DataFrame after renaming row labels
print("Modified DataFrame:")
print(df)
Output
Original DataFrame:
      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   35     Paris
3    David   40     Tokyo

Modified DataFrame:
        Name  Age      City
7      Alice   25  New York
10       Bob   30    London
100  Charlie   35     Paris
3      David   40     Tokyo
In this example, we renamed a single row label using the index={0: 7} parameter. We also renamed multiple row labels by passing mapper={1: 10, 2: 100} together with axis=0. The label 3 is not in either mapping, so it stays unchanged.
Here,
- axis=0: indicates that rows are to be renamed
- inplace=True: indicates that the changes are to be made in the original DataFrame
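Row labels can also be renamed with a function instead of a dictionary. As a brief sketch, the snippet below adds a prefix to every index label and, because inplace=True is not used, gets the result back as a new DataFrame. The "row_" prefix and the variable name prefixed are chosen purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
})

# the function is called once per index label; the "row_" prefix is illustrative
prefixed = df.rename(index=lambda label: f"row_{label}")

# df keeps its original labels because inplace=True was not passed
print(df.index.tolist())
print(prefixed.index.tolist())
```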