Pandas DataFrame
A DataFrame is like a table where the data is organized in rows and columns. It is a two-dimensional data structure like a two-dimensional array. For example,
     Country   Capital  Population
0     Canada    Ottawa    37742154
1  Australia  Canberra    25499884
2         UK    London    67886011
3     Brazil  Brasília   212559417
Here,
- Country, Capital, and Population are the column names.
- Each row represents a record, with the index value on the left. The index values are auto-assigned starting from 0.
- Each column contains data of the same type. For instance, Country and Capital contain strings, and Population contains integers.
The DataFrame is similar to a table in a SQL database, or a spreadsheet in Excel. It is designed to manage ordered and unordered datasets in Python.
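The properties described above can be checked directly. The following is a minimal sketch that rebuilds the example table; the variable name countries is chosen here for illustration only.

```python
import pandas as pd

# Rebuild the example table (the name `countries` is illustrative)
countries = pd.DataFrame({
    'Country': ['Canada', 'Australia', 'UK', 'Brazil'],
    'Capital': ['Ottawa', 'Canberra', 'London', 'Brasília'],
    'Population': [37742154, 25499884, 67886011, 212559417],
})

# Number of rows and columns
print(countries.shape)        # (4, 3)

# Index values are auto-assigned starting from 0
print(list(countries.index))  # [0, 1, 2, 3]

# Each column has a single data type
print(countries.dtypes)
```

The exact dtype reported for the string columns may differ between pandas versions, so no output is shown for the last line.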
Create a Pandas DataFrame
We can create a Pandas DataFrame in the following ways:
- Using Python Dictionary
- Using Python List
- From a File
- Creating an Empty DataFrame
Pandas DataFrame Using Python Dictionary
We can create a DataFrame from a dictionary by passing it to the DataFrame() function. For example,
import pandas as pd
# create a dictionary
data = {
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 35],
    'City': ['New York', 'London', 'Paris']
}
# create a dataframe from the dictionary
df = pd.DataFrame(data)
print(df)
Output
    Name  Age      City
0   John   25  New York
1  Alice   30    London
2    Bob   35     Paris
In this example, we created a dictionary called data that contains the column names (Name, Age, City) as keys, and lists of values as their respective values.
We then used the pd.DataFrame() function to convert the dictionary into a DataFrame called df.
Pandas DataFrame Using Python List
We can also create a DataFrame using a two-dimensional list. For example,
import pandas as pd
# create a two-dimensional list
data = [['John', 25, 'New York'],
        ['Alice', 30, 'London'],
        ['Bob', 35, 'Paris']]
# create a DataFrame from the list
df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])
print(df)
Output
    Name  Age      City
0   John   25  New York
1  Alice   30    London
2    Bob   35     Paris
In this example, we created a two-dimensional list called data containing nested lists.
The DataFrame() function converts the 2-D list to a DataFrame. Each nested list behaves like a row of data in the DataFrame.
The columns argument provides a name to each column of the DataFrame.
Note: We can also create a DataFrame using a NumPy array in a similar way.
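As a rough sketch of the NumPy case, a two-dimensional array can be passed to pd.DataFrame() just like a nested list. The array values and the column names A, B, and C below are made up for illustration.

```python
import numpy as np
import pandas as pd

# create a two-dimensional NumPy array (illustrative values)
array = np.array([[1, 2, 3],
                  [4, 5, 6]])

# each row of the array becomes a row of the DataFrame
df_from_array = pd.DataFrame(array, columns=['A', 'B', 'C'])

print(df_from_array)
```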
Pandas DataFrame From a File
Another common way to create a DataFrame is by loading data from a CSV (Comma-Separated Values) file. For example,
import pandas as pd
# load data from a CSV file
df = pd.read_csv('data.csv')
print(df)
In this example, we used the read_csv() function, which reads the CSV file data.csv and automatically creates a DataFrame object df containing data from the CSV file.
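The example above needs a data.csv file to exist on disk. To try read_csv() without one, the sketch below feeds it CSV text held in memory through io.StringIO, since read_csv() also accepts file-like objects. The CSV contents and the names csv_text and df_csv are assumptions made for this illustration.

```python
import io
import pandas as pd

# CSV text standing in for the contents of a file (illustrative data)
csv_text = (
    "Name,Age,City\n"
    "John,25,New York\n"
    "Alice,30,London\n"
    "Bob,35,Paris\n"
)

# read_csv() accepts a file path or a file-like object
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_csv)
```

By default, the first line of the CSV data is used for the column names.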
Note: We can also create a DataFrame using other file types like JSON, Excel spreadsheet, SQL database, etc. The methods to read different file types are listed below:
- JSON - read_json()
- Excel spreadsheet - read_excel()
- SQL - read_sql()
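These functions follow the same pattern as read_csv(). As one possible sketch, here is read_json() applied to JSON text held in memory; the JSON contents and the names json_text and df_json are invented for illustration, and orient='records' is passed because the data is a list of row objects.

```python
import io
import pandas as pd

# JSON text standing in for the contents of a file (illustrative data)
json_text = '[{"Name": "John", "Age": 25}, {"Name": "Alice", "Age": 30}]'

# each JSON object in the list becomes one row
df_json = pd.read_json(io.StringIO(json_text), orient='records')

print(df_json)
```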
Create an Empty DataFrame
Sometimes we may want to create an empty DataFrame and then add data later. For example,
import pandas as pd
# create an empty DataFrame
df = pd.DataFrame()
print(df)
Output
Empty DataFrame
Columns: []
Index: []
In this example, we have created an empty DataFrame by calling pd.DataFrame() without any arguments.
Here, both the Columns and Index lists are empty in the DataFrame. The DataFrame has no data, but it can be used as a container to store and manipulate data later.
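One way to add data later is to assign lists to new columns, as in the sketch below. The names df_empty and was_empty and the sample values are illustrative only.

```python
import pandas as pd

# create an empty DataFrame
df_empty = pd.DataFrame()

# the empty attribute is True while the DataFrame holds no data
was_empty = df_empty.empty
print(was_empty)  # True

# add data later by assigning lists of equal length to new columns
df_empty['Name'] = ['John', 'Alice']
df_empty['Age'] = [25, 30]

print(df_empty)
```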