Pandas CSV
Pandas provides functions for both reading from and writing to CSV files.
CSV stands for Comma-Separated Values. It is a popular file format used for storing tabular data, where each row represents a record, and columns are separated by a delimiter (generally a comma).
For example, contents of a CSV file may look like,
Employee ID,First Name,Last Name,Department,Position,Salary
101,John,Doe,Marketing,Manager,50000
102,Jane,Smith,Sales,Associate,35000
103,Michael,Johnson,Finance,Analyst,45000
104,Emily,Williams,HR,Coordinator,40000
Read CSV Files
In Pandas, the read_csv()
function allows us to read data from a CSV file into a DataFrame. It automatically detects commas and parses the data into appropriate columns.
Here’s an example of reading a CSV file using Pandas:
import pandas as pd
# read csv file
df = pd.read_csv('data.csv', header = 0)
print(df)
Output
Employee ID First Name Last Name Department Position Salary
0 101 John Doe Marketing Manager 50000
1 102 Jane Smith Sales Associate 35000
2 103 Michael Johnson Finance Analyst 45000
3 104 Emily Williams HR Coordinator 40000
The above code reads the contents of the data.csv
file and creates a DataFrame named df
containing the data from the CSV file.
Here, header = 0
sets the first row as the header of the dataframe.
The contents of the data.csv
file are the same as the contents of the CSV file provided in the introduction section.
Note:
data.csv
file should be present in the current directory for the above code to work. If it is in a different directory, you will need to provide the full path to the file.
For example, if the file data.csv
is in the folder csv_files
, the path './csv_files/data.csv'
should be specified as:
df = pd.read_csv('./csv_files/data.csv', header = 0)
read_csv() Syntax
The syntax of read_csv()
in Pandas is:
df = pd.read_csv(
filepath_or_buffer,
sep=',',
header=0,
names=['col1', 'col2', 'col3'],
index_col='col1',
usecols=['col1', 'col3'],
skiprows=[1, 3],
nrows=100,
skipinitialspace=True
)
Here,
filepath_or_buffer
: represents the path or buffer object containing the CSV data to be readsep
(optional): specifies the delimiter used in the CSV fileheader
(optional): indicates the row number to be used as the header or column namesnames
(optional): a list of column names to assign to the DataFrameindex_col
(optional): specifies the column to be used as the index of the DataFrameusecols
(optional): a list of columns to be read and included in the DataFrameskiprows
(optional): used to skip specific rows while reading the CSV filenrows
(optional): sets the maximum number of rows to be read from the CSV fileskipinitialspace
(optional): determines whether to skip any whitespace after the delimiter in each field
These are some commonly used arguments of the read_csv()
function. All of them are optional except filepath_or_buffer
. There are many other optional arguments of read_csv()
.
To learn more, please refer to the official documentation on read_csv()
.
Example: read_csv() With Arguments
Let’s suppose that we have a CSV file named data.csv
with the following contents:
23, 'Hello', 45.6
56, 'World', 78.9
89, 'Foo', 12.3
34, 'Bar', 56.7
Now, let’s load this CSV file into a DataFrame.
import pandas as pd
# read csv file with some arguments
df = pd.read_csv('data.csv', header = None, names = ['col1', 'col2', 'col3'], skiprows = 2)
print(df)
Output
col1 col2 col3
0 89 'Foo' 12.3
1 34 'Bar' 56.7
In this example, we read a CSV file using the read_csv()
method. We specified some arguments while reading the file to load the necessary data in appropriate format.
Here,
header = None
: indicates that the file doesn’t have a header rownames = ['col1', 'col2', 'col3']
: assigns the column names as'col1'
,'col2'
and'col3'
skiprows = 2
: skips the first two rows
Write to CSV Files
We used read_csv()
to read data from a CSV file into a DataFrame.
Pandas also provides the to_csv()
function to write data from a DataFrame into a CSV file.
Let’s see an example.
import pandas as pd
# create a dictionary
data = {'Name': ['John', 'Alice', 'Bob'],
'Age': [25, 30, 35],
'City': ['New York', 'London', 'Paris']}
# create a dataframe from the dictionary
df = pd.DataFrame(data)
# write dataframe to csv file
df.to_csv('output.csv', index=False)
Output
Name,Age,City
John,25,New York
Alice,30,London
Bob,35,Paris
Here, the above code writes the DataFrame df
to the output.csv
file. The index=False
parameter is used to exclude the index labels from the CSV file.
Note: The above code will create a new file named
output.csv
in the current directory (unless a different directory is specified in the file path).
If the file output.csv
already exists in the current directory, running this code will overwrite the existing file with the new contents of the DataFrame.
to_csv() Syntax
The syntax of to_csv()
in Pandas is:
df.to_csv(
path_or_buf,
sep=',',
header=True,
index=False,
mode='w',
encoding=None,
quoting=None,
line_terminator='\n',
)
Here,
path_or_buf
: represents the path or buffer object where the DataFrame will be saved as a CSV filesep
(optional): specifies the delimiter to be used in the output CSV fileheader
(optional): indicates whether to include the header row in the output CSV fileindex
(optional): determines whether to include the index column in the output CSV filemode
(optional): specifies the mode in which the output file will be openedencoding
(optional): sets the character encoding to be used when writing the CSV filequoting
(optional): determines the quoting behavior for fields that contain special charactersline_terminator
(optional): specifies the character sequence used to terminate lines in the output CSV file
These are some commonly used arguments of the to_csv()
function. All of them are optional except path_or_buf
. There are many other optional arguments that can be used with to_csv()
.
To learn more, please refer to the official documentation to_csv()
.
Example: to_csv() With Arguments
import pandas as pd
# create dataframe
data = {'Name': ['Tom', 'Nick', 'John', 'Tom'],
'Age': [20, 21, 19, 18],
'City': ['New York', 'London', 'Paris', 'Berlin']}
df = pd.DataFrame(data)
# write to csv file
df.to_csv('output.csv', sep = ';', index = False, header = True)
Output
Name;Age;City
Tom;20;New York
Nick;21;London
John;19;Paris
Tom;18;Berlin
In this example, we wrote a DataFrame to the CSV file 'output.csv'
using the to_csv()
method. We used some arguments to write necessary data to the file in required format.
Here,
sep = ';'
: specifies the delimiter as';'
index = False
: instructs not to include the index column in the output fileheader = True
: instructs to include the column names as the header in the output file