Pandas Read Text File
Pandas offers several methods to read plain text (.txt
) files and convert them to Pandas DataFrame.
We can read text files in Pandas in the following ways:
- Using the
read_fwf()
function - Using the
read_table()
function - Using the
read_csv()
function
Using the above methods, let’s read a sample text file named data.txt
with the following content.
John 25 170
Alice 28 165
Bob 30 180
Read Text Using read_fwf()
The acronym fwf
in the read_fwf()
function in Pandas stands for fixed-width lines, and it is used to load DataFrames from files such as text files.
The text file should be separated into columns of fixed-width for it to be read using read_fwf()
.
Syntax
The syntax of read_fwf()
in Pandas is:
pandas.read_fwf(
filepath_or_buffer,
colspecs = [],
widths=None,
infer_nrows=100,
**kwds
)
filepath_or_buffer
: specifies the file path or a file-like object from which the data will be readcolspecs
: defines the column positions or ranges in the filewidths
(optional): an alternative to colspecs and can be used to define the width of each column in the fileinfer_nrows
(optional): specifies the number of rows to be used for inferring the column widths if widths is not explicitly provided**kwds
(optional): allows additional keyword arguments to be passed for further customization
Example: read_fwf()
The content of the data.txt
file is the same as mentioned in the introduction section.
import pandas as pd
# read the fixed-width file
df = pd.read_fwf('data.txt', colspecs=[(0, 5), (6, 10), (11, 15)], names = ['Name', 'Age', 'Height'])
print(df)
Output
Name Age Height
0 John 25 70
1 Alice 28 65
2 Bob 30 80
In the above example, we read a text file 'data.txt'
using read_fwf()
.
Here,
colspecs = [(0,5), (6,10), (11,15)]
: specifies the position of each column in the text filenames = ['Name', 'Age', 'Height']
: specifies the names to be assigned to each column
The names
argument is an example of a keyword argument in **kwds
.
Read Text Using read_table()
The read_table()
function in pandas is used to read tabular data from a file or a URL. It is a convenient way to read data from delimited text files.
Syntax
The syntax of read_table()
in Pandas is:
df = pd.read_table(
filepath_or_buffer,
sep='\t',
header='infer',
names=None
)
Here,
filepath_or_buffer
: specifies the path to the file to be read or a URL pointing to the filesep
: specifies the separator or delimiter used in the file to separate columnsheader
: specifies the row number (0-indexed) to be used as the column namesnames
: a list of column names for the DataFrame
To learn more, please refer to the official documentation on read_table()
.
Example: read_table()
The content of the data.txt
file is the same as mentioned in the introduction section.
import pandas as pd
# read the file using read_table()
df = pd.read_table("data.txt", sep="\s+", names=['Name', 'Age', 'Height'])
print(df)
Output
Name Age Height
0 John 25 170
1 Alice 28 165
2 Bob 30 180
Here, the sep="\s+"
parameter is used to specify that the data is separated by one or more whitespace characters.
The separator is selected based on the format of the text file. For example, if the text file contained comma separated values, we would have used sep = ','
.
Read Text Using read_csv()
The read_csv()
function in Pandas is used to read csv files.
It can also be used to read text files as read_csv()
allows use of other separators like white spaces, tabs, semicolons etc in addition to commas.
Syntax
The syntax of read_csv()
in Pandas is given below:
df = pd.read_csv(
filepath_or_buffer,
sep=',',
header=0,
names=['col1', 'col2', 'col3'],
index_col='col1'
)
Here,
filepath_or_buffer
: represents the path or buffer object containing the CSV data to be readsep
(optional): specifies the delimiter used in the CSV fileheader
(optional): indicates the row number to be used as the header or column namesnames
(optional): a list of column names to assign to the DataFrameindex_col
(optional): specifies the column to be used as the index of the DataFrame
These are some commonly used arguments of the read_csv()
function. All of them are optional except filepath_or_buffer
. There are many other optional arguments that can be used with read_csv()
.
Example: read_csv()
The content of the data.txt
file is the same as mentioned in the introduction section. The values in the file are thus separated by whitespaces.
import pandas as pd
# read the file using read_table()
df = pd.read_csv("data.txt", sep="\s+", header = None, names=['Name', 'Age', 'Height'])
print(df)
Output
Name Age Height
0 John 25 170
1 Alice 28 165
2 Bob 30 180
Here, header = None
indicates that none of the rows in the text file is a header row.
Also, notice the use of sep="\s+"
which indicates that the values in the files are separated by one or more spaces.