Pandas Array
An array allows us to store a collection of multiple values in a single data structure.
A Pandas array is a one-dimensional container for values of the same data type. It stores typed data more compactly than a Python list and, unlike a plain NumPy array, it also supports Pandas-specific data types such as nullable integers.
Create Array Using Python List
We can create a Pandas array from a Python list. For example,
import pandas as pd
# create a list named data
data = [2, 4, 6, 8]
# create Pandas array using data
array1 = pd.array(data)
print(array1)
Output
<IntegerArray>
[2, 4, 6, 8]
Length: 4, dtype: Int64
In the above example, we first imported the pandas library as pd and created a list named data. Notice the line
array1 = pd.array(data)
Here, we have created an array by passing data as an argument to the pd.array() function.
Instead of first storing the list in a variable, we can pass the list directly to the pd.array() function. For example,
import pandas as pd
# create Pandas array by passing list directly
array1 = pd.array([2, 4, 6, 8])
print(array1)
Output
<IntegerArray>
[2, 4, 6, 8]
Length: 4, dtype: Int64
This code gives the same output as the previous code.
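The equivalence of the two approaches can also be checked in code rather than by comparing printed output. The following is a minimal sketch; the variable names from_variable and from_literal are chosen here for illustration and do not appear in the examples above.

```python
import pandas as pd

data = [2, 4, 6, 8]

# Build the array from a list variable and from a list literal
from_variable = pd.array(data)
from_literal = pd.array([2, 4, 6, 8])

# equals() returns True when both arrays have the same dtype and the same values
same = from_variable.equals(from_literal)
print(same)

# A Pandas array supports len() and positional indexing like a list
print(len(from_literal), from_literal[0])
```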
Explicitly Specify Array Elements Data Type
In Pandas, we can explicitly specify the data type of array elements. For example,
import pandas as pd
# creating a pandas.array of integers
int_array = pd.array([1, 2, 3, 4, 5], dtype='int')
print(int_array)
print()
# creating a pandas.array of floating-point numbers
float_array = pd.array([1.1, 2.2, 3.3, 4.4, 5.5], dtype='float')
print(float_array)
print()
# creating a pandas.array of strings
string_array = pd.array(['apple', 'banana', 'cherry', 'date'], dtype='str')
print(string_array)
print()
# creating a pandas.array of boolean values
bool_array = pd.array([True, False, True, False], dtype='bool')
print(bool_array)
print()
Output
<NumpyExtensionArray>
[1, 2, 3, 4, 5]
Length: 5, dtype: int64
<NumpyExtensionArray>
[1.1, 2.2, 3.3, 4.4, 5.5]
Length: 5, dtype: float64
<NumpyExtensionArray>
['apple', 'banana', 'cherry', 'date']
Length: 4, dtype: str192
<NumpyExtensionArray>
[True, False, True, False]
Length: 4, dtype: bool
In the above example, we have passed the dtype argument to pd.array() to explicitly specify the data type of the array elements.
Here,
- int_array - creates an array containing integers by specifying dtype = 'int'
- float_array - creates an array containing floating-point numbers by specifying dtype = 'float'
- string_array - creates an array containing strings by specifying dtype = 'str'
- bool_array - creates an array containing boolean values (True or False) by specifying dtype = 'bool'
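The outputs in this tutorial show two different array types: IntegerArray when no dtype is given and NumpyExtensionArray when a NumPy-style dtype string is passed. The sketch below inspects the dtype attribute to make that difference visible. The variable names inferred and explicit are illustrative, and 'float64' is used instead of 'float' only to name the bit width explicitly.

```python
import pandas as pd

# No dtype given: pandas infers a nullable integer dtype (capital "I")
inferred = pd.array([1, 2, 3])

# NumPy-style dtype string: values are converted and stored with that NumPy dtype
explicit = pd.array([1, 2, 3], dtype="float64")

print(inferred.dtype)
print(explicit.dtype)

# The inferred array is an instance of the nullable IntegerArray class
print(isinstance(inferred, pd.arrays.IntegerArray))
```

Checking the dtype attribute this way is a quick means of confirming which data type an array actually ended up with.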
Create Series From Pandas Array
In Pandas, we can create a Series directly from a Pandas array by passing the array to the pd.Series() constructor. Let’s look at an example.
import pandas as pd
# create a Pandas array
arr = pd.array([18, 20, 19, 21, 22])
# create a Pandas series from the Pandas array
arr_series = pd.Series(arr)
print(arr_series)
Output
0    18
1    20
2    19
3    21
4    22
dtype: Int64
Here, we have used pd.Series(arr) to create a Series from the Pandas array named arr.
In the output,
- The left column represents the index of the Series. The default index is a sequence of integers starting from 0.
- The right column represents the values of the Series, which correspond to the values of the Pandas array arr.
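The two points above can be confirmed programmatically. This is a minimal sketch that reuses arr and arr_series from the example; the name values is introduced here for illustration. It relies on the Series.array property, which returns the values of a Series as a Pandas array.

```python
import pandas as pd

arr = pd.array([18, 20, 19, 21, 22])
arr_series = pd.Series(arr)

# The Series keeps the dtype of the array it was built from
print(arr_series.dtype)

# The default index is the sequence of integers starting from 0
print(list(arr_series.index))

# .array gives back the underlying values as a Pandas array
values = arr_series.array
print(values.equals(arr))
```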