NumPy Data Types
NumPy supports a much greater variety of numerical types than Python does. The following table shows different scalar data types defined in NumPy.
Sr.No. | Data Type | Description |
---|---|---|
1 | bool_ | Boolean (True or False) stored as a byte |
2 | int_ | Default integer type (same as C long before NumPy 2.0 and the same as intp from NumPy 2.0; normally either int64 or int32) |
3 | intc | Identical to C int (normally int32) |
4 | intp | Integer used for indexing (same as C ssize_t; normally either int32 or int64) |
5 | int8 | Byte (-128 to 127) |
6 | int16 | Integer (-32768 to 32767) |
7 | int32 | Integer (-2147483648 to 2147483647) |
8 | int64 | Integer (-9223372036854775808 to 9223372036854775807) |
9 | uint8 | Unsigned integer (0 to 255) |
10 | uint16 | Unsigned integer (0 to 65535) |
11 | uint32 | Unsigned integer (0 to 4294967295) |
12 | uint64 | Unsigned integer (0 to 18446744073709551615) |
13 | float_ | Shorthand for float64 (removed in NumPy 2.0; use float64) |
14 | float16 | Half precision float: sign bit, 5 bits exponent, 10 bits mantissa |
15 | float32 | Single precision float: sign bit, 8 bits exponent, 23 bits mantissa |
16 | float64 | Double precision float: sign bit, 11 bits exponent, 52 bits mantissa |
17 | complex_ | Shorthand for complex128 (removed in NumPy 2.0; use complex128) |
18 | complex64 | Complex number, represented by two 32-bit floats (real and imaginary components) |
19 | complex128 | Complex number, represented by two 64-bit floats (real and imaginary components) |
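Rather than memorizing the ranges in the table, you can query them at run time. The sketch below uses np.iinfo() for integer types and np.finfo() for floating-point types; the particular types looped over are only examples −

```python
import numpy as np

# Integer limits: np.iinfo reports the minimum, maximum and size in bits
for t in (np.int8, np.int16, np.uint8, np.uint16):
    info = np.iinfo(t)
    print(t.__name__, info.min, info.max, info.bits)

# Floating-point properties: np.finfo reports size in bits and machine epsilon
for t in (np.float16, np.float32, np.float64):
    info = np.finfo(t)
    print(t.__name__, info.bits, info.eps)
```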
Each NumPy numerical type is available as a scalar type (np.bool_, np.float32, etc.), and each has a corresponding dtype (data-type) object, such as np.dtype(np.float32), that describes its characteristics.
Data Type Objects (dtype)
A data type object describes how to interpret the fixed-size block of memory that holds each element of an array. It specifies the following aspects −
- Type of data (integer, float, Python object, etc.)
- Size of data (number of bytes)
- Byte order (little-endian or big-endian)
- For a structured type, the names of the fields, the data type of each field, and the part of the memory block taken by each field
- If the data type is a subarray, its shape and data type
The byte order is specified by prefixing '<' or '>' to the data type. '<' means that the encoding is little-endian (the least significant byte is stored at the smallest address). '>' means that the encoding is big-endian (the most significant byte is stored at the smallest address).
A dtype object is constructed using the following syntax −
numpy.dtype(object, align, copy)
The parameters are −
- Object − The object to be converted to a data type object
- Align − If True, adds padding between fields so that the layout matches the equivalent C struct
- Copy − If True, makes a new copy of the dtype object. If False, the result may be a reference to a built-in data type object
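To see what align does, the sketch below builds the same two-field structured dtype with and without alignment and compares the total size (itemsize) and the byte offset of the second field (stored in the dtype's fields mapping). The field names flag and count are made up for this illustration −

```python
import numpy as np

# Hypothetical fields: a 1-byte integer followed by a 4-byte integer
fields = [('flag', 'i1'), ('count', 'i4')]

# Packed layout: fields are stored back to back (1 + 4 bytes)
packed = np.dtype(fields)
# Aligned layout: padding is inserted so 'count' starts on a 4-byte boundary,
# as a C compiler would typically lay out the equivalent struct
aligned = np.dtype(fields, align=True)

print(packed.itemsize, packed.fields['count'][1])    # 5 1
print(aligned.itemsize, aligned.fields['count'][1])  # 8 4
```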
Example: Using Array-scalar Type
import numpy as np
dt = np.dtype(np.int32)
print(dt)
Following is the output obtained −
int32
Example: Using Equivalent String for Data Type
import numpy as np
dt = np.dtype('i4')
print(dt)
This will produce the following result −
int32
Example: Using Endian Notation
import numpy as np
dt = np.dtype('>i4')
print(dt)
Following is the output of the above code −
>i4
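The byte-order prefix matters whenever raw bytes are read into an array. In the sketch below, the same four bytes are interpreted once as big-endian and once as little-endian 32-bit integers using np.frombuffer(); the byte values are chosen only to make the difference obvious −

```python
import numpy as np

# Four raw bytes: 00 00 00 01
raw = bytes([0, 0, 0, 1])

# Big-endian: the most significant byte comes first, so the value is 1
big = np.frombuffer(raw, dtype='>i4')
# Little-endian: the least significant byte comes first, so the value is 2**24
little = np.frombuffer(raw, dtype='<i4')

print(big[0], little[0])  # 1 16777216
```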
Example: Creating a Structured Data Type
The following examples show the use of a structured data type. Here, the field name and the corresponding scalar data type are declared −
import numpy as np
dt = np.dtype([('age', np.int8)])
print(dt)
The output obtained is as shown below −
[('age', 'i1')]
Example: Applying Structured Data Type to ndarray
import numpy as np
dt = np.dtype([('age', np.int8)])
a = np.array([(10,), (20,), (30,)], dtype=dt)
print(a)
After executing the above code, we get the following output −
[(10,) (20,) (30,)]
Example: Accessing Field Content of Structured Data Type
import numpy as np
dt = np.dtype([('age', np.int8)])
a = np.array([(10,), (20,), (30,)], dtype=dt)
print(a['age'])
The result produced is as follows −
[10 20 30]
Example: Defining a Complex Structured Data Type
The following example defines a structured data type called student with a byte-string field 'name', an integer field 'age' and a float field 'marks' −
import numpy as np
student = np.dtype([('name', 'S20'), ('age', 'i1'), ('marks', 'f4')])
print(student)
We get the output as shown below −
[('name', 'S20'), ('age', 'i1'), ('marks', '<f4')]
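A structured dtype also records the layout of each record. The sketch below inspects the student dtype through its names, itemsize and fields attributes; fields maps each field name to its dtype and its byte offset within a record −

```python
import numpy as np

student = np.dtype([('name', 'S20'), ('age', 'i1'), ('marks', 'f4')])

print(student.names)     # ('name', 'age', 'marks')
print(student.itemsize)  # 25 (20 + 1 + 4 bytes, no padding)

# fields maps each name to (field dtype, byte offset within the record)
for name in student.names:
    field_dtype, offset = student.fields[name][:2]
    print(name, field_dtype, offset)
```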
Example: Applying Complex Structured Data Type to ndarray
import numpy as np
student = np.dtype([('name', 'S20'), ('age', 'i1'), ('marks', 'f4')])
a = np.array([('abc', 21, 50), ('xyz', 18, 75)], dtype=student)
print(a)
The output is as follows −
[(b'abc', 21, 50.) (b'xyz', 18, 75.)]
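Once the dtype is applied, each field can be used like an ordinary array. The sketch below reads a single record's field, updates a whole field in place, and filters records with a boolean mask; the +5 adjustment and the threshold of 60 are arbitrary values for illustration −

```python
import numpy as np

student = np.dtype([('name', 'S20'), ('age', 'i1'), ('marks', 'f4')])
a = np.array([('abc', 21, 50), ('xyz', 18, 75)], dtype=student)

# A single record's field is read by name
print(a[0]['name'])  # b'abc'

# a['marks'] is a view, so this updates the marks of every record in place
a['marks'] += 5

# Boolean filtering across records, then selecting one field
print(a[a['marks'] > 60]['name'])  # [b'xyz']
```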
Each built-in data type also belongs to a kind, identified by a single-character code (available as the kind attribute of a dtype object) −
- ‘b’ − boolean
- ‘i’ − (signed) integer
- ‘u’ − unsigned integer
- ‘f’ − floating-point
- ‘c’ − complex-floating point
- ‘m’ − timedelta
- ‘M’ − datetime
- ‘O’ − (Python) objects
- ‘S’, ‘a’ − (byte-)string
- ‘U’ − Unicode
- ‘V’ − raw data (void)
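The kind code of any dtype can be checked through its kind attribute, and a kind code followed by a size in bytes forms a type string such as 'u2' or 'f8'. Note that, as a standalone type string, 'b' means int8; booleans are written '?' or 'b1'. The dtypes chosen below are only examples −

```python
import numpy as np

# The kind attribute reports the one-character kind code of a dtype
for dt in (np.dtype(np.bool_), np.dtype(np.int16), np.dtype(np.uint8),
           np.dtype(np.float64), np.dtype(np.complex64),
           np.dtype('datetime64[D]'), np.dtype('U5')):
    print(dt, dt.kind)

# A kind code plus a size in bytes forms a type string
print(np.dtype('u2') == np.uint16)  # True
print(np.dtype('f8') == np.float64)  # True
print(np.dtype('b') == np.int8)     # True ('b' alone is a signed byte)
```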
Checking the Data Type of an Array
You can check the data type of an array using the dtype attribute. This attribute returns a dtype object, which describes the type of elements in the array as shown below −
import numpy as np
a = np.array([1, 2, 3])
print(a.dtype)
Following is the output obtained (the default integer type is platform dependent; int64 is typical on 64-bit Linux and macOS) −
int64
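Besides its name, the dtype object reports the size of each element, which together with the array size gives the memory footprint. The sketch below compares a default integer array with an explicit int16 array; the comparison with np.int_ assumes the default integer type from the table above −

```python
import numpy as np

a = np.array([1, 2, 3])
# Python ints get NumPy's default integer type (platform dependent)
print(a.dtype == np.int_)  # True

b = np.array([1, 2, 3], dtype=np.int16)
print(b.dtype.itemsize)  # 2 (bytes per element)
print(b.nbytes)          # 6 (bytes in total)

# Mixing integers and floats makes NumPy choose a floating-point type
print(np.array([1, 2.5]).dtype)  # float64
```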
Create Arrays With Defined Data Type
In NumPy, you can explicitly specify the data type (dtype) of the elements in an array at the time of its creation.
We can use the dtype parameter in array creation functions (such as np.array(), np.zeros(), np.ones(), etc.) to define the data type of the array elements. If dtype is omitted, NumPy infers the data type from the input data.
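The dtype parameter works the same way across creation functions and accepts either a NumPy type or a type string. A minimal sketch, with shapes and types picked only for illustration −

```python
import numpy as np

z = np.zeros(3, dtype=np.int16)
o = np.ones((2, 2), dtype='f4')   # type string for float32
r = np.arange(4, dtype=np.uint8)

print(z.dtype, o.dtype, r.dtype)  # int16 float32 uint8

# Without dtype, the type is inferred from the input data
print(np.array([True, False]).dtype)  # bool
print(np.array([1.0, 2.0]).dtype)     # float64
```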
Example: Creating an Integer Array
In this example, we create an array a with elements of type int32, which means each element is a 32-bit integer −
import numpy as np
# Creating an array of integers with a specified dtype
a = np.array([1, 2, 3], dtype=np.int32)
print("Array:", a)
print("Data type:", a.dtype)
This will produce the following result −
Array: [1 2 3]
Data type: int32
Example: Creating a Complex Array
Here, we create an array c with elements of type complex64, indicating 64-bit complex numbers (32-bit real part and 32-bit imaginary part) −
import numpy as np
# Creating an array of complex numbers with a specified dtype
c = np.array([1+2j, 3+4j, 5+6j], dtype=np.complex64)
print("Array:", c)
print("Data type:", c.dtype)
Following is the output of the above code −
Array: [1.+2.j 3.+4.j 5.+6.j]
Data type: complex64
Convert Data Type of NumPy Arrays
NumPy provides several ways to convert the data type of arrays, allowing you to change how data is stored and processed without modifying the original array −
- astype() Method − The most commonly used method for type conversion.
- NumPy Type Functions − Calling a scalar type such as numpy.int32() on an array returns a converted copy. (The older numpy.cast helper was removed in NumPy 2.0.)
- In-place Type Conversion − Setting the type directly while creating the array, so that no conversion is needed afterwards.
Example: Using the “astype” Method
The astype method creates a copy of the array, cast to a specified type. This is the most commonly used method for changing the data type of an array.
Here, we are converting an array of integers to float data type using the astype() method in NumPy −
import numpy as np
# Creating an array of integers
a = np.array([1, 2, 3, 4, 5])
print("Original array:", a)
print("Original dtype:", a.dtype)
# Converting to float
a_float = a.astype(np.float32)
print("Converted array:", a_float)
print("Converted dtype:", a_float.dtype)
The output obtained is as shown below −
Original array: [1 2 3 4 5]
Original dtype: int64
Converted array: [1. 2. 3. 4. 5.]
Converted dtype: float32
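astype() leaves the original array untouched, and its casting parameter controls which conversions are allowed. In the sketch below, casting='safe' refuses int64 to int8 because values could be lost, and np.can_cast() answers the same question without converting anything; the explicit int64 input is chosen so the result does not depend on the platform −

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5], dtype=np.int64)

# astype returns a new array; the original keeps its dtype
a_float = a.astype(np.float32)
print(a.dtype, a_float.dtype)  # int64 float32

# casting='safe' refuses conversions that could lose information
try:
    a.astype(np.int8, casting='safe')
except TypeError as err:
    print("Refused:", err)

# np.can_cast checks the rule without converting anything
print(np.can_cast(np.int64, np.int8))     # False
print(np.can_cast(np.int32, np.float64))  # True
```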
Example: Using Type Functions such as “numpy.int32”
NumPy also provides functions for casting arrays to specific types. These functions are less commonly used but can be handy in some cases.
In this example, we are creating an array of floats and converting it to integer using the numpy.int32() function −
import numpy as np
# Creating an array of floats
d = np.array([1.1, 2.2, 3.3, 4.4, 5.5])
print("Original array:", d)
print("Original dtype:", d.dtype)
# Converting to integer using numpy.int32
d_int = np.int32(d)
print("Converted array:", d_int)
print("Converted dtype:", d_int.dtype)
After executing the above code, we get the following output −
Original array: [1.1 2.2 3.3 4.4 5.5]
Original dtype: float64
Converted array: [1 2 3 4 5]
Converted dtype: int32
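Converting floats to integers this way truncates toward zero rather than rounding, which is why 5.5 became 5 above. If the nearest integer is wanted, round first; the sketch below uses np.rint(), which rounds halves to the nearest even value −

```python
import numpy as np

d = np.array([1.7, -1.7, 2.5])

# Casting to an integer type truncates toward zero
print(d.astype(np.int32))           # [ 1 -1  2]

# Round first to get the nearest integer (halves go to the even neighbour)
print(np.rint(d).astype(np.int32))  # [ 2 -2  2]
```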
Example: In-place Type Conversion
You can also specify the data type during array creation to avoid the need to convert the type later.
Now, we create an array from integer values but pass dtype=np.float32, so the elements are stored as 32-bit floats −
import numpy as np
# Creating an array from integer values, stored as float32
e = np.array([1, 2, 3, 4, 5], dtype=np.float32)
print("Array:", e)
print("Data type:", e.dtype)
The result produced is as follows −
Array: [1. 2. 3. 4. 5.]
Data type: float32
What if a Value Cannot Be Converted?
When converting data types in NumPy, you may encounter values that cannot be converted to the desired type. This situation typically raises an error or results in unexpected behavior.
Let us explore different scenarios where a value cannot be converted and how to handle them −
Scenario 1: Converting Non-numeric Strings to Numbers
If you attempt to convert a non-numeric string to an integer or float, NumPy will raise a ValueError as shown below −
import numpy as np
# Creating an array with non-numeric strings
a = np.array(['1', '2', 'three', '4', '5'])
print("Original array:", a)
print("Original dtype:", a.dtype)
try:
    # Attempting to convert to integer
    a_int = a.astype(np.int32)
    print("Converted array:", a_int)
    print("Converted dtype:", a_int.dtype)
except ValueError as e:
    print("Error:", e)
In this case, the string ‘three’ cannot be converted to an integer, resulting in a ValueError as shown in the output below −
Original array: ['1' '2' 'three' '4' '5']
Original dtype: <U5
Error: invalid literal for int() with base 10: 'three'
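Scenario 2: Converting Out-of-Range Numbers
If you convert numbers that are out of range for the target data type, astype() does not raise an exception. The converted values are undefined (and can differ between platforms), and recent NumPy versions may only emit a RuntimeWarning. To fail loudly, check the range with np.iinfo() before converting −
import numpy as np
# Creating an array with large float values
b = np.array([1.1e10, 2.2e10, 3.3e10])
print("Original array:", b)
print("Original dtype:", b.dtype)
# astype() does not raise on overflow, so check the int32 range explicitly
info = np.iinfo(np.int32)
try:
    if b.min() < info.min or b.max() > info.max:
        raise OverflowError("values out of range for int32")
    b_int = b.astype(np.int32)
    print("Converted array:", b_int)
    print("Converted dtype:", b_int.dtype)
except OverflowError as e:
    print("Error:", e)
Here, the range check rejects the values before any conversion takes place, because they do not fit in int32 −
Original array: [1.1e+10 2.2e+10 3.3e+10]
Original dtype: float64
Error: values out of range for int32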
Scenario 3: Converting Complex Numbers to Real Numbers
When converting complex numbers to real numbers, NumPy discards the imaginary part and emits a ComplexWarning (a warning, not an exception) −
import numpy as np
# Creating an array with complex numbers
c = np.array([1+2j, 3+4j, 5+6j])
print("Original array:", c)
print("Original dtype:", c.dtype)
# Converting to float, discarding imaginary part
c_float = c.astype(np.float32)
print("Converted array:", c_float)
print("Converted dtype:", c_float.dtype)
In this case, NumPy emits a ComplexWarning and discards the imaginary part during conversion −
Original array: [1.+2.j 3.+4.j 5.+6.j]
Original dtype: complex128
ComplexWarning: Casting complex values to real discards the imaginary part
  c_float = c.astype(np.float32)
Converted array: [1. 3. 5.]
Converted dtype: float32
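To avoid the warning, and to make the intent clear, select the part you actually need before converting. The sketch below takes the real part, the imaginary part and the magnitude (np.abs) of the same complex array; which of these is appropriate depends on your data −

```python
import numpy as np

c = np.array([1+2j, 3+4j, 5+6j])

# Taking the part you want explicitly emits no ComplexWarning
real_part = c.real.astype(np.float32)
imag_part = c.imag.astype(np.float32)
magnitude = np.abs(c)  # sqrt(real**2 + imag**2), returned as float64

print(real_part)  # [1. 3. 5.]
print(imag_part)  # [2. 4. 6.]
print(magnitude)
```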
Scenario 4: Handling Conversion Errors
To handle conversion errors, you can use error handling techniques like try-except blocks to catch and process exceptions.
import numpy as np
# Creating an array with mixed data
d = np.array(['1', '2', 'three', '4', '5'])
print("Original array:", d)
print("Original dtype:", d.dtype)
def safe_convert(arr, target_type):
    try:
        return arr.astype(target_type)
    except ValueError as e:
        print("Conversion error:", e)
        return None
# Attempting to convert to integer
d_int = safe_convert(d, np.int32)
if d_int is not None:
    print("Converted array:", d_int)
    print("Converted dtype:", d_int.dtype)
else:
    print("Conversion failed.")
In this example, the safe_convert() function catches the “ValueError” and handles it by returning None and printing an error message as shown in the output below −
Original array: ['1' '2' 'three' '4' '5']
Original dtype: <U5
Conversion error: invalid literal for int() with base 10: 'three'
Conversion failed.
Scenario 5: Using “np.nan” for Invalid Conversions
For conversions to floating point, you can use np.nan (Not a Number) to mark invalid values. This approach is useful when dealing with missing or corrupt data.
import numpy as np
# Creating an array with strings, including an invalid entry
e = np.array(['1.1', '2.2', 'three', '4.4', '5.5'])
print("Original array:", e)
print("Original dtype:", e.dtype)
def convert_with_nan(arr):
    result = []
    for item in arr:
        try:
            result.append(float(item))
        except ValueError:
            result.append(np.nan)
    return np.array(result)
# Converting to float with np.nan for invalid entries
e_float = convert_with_nan(e)
print("Converted array:", e_float)
print("Converted dtype:", e_float.dtype)
Here, invalid entries are replaced with np.nan −
Original array: ['1.1' '2.2' 'three' '4.4' '5.5']
Original dtype: <U5
Converted array: [1.1 2.2 nan 4.4 5.5]
Converted dtype: float64
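After the conversion, the NaN entries can be located and handled explicitly. The sketch below recreates the converted array from the scenario above as a literal, finds the invalid positions with np.isnan(), computes a NaN-aware mean, and replaces NaN with a default of 0.0 (the default value is an arbitrary choice) −

```python
import numpy as np

# Same values as the converted array above, written out directly
e_float = np.array([1.1, 2.2, np.nan, 4.4, 5.5])

# Locate the entries that failed to convert
bad = np.flatnonzero(np.isnan(e_float))
print("Invalid positions:", bad)  # [2]

# NaN-aware reductions skip the missing values
print("Mean of valid values:", np.nanmean(e_float))

# Or replace NaN with an explicit default value
print(np.nan_to_num(e_float, nan=0.0))
```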
Converting Data Type on Existing Arrays
You can also reinterpret an existing array as a different data type using the view() method, which changes how the data is interpreted without changing the underlying bytes.
Example
Here, the data is reinterpreted as “float32”, resulting in unexpected values because the underlying bytes remain unchanged −
import numpy as np
# Creating an array of integers
g = np.array([1, 2, 3, 4], dtype=np.int32)
print("Original array:", g)
print("Original dtype:", g.dtype)
# Viewing the array as float32
g_view = g.view(np.float32)
print("Viewed array:", g_view)
print("Viewed dtype:", g_view.dtype)
Following is the output of the above code −
Original array: [1 2 3 4]
Original dtype: int32
Viewed array: [1.4012985e-45 2.8025969e-45 4.2038954e-45 5.6051939e-45]
Viewed dtype: float32
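The difference between view() and astype() can be checked directly. The sketch below confirms that the view shares memory with the original and round-trips back to the same integers, while astype() converts the values into new memory; the last line shows the bit pattern of the float32 value 1.0 read as an int32 −

```python
import numpy as np

g = np.array([1, 2, 3, 4], dtype=np.int32)
g_view = g.view(np.float32)

# A view shares memory with the original array: no bytes are copied
print(np.shares_memory(g, g_view))  # True

# Viewing back as int32 recovers the original values exactly
print(np.array_equal(g_view.view(np.int32), g))  # True

# astype, by contrast, converts the values and allocates new memory
g_cast = g.astype(np.float32)
print(g_cast)                       # [1. 2. 3. 4.]
print(np.shares_memory(g, g_cast))  # False

# The bit pattern of float32 1.0 (0x3F800000) read as an int32
print(np.array([1.0], dtype=np.float32).view(np.int32))  # [1065353216]
```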