Introducing NumPy, Part 3: Manipulating Arrays

Author:Murphy | View: 26107 | Time: 2025-03-23 11:25:16

Welcome to Part 3 of Introducing NumPy, a primer for those new to this essential Python library. Part 1 introduced NumPy arrays and how to create them. Part 2 covered indexing and slicing arrays. Part 3 will show you how to manipulate existing arrays by reshaping them, swapping their axes, and merging and splitting them. These tasks are handy for jobs like rotating, enlarging, and translating images, as well as fitting machine learning models.

Shaping and Transposing

NumPy comes with methods to change the shape of arrays, transpose arrays (interchange rows and columns), and swap axes. You've already been working with the reshape() method in this series.

One thing to be aware of with reshape() is that it returns a view of the array whenever possible, rather than a copy, and it leaves the source array unchanged. In the following example, calling reshape() on the arr1d array returns a reshaped view without altering arr1d itself:

In [1]: import numpy as np

In [2]: arr1d = np.array([1, 2, 3, 4])

In [3]: arr1d.reshape(2, 2)
Out[3]: 
array([[1, 2],
       [3, 4]])

In [4]: arr1d
Out[4]: array([1, 2, 3, 4])

This behavior is useful when you want to temporarily change the shape of the array for use in a computation, without copying any data.

Likewise, assigning the reshaped array to a new variable doesn't copy the data. The new variable refers to a view that shares memory with the source array. In the following example, despite assigning the reshaped arr1d array to a new variable named arr2d, changing values in arr2d also changes the corresponding values in arr1d:

In [5]: arr2d = arr1d.reshape(2, 2)

In [6]: arr2d
Out[6]: 
array([[1, 2],
       [3, 4]])

In [7]: arr2d[0] = 42

In [8]: arr2d
Out[8]: 
array([[42, 42],
       [ 3,  4]])

In [9]: arr1d
Out[9]: array([42, 42,  3,  4])

This type of behavior can trip you up. As mentioned earlier, if you want to create a distinct ndarray object from an existing array, use the copy() method.
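
As a minimal sketch of the difference, the following snippet checks whether two arrays share data using np.shares_memory(). The names view2d and copy2d are made up for this illustration and don't appear elsewhere in the article:

```python
import numpy as np

arr1d = np.array([1, 2, 3, 4])

view2d = arr1d.reshape(2, 2)         # a view: shares data with arr1d
copy2d = arr1d.reshape(2, 2).copy()  # a copy: owns its own data

view2d[0, 0] = 99   # also changes arr1d[0]
copy2d[1, 1] = -1   # leaves arr1d untouched

print(arr1d)                            # [99  2  3  4]
print(np.shares_memory(arr1d, view2d))  # True
print(np.shares_memory(arr1d, copy2d))  # False
```

If you're ever unsure whether an operation gave you a view or a copy, a check like this is a quick way to find out.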

To change an array's shape in place rather than just create a view, assign a shape tuple to the array's shape attribute:

In [10]: arr1d.shape = (2, 2)

In [11]: arr1d
Out[11]: 
array([[42, 42],
       [ 3,  4]])

Compare this code to In [3] – Out[4]. Here, the source array itself is reshaped.

Flattening an Array

There are times when you'll want to use 1D arrays as input to some process, even though your data is of a higher dimension. For example, standard plotting routines typically expect simple data structures, such as a list or single flat array. Likewise, image data is generally converted to 1D arrays before being fed to the input layer of a neural network.

Going from a higher-dimension array to a 1D array is known as flattening. The ravel() method lets you do this while returning a view of the array whenever possible. Here's an example:

In [12]: arr2d = np.arange(8).reshape(2, 4)

In [13]: arr2d
Out[13]: 
array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

In [14]: arr1d = arr2d.ravel()

In [15]: arr1d
Out[15]: array([0, 1, 2, 3, 4, 5, 6, 7])

To create a copy of the array when flattening, you can use the flatten() method of the ndarray object. Because this produces a copy rather than a view, it's a bit slower than ravel(). Here's the syntax:

In [16]: arr2d.flatten()
Out[16]: array([0, 1, 2, 3, 4, 5, 6, 7])
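
The following sketch contrasts the two approaches. The variable names are hypothetical, and the last step illustrates a case (a transposed array) where ravel() can't return a view and quietly falls back to making a copy:

```python
import numpy as np

arr2d = np.arange(8).reshape(2, 4)

raveled = arr2d.ravel()      # view of arr2d's data
flattened = arr2d.flatten()  # always a copy

print(np.shares_memory(arr2d, raveled))    # True
print(np.shares_memory(arr2d, flattened))  # False

# The transposed array isn't laid out row by row in memory,
# so ravel() has to copy the data here.
raveled_t = arr2d.T.ravel()
print(raveled_t)                           # [0 4 1 5 2 6 3 7]
print(np.shares_memory(arr2d, raveled_t))  # False
```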

You can also flatten the original array in place by assigning the number of elements in the array to its shape attribute:

In [17]: arr2d.shape = (8)

In [18]: arr2d
Out[18]: array([0, 1, 2, 3, 4, 5, 6, 7])

Remember, you can get the number of elements in an array by accessing its size attribute using dot notation.
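
Here's a short sketch of how the size attribute can be used when flattening. It also shows reshape(-1), which asks NumPy to infer the length for you. The names flat and flat_inferred are just for illustration:

```python
import numpy as np

arr2d = np.arange(8).reshape(2, 4)

print(arr2d.size)  # 8

flat = arr2d.reshape(arr2d.size)   # use the element count explicitly
flat_inferred = arr2d.reshape(-1)  # let NumPy work out the length

print(flat.shape)                          # (8,)
print(np.array_equal(flat, flat_inferred)) # True
```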

Swapping an Array's Columns and Rows

When analyzing data, it's good to examine it in multiple ways. The following figure shows average temperature data by month for three Texas cities. Whether you present the data by month or by location depends on the questions you're trying to answer as well as how much space you have for printing the information in a report.

The average monthly temperatures (°F) for three Texas cities displayed by month and by city (from Python Tools for Scientists) (this and future links to my book represent affiliate links)

Just as Microsoft Excel lets you easily swap columns and rows, NumPy provides the handy transpose() method for this operation:

In [19]: arr2d = np.arange(8).reshape(2, 4)

In [20]: arr2d
Out[20]: 
array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

In [21]: arr2d.transpose()
Out[21]: 
array([[0, 4],
       [1, 5],
       [2, 6],
       [3, 7]])

This is still a view of the original array. To create a new array, you can chain the copy() method, like so:

In [22]: arr2d_transposed = arr2d.transpose().copy()
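
To see the practical difference, here's a small sketch using the T attribute, which is shorthand for transpose() with no arguments. The variable names are made up for this example:

```python
import numpy as np

arr2d = np.arange(8).reshape(2, 4)

transposed_view = arr2d.T                   # same as arr2d.transpose()
transposed_copy = arr2d.transpose().copy()  # independent array

arr2d[0, 0] = 100  # change the source array

print(transposed_view[0, 0])  # 100 (the view sees the change)
print(transposed_copy[0, 0])  # 0   (the copy does not)
```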

For higher-dimension arrays, you can pass transpose() a tuple of axis numbers in the order you desire. Let's transpose a 3D array so that the axes are reordered with the third axis first, the second axis unchanged, and the first axis last:

In [23]: arr3d = np.arange(12).reshape(2, 2, 3)

In [24]: arr3d
Out[24]: 
array([[[ 0,  1,  2],
        [ 3,  4,  5]],

       [[ 6,  7,  8],
        [ 9, 10, 11]]])

In [25]: arr3d.transpose((2, 1, 0))
Out[25]: 
array([[[ 0,  6],
        [ 3,  9]],

       [[ 1,  7],
        [ 4, 10]],

       [[ 2,  8],
        [ 5, 11]]])
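
One way to keep track of what an axis tuple does is to compare shapes before and after. In this sketch (the name reordered is hypothetical), the element at index [i, j, k] in the result is the element at [k, j, i] in the source:

```python
import numpy as np

arr3d = np.arange(12).reshape(2, 2, 3)
reordered = arr3d.transpose((2, 1, 0))

print(arr3d.shape)      # (2, 2, 3)
print(reordered.shape)  # (3, 2, 2)

# Indexes are reversed between the result and the source.
print(reordered[2, 1, 0] == arr3d[0, 1, 2])  # True
```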

Another method for swapping axes is swapaxes(). It takes a pair of axes and rearranges the array, returning a view of the array. Here's an example:

In [26]: arr3d
Out[26]: 
array([[[ 0,  1,  2],
        [ 3,  4,  5]],

       [[ 6,  7,  8],
        [ 9, 10, 11]]])

In [27]: arr3d.swapaxes(0, 1)
Out[27]: 
array([[[ 0,  1,  2],
        [ 6,  7,  8]],

       [[ 3,  4,  5],
        [ 9, 10, 11]]])
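
As a quick check, swapping two axes should give the same result as calling transpose() with those two axis numbers exchanged in the tuple. Here's a sketch, with illustrative variable names:

```python
import numpy as np

arr3d = np.arange(12).reshape(2, 2, 3)

swapped = arr3d.swapaxes(0, 1)
same_via_transpose = arr3d.transpose((1, 0, 2))

print(np.array_equal(swapped, same_via_transpose))  # True
print(np.shares_memory(arr3d, swapped))             # True (it's a view)
```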

Joining Arrays

NumPy provides several functions that let you merge, or stack, multiple existing arrays into a new array. Let's begin by making two 2D arrays, the first composed of zeros, and the second composed of ones:

In [28]: zeros = np.zeros((3, 3))

In [29]: ones = np.ones((3, 3))

Now let's vertically stack the two arrays using the vstack() function. This will add the second array to the first as new rows along axis 0:

In [30]: np.vstack((zeros, ones))
Out[30]: 
array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

The hstack() function adds the second array to the first as new columns along axis 1:

In [31]: np.hstack((zeros, ones))
Out[31]: 
array([[0., 0., 0., 1., 1., 1.],
       [0., 0., 0., 1., 1., 1.],
       [0., 0., 0., 1., 1., 1.]])
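
For 2D arrays like these, the two stacking functions behave like np.concatenate() along axis 0 and axis 1, respectively. The following sketch checks this, using hypothetical names for the results:

```python
import numpy as np

zeros = np.zeros((3, 3))
ones = np.ones((3, 3))

v = np.vstack((zeros, ones))  # stack as new rows
h = np.hstack((zeros, ones))  # stack as new columns

print(v.shape)  # (6, 3)
print(h.shape)  # (3, 6)

print(np.array_equal(v, np.concatenate((zeros, ones), axis=0)))  # True
print(np.array_equal(h, np.concatenate((zeros, ones), axis=1)))  # True
```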

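The row_stack() and column_stack() functions stack 1D arrays to form new 2D arrays. (Note that row_stack() is an alias for vstack() and is deprecated as of NumPy 2.0, so newer code should call vstack() directly.) For example: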

In [32]: x = np.array([1, 2, 3])

In [33]: y = np.array([4, 5, 6])

In [34]: z = np.array([7, 8, 9])

In [35]: np.row_stack((x, y, z))
Out[35]: 
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [36]: np.column_stack((x, y, z))
Out[36]: 
array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])
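
The two results above are transposes of each other. Here's a sketch that confirms this, using vstack() to build the row-wise version (the names rows and cols are just for illustration):

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
z = np.array([7, 8, 9])

rows = np.vstack((x, y, z))        # each 1D array becomes a row
cols = np.column_stack((x, y, z))  # each 1D array becomes a column

print(rows.shape, cols.shape)        # (3, 3) (3, 3)
print(np.array_equal(cols, rows.T))  # True
```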

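You can also stack arrays along a third axis (axis 2) using the depth-stacking function, dstack((x, y, z)). With 1D inputs like these, dstack() first reshapes each array of length N to shape (1, N, 1), so the result has shape (1, 3, 3) and holds the same values as the column_stack() output, wrapped in one extra outer dimension.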
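
Checking shapes is a good way to see what depth stacking does. This sketch, with made-up variable names, applies dstack() to both 1D and 2D inputs:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
z = np.array([7, 8, 9])

deep = np.dstack((x, y, z))
print(deep.shape)  # (1, 3, 3)

# The single 2D "page" matches the column-stacked result.
print(np.array_equal(deep[0], np.column_stack((x, y, z))))  # True

# With 2D inputs, the arrays are layered along a new third axis.
layers = np.dstack((np.zeros((3, 3)), np.ones((3, 3))))
print(layers.shape)  # (3, 3, 2)
```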

Splitting Arrays

NumPy also lets you divide, or split, arrays. As with joining, you can perform splitting both vertically and horizontally.

Here's an example using the vsplit() function. First, let's create an array:

In [37]: source = np.arange(24).reshape((4, 6))

In [38]: source
Out[38]: 
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])

To split the source array in half vertically (axis=0), pass the vsplit() function the array and 2 as arguments:

In [39]: split1, split2 = np.vsplit(source, 2)

In [40]: split1
Out[40]: 
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11]])

In [41]: split2
Out[41]: 
array([[12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])

To split the source array in half horizontally (axis=1), pass hsplit() the array and 2 as arguments:

In [42]: split1, split2 = np.hsplit(source, 2)

In [43]: split1
Out[43]: 
array([[ 0,  1,  2],
       [ 6,  7,  8],
       [12, 13, 14],
       [18, 19, 20]])

In [44]: split2
Out[44]: 
array([[ 3,  4,  5],
       [ 9, 10, 11],
       [15, 16, 17],
       [21, 22, 23]])

In the previous examples, the split must divide the array evenly. If it doesn't, NumPy raises a ValueError.
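
The following sketch shows what happens when the division isn't even. It also shows np.array_split(), a related NumPy function not covered above that tolerates uneven divisions. The names split_failed and parts are hypothetical:

```python
import numpy as np

source = np.arange(24).reshape((4, 6))

try:
    np.vsplit(source, 3)  # 4 rows can't be divided into 3 equal parts
    split_failed = False
except ValueError:
    split_failed = True

print(split_failed)  # True

# array_split() allows sub-arrays of unequal size.
parts = np.array_split(source, 3, axis=0)
print([p.shape for p in parts])  # [(2, 6), (1, 6), (1, 6)]
```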

You can split an array into multiple arrays of unequal size along an axis with the split() function. You pass it the original array and the indexes at which to split, along with an optional axis number (the default is axis 0). For example, to divide the source array into three arrays of two, three, and one column, you would enter the following:

In [45]: a, b, c = np.split(source, [2, 5], axis=1)

In [46]: a
Out[46]: 
array([[ 0,  1],
       [ 6,  7],
       [12, 13],
       [18, 19]])

In [47]: b
Out[47]: 
array([[ 2,  3,  4],
       [ 8,  9, 10],
       [14, 15, 16],
       [20, 21, 22]])

In [48]: c
Out[48]: 
array([[ 5],
       [11],
       [17],
       [23]])

The indexes [2, 5] told NumPy where along axis 1 to split the array. To repeat this over the rows, just change the axis argument to 0.
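
Here's a brief sketch of splitting over the rows instead, followed by a round trip that rejoins the column-wise pieces. The split indexes [1, 3] and the variable names are chosen just for this illustration:

```python
import numpy as np

source = np.arange(24).reshape((4, 6))

# Split along axis 0 before row 1 and before row 3.
top, middle, bottom = np.split(source, [1, 3], axis=0)
print(top.shape, middle.shape, bottom.shape)  # (1, 6) (2, 6) (1, 6)

# Splitting and then concatenating along the same axis
# restores the original array.
pieces = np.split(source, [2, 5], axis=1)
rejoined = np.concatenate(pieces, axis=1)
print(np.array_equal(rejoined, source))  # True
```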

Test Your Knowledge

Testing yourself on newly acquired knowledge is a great way to lock in what you've learned. Here's a quick quiz to help you on your way. The answers are at the end of the article.

Question 1: Why is there so much whitespace in the first two elements in this output array: