Reshaping a single column in to multiple column using Python

sahar · June 19, 2022, 5:24pm

I have an excel file containing a single column (Row number is not fixed). Using Python 3, I want to,

Import my excel file/data in python,
Read/select the data column (first column), and
Reshape this column into multiple columns having 10 rows in each column and finally
Writing output to a new excel file.
I am a Matlab user and know how to do this in Matlab using Matlab’s reshape command newVar = reshape (myColumn, 10, [ ]).

Looking for someone to help me out achieving this in Python 3.

cameron · June 19, 2022, 10:30pm

By Sahar via Discussions on Python.org at 19Jun2022 17:34:

I have an excel file containing a single column (Row number is not
fixed). Using Python 3, I want to,

Import my excel file/data in python,

Have a look at the openpyxl module, which you can install with:

python3 -m pip install openpyxl

adjusting python3 to however you invoke Python 3 from the command
line. Documentation here:
https://openpyxl.readthedocs.io/en/stable/

Alternatively I think pandas has a function to read Excel files:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html
along the lines of the very useful read-csv function.

Read/select the data column (first column), and

Pandas will give you a DataFrame, possibly a few.

Reshape this column into multiple columns having 10 rows in each column and finally

See what you can do with DataFrames. At the east it is fairly easy to
add new columns selected from an existing column.

Writing output to a new excel file.

Again, openpyxl or Pandas.

I am a Matlab user and know how to do this in Matlab using Matlab’s reshape command newVar = reshape (myColumn, 10, [ ]).
Looking for someone to help me out achieving this in Python 3.

Enjoy,
Cameron Simpson cs@cskk.id.au

sahar · June 20, 2022, 5:44am

Many thanks for your explanation. I have done this so far but it is not producing any result, only a bunch of errors.

import pandas as pd
import numpy as np
df =  pd.read_excel('sample.xlsx')
myCol = pd.DataFrame(df.iloc[:,0])
arr = np.array (myCol)
newArr = np.reshape(arr, (10, -1), order='F')
np.savetxt("newMatrix.csv", newArr, delimiter=",")

cameron · June 20, 2022, 7:34am

By Sahar via Discussions on Python.org at 20Jun2022 05:54:

Many thanks for your explanation. I have done this so far but it is not
producing any result, only a bunch of errors.

That’s pretty detailed for something producing errors.

Please also include a transcription of the full error messages between
triple backticks.

Cheers,
Cameron Simpson cs@cskk.id.au

sahar · June 20, 2022, 8:06am

I have found a strange thing. If i add one more row in my column, the same code performs as intended.

Please note now I have 31 rows. To me it seems that something has to do with indexing.

cameron · June 20, 2022, 8:47am

By Sahar via Discussions on Python.org at 20Jun2022 08:16:

I have found a strange thing. If i add one more row in my column, the
same code performs as intended.

Untitled666×791 9.04 KB

Please note now I have 31 rows. To mee it seems something with
indexing.

Please paste text between triple backticks, not images.

Thanks,
Cameron Simpson cs@cskk.id.au

sahar · June 20, 2022, 9:11am

Traceback (most recent call last):
  File "D:\Dropbox\KFUEIT\Research Collaobaration\Dr Ijaz\CsPbF3\Band Structure\BandStructure.py", line 6, in <module>
    newArr = np.reshape(arr, (10, -1), order='F')
  File "<__array_function__ internals>", line 180, in reshape
  File "C:\Program Files\Python310\lib\site-packages\numpy\core\fromnumeric.py", line 298, in reshape
    return _wrapfunc(a, 'reshape', newshape, order=order)
  File "C:\Program Files\Python310\lib\site-packages\numpy\core\fromnumeric.py", line 57, in _wrapfunc
    return bound(*args, **kwds)
ValueError: cannot reshape array of size 29 into shape (10,newaxis)

vbrozik · June 20, 2022, 1:19pm

It looks like you are trying to reshape an array 29(x1) to 10xN. As I understand it the elements need to fit into the new shape perfectly so you need to add one element to the original array to make its size divisible by 10.

mlgtechuser · June 25, 2022, 3:38pm

You didn’t mention where your data comes from or where it’s going, Sahar, but if you’re looking for a simple approach, reading and reshaping is very doable in native Python without even loading any functions from the Standard Library (stdlib). Note that this will be slower than numpy, for example, but it’s a good way to learn basic python before learning about extended library functions.

Save the spreadsheet as a csv file
- or put the data directly into a csv file instead of an Excel file.
Read the rows into a list and strip LF characters.
Break the list into groups with a slicing window.

raw_list = open("SaharDataIN.csv",'r').read().split()
chunk_size = 10
grouped_list1 = [raw_list[pos:pos + chunk_size] for pos in range(0,len(raw_list), chunk_size)]
#NOTE: the line above is longer than the PEP8 recommendation.

#a narrower version of the 'grouped_list1' list comprehension:
grouped_list2 = []
for pos in range(0,len(raw_list), chunk_size):
    grouped_list2.append(raw_list[pos:pos + chunk_size])

And to write the CSV file:

cooked_list = open("SaharDataOUT.csv", 'w')
for row in grouped_list1:
    for item in row:
        cooked_list.writelines(item+',')
    cooked_list.write("\n")
cooked_list.close()

The CSV file will have the following data:

1,2,3,4,5,6,7,8,9,10,
11,12,13,14,15,16,17,18,19,20,
21,22,23,24,25,26,27,28,29,
#Excel will ignore the commas at the end of each line.