Extracting (or splitting) variable-length strings in a dataframe column

henrytang · May 11, 2022, 8:47am

Sorry, newbie question. I have extracted a list of file names into a dataframe like:
ElonMusk.txt
BillGate.txt
SteveJobs.txt

I just want to extract the name in that column (without the .txt). How do I code this?

What I have tried, without success:
df['Filename']=df['Filename'].str[:(df['Filename'].str.len())-4]

Error returned:
AttributeError: 'list' object has no attribute 'str'

Thanks a million.

vainaixr · May 11, 2022, 9:39am

maybe slice it,

'ElonMusk.txt'[:-4]

or use split,

'ElonMusk.txt'.split('.')[0]

or use removesuffix,

'ElonMusk.txt'.removesuffix('.txt')

or if you want to use itertools,

from itertools import accumulate, islice
print(list(accumulate(islice(x := 'ElonMusk.txt', len(x) - 4)))[-1])

or,

total = ''
print([total := total + i for i in islice('ElonMusk.txt', len('ElonMusk.txt')-4)][-1])

but it would take more time, so we could instead use functools.reduce

from itertools import islice
import functools
import operator as op
print(functools.reduce(op.iconcat, islice(x := 'ElonMusk.txt', len(x) - 4)))
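
A side note on the string-level options above: split('.')[0] cuts at the first dot, so a name such as Elon.Musk.txt would come back as Elon, and removesuffix needs Python 3.9 or later. Here is a minimal sketch of two standard-library alternatives, os.path.splitext and pathlib.Path.stem, which strip only the last extension. The sample names are made up for illustration.

```python
import os.path
from pathlib import Path

# Hypothetical sample names; the second one contains an extra dot.
names = ['ElonMusk.txt', 'Elon.Musk.txt']

# splitext returns (root, ext) and splits at the last dot only.
roots = [os.path.splitext(n)[0] for n in names]

# Path.stem is the final path component without its last suffix.
stems = [Path(n).stem for n in names]

# split('.')[0] cuts at the first dot instead.
first_parts = [n.split('.')[0] for n in names]

print(roots)
print(stems)
print(first_parts)
```

Which one fits depends on whether the names can contain more than one dot or an extension other than .txt.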

henrytang · May 11, 2022, 9:51am

Hi Vainaixr,
Thanks for the reply.
But is there any chance I can trim that column down with a single line of code?
If I have millions of records in the dataframe, it would not be practical to type every file name into the code.

Regards,
Henry

vainaixr · May 11, 2022, 9:56am

we could use a list comprehension,

lst = ['abc.txt', 'def.txt', 'ghi.txt']
print([i.removesuffix('.txt') for i in lst]) # or any of the ways in my last post

or we could use map with a lambda,

print(list(map(lambda x: x.removesuffix('.txt'), lst)))
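
Since the original question is about a dataframe column, the same idea can be applied to the whole column at once through the pandas .str accessor. The sketch below assumes the file names live in a column called Filename (as in the question) and builds the frame from a plain list. The error in the question, 'list' object has no attribute 'str', suggests that df['Filename'] was still a Python list at that point rather than a pandas Series, though that is only a guess from the message.

```python
import pandas as pd

# Assumed setup: the file names start out as a plain Python list.
filenames = ['ElonMusk.txt', 'BillGate.txt', 'SteveJobs.txt']
df = pd.DataFrame({'Filename': filenames})

# Option 1: drop the last four characters of every entry.
sliced = df['Filename'].str[:-4]

# Option 2: remove a trailing '.txt' only where it is present.
stripped = df['Filename'].str.replace(r'\.txt$', '', regex=True)

df['Filename'] = stripped
print(df)
```

Either option is a single line and runs over every row, so nothing has to be typed per file name. Option 2 is the safer choice if some entries might not end in .txt. On pandas 1.4 or later, df['Filename'].str.removesuffix('.txt') should give the same result as option 2.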