Splitting a string without breaking a word

cheesebird · June 17, 2022, 5:25pm

I’m looking for a way to split a string without breaking any words, and I came across this, which works but not exactly as I need.

from textwrap import wrap

orig_s = 'I am a noob in python please can you help me break a string without cutting a word'

print(wrap(orig_s, 26))

Output…

['I am a noob in python', 'please can you help me', 'break a string without', 'cutting a word']

This is a great module, but I need the first line to be at most 26 characters and the second at most 40, with the possibility of a third line if the string is long enough.

Is there any way to modify textwrap to do this, or any alternative suggestions?

cheesebird · June 17, 2022, 6:14pm

Regex to the rescue …

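import re

# Group 1: up to 26 chars ending in whitespace; group 2: up to 40 more chars ending in whitespace or end of string
cheese = re.findall(r'(.{1,26}\s)(.{1,40}(?:\s|$))', orig_s)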

print(cheese)

But surely there is a better way?

rob42 · June 17, 2022, 6:53pm

I’m not sure about ‘better’, but have you looked at the .isspace() method? I’m sure that could be used in a loop so that you ‘know’ if your split lands on a space character.
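A minimal sketch of the loop rob42 describes, assuming the rule is "a line may hold at most width characters": look at the first character that would not fit, walk backwards with str.isspace() until a whitespace character is found, and cut there. The names split_at_space and wrap_widths are made up for this example, and keeping an over-long word whole on its own line is one possible choice, not something from the thread.

```python
from typing import List, Tuple


def split_at_space(text: str, width: int) -> Tuple[str, str]:
    """Return (line, rest), where line has at most `width` characters
    and ends at a word boundary. Assumes width >= 1."""
    if len(text) <= width:
        return text, ''
    # text[width] is the first character that would not fit on the line;
    # walk back from there until we land on a whitespace character.
    cut = width
    while cut > 0 and not text[cut].isspace():
        cut -= 1
    if cut == 0:
        # No space within the limit: the first word is too long, so keep it
        # whole and cut at the next whitespace after it instead.
        cut = width
        while cut < len(text) and not text[cut].isspace():
            cut += 1
    return text[:cut].rstrip(), text[cut:].lstrip()


def wrap_widths(text: str, first_width: int = 26, rest_width: int = 40) -> List[str]:
    """Wrap text so the first line is at most first_width characters
    and every following line at most rest_width characters."""
    lines = []
    text = text.strip()
    width = first_width
    while text:
        line, text = split_at_space(text, width)
        lines.append(line)
        width = rest_width  # all lines after the first use the wider limit
    return lines


orig_s = 'I am a noob in python please can you help me break a string without cutting a word'
print(wrap_widths(orig_s))
# ['I am a noob in python', 'please can you help me break a string', 'without cutting a word']
```

Searching backwards from the limit gives the longest line that still ends on a word boundary, and the loop naturally produces a third (or fourth) line for longer strings.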

vbrozik · June 17, 2022, 7:18pm

Is this what you need?

from textwrap import wrap

orig_s = 'I am a noob in python please can you help me break a string without cutting a word'
width_first = 26
width_rest = 40

initial_indent = ' ' * (width_rest - width_first)
print('\n'.join(wrap(orig_s, width_rest, initial_indent=initial_indent)))

              I am a noob in python
please can you help me break a string
without cutting a word

If you do not want the spaces at the beginning of the first line, remove them using the str.lstrip() method.
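Since wrap() returns a list, only its first element needs stripping. A short sketch reusing the names from the example above; the guard against an empty result is an addition, not part of the original suggestion.

```python
from textwrap import wrap

orig_s = 'I am a noob in python please can you help me break a string without cutting a word'
width_first = 26
width_rest = 40

# Pad the first line so only width_first characters of text fit on it,
# then strip the padding off again.
initial_indent = ' ' * (width_rest - width_first)
lines = wrap(orig_s, width_rest, initial_indent=initial_indent)
if lines:  # wrap() returns [] for an empty string
    lines[0] = lines[0].lstrip()
print(lines)
# ['I am a noob in python', 'please can you help me break a string', 'without cutting a word']
```

Every line after the first is wrapped at width_rest, so longer strings simply produce more 40-character lines.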

vbrozik · June 17, 2022, 7:34pm

Regex is a good, simple solution for this (for short texts, not megabytes), but for more than two lines you will need to apply it repeatedly in a loop: each iteration splits the string into one line and the rest, and the rest is split in the next iteration.

Something like:

result = []
text_rest = orig_s
while text_rest:
    # your_regex_splitter is a placeholder: it should return (one_line, remaining_text)
    result_line, text_rest = your_regex_splitter(text_rest)
    result.append(result_line)

or better make it a generator:

from typing import Iterator

def text_wrap(text: str) -> Iterator[str]:
    while text:
        result_line, text = your_regex_splitter(text)
        yield result_line

your_regex_splitter should have the line length as an argument.
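One possible way to fill in your_regex_splitter (called regex_splitter here; the name, the 26/40 defaults and the long-word fallback are assumptions for this sketch). The pattern takes at most width - 1 characters plus one non-space character, so the line never ends in a space, and it must be followed by whitespace or the end of the string.

```python
import re
from typing import Iterator, Tuple


def regex_splitter(text: str, width: int) -> Tuple[str, str]:
    """Split text into (line, rest) with len(line) <= width.

    Assumes text has no leading/trailing whitespace and width >= 1.
    """
    # At most width-1 characters plus one non-space character,
    # followed by whitespace (consumed) or the end of the string.
    match = re.match(rf'(.{{0,{width - 1}}}\S)(?:\s+|$)', text)
    if match is None:
        # No break point within the limit: keep the over-long first word whole.
        match = re.match(r'(\S+)\s*', text)
    return match.group(1), text[match.end():]


def text_wrap(text: str, first_width: int = 26, rest_width: int = 40) -> Iterator[str]:
    text = text.strip()
    width = first_width
    while text:
        result_line, text = regex_splitter(text, width)
        yield result_line
        width = rest_width  # all lines after the first use the wider limit


orig_s = 'I am a noob in python please can you help me break a string without cutting a word'
print(list(text_wrap(orig_s)))
# ['I am a noob in python', 'please can you help me break a string', 'without cutting a word']
```

Because re.match() is anchored at the start of the remaining text, each call peels off exactly one line, which is what the generator loop needs.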

mlgtechuser · June 18, 2022, 6:46am

This can be optimized further, but it works as requested. It would be easy to wrap it in a function def to move the bulky code into the attic. It is certainly not the fastest option, though…

orig_sentence = 'I am a noob in python. Please can you help me break a string without cutting a word?'
split_sentence = orig_sentence.split()
line = ''
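split_len = 26  # maximum length of the first line
wrapped_sentence = []
for word in split_sentence:
    # Add the word if the line stays within the limit (or the line is still empty,
    # so an over-long word gets a line of its own instead of producing an empty one).
    if len(line + word) <= split_len or not line:
        line += word + ' '
    else:
        wrapped_sentence.append(line)
        line = word + ' '
        split_len = 40  # every later line may be up to 40 characters
if line:
    wrapped_sentence.append(line)  # needed to add the contents of the last 'line'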

Output:

['I am a noob in python. ', 'Please can you help me break a string ', 'without cutting a word? ']
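Moving the loop into a function, as suggested, could look roughly like this. It is a sketch: the name wrap_words and the widths parameter (a sequence of per-line limits, the last one reused for all remaining lines) are assumptions, and words are joined with a single space only between them, so lines carry no trailing space.

```python
from typing import List, Sequence


def wrap_words(text: str, widths: Sequence[int] = (26, 40)) -> List[str]:
    """Greedy word wrap: line i holds at most widths[i] characters.

    The last width is reused for all remaining lines. A word longer than
    the current limit is put on a line of its own instead of being cut.
    """
    lines: List[str] = []
    line = ''
    for word in text.split():
        # Limit for the line currently being filled.
        limit = widths[min(len(lines), len(widths) - 1)]
        if not line:
            line = word
        elif len(line) + 1 + len(word) <= limit:
            line += ' ' + word
        else:
            lines.append(line)
            line = word
    if line:
        lines.append(line)  # the last, partially filled line
    return lines


orig_sentence = 'I am a noob in python. Please can you help me break a string without cutting a word?'
print(wrap_words(orig_sentence))
# ['I am a noob in python.', 'Please can you help me break a string', 'without cutting a word?']
```

Passing a longer sequence, e.g. widths=(26, 40, 30), gives each of the first lines its own limit.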

cheesebird · June 18, 2022, 7:42am

Nice, thanks, works perfectly!