Please add Template.split()

I’m branching this off from the Template.join() thread; it is also related to the PEP 787 discussion.

A Template.split(sep=' ') method that breaks a t-string into its respective parts seems logical, and its result would be a good sequence to feed into subprocess.run. Only the string parts of the template would be split; the Interpolation parts would not be. It would look like:

name = "Dolly"
feet = 12

greeting = with_split(t"Hello {name}")
print(greeting.split())      # ('Hello ', 'Dolly')

conversion = with_split(t"A foot is {feet:.2f} inches")
print(conversion.split())    # ('A foot is ', '12.00', ' inches')

user_input = "/tmp; rm -fr /"
no_inject = with_split(t"ls {user_input}")
print(no_inject.split())     # ('ls ', '/tmp; rm -fr /')

home = "/home"
long_listing = with_split(t"ls -l{home}") 
print(long_listing.split())  # ('ls', '-l', '/home')

Because it would use the t-string curly-brace delimiters instead of (suspect) shell splitting rules, it could look more like a command line: t"ls {user_input}", but would not be subject to injection attacks.

An implementation could be something like:

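def split(self, sep=' '):
    # Method of a wrapper class that keeps the Template in self._template.
    parts = []
    strings = self._template.strings
    interpolations = self._template.interpolations

    for i, interpolation in enumerate(interpolations):
        # Only the static string parts are split on the separator.
        if strings[i]:
            parts.extend(strings[i].split(sep))

        # The Interpolation already carries the evaluated value,
        # so there is no need to eval() the expression text.
        value = interpolation.value
        conversion = interpolation.conversion
        fmt = interpolation.format_spec or ""

        if conversion == "s":
            value = str(value)
        elif conversion == "r":
            value = repr(value)
        elif conversion == "a":
            value = ascii(value)

        # Each interpolation becomes exactly one element, never split.
        parts.append(format(value, fmt))

    # The trailing static string is appended as-is.
    if strings[-1]:
        parts.append(strings[-1])

    return tuple(parts)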


While I’m not as opposed as some others in the other thread, this feels like something that could be added later if and when it’s clearly needed or widely used. Your examples show how it might work, but don’t seem to motivate why this API is useful.

To specifically address your proposed API: Template.split should return a list of Template objects; it should not presume to format the interpolations.
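A rough sketch of what "split without formatting" could look like, not a proposed implementation. The function name split_template is hypothetical, and SimpleNamespace objects stand in for Template and Interpolation so the snippet runs without t-string support; it relies only on the .strings and .interpolations attributes. Each segment is a list of parts that could presumably be turned back into a Template (e.g. Template(*segment)) on an interpreter that has string.templatelib.

```python
from types import SimpleNamespace


def split_template(template, sep=" "):
    """Split only the static strings; interpolations pass through untouched."""
    segments, current = [], []
    for i, static in enumerate(template.strings):
        for j, piece in enumerate(static.split(sep)):
            if j > 0 and current:
                # A separator was crossed: close the segment being built.
                segments.append(current)
                current = []
            if piece:
                current.append(piece)
        if i < len(template.interpolations):
            # Never split or format the interpolation itself.
            current.append(template.interpolations[i])
    if current:
        segments.append(current)
    return segments


# Stand-ins (assumptions) for t"ls -l {user_input}".
user_input = SimpleNamespace(value="/tmp; rm -fr /", conversion=None, format_spec="")
template = SimpleNamespace(strings=("ls -l ", ""), interpolations=(user_input,))

segments = split_template(template)
print(segments)
```

In this sketch runs of separators collapse, because empty segments are dropped, and an interpolation that touches static text stays in the same segment as that text.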

I’d also suggest entirely avoiding any use of shell command strings as examples or motivation. The other thread, which discusses the use of Templates in the subprocess API, makes clear that this is a minefield; relying on it will undermine your case for why this method would be useful more generally.

I probably should have been more explicit – subprocess.run(args,…) does not invoke a shell when args is a sequence (tuple or list). The whole goal of the proposal is to provide useful tuples from the t-string.
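To illustrate that point, here is a small sketch in which the child process is just the Python interpreter echoing its first argument (an arbitrary choice made so the example does not depend on ls being available). Because args is a sequence and shell=True is not passed, the hostile-looking string arrives as one literal argument.

```python
import subprocess
import sys

user_input = "/tmp; rm -fr /"

# args is a tuple, so no shell parses it; each element is one argv entry.
args = (sys.executable, "-c", "import sys; print(sys.argv[1])", user_input)
result = subprocess.run(args, capture_output=True, text=True, check=True)

echoed = result.stdout.strip()
print(echoed)  # /tmp; rm -fr /
```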

I believe the problem here is that there are several different ways to split a string; it is not as obvious an operation as merging a sequence of strings.

For example, splitting on a single space is not the default for str.split(). The default is to split on runs of consecutive whitespace:

>>> 'foo   bar\t baz'.split()
['foo', 'bar', 'baz']

>>> 'foo   bar\t baz'.split(" ")
['foo', '', '', 'bar\t', 'baz']

On top of that, str offers rsplit and splitlines too. It could be argued that str.__iter__ is a form of splitting as well.
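For comparison, a quick sketch of those other splitting behaviours on str; the sample strings are made up for illustration.

```python
text = "foo bar baz"

# rsplit with maxsplit=1 splits once, starting from the right.
from_right = text.rsplit(" ", 1)
print(from_right)  # ['foo bar', 'baz']

# splitlines breaks on line boundaries and drops the line endings.
lines = "first\nsecond\r\nthird".splitlines()
print(lines)  # ['first', 'second', 'third']

# Iterating a str yields its characters one by one.
chars = list("abc")
print(chars)  # ['a', 'b', 'c']
```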

All this is before even considering the semantics of what you are splitting, e.g. whether you want to split a shell command into meaningful tokens as shlex.split() does.
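A small sketch of that difference, using a made-up command string: str.split() knows nothing about quoting, while shlex.split() follows shell-like tokenizing rules.

```python
import shlex

command = 'grep -r "hello world" /tmp'

# Plain whitespace splitting breaks the quoted argument in two.
naive = command.split()
print(naive)  # ['grep', '-r', '"hello', 'world"', '/tmp']

# shlex.split keeps the quoted argument together and strips the quotes.
tokens = shlex.split(command)
print(tokens)  # ['grep', '-r', 'hello world', '/tmp']
```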

This is not unlike numbers: adding a sequence of numbers together is a trivial operation [1], whereas splitting them is a pattern (see the Money.allocate() discussion in the book; I couldn’t find any publicly available version).
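Both halves of that analogy can be sketched briefly. The allocate function below is a hypothetical illustration of one possible splitting policy (the remainder goes to the first shares); it is not taken from the book. The float lines show the caveat about mixed signs, using explicit additions rather than sum().

```python
def allocate(total_cents, shares):
    """Split an integer amount into near-equal parts that add up exactly."""
    base, remainder = divmod(total_cents, shares)
    # The first `remainder` shares each receive one extra cent.
    return [base + 1 if i < remainder else base for i in range(shares)]


parts = allocate(100, 3)
print(parts)  # [34, 33, 33]

# With floats of different signs, the grouping of additions changes the result.
left_to_right = (1e16 + 1.0) + -1e16
regrouped = (1e16 + -1e16) + 1.0
print(left_to_right, regrouped)  # 0.0 1.0
```

Other policies (remainder to the last share, proportional ratios, and so on) are equally defensible, which is the sense in which splitting requires a choice and addition does not.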


  1. this statement is false with floating point numbers of different signs ↩︎