Minmax function alongside min and max

avayert · December 11, 2019, 6:17pm

There is a frequent pattern where you need to get both the min and max of some iterable. You cannot always call both min and max on the same iterable, as it might be exhausted upon iteration. While it is certainly simple to implement this with a raw loop, I think a minmax function would be a good addition to the language alongside the two existing built-ins.

As the name suggests, it would return a tuple of min_elem, max_elem, and would work as min and max do now (either pass in an iterable or *args, and have a key kwarg). If adding more built-in functions is a problem - for example due to reserving names in the global namespace - it could live in itertools or math.

pylang · December 11, 2019, 10:10pm

I believe I’ve seen minmax() functions in libraries from other languages. I’m not too familiar with it. Can you please demonstrate some use cases in Python?

pf_moore · December 11, 2019, 10:50pm

I believe it’s simply the following:

def minmax(it):
    min = max = None
    for val in it:
        if min is None or val < min:
            min = val
        if max is None or val > max:
            max = val
    return min, max

The point being that it computes both min and max in one pass, which is important if the argument cannot be consumed repeatedly (for example, a generator) or the cost of iterating twice is a problem.

I’m not sure how useful this would be, given that the implementation is relatively straightforward. But of course, the same could be said of min or max, although they are much more commonly needed.

aeros · December 12, 2019, 8:16am

Yeah I think something like this would be specifically suited towards the math module.

Depending upon how common the use case would be, it could certainly benefit from being in the math module. Particularly if a C accelerator is implemented, as the performance improvements can be significant when iterating over a decent number of items.

pf_moore · December 12, 2019, 8:35am

That suggests it’s intended only for use with numbers, which isn’t true of the builtin min and max.

steven.daprano · December 12, 2019, 1:05pm

I think you have misunderstood what minmax does, it is nothing like the

clamp/clip functions you have linked to.

uranusjr · December 12, 2019, 1:11pm

Ah, you’re right, I don’t know how I somehow so terribly misunderstood the topic. Sorry for the disturbance.

avayert · December 12, 2019, 1:51pm

The most frequent use case I personally face is having to find the upper and lower bounds of a desired value found in some collections. I’m sure there would be some other clever use cases that I haven’t thought of.

Concrete examples I’ve dealth with that I can remember include figuring out line lengths for wrapping, having a dictionary work as a simple “sparse array”, and dealing with time intervals (e.g. file creation dates, git commit dates, dates of news articles) in general.

A simple use case outlined in the original post would be finding both the min and max value of some iterator, without having to cast it to a collection such as list first. Sometimes this might not even be possible, as the result would take far too much memory. For this reason minmax is especially useful for dealing with files, as for example logfiles you might want to parse could be gigabytes in size. One might not be able to load them in memory, and reading through them twice is not ideal either.

As far as I can tell the itertools module is also implemented in C, and as was said would more likely be a better fit as math is for number related functions.

encukou · December 12, 2019, 2:16pm

Proposed additions to stdlib should generally go in a module on PyPI first. That way, they get some real-world validation and they’re easily available for older versions of Python.

A good place for minmax could be more-itertools, a fairly well-known project with “more routines for operating on iterables, beyond itertools”.
(For me personally, the fact that minmax is not there yet makes me doubt its general usefulness.)

aeros · December 12, 2019, 2:39pm

Ah, good point. I primarily have used the existing min() and max() functions when working with numbers, so the math module was my first consideration.

But, I’m in agreement with @encukou that it should be implemented in another module before being potentially considered for itertools, which typically takes a highly conservative approach to adding new functions.

steven.daprano · December 12, 2019, 3:02pm

I have long wanted an efficient version of minmax, because calculating
it in Python is trickier than it seems. The biggest hassle is trying to
match the existing min/max API.

The minmax function would be useful for the statistics module, where
the statistical range function is a simple measure of dispersion and is
defined as:

maximum data point - minimum data point

For small data sets, the range is sometimes considered a better measure
of dispersion than the IQR or stdev. The range is taught in secondary
school maths classes in Australia, from Years 7 to 10 or so.

minmax would also be useful for the midrange, although that isn’t
commonly taught in Australian schools (as far as I know).

The main reason I haven’t already added minmax to statistics as a helper
function for calculating the range is my anticipation of objections that
“range” will shadow the built-in and so will be confusing.

steven.daprano · December 12, 2019, 3:21pm

Petr Viktorin wrote:

“Proposed additions to stdlib should generally go in a module on PyPI first.”

This is Python, not Node.js

We don’t typically install tiny packages from PyPI, nor should we have
to. Demanding that each proposed new feature prove itself on PyPI first
is just a stealth way of saying “No”, since the Python community
typically doesn’t install single-function packages (as the Node.js
community does); nor do we typically lift single functions from popular
packages like more-itertools. Putting individual functions on PyPI will
just doom them to obscurity: without a critical mass of interest, nobody
will even know it exists, let alone download it.

PyPI is the right place for big feature-sets to experiment first,
where the API may be unstable for a while, or where the utility of the
module is uncertain, or the release schedule doesn’t match Python’s. It
is completely the wrong place for a single function like minmax.

The minmax function is trickier to get right than most people think, but
for many purposes, calling min and max separately is “good enough”. But
not all purposes.

storchaka · December 12, 2019, 4:08pm

I would want to see several examples. I have not found any use of this pattern in the stdlib.

mjpieters · December 12, 2019, 6:54pm

For the standard library it’d have to match the existing min() and max() signatures, which accept either an iterable or multiple positional arguments, an optional key callable and an optional default value:

_sentinel = object()

def minmax(*args, key=None, default=_sentinel):
    if len(args) > 1:
        if default is not _sentinel:
            raise ValueError(
                "Cannot specify a default for minmax() with "
                "multiple positional arguments"
           )
        v = args
    else:
        v = args[0]

    minval = minitem = maxval = maxitem = None
    for item in v:
        val = key(item) if key is not None else item
        if minval is None or val < minval:
            minval, minitem = val, item
        if maxval is None or val > maxval:
            maxval, maxitem = val, item
    
    if minval is None:
        assert minitem is maxval is maxitem is None
        if default is _sentinel:
            raise ValueError("minmax() arg is an empty sequence")
        minitem = maxitem = default
    
    return minitem, maxitem

Of course, in real-world code where someone found they need to find both the min and max of an iterator, they usually create a simpler version.

pylang · December 13, 2019, 12:50am

Lovely discussion. Thanks everyone.

@avayert “A simple use case outlined in the original post would be finding both the min and max value of some iterator, without having to cast it to a collection such as list first.”

Based on your examples, in a pinch, I would use two iterators:

def minmax(it, key=None):
    """Return a tuple of min and max."""
    if key is None:
        key = lambda x: x
    it1, it2 = itertools.tee(it)
    return min(it1, key=key), max(it2, key=key)

This is a quick solution that should work on more than just numbers, although other posts have demonstrated some improvements such as one-pass (@pf_moore) and complete signatures (@mjpieters).

As @encukou suggests, I would definitely propose this to more_itertools as it seems like a nice fit there.

steven.daprano · December 13, 2019, 1:22am

py wrote:

“Based on your examples, in a pinch, I would use two iterators:”

[snip sample code using iterator.tee]

Interally, itertools.tee has to save all the values already seen until
the other iterators consume them. Essentially, under the hood tee saves
the values in a FIFO queue.

See the docs, especially the note at the end:

“if one iterator uses most or all of the data before another iterator
starts, it is faster to use list() instead of tee()”

https://docs.python.org/2/library/itertools.html#itertools.tee

Using tee in this case is an anti-pattern. But it’s a trap people can
fall into if they don’t know the performance characteristics of tee.

Paddy3118 · December 13, 2019, 4:07pm

I would extend its usefulness to general collections and not just iterators. finding min and max with the same key function is, for me, a common way of getting an idea of the extent, or range, or size, of a collection. I use min and max because that is what is currently available but a minmax function that was twice as fast and worked on the same inputs, (as well as consuming only one iterator), seems good.

ruud · December 13, 2019, 5:36pm

Further to @mjpieters 's excellent Python version, I suggest a version which doesn’t need to check for None, neither for the key, neither for the value.

_sentinel = object()
def minmax(*args, key=None, default=_sentinel):
    if len(args) > 1:
        if default is not _sentinel:
            raise ValueError(
                "Cannot specify a default for minmax() with "
                "multiple positional arguments"
           )
        v = iter(args)
    else:
        v = iter(args[0])
    if key is None:
        key = lambda x:x
    try:
        minitem = maxitem = next(v)
        minval = maxval = key(minitem)

    except StopIteration:
        raise ValueError('minmax() arg is an empty sequence')        

    for item in v:
        val = key(item)
        if val < minval:
            minval, minitem = val, item
        if val > maxval:
            maxval, maxitem = val, item
        
    return minitem, maxitem

avayert · December 13, 2019, 7:36pm

I’m not all too familiar with the CPython codebase but a cursory search with the GitHub search ignoring the test folders revealed at least some uses:

finding both min and max is used twice in colorsys within the functions
it’s used in timeit to deal with time intervals
It seems to be used in multiple separate places to deal with common path handling like here and here (I think it was also in ntpath)
It’s used in this seemingly out of place test function inside random

Additionally, using global GitHub search I typed in “min max” into the github advanced search, limiting results to only show .py files, and sorted by most recently indexed (link). Within only the last 15 or so minutes of me hitting search there were already plenty of uses, of which I’ve linked some here:
[1] [2] [3] [4] [5] [6]

Marco_Sulla · December 14, 2019, 12:07am

If I can, I have a modification to your solution:

_sentinel = object()

def minmax(*args, key=None):
    args_len = len(args)
    
    if args_len == 0:
        fname = minmax.__name__
        raise ValueError(f"{fname}() expected 1 argument, got 0")
    elif args_len == 1:
        seq = args[0]
    else:
        seq = args
    
    it = iter(seq)

    vmax = vmin = next(it, _sentinel)
    
    if vmax is _sentinel:
        fname = minmax.__name__
        raise ValueError(f"{fname}() arg is an empty sequence")
    
    if key is None:
        for val in it:
            if val > vmax:
                vmax = val
            elif val < vmin:
                vmin = val
    else:
        fmax = fmin = key(vmax)
        
        for val in it:
            fval = key(val)
            
            if fval > fmax:
                fmax = fval
                vmax = val
            elif fval < fmin:
                fmin = fval
                vmin = val
    
    return (vmin, vmax)

I know, it’s more verbose and less elegant, but I think it’s more fast for the majority of cases, ie key=None.

PS: no default. What must it returns, (default, default)? Or we have to define a default_min and a default_max? Overcomplicated IMHO.