There is a frequent pattern where you need to get both the min and max of some iterable. You cannot always call both min and max on the same iterable, as it might be exhausted upon iteration. While it is certainly simple to implement this with a raw loop, I think a minmax function would be a good addition to the language alongside the two existing built-ins.
As the name suggests, it would return a tuple of min_elem, max_elem, and would work as min and max do now (either pass in an iterable or *args, and have a key kwarg). If adding more built-in functions is a problem - for example due to reserving names in the global namespace - it could live in itertools or math.
I believe I’ve seen minmax() functions in libraries from other languages. I’m not too familiar with it. Can you please demonstrate some use cases in Python?
def minmax(it):
min = max = None
for val in it:
if min is None or val < min:
min = val
if max is None or val > max:
max = val
return min, max
The point being that it computes both min and max in one pass, which is important if the argument cannot be consumed repeatedly (for example, a generator) or the cost of iterating twice is a problem.
I’m not sure how useful this would be, given that the implementation is relatively straightforward. But of course, the same could be said of min or max, although they are much more commonly needed.
Yeah I think something like this would be specifically suited towards the math module.
Depending upon how common the use case would be, it could certainly benefit from being in the math module. Particularly if a C accelerator is implemented, as the performance improvements can be significant when iterating over a decent number of items.
The most frequent use case I personally face is having to find the upper and lower bounds of a desired value found in some collections. I’m sure there would be some other clever use cases that I haven’t thought of.
Concrete examples I’ve dealth with that I can remember include figuring out line lengths for wrapping, having a dictionary work as a simple “sparse array”, and dealing with time intervals (e.g. file creation dates, git commit dates, dates of news articles) in general.
A simple use case outlined in the original post would be finding both the min and max value of some iterator, without having to cast it to a collection such as list first. Sometimes this might not even be possible, as the result would take far too much memory. For this reason minmax is especially useful for dealing with files, as for example logfiles you might want to parse could be gigabytes in size. One might not be able to load them in memory, and reading through them twice is not ideal either.
As far as I can tell the itertools module is also implemented in C, and as was said would more likely be a better fit as math is for number related functions.
Proposed additions to stdlib should generally go in a module on PyPI first. That way, they get some real-world validation and they’re easily available for older versions of Python.
A good place for minmax could be more-itertools, a fairly well-known project with “more routines for operating on iterables, beyond itertools”.
(For me personally, the fact that minmax is not there yet makes me doubt its general usefulness.)
Ah, good point. I primarily have used the existing min() and max() functions when working with numbers, so the math module was my first consideration.
But, I’m in agreement with @encukou that it should be implemented in another module before being potentially considered for itertools, which typically takes a highly conservative approach to adding new functions.
I have long wanted an efficient version of minmax, because calculating
it in Python is trickier than it seems. The biggest hassle is trying to
match the existing min/max API.
The minmax function would be useful for the statistics module, where
the statistical range function is a simple measure of dispersion and is
defined as:
maximum data point - minimum data point
For small data sets, the range is sometimes considered a better measure
of dispersion than the IQR or stdev. The range is taught in secondary
school maths classes in Australia, from Years 7 to 10 or so.
minmax would also be useful for the midrange, although that isn’t
commonly taught in Australian schools (as far as I know).
The main reason I haven’t already added minmax to statistics as a helper
function for calculating the range is my anticipation of objections that
“range” will shadow the built-in and so will be confusing.
“Proposed additions to stdlib should generally go in a module on PyPI first.”
This is Python, not Node.js
We don’t typically install tiny packages from PyPI, nor should we have
to. Demanding that each proposed new feature prove itself on PyPI first
is just a stealth way of saying “No”, since the Python community
typically doesn’t install single-function packages (as the Node.js
community does); nor do we typically lift single functions from popular
packages like more-itertools. Putting individual functions on PyPI will
just doom them to obscurity: without a critical mass of interest, nobody
will even know it exists, let alone download it.
PyPI is the right place for big feature-sets to experiment first,
where the API may be unstable for a while, or where the utility of the
module is uncertain, or the release schedule doesn’t match Python’s. It
is completely the wrong place for a single function like minmax.
The minmax function is trickier to get right than most people think, but
for many purposes, calling min and max separately is “good enough”. But
not all purposes.
For the standard library it’d have to match the existing min() and max() signatures, which accept either an iterable or multiple positional arguments, an optional key callable and an optional default value:
_sentinel = object()
def minmax(*args, key=None, default=_sentinel):
if len(args) > 1:
if default is not _sentinel:
raise ValueError(
"Cannot specify a default for minmax() with "
"multiple positional arguments"
)
v = args
else:
v = args[0]
minval = minitem = maxval = maxitem = None
for item in v:
val = key(item) if key is not None else item
if minval is None or val < minval:
minval, minitem = val, item
if maxval is None or val > maxval:
maxval, maxitem = val, item
if minval is None:
assert minitem is maxval is maxitem is None
if default is _sentinel:
raise ValueError("minmax() arg is an empty sequence")
minitem = maxitem = default
return minitem, maxitem
Of course, in real-world code where someone found they need to find both the min and max of an iterator, they usually create a simpler version.
@avayert “A simple use case outlined in the original post would be finding both the min and max value of some iterator, without having to cast it to a collection such as list first.”
Based on your examples, in a pinch, I would use two iterators:
def minmax(it, key=None):
"""Return a tuple of min and max."""
if key is None:
key = lambda x: x
it1, it2 = itertools.tee(it)
return min(it1, key=key), max(it2, key=key)
This is a quick solution that should work on more than just numbers, although other posts have demonstrated some improvements such as one-pass (@pf_moore) and complete signatures (@mjpieters).
As @encukou suggests, I would definitely propose this to more_itertools as it seems like a nice fit there.
“Based on your examples, in a pinch, I would use two iterators:”
[snip sample code using iterator.tee]
Interally, itertools.tee has to save all the values already seen until
the other iterators consume them. Essentially, under the hood tee saves
the values in a FIFO queue.
See the docs, especially the note at the end:
“if one iterator uses most or all of the data before another iterator
starts, it is faster to use list() instead of tee()”
I would extend its usefulness to general collections and not just iterators. finding min and max with the same key function is, for me, a common way of getting an idea of the extent, or range, or size, of a collection. I use min and max because that is what is currently available but a minmax function that was twice as fast and worked on the same inputs, (as well as consuming only one iterator), seems good.
Further to @mjpieters 's excellent Python version, I suggest a version which doesn’t need to check for None, neither for the key, neither for the value.
_sentinel = object()
def minmax(*args, key=None, default=_sentinel):
if len(args) > 1:
if default is not _sentinel:
raise ValueError(
"Cannot specify a default for minmax() with "
"multiple positional arguments"
)
v = iter(args)
else:
v = iter(args[0])
if key is None:
key = lambda x:x
try:
minitem = maxitem = next(v)
minval = maxval = key(minitem)
except StopIteration:
raise ValueError('minmax() arg is an empty sequence')
for item in v:
val = key(item)
if val < minval:
minval, minitem = val, item
if val > maxval:
maxval, maxitem = val, item
return minitem, maxitem
I’m not all too familiar with the CPython codebase but a cursory search with the GitHub search ignoring the test folders revealed at least some uses:
finding both min and max is used twice in colorsys within the functions
It seems to be used in multiple separate places to deal with common path handling like here and here (I think it was also in ntpath)
It’s used in this seemingly out of place test function inside random
Additionally, using global GitHub search I typed in “min max” into the github advanced search, limiting results to only show .py files, and sorted by most recently indexed (link). Within only the last 15 or so minutes of me hitting search there were already plenty of uses, of which I’ve linked some here: [1][2][3][4][5][6]