Disabling use of certain keywords with defaultdict

This syntax does not raise an error:

from collections import defaultdict
x = defaultdict(enumerate)
y = defaultdict(filter)
z = defaultdict(lambda x: x.upper())

Is there a way to use the above? If not, shouldn’t this syntax raise an error?

defaultdict requires a function that takes no arguments.

If you pass a function that takes arguments, you will get an error when the function is used.

What are you trying to do?
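To illustrate the point about the error only appearing on use, here is a minimal sketch using the lambda from the question. The variable name error_message is just for illustration.

```python
from collections import defaultdict

# Construction succeeds: at this point only callability is checked.
d = defaultdict(lambda x: x.upper())

try:
    d["a"]  # missing key, so the factory is called with no arguments
except TypeError as exc:
    error_message = str(exc)
else:
    error_message = None

print(error_message)
print("a" in d)  # the failed lookup did not insert the key
```

The failure happens at the first lookup of a missing key, not at construction.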

enumerate and filter aren’t keywords, they’re builtin functions. I’m not sure why defaultdict should explicitly blacklist a few arbitrary function names, when it is the function’s behavior, not its name, that reflects whether it is appropriate to be passed, and that is ultimately up to the user’s discretion.

In the third case, lambda is a keyword, but it is of course not being passed to defaultdict() (keywords are syntactic constructs, not objects, and so cannot be passed around), but is rather being used to create a function for defaultdict(), which is a textbook usage for such; however, as @steven.daprano stated, the function the user passed is invalid, and so will raise an error (and static analyzers may flag it, etc).

By C.A.M. Gerlach via Discussions on Python.org at 02Apr2022 17:09:

In the third case, lambda is a keyword, but it is of course not being
passed to defaultdict() (keywords are syntactic constructs, not
objects, and so cannot be passed around), but is rather being used to
create a function for defaultdict(), which is a textbook usage for
such;

Indeed. I did that only last night:

tagmap = defaultdict(lambda: defaultdict(set))

Cheers,
Cameron Simpson cs@cskk.id.au

A few days earlier, I was checking which builtins can be inherited from, and found that enumerate and filter can be.

Yesterday, I was checking which arguments can be passed to defaultdict, and gave it the same enumerate and filter; it did not raise an error.

But it turns out that there is no way to use the result.
So I ran the check against more builtins, as I had done for the inheritance case.

from collections import defaultdict
import keyword, re

passed = set()
failed = set()

for i, j in keyword.__builtins__.items():
  if not re.search('Error|Warning|__|ipython', i):
    try:
      x = defaultdict(eval(i))
      passed.add(i)
    except:
      failed.add(i)
passed
{'BaseException', 'Exception', 'GeneratorExit', 'KeyboardInterrupt', 'None',
 'StopAsyncIteration', 'StopIteration', 'SystemExit', 'abs', 'all', 'any',
 'ascii', 'bin', 'bool', 'breakpoint', 'bytearray', 'bytes', 'callable', 'chr',
 'classmethod', 'compile', 'complex', 'copyright', 'credits', 'delattr', 'dict',
 'dir', 'display', 'divmod', 'dreload', 'enumerate', 'eval', 'exec', 'execfile',
 'filter', 'float', 'format', 'frozenset', 'getattr', 'globals', 'hasattr',
 'hash', 'help', 'hex', 'id', 'input', 'int', 'isinstance', 'issubclass',
 'iter', 'len', 'license', 'list', 'locals', 'map', 'max', 'memoryview', 'min',
 'next', 'object', 'oct', 'open', 'ord', 'pow', 'print', 'property', 'range',
 'repr', 'reversed', 'round', 'runfile', 'set', 'setattr', 'slice', 'sorted',
 'staticmethod', 'str', 'sum', 'super', 'tuple', 'type', 'vars', 'zip'}
failed
{'Ellipsis', 'False', 'NotImplemented', 'True'}

Running the same check with keyword.kwlist leaves only None in passed; all the rest end up in failed.

I then checked whether an element could be accessed, like x['a']:

passed_both = set()
passed_first_failed_second = set()

for i in passed:
  try:
    x = defaultdict(eval(i))
    y = x['a']
    passed_both.add(i)
  except:
    passed_first_failed_second.add(i)

I think some of these trigger the debugger, so I had to press q four times.

passed_both
{'BaseException', 'Exception', 'GeneratorExit', 'KeyboardInterrupt',
 'StopAsyncIteration', 'StopIteration', 'SystemExit', 'bool', 'bytearray',
 'bytes', 'complex', 'copyright', 'credits', 'dict', 'dir', 'display', 'float',
 'frozenset', 'globals', 'help', 'input', 'int', 'license', 'list', 'locals',
 'object', 'print', 'property', 'set', 'str', 'tuple', 'vars', 'zip'}
passed_first_failed_second
{'None', 'abs', 'all', 'any', 'ascii', 'bin', 'breakpoint', 'callable', 'chr',
 'classmethod', 'compile', 'delattr', 'divmod', 'dreload', 'enumerate', 'eval',
 'exec', 'execfile', 'filter', 'format', 'getattr', 'hasattr', 'hash', 'hex',
 'id', 'isinstance', 'issubclass', 'iter', 'len', 'map', 'max', 'memoryview',
 'min', 'next', 'oct', 'open', 'ord', 'pow', 'range', 'repr', 'reversed',
 'round', 'runfile', 'setattr', 'slice', 'sorted', 'staticmethod', 'sum',
 'super', 'type'}

Probably it should raise an error as soon as the user specifies one of the keywords in passed_first_failed_second, just as it does for True and False.

x = defaultdict(super)
y = defaultdict(staticmethod)

They cannot be used, so why not raise an error directly?

One more thing I did was to define a function with one of those names, for example,

def enumerate():
    return 1
x = defaultdict(enumerate)
x['a']

gives 1.
In such a case, shouldn’t it raise an error if the function requires arguments?

def enumerate(a, b):
    return 1
x = defaultdict(enumerate)

Is it possible to check, right at this point, that the function provided to defaultdict does not take any arguments, and raise an error otherwise?
Something like this,

import inspect
class CustomDefaultDict(defaultdict):
  def __init__(self, arg):
    if inspect.isfunction(arg):
      if arg.__code__.co_argcount > 0:
        raise TypeError('function with arguments not allowed')
    super().__init__(arg)

But this would not work for builtin functions.

Hi Tsef,

First off, I really want to congratulate you for exploring Python in the interpreter, and I’m not being patronising. That’s fantastic and I wish more people would do that.

But please stop calling these things “keywords”. They are not keywords, they are just regular functions or classes that happen to be built into the interpreter.

Python keywords include:

  • if, elif, else

  • for, while

  • try, except, raise

etc. You can get a full list from the keyword module. Notice that they are (almost) all commands, not functions, objects or values.

The only exceptions are (from memory):

  • None, True, False.

which are built-in values considered so important that they are protected by being made keywords.

Everything else is just a plain old name, like x or mylist. Because they are built-in to the interpreter, we call them builtins.
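As a quick way to see the distinction in the interpreter, here is a small sketch using the standard keyword and builtins modules. The dictionary names is_kw and in_builtins are only for illustration.

```python
import builtins
import keyword

# The full list of reserved words for the running interpreter.
print(keyword.kwlist)

# lambda and None are keywords; enumerate and filter are not.
is_kw = {name: keyword.iskeyword(name)
         for name in ("lambda", "None", "enumerate", "filter")}

# enumerate and filter are ordinary names living in the builtins module;
# lambda is syntax, so there is no object of that name to find.
in_builtins = {name: hasattr(builtins, name)
               for name in ("enumerate", "filter", "lambda")}

print(is_kw)
print(in_builtins)
```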

A better way to get the builtins is by using the builtins module:

import builtins

gives you a module containing all the builtins. Now you can access its names and objects:

for name, obj in vars(builtins).items(): ...

You should not use __builtins__ as that is reserved for the interpreter, and could disappear without notice if the interpreter ever decides it doesn’t need it any more. So Best Practice is to import the builtins module instead.

No need to use eval(name) to get access to the builtin object itself. It is right there in vars(builtins), as the dictionary value. The dict key is the name, and the value is the object.
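A short sketch of that point: the dictionary returned by vars(builtins) already holds the objects, so a lookup by name replaces eval. The filter on a leading underscore is my own choice here, not something required.

```python
import builtins

namespace = vars(builtins)

# The value stored under the name is the builtin object itself.
obj = namespace["len"]
print(obj is len)

# Collect the public callables without ever calling eval().
callables = {name: value
             for name, value in namespace.items()
             if callable(value) and not name.startswith("_")}
print(sorted(callables)[:5])
```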

Generally speaking, Python will usually delay checks until the last possible moment (when the object is actually needed) before raising an error.

So defaultdict(obj) will likely accept anything that passes the simple test obj is None or callable(obj). It is relatively cheap for the interpreter to check if the object is callable, but expensive to check how many arguments it takes, so the interpreter delays doing that until it needs to call the function.
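The construction-time test described above can be observed with a small sketch. The helper accepts() is a hypothetical name introduced only for this example.

```python
from collections import defaultdict

def accepts(factory):
    """Return True if defaultdict(factory) can be constructed."""
    try:
        defaultdict(factory)
    except TypeError:
        return False
    return True

# None and any callable are accepted; non-callables are rejected up front.
results = {repr(obj): accepts(obj)
           for obj in (None, int, enumerate, len, True, 42, Ellipsis)}
print(results)

# With None as the factory, a missing key is an ordinary KeyError.
try:
    defaultdict(None)["a"]
except KeyError:
    none_factory_raises_keyerror = True
else:
    none_factory_raises_keyerror = False
```

This matches the earlier results in the thread: True, False and Ellipsis fail at construction, while enumerate and len only fail when a missing key is looked up.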

Yes, I changed the implementation; it now uses builtins.

from collections import defaultdict
import keyword, re, builtins

passed = dict()
failed = dict()

for key, value in vars(builtins).items():
  if not re.search('Error|Warning|__|ipython', key):
    try:
      x = defaultdict(value)
      passed[key] = value
    except:
      failed[key] = value
passed_both = dict()
passed_first_failed_second = defaultdict(dict)
passed_first_failed_second_check = set()

for key, value in passed.items():
  try:
    x = defaultdict(value)
    y = x['a']
    passed_both[key] = value
  except:
    passed_first_failed_second[type(value)][key] = value
    passed_first_failed_second_check.add(value)

where,

passed_first_failed_second

looks like this,

defaultdict(<class 'dict'>,
            {<class 'function'>: {'dreload': <function _dreload at 0x7f5b620efdd0>,
                                  'execfile': <function execfile at 0x7f5b572a5950>,
                                  'runfile': <function runfile at 0x7f5b56f66290>},
             <class 'builtin_function_or_method'>: {'abs': <built-in function abs>,
                                                    'all': <built-in function all>,
                                                    'any': <built-in function any>,
                                                    'ascii': <built-in function ascii>,
                                                    'bin': <built-in function bin>,
                                                    'breakpoint': <built-in function breakpoint>,
                                                    'callable': <built-in function callable>,
                                                    'chr': <built-in function chr>,
                                                    'compile': <built-in function compile>,
                                                    'delattr': <built-in function delattr>,
                                                    'divmod': <built-in function divmod>,
                                                    'eval': <built-in function eval>,
                                                    'exec': <built-in function exec>,
                                                    'format': <built-in function format>,
                                                    'getattr': <built-in function getattr>,
                                                    'hasattr': <built-in function hasattr>,
                                                    'hash': <built-in function hash>,
                                                    'hex': <built-in function hex>,
                                                    'id': <built-in function id>,
                                                    'isinstance': <built-in function isinstance>,
                                                    'issubclass': <built-in function issubclass>,
                                                    'iter': <built-in function iter>,
                                                    'len': <built-in function len>,
                                                    'max': <built-in function max>,
                                                    'min': <built-in function min>,
                                                    'next': <built-in function next>,
                                                    'oct': <built-in function oct>,
                                                    'open': <built-in function open>,
                                                    'ord': <built-in function ord>,
                                                    'pow': <built-in function pow>,
                                                    'repr': <built-in function repr>,
                                                    'round': <built-in function round>,
                                                    'setattr': <built-in function setattr>,
                                                    'sorted': <built-in function sorted>,
                                                    'sum': <built-in function sum>},
             <class 'NoneType'>: {'None': None},
             <class 'type'>: {'classmethod': <class 'classmethod'>,
                              'enumerate': <class 'enumerate'>,
                              'filter': <class 'filter'>,
                              'map': <class 'map'>,
                              'memoryview': <class 'memoryview'>,
                              'range': <class 'range'>,
                              'reversed': <class 'reversed'>,
                              'slice': <class 'slice'>,
                              'staticmethod': <class 'staticmethod'>,
                              'super': <class 'super'>,
                              'type': <class 'type'>}})

So enumerate and filter are both classes here, and we have some builtin_function_or_method objects, some functions, and one NoneType.

And now CustomDefaultDict looks like this,

class CustomDefaultDict(defaultdict):
  def __init__(self, arg):
    import inspect    

    if inspect.isfunction(arg):
      if arg.__code__.co_argcount > 0:
        raise TypeError('function with arguments not allowed')

    if arg is None:
      raise TypeError('NoneType not allowed')

    if inspect.isbuiltin(arg):
      if arg in passed_first_failed_second_check:
        raise TypeError('builtin_function_or_method with arguments not allowed')
      
    if isinstance(arg, type): # there might be a better way to check if it is a class
      if arg in passed_first_failed_second_check:
        raise TypeError('class which requires arguments not allowed')

    super().__init__(arg)

But here, for some reason, inspect.isfunction does not recognize a builtin_function_or_method as a function.
So,

inspect.isfunction(abs)

gives

False

although it is a function. In the original implementation, the type it checks against is defined like this,

BuiltinFunctionType = type(len)

and since

type(len)

gives,

builtin_function_or_method

so,

inspect.isbuiltin(abs)

gives

True

But this is a bit doubtful, as a builtin_function_or_method is also either a function or a method.

The same could be said of methods: one could say that a method is a function bound to an object, so inspect.isfunction on a method should probably return True. But the way it is implemented, the method type is defined like this,

class _C:
    def _m(self): pass
MethodType = type(_C()._m)

So here _C()._m is not considered a function, although it is a function bound to an object. This is again confusing to me.
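For comparison, here is a sketch of how the inspect predicates classify these objects. The names C, f and rows are made up for the example; inspect.isroutine is the predicate that covers user-defined functions, builtin functions and bound methods together, and callable() is broader still because it also accepts classes.

```python
import inspect

class C:
    def m(self):
        pass

def f():
    pass

candidates = {"f": f, "abs": abs, "C().m": C().m, "C.m": C.m,
              "enumerate": enumerate}

# One row per object: (isfunction, isbuiltin, ismethod, isroutine, callable)
rows = {name: (inspect.isfunction(obj), inspect.isbuiltin(obj),
               inspect.ismethod(obj), inspect.isroutine(obj), callable(obj))
        for name, obj in candidates.items()}

for name, row in rows.items():
    print(name, row)

# A bound method wraps the plain function found on the class.
print(C().m.__func__ is C.m)
```

So the predicates are deliberately narrow: each one tests for one concrete type rather than for "anything function-like".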

After this, our CustomDefaultDict does the required task of raising an error, for example,

x = CustomDefaultDict(sum)

gives,

TypeError: builtin_function_or_method with arguments not allowed

y = CustomDefaultDict(enumerate)

gives,

TypeError: class which requires arguments not allowed

z = CustomDefaultDict(None)

gives,

TypeError: NoneType not allowed

and,

w = CustomDefaultDict(execfile)

gives,

TypeError: function with arguments not allowed

We could define a custom function like,

def enumerate():
    return 1
v = CustomDefaultDict(enumerate)

This works, while if we give it arguments, like,

def enumerate(a):
    return 1
u = CustomDefaultDict(enumerate)

then, it gives,

TypeError: function with arguments not allowed

Does this cover all possible cases?
Also, is the check for a class correct?

Does it mean there is some issue with using,

      if arg.__code__.co_argcount > 0:
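On whether co_argcount is a reliable test: it counts positional parameters even when they have defaults, and it does not count keyword-only parameters or *args. A possible alternative, sketched below under the assumption that inspect.signature can introspect the callable, is to try binding an empty argument list. The helper name zero_arg_callable and the three sample functions are hypothetical.

```python
import inspect

def zero_arg_callable(func):
    """Return True or False when known, None when the signature is unavailable."""
    try:
        sig = inspect.signature(func)
    except (TypeError, ValueError):
        # Not callable, or (as for some builtins) no introspectable signature.
        return None
    try:
        sig.bind()  # succeeds only if no argument is required
    except TypeError:
        return False
    return True

def with_default(a=1):
    return a

def keyword_only(*, k):
    return k

def var_args(*args):
    return args

for func in (with_default, keyword_only, var_args):
    print(func.__name__, func.__code__.co_argcount, zero_arg_callable(func))
```

Here with_default has co_argcount 1 yet can be called with no arguments, and keyword_only has co_argcount 0 yet cannot, so the simple count gives the wrong answer in both directions.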

The same problem appears to exist in a few other places as well, for example,

map(1, 'abc')

There is no way to use it, but it does not give an error directly.

But here,

import functools
functools.reduce(1, ['a', 'b'])

it gives an error directly,

TypeError: 'int' object is not callable

map returns a lazy iterator that delays computation until it is requested.

>>> it = map(len, ["abc", "a", "ab"])
>>> it  # delayed computation
<map object at 0x7f4c280c9d20>
>>> next(it)  # results are computed on request
3
>>> next(it)
1

But reduce is an eager function that immediately processes its arguments.

>>> from functools import reduce
>>> reduce(lambda a, b: a+b, [1, 2, 3])  # immediate computation
6

So when you pass a bad argument to reduce, it immediately tries to use it, which fails immediately. But a bad argument to map may not be noticed until you actually request the next value.
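The difference in timing can be seen side by side with the two calls from the earlier post. This is only a sketch; the variable names lazy, map_error and reduce_error are mine.

```python
from functools import reduce

# map() only stores its arguments, so a non-callable goes unnoticed here.
lazy = map(1, "abc")

try:
    next(lazy)  # the first request is where 1 actually gets called
except TypeError as exc:
    map_error = str(exc)

try:
    reduce(1, ["a", "b"])  # reduce calls its function straight away
except TypeError as exc:
    reduce_error = str(exc)

print(map_error)
print(reduce_error)
```

Both end in the same kind of TypeError; only the moment at which it is raised differs.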

Yes, but does this not make things inconsistent?
I think that earlier, in Python 2, map and filter also worked the way functools.reduce does;
they were then changed in Python 3.
In the case of defaultdict too, the check is done on request.
I am not aware of more examples, but doesn't delayed computation in some places and immediate computation in others make things inconsistent?

One could say that if I use map in a situation like,

map(len, some_list_with_millions_of_elements)

then delayed computation is better, as immediate computation would create memory issues.
But the check for whether the first argument can be used should be done immediately; for example,

map(1, some_list_with_millions_of_elements)

should directly throw an error.

Such checks would incur a performance hit, complicate the implementation, and add cognitive and code complexity. Additionally, error handling behavior would depend on the details of the arguments provided, rather than being…well, consistent.

Furthermore, it raises the question (pun intended)—what heuristic do you actually employ to check the first argument? Whether callable(first_arg) is True? I’m not sure how useful that really is, since it only would detect cases where a non-callable was passed, which is pretty obvious at runtime and is something linters can statically detect in most cases (e.g. literals, as you showed). And again, this would mean the error would be raised in a different time and location depending on the arguments, which is highly inconsistent behavior.

And of course, if you really wanted an immediate check, you could just define something like

def safe_map(function, /, *iterables):
    # Perform any checks you wanted on `function`, e.g.
    if not callable(function):
        raise ...
    return map(function, *iterables)
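For anyone who wants to try this out, here is one runnable way to fill in that sketch. Raising TypeError with this particular message is my assumption; any exception type you prefer would do.

```python
def safe_map(function, /, *iterables):
    # Eagerly reject non-callables; everything else stays lazy, as with map().
    if not callable(function):
        raise TypeError(f"{function!r} is not callable")
    return map(function, *iterables)

print(list(safe_map(len, ["abc", "a", "ab"])))

try:
    safe_map(1, "abc")  # fails here, before any iteration takes place
except TypeError as exc:
    print(exc)
```

Note that this only catches non-callables. A callable that takes the wrong number of arguments will still fail later, when the first value is requested.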