Title: Pickle protocol version 6: skipcode pickles
Author: Wes Turner (@westurner)
Sponsor:
PEP-Delegate:
Discussions-To: Create a new pickle protocol version to add skipcode
Status: Draft
Type: Standards Track
Topic: Pickles
Requires:
Created: 2024-03-18
Python-Version: 3.X
Post-History: 03-18-2024 Create a new pickle protocol version to add skipcode
Replaces:
Superseded-By:
Resolution:
Abstract
Create a new pickle protocol version and/or support a skipcode=True keyword argument to pickle
that prevents code from being saved in pickles or executed when they are read,
in order to reduce the risk of unauthorized code execution, particularly in applications where pickle is already the data storage format.
Motivation
There is not yet a way to save data, but not code, to a Python pickle.
Rationale
- Given that, as the Python docs indicate [TODO], pickles are dangerous and untrusted data should not be unpickled,
pickles could simply not save or execute untrusted code.
Specification
- Other than a protocol version bump and NOP'ing out the parts of pickle.py that serialize and deserialize code, no changes to the pickle specification should be necessary.
- A data-only pickle serialization protocol implementation would need to skip calls to self.save_global() in pickle._Pickler.save() if condition(pickle_protocol), and likewise in the save_type() entry of the dispatch table at #L1123.
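Until such a protocol exists, the dump side of skipcode=True can be approximated with the existing Pickler.reducer_override() hook (Python 3.8+). This is a minimal sketch under stated assumptions, not the proposed change to pickle.py: DataOnlyPickler, DATA_ONLY_TYPES and dumps_data_only are hypothetical names, the allowed-type list is this sketch's own choice, and instead of silently skipping save_global() it raises pickle.PicklingError for any object whose exact type is not a built-in that protocol 5 writes with plain data opcodes.

```python
import io
import pickle

# Assumed allow-list: exact built-in types that protocol 5 writes with
# data-only opcodes (no GLOBAL/STACK_GLOBAL reference, no REDUCE call).
DATA_ONLY_TYPES = frozenset({
    type(None), bool, int, float, str, bytes, bytearray,
    list, tuple, dict, set, frozenset,
})


class DataOnlyPickler(pickle.Pickler):
    """Hypothetical pickler approximating skipcode=True when dumping."""

    def reducer_override(self, obj):
        if type(obj) in DATA_ONLY_TYPES:
            # Fall back to the normal dispatch table, which emits plain data.
            return NotImplemented
        # Anything else (classes, functions, instances of user classes) would
        # be written via save_global() or a reduce tuple naming a callable.
        raise pickle.PicklingError(
            f"skipcode: refusing to pickle object of type {type(obj).__qualname__}"
        )


def dumps_data_only(obj) -> bytes:
    """Hypothetical helper: pickle obj with protocol 5, data opcodes only."""
    buffer = io.BytesIO()
    DataOnlyPickler(buffer, protocol=5).dump(obj)
    return buffer.getvalue()
```

Raising rather than skipping is a deliberate choice in this sketch: silently dropping an object would produce a pickle that loads as something different from what was dumped.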
Backwards Compatibility
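- Pickles written with pickle protocol 6 (with or without e.g. skipcode=True)
would be deserializable by unpicklers that support at least protocol 5,
though obviously without any code in the serialized pickles. Because existing unpicklers raise ValueError for a PROTO header above their pickle.HIGHEST_PROTOCOL (5 since Python 3.8), this holds only if such pickles declare protocol 5 or lower.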
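Whether an existing interpreter can read a given pickle is decided first by its PROTO header: CPython's unpicklers compare the PROTO opcode's argument with pickle.HIGHEST_PROTOCOL and raise ValueError when it is higher. The sketch below probes this with a minimal stream (PROTO n, NONE, STOP) that contains no globals or calls; accepts_protocol is a hypothetical helper name.

```python
import pickle


def accepts_protocol(version: int) -> bool:
    """Return True if this interpreter's unpickler accepts PROTO `version`.

    The probe stream is PROTO <version>, NONE, STOP: it holds no globals
    and no calls, so loading it cannot execute anything.
    """
    probe = b"\x80" + bytes([version]) + b"N."
    try:
        pickle.loads(probe)
    except ValueError:
        # e.g. "unsupported pickle protocol: 6"
        return False
    return True


# A header one above HIGHEST_PROTOCOL is refused before any data is read.
for version in (pickle.HIGHEST_PROTOCOL, pickle.HIGHEST_PROTOCOL + 1):
    print(version, accepts_protocol(version))
```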
Security Implications
[How could a malicious user take advantage of this new feature?]
- Users would need to learn that pickles are less safe without a new optional flag such as skipcode=True or noexec=True.
- Pickles otherwise parse non-code-object values after reading the opcode prefixes specified by the pickle protocol in pickle.py.
- If users do not understand that pickle is only safe from such risk if protocol v6/skipcode=True is explicitly specified, they could inadvertently over-trust pickles, which are still unsafe by default.
- If the user does not specify protocol v6/skipcode=True, reading a pickle will execute code; for example:

    import pickle
    # GLOBAL os.system, then REDUCE calls it with ('cat /etc/passwd',).
    pickle.loads(b"cos\nsystem\n(S'cat /etc/passwd'\ntR.")  # runs an attacker-chosen shell command

- To limit the risk of code execution with pickles (which can otherwise import and call arbitrary callables), users would:

    import pickle
    # nocode= is the keyword proposed here; no released Python accepts it.
    pickle.loads(b"cos\nsystem\n(S'echo shouldfail'\ntR.", nocode=True)

- Should there be an environment variable, e.g. PYTHONNOCODEPICKLES, to globally enable or disable nocode=True for all pickles in a process? For example:

    PYTHONNOCODEPICKLES=1 python -m pickle -t
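On the load side, the effect of nocode=True can be approximated today with Unpickler.find_class(), which the pickle documentation's "Restricting Globals" section describes as being called whenever a global (a class or a function) is requested. A minimal sketch, assuming that refusing every global is acceptable for data-only pickles: NoCodeUnpickler and loads_with_nocode are hypothetical names, and the PYTHONNOCODEPICKLES lookup only mimics the environment variable floated above; CPython itself does not read it.

```python
import io
import os
import pickle


class NoCodeUnpickler(pickle.Unpickler):
    """Hypothetical unpickler approximating the proposed nocode=True."""

    def find_class(self, module, name):
        # Called whenever the stream requests a global (class or function).
        # Refusing every request leaves REDUCE and friends nothing to call.
        raise pickle.UnpicklingError(f"nocode: global {module}.{name} is forbidden")


def loads_with_nocode(data: bytes, nocode=None):
    """Hypothetical wrapper; nocode=None defers to PYTHONNOCODEPICKLES.

    As in the proposal, loading stays unsafe unless nocode is enabled.
    """
    if nocode is None:
        # Mimics the proposed process-wide switch; CPython does not read it.
        nocode = os.environ.get("PYTHONNOCODEPICKLES") == "1"
    unpickler_cls = NoCodeUnpickler if nocode else pickle.Unpickler
    return unpickler_cls(io.BytesIO(data)).load()
```

The documentation's own example allows a few safe builtins instead of refusing everything; refusing all globals is closer to the data-only semantics proposed here, at the cost of rejecting pickles of any class instance.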
How to Teach This
[How to teach users, new and experienced, how to apply the PEP to their work.]
- pickle.dumps() does not save bytecode; it saves functions and classes as references, i.e. module and qualified-name strings (in protocol 0, the GLOBAL opcode c followed by the module and name on separate lines, e.g. b"cos\nsystem\n").
- pickle.loads() imports the objects named by such references and, with opcodes such as REDUCE, calls them with arguments taken from the pickle; this is how loading a pickle executes code.
- As with pickle.load() of untrusted data, eval() of untrusted code is unsafe (eval(str) also parses and then executes code from a string).
- As referenced in PEP 574 > Related Work [TODO], there are a number of (faster, zero-copy, portable) data serialization/deserialization formats that should be considered before choosing pickle for text and/or binary data storage without code execution: JSON and YAML (JSON is a subset of YAML), TOML, pyarrow and Parquet, dask.distributed's task serialization, and lancedb/lance.
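Because a pickle names what it will import and call, the standard pickletools module can list those opcodes without loading anything, which may help when teaching how code gets into a pickle. A minimal sketch: code_opcodes is a hypothetical helper, and CODE_OPCODES is an assumed, conservative set that also flags BUILD and the persistent-ID opcodes, since those can call __setstate__() or persistent_load() during loading.

```python
import pickle
import pickletools

# Assumed, conservative set of opcodes that import a global or may call
# something (a callable, __setstate__(), or persistent_load()) on load.
CODE_OPCODES = frozenset({
    "GLOBAL", "STACK_GLOBAL", "INST", "OBJ", "REDUCE", "NEWOBJ",
    "NEWOBJ_EX", "BUILD", "EXT1", "EXT2", "EXT4", "PERSID", "BINPERSID",
})


def code_opcodes(data: bytes) -> list:
    """Return (position, opcode name, argument) for each code-related opcode.

    pickletools.genops() only decodes the stream; nothing is imported or run.
    """
    return [
        (pos, opcode.name, arg)
        for opcode, arg, pos in pickletools.genops(data)
        if opcode.name in CODE_OPCODES
    ]


# Protocol 0 stream that would call os.system('echo shouldfail') if loaded.
malicious = b"cos\nsystem\n(S'echo shouldfail'\ntR."
for pos, name, arg in code_opcodes(malicious):
    print(pos, name, arg)

# A plain-data pickle contains none of these opcodes.
print(code_opcodes(pickle.dumps({"a": [1, 2]}, protocol=5)))
```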
Reference Implementation
[Link to any existing implementation and details about its state, e.g. proof-of-concept.]
Rejected Ideas
[Why certain ideas that were brought while discussing this PEP were not ultimately pursued.]
Open Issues
- Security Implications
[Any points that are still being decided/discussed.]
References
- “PEP 574 – Pickle protocol 5 with out-of-band data” (2018; Python 3.8), peps.python.org
- “PEP 3154 – Pickle protocol version 4” (2011), peps.python.org
- “PEP 307 – Extensions to the pickle protocol” (2003), peps.python.org
- CPython docs > pickle module: Python object serialization (Python 3.12.2 documentation)
Copyright
This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.