This way makes sense to me, today:
- use random access to read WHEEL and RECORD, and prepare for hash checking (see the first sketch after this list)
- generate a series of (path inside the wheel, file-like to get the data) pairs; include the ZipInfo for necessary metadata like +x bits and “is dir”
- automatically check wheel integrity/consistency here (hook on the readable stream for each archive member; raise an error on .close() if the hash doesn’t match)
- split each path into {package}.data/category + /rest/of/path, or ‘root of archive’ + /rest/of/path for files not in the data directory (second sketch below)
- map from {package}.data/category (or ‘’) to a category name, one of PURELIB, PLATLIB, SCRIPTS, …; at this stage we can no longer tell the difference between files at ‘’ and {package}.data/purelib if Root-Is-Purelib
- map from category name to installation target directory
- join the target directory with /rest/of/path
- stream file contents to disk (third sketch below)
- rewrite legacy scripts etc.
- write RECORD
- build pycs? the ‘smart enough to uninstall’ step just means that any files you generate as a result of installing the wheel also go into RECORD
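A minimal sketch of the first three steps, using only the standard-library zipfile, csv, hashlib, base64 and email modules; the helper names (`read_record`, `root_is_purelib`, `verified_members`, `HashCheckingFile`) and the `dist_info` argument are illustrative, not an existing API:

```python
import base64
import csv
import email
import hashlib
import io
import posixpath
import zipfile


def root_is_purelib(zf: zipfile.ZipFile, dist_info: str) -> bool:
    """Read Root-Is-Purelib from the WHEEL metadata file."""
    with zf.open(posixpath.join(dist_info, "WHEEL")) as f:
        wheel_metadata = email.message_from_binary_file(f)
    return wheel_metadata.get("Root-Is-Purelib", "false").lower() == "true"


def read_record(zf: zipfile.ZipFile, dist_info: str) -> dict:
    """Parse RECORD into {archive path: (hash algorithm, expected digest)}."""
    expected = {}
    with zf.open(posixpath.join(dist_info, "RECORD")) as f:
        for row in csv.reader(io.TextIOWrapper(f, "utf-8")):
            if not row:
                continue
            path, hash_spec, _size = row
            if hash_spec:
                algorithm, _, digest = hash_spec.partition("=")
                expected[path] = (algorithm, digest)
    return expected


class HashCheckingFile:
    """File-like wrapper that raises on close() if the member's hash mismatches."""

    def __init__(self, raw, algorithm, expected_digest, name):
        self._raw = raw
        self._hasher = hashlib.new(algorithm)
        self._expected = expected_digest
        self._name = name

    def read(self, size=-1):
        data = self._raw.read(size)
        self._hasher.update(data)
        return data

    def close(self):
        self._raw.close()
        # RECORD stores urlsafe base64 digests without padding
        actual = base64.urlsafe_b64encode(self._hasher.digest()).rstrip(b"=").decode()
        if actual != self._expected:
            raise ValueError(f"hash mismatch for {self._name}")


def verified_members(zf: zipfile.ZipFile, expected: dict):
    """Yield (ZipInfo, file-like) pairs with hash checking hooked onto each stream."""
    for info in zf.infolist():
        raw = zf.open(info)
        if info.filename in expected:
            algorithm, digest = expected[info.filename]
            yield info, HashCheckingFile(raw, algorithm, digest, info.filename)
        else:
            # RECORD itself and directory entries carry no hash
            yield info, raw
```

Because the check piggybacks on the streamed reads, the integrity step costs no extra pass over the archive.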
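And a sketch of the path-mapping steps (split, categorize, map to a target directory, join), assuming sysconfig scheme paths; `destination_for` and its arguments are made-up names, and the headers category is simplified (a real installer points it at a per-distribution include directory):

```python
import os
import sysconfig

# Category -> sysconfig scheme key. "headers" is simplified here.
CATEGORY_TO_SCHEME_KEY = {
    "purelib": "purelib",
    "platlib": "platlib",
    "scripts": "scripts",
    "headers": "include",
    "data": "data",
}


def destination_for(archive_path: str, distribution: str, purelib_root: bool) -> str:
    """Map a path inside the wheel to the path it should be installed to."""
    data_prefix = f"{distribution}.data/"
    if archive_path.startswith(data_prefix):
        # {distribution}.data/<category>/rest/of/path
        category, _, rest = archive_path[len(data_prefix):].partition("/")
    else:
        # Files at the root of the archive go to purelib or platlib depending
        # on Root-Is-Purelib; after this point the two cases look identical.
        category = "purelib" if purelib_root else "platlib"
        rest = archive_path

    target_dir = sysconfig.get_paths()[CATEGORY_TO_SCHEME_KEY[category]]
    return os.path.join(target_dir, *rest.split("/"))
```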
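Finally, a sketch of streaming to disk and writing the new RECORD, assuming the helpers above; `install_member` returns the RECORD row for each file, and rows for anything generated afterwards (rewritten scripts, compiled pycs) get appended before writing:

```python
import base64
import csv
import hashlib
import os


def install_member(stream, target_path: str, executable: bool = False):
    """Stream one archive member to disk; return its (path, hash, size) RECORD row."""
    os.makedirs(os.path.dirname(target_path), exist_ok=True)
    hasher = hashlib.sha256()
    size = 0
    with open(target_path, "wb") as out:
        while chunk := stream.read(64 * 1024):
            hasher.update(chunk)
            out.write(chunk)
            size += len(chunk)
    stream.close()  # the hash-checking wrapper raises here on a mismatch
    if executable:
        os.chmod(target_path, os.stat(target_path).st_mode | 0o111)
    digest = base64.urlsafe_b64encode(hasher.digest()).rstrip(b"=").decode()
    return target_path, f"sha256={digest}", str(size)


def write_record(record_path: str, rows) -> None:
    """Write RECORD; rows must include every generated file (e.g. pycs),
    not just archive members, so that uninstallation can remove everything."""
    with open(record_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerows(rows)
        writer.writerow([record_path, "", ""])  # RECORD itself carries no hash
```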
If steps can be combined or optimized away, then that should happen. If it is streaming, the installer should be prepared to roll back after an error, say, if the last file doesn’t match its hash.
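A rollback for the streaming case can be as simple as remembering what has been written so far and deleting it when any step raises; the context-manager shape here is illustrative:

```python
import contextlib
import os


@contextlib.contextmanager
def rollback_on_error():
    """Delete everything installed so far if an error (e.g. a hash mismatch) occurs."""
    installed = []
    try:
        yield installed  # the install loop appends each written path to this list
    except BaseException:
        for path in reversed(installed):
            with contextlib.suppress(OSError):
                os.remove(path)
        raise
```

The install loop runs inside `with rollback_on_error() as installed:` and appends each target path right after writing it, so a failure at the very last file still leaves the environment clean.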
We want to change step #2 to improve compression, so it would be helpful for that step to be independent.