We’ve recently received a bug report (#120522) that raises an interesting philosophical question for CPython core development: How far are we willing to go to accommodate App Store review processes?
In this case, the issue is that Apple’s macOS App Store is auto-rejecting any app that has the string itms-services in it. This is the custom URL prefix used for requesting an app installation from the iTunes App Store; however, sandboxed apps are prohibited from using these URLs. Apple’s automagical review processes are tripping on the code in urllib’s parser that handles these URLs, even if the app in question never uses an itms-services:// URL. It’s present in the standard library; therefore the app is rejected.
Some light obfuscation of the magic string appears to avoid the issue. However, this isn’t guaranteed to be a fix forever, and could lead to an obfuscation arms race; and there aren’t any guarantees that this will be the only app validation problem we’ll need to resolve.
Although it’s a macOS app triggering the issue right now, similar App Store auto-review processes exist for iOS, Android and Microsoft. Apple’s reviewing is definitely the most… paranoid and inscrutable… of the lot - but they all have validation and acceptance processes that are entirely opaque.
The question is: what approach should CPython (as a project) take to this class of issue? Some options:
1. Consider “acceptable to app stores” to be a design goal of CPython, and incorporate any patches necessary to meet that requirement. This means users won’t need to do any special patching to make CPython “App Store compatible”; but it does mean some occasionally ugly obfuscation code will be merged. If, at some point, the rules change, more patches may be needed; and if an old rule is removed, we won’t have any good signal that one particular piece of obfuscation isn’t required any longer.
2. Consider this to be a distribution problem. CPython is what it is; tools that generate bundled apps (like Briefcase, Py2app, Buildozer, and others) are then responsible for patching CPython to make it acceptable to App Stores. In the case of Briefcase and Buildozer, these tools also patch and build custom CPython libraries. Historically, this has been because CPython didn’t support iOS and Android out of the box; 3.13 sources now work without patching… but if we consider this a distribution problem, patching will essentially be an ongoing requirement.
(2) also means that distributed Pythons aren’t “officially” Python, because they’ve been modified for distribution; I don’t know how much we want to consider that a security or brand risk. I guess there’s an analog here in the patches Linux distributions apply to Python for distribution, so maybe that isn’t a concern.
In the case of (2), there’s an additional question of whether CPython should document what it knows about any App Store restrictions.
So: how should CPython address this kind of obfuscation?
Just so I understand correctly: the issue here is that the binary artifacts submitted for review contain the sequence of bytes \x69\x74\x6d\x73\x2d\x73\x65\x72\x76\x69\x63\x65\x73? Or are the .py / .pyc files being parsed correctly and the string "itms-services" is being found?
If it’s the former, that seems… awful. Would we be expected to change our bytecode format (or worse, our compiled C code) if it started resembling banned strings?
I’d like to suggest a third direction. This is inspired by our experience on pyca/cryptography where we often get bug reports saying “you refuse to parse this certificate, which while technically invalid was issued by [some widely used device or CA]”. And the answer we’ve come up with is:
In general, we will accept PRs that work around these kinds of issues, provided they are small, localized, and generally aren’t too awful. BUT, before we’ll merge the PR, someone needs to complain to the third party and make sure they are aware of the issue and have given some indication that they’ll do something about it. And any workaround we accept will be time limited in some way (i.e., we’ll remove the workaround in a few releases).
This tries to preserve a balance between giving users a decent OOTB experience, while also not letting large firms simply externalize their bizarre issues onto OSS projects.
FWIW, the closest equivalent we face on the Windows app store is a special permission to allow our default app to be “headless” (that is, python.exe is not a GUI app). It just requires an email to be sent to support whenever a new version is published, but they’re very good at just flipping the switch, especially when we point out all the previous versions they flipped it for. Other than that, nothing particularly gets scanned beyond blatant malware.
But generally, I’m okay with us making changes or having “support” for this kind of stuff. In this case, patching out the code for an iOS/app distro is fine. We’ve done it before (some parts of distutils were patched out in some formats way before we deprecated the whole thing), so I don’t think it makes it any less “official”, but then, I think the whole concept of “official” is overrated and a net-negative in OSS.
If having an upstream-endorsed method to omit parts of our code to make Python usable in more places is what’s needed, then I’m all for it.
The submission process is entirely inscrutable; but as far as it is possible to work out, it’s a literal substring match on the py/pyc file. And yes, it is awful, in every sense of the word.
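For illustration, here is a minimal sketch of the kind of literal substring check described above, written as something a distributor could run before submitting. It is not Apple's actual scanner, whose behavior is unknown; the function name `find_matches` and the choice of file suffixes are assumptions made for this example.

```python
from pathlib import Path

# The string named in the rejection notice.
NEEDLE = b"itms-services"


def find_matches(root, needle=NEEDLE, suffixes=(".py", ".pyc")):
    """Return the sorted paths under `root` whose raw bytes contain `needle`.

    This only mimics a plain substring match on source and bytecode files;
    it makes no attempt to parse or understand the files.
    """
    hits = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            if needle in path.read_bytes():
                hits.append(path)
    return sorted(hits)
```

Running `find_matches()` over an unpacked app bundle (the bundle path is whatever your packaging tool produces) lists every file that would need attention if the reviewer really is doing a byte-level match.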
Maybe. Maybe not. Ask again later.
That definitely sounds appealing as an approach - but in this case, it’s going to be screaming into the void. There’s barely even an appeals process for app rejection on Apple’s App Store. We definitely don’t have any sort of channel to raise a complaint that we could reasonably believe would result in a change of policy.
Also, is obfuscating the string “allowed”? Or would this be (rightfully) seen as circumventing the review process and punished somehow?
This is a good question. In this case, the actual reason given for rejection was:
Guideline 2.5.2 - Performance - Software Requirements
The app installed or launched executable code. Specifically, the app uses the itms-services URL scheme to install an app.
Clearly Python is not doing anything of the sort; it simply understands a bit about parsing those URLs. I would argue this is a false positive and not some sort of malicious compliance. I would hope Apple would agree if they took a close look, but who knows.
Also, in cases like this, just removing that string completely from Apple distributions works if the obfuscation route feels dirty. Platforms like Android or iOS certainly have swaths of functionality unavailable and this would just be one more very minor thing. Perhaps that is preferable to adding ugly obfuscation for everyone? I guess that’s what we’re discussing here.
It is best to assume that Apple doesn’t give a hoot about our, or anyone’s, project and will arbitrarily change things and reject things on a whim for never-thought-through reasons. Anything done today may be pointless tomorrow and we will never be informed or have any automated way to find out.
It’s a time wasting game of whack a mole. We can obfuscate something, just don’t assume it’ll solve anything long term. This is why I favor separating things like this into a build time transmogrification/stripping step for iOS to elide whatever bits have ever been identified as pissing fruit-reviewer-robots off when people do builds if we ever wind up with multiple things like this. Obfuscation living within the code itself just adds tech debt for everyone else.
Hoping Apple would take a close look at something is hope as a strategy. That’d require human involvement to do manual review on each and every app submission by any user. Which costs them money. Or human involvement to improve their detection logic, which also costs them money. Why would they bother - they have no incentive to care.
While this doesn’t answer the problem in general, unless I’m mistaken the said string occurs only in tests. In this particular case, wouldn’t a better solution be for a redistributor not to ship tests? It should also reduce the resulting application size.
Reading the original issue that added itms-services support to url splitting and joining, it does make me wonder if it would be worth changing urllib to read the initial setup of its module level attributes from a JSON config file bundled with the standard library rather than hardcoding its knowledge of all the relevant schemes.
Then the bundled app generators could just drop itms-services from that config file rather than having to patch urllib.py directly.
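As a rough sketch of the config-file idea, assuming a hypothetical JSON layout that maps each module-level list name to its schemes (urllib does not currently read any such file, and the names `load_scheme_lists` and `drop_scheme` are invented for this example):

```python
import json

# Hypothetical config contents. In CPython today these lists are plain
# Python literals inside urllib/parse.py, not JSON.
EXAMPLE_CONFIG = """
{
  "uses_netloc": ["", "ftp", "http", "https", "itms-services"],
  "uses_params": ["", "ftp", "https", "http"]
}
"""


def load_scheme_lists(text):
    """Parse the JSON config into a dict of list name -> list of schemes."""
    return {name: list(schemes) for name, schemes in json.loads(text).items()}


def drop_scheme(text, scheme):
    """Return new JSON text with `scheme` removed from every list.

    This is the step a bundled app generator would run instead of
    patching the Python source directly.
    """
    config = load_scheme_lists(text)
    cleaned = {
        name: [s for s in schemes if s != scheme]
        for name, schemes in config.items()
    }
    return json.dumps(cleaned, indent=2)
```

Because JSON escapes nothing in a plain ASCII scheme name, the output of `drop_scheme(EXAMPLE_CONFIG, "itms-services")` no longer contains the string at all, which is the property a substring scanner cares about.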
As a more general case though, I think it does make sense to view this as equivalent to Linux distro patching (even with the URL schema config file idea, the bundled app generators would still need to modify the config file).
I can see how that would work; but it strikes me as a bit of overkill for an edge case that would be just as easy to handle with obfuscation or distribution-level patching. Are there any other analogs of this approach in the existing codebase? I can’t think of any.
For urllib.parse, however, I don’t think we should make a registry of special handling more convenient to do with a config file rather than code. URIs have had standard structure for many years, so registering schemes and special parsing is not needed anymore. There have been previous discussions; from memory, I think the general mood was to not break expectations around urllib.parse, and maybe have a new module for fully generic URI handling.
Neither can I. The pattern just struck me when looking at the code, since I’ve encountered other similar (non-CPython) situations where the temptation to embed data directly into module code proved really annoying when other systems later needed access to the info.
For this specific case, letting app store distribution tools make the change feels like the best readily available option, and that wouldn’t get notably simpler with a config file anyway.
Another approach just occurred to me: rather than obfuscating the source, which could potentially be considered by Apple to be “an attempt to circumvent a legitimate security review process” (cough), we add a build-time option that removes the code we know to be problematic.
The Mac folder would contain a diff that describes the changes that need to be applied to the source tree (in this case, removing support for the itms-services URL scheme from parse.py; but it could be extended if needed).
configure would then gain a --with-app-store-patch option. This would be disabled by default on most platforms (including macOS), but enabled on iOS. If enabled, it would apply the patch before building the standard library. The option could also accept a file (i.e., --with-app-store-patch=path/to/patch), so if App Store rules change at some time in the future after the maintenance window for a particular Python release has closed, there’s still a supported option for distributors to provide an updated patch.
Yes - this essentially reproduces something that could be trivially reproduced by distributors - but it has the advantage that CPython as a project can provide an official list of changes known to be required for App Store compliance.
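To make the effect of such a build step concrete, here is a simplified sketch of what a distributor-side equivalent might do. The proposal above uses a diff applied by configure; this example instead uses a regular expression, and both the function name `strip_scheme` and the sample source text are assumptions for illustration rather than the real contents of parse.py.

```python
import re


def strip_scheme(source, scheme):
    """Remove a quoted scheme entry from list literals in `source`.

    Handles an entry preceded by a comma (middle or end of a list) and an
    entry followed by a comma (start of a list). It is deliberately naive
    and would need review before being pointed at real source files.
    """
    quoted = r"""(['"])""" + re.escape(scheme) + r"\1"
    # Entry in the middle or at the end: drop the comma before it.
    source = re.sub(r",\s*" + quoted, "", source)
    # Entry at the start: drop the comma after it.
    source = re.sub(quoted + r"\s*,\s*", "", source)
    return source


# Sample text that only resembles the shape of the list in question.
SAMPLE = (
    "uses_netloc = ['', 'ftp', 'http',\n"
    "               'sftp', 'itms-services']\n"
)

patched = strip_scheme(SAMPLE, "itms-services")
```

A maintained diff has the advantage that it fails loudly when the upstream source changes shape, whereas a pattern like this one can silently stop matching, which is one reason to prefer the patch-file approach plus a CI check.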
As mentioned on the GH issue, IMHO we should either implement a workaround ourselves, or document how anyone targeting the App Store needs to adjust Python to pass the requirements (as far as we know). A patch file with a configure option would be better than a documentation file.
That said, I’ll almost certainly add code to py2app to apply such a patch when bundling an app for distribution because a goal of py2app is to work with whatever install of Python is on the user’s machine.
One thing I haven’t looked into for this particular issue is whether there is a way to reimplement parse.py in such a way that mentioning the problematic string is no longer necessary. That said, it is unlikely that this is possible due to uses_netloc being effectively public API.
I also had a Briefcase itms-services rejection by the iOS App Store and just found this thread.
Regardless of your decision, I’d like to pass my appreciation to the Core Development team. Russell thoughtfully presented the issue and it’s impressive to follow the dialog as you carefully consider your options.
This is yet another reason why I love the Python community.
This is basically what I had in mind. In a sense, we are becoming a distributor for the iOS use case. That seems appropriate. Sure, there will be other distributors as well, but given this is a common need among all of them, maintaining the patches to apply as part of the config/build process when targeting the platform within our project tree makes sense.
I expect it is unlikely to be a large set of patches. Maintaining them won’t be much of a burden so long as they are small and we have a CI test that ensures they still apply cleanly.
Does that mean it’s time to paint a config option bikeshed?
--with-app-store-patch feels both a bit broad (not scoped to iOS), but also a bit narrow (“patch” specifies a mechanism rather than an intent).
Would --with-ios-app-store-compliance work? That way it would cover anything that was found necessary to produce runtime binaries that pass iOS app store compliance checks, whether those are patches, other build options, or anything else that comes up.
I did spend a little bit of time looking at that, but didn’t see anything obvious that wasn’t a variant on obfuscating the text string in a way that would keep it in separate pieces in the pyc file (e.g. writing it as "-".join(["itms", "services"]) rather than as a regular string literal, although I’m not sure even that would be enough to confuse the AST optimiser these days).
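One way to check which spellings survive compilation is to look at the marshalled code object, which is roughly the payload of a .pyc file. The sketch below compares a plain literal, a constant concatenation, and the join spelling; the helper name `compiled_bytes` is made up for this example, and the results depend on what the compiler of a given CPython version chooses to fold.

```python
import marshal

NEEDLE = b"itms-services"


def compiled_bytes(source):
    """Compile `source` and return the marshalled code object as bytes."""
    return marshal.dumps(compile(source, "<example>", "exec"))


literal = compiled_bytes('scheme = "itms-services"')
concat = compiled_bytes('scheme = "itms" + "-services"')
joined = compiled_bytes('scheme = "-".join(["itms", "services"])')

# Report whether the contiguous string is present in each compiled form.
for label, blob in (("literal", literal), ("concat", concat), ("joined", joined)):
    print(label, NEEDLE in blob)
```

On CPython releases that fold constant expressions, the concatenation should end up as a single constant just like the literal, so simple `+` splitting does not help. The join spelling is a method call, which the compiler does not currently fold, so the pieces stay separate in the compiled output.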