PyPI account recovery process triaging on halt

tarekziade · January 25, 2024, 8:01am

Here’s an example of a very simple improvement for triaging 2FA, see the issue below:

Ee looks at the database to see if the user has a valid recovery code, finds one, and replies to the user
The user answers the same day that they don’t really know what that means
Ee explains
The user says they don’t have the code, and asks what to do

And that’s it for this cycle. And this was “fast” because the two had their exchange in under a week. There are other scenarios where it would be slower.

Hopefully Ee will look at it again and help the user a few months later.

What I think could be improved:

=> change the template for those issues with a check box “I don’t have my recovery code anymore”

That would speed up the process. This user would probably have regained their access in September.

Am I missing something obvious? what is the attack vector here @dstufft – please educate me

Cheers
Tarek

tarekziade · January 25, 2024, 8:07am

“significant” is too vague. It is subject to interpretation. When I add an issue in an open source project and have no answer after 4 months I assume the project is dead.

Maybe you could have a link there explaining exactly what is happening and what the expectation is, so the user won’t hold their breath if they need to push a package sooner than 4 months from now.

tarekziade · January 25, 2024, 8:25am

I don’t think the account recovery process fits your description here. See my latest message about the back and forth PyPI admins are doing just to ask the user “have you tried your recovery code?”

tarekziade · January 25, 2024, 8:42am

Well, I hope the Zephyr one will be solved quickly. At this point I don’t think the PyPI admins will want to delegate any task. Reading through the answers, the process is considered perfect, and if we complain it’s not nice because they struggle, and the only solution is to hire someone for those boring tasks. That’s one strategy: wait for a hire and don’t delegate.

But why would they trust that hire more than a member of the community? What is behind that “higher trust” required?

Because it requires months if not years of training and commitment to be able to triage bugs?
Because of the contract? Would the person be legally responsible in case of a malicious act?

I think any core dev would be capable of triaging those bugs, even if it means signing another paper.

I don’t trust @dstufft more than any other core dev to do this work.

pitrou · January 25, 2024, 9:29am

So it looks like the PSF should hire/contract someone for this? That would help solve the legal issues, assuming they do exist, while helping expedite the process of addressing tedious user requests.

dstufft · January 25, 2024, 2:14pm

The wording of those forms, as well as the questions that are asked, are controlled by files in the pypi/support repo, under the .github/ISSUE_TEMPLATE directory. If you have an improvement that you want to propose to one of them, you can make a pull request. Like all things OSS, if you’re proposing a radical change you may want to discuss it first, but if you’re just trying to clarify or streamline things then an individual PR should be fine as well.

Well, like Ee mentioned, there are several axes of trust here.

(1) is “Do I trust this person not to do something malicious”, and yea I think any of the core devs could be trusted not to do something actively malicious. A hired person has a legal obligation that helps ensure this one as well.
(2) is “Do I trust this person to understand the implications of the actions they are taking”. I wouldn’t trust any random core dev (or any random person in general, no matter how much I personally trusted them) because there’s a lot of random stuff that is non-obvious (like needing to use the private email address vs the public).
(3) is “Do I trust that this person is going to consistently help with these issues”. Temporary bandaid fixes are sort of a double-edged sword for us. Yes they fix the immediate backlog, but if it’s a temporary fix and not a sustainable long-term fix, then the backlog is just going to build back up again. However, in the interim we lose a powerful point of justification for getting an actual long-term, sustainable solution in place. Being able to point at the 4+ month backlog and be like “look, this here is the problem you can solve” is a much more compelling justification than “well we had a backlog, but then someone came along and fixed it for now”.

Please note, I’m purposefully being somewhat vague here, because I don’t think I’ve personally handled a single one of these issues on the support tracker (for various reasons, mostly related to time), so I wouldn’t even trust myself to answer authoritatively on specifics of what would or wouldn’t help, nor would I trust myself to jump in and handle these requests without getting a bit of training/brain dump from the folks who have been handling them. I’m just trying to answer somewhat generally, based on my experience with the parts of PyPI I do contribute to and what I’ve been told (either publicly or not) by the various people who are handling these things.

tarekziade · January 25, 2024, 5:19pm

Donald Stufft:

(2) is “Do I trust this person to understand the implications of the actions they are taking”. I wouldn’t trust any random core dev (or any random person in general, no matter how much I personally trusted them) because there’s a lot of random stuff that is non-obvious (like needing to use the private email address vs the public).
(3) is “Do I trust that this person is going to consistently help with these issues”. Temporary bandaid fixes are sort of a double-edged sword for us. Yes they fix the immediate backlog, but if it’s a temporary fix and not a sustainable long-term fix, then the backlog is just going to build back up again. However, in the interim we lose a powerful point of justification for getting an actual long-term, sustainable solution in place. Being able to point at the 4+ month backlog and be like “look, this here is the problem you can solve” is a much more compelling justification than “well we had a backlog, but then someone came along and fixed it for now”.

For (2) I don’t really understand what you mean by not trusting some core dev to make the right call for some “random” stuff. I would expect everything to be documented, and when a new situation arises, the person gets guidance. This is how delegation works imho.

For (3) your point about using the 4+ month backlog as an argument to justify a PSF hire is the gist of our disagreement and the tension here. It means to me that those developers are your hostages to justify a hire.

I am going to propose a change in the template, and if it’s rejected or ignored, I will be thinking “oh, they don’t want to improve the process because they need that crisis to justify a hire”

CAM-Gerlach · January 25, 2024, 5:24pm

Just for everyone’s reference, I can confirm that per the canonical Devguide Developer Log and verified on the private python/voters repo, Tarek Ziadé with username tarekziade is a former core developer, joining on 2008-12-21, apparently as a maintainer for distutils, and left 2017-02-10 when they did not make the GitHub transition. (Of course, I cannot confirm conclusively that @tarekziade is the same person, as for privacy reasons Discourse will not let me compare their account email with the one stored in the voters repo.)

marc-h38 · January 25, 2024, 6:50pm

Of course there will always be difficult cases. They’re off-topic here; a sheer distraction and probably the main reason why this thread is getting too long. This topic is about the BULK of the huge backlog. About helping with it and optimizing the common cases. Yes, there are privacy and other concerns but there is also a lot of tedious, non-sensitive “grunt” research work that can be done with just public / “open-source” information. There are also ways to remove some bottlenecks and other process improvements for the common cases.

Wow, I really hope you’ll be proven wrong. But I admit rejecting a volunteering offer from a well known contributor is troubling.

To be clear: I agree with this 200%. But I don’t think consciously making it worse to then make it better would be a good idea. I also don’t think volunteers are automatically a “bandaid”. Ideally, every open-source project should have a healthy mix.

EDIT: a fee for “expedited processing” wouldn’t shock me. It would seem fair.

mwichmann · January 25, 2024, 7:41pm

This could easily be covered by the TOU (which is frustratingly light on use of confidential/personally identifiable data or anything at “account” or “project” level, as it only talks about uploaded content)

pitrou · January 26, 2024, 9:11am

It’s only fair if you make the fee small enough for everyone to afford it (and also, if there are enough payment options).

tarekziade · January 26, 2024, 10:33am

Here’s a first couple of changes I am proposing:

The template states that the delay between interactions of the admin and the user is closer to months than weeks.
The user can check a box to notify the admin that they lost their recovery codes. It prevents the round trip where the admin asks the user to try the recovery codes that are in the DB, and then the user says they lost them. That round trip sometimes takes months to finish.
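
For readers unfamiliar with GitHub issue forms, the check box idea could look roughly like the fragment below. This is only a sketch using the standard issue-form `checkboxes` element; the `id`, label, and description wording are hypothetical and are not taken from the actual account-recovery.yml in pypi/support.

```yaml
# Hypothetical fragment of an issue form under .github/ISSUE_TEMPLATE/.
# The field id and all wording are assumptions made for illustration only.
body:
  - type: checkboxes
    id: recovery-codes-status   # hypothetical id
    attributes:
      label: Recovery codes
      description: Tick this box if you cannot use your 2FA recovery codes.
      options:
        - label: "I no longer have my recovery codes"
          required: false   # optional, so users who still have codes can leave it empty
```

With the box left optional, an admin reading a new issue could tell right away whether asking about recovery codes is still worthwhile.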

Thanks

elis.byberi · January 26, 2024, 10:52am

The attack vector is time; be patient.

tarekziade · January 26, 2024, 10:57am

There are systems like Patreon for this, but you would still need to have volunteers who want to do it, and who at the same time have this “high trust” from the PyPI maintainers to be in their elite club.

Which leads me to wonder if improving the process like how it’s described in some issues should not be part of the PSF paid staff mission?

My company gave $170,000 in 2017 for PyPI (Python Software Foundation News: “The PSF awarded $170,000 grant from Mozilla Open Source Program to improve sustainability of PyPI”) and I am glad we did.

I am just wondering what is the actual mission of the paid staff with regards to PyPI today.

Is there a document that explains this? Maybe this is also something where we can weigh in as a community in terms of priorities.

Cheers

tarekziade · January 26, 2024, 10:59am

I don’t understand. I am not a security expert, could you explain in detail what you mean in the issue I have added? (Update account-recovery.yml by tarekziade · Pull Request #3567 · pypi/support · GitHub)

Thanks

pf_moore · January 26, 2024, 11:06am

It’s been made pretty clear, I thought, that at the current time there are no staff paid to work on PyPI - so they don’t have a “mission” as such. And volunteers, by definition, work on whatever they prefer to work on.

(Edit: “there are no staff” - sorry for a typo that completely changed the meaning of my comment!)

steve.dower · January 26, 2024, 11:16am

Keeping the lights on is the first mission (which is the one I don’t envy at all… running a service is brave work). I thought Ee listed the rough set of tasks in this thread already, but I didn’t see it when I just skimmed through, so maybe it was off in a link.

Certainly the long-term plan for PyPI is a bit vague for those of us on the outside. No doubt there’s clarity between those who work closely on it, and probably enough hints scattered throughout the Warehouse issue tracker and probably posts on here, but I would agree that there’s no clear roadmap for the rest of us (I’m not necessarily saying they owe us one, just that I haven’t seen one).

I suspect this is snark, suggesting that the PyPI team is attacking you (Tarek) by trying to wear you down.

As someone who is often mistaken for a security expert, I’d say the defence being applied here is time. Slowing down password resets/etc. is a known way to make many attacks infeasible. Clearly these have been slowed down to the point where it’s impractical and it’s not really a defence (I can be facetious too!), but there’s no attack vector here.

pf_moore · January 26, 2024, 11:29am

There’s no attack vector. But there’s no significant benefit. The checkbox would have saved 1 week, but it wouldn’t have given the user back their access. I don’t see why you think that answering this particular question a bit faster would have made any difference to the followup process (which is still ongoing at this time).

And in the occasional case, the more informative question that a human can ask (“You generated recovery codes on such-and-such date, do you no longer have access to them?”) might have prompted the user to think “Oh, I remember now - yes, I’ve found them!” and as a result improved the situation compared to a simple checkbox. Maybe this isn’t common enough to be worth worrying about, but it’s a consideration.

elis.byberi · January 26, 2024, 10:16pm

No, the use of time as an attack vector is in a metaphorical sense, suggesting that the most effective strategy in the given situation is to wait patiently, rather than taking immediate or forceful action.

The message was intended for Tarek Ziadé in response to his question about the attack vector.

There is clearly no cybersecurity attack vector whatsoever, just a lack of available resources.

I have had a similar occurrence with an ‘old’ email (due to service provider mismanagement), and I still haven’t fully recovered all the accounts, nor do I remember them anymore. I know that it is frustrating; I was just going off-grid.

There are service providers that don’t even care to respond. It is good to know that PSF has a clear path to account recovery.

(I wasn’t able to respond earlier; I was in ‘offline mode.’ I simply forward emails to another email on my phone device, and I don’t have any access to any account at all. I only use communication apps. Keeping my personal email, the authenticator app, and phone number all on a single device doesn’t make me feel good.)

tarekziade · January 31, 2024, 9:13pm

Because it saves a round trip and the PyPI admin can take months between two answers… I’ve explained the scenario in the PR I have opened