Account anonymization not powerful enough?

pitrou · March 13, 2024, 2:34pm

Hello,

I was reading through “Account deletion/permanent deactivation?”, and I noticed that, while the anonymized account is protected from certain actions (you cannot view the account, for example), it is still possible to use the search feature to find the messages posted by the account.

Depending on the account’s past activity, you could therefore definitely link the “anonymized” account to the real person that was behind it. It seems that Discourse should allow more thorough cleaning of an account’s activity, which, admittedly, would also disrupt past discussions.

(for additional context, past posts from that person show that they were a minor)

(and for additional additional context, I don’t know the poster, I was just curious)

jeanas · March 13, 2024, 3:33pm

I think you should post this on https://meta.discourse.org/.

pitrou · March 13, 2024, 5:05pm

Ok, I did. Not posting a link, because I don’t know whether Discourse creates backlinks between instances, and it doesn’t seem productive to give more direct publicity to this.

pitrou · March 13, 2024, 9:17pm

Just a heads up that I received this potentially useful reply, in case a moderator wants to take a look:

There is already an option to wipe an account completely. You just need to delete all posts then delete the account.

CAM-Gerlach · March 13, 2024, 10:47pm

For reference, this was the thread @pitrou posted on Discourse Meta, which had a number of useful responses:

The account merge feature looks like a useful and mostly non-destructive way to resolve this problem, but I’m not seeing an option for it, perhaps because I’m only a mod and not an admin.

In any case, I’ve gone ahead and removed the relevant bits of personal information (name, age, country, etc) from the user’s posts, adding a note stating this was done by me per the user’s request, and removing all previous revisions from the revision history. I also flagged the rest of the mod team to take a look at this.

We actually had a fairly extensive discussion about this last time a user requested account deletion, who in that case also included a signature in their posts:

Per that discussion, last time this came up, the conclusion was that forum posts are not generally required to be deleted by the GDPR, and personal information that the user voluntarily chooses to include in their own posts (as opposed to that collected “by the system”, like profiles) appears to be somewhat of a gray area. However, if a user requested that it be removed, then we of course would not object and would be happy to scrub it for them, and particularly in this case, given the amount of personal detail provided, it seemed quite prudent.

Additionally, I would be in favor of a policy of no post signatures (which are generally very rare here anyway, though ironically both the users who recently requested account deletion had them), like Discourse Meta has, as they are just redundant clutter that takes up space and presents most of the problem in this regard.

As for

Yup, I am indeed aware of that option, thanks, though only posts younger than 60 days (of which those posts were not) can be deleted by default without admins (of which I am not) manually overriding the setting and then performing the action. And that’s a very drastic, destructive option to delete a relatively small fraction of personal data that should only be taken as a last resort. There was a general community consensus against allowing deleting posts, particularly OPs and at large scale, aside from exceptional circumstances, per e.g. this thread:

As well as on other threads, where retaining messages as part of a reasonably permanent archive outside of exceptional situations just like the legacy mailing lists Discourse has replaced was part of the agreed social contract of switching to it.

pitrou · March 13, 2024, 11:35pm

Thanks for the swift handling of this @CAM-Gerlach !

This seems to concern deletion of an individual post (and, annoyingly, the entire discussion thread that follows) while keeping an active presence on the forum.

In the present case, the user wanted to disappear entirely from the site, which is a reasonable request - particularly from someone who gave out tidbits of their personal info, including the fact that they were < 18 - and can be motivated by very legitimate concerns.

CAM-Gerlach · March 14, 2024, 1:29am

Right, though to my reading the underlying concerns mainly centered on the effect of those actions on the flow and archival of discussions, rather than on the particular individual user account. As such, they still mostly apply here, just on a potentially far larger and more disruptive scale (for an account with many threads and posts).

And as mentioned, the previous thread I linked discussed this very same scenario (a user who requested their account and personal data be removed, but had voluntarily included some amount of potentially identifying information in their posts in the form of a redundant signature line), in which the conclusion was that deletion of their posts for GDPR purposes was not a necessary or desirable outcome (outside of extraordinary circumstances).

@malemburg as you provided a fairly authoritative response there based on your expertise, you might want to chime in here as well.

Right, but I scrubbed all such potentially personal information from their posts, as well as the old revisions in the history, after you flagged it, so I’m not sure there is still an issue? And they did agree that anonymizing their account was acceptable after I stated that’s what my action of removing their account would do; if they had requested deletion of specific (or in general) personally identifiable information from their past posts at that time, I would not have hesitated to do so.

alicederyn · March 14, 2024, 7:26am

Did you use the word “anonymizing”? Because if someone said that to me, I probably wouldn’t anticipate PII being left behind, lacking a technical understanding of Discourse admin features that might have suggested otherwise.

I’ll note the original blog post (!) that is being relied on here stated (according to Google translate) “Names or email addresses, but also IP addresses and cookies with unique numbers are personal data. If that information is outdated or irrelevant, it must be removed upon request.” That does not mean you need to remove all posts, but it does seem like a reasonable interpretation that you need to edit those posts (as you have done) to remove any PII. The blog post does not contradict this interpretation.

All to say, the full set of steps you have taken should probably be documented somewhere as best practice for the next time this comes up.

malemburg · March 14, 2024, 9:19am

As mentioned here, the GDPR gives people the right to have their PII removed from a system.

They can request removal of all PII data from a system, which is very time consuming for admins and may very well have significant effects on the system itself, but it’s also possible and, given that this forum is run by volunteers, very reasonable, to only request anonymizing their posts and account, so that the published information cannot easily be linked back to them.

When anonymizing the content, there will still be traces of PII in the system, e.g. IP addresses in logs, backups, etc. But to most of the world, it’ll require a fair amount of work to link back the content to an individual.

I would hope that everyone participating in this forum understands the efforts it takes to handle these cases, and agrees with anonymization rather than complete removal.

Perhaps it would make sense to add some text related to this to the forum terms and conditions. It is not possible to override the GDPR, but we can at least set the expectations right, so that people know what to reasonably expect from the fine people running the forum before sending content.

pitrou · March 14, 2024, 9:46am

I’m not the person doing the work, so this is only a casual POV from me, but it seems that complete removal would both be much easier and a better guarantee of eliminating all stored personal data, than anonymization, so I’m a bit surprised by this statement.

Or perhaps I misunderstand what you mean with “complete removal”.

alicederyn · March 14, 2024, 9:59am

I think the issue here is whether hitting “anonymize profile” as an admin meets this stated goal of anonymization.

malemburg · March 14, 2024, 10:19am

Complete removal means removal of all PII from:

the system database or repository
any logging systems
any hot/cold backup systems, which may hold local data
any backups, including possibly ones which are years old
any downstream systems we maintain, which receive data from the main system

In addition, we’d have to inform known downstream system operators of the removal request, i.e. this can become a recursive request in some cases.

The above probably isn’t even complete. In any case, this is a lot of work.

Note that IP addresses are considered PII in many jurisdictions, which creates a large part of the work. It’s wise for system operators to anonymize those IP addresses early to simplify things and at the same time enhance privacy.
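As a minimal sketch of what anonymizing IP addresses early could look like, the following Python function zeroes the host part of an address before it is written to a log. The function name `anonymize_ip` and the prefix lengths (/24 for IPv4, /48 for IPv6) are assumptions chosen for illustration, not something Discourse or this forum is known to use; whether truncation alone is sufficient is a separate question.

```python
import ipaddress


def anonymize_ip(address: str, v4_prefix: int = 24, v6_prefix: int = 48) -> str:
    """Zero the host bits of an IP address so it no longer identifies one machine.

    The prefix lengths are illustrative assumptions, not a legal standard.
    """
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False allows building the network from a host address;
    # the bits beyond the prefix are masked off.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


# Documentation-range addresses, used purely as examples.
print(anonymize_ip("203.0.113.57"))                  # 203.0.113.0
print(anonymize_ip("2001:db8:abcd:12:34:56:78:9a"))  # 2001:db8:abcd::
```

Applying something like this at the point where log lines are produced means that logs and backups never contain the full address in the first place, which removes one of the items from the list above.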

CAM-Gerlach · March 14, 2024, 4:53pm

Thanks for the detailed response! To try to answer the key question posed here: in the specific case discussed here, where the user voluntarily adds information that could be considered PII to their own free-text posts (such as introducing themselves, or a signature, which we could formally discourage), we need to go through all of their posts and irrevocably scrub that information (as I have done), balancing the mandate in Article 17(1) with the limitations in Article 17(3):

Paragraphs 1 and 2 shall not apply to the extent that processing is necessary:

for exercising the right of freedom of expression and information;

…

…

for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes in accordance with Article 89(1) in so far as the right referred to in paragraph 1 is likely to render impossible or seriously impair the achievement of the objectives of that processing;

I certainly don’t mind doing so if the user requests it, particularly if they identify specific PII or specific posts they would like scrubbed, but it would be helpful to know if we are actually required in all cases of such requests (at least for EU citizens), and inform whether we should establish a policy disallowing “signatures” and such (as Discourse Meta has).

Well, it would require an admin to override the maximum 60 day window in which posts of deleted accounts can be automatically deleted. But the bigger concern here is that, relative to surgical edits, it would be a very drastic, destructive action that would also remove a tremendous amount of false positive non-PII for the possibility of a small fraction of PII, and a serious risk to the integrity of the historical record of most of the canonical discussions behind the recent, present and future decisions on the Python language and ecosystem, which runs contrary to the intent of GDPR Article 17(3), particularly points (a) and (d) quoted above.
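To illustrate the "surgical edit" approach, here is a rough Python sketch that flags posts likely to contain self-disclosed personal details so that a moderator can review and scrub them by hand. Everything in it is hypothetical: the `PII_PATTERNS` table, the `flag_posts` helper, and the plain dict of post IDs to text stand in for whatever export of a user's posts is available, and the patterns will miss plenty, so this only narrows down where to look.

```python
import re

# Hypothetical patterns for illustration only; a human still has to review every post.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "age": re.compile(r"\b(?:I am|I'm|I’m)\s+\d{1,2}\s*years old\b", re.IGNORECASE),
    # A line consisting of "--" (optionally followed by one space) often starts a signature.
    "signature": re.compile(r"^-- ?$", re.MULTILINE),
}


def flag_posts(posts: dict[int, str]) -> dict[int, list[str]]:
    """Return {post_id: [names of matched patterns]} for posts that need manual review."""
    flagged = {}
    for post_id, text in posts.items():
        hits = [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]
        if hits:
            flagged[post_id] = hits
    return flagged


# Made-up example posts.
sample_posts = {
    101: "Hi, I'm 15 years old and new to Python.",
    102: "Use a list comprehension here.",
    103: "Thanks!\n--\nJane Doe <jane@example.org>",
}
print(flag_posts(sample_posts))
```

The script deliberately does not edit anything itself: the removal, the note on the post, and the purge of old revisions stay manual moderator actions.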

pitrou · March 14, 2024, 5:16pm

Note these paragraphs outline exceptions to the rule, and as such you might expect that they should probably be interpreted parsimoniously, and may have to be justified on a case-by-case basis.

In other words: if a certain Guido v. R., abiding by Dutch and EU law, wants us to remove any trace of his past activity in the community, we can potentially brandish the “public interest, scientific or historical research purposes” exception against that. But if a regular user whose main activity was asking questions in the Python Help category asks us the same thing, it’s going to be more difficult to argue for an exception.

(disclaimer: not a lawyer)

More generally, it helps to realize that the GDPR is not an act of bureaucratic lunacy; it’s the outcome of decades of growing awareness of those issues among the European public. Saying “we don’t want to delete those 20 posts of yours because we believe they’re of public interest for historical research” is probably not going to be very positively looked upon.

alicederyn · March 14, 2024, 5:18pm

I think someone needs to pay a lawyer, unfortunately. Obviously blog posts from a lawyer are not legal advice, especially when not answering our specific question.

Rosuav · March 14, 2024, 5:29pm

Maybe, but “I demand that you immediately go and delete every copy of all of my messages out of every archive on the internet” is also not going to fly. There’s a balance to be struck, particularly when looking at something that was posted in public for all to see.

alicederyn · March 14, 2024, 5:35pm

That balance is for lawyers to advise on though. Or a judge, I guess.

fungi · March 14, 2024, 5:35pm

I think someone needs to pay a lawyer, unfortunately. Obviously
blog posts from a lawyer are not legal advice, especially when not
answering our specific question.

But also, a lawyer is not going to tell you what to do. They’re
going to give you advice, and try to help you understand (to the
extent possible) the risks associated with following that advice vs.
not following it, so that you can weigh them against one another and
come to a conclusion for how you’ll choose to proceed.

CAM-Gerlach · March 14, 2024, 5:38pm

Right, but I thought we’d all established and agreed that removing the user’s account and its data, severing any link between that and the posts, and scrubbing any remaining personal information manually entered in posts is sufficient for fulfilling the spirit and letter of the GDPR and protecting the user’s privacy, and the question was whether that last bit was also necessary. Are you now suggesting that it is in fact necessary to also remove the non-PII, non-attributable content of their posts as well? And to your specific example, it is in cases where users have fewer posts that it is much easier to check and scrub them, whereas this would be more challenging for users with hundreds or thousands, but the loss would also be proportionally far greater.

The PSF had a longtime legal counsel, Van, who they retained for these situations. However, Van recently retired, and I’m not aware if the PSF currently has a replacement. Perhaps @malemburg would know?

FWIW, the GDPR does attempt to address that point with Article 17(2):

Where the controller has made the personal data public and is obliged pursuant to paragraph 1 to erase the personal data, the controller, taking account of available technology and the cost of implementation, shall take reasonable steps, including technical measures, to inform controllers which are processing the personal data that the data subject has requested the erasure by such controllers of any links to, or copy or replication of, those personal data.

In our case, the relevant question would be the individual email archives of users in mailing list mode or that have subscribed to the relevant thread (automatically or manually). However, as there is no available technical solution to address this or realistic means to persuade all such users to manually find and delete such messages from their email clients, I don’t believe the GDPR necessarily obligates us to act on that aspect.

malemburg · March 14, 2024, 6:24pm

Yes, Pamela Chestek is the PSF’s new general counsel.

As others have noted, a lawyer can only help understand the risks and provide guidance on what precautions to take. If you think the moderators need such legal advice from Pamela, please ping Deb Nicholson, the PSF executive director, to see whether she would be open to arranging a meeting with Pamela to discuss.