Community policy on AI-generated answers (e.g. ChatGPT)

fungi · March 28, 2023, 6:14pm

An earlier reply suggested adding it to pinned posts in the Python
Help and Welcome to Discourse topics.

CAM-Gerlach · March 28, 2023, 6:19pm

I’m honestly not sure. FWIW, I’ve found Discourse to be pretty limited when it comes to having places for guidelines, rules and announcements and making them readily visible to all users, at least compared to what I’m used to in other places (e.g. Reddit).

N.B., that looks to be just the generic PSF Code of Conduct, not any kind of FAQ. I know it’s been mentioned before here recently, but it would be good to have an actual succinct but clear list of community rules/guidelines for this Discourse besides just the CoC (which is focused on a particular set of harmful interactions in the context of generic spaces, above and on top of the general norms for a specific space). The CoC handles negative behavior above and beyond the specific rules of a particular platform/space, but it complements rather than replaces the same.

CAM-Gerlach · March 28, 2023, 6:21pm

Yeah, that could work to help spread the word, though as pinned posts disappear as soon as you read them (AFAIK), it should also be documented somewhere more central and permanently accessible. As I’m not a Discourse expert though, I’m not 100% sure where the best place for that would be—the user-facing navigation doesn’t make it very obvious, IMO.

brettcannon · March 28, 2023, 7:10pm

Typically in the welcome message (e.g. it’s where we mention the PSF CoC is applied here).

barry-scott · March 28, 2023, 8:04pm

How do I view that welcome message again? It’s been a while since I joined.
A quick click on the UI did not turn it up.

Also, does the welcome message cover the topics “no-code-as-pictures” and “how-to-preformat-text”? If so, I would love to be able to link to the welcome message when that info would help a new contributor.

If the welcome is easy to skip and hard to find again it may not help.

Mariatta · March 28, 2023, 8:36pm

Yes, it would be good to put it someplace that we can easily revisit and access instead of in the one-time welcome message.

notatallshaw · March 29, 2023, 12:12am

Apologies for being speculative here but it is my belief that once the hype settles down this will be the primary use case for LLMs.

Language models attempt to predict language, and factual accuracy is not an inherent quality of any language that I am aware of. In fact, it’s quite challenging to generate factually accurate statements even for experts in both the language they are writing in and the topic they are writing about.

However, if you prompt an advanced LLM like ChatGPT to reword an existing piece of text into a different style (better edited, summarized, more whimsical, etc.), then you don’t have to rely on a language model solving problems that are difficult to solve in language. Instead, you are giving it all the information it needs, so it only has to solve a problem about language, which well-trained LLMs should be very good at.

Bringing it back on topic, I believe tools like ChatGPT will be increasingly used to check posts, but will continue to be a bad choice for generating posts from scratch.

brettcannon · March 29, 2023, 12:45am

I have no idea.

It might be worth making sure wherever these guidelines end up that we mention folks are expected to read and abide by those guidelines in the welcome message.

barry-scott · March 29, 2023, 6:37am

I see that there is a FAQ entry on the three-lines menu.

Maybe it could be added to that FAQ?
Can more entries be added to that menu?

CAM-Gerlach · March 29, 2023, 7:56am

I would suggest including the public, user-facing statement (i.e., what I’ve initially drafted above):

If you quote or re-post content that was originally published on another site (blog, docs, tutorial, Stack Overflow, forum, GitHub, etc) either by you or someone else, always attribute and link the original source. If another person or tool, including a large language model (such as ChatGPT or other “AI” services) is used to generate your post or a portion of it, please disclose that as well.

…as part of the welcome message that new users get when they join, as well as on a permanent meta-page for others’ reference.

For the latter, as the About page and the various legal pages clearly don’t seem to be the right place, unless we can create new top-level pages (which a bit of naive Googling didn’t reveal an obvious way to do), it seems the FAQ is the only obvious possible place.

However, the overarching problem is that currently, the entirety of the FAQ page is occupied by a copy of the PSF Code of Conduct. While that is important for users to understand (I’m assuming it’s also linked to in the welcome message, and if it isn’t it should be), I’m not sure a FAQ page is the right place for it, and given its presence, it’s hard to see where any such LLM/content attribution policy would fit well, and easy to see how users would miss it.

Therefore, here’s what I suggest:

Instead of duplicating the entire PSF CoC on the FAQ page, I would instead provide a prominent link near the top of the page to the canonical python.org version asking all users to read and follow it (in addition to the places where that is done already). It can also be re-emphasized in a separate FAQ question, if desired.
Add a short section/FAQ question regarding posting external and AI-generated content, containing the final version of the draft language above.
Either here or as part of a separate discussion (perhaps the same one as the below), we should also add a short list of basic rules/guidelines for this Discourse (with the first one presumably being “follow the code of conduct at all times”), including that as a rule.

Perhaps as a separate followup discussion, we could also add questions/sections giving concise but helpful answers to common questions/issues like how to format code, posting screenshots of code, what category to post in, whether this forum is the right place for your post, and maybe what basic information to provide when asking a question. These could then also be included in or linked from the welcome message and possibly the Python Help, Welcome to Discourse and Discourse Feedback category descriptions. This would hopefully both help new users and provide easy pastable links for forum members instead of re-posting the same requests/explanations over and over. This would probably be best discussed and iterated on further in a separate discussion.

Also see my post on that previously, FYI

malemburg · March 29, 2023, 8:45am

In light of what @notatallshaw mentioned above, I believe this needs to be reworded. If LLMs become the norm for doing spell checking, editing, summarizing, etc., we certainly don’t want to have a requirement to state this with every single post.

Overall, I can understand the desire to reduce the amount of distracting original LLM posts, but I also don’t think it’ll persist much after we’ve gone through the hype phase. Or to use Knuth’s words: “premature optimization is the root of all evil”

CAM-Gerlach · March 29, 2023, 9:02am

I did try to draw a bit of a line, albeit a rather fuzzy and implicit one, between “generating” a post (or a substantial portion of it) and just using it to revise an existing post you’ve written yourself, but I can make that more clear. What about this as a start (not 100% happy with the wording of the inserted parenthetical; maybe it’s better as a separate sentence)?

If you quote or re-post content that was originally published on another site (blog, docs, tutorial, Stack Overflow, forum, GitHub, etc) either by you or someone else, always attribute and link the original source. If another person or tool, including a large language model (such as ChatGPT or other “AI” services) is used to generate your post or a substantial portion of it (as opposed to just using such to copyedit/proofread a post you wrote yourself), please disclose that as well.

sinoroc · March 29, 2023, 9:38am

I think maybe we can take a chance and ping @SamSaffron to ask what would be a good place for such a policy.

sinoroc · March 29, 2023, 11:52am

There is this official Discourse plugin called “Discourse Policy”:

Discourse Policy | Discourse - Civilized Discussion

Rosuav · March 29, 2023, 12:54pm

True, but a policy like this would at least make it clear when it has become the norm. If a majority (or even a significant minority) of posts are adorned with a boilerplate (Discourse supports signatures, right? Right?) saying “My posts take advantage of <X> to improve my grammar and formatting, but any factual errors are mine alone”, we’d know to reconsider the policy.

CAM-Gerlach · March 29, 2023, 1:01pm

That might be particularly useful for the Code of Conduct, as well as perhaps the guidelines about where to post, how to format code/not post screenshots of code, and how to include a minimal reproducible example (though perhaps better for the latter would be making a mandatory Discourse bot tutorial ensuring they actually learn these things rather than just click through it). Given the scale of the problem so far, I’m not sure we need a special callout just for reposting and AI post generation, but it could include a pointer to the list/summary of all the main guidelines, per the proposed approach in my post above.

Quercus · March 29, 2023, 9:59pm

If we create a page with a friendly title, implying that the information within is for the benefit of the reader, we could send them there whenever necessary, without their feeling that we are yelling at them. The aim would be for them to feel that we are doing our best to help them get the most out of Python Discourse.

We could create a new go to (or send to) thread, permanently pinned at the top of the Welcome to Discourse! category with a title such as:

Best Practices for Creating a Good Post

or

How to Benefit the Most from Python Discourse

The title would be prominently displayed in large bold type at the top of the page.

Below the title, we would briefly discuss, each in its own concise paragraph, the common practices that make for a good post, or to put it another way, the issues that most often necessitate corrective action. These could include the following, perhaps each in a little more detail than given below:

How to format code, because by making it easier to read, it will increase the likelihood of informative responses.
Respecting others for the benefit of the quality of the discussion for all involved.
Disclosing the use of automated content editors or generators, when they have been used, because understanding the context of a post’s creation may help others respond to it effectively.
Citing sources of material, when used, because it enables the reader to consult related information, so that they may learn more or formulate a helpful response.
Being as clear as possible when posing questions or providing answers. A careful review of an entire post before submission, with a consideration of whether the reader at the other end will understand it, will yield a better post than one that is submitted in haste.
Additional common issues not listed here …
…, etc.

Underneath all of that would be a subtitle such as:

Additional Details and Important Rules for Using this Forum

Following that subtitle, we could list all the standard go to pages, such as:

Lock (close) this new thread as soon as it is created, so that it does not accumulate distracting responses.

Revisit and revise it periodically, as necessary.

SamSaffron · March 29, 2023, 10:23pm

Some options would be:

FAQ
banners (very loud, everywhere, dismissable)
pinned topics (either per category / global)
theme components (anything you can imagine can be implemented… eg: must read FAQ prior to posting component could be built - we have a badge for read guidelines we could lean on)

I think the general problem here is … so many knobs, which I completely hear. I guess splitting off a “how do we surface important guidelines” topic is probably appropriate.

Back to the LLM discussion, we have this on meta which can help ground things:

It is a brave new world, and we are all just scrambling to adapt. The last thing we all want is for these new shiny tools to obliterate all online discussion; they are a force multiplier. Finding the balance is really hard. (Also keep in mind that Copilot has been with us for a while.)

I think even the “this is easy just label it” solutions turn out to be hard. What if code you wrote was assisted by an LLM: you used it to brainstorm and then adjusted / tested / rewrote portions yourself?

Does that need labeling? Do we need a label for “I leaned on Google a bit” or “I read through Python source code”?

A straight cut-and-paste, untested, would be something I would caution to outright ban, but there is just so much grey here.

brettcannon · March 30, 2023, 12:22am

I think that’s a good point because it makes you ask what are we optimizing for? Anyone can provide bad information. And anyone can provide bad information convincingly. And anyone can provide bad information convincingly that they got from somewhere else.

If it’s volume, we already have spam mechanisms built into Discourse to detect tons of posts coming too quickly.

So what are we specifically trying to avoid?

CAM-Gerlach · March 30, 2023, 8:07am

An excellent, well-rounded and thought out plan—SGTM, and a thread would also make it easier for non-admins to create and maintain, as moderators, category mods and any user granted TL4 or above could contribute to it, vs. just admins with the FAQ page.

I do think we should revamp our “FAQ” though at some point soon to be something more generally useful and appropriate than just the standard PSF CoC, as it has the major advantage of being our one persistently and centrally accessible page that doesn’t require diving into the pinned post of a specific subforum (especially if there isn’t a way to make it non-transient once read).

The one issue with a pinned topic is, at least AFAIK, they disappear for users after being read once (unless specifically searched or browsed for). Is there a way to make them persistent? Otherwise, this leaves the FAQ as the only page that can be referred back to by users after reading it once, which I would think would be equally useful to skimming through it during onboarding. Any insight here?

I suppose anyone can in theory, yes, but the fundamental issue with LLMs, as discussed upstream (and the reason SO banned them outright), is that they make it categorically far easier for anyone with minimal skill and effort to provide outwardly very convincing answers that require very careful reading or non-trivial subject matter expertise to identify as inaccurate. Such answers are missing all of the numerous flags, patterns and signals that are typically used, especially by more experienced readers, to quickly evaluate the quality and credibility of an answer without having to be a subject matter expert.

Of course, the concern here has fortunately been mostly hypothetical so far on the Python Discourse; the community of askers is currently small and tight-knit enough that persistent use of these methods in a harmful manner would likely get spotted fairly quickly. And as I’ve said, I don’t think we necessarily need to blast everybody with announcements about this at this point, as opposed to focusing on more widespread areas we can improve in the community (as @Quercus enumerated above). But I don’t think it’s bad to discuss the issue and be prepared with a response if the problem does escalate.