Community policy on AI-generated answers (e.g. ChatGPT)

Various people in this discussion have indicated that they would like to be conversing with humans rather than with machines, myself being one of them.

While it is difficult to prove or quantify analytically, the cited example feels rather flat and dispassionate to me, as if it were generated mechanically rather than by a human. Even if individual responses by LLMs differ in that regard, with some of them being difficult to distinguish with certainty from those written by a person, a proliferation of LLM responses on this forum may gradually erode the general tenor of the discussions over time, leaving us with a venue depleted of genuine human flavor.

3 Likes

I think this discussion is lengthy enough, and I feel the same points are getting repeated again and again.

My conclusion is this:
We can’t reliably detect whether a message was written by AI, or copied from other sources, unless the author tells us.

I also don’t think it’s worth flagging such messages to moderators. It simply adds burden to the moderators, and I don’t know whether it really helps anyone.

I think it would be good to set a policy in which people must disclose their sources, i.e., tell us if they copy code/content from a tutorial, Stack Overflow, someone’s social media post, or ChatGPT.

If you suspect someone’s message/answer is AI-generated, there is no need to flag it as spam, inappropriate, etc., especially if it’s the first time they’ve done it. Just ask: “Where did you get that from? I wouldn’t have thought of that.” If they then say “it’s from ChatGPT”, you can remind them to mention that fact next time, since that’s how we do it here.

Let’s start with this change, and let’s re-evaluate in the future if things don’t improve and we see a need to start banning AI-generated content.

11 Likes

Just for the record, as the moderator who initiated this discussion, I wouldn’t have done so if I weren’t willing to accept the burden of enforcing whatever policy was decided on, and it seems the other active moderator feels the same, so I wouldn’t want that to be a factor in our consensus here either way. I am fairly new as a mod, but so far the mod burden of this Discourse is very low overall compared to what I’m used to elsewhere, and most of what work there is consists of fixing formatting, moving threads/posts around, etc. (much of which can be offloaded to TL3/TL4 users and category mods if really needed), so adding maybe one extra flag a day (almost certainly much less) is negligible.

In any case, if people agree on the approach @Mariatta suggested independent of that, then it seems reasonable to me. Regardless, people shouldn’t hesitate to report messages that are AI-generated if they are spam, off topic, mislead users, or violate the CoC. And IMO, the likelihood of AI generation/plagiarism/mass posting is still a useful input variable for spam post filtering and spam account enforcement (which is why Discourse has some filters for it by default), even if it’s not on its own the primary reason for a mod action. Conversely, for posts by human users with no clear evidence of bad faith, @zainkhan’s case demonstrates that taking a gentle approach and simply asking them can indeed be effective, while avoiding discouraging a potentially valuable contributor to the community.

If you mean this:

then yes, sounds good to me.

1 Like

Indeed, that was reassuring. Encouraged by that, we could set a policy of disclosure while remaining alert to whether the forum experiences a gradual loss of quality.

In contrast to the one in the apologue, an actual frog would evidently take action before it is too late.

To be as fair as possible to all, could we have another consensus check from others who have contributed their good work to this discussion before proceeding with the aforesaid policy change?

It’s great to hear that you’re willing and able to take on additional moderator responsibility here, so thank you.
Though I want to consider long-term sustainability and scalability of effort, not just for yourself but also for other current and future moderators. For the next while, I think we will see an increased user base and increased use of AI everywhere, whereas the number of our moderators won’t change by much, so let’s not add even more responsibilities to moderators for now. I do believe that reminding people (which anyone can do) that they need to cite their sources will be more useful and impactful (and seem like less of a punishment) than having their message hidden/flagged and awaiting moderator action.

3 Likes

As someone who considers themselves an active moderator, I actually prefer Mariatta’s approach. I know several people who use ChatGPT to help them craft responses in English to emails, messages, etc., because it’s not their primary language. So while the structure and style of the writing might seem “AI-like”, that doesn’t mean the text doesn’t accurately reflect the intended content.

3 Likes

I’d like to provide an example that supports the idea of using LLMs as an editing tool rather than as a sole post generator. Although English is my first language, I find myself using ChatGPT to compose posts on Python help forums (albeit not here).

My typical process involves writing out my post or response in a notepad and then prompting ChatGPT to edit it for me. I ask it to improve the post’s spelling, grammar, clarity, and conciseness. After a few back-and-forths with ChatGPT, I copy the edited text, make any necessary tweaks, and then post it.
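(For anyone who would rather script that loop than use the web UI, here is a minimal sketch, assuming the `openai` Python package (v1+) and an API key in the environment; the model name and prompt wording are my own illustrative choices, not part of the workflow described above:)

```python
# Minimal sketch of the "draft first, then ask an LLM to edit" pass
# described above. Assumes `pip install openai` (v1+) and an
# OPENAI_API_KEY set in the environment; the model name and prompt
# wording are illustrative choices, not part of the original workflow.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

draft = "My post, written out in a notepad first..."  # your own draft text

response = client.chat.completions.create(
    model="gpt-4o-mini",  # any chat-capable model would do
    messages=[
        {
            "role": "system",
            "content": (
                "Edit the user's forum post for spelling, grammar, "
                "clarity, and conciseness. Preserve the meaning."
            ),
        },
        {"role": "user", "content": draft},
    ],
)

# Review the suggested edit, tweak as needed, then post it yourself.
print(response.choices[0].message.content)
```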

I find this approach particularly useful when I have a lengthy train of thought; for me, the effort required to trim it down to a more digestible length can be as much as the effort I’ve already put into writing the first draft.

I would consider “here’s a post, please edit it for me” to be very different from “hey ChatGPT, please answer this question so I can just use that unchanged”. Though it’s still worth annotating that you got ChatGPT to edit your post, for the same reason that I would always tell someone if I used Google Translate to post a message in their language.

1 Like

For me they don’t seem equivalent; I don’t tell people I used Firefox’s built-in spellcheck or Android’s built-in grammar check when I make a post.

I’m not taking output from ChatGPT that I don’t understand, and I’m making the decision on whether to use its output wholly or in part, or whether to edit it further.

But perhaps I’m missing something here.

The reliability of spellcheckers is significantly higher than that of grammar checkers, which in turn are much more reliable than automatic machine translation.

More importantly, most people can recognise when the spell checker gets things badly wrong. (I hope!) They might not know how to spell “antidisestablishmentarianism”, but if the spell checker suggests “antirepublicanism”, most people will suspect it is wrong.

But if your language translator gives you the Spanish “antidesestablecerismo” (anti-establishmentism, the opposite of antidisestablishmentism!), or worse, the Finish “ilmatyynyalus” (hovercraft, possibly full of ankeriaita), chances are you will have no clue.

(Warning: I speak neither Spanish nor Finish, and have relied on Google Translate. Any mistakes are Google’s fault.)

Bringing it back to ChatGTP and friends, I don’t have a problem with people using ChatGTP to generate code which they then run, edit for style, and debug before posting. That in itself is no worse (and maybe better) than linking to a random blog or StackOverflow thread where the code may be of, let’s say, variable quality.

So I’ll walk back my earlier statement that it should be banned for answers. Unchecked ChatGTP code should be banned, but if a human uses it as a starting point and checks it, that’s not so bad.

Sorry, I mistyped that during editing: I meant to say “it seems the other active moderator who commented feels the same”, rather than inadvertently implying we were the only two active moderators. My mistake!

Ah, I didn’t consider the use of ChatGPT to edit existing content; I actually wasn’t aware it could even do that. That’s certainly a very different case, at least if the human is responsible and reviews the resulting content to ensure it accurately reflects what they intended to write. Personally, I’ve not been impressed with the results of less sophisticated grammar checkers like Grammarly; for example, a PEP author used Grammarly to suggest a bunch of edits to their PEP, and at least half of them were either no net improvement or straight-up wrong/regressions. However, assuming the source material is of at least decent quality, I imagine the result can generally be quite an improvement.

Or if a human repeatedly spells “ChatGPT” as “ChatGTP” :stuck_out_tongue:

But yeah, that is the overriding concern about ChatGPT and similar AIs: their results make it much more difficult to tell when something’s wrong and to figure out what the person meant, as they lack most of the patterns that both simpler tools and most humans rely on to detect such errors. Capturing the form and pattern of “good” text is what machine-learning models, especially deep ones, do best, whereas getting the substance right is the much harder and still unsolved problem. So I’m not sure how I feel about treating them just like a far simpler spelling/grammar checker and not requiring disclosure, though I do also see the argument for it.

Tangent: I believe that was because Grammarly is designed to elevate poor English to normal conversational or business English, but technical language is distinct from typical conversational/business language in ways that make it harder to judge. It’s like how “adsorb” is a word, but it’s more likely to be a typo for “absorb”, so a lot of spell checkers will flag it as wrong. I’m sure Grammarly’s a spectacular tool for a lot of people, but it’s not well suited to editing a PEP.

Indeed, this is true. That, and some of the other recent dialog here, suggests it is inevitable, for better or worse (most likely both!), that we will see a gradual increase here in the presence of ChatGPT and other LLMs. Along with the evolving mix of human and machine dialog (also for better and for worse), the tenor of the forum will gradually change.

With that noted, I thought a little experiment was in order, comparing how a human, such as myself, and ChatGPT might react differently to a particular sentence copied from this discussion. I first wrote my own reaction, then asked ChatGPT for its reaction.

The sentence was this one:

My reaction was:

“Finish” might be “Finnish”, unfinished.

ChatGPT’s reaction was superior in that it noted the spelling issue but resisted the temptation to offer a bad pun. :wink: I, being a mere human, yielded to the temptation. So yes, things here are gonna change. Let’s hope it will be for better more than for worse.

As a long-time proponent of dad jokes, I strongly dispute that this counts as “superior” :smiley:

2 Likes

Anyone discussing typos and spelling misteaks is guaranteed to make at least one :wink:

You misspelled “inferior”.

3 Likes

As always, you have unflailingly provided us with some good meat for discussion. :wink:

:grin:

… and perhaps a spelling adjustment is also merited here, in order to correct a subtle self-contradiction:

“Finish” might be “Finnish”, unfinnished.

We may save the above banter for posterity, perhaps as entertainment for when we review what unfolds over the coming months, but it would be best now to get back to the topic at hand. It might be helpful for someone to offer each of the following in draft form:

  • A public statement of policy regarding the use of large language models (LLMs), such as ChatGPT, on this forum
  • A protocol of best practices for responding to posts that we suspect may have violated that policy

1 Like

I’ll offer a rough draft based on @Mariatta’s proposal above for further feedback and iteration:

Here’s a start, covering both LLMs and other sources:

If you quote or re-post content that was originally published on another site (blog, docs, tutorial, Stack Overflow, forum, GitHub, etc.), whether by you or someone else, always attribute and link to the original source. If another person or tool, including a large language model (such as ChatGPT or other “AI” services), is used to generate your post or a portion of it, please disclose that as well.

I’ve ~~ripped off~~ adapted a clarified and copyedited version of @Mariatta’s suggestions above:

If you suspect someone’s message/answer is AI-generated, there is no need to flag it as spam, off topic, etc. if it wouldn’t otherwise qualify as such, especially for a new user. Just ask where they got it from, and if they then say it’s AI-generated or taken from somewhere else, you can remind them to mention that fact in that and future posts, following our guidelines.

3 Likes

BTW, where do such guidelines live? Are we adding them to the FAQ (FAQ - Discussions on Python.org)?

1 Like