Community policy on AI-generated answers (e.g. ChatGPT)

I did try to draw a bit of a line, albeit a rather fuzzy and implicit one, between “generating” a post (or a substantial portion of it) and just using it to revise an existing post you’ve written yourself, but I can make that clearer. What about this as a start? (I’m not 100% happy with the wording of the inserted parenthetical; maybe it’s better as a separate sentence.)

If you quote or re-post content that was originally published on another site (blog, docs, tutorial, Stack Overflow, forum, GitHub, etc.), either by you or someone else, always attribute and link to the original source. If another person or tool, including a large language model (such as ChatGPT or other “AI” services), is used to generate your post or a substantial portion of it (as opposed to just using such a tool to copyedit/proofread a post you wrote yourself), please disclose that as well.

1 Like

I think maybe we can take a chance and ping @SamSaffron to ask what would be a good place for such a policy.

There is this official Discourse plugin called “Discourse Policy”:

True, but a policy like this would at least make it clear when it has become the norm. If a majority (or even a significant minority) of posts are adorned with a boilerplate (Discourse supports signatures, right? Right?) saying “My posts take advantage of <X> to improve my grammar and formatting, but any factual errors are mine alone”, we’d know to reconsider the policy.

That might be particularly useful for the Code of Conduct, as well as perhaps the guidelines about where to post, how to format code/not post screenshots of code, and how to include a minimal reproducible example (though perhaps better for the latter would be a mandatory Discourse bot tutorial ensuring users actually learn these things rather than just clicking through it). Given the scale of the problem so far, I’m not sure we need a special callout just for reposting and AI post generation, but it could point to the list/summary of all the main guidelines, per the proposed approach in my post above.

If we create a page with a friendly title, implying that the information within is for the benefit of the reader, we could send them there whenever necessary, without their feeling that we are yelling at them. The aim would be for them to feel that we are doing our best to help them get the most out of Python Discourse.

We could create a new go-to (or send-to) thread, permanently pinned at the top of the Welcome to Discourse! category, with a title such as:

Best Practices for Creating a Good Post

or

How to Benefit the Most from Python Discourse

The title would be prominently displayed in large bold type at the top of the page.

Below the title, we would briefly discuss, each in its own concise paragraph, the common practices that make for a good post, or to put it another way, the issues that most often necessitate corrective action. These could include the following, perhaps each in a little more detail than given below:

  • How to format code, because making it easier to read increases the likelihood of informative responses.
  • Respecting others, to the benefit of the quality of the discussion for everyone involved.
  • Disclosing the use of automated content editors or generators, when they have been used, because understanding the context of a post’s creation may help others respond to it effectively.
  • Citing sources of material, when used, because it enables the reader to consult related information, so that they may learn more or formulate a helpful response.
  • Being as clear as possible when posing questions or providing answers. A careful review of an entire post before submission, with a consideration of whether the reader at the other end will understand it, will yield a better post than one that is submitted in haste.
  • Additional common issues not listed here …
  • …, etc.

Underneath all of that would be a subtitle such as:

Additional Details and Important Rules for Using this Forum

Following that subtitle, we could list all the standard go-to pages, such as:

Lock (close) this new thread as soon as it is created, so that it does not accumulate distracting responses.

Revisit and revise it periodically, as necessary.

3 Likes

Some options would be:

  • FAQ
  • banners (very loud, everywhere, dismissible)
  • pinned topics (either per category / global)
  • theme components (anything you can imagine can be implemented… e.g. a must-read-FAQ-prior-to-posting component could be built - we have a badge for reading the guidelines we could lean on)

I think the general problem here is … so many knobs, which I completely hear. I guess splitting off a “how do we surface important guidelines” topic is probably appropriate.

Back to the LLM discussion, we have this on meta which can help ground things:

It is a brave new world, and we are all just scrambling to adapt. The last thing we all want is for these new shiny tools to obliterate all online discussion; they are a force multiplier. Finding the balance is really hard. (Also keep in mind that Copilot has been with us for a while.)

I think even the “this is easy, just label it” solutions turn out to be hard. What if code you wrote was assisted by an LLM: you used it to brainstorm and then adjusted, tested, and rewrote portions yourself?

Does that need labeling? Do we need a label for “I leaned on Google a bit” or “I read through the Python source code”?

A straight cut-and-paste, untested, is something I would lean toward banning outright, but there is just so much grey here.

1 Like

I think that’s a good point, because it makes you ask: what are we optimizing for? Anyone can provide bad information. And anyone can provide bad information convincingly. And anyone can provide bad information convincingly that they got from somewhere else.

If it’s volume, we already have spam mechanisms built into Discourse to detect tons of posts coming too quickly.

So what are we specifically trying to avoid?

An excellent, well-rounded and well-thought-out plan—SGTM. A thread would also be easier for non-admins to create and maintain, as moderators, category mods and any user granted TL4 or above could contribute to it, vs. just admins for the FAQ page.

I do think we should revamp our “FAQ” at some point soon, though, into something more generally useful and appropriate than just the standard PSF CoC, as it has the major advantage of being our one persistently and centrally accessible page that doesn’t require diving into the pinned post of a specific subforum (especially if there isn’t a way to make pinned posts non-transient once read).

The one issue with a pinned topic is that, at least AFAIK, it disappears for users after being read once (unless specifically searched or browsed for). Is there a way to make them persistent? Otherwise, this leaves the FAQ as the only page that users can refer back to after reading it once, which I would think would be at least as useful as skimming through it during onboarding. Any insight here?

I suppose anyone can in theory, yes, but the fundamental issue with LLMs, as discussed upthread and the reason SO banned them outright, is that they make it categorically easier for anyone with minimal skill and effort to produce outwardly very convincing answers that require very careful reading or non-trivial subject-matter expertise to recognize as inaccurate. Such answers are missing all of the numerous flags, patterns and signals that readers, especially more experienced ones, typically use to quickly evaluate the quality and credibility of an answer without having to be subject-matter experts.

Of course, the concern here has fortunately been mostly hypothetical so far on the Python Discourse; the community of askers is currently small and tight-knit enough that persistent use of these methods in a harmful manner would likely get spotted fairly quickly. And as I’ve said, I don’t think we necessarily need to blast everybody with announcements about this at this point, as opposed to focusing on more widespread areas where we can improve the community (as @Quercus enumerated above). But I don’t think it’s bad to discuss the issue and be prepared with a response if the problem does escalate.

1 Like

Add an FAQ category that is locked so that only moderators and trusted users can write into it?

Would the topics in that category then be always viewable and linkable?

We’ll probably see an emerging use of Google Bard on this forum over the coming months. I asked Bard an autobiographical question this morning, as follows:

Me: Is Bard a large language model?

Bard: Yes, Bard is a large language model (LLM) developed by Google AI. It is a conversational AI that can understand and respond to natural language queries. Bard is trained on a massive dataset of text and code, and it can generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way.

LLMs are a type of artificial intelligence (AI) that can generate text, translate languages, and write different kinds of creative content. They are trained on massive datasets of text and code, and they can learn to mimic the style and structure of human writing. LLMs are still under development, but they have the potential to revolutionize the way we interact with technology.

This morning, I thought to throw a somewhat naive question about Python at both ChatGPT and Bard, in order to consider whether a general policy toward all large language models could cover the entire range of sophistication. ChatGPT allows me to retrieve both the prompt and the response from this morning as text, but I can only retrieve the prompt from Bard. So, below is a text copy of the exchange with ChatGPT, and a screen capture of the one with Bard.

First, ChatGPT

Me: If I use Python to add two binary numbers, will it give me a binary number as a result?

ChatGPT: Yes, if you use Python to add two binary numbers, the result will also be a binary number. This is because Python treats binary numbers as strings of 0s and 1s, just like any other string. When you add two binary numbers in Python, the interpreter will concatenate the two strings and perform the binary addition operation, resulting in a new binary string that represents the sum of the two original binary numbers.

For example, if you add the binary numbers “1101” and “1010” in Python using the “+” operator, the result will be the binary number “101111”. Here’s an example Python code snippet that demonstrates binary addition:


binary1 = "1101"
binary2 = "1010"

# Convert binary strings to integers and add them
decimal_sum = int(binary1, 2) + int(binary2, 2)

# Convert decimal sum back to binary string
binary_sum = bin(decimal_sum)[2:]

print("Binary sum:", binary_sum)

This code will output: Binary sum: 101111 .

Now, Bard with the same prompt …

It was somewhat funny to see that ChatGPT’s response began with “Yes,” and Bard’s with an opposite “No,”. That is not so surprising though, considering that the prompt was crafted to be a bit foggy conceptually. ChatGPT’s answer does muddle things when it brings up concatenation.
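Just to ground that “muddle” a bit, here is a quick check of what Python actually does with those two strings: the “+” operator simply concatenates them, and the correctly converted sum also differs from the figure given in the quoted reply.

binary1 = "1101"
binary2 = "1010"

# "+" on two strings is plain concatenation, not binary addition
print(binary1 + binary2)  # prints: 11011010

# To add the values: parse as base-2 integers, add, then format back
decimal_sum = int(binary1, 2) + int(binary2, 2)  # 13 + 10 = 23
print(bin(decimal_sum)[2:])  # prints: 10111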

Though the above responses differ, a single uniform policy toward LLMs in general could cover both, for example this suggestion posed earlier:

Let’s recall that some of the participants in this discussion have voiced opposition to allowing the use of LLMs at all here. We’d best remain respectful of their feelings, perhaps by not being too quick to post a permissive policy.

My original view regarding the issue was:

This situation is quite nuanced and fluid, and now I’m fundamentally unsure how to proceed. By and large, the other participants in this discussion are more central to this community than I am. So, hereafter I might just ask a question now and then, and ultimately cast my view in favor of whatever consensus the community may reach.

A small aside: you did give ChatGPT and Bard two different prompts.

Your ChatGPT prompt:

Me: If I use Python to add two binary numbers, will it give me a binary number as a result?

And your Bard Prompt:

[image: screen capture of the Bard prompt]

Notice the “binary numbers” vs. “primary numbers”.

According to Wikipedia, “prime powers” are sometimes called “primary numbers” (Prime power - Wikipedia). I have no idea whether that’s how Bard interpreted your question, but a better response probably would have been to ask for clarification.

1 Like

Oops …


This time, we got a "Yes, ".

Hmmm … worth following my own advice.

1 Like

I’ve got a suggestion: instead of trying to detect and ban, disincentivize replying with ChatGPT by directly providing a function to call up a ChatGPT answer on any post.

This may not be a practical solution right now, because ChatGPT is proprietary and costly, but there is a lot of work underway to match its performance with open-source frameworks, and it’s only a matter of time before distilled, reduced models become available that can run very fast, or even in the user’s browser JS interpreter.

Then, it will become a very practical possibility to just offer an “AI answer this” button for any post.
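Purely to illustrate the shape of such a feature (hypothetical names throughout; this is not an actual Discourse API), something along these lines, with the result clearly labelled and kept separate from human replies:

# Hypothetical sketch of an "AI answer this" helper.
# llm_complete() is a stand-in for whatever model back end might be used.
def llm_complete(prompt: str) -> str:
    # Stub only: a real deployment would call an LLM service here.
    raise NotImplementedError

def suggest_ai_answer(post_text: str) -> str:
    prompt = (
        "Draft a possible answer to the following forum post. "
        "Be explicit about anything you are unsure of.\n\n" + post_text
    )
    draft = llm_complete(prompt)
    # Label the draft so it is never mistaken for a human reply.
    return "[AI-generated draft - not reviewed by a human]\n" + draft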

Given that, as already discussed, LLMs lack any sense of factuality for their responses, it seems like it could be very misleading to users. People might click the “get an AI answer” button and follow its advice, not realizing that it’s not actually an answer but rather predictive text made to sound like a believable answer without any significant probability of being correct.

3 Likes

Whether or not it suppressed the frequency with which ChatGPT responses are incorporated directly into posts on this forum, wouldn’t this proposed feature effectively promote, and even appear to advocate, the naive practice of uncritically soliciting and using advice from large language models (LLMs)?

1 Like

Unfortunately, users’ misplaced, uncritical trust in LLMs will happen regardless of what forums officially decide, due to automation and anthropomorphism biases. I have worked in the past on military drones, and these biases have been well documented there already: even military crews, who are highly trained, have been reported to risk their lives to carry damaged mine-clearing robots back from a dangerous field.

I may have a bleak outlook on the topic, but as an AI researcher, I see no way non-AI researchers can have a good enough grasp of such a highly intricate and polymorphic topic as to not unconsciously get their perception of reality biased by AI models that are now ubiquitously available (and even before ChatGPT, there were algorithmic bubbles everywhere online due to recommendation algorithms). There are already tragic examples, such as a Belgian health researcher ending their life (hence someone with a fair bit of education in a developed country) because of a GPT-J-based chatbot.

What I am suggesting is to commoditize these tools so that humans can learn to tame them, for example with a clear explanation of what they do, and with a clear UI distinction between AI-generated answers and real human answers. I am not suggesting, as you seem to imply, that AI-generated answers would just be posted on the forum, but rather that they would be suggested in a very distinct UI field. For example, like the GitHub system that recommends possible duplicates when you create an issue, it could suggest a succinct AI-generated answer. In fact, this would be a very similar UI experience to summarization algorithms.

Yes, the lack of a concept of correctness is a major issue; there is no formal guarantee. But other algorithms, such as summarization algorithms, are widely used (even here), and they also lack such a concept.

And while I certainly agree that the lack of formal truthfulness guarantees is a major issue, especially because the bot lacks any way to search for up-to-date information (and I’m sure additional heuristics could be devised to improve practical accuracy), let’s not forget that there is no way to guarantee that any statement is true. Bots cannot solve a fundamentally impossible problem that humans are not exempt from either. And this is not the first time a new technology has been criticized for its potential to propagate misinformation: I remember the 1990s-2000s, when the Internet was considered a cesspool of nothing but amateurish, fake, conspiracist information (which was partially true), and the invention of the printing press was likewise criticized because it could be used to print pamphlets in huge volumes.

The TL;DR of my suggestion: while I certainly agree with a blanket ban in the current situation, given that we lack tools and hindsight, progress cannot and should not be stopped. LLMs are here to stay, and future technologies will be even better at mimicking human discourse, so I think the only sustainable long-term solution is to educate humans about how to use these tools and about their limitations, just as happened historically with other new communication technologies. Devising standards for how to present such AI-generated content will certainly help in this endeavor.

1 Like

On the contrary, the following implies that your proposed feature might or might not succeed in decreasing the frequency of AI-generated answers on the forum:

Please clarify whether the function alluded to in the following would be offered in order to make it convenient for a user to seek an AI-generated answer instead of soliciting a reply directly from a human on this forum:

Reviving this a bit: it seems, as predicted, that LLM-based (specifically ChatGPT-based) answers are starting to proliferate. Reviewing recent #users questions and answers over the past few days, I noticed we have one new user, @Fusion9334, who joined 3 days ago and has answered a number of questions in a pattern that appeared to me very likely to be, at the very least, heavily assisted by ChatGPT or a similar LLM. This conjecture was apparently confirmed by checking their profile, where the one topic they have posted is a question specifically about using ChatGPT’s Python API. As it was their first post, it seems that was in fact what brought them to this forum.

Now, to be clear, this user hasn’t done anything against the rules—however, I think their history might be a very useful set of data points for learning and discussing more practically how LLMs might be used (for good or ill) in answers, and how we might address any negative impacts. Furthermore, their input here would be appreciated as well.

What follows are my personal initial impressions. It seems they’ve taken at least some care not to just dump the LLM output, and to tailor it somewhat to the situation faced by each poster. Additionally, it seemed like the LLM’s wide range of background knowledge allowed it to answer specialized subject-matter questions, about specific tools and services that the typical users helping others here were unlikely to know about, in much more detail (if not entirely correctly).

On the other hand, on many more basic Python questions (more in scope for this forum), I did notice many instances where the other posters were trying to engage the user, typically a beginning learner and often working on an assignment, in a pedagogically motivated discourse to actually help them learn. However, before I even considered that an LLM might be involved, I noticed that the replies by this user were running somewhat contrary to that: directly giving the asker a bunch of code (that may or may not address the real problem) with relatively minimal explanation, which at least implicitly encouraged them to just copy/paste rather than actually learn something, as the other people were trying to teach them. In fact, I was considering mentioning this privately to the user as something to consider in their approach to answering future questions, before I even suspected an LLM.

After examining this real-world history, I’d like to hear others’ further thoughts, discussion points and proposals. Thanks!

3 Likes