Survey on Python comments

Hello, my fellow pythonists! I apologize if this is not the most appropriate subforum. I need a few minutes of your time to help me conduct a survey for academic research.
I am a PhD student studying the problem of automatically generating short descriptive comments for source code. We are looking for volunteers for an anonymous online survey to help us evaluate our approach against other techniques, and would really appreciate anyone willing to participate by rating a range of source code comments along various quality measures. If you are at least 18 years old, comfortable reading Python documentation in English, and comfortable with the terms of the completely anonymous survey (described on its first page), you are eligible, and we would greatly appreciate your responses. Thank you in advance!

UPDATE: the survey is closed now.

1 Like

Any comments that can be automatically generated are useless. The whole point of comments is to carry information that ISN’T in the code itself.

5 Likes

Well, thank you for your frank feedback on the last few years of my life :slight_smile: I am going to have to disagree with you there. There are plenty of uses for comments that are inferred directly from code. A simple example is for those who are just starting to study programming. A complex example is verifying that the code you wrote actually does what you expect it to do. Or producing documentation for old projects that no longer have their original developers on staff and are not properly documented. There are a lot more uses, of course.

Also, I would really appreciate it if you took the survey; it's about 20 minutes long :slight_smile:

2 Likes

Okay, then I will revise my statement: anything generated from the code itself is not a comment; it is analysis and review. It may still be of value, but code comments serve quite a different purpose. Asking for analysis of “what does this code do?” can be extremely useful, as you say, to confirm that it's what you expect; though it absolutely MUST be 100% accurate, and I have yet to see that from coding AIs. For a novice trying to get a grip on what the code's doing, it may also be of value, but again, this should not be saved back as code comments.

So maybe I'm just quibbling over terminology, but I will stand by my statement that code comments should not be generated from the code.

When mentoring, my feedback on comments is that they need to say WHY the code does what it does, not WHAT it does - unless it's obscure code, in which case the comment should justify the obscurity.

Can you do either?

1 Like

I am curious: how do you feel about doctest-style tools - frameworks that let you write executable tests inside a function's docstring? Are they comments in your opinion, and do they provide value? In my research I found quite a lot of projects writing tests this way.
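For concreteness, a minimal doctest example (made up for illustration):

    def add(a, b):
        """Return the sum of a and b.

        >>> add(2, 3)
        5
        >>> add(-1, 1)
        0
        """
        return a + b

    # `python -m doctest this_file.py` runs the examples in the
    # docstring and reports any output that doesn't match.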

Also, you might be happy to know that what I am developing is not an “AI” tool per se: it doesn't use any machine learning and is based on formal methods. My system is guaranteed to produce 100% correct comments :slight_smile: Although it is still in its early stages, so there is not much to it yet.

I also looked at some files in your GitHub. Most of the comments I saw could be inferred from the code, except for the TODOs. Can you give an example of what you think a real comment is - one that cannot be inferred from the context?

The “why” is quite complicated. A lot of it has to do with business requirements or some stopgap bug fix. I can't just come up with the “why” out of thin air, but eventually I might be able to do so if the requirements are provided.
It would also eventually be possible to say “why” when the reason is something code-related from the same code base.

I will give you a little example that I will steal from another forum. Someone asked why this code runs without producing an error, and what the colon is doing here:

print: float(input("Value?"))

If you ran the system I am developing, it would tell you what this statement means, which also helps you understand why and how it works.
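In short (my paraphrase here, not the tool's literal output): the colon makes this a variable annotation (PEP 526), not a call.

    # `name: expression` is an annotated statement (PEP 526); this line
    # annotates the name `print` - it never calls print() or rebinds it.
    print: float(input("Value?"))

    # At module scope (on CPython releases before PEP 649's lazy
    # annotations) the annotation expression is evaluated and recorded:
    print(__annotations__["print"])  # the float the user typed
    print(print)                     # `print` is still the builtin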

This could be useful for docstrings, which are more concerned with what than with why.

1 Like

Docstrings are different again; they're not the same as comments. But doctests, docstrings, and other such features are, like comments, created by the programmer as a deliberate action. You can't automate them. If you generate tests from the code, all you can ever prove is that nothing's changed - and while that CAN be of value, it's not the same as actual planned doctests or other tests, which test the intent of the code.
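To make that concrete, a made-up example: a test derived from the code can only pin down its current behaviour, while a test written from the spec can expose a gap.

    def slugify(text):
        return text.lower().replace(" ", "-")

    def test_generated_from_code():
        # characterization test: records what the code does today,
        # so it can only ever detect that behaviour has changed
        assert slugify("Hello World") == "hello-world"

    def test_written_from_intent():
        # encodes the intended behaviour; this one FAILS, exposing
        # that doubled spaces produce "hello--world"
        assert slugify("Hello  World") == "hello-world"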

Okay, good, but even so, this sort of analysis is not what I would consider code comments. It’s a commentary on the code, so the terminology gets a bit tricky, but it definitely is not the same thing as what most people will think of when you talk about “comments” for source code.

There are a few that can theoretically be generated from the code, such as these:

from dataclasses import dataclass # ImportError? Upgrade to Python 3.7 or pip install dataclasses
import lzo # ImportError? pip install python-lzo

but their purpose is to be printed out alongside an exception, as an easy way to inform people how to get the necessary dependencies. Not common.
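(The mechanism is simply that the traceback for a failed import prints the offending source line, comment and all. A minimal sketch of the idea:)

    try:
        import lzo  # ImportError? pip install python-lzo
    except ImportError:
        import traceback
        traceback.print_exc()  # the printed source line carries the install hint
        raise SystemExit(1)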

Could this be generated from the code?

def money(savefile):
    savefile.money[0] += 5000000  # Add more dollars

Would your program know that money[0] is the number of dollars you have? If so, how, and where is it getting that information from?

What about this one?

	# Random note: Glitch attachments that begin with 0 are identified correctly
	# eg GD_Ma_Weapons.Glitch_Attachments.Glitch_Attachment_0421 gives O0L4M2A1.
	# Other attachments have the internal name give an Amplify value higher by one
	# eg GD_Ma_Weapons.Glitch_Attachments.Glitch_Attachment_2144 is O2L1M4A3. Odd.

In fact, I’m going to invert the question. Can you give an example of any comment of mine that CAN simply be inferred from the code?

I usually add comments to explain to my future self WHY I am performing a math calculation, rounding a float up or down, etc. That WHY is impossible to understand from the code alone.
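For example (an invented snippet): the code only shows a floor operation; the WHY lives in the comment alone.

    import math

    def sellable_units(grams_in_stock, grams_per_unit):
        # WHY floor, not round: we must never promise more stock than
        # we can ship, because partial units cannot be sold.
        return math.floor(grams_in_stock / grams_per_unit)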

And this reply itself - WHY it was written - could not be understood without a header like “Reply to Survey on Python comments”.

After doing a bit of the survey: You don't seem to be interested in code comments, but in docstrings (or your examples are picked very badly). Those serve fundamentally different purposes. I am willing to believe that docstrings for simple functions can be auto-generated. But none of the examples I have seen in the survey (whether human- or AI-generated) would add any value as a code comment. All the example comments answer what the code does, not why.

1 Like

After finishing the survey: One “comment” would actually be helpful, and as expected, it in no way describes the actual code, but instead talks about something somewhat unrelated.

I am also a bit disappointed that most of the example functions are code that should never be anywhere near production: they re-implement basic Python or stdlib features. Those don't seem like useful tests for these tools. And as soon as we went beyond those very basic examples, the quality of the tool's comments fell off massively.

1 Like

The theory of computation disagrees. A lot of information is computable, but non-trivial.

I’m not sure what you mean here. Yes, of course a lot of information is non-trivial but computable, that’s why we have caching and such. But we’re talking about code comments here, not arbitrary information. What comments would you ever want on your code (or docstrings etc) that can be entirely generated from the code itself?

Since docstrings are documentation for users who might never see the code, having them describe WHAT the function does can be useful for simple utility functions, and such functions could potentially have their docstrings auto-generated to a useful degree. But IMO the functions where this is useful are somewhat rare. As for code comments, which shouldn't explain WHAT but WHY, I fully agree.
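For instance (a made-up utility), the docstring here states nothing beyond WHAT the signature and body already say, so it is the kind a tool could plausibly generate:

    def clamp(value, low, high):
        """Return value limited to the inclusive range [low, high]."""
        return max(low, min(value, high))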

Yeah. It’s like trying to generate autodoc comments from the code - sure, you can generate a template, but ultimately, autodoc comments are only as useful as the information that’s been put into them.

The comments here seem a little harsh to me. The OP’s pitch of their system as an auto-commenter is perhaps not the best angle.

A system able to produce 100% correct explanations of what non-trivial code does in natural language is obviously useful, even if it cannot explain why the code does what it does. For example, as an aid to programmers who need to understand a codebase written in a language they are not fluent in.

3 Likes

Hi - I filled in the survey, since I was kind of curious, but did indeed get bored towards the end - as you expected - so I skipped some stuff. It's good that people do these kinds of studies, I think, but I was disappointed by the curious mixture of triviality and bad style (!) in the code snippets.
Almost none of the sample comments seemed helpful to me, partially because of the triviality of the code. Also, most comments seemed ambiguous to me. “Helpful” itself is a rather vague word, leaving unspecified: helpful to whom, and to what end(s)?

Almost all comment samples were trying to describe what the code snippets were doing. I generally agree with others in this thread that descriptive comments like that belong in docstrings - assuming they are valuable at all. In only one case did a comment come close to commenting on why the code was written (the bugfix comment). In this context (isolated code snippets) a comment like that is not helpful at all IMO - to be helpful you would need to know what the bug was, for instance - but in an actual repo it could be very helpful, since it places the code in a wider context.

I wonder why you didn't take actual code samples from open-source projects on GitHub and make a survey out of those. Given the state of current LLMs - ChatGPT or Copilot - you could easily have an LLM generate alternative comments for actual code samples, and then ask human respondents to compare and score them (without revealing which comments were human-written and which were auto-generated).
It would be a bit more work - I imagine - but your study could also become much more valuable.
Alternatively, if selecting actual code is too laborious, you could focus on sample code in Python tutorials or in small demo projects.

I tried to take the survey, but boredom hit hard. The comments were so long, attempting to rephrase what the code does in different ways, and of course redundant. The rephrasing was confusing, considering the code itself is very technical. Saying “until string length…” is very different from saying “while string length…”, and Python has no “until” statement.
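(For illustration: a while loop runs while its condition holds, so an “until” phrasing only matches if the condition is negated.)

    s = "ab"
    while len(s) < 8:    # loops WHILE the length is below 8
        s += "."
    # an "until len(s) == 8" reading would correspond to: while len(s) != 8: ...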

Hi, thank you for your time taking the survey. In fact, I did exactly as you said: all of the code snippets are taken from open-source projects, and some of the comments were generated by ChatGPT while others were hand-written by the projects' developers.
I am creating a totally different way of generating comments, one that has not been tried before, and it's quite a challenging task. I have to start with simple, trivial comments and function summaries, because more complex tasks require a lot more time and work, and I am constrained by academic requirements to publish and such. This is just the first step; eventually I hope to produce more complex and interesting comments.
I really appreciate your feedback and time :slight_smile: