Survey on Python comments

sgrey · January 6, 2024, 12:57am

What you’ve been saying so far is that “comment” must say something that is not in the code itself and cannot be inferred from the code. So by your own words, this “example of a bad comment”

i = i + 1;         // Add one to i

taken from the page you linked to is not a comment.
Now you seem to be saying that it is indeed a comment, just not a good one. So, do tell me: is this a comment or not? Because previously you seemed to be telling me that when a string contains this kind of information, it is just a description or a summary, but not a comment.

They also literally make points contrary to yours

It’s a good idea to comment code that someone else might consider unneeded or redundant

I mean, your whole argument is basically that this is bad by definition. No?

MegaIng · January 6, 2024, 1:31am

Sergey:

What you’ve been saying so far is that “comment” must say something that is not in the code itself and cannot be inferred from the code. So by your own words, this “example of a bad comment”
i = i + 1;         // Add one to i
taken from the page you linked to is not a comment.

I am sorry? Where did I claim that? With this claim you just convinced me that you are a troll…

Of course it is a comment. Anything after a comment sign (# in Python, // in many other languages) is a comment. Even “# dasdafefasfdasd” is a comment, just not a good or useful one.

No. Please read my arguments again, with an open mind.

sgrey · January 6, 2024, 1:36am

In the post I replied to originally. Also @Rosuav was saying the same thing multiple times in this thread, verbatim. You just have to scroll up to see. Here →

you have defined a comment to be something not present in the code. If it’s not what you meant, please explain, then.

Also it is possible that both you and @Rosuav were making similar points and I am conflating both of your arguments into one, but it might not be the case.

Anyway, I didn’t claim that every comment generated by ChatGPT is needed. I said that in general all generated comments in that example are useful for understanding the code to someone who has never seen it. I completely agree that documenting the “nonlocal” declaration is redundant and I would never write it. But I also specifically asked ChatGPT for a comment to be there.
You can’t actually blame the generation system for producing the comments that you asked it to produce. Many of those comments are indeed redundant and can safely be removed, but some of them are actually stuff I would keep.
For example, I would keep this one

    # Hide the cursor
    curses.curs_set(0)

This is useful and informative. This API call is poorly designed for readability, and unless you know the API and have worked with it recently, you won’t be able to figure out what it does.
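An alternative to the comment, sketched here under assumptions, is to give the magic number a name. The `CursorVisibility` enum and the `set_cursor` helper are hypothetical and not part of curses; the values 0, 1 and 2 follow the documented meaning of `curses.curs_set` (invisible, normal, very visible).

```python
from enum import IntEnum


class CursorVisibility(IntEnum):
    """Hypothetical names for the values accepted by curses.curs_set()."""
    INVISIBLE = 0
    NORMAL = 1
    VERY_VISIBLE = 2


def set_cursor(visibility, curs_set=None):
    """Set the cursor state via curses.curs_set, or via a stand-in callable.

    The curs_set parameter is an illustration-only hook so that the helper
    can be exercised without a real terminal.
    """
    if curs_set is None:
        # Imported lazily so this module still loads where curses is unavailable.
        import curses
        curs_set = curses.curs_set
    return curs_set(int(visibility))
```

With a helper like this, the call site would read `set_cursor(CursorVisibility.INVISIBLE)` once curses has been initialised, and the “Hide the cursor” comment becomes optional rather than necessary.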

    # Display each line of the popup
    for i, line in enumerate(lines):
        if not isinstance(line, tuple):
            line = (line,)
        popup.addstr(i + 1, 1, line[0][:width - 6], *line[1:])

I would also keep this one, although it is arguable and I am also willing to let it go. But in general it is a good summary of the following block of code, which is not particularly clear about what it does unless you know the API in detail. The function name addstr is not descriptive enough for me to infer that it is actually for displaying the line and not for something like adding data to an internal data structure such as a list box.

and this

# List to store packages without automatic dependencies
nonautodeps = []

Isn’t this an example of a perfectly good and useful comment generated by ChatGPT? I mean, this is perfect. It tells me why this object exists and what it is used for. I don’t have to go parse the code with my eyes, track all usages of it and then figure it out. Well, if it’s actually correct. If it’s wrong, then it’s obviously bad.

hansgeunsmeyer · January 6, 2024, 1:48am

That’s not a bad place to start. But it still is just one opinion in a sea of other opinions.
What a comment is, is pretty clear (anything the interpreter or compiler will handle as such and – usually – ignore). What a “good” comment is (and whether or not it’s good to have any comments at all) seems pretty subjective, dependent on the whole context and especially on the presumed audience.
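For Python specifically, the “handled as such and ignored” definition can be illustrated with the standard library. This is a small sketch, not a claim about any tool discussed in the thread: `tokenize` reports comments as COMMENT tokens, while the parsed syntax tree is identical with or without them.

```python
import ast
import io
import tokenize

source = "i = i + 1  # Add one to i\n# dasdafefasfdasd\n"

# The tokenizer recognises both comments, good or bad, as COMMENT tokens.
comments = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type == tokenize.COMMENT
]
print(comments)

# The parser drops them: the tree matches that of the comment-free source.
same_tree = ast.dump(ast.parse(source)) == ast.dump(ast.parse("i = i + 1\n"))
print(same_tree)
```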

The wikipedia entry about Comment was also pretty decent, I thought. It takes a less prescriptive stance than the StackOverflow commentary. Quote:

Stress relief
Sometimes programmers will add comments as a way to relieve stress by commenting about development tools, competitors, employers, working conditions, or the quality of the code itself.

Would those count as “bad” comments? They surely felt good for the developer when they wrote them, I assume. (There is probably still some of my own code floating around in a very big company where I commented ‘this is total crap’. A comment like that can become a useful meta-comment to others too. Even a crappy comment like that – vague, not specifying anything about what it refers to or why it would be “crap”, or what that even means – can become useful and “good” depending on the context. And it’s definitely “good” in the sense of being easy to read and drawing the attention of the reader.)

cameron · January 6, 2024, 2:04am

Well I don’t. There are useful comments which are technically redundant
because they can be deduced from the code.

You’ve asserted elsewhere that some things are more analysis or review
than “comments” (let me say I understand that to mean “a comment useful
to the programmer reading the code”). But a (missing) opening overview
comment can easily be of benefit, and such things if written after the
fact are effectively analysis. I’m including here block comments inside a
function above some chunk of code.

There’s a heap of badly (sometimes) and under (often) commented code out there.
Undercommented code in particular can benefit from some comments which
are directly implied by the code.

An autocommenter which analysed some code and produced suggestions for
useful comments, both the function leading comment and/or docstring and
also for important points within, such as a preamble to a loop or
something, could well be a very useful tool. Even as a review tool
assistant to aid getting the code better commented before a merge.

Something that’s niggling at me reading this thread is the discussion of
comments which hold information not deducible from the code such as
commentary about the wider purpose of the programme which informs the
code in front of us.

I write a fair bit of what I think of as “library” code - code which is
a tool which can be used to a larger purpose. I frequently find it
useful to distinguish in my mind “mechanism” from “policy”. The lower
level the code, the more it is pure mechanism. Policy belongs in higher
level code.

So a lower level function, being mechanism, basically does what it says on
the tin, and its comments will tend to be mechanically deducible from
the code. Maybe it has some tuning variables (like a
follow_symlinks=False parameter, which lets the caller dictate some
policy, and a conservative default policy).
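A minimal sketch of the mechanism/policy split described above. The `total_size` function is hypothetical and only illustrates the idea: the function is pure mechanism, and the `follow_symlinks` parameter hands the policy decision to the caller, with a conservative default.

```python
import os


def total_size(root, follow_symlinks=False):
    """Return the total size in bytes of the files found under root.

    Mechanism only: the caller decides the symlink policy. The default is
    the conservative one (symlinks are neither followed nor counted).
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=follow_symlinks):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinked files unless the caller opted in.
            if os.path.islink(path) and not follow_symlinks:
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                # For example a dangling link: nothing to count.
                continue
    return total
```

The comments here are all deducible from the code, which is the point being made: in purpose-agnostic library code there is no larger context to mention, yet the notes still shorten the reader’s path.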

There’s a threshold between trite annoying comments, e.g. “increment i”, and
useful comments which explain an aspect of the code which makes
something complex more readily understandable. Because we’re talking
about mechanism only in a library function, there’s no larger context to
mention or inform us. Just summary/analysis/notes to aid comprehension
by the programmer. But without the larger picture (because this is
purpose agnostic library mechanism) such comments are technically
redundant in that someone can understand these implications by deep
thought about the code. That doesn’t make them useless.

Cheers,
Cameron Simpson cs@cskk.id.au

MegaIng · January 6, 2024, 2:16am

Ok, can you please stop reading what I am saying in bad faith? I obviously mean that this is what good code comments are IMO. Sorry that I don’t use 100% precise language all the time.

And from what I can tell, most of those opinions agree with this one. Or can you find some that significantly differ in what they consider good code comments (outside of this thread)?

Because it doesn’t try to define what a good code comment is, but instead documents how code comments are used. Which is basically useless in a discussion about how good code comments should be.

That depends on the situation. If they just consist of insults, yeah, they are bad. If they at least give a decent amount of hints about the context of the code, they can be quite good.

sgrey · January 6, 2024, 2:29am

I am not trying to read your posts in bad faith. It is possible that I have misunderstood what you said, but in that case you are welcome to clarify. You also don’t answer any questions and never provide any counter arguments or examples or definitions when asked. You only expressed 2 things in general

all comments generated by an automated tool are generally useless, with some very specific exceptional cases
comments that are useful cannot, by definition, be generated.

But you haven’t provided any examples of the information or sample comments. And I actually gave you many examples of what I would consider useful and what was generated by an algorithm.
I am also saying that there is inherent usefulness in comments that summarize or describe what the next block of code does. This also implicitly tells you why that code is there and serves as both documentation and check: a code reviewer or someone else can read the comment and see that the code doesn’t actually match. That indicates that either the comment or the code has a problem, which leads to an improvement in one or both.
If you actually provide me with a definition of what exactly you consider a useful comment to be, and provide examples of types of useful comments, I would indeed be very grateful.
And if you disagree with the examples I provided, then I am also very interested in knowing why you think that way.
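One way to make the “comment as a check” idea mechanical, offered as a sketch rather than as what anyone in the thread proposed, is Python’s `doctest` module: the summary lives in the docstring, and the examples in it are executed, so a mismatch between description and code shows up as a failure. The `clamp` function is made up for illustration.

```python
def clamp(value, low, high):
    """Limit value to the closed range [low, high].

    >>> clamp(15, 0, 10)
    10
    >>> clamp(-3, 0, 10)
    0
    >>> clamp(7, 0, 10)
    7
    """
    # min() caps the value at high, then max() raises it to at least low.
    return max(low, min(value, high))
```

Running `python -m doctest` on a file containing this function executes the three examples; a plain summary comment without examples still needs a human reviewer to spot the mismatch.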

There is also this

Chris Angelico:

Is it? How? Consider this example:

//For every option, filter out every other option.
return options.filter(keep => options.includes(check => check !== keep || check === '*'));

which is an example of a bad human-written comment, and one for which ChatGPT actually generates a useful summary

The code filters an array of options, keeping only elements that either match a wildcard ('*') or have a unique value compared to other elements.

Yes, it could be phrased better, but it is descriptive enough for that piece of code.

MegaIng · January 6, 2024, 2:52am

I don’t think I said that everything that can be generated is useless. I am saying that almost all of it is useless as code comments, i.e. stuff put into the source file alongside the code. These generated summaries would be useful as a side bar (or injected into the code by the IDE), generated on the fly, based on the code and existing developer comments.

Duplicating information inside the source file is almost always clutter, risks going out-of-sync and places an unnecessary burden on reviewers.

Good code comments should contain information that isn’t obvious from the code itself: be it external information about the save file format we are interacting with, references to bug fixes performed, the reason why a particular weird pattern was chosen, or sometimes, only sometimes, explanations of what the code does.
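A small sketch of the distinction, using a made-up save-file format (the magic number, the offset and the function name are all hypothetical). The first comment only restates the code; the second carries external information that no tool could recover from the code alone.

```python
import struct


def read_payload_length(header: bytes) -> int:
    """Return the payload length stored in a (hypothetical) save-file header."""
    # Low-value comment: "unpack two bytes at offset 4" only restates the code.
    #
    # Higher-value comment: in this made-up format, bytes 0-3 hold a magic
    # number and the length that follows is a little-endian unsigned 16-bit
    # integer, which is why the format string is "<H" rather than "H".
    (length,) = struct.unpack_from("<H", header, 4)
    return length


header = b"SAVE" + struct.pack("<H", 300)
print(read_payload_length(header))
```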

All these tools can only reliably produce this last category of comments. They can at best guess at the others, and wrong code comments are worse than no code comments (since they are going to mislead people who don’t quite understand the code).

This is somewhat similar to why I and many others don’t want to see LLM output on StackOverflow: if you want computer-generated output, there is no reason to preserve it long term. Just ask the LLM when you have the question and use its output.

Or in other words: Why do you want to add the descriptions your tool generates as comments to the code? Just because people sometimes view the code in environments where they wouldn’t have access to your tool?

My problem with your tool isn’t what it’s doing. My problem is that you insist on calling it code comments instead of descriptions.

sgrey · January 6, 2024, 3:00am

Well, this is why I brought the issue of what is a comment before. Why do you have an issue with me calling comments comments, if

which is basically how I am using this word. I have also clarified for the sake of discussion

and in general I agree with regard to what information a good comment should contain. But as someone who had to debug and fix one too many undocumented projects written by someone else, I am of the opinion that writing out what your code does is useful. Of course you shouldn’t do stuff like “i + 1 adds 1 to i”. But writing summaries for functions and cohesive blocks of code is very useful even to yourself in case you come back to your project 2 years later and cannot remember what exactly you did, especially if there is some complicated logic involved. And also documenting what exactly regular expressions are supposed to do is immensely beneficial.
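For regular expressions in particular, Python lets the documentation live inside the pattern via `re.VERBOSE`. The date pattern below is a hypothetical example chosen only to illustrate the flag.

```python
import re

# Hypothetical example: match an ISO-style date such as 2024-01-06.
DATE_RE = re.compile(
    r"""
    (?P<year>\d{4})   # four-digit year
    -
    (?P<month>\d{2})  # two-digit month
    -
    (?P<day>\d{2})    # two-digit day
    """,
    re.VERBOSE,
)

match = DATE_RE.fullmatch("2024-01-06")
print(match.groupdict())  # {'year': '2024', 'month': '01', 'day': '06'}
```

In verbose mode, whitespace in the pattern is ignored and `#` starts a comment, so each piece of the expression can say what it is supposed to match.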

MegaIng · January 6, 2024, 3:02am

Because it implies you want to add this to the source code file, on disk/github repo. That is the part I have an issue with. As I said many times now.

sgrey · January 6, 2024, 3:07am

I do want to add them to the source code, but later on. I just completed a milestone in my project and I need to collect empirical data for qualitative analysis of the current state, and then publish my paper. Hence, the survey. After this is done I will get my PhD and will be making improvements on this to produce more meaningful comments.

I cannot stress enough that this is cutting edge research. I am solving an unsolved problem with a method that has never been tried before. It’s not a project to deploy into your PyCharm as a plug-in to generate garbage. It is a very long way from seeing any actual real-world use. If I didn’t plan for it to be used later on or improved by someone else later, I wouldn’t have started it.

MegaIng · January 6, 2024, 3:29am

Yeah, for sure this is an interesting research project and I am also interested where this goes long term.

I am just of the opinion that generating code comments is not the most useful way this tool can be used, and that in fact that use case is going to be a bad idea most of the time. Instead, using a tool like this (basically all of the technology should carry over without any problems) to create explanations for code sounds way more useful and I have literally zero objections to that.

Therefore, framing this as “a tool to generate code comments” is going to imply the baggage that the term “code comment” has, particularly:

Beginners, especially novices in formal education, are told to comment on every line. This is useless for real world code (most of the time).
Code comments serve many purposes (documentation, explanation, “What”, “Why”, references to sources, rants about project requirements, …)
Style guides have quite a few opinions on how and when to comment (from “any kind of comment is a code smell” to “every line needs to have a comment”)
Comments take up the same screen space as code and most of the time can’t be turned off

So by saying your tool generates (useful [1]) “code comments”, you are saying that it’s not gonna fall into that first category, and the generated comments are not going to conflict with any other uses of comments. The tool as presented in the survey does not at all manage to do that. Long term, it might be able to do that. But this tool in its current form might [2] already be useful for a different purpose, and IMO you should focus on that.

[1] I at least would hope that’s the goal
[2] Depends on how carefully selected the example code snippets are

sgrey · January 6, 2024, 3:47am

Calling what I am producing an explanation or documentation would be appropriate, although it depends on what exactly you mean by an explanation. But what is in the survey is not everything that I can currently do. The survey basically contains summaries of functions and compares them from 3 different sources.

The whole reason I started this was because I had to fix a ton of undocumented projects at work that had almost no comments in them and I kept thinking, wouldn’t it be wonderful if I had something that would at least provide me some documentation?
Also pretty much all modern approaches do not guarantee correctness of the output, which is easy to see in ChatGPT. My main goal is to produce a description that is guaranteed to correspond to the code.
Also, this can potentially be used to verify whether existing comments describe/document the code correctly. I have many different ideas about future development and applications of this. But first I have to be able to make it work on different levels of abstraction, then I can tune up the actual output and eventually extend this to more languages. There is a lot of work ahead.

flyinghyrax · January 6, 2024, 3:54am

Isn’t the internet wonderful. A 90+ comment thread about what turned out to be English semantics, but I’m glad we’re all getting somewhere.

By the way @sgrey , I saw on the survey page your project got NSF funding? Congratulations on that! I’ve heard writing grant proposals sucks.

Edit: and good luck on your paper. Given the interest this received here (haha), I hope you’ll let us know when you publish!

sgrey · January 6, 2024, 3:59am

I did try to clarify it early on, albeit unsuccessfully. And thanks. Yes, writing grant proposals does suck and there are a lot of hoops to jump through. Fighting for the money is really hard.

And well, it does seem to be something people are interested in, which is good. I got some good feedback here.