Survey on Python comments

sgrey · January 4, 2024, 9:54pm

Thanks for your feedback and for your time, even if you didn’t finish it. I appreciate it anyway. I am trying to compare the output of my system to handwritten comments and to ones generated by an LLM, so they are indeed rephrasing the same thing in different ways, but that is exactly the point of the survey. We would like to find out which phrasing developers find more appealing and more natural, as these are some of our points of comparison.

sgrey · January 4, 2024, 9:56pm

Thank you, this is indeed one of my motivations. I have to use the original text as is because of public survey constraints.

hansgeunsmeyer · January 4, 2024, 10:02pm

Oh, wow… Good for you! I didn’t take into account how crummy open source code can be.
I do think there is a dependency between the code quality in the survey and the value of any comments (with the implication that the value of the survey itself will also be higher with better code), so I hope you’ll be able to take this further in the future. Good luck!

sgrey · January 4, 2024, 10:04pm

Chris Angelico:

Could this be generated from the code?
def money(savefile): savefile.money[0] += 5000000 # Add more dollars
Would your program know that money[0] is the number of dollars you have? If so, how, and where is it getting that information from?

What about this one?
	# Random note: Glitch attachments that begin with 0 are identified correctly
	# eg GD_Ma_Weapons.Glitch_Attachments.Glitch_Attachment_0421 gives O0L4M2A1.
	# Other attachments have the internal name give an Amplify value higher by one
	# eg GD_Ma_Weapons.Glitch_Attachments.Glitch_Attachment_2144 is O2L1M4A3. Odd.
In fact, I’m going to invert the question. Can you give an example of any comment of mine that CAN simply be inferred from the code?

The first one can easily be generated from code, and many code generation systems can already do that. Depending on the context in which this function is written, such a system could write something like “add x amount of money” or “add x amount of currency”, where “currency” would be the specific currency. It depends on where you are and what the project is. It might be handling not just dollars but other types of currency, in which case “dollars” would actually be incorrect.

As for the second comment, it looks like a bug description. I don’t believe any system right now can describe bugs properly. At the moment my system wouldn’t be able to generate documentation like this, but in the future it will. I am not sure whether that will come in the next publication or the one after, but it will be able to tell you things like requirements for dependency injection and API usage. Of course, with limitations.

hansgeunsmeyer · January 4, 2024, 10:08pm

If you ask ChatGPT (3.5) to add an explanatory comment to the code (without comment) it gives:

# Increase the money attribute in the savefile by 5,000,000.

If you then ask “Yes, but what does that mean?”, it comes up with

This comment succinctly conveys that the purpose of the function is to raise the available funds (money) in the savefile by an amount of 5,000,000.

(I do like Chris’s comment a lot better, though.)

sgrey · January 4, 2024, 10:15pm

Thanks! Due to the requirements, it was actually hard to find appropriate snippets I could use for the current level of the system. Most code uses abstractions and design patterns, and my system can’t handle those yet.

sgrey · January 4, 2024, 10:20pm

I am quite limited in the survey and can only ask a few questions; if it were too long, no one would take it. There is also the question of comparison points, and there are legal limits to what I can ask and do. Currently I am asking about function summaries because they are the easiest point of comparison and also quite an important step. In fact, my system can do more than generate function summaries, but I can’t include that in the survey.

sgrey · January 4, 2024, 10:24pm

Some of the comments were handwritten by the developers of the code, and the code itself came from open source projects on GitHub. There are some basic algorithms and some library implementation code. It came from various projects, but all of them are real projects.

I really appreciate your feedback on this; it’s very important and will shape my future development goals. As this is a totally new approach to comment generation, I have to start small and publish along the way. Otherwise I will never be done. This is just an early stage of development, and the system will improve in the future.

Rosuav · January 5, 2024, 12:59am

Ah, but that’s the exact problem. There’s a set of four of them, and they have these comments:

@synthesizer
def money(savefile): savefile.money[0] += 5000000 # Add more dollars
@synthesizer
def eridium(savefile): savefile.money[1] += 500 # Add more eridium/moonstones
@synthesizer
def seraph(savefile): savefile.money[2] += 500 # Not sure what, if anything, these two would do in TPS
@synthesizer
def torgue(savefile): savefile.money[4] += 500

Now, how do you know exactly what each one does? The names of the functions are partial clues, but what if they’re wrong? How do you know that money[4] is Torgue tokens?

This information does not exist outside of those comments. You cannot generate it from the code. That is WHY the comments exist.

So, question. Would your program have said “add more dollars” on all four of them? If so, it is worse than useless, because it would be flat-out wrong for three of them.

Perhaps, but if it’s a bug, it’s not MY bug, it’s someone else’s. My code is acknowledging this and coping with it.

Rosuav · January 5, 2024, 1:02am

Hans Geuns-Meyer:

Rosuav:
Could this be generated from the code?
def money(savefile): savefile.money[0] += 5000000 # Add more dollars
If you ask ChatGPT (3.5) to add an explanatory comment to the code (without comment) it gives:
# Increase the money attribute in the savefile by 5,000,000.

This is technically wrong - it adds to the first element of the money attribute - and functionally useless, since the point of the comment is to annotate what the different elements of money[] represent. See my other post, and try to ask ChatGPT to add explanatory comments to all four of them as a set. Maybe it can make a guess based on the names, but again, what if the names were wrong? It’s not much use just echoing those back.

MegaIng · January 5, 2024, 1:33am

Really? You found open source developers re-implementing Python’s max function? And considered whatever comment they might be writing potentially useful?

This is exactly my problem: in its current state, the tool seems useless, since none of the examples add any value.

This is exactly missing the point: The comment adds information not present in the code, in this case that the currency in question is dollars. This makes it a useful comment which can never be generated by an AI that only looks at the local code.

sgrey · January 5, 2024, 1:35am

Chris Angelico:

Ah, but that’s the exact problem. There’s a set of four of them, and they have these comments:

@synthesizer
def money(savefile): savefile.money[0] += 5000000 # Add more dollars
@synthesizer
def eridium(savefile): savefile.money[1] += 500 # Add more eridium/moonstones
@synthesizer
def seraph(savefile): savefile.money[2] += 500 # Not sure what, if anything, these two would do in TPS
@synthesizer
def torgue(savefile): savefile.money[4] += 500

Sure, yes, if you write out a series of math operations with random numbers and ask any system to figure out what’s going on, that would be impossible. I am not claiming that it’s a psychic system that can just guess something. Well, ChatGPT guesses things and writes them out, but that is not what I do.
However, most modern systems that are specifically designed to generate comments and nothing else, unlike ChatGPT, will use the names of your variables, objects and functions in the generation process. So for the eridium example, one would potentially produce something like “increase the amount of eridium in savefile”. It is a very volatile process, though, and it depends on your code style.
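To make the name-based approach concrete, here is a deliberately naive, template-based sketch. It is not sgrey’s system or any published tool; `naive_summary` and its single template are hypothetical. It uses only the standard `ast` module (`ast.unparse` needs Python 3.9+) and reads nothing but the function name and an augmented assignment, which also illustrates the concern raised in the thread: the wording comes entirely from the identifier, so a misleading name yields a misleading summary.

```python
import ast


def naive_summary(source: str) -> str:
    """Hypothetical template summarizer: uses identifiers only, no project context."""
    func = ast.parse(source).body[0]
    if not isinstance(func, ast.FunctionDef):
        raise ValueError("expected a single function definition")
    stmt = func.body[0]
    # Only one pattern is recognized: `<target> += <value>` as the first statement.
    if isinstance(stmt, ast.AugAssign) and isinstance(stmt.op, ast.Add):
        target = ast.unparse(stmt.target)
        amount = ast.unparse(stmt.value)
        # The noun comes straight from the function name; nothing verifies it.
        return f"Increase the amount of {func.name} ({target}) by {amount}"
    return f"{func.name}: no template matched"


for snippet in (
    "def eridium(savefile): savefile.money[1] += 500",
    "def torgue(savefile): savefile.money[4] += 500",
):
    print(naive_summary(snippet))
# Increase the amount of eridium (savefile.money[1]) by 500
# Increase the amount of torgue (savefile.money[4]) by 500
```

Renaming `torgue` to `money` would make the sketch happily claim “Increase the amount of money”, which is exactly the failure mode discussed above.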
Also, the purpose of my work is education and the analysis of complex code structures in undocumented code. This example depends on context: you are making a game or something like that and giving meaning to arbitrary numbers in your project.

I am also not sure if these comments are actually useful, and there is also a question of design. Why are you storing several different attributes in a list instead of an object or a dictionary? If your savefile encapsulated a “money” object in which each field corresponded to the appropriate item, it would be much easier to read and reason about, for humans and for algorithms alike. The only reason I can think of is optimizing memory access.
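When the list layout is fixed by someone else’s file format, one way to get named fields without changing that layout is to name the indices. A minimal sketch, assuming the index meanings given in the thread (0 = dollars, 1 = eridium/moonstones, 2 = seraph, 4 = Torgue tokens); `Currency`, `add_currency` and the stand-in savefile are hypothetical names, not part of the original code.

```python
from enum import IntEnum
from types import SimpleNamespace


class Currency(IntEnum):
    """Hypothetical names for the positions in savefile.money."""
    DOLLARS = 0
    ERIDIUM = 1  # "eridium/moonstones"
    SERAPH = 2
    TORGUE = 4   # index 3 is not mentioned in the thread, so it is left unnamed


def add_currency(savefile, currency: Currency, amount: int) -> None:
    # IntEnum members are ints, so they index the existing list directly;
    # the underlying layout stays exactly as the original format defines it.
    savefile.money[currency] += amount


# Stand-in for a parsed savefile; the real object comes from elsewhere.
savefile = SimpleNamespace(money=[0, 0, 0, 0, 0])
add_currency(savefile, Currency.TORGUE, 500)
add_currency(savefile, Currency.DOLLARS, 5_000_000)
print(savefile.money)  # [5000000, 0, 0, 0, 500]
```

The domain knowledge still has to come from a human; this only moves it from per-function comments into one enum definition.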

I mean… I am not making an AI developer that would debug your code and write out bug fixes and such. I am making a system that assists in understanding what code does, or produces comments when there aren’t any. For example, say you are assigned a 20-year-old project with a few hundred files, 5 comments across them, and strange structures inside. So yes, I wouldn’t be able to write out that you get some specific value from an API call and that the very particular structure in your code handles the weird behavior of a library function. This is not really a solvable problem at the moment.

sgrey · January 5, 2024, 1:50am

I am not sure what issue you have here. What’s wrong with the max function and the developers who implement it? Some of the code came from educational repositories with basic algorithms, and that’s also my starting point. At this stage the question is not usefulness, but “can you do it?”

I will have to disagree with you there. Yes, there are many types of comments, but documenting what the code actually does is inherently useful, especially if the code is quite complex. It is not always easy to read a piece of code and just understand what it does. There are also many adjacent usages for such comments.

Rosuav · January 5, 2024, 2:02am

And that’s my point: the comments are there because these are NOT random numbers, they have important meaning. That’s exactly the purpose of code comments. They tell you something you cannot figure out purely from the code.

Because it’s not my savefile design. It’s someone else’s. I don’t get to make those sorts of decisions, I just work with what exists.

It’s easy to produce comments where there aren’t any. It’s much MUCH harder to produce useful comments. Can you give any examples where your program actually adds useful comments?

Comments that are useful to a complete beginner who’s just learned how to reimplement the max function are quite useless to that same programmer a month later, once s/he has learned a bit more about coding. So, if your goal is to make a tool that is only useful for utter beginners, say so, and I will change my views on it. But if you expect this to be of value to production developers, it needs to be capable of doing more than just what any mid-level programmer would be capable of doing for themselves.

Is it? How? Consider this example:

//For every option, filter out every other option.
return options.filter(keep => options.includes(check => check !== keep || check === '*'));

Does that comment help? (I lifted that from a JS project of mine, but tweaked it a bit, since the original actually used CSS rather than a filter per se.) Is it useful?

This comment is false.[1] Maybe it was once true, but now it isn’t. That makes it worse than useless, since it gives a wrong impression of the code.

Can you absolutely 100% guarantee that your program will NEVER give inaccurate code comments? If not, how much value are they really giving? You have to first find a piece of code that you don’t understand and that the program is capable of parsing for you, then check to make sure that its analysis wasn’t wrong… which means you have to understand the code.

[edit: had the wrong JS method after I simplified the code - to be fair, the example was supposed to be incorrect, but not THAT incorrect]

[1] (Don’t think about it, don’t think about it – GLaDOS)

sgrey · January 5, 2024, 2:27am

For the purpose of comment writing, these are just arbitrary numbers. They can be changed to anything and the meaning of the function will not change in any way. The meaning here doesn’t come from the code, but from the overall system design and the developer. This is in contrast to, for example, a formula, where changing a number changes the formula completely, and the meaning of the code changes with it.
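A small illustration of the distinction being drawn, using two hypothetical functions: in the first, the amount could be any number without changing what kind of operation it is, while in the second the constants are the definition, and altering them produces a different formula.

```python
def add_dollars(money: list[int]) -> None:
    # 5_000_000 is an arbitrary amount: any other value is still "add to money[0]".
    # What index 0 stands for cannot be read off this code.
    money[0] += 5_000_000


def celsius_to_fahrenheit(celsius: float) -> float:
    # Here the constants carry the meaning: change 9 / 5 or 32 and this is
    # no longer the Celsius-to-Fahrenheit conversion.
    return celsius * 9 / 5 + 32
```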

Well, you seem to have a particular definition of a useful comment, which doesn’t seem to correspond to what the research community commonly considers a useful comment. I don’t know if anything my system produces at the current stage would be considered useful by you.

So… I am not making a tool. I am doing research, and the current state of the system is very rudimentary. The survey is for collecting developers’ opinions on the phrasing of the comments and their accuracy, which is what our questionnaire reflects. The value is in the general approach I am taking, not in its specific implementation, and the implementation will improve in the future. I am just publishing a paper about my approach and collecting opinions on its current state, which will also dictate where this project goes in the next stage.

OK, now you are just being facetious. I clearly don’t mean that anything you write in a comment is inherently useful. Mind you, this one was written by a human, which shows that algorithms are not the only ones that write bad comments. ChatGPT actually does a decent job here:

This code snippet filters an array of options based on a condition. It returns an array containing elements from options where each element is not equal to any other element (excluding itself) or is equal to ‘*’. The filtering is done using nested arrow functions and the filter method.

Fixing outdated and wrong comments is also one of the reasons I am building my system.

It depends on what level of granularity you are talking about, but I can guarantee it within reason, in a similar way that you can guarantee that a library function you use will never return something unexpected, or that there won’t be a buffer overrun if the input exceeds a certain number of characters.

flyinghyrax · January 5, 2024, 2:29am

I think you’re attacking the medium and missing the message, here.

The kind of text being generated by this experimental tool does not have to be surfaced as inline comments. It could just as well show up in an analysis tool, in pop-ups, or as placeholders. My impression (OP can obviously correct me) is that the crux of the research is to generate accurate and useful text, not to replace explanatory/contextual comments. There are then many practical applications available once the method works. Isn’t that how lots of research goes?

hansgeunsmeyer · January 5, 2024, 2:34am

The comment as such – even if it would be correct – is not helpful without more context or detail. Since my JS is extremely rusty, I asked an LLM to explain the code (without giving it the comment). Its conclusion was:

In simpler terms, it keeps elements in the array unless there is another element that is exactly the same as the current element (keep) or there is an element equal to ‘*’. The return statement returns the filtered array.
However, it’s worth noting that the provided code might not work as intended due to the misuse of includes. The includes method expects a value to search for in the array, but in this case, it’s given a function (check => check !== keep || check === '*'). The correct approach would involve using some or every to perform the condition check for elements in the array.

Now, even though there might be errors in that, it does help me start to understand the actual code, and it would help if, for instance, I needed to write some extra test functions.
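The LLM’s remark about `includes` can be checked with a rough Python analogue (the function names are hypothetical, and Python’s `in` stands in for JavaScript’s `includes`). Both test whether a value is an element of the list, and a freshly created function object never is, so the first version drops every option. The second uses `any`, the counterpart of the `some` the LLM suggests. Even that version does not do what the original comment claims: it drops an option only when every element equals it and that option is not `'*'`.

```python
def filter_like_includes(options: list[str]) -> list[str]:
    # Mirrors options.includes(callback): asks whether the function object
    # itself is an element of the list. It never is, so nothing is kept.
    return [
        keep
        for keep in options
        if (lambda check: check != keep or check == "*") in options
    ]


def filter_like_some(options: list[str]) -> list[str]:
    # Mirrors options.some(callback): keep an option if at least one element
    # satisfies the predicate.
    return [
        keep
        for keep in options
        if any(check != keep or check == "*" for check in options)
    ]


print(filter_like_includes(["a", "b", "*"]))  # []
print(filter_like_some(["a", "b", "*"]))      # ['a', 'b', '*']
print(filter_like_some(["a", "a"]))           # []
```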

Continuing research into these kinds of tools (LLMs or otherwise) seems pretty useful to me. Current tools are already changing the way people learn programming and the way software is developed, despite all their limitations.

sgrey · January 5, 2024, 2:34am

Yes, you are correct. My goal is to produce comments that are always correct and properly correspond to what the code does.

flyinghyrax · January 5, 2024, 2:35am

This sounds very interesting, I’m quite ignorant of modern formal methods in practice and look forward to seeing your publication.

I haven’t taken the survey yet, but I’ll have a go tomorrow. I imagine every little bit of statistical relevance helps; it can be hard to get enough quantity from surveys.

sgrey · January 5, 2024, 2:38am

Thank you in advance, then! It is indeed challenging to collect statistically significant data.