Claude Code – how much hype, how much true wizardry?

I just meant something like mypy or pyright. Forgot LSP is just the acronym for the spec and not the language implementation.

Better wording would have just been “your language server” which is implementation specific and there are several available for Python.

Not fully, alas, see The Register last summer

1 Like

Absolutely so. They’re not omniscient, and make mistakes. But the same is true of human collaborators: you have to make up your own mind. Human or bot, they’re at best collaborators, not Sources of Truth™.

But also like humans, they can surprise to the upside too! I was recently working on a hairy problem that required a “sliding window” approach over a suffix array. Now I’ve already lost most humans :wink: But not the bot. I asked it to make one of its ideas concrete by writing code. It proved to be a downright elegant approach, cleaner and easier to reason about than any code I’ve written for that kind of thing. It was also correct, but it took some time for the bot to convince me that its seeming simplicity wasn’t overlooking worst-case performance degradations. There weren’t any. It did some operations in a different order, but worst-case linear time still applied.

Unfortunately, bots are at least as good as humans at rationalizing: making up “reasons” for why their ideas must be right - indeed, inevitable. Which is why my first advice to @smontanaro was “push back!”. Verify everything they tell you as best you can. Unlike humans, they never take offense at being challenged, or “clam up” in a huff. They always engage as best they can.

3 Likes

So, here’s my first interaction with Claude. I have a small project hosted on GitHub which I use to convert a “ride” done on rollers into something which has distance.

I don’t have a smart trainer, nor do I have a gizmo I can attach to a hub which will compute distance and send it to my Wahoo bike computer, but my heart rate monitor can measure cadence. Given the details of the drivetrain and my cadence, I can pop out distance details and match it up with a course I’ve defined, in this case the Ed Rudolph Velodrome in Northbrook, IL. I then upload the result to Strava.
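The cadence-to-distance arithmetic is simple. A minimal sketch of the idea (the gear and wheel figures here are purely illustrative, not taken from the actual project):

```python
# Hypothetical sketch: distance from cadence on a fixed gear.
# Chainring/cog sizes and wheel circumference are made-up defaults,
# not values from the real repo.
def distance_m(cadence_rpm, minutes, chainring=50, cog=14, wheel_circ_m=2.105):
    """Distance covered at a steady cadence, in meters."""
    # crank revolutions times the gear ratio gives wheel revolutions
    wheel_revs = cadence_rpm * minutes * (chainring / cog)
    return wheel_revs * wheel_circ_m

# e.g. 90 rpm held for 30 minutes on a 50x14
print(f"{distance_m(90, 30) / 1000:.1f} km")
```

With real ride data you'd integrate over per-sample cadence readings rather than assume a steady rate, but the core multiplication is the same.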

So… I asked Claude to suggest changes to help me test the code. (Notice the complete lack of test cases in the repo.)

It made several suggestions which I’m working to implement, for example (note that it even properly Markdown-highlighted two method names – I didn’t do that):

  1. Extract course loading from __init__ — inject data, don’t read files
  2. Separate GPX parsing from distance computation in update_lat_long
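The first suggestion is a standard testability refactor. A hypothetical sketch of the “inject data, don’t read files” idea (the class and attribute names here are mine, not from the actual repo):

```python
class Course:
    """Illustrative sketch of "inject data, don't read files";
    names are hypothetical, not from the real project."""

    def __init__(self, points):
        # __init__ accepts already-parsed data, so tests can pass in
        # literal lists without touching the filesystem.
        self.points = points

    @classmethod
    def from_file(cls, path):
        # File I/O is isolated in a thin alternate constructor
        # that unit tests never need to call.
        with open(path) as f:
            return cls([tuple(map(float, line.split(","))) for line in f])

# A test can now build a course directly from in-memory data:
course = Course([(42.13, -87.79), (42.14, -87.80)])
```

The payoff is that the parsing/computation logic can be exercised with tiny hand-written fixtures instead of real GPX files.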

It seems to have also found a bug in one of my calculations (not something I asked it to do). “delta_t computation looks like a bug — worth a test”. Not too snarky about it either. :wink:

In each case it gave me good reasons for making the recommended changes. At first glance, it seems pretty impressive.

1 Like

This is a basic feature for LLMs. They are language models and they generate syntactically correct (or rather plausible) language whether that be English prose, markdown, Python code or whatever. This is actually the clearest sign that someone is using an LLM to write things.

If you look here you can see how LLMs make excessive use of markdown formatting in a way that makes things just inhuman. I asked that person not to talk to me using LLMs and their reply was:

@oscarbenjamin — I didn’t intend to upset you. I used a tool to help me with wording earlier, but the reasoning here is mine, and I’ll keep future discussion fully in my own words.

Even that apology has the U+2014 unicode em-dash right after my name. No way did they type that on their actual keyboard.

Clearest sign now that someone is not using an LLM: there are typos and they were lazy about markdown formatting.

3 Likes

I’ve used ChatGPT for about half a year and recently switched to Claude.

With time it becomes apparent LLMs are dumb as a rock. They’re simply a far more capable Google with a much better interface.

My mental model is “buggy super compiler.”

Long ago, I wrote assembler for a Commodore 64 due to BASIC’s lack of speed. Then C → C++ → Java / Python. Each step transferred more technical detail from me to the tool. “AI” is just the next step.

I get best results with small explicit steps, code review, and testing each step.

Folks here have probably already seen this article, but just in case not…

GitHub ponders kill switch for pull requests to stop AI slop

4 Likes

A simple example to show that they’re not just aping human language. Here’s a simple regexp:

import re

haystack = re.compile(r"\d+\s+")

Matches at least one digit followed by at least one whitespace character.

Apply it to strings of the form:

needle = "123" * N

It can’t succeed because there’s no whitespace in the needle.

.match() fails “almost instantly”, in time linear in N.

But .search() takes time quadratic in N to fail. Essentially because it starts all over again at every starting index.

There’s nothing “pathological” about the regexp - it’s very simple and well-behaved. It’s the higher-level strategy search() uses that drives it. Note that N has to get well into the thousands before this is noticeable. In “catastrophic backtracking” cases - which take exponential time to fail - massive slowdown becomes apparent with N in the dozens.
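The asymmetry is easy to reproduce. A quick timing sketch (the sizes are chosen just to make the trend visible without a long wait):

```python
import re
import time

pat = re.compile(r"\d+\s+")

def time_to_fail(fn, needle):
    t0 = time.perf_counter()
    assert fn(needle) is None  # no whitespace anywhere, so it can never match
    return time.perf_counter() - t0

for n in (2_000, 4_000, 8_000):
    needle = "123" * n
    m = time_to_fail(pat.match, needle)
    s = time_to_fail(pat.search, needle)
    print(f"N={n:>5}: match {m:.5f}s  search {s:.5f}s")
```

Doubling N roughly doubles the match() time but roughly quadruples the search() time - linear versus quadratic.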

My first thought was “duh - use a possessive quantifier - \d++”. Which does help. Cuts search time-to-fail about in half, but it’s still quadratic time.

There’s very little discussion of this kind of thing I’ve seen anywhere. Is it possible to change the regexp so it’s always worst-case linear time?

ChatGPT answered at once: sure! Just start it with a caret (“^”), effectively turning search() into match(). But that’s incorrect: the regexp should match needles like "5-345 ", which requires starting the match at index 2.

It did eventually solve it, but I’m pretty sure it didn’t find any “canned solution” in its training data. It instead showed every sign of “thinking”: reasoning about how the implementation worked, trying various dead ends until it hit on one that worked (all in less than a minute).

Me? I didn’t solve it, although I didn’t try very hard. @stefan2 solved it by adding 7 characters to the start of the regexp. ChatGPT also found that one - but then cut it to adding just two new characters at the start. It gave every appearance of “knowing” what it was doing, far deeper than just constructing grammatical sentences.

Its parting comment:

Just so.

2 Likes

Aha! @Stefan2 retains the crown after all :smiley: The bot’s “improved” idea was to start the regexp with a word boundary assertion (\b). That worked for all the examples we were looking at, but fails to match

needle = "5k23 " # note the trailing blank

That should match the trailing "23 ", but there is no word boundary between “k” and “2”.

So there’s still some use for humans after all :rofl:

1 Like

You guys are too advanced for me… I learned the ++, *+ and atomic group trick reading one of your past posts, and I still have to find a project that really needs it… :stuck_out_tongue:

Anyway, I’m just curious to see whether my favorite bot has caught up with Stefan – whose posts are always really interesting to me. What about (?<!\d)\d+\s+?

1 Like

Yes, @Stefan2 and ChatGPT-5 both thought of using a negative lookbehind assertion - which does the right thing in all cases, and always in worst-case linear time. Very clever!
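For concreteness, here are the three candidates from this thread side by side, checked against the counterexamples already given above:

```python
import re

anchored   = re.compile(r"^\d+\s+")        # ChatGPT's first (wrong) idea
boundary   = re.compile(r"\b\d+\s+")       # the word-boundary idea
lookbehind = re.compile(r"(?<!\d)\d+\s+")  # the negative-lookbehind fix

# "^" misses matches that start past index 0:
assert anchored.search("5-345 ") is None
assert lookbehind.search("5-345 ").group() == "345 "

# "\b" misses a digit run preceded by a letter
# (no word boundary between "k" and "2"):
assert boundary.search("5k23 ") is None
assert lookbehind.search("5k23 ").group() == "23 "
```

The lookbehind also keeps the linear-time property: after a failure, every position inside a digit run is rejected immediately by `(?<!\d)`, so the engine never rescans the same digits.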

1 Like

And GNOME bans AI-generated extensions | The Verge

2 Likes

Is anyone considering the amount of resources that playing with such (imho fraudulent) technology is wasting, and the implied ecological devastation?

4 Likes

Did ChatGPT “think” of it before or after you had publicly posted it?

1 Like

I check in important designs to version control, including designs that I use AI to implement, so that kind of context isn’t lost. Some people call these “architecture design records” or “tech designs”.

bad AI unit tests

I write my own end-to-end test names, in the form of sentences (or given-when-then), since that’s the important part IMHO. Then I use AI to implement them, which gives me a good lift.

good control of QA in other ways

I have custom linting rules that enforce project-specific conventions and warn about unsafe API usage patterns in the context of my codebase. Then AI gets immediate feedback while it is writing code, without me having to repeatedly explain/notice issues.

I refuse to not have type hinting

Type checkers I also find useful for providing immediate feedback to AIs. However, I sometimes find the way AIs choose to fix typing errors problematic: unnecessary algorithm rewrites, inappropriate use/non-use of # type: ignore. I’m still working on improving that experience.

As for 5), well you check it. E.g. you trust your tests.

I review all AI-generated code before merging it, just like I would with my own code. I almost always revise & simplify.

these things [AI agents] only actually work when used by someone who knows what they are doing

Yes, which means you have to practice using them to get the best results. I’ve been intentionally spending a few hours per week, for multiple weeks, to build up experience over time.

1 Like

After, but I’m sure it didn’t see my post. It came up with it after exploring a number of dead ends, “reasoning” about how the engine works. More generally it’s usually unaware of anything published until some months have passed.

I believe it :wink:

1 Like

Around maybe two years ago I saw a new Stack Overflow question with broken code. I copied just the question title into ChatGPT. It responded with a fixed(?) version of the question’s code, which had unusual lengthy variable names. It had clearly seen the question’s code. That was about 30 minutes after the question had been posted.

1 Like

I believe you. I also believe my bot. But ChatGPT “around maybe two years ago” appears to have almost nothing in common with the bot of that name today. It’s approximately infinitely more capable now :wink:

1 Like

I’ve often talked about “collaboration” with AI assistants. Their ability to do this productively has amazed me repeatedly. So I asked my favorite bot about what it thought about it all. Its reply:

+1. I don’t treat it as An Authority™ or as a moron, but as an indefatigable colleague always happy to dig into any level of detail needed to reach the heart of a problem. Which isn’t always the outcome I hoped for. But when it’s not, the exploration leaves us both with strong reasons to believe “no, you can’t get there from here” - and usually uncovers a different approach that can work.

That’s the most I hope for from human collaborators too - but very few of them are available when I wake up at 3 in the morning with a “bright idea” demanding to be entertained :wink:

1 Like

I’ve been messing around with Claude Code the past few days (thank you @gpshead for the trial license). I already had working code, but no test cases, and wanted Claude’s opinion about the code structure. I’ve never used any chatbots before, so I wasn’t at all sure how to write a prompt. I just wrote plain English. It worked pretty well.

Claude offered some excellent suggestions and in the end I wound up with improved code structure and many more test cases. It was more like a conversation, not “write me a function to compute the haversine of two lat/long tuples.”
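(For the curious, since it came up: a minimal haversine sketch, using the usual mean-Earth-radius approximation.)

```python
import math

def haversine_km(p1, p2, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) tuples in degrees."""
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))
```

One degree of longitude at the equator comes out to roughly 111.2 km, which is a handy sanity check.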

So far, I’m happy with it. I’ll try to throw some more code at it in the next few days, maybe a Flask project (which is likely poorly structured from a Flask app perspective).

2 Likes