Understanding PEP discussions

Please take a look at Kirigami, an open-source PEP discussion reader designed to improve the understanding and flow of a PEP thread.

Motivation

Long PEP discussions are time-consuming and difficult to follow. The lack of threading in Discourse makes it difficult to move toward consensus.

As a former Steering Council member, PEP editor, and core team member, I have read many PEP discussions. It’s particularly difficult to understand a discussion’s flow and key points.

How it works

Instead of using an LLM to summarize a thread and risking the loss of a discussion’s context and detail, Kirigami leaves the individual messages intact and easy to access. It’s designed to reduce the toil of reading long threads and to help understand the discussion from different perspectives. See the How it works page for more details.

Thanks

Huge thanks to @jonathandekhtiar for pairing with me at PyCon sprints to bring my ideas from the past two years to life as a usable website. It’s still early days for the FastAPI and spaCy app, but we wanted you to benefit from what is already possible. The web app is in the feature/webdesign branch.

Thanks to @vyasr, Sarah Kaiser, @barry, @savannahostrowski, @emmatyping and many others for the conversations at PyCon that helped guide our approach.

Next steps and hope

Give it a try. Share your feedback. I hope it will reduce toil while reading, give us more time back in our days, and assist us in building consensus. Python and its packaging help many people get real work done. Let’s work together to take it to the next level. :sunny: :globe_showing_europe_africa: :microscope: :tada:

14 Likes

Is it just me getting PR_CONNECT_RESET_ERRORs accessing the links?

1 Like

No, I’m getting them too.

1 Like

Sorry folks. Try How it works | Kirigami https://kirigami.fastapicloud.dev/how-it-works. If it doesn’t work hang on and we’ll refine today.

FYI @jonathandekhtiar.

I updated the link in the original post.


Update: @bwoodsend @pf_moore I did get confirmation from a few folks that the updated link is working. Do let me know if it doesn’t work for you. https://www.dpodoesnt.work/how-it-works

Hmm, still dead for me.

1 Like

Interesting project! I threw a couple of threads at it, one I started and more-or-less recall, and one a PEP that I’ve tuned in and out of. At the moment, I don’t find it very helpful for reorienting myself to the threads. It foregrounds people quoting each other as agreement, and pieces quoted as evidence often use pronouns without antecedents, so it’s difficult to understand what the topic of (dis)agreement even is.

Here are the summaries:

A couple screenshots demonstrating pieces that felt distractingly unhelpful:

I read the “How it works” and that there’s no LLM involved, which makes me think resolving these issues is likely a significant challenge with traditional NLP (although I’m no expert in the state of the art, modulo LLMs). However, identifying and filtering literal quotes might be a fairly straightforward refinement.
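To make the quote-filtering idea concrete, here is a rough sketch of what such a pre-processing step might look like. It is not taken from the Kirigami repo. It assumes the raw post text marks quotes either with Discourse-style `[quote=...]...[/quote]` blocks or with Markdown `>` lines, and the function name `strip_literal_quotes` is made up for illustration. Nested quote blocks are not handled.

```python
import re

# Assumption: quotes in the raw post text look like
# [quote="user, post:3, topic:42"] ... [/quote]
QUOTE_BLOCK = re.compile(
    r"\[quote(?:=[^\]]*)?\].*?\[/quote\]",
    re.DOTALL | re.IGNORECASE,
)


def strip_literal_quotes(raw: str) -> str:
    """Return the post text with quoted material removed.

    Hypothetical helper: drops [quote] blocks and Markdown '>' lines so
    that only the poster's own words reach the later analysis.
    """
    without_blocks = QUOTE_BLOCK.sub("", raw)
    own_lines = [
        line
        for line in without_blocks.splitlines()
        if not line.lstrip().startswith(">")
    ]
    return "\n".join(own_lines).strip()
```

Running the agreement/disagreement detection on the output of a step like this, rather than on the full post, would at least keep quoted text from being counted as the poster’s own position.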

1 Like

Thanks for trying it out. :grinning_face: It’s still early days, and this is the first prototype. One place that we do want to improve is visualizing the conversation flow on long threads. Right now, it groups the conversation into 4 areas. I think adding visual maps of the conversation will assist with reorienting the reader.

In the future, we may add LLM summaries of individual messages or the entire thread. Today, LLMs lose nuance in highly technical discussions. For now, the design decision is to focus first on drawing meaning from views of the raw messages.

1 Like

@pf_moore @bwoodsend I think I tuned the DNS - it should be good now.

Curious what you think :wink:

@effigies Yes … It’s not that trivial to get NLP to be “just right”. We could use LLMs but that would come with its own issues, and the cost would be fairly significant. Right now it operates very cheaply, and we expected the community to be strongly against LLM summarization.

Feel free to play with the repo and the NLP. If you manage to tune it better, happy to merge.

2 Likes

Seems to be working now.

I can’t say I find it that helpful, but I’m unusual in that I actually don’t find linear discussions that hard to read (as long as people quote well, which they mostly do). I was hoping the highlighting of significant points would be useful, but as you say it’s hard to get that right and right now, the false positive rate is higher than what I get just by skimming the thread.

I’m really glad you chose not to use LLMs - if there was any sort of LLM analysis involved, I wouldn’t use it at all.

Please, no. Having to fact-check an LLM summary is just as much work as reading the thread itself, which destroys the whole point of the app. And taking a potentially incorrect summary at face value is really bad - we already have enough miscommunication from people misinterpreting other people’s comments, adding LLM hallucinations into the mix would be a huge problem.

10 Likes

Ahahah that’s absolutely a struggle of mine - though if you have ideas on how to refine the NLP, or the type of outcome you’d like to see, I’d be glad to hear them.

It’s a very crude working example of the high level idea @willingc brilliantly put together, I just had fun putting it together with a few people at PyCon. Sarah Kaiser had a few promising ideas.

I’ll try to fine-tune the UX & NLP over the coming months. I agree with you that right now it doesn’t give a very clear & readable picture. I think the “take-home” is really that it’s possible and works decently-ish, so let’s see if we manage to get somewhere.

2 Likes

I saw an early demo of the tool and it looks interesting, but I haven’t had time yet to review it in more depth.

That said I have long had difficulties with flat discussion threads. I think we’ve lost a lot when our forums ditched threaded conversations, because I think that’s mostly the way we normally talk to each other in person[1]. I’m reminded of the summit rules of raising your hand to comment on the current discussion, and raising a finger to start a new “sub-thread”. I’m also reminded of Usenet which, despite its many other problems, was actually nice in being able to track (and more importantly, prune!) side tangents.

I’d love a way to track side-threads and mark them as resolved or closed (for me at least) so I don’t have to think about them any more.

Thanks Carol, JD, et al, and please do keep experimenting!


  1. who hasn’t been in a great multi-hour conversation with friends that weaves and bobs all over the place? :grin: ↩︎

5 Likes

@barry I’d like to mention that (I believe) Discourse extensions/plugins make it possible to “include threads”. If the Steering Council is open to it, I’m happy to volunteer some time and explore whether we could enable a “tree structure” instead of a linear one in DPO.

Maybe it can be enabled on a per-thread basis initially (no idea if that’s possible, I’ll need to play and experiment).

6 Likes

Just as a side point, I was confused initially by the name because Kirigami is pretty well known as the name of one of the major KDE UI frameworks. If you do web searches for stuff like “Python Kirigami” you will find stuff with people asking how to write KDE apps in Python and so forth. So this name is maybe not the best choice.

1 Like

Side note for those as ignorant of ‘kirigami’ as I was: kiri-gami means cut-paper, an extension of ori-gami, fold-paper.

1 Like

Ditto. Certainly at first.

If you ever experiment with summarizing posts, perhaps only for internal use, I suggest using small, local, open-source models that can be specialized with existing PEPs and discussions. I suspect that getting a really good prompt might be as difficult as getting NLTK code right.

People differ in their ability to manage linear streams. Like Paul, I do decently well if I try, but I think the limit is somewhere less than 100 posts. At that point, discussion should be closed and summarized and the PEP either closed or revised and a new discussion started. Kirigami, however named, could help with both on-going and final summary.

Thank you Carol and Jonathan for experimenting.

2 Likes

@BrenBarn Thanks, I am aware of the KDE project.

The following are random notes from the last two years which explain the naming: kirigami idea docs

P.S. My daughter lives in Japan so I tend to name projects about stuff that is there.

Absolutely. We are using spaCy now, and there is much more we can do with it before we need a small local model :smiley:

1 Like

Hmm, got in in the end – had to turn off the corporate VPN to do it.

I fed it the PEP 802 (empty set literal) discussion. I think it assumes that any form of assent/dissent can only be for the PEP, not noticing when it’s really directed at either a counter-proposal or someone else’s assent/dissent for anything. With the categorisations misled like that, pretty much the whole analysis gets lost. It does sound like it would be valuable though if it worked.

I also think linear conversations, with decent quoting/cross-referencing capabilities, are underrated. Conversations aren’t trees. They split up, join together, criss cross, overlap, exasperate each other, render one another moot. When a tree-like structure is enforced with threaded conversations, comments get duplicated to all the leaves they apply to. It also makes tracking which comments are unread impossible. DPO is the only forum site I’ve used that seems to recognise that. The lengths that conversations get to are problematic but I don’t see how that improves with threads.

1 Like

Online discourse analysis is an entire field of research. There are many different papers written about async and sync communication and decision making as well as threaded and flattened communication. Here’s an overview: https://www.sciencedirect.com/science/article/pii/S2211695825000789?via%3Dihub

1 Like

My own personal thinking … Feel free to ignore or tell me how wrong I am :smiley:

I think there’s a few problems with DPO (not saying that I know how to fix them), here are a few that I find personally difficult.

  • The linear nature makes it very difficult to get a good sense of the evolution of one specific topic (or otherwise to ignore a specific topic to focus on something else). I do believe that threads would be of great help to that extent, though I recognize that @bwoodsend’s comment is very valid: “Conversations aren’t trees. They split up, join together, criss cross, overlap, exasperate each other, render one another moot.” I’m just not convinced a linear thread is a sweet-spot middle ground.

  • It’s very easy for anyone to steer a conversation into an abyss by just obsessing over a “tiny/not-so-tiny detail” and completely derail the conversation, blowing topic X immensely out of proportion compared to the “scope of the topic/PEP”. It’s very hard to control/limit without actively shutting down said behavior, and that can quickly be seen as the author trying to control what is being discussed.

  • I personally believe that even if it’s a good philosophy / mantra to put everybody on the same level in the conversation and effectively give everybody an equally loud voice, it doesn’t work in the real world. It’s not constructive when anyone with no specific experience on a topic and no “skin in the game” (i.e. maintaining part X of the ecosystem) can just derail the conversation or force a topic to be the highlight of the thread. I understand this is touchy and may even arguably be on the line with the CoC (I hope I won’t offend anyone though), but I really think the current mode of operation is ineffective for “ideological reasons”, not “practical ones”. I also understand that we don’t want to swing 100% the other way; I think there’s a balance to find.

If I can give an example of how this impacts me: I usually participate quite a bit in packaging discussions (we all have our passions, don’t judge :smiley:), and I have immense difficulty tracking the opinion of people like @pf_moore (I hope you won’t mind me name-dropping you) or any packaging ecosystem maintainers on virtually any topic. It’s buried in the noise. And for me it’s a real problem when the opinions of the actually knowledgeable people that the community depends on are made invisible. I named Paul because it’s easy and I find it very symptomatic (and I say that in total friendliness - I want to see Paul’s messages and opinions among the first messages if anything - and not just Paul).

If it was up to me … I would try the following …

That being said … If it was only up to me and I was able to experiment a bit … I would do the following experiment:

Thread Mode

Find a way to use a threading/tree structure and evaluate for a few months whether it’s helpful or whether what @bwoodsend mentions is actually more on point. Or even potentially provide a way to “flatten the tree” and view it “as you see fit”.
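To illustrate what I mean by a tree that can be flattened, here is a minimal sketch. It assumes each post carries a post number and an optional `reply_to` number; the `Post` class and both function names are hypothetical and not tied to any Discourse API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Post:
    number: int                      # position in the linear thread
    author: str
    reply_to: Optional[int] = None   # number of the post being answered
    children: List["Post"] = field(default_factory=list)


def build_tree(posts: List[Post]) -> List[Post]:
    """Attach each post to its parent and return the top-level posts."""
    by_number = {post.number: post for post in posts}
    roots = []
    for post in sorted(posts, key=lambda p: p.number):
        parent = by_number.get(post.reply_to) if post.reply_to is not None else None
        if parent is None:
            roots.append(post)       # no known parent: starts a sub-thread
        else:
            parent.children.append(post)
    return roots


def flatten(roots: List[Post]) -> List[Tuple[int, int]]:
    """Depth-first walk returning (depth, post number) pairs."""
    ordered = []

    def walk(post: Post, depth: int) -> None:
        ordered.append((depth, post.number))
        for child in post.children:
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return ordered
```

The same data gives both views: sorting by `number` is today’s linear thread, while `flatten(build_tree(posts))` groups each side discussion together. It does not address the point that one reply can belong to several branches at once.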

Rate Limiting

Every new thread (especially PEP threads) would now include 3 new fields:
- PEP Author(s)
- PEP Delegate(s)
- PEP Sponsor(s)

If provided, those people (in addition to moderators, SC, PSF Board, etc.) would get unlimited messages (mostly to reply to other people); every other member gets a rate limit of 5 posts per day. I firmly believe in the power of “slowing conversation down”, and I’m convinced this would bring a strong breath of fresh air. We could even add a “small tag” next to the name of the person (author/delegate/sponsor/sc/psf/moderator/etc.) to help identify who is who (not convinced it’s so useful, but why not).
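As a sketch of how simple that rule could be, here is a toy per-thread limiter. It assumes a mapping from username to role is available for the thread; the class name, the role labels and the in-memory counter are all made up for illustration and do not correspond to any existing Discourse feature.

```python
from collections import defaultdict
from datetime import date
from typing import Dict

DAILY_LIMIT = 5  # posts per member per day, as proposed above
# Hypothetical role labels for people exempt from the limit.
EXEMPT_ROLES = {"author", "delegate", "sponsor", "moderator", "sc", "psf"}


class ThreadRateLimiter:
    """Toy in-memory limiter for a single thread."""

    def __init__(self, roles: Dict[str, str], daily_limit: int = DAILY_LIMIT):
        self.roles = roles                 # username -> role label
        self.daily_limit = daily_limit
        self._counts = defaultdict(int)    # (username, day) -> posts made

    def may_post(self, user: str, day: date) -> bool:
        if self.roles.get(user) in EXEMPT_ROLES:
            return True                    # exempt roles are never limited
        return self._counts[(user, day)] < self.daily_limit

    def record_post(self, user: str, day: date) -> bool:
        """Count the post if allowed; return False if the limit is reached."""
        if not self.may_post(user, day):
            return False
        self._counts[(user, day)] += 1
        return True
```

The counter is keyed by calendar day, so the limit resets at midnight rather than over a rolling 24 hours. A rolling window would be a different design choice.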

TL;DR

Again, these are really my own personal views and I’m fully aware this is probably not the sweet spot. I’d like to see the community iterating over ideas, refining and adjusting over time, instead of staying in an eternal state of fairly poor status quo.

I’m immensely grateful to @willingc for having started this work and entertained my motivation. I do think it’s possible to make DPO a little bit nicer to use and more constructive. I think a few minor tweaks can have immense effects, regardless of whether it’s this project that we started or something else. I’m just motivated to “try something”.

4 Likes