PEP 722 and PEP 723 User Study Discussion

User studies were conducted on 09/22/2023 to get feedback on PEP 722 and PEP 723. I am happy to clarify anything summarized below or answer any questions about the study results.

TL;DR

  • Participants want the ability to specify 3rd-party dependencies and required Python versions.
  • None of the participants knew or had used TOML syntax, but they did not necessarily view it as a blocker. PEP 722 syntax felt more aligned with their current experience, but PEP 723 wasn’t incomprehensible.
  • Participants felt both proposals would enable them to run and share Python scripts more easily.

Participants

The profile used to source participants was Python developers with 1-2 years of experience who regularly worked with single-file scripts and had experience using third-party dependencies. The 1-hour sessions were conducted on Microsoft Teams. Each session started with a discussion to get to know the participant’s experience, moved on to a hypothetical scenario aligned with the one outlined in the PEPs, and ended with a concept value test for both proposals.

Session/Participant 1

  • Electrical engineering major
  • Currently doing automation testing

Session/Participant 2

  • Master’s student in data analytics and visualization
  • Working as a data analyst

Session/Participant 3

  • Senior studying computer engineering

Key Insights and Findings

Each key insight below is followed by details and key participant quotes.
Participants want the ability to specify 3rd-party dependencies and required Python versions.

Participant 1 said that the scripts they work on always come with Python version requirements and that this is integral to their workflow. When evaluating the proposals, they viewed PEP 722’s lack of a format for defining Python version requirements as a “disqualifying factor”. Participant 2 currently documents which dependencies a script uses in a readme.md file, along with other descriptive context that might be useful for others reading their code. They felt PEP 723 provided “more control and options over the installation process”, although they don’t currently have a use case for specifying Python version requirements. Participant 3 mentioned they currently document the dependencies a script needs in a comment at the top of the code and noted that this is something “a lot of students do as well”. They said that an option to specify the Python version is very helpful because they can open a script and “have all the information you need there and ready”. They also gave examples from their own experience where not knowing the required Python version was a key problem when trying to receive and run code.
None of the participants knew or had used TOML syntax, but they did not necessarily view it as a blocker.

Participant 1 had heard of pyproject.toml files but had not had a use case for them. In their experience, their single-file scripts rarely, if ever, grow into larger projects. They said using TOML syntax “would be useful in some industry applications” and that it looks “more complex until you actually try it out”, and they didn’t see the syntax as a “disqualifying factor, just [something] to get used to”. Participant 1 felt that PEP 722 “was safe in terms of their experience”, although they mentioned that following a syntax that is already in use and is a standard could be useful. Participant 2 had never heard of TOML and had no experience growing their single-file scripts into larger projects. However, they did mention that “transferring code between things with different syntax is frustrating so I would assume this is beneficial”. The participant said, “I don’t think it would be difficult for me to understand TOML, I just don’t use it currently or have a use for it. It wasn’t incomprehensible.” Participant 2 described PEP 722 as “like a more beginner-friendly simpler solution” and PEP 723 as “more tailored to experienced developers”, a solution that “more experienced developers could get more use out of”. Participant 3 had never heard of TOML but does have some scripts that end up growing into larger projects. They didn’t see the benefit of aligning with TOML syntax because they didn’t have a use case for it themselves. They felt PEP 723 was “very informative and a little complex in code but if you’re a coder it makes sense. It’s like a block of comments and helps you at the end of the day”. All three participants felt apprehensive when they first saw PEP 723’s syntax, but as they digested it, they began to warm to it.
Participants felt both proposals would enable them to run and share Python scripts more easily.

Participant 1 felt that being able to specify script metadata would help a lot in their workflow, and particularly valued tools being able to know a script’s runtime requirements. When evaluating the two proposals, Participant 2 felt that enabling tools to know runtime requirements “would be the most beneficial thing to me”. The participant also said that if a tool could spin up an environment at runtime from the metadata they specified, “it would stick out to them” as they interacted with the tool. Participant 3 said that the proposals would enable them to spend “less time explaining the script and more time on the actual code”. They could see these proposals being helpful in both school and work environments, saying that this “would really be useful for students [because] we share code all the time” and that at work “it would be really useful because it was hard to know what was installed” when onboarding to a new script. In general, the participant said that either proposal would “help decrease issues sharing code” and gave examples of how this would also help users writing or contributing to open-source code.
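For readers who haven’t read the PEPs: PEP 722 proposed listing dependencies in a comment block that starts with a “# Script Dependencies:” line, which is close to what Participant 3 described doing informally. The sketch below reads such a block from a script’s source text. It follows my reading of the PEP 722 draft (marker matched case-insensitively, block ends at the first non-comment line, bare “#” lines skipped, inline “ # ” comments stripped) and is not its reference implementation. A real tool would also validate each line as a PEP 508 requirement, for example with packaging.requirements.Requirement; that step is left out to keep this stdlib-only. The function name read_dependency_block and the sample script are illustrative.

```python
import re

# Marker line, matched case-insensitively (per my reading of the PEP 722 draft).
MARKER = re.compile(r"(?i)^#\s+script\s+dependencies:\s*$")


def read_dependency_block(source: str) -> list[str]:
    """Return the requirement strings listed in a PEP 722-style comment block."""
    lines = iter(source.splitlines())
    # Skip ahead to the marker line; no marker means no declared dependencies.
    for line in lines:
        if MARKER.match(line):
            break
    else:
        return []
    requirements = []
    # The block continues until the first line that is not a comment.
    for line in lines:
        if not line.startswith("#"):
            break
        # Drop the leading "#" and any inline " # " comment, then surrounding spaces.
        text = line[1:].split(" # ", maxsplit=1)[0].strip()
        if text:  # a bare "#" line is allowed and simply skipped
            requirements.append(text)
    return requirements


# Illustrative script header in the PEP 722 style.
EXAMPLE = """\
#!/usr/bin/env python3
# Script Dependencies:
#     requests
#     rich  # for pretty output
#
import requests
"""

deps = read_dependency_block(EXAMPLE)  # -> ["requests", "rich"]
```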

Educator Feedback

While evaluating PEP 722 and PEP 723, we sought out educator feedback in addition to speaking directly with Python beginners.

One educator gave general feedback on their reservations about creating another format for specifying dependencies. They voiced concern that “these proposals bring yet another way to deal with dependencies in Python” and said that their students are most often “confused and overwhelmed by the plethora of formats, files, etc.”. Currently, they do not teach TOML or use pyproject.toml in their class, but said they “will need to eventually as the Python community is moving in that direction”, although they believe it causes “fragmentation” with install tools. When asked about their perspective on teaching TOML, they said “students get really overwhelmed” but that there are many ways to overcome this, and “one thing can be taught at a time”. Their perspective was to “define a way to do this and provide a clear way to teach this”. Regarding the syntax used in both proposals, they said it has “the potential of becoming unbearable without syntax highlighting” or “another way to make it distinct to users and easier to read”.


It’s interesting to see user studies, but do you plan to run some with more participants than just 3?


I think 3-7ish is pretty industry standard for user studies. In fact, I think there’s actual research showing that diminishing returns kick in really hard at ~5 users.


I don’t have a real dog in the PEP 722 / 723 fight, but thanks for putting this together! It was an interesting read at the very least for someone who is happy either way :smiley:


This was extremely interesting to read, thank you very much for facilitating the study! I wish I had the resources as an individual to conduct these for other features as well.


Very cool! I guess now we have to think about what it means. . . :slight_smile: Is this a thread where we should discuss the implications of this for these two PEPs as well? It seems maybe better to do it in one thread instead of two separate threads for the two PEPs. I will have to ponder the results a bit more but I do have some thoughts.

@courtneywebster thanks for doing this work and presenting it in a way that’s digestible but also gives us plenty of detail to chew on. I’m curious about the process and do have a few questions, maybe you could clarify:

  1. It looks like participants 2 and 3 are students. Is Participant 1 also a student, or does “electrical engineering major” just mean that was what they studied in the past? Was it a specific goal to recruit students for the study?
  2. How/why was it decided to focus on participants with 1-2 years of experience? Was the goal of this specifically to assess the accessibility of these PEPs to beginners?
  3. How/whence were the participants recruited?

I can imagine that, but it kind of depends on what the goals are and other aspects of the study. This seems to be an exploratory, feedback-gathering study more akin to a focus group, and I agree that in that case it’s fine to have just a few users.

What concerns me more, though, is the coverage of different user types. In this case it looks like at least two of the three participants were students. It’s great to get feedback from students, and from beginners in general, as I think that’s an audience that’s very important in considering proposals like this and perhaps somewhat underrepresented in discussions here. At the same time, the discussion of these PEPs and related packaging issues mentioned a range of user types (such as academic researchers), so I’d still say there are some significant unknowns.[1]

I also think that there is a place for larger-scale studies akin to what was done with the packaging survey, where we really would like to see bigger numbers. I’m not sure we need to do that with every PEP[2], but on some of the bigger-picture issues it might be informative. Which is just to say, I don’t think this study needed to be bigger, but I do think @jeanas’s question is relevant.


  1. Not sure if this is the same research you’re talking about, but this page, which mentions 5 users as optimal, notes that one reason to test more users is when there are distinct groups of users. ↩︎

  2. It could even be counterproductive in some situations, as asking people “do you prefer X or Y” in isolation doesn’t always give the most representative results. ↩︎


That’s really interesting. Thanks for putting it together. For what it’s worth, I would be fine with adding a Python version requirement to PEP 722 based on these findings. I won’t do so unless @brettcannon asks for it, though, as otherwise I don’t think it’s fair to start changing things at this point.


This is super-interesting, and didn’t match my expectations at all. Awesome!

One thing I wonder about is that students may have a different perspective from other python learners (e.g. research scientists). I’m ignorant of the norms for such studies. I don’t suppose adding 1-2 people as a second cohort is an option of interest?

do you plan to make some with more participants than just 3?

We planned to source 5 participants for the study, but 2 fell through due to scheduling conflicts and experience alignment. In general, our research team aims for 4-5 participants for studies like this.

To answer @BrenBarn questions:

  1. It looks like participants 2 and 3 are students. Is Participant 1 also a student, or does “electrical engineering major” just mean that was what they studied in the past? Was it a specific goal to recruit students for the study?

Participant 1 was a recent graduate with an electrical engineering background. The screener for the study filtered for those newer to Python (1-2 years of experience), which oftentimes leads us to talk with students. We didn’t filter for students only for this particular study.

  2. How/why was it decided to focus on participants with 1-2 years of experience? Was the goal of this specifically to assess the accessibility of these PEPs to beginners?

Correct, the goal was to assess the accessibility of these PEPs to beginners. We felt this level of experience was pertinent to this study and its scope, and is a group of voices we don’t often hear from. However, it is important to note that the results of this study only represent users of this experience level.

  3. How/whence were the participants recruited?

Our research team recruits participants through userinterviews.com. The team defines the study and screener requirements there, then picks the applicants who appear to be the best fit.


No because this is rather time consuming and expensive to do (this involved three full-time MS employees for multiple days, so it adds up just in salary costs).

I’m totally fine with that, and it’s why I asked Courtney to make this a separate topic from the other PEP topics.

If we choose PEP 722, then I will ask for such an addition. If we choose PEP 723 then we will discuss whether the [run] table and its keys make sense as I expect them to also end up being valid in pyproject.toml as well.


Students generally ignore what they will need in the future.

Their projects go from start to abandonware in 3 weeks.


I feel this is an unfair overgeneralization that implies bad faith on the part of students and is not true of all programs.

Students don’t necessarily ignore what they will need in the future - they don’t know what they will need in the future. Their education is our opportunity to introduce them to tools that will grow with them, or that they can recall later when they finally do encounter the problem a tool solves.

——————————

Given the context and resources available I appreciate that the participant pool is cross disciplinary (EE, data science, and computer engineering) and multi-level [1] (senior undergraduate, recent graduate, masters student).

In general, as this is a qualitative study and not quantitative observation, I think it’s unnecessary to critique the “sample size” and more interesting to focus on the information that was received…


  1. there’s probably a better phrasing for this… ↩︎


Well, that doesn’t apply at all to my personal experience. Coming from a (simple) R background, when I was writing my first complex Python scripts, the first “advanced” thing I tried to do was make my script automatically install missing Python dependencies with the subprocess module. I didn’t even know that you’re supposed to split your scripts into multiple modules and handle dependencies separately through requirements.txt, or that R and Python conventions differ substantially. Either I was some sort of lone genius, or the idea of dependency management inside scripts does actually come naturally to at least a somewhat broad range of beginners.
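For illustration, a minimal sketch of the self-installing pattern described above, assuming pip is available to the running interpreter. It finds modules that cannot be located with importlib.util.find_spec and installs them through “sys.executable -m pip”. The REQUIRED mapping is hypothetical; it is needed because import names and PyPI distribution names often differ (bs4 is installed as beautifulsoup4), which is something an explicit dependency list in the script avoids having to guess.

```python
import importlib.util
import subprocess
import sys

# Hypothetical mapping from import name to the distribution name pip needs.
REQUIRED = {"requests": "requests", "bs4": "beautifulsoup4"}


def missing_modules(import_names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]


def install_missing(required):
    """Install any missing distributions into the interpreter running this script."""
    missing = missing_modules(required)
    if missing:
        # "python -m pip" targets the same interpreter (and environment) as the script.
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", *(required[name] for name in missing)]
        )
```

A script would call install_missing(REQUIRED) before its other imports. Note that this installs into whichever environment happens to run the script, whereas a tool reading declared metadata could create a separate environment instead.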


When I went to university I was already using Linux, so for me it was easier to just stick to the packaged libraries and hopefully never need the others.

A couple of years later I became a Debian contributor and started packaging the Python modules I needed.

Before graduating and starting to work I had never once used venv and pip.

Now I still prefer using packages that are available in my distribution :smiley:

Just to add my two-cent interpretation of these results, I’d say:

  1. It seems there is a user desire to be able to specify some kind of metadata in single-file scripts, so in that sense these PEPs are on to something.
  2. The fact that users were not super turned off by TOML is enough to make me think it’s worth trying to find a way to make that work somehow. There is no question that a full-fledged metadata format is more flexible and extensible than a plain dependencies-only format, and if users seem okay with handling such a format, why not take the opportunity to use it?[1]
  3. Unsurprisingly, I read users’ desire to specify Python version requirements along with package version requirements as an indication that we should try to treat Python as “just another dependency”. I’d be interested to see more canvassing for user thoughts on this matter (independent of these PEPs).

  1. There are still some lingering questions about stuff like escaping and quoting, but I’m optimistic a solution can be found there. ↩︎
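One way to make point 3 concrete: a tool that reads a requires-python value has to check the running interpreter much as it checks a package version. The sketch below does that with the standard library only and, as a deliberate simplification, supports just >=, >, <= and < on plain release numbers. Real tools typically use packaging.specifiers.SpecifierSet, which implements the full PEP 440 rules (==, !=, ~=, wildcards, pre-releases). The name python_satisfies is hypothetical.

```python
import operator
import sys

# This sketch supports only ordered comparisons on release numbers; PEP 440
# also defines ==, !=, ~=, wildcards and pre-release handling.
_OPERATORS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}


def _release(text: str) -> tuple:
    """Turn '3.8' into (3, 8, 0) so it compares cleanly with a 3-part version."""
    parts = [int(part) for part in text.split(".")]
    return tuple((parts + [0, 0, 0])[:3])


def python_satisfies(requires_python: str, version=None) -> bool:
    """Check a (major, minor, micro) version against e.g. '>=3.8,<3.13'."""
    current = tuple(version) if version is not None else tuple(sys.version_info[:3])
    for clause in requires_python.split(","):
        clause = clause.strip()
        # Try two-character operators first so ">=" is not mistaken for ">".
        for symbol in (">=", "<=", ">", "<"):
            if clause.startswith(symbol):
                target = _release(clause[len(symbol):].strip())
                if not _OPERATORS[symbol](current, target):
                    return False
                break
        else:
            raise ValueError(f"clause not supported by this sketch: {clause!r}")
    return True
```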


Clearly the concept (simply specify dependencies inline) is considered very useful.

TOML seems little known, and apparently even triggered apprehension in all participants. Also the implied generality caused misconceptions about imaginary capabilities:

In the last few months no substantial use cases have been found for TOML, beyond specifying the Python version. If that can be done with

it seems that would avoid unneeded complexity, lower the learning curve, be easier to remember, and cause fewer misconceptions.


I think the times have changed. From my extremely limited experience with helping out, beginners these days rarely stick just to the stdlib. They usually use at least pandas (though their pandas code is usually terrible), BeautifulSoup4 or requests, and sometimes a GUI library like PyQt or PySimpleGUI, if only because that’s what all the educational material on the Internet uses. Even the book I learned Python with explicitly encouraged me to download libraries like pyperclip, PyPDF2, openpyxl, Pillow, etc. for its assignments.


Python version matters less to me than it used to: 3.6, f-strings; 3.7, dataclasses; 3.8, “=” in f-strings; 3.9, “|” to merge dictionaries… So typically I can (without even thinking about it) write a script that is compatible with the last 3 releases, 4 if I avoid “|”.
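As a small illustration of the comment above, here is one line per feature mentioned, each annotated with the minimum Python version it needs. Because of the “|” merge, the snippet as a whole needs 3.9 or newer, which is exactly the kind of fact a requires-python field would record. All names are illustrative.

```python
from dataclasses import dataclass  # dataclasses: Python 3.7+


@dataclass
class Release:
    version: str
    feature: str


def describe(release: Release) -> str:
    return f"{release.version}: {release.feature}"  # f-strings: Python 3.6+


def debug_label(count: int) -> str:
    return f"{count=}"  # the "=" specifier in f-strings: Python 3.8+


defaults = {"verbose": False, "retries": 3}
overrides = {"retries": 5}
settings = defaults | overrides  # dict merge with "|": Python 3.9+
```

Writing the merge as {**defaults, **overrides} instead gives the same result and drops the requirement to 3.8, which is the “avoid |” trade-off described above.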

On another note, without rereading the PEPs I can still remember PEP 722’s “# Script Dependencies:”, but I would need to copy-paste PEP 723’s TOML. I think that adds to the “apprehension”, although with time I suppose it would become as natural as other syntax.


That’s a good point! Maybe I don’t write enough pyproject.toml files but every time I do, I have to look up the keys again just to make sure. # Script Dependencies: is much easier to remember IMO.


With the exception of PySimpleGUI, all the ones you mentioned are available from distribution packages.