Use cases for search functionality in PyPI

Hello all,

I’ve been following and working on the search portion of PyPI for some time and noticed it’s a part of Warehouse that generates some frustration among users.

Some time ago I opened a meta issue to try to establish some use cases and narrow down what it is that users expect from searching in PyPI. I thought it would be good to open up this discussion to a wider audience.

Here are the use cases I distilled from the different issues in Warehouse:

  1. Project name searches: users who have a vague recollection of a package’s name or want to confirm the spelling before installing. I believe this is the main use case for pip search (#5506).
  2. Solution searches: users who would like to know what the best package is for a particular task. This could be covered by a “popularity” metric, but it’s hard to get right: a lot of aggregated community wisdom is not reflected in project metadata, since names, classifiers and descriptions are sometimes lacking or misleading. (#3932, #3860)
  3. Meta searches: users who would like to explore the project ecosystem based on project metadata such as interpreter versions, license, contributors, etc. (#727, #1971)
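
As a rough illustration of the first use case, here is a minimal sketch of name matching, assuming the candidate names are already available locally. The `KNOWN` list and the `suggest_names` helper are hypothetical and not part of Warehouse; the normalization follows the rule described in PEP 503, and the fuzzy step uses `difflib` from the standard library rather than whatever the real search backend does.

```python
import difflib
import re


def normalize(name):
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def suggest_names(query, known_names, limit=3):
    """Return the exact (normalized) match, or the closest names."""
    normalized = {normalize(n): n for n in known_names}
    key = normalize(query)
    if key in normalized:
        return [normalized[key]]
    # Fall back to approximate matching for misspelled queries.
    close = difflib.get_close_matches(key, list(normalized), n=limit, cutoff=0.6)
    return [normalized[c] for c in close]


# Hypothetical sample of project names, for illustration only.
KNOWN = ["scikit-learn", "requests", "bokeh", "marshmallow"]

print(suggest_names("Scikit_Learn", KNOWN))  # differs only in case and separator
print(suggest_names("reqests", KNOWN))       # misspelled query
```

The exact-match branch covers the "confirm the spelling" case, while the fuzzy branch covers the "vague recollection" case.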

The goal would be to define these use cases, or more, along with a set of requirements for each, so that we can start creating issues in Warehouse.

Thanks in advance.

Personally, I only ever really want project name searches, and typically I either know the full project name, and just want to go to the project page, or I know a partial name and want to confirm the full actual name.

However, it’s possible that I’m “too close to the problem”, and because I know the limitations of the current search feature I’ve simply never tried it for anything more complex 🙂 But I do tend to head straight to Google for broader searches - e.g., “python library for extracting images from pdf” - mainly because if there isn’t a library, it will often give you useful references anyway.

Functionality search: users that are looking for a specific, precise functionality. For example “parsing ISO datetimes” or “RTMP protocol decoder” or “SAT solver”.

I think it’s worth mentioning that a lot of users assume packages with names similar or equal to a particular technology or service are somehow reserved for the “best package” for it.

This, unfortunately, is not necessarily the case anymore. One very flagrant case is aws while others are more subtle. Recently there was a PEP 541 case for grpc versus grpcio which was resolved but unfortunately not updated.

User @MiloslavPojman replied on Twitter with:

  1. Looking for available names for a new library.
  2. Checking correct spelling (e.g. sklearn vs. scikit-learn)

Regarding use-case 2, I really like the “ecosystem” section of the wikis in Marshmallow projects. It allows users to see a list of publicly-available functionality, and developers can update the list themselves.

Unfortunately, I can’t think of a way to integrate this intuitively with PyPI’s search itself.

+1 for functionality search - this is by far the most frequent and important search I do. When I search by name, I use Google, since it is in my browser address bar, e.g. pypi bokeh.

Hi @yeraydiazdiaz - Thanks for reaching out for feedback! Chris P here, author of Plotly Dash.

New user here, so I can only insert 2 links into my post, so apologies for the code-formatted links.

Speaking on behalf of the Dash community, we’d like to use PyPI search to better understand which PyPI packages are published by Dash community members. Here is our imperfect system right now:

  • We added a Dash framework classifier https://github.com/pypa/warehouse/issues/6273 and included that framework classifier in our cookiecutter plugin https://github.com/plotly/dash-component-boilerplate/pull/92 so that any Dash component packages / plugins created after October 28, 2019 will be searchable on PyPI: https://pypi.org/search/?q=&o=&c=Framework+%3A%3A+Dash. There are currently 49 projects here. This works pretty well, but it:
    • Excludes components that were created before October 28, 2019. Component authors need to opt in by adding the classifier to their setup.py.
    • Excludes packages or libraries that weren’t started from the cookiecutter and whose authors didn’t take the time to look up framework classifiers.
    • FWIW, we face the same issue with GitHub project topics since it’s opt-in: https://github.com/topics/plotly-dash
  • In Dash-land, we had a convention early on to prefix our libraries with dash-, e.g. dash-core-components, dash-html-components, dash-renderer, dash-table. It seems this implicit naming convention has been adopted by some community members, and there are many more (over 1,000) packages on PyPI that start with "dash-" and seem related to our project: https://pypi.org/search/?q=dash-&o=. However, dash is a common name, so not all of these packages are related to Plotly Dash. It’s relatively easy for me to determine whether a package is related to Plotly Dash by reading the descriptions in the results.
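
For reference, the two search links above can be rebuilt programmatically. This is only a sketch based on the URL shapes quoted in this post (the `q`, `o` and `c` query parameters); the `search_url` helper is hypothetical, and I am assuming nothing about the search page beyond those two example URLs.

```python
from urllib.parse import urlencode

PYPI_SEARCH = "https://pypi.org/search/"


def search_url(query="", classifier=None):
    """Build a PyPI search URL shaped like the ones linked above."""
    params = [("q", query), ("o", "")]
    if classifier is not None:
        # urlencode turns "Framework :: Dash" into "Framework+%3A%3A+Dash".
        params.append(("c", classifier))
    return PYPI_SEARCH + "?" + urlencode(params)


print(search_url(classifier="Framework :: Dash"))
print(search_url("dash-"))
```

The first call reproduces the classifier search and the second the name-prefix search.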

Besides searching for actual published packages, I use the search to find the GitHub repository for a package I’m interested in. For example, today I searched for dash-extensions.

I’m pretty much always interested in checking out the source code, so I immediately click on the “Homepage” when I discover one of these packages. Historically, I’ve found it a little confusing that Homepage is synonymous with GitHub repo. I think it would be nice if there was a link that explicitly said “Source Code” or “Repository”, but that’s not a big deal.

Hope this is helpful!

Historically, I’ve found it a little confusing that Homepage is synonymous with GitHub repo. I think it would be nice if there was a link that explicitly said “Source Code” or “Repository”, but that’s not a big deal.

“Homepage” is not synonymous with “GitHub repo”; the “Homepage” link is simply whatever URL the project author chose to best represent the project, which is often — but not always — a link to the public repository. Having the link’s label change based on the structure of the URL would lead to a confusing and inconsistent UI. If you really want a link in your project to be labelled “Source Code”, use the project_urls argument to setup() in setup.py like so:

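from setuptools import setup

setup(
    # ... your other arguments (name, version, etc.) go here ...
    project_urls={
        # The key is used as the link label on the project page.
        "Source Code": "INSERT URL HERE",
    },
)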
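
On the consuming side, a tool that wants "the repository link" could look for an explicit label first and only then fall back to the homepage. The following is a sketch under stated assumptions: `pick_source_link` is a hypothetical helper, the list of preferred labels is my own guess at common spellings, and the inputs are a plain label-to-URL dict like the `project_urls` argument above.

```python
def pick_source_link(project_urls, home_page=None):
    """Prefer an explicitly labelled source link over the homepage."""
    # Assumed label spellings; projects are free to use any label they like.
    preferred = ("source code", "source", "repository")
    by_label = {label.lower(): url for label, url in (project_urls or {}).items()}
    for label in preferred:
        if label in by_label:
            return by_label[label]
    # No explicit source label: fall back to the author's homepage, if any.
    return home_page


urls = {"Source Code": "https://example.invalid/repo"}
print(pick_source_link(urls, home_page="https://example.invalid/"))
```

Because labels are free-form, a lookup like this can only ever be a heuristic.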