GitHub Issues Migration is coming soon

Currently, the CLA checker bot, The Knights Who Say “Ni”, checks on BPO whether the PR contributor has a BPO account with a linked GitHub username and has signed the CLA.

Will the bot need an update to check the CLA status for contributors from another source?

Yes, I already set up a replacement based on EdgeDB and hosted on Heroku. Now testing on some test repos, and moving data from BPO.

Sorry if I’ve missed this somewhere: all :issue: links in the documentation need to be updated, right? Has this been tested?

If we do this, there are documentation translation issues to think about. A batch replacement of issue numbers would also need to be made in batch in the .po files in the python-docs-* repositories. (CC @julien)

Existing links will still work, so there’s no rush to update the result of :issue: links. Also, excluding the whatsnew directory (I’m not sure it’s worthwhile to update the links in the historical whatsnew documents), there are only a handful of :issue: links in the rest of the docs.

What we could do ahead of time is add a new :github: (or :gh-issue: and :gh-pr:? I’m not too particular on the color :slight_smile: ) role for use instead of :issue:, and when the time comes the links outside of whatsnew can be updated to the appropriate :github: link, either en masse or as they come into question.

Since :issue: is a role, and (AFAIK) a custom one, once the old BPO issues are frozen, you could just modify the role code to point to the GH issue instead, using a static lookup table mapping BPO → GH numbers, while using a new :github: role to explicitly point to the new issues. This would avoid mass-changing every reference in every source file, just the snippet of code that handles their display. You could even have it render something like BPO-12345 (GH-23456), with both links (or vice versa). That’s the beauty and power of Sphinx roles and directives over hardcoded markup in the source.
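
A minimal sketch of the lookup-table idea, written as a plain function rather than real Sphinx role code. The names `BPO_TO_GH` and `render_issue` are hypothetical, and the single table entry reuses the made-up numbers from the post (BPO-12345, GH-23456). The URL patterns are the ones that appear elsewhere in this thread.

```python
# Hypothetical static mapping from BPO issue numbers to GitHub issue numbers.
# The only entry reuses the illustrative numbers from the post, not real data.
BPO_TO_GH = {12345: 23456}

BPO_URL = "https://bugs.python.org/issue{}"
GH_URL = "https://github.com/python/cpython/issues/{}"


def render_issue(bpo_number):
    """Return the (text, url) links an :issue: reference could render as."""
    # The BPO link is always emitted, so old references keep working.
    links = [(f"BPO-{bpo_number}", BPO_URL.format(bpo_number))]
    gh_number = BPO_TO_GH.get(bpo_number)
    if gh_number is not None:
        # Add the GitHub link only when the table knows the new number.
        links.append((f"GH-{gh_number}", GH_URL.format(gh_number)))
    return links
```

A real role would wrap each pair in a docutils reference node; that part is left out here so the sketch has no dependencies.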

It is custom, defined at

Reasonably easy to extend, although a hash table of every old to new issue would be rather large!

A

It wouldn’t be that small, but if the source and target were stored as a 5-character (i.e. byte, for ASCII numbers) string, it would be under 500 KB (not counting object overhead) for all current BPO issues. Storing them as standard 64-bit ints would come out to 800 KB (they would also fit in 16-bit unsigned ints, which would only be 200 KB + overhead).
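
The arithmetic behind those figures can be checked with a few lines. This assumes a round 50,000 issues, which the post does not state but which reproduces its numbers, and it counts raw payload only (two numbers per entry, no object overhead). Note that the 16-bit variant only works while every number involved stays below 65,536.

```python
ISSUE_COUNT = 50_000   # assumed round figure, not an exact count of BPO issues
NUMBERS_PER_ENTRY = 2  # one BPO number plus one GitHub number


def table_size_kb(bytes_per_number):
    """Raw payload size in KB (1 KB = 1000 bytes), ignoring object overhead."""
    return ISSUE_COUNT * NUMBERS_PER_ENTRY * bytes_per_number / 1000


sizes = {
    "5-char ASCII strings": table_size_kb(5),
    "64-bit ints": table_size_kb(8),
    "16-bit unsigned ints": table_size_kb(2),
}
for label, kb in sizes.items():
    print(f"{label}: {kb:.0f} KB")
```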

Right, there’s no rush to update existing issue links, although there is a point in having a :gh: (or whatever bikeshed color) role ready by the time of the migration, for use in newly written News entries. A static mapping sounds like a good idea. Regarding News entries, it would be possible to check that any entry using :issue: has a date in its file name earlier than the migration date, as a safeguard against muscle memory causing people to write :issue: instead of :gh: inadvertently.
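
A rough sketch of that safeguard, assuming News entry file names begin with a `YYYY-MM-DD` date. The function name `misuses_issue_role`, the `MIGRATION_DATE` value, and the file names used to try it out are all placeholders, not the real flag day or real entries.

```python
import re
from datetime import date

MIGRATION_DATE = date(2022, 4, 1)  # placeholder, not the actual migration date
FILENAME_DATE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})")


def misuses_issue_role(filename, text, migration_date=MIGRATION_DATE):
    """Return True if an entry dated on/after the migration uses :issue:."""
    match = FILENAME_DATE.match(filename)
    if match is None:
        return False  # no leading date to compare; leave it to human review
    try:
        entry_date = date(*map(int, match.groups()))
    except ValueError:
        return False  # digits that do not form a valid calendar date
    return entry_date >= migration_date and ":issue:" in text
```

Entries written before the migration date pass untouched, so historical :issue: references would not be flagged.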

Another option (although more disruptive) would be to mass-rename all :issue: roles currently in use to e.g. :bpo-issue:, freeing up the :issue: role for GitHub issues.

A

Redirects from BPO issue numbers to their GH equivalents are planned: Add a page that redirects from bpo to GitHub · Issue #17 · psf/gh-migration

As a follow-up of #15, once we have the GitHub id as an attribute in the issue items, we need to create a new script accessible through a URL like bugs.python.org/redirect/BPO-ID that redirects to the corresponding GitHub issue. This could be deployed and tested with fake IDs even before the migration starts.

The plan is to replace #XXXXX issue references with BPO-XXXXX in messages and set GitHub autolinking to point to bugs.python.org/redirect/XXXXX, which in turn redirects back to the corresponding GitHub issue.
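
The message rewrite described in that plan could look roughly like the following. This is an illustrative sketch, not the migration script itself: the pattern and the name `rewrite_issue_refs` are assumptions, and the lookbehind is a guess at how to avoid touching cross-repository references such as `python/cpython#123`.

```python
import re

# "#" followed by digits, but not when glued to a preceding word or path,
# so "owner/repo#123" style references are left alone.
ISSUE_REF = re.compile(r"(?<![\w/])#(\d+)\b")


def rewrite_issue_refs(message):
    """Replace bare #XXXXX references with BPO-XXXXX."""
    return ISSUE_REF.sub(r"BPO-\1", message)
```

GitHub autolinking would then be what turns each BPO-XXXXX into a link to the redirect page.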

So could the :issue: role be adjusted to bugs.python.org/redirect/XXXXX, which will then redirect to github.com/python/cpython/issues/YYYYY?

That would definitely seem to be the ideal solution, since it should only require a fairly trivial change in the :issue: role code, rather than a large, complex and potentially expensive lookup table operation.

BPO issue numbers and GH issue/PR numbers are not unique across the two trackers: the ranges overlap. I’d leave :issue: for BPO only, as it is, and add a :gh: role as Jean suggested. Given that GH has a single number space per project, not bothering to distinguish between issues and PRs makes sense.

What I want is to prevent natural human mistakes: If :issue: is made to support both BPO and GH namespaces at once, it is easy to forget to add a specifier as to which one it is when they conflict and wind up linking to entirely the wrong thing. A distinct tag and only one link destination per tag rather than a magic lookup helps reduce the chance of us humans doing that.

When I unfold “Show more details” of Py_XDECREF() module on fail in Py_mod_exec · Issue #46605 · python/issues-test-demo-20220218 · GitHub I see many fields which are set to None, [] or [None]. Moreover, nosy_count is redundant with len(nosy_ids) and message_count is redundant with len(messages).

Would it be possible to omit them to make the display shorter (easier to read)?

Example with bugs.python.org fields:

activity = <Date 2022-02-01.21:49:08.439>
actor = 'ov2k'
actor_id = '41979'
assignee = 'none'
assignee_id = None
closed = False
closed_date = None
closer = None
closer_id = None
components = ['Extension Modules']
creation = <Date 2022-02-01.21:49:08.439>
creator = 'ov2k'
creator_id = '41979'
dependencies = []
files = []
hgrepos = []
issue_num = 46605
keywords = []
message_count = 1.0
messages = ['412315']
nosy_count = 1.0
nosy_ids = ['41979']
pr_ids = []
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue46605'
versions = ['Python 3.7', 'Python 3.8', 'Python 3.9', 'Python 3.10', 'Python 3.11']

Without losing any information, it can be simplified to:

activity = <Date 2022-02-01.21:49:08.439>
actor = 'ov2k'
actor_id = '41979'
closed = False
components = ['Extension Modules']
creation = <Date 2022-02-01.21:49:08.439>
creator = 'ov2k'
creator_id = '41979'
issue_num = 46605
messages = ['412315']
nosy_ids = ['41979']
priority = 'normal'
status = 'open'
type = 'behavior'
url = 'https://bugs.python.org/issue46605'
versions = ['Python 3.7', 'Python 3.8', 'Python 3.9', 'Python 3.10', 'Python 3.11']
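
The filtering shown above could be done mechanically along these lines. This is a sketch under assumptions: the function name `simplify_fields` is made up, the fields are assumed to be available as a dict, and the string 'none' is treated as empty because the simplified listing drops assignee = 'none'.

```python
# Values that carry no information in the listing above.
EMPTY_VALUES = (None, [], [None], "none")
# Counts that can be recomputed from another field with len().
REDUNDANT_COUNTS = {"message_count": "messages", "nosy_count": "nosy_ids"}


def simplify_fields(fields):
    """Drop empty fields and counts that merely repeat len() of a list."""
    simplified = {}
    for name, value in fields.items():
        if value in EMPTY_VALUES:
            continue
        source = REDUNDANT_COUNTS.get(name)
        # Only drop a count when the list it duplicates is actually present
        # and agrees with it, so nothing is lost.
        if source in fields and value == len(fields[source] or []):
            continue
        simplified[name] = value
    return simplified
```

Note that closed = False is kept: False is a real value, not an empty one.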

Issues · python/issues-test-demo-20220218 · GitHub has a short list of labels. For example, there is no label for the SSL component. “C API” and “Subinterpreters” components are also missing. Is it a deliberate choice?

By the way, I dislike the “Library” name. Maybe “Stdlib” would be a better name. I know that “Library” is also used by blurb for the Changelog.

There is no Python version label or field. I’m fine with dropping it, most people misuse it :slight_smile:

I like the “Build” component, which is also missing. Can I vote for my favorite components? :smiley: My worry is that once the 40k+ issues have been migrated, it will be too late to add labels lost in the conversion. Unless there is a way to automate modifying all issues migrated to GitHub? But it may be better to do it at the beginning?

blurb/Changelog and What’s New in Python 3.11 currently use bpo numbers. The Python 3.11 changelog and What’s New already contain many references to bpo. IMO it’s fine to keep them.

But. After the flag day, what number should I put in the “issue” field of blurb? The GitHub issue number? How will blurb distinguish bpo issues from GH issues?

Labels are discussed here: Map bpo issue metadata to GitHub fields/labels · Issue #5 · psf/gh-migration · GitHub

I tried to keep the ones that seemed most used/useful, also based on some analysis of the current issues. The more labels there are, the harder it becomes to select and apply all the right ones (as you mentioned about the version), so I’m trying to keep things simple and find a balance between simplicity and accurate classification.

Other things to consider while selecting labels:

  • GitHub doesn’t allow nested labels or label groups so all the labels are listed together, possibly creating a very long list of labels to pick from in the sidebar. Colors and prefixes help, but this also results in a more “chaotic” issue list, with several long and colorful tags.
  • Only triagers/committers can set labels – I don’t think that people reporting issues can. On one hand this is bad because triagers/committers will be responsible for all labels, on the other hand it prevents other people that are not familiar with our conventions from mistriaging issues.
  • GitHub projects or meta-issues could be used to track and organize related issues too.

I initially put those fields there mostly for debugging purposes and also to preserve fields that are not being migrated, so I didn’t spend much time trying to make this more readable, but it’s something we could do. In particular, several of the fields list internal bpo ids, and the corresponding value is also displayed (e.g. actor_id vs actor). All the fields with ids that have been translated could be removed. The first list with the GitHub fields can also be removed.

I’m currently moving more fields (nosy list members, linked PRs, attachments, dependencies, etc.) into the table at the top, so that they also work as links. Instead of an ini-like code block, I could turn the other values into a table too.

Regarding the :issue: issue, I confirm things mentioned by the previous posters:

  • We are planning to add a redirect URL on bpo that, given a bpo id, automatically redirects to the corresponding GH issue. URLs pointing to bpo can be updated to this new URL without having to update/know the new GH ID.
  • The code of the :issue: role can be updated easily, to change both the URL and the text.
  • It’s also true that GH uses a shared namespace for issues and PRs, so a new :gh: role makes sense.

The only thing I’m conflicted about is what to do with :issue:. I don’t think it should work with both sets of IDs given the overlapping, and having a :gh: and :bpo: pair makes the most sense to me since it avoids confusion and ambiguity, even though it would also require an s/:issue:/:bpo:/g.

:bpo: could be added as a new explicit role and :issue: kept around as a deprecated alias of :bpo:. However, if we are not going to need bpo links in the future, having :bpo: without the mass-replace is a bit useless, since it will never be used. Regardless, in the future we will only use :gh:, and make patchcheck (or its modern incarnation, if it’s still around) could warn users who try to use :issue: in new code.
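
To make the :gh: / :bpo: / deprecated :issue: arrangement concrete, here is a dependency-free sketch of the dispatch logic. It is not the actual role code in the docs: `resolve_role`, the lowercase `gh-`/`bpo-` link text, and the warning wording are assumptions, and the bpo URL uses the planned redirect page mentioned earlier in the thread.

```python
import warnings

# One link destination per role, so a number can never be ambiguous.
URL_TEMPLATES = {
    "gh": "https://github.com/python/cpython/issues/{}",
    "bpo": "https://bugs.python.org/redirect/{}",  # planned redirect page
}
# :issue: is kept only as a deprecated alias of :bpo:.
DEPRECATED_ALIASES = {"issue": "bpo"}


def resolve_role(role, number):
    """Return (link text, url) for a role name and an issue number."""
    if role in DEPRECATED_ALIASES:
        target = DEPRECATED_ALIASES[role]
        warnings.warn(
            f":{role}: is deprecated; use :{target}: or :gh: instead",
            DeprecationWarning,
            stacklevel=2,
        )
        role = target
    template = URL_TEMPLATES[role]  # raises KeyError for unknown roles
    return f"{role}-{number}", template.format(number)
```

A checker such as patchcheck could then turn the deprecation warning into a hard failure for new files only.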

I’m not too familiar with blurb, but ISTM that it already uses unambiguous bpo-prefixed IDs in the filenames. After the migration I would assume that we will only use GH IDs, so blurb might need to be updated to accept GH IDs and use gh-prefixed IDs instead, possibly pointing the old bpo references to the redirect link mentioned above.

Edit: I reported this here: Update blurb to use GitHub IDs · Issue #428 · python/core-workflow · GitHub

I think the versions are very important, enough that GitHub should add something for them if they haven’t yet. You want to know what users are looking at when they describe a bug.

It’s true that this field is not always populated correctly, but many times it is and it provides useful information.
