The discussion happened in the issue and the PR. I checked that the job was non-voting, and so assumed it was fine to add a non-voting CI job.
I agree, it’s annoying me as well. FreeBSD and Windows are both affected by test_concurrent_futures sickness: test_crash_big_data() and test_interpreter_shutdown() random failures. It would be good if someone could look into these issues. Well, many buildbots are also affected by this sickness.
I don’t know if GitHub gives fine-grained control over how it summarizes all CI jobs as “success” or “failure”.
By the way, I have the same problem with Windows x86 and Azure Pipelines, which have become more and more unstable these days, even though they are non-voting.
Is Windows 32-bit still supported nowadays? PEP 11 says that it’s a Tier-1 platform, and so: “All core developers are responsible to keep main, and thus these platforms, working.”
This is not entirely accurate. The documentation is phrased that way because the audience is highly varied in skill, confidence, and familiarity with the rolling release concept, so it is more of a “you accept the challenge of learning as you go”. In reality, -CURRENT is meant to be usable as a rolling release in the appropriate environments (this is one of them!), not dissimilar to Linux distributions that operate this way. (-STABLE is also a rolling release, but the base system API/ABI is kept as stable as possible.)
I agree that CI isn’t the best way, entirely because you need a pull request or commit on the CPython side to trigger a build. I haven’t looked into CirrusCI enough to see what they do for -CURRENT, but it’s not that relevant anyway.
As for a buildbot, in theory I could set one/some up as jail(s) on my “HP air fryer”, but it wouldn’t be public or that automated, not to mention not a good use of my personal resources. Without some kind of external support (and judging by this and related threads, the general tone reads like “go pound sand”), I could pull the plug at any time for any or no reason, just like koobs@ did, and the community would be left with nothing, not even recourse. Thus, the issue isn’t so much about who stands a new buildbot up, but how to keep them up in a sustainable manner that benefits both CPython and FreeBSD. I cannot emphasise the sustainability bit enough, not least because I am precipitously being driven towards burnout myself, and these types of discussions aren’t helping.
I don’t think I suggested youse running -CURRENT as users, but rather the build and testing infrastructure prioritise -CURRENT as a target (whilst keeping -RELEASEs at least) for easier and more immediate triage and tracing of problems as they happen. Neither does the suggestion exist that youse as CPython have to go it alone. Whatever comes of the buildbot or similar would have some alerting facility.
What is needed to provide a buildbot worker from the FreeBSD side? Install devel/py-buildbot-worker and go with New Buildbot Workers (python.org)? What are the availability requirements for this buildbot worker? I could easily provide a worker in a FreeBSD jail on a 16-CPU Xeon with 64 GB RAM. It runs FreeBSD -CURRENT, but occasionally reboots or is under full load. How often will the buildbot receive work?
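For reference, a rough sketch of what standing up such a worker in a jail might look like. The master address, worker name, and password below are placeholders; the real credentials come from the python.org buildbot admins, and the exact package name may vary with the ports flavor:

```shell
# Inside a FreeBSD jail: install the worker from ports/packages
pkg install devel/py-buildbot-worker

# Create the worker. Master host:port, worker name, and password are
# assigned by the python.org buildbot admins (placeholders below).
buildbot-worker create-worker /var/buildbot buildbot-api.python.org:9020 \
    freebsd-current-worker WORKER_PASSWORD

# Fill in contact/host info, then start it
echo "Your Name <you@example.org>" > /var/buildbot/info/admin
echo "FreeBSD -CURRENT jail, 16-CPU Xeon, 64 GB RAM" > /var/buildbot/info/host
buildbot-worker start /var/buildbot
```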
Are you (whoever is interested in setting up a FreeBSD system) aware of Oracle’s free cloud offer of one Arm VM with 2 CPUs, 24 GB RAM, and 200 GB disk? Would a FreeBSD/arm system running -CURRENT be interesting as a python-buildbot-worker?
FreeBSD is often good at uncovering concurrency issues that don’t occur elsewhere, due to different implementations and timings.
From the FreeBSD side, if we’re given the choice of excluding FreeBSD from CI or skipping this test on FreeBSD, I would much prefer the latter. Although that would just mask the problem, in my opinion identifying new regressions is much more valuable.
That said some of these issues seem to be longstanding sources of instability in test results across many platforms, and if FreeBSD is able to reproduce them more reliably that should be a good thing.
Sadly, test_concurrent_futures has a failure rate of at least 1/3, or even 50%, on Windows CI (on GitHub Actions) these days (I hope that I’m being pessimistic and it’s lower). There are multiple test_concurrent_futures issues, so it’s even more likely that a CI run fails because of one of these test_concurrent_futures issues. It’s just statistics.
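To illustrate the statistics point: with several independently flaky tests in one CI job, the overall failure probability compounds quickly. A small sketch (the per-test failure rates here are made up for illustration):

```python
# Probability that a CI job fails when it contains several independently
# flaky tests: the job passes only if *every* flaky test passes.
def job_failure_rate(per_test_failure_rates):
    passing = 1.0
    for p in per_test_failure_rates:
        passing *= (1.0 - p)
    return 1.0 - passing

# Three flaky tests, each failing "only" 10% of the time...
rate = job_failure_rate([0.10, 0.10, 0.10])
print(f"{rate:.1%}")  # -> 27.1%: more than a quarter of runs fail
```

So even modest per-test flakiness makes a red CI the norm rather than the exception.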
That’s why I’m actively tracking tests failing randomly and attempting to fix them one by one. By the way, I need your help: see the cpython issues that I reported, many of them are about unstable tests.
Nope. I think the general idea behind having required vs. non-required checks is so you know you can take the risk of merging even with a failure going, but otherwise you should avoid merging unless all of CI is passing.
Do you want to start a separate conversation about dropping Azure Pipelines from CI?
I believe the building of it is still supported for older Python versions, but I don’t think Python 3.12 is going to have 32-bit installers released. Did you want to start a conversation about dropping 32-bit Windows from being tier 1 for CPython going forward?
Yep. That’s why we have platform support tiers and FreeBSD is currently tier 3 mostly thanks to Victor’s hard work.
Sure, but we still have to understand the failures.
Noticing the failure is part of it. The other is understanding the failures (e.g., is it FreeBSD or CPython’s fault), and then how to fix it.
To be considered stable, roughly 24/7. This isn’t to say stuff going down for a day or something on occasion is the end of the world for a tier 3 platform, but in general the machine is expected to be up.
Off-and-on throughout the day. Any time a PR lands on any maintained branch, plus explicit requests on PRs to test on specific buildbots. For instance, here’s the current FreeBSD worker with the list of builds over the last couple of days: Buildbot
Not sure who “you” is in this case, but I’m personally not (but then again I don’t have the bandwidth to maintain another buildbot as I’m busy setting one up for WASI).
It sounds like Charlie is. More buildbots with different configurations also doesn’t hurt, it’s just a question of who pays attention to them to file issues and help fix bugs the buildbot finds.
That’s not the question. It’s more of a policy thing as no one has added a tier 3 platform to CI before, so as a team it has not been discussed if we want that (see PEP 11 and what a tier 3 platform means in terms of support, as it suggests the buildbot could be failing forever).
I realize that; I’m just trying to provide some feedback or insight that might help inform the policy decision to come.
I see that Tier 3 includes “* Must have a reliable buildbot.”, so I assume this means nobody has added a tier 3 platform to the tests run on pull requests (as opposed to more general testing).
I can imagine that discussion going from “FreeBSD CI results are unreliable and it causes a lot of extra load to investigate if there is an actual issue or not” to “Tier-3 systems are not permitted in [pull request tests].”
Correct. Everything in CI right now is a tier 1 platform (platforms every core dev is responsible for and is not supposed to break main for).
It’s not even that. It’s simply a question of: do we ever want lower-tier platforms to show up as failures in CI? Even if FreeBSD had no failures ever, I would still be bringing this up, as it’s a question of the intent of CI and the status checks. Are non-required/“optional” status checks okay to have in general, or only in exceptional cases? And since tier 2 and 3 don’t block merging by policy, this raises a bigger question of how we want to treat/interpret our CI results.
At one point that was probably the major “blocker”, but since then there have been bug reports (and backstories) for people hosting CPython in 32-bit processes, so that’s another group who need 32-bit builds. Even if we were to stop publishing our own binaries, I wouldn’t want to remove the CI jobs - they’re probably the only ones actually checking that we handle data types safely and don’t just assume 64-bit everywhere.
Note: Python 3.12 release candidate versions have a Windows 32-bit installer, and @thomas confirmed to me that the Python 3.12 final version will be released with a Windows 32-bit installer. If something changes, it should only occur in Python 3.13. Well, let’s continue the discussion in the topic that I created.
Sorry, I didn’t get whether anyone knows a way to exclude non-voting CIs from the overall PR status?
If there is no way to do that, I would like to suggest disabling/removing all non-voting CIs on pull requests, and only running them on the main branch (once changes are merged).
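In GitHub Actions terms, that would roughly mean changing the workflow trigger so the non-voting job runs only after merges. A minimal sketch (the workflow name is hypothetical):

```yaml
# Hypothetical non-voting workflow: run after changes land on main,
# not on pull requests.
name: Non-voting tests
on:
  push:
    branches:
      - main   # no pull_request trigger, so PR status is unaffected
```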
My concern is that these CI resources would be wasted, since basically no one is going to look into these failures anymore. I already noticed that very few people pay attention to CIs and prefer to ignore all failures as “it’s not my business, the failure is not related to my exact change”. It’s the “duty of everyone” to fix the CI, which means that… nobody fixes it.
I don’t know how to do this, unless there’s some way to catch a failure and report a pass or a skip. But that would hide the failure.
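For what it’s worth, GitHub Actions does have a per-job (and per-step) `continue-on-error` setting, which is effectively “catch a failure and don’t fail the run”. A sketch (the job name and runner are placeholders):

```yaml
jobs:
  freebsd-tests:
    runs-on: ubuntu-latest    # placeholder runner
    continue-on-error: true   # a failure here won't fail the workflow run
    steps:
      - uses: actions/checkout@v4
      - run: make test
```

Exactly how the job’s own failure then surfaces in the PR checks UI is subtle, and as noted, this risks hiding the failure rather than presenting it as non-blocking.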
We can mark some checks as “required”, which means they must pass for the PR to be merged.
There’s a lot of overlap between Azure Pipelines and GitHub Actions. GitHub Actions is easier to use and maintain, so let’s move things over. python/cpython#105823 already removed some duplicate docs testing from AP. There are still some unique patchcheck things in AP that need moving over.
Yes, they’re run under the “Tests” workflow:
It has logic to only run docs jobs if there are docs changes (for example), and only run Ubuntu, Windows, macOS CI etc if there are normal code changes (for example).
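That kind of gating can be done with a “compute changes” job whose output other jobs depend on. A simplified, hypothetical sketch (CPython’s actual workflow is more elaborate; the job names, filter logic, and build command below are illustrative only):

```yaml
jobs:
  check-changes:
    runs-on: ubuntu-latest
    outputs:
      run-docs: ${{ steps.filter.outputs.docs }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 2
      - id: filter
        run: |
          # Hypothetical filter: run docs jobs only if Doc/ changed
          if git diff --name-only HEAD~1 | grep -q '^Doc/'; then
            echo "docs=true" >> "$GITHUB_OUTPUT"
          else
            echo "docs=false" >> "$GITHUB_OUTPUT"
          fi

  docs:
    needs: check-changes
    if: needs.check-changes.outputs.run-docs == 'true'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make -C Doc html   # placeholder build command
```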
Cirrus-CI might be an alternative, apparently, given this issue and the related PR, although I’m not familiar with them. Recently they introduced a limit on free usage which Python would likely exceed, so someone would have to get in touch and ask for additional resources for Python.
From that blog post:
Starting September 1st 2023, there will be an upper monthly limit on free usage equal to 50 compute credits (which is equal to a little over 16,000 CPU-minutes for Linux tasks).