In 73694087a088ed4481b4ab268a03351b1bcaac3c I gave builds that failed
because of a timeout or exceeded log limit a stop sign and I stand by
that reasoning: with that it's possible to distinguish between actual
build failures and rather transient things such as timeouts.
Back then I considered it a feature that these are shown in a different
tab, but I don't think that's a good idea anymore. When using a jobset to
e.g. track the regressions from a mass rebuild (like a compiler or gcc
update), "Newly failed builds" should exclusively display regressions (and
flaky builds of course, not much I can do about that).
Also, when a bunch of builds fail in such a jobset because of e.g. a
broken connection to a builder that results in a timeout, I want to be
able to restart them all w/o rebuilding actual regressions.
To make it clear that we not only have "Aborted" builds in the tab, I
renamed the label to "Aborted / Timed out".
Depends on https://github.com/nix-community/nix-eval-jobs/pull/349 & #1421.
Almost equivalent to #1425, but with a small change: when having e.g. an
aggregate job with a glob that matches nothing, the jobset evaluation is
failed now. This was the intended behavior before (hydra-eval-jobset
fails hard if an aggregate is broken), the code-path was never reached
however since the aggregate was never marked as broken in this case
before.
There were some hangs caused by this. Need to fix them, ideally
reproducing the issue in a test, before trying this again.
This reverts commit 4a4a0f901c70676ee47f830d2ff6a72789ba1baf.
My main motivation here is to get metrics with brackets to work in order
to support "pytest" test names:
- test_foo.py::test_bar[1]
- test_foo.py::test_bar[2]
I couldn't find an "HTML escape"-style function that would generate
valid html `id` attribute names from random strings, so I went with a
hash digest instead.
Needed one more thing before trying out using `LegacySSHStore` directly.
Flake lock file updates:
• Updated input 'nix':
'github:NixOS/nix/674a87462cb93f605d4fbeef607d3453e7e5a7d8?narHash=sha256-TBoHqnIdVWhsBcL05vO2B1hSl9m//5Mz2NU%2BPMk3h3Y%3D' (2025-02-16)
→ 'github:NixOS/nix/e310c19a1aeb1ce1ed4d41d5ab2d02db596e0918?narHash=sha256-q/RgA4bB7zWai4oPySq9mch7qH14IEeom2P64SXdqHs%3D' (2025-02-18)
This avoids some duplicated code, leveraging the same `StoreReference`
type that also undergirds the machine file dedup we just did prior.
By using `LegacySSHStoreConfig`, we're also taking a baby step towards
using the store interface rather than messing around with the protocol
internals.
incrementally ingest eval results
nix-eval-jobs streams output, unlike hydra-eval-jobs. Now that we've
migrated, we can use this to:
1. Use less RAM by avoiding buffering a whole eval's worth of metadata
into a Perl string and an array of JSON objects.
2. Make evals latency a bit lower by allowing the queue runner to start
ingesting builds faster.
Also use the newly-restored constituents support in `nix-eval-jobs`
Note, we pass --workers and --max-memory-size to n-e-j
Lost in the h-e-j -> n-e-j migration, causing evaluation to always be
single threaded and limited to 4GiB RAM. Follow the config settings like
h-e-j used to do (via C++ code).
`nix-eval-jobs` should check `hydraJobs` and then `checks` with flakes
(cherry picked from commit 6d4ccff43c41adaf6e4b2b9bced7243bc2f6e97b)
(cherry picked from commit b0e9b4b2f99f9d8f5c4e780e89f955c394b5ced4)
(cherry picked from commit cdfc5c81e8037d3e4818a3e459d0804b2c157ea9)
(cherry picked from commit 4b107e6ff36bd89958fba36e0fe0340903e7cd13)
Co-Authored-By: Maximilian Bosch <maximilian@mbosch.me>