This is implemented in an extremely hacky way due to poor DBIx feature
support. Ideally, what we'd need is a way to tell DBIx to ignore the
errormsg column unless explicitly requested, and to automatically add a
computed 'errormsg IS NULL' column otherwise. Since it does not support
that, this commit instead hacks in some support via method overrides while
taking care not to break anything obvious.
This allows for better builder usage when the queue runner is busy. To
avoid running into uncontrollable imbalances between builder/queue
runner, we only release the machine reservation after the local
throttler has found a slot to start copying the outputs for that build.
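The ordering described above can be sketched as follows. This is only a minimal Python illustration, not the queue runner's C++ code: `copy_slots`, `Reservation` and `finish_build` are hypothetical names, and a counting semaphore stands in for the local throttler.

```python
import threading

# Hypothetical stand-in for the local throttler: at most 2 concurrent copies.
copy_slots = threading.Semaphore(2)

class Reservation:
    """Hypothetical machine reservation that records when it is released."""
    def __init__(self, log):
        self.log = log

    def release(self):
        self.log.append("reservation released")

def finish_build(reservation, copy_outputs, log):
    # Keep holding the builder until a copy slot is actually available...
    copy_slots.acquire()
    log.append("copy slot acquired")
    try:
        # ...and only then hand the machine back before copying the outputs.
        reservation.release()
        copy_outputs()
    finally:
        copy_slots.release()

events = []
finish_build(Reservation(events), lambda: events.append("outputs copied"), events)
```

Because the reservation is held while waiting for a slot, a busy queue runner cannot accumulate an unbounded backlog of finished builds whose outputs still need copying.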
As opposed to asserting uniqueness to understand resource utilization,
we just switch to using `std::unique_ptr`.
We don't rely on sequential / monotonic processing of build IDs anymore, so
randomizing actually has the advantage of mixing builds for different
systems together, to avoid only one chunk of builds for a single system
getting processed while builders for other systems are starved.
Each output for a given step being ingested is looked up in parallel,
which should speed up build ingestion by roughly a factor of the
average number of outputs per derivation.
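As a rough illustration of the idea (not the actual queue runner code), the lookups for one step can be issued concurrently and collected in order. `lookup_output` is a hypothetical placeholder for the real per-output query, and the output names are made up.

```python
from concurrent.futures import ThreadPoolExecutor

def lookup_output(path):
    # Hypothetical placeholder for a per-output store or database query.
    return path.endswith("-out")

def lookup_outputs(paths):
    # Issue one lookup per output concurrently, then pair results with paths.
    # pool.map preserves the order of its input.
    with ThreadPoolExecutor(max_workers=max(1, len(paths))) as pool:
        return dict(zip(paths, pool.map(lookup_output, paths)))

results = lookup_outputs(["a-out", "a-dev", "a-doc"])
```

With one worker per output, the wall-clock time for a step approaches that of its slowest single lookup rather than the sum of all lookups.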
Running the query with/without it shows that it makes no difference to
postgres, since there's an index on finished=0 already. This allows a
few simplifications, but also paves the way towards running multiple
parallel monitor threads in the future.
By looking at the ratio of running vs. waiting for the dispatcher and
the queue monitor, we should get better visibility into what hydra is
currently bottlenecked on.
There are other side effects we can try to measure to get to the same
result, but having a simple way doesn't cost us much.
My current theory is that running more parallel xz than available CPU
cores is reducing our overall throughput by requiring more scheduling
overhead and more cache thrashing.
The third argument to `open()` in `-|` mode is passed to a shell if it's
a string. In my case the store URI contains
`?secret-key=${signingKey.directory}/secret&compression=zstd`.
For the `nix store cat` case this means that
* everything up to the `&` is started as a background process. This fails
immediately because no path to cat is specified.
* `compression=zstd` is a variable assignment
* the `$path` argument to `store cat` is attempted to be executed as
another command
Passing just the list solves the problem.
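The same string-versus-list distinction exists in other languages. The following Python sketch is an analogue, not the Perl code from this change: it passes a hypothetical store URI in list form to a child process and checks that the `&` arrives untouched because no shell is involved.

```python
import subprocess
import sys

# Hypothetical store URI; only the `&` in the query string matters here.
store_uri = "s3://example-bucket?secret-key=/tmp/secret&compression=zstd"

# A tiny child process that echoes its first argument back.
child = [sys.executable, "-c", "import sys; print(sys.argv[1])", store_uri]

# List form: no shell is involved, so `&`, `?` and `=` reach the child verbatim.
result = subprocess.run(child, capture_output=True, text=True, check=True)
received = result.stdout.strip()
```

Had the command been joined into a single string and handed to a shell, the `&` would have split it into separate commands, as described above.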
(cherry picked from commit 3ee51dbe589458cc54ff753317bbc6db530bddc0)
When an artifact is requested from hydra the output is first copied
from the nix store into memory and then sent as a response, delaying
the download and taking up significant amounts of memory.
As reported in https://github.com/NixOS/hydra/issues/1357
Instead of calling a command and blocking while reading in the entire
output, this adds read_into_socket(). The function takes a
command, starts a subprocess with that command, and returns a file
descriptor attached to its stdout.
This file descriptor is then used by Catalyst's response builder to stream
the output directly.
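To illustrate the difference between buffering and streaming, here is a minimal Python sketch of the same idea. It is not the Perl implementation of read_into_socket(); `stream_command` and the chunk size are assumptions made for the example, and a small child process stands in for `nix store cat`.

```python
import subprocess
import sys

def stream_command(argv, chunk_size=65536):
    """Yield the command's stdout in chunks instead of buffering it all."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
    try:
        while True:
            chunk = proc.stdout.read(chunk_size)
            if not chunk:
                break
            yield chunk
    finally:
        proc.stdout.close()
        proc.wait()

# Stand-in for the real command: a child that writes 200000 bytes.
argv = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 200000)"]
chunks = list(stream_command(argv))
```

Collecting the chunks into a list is only done here to inspect them; a real handler would write each chunk to the client as it arrives, so memory use stays bounded by the chunk size.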
(cherry picked from commit 459aa0a5983a0bd546399c08231468d6e9282f54)
When building e.g. nixpkgs, the "Running builds" view will mostly look
like this
hello.x86_64-linux (Build of hello-X.Y)
exa.x86_64-linux (Build of exa-X.Y)
...
This doesn't provide any useful information. Showing the step name only
makes sense if the step is not the job's own derivation. With this
patch, that information will only be shown if the drv name (i.e. w/o
`/nix/store/` prefix, .drv ext & hash) is not equal to the drv name of
the job itself (build.nixname).
When using Hydra to build machine configurations, you'll often see
"nixosConfigurations.foo" five times, i.e. for each build step being
run. This isn't very helpful I think because in such a case, a single
build step can also be compiling the Linux kernel.
This change also fetches the `drvpath` and `type` from the `buildsteps`
relation. We're already joining it, so this doesn't make much difference
(confirmed via query logging that this doesn't cause extra SQL queries).
Unfortunately build steps don't have a human readable name, so I'm
deriving it from the drvpath by stripping away the hash (assuming that
the hash will never contain a `-` and that `/nix/store/` is used as prefix). I
decided against using the Nix bindings for that to avoid too much
overhead due to store operations for each build step.
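The string manipulation described above amounts to something like the following Python sketch. It is not the code from the patch: `step_name` and `show_step_name` are hypothetical names, and it relies on the same assumptions (a `/nix/store/` prefix and no `-` inside the hash).

```python
STORE_PREFIX = "/nix/store/"

def step_name(drv_path):
    """Strip the store prefix, the hash, and the .drv extension."""
    name = drv_path
    if name.startswith(STORE_PREFIX):
        name = name[len(STORE_PREFIX):]
    # The hash is everything up to the first `-` (assumed to contain none).
    _hash, _, name = name.partition("-")
    if name.endswith(".drv"):
        name = name[:-len(".drv")]
    return name

def show_step_name(drv_path, nixname):
    # Only show the step name when it differs from the job's own drv name.
    return step_name(drv_path) != nixname
```

For example, a step whose drvpath is `/nix/store/<hash>-hello-2.12.drv` yields `hello-2.12`, which would be hidden for a job whose `build.nixname` is also `hello-2.12`.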
In 73694087a088ed4481b4ab268a03351b1bcaac3c I gave builds that failed
because of a timeout or exceeded log limit a stop sign and I stand by
that reasoning: with that it's possible to distinguish between actual
build failures and rather transient things such as timeouts.
Back then I considered it a feature that these are shown in a different
tab, but I don't think that's a good idea anymore. When using a jobset to
e.g. track the regressions from a mass rebuild (like a compiler or gcc
update), "Newly failed builds" should exclusively display regressions (and
flaky builds of course, not much I can do about that).
Also, when a bunch of builds fail in such a jobset because of e.g. a
broken connection to a builder that results in a timeout, I want to be
able to restart them all w/o rebuilding actual regressions.
To make it clear that the tab doesn't only contain "Aborted" builds, I
renamed the label to "Aborted / Timed out".
My main motivation here is to get metrics with brackets to work in order
to support "pytest" test names:
- test_foo.py::test_bar[1]
- test_foo.py::test_bar[2]
I couldn't find an "HTML escape"-style function that would generate
valid html `id` attribute names from random strings, so I went with a
hash digest instead.
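A minimal sketch of the digest approach, in Python rather than the language of the actual change. The choice of SHA-256 and the `metric-` prefix are assumptions of this example, and `metric_dom_id` is a hypothetical name.

```python
import hashlib

def metric_dom_id(metric_name):
    # Hex digits are always valid in an `id`; the prefix keeps the value
    # from starting with a digit. SHA-256 is an assumption of this sketch.
    digest = hashlib.sha256(metric_name.encode("utf-8")).hexdigest()
    return "metric-" + digest

id_a = metric_dom_id("test_foo.py::test_bar[1]")
id_b = metric_dom_id("test_foo.py::test_bar[2]")
```

The result is deterministic, so the same metric name always maps to the same `id`, while brackets, colons and dots never appear in the generated markup.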
There were some hangs caused by this. Need to fix them, ideally
reproducing the issue in a test, before trying this again.
This reverts commit 4a4a0f901c70676ee47f830d2ff6a72789ba1baf.
This avoids some duplicated code, leveraging the same `StoreReference`
type that also undergirds the machine file dedup we just did prior.
By using `LegacySSHStoreConfig`, we're also taking a baby step towards
using the store interface rather than messing around with the protocol
internals.
incrementally ingest eval results
nix-eval-jobs streams output, unlike hydra-eval-jobs. Now that we've
migrated, we can use this to:
1. Use less RAM by avoiding buffering a whole eval's worth of metadata
into a Perl string and an array of JSON objects.
2. Make evals latency a bit lower by allowing the queue runner to start
ingesting builds faster.
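The streaming approach in the two points above can be sketched like this, assuming the evaluator emits one JSON object per line. This is a Python illustration rather than the Perl code in this change; `ingest_jobs` is a hypothetical name and the field names are illustrative only.

```python
import io
import json

def ingest_jobs(stream, handle_job):
    """Process one JSON object per line as soon as it arrives."""
    count = 0
    for line in stream:
        line = line.strip()
        if not line:
            continue
        handle_job(json.loads(line))
        count += 1
    return count

# Stand-in for the evaluator's stdout; field names are illustrative only.
fake_stdout = io.StringIO('{"attr": "hello"}\n{"attr": "exa"}\n')
seen = []
total = ingest_jobs(fake_stdout, lambda job: seen.append(job["attr"]))
```

Only one line and one decoded object are held in memory at a time, and each job can be handed on before the evaluation as a whole has finished.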
Also use the newly-restored constituents support in `nix-eval-jobs`
Note, we pass --workers and --max-memory-size to n-e-j
These were lost in the h-e-j -> n-e-j migration, causing evaluation to always be
single-threaded and limited to 4GiB RAM. Follow the config settings like
h-e-j used to do (via C++ code).
`nix-eval-jobs` should check `hydraJobs` and then `checks` with flakes
(cherry picked from commit 6d4ccff43c41adaf6e4b2b9bced7243bc2f6e97b)
(cherry picked from commit b0e9b4b2f99f9d8f5c4e780e89f955c394b5ced4)
(cherry picked from commit cdfc5c81e8037d3e4818a3e459d0804b2c157ea9)
(cherry picked from commit 4b107e6ff36bd89958fba36e0fe0340903e7cd13)
Co-Authored-By: Maximilian Bosch <maximilian@mbosch.me>
Based off the existing GithubPulls.pm and GitlabPulls.pm plugins.
Also adds an integration test for the new 'giteapulls' input type to
the existing 'gitea' test.
Original commit message:
> There are some known regressions regarding local testing setups - since
> everything was kinda half written with the expectation that build dir =
> source dir (which should not be true anymore). But everything builds and
> the test suite runs fine, after several hours spent debugging random
> crashes in libpqxx with MALLOC_PERTURB_...
I have not experienced regressions with local testing.
(cherry picked from commit 4b886d9c45cd2d7fe9b0a8dbc05c7318d46f615d)
In my system logs I see this every time a new eval starts:
```
hydra-evaluator[PID]: hint: Using 'master' as the name for the initial branch. This default branch name
hydra-evaluator[PID]: hint: is subject to change. To configure the initial branch name to use in all
hydra-evaluator[PID]: hint: of your new repositories, which will suppress this warning, call:
hydra-evaluator[PID]: hint:
hydra-evaluator[PID]: hint: git config --global init.defaultBranch <name>
hydra-evaluator[PID]: hint:
hydra-evaluator[PID]: hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hydra-evaluator[PID]: hint: 'development'. The just-created branch can be renamed via this command:
hydra-evaluator[PID]: hint:
hydra-evaluator[PID]: hint: git branch -m <name>
```
This ensures this hint is not logged anymore and unclutters the syslog.
I presume it does not really matter what name is chosen for the branch.
See https://github.com/NixOS/hydra/pull/1414#issuecomment-2412350929
The variable is defined in src/lib/Hydra/Helper/Nix.pm
Error message without this patch:
```
hydra-evaluator[PID]: Couldn't require Hydra::Plugin::S3Backup : Global symbol "$MACHINE_LOCAL_STORE" requires explicit package name (did you forget to declare "my $MACHINE_LOCAL_STORE"?) at /nix/store/xxx-hydra-0-unstable-2024-09-24/libexec/hydra/lib/Hydra/Plugin/S3Backup.pm line 95.
hydra-evaluator[PID]: Compilation failed in require at /nix/store/xxx-hydra-perl-deps/lib/perl5/site_perl/5.38.2/Module/Runtime.pm line 314.
hydra-evaluator[PID]: at /nix/store/xxx-hydra-perl-deps/lib/perl5/site_perl/5.38.2/Module/Pluggable.pm line 32.
```