incrementally ingest eval results
nix-eval-jobs streams output, unlike hydra-eval-jobs. Now that we've
migrated, we can use this to:
1. Use less RAM by avoiding buffering a whole eval's worth of metadata
into a Perl string and an array of JSON objects.
2. Make evals latency a bit lower by allowing the queue runner to start
ingesting builds faster.
Also use the newly-restored constituents support in `nix-eval-jobs`
Note, we pass --workers and --max-memory-size to n-e-j
Lost in the h-e-j -> n-e-j migration, causing evaluation to always be
single threaded and limited to 4GiB RAM. Follow the config settings like
h-e-j used to do (via C++ code).
`nix-eval-jobs` should check `hydraJobs` and then `checks` with flakes
(cherry picked from commit 6d4ccff43c41adaf6e4b2b9bced7243bc2f6e97b)
(cherry picked from commit b0e9b4b2f99f9d8f5c4e780e89f955c394b5ced4)
(cherry picked from commit cdfc5c81e8037d3e4818a3e459d0804b2c157ea9)
(cherry picked from commit 4b107e6ff36bd89958fba36e0fe0340903e7cd13)
Co-Authored-By: Maximilian Bosch <maximilian@mbosch.me>
Based off the existing GithubPulls.pm and GitlabPulls.pm plugins.
Also adds an integration test for the new 'giteapulls' input type to
the existing 'gitea' test.
Original commit message:
> There are some known regressions regarding local testing setups - since
> everything was kinda half written with the expectation that build dir =
> source dir (which should not be true anymore). But everything builds and
> the test suite runs fine, after several hours spent debugging random
> crashes in libpqxx with MALLOC_PERTURB_...
I have not experienced regressions with local testing.
(cherry picked from commit 4b886d9c45cd2d7fe9b0a8dbc05c7318d46f615d)
In my system logs I see this every time a new eval starts:
```
hydra-evaluator[PID]: hint: Using 'master' as the name for the initial branch. This default branch name
hydra-evaluator[PID]: hint: is subject to change. To configure the initial branch name to use in all
hydra-evaluator[PID]: hint: of your new repositories, which will suppress this warning, call:
hydra-evaluator[PID]: hint:
hydra-evaluator[PID]: hint: git config --global init.defaultBranch <name>
hydra-evaluator[PID]: hint:
hydra-evaluator[PID]: hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hydra-evaluator[PID]: hint: 'development'. The just-created branch can be renamed via this command:
hydra-evaluator[PID]: hint:
hydra-evaluator[PID]: hint: git branch -m <name>
```
This ensures this hint is not logged anymore and unclutters the syslog.
I presume it does not really matter what name is chosen for the branch.
See https://github.com/NixOS/hydra/pull/1414#issuecomment-2412350929
The variable is defined in src/lib/Hydra/Helper/Nix.pm
Error message without this patch:
```
hydra-evaluator[PID]: Couldn't require Hydra::Plugin::S3Backup : Global symbol "$MACHINE_LOCAL_STORE" requires explicit package name (did you forget to declare "my $MACHINE_LOCAL_STORE"?) at /nix/store/xxx-hydra-0-unstable-2024-09-24/libexec/hydra/lib/Hydra/Plugin/S3Backup.pm line 95.
hydra-evaluator[PID]: Compilation failed in require at /nix/store/xxx-hydra-perl-deps/lib/perl5/site_perl/5.38.2/Module/Runtime.pm line 314.
hydra-evaluator[PID]: at /nix/store/xxx-hydra-perl-deps/lib/perl5/site_perl/5.38.2/Module/Pluggable.pm line 32.
```
With https://github.com/NixOS/nix/pull/9839, the `storeUri` field is
much better structured, so we can use it while still opening the SSH
connection ourselves.
This fixes:
> Caught exception in Hydra::Controller::Root->realisations "Undefined subroutine &Hydra::Controller::Root::queryRawRealisation called at /nix/store/v842xb35ph8ka1yi1xanjhk4xh1pn5nm-hydra-2024-04-22/libexec/hydra/lib/Hydra/Controller/Root.pm line 371."
When content addressed derivations are built on the hydra server,
one may run into an issue where some builds suddenly don't load anymore.
This seems to be caused by outPaths that are NULL (which is
allowed for ca-derivations). Filter them out to prevent querying the
database for them, which is not supported by the database abstraction
layer that's currently in use.
On my instance this appears to resolve the issue.
I feel like I might be doing this at the wrong abstraction layer, but on
the other hand -- it seems to resolve it and it also doesn't really look
like it will hurt anything.
The test added in a previous commit uncovers this issue, and this commit
resolves it. So I'm happy with this patch for now.
The issue I was seeing on my server:
hydra-server[2549]: [error] Couldn't render template "undef error - DBIx::Class::SQLMaker::ClassicExtensions::puke(): Fatal: NULL-within-IN not implemented: The upcoming SQL::Abstract::Classic 2.0 will emit the logically correct SQL instead of raising this exception. at /nix/store/<hash>-hydra-unstable-2024-03-08_nix_2_20/libexec/hydra/lib/Hydra/Helper/Nix.pm line 190
See also short discussion here: https://github.com/NixOS/nixpkgs/pull/297392#issuecomment-2035366263
Closes#1336
When restarting postgresql, the connections are still reused in
`hydra-queue-runner` causing errors like this
main thread: Lost connection to the database server.
queue monitor: Lost connection to the database server.
and no more builds being processed.
`hydra-evaluator` doesn't have that issue since it crashes right away.
We could let it retry indefinitely as well (see below), but I don't
want to change too much.
If the DB is still unreachable 10s later, the process will stop with a
non-zero exit code because of a missing DB connection. This however
isn't such a big deal because it will be immediately restarted
afterwards. With the current configuration, Hydra will never give up,
but restart (and retry) infinitely. To me that seems reasonable, i.e. to
retry DB connections on a long-running process. If this doesn't work
out, the monitoring should fire anyways because the queue fills up, but
I'm open to discuss that.
Please note that this isn't reproducible with the DB and the queue
runner on the same machine when using `services.hydra-dev`, because of
the `Requires=` dependency `hydra-queue-runner.service` ->
`hydra-init.service` -> `postgresql.service` that causes the queue
runner to be restarted on `systemctl restart postgresql`.
Internally, Hydra uses Nix's pool data structure: it basically has N
slots (here DB connections) and whenever a new one is requested, an idle
slot is provided or a new one is created (when N slots are active, it'll
be waited until one slot is free). The issue in the code here is however
that whenever an error is encountered, the slot is released, however the
same broken connection will be reused the next time. By using
`Pool::Handle::markBad`, Nix will drop a broken slot. This is now being
done when `pqxx::broken_connection` was caught.