The hydra-queue-runner opens a connection to the builder. If the
builder is 'localhost' it starts `nix-store`, otherwise it starts
'ssh'.
Currently, if the hydra-queue-runner can not start `nix-store` (not in
the PATH for instance), the error message is:
cannot connect to ‘localhost’: error: cannot start ssh: No such file
or directory
This is not useful since ssh is actually not started:/
With this patch the error message is now:
cannot connect to ‘localhost’: error: cannot start nix-store: No such file
or directory
This cannot be done in the hydra-evaluator systemd unit, since then
every other Nix process (e.g. hydra-evaluator and nix-prefetch-*) will
also allocate the specified heap size, probably leading to OOM.
Thus, we no longer hold the send lock while substituting missing paths
on the build machine. This is a good thing in particular for macOS
builders which have a tendency to hang forever in curl downloads.
Previously, when hydra-queue-runner was restarted, any pending "build
finished" notifications were lost. Now hydra-queue-runner marks
finished but unnotified builds in the database and uses that to run
pending notifications at startup.
The queue runner can now run up to ‘max-concurrent-notifications’ in
parallel (default is 2). This is useful when some hydra-notify
invocations can take a long time to complete (e.g. because they need
to compress a giant build log) and we don't want this to block all
other notifications.
As @dtzWill discovered, with the concurrent hydra-evaluator, there can
be multiple active transactions adding builds to the database. As a
result, builds can become visible in a non-monotonically increasing
order, breaking the queue monitor's assumption that build IDs only go
up.
The fix is to have hydra-eval-jobset provide the lowest build ID it
just added in the builds_added notification, and have the queue
monitor check from there.
Fixes#496.
Adding a 96-core aarch64 build machine to the build farm caused the
potential number of database connections to increase a lot, so we
started hitting the Postgres connection limit.
Setting
xxx-jobset-repeats = patchelf:master:2
will cause Hydra to perform every build step in the specified jobset 2
additional times (i.e. 3 times in total). Non-determinism is not fatal
unless the derivation has the attribute "isDeterministic = true"; we
just note the lack of determinism in the Hydra database. This will
allow us to get stats about the (lack of) reproducibility of all of
Nixpkgs.
Builds can now specify the attribute "isDeterministic = true" to tell
Hydra to build with build-repeat > 0. If there is a mismatch between
rounds, the step / build fails with a suitable status.
Maybe this should be a meta attribute, but that makes it invisible to
hydra-queue-runner, and it seems reasonable to make a claim of
mandatory determinism part of the derivation (since e.g. enabling this
flag should trigger a rebuild).