My current theory is that running more parallel xz than available CPU
cores is reducing our overall throughput by requiring more scheduling
overhead and more cache thrashing.
With https://github.com/NixOS/nix/pull/9839, the `storeUri` field is
much better structured, so we can use it while still opening the SSH
connection ourselves.
Closes#1336
When restarting postgresql, the connections are still reused in
`hydra-queue-runner` causing errors like this
main thread: Lost connection to the database server.
queue monitor: Lost connection to the database server.
and no more builds being processed.
`hydra-evaluator` doesn't have that issue since it crashes right away.
We could let it retry indefinitely as well (see below), but I don't
want to change too much.
If the DB is still unreachable 10s later, the process will stop with a
non-zero exit code because of a missing DB connection. This however
isn't such a big deal because it will be immediately restarted
afterwards. With the current configuration, Hydra will never give up,
but restart (and retry) infinitely. To me that seems reasonable, i.e. to
retry DB connections on a long-running process. If this doesn't work
out, the monitoring should fire anyways because the queue fills up, but
I'm open to discuss that.
Please note that this isn't reproducible with the DB and the queue
runner on the same machine when using `services.hydra-dev`, because of
the `Requires=` dependency `hydra-queue-runner.service` ->
`hydra-init.service` -> `postgresql.service` that causes the queue
runner to be restarted on `systemctl restart postgresql`.
Internally, Hydra uses Nix's pool data structure: it basically has N
slots (here DB connections) and whenever a new one is requested, an idle
slot is provided or a new one is created (when N slots are active, it'll
be waited until one slot is free). The issue in the code here is however
that whenever an error is encountered, the slot is released, however the
same broken connection will be reused the next time. By using
`Pool::Handle::markBad`, Nix will drop a broken slot. This is now being
done when `pqxx::broken_connection` was caught.
We have to oddly make a `StoreConfig` subclass to get it, but
https://github.com/NixOS/nix/pull/9848 will fix that.
The purpose of this is to ensure that, absent an explicit config,
`localhost` includes `ca-derivations` and `recursive-nix` if those
experimental features are enabled.
Very much the complement of #1342, the previous PR.
This is *just* using the fields from that type, and only where the types
coincide. (There are two fields with different types, `speedFactor` most
interestingly.) No code is reused, so we can be sure that no behavior is
changed.
Once the types are reconciled on the Nix side, then we can start
carefully actually reusing code.
Progress on #1164
This is just C++ changes without any Perl / Frontend / SQL Schema
changes.
The idea is that it should be possible to redeploy Hydra with these
chnages with (a) no schema migration and also (b) no regressions. We
should be able to much more safely deploy these to a staging server and
then production `hydra.nixos.org`.
Extracted from #875
Co-Authored-By: Théophane Hufschmitt <theophane.hufschmitt@tweag.io>
Co-Authored-By: Alexander Sosedkin <monk@unboiled.info>
Co-Authored-By: Andrea Ciceri <andrea.ciceri@autistici.org>
Co-Authored-By: Charlotte 🦝 Delenk Mlotte@chir.rs>
Co-Authored-By: Sandro Jäckel <sandro.jaeckel@gmail.com>
Otherwise, when the port is randomly chosen (e.g. by specifying no port,
or a port of 0), it will just show that the port is 0 and not the port
that is actually serving the metrics.
Periodically, I have seen tests fail because of out of order queue runner behavior:
checking the queue for builds > 0...
loading build 1 (tests:basic:empty_dir)
aborting unsupported build step '...-empty-dir.drv' (type 'x86_64-linux')
marking build 1 as failed
adding new machine ‘localhost’
This patch should prevent the dispatcher from running before any machines are
made available.
Recently a few internal APIs have changed[1]. The `outputPaths` function
has been removed and a lot of data structures are modeled with
`std::optional` which broke compilation.
This patch updates the code in `hydra-queue-runner` accordingly to make
sure that Hydra compiles again.
[1] https://github.com/NixOS/nix/pull/3883