Commit Graph

244 Commits

Author SHA1 Message Date
09a1e64ed2 Dedup with nix: use nix::Machine::parseConfig
Companion to https://github.com/NixOS/nix/pull/10763
2024-05-23 09:59:46 -04:00
d55bea2a1e Utilize nix::Machine more fully
With https://github.com/NixOS/nix/pull/9839, the `storeUri` field is
much better structured, so we can use it while still opening the SSH
connection ourselves.
2024-05-22 22:02:46 -04:00
99afff03b0 hydra-queue-runner: drop broken connections from pool
Closes #1336

When restarting postgresql, the connections are still reused in
`hydra-queue-runner` causing errors like this

    main thread: Lost connection to the database server.
    queue monitor: Lost connection to the database server.

and no more builds being processed.

`hydra-evaluator` doesn't have that issue since it crashes right away.
We could let it retry indefinitely as well (see below), but I don't
want to change too much.

If the DB is still unreachable 10s later, the process will stop with a
non-zero exit code because of a missing DB connection. This however
isn't such a big deal because it will be immediately restarted
afterwards. With the current configuration, Hydra will never give up,
but restart (and retry) infinitely. To me that seems reasonable, i.e. to
retry DB connections on a long-running process. If this doesn't work
out, the monitoring should fire anyways because the queue fills up, but
I'm open to discuss that.

Please note that this isn't reproducible with the DB and the queue
runner on the same machine when using `services.hydra-dev`, because of
the `Requires=` dependency `hydra-queue-runner.service` ->
`hydra-init.service` -> `postgresql.service` that causes the queue
runner to be restarted on `systemctl restart postgresql`.

Internally, Hydra uses Nix's pool data structure: it basically has N
slots (here DB connections) and whenever a new one is requested, an idle
slot is provided or a new one is created (when N slots are active, it'll
be waited until one slot is free). The issue in the code here is however
that whenever an error is encountered, the slot is released, however the
same broken connection will be reused the next time. By using
`Pool::Handle::markBad`, Nix will drop a broken slot. This is now being
done when `pqxx::broken_connection` was caught.
2024-03-15 14:09:31 +01:00
7b826ec5ad Merge branch 'nix-next' into nix-2.20 2024-01-30 13:26:45 -05:00
fcde5908d8 More CA derivations prep
Again, with care not to change the schema in any way.
2024-01-25 21:32:22 -05:00
7a53b866f6 Merge branch 'master' into nix-next
• Updated input 'nix' (merge):
    'github:NixOS/nix/212ba69e6f995992f8b4e4c0656d19c0156c8714'
    'github:NixOS/nix/2c4bb93ba5a97e7078896ebc36385ce172960e4e' (2024-01-25)
  → 'github:NixOS/nix/8df68a213fc52a57b02a57005b0e06cc8de40ce3' (2024-01-25)
2024-01-25 16:26:07 -05:00
c64eed7d07 Simplify StoreConfig::getDefaultSystemFeatures call
That method is now static.
2024-01-25 15:58:07 -05:00
b1fa6b3aac Use StoreConfig::getDefaultSystemFeatures for default machine config
We have to oddly make a `StoreConfig` subclass to get it, but
https://github.com/NixOS/nix/pull/9848 will fix that.

The purpose of this is to ensure that, absent an explicit config,
`localhost` includes `ca-derivations` and `recursive-nix` if those
experimental features are enabled.

Very much the complement of #1342, the previous PR.
2024-01-24 21:37:13 -05:00
449eb2d873 Use more nix::Machine fields
The upstream fields were made to match Hydra, so we can get rid of the
extra fields temporary added in
70e5469303.
2024-01-24 20:14:31 -05:00
d45e14fd43 Merge pull request #1316 from NixOS/ca-derivations-prep
Prepare for CA derivation support with lower impact changes
2024-01-24 18:12:42 -05:00
70e5469303 Use Nix's Machine type in a mimimal way
This is *just* using the fields from that type, and only where the types
coincide. (There are two fields with different types, `speedFactor` most
interestingly.) No code is reused, so we can be sure that no behavior is
changed.

Once the types are reconciled on the Nix side, then we can start
carefully actually reusing code.

Progress on #1164
2024-01-23 12:18:57 -05:00
2e6ee28f9b Machine -> ::Machine so we don't conflict with Nix's 2024-01-23 11:03:19 -05:00
ebfefb9161 Sync up with some changes done to the main CA branch 2023-12-11 12:46:36 -05:00
9ba4417940 Prepare for CA derivation support with lower impact changes
This is just C++ changes without any Perl / Frontend / SQL Schema
changes.

The idea is that it should be possible to redeploy Hydra with these
chnages with (a) no schema migration and also (b) no regressions. We
should be able to much more safely deploy these to a staging server and
then production `hydra.nixos.org`.

Extracted from #875

Co-Authored-By: Théophane Hufschmitt <theophane.hufschmitt@tweag.io>
Co-Authored-By: Alexander Sosedkin <monk@unboiled.info>
Co-Authored-By: Andrea Ciceri <andrea.ciceri@autistici.org>
Co-Authored-By: Charlotte 🦝 Delenk Mlotte@chir.rs>
Co-Authored-By: Sandro Jäckel <sandro.jaeckel@gmail.com>
2023-12-04 16:14:47 -05:00
c922e73c11 Update to Nix 2.19
Flake lock file updates:

• Updated input 'nix':
    'github:NixOS/nix/f5f4de6a550327b4b1a06123c2e450f1b92c73b6' (2023-10-02)
  → 'github:NixOS/nix/50f8f1c8bc019a4c0fd098b9ac674b94cfc6af0d' (2023-11-27)
2023-11-30 15:26:46 -05:00
d135b123cd Merge pull request #1292 from Ma27/fix-queue-runner-stats
hydra-queue-runner: fix stats
2023-07-17 09:56:05 +02:00
5c35d1be20 hydra-queue-runner: fix stats 2023-06-25 17:28:15 +02:00
9f69bb5c2c Fix compilation against Nix 2.16 2023-06-23 15:06:55 +02:00
5b35e13898 hydra-queue-runner: use initializer lists for constructing JSON
And also fix the parts that were broken
2023-02-04 20:08:27 +01:00
96e36201eb hydra-queue-runner: adapt to nlohmann::json 2023-02-04 20:08:09 +01:00
e1965250b5 Merge pull request #1173 from DeterminateSystems/queue-runner-exporter
hydra-queue-runner metrics
2022-04-07 12:27:33 -04:00
f8dc48f171 hydra-queue-runner: fixup: remove extraneous newline 2022-04-06 17:53:11 -07:00
59ac96a99c Track the number of steps created 2022-04-06 20:23:02 -04:00
1c12c5882f hydra queue runner: instrument the process of loading new builds with prom 2022-04-06 20:18:29 -04:00
5de08d412e queue metrics: refactor the metrics into a struct 2022-04-06 20:00:30 -04:00
46f52b4c4e bring back the working version Cole made 2022-04-06 15:49:38 -04:00
5bff730f2c WIP: I love it when they delete the assignment operator :) 2022-04-06 11:41:40 -07:00
edf3c348f2 hydra-queue-runner: make entire address configurable 2022-04-06 10:59:45 -07:00
33bc60b83c hydra-queue-runner: move exporter back to State::run
It's (arguably) better than risking pinning the thread at 100% due to
the busy `while` loop.
2022-04-06 10:49:14 -07:00
8c5636fe18 hydra-queue-runner: use port 9198 by default
Co-authored-by: Graham Christensen <graham@grahamc.com>
2022-04-02 17:32:14 -07:00
089da272c7 fix build against nix 2.7.0
fix build after such commits as df552ff53e68dff8ca360adbdbea214ece1d08ee
and e862833ec662c1bffbe31b9a229147de391e801a
2022-03-29 15:38:24 -04:00
c64c5f0a7e hydra-queue-runner: rename build-result.hh to hydra-build-result.hh 2022-03-29 15:34:29 -04:00
4789eba92c hydra-queue-runer: split metrics functionality into its own function 2022-03-29 10:55:28 -07:00
928b3b8268 hydra-queue-runner: fix priority of flag over config file 2022-03-29 10:42:07 -07:00
5ddb9a98ca fixup! hydra-queue-runner: log message before and after exporter is started 2022-03-29 08:47:41 -07:00
905a7a7beb hydra-queue-runner: read metrics port from queue_runner_metrics_port config 2022-03-29 08:46:43 -07:00
9cdc5aceed hydra-queue-runner: log message before and after exporter is started
This way, if something goes wrong between the two, it's easier to narrow
down where the issue lies.
2022-03-29 08:41:19 -07:00
8503a7917b fixup! hydra-queue-runner: make registry member of State, configurable metrics port 2022-03-22 13:38:13 -07:00
c0f826b92d hydra-queue-runner: get the listening port from the exposer itself
Otherwise, when the port is randomly chosen (e.g. by specifying no port,
or a port of 0), it will just show that the port is 0 and not the port
that is actually serving the metrics.
2022-03-14 08:41:45 -07:00
52a29d43e6 hydra-queue-runner: make registry member of State, configurable metrics port
Thanks to the updated prometheus-cpp library, specifying a port of 0
will cause it to pick a random (available) port -- ideal for tests.
2022-03-11 11:58:10 -08:00
3bf31bd6a6 hydra-queue-runner: add simple "up" exporter
There are probably better ways to achieve this (and will likely need to
be refactored a bit to support further metrics).
2022-03-10 12:36:58 -08:00
4acaf9c8b0 hydra-queue-runner: don't dispatch until the machines parser has completed one run
Periodically, I have seen tests fail because of out of order queue runner behavior:

    checking the queue for builds > 0...
    loading build 1 (tests:basic:empty_dir)
    aborting unsupported build step '...-empty-dir.drv' (type 'x86_64-linux')
    marking build 1 as failed
    adding new machine ‘localhost’

This patch should prevent the dispatcher from running before any machines are
made available.
2022-02-10 10:54:30 -05:00
abff212d06 Use system-features from the Nix conf in the default machine file
Fix #936
2021-04-28 11:43:04 +02:00
a7d8ee98da Fix build 2021-02-22 15:10:24 +01:00
9cc76f6d69 Fix build with latest Nix
Recently a few internal APIs have changed[1]. The `outputPaths` function
has been removed and a lot of data structures are modeled with
`std::optional` which broke compilation.

This patch updates the code in `hydra-queue-runner` accordingly to make
sure that Hydra compiles again.

[1] https://github.com/NixOS/nix/pull/3883
2020-09-26 23:37:39 +02:00
f8e15bc311 Revive putBytes 2020-08-04 18:25:21 +02:00
4b5813051b unsigned long long -> uint64_t 2020-08-04 11:38:22 +02:00
7d3ba616a9 Fix build 2020-08-04 11:33:29 +02:00
a0e24f446b Remove unused getMemSize() function 2020-07-27 20:40:57 +02:00
d4e4be4fd1 Remove SHA-1 hash from BuildProducts
SHA-1 is deprecated and it will be expensive to compute with the
streaming NAR handler.
2020-07-27 18:24:10 +02:00