101 Commits

Author SHA1 Message Date
Pierre Bourdon
2050b2c324 queue-runner: reduce the time between queue monitor restarts
This will induce more DB queries (though these are fairly cheap), but at
the benefit of processing bumps within 1m instead of within 10m.
2025-04-09 11:31:47 -04:00
Pierre Bourdon
21d6d805ba queue-runner: remove id > X from new builds query
Running the query with/without it shows that it makes no difference to
postgres, since there's an index on finished=0 already. This allows a
few simplifications, but also paves the way towards running multiple
parallel monitor threads in the future.
2025-04-09 11:31:47 -04:00
Pierre Bourdon
478bb01f7f queue-runner: add prom metrics to allow detecting internal bottlenecks
By looking at the ratio of running vs. waiting for the dispatcher and
the queue monitor, we should get better visibility into what hydra is
currently bottlenecked on.

There are other side effects we can try to measure to get to the same
result, but having a simple way doesn't cost us much.
2025-04-09 11:31:47 -04:00
John Ericson
9c022848cf Fix the build 2025-04-09 11:31:47 -04:00
John Ericson
21c6afa83b Fix build (due to C++ API changes) 2025-04-09 11:31:47 -04:00
John Ericson
ef7bf1e67b
Merge pull request #1375 from NixOS/nix-2.21
Nix 2.21
2024-04-12 17:28:37 -04:00
Maximilian Bosch
99afff03b0
hydra-queue-runner: drop broken connections from pool
Closes #1336

When restarting postgresql, the connections are still reused in
`hydra-queue-runner` causing errors like this

    main thread: Lost connection to the database server.
    queue monitor: Lost connection to the database server.

and no more builds being processed.

`hydra-evaluator` doesn't have that issue since it crashes right away.
We could let it retry indefinitely as well (see below), but I don't
want to change too much.

If the DB is still unreachable 10s later, the process will stop with a
non-zero exit code because of a missing DB connection. This however
isn't such a big deal because it will be immediately restarted
afterwards. With the current configuration, Hydra will never give up,
but restart (and retry) infinitely. To me that seems reasonable, i.e. to
retry DB connections on a long-running process. If this doesn't work
out, the monitoring should fire anyways because the queue fills up, but
I'm open to discuss that.

Please note that this isn't reproducible with the DB and the queue
runner on the same machine when using `services.hydra-dev`, because of
the `Requires=` dependency `hydra-queue-runner.service` ->
`hydra-init.service` -> `postgresql.service` that causes the queue
runner to be restarted on `systemctl restart postgresql`.

Internally, Hydra uses Nix's pool data structure: it basically has N
slots (here DB connections) and whenever a new one is requested, an idle
slot is provided or a new one is created (when N slots are active, it'll
be waited until one slot is free). The issue in the code here is however
that whenever an error is encountered, the slot is released, however the
same broken connection will be reused the next time. By using
`Pool::Handle::markBad`, Nix will drop a broken slot. This is now being
done when `pqxx::broken_connection` was caught.
2024-03-15 14:09:31 +01:00
Maximilian Bosch
e499509595
Switch to new Nix bindings, update Nix for that
Implements support for Nix's new Perl bindings[1]. The current state
basically does `openStore()`, but always uses `auto` and doesn't support
stores at other URIs.

Even though the stores are cached inside the Perl implementation, I
decided to instantiate those once in the Nix helper module. That way
store openings aren't cluttered across the entire codebase. Also, there
are two stores used later on - MACHINE_LOCAL_STORE for `auto`,
BINARY_CACHE_STORE for the one from `store_uri` in `hydra.conf` - and
using consistent names should make the intent clearer then.

This doesn't contain any behavioral changes, i.e. the build product
availability issue from #1352 isn't fixed. This patch only contains the
migration to the new API.

[1] https://github.com/NixOS/nix/pull/9863
2024-02-12 18:50:56 +01:00
John Ericson
7b826ec5ad Merge branch 'nix-next' into nix-2.20 2024-01-30 13:26:45 -05:00
John Ericson
fcde5908d8 More CA derivations prep
Again, with care not to change the schema in any way.
2024-01-25 21:32:22 -05:00
John Ericson
7a53b866f6 Merge branch 'master' into nix-next
• Updated input 'nix' (merge):
    'github:NixOS/nix/212ba69e6f995992f8b4e4c0656d19c0156c8714'
    'github:NixOS/nix/2c4bb93ba5a97e7078896ebc36385ce172960e4e' (2024-01-25)
  → 'github:NixOS/nix/8df68a213fc52a57b02a57005b0e06cc8de40ce3' (2024-01-25)
2024-01-25 16:26:07 -05:00
John Ericson
07cb5d1b7c Use nix::ParsedDerivation::getRequiredSystemFeatures()
A slight dedup, and also ensures that floating CA derivations require a
`ca-derivations` experimental feature. This fixes the scheduling issue
that @SuperSandro2000 found.
2024-01-24 21:04:14 -05:00
John Ericson
9e7ac58042 Merge branch 'master' into nix-next 2024-01-24 18:36:03 -05:00
John Ericson
89cfe26533 Merge remote-tracking branch 'upstream/master' into nix-next 2024-01-22 13:01:40 -05:00
John Ericson
588a0c5269 Merge remote-tracking branch 'upstream/master' into ca-derivations-prep 2023-12-23 19:19:54 -05:00
John Ericson
75f26f1fc4 Clean up std::optional dereferencing in the queue runner
Instead of doing this partial operation a number of times, assert (with
a comment, get a reference to the thing inside, and use that just once.
(This refactor was done twice, "just once" for each time.)
2023-12-23 19:10:58 -05:00
John Ericson
6e67884ff1 One more queryDerivationOutputMap should use the eval store param 2023-12-11 14:05:18 -05:00
John Ericson
a6b6c5a539 Revert query -- those columns don't exist yet! 2023-12-11 12:58:54 -05:00
John Ericson
ebfefb9161 Sync up with some changes done to the main CA branch 2023-12-11 12:46:36 -05:00
John Ericson
20c8263e3c Update to Nix master
The point of this branch is to always track Nix master, so we are
proactively ready to upgrade to the next Nix release when it is ready.

Flake lock file updates:

• Updated input 'nix':
    'github:NixOS/nix/50f8f1c8bc019a4c0fd098b9ac674b94cfc6af0d' (2023-11-27)
  → 'github:NixOS/nix/c3827ff6348a4d5199eaddf8dbc2ca2e2ef46ec5' (2023-12-07)
• Added input 'nix/libgit2':
    'github:libgit2/libgit2/45fd9ed7ae1a9b74b957ef4f337bc3c8b3df01b5' (2023-10-18)
2023-12-07 13:11:31 -05:00
John Ericson
e3443cd22a Put back nicer copyClosure instead of manual closure + copy
It looks like we accidentally got the old code back, probably after a merge
conflict resolution.
2023-12-04 17:41:11 -05:00
John Ericson
9ba4417940 Prepare for CA derivation support with lower impact changes
This is just C++ changes without any Perl / Frontend / SQL Schema
changes.

The idea is that it should be possible to redeploy Hydra with these
chnages with (a) no schema migration and also (b) no regressions. We
should be able to much more safely deploy these to a staging server and
then production `hydra.nixos.org`.

Extracted from #875

Co-Authored-By: Théophane Hufschmitt <theophane.hufschmitt@tweag.io>
Co-Authored-By: Alexander Sosedkin <monk@unboiled.info>
Co-Authored-By: Andrea Ciceri <andrea.ciceri@autistici.org>
Co-Authored-By: Charlotte 🦝 Delenk Mlotte@chir.rs>
Co-Authored-By: Sandro Jäckel <sandro.jaeckel@gmail.com>
2023-12-04 16:14:47 -05:00
chayleaf
e9da80fff6
support nix 2.18 2023-11-21 18:41:52 +07:00
Eelco Dolstra
9f69bb5c2c Fix compilation against Nix 2.16 2023-06-23 15:06:55 +02:00
Maximilian Bosch
5c01800fbe
flake: Update Nix to 2.9.1
NOTE: I'm well-aware that we have to be careful with this to avoid new
regressions on hydra.nixos.org, so this should only be merged after
extensive testing from more people.

Motivation: I updated Nix in my deployment to 2.9.1 and decided to also
update Hydra in one go (and compile it against the newer Nix). Given
that this also updates the C++ code in `hydra-{queue-runner,eval-jobs}`
this patch might become useful in the future though.
2022-06-16 14:54:57 +02:00
Graham Christensen
e1965250b5
Merge pull request #1173 from DeterminateSystems/queue-runner-exporter
hydra-queue-runner metrics
2022-04-07 12:27:33 -04:00
Graham Christensen
59ac96a99c Track the number of steps created 2022-04-06 20:23:02 -04:00
Graham Christensen
1c12c5882f hydra queue runner: instrument the process of loading new builds with prom 2022-04-06 20:18:29 -04:00
Graham Christensen
5de08d412e queue metrics: refactor the metrics into a struct 2022-04-06 20:00:30 -04:00
Graham Christensen
46f52b4c4e bring back the working version Cole made 2022-04-06 15:49:38 -04:00
Cole Helbling
5bff730f2c WIP: I love it when they delete the assignment operator :) 2022-04-06 11:41:40 -07:00
ajs124
089da272c7 fix build against nix 2.7.0
fix build after such commits as df552ff53e68dff8ca360adbdbea214ece1d08ee
and e862833ec662c1bffbe31b9a229147de391e801a
2022-03-29 15:38:24 -04:00
ajs124
c64c5f0a7e hydra-queue-runner: rename build-result.hh to hydra-build-result.hh 2022-03-29 15:34:29 -04:00
Graham Christensen
3b048ed136 Revert "Revert "Use copyClosure instead of computeFSClosure + copyPaths""
This reverts commit 8e3ada2afcc2dd5153d3ae162afbb0633a570285.
2022-03-29 15:28:47 -04:00
Cole Helbling
8e3ada2afc Revert "Use copyClosure instead of computeFSClosure + copyPaths"
This reverts commit f14c583ce5188903f7c9db6f99c8c3fb42c77416.
2022-03-28 09:54:02 -07:00
John Ericson
f14c583ce5 Use copyClosure instead of computeFSClosure + copyPaths
It is more terse, and in the future it is possible `copyClosure` will
become more sophisticated.
2022-02-19 11:59:17 -05:00
Graham Christensen
72c3110002 queue-runner: track jobsets by ID 2022-01-15 14:06:00 -05:00
Eelco Dolstra
5edb58b314 Fix build 2021-08-10 13:47:16 +02:00
Graham Christensen
87d46ad5d6
hydra-queue-runner: --build-one: correctly handle a cached build
Previously, the build ID would never flow through channels which
exited.

This patch tracks the buildOne state as part of State and exits avoids
waiting forever for new work.

The code around buildOnly is a bit rough, making this a bit weird to
implement but since it is only used for testing the value of improving
it on its own is a bit questionable.
2021-03-16 16:13:38 -04:00
Shea Levy
930f05c38e
Bump Nix version 2021-03-10 12:53:03 -05:00
Maximilian Bosch
9cc76f6d69
Fix build with latest Nix
Recently a few internal APIs have changed[1]. The `outputPaths` function
has been removed and a lot of data structures are modeled with
`std::optional` which broke compilation.

This patch updates the code in `hydra-queue-runner` accordingly to make
sure that Hydra compiles again.

[1] https://github.com/NixOS/nix/pull/3883
2020-09-26 23:37:39 +02:00
Eelco Dolstra
405c52b589
Fix build 2020-08-27 17:46:36 +02:00
Eelco Dolstra
1113c2895a Fix build 2020-08-07 21:42:09 +02:00
Eelco Dolstra
4b5813051b
unsigned long long -> uint64_t 2020-08-04 11:38:22 +02:00
Eelco Dolstra
7d3ba616a9
Fix build 2020-08-04 11:33:29 +02:00
Eelco Dolstra
5b4df3ad5a
Get data needed by getBuildOutput() from the incoming NAR in a streaming fashion 2020-07-27 20:38:59 +02:00
Eelco Dolstra
d4e4be4fd1
Remove SHA-1 hash from BuildProducts
SHA-1 is deprecated and it will be expensive to compute with the
streaming NAR handler.
2020-07-27 18:24:10 +02:00
Eelco Dolstra
7985757a1d
Fix build 2020-07-08 12:50:02 +02:00
Eelco Dolstra
bb32aafa4a
Fix build 2020-06-23 13:56:44 +02:00
Maximilian Bosch
2f9d422172
Fix build against latest Nix 2020-04-07 13:55:38 +02:00