hydra

Author	SHA1	Message	Date
Pierre Bourdon	478bb01f7f	queue-runner: add prom metrics to allow detecting internal bottlenecks By looking at the ratio of running vs. waiting for the dispatcher and the queue monitor, we should get better visibility into what hydra is currently bottlenecked on. There are other side effects we can try to measure to get to the same result, but having a simple way doesn't cost us much.	2025-04-09 11:31:47 -04:00
John Ericson	9c022848cf	Fix the build	2025-04-09 11:31:47 -04:00
John Ericson	21c6afa83b	Fix build (due to C++ API changes)	2025-04-09 11:31:47 -04:00
John Ericson	ef7bf1e67b	Merge pull request #1375 from NixOS/nix-2.21 Nix 2.21	2024-04-12 17:28:37 -04:00
Maximilian Bosch	99afff03b0	hydra-queue-runner: drop broken connections from pool Closes #1336 When restarting postgresql, the connections are still reused in `hydra-queue-runner` causing errors like this main thread: Lost connection to the database server. queue monitor: Lost connection to the database server. and no more builds being processed. `hydra-evaluator` doesn't have that issue since it crashes right away. We could let it retry indefinitely as well (see below), but I don't want to change too much. If the DB is still unreachable 10s later, the process will stop with a non-zero exit code because of a missing DB connection. This however isn't such a big deal because it will be immediately restarted afterwards. With the current configuration, Hydra will never give up, but restart (and retry) infinitely. To me that seems reasonable, i.e. to retry DB connections on a long-running process. If this doesn't work out, the monitoring should fire anyways because the queue fills up, but I'm open to discuss that. Please note that this isn't reproducible with the DB and the queue runner on the same machine when using `services.hydra-dev`, because of the `Requires=` dependency `hydra-queue-runner.service` -> `hydra-init.service` -> `postgresql.service` that causes the queue runner to be restarted on `systemctl restart postgresql`. Internally, Hydra uses Nix's pool data structure: it basically has N slots (here DB connections) and whenever a new one is requested, an idle slot is provided or a new one is created (when N slots are active, the caller waits until a slot is free). The issue in the code here is that whenever an error is encountered, the slot is released, but the same broken connection is reused the next time. By using `Pool::Handle::markBad`, Nix will drop a broken slot. This is now being done when `pqxx::broken_connection` was caught.	2024-03-15 14:09:31 +01:00
Maximilian Bosch	e499509595	Switch to new Nix bindings, update Nix for that Implements support for Nix's new Perl bindings[1]. The current state basically does `openStore()`, but always uses `auto` and doesn't support stores at other URIs. Even though the stores are cached inside the Perl implementation, I decided to instantiate those once in the Nix helper module. That way store openings aren't cluttered across the entire codebase. Also, there are two stores used later on - MACHINE_LOCAL_STORE for `auto`, BINARY_CACHE_STORE for the one from `store_uri` in `hydra.conf` - and using consistent names should make the intent clearer then. This doesn't contain any behavioral changes, i.e. the build product availability issue from #1352 isn't fixed. This patch only contains the migration to the new API. [1] https://github.com/NixOS/nix/pull/9863	2024-02-12 18:50:56 +01:00
John Ericson	7b826ec5ad	Merge branch 'nix-next' into nix-2.20	2024-01-30 13:26:45 -05:00
John Ericson	fcde5908d8	More CA derivations prep Again, with care not to change the schema in any way.	2024-01-25 21:32:22 -05:00
John Ericson	7a53b866f6	Merge branch 'master' into nix-next • Updated input 'nix' (merge): 'github:NixOS/nix/212ba69e6f995992f8b4e4c0656d19c0156c8714' 'github:NixOS/nix/2c4bb93ba5a97e7078896ebc36385ce172960e4e' (2024-01-25) → 'github:NixOS/nix/8df68a213fc52a57b02a57005b0e06cc8de40ce3' (2024-01-25)	2024-01-25 16:26:07 -05:00
John Ericson	07cb5d1b7c	Use `nix::ParsedDerivation::getRequiredSystemFeatures()` A slight dedup, and also ensures that floating CA derivations require a `ca-derivations` experimental feature. This fixes the scheduling issue that @SuperSandro2000 found.	2024-01-24 21:04:14 -05:00
John Ericson	9e7ac58042	Merge branch 'master' into nix-next	2024-01-24 18:36:03 -05:00
John Ericson	89cfe26533	Merge remote-tracking branch 'upstream/master' into nix-next	2024-01-22 13:01:40 -05:00
John Ericson	588a0c5269	Merge remote-tracking branch 'upstream/master' into ca-derivations-prep	2023-12-23 19:19:54 -05:00
John Ericson	75f26f1fc4	Clean up `std::optional` dereferencing in the queue runner Instead of doing this partial operation a number of times, assert (with a comment), get a reference to the thing inside, and use that just once. (This refactor was done twice, "just once" for each time.)	2023-12-23 19:10:58 -05:00
John Ericson	6e67884ff1	One more `queryDerivationOutputMap` should use the eval store param	2023-12-11 14:05:18 -05:00
John Ericson	a6b6c5a539	Revert query -- those columns don't exist yet!	2023-12-11 12:58:54 -05:00
John Ericson	ebfefb9161	Sync up with some changes done to the main CA branch	2023-12-11 12:46:36 -05:00
John Ericson	20c8263e3c	Update to Nix master The point of this branch is to always track Nix master, so we are proactively ready to upgrade to the next Nix release when it is ready. Flake lock file updates: • Updated input 'nix': 'github:NixOS/nix/50f8f1c8bc019a4c0fd098b9ac674b94cfc6af0d' (2023-11-27) → 'github:NixOS/nix/c3827ff6348a4d5199eaddf8dbc2ca2e2ef46ec5' (2023-12-07) • Added input 'nix/libgit2': 'github:libgit2/libgit2/45fd9ed7ae1a9b74b957ef4f337bc3c8b3df01b5' (2023-10-18)	2023-12-07 13:11:31 -05:00
John Ericson	e3443cd22a	Put back nicer `copyClosure` instead of manual closure + copy It looks like we accidentally got the old code back, probably after a merge conflict resolution.	2023-12-04 17:41:11 -05:00
John Ericson	9ba4417940	Prepare for CA derivation support with lower impact changes This is just C++ changes without any Perl / Frontend / SQL Schema changes. The idea is that it should be possible to redeploy Hydra with these changes with (a) no schema migration and also (b) no regressions. We should be able to much more safely deploy these to a staging server and then production `hydra.nixos.org`. Extracted from #875 Co-Authored-By: Théophane Hufschmitt <theophane.hufschmitt@tweag.io> Co-Authored-By: Alexander Sosedkin <monk@unboiled.info> Co-Authored-By: Andrea Ciceri <andrea.ciceri@autistici.org> Co-Authored-By: Charlotte 🦝 Delenk <lotte@chir.rs> Co-Authored-By: Sandro Jäckel <sandro.jaeckel@gmail.com>	2023-12-04 16:14:47 -05:00
chayleaf	e9da80fff6	support nix 2.18	2023-11-21 18:41:52 +07:00
Eelco Dolstra	9f69bb5c2c	Fix compilation against Nix 2.16	2023-06-23 15:06:55 +02:00
Maximilian Bosch	5c01800fbe	flake: Update Nix to 2.9.1 NOTE: I'm well-aware that we have to be careful with this to avoid new regressions on hydra.nixos.org, so this should only be merged after extensive testing from more people. Motivation: I updated Nix in my deployment to 2.9.1 and decided to also update Hydra in one go (and compile it against the newer Nix). Given that this also updates the C++ code in `hydra-{queue-runner,eval-jobs}` this patch might become useful in the future though.	2022-06-16 14:54:57 +02:00
Graham Christensen	e1965250b5	Merge pull request #1173 from DeterminateSystems/queue-runner-exporter hydra-queue-runner metrics	2022-04-07 12:27:33 -04:00
Graham Christensen	59ac96a99c	Track the number of steps created	2022-04-06 20:23:02 -04:00
Graham Christensen	1c12c5882f	hydra queue runner: instrument the process of loading new builds with prom	2022-04-06 20:18:29 -04:00
Graham Christensen	5de08d412e	queue metrics: refactor the metrics into a struct	2022-04-06 20:00:30 -04:00
Graham Christensen	46f52b4c4e	bring back the working version Cole made	2022-04-06 15:49:38 -04:00
Cole Helbling	5bff730f2c	WIP: I love it when they delete the assignment operator :)	2022-04-06 11:41:40 -07:00
ajs124	089da272c7	fix build against nix 2.7.0 fix build after such commits as df552ff53e68dff8ca360adbdbea214ece1d08ee and e862833ec662c1bffbe31b9a229147de391e801a	2022-03-29 15:38:24 -04:00
ajs124	c64c5f0a7e	hydra-queue-runner: rename build-result.hh to hydra-build-result.hh	2022-03-29 15:34:29 -04:00
Graham Christensen	3b048ed136	Revert "Revert "Use `copyClosure` instead of `computeFSClosure` + `copyPaths`"" This reverts commit 8e3ada2afcc2dd5153d3ae162afbb0633a570285.	2022-03-29 15:28:47 -04:00
Cole Helbling	8e3ada2afc	Revert "Use `copyClosure` instead of `computeFSClosure` + `copyPaths`" This reverts commit f14c583ce5188903f7c9db6f99c8c3fb42c77416.	2022-03-28 09:54:02 -07:00
John Ericson	f14c583ce5	Use `copyClosure` instead of `computeFSClosure` + `copyPaths` It is more terse, and in the future it is possible `copyClosure` will become more sophisticated.	2022-02-19 11:59:17 -05:00
Graham Christensen	72c3110002	queue-runner: track jobsets by ID	2022-01-15 14:06:00 -05:00
Eelco Dolstra	5edb58b314	Fix build	2021-08-10 13:47:16 +02:00
Graham Christensen	87d46ad5d6	hydra-queue-runner: --build-one: correctly handle a cached build Previously, the build ID would never flow through channels which exited. This patch tracks the buildOne state as part of State and exits, which avoids waiting forever for new work. The code around buildOnly is a bit rough, which makes this a bit weird to implement, but since it is only used for testing, the value of improving it on its own is a bit questionable.	2021-03-16 16:13:38 -04:00
Shea Levy	930f05c38e	Bump Nix version	2021-03-10 12:53:03 -05:00
Maximilian Bosch	9cc76f6d69	Fix build with latest Nix Recently a few internal APIs have changed[1]. The `outputPaths` function has been removed and a lot of data structures are modeled with `std::optional` which broke compilation. This patch updates the code in `hydra-queue-runner` accordingly to make sure that Hydra compiles again. [1] https://github.com/NixOS/nix/pull/3883	2020-09-26 23:37:39 +02:00
Eelco Dolstra	405c52b589	Fix build	2020-08-27 17:46:36 +02:00
Eelco Dolstra	1113c2895a	Fix build	2020-08-07 21:42:09 +02:00
Eelco Dolstra	4b5813051b	unsigned long long -> uint64_t	2020-08-04 11:38:22 +02:00
Eelco Dolstra	7d3ba616a9	Fix build	2020-08-04 11:33:29 +02:00
Eelco Dolstra	5b4df3ad5a	Get data needed by getBuildOutput() from the incoming NAR in a streaming fashion	2020-07-27 20:38:59 +02:00
Eelco Dolstra	d4e4be4fd1	Remove SHA-1 hash from BuildProducts SHA-1 is deprecated and it will be expensive to compute with the streaming NAR handler.	2020-07-27 18:24:10 +02:00
Eelco Dolstra	7985757a1d	Fix build	2020-07-08 12:50:02 +02:00
Eelco Dolstra	bb32aafa4a	Fix build	2020-06-23 13:56:44 +02:00
Maximilian Bosch	2f9d422172	Fix build against latest Nix	2020-04-07 13:55:38 +02:00
Kevin Quick	a055796ef5	Merge branch 'master' into libpqxx_undeprecate	2020-04-01 11:54:41 -07:00
Eelco Dolstra	adf61e5cf8	Fix build (cherry picked from commit 639c660abfd5de62ecfcd8d3cbc2eb6924c7ec75)	2020-02-20 10:26:45 +01:00
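
The pool behaviour described in commit 99afff03b0 above (a released slot is reused unless it was marked bad) can be illustrated with a minimal sketch. This is not Nix's actual `Pool` implementation: `SimplePool`, `FakeConnection`, and `runQuery` are hypothetical names invented for illustration, the pool is single-threaded and has no slot limit, and a `std::runtime_error` stands in for `pqxx::broken_connection`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for a database connection.
struct FakeConnection {
    int id = 0;
    bool broken = false;
    void query() {
        if (broken)
            throw std::runtime_error("Lost connection to the database server.");
    }
};

// Simplified, single-threaded pool (an assumption for this sketch).
class SimplePool {
    std::vector<std::unique_ptr<FakeConnection>> idle;
    int nextId = 0;

public:
    class Handle {
        SimplePool & pool;
        std::unique_ptr<FakeConnection> conn;
        bool bad = false;

    public:
        Handle(SimplePool & pool, std::unique_ptr<FakeConnection> conn)
            : pool(pool), conn(std::move(conn)) {}
        Handle(Handle && other)
            : pool(other.pool), conn(std::move(other.conn)), bad(other.bad) {}

        // On release, a healthy slot goes back to the idle list;
        // a slot marked bad is simply destroyed.
        ~Handle() {
            if (conn && !bad) pool.idle.push_back(std::move(conn));
        }

        FakeConnection * operator->() {
            assert(conn);
            return conn.get();
        }

        void markBad() { bad = true; }
    };

    // Hand out an idle slot if there is one, otherwise create a new one.
    Handle get() {
        if (idle.empty()) {
            auto fresh = std::make_unique<FakeConnection>();
            fresh->id = nextId++;
            return Handle(*this, std::move(fresh));
        }
        auto conn = std::move(idle.back());
        idle.pop_back();
        return Handle(*this, std::move(conn));
    }

    std::size_t idleCount() const { return idle.size(); }
};

// Runs one query and returns the id of the connection that was used.
// If breakIt is set, the connection is broken first to simulate a
// database restart; the failure is caught and the slot is marked bad.
int runQuery(SimplePool & pool, bool breakIt) {
    auto conn = pool.get();
    int id = conn->id;
    if (breakIt) conn->broken = true;
    try {
        conn->query();
    } catch (const std::runtime_error &) {
        conn.markBad();
    }
    return id;
}
```

Without the `markBad()` call in the `catch` block, the destructor would return the broken connection to the idle list and the next caller would receive it again, which is the failure mode the commit message describes.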

99 Commits
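
The `std::optional` cleanup described in commit 75f26f1fc4 (assert once with a comment, bind a reference, then use only the reference) might look roughly like the sketch below. The `Step` struct, its `drvPath` field, and the `describe` function are hypothetical and are not taken from the queue runner's source.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical type with an optional member.
struct Step {
    std::optional<std::string> drvPath;
};

// Writes a description into `out` and returns the length of the path.
std::size_t describe(const Step & step, std::string & out) {
    // Invariant assumed for this sketch: callers only pass steps whose
    // path is already known, so the optional is always engaged here.
    assert(step.drvPath);

    // Dereference exactly once and keep a reference to the contents.
    const std::string & path = *step.drvPath;

    // Every later use goes through `path`, never through `*step.drvPath`.
    out = "building " + path;
    return path.size();
}
```

Keeping the partial operation in one place means the precondition is stated and checked once, rather than being implied at every use.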