hydra

Author	SHA1	Message	Date
Maximilian Bosch	99afff03b0	hydra-queue-runner: drop broken connections from pool Closes #1336 When restarting postgresql, the connections are still reused in `hydra-queue-runner`, causing errors like `main thread: Lost connection to the database server.` and `queue monitor: Lost connection to the database server.`, after which no more builds are processed. `hydra-evaluator` doesn't have that issue since it crashes right away. We could let it retry indefinitely as well (see below), but I don't want to change too much. If the DB is still unreachable 10s later, the process will stop with a non-zero exit code because of a missing DB connection. This, however, isn't a big deal because it will be restarted immediately afterwards. With the current configuration, Hydra never gives up, but restarts (and retries) indefinitely. To me, retrying DB connections in a long-running process seems reasonable. If this doesn't work out, the monitoring should fire anyway because the queue fills up, but I'm open to discussing that. Please note that this isn't reproducible with the DB and the queue runner on the same machine when using `services.hydra-dev`, because the `Requires=` dependency chain `hydra-queue-runner.service` -> `hydra-init.service` -> `postgresql.service` causes the queue runner to be restarted on `systemctl restart postgresql`. Internally, Hydra uses Nix's pool data structure: it has N slots (here, DB connections), and whenever a slot is requested, an idle one is handed out or a new one is created (when all N slots are in use, the caller waits until one is free). The problem is that whenever an error is encountered, the slot is released back to the pool, so the same broken connection is reused the next time. By using `Pool::Handle::markBad`, Nix drops a broken slot instead. This is now done whenever `pqxx::broken_connection` is caught.	2024-03-15 14:09:31 +01:00
John Ericson	449eb2d873	Use more `nix::Machine` fields The upstream fields were made to match Hydra, so we can get rid of the extra fields temporarily added in 70e5469303b422bdb4b123be222bdea4d7f9611c.	2024-01-24 20:14:31 -05:00
John Ericson	9e7ac58042	Merge branch 'master' into nix-next	2024-01-24 18:36:03 -05:00
John Ericson	d45e14fd43	Merge pull request #1316 from NixOS/ca-derivations-prep Prepare for CA derivation support with lower impact changes	2024-01-24 18:12:42 -05:00
John Ericson	9a86da0e7b	Merge branch 'master' into nix-next	2024-01-23 15:49:14 -05:00
John Ericson	70e5469303	Use Nix's `Machine` type in a minimal way This is just using the fields from that type, and only where the types coincide. (There are two fields with different types, `speedFactor` most interestingly.) No code is reused, so we can be sure that no behavior is changed. Once the types are reconciled on the Nix side, we can start carefully reusing actual code. Progress on #1164	2024-01-23 12:18:57 -05:00
John Ericson	4ac31c89df	Use `nix::serv_proto::BasicConnection` in build_remote.cc - Use the type itself. This lays the foundation for being able to dedup the protocol code. - Use `BasicConnection::handshake`, replacing ours. - Use `BasicConnection::queryValidPaths`. - Use `BasicConnection::putBuildDerivationRequest`.	2024-01-22 14:20:39 -05:00
John Ericson	69a5b00e60	Use `ServeProto::BuildOption` More deduplication with Nix.	2023-12-10 13:01:00 -05:00
John Ericson	9ba4417940	Prepare for CA derivation support with lower impact changes This is just C++ changes without any Perl / Frontend / SQL Schema changes. The idea is that it should be possible to redeploy Hydra with these changes with (a) no schema migration and (b) no regressions. We should be able to deploy these much more safely to a staging server and then to production `hydra.nixos.org`. Extracted from #875 Co-Authored-By: Théophane Hufschmitt <theophane.hufschmitt@tweag.io> Co-Authored-By: Alexander Sosedkin <monk@unboiled.info> Co-Authored-By: Andrea Ciceri <andrea.ciceri@autistici.org> Co-Authored-By: Charlotte 🦝 Delenk Mlotte@chir.rs> Co-Authored-By: Sandro Jäckel <sandro.jaeckel@gmail.com>	2023-12-04 16:14:47 -05:00
John Ericson	104baef503	Document the connection initialization process	2023-12-04 09:42:04 -05:00
John Ericson	67eeabd518	Merge remote-tracking branch 'upstream/master' into split-buildRemote	2023-12-04 09:12:58 -05:00
John Ericson	622c25e3c4	Sedding prior to merge	2023-12-04 08:56:06 -05:00
John Ericson	831a2d9bd5	Merge remote-tracking branch 'upstream/master' into split-buildRemote	2023-11-30 11:27:40 -05:00
John Ericson	3526d61ff2	Merge remote-tracking branch 'upstream/master' into split-buildRemote	2022-10-25 11:24:54 -04:00
Jörg Thalheim	94d19e1972	hydra: fix localhost detection when protocol prefixes are used	2022-09-29 20:46:13 +02:00
Graham Christensen	e1965250b5	Merge pull request #1173 from DeterminateSystems/queue-runner-exporter hydra-queue-runner metrics	2022-04-07 12:27:33 -04:00
Graham Christensen	59ac96a99c	Track the number of steps created	2022-04-06 20:23:02 -04:00
Graham Christensen	1c12c5882f	hydra queue runner: instrument the process of loading new builds with prom	2022-04-06 20:18:29 -04:00
Graham Christensen	5de08d412e	queue metrics: refactor the metrics into a struct	2022-04-06 20:00:30 -04:00
Graham Christensen	46f52b4c4e	bring back the working version Cole made	2022-04-06 15:49:38 -04:00
Cole Helbling	5bff730f2c	WIP: I love it when they delete the assignment operator :)	2022-04-06 11:41:40 -07:00
Cole Helbling	edf3c348f2	hydra-queue-runner: make entire address configurable	2022-04-06 10:59:45 -07:00
ajs124	089da272c7	fix build against nix 2.7.0 fix build after such commits as df552ff53e68dff8ca360adbdbea214ece1d08ee and e862833ec662c1bffbe31b9a229147de391e801a	2022-03-29 15:38:24 -04:00
Cole Helbling	4789eba92c	hydra-queue-runner: split metrics functionality into its own function	2022-03-29 10:55:28 -07:00
Cole Helbling	928b3b8268	hydra-queue-runner: fix priority of flag over config file	2022-03-29 10:42:07 -07:00
Cole Helbling	905a7a7beb	hydra-queue-runner: read metrics port from `queue_runner_metrics_port` config	2022-03-29 08:46:43 -07:00
Théophane Hufschmitt	b430d41afd	Use the `BuildOptions` more eagerly	2022-03-29 17:04:19 +02:00
Théophane Hufschmitt	365776f5d7	Factor out the building part	2022-03-29 17:04:19 +02:00
Théophane Hufschmitt	5db8642224	Factor out a struct representing a connection to a machine	2022-03-29 16:52:59 +02:00
Cole Helbling	52a29d43e6	hydra-queue-runner: make registry member of State, configurable metrics port Thanks to the updated prometheus-cpp library, specifying a port of 0 will cause it to pick a random (available) port -- ideal for tests.	2022-03-11 11:58:10 -08:00
Graham Christensen	4acaf9c8b0	hydra-queue-runner: don't dispatch until the machines parser has completed one run Periodically, I have seen tests fail because of out-of-order queue runner behavior: `checking the queue for builds > 0...`, `loading build 1 (tests:basic:empty_dir)`, `aborting unsupported build step '...-empty-dir.drv' (type 'x86_64-linux')`, `marking build 1 as failed`, `adding new machine ‘localhost’`. This patch should prevent the dispatcher from running before any machines are made available.	2022-02-10 10:54:30 -05:00
Graham Christensen	72c3110002	queue-runner: track jobsets by ID	2022-01-15 14:06:00 -05:00
Graham Christensen	87d46ad5d6	hydra-queue-runner: --build-one: correctly handle a cached build Previously, the build ID would never flow through channels which exited. This patch tracks the buildOne state as part of State and exits, avoiding waiting forever for new work. The code around buildOnly is a bit rough, which makes this awkward to implement, but since it is only used for testing, the value of improving it on its own is questionable.	2021-03-16 16:13:38 -04:00
Eelco Dolstra	5b4df3ad5a	Get data needed by getBuildOutput() from the incoming NAR in a streaming fashion	2020-07-27 20:38:59 +02:00
Eelco Dolstra	cbcf6359b4	Remove TokenServer in preparation of making NAR copying O(1) memory	2020-07-27 14:57:22 +02:00
Eelco Dolstra	7985757a1d	Fix build	2020-07-08 12:50:02 +02:00
Eelco Dolstra	bb32aafa4a	Fix build	2020-06-23 13:56:44 +02:00
Eelco Dolstra	9727892b61	Don't spam the journal with hydra-queue-runner status dumps (cherry picked from commit 15ae932488512ba235ed2f6f841cc5eb56ba9314)	2020-03-31 22:19:07 +02:00
Eelco Dolstra	ccd046ca3d	Keep track of the number of unsupported steps (cherry picked from commit 45ffe578b695f9de101b30d44d46f12aa0654f10)	2020-03-31 22:19:03 +02:00
Eelco Dolstra	4417f9f260	Abort unsupported build steps If no machine that supports a build step has been seen for 'max_unsupported_time' seconds, the step is aborted. The default is 0, which is appropriate for Hydra installations that don't provision missing machines dynamically. (cherry picked from commit f5cdbfe21d930db43d3812c7d8e87746d6378ef9)	2020-03-31 22:19:01 +02:00
Eelco Dolstra	adf61e5cf8	Fix build (cherry picked from commit 639c660abfd5de62ecfcd8d3cbc2eb6924c7ec75)	2020-02-20 10:26:45 +01:00
Eelco Dolstra	e4f5156c41	Build against nix-master (cherry picked from commit e7f2139e251cb73195eea6fb84e2a6167b4db968)	2020-02-20 10:24:04 +01:00
Eelco Dolstra	d4b4255dd2	hydra-queue-runner: Support running in a NixOS container In a NixOS container, cmdBuildDerivation doesn't work because we're not privileged. But we also don't need it because the store already has the derivation. Also, don't copy from/to the store since this gives errors about missing signatures.	2019-09-25 17:26:03 +02:00
Eelco Dolstra	2946899504	Turn hydra-notify into a daemon It now receives notifications about started/finished builds/steps via PostgreSQL. This gets rid of the (substantial) overhead of starting hydra-notify for every event. It also allows other programs (even on other machines) to listen to Hydra notifications.	2019-08-13 18:18:21 +02:00
Eelco Dolstra	8d26144121	Fix building against nix master	2018-10-30 14:41:21 +01:00
Eelco Dolstra	5a1f2a50e5	Handle derivations with system type 'builtin' Fixes #540.	2018-03-07 10:22:35 +01:00
Eelco Dolstra	e9670641ec	Distinguish build step states The web interface now shows whether a build step is connecting, copying inputs/outputs, building, etc.	2017-12-07 15:35:31 +01:00
Eelco Dolstra	b04dc6c76e	Fix root creation when the root already exists but is owned by another user	2017-10-19 12:28:38 +02:00
Eelco Dolstra	45b138373b	hydra-queue-runner: Write GC roots for output paths We lost this behaviour somewhere, so build outputs could be GC'ed when running the collector with --option gc-keep-outputs false.	2017-10-12 18:55:38 +02:00
Eelco Dolstra	27103398c9	Make maxLogSize configurable	2017-09-22 15:23:58 +02:00


115 Commits
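
The pool behavior described in commit 99afff03b0 (N slots, reuse of idle slots, waiting when all slots are busy, and dropping a slot via `markBad` instead of returning it) can be illustrated with a small sketch. It assumes a hypothetical `SketchPool` class modeled loosely on that description; it is not the actual `nix::Pool` code, and its names and signatures are illustrative only.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical, simplified model of the pool behavior described in commit
// 99afff03b0. It is NOT the real nix::Pool; all names are illustrative.
template <typename R>
class SketchPool {
public:
    using Factory = std::function<std::shared_ptr<R>()>;

    SketchPool(std::size_t maxSize, Factory factory)
        : maxSize_(maxSize), factory_(std::move(factory)) {}

    // RAII handle: returns the resource to the idle list on destruction,
    // unless markBad() was called, in which case the resource is dropped.
    class Handle {
    public:
        Handle(SketchPool& pool, std::shared_ptr<R> r)
            : pool_(&pool), r_(std::move(r)) {}
        Handle(Handle&& other) noexcept
            : pool_(other.pool_), r_(std::move(other.r_)), bad_(other.bad_) {
            other.pool_ = nullptr;  // moved-from handle releases nothing
        }
        Handle(const Handle&) = delete;
        Handle& operator=(const Handle&) = delete;
        Handle& operator=(Handle&&) = delete;
        ~Handle() {
            if (pool_) pool_->release(std::move(r_), bad_);
        }

        R* operator->() { return r_.get(); }
        void markBad() { bad_ = true; }

    private:
        SketchPool* pool_;
        std::shared_ptr<R> r_;
        bool bad_ = false;
    };

    // Hand out an idle resource if there is one; otherwise create a new one.
    // Blocks while all maxSize_ slots are in use.
    Handle get() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return inUse_ < maxSize_; });
        ++inUse_;
        if (!idle_.empty()) {
            std::shared_ptr<R> r = std::move(idle_.back());
            idle_.pop_back();
            return Handle(*this, std::move(r));
        }
        lock.unlock();  // don't hold the lock while connecting
        try {
            return Handle(*this, factory_());
        } catch (...) {
            release(nullptr, true);  // give the slot back if creation failed
            throw;
        }
    }

    std::size_t idleCount() {
        std::lock_guard<std::mutex> guard(mutex_);
        return idle_.size();
    }

private:
    void release(std::shared_ptr<R> r, bool bad) {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            assert(inUse_ > 0);
            --inUse_;
            if (!bad) idle_.push_back(std::move(r));  // bad resources are dropped
        }
        cv_.notify_one();
    }

    std::size_t maxSize_;
    Factory factory_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::size_t inUse_ = 0;
    std::vector<std::shared_ptr<R>> idle_;
};
```

Without `markBad()`, a handle whose connection broke would be pushed back onto the idle list and handed out again, which matches the failure mode the commit describes. In the commit's terms, a caller would catch `pqxx::broken_connection`, call `markBad()` on its handle, and rethrow, so the next `get()` creates a fresh connection.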