115 Commits

Author SHA1 Message Date
Maximilian Bosch
99afff03b0
hydra-queue-runner: drop broken connections from pool
Closes #1336

When restarting postgresql, the connections are still reused in
`hydra-queue-runner` causing errors like this

    main thread: Lost connection to the database server.
    queue monitor: Lost connection to the database server.

and no more builds being processed.

`hydra-evaluator` doesn't have that issue since it crashes right away.
We could let it retry indefinitely as well (see below), but I don't
want to change too much.

If the DB is still unreachable 10s later, the process will stop with a
non-zero exit code because of a missing DB connection. This however
isn't such a big deal because it will be immediately restarted
afterwards. With the current configuration, Hydra will never give up,
but restart (and retry) infinitely. To me that seems reasonable, i.e. to
retry DB connections on a long-running process. If this doesn't work
out, the monitoring should fire anyways because the queue fills up, but
I'm open to discuss that.

Please note that this isn't reproducible with the DB and the queue
runner on the same machine when using `services.hydra-dev`, because of
the `Requires=` dependency `hydra-queue-runner.service` ->
`hydra-init.service` -> `postgresql.service` that causes the queue
runner to be restarted on `systemctl restart postgresql`.

Internally, Hydra uses Nix's pool data structure: it basically has N
slots (here DB connections) and whenever a new one is requested, an idle
slot is provided or a new one is created (when N slots are active, it'll
be waited until one slot is free). The issue in the code here is however
that whenever an error is encountered, the slot is released, however the
same broken connection will be reused the next time. By using
`Pool::Handle::markBad`, Nix will drop a broken slot. This is now being
done when `pqxx::broken_connection` was caught.
2024-03-15 14:09:31 +01:00
John Ericson
449eb2d873 Use more nix::Machine fields
The upstream fields were made to match Hydra, so we can get rid of the
extra fields temporary added in
70e5469303b422bdb4b123be222bdea4d7f9611c.
2024-01-24 20:14:31 -05:00
John Ericson
9e7ac58042 Merge branch 'master' into nix-next 2024-01-24 18:36:03 -05:00
John Ericson
d45e14fd43
Merge pull request #1316 from NixOS/ca-derivations-prep
Prepare for CA derivation support with lower impact changes
2024-01-24 18:12:42 -05:00
John Ericson
9a86da0e7b Merge branch 'master' into nix-next 2024-01-23 15:49:14 -05:00
John Ericson
70e5469303 Use Nix's Machine type in a mimimal way
This is *just* using the fields from that type, and only where the types
coincide. (There are two fields with different types, `speedFactor` most
interestingly.) No code is reused, so we can be sure that no behavior is
changed.

Once the types are reconciled on the Nix side, then we can start
carefully actually reusing code.

Progress on #1164
2024-01-23 12:18:57 -05:00
John Ericson
4ac31c89df Use nix::serv_proto::BasicConnection in build_remote.cc
- Use the type itself

  This lays the foundation for being able to dedup the protocol code.

- Use `BasicConnection::handshake`, replacing ours.

- Use `BasicConnection::queryValidPaths`

- Use `BasicConnection::putBuildDerivationRequest`
2024-01-22 14:20:39 -05:00
John Ericson
69a5b00e60 Use ServeProto::BuildOption
More deduplication with Nix.
2023-12-10 13:01:00 -05:00
John Ericson
9ba4417940 Prepare for CA derivation support with lower impact changes
This is just C++ changes without any Perl / Frontend / SQL Schema
changes.

The idea is that it should be possible to redeploy Hydra with these
chnages with (a) no schema migration and also (b) no regressions. We
should be able to much more safely deploy these to a staging server and
then production `hydra.nixos.org`.

Extracted from #875

Co-Authored-By: Théophane Hufschmitt <theophane.hufschmitt@tweag.io>
Co-Authored-By: Alexander Sosedkin <monk@unboiled.info>
Co-Authored-By: Andrea Ciceri <andrea.ciceri@autistici.org>
Co-Authored-By: Charlotte 🦝 Delenk Mlotte@chir.rs>
Co-Authored-By: Sandro Jäckel <sandro.jaeckel@gmail.com>
2023-12-04 16:14:47 -05:00
John Ericson
104baef503 Document the connection initialization process 2023-12-04 09:42:04 -05:00
John Ericson
67eeabd518 Merge remote-tracking branch 'upstream/master' into split-buildRemote 2023-12-04 09:12:58 -05:00
John Ericson
622c25e3c4 Sedding prior to merge 2023-12-04 08:56:06 -05:00
John Ericson
831a2d9bd5 Merge remote-tracking branch 'upstream/master' into split-buildRemote 2023-11-30 11:27:40 -05:00
John Ericson
3526d61ff2 Merge remote-tracking branch 'upstream/master' into split-buildRemote 2022-10-25 11:24:54 -04:00
Jörg Thalheim
94d19e1972 hydra: fix localhost detection when protocol prefix are used 2022-09-29 20:46:13 +02:00
Graham Christensen
e1965250b5
Merge pull request #1173 from DeterminateSystems/queue-runner-exporter
hydra-queue-runner metrics
2022-04-07 12:27:33 -04:00
Graham Christensen
59ac96a99c Track the number of steps created 2022-04-06 20:23:02 -04:00
Graham Christensen
1c12c5882f hydra queue runner: instrument the process of loading new builds with prom 2022-04-06 20:18:29 -04:00
Graham Christensen
5de08d412e queue metrics: refactor the metrics into a struct 2022-04-06 20:00:30 -04:00
Graham Christensen
46f52b4c4e bring back the working version Cole made 2022-04-06 15:49:38 -04:00
Cole Helbling
5bff730f2c WIP: I love it when they delete the assignment operator :) 2022-04-06 11:41:40 -07:00
Cole Helbling
edf3c348f2 hydra-queue-runner: make entire address configurable 2022-04-06 10:59:45 -07:00
ajs124
089da272c7 fix build against nix 2.7.0
fix build after such commits as df552ff53e68dff8ca360adbdbea214ece1d08ee
and e862833ec662c1bffbe31b9a229147de391e801a
2022-03-29 15:38:24 -04:00
Cole Helbling
4789eba92c hydra-queue-runer: split metrics functionality into its own function 2022-03-29 10:55:28 -07:00
Cole Helbling
928b3b8268 hydra-queue-runner: fix priority of flag over config file 2022-03-29 10:42:07 -07:00
Cole Helbling
905a7a7beb hydra-queue-runner: read metrics port from queue_runner_metrics_port config 2022-03-29 08:46:43 -07:00
Théophane Hufschmitt
b430d41afd Use the BuildOptions more eagerly 2022-03-29 17:04:19 +02:00
Théophane Hufschmitt
365776f5d7 Factor out the building part 2022-03-29 17:04:19 +02:00
Théophane Hufschmitt
5db8642224 Factor out a struct representing a connection to a machine 2022-03-29 16:52:59 +02:00
Cole Helbling
52a29d43e6 hydra-queue-runner: make registry member of State, configurable metrics port
Thanks to the updated prometheus-cpp library, specifying a port of 0
will cause it to pick a random (available) port -- ideal for tests.
2022-03-11 11:58:10 -08:00
Graham Christensen
4acaf9c8b0 hydra-queue-runner: don't dispatch until the machines parser has completed one run
Periodically, I have seen tests fail because of out of order queue runner behavior:

    checking the queue for builds > 0...
    loading build 1 (tests:basic:empty_dir)
    aborting unsupported build step '...-empty-dir.drv' (type 'x86_64-linux')
    marking build 1 as failed
    adding new machine ‘localhost’

This patch should prevent the dispatcher from running before any machines are
made available.
2022-02-10 10:54:30 -05:00
Graham Christensen
72c3110002 queue-runner: track jobsets by ID 2022-01-15 14:06:00 -05:00
Graham Christensen
87d46ad5d6
hydra-queue-runner: --build-one: correctly handle a cached build
Previously, the build ID would never flow through channels which
exited.

This patch tracks the buildOne state as part of State and exits avoids
waiting forever for new work.

The code around buildOnly is a bit rough, making this a bit weird to
implement but since it is only used for testing the value of improving
it on its own is a bit questionable.
2021-03-16 16:13:38 -04:00
Eelco Dolstra
5b4df3ad5a
Get data needed by getBuildOutput() from the incoming NAR in a streaming fashion 2020-07-27 20:38:59 +02:00
Eelco Dolstra
cbcf6359b4
Remove TokenServer in preparation of making NAR copying O(1) memory 2020-07-27 14:57:22 +02:00
Eelco Dolstra
7985757a1d
Fix build 2020-07-08 12:50:02 +02:00
Eelco Dolstra
bb32aafa4a
Fix build 2020-06-23 13:56:44 +02:00
Eelco Dolstra
9727892b61 Don't spam the journal with hydra-queue-runner status dumps
(cherry picked from commit 15ae932488512ba235ed2f6f841cc5eb56ba9314)
2020-03-31 22:19:07 +02:00
Eelco Dolstra
ccd046ca3d Keep track of the number of unsupported steps
(cherry picked from commit 45ffe578b695f9de101b30d44d46f12aa0654f10)
2020-03-31 22:19:03 +02:00
Eelco Dolstra
4417f9f260 Abort unsupported build steps
If we don't see machine that supports a build step for
'max_unsupported_time' seconds, the step is aborted. The default is 0,
which is appropriate for Hydra installations that don't provision
missing machines dynamically.

(cherry picked from commit f5cdbfe21d930db43d3812c7d8e87746d6378ef9)
2020-03-31 22:19:01 +02:00
Eelco Dolstra
adf61e5cf8
Fix build
(cherry picked from commit 639c660abfd5de62ecfcd8d3cbc2eb6924c7ec75)
2020-02-20 10:26:45 +01:00
Eelco Dolstra
e4f5156c41
Build against nix-master
(cherry picked from commit e7f2139e251cb73195eea6fb84e2a6167b4db968)
2020-02-20 10:24:04 +01:00
Eelco Dolstra
d4b4255dd2
hydra-queue-runner: Support running in a NixOS container
In a NixOS container, cmdBuildDerivation doesn't work because we're
not privileged. But we also don't need it because the store already
has the derivation.

Also, don't copy from/to the store since this gives errors about
missing signatures.
2019-09-25 17:26:03 +02:00
Eelco Dolstra
2946899504
Turn hydra-notify into a daemon
It now receives notifications about started/finished builds/steps via
PostgreSQL. This gets rid of the (substantial) overhead of starting
hydra-notify for every event. It also allows other programs (even on
other machines) to listen to Hydra notifications.
2019-08-13 18:18:21 +02:00
Eelco Dolstra
8d26144121
Fix building against nix master 2018-10-30 14:41:21 +01:00
Eelco Dolstra
5a1f2a50e5
Handle derivations with system type 'builtin'
Fixes #540.
2018-03-07 10:22:35 +01:00
Eelco Dolstra
e9670641ec
Distinguish build step states
The web interface now shows whether a build step is connecting,
copying inputs/outputs, building, etc.
2017-12-07 15:35:31 +01:00
Eelco Dolstra
b04dc6c76e
Fix root creation when the root already exists but is owned by another user 2017-10-19 12:28:38 +02:00
Eelco Dolstra
45b138373b
hydra-queue-runner: Write GC roots for outputs paths
We lost this behaviour somewhere. So build outputs could be GC'ed when
running the collector with --option gc-keep-outputs false.
2017-10-12 18:55:38 +02:00
Eelco Dolstra
27103398c9
Make maxLogSize configurable 2017-09-22 15:23:58 +02:00