339 Commits

Author SHA1 Message Date
Cole Helbling
928b3b8268 hydra-queue-runner: fix priority of flag over config file 2022-03-29 10:42:07 -07:00
Cole Helbling
5ddb9a98ca fixup! hydra-queue-runner: log message before and after exporter is started 2022-03-29 08:47:41 -07:00
Cole Helbling
905a7a7beb hydra-queue-runner: read metrics port from queue_runner_metrics_port config 2022-03-29 08:46:43 -07:00
Cole Helbling
9cdc5aceed hydra-queue-runner: log message before and after exporter is started
This way, if something goes wrong between the two, it's easier to narrow
down where the issue lies.
2022-03-29 08:41:19 -07:00
Cole Helbling
8503a7917b fixup! hydra-queue-runner: make registry member of State, configurable metrics port 2022-03-22 13:38:13 -07:00
Cole Helbling
c0f826b92d hydra-queue-runner: get the listening port from the exposer itself
Otherwise, when the port is randomly chosen (e.g. by specifying no port,
or a port of 0), it will just show that the port is 0 and not the port
that is actually serving the metrics.
2022-03-14 08:41:45 -07:00
Cole Helbling
52a29d43e6 hydra-queue-runner: make registry member of State, configurable metrics port
Thanks to the updated prometheus-cpp library, specifying a port of 0
will cause it to pick a random (available) port -- ideal for tests.
2022-03-11 11:58:10 -08:00
Cole Helbling
3bf31bd6a6 hydra-queue-runner: add simple "up" exporter
There are probably better ways to achieve this (and will likely need to
be refactored a bit to support further metrics).
2022-03-10 12:36:58 -08:00
Graham Christensen
4acaf9c8b0 hydra-queue-runner: don't dispatch until the machines parser has completed one run
Periodically, I have seen tests fail because of out of order queue runner behavior:

    checking the queue for builds > 0...
    loading build 1 (tests:basic:empty_dir)
    aborting unsupported build step '...-empty-dir.drv' (type 'x86_64-linux')
    marking build 1 as failed
    adding new machine ‘localhost’

This patch should prevent the dispatcher from running before any machines are
made available.
2022-02-10 10:54:30 -05:00
Graham Christensen
f6e86efc9f
Merge pull request #1091 from Ma27/ssh-remote-store-location
hydra-queue-runner: support store URIs declaring an alternate store location
2022-01-24 14:10:54 -05:00
Graham Christensen
3a4ea6e563
Merge pull request #1124 from obsidiansystems/simplify--closure-of-path-set
simplify, `computeFSClosure` can take a set now
2022-01-24 14:09:35 -05:00
Graham Christensen
ba96a13407 Record metrics when getting the closure to localhost 2022-01-21 15:38:05 -05:00
Graham Christensen
7e9e82398d build-remote: copy missing paths from the binary cache to localhost
In a Hydra instance I saw:

    possibly transient failure building ‘/nix/store/X.drv’ on ‘localhost’:
      dependency '/nix/store/Y' of '/nix/store/Y.drv' does not exist,
      and substitution is disabled

This is confusing because the Hydra in question does have substitution enabled.

This instance uses:

  keep-outputs = true
  keep-derivations = true

and an S3 binary cache which is not configured as a substituter in the nix.conf.

It appears this instance encountered a situation where store path Y was built
and present in the binary cache, and Y.drv was GC rooted on the instance,
however Y was not on the host.

When Hydra would try to build this path locally, it would look in the binary
cache to see if it was cached:

    (nix)
    439      bool valid = isValidPathUncached(storePath);
    440
    441      if (diskCache && !valid)
    442          // FIXME: handle valid = true case.
    443          diskCache->upsertNarInfo(getUri(), hashPart, 0);
    444
    445      return valid;

Since it was cached, the store path was considered Valid.

The queue monitor would then not put this input in for substitution, because
the path is valid:

    (hydra)
    470          if (!destStore->isValidPath(*i.second.path(*localStore, step->drv->name, i.first))) {
    471              valid = false;
    472              missing.insert_or_assign(i.first, i.second);
    473          }

Hydra appears to correctly handle the case of missing paths that need
to be substituted from the binary cache already, but since most
Hydra instances use `keep-outputs` *and* all paths in the binary cache
originate from that machine, it is not common for a path to be cached
and not GC rooted locally.

I'll run Hydra with this patch for a while and see if we run in to the
problem again.

A big thanks to John Ericson who helped debug this particular issue.
2022-01-21 15:26:45 -05:00
John Ericson
e7a1ae87aa simplify, computeFSClosure can take a set now 2022-01-20 14:53:01 -05:00
Graham Christensen
72c3110002 queue-runner: track jobsets by ID 2022-01-15 14:06:00 -05:00
Maximilian Bosch
a18b487403
hydra-queue-runner: support store URIs declaring an alternate store location
When having a builder like this in `/etc/nix/machines`

    ssh://mfbuild?remote-store=/home/bosch/store

Hydra cannot build there since it tries to pass the entire value to
`ssh(1)` which doesn't work. Also, an alternate store-location is e.g.
used if the user isn't a trusted user on the remote system and thus
cannot use `/nix/store`.

If such a URI is given, Hydra will now add a `--store /home/bosch/store`
to the `ssh`-command to select the appropriate location remotely.
2022-01-12 15:56:05 +01:00
Eelco Dolstra
5edb58b314 Fix build 2021-08-10 13:47:16 +02:00
regnat
abff212d06 Use system-features from the Nix conf in the default machine file
Fix #936
2021-04-28 11:43:04 +02:00
Maximilian Bosch
2808227eb7
Fix std::bad_alloc errors for remote builds
In Nix the protocol was slightly altered[1] to also contain more
information about realisations. This however wasn't read from the pipe
that was used to read from the store.

After the `cmdBuildDerivation` command which caused this issue, Hydra
will issue a `cmdQueryPathInfos` that tries to read from the remote
store as well. However, there's still left over to read from the
previous command and thus Nix fails to properly allocate the expected
string.

[1] See rev a2b69660a9b326b95d48bd222993c5225bbd5b5f

Fixes #898
2021-04-15 15:16:52 +02:00
regnat
26ffd4a93e Fix build with latest master 2021-04-08 17:11:15 +02:00
Graham Christensen
87d46ad5d6
hydra-queue-runner: --build-one: correctly handle a cached build
Previously, the build ID would never flow through channels which
exited.

This patch tracks the buildOne state as part of State and exits avoids
waiting forever for new work.

The code around buildOnly is a bit rough, making this a bit weird to
implement but since it is only used for testing the value of improving
it on its own is a bit questionable.
2021-03-16 16:13:38 -04:00
Shea Levy
930f05c38e
Bump Nix version 2021-03-10 12:53:03 -05:00
Graham Christensen
68ac64dbd9
Merge pull request #832 from wizeman/fix-hash-mismatch
Fix persistent hash mismatch errors when importing
2021-03-02 16:04:23 -05:00
regnat
f602ed0d86 Remove the sendDerivation logic from the builder
The queue runner used to special-case `localhost` as a remote builder:
Rather than using the normal remote-build (using the
`cmdBuildDerivation` command), it was using the (generally less
efficient, except when running against localhost) `cmdBuildPaths`
command because the latter didn't require a privileged Nix user (so made
testing easier − allowing to run hydra in a container in particular).

However:
1. this means that the build loop can follow two discint code paths depending
   on the setup, the irony being that the most commonly used one in production
   (the “non-localhost” case) isn't the one used in the testsuite (because all
   the tests run against a local store);
2. It turns out that the “localhost” version is buggy in relatively obvious
   ways − in particular a failure in a fixed-output derivation or a hash
   mismatch isn't reported properly;
3. If the “run in a container” use-case is indeed that important, it can be
   (partially) restored using a chroot store (which wouldn't behave excactly
   the same way of course, but would be more than good-enough for testing)
2021-02-23 09:50:15 +01:00
Eelco Dolstra
a7d8ee98da Fix build 2021-02-22 15:10:24 +01:00
Orivej Desh
34a856c7ab Update for receiveContents taking string_view
nix change: https://github.com/NixOS/nix/commit/faa31f40
2020-12-26 11:23:26 +00:00
Ricardo M. Correia
f47749a62d Fix persistent hash mismatch errors when importing
This would start happening if the network connection between the Hydra
server and the remote build server breaks after sucessfully importing
at least one output of a derivation, but before having finished
importing all outputs.

Fixes #816.
2020-11-10 04:50:35 +01:00
Eelco Dolstra
73dfef364b Copy deriver field to the binary cache
Fixes https://github.com/NixOS/nixos-org-configurations/issues/129.
2020-11-02 17:08:02 +01:00
Eelco Dolstra
4e05acc471 Fix localhost builds 2020-10-20 12:11:46 +02:00
Eelco Dolstra
6cd2bb6954 Fix build 2020-10-18 21:01:06 +02:00
Maximilian Bosch
9cc76f6d69
Fix build with latest Nix
Recently a few internal APIs have changed[1]. The `outputPaths` function
has been removed and a lot of data structures are modeled with
`std::optional` which broke compilation.

This patch updates the code in `hydra-queue-runner` accordingly to make
sure that Hydra compiles again.

[1] https://github.com/NixOS/nix/pull/3883
2020-09-26 23:37:39 +02:00
Eelco Dolstra
405c52b589
Fix build 2020-08-27 17:46:36 +02:00
Eelco Dolstra
1113c2895a Fix build 2020-08-07 21:42:09 +02:00
Eelco Dolstra
f8e15bc311 Revive putBytes 2020-08-04 18:25:21 +02:00
Eelco Dolstra
4b5813051b
unsigned long long -> uint64_t 2020-08-04 11:38:22 +02:00
Eelco Dolstra
7d3ba616a9
Fix build 2020-08-04 11:33:29 +02:00
Eelco Dolstra
77c33c1d71
Restore NoCheckSigs
https://github.com/NixOS/nixpkgs/pull/93945#issuecomment-668244478
2020-08-04 10:53:06 +02:00
Eelco Dolstra
8722927c08
Copy paths in the right order 2020-07-28 13:46:57 +02:00
Eelco Dolstra
a0e24f446b
Remove unused getMemSize() function 2020-07-27 20:40:57 +02:00
Eelco Dolstra
5b4df3ad5a
Get data needed by getBuildOutput() from the incoming NAR in a streaming fashion 2020-07-27 20:38:59 +02:00
Eelco Dolstra
d4e4be4fd1
Remove SHA-1 hash from BuildProducts
SHA-1 is deprecated and it will be expensive to compute with the
streaming NAR handler.
2020-07-27 18:24:10 +02:00
Eelco Dolstra
7622cbfe37
buildRemote(): Copy paths to the destination store in O(1) memory 2020-07-27 18:11:04 +02:00
Eelco Dolstra
cbcf6359b4
Remove TokenServer in preparation of making NAR copying O(1) memory 2020-07-27 14:57:22 +02:00
Eelco Dolstra
e5f6fc2e4e
Quick hack to fix compilation 2020-07-27 14:53:43 +02:00
Eelco Dolstra
7985757a1d
Fix build 2020-07-08 12:50:02 +02:00
Eelco Dolstra
bb32aafa4a
Fix build 2020-06-23 13:56:44 +02:00
Ben Wolsieffer
f020f7efef
hydra-queue-runner: don't try to distribute builds on localhost 2020-05-03 00:05:52 -04:00
Graham Christensen
7b705758ec
Merge pull request #732 from Ma27/fix-build
Fix build against latest Nix
2020-04-09 09:02:45 -04:00
Bas van Dijk
6e358189ad Separate the build IDs in the build_finished payload with tabs
hydra-notify splits the payload on tabs so we shouldn't separate the
IDs with spaces.
2020-04-08 12:05:25 +02:00
Maximilian Bosch
2f9d422172
Fix build against latest Nix 2020-04-07 13:55:38 +02:00