hydra

Author	SHA1	Message	Date
Linus Heckemann	b23431a657	Support Nix 2.17	2023-08-04 15:53:48 +02:00
Eelco Dolstra	9f69bb5c2c	Fix compilation against Nix 2.16	2023-06-23 15:06:55 +02:00
Eelco Dolstra	44e1efff7f	Send the right nix-serve client version We were using protocol version 6 but requesting version 4. The only reason that this worked was because of a broken version check in 'nix-store --serve'. That was fixed in `c2d7456926`, which had the side-effect of breaking hydra-queue-runner.	2022-09-08 11:51:13 +02:00
Eelco Dolstra	bcaad1c934	openConnection(): Don't throw exceptions in forked child On hydra.nixos.org the queue runner had child processes that were stuck handling an exception: Thread 1 (Thread 0x7f501f7fe640 (LWP 1413473) "bld~v54h5zkhmb3"): #0 futex_wait (private=0, expected=2, futex_word=0x7f50c27969b0 <_rtld_local+2480>) at ../sysdeps/nptl/futex-internal.h:146 #1 __lll_lock_wait (futex=0x7f50c27969b0 <_rtld_local+2480>, private=0) at lowlevellock.c:52 #2 0x00007f50c21eaee4 in __GI___pthread_mutex_lock (mutex=0x7f50c27969b0 <_rtld_local+2480>) at ../nptl/pthread_mutex_lock.c:115 #3 0x00007f50c1854bef in __GI___dl_iterate_phdr (callback=0x7f50c190c020 <_Unwind_IteratePhdrCallback>, data=0x7f501f7fb040) at dl-iteratephdr.c:40 #4 0x00007f50c190d2d1 in _Unwind_Find_FDE () from /nix/store/65hafbsx91127farbmyyv4r5ifgjdg43-glibc-2.33-117/lib/libgcc_s.so.1 #5 0x00007f50c19099b3 in uw_frame_state_for () from /nix/store/65hafbsx91127farbmyyv4r5ifgjdg43-glibc-2.33-117/lib/libgcc_s.so.1 #6 0x00007f50c190ab90 in uw_init_context_1 () from /nix/store/65hafbsx91127farbmyyv4r5ifgjdg43-glibc-2.33-117/lib/libgcc_s.so.1 #7 0x00007f50c190b08e in _Unwind_RaiseException () from /nix/store/65hafbsx91127farbmyyv4r5ifgjdg43-glibc-2.33-117/lib/libgcc_s.so.1 #8 0x00007f50c1b02ab7 in __cxa_throw () from /nix/store/dd8swlwhpdhn6bv219562vyxhi8278hs-gcc-10.3.0-lib/lib/libstdc++.so.6 #9 0x00007f50c1d01abe in nix::parseURL (url="root@cb893012.packethost.net") at src/libutil/url.cc:53 #10 0x0000000000484f55 in extraStoreArgs (machine="root@cb893012.packethost.net") at build-remote.cc:35 #11 operator() (__closure=0x7f4fe9fe0420) at build-remote.cc:79 ... Maybe the fork happened while another thread was holding some global stack unwinding lock (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71744). Anyway, since the hanging child inherits all file descriptors to SSH clients, shutting down remote builds (via 'child.to = -1' in State::buildRemote()) doesn't work and 'child.pid.wait()' hangs forever. So let's not do any significant work between fork and exec.	2022-03-30 22:39:48 +02:00
ajs124	089da272c7	fix build against nix 2.7.0 fix build after such commits as df552ff53e68dff8ca360adbdbea214ece1d08ee and e862833ec662c1bffbe31b9a229147de391e801a	2022-03-29 15:38:24 -04:00
Graham Christensen	3b048ed136	Revert "Revert "Use `copyClosure` instead of `computeFSClosure` + `copyPaths`"" This reverts commit `8e3ada2afc`.	2022-03-29 15:28:47 -04:00
Cole Helbling	8e3ada2afc	Revert "Use `copyClosure` instead of `computeFSClosure` + `copyPaths`" This reverts commit `f14c583ce5`.	2022-03-28 09:54:02 -07:00
Eelco Dolstra	962bf36939	Merge pull request #1162 from obsidiansystems/less-ref Make `copyClosureTo` take a regular C++ ref to the store	2022-03-23 16:25:59 +01:00
John Ericson	445bba337b	Make `copyClosureTo` take a regular C++ ref to the store This is syntactically lighter weight, and demonstrates there are no weird dynamic lifetimes involved, just a regular reference passed to the callee, which only borrows it for the duration of the call.	2022-02-20 17:22:43 +00:00
John Ericson	f14c583ce5	Use `copyClosure` instead of `computeFSClosure` + `copyPaths` It is more terse, and in the future it is possible `copyClosure` will become more sophisticated.	2022-02-19 11:59:17 -05:00
Graham Christensen	f6e86efc9f	Merge pull request #1091 from Ma27/ssh-remote-store-location hydra-queue-runner: support store URIs declaring an alternate store location	2022-01-24 14:10:54 -05:00
Graham Christensen	3a4ea6e563	Merge pull request #1124 from obsidiansystems/simplify--closure-of-path-set simplify, `computeFSClosure` can take a set now	2022-01-24 14:09:35 -05:00
Graham Christensen	ba96a13407	Record metrics when getting the closure to localhost	2022-01-21 15:38:05 -05:00
Graham Christensen	7e9e82398d	build-remote: copy missing paths from the binary cache to localhost In a Hydra instance I saw: possibly transient failure building ‘/nix/store/X.drv’ on ‘localhost’: dependency '/nix/store/Y' of '/nix/store/Y.drv' does not exist, and substitution is disabled This is confusing because the Hydra in question does have substitution enabled. This instance uses: keep-outputs = true keep-derivations = true and an S3 binary cache which is not configured as a substituter in the nix.conf. It appears this instance encountered a situation where store path Y was built and present in the binary cache, and Y.drv was GC rooted on the instance, however Y was not on the host. When Hydra would try to build this path locally, it would look in the binary cache to see if it was cached: (nix) 439 bool valid = isValidPathUncached(storePath); 440 441 if (diskCache && !valid) 442 // FIXME: handle valid = true case. 443 diskCache->upsertNarInfo(getUri(), hashPart, 0); 444 445 return valid; Since it was cached, the store path was considered Valid. The queue monitor would then not put this input in for substitution, because the path is valid: (hydra) 470 if (!destStore->isValidPath(i.second.path(localStore, step->drv->name, i.first))) { 471 valid = false; 472 missing.insert_or_assign(i.first, i.second); 473 } Hydra appears to correctly handle the case of missing paths that need to be substituted from the binary cache already, but since most Hydra instances use `keep-outputs` and all paths in the binary cache originate from that machine, it is not common for a path to be cached and not GC rooted locally. I'll run Hydra with this patch for a while and see if we run into the problem again. A big thanks to John Ericson who helped debug this particular issue.	2022-01-21 15:26:45 -05:00
John Ericson	e7a1ae87aa	simplify, `computeFSClosure` can take a set now	2022-01-20 14:53:01 -05:00
Maximilian Bosch	a18b487403	hydra-queue-runner: support store URIs declaring an alternate store location When having a builder like this in `/etc/nix/machines` ssh://mfbuild?remote-store=/home/bosch/store Hydra cannot build there since it tries to pass the entire value to `ssh(1)` which doesn't work. Also, an alternate store-location is e.g. used if the user isn't a trusted user on the remote system and thus cannot use `/nix/store`. If such a URI is given, Hydra will now add a `--store /home/bosch/store` to the `ssh`-command to select the appropriate location remotely.	2022-01-12 15:56:05 +01:00
Eelco Dolstra	5edb58b314	Fix build	2021-08-10 13:47:16 +02:00
Maximilian Bosch	2808227eb7	Fix `std::bad_alloc` errors for remote builds In Nix the protocol was slightly altered[1] to also contain more information about realisations. This however wasn't read from the pipe that was used to read from the store. After the `cmdBuildDerivation` command which caused this issue, Hydra will issue a `cmdQueryPathInfos` that tries to read from the remote store as well. However, there's still data left over to read from the previous command and thus Nix fails to properly allocate the expected string. [1] See rev a2b69660a9b326b95d48bd222993c5225bbd5b5f Fixes #898	2021-04-15 15:16:52 +02:00
regnat	26ffd4a93e	Fix build with latest master	2021-04-08 17:11:15 +02:00
Shea Levy	930f05c38e	Bump Nix version	2021-03-10 12:53:03 -05:00
Graham Christensen	68ac64dbd9	Merge pull request #832 from wizeman/fix-hash-mismatch Fix persistent hash mismatch errors when importing	2021-03-02 16:04:23 -05:00
regnat	f602ed0d86	Remove the `sendDerivation` logic from the builder The queue runner used to special-case `localhost` as a remote builder: Rather than using the normal remote-build (using the `cmdBuildDerivation` command), it was using the (generally less efficient, except when running against localhost) `cmdBuildPaths` command because the latter didn't require a privileged Nix user (so made testing easier − allowing to run hydra in a container in particular). However: 1. this means that the build loop can follow two distinct code paths depending on the setup, the irony being that the most commonly used one in production (the “non-localhost” case) isn't the one used in the testsuite (because all the tests run against a local store); 2. It turns out that the “localhost” version is buggy in relatively obvious ways − in particular a failure in a fixed-output derivation or a hash mismatch isn't reported properly; 3. If the “run in a container” use-case is indeed that important, it can be (partially) restored using a chroot store (which wouldn't behave exactly the same way of course, but would be more than good-enough for testing)	2021-02-23 09:50:15 +01:00
Ricardo M. Correia	f47749a62d	Fix persistent hash mismatch errors when importing This would start happening if the network connection between the Hydra server and the remote build server breaks after successfully importing at least one output of a derivation, but before having finished importing all outputs. Fixes #816.	2020-11-10 04:50:35 +01:00
Eelco Dolstra	73dfef364b	Copy deriver field to the binary cache Fixes https://github.com/NixOS/nixos-org-configurations/issues/129.	2020-11-02 17:08:02 +01:00
Eelco Dolstra	4e05acc471	Fix localhost builds	2020-10-20 12:11:46 +02:00
Eelco Dolstra	6cd2bb6954	Fix build	2020-10-18 21:01:06 +02:00
Maximilian Bosch	9cc76f6d69	Fix build with latest Nix Recently a few internal APIs have changed[1]. The `outputPaths` function has been removed and a lot of data structures are modeled with `std::optional` which broke compilation. This patch updates the code in `hydra-queue-runner` accordingly to make sure that Hydra compiles again. [1] https://github.com/NixOS/nix/pull/3883	2020-09-26 23:37:39 +02:00
Eelco Dolstra	405c52b589	Fix build	2020-08-27 17:46:36 +02:00
Eelco Dolstra	1113c2895a	Fix build	2020-08-07 21:42:09 +02:00
Eelco Dolstra	7d3ba616a9	Fix build	2020-08-04 11:33:29 +02:00
Eelco Dolstra	77c33c1d71	Restore NoCheckSigs https://github.com/NixOS/nixpkgs/pull/93945#issuecomment-668244478	2020-08-04 10:53:06 +02:00
Eelco Dolstra	8722927c08	Copy paths in the right order	2020-07-28 13:46:57 +02:00
Eelco Dolstra	5b4df3ad5a	Get data needed by getBuildOutput() from the incoming NAR in a streaming fashion	2020-07-27 20:38:59 +02:00
Eelco Dolstra	7622cbfe37	buildRemote(): Copy paths to the destination store in O(1) memory	2020-07-27 18:11:04 +02:00
Eelco Dolstra	cbcf6359b4	Remove TokenServer in preparation of making NAR copying O(1) memory	2020-07-27 14:57:22 +02:00
Eelco Dolstra	e5f6fc2e4e	Quick hack to fix compilation	2020-07-27 14:53:43 +02:00
Eelco Dolstra	bb32aafa4a	Fix build	2020-06-23 13:56:44 +02:00
Ben Wolsieffer	f020f7efef	hydra-queue-runner: don't try to distribute builds on localhost	2020-05-03 00:05:52 -04:00
Eelco Dolstra	e4f5156c41	Build against nix-master (cherry picked from commit `e7f2139e25`)	2020-02-20 10:24:04 +01:00
Eelco Dolstra	d4b4255dd2	hydra-queue-runner: Support running in a NixOS container In a NixOS container, cmdBuildDerivation doesn't work because we're not privileged. But we also don't need it because the store already has the derivation. Also, don't copy from/to the store since this gives errors about missing signatures.	2019-09-25 17:26:03 +02:00
Antoine Eiche	9a73ec6455	hydra-queue-runner: better error message if nix-store can not be started The hydra-queue-runner opens a connection to the builder. If the builder is 'localhost' it starts `nix-store`, otherwise it starts 'ssh'. Currently, if the hydra-queue-runner can not start `nix-store` (not in the PATH for instance), the error message is: cannot connect to ‘localhost’: error: cannot start ssh: No such file or directory This is not useful since ssh is actually not started:/ With this patch the error message is now: cannot connect to ‘localhost’: error: cannot start nix-store: No such file or directory	2019-01-23 10:42:47 +01:00
Eelco Dolstra	e9670641ec	Distinguish build step states The web interface now shows whether a build step is connecting, copying inputs/outputs, building, etc.	2017-12-07 15:35:31 +01:00
Eelco Dolstra	27103398c9	Make maxLogSize configurable	2017-09-22 15:23:58 +02:00
Eelco Dolstra	6517446c34	Update to latest nixUnstable	2017-09-14 17:22:48 +02:00
Eelco Dolstra	4af97c57f5	Acquire the send lock only while actually sending Thus, we no longer hold the send lock while substituting missing paths on the build machine. This is a good thing in particular for macOS builders which have a tendency to hang forever in curl downloads.	2017-09-01 16:28:49 +02:00
Eelco Dolstra	50ab80caf2	Don't wait forever to acquire the send lock	2017-09-01 15:29:06 +02:00
Eelco Dolstra	66ae66024e	Sync with latest Nix	2017-07-17 11:38:58 +02:00
Eelco Dolstra	4f11cf45dc	Fix build cancellation We nowadays ignore SIGINT, so the sshd child process inherited this and ignored SIGINT as well.	2017-04-05 11:01:57 +02:00
Eelco Dolstra	a366f362e1	Use latest nixUnstable	2017-02-03 14:39:18 +01:00
Eelco Dolstra	8a120006f0	Fix version test	2016-12-08 16:03:50 +01:00
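
The `remote-store` handling described in commit a18b487403 can be illustrated with a minimal sketch. This is not Hydra's code (the stack trace in bcaad1c934 suggests the real `extraStoreArgs` goes through `nix::parseURL`); the `SshTarget` type and `parseBuilderUri` function are hypothetical names invented here, and the sketch assumes an optional `ssh://` scheme, `&`-separated query parameters, and no percent-encoding.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical result type: the host handed to ssh(1), plus any extra
// arguments for the remote command.
struct SshTarget {
    std::string host;
    std::vector<std::string> extraArgs;
};

// Split a builder entry such as "ssh://mfbuild?remote-store=/home/bosch/store"
// into the SSH host and a "--store <path>" argument pair.
// Assumption: no percent-decoding is performed on the parameter value.
SshTarget parseBuilderUri(const std::string & uri)
{
    std::string rest = uri;
    const std::string scheme = "ssh://";
    if (rest.compare(0, scheme.size(), scheme) == 0)
        rest.erase(0, scheme.size());

    SshTarget target;
    const auto q = rest.find('?');
    target.host = rest.substr(0, q); // whole string when there is no '?'
    if (q == std::string::npos)
        return target;

    // Walk the '&'-separated query parameters, looking for remote-store.
    const std::string query = rest.substr(q + 1);
    std::string::size_type pos = 0;
    while (pos <= query.size()) {
        auto amp = query.find('&', pos);
        if (amp == std::string::npos)
            amp = query.size();
        const std::string param = query.substr(pos, amp - pos);
        const auto eq = param.find('=');
        if (eq != std::string::npos && param.substr(0, eq) == "remote-store")
            target.extraArgs = {"--store", param.substr(eq + 1)};
        pos = amp + 1;
    }
    return target;
}
```

With this sketch, a plain entry such as `root@cb893012.packethost.net` yields that host and no extra arguments, so only the host part ever reaches `ssh`. Doing this parsing before forking also fits the advice in bcaad1c934 to avoid significant work between fork and exec.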

101 Commits