- Use the type itself
This lays the foundation for being able to dedup the protocol code.
- Use `BasicConnection::handshake`, replacing ours.
- Use `BasicConnection::queryValidPaths`
- Use `BasicConnection::putBuildDerivationRequest`
Both sides need to agree on a version (with `std::min`) for anything to
work. Somehow... we've never done this.
With this comment, the next commit succeeds. Without this commit, the
next commit fails. This is because the next commit exposes serializers
which do different things for proto version 2.7, and we're currently
requesting 2.6.
Opened https://github.com/NixOS/nix/issues/9584 to track this issue
The point of this branch is to always track Nix master, so we are
proactively ready to upgrade to the next Nix release when it is ready.
Flake lock file updates:
• Updated input 'nix':
'github:NixOS/nix/50f8f1c8bc019a4c0fd098b9ac674b94cfc6af0d' (2023-11-27)
→ 'github:NixOS/nix/c3827ff6348a4d5199eaddf8dbc2ca2e2ef46ec5' (2023-12-07)
• Added input 'nix/libgit2':
'github:libgit2/libgit2/45fd9ed7ae1a9b74b957ef4f337bc3c8b3df01b5' (2023-10-18)
For the record, here is the Nix 2.19 version:
https://github.com/NixOS/nix/blob/2.19-maintenance/src/libstore/serve-protocol.cc,
which is what we would initially use.
It is a more complete version of what Hydra has today except for one
thing: it always unconditionally sets the start/stop times.
I think that is correct at the other end seems to unconditionally
measure them, but just to be extra careful, I reproduced the old
behavior of falling back on Hydra's own measurements if `startTime` is
0.
The only difference is that the fallback `stopTime` is now measured from
after the entire `BuildResult` is transferred over the wire, but I think
that should be negligible if it is measurable at all. (And remember,
this is fallback case I already suspect is dead code.)
We were using protocol version 6 but requesting version 4. The only
reason that this worked was because of a broken version check in
'nix-store --serve'. That was fixed in
c2d7456926,
which had the side-effect of breaking hydra-queue-runner.
On hydra.nixos.org the queue runner had child processes that were
stuck handling an exception:
Thread 1 (Thread 0x7f501f7fe640 (LWP 1413473) "bld~v54h5zkhmb3"):
#0 futex_wait (private=0, expected=2, futex_word=0x7f50c27969b0 <_rtld_local+2480>) at ../sysdeps/nptl/futex-internal.h:146
#1 __lll_lock_wait (futex=0x7f50c27969b0 <_rtld_local+2480>, private=0) at lowlevellock.c:52
#2 0x00007f50c21eaee4 in __GI___pthread_mutex_lock (mutex=0x7f50c27969b0 <_rtld_local+2480>) at ../nptl/pthread_mutex_lock.c:115
#3 0x00007f50c1854bef in __GI___dl_iterate_phdr (callback=0x7f50c190c020 <_Unwind_IteratePhdrCallback>, data=0x7f501f7fb040) at dl-iteratephdr.c:40
#4 0x00007f50c190d2d1 in _Unwind_Find_FDE () from /nix/store/65hafbsx91127farbmyyv4r5ifgjdg43-glibc-2.33-117/lib/libgcc_s.so.1
#5 0x00007f50c19099b3 in uw_frame_state_for () from /nix/store/65hafbsx91127farbmyyv4r5ifgjdg43-glibc-2.33-117/lib/libgcc_s.so.1
#6 0x00007f50c190ab90 in uw_init_context_1 () from /nix/store/65hafbsx91127farbmyyv4r5ifgjdg43-glibc-2.33-117/lib/libgcc_s.so.1
#7 0x00007f50c190b08e in _Unwind_RaiseException () from /nix/store/65hafbsx91127farbmyyv4r5ifgjdg43-glibc-2.33-117/lib/libgcc_s.so.1
#8 0x00007f50c1b02ab7 in __cxa_throw () from /nix/store/dd8swlwhpdhn6bv219562vyxhi8278hs-gcc-10.3.0-lib/lib/libstdc++.so.6
#9 0x00007f50c1d01abe in nix::parseURL (url="root@cb893012.packethost.net") at src/libutil/url.cc:53
#10 0x0000000000484f55 in extraStoreArgs (machine="root@cb893012.packethost.net") at build-remote.cc:35
#11 operator() (__closure=0x7f4fe9fe0420) at build-remote.cc:79
...
Maybe the fork happened while another thread was holding some global
stack unwinding lock
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71744). Anyway, since
the hanging child inherits all file descriptors to SSH clients,
shutting down remote builds (via 'child.to = -1' in
State::buildRemote()) doesn't work and 'child.pid.wait()' hangs
forever.
So let's not do any significant work between fork and exec.
This is syntactically lighter wait, and demonstates there are no weird
dynamic lifetimes involved, just regular passing reference to callee
which it only borrows for the duration of the call.