440 Commits

Author SHA1 Message Date
John Ericson
ef7bf1e67b
Merge pull request #1375 from NixOS/nix-2.21
Nix 2.21
2024-04-12 17:28:37 -04:00
Maximilian Bosch
99afff03b0
hydra-queue-runner: drop broken connections from pool
Closes #1336

When restarting postgresql, the connections are still reused in
`hydra-queue-runner` causing errors like this

    main thread: Lost connection to the database server.
    queue monitor: Lost connection to the database server.

and no more builds being processed.

`hydra-evaluator` doesn't have that issue since it crashes right away.
We could let it retry indefinitely as well (see below), but I don't
want to change too much.

If the DB is still unreachable 10s later, the process will stop with a
non-zero exit code because of a missing DB connection. This however
isn't such a big deal because it will be immediately restarted
afterwards. With the current configuration, Hydra will never give up,
but restart (and retry) infinitely. To me that seems reasonable, i.e. to
retry DB connections on a long-running process. If this doesn't work
out, the monitoring should fire anyways because the queue fills up, but
I'm open to discuss that.

Please note that this isn't reproducible with the DB and the queue
runner on the same machine when using `services.hydra-dev`, because of
the `Requires=` dependency `hydra-queue-runner.service` ->
`hydra-init.service` -> `postgresql.service` that causes the queue
runner to be restarted on `systemctl restart postgresql`.

Internally, Hydra uses Nix's pool data structure: it basically has N
slots (here DB connections), and whenever a new one is requested, an idle
slot is provided or a new one is created (when N slots are active, the
caller waits until one slot is free). The issue in the code here is
that whenever an error is encountered, the slot is released, but the
same broken connection is reused the next time. By using
`Pool::Handle::markBad`, Nix will drop a broken slot. This is now
done whenever `pqxx::broken_connection` is caught.
2024-03-15 14:09:31 +01:00
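The pool behaviour described in 99afff03b0 can be illustrated with a minimal, single-threaded sketch. `SimplePool`, `FakeConnection`, and `runWithConnection` are hypothetical names invented for this example; they are not Nix's or Hydra's actual code, and `std::runtime_error` stands in for `pqxx::broken_connection`. Unlike the pool described above, this sketch has no slot limit and never blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for a database connection.
struct FakeConnection {
    int id;
};

// Simplified pool: reuses idle slots and creates new ones on demand.
class SimplePool {
public:
    using Factory = std::function<std::unique_ptr<FakeConnection>()>;

    explicit SimplePool(Factory factory) : factory_(std::move(factory)) {}

    // RAII handle: returns its slot to the pool on destruction unless marked bad.
    class Handle {
    public:
        Handle(SimplePool & pool, std::unique_ptr<FakeConnection> conn)
            : pool_(pool), conn_(std::move(conn)) {}
        Handle(Handle && other) noexcept
            : pool_(other.pool_), conn_(std::move(other.conn_)), bad_(other.bad_) {}
        Handle(const Handle &) = delete;
        Handle & operator=(const Handle &) = delete;

        ~Handle() {
            // A bad (or moved-from) slot is destroyed instead of being recycled.
            if (conn_ && !bad_) pool_.idle_.push_back(std::move(conn_));
        }

        void markBad() { bad_ = true; }

        FakeConnection & operator*() {
            assert(conn_);
            return *conn_;
        }

    private:
        SimplePool & pool_;
        std::unique_ptr<FakeConnection> conn_;
        bool bad_ = false;
    };

    // Hand out an idle slot if there is one, otherwise create a new connection.
    Handle get() {
        if (!idle_.empty()) {
            auto conn = std::move(idle_.back());
            idle_.pop_back();
            return Handle(*this, std::move(conn));
        }
        return Handle(*this, factory_());
    }

    std::size_t idleCount() const { return idle_.size(); }

private:
    Factory factory_;
    std::vector<std::unique_ptr<FakeConnection>> idle_;
};

// Runs `work` on a pooled connection. If the (stand-in) connection error is
// thrown, the slot is marked bad so it is not handed out again.
bool runWithConnection(SimplePool & pool,
                       const std::function<void(FakeConnection &)> & work) {
    auto handle = pool.get();
    try {
        work(*handle);
        return true;
    } catch (const std::runtime_error &) {
        handle.markBad();
        return false;
    }
}
```

Without the `markBad()` call in the `catch` branch, the destructor would push the failed connection back onto the idle list, which is the reuse problem the commit message describes.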
Maximilian Bosch
e499509595
Switch to new Nix bindings, update Nix for that
Implements support for Nix's new Perl bindings[1]. The current state
basically does `openStore()`, but always uses `auto` and doesn't support
stores at other URIs.

Even though the stores are cached inside the Perl implementation, I
decided to instantiate those once in the Nix helper module. That way
store openings aren't cluttered across the entire codebase. Also, there
are two stores used later on - MACHINE_LOCAL_STORE for `auto`,
BINARY_CACHE_STORE for the one from `store_uri` in `hydra.conf` - and
using consistent names should make the intent clearer then.

This doesn't contain any behavioral changes, i.e. the build product
availability issue from #1352 isn't fixed. This patch only contains the
migration to the new API.

[1] https://github.com/NixOS/nix/pull/9863
2024-02-12 18:50:56 +01:00
John Ericson
7b826ec5ad Merge branch 'nix-next' into nix-2.20 2024-01-30 13:26:45 -05:00
John Ericson
fcde5908d8 More CA derivations prep
Again, with care not to change the schema in any way.
2024-01-25 21:32:22 -05:00
John Ericson
7a53b866f6 Merge branch 'master' into nix-next
• Updated input 'nix' (merge):
    'github:NixOS/nix/212ba69e6f995992f8b4e4c0656d19c0156c8714'
    'github:NixOS/nix/2c4bb93ba5a97e7078896ebc36385ce172960e4e' (2024-01-25)
  → 'github:NixOS/nix/8df68a213fc52a57b02a57005b0e06cc8de40ce3' (2024-01-25)
2024-01-25 16:26:07 -05:00
John Ericson
c64eed7d07 Simplify StoreConfig::getDefaultSystemFeatures call
That method is now static.
2024-01-25 15:58:07 -05:00
John Ericson
b1fa6b3aac Use StoreConfig::getDefaultSystemFeatures for default machine config
We have to oddly make a `StoreConfig` subclass to get it, but
https://github.com/NixOS/nix/pull/9848 will fix that.

The purpose of this is to ensure that, absent an explicit config,
`localhost` includes `ca-derivations` and `recursive-nix` if those
experimental features are enabled.

Very much the complement of #1342, the previous PR.
2024-01-24 21:37:13 -05:00
John Ericson
07cb5d1b7c Use nix::ParsedDerivation::getRequiredSystemFeatures()
A slight dedup, and also ensures that floating CA derivations require a
`ca-derivations` experimental feature. This fixes the scheduling issue
that @SuperSandro2000 found.
2024-01-24 21:04:14 -05:00
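The scheduling effect mentioned in 07cb5d1b7c can be sketched as follows. This is only an illustration under stated assumptions: `DrvSketch`, `requiredSystemFeatures`, and `machineSupports` are hypothetical names, not the signature of `nix::ParsedDerivation::getRequiredSystemFeatures()`, and the feature strings are plain `std::string`s.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical, stripped-down view of a derivation.
struct DrvSketch {
    std::set<std::string> declaredFeatures;   // features the derivation asks for
    bool floatingContentAddressed = false;    // assumption: known up front
};

// Features a builder must offer: the declared ones, plus `ca-derivations`
// when the derivation is a floating content-addressed one.
std::set<std::string> requiredSystemFeatures(const DrvSketch & drv) {
    std::set<std::string> res = drv.declaredFeatures;
    if (drv.floatingContentAddressed) res.insert("ca-derivations");
    return res;
}

// A machine is eligible only if it supports every required feature.
bool machineSupports(const std::set<std::string> & supported, const DrvSketch & drv) {
    for (const auto & feature : requiredSystemFeatures(drv))
        if (supported.count(feature) == 0) return false;
    return true;
}
```

With a check of this shape, a floating CA derivation is never scheduled on a machine that does not advertise `ca-derivations`.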
John Ericson
449eb2d873 Use more nix::Machine fields
The upstream fields were made to match Hydra, so we can get rid of the
extra fields temporarily added in
70e5469303b422bdb4b123be222bdea4d7f9611c.
2024-01-24 20:14:31 -05:00
John Ericson
9e7ac58042 Merge branch 'master' into nix-next 2024-01-24 18:36:03 -05:00
John Ericson
d45e14fd43
Merge pull request #1316 from NixOS/ca-derivations-prep
Prepare for CA derivation support with lower impact changes
2024-01-24 18:12:42 -05:00
John Ericson
9a86da0e7b Merge branch 'master' into nix-next 2024-01-23 15:49:14 -05:00
John Ericson
70e5469303 Use Nix's Machine type in a minimal way
This is *just* using the fields from that type, and only where the types
coincide. (There are two fields with different types, `speedFactor` most
interestingly.) No code is reused, so we can be sure that no behavior is
changed.

Once the types are reconciled on the Nix side, then we can start
carefully actually reusing code.

Progress on #1164
2024-01-23 12:18:57 -05:00
John Ericson
2e6ee28f9b Machine -> ::Machine so we don't conflict with Nix's 2024-01-23 11:03:19 -05:00
John Ericson
7386caaecf Use Nix's SSHMaster 2024-01-23 10:24:02 -05:00
John Ericson
84c46b6b68 Update to newer Nix
Flake lock file updates:

• Updated input 'nix':
    'github:NixOS/nix/74534829f23b668fb9b2f2a14ff6afa4d5e71d4a' (2024-01-22)
  → 'github:NixOS/nix/b6aee9a93f6646bbffd919d362a5c75c37bb9caa' (2024-01-23)
2024-01-23 10:21:48 -05:00
John Ericson
f1d9230f25 Merge remote-tracking branch 'upstream/master' into nix-next 2024-01-23 01:18:13 -05:00
John Ericson
4e8fbaa3d6 Replace Child with SSHMaster::Connection
Nix defines basically an identical struct for the same purpose, so let's
just use that.
2024-01-23 01:11:46 -05:00
John Ericson
4ac31c89df Use nix::serv_proto::BasicConnection in build_remote.cc
- Use the type itself

  This lays the foundation for being able to dedup the protocol code.

- Use `BasicConnection::handshake`, replacing ours.

- Use `BasicConnection::queryValidPaths`

- Use `BasicConnection::putBuildDerivationRequest`
2024-01-22 14:20:39 -05:00
John Ericson
89cfe26533 Merge remote-tracking branch 'upstream/master' into nix-next 2024-01-22 13:01:40 -05:00
John Ericson
588a0c5269 Merge remote-tracking branch 'upstream/master' into ca-derivations-prep 2023-12-23 19:19:54 -05:00
John Ericson
75f26f1fc4 Clean up std::optional dereferencing in the queue runner
Instead of doing this partial operation a number of times, assert (with
a comment), get a reference to the thing inside, and use that just once.
(This refactor was done twice, "just once" for each time.)
2023-12-23 19:10:58 -05:00
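The pattern described in 75f26f1fc4 (assert once, bind a reference, then use the reference) might look like the sketch below. `StepInfo` and `describeStep` are hypothetical names for illustration and do not come from the queue runner.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical payload; the real queue runner stores other types.
struct StepInfo {
    std::string drvPath;
    int priority;
};

std::string describeStep(const std::optional<StepInfo> & maybeStep) {
    // The caller is expected to have resolved the step before calling us,
    // so an empty optional here would be a programming error.
    assert(maybeStep);
    const StepInfo & step = *maybeStep;   // dereference exactly once

    // From here on only `step` is used; no repeated `*maybeStep` or `->`.
    return step.drvPath + " (priority " + std::to_string(step.priority) + ")";
}
```

Keeping the partial operation in one place means there is a single spot to document why the value must be present.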
John Ericson
6e67884ff1 One more queryDerivationOutputMap should use the eval store param 2023-12-11 14:05:18 -05:00
John Ericson
a6b6c5a539 Revert query -- those columns don't exist yet! 2023-12-11 12:58:54 -05:00
John Ericson
ebfefb9161 Sync up with some changes done to the main CA branch 2023-12-11 12:46:36 -05:00
John Ericson
8783dd53f6 Merge remote-tracking branch 'upstream/master' into ca-derivations-prep 2023-12-11 12:42:43 -05:00
John Ericson
f3a760ad9c
Merge pull request #1324 from obsidiansystems/serve-proto-build-options-serializer
Use `ServeProto::Serialise<ServeProto::BuildOptions>`
2023-12-11 10:45:33 -05:00
John Ericson
8c10331ee8 Fix totalNarSize summation
I accidentally removed it in d0d3b0a2986915ab7aa96d3fce8371a5012c9021.
2023-12-10 14:05:26 -05:00
John Ericson
20f5a2120c Use ServeProto::Serialise<ServeProto::BuildOptions> 2023-12-10 13:24:17 -05:00
John Ericson
b56d2383c1 Do not attempt to speak a newer version of the protocol
Both sides need to agree on a version (with `std::min`) for anything to
work. Somehow... we've never done this.

With this commit, the next commit succeeds. Without this commit, the
next commit fails. This is because the next commit exposes serializers
which do different things for proto version 2.7, and we're currently
requesting 2.6.

Opened https://github.com/NixOS/nix/issues/9584 to track this issue
2023-12-10 13:24:17 -05:00
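The negotiation rule from b56d2383c1 (both sides settle on the lower of the two versions) can be sketched like this. The `major << 8 | minor` encoding and the helper names are assumptions made for the example, not code taken from Hydra or Nix.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

using ProtoVersion = std::uint32_t;

// Assumed encoding for this sketch: major in the high byte, minor in the low byte.
constexpr ProtoVersion makeVersion(unsigned major, unsigned minor) {
    return (major << 8) | (minor & 0xffu);
}

constexpr unsigned majorOf(ProtoVersion v) { return (v >> 8) & 0xffu; }
constexpr unsigned minorOf(ProtoVersion v) { return v & 0xffu; }

// Each side announces the newest version it speaks; both then use the older
// of the two, so neither side is asked to speak a version it does not know.
constexpr ProtoVersion negotiate(ProtoVersion ours, ProtoVersion theirs) {
    return std::min(ours, theirs);
}
```

Serializers that behave differently per version would then be driven by the negotiated value rather than by whatever the remote side announced.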
John Ericson
69a5b00e60 Use ServeProto::BuildOption
More deduplication with Nix.
2023-12-10 13:01:00 -05:00
John Ericson
f6f817926a std::move into the path info map 2023-12-09 12:12:00 -05:00
John Ericson
d0d3b0a298 Use ServeProto::Serialise<UnkeyedValidPathInfo> for QueryValidPaths
Companion to already-merged https://github.com/NixOS/nix/pull/9560
2023-12-09 12:08:04 -05:00
John Ericson
3f932a6731 build-remote: Use std::map<StorePath, UnkeyedValidPathInfo>
It is less denormalized
2023-12-09 11:59:09 -05:00
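One way to read "less denormalized" in 3f932a6731: when the map key already identifies the store path, the value does not need to repeat it. The sketch below uses hypothetical stand-ins (`std::string` for the path, `UnkeyedInfoSketch` for the info) rather than the real Nix types, and also shows a size summation over such a map.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in: path metadata that does NOT carry the path itself.
struct UnkeyedInfoSketch {
    std::uint64_t narSize = 0;
};

// The path appears exactly once, as the key.
using PathInfoMap = std::map<std::string, UnkeyedInfoSketch>;

// Sum the NAR sizes of all entries.
std::uint64_t totalNarSize(const PathInfoMap & infos) {
    std::uint64_t total = 0;
    for (const auto & entry : infos)
        total += entry.second.narSize;
    return total;
}
```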
John Ericson
4515b5aa17
Merge pull request #1321 from NixOS/master
Merge `master` into `nix-next`
2023-12-09 11:53:58 -05:00
John Ericson
831021808c
Merge pull request #1318 from obsidiansystems/use-build-result-serialiser
Use factored-out `BuildResult` serializer
2023-12-08 11:25:05 -05:00
John Ericson
2ee0068fdc Do not copy for both stores for now
It has a performance cost, and as the comment says we should be doing
the better solution. We want to land this preparatory change on prod
while the rest is still on staging, so we should just skip it for now.

Skipping it will not affect regular fixed-output and input-addressed
derivations, which are the only ones prod would deal with upon getting
this code.

The main CA derivations support branch will revert this commit so it
still works.
2023-12-07 15:05:03 -05:00
John Ericson
31ea6458ca Merge remote-tracking branch 'upstream/master' into ca-derivations-prep 2023-12-07 15:01:35 -05:00
John Ericson
20c8263e3c Update to Nix master
The point of this branch is to always track Nix master, so we are
proactively ready to upgrade to the next Nix release when it is ready.

Flake lock file updates:

• Updated input 'nix':
    'github:NixOS/nix/50f8f1c8bc019a4c0fd098b9ac674b94cfc6af0d' (2023-11-27)
  → 'github:NixOS/nix/c3827ff6348a4d5199eaddf8dbc2ca2e2ef46ec5' (2023-12-07)
• Added input 'nix/libgit2':
    'github:libgit2/libgit2/45fd9ed7ae1a9b74b957ef4f337bc3c8b3df01b5' (2023-10-18)
2023-12-07 13:11:31 -05:00
John Ericson
6a54ab24e2 Use factored-out BuildResult serializer
For the record, here is the Nix 2.19 version:
https://github.com/NixOS/nix/blob/2.19-maintenance/src/libstore/serve-protocol.cc,
which is what we would initially use.

It is a more complete version of what Hydra has today except for one
thing: it always unconditionally sets the start/stop times.

I think that is correct, as the other end seems to unconditionally
measure them, but just to be extra careful, I reproduced the old
behavior of falling back on Hydra's own measurements if `startTime` is
0.

The only difference is that the fallback `stopTime` is now measured from
after the entire `BuildResult` is transferred over the wire, but I think
that should be negligible if it is measurable at all. (And remember,
this is a fallback case that I already suspect is dead code.)
2023-12-07 02:00:22 -05:00
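The fallback described in 6a54ab24e2 (prefer the remote timestamps, but use Hydra's own measurements when `startTime` is 0) can be sketched as below. `Timing` and `chooseTiming` are hypothetical names for this example and do not reflect the actual `BuildResult` handling code.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical pair of timestamps; 0 means "not reported".
struct Timing {
    std::time_t startTime = 0;
    std::time_t stopTime = 0;
};

// Prefer what the remote side reported. If it reported no start time,
// fall back to the locally measured start and stop times.
Timing chooseTiming(const Timing & remote, std::time_t localStart, std::time_t localStop) {
    if (remote.startTime != 0)
        return remote;
    return Timing{localStart, localStop};
}
```

In this sketch `localStop` would be taken after the whole result has been read, which corresponds to the small difference in the fallback `stopTime` that the message mentions.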
John Ericson
86cd5e9076 copyClosureTo: Use SubstituteFlag instead of bool
This matches Nix (in the same serialization logic in
`src/libstore/legacy-ssh-store.cc`) and adds clarity.
2023-12-07 00:18:50 -05:00
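The clarity gain claimed in 86cd5e9076 comes from call sites naming the choice instead of passing a bare `true` or `false`. A minimal sketch follows; the enum is defined locally and only assumed to resemble the one in Nix, and `describeCopy` is a hypothetical function, not `copyClosureTo`.

```cpp
#include <cassert>
#include <string>

// Defined locally for the sketch; assumed to resemble Nix's flag type.
enum SubstituteFlag : bool { NoSubstitute = false, Substitute = true };

// Hypothetical function taking the flag instead of a plain bool.
std::string describeCopy(const std::string & path, SubstituteFlag useSubstitutes) {
    return "copy " + path +
        (useSubstitutes == Substitute ? " (substitutes allowed)" : " (no substitutes)");
}
```

A call such as `describeCopy(path, NoSubstitute)` documents itself, whereas `describeCopy(path, false)` would require looking up the parameter's meaning.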
John Ericson
11f8030b0f Add comment from GitHub about adding to store as code comment 2023-12-06 17:59:25 -05:00
John Ericson
e3443cd22a Put back nicer copyClosure instead of manual closure + copy
It looks like we accidentally got the old code back, probably after a merge
conflict resolution.
2023-12-04 17:41:11 -05:00
John Ericson
8046ec2668 Remove unused outputHashes variable
This looks like a stray copy paste.
2023-12-04 16:21:56 -05:00
John Ericson
9ba4417940 Prepare for CA derivation support with lower impact changes
This is just C++ changes without any Perl / Frontend / SQL Schema
changes.

The idea is that it should be possible to redeploy Hydra with these
changes with (a) no schema migration and also (b) no regressions. We
should be able to much more safely deploy these to a staging server and
then production `hydra.nixos.org`.

Extracted from #875

Co-Authored-By: Théophane Hufschmitt <theophane.hufschmitt@tweag.io>
Co-Authored-By: Alexander Sosedkin <monk@unboiled.info>
Co-Authored-By: Andrea Ciceri <andrea.ciceri@autistici.org>
Co-Authored-By: Charlotte 🦝 Delenk <lotte@chir.rs>
Co-Authored-By: Sandro Jäckel <sandro.jaeckel@gmail.com>
2023-12-04 16:14:47 -05:00
John Ericson
a5d44b60ea
Merge pull request #1313 from obsidiansystems/split-buildRemote
Split the `buildRemote` function, take 2
2023-12-04 11:37:36 -05:00
John Ericson
363604846a Again, use const in for loop
As requested by @teh. Was lost in merge with master, now added back.
2023-12-04 11:31:05 -05:00
John Ericson
162b538912 Remove unused thisArrow variable 2023-12-04 11:27:39 -05:00
John Ericson
104baef503 Document the connection initialization process 2023-12-04 09:42:04 -05:00