- Replace deprecated exec_params/exec_params0 calls with exec()
- Wrap all parameterized queries with pqxx::params{}
- Add .no_rows()/.one_row() to exec calls that don't return results
This prevents a forever-hanging build (don't know why) when < or > are
in the path of hydra-build-products. This is not to prevent any XSS (see
next commits), just to prevent the DOS (if you can even call it that).
Instead of just going for "whatever is the oldest build we know of",
use the following first:
- Is the step more constrained? If so, schedule it first to avoid
filling up "more desirable" build slots with less constrained builds.
- Does the step have more dependents? If so, schedule it first to try
and maximize open parallelism and breadth of scheduling options.
(cherry picked from commit b8d03adaf4)
This allows for better builder usage when the queue runner is busy. To
avoid running into uncontrollable imbalances between builder/queue
runner, we only release the machine reservation after the local
throttler has found a slot to start copying the outputs for that build.
As opposed to asserting uniqueness to understand resource utilization,
we just switch to using `std::unique_ptr`.
We don't rely on sequential / monotonic build IDs processing anymore, so
randomizing actually has the advantage of mixing builds for different
systems together, to avoid only one chunk of builds for a single system
getting processed while builders for other systems are starved.
Each output for a given step being ingested is looked up in parallel,
which should basically multiply the speed of builds ingestion by the
average number of outputs per derivation.
Running the query with/without it shows that it makes no difference to
postgres, since there's an index on finished=0 already. This allows a
few simplifications, but also paves the way towards running multiple
parallel monitor threads in the future.
By looking at the ratio of running vs. waiting for the dispatcher and
the queue monitor, we should get better visibility into what hydra is
currently bottlenecked on.
There are other side effects we can try to measure to get to the same
result, but having a simple way doesn't cost us much.
My current theory is that running more parallel xz than available CPU
cores is reducing our overall throughput by requiring more scheduling
overhead and more cache thrashing.
There were some hangs caused by this. Need to fix them, ideally
reproducing the issue in a test, before trying this again.
This reverts commit 4a4a0f901c.
This avoids some duplicated code, leveraging the same `StoreReference`
type that also undergirds the machine file dedup we just did prior.
By using `LegacySSHStoreConfig`, we're also taking a baby step towards
using the store interface rather than messing around with the protocol
internals.
Original commit message:
> There are some known regressions regarding local testing setups - since
> everything was kinda half written with the expectation that build dir =
> source dir (which should not be true anymore). But everything builds and
> the test suite runs fine, after several hours spent debugging random
> crashes in libpqxx with MALLOC_PERTURB_...
I have not experienced regressions with local testing.
(cherry picked from commit 4b886d9c45cd2d7fe9b0a8dbc05c7318d46f615d)
With https://github.com/NixOS/nix/pull/9839, the `storeUri` field is
much better structured, so we can use it while still opening the SSH
connection ourselves.