Commit Graph

84 Commits

Author SHA1 Message Date
72c3110002 queue-runner: track jobsets by ID 2022-01-15 14:06:00 -05:00
87d46ad5d6 hydra-queue-runner: --build-one: correctly handle a cached build
Previously, the build ID would never flow through channels which
exited.

This patch tracks the buildOne state as part of State and exits avoids
waiting forever for new work.

The code around buildOnly is a bit rough, making this a bit weird to
implement but since it is only used for testing the value of improving
it on its own is a bit questionable.
2021-03-16 16:13:38 -04:00
5b4df3ad5a Get data needed by getBuildOutput() from the incoming NAR in a streaming fashion 2020-07-27 20:38:59 +02:00
cbcf6359b4 Remove TokenServer in preparation of making NAR copying O(1) memory 2020-07-27 14:57:22 +02:00
7985757a1d Fix build 2020-07-08 12:50:02 +02:00
bb32aafa4a Fix build 2020-06-23 13:56:44 +02:00
9727892b61 Don't spam the journal with hydra-queue-runner status dumps
(cherry picked from commit 15ae932488)
2020-03-31 22:19:07 +02:00
ccd046ca3d Keep track of the number of unsupported steps
(cherry picked from commit 45ffe578b6)
2020-03-31 22:19:03 +02:00
4417f9f260 Abort unsupported build steps
If we don't see machine that supports a build step for
'max_unsupported_time' seconds, the step is aborted. The default is 0,
which is appropriate for Hydra installations that don't provision
missing machines dynamically.

(cherry picked from commit f5cdbfe21d)
2020-03-31 22:19:01 +02:00
adf61e5cf8 Fix build
(cherry picked from commit 639c660abf)
2020-02-20 10:26:45 +01:00
e4f5156c41 Build against nix-master
(cherry picked from commit e7f2139e25)
2020-02-20 10:24:04 +01:00
d4b4255dd2 hydra-queue-runner: Support running in a NixOS container
In a NixOS container, cmdBuildDerivation doesn't work because we're
not privileged. But we also don't need it because the store already
has the derivation.

Also, don't copy from/to the store since this gives errors about
missing signatures.
2019-09-25 17:26:03 +02:00
2946899504 Turn hydra-notify into a daemon
It now receives notifications about started/finished builds/steps via
PostgreSQL. This gets rid of the (substantial) overhead of starting
hydra-notify for every event. It also allows other programs (even on
other machines) to listen to Hydra notifications.
2019-08-13 18:18:21 +02:00
8d26144121 Fix building against nix master 2018-10-30 14:41:21 +01:00
5a1f2a50e5 Handle derivations with system type 'builtin'
Fixes .
2018-03-07 10:22:35 +01:00
e9670641ec Distinguish build step states
The web interface now shows whether a build step is connecting,
copying inputs/outputs, building, etc.
2017-12-07 15:35:31 +01:00
b04dc6c76e Fix root creation when the root already exists but is owned by another user 2017-10-19 12:28:38 +02:00
45b138373b hydra-queue-runner: Write GC roots for outputs paths
We lost this behaviour somewhere. So build outputs could be GC'ed when
running the collector with --option gc-keep-outputs false.
2017-10-12 18:55:38 +02:00
27103398c9 Make maxLogSize configurable 2017-09-22 15:23:58 +02:00
6517446c34 Update to latest nixUnstable 2017-09-14 17:22:48 +02:00
50ab80caf2 Don't wait forever to acquire the send lock 2017-09-01 15:29:06 +02:00
bba383bf1b hydra-queue-runner: Keep some notification statistics 2017-07-24 16:26:44 +02:00
f46a21e16e Slight cleanup 2017-07-21 17:22:11 +02:00
150228d7de Upload build logs to the binary cache 2017-03-15 16:59:57 +01:00
7e6486e694 Move log compression to a plugin 2017-03-15 16:59:57 +01:00
500c27e4d5 Add hydra.conf option "nar_buffer_size" to configure memoryTokens limit
It defaults to half the physical RAM.
2017-03-03 12:37:27 +01:00
f6081668dc Allow determinism checking for entire jobsets
Setting

  xxx-jobset-repeats = patchelf:master:2

will cause Hydra to perform every build step in the specified jobset 2
additional times (i.e. 3 times in total). Non-determinism is not fatal
unless the derivation has the attribute "isDeterministic = true"; we
just note the lack of determinism in the Hydra database. This will
allow us to get stats about the (lack of) reproducibility of all of
Nixpkgs.
2016-12-07 15:57:13 +01:00
8bb36e79bd Support testing build determinism
Builds can now specify the attribute "isDeterministic = true" to tell
Hydra to build with build-repeat > 0. If there is a mismatch between
rounds, the step / build fails with a suitable status.

Maybe this should be a meta attribute, but that makes it invisible to
hydra-queue-runner, and it seems reasonable to make a claim of
mandatory determinism part of the derivation (since e.g. enabling this
flag should trigger a rebuild).
2016-12-06 17:46:06 +01:00
b4d32a3085 hydra-queue-runner: More accurate memory accounting
We now take into account the memory necessary for compressing the NAR
being exported to the binary cache, plus xz compression overhead.

Also, we now release the memory tokens for the NAR accessor *after*
releasing the NAR accessor. Previously the memory for the NAR accessor
might still be in use while another thread does an allocation, causing
the maximum to be exceeded temporarily.

Also, use notify_all instead of notify_one to wake up memory token
waiters. This is not very nice, but not every waiter is requesting the
same number of tokens, so some might be able to proceed.
2016-11-16 17:48:50 +01:00
7863d2e1da Step cancellation: Don't use pthread_cancel()
This was a bad idea because pthread_cancel() is unsalvageable broken
in C++. Destructors are not allowed to throw exceptions (especially in
C++11), but pthread_cancel() can cause a __cxxabiv1::__forced_unwind
exception inside any destructor that invokes a cancellation
point. (This exception can be caught but *must* be rethrown.) So let's
just kill the builder process instead.
2016-11-07 19:38:24 +01:00
4f08c85c69 hydra-queue-runner: Fix assertion failure
It was hitting

    assert(reservation.unique());

Since we do want the machine reservation to be released before calling
wakeDispatcher(), let's use a different object for keeping track of
active steps.
2016-11-02 12:41:00 +01:00
b3169ce438 Kill active build steps when builds are cancelled
We now kill active build steps when there are no more referring
builds. This is useful e.g. for preventing cancelled multi-hour TPC-H
benchmark runs from hogging build machines.
2016-10-31 14:58:29 +01:00
ee2e9f5335 Update to reflect BinaryCacheStore changes
BinaryCacheStore no longer implements buildPaths() and ensurePath(),
so we need to use copyPath() / copyClosure().
2016-10-07 20:23:05 +02:00
a55942603a Provide a plugin hook for when build steps finish
Fixes .
2016-05-27 14:35:32 +02:00
ef72569cc3 Merge pull request from shlevy/github-status-api
Add a plugin to interact with the github status API.
2016-04-14 20:03:45 +02:00
077ed3f571 Periodically clear orphaned build steps
These are build steps that remain "busy" in the database even though
they have finished, because they couldn't be updated (e.g. due to a
PostgreSQL connection problem). To prevent them from showing up as
busy in the "Machine status" page, we now periodically purge them.
2016-04-13 16:30:52 +02:00
f3f661bac1 Reuse build products / metrics stored in the database
Previously, if the queue monitor thread encounters a build that Hydra
has previously built, it downloaded the output paths from the binary
cache, just to determine the build products and metrics. This is very
inefficient. In particular, when doing something like merging
nixpkgs:staging into nixpkgs:master, the queue monitor thread will be
locked up for a long time fetching files from S3, causing the build
farm to be mostly idle.

Of course this is entirely unnecessary, since the build
products/metrics are already in the Hydra database. So now we just
look up a previous build with the same output path, and copy the
products/metrics.
2016-04-13 16:30:52 +02:00
9b37cb89ae Add buildStarted plugin hook 2016-04-12 14:42:01 -04:00
33da40f272 Doh 2016-03-09 17:31:57 +01:00
4b9c76e502 hydra-queue-runner: Ensure regular status dumps 2016-03-09 17:11:34 +01:00
4151be7e69 Make the output size limit configurable
The maximum output size per build step (as the sum of the NARs of each
output) can be set via hydra.conf, e.g.

  max-output-size = 1000000000

The default is 2 GiB.

Also refactored the build error / status handling a bit.
2016-03-09 17:00:09 +01:00
80ff78b1b6 Unify build and step status codes
Also remove the obsolete status code 5 from the database.
2016-03-09 15:30:43 +01:00
9127f5bbc3 hydra-queue-runner: Limit memory usage
When using a binary cache store, the queue runner receives NARs from
the build machines, compresses them, and uploads them to the
cache. However, keeping multiple large NARs in memory can cause the
queue runner to run out of memory. This can happen for instance when
it's processing multiple ISO images concurrently.

The fix is to use a TokenServer to prevent the builder threads to
store more than a certain total size of NARs concurrently (at the
moment, this is hard-coded at 4 GiB). Builder threads that cause the
limit to be exceeded will block until other threads have finished.

The 4 GiB limit does not include certain other allocations, such as
for xz compression or for FSAccessor::readFile(). But since these are
unlikely to be more than the size of the NARs and hydra.nixos.org has
32 GiB RAM, it should be fine.
2016-03-09 14:30:13 +01:00
b77a43b83d Get rid of "will retry" messages after "maybe cancelling..." 2016-03-08 13:09:39 +01:00
718fef29ef Keep track of time required to load builds 2016-03-08 13:09:29 +01:00
b98a061c24 Add some instrumentation to keep track of dispatcher cost 2016-03-02 14:18:39 +01:00
6beee0ab49 Fix segfault sorting runnable steps
Same problem as d744362e4a.

    at /nix/store/ksvsbr7pg4z69bv6fbbc8h7x7rm2104m-gcc-4.9.3/include/c++/4.9.3/bits/predefined_ops.h:166
    __last@entry=..., __comp=...) at /nix/store/ksvsbr7pg4z69bv6fbbc8h7x7rm2104m-gcc-4.9.3/include/c++/4.9.3/bits/stl_algo.h:1827
    __comp=...) at /nix/store/ksvsbr7pg4z69bv6fbbc8h7x7rm2104m-gcc-4.9.3/include/c++/4.9.3/bits/stl_algo.h:4717
2016-03-02 13:59:24 +01:00
7cd08c7c46 Warn if PostgreSQL appears stalled 2016-02-29 15:10:30 +01:00
6d741d2ffa Prevent download of NARs we just uploaded 2016-02-26 15:21:44 +01:00
8321a3eb27 Sync with Nix 2016-02-24 14:04:31 +01:00