43 Commits

Author SHA1 Message Date
Eelco Dolstra
b04dc6c76e
Fix root creation when the root already exists but is owned by another user 2017-10-19 12:28:38 +02:00
Eelco Dolstra
45b138373b
hydra-queue-runner: Write GC roots for outputs paths
We lost this behaviour somewhere. So build outputs could be GC'ed when
running the collector with --option gc-keep-outputs false.
2017-10-12 18:55:38 +02:00
Eelco Dolstra
7c976d2aec
hydra-queue-runner: Make build notification more reliable
Previously, when hydra-queue-runner was restarted, any pending "build
finished" notifications were lost. Now hydra-queue-runner marks
finished but unnotified builds in the database and uses that to run
pending notifications at startup.
2017-07-26 15:17:51 +02:00
Eelco Dolstra
e117d85c2a
hydra-queue-runner: Set a thread title for the builder threads
This should make debugging slightly easier.
2017-07-25 15:59:41 +02:00
Eelco Dolstra
8364f4ec70
Upload log files to the right location
We were mixing up builds and steps. So for example

  https://cache.nixos.org/log/2w66a98iqbjdppc5s2b8qvhi3gprvy45-freecell-solver-4.8.0.drv

at the moment contains the log for
/nix/store/442r9d5ihbcpgq8q9dhijhvhlmplzp96-perl-namespace-autoclean-0.28.drv
because the latter is a step in http://hydra.nixos.org/build/51300420.
Oops.
2017-04-06 13:05:30 +02:00
Eelco Dolstra
147ba3ca31
Set proper charset on log files 2017-03-31 18:00:08 +02:00
Eelco Dolstra
150228d7de
Upload build logs to the binary cache 2017-03-15 16:59:57 +01:00
Eelco Dolstra
7e6486e694
Move log compression to a plugin 2017-03-15 16:59:57 +01:00
Eelco Dolstra
f6081668dc
Allow determinism checking for entire jobsets
Setting

  xxx-jobset-repeats = patchelf:master:2

will cause Hydra to perform every build step in the specified jobset 2
additional times (i.e. 3 times in total). Non-determinism is not fatal
unless the derivation has the attribute "isDeterministic = true"; we
just note the lack of determinism in the Hydra database. This will
allow us to get stats about the (lack of) reproducibility of all of
Nixpkgs.
2016-12-07 15:57:13 +01:00
Eelco Dolstra
b4d32a3085
hydra-queue-runner: More accurate memory accounting
We now take into account the memory necessary for compressing the NAR
being exported to the binary cache, plus xz compression overhead.

Also, we now release the memory tokens for the NAR accessor *after*
releasing the NAR accessor. Previously the memory for the NAR accessor
might still be in use while another thread does an allocation, causing
the maximum to be exceeded temporarily.

Also, use notify_all instead of notify_one to wake up memory token
waiters. This is not very nice, but not every waiter is requesting the
same number of tokens, so some might be able to proceed.
2016-11-16 17:48:50 +01:00
Eelco Dolstra
1ecc8a4f40 hydra-queue-runner: Fix a race keeping cancelled steps alive
If a step is cancelled just as its builder step is starting,
doBuildStep() will return sRetry. This causes builder() to make the
step runnable again, since the queue monitor may have added new builds
referencing it. The idea is that if the latter condition is not true,
the step's reference count will drop to zero and it will be
deleted. However, if the dispatcher thread sees and locks the step
before the reference count can drop to zero in the builder thread, the
dispatcher thread will start a new builder thread for the step. Thus
the step can be kept alive for an indefinite amount of time.

The fix is for State::builder() to use a weak pointer to the step, to
ensure that the step's reference count can drop to zero *before* it's
added to the runnable queue.
2016-11-08 11:47:49 +01:00
Eelco Dolstra
7863d2e1da Step cancellation: Don't use pthread_cancel()
This was a bad idea because pthread_cancel() is unsalvageable broken
in C++. Destructors are not allowed to throw exceptions (especially in
C++11), but pthread_cancel() can cause a __cxxabiv1::__forced_unwind
exception inside any destructor that invokes a cancellation
point. (This exception can be caught but *must* be rethrown.) So let's
just kill the builder process instead.
2016-11-07 19:38:24 +01:00
Eelco Dolstra
d7453bd8be hydra-queue-runner: Fix message 2016-11-02 12:44:18 +01:00
Eelco Dolstra
4f08c85c69 hydra-queue-runner: Fix assertion failure
It was hitting

    assert(reservation.unique());

Since we do want the machine reservation to be released before calling
wakeDispatcher(), let's use a different object for keeping track of
active steps.
2016-11-02 12:41:00 +01:00
Eelco Dolstra
b3169ce438 Kill active build steps when builds are cancelled
We now kill active build steps when there are no more referring
builds. This is useful e.g. for preventing cancelled multi-hour TPC-H
benchmark runs from hogging build machines.
2016-10-31 14:58:29 +01:00
Eelco Dolstra
0b00d51baf Prevent orphaned build steps
If two active steps of the same build failed, then the first would be
marked as "failed", but the second would end up as "orphaned", causing
it to be marked as "aborted" later on. Now it's correctly marked as
"failed".
2016-10-26 14:42:28 +02:00
Eelco Dolstra
b1512a152a Fix build failure on GCC 5.4 2016-09-30 17:05:07 +02:00
Eelco Dolstra
a55942603a Provide a plugin hook for when build steps finish
Fixes #318.
2016-05-27 14:35:32 +02:00
Eelco Dolstra
ef72569cc3 Merge pull request #280 from shlevy/github-status-api
Add a plugin to interact with the github status API.
2016-04-14 20:03:45 +02:00
Eelco Dolstra
077ed3f571 Periodically clear orphaned build steps
These are build steps that remain "busy" in the database even though
they have finished, because they couldn't be updated (e.g. due to a
PostgreSQL connection problem). To prevent them from showing up as
busy in the "Machine status" page, we now periodically purge them.
2016-04-13 16:30:52 +02:00
Eelco Dolstra
00c78440b1 Disambiguate "marking build as succeeded" message 2016-04-13 16:30:52 +02:00
Shea Levy
9b37cb89ae Add buildStarted plugin hook 2016-04-12 14:42:01 -04:00
Eelco Dolstra
5535bc28ca Tweak 2016-03-10 16:46:15 +01:00
Eelco Dolstra
75e7b35477 Fix retry of transient failures 2016-03-10 16:44:26 +01:00
Eelco Dolstra
4151be7e69 Make the output size limit configurable
The maximum output size per build step (as the sum of the NARs of each
output) can be set via hydra.conf, e.g.

  max-output-size = 1000000000

The default is 2 GiB.

Also refactored the build error / status handling a bit.
2016-03-09 17:00:09 +01:00
Eelco Dolstra
80ff78b1b6 Unify build and step status codes
Also remove the obsolete status code 5 from the database.
2016-03-09 15:30:43 +01:00
Eelco Dolstra
b77a43b83d Get rid of "will retry" messages after "maybe cancelling..." 2016-03-08 13:09:39 +01:00
Eelco Dolstra
b98a061c24 Add some instrumentation to keep track of dispatcher cost 2016-03-02 14:18:39 +01:00
Eelco Dolstra
7cd08c7c46 Warn if PostgreSQL appears stalled 2016-02-29 15:10:30 +01:00
Eelco Dolstra
6d741d2ffa Prevent download of NARs we just uploaded 2016-02-26 15:21:44 +01:00
Eelco Dolstra
02190b0fef Support hydra-build-products on binary cache stores 2016-02-26 14:45:03 +01:00
Eelco Dolstra
ce5790285a Merge remote-tracking branch 'origin/master' into binary-cache 2016-02-17 11:54:59 +01:00
Eelco Dolstra
d7a123fcd4 Keep track of the time we spend copying to/from build machines 2016-02-17 10:30:23 +01:00
Eelco Dolstra
2d0dd7fb49 hydra-queue-runner: Write directly to a binary cache 2016-02-15 21:10:29 +01:00
Eelco Dolstra
92d8b59361 Process Nix API changes 2016-02-11 15:59:47 +01:00
Eelco Dolstra
97f8c61928 Fix hydra-queue-runner --build-one 2015-12-29 17:53:33 +01:00
Eelco Dolstra
d8d188301d Fix division-by-zero crash
Not clear why step_->jobsets was empty...
2015-10-30 18:01:48 +01:00
Eelco Dolstra
4d1816b152 Remove obsolete Builds columns and provide accurate "Running builds"
This removes the "busy", "locker" and "logfile" columns, which are no
longer used by the queue runner. The "Running builds" page now only
shows builds that have an active build step.
2015-10-27 15:37:17 +01:00
Eelco Dolstra
8e8e31ce86 Re-implement log size limits
The old queue runner already had this. However, we now store "log
limit exceeded" as a separate status code in the database.
2015-10-06 17:35:08 +02:00
Eelco Dolstra
d571e44b86 Keep stats for the Hydra auto scaler
"hydra-queue-runner --status" now prints how many runnable and running
build steps exist for each machine type. This allows additional
machines to be provisioned based on the Hydra load.
2015-08-17 13:50:41 +02:00
Eelco Dolstra
97f11baa8d Revive jobset scheduling
(I.e. taking the jobset scheduling share into account.)
2015-08-11 01:31:56 +02:00
Eelco Dolstra
c18fb0ad74 Temporarily disable machines after a connection failure 2015-07-21 15:58:47 +02:00
Eelco Dolstra
7e026d35f7 Split hydra-queue-runner.cc more 2015-07-21 15:14:17 +02:00