361 Commits

Author SHA1 Message Date
Eelco Dolstra
3b84d4711b Bump Nix 2016-10-26 15:10:56 +02:00
Eelco Dolstra
0b00d51baf Prevent orphaned build steps
If two active steps of the same build failed, then the first would be
marked as "failed", but the second would end up as "orphaned", causing
it to be marked as "aborted" later on. Now it's correctly marked as
"failed".
2016-10-26 14:42:28 +02:00
Eelco Dolstra
8e1d791d0c Truncate the log just before starting the remote build
This gets rid of all those remote substitution messages that were
polluting the build logs.
2016-10-26 13:41:51 +02:00
Eelco Dolstra
3fcfa20d1a Fix regression caused by ee2e9f53
‘basicDrv.inputSrcs’ also contains the outputs of inputDrvs. These
don't necessarily exist in the local store, so copying them may cause
an exception. We should only copy the real inputSrcs.
2016-10-24 16:49:11 +02:00
Eelco Dolstra
a3efdcdfd9 Use std::regex 2016-10-21 18:06:26 +02:00
Eelco Dolstra
e0b2921ff2 Concurrent hydra-evaluator
This rewrites the top-level loop of hydra-evaluator in C++. The Perl
stuff is moved into hydra-eval-jobset. (Rewriting the entire evaluator
would be nice but is a bit too much work.) The new version has some
advantages:

* It can run multiple jobset evaluations in parallel.

* It uses PostgreSQL notifications so it doesn't have to poll the
  database. So if a jobset is triggered via the web interface or from
  a GitHub / Bitbucket webhook, evaluation of the jobset will start
  almost instantaneously (assuming the evaluator is not at its
  concurrency limit).

* It imposes a timeout on evaluations. So if e.g. hydra-eval-jobset
  hangs connecting to a Mercurial server, it will eventually be
  killed.
2016-10-14 14:22:12 +02:00
Eelco Dolstra
16feddd5d4 Drop obsolete -laws-cpp-sdk-s3 2016-10-14 14:22:12 +02:00
Eelco Dolstra
dd5af7637d Remove finally.hh 2016-10-14 14:22:12 +02:00
Eelco Dolstra
ee2e9f5335 Update to reflect BinaryCacheStore changes
BinaryCacheStore no longer implements buildPaths() and ensurePath(),
so we need to use copyPath() / copyClosure().
2016-10-07 20:23:05 +02:00
Eelco Dolstra
6a313c691b hydra-queue-runner: Fix build 2016-10-06 16:58:54 +02:00
Alexander Ried
7089142fdc Add error/warnings for deprecated store specification 2016-10-06 15:10:14 +02:00
Alexander Ried
a73f211bf2 Use store-api for binary cache instantiation 2016-10-06 15:09:44 +02:00
Alexander Ried
1c2f6281b9 Remove signing parameter (nix#f435f82) 2016-10-06 15:09:12 +02:00
Alexander Ried
232e6e8556 Replace buildVerbosity with verboseBuild (nix#5761827) 2016-10-06 15:08:02 +02:00
Alexander Ried
492d16074c Remove s3binarystore (moved to nix in d155d80) 2016-10-06 15:07:21 +02:00
Eelco Dolstra
b1512a152a Fix build failure on GCC 5.4 2016-09-30 17:05:07 +02:00
Shea Levy
5962367ffc Send BuildFinished notifications on cached build results.
Fixes #342
2016-08-17 06:40:12 -04:00
Eelco Dolstra
a55942603a Provide a plugin hook for when build steps finish
Fixes #318.
2016-05-27 14:35:32 +02:00
Eelco Dolstra
b50a105ca7 S3BinaryCacheStore: Use disk cache 2016-04-20 15:29:40 +02:00
Eelco Dolstra
afb86638cd Updates for negative .narinfo caching 2016-04-15 15:39:20 +02:00
Eelco Dolstra
177bf25d64 Queue monitor: Bail out earlier if a step has failed previously
Currently, the hydra.nixos.org queue contains thousands of Darwin builds
that all depend on a stdenv-darwin that previously failed. Until now,
createStep() would first construct a dependency graph for each
build, then getQueuedBuilds() would discover that one of the steps had
failed previously and discard all those steps. Since the graph
construction involves a lot of uncached calls to isValidPath(), this
took several seconds per build.

Now createStep() detects the previous failure right away and bails
out.
2016-04-15 14:32:16 +02:00
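The early bail-out described in 177bf25d64 can be illustrated with a toy model. This is only a sketch under stated assumptions, not Hydra's code: `DepGraph`, `Planner`, `failedBefore`, and `visited` are hypothetical names, and a plain `std::set` stands in for the database lookup of previously failed steps. The point it shows is that checking for a known failure before expanding a node avoids walking the rest of the dependency graph.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy model: derivation name -> names of its input derivations.
using DepGraph = std::map<std::string, std::vector<std::string>>;

// Hypothetical planner; `failedBefore` stands in for the record of
// previously failed steps that the real queue runner consults.
struct Planner {
    const DepGraph & deps;
    const std::set<std::string> & failedBefore;
    std::size_t visited = 0; // number of nodes actually expanded

    Planner(const DepGraph & d, const std::set<std::string> & f)
        : deps(d), failedBefore(f) {}

    // Returns false as soon as `drv` or one of its dependencies is known
    // to have failed, without expanding the remaining dependencies.
    bool createStep(const std::string & drv, std::set<std::string> & steps) {
        if (failedBefore.count(drv)) return false;  // bail out right away
        if (!steps.insert(drv).second) return true; // step already created
        ++visited;
        auto it = deps.find(drv);
        if (it == deps.end()) return true;
        for (const auto & dep : it->second)
            if (!createStep(dep, steps)) return false;
        return true;
    }
};
```

In this model a build whose first dependency has already failed is rejected after expanding a single node, instead of after the whole graph has been constructed.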
Eelco Dolstra
ef72569cc3 Merge pull request #280 from shlevy/github-status-api
Add a plugin to interact with the github status API.
2016-04-14 20:03:45 +02:00
Eelco Dolstra
d6f188a01a Typo 2016-04-13 16:45:40 +02:00
Eelco Dolstra
b1e36b550c max-output-size -> max_output_size
To be consistent with other Catalyst/Hydra config option names.
2016-04-13 16:30:52 +02:00
Eelco Dolstra
077ed3f571 Periodically clear orphaned build steps
These are build steps that remain "busy" in the database even though
they have finished, because they couldn't be updated (e.g. due to a
PostgreSQL connection problem). To prevent them from showing up as
busy in the "Machine status" page, we now periodically purge them.
2016-04-13 16:30:52 +02:00
Eelco Dolstra
f3f661bac1 Reuse build products / metrics stored in the database
Previously, if the queue monitor thread encountered a build that Hydra
had already built, it downloaded the output paths from the binary
cache, just to determine the build products and metrics. This is very
inefficient. In particular, when doing something like merging
nixpkgs:staging into nixpkgs:master, the queue monitor thread will be
locked up for a long time fetching files from S3, causing the build
farm to be mostly idle.

Of course this is entirely unnecessary, since the build
products/metrics are already in the Hydra database. So now we just
look up a previous build with the same output path, and copy the
products/metrics.
2016-04-13 16:30:52 +02:00
Eelco Dolstra
8c7edb1005 Fix narrowing conversion 2016-04-13 16:30:52 +02:00
Eelco Dolstra
00c78440b1 Disambiguate "marking build as succeeded" message 2016-04-13 16:30:52 +02:00
Eelco Dolstra
ad834343b5 Fix build against current Nix master 2016-04-13 16:30:52 +02:00
Shea Levy
9b37cb89ae Add buildStarted plugin hook 2016-04-12 14:42:01 -04:00
Eelco Dolstra
ddc9f3cc6a Temporarily disable machines on any exception, not just connection failures 2016-03-22 16:54:40 +01:00
Eelco Dolstra
0aecd65e59 /queue-runner-status: Include info about temporarily disabled machines 2016-03-22 16:54:06 +01:00
Eelco Dolstra
5535bc28ca Tweak 2016-03-10 16:46:15 +01:00
Eelco Dolstra
60e7930d2b Bump memory limit a bit 2016-03-10 16:46:01 +01:00
Eelco Dolstra
75e7b35477 Fix retry of transient failures 2016-03-10 16:44:26 +01:00
Eelco Dolstra
33da40f272 Doh 2016-03-09 17:31:57 +01:00
Eelco Dolstra
4b9c76e502 hydra-queue-runner: Ensure regular status dumps 2016-03-09 17:11:34 +01:00
Eelco Dolstra
4151be7e69 Make the output size limit configurable
The maximum output size per build step (as the sum of the NARs of each
output) can be set via hydra.conf, e.g.

  max-output-size = 1000000000

The default is 2 GiB.

Also refactored the build error / status handling a bit.
2016-03-09 17:00:09 +01:00
Eelco Dolstra
dc790c5f7e Fix bad format string 2016-03-09 16:59:35 +01:00
Eelco Dolstra
80ff78b1b6 Unify build and step status codes
Also remove the obsolete status code 5 from the database.
2016-03-09 15:30:43 +01:00
Eelco Dolstra
9127f5bbc3 hydra-queue-runner: Limit memory usage
When using a binary cache store, the queue runner receives NARs from
the build machines, compresses them, and uploads them to the
cache. However, keeping multiple large NARs in memory can cause the
queue runner to run out of memory. This can happen for instance when
it's processing multiple ISO images concurrently.

The fix is to use a TokenServer to prevent the builder threads from
storing more than a certain total size of NARs concurrently (at the
moment, this is hard-coded at 4 GiB). Builder threads that cause the
limit to be exceeded will block until other threads have finished.

The 4 GiB limit does not include certain other allocations, such as
for xz compression or for FSAccessor::readFile(). But since these are
unlikely to be more than the size of the NARs and hydra.nixos.org has
32 GiB RAM, it should be fine.
2016-03-09 14:30:13 +01:00
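The token-based memory limit described in 9127f5bbc3 can be sketched roughly as follows. This is a minimal illustration, not Hydra's actual TokenServer: the class name `TokenPool` and its methods `acquire`, `tryAcquire`, `release`, and `inUse` are hypothetical. It also assumes that a request larger than the whole pool is admitted once nothing else is in use, so that a single oversized NAR cannot block forever; the commit message does not say how that case is handled.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical stand-in for the TokenServer idea: one token per byte of NAR.
class TokenPool {
public:
    explicit TokenPool(std::size_t maxTokens) : maxTokens_(maxTokens) {}

    // Block until `n` tokens fit under the limit, then reserve them.
    // Assumption: an oversized request is admitted when the pool is idle.
    void acquire(std::size_t n) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [&] { return fits(n); });
        inUse_ += n;
    }

    // Non-blocking variant: reserve `n` tokens only if they fit right now.
    bool tryAcquire(std::size_t n) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!fits(n)) return false;
        inUse_ += n;
        return true;
    }

    // Return `n` tokens and wake up any threads waiting in acquire().
    void release(std::size_t n) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            inUse_ -= n;
        }
        cv_.notify_all();
    }

    std::size_t inUse() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return inUse_;
    }

private:
    // Caller must hold mutex_.
    bool fits(std::size_t n) const {
        return inUse_ == 0 || inUse_ + n <= maxTokens_;
    }

    const std::size_t maxTokens_;
    std::size_t inUse_ = 0;
    mutable std::mutex mutex_;
    std::condition_variable cv_;
};
```

A builder thread would call `acquire(narSize)` before buffering a NAR and `release(narSize)` once the upload has finished; wrapping the pair in an RAII guard would keep the tokens from leaking when an exception is thrown.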
Eelco Dolstra
b77a43b83d Get rid of "will retry" messages after "maybe cancelling..." 2016-03-08 13:09:39 +01:00
Eelco Dolstra
718fef29ef Keep track of time required to load builds 2016-03-08 13:09:29 +01:00
Eelco Dolstra
2feb17c681 Some more logging 2016-03-08 13:08:07 +01:00
Eelco Dolstra
45b237453a hydra-queue-runner: Recycle finishedDrvs
This should prevent the queue monitor thread from looking up the same
derivations over and over again.
2016-03-08 11:52:13 +01:00
Eelco Dolstra
2ab8e9a1e0 hydra-queue-runner: Fix handling of missing derivations
This barfed with 'queue monitor: ERROR: column "errormsg" of relation
"builds" does not exist' due to the removal of the errorMsg column.
2016-03-07 19:05:24 +01:00
Eelco Dolstra
e7ce225558 Fix build 2016-03-04 17:51:32 +01:00
Eelco Dolstra
86a2d6471c Fix a boost format string abort 2016-03-02 20:06:48 +01:00
Eelco Dolstra
232ca8fea2 Fix build 2016-03-02 17:05:07 +01:00
Eelco Dolstra
b98a061c24 Add some instrumentation to keep track of dispatcher cost 2016-03-02 14:18:39 +01:00