84 Commits

Author SHA1 Message Date
Eelco Dolstra
4d26546d3c Add support for tracking custom metrics
Builds can now emit metrics that Hydra will store in its database and
render as time series via flot charts. Typical applications are to
keep track of performance indicators, coverage percentages, artifact
sizes, and so on.

For example, a coverage build can emit the coverage percentage as
follows:

  echo "lineCoverage $pct %" > $out/nix-support/hydra-metrics

Graphs of all metrics for a job can be seen at

  http://.../job/<project>/<jobset>/<job>#tabs-charts

Specific metrics are also visible at

  http://.../job/<project>/<jobset>/<job>/metric/<metric>

The latter URL also allows getting the data in JSON format (e.g. via
"curl -H 'Accept: application/json'").
2015-07-31 00:57:30 +02:00
Eelco Dolstra
7e026d35f7 Split hydra-queue-runner.cc more 2015-07-21 15:14:17 +02:00
Eelco Dolstra
5370be9f52 hydra-queue-runner: Use cmdBuildDerivation
See 1511aa9f48 and eda2f36c2a.
2015-07-21 01:54:24 +02:00
Eelco Dolstra
3ded87329d Keep track of how many threads are waiting 2015-07-10 19:10:14 +02:00
Eelco Dolstra
89fb723ace Notify the queue runner when a build is deleted 2015-07-08 11:43:35 +02:00
Eelco Dolstra
35b7c4f82b Allow only 1 thread to send a closure to a given machine at the same time
This prevents a race where multiple threads see that machine X is
missing path P, and start sending it concurrently. Nix handles this
correctly, but it's still wasteful (especially for the case where P ==
GHC).

A more refined scheme would be to have per machine, per path locks.
2015-07-07 14:06:48 +02:00
Eelco Dolstra
63745b8e25 Move buildRemote() into State 2015-07-07 10:25:33 +02:00
Eelco Dolstra
df29527531 Refactor 2015-07-07 10:17:21 +02:00
Eelco Dolstra
dffb629b8a Unify Hydra's NixOS module with the one used for hydra.nixos.org
In particular, the queue runner and web server now run under different
UIDs.
2015-07-02 01:01:44 +02:00
Eelco Dolstra
2ece42b2b9 Support preferLocalBuild
Derivations with "preferLocalBuild = true" can now be executed on
specific machines (typically localhost) by setting the mandary system
features field to include "local". For example:

  localhost x86_64-linux,i686-linux - 10 100 - local

says that "localhost" can *only* do builds with "preferLocalBuild =
true". The speed factor of 100 will make the machine almost always win
over other machines.
2015-06-30 00:20:19 +02:00
Eelco Dolstra
008d610467 getQueuedBuilds(): Don't catch errors while loading a build from the queue
Otherwise we never recover from reset daemon connections, e.g.

  hydra-queue-runner[16106]: while loading build 599369: cannot start daemon worker: reading from file: Connection reset by peer
  hydra-queue-runner[16106]: while loading build 599236: writing to file: Broken pipe
  ...

The error is now handled queueMonitor(), causing the next call to
queueMonitorLoop() to create a new connection.
2015-06-26 21:06:35 +02:00
Eelco Dolstra
2f4676bd97 JSONObject doesn't handle 64-bit integers 2015-06-25 16:59:48 +02:00
Eelco Dolstra
c6fcce3b3b Moar stats 2015-06-25 16:47:39 +02:00
Eelco Dolstra
18a3c3ff1c Update "make check" for the new queue runner
Also, if the machines file contains an entry for localhost, then run
"nix-store --serve" directly, without going through SSH.
2015-06-25 16:47:39 +02:00
Eelco Dolstra
32210905d8 Automatically reload $NIX_REMOTE_SYSTEMS when it changes
Otherwise, you'd have to restart the queue runner to add or remove
machines.
2015-06-25 16:47:25 +02:00
Eelco Dolstra
1a0e1eb5a0 More stats 2015-06-24 13:19:27 +02:00
Eelco Dolstra
3f8891b6ff Fix incorrect debug message 2015-06-23 17:53:15 +02:00
Eelco Dolstra
af5cbe97aa createStep(): Cache finished derivations
This gets rid of a lot of redundant calls to readDerivation().
2015-06-23 03:25:31 +02:00
Eelco Dolstra
524ee295e0 Fix sending notifications in the successful case 2015-06-23 02:13:06 +02:00
Eelco Dolstra
4db7c51b5c Rate-limit the number of threads copying closures at the same time
Having a hundred threads doing I/O at the same time is bad on magnetic
disks because of the excessive disk seeks. So allow only 4 threads to
copy closures in parallel.
2015-06-23 01:49:14 +02:00
Eelco Dolstra
a317d24b29 hydra-queue-runner: Send build notifications
Since our notification plugins are written in Perl, sending
notification from C++ requires a small Perl helper named
‘hydra-notify’.
2015-06-23 00:14:49 +02:00
Eelco Dolstra
5312e1209b Keep per-machine stats 2015-06-22 17:11:17 +02:00
Eelco Dolstra
d06366e7cf Remove obsolete comment 2015-06-22 16:59:50 +02:00
Eelco Dolstra
e069ee960e Doh 2015-06-22 16:58:40 +02:00
Eelco Dolstra
41ba7418e2 hydra-queue-runner: More stats 2015-06-22 15:34:33 +02:00
Eelco Dolstra
62b53a0a47 Guard against concurrent invocations of hydra-queue-runner 2015-06-22 14:24:03 +02:00
Eelco Dolstra
fbd7c02217 Periodically dump/log status 2015-06-22 14:15:43 +02:00
Eelco Dolstra
4f4141e1db Add command ‘hydra-queue-runner --status’ to show current status 2015-06-22 14:06:44 +02:00
Eelco Dolstra
44a2b74f5a Keep track of the number of build steps that are being built
(As opposed to being in the closure copying stage.)
2015-06-22 11:23:00 +02:00
Eelco Dolstra
fed71d3fe9 Move "created" field into Step::State 2015-06-22 11:07:52 +02:00
Eelco Dolstra
90a08db241 hydra-queue-runner: Fix assertion failure 2015-06-22 10:59:07 +02:00
Eelco Dolstra
d744362e4a hydra-queue-runner: Fix segfault sorting machines by load
While sorting machines by load, the load of a machine
(machine->currentJobs) can be changed by other threads. If that
happens, the comparator is no longer a proper ordering, in which case
std::sort() can segfault. So we now make a copy of currentJobs before
sorting.
2015-06-21 16:21:42 +02:00
Eelco Dolstra
a0eff6fc15 Fix machine selection 2015-06-19 17:45:26 +02:00
Eelco Dolstra
e13477bdf2 Robustness 2015-06-19 16:35:49 +02:00
Eelco Dolstra
f196967c43 Don't create a propagated build step to the same build 2015-06-19 15:33:37 +02:00
Eelco Dolstra
7afc61691b Doh 2015-06-19 15:27:49 +02:00
Eelco Dolstra
133d298e26 Asynchronously compress build logs 2015-06-19 15:06:12 +02:00
Eelco Dolstra
8e408048e2 Create build step for non-top-level cached failures
This fixes the missing build step on failures like

  http://hydra.nixos.org/build/23222231
2015-06-19 11:33:15 +02:00
Eelco Dolstra
77c8bfd392 Improve logging for aborts 2015-06-19 10:37:22 +02:00
Eelco Dolstra
8db1ae2855 Less verbosity 2015-06-18 17:43:13 +02:00
Eelco Dolstra
89b629eeb1 Fix finishing steps that are not top-level of any build 2015-06-18 17:37:35 +02:00
Eelco Dolstra
9cdbff2fdf Handle concurrent finishing of the same build
There is a slight possibility that the queue monitor and a builder
thread simultaneously decide to mark a build as finished. That's fine,
as long as we ensure the DB update is idempotent (as ensured by doing
"update Builds set finished = 1 ... where finished = 0").
2015-06-18 17:12:51 +02:00
Eelco Dolstra
948473c909 Fix race between the queue monitor and the builder threads 2015-06-18 16:30:28 +02:00
Eelco Dolstra
9c03b11ca8 Simplify retry handling 2015-06-18 14:51:50 +02:00
Eelco Dolstra
e039f5f840 Create failed build steps for cached failures 2015-06-18 04:35:37 +02:00
Eelco Dolstra
92ea800cfb Set finishedInDB in a few more places 2015-06-18 04:19:21 +02:00
Eelco Dolstra
47367451c7 hydra-queue-runner: Set isCachedBuild 2015-06-18 03:28:58 +02:00
Eelco Dolstra
8257812d0a Acquire exclusive table lock earlier 2015-06-18 02:44:29 +02:00
Eelco Dolstra
69be3cfe93 hydra-queue-runner: Handle status queries on the main thread
Doing it on the queue monitor thread was problematic because
processing the queue can take a while.
2015-06-18 01:57:01 +02:00
Eelco Dolstra
a40ca6b76e hydra-queue-runner: Improve dispatcher
We now take the machine speed factor into account, just like
build-remote.pl.
2015-06-18 01:52:20 +02:00