hydra

Author	SHA1	Message	Date
Eelco Dolstra	1ecc8a4f40	hydra-queue-runner: Fix a race keeping cancelled steps alive If a step is cancelled just as its builder step is starting, doBuildStep() will return sRetry. This causes builder() to make the step runnable again, since the queue monitor may have added new builds referencing it. The idea is that if the latter condition is not true, the step's reference count will drop to zero and it will be deleted. However, if the dispatcher thread sees and locks the step before the reference count can drop to zero in the builder thread, the dispatcher thread will start a new builder thread for the step. Thus the step can be kept alive for an indefinite amount of time. The fix is for State::builder() to use a weak pointer to the step, to ensure that the step's reference count can drop to zero before it's added to the runnable queue.	2016-11-08 11:47:49 +01:00
Eelco Dolstra	7863d2e1da	Step cancellation: Don't use pthread_cancel() This was a bad idea because pthread_cancel() is unsalvageably broken in C++. Destructors are not allowed to throw exceptions (especially in C++11), but pthread_cancel() can cause a __cxxabiv1::__forced_unwind exception inside any destructor that invokes a cancellation point. (This exception can be caught but must be rethrown.) So let's just kill the builder process instead.	2016-11-07 19:38:24 +01:00
Eelco Dolstra	d7453bd8be	hydra-queue-runner: Fix message	2016-11-02 12:44:18 +01:00
Eelco Dolstra	4f08c85c69	hydra-queue-runner: Fix assertion failure It was hitting assert(reservation.unique()); Since we do want the machine reservation to be released before calling wakeDispatcher(), let's use a different object for keeping track of active steps.	2016-11-02 12:41:00 +01:00
Eelco Dolstra	b3169ce438	Kill active build steps when builds are cancelled We now kill active build steps when there are no more referring builds. This is useful e.g. for preventing cancelled multi-hour TPC-H benchmark runs from hogging build machines.	2016-10-31 14:58:29 +01:00
Eelco Dolstra	0b00d51baf	Prevent orphaned build steps If two active steps of the same build failed, then the first would be marked as "failed", but the second would end up as "orphaned", causing it to be marked as "aborted" later on. Now it's correctly marked as "failed".	2016-10-26 14:42:28 +02:00
Eelco Dolstra	b1512a152a	Fix build failure on GCC 5.4	2016-09-30 17:05:07 +02:00
Eelco Dolstra	a55942603a	Provide a plugin hook for when build steps finish Fixes #318.	2016-05-27 14:35:32 +02:00
Eelco Dolstra	ef72569cc3	Merge pull request #280 from shlevy/github-status-api Add a plugin to interact with the github status API.	2016-04-14 20:03:45 +02:00
Eelco Dolstra	077ed3f571	Periodically clear orphaned build steps These are build steps that remain "busy" in the database even though they have finished, because they couldn't be updated (e.g. due to a PostgreSQL connection problem). To prevent them from showing up as busy in the "Machine status" page, we now periodically purge them.	2016-04-13 16:30:52 +02:00
Eelco Dolstra	00c78440b1	Disambiguate "marking build as succeeded" message	2016-04-13 16:30:52 +02:00
Shea Levy	9b37cb89ae	Add buildStarted plugin hook	2016-04-12 14:42:01 -04:00
Eelco Dolstra	5535bc28ca	Tweak	2016-03-10 16:46:15 +01:00
Eelco Dolstra	75e7b35477	Fix retry of transient failures	2016-03-10 16:44:26 +01:00
Eelco Dolstra	4151be7e69	Make the output size limit configurable The maximum output size per build step (as the sum of the NARs of each output) can be set via hydra.conf, e.g. "max-output-size = 1000000000". The default is 2 GiB. Also refactored the build error / status handling a bit.	2016-03-09 17:00:09 +01:00
Eelco Dolstra	80ff78b1b6	Unify build and step status codes Also remove the obsolete status code 5 from the database.	2016-03-09 15:30:43 +01:00
Eelco Dolstra	b77a43b83d	Get rid of "will retry" messages after "maybe cancelling..."	2016-03-08 13:09:39 +01:00
Eelco Dolstra	b98a061c24	Add some instrumentation to keep track of dispatcher cost	2016-03-02 14:18:39 +01:00
Eelco Dolstra	7cd08c7c46	Warn if PostgreSQL appears stalled	2016-02-29 15:10:30 +01:00
Eelco Dolstra	6d741d2ffa	Prevent download of NARs we just uploaded	2016-02-26 15:21:44 +01:00
Eelco Dolstra	02190b0fef	Support hydra-build-products on binary cache stores	2016-02-26 14:45:03 +01:00
Eelco Dolstra	ce5790285a	Merge remote-tracking branch 'origin/master' into binary-cache	2016-02-17 11:54:59 +01:00
Eelco Dolstra	d7a123fcd4	Keep track of the time we spend copying to/from build machines	2016-02-17 10:30:23 +01:00
Eelco Dolstra	2d0dd7fb49	hydra-queue-runner: Write directly to a binary cache	2016-02-15 21:10:29 +01:00
Eelco Dolstra	92d8b59361	Process Nix API changes	2016-02-11 15:59:47 +01:00
Eelco Dolstra	97f8c61928	Fix hydra-queue-runner --build-one	2015-12-29 17:53:33 +01:00
Eelco Dolstra	d8d188301d	Fix division-by-zero crash Not clear why step_->jobsets was empty...	2015-10-30 18:01:48 +01:00
Eelco Dolstra	4d1816b152	Remove obsolete Builds columns and provide accurate "Running builds" This removes the "busy", "locker" and "logfile" columns, which are no longer used by the queue runner. The "Running builds" page now only shows builds that have an active build step.	2015-10-27 15:37:17 +01:00
Eelco Dolstra	8e8e31ce86	Re-implement log size limits The old queue runner already had this. However, we now store "log limit exceeded" as a separate status code in the database.	2015-10-06 17:35:08 +02:00
Eelco Dolstra	d571e44b86	Keep stats for the Hydra auto scaler "hydra-queue-runner --status" now prints how many runnable and running build steps exist for each machine type. This allows additional machines to be provisioned based on the Hydra load.	2015-08-17 13:50:41 +02:00
Eelco Dolstra	97f11baa8d	Revive jobset scheduling (I.e. taking the jobset scheduling share into account.)	2015-08-11 01:31:56 +02:00
Eelco Dolstra	c18fb0ad74	Temporarily disable machines after a connection failure	2015-07-21 15:58:47 +02:00
Eelco Dolstra	7e026d35f7	Split hydra-queue-runner.cc more	2015-07-21 15:14:17 +02:00

33 Commits
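
Note on 1ecc8a4f40: the weak-pointer idea described in that commit message can be illustrated with a minimal C++ sketch. This is not the actual State::builder() code. The names Step, Runnable and requeueIfStillReferenced are hypothetical, and the queue layout is an assumption made for illustration. The point it shows is the ordering: the builder drops its own strong reference first, so a step that no build refers to any more is destroyed before anything can put it back on the runnable queue.

```cpp
#include <deque>
#include <memory>
#include <utility>

// Hypothetical stand-in for a queue-runner build step.
struct Step {
    int id;
    explicit Step(int id) : id(id) {}
};

// Hypothetical runnable queue. It stores weak references only, so the
// queue itself never keeps a step alive (an assumption of this sketch).
using Runnable = std::deque<std::weak_ptr<Step>>;

// Called by a builder that wants the step retried. The builder hands over
// its strong reference by value; we downgrade it to a weak pointer and
// release the strong one *before* deciding whether to requeue.
bool requeueIfStillReferenced(std::shared_ptr<Step> step, Runnable& runnable) {
    std::weak_ptr<Step> weak = step;
    step.reset();  // drop the builder's own strong reference

    // If no build still refers to the step, it has already been destroyed
    // at this point and lock() returns an empty pointer.
    std::shared_ptr<Step> alive = weak.lock();
    if (!alive) return false;

    runnable.push_back(weak);  // someone else still owns it: make it runnable
    return true;
}
```

A caller would pass its reference with std::move(step). The function returns false (and queues nothing) when the builder held the last reference, which corresponds to the cancelled-step case in the commit message.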