The previous implementation was O(N²lg(N)) due to sorting the full
runnables priority list once per runnable being scheduled. While not
confirmed, this is suspected to cause performance issues and
bottlenecking with the queue runner when the runnable list gets large
enough.
This commit changes the dispatcher to instead only sort runnables per
priority once per dispatch cycle. This has the drawback of being less
reactive to runnable priority changes: the previous code would react
immediately, while this might end up using "old" priorities until the
next dispatch cycle. However, dispatch cycles are not supposed to take
very long (seconds, not minutes/hours), so this is not expected to have
much or any practical impact.
Ideally runnables would be maintained in a sorted data structure instead
of the current approach of copying + sorting in the scheduler. This
would however be a much more invasive change to implement, and might
have to wait until we can confirm where the queue runner bottlenecks
actually lie.
Periodically, I have seen tests fail because of out of order queue runner behavior:
checking the queue for builds > 0...
loading build 1 (tests:basic:empty_dir)
aborting unsupported build step '...-empty-dir.drv' (type 'x86_64-linux')
marking build 1 as failed
adding new machine ‘localhost’
This patch should prevent the dispatcher from running before any machines are
made available.
Previously, the build ID would never flow through channels which
exited.
This patch tracks the buildOne state as part of State and exits avoids
waiting forever for new work.
The code around buildOnly is a bit rough, making this a bit weird to
implement but since it is only used for testing the value of improving
it on its own is a bit questionable.
If we don't see machine that supports a build step for
'max_unsupported_time' seconds, the step is aborted. The default is 0,
which is appropriate for Hydra installations that don't provision
missing machines dynamically.
(cherry picked from commit f5cdbfe21d930db43d3812c7d8e87746d6378ef9)
Same problem as d744362e4a249a92ecfc3e48eae135e3a46db49e.
at /nix/store/ksvsbr7pg4z69bv6fbbc8h7x7rm2104m-gcc-4.9.3/include/c++/4.9.3/bits/predefined_ops.h:166
__last@entry=..., __comp=...) at /nix/store/ksvsbr7pg4z69bv6fbbc8h7x7rm2104m-gcc-4.9.3/include/c++/4.9.3/bits/stl_algo.h:1827
__comp=...) at /nix/store/ksvsbr7pg4z69bv6fbbc8h7x7rm2104m-gcc-4.9.3/include/c++/4.9.3/bits/stl_algo.h:4717
For example, steps that require the "kvm" feature may require a
different kind of machine to be provisioned. This can also be used to
require performance-sensitive tests to run on a particular kind of
machine, e.g., by setting requiredSystemFeatures to something like
"ec2-i2.8xlarge".
"hydra-queue-runner --status" now prints how many runnable and running
build steps exist for each machine type. This allows additional
machines to be provisioned based on the Hydra load.