18 Commits

Author SHA1 Message Date
Maximilian Bosch
90a8a0d94a Reimplement (named) constituent jobs (+globbing) based on nix-eval-jobs
Depends on https://github.com/nix-community/nix-eval-jobs/pull/349 & #1421.

Almost equivalent to #1425, but with a small change: when having e.g. an
aggregate job with a glob that matches nothing, the jobset evaluation is
failed now. This was the intended behavior before (hydra-eval-jobset
fails hard if an aggregate is broken), the code-path was never reached
however since the aggregate was never marked as broken in this case
before.
2025-04-09 11:31:47 -04:00
Pierre Bourdon
d84ff32ce6 hydra-eval-jobset: Use nix-eval-jobs instead of hydra-eval-jobs
incrementally ingest eval results

nix-eval-jobs streams output, unlike hydra-eval-jobs. Now that we've
migrated, we can use this to:

1. Use less RAM by avoiding buffering a whole eval's worth of metadata
   into a Perl string and an array of JSON objects.
2. Make evals latency a bit lower by allowing the queue runner to start
   ingesting builds faster.

Also use the newly-restored constituents support in `nix-eval-jobs`

Note, we pass --workers and --max-memory-size to n-e-j

Lost in the h-e-j -> n-e-j migration, causing evaluation to always be
single threaded and limited to 4GiB RAM. Follow the config settings like
h-e-j used to do (via C++ code).

`nix-eval-jobs` should check `hydraJobs` and then `checks` with flakes

(cherry picked from commit 6d4ccff43c41adaf6e4b2b9bced7243bc2f6e97b)
(cherry picked from commit b0e9b4b2f99f9d8f5c4e780e89f955c394b5ced4)
(cherry picked from commit cdfc5c81e8037d3e4818a3e459d0804b2c157ea9)
(cherry picked from commit 4b107e6ff36bd89958fba36e0fe0340903e7cd13)

Co-Authored-By: Maximilian Bosch <maximilian@mbosch.me>
2025-02-07 16:55:28 -05:00
Rick van Schijndel
578a3d2292 t: increase timeouts for slow commands with high load
We've seen many fails on ofborg, at lot of them ultimately appear to come down to
a timeout being hit, resulting in something like this:

Failure executing slapadd -F /<path>/slap.d -b dc=example -l /<path>/load.ldif.

Hopefully this resolves it for most cases.
I've done some endurance testing and this helps a lot.
some other commands also regularly time-out with high load:

- hydra-init
- hydra-create-user
- nix-store --delete

This should address most issues with tests randomly failing.

Used the following script for endurance testing:

```

import os
import subprocess

run_counter = 0
fail_counter = 0

while True:
    try:
        run_counter += 1
        print(f"Starting run {run_counter}")
        env = os.environ
        env["YATH_JOB_COUNT"] = "20"
        result = subprocess.run(["perl", "t/test.pl"], env=env)
        if (result.returncode != 0):
            fail_counter += 1
        print(f"Finish run {run_counter}, total fail count: {fail_counter}")
    except KeyboardInterrupt:
        print(f"Finished {run_counter} runs with {fail_counter} fails")
        break
```

In case someone else wants to do it on their system :).
Note that YATH_JOB_COUNT may need to be changed loosely based on your
cores.
I only have 4 cores (8 threads), so for others higher numbers might
yield better results in hashing out unstable tests.
2024-07-31 17:13:28 +02:00
John Ericson
323b556dc8 Minimal CA support
This verison has a worse UI, but also chnages the schema less: One
non-null constraint is removed, but no new columns are added.

Co-Authored-By: Andrea Ciceri <andrea.ciceri@autistici.org>
Co-Authored-By: regnat <rg@regnat.ovh>
2024-01-26 00:34:58 -05:00
Eelco Dolstra
c72bed5cb4 Fix tests
Use $NIX_REMOTE instead of the legacy environment variables.
2022-07-12 14:45:30 +02:00
Graham Christensen
5d169e3a2e Add a test validating direct and indirect constituents 2022-02-20 12:28:40 -05:00
Graham Christensen
33f4c4c13d build-locally-with-substitutable-path.t: give nix-store --delete a bit more time to run
Under high load, like 64-128 tests at once, this can take more than a second.
2022-02-10 11:13:31 -05:00
Graham Christensen
845e6d4760 captureStdoutStderr*: move to Hydra::Helper::Exec which helps avoid some environment variable fixation problems 2022-02-09 14:28:50 -05:00
Graham Christensen
952f629b7c Test the queue runner in the scenario where a dependency is available in the cache but GC'd locally, where we're building locally 2022-01-21 15:26:45 -05:00
Graham Christensen
fbce3b6ed1 default-machine-file: use makeAndEvaluateJobset 2021-12-14 20:52:40 -05:00
Graham Christensen
06f824ca48 notifications.t: use system() with lists 2021-12-14 20:48:19 -05:00
Graham Christensen
c2384a04d8 notifications.t: move to makeAndEvaluateJobset 2021-12-14 20:41:21 -05:00
Graham Christensen
7dcf6a01c6 JSON -> JSON::MaybeXS 2021-12-13 15:37:56 -05:00
Luke Granger-Brown
67ebce8493 Output evaluation errors without crashing if aggregate job is broken.
At the moment, aggregate jobs can easily break and cause the entire
evaluation to fail, which is not ideal. For Nixpkgs, we do have some
important aggregate jobs (like `tested`), but for debugging and building
purposes it's still useful to get a partial result even if the channel
won't actually advance.

This commit changes the behaviour of hydra-eval-jobs such that it
aggregates any errors found during the construction of an aggregate, and
will instead annotate the job with the evaluation failure such that it
shows up in a "cleaner" way.

There are really two types of failure that we care about: one is where
the attribute just ends up missing altogether in the final output, and
also where the attribute is in the output but fails to evaluate. Both
are handled here.

Note that this does mean that the same error message may be output
multiple times, but this aids debuggability because it'll be much
clearer what's blocking the job from being created.
2021-10-26 10:14:34 +01:00
Your Name
4677a7c894 perlcritic: use strict, use warnings 2021-09-06 22:13:33 -04:00
Graham Christensen
cb8929b7ed Tighten up 'should exit with return code' 2021-06-16 11:48:49 -04:00
regnat
305c27d3fb Move the default-machine-file test under queue-runner
Just a bit of housekeeping to keep the testsuite clean
2021-04-29 06:55:23 +02:00
Graham Christensen
cf4434bc9f queue runner: test notifications
Especially, test the difference in behavior of substituted and unsubstituted builds.
2021-04-14 14:19:10 -04:00