Commit Graph

293 Commits

Author SHA1 Message Date
17f9920cf9 jobset-eval: fix actions not showing up sometimes for new jobs
New jobs have their "new" status take precedence over them being
"failed" or "queued", which means actions that can act on "failed" or
"queued" jobs weren't shown to the user when they could only act on
"new" jobs.

(cherry picked from commit 9a4a5dd624)
2025-05-14 20:29:25 -04:00
65618fd590 web: replace 'errormsg' with 'errormsg IS NULL' in most cases
This is implement in an extremely hacky way due to poor DBIx feature
support. Ideally, what we'd need is a way to tell DBIx to ignore the
errormsg column unless explicitly requested, and to automatically add a
computed 'errormsg IS NULL' column in others. Since it does not support
that, this commit instead hacks some support via method overrides while
taking care to not break anything obvious.
2025-04-09 11:31:47 -04:00
17094c8371 lazy-load evaluation errors
Closes #1362
2025-04-09 11:31:47 -04:00
90a8a0d94a Reimplement (named) constituent jobs (+globbing) based on nix-eval-jobs
Depends on https://github.com/nix-community/nix-eval-jobs/pull/349 & #1421.

Almost equivalent to #1425, but with a small change: when having e.g. an
aggregate job with a glob that matches nothing, the jobset evaluation is
failed now. This was the intended behavior before (hydra-eval-jobset
fails hard if an aggregate is broken), the code-path was never reached
however since the aggregate was never marked as broken in this case
before.
2025-04-09 11:31:47 -04:00
341b2f1309 Update build system to depend on Nix 2.26 2025-02-13 21:54:35 -05:00
2f92846e5a hydra-eval-jobs: remove, replaced by nix-eval-jobs
(cherry picked from commit ed7c58708cd3affd62a598a22a500ed2adf318bf)
2025-02-07 16:55:28 -05:00
d84ff32ce6 hydra-eval-jobset: Use nix-eval-jobs instead of hydra-eval-jobs
incrementally ingest eval results

nix-eval-jobs streams output, unlike hydra-eval-jobs. Now that we've
migrated, we can use this to:

1. Use less RAM by avoiding buffering a whole eval's worth of metadata
   into a Perl string and an array of JSON objects.
2. Make evals latency a bit lower by allowing the queue runner to start
   ingesting builds faster.

Also use the newly-restored constituents support in `nix-eval-jobs`

Note, we pass --workers and --max-memory-size to n-e-j

Lost in the h-e-j -> n-e-j migration, causing evaluation to always be
single threaded and limited to 4GiB RAM. Follow the config settings like
h-e-j used to do (via C++ code).

`nix-eval-jobs` should check `hydraJobs` and then `checks` with flakes

(cherry picked from commit 6d4ccff43c41adaf6e4b2b9bced7243bc2f6e97b)
(cherry picked from commit b0e9b4b2f99f9d8f5c4e780e89f955c394b5ced4)
(cherry picked from commit cdfc5c81e8037d3e4818a3e459d0804b2c157ea9)
(cherry picked from commit 4b107e6ff36bd89958fba36e0fe0340903e7cd13)

Co-Authored-By: Maximilian Bosch <maximilian@mbosch.me>
2025-02-07 16:55:28 -05:00
141b5fd0b5 Improve tests around constituents
- Test how shorter names are preferred when multiple jobs resolve to the
  same derivation.

- Test the exact aggregate map we get, by looking in the DB.
2025-02-07 16:39:13 -05:00
8a8ac14877 Test using Hydra with flakes
It seemed there was no self-contained end-to-end test actually doing
this?!

Among other things, this will help ensure that the switch-over to
`nix-eval-jobs` is correct.
2025-02-06 21:30:49 -05:00
182a48c9fb autotools -> meson
Original commit message:

> There are some known regressions regarding local testing setups - since
> everything was kinda half written with the expectation that build dir =
> source dir (which should not be true anymore). But everything builds and
> the test suite runs fine, after several hours spent debugging random
> crashes in libpqxx with MALLOC_PERTURB_...

I have not experienced regressions with local testing.

(cherry picked from commit 4b886d9c45cd2d7fe9b0a8dbc05c7318d46f615d)
2024-11-24 15:58:26 -05:00
b6f44b5cd0 Merge pull request #1402 from NixOS/like-sub
tests: use `like` for testing regexes
2024-09-15 23:50:13 +02:00
f730433789 Create eval-jobset role and guard /api/push route 2024-08-27 19:49:05 +02:00
916531dc9c api: Require POST for /api/push 2024-08-27 17:52:13 +02:00
250780aaf2 tests: use like for testing regexes
This gives us better diagnostics when the test fails.
2024-08-21 08:34:25 +02:00
54002f0fcf t/evaluator/evaluate-oom-job.t: always skip, the test always fails
We should look into how to resolve this, but I tried some things and nothing really worked.
Let's put it skipped for now until someone comes along to improve it.
2024-07-31 17:15:02 +02:00
a6b14369ee t/test.pl: increase event-timeout, set qvf
Only log issues/failures when something's actually up.
It has irked me for a long time that so much output came
out of running the tests, this seems to silence it.
It does hide some warnings, but I think it makes the output
so much more readable that it's worth the tradeoff.

Helps for highly parallel running of jobs, sometimes they'd not give output for a while.
Setting this timeout higher appears to help.
Not completely sure if this is the right place to do it, but it works fine for me.
2024-07-31 17:15:02 +02:00
578a3d2292 t: increase timeouts for slow commands with high load
We've seen many fails on ofborg, at lot of them ultimately appear to come down to
a timeout being hit, resulting in something like this:

Failure executing slapadd -F /<path>/slap.d -b dc=example -l /<path>/load.ldif.

Hopefully this resolves it for most cases.
I've done some endurance testing and this helps a lot.
some other commands also regularly time-out with high load:

- hydra-init
- hydra-create-user
- nix-store --delete

This should address most issues with tests randomly failing.

Used the following script for endurance testing:

```

import os
import subprocess

run_counter = 0
fail_counter = 0

while True:
    try:
        run_counter += 1
        print(f"Starting run {run_counter}")
        env = os.environ
        env["YATH_JOB_COUNT"] = "20"
        result = subprocess.run(["perl", "t/test.pl"], env=env)
        if (result.returncode != 0):
            fail_counter += 1
        print(f"Finish run {run_counter}, total fail count: {fail_counter}")
    except KeyboardInterrupt:
        print(f"Finished {run_counter} runs with {fail_counter} fails")
        break
```

In case someone else wants to do it on their system :).
Note that YATH_JOB_COUNT may need to be changed loosely based on your
cores.
I only have 4 cores (8 threads), so for others higher numbers might
yield better results in hashing out unstable tests.
2024-07-31 17:13:28 +02:00
8b48579593 Merge pull request #1374 from Mindavi/bugfix/rendering-issue-content-addressed
ca-derivations: fix rendering issue
2024-04-18 13:08:30 -04:00
3f913a771d t: content-addressed: add a comment about a misleading testcase 2024-04-03 22:55:42 +02:00
1665aed5e3 t: content-addressed: add test for caDependingOnFailingCA
This uncovers an issue with the front-end.
2024-04-03 22:45:53 +02:00
e499509595 Switch to new Nix bindings, update Nix for that
Implements support for Nix's new Perl bindings[1]. The current state
basically does `openStore()`, but always uses `auto` and doesn't support
stores at other URIs.

Even though the stores are cached inside the Perl implementation, I
decided to instantiate those once in the Nix helper module. That way
store openings aren't cluttered across the entire codebase. Also, there
are two stores used later on - MACHINE_LOCAL_STORE for `auto`,
BINARY_CACHE_STORE for the one from `store_uri` in `hydra.conf` - and
using consistent names should make the intent clearer then.

This doesn't contain any behavioral changes, i.e. the build product
availability issue from #1352 isn't fixed. This patch only contains the
migration to the new API.

[1] https://github.com/NixOS/nix/pull/9863
2024-02-12 18:50:56 +01:00
c62eaf248f Remove now-unneeded workaround 2024-01-26 01:20:07 -05:00
13b5f007ef Merge branch 'master' into ca-no-new-col 2024-01-26 01:19:45 -05:00
5ee0e443e4 Remove now-unneeded workaround 2024-01-26 01:08:11 -05:00
323b556dc8 Minimal CA support
This verison has a worse UI, but also chnages the schema less: One
non-null constraint is removed, but no new columns are added.

Co-Authored-By: Andrea Ciceri <andrea.ciceri@autistici.org>
Co-Authored-By: regnat <rg@regnat.ovh>
2024-01-26 00:34:58 -05:00
fcde5908d8 More CA derivations prep
Again, with care not to change the schema in any way.
2024-01-25 21:32:22 -05:00
411e4d0c24 Let tests themselves intentionally leak temp dir (#1320)
* Let tests themselves intentionally leak temp dir

By default Yath will clean up temporary files, so the result is the
same. But `--keep-dirs` can be passed to `yath test` telling Yath to
*not* clean them up instead. This is very useful for debugging.

* Update t/lib/HydraTestContext.pm

Co-authored-by: Cole Helbling <cole.e.helbling@outlook.com>
2023-12-08 16:30:31 +00:00
ce001bb142 Relax time interval checks
I saw one of these failing randomly.
2023-06-23 15:09:09 +02:00
fd765bc97a Fix "My Jobs" tab in user dashboard
Nowadays `Builds` doesn't reference `Project` directly anymore. This
means that simply resolving both `jobset` and `project` with a single
JOIN from `Builds` doesn't work anymore. Instead we need to resolve the
relation to `jobset` first and then the relation to `project`.

For similar fixes see e.g. c7c4759600.
2022-11-22 20:54:51 +01:00
d3fe4ffbf6 Job: expose closuresize and size (output size in the UI) as prometheus metrics 2022-09-22 10:47:22 +02:00
c72bed5cb4 Fix tests
Use $NIX_REMOTE instead of the legacy environment variables.
2022-07-12 14:45:30 +02:00
bb1f04ed86 AddBuilds: fix declarative jobsets with dynamic runcommand enabled
$project->{enable_dynamic_run_command} is undefined
2022-06-30 01:49:30 +02:00
065039beba feat(t/evaluator/evaluate-oom): comment intentions 2022-05-02 15:26:26 -04:00
87f610e7c1 fix(t/evaluator/evaluate-oom): use test_context to get path to ./t/jobs instead of relative paths 2022-05-02 15:14:46 -04:00
013a1dcabc fix(t/evaluator/evaluate-oom): check that the exit value of the systemd-run check is zero. Rework skip messages 2022-05-02 15:13:59 -04:00
e917d9e546 fix(t/evaluator/evaluate-oom): convert systemd-run presence check to eval, fix indentaion, show relationships between flags and commands with indentation 2022-05-02 14:40:13 -04:00
01ec004108 feat(t/evaluator/evaluate-oom-job): skip test if systemd-run is not present 2022-05-02 14:08:50 -04:00
2c909c038f feat(t/evaluator/hydra-eval-jobs): add basic evaluation test for hydra-eval-jobs 2022-05-02 13:50:57 -04:00
90769ab5ad feat(t/jobs): add test job to cause an OOM 2022-05-02 13:49:32 -04:00
5c90edd19f Merge pull request #1103 from DeterminateSystems/runcommand/dynamic
Dynamic RunCommand
2022-04-19 10:09:47 -04:00
e1965250b5 Merge pull request #1173 from DeterminateSystems/queue-runner-exporter
hydra-queue-runner metrics
2022-04-07 12:27:33 -04:00
edf3c348f2 hydra-queue-runner: make entire address configurable 2022-04-06 10:59:45 -07:00
9c1f36c47c t/lib/HydraTestContext: set queue runner port to 0
This makes the exposer choose a random, available port.
2022-03-29 11:41:23 -07:00
e5393c2cf8 fixup: make id non-ambiguous 2022-03-19 23:56:47 -04:00
a582e4c485 HydraTestContext: add \n's to various dies 2022-03-19 14:46:53 -04:00
0c51de6334 hydra-evaluate-jobset: assert it logs errored constituents properly 2022-03-19 14:35:30 -04:00
25f6bae847 HydraTestContext: make it easy to create a jobset without evaluating 2022-03-19 14:34:43 -04:00
e0921eba0a Create a basic test which verifies we can't delete the derivation of aggregate jobs 2022-02-20 12:28:40 -05:00
be46f02164 tests: relocate evaluator tests 2022-02-20 12:28:40 -05:00
5d169e3a2e Add a test validating direct and indirect constituents 2022-02-20 12:28:40 -05:00