Original commit message:
> There are some known regressions regarding local testing setups - since
> everything was kinda half written with the expectation that build dir =
> source dir (which should not be true anymore). But everything builds and
> the test suite runs fine, after several hours spent debugging random
> crashes in libpqxx with MALLOC_PERTURB_...
I have not experienced regressions with local testing.
(cherry picked from commit 4b886d9c45cd2d7fe9b0a8dbc05c7318d46f615d)
We should look into how to resolve this, but I tried some things and nothing really worked.
Let's put it skipped for now until someone comes along to improve it.
Only log issues/failures when something's actually up.
It has irked me for a long time that so much output came
out of running the tests, this seems to silence it.
It does hide some warnings, but I think it makes the output
so much more readable that it's worth the tradeoff.
Helps for highly parallel running of jobs, sometimes they'd not give output for a while.
Setting this timeout higher appears to help.
Not completely sure if this is the right place to do it, but it works fine for me.
We've seen many fails on ofborg, at lot of them ultimately appear to come down to
a timeout being hit, resulting in something like this:
Failure executing slapadd -F /<path>/slap.d -b dc=example -l /<path>/load.ldif.
Hopefully this resolves it for most cases.
I've done some endurance testing and this helps a lot.
some other commands also regularly time-out with high load:
- hydra-init
- hydra-create-user
- nix-store --delete
This should address most issues with tests randomly failing.
Used the following script for endurance testing:
```
import os
import subprocess
run_counter = 0
fail_counter = 0
while True:
try:
run_counter += 1
print(f"Starting run {run_counter}")
env = os.environ
env["YATH_JOB_COUNT"] = "20"
result = subprocess.run(["perl", "t/test.pl"], env=env)
if (result.returncode != 0):
fail_counter += 1
print(f"Finish run {run_counter}, total fail count: {fail_counter}")
except KeyboardInterrupt:
print(f"Finished {run_counter} runs with {fail_counter} fails")
break
```
In case someone else wants to do it on their system :).
Note that YATH_JOB_COUNT may need to be changed loosely based on your
cores.
I only have 4 cores (8 threads), so for others higher numbers might
yield better results in hashing out unstable tests.
Implements support for Nix's new Perl bindings[1]. The current state
basically does `openStore()`, but always uses `auto` and doesn't support
stores at other URIs.
Even though the stores are cached inside the Perl implementation, I
decided to instantiate those once in the Nix helper module. That way
store openings aren't cluttered across the entire codebase. Also, there
are two stores used later on - MACHINE_LOCAL_STORE for `auto`,
BINARY_CACHE_STORE for the one from `store_uri` in `hydra.conf` - and
using consistent names should make the intent clearer then.
This doesn't contain any behavioral changes, i.e. the build product
availability issue from #1352 isn't fixed. This patch only contains the
migration to the new API.
[1] https://github.com/NixOS/nix/pull/9863
This verison has a worse UI, but also chnages the schema less: One
non-null constraint is removed, but no new columns are added.
Co-Authored-By: Andrea Ciceri <andrea.ciceri@autistici.org>
Co-Authored-By: regnat <rg@regnat.ovh>
* Let tests themselves intentionally leak temp dir
By default Yath will clean up temporary files, so the result is the
same. But `--keep-dirs` can be passed to `yath test` telling Yath to
*not* clean them up instead. This is very useful for debugging.
* Update t/lib/HydraTestContext.pm
Co-authored-by: Cole Helbling <cole.e.helbling@outlook.com>
Nowadays `Builds` doesn't reference `Project` directly anymore. This
means that simply resolving both `jobset` and `project` with a single
JOIN from `Builds` doesn't work anymore. Instead we need to resolve the
relation to `jobset` first and then the relation to `project`.
For similar fixes see e.g. c7c47596009687d1652522c556333cefce28ec51.