Commit Graph

3105 Commits

Author SHA1 Message Date
Janne Heß
c6424f37a6 templates: Hopefully escape all template inputs 2025-08-10 12:40:21 +02:00
Janne Heß
b94f47ed27 templates: Make whitespace in [% %] consistent 2025-08-10 12:40:21 +02:00
Janne Heß
615798a51e templates: Use HTML.attributes for all links 2025-08-10 12:40:21 +02:00
Janne Heß
99a6656b40 build: Properly escape all input values 2025-08-10 12:40:21 +02:00
Janne Heß
33b5c6fb41 product-list: Escape untrusted values 2025-08-10 12:40:21 +02:00
Janne Heß
5f226f3b6f hydra-queue-runner: Validate metric type 2025-08-10 12:40:21 +02:00
Janne Heß
7c4f0ab01a hydra-queue-runner: Validate hydra-metrics unit 2025-08-10 12:40:21 +02:00
Janne Heß
0d3842aa2f hydra-queue-runner: Validate metric name in hydra-metrics 2025-08-10 12:40:21 +02:00
Janne Heß
a0ba36db79 hydra-queue-runner: Validate release name 2025-08-10 12:40:21 +02:00
Janne Heß
552ca356ae hydra-queue-runner: Verify product names in hydra-build-products 2025-08-10 12:40:20 +02:00
Janne Heß
85b330be41 hydra-queue-runner: Fix potential UB
Removing two characters from a string when it starts with " can lead to
a substring call with -1
2025-08-02 17:21:27 +02:00
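The kind of length check this fix implies can be sketched as follows. The actual change lives in the C++ queue runner and is not shown in this log. The Python function below, including the name `strip_quotes` and the requirement that a closing quote be present, is a hypothetical illustration only.

```python
def strip_quotes(value: str) -> str:
    """Remove one pair of surrounding double quotes, if present.

    Hypothetical helper: requiring at least two characters ensures that a
    lone '"' is never treated as both the opening and the closing quote,
    which is the situation where "remove two characters" goes out of range.
    """
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return value[1:-1]
    return value
```

With this guard, a value consisting of a single `"` is returned unchanged instead of producing a negative length.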
Janne Heß
1657f6fff4 hydra-queue-runner: Fix crash when < > are in hydra-build-products
This prevents a forever-hanging build (don't know why) when < or > are
in the path of hydra-build-products. This is not to prevent any XSS (see
next commits), just to prevent the DOS (if you can even call it that).
2025-08-02 17:21:27 +02:00
Janne Heß
05a05667d8 Merge branch 'master' into fix/useless-message 2025-08-02 14:21:44 +02:00
Janne Heß
0527fddd6a Remove useless previous eval message
This message serves no purpose and looks like something went wrong.
There is nothing wrong, there is just no previous evaluation.
2025-08-02 14:20:59 +02:00
Janne Heß
0017a1d0f3 Merge pull request #1498 from NixOS/feat/new-q-runner-machine-status
machine-status: Render new queue runner details
2025-08-02 12:11:07 +00:00
Janne Heß
7096ae3a5b machine-status: Fixup double localhost during development 2025-08-02 14:05:23 +02:00
Janne Heß
d2c10bf851 Fixup static libraries in development server 2025-08-02 13:53:22 +02:00
Janne Heß
632a59172a machine-status: Make new runner status prettier
- Remove bottom margin
- Properly format memory in human format
- Calculate free memory
- Format the load with 2 digits after comma
- Lpad pressure percentages
- Use a macro to render pressure
- Score -> Scheduling Score
- More spacing in the load
- Add IRQ pressure
2025-08-01 11:25:14 +02:00
Janne Heß
7b1968236d machine-status: Render new queue runner details 2025-07-31 18:45:04 +02:00
Janne Heß
b812bb5017 Merge pull request #869 from andir/patch-1
Add Queue Runner Status to the topbar
2025-07-17 21:31:27 +00:00
Janne Heß
61573c71d1 Merge pull request #1497 from helsinki-systems/feat/show-new-q-runner-status
Show queue runner v2 status
2025-07-17 21:30:36 +00:00
Janne Heß
f50263976c Merge branch 'master' into patch-1 2025-07-17 23:21:18 +02:00
Janne Heß
97ec796db5 Merge branch 'master' into CORE-21733-add-link-to-raw-log 2025-07-16 18:42:40 +02:00
Janne Heß
2fcfa969b8 Merge branch 'master' into fix/local-store-detection 2025-07-16 18:25:54 +02:00
Janne Heß
d0008d4238 Show queue runner v2 status
This is guarded behind a setting and will overwrite everything that was
learned from the machines file. Also drops `sshKeys` since that wasn't
used anyway.
2025-07-16 17:39:06 +02:00
Dionysis Grigoropoulos
62fcacb7d2 fix: Update Nix download url 2025-07-15 19:45:13 +03:00
John Ericson
278a3ebfd5 Fix build with Nix 2.29 2025-05-25 20:53:18 -04:00
Pierre Bourdon
720db63d52 queue runner: attempt at slightly smarter scheduling criteria
Instead of just going for "whatever is the oldest build we know of",
use the following first:

- Is the step more constrained? If so, schedule it first to avoid
  filling up "more desirable" build slots with less constrained builds.

- Does the step have more dependents? If so, schedule it first to try
  and maximize open parallelism and breadth of scheduling options.

(cherry picked from commit b8d03adaf4)
2025-04-20 13:44:06 +10:00
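The ordering described in this commit message can be sketched as a sort key. This is not the queue runner's C++ code. The `Step` fields are hypothetical, and measuring "more constrained" by the number of eligible machines is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    name: str
    eligible_machines: int  # assumption: fewer eligible machines = more constrained
    dependents: int         # number of steps waiting on this one
    lowest_build_id: int    # assumption: stand-in for "oldest build we know of"


def scheduling_key(step: Step) -> Tuple[int, int, int]:
    # 1. more constrained steps first, 2. more dependents first,
    # 3. fall back to the oldest build.
    return (step.eligible_machines, -step.dependents, step.lowest_build_id)


def order_steps(steps: List[Step]) -> List[Step]:
    return sorted(steps, key=scheduling_key)
```

Because the key is a tuple, age only breaks ties between steps that are equally constrained and have the same number of dependents.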
Pierre Bourdon
0ab357e435 jobset-eval: fix actions not showing up sometimes for new jobs
New jobs have their "new" status take precedence over them being
"failed" or "queued", which means actions that can act on "failed" or
"queued" jobs weren't shown to the user when they could only act on
"new" jobs.

(cherry picked from commit 9a4a5dd624)
2025-04-16 09:50:32 +10:00
Jörg Thalheim
6fcfa9e796 Merge commit from fork
Re-enable restrict-eval for non-flakes
2025-04-15 06:48:18 +02:00
Martin Weinelt
cf33a9158a web: increase colspan for machine row in machine status 2025-04-13 08:29:01 +02:00
Jörg Thalheim
8d75026513 re-enable restrict-eval for non-flakes 2025-04-11 13:42:55 +02:00
Maximilian Bosch
f1a976d3fd Fix displaying eval errors in jobset eval view
Quickfix for something that annoyed me once too often.

Specifically, I'm talking about `/eval/1#tabs-errors`.

To not fetch long errors on each request, this is only done on-demand.
I.e., when the tab is opened, an iframe is requested with the errors.
This iframe uses a template for both the jobset view and the jobset-eval
view. It is differentiated by checking if `jobset` or `eval` is defined.

However, the jobset-eval view also has a `jobset` variable in its stash
which means that in both cases the `if` path was used. Since
`jobset.fetcherrormsg` isn't defined in the eval case though, you always
got an empty error.

The band-aid fix is relatively simple: swap if and else: the `eval`
variable is not defined in the stash of the jobset view, so now this is
a useful condition to decide which view we're in.

(cherry picked from commit 70c3d75f73)
2025-04-11 09:03:11 +10:00
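The swapped condition can be illustrated outside the template language. The real template is Template Toolkit, not Python. The stash is modelled here as a plain dict, and the `errormsg` key on `eval` is a hypothetical name (only `jobset.fetcherrormsg` appears in the commit message).

```python
def pick_error_message(stash: dict) -> str:
    # `jobset` is present in both views, so it cannot tell them apart.
    # `eval` is only present in the jobset-eval view, so test for it first.
    if "eval" in stash:
        return stash["eval"].get("errormsg", "")  # hypothetical field name
    return stash["jobset"].get("fetcherrormsg", "")


jobset_view = {"jobset": {"fetcherrormsg": "fetch failed"}}
eval_view = {"jobset": {}, "eval": {"errormsg": "eval failed"}}
```

Testing for `jobset` first would send both stashes down the same branch, which matches the empty-error symptom described above.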
Sandro Jäckel
7e0157e387 Fix compilation with a nix which was compiled without aws sdk 2025-04-09 17:53:14 +02:00
John Ericson
a5b17d0686 Queue-runner: Always produce a machines JSON object
Even if there are no machines, there should at least be an empty object.
2025-04-08 17:38:19 -04:00
Pierre Bourdon
b4322edd05 web: replace 'errormsg' with 'errormsg IS NULL' in most cases
This is implemented in an extremely hacky way due to poor DBIx feature
support. Ideally, what we'd need is a way to tell DBIx to ignore the
errormsg column unless explicitly requested, and to automatically add a
computed 'errormsg IS NULL' column in others. Since it does not support
that, this commit instead hacks some support via method overrides while
taking care to not break anything obvious.
2025-04-07 14:48:07 -04:00
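The idea of selecting a computed `errormsg IS NULL` column instead of the full text can be shown with a small, self-contained query. Hydra uses PostgreSQL and DBIx::Class; this sketch uses Python's built-in `sqlite3` purely so it can run anywhere, and the table layout is a hypothetical stand-in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; only the errormsg column name comes from the commit.
conn.execute("CREATE TABLE jobsets (name TEXT, errormsg TEXT)")
conn.executemany(
    "INSERT INTO jobsets VALUES (?, ?)",
    [("ok", None), ("broken", "x" * 100_000)],
)

# Only a boolean flag leaves the database, not the large error text.
rows = conn.execute(
    "SELECT name, errormsg IS NULL AS no_error FROM jobsets ORDER BY name"
).fetchall()
```

The page can still tell whether an error exists, while the long message is only fetched when explicitly requested.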
Pierre Bourdon
143a07bff0 queue-runner: release machine reservation while copying outputs
This allows for better builder usage when the queue runner is busy. To
avoid running into uncontrollable imbalances between builder/queue
runner, we only release the machine reservation after the local
throttler has found a slot to start copying the outputs for that build.

As opposed to asserting uniqueness to understand resource utilization,
we just switch to using `std::unique_ptr`.
2025-04-07 14:01:50 -04:00
K900
8a6482bb1c Add metric for builds waiting for download slot
(cherry picked from commit f23ec71227911891807706b6b978836e4d80edde)
2025-04-07 13:16:49 -04:00
Pierre Bourdon
8e02589ac8 queue-runner: switch to pseudorandom ordering of builds processing
We don't rely on sequential / monotonic build IDs processing anymore, so
randomizing actually has the advantage of mixing builds for different
systems together, to avoid only one chunk of builds for a single system
getting processed while builders for other systems are starved.
2025-04-07 12:33:35 -04:00
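A minimal sketch of pseudorandom processing order, assuming the builds are available as a list of IDs. How the queue runner actually randomizes is not shown in this log; `processing_order` and the seed parameter are hypothetical.

```python
import random
from typing import List


def processing_order(build_ids: List[int], seed: int) -> List[int]:
    """Return the build IDs in a pseudorandom order without mutating the input."""
    order = list(build_ids)
    # A seeded generator keeps the order reproducible for a given seed.
    random.Random(seed).shuffle(order)
    return order
```

Shuffling interleaves builds for different systems, so one system's builds no longer form a single contiguous chunk.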
Pierre Bourdon
52a0199a9b queue runner: introduce some parallelism for remote paths lookup
Each output for a given step being ingested is looked up in parallel,
which should basically multiply the speed of builds ingestion by the
average number of outputs per derivation.
2025-04-07 12:33:35 -04:00
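The per-output parallel lookup can be sketched with a thread pool. This is an illustration in Python rather than the queue runner's C++ code; `lookup_outputs` and the `is_valid_path` callback are hypothetical names standing in for the remote store query.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def lookup_outputs(
    outputs: List[str],
    is_valid_path: Callable[[str], bool],
    max_workers: int = 8,
) -> Dict[str, bool]:
    """Query every output path concurrently and collect the results."""
    if not outputs:
        return {}
    workers = min(max_workers, len(outputs))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with outputs.
        results = list(pool.map(is_valid_path, outputs))
    return dict(zip(outputs, results))
```

When each lookup is dominated by network latency, a step with several outputs takes roughly as long as its slowest lookup instead of the sum of all of them.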
Pierre Bourdon
9265fc5002 queue-runner: reduce the time between queue monitor restarts
This will induce more DB queries (though these are fairly cheap), but with
the benefit of processing bumps within 1m instead of within 10m.
2025-04-07 12:33:35 -04:00
Pierre Bourdon
d8ffa6b56a queue-runner: remove id > X from new builds query
Running the query with/without it shows that it makes no difference to
postgres, since there's an index on finished=0 already. This allows a
few simplifications, but also paves the way towards running multiple
parallel monitor threads in the future.
2025-04-07 12:33:35 -04:00
Pierre Bourdon
efcf6815d9 queue-runner: add prom metrics to allow detecting internal bottlenecks
By looking at the ratio of running vs. waiting for the dispatcher and
the queue monitor, we should get better visibility into what hydra is
currently bottlenecked on.

There are other side effects we can try to measure to get to the same
result, but having a simple way doesn't cost us much.
2025-04-07 12:33:35 -04:00
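The ratio mentioned here amounts to simple arithmetic over two counters. The function and parameter names below are hypothetical and do not correspond to actual Hydra metric names.

```python
def busy_fraction(running_seconds: float, waiting_seconds: float) -> float:
    """Share of time a component spent running rather than waiting."""
    total = running_seconds + waiting_seconds
    if total <= 0:
        return 0.0
    return running_seconds / total
```

A component whose fraction stays close to 1.0 is rarely idle, which makes it a likely bottleneck candidate.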
Pierre Bourdon
1e2d3211d9 queue-runner: limit parallelism of CPU intensive operations
My current theory is that running more parallel xz than available CPU
cores is reducing our overall throughput by requiring more scheduling
overhead and more cache thrashing.
2025-04-07 12:33:35 -04:00
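Capping concurrent compression at the number of CPU cores can be sketched with a semaphore. This is a Python illustration, not the queue runner's implementation; the `CompressionLimiter` class is hypothetical, and the `peak` counter exists only to make the limit observable.

```python
import lzma
import os
import threading
from typing import Optional


class CompressionLimiter:
    """Allow at most `limit` xz compressions to run at the same time."""

    def __init__(self, limit: Optional[int] = None) -> None:
        self.limit = limit or os.cpu_count() or 1
        self._slots = threading.BoundedSemaphore(self.limit)
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0  # highest concurrency observed so far

    def compress(self, data: bytes) -> bytes:
        with self._slots:  # blocks while all slots are in use
            with self._lock:
                self._active += 1
                self.peak = max(self.peak, self._active)
            try:
                return lzma.compress(data)  # default container format is .xz
            finally:
                with self._lock:
                    self._active -= 1
```

Callers can submit work from as many threads as they like; extra requests wait for a slot instead of competing for the same cores.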
Pierre Bourdon
5a9985f96c web: Skip System on /machines
It is redundant
2025-04-07 12:33:35 -04:00
Maximilian Bosch
6133693097 readIntoSocket: fix with store URIs containing an &
The third argument to `open()` in `-|` mode is passed to a shell if it's
a string. In my case the store URI contains
`?secret-key=${signingKey.directory}/secret&compression=zstd`

For the `nix store cat` case this means that

* until `&` the process will be started in the background. This fails
  immediately because no path to cat is specified.
* `compression=zstd` is a variable assignment
* the `$path` argument to `store cat` is attempted to be executed as
  another command

Passing just the list solves the problem.

(cherry picked from commit 3ee51dbe589458cc54ff753317bbc6db530bddc0)
2025-04-07 11:59:49 -04:00
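The same distinction exists in other languages. As an analogy in Python (the commit itself concerns Perl's `open()`), passing the command as a list to `subprocess.run` starts the process without a shell, so `&` inside an argument is delivered verbatim. The store URI below is a made-up example value.

```python
import subprocess
import sys


def echo_argument(argument: str) -> str:
    """Run a child process with an argument list; no shell parses the argument."""
    command = [sys.executable, "-c", "import sys; print(sys.argv[1])", argument]
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout.rstrip("\n")


# Hypothetical store URI containing '&', similar in shape to the one above.
store_uri = "s3://example-cache?secret-key=/run/keys/secret&compression=zstd"
received = echo_argument(store_uri)
```

Had the command been joined into one string and handed to a shell, the `&` would have split it into separate commands, as the bullet points above describe.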
git@71rd.net
abe35881e4 Stream files from store instead of buffering them
When an artifact is requested from hydra the output is first copied
from the nix store into memory and then sent as a response, delaying
the download and taking up significant amounts of memory.

As reported in https://github.com/NixOS/hydra/issues/1357

Instead of calling a command and blocking while reading in the entire
output, this adds read_into_socket(). The function takes a
command, starts a subprocess with that command, and returns a file
descriptor attached to stdout.
This file descriptor is then used by the response builder of Catalyst
to stream the output directly.

(cherry picked from commit 459aa0a5983a0bd546399c08231468d6e9282f54)
2025-04-07 11:59:49 -04:00
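The difference between buffering and streaming can be sketched with a generator that forwards a subprocess's stdout in chunks. This is a Python illustration of the idea, not the Perl `read_into_socket()` from the commit; `stream_command` and the chunk size are hypothetical.

```python
import subprocess
import sys
from typing import Iterator, List


def stream_command(command: List[str], chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield the command's stdout piece by piece instead of reading it all."""
    with subprocess.Popen(command, stdout=subprocess.PIPE) as process:
        assert process.stdout is not None
        while True:
            chunk = process.stdout.read(chunk_size)
            if not chunk:  # empty read means the child closed stdout
                break
            yield chunk


# Stand-in for a store read: a child process that emits 200000 bytes.
demo_command = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 200000)"]
```

A web framework can send each chunk to the client as it arrives, so memory use is bounded by the chunk size rather than by the artifact size.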
ajs124
99359c251a lazy-load evaluation errors
Closes #1362
2025-04-07 11:54:47 -04:00
Maximilian Bosch
9d8f30affe Only show stepname if it doesn't equal the name of the drv
When building e.g. nixpkgs, the "Running builds" view will mostly look
like this

    hello.x86_64-linux (Build of hello-X.Y)
    exa.x86_64-linux (Build of exa-X.Y)
    ...

This doesn't provide any useful information. Showing the step name only
makes sense if it's not a child of the job's derivation. With this
patch, that information will only be shown if the drv name (i.e. w/o
`/nix/store/` prefix, .drv ext & hash) is not equal to the drv name of
the job itself (build.nixname).
2025-04-07 11:54:47 -04:00
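The comparison described above can be sketched as follows. The actual change is in Hydra's Perl and template code; the function names here are hypothetical, and the sketch relies on the same assumptions the commits state: the `/nix/store/` prefix is used and the hash never contains a `-`.

```python
STORE_PREFIX = "/nix/store/"


def drv_name(drvpath: str) -> str:
    """Strip the store prefix, the hash and the .drv extension from a drv path."""
    base = drvpath[len(STORE_PREFIX):] if drvpath.startswith(STORE_PREFIX) else drvpath
    # The hash is everything before the first '-'; the rest is the name.
    _hash, _sep, name = base.partition("-")
    if name.endswith(".drv"):
        name = name[: -len(".drv")]
    return name


def show_step_name(step_drvpath: str, build_nixname: str) -> bool:
    """Show the step name only when it differs from the job's own drv name."""
    return drv_name(step_drvpath) != build_nixname


# Made-up 32-character hash used only for this example.
EXAMPLE_HASH = "0123456789abcdfghijklmnpqrsvwxyz"
hello_drv = STORE_PREFIX + EXAMPLE_HASH + "-hello-2.12.drv"
kernel_drv = STORE_PREFIX + EXAMPLE_HASH + "-linux-6.6.drv"
```

For a job whose `build.nixname` is `hello-2.12`, the step building `hello_drv` is not labelled, while a dependency step such as `kernel_drv` is.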
Maximilian Bosch
33b982f408 Running builds view: show build step names
When using Hydra to build machine configurations, you'll often see
"nixosConfigurations.foo" five times, i.e. for each build step being
run. This isn't very helpful I think because in such a case, a single
build step can also be compiling the Linux kernel.

This change also fetches the `drvpath` and `type` from the `buildsteps`
relation. We're already joining it, so this doesn't make much difference
(confirmed via query logging that this doesn't cause extra SQL queries).

Unfortunately build steps don't have a human readable name, so I'm
deriving it from the drvpath by stripping away the hash (assuming that
it'll never contain a `-` and that `/nix/store/` is used as prefix). I
decided against using the Nix bindings for that to avoid too much
overhead due to store operations for each build step.
2025-04-07 11:54:47 -04:00