hydra-queue-runner: Limit memory usage

When using a binary cache store, the queue runner receives NARs from
the build machines, compresses them, and uploads them to the
cache. However, keeping multiple large NARs in memory can cause the
queue runner to run out of memory. This can happen for instance when
it's processing multiple ISO images concurrently.

The fix is to use a TokenServer to prevent the builder threads from
storing more than a certain total size of NARs concurrently (at the
moment, this is hard-coded at 4 GiB). Builder threads that would cause
the limit to be exceeded block until other threads have finished.
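
The blocking behaviour described above can be sketched roughly as
follows. This is a minimal illustration, not Hydra's actual
TokenServer: the names TokenPool, Reservation and used() are
hypothetical, and admitting an oversized request when the pool is idle
is an assumption made here so that a single NAR larger than the limit
cannot block forever.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical sketch of a token pool; names are illustrative and not
// taken from Hydra's TokenServer.
class TokenPool {
    const std::size_t capacity;   // total tokens, e.g. bytes of NAR data
    std::size_t inUse = 0;        // tokens currently handed out
    std::mutex m;
    std::condition_variable cv;

public:
    explicit TokenPool(std::size_t capacity) : capacity(capacity) {}

    // RAII handle: the tokens are returned when the handle is destroyed.
    class Reservation {
        TokenPool * pool;
        std::size_t tokens;
        friend class TokenPool;
        Reservation(TokenPool * pool, std::size_t tokens)
            : pool(pool), tokens(tokens) {}
    public:
        Reservation(Reservation && other) noexcept
            : pool(other.pool), tokens(other.tokens) { other.pool = nullptr; }
        Reservation(const Reservation &) = delete;
        Reservation & operator=(const Reservation &) = delete;
        ~Reservation() { if (pool) pool->release(tokens); }
    };

    // Block until n tokens fit under the capacity. Assumption: a request
    // larger than the whole capacity is admitted once the pool is idle.
    Reservation get(std::size_t n) {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] {
            return inUse == 0 || (inUse <= capacity && n <= capacity - inUse);
        });
        inUse += n;
        return Reservation(this, n);
    }

    // Number of tokens currently reserved (for monitoring and tests).
    std::size_t used() {
        std::lock_guard<std::mutex> lock(m);
        return inUse;
    }

private:
    void release(std::size_t n) {
        {
            std::lock_guard<std::mutex> lock(m);
            assert(inUse >= n);
            inUse -= n;
        }
        // Wake all waiters; each re-checks whether its request now fits.
        cv.notify_all();
    }
};
```

A builder thread would hold the Reservation for as long as the NARs
are in memory, for example by keeping `auto r = pool.get(totalNarSize);`
in scope around the import. Tying the release to a destructor means the
tokens are also returned when an exception unwinds the thread.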

The 4 GiB limit does not include certain other allocations, such as
for xz compression or for FSAccessor::readFile(). But since these are
unlikely to be more than the size of the NARs and hydra.nixos.org has
32 GiB RAM, it should be fine.
Eelco Dolstra
2016-03-09 14:30:13 +01:00
parent 49a4639377
commit 9127f5bbc3
4 changed files with 84 additions and 27 deletions

@@ -283,16 +283,43 @@ void State::buildRemote(ref<Store> destStore,
/* Copy the output paths. */
if (/* machine->sshName != "localhost" */ true) {
printMsg(lvlDebug, format("copying outputs of %1% from %2%") % step->drvPath % machine->sshName);
MaintainCount mc(nrStepsCopyingFrom);
auto now1 = std::chrono::steady_clock::now();
PathSet outputs;
for (auto & output : step->drv.outputs)
outputs.insert(output.second.path);
MaintainCount mc(nrStepsCopyingFrom);
/* Query the size of the output paths. */
size_t totalNarSize = 0;
to << cmdQueryPathInfos << outputs;
to.flush();
while (true) {
if (readString(from) == "") break;
readString(from); // deriver
readStrings<PathSet>(from); // references
readLongLong(from); // download size
totalNarSize += readLongLong(from);
}
printMsg(lvlDebug, format("copying outputs of %s from %s (%d bytes)")
% step->drvPath % machine->sshName % totalNarSize);
/* Block until we have the required amount of memory
available. FIXME: only need this for binary cache
destination stores. */
auto resStart = std::chrono::steady_clock::now();
auto memoryReservation(memoryTokens.get(totalNarSize));
auto resStop = std::chrono::steady_clock::now();
auto resMs = std::chrono::duration_cast<std::chrono::milliseconds>(resStop - resStart).count();
if (resMs >= 1000)
printMsg(lvlError, format("warning: had to wait %d ms for %d memory tokens for %s")
% resMs % totalNarSize % step->drvPath);
result.accessor = destStore->getFSAccessor();
auto now1 = std::chrono::steady_clock::now();
to << cmdExportPaths << 0 << outputs;
to.flush();
destStore->importPaths(false, from, result.accessor);