Builds being aborted due to reason "Lost connection to build agent"

Bitrise Build Issue Report template

Description of the issue

Some of our builds are getting aborted with the reason “Lost connection to build agent”.

Environment:

On Bitrise.io, Standard Machine, Android & Docker, on Ubuntu 20.04.
Using the bitriseio/android-20.04:pinned docker image.

Which build Step causes the issue and which version of the step?

The connection is normally lost during our Gradle build, which is the longest portion of the build.

Reproducibility

  • Does a “Rebuild” help? (You can trigger a rebuild from the Build’s page by clicking the “Rebuild” button in the top right corner of a finished build.): NO
  • Does a rebuild without caches help? (You can temporarily remove the Cache:Pull and Cache:Push steps so the cache is not used, or you can delete all the caches on the Settings tab of the app.): NO
  • Does the issue happen sporadically, or every time? : Sporadically
  • Does upgrading the build Step to the latest version help? : NO
  • When did the issue start? : Sept 25th is the first time we noticed this type of aborted build. We recently switched to a trial period of the new credit-based system and selected the “standard” machine as a start to our trial. Previously we had been using an Elite-level machine.

Local reproduction

Can it be reproduced on your own Mac/PC by following our local debug guide? Please follow at least the first section (“Testing with a full clean git clone”) to make sure you test the state of the code that bitrise.io will get when it does a git clone in the clean environment! If possible, please note which sections you tried.

I have not attempted local reproduction, because I’m unsure whether the “Lost connection to build agent” abort is something that can be reproduced locally.

Local reproduction: Linux / Android (docker based) stack builds

Can it be reproduced by running the build locally, after doing a new git clone of the repository into the /tmp directory and running the build from there with the Bitrise CLI? If not, can it be reproduced with Docker (using the same docker images / environment we use on bitrise.io)? Related guide: Redirecting…

Build log

Please copy and paste the build’s bitrise.io URL here (or, if the issue happens somewhere else, the full logs). If you can’t share the URL or log here, send the URL or full log through a private channel (e.g. email - Contact us), with a link to the related Discuss issue.

There is no interesting log output for this error, just the aborted status in Bitrise. I’d be happy to pull logs for any specific info that might be helpful.

We started using the Elite machines instead of Standard and this stopped happening. It seems strange that we are unable to rely on the slower machine to complete a build successfully.

We’ve been seeing the exact same issue. Not really sure what to do about it either as this is not happening locally at all.

Hello,

Sorry to hear this is happening, but we are on it! In the meantime, please add a script step to the beginning of your workflow that contains:

#!/usr/bin/env bash

# Run a privileged container, chroot into the host filesystem, and set
# oom_score_adj to -1000 for the Bitrise and Docker processes
# (-1000 exempts a process from the kernel OOM killer).
docker run --privileged -i -v /:/host/ ubuntu:focal chroot /host/ sudo bash -c '
  echo "Adjusting OOM scores"
  for i in $(pgrep bitrise) $(pgrep docker) $(pgrep bitrise-agent); do
    echo "PID: $i $(ps -p $i -o comm=)"
    echo -n "Old score: "; cat "/proc/${i}/oom_score"
    echo "-1000" > "/proc/${i}/oom_score_adj"
    echo -n "New score: "; cat "/proc/${i}/oom_score"
    echo ""
  done'
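
For anyone who wants to check what the script above changed, here is a minimal sketch that only reads the same /proc files it writes to. The function name show_oom is made up for illustration. Run directly in a script step, it sees only the processes visible inside the build container, not necessarily the host processes the script above targets, so treat the output as a rough check.

```shell
#!/usr/bin/env bash

# Print the OOM killer settings for one process (defaults to this shell).
# Only reads /proc, so no root is needed. show_oom is a hypothetical helper.
show_oom() {
  local pid="${1:-$$}"
  local adj score
  adj="$(cat "/proc/${pid}/oom_score_adj" 2>/dev/null || echo "n/a")"
  score="$(cat "/proc/${pid}/oom_score" 2>/dev/null || echo "n/a")"
  echo "PID ${pid}: oom_score_adj=${adj} oom_score=${score}"
}

# Show the values for the current shell.
show_oom
```

To inspect another process, pass its PID, for example show_oom 1234. A value of -1000 for oom_score_adj means the kernel OOM killer will not select that process.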

Please let us know if you continue to experience this issue after adding this.

THANKS!

Actually, please post feedback after adding the script to let us know if it resolved the issue or if it didn’t!

We want to confirm that there isn’t another reason that this may be occurring!

Tried the above solution you provided but builds are still being aborted.
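
Since the abort itself leaves nothing useful in the log, one way to gather evidence is to print available memory while Gradle runs. This is only a sketch, and it assumes that memory pressure on the Standard machine is the cause, which nothing in this thread has confirmed. The log_mem name, the 30 second interval, and the Gradle task in the comments are placeholders.

```shell
#!/usr/bin/env bash

# Print one timestamped line with the available memory from /proc/meminfo.
# log_mem is a hypothetical helper name.
log_mem() {
  local avail_kb
  avail_kb="$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo 2>/dev/null)"
  echo "[mem] $(date -u +%H:%M:%S) MemAvailable=${avail_kb:-n/a} kB"
}

# Hypothetical usage inside the same script that runs Gradle:
#   ( while true; do log_mem; sleep 30; done ) &
#   sampler_pid=$!
#   ./gradlew assembleDebug   # placeholder for your real Gradle task
#   kill "$sampler_pid"
log_mem
```

If the last few [mem] lines before an abort show MemAvailable dropping toward zero, that would support the memory explanation. If they stay flat, the cause is probably something else.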