Repeated DNS resolution failures

Recently (about 1-2 weeks ago) our builds started to fail randomly due to name resolution failures.
A few examples:
The failures have never recurred when the same builds were rebuilt.

As far as we are aware, the DNS configuration of these domains has not changed, and we see no reachability issues from elsewhere, although the domains are used less frequently from elsewhere than from Bitrise builds.

Hey @koral

Do you see any network issues in the other steps?

No, in the failed builds these are the only issues.

We will check whether the source of this issue is on our side. These VMs are hosted on Google Compute Engine with the standard DNS configuration, so there is only a very small probability that the issue comes from us. Thank you for the report :slight_smile:

OK, I’ve also set up an external DNS reachability monitor. It checks every minute, so if the issue occurs again we’ll know whether it happened only in the builds or globally.

Thank you very much, please let us know about the results :slight_smile:

Issue appeared again on 2017-10-27T18:28:50Z (+20/-0 s).
Here is the build log:
That domain has not been monitored; however, a second domain, handled by the same DNS servers but pointing to a different IP, is monitored, and that one has 100% uptime.

Additionally, there was another issue at about 2017-10-27T18:33:00Z (+60/-0 min).
In this build:
That build was aborted due to a timeout (75 min), but it usually takes 20-25 min.

All of this indicates that something went wrong on the build platform's side.

Thanks for the feedback! I made an internal note to check this.

Is it always the same URL that fails? I'm asking because we did not see any DNS resolution issues; in fact, none of our continuous “control” builds had one.

If it’s always the same URL that fails, then it’s more likely that the issue is either on that service’s side or somewhere in between (e.g. at the domain / DNS server level), but not on the build workers.

There are 2 domains: one serving a REST API with several URLs, and one serving a Cisco VPN.

It seems that the services are not even reached, because the domains cannot be resolved from Bitrise.

What do you suggest then? Changing the DNS server to something else, e.g. CloudFlare?

Hard to say. Keep in mind that these build VMs are always clean, in every build. A blip or temporary DNS issue might not affect your monitoring service, because it has already resolved the IP, but it might affect an environment that has never resolved it. AFAIK this can happen, for example, if the DNS-to-IP refresh frequency is too low. But again, it’s quite hard to say, as we can’t reproduce this issue :confused:

Issue reappeared today (random order):

A few minutes after the latest occurrence I could not reproduce it:

Can you add a retry around the call in the related Script step? We don’t see any issue in any of our tests, and we have no other reports either. It would help a lot if you could add a retry there and let us know whether it helps.
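
For anyone looking for a starting point, a retry wrapper for a Script step could look roughly like the sketch below. The `retry` helper, the attempt count, the delay, and the `api.example.com` URL are all illustrative assumptions, not something from the affected builds.

```shell
#!/usr/bin/env bash
# retry <max_attempts> <delay_seconds> <command...>
# Runs the command until it succeeds or the attempts are exhausted.
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "failed after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed, retrying in ${delay}s: $*" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Hypothetical usage; api.example.com stands in for the affected domain.
# retry 5 10 curl --fail --silent --show-error https://api.example.com/health
```

If the call succeeds on a later attempt, the log lines on stderr show how many attempts failed first, which is useful evidence when the failure is intermittent.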

OK, I’ve added steps like this after the affected ones:

- script:
    is_always_run: true
    run_if: .IsBuildFailed
    inputs:
    - content: |-
        #!/usr/bin/env bash
        # Diagnostic only: look up the host that failed in the previous step.
        host <host from previous step>
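
The diagnostic step above only asks the VM's default resolver. A possible extension, sketched here under assumptions, is to ask one or more public resolvers for the same name, so the log shows whether the failure is specific to the default resolver. The function names, the `api.example.com` hostname, and the resolver IPs are placeholders, and the sketch assumes `dig` is installed on the build VM.

```shell
#!/usr/bin/env bash
# Ask one resolver for a hostname; an empty server means the system default.
resolve_via() {
  local name="$1" server="$2" answer
  # dig exits non-zero when no server replies; +short prints only the records.
  if answer="$(dig +short +time=2 +tries=1 "$name" ${server:+"@$server"})" \
      && [ -n "$answer" ]; then
    answer=${answer//$'\n'/ }   # join multiple records onto one line
    echo "ok ${server:-default} ${answer}"
  else
    echo "fail ${server:-default}"
    return 1
  fi
}

# Try the default resolver first, then each resolver passed as an argument.
compare_resolvers() {
  local name="$1" server status=0
  shift
  for server in "" "$@"; do
    resolve_via "$name" "$server" || status=1
  done
  return "$status"
}

# Hypothetical usage; the hostname and resolver IPs are placeholders.
# compare_resolvers api.example.com 8.8.8.8 9.9.9.9
```

A `fail default` line next to `ok` lines for the public resolvers would point at the default resolver, while failures across all of them would point at the domain's own DNS servers or the network path.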

Let us know how it goes! :wink: