Repeated DNS resolution failures

koral · October 4, 2017, 4:36pm

Recently (about 1-2 weeks ago) our builds on bitrise.io started to fail randomly due to failures in name resolution.
A few examples:
https://www.bitrise.io/build/8f8aeff445e4cf83
https://www.bitrise.io/build/6b5ea6aeb5beae07
https://www.bitrise.io/build/4950f573ba5a574c
https://www.bitrise.io/build/4b53b075fc8d97be
The issue has never recurred when a failed build was rerun.

As far as we are aware, the DNS configuration of these domains has not changed, and we see no reachability issues outside bitrise.io. However, the domains are accessed less frequently from elsewhere than from Bitrise builds.

gergelybekesi · October 5, 2017, 7:39am

Hey @koral

Do you see network issues in any of the other steps?

koral · October 5, 2017, 9:25am

No, in the failed builds these are the only issues.

gergelybekesi · October 5, 2017, 9:32am

We will check whether the source of this issue is on our side. These VMs are hosted on Google Compute Engine with the standard DNS configuration, so the probability that this issue comes from us is very small. Thank you for the report.

koral · October 5, 2017, 10:12am

OK, I’ve also set up an external DNS reachability monitor at port-monitor.com. It checks every minute, so if the issue occurs again we’ll know whether it affected only bitrise.io or was global.

gergelybekesi · October 5, 2017, 10:35am

Thank you very much, please let us know about the results.

koral · October 30, 2017, 1:21pm

The issue appeared again on 2017-10-27T18:28:50Z (+20/-0 s).
Here is the build log: https://www.bitrise.io/build/19b44839b2e52522
That domain was not monitored. However, a second domain, which is handled by the same DNS servers but points to a different IP, is monitored, and that one shows 100% uptime.

Additionally, there was another issue at about 2017-10-27T18:33:00Z (+60/-0 min).
In this build: https://www.bitrise.io/build/924f50f7c14e28ab
That build was aborted due to a timeout (75 min), although it usually takes 20-25 min.

All of this suggests that something went wrong on the bitrise.io side.

tamaspapik · October 30, 2017, 6:36pm

Thanks for the feedback! I made an internal note to check this.

viktorbenei · October 31, 2017, 11:43am

Is it always the same URL which fails? Asking because we did not see any DNS resolution issues; in fact, we had no DNS resolution issue in any of our continuous “control” builds.

If it’s always the same URL which fails then it’s more likely that the issue is either on that service’s side, or somewhere in-between (e.g. on domain / DNS server level), but not on bitrise.io / on the bitrise.io build workers.

koral · October 31, 2017, 2:24pm

There are two domains: one serves a REST API with several URLs, the other serves a Cisco VPN.

It seems that the services are not even reached, because the domains cannot be resolved from Bitrise.

What do you suggest then? Changing DNS server to something else, e.g. CloudFlare?

viktorbenei · October 31, 2017, 2:35pm

Hard to say. Keep in mind that these build VMs are always clean, in every build. That means a blip or temporary DNS issue might not affect your monitoring service, because it has already resolved the IP, but it might affect an environment which has never resolved the IP. AFAIK this might happen, for example, if the DNS-to-IP refresh frequency is too low. But again, it is quite hard to say, as we can’t reproduce this issue.
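One way to gather evidence for or against the clean-VM explanation above is to query the failing name against more than one resolver from inside the build and compare the answers. The following is only a sketch under stated assumptions: the function name `check_resolvers` is hypothetical, it relies on `dig` being installed on the build VM, and the resolver addresses are passed in by the caller.

```shell
#!/usr/bin/env bash
# check_resolvers NAME SERVER [SERVER...]
# Hypothetical helper: asks each DNS server for the A records of NAME and
# prints one OK/FAIL line per server. Returns 1 if any server gave no answer.
check_resolvers() {
  local name="$1"; shift
  local server answer flat failed=0
  for server in "$@"; do
    # +short prints only the answer records; +time and +tries keep an
    # unresponsive resolver from stalling the build for long.
    if answer="$(dig +short +time=2 +tries=1 "@${server}" "$name" A 2>/dev/null)" \
        && [ -n "$answer" ]; then
      # Join multiple records onto one line for easier reading in the log.
      flat="$(printf '%s' "$answer" | tr '\n' ' ')"
      echo "OK ${server}: ${flat}"
    else
      echo "FAIL ${server}: no answer for ${name}"
      failed=1
    fi
  done
  return "$failed"
}

# Example call (placeholder host name and example public resolvers):
# check_resolvers api.example.com 8.8.8.8 1.1.1.1
```

If only one resolver fails while the others answer, that points at the resolver rather than at the domain’s authoritative servers. If all of them fail at the same moment, the domain side becomes the more likely suspect.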

koral · November 8, 2017, 6:09pm

The issue reappeared today (builds listed in no particular order):
https://www.bitrise.io/build/7e8e639e221f780c
https://www.bitrise.io/build/7e8e639e221f780c
https://www.bitrise.io/build/f4a2695b29f671a1
https://www.bitrise.io/build/f8ff3e150cb92a0d

A few minutes after the latest occurrence I could not reproduce it:
https://www.bitrise.io/build/f8ff3e150cb92a0d

viktorbenei · November 8, 2017, 6:13pm

Can you add a retry around the call in the related Script step? We don’t see any issues in any of our tests, and we have no other reports either. It would help a lot if you could add a retry there and let us know whether it helps.
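A retry of the kind suggested here could look roughly like the sketch below. It is an illustration only: the function name `retry` and the variables `MAX_ATTEMPTS` and `RETRY_DELAY` are hypothetical names chosen for this example, not something Bitrise provides.

```shell
#!/usr/bin/env bash
# retry COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND up to MAX_ATTEMPTS times (default 3),
# sleeping RETRY_DELAY seconds (default 5) between failed attempts.
# Returns 0 on the first success, otherwise the last failure's exit code.
retry() {
  local max_attempts="${MAX_ATTEMPTS:-3}"
  local delay="${RETRY_DELAY:-5}"
  local attempt status=1
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    if "$@"; then
      return 0
    else
      status=$?
    fi
    echo "Attempt ${attempt}/${max_attempts} failed (exit ${status}): $*" >&2
    # Do not sleep after the final attempt.
    if ((attempt < max_attempts)); then
      sleep "$delay"
    fi
  done
  return "$status"
}

# Example call (placeholder host name):
# retry host api.example.com
```

Wrapping only the network call keeps the rest of the script unchanged, and the per-attempt log lines show whether a second or third attempt succeeded, which is the information asked for above.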

koral · November 8, 2017, 6:44pm

OK, I’ve added steps like this after the affected ones:

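- script:
    # Run even after an earlier step has failed, but only when the build is failing.
    is_always_run: true
    run_if: .IsBuildFailed
    inputs:
    - content: |-
        #!/usr/bin/env bash
        # Diagnostic lookup of the host used by the previous step.
        host <host from previous step>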

viktorbenei · November 9, 2017, 2:23pm

Let us know how it goes!
