GitLab webhook calling multiple times

gitlab
#1

I don’t know whether GitLab is calling the callback multiple times or Bitrise is starting multiple builds for one call. I created the webhook configuration from the Bitrise/GitLab integration, and my trigger is set only for the develop branch. It began happening 2 weeks ago. Every push to the develop branch starts 4 or 5 builds. Is anyone else having this issue?

#2

Hi @jera,

Thanks for reporting this here! :slight_smile:

We did receive reports from a few other GitLab users describing pretty much the same thing: sometimes a push triggers multiple builds, but not always.

This is what we know so far:

  • Bitrise starts a single build for every webhook it receives. If GitLab sends 3 webhooks for the same event then 3 builds will be triggered. One webhook can’t start more than one build.
  • The issue started to happen a couple of weeks ago.

What we think happened (but we still have to confirm this) is that GitLab changed something related to webhook sending. It seems they have had webhook retries for a while and probably lowered the timeout that triggers a retry. Or maybe they didn’t have retries before.

What is a retry? Related docs: https://gitlab.com/help/user/project/integrations/webhooks.md#webhook-endpoint-tips

Basically, when GitLab sends a webhook it expects a response within X seconds. If it does not receive one, it’ll retry the webhook.

From their docs, the one linked above:

Your endpoint should send its HTTP response as fast as possible. If you wait too long, GitLab may decide the hook failed and retry it.

Now, the question is how many seconds “as fast as possible” is. Depending on the complexity of the build trigger it might take a couple of seconds for our webhook handler to respond.

We suspect this is the cause of the issue: the time limit behind “as fast as possible” was lowered recently.

What we can do

We’ll investigate the issue more thoroughly next week, but so far the only solution seems to be adding special handling to the webhook server, used only for GitLab webhooks, to filter out retries.

The technical challenge here is that most other git services don’t do retries, or if a service does, it includes some info in the retry that marks it as a retry. In the case of GitLab, we could not find any docs, or any info in the webhook itself, that would indicate that a given webhook is a retry.

This means that we’ll most likely have to implement retry detection ourselves: temporarily storing information about received GitLab webhooks in a database, and checking every incoming GitLab webhook against the database to see whether it is a retry of an earlier one.
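A minimal sketch of what such retry detection could look like, not the actual Bitrise implementation. It assumes that a retry carries a byte-identical request body, uses a SHA-256 hash of that body as the fingerprint, and keeps fingerprints in an in-memory dict instead of a database. The `RetryFilter` name, the 60-second window, and the injectable `clock` are all hypothetical choices made for illustration.

```python
import hashlib
import time


class RetryFilter:
    """Remembers recently seen webhook bodies and flags repeats (hypothetical helper)."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds  # assumed window in which a repeat counts as a retry
        self._clock = clock
        self._seen = {}  # fingerprint -> time it was first seen

    def is_retry(self, raw_body: bytes) -> bool:
        now = self._clock()
        # Drop fingerprints older than the window so the store stays small.
        self._seen = {
            fp: seen_at
            for fp, seen_at in self._seen.items()
            if now - seen_at < self.ttl_seconds
        }
        # Assumption: a retry has exactly the same body as the original webhook.
        fingerprint = hashlib.sha256(raw_body).hexdigest()
        if fingerprint in self._seen:
            return True
        self._seen[fingerprint] = now
        return False
```

A handler would call `is_retry(request_body)` before triggering a build and skip the trigger when it returns `True`. With several server instances, the dict would have to be replaced by a shared store, which is where the database mentioned above comes in.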

We’ll dig deeper next week, but if this is indeed the only solution then it might take a couple of days to implement the required changes, as no other service required this retry filtering/detection so far.

We’ll definitely find and deploy a solution for this issue next week.

#3

PR to fix this issue:

In short, we decided that in the case of GitLab we simply won’t wait for the build to actually start/trigger. As soon as the Trigger API call is sent to bitrise.io, the GitLab handler returns a {"did_not_wait_for_trigger_response":true,"success_responses":[]} response to GitLab.
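To illustrate the idea, here is a minimal sketch of a handler that does not wait for the trigger result. It is not the code from the PR: `handle_gitlab_webhook` and `trigger_build` are hypothetical names, and running the trigger call in a background thread is an assumption about how "not waiting" could be done. Only the response body is taken from the description above.

```python
import json
import threading


def handle_gitlab_webhook(payload, trigger_build):
    """Start the trigger call in the background and answer right away.

    `trigger_build` is a hypothetical callable standing in for the Trigger API call.
    """
    # Fire the trigger without blocking the webhook response on its result.
    worker = threading.Thread(target=trigger_build, args=(payload,), daemon=True)
    worker.start()
    # Response body as quoted in the post, returned before the trigger finishes.
    return json.dumps(
        {"did_not_wait_for_trigger_response": True, "success_responses": []},
        separators=(",", ":"),
    )
```

The trade-off is that the caller no longer learns whether the build actually started, and with a daemon thread an in-flight trigger would be lost if the process exits, so a real server would need its own logging or queueing for failures.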

As GitLab does not have any kind of webhook history on its UI, this matters very little, if at all: our trigger responses were never visible on the GitLab UI anyway.

#4

Status: the PR is ready, we’re testing it, and will deploy it into production ASAP.

#5

Fix deployed into production.

If you still see this issue happening, please let us know!
