First class mono repo support

If we have a mono repo with 50 apps, then a single pull request will trigger 50 jobs on bitrise, regardless of how many apps were actually changed in the PR.

As a workaround, every build job can inspect the git differences and determine if it should exit early.
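A minimal sketch of that kind of early-exit check, for readers who have not set one up. This is not the linked smart_build_utils.rb logic; the function names and the `origin/master` base branch are assumptions made for illustration.

```python
import subprocess
from pathlib import PurePosixPath


def changed_files(base: str, head: str = "HEAD") -> list[str]:
    """Return paths changed on head since it diverged from base."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...{head}"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def app_changed(app_dir: str, files: list[str]) -> bool:
    """True if any changed file lives under app_dir (e.g. "parent")."""
    # Strip slashes so "/parent/" and "parent" are treated the same.
    root = PurePosixPath(app_dir.strip("/"))
    return any(root in PurePosixPath(f).parents for f in files)
```

An early build step could call `app_changed("parent", changed_files("origin/master"))` and exit when it returns False. The build environment still has to be provisioned before this can run, which is the cost described in this request.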

It’d be better if the trigger logic for the job had first class support for change detection. For example, if I have a parent app, then there must be changes in the /parent/ folder for the job to run. Stopping the job from running in the first place is a huge performance win.

Here’s our current change detection logic:
https://github.com/instructure/instructure-android/blob/master/fastlane/utils/smart_build_utils.rb

Thanks for the #feature-request @bootstraponline!

So, just to be sure I understand this correctly, you’d like to register this repo multiple times as separate Bitrise projects/apps and only build the one which changed, right?

Question: is there a reason why you don’t want to register the repo only once, and just skip the parts you want to ignore (which did not change) during the build?

Yes.

[quote=“viktorbenei, post:2, topic:1993”]
Question: is there a reason why you don’t want to register the repo only once, and just skip the parts you want to ignore (which did not change) during the build?
[/quote]

I’m happy to do a screen share walk through of our setup. We tried that and the bitrise UI did not support this use case in a meaningful way. Have you tried adding 10+ different apps all with multiple unique workflows as a single job? It’s incredibly painful to understand the build results.

just skip the parts you want to ignore (which did not change) during the build?

This is what we’re doing. It’s quite slow. We have to wait for the build environment to be provisioned and the job must run to a certain part of the workflow to activate the git change detection. Also it’s not possible to differentiate between successful vs skipped jobs.

I think the main question then is whether we should improve the UI to be able to handle this as a single project, or should we improve it for the use case when you register the same repo as multiple projects.

To be honest, registering it as 50 separate projects seems more time consuming than registering it as one project and then managing the 50 apps under that app/project.

If the UI supported this better, would you prefer to have it registered as a single app/project on bitrise.io, or would you still want to register these as separate projects?

If it’d be a single project on bitrise.io you shouldn’t skip jobs/builds, right? Or would that still be a requirement?

I think it’d be easier to just show you how we’re using bitrise via screenshare. It’s difficult to talk about monorepos in a theoretical sense.

Agreed, let’s schedule a chat (please ping us through the onsite chat) and I’ll try to summarise the information here (it’s important to have it in written form, so that others, including bitrise devs, can also add to the discussion).

A very brief summary of the chat:


Project structure:

REPOROOT:
├── app1
├── app2
└── app3

All 3 apps have their own project on bitrise.io but are all in the same repository.

app1’s project should only start a build if there was any change under the “app1/” dir, app2’s project if a change under “app2/” dir etc.


Solution ideas:

  • A base level solution would be an Abort API endpoint which can be called from the build, to abort the build. Should support “abort with success”. Planned & scheduled for the API: Bitrise.io API v0.1 (Work In Progress)
  • A better solution would be to extend the trigger_map to allow filtering based on changed file paths not just branch/tag
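To make the second idea concrete, here is a rough sketch of how a path filter could be evaluated next to the branch filter. `push_branch` and `workflow` mirror existing trigger_map keys; the `changed_files` key and the `select_workflows` helper are assumptions for illustration, not syntax Bitrise supports in this discussion.

```python
from fnmatch import fnmatchcase

# Hypothetical trigger map: "changed_files" is an assumed extension.
TRIGGER_MAP = [
    {"push_branch": "*", "changed_files": "app1/*", "workflow": "app1"},
    {"push_branch": "*", "changed_files": "app2/*", "workflow": "app2"},
]


def select_workflows(trigger_map, branch, files):
    """Return workflows whose branch and path patterns both match."""
    selected = []
    for item in trigger_map:
        if not fnmatchcase(branch, item["push_branch"]):
            continue
        path_pattern = item.get("changed_files")
        # Items without a path filter keep the branch-only behaviour.
        if path_pattern is None or any(
            fnmatchcase(f, path_pattern) for f in files
        ):
            selected.append(item["workflow"])
    return selected
```

With this map, a push that only touches `app1/src/Main.kt` selects just the `app1` workflow, and a push that only touches `README.md` selects nothing. Returning every matching item is a simplification; with one bitrise.io project per app, each project would only evaluate its own item.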

I am in a monorepo environment.

A lot of changes are getting merged in concerning backend systems, often without any effect on the application we are deploying.

Would it be possible to disable the workflow, or at least large parts of the workflow, based on what files are changed? Our application resides in a theoretical “mobile_app” folder.


@awitherow moved your post here as this seems to be the right discussion / #feature-request

Thanks for the use case and information, please vote on this #feature-request if you’d want to see this implemented, and feel free to add a comment if you have any after reading through the discussion :slight_smile:

Update / status report: we did check the supported webhooks to see whether we have to add additional components/logic to get all the relevant info (time consuming) required to determine the file change list.

Unfortunately, having checked only GitHub and Bitbucket so far, only GitHub code push webhooks include the list directly. Bitbucket does not include the file change list in either code push or PR webhooks, and GitHub does not include it in PR webhooks.
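For the one case where the list is available, a small sketch of reading it from a GitHub push payload. It relies on the `commits` array of the push event, where each commit carries `added`, `modified` and `removed` path lists; the function name is an assumption, and for very large pushes the payload may not list every commit, so the result should be treated as possibly incomplete.

```python
def files_from_github_push(payload: dict) -> set[str]:
    """Collect every path listed in a GitHub push webhook payload."""
    files: set[str] = set()
    for commit in payload.get("commits", []):
        # Each commit lists the paths it added, modified and removed.
        for key in ("added", "modified", "removed"):
            files.update(commit.get(key, []))
    return files
```

Nothing comparable exists for the other webhook types discussed here, which is why an extra API call is needed for them.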

This means that to get the changed files list, a new component would have to be added to the existing system. It would receive the webhook but delay the decision to start the build (or not) until it has retrieved the file change list from the related service, and only then perform the trigger map check.

The issues with this approach are:

  • It might significantly increase the webhook processing time. The trigger map check can only be performed after bitrise.io performs an API call to the relevant service (GitHub, Bitbucket, …) to retrieve the file change list. If those APIs are slow for any reason, or even worse have issues (causing retries for the API calls) that can slow down the whole processing and build starting quite a bit.
  • It requires an additional API call, specific for the service, for every supported service and webhook type.
  • It requires API authentication, meaning the API call can only be performed if you connect your GitHub/Bitbucket/… account to your Bitrise.io account (which is not a requirement for the current webhook system, only for pushing back build status to the service, but builds can start without any connected account just by processing the webhook).
  • Bitrise.io can’t return a precise response for the webhook like it does right now, only after it could retrieve the file change list. If it takes too long (API call timeout) then this will reduce the debugging options (http://devcenter.bitrise.io/webhooks/troubleshooting/).

This is how the current system works:

GitHub/Bitbucket/... -> webhooks.bitrise.io -> bitrise.io (start build)

And this is roughly how it should work to support file change based filtering:

GitHub/Bitbucket/... -> webhooks.bitrise.io -> bitrise.io -> GitHub/Bitbucket/... (API) -> bitrise.io (start build)
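The extra hop can be sketched roughly as below. Everything here is hypothetical: `handle_webhook`, the `changed_files` payload field and the `fetch_changed_files` hook (standing in for the service-specific, authenticated API call) are illustrative names, not part of the existing webhook processor.

```python
from typing import Callable, Iterable, Optional


def handle_webhook(
    payload: dict,
    path_prefix: str,
    fetch_changed_files: Callable[[dict], Optional[Iterable[str]]],
) -> str:
    """Decide what to do with a webhook once the changed files are known.

    fetch_changed_files is a hypothetical hook for the service API call;
    it is assumed to return None on timeout or failure.
    """
    # Use the list from the webhook itself when the service provides one.
    files = payload.get("changed_files")
    if files is None:
        # Otherwise the decision has to wait for the extra API call.
        files = fetch_changed_files(payload)
    if files is None:
        return "unknown"
    if any(f.startswith(path_prefix) for f in files):
        return "start_build"
    return "skip"
```

The "unknown" result is the awkward case from the list above: the webhook has already been received, but no precise response can be given until the API call succeeds or times out.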

Don’t get me wrong, we don’t reject this #feature-request! It definitely can be done, but will take more time than we hoped. This is simply an update & summary of what we discovered so far.


Additionally just a question @awitherow & @bootstraponline, as this came up during the internal discussions: why don’t you use submodules for the semi-related projects? Just curious.

In our case (we do have mono repos):

  • If the parts are actually just semi-related (micro services) we have separate repos for the separate projects and a single repository which includes all parts as git submodules. This way it’s possible to get all components e.g. for local development, but the projects are still tested separately.
  • If the parts of the project are dependent on each other then we have a single repo for those and run the tests for every part on every change. E.g. if it’s a webservice and a related CLI project, if the webservice changes we want to make sure that the CLI can still communicate with it.

So my question would be, if you have the time of course: what do you do differently compared to the two use cases / setups above? What are we missing? Or rather, what’s the reason you don’t use git submodules, for example, to separate semi-related or unrelated projects?

Submodules are the opposite of mono repos. Updating them at scale for many different projects is super painful. From a maintainability perspective, they are not a viable option for the projects I work on. API cancelling of builds with success reporting would be a nice short term solution.

I suggest checking out the existing implementation of Selective Builds on a large project. From an end user perspective, the experience is incredible. One repo. Only changed code is built. File patterns for triggers are easy to understand, we only use one or two per app. I was able to seamlessly migrate my shell script which accomplished the same thing (git diffing / hard coded file paths / API cancelling).

I understand it’s a lot of work and introduces complexity into the system. I think the benefit is worth the cost, especially since it’s an opt-in feature and not on by default.

It’s less about the work, more about the added complexity mentioned in the status report above, but I definitely see your point. We’ll probably add the abort API with status for now, and try to think about a cleaner solution which preferably will provide proper troubleshooting options (e.g. at request timeouts when communicating with the APIs).

Sounds good. I’ll be happy as long as this gets added eventually. I don’t think there’s any rush.


I think the only reason we have not used git submodules is inexperience with this.

Honestly the things COULD be separate repositories, but legacy deciders decided for a monorepo and we’ve kind of just continued marching to the beat of that drum.

Do you have any recommended places to get started with transforming a mono repo into an effectively maintained submodule based mono repo?

Thanks!

@awitherow to be honest we kind of just grew into it, it wasn’t planned so I don’t really have a guide for it.

For us it started like this:

  1. First we had a single mono repo for the website (Ruby on Rails)
  2. Then we realized that some of the features should be separated, either for technical reasons (e.g. using Go instead of RoR) or team structure reasons (that feature belongs to a separate team)
  3. So we created a “root” repository, which only has some utility scripts and guides, and which includes all the other services as git submodules
  4. This way we can test, deploy etc. the services separately (by separate teams)
  5. But we can still have all the services e.g. for local dev just by cloning the main repo and its submodules

This way, for example, the team working on the build cache service can have everything running locally. They only change the build cache service’s code, but can test it as part of the whole service locally / in dev.

It’s not a simple setup, we had to get used to using git submodules (not a super simple thing), but right now this gives our teams the freedom to work separately, using the services & technologies they want to, and still have a main repo which makes it simple to get the whole service up and running relatively easily for local dev.

If you have any questions just let me know! :slight_smile:

Just an update and a question: we did discuss this a couple of times with the team and right now we’re thinking about something like this (the question is what you think about it):

We would provide an alternative webhook server as a premium feature, most likely for an additional fixed cost per app, which would include the required features to fetch the file change data from GitHub/Bitbucket/…

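With this, instead of the previously mentioned flow:

GitHub/Bitbucket/... -> webhooks.bitrise.io -> bitrise.io -> GitHub/Bitbucket/... (API) -> bitrise.io (start build)

we could simply do: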

GitHub/Bitbucket/... -> PREMIUM-WEBHOOKS.bitrise.io (-> call back to GitHub/Bitbucket/.. to fetch additional infos) -> bitrise.io (start build)

From an architectural point of view this seems to be the right solution, the question is whether you’d be OK with paying ~$10/mo/bitrise.io app for this feature.

What do you think @bootstraponline & @awitherow ? Would you pay ~$10/mo/bitrise.io app for this?


A bit more info about this solution:

  • The “premium webhook” server could be built on top of the existing open source bitrise-io/bitrise-webhooks project on GitHub (the Bitrise Webhooks processor), in an “open core” way, so the open source version could also be improved when/if required.
  • The interface for defining the file filters could remain the Trigger Map, no additional settings would be required, the Trigger Map syntax would be extended.
  • And overall pretty much every interface/troubleshooting feature could remain the same (e.g. the ones described in the Bitrise Docs), so the tooling could remain unified and you won’t have to learn about additional tools for e.g. debugging/troubleshooting.

I’d prefer this be part of the “elite tier.” The problem with billing “per app” is it isn’t per app, it’s per CI job. Mono repos mean we have an ever growing number of jobs. Having unpredictable billing as we use the service more doesn’t make sense. It’s similar to how GitHub used to charge per repo and now they charge per user. If the billing model limits use of the service then that’s a bit strange.

Buddybuild has the right model I think. Features are based on different tiers of concurrency.
