Known issue with the M1 open beta: Simulator GPU hang

Description of the issue

There is a bug in Apple’s hypervisor framework that can cause issues in virtualized environments like ours. The issue can occur whenever a simulator is launched: for example, our Xcode Test for iOS Step uses an iOS simulator to run tests. Once the simulator starts, your build can hang indefinitely. The issue has already been reported to Apple, under the following ID: FB10015465.

In order to confirm that you are having this issue, we recommend running the following command:
sudo dmesg

You might see an error similar to this:

[10090.671982]: Trying to restart GPU (Undefined)...

[10090.671990]: stalling for detach from AppleParavirtGPU

[10090.674754]: virtual void IOGPUScheduler::progressTimerCheckInterrupt(): GPURestartEnd stamp[idx:1, 11366144] type=0

[10090.674799]: virtual void IOGPUScheduler::progressTimerCheckInterrupt(): command timeout detected - GPU start time 10006873 ms, now 10090674 ms, priority=0

[10090.674806]: virtual void IOGPUScheduler::progressTimerCheckInterrupt(): GPURestartBegin stamp[idx:1, 11366144] type=2

[10090.674864]: GPU hang: AppleParavirtGPU HardwareDiagnosisReport (BEGIN)
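
To avoid scanning the whole kernel log by eye, the output can be filtered for the hang signature. The following is only a minimal sketch: the function names `has_gpu_hang` and `gpu_lines` are hypothetical, and the patterns are taken from the sample log above, so they may need adjusting if your log lines differ.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads a kernel log on stdin and exits 0 if it
# contains one of the GPU hang markers seen in the sample log above.
has_gpu_hang() {
  grep -E -q 'GPU hang: AppleParavirtGPU|stalling for detach from AppleParavirtGPU'
}

# Hypothetical helper: reads a kernel log on stdin and prints only the
# GPU-related lines (prints nothing if there are none).
gpu_lines() {
  grep -E 'AppleParavirtGPU|IOGPUScheduler|restart GPU' || true
}

# Example usage on the build machine (assumes sudo is available):
#   sudo dmesg | has_gpu_hang && echo "GPU hang found in kernel log"
#   sudo dmesg | gpu_lines
```

Reading from stdin keeps the helpers independent of how the log is obtained, so they can also be run against a saved copy of the `sudo dmesg` output.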

If you encounter this issue, please vote here so we can better see its impact. Thank you!

How should I run sudo dmesg? In a Script step, or in Remote Access?

It is really hard to predict whether a build will hit the issue, so we don't know in advance which build to start with remote access in order to execute this command. Is there a way you used to execute sudo dmesg during the build?

I’ve seen a similar issue on the M1 beta. It doesn’t hang 100% of the time, but when it does, it’s after these two lines:

▸ Running script ‘[CP] Embed Pods Frameworks’
▸ Running script ‘[CP] Copy Pods Resources’

Since it’s pretty reliably reproducible for us, I was able to enable remote access and run sudo dmesg while it was hung. I did get a GPU hang as referenced above; here’s the full report if that’s helpful:

https://share.getcloudapp.com/o0uGx4wG

I have met the same problem with my own virtualization app on M1. The host system is 12.3. Did you get any feedback from Apple?

Hello,
This is a way to detect hangs related to the Xcode simulator by using the Start simulator Step to preboot the simulator:

- xcode-start-simulator:
    inputs:
    - destination: platform=iOS Simulator,name=iPhone 8,OS=latest
    - wait_for_boot_timeout: 90
# Optional workaround to restart the current build
- trigger-bitrise-workflow:
    is_always_run: true
    run_if: '{{enveq "BITRISE_SIMULATOR_STATUS" "hanged"}}'
    inputs:
    - api_token: <insert build trigger token here>
    - workflow_id: <insert workflow name here>

Please see the details about how to mitigate this issue with the xcode-start-simulator step. :rocket:
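
If you would rather react to the hang inside a Script step than with a `run_if` expression, the same status variable can be checked from shell. This is a sketch under assumptions: it relies only on `BITRISE_SIMULATOR_STATUS` being set to `hanged` as in the snippet above, and the function name `simulator_hung` is hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: exits 0 only when the Start simulator Step reported
# a hang. The value "hanged" is taken from the run_if condition above;
# an unset or different value is treated as "not hung".
simulator_hung() {
  [ "${BITRISE_SIMULATOR_STATUS:-}" = "hanged" ]
}

# Example usage in a Script step placed after xcode-start-simulator:
#   if simulator_hung; then
#     echo "Simulator boot hung, saving kernel log"
#     sudo dmesg > dmesg.txt
#   fi
```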

GPU-related issues were fixed in macOS Ventura beta, so there is a chance that this issue is fixed there. :slight_smile:

I tried upgrading the host system to the macOS Ventura beta and can still reproduce the issue. I have sent a new reply to the Apple feedback report. :cold_face:

Thanks for the link to the workaround. But can’t we simply use the BITRISE_TRIGGERED_WORKFLOW_ID variable? Then the steps could simply look like this, right?

  _ci_fix_m1_hang_bug_workaround:
    steps:
    - xcode-start-simulator:
        inputs:
        - destination: platform=iOS Simulator,name=iPhone 8,OS=iOS 15.0
        - wait_for_boot_timeout: 90
    - trigger-bitrise-workflow:
        is_always_run: true
        run_if: '{{enveq "BITRISE_SIMULATOR_STATUS" "hanged"}}'
        inputs:
        - api_token: $TRIGGER_TOKEN
        - workflow_id: $BITRISE_TRIGGERED_WORKFLOW_ID

Hi there. Is there any way to solve these GPU panic/restart issues when the freeze occurs not in the simulator but in Mac platform builds?

Hi,

We’ve just released the Xcode 14.0 Ventura-based stack, where the issue is resolved based on our in-house testing. Please try out the stack if you previously experienced the GPU hang on M1. The Xcode 14.1 Ventura-based stack will also be released in the next few days :rocket:

Also, enabling the no-output timeout for your workflows will abort hung builds, so you avoid waiting for hours.