CI: Added GHA CI workflow #303

Merged: jithunnair-amd merged 43 commits into master from leo/create-gha-workflow on Mar 11, 2026

Conversation

@leo-amd
Collaborator

@leo-amd leo-amd commented Feb 19, 2026

Testing:
Build job: https://github.com/ROCm/apex/actions/runs/22921037182/job/66569521660?pr=303
Test job: https://github.com/ROCm/apex/actions/runs/22921037182/job/66569521362?pr=303

Cherry-picked to release/1.8.0 branch via #317

Cherry-picked to release/1.9.0 branch via #318

Cherry-picked to release/1.10.0 branch via #319

@leo-amd leo-amd requested a review from jithunnair-amd March 10, 2026 15:00
@jithunnair-amd
Collaborator

Created a separate story to address:

@jithunnair-amd jithunnair-amd merged commit 4fe55b9 into master Mar 11, 2026
2 of 4 checks passed
@jithunnair-amd jithunnair-amd deleted the leo/create-gha-workflow branch March 11, 2026 18:28
@leo-amd leo-amd restored the leo/create-gha-workflow branch March 12, 2026 15:08
@leo-amd
Collaborator Author

leo-amd commented Mar 17, 2026

! cherry-pick --onto release/1.8.0

rocm-repo-management-api-6 bot pushed a commit that referenced this pull request Mar 17, 2026
* Added GHA CI workflow

* Change target branch

* Update naming

* ci: trigger actions

* Move the file

* Setup python env

* Use containers

* These k8s runners don't support native containers, therefore I am running containers in bash

* Typo

* Fix git dubious ownership

* Git fixes

* Typo

* Cmake change

* requirements.txt fix

* Clone in container

* Resolve latest PyTorch main SHA

* Rewrite from scratch

* Set rocm

* Add sanity check

* set -euxo pipefail

* typo

* Rewritten

* Fix tests

* Set large timeout for tests

* Split the steps

* Implement discussed features

* Fix tests

* Fix tests more

* Try tests

* Removed the HIP_VISIBLE_DEVICES code

* Lock the RCCL context

* Force the CPU to wait for the GPUs, and force all GPUs to wait for each other before any of them may reset the memory pool

* Revert

* Resolve comments

* Housekeeping

* Run CI

* Propagate import errors

* Extension tests fix

* Apply launch bounds unconditionally

* Define USE_ROCM during JIT compilation

* Revert some changes

* Resolve comments

* Fix typo
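
For readers following the "Resolve latest PyTorch main SHA" step above: the commit message does not say how the workflow resolves the SHA. One common approach, shown here only as a minimal sketch, is to ask the remote with `git ls-remote` and parse the result. The function names `parse_ls_remote` and `resolve_remote_sha` are hypothetical and are not taken from this PR.

```python
import subprocess


def parse_ls_remote(output: str, ref: str = "refs/heads/main") -> str:
    """Return the SHA that `git ls-remote` output lists for `ref`.

    Each output line has the form "<sha>\t<refname>".
    """
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == ref:
            return parts[0]
    raise ValueError(f"ref {ref!r} not found in ls-remote output")


def resolve_remote_sha(repo_url: str, ref: str = "refs/heads/main") -> str:
    """Query the remote (needs network access and git on PATH) for the SHA of `ref`."""
    result = subprocess.run(
        ["git", "ls-remote", repo_url, ref],
        check=True,           # raise if git exits with a non-zero status
        capture_output=True,
        text=True,
    )
    return parse_ls_remote(result.stdout, ref)
```

Calling `resolve_remote_sha("https://github.com/pytorch/pytorch")` would return the current commit on `main`. Pinning that value once per run keeps the build and test jobs on the same PyTorch commit, assuming that is the intent of the step.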
@rocm-repo-management-api-6

Created branch autogenerated/release/1.8.0_cherry-pick_pr-303 and #317

Comment processed by Build

@leo-amd
Collaborator Author

leo-amd commented Mar 18, 2026

! cherry-pick --onto release/1.9.0

rocm-repo-management-api-6 bot pushed a commit that referenced this pull request Mar 18, 2026
@rocm-repo-management-api-6

Created branch autogenerated/release/1.9.0_cherry-pick_pr-303 and #318

Comment processed by Build

@leo-amd
Collaborator Author

leo-amd commented Mar 18, 2026

! cherry-pick --onto release/1.10.0

rocm-repo-management-api-6 bot pushed a commit that referenced this pull request Mar 18, 2026
@rocm-repo-management-api-6

Created branch autogenerated/release/1.10.0_cherry-pick_pr-303 and #319

Comment processed by Build

@amd-sriram amd-sriram mentioned this pull request Mar 22, 2026
1 task

3 participants