
OMPI: Notified RMA ops Public API#9

Open
joe-explr wants to merge 76 commits into devreal:notified-rma from joe-explr:notified-rma-sm

joe-explr commented Nov 19, 2025

Summary of Changes:

  • Created new files put_with_notify.c.in and get_with_notify.c.in to enable a public API for put_with_notify and get_with_notify.
  • Added the OMPI_SPC_GET_WITH_NOTIFY and OMPI_SPC_PUT_WITH_NOTIFY enum values to track call counts.
  • Added 'MPI_ERR_NOTIFY_IDX' to report an invalid notification_idx value.
  • Edited mpi.h.in to add definitions for the new error code and the operation signatures.

@github-actions

Hello! The Git Commit Checker CI bot found a few problems with this PR:

8d7ea3b: Public APis to makefile

  • check_signed_off: does not contain a valid Signed-off-by line

0b6b3c3: Public APIs for put_withnotify. get_with_notify

  • check_signed_off: does not contain a valid Signed-off-by line

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

joe-explr force-pushed the notified-rma-sm branch 3 times, most recently from 5a9c010 to f79b94b, November 19, 2025 19:28
@github-actions

Hello! The Git Commit Checker CI bot found a few problems with this PR:

8ba738e: Edits

  • check_signed_off: does not contain a valid Signed-off-by line

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

github-actions bot commented Dec 3, 2025

Hello! The Git Commit Checker CI bot found a few problems with this PR:

dff9ea5: Public APIs for:

  • check_signed_off: does not contain a valid Signed-off-by line

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

devreal and others added 12 commits January 15, 2026 11:58
Address sanitizer helps us catch memory bugs even if they don't
manifest as faults right away. The instrumentation incurs
some overhead, so this is run on a reduced set of mpi4py runs.
Also tests `ompi_info` and `mpicc`.

Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
The PMIx_Fence_nb function can return PMIX_OPERATION_SUCCEEDED to indicate
that the function was executed atomically and the callback function will
therefore not be called. The PMIx Standard lists a few reasons why this
can happen, but the point here was to fix usage to properly handle
that possibility.

Signed-off-by: Ralph Castain <rhc@pmix.org>
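
The control flow described in the commit message above can be sketched with a mock. This is only an illustration of the pattern, not PMIx code: `toy_fence_nb`, `toy_progress`, and the `TOY_*` status codes are hypothetical stand-ins for `PMIx_Fence_nb` and `PMIX_OPERATION_SUCCEEDED`, and the single pending-callback slot is an assumption made to keep the example small.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status codes for the two outcomes the message describes. */
enum { TOY_SUCCESS = 0, TOY_OPERATION_SUCCEEDED = 1 };

typedef void (*toy_cbfunc_t)(int status, void *cbdata);

/* Callback recorded by the mock when the operation completes later. */
static toy_cbfunc_t pending_cb = NULL;
static void *pending_cbdata = NULL;

/* Mock non-blocking fence: either finishes inline (no callback will be
 * called) or records the callback to be fired by toy_progress(). */
static int toy_fence_nb(bool complete_inline, toy_cbfunc_t cb, void *cbdata)
{
    if (complete_inline) {
        return TOY_OPERATION_SUCCEEDED;
    }
    pending_cb = cb;
    pending_cbdata = cbdata;
    return TOY_SUCCESS;
}

/* Mock progress engine: fires the pending callback, if any. */
static void toy_progress(void)
{
    if (pending_cb != NULL) {
        toy_cbfunc_t cb = pending_cb;
        pending_cb = NULL;
        cb(TOY_SUCCESS, pending_cbdata);
    }
}

static void fence_done(int status, void *cbdata)
{
    (void)status;
    *(volatile bool *)cbdata = false;
}

/* Returns the number of progress calls needed, or -1 on error. */
static int toy_fence_and_wait(bool complete_inline)
{
    volatile bool active = true;
    int spins = 0;
    int rc = toy_fence_nb(complete_inline, fence_done, (void *)&active);

    if (rc == TOY_OPERATION_SUCCEEDED) {
        /* Completed atomically: the callback will never clear the flag,
         * so the caller has to do it or the loop below spins forever. */
        active = false;
    } else if (rc != TOY_SUCCESS) {
        return -1;
    }
    while (active) {
        toy_progress();
        ++spins;
    }
    return spins;
}
```

Without the `TOY_OPERATION_SUCCEEDED` branch, the inline-completion case would wait on a flag that nothing ever clears.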
Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
The variable will go out of scope and ASAN flags this in ompi_info.

Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
Without this patch I keep seeing this compiler warning when building with newer GCC versions.

shmem_put_nb.c:230:6: warning: no previous prototype for ‘shmemx_alltoall_global_nb’ [-Wmissing-prototypes]
 void shmemx_alltoall_global_nb(void *dest,
      ^~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
joe-explr and others added 5 commits February 3, 2026 09:20
This commit adds notification support to the OSC SM component by
implementing the put_with_notify, get_with_notify, rput_with_notify,
and rget_with_notify functions. These functions perform the same
operations as their non-notify counterparts but also increment
notification counters after the data transfer completes.

The changes include:
- Added function pointer types for notify variants in osc.h
- Added function prototypes in osc_sm.h
- Implemented the notify functions in osc_sm_comm.c
- Updated the module template to register the new functions
- Removed TODO comments that have been addressed

Signed-off-by: Joseph Antony <jajoseph.antony18@gmail.com>
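
As a rough illustration of the semantics described in this commit message (transfer the data, then increment a notification counter), here is a minimal single-process sketch. It is not the osc/sm code: `toy_window_t`, `toy_put_with_notify`, `TOY_NOTIFY_SLOTS`, and the return codes are hypothetical, and C11 atomics stand in for whatever synchronization the component actually uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_NOTIFY_SLOTS 4   /* hypothetical number of counters per window */
#define TOY_WIN_BYTES    64  /* hypothetical window size in bytes */

/* Hypothetical stand-in for a shared-memory window segment. */
typedef struct {
    unsigned char data[TOY_WIN_BYTES];
    atomic_uint_fast64_t notify[TOY_NOTIFY_SLOTS];
} toy_window_t;

static void toy_window_init(toy_window_t *win)
{
    memset(win->data, 0, sizeof win->data);
    for (int i = 0; i < TOY_NOTIFY_SLOTS; ++i) {
        atomic_init(&win->notify[i], 0);
    }
}

/* Copy len bytes into the window at offset disp, then increment the
 * counter selected by notify_idx. Returns 0 on success, -1 on bad
 * arguments (in which case nothing is written or counted). */
static int toy_put_with_notify(toy_window_t *win, const void *src, size_t len,
                               size_t disp, int notify_idx)
{
    if (notify_idx < 0 || notify_idx >= TOY_NOTIFY_SLOTS) {
        return -1;
    }
    if (disp > TOY_WIN_BYTES || len > TOY_WIN_BYTES - disp) {
        return -1;
    }
    memcpy(win->data + disp, src, len);
    /* Release ordering: a reader that sees the new count through an
     * acquire load also sees the bytes copied above. */
    atomic_fetch_add_explicit(&win->notify[notify_idx], 1,
                              memory_order_release);
    return 0;
}

static uint64_t toy_notify_count(toy_window_t *win, int notify_idx)
{
    assert(notify_idx >= 0 && notify_idx < TOY_NOTIFY_SLOTS);
    return (uint64_t)atomic_load_explicit(&win->notify[notify_idx],
                                          memory_order_acquire);
}
```

The ordering is the point of the sketch: the counter is bumped only after the copy, so a target polling the counter can treat a changed value as "data has arrived".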
Signed-off-by: Joseph Antony <jajoseph.antony18@gmail.com>
	put_with_notify
	get_with_notify

Signed-off-by: Joseph Antony <jajoseph.antony18@gmail.com>
            put_with_notify
            get_with_notify

    Signed-off-by: Joseph Antony <jajoseph.antony18@gmail.com>
…for a single and multi rank window.

Signed-off-by: Joseph Antony <jajoseph.antony18@gmail.com>
hppritcha and others added 13 commits February 26, 2026 10:20
…uce_topo

coll/acoll: Fix sbuf handling in reduce_topo
when fetching PMIX_GROUP_LOCAL_CID values from pmix server.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
Signed-off-by: Shachar Hasson <shasson@nvidia.com>
Signed-off-by: Thomas Vegas <tvegas@nvidia.com>
    Signed-off-by: Joseph Antony <jajoseph.antony18@gmail.com>
    Signed-off-by: Joseph Antony <jajoseph.antony18@gmail.com>
    Signed-off-by: Joseph Antony <jajoseph.antony18@gmail.com>
    Signed-off-by: Joseph Antony <jajoseph.antony18@gmail.com>
Signed-off-by: Orion Poplawski <orion@nwra.com>
It indicates whether any accelerator support has been built in, and whether we
need all the frameworks dealing with such devices.

Don't build the smcuda BTL, coll accelerator, SMSC accelerator,
rcache gpusm and rgpusm without accelerator support.

Signed-off-by: George Bosilca <gbosilca@nvidia.com>
PMIx returns success and does not set the output to NULL while looking
for an optional key. Thus, to prevent segfaults we need to set the
output value to a known value before the call.

Signed-off-by: George Bosilca <gbosilca@nvidia.com>
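
The defensive pattern from the commit message above, sketched against a mock. `toy_get_optional` and `toy_read_or_default` are hypothetical names; the mock only imitates the behaviour the message describes (success returned, output left untouched when the optional key is missing) and is not a PMIx call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static int known_value = 42;

/* Hypothetical lookup: always reports success (0), but writes *out only
 * when the key exists. For a missing optional key, *out is untouched. */
static int toy_get_optional(const char *key, int **out)
{
    if (strcmp(key, "present") == 0) {
        *out = &known_value;
    }
    return 0;
}

/* Caller side: give the output a known value before the call so that
 * "success with nothing written" can be told apart from a real result. */
static int toy_read_or_default(const char *key, int fallback)
{
    int *val = NULL;
    int rc = toy_get_optional(key, &val);

    if (rc != 0 || val == NULL) {
        return fallback;
    }
    return *val;
}
```

If `val` were left uninitialized, the missing-key path would dereference an indeterminate pointer, which is the kind of segfault the message refers to.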
Signed-off-by: George Bosilca <gbosilca@nvidia.com>
ompi_osc_base_module_free_fn_t osc_free;

ompi_osc_base_module_put_fn_t osc_put;
ompi_osc_base_module_put_notify_fn_t osc_put_notify;
Owner

Maybe we should move these to the end of the struct. That avoids creating gaps in the struct that the other osc components need to fill (for now). It's not just osc/ubcl, it's also osc/ucx, osc/rdma, and osc/portals
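
A small sketch of why appending helps, assuming components fill the module table with positional initializers (an assumption; the names `toy_module_t` and `toy_call_put_notify` are hypothetical and the real table has many more entries). Members that a partial initializer omits are zero-initialized, so a component that predates the new entry still builds with the new pointer set to NULL:

```c
#include <assert.h>
#include <stddef.h>

typedef int (*toy_fn_t)(void);

/* Hypothetical module table with the notify entry appended last. */
typedef struct {
    toy_fn_t free_fn;
    toy_fn_t put_fn;
    toy_fn_t get_fn;
    toy_fn_t put_notify_fn;  /* new entry, placed after the existing ones */
} toy_module_t;

static int toy_free(void) { return 1; }
static int toy_put(void)  { return 2; }
static int toy_get(void)  { return 3; }

/* A component written before put_notify_fn existed. The positional
 * initializer still lines up with the first three members, and the
 * trailing member is implicitly NULL. */
static const toy_module_t legacy_module = { toy_free, toy_put, toy_get };

/* Hypothetical dispatch helper: -1 means "not supported by this module". */
static int toy_call_put_notify(const toy_module_t *m)
{
    if (m->put_notify_fn == NULL) {
        return -1;
    }
    return m->put_notify_fn();
}
```

Had `put_notify_fn` been inserted right after `put_fn`, the same positional initializer would silently place `toy_get` in the notify slot, which is the gap each component would otherwise need to fill by hand.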

brminich and others added 16 commits March 9, 2026 10:06
OMPI/RTE: Modify job name printing to use thread local storage
…thout_accelerator

Avoiding building support for any accelerator-based modules without accelerators
 Signed-off-by: Joseph Antony <jajoseph.antony18@gmail.com>
The newly added code to support shared memory queries in osc/rdma
had the check for MPI_PROC_NULL too late, which caused an out-of-bounds
access into the peer array.

Signed-off-by: Joseph Schuchart <joseph.schuchart@stonybrook.edu>
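
The ordering issue described above, in miniature. This is a hypothetical sketch, not the osc/rdma code: `TOY_PROC_NULL`, `toy_peer_t`, and `toy_shared_query` are invented for illustration, and what the real function returns for the sentinel rank is outside the scope of the example.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PROC_NULL (-2)  /* hypothetical sentinel rank, not a valid index */

typedef struct {
    size_t size;
} toy_peer_t;

/* Returns 0 and fills *size for a valid rank, 1 if rank is the sentinel
 * (the caller then takes its separate PROC_NULL path), -1 otherwise. */
static int toy_shared_query(const toy_peer_t *peers, int npeers, int rank,
                            size_t *size)
{
    /* The sentinel check has to come before rank is used as an index;
     * checking it after peers[rank] has been read is already too late. */
    if (rank == TOY_PROC_NULL) {
        return 1;
    }
    if (rank < 0 || rank >= npeers) {
        return -1;
    }
    *size = peers[rank].size;
    return 0;
}
```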
Drop __opal_attribute_always_inline__ for mca_part_persist_start
…sions

Comm create from group: improve debug statements
…roc-null

osc/rdma: Fix handling of MPI_PROC_NULL in shared_query
Signed-off-by: Matthew Whitlock <mwhitlo@sandia.gov>
…ailed

pml/ob1: Return MPI_ERR_PROC_FAILED on unmatched I(m)probes when appropriate
remove btl-smcuda,rcache-gpusm,rcache-rgpusm from the list of components
that have to be compiled as dsos at all times. The components do not
contain any references/function calls to a GPU software stack anymore,
everything is based off the accelerator framework APIs.

Signed-off-by: Edgar Gabriel <Edgar.Gabriel@amd.com>
…smcuda

opal/mca: remove smcuda from dso list