
[linux-nvidia-6.17] Backport MPAM fixes and support for CPU-less NUMA nodes#348

Open
fyu1 wants to merge 49 commits into NVIDIA:24.04_linux-nvidia-6.17-next from
fyu1:24.04_linux-nvidia-6.17-next.mpam.extras.fixes2


@fyu1 fyu1 commented Mar 20, 2026

This PR replaces #328

This branch fixes several MPAM issues:

  1. Performance issue due to a small MBW_MIN on Grace: https://nvbugspro.nvidia.com/bug/5928376
  2. Performance issue due to a CMAX of 0 on Vera: https://nvbugspro.nvidia.com/bug/5717435
  3. Stress online/offline issue on Vera: https://nvbugspro.nvidia.com/bug/5919525
  4. Cleanup of the NUMA node MBA/MBM code to avoid future issues.

There are 49 patches in total:

  1. Patches 1-10 revert ARM's extra patches (the NUMA node, event filter, and memory hotplug patches). Those patches are buggy and cause most of the issues above.
  2. Patches 11 and 12 revert the old, buggy T241-MPAM-4 Grace erratum workaround and apply an updated one.
  3. Patches 13-42 come from upstream resctrl, mainly to align the monitoring types for the later NUMA patches.
  4. Patches 43-49 mainly add support for CPU-less NUMA nodes and fix IOMMU, MSC teardown, and MBWU type issues.

The patch list:
0001-Revert-NVIDIA-SAUCE-untested-arm_mpam-resctrl-Allow-.patch
0002-Revert-NVIDIA-SAUCE-arm_mpam-resctrl-Add-NUMA-node-n.patch
0003-Revert-NVIDIA-SAUCE-untested-arm_mpam-resctrl-Split-.patch
0004-Revert-NVIDIA-SAUCE-arm_mpam-resctrl-Change-domain_h.patch
0005-Revert-NVIDIA-SAUCE-arm_mpam-resctrl-Pick-whether-MB.patch
0006-Revert-NVIDIA-SAUCE-Fix-unused-variable-warning.patch
0007-Revert-NVIDIA-SAUCE-fs-resctrl-Add-mount-option-for-.patch
0008-Revert-NVIDIA-SAUCE-fs-resctrl-Take-memory-hotplug-l.patch
0009-Revert-NVIDIA-SAUCE-mm-memory_hotplug-Add-lockdep-as.patch
0010-Revert-NVIDIA-SAUCE-untested-arm_mpam-resctrl-Allow-.patch
0011-Revert-NVIDIA-SAUCE-arm_mpam-Add-workaround-for-T241.patch
0012-NVIDIA-SAUCE-arm_mpam-Add-workaround-for-T241-MPAM-4.patch
0013-x86-fs-resctrl-Improve-domain-type-checking.patch
0014-x86-resctrl-Move-L3-initialization-into-new-helper-f.patch
0015-x86-resctrl-Refactor-domain_remove_cpu_mon-ready-for.patch
0016-x86-resctrl-Clean-up-domain_remove_cpu_ctrl.patch
0017-x86-fs-resctrl-Refactor-domain-create-remove-using-s.patch
0018-fs-resctrl-Split-L3-dependent-parts-out-of-mon_eve.patch
0019-x86-fs-resctrl-Use-struct-rdt_domain_hdr-when-readin.patch
0020-x86-fs-resctrl-Rename-struct-rdt_mon_domain-and-rdt.patch
0021-x86-fs-resctrl-Rename-some-L3-specific-functions.patch
0022-fs-resctrl-Make-event-details-accessible-to-function.patch
0023-x86-fs-resctrl-Handle-events-that-can-be-read-from-a.patch
0024-x86-fs-resctrl-Support-binary-fixed-point-event-coun.patch
0025-x86-fs-resctrl-Add-an-architectural-hook-called-for-.patch
0026-x86-fs-resctrl-Add-and-initialize-a-resource-for-pac.patch
0027-fs-resctrl-Emphasize-that-L3-monitoring-resource-is-.patch
0028-x86-resctrl-Discover-hardware-telemetry-events.patch
0029-x86-fs-resctrl-Fill-in-details-of-events-for-perform.patch
0030-x86-fs-resctrl-Add-architectural-event-pointer.patch
0031-x86-resctrl-Find-and-enable-usable-telemetry-events.patch
0032-x86-resctrl-Read-telemetry-events.patch
0033-fs-resctrl-Refactor-mkdir_mondata_subdir.patch
0034-fs-resctrl-Refactor-rmdir_mondata_subdir_allrdtgrp.patch
0035-x86-fs-resctrl-Handle-domain-creation-deletion-for-R.patch
0036-x86-resctrl-Add-energy-perf-choices-to-rdt-boot-opti.patch
0037-x86-resctrl-Handle-number-of-RMIDs-supported-by-RDT.patch
0038-fs-resctrl-Move-allocation-free-of-closid_num_dirty_.patch
0039-x86-fs-resctrl-Compute-number-of-RMIDs-as-minimum-ac.patch
0040-fs-resctrl-Move-RMID-initialization-to-first-mount.patch
0041-x86-resctrl-Enable-RDT_RESOURCE_PERF_PKG.patch
0042-x86-fs-resctrl-Update-documentation-for-telemetry-ev.patch
0043-NVIDIA-VR-SAUCE-arm_mpam-Fix-compilation-errors.patch
0044-NVIDIA-SAUCE-arm_mpam-Avoid-MSC-teardown-for-the-SW-.patch
0045-NVIDIA-VR-SAUCE-arm_mpam-Handle-CPU-less-numa-nodes.patch
0046-NVIDIA-VR-SAUCE-arm_mpam-Include-all-associated-MSC-.patch
0047-NVIDIA-SAUCE-resctrl-mpam-reset-RIS-by-applying-expl.patch
0048-NVIDIA-SAUCE-iommu-arm-smmu-v3-Fix-MPAM-for-indentit.patch
0049-NVIDIA-VR-SAUCE-arm_mpam-Resolve-MBWU-type-before-fe.patch

Test results are at http://10.112.214.86/vera/tests/ and include:

  1. init registers test
  2. iommu assignment test
  3. online/offline test
  4. Spec2017 performance test
  5. CXL test

The GPU MPAM test is not covered because there is no SBIOS support for the feature yet.


LP: https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.17/+bug/2146389


nirmoy commented Mar 23, 2026

These patches don't match the upstream commits:
8f62caa: x86/resctrl: Add energy/perf choices to rdt boot option
d0f8995: fs/resctrl: Move RMID initialization to first mount
For example:

 git range-diff 8f62caa8be62~1..8f62caa8be62 842e7f97d71a~1..842e7f97d71a
      ## Documentation/admin-guide/kernel-parameters.txt ##
    -@@
    +@@ Documentation/admin-guide/kernel-parameters.txt: Kernel parameters
    +   rdt=            [HW,X86,RDT]
                        Turn on/off individual RDT features. List is:
                        cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
    -                   mba, smba, bmec, abmc.
    +-                  mba, smba, bmec, abmc, sdciae.
     +                  mba, smba, bmec, abmc, sdciae, energy[:guid],
     +                  perf[:guid].
                        E.g. to turn on cmt and turn off mba use:

Please make sure to add a comment if an upstream cherry-pick needed conflict fixes.
Otherwise the series looks fine. Tested with an older SBIOS:

sudo /tmp/mpam-ok
[12:37:29] ============================================
[12:37:29]   MPAM Feature Validation (mpam-ok)
[12:37:29] ============================================
[12:37:29] Kernel:  6.17.9+
[12:37:29] Arch:    aarch64
[12:37:29] Date:    Mon Mar 23 12:37:29 PM UTC 2026
[12:37:29]
[12:37:29] Building memory workload ...
[12:37:29] Workload: /tmp/mpam_wl_e285ki (512 MB, 10s)
[12:37:29] INFO: STREAM binary not provided (-s); STREAM-based BW checks will be skipped
[12:37:29] INFO: MBA throttle test will rely on MBM counters only
[12:37:29] INFO: Install a STREAM benchmark binary and pass -s /path/to/stream for BW test
[12:37:29]
[12:37:29] --- Test: MPAM kernel support ---
[12:37:29] PASS  MPAM enabled: MPAM enabled with 47 PARTIDs and 2 PMGs
[12:37:29] --- Test: resctrl filesystem ---
[12:37:29] PASS  resctrl mounted successfully
[12:37:29] --- Test: NUMA topology ---
[12:37:29] PASS  NUMA: 2 node(s), 0 CPU-less, 0 CXL
[12:37:29] --- Test: resctrl resource info ---
[12:37:29] PASS  Resource: L3
[12:37:29] PASS  Resource: L3_MAX
[12:37:30] PASS  Resource: L3_MON
[12:37:30] PASS  Resource: MB_MON
[12:37:30] --- Test: schemata entries ---
[12:37:30] PASS  L3 allocation: 2 domain(s)
[12:37:30] FAIL  MBA allocation missing (monitoring exists with 2 domain(s) -- NUMA-based MSC support likely incomplete)
[12:37:30] --- Test: resctrl partition ---
[12:37:30] PASS  Created partition 'mpam_ok_18427', assigned PID 18427
[12:37:30] --- Test: monitor directories ---
[12:37:30] PASS  MB monitor directories: 2
[12:37:30] PASS  L3 monitor directories: 2
[12:37:30] --- Test: monitoring counters ---
[12:37:30] PASS  MBM readable: domain 0 (Unassigned bytes)
[12:37:30] PASS  MBM readable: domain 1 (Unassigned bytes)
[12:37:30] PASS  L3 occupancy readable (0 bytes)
[12:37:30] --- Test: MPAM schemata defaults (regression check, bugs 5717435/5928376) ---
[12:37:30] PASS  L3_MAX defaults safe: L3_MAX:1=100;2=100
[12:37:30] --- Test: MBA schemata write/readback ---
[12:37:30] SKIP  MBA schemata (no allocation domains)
[12:37:30] --- Test: MBM traffic detection ---
[12:37:30] --- Cleanup ---

@fyu1 fyu1 force-pushed the 24.04_linux-nvidia-6.17-next.mpam.extras.fixes2 branch 2 times, most recently from 96cc0e5 to 31c434f Compare March 24, 2026 02:03

fyu1 commented Mar 24, 2026

Thank you very much for your comments! All comments have been addressed. Please review the new patches (in the same branch).


clsotog commented Mar 24, 2026

I have not been able to finish the review, but I have a question about this commit:
d62e19f NVIDIA: SAUCE: arm_mpam: Add workaround for T241-MPAM-4
In 6.14 we took this patch from the morse tree, but now it says
backported from 02f5cf363057ceddd099313d1e43636fdcf3d47c dev/dev-main-nvidia-pset-linux-6.19.6
How will Canonical see this backport when doing the review?

@fyu1 fyu1 force-pushed the 24.04_linux-nvidia-6.17-next.mpam.extras.fixes2 branch from 31c434f to 7718a84 Compare March 24, 2026 02:53

fyu1 commented Mar 24, 2026

> I have not been able to finish the review but I have a question with this commit d62e19f NVIDIA: SAUCE: arm_mpam: Add workaround for T241-MPAM-4 At 6.14 we took this patch from morse tree but now it says backported from 02f5cf363057ceddd099313d1e43636fdcf3d47c dev/dev-main-nvidia-pset-linux-6.19.6 but how canonical will see this backport when doing the review?

Matt told me that the line [backported from ... pset_branch] is only for internal info. External people cannot see pset.


nvmochs commented Mar 24, 2026

> I have not been able to finish the review but I have a question with this commit d62e19f NVIDIA: SAUCE: arm_mpam: Add workaround for T241-MPAM-4 At 6.14 we took this patch from morse tree but now it says backported from 02f5cf363057ceddd099313d1e43636fdcf3d47c dev/dev-main-nvidia-pset-linux-6.19.6 but how canonical will see this backport when doing the review?
>
> Matt told me that the line [backported from ... pset_branch] is only for internal info. External people cannot see pset.

We just need a way to identify the patch provenance. If the patch is already in a public location, we should pick from there.

@fyu1 I see this patch is on LKML (https://lore.kernel.org/all/20260313144617.3420416-38-ben.horgan@arm.com/) but differs a bit. Is the reason you didn't pick the LKML version due to the base set of MPAM patches we carry in linux-nvidia-6.17 being based on an older revision of the series?


nvmochs commented Mar 24, 2026

@fyu1

611616a NVIDIA: VR: SAUCE: arm_mpam: Fix compilation errors

Nit: The change for resctrl_arch_rmid_read() is doing more than what is described in the commit message (changing number of parameters and parameter data types). Is that intended? (it looks like it’s trying to match the prototype but want to double check)

Similarly, mpam_resctrl_monitor_init() has more than just a name change.


fyu1 commented Mar 24, 2026

> I have not been able to finish the review but I have a question with this commit d62e19f NVIDIA: SAUCE: arm_mpam: Add workaround for T241-MPAM-4 At 6.14 we took this patch from morse tree but now it says backported from 02f5cf363057ceddd099313d1e43636fdcf3d47c dev/dev-main-nvidia-pset-linux-6.19.6 but how canonical will see this backport when doing the review?
>
> Matt told me that the line [backported from ... pset_branch] is only for internal info. External people cannot see pset.
>
> We just need a way to identify the patch provenance. If the patch is already in a public location, we should pick from there.
>
> @fyu1 I see this patch is on LKML (https://lore.kernel.org/all/20260313144617.3420416-38-ben.horgan@arm.com/) but differs a bit. Is the reason you didn't pick the LKML version due to the base set of MPAM patches we carry in linux-nvidia-6.17 being based on an older revision of the series?

Hi, Matt,

They are the same patch with minor changes. This line needed to change to fit 6.17:

  •   .iidr       = IIDR_PROD(0x241) | IIDR_VAR(0) | IIDR_REV(0) | IIDR_IMP(0x36b),
    

Now I backported the T241-MPAM-4 workaround patch from Ben's branch: https://gitlab.arm.com/linux-arm/linux-bh/-/commit/de0a00982d0aefb3d94828e908179aca02feaa85

Please check if the backported patch is good.

BTW, this workaround is only for Grace. Vera doesn't have the MBW_MIN feature and doesn't need this workaround to function.

@fyu1 fyu1 force-pushed the 24.04_linux-nvidia-6.17-next.mpam.extras.fixes2 branch from 7718a84 to 0be0368 Compare March 24, 2026 06:56

fyu1 commented Mar 24, 2026

> @fyu1
>
> 611616a NVIDIA: VR: SAUCE: arm_mpam: Fix compilation errors
>
> Nit: The change for resctrl_arch_rmid_read() is doing more than what is described in the commit message (changing number of parameters and parameter data types). Is that intended? (it looks like it’s trying to match the prototype but want to double check)
>
> Similarly, mpam_resctrl_monitor_init() has more than just a name change.

Fixed. Added the detailed changes to the commit message.


clsotog commented Mar 24, 2026

> I have not been able to finish the review but I have a question with this commit d62e19f NVIDIA: SAUCE: arm_mpam: Add workaround for T241-MPAM-4 At 6.14 we took this patch from morse tree but now it says backported from 02f5cf363057ceddd099313d1e43636fdcf3d47c dev/dev-main-nvidia-pset-linux-6.19.6 but how canonical will see this backport when doing the review?
>
> Matt told me that the line [backported from ... pset_branch] is only for internal info. External people cannot see pset.

Some of the last patches also reference the pset 6.19 kernel, so what should we do with those?

@fyu1 fyu1 force-pushed the 24.04_linux-nvidia-6.17-next.mpam.extras.fixes2 branch from 0be0368 to 9dbadcd Compare March 24, 2026 15:06
@jamieNguyenNVIDIA

The following commit has two bodies and two sign-offs:
NVIDIA: VR: SAUCE: arm_mpam: Fix compilation errors to adapt to resctrl L3 domain and arch API updates

I believe you intended to remove this part:

Fix the following compilation errors:

1. Commit https://github.com/fyu1/NV-Kernels.fenghuay.baseos/commit/1e9e1305357e9cc033922b8c51217adb27f6d6cb ("x86,fs/resctrl: Rename struct rdt_mon_domain and
   rdt_hw_mon_domain") renames struct rdt_mon_domain to rdt_l3_mon_domain.
   Change the names in MPAM.
2. Implement empty resctrl arch API resctrl_arch_pre_mount(void) to make
   compilation succeed.

Fixes: https://github.com/fyu1/NV-Kernels.fenghuay.baseos/commit/a42549e64ce0aa7f72ec6fb47a8abd5ac6b428b8 ("NVIDIA: SAUCE: arm_mpam: resctrl: Add boilerplate cpuhp and domain allocation")
Fixes: https://github.com/fyu1/NV-Kernels.fenghuay.baseos/commit/ae2a29c5ebb8d3ab1e83319465237f1713083dec ("NVIDIA: SAUCE: arm_mpam: resctrl: Add support for csu counters")
Fixes: https://github.com/fyu1/NV-Kernels.fenghuay.baseos/commit/1cbc0f2c3d5df7425f78060239f0c88925af95cb ("NVIDIA: SAUCE: arm_mpam: resctrl: Add resctrl_arch_config_cntr() for ABMC use")
Fixes: https://github.com/fyu1/NV-Kernels.fenghuay.baseos/commit/dd44394e2b41aff18a54379e4946ccbdc1b4b45e ("NVIDIA: SAUCE: arm_mpam: resctrl: Add resctrl_arch_rmid_read() and resctrl_arch_reset_rmid()")
Fixes: https://github.com/fyu1/NV-Kernels.fenghuay.baseos/commit/842967000721b1495ee4c24d0bcc8333228a8bc3 ("NVIDIA: SAUCE: arm_mpam: resctrl: Add resctrl_arch_cntr_read() & resctrl_arch_reset_cntr()")

Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

@fyu1 fyu1 force-pushed the 24.04_linux-nvidia-6.17-next.mpam.extras.fixes2 branch from 9dbadcd to 1091c7f Compare March 24, 2026 16:45

fyu1 commented Mar 24, 2026

> The following commit has two bodies and two sign-offs: NVIDIA: VR: SAUCE: arm_mpam: Fix compilation errors to adapt to resctrl L3 domain and arch API updates
>
> I believe you intended to remove this part:
>
> Fix the following compilation errors:
>
> 1. Commit https://github.com/fyu1/NV-Kernels.fenghuay.baseos/commit/1e9e1305357e9cc033922b8c51217adb27f6d6cb ("x86,fs/resctrl: Rename struct rdt_mon_domain and
>    rdt_hw_mon_domain") renames struct rdt_mon_domain to rdt_l3_mon_domain.
>    Change the names in MPAM.
> 2. Implement empty resctrl arch API resctrl_arch_pre_mount(void) to make
>    compilation succeed.
>
> Fixes: https://github.com/fyu1/NV-Kernels.fenghuay.baseos/commit/a42549e64ce0aa7f72ec6fb47a8abd5ac6b428b8 ("NVIDIA: SAUCE: arm_mpam: resctrl: Add boilerplate cpuhp and domain allocation")
> Fixes: https://github.com/fyu1/NV-Kernels.fenghuay.baseos/commit/ae2a29c5ebb8d3ab1e83319465237f1713083dec ("NVIDIA: SAUCE: arm_mpam: resctrl: Add support for csu counters")
> Fixes: https://github.com/fyu1/NV-Kernels.fenghuay.baseos/commit/1cbc0f2c3d5df7425f78060239f0c88925af95cb ("NVIDIA: SAUCE: arm_mpam: resctrl: Add resctrl_arch_config_cntr() for ABMC use")
> Fixes: https://github.com/fyu1/NV-Kernels.fenghuay.baseos/commit/dd44394e2b41aff18a54379e4946ccbdc1b4b45e ("NVIDIA: SAUCE: arm_mpam: resctrl: Add resctrl_arch_rmid_read() and resctrl_arch_reset_rmid()")
> Fixes: https://github.com/fyu1/NV-Kernels.fenghuay.baseos/commit/842967000721b1495ee4c24d0bcc8333228a8bc3 ("NVIDIA: SAUCE: arm_mpam: resctrl: Add resctrl_arch_cntr_read() & resctrl_arch_reset_cntr()")
>
> Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

Hi, Jamie, Fixed in the updated branch: 68595a9


fyu1 commented Mar 24, 2026

> I have not been able to finish the review but I have a question with this commit d62e19f NVIDIA: SAUCE: arm_mpam: Add workaround for T241-MPAM-4 At 6.14 we took this patch from morse tree but now it says backported from 02f5cf363057ceddd099313d1e43636fdcf3d47c dev/dev-main-nvidia-pset-linux-6.19.6 but how canonical will see this backport when doing the review?
>
> Matt told me that the line [backported from ... pset_branch] is only for internal info. External people cannot see pset.
>
> Some of the last patches also reference the pset 6.19 kernel so what we should do with those?

Hi, Matt, since Carol is concerned about the pset branch names, do you want to keep them in the change logs?


nvmochs commented Mar 24, 2026

> I have not been able to finish the review but I have a question with this commit d62e19f NVIDIA: SAUCE: arm_mpam: Add workaround for T241-MPAM-4 At 6.14 we took this patch from morse tree but now it says backported from 02f5cf363057ceddd099313d1e43636fdcf3d47c dev/dev-main-nvidia-pset-linux-6.19.6 but how canonical will see this backport when doing the review?
>
> Matt told me that the line [backported from ... pset_branch] is only for internal info. External people cannot see pset.
>
> Some of the last patches also reference the pset 6.19 kernel so what we should do with those?

Let's keep the pset branch / SHA references to maintain provenance for our own tracking.

aegl and others added 23 commits March 25, 2026 22:18
…ming domains

The feature to sum event data across multiple domains supports systems with
Sub-NUMA Cluster (SNC) mode enabled. The top-level monitoring files in each
"mon_L3_XX" directory provide the sum of data across all SNC nodes sharing an
L3 cache instance while the "mon_sub_L3_YY" sub-directories provide the event
data of the individual nodes.
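
As a rough illustration of the summing described above, the following Python sketch models the relationship between the per-node files and the top-level file. The directory names follow the commit message; the counter values and the `read_top_level` helper are invented for illustration and are not kernel code.

```python
# Hypothetical per-SNC-node event counts for one L3 cache instance.
# Keys mimic the "mon_sub_L3_YY" sub-directories; the values are made up.
snc_node_counts = {
    "mon_sub_L3_00": 1200,
    "mon_sub_L3_01": 800,
}


def read_top_level(node_counts):
    """Value a top-level "mon_L3_XX" file would report in this model:
    the sum over all SNC nodes sharing the L3 cache instance."""
    return sum(node_counts.values())


print(read_top_level(snc_node_counts))
```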

SNC is only associated with the L3 resource and domains and as a result the
flow handling the sum of event data implicitly assumes it is working with
the L3 resource and domains.

Reading of telemetry events does not require to sum event data so this feature
can remain dedicated to SNC and keep the implicit assumption of working with
the L3 resource and domains.

Add a WARN to where the implicit assumption of working with the L3 resource
is made and add comments on how the structure controlling the event sum
feature is used.

Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
(cherry picked from commit db64994)
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

Each CPU collects data for telemetry events that it sends to the nearest
telemetry event aggregator either when the value of MSR_IA32_PQR_ASSOC.RMID
changes, or when a two millisecond timer expires.

There is a feature type ("energy" or "perf"), GUID, and MMIO region associated
with each aggregator. This combination links to an XML description of the
set of telemetry events tracked by the aggregator. XML files are published
by Intel in a GitHub repository¹.

The telemetry event aggregators maintain per-RMID per-event counts of the
total seen for all the CPUs. There may be multiple telemetry event aggregators
per package.

There are separate sets of aggregators for each feature type. Aggregators
in a set may have different GUIDs. All aggregators with the same feature
type and GUID are symmetric keeping counts for the same set of events for
the CPUs that provide data to them.

The XML file for each aggregator provides the following information:
0) Feature type of the events ("perf" or "energy")
1) Which telemetry events are tracked by the aggregator.
2) The order in which the event counters appear for each RMID.
3) The value type of each event counter (integer or fixed-point).
4) The number of RMIDs supported.
5) Which additional aggregator status registers are included.
6) The total size of the MMIO region for an aggregator.

Introduce struct event_group that condenses the relevant information from
an XML file. Hereafter an "event group" refers to a group of events of a
particular feature type (event_group::pfname set to "energy" or "perf") with
a particular GUID.
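
To make the shape of such a condensed description concrete, here is a minimal Python sketch. Only `pfname` and `num_rmid` are names that appear in this series' commit messages; every other field name is an assumption, and the example values are taken from the Clearwater Forest "energy" layout quoted elsewhere in this series (576 RMIDs, two 8-byte counters per RMID, three status registers assumed to be 8 bytes each).

```python
from dataclasses import dataclass, field


@dataclass
class EventGroup:
    """Illustrative stand-in for the information condensed from one XML file."""
    pfname: str      # feature type: "energy" or "perf"
    guid: int        # GUID identifying the XML description
    num_rmid: int    # number of RMIDs that have counter slots in MMIO space
    mmio_size: int   # expected total size of the MMIO region, in bytes (assumed field)
    events: list = field(default_factory=list)  # event names, in counter order (assumed field)


energy_group = EventGroup(
    pfname="energy",
    guid=0x26696143,
    num_rmid=576,
    # counters for every (RMID, event) pair plus three status registers
    mmio_size=(576 * 2 + 3) * 8,
    events=["PMT_EVENT_ENERGY", "PMT_EVENT_ACTIVITY"],
)

print(energy_group.pfname, hex(energy_group.guid), hex(energy_group.mmio_size))
```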

Use event_group::pfname to determine the feature id needed to obtain the
aggregator details. It will later be used in console messages and with the
rdt= boot parameter.

The INTEL_PMT_TELEMETRY driver enumerates support for telemetry events.
This driver provides intel_pmt_get_regions_by_feature() to list all available
telemetry event aggregators of a given feature type. The list includes the
"guid", the base address in MMIO space for the region where the event counters
are exposed, and the package id where all the CPUs that report to this
aggregator are located.

Call INTEL_PMT_TELEMETRY's intel_pmt_get_regions_by_feature() for each event
group to obtain a private copy of that event group's aggregator data. Duplicate
the aggregator data between event groups that have the same feature type
but different GUID. Further processing on this private copy will be unique
to the event group.

  ¹https://github.com/intel/Intel-PMT

  [ bp: Zap text explaining the code, s/guid/GUID/g ]

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
(cherry picked from commit 1fb2daa)
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

…GUIDs

The telemetry event aggregators of the Intel Clearwater Forest CPU support two
RMID-based feature types: "energy" with GUID 0x26696143¹, and "perf" with
GUID 0x26557651².

The event counter offsets in an aggregator's MMIO space are arranged in groups
for each RMID.

E.g., the "energy" counters for GUID 0x26696143 are arranged like this:

  MMIO offset:0x0000 Counter for RMID 0 PMT_EVENT_ENERGY
  MMIO offset:0x0008 Counter for RMID 0 PMT_EVENT_ACTIVITY
  MMIO offset:0x0010 Counter for RMID 1 PMT_EVENT_ENERGY
  MMIO offset:0x0018 Counter for RMID 1 PMT_EVENT_ACTIVITY
  ...
  MMIO offset:0x23F0 Counter for RMID 575 PMT_EVENT_ENERGY
  MMIO offset:0x23F8 Counter for RMID 575 PMT_EVENT_ACTIVITY

After all counters there are three status registers that provide indications
of how many times an aggregator was unable to process event counts, the time
stamp for the most recent loss of data, and the time stamp of the most recent
successful update.

  MMIO offset:0x2400 AGG_DATA_LOSS_COUNT
  MMIO offset:0x2408 AGG_DATA_LOSS_TIMESTAMP
  MMIO offset:0x2410 LAST_UPDATE_TIMESTAMP
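
The offsets listed above follow a regular pattern, which the Python sketch below reproduces. It assumes 8-byte counters packed back to back, grouped per RMID in the event order shown; the function and constant names are illustrative only.

```python
COUNTER_BYTES = 8  # spacing between consecutive counters in the listing above

# Event order within one RMID's group, as in the "energy" layout.
ENERGY_EVENTS = ["PMT_EVENT_ENERGY", "PMT_EVENT_ACTIVITY"]
NUM_RMIDS = 576  # RMIDs 0..575


def counter_offset(rmid, event, events=ENERGY_EVENTS):
    """MMIO offset of the counter for (rmid, event)."""
    return (rmid * len(events) + events.index(event)) * COUNTER_BYTES


def status_base(num_rmids=NUM_RMIDS, events=ENERGY_EVENTS):
    """Offset of the first status register, located after all counters."""
    return num_rmids * len(events) * COUNTER_BYTES


print(hex(counter_offset(1, "PMT_EVENT_ACTIVITY")))  # 0x18
print(hex(counter_offset(575, "PMT_EVENT_ENERGY")))  # 0x23f0
print(hex(status_base()))                            # 0x2400
```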

Define event_group structures for both of these aggregator types and define
the events tracked by the aggregators in the file system code.

PMT_EVENT_ENERGY and PMT_EVENT_ACTIVITY are produced in fixed point format.
File system code must output as floating point values.
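
A minimal sketch of such a conversion is shown below. The commit message does not state how many fractional bits the counters use, so `frac_bits` is a parameter and the value 18 in the example is purely illustrative. The second helper formats the value using integer arithmetic only, which is the kind of approach kernel code would need since it avoids floating point; it is not the actual resctrl implementation.

```python
def fixed_to_float(raw, frac_bits):
    """Interpret raw as unsigned binary fixed point with frac_bits fractional bits."""
    return raw / (1 << frac_bits)


def format_fixed(raw, frac_bits, decimals=6):
    """Render a fixed-point value as a decimal string using integers only."""
    whole = raw >> frac_bits
    frac = raw & ((1 << frac_bits) - 1)
    # Scale the fractional part to the requested number of decimal digits (truncating).
    frac_dec = (frac * 10**decimals) >> frac_bits
    return f"{whole}.{frac_dec:0{decimals}d}"


print(fixed_to_float(0x60000, 18))  # 1.5
print(format_fixed(0x60000, 18))    # 1.500000
```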

  ¹https://github.com/intel/Intel-PMT/blob/main/xml/CWF/OOBMSM/RMID-ENERGY/cwf_aggregator.xml
  ²https://github.com/intel/Intel-PMT/blob/main/xml/CWF/OOBMSM/RMID-PERF/cwf_aggregator.xml

  [ bp: Massage commit message. ]

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
(cherry picked from commit 8f6b6ad)
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

The resctrl file system layer passes the domain, RMID, and event id to the
architecture to fetch an event counter.

Fetching a telemetry event counter requires additional information that is
private to the architecture, for example, the offset into MMIO space from
where the counter should be read.

Add mon_evt::arch_priv that architecture can use for any private data related
to the event. The resctrl filesystem initializes mon_evt::arch_priv when the
architecture enables the event and passes it back to architecture when needing
to fetch an event counter.
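
The hand-off described above can be modeled roughly as follows. This is a Python sketch of the idea only: `MonEvt`, the function names, and the contents of the private data (a first offset and a stride) are all hypothetical, and the "MMIO space" is a plain dictionary.

```python
class MonEvt:
    """Hypothetical stand-in for the file system's per-event record."""

    def __init__(self, name):
        self.name = name
        self.enabled = False
        self.arch_priv = None  # opaque to the file system layer


def fs_enable_event(evt, arch_priv):
    """File system side: remember the architecture's private data at enable time."""
    evt.enabled = True
    evt.arch_priv = arch_priv


def arch_read_counter(mmio, rmid, arch_priv):
    """Architecture side: only this code knows what arch_priv contains."""
    offset = arch_priv["first_offset"] + rmid * arch_priv["stride"]
    return mmio[offset]


def fs_read_event(evt, mmio, rmid):
    """File system side: pass arch_priv back without interpreting it."""
    if not evt.enabled:
        raise ValueError(f"event {evt.name} is not enabled")
    return arch_read_counter(mmio, rmid, evt.arch_priv)


mmio = {0x08: 111, 0x18: 222}  # fake counter values keyed by offset
activity = MonEvt("PMT_EVENT_ACTIVITY")
fs_enable_event(activity, {"first_offset": 0x08, "stride": 0x10})
print(fs_read_event(activity, mmio, 1))  # 222
```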

Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
(backported from commit 8ccb1f8)
[fenghuay: fix minor conflicts in __check_limbo()]
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

Every event group has a private copy of the data of all telemetry event
aggregators (aka "telemetry regions") tracking its feature type. Included
may be regions that have the same feature type but tracking different GUID
from the event group's.

Traverse the event group's telemetry region data and mark all regions that
are not usable by the event group as unusable by clearing those regions'
MMIO addresses. A region is considered unusable if:
1) GUID does not match the GUID of the event group.
2) Package ID is invalid.
3) The enumerated size of the MMIO region does not match the expected
   value from the XML description file.

Hereafter any telemetry region with an MMIO address is considered valid for
the event group it is associated with.
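
The three checks above can be sketched in Python as follows. The `Region` fields and the `mark_unusable` helper are illustrative names, and treating "invalid package ID" as a simple range check is an assumption, since the commit message does not define it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Region:
    guid: int
    package_id: int
    size: int
    addr: Optional[int]  # MMIO address; None marks the region as unusable


def mark_unusable(regions, group_guid, expected_size, num_packages):
    """Clear the address of every region the event group cannot use.

    Returns the number of regions that remain usable.
    """
    usable = 0
    for r in regions:
        bad = (
            r.guid != group_guid                         # 1) GUID mismatch
            or not (0 <= r.package_id < num_packages)    # 2) invalid package (assumed check)
            or r.size != expected_size                   # 3) unexpected MMIO size
        )
        if bad:
            r.addr = None
        elif r.addr is not None:
            usable += 1
    return usable
```

In this model an event group's events would be enabled only when the returned count is at least one.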

Enable all the event group's events as long as there is at least one usable
region from where data for its events can be read. Enabling of an event can
fail if the same event has already been enabled as part of another event
group. It should never happen that the same event is described by different
GUID supported by the same system so just WARN (via resctrl_enable_mon_event())
and skip the event.

Note that it is architecturally possible that some telemetry events are only
supported by a subset of the packages in the system. It is not expected that
systems will ever do this. If they do the user will see event files in resctrl
that always return "Unavailable".

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
(cherry picked from commit 7e6df96)
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

Introduce intel_aet_read_event() to read telemetry events for resource
RDT_RESOURCE_PERF_PKG. There may be multiple aggregators tracking each
package, so scan all of them and add up all counters. Aggregators may return
an invalid data indication if they have received no records for a given RMID.
The user will see "Unavailable" if none of the aggregators on a package
provide valid counts.
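
The aggregation rule described above can be modeled in a few lines of Python. This is not the body of intel_aet_read_event(); `read_event` is a hypothetical helper in which `None` stands for the "invalid data" indication.

```python
def read_event(aggregator_values):
    """Combine one counter across all aggregators tracking a package.

    aggregator_values holds one entry per aggregator; None means the
    aggregator reported invalid data for the requested RMID.
    """
    valid = [v for v in aggregator_values if v is not None]
    if not valid:
        return "Unavailable"
    return sum(valid)


print(read_event([10, None, 5]))
print(read_event([None, None]))
```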

Resctrl now uses readq() so depends on X86_64. Update Kconfig.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
(cherry picked from commit 51541f6)
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

Population of a monitor group's mon_data directory is unreasonably complicated
because of the support for Sub-NUMA Cluster (SNC) mode.

Split out the SNC code into a helper function to make it easier to add support
for a new telemetry resource.

Move all the duplicated code to make and set owner of domain directories into
the mon_add_all_files() helper and rename to _mkdir_mondata_subdir().

Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
(cherry picked from commit 0ec1db4)
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

Clearing a monitor group's mon_data directory is complicated because of the
support for Sub-NUMA Cluster (SNC) mode.

Refactor the SNC case into a helper function to make it easier to add support
for a new telemetry resource.

Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
(cherry picked from commit 93d9fd8)
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

…_PKG

The L3 resource has several requirements for domains. There are per-domain
structures that hold the 64-bit values of counters, and elements to keep
track of the overflow and limbo threads.

None of these are needed for the PERF_PKG resource. The hardware counters
are wide enough that they do not wrap around for decades.

Define a new rdt_perf_pkg_mon_domain structure which just consists of the
standard rdt_domain_hdr to keep track of domain id and CPU mask.

Update resctrl_online_mon_domain() for RDT_RESOURCE_PERF_PKG. The only action
needed for this resource is to create and populate domain directories if a
domain is added while resctrl is mounted.

Similarly resctrl_offline_mon_domain() only needs to remove domain directories.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
(cherry picked from commit f4e0cd8)
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>

Legacy resctrl features are enumerated by X86_FEATURE_* flags. These may be
overridden by quirks to disable features in the case of errata.  Users can use
kernel command line options to either disable a feature, or to force enable
a feature that was disabled by a quirk.

A different approach is needed for hardware features that do not have an
X86_FEATURE_* flag.

Update parsing of the "rdt=" boot parameter to call the telemetry driver
directly to handle new "perf" and "energy" options that control activation of
telemetry monitoring of the named type. By itself a "perf" or "energy" option
controls the forced enabling or disabling (with ! prefix) of all event groups
of the named type. A ":guid" suffix allows for fine grained control per event
group.
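
The option grammar described above can be sketched in plain userspace C. This is only an illustration: `struct aet_opt` and `parse_aet_option()` are hypothetical names, the numeric GUID format (anything `strtoul()` accepts with base 0) is an assumption, and the kernel's actual parser may differ. Only the "perf"/"energy" names, the "!" prefix and the ":guid" suffix come from the commit message.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of one "rdt=" token such as "!perf:0x1234". */
struct aet_opt {
	bool force_off;      /* token started with '!' */
	bool is_energy;      /* "energy" rather than "perf" */
	bool has_guid;       /* a ":guid" suffix was given */
	unsigned long guid;  /* valid only when has_guid is true */
};

/* Returns true and fills *out if tok is a telemetry option, else false. */
static bool parse_aet_option(const char *tok, struct aet_opt *out)
{
	const char *name = tok;
	size_t len;
	char *end;

	memset(out, 0, sizeof(*out));

	/* A leading '!' means "force disable". */
	if (*name == '!') {
		out->force_off = true;
		name++;
	}

	/* The type name runs up to an optional ':'. */
	len = strcspn(name, ":");
	if (len == strlen("perf") && !strncmp(name, "perf", len))
		out->is_energy = false;
	else if (len == strlen("energy") && !strncmp(name, "energy", len))
		out->is_energy = true;
	else
		return false;

	/* Optional ":guid" suffix selects a single event group. */
	if (name[len] == ':') {
		if (name[len + 1] == '\0')
			return false;
		out->guid = strtoul(name + len + 1, &end, 0);
		if (*end != '\0')
			return false;
		out->has_guid = true;
	}
	return true;
}
```

Under these assumptions, "perf" applies to every perf event group, while "!energy:0x10" disables only the energy event group whose GUID is 0x10.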

  [ bp: s/intel_aet_option/intel_handle_aet_option/g ]

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
(backported from commit 842e7f9)
[fenghuay: fix a minor conflict in kernel-parameters.txt doc]
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
There are now three meanings for "number of RMIDs":

1) The number for legacy features enumerated by CPUID leaf 0xF. This is the
   maximum number of distinct values that can be loaded into MSR_IA32_PQR_ASSOC.
   Note that systems with Sub-NUMA Cluster mode enabled will force scaling down
   the CPUID enumerated value by the number of SNC nodes per L3-cache.

2) The number of registers in MMIO space for each event. This is enumerated in
   the XML files and is the value initialized into event_group::num_rmid.

3) The number of "hardware counters" (this isn't a strictly accurate
   description of how things work, but serves as a useful analogy that does
   describe the limitations) feeding to those MMIO registers. This is enumerated
   in telemetry_region::num_rmids returned by intel_pmt_get_regions_by_feature().

Event groups with insufficient "hardware counters" to track all RMIDs are
difficult for users to use, since the system may reassign "hardware counters"
at any time. This means that users cannot reliably collect two consecutive
event counts to compute the rate at which events are occurring.

Disable such event groups by default. The user may override this with
a command line "rdt=" option. In this case limit an under-resourced event
group's number of possible monitor resource groups to the lowest number of
"hardware counters".

Scan all enabled event groups and assign the RDT_RESOURCE_PERF_PKG resource
"num_rmid" value to the smallest of these values as this value will be used
later to compare against the number of RMIDs supported by other resources to
determine how many monitoring resource groups are supported.

N.B. Change type of resctrl_mon::num_rmid to u32 to match its usage and the
type of event_group::num_rmid so that min(r->num_rmid, e->num_rmid) won't
complain about mixing signed and unsigned types.
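
A rough sketch of the policy described above, in standalone C. The struct, its field names and `pick_perf_pkg_num_rmid()` are hypothetical stand-ins rather than the kernel's definitions; the sketch only shows the "disable by default, clamp when forced on, then take the minimum" logic.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-event-group data the text mentions. */
struct event_group_sketch {
	uint32_t num_rmid;     /* MMIO registers per event (meaning 2) */
	uint32_t hw_counters;  /* "hardware counters" available (meaning 3) */
	bool force_on;         /* user forced the group on via "rdt=" */
	bool enabled;          /* outcome of the policy */
};

static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/*
 * Enable or disable each group and return the resource-wide num_rmid,
 * or 0 if no group ends up enabled.
 */
static uint32_t pick_perf_pkg_num_rmid(struct event_group_sketch *g, size_t n)
{
	uint32_t res = UINT32_MAX;
	bool any = false;

	for (size_t i = 0; i < n; i++) {
		if (g[i].hw_counters < g[i].num_rmid) {
			/* Under-resourced: off unless the user insists. */
			if (!g[i].force_on) {
				g[i].enabled = false;
				continue;
			}
			/* Forced on: limit to what the hardware can track. */
			g[i].num_rmid = g[i].hw_counters;
		}
		g[i].enabled = true;
		res = min_u32(res, g[i].num_rmid);
		any = true;
	}
	return any ? res : 0;
}
```

Both operands of the minimum are unsigned 32-bit values here, which mirrors the reason the commit gives for changing the type of resctrl_mon::num_rmid.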

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
(cherry picked from commit 67640e3)
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
closid_num_dirty_rmid[] and rmid_ptrs[] are allocated together during resctrl
initialization and freed together during resctrl exit.

Telemetry events are enumerated on resctrl mount, so the number of RMIDs
supported by all monitoring resources, which is the size needed for
rmid_ptrs[], is known only at resctrl mount.

Separate closid_num_dirty_rmid[] and rmid_ptrs[] allocation and free in
preparation for rmid_ptrs[] to be allocated on resctrl mount.

Keep the rdtgroup_mutex protection around the allocation and free of
closid_num_dirty_rmid[] as ARM needs this to guarantee memory ordering.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
(cherry picked from commit ee7f6af)
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
resctrl assumes that only the L3 resource supports monitor events, so it
simply takes the rdt_resource::num_rmid from RDT_RESOURCE_L3 as the system's
number of RMIDs.

The addition of telemetry events in a different resource breaks that
assumption.

Compute the number of available RMIDs as the minimum value across all
mon_capable resources (analogous to how the number of CLOSIDs is computed
across alloc_capable resources).

Note that mount time enumeration of the telemetry resource means that
this number can be reduced. If this happens, then some memory will
be wasted as the allocations for rdt_l3_mon_domain::mbm_states[] and
rdt_l3_mon_domain::rmid_busy_llc created during resctrl initialization will
be larger than needed.
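
As an illustration of the computation described above, a minimal standalone sketch follows. `struct resource_sketch` and `system_num_rmid()` are hypothetical names, not the kernel's; returning UINT32_MAX when nothing is mon_capable is likewise an assumption made to keep the example small.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced view of a resctrl resource. */
struct resource_sketch {
	bool mon_capable;
	uint32_t num_rmid;
};

/* Smallest num_rmid over all mon_capable resources. */
static uint32_t system_num_rmid(const struct resource_sketch *r, size_t n)
{
	uint32_t num = UINT32_MAX;

	for (size_t i = 0; i < n; i++) {
		if (!r[i].mon_capable)
			continue;  /* resources without monitoring do not limit RMIDs */
		if (r[i].num_rmid < num)
			num = r[i].num_rmid;
	}
	return num;
}
```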

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
(cherry picked from commit 0ecc988)
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
L3 monitor features are enumerated during resctrl initialization and
rmid_ptrs[] that tracks all RMIDs and depends on the number of supported
RMIDs is allocated during this time.

Telemetry monitor features are enumerated during first resctrl mount and
may support a different number of RMIDs compared to L3 monitor features.

Delay allocation and initialization of rmid_ptrs[] until first mount.
Since the number of RMIDs cannot change on later mounts, keep the same set of
rmid_ptrs[] until resctrl_exit(). This is required because the limbo handler
keeps running after resctrl is unmounted and needs to access rmid_ptrs[]
as it keeps tracking busy RMIDs after unmount.

Rename routines to match what they now do:
dom_data_init() -> setup_rmid_lru_list()
dom_data_exit() -> free_rmid_lru_list()
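
The lifetime rule described above (allocate on first mount, reuse on later mounts, free only at exit) can be sketched in userspace C. All names below are hypothetical and the sketch ignores locking; it is not the kernel's setup_rmid_lru_list() implementation.

```c
#include <stdint.h>
#include <stdlib.h>

struct rmid_entry_sketch {
	uint32_t rmid;
	int busy;
};

static struct rmid_entry_sketch *rmid_ptrs_sketch;  /* NULL until first mount */
static uint32_t rmid_count_sketch;

/* Called on every mount; only the first call allocates. Returns 0 or -1. */
static int setup_on_mount(uint32_t num_rmid)
{
	if (rmid_ptrs_sketch)
		return 0;  /* later mounts reuse the existing table */

	rmid_ptrs_sketch = calloc(num_rmid, sizeof(*rmid_ptrs_sketch));
	if (!rmid_ptrs_sketch)
		return -1;

	for (uint32_t i = 0; i < num_rmid; i++)
		rmid_ptrs_sketch[i].rmid = i;
	rmid_count_sketch = num_rmid;
	return 0;
}

/* Unmount leaves the table alone: background work may still use it. */
static void on_unmount(void)
{
}

/* Only the exit path releases the table. */
static void free_on_exit(void)
{
	free(rmid_ptrs_sketch);
	rmid_ptrs_sketch = NULL;
	rmid_count_sketch = 0;
}
```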

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
(backported from commit d089164)
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
[fenghuay: fix minor conflicts in setup_rmid_lru_list() and dom_data_exit()]
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Since telemetry events are enumerated on resctrl mount the RDT_RESOURCE_PERF_PKG
resource is not considered "monitoring capable" during early resctrl initialization.
This means that the domain list for RDT_RESOURCE_PERF_PKG is not built when the CPU
hotplug notifiers are registered and run for the first time right after resctrl
initialization.

Mark the RDT_RESOURCE_PERF_PKG as "monitoring capable" upon successful telemetry
event enumeration to ensure future CPU hotplug events include this resource and
initialize its domain list for CPUs that are already online.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
(cherry picked from commit 4bbfc90)
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Update resctrl filesystem documentation with the details about the resctrl
files that support telemetry events.

  [ bp: Drop the debugfs hunk of the documentation until a better debugging
    solution is found. ]

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
(cherry picked from commit a8848c4)
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
…rl L3 domain and arch API updates

Upstream resctrl renamed the L3 monitor domain type and extended the arch
hooks:
1. Use struct rdt_l3_mon_domain in MPAM's resctrl integration,
2. Pass struct rdt_domain_hdr * into resctrl_online_mon_domain() /
   resctrl_offline_mon_domain(),
3. Match the new resctrl_arch_rmid_read() prototype (header pointer +
   arch_priv).
4. Update resctrl_arch_cntr_read(), resctrl_arch_reset_rmid(),
   resctrl_arch_reset_cntr(), and resctrl_arch_config_cntr() to take
   struct rdt_l3_mon_domain *.
5. Call the new resctrl_enable_mon_event() signature when wiring monitor
   events and set mon_capable from its return value.
6. Add a no-op resctrl_arch_pre_mount() so MPAM builds with the generic
   resctrl mount path.

Fixes: a42549e ("NVIDIA: SAUCE: arm_mpam: resctrl: Add boilerplate cpuhp and domain allocation")
Fixes: ae2a29c ("NVIDIA: SAUCE: arm_mpam: resctrl: Add support for csu counters")
Fixes: 1cbc0f2 ("NVIDIA: SAUCE: arm_mpam: resctrl: Add resctrl_arch_config_cntr() for ABMC use")
Fixes: dd44394 ("NVIDIA: SAUCE: arm_mpam: resctrl: Add resctrl_arch_rmid_read() and resctrl_arch_reset_rmid()")
Fixes: 8429670 ("NVIDIA: SAUCE: arm_mpam: resctrl: Add resctrl_arch_cntr_read() & resctrl_arch_reset_cntr()")

Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
…rors

There is no need to destroy the MSC instance on user/admin programming
errors, since they do not cause any functional issues.

Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
(cherry picked from 316e5833ccb2ef66f50290e48c45b70bf286c8fd dev/dev-main-nvidia-pset-linux-6.19.6)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
In a NUMA system, each node may include CPUs, memory, MPAM MSC
instances, or any combination thereof. Some high-end servers may
have NUMA nodes that include MPAM MSC but no CPUs. In such cases,
associate all possible CPUs with those MSCs.
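
A toy model of that fallback, assuming a 64-bit integer stands in for a cpumask. `cpumask_sketch_t` and `msc_affinity()` are hypothetical names and not the driver's code.

```c
#include <stdint.h>

/* Toy cpumask: bit n set means CPU n; limited to 64 CPUs for illustration. */
typedef uint64_t cpumask_sketch_t;

/*
 * Affinity for an MSC on a given node: the node's own CPUs if it has any,
 * otherwise every possible CPU (the CPU-less node case).
 */
static cpumask_sketch_t msc_affinity(cpumask_sketch_t node_cpus,
				     cpumask_sketch_t possible_cpus)
{
	return node_cpus ? node_cpus : possible_cpus;
}
```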

Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
(cherry picked from f902b5abf39fe10a50b7062dc9ae9d2cfc723248 dev/dev-main-nvidia-pset-linux-6.19.6)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
…ring domain setup

The current MPAM driver only considers the first component associated
with an online/offline CPU during domain creation and teardown. This
is insufficient, as CPU-initiated traffic may traverse multiple MSCs
before reaching the target, and each MSC must be programmed consistently
for proper resource partitioning.

Update the MPAM driver to include all components associated with a
given CPU during domain setup/teardown to expose expected schemata
to userspace for effective resource control.

Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
(backported from 4309ce9856f87170670c9db40546d9f2fc9dbb86 dev/dev-main-nvidia-pset-linux-6.19.6)
[fenghuay: In addition to the core change, this backport includes the
following adaptations to bridge the gap between the 24.04 (6.17) MPAM
driver and the 6.19.6 base the original was written against:

  - Add for_each_mpam_resctrl_control() and for_each_mpam_resctrl_mon()
    iteration macros (from pset c15c066 and 4f42221)
  - Add MPAM_MAX_EVENT constant to bound the monitor event array
  - Add traffic_matches_l3() to validate that a memory-class MSC's
    traffic matches L3 egress topology (from pset ebc0760)
    Remove redundant if (class->type != MPAM_CLASS_MEMORY)
  - Replace exposed_alloc_capable/exposed_mon_capable static bools
    with dynamic resctrl_arch_alloc_capable()/resctrl_arch_mon_capable()
    that iterate over resources
  - Change mpam_resctrl_offline_cpu() return type from int to void
  - Change mpam_resctrl_monitor_init() return type from void to int
    and propagate errors
  - Change num_rmid from mpam_pmg_max + 1 to
    resctrl_arch_system_num_rmid_idx()
  - Use guard(mutex) for domain_list_lock
  - Use INIT_LIST_HEAD_RCU for domain lists
  - Fix the "MBA not found" issue on GMEM by checking traffic_matches_l3() in
    mpam_resctrl_pick_mba() only for classes that don't have a NUMA node]
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
…onfig

Reset an RIS by building a default mpam_config and applying it via
mpam_reprogram_ris_partid(), like any other config.

- mpam_init_reset_cfg(): set features and default values only for
  controls supported by the RIS (cpor_part, mbw_part, mbw_max,
  mbw_prop, cmax_cmax, cmax_cmin). Use full masks for CPBM/MBW_PBM
  and MPAMCFG_* defaults for MBW_MAX, CMAX, CMIN.
- mpam_reprogram_ris_partid(): apply cfg for all supported controls
  (no separate reset path).

Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
(backported from c076b208842db87ed50b1c63cff302975a9c8f67 dev/dev-main-nvidia-pset-linux-6.19.6)
[fenghuay: Fix porting conflicts and compilation errors.
 Remove this sentence in the commit message to avoid confusion because
 MBW_PROP feature is not supported on Vera/Grace:
 "Include mpam_feat_mbw_prop when supported so MBW_PROP is written to 0
  on reset."]
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
There is no struct arm_smmu_domain context for domains configured
with identity mappings. Use the device to obtain the necessary
information to program PARTID and PMGID.

Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
(backported from e5020b38475ef58c5bb3d1a92028d4e0dd7aff4d dev/dev-main-nvidia-pset-linux-6.19.6)
[fenghuay: Koba Ko fixes a typo in iommu_group_get_qos_params():
s/!ops->set_group_qos_params/!ops->get_group_qos_params/]
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
…n mpam_msmon_read

Resolve mpam_feat_msmon_mbwu to the concrete counter type (31/44/63)
before mpam_has_feature() and before filling the mon_read arg. This
avoids -EOPNOTSUPP when only a specific MBWU feature is set, and
ensures _msmon_read() gets the resolved type in arg.type.
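
The ordering issue can be illustrated with a small standalone sketch. The enum, struct and function names are hypothetical, -1 stands in for -EOPNOTSUPP, and preferring the widest supported counter is an assumption rather than something the commit message states.

```c
#include <stdbool.h>

enum feat_sketch {
	FEAT_MBWU,     /* generic "bandwidth usage" request */
	FEAT_MBWU_31,  /* concrete counter widths */
	FEAT_MBWU_44,
	FEAT_MBWU_63,
	FEAT_COUNT
};

struct props_sketch {
	bool has[FEAT_COUNT];
};

/* Map the generic request to a concrete type (widest first: an assumption). */
static enum feat_sketch resolve_mbwu(const struct props_sketch *p)
{
	if (p->has[FEAT_MBWU_63])
		return FEAT_MBWU_63;
	if (p->has[FEAT_MBWU_44])
		return FEAT_MBWU_44;
	if (p->has[FEAT_MBWU_31])
		return FEAT_MBWU_31;
	return FEAT_MBWU;
}

/* Returns 0 and stores the resolved type, or -1 if nothing is supported. */
static int msmon_read_sketch(const struct props_sketch *p,
			     enum feat_sketch type,
			     enum feat_sketch *resolved)
{
	/* Resolve first, so the feature check sees the concrete type. */
	if (type == FEAT_MBWU)
		type = resolve_mbwu(p);

	if (!p->has[type])
		return -1;

	*resolved = type;
	return 0;
}
```

If the feature check ran before the resolution step, a device that sets only the 44-bit feature would fail the check for the generic type even though a usable counter exists.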

Fixes: 5b91005 ("NVIDIA: SAUCE: arm_mpam: Use long MBWU counters if supported")
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
@fyu1 fyu1 force-pushed the 24.04_linux-nvidia-6.17-next.mpam.extras.fixes2 branch from cc8ab11 to ecd11fd Compare March 25, 2026 22:19
@fyu1
Collaborator Author

fyu1 commented Mar 25, 2026

I fixed a blocking GMEM test failure in the patch "NVIDIA: VR: SAUCE: arm_mpam: Include all associated MSC components during domain setup" and updated its commit message. Here is the fix:
diff --git a/drivers/resctrl/mpam_resctrl.c b/drivers/resctrl/mpam_resctrl.c
index f7c2bf8aba99..0accede8cc09 100644
--- a/drivers/resctrl/mpam_resctrl.c
+++ b/drivers/resctrl/mpam_resctrl.c
@@ -1162,7 +1162,9 @@ static void mpam_resctrl_pick_mba(void)
 			continue;
 		}
 
-		if (!traffic_matches_l3(class)) {
+		/* Check memory at egress from L3 for MSC with L3 */
+		if (!cpumask_equal(&class->affinity, cpu_possible_mask) &&
+		    !traffic_matches_l3(class)) {
 			pr_debug("class %u traffic doesn't match L3 egress\n",
 				 class->level);
 			continue;

With this fix, I no longer see the MBA/MBM issue in the GMEM test with an engineer-built SBIOS that enables GPU MPAM.

If this PR is good for you, please merge it to 6.17 BaseOS.

Thank you very much for your help!

@nvmochs
Collaborator

nvmochs commented Mar 25, 2026

Re-reviewed and confirmed this was the only change. No issues with the change.

Acked-by: Matthew R. Ochs <mochs@nvidia.com>

@jamieNguyenNVIDIA
Collaborator

Acked-by: Jamie Nguyen <jamien@nvidia.com>

@nvmochs nvmochs changed the title Please merge MPAM fixes branch: 24.04 linux nvidia 6.17 next.mpam.extras.fixes2 [linux-nvidia-6.17] Backport MPAM fixes and support for CPU-less NUMA nodes Mar 25, 2026
return false;
}

cpu = cpumask_any_and(&class->affinity, cpu_online_mask);
Collaborator

Should we add a check for cpu >= nr_cpu_ids here, as in topology_matches_l3()?

Collaborator Author

Adding another sanity check wouldn't hurt, but there is no issue without it, because the next statements catch an invalid cpu anyway:
err = find_l3_equivalent_bitmask(cpu, tmp_cpumask);
if (err) {

Collaborator

Great, thanks for looking.

@nvmochs
Collaborator

nvmochs commented Mar 25, 2026

PR sent to Canonical.

8 participants