
OT-OT Duplicate Cleanup #249

Draft: GNiendorf wants to merge 1 commit into master from dup_ot_cleanup

GNiendorf (Member) commented Mar 24, 2026:

This small PR adds hit-matching checks and loosens some duplicate-cleaning cuts, reducing the number of OT-OT duplicates (e.g. T5-pT5, T5-T5) by 85%. Most duplicates are of the pLS-OT type, which this PR does not target; it only fixes some of the low-hanging fruit.
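As a rough illustration of what a hit-matching check layered on top of eta/phi windows can look like, here is a minimal Python sketch. The `TrackCandidate` record, the `is_duplicate` helper, and every threshold are hypothetical stand-ins chosen for illustration, not the code or cut values in this PR. The sketch assumes two candidates count as duplicates when they are close in eta and phi and share at least a given fraction of the smaller candidate's hits.

```python
import math
from dataclasses import dataclass


# Hypothetical, simplified track-candidate record; the real LST TC
# collection stores far more information than this.
@dataclass(frozen=True)
class TrackCandidate:
    tc_type: str       # e.g. "T5", "pT3", "pT5", "pLS"
    eta: float
    phi: float
    hits: frozenset    # indices of the hits used by this candidate


def delta_phi(phi1: float, phi2: float) -> float:
    """Absolute azimuthal difference wrapped into [0, pi]."""
    d = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return abs(d)


def shared_hit_fraction(a: TrackCandidate, b: TrackCandidate) -> float:
    """Fraction of the smaller candidate's hits that also appear in the other one."""
    smaller = min(len(a.hits), len(b.hits))
    if smaller == 0:
        return 0.0
    return len(a.hits & b.hits) / smaller


def is_duplicate(a: TrackCandidate, b: TrackCandidate,
                 max_deta: float = 0.1, max_dphi: float = 0.1,
                 min_shared: float = 0.5) -> bool:
    # Loose eta/phi pre-selection: only nearby pairs are compared hit by hit.
    # All three thresholds are placeholder values, not the PR's actual cuts.
    if abs(a.eta - b.eta) > max_deta:
        return False
    if delta_phi(a.phi, b.phi) > max_dphi:
        return False
    # Hit-matching check: flag the pair if enough hits are shared.
    return shared_hit_fraction(a, b) >= min_shared


# Usage: two candidates on either side of phi = +/- pi sharing 8 of 10 hits.
t5 = TrackCandidate("T5", 1.20, 3.13, frozenset(range(1, 11)))
pt5 = TrackCandidate("pT5", 1.22, -3.13, frozenset({1, 2, 3, 4, 5, 6, 7, 8, 11, 12, 13, 14}))
print(is_duplicate(t5, pt5))  # True
```

The phi wrap matters in this example: without it, the two candidates would look about 6.26 rad apart in phi instead of about 0.023 rad.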

For the curious, the original prompt is quoted below, and the original plan is here: https://gist.github.com/GNiendorf/ee534ab10da87b37e3b7175bb32fe932

I'm curious if you can go into plan mode and write a bunch of documentation about how the duplicate cleaning works, both within a TC type and cross cleaning across the different TC types. Then, using that documentation, our goal is to make an ntuple sample of, say, 300 events with the PU200RelVal sample and see which duplicates in the TC collection are not pLS-related. So just OT-OT duplicates like pT3-pT5, pT5-pT5, T5-T5, or any variation of those, but no pLS-T5 or pLS-pT5; I don't care about those for now.

My thought is that I want to know what caused those OT-OT duplicates to slip through the code. Which cuts failed? Were the pairs far apart in eta/phi space, and is that why they slipped through? I want to know which specific cuts caused it and how to fix it, so that we have 0 OT-OT duplicates in the 300-event sample, ideally with no significant decrease in efficiency (both overall and displaced efficiency are very important) from the fix you come up with.

Can you do this? After you study all the duplicate cleaning and cross cleaning kernels we have, you will have to check that there is enough info in the ntuple to answer these questions; if not, first add that info to the ntuple in the write LST ntuple .cc file or whatever. Then we generate the sample, see which cuts failed, and figure out how to fix it without decreasing efficiency (total and displaced track).
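A minimal Python sketch of the study described in the prompt might look like this: group track candidates by their matched simulated track, skip any pair that involves a pLS, tally the remaining OT-OT pairs by type, and record which modeled cuts were not satisfied for each pair. The `TC` record, the `ot_ot_duplicate_study` and `failed_cuts` helpers, the single set of eta/phi/shared-hit cuts, and the definition of a duplicate as two candidates matched to the same simulated track are all assumptions for illustration. The real cleaning is split across several within-type and cross-cleaning kernels, each with its own cuts, which this toy model collapses into one.

```python
import math
from collections import Counter
from dataclasses import dataclass
from itertools import combinations


# Hypothetical flattened view of one TC entry read back from the ntuple.
@dataclass(frozen=True)
class TC:
    tc_type: str      # "T5", "pT3", "pT5" or "pLS"
    sim_idx: int      # index of the matched simulated track, -1 if unmatched
    eta: float
    phi: float
    hits: frozenset   # hit indices used by the candidate


def dphi(p1: float, p2: float) -> float:
    """Absolute azimuthal difference wrapped into [0, pi]."""
    return abs((p1 - p2 + math.pi) % (2.0 * math.pi) - math.pi)


def failed_cuts(a: TC, b: TC, max_deta: float, max_dphi: float, min_shared: float) -> tuple:
    """Names of the modeled cuts whose condition was NOT met for this pair.

    An empty tuple means every modeled cut would have flagged the pair, so
    something outside this toy model let it through.
    """
    reasons = []
    if abs(a.eta - b.eta) > max_deta:
        reasons.append("deta")
    if dphi(a.phi, b.phi) > max_dphi:
        reasons.append("dphi")
    n = min(len(a.hits), len(b.hits))
    if n == 0 or len(a.hits & b.hits) / n < min_shared:
        reasons.append("shared_hits")
    return tuple(reasons)


def ot_ot_duplicate_study(tcs, max_deta=0.1, max_dphi=0.1, min_shared=0.5):
    # Placeholder thresholds; they do not correspond to real LST cut values.
    by_sim = {}
    for tc in tcs:
        if tc.sim_idx >= 0:  # unmatched (fake) candidates cannot be duplicates here
            by_sim.setdefault(tc.sim_idx, []).append(tc)

    pair_types = Counter()
    reasons = Counter()
    for group in by_sim.values():
        for a, b in combinations(group, 2):
            if "pLS" in (a.tc_type, b.tc_type):
                continue  # pLS-involved pairs are out of scope for this study
            pair_types["-".join(sorted((a.tc_type, b.tc_type)))] += 1
            reasons[failed_cuts(a, b, max_deta, max_dphi, min_shared)] += 1
    return pair_types, reasons


# Usage with a tiny hand-made event.
tcs = [
    TC("T5", 0, 0.50, 1.00, frozenset({1, 2, 3, 4, 5})),
    TC("pT5", 0, 0.72, 1.02, frozenset({1, 2, 3, 4, 9})),  # same sim track, far in eta
    TC("pLS", 0, 0.50, 1.00, frozenset({20, 21})),         # ignored: involves a pLS
    TC("T5", 1, -1.00, 2.00, frozenset({6, 7, 8})),        # alone on its sim track
]
pair_types, reasons = ot_ot_duplicate_study(tcs)
print(pair_types)  # Counter({'T5-pT5': 1})
print(reasons)     # Counter({('deta',): 1})
```

Keying `reasons` on the full tuple of failed cuts, rather than counting each cut separately, keeps pairs that fail several cuts at once distinguishable from pairs that fail only one.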

GNiendorf (Member, Author) commented:

run-ci: all

github-actions commented:

The PR was built and ran successfully in standalone mode running on CPU. Here are some of the comparison plots.

[Comparison plots: efficiency, fake rate, and duplicate rate vs pT and vs eta]

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     28.0    323.1    242.2    133.0     45.9    702.0     10.8    114.6    116.5    210.5      0.1    1926.7    1196.7+/- 289.1     594.8   explicit[s=4] (target branch)
   avg     27.7    324.5    241.2    131.9     45.4    677.4     10.9    114.0    115.5    194.2      0.1    1882.8    1177.6+/- 284.2     591.0   explicit[s=4] (this PR)
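As a quick worked reading of the timing table, the relative changes in the per-event total (1926.7 vs 1882.8, which equals the sum of the preceding per-step columns in each row) and in the `Short` figure (1196.7 vs 1177.6, which appears to be the total minus the Hits and pLS columns) are shown below. Units are whatever the CI bot reports; the table does not state them.

```latex
\frac{\Delta t_{\text{total}}}{t_{\text{total}}}
  = \frac{1882.8 - 1926.7}{1926.7}
  = \frac{-43.9}{1926.7}
  \approx -2.3\,\%,
\qquad
\frac{\Delta t_{\text{short}}}{t_{\text{short}}}
  = \frac{1177.6 - 1196.7}{1196.7}
  = \frac{-19.1}{1196.7}
  \approx -1.6\,\%
```

Both differences are much smaller than the quoted event-to-event spread on `Short` (±289.1 and ±284.2).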

github-actions commented:

The PR was built and ran successfully with CMSSW running on CPU. Here are some plots.

[Plots, OOTB All Tracks: efficiency and fake rate vs pT, eta, and phi]

The full set of validation and comparison plots can be found here.

