@@ -28,7 +28,7 @@ rule:
- api: wevtapi.EvtOpenSession
- basic block:
- and:
- string: /wevtutil(\.exe)?\s+(clear-log|cl)/i
- string: /\bwevtutil(\.exe)?\s+(clear-log|cl)/i
- call:
- and:
- string: /wevtutil(\.exe)?\s+(clear-log|cl)/i
- string: /\bwevtutil(\.exe)?\s+(clear-log|cl)/i
156 changes: 78 additions & 78 deletions anti-analysis/reference-analysis-tools-strings.yml
@@ -15,81 +15,81 @@ rule:
- al-khaser_x86.exe_
features:
- or:
- string: /ollydbg(\.exe)?/i
- string: /ProcessHacker(\.exe)?/i
- string: /tcpview(\.exe)?/i
- string: /autoruns(\.exe)?/i
- string: /autorunsc(\.exe)?/i
- string: /filemon(\.exe)?/i
- string: /procmon(\.exe)?/i
- string: /regmon(\.exe)?/i
- string: /procexp(\.exe)?/i
- string: /(?<!\w)ida[gqtuw]?(\.exe)?$/i
- string: /ida[gqtuw]?64(\.exe)?$/i
- string: /ImmunityDebugger(\.exe)?/i
- string: /Wireshark(\.exe)?/i
- string: /dumpcap(\.exe)?/i
- string: /HookExplorer(\.exe)?/i
- string: /ImportREC(\.exe)?/i
- string: /PETools(\.exe)?/i
- string: /LordPE(\.exe)?/i
- string: /SysInspector(\.exe)?/i
- string: /proc_analyzer(\.exe)?/i
- string: /sysAnalyzer(\.exe)?/i
- string: /sniff_hit(\.exe)?/i
- string: /windbg(\.exe)?/i
- string: /joeboxcontrol(\.exe)?/i
- string: /joeboxserver(\.exe)?/i
- string: /ResourceHacker(\.exe)?/i
- string: /x32dbg(\.exe)?/i
- string: /x64dbg(\.exe)?/i
- string: /Fiddler(\.exe)?/i
- string: /httpdebugger(\.exe)?/i
- string: /fakenet(\.exe)?/i
- string: /netmon(\.exe)?/i
- string: /WPE PRO(\.exe)?/i
- string: /decompile(\.exe)?/i
- string: /scylla/i
- string: /megadumper/i
- string: /apdagent(\.exe)?/i
- string: /apimonitor(\.exe)?/i
- string: /azurearcsystray(\.exe)?/i
- string: /binaryninja(\.exe)?/i
- string: /burpsuite(\.exe)?/i
- string: /charles\.exe/i
- string: /cutter(\.exe)?/i
- string: /dbgx\.shell(\.exe)?/i
- string: /df5serv(\.exe)?/i
- string: /frida(\.exe)?/i
- string: /httpanalyzerv7(\.exe)?/i
- string: /httpdebuggerui(\.exe)?/i
- string: /netcat(\.exe)?/i
- string: /pin\.exe/i
- string: /prl_tools(\.exe)?/i
- string: /qemu-ga(\.exe)?/i
- string: /rammap(\.exe)?/i
- string: /rammap64(\.exe)?/i
- string: /rdpclip(\.exe)?/i
- string: /tasklist/i
- string: /cred-store(\.exe)?/i
- string: /decoder\.exe/i
- string: /dnspy(\.exe)?/i
- string: /drrun(\.exe)?/i
- string: /dumpit(\.exe)?/i
- string: /frida-inject(\.exe)?/i
- string: /frida-server(\.exe)?/i
- string: /gdb\.exe/i
- string: /httpdebuggersvc(\.exe)?/i
- string: /ilspy(\.exe)?/i
- string: /inetsim(\.exe)?/i
- string: /ksdumper(\.exe)?/i
- string: /ksdumperclient(\.exe)?/i
- string: /mitmdump(\.exe)?/i
- string: /pestudio(\.exe)?/i
- string: /private-cloud-proxy(\.exe)?/i
- string: /process\.exe/i
- string: /r2\.exe/i
- string: /rekall(\.exe)?/i
- string: /tcpdump(\.exe)?/i
- string: /windasm(\.exe)?/i
- string: /x32dbgn(\.exe)?/i
- string: /\bollydbg(\.exe)?\b/i
- string: /\bProcessHacker(\.exe)?\b/i
- string: /\btcpview(\.exe)?\b/i
- string: /\bautoruns(\.exe)?\b/i
- string: /\bautorunsc(\.exe)?\b/i
- string: /\bfilemon(\.exe)?\b/i
- string: /\bprocmon(\.exe)?\b/i
- string: /\bregmon(\.exe)?\b/i
- string: /\bprocexp(\.exe)?\b/i
- string: /\bida[gqtuw]?(\.exe)?\b/i
- string: /\bida[gqtuw]?64(\.exe)?\b/i
- string: /\bImmunityDebugger(\.exe)?\b/i
- string: /\bWireshark(\.exe)?\b/i
- string: /\bdumpcap(\.exe)?\b/i
- string: /\bHookExplorer(\.exe)?\b/i
- string: /\bImportREC(\.exe)?\b/i
- string: /\bPETools(\.exe)?\b/i
- string: /\bLordPE(\.exe)?\b/i
- string: /\bSysInspector(\.exe)?\b/i
- string: /\bproc_analyzer(\.exe)?\b/i
- string: /\bsysAnalyzer(\.exe)?\b/i
- string: /\bsniff_hit(\.exe)?\b/i
- string: /\bwindbg(\.exe)?\b/i
- string: /\bjoeboxcontrol(\.exe)?\b/i
- string: /\bjoeboxserver(\.exe)?\b/i
- string: /\bResourceHacker(\.exe)?\b/i
- string: /\bx32dbg(\.exe)?\b/i
- string: /\bx64dbg(\.exe)?\b/i
- string: /\bFiddler(\.exe)?\b/i
- string: /\bhttpdebugger(\.exe)?\b/i
- string: /\bfakenet(\.exe)?\b/i
- string: /\bnetmon(\.exe)?\b/i
- string: /\bWPE PRO(\.exe)?\b/i
- string: /\bdecompile(\.exe)?\b/i
- string: /\bscylla\b/i
- string: /\bmegadumper\b/i
- string: /\bapdagent(\.exe)?\b/i
- string: /\bapimonitor(\.exe)?\b/i
- string: /\bazurearcsystray(\.exe)?\b/i
- string: /\bbinaryninja(\.exe)?\b/i
- string: /\bburpsuite(\.exe)?\b/i
- string: /\bcharles\.exe\b/i
- string: /\bcutter(\.exe)?\b/i
- string: /\bdbgx\.shell(\.exe)?\b/i
- string: /\bdf5serv(\.exe)?\b/i
- string: /\bfrida(\.exe)?\b/i
- string: /\bhttpanalyzerv7(\.exe)?\b/i
- string: /\bhttpdebuggerui(\.exe)?\b/i
- string: /\bnetcat(\.exe)?\b/i
- string: /\bpin\.exe\b/i
- string: /\bprl_tools(\.exe)?\b/i
- string: /\bqemu-ga(\.exe)?\b/i
- string: /\brammap(\.exe)?\b/i
- string: /\brammap64(\.exe)?\b/i
- string: /\brdpclip(\.exe)?\b/i
- string: /\btasklist\b/i
- string: /\bcred-store(\.exe)?\b/i
- string: /\bdecoder\.exe\b/i
- string: /\bdnspy(\.exe)?\b/i
- string: /\bdrrun(\.exe)?\b/i
- string: /\bdumpit(\.exe)?\b/i
- string: /\bfrida-inject(\.exe)?\b/i
- string: /\bfrida-server(\.exe)?\b/i
- string: /\bgdb\.exe\b/i
- string: /\bhttpdebuggersvc(\.exe)?\b/i
- string: /\bilspy(\.exe)?\b/i
- string: /\binetsim(\.exe)?\b/i
- string: /\bksdumper(\.exe)?\b/i
- string: /\bksdumperclient(\.exe)?\b/i
- string: /\bmitmdump(\.exe)?\b/i
- string: /\bpestudio(\.exe)?\b/i
- string: /\bprivate-cloud-proxy(\.exe)?\b/i
- string: /\bprocess\.exe\b/i
- string: /\br2\.exe\b/i
- string: /\brekall(\.exe)?\b/i
- string: /\btcpdump(\.exe)?\b/i
- string: /\bwindasm(\.exe)?\b/i
- string: /\bx32dbgn(\.exe)?\b/i
Collaborator

with this change, the patterns are much harder to read as a human, which is unfortunate.

this isn't a comment on the quality of the PR, just about what we're considering.

it seems many of these regexes are regexes versus substrings so that we can match the optional extension in a single pass. we could replace these with something like:

- substring: foo
- substring: foo.exe

this would be much easier to read (the \b would happen behind the scenes), though presumably runtime would be longer, because there are twice as many regexes to run.

with a little bit of benchmarking we could show this is or is not acceptable. thoughts @mr-tz @mike-hunhoff @Shaktisinhchavda ?
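
As a rough illustration of the two forms being compared, the sketch below uses Python's `re` module. The helper `substring_to_pattern` is hypothetical: it assumes, as suggested above, that a `substring` entry would be compiled with `\b` on both sides behind the scenes. It is not capa's actual implementation.

```python
import re


def substring_to_pattern(substring: str) -> re.Pattern:
    """Hypothetical helper: wrap a literal substring in word boundaries."""
    return re.compile(r"\b" + re.escape(substring) + r"\b", re.IGNORECASE)


# Regex form from the diff: one pattern handles the optional extension.
regex_form = re.compile(r"\bprocmon(\.exe)?\b", re.IGNORECASE)

# Proposed form: two readable entries, each compiled behind the scenes.
substring_forms = [substring_to_pattern(s) for s in ("procmon", "procmon.exe")]


def matches_regex(text: str) -> bool:
    return regex_form.search(text) is not None


def matches_substrings(text: str) -> bool:
    return any(p.search(text) for p in substring_forms)


for text in (r"C:\tools\Procmon.exe", "procmon", "procmonitor"):
    print(text, matches_regex(text), matches_substrings(text))
```

Under these assumptions both forms accept and reject the same inputs; the difference is that the second form evaluates two patterns per string instead of one.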

Author

Agreed, the regexes are hard to audit. I propose refactoring them into simple substring pairs for the tool names and extensions. This prioritizes clarity while maintaining the same logic. I’m happy to update the PR this way if you agree.

Collaborator

yes, this is what i proposed above. please suggest a plan that includes benchmarking capa before/after to show that the runtime doesn't change substantially when introducing additional substring features.

Collaborator

in the scenario that runtime is worse, we can imagine an optimization to capa that translates an or(substring, substring, substring) into a single regex that matches in a single pass. so i don't think this is insurmountable. but we should still check the data before committing to the changes.
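
A minimal sketch of what such a translation could look like, written against Python's `re` module. The function name `merge_substrings` is an assumption made for illustration and is not part of capa.

```python
import re


def merge_substrings(substrings, ignore_case=True):
    """Fold several literal substrings into one alternation regex.

    Hypothetical helper, not capa code. Longer literals are placed first
    so that "windbg.exe" is preferred over "windbg" at the same position.
    """
    ordered = sorted(set(substrings), key=len, reverse=True)
    alternation = "|".join(re.escape(s) for s in ordered)
    flags = re.IGNORECASE if ignore_case else 0
    return re.compile(f"(?:{alternation})", flags)


merged = merge_substrings(["ollydbg", "ollydbg.exe", "windbg", "windbg.exe"])

hit = merged.search("launching WinDbg.exe now")
print(hit.group(0) if hit else None)
```

Each candidate string is then scanned once, regardless of how many substrings sit under the `or`. Whether this recovers the lost runtime would still need to be measured.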

Author

Agreed that data should drive the decision. Here is my plan for benchmarking the performance impact:

- Test set: run capa against the complete capa-testfiles repository.
- Evaluation: compare average wall-clock runtime across three passes for the current regex rules versus the proposed substring refactor.
- Verification: confirm that match results and hit counts remain identical for all samples.

I’ll share the results here once complete. I also agree that if we do see a performance hit, an engine-level optimization to merge OR-ed substrings would be an excellent long-term solution.
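
The timing part of this plan could be sketched as below. This is an illustrative harness, not the script that was actually run. `average_runtime` and `run_capa` are hypothetical names, and `run_capa` assumes a `capa` executable on `PATH` invoked with its `-r` rules option.

```python
import statistics
import subprocess
import time


def average_runtime(run_once, passes=3):
    """Call run_once() `passes` times and return the mean wall-clock seconds."""
    timings = []
    for _ in range(passes):
        start = time.perf_counter()
        run_once()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


def run_capa(sample_path, rules_dir):
    """Assumed invocation: analyze one sample with a given rules directory."""
    subprocess.run(
        ["capa", "-r", rules_dir, sample_path],
        check=True,
        capture_output=True,
    )


# Example usage (paths are placeholders):
# before = average_runtime(lambda: run_capa("sample.exe_", "rules-regex"))
# after = average_runtime(lambda: run_capa("sample.exe_", "rules-substring"))
```

Keeping the timing loop separate from the capa invocation lets the same harness be reused for both rule sets.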

Author

I've completed the benchmarking using a 3-pass average against capa-testfiles. Results confirmed identical hits but showed a ~55% increase in average runtime (from 1.04s up to 1.62s per file).

While the delta is ~0.6s, I believe the significant gain in readability and auditability justifies the overhead. We could potentially recover this performance later via the engine-level optimization mentioned. Is this trade-off acceptable to the team? @williballenthin
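
For reference, the arithmetic behind the quoted figures can be checked in a few lines. The two averages are the ones reported above; nothing else is assumed.

```python
before_s = 1.04  # average seconds per file with the regex rules
after_s = 1.62   # average seconds per file with the substring rules

delta_s = after_s - before_s
percent_increase = delta_s / before_s * 100

print(f"{delta_s:.2f}s slower per file, +{percent_increase:.1f}%")
# prints "0.58s slower per file, +55.8%"
```

This is consistent with the rounded "~0.6s" and "~55%" figures quoted in the comment.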

@@ -23,6 +23,6 @@ rule:
- optional:
- match: host-interaction/process/create
- or:
- string: /vaultcmd(\.exe)?/
- string: /\bvaultcmd(\.exe)?\b/
- substring: "/listcreds:"
- substring: "\"Windows Credentials\""
@@ -13,7 +13,7 @@ rule:
- 7204e3efc2434012e13ca939db0d0b02:0x403028
features:
- and:
- string: /ipconfig(\.exe)?/i
- string: /\bipconfig(\.exe)?\b/i
- api: msvcr100.system
- optional:
- and:
4 changes: 2 additions & 2 deletions load-code/powershell/run-powershell-expression.yml
@@ -25,8 +25,8 @@ rule:

- and:
- or:
- string: /powershell(\.exe)?/i
- string: /pwsh(\.exe)?/i
- string: /\bpowershell(\.exe)?\b/i
- string: /\bpwsh(\.exe)?\b/i
- or:
- string: /\b-(e|en|enc|enco|encod|encodedcommand)\b/i
- string: /\biex\b/i
2 changes: 1 addition & 1 deletion nursery/delete-windows-backup-catalog.yml
@@ -12,4 +12,4 @@ rule:
features:
- and:
- os: windows
- string: /wbadmin(\.exe)?\s+delete\s+catalog/i
- string: /\bwbadmin(\.exe)?\s+delete\s+catalog/i
4 changes: 2 additions & 2 deletions nursery/disable-automatic-windows-recovery-features.yml
@@ -13,7 +13,7 @@ rule:
- and:
- os: windows
- or:
- string: /bcdedit(\.exe)?\s+/set\s+{default}\s+bootstatuspolicy\s+ignoreallfailures/i
- string: /\bbcdedit(\.exe)?\s+/set\s+{default}\s+bootstatuspolicy\s+ignoreallfailures/i
description: ignore errors and boot normally even if there is a failed boot, shutdown, or checkpoint
- string: /bcdedit(\.exe)?\s+/set\s+{default}\s+recoveryenabled\s+no/i
- string: /\bbcdedit(\.exe)?\s+/set\s+{default}\s+recoveryenabled\s+no/i
description: disable automatic repair
2 changes: 1 addition & 1 deletion nursery/enumerate-device-drivers-on-windows.yml
@@ -14,7 +14,7 @@ rule:
features:
- or:
- api: EnumDeviceDrivers
- string: /driverquery(\.exe)?/i
- string: /\bdriverquery(\.exe)?\b/i
Collaborator

how did you decide when to have a trailing \b and when not to?

Author

The \b was intended to prevent partial matches like driverqueryhost. I agree it’s messy, so I propose switching to dual substring entries instead (e.g., driverquery and driverquery.exe). It provides the same protection with much better readability. What do you think? @williballenthin

Collaborator

why do you have one \b for some strings and two \b in other strings?

Author

The inconsistency was an oversight during the update. Switching to the proposed dual substring approach will fix this entirely and ensure consistent whole-word matching across all rules.
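
To make the effect of the leading and trailing `\b` concrete, here is a small check using Python's `re` module. The sample strings are made up for illustration. The last pattern mirrors the `wbadmin` rule from the diff, where the `\s+` that follows the name already forbids a trailing word character.

```python
import re

both = re.compile(r"\bdriverquery(\.exe)?\b", re.IGNORECASE)
leading_only = re.compile(r"\bdriverquery(\.exe)?", re.IGNORECASE)
followed_by_space = re.compile(r"\bwbadmin(\.exe)?\s+delete\s+catalog", re.IGNORECASE)

samples = ["driverquery.exe /v", "driverqueryhost", "mydriverquery"]

# Map each sample to (matches with both \b, matches with leading \b only).
results = {
    s: (both.search(s) is not None, leading_only.search(s) is not None)
    for s in samples
}

for sample, (with_both, with_leading) in results.items():
    print(f"{sample!r}: both={with_both} leading_only={with_leading}")

# The \s+ after the tool name rejects "wbadminx" without a trailing \b.
print(followed_by_space.search("wbadminx delete catalog") is None)
```

In this sketch only the pattern without a trailing `\b` accepts `driverqueryhost`, and neither accepts `mydriverquery`.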

- and:
- or:
- match: query or enumerate registry key