feat: exclude opted-out repos from the full pipeline (analysis + issue creation) with wildcard support#43

Closed
Copilot wants to merge 4 commits into main from copilot/handle-blacklist-repositories

Conversation

Copilot AI commented Mar 6, 2026

Opted-out repositories are now excluded at both the metacheck analysis stage and the issue-creation stage, so no compute is wasted analyzing repos that will be skipped anyway.

Changes

  • metacheck_wrapper.py: Added --opt-outs option (defaults to .opt-outs in the current directory). When the file exists and the input is a JSON file, a _filter_opt_out_repos() helper strips opted-out repos into a temp file before passing it to metacheck (temp file cleaned up after the run). If the default .opt-outs file is absent, filtering is silently skipped. Opt-outs entries support * as a wildcard (e.g. https://github.com/SoftwareUnderstanding/* excludes all repos in that org); URL characters such as . are always treated as literals so patterns are safe to write without escaping. Invalid patterns emit a warning on stderr.

  • pipeline.py: Passes opt_outs_file as --opt-outs to the metacheck step, so a single file controls both analysis and issue-creation filtering.

  • tests/test_metacheck_wrapper.py (new): Covers exact URL removal, trailing-slash normalization, org-level wildcard patterns, wildcard suffix patterns, literal-dot behaviour, extra JSON key preservation, empty opt-outs list, invalid format error, and missing default file.

  • tests/test_pipeline.py: Updated expected metacheck args to include --opt-outs.
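
The filtering step described for metacheck_wrapper.py could look roughly like the sketch below. It is an illustration, not the code from this PR: the function name filter_repos_to_temp_file, the is_excluded callback, and the assumption that the input JSON stores its URLs under a "repositories" key are all hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def filter_repos_to_temp_file(
    input_path: Path, is_excluded: Callable[[str], bool]
) -> Path:
    """Write a copy of the input JSON without excluded repos; return its path.

    Assumes (hypothetically) that repository URLs live under "repositories".
    All other top-level keys are copied through unchanged.
    """
    data = json.loads(input_path.read_text(encoding="utf-8"))
    kept = [url for url in data.get("repositories", []) if not is_excluded(url)]
    filtered = {**data, "repositories": kept}

    # delete=False so the file survives the `with` block; the caller removes it.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".json", delete=False, encoding="utf-8"
    ) as tmp:
        json.dump(filtered, tmp)
    return Path(tmp.name)
```

A caller would wrap the run in try/finally and call unlink() on the returned path, which mirrors the "temp file cleaned up after the run" behaviour described above.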

Example — standalone metacheck usage

sw-metadata-bot metacheck \
  --input assets/opt-ins.json \
  --opt-outs assets/opt-outs.json \
  --pitfalls-output pitfalls_outputs/

Example — .opt-outs file with wildcard patterns

{
  "repositories": [
    "https://github.com/SoftwareUnderstanding/*",
    "https://github.com/some-org/specific-repo"
  ]
}

Place a .opt-outs file in the working directory and it will be picked up automatically. The run-pipeline command passes the same file through to both stages automatically.
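
One possible way to read such a file while treating a missing default as "nothing to exclude" is sketched below. The function name load_opt_out_patterns and the choice of ValueError for a malformed file are assumptions for illustration; the PR only states that a missing default file is skipped silently and that an invalid format is an error.

```python
import json
from pathlib import Path


def load_opt_out_patterns(path: Path) -> list[str]:
    """Return the patterns listed in an opt-outs file.

    A missing file yields an empty list, so filtering becomes a no-op.
    A file without a list under "repositories" raises ValueError
    (the exception type is an assumption, not taken from the PR).
    """
    if not path.exists():
        return []

    data = json.loads(path.read_text(encoding="utf-8"))
    repos = data.get("repositories") if isinstance(data, dict) else None
    if not isinstance(repos, list):
        raise ValueError(f"{path}: expected a 'repositories' list")
    return [str(entry) for entry in repos]
```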

Original prompt

This section details the original issue you should resolve

<issue_title>handling white list / black list</issue_title>
<issue_description>the bot should handle a black list of repos that requested to unsubscribe or where we don't want to run the bot

this is a simple list of repos in a file that is checked when running the bot</issue_description>

<agent_instructions>open a PR to implement this new feature</agent_instructions>

Comments on the Issue (you are @copilot in this section)


Co-authored-by: vuillaut <4263646+vuillaut@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add handling for black list of repositories" to "feat: apply blacklist to metacheck analysis stage, not just issue creation" Mar 6, 2026
Collaborator

vuillaut commented Mar 6, 2026

@copilot

  • rename --blacklist-file to --blacklist
  • --blacklist should be set to True by default with a .blacklist file in the root repo as default. if no file is found, then consider blacklist is empty. user can specify a different file
  • blacklist should handle regex so that we can specify all repos in an organisation with a single line, e.g. https://github.com/SoftwareUnderstanding/* would skip all repos in this organisation

…add wildcard support

Co-authored-by: vuillaut <4263646+vuillaut@users.noreply.github.com>
Author

Copilot AI commented Mar 6, 2026

@copilot

  • rename --blacklist-file to --blacklist
  • --blacklist should be set to True by default with a .blacklist file in the root repo as default. if no file is found, then cons...

Done in f4c4809:

  • --blacklist-file → --blacklist: option renamed on metacheck_command and in pipeline.py.
  • Default .blacklist file: the option defaults to .blacklist in the current directory. If the file doesn't exist (the common case when no blacklist is configured), filtering is silently skipped — no error.
  • Wildcard/pattern support: each entry in the repositories list is treated as a pattern where * acts as a wildcard. The pattern is first passed through re.escape() so that URL characters like . are always treated as literals, then the escaped \* is replaced with .* to restore wildcard behaviour. This means https://github.com/SoftwareUnderstanding/* correctly excludes all repos in that org, without . in github.com accidentally matching other characters. Invalid patterns emit a warning on stderr rather than silently failing.
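
The escape-then-restore approach described in the last bullet can be sketched as follows. The helper names wildcard_to_regex and matches_any are hypothetical, and stripping a trailing slash plus requiring a full match are assumptions about how the real helper normalizes URLs.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Compile an opt-out entry where only `*` is special."""
    # re.escape makes every character literal, including `.` and `*`.
    escaped = re.escape(pattern.rstrip("/"))
    # Turn the escaped `\*` back into a "match anything" wildcard.
    return re.compile(escaped.replace(r"\*", ".*"))


def matches_any(url: str, patterns: list[str]) -> bool:
    """True if the URL (trailing slash ignored) fully matches any pattern."""
    normalized = url.rstrip("/")
    return any(wildcard_to_regex(p).fullmatch(normalized) for p in patterns)


patterns = ["https://github.com/SoftwareUnderstanding/*"]
print(matches_any("https://github.com/SoftwareUnderstanding/somef", patterns))
print(matches_any("https://githubXcom/SoftwareUnderstanding/somef", patterns))
```

The first call prints True and the second prints False, because the dot in github.com stays a literal dot rather than matching any character.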

Copilot AI changed the title from "feat: apply blacklist to metacheck analysis stage, not just issue creation" to "feat: blacklist repos from the full pipeline (analysis + issue creation) with wildcard support" Mar 6, 2026
Collaborator

vuillaut commented Mar 6, 2026

@copilot rename "blacklist" into "opt-outs"

…nd tests

Co-authored-by: vuillaut <4263646+vuillaut@users.noreply.github.com>
Author

Copilot AI commented Mar 6, 2026

@copilot rename "blacklist" into "opt-outs"

Done in 7639cd6. All "blacklist" terminology has been replaced with "opt-outs":

  • --blacklist → --opt-outs (CLI option)
  • Default file .blacklist → .opt-outs
  • _is_blacklisted() → _is_opted_out()
  • _filter_blacklisted_repos() → _filter_opt_out_repos()
  • Echo/warning messages updated accordingly
  • Tests and pipeline args updated to match

Copilot AI changed the title from "feat: blacklist repos from the full pipeline (analysis + issue creation) with wildcard support" to "feat: exclude opted-out repos from the full pipeline (analysis + issue creation) with wildcard support" Mar 6, 2026
@vuillaut vuillaut marked this pull request as ready for review March 6, 2026 11:03
Collaborator

vuillaut commented Mar 6, 2026

Already implemented :D

@vuillaut vuillaut closed this Mar 6, 2026

Development

Successfully merging this pull request may close these issues.

handling white list / black list