Extend preprint-publication dedup from FLoRA to FReD #106

@LukasWallrich

Context

FLoRA now has preprint-publication deduplication (added for #105) that:

  • Detects when the same paper appears as both preprint and published version (different DOIs)
  • Resolves confirmed duplicates, keeping the published version and storing the alternative DOI in doi_o_alt / doi_r_alt columns
  • Handles both replication-side (same doi_o, different doi_r) and original-side (different doi_o for the same paper) duplicates
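The resolution step described above can be sketched roughly as follows. This is an illustrative Python sketch, not the code in R/preprint_dedup.R: the function name `resolve_preprint_duplicates`, the dict-based row layout, and the `(preprint_doi, published_doi)` pair format are assumptions made for the example. It targets a paper-level table, where one row per (doi_o, doi_r) pair is expected.

```python
def resolve_preprint_duplicates(rows, confirmed_pairs):
    """Collapse preprint/published duplicates in a paper-level table.

    rows: list of dicts with 'doi_o' and 'doi_r' keys (hypothetical layout).
    confirmed_pairs: iterable of (preprint_doi, published_doi) tuples.
    Returns new rows keyed on the published DOIs, with the preprint DOI
    stored in doi_o_alt / doi_r_alt.
    """
    # Map each preprint DOI to its published counterpart (case-insensitive).
    to_published = {pre.lower(): pub.lower() for pre, pub in confirmed_pairs}

    resolved = {}  # (doi_o, doi_r) -> kept row, in first-seen order
    for row in rows:
        out = dict(row)
        for col in ("doi_o", "doi_r"):
            alt_col = col + "_alt"
            out.setdefault(alt_col, None)
            doi = (out.get(col) or "").lower()
            if doi in to_published:
                # Keep the published version, remember the preprint DOI.
                out[col] = to_published[doi]
                out[alt_col] = doi
            else:
                out[col] = doi or None

        key = (out["doi_o"], out["doi_r"])
        if key in resolved:
            # Same paper pair already kept: only carry over missing alt DOIs.
            kept = resolved[key]
            for alt_col in ("doi_o_alt", "doi_r_alt"):
                kept[alt_col] = kept[alt_col] or out[alt_col]
        else:
            resolved[key] = out
    return list(resolved.values())


# Replication-side duplicate: same doi_o, preprint and published doi_r.
rows = [
    {"doi_o": "10.1/orig", "doi_r": "10.2/published"},
    {"doi_o": "10.1/orig", "doi_r": "10.31234/preprint"},
]
pairs = [("10.31234/preprint", "10.2/published")]
deduped = resolve_preprint_duplicates(rows, pairs)
```

Because both sides are normalised to the published DOI before rows are compared, the same pass covers replication-side and original-side duplicates. Collapsing on (doi_o, doi_r) alone would be too aggressive for an effect-level table, where several rows per paper pair are legitimate.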

What needs to happen

  1. Extend dedup to FReD: The same preprint-publication detection logic (R/preprint_dedup.R) should be applied to the FReD effect-level dataset, not just the paper-level FLoRA dataset.

  2. Add alternative DOI columns even without duplicates: Where FReD references a DOI that has a known preprint/published counterpart (from the FLoRA confirmed duplicates or CrossRef metadata), doi_o_alt and doi_r_alt should be populated even if FReD only contains one version. This ensures users can look up papers by either DOI.
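One way to picture item 2 is a bidirectional lookup built from the confirmed pairs, applied to every row without dropping any. The sketch below is a hedged Python illustration, not the planned R implementation: `add_alt_dois`, the dict-based rows, and the `effect_id` field are hypothetical, and a CrossRef-derived source would need its own step to produce the pairs.

```python
def add_alt_dois(rows, confirmed_pairs):
    """Fill doi_o_alt / doi_r_alt from known preprint/published pairs.

    rows: list of dicts with 'doi_o' and 'doi_r' keys (hypothetical layout).
    confirmed_pairs: iterable of (preprint_doi, published_doi) tuples.
    No rows are removed, so effect-level data keeps one row per effect.
    """
    # Build a lookup that works from either version of the paper.
    counterpart = {}
    for pre, pub in confirmed_pairs:
        pre, pub = pre.lower(), pub.lower()
        counterpart[pre] = pub
        counterpart[pub] = pre

    out_rows = []
    for row in rows:
        out = dict(row)
        for col in ("doi_o", "doi_r"):
            alt_col = col + "_alt"
            doi = (out.get(col) or "").lower()
            # Only fill the alt column when it is not already set.
            if not out.get(alt_col):
                out[alt_col] = counterpart.get(doi)
        out_rows.append(out)
    return out_rows


# Two effects from the same published replication; the dataset holds no preprint row.
effects = [
    {"effect_id": 1, "doi_o": "10.1/orig", "doi_r": "10.2/published"},
    {"effect_id": 2, "doi_o": "10.1/orig", "doi_r": "10.2/published"},
]
pairs = [("10.31234/preprint", "10.2/published")]
with_alts = add_alt_dois(effects, pairs)
```

Both effect rows survive and each gains the preprint DOI in doi_r_alt, so a lookup by either DOI finds them. Unlike the duplicate-resolution step, this sketch never rewrites doi_o or doi_r; whether rows that cite only the preprint should be switched to the published DOI is left open here.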

References

  • Preprint dedup logic: R/preprint_dedup.R
  • FLoRA pipeline integration: Step 7c in pipelines/flora/prepare_flora.qmd
  • Confirmed duplicates: cache/confirmed_preprint_duplicates.csv
  • Original issue: Deduplicate preprints and publications #105
