Skip to content

docs: expand docstrings in amino_acid_distance, util, and anopheles#1135

Open
Komal2022 wants to merge 2 commits intomalariagen:masterfrom
Komal2022:docs/improve-docstrings
Open

docs: expand docstrings in amino_acid_distance, util, and anopheles#1135
Komal2022 wants to merge 2 commits intomalariagen:masterfrom
Komal2022:docs/improve-docstrings

Conversation

@Komal2022
Copy link

Summary

This PR expands and improves docstrings across three files to make the
codebase more accessible and easier to understand for new contributors
and users.

Changes

malariagen_data/amino_acid_distance.py

  • Added module-level docstring describing the purpose of the file
  • Expanded docstring for grantham_score() with Parameters and Returns sections
  • Expanded docstring for sneath_dist() with Parameters and Returns sections

malariagen_data/util.py

  • Expanded docstring for _jitter() with Parameters and Returns sections
  • Expanded docstring for _hash_params() with Parameters and Returns sections
  • Expanded docstring for _haplotype_frequencies() with Parameters and Returns sections

malariagen_data/anopheles.py

  • Added full docstring to _block_jackknife_cohort_diversity_stats()
    describing its parameters, and return value

Motivation

Several internal functions lacked sufficient documentation, making it
harder for new contributors to understand the codebase. These changes
follow the existing numpydoc style used throughout the project.

@jonbrenas
Copy link
Collaborator

Hi @Komal2022, could you adjust your PR so that the docstrings follow the same format that is used everywhere else in the codebase?

@Komal2022
Copy link
Author

Hi @Komal2022, could you adjust your PR so that the docstrings follow the same format that is used everywhere else in the codebase?

Hi @jonbrenas, thanks for the feedback! I want to make sure I get this right, looking at the codebase I can see different styles used (e.g. brief one-liners for some private functions, full numpydoc for others like _jackknife_ci). Could you point me to a specific function whose docstring format you'd like me to follow? I want to be consistent with your preference.

@jonbrenas
Copy link
Collaborator

Hi @Komal2022,

my apologies, it was indeed not a very clear explanation of what I meant. Only public functions need to have a docstring and it should look like:

    @doc(
        summary="""
            Group samples by taxon, area (space) and period (time), then compute
            SNP allele frequencies.
        """,
        returns="""
            The resulting dataset contains data has dimensions "cohorts" and
            "variants". Variables prefixed with "cohort" are 1-dimensional
            arrays with data about the cohorts, such as the area, period, taxon
            and cohort size. Variables prefixed with "variant" are
            1-dimensional arrays with data about the variants, such as the
            contig, position, reference and alternate alleles. Variables
            prefixed with "event" are 2-dimensional arrays with the allele
            counts and frequency calculations.
        """,
    )

There is more freedom for private functions (but the again, users should never see any of that) and comments in the code are at least as important as a description of the input and output. I

think, the relationship between the input and output in some functions is self-evident for some of the functions (e.g., _jitter).

For _block_jackknife_cohort_diversity_stats, the problem is the different because the presence of '*' in the parameters means that you can't effectively describe all the parameters.

@Komal2022
Copy link
Author

Komal2022 commented Mar 17, 2026

@jonbrenas Done! I've reverted the private function docstrings and kept only the amino_acid_distance.py public function documentation. Thanks for the clear explanation, really helpful for understanding the codebase conventions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants