Batch ProtParam

Batch calculation of protein physicochemical properties from FASTA files using Biopython.

This script reproduces many of the metrics provided by the ExPASy ProtParam web tool but runs locally, so hundreds to hundreds of thousands of sequences can be analyzed in one pass without a web browser.

Inspired by the ExPASy ProtParam tool:
https://web.expasy.org/protparam/

The output is written as CSV files that open directly in Excel, R, or Python.

Features

For each protein sequence the script calculates:

Amino acid counts
Amino acid percentages
Molecular weight
Aromaticity
Theoretical isoelectric point (pI)
Secondary structure fraction
- helix
- turn
- sheet
GRAVY (hydrophobicity score)
Instability index
Flexibility statistics
- mean
- minimum
- maximum
- standard deviation
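
As a rough sketch of how these values can be obtained for a single sequence, the snippet below uses Biopython's `ProteinAnalysis` class from `Bio.SeqUtils.ProtParam`. The helper name `describe_protein` is hypothetical, the dictionary keys simply mirror the column names in "Example Output Columns", and the choice of sample standard deviation is an assumption; batchProtParam.py may be organised differently.

```python
import statistics

from Bio.SeqUtils.ProtParam import ProteinAnalysis


def describe_protein(sequence: str) -> dict:
    """Compute ProtParam-style metrics for one protein sequence.

    Hypothetical helper; assumes only the 20 standard amino acids are present.
    """
    analysis = ProteinAnalysis(sequence)
    length = len(sequence)
    row = {"length_aa": length}

    # Per-residue counts, with percentages derived from the counts.
    for residue, count in sorted(analysis.count_amino_acids().items()):
        row[f"count_{residue}"] = count
        row[f"pct_{residue}"] = 100.0 * count / length

    helix, turn, sheet = analysis.secondary_structure_fraction()
    row.update(
        molecular_weight=analysis.molecular_weight(),
        aromaticity=analysis.aromaticity(),
        theoretical_pi=analysis.isoelectric_point(),
        ss_helix=helix,
        ss_turn=turn,
        ss_sheet=sheet,
        gravy=analysis.gravy(),
        instability_index=analysis.instability_index(),
    )

    # flexibility() scores sliding 9-residue windows, so very short
    # sequences return an empty list and get no flexibility statistics.
    flex = analysis.flexibility()
    if flex:
        row.update(
            flex_mean=statistics.mean(flex),
            flex_min=min(flex),
            flex_max=max(flex),
            # Sample standard deviation (assumption, see lead-in).
            flex_stdev=statistics.stdev(flex) if len(flex) > 1 else 0.0,
        )
    return row


row = describe_protein("ACDEFGHIKLMNPQRSTVWY")
print(row["length_aa"], row["count_A"], round(row["aromaticity"], 2))
```

The test sequence contains each standard residue once, which makes the counts and percentages easy to check by eye.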

Additional metadata columns include:

source FASTA file
warnings (e.g., dropped residues)
error handling status

Installation

Requires Python 3.8+

Install dependency:

pip install biopython

Or install from requirements:

pip install -r requirements.txt

Example requirements.txt:

biopython

Example Folder Structure

project_folder/
│
├── batchProtParam.py
├── fastas/
│   ├── proteins1.fasta
│   └── proteins2.fasta

Basic Usage

Run from the project directory:

python batchProtParam.py --in_dir ./fastas --out_dir ./results

This produces:

results/
    proteins1.protparam.csv
    proteins2.protparam.csv
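
The mapping from input to output names shown above can be sketched with `pathlib`. This is inferred only from the example listing (`proteins1.fasta` becomes `proteins1.protparam.csv`); the function name `output_path_for` is hypothetical, and how the script treats other extensions such as `.fa` or `.faa` is not covered here.

```python
from pathlib import Path


def output_path_for(fasta_path: Path, out_dir: Path) -> Path:
    """Return the per-FASTA CSV path: <stem>.protparam.csv inside out_dir."""
    # Path.stem drops only the final extension: "proteins1.fasta" -> "proteins1".
    return out_dir / f"{fasta_path.stem}.protparam.csv"


out_dir = Path("results")
for fasta in [Path("fastas/proteins1.fasta"), Path("fastas/proteins2.fasta")]:
    print(output_path_for(fasta, out_dir))
```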

Output Modes

One CSV per FASTA (default)

python batchProtParam.py \
  --in_dir ./fastas \
  --out_dir ./results \
  --output_mode per_fasta

One combined CSV for all FASTAs

python batchProtParam.py \
  --in_dir ./fastas \
  --out_dir ./results \
  --output_mode all_fastas

Optional custom filename:

python batchProtParam.py \
  --in_dir ./fastas \
  --out_dir ./results \
  --output_mode all_fastas \
  --all_fastas_name my_results.csv
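
For readers wiring the script into a pipeline, the flags used so far could be declared with `argparse` roughly as follows. This is a sketch, not the script's actual parser: marking `--in_dir` and `--out_dir` as required and the placeholder default for `--all_fastas_name` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser covering the flags shown in this README."""
    parser = argparse.ArgumentParser(description="Batch ProtParam-style calculations")
    # Assumed to be required; the real script may supply defaults instead.
    parser.add_argument("--in_dir", required=True, help="directory containing FASTA files")
    parser.add_argument("--out_dir", required=True, help="directory for CSV output")
    parser.add_argument(
        "--output_mode",
        choices=["per_fasta", "all_fastas"],
        default="per_fasta",  # documented default: one CSV per FASTA
    )
    # Placeholder default; the script's real default filename is not documented here.
    parser.add_argument("--all_fastas_name", default="all_fastas.protparam.csv")
    return parser


args = build_parser().parse_args(
    ["--in_dir", "./fastas", "--out_dir", "./results", "--output_mode", "all_fastas"]
)
print(args.output_mode, args.all_fastas_name)
```

Restricting `--output_mode` with `choices` makes a mistyped mode fail immediately instead of after the FASTA files have been read.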

Handling Ambiguous Amino Acids

Sequences sometimes contain non-standard residues.

Residue  Meaning
X        unknown
B        D or N
Z        E or Q
U        selenocysteine
O        pyrrolysine

You can control how these are handled.

Default (recommended)

--ambiguous drop

Removes non-standard residues before calculation.

Strict mode

--ambiguous fail

Skips sequences containing ambiguous residues.

Advanced mode

--ambiguous keep

Keeps residues unchanged (may cause calculation errors).
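
The three policies can be illustrated with a small pure-Python function. It is a minimal sketch of the behaviour described above, not the script's own code: the name `apply_ambiguous_policy`, the warning text, and the use of `ValueError` to signal a skipped sequence are all assumptions.

```python
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def apply_ambiguous_policy(sequence: str, policy: str = "drop"):
    """Return (sequence_to_analyse, warnings) for the chosen policy.

    Raises ValueError under "fail" so the caller can skip the record.
    """
    sequence = sequence.upper()
    nonstandard = sorted({res for res in sequence if res not in STANDARD_RESIDUES})
    if not nonstandard or policy == "keep":
        # Nothing to do, or the caller asked to pass the sequence through unchanged.
        return sequence, []
    if policy == "fail":
        raise ValueError(f"non-standard residues present: {''.join(nonstandard)}")
    if policy == "drop":
        cleaned = "".join(res for res in sequence if res in STANDARD_RESIDUES)
        dropped = len(sequence) - len(cleaned)
        return cleaned, [f"dropped {dropped} non-standard residue(s): {''.join(nonstandard)}"]
    raise ValueError(f"unknown policy: {policy!r}")


print(apply_ambiguous_policy("MKXBAZ", "drop"))
# -> ('MKA', ['dropped 3 non-standard residue(s): BXZ'])
```

Note that dropping residues shortens the sequence, so length-dependent values such as molecular weight then describe the cleaned sequence rather than the original record.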

Example Output Columns

seq_id
length_aa
count_A
pct_A
molecular_weight
aromaticity
theoretical_pi
ss_helix
ss_turn
ss_sheet
gravy
instability_index
flex_mean
flex_min
flex_max
flex_stdev
source_fasta
warnings
status
error_type
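
A results file with these columns can be read using Python's standard `csv` module. In the sketch below an in-memory string stands in for a real file; the two rows, their values, and the `ok` status string are invented for illustration and are not output from the script.

```python
import csv
import io

# Stand-in for a results file; rows and values are invented for illustration.
sample_csv = io.StringIO(
    "seq_id,length_aa,gravy,instability_index,status\n"
    "protA,120,-0.35,32.1,ok\n"
    "protB,98,0.12,47.8,ok\n"
)

rows = list(csv.DictReader(sample_csv))

# csv yields strings, so numeric columns need an explicit conversion.
for row in rows:
    row["length_aa"] = int(row["length_aa"])
    row["instability_index"] = float(row["instability_index"])

# ProtParam convention: an instability index below 40 predicts a stable protein.
stable_ids = [row["seq_id"] for row in rows if row["instability_index"] < 40]
print(stable_ids)
```

To read an actual output file, replace `sample_csv` with `open("results/proteins1.protparam.csv", newline="")`.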

Why Use This Script?

The official ProtParam web server is useful for analyzing individual proteins but becomes impractical for large datasets.

This script enables:

High-throughput proteome analysis
Automated pipelines
Reproducible workflows
Integration with Python, R, or spreadsheet analysis

Citation

This tool relies on Biopython:

Cock, P.J.A. et al. (2009). Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics, 25(11), 1422-1423.

License

MIT License
