Labelstudio Tools

Advanced Labelstudio API toolkit

labelstudio-tools extends the official label-studio-sdk with:

Idempotent bulk task actions
Client-side task caching
Snapshot (export) automation
Bulk downloads
S3 validation & transfer helpers
Cache-Label management
View & filter helpers

It is designed for power users managing large Label Studio projects programmatically.

Installation

pip install git+https://github.com/WHOIGit/labelstudio-tools.git

Quick Start

from labelstudio_tools import LabelStudioPlus

ls = LabelStudioPlus(
    host="https://your-labelstudio-instance.com",
    token="LABELSTUDIO_API_TOKEN",
    project="Some Project Name"  # or project_id integer
)

# or

ls = LabelStudioPlus.from_config('path/to/config.json')
# where config.json looks like
# {
#   "host": "https://my-labelstudio-instance.com",
#   "token": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
#   "project": 1,
#   "pk": "image",
#   "s3_config": {
#     "bucket": "myBucket",
#     "endpoint_url": "http://mys3.endpoint.com",
#     "aws_access_key_id": "XXXXXXXXXXXXXXXXXXXX",
#     "aws_secret_access_key": "xXxxXXXxxxxXxxXxxxXXxXXXXXxXXXXXxXxxXXxx"
#   }
# }

# fetch all tasks from the project
tasks = ls.get_tasks(limit_fields_to=["id", "data"])

With regard to secrets, a config value can also reference a variable instead of holding the secret itself: for example, "token": "$LSTOKEN", where LSTOKEN is defined in a .env file. This works for other values in the config JSON file as well.
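A minimal sketch of how "$NAME" placeholders could be resolved, assuming a simple KEY=VALUE .env format and recursive substitution into nested values such as s3_config. The helpers load_dotenv_file and resolve_env_refs are hypothetical and are not part of labelstudio-tools; the package's own mechanism may differ.

```python
from __future__ import annotations

import os
from pathlib import Path
from typing import Any

# Hypothetical helpers for illustration; not part of labelstudio-tools.


def load_dotenv_file(path: str | Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are ignored."""
    env: dict[str, str] = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def resolve_env_refs(value: Any, env: dict[str, str]) -> Any:
    """Replace strings of the form "$NAME" with env[NAME], recursing into dicts and lists."""
    if isinstance(value, dict):
        return {k: resolve_env_refs(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_env_refs(v, env) for v in value]
    if isinstance(value, str) and value.startswith("$") and value[1:].isidentifier():
        name = value[1:]
        # Prefer the .env file, then fall back to the process environment.
        if name in env:
            return env[name]
        if name in os.environ:
            return os.environ[name]
        raise KeyError(f"config references undefined variable {name!r}")
    return value
```

Usage would look like resolve_env_refs(config_dict, load_dotenv_file(".env")). Non-secret values such as integers pass through unchanged.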

Features

Idempotent Bulk Uploads

Avoid duplicate task creation using primary-key fields.

report, responses = ls.create_tasks(
    tasks=my_tasks,
    pk_datafields="image"
)

Skips existing tasks
Chunked uploads (respects connection size limits)
Detailed report of created vs existing tasks, as well as chunk api responses
Optional dry-run mode
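The skip-existing and chunking behavior can be sketched in plain Python, assuming each task is a dict with a "data" sub-dict and that the primary key is one or more data fields. The helpers pk_of, chunked and plan_task_upload, and the default chunk size of 500, are hypothetical; the sketch makes no API calls and is not the package's implementation. It produces the kind of report a dry run would show.

```python
from __future__ import annotations

from typing import Iterable, Iterator, Sequence

# Hypothetical helpers for illustration; not part of labelstudio-tools.


def pk_of(task: dict, pk_fields: Sequence[str]) -> tuple:
    """Primary key of a task: the tuple of its selected data-field values."""
    return tuple(task["data"].get(f) for f in pk_fields)


def chunked(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def plan_task_upload(new_tasks: Iterable[dict], existing_tasks: Iterable[dict],
                     pk_fields: str | Sequence[str], chunk_size: int = 500) -> dict:
    """Split new tasks into skipped ones and upload chunks, without uploading anything."""
    if isinstance(pk_fields, str):
        pk_fields = [pk_fields]
    seen = {pk_of(t, pk_fields) for t in existing_tasks}
    to_create, skipped = [], []
    for task in new_tasks:
        key = pk_of(task, pk_fields)
        if key in seen:
            skipped.append(task)
        else:
            seen.add(key)  # also drops repeats within the new batch itself
            to_create.append(task)
    return {
        "to_create": len(to_create),
        "skipped": len(skipped),
        "chunks": list(chunked(to_create, chunk_size)),
    }
```

Keeping the primary-key set up to date while iterating is what makes re-running the same upload a no-op.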

Client-Side Task Caching

Speed up repeated lookups.

ls.cache_tasks()
ls.cache_task_by_pk("image")

existing = ls.task_exists(task_data, data_fields="image", use_cache=True)

Useful for:

Deduplication
Fast existence checks
Large dataset management
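One way to picture what a primary-key cache enables is a dictionary index from primary-key values to task IDs, which turns each existence check into a constant-time lookup instead of an API round trip. TaskIndex is a hypothetical class for illustration; the package's actual cache layout may differ.

```python
from __future__ import annotations

from typing import Iterable, Sequence


# Hypothetical class for illustration; not part of labelstudio-tools.
class TaskIndex:
    """Map primary-key tuples (built from task data fields) to task IDs."""

    def __init__(self, tasks: Iterable[dict], pk_fields: str | Sequence[str]):
        self.pk_fields = [pk_fields] if isinstance(pk_fields, str) else list(pk_fields)
        self._by_pk: dict[tuple, int] = {}
        for task in tasks:
            self._by_pk[self._key(task["data"])] = task["id"]

    def _key(self, data: dict) -> tuple:
        return tuple(data.get(f) for f in self.pk_fields)

    def exists(self, task_data: dict) -> bool:
        """True if a task with the same primary-key values is already indexed."""
        return self._key(task_data) in self._by_pk

    def task_id(self, task_data: dict) -> int | None:
        """ID of the matching task, or None if there is no match."""
        return self._by_pk.get(self._key(task_data))
```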

Advanced Task Retrieval

from labelstudio_tools.utils import simple_task_filter_builder
my_filter = simple_task_filter_builder(field='myDatafield', value='some_value', operator='equal')
# {"conjunction": "and",
#  "items": [{"filter": "filter:tasks:data.myDatafield",
#             "operator": "equal",
#             "value": "some_value",
#             "type": "String"}]
# }
tasks = ls.get_tasks(
    with_annotations=True,
    view="Some Specified View",
    filter_dict=my_filter,
)
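Because the filter is a plain dict, several conditions can be combined by listing multiple items under one conjunction. multi_field_filter is a hypothetical helper that reproduces the structure shown in the comment above for each (field, operator, value) triple; it assumes every field is a String-typed data field, and other types or operators are not covered here.

```python
from __future__ import annotations

from typing import Iterable


# Hypothetical helper for illustration; not part of labelstudio-tools.
def multi_field_filter(conditions: Iterable[tuple[str, str, str]],
                       conjunction: str = "and") -> dict:
    """Build a filter dict from (field, operator, value) triples on task data fields."""
    return {
        "conjunction": conjunction,
        "items": [
            {
                "filter": f"filter:tasks:data.{field}",
                "operator": operator,
                "value": value,
                "type": "String",  # assumption: all fields are string-valued
            }
            for field, operator, value in conditions
        ],
    }
```

The result can be passed as filter_dict to ls.get_tasks in the same way as the single-condition filter.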

Supports:

View-based filtering
Explicit ID selection
Pagination auto-handling for large requests, with progress bar
Resolving S3 URLs to presigned URLs
Optional inclusion of annotations
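Pagination auto-handling amounts to requesting fixed-size pages until a short or empty page signals the end. A minimal sketch, assuming a fetch_page(page, page_size) callable that returns a list of tasks; that callable, the 1-based page numbering and the default page size are assumptions, and the progress bar is omitted.

```python
from __future__ import annotations

from typing import Callable, Iterator


# Hypothetical helper for illustration; not part of labelstudio-tools.
def iter_all_tasks(fetch_page: Callable[[int, int], list[dict]],
                   page_size: int = 100) -> Iterator[dict]:
    """Yield tasks page by page until a page has fewer than page_size items."""
    page = 1
    while True:
        batch = fetch_page(page, page_size)
        yield from batch
        if len(batch) < page_size:
            break  # short or empty page: nothing left to fetch
        page += 1
```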

S3 Integration

Optional S3 support for validating and transferring task data is available if s3_config is specified. Transfer functions take a clobber argument: with clobber=False, the actual transfer is skipped if an object with the same key (upload) or a file with the same name (download) already exists.

ls.s3key_to_url('somewhere/something') # --> 's3://mybucket/somewhere/something'
ls.s3key_exists('somewhere/something') # --> False
ls.upload_s3url('path/to/local_file.ext', s3url='s3://mybucket/somewhere/something', clobber=False)
ls.s3key_exists('somewhere/something') # --> True
ls.download_s3url('s3://mybucket/somewhere/something', outfile='path/to/downloaded_file.ext', clobber=False)
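The clobber behavior described above can be illustrated with local files: when clobber=False and the destination already exists, the transfer is skipped. The sketch also shows how an s3://bucket/key URL splits into bucket and key. split_s3url and copy_with_clobber are hypothetical stand-ins built on the standard library; they do not talk to S3 and are not the package's implementation.

```python
from __future__ import annotations

import shutil
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical helpers for illustration; not part of labelstudio-tools.


def split_s3url(s3url: str) -> tuple[str, str]:
    """'s3://mybucket/somewhere/something' -> ('mybucket', 'somewhere/something')."""
    parsed = urlparse(s3url)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3:// URL: {s3url!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def copy_with_clobber(src: str | Path, dst: str | Path, clobber: bool = False) -> bool:
    """Copy src to dst; return False and do nothing if dst exists and clobber is False."""
    dst = Path(dst)
    if dst.exists() and not clobber:
        return False  # same-name target already present: skip the transfer
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dst)
    return True
```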

Snapshot & Export Management

Great for bulk downloads of your data and annotations. Managed via SnapshotManager.

from labelstudio_tools import SnapshotManager

snapman = SnapshotManager(host=..., token=..., project=...)
snapman.make_snapshot(title=..., filter_obj=...)

# check or wait for snapshot to be ready
snapman.is_snap_ready()  # --> True/False, or...
snapman.wait_for_snapshot_completion(sleep_cycle_seconds=10)

# then download
data = snapman.download_snap()

# and optionally cleanup if you will not be downloading again
snapman.cleanup_snapshot()
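wait_for_snapshot_completion(sleep_cycle_seconds=10) suggests a poll-and-sleep loop around a readiness check such as is_snap_ready(). A generic sketch follows; wait_until is a hypothetical helper, and its overall timeout is an added assumption (the method shown above only takes sleep_cycle_seconds).

```python
from __future__ import annotations

import time
from typing import Callable


# Hypothetical helper for illustration; not part of labelstudio-tools.
def wait_until(is_ready: Callable[[], bool], sleep_cycle_seconds: float = 10,
               timeout_seconds: float | None = None) -> None:
    """Call is_ready() every sleep_cycle_seconds until it returns True or the timeout elapses."""
    deadline = None if timeout_seconds is None else time.monotonic() + timeout_seconds
    while not is_ready():
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError("snapshot was not ready before the timeout")
        time.sleep(sleep_cycle_seconds)
```

For example, wait_until(snapman.is_snap_ready, sleep_cycle_seconds=10, timeout_seconds=600) would give up after about ten minutes instead of polling forever.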

Label Cache Management

Cached Labels is an experimental Labelstudio feature that creates data fields from annotations or predictions. At the time of writing, it is the only way to filter on annotations. If an annotation or prediction is later updated, label caching must be re-run for the change to be reflected in that data field. The helper below automates the requests for creating Cached Labels across multiple labels. Because requests can time out when a project has many tasks, it can automatically chunk tasks using views to avoid timeouts.

ls.update_cachelabels(
    control_tags=["my_annotation_label", "another_label"],
    with_counters=False,
    from_predictions=False,
    timeout_groups='auto'
)
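The timeout_groups option splits the work so that no single cache-label request covers too many tasks. The sketch below shows one plausible way an 'auto' setting could pick a group count from the number of tasks. split_into_groups is hypothetical, max_tasks_per_request and its default of 1000 are assumptions rather than documented values, and the real method builds its chunks from views rather than plain ID lists.

```python
from __future__ import annotations

import math


# Hypothetical helper for illustration; not part of labelstudio-tools.
def split_into_groups(task_ids: list[int], timeout_groups: int | str = "auto",
                      max_tasks_per_request: int = 1000) -> list[list[int]]:
    """Split task IDs into at most n contiguous groups of near-equal size."""
    if not task_ids:
        return []
    if timeout_groups == "auto":
        # Just enough groups that none exceeds the per-request cap.
        n_groups = math.ceil(len(task_ids) / max_tasks_per_request)
    else:
        n_groups = max(1, int(timeout_groups))
    size = math.ceil(len(task_ids) / n_groups)
    return [task_ids[i:i + size] for i in range(0, len(task_ids), size)]
```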

License

MIT License
