Extend `importer` module to allow bulk import from Rivet #958
Draft
Co-authored-by: GraemeWatt <11544204+GraemeWatt@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Extend `importer` module to allow bulk import from Rivet" to "Extend `importer` module to allow bulk import from Rivet" on Mar 17, 2026.
The `importer` module was hardcoded to fetch INSPIRE IDs and submission files exclusively from `hepdata.net`, and always assigned user ID `1` as the Coordinator. This blocked bulk import of ~780 Rivet analyses hosted at an alternate web location.

Changes
`api.py`

- `get_inspire_ids`: new `ids_url` parameter. When set, fetches the INSPIRE ID list directly from that URL (expects a JSON array of integers, e.g. `inspire.json`) instead of constructing the HEPData `/search/ids` endpoint. `n_latest` still applies client-side; `last_updated` is ignored when `ids_url` is used.
- `_download_file`: new `files_url` parameter. When set, downloads from `{files_url}/ins{inspire_id}.tar.gz` instead of `{base_url}/download/submission/ins{inspire_id}/original`.
- `_import_record` / `import_records`: new `coordinator_id` (default `1`) and `files_url` parameters, replacing the hardcoded `admin_user_id = 1`.

`cli.py`

- `import-records`: adds `--coordinator-id`/`-c`, `--files-url`/`-f`
- `bulk-import-records`: adds `--ids-url`, `--files-url`/`-f`, `--coordinator-id`/`-c`

Example (bulk import from a Rivet mirror):
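As a rough illustration of what these options resolve to, here is a minimal Python sketch of the URL selection described above. The helper names (`resolve_ids_url`, `resolve_download_url`, `apply_n_latest`) and the mirror address `https://example.org/rivet` are hypothetical placeholders and are not the functions in `api.py`; only the URL patterns are taken from this description. Taking the first `n_latest` entries assumes the ID list is ordered newest first, which is not confirmed here.

```python
from typing import List, Optional

# Default source named in the original issue.
HEPDATA_BASE_URL = "https://hepdata.net"


def resolve_ids_url(base_url: str = HEPDATA_BASE_URL,
                    ids_url: Optional[str] = None) -> str:
    """Return the URL the INSPIRE ID list would be fetched from.

    Hypothetical helper: an explicit ids_url takes precedence over the
    HEPData /search/ids endpoint.
    """
    if ids_url:
        return ids_url
    return f"{base_url}/search/ids?inspire_ids=true"


def resolve_download_url(inspire_id,
                         base_url: str = HEPDATA_BASE_URL,
                         files_url: Optional[str] = None) -> str:
    """Return the URL a submission archive would be downloaded from.

    Hypothetical helper: files_url switches to the flat
    {files_url}/ins{inspire_id}.tar.gz layout.
    """
    if files_url:
        return f"{files_url}/ins{inspire_id}.tar.gz"
    return f"{base_url}/download/submission/ins{inspire_id}/original"


def apply_n_latest(inspire_ids: List[int],
                   n_latest: Optional[int] = None) -> List[int]:
    """Client-side truncation of the ID list.

    Assumption: the list is ordered newest first, so the first
    n_latest entries are kept.
    """
    if n_latest is not None and n_latest > 0:
        return inspire_ids[:n_latest]
    return inspire_ids


if __name__ == "__main__":
    mirror = "https://example.org/rivet"  # placeholder, not a real mirror
    print(resolve_ids_url(ids_url=f"{mirror}/inspire.json"))
    for inspire_id in apply_n_latest([2705058, 1283842, 1245023], n_latest=2):
        print(resolve_download_url(inspire_id, files_url=mirror))
```

Keeping the URL choice in small pure functions like these makes the default and mirror code paths easy to unit-test without any network access.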
Tests
- `test_get_inspire_ids`: extended to cover `ids_url` success, `ids_url` + `n_latest`, and `ids_url` 404.
- `test_import_records`: updated call-signature assertions and added a case for non-default `coordinator_id`/`files_url` propagation.
- `test_import_record`: added a `files_url` download-error case.

Original prompt
This section details the original issue you should resolve.
<issue_title>records: extend `importer` module to allow bulk import from Rivet</issue_title>
<issue_description>The `importer` module (CLI) was written to import records from hepdata.net to a developer's local instance. It uses a list of INSPIRE IDs given at https://www.hepdata.net/search/ids?inspire_ids=true and it downloads files using a URL pattern `url = "{0}/download/submission/ins{1}/original".format(base_url, inspire_id)` where `base_url = 'https://hepdata.net'`.

The `importer` module should be extended to get the list of INSPIRE IDs and the download files from an alternate location, for example, a simple web directory with the INSPIRE IDs contained in the name of the files. It should also be possible to create records with any user assigned as the Coordinator (rather than just `admin_user_id = 1`). The ability to import only a subset of the complete list of INSPIRE IDs would be useful.

These changes should be carefully tested locally and on the QA system before importing to the production instance. Such an extension would be a quicker way of importing the 780 records obtained from Rivet than using the normal submission web interface.
See also discussion with @20DM in HEPData/hepdata_lib#229.
A list of the Rivet analyses can be seen at https://gitlab.com/hepcedar/rivet/-/issues/485 .</issue_description>
<agent_instructions>Most relevant information is given in the first and last comments.</agent_instructions>
Comments on the Issue (you are @copilot in this section)
@GraemeWatt
@20DM : thanks, that's great! I'll look into modifying the [`importer`](https://github.com/HEPData/hepdata/blob/main/hepdata/modules/records/importer/api.py) module soon.

I picked a random submission (`ins2705058.tar.gz`) and uploaded it to my Sandbox. A few (optional) comments for your consideration:

- `http://rivet.hepforge.org/analyses#BESIII_2023_I2705058` as an additional resource. This is not strictly necessary (see submission docs) since the link will automatically be added after the record is finalised from the nightly harvesting of the `analyses.json` file. Moreover, the automatic link added will be `http://rivet.hepforge.org/analyses/BESIII_2023_I2705058` with a `/` instead of a `#`. So if you want to keep the Rivet analysis in the `submission.yaml` file, better to use a link with a `/` instead of a `#`, or just remove it completely.
- `comment` has a weird markup that is not rendered by HEPData. It looks like you are taking this from the journal abstract given by the INSPIRE record (JSON). The INSPIRE JSON also provides the arXiv abstract (second item of `abstracts`) that uses LaTeX markup and can be rendered by HEPData. HEPData uses the arXiv abstract from INSPIRE if possible (code). Since HEPData already stores the paper abstract (although it is only displayed if there is no `comment`), I don't think you need to duplicate it in the `comment`. So I would just use the additional information "NUMERICAL VALUES HAVE BEEN DIGITISED FROM THE PAPER." as the `comment` or omit the `comment` completely if there is nothing to add. (Another possibility is to use the `Description` from the Rivet `.info` file as the `comment`, but in this case it contains `Beam energy must be specified as analysis option "ENERGY" when rivet-merging samples.` which is not relevant to the HEPData record.)

<comment_new>@GraemeWatt
Thanks for making the changes to the tarballs. I haven't started looking at this yet, since I didn't see that it was particularly urgent, but I'll try to look into it within the next couple of months.</comment_new>
<comment_new>@GraemeWatt
The links given in the [previous comment](https://github.com/HEPData/hepdata/...
`importer` module to allow bulk import from Rivet #811