Memgraph split files by EvanDietzMorris · Pull Request #376 · RobokopU24/ORION

EvanDietzMorris · 2026-03-06T07:57:34Z

I think there was a bug in kgx_file_converter: when splitting the edges file for Memgraph, it wasn't handling the existence check for file handles correctly, which led to an error when calling close() on None in the finally clause and to only one line being written per file. This should fix that.

This also changes it to write the intermediate split edges files to the output directory instead of next to the input files, in case you need all of the outputs written to a different place.
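A minimal sketch of the lazy-open pattern described above, not the actual ORION code. It assumes each edge line is a JSON object with a `predicate` field; the function name `split_edges_by_predicate` and the replacement of `:` in file names are hypothetical choices for illustration.

```python
import json
import os


def split_edges_by_predicate(edges_input_file, output_base_file):
    """Split a JSONL edges file into one JSONL file per predicate.

    Hypothetical helper illustrating the pattern; split files are written
    next to output_base_file rather than next to the input file.
    """
    out_base, _ = os.path.splitext(output_base_file)
    file_handles = {}
    try:
        with open(edges_input_file, "r", encoding="utf-8") as edges:
            for line in edges:
                if not line.strip():
                    continue
                # Assumption: the relationship type lives in "predicate".
                rel_type = json.loads(line)["predicate"].replace(":", "_")
                if rel_type not in file_handles:
                    # Open each split file once; later lines reuse the handle.
                    split_path = f"{out_base}_{rel_type}.jsonl"
                    file_handles[rel_type] = open(split_path, "w", encoding="utf-8")
                file_handles[rel_type].write(line)
    finally:
        # Only real file objects are stored, so close() never runs on None.
        for handle in file_handles.values():
            handle.close()
    return [f"{out_base}_{rel_type}.jsonl" for rel_type in file_handles]
```

Because a handle is only added to the dictionary once it has been opened, the finally clause can close every value unconditionally, and every line for a given predicate ends up in the same file.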

hyi · 2026-03-06T17:12:22Z

Common/kgx_file_converter.py

+                if rel_type not in file_handles:
+                    split_jsonl_path = f"{out_base}_{rel_type}.jsonl"
+                    file_handles[rel_type] = open(split_jsonl_path, "w", encoding="utf-8")
+                file_handles[rel_type].write(line)


@EvanDietzMorris I see the issue with the original code: it checks whether the output file exists and, if not, sets the corresponding file_handle to None. I cannot remember exactly why I did this, but it was probably meant to avoid overwriting existing files. I agree it is better to just write the file without this check. But I see one issue with hardcoding the extension to jsonl. In fact, the output file name is csv rather than jsonl. I think it is better to use out_ext here and make the output path f"{out_base}_{rel_type}{out_ext}".

Isn't this part just writing the split edge files before they are converted to csv? Unless we change a lot of other stuff in ORION, those would always be jsonl. Maybe I'm misunderstanding, though.

Yes, you are right, Evan. It is always jsonl, but since we pass in edges_input_file as an input parameter to this method, it is probably better to just use that input file to get its base and ext even though it is currently just edges.jsonl.
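To make the naming question concrete, here is a small sketch of how `os.path.splitext` separates the two paths. The paths and the `rel_type` value are hypothetical, and combining the output base with the input extension is only one way to follow the suggestion above, not necessarily what was merged.

```python
import os

# Hypothetical paths, for illustration only.
edges_input_file = "/data/graph/edges.jsonl"
output_base_file = "/data/memgraph_out/edges.csv"

# Keep input and output parts under distinct names to avoid confusion.
in_base, in_ext = os.path.splitext(edges_input_file)
out_base, out_ext = os.path.splitext(output_base_file)

rel_type = "biolink_treats"  # hypothetical predicate name

# Split files live in the output directory but keep the input extension,
# since they are still JSONL until the later conversion to CSV.
split_file = f"{out_base}_{rel_type}{in_ext}"
print(split_file)  # /data/memgraph_out/edges_biolink_treats.jsonl
```

Using `out_ext` in place of `in_ext` here would label the intermediate JSONL files as `.csv`, which is the concern raised in the reply.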

hyi · 2026-03-06T17:17:31Z

Common/kgx_file_converter.py

    # split a large edge jsonl file into multiple jsonl files, one per predicate (relationship type)
    # for subsequent conversions by edge types
-    base, ext = os.path.splitext(edges_input_file)
+    out_base, out_ext = os.path.splitext(output_base_file)


I think we will need the original input base and ext file for later use. We could rename them as in_base and in_ext to be clearer, though.

yea, feel free to make changes or rename those, I was mainly just trying to fix the None file handler issue

hyi · 2026-03-06T17:18:51Z

Common/kgx_file_converter.py

    all_file_names = []
    for rel_type in file_handles.keys():
-        input_split_file = f"{base}_{rel_type}{ext}"
+        input_split_file = f"{out_base}_{rel_type}.jsonl"


I think we need to keep the original code here since it is input_split_file?

hyi

@EvanDietzMorris Looking into this more, I think the changes you made here look good and clearer. My comments can be ignored. Feel free to merge it.

EvanDietzMorris added 2 commits March 6, 2026 02:01

fixing split edges files bug, always raising exceptions on fail

e4056a0

write temp split files to output dir instead of input file dir

fe7c180

github-actions bot added the "Biological Context QC" label (Require validation of biological context to ensure accuracy and consistency) Mar 6, 2026

EvanDietzMorris requested a review from hyi March 6, 2026 08:04

hyi reviewed Mar 6, 2026

hyi approved these changes Mar 6, 2026

EvanDietzMorris merged commit 5e6bb72 into neo4j-sources-issue Mar 6, 2026
2 checks passed

EvanDietzMorris deleted the memgraph-split-files branch March 6, 2026 17:54
