Serialize InelasticSample to JSON #133

@nvaytet

Description

Any good idea as to how we can properly save the sample to json?

Claude did not have any ideas that would work in all cases. Here is a summary of what it came up with.

Option 1: Represent the function as a lookup key (recommended)
Rather than serializing arbitrary callables, define a registry of known/supported scattering functions inside tof, and serialize only the name (key) of the function:

# A registry of known inelastic scattering functions
INELASTIC_FUNC_REGISTRY = {
    "linear": lambda wav_i: wav_i * 1.1,
    "debye_waller": debye_waller_func,
    # ...
}

class InelasticSample:
    def as_json(self) -> dict:
        func_name = self._func_name  # stored at construction time
        if func_name not in INELASTIC_FUNC_REGISTRY:
            raise ValueError(f"Cannot serialize unknown function: {func_name}")
        return {
            "type": "inelastic_sample",
            "distance": var_to_dict(self.distance),
            "name": self.name,
            "func": func_name,
        }

This is safe, readable, and consistent with how the rest of the codebase works (e.g. "clockwise" / "anti-clockwise" for chopper direction). The downside is that users can only use pre-registered functions — but that's a feature, not a bug, since it makes JSON files portable and reproducible.
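For completeness, the load side of the registry approach could look like the sketch below. The registry contents and the `func_from_json` helper are illustrative assumptions, not existing tof API:

```python
# Hypothetical deserialization counterpart for Option 1: look the function
# up by name in the registry instead of reconstructing a callable.
INELASTIC_FUNC_REGISTRY = {
    "linear": lambda wav_i: wav_i * 1.1,  # placeholder entry for illustration
}

def func_from_json(d: dict):
    name = d["func"]
    if name not in INELASTIC_FUNC_REGISTRY:
        raise ValueError(f"Unknown function name in JSON: {name}")
    return INELASTIC_FUNC_REGISTRY[name]

f = func_from_json({"type": "inelastic_sample", "func": "linear"})
print(f(2.0))  # -> 2.2
```

Unknown names fail loudly at load time, which is exactly the portability guarantee the registry buys you.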

Option 2: Serialize via module:qualname reference
If you want to support user-defined functions but still avoid eval, you can store the function's fully qualified importable name using Python's __module__ and __qualname__.

import importlib

def func_to_json(func) -> dict:
    return {
        "module": func.__module__,
        "qualname": func.__qualname__,
    }

def func_from_json(d: dict):
    mod = importlib.import_module(d["module"])
    # Walk qualname for nested classes/functions
    obj = mod
    for part in d["qualname"].split("."):
        obj = getattr(obj, part)
    return obj

This serializes as e.g. {"module": "mypackage.scattering", "qualname": "debye_waller"}. On deserialization, you import the module and look up the attribute — no eval involved. The caveat: it only works for importable (module-level) functions, not lambdas or closures.
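As a quick sanity check of the round trip, here is the same pattern applied to an importable function, using `math.sqrt` as a stand-in for a user-defined scattering function:

```python
import importlib
import json
import math

def func_to_json(func) -> dict:
    # Record where the function can be re-imported from.
    return {"module": func.__module__, "qualname": func.__qualname__}

def func_from_json(d: dict):
    # Import the module, then walk the qualname to the attribute.
    obj = importlib.import_module(d["module"])
    for part in d["qualname"].split("."):
        obj = getattr(obj, part)
    return obj

blob = json.dumps(func_to_json(math.sqrt))  # '{"module": "math", "qualname": "sqrt"}'
restored = func_from_json(json.loads(blob))
assert restored is math.sqrt
print(restored(9.0))  # -> 3.0
```

A lambda would fail this round trip: its `__qualname__` contains `<lambda>`, which is not a retrievable attribute of any module.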

Option 3: dill / cloudpickle + base64
If you truly need to serialize arbitrary callables (including lambdas and closures):

import base64
import cloudpickle  # or dill

def func_to_json(func) -> str:
    return base64.b64encode(cloudpickle.dumps(func)).decode()

def func_from_json(s: str):
    return cloudpickle.loads(base64.b64decode(s))

This works for almost any Python callable. However:

- It is not human-readable in the JSON.
- It is Python-version and library-version sensitive (pickles can break across versions).
- It still carries security concerns when loading from untrusted sources (pickle deserialisation can execute arbitrary code, just like eval).

So this is really only suitable if the JSON files are treated as internal/trusted artifacts.

Originally posted by @nvaytet in #124 (comment)
