
[New Feature] Proposal: Model-Agnostic Tool Calling Mechanism for AmritaCore #20

@JohnRichard4096

Description


Chinese user?

  • I am a Chinese user, and I understand that Issues must be written in English

What new feature/improvement

Overview

We are designing a flexible, model-agnostic tool calling mechanism for AmritaCore. The goal is to enable any LLM (whether it supports native function calling or not) to invoke external tools in a structured, extensible way, while keeping token usage efficient and avoiding technical debt.

After internal discussions, we’ve settled on an XML‑based approach with an am: namespace. We’d love to get community feedback on the design before implementation.

Core Design

1. Tool Definition

Tools are registered in code and their definitions are presented to the LLM as an XML block in the system prompt.
Example:

<am:tools>
  <am:tool name="calculator" description="Add two integers">
    <parameter name="a" type="integer" description="First number" required="true"/>
    <parameter name="b" type="integer" description="Second number" required="true"/>
  </am:tool>
  <am:tool name="weather" description="Get current weather for a city">
    <parameter name="city" type="string" description="City name" required="true"/>
  </am:tool>
</am:tools>
  • Each <am:tool> contains metadata and a list of <parameter> children.
  • Parameter types can be extended later (arrays, objects) using nested elements or JSON strings.
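On the code side, this block could be rendered automatically from registered tools. The following is a minimal sketch of that idea; all names here (`Tool`, `ToolParam`, `render_tools_block`) are hypothetical and not an existing AmritaCore API:

```python
# Hypothetical sketch: render registered tools into the <am:tools> prompt
# block. ElementTree does not validate namespace prefixes, so the "am:"
# prefix can be used directly in tag names.
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

@dataclass
class ToolParam:
    name: str
    type: str
    description: str
    required: bool = True

@dataclass
class Tool:
    name: str
    description: str
    params: list = field(default_factory=list)

def render_tools_block(tools):
    root = ET.Element("am:tools")
    for tool in tools:
        node = ET.SubElement(root, "am:tool",
                             name=tool.name, description=tool.description)
        for p in tool.params:
            ET.SubElement(node, "parameter", name=p.name, type=p.type,
                          description=p.description,
                          required="true" if p.required else "false")
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")

calc = Tool("calculator", "Add two integers", [
    ToolParam("a", "integer", "First number"),
    ToolParam("b", "integer", "Second number"),
])
print(render_tools_block([calc]))
```

Generating the block from code keeps the prompt and the registry in sync, so a tool added in code is automatically advertised to the model.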

2. Tool Invocation

When the LLM decides to call a tool, it must output a non‑self‑closing <am:tool_call> element, with every parameter as a child element.
Example:

<am:tool_call name="calculator">
  <a>5</a>
  <b>3</b>
</am:tool_call>
  • Why non‑self‑closing?
    Self‑closing tags would prevent us from using </am:tool_call> as a reliable stop sequence. By requiring an explicit closing tag, we can set the LLM’s stop parameter to ["</am:tool_call>"] and capture the entire invocation cleanly.
  • Why parameters as child elements?
    This avoids attribute‑escaping issues, naturally supports complex data (via nested elements or embedded JSON), and keeps the format consistent and extensible. It also means we are not locked into an attribute‑based encoding that would be hard to evolve later.

3. Result Feedback

After executing the tool, the system inserts the result wrapped in an <am:tool_result> tag (or <am:tool_error> on failure):

<am:tool_result name="calculator">8</am:tool_result>

The LLM then continues generation, seeing the result as part of the conversation history.
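A sketch of the feedback step follows; the helper name is hypothetical. One detail worth noting: results should be XML-escaped so a tool output containing `<` or `&` cannot break the tag structure the model sees:

```python
# Hypothetical sketch of wrapping a tool outcome for the conversation
# history. Escaping guards against results that themselves contain XML.
from xml.sax.saxutils import escape

def render_tool_result(name, result=None, error=None):
    if error is not None:
        return (f'<am:tool_error name="{name}">'
                f"{escape(str(error))}</am:tool_error>")
    return (f'<am:tool_result name="{name}">'
            f"{escape(str(result))}</am:tool_result>")

print(render_tool_result("calculator", 8))
# <am:tool_result name="calculator">8</am:tool_result>
```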

Two Operating Modes

We envision two modes of operation, allowing users to choose the trade‑off between simplicity and token efficiency.

Mode 1: Pure XML

  • All tool definitions (the full <am:tools> block) are included in every conversation’s system prompt.
  • The LLM invokes tools directly using <am:tool_call>.
  • Pros: Model‑agnostic, simple to implement, works with any LLM.
  • Cons: If you have many tools, token usage can be high.

Mode 2: Hybrid (Native Function Calling + XML)

  • Step 1 – Tool selection:
    Use the LLM’s native function calling capability (e.g., OpenAI tools) to let the model choose which tools it needs. Only lightweight summaries (name + short description) are sent in this phase to save tokens.
  • Step 2 – Dynamic injection:
    Once the model signals the required tools (via a native tool call or a text list), the system injects the full XML definitions of only those tools into the context.
  • Step 3 – XML invocation:
    From that point on, the conversation uses the pure XML mechanism described above.
  • Pros: Drastically reduces token usage when tool sets are large; still model‑agnostic for the actual tool execution.
  • Cons: Requires native function calling support for the selection phase (optional fallback to text‑based selection). Slightly more complex to implement.

Both modes will be supported, with the hybrid mode as an advanced, opt‑in feature.

Open Questions & Discussion Points

We’d love the community’s input on the following:

  1. Parameter representation for complex types
    Should we standardize on JSON strings inside child elements for arrays/objects, or invest in fully structured XML nesting from the start? The former is simpler for LLMs to generate; the latter is more “native” to XML.

  2. Dynamic tool discovery
    In hybrid mode, how should the model request additional tools mid‑conversation? A special tag like <am:request_tools>? Or re‑enter the selection phase?

  3. Error handling and retries
    How detailed should error messages in <am:tool_error> be? Should we include stack traces or only user‑friendly messages?

  4. Tool definition granularity
    Should we include parameter schemas (like JSON Schema) inside the XML, or rely on code‑side validation and only provide human‑readable descriptions? The current proposal uses simple <parameter> elements, but we could embed JSON Schema for more rigor.

  5. Compatibility with existing agent frameworks
    Are there other tool‑calling standards (e.g., OpenAPI, MCP) we should consider aligning with?

  6. Streaming considerations
    How should we handle tool calls when using streaming responses? Detect the closing tag on the fly and abort?
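On the streaming question, one possible answer is a small buffering filter: hold back just enough text that a closing tag split across chunk boundaries is still detected, and cut the stream once it appears. This is a sketch under that assumption, not a settled design:

```python
# Hypothetical sketch: detect "</am:tool_call>" across streamed chunk
# boundaries and abort the stream as soon as the invocation is complete.
STOP = "</am:tool_call>"

def consume_stream(stream):
    """Yield text up to and including the first closing tag, then stop."""
    buffer = ""
    for chunk in stream:
        buffer += chunk
        idx = buffer.find(STOP)
        if idx != -1:
            yield buffer[:idx + len(STOP)]
            return  # abort: the tool call is complete
        # Flush everything except a tail that could hold a partial tag.
        safe = len(buffer) - (len(STOP) - 1)
        if safe > 0:
            yield buffer[:safe]
            buffer = buffer[safe:]
    yield buffer  # stream ended without a tool call

chunks = ['<am:tool_call name="calculator"><a>5</a>',
          "<b>3</b></am:tool", "_call> ignored tail"]
print("".join(consume_stream(chunks)))
# <am:tool_call name="calculator"><a>5</a><b>3</b></am:tool_call>
```

The held-back tail of `len(STOP) - 1` characters is the worst-case partial match, so the filter adds at most 14 characters of latency to the visible stream.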

Next Steps

  • We will implement a proof‑of‑concept in a feature branch, focusing on the pure XML mode first.
  • After gathering feedback, we’ll refine the design and move toward a production‑ready implementation in AmritaCore 1.x (pure XML) and 2.x (hybrid).

Please share your thoughts, concerns, or alternative ideas! We believe this approach balances flexibility, extensibility, and token efficiency, and we’re excited to build it with the community.

Metadata

Labels: enhancement (New feature or request)