Enhance Multimodal Support in LLM Workflow and Update Documentation #45
- Expanded the LLM workflow to support multimodal input and output, including text, images, audio, and video.
- Updated the `ChatRequestEvent` and `ChatResponseEvent` to include `input_parts` and `output_parts` for handling diverse content types.
- Introduced new `ContentPart` and `MediaContentEvent` classes to encapsulate various media types and their properties.
- Refactored the `ChatRuntime` and `RoleGAgent` to process and emit multimodal content effectively.
- Enhanced documentation to reflect the new capabilities and provide clear guidelines for using multimodal features in workflows.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f5a50b5e50
TryGetStringByKeys(root, "image_base64", "imageBase64") ??
TryGetNestedMediaBase64(root, "image");
Restore legacy base64 aliases in tool media parser
This parser no longer checks the previously supported root aliases `base64`/`data` when extracting image payloads, so tool outputs that still return `{ "base64": "..." }` will now fail `TryExtractToolContentParts` and be sent back as plain text JSON instead of multimodal `ContentPart`s. That regresses existing tool-call flows that relied on the old schema and causes image outputs to be dropped from subsequent LLM turns.
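A minimal sketch of the fallback this comment asks for, assuming the tool output is parsed with `System.Text.Json`. The `TryGetStringByKeys` below is a hypothetical stand-in for the PR's helper, whose real signature is not visible in the diff; `ExtractImageBase64` and the position of the legacy aliases in the lookup order are likewise assumptions, and the nested `image` lookup from the diff is left out for brevity.

```csharp
using System;
using System.Text.Json;

// Hypothetical stand-in for the PR's TryGetStringByKeys: returns the first
// non-empty string property found under any of the given keys, else null.
static string? TryGetStringByKeys(JsonElement root, params string[] keys)
{
    if (root.ValueKind != JsonValueKind.Object)
        return null;

    foreach (var key in keys)
    {
        if (root.TryGetProperty(key, out var value) &&
            value.ValueKind == JsonValueKind.String)
        {
            var text = value.GetString();
            if (!string.IsNullOrWhiteSpace(text))
                return text;
        }
    }

    return null;
}

// The new keys are tried first; the legacy root aliases come last so that
// tool outputs using the old schema still yield an image payload.
static string? ExtractImageBase64(JsonElement root) =>
    TryGetStringByKeys(root, "image_base64", "imageBase64")
    ?? TryGetStringByKeys(root, "base64", "data");

// Usage: a tool output that still uses the old schema.
using var legacy = JsonDocument.Parse("{ \"base64\": \"aGVsbG8=\" }");
Console.WriteLine(ExtractImageBase64(legacy.RootElement) ?? "<none>");
```

Placing the legacy aliases after the new keys keeps the new schema authoritative when a payload happens to carry both.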
if (!TryParseContentPartKind(part.Type, out var kind))
    continue;
Reject chat requests when all inputParts are unsupported
Unsupported `inputParts` are silently skipped here, and the normalizer still returns success even if every part is dropped. Combined with the new prompt-or-`inputParts` gate in `ChatEndpoints`/WebSocket parser (which only checks raw part count), requests like `inputParts:[{"type":"foo"}]` now get accepted and execute with an empty derived prompt. This should be treated as `INVALID_PROMPT` (or invalid part type) instead of dispatching a blank run.
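One way to close this gap is to gate on the parts that survive normalization rather than on the raw count. The sketch below is illustrative only: `TryValidateChatInput` is a hypothetical name, the `TryParseContentPartKind` shown here is a simplified stand-in for the PR's method, and the set of supported kinds is assumed from the PR description (text, image, audio, video).

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-in for the PR's TryParseContentPartKind (assumed kinds).
static bool TryParseContentPartKind(string? type, out string kind)
{
    kind = (type ?? string.Empty).Trim().ToLowerInvariant();
    return kind is "text" or "image" or "audio" or "video";
}

// Hypothetical gate: succeeds only if there is a prompt or at least one
// supported part left after unsupported parts have been skipped.
static bool TryValidateChatInput(
    string? prompt,
    IReadOnlyList<string?> partTypes,
    out string? errorCode)
{
    errorCode = null;
    var supported = 0;

    foreach (var type in partTypes)
    {
        if (!TryParseContentPartKind(type, out _))
            continue; // unsupported parts are still skipped

        supported++;
    }

    // Count surviving parts, not raw ones, before dispatching a run.
    if (string.IsNullOrWhiteSpace(prompt) && supported == 0)
    {
        errorCode = "INVALID_PROMPT";
        return false;
    }

    return true;
}

// Usage: the request from the review comment, with no prompt.
var accepted = TryValidateChatInput(null, new[] { "foo" }, out var error);
Console.WriteLine($"accepted={accepted}, error={error}");
```

A request that carries a non-empty prompt alongside only unsupported parts is still accepted in this sketch; whether that case should instead fail with an invalid-part-type error is a design decision the review comment leaves open.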
- Added a new CI job for testing and building the console-web application.
- Updated the `console_web` output in the CI workflow to include relevant paths.
- Introduced a new environment variable `AEVATAR_CONSOLE_PUBLIC_PATH` for configuring deployment paths.
- Refactored the public path resolution logic in the console-web configuration.
- Removed deprecated enriched graph API and related decoding logic from the console API.
- Updated authentication configuration to disable NyxID login when required environment variables are missing.
- Added type-checking step for console-web in the CI workflow.
- Removed redundant pnpm setup step to streamline the workflow.
- Updated architecture scorecard documentation to reflect successful compliance with all architecture guards.
- Fixed naming issues and improved clarity in documentation regarding project structure and CI pipeline integration.
…multimodal-llm-support
# Conflicts:
# apps/aevatar-console-web/src/app.tsx
# apps/aevatar-console-web/src/pages/actors/index.tsx
# apps/aevatar-console-web/src/pages/observability/index.tsx
# apps/aevatar-console-web/src/pages/overview/index.tsx
# apps/aevatar-console-web/src/pages/playground/index.test.tsx
# apps/aevatar-console-web/src/pages/playground/index.tsx
# apps/aevatar-console-web/src/pages/primitives/index.tsx
# apps/aevatar-console-web/src/pages/runs/index.tsx
# apps/aevatar-console-web/src/pages/settings/index.tsx
# apps/aevatar-console-web/src/pages/studio/components/StudioShell.test.tsx
# apps/aevatar-console-web/src/pages/studio/index.test.tsx
# apps/aevatar-console-web/src/pages/workflows/index.tsx
# apps/aevatar-console-web/src/pages/yaml/index.test.tsx
# apps/aevatar-console-web/src/pages/yaml/index.tsx
# apps/aevatar-console-web/src/shared/api/consoleApi.ts
# apps/aevatar-console-web/src/shared/api/decoders.ts
- Created WORKFLOW.md to define project workflow, including issue tracking, execution flow, and verification expectations.
- Added start-local.sh script to set up the local development environment, ensuring required commands are available and configuring necessary environment variables.