feat(adaptor): add Sahara adaptor for OpenFn #1478
Deehas wants to merge 9 commits into OpenFn:main from
Conversation
josephjclark
left a comment
Hi, thanks for raising this! Very cool.
Most of the core team is away on vacation until January, so expect a bit less activity from us over the next two weeks :)
I've started taking a look at this and left a few comments - but I'm afraid I've hit quite a big problem which we'll all need to think about.
Refer to my comments in util.uploadFile.
Basically it looks like you've designed this adaptor around the CLI and assumed the use of the file system. But the OpenFn app - the bit that handles all the actual automation - doesn't have a file system. So I'm afraid the adaptor as designed right now will not be compatible with our system.
We'd be happy to help you work out a solution in January (the answer is already in my comments actually)
.changeset/stale-trains-drop.md
Outdated
| '@openfn/language-sahara': major
| ---
|
| Add Sahara adaptor with axios-based upload helper, integration scripts, updated
Thank you for raising a changeset! But there's actually no need for a first release, so please delete this file
packages/sahara/CHANGELOG.md
Outdated
| - Comprehensive test suite
| - Full JSDoc documentation
|
| ### Implementation Notes
I think this stuff is more appropriate for the readme than for the changelog?
packages/sahara/CHANGELOG.md
Outdated
| - Initial release of Sahara (Intron Health) adaptor
| - Bearer token authentication support
| - **Automatic retry logic with exponential backoff** for rate limits, server errors, and network failures
| - `uploadAudioFile()` operation for uploading audio files (uses axios for reliable FormData handling) ✅ **FULLY FUNCTIONAL**
The fully functional bit is a bit weird - can we remove it? It sort of implies to users that the other bits are NOT fully functional!
packages/sahara/CHANGELOG.md
Outdated
|
| ### Added
|
| - Initial release of Sahara (Intron Health) adaptor
This looks AI generated and it's a lot more comprehensive and detailed than it needs to be, and a lot more comprehensive and detailed than our other adaptors. For the changelog we usually just list the added/changed functions, and note any breaking changes.
I'm happy to leave details in if you want them - it's your adaptor after all! This is just a note on style and best practice.
| {
|   "$schema": "http://json-schema.org/draft-07/schema#",
|   "properties": {
|     "baseUrl": {
Would users ever change this? Does Sahara have custom deployments or endpoints or anything?
packages/sahara/CHANGELOG.md
Outdated
|
| - **SSL Certificate**: Sahara's server has a certificate for `*.intron.health` but the endpoint is `infer.voice.intron.io`. Set `tls.rejectUnauthorized: false` in configuration to handle this server-side SSL configuration.
|
| - **Testing**: 10/13 unit tests pass. Upload tests hit real API (axios bypasses undici mocks). 100% success rate with real API integration tests.
This is a weird note and not appropriate for the changelog - please remove it 🙏
Incidentally, if the note is accurate, what's up with the 3 tests that don't pass?
| uploadAudioFile({
|   audio_file_name: 'test_basic_upload',
|   audio_file_blob: {
|     // Option 1: If you have a local file, you can use fs to read it
Alarm bells are ringing at the mention of local paths. There is no file system at all in the app. You might be able to use fs inside the adaptor in a way that works with the CLI, but I have to advise that using the file system is an anti-pattern - even in the CLI.
packages/sahara/src/Adaptor.js
Outdated
| * Options for file upload
| * @typedef {Object} UploadOptions
| * @public
| * @property {string} audio_file_name - Name for the uploaded audio file (required)
Rather than duplicating all the valid properties like this (which could be inaccurate and will slide out of date over time), I would recommend just linking to the relevant Sahara docs pages.
Take a look at what we did with OpenMRS: https://docs.openfn.org/adaptors/packages/openmrs-docs#get
packages/sahara/src/Utils.js
Outdated
| /**
| * Helper function to upload files to Sahara API using axios
| *
| * Note: Uses axios instead of undici due to FormData compatibility issues in undici v6 and v7.
You're welcome to use axios, it's not a problem at all. It's good to note why you've made that choice but it doesn't need a lot of justification in the comments like this.
But I have to say, I'm very surprised undici isn't working for you? undici provides the Node.js implementation of fetch(), and I'd expect it to be quite robust for uploading large blobs of form data.
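For reference, here is a minimal sketch of how a multipart upload is usually assembled with the fetch API globals in Node.js 18+. The URL, token and field values are placeholders, and the field names are borrowed from the adaptor's own example. It does not reproduce or rule out the compatibility issue mentioned in the code comment.

```javascript
// Sketch only: the URL, token and audio bytes below are placeholders.
// FormData, Blob and Request are globals in Node.js 18+.
const form = new FormData();
form.append('audio_file_name', 'example');
form.append(
  'audio_file_blob',
  new Blob([new Uint8Array([1, 2, 3])]),
  'example.wav'
);

// Building a Request without sending it shows the encoding fetch() would apply:
// the multipart content type and boundary are filled in automatically.
const request = new Request('https://example.invalid/upload', {
  method: 'POST',
  headers: { Authorization: 'Bearer <token>' },
  body: form,
});

console.log(request.headers.get('content-type'));
// To actually send it: const response = await fetch(request);
```

Note that the content type is not set by hand here; a manually set `Content-Type` without the boundary is a common cause of multipart uploads being rejected.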
packages/sahara/src/Utils.js
Outdated
| // File path - use stream for efficiency
| const absPath = nodepath.resolve(fileValue);
| const fileName = nodepath.basename(absPath);
| form.append('audio_file_blob', fs.createReadStream(absPath), fileName);
Ah, here we get to the issue.
This file uploader will never work with the OpenFn app / Lightning. It'll only work on local machines with the CLI.
The CLI is a tool to test and debug workflows, but we expect real live workflows to be automated through the app. And because the app has no file system, this function just won't work.
Supporting a local file system might be OK as a local, undocumented debugging option. But for integration in the app you'll need to modify this helper to have a data stream passed in. Typically in a workflow you would download a file from S3 or something, and then pass the stream directly.
This can result in really nice workflows: you'd use one adaptor to download the data from some server, pass the stream straight into the Sahara adaptor, and then just handle the upload. That's the kind of thing OpenFn does really well.
The big caveat there is that, at the moment, OpenFn only allows you to pass JSON objects or strings between steps. We don't really support passing streams (it might work but I'd expect problems, particularly when running through the app).
So we might need to think about this.
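One possible shape for a helper that avoids the file system is sketched below. It assumes the audio arrives as in-memory data (a base64 string, which fits the JSON-only constraint between steps, or raw bytes) rather than as a path. `buildUploadForm` and `toBytes` are hypothetical names, not part of the adaptor; the field names come from the job example in this PR.

```javascript
// Sketch only: `buildUploadForm` and `toBytes` are hypothetical helpers.
// Relies on the FormData, Blob and Buffer globals available in Node.js 18+.

// Normalise the supported inputs to raw bytes.
// Assumption: a plain string is base64-encoded audio, as it would be on state.
function toBytes(audio) {
  if (typeof audio === 'string') return Buffer.from(audio, 'base64');
  if (audio instanceof Uint8Array) return audio; // also covers Buffer
  throw new TypeError('audio must be a base64 string, Buffer or Uint8Array');
}

// Build the multipart form from data only: no paths, no fs.
function buildUploadForm({ audio_file_name, audio_file_blob }) {
  const form = new FormData();
  form.append('audio_file_name', audio_file_name);
  form.append(
    'audio_file_blob',
    new Blob([toBytes(audio_file_blob)]),
    audio_file_name
  );
  return form;
}
```

The resulting form can be used as the body of a `fetch()` call. Because the helper never touches a path, the same code would run in the CLI and in the app; where the bytes came from is left to an earlier step.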
packages/sahara/src/Utils.js
Outdated
| * Logging utility with toggle support
| * Set ENABLE_LOGGING=false in environment or pass enableLogging: false in configuration to disable
| */
| export const logger = {
I'm not keen on introducing this pattern because the CLI and app already give users a bunch of ways to control and configure logging. If we're missing features I'd rather build them into our platform than build out a bespoke logging tool for this adaptor.
This is not a pattern I'd want to spread to other adaptors. So I'd really like to see this removed.
Hiya @Deehas 👋 Just checking in on this. Are you guys able to carry on based on my feedback? Is there anything I can do to help you push forwards? I know there's a lot of comments here - but the thing we absolutely have to resolve is the use of the file system. The core functions really ought to be re-written to work from streams (and we may need to investigate some core engineering in the runtime to ensure that works reliably).
Hello @josephjclark 👋, thanks a lot for the detailed review, really appreciate you taking the time to go through it. I've gone through the comments and addressed all of them except one. The remaining point is around file system usage. Is there an adaptor that uses file uploads which you could point us to as an example?
Hey @Deehas. That might be a longer conversation, I'm afraid.
There are no adaptors that use the local file system, for the simple reason that the deployed openfn platform (app.openfn.org) doesn't support a file system. It doesn't really make sense to either. How would you get files on there? OpenFn doesn't have a file management system, and isn't designed for that sort of thing.
What we'd usually see is that the files are uploaded to some third-party service, like Sharepoint (or S3, but we don't yet have an adaptor for that). Then they're downloaded into the workflow and usually saved to state as a string (I imagine base64 encoded for audio, but I've never dealt with audio), and passed on to the next step.
Taking a look over the adaptor now, I think you're doing the right thing here: the adaptor just assumes that there's a blob of audio data on state. How it got there is not the adaptor's concern.
But now you're going to ask me: 1) how do we test with local files and the CLI, and 2) how do we process these audio files in production?
For 1, I'd create a utility, outside of openfn, which generates a state object with the audio binary encoded onto it. Just a plain JS file: read an audio file in and generate a state file out. Once you've got a state file in the right shape, you can call it from the CLI.
2 depends very much on your workflow. Where is openfn running, and where are your audio files coming from in the first place? If your use case is only to use openfn locally, and not to use the app at all, then maybe we can explore some CLI-only functionality... but I'd be hesitant to get into that because it'll confuse other users. Maybe an undocumented function is OK.
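The state-generating utility suggested for point 1 could look roughly like this. It is a sketch, not part of the adaptor or the CLI: `audioFileToState` and `writeStateFile` are hypothetical names, base64 is the encoding guessed at in the comment above, and it assumes the job reads `audio_file_name` and `audio_file_blob` from `state.data`.

```javascript
// Sketch only: a plain Node.js script that lives outside OpenFn.
// `audioFileToState` and `writeStateFile` are hypothetical helper names.
const fs = require('node:fs');
const path = require('node:path');

// Read a local audio file and return a state object with the audio
// base64-encoded, so the whole state stays JSON-serialisable.
// Assumption: the job expects these two keys on state.data.
function audioFileToState(audioPath) {
  const bytes = fs.readFileSync(audioPath);
  return {
    data: {
      audio_file_name: path.basename(audioPath, path.extname(audioPath)),
      audio_file_blob: bytes.toString('base64'),
    },
  };
}

// Write the generated state to disk so it can be used as CLI input state.
function writeStateFile(audioPath, statePath) {
  const state = audioFileToState(audioPath);
  fs.writeFileSync(statePath, JSON.stringify(state, null, 2));
}
```

For example, `writeStateFile('./sample.wav', './state.json')` would produce a state file to run the job against locally. The file system is only touched by this test fixture, never by the adaptor itself.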
Summary
This PR adds the Sahara adaptor for OpenFn workflows, enabling automated voice transcription and AI-powered clinical documentation for telehealth, call center, legal, and meeting use cases.
Fixes #
Details
Add technical details of what you've changed (and why).
AI Usage
Please disclose how you've used AI in this work (it's cool, we just want to
know!):
You can read more details in our
Responsible AI Policy
Review Checklist
Before merging, the reviewer should check the following items:
production? Is it safe to release?
dev only changes don't need a changeset.