llama integration with exception safety #2
reeshabh90 wants to merge 13 commits into as-ascii:master from
Conversation
- Changes made in docwire.cpp for default installation purposes.
- Added a configurable model load/unload feature, which gives the SDK the option to decide whether to unload the model after pipeline usage or keep it persistent for the next usage.
- Added files for local summarize and translate.
src/local_ai_translate.h
Outdated
```cpp
class DOCWIRE_LOCAL_AI_EXPORT local_translate : public model_chain_element
{
public:
	explicit local_translate(const std::string& language, std::shared_ptr<ai_runner> runner);
```
I think we can have some default model runner to simplify things if the user does not care.
For a default runner, we may redirect the prompt to the ct2 runner or something else; we keep this as is for now.
Once we have decided which models to use for each specific task, we can decide which default model runner to use when the user does not care.
Yes, it should use the default model that we choose for the particular task.
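A minimal sketch of how a defaulted runner could look. The `local_translate` and `ai_runner` names come from the diff above; `ct2_runner` here is a stand-in type, and `default_runner_for_task` is a hypothetical factory, not an existing SDK function:

```cpp
#include <memory>
#include <string>

// Minimal stand-ins for the SDK types shown in the diff.
struct ai_runner { virtual ~ai_runner() = default; };
struct ct2_runner : ai_runner {};

// Hypothetical factory: returns the runner agreed upon as the
// default for a given task when the caller does not supply one.
std::shared_ptr<ai_runner> default_runner_for_task(const std::string& /*task*/)
{
	// Assumption: the ct2 runner is the default for translation.
	return std::make_shared<ct2_runner>();
}

class local_translate
{
public:
	// A null runner means "use the default model for this task".
	explicit local_translate(const std::string& language,
	                         std::shared_ptr<ai_runner> runner = nullptr)
		: m_language(language),
		  m_runner(runner ? std::move(runner)
		                  : default_runner_for_task("translate"))
	{}

	const std::string& language() const { return m_language; }
	bool has_runner() const { return m_runner != nullptr; }

private:
	std::string m_language;
	std::shared_ptr<ai_runner> m_runner;
};
```

With this shape, existing callers that pass a runner keep working, and callers that do not care simply omit the argument.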
ports/docwire/vcpkg.json
Outdated
```json
},
"local-ai":
{
	"description": "Enable local AI runtime (ctranslate2 + sentencepiece)",
```
I think that local-ai should disable everything: the llama engine as well as the embeddings. If it covers only ctranslate2, then maybe this feature should rather be called "ctranslate2", "ct2runner", or "local-ai-ct2", something like this.
Yes, it should ideally disable or enable everything.
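One way to structure that in vcpkg.json, sketched with hypothetical feature names ("local-ai-ct2", "local-ai-llama") and a placeholder port name ("llama-cpp"); an umbrella "local-ai" feature then enables or disables everything at once:

```json
{
	"features": {
		"local-ai": {
			"description": "Enable all local AI runtimes and models",
			"dependencies": [
				{ "name": "docwire", "features": ["local-ai-ct2", "local-ai-llama"] }
			]
		},
		"local-ai-ct2": {
			"description": "CTranslate2 + SentencePiece runtime",
			"dependencies": ["ctranslate2", "sentencepiece"]
		},
		"local-ai-llama": {
			"description": "Llama.cpp engine",
			"dependencies": ["llama-cpp"]
		}
	}
}
```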
ports/docwire/vcpkg.json
Outdated
```json
		"multilingual-e5-small-ct2-int8"
	]
},
"text-gen":
```
There is some inconsistency between the "text-gen" feature and the "llm-qwen" feature. Both enable/disable a single LLM model, but the naming is different.
Moved text-gen to local-ai only.
ports/docwire/vcpkg.json
Outdated
```json
		"flan-t5-large-ct2-int8"
	]
},
"llm":
```
"llm" does not mean "local" (it can be the OpenAI API or the Gemini API as well), so maybe a better name would be something like "llama-engine" or "local-ai-llama", something like that.
```cmake
set(MODEL_NAME "qwen2-7b-instruct")
set(MODEL_QUANT "q4_k_m")

set(MODEL_FILE "${MODEL_NAME}-${MODEL_QUANT}.gguf")
```
If MODEL_QUANT is significant and there is more than one quantization on Hugging Face, then the port name should probably follow.
Yes, there are more than one. I will make amendments.
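One possible amendment, sketched under the assumption that each quantization becomes its own vcpkg feature (the feature name "q5-k-m" and the derived port naming are hypothetical):

```cmake
set(MODEL_NAME "qwen2-7b-instruct")

# Assumption: one feature per quantization, so the port/feature name
# (e.g. llm-qwen2-7b-instruct-q4-k-m) states which GGUF variant it ships.
if("q5-k-m" IN_LIST FEATURES)
	set(MODEL_QUANT "q5_k_m")
else()
	set(MODEL_QUANT "q4_k_m")
endif()

set(MODEL_FILE "${MODEL_NAME}-${MODEL_QUANT}.gguf")
```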
```diff
  // Use a practical epsilon for the squared norm to check for zero vectors.
  // This threshold is aligned with the one used for L2 normalization in
- // c2t_runner.cpp (1e-6f). The squared value is 1e-12.
+ // ct2_runner.cpp (1e-6f). The squared value is 1e-12.
```
OK, this is an unexpected topic not to forget: cosine similarity should work with other engines as well. We need to check whether this is something specific to ct2_runner or whether it is the same for Llama.cpp.
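For reference, the guard itself is engine-agnostic: it depends only on the embedding values, not on which runtime produced them. A self-contained sketch of cosine similarity with the squared-norm epsilon discussed above (the 1e-6f normalization epsilon squared gives 1e-12):

```cpp
#include <cmath>
#include <vector>

// Cosine similarity with a squared-norm guard against zero vectors.
// The 1e-6f L2-normalization epsilon becomes 1e-12f once squared.
// Nothing here is specific to ct2_runner or Llama.cpp: any engine
// producing float embeddings hits the same zero-vector edge case.
float cosine_similarity(const std::vector<float>& a, const std::vector<float>& b)
{
	float dot = 0.f, na = 0.f, nb = 0.f;
	for (std::size_t i = 0; i < a.size(); ++i)
	{
		dot += a[i] * b[i];
		na += a[i] * a[i];
		nb += b[i] * b[i];
	}
	constexpr float eps_sq = 1e-12f; // (1e-6f) squared
	if (na < eps_sq || nb < eps_sq)
		return 0.f; // treat zero (or near-zero) vectors as dissimilar
	return dot / (std::sqrt(na) * std::sqrt(nb));
}
```

The open question is only whether Llama.cpp embeddings can legitimately be near-zero more often, which would make the epsilon choice matter more there.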
```json
{
	"description": "Enable local AI runtime (ctranslate2 + sentencepiece)",
	"dependencies": [
		"ctranslate2",
```
If the dependency is disabled, the code will not compile correctly. We need additional (small) code in portfile.cmake to support the feature, and similar code in CMake to conditionally disable, for example, building the docwire_local_ai library.
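A sketch of that wiring, assuming the feature is called "local-ai" and the CMake option name `DOCWIRE_LOCAL_AI` (both assumptions, not existing names in the repo):

```cmake
# portfile.cmake: map the vcpkg feature onto a CMake option.
vcpkg_check_features(OUT_FEATURE_OPTIONS FEATURE_OPTIONS
	FEATURES
		local-ai DOCWIRE_LOCAL_AI
)
vcpkg_cmake_configure(
	SOURCE_PATH "${SOURCE_PATH}"
	OPTIONS ${FEATURE_OPTIONS}
)

# CMakeLists.txt: only build the local AI library when enabled,
# so a build without the ctranslate2 dependency still configures.
option(DOCWIRE_LOCAL_AI "Build docwire_local_ai (ctranslate2 + sentencepiece)" OFF)
if(DOCWIRE_LOCAL_AI)
	find_package(ctranslate2 CONFIG REQUIRED)
	add_subdirectory(src/local_ai) # hypothetical directory layout
endif()
```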
Features worked: