
llama integration with exception safety #2

Open
reeshabh90 wants to merge 13 commits into as-ascii:master from reeshabh90:llama-integration

Conversation

@reeshabh90

Features implemented:

  1. Llama.cpp integration as one of the engines in the docwire SDK.
  2. Ensured exception safety for the llama_runner class.
  3. Added a configurable model load/unload feature, which gives the SDK the
     option to either unload the model after pipeline usage or keep it
     persistent for the next use.
  4. Added files for local summarize and translate:
class DOCWIRE_LOCAL_AI_EXPORT local_translate : public model_chain_element
{
public:
explicit local_translate(const std::string& language, std::shared_ptr<ai_runner> runner);
Owner


I think we can have some default model runner to simplify things if the user does not care.

Author


For a default runner, we could redirect the prompt to the ct2 runner; otherwise, let's keep this as-is for now.

Once we have decided which models to use for each specific task, we can decide which default model runner to use when the user does not care.

Owner


Yes, it should use the default model that we choose for the particular task.

},
"local-ai":
{
"description": "Enable local AI runtime (ctranslate2 + sentencepiece)",
Owner


I think that local-ai should disable everything: the llama engine as well as embeddings. If it covers only ctranslate2, then maybe this feature should rather be called "ctranslate2", "ct2runner", or "local-ai-ct2" - something like that.

Author


Yes, it should ideally enable or disable everything.

"multilingual-e5-small-ct2-int8"
]
},
"text-gen":
Owner


There is some inconsistency between the "text-gen" feature and the "llm-qwen" feature. Both enable/disable a single LLM model, but the naming is different.

Author


Moved text-gen into local-ai only.

"flan-t5-large-ct2-int8"
]
},
"llm":
Owner


"llm" does not mean "local" (it can be the OpenAI API or Gemini API as well), so maybe a better name would be something like "llama-engine" or "local-ai-llama", something like that.

Author


Agreed. Renamed it.

set(MODEL_NAME "qwen2-7b-instruct")
set(MODEL_QUANT "q4_k_m")

set(MODEL_FILE "${MODEL_NAME}-${MODEL_QUANT}.gguf")
Owner


If MODEL_QUANT is significant and there is more than one quantization on Hugging Face, then the port name should probably follow.

Author


Yes, there is more than one. I will make amendments.

// Use a practical epsilon for the squared norm to check for zero vectors.
// This threshold is aligned with the one used for L2 normalization in
// c2t_runner.cpp (1e-6f). The squared value is 1e-12.
// ct2_runner.cpp (1e-6f). The squared value is 1e-12.
Owner


OK, this is an unexpected topic not to forget: cosine similarity should work with other engines as well. We need to check whether this is something specific to ct2_runner or whether it is the same for Llama.cpp.

{
"description": "Enable local AI runtime (ctranslate2 + sentencepiece)",
"dependencies": [
"ctranslate2",
Owner


If a dependency is disabled, the code will not compile correctly. We need some additional (small) code in portfile.cmake to support a feature, and similar code in CMake to conditionally disable, for example, building the docwire_local_ai library.
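The wiring this comment asks for could look roughly like the following. This is a hedged sketch, not DocWire's actual build files: the CMake option name `DOCWIRE_WITH_LOCAL_AI` and the `local_ai` subdirectory are assumptions; `vcpkg_check_features` is the standard vcpkg helper for mapping features to CMake options.

```cmake
# portfile.cmake: translate vcpkg features into CMake options (names illustrative)
vcpkg_check_features(OUT_FEATURE_OPTIONS FEATURE_OPTIONS
    FEATURES
        local-ai DOCWIRE_WITH_LOCAL_AI
)
vcpkg_cmake_configure(
    SOURCE_PATH "${SOURCE_PATH}"
    OPTIONS ${FEATURE_OPTIONS}
)

# CMakeLists.txt: skip the library entirely when the feature is off
if(DOCWIRE_WITH_LOCAL_AI)
    add_subdirectory(local_ai) # builds docwire_local_ai
endif()
```

With this shape, disabling the local-ai feature removes both the ctranslate2/sentencepiece dependencies and the code that would fail to compile without them.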
