Please describe what you are trying to do with the component
Is it possible to provide a streaming response from the Ollama model? The component currently returns the entire answer at once: https://github.com/acon96/home-llm/blob/develop/custom_components/llama_conversation/conversation.py#L1513. I checked the response from the Ollama API, and it arrives token by token:
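For illustration, a minimal sketch of what that token-by-token stream looks like when consumed directly (this is not the component's code; the URL, model name, and prompt below are placeholders, and the `requests` library is assumed):

```python
import json
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default Ollama address

# Ollama streams newline-delimited JSON objects when "stream" is true:
# each line carries a small piece of the answer, the last one has "done": true.
with requests.post(
    OLLAMA_URL,
    json={"model": "llama3", "prompt": "Turn on the kitchen light.", "stream": True},
    stream=True,
    timeout=120,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):
            break
```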
Describe the solution you'd like
The plugin would return the answer sentence by sentence. In the conversation window we would then see not one single response from the model but, for example, 4-5 one-sentence responses. This behaviour could be configurable, so it can be enabled or disabled during model configuration.
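One way such sentence-by-sentence chunking could sit on top of a token stream is sketched below (illustrative only; the regex-based splitter and the function name are assumptions, not part of the component):

```python
import re
from typing import Iterable, Iterator

# Split on whitespace that follows sentence-ending punctuation.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def sentences_from_stream(fragments: Iterable[str]) -> Iterator[str]:
    """Buffer streamed text fragments and yield each sentence as soon as it is complete."""
    buffer = ""
    for fragment in fragments:
        buffer += fragment
        parts = _SENTENCE_END.split(buffer)
        # Everything except the last part is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()
```

Each yielded sentence could then be posted to the conversation as its own message, or handed straight to TTS.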
Additional context
I am currently using an NVIDIA Jetson AGX Orin 64GB and run an 11B LLM on it (the device consumes very little electricity). However, generating a full response takes approximately 15-20 seconds. Would it be possible to reply with the sentences as the Ollama server generates them? Each sentence could then be passed to TTS (Piper) as soon as it is ready, so the remaining generation time would not be noticeable to the user.
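As a rough usage sketch of that latency-hiding idea, reusing the splitter above (the TTS hook is purely hypothetical; printing stands in for whatever call Home Assistant actually exposes for Piper):

```python
def speak_with_piper(sentence: str) -> None:
    # Hypothetical placeholder for the real TTS call (e.g. Piper via Wyoming).
    print(f"[TTS] {sentence}")

def speak_streamed_answer(fragments) -> None:
    # Hand each finished sentence to TTS while the model is still generating
    # the rest of the answer, so most of the 15-20 s is hidden behind speech.
    for sentence in sentences_from_stream(fragments):
        speak_with_piper(sentence)
```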
As far as I know, Home Assistant doesn't currently support this. As soon as Home Assistant supports it via the chat UI or via the TTS agent, I would consider doing it.