Now let's edit "lunademo-chat-block.tmpl". This is the template that “Chat” trained models use, adapted for LocalAI:
<|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "user"}}user{{end}}
{{if .Content}}{{.Content}}{{end}}
<|im_end|>
For "lunademo-chat.tmpl": looking at the Hugging Face repo, this model uses the <|im_start|>assistant tag when the AI replies, so let's make sure to add that to this file. Do not add the user tag here, as we will be doing that in our YAML file!
{{.Input}}
<|im_start|>assistant
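To see how the two templates fit together, here is a minimal Python sketch of the prompt they would produce. It is only an approximation: it assumes LocalAI renders each message with the chat-block template, joins the results with newlines, and drops them in place of {{.Input}}. The function names render_block and render_chat are made up for this illustration.

```python
def render_block(role: str, content: str) -> str:
    """Mimic lunademo-chat-block.tmpl for a single message."""
    return f"<|im_start|>{role}\n{content}\n<|im_end|>"


def render_chat(messages: list[dict]) -> str:
    """Mimic lunademo-chat.tmpl: every rendered message, then the assistant tag."""
    rendered = "\n".join(render_block(m["role"], m["content"]) for m in messages)
    return f"{rendered}\n<|im_start|>assistant"


if __name__ == "__main__":
    print(render_chat([
        {"role": "system", "content": "You are LocalAI."},
        {"role": "user", "content": "How are you?"},
    ]))
```

The trailing <|im_start|>assistant line is what cues the model to answer as the assistant.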
Now for the "lunademo.yaml" file. Let's set it up for your computer or hardware. (If you want to see advanced YAML configs - Link)
First we are going to set up the backend and context size.
context_size: 2000
This tells LocalAI how to load the model. Next we add our settings after that. Let's add the model's name and the model's settings. The name: value is what you will put into your request when sending an OpenAI request to LocalAI.
name: lunademo
parameters:
  model: 7bmodelQ5.gguf
Now that LocalAI knows what file to load with our request, let's add the stopwords and template files to our model's YAML file.
If you are running on a GPU or want to tune the model, you can add settings like the following (the higher gpu_layers is, the more of the model is offloaded to the GPU):
f16: true
gpu_layers: 4
This lets you tune the model to your liking. But be warned, you must restart LocalAI after changing a YAML file:
docker compose restart
If you want to check your model's YAML, here is a full copy!
context_size: 2000
## Put settings right here for tuning!! Before name but after Backend! (remove this comment before saving the file)
name: lunademo
parameters:
  model: 7bmodelQ5.gguf
stopwords:
- "user|"
- "assistant|"
- "system|"
- "<|im_end|>"
- "<|im_start|>"
template:
  chat: lunademo-chat
  chat_message: lunademo-chat-block
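If the stopwords list feels abstract, the Python sketch below shows the effect it is meant to have: the reply ends where the first stopword would appear, so the model cannot run on and start writing the next turn itself. This is a conceptual illustration only. The real stopping happens inside the backend while it generates, and cut_at_stopword is a made-up helper, not a LocalAI function.

```python
# Same stopwords as in lunademo.yaml.
STOPWORDS = ["user|", "assistant|", "system|", "<|im_end|>", "<|im_start|>"]


def cut_at_stopword(text: str, stopwords: list[str] = STOPWORDS) -> str:
    """Return the text up to (not including) the earliest stopword, if any."""
    positions = [text.find(word) for word in stopwords if word in text]
    return text[: min(positions)] if positions else text


if __name__ == "__main__":
    raw = "I am fine!<|im_end|>\n<|im_start|>user\nGreat"
    print(cut_at_stopword(raw))
```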
Now that we have that set up, let's test it out by sending a request to LocalAI!
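As one way to send that request without installing anything extra, here is a small sketch that uses only the Python standard library. It assumes LocalAI is listening on the default http://localhost:8080 and that the model is named lunademo as in the YAML file; build_chat_request is a helper name invented for this example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # assumption: LocalAI on the default port


def build_chat_request(model: str, messages: list[dict]) -> urllib.request.Request:
    """Build (but do not send) a POST for the OpenAI-style chat completions endpoint."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To actually send it (needs a running LocalAI instance):
#   req = build_chat_request("lunademo", [{"role": "user", "content": "How are you?"}])
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```

Building the request separately from sending it makes it easy to inspect the URL and body before anything goes over the wire.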
Easy Setup - Docker
Note
It is highly recommended to check out the Midori AI Subsystem Manager for setting up LocalAI. It does all of this for you!
You will need about 10 GB of free RAM.
You will need about 15 GB of free space on the C drive for Docker Compose.
We are going to run LocalAI with Docker Compose for this setup.
Let's set up our folders for LocalAI (run these to make the folders for you if you wish).
At this point we want to set up our .env file. Here is a copy for you to use if you wish. Make sure this is in the LocalAI folder.
## Set number of threads.
## Note: prefer the number of physical cores. Overbooking the CPU degrades performance notably.
LOCALAI_THREADS=2

## Specify a different bind address (defaults to ":8080")
# ADDRESS=127.0.0.1:8080

## Define galleries.
## models will to install will be visible in `/models/available`
LOCALAI_GALLERIES=[{"name":"model-gallery", "url":"github:go-skynet/model-gallery/index.yaml"}, {"url": "github:go-skynet/model-gallery/huggingface.yaml","name":"huggingface"}]

## Default path for models
LOCALAI_MODELS_PATH=/models

## Enable debug mode
LOCALAI_DEBUG=true

## Disables COMPEL (Lets Stable Diffuser work)
LOCALAI_COMPEL=0

## Enable/Disable single backend (useful if only one GPU is available)
# SINGLE_ACTIVE_BACKEND=true

## Specify a build type. Available: cublas, openblas, clblas.
LOCALAI_BUILD_TYPE=cublas

LOCALAI_REBUILD=true

## Enable go tags, available: stablediffusion, tts
## stablediffusion: image generation with stablediffusion
## tts: enables text-to-speech with go-piper
## (requires LOCALAI_REBUILD=true)
## LOCALAI_GO_TAGS=tts

## Path where to store generated images
# LOCALAI_IMAGE_PATH=/tmp

## Specify a default upload limit in MB (whisper)
# LOCALAI_UPLOAD_LIMIT

# LOCALAI_HUGGINGFACEHUB_API_TOKEN=Token here
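Since LOCALAI_GALLERIES has to be valid JSON squeezed onto one line, it is an easy place for a typo. Here is a rough Python sketch for sanity-checking a .env before starting the container. It assumes the simple KEY=VALUE layout shown above (no quoting or multi-line values), and parse_env and gallery_names are helper names made up for this example.

```python
import json


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def gallery_names(env: dict[str, str]) -> list[str]:
    """Decode LOCALAI_GALLERIES (a JSON list) and return the gallery names."""
    return [gallery["name"] for gallery in json.loads(env.get("LOCALAI_GALLERIES", "[]"))]


# Usage idea: env = parse_env(open(".env", encoding="utf-8").read())
#             print(env.get("LOCALAI_THREADS"), gallery_names(env))
```

If the galleries line is malformed, json.loads raises an error right away instead of the problem showing up later inside the container.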
Now that we have the .env set, let's set up our docker-compose.yaml file.
It will use a container from quay.io.
Core Images - Smaller images without pre-downloaded Python dependencies
Also note this docker-compose.yaml file is for CPU only.
services:
  localai-midori-ai-backend:
    image: lunamidori5/midori_ai_subsystem_localai_cpu:master
    ## use this for localai's base
    ## image: quay.io/go-skynet/local-ai:master
    tty: true # enable colorized logs
    restart: always # should this be on-failure ?
    ports:
      - 8080:8080
    env_file:
      - .env
    volumes:
      - ./models:/models
      - ./images/:/tmp/generated/images/
    command: ["/usr/bin/local-ai"]
Also note this docker-compose.yaml file is for CUDA only.
Please change the image to what you need.
services:
  localai-midori-ai-backend:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    ## use this for localai's base
    ## image: quay.io/go-skynet/local-ai:CHANGEMETOIMAGENEEDED
    image: lunamidori5/midori_ai_subsystem_localai_nvidia_gpu:master
    tty: true # enable colorized logs
    restart: always # should this be on-failure ?
    ports:
      - 8080:8080
    env_file:
      - .env
    volumes:
      - ./models:/models
      - ./images/:/tmp/generated/images/
    command: ["/usr/bin/local-ai"]
Make sure to save that in the root of the LocalAI folder. Then let's spin up the Docker container. Run this in CMD or Bash:
docker compose up -d --pull always
Now we are going to let that finish setting up. Once it is done, let's check to make sure our Hugging Face / LocalAI galleries are working (wait until you see this screen to do this).
When you would like to request the model from the CLI, you can run:
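One way to check the galleries is to look at the /models/available listing mentioned in the .env comments. The Python sketch below is a rough guess at how that could look: it assumes the endpoint returns a JSON list of objects that each carry a "name" key, and the helper names are invented for this example.

```python
import json


def available_models_url(host: str = "localhost", port: int = 8080) -> str:
    """URL of the gallery listing mentioned in the .env comments."""
    return f"http://{host}:{port}/models/available"


def gallery_model_names(payload: str) -> list[str]:
    """Pull the names out of the JSON list (assumed to hold objects with a "name" key)."""
    return [entry["name"] for entry in json.loads(payload) if "name" in entry]


# Against a running instance you could fetch the payload like this:
#   import urllib.request
#   with urllib.request.urlopen(available_models_url()) as resp:
#       print(gallery_model_names(resp.read().decode("utf-8"))[:10])
```

An empty list here would suggest the galleries from the .env file were not picked up.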
curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": "The food was delicious and the waiter...",
    "model": "bert-embeddings"
  }'
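If you would rather do this from Python, the sketch below builds the same JSON body as the curl call and shows where the vector sits in the reply. It assumes the response follows the OpenAI embeddings layout (a "data" list whose items have an "embedding" field); build_embeddings_body and first_embedding are names made up for this example.

```python
import json


def build_embeddings_body(text: str, model: str = "bert-embeddings") -> bytes:
    """JSON body matching the curl example."""
    return json.dumps({"input": text, "model": model}).encode("utf-8")


def first_embedding(response: dict) -> list[float]:
    """Return the first vector from an OpenAI-style embeddings response."""
    return response["data"][0]["embedding"]


# To send it (needs a running LocalAI instance with the model installed):
#   import urllib.request
#   req = urllib.request.Request(
#       "http://localhost:8080/v1/embeddings",
#       data=build_embeddings_body("The food was delicious and the waiter..."),
#       headers={"Content-Type": "application/json"},
#   )
#   with urllib.request.urlopen(req) as resp:
#       print(len(first_embedding(json.load(resp))))
```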
Use the model installer to install all of the base models like Llava, tts, Stable Diffusion, and more! Click Here
—– By Hand Setup —–
(You do not have to run these steps if you have already done the auto installer)
In your models folder, make a file called stablediffusion.yaml, then edit that file with the following. (You can replace dreamlike-art/dreamlike-anime-1.0 with whatever model you would like.)
# OpenAI Python client 1.0 or later
from openai import OpenAI

# Point the client at LocalAI instead of api.openai.com
client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-xxx")

messages = [
    {"role": "system", "content": "You are LocalAI, a helpful, but really confused ai, you will only reply with confused emotes"},
    {"role": "user", "content": "Hello How are you today LocalAI"}
]

completion = client.chat.completions.create(
    model="lunademo",
    messages=messages,
)

print(completion.choices[0].message)
# Legacy interface (openai package before 1.0)
import os
import openai

# Point the client at LocalAI instead of api.openai.com
openai.api_base = "http://localhost:8080/v1"
openai.api_key = "sx-xxx"
OPENAI_API_KEY = "sx-xxx"
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY

completion = openai.ChatCompletion.create(
    model="lunademo",
    messages=[
        {"role": "system", "content": "You are LocalAI, a helpful, but really confused ai, you will only reply with confused emotes"},
        {"role": "user", "content": "How are you?"}
    ]
)

print(completion.choices[0].message.content)
Home Assistant is an open-source home automation platform that allows users to control and monitor various smart devices in their homes. It supports a wide range of devices, including lights, thermostats, security systems, and more. The platform is designed to be user-friendly and customizable, enabling users to create automations and routines to make their homes more convenient and efficient. Home Assistant can be accessed through a web interface or a mobile app, and it can be installed on a variety of hardware platforms, such as Raspberry Pi or a dedicated server.
Currently, Home Assistant supports conversation-based agents and services. As of this writing, OpenAI's API is supported as a conversation agent; however, access to your home's devices and entities is possible through custom components. Local services, such as LocalAI, are also available as a drop-in replacement for OpenAI services.
In this guide I will detail the steps I’ve taken to get Home-LLM and Local-AI working together with Home-Assistant!
3: Click on + ADD INTEGRATION on the lower-right part of the screen.
4: Type and then select Local LLM Conversation.
5: Select the Generic OpenAI Compatible API.
6: Enter the hostname or IP Address of your LocalAI host.
7: Enter the port in use (Default is 8080 / 38080).
8: Enter mistral-7b-instruct-v0.3 as the Model Name*
Leave API Key empty
Do not check Use HTTPS
Leave API Path* as /v1
9: Press Next
10: Select Assist under Selected LLM API
11: Make sure the Prompt Format* is set to Mistral
12: Make sure Enable in context learning (ICL) examples is checked.
13: Press Submit
14: Press Finish
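If the integration cannot find the model, it can help to confirm the exact model name LocalAI reports before typing it into Home Assistant. Here is a small Python sketch, assuming LocalAI answers GET /v1/models with the OpenAI-style layout (a "data" list of objects with an "id"); model_ids and is_installed are helper names made up for this example.

```python
import json


def model_ids(models_response: str) -> list[str]:
    """Return the ids from an OpenAI-style /v1/models JSON response."""
    return [model["id"] for model in json.loads(models_response).get("data", [])]


def is_installed(models_response: str, name: str = "mistral-7b-instruct-v0.3") -> bool:
    """True if the given model name appears in the response."""
    return name in model_ids(models_response)


# Against a running instance (replace the host and port with your own):
#   import urllib.request
#   with urllib.request.urlopen("http://localhost:8080/v1/models") as resp:
#       print(is_installed(resp.read().decode("utf-8")))
```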
3: Configure the Voice assistant.
1: Access the Settings page.
2: Click on Voice assistants.
3: Click on + ADD ASSISTANT.
4: Name the Assistant HomeLLM.
5: Select English as the Language.
6: Set the Conversation agent to the newly created LLM Model 'mistral-7b-instruct-v0.3' (remote).
7: Set your Speech-to-text, Wake word, and Text-to-speech to the ones you use. Leave them set to None if you don’t have any.
8: Click Create
4: Select the newly created voice assistant as the default one.
While remaining on the Voice assistants page, click on the newly created assistant and press the star at the top-right corner.
There you go! Your Assistant should now be working with Local-AI through Home-LLM!
Make sure that the entities you want to control are exposed to Assist within Home-Assistant!
Notice
Important Note:
Any devices you choose to expose to the model will be added to the context and may have their state changed by the model. Only expose devices that you are comfortable with the model modifying, even if the modification is not what you intended. The model may occasionally hallucinate and issue commands to the wrong device. Use at your own risk.
Voice Assistant HA-OS
In this guide I will explain how I’ve set up my local voice assistant and satellites!
A few pieces of software will be used in this guide:
HACS for easy installation of the other tools on Home Assistant.
LocalAI for the backend of the LLM.
Home-LLM to connect our LocalAI instance to Home-Assistant.
HA-Fallback-Conversation to allow HA to use both the baked-in intents as well as the LLM as a fallback if no intent is found.
Willow for the ESP32 satellites.
Step 1) Installing LocalAI
We will start by installing LocalAI on our machine learning host.
I recommend using a good machine with access to a GPU with at least 12 GB of VRAM, as Willow itself can take up to 6 GB of VRAM, with another 4-5 GB for our LLM model. I recommend keeping both loaded on the machine at all times for speedy reaction times on our satellites.
Here is an example of the VRAM usage for Willow and LocalAI with the Llama 8B model:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.02 Driver Version: 555.42.02 CUDA Version: 12.5 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3090 Off | 00000000:01:00.0 Off | N/A |
| 0% 39C P8 16W / 370W | 10341MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 2862 C /opt/conda/bin/python 3646MiB |
| 0 N/A N/A 2922 C /usr/bin/python 2108MiB |
| 0 N/A N/A 2724851 C .../backend-assets/grpc/llama-cpp-avx2 4568MiB |
+-----------------------------------------------------------------------------------------+
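As a quick check on the 12 GB recommendation, the per-process figures in that output can simply be added up. The numbers in this short Python snippet are copied from the process table; nothing else is assumed.

```python
# Per-process GPU memory from the nvidia-smi process table, in MiB.
process_mib = {
    "/opt/conda/bin/python": 3646,
    "/usr/bin/python": 2108,
    ".../backend-assets/grpc/llama-cpp-avx2": 4568,
}

total_mib = sum(process_mib.values())
print(f"{total_mib} MiB = {total_mib / 1024:.1f} GiB")  # 10322 MiB = 10.1 GiB
```

That is just under the 10341MiB the header reports as used, and it leaves a little headroom on a 12 GB card.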
I’ve chosen the Docker Compose method for my LocalAI installation. This allows for easy management and easier upgrades when new releases are available, and lets us quickly create a container running LocalAI on our machine.
To do so, stop by the how-to on setting up Docker Compose for LocalAI.
Once that is done, simply use docker compose up -d and your LocalAI instance should now be available at:
http://(hostipaddress):8080/
Step 1.a) Downloading the LLM model
Once LocalAI is installed, you should be able to browse to the “Models” tab, which redirects to http://{{host}}:8080/browse. There we will search for the mistral-7b-instruct-v0.3 model and install it.
Once that is done, make sure that the model is working by heading to the Chat tab and selecting the model mistral-7b-instruct-v0.3 and initiating a chat.
Step 2) Installing Home-LLM
1: You will first need to install the Home-LLM integration into Home-Assistant.
Thankfully, there is a neat link to do that easily on their repo!
1: Integrate Fallback Conversation to Home-Assistant
1: Access the HACS page.
2: Search for Fallback
3: Click on fallback_conversation.
4: Click on Download and install the integration
5: Restart Home Assistant for the integration to be detected.
6: Access the Settings page.
7: Click on Devices & services.
8: Click on + ADD INTEGRATION on the lower-right part of the screen.
9: Search for Fallback
10: Click on Fallback Conversation Agent.
11: Set the debug level at Some Debug for now.
12: Click Submit
2: Configure the Voice assistant within Home-assistant to use the newly added model through the Fallback Conversation Agent.
1: Access the Settings page.
2: Click on Devices & services.
3: Click on Fallback Conversation Agent.
4: Click on CONFIGURE.
5: Select Home assistant as the Primary Conversation Agent.
6: Select LLM MODEL 'mistral-7b-instruct-v0.3'(remote) as the Fallback Conversation Agent.
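To make the primary/fallback idea concrete, here is a toy Python sketch of the behaviour this setup is after: the built-in agent gets the first try, and the LLM is only asked when no intent matches. It is purely conceptual and is not the integration's actual code; the agents below are stand-ins invented for the example.

```python
from typing import Callable, Optional

# An agent takes the spoken text and returns a reply, or None if it has no answer.
Agent = Callable[[str], Optional[str]]


def converse(text: str, primary: Agent, fallback: Agent) -> Optional[str]:
    """Try the primary agent first; ask the fallback only if it returns None."""
    reply = primary(text)
    return reply if reply is not None else fallback(text)


def builtin_intents(text: str) -> Optional[str]:
    """Stand-in for Home Assistant's baked-in intents: knows one command only."""
    return "Turned on the light." if text == "turn on the light" else None


def llm_agent(text: str) -> Optional[str]:
    """Stand-in for the LLM behind LocalAI."""
    return f"(LLM) You said: {text}"


if __name__ == "__main__":
    print(converse("turn on the light", builtin_intents, llm_agent))
    print(converse("tell me a joke", builtin_intents, llm_agent))
```

Simple commands stay fast because they never reach the model, while anything the built-in intents cannot handle still gets an answer.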
Step 4) Selecting the right agent in the Voice assistant settings.
1: Integrate Fallback Conversation to Home-Assistant
1: Access the Settings page.
2: Click on Voice assistants.
3: Click on Add Assistant.
4: Set the fields as wanted except for Conversation Agent.
5: Select Fallback Conversation Agent as the Conversation agent.
Step 5) Setting up Willow Voice assistant satellites.
Since Willow is a more complex piece of software, I will simply leave their guide here.
I do recommend deploying your own Willow Inference Server in order to remain completely local!
Once the Willow satellites are connected to Home Assistant, they should automatically use your default Voice Assistant.
Be sure to set the one using the fallback system as your favorite/default one!