LocalAI How-tos

How-tos

These are the LocalAI how-tos.

This section includes LocalAI end-to-end examples, tutorials, and how-tos curated by the community and maintained by lunamidori5. To add your own how-tos, please open a PR on this GitHub repo - https://github.com/lunamidori5/Midori-AI-Website/tree/master/content/howtos

Programs and Demos

This section includes other programs and shows how to set up, install, and use LocalAI.

Thank you to our collaborators and volunteers

  • TwinFinz: Help with the model template files and reviewing some code
  • Crunchy: PR helping with both installers and removing the need for 7zip
  • Maxi1134: Making our new HA-OS page for setting up an LLM with HA

Subsections of LocalAI How-tos

Easy Model Setup

—– Midori AI Subsystem Manager —–

Use the model installer to install all of the base models like Llava, tts, Stable Diffusion, and more! Click Here

—– By Hand Setup —–

(You do not have to run these steps if you have already done the auto manager)

Let's learn how to set up a model. For this how-to we are going to use the Dolphin Mistral 7B model.

To download the model to your models folder, run this command in a command line of your choice.

curl -O https://tea-cup.midori-ai.xyz/download/7bmodelQ5.gguf

Each model needs at least 4 files. Without these files the model will run raw, which means you cannot change the settings of the model.

File 1 - The model's GGUF file
File 2 - The model's .yaml file
File 3 - The Chat API .tmpl file
File 4 - The Chat API helper .tmpl file

So let's fix that! We are using the name lunademo for this how-to, but you can name the files whatever you want. Let's make blank files to start with.

touch lunademo-chat.tmpl
touch lunademo-chat-block.tmpl
touch lunademo.yaml
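
Before filling in the templates, it can help to confirm that all four files really sit in the models folder. The following is a minimal sketch in Python; the helper name missing_model_files and its arguments are hypothetical and not part of LocalAI.

```python
from pathlib import Path


def missing_model_files(models_dir, name="lunademo", gguf="7bmodelQ5.gguf"):
    """Return the expected model files that are not present in models_dir."""
    expected = [
        gguf,                       # File 1: the model's GGUF file
        f"{name}.yaml",             # File 2: the model's .yaml file
        f"{name}-chat.tmpl",        # File 3: the Chat API template
        f"{name}-chat-block.tmpl",  # File 4: the Chat API helper template
    ]
    folder = Path(models_dir)
    return [f for f in expected if not (folder / f).is_file()]
```

Calling missing_model_files(".") from inside the models folder returns an empty list once all four files exist.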

Now let's edit "lunademo-chat-block.tmpl". This is the template that "Chat" trained models use, adapted for LocalAI.

<|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "user"}}user{{end}}
{{if .Content}}{{.Content}}{{end}}
<|im_end|>

For "lunademo-chat.tmpl": looking at the Hugging Face repo, this model uses the <|im_start|>assistant tag when the AI replies, so let's make sure to add that to this file. Do not add the user tag here, as we will be doing that in our yaml file!

{{.Input}}
<|im_start|>assistant
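
To see what text the two templates produce together, here is a rough Python imitation. It is only a sketch: LocalAI renders the real files with Go templates, the function names are hypothetical, and joining the rendered messages with newlines to form .Input is an assumption.

```python
KNOWN_ROLES = ("assistant", "system", "user")


def render_chat_message(role, content):
    """Imitate lunademo-chat-block.tmpl for a single message."""
    # The template only prints the role name for the three known roles.
    role_name = role if role in KNOWN_ROLES else ""
    return f"<|im_start|>{role_name}\n{content or ''}\n<|im_end|>"


def render_chat_prompt(messages):
    """Imitate lunademo-chat.tmpl: rendered messages, then the assistant tag."""
    # Assumption: rendered messages are joined with newlines to form .Input.
    rendered = "\n".join(
        render_chat_message(m["role"], m["content"]) for m in messages
    )
    return f"{rendered}\n<|im_start|>assistant"
```

Printing render_chat_prompt([{"role": "user", "content": "How are you?"}]) shows the user block followed by the open assistant tag that the model is expected to continue from.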

For the "lunademo.yaml" file, let's set it up for your computer or hardware. (Advanced yaml configs are documented separately.)

First, we are going to set up the context size.

context_size: 2000

This tells LocalAI how to load the model. Next we add our settings after that. Let's add the model's name and the model's settings. The model's name: is what you will put into your request when sending an OpenAI request to LocalAI.

name: lunademo
parameters:
  model: 7bmodelQ5.gguf

Now that LocalAI knows what file to load with our request, let's add the stopwords and template files to our model's yaml file.

stopwords:
- "user|"
- "assistant|"
- "system|"
- "<|im_end|>"
- "<|im_start|>"
template:
  chat: lunademo-chat
  chat_message: lunademo-chat-block
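
As a rough illustration of what the stopwords are for, the sketch below cuts a reply at the first stopword it contains. This is a simplification for illustration only: the backend stops generation while decoding rather than trimming afterwards, and cut_at_stopword is a hypothetical helper.

```python
STOPWORDS = ["user|", "assistant|", "system|", "<|im_end|>", "<|im_start|>"]


def cut_at_stopword(text, stopwords=STOPWORDS):
    """Return text truncated at the earliest occurrence of any stopword."""
    positions = [text.find(s) for s in stopwords if s in text]
    return text[: min(positions)] if positions else text
```

Without the stopwords, a raw reply such as "I am fine!<|im_end|>" followed by a new "<|im_start|>user" block would leak the chat tags into the answer.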

If you are running on a GPU or want to tune the model, you can add settings like these (the higher gpu_layers is, the more of the GPU is used):

f16: true
gpu_layers: 4

These let you tune the model to your liking. But be warned: you must restart LocalAI after changing a yaml file.

docker compose restart
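
After the restart, one way to confirm that LocalAI picked up the yaml file is to ask it which models it knows about. This is a minimal sketch using only the standard library. It assumes LocalAI listens on localhost:8080 and answers the OpenAI-style /v1/models route; the helper names are hypothetical.

```python
import json
import urllib.request


def model_ids(models_response):
    """Extract model names from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in models_response.get("data", [])]


def list_models(base_url="http://localhost:8080"):
    """Fetch the model list from a running LocalAI instance."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=10) as resp:
        return model_ids(json.load(resp))
```

If "lunademo" is missing from list_models(), the yaml file is probably not in the models folder or failed to parse.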

If you want to check your model's yaml, here is a full copy!

context_size: 2000
## Put settings right here for tuning!! Before name but after context_size! (remove this comment before saving the file)
name: lunademo
parameters:
  model: 7bmodelQ5.gguf
stopwords:
- "user|"
- "assistant|"
- "system|"
- "<|im_end|>"
- "<|im_start|>"
template:
  chat: lunademo-chat
  chat_message: lunademo-chat-block

Now that we have that set up, let's test it out by sending a request to LocalAI!

Easy Setup - Docker

Note

It is highly recommended to check out the Midori AI Subsystem Manager for setting up LocalAI. It does all of this for you!

  • You will need about 10 GB of RAM free
  • You will need about 15 GB of free space on the C drive for Docker Compose

We are going to run LocalAI with docker compose for this setup.

Let's set up our folders for LocalAI (run these to make the folders for you if you wish)

mkdir "LocalAI"
cd LocalAI
mkdir "models"
mkdir "images"

At this point we want to set up our .env file. Here is a copy for you to use if you wish. Make sure it is in the LocalAI folder.

## Set number of threads.
## Note: prefer the number of physical cores. Overbooking the CPU degrades performance notably.
LOCALAI_THREADS=2

## Specify a different bind address (defaults to ":8080")
# ADDRESS=127.0.0.1:8080

## Define galleries.
## Models available to install will be visible in `/models/available`
LOCALAI_GALLERIES=[{"name":"model-gallery", "url":"github:go-skynet/model-gallery/index.yaml"}, {"url": "github:go-skynet/model-gallery/huggingface.yaml","name":"huggingface"}]

## Default path for models
LOCALAI_MODELS_PATH=/models

## Enable debug mode
LOCALAI_DEBUG=true

## Disables COMPEL (lets Stable Diffusion work)
LOCALAI_COMPEL=0

## Enable/Disable single backend (useful if only one GPU is available)
# SINGLE_ACTIVE_BACKEND=true

## Specify a build type. Available: cublas, openblas, clblas.
LOCALAI_BUILD_TYPE=cublas

LOCALAI_REBUILD=true

## Enable go tags, available: stablediffusion, tts
## stablediffusion: image generation with stablediffusion
## tts: enables text-to-speech with go-piper 
## (requires LOCALAI_REBUILD=true)
#
# LOCALAI_GO_TAGS=tts

## Path where to store generated images
# LOCALAI_IMAGE_PATH=/tmp

## Specify a default upload limit in MB (whisper)
# LOCALAI_UPLOAD_LIMIT

# LOCALAI_HUGGINGFACEHUB_API_TOKEN=Token here
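
A quick sanity check of the .env file can catch an over-committed thread count before the container starts. The sketch below is a simplified parser for KEY=VALUE lines, not the parser Docker Compose uses, and the function names are hypothetical. Note that os.cpu_count() reports logical CPUs, so the physical core count the comment in the file recommends may be lower.

```python
import os


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def threads_warning(settings):
    """Return a warning if LOCALAI_THREADS exceeds the logical CPU count."""
    threads = int(settings.get("LOCALAI_THREADS", "1"))
    cpus = os.cpu_count() or 1
    if threads > cpus:
        return f"LOCALAI_THREADS={threads} exceeds the {cpus} CPUs detected"
    return None
```

For example, threads_warning(parse_env(open(".env").read())) returns None when the thread count fits the machine.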

Now that we have the .env set, let's set up our docker-compose.yaml file. The base LocalAI images are pulled from quay.io.

Recommended Midori AI - LocalAI Images

  • lunamidori5/midori_ai_subsystem_localai_cpu:master

For a full list of tags or images please check our docker repo

Base LocalAI Images

  • master
  • latest

Core Images - Smaller images without pre-downloaded Python dependencies

Images with Nvidia acceleration support

If you do not know which version of CUDA you have available, you can check with nvidia-smi or nvcc --version

Recommended Midori AI - LocalAI Images (Only Nvidia works for now)

  • lunamidori5/midori_ai_subsystem_localai_nvidia_gpu:master
  • lunamidori5/midori_ai_subsystem_localai_hipblas_gpu:master
  • lunamidori5/midori_ai_subsystem_localai_intelf16_gpu:master
  • lunamidori5/midori_ai_subsystem_localai_intelf32_gpu:master

For a full list of tags or images please check our docker repo

Base LocalAI Images

  • master-cublas-cuda11
  • master-cublas-cuda11-core
  • master-cublas-cuda11-ffmpeg
  • master-cublas-cuda11-ffmpeg-core

  • master-cublas-cuda12
  • master-cublas-cuda12-core
  • master-cublas-cuda12-ffmpeg
  • master-cublas-cuda12-ffmpeg-core

Core Images - Smaller images without pre-downloaded Python dependencies

Note that this docker-compose.yaml file is for CPU only.

services:
  localai-midori-ai-backend:
    image: lunamidori5/midori_ai_subsystem_localai_cpu:master
    ## use this for localai's base 
    ## image: quay.io/go-skynet/local-ai:master
    tty: true # enable colorized logs
    restart: always # should this be on-failure ?
    ports:
      - 8080:8080
    env_file:
      - .env
    volumes:
      - ./models:/models
      - ./images/:/tmp/generated/images/
    command: ["/usr/bin/local-ai" ]

Note that this docker-compose.yaml file is for CUDA only.

Please change the image to what you need.

services:
  localai-midori-ai-backend:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    ## use this for localai's base 
    ## image: quay.io/go-skynet/local-ai:CHANGEMETOIMAGENEEDED
    image: lunamidori5/midori_ai_subsystem_localai_nvidia_gpu:master
    tty: true # enable colorized logs
    restart: always # should this be on-failure ?
    ports:
      - 8080:8080
    env_file:
      - .env
    volumes:
      - ./models:/models
      - ./images/:/tmp/generated/images/
    command: ["/usr/bin/local-ai" ]

Make sure to save that in the root of the LocalAI folder. Then let's spin up Docker. Run this in CMD or Bash:

docker compose up -d --pull always

Now we are going to let that set up. Once it is done, let's check that our Hugging Face / LocalAI galleries are working (wait until you see this screen to do this).

You should see:

┌───────────────────────────────────────────────────┐
│                   Fiber v2.42.0                   │
│               http://127.0.0.1:8080               │
│       (bound on host 0.0.0.0 and port 8080)       │
│                                                   │
│ Handlers ............. 1  Processes ........... 1 │
│ Prefork ....... Disabled  PID ................. 1 │
└───────────────────────────────────────────────────┘
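
Instead of watching the logs for the banner, you can poll the server until it answers. The sketch below assumes LocalAI exposes a /readyz health route on port 8080; the helper names and the default attempt count are hypothetical, and the probe is passed in so it can be swapped out.

```python
import time
import urllib.error
import urllib.request


def http_ok(url):
    """Return True if a GET on url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(url="http://localhost:8080/readyz", attempts=60,
                     delay=5.0, probe=http_ok):
    """Poll url until probe(url) succeeds; return True if it became ready."""
    for _ in range(attempts):
        if probe(url):
            return True
        time.sleep(delay)
    return False
```

With LOCALAI_REBUILD=true the first start can take a long time, so a generous number of attempts is a reasonable default.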

Now that we have that set up, let's go set up a model.

Easy Setup - Embeddings

To install an embedding model, run the following command

curl http://localhost:8080/models/apply -H "Content-Type: application/json" -d '{
     "id": "model-gallery@bert-embeddings"
   }'  
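
The same install request can be sent from Python. This is a minimal sketch with the standard library; build_apply_request is a hypothetical helper, and the URL and body simply mirror the curl command above.

```python
import json
import urllib.request


def build_apply_request(model_id, base_url="http://localhost:8080"):
    """Build the POST request that asks LocalAI to install a gallery model."""
    body = json.dumps({"id": model_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/models/apply",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually send it, pass the result to urllib.request.urlopen, for example urllib.request.urlopen(build_apply_request("model-gallery@bert-embeddings")).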

When you would like to request the model from the CLI, you can run:

curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": "The food was delicious and the waiter...",
    "model": "bert-embeddings"
  }'
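
Embeddings are usually compared rather than read directly. The sketch below shows one common way to do that, cosine similarity, on vectors taken from an OpenAI-style embeddings response. The helper names are hypothetical and the response layout (data[0].embedding) is assumed to follow the OpenAI format.

```python
import math


def first_embedding(response):
    """Pull the vector out of an OpenAI-style embeddings response body."""
    return response["data"][0]["embedding"]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Two sentences with similar meaning should score closer to 1 than two unrelated ones.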

See OpenAI Embedding for more info!

Easy Setup - Stable Diffusion

—– Midori AI Subsystem Manager —–

Use the model installer to install all of the base models like Llava, tts, Stable Diffusion, and more! Click Here

—– By Hand Setup —–

(You do not have to run these steps if you have already done the auto installer)

In your models folder, make a file called stablediffusion.yaml, then edit that file with the following. (You can replace dreamlike-art/dreamlike-anime-1.0 with whatever model you would like.)

name: animagine
parameters:
  model: dreamlike-art/dreamlike-anime-1.0
backend: diffusers
cuda: true
f16: true
diffusers:
  scheduler_type: dpm_2_a

If you are using Docker, run the following in the LocalAI folder that contains the docker-compose.yaml file:

docker compose down

Then, in your .env file, make sure this line is present and uncommented.

LOCALAI_COMPEL=0

After that, we can recreate the LocalAI Docker container by running the following in the LocalAI folder that contains the docker-compose.yaml file:

docker compose up -d

Then, to download and set up the model, just send a normal OpenAI request! LocalAI will do the rest!

curl http://localhost:8080/v1/images/generations -H "Content-Type: application/json" -d '{
  "prompt": "Two Boxes, 1blue, 1red",
  "model": "animagine",
  "size": "1024x1024"
}'
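
If you send this request from Python, a small helper can validate the size string and pick the image links out of the reply. This is a sketch only: the helper names are hypothetical, and the response layout (a data list with url entries) is assumed to follow the OpenAI images format.

```python
import json


def image_request_body(prompt, model="animagine", size="1024x1024"):
    """Build the JSON body for /v1/images/generations, validating the size."""
    width, _, height = size.partition("x")
    if not (width.isdigit() and height.isdigit()):
        raise ValueError(f"size must look like 512x512, got {size!r}")
    return json.dumps({"prompt": prompt, "model": model, "size": size})


def image_urls(response):
    """Collect image URLs from an OpenAI-style images response body."""
    return [item["url"] for item in response.get("data", []) if "url" in item]
```

The first request can take a while, since the model is downloaded before the image is generated.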

Easy Request - All

Curl Request

Curl Chat API -

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "lunademo",
     "messages": [{"role": "user", "content": "How are you?"}],
     "temperature": 0.9 
   }'

This is for Python, OpenAI >= v1

OpenAI Chat API Python -

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-xxx")

messages = [
    {"role": "system", "content": "You are LocalAI, a helpful, but really confused ai, you will only reply with confused emotes"},
    {"role": "user", "content": "Hello How are you today LocalAI"},
]
completion = client.chat.completions.create(
    model="lunademo",
    messages=messages,
)

# Print only the reply text rather than the whole message object
print(completion.choices[0].message.content)

See OpenAI API for more info!

This is for Python, OpenAI == 0.28.1

OpenAI Chat API Python -

import os
import openai
openai.api_base = "http://localhost:8080/v1"
openai.api_key = "sx-xxx"
OPENAI_API_KEY = "sx-xxx"
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY

completion = openai.ChatCompletion.create(
  model="lunademo",
  messages=[
    {"role": "system", "content": "You are LocalAI, a helpful, but really confused ai, you will only reply with confused emotes"},
    {"role": "user", "content": "How are you?"}
  ]
)

print(completion.choices[0].message.content)

OpenAI Completion API Python -

import os
import openai
openai.api_base = "http://localhost:8080/v1"
openai.api_key = "sx-xxx"
OPENAI_API_KEY = "sx-xxx"
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY

completion = openai.Completion.create(
  model="lunademo",
  prompt="function downloadFile(string url, string outputPath) ",
  max_tokens=256,
  temperature=0.5)

print(completion.choices[0].text)

HA-OS (HomeLLM) x LocalAI


Home Assistant is an open-source home automation platform that allows users to control and monitor various smart devices in their homes. It supports a wide range of devices, including lights, thermostats, security systems, and more. The platform is designed to be user-friendly and customizable, enabling users to create automations and routines to make their homes more convenient and efficient. Home Assistant can be accessed through a web interface or a mobile app, and it can be installed on a variety of hardware platforms, such as Raspberry Pi or a dedicated server.

Currently, Home Assistant supports conversation-based agents and services. As of this writing, OpenAI's API is supported as a conversation agent; however, access to your home's devices and entities is possible through custom components. Local services, such as LocalAI, are also available as a drop-in replacement for OpenAI services.


In this guide I will detail the steps I've taken to get Home-LLM and LocalAI working together with Home Assistant!

This guide assumes that you already have LocalAI running (in or out of the subsystem). If that is not done, you can Follow this How To or Install Using Midori AI Subsystem!


  • 1: You will first need to follow this guide to install Home-LLM into your Home Assistant installation.

    If you simply want to install the Home-LLM component through HACS, you can press on this button:

    Open your Home Assistant instance and open a repository inside the Home Assistant Community Store.

  • 2: Add Home LLM Conversation integration to HA.

    • 1: Access the Settings page.
    • 2: Click on Devices & services.
    • 3: Click on + ADD INTEGRATION on the lower-right part of the screen.
    • 4: Type and then select Local LLM Conversation.
    • 5: Select the Generic OpenAI Compatible API.
    • 6: Enter the hostname or IP Address of your LocalAI host.
    • 7: Enter the used port (Default is 8080 / 38080).
    • 8: Enter mistral-7b-instruct-v0.3 as the Model Name*
      • Leave API Key empty
      • Do not check Use HTTPS
      • leave API Path* as /v1
    • 9: Press Next
    • 10: Select Assist under Selected LLM API
    • 11: Make sure the Prompt Format* is set to Mistral
    • 12: Make sure Enable in context learning (ICL) examples is checked.
    • 13: Press Submit
    • 14: Press Finish
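
Before submitting the form, it can be worth checking that the host, port, and API path you entered point at a live LocalAI instance. The sketch below only assembles the URL from those fields. How the integration builds its URL internally is an assumption here, and openai_base_url is a hypothetical helper.

```python
def openai_base_url(host, port=8080, use_https=False, api_path="/v1"):
    """Assemble the base URL from the fields entered in the integration form."""
    scheme = "https" if use_https else "http"
    path = "/" + api_path.strip("/")
    return f"{scheme}://{host}:{port}{path}"
```

Opening openai_base_url("192.168.1.50") + "/models" in a browser should list mistral-7b-instruct-v0.3 if the settings are right (the IP address is only an example).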


  • 3: Configure the Voice assistant.

    • 1: Access the Settings page.
    • 2: Click on Voice assistants.
    • 3: Click on + ADD ASSISTANT.
    • 4: Name the Assistant HomeLLM.
    • 5: Select English as the Language.
    • 6: Set the Conversation agent to the newly created LLM Model 'mistral-7b-instruct-v0.3' (remote).
    • 7: Set your Speech-to-text Wake word, and Text-to-speech to the ones you use. Leave to None if you don’t have any.
    • 8: Click Create
  • 4: Select the newly created voice assistant as the default one.

    • While remaining on the Voice assistants page, click on the newly created assistant and press the star at the top-right corner.

There you go! Your Assistant should now be working with LocalAI through Home-LLM!

  • Make sure that the entities you want to control are exposed to Assist within Home Assistant!
Notice

Important Note:

Any devices you choose to expose to the model will be added to the context and may have their state changed by the model. Only expose devices that you are comfortable with the model modifying, even if the modification is not what you intended. The model may occasionally hallucinate and issue commands to the wrong device. Use at your own risk.

Voice Assistant HA-OS

In this guide I will explain how I've set up my local voice assistant and satellites!

A few pieces of software will be used in this guide.

HACS for easy installation of the other tools on Home Assistant.
LocalAI for the backend of the LLM.
Home-LLM to connect our LocalAI instance to Home-assistant.
HA-Fallback-Conversation to allow HA to use both the baked-in intent as well as the LLM as a fallback if no intent is found.
Willow for the ESP32 satellites.


Step 1) Installing LocalAI

We will start by installing LocalAI on our machine learning host.
I recommend using a good machine with access to a GPU with at least 12 GB of VRAM, as Willow itself can take up to 6 GB of VRAM, with another 4-5 GB for our LLM model. I recommend keeping those loaded on the machine at all times for speedy reaction times on our satellites.

Here is an example of the VRAM usage for Willow and LocalAI with the Llama 8B model:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.02              Driver Version: 555.42.02      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        Off |   00000000:01:00.0 Off |                  N/A |
|  0%   39C    P8             16W /  370W |   10341MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      2862      C   /opt/conda/bin/python                        3646MiB |
|    0   N/A  N/A      2922      C   /usr/bin/python                              2108MiB |
|    0   N/A  N/A   2724851      C   .../backend-assets/grpc/llama-cpp-avx2       4568MiB |
+-----------------------------------------------------------------------------------------+
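
To check how much VRAM each process takes on your own machine, the Processes table can be parsed with a few lines of Python. This is a sketch that assumes the table layout shown above (column order and the MiB suffix); vram_by_process is a hypothetical helper, and other driver versions may format the table differently.

```python
import re

# One row of the Processes table: GPU, GI, CI, PID, Type, name, memory.
PROCESS_ROW = re.compile(
    r"^\|\s+\d+\s+\S+\s+\S+\s+(\d+)\s+\w\s+(\S+)\s+(\d+)MiB\s+\|$"
)


def vram_by_process(smi_output):
    """Return (pid, process name, MiB) for each row of the Processes table."""
    rows = []
    for line in smi_output.splitlines():
        match = PROCESS_ROW.match(line.strip())
        if match:
            pid, name, mib = match.groups()
            rows.append((int(pid), name, int(mib)))
    return rows
```

For the output above, the three processes add up to 3646 + 2108 + 4568 = 10322 MiB, which is close to the 10341 MiB reported as used and fits within the 12 GB recommendation.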

I've chosen the Docker Compose method for my LocalAI installation; this allows for easy management and easier upgrades when new releases are available.
It also lets us quickly create a container running LocalAI on our machine.

To do so, follow the how-to on setting up Docker Compose for LocalAI:

Setup LocalAI with Docker Compose

Once that is done, simply run docker compose up -d and your LocalAI instance should be available at: http://(hostipaddress):8080/


Step 1.a) Downloading the LLM model

Once LocalAI is installed, you should be able to browse to the "Models" tab, which redirects to http://{{host}}:8080/browse. There we will search for the mistral-7b-instruct-v0.3 model and install it.

Once that is done, make sure that the model is working by heading to the Chat tab and selecting the model mistral-7b-instruct-v0.3 and initiating a chat.



Step 2) Installing Home-LLM

  • 1: You will first need to install the Home-LLM integration to Home-Assistant
    Thankfully, there is a neat link to do that easily on their repo!

    Open your Home Assistant instance and open a repository inside the Home Assistant Community Store.

  • 2: Restart Home Assistant

  • 3: You will then need to add the Home LLM Conversation integration to Home-Assistant in order to connect LocalAI to it.

    • 1: Access the Settings page.
    • 2: Click on Devices & services.
    • 3: Click on + ADD INTEGRATION on the lower-right part of the screen.
    • 4: Type and then select Local LLM Conversation.
    • 5: Select the Generic OpenAI Compatible API.
    • 6: Enter the hostname or IP Address of your LocalAI host.
    • 7: Enter the used port (Default is 8080).
    • 8: Enter mistral-7b-instruct-v0.3 as the Model Name*
      • Leave API Key empty
      • Do not check Use HTTPS
      • leave API Path* as /v1
    • 9: Press Next
    • 10: Select Assist under Selected LLM API
    • 11: Make sure the Prompt Format* is set to Mistral
    • 12: Make sure Enable in context learning (ICL) examples is checked.
    • 13: Press Submit
    • 14: Press Finish



Step 3) Installing HA-Fallback-Conversation

  • 1: Integrate Fallback Conversation to Home-Assistant

    • 1: Access the HACS page.
    • 2: Search for Fallback
    • 3: Click on fallback_conversation.
    • 4: Click on Download and install the integration
    • 5: Restart Home Assistant for the integration to be detected.
    • 6: Access the Settings page.
    • 7: Click on Devices & services.
    • 8: Click on + ADD INTEGRATION on the lower-right part of the screen.
    • 9: Search for Fallback
    • 10: Click on Fallback Conversation Agent.
    • 11: Set the debug level at Some Debug for now.
    • 12: Click Submit
  • 2: Configure the Voice assistant within Home-assistant to use the newly added model through the Fallback Conversation Agent.

    • 1: Access the Settings page.
    • 2: Click on Devices & services.
    • 3: Click on Fallback Conversation Agent.
    • 4: Click on CONFIGURE.
    • 5: Select Home Assistant as the Primary Conversation Agent.
    • 6: Select LLM MODEL 'mistral-7b-instruct-v0.3'(remote) as the Fallback Conversation Agent.
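
The idea behind the primary and fallback agents can be sketched in a few lines of Python. This is only an illustration of the pattern, not the integration's actual code; builtin_intents and llm_agent are hypothetical stand-ins for Home Assistant's built-in intents and the LLM agent.

```python
def converse(text, primary, fallback):
    """Try the primary agent first; use the fallback only if it returns None."""
    reply = primary(text)
    if reply is not None:
        return reply
    return fallback(text)


def builtin_intents(text):
    """Hypothetical stand-in for Home Assistant's built-in intent matching."""
    known = {"turn on the kitchen light": "Turned on the kitchen light"}
    return known.get(text.lower().strip())


def llm_agent(text):
    """Hypothetical stand-in for the LLM conversation agent."""
    return f"(LLM) I will try to help with: {text}"
```

With this arrangement, simple commands are answered by the fast built-in intents, and only sentences they cannot match are sent to the LLM.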

Step 4) Selecting the right agent in the Voice assistant settings.

  • 1: Point the Voice assistant at the Fallback Conversation Agent

    • 1: Access the Settings page.
    • 2: Click on Voice assistants.
    • 3: Click on Add Assistant.
    • 4: Set the fields as wanted except for Conversation agent.
    • 5: Select Fallback Conversation Agent as the Conversation agent.

Step 5) Setting up Willow Voice assistant satellites.

Since Willow is a more complex piece of software, I will simply leave their guide here. I do recommend deploying your own Willow Inference Server in order to remain completely local!

Once the Willow satellites are connected to Home Assistant, they should automatically use your default Voice Assistant. Be sure to set the one using the fallback system as your favorite/default one!