LocalAI How-tos

How-tos

These are the LocalAI How tos - Return to LocalAI

This section includes LocalAI end-to-end examples, tutorial and how-tos curated by the community and maintained by lunamidori5. To add your own How Tos, Please open a PR on this github - https://github.com/lunamidori5/Midori-AI

Programs and Demos

This section includes other programs and how to setup, install, and use of LocalAI.

Thank you to our collaborators and volunteers

  • TwinFinz: Help with the models template files and reviewing some code
  • Crunchy: PR helping with both installers and removing 7zip need

Subsections of LocalAI How-tos

Easy Model Setup

—– Midori AI Subsystem Manager —–

Use the model installer to install all of the base models like Llava, tts, Stable Diffusion, and more! Click Here

—– By Hand Setup —–

(You do not have to run these steps if you have already done the auto manager)

Lets learn how to setup a model, for this How To we are going to use the Dolphin Mistral 7B model.

To download the model to your models folder, run this command in a commandline of your picking.

curl -O https://tea-cup.midori-ai.xyz/download/7bmodelQ5.gguf

Each model needs at least 4 files, with out these files, the model will run raw, what that means is you can not change settings of the model.

File 1 - The model's GGUF file
File 2 - The model's .yaml file
File 3 - The Chat API .tmpl file
File 4 - The Chat API helper .tmpl file

So lets fix that! We are using lunademo name for this How To but you can name the files what ever you want! Lets make blank files to start with

touch lunademo-chat.tmpl
touch lunademo-chat-block.tmpl
touch lunademo.yaml

Now lets edit the "lunademo-chat-block.tmpl", This is the template that model “Chat” trained models use, but changed for LocalAI

<|im_start|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "user"}}user{{end}}
{{if .Content}}{{.Content}}{{end}}
<|im_end|>

For the "lunademo-chat.tmpl", Looking at the huggingface repo, this model uses the <|im_start|>assistant tag for when the AI replys, so lets make sure to add that to this file. Do not add the user as we will be doing that in our yaml file!

{{.Input}}
<|im_start|>assistant

For the "lunademo.yaml" file. Lets set it up for your computer or hardware. (If you want to see advanced yaml configs - Link)

We are going to 1st setup the backend and context size.

context_size: 2000

What this does is tell LocalAI how to load the model. Then we are going to add our settings in after that. Lets add the models name and the models settings. The models name: is what you will put into your request when sending a OpenAI request to LocalAI

name: lunademo
parameters:
  model: 7bmodelQ5.gguf

Now that LocalAI knows what file to load with our request, lets add the stopwords and template files to our models yaml file now.

stopwords:
- "user|"
- "assistant|"
- "system|"
- "<|im_end|>"
- "<|im_start|>"
template:
  chat: lunademo-chat
  chat_message: lunademo-chat-block

If you are running on GPU or want to tune the model, you can add settings like (higher the GPU Layers the more GPU used)

f16: true
gpu_layers: 4

To fully tune the model to your like. But be warned, you must restart LocalAI after changing a yaml file

docker compose restart

If you want to check your models yaml, here is a full copy!

context_size: 2000
##Put settings right here for tunning!! Before name but after Backend! (remove this comment before saving the file)
name: lunademo
parameters:
  model: 7bmodelQ5.gguf
stopwords:
- "user|"
- "assistant|"
- "system|"
- "<|im_end|>"
- "<|im_start|>"
template:
  chat: lunademo-chat
  chat_message: lunademo-chat-block

Now that we got that setup, lets test it out but sending a request to Localai!

Easy Setup - Docker

Note

It is highly recommended to check out the Midori AI Subsystem Manager for setting up LocalAI. It does all of this for you!

  • You will need about 10gb of RAM Free
  • You will need about 15gb of space free on C drive for Docker compose

We are going to run LocalAI with docker compose for this set up.

Lets setup our folders for LocalAI (run these to make the folders for you if you wish)

mkdir "LocalAI"
cd LocalAI
mkdir "models"
mkdir "images"

At this point we want to set up our .env file, here is a copy for you to use if you wish, Make sure this is in the LocalAI folder.

## Set number of threads.
## Note: prefer the number of physical cores. Overbooking the CPU degrades performance notably.
THREADS=2

## Specify a different bind address (defaults to ":8080")
# ADDRESS=127.0.0.1:8080

## Define galleries.
## models will to install will be visible in `/models/available`
GALLERIES=[{"name":"model-gallery", "url":"github:go-skynet/model-gallery/index.yaml"}, {"url": "github:go-skynet/model-gallery/huggingface.yaml","name":"huggingface"}]

## Default path for models
MODELS_PATH=/models

## Enable debug mode
DEBUG=true

## Disables COMPEL (Lets Stable Diffuser work)
COMPEL=0

## Enable/Disable single backend (useful if only one GPU is available)
# SINGLE_ACTIVE_BACKEND=true

## Specify a build type. Available: cublas, openblas, clblas.
BUILD_TYPE=cublas

## Uncomment and set to true to enable rebuilding from source
# REBUILD=true

## Enable go tags, available: stablediffusion, tts
## stablediffusion: image generation with stablediffusion
## tts: enables text-to-speech with go-piper 
## (requires REBUILD=true)
#
#GO_TAGS=tts

## Path where to store generated images
# IMAGE_PATH=/tmp

## Specify a default upload limit in MB (whisper)
# UPLOAD_LIMIT

# HUGGINGFACEHUB_API_TOKEN=Token here

Now that we have the .env set lets set up our docker-compose file. It will use a container from quay.io.

Recommened Midori AI - LocalAI Images

  • lunamidori5/midori_ai_subsystem_localai_cpu:master

For a full list of tags or images please check our docker repo

Base LocalAI Images

  • master
  • latest
  • v2.11.0
  • v2.11.0-ffmpeg
  • v2.11.0-ffmpeg-core

Core Images - Smaller images without predownload python dependencies

Images with Nvidia accelleration support

If you do not know which version of CUDA do you have available, you can check with nvidia-smi or nvcc --version

Recommened Midori AI - LocalAI Images

  • lunamidori5/midori_ai_subsystem_localai_gpu:master

For a full list of tags or images please check our docker repo

Base LocalAI Images

  • master-cublas-cuda11
  • master-cublas-cuda11-core
  • master-cublas-cuda11-ffmpeg
  • master-cublas-cuda11-ffmpeg-core
  • v2.11.0-cublas-cuda11
  • v2.11.0-cublas-cuda11-core
  • v2.11.0-cublas-cuda11-ffmpeg
  • v2.11.0-cublas-cuda11-ffmpeg-core

Core Images - Smaller images without predownload python dependencies

Images with Nvidia accelleration support

If you do not know which version of CUDA do you have available, you can check with nvidia-smi or nvcc --version

Recommened Midori AI - LocalAI Images

  • lunamidori5/midori_ai_subsystem_localai_gpu:master

For a full list of tags or images please check our docker repo

Base LocalAI Images

  • master-cublas-cuda12
  • master-cublas-cuda12-core
  • master-cublas-cuda12-ffmpeg
  • master-cublas-cuda12-ffmpeg-core
  • v2.11.0-cublas-cuda12
  • v2.11.0-cublas-cuda12-core
  • v2.11.0-cublas-cuda12-ffmpeg
  • v2.11.0-cublas-cuda12-ffmpeg-core

Core Images - Smaller images without predownload python dependencies

Also note this docker-compose file is for CPU only.

version: '3.6'

services:
  localai-midori-ai-backend:
    image: lunamidori5/midori_ai_subsystem_localai_cpu:master
    ## use this for localai's base 
    ## image: quay.io/go-skynet/local-ai:master
    tty: true # enable colorized logs
    restart: always # should this be on-failure ?
    ports:
      - 8080:8080
    env_file:
      - .env
    volumes:
      - ./models:/models
      - ./images/:/tmp/generated/images/
    command: ["/usr/bin/local-ai" ]

Also note this docker-compose file is for CUDA only.

Please change the image to what you need.

version: '3.6'

services:
  localai-midori-ai-backend:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    ## use this for localai's base 
    ## image: quay.io/go-skynet/local-ai:CHANGEMETOIMAGENEEDED
    image: lunamidori5/midori_ai_subsystem_localai_gpu:master
    tty: true # enable colorized logs
    restart: always # should this be on-failure ?
    ports:
      - 8080:8080
    env_file:
      - .env
    volumes:
      - ./models:/models
      - ./images/:/tmp/generated/images/
    command: ["/usr/bin/local-ai" ]

Make sure to save that in the root of the LocalAI folder. Then lets spin up the Docker run this in a CMD or BASH

docker compose up -d --pull always

Now we are going to let that set up, once it is done, lets check to make sure our huggingface / localai galleries are working (wait until you see this screen to do this)

You should see:

┌───────────────────────────────────────────────────┐
│                   Fiber v2.42.0                   │
│               http://127.0.0.1:8080               │
│       (bound on host 0.0.0.0 and port 8080)       │
│                                                   │
│ Handlers ............. 1  Processes ........... 1 │
│ Prefork ....... Disabled  PID ................. 1 │
└───────────────────────────────────────────────────┘

Now that we got that setup, lets go setup a model

Easy Setup - Embeddings

To install an embedding model, run the following command

curl http://localhost:8080/models/apply -H "Content-Type: application/json" -d '{
     "id": "model-gallery@bert-embeddings"
   }'  

When you would like to request the model from CLI you can do

curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": "The food was delicious and the waiter...",
    "model": "bert-embeddings"
  }'

See OpenAI Embedding for more info!

Easy Setup - Stable Diffusion

—– Midori AI Subsystem Manager —–

Use the model installer to install all of the base models like Llava, tts, Stable Diffusion, and more! Click Here

—– By Hand Setup —–

(You do not have to run these steps if you have already done the auto installer)

In your models folder make a file called stablediffusion.yaml, then edit that file with the following. (You can change dreamlike-art/dreamlike-anime-1.0 with what ever model you would like.)

name: animagine
parameters:
  model: dreamlike-art/dreamlike-anime-1.0
backend: diffusers
cuda: true
f16: true
diffusers:
  scheduler_type: dpm_2_a

If you are using docker, you will need to run in the localai folder with the docker-compose.yaml file in it

docker compose down

Then in your .env file uncomment this line.

COMPEL=0

After that we can reinstall the LocalAI docker VM by running in the localai folder with the docker-compose.yaml file in it

docker compose up -d

Then to download and setup the model, Just send in a normal OpenAI request! LocalAI will do the rest!

curl http://localhost:8080/v1/images/generations -H "Content-Type: application/json" -d '{
  "prompt": "Two Boxes, 1blue, 1red",
  "size": "1024x1024"
}'

Easy Request - All

Curl Request

Curl Chat API -

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "lunademo",
     "messages": [{"role": "user", "content": "How are you?"}],
     "temperature": 0.9 
   }'

This is for Python, OpenAI=>V1

OpenAI Chat API Python -

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-xxx")

messages = [
{"role": "system", "content": "You are LocalAI, a helpful, but really confused ai, you will only reply with confused emotes"},
{"role": "user", "content": "Hello How are you today LocalAI"}
]
completion = client.chat.completions.create(
  model="lunademo",
  messages=messages,
)

print(completion.choices[0].message)

See OpenAI API for more info!

This is for Python, OpenAI=0.28.1

OpenAI Chat API Python -

import os
import openai
openai.api_base = "http://localhost:8080/v1"
openai.api_key = "sx-xxx"
OPENAI_API_KEY = "sx-xxx"
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY

completion = openai.ChatCompletion.create(
  model="lunademo",
  messages=[
    {"role": "system", "content": "You are LocalAI, a helpful, but really confused ai, you will only reply with confused emotes"},
    {"role": "user", "content": "How are you?"}
  ]
)

print(completion.choices[0].message.content)

OpenAI Completion API Python -

import os
import openai
openai.api_base = "http://localhost:8080/v1"
openai.api_key = "sx-xxx"
OPENAI_API_KEY = "sx-xxx"
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY

completion = openai.Completion.create(
  model="lunademo",
  prompt="function downloadFile(string url, string outputPath) ",
  max_tokens=256,
  temperature=0.5)

print(completion.choices[0].text)

Home Assistant x LocalAI

Home Assistant is an open-source home automation platform that allows users to control and monitor various smart devices in their homes. It supports a wide range of devices, including lights, thermostats, security systems, and more. The platform is designed to be user-friendly and customizable, enabling users to create automations and routines to make their homes more convenient and efficient. Home Assistant can be accessed through a web interface or a mobile app, and it can be installed on a variety of hardware platforms, such as Raspberry Pi or a dedicated server.

Currently, Home Assistant supports conversation-based agents and services. As of writing this, OpenAIs API is supported as a conversation agent; however, access to your homes devices and entities is possible through custom components. Local based services, such as LocalAI, are also available as a drop-in replacement for OpenAI services.

There are multiple custom integrations available:

Please note that both of the projects are similar in term of visual interfaces, they seem to be derived from the official Home Assistant plugin: OpenAI Conversation (to be confirmed)

  • Home-LLM is a Home Assistant integration developed by Alex O’Connell (acon96) that allows for a completely local Large Language Model acting as a personal assistant. Using LocalAI as the backend is one of the supported platforms. The provided Large Language Models are specifically trained for home assistant and are therefore smaller in size.
  • Extended OpenAI Conversation uses OpenAI API’s feature of function calling to call service of Home Assistant. Is more generic and work with most of the Large Language Model.

Home-LLM

Installation Instructions – LocalAI

To install LocalAI, use our Midori AI Subsystem Manager

Installation Instructions – Home LLM (The HA plugin)

Please follow the installation instructions on Home-LLM repo to install HACS plug-in.

Setting up the plugin for HA & LocalAI

Before adding the Llama Conversation agent in Home Assistant, you must download a LLM in the LocalAI models directory. Although you may use any model you want, this specific integration uses a model that has been specifically fine-tuned to work with Home Assistant. Performance will vary widely with other models.

The models can be found on the Midori AI model repo, as a part of the LocalAI manager.

Use the Midori AI Subsystem Manager for a easy time installing models or follow Seting up a Model

Setting up the “remote” backend:

You will need the following settings in order to configure LocalAI backend:

  • Hostname: the host of the machine where LocalAI is installed and hosted.
  • Port: The port you listed in your docker-compose.yaml (normally 8080)
  • Name of the Model as exactly in the model.yaml file: This name must EXACTLY match the name as it appears in the file.

The component will validate that the selected model is available for use and will ensure it is loaded remotely.

Once you have this information, proceed to “add Integration” in Home Assistant and search for “Llama Conversation” Here you will be greeted with a config flow to add the above information. Once the information is accepted, search your integrations for “Llama Conversation” and you can now view your settings including prompt, temperature, top K and other parameters. For LocalAI use, please make sure to select that ChatML prompt and to use ‘Use chat completions endpoint’.

Configuring the component as a Conversation Agent

In order to utilize the conversation agent in HomeAssistant, you will need to configure it as a conversation agent. This can be done by following the the instructions here.

Note

ANY DEVICES THAT YOU SELECT TO BE EXPOSED TO THE MODEL WILL BE ADDED AS CONTEXT AND POTENTIALLY HAVE THEIR STATE CHANGED BY THE MODEL. ONLY EXPOSE DEVICES THAT YOU ARE OK WITH THE MODEL MODIFYING THE STATE OF, EVEN IF IT IS NOT WHAT YOU REQUESTED. THE MODEL MAY OCCASIONALLY HALLUCINATE AND ISSUE COMMANDS TO THE WRONG DEVICE! USE AT YOUR OWN RISK.

Changing the prompt

Example on how to use the prompt can be seen here.

Extended OpenAI Conversation

The project has been introduced here, and the Documentation is available directly on the author github project

Setup summary

LocalAI must be working with an installed LLM. You can directly ask the model if he is compatible with Home Assistant. To be confirmed: the model may work evene if it says he is not compatible. Mistral and Mixtral are compatible. Then install the Home Assistant integration, and follow the documentation provided above. High level Overview of the setup:

  • add the repository in HACS.
  • install the integration.
  • fill the needed information. You must fill something in the API key (if you don’t use api key just check the box “ignore authentication”), put the full url e.g. https://myLocalAIHostHere:8080/v1 (including /v1), Not sure: let the API version empty.
  • configure the Home Assistant Assist using the new conversation agent.