Local LLM Chat App
The Local LLM Chat App brings the power of LLMs to your own machine through a real-time chat interface built with Streamlit. It uses Ollama to run models locally, FastAPI to expose them over an API, and Hugging Face for deployment, with Python managing the dependencies. The app lets you experiment with different models (such as Llama3 and Phi3) and compare their responses. The aim is to pair today's LLMs with good application design so that cutting-edge models are within easy reach of developers and enthusiasts.
🎯 Objective — Create a real-time, user-friendly chat application using Streamlit and Ollama’s local LLMs.
👉 Tech Stack — Streamlit, Ollama (e.g., Llama3, Phi3…), Python, Hugging Face, FastAPI.
Set Up Your Environment
local-llm-chat-app/
│
├── app/
│   ├── main.py              # Streamlit app
│   ├── api/
│   │   ├── server.py        # FastAPI server
│   │   ├── models/          # LLM model integration
│   │   └── utils.py         # Helper functions
│   └── assets/              # Any static files (e.g., logos, images)
│
├── requirements.txt         # Dependencies
├── Dockerfile               # For deployment on Hugging Face
├── .env                     # Environment variables
└── README.md                # Project overview
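Most of these files are created as you follow the steps below; requirements.txt, for example, can simply mirror the libraries installed in the next section (a minimal sketch, without version pins):
# requirements.txt: minimal dependency list for this project
streamlit
fastapi
uvicorn
ollama
huggingface_hub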
Set Up Python Environment
1. Create a virtual environment
python -m venv env
source env/bin/activate # Linux/Mac
.\env\Scripts\activate # Windows
2. Install essential libraries
pip install streamlit fastapi uvicorn ollama huggingface_hub
To get started with Ollama, download and install it from the official Ollama website, following the instructions for your operating system. Once it is installed, open a terminal and pull the model you want to use:
ollama pull <model-name>
# Replace <model-name> with the model you want to use, e.g. ollama pull llama3
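To confirm the download finished, you can list the models available to your local Ollama installation:
ollama list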
Build the Chat Application
Streamlit UI Development
# app/main.py
import streamlit as st
from api.utils import query_local_llm

# Basic page configuration
st.set_page_config(page_title="Local LLM Chat", page_icon=":shark:", layout="wide")
st.title("Local LLM Chat App")

# Sidebar controls for picking which local model to query
st.sidebar.header("Settings")
model_name = st.sidebar.selectbox(
    "Choose a model",
    ["llama3", "phi4"]
)

# Single-query chat box
user_input = st.text_input("Enter your query", key="user_input")
if st.button("Submit"):
    with st.spinner("Processing..."):
        response = query_local_llm(model_name, user_input)
        st.write(f"Response: {response}")

# Run the same test queries against each model to compare their responses
st.header("Compare Model Performance")
models = ["llama3", "phi4"]
queries = st.text_area("Enter test queries (one per line)")
if st.button("Compare Models"):
    for model in models:
        st.write(f"Model: {model}")
        for query in queries.splitlines():
            with st.spinner("Processing..."):
                response = query_local_llm(model, query)
                st.write(f"{query}: {response}")
Set up a function to query Ollama’s LLMs
# app/api/utils.py
import ollama

def query_local_llm(model_name, query):
    """Send a single prompt to a local Ollama model and return the generated text."""
    response = ollama.generate(model=model_name, prompt=query)
    return response['response']
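As a quick sanity check, you can call the helper directly from a Python shell before wiring it into Streamlit or FastAPI; the model name below is just an example and assumes it has already been pulled:
# Run from the app/ directory so the api package resolves, with Ollama running locally
from api.utils import query_local_llm

print(query_local_llm("llama3", "Say hello in one short sentence."))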
FastAPI Integration for APIs
# app/api/server.py
from fastapi import FastAPI
from pydantic import BaseModel
import ollama

app = FastAPI()

# Request body: which model to use and the prompt to send
class QueryRequest(BaseModel):
    model: str
    query: str

@app.post('/query')
async def query_model(request: QueryRequest):
    # Forward the prompt to the local Ollama model and return the generated text
    response = ollama.generate(model=request.model, prompt=request.query)
    return response['response']
Test server locally
uvicorn app.api.server:app --reload
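Once uvicorn is running, a small client script is a convenient way to exercise the endpoint; this sketch assumes the requests package is installed (pip install requests) and that the llama3 model is available locally:
import requests

# POST a test query to the /query endpoint served by uvicorn on its default port
payload = {"model": "llama3", "query": "Explain what a local LLM is in one sentence."}
resp = requests.post("http://127.0.0.1:8000/query", json=payload)
resp.raise_for_status()

# The endpoint returns the model's text response as JSON
print(resp.json())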
Thank you for exploring this project! I hope you found it insightful and that it inspires you to build your own applications. For the complete source code, feel free to visit the GitHub repository. If you’d like to connect, collaborate, or share feedback, you can find me on LinkedIn.
Happy coding! 🚀✨