Local LLM Chat App
The Local LLM Chat App brings the power of LLMs to your own machine through a real-time chat interface built with Streamlit. It uses Ollama to run models locally, FastAPI to expose them over an API, and Hugging Face for deployment, with Python managing the dependencies. The app lets you experiment with different models (such as Llama3 and Phi3) and compare their responses. The aim is to pair today's LLMs with good application design so that cutting-edge models are within easy reach of developers and enthusiasts.
🎯 Objective — Create a real-time, user-friendly chat application using Streamlit and Ollama’s local LLMs.
👉 Tech Stack — Streamlit, Ollama (e.g., Llama3, Phi3…), Python, Hugging Face, FastAPI.
Set Up Your Environment
local-llm-chat-app/
│
├── app/
│   ├── main.py              # Streamlit app
│   ├── api/
│   │   ├── server.py        # FastAPI server
│   │   ├── models/          # LLM model integration
│   │   └── utils.py         # Helper functions
│   └── assets/              # Any static files (e.g., logos, images)
│
├── requirements.txt         # Dependencies
├── Dockerfile               # For deployment on Hugging Face
├── .env                     # Environment variables
└── README.md                # Project overview
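Most of these files are created as you follow the steps below; requirements.txt, for example, can simply mirror the libraries installed in the next section (a minimal sketch, without version pins):
# requirements.txt: minimal dependency list for this project
streamlit
fastapi
uvicorn
ollama
huggingface_hub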
Set Up Python Environment
1. Create a virtual environment
python -m venv env
source env/bin/activate # Linux/Mac
.\env\Scripts\activate # Windows
2. Install essential libraries
pip install streamlit fastapi uvicorn ollama huggingface_hub
To get started with Ollama, download and install it from the official Ollama website, following the instructions for your operating system. Once it is installed, open a terminal and pull the model you want to use:
ollama pull <model-name>
# Replace <model-name> with the model you want to use, e.g. ollama pull llama3
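To confirm the download finished, you can list the models available to your local Ollama installation:
ollama list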
Build the Chat Application
Streamlit UI Development
# app/main.py
import streamlit as st
from api.utils import query_local_llm

# Basic page configuration
st.set_page_config(page_title="Local LLM Chat", page_icon=":shark:", layout="wide")
st.title("Local LLM Chat App")

# Sidebar controls for picking which local model to query
st.sidebar.header("Settings")
model_name = st.sidebar.selectbox(
    "Choose a model",
    ["llama3", "phi4"]
)

# Single-query chat box
user_input = st.text_input("Enter your query", key="user_input")
if st.button("Submit"):
    with st.spinner("Processing..."):
        response = query_local_llm(model_name, user_input)
        st.write(f"Response: {response}")

# Run the same test queries against each model to compare their responses
st.header("Compare Model Performance")
models = ["llama3", "phi4"]
queries = st.text_area("Enter test queries (one per line)")
if st.button("Compare Models"):
    for model in models:
        st.write(f"Model: {model}")
        for query in queries.splitlines():
            with st.spinner("Processing..."):
                response = query_local_llm(model, query)
                st.write(f"{query}: {response}")
Set up a function to query Ollama’s LLMs
# app/api/utils.py
import ollama

def query_local_llm(model_name, query):
    """Send a single prompt to a local Ollama model and return the generated text."""
    response = ollama.generate(model=model_name, prompt=query)
    return response['response']
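As a quick sanity check, you can call the helper directly from a Python shell before wiring it into Streamlit or FastAPI; the model name below is just an example and assumes it has already been pulled:
# Run from the app/ directory so the api package resolves, with Ollama running locally
from api.utils import query_local_llm

print(query_local_llm("llama3", "Say hello in one short sentence."))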
FastAPI Integration for APIs
# app/api/server.py
from fastapi import FastAPI
from pydantic import BaseModel
import ollama

app = FastAPI()

# Request body: which model to use and the prompt to send
class QueryRequest(BaseModel):
    model: str
    query: str

@app.post('/query')
async def query_model(request: QueryRequest):
    # Forward the prompt to the local Ollama model and return the generated text
    response = ollama.generate(model=request.model, prompt=request.query)
    return response['response']
Test server locally
uvicorn app.api.server:app --reload
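Once uvicorn is running, a small client script is a convenient way to exercise the endpoint; this sketch assumes the requests package is installed (pip install requests) and that the llama3 model is available locally:
import requests

# POST a test query to the /query endpoint served by uvicorn on its default port
payload = {"model": "llama3", "query": "Explain what a local LLM is in one sentence."}
resp = requests.post("http://127.0.0.1:8000/query", json=payload)
resp.raise_for_status()

# The endpoint returns the model's text response as JSON
print(resp.json())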
Thank you for exploring this project! I hope you found it insightful and that it inspires you to build your own applications. For the complete source code, feel free to visit the GitHub repository. If you’d like to connect, collaborate, or share feedback, you can find me on LinkedIn.
Happy coding! 🚀✨