Lately, there has been a need for a private chatbot service as a complete alternative to OpenAI's ChatGPT. So, I decided to implement one at home and make it accessible to everyone in my household, alongside my network printer and NAS (OpenMediaVault).
In the past, I used to recommend the Llama series for English tasks and the Qwen series for Chinese tasks. There was no open-source model strong enough at multilingual tasks compared to proprietary ones (GPT/Claude).
However, as we all know, things have changed recently. I had been using DeepSeek-V2 occasionally whenever I got tired of Qwen2.5, but fell behind on DeepSeek V2.5 and V3 due to a lack of hardware. DeepSeek didn't let me down: R1 performs impressively, and its distilled variants go as small as 1.5B!
This means we can run it even on a CPU with a tolerable user experience, and since many people have GPUs for gaming, speed is not an issue. Letting local LLMs process uploaded documents and images is a big advantage, since OpenAI limits this usage for free accounts.
Installing Open WebUI with bundled Ollama support is very easy with the official one-line command:
```bash
docker run -d -p 3000:8080 --gpus=all -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama
```
But getting RAG (web search) working is not easy for most people, so I wanted an out-of-the-box solution.
As I mentioned in my last post, harbor is a great testbed for experimenting with different LLM stacks. But it's not only great for that; it's also an all-in-one solution for self-hosting local LLMs with RAG working out of the box. So, let's implement it from scratch. Feel free to skip steps, since most people don't start from OS installation.
## System Preparation (Optional)
As before, go through the install process using debian-11.6.0-amd64-netinst.iso.
Add the user to sudoers with usermod -aG sudo username, then reboot.
(Optional) Add extra swap:
```bash
fallocate -l 64G /home/swapfile
chmod 600 /home/swapfile
mkswap /home/swapfile
swapon /home/swapfile
```
Then make the swapfile persistent with nano /etc/fstab:
```
UUID=xxxxx-xxx swap swap defaults,pri=100 0 0
/home/swapfile swap swap defaults,pri=10 0 0
```
Check with swapon --show or free -h.
Disable the Nouveau driver:
```bash
bash -c "echo blacklist nouveau > /etc/modprobe.d/blacklist-nvidia-nouveau.conf"
bash -c "echo options nouveau modeset=0 >> /etc/modprobe.d/blacklist-nvidia-nouveau.conf"
update-initramfs -u
update-grub
reboot
```
Install dependencies:
```bash
apt install linux-headers-`uname -r` build-essential libglu1-mesa-dev libx11-dev libxi-dev libxmu-dev gcc software-properties-common sudo git python3 python3-venv pip libgl1 git-lfs -y
```
(Optional) Perform an uninstall first if needed:
```bash
apt-get purge nvidia*
apt remove nvidia*
apt-get purge cuda*
apt remove cuda*
rm /etc/apt/sources.list.d/cuda*
apt-get autoremove && apt-get autoclean
rm -rf /usr/local/cuda*
```
Install cuda-toolkit and cuda:
```bash
wget https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda-repo-debian11-12-4-local_12.4.1-550.54.15-1_amd64.deb
sudo dpkg -i cuda-repo-debian11-12-4-local_12.4.1-550.54.15-1_amd64.deb
sudo cp /var/cuda-repo-debian11-12-4-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo add-apt-repository contrib
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-4
sudo apt install libxnvctrl0=550.54.15-1
sudo apt-get install -y cuda-drivers
```
Install the NVIDIA Container Toolkit, since harbor is Docker-based:
```bash
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
```
Then sudo apt-get update and sudo apt-get install -y nvidia-container-toolkit.
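Per NVIDIA's documentation, the toolkit also needs to be wired into Docker's runtime; run this once Docker itself is installed (next section):
```bash
# Configure Docker to use the NVIDIA container runtime, then restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```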
Perform the CUDA post-install actions with nano ~/.bashrc:
```bash
export PATH=/usr/local/cuda-12.4/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
```
Then sudo update-initramfs -u, ldconfig, or source ~/.bashrc to apply the changes.
After a reboot, confirm with nvidia-smi and nvcc --version.
(Optional, not needed for harbor) Install Miniconda:
```bash
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && sudo chmod +x Miniconda3-latest-Linux-x86_64.sh && bash Miniconda3-latest-Linux-x86_64.sh
```
## Docker & Harbor
```bash
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
```
Perform the Docker post-install steps so it runs without sudo, then install harbor:
```bash
sudo groupadd docker
sudo usermod -aG docker $USER
newgrp docker
docker run hello-world
git clone https://github.com/av/harbor.git && cd harbor
./harbor.sh ln
```
Verify with harbor --version.
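(Optional) A quick smoke test to confirm containers can see the GPU; the CUDA image tag below is just an example, adjust it to your installed CUDA version:
```bash
# Should print the same table as running nvidia-smi on the host
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```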
Add RAG (web search) support to the defaults with harbor defaults add searxng.
Use harbor defaults list to check; there should now be three services active: ollama, webui, searxng.
Run harbor up to bring these services up in Docker.
Use harbor ps like docker ps, and harbor logs to tail the logs.
Now the Open WebUI frontend is served on 0.0.0.0:33801 and can be accessed at http://localhost:33801, or by LAN clients via the server's IP address.
Monitor VRAM usage with watch -n 0.3 nvidia-smi.
Monitor logs with harbor up ollama --tail or harbor logs.
All ollama commands are usable, such as harbor ollama list.
Now it's time to access it from other devices (desktop/mobile), register an admin account, and download models.
## Using Local LLM
After logging in with the admin account, click the top-right avatar icon and open Admin Panel, then Settings, or simply go to `http://ip:33801/admin/settings`.
Click Models, and at the top right click Manage Models, which looks like a download button.
Put deepseek-r1 (or any other model) in the textbox below "Pull a model from Ollama.com" and click the download button on the right side.
After the model is downloaded, a page refresh may be required; the newly downloaded model will then appear in the drop-down menu on the New Chat (home) page.
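Alternatively, since all ollama commands are proxied through harbor (as noted above), models can be pulled from the command line; the 1.5b tag below is the smallest distilled R1 variant:
```bash
# Pull the 1.5B distilled DeepSeek-R1, then confirm it shows up
harbor ollama pull deepseek-r1:1.5b
harbor ollama list
```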
Now it's not only running a chatbot alternative to ChatGPT, but also a fully functional API alternative to the OpenAI API, plus a private search engine alternative to Google!
- webui is accessible within the LAN via http://ip:33801
- ollama is accessible within the LAN via http://ip:33821
- searxng is accessible within the LAN via http://ip:33811
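For example, SearXNG can be queried directly with curl; the JSON output shown here assumes the bundled configuration enables the json format (Open WebUI's web-search integration relies on it):
```bash
# Plain HTML search page
curl "http://ip:33811/search?q=deepseek"
# JSON results, if the json format is enabled in the SearXNG settings
curl "http://ip:33811/search?q=deepseek&format=json"
```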
Call the Ollama API from any application with LLM API integration:
```
http://ip:33821/api/ps
http://ip:33821/v1/models
http://ip:33821/api/generate
http://ip:33821/v1/chat/completions
```
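For example, a minimal OpenAI-style request with curl (replace ip with the server's LAN address and adjust the model tag to one you have pulled):
```bash
curl http://ip:33821/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "deepseek-r1:1.5b",
        "messages": [{"role": "user", "content": "Why is the sky blue?"}]
      }'
```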