My AI Learning Setup
Intro
AI is moving at breakneck speed, and as a techie I needed a way to learn at pace. In parallel to a rather demanding (and not really 9-to-5) job, I had to find a way to build a lab where I could experiment with everything AI, improve my overall tech setup along the way, and access it all remotely from my MacBook while travelling around the world. But let's take it in order.
For the longest time, I have used commercial services - and my monthly services bill is rather staggering. So why not jump on the self-hosting bandwagon and go all in on the whole homelab idea?
I had bits and pieces of that already in place: a Synology NAS which ran Docker containers and VMs for me, and a very high-speed (some say hopelessly overpowered) network with a fixed external IP.
But it was half-hearted. More of a 'build, tinker, throw away' kind of setup. The NAS is great if you do not need a lot of processing power for your containers, but its GPU is less than impressive - certainly not enough for serious AI workloads.
So, here's a description of the current setup with my architectural choices (and cloud-based options in case you don't want to go all the way down the rabbit hole).
Networking
The first thing is networking. I have my whole house wired with Ethernet and use Wi-Fi mostly in the living room area. We get fibre here with up to 25 Gbps symmetric at a flat rate, but my Unifi equipment does not support that right now, so I took the 10 Gbps option. Key is symmetric and flat (10 Gbps up and down, one price no matter the usage). It also comes with a fixed WAN IP, which I thought I needed but figured out in the last year that I don't, because…
Tailscale
Tailscale is an amazing service. It builds a mesh of WireGuard connections between all of your devices - they can communicate on their tailnet, which is unique to you, has its own DNS service and a great SSH authorization service that does not require key distribution, and basically punches through any and all NAT setups. Completely transparent, Apple-style 'it just works' magic.
Sidebar: Family IT Support. My family is on various ISPs, in different cities and countries than me - and who takes care of the inevitable tech issues? Me, of course. And of course, you are in the same boat. I would recommend you look into Tailscale: the free tier is quite generous, it runs on all kinds of devices, and it really simplifies tech support, especially with something like RustDesk. Check out their YouTube channel for all kinds of useful info.
So, all my devices, physical and virtual, are on my tailnet - which makes the homelab less of a lab and more of a production setup, as my MacBook and iPhone are of course also always on the tailnet.
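To illustrate the "no key distribution" point, here is a minimal sketch: once two machines are on the tailnet with Tailscale SSH enabled, a login is just a hostname away (the hostname below is hypothetical):

```bash
# List the devices on your tailnet with their tailnet names and IPs
tailscale status

# Log in to another tailnet machine - authorization is handled by
# Tailscale SSH, no authorized_keys to distribute ("mynuc" is an example)
ssh root@mynuc
```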
Unifi
Unifi is enterprise-grade network kit at prosumer-grade pricing. Simple to deploy, simple to manage (OK, simple for prosumers with network experience, to be honest). It also makes a very comprehensive security and VLAN setup easy. This becomes rather important for the homelab: you really don't want homelab traffic in your main network. Combining Proxmox (see below) with VLANs makes it very easy to keep your networks properly segregated.
pi-hole
Another key component of my network setup is Pi-hole. It is a DNS server which filters out all kinds of security risks, advertisements, and tracking - at the network level. So when your browser wants to load a tracking JS file, it cannot, because the DNS server simply does not return an address. Very useful. I run two of them for DNS failover, and they are the upstream DNS servers for Tailscale, so every device connected to my tailnet gets this protection automatically with zero additional configuration. This includes my family's devices - I join them to the tailnet anyway for the above-mentioned IT support use cases - so this comes as an added benefit.
If you use it for undercover security for your family, prepare to explain why the first couple of links in Google etc. (“sponsored links”) do not work anymore…
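For reference, a minimal sketch of a Pi-hole container along these lines - the timezone, password, and paths are placeholders, not my actual values, and Pi-hole's docs are the authority on current variable names:

```yaml
# Minimal Pi-hole sketch; values are illustrative placeholders.
services:
  pihole:
    image: pihole/pihole:latest
    restart: unless-stopped
    ports:
      - "53:53/tcp"
      - "53:53/udp"
      - "80:80/tcp"   # admin UI
    environment:
      TZ: Europe/Zurich
      WEBPASSWORD: change_me   # check the Pi-hole docs for current env var names
    volumes:
      - /opt/pihole/etc-pihole:/etc/pihole
      - /opt/pihole/etc-dnsmasq.d:/etc/dnsmasq.d
```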
Server
OK, network in place. Now, where do we actually run our AI and non-AI workloads? As said before, the Synology is great for not-so-critical workloads, but it lacks in important areas. So another solution had to be found.
Hardware
For hardware, I added a mini-PC to my setup: an Asus NUC 14 Pro+ with 96 GB RAM. It runs the Proxmox Virtual Environment hypervisor to manage all containers and VMs. These mini-PCs are great - lots of power, extensible, and a small physical footprint - so the plan is to scale horizontally once the limits of the current machine are exhausted.
The downside of the NUC is of course the onboard GPU - again, not really what a modern AI workload requires. So I added an eGPU. Wikingoo makes a bare-bones eGPU "enclosure" with a Thunderbolt 4 connector. I added a be quiet! PSU (absolute overkill, but I want headroom should I choose another GPU) and an RTX 2000 Ada GPU. Its 16 GB of VRAM might become a bottleneck, but right now this is probably enough.
With Proxmox VE, you can pass the eGPU through to a VM, which is exactly what I am doing. The relevant link for this endeavour (because PCIe passthrough is an adventure!) is here.
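The broad strokes, as a hedged sketch - the PCI address and VMID below are examples, and the guide linked above covers the real details and pitfalls:

```bash
# 1. Enable IOMMU on the VE host (Intel example) in /etc/default/grub,
#    then run update-grub and reboot:
#    GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

# 2. Find the eGPU's PCI address on the host:
lspci -nn | grep -i nvidia

# 3. Attach it to the VM (VMID 100 and address 01:00 are illustrative):
qm set 100 -hostpci0 01:00,pcie=1
```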
For disk space, backup, and general NAS duty, I run a Synology 1621+, fully equipped and currently at 30 TB of space in a Synology Hybrid RAID setup with 1-disk fault tolerance and a 1 TB SSD cache group. Fast, reliable, wired connectivity via 3 trunked ports.
Hypervisor
OK, network and hardware in place. Now to the actual setup of the homelab, and which services I run to enable my AI learning journey.
Containers
I mentioned it a few times already: Proxmox is my hypervisor of choice (calling it VE going forward). You can run it for free, it is open source and based on Debian, you can cluster it across hardware (for horizontal scalability), and it comes with a lot of useful enterprise features like backup. I run Proxmox Backup Server in a dedicated VM on VE, with an NFS folder on my Synology as the datastore. Setting it up is also not trivial, so here is a guide on how to get that done.
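In outline, the datastore part looks roughly like this inside the PBS VM - IP, export path, and datastore name are placeholders, and the linked guide is the real reference:

```bash
# Mount the Synology NFS export (values are placeholders)
echo '192.168.1.10:/volume1/pbs /mnt/synology-pbs nfs defaults 0 0' >> /etc/fstab
mkdir -p /mnt/synology-pbs && mount /mnt/synology-pbs

# Register the mount as a PBS datastore
proxmox-backup-manager datastore create synology /mnt/synology-pbs
```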
The VE can run containers (LXC) or virtual machines. I choose VMs for situations where I need a full machine, for example the Backup Server or the AI server - the latter needs the PCIe GPU passthrough.
As base OS, my choice is Debian, so I have a netinstall image stored on the VE server, and both VMs and LXCs spin up very quickly. For everything I want to run in Docker, I spin up a container.
Docker deserves a mention - if possible, I run everything in Docker. It is great for separation, can be refreshed easily, is very portable, and when using docker compose, you can easily map LXC paths into the container to persist data. Everything you need is in a compose.yaml file, and you just spin it up with
```bash
docker compose pull
docker compose up -d
docker compose logs -f
```
pull gets the latest version, up -d runs the container in detached mode (daemon mode, basically), and logs -f shows the output (optional - only if you want to see what's happening after spinning it up). If you want to update the version of a running container, the complete set of commands looks like this:
```bash
docker compose down
docker compose pull
docker compose up -d
docker compose logs -f
```
Spinning up a Container
This is a quick process. Using the Proxmox UI, create a container; give it a number, a name, the number of vCPUs, disk and RAM sizes; set the network to DHCP; set the VLAN according to your VLAN setup in Unifi; start it.
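The same thing can be done from the VE shell; here is a hedged equivalent of those UI steps (container ID, template version, sizes, and VLAN tag are all examples):

```bash
pct create 105 local:vztmpl/debian-12-standard_12.7-1_amd64.tar.zst \
  --hostname ailab \
  --cores 2 --memory 2048 --rootfs local-lvm:8 \
  --net0 name=eth0,bridge=vmbr0,ip=dhcp,tag=30
```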
LXCs and Tailscale need a bit of an extra step that you do not need for a VM, mainly because an LXC does not get access to the host's /dev/net/tun device by default (without it, Tailscale has to fall back to userspace networking). Do the following steps on the VE shell, not the container shell; <number> is the number of the container you set in the Proxmox UI.
```bash
pct stop <number>
vi /etc/pve/lxc/<number>.conf
```
You can use another editor of course, but why would you? (Flashback to the vi vs. Emacs debate at university!) In vi, type
```
Go
```
(G jumps to the last line, o opens a new line below in insert mode) and paste the following two lines
```
lxc.cgroup2.devices.allow: c 10:200 rwm
lxc.mount.entry: /dev/net/tun dev/net/tun none bind,create=file
```
then press Esc and type

```
:wq
```
(this saves the file).
Last, start the container again, easiest done on the command line:
```bash
pct start <number>
```
More information about this can be found here.
Then, using the console in the Proxmox UI, log in as root and execute the following commands. They update the system to the latest version, install curl, and then use curl to install Docker and Tailscale.
```bash
apt update && apt upgrade -y
apt install -y curl

# read these scripts before trusting them (see below)
curl -fsSL https://get.docker.com | sh
curl -fsSL https://tailscale.com/install.sh | sh

# prints a login URL; --ssh enables Tailscale SSH on this device
tailscale up --ssh
```
Of course, before piping websites into sh, make sure you read what's written on them and that you trust the source. The last command ends with a URL being shown on the terminal; copy it and open it in the browser. This adds the device to your tailnet and enables the SSH key distribution feature.
Now you can use VS Code or your terminal of choice to work on the machine and don't need to go through the console. VS Code has a Tailscale integration which is pretty epic for this use case - or any other coding task.
Configure Docker
Now that you have all this in place, you can create the compose.yaml for the Docker container you want to run within the VE container. Create the persistent directories (I create them all as subdirectories of /opt) and spin up Docker as described above.
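As a minimal sketch of the pattern - the service here is just an example, the point is the /opt bind mount for persistence:

```yaml
# compose.yaml - illustrative service; the /opt path keeps data across refreshes
services:
  uptime-kuma:
    image: louislam/uptime-kuma:1
    restart: unless-stopped
    ports:
      - "3001:3001"
    volumes:
      - /opt/uptime-kuma:/app/data
```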
Reverse Proxy
Most of the software packages I am deploying run on a port other than 443 or 80, and of course, most also do not have a proper SSL setup. So, again, Tailscale to the rescue. Tailscale has a reverse proxy which automatically gets you a valid Let's Encrypt SSL certificate and exposes the service as an HTTPS service within the tailnet. While you could have several services running on a single server, I find it more useful to run one service per Proxmox container, have it visible on my tailnet, and have the service running on https:// on that container.
Here's how to set this up:
```bash
# requires MagicDNS and HTTPS certificates to be enabled for the tailnet
tailscale serve --bg <port>
```
where <port> is the port that the service runs on. Again, as mentioned above, the Tailscale YouTube channel is a great source of learning and inspiration.
Services
OK, finally, the preamble is done. Let's talk about the services I am using; there are a few key ones:
Ollama & Open WebUI
Ollama is a tool that lets you run an LLM locally, and Open WebUI is a front-end for LLMs, similar to the native front-ends of Claude or ChatGPT - and it integrates easily with Ollama.
These two run on the AI VM, so I can expose my eGPU directly to Ollama - which is kind of the whole point of having an eGPU.
Here is the gist of the docker compose file I am using for this. It is NVIDIA-specific of course and requires that the VM can see the GPU.
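Since the gist is linked rather than inlined, here is a hedged sketch of the general shape, assuming the NVIDIA Container Toolkit is installed in the VM (ports, paths, and tags are illustrative, not necessarily my exact values):

```yaml
services:
  ollama:
    image: ollama/ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - /opt/ollama:/root/.ollama      # downloaded models live here
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    restart: unless-stopped
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - /opt/open-webui:/app/backend/data
```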
If the LLM model, which you can choose and then download through Ollama directly, fits into the VRAM of the GPU, you get accelerated responses. Go play!
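Pulling and testing a model from the command line looks like this, assuming the compose file above (the model name is just an example from the Ollama library):

```bash
docker compose exec ollama ollama pull llama3.2:3b
docker compose exec ollama ollama run llama3.2:3b "Why is the sky blue?"
```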
n8n.io
n8n. Where do I start? It is a code-when-you-want (no-code if you don't want) automation service similar to IFTTT, Zapier, etc. And you can self-host it, so of course I do. Again based on Docker; here is the gist. This one requires a few more files, so I would suggest you look at the n8n pages for help.
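The core of it is a single service; a minimal sketch, assuming the default SQLite backend and HTTPS handled by tailscale serve (timezone and paths are placeholders):

```yaml
services:
  n8n:
    image: docker.n8n.io/n8nio/n8n
    restart: unless-stopped
    ports:
      - "5678:5678"
    environment:
      - GENERIC_TIMEZONE=Europe/Zurich
    volumes:
      - /opt/n8n:/home/node/.n8n   # credentials, workflows, SQLite DB
```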
Now, what makes this a standout service in my AI learning homelab is the fact that it has a really comprehensive and very good LangChain integration, enabling you to build AI agents on several levels (agents running multiple agents running…). They also have a highly interesting YouTube channel, and there is a free-tier cloud version if you cannot see yourself running this at home.
Get it, figure out what to build, watch their tutorial on agent building - and go play.
RSS
No proper learning setup would be complete without an efficient way to read news, save interesting articles into a read-it-later service, and keep a store of all the interesting things you have read or plan to read.
I used to be - as I posted elsewhere on this blog before - a Google Reader, then Feedly, and now self-hosted Miniflux user, but I recently augmented the setup with Karakeep (gist). It just hoards links: it downloads them in their entirety to disk (so if a source ever disappears, you have a copy, Wayback-Machine-style), puts an AI-generated summary on the card, tags it with AI - and makes everything searchable. Very useful.
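For reference, a sketch along the lines of Karakeep's documented compose setup - treat this as an assumption from memory and check their docs for the authoritative version; images, ports, and secrets below are illustrative:

```yaml
services:
  karakeep:
    image: ghcr.io/karakeep-app/karakeep:release
    restart: unless-stopped
    ports:
      - "3000:3000"
    environment:
      DATA_DIR: /data
      MEILI_ADDR: http://meilisearch:7700
      BROWSER_WEB_URL: http://chrome:9222
      NEXTAUTH_SECRET: change_me
      NEXTAUTH_URL: http://localhost:3000
    volumes:
      - /opt/karakeep/data:/data
  chrome:   # headless browser used for scraping/archiving pages
    image: gcr.io/zenika-hub/alpine-chrome:123
    restart: unless-stopped
    command:
      - --no-sandbox
      - --disable-gpu
      - --remote-debugging-address=0.0.0.0
      - --remote-debugging-port=9222
  meilisearch:   # full-text search backend
    image: getmeili/meilisearch:v1.13
    restart: unless-stopped
    environment:
      MEILI_NO_ANALYTICS: "true"
    volumes:
      - /opt/karakeep/meilisearch:/meili_data
```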
And if you only have one feed in your blogroll, make it HN. First of all, 'hacker' in the old-school sense of the word:
A person skilled with the use of computers that uses his talents to gain knowledge. – Urban Dictionary
Secondly, I have learned so incredibly much in the short time since I started reading HN, I can only recommend it - and not only about IT topics.
So how does my reading workflow work?
- Subscribe to the feeds you want in Miniflux, or better, in your RSS reader of choice; I am using the excellent Reeder iOS app.
- Quick-read the headlines and summaries (if available) in the Reeder app.
- Star the articles that sound vaguely, somewhat, or really interesting (this is very efficient if you have little pockets of time throughout the day and want to prefill your reading queue).
- When I have more time, I open Instapaper (my not-yet-replaced read-it-later service) and read the articles in full - or download them for flights or other types of transport where I have downtime.
- Whenever I remember something I read, I search for it in Karakeep, where Miniflux has helpfully added everything I starred in Reeder, and which Karakeep has already scraped and archived for me.
The end
That's it for now. A long post, but I wanted to share how I keep myself up to date on this fascinating topic. There are other things running in my homelab, but that might be for a separate article sometime. Right now I am trying to build an efficient graph-based RAG with n8n for all my PDFs (I am running a completely paperless office with all correspondence as OCR'ed PDFs in DEVONthink - this might also be a separate article sometime), but I have not quite grasped that yet.
