Samuel Edusa MD
Why I Stopped Paying $150 a Month to Ask My AI the Same Questions
Samuel Edusa, MD | Apr 8, 2026

Samuel Edusa, MD. "An AI Generated Sketch of a Raspberry Pi on my desk." 2026, Digital artwork generated using Google Gemini. Personal collection.
I run a personal AI assistant on a Raspberry Pi, a tiny computer the size of a credit card that sits on my desk. It helps me write code, answer questions, and manage knowledge across my projects. The problem was the bill. Every time I asked it something, it called the most expensive AI model available, even for questions it had already answered yesterday.
That's like paging a subspecialist every time you need a Band-Aid.
So I built a system that fixed it. My monthly bill dropped from around $150 to under $2. Here's how, explained in ways that don't require a computer science degree.
The Hospital Analogy
Imagine you walk into an emergency room with a paper cut. In a normal hospital, you'd see the triage nurse first. She'd look at it, hand you a bandage, and send you on your way. You'd never see a surgeon.
But what if every patient (paper cut or heart attack) went directly to the subspecialist? Not because anyone decided the case was that complex, but because the hospital didn't bother triaging. The subspecialist would spend their day on problems that any nurse could have handled, the wait times would be absurd, and the hospital would hemorrhage money.
That's how most AI setups work. Every question, no matter how simple, gets routed to the most powerful AI model. Ask it what time zone London is in? That gets billed at the same premium rate as asking it to redesign your entire software architecture. There's no triage. No matching the problem to the right level of expertise.
My system works like a well-run ER. I have ten AI models with different levels of capability. Every question goes through triage first, and each clinician works at the top of their scope.
The triage nurse (a model that costs $0.11 per million words; strictly, providers bill per million tokens, which are word fragments) does the initial assessment. Is this straightforward, or does it need to go further? About 75% of the time, it's within their scope of practice and they handle it right there. A nurse applying a bandage isn't doing less important work than a surgeon. They're doing the right work for the problem.
The resident ($0.30 per million words) picks up the next 15%. These need more clinical reasoning but are well within what a trainee with solid fundamentals can manage. The bread-and-butter cases.
The attending physician ($1 per million words) handles 6%. The cases that require the judgment that comes with years of independent practice.
The specialist ($3 per million words) handles 3%. These are focused, multi-step problems within a specific domain where general training isn't enough.
The subspecialist ($5-25 per million words) handles less than 1%. These are the rare and complex cases. The ones where you need someone who has spent years going deep on a narrow problem. They should be doing this work, not answering questions a resident could handle.
Here's the key: each doctor writes a brief note before referring up. The specialist doesn't re-examine the patient from scratch. They read the notes from everyone below them and focus only on what they couldn't figure out. By the time the subspecialist sees a case, they're reading a one-page summary, not a 50-page medical history.
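For readers who want to see the escalation loop, here is a toy sketch of a cheapest-first cascade in which each tier either answers or hands up a one-line note. It is not ConsultChain's actual code: the names `Tier`, `TierResult`, `consult`, `toy_complexity`, and `HARD_WORDS` are hypothetical, and the keyword scorer is an assumed stand-in for a real model judging whether a question is within its scope. The prices are the ones quoted above, with the subspecialist set to the top of the $5-25 range.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class TierResult:
    answer: Optional[str]  # None means "refer up"
    note: str              # the brief note handed to the next tier

@dataclass
class Tier:
    name: str
    price_per_million: float  # list price quoted in the article
    handle: Callable[[str, list[str]], TierResult]

# Hypothetical stand-in for a model's own judgment: a keyword difficulty score.
HARD_WORDS = {"debug": 1, "refactor": 2, "migrate": 3, "redesign": 4, "architecture": 4}

def toy_complexity(question: str) -> int:
    words = (w.strip("?.,!").lower() for w in question.split())
    return max((HARD_WORDS.get(w, 0) for w in words), default=0)

def make_handler(level: int) -> Callable[[str, list[str]], TierResult]:
    """Build a toy tier that answers anything at or below its level."""
    def handle(question: str, notes: list[str]) -> TierResult:
        score = toy_complexity(question)
        if score <= level:
            return TierResult(f"answered after reading {len(notes)} notes", "")
        return TierResult(None, f"difficulty {score} is above my level {level}")
    return handle

TIERS = [
    Tier("triage nurse", 0.11, make_handler(0)),
    Tier("resident", 0.30, make_handler(1)),
    Tier("attending", 1.00, make_handler(2)),
    Tier("specialist", 3.00, make_handler(3)),
    Tier("subspecialist", 25.00, make_handler(4)),  # top of the $5-25 range
]

def consult(question: str, tiers: list[Tier]) -> tuple[str, str, list[str]]:
    """Try tiers cheapest-first; each referral adds one short note, not a transcript."""
    notes: list[str] = []
    for tier in tiers:
        result = tier.handle(question, notes)
        if result.answer is not None:
            return tier.name, result.answer, notes
        notes.append(f"{tier.name}: {result.note}")
    raise RuntimeError("no tier could answer")
```

In this toy, "What time zone is London in?" stops at the triage nurse with no notes, while "Please redesign the payment architecture." climbs to the subspecialist carrying four one-line notes instead of the full history.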
The Library Analogy
The second piece is how the system remembers things.
Imagine a librarian who, every time you asked a question, walked into the stacks, pulled 200 books off the shelves, read them all, gave you an answer, and then put them all back. Tomorrow you ask the same question, and she does the whole thing again. That's how AI normally works. No memory between conversations.
My system has a different kind of library. I call it the crystal lattice, but you can think of it as a librarian who takes notes.
When the system learns something new (say, how my payment processing code works), it writes it on an index card. That card starts out in pencil. It's a guess, unverified. If the system encounters the same information from a different source, it goes over the pencil in pen. After four independent confirmations, the card gets laminated. Now it's trusted knowledge.
Laminated cards get filed in a cross-referenced system. "Payment processing" links to "error handling" links to "retry logic." Over time, the library builds itself.
Here's the clever part: if new information contradicts a laminated card, the card gets torn up. Both the old and new claims start over as pencil cards and have to prove themselves again. The library corrects itself automatically. Bad information doesn't sit there quietly getting stale. It breaks apart.
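The pencil, pen, and laminated stages can be sketched as a small state machine driven by the number of independent sources behind a card. This is an illustration under stated assumptions, not the project's implementation: `Card`, `contradict`, `PEN_AT`, and `LAMINATE_AT` are hypothetical names, and reading "four independent confirmations" as four distinct sources is my interpretation.

```python
from dataclasses import dataclass, field

PEN_AT = 2       # assumption: a second, different source inks the card in pen
LAMINATE_AT = 4  # assumption: "four independent confirmations" = four distinct sources

@dataclass
class Card:
    claim: str
    sources: set[str] = field(default_factory=set)

    @property
    def phase(self) -> str:
        n = len(self.sources)
        if n >= LAMINATE_AT:
            return "laminated"
        if n >= PEN_AT:
            return "pen"
        return "pencil"

    def confirm(self, source: str) -> None:
        # A set means one source repeating itself never counts twice,
        # so only independent confirmations move the card forward.
        self.sources.add(source)

def contradict(card: Card, new_claim: str, source: str) -> tuple[Card, Card]:
    """Tear up a card: the old and new claims both restart in pencil."""
    old = Card(card.claim)           # all confirmations wiped
    new = Card(new_claim, {source})  # backed by a single source
    return old, new
```

Keying progress on distinct sources rather than a raw counter is what makes "independent" meaningful: the same file read twice cannot laminate a card on its own.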
When someone asks a question, the librarian checks the card catalog first. If a laminated card covers it, she answers in seconds without pulling a single book off the shelf. That's zero cost.
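A catalog check that runs before any model is called might look like the sketch below, which also follows the cross-references described above. The dict-based `CATALOG` and `LINKS`, the functions `check_catalog` and `related`, and the card texts are all hypothetical placeholders; the article describes a SQLite-backed store, and a dict just keeps the example self-contained.

```python
from typing import Optional

# Hypothetical in-memory catalog: topic -> (phase, card text).
# The card texts are made-up placeholders for illustration.
CATALOG = {
    "payment processing": ("laminated", "example card text about payments"),
    "error handling": ("laminated", "example card text about errors"),
    "retry logic": ("pen", "example card text about retries"),
}
# Cross-references, following the article's example chain.
LINKS = {
    "payment processing": ["error handling"],
    "error handling": ["retry logic"],
}

def related(topic: str, depth: int = 2) -> list[str]:
    """Follow cross-references breadth-first, up to `depth` hops away."""
    found: list[str] = []
    frontier = [topic]
    for _ in range(depth):
        next_frontier = []
        for t in frontier:
            for linked in LINKS.get(t, []):
                if linked != topic and linked not in found:
                    found.append(linked)
                    next_frontier.append(linked)
        frontier = next_frontier
    return found

def check_catalog(topic: str) -> Optional[str]:
    """Answer from a laminated card with no model call; None means escalate."""
    entry = CATALOG.get(topic)
    if entry is None or entry[0] != "laminated":
        return None  # no trusted card: hand the question to the model cascade
    return entry[1]
```

Only laminated cards short-circuit the model call; a pen-stage card like "retry logic" still goes to the cascade because it hasn't earned trust yet.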
The Memory Analogy
The third piece is how the system remembers what happened in past conversations.
Think about how your brain works. If you burn your hand on a hot stove, you remember that vividly. If someone tells you a random fact at a party, you probably forget it by next week. Your brain doesn't treat all memories equally. Important ones get strengthened, irrelevant ones fade.
My system does the same thing. Every interaction creates a memory pathway. If that pathway gets used again (you ask a similar question, or the same solution works twice), it gets stronger. If a pathway leads to a mistake, it gets weaker. If it sits unused for three days, it starts fading.
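The strengthen, weaken, and fade rules can be sketched as a single pathway object. Only the three-day idle window comes from the text above; the step sizes (`REINFORCE`, `PENALTY`, `FADE_PER_DAY`) and the `Pathway` class itself are hypothetical choices for illustration, not the values ConsultChain uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical step sizes; only the three-day idle window is from the article.
REINFORCE = 0.1                 # the pathway was used again, or worked again
PENALTY = 0.2                   # the pathway led to a mistake
IDLE_GRACE = timedelta(days=3)  # no fading before this
FADE_PER_DAY = 0.05             # lost for each idle day beyond the grace period

@dataclass
class Pathway:
    strength: float     # 0.0 (gone) to 1.0 (as strong as it gets)
    last_used: datetime

    def current(self, now: datetime) -> float:
        """Strength after fading; nothing is lost in the first three idle days."""
        extra_days = max(0, (now - self.last_used - IDLE_GRACE).days)
        return max(0.0, self.strength - FADE_PER_DAY * extra_days)

    def use(self, now: datetime, succeeded: bool) -> None:
        """Hebbian-style update: success strengthens, a mistake weakens."""
        delta = REINFORCE if succeeded else -PENALTY
        self.strength = min(1.0, max(0.0, self.current(now) + delta))
        self.last_used = now
```

Computing fade lazily from `last_used`, rather than subtracting on a timer, means a pathway can be checked any number of times without being decayed twice.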
Pathways that stay strong for a full week get promoted to permanent memory. These are the system's expertise. The things it's proven it knows. They're fast to recall and resistant to being forgotten.
Every night at 3 AM, the system goes through a "sleep cycle." Just like how your brain consolidates memories during sleep, the system strengthens connections between related memories, prunes the weak ones, and promotes the strong ones. This whole maintenance process runs on the cheapest model and costs less than a penny per night.
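A nightly pass covering the pruning and promotion steps might look like this sketch. It leaves out the step that strengthens links between related memories. The one-week promotion window is from the article; the thresholds `PRUNE_BELOW` and `STRONG_AT`, the `Memory` record, and `sleep_cycle` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical thresholds; only the one-week promotion window is from the article.
PRUNE_BELOW = 0.1
STRONG_AT = 0.7
PROMOTE_AFTER = timedelta(days=7)

@dataclass
class Memory:
    key: str
    strength: float
    strong_since: Optional[datetime] = None  # start of the current strong streak
    permanent: bool = False

def sleep_cycle(memories: list[Memory], now: datetime) -> list[Memory]:
    """One nightly pass: prune weak memories, promote long-strong ones."""
    kept = []
    for m in memories:
        if m.permanent:
            kept.append(m)  # permanent memories are never pruned
            continue
        if m.strength < PRUNE_BELOW:
            continue  # pruned: dropped from the store
        if m.strength >= STRONG_AT:
            if m.strong_since is None:
                m.strong_since = now  # streak starts tonight
            elif now - m.strong_since >= PROMOTE_AFTER:
                m.permanent = True    # strong for a full week
        else:
            m.strong_since = None     # streak broken; the week starts over
        kept.append(m)
    return kept
```

Tracking when a streak started, and resetting it whenever strength dips, is one way to enforce "stays strong for a full week" rather than "was strong at some point".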
The Compounding Effect
Here's what makes this interesting beyond just saving money. The system gets cheaper over time.
Week one, the library is empty. The memory is blank. Most questions have to escalate up the chain because the system hasn't built up knowledge yet. Maybe 60-70% of questions end up needing the attending or above.
By month three, the library has hundreds of laminated cards. The memory has dozens of proven pathways. The triage nurse can now resolve 80% of questions by checking the card catalog and recalling past solutions. Not because the nurse got smarter, but because the knowledge base did.
By month six, 95% of questions are handled at the front line. The system has learned your domain. Only brand-new problems, things it has truly never encountered before, need to escalate to the specialist or subspecialist.
It's like training a new resident. The first month, they're asking the attending about everything. Six months in, they're handling most cases independently and only calling up for the unusual ones.
The Real Numbers
I run about 50 questions a day through this system. Here's what it costs:
The triage nurse handles 37-38 of those. Cost: less than a penny.
The resident handles 7-8. Cost: less than half a penny.
The attending handles 3. Cost: about a penny.
The specialist handles 1-2. Cost: about two cents.
The subspecialist handles maybe one every other day. Cost: about two cents.
Total: about six cents a day. Under two dollars a month.
Without this system, sending everything to the specialist would cost about $112 a month.
The overnight maintenance (the "sleep cycle" that prunes memories and builds the library) costs two cents a month. Total.
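The arithmetic above can be checked directly from the quoted figures. This sketch only re-adds the article's own numbers; it measures nothing. It assumes a 30-day month, takes each "less than" figure at its upper bound, and uses the midpoint of each range of question counts.

```python
from decimal import Decimal

# Daily cost per tier in cents, from the figures above
# ("less than" taken at its upper bound, "about" at face value).
DAILY_CENTS = {
    "triage nurse": Decimal("1"),    # "less than a penny"
    "resident": Decimal("0.5"),      # "less than half a penny"
    "attending": Decimal("1"),       # "about a penny"
    "specialist": Decimal("2"),      # "about two cents"
    "subspecialist": Decimal("2"),   # "about two cents"
}
# Questions per day, using the midpoint of each quoted range.
QUESTIONS_PER_DAY = {
    "triage nurse": Decimal("37.5"),
    "resident": Decimal("7.5"),
    "attending": Decimal("3"),
    "specialist": Decimal("1.5"),
    "subspecialist": Decimal("0.5"),  # one every other day
}
SLEEP_CYCLE_CENTS_PER_MONTH = Decimal("2")
DAYS_PER_MONTH = 30  # assumption

daily = sum(DAILY_CENTS.values())
monthly_dollars = (daily * DAYS_PER_MONTH + SLEEP_CYCLE_CENTS_PER_MONTH) / 100
total_questions = sum(QUESTIONS_PER_DAY.values())
share = {tier: n / total_questions for tier, n in QUESTIONS_PER_DAY.items()}

print(f"daily: {daily} cents, monthly: ${monthly_dollars:.2f}")
```

This prints `daily: 6.5 cents, monthly: $1.97`, consistent with "about six cents a day" and "under two dollars a month" even with every bound rounded up, and the question shares come out to 75%, 15%, 6%, 3%, and 1%, matching the tier split in the hospital analogy.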
Why This Matters Beyond My Desk
This isn't just about my personal AI bill. The pattern applies anywhere people use AI models.
If you're a developer running AI-powered features in an app, you're probably sending everything to one model. You're routing every question to the subspecialist when most of them could be handled at triage.
If you're a company with a customer service bot, most questions are FAQs. Those should be handled at the appropriate level, with the more capable models reserved for the cases that actually need them.
If you're a researcher running hundreds of queries against AI models, you're re-deriving the same knowledge over and over. A system that remembers and compounds would cut your costs dramatically.
The core idea is simple. Match the level of capability to the complexity of the problem, remember what you've already figured out, and let go of what isn't useful. Hospitals figured this out decades ago with triage. Every clinician works at the top of their scope. Our brains figured it out millions of years ago with synaptic pruning. AI infrastructure is just catching up.
Open Source
I released this as an open-source project called ConsultChain. It runs on a Raspberry Pi, integrates with OpenClaw (the AI assistant framework I use), and works with any combination of AI models from any provider.
If you're technically inclined: it's a five-tier model cascade with progressive context distillation, a SQLite-backed knowledge store with phase-transition semantics, and a Hebbian memory system with scheduled pruning. It exposes an MCP server for tool integration.
If you're not technically inclined: it's a system that makes AI cheaper by being smarter about who answers the question.
https://github.com/sedusa/consultchain