01 The problem
The brief: take a stream of customer support requests spanning multiple companies, then classify each one and route it to the right place. The catch is the part that breaks most LLM solutions: every company has its own policies, and the one thing the system must never do is make a policy up. A confidently wrong answer about a refund window or an escalation path is worse than no answer.
So the real constraints were safety and determinism: the same input must always produce the same routing, the behaviour has to be explainable, and the system has to run reliably at scale without a model hallucinating its way into a policy it was never given.
02 What I built
A terminal-based, deterministic triage pipeline built from three cooperating parts: lexical retrieval to pull the relevant, real policy text for each request; rule-based routing to make the actual classification and destination decision from that grounded context; and modular orchestration that stitches the stages together so each one can be tested and reasoned about in isolation. No freeform generation sits in the decision path, so no model ever gets the chance to invent a policy: the routing decision is made by rules over retrieved facts.
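A minimal sketch of how the three parts could fit together, assuming a tiny in-memory policy corpus. The corpus, the keyword rules, the `human_review` fallback and every function name here are hypothetical illustrations rather than the actual implementation, and retrieval is reduced to plain token overlap to keep it short.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical policy corpus: (company, source line number, policy text).
POLICIES = [
    ("acme", 12, "Refund requests are accepted within 30 days of purchase."),
    ("acme", 40, "Invoice and billing disputes go to the finance team."),
    ("globex", 7, "Password reset requests go to the account security team."),
]

# Hypothetical routing rules, checked in a fixed order: the first matching keyword wins.
RULES = [
    ("refund", "billing"),
    ("invoice", "billing"),
    ("password", "account_security"),
]


@dataclass(frozen=True)
class Decision:
    destination: str
    source: Optional[tuple]  # (company, line) of the grounding policy text, if any


def tokenize(text: str) -> set:
    """Lowercased alphanumeric tokens; deliberately simple and deterministic."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(company: str, request: str) -> Optional[tuple]:
    """Stage 1: return (line, text) of the company's best lexically matching policy."""
    query = tokenize(request)
    best = None
    for comp, line, text in POLICIES:
        if comp != company:
            continue
        overlap = len(query & tokenize(text))
        # Strict '>' keeps the earliest line on ties, so the result is stable.
        if overlap > 0 and (best is None or overlap > best[0]):
            best = (overlap, line, text)
    return None if best is None else best[1:]


def route(company: str, request: str, hit: Optional[tuple]) -> Decision:
    """Stage 2: apply the ordered rules to the request plus its retrieved policy text."""
    if hit is None:
        # Nothing grounded to stand on: hand off rather than guess.
        return Decision("human_review", None)
    line, text = hit
    words = tokenize(request) | tokenize(text)
    for keyword, destination in RULES:
        if keyword in words:
            return Decision(destination, (company, line))
    return Decision("human_review", (company, line))


def triage(company: str, request: str) -> Decision:
    """Stage 3: orchestration only wires the stages together."""
    return route(company, request, retrieve(company, request))
```

With this toy data, `triage("acme", "I want a refund for my order")` routes to `billing` and cites `("acme", 12)`, while the same question sent to `globex`, whose corpus has no refund policy, falls back to `human_review` instead of inventing an answer.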
03 Key decisions & tradeoffs
-
Deterministic core, not generative
The classification and routing decisions come from rules over retrieved text, not from a language model's free output. Same request in, same decision out, every time.
Tradeoff: Less flexible on truly novel phrasings. That was the price of never fabricating a policy, which was the whole point of the brief.
-
Lexical retrieval over heavy embeddings
Using lexical retrieval to ground each decision in the company's own policy text kept the pipeline fast and transparent, with no embedding service to manage under time pressure.
Tradeoff: Gives up some semantic recall on paraphrased requests in return for speed and answers you can trace straight back to a source line.
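The write-up doesn't name the lexical scoring function, so as an assumption here is Okapi BM25, a standard lexical ranking function, sketched over policy lines that keep their source line numbers. The class name, parameters and sample lines are hypothetical; the point is that every hit carries the line it came from, and ties are broken by line number so the ranking is reproducible.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25Index:
    """Okapi BM25 over policy lines; every hit keeps its source line number."""

    def __init__(self, lines, k1: float = 1.2, b: float = 0.75):
        # lines: list of (source_line_number, text) pairs; must not be empty.
        if not lines:
            raise ValueError("BM25Index needs at least one policy line")
        self.k1, self.b = k1, b
        self.docs = [(num, Counter(tokenize(text)), text) for num, text in lines]
        self.lengths = [sum(tf.values()) for _, tf, _ in self.docs]
        self.avgdl = sum(self.lengths) / len(self.docs)
        df = Counter()
        for _, tf, _ in self.docs:
            df.update(tf.keys())
        n = len(self.docs)
        # Lucene-style IDF, which stays positive even for very common terms.
        self.idf = {t: math.log((n - d + 0.5) / (d + 0.5) + 1) for t, d in df.items()}

    def score(self, terms, i: int) -> float:
        _, tf, _ = self.docs[i]
        norm = self.k1 * (1 - self.b + self.b * self.lengths[i] / self.avgdl)
        return sum(
            self.idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + norm)
            for t in terms
            if t in tf
        )

    def search(self, query: str, top_k: int = 3):
        # Sorted terms give a fixed float summation order, so scores are bit-identical run to run.
        terms = sorted(set(tokenize(query)))
        scored = [
            (self.score(terms, i), self.docs[i][0], self.docs[i][2])
            for i in range(len(self.docs))
        ]
        scored = [s for s in scored if s[0] > 0]
        # Highest score first; equal scores fall back to source line order.
        scored.sort(key=lambda s: (-s[0], s[1]))
        return [(line, round(score, 3), text) for score, line, text in scored[:top_k]]
```

On three sample lines, a query such as "how many days do I have to request a refund" ranks the refund policy first because it matches both "refund" and "days". The two weaker hits match only the common word "to", which shows why a stopword list or a minimum score threshold would be a sensible addition in practice.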
-
Modular stages you can test
Retrieval, routing, and orchestration are separate modules, so each stage can be validated on its own instead of the whole system being debugged as one opaque end-to-end blob.
Tradeoff: More structure up front than a quick script, but that structure is exactly what made the design defensible under questioning.
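One way stage isolation can pay off, sketched under assumptions: if the routing stage receives its retriever as a parameter, a test can hand it canned retrieval output and check routing alone. The `Retriever` contract, `make_router`, the rules and the test cases are all hypothetical, not the project's actual modules.

```python
import unittest
from typing import Callable, Optional, Tuple

# Hypothetical stage contract: a retriever maps (company, request) to
# (source_line, policy_text), or None when nothing relevant was found.
Retriever = Callable[[str, str], Optional[Tuple[int, str]]]

# Hypothetical ordered rules: the first keyword found in the policy text wins.
RULES = [("refund", "billing"), ("password", "account_security")]


def make_router(retrieve: Retriever) -> Callable[[str, str], str]:
    """Build the routing stage around whichever retriever it is given."""

    def route(company: str, request: str) -> str:
        hit = retrieve(company, request)
        if hit is None:
            return "human_review"  # no grounded policy, so no guess
        _, policy_text = hit
        text = policy_text.lower()
        for keyword, destination in RULES:
            if keyword in text:
                return destination
        return "human_review"

    return route


class RouterStageTest(unittest.TestCase):
    """Exercises routing alone by swapping in canned retriever outputs."""

    def test_routes_on_retrieved_policy(self):
        route = make_router(lambda c, r: (12, "Refunds are accepted within 30 days."))
        self.assertEqual(route("acme", "money back please"), "billing")

    def test_abstains_without_grounding(self):
        route = make_router(lambda c, r: None)
        self.assertEqual(route("acme", "refund please"), "human_review")
```

Saved as a module, this runs with `python -m unittest`. A routing failure then points at the rules rather than at retrieval, because retrieval never ran.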
04 Outcome
The pipeline placed 70th out of 1,349 finalists at HackerRank Orchestrate in May 2026, from a field of 12,885 registrants. It was my first Orchestrate run, and the lesson that carried forward had less to do with the model than with the defense: being able to explain why a deterministic, retrieval-grounded design was the right call for a problem where a confidently wrong answer is worse than no answer. Three months later, on a harder multi-modal brief, that same habit took me to 5th.
This is the project behind the line I keep coming back to: building software is only half the challenge. Understanding and defending your technical decisions matters just as much.