Build RAG with ChromaDB, LangChain, and Hugging Face
Or: How I taught my computer to become a docs expert so I don't have to read docs anymore
Let's be honest - we've all been there. You're trying to learn a new technology (in this case, Bun, the blazingly fast JavaScript runtime), and you're faced with a mountain of documentation. You know the answer to your question is somewhere in those docs, but finding it feels like searching for a needle in a haystack.
That's exactly what happened to me with Bun. Here I was, excited to try this new JavaScript runtime that promises to be 4x faster than Node.js, but I kept getting lost in the documentation maze. Questions like "How do I install this thing?" or "What's this bun-flavored TOML I keep hearing about?" required me to dig through multiple markdown files.
So, being a developer, I did what any reasonable person would do: I built a robot to read the docs for me! 🤖
RAG isn't just a fancy acronym - it's like having a super-smart research assistant that can find the right passage in your docs, answer your question in plain language, and point you to the source it used.
Think of it as Google search + ChatGPT, but grounded in your own documentation and a lot more accurate.
Here's what I used to build this documentation wizard:
- Embeddings: intfloat/multilingual-e5-large-instruct (fancy name for "turns text into numbers that computers understand")
- LLM: Qwen/Qwen3-32B (the brain that generates human-like responses)

First step: load and chunk the documentation.

from langchain_community.document_loaders import DirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load all those markdown files
loader = DirectoryLoader("docs/bun", glob="*.md")
documents = loader.load()
# Chop them into bite-sized chunks
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=300,
chunk_overlap=100, # Some overlap to maintain context
)
chunks = text_splitter.split_documents(documents)
I split the Bun documentation into 300-character chunks with 100 characters of overlap. Why? Because AI models work better with smaller, focused pieces of information rather than entire documents. It's like the difference between asking someone to remember a paragraph vs. asking them to memorize a whole book.
import os
from typing import List
from huggingface_hub import InferenceClient
from langchain_core.embeddings import Embeddings

# Assumes the HUGGIN_FACE_KEY from the .env file has been loaded into the environment
_inference_client = InferenceClient(token=os.environ["HUGGIN_FACE_KEY"])

class CustomHuggingFaceEmbeddings(Embeddings):
    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        embeddings = []
        for text in texts:
            result = _inference_client.feature_extraction(
                text,
                model="intfloat/multilingual-e5-large-instruct",
            )
            embeddings.append(result)
        return embeddings
This is where the magic starts. Each chunk of text gets converted into a vector (a list of numbers) that represents its meaning in mathematical space. Similar concepts end up close together in this vector space - it's like organizing books in a library, but in 1024 dimensions instead of just shelves!
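If you want to see that "closeness" for yourself, here is a tiny sketch (my own example sentences, reusing the CustomHuggingFaceEmbeddings class above) that compares two snippets with cosine similarity:

import numpy as np

emb = CustomHuggingFaceEmbeddings()
a, b = emb.embed_documents(["How do I install Bun?", "Install Bun with curl or npm"])

# Cosine similarity: values near 1.0 mean the two texts sit close together in vector space
score = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(score)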
from langchain_community.vectorstores import Chroma

# Embed every chunk and store the vectors in a local, persistent ChromaDB collection
Chroma.from_documents(
    chunks, create_embedding(), persist_directory="chroma"
)
All these vectors get stored in ChromaDB, which is optimized for finding similar vectors quickly. When I ask "How do I install Bun?", it can instantly find all the chunks that are semantically similar to that question.
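On the query side, the persisted store can be reopened without re-embedding anything. A minimal sketch, assuming the same create_embedding() helper used during indexing:

from langchain_community.vectorstores import Chroma

# Reopen the persisted vector store with the same embedding function
db = Chroma(persist_directory="chroma", embedding_function=create_embedding())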
# Find similar chunks
results = db.similarity_search_with_relevance_scores(query_text, k=3)
# Only proceed if we're confident about the results
if len(results) == 0 or results[0][1] < 0.7:
    print("Unable to find matching results.")
    return
# Combine the context
context_text = "\n\n---\n\n".join([doc.page_content for doc, _score in results])
When you ask a question, the system embeds your query, pulls back the most similar chunks, checks the relevance score, and stuffs the surviving context into a prompt:
PROMPT_TEMPLATE = """
Answer the question based only on the following context
context: {context}
---
Answer the question based on the above context: {question}
"""
# Fill the template with the retrieved context and the user's question
prompt = PROMPT_TEMPLATE.format(context=context_text, question=query_text)

completion = client.chat.completions.create(
    model="Qwen/Qwen3-32B",
    messages=[{"role": "user", "content": prompt}]
)
Finally, the LLM gets the question and the relevant context, and generates a human-like response. The key phrase here is "based only on the following context" - this prevents the model from making stuff up!
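To produce the "Sources" lines you will see in the examples below, the metadata that DirectoryLoader attaches to every chunk comes in handy. A small sketch (the exact print formatting is my assumption, not the repo's):

# OpenAI-style response object: the generated answer lives in the first choice
answer = completion.choices[0].message.content

# Each retrieved chunk carries the markdown file it came from in its metadata
sources = [doc.metadata.get("source") for doc, _score in results]
print(answer)
print(f"Sources: {sources}")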
The system works with six key Bun documentation files:
- index.md: The main overview explaining what Bun is (spoiler: it's a Node.js replacement that's 4x faster)
- installation.md: How to get Bun running on your machine
- quickstart.md: Your first steps with Bun
- typescript.md: TypeScript support (because who writes vanilla JS anymore?)
- benchmarks.md: The performance numbers that make Node.js developers cry
- bun-flavored-toml.md: Bun's special configuration format

Each file gets chunked into ~300-character pieces, creating a searchable knowledge base that covers everything from basic installation to advanced configuration.
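If you're curious how the knowledge base shapes up, a quick sanity check (my own snippet, not part of the repo) can tally chunks per source file:

from collections import Counter

# Each chunk remembers which markdown file it came from via its metadata
counts = Counter(chunk.metadata.get("source") for chunk in chunks)
for source, n in counts.items():
    print(f"{source}: {n} chunks")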
Here's what makes this system actually useful:
Query: "How do I install Bun?"
Response:
To install Bun, you can use the following methods:
1. Using curl: curl -fsSL https://bun.sh/install | bash
2. Using npm: npm install -g bun
3. Using Homebrew: brew install bun
Sources: ['docs/bun/installation.md']
Query: "What makes Bun faster than Node.js?"
Response:
Bun is faster than Node.js due to several key factors:
- Written in Zig and powered by JavaScriptCore (Safari's engine)
- Dramatically reduced startup times and memory usage
- Processes start 4x faster than Node.js
- Built from the ground-up with modern JavaScript ecosystem in mind
Sources: ['docs/bun/index.md', 'docs/bun/benchmarks.md']
LangChain didn't have built-in support for the HuggingFace Inference API setup I was using, so I had to create a custom embedding class:
class CustomHuggingFaceEmbeddings(Embeddings):
    def __init__(self, model_name: str = "intfloat/multilingual-e5-large-instruct"):
        self.model_name = model_name

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        ...  # Custom implementation for HuggingFace Inference API (shown earlier)

    def embed_query(self, text: str) -> List[float]:
        # LangChain's Embeddings interface also requires a query method; it reuses embed_documents
        return self.embed_documents([text])[0]
Not all similarity matches are created equal. I implemented a 0.7 threshold to ensure the system only responds when it's confident:
if len(results) == 0 or results[0][1] < 0.7:
    print("Unable to find matching results.")
    return
Another balancing act was chunk size (300 characters) versus overlap (100 characters): enough overlap to preserve context across chunk boundaries, while keeping each chunk focused on a single idea.
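To see what the overlap actually does, here is a quick, self-contained experiment (the sample text is made up for illustration):

from langchain_text_splitters import RecursiveCharacterTextSplitter

# Dummy text purely for illustration
sample = "Bun is an all-in-one JavaScript runtime and toolkit designed for speed. " * 10

splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=100)
chunks = splitter.split_text(sample)

# The tail of one chunk reappears near the head of the next, so no sentence loses its context
print(chunks[0][-100:])
print(chunks[1][:100])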
This RAG system is just the beginning - there's plenty more on my roadmap.
Want to build your own documentation assistant? Here's the quick setup:
# Clone and setup
git clone <your-repo>
cd Langchain_rag
pip install -r requirements.txt
# Add your HuggingFace API key
echo "HUGGIN_FACE_KEY=your_key_here" > .env
# Process the documents
python database.py
# Start querying!
python main.py
The beauty of this system is its modularity. Want to use it for React docs instead of Bun? Just change the DATA_PATH in database.py. Want to use a different LLM? Swap out the model name in main.py. The architecture is flexible enough to adapt to any documentation set.
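For example, retargeting the pipeline at another documentation set might look like this (the React path and the alternative model name are illustrative, not the repo's exact values):

# database.py - point the loader at a different documentation folder
DATA_PATH = "docs/react"  # was "docs/bun"
loader = DirectoryLoader(DATA_PATH, glob="*.md")

# main.py - swap in any chat model your inference provider serves
completion = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # illustrative replacement for Qwen/Qwen3-32B
    messages=[{"role": "user", "content": prompt}],
)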
In a world where documentation is growing exponentially and developer time is precious, RAG systems like this aren't just cool tech demos - they're productivity multipliers. Instead of spending 20 minutes hunting through docs, I can get accurate, sourced answers in seconds.
But more importantly, this project taught me that the future of developer tools isn't about replacing human intelligence - it's about augmenting it. The system doesn't think for me; it helps me find the information I need to think better.
And honestly? Building a robot that reads documentation so I don't have to feels like the most developer thing I've ever done.
Want to see the code? Check out the GitHub repository and feel free to contribute! And if you build your own documentation assistant, I'd love to hear about it.