
How to Build a Corporate RAG Chatbot

Technical and practical guide to building an AI chatbot that answers based on your company's documents. Architecture, tech stack, costs, and best practices.

01

What RAG is and why your company needs it

RAG (Retrieval-Augmented Generation) is the architecture that lets an AI answer from your company's specific documents rather than from its general knowledge. Without RAG, an LLM like Claude or GPT-4 answers from its training data: general knowledge, but not your catalogs, policies, or manuals. With RAG, the system first searches your documents for relevant information, then generates an answer grounded in those specific documents.

The result: accurate, up-to-date answers specific to your company, with references to source documents. It is the most reliable approach for corporate chatbots because it drastically reduces hallucinations.
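In code, that two-step flow is compact. A minimal sketch using the official Anthropic Python SDK; search_documents is a placeholder for the retrieval layer covered in the next chapter:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def answer(question: str) -> str:
    # Step 1, retrieval: find the chunks most relevant to the question.
    # search_documents is hypothetical; see chapter 2 for how to build it.
    chunks = search_documents(question, top_k=5)
    context = "\n\n".join(chunks)

    # Step 2, generation: ask the LLM to answer using only those chunks.
    message = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=1024,
        system="Answer only from the provided documents. Cite your sources.",
        messages=[{
            "role": "user",
            "content": f"Documents:\n{context}\n\nQuestion: {question}",
        }],
    )
    return message.content[0].text
```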

02

RAG architecture: the key components

A RAG system has four components.

1. Document ingestion: PDFs, Word files, Excel files, and web pages are converted to text, split into chunks (500-1000 token segments), and transformed into vector embeddings.
2. Vector database: chunks and their embeddings are stored in a vector database (Pinecone, Weaviate, pgvector on Supabase) that enables semantic similarity searches.
3. Retrieval: the user's question is converted to an embedding, and the vector database returns the most semantically similar chunks (typically 3-10).
4. Generation: the retrieved chunks are passed to the LLM together with the original question. The LLM generates an answer based exclusively on the provided documents, citing its sources.
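Components 1-3 can be prototyped in a few dozen lines before committing to a managed vector database. A sketch using the OpenAI embeddings API and an in-memory store with cosine similarity; the word-based chunking and the document list are deliberately naive and illustrative, and component 4 is the generation call from chapter 1:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(texts: list[str]) -> np.ndarray:
    # Component 1 (in part): turn text into embedding vectors.
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

def chunk(text: str, size: int = 800) -> list[str]:
    # Naive fixed-size chunking by words; production splitters respect sentences.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

# Component 2: an in-memory "vector database" (use pgvector/Pinecone in production).
documents = ["...your manuals, policies, catalogs..."]
chunks = [c for doc in documents for c in chunk(doc)]
vectors = embed(chunks)

def retrieve(question: str, top_k: int = 5) -> list[str]:
    # Component 3: embed the question and rank chunks by cosine similarity.
    q = embed([question])[0]
    scores = vectors @ q / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in np.argsort(scores)[::-1][:top_k]]
```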

03

Recommended tech stack for Italian SMEs

A pragmatic stack for an Italian SME:

- LLM: Claude 3 Haiku for fast, economical responses ($0.25 per 1M input tokens); Claude 3.5 Sonnet for more sophisticated answers.
- Embeddings: OpenAI text-embedding-3-small (economical, performs well on Italian).
- Vector DB: Supabase with pgvector (free up to 500 MB, EU hosting available, familiar PostgreSQL).
- Framework: LlamaIndex for the RAG pipeline (simple, well documented).
- Frontend: Next.js with the Vercel AI SDK for response streaming.
- Hosting: Vercel (frontend) + Supabase (database + vectors).

Estimated monthly cost for an SME (1,000 queries/day): LLM API ~100-300 EUR, Supabase ~25 EUR, Vercel ~20 EUR. In total, 150-350 EUR/month for a professional corporate chatbot.
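Wiring that stack together with LlamaIndex is short. A sketch assuming the post-0.10 package layout (llama-index-llms-anthropic, llama-index-embeddings-openai, llama-index-vector-stores-supabase) and a placeholder connection string; verify package names against the current LlamaIndex docs:

```python
from llama_index.core import (
    Settings, SimpleDirectoryReader, StorageContext, VectorStoreIndex,
)
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.anthropic import Anthropic
from llama_index.vector_stores.supabase import SupabaseVectorStore

# Cheap, fast model for most answers; swap in a Sonnet model for harder queries.
Settings.llm = Anthropic(model="claude-3-haiku-20240307")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

# pgvector on Supabase, reached via the project's Postgres connection string.
vector_store = SupabaseVectorStore(
    postgres_connection_string="postgresql://user:password@db.<project>.supabase.co:5432/postgres",
    collection_name="company_docs",
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

query_engine = index.as_query_engine(similarity_top_k=5)
print(query_engine.query("Qual è la nostra policy sui resi?"))
```

Note that similarity_top_k controls how many chunks reach the model on each query; at 1,000 queries/day it is the main lever on the LLM API line of the cost estimate above.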

04

Optimization for the Italian language

Italian presents specific challenges for RAG systems. Chunking: Italian sentences tend to be longer than English ones. Chunks of 800-1200 tokens work better than the standard 500. Use a text splitter that respects sentence boundaries. Embedding: multilingual embedding models work well for Italian. OpenAI's text-embedding-3-small is a good compromise. For optimal results, also test multilingual-e5-large.
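In LlamaIndex, sentence-aware chunking at this size is a one-line setting. A minimal sketch, reusing the documents loaded in the chapter 3 sketch:

```python
from llama_index.core.node_parser import SentenceSplitter

# Larger chunks for Italian; the splitter keeps sentences intact where possible.
splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=100)
nodes = splitter.get_nodes_from_documents(documents)  # documents from chapter 3
```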

Retrieval: hybrid search (vector + keyword BM25) works better than vector-only for Italian, capturing both semantic meaning and specific technical terms. Weaviate natively supports hybrid search. Prompt: write system prompts in Italian. Instruct the model to respond in formal Italian and use sector-specific terminology. Test with real questions in colloquial Italian and with typos.
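If you pick Weaviate for hybrid search, the keyword/vector blend is a single parameter. A sketch assuming the v4 Python client and an already-populated collection (the CompanyDocs name and text property are illustrative), plus an example Italian system prompt:

```python
import weaviate

client = weaviate.connect_to_local()  # or connect_to_weaviate_cloud(...)
docs = client.collections.get("CompanyDocs")

# alpha blends the two signals: 0 = pure BM25 keywords, 1 = pure vector search.
response = docs.query.hybrid(
    query="condizioni di recesso del contratto quadro",
    alpha=0.5,
    limit=5,
)
for obj in response.objects:
    print(obj.properties["text"])

client.close()

SYSTEM_PROMPT = (
    "Sei l'assistente aziendale. Rispondi in italiano formale, "
    "usando solo i documenti forniti e la terminologia del settore. "
    "Se l'informazione non è nei documenti, dillo chiaramente."
)
```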

05

Company documents: preparation and updates

Chatbot quality depends 80% on document quality. Preparation: convert all documents to readable formats (text-based PDF, not images). Use OCR for scanned documents. Remove repetitive headers/footers, page numbers, and irrelevant content. Structure: well-structured documents with titles, subtitles, and paragraphs produce better chunks. One well-formatted manual beats 100 copied emails. Updates: documents change.

Implement an update process: when a policy changes, the document gets re-indexed automatically. A webhook on the document system can trigger re-indexing. Metadata: add metadata (date, author, category, version) to enable filtering results by category and showing only current documents.
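One way to implement that hook is a small HTTP endpoint that deletes the stale version and re-inserts the changed document with fresh metadata. A sketch assuming FastAPI and the LlamaIndex index built in the chapter 3 sketch; the route and payload shape are hypothetical:

```python
from fastapi import FastAPI
from pydantic import BaseModel
from llama_index.core import Document

# index: the VectorStoreIndex built in the chapter 3 sketch.
app = FastAPI()

class DocUpdate(BaseModel):
    doc_id: str
    text: str
    category: str
    version: str
    updated: str  # ISO date

@app.post("/webhook/document-updated")
def reindex(update: DocUpdate):
    # Remove the stale chunks, then re-insert the new version with metadata
    # so results can be filtered by category and restricted to current docs.
    index.delete_ref_doc(update.doc_id, delete_from_docstore=True)
    doc = Document(
        text=update.text,
        id_=update.doc_id,
        metadata={
            "category": update.category,
            "version": update.version,
            "date": update.updated,
        },
    )
    index.insert(doc)
    return {"status": "reindexed", "doc_id": update.doc_id}
```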

06

Deployment, monitoring, and continuous improvement

Deployment: start with an internal rollout (customer service team only). Collect feedback for 2 weeks, improve, then expand to end users.

Monitoring: track these metrics.

- Retrieval quality: are the retrieved documents relevant? Measure with human feedback (thumbs up/down).
- Response quality: are answers accurate and useful? Collect user ratings.
- Fallback rate: how many conversations get escalated to a human?
- Coverage: how many questions get answered versus "I don't have information on this"?

Improvement: unanswered questions are gold; they point to missing documents in the knowledge base. Add content to fill those gaps. Low-rated responses indicate chunking or prompt problems. Iterate weekly for the first 4 weeks, then monthly.
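Feedback capture can be as simple as one table next to your vectors. A sketch with supabase-py; the chatbot_feedback table and its columns are illustrative:

```python
import os
from supabase import create_client

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

def log_feedback(question: str, answer: str, rating: int, escalated: bool) -> None:
    # rating: +1 thumbs up, -1 thumbs down. Review -1 rows and escalations
    # weekly: they drive the coverage and fallback-rate metrics above.
    supabase.table("chatbot_feedback").insert({
        "question": question,
        "answer": answer,
        "rating": rating,
        "escalated": escalated,
    }).execute()
```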

Ready to go from theory to practice?

Let's implement AI in your business together. The first call is free.