Document Cleaning

The document cleaning algorithm improves the efficiency of RAG systems by removing noise from the dataset. Noisy or irrelevant content in source documents directly leads to hallucinations and inaccurate AI responses.

The main sources of context are the document set itself and the PRD.

The cleaning agent uses information from the documents to build a single Knowledgebook Context and aligns it with the PRD, so that each document can then be cleaned against that shared context. You can download the cleaned versions of your documents in text format.
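
The flow described above (build one shared context from the document set, align it with the PRD, then clean each document) can be illustrated with a toy sketch. This is not the cleaning agent's actual implementation: it swaps in a simple keyword-overlap heuristic, and every name in it (`tokenize`, `build_context`, `align_with_prd`, `clean_document`) as well as the sample documents and PRD text are hypothetical, chosen only for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic words of 3+ letters."""
    return re.findall(r"[a-z]{3,}", text.lower())


def build_context(documents):
    """Build one shared context for the whole set (hypothetical stand-in).

    Maps each term to the number of documents that contain it.
    """
    context = Counter()
    for doc in documents:
        context.update(set(tokenize(doc)))
    return context


def align_with_prd(context, prd_text):
    """Keep only the context terms that the PRD also mentions."""
    prd_terms = set(tokenize(prd_text))
    return {term for term in context if term in prd_terms}


def clean_document(doc, aligned_terms):
    """Drop every line that shares no term with the aligned context."""
    kept = [
        line
        for line in doc.splitlines()
        if aligned_terms & set(tokenize(line))
    ]
    return "\n".join(kept)


# Hypothetical sample data: two short documents and a one-line PRD.
documents = [
    "Refund policy: refunds are issued within 14 days.\n"
    "Subscribe to our newsletter!",
    "Refunds require an order number.\n"
    "Follow us on social media.",
]
prd = "The assistant answers customer questions about refunds and orders."

context = build_context(documents)
aligned = align_with_prd(context, prd)
cleaned = [clean_document(doc, aligned) for doc in documents]

for text in cleaned:
    print(text)
```

In this sketch the newsletter and social media lines are dropped because they share no term with the PRD-aligned context. A real cleaning agent would presumably rely on semantic understanding rather than exact word overlap, which here would miss related forms such as "order" and "orders".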