User documents are embedded into a vector database. The database splits each document into chunks and embeds each chunk as a high-dimensional feature vector.
When a user queries the bot, the query is embedded and given to the vector database, which searches its feature space and returns the chunks closest to the query. These chunks are then passed to a large language model as additional context.
This allows the large language model to draw on information from the underlying documents when answering user questions, without any fine-tuning on those documents.