Configure the Retrieval Pipeline and Agent Q&A
This tutorial is the second article in the "Enterprise Technical Documentation Intelligent Q&A System" series, following the previous article Configure the Preprocessing Pipeline and Knowledge Base Ingestion.
In the previous article, we completed:
- ✅ Preprocessing Pipeline orchestration (text extraction → intelligent chunking → multi-dimensional summary enhancement → vectorized storage)
- ✅ Knowledge base creation and binding to the preprocessing RAG Pipeline
- ✅ Technical document upload and ingestion verification
Based on that foundation, this article will configure the retrieval Pipeline and associate it with an Agent to achieve end-to-end intelligent Q&A capabilities.
💡 Prerequisite: Please ensure that you have completed the previous tutorial and that the knowledge base already contains processed document data.
Step 1: Configure the Retrieval Pipeline
The retrieval Pipeline determines how the most relevant content is recalled from the knowledge base when a user asks a question. Since the preprocessing stage has already generated multi-dimensional enhanced data (paragraph summaries, image descriptions, table summaries), this data can be leveraged during retrieval to improve recall quality.
Create a Retrieval Pipeline
- On the Knowledge Base page, switch to the "Retrieval Pipeline" tab.
- Click "+ Create Pipeline" and fill in:
- Name:
Technical Documentation Enhanced Retrieval - Applicable Agent Scope: select
Basic Orchestration - Description:
Query rewriting + dual-channel retrieval + multi-level reranking + LLM-generated answers
- Name:
- Click "Confirm" to enter the orchestration interface.


Configure Retrieval Nodes
This retrieval Pipeline adopts a multi-stage progressive architecture, including query rewriting, dual-channel retrieval, multi-level reranking, conditional fallback, and LLM-generated answers. Since the chain is relatively long, it is explained in detail below in three stages.
Stage 1: Query Understanding and Retrieval
1. Start node
Define the input parameters required to start the workflow. These parameters are read by the LLM during the assistant conversation, enabling it to trigger the workflow at the appropriate time and fill in the correct values.
- Input: USER_INPUT (the user's input in the current turn), CHAT_HISTORY (chat history), USER_IMAGES (images input by the user in the current turn), etc.

2. Whether to rewrite (conditional judgment)
Determine whether the user query needs to be rewritten based on the current session context:
- Condition: when USER_IMAGES is not empty or CHAT_HISTORY is not empty, enter the "Query Rewrite" branch.
- Else: skip the rewrite process and go directly to the "Confirm Search Query" node, using the original question as the final query.
💡 When there is multi-turn conversation history, the user's latest question may be a referential expression (such as "What are its parameters?"), in which case it needs to be rewritten into a complete query to improve retrieval effectiveness.

3. Query rewrite
- Model: reuse the LLM configured in the start node (llm_id)
- Input: USER_INPUT (the original question text entered by the user)
- Output: output (the rewritten complete query statement)
The LLM will combine context to replace ambiguous references with explicit expressions, for example rewriting "How is it configured?" as "How is the retrieval Pipeline of the knowledge base configured?"
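To make the rewrite logic concrete, here is a minimal Python sketch of what this node does. The function name rewrite_query, the llm_chat callable, and the prompt wording are illustrative stand-ins rather than the platform's API; the actual node reuses the model referenced by llm_id.

```python
def rewrite_query(user_input: str, chat_history: list[str], llm_chat) -> str:
    """Rewrite a possibly referential question into a standalone search query."""
    if not chat_history:                          # nothing to resolve against, keep the original
        return user_input

    history_text = "\n".join(chat_history[-6:])   # keep only the last few turns
    prompt = (
        "Given the conversation so far, rewrite the user's latest question into a "
        "single, self-contained search query. Replace pronouns and vague references "
        "with explicit terms. Return only the rewritten query.\n\n"
        f"Conversation:\n{history_text}\n\n"
        f"Latest question: {user_input}"
    )
    return llm_chat(prompt).strip()
```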

4. Confirm search query
- Input: param (original user input), param1 (result after query rewriting)
- Output: output (the final confirmed search query)
This node ensures that whether or not rewriting has occurred, there is always a definite query statement passed to downstream retrieval nodes.

5. Split Words (word segmentation)
- Input: query (the final query text after search query confirmation)
- Output: text (segmentation result, used as keywords for full-text retrieval)
Perform word segmentation on the query and extract keywords for use in the full-text matching channels of subsequent QA retrieval and document retrieval.
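As a rough illustration of this node's input/output shape only: the sketch below lowercases the query, tokenizes it, and drops a few stopwords. The platform's Split Words node uses its own segmenter (important for CJK text), so split_words and the stopword list here are purely illustrative.

```python
import re

STOPWORDS = {"the", "a", "an", "of", "is", "are", "how", "what", "for", "to"}

def split_words(query: str) -> list[str]:
    """Very rough keyword extraction: lowercase, tokenize, drop stopwords."""
    tokens = re.findall(r"\w+", query.lower())
    return [t for t in tokens if t not in STOPWORDS]

print(split_words("How is the retrieval Pipeline of the knowledge base configured?"))
# ['retrieval', 'pipeline', 'knowledge', 'base', 'configured']
```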

6. QA search (QA knowledge retrieval)
- Input: keyword (keywords after segmentation), origin_query (the final query text after search query confirmation)
- Configuration: select Q&A Retrieval, with Reuse start node filters and Reuse start node extra_config enabled by default
- Output: embedding_result (retrieval results from the QA knowledge base), keyword_result (QA matching results based on keyword full-text retrieval)
Priority is given to retrieving from the QA question-answer pair knowledge base. The QA knowledge base usually contains curated standard Q&A pairs, with a high hit rate and stable answer quality.

Stage 2: Multi-level Reranking and Conditional Fallback
7. RRF Reranking_2 (RRF fusion reranking)
- Input: param (semantic vector recall results from QA retrieval), param1 (keyword full-text matching recall results from QA retrieval); the weights of the two input paths are evenly split, each being 0.5
- Configuration:
  - k constant: 60 (0-100, used to control the decay rate of ranking on the final score; the larger the value, the more gradual the weight decay for lower-ranked results)
- Output: DocChunk (reranked document chunks)
Use the RRF (Reciprocal Rank Fusion) algorithm to fuse and rank QA retrieval results, combining ranking information from multiple recall paths to generate a unified relevance ranking.
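The RRF formula itself is compact: each document's score is the sum of weight_i / (k + rank_i) across the recall channels. Below is a minimal Python sketch, assuming each channel returns an ordered list of chunk IDs; rrf_fuse and the toy IDs are hypothetical names used only for illustration.

```python
from collections import defaultdict

def rrf_fuse(result_lists: list[list[str]], weights: list[float] | None = None,
             k: int = 60) -> list[str]:
    """Fuse ranked lists with Reciprocal Rank Fusion: score = sum(w_i / (k + rank_i))."""
    weights = weights or [1.0 / len(result_lists)] * len(result_lists)
    scores: dict[str, float] = defaultdict(float)
    for weight, results in zip(weights, result_lists):
        for rank, doc_id in enumerate(results, start=1):   # ranks are 1-based
            scores[doc_id] += weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

embedding_hits = ["chunk_7", "chunk_3", "chunk_9"]    # semantic vector recall
keyword_hits   = ["chunk_3", "chunk_12", "chunk_7"]   # keyword full-text recall
print(rrf_fuse([embedding_hits, keyword_hits], weights=[0.5, 0.5], k=60))
# chunk_3 and chunk_7 appear in both lists, so they rank first.
```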

8. LLM rerank (LLM fine reranking)
- Input: query (the final query text after search query confirmation), chunk (reranked document chunks)
- Configuration: reuse the LLM configured in the start node (llm_id)
- Output: DocChunk (fine-reranked chunks)
Call the LLM to perform semantic-level fine reranking on candidate chunks, judging the relevance of each chunk to the user's question one by one. Compared with traditional Reranker models, it has stronger semantic understanding capabilities.
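A rough sketch of per-chunk LLM reranking, assuming a generic llm_chat callable that returns text; in practice a single batched prompt is usually cheaper than one call per chunk. The function and parameter names are illustrative, not the platform's API.

```python
def llm_rerank(query: str, chunks: list[str], llm_chat, keep_top: int = 5) -> list[str]:
    """Score each candidate chunk with the LLM, then keep the most relevant ones."""
    scored = []
    for chunk in chunks:
        prompt = (
            "Rate how well the passage answers the question on a scale of 0-10. "
            "Reply with a single number.\n\n"
            f"Question: {query}\nPassage: {chunk}"
        )
        try:
            score = float(llm_chat(prompt).strip())
        except ValueError:
            score = 0.0                      # an unparsable reply counts as irrelevant
        scored.append((score, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:keep_top]]
```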

9. If QA result (conditional judgment - whether the QA result is valid)
- Condition: when DocChunk is not empty, directly use the QA retrieval + reranking result
- Else: if the QA result is empty, trigger document fallback retrieval
This is the key design of this Pipeline: prioritize high-quality answers from the QA knowledge base, and only fall back to document chunk retrieval when the QA channel has no hit, balancing answer quality and recall coverage.
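The branch logic can be summarized in a short control-flow sketch; retrieve_with_fallback, search_qa, and search_docs are hypothetical placeholders for the QA Search and Doc Search stages described in this Pipeline.

```python
from typing import Callable, List

def retrieve_with_fallback(
    query: str,
    search_qa: Callable[[str], List[str]],    # QA Search + RRF + LLM rerank, as one step
    search_docs: Callable[[str], List[str]],  # Doc Search + RRF, as one step
) -> List[str]:
    """Prefer curated QA hits; fall back to document chunks only when QA misses."""
    qa_chunks = search_qa(query)
    if qa_chunks:              # "If QA result": a QA hit is used directly
        return qa_chunks
    return search_docs(query)  # fallback channel over preprocessed document chunks

# Toy usage: the QA channel misses, so the document channel answers instead.
print(retrieve_with_fallback(
    "What deployment architectures does the system support?",
    search_qa=lambda q: [],
    search_docs=lambda q: ["chunk: deployment architecture overview"],
))
```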

10. Doc Search (document retrieval - fallback channel)
- Input: keyword (keywords after segmentation), origin_query (the final confirmed query text)
- Configuration: select Document Retrieval, with Reuse start node filters and Reuse start node extra_config enabled by default
- Output: embedding_result (list of document chunks hit by semantic vector retrieval) and keyword_result (list of document chunks hit by keyword full-text matching)
When QA retrieval returns no result, perform vector + full-text fusion retrieval from the document chunks stored during the preprocessing stage to ensure that the user's question does not go unanswered.

11. RRF Reranking_1 (document retrieval reranking)
- Input: param (semantic vector recall results from document retrieval), param1 (keyword full-text matching recall results from document retrieval)
- Configuration: k constant: 60 (the same decay control value used in QA retrieval reranking, to ensure a consistent ranking strategy)
- Output: DocChunk (reranked document chunks)
Perform RRF fusion reranking on the results of document fallback retrieval as well, ensuring the relevance ranking quality of returned results.
Stage 3: Answer Generation and Output
12. Filter
- Input: chunks (candidate chunk list from document retrieval reranking)
- Model: reuse the LLM configured in the start node (llm_id)
- Output: DocChunk (highly relevant chunks retained after LLM filtering, with noise or weakly relevant content removed)
Filter the final recalled chunks to remove low-relevance or duplicate chunks, ensuring that the context passed to the LLM is concise and effective.

13. Format user input
- Input: chunks (filtered list of highly relevant document chunks), language (the language of the user's input), user_input (the user's input content)
- Output: output (formatted user context)
Concatenate the recalled document chunks and the user's question according to the specified template format to form the user-side input for the LLM.

14. Main prompt (main prompt assembly)
- Mode: string concatenation
- Input: param (formatted user context)
- Output: output (assembled main prompt content)
15. Format system prompt words
- Input: robot_prompt (assistant prompt), quote_prompt (assembled main prompt content), time_diff_hour (time zone offset in hours)
- Output: output (complete System Prompt, used as the system-level instruction for the LLM)
Integrate the Agent's persona description, behavioral constraints, and retrieval context template into a complete System Prompt.
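A hedged sketch of how nodes 13-15 could be combined in plain Python: number and concatenate the recalled chunks into a quote block, then merge it with the assistant persona into the System Prompt. The template wording and the build_messages name are illustrative; the platform's own formatting nodes define the real templates.

```python
def build_messages(robot_prompt: str, chunks: list[str], user_input: str,
                   language: str = "English") -> list[dict]:
    """Assemble the System Prompt and user context handed to the generation LLM."""
    # Number each recalled chunk so the answer can cite it as [1], [2], ...
    quote_block = "\n\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
    system_prompt = (
        f"{robot_prompt}\n\n"
        "Answer strictly based on the reference passages below and cite the "
        "passage numbers you rely on.\n\n"
        f"References:\n{quote_block}"
    )
    user_context = f"Answer in {language}.\nQuestion: {user_input}"
    return [{"role": "system", "content": system_prompt},
            {"role": "user", "content": user_context}]
```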

16. Generate answer
- Model: reuse the LLM configured in the start node (llm_id)
- Input: robot_prompt (complete system prompt), user_input (formatted user context)
- Output: output (final answer text generated by the LLM)
The core generation node, where the LLM generates the final answer for the user based on the retrieved document chunks and system prompt.

17. Reference filtering
- Input: answer (final answer text generated by the LLM), retrival_docs (filtered list of highly relevant document chunks)
- Output: answer (the answer text), reference (list of chunks actually cited in or used as the basis for the answer, filtered from retrival_docs)
Filter out the chunks that are actually cited in the answer from the candidate chunks and display them to the user as reference sources, enhancing answer traceability.
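One simple way to implement this kind of filtering, assuming the generation prompt numbered each reference passage (e.g. "[2]") so citations can be matched in the answer text; other implementations may match by content overlap instead. The helper name is hypothetical.

```python
import re

def filter_references(answer: str, retrival_docs: list[str]) -> list[str]:
    """Keep only the chunks whose citation markers (e.g. "[2]") appear in the answer."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    return [chunk for i, chunk in enumerate(retrival_docs, start=1) if i in cited]

print(filter_references("Clustered deployment is supported [2].",
                        ["chunk about chunking", "chunk about deployment"]))
# -> ['chunk about deployment']
```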
18. QA answer (QA answer processing)
- Input: param (fine-reranked chunks)
- Output: output (processed answer structure)
Format the standard answers produced by the QA channel.
19. Variable Aggregator_1 (result aggregation)
- Aggregation strategy: the value of the first non-empty variable in each group
- Input:
  - group1: the reference list output by Reference filtering, and the chunks after LLM rerank fine reranking
  - group2: the answer text output by Reference filtering, and the output from QA answer
- Output: group1 (the final list of reference sources used), group2 (the final answer text returned to the user)
Aggregate the outputs of QA answers and reference filtering into a unified final return structure.
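The aggregation rule ("the value of the first non-empty variable in each group") is simple enough to express directly; the helper below is illustrative only.

```python
def first_non_empty(*candidates):
    """Return the first candidate that is neither None nor empty (the aggregator's rule)."""
    for value in candidates:
        if value:                      # skips None, "", [] and other empty values
            return value
    return None

# group1 takes the reference list from Reference filtering, falling back to the
# LLM-reranked QA chunks; group2 takes the generated answer, falling back to the
# formatted QA answer.
print(first_non_empty([], ["qa_chunk_1"]))   # -> ['qa_chunk_1']
```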

20. End (end of process)
- Output: chunks (aggregated group1, i.e., the final confirmed list of reference citations), answer (aggregated group2, i.e., the final answer text returned to the user)
Output the final result to the caller (Agent), including the answer content and reference citations.
Summary of the Complete Retrieval Chain


Retrieval Testing and Publishing
- Click the "Trial Run" button. 2,Select model: GPT-5.4
- Enter different types of questions:
| Test Question | Expected Recall Type |
|---|---|
| What deployment architectures does the system support? | Text paragraphs + architecture diagram descriptions |
| Comparison of performance metrics for each module | Table summary-related chunks |
| What is the data processing flow? | Flowchart descriptions + related text paragraphs |
- Select knowledge base: Technical Documentation Library
- Confirm that the recalled results cover multi-dimensional data (original text + summaries + image descriptions + table summaries).
- After confirming that the test results meet expectations, click "Publish".

Step 2: Create an Agent and Apply the Retrieval RAG Pipeline
Associate the configured retrieval Pipeline with an Agent to complete end-to-end intelligent Q&A verification.
Create or Configure an Agent
- Go to AI Asset Module → Agent Management.
- Create a basic Agent or select an existing Agent, and fill in:
  - Name: Technical Documentation Assistant
  - Description: Intelligent Q&A based on enterprise technical documentation, supporting multi-dimensional understanding of text, tables, and images
- In the Agent's Knowledge Base Configuration:
  - Associate knowledge base: Technical Documentation Library (the knowledge base created in the previous article)
  - Retrieval Pipeline: select Technical Documentation Enhanced Retrieval
- Save the Agent configuration.



Verify Intelligent Q&A Effectiveness
Enter the Agent conversation interface and perform the following test:
Question: How do I create an Agent?
Expected effect:
- The answer content comes from the document paragraphs about creating an Agent
- The cited chunks include the original paragraph text and summary enhancement information
- If the document contains related step diagrams, those images also participate in recall

Summary of Results
By combining the previous article (preprocessing Pipeline + knowledge base) and this article (retrieval Pipeline + Agent), we have completed an end-to-end intelligent Q&A system with deep document understanding capabilities:
| Stage | Configuration Content | Effect |
|---|---|---|
| Preprocessing Pipeline | Text extraction → intelligent chunking → multi-dimensional enhancement → storage | Documents are fully understood and indexed; text/images/tables are all retrievable |
| Knowledge Base RAG Pipeline Mode | Directly bind the preprocessing RAG Pipeline when creating the knowledge base | Upload and process immediately, automatically executing full-chain preprocessing |
| Retrieval Pipeline | Query rewriting → dual-channel retrieval → multi-level reranking → LLM generation | Understand user intent, accurately recall, and intelligently generate answers |
| Agent Association | Knowledge base + retrieval Pipeline binding | End-to-end intelligent Q&A is available |
Highlights of Retrieval Pipeline Design
- Query rewriting: Automatically completes referential questions based on conversation history, improving retrieval accuracy in multi-turn conversations
- QA priority + document fallback: A dual-channel strategy that balances answer quality and coverage
- Multi-level reranking: RRF fusion ranking + LLM semantic fine reranking, progressively improving Top-K quality
- End-to-end generation: The retrieval Pipeline has a built-in LLM generation stage, completing the full process from retrieval to answer in a single chain
- Citation traceability: Reference filtering automatically marks the document chunks cited in the answer
Further Optimization Suggestions
- Optimize query rewrite Prompt: Customize rewrite prompts according to the business domain to improve rewrite quality
- Adjust RRF parameters: Adjust the k value parameter of RRF according to actual recall performance
- LLM rerank cost control: If call volume is high, consider replacing LLM fine reranking with a lightweight Reranker model
- Expand the QA knowledge base: Continuously supplement high-frequency Q&A pairs to improve the hit rate of the QA channel
- Monitor retrieval logs: Use Pipeline run history to locate recall quality issues and response time bottlenecks
Related Documents
- Configure the Preprocessing Pipeline and Knowledge Base Ingestion (the first article in this series)
- RAG Pipeline Feature Description
- Knowledge Base Management