Evaluation Testing

Testing AI applications requires evaluating the generated content to ensure that the AI model has not produced a hallucinated response.

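One method to evaluate the response is to use the AI model itself for evaluation. Select the best AI model for the evaluation, which may not be the same model used to generate the response.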

The Spring AI interface for evaluating responses is Evaluator, defined as:

@FunctionalInterface
public interface Evaluator {
    EvaluationResponse evaluate(EvaluationRequest evaluationRequest);
}
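Because Evaluator is annotated with @FunctionalInterface, an implementation can be written as a lambda. The sketch below is self-contained, so it declares simplified stand-in records for EvaluationRequest and EvaluationResponse (hypothetical, not the Spring AI classes) and a trivial rule-based evaluator that passes any non-blank response. A real evaluator would instead ask an AI model to judge the response.

```java
import java.util.List;

public class LambdaEvaluatorSketch {

    // Simplified stand-ins for Spring AI's types, declared here only so the
    // sketch compiles on its own; they are NOT the real Spring AI classes.
    record EvaluationRequest(String userText, List<String> dataList, String responseContent) {}

    record EvaluationResponse(boolean pass, String feedback) {}

    @FunctionalInterface
    interface Evaluator {
        EvaluationResponse evaluate(EvaluationRequest evaluationRequest);
    }

    // A hypothetical, non-AI evaluator written as a lambda: it passes only if
    // the response is non-null and non-blank.
    static final Evaluator NON_BLANK = request -> {
        String response = request.responseContent();
        boolean pass = response != null && !response.isBlank();
        return new EvaluationResponse(pass, pass ? "" : "Empty response");
    };

    public static void main(String[] args) {
        var request = new EvaluationRequest("What is 2 + 2?", List.of(), "4");
        System.out.println(NON_BLANK.evaluate(request).pass()); // prints true
    }
}
```

The lambda form is convenient for quick, deterministic checks in tests; model-backed evaluators such as the ones described below are usually full classes because they need a ChatClient and a prompt.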

The input to the evaluation is the EvaluationRequest, defined as:

public class EvaluationRequest {

	private final String userText;

	private final List<Content> dataList;

	private final String responseContent;

	public EvaluationRequest(String userText, List<Content> dataList, String responseContent) {
		this.userText = userText;
		this.dataList = dataList;
		this.responseContent = responseContent;
	}

  ...
}

userText: The raw input from the user as a String.
dataList: Contextual data, such as from Retrieval Augmented Generation, appended to the raw input.
responseContent: The AI model's response content as a String.
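How an evaluator turns dataList into text for the judging model is an implementation detail. As a sketch under that assumption, the code below uses a simplified stand-in record with List<String> in place of List<Content> and joins the non-blank entries with line separators, so the retrieved context can be dropped into a prompt as one string. The names SupportingDataSketch and supportingData are hypothetical.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

public class SupportingDataSketch {

    // Hypothetical stand-in mirroring the three EvaluationRequest fields,
    // with plain strings instead of Content objects.
    record EvaluationRequest(String userText, List<String> dataList, String responseContent) {}

    // Flattens dataList into one context string, one entry per line,
    // skipping null or blank entries. This illustrates the general idea of
    // passing retrieved context to the judging model; it is an assumption,
    // not Spring AI's exact code.
    static String supportingData(EvaluationRequest request) {
        return request.dataList().stream()
                .filter(Objects::nonNull)
                .filter(text -> !text.isBlank())
                .collect(Collectors.joining(System.lineSeparator()));
    }

    public static void main(String[] args) {
        var request = new EvaluationRequest(
                "Where does the adventure of Anacletus and Birba take place?",
                List.of("First retrieved document.", " ", "Second retrieved document."),
                "(model response)");
        System.out.println(supportingData(request));
    }
}
```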

Usage in Integration Tests

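Here is an example of using the RelevancyEvaluator in an integration test to validate the result of a RAG flow that uses the RetrievalAugmentationAdvisor: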

@Test
void evaluateRelevancy() {
    String question = "Where does the adventure of Anacletus and Birba take place?";

    RetrievalAugmentationAdvisor ragAdvisor = RetrievalAugmentationAdvisor.builder()
        .documentRetriever(VectorStoreDocumentRetriever.builder()
            .vectorStore(pgVectorStore)
            .build())
        .build();

    ChatResponse chatResponse = ChatClient.builder(chatModel).build()
        .prompt(question)
        .advisors(ragAdvisor)
        .call()
        .chatResponse();

    EvaluationRequest evaluationRequest = new EvaluationRequest(
        // The original user question
        question,
        // The retrieved context from the RAG flow
        chatResponse.getMetadata().get(RetrievalAugmentationAdvisor.DOCUMENT_CONTEXT),
        // The AI model's response
        chatResponse.getResult().getOutput().getText()
    );

    RelevancyEvaluator evaluator = new RelevancyEvaluator(ChatClient.builder(chatModel));

    EvaluationResponse evaluationResponse = evaluator.evaluate(evaluationRequest);

    assertThat(evaluationResponse.isPass()).isTrue();
}

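You can find several integration tests in the Spring AI project that use the RelevancyEvaluator to test the functionality of the QuestionAnswerAdvisor (see tests) and the RetrievalAugmentationAdvisor (see tests).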

Custom Template

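The RelevancyEvaluator uses a default template to prompt the AI model for evaluation. You can customize this behavior by providing your own PromptTemplate object via the .promptTemplate() builder method.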

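The custom PromptTemplate can use any TemplateRenderer implementation (by default, it uses StPromptTemplate, based on the StringTemplate engine). The important requirement is that the template must contain the following placeholders: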

a query placeholder to receive the user question.
a response placeholder to receive the AI model's response.
a context placeholder to receive the context information.
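To illustrate the placeholder requirement, here is a self-contained sketch that does not use Spring AI. It assumes the placeholders are literally named query, response, and context and written in StringTemplate's {curly} syntax, and it uses a naive string-replacement renderer in place of a real TemplateRenderer. The template wording and the helper names missingPlaceholders and render are hypothetical.

```java
import java.util.List;
import java.util.Map;

public class CustomTemplateSketch {

    // Placeholder names a custom template is assumed to need, written with
    // StringTemplate-style {curly} delimiters.
    static final List<String> REQUIRED = List.of("query", "response", "context");

    // Hypothetical custom template; the instructions are illustrative only.
    static final String TEMPLATE = """
            Decide whether the response answers the query and is supported by the context.
            Answer YES or NO.

            Query: {query}
            Response: {response}
            Context: {context}
            """;

    // Returns the required placeholders that do not appear in the template.
    static List<String> missingPlaceholders(String template) {
        return REQUIRED.stream()
                .filter(name -> !template.contains("{" + name + "}"))
                .toList();
    }

    // Naive substitution renderer standing in for a real TemplateRenderer.
    static String render(String template, Map<String, String> values) {
        String result = template;
        for (var entry : values.entrySet()) {
            result = result.replace("{" + entry.getKey() + "}", entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("Missing: " + missingPlaceholders(TEMPLATE)); // prints Missing: []
        System.out.println(render(TEMPLATE, Map.of(
                "query", "Where does the adventure of Anacletus and Birba take place?",
                "response", "(model response)",
                "context", "(retrieved documents)")));
    }
}
```

Checking for missing placeholders up front turns a silently incomplete prompt into an immediate, readable test failure.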

FactCheckingEvaluator

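The FactCheckingEvaluator is another implementation of the Evaluator interface, designed to assess the factual accuracy of AI-generated responses against the provided context. It helps detect and reduce hallucinations in AI outputs by verifying whether a given statement (the claim) is logically supported by the provided context (the document).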

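The "claim" and "document" are presented to the AI model for evaluation. Smaller, more efficient AI models dedicated to this purpose, such as Bespoke's Minicheck, can reduce the cost of performing these checks compared to flagship models like GPT-4. Minicheck is also available through Ollama.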

Usage

The FactCheckingEvaluator constructor takes a ChatClient.Builder as a parameter:

public FactCheckingEvaluator(ChatClient.Builder chatClientBuilder) {
  this.chatClientBuilder = chatClientBuilder;
}

The evaluator uses the following prompt template for fact-checking:

Document: {document}
Claim: {claim}

Here, {document} is the context information and {claim} is the AI model's response to be evaluated.
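The rendering step and the pass/fail decision can be sketched without Spring AI. This sketch assumes the checking model replies with a bare Yes or No and treats anything other than a case-insensitive "yes" as a failure; the class and method names are hypothetical, and the substitution is a plain string replace rather than the evaluator's actual template engine.

```java
public class FactCheckPromptSketch {

    // The fact-checking prompt from the text, with {document} and {claim} placeholders.
    static final String PROMPT = "Document: {document}\nClaim: {claim}";

    // Fills in the two placeholders with plain string replacement.
    static String render(String document, String claim) {
        return PROMPT.replace("{document}", document).replace("{claim}", claim);
    }

    // Assumption: the model answers "Yes" (supported) or "No" (not supported).
    static boolean isSupported(String modelAnswer) {
        return modelAnswer != null && modelAnswer.strip().equalsIgnoreCase("yes");
    }

    public static void main(String[] args) {
        String prompt = render(
                "The Earth is the third planet from the Sun and the only astronomical object known to harbor life.",
                "The Earth is the fourth planet from the Sun.");
        System.out.println(prompt);
        System.out.println(isSupported("No")); // prints false
    }
}
```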

Example

Here is an example of how to use the FactCheckingEvaluator with an Ollama-based ChatModel, specifically the Bespoke-Minicheck model:

@Test
void testFactChecking() {
  // Set up the Ollama API
  OllamaApi ollamaApi = new OllamaApi("http://localhost:11434");

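  // BESPOKE_MINICHECK is a String constant holding the Ollama model name (for example "bespoke-minicheck");
  // numPredict(2) keeps the answer to a token or two, and temperature 0 makes it deterministic
  ChatModel chatModel = new OllamaChatModel(ollamaApi,
				OllamaChatOptions.builder().model(BESPOKE_MINICHECK).numPredict(2).temperature(0.0d).build());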

  // Create the FactCheckingEvaluator
  var factCheckingEvaluator = new FactCheckingEvaluator(ChatClient.builder(chatModel));

  // Example context and claim
  String context = "The Earth is the third planet from the Sun and the only astronomical object known to harbor life.";
  String claim = "The Earth is the fourth planet from the Sun.";

  // Create an EvaluationRequest: the evaluator reads the document from dataList
  // (org.springframework.ai.document.Document) and the claim from responseContent
  EvaluationRequest evaluationRequest = new EvaluationRequest(context, List.of(new Document(context)), claim);

  // Perform the evaluation
  EvaluationResponse evaluationResponse = factCheckingEvaluator.evaluate(evaluationRequest);

  assertFalse(evaluationResponse.isPass(), "The claim should not be supported by the context");

}