Azure Cosmos DB

This section walks you through setting up the CosmosDBVectorStore to store document embeddings and perform similarity searches.

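What is Azure Cosmos DB?

Azure Cosmos DB is Microsoft's globally distributed, cloud-native database service designed for mission-critical applications. It offers high availability, low latency, and horizontal scalability to meet the demands of modern applications. It was built from the ground up with global distribution, fine-grained multi-tenancy, and horizontal scalability at its core. It is a foundational service in Azure, used by most of Microsoft's mission-critical applications at global scale, including Teams, Skype, Xbox Live, Office 365, Bing, Azure Active Directory, Azure Portal, Microsoft Store, and many others. It is also used by thousands of external customers, including OpenAI for ChatGPT and other mission-critical AI applications that require elastic scale, turnkey global distribution, and low latency and high availability across the globe.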

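What is DiskANN?

DiskANN (Disk-based Approximate Nearest Neighbor Search) is a technology used in Azure Cosmos DB to improve the performance of vector searches. It enables efficient and scalable similarity search over high-dimensional data by indexing the embeddings stored in Cosmos DB.

DiskANN provides the following benefits:

Efficiency: By using a disk-based structure, DiskANN significantly reduces the time needed to find nearest neighbors compared to traditional methods.
Scalability: It can handle large datasets that exceed memory capacity, making it suitable for a range of applications, including machine learning and AI-driven solutions.
Low latency: DiskANN minimizes latency during search operations, so applications can retrieve results quickly even with large volumes of data.

In Spring AI for Azure Cosmos DB, vector searches create and use DiskANN indexes so that similarity queries perform well.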
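
To make the idea of nearest-neighbor search concrete, the following is a minimal sketch of the exact (brute-force) search that an approximate index such as DiskANN is designed to avoid at scale. It is not DiskANN and not how Cosmos DB is implemented; the class name BruteForceSearch and its methods are hypothetical and exist only for illustration. It scores every stored vector against the query with cosine similarity and returns the indices of the best matches.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Exact (brute-force) nearest-neighbor search; an illustrative baseline, not DiskANN. */
final class BruteForceSearch {

    /** Cosine similarity of two equal-length, non-zero vectors. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the indices of the k stored vectors most similar to the query, best first. */
    static List<Integer> topK(double[][] stored, double[] query, int k) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < stored.length; i++) {
            indices.add(i);
        }
        // Sort by descending similarity; every stored vector is compared with the query.
        indices.sort(Comparator.comparingDouble((Integer i) -> -cosine(stored[i], query)));
        return indices.subList(0, Math.min(k, indices.size()));
    }

    public static void main(String[] args) {
        double[][] stored = { {1, 0}, {0, 1}, {0.9, 0.1} };
        System.out.println(topK(stored, new double[] {1, 0}, 2)); // prints [0, 2]
    }
}
```

Because this approach touches every vector on every query, its cost grows linearly with the dataset. An approximate index trades a small amount of recall for far fewer comparisons.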

Setting up Azure Cosmos DB Vector Store with Auto Configuration

The following code demonstrates how to set up the CosmosDBVectorStore with auto-configuration:

package com.example.demo;

import io.micrometer.observation.ObservationRegistry;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Lazy;

import java.util.List;
import java.util.Map;
import java.util.UUID;

// @SpringBootApplication already enables auto-configuration
@SpringBootApplication
public class DemoApplication implements CommandLineRunner {

    private static final Logger log = LoggerFactory.getLogger(DemoApplication.class);

    @Lazy
    @Autowired
    private VectorStore vectorStore;

    public static void main(String[] args) {
        SpringApplication.run(DemoApplication.class, args);
    }

    @Override
    public void run(String... args) throws Exception {
        Document document1 = new Document(UUID.randomUUID().toString(), "Sample content1", Map.of("key1", "value1"));
        Document document2 = new Document(UUID.randomUUID().toString(), "Sample content2", Map.of("key2", "value2"));
        this.vectorStore.add(List.of(document1, document2));
        List<Document> results = this.vectorStore.similaritySearch(SearchRequest.builder().query("Sample content").topK(1).build());

        log.info("Search results: {}", results);

        // Remove the documents from the vector store
        this.vectorStore.delete(List.of(document1.getId(), document2.getId()));
    }

    @Bean
    public ObservationRegistry observationRegistry() {
        return ObservationRegistry.create();
    }
}

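Auto Configuration

The artifact names of the Spring AI auto-configuration and starter modules have changed significantly. Please refer to the upgrade notes for more information.

Add the following dependency to your Maven project: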

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-vector-store-azure-cosmos-db</artifactId>
</dependency>

Configuration Properties

The following configuration properties are available for the Cosmos DB vector store:

Property	Description
spring.ai.vectorstore.cosmosdb.databaseName	The name of the Cosmos DB database.
spring.ai.vectorstore.cosmosdb.containerName	The name of the Cosmos DB container.
spring.ai.vectorstore.cosmosdb.partitionKeyPath	The path of the partition key.
spring.ai.vectorstore.cosmosdb.metadataFields	A comma-separated list of metadata fields.
spring.ai.vectorstore.cosmosdb.vectorStoreThroughput	The throughput of the vector store.
spring.ai.vectorstore.cosmosdb.vectorDimensions	The number of dimensions of the vectors.
spring.ai.vectorstore.cosmosdb.endpoint	The endpoint of the Cosmos DB account.
spring.ai.vectorstore.cosmosdb.key	The key for Cosmos DB. If no key is provided, DefaultAzureCredential (learn.microsoft.com/azure/developer/java/sdk/authentication/credential-chains#defaultazurecredential-overview) is used.
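
As an illustration, the properties above could be set in application.properties roughly as follows. This is a sketch, not a recommended configuration: the database and container names and the numeric values are placeholders that mirror the builder example elsewhere on this page, and COSMOSDB_KEY is a hypothetical environment variable name.

```properties
# Placeholder values; adjust to your own Cosmos DB account
spring.ai.vectorstore.cosmosdb.databaseName=test-database
spring.ai.vectorstore.cosmosdb.containerName=test-container
spring.ai.vectorstore.cosmosdb.partitionKeyPath=/id
# Comma-separated metadata fields
spring.ai.vectorstore.cosmosdb.metadataFields=country,year,city
spring.ai.vectorstore.cosmosdb.vectorStoreThroughput=1000
# Should match the dimensions of your embedding model
spring.ai.vectorstore.cosmosdb.vectorDimensions=1536
spring.ai.vectorstore.cosmosdb.endpoint=${COSMOSDB_AI_ENDPOINT}
# Optional; COSMOSDB_KEY is a hypothetical variable name. Omit this line to use DefaultAzureCredential.
spring.ai.vectorstore.cosmosdb.key=${COSMOSDB_KEY}
```

Reading the endpoint and key from environment variables keeps credentials out of source control.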

Complex Searches with Filter

You can use filters to perform more complex searches in the Cosmos DB vector store. The following sample demonstrates how to use a filter in a search query.

Map<String, Object> metadata1 = new HashMap<>();
metadata1.put("country", "UK");
metadata1.put("year", 2021);
metadata1.put("city", "London");

Map<String, Object> metadata2 = new HashMap<>();
metadata2.put("country", "NL");
metadata2.put("year", 2022);
metadata2.put("city", "Amsterdam");

// metadata1 and metadata2 are local variables, so they are referenced without "this."
Document document1 = new Document("1", "A document about the UK", metadata1);
Document document2 = new Document("2", "A document about the Netherlands", metadata2);

vectorStore.add(List.of(document1, document2));

// Restrict the search to documents whose "country" metadata is UK or NL
FilterExpressionBuilder builder = new FilterExpressionBuilder();
List<Document> results = vectorStore.similaritySearch(SearchRequest.builder().query("The World")
    .topK(10)
    .filterExpression(builder.in("country", "UK", "NL").build())
    .build());
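
To illustrate what an "in" filter means for the result set, here is a conceptual sketch in plain Java. It does not show how the vector store or Cosmos DB evaluates filters; the class InFilterSketch, the method filterIn, and the third sample document (country "BG") are hypothetical and exist only for illustration. The sketch keeps a document's metadata only when its value for the given key is one of the allowed values.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Conceptual illustration of an "in" metadata filter; not the store's implementation. */
final class InFilterSketch {

    /** Keeps only the metadata maps whose value for the key is one of the allowed values. */
    static List<Map<String, Object>> filterIn(List<Map<String, Object>> docs, String key, Set<Object> allowed) {
        return docs.stream()
                .filter(metadata -> {
                    Object value = metadata.get(key);
                    // A document without the key never matches
                    return value != null && allowed.contains(value);
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Object> uk = Map.of("country", "UK", "year", 2021, "city", "London");
        Map<String, Object> nl = Map.of("country", "NL", "year", 2022, "city", "Amsterdam");
        Map<String, Object> bg = Map.of("country", "BG", "year", 2020, "city", "Sofia"); // hypothetical extra document
        Set<Object> allowed = Set.of("UK", "NL");

        List<Map<String, Object>> kept = filterIn(List.of(uk, nl, bg), "country", allowed);
        System.out.println(kept.size()); // prints 2
    }
}
```

Only documents that pass the filter remain eligible for ranking by similarity, which is why a filtered search can return fewer than topK results.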

Setting up Azure Cosmos DB Vector Store without Auto Configuration

The following code demonstrates how to set up the CosmosDBVectorStore without relying on auto-configuration. DefaultAzureCredential (learn.microsoft.com/azure/developer/java/sdk/authentication/credential-chains#defaultazurecredential-overview) is recommended for authentication to Azure Cosmos DB.

@Bean
public VectorStore vectorStore(EmbeddingModel embeddingModel, ObservationRegistry observationRegistry) {
    // Create the Cosmos DB client
    CosmosAsyncClient cosmosClient = new CosmosClientBuilder()
            .endpoint(System.getenv("COSMOSDB_AI_ENDPOINT"))
            .credential(new DefaultAzureCredentialBuilder().build())
            .userAgentSuffix("SpringAI-CDBNoSQL-VectorStore")
            .gatewayMode()
            .buildAsyncClient();

    // Create and configure the vector store
    return CosmosDBVectorStore.builder(cosmosClient, embeddingModel)
            .databaseName("test-database")
            .containerName("test-container")
            // Configure metadata fields for filtering
            .metadataFields(List.of("country", "year", "city"))
            // Set the partition key path (optional)
            .partitionKeyPath("/id")
            // Configure performance settings
            .vectorStoreThroughput(1000)
            .vectorDimensions(1536)  // Match your embedding model's dimensions
            // Add custom batching strategy (optional)
            .batchingStrategy(new TokenCountBatchingStrategy())
            // Add observation registry for metrics
            .observationRegistry(observationRegistry)
            .build();
}

@Bean
public EmbeddingModel embeddingModel() {
    return new TransformersEmbeddingModel();
}

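This configuration shows all the available builder options:

databaseName: The name of your Cosmos DB database
containerName: The name of your container within the database
partitionKeyPath: The path for the partition key (for example, "/id")
metadataFields: List of metadata fields used for filtering
vectorStoreThroughput: The throughput (RU/s) for the vector store container
vectorDimensions: The number of dimensions of your vectors (should match your embedding model)
batchingStrategy: Strategy for batching document operations (optional)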

Manual Dependency Setup

Add the following dependency to your Maven project:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-azure-cosmos-db-store</artifactId>
</dependency>

Accessing the Native Client

The Azure Cosmos DB vector store implementation provides access to the underlying native Azure Cosmos DB client (CosmosClient) through the getNativeClient() method:

CosmosDBVectorStore vectorStore = context.getBean(CosmosDBVectorStore.class);
Optional<CosmosClient> nativeClient = vectorStore.getNativeClient();

if (nativeClient.isPresent()) {
    CosmosClient client = nativeClient.get();
    // Use the native client for Azure Cosmos DB-specific operations
}

The native client gives you access to Azure Cosmos DB-specific features and operations that might not be exposed through the VectorStore interface.