Using OpenAI
Take-home activity! Do it if you are following along at home. It won't be covered during the hands-on lab.
OpenAI is a company that develops AI models for natural language processing. They offer an API, the Embeddings API, that you can use to create embeddings for your documents.
To get embeddings using their API, you need to create an account and get an API key.
Create an OpenAI account and get an API key
To create an account, go to https://openai.com/ and click on the Log In button in the upper right corner. This will redirect you to the login page, where you'll have the option to sign up for their services.

Follow the instructions on the screen, and verify your email address.
Once you have an account, you can go to the API keys page to get an API key.
From there, click on the Create new secret key button.

You'll be prompted to give your key a name. You can call it "MongoDB Vector Search Demo." Click on the Create secret key button.
You will then be presented with your API key. Copy it and save it somewhere safe.

Make sure you copy this key somewhere as you'll need it later on, and you won't be able to see it again.
Now that you have an API key, you can use it to create embeddings for your documents.
Create embeddings for documents
To create embeddings for your documents, you can send a curl request to the OpenAI API:
```shell
OPENAI_API_KEY=<YOUR_API_KEY>

curl https://api.openai.com/v1/embeddings \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "The food was delicious and the waiter...",
    "model": "text-embedding-ada-002"
  }'
```
You can find more information about the API in the OpenAI documentation.
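The API returns a JSON document, and the vector itself lives under data[0].embedding. A minimal sketch of extracting it, assuming the response shape described in OpenAI's documentation (the vector values below are made-up placeholders, not real model output):

```typescript
// Sketch: the shape of an Embeddings API response, per OpenAI's docs.
// The embedding values below are illustrative placeholders only.
const sampleResponse = JSON.stringify({
  object: 'list',
  data: [{ object: 'embedding', index: 0, embedding: [0.01, -0.02, 0.03] }],
  model: 'text-embedding-ada-002',
  usage: { prompt_tokens: 8, total_tokens: 8 },
});

// The vector you want to store is under data[0].embedding
const parsed = JSON.parse(sampleResponse);
const vector: number[] = parsed.data[0].embedding;
```

A real response from text-embedding-ada-002 contains many more values in the embedding array, but the path to the vector is the same.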
Create embeddings for the books
To create the embeddings for the books in your collection, you would run this curl command (or use the Node.js library) for each book. This process is somewhat time-consuming, so we've already created the embeddings for you.
You can find the 1,536-dimensional vector in the embeddings field of each book document.
Because we already have the vectors for the books, we can use them with Vector Search.
Querying with vectors
To query the data, Vector Search needs to calculate the distance between the query vector and the vectors of the documents in the collection.
To do so, you first need to vectorize your query, using the same embedding model that was used for the documents.
- NodeJS/Express
- Java Spring Boot
In the library application, we've created a function that will vectorize your query for you. You can find it in the server/src/embeddings/openai.ts file.
```typescript
import OpenAI from 'openai';

const { EMBEDDING_KEY } = process.env;

let openai: OpenAI;

const getTermEmbeddings = async (text: string) => {
  // Create the client lazily, on first use
  if (!openai) {
    openai = new OpenAI({ apiKey: EMBEDDING_KEY });
  }
  const embeddings = await openai.embeddings.create({
    model: 'text-embedding-ada-002',
    input: text,
  });
  return embeddings?.data[0]?.embedding;
};

export default getTermEmbeddings;
```
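The vector returned by this function can then be passed to a $vectorSearch aggregation stage. A minimal sketch, assuming an Atlas Vector Search index on the embeddings field (the index name, numCandidates, and limit values here are illustrative placeholders, not the lab's actual configuration):

```typescript
// Sketch: using a query vector in a $vectorSearch aggregation stage.
// The index name ('vectorsearch') and the numeric values are assumptions
// for illustration; use the values defined for your own cluster.
const queryVector = [0.01, -0.02, 0.03]; // in practice: await getTermEmbeddings(query)

const vectorSearchStage = {
  index: 'vectorsearch',   // name of the Atlas Vector Search index
  path: 'embeddings',      // field that stores the document vectors
  queryVector,             // the vectorized search term
  numCandidates: 100,      // candidates scanned before ranking
  limit: 10,               // number of documents returned
};

const pipeline = [
  { $vectorSearch: vectorSearchStage },
  { $project: { title: 1, score: { $meta: 'vectorSearchScore' } } },
];

// The pipeline would then be passed to collection.aggregate(pipeline).
```

The $project stage is optional; it is shown here because surfacing the vectorSearchScore is a common way to inspect how close each match is.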
Configuring the application
In your server/.env file, you'll find a few variables that you can use to configure the application.
The first one is EMBEDDINGS_SOURCE. It tells the application where to get the embeddings from; you can set it to openai.
Now that you have an OpenAI API key, you can set the EMBEDDING_KEY variable to your API key.
```shell
EMBEDDINGS_SOURCE=openai
EMBEDDING_KEY=sk-...
```
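At startup, the application reads EMBEDDINGS_SOURCE to decide which embeddings provider to use. A simplified sketch of that selection logic (the provider functions and names here are stand-ins for illustration, not the lab's actual code):

```typescript
// Sketch: how EMBEDDINGS_SOURCE could drive provider selection at startup.
// Both provider functions below are hypothetical placeholders.
type EmbeddingFn = (text: string) => Promise<number[]>;

const openAIProvider: EmbeddingFn = async (text) => {
  // would delegate to getTermEmbeddings(text) in the real application
  return [];
};

const defaultProvider: EmbeddingFn = async (text) => [];

const selectProvider = (source?: string): EmbeddingFn =>
  source === 'openai' ? openAIProvider : defaultProvider;

const provider = selectProvider(process.env.EMBEDDINGS_SOURCE);
```

This mirrors the role the EmbeddingProvider interface plays in the Java version described below: one switch point, selected by configuration, behind which each provider implementation lives.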
To run semantic queries in Java, the application also needs to generate embeddings for your query. This is handled by the EmbeddingProvider interface. For OpenAI, the implementation is OpenAIEmbeddingProvider, which delegates the HTTP call to the Feign client:
```java
@FeignClient(
    name = "openai-embeddings",
    url = "${openai.base-url}",
    configuration = OpenAIFeignInterceptor.class
)
public interface OpenAIEmbeddingClient {

    @PostMapping(
        value = "/v1/embeddings",
        consumes = "application/json",
        produces = "application/json"
    )
    OpenAIEmbeddingResponse createEmbeddings(@RequestBody OpenAIEmbeddingRequest request);
}
```
Configuring the application
To use OpenAI, you only need to adjust two parameters in application.yml:
```yaml
embeddings:
  source: ${EMBEDDING_SOURCE}

openai:
  api-key: ${OPENAI_KEY:yourKey}
```
The embeddings.source property is what selects the provider at runtime. If you set it to openai, the OpenAI implementation will be used.
You can either edit these values in application.yml or export them as environment variables:
```shell
export EMBEDDING_SOURCE=openai
export OPENAI_KEY=sk-...
```
Running the application
Once the variables are set, start the application as usual:
```shell
mvn spring-boot:run
```
Once the application has started, make a call to searchBooks as described in the step Add Semantic Search to Your Application, and you will see the application calling the OpenAI implementation:
