
Batch process all your records to store structured outputs in KDB.AI.

The requirements are as follows.

  • A KDB.AI Cloud or KDB.AI Server instance. Sign up for KDB.AI Cloud: Starter Edition, or set up KDB.AI Server.

  • The instance’s endpoint URL. Get the KDB.AI Cloud endpoint URL, or get the KDB.AI Server endpoint URL.

  • An API key. Create the API key.

  • The name of the target table to access. Create the table.

    KDB.AI requires the target table to have a defined schema before Unstructured can write to it. The recommended table schema for Unstructured contains the fields id, element_id, document, metadata, and embeddings. The following example uses the KDB.AI Client for Python to create a table with this schema, along with a flat vector index on the embeddings column that uses 3072 dimensions and the L2 distance metric. Set dims to the number of dimensions that your embedding model produces:

    Python
    import kdbai_client as kdbai
    import os
    
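    # Connect to the KDB.AI instance by using the endpoint URL and API key.
    session = kdbai.Session(
        endpoint=os.getenv("KDBAI_ENDPOINT"),
        api_key=os.getenv("KDBAI_API_KEY")
    )
    
    # Create the table in the default database.
    db = session.database("default")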
    
    schema = [
        {
            "name": "id",
            "type": "str"
        },
        {
            "name": "element_id",
            "type": "str"
        },
        {
            "name": "document",
            "type": "str"
        },
        {
            "name": "metadata", 
            "type": "general"
        },
        {
            "name": "embeddings",
            "type": "float32s"
        }
    ]
    
    # A flat vector index on the embeddings column.
    indexes = [
        {
            "name": "vectorIndex",
            "type": "flat", 
            "params": {
                "dims": 3072,
                "metric": "L2"
            },
            "column": "embeddings"
        }
    ]
    
    table = db.create_table(
        table=os.getenv("KDBAI_TABLE"),
        schema=schema,
        indexes=indexes
    )
    
    print(f"The table named '{table.name}' now exists.")
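
Because the index above is created with 3072 dimensions, a vector of any other length in the embeddings column is likely to be rejected at write time. The following is a minimal sketch of a local check you could run on a record before sending it. The helper name `check_record` and the sample record are hypothetical and are not part of the KDB.AI Client for Python or of Unstructured; only the field names and the dimension count come from the schema above.

```python
# Hypothetical pre-flight check; not part of kdbai_client or unstructured-ingest.
EXPECTED_FIELDS = ("id", "element_id", "document", "metadata", "embeddings")
EXPECTED_DIMS = 3072  # keep equal to "dims" in the index definition


def check_record(record, dims=EXPECTED_DIMS):
    """Return a list of problems found in one record (an empty list means OK)."""
    problems = [
        f"missing field: {name}" for name in EXPECTED_FIELDS if name not in record
    ]
    embeddings = record.get("embeddings")
    if embeddings is not None and len(embeddings) != dims:
        problems.append(
            f"embeddings has {len(embeddings)} dimensions, expected {dims}"
        )
    return problems


# Illustrative record only; the values are made up.
sample = {
    "id": "1",
    "element_id": "abc123",
    "document": "Hello, world.",
    "metadata": {"filename": "example.txt"},
    "embeddings": [0.0] * 1536,  # deliberately the wrong length
}

for problem in check_record(sample):
    print(problem)
```

If you change dims in the index definition to suit a different embedding model, change `EXPECTED_DIMS` to the same value.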
    

Install the KDB.AI connector dependencies:

CLI, Python
pip install "unstructured-ingest[kdbai]"

You might also need to install additional dependencies, depending on your needs. Learn more.

Set the following environment variables for the KDB.AI connector:

  • KDBAI_ENDPOINT - The KDB.AI instance’s endpoint URL, represented by --endpoint (CLI) or endpoint (Python).
  • KDBAI_API_KEY - The KDB.AI API key, represented by --api-key (CLI) or api_key (Python).
  • KDBAI_TABLE - The name of the target table, represented by --table-name (CLI) or table_name (Python).

Set these environment variables for the Unstructured API:

  • UNSTRUCTURED_API_KEY - Your Unstructured API key value.
  • UNSTRUCTURED_API_URL - Your Unstructured API URL.
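
A missing variable typically surfaces only as a connection or authentication error partway through a run, so it can help to check for all five up front. This is a minimal sketch that uses only the Python standard library; the function name `missing_env_vars` is hypothetical, and the variable names are the ones listed above.

```python
import os

# Variable names taken from the two lists above.
REQUIRED_ENV_VARS = (
    "KDBAI_ENDPOINT",
    "KDBAI_API_KEY",
    "KDBAI_TABLE",
    "UNSTRUCTURED_API_KEY",
    "UNSTRUCTURED_API_URL",
)


def missing_env_vars(names=REQUIRED_ENV_VARS, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Set these environment variables first: " + ", ".join(missing))
else:
    print("All required environment variables are set.")
```

The `env` parameter exists only so that the check can be exercised with a plain dictionary instead of the real process environment.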

Now call the Unstructured CLI or Python SDK. You can use any supported source connector. This example uses the local source connector: