The Unstructured Platform can recommend an ideal embedding provider and model, as well as a chunking strategy and settings, for your source files. These recommendations are optimized to work well across a variety of vector stores, RAG applications, and model fine-tuning scenarios.

Unstructured’s embedding and chunking recommendations are especially useful if you are not familiar with how to apply the various embedding and chunking strategies and settings for optimal results. If you are already comfortable with embedding and chunking, these recommendations can still help inform your current strategies.

You can implement Unstructured’s recommendations only in Build it with me > Custom and Build it myself workflows. You cannot implement them in Build it with me > Basic, Advanced, or Platinum workflows, because those workflow types have preset embedding and chunking settings that cannot be changed.

To make its recommendations, Unstructured uses the specified source connector to access, process, and analyze a sampling of files from the source location. Based on this analysis, Unstructured recommends an embedding provider and model, as well as a chunking strategy and settings.

Unstructured’s embedding and chunking recommendations can be requested for the following file-based source connector types:

Requesting a recommendation results in charges to your Unstructured account. To make its recommendation, Unstructured processes and analyzes a sampling of up to 50 files from the source location, and your Unstructured account is billed for the equivalent number of pages.

We calculate a page as follows:

  • For these file types, a page is a page, slide, or image: .pdf, .pptx, and .tiff.
  • For .docx files that have page metadata, we calculate the number of pages based on that metadata.
  • For all other file types, we calculate the number of pages as the file’s size divided by 100 KB.
  • For non-file data, we calculate a page as 100 KB of incoming data to be processed.
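The page rules above amount to a small calculation. The following Python sketch estimates the pages billed for a recommendation sample; it is an illustration, not Unstructured’s billing code. `SampledFile` and the function names are hypothetical, and the sketch assumes that 1 KB is 1,000 bytes, that partial pages round up to a whole page, and that a .docx file without page metadata falls back to the size-based rule. The text states none of these. Non-file data would use the same size-based rule through `pages_for_bytes`.

```python
from dataclasses import dataclass
from math import ceil
from typing import Iterable, Optional

# Assumption: 1 KB = 1,000 bytes (the text does not say 1,000 or 1,024).
BYTES_PER_PAGE = 100 * 1000

# File types where one page, slide, or image counts as one page.
PAGED_EXTENSIONS = {".pdf", ".pptx", ".tiff"}

# A recommendation analyzes up to 50 files from the source location.
MAX_SAMPLED_FILES = 50


@dataclass
class SampledFile:
    """Hypothetical description of one sampled source file."""
    extension: str
    size_bytes: int
    # Pages, slides, or images for .pdf/.pptx/.tiff; page metadata for .docx, if any.
    page_count: Optional[int] = None


def pages_for_bytes(num_bytes: int) -> int:
    """Size-based rule: one page per 100 KB (assumption: partial pages round up)."""
    return ceil(num_bytes / BYTES_PER_PAGE)


def billable_pages(f: SampledFile) -> int:
    ext = f.extension.lower()
    if ext in PAGED_EXTENSIONS:
        if f.page_count is None:
            raise ValueError(f"{ext} files need a page, slide, or image count")
        return f.page_count
    if ext == ".docx" and f.page_count is not None:
        return f.page_count  # .docx with page metadata
    # All other file types; assumption: .docx without page metadata lands here too.
    return pages_for_bytes(f.size_bytes)


def estimate_recommendation_pages(sample: Iterable[SampledFile]) -> int:
    files = list(sample)
    if len(files) > MAX_SAMPLED_FILES:
        raise ValueError(f"a recommendation analyzes at most {MAX_SAMPLED_FILES} files")
    return sum(billable_pages(f) for f in files)


if __name__ == "__main__":
    sample = [
        SampledFile(".pdf", 2_000_000, page_count=12),
        SampledFile(".docx", 80_000, page_count=3),
        SampledFile(".txt", 250_000),  # 2.5 x 100 KB, rounded up to 3 pages
    ]
    print(estimate_recommendation_pages(sample))  # 18
```

In this example, the .pdf and .docx files count by their page numbers (12 and 3), and the .txt file counts by size, giving 18 pages in total.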

Request a recommendation

  1. In the Unstructured Platform, on the sidebar, click Connectors.

  2. Click Sources.

  3. Click the name of the source connector that you want to use. If you do not have a source connector, create one.

  4. If you’re requesting a recommendation for this connector for the first time, click the Run Recommender button.

    If you have previously requested a recommendation for this connector, you can make another request by clicking the Run Again button. This is useful if you significantly changed the files in the source location since you previously requested a recommendation.

    If the Run Recommender or Run Again button is not visible, or is visible but not enabled, check the following:

    • The selected connector must be a file-based source connector. See the preceding list for supported file-based source connector types.
    • The selected connector must have successfully passed a connectivity test. If the connector’s details pane does not show a Successful icon, click the pencil icon, make any necessary changes to the connector’s settings, and then click Save and Test.

  5. Two Scheduled statuses appear: one for Embed and one for Chunk.

  6. After several minutes, the Scheduled statuses are replaced by Running.

  7. After several more minutes, the Running statuses are replaced by Finished.

  8. After Finished appears, click View to see the recommendation.

The Auto Recommender Results pane shows Unstructured’s recommended embedding provider and model and chunking strategy and settings for the source files that it analyzed.
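The button conditions and status progression in the steps above can be summarized as a small model. This Python sketch is purely illustrative: the Platform exposes this flow through its UI as described here, and `SourceConnector`, `recommender_button`, `status_history`, and `can_view` are hypothetical names, not part of any Unstructured API. It assumes that the Embed and Chunk tracks each pass through the same three statuses in order.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Status(Enum):
    SCHEDULED = "Scheduled"
    RUNNING = "Running"
    FINISHED = "Finished"


# Each status is replaced by the next one as the recommendation progresses.
NEXT_STATUS = {Status.SCHEDULED: Status.RUNNING, Status.RUNNING: Status.FINISHED}


@dataclass
class SourceConnector:
    """Hypothetical model of the connector details that gate the recommender."""
    name: str
    is_file_based: bool
    passed_connectivity_test: bool
    has_previous_recommendation: bool = False


def recommender_button(c: SourceConnector) -> Optional[str]:
    """Return the enabled button label, or None if neither button is available."""
    if not (c.is_file_based and c.passed_connectivity_test):
        return None
    return "Run Again" if c.has_previous_recommendation else "Run Recommender"


def status_history() -> Dict[str, List[Status]]:
    """Statuses shown for the Embed and Chunk tracks, in order."""
    history: Dict[str, List[Status]] = {}
    for track in ("Embed", "Chunk"):
        status = Status.SCHEDULED
        seq = [status]
        while status in NEXT_STATUS:
            status = NEXT_STATUS[status]
            seq.append(status)
        history[track] = seq
    return history


def can_view(status: Status) -> bool:
    # View becomes available only after Finished appears.
    return status is Status.FINISHED
```

Modeling the two troubleshooting checks as a single condition makes it clear that both must hold before either button is enabled.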

Implement an embed recommendation

  1. In the Auto Recommender Results pane, in the Embed Recommendation area, note the recommended embedding provider and model.
  2. To implement the recommendation, expand the Next Steps section and follow the on-screen instructions for your target workflow.

Implement a chunking recommendation

  1. In the Auto Recommender Results pane, in the Chunk Recommendation area, note the recommended chunking strategy and settings.
  2. To implement the recommendation, expand the Next Steps section and follow the on-screen instructions for your target workflow.