
Connect SFTP to your preprocessing pipeline, and use the Unstructured Ingest CLI or the Unstructured Ingest Python library to batch process all your documents and store structured outputs locally on your filesystem.

The requirements are as follows.

  • The SFTP server hostname, port, username, and password.

    SFTP servers are offered by several vendors. For example, you can create and set up an SFTP server by using AWS Transfer Family.

  • The directory path to start accessing data from, specified as sftp://<path>/<to>/<directory>.

See the SFTP documentation.
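
Before running a job, it can help to confirm that the directory path follows the sftp://<path>/<to>/<directory> form described above. The following is a minimal sketch that uses only the Python standard library. The function name is_sftp_remote_url is hypothetical and is not part of the Unstructured Ingest library; it checks the shape of the string only and does not contact the server.

```python
from urllib.parse import urlsplit


def is_sftp_remote_url(remote_url: str) -> bool:
    """Return True if remote_url has the form sftp://<path>/<to>/<directory>.

    Hypothetical helper for illustration; it does not connect to anything.
    """
    parts = urlsplit(remote_url)
    # urlsplit places the first segment after "sftp://" in netloc and the
    # remaining segments in path, so at least one of them must be non-empty.
    has_location = bool(parts.netloc or parts.path.strip("/"))
    return parts.scheme == "sftp" and has_location
```

For example, is_sftp_remote_url("sftp://upload/docs/incoming") passes the check, while a path without the sftp:// prefix does not.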

The SFTP connector dependencies (the same for the CLI and the Python library):

pip install "unstructured-ingest[sftp]"

You might also need to install additional dependencies, depending on your needs.

The following environment variables for the SFTP connection:

  • SFTP_HOST - The SFTP hostname, represented by --host (CLI) or host (Python).
  • SFTP_PORT - The SFTP port number, represented by --port (CLI) or port (Python).
  • SFTP_REMOTE_URL - The directory path to start accessing data from, represented by --remote-url (CLI) or remote_url (Python).
  • SFTP_USERNAME - The SFTP username, represented by --username (CLI) or username (Python).
  • SFTP_PASSWORD - The SFTP password, represented by --password (CLI) or password (Python).

The following environment variables for the Unstructured API:

  • UNSTRUCTURED_API_KEY - Your Unstructured API key value.
  • UNSTRUCTURED_API_URL - Your Unstructured API URL.
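
A quick preflight check can catch an unset variable before the job starts. The following is a minimal sketch that uses only the Python standard library. The environment variable names and the Python parameter names (host, port, remote_url, username, password) come from the lists above; the helper names missing_variables and sftp_parameters are hypothetical and are not part of the Unstructured Ingest library. Values are kept as strings, exactly as read from the environment.

```python
import os

# Environment variable -> Python parameter name, as listed above.
SFTP_ENV_TO_PARAM = {
    "SFTP_HOST": "host",
    "SFTP_PORT": "port",
    "SFTP_REMOTE_URL": "remote_url",
    "SFTP_USERNAME": "username",
    "SFTP_PASSWORD": "password",
}
API_ENV_VARS = ("UNSTRUCTURED_API_KEY", "UNSTRUCTURED_API_URL")


def missing_variables(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    required = list(SFTP_ENV_TO_PARAM) + list(API_ENV_VARS)
    return [name for name in required if not env.get(name)]


def sftp_parameters(env=None):
    """Collect the SFTP settings under their Python parameter names."""
    env = os.environ if env is None else env
    return {param: env[name] for name, param in SFTP_ENV_TO_PARAM.items()}
```

If missing_variables() returns an empty list, sftp_parameters() yields a dictionary whose keys match the Python parameter names, which you can then pass on to whichever configuration objects your version of the library expects.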

Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The destination connector can be any of the ones supported. This example uses the local destination connector: