upsertFromPDF

Inserts or updates vectors from a PDF File.

upsertFromPDF(
  pdf: File,
  options?: {
    splitPages?: boolean
    metadata?: Record<string, any>
    textSplitter?: SplitterParams
  }
): Promise<string[]>

Reference

import { myVectorStore, myAssets } from "#elements";

export default async function () {
  const pdfFile = myAssets["/babel.pdf"];
  // upsertFromPDF resolves to the IDs of the upserted vectors, not a count.
  const ids = await myVectorStore.upsertFromPDF(pdfFile, {
    splitPages: true,
  });
  console.log(`${ids.length} vectors upserted`);
}

Parameters

  • pdf: The File representing the PDF to extract content from.
  • options: Optional configuration parameters, including:
    • splitPages: (optional) Whether to split the PDF into separate vectors for each page. Defaults to true.
    • metadata: (optional) The metadata to associate with the vectors.
    • textSplitter: (optional) The text splitter used to divide the content into multiple vectors. If no splitter is provided, the token splitter is used.
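
As a minimal sketch of how the options above might be assembled, the snippet below builds an options object with custom metadata. The interface UpsertFromPdfOptions and the helper buildPdfOptions are hypothetical names introduced for illustration, and the metadata keys are made up. textSplitter is omitted because the fields of SplitterParams are not described on this page.

```typescript
// Local mirror of the options shape from the signature above (an assumption,
// not the library's exported type). textSplitter is intentionally left out.
interface UpsertFromPdfOptions {
  splitPages?: boolean;
  metadata?: Record<string, any>;
}

// Hypothetical helper: bundles metadata with the splitPages flag.
// splitPages defaults to true here, matching the documented default.
function buildPdfOptions(
  metadata: Record<string, any>,
  splitPages: boolean = true,
): UpsertFromPdfOptions {
  return { splitPages, metadata };
}

// Illustrative metadata keys; use whatever fields suit your data.
const options = buildPdfOptions({ collection: "library", language: "en" });

// Would then be passed as the second argument, for example:
//   await myVectorStore.upsertFromPDF(pdfFile, options);
```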

Returns

A Promise that resolves to an array of the IDs of the upserted vectors.

Caveats

  • To retrieve all vectors created from a PDF, filter on the metadata field source-by-babel, matching it against the file name.
  • Only one PDF file can be uploaded per call, and its size must not exceed 256 MB.
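
One way to work within these limits is to check each file's size up front and upload files one after another. The sketch below is illustrative only: SizedFile, PdfUpserter, isWithinPdfLimit, and upsertPdfsSequentially are hypothetical names, not part of the API, and it assumes "MB" means 1024 × 1024 bytes, which this page does not specify.

```typescript
// Size limit from the caveat above. Assumption: "MB" means 1024 * 1024 bytes.
const MAX_PDF_BYTES = 256 * 1024 * 1024;

// Minimal structural types so the sketch compiles on its own (assumptions,
// not the library's real type definitions).
interface SizedFile {
  name: string;
  size: number; // size in bytes, as on the standard File interface
}

interface PdfUpserter<F extends SizedFile> {
  upsertFromPDF(pdf: F, options?: { splitPages?: boolean }): Promise<string[]>;
}

// Hypothetical helper: true when the file fits within the documented limit.
function isWithinPdfLimit(file: SizedFile): boolean {
  return file.size <= MAX_PDF_BYTES;
}

// Hypothetical helper: uploads PDFs one at a time and rejects an oversized
// file before it is passed to upsertFromPDF.
async function upsertPdfsSequentially<F extends SizedFile>(
  store: PdfUpserter<F>,
  files: F[],
): Promise<string[]> {
  const allIds: string[] = [];
  for (const file of files) {
    if (!isWithinPdfLimit(file)) {
      throw new Error(`${file.name} exceeds the 256 MB limit`);
    }
    // Await each call before starting the next one.
    const ids = await store.upsertFromPDF(file);
    allIds.push(...ids);
  }
  return allIds;
}
```

Because the helper throws on the first oversized file, files earlier in the list will already have been upserted; validate the whole list first if partial uploads are a concern.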