upsertFromWebpageRecursive
Inserts or updates vectors from a webpage and its linked pages recursively.
upsertFromWebpageRecursive(
webpage: string,
options?: {
excludeDirs?: string[]
maxDepth?: number
timeout?: number
preventOutside?: boolean
metadata?: Record<string, any>
textSplitter?: SplitterParams
}
): Promise<string[]>
Reference
import { myVectorStore } from "#elements";
export default async function () {
const ids = await myVectorStore.upsertFromWebpageRecursive(
"https://docs.babel.cloud/docs/overview",
{ maxDepth: 2 }
);
console.log(`${ids.length} vectors upserted`);
}
Parameters
webpage
: The URL of the webpage to start extracting content from.
options
: Optional configuration parameters, including:
excludeDirs
: (optional) Webpage directories to exclude from the crawl.
maxDepth
: (optional) The maximum depth to crawl. Defaults to 2. To crawl the whole website, set it to a sufficiently large number.
timeout
: (optional) The timeout for each request, in milliseconds. Defaults to 10000 (10 seconds).
preventOutside
: (optional) Whether to prevent crawling outside the root URL. Defaults to true.
metadata
: (optional) The metadata to associate with the vectors.
textSplitter
: (optional) The text splitter used to divide the content into multiple vectors. If no splitter is provided, the token splitter is used by default.
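To make the interplay of these options concrete, here is a minimal sketch of the gating logic a recursive crawler like this typically applies before visiting a link. The `shouldCrawl` helper is hypothetical and illustrative only; it is not part of the actual API, and the real implementation may differ.

```typescript
// Hypothetical helper sketching how maxDepth, preventOutside, and
// excludeDirs might decide whether a discovered link gets crawled.
function shouldCrawl(
  url: string,
  depth: number,
  rootUrl: string,
  options: { maxDepth?: number; preventOutside?: boolean; excludeDirs?: string[] } = {}
): boolean {
  const { maxDepth = 2, preventOutside = true, excludeDirs = [] } = options;
  if (depth > maxDepth) return false;                              // deeper than maxDepth
  if (preventOutside && !url.startsWith(rootUrl)) return false;    // outside the root URL
  if (excludeDirs.some((dir) => url.startsWith(dir))) return false; // in an excluded directory
  return true;
}

const root = "https://docs.babel.cloud/docs";
console.log(shouldCrawl(`${root}/overview`, 1, root)); // true
console.log(shouldCrawl(`${root}/overview`, 3, root)); // false: exceeds default maxDepth of 2
console.log(shouldCrawl("https://example.com/page", 1, root)); // false: outside the root URL
```

With the defaults shown, only pages up to two links deep under the starting URL are visited; raising maxDepth or disabling preventOutside widens the crawl accordingly.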
Returns
Promise of an array of IDs of the upserted vectors.
Caveats
- This method will insert a new vector for each webpage, or update the existing vector if the webpage has been previously upserted.
- Depending on how the website's navigation menu is structured, this method may not recurse through pages in the way you expect.
- You can query all the resulting vectors by filtering on the metadata field `source-by-babel` with the value `webpage`.
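The metadata filter above can be sketched as plain TypeScript. The `VectorRecord` shape and `filterBySource` helper below are hypothetical illustrations of the filter, not calls into the actual vector store API, whose query interface may differ.

```typescript
// Hypothetical record shape: each upserted vector carries its metadata.
interface VectorRecord {
  id: string;
  metadata: Record<string, any>;
}

// Hypothetical helper: keep only records whose source-by-babel field matches.
function filterBySource(records: VectorRecord[], source: string): VectorRecord[] {
  return records.filter((r) => r.metadata["source-by-babel"] === source);
}

const records: VectorRecord[] = [
  { id: "a", metadata: { "source-by-babel": "webpage" } },
  { id: "b", metadata: { "source-by-babel": "file" } },
];
// Logs the ids of records whose source is "webpage": ["a"]
console.log(filterBySource(records, "webpage").map((r) => r.id));
```

Filtering on `source-by-babel` lets you isolate vectors that came from this crawl from vectors upserted by other methods.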