
upsertFromWebpage

Inserts or updates vectors from a webpage.

upsertFromWebpage(
  webpage: string,
  options?: {
    timeout?: number
    selector?: string
    textDecoder?: string
    metadata?: Record<string, any>
    textSplitter?: SplitterParams
  }
): Promise<string[]>;

Reference

import { myVectorStore } from "#elements";

export default async function () {
  // upsertFromWebpage resolves to the IDs of the upserted vectors
  const ids = await myVectorStore.upsertFromWebpage(
    "https://docs.babel.cloud/docs/overview",
    { selector: "body" }
  );
  console.log(`${ids.length} vectors upserted`);
}

Parameters

  • webpage: The URL of the webpage to extract content from.
  • options: Optional configuration parameters, including:
    • timeout: (Optional) The maximum time in milliseconds to wait for the webpage to load. Defaults to 10000 (10 seconds).
    • selector: (Optional) The CSS selector of the element to extract content from. Defaults to 'body'.
    • textDecoder: (Optional) The text decoder to use. Defaults to 'utf-8'.
    • metadata: (Optional) The metadata to associate with the vectors.
    • textSplitter: (Optional) The text splitter used to divide the content into multiple vectors. If no splitter is provided, the token splitter is used.
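
The documented defaults can be made explicit with a small helper. The sketch below is illustrative only: UpsertFromWebpageOptions and withDocumentedDefaults are hypothetical names that mirror the signature above and are not part of the Babel API, and textSplitter is typed as unknown because SplitterParams is not defined on this page.

```typescript
// Hypothetical local type mirroring the documented options; not exported by Babel.
type UpsertFromWebpageOptions = {
  timeout?: number;
  selector?: string;
  textDecoder?: string;
  metadata?: Record<string, unknown>;
  textSplitter?: unknown; // SplitterParams is not defined on this page
};

// Fills in the defaults stated in the parameter list.
// Keys present in `options` take precedence over the defaults.
function withDocumentedDefaults(
  options: UpsertFromWebpageOptions = {}
): UpsertFromWebpageOptions {
  return {
    timeout: 10000,
    selector: "body",
    textDecoder: "utf-8",
    ...options,
  };
}

// Sample call: only the selector and metadata are overridden.
const resolvedOptions = withDocumentedDefaults({
  selector: ".markdown",
  metadata: { section: "overview" },
});
console.log(resolvedOptions);
```

The resolved object could then be passed as the second argument to upsertFromWebpage. Because the method already applies these defaults itself, the helper is useful mainly for logging or reviewing the effective configuration.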

Returns

Promise of an array of IDs of the upserted vectors.

Caveats

  • This method inserts new vectors if the webpage has not been stored before, or updates the existing vectors if it has.
  • To retrieve all vectors created by this method, filter on the metadata field source-by-babel with the value 'webpage'.
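
The metadata filter described in the second caveat can be illustrated on plain data. This is a sketch under assumptions: StoredVector and fromWebpage are hypothetical names, the sample records are made up, and the store's own filter syntax is not shown on this page, so the example filters an in-memory array instead of calling the vector store.

```typescript
// Hypothetical shape of a stored record; only the metadata field matters here.
type StoredVector = {
  id: string;
  metadata?: Record<string, unknown>;
};

// Keeps only records whose "source-by-babel" metadata equals "webpage".
function fromWebpage(records: StoredVector[]): StoredVector[] {
  return records.filter(
    (record) => record.metadata?.["source-by-babel"] === "webpage"
  );
}

// Made-up sample data: one webpage record, one from another source, one without metadata.
const sampleRecords: StoredVector[] = [
  { id: "a", metadata: { "source-by-babel": "webpage" } },
  { id: "b", metadata: { "source-by-babel": "other" } },
  { id: "c" },
];

const webpageRecords = fromWebpage(sampleRecords);
console.log(webpageRecords.map((record) => record.id)); // only "a" remains
```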

Examples

In an HTTP Element, parse and embed the part of a webpage matched by the .markdown selector, then perform a semantic search:

import * as Koa from "koa";
import { myVectorStore } from "#elements";

export default async function (
  request: Koa.Request,
  response: Koa.Response,
  ctx: Koa.Context
) {
  await myVectorStore.upsertFromWebpage(
    "https://docs.babel.cloud/docs/overview",
    {
      selector: ".markdown",
    }
  );

  const result = await myVectorStore.search("What is Babel?", {
    topK: 3,
  });
  return result[0].metadata!["content"];
}
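
The example above reads result[0] directly, which throws if the search returns no hits. A defensive variant might look like the sketch below. SearchHit and topContent are hypothetical names, and the hit shape is inferred from the example rather than taken from the Babel API.

```typescript
// Hypothetical shape of one search hit, inferred from the example above.
type SearchHit = {
  metadata?: Record<string, unknown>;
};

// Returns the "content" metadata of the top hit, or undefined when there is
// no hit, no metadata, or the field is not a string.
function topContent(hits: SearchHit[]): string | undefined {
  const content = hits[0]?.metadata?.["content"];
  return typeof content === "string" ? content : undefined;
}

// Made-up sample hits to show both outcomes.
console.log(topContent([{ metadata: { content: "sample text" } }]));
console.log(topContent([]));
```

In the HTTP Element, the final line could then become a return of topContent(result), with a fallback message when the value is undefined.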