upsertFromWebpage
Inserts or updates vectors from a webpage.
upsertFromWebpage(
  webpage: string,
  options?: {
    timeout?: number;
    selector?: string;
    textDecoder?: string;
    metadata?: Record<string, any>;
    textSplitter?: SplitterParams;
  }
): Promise<string[]>;
Reference
import { myVectorStore } from "#elements";
export default async function () {
  // The method resolves to an array of vector IDs, not a count.
  const ids = await myVectorStore.upsertFromWebpage(
    "https://docs.babel.cloud/docs/overview",
    { selector: "body" }
  );
  console.log(`${ids.length} vectors upserted`);
}
Parameters
- webpage: The URL of the webpage to extract content from.
- options: Optional configuration parameters, including:
  - timeout: (Optional) The maximum time in milliseconds to wait for the webpage to load. Defaults to 10000 (10 seconds).
  - selector: (Optional) The CSS selector to extract content from. Defaults to 'body'.
  - textDecoder: (Optional) The text decoder to use. Defaults to 'utf-8'.
  - metadata: (Optional) The metadata to associate with the vectors.
  - textSplitter: (Optional) The text splitter used to divide the content into multiple vectors. If no splitter is provided, the token splitter is used.
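The documented defaults can be summarized in a small sketch. The WebpageOptions type and the withDefaults helper are hypothetical names used only for illustration; they mirror the signature and defaults listed here but are not part of the library. SplitterParams is replaced by unknown because its shape is not described in this section.

```typescript
// Hypothetical mirror of the documented options; not exported by the library.
type WebpageOptions = {
  timeout?: number;
  selector?: string;
  textDecoder?: string;
  metadata?: Record<string, any>;
  textSplitter?: unknown; // stands in for SplitterParams (shape not shown here)
};

// Fill in the documented defaults for any option that is missing or undefined.
function withDefaults(options: WebpageOptions = {}): WebpageOptions {
  return {
    ...options,
    timeout: options.timeout ?? 10000, // 10 seconds
    selector: options.selector ?? "body",
    textDecoder: options.textDecoder ?? "utf-8",
  };
}

// Only selector and metadata are supplied; the rest fall back to defaults.
const resolvedOptions = withDefaults({
  selector: ".markdown",
  metadata: { topic: "overview" },
});
console.log(resolvedOptions);
```

Whether the library resolves its defaults in exactly this way is an assumption; the sketch only shows which values apply when an option is left out.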
Returns
A Promise that resolves to an array of the IDs of the upserted vectors.
Caveats
- This method will insert a new vector if the webpage does not exist, or update the existing vector if the webpage exists.
- You can query all the results by filtering the metadata field source-by-babel to webpage.
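As a rough illustration of the second caveat, the sketch below filters a list of records on the source-by-babel metadata field. The VectorRecord type, the filterBySource helper, and the sample records are hypothetical; the store's own filter syntax is not shown in this section, so the filtering is done on a plain array. The sketch also assumes the field holds the literal string "webpage"; the "file" value in the sample is invented for contrast.

```typescript
// Hypothetical shape of a stored vector record, for illustration only.
type VectorRecord = { id: string; metadata?: Record<string, any> };

// Keep only records whose "source-by-babel" metadata equals the given value.
function filterBySource(records: VectorRecord[], source: string): VectorRecord[] {
  return records.filter((r) => r.metadata?.["source-by-babel"] === source);
}

// Made-up sample data; real records would come from the vector store.
const sampleRecords: VectorRecord[] = [
  { id: "a", metadata: { "source-by-babel": "webpage" } },
  { id: "b", metadata: { "source-by-babel": "file" } }, // assumed other value
  { id: "c" }, // no metadata at all
];

const fromWebpages = filterBySource(sampleRecords, "webpage");
console.log(fromWebpages.map((r) => r.id));
```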
Examples
In an HTTP Element, parse and embed the content of a web page selected with .markdown, then perform a semantic search:
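import * as Koa from "koa";
import { myVectorStore } from "#elements";
export default async function (
  request: Koa.Request,
  response: Koa.Response,
  ctx: Koa.Context
) {
  // Extract only the content matched by the .markdown selector and embed it.
  await myVectorStore.upsertFromWebpage(
    "https://docs.babel.cloud/docs/overview",
    {
      selector: ".markdown",
    }
  );
  // Retrieve the three vectors closest to the query.
  const result = await myVectorStore.search("What is Babel?", {
    topK: 3,
  });
  // Return the best match's content, or undefined if nothing matched.
  return result[0]?.metadata?.["content"];
}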