Update examples section

Eric Ciarla
2024-06-21 15:40:46 -04:00
parent 5cf2beff92
commit 8e39083d8c
87 changed files with 8939 additions and 0 deletions
@@ -0,0 +1,11 @@
# Required environment variables
FIRECRAWL_API_KEY=
# Optional environment variables
# LangSmith tracing from the web worker.
# WARNING: FOR DEVELOPMENT ONLY. DO NOT DEPLOY A LIVE VERSION WITH THESE
# VARIABLES SET AS YOU WILL LEAK YOUR LANGCHAIN API KEY.
NEXT_PUBLIC_LANGCHAIN_TRACING_V2=
NEXT_PUBLIC_LANGCHAIN_API_KEY=
NEXT_PUBLIC_LANGCHAIN_PROJECT=
@@ -0,0 +1,3 @@
{
"extends": "next/core-web-vitals"
}
@@ -0,0 +1,38 @@
# See https://help.github.com/articles/ignoring-files/ for more about ignoring files.
# dependencies
/node_modules
/.pnp
.pnp.js
# testing
/coverage
# next.js
/.next/
/out/
# production
/build
# misc
.DS_Store
*.pem
# debug
npm-debug.log*
yarn-debug.log*
yarn-error.log*
# local env files
.env*.local
.env
# vercel
.vercel
# typescript
*.tsbuildinfo
next-env.d.ts
.yarn
@@ -0,0 +1 @@
{}
@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2023 Jacob Lee
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@@ -0,0 +1,7 @@
Copyright <YEAR> <COPYRIGHT HOLDER>
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,72 @@
# Local Chat With Websites
Welcome to the Local Web Chatbot! This is a direct fork of [Jacob Lee's fully local PDF chatbot](https://github.com/jacoblee93/fully-local-pdf-chatbot) that replaces the chat-with-PDF functionality with chat-with-website support powered by [Firecrawl](https://www.firecrawl.dev/). It is a simple chatbot that lets you ask questions about a website by embedding it and running queries against the vector store using a local LLM and embeddings.
## 🦙 Ollama
You can run more powerful, general models outside the browser using [Ollama's desktop app](https://ollama.ai). Download and set it up, then run the following commands to give the site access to a locally running Mistral instance:
### Mac/Linux
```bash
$ OLLAMA_ORIGINS=https://webml-demo.vercel.app OLLAMA_HOST=127.0.0.1:11435 ollama serve
```
Then, in another terminal window:
```bash
$ OLLAMA_HOST=127.0.0.1:11435 ollama pull mistral
```
### Windows
```cmd
set OLLAMA_ORIGINS=https://webml-demo.vercel.app
set OLLAMA_HOST=127.0.0.1:11435
ollama serve
```
Then, in another terminal window:
```cmd
set OLLAMA_HOST=127.0.0.1:11435
ollama pull mistral
```
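Before embedding a site, it can help to confirm that the Ollama server started above is reachable and that `mistral` has been pulled. The sketch below is a minimal, unofficial check and not part of this repo. It assumes Ollama's `GET /api/tags` endpoint, which lists locally available models; the names `hasModel`, `checkOllama`, and `FetchJson` are hypothetical, and the JSON fetcher is passed in so the logic can be exercised without a running server.

```typescript
// Part of the /api/tags response used here (assumed shape: { models: [{ name }] }).
interface OllamaTags {
  models: { name: string }[];
}

// Any function that fetches a URL and resolves to parsed JSON.
type FetchJson = (url: string) => Promise<unknown>;

// Returns true if a model such as "mistral" (or a tagged "mistral:latest") is listed.
function hasModel(tags: OllamaTags, model: string): boolean {
  return tags.models.some(
    (m) => m.name === model || m.name.startsWith(`${model}:`),
  );
}

// Hypothetical helper: asks the local server (OLLAMA_HOST=127.0.0.1:11435 above)
// which models it has, and reports whether `model` is among them.
async function checkOllama(
  fetchJson: FetchJson,
  baseUrl = "http://127.0.0.1:11435",
  model = "mistral",
): Promise<boolean> {
  try {
    const tags = (await fetchJson(`${baseUrl}/api/tags`)) as OllamaTags;
    return Array.isArray(tags?.models) && hasModel(tags, model);
  } catch {
    // Server not running, or the request was blocked (for example by OLLAMA_ORIGINS).
    return false;
  }
}
```

In a browser or a recent Node version, one way to call it is `checkOllama((url) => fetch(url).then((r) => r.json()))`; a `false` result suggests re-running the `ollama serve` and `ollama pull mistral` commands above.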
## 🔥 Firecrawl
Additionally, you will need a Firecrawl API key for website embedding. Signing up for [Firecrawl](https://www.firecrawl.dev/) is easy and you get 500 credits free. Enter your API key into the box below the URL in the embedding form.
## ⚡ Stack
It uses the following:
- [Voy](https://github.com/tantaraio/voy) as the vector store, fully WASM in the browser.
- [Ollama](https://ollama.ai/).
- [LangChain.js](https://js.langchain.com) to call the models, perform retrieval, and generally orchestrate all the pieces.
- [Transformers.js](https://huggingface.co/docs/transformers.js/index) to run open source [Nomic](https://www.nomic.ai/) embeddings in the browser.
  - For higher-quality embeddings, switch to `"nomic-ai/nomic-embed-text-v1"` in `app/worker.ts`.
- [Firecrawl](https://www.firecrawl.dev/) to scrape the webpages and deliver them in markdown format.
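To make the retrieval step concrete, here is a library-free sketch of what the stack does conceptually: split the scraped text into overlapping chunks, embed each chunk, and return the chunks closest to the query. It is an illustration only. The app itself relies on LangChain's text splitter, Transformers.js embeddings, and Voy; `chunkText`, `cosine`, and `topK` are hypothetical names, and any vectors you feed in would come from a real embedding model.

```typescript
// Split text into fixed-size character chunks that overlap by `overlap` characters.
// This is a simplification of a recursive splitter: it ignores separators entirely.
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk reached the end
  }
  return chunks;
}

// Dot product; missing components in `b` are treated as 0.
const dot = (a: number[], b: number[]): number =>
  a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);

// Cosine similarity in [-1, 1]; defined as 0 when either vector is all zeros.
function cosine(a: number[], b: number[]): number {
  const denom = Math.sqrt(dot(a, a) * dot(b, b));
  return denom === 0 ? 0 : dot(a, b) / denom;
}

interface EmbeddedChunk {
  text: string;
  vector: number[];
}

// Return the k stored chunks most similar to the query vector (store is not mutated).
function topK(query: number[], store: EmbeddedChunk[], k: number): EmbeddedChunk[] {
  return [...store]
    .sort((x, y) => cosine(query, y.vector) - cosine(query, x.vector))
    .slice(0, k);
}
```

The retrieved chunks are what gets placed into the prompt's context for the local LLM; a vector store such as Voy performs the nearest-neighbor search far more efficiently than this linear scan.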
## 🔱 Forking
To run/deploy this yourself, simply fork this repo and install the required dependencies with `yarn`.
There are no required environment variables, but you can optionally set up [LangSmith tracing](https://smith.langchain.com/) while developing locally to help debug the prompts and the chain. Copy the `.env.example` file into a `.env.local` file:
```ini
# No environment variables required!
# LangSmith tracing from the web worker.
# WARNING: FOR DEVELOPMENT ONLY. DO NOT DEPLOY A LIVE VERSION WITH THESE
# VARIABLES SET AS YOU WILL LEAK YOUR LANGCHAIN API KEY.
NEXT_PUBLIC_LANGCHAIN_TRACING_V2="true"
NEXT_PUBLIC_LANGCHAIN_API_KEY=
NEXT_PUBLIC_LANGCHAIN_PROJECT=
```
Just make sure you don't set this in production, as your LangChain API key will be public on the frontend!
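As a rough sketch of how such a guard can work, the function below builds a tracing config only when tracing is explicitly enabled and a key is present. The variable names come from `.env.example`; `buildDevTracing` and the two interfaces are hypothetical names used for illustration, and the check is slightly stricter than a plain `!== undefined` test because it also treats an empty key as unset.

```typescript
// Subset of the environment read for tracing (names taken from .env.example).
interface TracingEnv {
  NEXT_PUBLIC_LANGCHAIN_TRACING_V2?: string;
  NEXT_PUBLIC_LANGCHAIN_API_KEY?: string;
  NEXT_PUBLIC_LANGCHAIN_PROJECT?: string;
}

// Config that would be handed to the web worker in development.
interface DevTracingConfig {
  LANGCHAIN_TRACING_V2: "true";
  LANGCHAIN_API_KEY: string;
  LANGCHAIN_PROJECT: string | undefined;
}

// Returns a config only when tracing is switched on AND a non-empty key is set;
// otherwise returns undefined so no key is ever forwarded by accident.
function buildDevTracing(env: TracingEnv): DevTracingConfig | undefined {
  const apiKey = env.NEXT_PUBLIC_LANGCHAIN_API_KEY;
  if (env.NEXT_PUBLIC_LANGCHAIN_TRACING_V2 !== "true" || !apiKey) {
    return undefined;
  }
  return {
    LANGCHAIN_TRACING_V2: "true",
    LANGCHAIN_API_KEY: apiKey,
    LANGCHAIN_PROJECT: env.NEXT_PUBLIC_LANGCHAIN_PROJECT,
  };
}
```

Because `NEXT_PUBLIC_` variables are inlined into the client bundle by Next.js, leaving all three unset in production is the only safe configuration; in that case the function returns `undefined` and tracing stays off.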
## 🙏 Thank you!
Huge thanks to Jacob Lee and the other contributors of the repo for making this happen! Be sure to give him a follow on Twitter [@Hacubu](https://x.com/hacubu)!
@@ -0,0 +1,74 @@
@tailwind base;
@tailwind components;
@tailwind utilities;
body {
color: #f8f8f8;
background: #131318;
}
body input,
body textarea {
color: black;
}
a {
color: #5ba4f8;
}
a:hover {
border-bottom: 1px solid;
}
p {
margin: 8px 0;
}
code,
pre {
color: #ffa500;
}
pre {
background-color: black;
color: #39ff14;
}
li {
padding: 4px;
}
@layer base {
label {
@apply h-6 relative inline-block;
}
[type="checkbox"] {
@apply w-11 h-0 cursor-pointer inline-block;
@apply focus:outline-0 dark:focus:outline-0;
@apply border-0 dark:border-0;
@apply focus:ring-offset-transparent dark:focus:ring-offset-transparent;
@apply focus:ring-transparent dark:focus:ring-transparent;
@apply focus-within:ring-0 dark:focus-within:ring-0;
@apply focus:shadow-none dark:focus:shadow-none;
@apply after:absolute before:absolute;
@apply after:top-0 before:top-0;
@apply after:block before:inline-block;
@apply before:rounded-full after:rounded-full;
@apply after:content-[''] after:w-5 after:h-5 after:mt-0.5 after:ml-0.5;
@apply after:shadow-md after:duration-100;
@apply before:content-[''] before:w-10 before:h-full;
@apply before:shadow-[inset_0_0_#000];
@apply after:bg-white dark:after:bg-gray-50;
@apply before:bg-gray-300 dark:before:bg-gray-600;
@apply before:checked:bg-lime-500 dark:before:checked:bg-lime-500;
@apply checked:after:duration-300 checked:after:translate-x-4;
@apply disabled:after:bg-opacity-75 disabled:cursor-not-allowed;
@apply disabled:checked:before:bg-opacity-40;
}
}
@@ -0,0 +1,49 @@
import "./globals.css";
import { Public_Sans } from "next/font/google";
import { Navbar } from "@/components/Navbar";
const publicSans = Public_Sans({ subsets: ["latin"] });
export default function RootLayout({
children,
}: {
children: React.ReactNode;
}) {
return (
<html lang="en">
<head>
<title>Fully In-Browser Chat Over Documents</title>
<link rel="shortcut icon" href="/images/favicon.ico" />
<meta
name="description"
content="Upload a PDF, then ask questions about it - without a single remote request!"
/>
<meta
property="og:title"
content="Fully In-Browser Chat Over Documents"
/>
<meta
property="og:description"
content="Upload a PDF, then ask questions about it - without a single remote request!"
/>
<meta property="og:image" content="/images/og-image.png" />
<meta property="og:image:width" content="1200" />
<meta property="og:image:height" content="630" />
<meta name="twitter:card" content="summary_large_image" />
<meta
name="twitter:title"
content="Fully In-Browser Chat Over Documents"
/>
<meta
name="twitter:description"
content="Upload a PDF, then ask questions about it - without a single remote request!"
/>
<meta name="twitter:image" content="/images/og-image.png" />
</head>
<body className={publicSans.className}>
<div className="flex flex-col p-4 md:p-12 h-[100vh]">{children}</div>
</body>
</html>
);
}
@@ -0,0 +1,7 @@
import { ChatWindow } from "@/components/ChatWindow";
export default function Home() {
return (
<ChatWindow placeholder="Try asking something about the website you just embedded!"></ChatWindow>
);
}
@@ -0,0 +1,232 @@
import { ChatWindowMessage } from "@/schema/ChatWindowMessage";
import { Voy as VoyClient } from "voy-search";
import { createRetrievalChain } from "langchain/chains/retrieval";
import { createStuffDocumentsChain } from "langchain/chains/combine_documents";
import { createHistoryAwareRetriever } from "langchain/chains/history_aware_retriever";
import { FireCrawlLoader } from "@langchain/community/document_loaders/web/firecrawl";
import { HuggingFaceTransformersEmbeddings } from "@langchain/community/embeddings/hf_transformers";
import { VoyVectorStore } from "@langchain/community/vectorstores/voy";
import {
ChatPromptTemplate,
MessagesPlaceholder,
PromptTemplate,
} from "@langchain/core/prompts";
import { RunnableSequence, RunnablePick } from "@langchain/core/runnables";
import {
AIMessage,
type BaseMessage,
HumanMessage,
} from "@langchain/core/messages";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import type { BaseChatModel } from "@langchain/core/language_models/chat_models";
import type { LanguageModelLike } from "@langchain/core/language_models/base";
import { LangChainTracer } from "@langchain/core/tracers/tracer_langchain";
import { Client } from "langsmith";
import { ChatOllama } from "@langchain/community/chat_models/ollama";
const embeddings = new HuggingFaceTransformersEmbeddings({
modelName: "Xenova/all-MiniLM-L6-v2",
});
const voyClient = new VoyClient();
const vectorstore = new VoyVectorStore(voyClient, embeddings);
const OLLAMA_RESPONSE_SYSTEM_TEMPLATE = `You are an experienced researcher, expert at interpreting and answering questions based on provided sources. Using the provided context, answer the user's question to the best of your ability using the resources provided.
Generate a concise answer for a given question based solely on the provided search results. You must only use information from the provided search results. Use an unbiased and journalistic tone. Combine search results together into a coherent answer. Do not repeat text.
If there is nothing in the context relevant to the question at hand, just say "Hmm, I'm not sure." Don't try to make up an answer.
Anything between the following \`context\` html blocks is retrieved from a knowledge bank, not part of the conversation with the user.
<context>
{context}
</context>
REMEMBER: If there is no relevant information within the context, just say "Hmm, I'm not sure." Don't try to make up an answer. Anything between the preceding 'context' html blocks is retrieved from a knowledge bank, not part of the conversation with the user.`;
const _formatChatHistoryAsMessages = async (
chatHistory: ChatWindowMessage[],
) => {
return chatHistory.map((chatMessage) => {
if (chatMessage.role === "human") {
return new HumanMessage(chatMessage.content);
} else {
return new AIMessage(chatMessage.content);
}
});
};
const embedWebsite = async (url: string, firecrawlApiKey: string) => {
const webLoader = new FireCrawlLoader({
url: url,
apiKey: firecrawlApiKey,
mode: "scrape",
});
const docs = await webLoader.load();
const splitter = new RecursiveCharacterTextSplitter({
chunkSize: 500,
chunkOverlap: 50,
});
const splitDocs = await splitter.splitDocuments(docs);
self.postMessage({
type: "log",
data: splitDocs,
});
await vectorstore.addDocuments(splitDocs);
};
const queryVectorStore = async (
messages: ChatWindowMessage[],
{
chatModel,
modelProvider,
devModeTracer,
}: {
chatModel: LanguageModelLike;
modelProvider: "ollama";
devModeTracer?: LangChainTracer;
},
) => {
const text = messages[messages.length - 1].content;
const chatHistory = await _formatChatHistoryAsMessages(messages.slice(0, -1));
const responseChainPrompt = ChatPromptTemplate.fromMessages<{
context: string;
chat_history: BaseMessage[];
input: string;
}>([
["system", OLLAMA_RESPONSE_SYSTEM_TEMPLATE],
new MessagesPlaceholder("chat_history"),
["user", `{input}`],
]);
const documentChain = await createStuffDocumentsChain({
llm: chatModel,
prompt: responseChainPrompt,
documentPrompt: PromptTemplate.fromTemplate(
`<doc>\n{page_content}\n</doc>`,
),
});
const historyAwarePrompt = ChatPromptTemplate.fromMessages([
new MessagesPlaceholder("chat_history"),
["user", "{input}"],
[
"user",
"Given the above conversation, generate a natural language search query to look up in order to get information relevant to the conversation. Do not respond with anything except the query.",
],
]);
const historyAwareRetrieverChain = await createHistoryAwareRetriever({
llm: chatModel,
retriever: vectorstore.asRetriever(),
rephrasePrompt: historyAwarePrompt,
});
const retrievalChain = await createRetrievalChain({
combineDocsChain: documentChain,
retriever: historyAwareRetrieverChain,
});
const fullChain = RunnableSequence.from([
retrievalChain,
new RunnablePick("answer"),
]);
const stream = await fullChain.stream(
{
input: text,
chat_history: chatHistory,
},
{
callbacks: devModeTracer !== undefined ? [devModeTracer] : [],
},
);
for await (const chunk of stream) {
if (chunk) {
self.postMessage({
type: "chunk",
data: chunk,
});
}
}
self.postMessage({
type: "complete",
data: "OK",
});
};
// Listen for messages from the main thread
self.addEventListener("message", async (event: { data: any }) => {
self.postMessage({
type: "log",
data: `Received data!`,
});
let devModeTracer;
if (
event.data.DEV_LANGCHAIN_TRACING !== undefined &&
typeof event.data.DEV_LANGCHAIN_TRACING === "object"
) {
devModeTracer = new LangChainTracer({
projectName: event.data.DEV_LANGCHAIN_TRACING.LANGCHAIN_PROJECT,
client: new Client({
apiKey: event.data.DEV_LANGCHAIN_TRACING.LANGCHAIN_API_KEY,
}),
});
}
if (event.data.url) {
try {
self.postMessage({
type: "log",
data: `Embedding website now: ${event.data.url}`,
});
await embedWebsite(event.data.url, event.data.firecrawlApiKey);
self.postMessage({
type: "log",
data: `Embedded website: ${event.data.url} complete`,
});
} catch (e: any) {
self.postMessage({
type: "error",
error: e.message,
});
throw e;
}
} else {
const modelProvider = event.data.modelProvider;
const modelConfig = event.data.modelConfig;
let chatModel: BaseChatModel | LanguageModelLike;
chatModel = new ChatOllama(modelConfig);
try {
await queryVectorStore(event.data.messages, {
devModeTracer,
modelProvider,
chatModel,
});
} catch (e: any) {
self.postMessage({
type: "error",
error: `${e.message}. Make sure you are running Ollama.`,
});
throw e;
}
}
self.postMessage({
type: "complete",
data: "OK",
});
});
@@ -0,0 +1,125 @@
"use client";
import { toast } from 'react-toastify';
import 'react-toastify/dist/ReactToastify.css';
import { ChatWindowMessage } from '@/schema/ChatWindowMessage';
import { useState, type FormEvent } from "react";
import { Feedback } from 'langsmith';
export function ChatMessageBubble(props: {
message: ChatWindowMessage;
aiEmoji?: string;
onRemovePressed?: () => void;
}) {
const { role, content, runId } = props.message;
const colorClassName =
role === "human" ? "bg-sky-600" : "bg-slate-50 text-black";
const alignmentClassName =
role === "human" ? "ml-auto" : "mr-auto";
const prefix = role === "human" ? "🧑" : props.aiEmoji;
const [isLoading, setIsLoading] = useState(false);
const [feedback, setFeedback] = useState<Feedback | null>(null);
const [comment, setComment] = useState("");
const [showCommentForm, setShowCommentForm] = useState(false);
async function handleScoreButtonPress(e: React.MouseEvent<HTMLButtonElement, MouseEvent>, score: number) {
e.preventDefault();
setComment("");
await sendFeedback(score);
}
async function handleCommentSubmission(e: FormEvent<HTMLFormElement>) {
e.preventDefault();
const score = typeof feedback?.score === "number" ? feedback.score : 0;
await sendFeedback(score);
}
async function sendFeedback(score: number) {
if (isLoading) {
return;
}
setIsLoading(true);
const response = await fetch("api/feedback", {
method: feedback?.id ? "PUT" : "POST",
body: JSON.stringify({
id: feedback?.id,
run_id: runId,
score,
comment,
})
});
const json = await response.json();
if (json.error) {
toast(json.error, {
theme: "dark"
});
// Reset the loading flag so later feedback submissions are not blocked.
setIsLoading(false);
return;
} else if (feedback?.id && comment) {
toast("Response recorded! Go to https://smith.langchain.com and check it out under your run's \"Feedback\" pane.", {
theme: "dark",
autoClose: 3000,
});
setComment("");
setShowCommentForm(false);
} else {
setShowCommentForm(true);
}
if (json.feedback) {
setFeedback(json.feedback);
}
setIsLoading(false);
}
return (
<div
className={`${alignmentClassName} ${colorClassName} rounded px-4 py-2 max-w-[80%] mb-8 flex flex-col`}
>
<div className="flex hover:group group">
<div className="mr-2">
{prefix}
</div>
<div className="whitespace-pre-wrap">
{/* TODO: Remove. Hacky fix, stop sequences don't seem to work with WebLLM yet. */}
{content.trim().split("\nInstruct:")[0].split("\nInstruction:")[0]}
</div>
<div className="cursor-pointer opacity-0 hover:opacity-100 relative left-2 bottom-1" onMouseUp={props?.onRemovePressed}>
</div>
</div>
<div className={`${!runId ? "hidden" : ""} ml-auto mt-2`}>
<button className={`p-2 border text-3xl rounded hover:bg-green-400 ${feedback && feedback.score === 1 ? "bg-green-400" : ""}`} onMouseUp={(e) => handleScoreButtonPress(e, 1)}>
👍
</button>
<button className={`p-2 border text-3xl rounded ml-4 hover:bg-red-400 ${feedback && feedback.score === 0 ? "bg-red-400" : ""}`} onMouseUp={(e) => handleScoreButtonPress(e, 0)}>
👎
</button>
</div>
<div className={`${(feedback && showCommentForm) ? "" : "hidden"} min-w-[480px]`}>
<form onSubmit={handleCommentSubmission} className="relative">
<input
className="mr-8 p-4 rounded w-full border mt-2"
value={comment}
placeholder={feedback?.score === 1 ? "Anything else you'd like to add about this response?" : "What would the correct or preferred response have been?"}
onChange={(e) => setComment(e.target.value)}
/>
<div role="status" className={`${isLoading ? "" : "hidden"} flex justify-center absolute top-[24px] right-[16px]`}>
<svg aria-hidden="true" className="w-6 h-6 text-slate-200 animate-spin dark:text-slate-200 fill-sky-800" viewBox="0 0 100 101" fill="none" xmlns="http://www.w3.org/2000/svg">
<path d="M100 50.5908C100 78.2051 77.6142 100.591 50 100.591C22.3858 100.591 0 78.2051 0 50.5908C0 22.9766 22.3858 0.59082 50 0.59082C77.6142 0.59082 100 22.9766 100 50.5908ZM9.08144 50.5908C9.08144 73.1895 27.4013 91.5094 50 91.5094C72.5987 91.5094 90.9186 73.1895 90.9186 50.5908C90.9186 27.9921 72.5987 9.67226 50 9.67226C27.4013 9.67226 9.08144 27.9921 9.08144 50.5908Z" fill="currentColor"/>
<path d="M93.9676 39.0409C96.393 38.4038 97.8624 35.9116 97.0079 33.5539C95.2932 28.8227 92.871 24.3692 89.8167 20.348C85.8452 15.1192 80.8826 10.7238 75.2124 7.41289C69.5422 4.10194 63.2754 1.94025 56.7698 1.05124C51.7666 0.367541 46.6976 0.446843 41.7345 1.27873C39.2613 1.69328 37.813 4.19778 38.4501 6.62326C39.0873 9.04874 41.5694 10.4717 44.0505 10.1071C47.8511 9.54855 51.7191 9.52689 55.5402 10.0491C60.8642 10.7766 65.9928 12.5457 70.6331 15.2552C75.2735 17.9648 79.3347 21.5619 82.5849 25.841C84.9175 28.9121 86.7997 32.2913 88.1811 35.8758C89.083 38.2158 91.5421 39.6781 93.9676 39.0409Z" fill="currentFill"/>
</svg>
<span className="sr-only">Loading...</span>
</div>
</form>
</div>
</div>
);
}
@@ -0,0 +1,422 @@
"use client";
import { Id, ToastContainer, toast } from "react-toastify";
import "react-toastify/dist/ReactToastify.css";
import { useRef, useState, useEffect } from "react";
import type { FormEvent } from "react";
import { ChatMessageBubble } from "@/components/ChatMessageBubble";
import { ChatWindowMessage } from "@/schema/ChatWindowMessage";
export function ChatWindow(props: { placeholder?: string }) {
const { placeholder } = props;
const [messages, setMessages] = useState<ChatWindowMessage[]>([]);
const [input, setInput] = useState("");
const [isLoading, setIsLoading] = useState(true);
const [selectedURL, setSelectedURL] = useState<string | null>(null);
const [firecrawlApiKey, setFirecrawlApiKey] = useState("");
const [readyToChat, setReadyToChat] = useState(false);
const initProgressToastId = useRef<Id | null>(null);
const titleText = "Local Chat With Websites";
const emoji = "🔥";
const worker = useRef<Worker | null>(null);
async function queryStore(messages: ChatWindowMessage[]) {
if (!worker.current) {
throw new Error("Worker is not ready.");
}
return new ReadableStream({
start(controller) {
if (!worker.current) {
controller.close();
return;
}
const ollamaConfig = {
baseUrl: "http://localhost:11435",
temperature: 0.3,
model: "mistral",
};
const payload: Record<string, any> = {
messages,
modelProvider: "ollama",
modelConfig: ollamaConfig,
};
if (
process.env.NEXT_PUBLIC_LANGCHAIN_TRACING_V2 === "true" &&
process.env.NEXT_PUBLIC_LANGCHAIN_API_KEY !== undefined
) {
console.warn(
"[WARNING]: You have set your LangChain API key publicly. This should only be done in local development - remember to remove it before deploying!",
);
payload.DEV_LANGCHAIN_TRACING = {
LANGCHAIN_TRACING_V2: "true",
LANGCHAIN_API_KEY: process.env.NEXT_PUBLIC_LANGCHAIN_API_KEY,
LANGCHAIN_PROJECT: process.env.NEXT_PUBLIC_LANGCHAIN_PROJECT,
};
}
worker.current?.postMessage(payload);
const onMessageReceived = async (e: any) => {
switch (e.data.type) {
case "log":
console.log(e.data);
break;
case "init_progress":
if (initProgressToastId.current === null) {
initProgressToastId.current = toast(
"Loading model weights... This may take a while",
{
progress: e.data.data.progress || 0.01,
theme: "dark",
},
);
} else {
if (e.data.data.progress === 1) {
await new Promise((resolve) => setTimeout(resolve, 2000));
}
toast.update(initProgressToastId.current, {
progress: e.data.data.progress || 0.01,
});
}
break;
case "chunk":
controller.enqueue(e.data.data);
break;
case "error":
worker.current?.removeEventListener("message", onMessageReceived);
console.log(e.data.error);
const error = new Error(e.data.error);
controller.error(error);
break;
case "complete":
worker.current?.removeEventListener("message", onMessageReceived);
controller.close();
break;
}
};
worker.current?.addEventListener("message", onMessageReceived);
},
});
}
async function sendMessage(e: FormEvent<HTMLFormElement>) {
e.preventDefault();
if (isLoading || !input) {
return;
}
const initialInput = input;
const initialMessages = [...messages];
const newMessages = [
...initialMessages,
{ role: "human" as const, content: input },
];
setMessages(newMessages);
setIsLoading(true);
setInput("");
try {
const stream = await queryStore(newMessages);
const reader = stream.getReader();
let chunk = await reader.read();
const aiResponseMessage: ChatWindowMessage = {
content: "",
role: "ai" as const,
};
setMessages([...newMessages, aiResponseMessage]);
while (!chunk.done) {
aiResponseMessage.content = aiResponseMessage.content + chunk.value;
setMessages([...newMessages, aiResponseMessage]);
chunk = await reader.read();
}
setIsLoading(false);
} catch (e: any) {
setMessages(initialMessages);
setIsLoading(false);
setInput(initialInput);
toast(`There was an issue with querying your website: ${e.message}`, {
theme: "dark",
});
}
}
// We use the `useEffect` hook to set up the worker as soon as the `ChatWindow` component is mounted.
useEffect(() => {
if (!worker.current) {
// Create the worker if it does not yet exist.
worker.current = new Worker(
new URL("../app/worker.ts", import.meta.url),
{
type: "module",
},
);
setIsLoading(false);
}
}, []);
async function embedWebsite(e: FormEvent<HTMLFormElement>) {
e.preventDefault();
if (!selectedURL) {
toast(`You must enter a URL to embed.`, {
theme: "dark",
});
return;
}
setIsLoading(true);
worker.current?.postMessage({
url: selectedURL,
firecrawlApiKey: firecrawlApiKey,
});
const onMessageReceived = (e: any) => {
switch (e.data.type) {
case "log":
console.log(e.data);
break;
case "error":
worker.current?.removeEventListener("message", onMessageReceived);
setIsLoading(false);
console.log(e.data.error);
toast(`There was an issue embedding your website: ${e.data.error}`, {
theme: "dark",
});
break;
case "complete":
worker.current?.removeEventListener("message", onMessageReceived);
setIsLoading(false);
setReadyToChat(true);
toast(
`Embedding successful! Now try asking a question about your website.`,
{
theme: "dark",
},
);
break;
}
};
worker.current?.addEventListener("message", onMessageReceived);
}
const chooseDataComponent = (
<>
<div className="p-4 md:p-8 rounded bg-[#25252d] w-full max-h-[85%] overflow-hidden flex flex-col">
<h1 className="text-3xl md:text-4xl mb-2 ml-auto mr-auto">
{emoji} Local Chat With Websites {emoji}
</h1>
<ul>
<li className="text-l">
🏡
<span className="ml-2">
Welcome to the Local Web Chatbot!
<br></br>
<br></br>
This is a direct fork of{" "}
<a href="https://github.com/jacoblee93/fully-local-pdf-chatbot">
Jacob Lee&apos;s fully local PDF chatbot
</a>{" "}
replacing the chat with PDF functionality with website support. It
is a simple chatbot that allows you to ask questions about a
website by embedding it and running queries against the vector
store using a local LLM and embeddings.
</span>
</li>
<li>
<span className="ml-2">
The default LLM is Mistral-7B run locally by Ollama. You&apos;ll
need to install{" "}
<a target="_blank" href="https://ollama.ai">
the Ollama desktop app
</a>{" "}
and run the following commands to give this site access to the
locally running model:
<br />
<pre className="inline-flex px-2 py-1 my-2 rounded">
$ OLLAMA_ORIGINS=https://webml-demo.vercel.app
OLLAMA_HOST=127.0.0.1:11435 ollama serve
</pre>
<br />
Then, in another window:
<br />
<pre className="inline-flex px-2 py-1 my-2 rounded">
$ OLLAMA_HOST=127.0.0.1:11435 ollama pull mistral
</pre>
<br />
Additionally, you will need a Firecrawl API key for website
embedding. Signing up at{" "}
<a target="_blank" href="https://firecrawl.dev">
firecrawl.dev
</a>{" "}
is easy and you get 500 credits free. Enter your API key into the
box below the URL in the embedding form.
</span>
</li>
<li className="text-l">
🐙
<span className="ml-2">
Both this template and Jacob Lee&apos;s template are open source -
you can see the source code and deploy your own version{" "}
<a
href="https://github.com/ericciarla/local-web-chatbot"
target="_blank"
>
from the GitHub repo
</a>{" "}
or Jacob&apos;s{" "}
<a href="https://github.com/jacoblee93/fully-local-pdf-chatbot">
original GitHub repo
</a>
!
</span>
</li>
<li className="text-l">
👇
<span className="ml-2">
Try embedding a website below, then asking questions! You can even
turn off your WiFi after the website is scraped.
</span>
</li>
</ul>
</div>
<form
onSubmit={embedWebsite}
className="mt-4 flex flex-col justify-between items-center w-full"
>
<input
id="url_input"
type="text"
placeholder="Enter a URL to scrape"
className="text-black mb-2 w-[300px] px-4 py-2 rounded-lg"
onChange={(e) => setSelectedURL(e.target.value)}
></input>
<input
id="api_key_input"
type="text"
placeholder="Enter your Firecrawl API Key"
className="text-black mb-2 w-[300px] px-4 py-2 rounded-lg"
onChange={(e) => setFirecrawlApiKey(e.target.value)}
></input>
<button
type="submit"
className="shrink-0 px-4 py-4 bg-sky-600 rounded w-42"
>
<div
role="status"
className={`${isLoading ? "" : "hidden"} flex justify-center`}
>
<svg
aria-hidden="true"
className="w-6 h-6 text-white animate-spin dark:text-white fill-sky-800"
viewBox="0 0 100 101"
fill="none"
xmlns="http://www.w3.org/2000/svg"
>
<path
d="M100 50.5908C100 78.2051 77.6142 100.591 50 100.591C22.3858 100.591 0 78.2051 0 50.5908C0 22.9766 22.3858 0.59082 50 0.59082C77.6142 0.59082 100 22.9766 100 50.5908ZM9.08144 50.5908C9.08144 73.1895 27.4013 91.5094 50 91.5094C72.5987 91.5094 90.9186 73.1895 90.9186 50.5908C90.9186 27.9921 72.5987 9.67226 50 9.67226C27.4013 9.67226 9.08144 27.9921 9.08144 50.5908Z"
fill="currentColor"
/>
<path
d="M93.9676 39.0409C96.393 38.4038 97.8624 35.9116 97.0079 33.5539C95.2932 28.8227 92.871 24.3692 89.8167 20.348C85.8452 15.1192 80.8826 10.7238 75.2124 7.41289C69.5422 4.10194 63.2754 1.94025 56.7698 1.05124C51.7666 0.367541 46.6976 0.446843 41.7345 1.27873C39.2613 1.69328 37.813 4.19778 38.4501 6.62326C39.0873 9.04874 41.5694 10.4717 44.0505 10.1071C47.8511 9.54855 51.7191 9.52689 55.5402 10.0491C60.8642 10.7766 65.9928 12.5457 70.6331 15.2552C75.2735 17.9648 79.3347 21.5619 82.5849 25.841C84.9175 28.9121 86.7997 32.2913 88.1811 35.8758C89.083 38.2158 91.5421 39.6781 93.9676 39.0409Z"
fill="currentFill"
/>
</svg>
<span className="sr-only">Loading...</span>
</div>
<span className={isLoading ? "hidden" : ""}>Embed Website</span>
</button>
</form>
</>
);
const chatInterfaceComponent = (
<>
<div className="flex flex-col-reverse w-full mb-4 overflow-auto grow">
{messages.length > 0
? [...messages].reverse().map((m, i) => (
<ChatMessageBubble
key={i}
message={m}
aiEmoji={emoji}
onRemovePressed={() =>
setMessages((previousMessages) => {
// Copy before reversing so the previous state array is not mutated in place.
const displayOrderedMessages = [...previousMessages].reverse();
return [
...displayOrderedMessages.slice(0, i),
...displayOrderedMessages.slice(i + 1),
].reverse();
})
}
></ChatMessageBubble>
))
: ""}
</div>
<form onSubmit={sendMessage} className="flex w-full flex-col">
<div className="flex w-full mt-4">
<input
className="grow mr-8 p-4 rounded"
value={input}
placeholder={placeholder ?? "What's it like to be a pirate?"}
onChange={(e) => setInput(e.target.value)}
/>
<button
type="submit"
className="shrink-0 px-8 py-4 bg-sky-600 rounded w-28"
>
<div
role="status"
className={`${isLoading ? "" : "hidden"} flex justify-center`}
>
<svg
aria-hidden="true"
className="w-6 h-6 text-white animate-spin dark:text-white fill-sky-800"
viewBox="0 0 100 101"
fill="none"
xmlns="http://www.w3.org/2000/svg"
>
<path
d="M100 50.5908C100 78.2051 77.6142 100.591 50 100.591C22.3858 100.591 0 78.2051 0 50.5908C0 22.9766 22.3858 0.59082 50 0.59082C77.6142 0.59082 100 22.9766 100 50.5908ZM9.08144 50.5908C9.08144 73.1895 27.4013 91.5094 50 91.5094C72.5987 91.5094 90.9186 73.1895 90.9186 50.5908C90.9186 27.9921 72.5987 9.67226 50 9.67226C27.4013 9.67226 9.08144 27.9921 9.08144 50.5908Z"
fill="currentColor"
/>
<path
d="M93.9676 39.0409C96.393 38.4038 97.8624 35.9116 97.0079 33.5539C95.2932 28.8227 92.871 24.3692 89.8167 20.348C85.8452 15.1192 80.8826 10.7238 75.2124 7.41289C69.5422 4.10194 63.2754 1.94025 56.7698 1.05124C51.7666 0.367541 46.6976 0.446843 41.7345 1.27873C39.2613 1.69328 37.813 4.19778 38.4501 6.62326C39.0873 9.04874 41.5694 10.4717 44.0505 10.1071C47.8511 9.54855 51.7191 9.52689 55.5402 10.0491C60.8642 10.7766 65.9928 12.5457 70.6331 15.2552C75.2735 17.9648 79.3347 21.5619 82.5849 25.841C84.9175 28.9121 86.7997 32.2913 88.1811 35.8758C89.083 38.2158 91.5421 39.6781 93.9676 39.0409Z"
fill="currentFill"
/>
</svg>
<span className="sr-only">Loading...</span>
</div>
<span className={isLoading ? "hidden" : ""}>Send</span>
</button>
</div>
</form>
</>
);
return (
<div
className={`flex flex-col items-center p-4 md:p-8 rounded grow overflow-hidden ${
readyToChat ? "border" : ""
}`}
>
<h2 className={`${readyToChat ? "" : "hidden"} text-2xl`}>
{emoji} {titleText}
</h2>
{readyToChat ? chatInterfaceComponent : chooseDataComponent}
<ToastContainer />
</div>
);
}
@@ -0,0 +1,16 @@
"use client";
import { usePathname } from 'next/navigation';
export function Navbar() {
const pathname = usePathname();
return (
<nav className="mb-4">
<a className={`mr-4 ${pathname === "/" ? "text-white border-b" : ""}`} href="/">🏴 Chat</a>
<a className={`mr-4 ${pathname === "/structured_output" ? "text-white border-b" : ""}`} href="/structured_output">🧱 Structured Output</a>
<a className={`mr-4 ${pathname === "/agents" ? "text-white border-b" : ""}`} href="/agents">🦜 Agents</a>
<a className={`mr-4 ${pathname === "/retrieval" ? "text-white border-b" : ""}`} href="/retrieval">🐶 Retrieval</a>
<a className={`mr-4 ${pathname === "/retrieval_agents" ? "text-white border-b" : ""}`} href="/retrieval_agents">🤖 Retrieval Agents</a>
</nav>
);
}
@@ -0,0 +1,39 @@
/** @type {import('next').NextConfig} */
const nextConfig = {
// (Optional) Export as a static site
// See https://nextjs.org/docs/pages/building-your-application/deploying/static-exports#configuration
output: 'export', // Feel free to modify/remove this option
// Override the default webpack configuration
webpack: (config, { isServer }) => {
// See https://webpack.js.org/configuration/resolve/#resolvealias
config.resolve.alias = {
...config.resolve.alias,
"sharp$": false,
"onnxruntime-node$": false,
}
config.experiments = {
...config.experiments,
topLevelAwait: true,
asyncWebAssembly: true,
};
config.module.rules.push({
test: /\.md$/i,
use: "raw-loader",
});
// Fixes npm packages that depend on the `fs` module
if (!isServer) {
config.resolve.fallback = {
// Keep this spread: without it, every fallback that Next.js sets
// by default is dropped.
...config.resolve.fallback,
fs: false, // stub out `fs` in the client bundle
"node:fs/promises": false,
module: false,
perf_hooks: false,
};
}
return config;
},
}
module.exports = nextConfig
@@ -0,0 +1,47 @@
{
"name": "local-website-chatbot",
"version": "0.0.0",
"private": true,
"scripts": {
"dev": "next dev",
"build": "next build",
"start": "next start",
"lint": "next lint",
"format": "prettier --write \"app\""
},
"engines": {
"node": ">=18"
},
"dependencies": {
"@langchain/community": "^0.2.9",
"@langchain/weaviate": "^0.0.4",
"@mendable/firecrawl-js": "^0.0.26",
"@mlc-ai/web-llm": "^0.2.42",
"@types/node": "20.4.5",
"@types/react": "18.2.17",
"@types/react-dom": "18.2.7",
"@xenova/transformers": "^2.16.0",
"autoprefixer": "10.4.14",
"encoding": "^0.1.13",
"eslint": "8.46.0",
"eslint-config-next": "13.4.12",
"jest": "^29.7.0",
"langchain": "^0.2.5",
"next": "13.4.12",
"pdf-parse": "^1.1.1",
"postcss": "8.4.27",
"react": "18.2.0",
"react-dom": "18.2.0",
"react-toastify": "^10.0.5",
"tailwindcss": "3.3.3",
"ts-node": "^10.9.2",
"typescript": "^5.4.5",
"voy-search": "^0.6.3"
},
"devDependencies": {
"prettier": "3.0.0"
},
"resolutions": {
"@langchain/core": "0.2.6"
}
}
@@ -0,0 +1,6 @@
module.exports = {
plugins: {
tailwindcss: {},
autoprefixer: {},
},
}
Binary file not shown. Size: 4.0 MiB
Binary file not shown. Size: 15 KiB
Binary file not shown. Size: 308 KiB
@@ -0,0 +1,6 @@
export type ChatWindowMessage = {
content: string;
role: "human" | "ai";
runId?: string;
traceUrl?: string;
}
@@ -0,0 +1,18 @@
/** @type {import('tailwindcss').Config} */
module.exports = {
content: [
'./pages/**/*.{js,ts,jsx,tsx,mdx}',
'./components/**/*.{js,ts,jsx,tsx,mdx}',
'./app/**/*.{js,ts,jsx,tsx,mdx}',
],
theme: {
extend: {
backgroundImage: {
'gradient-radial': 'radial-gradient(var(--tw-gradient-stops))',
'gradient-conic':
'conic-gradient(from 180deg at 50% 50%, var(--tw-gradient-stops))',
},
},
},
plugins: [],
}
@@ -0,0 +1,28 @@
{
"compilerOptions": {
"target": "es5",
"lib": ["dom", "dom.iterable", "esnext"],
"allowJs": true,
"skipLibCheck": true,
"strict": true,
"forceConsistentCasingInFileNames": true,
"noEmit": true,
"esModuleInterop": true,
"module": "esnext",
"moduleResolution": "bundler",
"resolveJsonModule": true,
"isolatedModules": true,
"jsx": "preserve",
"incremental": true,
"plugins": [
{
"name": "next"
}
],
"paths": {
"@/*": ["./*"]
}
},
"include": ["next-env.d.ts", "**/*.ts", "**/*.tsx", ".next/types/**/*.ts"],
"exclude": ["node_modules"]
}
File diff suppressed because it is too large