{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Library imports"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.embeddings import HuggingFaceBgeEmbeddings\n",
"from langchain_text_splitters import RecursiveCharacterTextSplitter\n",
"from langchain_community.vectorstores import FAISS\n",
"from langchain_community.document_loaders import PyPDFLoader"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Loading the embeddings model"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"c:\\Users\\timmy_3aupohg\\anaconda3\\envs\\smog_env\\Lib\\site-packages\\sentence_transformers\\cross_encoder\\CrossEncoder.py:11: TqdmExperimentalWarning: Using `tqdm.autonotebook.tqdm` in notebook mode. Use `tqdm.tqdm` instead to force console mode (e.g. in jupyter console)\n",
" from tqdm.autonotebook import tqdm, trange\n"
]
}
],
"source": [
"# Initialize the embedding model\n",
"model_name = \"BAAI/bge-small-en\"\n",
"model_kwargs = {\"device\": \"cuda\"}  # can also be \"cpu\"\n",
"encode_kwargs = {\"normalize_embeddings\": True}\n",
"embeddings = HuggingFaceBgeEmbeddings(\n",
"    model_name=model_name, model_kwargs=model_kwargs, encode_kwargs=encode_kwargs\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Experiment for PDF loading"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"# Check the document type and load the document if it is a PDF\n",
"def load_pdf_document(document_path):\n",
"    if document_path.endswith(\".pdf\"):\n",
"        pdf_doc = PyPDFLoader(document_path)\n",
"        pages = pdf_doc.load_and_split()\n",
"        return pages\n",
"    else:\n",
"        raise ValueError(f\"Unsupported document type for {document_path}\")\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"# Load the document \n",
"document_path = \"data/corolla-2020-toyota-owners-manual.pdf\"\n",
"pdf_pages = load_pdf_document(document_path)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"db = FAISS.from_documents(pdf_pages, embeddings)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"def save_embedded_data(db, key=\"pdf\"):\n",
"    # Persist the FAISS index (the vector store, not the embedding model) to disk\n",
"    db.save_local(f\"vec-db/index/faiss_index_{key}\")\n",
"    print(\"Embeddings saved\")\n",
"\n",
"def load_embedded_data(embeddings, key=\"pdf\"):\n",
"    # allow_dangerous_deserialization unpickles the index; only load files you created\n",
"    embed_db = FAISS.load_local(f\"vec-db/index/faiss_index_{key}\", embeddings, allow_dangerous_deserialization=True)\n",
"    return embed_db"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Embeddings saved\n"
]
}
],
"source": [
"save_embedded_data(db)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"load_db = load_embedded_data(embeddings)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data Search"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"query = \"Steering assist function/lane centering function\"\n",
"docs = load_db.similarity_search(query)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"202 4-5. Using the driving support systems\n",
"COROLLA_UInside of displayed lines is \n",
"black\n",
"Indicates that the system is not able to recognize white (yellow) \n",
"lines or a course\n",
"* or is temporar-\n",
"ily canceled.\n",
"*: Boundary between asphalt and \n",
"the side of the road, such as \n",
"grass, soil, or a curb\n",
"Follow-up cruising display\n",
"Displayed when the multi-informa-tion display is switched to the driv-ing support system information screen.\n",
"Indicates that steering assist of the \n",
"lane centering function is operating by monitoring the position of a pre-ceding vehicle.\n",
"When the follow-up cruising display \n",
"is displayed, if the preceding vehi-cle moves, your vehicle may move in the same way. A lways pay care-\n",
"ful attention to your surroundings and operate the steering wheel as necessary to correct the path of the vehicle and ensure safety.\n",
"■Operation conditions of each \n",
"function\n",
"●Lane departure alert function\n",
"This function oper ates when all of \n",
"the following cond itions are met.\n",
"• LTA is turned on.• Vehicle speed is approximately 32 \n",
"mph (50 km/h) or more.*1\n",
"• System recognizes white (yellow) \n",
"lane lines or a course*2. (When a \n",
"white [yellow] line or course*2 is \n",
"recognized on only one side, the system will operate only for the \n",
"recognized side.)\n",
"• Width of traffic lane is approxi-\n",
"mately 9.8 ft. (3 m) or more.\n",
"• Turn signal lever is not operated.\n",
"(Vehicles with a Blind Spot Moni-\n",
"tor: Except when another vehicle \n",
"is in the lane on the side where the turn signal was operated)\n",
"• Vehicle is not being driven around \n",
"a sharp curve.\n",
"• No system malfunctions are \n",
"detected. ( P.204)\n",
"*1:The function oper ates even if the \n",
"vehicle speed is less than \n",
"approximately 32 mph (50 km/h) when the lane centering function is operating.\n",
"*2:Boundary between asphalt and \n",
"the side of the road, such as grass, soil, or a curb\n",
"●Steering assist function\n",
"This function operates when all of the following conditions are met in addition to the operation conditions for the lane departure alert function.\n",
"• Setting for “Steering Assist” in \n",
"of the multi-information display is \n",
"set to “ON”. ( P.548)\n",
"• Vehicle is not accelerated or \n",
"decelerated by a fixed amount or more.\n",
"• Steering wheel is not operated \n",
"with a steering force level suitable \n",
"for changing lanes.\n",
"• ABS, VSC, TRAC and PCS are \n",
"not operating.\n",
"• TRAC or VSC is not turned off.\n",
"• Hands off steering wheel warning \n",
"is not displayed. ( P.204)\n",
"●Vehicle sway warning function\n",
"This function operates when all of \n",
"https://www.MyCarManual.com\n"
]
}
],
"source": [
"print(docs[0].page_content)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"201\n"
]
}
],
"source": [
"print(docs[0].metadata['page'])"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"def search(db, query, k=4):\n",
"    # Return the top hit's text, the concatenated text of all hits, and their page indices\n",
"    docs = db.similarity_search(query, k=k)\n",
"    combined = \"\"\n",
"    pages = []\n",
"    for doc in docs:\n",
"        combined += f\"{doc.page_content}\\n\"\n",
"        pages.append(doc.metadata['page'])\n",
"    return docs[0].page_content, combined, pages"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"206 4-5. Using the driving support systems\n",
"COROLLA_UWARNING\n",
"■Before using LDA system\n",
"●Do not rely solely upon the LDA \n",
"system. The LDA system does \n",
"not automatically drive the vehi-cle or reduce the amount of \n",
"attention that must be paid to \n",
"the area in front of the vehicle. The driver must always assume \n",
"full responsibilit y for driving \n",
"safely by paying careful atten-\n",
"tion to the surrounding condi-tions and operating the steering \n",
"wheel to correct the path of the \n",
"vehicle. Also, the driver must take adequate breaks when \n",
"fatigued, such as from driving \n",
"for a long period of time.\n",
"●Failure to perform appropriate \n",
"driving operations and pay care-\n",
"ful attention may lead to an \n",
"accident, resulting in death or serious injury.\n",
"●When not using the LDA sys-\n",
"tem, use the LDA switch to turn \n",
"the system off.\n",
"■Situations unsuitable for LDA system\n",
"In the following situations, use the LDA switch to turn the system off. \n",
"Failure to do so may lead to an \n",
"accident, resulting in death or serious injury.\n",
"●Vehicle is driven on a road sur-\n",
"face which is slippery due to \n",
"rainy weather, fallen snow, freezing, etc.\n",
"●Vehicle is driven on a snow-cov-\n",
"ered road.\n",
"●White (yellow) lin es are difficult \n",
"to see due to rain, snow, fog, \n",
"dust, etc.\n",
"●A spare tire, tire chains, etc. are \n",
"equipped.●When the tires have been excessively worn, or when the \n",
"tire inflation p ressure is low.\n",
"●When tires of a size other than specified are installed.\n",
"●Vehicle is driven in traffic lanes \n",
"other than that highways and \n",
"freeways.\n",
"●During emergency towing.\n",
"■Preventing LDA system mal-functions and operations per-\n",
"formed by mistake\n",
"●Do not modify the headlights or place stickers, etc. on the sur-\n",
"face of the lights.\n",
"●Do not modify the suspension etc. If the suspension etc. needs \n",
"to be replaced, contact your \n",
"Toyota dealer.\n",
"●Do not install or place anything on the hoo d or grille. Also, do \n",
"not install a gr ille guard (bull \n",
"bars, kangaroo bar, etc.).\n",
"●If your windshield needs repairs, contact your Toyota \n",
"dealer.\n",
"■Conditions in which functions \n",
"may not operate properly\n",
"In the following situations, the \n",
"functions may not operate prop-erly and the vehicle may depart \n",
"from its lane. Drive safely by \n",
"always paying careful attention to your surroundings and operate \n",
"the steering whee l to correct the \n",
"path of the vehicle without relying \n",
"solely on the functions.\n",
"●Vehicle is being driven around a sharp curve.\n",
"https://www.MyCarManual.com\n"
]
}
],
"source": [
"search_result, all_text, pages = search(db, \"What is LDA\")\n",
"print(search_result)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[205, 208, 204, 212]"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pages"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "ai_index",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 2
}