ds_fire_fighter/__pycache__/text_extractor.cpython-311.pyc
74 lines
7.4 KiB
2024-08-08 14:58:44 +01:00
[CPython 3.11 bytecode; binary payload omitted, recoverable names and strings kept]

Source file: c:\Users\timmy_3aupohg\Downloads\Manaknight Projects\ds_aiindex\text_extractor.py
Module-level names: Image, TextExtractor

class TextExtractor

TextExtractor.__init__(self)
    Calls: set_tesseract_path
TextExtractor.set_tesseract_path(self)
    Docstring:
        Sets the path to the Tesseract executable based on the detected platform.
    String constants:
        "Linux", "/usr/bin/tesseract"
        "Windows", "C:\Program Files\Tesseract-OCR\tesseract.exe"
        "Darwin", "/usr/local/bin/tesseract"
        "Unsupported platform. Please set the Tesseract path manually."
    Names used: platform, system, pytesseract, tesseract_cmd, print
    Locals: self, current_platform
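The constants and names above suggest a per-platform lookup of the Tesseract executable path. The following is only a sketch of how such a lookup could be written, not the original source: the helper name `tesseract_path_for`, the `TESSERACT_PATHS` dictionary, and the return values are assumptions made for illustration. The three paths and the warning message are copied from the bytecode constants.

```python
import platform

# Paths copied from the string constants visible in the bytecode.
TESSERACT_PATHS = {
    "Linux": "/usr/bin/tesseract",
    "Windows": r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    "Darwin": "/usr/local/bin/tesseract",
}


def tesseract_path_for(system_name):
    """Hypothetical helper: return the path for a platform name, or None."""
    return TESSERACT_PATHS.get(system_name)


def set_tesseract_path():
    """Point pytesseract at the executable for the current platform."""
    path = tesseract_path_for(platform.system())
    if path is None:
        print("Unsupported platform. Please set the Tesseract path manually.")
        return None
    try:
        import pytesseract  # third-party; imported lazily so the lookup works without it
    except ImportError:
        return path
    # pytesseract reads the executable location from this module attribute.
    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

Keeping the lookup in a small pure function makes the platform mapping easy to check without Tesseract installed.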
TextExtractor.read_text_from_image(self, image_path)
    Docstring:
        Reads text from an image using pytesseract.
        Args:
            image_path (str): Path to the image file.
        Returns:
            str: Extracted text from the image.
    String constants: "Error reading text from image: ", ""
    Names used: Image, open, pytesseract, image_to_string, Exception, print
    Locals: self, image_path, img, text, e
TextExtractor.read_text_from_pdf(self, pdf_path)
    Docstring:
        Reads text from a PDF file using pytesseract.
        Args:
            pdf_path (str): Path to the PDF file.
        Returns:
            str: Extracted text from the PDF.
    String constants: "", "Error reading text from PDF: "
    Names used: pdfplumber, open, pages, extract_text, os, remove, Exception, print
    Locals: self, pdf_path, text, pdf, page, e
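The names recorded for read_text_from_pdf (pdfplumber, open, pages, extract_text, os, remove) point to page-by-page extraction with pdfplumber followed by deletion of the input file. Below is a hedged sketch of that pattern, not the original code. The `open_pdf` parameter is a hypothetical addition so the function can be exercised without pdfplumber installed; the `or ""` guard and the empty-string return on failure are also assumptions.

```python
import os


def read_text_from_pdf(pdf_path, open_pdf=None):
    """Concatenate the text of every page, then delete the PDF file."""
    if open_pdf is None:
        import pdfplumber  # third-party; imported lazily
        open_pdf = pdfplumber.open
    try:
        text = ""
        with open_pdf(pdf_path) as pdf:
            for page in pdf.pages:
                # Guard against pages with no extractable text (assumption).
                text += page.extract_text() or ""
        return text
    except Exception as e:
        print("Error reading text from PDF: ", e)
        return ""  # assumed failure value
    finally:
        # Runs on success and on failure, so the input file is always removed.
        os.remove(pdf_path)
```

Because the deletion sits in a finally clause, callers that still need the file after extraction would have to work on a copy.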
؈DÝ Ñ
0¨cØœIð0˜D×/D










0øøøð



õ