c4145977dd
- Added support for processing DOCX files and extracting content.
- Updated database schema to include a combined 'issues_and_recommendations' field.
- Improved error handling during document uploads and analysis.
- Modified the analysis display to show issues and recommendations in a structured format.
- Adjusted API call parameters for better performance and error management.
865 lines
53 KiB
Plaintext
2025-04-21 20:33:27,700 - root - INFO - Processing upload for document ID: 78afc395-9b7c-4388-8d8b-aa1d02fbf75f
2025-04-21 20:33:27,705 - root - INFO - File saved to data/uploads/78afc395-9b7c-4388-8d8b-aa1d02fbf75f_2.Tender Specifications.docx
2025-04-21 20:33:27,722 - root - INFO - Processing document 78afc395-9b7c-4388-8d8b-aa1d02fbf75f with content length: 17509
2025-04-21 20:33:28,023 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"
2025-04-21 20:33:28,939 - root - INFO - Stored embedding for document 78afc395-9b7c-4388-8d8b-aa1d02fbf75f
2025-04-21 20:34:54,473 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/rerank "HTTP/1.1 200 OK"
2025-04-21 20:36:26,368 - root - INFO - Document 78afc395-9b7c-4388-8d8b-aa1d02fbf75f processed successfully
2025-04-21 20:36:26,384 - root - INFO - Document 78afc395-9b7c-4388-8d8b-aa1d02fbf75f processed successfully
2025-04-21 21:10:13,818 - root - INFO - Processing upload for document ID: 1130b5e6-c350-40f9-94c2-9ad43d6fa4a4
2025-04-21 21:10:13,818 - root - INFO - File saved to data/uploads/1130b5e6-c350-40f9-94c2-9ad43d6fa4a4_1.Invitation to Tender.docx
2025-04-21 21:10:13,834 - root - INFO - Processing document 1130b5e6-c350-40f9-94c2-9ad43d6fa4a4 with content length: 13880
2025-04-21 21:10:14,176 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"
2025-04-21 21:10:14,994 - root - INFO - Stored embedding for document 1130b5e6-c350-40f9-94c2-9ad43d6fa4a4
2025-04-21 21:11:14,319 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/rerank "HTTP/1.1 200 OK"
2025-04-21 21:11:14,326 - root - ERROR - Error processing document 1130b5e6-c350-40f9-94c2-9ad43d6fa4a4: 'NoneType' object has no attribute 'lower'
2025-04-21 21:11:14,326 - root - ERROR - Error processing document: 'NoneType' object has no attribute 'lower'
2025-04-21 21:11:14,326 - root - ERROR - Traceback (most recent call last):
  File "c:\Users\babaw\Documents\Work\Mana Knight Digital\ds_task_scp\src\main.py", line 112, in upload_document
    await document_processor.process_document(doc_id, file_path, document_type)
  File "c:\Users\babaw\Documents\Work\Mana Knight Digital\ds_task_scp\src\services\document_processor.py", line 147, in process_document
    "issues": self._format_issues(reranked_issues),
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\babaw\Documents\Work\Mana Knight Digital\ds_task_scp\src\services\document_processor.py", line 227, in _format_issues
    if any(word in issue_text.lower() for word in ['critical', 'severe', 'major', 'high risk']):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\babaw\Documents\Work\Mana Knight Digital\ds_task_scp\src\services\document_processor.py", line 227, in <genexpr>
    if any(word in issue_text.lower() for word in ['critical', 'severe', 'major', 'high risk']):
                   ^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'lower'
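The AttributeError above is raised because `issue_text` is `None` when `.lower()` is called inside `_format_issues`. A minimal sketch of a None-safe keyword check follows. The function name `classify_severity` and the `"low"` fallback are assumptions for illustration; only the keyword list is taken from the traceback, and the project's real severity rules are not visible in this log.

```python
from typing import Optional

# Keywords copied from line 227 of document_processor.py as shown in the traceback.
HIGH_KEYWORDS = ("critical", "severe", "major", "high risk")


def classify_severity(issue_text: Optional[str], default: str = "low") -> str:
    """Return "high" if the text contains a high-severity keyword.

    Hypothetical helper: None or empty text falls back to `default`
    instead of raising AttributeError.
    """
    if not issue_text:
        return default
    lowered = issue_text.lower()
    if any(word in lowered for word in HIGH_KEYWORDS):
        return "high"
    return default
```

Guarding at this point only hides the symptom; the entries with `"issue": null` logged later suggest the missing text originates upstream, in how the rerank results are read.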
2025-04-21 21:22:01,931 - root - INFO - Processing upload for document ID: aecbb62c-b7ed-4c2e-beff-fe5e292de9f1
2025-04-21 21:22:01,940 - root - INFO - File saved to data/uploads/aecbb62c-b7ed-4c2e-beff-fe5e292de9f1_4.Scope of Work.docx
2025-04-21 21:22:01,994 - root - INFO - Processing document aecbb62c-b7ed-4c2e-beff-fe5e292de9f1 with content length: 15493
2025-04-21 21:22:02,392 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"
2025-04-21 21:22:03,409 - root - INFO - Stored embedding for document aecbb62c-b7ed-4c2e-beff-fe5e292de9f1
2025-04-21 21:22:56,055 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/rerank "HTTP/1.1 200 OK"
2025-04-21 21:24:51,709 - root - INFO - Document aecbb62c-b7ed-4c2e-beff-fe5e292de9f1 processed successfully
2025-04-21 21:24:51,717 - root - INFO - Document aecbb62c-b7ed-4c2e-beff-fe5e292de9f1 processed successfully
2025-04-21 21:34:15,975 - root - INFO - Processing upload for document ID: a444141c-e0c0-4ee9-a448-4555494aaede
2025-04-21 21:34:16,015 - root - INFO - File saved to data/uploads/a444141c-e0c0-4ee9-a448-4555494aaede_3.Bill of Quantities.docx
2025-04-21 21:34:16,058 - root - INFO - Processing document a444141c-e0c0-4ee9-a448-4555494aaede with content length: 13346
2025-04-21 21:34:16,557 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"
2025-04-21 21:34:18,925 - root - INFO - Stored embedding for document a444141c-e0c0-4ee9-a448-4555494aaede
2025-04-21 21:35:43,925 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/rerank "HTTP/1.1 200 OK"
2025-04-21 21:36:09,310 - root - INFO - Document a444141c-e0c0-4ee9-a448-4555494aaede processed successfully
2025-04-21 21:36:09,328 - root - INFO - Document a444141c-e0c0-4ee9-a448-4555494aaede processed successfully
2025-04-21 21:38:22,988 - root - ERROR - Error retrieving analysis: list object has no element 1
2025-04-21 21:38:23,032 - root - ERROR - Traceback (most recent call last):
  File "c:\Users\babaw\Documents\Work\Mana Knight Digital\ds_task_scp\src\main.py", line 139, in get_analysis
    return templates.TemplateResponse(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\babaw\Documents\Work\Mana Knight Digital\ds_task_scp\.venv\Lib\site-packages\starlette\templating.py", line 209, in TemplateResponse
    return _TemplateResponse(
           ^^^^^^^^^^^^^^^^^^
  File "C:\Users\babaw\Documents\Work\Mana Knight Digital\ds_task_scp\.venv\Lib\site-packages\starlette\templating.py", line 40, in __init__
    content = template.render(context)
              ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\babaw\Documents\Work\Mana Knight Digital\ds_task_scp\.venv\Lib\site-packages\jinja2\environment.py", line 1295, in render
    self.environment.handle_exception()
  File "C:\Users\babaw\Documents\Work\Mana Knight Digital\ds_task_scp\.venv\Lib\site-packages\jinja2\environment.py", line 942, in handle_exception
    raise rewrite_traceback_stack(source=source)
  File "src\templates\analysis.html", line 1, in top-level template code
    {% extends "base.html" %}
  File "src\templates\base.html", line 34, in top-level template code
    {% block content %}{% endblock %}
    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src\templates\analysis.html", line 50, in block 'content'
    {{ analysis.recommendations[loop.index0]|markdown|safe }}
    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\babaw\Documents\Work\Mana Knight Digital\ds_task_scp\src\main.py", line 44, in markdown_filter
    return markdown.markdown(text, extensions=['extra', 'nl2br'])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\babaw\Documents\Work\Mana Knight Digital\ds_task_scp\.venv\Lib\site-packages\markdown\core.py", line 482, in markdown
    return md.convert(text)
           ^^^^^^^^^^^^^^^^
  File "C:\Users\babaw\Documents\Work\Mana Knight Digital\ds_task_scp\.venv\Lib\site-packages\markdown\core.py", line 341, in convert
    if not source.strip():
           ^^^^^^^^^^^^
jinja2.exceptions.UndefinedError: list object has no element 1
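The UndefinedError above comes from `analysis.recommendations[loop.index0]` in `analysis.html`: the template indexes `recommendations` by the position of each issue, so it fails as soon as there are more issues than recommendations. One possible way to avoid this is to pair the two lists in Python before rendering and let the template loop over the pairs. The sketch below is an assumption, not the project's code; `pair_issues_with_recommendations` and the `placeholder` parameter are hypothetical names.

```python
from typing import Any, Dict, List


def pair_issues_with_recommendations(
    issues: List[Any],
    recommendations: List[str],
    placeholder: str = "",
) -> List[Dict[str, Any]]:
    """Pair each issue with the recommendation at the same position.

    Issues without a matching recommendation get `placeholder`, so a
    template iterating over the result never indexes past the end of
    the recommendations list.
    """
    return [
        {
            "issue": issue,
            "recommendation": (
                recommendations[i] if i < len(recommendations) else placeholder
            ),
        }
        for i, issue in enumerate(issues)
    ]
```

A template would then iterate over the returned list and read `item.issue` and `item.recommendation`, which is in line with the combined 'issues_and_recommendations' field mentioned in the commit message.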
2025-04-21 21:40:33,659 - root - INFO - Processing upload for document ID: befee86e-cc92-43cf-b935-3c8ec78ef275
2025-04-21 21:40:33,663 - root - INFO - File saved to data/uploads/befee86e-cc92-43cf-b935-3c8ec78ef275_4.Scope of Work.docx
2025-04-21 21:40:33,694 - root - INFO - Processing document befee86e-cc92-43cf-b935-3c8ec78ef275 with content length: 15493
2025-04-21 21:40:34,174 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"
2025-04-21 21:40:35,491 - root - INFO - Stored embedding for document befee86e-cc92-43cf-b935-3c8ec78ef275
2025-04-21 21:41:47,586 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/rerank "HTTP/1.1 200 OK"
2025-04-21 21:42:22,869 - root - INFO - Document befee86e-cc92-43cf-b935-3c8ec78ef275 processed successfully
2025-04-21 21:42:22,887 - root - INFO - Document befee86e-cc92-43cf-b935-3c8ec78ef275 processed successfully
2025-04-21 21:53:13,705 - root - INFO - Processing upload for document ID: 3f10b972-3eb6-43c9-bfeb-777d0a33c4a3
2025-04-21 21:53:13,713 - root - INFO - File saved to data/uploads/3f10b972-3eb6-43c9-bfeb-777d0a33c4a3_8.form of tender.docx
2025-04-21 21:53:13,755 - root - INFO - Processing document 3f10b972-3eb6-43c9-bfeb-777d0a33c4a3 with content length: 16555
2025-04-21 21:53:16,168 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"
2025-04-21 21:53:18,290 - root - INFO - Stored embedding for document 3f10b972-3eb6-43c9-bfeb-777d0a33c4a3
2025-04-21 21:54:26,818 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/rerank "HTTP/1.1 200 OK"
2025-04-21 21:55:06,268 - root - INFO - Document 3f10b972-3eb6-43c9-bfeb-777d0a33c4a3 processed successfully
2025-04-21 21:55:06,268 - root - INFO - Document 3f10b972-3eb6-43c9-bfeb-777d0a33c4a3 processed successfully
2025-04-21 22:11:40,110 - root - INFO - Processing upload for document ID: 2de0becf-fcfc-4793-aad0-7369180c6980
2025-04-21 22:11:40,113 - root - INFO - File saved to data/uploads/2de0becf-fcfc-4793-aad0-7369180c6980_3.Bill of Quantities.docx
2025-04-21 22:11:40,158 - root - INFO - Processing document 2de0becf-fcfc-4793-aad0-7369180c6980 with content length: 13346
2025-04-21 22:11:40,648 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"
2025-04-21 22:11:41,772 - root - INFO - Stored embedding for document 2de0becf-fcfc-4793-aad0-7369180c6980
2025-04-21 22:12:35,639 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/rerank "HTTP/1.1 200 OK"
2025-04-21 22:13:08,662 - root - INFO - Document 2de0becf-fcfc-4793-aad0-7369180c6980 processed successfully
2025-04-21 22:13:08,672 - root - INFO - Document 2de0becf-fcfc-4793-aad0-7369180c6980 processed successfully
2025-04-21 22:15:10,060 - root - INFO - Processing upload for document ID: d6c97c26-b59b-4854-bf6d-6f7f58f3dc11
2025-04-21 22:15:10,060 - root - INFO - File saved to data/uploads/d6c97c26-b59b-4854-bf6d-6f7f58f3dc11_4.Scope of Work.docx
2025-04-21 22:15:10,140 - root - INFO - Processing document d6c97c26-b59b-4854-bf6d-6f7f58f3dc11 with content length: 15493
2025-04-21 22:15:10,472 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"
2025-04-21 22:15:11,322 - root - INFO - Stored embedding for document d6c97c26-b59b-4854-bf6d-6f7f58f3dc11
2025-04-21 22:16:10,272 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/rerank "HTTP/1.1 200 OK"
2025-04-21 22:16:43,822 - root - INFO - Analysis for document d6c97c26-b59b-4854-bf6d-6f7f58f3dc11: {
  "document_id": "d6c97c26-b59b-4854-bf6d-6f7f58f3dc11",
  "summary": "Here are the key sections and requirements extracted from the document:\n\n### 1. **Content Types**\n - Defines the types of content within the document (e.g., XML files, relationships, themes).\n\n### 2. **Relationships**\n - Specifies the relationships between different parts of the document, such as how document parts are connected.\n\n### 3. **Document Content**\n - Contains the main content of the document, including text and formatting.\n\n### 4. **Document Relationships**\n - Defines the relationships specific to the main document part.\n\n### 5. **Theme**\n - Contains theme-related settings for the document, including color schemes and fonts.\n\n### 6. **Settings**\n - Includes document settings such as proofing options, compatibility settings, and other Word-specific configurations.\n\n### 7. **Numbering**\n - Defines numbering formats and styles used in the document.\n\n### 8. **Styles**\n - Specifies the styles applied throughout the document, including paragraph and character styles.\n\n### 9. **Web Settings**\n - Contains settings related to how the document is displayed in a web browser.\n\n### 10. **Font Table**\n - Lists the fonts used in the document and their properties.\n\n### 11. **Core Properties**\n - Includes metadata about the document, such as title, author, and creation date.\n\n### 12. **Application Properties**\n - Contains properties specific to the application (e.g., Microsoft Word) used to create the document.\n\n### Key Requirements:\n- The document must adhere to the defined content types and relationships.\n- Proper formatting and styling must be applied as specified in the styles and numbering sections.\n- Theme settings must be consistent throughout the document.\n- Metadata (core and application properties) must be accurately filled out.\n- Web settings should ensure proper display if the document is viewed online.\n\nThis summary provides an overview of the document's structure and key components, which are essential for maintaining consistency and functionality in the Word document.",
  "issues": [
    {
      "issue": null,
      "severity": "high",
      "rank": 1
    },
    {
      "issue": null,
      "severity": "high",
      "rank": 2
    },
    {
      "issue": null,
      "severity": "high",
      "rank": 3
    },
    {
      "issue": null,
      "severity": "medium",
      "rank": 4
    },
    {
      "issue": null,
      "severity": "medium",
      "rank": 5
    },
    {
      "issue": null,
      "severity": "medium",
      "rank": 6
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 7
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 8
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 9
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 10
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 11
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 12
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 13
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 14
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 15
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 16
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 17
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 18
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 19
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 20
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 21
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 22
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 23
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 24
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 25
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 26
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 27
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 28
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 29
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 30
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 31
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 32
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 33
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 34
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 35
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 36
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 37
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 38
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 39
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 40
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 41
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 42
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 43
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 44
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 45
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 46
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 47
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 48
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 49
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 50
    }
  ],
  "recommendations": [
    "### **Comprehensive Compliance Recommendation for Document Issues** \n\n#### **1. Missing or Unidentified Documents (Indexes 0, 49, 1, 30, 25)** \n**Issue:** Several documents are either missing (`document=None`) or improperly indexed, leading to potential compliance risks. \n\n**Action Steps:** \n- **Conduct a Document Audit:** \n - Identify all missing documents (`index=0, 49, 1, 30, 25`) and verify if they were misplaced, incorrectly labeled, or never properly stored. \n - Cross-reference with a master document registry to confirm expected files. \n- **Implement a Document Tracking System:** \n - Use a **Document Management System (DMS)** with version control, metadata tagging, and audit trails. \n - Assign unique identifiers (e.g., `DOC-001`, `POL-2024-01`) to prevent indexing errors. \n- **Enforce Mandatory Metadata:** \n - Require fields such as: \n - Document title \n - Version number \n - Creation/modification dates \n - Author/owner \n - Compliance status (e.g., \"Reviewed,\" \"Pending Approval\") \n\n#### **2. Inconsistent Relevance Scores (Ranging from 0.23 to 0.98)** \n**Issue:** The relevance scores vary significantly, indicating potential inconsistencies in document classification, retrieval, or applicability. \n\n**Action Steps:** \n- **Standardize Relevance Scoring:** \n - Define clear criteria (e.g., regulatory impact, business criticality, frequency of use) to score documents objectively. \n - Use AI/ML tools (if available) to auto-tag relevance based on content analysis. \n- **Review Low-Scoring Documents (Indexes 25, 30):** \n - Determine if these documents are obsolete, redundant, or incorrectly tagged. \n - Archive or deprecate irrelevant files to reduce clutter. \n- **Flag High-Scoring Documents (Indexes 0, 49, 1):** \n - Prioritize review and updates for documents with high relevance to ensure compliance. \n\n#### **3. General Compliance Framework Enhancement** \nTo prevent recurrence, implement the following best practices: \n- **Automated Compliance Checks:** \n - Use compliance software (e.g., **OneTrust, LogicGate, Smarsh**) to flag missing/expired documents. \n- **Regular Training & Accountability:** \n - Train staff on proper document handling and compliance protocols. \n - Assign a **Compliance Officer** to oversee document integrity. \n- **Periodic Reassessment:** \n - Conduct quarterly audits to verify document accuracy, relevance, and accessibility. \n\n### **Final Recommendation Summary** \n1. **Locate & properly index missing documents.** \n2. **Adopt a DMS with strict metadata requirements.** \n3. **Standardize relevance scoring and purge obsolete files.** \n4. **Automate compliance monitoring and enforce accountability.** \n\nBy implementing these steps, the organization can resolve current compliance gaps and establish a robust, future-proof document management process. \n\nWould you like a tailored checklist for immediate execution?"
  ]
}
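Every entry in the analysis above has `"issue": null`, and the recommendation text talks about `document=None`, indexes, and relevance scores. That pattern would be consistent with the rerank results being read through a `document` attribute that was not populated (Cohere's rerank endpoint only echoes documents back when `return_documents` is requested), although the log alone does not confirm this. A minimal sketch of one workaround is to map each result back to the candidate text through its `index`. The helper name `issues_in_ranked_order` is hypothetical, and the `SimpleNamespace` objects in the usage example only stand in for real rerank results.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Sequence


def issues_in_ranked_order(
    results: Sequence[Any], candidates: Sequence[str]
) -> List[Dict[str, Any]]:
    """Look up each reranked candidate by `result.index`.

    Assumes every result exposes `index` and `relevance_score`, and that
    `candidates` is the same list that was sent to the rerank call.
    """
    ranked = []
    for rank, result in enumerate(results, start=1):
        ranked.append(
            {
                "issue": candidates[result.index],
                "score": result.relevance_score,
                "rank": rank,
            }
        )
    return ranked


# Usage with stand-in results (document is None, as in the log above).
fake_results = [
    SimpleNamespace(index=2, relevance_score=0.98, document=None),
    SimpleNamespace(index=0, relevance_score=0.23, document=None),
]
ranked = issues_in_ranked_order(fake_results, ["issue A", "issue B", "issue C"])
```

With this lookup the issue text never depends on what the API chooses to echo back, so a missing `document` field cannot turn into a null issue.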
2025-04-21 22:16:43,889 - root - INFO - Document d6c97c26-b59b-4854-bf6d-6f7f58f3dc11 processed successfully
2025-04-21 22:16:43,903 - root - INFO - Document d6c97c26-b59b-4854-bf6d-6f7f58f3dc11 processed successfully
2025-04-21 22:16:54,195 - root - INFO - Retrieved analysis for document d6c97c26-b59b-4854-bf6d-6f7f58f3dc11: {
  "document_id": "d6c97c26-b59b-4854-bf6d-6f7f58f3dc11",
  "summary": "Here are the key sections and requirements extracted from the document:\n\n### 1. **Content Types**\n - Defines the types of content within the document (e.g., XML files, relationships, themes).\n\n### 2. **Relationships**\n - Specifies the relationships between different parts of the document, such as how document parts are connected.\n\n### 3. **Document Content**\n - Contains the main content of the document, including text and formatting.\n\n### 4. **Document Relationships**\n - Defines the relationships specific to the main document part.\n\n### 5. **Theme**\n - Contains theme-related settings for the document, including color schemes and fonts.\n\n### 6. **Settings**\n - Includes document settings such as proofing options, compatibility settings, and other Word-specific configurations.\n\n### 7. **Numbering**\n - Defines numbering formats and styles used in the document.\n\n### 8. **Styles**\n - Specifies the styles applied throughout the document, including paragraph and character styles.\n\n### 9. **Web Settings**\n - Contains settings related to how the document is displayed in a web browser.\n\n### 10. **Font Table**\n - Lists the fonts used in the document and their properties.\n\n### 11. **Core Properties**\n - Includes metadata about the document, such as title, author, and creation date.\n\n### 12. **Application Properties**\n - Contains properties specific to the application (e.g., Microsoft Word) used to create the document.\n\n### Key Requirements:\n- The document must adhere to the defined content types and relationships.\n- Proper formatting and styling must be applied as specified in the styles and numbering sections.\n- Theme settings must be consistent throughout the document.\n- Metadata (core and application properties) must be accurately filled out.\n- Web settings should ensure proper display if the document is viewed online.\n\nThis summary provides an overview of the document's structure and key components, which are essential for maintaining consistency and functionality in the Word document.",
  "issues": [
    {
      "issue": null,
      "severity": "high",
      "rank": 1
    },
    {
      "issue": null,
      "severity": "high",
      "rank": 2
    },
    {
      "issue": null,
      "severity": "high",
      "rank": 3
    },
    {
      "issue": null,
      "severity": "medium",
      "rank": 4
    },
    {
      "issue": null,
      "severity": "medium",
      "rank": 5
    },
    {
      "issue": null,
      "severity": "medium",
      "rank": 6
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 7
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 8
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 9
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 10
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 11
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 12
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 13
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 14
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 15
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 16
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 17
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 18
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 19
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 20
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 21
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 22
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 23
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 24
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 25
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 26
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 27
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 28
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 29
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 30
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 31
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 32
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 33
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 34
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 35
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 36
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 37
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 38
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 39
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 40
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 41
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 42
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 43
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 44
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 45
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 46
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 47
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 48
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 49
    },
    {
      "issue": null,
      "severity": "low",
      "rank": 50
    }
  ],
  "recommendations": [
    "### **Comprehensive Compliance Recommendation for Document Issues** \n\n#### **1. Missing or Unidentified Documents (Indexes 0, 49, 1, 30, 25)** \n**Issue:** Several documents are either missing (`document=None`) or improperly indexed, leading to potential compliance risks. \n\n**Action Steps:** \n- **Conduct a Document Audit:** \n - Identify all missing documents (`index=0, 49, 1, 30, 25`) and verify if they were misplaced, incorrectly labeled, or never properly stored. \n - Cross-reference with a master document registry to confirm expected files. \n- **Implement a Document Tracking System:** \n - Use a **Document Management System (DMS)** with version control, metadata tagging, and audit trails. \n - Assign unique identifiers (e.g., `DOC-001`, `POL-2024-01`) to prevent indexing errors. \n- **Enforce Mandatory Metadata:** \n - Require fields such as: \n - Document title \n - Version number \n - Creation/modification dates \n - Author/owner \n - Compliance status (e.g., \"Reviewed,\" \"Pending Approval\") \n\n#### **2. Inconsistent Relevance Scores (Ranging from 0.23 to 0.98)** \n**Issue:** The relevance scores vary significantly, indicating potential inconsistencies in document classification, retrieval, or applicability. \n\n**Action Steps:** \n- **Standardize Relevance Scoring:** \n - Define clear criteria (e.g., regulatory impact, business criticality, frequency of use) to score documents objectively. \n - Use AI/ML tools (if available) to auto-tag relevance based on content analysis. \n- **Review Low-Scoring Documents (Indexes 25, 30):** \n - Determine if these documents are obsolete, redundant, or incorrectly tagged. \n - Archive or deprecate irrelevant files to reduce clutter. \n- **Flag High-Scoring Documents (Indexes 0, 49, 1):** \n - Prioritize review and updates for documents with high relevance to ensure compliance. \n\n#### **3. General Compliance Framework Enhancement** \nTo prevent recurrence, implement the following best practices: \n- **Automated Compliance Checks:** \n - Use compliance software (e.g., **OneTrust, LogicGate, Smarsh**) to flag missing/expired documents. \n- **Regular Training & Accountability:** \n - Train staff on proper document handling and compliance protocols. \n - Assign a **Compliance Officer** to oversee document integrity. \n- **Periodic Reassessment:** \n - Conduct quarterly audits to verify document accuracy, relevance, and accessibility. \n\n### **Final Recommendation Summary** \n1. **Locate & properly index missing documents.** \n2. **Adopt a DMS with strict metadata requirements.** \n3. **Standardize relevance scoring and purge obsolete files.** \n4. **Automate compliance monitoring and enforce accountability.** \n\nBy implementing these steps, the organization can resolve current compliance gaps and establish a robust, future-proof document management process. \n\nWould you like a tailored checklist for immediate execution?"
  ]
}
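The summary in the analysis above lists content types, relationships, themes, settings, styles, and core/application properties, which are the parts of a .docx package rather than the wording of a "Scope of Work" document. This suggests, though the log does not prove it, that the package structure rather than the body text reached the summarizer. A minimal standard-library sketch of reading only the body text from `word/document.xml` is shown below. The function name `extract_docx_text` is hypothetical, and a library such as python-docx could be used instead.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by paragraph (w:p) and text (w:t) elements.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_docx_text(source) -> str:
    """Return the paragraph text of a .docx given a path or file-like object.

    Only word/document.xml is read; headers, footers, and footnotes are
    ignored in this sketch.
    """
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; join all w:t nodes.
        text = "".join(node.text or "" for node in para.iter(f"{{{W_NS}}}t"))
        if text.strip():
            paragraphs.append(text)
    return "\n".join(paragraphs)
```

Calling `extract_docx_text("data/uploads/<id>_4.Scope of Work.docx")` would return one line per non-empty paragraph, which can then be embedded and summarized in place of the raw package contents.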
2025-04-21 22:18:34,156 - root - ERROR - Error retrieving analysis: list object has no element 1
2025-04-21 22:18:34,163 - root - ERROR - Traceback (most recent call last):
  File "c:\Users\babaw\Documents\Work\Mana Knight Digital\ds_task_scp\src\main.py", line 138, in get_analysis
    return templates.TemplateResponse(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\babaw\Documents\Work\Mana Knight Digital\ds_task_scp\.venv\Lib\site-packages\starlette\templating.py", line 209, in TemplateResponse
    return _TemplateResponse(
           ^^^^^^^^^^^^^^^^^^
  File "C:\Users\babaw\Documents\Work\Mana Knight Digital\ds_task_scp\.venv\Lib\site-packages\starlette\templating.py", line 40, in __init__
    content = template.render(context)
              ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\babaw\Documents\Work\Mana Knight Digital\ds_task_scp\.venv\Lib\site-packages\jinja2\environment.py", line 1295, in render
    self.environment.handle_exception()
  File "C:\Users\babaw\Documents\Work\Mana Knight Digital\ds_task_scp\.venv\Lib\site-packages\jinja2\environment.py", line 942, in handle_exception
    raise rewrite_traceback_stack(source=source)
  File "src\templates\analysis.html", line 1, in top-level template code
    {% extends "base.html" %}
  File "src\templates\base.html", line 34, in top-level template code
    {% block content %}{% endblock %}
    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src\templates\analysis.html", line 50, in block 'content'
    {{ analysis.recommendations[loop.index0]|markdown|safe }}
    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\babaw\Documents\Work\Mana Knight Digital\ds_task_scp\src\main.py", line 44, in markdown_filter
    return markdown.markdown(text, extensions=['extra', 'nl2br'])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\babaw\Documents\Work\Mana Knight Digital\ds_task_scp\.venv\Lib\site-packages\markdown\core.py", line 482, in markdown
    return md.convert(text)
           ^^^^^^^^^^^^^^^^
  File "C:\Users\babaw\Documents\Work\Mana Knight Digital\ds_task_scp\.venv\Lib\site-packages\markdown\core.py", line 341, in convert
    if not source.strip():
           ^^^^^^^^^^^^
jinja2.exceptions.UndefinedError: list object has no element 1
|
|
|
|
2025-04-21 22:18:40,245 - root - INFO - Deleted uploaded file: data/uploads/d6c97c26-b59b-4854-bf6d-6f7f58f3dc11_4.Scope of Work.docx
|
|
2025-04-21 22:18:41,105 - root - INFO - Deleted document d6c97c26-b59b-4854-bf6d-6f7f58f3dc11 from index
|
|
2025-04-21 22:18:41,105 - root - INFO - Removed document d6c97c26-b59b-4854-bf6d-6f7f58f3dc11 from vector store
|
|
2025-04-21 22:18:49,878 - root - INFO - Deleted uploaded file: data/uploads/2de0becf-fcfc-4793-aad0-7369180c6980_3.Bill of Quantities.docx
|
|
2025-04-21 22:18:50,166 - root - INFO - Deleted document 2de0becf-fcfc-4793-aad0-7369180c6980 from index
|
|
2025-04-21 22:18:50,166 - root - INFO - Removed document 2de0becf-fcfc-4793-aad0-7369180c6980 from vector store
|
|
2025-04-21 22:18:54,080 - root - INFO - Deleted uploaded file: data/uploads/3f10b972-3eb6-43c9-bfeb-777d0a33c4a3_8.form of tender.docx
2025-04-21 22:18:54,321 - root - INFO - Deleted document 3f10b972-3eb6-43c9-bfeb-777d0a33c4a3 from index
2025-04-21 22:18:54,322 - root - INFO - Removed document 3f10b972-3eb6-43c9-bfeb-777d0a33c4a3 from vector store
2025-04-21 22:18:57,881 - root - INFO - Deleted uploaded file: data/uploads/befee86e-cc92-43cf-b935-3c8ec78ef275_4.Scope of Work.docx
2025-04-21 22:18:58,105 - root - INFO - Deleted document befee86e-cc92-43cf-b935-3c8ec78ef275 from index
2025-04-21 22:18:58,105 - root - INFO - Removed document befee86e-cc92-43cf-b935-3c8ec78ef275 from vector store
2025-04-21 22:19:02,930 - root - INFO - Deleted uploaded file: data/uploads/a444141c-e0c0-4ee9-a448-4555494aaede_3.Bill of Quantities.docx
2025-04-21 22:19:03,739 - root - INFO - Deleted document a444141c-e0c0-4ee9-a448-4555494aaede from index
2025-04-21 22:19:03,748 - root - INFO - Removed document a444141c-e0c0-4ee9-a448-4555494aaede from vector store
2025-04-21 22:19:18,005 - root - INFO - Processing upload for document ID: 77063b1d-633c-421e-9591-cde2eb90a979
2025-04-21 22:19:18,008 - root - INFO - File saved to data/uploads/77063b1d-633c-421e-9591-cde2eb90a979_7.Supplier SQualification requirements.docx
2025-04-21 22:19:18,041 - root - INFO - Processing document 77063b1d-633c-421e-9591-cde2eb90a979 with content length: 15335
2025-04-21 22:19:18,695 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"
2025-04-21 22:19:19,239 - root - INFO - Stored embedding for document 77063b1d-633c-421e-9591-cde2eb90a979
2025-04-21 22:20:24,699 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/rerank "HTTP/1.1 200 OK"
2025-04-21 22:22:05,759 - root - INFO - Document 77063b1d-633c-421e-9591-cde2eb90a979 processed successfully
2025-04-21 22:22:05,775 - root - INFO - Document 77063b1d-633c-421e-9591-cde2eb90a979 processed successfully
2025-04-21 22:31:21,934 - root - INFO - Processing upload for document ID: e79aeb90-799a-4d06-9efd-1d19315eebcc
2025-04-21 22:31:21,936 - root - INFO - File saved to data/uploads/e79aeb90-799a-4d06-9efd-1d19315eebcc_2.Tender Specifications.docx
2025-04-21 22:31:21,953 - root - INFO - Processing document e79aeb90-799a-4d06-9efd-1d19315eebcc with content length: 17509
2025-04-21 22:32:04,265 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"
2025-04-21 22:32:05,457 - root - INFO - Stored embedding for document e79aeb90-799a-4d06-9efd-1d19315eebcc
2025-04-21 22:33:04,556 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/rerank "HTTP/1.1 200 OK"
2025-04-21 22:34:43,603 - root - INFO - Document e79aeb90-799a-4d06-9efd-1d19315eebcc processed successfully
2025-04-21 22:34:43,611 - root - INFO - Document e79aeb90-799a-4d06-9efd-1d19315eebcc processed successfully
2025-04-22 09:39:48,962 - root - INFO - Processing upload for document ID: e838ee14-75a7-483a-9a6e-46b218127dc5
2025-04-22 09:39:48,968 - root - INFO - File saved to data/uploads/e838ee14-75a7-483a-9a6e-46b218127dc5_7.Supplier SQualification requirements.docx
2025-04-22 09:39:49,014 - root - INFO - Processing document e838ee14-75a7-483a-9a6e-46b218127dc5 with content length: 15335
2025-04-22 09:39:50,452 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"
2025-04-22 09:39:51,219 - root - INFO - Stored embedding for document e838ee14-75a7-483a-9a6e-46b218127dc5
2025-04-22 09:41:03,267 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/rerank "HTTP/1.1 200 OK"
2025-04-22 09:41:56,158 - root - INFO - Document e838ee14-75a7-483a-9a6e-46b218127dc5 processed successfully
2025-04-22 09:41:56,169 - root - INFO - Document e838ee14-75a7-483a-9a6e-46b218127dc5 processed successfully
2025-04-22 09:50:01,851 - root - INFO - Processing upload for document ID: eaed1290-2993-4133-9a04-6a4e5cb8b431
2025-04-22 09:50:01,854 - root - INFO - File saved to data/uploads/eaed1290-2993-4133-9a04-6a4e5cb8b431_7.Supplier SQualification requirements.docx
2025-04-22 09:50:02,083 - root - ERROR - Error reading Word document: No module named 'exceptions'
2025-04-22 09:50:02,083 - root - INFO - Processing document eaed1290-2993-4133-9a04-6a4e5cb8b431 with content length: 0
2025-04-22 09:50:02,520 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"
2025-04-22 09:50:03,207 - root - INFO - Stored embedding for document eaed1290-2993-4133-9a04-6a4e5cb8b431
2025-04-22 09:50:47,257 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/rerank "HTTP/1.1 200 OK"
2025-04-22 09:51:36,358 - root - INFO - Document eaed1290-2993-4133-9a04-6a4e5cb8b431 processed successfully
2025-04-22 09:51:36,377 - root - INFO - Document eaed1290-2993-4133-9a04-6a4e5cb8b431 processed successfully
2025-04-22 09:55:01,750 - root - INFO - Processing upload for document ID: aec927db-9f43-49b9-90af-2af8ad64e793
2025-04-22 09:55:01,752 - root - INFO - File saved to data/uploads/aec927db-9f43-49b9-90af-2af8ad64e793_7.Supplier SQualification requirements.docx
2025-04-22 09:55:01,791 - root - ERROR - Error reading Word document: No module named 'exceptions'
2025-04-22 09:55:01,791 - root - INFO - Processing document aec927db-9f43-49b9-90af-2af8ad64e793 with content length: 0
2025-04-22 09:55:02,033 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"
2025-04-22 09:55:02,565 - root - INFO - Stored embedding for document aec927db-9f43-49b9-90af-2af8ad64e793
2025-04-22 09:55:57,175 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/rerank "HTTP/1.1 200 OK"
2025-04-22 09:56:48,889 - root - INFO - Document aec927db-9f43-49b9-90af-2af8ad64e793 processed successfully
2025-04-22 09:56:48,898 - root - INFO - Document aec927db-9f43-49b9-90af-2af8ad64e793 processed successfully
2025-04-22 10:00:35,648 - root - INFO - Processing upload for document ID: 48480333-d451-4907-988b-f059166fd1a5
2025-04-22 10:00:35,652 - root - INFO - File saved to data/uploads/48480333-d451-4907-988b-f059166fd1a5_9.confidentiality agreement.docx
2025-04-22 10:00:36,025 - root - INFO - Processing document 48480333-d451-4907-988b-f059166fd1a5 with content length: 161
2025-04-22 10:00:36,689 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"
2025-04-22 10:00:37,466 - root - INFO - Stored embedding for document 48480333-d451-4907-988b-f059166fd1a5
2025-04-22 10:01:31,476 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/rerank "HTTP/1.1 200 OK"
2025-04-22 10:02:20,575 - root - INFO - Document 48480333-d451-4907-988b-f059166fd1a5 processed successfully
2025-04-22 10:02:20,592 - root - INFO - Document 48480333-d451-4907-988b-f059166fd1a5 processed successfully
2025-04-22 10:06:04,139 - root - INFO - Deleted document f1d07dde-5de4-4bf6-a14d-b69a433aa855 from index
2025-04-22 10:06:04,142 - root - INFO - Removed document f1d07dde-5de4-4bf6-a14d-b69a433aa855 from vector store
2025-04-22 10:06:04,289 - root - ERROR - Error retrieving metadata for document f1d07dde-5de4-4bf6-a14d-b69a433aa855: Metadata not found for document f1d07dde-5de4-4bf6-a14d-b69a433aa855
2025-04-22 10:06:04,289 - root - ERROR - Error deleting document: Metadata not found for document f1d07dde-5de4-4bf6-a14d-b69a433aa855
2025-04-22 10:06:04,289 - root - ERROR - Traceback (most recent call last):
  File "C:\Users\babaw\Documents\Work\Mana Knight Digital\ds_task_scp\src\main.py", line 213, in delete_document
    metadata = database.get_metadata(doc_id)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\babaw\Documents\Work\Mana Knight Digital\ds_task_scp\src\services\database.py", line 114, in get_metadata
    raise FileNotFoundError(f"Metadata not found for document {document_id}")
FileNotFoundError: Metadata not found for document f1d07dde-5de4-4bf6-a14d-b69a433aa855
2025-04-22 10:06:08,722 - root - INFO - Deleted document 82d9da57-6291-4ddb-b2e3-d43f467f4dd0 from index
2025-04-22 10:06:08,722 - root - INFO - Removed document 82d9da57-6291-4ddb-b2e3-d43f467f4dd0 from vector store
2025-04-22 10:06:13,324 - root - INFO - Deleted uploaded file: data/uploads/78afc395-9b7c-4388-8d8b-aa1d02fbf75f_2.Tender Specifications.docx
2025-04-22 10:06:13,533 - root - INFO - Deleted document 78afc395-9b7c-4388-8d8b-aa1d02fbf75f from index
2025-04-22 10:06:13,533 - root - INFO - Removed document 78afc395-9b7c-4388-8d8b-aa1d02fbf75f from vector store
2025-04-22 10:06:17,591 - root - INFO - Deleted uploaded file: data/uploads/aecbb62c-b7ed-4c2e-beff-fe5e292de9f1_4.Scope of Work.docx
2025-04-22 10:06:17,860 - root - INFO - Deleted document aecbb62c-b7ed-4c2e-beff-fe5e292de9f1 from index
2025-04-22 10:06:17,860 - root - INFO - Removed document aecbb62c-b7ed-4c2e-beff-fe5e292de9f1 from vector store
2025-04-22 10:06:22,432 - root - INFO - Deleted uploaded file: data/uploads/77063b1d-633c-421e-9591-cde2eb90a979_7.Supplier SQualification requirements.docx
2025-04-22 10:06:22,572 - root - INFO - Deleted document 77063b1d-633c-421e-9591-cde2eb90a979 from index
2025-04-22 10:06:22,572 - root - INFO - Removed document 77063b1d-633c-421e-9591-cde2eb90a979 from vector store
2025-04-22 10:06:26,959 - root - INFO - Deleted uploaded file: data/uploads/e79aeb90-799a-4d06-9efd-1d19315eebcc_2.Tender Specifications.docx
2025-04-22 10:06:27,114 - root - INFO - Deleted document e79aeb90-799a-4d06-9efd-1d19315eebcc from index
2025-04-22 10:06:27,119 - root - INFO - Removed document e79aeb90-799a-4d06-9efd-1d19315eebcc from vector store
2025-04-22 10:06:31,340 - root - INFO - Deleted uploaded file: data/uploads/e838ee14-75a7-483a-9a6e-46b218127dc5_7.Supplier SQualification requirements.docx
2025-04-22 10:06:31,518 - root - INFO - Deleted document e838ee14-75a7-483a-9a6e-46b218127dc5 from index
2025-04-22 10:06:31,520 - root - INFO - Removed document e838ee14-75a7-483a-9a6e-46b218127dc5 from vector store
2025-04-22 10:06:50,772 - root - INFO - Processing upload for document ID: 788f85e4-a873-407b-bfbc-0b7d6a676b9a
2025-04-22 10:06:50,779 - root - INFO - File saved to data/uploads/788f85e4-a873-407b-bfbc-0b7d6a676b9a_2.Tender Specifications.docx
2025-04-22 10:06:51,181 - root - INFO - Processing document 788f85e4-a873-407b-bfbc-0b7d6a676b9a with content length: 2363
2025-04-22 10:06:51,477 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"
2025-04-22 10:06:52,399 - root - INFO - Stored embedding for document 788f85e4-a873-407b-bfbc-0b7d6a676b9a
2025-04-22 10:08:26,609 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/rerank "HTTP/1.1 200 OK"
2025-04-22 10:09:18,265 - root - INFO - Document 788f85e4-a873-407b-bfbc-0b7d6a676b9a processed successfully
2025-04-22 10:09:18,287 - root - INFO - Document 788f85e4-a873-407b-bfbc-0b7d6a676b9a processed successfully
2025-04-22 10:31:42,225 - root - INFO - Processing upload for document ID: 804d09d1-22b8-4e49-9bab-6fd7d34181b7
2025-04-22 10:31:42,227 - root - INFO - File saved to data/uploads/804d09d1-22b8-4e49-9bab-6fd7d34181b7_7.Supplier SQualification requirements.docx
2025-04-22 10:31:42,381 - root - INFO - Processing document 804d09d1-22b8-4e49-9bab-6fd7d34181b7 with content length: 229
2025-04-22 10:31:42,613 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"
2025-04-22 10:31:44,246 - root - INFO - Stored embedding for document 804d09d1-22b8-4e49-9bab-6fd7d34181b7
2025-04-22 10:32:27,429 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/rerank "HTTP/1.1 200 OK"
2025-04-22 10:33:19,434 - root - INFO - Document 804d09d1-22b8-4e49-9bab-6fd7d34181b7 processed successfully
2025-04-22 10:33:19,446 - root - INFO - Document 804d09d1-22b8-4e49-9bab-6fd7d34181b7 processed successfully
2025-04-22 10:51:21,371 - root - INFO - Processing upload for document ID: 8d580b49-94bb-473e-90dd-66ee00f77048
2025-04-22 10:51:21,372 - root - INFO - File saved to data/uploads/8d580b49-94bb-473e-90dd-66ee00f77048_8.form of tender.docx
2025-04-22 10:51:21,496 - root - INFO - Processing document 8d580b49-94bb-473e-90dd-66ee00f77048 with content length: 523
2025-04-22 10:51:21,994 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"
2025-04-22 10:51:22,895 - root - INFO - Stored embedding for document 8d580b49-94bb-473e-90dd-66ee00f77048
2025-04-22 10:52:05,978 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/rerank "HTTP/1.1 400 Bad Request"
2025-04-22 10:52:05,978 - root - ERROR - Error using Cohere reranker: status_code: 400, body: {'message': 'invalid request: number of total max chunks (number of documents * max chunks per doc) must be less than 10000'}
2025-04-22 10:56:57,188 - root - INFO - Processing upload for document ID: 5ab90386-7d4e-45b2-a1a6-40ad23f59428
2025-04-22 10:56:57,189 - root - INFO - File saved to data/uploads/5ab90386-7d4e-45b2-a1a6-40ad23f59428_8.form of tender.docx
2025-04-22 10:56:57,312 - root - INFO - Processing document 5ab90386-7d4e-45b2-a1a6-40ad23f59428 with content length: 523
2025-04-22 10:56:57,661 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"
2025-04-22 10:56:58,794 - root - INFO - Stored embedding for document 5ab90386-7d4e-45b2-a1a6-40ad23f59428
2025-04-22 10:57:57,544 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/rerank "HTTP/1.1 400 Bad Request"
2025-04-22 10:57:57,544 - root - ERROR - Error using Cohere reranker: status_code: 400, body: {'message': 'invalid request: list of documents must not be empty'}
2025-04-22 10:57:57,563 - root - INFO - Document 5ab90386-7d4e-45b2-a1a6-40ad23f59428 processed successfully
2025-04-22 10:57:57,563 - root - INFO - Document 5ab90386-7d4e-45b2-a1a6-40ad23f59428 processed successfully
2025-04-22 11:04:23,803 - root - INFO - Processing upload for document ID: 3065e0dd-0b2e-454c-8f7e-dd0e464dbf7a
2025-04-22 11:04:23,806 - root - INFO - File saved to data/uploads/3065e0dd-0b2e-454c-8f7e-dd0e464dbf7a_3.Bill of Quantities.docx
2025-04-22 11:04:23,950 - root - INFO - Processing document 3065e0dd-0b2e-454c-8f7e-dd0e464dbf7a with content length: 2057
2025-04-22 11:04:24,294 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"
2025-04-22 11:04:24,927 - root - INFO - Stored embedding for document 3065e0dd-0b2e-454c-8f7e-dd0e464dbf7a
2025-04-22 11:05:28,953 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/rerank "HTTP/1.1 400 Bad Request"
2025-04-22 11:05:28,953 - root - ERROR - Error using Cohere reranker: status_code: 400, body: {'message': 'invalid request: list of documents must not be empty'}
2025-04-22 11:05:28,969 - root - INFO - Document 3065e0dd-0b2e-454c-8f7e-dd0e464dbf7a processed successfully
2025-04-22 11:05:28,980 - root - INFO - Document 3065e0dd-0b2e-454c-8f7e-dd0e464dbf7a processed successfully
2025-04-22 11:13:54,204 - root - INFO - Processing upload for document ID: 201e4896-3d89-466b-852d-783ef3e30f83
2025-04-22 11:13:54,206 - root - INFO - File saved to data/uploads/201e4896-3d89-466b-852d-783ef3e30f83_7.Supplier SQualification requirements.docx
2025-04-22 11:13:54,311 - root - INFO - Processing document 201e4896-3d89-466b-852d-783ef3e30f83 with content length: 229
2025-04-22 11:13:54,726 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"
2025-04-22 11:13:55,243 - root - INFO - Stored embedding for document 201e4896-3d89-466b-852d-783ef3e30f83
2025-04-22 11:14:42,644 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/rerank "HTTP/1.1 400 Bad Request"
2025-04-22 11:14:42,644 - root - ERROR - Error using Cohere reranker: status_code: 400, body: {'message': 'invalid request: list of documents must not be empty'}
2025-04-22 11:14:42,644 - root - INFO - Document 201e4896-3d89-466b-852d-783ef3e30f83 processed successfully
2025-04-22 11:14:42,659 - root - INFO - Document 201e4896-3d89-466b-852d-783ef3e30f83 processed successfully
2025-04-22 11:18:42,103 - root - INFO - Processing upload for document ID: bc1e71ac-6b65-4b2c-b1e8-81844b49ba5d
2025-04-22 11:18:42,104 - root - INFO - File saved to data/uploads/bc1e71ac-6b65-4b2c-b1e8-81844b49ba5d_7.Supplier SQualification requirements.docx
2025-04-22 11:18:42,225 - root - INFO - Processing document bc1e71ac-6b65-4b2c-b1e8-81844b49ba5d with content length: 229
2025-04-22 11:18:42,443 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"
2025-04-22 11:18:45,759 - root - INFO - Stored embedding for document bc1e71ac-6b65-4b2c-b1e8-81844b49ba5d
2025-04-22 11:19:35,113 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/rerank "HTTP/1.1 200 OK"
2025-04-22 11:19:35,138 - root - INFO - Document bc1e71ac-6b65-4b2c-b1e8-81844b49ba5d processed successfully
2025-04-22 11:19:35,145 - root - INFO - Document bc1e71ac-6b65-4b2c-b1e8-81844b49ba5d processed successfully
2025-04-22 11:27:23,936 - root - INFO - Processing upload for document ID: a76f7f9c-59ec-4f7b-8b4e-168f2db5f92e
2025-04-22 11:27:23,938 - root - INFO - File saved to data/uploads/a76f7f9c-59ec-4f7b-8b4e-168f2db5f92e_7.Supplier SQualification requirements.docx
2025-04-22 11:27:24,051 - root - INFO - Processing document a76f7f9c-59ec-4f7b-8b4e-168f2db5f92e with content length: 229
2025-04-22 11:27:24,259 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"
2025-04-22 11:27:24,775 - root - INFO - Stored embedding for document a76f7f9c-59ec-4f7b-8b4e-168f2db5f92e
2025-04-22 11:28:04,075 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/rerank "HTTP/1.1 200 OK"
2025-04-22 11:28:04,108 - root - INFO - Document a76f7f9c-59ec-4f7b-8b4e-168f2db5f92e processed successfully
2025-04-22 11:28:04,108 - root - INFO - Document a76f7f9c-59ec-4f7b-8b4e-168f2db5f92e processed successfully
2025-04-22 11:33:42,952 - root - INFO - Processing upload for document ID: 6fd270a4-e76a-4234-a4c4-a40d2ac64d56
2025-04-22 11:33:42,954 - root - INFO - File saved to data/uploads/6fd270a4-e76a-4234-a4c4-a40d2ac64d56_4.Scope of Work.docx
2025-04-22 11:33:43,183 - root - INFO - Processing document 6fd270a4-e76a-4234-a4c4-a40d2ac64d56 with content length: 282
2025-04-22 11:33:43,425 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"
2025-04-22 11:33:43,875 - root - INFO - Stored embedding for document 6fd270a4-e76a-4234-a4c4-a40d2ac64d56
2025-04-22 11:34:30,942 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/rerank "HTTP/1.1 200 OK"
2025-04-22 11:34:30,960 - root - INFO - Document 6fd270a4-e76a-4234-a4c4-a40d2ac64d56 processed successfully
2025-04-22 11:34:30,960 - root - INFO - Document 6fd270a4-e76a-4234-a4c4-a40d2ac64d56 processed successfully
2025-04-22 11:43:04,651 - root - INFO - Processing upload for document ID: e92e078c-6d36-46b1-89ba-03f7387947e9
2025-04-22 11:43:04,652 - root - INFO - File saved to data/uploads/e92e078c-6d36-46b1-89ba-03f7387947e9_9.confidentiality agreement.docx
2025-04-22 11:43:04,758 - root - INFO - Processing document e92e078c-6d36-46b1-89ba-03f7387947e9 with content length: 161
2025-04-22 11:43:05,057 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"
2025-04-22 11:43:05,924 - root - INFO - Stored embedding for document e92e078c-6d36-46b1-89ba-03f7387947e9
2025-04-22 11:43:46,791 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/rerank "HTTP/1.1 200 OK"
2025-04-22 11:43:46,807 - root - INFO - Document e92e078c-6d36-46b1-89ba-03f7387947e9 processed successfully
2025-04-22 11:43:46,807 - root - INFO - Document e92e078c-6d36-46b1-89ba-03f7387947e9 processed successfully
2025-04-22 11:52:38,067 - root - INFO - Processing upload for document ID: 2813c3dc-8496-4aff-b945-0b04e5c439c0
2025-04-22 11:52:38,069 - root - INFO - File saved to data/uploads/2813c3dc-8496-4aff-b945-0b04e5c439c0_7.Supplier SQualification requirements.docx
2025-04-22 11:52:38,184 - root - INFO - Processing document 2813c3dc-8496-4aff-b945-0b04e5c439c0 with content length: 229
2025-04-22 11:52:38,574 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"
2025-04-22 11:52:39,190 - root - INFO - Stored embedding for document 2813c3dc-8496-4aff-b945-0b04e5c439c0
2025-04-22 11:53:23,757 - root - ERROR - Error saving analysis for document 2813c3dc-8496-4aff-b945-0b04e5c439c0: table analysis has no column named issues_and_recommendations
2025-04-22 11:53:23,757 - root - ERROR - Error processing document 2813c3dc-8496-4aff-b945-0b04e5c439c0: table analysis has no column named issues_and_recommendations
2025-04-22 11:53:23,757 - root - ERROR - Error processing document: table analysis has no column named issues_and_recommendations
2025-04-22 11:53:23,783 - root - ERROR - Traceback (most recent call last):
  File "C:\Users\babaw\Documents\Work\Mana Knight Digital\ds_task_scp\src\main.py", line 112, in upload_document
    await document_processor.process_document(doc_id, file_path, document_type)
  File "C:\Users\babaw\Documents\Work\Mana Knight Digital\ds_task_scp\src\services\document_processor.py", line 141, in process_document
    self.database.save_analysis(doc_id, analysis)
  File "C:\Users\babaw\Documents\Work\Mana Knight Digital\ds_task_scp\src\services\database.py", line 50, in save_analysis
    cursor.execute('''
sqlite3.OperationalError: table analysis has no column named issues_and_recommendations
2025-04-22 11:55:32,835 - root - INFO - Processing upload for document ID: 9dc21524-8c93-427b-a6cc-04b7585a9545
2025-04-22 11:55:32,836 - root - INFO - File saved to data/uploads/9dc21524-8c93-427b-a6cc-04b7585a9545_8.form of tender.docx
2025-04-22 11:55:32,958 - root - INFO - Processing document 9dc21524-8c93-427b-a6cc-04b7585a9545 with content length: 523
2025-04-22 11:55:33,174 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"
2025-04-22 11:55:33,740 - root - INFO - Stored embedding for document 9dc21524-8c93-427b-a6cc-04b7585a9545
2025-04-22 11:56:27,580 - root - INFO - Document 9dc21524-8c93-427b-a6cc-04b7585a9545 processed successfully
2025-04-22 11:56:27,588 - root - INFO - Document 9dc21524-8c93-427b-a6cc-04b7585a9545 processed successfully