Initial commit
@@ -0,0 +1,259 @@
# Mini SpecsComply Pro (SCP)

## Overview

Mini SpecsComply Pro (SCP) is a lightweight document compliance and validation tool that checks technical documents against predefined standards and project-specific requirements. It uses a GROQ LLM for reasoning and Cohere models for embedding and result ranking.

## Features

- **Document Analysis:** Automated analysis of technical documents for compliance verification
- **AI-Powered Processing:**
  - GROQ LLM for deep reasoning and compliance analysis
  - Cohere for document embedding and result ranking
- **Advanced Standards Matching:**
  - Sophisticated matching algorithm to identify relevant standards
  - Section-based analysis for contextual understanding
  - Technical term recognition and keyword extraction
  - Relevance scoring system for accurate standard selection
- **Custom Standards Support:**
  - Upload and manage your own compliance standards
  - JSON-based standard definitions with flexible structure
- **Vector Database Support:**
  - Pinecone (default)
  - Weaviate (alternative)
- **RESTful API:** Built with FastAPI for easy integration
- **Real-time Processing:** Async support for efficient document handling
- **Structured Reports:** Detailed compliance feedback and recommendations with applied standards tracking

## Prerequisites

- Python 3.8 or higher
- pip or poetry for package management
- API keys for:
  - GROQ
  - Cohere
  - Pinecone (if using Pinecone) or Weaviate URL (if using Weaviate)

## Installation

1. Clone the repository:

   ```bash
   git clone http://23.29.118.76:3000/task/mini-specscomply-pro.git
   cd mini-specscomply-pro
   ```

2. Create and activate a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Create a `.env` file in the project root:

   ```env
   # Required API Keys
   GROQ_API_KEY=your_groq_api_key
   COHERE_API_KEY=your_cohere_api_key

   # Vector Database (Choose one)
   # For Pinecone:
   VECTOR_DB=pinecone
   PINECONE_API_KEY=your_pinecone_api_key
   PINECONE_ENVIRONMENT=your_pinecone_environment  # e.g. us-east-1
   PINECONE_INDEX_NAME=specscomply_documents

   # Or for Weaviate:
   # VECTOR_DB=weaviate
   # WEAVIATE_URL=your_weaviate_url
   # WEAVIATE_API_KEY=your_weaviate_api_key

   # Optional Settings
   APP_NAME="Mini SpecsComply Pro"
   APP_VERSION="0.1.0"
   DEBUG=False
   ```

## Running the Application

### Quick Start

```bash
python launch.py
```

The script checks your environment setup and starts the application. Once it is running, open `http://localhost:8000` in your browser.

The API documentation is available at `http://localhost:8000/docs`.
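
The environment check itself is not shown in this README. As a rough sketch of what such a check might cover, the snippet below reports which variables from the `.env` template are missing. The function name `missing_settings` is hypothetical, treating `WEAVIATE_API_KEY` as required is an assumption, and the variables are assumed to be already exported or loaded into the process environment.

```python
import os

# Keys every setup needs, taken from the .env template in this README.
REQUIRED_KEYS = ["GROQ_API_KEY", "COHERE_API_KEY"]

# Extra keys per vector database backend (also from the .env template).
# Assumption: every listed key is mandatory for its backend.
BACKEND_KEYS = {
    "pinecone": ["PINECONE_API_KEY", "PINECONE_ENVIRONMENT", "PINECONE_INDEX_NAME"],
    "weaviate": ["WEAVIATE_URL", "WEAVIATE_API_KEY"],
}


def missing_settings(env):
    """Return the names of required settings that are absent or empty in env."""
    backend = env.get("VECTOR_DB", "pinecone").strip().lower()
    if backend not in BACKEND_KEYS:
        return [f"VECTOR_DB (unsupported value: {backend!r})"]
    needed = REQUIRED_KEYS + BACKEND_KEYS[backend]
    return [key for key in needed if not env.get(key, "").strip()]


if __name__ == "__main__":
    problems = missing_settings(os.environ)
    if problems:
        print("Missing or invalid settings: " + ", ".join(problems))
    else:
        print("Environment looks complete.")
```

Passing the mapping in as an argument, rather than reading `os.environ` inside the function, keeps the check easy to exercise with a plain dictionary.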

## API Endpoints

- `POST /api/documents/upload` - Upload a document for analysis
- `GET /api/documents/{document_id}` - Get document status and results
- `POST /api/documents/{document_id}/resubmit` - Resubmit a document for re-analysis
- `GET /api/documents/{document_id}/analysis` - Get detailed compliance analysis
- `GET /api/standards` - List all available standards
- `POST /api/standards/upload` - Upload a custom standard definition
- `GET /api/standards/{standard_id}` - Get details of a specific standard
- `GET /api/health` - Health check endpoint
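
As a minimal sketch of calling the read-only endpoints from Python, the helpers below build URLs from the path templates listed above and fetch JSON with the standard library. The helper names `endpoint` and `get_json` are hypothetical, and the assumption that every `GET` endpoint returns JSON is not confirmed by this README. Upload requests are left out because the expected form field names are not documented here.

```python
import json
import urllib.request

# Default address from the Quick Start section.
BASE_URL = "http://localhost:8000"


def endpoint(path, **params):
    """Fill a path template from the endpoint list, e.g. {document_id}."""
    return BASE_URL + path.format(**params)


def get_json(url, timeout=10):
    """GET a URL and decode the body as JSON (assumes a JSON response)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With the server running, `get_json(endpoint("/api/health"))` would query the health check, and `get_json(endpoint("/api/documents/{document_id}/analysis", document_id="..."))` would fetch the analysis for an identifier returned by an earlier upload.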

## Configuration

Configure the application through environment variables or the `.env` file. Key options:

- `DEBUG`: Enable debug mode (default: `False`)
- `VECTOR_DB`: Vector database backend (`pinecone` or `weaviate`; default: `pinecone`)
- `EMBEDDING_MODEL`: Cohere embedding model (default: `embed-english-v3.0`)
- `RERANKER_MODEL`: Cohere reranker model (default: `rerank-english-v2.0`)
- `REASONING_MODEL`: GROQ model (default: `llama-3.3-70b-versatile`)
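
To illustrate how these options and their defaults fit together, here is a small sketch that reads them from a mapping such as `os.environ`. The `Settings` class and `load_settings` function are hypothetical names for this example and are not taken from the project's `app/core` code. Only the variable names and default values come from the list above.

```python
import os
from dataclasses import dataclass


def _as_bool(value):
    """Interpret common truthy strings such as "true", "1", or "yes"."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    debug: bool
    vector_db: str
    embedding_model: str
    reranker_model: str
    reasoning_model: str


def load_settings(env=None):
    """Build Settings from a mapping, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        debug=_as_bool(env.get("DEBUG", "False")),
        vector_db=env.get("VECTOR_DB", "pinecone").strip().lower(),
        embedding_model=env.get("EMBEDDING_MODEL", "embed-english-v3.0"),
        reranker_model=env.get("RERANKER_MODEL", "rerank-english-v2.0"),
        reasoning_model=env.get("REASONING_MODEL", "llama-3.3-70b-versatile"),
    )
```

For example, `load_settings({"DEBUG": "true"})` yields a `Settings` object with `debug=True` and every other field at its default.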

## Development

### Project Structure

```
mini-specscomply-pro/
├── app/
│   ├── api/            # API routes and endpoints
│   ├── core/           # Core configuration and models
│   └── services/       # Business logic services
├── Data/               # Sample data and documents
├── requirements.txt    # Project dependencies
├── run.py              # Application runner
├── launch.py           # Setup and launch script
├── .env                # Environment variables
├── .gitignore          # Git ignore file
└── README.md           # Project documentation
```

## Advanced Standards Matching

Mini SpecsComply Pro matches documents with relevant standards in three steps:

1. **Document Analysis**
   - Extracts sections and headings from the document
   - Identifies key technical terms and phrases
   - Recognizes standard references (e.g., "ISO-9001", "IEEE 829")

2. **Relevance Scoring**
   - Calculates weighted scores based on multiple factors:
     - Direct standard name matches (highest weight)
     - Keyword matches between document and standard
     - Section-specific matches (e.g., in References or Requirements sections)
     - Technical term matches
     - Requirement-specific matches

3. **Standard Selection**
   - Selects the most relevant standards based on a score threshold
   - Applies these standards during compliance analysis
   - Displays applied standards in the compliance report

Scoring standards this way is intended to apply the most appropriate standards to each document, which improves the relevance of the compliance analysis.
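
The weighted scoring described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual algorithm: the weights, the threshold, the keyword rule (description words longer than four letters), and all function names are hypothetical. The README only confirms that direct name matches carry the highest weight and that selection uses a score threshold.

```python
import re

# Hypothetical weights and cut-off; the README only states that direct
# name matches carry the highest weight.
WEIGHTS = {"name": 5.0, "section": 2.0, "keyword": 1.0}
THRESHOLD = 5.0


def tokens(text):
    """Return the set of lowercase word-like tokens in text."""
    return set(re.findall(r"[a-z0-9-]+", text.lower()))


def split_sections(text):
    """Map each markdown heading (lowercased) to the text beneath it."""
    sections, current = {"preamble": []}, "preamble"
    for line in text.splitlines():
        heading = re.match(r"#+\s+(.*)", line)
        if heading:
            current = heading.group(1).strip().lower()
            sections.setdefault(current, [])
        else:
            sections[current].append(line)
    return {title: "\n".join(lines) for title, lines in sections.items()}


def score_standard(text, standard):
    """Score one standard against a document using the weighted factors."""
    name = standard["name"].lower()
    score = 0.0
    # Direct name match anywhere in the document.
    if name in text.lower():
        score += WEIGHTS["name"]
    # Bonus when the name appears in a References or Requirements section.
    for title, body in split_sections(text).items():
        if ("reference" in title or "requirement" in title) and name in body.lower():
            score += WEIGHTS["section"]
            break
    # Keyword overlap; keywords are derived from the standard's description.
    keywords = {w for w in tokens(standard.get("description", "")) if len(w) > 4}
    score += WEIGHTS["keyword"] * len(keywords & tokens(text))
    return score


def select_standards(text, standards):
    """Return names of standards scoring at or above THRESHOLD, best first."""
    scored = [(score_standard(text, s), s["name"]) for s in standards]
    return [name for score, name in sorted(scored, reverse=True) if score >= THRESHOLD]
```

The sketch covers only three of the listed factors (name, section, and keyword matches). Technical term and requirement-specific matches would add further weighted terms in the same way.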

## Document and Standard Formats

### Compliance Documents

For best results, structure your compliance documents with clear sections and headings. The system performs better with well-organized documents that include:

1. **Clear Headings**: Use Markdown-style headings (e.g., `# Section Title`) to organize content
2. **Introduction Section**: Provide the context and purpose of the document
3. **Scope Section**: Define what the document covers and what it does not
4. **Requirements Sections**: Clearly state requirements using terms like "shall", "must", "should"
5. **References Section**: List relevant standards, specifications, or other documents
6. **Technical Details**: Include specific technical information relevant to compliance

Example document structure:

```markdown
# System Compliance Specification

## Introduction
This document specifies the compliance requirements for the XYZ system.

## Scope
This specification applies to all components of the XYZ system.

## Requirements
### Functional Requirements
1. The system shall process user input within 500ms.
2. The system must maintain data integrity during power failures.

### Security Requirements
1. All data transmissions shall be encrypted using AES-256.
2. User authentication must comply with NIST guidelines.

## References
- ISO-9001:2015 Quality Management Systems
- IEEE-829 Software Test Documentation
```
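
Before uploading, a document can be checked against these recommendations with a few lines of Python. The sketch below is a hypothetical pre-flight helper, not part of the application: it lists recommended sections that are missing and counts lines containing "shall", "must", or "should". The names `headings` and `review_structure` are invented for this example.

```python
import re

# Sections recommended in the list above.
RECOMMENDED = ("introduction", "scope", "requirements", "references")
REQUIREMENT_TERMS = re.compile(r"\b(shall|must|should)\b", re.IGNORECASE)


def headings(text):
    """Return the lowercased titles of markdown-style headings."""
    pattern = re.compile(r"^#+[ \t]+(.+)$", re.MULTILINE)
    return [match.group(1).strip().lower() for match in pattern.finditer(text)]


def review_structure(text):
    """Report missing recommended sections and count requirement statements."""
    found = headings(text)
    missing = [name for name in RECOMMENDED if not any(name in title for title in found)]
    statements = [line for line in text.splitlines() if REQUIREMENT_TERMS.search(line)]
    return {"missing_sections": missing, "requirement_statements": len(statements)}
```

A section counts as present when its name appears anywhere in a heading, so "Functional Requirements" satisfies the "requirements" check.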

### Custom Standard Definitions

Custom standards are defined in JSON format with the following structure:

```json
{
  "name": "ISO-9001",
  "description": "Quality Management System standard",
  "requirements": [
    {
      "id": "ISO-9001-4.1",
      "description": "The organization shall determine external and internal issues relevant to its purpose and strategic direction.",
      "severity": "major"
    },
    {
      "id": "ISO-9001-4.2",
      "description": "The organization shall monitor and review information about these external and internal issues.",
      "severity": "minor"
    }
  ]
}
```

You can also define multiple standards in a single file:

```json
{
  "standards": [
    {
      "name": "ISO-9001",
      "description": "Quality Management System standard",
      "requirements": [...]
    },
    {
      "name": "IEEE-829",
      "description": "Software Test Documentation standard",
      "requirements": [...]
    }
  ]
}
```

Requirement severity levels:

- `critical`: Severe non-compliance that must be addressed immediately
- `major`: Significant issue that should be addressed soon
- `minor`: Less significant issue that can be addressed when convenient
- `info`: Informational note or suggestion
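
A standard definition can be sanity-checked locally before it is uploaded. The sketch below accepts both file layouts shown above and flags missing fields or unknown severity levels. Treating every field in the example as required is an assumption, since the README describes the structure as flexible, and the function names are hypothetical.

```python
import json

# Severity levels listed above.
SEVERITIES = {"critical", "major", "minor", "info"}


def load_standards(raw):
    """Parse JSON text and return a list of standards for either file layout."""
    data = json.loads(raw)
    # A top-level "standards" list means a multi-standard file.
    return data["standards"] if "standards" in data else [data]


def validate_standard(standard):
    """Return a list of human-readable problems; empty means it looks valid."""
    problems = []
    # Assumption: all three top-level fields from the example are required.
    for field in ("name", "description", "requirements"):
        if field not in standard:
            problems.append(f"missing field: {field}")
    for index, requirement in enumerate(standard.get("requirements", [])):
        for field in ("id", "description", "severity"):
            if field not in requirement:
                problems.append(f"requirement {index}: missing field: {field}")
        severity = requirement.get("severity")
        if severity is not None and severity not in SEVERITIES:
            problems.append(f"requirement {index}: unknown severity {severity!r}")
    return problems
```

Running `validate_standard` over each entry returned by `load_standards` gives one list of problems per standard, which can be printed before calling `POST /api/standards/upload`.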

## Troubleshooting

Common issues and solutions:

1. **Missing API Keys**
   - Ensure all required API keys are set in your `.env` file
   - Check the API key format and validity

2. **Vector Database Connection**
   - Verify the vector database configuration
   - Ensure the selected database service is running and accessible

3. **Model Errors**
   - Check API quotas and limits
   - Verify model names in configuration

4. **Standards Not Being Applied**
   - Verify that standards have been uploaded correctly
   - Check the logs for standards matching information
   - Ensure document content includes relevant terminology for matching