# Mini SpecsComply Pro (SCP)

## Overview

Mini SpecsComply Pro (SCP) is a lightweight document compliance and validation tool designed to analyze and verify technical documents against predefined standards and project-specific requirements. It combines a Groq-hosted LLM for reasoning with Cohere models for embedding and ranking, so documents can be processed quickly and accurately.

## Features

- **Document Analysis:** Automated analysis of technical documents for compliance verification
- **AI-Powered Processing:**
  - Groq LLM for deep reasoning and compliance analysis
  - Cohere for document embedding and result ranking
- **Advanced Standards Matching:**
  - Multi-factor matching algorithm to identify relevant standards
  - Section-based analysis for contextual understanding
  - Technical term recognition and keyword extraction
  - Relevance scoring system for accurate standard selection
- **Custom Standards Support:**
  - Upload and manage your own compliance standards
  - JSON-based standard definitions with a flexible structure
- **Vector Database Support:**
  - Pinecone (default)
  - Weaviate (alternative)
- **RESTful API:** Built with FastAPI for easy integration
- **Real-time Processing:** Async support for efficient document handling
- **Structured Reports:** Detailed compliance feedback and recommendations, with tracking of the applied standards

## Prerequisites

- Python 3.8 or higher
- pip or Poetry for package management
- API keys for:
  - Groq
  - Cohere
  - Pinecone (if using Pinecone), or a Weaviate URL (if using Weaviate)

## Installation

1. Clone the repository:

   ```bash
   git clone http://23.29.118.76:3000/task/ds_scp_task_solution.git
   cd ds_scp_task_solution
   ```

2. Create and activate a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Create a `.env` file in the project root:

   ```env
   # Required API Keys
   GROQ_API_KEY=your_groq_api_key
   COHERE_API_KEY=your_cohere_api_key

   # Vector Database (choose one)
   # For Pinecone:
   VECTOR_DB=pinecone
   PINECONE_API_KEY=your_pinecone_api_key
   # e.g. us-east-1
   PINECONE_ENVIRONMENT=your_pinecone_environment
   PINECONE_INDEX_NAME=specscomply-documents

   # Or for Weaviate:
   # VECTOR_DB=weaviate
   # WEAVIATE_URL=your_weaviate_url
   # WEAVIATE_API_KEY=your_weaviate_api_key

   # Optional Settings
   APP_NAME="Mini SpecsComply Pro"
   APP_VERSION="0.1.0"
   DEBUG=False
   ```

## Running the Application

### Quick Start

```bash
python launch.py
```

This checks your environment setup and starts the application. Then open `http://localhost:8000` in your browser.

The API will be available at:

- API Documentation: `http://localhost:8000/docs`

## API Endpoints

- `POST /api/documents/upload` - Upload a document for analysis
- `GET /api/documents/{document_id}` - Get document status and results
- `POST /api/documents/{document_id}/resubmit` - Resubmit a document for re-analysis
- `GET /api/documents/{document_id}/analysis` - Get detailed compliance analysis
- `GET /api/standards` - List all available standards
- `POST /api/standards/upload` - Upload a custom standard definition
- `GET /api/standards/{standard_id}` - Get details of a specific standard
- `GET /api/health` - Health check endpoint

## Configuration

The application can be configured through environment variables or the `.env` file.
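
A minimal sketch of how these settings might be read in Python, using only the standard library. The `Settings` class and `load_settings()` function are hypothetical names for illustration and may differ from the configuration code in `app/core/`; the sketch assumes the `.env` values have already been copied into the process environment (for example with python-dotenv's `load_dotenv()`), and its fallback values mirror the defaults documented in this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


# Hypothetical settings container; field names follow the
# environment variables documented in this README.
@dataclass(frozen=True)
class Settings:
    debug: bool
    vector_db: str
    embedding_model: str
    reranker_model: str
    reasoning_model: str


def _parse_bool(value: str) -> bool:
    # Accept common truthy spellings such as "True", "true", "1", "yes".
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, falling back to README defaults."""
    env = os.environ if env is None else env
    vector_db = env.get("VECTOR_DB", "pinecone").strip().lower()
    # Fail fast on an unsupported backend instead of at connection time.
    if vector_db not in {"pinecone", "weaviate"}:
        raise ValueError(f"VECTOR_DB must be 'pinecone' or 'weaviate', got {vector_db!r}")
    return Settings(
        debug=_parse_bool(env.get("DEBUG", "False")),
        vector_db=vector_db,
        embedding_model=env.get("EMBEDDING_MODEL", "embed-english-v3.0"),
        reranker_model=env.get("RERANKER_MODEL", "rerank-english-v2.0"),
        reasoning_model=env.get("REASONING_MODEL", "llama-3.3-70b-versatile"),
    )
```

For example, `load_settings({"VECTOR_DB": "weaviate", "DEBUG": "true"})` returns a `Settings` with `vector_db="weaviate"`, `debug=True`, and the default model names. Rejecting an unknown `VECTOR_DB` early gives a clearer error than a failed database connection later.
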

Key configuration options:

- `DEBUG`: Enable debug mode (default: `False`)
- `VECTOR_DB`: Choose the vector database backend (`"pinecone"` or `"weaviate"`)
- `EMBEDDING_MODEL`: Cohere embedding model (default: `"embed-english-v3.0"`)
- `RERANKER_MODEL`: Cohere reranker model (default: `"rerank-english-v2.0"`)
- `REASONING_MODEL`: Groq model (default: `"llama-3.3-70b-versatile"`)

## Development

### Project Structure

```
mini-specscomply-pro/
├── app/
│   ├── api/            # API routes and endpoints
│   ├── core/           # Core configuration and models
│   └── services/       # Business logic services
├── Data/               # Sample data and documents
├── requirements.txt    # Project dependencies
├── run.py              # Application runner
├── launch.py           # Setup and launch script
├── .env                # Environment variables
├── .gitignore          # Git ignore file
└── README.md           # Project documentation
```

## Advanced Standards Matching

Mini SpecsComply Pro uses a multi-step process to match documents with relevant standards:

1. **Document Analysis**
   - Extracts sections and headings from the document
   - Identifies key technical terms and phrases
   - Recognizes standard references (e.g., "ISO-9001", "IEEE 829")
2. **Relevance Scoring**
   - Calculates weighted scores based on multiple factors:
     - Direct standard name matches (highest weight)
     - Keyword matches between the document and the standard
     - Section-specific matches (e.g., in References or Requirements sections)
     - Technical term matches
     - Requirement-specific matches
3. **Standard Selection**
   - Selects the most relevant standards based on a score threshold
   - Applies these standards during compliance analysis
   - Displays the applied standards in the compliance report

This approach helps ensure that the most appropriate standards are applied to each document, improving the accuracy and relevance of the compliance analysis.

## Document and Standard Formats

### Compliance Documents

For best results, structure your compliance documents with clear sections and headings.
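
Clear headings matter because a document is easiest to analyze section by section. A minimal sketch of heading-based splitting in Python, assuming ATX-style markdown headings (`#` to `######`) as in the example document below; `split_sections()` is a hypothetical helper, and SCP's own section extraction may follow different rules.

```python
import re
from typing import List, Tuple

# ATX-style markdown heading: 1-6 '#' characters, whitespace, then the title.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
# Three backticks open or close a fenced code block.
FENCE = "`" * 3


def split_sections(text: str) -> List[Tuple[int, str, str]]:
    """Split markdown text into (level, heading, body) tuples.

    Text before the first heading, if any, is returned with level 0 and an
    empty heading. Lines inside fenced code blocks are never treated as headings.
    """
    sections: List[Tuple[int, str, str]] = []
    level: int = 0
    heading: str = ""
    body: List[str] = []
    in_fence = False

    def flush() -> None:
        # Keep the current section unless it is an empty preamble.
        if heading or any(ln.strip() for ln in body):
            sections.append((level, heading, "\n".join(body).strip()))

    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADING_RE.match(line)
        if match:
            flush()
            level, heading, body = len(match.group(1)), match.group(2), []
        else:
            body.append(line)
    flush()
    return sections
```

Run on the example document below, this yields one entry per heading, such as `(2, "Scope", "This specification applies to all components of the XYZ system.")`, so each requirement stays attached to the section it appears in. That per-section view is what section-specific matching, as described under Advanced Standards Matching, can build on.
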

The system performs better with well-organized documents that include:

1. **Clear Headings**: Use markdown-style headings (e.g., `# Section Title`) to organize content
2. **Introduction Section**: Provide the context and purpose of the document
3. **Scope Section**: Define what the document does and doesn't cover
4. **Requirements Sections**: State requirements clearly using terms like "shall", "must", and "should"
5. **References Section**: List relevant standards, specifications, or other documents
6. **Technical Details**: Include specific technical information relevant to compliance

Example document structure:

```markdown
# System Compliance Specification

## Introduction

This document specifies the compliance requirements for the XYZ system.

## Scope

This specification applies to all components of the XYZ system.

## Requirements

### Functional Requirements

1. The system shall process user input within 500ms.
2. The system must maintain data integrity during power failures.

### Security Requirements

1. All data transmissions shall be encrypted using AES-256.
2. User authentication must comply with NIST guidelines.

## References

- ISO-9001:2015 Quality Management Systems
- IEEE-829 Software Test Documentation
```

### Custom Standard Definitions

Custom standards are defined in JSON format with the following structure:

```json
{
  "name": "ISO-9001",
  "description": "Quality Management System standard",
  "requirements": [
    {
      "id": "ISO-9001-4.1",
      "description": "The organization shall determine external and internal issues relevant to its purpose and strategic direction.",
      "severity": "major"
    },
    {
      "id": "ISO-9001-4.2",
      "description": "The organization shall monitor and review information about these external and internal issues.",
      "severity": "minor"
    }
  ]
}
```

You can also define multiple standards in a single file:

```json
{
  "standards": [
    {
      "name": "ISO-9001",
      "description": "Quality Management System standard",
      "requirements": [...]
    },
    {
      "name": "IEEE-829",
      "description": "Software Test Documentation standard",
      "requirements": [...]
    }
  ]
}
```

Requirement severity levels:

- `critical`: Severe non-compliance that must be addressed immediately
- `major`: Significant issue that should be addressed soon
- `minor`: Less significant issue that should be addressed when convenient
- `info`: Informational note or suggestion

## Troubleshooting

Common issues and solutions:

1. **Missing API Keys**
   - Ensure all required API keys are set in your `.env` file
   - Check the API key format and validity
2. **Vector Database Connection**
   - Verify the vector database configuration
   - Ensure the selected database service is running and accessible
3. **Model Errors**
   - Check API quotas and limits
   - Verify the model names in your configuration
4. **Standards Not Being Applied**
   - Verify that the standards have been uploaded correctly
   - Check the logs for standards-matching information
   - Ensure the document content includes relevant terminology for matching
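
Before re-uploading a standard, it can help to check the file locally. A minimal sketch of such a check in Python, assuming the JSON shapes shown under Custom Standard Definitions; `validate_standards()` is a hypothetical helper, and treating `name`, `requirements`, and each requirement's `id`, `description`, and `severity` as mandatory is an assumption, so the actual upload endpoint may be stricter or more lenient.

```python
import json
from typing import Any, List

# Severity levels documented in this README.
SEVERITIES = {"critical", "major", "minor", "info"}


def validate_standards(raw: str) -> List[str]:
    """Return a list of problems found in a custom-standard JSON document.

    Accepts either a single standard object or {"standards": [...]}.
    An empty list means no problems were detected.
    """
    try:
        data: Any = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]

    # Normalize both accepted shapes to a list of standards.
    standards = data["standards"] if "standards" in data else [data]
    if not isinstance(standards, list):
        return ["'standards' must be a list"]

    problems: List[str] = []
    for i, std in enumerate(standards):
        if not isinstance(std, dict):
            problems.append(f"standard #{i}: must be an object")
            continue
        name = std.get("name") or f"standard #{i}"
        if not std.get("name"):
            problems.append(f"{name}: missing 'name'")
        reqs = std.get("requirements")
        if not isinstance(reqs, list) or not reqs:
            problems.append(f"{name}: 'requirements' must be a non-empty list")
            continue
        for j, req in enumerate(reqs):
            if not isinstance(req, dict):
                problems.append(f"{name}: requirement #{j} must be an object")
                continue
            for key in ("id", "description"):
                if not req.get(key):
                    problems.append(f"{name}: requirement #{j} missing '{key}'")
            if req.get("severity") not in SEVERITIES:
                problems.append(
                    f"{name}: requirement #{j} has invalid severity {req.get('severity')!r}"
                )
    return problems
```

For example, `validate_standards(open("my_standard.json").read())` (the file name is hypothetical) returns an empty list when nothing is wrong, or messages such as `IEEE-829: requirement #0 has invalid severity 'urgent'` that point at the entry to fix.
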