# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

EPOLaw is a comprehensive legal AI platform built with Flask that provides document analysis, case management, summarization, and legal research capabilities for law firms and legal professionals. The application uses a multi-tenant architecture with company-based data isolation and processes documents using OpenAI GPT-4o.

## Critical Development Commands

### Service Management
```bash
# Restart the application service (after code changes)
systemctl restart epolaw

# Check service status
systemctl status epolaw

# View application logs
tail -f /var/log/epolaw/error.log
tail -f /var/log/epolaw/access.log

# Check Apache proxy errors
tail -f /var/log/apache2/epolaw_error.log

# Monitor systemd service logs
journalctl -u epolaw -f

# Reload systemd after editing service file
systemctl daemon-reload
```

### Virtual Environment & Dependencies
```bash
# Activate virtual environment (ALWAYS use this for Python operations)
source /var/www/lawbot/venv/bin/activate

# Install new dependencies
/var/www/lawbot/venv/bin/pip install package_name

# Update requirements after installing new packages
/var/www/lawbot/venv/bin/pip freeze > /var/www/lawbot/config/requirements.txt
```

### Database Operations
```bash
# Create new database tables after adding models
# (create_all() only creates missing tables; it does not alter existing ones)
/var/www/lawbot/venv/bin/python3 - <<'EOF'
from app import app
from models import db
with app.app_context():
    db.create_all()
EOF

# For specific table creation scripts
/var/www/lawbot/venv/bin/python3 /var/www/lawbot/create_law_library_tables.py

# Check recent job status (useful for debugging)
/var/www/lawbot/venv/bin/python3 /var/www/lawbot/check_recent_jobs.py
```

### Development Server (for testing only)
```bash
# Run Flask development server
cd /var/www/lawbot
/var/www/lawbot/venv/bin/python3 app.py
```

## Architecture & Key Design Patterns

### Background Processing
Document analysis and summarization use threading for background processing:
```python
# Background job pattern used in summarization_routes.py
import threading
threading.Thread(target=process_summarization_job, args=(job_uuid, file_path, ...)).start()
```
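**Note**: Signal-based timeouts (`signal.signal()` with `SIGALRM`) only work in the main thread. `signal.signal()` raises `ValueError` when called from any other thread, so use time-based checks inside background jobs instead.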
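A thread cannot be interrupted from outside, so a job can enforce its own time budget by checking a monotonic clock between units of work. The sketch below is a minimal illustration, not the code in `summarization_routes.py`. `run_with_deadline`, `start_job`, and `JobTimeout` are hypothetical names, and the 600-second default is an assumption chosen to match the 10-minute processing limit noted under Document Processing Optimizations.

```python
import threading
import time


class JobTimeout(Exception):
    """Raised when a background job runs past its time budget (hypothetical)."""


def run_with_deadline(units, process_unit, timeout_s=600.0):
    """Process `units` in order, checking elapsed time before each one."""
    deadline = time.monotonic() + timeout_s
    results = []
    for unit in units:
        if time.monotonic() > deadline:
            raise JobTimeout(f"stopped after {len(results)} of {len(units)} units")
        results.append(process_unit(unit))
    return results


def start_job(units, process_unit, timeout_s=600.0):
    """Start the job in a background thread; `outcome` is filled when it ends."""
    outcome = {}

    def worker():
        # In the Flask app, this body would also need `with app.app_context():`
        # before touching `db`.
        try:
            outcome["results"] = run_with_deadline(units, process_unit, timeout_s)
        except JobTimeout as exc:
            # Here the real job would record a graceful "timed out" status.
            outcome["error"] = str(exc)

    thread = threading.Thread(target=worker)
    thread.start()
    return thread, outcome
```

The check only runs between units, so one slow unit (for example a long OpenAI request) can still overrun the budget; keeping units small, such as one page or one chunk, limits how far it can overrun.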

### Authentication Pattern
The application uses **session-based authentication** (NOT Flask-Login decorators):
```python
# Correct authentication check pattern
user_id = session.get('user_id')
if not user_id:
    flash('Please log in to access this feature', 'warning')
    return redirect(url_for('auth.login'))
user = User.query.get(user_id)
```

### Multi-Tenancy & Access Control
All data is scoped by `company_id`. When adding new features:
- Always filter queries by `company_id`
- Check `user.company_id` for data access
- Ensure cross-company data isolation

**Case Access Control Pattern:**
Cases use multi-level access control. Users can access a case if they are:
1. Admin (sees all cases)
2. Company admin (sees all company cases)
3. Case creator (`case.created_by_id == user.id`)
4. Lead attorney (`case.lead_attorney_id == user.id`)
5. Team member (`user in case.team_members`)
6. Same company AND case is NOT confidential
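
The six rules above can be restated as a plain function, which helps when reasoning about or unit-testing access decisions. This is a sketch, not the app's `can_user_access()`: the `User` and `Case` dataclasses are hypothetical stand-ins for the SQLAlchemy models, with boolean fields in place of `user.is_admin()` / `user.is_company_admin()` and a set of IDs in place of the `team_members` relationship.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class User:  # hypothetical stand-in for the User model
    id: int
    company_id: int
    is_admin: bool = False
    is_company_admin: bool = False


@dataclass
class Case:  # hypothetical stand-in for the Case model
    company_id: int
    created_by_id: int
    lead_attorney_id: Optional[int] = None
    is_confidential: bool = False
    team_member_ids: Set[int] = field(default_factory=set)


def can_access_case(user: User, case: Case) -> bool:
    """Return True if any of the six access rules grants access."""
    if user.is_admin:                                  # 1. system admin
        return True
    same_company = user.company_id == case.company_id
    if user.is_company_admin and same_company:         # 2. company admin
        return True
    if case.created_by_id == user.id:                  # 3. case creator
        return True
    if case.lead_attorney_id == user.id:               # 4. lead attorney
        return True
    if user.id in case.team_member_ids:                # 5. team member
        return True
    return same_company and not case.is_confidential   # 6. same company, not confidential
```

Rules 3 to 5 ignore `company_id`, matching the query pattern below, where creator, lead-attorney, and team-member access are not company-scoped.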

**Important Query Pattern:**
```python
# Always include team member check in case queries for regular users
Case.query.filter(
    db.or_(
        Case.created_by_id == user.id,
        Case.lead_attorney_id == user.id,
        db.and_(Case.company_id == user.company_id, Case.is_confidential == False),
        Case.team_members.any(User.id == user.id)  # Critical: team member access
    )
)
```

**Case Team Members:**
- Managed via `case_team` association table (many-to-many)
- Routes: `POST /cases/<id>/team/add` and `POST /cases/<id>/team/remove/<member_id>`
- UI available on both Case Dashboard and Edit Case pages
- Only lead attorney, creator, or admins can manage team members

### Blueprint Organization
When adding new features, follow the blueprint pattern:
```python
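# feature_routes.py (hypothetical name): create the new blueprint in its own module
from flask import Blueprint

feature_bp = Blueprint('feature_name', __name__, url_prefix='/feature')

# app.py: import and register the blueprint
from feature_routes import feature_bp
app.register_blueprint(feature_bp)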
```

### Database Model Pattern
New models should follow this structure:
```python
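from datetime import datetime

# `db` is the shared Flask-SQLAlchemy instance (importable via `from models import db`)
class NewModel(db.Model):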
    __tablename__ = 'table_name'
    id = db.Column(db.Integer, primary_key=True)
    company_id = db.Column(db.Integer, db.ForeignKey('companies.id'), nullable=False, index=True)
    # ... other fields
    created_at = db.Column(db.DateTime, default=datetime.utcnow)
    updated_at = db.Column(db.DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)
```

## File Organization

### Core Application Files
- **app.py**: Main Flask application - route registrations, middleware configuration
- **models.py**: SQLAlchemy database models - defines all entities
- **database.py**: Database initialization wrapper

### Blueprint Files (Feature Modules)
- **auth_routes.py**: Authentication and user management
- **case_routes.py**: Case management system
- **law_library_routes.py**: Law Library blueprint implementation
- **summarization_routes.py**: Document summarization with background processing
- **ai_legal_search_routes.py**: AI-powered legal research using Anthropic Claude
- **subscription_routes.py**: Stripe subscription management
- **admin_routes.py**: System admin features
- **company_admin_routes.py**: Company-level admin features

### Processing Modules
- **document_summarization_processor.py**: PDF/DOCX text extraction and OpenAI summarization
- **citation_generator.py**: Legal citation generation for analysis

### Template Structure
Templates use Jinja2 with Bootstrap 5.3.0:
- **base_layout.html**: Main navigation and layout (NOT base.html)
- Templates organized by feature in subdirectories
- Use `{% extends "base_layout.html" %}` for new templates

### JavaScript Conventions
- Avoid using reserved keywords as variable names (e.g., 'case')
- Use Bootstrap's built-in components for UI elements
- jQuery is available globally

## Common Issues & Solutions

### Issue: Authentication errors with @login_required
**Solution**: Don't use Flask-Login decorators. Use session-based authentication:
```python
user_id = session.get('user_id')
if not user_id:
    return redirect(url_for('auth.login'))
```

### Issue: Blueprint endpoint naming
**Solution**: Always use correct blueprint prefix in `url_for()`:
- Case routes use `cases.` prefix (e.g., `url_for('cases.view_case', case_id=id)`)
- Not `case.` (singular) - this will cause BuildError exceptions
- Check registered blueprints: `case_bp = Blueprint('cases', __name__)` uses 'cases' as name

### Issue: Import errors when running scripts
**Solution**: Always use the virtual environment Python:
```bash
/var/www/lawbot/venv/bin/python3 script.py
```

### Issue: Changes not reflected after code update
**Solution**: Restart the service:
```bash
systemctl restart epolaw
```

### Issue: Missing Python packages
**Solution**: Install in virtual environment:
```bash
/var/www/lawbot/venv/bin/pip install package_name
```

## Service Configuration

### Systemd Service
- **Service file**: `/etc/systemd/system/epolaw.service`
- **Working directory**: `/var/www/lawbot`
- **User/Group**: `www-data`
- **Gunicorn workers**: 4 (sync workers)
- **Bind address**: `127.0.0.1:5000`
- **Timeout**: 900 seconds (15 minutes)
- **Worker recycling**: After 500 requests to prevent memory leaks

### Production Stack
- **Web Server**: Apache (reverse proxy)
- **WSGI Server**: Gunicorn
- **Database**: MySQL
- **Python**: 3.10.12 (in virtual environment)

## Security Considerations

### CSRF Protection
All forms must include CSRF token:
```html
<input type="hidden" name="csrf_token" value="{{ csrf_token() }}">
```

### File Uploads
- Use `secure_filename()` for all uploaded files
- Store in `/var/www/lawbot/uploads/` subdirectories
- Check file extensions with `allowed_file()` function
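
The upload rules above, together with the magic-byte and size checks listed under Security Best Practices, can be combined into one validation step. This is a sketch under assumptions: the allowed extensions (`pdf`, `docx`, `txt`) are inferred from the Law Library text-extraction formats, `MAX_UPLOAD_BYTES` reflects the 100MB limit described elsewhere in this file, and `validate_upload` is a hypothetical helper. The app's real `allowed_file()` may differ.

```python
from typing import Optional

# Assumed allow-list: the text-extractable formats named in the Law Library notes.
ALLOWED_EXTENSIONS = {"pdf", "docx", "txt"}
MAX_UPLOAD_BYTES = 100 * 1024 * 1024  # 100MB

# Leading "magic bytes" for binary formats; DOCX files are ZIP archives.
# Plain-text files have no signature, so they are not content-checked here.
MAGIC_PREFIXES = {
    "pdf": (b"%PDF-",),
    "docx": (b"PK\x03\x04",),
}


def allowed_file(filename: str) -> bool:
    """Extension allow-list check (hypothetical version of the app's helper)."""
    return "." in filename and filename.rsplit(".", 1)[1].lower() in ALLOWED_EXTENSIONS


def validate_upload(filename: str, head: bytes, size: int) -> Optional[str]:
    """Return an error message, or None if the upload passes every check.

    `head` is the first few bytes of the file and `size` its length in bytes.
    """
    if not allowed_file(filename):
        return "File type not allowed"
    if size > MAX_UPLOAD_BYTES:
        return "File exceeds the 100MB limit"
    ext = filename.rsplit(".", 1)[1].lower()
    prefixes = MAGIC_PREFIXES.get(ext)
    if prefixes and not head.startswith(prefixes):
        return "File content does not match its extension"
    return None
```

Call it after `secure_filename()`, passing the first few bytes read from the uploaded stream. A `PK\x03\x04` prefix only proves the file is a ZIP container, not specifically a DOCX. Setting Flask's `MAX_CONTENT_LENGTH` config to the same byte limit also makes Flask reject oversized request bodies with a 413 error when the form data is read.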

### Session Security
- Sessions expire after inactivity
- Use `session.permanent = True` for remember me functionality
- Clear session on logout with `session.clear()`

## Security Audit & Vulnerability Management

### Security Audit History
**Latest Audit**: October 11, 2025
- **Files Audited**: `auth_routes.py`, `case_routes.py`
- **Total Issues Found**: 18 (5 Critical, 7 High, 3 Medium, 3 Low)
- **Full Report**: See detailed audit report in conversation history or `/var/www/lawbot/SECURITY_TODO.md`

### Known Security Issues (Critical Priority)
**⚠️ IMPORTANT**: These issues should be addressed before production deployment or during next maintenance window.

1. **Missing Rate Limiting**: Login and password reset routes lack rate limiting protection
2. **Open Redirect**: Login redirect parameter not validated
3. **Path Traversal**: File downloads don't validate paths are within upload directory
4. **Missing File Size Validation**: No enforcement of 100MB file size limit
5. **Information Disclosure**: Password reset URLs exposed in error messages

See `SECURITY_TODO.md` for complete prioritized list with implementation details.

### Security Best Practices for New Code
When adding new features, always:

1. **Authentication & Authorization**
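   - Enforce the session-based check (`session.get('user_id')`) on every protected route; don't use Flask-Login's `@login_required` (see Common Issues & Solutions)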
   - Check `user.company_id` matches resource's `company_id`
   - Use role-based checks: `user.is_admin()`, `user.is_company_admin()`
   - For case access, use `user.can_view_case(case)` and `user.can_edit_case(case)`

2. **Input Validation**
   - Validate all user inputs (length, format, type)
   - Use `secure_filename()` for file uploads
   - Validate email format with regex
   - Check file extensions AND content (magic bytes)
   - Enforce file size limits before processing

3. **SQL Injection Prevention**
   - Use the SQLAlchemy ORM rather than raw SQL
   - If raw SQL is unavoidable, use bound parameters (`text()` with `:name` placeholders), never string formatting
   - Filter by `company_id` in all multi-tenant queries

4. **Path Traversal Prevention**
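   ```python
   import os
   from flask import abort, current_app

   # Resolve symlinks and ".." before comparing, then require the file to sit
   # inside the upload directory. A plain startswith() check would also accept
   # sibling paths such as /var/www/lawbot/uploads_evil/...
   upload_dir = os.path.realpath(current_app.config['UPLOAD_FOLDER'])
   file_path = os.path.realpath(document.file_path)
   if os.path.commonpath([upload_dir, file_path]) != upload_dir:
       abort(403)
   ```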

5. **Rate Limiting**
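   ```python
   # Assumes a Flask-Limiter instance is created in app.py, e.g. (Flask-Limiter 3.x):
   # limiter = Limiter(get_remote_address, app=app)
   from app import limiter

   @auth_bp.route('/sensitive-endpoint', methods=['POST'])
   @limiter.limit("5 per hour")
   def sensitive_endpoint():
       ...  # handler body
   ```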

6. **Session Management**
   ```python
   # Always regenerate session after login
   session.clear()
   session.permanent = remember_me
   session['user_id'] = user.id
   ```

7. **Error Handling**
   - Never expose sensitive data in error messages
   - Log detailed errors server-side only
   - Return generic error messages to users
   - Use `current_app.logger.error()` for secure logging

8. **CSRF Protection**
   - All POST forms must include `{{ csrf_token() }}`
   - API endpoints that modify data need CSRF validation

### Security Testing Commands
```bash
# Check for outdated/vulnerable dependencies
/var/www/lawbot/venv/bin/pip list --outdated
/var/www/lawbot/venv/bin/pip install pip-audit
/var/www/lawbot/venv/bin/pip-audit

# Review security configuration
cat /var/www/lawbot/config/security_config.py

# Check file permissions
ls -la /var/www/lawbot/uploads/
ls -la /var/www/lawbot/*.py

# Review recent authentication attempts (requires logging setup)
tail -f /var/log/epolaw/error.log | grep -i "invalid\|failed\|unauthorized"
```

### Security Review Checklist
Before deploying new features, verify:
- [ ] Rate limiting applied to sensitive endpoints
- [ ] All user inputs validated (length, format, type)
- [ ] File uploads check content type, not just extension
- [ ] File paths validated against directory traversal
- [ ] Authorization checks prevent horizontal privilege escalation
- [ ] Company ID filtering prevents cross-tenant data access
- [ ] CSRF tokens present in all forms
- [ ] No sensitive data in error messages or logs
- [ ] Session regeneration after authentication
- [ ] Password strength requirements enforced

## Testing & Debugging

### Check for Python syntax errors
```bash
/var/www/lawbot/venv/bin/python3 -m py_compile file.py
```

### Database connection test
```bash
/var/www/lawbot/venv/bin/python3 - <<'EOF'
from sqlalchemy import text
from app import app
from models import db
with app.app_context():
    print(db.session.execute(text('SELECT 1')).scalar())
EOF
```

### View running background shells (if using Bash tool)
```bash
# Use /bashes command in Claude Code interface
```

## Important Environment Details
- **Platform**: Linux (Ubuntu)
- **Working Directory**: `/var/www/lawbot`
- **Git Repository**: Not initialized (create a backup before major changes)
- **Backup Location**: `/var/www/lawbot/backups/` and `/home/sgadmin/`

## Backup & Disaster Recovery

### Creating Full System Backup
To create a complete backup of the EPOLaw application:
```bash
# Create timestamped backup directory
BACKUP_DIR="/var/www/lawbot/backups/backup_$(date +%Y%m%d_%H%M%S)"
mkdir -p $BACKUP_DIR

# Copy all Python application files
cp /var/www/lawbot/*.py $BACKUP_DIR/

# Copy templates and configuration
cp -r /var/www/lawbot/templates $BACKUP_DIR/
cp -r /var/www/lawbot/config $BACKUP_DIR/
cp /var/www/lawbot/CLAUDE.md $BACKUP_DIR/ 2>/dev/null

# Backup complete database (all databases with schema and data)
mysqldump --all-databases --single-transaction --routines --triggers > $BACKUP_DIR/database_schema.sql

# Backup system configuration
cp /etc/systemd/system/epolaw.service $BACKUP_DIR/
cp /etc/apache2/sites-available/epolaw.conf $BACKUP_DIR/ 2>/dev/null

# Create compressed archive
cd /var/www/lawbot/backups
tar -czf $(basename $BACKUP_DIR).tar.gz $(basename $BACKUP_DIR)/

# Copy to user-accessible location for download
cp $(basename $BACKUP_DIR).tar.gz /home/sgadmin/
```

### Latest Backup
- **Date**: October 2, 2025
- **Location**: `/home/sgadmin/backup_20251002_081346.tar.gz`
- **Size**: 867 KB (compressed), 4.5 MB uncompressed
- **Contents**: All application files, templates, config, complete database dump, service configurations

### Restoration Notes
- Database backup includes ALL databases (not just EPOLaw)
- Use `mysql < database_schema.sql` to restore database
- Restore file ownership after restoration (e.g. `chown -R www-data:www-data` on the restored application files)
- Restart services after restoration: `systemctl restart epolaw`

## Recent Feature Additions & UI Updates

### Landing Page Updates (October 2025)
- **5-Slide Carousel**: Added professional slides showcasing platform features
  - Slide 1: All-in-One Legal Platform (purple gradient)
  - Slide 2: Plans That Fit Your Practice (green gradient)
  - Slide 3: Secure, Verified, Reliable (blue-purple gradient)
  - Slide 4: AI-Powered Legal Analysis
  - Slide 5: Time Savings Comparison
- **Contact Form**: Enhanced with subject pre-selection via URL parameters
  - Added "Custom Development" subject option
  - URL parameter support: `/contact?subject=custom_development`
  - Enterprise users redirected to contact form for custom solutions
- **Login Page**: Added footer links to epolaw.ai and epobot.ai with external link icons

### Error Handling & User Experience
- **Custom Error Pages**: Professional error pages with helpful actions
  - 404, 500, and general exception handlers
  - "Report This Issue" button links to contact form with "bug" pre-selected
  - Support contact information displayed (email & phone)
  - CSRF token issues resolved in AI Research suggested searches

### Subscription & Plan Management
- **Enterprise Plan Display**: "Unlimited analyses" instead of "inf analyses left"
- **Plan-Specific Actions**: Enterprise users see "Contact Sales for Custom Plans"
- **Feature Access Control**: Proper feature flags in subscription plans

### Law Library System
- **Implementation**: Complete document repository with metadata and full-text search
- **Location**: `/law-library` routes in `law_library_routes.py`
- **Features**: Upload, search, filter by type/court/practice area, document preview
- **Database**: `LawLibraryDocument` model with extensive metadata fields
- **FULLTEXT Search**: MySQL FULLTEXT index on title, description, keywords, content_text, case_name
  - Automatic text extraction from PDF, DOCX, TXT files on upload
  - MEDIUMTEXT column type supports up to 16MB (handles large Supreme Court decisions)
  - Company-isolated search - users only search their company's documents
  - Relevance-ranked results using NATURAL LANGUAGE MODE
- **Color Scheme**: Soft purple gradient (#7e57c2 to #9575cd) to distinguish from operational features
- **Authentication**: Uses session-based auth, NOT Flask-Login decorators
- **Access Control**: Enterprise plan feature - `law_library_access: true` in plan features
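
A company-scoped, relevance-ranked FULLTEXT search can be expressed as a parameterized query. This is a sketch, not the code in `law_library_routes.py`: the table name `law_library_documents` and the helper `build_fulltext_search` are hypothetical, while the column list comes from the index described above. MySQL requires the `MATCH()` column list to match a FULLTEXT index definition exactly.

```python
# Column list must match the FULLTEXT index definition exactly.
FULLTEXT_COLUMNS = "title, description, keywords, content_text, case_name"

# Only the constant column list is interpolated; user input is always bound
# through the :query, :company_id and :limit parameters.
SEARCH_SQL = f"""
SELECT id, title,
       MATCH({FULLTEXT_COLUMNS}) AGAINST (:query IN NATURAL LANGUAGE MODE) AS relevance
FROM law_library_documents
WHERE company_id = :company_id
  AND MATCH({FULLTEXT_COLUMNS}) AGAINST (:query IN NATURAL LANGUAGE MODE)
ORDER BY relevance DESC
LIMIT :limit
"""


def build_fulltext_search(company_id: int, query: str, limit: int = 50):
    """Return (sql, params) for a company-isolated natural-language search."""
    query = query.strip()
    if not query:
        raise ValueError("empty search query")
    return SEARCH_SQL, {"company_id": company_id, "query": query, "limit": int(limit)}
```

In a route this would run as `db.session.execute(text(sql), params)`, with `company_id` taken from the logged-in user rather than from the request.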

### Navigation Menu Organization
Current order (left to right):
1. Dashboard
2. New Analysis
3. New Summary
4. Research Tools (dropdown containing Case Research, Enhanced Research, AI Research)
5. Cases (dropdown with case management and recent cases)
6. Law Library
7. Recent Activity

### UI Fixes & Improvements
- **Document Name Wrapping**: Long filenames in case documents now wrap properly using `.document-name` class
- **Dropdown Z-Index**: Fixed floating dropdown menus with z-index: 99999
- **JavaScript Reserved Words**: Avoid using 'case' as variable name (use 'caseItem' instead)
- **Document Actions**: Both dropdown and click-to-show UI patterns implemented
- **AI Research Display Fix** (October 7, 2025): Fixed JavaScript function name collision
  - **Issue**: `addMessage()` function in `base_layout.html` (for assistant chat) was conflicting with AI research page's `addMessage()`
  - **Root Cause**: Global function was being called, looking for `#assistant-messages` (doesn't exist on AI research page), causing silent failure
  - **Solution**: Renamed AI research function to `addAIMessage()` in `/var/www/lawbot/templates/ai_legal_research.html`
  - **Impact**: AI Research page now properly displays user queries and AI responses
  - **Files Modified**: `templates/ai_legal_research.html`
  - **Lesson Learned**: Always use unique function names for page-specific JavaScript to avoid conflicts with global functions

### Case Management Enhancements
- **Document Integration**: View/Download/Analyze/Summarize documents directly from case dashboard
- **Recent Cases Filter**: 30-day window for recent activity
- **Update Timestamps**: Case `updated_at` field updates when notes/documents added
- **Document Upload**: Fixed file dialog and JSON parsing issues
- **Team Member Sharing**: Complete implementation for confidential case collaboration
  - Add/remove team members to cases (even confidential ones)
  - Team members UI on both Case Dashboard and Edit Case pages
  - Team member access included in all case queries and case count
  - `Case.team_members` relationship using `case_team` association table
  - `can_user_access()` method checks team membership for access control

### Document Processing Optimizations
- **Large PDF Handling**: Supreme Court decisions (250+ pages) use smart sampling:
  - Processes first 10 pages (syllabus)
  - Pages 20-40 (main opinion)
  - Last 10 pages (conclusion)
  - Reduces 281 pages to ~40 key pages
- **File Size Limits**: 100MB max, with helpful error messages for larger files
- **Timeout Protection**: 10-minute processing limit with graceful failure
- **Chunk Size**: 5000 tokens for GPT-4o, 2500 for GPT-3.5
- **Error Recovery**: Comprehensive retry page with document splitting suggestions
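
The page-sampling rule and per-model chunk sizes above can be sketched as follows. This is an illustrative reconstruction, not the code in `document_summarization_processor.py`. The 250-page threshold, the model-name keys, the greedy paragraph-packing strategy, and the ~4-characters-per-token estimate are all assumptions (a real tokenizer such as tiktoken would give exact counts). "Pages 20-40" is treated as inclusive, which turns a 281-page decision into 41 pages.

```python
from typing import Callable, List

# Assumed mapping from model name to chunk budget (values from the notes above).
CHUNK_TOKENS = {"gpt-4o": 5000, "gpt-3.5-turbo": 2500}


def sample_pages(page_count: int, threshold: int = 250) -> List[int]:
    """Return the 1-based page numbers to process for a PDF."""
    if page_count < threshold:
        return list(range(1, page_count + 1))  # smaller documents: every page
    syllabus = range(1, 11)                             # first 10 pages
    opinion = range(20, 41)                             # pages 20-40, inclusive
    conclusion = range(page_count - 9, page_count + 1)  # last 10 pages
    return sorted(set(syllabus) | set(opinion) | set(conclusion))


def approx_tokens(text: str) -> int:
    """Rough estimate of ~4 characters per token; a real tokenizer is more exact."""
    return max(1, len(text) // 4)


def chunk_paragraphs(paragraphs: List[str], model: str = "gpt-4o",
                     count_tokens: Callable[[str], int] = approx_tokens) -> List[str]:
    """Greedily pack paragraphs into chunks that fit the model's token budget.

    A single paragraph larger than the budget still becomes its own chunk,
    so very long paragraphs would need splitting before this step.
    """
    budget = CHUNK_TOKENS[model]
    chunks, current, used = [], [], 0
    for para in paragraphs:
        cost = count_tokens(para)
        if current and used + cost > budget:
            chunks.append("\n\n".join(current))
            current, used = [], 0
        current.append(para)
        used += cost
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```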