The challenge
Associates at this 40-person regional firm were spending the majority of each working day doing something that felt unavoidable: manually reading through case documents, contracts, and precedents to find the specific clause, date, or ruling they needed for the brief they were writing.
The firm's document corpus had grown to thousands of files across three practice areas — corporate, employment, and property law. Keyword search returned too many false positives. Associates would open, scan, and close documents dozens of times before locating the passage they needed. On a complex matter, a single research task could consume an entire morning.
The practice manager estimated that associates were spending 6–8 hours per day on document search, time that was billed at associate rates but added little strategic value. The firm had evaluated legal research platforms, but the per-seat cost was prohibitive for a firm of its size.
The solution
Gilligan Tech deployed a Document Intelligence pipeline built on AWS Bedrock Titan Embeddings V2 and Google Gemini 1.5 Pro. The firm's entire document corpus (PDFs, Word documents, and scanned files) was ingested, chunked using a legal-document-aware strategy, and embedded into a vector store.
Associates now query their document library in plain English: "Find all clauses about liability limitation in employment contracts signed after 2022" or "What did the Henderson case establish about contractor classification?" The system returns ranked results with exact source citations — document name, section, and page number — in under three seconds.
Gemini 1.5 Pro's one-million-token context window is critical for complex cross-document analysis: the model can reason across multiple related contracts in a single pass, work that previously required associates to assemble excerpts by hand.
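To make the retrieval step concrete, here is a minimal sketch of filtered, ranked search with citations. It is not the deployed system: the `Chunk` fields, the `search` function, and the tiny hand-written vectors are all hypothetical stand-ins. In the real pipeline the vectors would come from the embedding model and the ranking would be done by the managed vector store.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Chunk:
    """One indexed passage plus the metadata needed for filters and citations."""
    document: str
    section: str
    page: int
    practice_area: str
    year: int
    vector: list[float]  # assumption: stand-in for a real embedding


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vector, chunks, *, practice_area=None, signed_after=None, top_k=3):
    """Apply metadata filters first, then rank the survivors by similarity."""
    candidates = [
        c for c in chunks
        if (practice_area is None or c.practice_area == practice_area)
        and (signed_after is None or c.year > signed_after)
    ]
    ranked = sorted(candidates, key=lambda c: cosine(query_vector, c.vector), reverse=True)
    # Each hit is returned as (citation string, similarity score).
    return [
        (f"{c.document}, {c.section}, p. {c.page}", round(cosine(query_vector, c.vector), 3))
        for c in ranked[:top_k]
    ]


# Toy index: file names and vectors are invented for illustration only.
index = [
    Chunk("contract_a.pdf", "s. 9", 4, "employment", 2023, [1.0, 0.0]),
    Chunk("contract_b.pdf", "s. 7", 3, "employment", 2021, [1.0, 0.0]),
    Chunk("lease_c.pdf", "s. 2", 1, "property", 2023, [0.0, 1.0]),
    Chunk("contract_d.pdf", "s. 12", 6, "employment", 2024, [0.6, 0.8]),
]

for citation, score in search([1.0, 0.0], index, practice_area="employment", signed_after=2022):
    print(score, citation)
```

Filtering before ranking mirrors a query such as "employment contracts signed after 2022": the date and practice-area constraints narrow the candidate set, and similarity only orders what remains.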
Implementation
- Document audit and ingestion: The firm's document corpus (~4,200 files) was audited, classified by practice area, and ingested. Scanned PDFs were passed through OCR before chunking.
- Legal-aware chunking: Documents were chunked at section boundaries (not arbitrary character counts) to preserve legal context. Clause headers, section numbers, and parties were preserved as metadata.
- Embedding and indexing: Chunks were embedded using AWS Bedrock Titan Embeddings V2 and indexed in a managed vector store. Metadata filters allow associates to scope queries by practice area, document type, or date range.
- Query interface: A simple web interface (integrated into the firm's existing intranet) lets associates type plain-English queries. Results show ranked document excerpts with exact citations and a brief Gemini-generated summary of relevance.
- Access controls: Document-level permissions were preserved. Associates see results only from documents in their practice area, and partner-only documents are gated behind role-based access.
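The chunking and access-control steps above can be sketched roughly as follows. This is an illustrative simplification, not the firm's implementation: the `HEADING` pattern assumes numbered headings such as "2. Limitation of Liability", and `SectionChunk`, `chunk_by_section`, and `visible_to` are hypothetical names. The sketch also assumes the practice-area restriction applies to every role, which the deployed permissions may not do.

```python
import re
from dataclasses import dataclass

# Assumption: a section starts with a number ("2." or "2.1") followed by a
# capitalised heading on its own line. Real legal documents vary far more.
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+([A-Z][^\n]*)$", re.MULTILINE)


@dataclass
class SectionChunk:
    document: str
    section_number: str
    heading: str
    text: str
    practice_area: str
    partner_only: bool = False


def chunk_by_section(document, text, practice_area, partner_only=False):
    """Split text at section headings, keeping the number and heading as metadata.

    Text before the first heading is ignored in this sketch.
    """
    matches = list(HEADING.finditer(text))
    chunks = []
    for i, match in enumerate(matches):
        # A section runs from the end of its heading to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[match.end():end].strip()
        chunks.append(SectionChunk(document, match.group(1), match.group(2).strip(),
                                   body, practice_area, partner_only))
    return chunks


def visible_to(chunk, user_practice_area, user_role):
    """Hide partner-only chunks from non-partners and chunks from other practice areas."""
    if chunk.partner_only and user_role != "partner":
        return False
    return chunk.practice_area == user_practice_area


sample = """1. Definitions
"Contractor" means the party named in Schedule A.

2. Limitation of Liability
Liability is capped at the fees paid in the prior 12 months.
"""

for chunk in chunk_by_section("sample_agreement.docx", sample, "employment"):
    print(chunk.section_number, chunk.heading, "->", chunk.text)
```

Splitting at headings rather than at a fixed character count keeps each clause intact, so a retrieved excerpt can be cited by section number. Running the permission check on chunk metadata before results are returned keeps gated documents out of the ranked list entirely.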
Results
Technology