What AWS Bedrock enables for your business.
Open-weight models on Bedrock give clients full visibility into model architecture and weights — critical for regulated industries where "black box" AI is a compliance risk. Llama 3.1 on Bedrock combines open-source transparency with AWS's enterprise security.
- Meta Llama 3.1 70B — Open-weight reasoning for privacy-sensitive deployments
- Meta Llama 3.1 8B — Cost-efficient open-weight inference for high-volume tasks
- Mistral Large 2 — European-origin model with strong multilingual capability
Amazon Titan Text Embeddings V2 powers the vector retrieval layer in our RAG pipelines — turning every document corpus into a semantically searchable knowledge base that returns conceptually relevant results, not just keyword matches.
- Titan Text Embeddings V2 — 1,024-dimension semantic embeddings for RAG
- Cohere Embed v3 — Multilingual embeddings + reranking for precision retrieval
- Amazon OpenSearch — Managed vector store with k-NN search, AWS-native
For high-volume classification, extraction, and triage tasks, Bedrock's low-cost models — Amazon Nova Micro in particular — cut inference costs dramatically without sacrificing accuracy on structured tasks.
- Amazon Nova Micro — Ultra-low-cost inference for classification and extraction
- Amazon Nova Lite — Balanced speed and cost for moderate-complexity tasks
- Bedrock Batch Inference — Async bulk processing at up to 50% cost reduction
Bedrock runs inside your AWS VPC boundary with no data egress to the public internet. Combined with KMS-managed encryption and VPC endpoints, it satisfies even the most demanding enterprise and regulated-industry security requirements.
- VPC Endpoints — Private connectivity; inference traffic never hits public internet
- AWS KMS Encryption — Customer-managed keys for data at rest and in transit
- CloudTrail Logging — Every API call logged for audit and compliance
How Gilligan Tech deploys AWS Bedrock.
- Model routing: Incoming tasks are classified by complexity and data-sensitivity. Open-weight models (Llama) are selected for privacy-sensitive workloads; Titan for embedding generation; Nova for bulk classification.
- Document ingestion: Source documents are chunked, embedded via Titan Text Embeddings V2, and stored in Amazon OpenSearch or a compatible vector store. Chunk metadata is preserved for citation.
- Retrieval: User queries are embedded and matched against the vector store using approximate nearest-neighbour search. Top-ranked chunks are assembled into the LLM prompt context.
- Inference: The assembled prompt is sent to the appropriate Bedrock model. For privacy-critical clients, Llama 3.1 70B is used exclusively — no third-party model vendor receives the data.
- Audit trail: CloudTrail captures every Bedrock API call. Gilligan Tech's platform layer adds application-level logging of model, latency, token count, and retrieved sources.
AWS Bedrock models we deploy.
| Model | Provider | Best for |
|---|---|---|
| Llama 3.1 70B Instruct | Meta (via Bedrock) | Privacy-first reasoning; regulated industry deployments |
| Llama 3.1 8B Instruct | Meta (via Bedrock) | Cost-efficient open-weight inference for high-volume tasks |
| Amazon Nova Micro | Amazon | Ultra-low-cost classification, tagging, extraction |
| Amazon Nova Lite | Amazon | Balanced cost and capability for production workloads |
| Titan Text Embeddings V2 | Amazon | Semantic embeddings for RAG retrieval pipelines |
| Cohere Embed v3 | Cohere (via Bedrock) | Multilingual embeddings + reranking for precision retrieval |
| Mistral Large 2 | Mistral AI (via Bedrock) | European-origin multilingual reasoning |