Four clouds. One platform.
Each cloud AI provider brings something distinct. We use all four, and select the best fit for each workload automatically.
Vertex AI is our primary platform for reasoning-heavy tasks and enterprise-scale document retrieval. Gemini's 1M-token context window makes it uniquely suited to analysing large document sets in a single pass.
- Gemini 2.0 Flash — High-throughput inference for real-time LLM features and chat
- Gemini 1.5 Pro — Complex document analysis with 1M-token context window
- Vertex AI Search & RAG Engine — Managed enterprise retrieval with Grounding
- Imagen 3 — Visual content generation for marketing assets
AWS Bedrock gives us access to both proprietary and open-weight models through a single managed API. For data-sensitive customers who need model transparency, Llama on Bedrock is a privacy-first choice.
- Claude 3.5 Sonnet (via Bedrock) — Primary reasoning model for complex multi-step tasks
- Meta Llama 3.1 70B — Open-weight option for privacy-sensitive deployments
- Amazon Titan Text Embeddings V2 — Vector embeddings for semantic search and RAG
- Amazon Nova Micro — Ultra-fast, cost-efficient inference for high-volume ops
Azure OpenAI gives us GPT-4o's multimodal reasoning inside Microsoft's enterprise compliance boundary — important for customers in regulated industries. Azure AI Foundry handles fine-tuning and custom deployment.
- GPT-4o — Multimodal reasoning + vision for document and image processing
- GPT-4o mini — Cost-efficient for high-volume classification and extraction
- Azure AI Foundry — Managed fine-tuning, evaluation, and custom deployment
- Whisper (Azure) — Speech-to-text for meeting notes, calls, and audio transcription
When response time is everything, Cloudflare Workers AI runs inference at the network edge — milliseconds from any user, globally. Ideal for real-time suggestions, autocomplete, and lightweight triage before routing to a heavier model.
- Llama 3.1 8B Instruct — Edge inference for sub-50ms response on common queries
- Mistral 7B Instruct — Lightweight reasoning at the network edge
- Whisper Large v3 — Audio transcription at the edge (data stays in-region)
Intelligent model routing
The Gilligan Tech platform analyses each incoming task and routes it to the optimal model — balancing capability, latency, and cost automatically.
Enterprise-grade by default.
No model training on your data
Customer documents and queries are never used to train or fine-tune any model. We use inference-only API calls on all four platforms.
US data residency
All data is processed and stored within US-based cloud regions by default. Regional data residency for EU customers is available on Enterprise plans.
Full audit trail
Every model call is logged with the model used, token count, latency, and response. Available for download in your dashboard at any time.
Beyond the big four clouds.
The Gilligan Tech routing layer is model-agnostic. In addition to our four primary cloud platforms, we support these providers for specialised workloads, open-weight deployments, and regional data residency requirements.
| Provider | Models | Best for |
|---|---|---|
| NVIDIA NIM | Llama 3.1 Nemotron · Mistral NIM | Optimised GPU inference via NIM microservices; NVIDIA Inception ecosystem |
| IBM watsonx.ai | Granite 13B · Llama 3.1 on watsonx | Enterprise governance, IBM-managed infrastructure, TechXChange partner credits |
| Cohere | Command R+ · Embed v3 | RAG-optimised embeddings, reranking, enterprise retrieval pipelines |
| Mistral AI | Mistral Large 2 · Mistral 7B | EU-hosted inference, GDPR data residency, cost-efficient reasoning |
| Together.ai | Llama 3.1 405B · Qwen 2.5 72B | High-throughput open-weight inference at scale; batch workloads |
| Hugging Face Inference | Any Inference Endpoint model | Custom fine-tuned models, open-source ecosystem, private deployments |
| Anthropic (direct) | Claude 3.5 Sonnet · Claude 3 Haiku | Direct API path (non-Bedrock) for customers requiring Anthropic MSA terms |
| Perplexity Sonar | sonar-pro · sonar-reasoning | Web-grounded RAG with live search results; real-time fact retrieval |