AI Platform & Technology — Gilligan Tech Inc.

Cloud Platforms

Google, AWS, Azure, Cloudflare — all under one API.

12+

AI Models

Gemini, GPT-4o, Claude, Llama, Mistral and more — routed by task type.

<100ms

Edge Latency

Cloudflare Workers AI delivers edge inference with sub-100ms response.

Training on Your Data

Your documents never train any model. Period.

Infrastructure

Four clouds. One platform.

Each cloud AI provider brings something distinct. We use all four, and select the best fit for each workload automatically.

Vertex AI is our primary platform for reasoning-heavy tasks and enterprise-scale document retrieval. Gemini's 1M-token context window makes it uniquely suited to analysing large document sets in a single pass.

Gemini 2.0 Flash — High-throughput inference for real-time LLM features and chat
Gemini 1.5 Pro — Complex document analysis with 1M-token context window
Vertex AI Search & RAG Engine — Managed enterprise retrieval with Grounding
Imagen 3 — Visual content generation for marketing assets

AWS Bedrock gives us access to both proprietary and open-weight models through a single managed API. For data-sensitive customers who need model transparency, Llama on Bedrock is a privacy-first choice.

Claude 3.5 Sonnet (via Bedrock) — Primary reasoning model for complex multi-step tasks
Meta Llama 3.1 70B — Open-weight option for privacy-sensitive deployments
Amazon Titan Text Embeddings V2 — Vector embeddings for semantic search and RAG
Amazon Nova Micro — Ultra-fast, cost-efficient inference for high-volume ops

Azure OpenAI gives us GPT-4o's multimodal reasoning inside Microsoft's enterprise compliance boundary — important for customers in regulated industries. Azure AI Foundry handles fine-tuning and custom deployment.

GPT-4o — Multimodal reasoning + vision for document and image processing
GPT-4o mini — Cost-efficient for high-volume classification and extraction
Azure AI Foundry — Managed fine-tuning, evaluation, and custom deployment
Whisper (Azure) — Speech-to-text for meeting notes, calls, and audio transcription

When response time is everything, Cloudflare Workers AI runs inference at the network edge — milliseconds from any user, globally. Ideal for real-time suggestions, autocomplete, and lightweight triage before routing to a heavier model.

Llama 3.1 8B Instruct — Edge inference for sub-50ms response on common queries
Mistral 7B Instruct — Lightweight reasoning at the network edge
Whisper Large v3 — Audio transcription at the edge (data stays in-region)

Architecture

Intelligent model routing

The Gilligan Tech platform analyses each incoming task and routes it to the optimal model — balancing capability, latency, and cost automatically.

Task Classification

Is this a quick query, a long-context analysis, a multimodal task, or a real-time edge operation?

Model Selection

Route to Gemini 1.5 Pro, GPT-4o, Llama 70B, or Cloudflare edge — whichever wins on the task profile.

Inference

The selected model processes the request with your data — never leaving the approved cloud boundary.

Result & Audit

Results returned to your product. Every call logged with model, latency, and token cost for full auditability.

Security & Compliance

Enterprise-grade by default.

🔒

No model training on your data

Customer documents and queries are never used to train or fine-tune any model. We use inference-only API calls on all four platforms.

🌍

US data residency

All data is processed and stored within US-based cloud regions by default. Regional data residency for EU customers is available on Enterprise plans.

📄

Full audit trail

Every model call is logged with the model used, token count, latency, and response. Available for download in your dashboard at any time.

Extended Model Support

Beyond the big four clouds.

The Gilligan Tech routing layer is model-agnostic. In addition to our four primary cloud platforms, we support these providers for specialised workloads, open-weight deployments, and regional data residency requirements.

Provider	Models	Best for
NVIDIA NIM	Llama 3.1 Nemotron · Mistral NIM	Optimised GPU inference via NIM microservices; NVIDIA Inception ecosystem
IBM watsonx.ai	Granite 13B · Llama 3.1 on watsonx	Enterprise governance, IBM-managed infrastructure, TechXChange partner credits
Cohere	Command R+ · Embed v3	RAG-optimised embeddings, reranking, enterprise retrieval pipelines
Mistral AI	Mistral Large 2 · Mistral 7B	EU-hosted inference, GDPR data residency, cost-efficient reasoning
Together.ai	Llama 3.1 405B · Qwen 2.5 72B	High-throughput open-weight inference at scale; batch workloads
Hugging Face Inference	Any Inference Endpoint model	Custom fine-tuned models, open-source ecosystem, private deployments
Anthropic (direct)	Claude 3.5 Sonnet · Claude 3 Haiku	Direct API path (non-Bedrock) for customers requiring Anthropic MSA terms
Perplexity Sonar	sonar-pro · sonar-reasoning	Web-grounded RAG with live search results; real-time fact retrieval

Built on the world's leading AI infrastructure.