Artificial Intelligence Expertise
Generative AI, an opportunity to exploit with method
In 2026, generative artificial intelligence has left the exploration phase. LLMs produce code daily in more than 70% of European tech teams. Autonomous agents automate customer support, sales qualification, and report generation. RAG makes internal corpora queryable in natural language. Claude, GPT-5, Mistral Large, and Llama 4 offer capabilities that were unthinkable three years ago.
The problem is no longer technical; it is methodological. AI projects that fail share the same symptoms: POCs that never reach production, spiraling costs, undetected hallucinations, single-provider lock-in, PII leaking into prompts, and no quantitative evaluation of quality. Our boutique applies the same rigor to AI projects as to a critical IT system: evaluation, observability, guardrails, controlled costs, reversibility.
Our approach
AI must solve a measurable business problem, not feed a communication deck.
- Framing on a quantified KPI. Before any POC, we define the success metric (support deflection rate, case processing time, semantic search precision). No metric, no project.
- Model selection by evaluation, not by hype. We compare Claude, GPT, Mistral, and Llama on a dataset representative of the client's data, along three axes: latency, cost, and quality. Often the small model wins.
- Modular, decoupled architecture. The LLM engine is an injectable dependency behind an interface. Switching provider does not require a rewrite. Protection against vendor lock-in by design.
- Systematic guardrails. Prompt injection detection and rejection (Rebuff, Lakera), PII scrubbing before sending to the LLM, output moderation, rate limiting per user and per tenant.
- Observability from day 1. Langfuse or Helicone for traces, cost dashboards per feature and per tenant, tracking of quality scores (automated eval), drift detection on request distribution.
- Lightweight but real MLOps. Prompts versioned alongside the code, automated evaluations in CI, canary deployment on model changes, instant rollback.
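The decoupling and PII-scrubbing points above can be sketched together. This is a minimal illustration in Python, not our production code: `LLMClient`, `EchoClient`, `ScrubbingClient`, `scrub_pii` and `summarize_ticket` are hypothetical names, and the two regular expressions are deliberately simplistic stand-ins for a dedicated PII detector.

```python
import re
from typing import Protocol


class LLMClient(Protocol):
    """Hypothetical interface: the only thing application code depends on."""

    def complete(self, prompt: str) -> str: ...


class EchoClient:
    """Stand-in provider for tests; a real adapter would call a vendor API."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


# Deliberately simple patterns (assumption); real scrubbing needs a dedicated tool.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d .-]{7,}\d")


def scrub_pii(text: str) -> str:
    """Replace e-mail addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("<EMAIL>", text)
    return PHONE.sub("<PHONE>", text)


class ScrubbingClient:
    """Decorator that scrubs PII before delegating to any LLMClient."""

    def __init__(self, inner: LLMClient) -> None:
        self._inner = inner

    def complete(self, prompt: str) -> str:
        return self._inner.complete(scrub_pii(prompt))


def summarize_ticket(client: LLMClient, ticket: str) -> str:
    """Application code receives the client by injection."""
    return client.complete(f"Summarize this ticket: {ticket}")
```

Because `summarize_ticket` only knows the interface, swapping `EchoClient` for another provider adapter, or wrapping it in `ScrubbingClient`, requires no change to the calling code.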
Technologies & frameworks we master
| Area | Tools and models |
|---|---|
| Hosted LLMs | Claude Sonnet 4.5, Claude Opus 4.5, GPT-5, GPT-5-mini, Mistral Large 2, Gemini 2 Pro |
| Self-hosted LLMs | Llama 4, Qwen 3, Mistral 7B/24B, DeepSeek V3, via vLLM, Ollama, TGI |
| Orchestration | Symfony + HTTP client, LangChain (Python for POCs), LlamaIndex, CrewAI |
| Embeddings | OpenAI text-embedding-3-large, Cohere Embed v4, BAAI bge-large, jina-embeddings v3 |
| Vector DBs | pgvector 0.8 (our default for 80% of cases), Qdrant, Weaviate, Pinecone |
| Reranking | Cohere Rerank v3.5, BAAI bge-reranker-large, local cross-encoders |
| LLM observability | Langfuse, Helicone, LangSmith, Arize Phoenix |
| Evaluation | Ragas, DeepEval, OpenAI Evals, custom eval on client dataset |
| Guardrails | Lakera Guard, Rebuff, NeMo Guardrails, regex + PII detection (Presidio) |
| Fine-tuning | Axolotl, Unsloth, LoRA, QLoRA on A100/H100 GPUs rented by the hour |
| Agents | Claude Agent SDK, OpenAI Assistants, AutoGen, Semantic Kernel |
Related services
Our AI engagements plug into these catalogue services.
- Artificial intelligence — POC, RAG, agents, MLOps, controlled costs.
- Software architecture — clean integration into a Symfony monolith or microservices.
- Cybersecurity — LLM guardrails, PII protection, prompt security.
- Performance & scalability — embedding caching, batching, API cost optimization.
- Custom PHP development — native Claude and OpenAI clients, Messenger for async.
Typical use cases
Semantic search over a 50,000-document knowledge base. Ingestion, smart chunking, OpenAI text-embedding-3-large embeddings, pgvector HNSW indexing, Cohere reranking, Claude generation. Median response time under 1.2s, cost 12€ per month for indexing, 0.30€ per query.
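What "smart chunking" means depends on the corpus. As a baseline only, here is a sketch of fixed-size word windows with overlap, a common starting point before moving to structure-aware splitting. The function name `chunk_words` and the default sizes are assumptions for illustration.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the price of embedding some words twice.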
Sales qualification agent. Claude drives a multi-step workflow via the SDK: enrichment via external APIs, BANT scoring, drafting of a summary note, opportunity creation in the CRM. 300 leads processed per day, saving 2 FTEs on pre-sales.
Automating regulatory report writing. Structured extraction from PDFs, section composition via templates, cross-review by a second LLM, final human validation. Report production time cut by a factor of 4, human error rate reduced by 60%.
Automatic classification and routing of support tickets. Embeddings over historical data, classification into 18 categories, automatic assignment, escalation on detected negative sentiment. Correct-classification rate 94%, operational gain of 1.5 support FTE.
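One simple way to classify from embeddings is nearest-centroid matching. The sketch below assumes that technique (the use case above does not specify the classifier) and uses toy 3-dimensional vectors in place of real embeddings. `centroids`, `route`, the `min_score` threshold and the `human_review` fallback are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroids(labeled: dict[str, list[list[float]]]) -> dict[str, list[float]]:
    """Average the historical embeddings of each category."""
    return {
        label: [sum(col) / len(vectors) for col in zip(*vectors)]
        for label, vectors in labeled.items()
    }


def route(embedding: list[float], cents: dict[str, list[float]], min_score: float = 0.5) -> str:
    """Return the closest category, or 'human_review' when nothing is close enough."""
    label, score = max(
        ((name, cosine(embedding, c)) for name, c in cents.items()),
        key=lambda pair: pair[1],
    )
    return label if score >= min_score else "human_review"
```

The threshold gives low-confidence tickets an explicit path to a human instead of forcing them into the nearest category.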
AI-specific FAQ
Should I use LangChain with Symfony or code the pipeline directly? For a simple RAG or agents with a predictable workflow, we implement directly in PHP with the Symfony HTTP client. The code is more maintainable, typing is strict, and PHPUnit tests are reliable. LangChain remains useful in Python for fast prototypes or very complex chains. For critical production, we prefer explicit code.
pgvector or a dedicated vector DB (Qdrant, Pinecone)? pgvector covers 80% of real needs: up to 5 million 1536-dim vectors, with response times under 50 ms thanks to the HNSW index. Its main advantage: ACID transactions alongside the rest of the business data, unified backups, standard SQL. We switch to Qdrant beyond 50 million vectors, or when pre-query filtering becomes complex.
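To put these volumes in perspective, here is a rough sizing sketch. It relies on pgvector's documented storage cost for the `vector` type (4 bytes per dimension plus 8 bytes per vector) and ignores the HNSW index, the other columns, and PostgreSQL row overhead. The function name is hypothetical.

```python
def vector_storage_gb(rows: int, dims: int, bytes_per_float: int = 4, header_bytes: int = 8) -> float:
    """Raw storage for `rows` vectors in GB (10^9 bytes), excluding the index and other columns."""
    return rows * (dims * bytes_per_float + header_bytes) / 1e9


small = vector_storage_gb(5_000_000, 1536)   # about 30.8 GB
large = vector_storage_gb(50_000_000, 1536)  # about 307.6 GB
```

At 5 million vectors the raw data still fits comfortably on a single PostgreSQL instance. At 50 million, the table alone exceeds 300 GB before indexing, which is one reason a dedicated engine becomes attractive. Note also that 1536 dimensions stays under pgvector's 2,000-dimension limit for indexing the `vector` type.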
How do I keep LLM API costs under control? Four levers, used together. One, pick the smallest model that passes the eval (GPT-5-mini or Claude Haiku often win). Two, aggressively cache responses to identical requests. Three, use native prompt caching (Anthropic or OpenAI) for long, repetitive contexts, which saves 50 to 90% on input tokens. Four, send non-urgent requests through batch APIs for a 50% price reduction.
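The second lever, caching identical responses, can be sketched as an exact-match cache keyed on a hash of the model name and prompt. This is an in-memory illustration only: `ResponseCache` and `call_model` are hypothetical names, and a shared store with expiry would replace the dictionary in production.

```python
import hashlib
import json
from typing import Callable


class ResponseCache:
    """In-memory exact-match cache in front of any model-calling function."""

    def __init__(self, call_model: Callable[[str, str], str]) -> None:
        self._call_model = call_model
        self._store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        # Hash model and prompt together so two models never share an entry.
        payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def complete(self, model: str, prompt: str) -> str:
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        answer = self._call_model(model, prompt)
        self._store[key] = answer
        return answer
```

Exact-match caching only pays off when the same prompt genuinely recurs and reusing an earlier answer is acceptable. The `hits` and `misses` counters make that easy to verify before relying on it.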
How do I avoid hallucinations in a RAG pipeline? Three golden rules. One, the retrieved context must be sufficient and relevant: reranking is mandatory, backed by result diversification and smart chunking. Two, the prompt must explicitly forbid answering without a source: "answer only from the provided context, otherwise say you do not know". Three, display the sources to the user: requiring citations changes the model's behavior and increases user trust.
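Rules two and three can be illustrated with a small prompt builder that numbers each passage and instructs the model to cite or refuse. This is a sketch under assumptions: `Passage`, `build_grounded_prompt`, `answer_or_refuse` and the exact prompt wording are illustrative, and `complete` stands for whatever function calls the model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Passage:
    source: str  # document title or URL, shown to the user alongside the answer
    text: str


REFUSAL = "I do not know."


def build_grounded_prompt(question: str, passages: list[Passage]) -> str:
    """Number each passage so the answer can cite it as [1], [2], ..."""
    context = "\n".join(
        f"[{i}] ({p.source}) {p.text}" for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer only from the provided context and cite passages as [n]. "
        f'If the context is insufficient, reply exactly: "{REFUSAL}"\n\n'
        f"Context:\n{context}\n\nQuestion: {question}"
    )


def answer_or_refuse(
    question: str, passages: list[Passage], complete: Callable[[str], str]
) -> str:
    """Skip the model call entirely when retrieval returned nothing."""
    if not passages:
        return REFUSAL
    return complete(build_grounded_prompt(question, passages))
```

Refusing before the model is called when retrieval comes back empty removes one whole class of unsupported answers, and the numbered citations map directly onto the sources displayed to the user.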
Can I self-host an LLM to avoid sending data to a third party? Yes. With Llama 4 or Mistral Large 2, quality is close to hosted models for classic classification or RAG tasks. The cost shifts to GPU: an A100 80 GB rents for 2€ an hour, roughly 1500€ per month running 24/7. Self-hosting becomes profitable beyond 10 million tokens per day. Below that, the hosted API remains cheaper.
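The break-even reasoning can be made explicit with a little arithmetic. The sketch below assumes a 30-day month and a single GPU able to absorb the whole load, and it ignores engineering time and redundancy. The function names are hypothetical.

```python
def monthly_gpu_cost(hourly_rate_eur: float, hours_per_day: float = 24, days: int = 30) -> float:
    """Rental cost of one GPU running continuously for a month."""
    return hourly_rate_eur * hours_per_day * days


def break_even_price_per_million(gpu_monthly_eur: float, tokens_per_day: float, days: int = 30) -> float:
    """Hosted price (EUR per million tokens) above which self-hosting is cheaper."""
    millions_per_month = tokens_per_day * days / 1_000_000
    return gpu_monthly_eur / millions_per_month


gpu = monthly_gpu_cost(2.0)                               # 1440 EUR per month
threshold = break_even_price_per_million(gpu, 10_000_000)  # 4.80 EUR per million tokens
```

Under these assumptions, 10 million tokens per day is the break-even point when the hosted API costs about 4.80€ per million tokens on a blended input and output basis. With a cheaper hosted model, the volume needed to justify a GPU rises proportionally.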
Further reading
Our technical articles dive into AI in production.
- RAG in production with pgvector, Claude and Symfony — detailed architecture, code, SQL schemas, real costs.
- OWASP Top 10 2025: concrete implementation with Symfony 7 — security for endpoints that call an LLM.
- Migrating a PHP 5.6 legacy to 8.3 with the strangler pattern — technical groundwork needed to integrate AI into an old IT system.
Get in touch
An AI use case to assess, a POC to productionize, costs to control? Write to contact@your-digital-hub.com or use our contact page. First 60-minute AI framing workshop, no strings attached.