YOUR DIGITAL HUB

Artificial Intelligence Expertise

Generative AI: an opportunity to seize methodically

In 2026, generative artificial intelligence has left the exploration phase. LLMs produce code daily in more than 70% of European tech teams. Autonomous agents automate customer support, sales qualification, and report generation. RAG (retrieval-augmented generation) makes internal corpora queryable in natural language. Claude, GPT-5, Mistral Large, and Llama 4 offer capabilities that were unthinkable three years ago.

The problem is no longer technical; it is methodological. AI projects that fail share the same symptoms: POCs that never reach production, spiraling costs, undetected hallucinations, single-provider lock-in, PII leaking into prompts, and no quantitative quality evaluation. Our boutique applies the same rigor to AI projects as to a critical IT system: evaluation, observability, guardrails, cost control, and reversibility.

Our approach

AI must solve a measurable business problem, not decorate a marketing deck.

Technologies & frameworks we master

Area: Tools and models
Hosted LLMs: Claude 4.5 Sonnet, Claude 4.5 Opus, GPT-5, GPT-5-mini, Mistral Large 2, Gemini 2 Pro
Self-hosted LLMs: Llama 4, Qwen 3, Mistral 7B/24B, DeepSeek V3, via vLLM, Ollama, TGI
Orchestration: Symfony + HTTP client, LangChain (Python for POCs), LlamaIndex, CrewAI
Embeddings: OpenAI text-embedding-3-large, Cohere Embed v4, BAAI bge-large, jina-embeddings v3
Vector DBs: pgvector 0.8 (our default for 80% of cases), Qdrant, Weaviate, Pinecone
Reranking: Cohere Rerank v3.5, BAAI bge-reranker-large, local cross-encoders
LLM observability: Langfuse, Helicone, LangSmith, Arize Phoenix
Evaluation: Ragas, DeepEval, OpenAI Evals, custom eval on client dataset
Guardrails: Lakera Guard, Rebuff, NeMo Guardrails, regex + PII detection (presidio)
Fine-tuning: Axolotl, Unsloth, LoRA, QLoRA on A100/H100 GPUs rented by the hour
Agents: Claude Agent SDK, OpenAI Assistants, AutoGen, Semantic Kernel

Related services

Our AI engagements plug into these catalogue services.

Typical use cases

Semantic search over a 50,000-document knowledge base. Ingestion, smart chunking, OpenAI text-embedding-3-large embeddings, pgvector HNSW indexing, Cohere reranking, Claude generation. Median response time under 1.2s, cost 12€ per month for indexing, 0.30€ per query.

Sales qualification agent. Claude drives a multi-step workflow via the SDK: enrichment via external APIs, BANT scoring, drafting of a summary note, opportunity creation in the CRM. 300 leads processed per day, saving 2 FTEs on pre-sales.

Automating regulatory report writing. Structured extraction from PDFs, section composition via templates, cross-review by a second LLM, final human validation. Report production time cut by a factor of 4, human error rate reduced by 60%.

Automatic classification and routing of support tickets. Embeddings over historical data, classification into 18 categories, automatic assignment, escalation on detected negative sentiment. Correct-classification rate 94%, operational gain of 1.5 support FTE.
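As an illustration of embedding-based classification, here is a minimal sketch using a nearest-centroid rule with cosine similarity. This is one possible approach, not necessarily the one used in the project above. The function names, the two toy categories, and the 2-dimensional vectors are all hypothetical. Real embeddings would come from an embedding model and have hundreds or thousands of dimensions.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_centroids(
    labelled: Dict[str, List[Sequence[float]]]
) -> Dict[str, List[float]]:
    """Average the historical ticket embeddings of each category."""
    centroids: Dict[str, List[float]] = {}
    for label, vectors in labelled.items():
        dims = len(vectors[0])
        centroids[label] = [
            sum(v[d] for v in vectors) / len(vectors) for d in range(dims)
        ]
    return centroids


def classify(vector: Sequence[float], centroids: Dict[str, List[float]]) -> str:
    """Return the category whose centroid is most similar to the ticket."""
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))


# Toy historical data (hypothetical): two categories, 2-D vectors.
history = {
    "billing": [[1.0, 0.0], [0.9, 0.1]],
    "bug": [[0.0, 1.0], [0.1, 0.9]],
}
centroids = build_centroids(history)
print(classify([0.8, 0.2], centroids))
```

In practice the classification rate should be measured on a held-out set of labelled tickets before any automatic assignment is switched on.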

AI-specific FAQ

Should I use LangChain with Symfony or code the pipeline directly? For a simple RAG pipeline or agents with a predictable workflow, we implement directly in PHP with the Symfony HTTP client. The code is easier to maintain, typing is strict, and PHPUnit tests are reliable. LangChain remains useful in Python for fast prototypes or very complex chains. For critical production systems, we prefer explicit code.

pgvector or a dedicated vector DB (Qdrant, Pinecone)? pgvector covers 80% of real needs: up to 5 million 1536-dimension vectors, with response times under 50 ms thanks to the HNSW index. The big advantage: ACID transactions alongside the rest of the business data, unified backups, standard SQL. We switch to Qdrant beyond 50 million vectors, or when the filtering applied before the vector search becomes complex.
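To put these volumes in perspective, here is a rough sizing sketch. It relies on the pgvector documentation's figure of 4 bytes per dimension plus 8 bytes per stored vector. It covers only the raw vector column, not the HNSW index or the other columns, which add to the total. The function names are illustrative.

```python
def raw_vector_bytes(n_vectors: int, dims: int) -> int:
    """Raw storage of a pgvector `vector` column: 4 bytes per dimension + 8 per vector."""
    return n_vectors * (4 * dims + 8)


def to_gb(n_bytes: int) -> float:
    """Convert bytes to decimal gigabytes."""
    return n_bytes / 1e9


for n in (5_000_000, 50_000_000):
    size = to_gb(raw_vector_bytes(n, 1536))
    print(f"{n:,} vectors x 1536 dims: about {size:.1f} GB of raw vectors")
```

At 5 million vectors the raw data is about 31 GB, which a well-provisioned PostgreSQL server can handle. At 50 million it is about ten times that, and keeping the index in memory becomes the main constraint.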

How do I keep LLM API costs under control? Four combined levers. One, pick the smallest model that passes the eval (GPT-5-mini or Claude Haiku often win). Two, cache identical responses aggressively. Three, use native prompt caching (Anthropic or OpenAI) for long repetitive contexts, for savings of 50 to 90% on input tokens. Four, send non-urgent requests through batch APIs, for a 50% price reduction.
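The second lever, caching identical responses, can be sketched as follows. This is a minimal in-memory version under stated assumptions: `ResponseCache` and the injected `call_model` function are hypothetical names, not part of any provider SDK. A production setup would typically use a shared store with an expiry policy instead of a Python dictionary.

```python
import hashlib
import json
from typing import Callable, Dict


class ResponseCache:
    """Exact-match cache: the same (model, prompt) pair is only paid for once."""

    def __init__(self, call_model: Callable[[str, str], str]) -> None:
        # call_model is a hypothetical function wrapping the real API call.
        self._call_model = call_model
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        # Hash a canonical JSON payload so the key is stable and fixed-size.
        payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def complete(self, model: str, prompt: str) -> str:
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        answer = self._call_model(model, prompt)
        self._store[key] = answer
        return answer
```

Exact-match caching only pays off when requests are deterministic (temperature 0) and genuinely repeated. The hit rate is worth tracking in the observability tooling before investing further.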

How do I avoid hallucinations in a RAG? Three golden rules. One, the retrieved context must be sufficient and relevant: reranking is mandatory, along with result diversification and smart chunking. Two, the prompt must explicitly forbid answering without a source: "answer only from the provided context, otherwise say you do not know". Three, display sources to the user: requiring citations changes the model's behavior and builds user trust.
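Rules two and three can be sketched as a prompt builder. This is an illustrative example only: the function name, the refusal sentence, and the passage format are assumptions, and the exact wording should be tuned against your own evaluation set.

```python
from typing import List, Tuple

REFUSAL = "I do not know."


def build_grounded_prompt(question: str, passages: List[Tuple[str, str]]) -> str:
    """Build a prompt that restricts the model to the retrieved passages.

    `passages` is a list of (source_id, text) pairs, assumed to be already
    retrieved and reranked upstream.
    """
    if not passages:
        # With no context there is nothing to ground the answer on:
        # refuse before spending tokens on a model call.
        raise ValueError("no context retrieved")
    context = "\n".join(
        f"[{i}] ({source}) {text}"
        for i, (source, text) in enumerate(passages, start=1)
    )
    return (
        "Answer only from the provided context. "
        f"If the context is insufficient, reply exactly: {REFUSAL}\n"
        "Cite the passage numbers you used, for example [1].\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


prompt = build_grounded_prompt(
    "What is the notice period?",
    [("handbook.pdf", "The notice period is 30 days.")],
)
print(prompt)
```

Because each passage carries a number and a source identifier, the application can map the cited numbers in the answer back to documents and show them to the user.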

Can I self-host an LLM to avoid sending data to a third party? Yes. With Llama 4 70B or Mistral Large 2, quality is close to hosted models for classic classification or RAG tasks. The cost shifts to GPUs: an A100 80 GB rents for 2€ an hour, roughly 1500€ per month running 24/7. Self-hosting becomes profitable beyond 10 million tokens per day; below that, the hosted API remains cheaper.
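The break-even reasoning can be made explicit with a few lines of arithmetic. This sketch makes simplifying assumptions: a 30-day month, a single GPU able to absorb the whole load, and no operations or engineering cost. The function names are illustrative.

```python
def monthly_gpu_cost(eur_per_hour: float, hours_per_day: float = 24.0, days: int = 30) -> float:
    """Rental cost in EUR of one GPU running for `days` days."""
    return eur_per_hour * hours_per_day * days


def break_even_price_per_million(gpu_cost_month: float, tokens_per_day: float, days: int = 30) -> float:
    """Blended API price (EUR per million tokens) at which both options cost the same."""
    millions_of_tokens = tokens_per_day * days / 1_000_000
    return gpu_cost_month / millions_of_tokens


gpu = monthly_gpu_cost(2.0)  # 2 EUR/hour, 24/7
price = break_even_price_per_million(gpu, 10_000_000)  # 10 million tokens/day
print(f"GPU: {gpu:.0f} EUR/month, break-even API price: {price:.2f} EUR per million tokens")
```

Under these assumptions the GPU costs 1440€ per month, and a threshold of 10 million tokens per day corresponds to a blended API price of about 4.80€ per million tokens. If your actual API price is lower, the break-even volume moves up accordingly.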

Further reading

Our technical articles dive into AI in production.

Get in touch

An AI use case to assess, a POC to productionize, costs to control? Write to contact@your-digital-hub.com or use our contact page. The first 60-minute AI framing workshop comes with no strings attached.