Elasticsearch with PHP and Symfony: search, aggregations, relevance tuning
When PostgreSQL full-text is no longer enough
PostgreSQL offers tsvector and tsquery, and in 80% of search cases that is enough. On a product table, full-text search with the French dictionary, combined with pg_trgm for fuzzy matching, covers 90% of the need. The real question is not whether to replace Postgres with Elasticsearch, but when search becomes a subsystem of its own.
Signals that push us out of Postgres:
- Fine-grained relevance tuning: per-field boosting, score functions, rescorers. Hard to maintain in SQL.
- Real-time facets and aggregations: dynamic filtering on 20 dimensions (price, brand, category, tags). Very slow in SQL without a dedicated index.
- Multilingual: language-specific analyzers (stemming, synonyms). PostgreSQL handles a few languages, not all.
- Typo tolerance: fuzzy matching with edit distance. Possible in Postgres via pg_trgm, but less controllable.
- Volume: more than 10 million documents with p95 latency under 100 ms.
- Reads decoupled from writes: isolating the search workload from business transactions.
Once three or more boxes are checked, a dedicated search engine pays off.
Elasticsearch, OpenSearch, Meilisearch, Typesense
Four options dominate in 2026. Our comparison.
| Criterion | Elasticsearch 8/9 | OpenSearch | Meilisearch | Typesense |
|---|---|---|---|---|
| License | ELv2 / SSPL (source-available), AGPLv3 option since 2024 | Apache 2.0 | MIT | GPL v3 |
| Fork of | - | Elasticsearch 7.10 | - | - |
| Ecosystem | Most mature | Mature, AWS-compatible | Young but fast | Similar to Meilisearch |
| Native vector search | Yes (8.x+) | Yes | Yes (2024+) | Yes |
| Operational complexity | High | High | Low | Low |
| Rich aggregations | Yes, best on the market | Yes | Limited | Limited |
| Fine relevance tuning | Yes | Yes | Limited | Limited |
| Cloud cost (1 node, 8 GB) | ~100 EUR/month | ~90 EUR/month (AWS) | ~30 EUR/month | ~30 EUR/month |
| Our recommendation | Full engine, aggregations, classic ES | AWS constraint or license | Simple product catalog | Meilisearch alternative |
Our rule: Elasticsearch by default for e-commerce, B2B and analytics-heavy projects. Meilisearch for simple cases (content sites, modest catalogs) where you want to ship in a day. OpenSearch if the infrastructure already lives on AWS and you want an Apache 2.0 license rather than Elastic's licensing.
Local Elasticsearch setup
The Docker Compose file we use to bootstrap a local environment.
```yaml
# docker-compose.yml
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.14.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=true
      # Plain HTTP on the REST layer: acceptable for local development only
      - xpack.security.http.ssl.enabled=false
      - ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
      - ES_JAVA_OPTS=-Xms2g -Xmx2g
    volumes:
      - es_data:/usr/share/elasticsearch/data
    ports:
      - "9200:9200"
    healthcheck:
      test: ["CMD", "curl", "-fsSL", "-u", "elastic:${ELASTIC_PASSWORD}", "http://localhost:9200/_cluster/health"]
      interval: 10s
      timeout: 5s
      retries: 10
  kibana:
    image: docker.elastic.co/kibana/kibana:8.14.0
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
      # The kibana_system password must be set beforehand,
      # e.g. with POST /_security/user/kibana_system/_password
      - ELASTICSEARCH_USERNAME=kibana_system
      - ELASTICSEARCH_PASSWORD=${KIBANA_PASSWORD}
    ports:
      - "5601:5601"
    depends_on:
      elasticsearch:
        condition: service_healthy
volumes:
  es_data:
```
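Baseline sizing for production: start with 1 node with 8 GB of RAM (a single node cannot host replicas, so it has no redundancy), and scale to 3 nodes beyond 10 million documents or 500 requests/second. Give the JVM heap at most 50% of the node's RAM, leaving the rest to the filesystem cache Lucene relies on, and keep it below the compressed-oops threshold (about 26 GB is safe on most systems).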
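The heap rule is simple arithmetic: at most half of the node's RAM, and never above the compressed-oops threshold. Elasticsearch 8 sizes the heap automatically when -Xms/-Xmx are not set; if you pin it through ES_JAVA_OPTS, a hypothetical helper like this one computes a value respecting both limits. The 26 GB cap is a conservative assumption based on Elastic's guidance on compressed object pointers.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: suggested JVM heap (in MB) for an Elasticsearch node.
 * - at most 50% of RAM, the rest goes to the filesystem cache;
 * - at most ~26 GB, assumed safely below the compressed-oops threshold.
 */
function suggestedHeapMb(int $ramMb): int
{
    $capMb = 26 * 1024; // conservative compressed-oops ceiling (assumption)
    return min(intdiv($ramMb, 2), $capMb);
}

/** Renders an ES_JAVA_OPTS value with Xms equal to Xmx. */
function javaOpts(int $ramMb): string
{
    $heap = suggestedHeapMb($ramMb);
    return sprintf('-Xms%dm -Xmx%dm', $heap, $heap);
}

echo javaOpts(8 * 1024), PHP_EOL;   // -Xms4096m -Xmx4096m
echo javaOpts(128 * 1024), PHP_EOL; // -Xms26624m -Xmx26624m
```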
PHP clients: official vs Elastica
Two clients dominate the PHP ecosystem.
elasticsearch/elasticsearch (official)
Maintained by Elastic and mapped almost 1:1 onto the REST API. Explicit configuration, strong typing, manual pagination. Our default on new projects.
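```php
<?php
require __DIR__ . '/vendor/autoload.php';

use Elastic\Elasticsearch\ClientBuilder;

$client = ClientBuilder::create()
    ->setHosts(['http://elasticsearch:9200'])
    ->setBasicAuthentication('elastic', (string) getenv('ELASTIC_PASSWORD'))
    ->build();

$response = $client->search([
    'index' => 'products',
    'body' => [
        'query' => [
            'multi_match' => [
                'query' => 'bluetooth headphones',
                'fields' => ['title^3', 'description', 'brand^2'], // ^N = per-field boost
            ],
        ],
        'size' => 20,
    ],
]);

// The response object is array-accessible
foreach ($response['hits']['hits'] as $hit) {
    echo $hit['_score'], ' ', $hit['_source']['title'], PHP_EOL;
}
```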
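"Manual pagination" means the client leaves the paging strategy to you. For deep result lists, search_after is the usual alternative to from/size. A minimal sketch, assuming the products index and its unique id keyword field; pageBody() and nextCursor() are hypothetical helpers working on plain arrays, independent of the client.

```php
<?php
declare(strict_types=1);

/**
 * Builds the body for one page of results using search_after.
 * Sorting on _score plus a unique keyword field gives a stable order to resume from.
 */
function pageBody(string $text, ?array $after = null, int $size = 20): array
{
    $body = [
        'size' => $size,
        'query' => [
            'multi_match' => [
                'query' => $text,
                'fields' => ['title^3', 'description', 'brand^2'],
            ],
        ],
        'sort' => [
            ['_score' => 'desc'],
            ['id' => 'asc'], // tie-breaker on a unique keyword field
        ],
    ];
    if ($after !== null) {
        // Sort values of the last hit of the previous page
        $body['search_after'] = $after;
    }
    return $body;
}

/** Returns the cursor for the next page, or null when this page was not full. */
function nextCursor(array $response, int $size = 20): ?array
{
    $hits = $response['hits']['hits'] ?? [];
    if (count($hits) < $size) {
        return null;
    }
    return end($hits)['sort'];
}
```

With the official client, pageBody(...) goes into the body of $client->search() and nextCursor($response->asArray()) feeds the next call. When results must stay consistent while the index changes, Elasticsearch recommends pairing search_after with a point in time (PIT).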
ruflin/elastica
Fluent PHP DSL to build queries, avoiding deeply nested arrays. More readable on complex queries, but adds a layer to maintain. Interesting on projects that build many dynamic queries.
Our recommendation
Official client for new integrations, unless the team is already fluent with Elastica. Avoid the legacy FOSElasticaBundle on modern Symfony: it fits poorly with 8.x APIs and its abstractions hide more than they help.
Explicit mapping, never dynamic
Mapping is the schema of your documents. Elasticsearch can infer it automatically, but this is a false shortcut: let one field be detected as text instead of keyword (or a number arrive as a string), and aggregations, sorts and exact filters on it stop behaving as expected.
```
PUT /products-v1
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1,
    "analysis": {
      "analyzer": {
        "english_product": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "english_stop", "english_stemmer", "asciifolding"]
        },
        "edge_ngram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding", "edge_ngram_filter"]
        },
        "autocomplete_search": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding"]
        }
      },
      "filter": {
        "english_stop": { "type": "stop", "stopwords": "_english_" },
        "english_stemmer": { "type": "stemmer", "language": "light_english" },
        "edge_ngram_filter": { "type": "edge_ngram", "min_gram": 2, "max_gram": 15 }
      }
    }
  },
  "aliases": {
    "products": {}
  },
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "id": { "type": "keyword" },
      "sku": { "type": "keyword" },
      "title": {
        "type": "text",
        "analyzer": "english_product",
        "fields": {
          "raw": { "type": "keyword" },
          "autocomplete": { "type": "text", "analyzer": "edge_ngram_analyzer", "search_analyzer": "autocomplete_search" }
        }
      },
      "description": { "type": "text", "analyzer": "english_product" },
      "brand": { "type": "keyword" },
      "price_cents": { "type": "integer" },
      "currency": { "type": "keyword" },
      "stock": { "type": "integer" },
      "tags": { "type": "keyword" },
      "category_path": { "type": "keyword" },
      "published_at": { "type": "date" },
      "rating_avg": { "type": "float" },
      "rating_count": { "type": "integer" }
    }
  }
}
```

"dynamic": "strict" rejects any document carrying an unmapped field, and the aliases block makes products point to products-v1 from day one, so the application never writes to a physical index name. The autocomplete sub-field is searched with a non-stemming analyzer: stemming the query (e.g. batteries becoming battery) could produce a term that is not a prefix of the indexed edge n-grams.
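The multi-fields pattern is critical: title is indexed in three variants from one source value: title (analyzed, for full-text search), title.raw (keyword, for exact matching, sorting and aggregations) and title.autocomplete (edge n-grams, for prefix autocomplete). No application-side duplication.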
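In practice, each variant is addressed by its dotted name. A minimal sketch, assuming the mapping above; autocompleteBody() is a hypothetical helper that queries title.autocomplete for prefixes and uses title.raw as a deterministic tie-breaker when sorting.

```php
<?php
declare(strict_types=1);

/**
 * Builds an autocomplete request on the "title" multi-fields:
 * title.autocomplete for prefix matching, title.raw for exact sorting.
 */
function autocompleteBody(string $prefix, int $size = 8): array
{
    return [
        'size' => $size,
        '_source' => ['id', 'title'], // only what the dropdown displays
        'query' => [
            'match' => [
                'title.autocomplete' => [
                    'query' => $prefix,
                    'operator' => 'and', // every typed word must match a prefix
                ],
            ],
        ],
        'sort' => [
            ['_score' => 'desc'],
            ['title.raw' => 'asc'], // keyword sub-field: sortable, exact value
        ],
    ];
}
```

With the official client, the result is passed as the body of $client->search(['index' => 'products', 'body' => autocompleteBody('blue')]).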
Indexing via Symfony Messenger
Elasticsearch writes must be asynchronous. Indexing synchronously inside the business transaction is an anti-pattern: if Elasticsearch fails, the transaction fails too. Use Messenger instead: the application dispatches a message, and a worker consumes it and performs the indexing.
```php
<?php
// src/Search/Message/IndexProductMessage.php
namespace App\Search\Message;

final readonly class IndexProductMessage
{
    public function __construct(
        public int $productId,
        public string $action = 'upsert', // or 'delete'
    ) {}
}
```
```php
<?php
// src/Search/MessageHandler/IndexProductHandler.php
namespace App\Search\MessageHandler;

use App\Entity\Product;
use App\Search\Message\IndexProductMessage;
use App\Search\ProductIndexer;
use Doctrine\ORM\EntityManagerInterface;
use Symfony\Component\Messenger\Attribute\AsMessageHandler;

#[AsMessageHandler]
final readonly class IndexProductHandler
{
    public function __construct(
        private EntityManagerInterface $em,
        private ProductIndexer $indexer,
    ) {}

    public function __invoke(IndexProductMessage $message): void
    {
        if ($message->action === 'delete') {
            $this->indexer->delete($message->productId);
            return;
        }

        // The message only carries the ID: reload to index the latest committed state
        $product = $this->em->find(Product::class, $message->productId);
        if ($product === null) {
            // Deleted since dispatch: remove it from the index as well
            $this->indexer->delete($message->productId);
            return;
        }

        // If Elasticsearch is down, the exception lets Messenger retry per the transport's retry strategy
        $this->indexer->upsert($product);
    }
}
```
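The handler covers the consuming side. On the dispatching side, the message should only leave once the business transaction has committed, otherwise a worker may reload the product before its new state is visible. A minimal sketch of one way to do this, with a hypothetical ProductIndexQueue that collects changed IDs, collapses duplicates and dispatches after commit. In a Symfony app, record() would be called from Doctrine lifecycle listeners and flushAfterCommit() from postFlush (which runs after the ORM's own commit; an outer explicit transaction needs the call after that commit instead), with the closure wrapping MessageBusInterface::dispatch(). That wiring is an assumption and is not shown.

```php
<?php
declare(strict_types=1);

// Same fields as the article's IndexProductMessage
final class IndexProductMessage
{
    public function __construct(
        public readonly int $productId,
        public readonly string $action = 'upsert', // or 'delete'
    ) {}
}

/** Hypothetical collector: one message per product, dispatched after commit. */
final class ProductIndexQueue
{
    /** @var array<int, string> productId => last action recorded */
    private array $pending = [];

    /** @param \Closure(IndexProductMessage): void $dispatch */
    public function __construct(private readonly \Closure $dispatch) {}

    public function record(int $productId, string $action = 'upsert'): void
    {
        $this->pending[$productId] = $action; // duplicates collapse, last action wins
    }

    /** Call once the transaction is committed; returns the number of messages sent. */
    public function flushAfterCommit(): int
    {
        $pending = $this->pending;
        $this->pending = [];
        foreach ($pending as $id => $action) {
            ($this->dispatch)(new IndexProductMessage($id, $action));
        }
        return count($pending);
    }

    /** Call on rollback: nothing was persisted, so nothing to index. */
    public function discard(): void
    {
        $this->pending = [];
    }
}
```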
Bulk API with batching
On a full reindex, never send one request per document: it is 10 to 20 times slower. Use the _bulk API with batches of 500 to 2,000 documents.
```php
<?php
namespace App\Search;

use App\Entity\Product;
use Doctrine\ORM\EntityManagerInterface;
use Elastic\Elasticsearch\Client;

final class ProductBulkIndexer
{
    private const BATCH_SIZE = 1000;
    private const INDEX = 'products';

    public function __construct(
        private readonly Client $client,
        private readonly EntityManagerInterface $em,
    ) {}

    /** @param null|callable(int): void $onProgress receives the number of documents indexed so far */
    public function reindexAll(?callable $onProgress = null): int
    {
        $total = 0;
        $batch = [];
        $iterator = $this->em->getRepository(Product::class)
            ->createQueryBuilder('p')
            ->getQuery()
            ->toIterable();

        foreach ($iterator as $product) {
            // Two entries per document: the action line, then the source
            $batch[] = ['index' => ['_index' => self::INDEX, '_id' => (string) $product->getId()]];
            $batch[] = $this->toDocument($product);

            if (count($batch) >= self::BATCH_SIZE * 2) {
                $total += $this->flush($batch);
                $batch = [];
                $this->em->clear(); // detach processed entities to keep memory flat
                if ($onProgress !== null) {
                    $onProgress($total);
                }
            }
        }

        if ($batch !== []) {
            $total += $this->flush($batch); // last, partial batch
            if ($onProgress !== null) {
                $onProgress($total);
            }
        }

        return $total;
    }

    /** Sends one _bulk request and returns the number of documents it contained. */
    private function flush(array $body): int
    {
        $response = $this->client->bulk(['body' => $body]);
        $payload = $response->asArray();
        if ($payload['errors'] ?? false) {
            // _bulk answers 200 even when some items fail: inspect each item
            $errors = [];
            foreach ($payload['items'] as $item) {
                $op = array_values($item)[0];
                if (($op['error'] ?? null) !== null) {
                    $errors[] = $op['error'];
                }
            }
            throw new \RuntimeException('Bulk indexing errors: ' . json_encode($errors, JSON_THROW_ON_ERROR));
        }

        return intdiv(count($body), 2);
    }

    private function toDocument(Product $p): array
    {
        return [
            'id' => (string) $p->getId(),
            'sku' => $p->getSku(),
            'title' => $p->getTitle(),
            'description' => $p->getDescription(),
            'brand' => $p->getBrand(),
            'price_cents' => $p->getPriceCents(),
            'currency' => $p->getCurrency(),
            'stock' => $p->getStock(),
            'tags' => $p->getTags(),
            'category_path' => $p->getCategoryPath(),
            'published_at' => $p->getPublishedAt()?->format(\DATE_ATOM),
            'rating_avg' => $p->getRatingAvg(),
            'rating_count' => $p->getRatingCount(),
        ];
    }
}
```
The $em->clear() after each batch is critical: without it, Doctrine keeps all entities in memory and can consume several gigabytes over a million documents.
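The array form used by ProductBulkIndexer is serialized by the official client into NDJSON. A minimal sketch that makes the wire format and the batching explicit: one action line plus one source line per document, and a mandatory trailing newline. bulkPayloads() is a hypothetical helper; its output could be posted to /_bulk with Content-Type: application/x-ndjson.

```php
<?php
declare(strict_types=1);

/**
 * Yields one NDJSON _bulk payload per batch of documents.
 *
 * @param iterable<array{id: string}> $documents
 * @return \Generator<int, string>
 */
function bulkPayloads(iterable $documents, string $index, int $batchSize = 1000): \Generator
{
    $lines = [];
    $count = 0;
    foreach ($documents as $doc) {
        // Action line, then the document source on its own line
        $lines[] = json_encode(['index' => ['_index' => $index, '_id' => $doc['id']]], JSON_THROW_ON_ERROR);
        $lines[] = json_encode($doc, JSON_THROW_ON_ERROR);
        if (++$count === $batchSize) {
            yield implode("\n", $lines) . "\n"; // _bulk requires a trailing newline
            $lines = [];
            $count = 0;
        }
    }
    if ($lines !== []) {
        yield implode("\n", $lines) . "\n"; // last, partial batch
    }
}
```

On a full reindex into a fresh index, temporarily setting index.refresh_interval to -1 and restoring it afterwards is a common way to speed up the load.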
Queries: bool, multi_match, function_score
Multi-field search with boost
```
GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "bluetooth headphones",
            "fields": ["title^3", "description", "brand^2"],
            "type": "best_fields",
            "operator": "and",
            "fuzziness": "AUTO"
          }
        }
      ],
      "filter": [
        { "term": { "brand": "Sony" } },
        { "range": { "price_cents": { "gte": 5000, "lte": 30000 } } }
      ],
      "should": [
        { "range": { "rating_avg": { "gte": 4.0, "boost": 2.0 } } }
      ]
    }
  },
  "size": 20
}
```
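must: the clause is required and contributes to the score. filter: the clause is required but not scored, so it can be cached and runs faster. should: optional here (the query already has a must clause); matching documents get a score boost. This separation is the heart of the Elasticsearch query DSL.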
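In application code, this query is usually assembled from a search form where every filter is optional. A minimal sketch; productQuery() is a hypothetical builder producing the same must/filter/should split as the request above, with an empty search box falling back to match_all.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical builder: free text in "must" (scored), facet selections in
 * "filter" (not scored, cacheable), rating preference in "should" (boost).
 */
function productQuery(string $text, ?string $brand = null, ?int $minCents = null, ?int $maxCents = null, int $size = 20): array
{
    // Empty search box: browse everything (stdClass encodes as {} in JSON)
    $must = $text === ''
        ? [['match_all' => new \stdClass()]]
        : [[
            'multi_match' => [
                'query' => $text,
                'fields' => ['title^3', 'description', 'brand^2'],
                'type' => 'best_fields',
                'operator' => 'and',
                'fuzziness' => 'AUTO',
            ],
        ]];

    $filter = [];
    if ($brand !== null) {
        $filter[] = ['term' => ['brand' => $brand]];
    }
    // Keep only the price bounds the user actually set
    $range = array_filter(['gte' => $minCents, 'lte' => $maxCents], fn (?int $v): bool => $v !== null);
    if ($range !== []) {
        $filter[] = ['range' => ['price_cents' => $range]];
    }

    $bool = [
        'must' => $must,
        'should' => [['range' => ['rating_avg' => ['gte' => 4.0, 'boost' => 2.0]]]],
    ];
    if ($filter !== []) {
        $bool['filter'] = $filter;
    }

    return ['query' => ['bool' => $bool], 'size' => $size];
}
```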
Aggregations for facets
```
GET /products/_search
{
  "query": { "match_all": {} },
  "size": 0,
  "aggs": {
    "by_brand": {
      "terms": { "field": "brand", "size": 20 }
    },
    "price_histogram": {
      "histogram": { "field": "price_cents", "interval": 5000 }
    },
    "over_time": {
      "date_histogram": { "field": "published_at", "calendar_interval": "month" }
    }
  }
}
```
Aggregations power faceted filter UIs: "Sony (42)", "Bose (18)", a price histogram. PostgreSQL can compute the same counts, but rarely under 20 ms on 10 million rows.
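Turning the aggregation response into those labels is a small mapping step. A minimal sketch, assuming the by_brand and price_histogram aggregations above and EUR prices; brandFacets() and priceFacets() are hypothetical helpers working on the decoded response (e.g. $response->asArray() with the official client).

```php
<?php
declare(strict_types=1);

/** @return list<array{value: string, count: int, label: string}> */
function brandFacets(array $response): array
{
    $facets = [];
    foreach ($response['aggregations']['by_brand']['buckets'] ?? [] as $bucket) {
        $facets[] = [
            'value' => (string) $bucket['key'],
            'count' => (int) $bucket['doc_count'],
            'label' => sprintf('%s (%d)', $bucket['key'], $bucket['doc_count']),
        ];
    }
    return $facets;
}

/** Histogram buckets (keys in cents) as "50-100 EUR (12)" labels; EUR is an assumption. */
function priceFacets(array $response, int $intervalCents = 5000): array
{
    $labels = [];
    foreach ($response['aggregations']['price_histogram']['buckets'] ?? [] as $bucket) {
        if ($bucket['doc_count'] === 0) {
            continue; // histogram returns empty buckets between min and max: hide them
        }
        $from = (int) ($bucket['key'] / 100);
        $to = (int) (($bucket['key'] + $intervalCents) / 100);
        $labels[] = sprintf('%d-%d EUR (%d)', $from, $to, $bucket['doc_count']);
    }
    return $labels;
}
```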
Function score for relevance tuning
Custom boost combining text relevance and business signals.
```
GET /products/_search
{
  "query": {
    "function_score": {
      "query": { "multi_match": { "query": "headphones", "fields": ["title", "description"] } },
      "functions": [
        { "filter": { "range": { "stock": { "gt": 0 } } }, "weight": 2 },
        { "field_value_factor": { "field": "rating_avg", "factor": 1.2, "missing": 1 } },
        { "gauss": { "published_at": { "origin": "now", "scale": "30d", "decay": 0.5 } } }
      ],
      "score_mode": "multiply",
      "boost_mode": "multiply"
    }
  }
}
```
This template promotes in-stock, well-rated, recently published products. Three business levers you cannot model as simply in SQL.
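Working out the gauss clause shows how quickly freshness fades. Elasticsearch's documented gauss decay, with d the distance between published_at and origin, gives with the default offset of 0 and the parameters above (scale 30 days, decay 0.5):

```latex
S(d) = \exp\!\left(-\frac{\max(0,\; d - \mathrm{offset})^2}{2\sigma^2}\right),
\qquad
\sigma^2 = -\frac{\mathrm{scale}^2}{2\ln(\mathrm{decay})}

% offset = 0:
S(d) = \mathrm{decay}^{(d/\mathrm{scale})^2}

% scale = 30 days, decay = 0.5:
S(0) = 1, \qquad S(30\,\mathrm{d}) = 0.5, \qquad S(60\,\mathrm{d}) = 0.5^{4} = 0.0625

% score_mode = multiply, boost_mode = multiply:
\mathrm{score} = \mathrm{\_score} \times 2^{[\text{in stock}]} \times (1.2 \times \mathrm{rating\_avg}) \times S(d)
```

For example, a product with a text score of 5, in stock, rated 4.5 and published 30 days ago scores 5 × 2 × 5.4 × 0.5 = 27. One caveat of the multiply modes: a rating_avg of 0 (as opposed to missing) zeroes the whole score; the ln2p modifier of field_value_factor is one way to avoid that.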
Zero-downtime reindex with aliases
Aliases are the key to Elasticsearch maintenance without downtime. Rather than creating a physical index named products, create products-v1 and expose products as an alias pointing to it: the application only ever reads and writes through the alias.
Reindex procedure:
- Create products-v2 with the new mapping.
- Reindex all content (via the _reindex API or from the source DB).
- Atomically switch the alias: products now points to products-v2.
- Delete products-v1 after verification.
```
POST /_aliases
{
  "actions": [
    { "remove": { "index": "products-v1", "alias": "products" } },
    { "add": { "index": "products-v2", "alias": "products" } }
  ]
}
```
The actions are applied atomically: no client ever sees an intermediate state, and the switch is effectively instantaneous.
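Scripting the switch avoids hard-coding version numbers. A minimal sketch; aliasSwitchBody() and nextVersionedIndex() are hypothetical helpers, and the list of current indices would come from GET /_alias/products (the keys of its response). The resulting body is what POST /_aliases expects.

```php
<?php
declare(strict_types=1);

/**
 * Body for POST /_aliases: detaches $alias from every index it currently
 * points to and attaches it to $newIndex, in one atomic request.
 */
function aliasSwitchBody(string $alias, array $currentIndices, string $newIndex): array
{
    $actions = [];
    foreach ($currentIndices as $index) {
        if ($index !== $newIndex) {
            $actions[] = ['remove' => ['index' => $index, 'alias' => $alias]];
        }
    }
    $actions[] = ['add' => ['index' => $newIndex, 'alias' => $alias]];
    return ['actions' => $actions];
}

/** Derives the next versioned index name: products-v1 -> products-v2. */
function nextVersionedIndex(string $index): string
{
    if (preg_match('/^(.*)-v(\d+)$/', $index, $m) !== 1) {
        throw new \InvalidArgumentException("Not a versioned index name: $index");
    }
    return sprintf('%s-v%d', $m[1], (int) $m[2] + 1);
}
```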
Snapshots and restore
Elasticsearch ships with a snapshot system to S3 or shared storage. Incremental backup, fast restore, versioned.
```
PUT /_snapshot/s3_repo
{
  "type": "s3",
  "settings": {
    "bucket": "ydh-es-backups",
    "region": "eu-west-1",
    "base_path": "prod"
  }
}

PUT /_snapshot/s3_repo/snapshot_2026_06_15?wait_for_completion=false
{
  "indices": "products,orders",
  "include_global_state": false
}
```
We schedule a daily snapshot via Curator, 30-day retention, quarterly restore test on staging. An untested backup is a backup that does not exist.
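Elasticsearch also ships a built-in scheduler for this routine, Snapshot Lifecycle Management (SLM), an alternative to an external tool such as Curator. A minimal sketch, assuming the s3_repo repository above; dailySnapshotPolicy() is a hypothetical helper building the body for PUT /_slm/policy/<policy_id>. The 01:30 UTC schedule, min_count and max_count are assumptions.

```php
<?php
declare(strict_types=1);

/** Body for PUT /_slm/policy/{policy_id}: daily snapshot with time-based retention. */
function dailySnapshotPolicy(string $repository, array $indices, int $retentionDays = 30): array
{
    return [
        'schedule' => '0 30 1 * * ?',          // Elasticsearch cron, seconds first: daily at 01:30 UTC
        'name' => '<daily-snap-{now/d}>',      // date math: one snapshot name per day
        'repository' => $repository,
        'config' => [
            'indices' => $indices,
            'include_global_state' => false,
        ],
        'retention' => [
            'expire_after' => sprintf('%dd', $retentionDays),
            'min_count' => 5,                  // always keep a few, even if expired (assumption)
            'max_count' => 50,                 // hard ceiling (assumption)
        ],
    ];
}
```

SLM handles scheduling and retention only; the quarterly restore test on staging stays a manual step.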
Anti-patterns to avoid
Mistakes we see in audits with strong impact.
- Elasticsearch as the source of truth. It has no ACID transactions. Always keep PostgreSQL (or the source) as master, Elasticsearch as derived index.
- Dynamic mapping in prod. An unexpected field creates a wrong mapping that only a full reindex can fix. Always set "dynamic": "strict" or "false".
- No lifecycle policy. Log indexes pile up until disks saturate. ILM (Index Lifecycle Management) automates rotation.
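- Poorly sized shards. A single shard per index is enough up to roughly 30 GB; beyond that, split so each shard stays in the tens of gigabytes. At the other extreme, 1000 small shards on a cluster is unmanageable: each shard costs heap and cluster-state overhead, and Elasticsearch caps a data node at 1000 shards by default (cluster.max_shards_per_node).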
- Direct browser connection. Never. Always go through a backend that filters tenants and builds queries.
- No application-side circuit breaker. An Elasticsearch outage cascades across every endpoint using it if there is no short timeout and fallback.
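A circuit breaker contains that cascade: after repeated failures, calls skip Elasticsearch and return a fallback (an empty result, or a degraded PostgreSQL search) until a cooldown elapses. A minimal sketch; CircuitBreaker is a hypothetical class, not a library API, and it complements (rather than replaces) a short timeout on the HTTP client used by the Elasticsearch client. Its state lives in the object only: under PHP-FPM each worker has its own memory, so a production version would keep it in APCu or Redis.

```php
<?php
declare(strict_types=1);

/** Hypothetical circuit breaker: opens after N consecutive failures, retries after a cooldown. */
final class CircuitBreaker
{
    private int $failures = 0;
    private ?int $openedAt = null;

    /** @param ?\Closure(): int $clock returns the current Unix time (injectable for tests) */
    public function __construct(
        private readonly int $threshold = 5,
        private readonly int $cooldownSeconds = 30,
        private readonly ?\Closure $clock = null,
    ) {}

    public function call(callable $operation, callable $fallback): mixed
    {
        $now = $this->clock !== null ? ($this->clock)() : time();
        if ($this->openedAt !== null && $now - $this->openedAt < $this->cooldownSeconds) {
            return $fallback(); // open: do not touch Elasticsearch at all
        }
        try {
            $result = $operation(); // closed, or half-open trial after the cooldown
            $this->failures = 0;
            $this->openedAt = null;
            return $result;
        } catch (\Throwable) {
            if (++$this->failures >= $this->threshold) {
                $this->openedAt = $now; // (re)open the circuit
            }
            return $fallback();
        }
    }
}
```

Usage would look like $breaker->call(fn () => $client->search($params)->asArray(), fn () => ['hits' => ['hits' => []]]).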
2026 costs
Estimates based on deployments we operate.
| Size | Volume | Cluster | Monthly Elastic Cloud cost |
|---|---|---|---|
| Small | < 1M docs, < 10 RPS | 1 node 2 GB | ~50 EUR |
| Medium | 1 to 10M docs, 50 RPS | 1 node 8 GB | ~100 EUR |
| Large | 10 to 100M docs, 500 RPS | 3 nodes 16 GB | ~800 EUR |
| Very large | > 100M docs, > 1000 RPS | Dedicated multi-region | from 2500 EUR |
Self-hosted on VPS: divide these costs by 3 to 5, but add the operational cost (1 to 2 days per month of a senior sysadmin's time for a production cluster).
Conclusion
Elasticsearch is a powerful but demanding tool. Deployed without discipline (dynamic mapping, no aliases, poorly sized shards), it quickly becomes a cost and a source of outages. Deployed with method (strict mapping, aliases, bulk indexing, snapshots, ILM), it transforms search for a PHP application.
For Elasticsearch scoping on your project, a search rework or an audit of an existing cluster, reach out at contact@your-digital-hub.com. See also our PHP expertise and our software architecture service.