
Elasticsearch with PHP and Symfony: search, aggregations, relevance tuning

When PostgreSQL full-text is no longer enough

PostgreSQL offers tsvector and tsquery, and in 80% of search cases that is sufficient. On a product table, a fuzzy search with the French dictionary and pg_trgm covers 90% of the need. The question is not replacing Postgres with Elasticsearch, but knowing when search becomes a subsystem of its own.

Signals that push us out of Postgres:

  • Fine-grained relevance tuning: per-field boosting, score functions, rescorers. Hard to maintain in SQL.
  • Real-time facets and aggregations: dynamic filtering on 20 dimensions (price, brand, category, tags). Very slow in SQL without a dedicated index.
  • Multilingual: language-specific analyzers (stemming, synonyms). PostgreSQL handles a few languages, not all.
  • Typo tolerance: fuzzy matching with edit distance. Possible in Postgres via pg_trgm but less controllable.
  • Volume: more than 10 million documents with p95 requests under 100 ms.
  • Reads decoupled from writes: isolating the search workload from business transactions.

Once three or more boxes are checked, a dedicated search engine pays off.

Elasticsearch, OpenSearch, Meilisearch, Typesense

Four options dominate in 2026. Here is how we compare them.

| Criterion | Elasticsearch 8/9 | OpenSearch | Meilisearch | Typesense |
|---|---|---|---|---|
| License | Elastic License v2, SSPL or AGPL v3 (AGPL option added in 2024) | Apache 2.0 | MIT | GPL v3 |
| Fork of | - | Elasticsearch 7.10 | - | - |
| Ecosystem | Most mature | Mature, AWS-compatible | Young but fast | Similar to Meilisearch |
| Native vector search | Yes (8.x+) | Yes | Yes (2024+) | Yes |
| Operational complexity | High | High | Low | Low |
| Rich aggregations | Yes, best on the market | Yes | Limited | Limited |
| Fine relevance tuning | Yes | Yes | Limited | Limited |
| Cloud cost per month (about 1 node, 8 GB) | ~100 EUR | ~90 EUR (AWS) | ~30 EUR | ~30 EUR |
| Our recommendation | Full engine, aggregations, classic ES | AWS constraint or license | Simple product catalog | Meilisearch alternative |

Our rule: Elasticsearch by default for e-commerce, B2B and analytics-rich projects. Meilisearch for simple cases (content sites, modest catalog) where you want to ship in a day. OpenSearch if the infrastructure already lives on AWS and you want to avoid the Elastic license.

Local Elasticsearch setup

The Docker Compose file we use to bootstrap a local stack:

# docker-compose.yml
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.14.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=true
      - ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
      - ES_JAVA_OPTS=-Xms2g -Xmx2g
    volumes:
      - es_data:/usr/share/elasticsearch/data
    ports:
      - "9200:9200"
    healthcheck:
      test: ["CMD", "curl", "-fsSL", "-u", "elastic:${ELASTIC_PASSWORD}", "http://localhost:9200/_cluster/health"]
      interval: 10s
      timeout: 5s
      retries: 10

  kibana:
    image: docker.elastic.co/kibana/kibana:8.14.0
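    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
      # kibana_system has no password out of the box: set it once through
      # POST /_security/user/kibana_system/_password, or Kibana cannot log in.
      - ELASTICSEARCH_USERNAME=kibana_system
      - ELASTICSEARCH_PASSWORD=${KIBANA_PASSWORD}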
    ports:
      - "5601:5601"
    depends_on:
      elasticsearch:
        condition: service_healthy

volumes:
  es_data:

Baseline sizing for production: start with 1 node with 8 GB of RAM, and scale to 3 nodes beyond 10 million documents or 500 requests/second. Set the JVM heap to at most 50% of the node's RAM, so that the rest stays available to the filesystem cache.

PHP clients: official vs Elastica

Two clients dominate the PHP ecosystem.

elasticsearch/elasticsearch (official)

Maintained by Elastic, it maps almost 1:1 onto the REST API. Explicit configuration, strong typing, manual pagination. Our default on new projects.

use Elastic\Elasticsearch\ClientBuilder;

$client = ClientBuilder::create()
    ->setHosts(['http://elasticsearch:9200'])
    ->setBasicAuthentication('elastic', getenv('ELASTIC_PASSWORD'))
    ->build();

$response = $client->search([
    'index' => 'products',
    'body' => [
        'query' => [
            'multi_match' => [
                'query' => 'bluetooth headphones',
                'fields' => ['title^3', 'description', 'brand^2'],
            ],
        ],
        'size' => 20,
    ],
]);
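
Manual pagination means computing from and size yourself. A minimal sketch of that arithmetic: the helper name paginationParams is hypothetical, and the 10,000 cap assumes the default index.max_result_window has not been changed on the index.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: turns a 1-based page number into the from/size pair
 * expected in a search body.
 *
 * Assumes the default index.max_result_window (10,000). Deeper pages need
 * search_after rather than from/size.
 */
function paginationParams(int $page, int $perPage = 20, int $maxWindow = 10_000): array
{
    if ($page < 1 || $perPage < 1) {
        throw new InvalidArgumentException('page and perPage must be >= 1');
    }

    $from = ($page - 1) * $perPage;
    if ($from + $perPage > $maxWindow) {
        throw new OutOfRangeException('Page is beyond the result window; use search_after.');
    }

    return ['from' => $from, 'size' => $perPage];
}

// Merge the pair into the body passed to $client->search().
// new stdClass() serializes to {} as match_all expects.
$body = ['query' => ['match_all' => new stdClass()]] + paginationParams(3, 20);
```

Rejecting out-of-window pages in the application keeps the error message under your control instead of surfacing an Elasticsearch exception to the caller.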

ruflin/elastica

A fluent PHP DSL for building queries without deeply nested arrays. More readable on complex queries, but it adds a layer to maintain. Worth considering on projects that build many dynamic queries.

Our recommendation

Official client for new integrations, unless the team is already fluent with Elastica. Avoid the historic FOSElasticaBundle on modern Symfony: it fits poorly with 8.x APIs and its abstractions hide more than they help.

Explicit mapping, never dynamic

Mapping is the schema of your documents. Elasticsearch can infer it automatically, but that is a false shortcut: a single field inferred as text when you needed keyword is enough to break aggregations and sorting on that field.

PUT /products-v1
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1,
    "analysis": {
      "analyzer": {
        "english_product": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "english_stop", "english_stemmer", "asciifolding"]
        },
        "edge_ngram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding", "edge_ngram_filter"]
        }
      },
      "filter": {
        "english_stop": { "type": "stop", "stopwords": "_english_" },
        "english_stemmer": { "type": "stemmer", "language": "light_english" },
        "edge_ngram_filter": { "type": "edge_ngram", "min_gram": 2, "max_gram": 15 }
      }
    }
  },
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "id": { "type": "keyword" },
      "sku": { "type": "keyword" },
      "title": {
        "type": "text",
        "analyzer": "english_product",
        "fields": {
          "raw": { "type": "keyword" },
          "autocomplete": { "type": "text", "analyzer": "edge_ngram_analyzer", "search_analyzer": "english_product" }
        }
      },
      "description": { "type": "text", "analyzer": "english_product" },
      "brand": { "type": "keyword" },
      "price_cents": { "type": "integer" },
      "currency": { "type": "keyword" },
      "stock": { "type": "integer" },
      "tags": { "type": "keyword" },
      "category_path": { "type": "keyword" },
      "published_at": { "type": "date" },
      "rating_avg": { "type": "float" },
      "rating_count": { "type": "integer" }
    }
  }
}

The multi-fields pattern is critical: title is indexed three ways (title is analyzed for full-text search, title.raw keeps the exact value for sorting and aggregations, title.autocomplete holds edge n-grams for autocomplete). No application-side duplication is needed.
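
To make the three variants concrete, here is a sketch of the request bodies that would target each one. The function names are hypothetical; the field names come from the mapping above, and the arrays are meant to be passed as 'body' to the official client's search() call.

```php
<?php
declare(strict_types=1);

/** Full-text search goes to the analyzed title field. */
function titleSearchBody(string $text): array
{
    return ['query' => ['match' => ['title' => $text]]];
}

/** Exact-value aggregations must target the keyword sub-field title.raw. */
function titleFacetBody(int $size = 10): array
{
    return [
        'size' => 0, // buckets only, no hits
        'aggs' => ['by_title' => ['terms' => ['field' => 'title.raw', 'size' => $size]]],
    ];
}

/** Search-as-you-type goes to the edge n-gram sub-field title.autocomplete. */
function titleAutocompleteBody(string $prefix, int $size = 5): array
{
    return [
        'size' => $size,
        '_source' => ['title'],
        'query' => [
            'match' => [
                'title.autocomplete' => ['query' => $prefix, 'operator' => 'and'],
            ],
        ],
    ];
}
```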

Indexing via Symfony Messenger

Elasticsearch writes must be asynchronous. Coupling indexing to the business transaction is an anti-pattern: if Elasticsearch fails, the transaction fails too. Use Messenger to publish a message and let a consumer handle the indexing.

// src/Search/Message/IndexProductMessage.php
namespace App\Search\Message;

final readonly class IndexProductMessage
{
    public function __construct(
        public int $productId,
        public string $action = 'upsert', // or 'delete'
    ) {}
}

// src/Search/MessageHandler/IndexProductHandler.php
namespace App\Search\MessageHandler;

use App\Entity\Product;
use App\Search\Message\IndexProductMessage;
use App\Search\ProductIndexer;
use Doctrine\ORM\EntityManagerInterface;
use Symfony\Component\Messenger\Attribute\AsMessageHandler;

#[AsMessageHandler]
final readonly class IndexProductHandler
{
    public function __construct(
        private EntityManagerInterface $em,
        private ProductIndexer $indexer,
    ) {}

    public function __invoke(IndexProductMessage $message): void
    {
        if ($message->action === 'delete') {
            $this->indexer->delete($message->productId);
            return;
        }
        $product = $this->em->find(Product::class, $message->productId);
        if ($product === null) {
            $this->indexer->delete($message->productId);
            return;
        }
        $this->indexer->upsert($product);
    }
}

Bulk API with batching

On a full reindex, never index document by document: it is 10 to 20 times slower. Use the _bulk API with batches of 500 to 2000 documents.

namespace App\Search;

use App\Entity\Product;
use Elastic\Elasticsearch\Client;
use Doctrine\ORM\EntityManagerInterface;

final class ProductBulkIndexer
{
    private const BATCH_SIZE = 1000;
    private const INDEX = 'products';

    public function __construct(
        private readonly Client $client,
        private readonly EntityManagerInterface $em,
    ) {}

    public function reindexAll(?callable $onProgress = null): int
    {
        $total = 0;
        $batch = [];
        $iterator = $this->em->getRepository(Product::class)
            ->createQueryBuilder('p')
            ->getQuery()
            ->toIterable();

        foreach ($iterator as $product) {
            $batch[] = ['index' => ['_index' => self::INDEX, '_id' => (string) $product->getId()]];
            $batch[] = $this->toDocument($product);

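            // Each document takes two entries: the action line and the source.
            if (count($batch) >= self::BATCH_SIZE * 2) {
                $this->flush($batch);
                $total += self::BATCH_SIZE;
                $batch = [];
                $this->em->clear();
                // A plain callable has no ->call() method: invoke it directly.
                if ($onProgress !== null) {
                    $onProgress($total);
                }
            }
        }

        if ($batch !== []) {
            $this->flush($batch);
            $total += intdiv(count($batch), 2);
        }

        return $total;
    }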

    private function flush(array $body): void
    {
        $response = $this->client->bulk(['body' => $body]);
        $payload = $response->asArray();
        if ($payload['errors'] ?? false) {
            $errors = [];
            foreach ($payload['items'] as $item) {
                $op = array_values($item)[0];
                if (($op['error'] ?? null) !== null) {
                    $errors[] = $op['error'];
                }
            }
            throw new \RuntimeException('Bulk indexing errors: ' . json_encode($errors, JSON_THROW_ON_ERROR));
        }
    }

    private function toDocument(Product $p): array
    {
        return [
            'id' => (string) $p->getId(),
            'sku' => $p->getSku(),
            'title' => $p->getTitle(),
            'description' => $p->getDescription(),
            'brand' => $p->getBrand(),
            'price_cents' => $p->getPriceCents(),
            'currency' => $p->getCurrency(),
            'stock' => $p->getStock(),
            'tags' => $p->getTags(),
            'category_path' => $p->getCategoryPath(),
            'published_at' => $p->getPublishedAt()?->format(DATE_ATOM),
            'rating_avg' => $p->getRatingAvg(),
            'rating_count' => $p->getRatingCount(),
        ];
    }
}

The $em->clear() after each batch is critical: without it, Doctrine keeps all entities in memory and can consume several gigabytes over a million documents.
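
The flush() method above aborts on the first batch containing an error. When a reindex should instead log failures and retry only what is retryable, the response can be inspected item by item. A sketch under assumptions: both function names are hypothetical, and the input is the decoded _bulk response as returned by asArray().

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: lists the failed operations of a decoded _bulk response.
 *
 * @return list<array{id: string, status: int, type: string, reason: string}>
 */
function failedBulkItems(array $payload): array
{
    if (!($payload['errors'] ?? false)) {
        return [];
    }

    $failed = [];
    foreach ($payload['items'] ?? [] as $item) {
        // Each item has a single key naming the action: index, create, update or delete.
        $op = reset($item);
        if (!is_array($op) || !isset($op['error'])) {
            continue;
        }
        $failed[] = [
            'id'     => (string) ($op['_id'] ?? ''),
            'status' => (int) ($op['status'] ?? 0),
            'type'   => (string) ($op['error']['type'] ?? 'unknown'),
            'reason' => (string) ($op['error']['reason'] ?? ''),
        ];
    }

    return $failed;
}

/**
 * Status 429 means the operation was rejected under load, so retrying it
 * later is reasonable. Mapping errors (400) will fail again unchanged.
 *
 * @return list<string>
 */
function retryableIds(array $failed): array
{
    return array_column(
        array_filter($failed, static fn (array $f): bool => $f['status'] === 429),
        'id'
    );
}
```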

Queries: bool, multi_match, function_score

Multi-field search with boost

GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "bluetooth headphones",
            "fields": ["title^3", "description", "brand^2"],
            "type": "best_fields",
            "operator": "and",
            "fuzziness": "AUTO"
          }
        }
      ],
      "filter": [
        { "term": { "brand": "Sony" } },
        { "range": { "price_cents": { "gte": 5000, "lte": 30000 } } }
      ],
      "should": [
        { "range": { "rating_avg": { "gte": 4.0, "boost": 2.0 } } }
      ]
    }
  },
  "size": 20
}

must filters and contributes to scoring. filter filters without scoring (cache-friendly, faster). should boosts without requirement. This separation is the heart of the Elasticsearch DSL.
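
On the PHP side, the same separation can be kept explicit when the query is built from user input. A minimal sketch, assuming a hypothetical buildProductQuery() function whose result is passed as 'body' to the official client: free text goes to must, exact criteria to filter, soft preferences to should.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical builder for the bool query shown above.
 * Optional criteria only add filter clauses, so they never change the score.
 */
function buildProductQuery(
    string $text,
    ?string $brand = null,
    ?int $minCents = null,
    ?int $maxCents = null,
): array {
    $bool = [
        'must' => [[
            'multi_match' => [
                'query' => $text,
                'fields' => ['title^3', 'description', 'brand^2'],
                'operator' => 'and',
                'fuzziness' => 'AUTO',
            ],
        ]],
        'should' => [
            ['range' => ['rating_avg' => ['gte' => 4.0, 'boost' => 2.0]]],
        ],
    ];

    $filter = [];
    if ($brand !== null) {
        $filter[] = ['term' => ['brand' => $brand]];
    }
    // Keep only the bounds that were actually provided.
    $range = array_filter(
        ['gte' => $minCents, 'lte' => $maxCents],
        static fn (?int $v): bool => $v !== null
    );
    if ($range !== []) {
        $filter[] = ['range' => ['price_cents' => $range]];
    }
    if ($filter !== []) {
        $bool['filter'] = $filter;
    }

    return ['query' => ['bool' => $bool], 'size' => 20];
}
```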

Aggregations for facets

GET /products/_search
{
  "query": { "match_all": {} },
  "size": 0,
  "aggs": {
    "by_brand": {
      "terms": { "field": "brand", "size": 20 }
    },
    "price_histogram": {
      "histogram": { "field": "price_cents", "interval": 5000 }
    },
    "over_time": {
      "date_histogram": { "field": "published_at", "calendar_interval": "month" }
    }
  }
}

Aggregations power faceted filter UIs: "Sony (42)", "Bose (18)", price histogram. PostgreSQL can compute the same counts, but not in under 20 ms on 10 million rows.
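
Turning the response into those labels takes a few lines. A sketch, assuming a hypothetical facetLabels() helper that receives the decoded search response; it relies only on the key and doc_count entries that a terms aggregation returns for each bucket.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: turns the buckets of a terms aggregation into
 * display labels such as "Sony (42)".
 *
 * @param array $response decoded search response
 * @return list<string>
 */
function facetLabels(array $response, string $aggName): array
{
    $buckets = $response['aggregations'][$aggName]['buckets'] ?? [];

    return array_map(
        static fn (array $b): string => sprintf('%s (%d)', $b['key'], $b['doc_count']),
        $buckets
    );
}
```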

Function score for relevance tuning

Custom boost combining text relevance and business signals.

GET /products/_search
{
  "query": {
    "function_score": {
      "query": { "multi_match": { "query": "headphones", "fields": ["title", "description"] } },
      "functions": [
        { "filter": { "range": { "stock": { "gt": 0 } } }, "weight": 2 },
        { "field_value_factor": { "field": "rating_avg", "factor": 1.2, "missing": 1 } },
        { "gauss": { "published_at": { "origin": "now", "scale": "30d", "decay": 0.5 } } }
      ],
      "score_mode": "multiply",
      "boost_mode": "multiply"
    }
  }
}

This template promotes in-stock, well-rated, recently published products. Three business levers you cannot model as simply in SQL.
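
To see how the three levers combine, the arithmetic can be reproduced outside Elasticsearch. This is a sketch for reasoning about the numbers, not the engine's code: the function names are hypothetical, the gauss formula is the one documented for decay functions, and it assumes a function whose filter does not match is simply left out of the product.

```php
<?php
declare(strict_types=1);

/** Gauss decay: returns exactly $decay when the distance equals $scale. */
function gaussDecay(float $distance, float $scale, float $decay = 0.5, float $offset = 0.0): float
{
    $sigmaSquared = -($scale ** 2) / (2 * log($decay));
    $d = max(0.0, abs($distance) - $offset);

    return exp(-($d ** 2) / (2 * $sigmaSquared));
}

/**
 * Final score with score_mode=multiply and boost_mode=multiply:
 * text score x (stock weight x rating factor x recency decay).
 */
function productScore(float $textScore, bool $inStock, ?float $ratingAvg, float $ageDays): float
{
    $stockWeight  = $inStock ? 2.0 : 1.0;            // weight 2, applied only when the filter matches
    $ratingFactor = 1.2 * ($ratingAvg ?? 1.0);       // field_value_factor: factor x value, missing => 1
    $recency      = gaussDecay($ageDays, 30.0, 0.5); // scale 30d, decay 0.5

    return $textScore * $stockWeight * $ratingFactor * $recency;
}

// An in-stock product rated 4.5, published 30 days ago, with a text score of 2.0:
// 2.0 x (2 x 5.4 x 0.5) = 10.8
$score = productScore(2.0, true, 4.5, 30.0);
```

Because everything is multiplied, a product published 30 days ago loses half of its score: worth checking on real data before shipping the template.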

Zero-downtime reindex with aliases

Aliases are the key to Elasticsearch maintenance without downtime. Rather than writing to products, write to products-v1 and expose products as an alias.

Reindex procedure:

  1. Create products-v2 with the new mapping.
  2. Reindex all content (via _reindex API or from the source DB).
  3. Atomically switch the alias: products now points to products-v2.
  4. Delete products-v1 after verification.

POST /_aliases
{
  "actions": [
    { "remove": { "index": "products-v1", "alias": "products" } },
    { "add":    { "index": "products-v2", "alias": "products" } }
  ]
}

The action is atomic: no client sees an intermediate state. The switch takes a millisecond.
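
From PHP, the switch body can be generated rather than written by hand. A sketch with two hypothetical helpers: one derives the next versioned index name, the other builds the actions array in the same shape as the request above.

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: next versioned index name, e.g. products-v1 => products-v2. */
function nextIndexName(string $current): string
{
    if (!preg_match('/^(?<base>.+)-v(?<n>\d+)$/', $current, $m)) {
        throw new InvalidArgumentException("Expected a name like products-v1, got {$current}");
    }

    return sprintf('%s-v%d', $m['base'], (int) $m['n'] + 1);
}

/** Body for POST /_aliases: remove and add travel in one request, so the swap is atomic. */
function aliasSwapBody(string $alias, string $from, string $to): array
{
    return [
        'actions' => [
            ['remove' => ['index' => $from, 'alias' => $alias]],
            ['add'    => ['index' => $to,   'alias' => $alias]],
        ],
    ];
}

$swap = aliasSwapBody('products', 'products-v1', nextIndexName('products-v1'));
// With the official client, this body would be sent through
// $client->indices()->updateAliases(['body' => $swap]);
```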

Snapshots and restore

Elasticsearch ships with a snapshot system to S3 or shared storage. Incremental backup, fast restore, versioned.

PUT /_snapshot/s3_repo
{
  "type": "s3",
  "settings": {
    "bucket": "ydh-es-backups",
    "region": "eu-west-1",
    "base_path": "prod"
  }
}

PUT /_snapshot/s3_repo/snapshot_2026_06_15?wait_for_completion=false
{
  "indices": "products,orders",
  "include_global_state": false
}

We schedule a daily snapshot via Curator, 30-day retention, quarterly restore test on staging. An untested backup is a backup that does not exist.
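
The retention rule itself is simple date arithmetic. A sketch of that logic only, assuming snapshots follow the snapshot_YYYY_MM_DD naming used above; the function names are hypothetical, and the actual deletion stays with whatever tool schedules the snapshots.

```php
<?php
declare(strict_types=1);

/** Snapshot name for a given day, following the snapshot_YYYY_MM_DD convention. */
function snapshotName(DateTimeImmutable $day): string
{
    return 'snapshot_' . $day->format('Y_m_d');
}

/**
 * Hypothetical retention check: returns the snapshot names strictly older
 * than $retentionDays. Names that do not match the convention are left alone.
 *
 * @param list<string> $names
 * @return list<string>
 */
function expiredSnapshots(array $names, DateTimeImmutable $today, int $retentionDays = 30): array
{
    $cutoff = $today->setTime(0, 0)->modify("-{$retentionDays} days");

    $expired = [];
    foreach ($names as $name) {
        if (!preg_match('/^snapshot_(\d{4})_(\d{2})_(\d{2})$/', $name, $m)) {
            continue; // manual or foreign snapshot: never delete automatically
        }
        $date = $today->setDate((int) $m[1], (int) $m[2], (int) $m[3])->setTime(0, 0);
        if ($date < $cutoff) {
            $expired[] = $name;
        }
    }

    return $expired;
}
```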

Anti-patterns to avoid

High-impact mistakes we keep seeing in audits.

  • Elasticsearch as the source of truth. It has no ACID transactions. Always keep PostgreSQL (or the source) as master, Elasticsearch as derived index.
  • Dynamic mapping in prod. An unexpected field creates a wrong mapping only a full reindex can fix. Always "dynamic": "strict" or "false".
  • No lifecycle policy. Log indexes pile up until disks saturate. ILM (Index Lifecycle Management) automates rotation.
  • Poorly sized shards. The rule: a single primary shard for any index under 30 GB. 1000 shards on a cluster is unmanageable.
  • Direct browser connection. Never. Always go through a backend that filters tenants and builds queries.
  • No application-side circuit breaker. An Elasticsearch outage cascades across every endpoint using it if there is no short timeout and fallback.
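
The last point can be illustrated with a minimal circuit breaker. This is a sketch, not a production component: the class name is hypothetical, and the state lives in the PHP process, so under PHP-FPM it would have to be moved to shared storage (APCu, Redis) to be effective across requests. The short timeout is configured separately on the HTTP client.

```php
<?php
declare(strict_types=1);

/**
 * Minimal in-process circuit breaker (illustrative only).
 * After $threshold consecutive failures, it stops calling the search backend
 * for $cooldownSeconds and serves the fallback instead.
 */
final class SearchCircuitBreaker
{
    private int $failures = 0;
    private ?float $openedAt = null;

    public function __construct(
        private readonly int $threshold = 3,
        private readonly float $cooldownSeconds = 30.0,
    ) {}

    /**
     * @param callable(): mixed $search   the call that may fail
     * @param callable(): mixed $fallback degraded answer (empty result, SQL search...)
     * @param float|null        $now      injectable clock, useful in tests
     */
    public function call(callable $search, callable $fallback, ?float $now = null): mixed
    {
        $now ??= microtime(true);

        // Circuit open and still cooling down: do not touch Elasticsearch.
        if ($this->openedAt !== null && ($now - $this->openedAt) < $this->cooldownSeconds) {
            return $fallback();
        }

        try {
            $result = $search();
            // Success closes the circuit and resets the counter.
            $this->failures = 0;
            $this->openedAt = null;

            return $result;
        } catch (\Throwable $e) {
            $this->failures++;
            if ($this->failures >= $this->threshold) {
                $this->openedAt = $now; // open, or re-open after a failed probe
            }

            return $fallback();
        }
    }
}
```

Once the cooldown has elapsed, the next call acts as a probe: a success closes the circuit, a failure re-opens it for another cooldown period.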

2026 costs

Estimates based on deployments we operate.

| Size | Volume | Cluster | Monthly Elastic Cloud cost |
|---|---|---|---|
| Small | < 1M docs, < 10 RPS | 1 node 2 GB | ~50 EUR |
| Medium | 1 to 10M docs, 50 RPS | 1 node 8 GB | ~100 EUR |
| Large | 10 to 100M docs, 500 RPS | 3 nodes 16 GB | ~800 EUR |
| Very large | > 100M docs, > 1000 RPS | Dedicated multi-region | from 2500 EUR |

Self-hosted on VPS: divide by 3 to 5, but add operational cost (1 to 2 days per month for a senior sysadmin on a production cluster).

Conclusion

Elasticsearch is a powerful but demanding tool. Deployed without discipline (dynamic mapping, no aliases, poorly sized shards), it quickly becomes a cost and a source of outages. Deployed with method (strict mapping, aliases, bulk indexing, snapshots, ILM), it transforms search for a PHP application.

For Elasticsearch scoping on your project, a search rework or an audit of an existing cluster, reach out at contact@your-digital-hub.com. See also our PHP expertise and our software architecture service.