Feature

Message Deduplication


Written by: Magnus Landerblom


Message deduplication is an upcoming feature in LavinMQ that filters out duplicate messages inside the broker. Enable it to prevent redundant processing, save storage, and ensure that a message is accepted only once for as long as its entry remains in the deduplication cache. You can enable deduplication on both exchanges and queues.

Enabling deduplication

Deduplication is enabled through arguments passed when an exchange or a queue is declared. You’ll need to provide the following:

  1. x-message-deduplication (bool): Enables the feature when set to true.
  2. x-cache-size (int): Specifies the size of the cache, which determines how many entries are kept for deduplication checks. Keep in mind that larger caches require more memory.

Optional settings:

  • x-cache-ttl (int): Specifies how long a cache entry remains valid (in milliseconds). By default, entries never expire.
  • x-deduplication-header (string): Sets the name of the message header used for deduplication. The default header is x-deduplication-header.
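The arguments above can be collected into a plain dictionary and passed along when the queue or exchange is declared. The following is a minimal sketch in Python, assuming a channel object from the pika AMQP client; the helper names (dedup_arguments, declare_dedup_queue, declare_dedup_exchange) and the queue and exchange names are hypothetical and only serve as illustration.

```python
def dedup_arguments(cache_size, cache_ttl_ms=None, header=None):
    """Build the declaration arguments described above.

    cache_ttl_ms and header are optional; when omitted, the broker
    defaults apply (entries never expire, default header name).
    """
    arguments = {
        "x-message-deduplication": True,
        "x-cache-size": cache_size,
    }
    if cache_ttl_ms is not None:
        arguments["x-cache-ttl"] = cache_ttl_ms          # milliseconds
    if header is not None:
        arguments["x-deduplication-header"] = header
    return arguments


def declare_dedup_queue(channel, name, cache_size, **options):
    """Declare a queue with deduplication enabled (hypothetical helper)."""
    channel.queue_declare(
        queue=name,
        durable=True,
        arguments=dedup_arguments(cache_size, **options),
    )


def declare_dedup_exchange(channel, name, exchange_type, cache_size, **options):
    """Declare an exchange with deduplication enabled (hypothetical helper)."""
    channel.exchange_declare(
        exchange=name,
        exchange_type=exchange_type,
        durable=True,
        arguments=dedup_arguments(cache_size, **options),
    )


# Usage with pika against a local broker (not executed here):
#
#   import pika
#   connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
#   channel = connection.channel()
#   declare_dedup_queue(channel, "orders", cache_size=10_000, cache_ttl_ms=60_000)
#   declare_dedup_exchange(channel, "events", "topic", cache_size=10_000)
```

Keeping the argument construction in one place makes it easier to use the same cache size and TTL for every queue and exchange that should deduplicate.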

How it works

  • On a queue: When a message is identified as a duplicate based on the configured settings, it is not stored in the queue. The exchange still routes the message to the queue, but the queue drops it on arrival.
  • On an exchange: When enabled on an exchange, duplicate messages are filtered out entirely and are not forwarded to any queues bound to that exchange. This feature works across all exchange types in LavinMQ.
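The difference between the two placements can be illustrated with a small in-memory model. This is only a sketch of the behaviour described above, not LavinMQ's implementation: the DedupCache class and the route function are hypothetical, messages are represented by their deduplication key alone (None stands for a message without the header), and the oldest-first eviction used when the cache is full is an assumption.

```python
from collections import OrderedDict


class DedupCache:
    """Bounded set of recently seen deduplication keys.

    Assumption: when the cache is full, the oldest key is evicted first.
    """

    def __init__(self, size):
        self.size = size
        self._seen = OrderedDict()

    def is_duplicate(self, key):
        """Return True if key was seen before; otherwise record it."""
        if key is None:                      # no header: bypass deduplication
            return False
        if key in self._seen:
            return True
        self._seen[key] = True
        if len(self._seen) > self.size:
            self._seen.popitem(last=False)   # drop the oldest key
        return False


def route(keys, bound_queues, exchange_cache=None, queue_caches=None):
    """Deliver each message (given by its key) to every bound queue.

    exchange_cache filters before routing; queue_caches maps a queue
    name to a cache that filters that queue only.
    """
    queue_caches = queue_caches or {}
    delivered = {name: [] for name in bound_queues}
    for key in keys:
        if exchange_cache is not None and exchange_cache.is_duplicate(key):
            continue                         # dropped for every bound queue
        for name in bound_queues:
            cache = queue_caches.get(name)
            if cache is not None and cache.is_duplicate(key):
                continue                     # dropped for this queue only
            delivered[name].append(key)
    return delivered


keys = ["a", "b", "a", None, None]

# Exchange-level: the second "a" reaches neither queue.
on_exchange = route(keys, ["q1", "q2"], exchange_cache=DedupCache(10))

# Queue-level on q1 only: q2 still receives the second "a".
on_queue = route(keys, ["q1", "q2"], queue_caches={"q1": DedupCache(10)})
```

In this model, exchange-level deduplication protects every bound queue with a single cache, while queue-level deduplication lets each queue decide on its own whether duplicates matter.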

Message headers and customization

Deduplication checks use the value of a specific message header, x-deduplication-header by default. If a message lacks this header, it bypasses deduplication and is routed as usual.

Additional customizations include:

  • Message-Specific TTL: A message can override the default cache TTL by specifying a value in the x-cache-ttl header. This is useful for messages that require different expiration rules.
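On the publisher side, this comes down to setting one or two headers on each message. The sketch below assumes the pika client and a channel opened elsewhere; the helper names (dedup_headers, publish_deduplicated) and the example key are hypothetical, and the per-message x-cache-ttl value is assumed to be in milliseconds, like the declaration argument.

```python
def dedup_headers(dedup_key, ttl_ms=None, header_name="x-deduplication-header"):
    """Build the message headers used for deduplication.

    header_name must match the header configured on the queue or exchange.
    ttl_ms, when given, overrides the default cache TTL for this message
    (assumed to be milliseconds).
    """
    headers = {header_name: dedup_key}
    if ttl_ms is not None:
        headers["x-cache-ttl"] = ttl_ms
    return headers


def publish_deduplicated(channel, exchange, routing_key, body, dedup_key, ttl_ms=None):
    """Publish a message carrying a deduplication key (hypothetical helper)."""
    # Imported here so dedup_headers stays usable without pika installed.
    import pika

    properties = pika.BasicProperties(headers=dedup_headers(dedup_key, ttl_ms))
    channel.basic_publish(
        exchange=exchange,
        routing_key=routing_key,
        body=body,
        properties=properties,
    )


# Usage (not executed here): publishing the same key twice should result
# in a single stored message on a deduplicating queue.
#
#   publish_deduplicated(channel, "", "orders", b"order 42", dedup_key="order-42")
#   publish_deduplicated(channel, "", "orders", b"order 42", dedup_key="order-42")
```

A stable business identifier, such as an order ID, is usually a better deduplication key than a random value generated per publish, since a retried publish then carries the same key as the original.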

Storage considerations

The deduplication cache is stored in memory, meaning:

  • Larger caches consume more memory, so it’s important to configure cache size carefully based on your system’s resources.
  • The cache is non-persistent. It resets if the broker restarts or if cluster leadership transfers to another node.