Kafka Consumer Topic subscription strategy
Kafka consumer API provides the capability to consumer group to subscribe to multiple topics at the same time.
consumer.subscribe(Collections.singletonList("topic1", "topic2", ...));
Kafka consumer Group subscribing to multiple topics
Features
Prevents message starving in a topic
Messages are enqueued starting from the first partition, if there are no more messages in the current partition left, but there are still bytes to fill, messages from the next partition will be enqueued until there are no more messages or the buffer is full.
After the consumer receives the buffer, it will split it into CompletedFetches
, where one CompletedFetch
contains all the messages of one topic partition, the CompletedFetches
are enqueued
The enqueued CompletedFetches
are logically flattened into one big queue, and since the requests to each partition are sent in parallel they may be mixed together
consumer.poll()
will read and dequeue at most max.poll.records
from that flattened big queue.
Subsequent fetch requests exclude all the topic partitions that are already in the flattened queue.
This means that you’ll have no starving messages and message consumption happens in a round robin manner, but you may have a large number of messages from one topic, before you’ll get a large number of messages for the next topic.
By using property max.partition.fetch.bytes, number of messages to be consumed from a partition at a poll can be regulated.
Pros
- Number of threads spawned as part of the consumer group in the single application remains constant. Thread count is not proportional to the number of topics to which the application subscribes to.
- The constant number of threads spawned prevents excessive context switching which at the end might result in more performance bottleneck in the feat to achieve parallelism.
- If more parallelism needs to be achieved, the no of consumer threads can be increased.
Cons
Separate Consumer Group for each Topic
Pros
- Each kafka topic will have its own consumer thread group which can be scaled independently.
Cons
- With increase in number of Kafka topics, new consumer group will be created and hence number of threads will keep on increasing. Resulting in too much context-switching which will eventually slow down the application.
- With increase in threads, lot of resources in terms of memory and CPU will be consumed.
- Auto-scaling of application might result in a problem. The application instances which will be spawned due to autoscaling might also use similar amount of resources (memory/CPU). This might result in spawning of new application instances in a loop.