Memory Spike and Serilog


I am working on migrating a WCF service to .NET 5. The new application is a REST service running in a Linux Docker container on Kubernetes.

Our goal is to process more requests per second. After the migration, we ran load tests to see the results. Everything appeared to run fine, and we were able to process 20 times as many requests. However, we found that pods were constantly restarting because they were running out of memory. Memory usage kept growing, and when it reached the limit, the pod was terminated.

I captured a memory dump of the container before it terminated. It showed that Serilog's ConcurrentQueue and Timer objects were consuming the memory. The application logs a lot of messages, maybe 2K per request. The log level was Info, so I changed it to Warning, but the behavior was the same: changing the log level did not prevent the messages from being queued. They still went into the queue, and Serilog processed each one and decided what to do with it depending on the log level. In the end, I hardcoded the production configuration to log only Warning and higher levels. After this change, memory usage was stable.
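For reference, here is a minimal sketch of what hardcoding the level can look like in code. It is not my exact configuration. It assumes the Serilog and Serilog.Sinks.Console NuGet packages, and the console sink, the message templates and the BuildExpensiveDump helper are placeholders made up for the example. The idea is that when the minimum level is set on the logger itself, events below Warning are discarded at the call site and never reach a sink or its queue.

```csharp
using Serilog;
using Serilog.Events;

public static class Program
{
    public static void Main()
    {
        // Hardcode Warning as the global minimum level. Events below this
        // level are rejected by the logger before any sink sees them.
        Log.Logger = new LoggerConfiguration()
            .MinimumLevel.Warning()
            .WriteTo.Console() // placeholder sink, chosen only for the example
            .CreateLogger();

        // Below the minimum level: dropped at the call site.
        Log.Information("Processing step {Step}", 1);

        // At or above the minimum level: passed on to the sink.
        Log.Warning("Slow dependency call took {ElapsedMs} ms", 1200);

        // Optional guard so that expensive arguments are not even built
        // when the level is disabled.
        if (Log.IsEnabled(LogEventLevel.Information))
        {
            Log.Information("Payload: {Payload}", BuildExpensiveDump());
        }

        // Flush any buffered events before the process exits.
        Log.CloseAndFlush();
    }

    // Hypothetical helper standing in for costly message construction.
    private static string BuildExpensiveDump() => "...";
}
```

With around 2K log calls per request, the IsEnabled guard can also help, because the arguments for disabled levels are never computed.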

Everything worked fine for a single request; the issue only showed up during load testing. This shows how important load testing is. Without it, we would have had serious problems in production.

Unfortunately, I no longer have the graphs, so I cannot add them to this post.
