Multi-threading is now becoming a norm. Obvious issue with logging is how to synchronize between threads. As discussed in last post Application Logging Improvement Plan – Part 1, we want to log as much as possible in machine readable format. So there comes a problem with multiple threads trying to log at the same time. Two possible implementations come to mind but both are flawed.
- Synchronize between threads for logging – Disk writes are slow and now locking contention would only make it worse. This slows down the business logic and is a big no-no.
- Log without synchronizing – Business logic works but logs get jumbled up because multiple threads are trying to log at the same time. This leaves logs in worst shape and unusable.
We can do better by combining both of above to get a solution. We will create a per thread logging buffer (lets call it LogBuffer) where each thread would log without any conflicts. And at a certain threshold, threads synchronize and log their LogBuffer to the disk (lets call this Flush). The buffer is in memory so logging even become faster and they synchronize before flushing to the disk. Two possible issues come to mind.
- Application Crash would take away all the not yet Flush-ed LogBuffer content with it. Now if there was a core generated, it can be loaded in debugger and all the not yet Flush-ed messages would show up in memory. Non-coring crashes can be handled by proper signal handing to flush before quit. That leaves kill -9 which cannot be captured. But if an application gets to this last resort, it is expected that a core was already generated for troubleshooting using tools like “gcore”. Another option would be to have a signal handler to invoke a Flush. e.g. SIGUSR1 handler would flush that can be used before issuing a kill -9. This is not fool proof but gets us much closure to where we want to be.
- Out Of Sequence log messages. These would make the log file look quite ridiculous. Log may indicate (in order of appearance) that application started to process a message before it arrived. Or even finished processing the content before processing began or arrival of message. Looking at time order of the messages, things start to clear up. So our idea of machine readable logs remain standing. Anyone processing the logs has to sort the logs in time order before using. This has a slight disadvantage that file cannot be viewed by simple less/more/vi commands. But the benefits of being able to monitor and have proper logs messages without slowing down business logic is the holy grail. Remember we are in multithread environment? So there would still be times when logs cannot be properly sorted in time order because they may have been logged at the same time. Logging thread identifier would help to create exact timeline as the message moves among the threads before leaving the threading ecosystem for storage or next level or what have you.