I’d like to make my logs more useful. Currently they mostly contain the usual columns: timestamp, logger-name, log-level, message, exception.
They are hard to search and parse, and writing messages and adding some data to them doesn’t really help. Consequently, when something goes wrong I can’t easily find the answer by just looking at the logs and have to debug my application.
The questions that need to be answered are not only about why the application crashed or didn’t do something correctly, but also about business rules, like: why did I get a bonus when ordering ABC?
So usually I need to open the IDE, load the order, and debug it to find which condition prevented the bonus from being added and whether it was OK to do so. Had I logged the data or any other criteria necessary to make this decision, I would have been able to find it in the logs and answer the question within maybe 5 minutes.
But that’s not possible with the default schema, where the message is the main part.
I think I need to completely reorganize my logs in order to be able to log more data. But I cannot just put everything into the message or into additional columns, because there are too many possibilities and I’d like a general solution that works for any application.
This means that I need more specific fields than just the message, where I can put the additional information.
In order to find those fields I categorized every piece of data I could think of. This is my list:
- Environment – this is the largest scope. I use it to log machine names or dev/prod environments.
- Product (name) – runs within the environment.
- Layer (name) – this helps me categorize the logs by software layer. Each layer has its own log-level, so I have:
  - Application – for general technical data about the application itself, this is logged with the max log-level: Debug
  - Business – logs about business logic; log-level: Information
  - Presentation – logs about the UI; log-level: Trace
  - IO – logs about disk operations; log-level: Trace
  - Database – logs about the database; log-level: Trace
  - Network – logs about the network; log-level: Trace
  - External – logs about external devices; log-level: Trace
- Transaction (name) – all logs must belong to some transaction so that I can group them together and see the entire process.
- State or Event – each log is either a State log, which records data that I usually use to make decisions, or an Event.

  As a state I can log two types of information:
  - Actual (State) – this is the current state; log-level: Trace
  - Expected (State) – this is what I expected; log-level: Trace

  They both usually contain small object dumps in JSON format.

  Events can be logged together with the Elapsed field. They also end with a result. I defined four of them:
  - Undefined – the event did not run, e.g. because of invalid parameters; log-level: Warning
  - Success – everything went well; log-level: Information
  - Completed – conditions not met, no errors; log-level: Information
  - Failure – an error occurred; log-level: Error
- Message – finally there is the good old message, which I usually use to give some hints on how to fix what might have gone wrong, but I now write it very rarely.
- Exception – here I put the stack trace of exceptions.
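To make the layer and result categories more concrete, here is a minimal sketch in Python (picked only for illustration, the idea itself is not tied to a language). The names `Layer`, `Result`, `LAYER_LEVEL` and `RESULT_LEVEL` are hypothetical, and `TRACE = 5` is an assumption, because Python’s `logging` module has no built-in trace level.

```python
import logging
from enum import Enum

# Assumption: Python's logging has no TRACE level, so register a custom one below DEBUG.
TRACE = 5
logging.addLevelName(TRACE, "TRACE")


class Layer(Enum):
    APPLICATION = "Application"
    BUSINESS = "Business"
    PRESENTATION = "Presentation"
    IO = "IO"
    DATABASE = "Database"
    NETWORK = "Network"
    EXTERNAL = "External"


class Result(Enum):
    UNDEFINED = "Undefined"
    SUCCESS = "Success"
    COMPLETED = "Completed"
    FAILURE = "Failure"


# Log level per layer, copied from the list above.
LAYER_LEVEL = {
    Layer.APPLICATION: logging.DEBUG,
    Layer.BUSINESS: logging.INFO,
    Layer.PRESENTATION: TRACE,
    Layer.IO: TRACE,
    Layer.DATABASE: TRACE,
    Layer.NETWORK: TRACE,
    Layer.EXTERNAL: TRACE,
}

# Log level per event result, copied from the list above.
RESULT_LEVEL = {
    Result.UNDEFINED: logging.WARNING,
    Result.SUCCESS: logging.INFO,
    Result.COMPLETED: logging.INFO,
    Result.FAILURE: logging.ERROR,
}

if __name__ == "__main__":
    for result, level in RESULT_LEVEL.items():
        print(result.value, "->", logging.getLevelName(level))
```

Because the values are enum-like, they can be validated and filtered on directly instead of being parsed out of a free-text message.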
As a table in a database it could look like this:

    Log-Table
    --------------
    Id
    Timestamp
    ---
    Environment   | development, production
    Product       | Product-v0
    Logger        | RepositoryXLogger
    TransactionId | 123
    Layer         | Application, Business
    Level         | Debug, Information
    Expected      | small object dumps (json)
    Actual        | small object dumps (json)
    Event         | LoadConfiguration, GetDataX
    Elapsed       | milliseconds
    Result        | Undefined, Success, Completed, Failure
    Message
    Exception
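Purely as an illustration of the table, here is a rough sketch using Python’s built-in `sqlite3` module. The column names follow the table; the SQL types, the table name `log`, and the helper `write_log` are assumptions, not part of any existing logging library.

```python
import sqlite3

# Column names follow the table above; the SQL types are assumptions.
SCHEMA = """
CREATE TABLE log (
    id             INTEGER PRIMARY KEY,
    timestamp      TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    environment    TEXT,
    product        TEXT,
    logger         TEXT,
    transaction_id INTEGER,
    layer          TEXT,
    level          TEXT,
    expected       TEXT,    -- small object dump (JSON)
    actual         TEXT,    -- small object dump (JSON)
    event          TEXT,
    elapsed        INTEGER, -- milliseconds
    result         TEXT,
    message        TEXT,
    exception      TEXT
)
"""

COLUMNS = {
    "environment", "product", "logger", "transaction_id", "layer", "level",
    "expected", "actual", "event", "elapsed", "result", "message", "exception",
}


def write_log(conn, **fields):
    """Insert one log row; keyword names must match column names."""
    unknown = set(fields) - COLUMNS
    if unknown:
        raise ValueError(f"unknown log fields: {sorted(unknown)}")
    columns = ", ".join(fields)
    placeholders = ", ".join("?" for _ in fields)
    conn.execute(
        f"INSERT INTO log ({columns}) VALUES ({placeholders})",
        tuple(fields.values()),
    )


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
write_log(
    conn,
    environment="development",
    product="Product-v0",
    logger="RepositoryXLogger",
    transaction_id=123,
    layer="Application",
    level="Debug",
    event="LoadConfiguration",
    elapsed=12,
    result="Success",
)

# All events of one transaction, in insertion order.
rows = conn.execute(
    "SELECT event, result FROM log WHERE transaction_id = ? ORDER BY id", (123,)
).fetchall()
print(rows)
```

With every category in its own column, grouping by `transaction_id` or filtering by `layer` and `result` becomes a plain query rather than a text search.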
I don’t present my actual code because this isn’t about a specific programming language. How I log this information is an implementation detail that I’d rather ask about on a different site.
With these new categories it should be much easier to tell what happened and to distinguish the application logs from the business-logic logs. It should also be much easier to log, because now I don’t have to put all this very specific and enum-like information into messages.
If anyone now asks me what went wrong, I should be able to give them an answer much faster, because I wouldn’t even have to open the IDE and debug the application.
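As a rough sketch of how such a question could be answered from the logs alone, the snippet below compares the Expected and Actual dumps of one transaction. The rows, the key `customer_status`, the event name `AddBonus`, and the helper `mismatches` are all invented for illustration.

```python
import json

# Hypothetical log rows; field names follow the categories above, values are made up.
rows = [
    {"transaction_id": 123, "layer": "Business",
     "expected": json.dumps({"customer_status": "premium", "order_total_ok": True}),
     "actual": json.dumps({"customer_status": "regular", "order_total_ok": True})},
    {"transaction_id": 123, "layer": "Business",
     "event": "AddBonus", "result": "Completed"},
    {"transaction_id": 124, "layer": "Business",
     "expected": json.dumps({"customer_status": "premium"}),
     "actual": json.dumps({"customer_status": "premium"})},
]


def mismatches(rows, transaction_id):
    """Return (key, expected, actual) for each differing key in a transaction's state logs."""
    found = []
    for row in rows:
        # Skip other transactions and rows that are events rather than state logs.
        if row["transaction_id"] != transaction_id or not row.get("expected"):
            continue
        expected = json.loads(row["expected"])
        actual = json.loads(row.get("actual") or "{}")
        for key, want in expected.items():
            if actual.get(key) != want:
                found.append((key, want, actual.get(key)))
    return found


print(mismatches(rows, 123))
```

For transaction 123 this reports that `customer_status` was expected to be `premium` but was `regular`, which is the kind of answer that otherwise requires a debugging session.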
My questions are:
- Are these categories enough to easily find the information you need about your application?
- What other useful categories could there be?
- Can you think of any case or question about your application that you would not be able to answer from such a log, or that you would not be able to log efficiently?
The goal is to be able to answer questions about application failures or strange behavior more quickly and without looking at the code. This applies especially to questions about business rules: you might know the rules, but you can’t always be sure they were applied correctly if you don’t see the data that was used to make the decisions.