how we should practice logging (part II)

a proposal to use log data in a modern way

ninjeanne
9 min read · Nov 14, 2021

A few weeks ago I came across an interesting analogy for dealing with Big (Log) Data in an infographic. There, data was compared to Lego® bricks. Our goal is nothing else but the fancy Lego® house in the picture below.

© Bill Ward, Yellow Victorian, license: Attribution 2.0 Generic (CC BY 2.0)

But what we actually have right now is the following.

Randy Krum, © Hot Butter Studios

From this heap it is not obvious whether we have all the Lego® bricks to build our house. Moreover, the bricks we need are scattered and some other bricks are not needed at all. Wouldn't it be better if we not only knew what our house should look like, but also specified the bricks from the beginning? And this is where purpose and strategic alignment come into play.

clean up your data!

The strong dependence on manual configuration emphasizes even more the need for a comprehensible, company-wide document or specification that defines the goal of logging and the structure of the different types of entries. I will use the term logging strategy as the name of this document (it is the instruction manual for our Lego® house). Defining such a strategy is by no means easy. Logging itself is sparsely researched due to its complexity, and the industrial domain even more so. A strategy must above all be pragmatic and usable. And introducing a strategy after the fact does not guarantee that migrating the existing log statements of a project to a new format is worthwhile. When a return on investment occurs, i.e. the point in time at which the investment starts to add value, and what time and financial costs are associated with reaching it, should be investigated.

but how can we start specifying a logging strategy?

First things first, I would like to define a few more terms about logs so that my statements are clear.

a little terminology

From reading the first part of this series you should already know that I understand an (event) log as a sequence of log entries of a system. A log is nothing else but a text file. I am using event log, log and log file synonymously.

A log entry is a recorded line within a log. It records relevant events within a piece of software, a program or a system. Even though a large number of entries produced within a short time may seem messy, the content can be summarized into a few patterns. I am using log entry, entry and line synonymously.

A pattern or format specifies the structure of repeating, similar entries. This structure is often described by defining a regular expression.

The following image visualizes the connection between event logs, entries and patterns with an example.
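To make that concrete, a pattern could be written as a regular expression with named groups. The entry text and group names below are invented for illustration:

```python
import re

# Hypothetical pattern for entries such as:
#   2021-11-14 09:15:02 INFO Connection to 10.0.0.5 established
ENTRY_PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\s+"
    r"Connection to (?P<ip>\d{1,3}(?:\.\d{1,3}){3}) established"
)

match = ENTRY_PATTERN.match("2021-11-14 09:15:02 INFO Connection to 10.0.0.5 established")
if match:
    print(match.groupdict())  # {'timestamp': '2021-11-14 09:15:02', 'level': 'INFO', 'ip': '10.0.0.5'}
```

Every entry produced by the same statement matches this one pattern, even though the variable values differ.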

A log point is defined as the origin of an entry within the source code of a software or program.

At this point there is a so-called log statement that describes how an entry should actually be generated.

The condition under which a log point and its statement are reached and executed is called activation code.

Nowadays, log statements are implemented by using logging frameworks. These are programming libraries that help us structure our log entries in a simpler way. To achieve this, logging frameworks come with a lot of tools and methods. When analyzing logs and trying to summarize generated lines into a few patterns, it can be helpful to locate their statements and understand the framework that was used.
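As a small illustration (the logger name, message and variable are invented), this is roughly what a log statement looks like with Python's standard logging module; other frameworks follow the same idea:

```python
import logging

# The framework takes care of the timestamp, level and logger name.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s - %(message)s",
)
logger = logging.getLogger("payment")

order_id = 4711  # a run-time variable we want to record
# The log statement: static text plus a placeholder for the variable.
logger.info("order %s has been processed", order_id)
```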

Finally, a log mechanism describes a set of log points and their activation code. What we understand by (event) logging is nothing else but the process of implementing log mechanisms. A log analysis is what we do when we are actually searching for anomalous system behavior.

The origin of an entry is visualized in the following image as an example.
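To tie the terms together in code, here is a minimal, made-up sketch; the function and field names are assumptions for the example:

```python
import logging

logger = logging.getLogger(__name__)

def withdraw(account, amount):
    # activation code: the condition under which the log point is reached
    if amount > account.balance:
        # log point: this location in the source code
        # log statement: the call describing how the entry is generated
        logger.warning("withdrawal of %s rejected for account %s", amount, account.id)
        return False
    account.balance -= amount
    return True

# All such log points together with their activation code form the log mechanism.
```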

As logs have to be analyzed within a huge mess of data, this is where we have to make a strategic plan to simplify future searches. Generally, log statements, which are the origin of each entry, are created within minutes. And they are seldom updated or touched afterwards. If we put a little effort in here at the beginning, we save a lot of time, trouble and... money! And we could reuse our data for even more use cases than the search for anomalous system behaviour. Therefore, let's dissect a log entry and see which Lego® bricks we have or require for our house!

Randy Krum, © Hot Butter Studios

the structure of logs

A log entry consists of at least a registration timestamp and a textual message. In general, the text records the current state of whatever is being monitored, or the cause of an event, and is typically written as free text without any restrictions. Free text means that developers describe a specific situation in their natural language. So, I am using (free text) message and (free text) description synonymously. And this free text can be split into a static and a variable part.

Static text consists of immutable words, punctuation marks, sentences or parts of sentences. Variables have a fixed position surrounded by static text, but can change their value depending on the runtime behavior of a system. In conclusion, they actually contain what we are interested in: information about the inner workings of a system. And they can be of any data type, like identifiers or IP addresses.

Depending on the literature, a log level can be seen as a fixed part of a log entry as well. As mentioned in a previous part of this blog series, it highlights the severity or verbosity of an event. There are levels that are more talkative like DEBUG or TRACE, levels that are used to inform us about outstanding events like INFO or WARN, and levels that need us to react immediately like ERROR or FATAL (e.g. when a system failure occurs). But a log level is normally not part of the free text, even though its value varies. The same goes for timestamps. These two are examples of so-called meta data, a special kind of variable. Meta data is often described as "information about information". Their values help us to understand the origin of a data point (or rather an event) or to relate a set of data points to the time of their occurrence. It is important to differentiate variables that describe an event from actual meta data, with which we can navigate and search through our logging data.
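Putting the pieces together, a single (made-up) entry decomposes roughly like this:

```python
# A made-up entry, decomposed into its parts:
timestamp = "2021-11-14 09:15:02"        # meta data
level     = "INFO"                        # meta data
static    = "user {} logged in from {}"   # immutable part of the message
variables = (42, "10.0.0.5")              # run-time values we are interested in
message   = static.format(*variables)     # "user 42 logged in from 10.0.0.5"
entry     = f"{timestamp} {level} {message}"
print(entry)
```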

At this point, to clean up our logging data, we need to take advantage of differentiating between variables and meta data, but also to arrange them at fixed positions, so we can find information more easily. This differentiation of variables and meta data, their arrangement at fixed places, as well as the specification of data types and identifiers is called structured logging, also known as semantic or typed logging.
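A minimal sketch of what this could look like with Python's standard logging module; the formatter and the field names (order_id and friends) are assumptions for the example, not a standard:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per entry: typed fields at fixed positions."""
    def format(self, record):
        return json.dumps({
            "timestamp": self.formatTime(record),           # meta data
            "level": record.levelname,                       # meta data
            "logger": record.name,                           # meta data
            "message": record.getMessage(),                  # free text
            "order_id": getattr(record, "order_id", None),    # typed variable
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("shop")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# The variable travels as a typed field instead of being buried in free text.
logger.info("order processed", extra={"order_id": 4711})
```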

And here comes the tricky part: which variables and meta data do I need? How should I arrange them? Should I arrange them from the most important to the least important information? But how can I tell what is most or least important? Do I always have access to each piece of information that I define? Can I reuse my strategy for more than one application? What is the adequate log level for an event, should I log more or less meta data... and how can I fit everything into one single line that is so meaningful, expressive and cohesive that EVERY developer can interpret entries at any time?

Well... let's begin with the following: you can't specify a Swiss army knife.

there is more than one structure!

When we are talking about event logging, we have to distinguish different types of events, or goals that we want to achieve with them. I will stay consistent in this blog and use the name event types. And each of those types has its own structure... or at least one structure of its own. In Lego® terms: we have several houses that we need to look at from different angles.

Event types are the so-called state dumping events, execution tracing, event reporting and performance or application dependent events.

State dumping events are used to collect runtime attributes of the associated system. Execution tracing records the entry or exit of a class, method/function or condition; tracing helps to audit events. Event reporting, on the other hand, is the most common type of event logging: special occasions or happenings are recorded here. Finally, performance or application dependent logs are used to record other data, such as system metrics.
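A rough sketch of the four event types as log statements (the values and names are invented for illustration):

```python
import logging
import time

logger = logging.getLogger("worker")

# state dumping: record runtime attributes of the system
logger.debug("queue size=%d, open connections=%d", 17, 3)

# execution tracing: record entry into a method
logger.debug("entering process_order(order_id=%s)", 4711)

# event reporting: a special occurrence worth knowing about
logger.info("order %s has been shipped", 4711)

# performance / application dependent: metrics such as durations
start = time.perf_counter()
# ... do the actual work ...
logger.info("processing took %.1f ms", (time.perf_counter() - start) * 1000)
```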

Knowing the purpose of a statement is already a good step towards certainty about its content. However, this is not the only decision developers have to make nowadays. If we are talking about too much or too little information, then the number of statements and the coverage of code sequences, i.e. the location, is another consideration that determines the quality of the data.

know where you log!

The activation code for reaching log points can be divided into unexpected situations and execution points. Each division is again subdivided.

Unexpected situations include assertion check logging, return value checks and exception logging. In assertion checks, developers execute assert-like functions to query an erroneous state and issue an adequate free-text message as a description before an unwanted event occurs and an instruction is executed. Return value checks inspect the return value of a method for an error code, such as -1 or null; if such a value occurs, a log statement is executed. With classic exceptions, log statements are activated as soon as an exception occurs and, for example, a catch block is reached.
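A small sketch of the three kinds of unexpected-situation logging; open_config and parse are hypothetical helpers used only for illustration:

```python
import logging

logger = logging.getLogger(__name__)

def load_config(path):
    # assertion check logging: query an erroneous state before it causes harm
    if path is None:
        logger.error("no configuration path given")
        raise ValueError("path must not be None")

    handle = open_config(path)  # hypothetical helper
    # return value check: the return value signals an error (here: None)
    if handle is None:
        logger.error("could not open configuration at %s", path)
        return None

    try:
        return parse(handle)  # hypothetical helper
    except ValueError:
        # exception logging: the statement is reached via the catch block
        logger.exception("configuration at %s is malformed", path)
        return None
```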

Execution points are divided into logic branch logging and observing point logging. Logic branches switch into an associated branch depending on the condition of a code sequence; typical implementations use if or switch statements. Here, critical states can also be queried and monitored. Observing points describe the remaining scenarios that cannot be classified into any of the already defined categories.
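And the two kinds of execution-point logging in the same style (request, reject and process are again made up for the example):

```python
import logging

logger = logging.getLogger(__name__)

def handle(request):
    # logic branch logging: the if condition is the activation code
    if request.retries > 3:
        logger.warning("request %s exceeded the retry limit", request.id)
        return reject(request)  # hypothetical helper

    # observing point logging: no special condition, we simply record progress
    logger.info("handling request %s", request.id)
    return process(request)  # hypothetical helper
```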

Depending on the project, developer, team or even company, the most frequently used groups can differ greatly. In one project logic-branch logging may be the most common pattern for activation code, in another exception logging may be on a par with return-value-check logging, or observing-point logging may dominate. This is the last dynamic that can be observed in the topic of logging.

in a nutshell

To come full circle: in a strategy, decisions must be made about which event logging characteristics an event fulfills. The combination of event types then in turn gives an overall, well-formed picture of how a system works. Per logging purpose it is then determined which variables and meta data are to be logged, which log level is assigned, and according to which specification, where and how the log statement is implemented (= the Lego® instructions). And finally, structured logging is the implementation of every decision that has been made within the logging strategy (= sorting the bricks).

Currently, there is no standard for structured logs. Instead of a free-text message, structured logs aim to contain typed, rich data so that complex parsing is no longer needed. There have been a few attempts to formalize a structured log format, such as Common Event Expression and Project Lumberjack. However, none of these has caught on, and they are no longer under active development. Logging frameworks provide sufficient tools to implement the structure defined in a logging strategy and should be used. Stay tuned for my next blog entry, in which I am going to discuss log formats and logging frameworks in more detail.

Thanks for reading! If you are interested in more or if you want to support me, I would be glad to get a free drink 😚

https://ko-fi.com/ninjeanne
