
how we should practice logging (part I)

a proposal to use log data in a modern way

the idea behind logging

A log is, generally speaking, a record; in computer science specifically, it is a record of what is happening on a computer. This record is one of the possible ways to monitor a system. Once our software goes live, we no longer have a debugger attached. Logging, then, lets us look inside our running applications and their internal events.

Throughout my planned blog series, I want to focus on so-called event logs (not audit, query, transaction, or other types of logs). Event logs collect the events and operations of a system under real load so that developers can better understand its internal processes. Events can include accesses and operations in software and applications, on servers or consoles, but also anomalies and consequently incorrect system behavior. Each event is recorded as a unique line within a log and is called a log entry.

In the past, such an entry was simply built from a timestamp and a free-text message.

2021-03-21 12:34:56 +0200 I am an event!
2021-03-21 12:34:57 +0200 I represent something different..

But some events are more critical than others, and developers need to react more or less frequently to their occurrences. So, to distinguish them, a log level became an important part of a log entry. Some levels are verbose (TRACE and DEBUG), some are informative or cautionary (INFO and WARN), and others demand immediate action (ERROR and FATAL).

2021-03-21 12:34:56 +0200 ERROR I am a critical event!
2021-03-21 12:34:57 +0200 INFO I want to inform about sth. like a startup
2021-03-21 12:34:58 +0200 WARN sth. unexpected but uncritical happened
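In code, level-based filtering looks roughly like this. A minimal sketch using Python's standard logging module, purely for illustration (the logger name and in-memory stream are just for the demo):

```python
import io
import logging

# Send the log output to an in-memory stream so we can inspect it.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

logger = logging.getLogger("demo")
logger.addHandler(handler)
logger.setLevel(logging.WARNING)  # suppress the verbose DEBUG/INFO levels

logger.debug("too verbose, filtered out")  # below WARNING: not emitted
logger.warning("sth. unexpected but uncritical happened")
logger.error("I am a critical event!")

print(stream.getvalue())
```

Setting the threshold to WARNING means the verbose levels never even reach the log, which is one way frameworks let us control how much ends up in an entry stream.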

What happens if an entry is no longer up to date, or you want to find the cause of an event in your code? It is quite difficult to locate the source of a message once it has been generated, especially if you don't have (good) search functionality in your IDE. Therefore, some information about the source is useful metadata and is often part of an entry as well.

2021-03-21 12:34:56 +0200 ERROR [package.class.method:12] I am a critical event!
2021-03-21 12:34:57 +0200 INFO [package.class.method:34] I want to inform about sth. like a startup

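Logging frameworks can attach such source information automatically. A sketch with Python's standard logging module, whose formatter exposes the module, function, and line number of the calling statement (again, names are just for the demo):

```python
import io
import logging

# Format roughly mirroring the [package.class.method:12] part above.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(
    "%(asctime)s %(levelname)s [%(module)s.%(funcName)s:%(lineno)d] %(message)s"))

logger = logging.getLogger("located")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

def startup():
    # The framework records where this call happens, not what we type.
    logger.info("I want to inform about sth. like a startup")

startup()
print(stream.getvalue())
```

Because the framework fills in the location, the metadata stays correct even when the surrounding code moves, which is exactly what a hand-written prefix would not guarantee.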
I will discuss the common content and structure of a log entry in more detail in a subsequent blog article. For now, notice that log entries have become too long to fit on one line. You might have noticed this already in the second example or, depending on your screen size, you might not even see the line break I am talking about. But you can't assume that everyone reading your messages has your screen size. A line break is unpleasant to read and consequently makes it harder to spot (critical) anomalous system behavior. So, on the one hand, a log entry needs to be concise, but on the other hand it has to carry enough metadata to locate its source. The free-text message needs to be as expressive as possible, too. There is thus a tension between logging too much and logging too little.

how we analyze logs

What we see in today's IT companies is that system environments are getting bigger due to more services and load-balanced servers. Logs from distributed servers are still accessed via consoles and opened in multiple tabs, with search filters copied between them. One might argue that centralizing log data is the logical conclusion to manage the huge amount of data better, but that alone doesn't solve the problem. Just imagine how much more metadata you need to add to trace log entries (= lines) back to their source and to recombine the context of an event with other events that might help explain a specific occurrence. Centralization does remove the need to log on to different systems and copy search filters around, but it also enlarges the search space in which anomalous system behavior is hidden.

However, reducing the number of log entries arbitrarily and aimlessly is not advisable either, since it makes post-mortem analysis, i.e. the investigation of the causes of failures, more difficult. A balance must be found: too many entries stress processing performance and the resources required to persist them. Accordingly, variables and metadata (for now: the structured components of an event) must also be defined in such a way that analyzing log data takes little time and leads to a beneficial, effective result that can be interpreted further.

Even at the level of a single log statement, i.e. per entry, developers must make well-considered decisions, since this is where they determine which information they will be able to view in the future. Logging frameworks support this with a basic structure for a log entry and can enrich it with plenty of metadata. On the technical side, however, different frameworks agree on no uniform timestamp format, no fixed order of the metadata, and no specification of which metadata must appear in an entry, to name just a few examples. Common log formats and frameworks, furthermore, do not prescribe how developers should phrase the free-text messages that describe events, or how to implement the various forms of key-value representations within them. Scientific work on a uniform, overall recommendation for logging is quite rare, and there is still no standardization of the procedures and the resulting structure of log entries. A missing specification means chaos. Let me explain why I think so.

the natural language of humans

One of the largest sources of mistakes, both in logging and in the analysis of logs, is the human being. The message in a log entry is the most important information for understanding the cause of an event. Log statements can enhance the readability of source code and promote communication between developers. In the spirit of clean code, however, the free-text message in a log statement must be viewed as a contract (the same holds for code comments). If sequences or attributes in the source code change, then it is obligatory to also check the statement and adjust the message if necessary. Otherwise misinterpretations occur that can be traced back to bad log statements, for example ones with missing or outdated information. Finding and updating the source of an entry often requires considerable effort, and the developers who investigate causes are usually not the ones who implemented the statements or methods.

But outdated statements are not the only thing that makes searching for anomalous system behavior difficult. A free-text message varies strongly in its expressiveness, depending on the developer who wrote it in natural human language. During log analysis, interpreting this natural language may require unnecessary effort and experience, and a misleading or incomprehensible description can lead to wrong decisions when identifying and fixing faulty behavior. It is even harder to prepare a free-text message in such a way that machines can interpret the content correctly. On top of that, developers arrange their messages differently, use punctuation and separators differently, and highlight particular variables almost at random.

If new unwanted events occur, developers sometimes resort to ad-hoc methods, i.e. log mechanisms are added as needed over time. This can lead to a situation where less relevant entries dominate the log.

This non-deterministic behavior of developers can be explained by the fact that they usually lack defined requirements according to which they should practice logging. In addition, they lack an overview of the underlying system, not least because they usually work only on parts of it. Developers thus decide according to their own experience, understanding, and judgment. Without proper experts, without standards and guidelines, without an agreed structure for a message, and with too much arbitrary data as a result, maintaining software and its logging becomes impracticable. The lack of strict specifications for logging behavior makes decisions highly dependent on the domain knowledge of individual developers. Different teams follow independent approaches, and each individual team member can build up their own exclusive knowledge about logging practices.

preventing human error

So we need to make logging more well-defined, and we should stop depending on gut feeling when practicing it. How should we start?

There are technologies that automatically transform texts into tokens based on rules (tokenizers/lexers). Transforming texts (here: logs) into usable, new data structures is called (log) parsing. This reduces each log entry to the state it represents, in a form that can be processed further. You might recognize this from how compilers work.
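For the structured parts of an entry, such parsing is straightforward. A minimal sketch in Python, assuming entries shaped like the examples earlier in this article (the pattern and field names are my own, not from any particular framework):

```python
import re

# Hypothetical pattern for entries like:
# "2021-03-21 12:34:56 +0200 ERROR [package.class.method:12] I am a critical event!"
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4}) "
    r"(?P<level>[A-Z]+) "
    r"\[(?P<source>[^\]]+)\] "
    r"(?P<message>.*)"
)

def parse_entry(line: str):
    """Tokenize one log line into its fields, or return None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

entry = parse_entry(
    "2021-03-21 12:34:56 +0200 ERROR [package.class.method:12] I am a critical event!")
```

Timestamp, level, and source fall out cleanly because they follow a fixed grammar; note that the message field is still just an opaque string.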

However, free-text messages are still problematic and can't "just" be parsed. There is no formal language that can map natural human language with a finite number of rules and thus always correctly assign all variables or metadata within a message. This means developers would be forced to resort to an arbitrary number of individual regular expressions to parse all messages. Even then, there is no guarantee that special cases still unknown today will be covered in the future. In addition, log statements and their patterns change along with the underlying source code. The effect is an ever-growing, unstable, messy, monolithic parser that fails at the slightest change to a format. This overhead makes it much more difficult to extract diagnostic or operational value from log data afterwards and effectively reduces logging to absurdity. At that point, parsing only helps us separate the metadata from the actual text message.
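To see the brittleness concretely, imagine a parser written for one developer's message style; the same event phrased by a colleague already slips through (the message wordings are hypothetical):

```python
import re

# A parser tailored to messages like "user alice logged in from 10.0.0.1".
LOGIN_PATTERN = re.compile(r"user (\w+) logged in from (\S+)")

# It matches the style it was written for...
styled = LOGIN_PATTERN.match("user alice logged in from 10.0.0.1")

# ...but the very same event, phrased by another developer, is not matched:
restyled = LOGIN_PATTERN.match("Login OK; ip=10.0.0.1 | user=<alice>")
```

Every new phrasing demands another expression, which is exactly how the monolithic, unstable parser described above comes into being.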

the parsing of messy log data into something “usable”

But hope is not lost, of course. There is a better approach, and it requires some prep work. In the next parts, I am going to discuss how we can create high-quality log data, how we should use it, and how we will profit from clean and structured logging.

Thanks for reading! If you are interested in more or if you want to support me, I would be glad to get a free drink 😚

https://ko-fi.com/ninjeanne


24 y/o female Software Developer | dreaming of becoming a data scientist and a world without language barriers | I 💚 caffeine ko-fi.com/ninjeanne
