Service Logging Best Practices: A Complete Guide to Production-Ready Logging

Comprehensive guide to implementing effective logging strategies for modern applications, covering structured logging, centralized systems, cloud solutions, and monitoring best practices.

Jongmin Lee
17 min read

Introduction

Logging is the backbone of modern application observability. When done right, it transforms debugging from guesswork into systematic problem-solving. When done wrong, it becomes noise that obscures real issues and wastes resources.

After years of managing production systems processing millions of requests, I’ve seen how proper logging can reduce incident resolution time from hours to minutes. This guide focuses on practical strategies that actually work in production environments.

Why Logging Strategy Matters

The Real Cost of Poor Logging

  • Debugging Time: Poor logs can turn a 10-minute fix into a 3-hour investigation
  • Incident Response: Without proper context, teams waste time gathering information instead of fixing issues
  • Compliance Risk: Missing audit trails can result in regulatory violations
  • Resource Waste: Excessive logging can consume significant infrastructure costs

What Good Logging Delivers

  • Rapid Problem Resolution: Clear context helps identify root causes quickly
  • Proactive Issue Detection: Patterns in logs reveal problems before they impact users
  • Business Intelligence: User behavior insights from application events
  • Operational Confidence: Teams can deploy and operate systems with visibility

Essential Logging Concepts

Log Levels: Your Information Hierarchy

Think of log levels as a filtering system that helps you find the right information at the right time:

ERROR - Something broke and needs immediate attention

  • Payment processing failures
  • Database connection errors
  • Unhandled exceptions

WARN - Something unusual happened but the system continues

  • API rate limits approaching
  • Retry attempts
  • Deprecated feature usage

INFO - Important business events worth tracking

  • User logins and logouts
  • Order completions
  • System startup/shutdown

DEBUG - Technical details for troubleshooting

  • Cache hits/misses
  • SQL query execution
  • Method entry/exit (use sparingly)
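
To make the hierarchy concrete, here is a minimal sketch of how those four levels map onto `ILogger` calls in ASP.NET Core (the service and parameter names are hypothetical):

using Microsoft.Extensions.Logging;

public class CheckoutService
{
    private readonly ILogger<CheckoutService> _logger;
    
    public CheckoutService(ILogger<CheckoutService> logger) => _logger = logger;
    
    public void IllustrateLevels(string orderId, int attempt)
    {
        _logger.LogError("Payment processing failed for {OrderId}", orderId);          // broke, act now
        _logger.LogWarning("Retry attempt {Attempt} for {OrderId}", attempt, orderId); // unusual, still running
        _logger.LogInformation("Order {OrderId} completed", orderId);                  // business event
        _logger.LogDebug("Cache miss while pricing {OrderId}", orderId);               // troubleshooting detail
    }
}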

Structured vs Unstructured Logging

The difference between good and great logging often comes down to structure:

Unstructured (Hard to Query):

"User john@example.com ordered 3 items totaling $99.99"

Structured (Easy to Query):

{
  "message": "Order completed",
  "userId": "john@example.com",
  "itemCount": 3,
  "total": 99.99,
  "timestamp": "2024-12-20T10:30:15Z"
}

With structured logs, you can easily answer questions like:

  • “Show me all orders over $50 from the last hour”
  • “Which users are placing the most orders?”
  • “What’s our average order value by day?”
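
With Serilog (used in the examples below), the structured version is a single call; each named placeholder becomes a queryable field:

using Serilog;

// Emits an event with UserId, ItemCount, and Total as separate structured fields.
Log.Information("Order completed for {UserId}: {ItemCount} items totaling {Total}",
    "john@example.com", 3, 99.99m);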

What Information Should You Log?

Critical Business Events

These are the events that matter to your business and operations team:

User Actions:

  • Login/logout events with user context
  • Feature usage and engagement metrics
  • Purchase completions and failures
  • Account changes and security events

System Operations:

  • Application startup and shutdown
  • Configuration changes
  • Scheduled job executions
  • External service integrations

Performance Indicators:

  • Response times for critical operations
  • Database query performance
  • Cache hit/miss ratios
  • Resource utilization patterns
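
For the performance bucket, a common pattern is to time the operation and log the elapsed milliseconds as a structured field so it can be aggregated into percentiles later (a sketch; the query name is hypothetical):

using System.Diagnostics;
using Serilog;

var stopwatch = Stopwatch.StartNew();
// ... execute the database query or external call here ...
stopwatch.Stop();

// ElapsedMs becomes a numeric field you can filter and aggregate on.
Log.Information("Query {QueryName} completed in {ElapsedMs} ms",
    "GetOrdersByUser", stopwatch.ElapsedMilliseconds);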

Essential Context Information

Every log entry should include enough context to be actionable:

Request Context:

  • Correlation/Request ID for tracing
  • User ID (when available)
  • Session information
  • Client IP and user agent

Operational Context:

  • Service name and version
  • Environment (dev/staging/prod)
  • Server/container identifier
  • Timestamp with timezone

Business Context:

  • Operation being performed
  • Key business entities (order ID, product ID)
  • Transaction amounts or quantities
  • Success/failure indicators
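
One way to attach this context without repeating it in every call is Serilog's `LogContext`, which stamps every event written inside a scope (a sketch; it assumes `.Enrich.FromLogContext()` is enabled, as in the setup below):

using Serilog;
using Serilog.Context;

var correlationId = Guid.NewGuid().ToString();

// Every event written inside these scopes carries both properties automatically.
using (LogContext.PushProperty("CorrelationId", correlationId))
using (LogContext.PushProperty("UserId", "user-456"))
{
    Log.Information("Processing {Operation} for order {OrderId}", "Checkout", "order-789");
}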

Simple Implementation Example

1. Logger Setup (Program.cs)

// Configure Serilog in Program.cs
Log.Logger = new LoggerConfiguration()
    .MinimumLevel.Information()
    .Enrich.FromLogContext()
    .Enrich.WithProperty("Service", "OrderService")
    .WriteTo.Console()
    // The "-" before ".log" becomes the date in rolled files: logs/app-20241220.log
    .WriteTo.File("logs/app-.log", rollingInterval: RollingInterval.Day)
    .CreateLogger();

builder.Host.UseSerilog();

2. Logger Helper Class

public class StructuredLogger
{
    private readonly ILogger _logger;
    
    public StructuredLogger(ILogger logger)
    {
        _logger = logger;
    }
    
    public void LogInfo(string message, object context = null)
    {
        _logger.LogInformation("{Message} {@Context}", message, context);
    }
    
    public void LogError(string message, Exception error, object context = null)
    {
        _logger.LogError(error, "{Message} {@Context}", message, context);
    }
}
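
To inject this helper, it needs a DI registration. Note that Microsoft's container registers `ILogger<T>` but not the non-generic `ILogger`, so one option (an assumption, not shown in the original) is to hand it a typed logger explicitly:

// In Program.cs: resolve a typed logger for the helper, since the
// non-generic ILogger is not registered by default.
builder.Services.AddSingleton(sp =>
    new StructuredLogger(sp.GetRequiredService<ILogger<StructuredLogger>>()));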

3. Using the Logger in Services

public class OrderService
{
    private readonly StructuredLogger _logger;
    
    public OrderService(StructuredLogger logger)
    {
        _logger = logger;
    }
    
    public async Task<Order> ProcessOrderAsync(OrderData orderData, string userId)
    {
        var correlationId = Guid.NewGuid().ToString();
        
        _logger.LogInfo("Processing order started", new
        {
            CorrelationId = correlationId,
            UserId = userId,
            ItemCount = orderData.Items.Count
        });
        
        try
        {
            var result = await CreateOrderAsync(orderData);
            
            _logger.LogInfo("Order processed successfully", new
            {
                CorrelationId = correlationId,
                OrderId = result.Id,
                Total = result.Total
            });
            
            return result;
        }
        catch (Exception ex)
        {
            _logger.LogError("Order processing failed", ex, new
            {
                CorrelationId = correlationId,
                UserId = userId
            });
            throw;
        }
    }
}

Centralized Logging: Bringing It All Together

Why Centralize Your Logs?

When you have multiple services, centralized logging becomes essential:

  • Single Source of Truth: All logs in one searchable location
  • Cross-Service Correlation: Follow a user request across multiple services
  • Unified Alerting: Set up alerts across your entire system
  • Simplified Operations: One place to search, one place to monitor

ELK Stack (Elasticsearch, Logstash, Kibana)

  • Pros: Powerful search, flexible, open source
  • Cons: Complex setup, resource intensive
  • Best for: Large teams with dedicated DevOps resources

Cloud Solutions

  • AWS CloudWatch: Integrated with AWS services, simple setup
  • Google Cloud Logging: Great for GCP environments
  • Azure Application Insights: Excellent for .NET applications
  • Datadog/New Relic: Full-featured but paid solutions

Lightweight Options

  • Grafana Loki: Prometheus-inspired, cost-effective
  • Fluentd: Flexible data collection and forwarding
  • Vector: High-performance observability data pipeline

Getting Started with Centralized Logging

Step 1: Choose Your Stack

Start simple. If you’re on AWS, use CloudWatch. If you’re using Docker, try the ELK stack with Docker Compose.

Step 2: Standardize Your Log Format

Ensure all services use the same JSON structure:

{
  "timestamp": "2024-12-20T10:30:15Z",
  "level": "INFO",
  "service": "order-service",
  "message": "Order processed successfully",
  "correlationId": "abc-123-def",
  "userId": "user-456",
  "orderId": "order-789"
}
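
With Serilog, one way to get a consistent JSON shape is a JSON formatter on every sink (a sketch assuming the Serilog.Formatting.Compact package; the formatter uses its own field names, e.g. @t for timestamp, rather than matching the example above exactly):

using Serilog;
using Serilog.Formatting.Compact;

Log.Logger = new LoggerConfiguration()
    .Enrich.FromLogContext()
    .Enrich.WithProperty("service", "order-service")
    .WriteTo.Console(new CompactJsonFormatter()) // one JSON object per event
    .CreateLogger();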

Step 3: Implement Correlation IDs

Middleware Setup

public class CorrelationIdMiddleware
{
    private readonly RequestDelegate _next;
    
    public CorrelationIdMiddleware(RequestDelegate next)
    {
        _next = next;
    }
    
    public async Task InvokeAsync(HttpContext context)
    {
        // Reuse the caller's ID when present; otherwise start a new trace.
        var correlationId = context.Request.Headers["X-Correlation-ID"]
            .FirstOrDefault() ?? Guid.NewGuid().ToString();
        
        // Indexer assignment avoids the exception Headers.Add throws for duplicate keys.
        context.Response.Headers["X-Correlation-ID"] = correlationId;
        
        // LogContext requires: using Serilog.Context;
        using (LogContext.PushProperty("CorrelationId", correlationId))
        {
            await _next(context);
        }
    }
}

Register Middleware

// In Program.cs
app.UseMiddleware<CorrelationIdMiddleware>();
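
For the correlation ID to survive hops between services, outgoing HTTP calls also need to forward the header. A minimal sketch using a DelegatingHandler (the handler name and registration are assumptions, not part of the original):

// Hypothetical handler that copies the inbound correlation ID onto outbound requests.
public class CorrelationIdHandler : DelegatingHandler
{
    private readonly IHttpContextAccessor _accessor;
    
    public CorrelationIdHandler(IHttpContextAccessor accessor) => _accessor = accessor;
    
    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var correlationId = _accessor.HttpContext?.Request
            .Headers["X-Correlation-ID"].FirstOrDefault();
        
        if (!string.IsNullOrEmpty(correlationId))
            request.Headers.TryAddWithoutValidation("X-Correlation-ID", correlationId);
        
        return base.SendAsync(request, cancellationToken);
    }
}

Registration would pair builder.Services.AddHttpContextAccessor() and AddTransient<CorrelationIdHandler>() with AddHttpMessageHandler<CorrelationIdHandler>() on the relevant HttpClient.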

Security and Sensitive Data

Never Log These Items

Personal Information:

  • Credit card numbers, CVV codes
  • Social Security Numbers
  • Passwords or password hashes
  • API keys and tokens
  • Personal addresses and phone numbers

Business Sensitive Data:

  • Internal pricing information
  • Proprietary algorithms or business logic
  • Customer financial details
  • Confidential business metrics

Safe Logging Practices

Data Sanitization Helper

public class DataSanitizer
{
    // Case-insensitive so "Password" and "password" are both caught.
    private readonly HashSet<string> _sensitiveFields = new(StringComparer.OrdinalIgnoreCase)
    {
        "password", "creditCard", "ssn", "token", "secret", "key"
    };
    
    // Useful when filtering arbitrary dictionaries or form payloads.
    public bool IsSensitive(string fieldName) => _sensitiveFields.Contains(fieldName);
    
    public object SanitizeUserData(UserData userData)
    {
        return new
        {
            UserId = userData.Id,
            Email = MaskEmail(userData.Email),
            Action = userData.Action
            // Sensitive fields are deliberately excluded
        };
    }
    
    private string MaskEmail(string email)
    {
        if (string.IsNullOrEmpty(email)) return email;
        var parts = email.Split('@');
        if (parts.Length != 2) return email;
        
        // Show at most the first three characters of the local part,
        // so short addresses don't throw on Substring.
        var visible = Math.Min(3, parts[0].Length);
        return $"{parts[0].Substring(0, visible)}***@{parts[1]}";
    }
}

Usage Example

var sanitizer = new DataSanitizer();
_logger.LogInformation("User action completed {@User}", 
    sanitizer.SanitizeUserData(user));

Compliance Considerations:

  • GDPR: Be careful with EU user data
  • PCI DSS: Never log payment card information
  • HIPAA: Healthcare data requires special handling
  • SOX: Financial data has strict requirements

Monitoring and Alerting

Key Metrics to Monitor

Error Rates:

  • Overall error percentage
  • Errors by service/endpoint
  • Error trends over time
  • Critical vs non-critical errors

Performance Indicators:

  • Response time percentiles (p50, p95, p99)
  • Throughput (requests per minute)
  • Database query performance
  • External service response times

Business Metrics:

  • User signup/login rates
  • Transaction completion rates
  • Feature adoption metrics
  • Revenue-impacting events

Setting Up Effective Alerts

Alert on Patterns, Not Individual Events:

❌ Alert on every single error
✅ Alert when error rate > 5% for 5 minutes

Use Meaningful Thresholds:

❌ Alert when response time > 100ms
✅ Alert when p95 response time > 2 seconds for 10 minutes

Include Context in Alerts:

{
  "alert": "High Error Rate",
  "service": "payment-service",
  "current_rate": "8.5%",
  "threshold": "5%",
  "duration": "7 minutes",
  "runbook": "https://wiki.company.com/payment-service-errors"
}

Simple Log-Based Queries

Find Payment Failures:

level:ERROR AND service:payment AND message:*failed*

Track User Journey:

correlationId:"abc-123-def" | sort timestamp

Monitor API Performance:

message:*response_time* AND response_time:>2000

Performance Optimization Tips

Async Logging Configuration

// Requires the Serilog.Sinks.Async package; writes happen on a background worker.
Log.Logger = new LoggerConfiguration()
    .WriteTo.Async(a => a.File("logs/app.log"))
    .WriteTo.Async(a => a.Console())
    .CreateLogger();

Log Sampling for High Volume

public class SamplingLogger
{
    private readonly ILogger _logger;
    
    public SamplingLogger(ILogger logger)
    {
        _logger = logger;
    }
    
    public void LogWithSampling(LogLevel level, string message, object context = null)
    {
        // Always log warnings and errors in full
        if (level >= LogLevel.Warning)
        {
            _logger.Log(level, "{Message} {@Context}", message, context);
            return;
        }
        
        // Sample lower levels: keep 10% of INFO and 1% of DEBUG/TRACE
        var samplingRate = level == LogLevel.Information ? 0.1 : 0.01;
        
        // Random.Shared is thread-safe, unlike a shared Random instance
        if (Random.Shared.NextDouble() < samplingRate)
        {
            _logger.Log(level, "{Message} {@Context}", message, context);
        }
    }
}

Structured Data: Structured fields let your log backend filter and aggregate directly, which is faster and more reliable than free-text searching.

Common Pitfalls and How to Avoid Them

Over-Logging

Problem: Logging every method entry/exit or minor operations
Solution: Focus on business events and error conditions

Under-Logging

Problem: Not enough context when issues occur
Solution: Include correlation IDs, user context, and operation details

Inconsistent Formats

Problem: Different services use different log structures
Solution: Establish team-wide logging standards and templates

Performance Impact

Problem: Synchronous logging slowing down applications
Solution: Use asynchronous logging libraries and consider sampling

Security Violations

Problem: Accidentally logging sensitive data
Solution: Implement data sanitization and regular log audits

Implementation Roadmap

Phase 1: Foundation (Week 1-2)

  1. Standardize Log Levels: Define when to use each level across your team
  2. Implement Structured Logging: Convert string concatenation to structured format
  3. Add Correlation IDs: Implement request tracing across services
  4. Security Review: Audit existing logs for sensitive data exposure

Phase 2: Centralization (Week 3-4)

  1. Choose Your Stack: Select centralized logging solution (ELK, cloud, etc.)
  2. Set Up Collection: Configure log forwarding from all services
  3. Create Dashboards: Build basic monitoring views
  4. Define Retention: Establish log storage and cleanup policies

Phase 3: Monitoring (Week 5-6)

  1. Key Metrics: Identify critical business and technical metrics
  2. Alert Rules: Create actionable alerts with proper thresholds
  3. Runbooks: Document response procedures for common issues
  4. Test Alerts: Verify alert delivery and response procedures

Phase 4: Optimization (Ongoing)

  1. Performance Tuning: Implement async logging and sampling
  2. Cost Management: Monitor and optimize log storage costs
  3. Team Training: Educate developers on logging best practices
  4. Continuous Improvement: Regular review and refinement

Quick Reference Guide

Essential Do’s and Don’ts

✅ Always Do:

  • Use structured logging (JSON format)
  • Include correlation IDs for request tracing
  • Sanitize sensitive data before logging
  • Log business events and errors with context
  • Set up centralized log collection
  • Create actionable alerts, not noise

❌ Never Do:

  • Log passwords, credit cards, or personal data
  • Use string concatenation for log messages
  • Over-log (every method entry/exit)
  • Ignore logging performance impact
  • Create logs without sufficient context
  • Mix different log formats across services

Implementation Checklist

Week 1-2: Foundation

  • Standardize log levels across team
  • Implement structured logging format
  • Add correlation ID to all requests
  • Audit logs for sensitive data exposure

Week 3-4: Centralization

  • Set up centralized logging system
  • Configure log forwarding from all services
  • Create basic monitoring dashboards
  • Define log retention policies

Week 5-6: Monitoring

  • Identify key business and technical metrics
  • Create actionable alert rules
  • Document incident response procedures
  • Test alert delivery and escalation

Ongoing: Optimization

  • Monitor logging performance impact
  • Optimize log storage costs
  • Train team on logging best practices
  • Regular review and improvement

Conclusion

Good logging is invisible when everything works and invaluable when things break. It’s the difference between spending 10 minutes fixing an issue and spending 3 hours trying to understand what went wrong.

The key is to start simple and iterate. You don’t need a perfect logging system from day one. Begin with structured logging and correlation IDs, then gradually add centralized collection, monitoring, and alerting as your needs grow.

Remember: logs are for humans. Make them readable, searchable, and actionable. Your future self (and your teammates) will thank you when you’re troubleshooting a production issue at 2 AM.

The investment in proper logging pays off quickly in reduced debugging time, faster incident resolution, and increased confidence in your systems. Start today, and build the observability foundation your applications deserve.