Service Logging Best Practices: A Complete Guide to Production-Ready Logging
Comprehensive guide to implementing effective logging strategies for modern applications, covering structured logging, centralized systems, cloud solutions, and monitoring best practices.
Introduction
Logging is the backbone of modern application observability. When done right, it transforms debugging from guesswork into systematic problem-solving. When done wrong, it becomes noise that obscures real issues and wastes resources.
After years of managing production systems processing millions of requests, I’ve seen how proper logging can reduce incident resolution time from hours to minutes. This guide focuses on practical strategies that actually work in production environments.
Why Logging Strategy Matters
The Real Cost of Poor Logging
- Debugging Time: Poor logs can turn a 10-minute fix into a 3-hour investigation
- Incident Response: Without proper context, teams waste time gathering information instead of fixing issues
- Compliance Risk: Missing audit trails can result in regulatory violations
- Resource Waste: Excessive logging can drive significant infrastructure costs
What Good Logging Delivers
- Rapid Problem Resolution: Clear context helps identify root causes quickly
- Proactive Issue Detection: Patterns in logs reveal problems before they impact users
- Business Intelligence: User behavior insights from application events
- Operational Confidence: Teams can deploy and operate systems with visibility
Essential Logging Concepts
Log Levels: Your Information Hierarchy
Think of log levels as a filtering system that helps you find the right information at the right time:
ERROR - Something broke and needs immediate attention
- Payment processing failures
- Database connection errors
- Unhandled exceptions
WARN - Something unusual happened but the system continues
- API rate limits approaching
- Retry attempts
- Deprecated feature usage
INFO - Important business events worth tracking
- User logins and logouts
- Order completions
- System startup/shutdown
DEBUG - Technical details for troubleshooting
- Cache hits/misses
- SQL query execution
- Method entry/exit (use sparingly)
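To make these levels concrete, here is a minimal sketch of how they map to code with Microsoft.Extensions.Logging (the message templates and variable names are illustrative, not from a specific codebase):

using Microsoft.Extensions.Logging;

public class LevelExamples
{
    private readonly ILogger<LevelExamples> _logger;

    public LevelExamples(ILogger<LevelExamples> logger) => _logger = logger;

    public void Demonstrate(Exception paymentError, string orderId, string userId, string cacheKey)
    {
        // ERROR: something broke and needs immediate attention
        _logger.LogError(paymentError, "Payment processing failed for order {OrderId}", orderId);

        // WARN: unusual, but the system continues
        _logger.LogWarning("API rate limit approaching for user {UserId}", userId);

        // INFO: business event worth tracking
        _logger.LogInformation("User {UserId} logged in", userId);

        // DEBUG: technical detail for troubleshooting
        _logger.LogDebug("Cache miss for key {CacheKey}", cacheKey);
    }
}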
Structured vs Unstructured Logging
The difference between good and great logging often comes down to structure:
Unstructured (Hard to Query):
"User john@example.com ordered 3 items totaling $99.99"
Structured (Easy to Query):
{
  "message": "Order completed",
  "userId": "john@example.com",
  "itemCount": 3,
  "total": 99.99,
  "timestamp": "2024-12-20T10:30:15Z"
}
With structured logs, you can easily answer questions like:
- “Show me all orders over $50 from the last hour”
- “Which users are placing the most orders?”
- “What’s our average order value by day?”
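Using the same query style as the examples later in this guide (exact syntax varies by logging backend), the first of these questions might look like:

message:"Order completed" AND total:>50 AND timestamp:>now-1h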
What Information Should You Log?
Critical Business Events
These are the events that matter to your business and operations team:
User Actions:
- Login/logout events with user context
- Feature usage and engagement metrics
- Purchase completions and failures
- Account changes and security events
System Operations:
- Application startup and shutdown
- Configuration changes
- Scheduled job executions
- External service integrations
Performance Indicators:
- Response times for critical operations
- Database query performance
- Cache hit/miss ratios
- Resource utilization patterns
Essential Context Information
Every log entry should include enough context to be actionable:
Request Context:
- Correlation/Request ID for tracing
- User ID (when available)
- Session information
- Client IP and user agent
Operational Context:
- Service name and version
- Environment (dev/staging/prod)
- Server/container identifier
- Timestamp with timezone
Business Context:
- Operation being performed
- Key business entities (order ID, product ID)
- Transaction amounts or quantities
- Success/failure indicators
Simple Implementation Example
1. Logger Setup (Program.cs)
using Serilog;

// Configure Serilog in Program.cs
Log.Logger = new LoggerConfiguration()
    .MinimumLevel.Information()
    .Enrich.FromLogContext()
    .Enrich.WithProperty("Service", "OrderService")
    .WriteTo.Console()
    // Rolls daily; the date is inserted into the file name (e.g. app-20241220.log)
    .WriteTo.File("logs/app-.log", rollingInterval: RollingInterval.Day)
    .CreateLogger();

builder.Host.UseSerilog();
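If you use the Serilog.AspNetCore package, its request-logging middleware is also worth enabling: it condenses the framework's several per-request log events into a single structured summary event.

// In Program.cs, after building the app
app.UseSerilogRequestLogging();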
2. Logger Helper Class
using Microsoft.Extensions.Logging;

public class StructuredLogger
{
    private readonly ILogger<StructuredLogger> _logger;

    public StructuredLogger(ILogger<StructuredLogger> logger)
    {
        _logger = logger;
    }

    public void LogInfo(string message, object? context = null)
    {
        // {@Context} destructures the anonymous object into queryable fields
        _logger.LogInformation("{Message} {@Context}", message, context);
    }

    public void LogError(string message, Exception error, object? context = null)
    {
        _logger.LogError(error, "{Message} {@Context}", message, context);
    }
}
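To make the helper injectable, register it in Program.cs; with the ILogger<StructuredLogger> constructor above, the container supplies the inner logger automatically:

// In Program.cs
builder.Services.AddSingleton<StructuredLogger>();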
3. Using the Logger in Services
public class OrderService
{
    private readonly StructuredLogger _logger;

    public OrderService(StructuredLogger logger)
    {
        _logger = logger;
    }

    public async Task<Order> ProcessOrderAsync(OrderData orderData, string userId)
    {
        var correlationId = Guid.NewGuid().ToString();

        _logger.LogInfo("Processing order started", new
        {
            CorrelationId = correlationId,
            UserId = userId,
            ItemCount = orderData.Items.Count
        });

        try
        {
            var result = await CreateOrderAsync(orderData);

            _logger.LogInfo("Order processed successfully", new
            {
                CorrelationId = correlationId,
                OrderId = result.Id,
                Total = result.Total
            });

            return result;
        }
        catch (Exception ex)
        {
            _logger.LogError("Order processing failed", ex, new
            {
                CorrelationId = correlationId,
                UserId = userId
            });
            throw;
        }
    }
}
Centralized Logging: Bringing It All Together
Why Centralize Your Logs?
When you have multiple services, centralized logging becomes essential:
- Single Source of Truth: All logs in one searchable location
- Cross-Service Correlation: Follow a user request across multiple services
- Unified Alerting: Set up alerts across your entire system
- Simplified Operations: One place to search, one place to monitor
Popular Centralized Solutions
ELK Stack (Elasticsearch, Logstash, Kibana)
- Pros: Powerful search, flexible, open source
- Cons: Complex setup, resource intensive
- Best for: Large teams with dedicated DevOps resources
Cloud Solutions
- AWS CloudWatch: Integrated with AWS services, simple setup
- Google Cloud Logging: Great for GCP environments
- Azure Application Insights: Excellent for .NET applications
- Datadog/New Relic: Full-featured but paid solutions
Lightweight Options
- Grafana Loki: Prometheus-inspired, cost-effective
- Fluentd: Flexible data collection and forwarding
- Vector: High-performance observability data pipeline
Getting Started with Centralized Logging
Step 1: Choose Your Stack
Start simple. If you’re on AWS, use CloudWatch. If you’re using Docker, try the ELK stack with Docker Compose.
Step 2: Standardize Your Log Format
Ensure all services use the same JSON structure:
{
  "timestamp": "2024-12-20T10:30:15Z",
  "level": "INFO",
  "service": "order-service",
  "message": "Order processed successfully",
  "correlationId": "abc-123-def",
  "userId": "user-456",
  "orderId": "order-789"
}
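With Serilog, one way to get JSON output is the Serilog.Formatting.Compact package. Note that its CLEF format uses short built-in field names (@t for timestamp, @l for level, @m for message), so map your dashboards and queries accordingly:

using Serilog;
using Serilog.Formatting.Compact;

Log.Logger = new LoggerConfiguration()
    .Enrich.FromLogContext()
    .Enrich.WithProperty("service", "order-service")
    // One JSON object per line, ready for shipping to a central store
    .WriteTo.File(new CompactJsonFormatter(), "logs/app.json", rollingInterval: RollingInterval.Day)
    .CreateLogger();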
Step 3: Implement Correlation IDs
Middleware Setup
using Serilog.Context;

public class CorrelationIdMiddleware
{
    private readonly RequestDelegate _next;

    public CorrelationIdMiddleware(RequestDelegate next)
    {
        _next = next;
    }

    public async Task InvokeAsync(HttpContext context)
    {
        // Reuse the caller's correlation ID if present; otherwise mint one
        var correlationId = context.Request.Headers["X-Correlation-ID"]
            .FirstOrDefault() ?? Guid.NewGuid().ToString();

        // Echo it back so clients can correlate their own records
        // (indexer assignment avoids the duplicate-key exception Headers.Add can throw)
        context.Response.Headers["X-Correlation-ID"] = correlationId;

        // Attach it to every log event written during this request
        using (LogContext.PushProperty("CorrelationId", correlationId))
        {
            await _next(context);
        }
    }
}
Register Middleware
// In Program.cs
app.UseMiddleware<CorrelationIdMiddleware>();
Security and Sensitive Data
Never Log These Items
Personal Information:
- Credit card numbers, CVV codes
- Social Security Numbers
- Passwords or password hashes
- API keys and tokens
- Personal addresses and phone numbers
Business Sensitive Data:
- Internal pricing information
- Proprietary algorithms or business logic
- Customer financial details
- Confidential business metrics
Safe Logging Practices
Data Sanitization Helper
public class DataSanitizer
{
    // Field names to redact if you extend this to sanitize arbitrary payloads
    private static readonly HashSet<string> SensitiveFields = new(StringComparer.OrdinalIgnoreCase)
    {
        "password", "creditCard", "ssn", "token", "secret", "key"
    };

    public object SanitizeUserData(UserData userData)
    {
        // Project only the safe fields; sensitive ones are simply never included
        return new
        {
            UserId = userData.Id,
            Email = MaskEmail(userData.Email),
            Action = userData.Action
        };
    }

    private static string MaskEmail(string email)
    {
        if (string.IsNullOrEmpty(email)) return email;

        var parts = email.Split('@');
        if (parts.Length != 2) return email;

        // Guard against local parts shorter than three characters
        var visible = parts[0][..Math.Min(3, parts[0].Length)];
        return $"{visible}***@{parts[1]}";
    }
}
Usage Example
var sanitizer = new DataSanitizer();

_logger.LogInformation("User action completed {@User}",
    sanitizer.SanitizeUserData(user));
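If you're on Serilog, you can also enforce sanitization centrally with a destructuring policy, so even a stray {@UserData} in someone's log call can't serialize raw fields (the UserData shape here follows the example above):

using Serilog;

Log.Logger = new LoggerConfiguration()
    // Whenever a UserData instance is destructured, keep only these fields
    .Destructure.ByTransforming<UserData>(u => new { UserId = u.Id, Action = u.Action })
    .WriteTo.Console()
    .CreateLogger();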
Compliance Considerations:
- GDPR: Be careful with EU user data
- PCI DSS: Never log payment card information
- HIPAA: Healthcare data requires special handling
- SOX: Financial data has strict requirements
Monitoring and Alerting
Key Metrics to Monitor
Error Rates:
- Overall error percentage
- Errors by service/endpoint
- Error trends over time
- Critical vs non-critical errors
Performance Indicators:
- Response time percentiles (p50, p95, p99)
- Throughput (requests per minute)
- Database query performance
- External service response times
Business Metrics:
- User signup/login rates
- Transaction completion rates
- Feature adoption metrics
- Revenue-impacting events
Setting Up Effective Alerts
Alert on Patterns, Not Individual Events:
❌ Alert on every single error
✅ Alert when error rate > 5% for 5 minutes
Use Meaningful Thresholds:
❌ Alert when response time > 100ms
✅ Alert when p95 response time > 2 seconds for 10 minutes
Include Context in Alerts:
{
  "alert": "High Error Rate",
  "service": "payment-service",
  "current_rate": "8.5%",
  "threshold": "5%",
  "duration": "7 minutes",
  "runbook": "https://wiki.company.com/payment-service-errors"
}
Simple Log-Based Queries
Find Payment Failures:
level:ERROR AND service:payment AND message:*failed*
Track User Journey:
correlationId:"abc-123-def" | sort timestamp
Monitor API Performance:
message:*response_time* AND response_time:>2000
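That last query only returns results if your services emit response times as structured fields in the first place. Here is a minimal sketch using Stopwatch (the class name and field names are placeholders; align them with whatever your queries expect):

using System.Diagnostics;
using Microsoft.Extensions.Logging;

public class TimedOperations
{
    private readonly ILogger<TimedOperations> _logger;

    public TimedOperations(ILogger<TimedOperations> logger) => _logger = logger;

    public async Task<T> TimeAsync<T>(string operationName, Func<Task<T>> operation)
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
            return await operation();
        }
        finally
        {
            stopwatch.Stop();
            // ResponseTime becomes a numeric, queryable field in structured logs
            _logger.LogInformation("Operation {Operation} completed in {ResponseTime} ms",
                operationName, stopwatch.ElapsedMilliseconds);
        }
    }
}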
Performance Optimization Tips
Async Logging Configuration
// Requires the Serilog.Sinks.Async package; writes happen on a background worker
Log.Logger = new LoggerConfiguration()
    .WriteTo.Async(a => a.File("logs/app.log"))
    .WriteTo.Async(a => a.Console())
    .CreateLogger();
Log Sampling for High Volume
using Microsoft.Extensions.Logging;

public class SamplingLogger
{
    private readonly ILogger _logger;

    public SamplingLogger(ILogger logger)
    {
        _logger = logger;
    }

    public void LogWithSampling(LogLevel level, string message, object? context = null)
    {
        // Always log errors and warnings
        if (level >= LogLevel.Warning)
        {
            _logger.Log(level, "{Message} {@Context}", message, context);
            return;
        }

        // Sample info logs at 10% and debug/trace at 1%
        var samplingRate = level == LogLevel.Information ? 0.1 : 0.01;

        // Random.Shared is thread-safe, unlike a shared Random instance
        if (Random.Shared.NextDouble() < samplingRate)
        {
            _logger.Log(level, "{Message} {@Context}", message, context);
        }
    }
}
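Usage is then a one-liner for high-volume, low-value events (the logger and cache key here are placeholders):

var sampler = new SamplingLogger(logger);
sampler.LogWithSampling(LogLevel.Information, "Cache hit", new { Key = cacheKey });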
Structured Data: Use structured logging to make queries faster and more reliable than text-based searches.
Common Pitfalls and How to Avoid Them
Over-Logging
Problem: Logging every method entry/exit or minor operations.
Solution: Focus on business events and error conditions.
Under-Logging
Problem: Not enough context when issues occur.
Solution: Include correlation IDs, user context, and operation details.
Inconsistent Formats
Problem: Different services use different log structures.
Solution: Establish team-wide logging standards and templates.
Performance Impact
Problem: Synchronous logging slowing down applications.
Solution: Use asynchronous logging libraries and consider sampling.
Security Violations
Problem: Accidentally logging sensitive data.
Solution: Implement data sanitization and regular log audits.
Implementation Roadmap
Phase 1: Foundation (Week 1-2)
- Standardize Log Levels: Define when to use each level across your team
- Implement Structured Logging: Convert string concatenation to structured format
- Add Correlation IDs: Implement request tracing across services
- Security Review: Audit existing logs for sensitive data exposure
Phase 2: Centralization (Week 3-4)
- Choose Your Stack: Select centralized logging solution (ELK, cloud, etc.)
- Set Up Collection: Configure log forwarding from all services
- Create Dashboards: Build basic monitoring views
- Define Retention: Establish log storage and cleanup policies
Phase 3: Monitoring (Week 5-6)
- Key Metrics: Identify critical business and technical metrics
- Alert Rules: Create actionable alerts with proper thresholds
- Runbooks: Document response procedures for common issues
- Test Alerts: Verify alert delivery and response procedures
Phase 4: Optimization (Ongoing)
- Performance Tuning: Implement async logging and sampling
- Cost Management: Monitor and optimize log storage costs
- Team Training: Educate developers on logging best practices
- Continuous Improvement: Regular review and refinement
Quick Reference Guide
Essential Do’s and Don’ts
✅ Always Do:
- Use structured logging (JSON format)
- Include correlation IDs for request tracing
- Sanitize sensitive data before logging
- Log business events and errors with context
- Set up centralized log collection
- Create actionable alerts, not noise
❌ Never Do:
- Log passwords, credit cards, or personal data
- Use string concatenation for log messages
- Over-log (every method entry/exit)
- Ignore logging performance impact
- Create logs without sufficient context
- Mix different log formats across services
Implementation Checklist
Week 1-2: Foundation
- Standardize log levels across team
- Implement structured logging format
- Add correlation ID to all requests
- Audit logs for sensitive data exposure
Week 3-4: Centralization
- Set up centralized logging system
- Configure log forwarding from all services
- Create basic monitoring dashboards
- Define log retention policies
Week 5-6: Monitoring
- Identify key business and technical metrics
- Create actionable alert rules
- Document incident response procedures
- Test alert delivery and escalation
Ongoing: Optimization
- Monitor logging performance impact
- Optimize log storage costs
- Train team on logging best practices
- Regular review and improvement
Conclusion
Good logging is invisible when everything works and invaluable when things break. It’s the difference between spending 10 minutes fixing an issue and spending 3 hours trying to understand what went wrong.
The key is to start simple and iterate. You don’t need a perfect logging system from day one. Begin with structured logging and correlation IDs, then gradually add centralized collection, monitoring, and alerting as your needs grow.
Remember: logs are for humans. Make them readable, searchable, and actionable. Your future self (and your teammates) will thank you when you’re troubleshooting a production issue at 2 AM.
The investment in proper logging pays off quickly in reduced debugging time, faster incident resolution, and increased confidence in your systems. Start today, and build the observability foundation your applications deserve.