Clean Code Part 4: Concurrency and Refactoring
This final part tackles the hardest topics: concurrent programming, real-world refactoring case studies, and a comprehensive reference of code smells and heuristics. These chapters separate competent programmers from expert ones.
Chapter 13: Concurrency
Writing clean concurrent code is hard—very hard. Even code that looks clean can be broken in subtle ways.
Why Concurrency?
Concurrency is a decoupling strategy. It decouples what gets done from when it gets done. In single-threaded applications, what and when are tightly coupled.
Benefits of decoupling what from when:
- Improved throughput (handling multiple users)
- Better response times (doing work in background)
- More natural problem modeling (some problems are inherently concurrent)
Myths and Misconceptions
Myth: Concurrency always improves performance. Reality: Concurrency can sometimes improve performance, but only when wait time can be shared between threads. It adds overhead.
Myth: Design doesn’t change when writing concurrent programs. Reality: The design of a concurrent system can be dramatically different from a single-threaded system.
Myth: Understanding concurrency issues isn’t important when working with containers like web or EJB containers. Reality: You need to know what the container does and how to protect concurrent updates.
Concurrency Defense Principles
Single Responsibility Principle
Concurrency-related code has its own development lifecycle. Keep it separate:
// Bad - concurrency mixed with business logic
public class UserService {
public void processUsers() {
synchronized(this) {
// business logic intertwined with locking
}
}
}
// Good - separated concerns
public class UserService {
public void processUser(User user) {
// Pure business logic
}
}
public class ConcurrentUserProcessor {
private UserService service;
public void processUsersConcurrently(List<User> users) {
// Concurrency logic separated
}
}
Corollary: Limit the Scope of Data
Restrict access to shared data. Use synchronized or similar mechanisms, but minimize shared sections.
Two threads modifying the same shared object can interfere in unexpected ways. The more places shared data can be updated, the more likely:
- You’ll forget to protect one section
- Duplication will break DRY
- It’s hard to determine the source of failures
Corollary: Use Copies of Data
A good way to avoid sharing: don’t share. Copy objects and treat them as read-only. Merge results in a single thread.
The overhead of copying might be offset by avoiding synchronization.
Corollary: Threads Should Be as Independent as Possible
Write code so each thread exists in its own world, sharing no data with other threads. Each processes one client request, with all data from unshared sources and stored as local variables.
Know Your Library
Java 5 and later offer significant concurrency improvements:
Thread-safe collections:
ConcurrentHashMap- much better than synchronizedHashMapReentrantLock- can be acquired in one method, released in anotherSemaphore- classic counting semaphoreCountDownLatch- waits for events before releasing threads
// Bad
Map<String, Integer> map = new HashMap<>();
synchronized(map) {
map.put(key, value);
}
// Good
ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
map.put(key, value); // Thread-safe without explicit locking
Know Your Execution Models
Producer-Consumer: Producers place work on a queue; consumers take work from the queue. The queue is a bound resource.
Readers-Writers: Readers can read simultaneously, but writers need exclusive access. Balance throughput against starvation.
Dining Philosophers: Threads compete for resources in a way that can lead to deadlock, livelock, or throughput issues.
Understanding these patterns helps you recognize and solve concurrency problems.
Beware Dependencies Between Synchronized Methods
Having more than one synchronized method on a shared object can cause subtle bugs:
// Dangerous - two synchronized methods
public class IntegerIterator implements Iterator<Integer> {
private Integer nextValue = 0;
public synchronized boolean hasNext() {
return nextValue < 100000;
}
public synchronized Integer next() {
if (nextValue == 100000) throw new NoSuchElementException();
return nextValue++;
}
}
Even though both methods are synchronized, this sequence can fail:
Thread 1: hasNext() returns true
Thread 2: hasNext() returns true
Thread 1: next() returns 99999
Thread 2: next() throws exception (nextValue is now 100000)
Solutions:
- Client-based locking (client locks server before calling)
- Server-based locking (server locks inside, provides atomic method)
- Adapted server (intermediary provides atomic operations)
Keep Synchronized Sections Small
synchronized introduces delays and overhead. Don’t extend critical sections beyond what’s necessary:
// Bad - too much inside synchronized block
synchronized(this) {
expensiveComputation(); // Doesn't need lock
updateSharedState(); // Needs lock
anotherComputation(); // Doesn't need lock
}
// Good - minimal critical section
expensiveComputation();
synchronized(this) {
updateSharedState();
}
anotherComputation();
Writing Correct Shut-Down Code Is Hard
Graceful shutdown is hard to get right. Think about shutdown early. Getting this working early is essential.
Common issues:
- Deadlock—threads waiting forever for signals that never come
- Resources not being released
- Threads that don’t terminate
Testing Threaded Code
Testing doesn’t guarantee correctness, but good tests minimize risk:
Treat spurious failures as candidate threading issues. Developers often ignore system failures. “The server was hiccuping.” “Cosmic ray.” Threading bugs might appear as one-in-a-thousand failures. Don’t ignore them.
Get your nonthreaded code working first. Don’t debug threading and code issues simultaneously.
Make your threaded code pluggable. So it can run in various configurations:
- One thread, several threads, varied number of threads
- Real vs. test doubles
- Fast vs. slow iterations
Make your threaded code tunable. Allow number of threads to be easily adjusted, even at runtime.
Run with more threads than processors. This increases the chance of task swapping and exposing bugs.
Run on different platforms. Threading behavior differs across OSes.
Instrument your code to try and force failures. Add calls to Object.wait(), Object.sleep(), Object.yield(), Object.priority() to change execution order.
public synchronized String nextUrlOrNull() {
if (hasNext()) {
String url = urlGenerator.next();
Thread.yield(); // Force context switch for testing
updateHasNext();
return url;
}
return null;
}
Use jiggling strategies (tools like IBM’s ConTest) to randomize these calls.
Chapter 14: Successive Refinement
This chapter is a detailed case study of cleaning up a command-line argument parser. It demonstrates the iterative refinement process.
The Args Implementation
The chapter shows a clean, final implementation of an Args class that parses command-line arguments:
Args arg = new Args("l,p#,d*", args);
boolean logging = arg.getBoolean('l');
int port = arg.getInt('p');
String directory = arg.getString('d');
The schema string defines:
l- boolean flagp#- integer argumentd*- string argument
How Did I Do This?
The key insight: I didn’t write it clean from the start. The first version was a mess. Clean code requires successive refinement.
Martin shows the progression:
- First draft: Boolean arguments only. Already messy.
- Second draft: Added String arguments. Code doubled in complexity.
- Crisis point: Adding Integer arguments would have created chaos.
- Stop and refactor: Recognize the pattern, extract abstractions.
The Progression
Step 1: Boolean arguments work
// Initial messy code
private void parseSchema() {
for (String element : schema.split(",")) {
parseSchemaElement(element);
}
}
private void parseSchemaElement(String element) {
if (element.length() == 1) {
parseBooleanSchemaElement(element);
}
}
Step 2: String arguments added
The code doubled. Similar if-else chains for parsing schema, parsing arguments, and retrieving values.
Step 3: See the pattern
Each argument type needs:
- Schema parsing
- Argument parsing
- Value retrieval
- Default value
Step 4: Extract the ArgumentMarshaler interface
public interface ArgumentMarshaler {
void set(Iterator<String> currentArgument);
Object get();
}
public class BooleanArgumentMarshaler implements ArgumentMarshaler {
private boolean booleanValue = false;
public void set(Iterator<String> currentArgument) {
booleanValue = true;
}
public Object get() {
return booleanValue;
}
}
Step 5: Move type-specific code to marshalers
Each marshaler handles its own parsing. The main class just orchestrates.
Key Lessons
Incrementalism: Software is never “written.” It’s grown. Start with something that works. Improve iteratively.
Refactoring is not rewriting: Changes are small. Tests pass after each change. Never break the system.
Stop when you see the mess: Don’t keep adding features to messy code. Clean it first.
The Boy Scout Rule in action: Leave the code cleaner than you found it. Every change is an opportunity.
“It is not enough for code to work. Code that works is often badly broken. Programmers who satisfy themselves with merely working code are behaving unprofessionally.”
Chapters 15-16: JUnit and SerialDate Case Studies
These chapters show real-world refactoring of production code.
JUnit Internals (Chapter 15)
Martin refactors the ComparisonCompactor class from JUnit. Key improvements:
- Extract method for complex conditions
- Rename variables for clarity (e.g.,
fExpected→expected) - Eliminate redundancy in string handling
- Simplify conditionals with guard clauses
Before:
private String compactExpected;
private String compactActual;
private void compactExpectedAndActual() {
findCommonPrefix();
findCommonSuffix();
compactExpected = compactString(expected);
compactActual = compactString(actual);
}
After:
private String compact(String source) {
return new StringBuilder()
.append(startingEllipsis())
.append(startingContext())
.append(DELTA_START)
.append(delta(source))
.append(DELTA_END)
.append(endingContext())
.append(endingEllipsis())
.toString();
}
Refactoring SerialDate (Chapter 16)
Martin refactors a date handling class. The chapter demonstrates:
- Fix bugs first: Found through testing
- Remove dead code: Unused variables, unreachable branches
- Rename for clarity:
stringToWeekdayCode→parseDay - Extract classes: Separate
DayDatefromSpreadsheetDate - Apply SRP: Move factory methods to separate factory class
- Use enums: Replace integer constants with type-safe enums
Key insight: Even well-intentioned code from experienced developers benefits from refactoring. Nobody writes perfect code the first time.
Chapter 17: Smells and Heuristics
This chapter is a reference catalog of code smells and heuristics. Use it as a checklist.
Comments
C1: Inappropriate Information Comments should not hold information better kept elsewhere (version control, issue tracking).
C2: Obsolete Comment Comments that are old, irrelevant, or incorrect. Update or delete them.
C3: Redundant Comment Comments that describe what code already says clearly.
C4: Poorly Written Comment Don’t ramble. Don’t state the obvious. Be brief.
C5: Commented-Out Code Delete it. Version control remembers.
Environment
E1: Build Requires More Than One Step
Building should be a single command: make, ant all, mvn package.
E2: Tests Require More Than One Step Running all tests should be a single command.
Functions
F1: Too Many Arguments Zero is best. Three is suspicious. More than three is problematic.
F2: Output Arguments Avoid them. Return values instead.
F3: Flag Arguments Boolean arguments signal the function does more than one thing. Split it.
F4: Dead Function Delete functions that are never called.
General
G1: Multiple Languages in One Source File Keep XML, HTML, JavaScript, etc. separate when possible.
G2: Obvious Behavior Is Unimplemented
Follow the Principle of Least Surprise. If a function is named getDayOfWeek(), it should return a day of the week.
G3: Incorrect Behavior at the Boundaries Don’t rely on intuition. Test every boundary condition.
G4: Overridden Safeties Don’t turn off compiler warnings. Don’t skip failing tests. Don’t ignore exceptions.
G5: Duplication Every time you see duplication, it represents a missed opportunity for abstraction. DRY (Don’t Repeat Yourself).
G6: Code at Wrong Level of Abstraction High-level concepts in base classes, low-level details in derivatives.
G7: Base Classes Depending on Their Derivatives Base classes should know nothing about their derivatives.
G8: Too Much Information Hide data. Hide utility functions. Hide constants and temporaries. Don’t expose inner structure.
G9: Dead Code Code that isn’t executed. Find and delete it.
G10: Vertical Separation Variables should be declared close to usage. Private functions should be just below their first usage.
G11: Inconsistency If you do something a certain way, do all similar things the same way.
G12: Clutter Remove unused variables, functions, comments.
G13: Artificial Coupling Things that don’t depend on each other shouldn’t be coupled.
G14: Feature Envy Methods that use accessors of another class extensively want to be in that class.
G15: Selector Arguments Boolean or enum arguments that select behavior should be separate functions.
G16: Obscured Intent Code should be as expressive as possible.
G17: Misplaced Responsibility Code should be where a reader would expect it.
G18: Inappropriate Static Prefer non-static methods. Static methods can’t be overridden.
G19: Use Explanatory Variables Break complex expressions into intermediate variables with meaningful names.
G20: Function Names Should Say What They Do If you have to look at implementation to understand, rename it.
G21: Understand the Algorithm Don’t just make code “work” through trial and error. Understand why it works.
G22: Make Logical Dependencies Physical If one module depends on another, that dependency should be explicit.
G23: Prefer Polymorphism to If/Else or Switch/Case Most conditionals on type can be replaced with polymorphism.
G24: Follow Standard Conventions Team conventions trump personal preferences.
G25: Replace Magic Numbers with Named Constants
86400 means nothing. SECONDS_PER_DAY is clear.
G26: Be Precise Don’t use floating-point for currency. Don’t use ArrayList when you need concurrency. Don’t ignore compiler warnings.
G27: Structure over Convention Enforce design decisions through structure, not just convention.
G28: Encapsulate Conditionals
// Bad
if (timer.hasExpired() && !timer.isRecurrent())
// Good
if (shouldBeDeleted(timer))
G29: Avoid Negative Conditionals
// Bad
if (!buffer.shouldNotCompact())
// Good
if (buffer.shouldCompact())
G30: Functions Should Do One Thing Multiple sections in a function are a sign it does too much.
G31: Hidden Temporal Couplings Make temporal dependencies explicit through arguments.
G32: Don’t Be Arbitrary Structure reflects reasoning. If structure seems arbitrary, others will feel free to change it.
G33: Encapsulate Boundary Conditions
// Bad
if (level + 1 < tags.length) {
parts = parse(level + 1);
}
// Good
int nextLevel = level + 1;
if (nextLevel < tags.length) {
parts = parse(nextLevel);
}
G34: Functions Should Descend Only One Level of Abstraction
G35: Keep Configurable Data at High Levels Don’t bury configuration deep in low-level functions.
G36: Avoid Transitive Navigation Modules should know only about their immediate collaborators (Law of Demeter).
Names
N1: Choose Descriptive Names Names are 90% of readability. Take time to choose them.
N2: Choose Names at the Appropriate Level of Abstraction Don’t pick names that communicate implementation details.
N3: Use Standard Nomenclature Where Possible Use design pattern names (Factory, Visitor), mathematical terms, etc.
N4: Unambiguous Names Choose names that make the workings of a function unambiguous.
N5: Use Long Names for Long Scopes Short names for small scopes, long names for large scopes.
N6: Avoid Encodings
No Hungarian notation, no m_ prefixes.
N7: Names Should Describe Side Effects
getOos() that creates an ObjectOutputStream should be createOrReturnOos().
Tests
T1: Insufficient Tests Test everything that could possibly break.
T2: Use a Coverage Tool Know what’s untested.
T3: Don’t Skip Trivial Tests They document behavior.
T4: An Ignored Test Is a Question about an Ambiguity Express requirements uncertainties as tests.
T5: Test Boundary Conditions Boundaries are where bugs cluster.
T6: Exhaustively Test Near Bugs Bugs congregate. When you find one, look for more nearby.
T7: Patterns of Failure Are Revealing Analyze test failures. Patterns point to causes.
T8: Test Coverage Patterns Can Be Revealing Looking at which code runs (or doesn’t) in passing tests is illuminating.
T9: Tests Should Be Fast Slow tests don’t get run.
Key Takeaways
-
Concurrency is hard. Keep concurrent code separate. Minimize shared data. Know your tools and patterns. Test aggressively.
-
Clean code is grown, not born. Start simple and refactor continuously. Successive refinement is the professional way.
-
Refactoring is safe. Small changes, tests pass after each one. Never break the system while improving it.
-
Code smells are patterns. Learn to recognize them. The heuristics catalog is a checklist for code review.
-
Professionalism is caring. Working code isn’t enough. Clean code is a matter of professional pride.
The Final Word
“Of course bad code can be cleaned up. But it’s very expensive. As code rots, the time and cost of making changes increases dramatically. In many cases, the cost becomes so high that the product dies.”
The only way to go fast is to go well. Clean code isn’t a luxury—it’s a survival skill.