“Run the standard cell viability assay on samples A1-A6, but use half the usual reagent volume.”
For a trained lab technician, this instruction is clear enough. For an automation system, it’s impossibly ambiguous. What’s the “standard” assay? What volumes are “usual”? What specific steps are involved?
Text-to-Protocol systems aim to bridge this gap—converting human-readable instructions into structured, machine-executable procedures. This isn’t just about parsing text; it’s about capturing intent, resolving ambiguity, and ensuring the resulting protocol is safe to execute.
The Problem Space
Traditional lab automation requires explicit programming: every pipetting step, every incubation time, every conditional branch must be specified in code. This creates a barrier between scientists who design experiments and automation systems that execute them.
| Approach | Flow | Bottleneck |
|---|---|---|
| Traditional | Scientist → Programmer → Code → Robot | Programmer bandwidth |
| Text-to-Protocol | Scientist → NLP → Protocol → Validation → Robot | Validation accuracy |
The promise is compelling: scientists describe what they want in familiar terms, and the system figures out how to make it happen. But the implementation is far from trivial.
Protocol Representation Formats
Before an LLM can generate a protocol, you need to decide how protocols are represented. This choice has significant implications for validation, execution, and maintainability.
Option 1: JSON Schema
The most straightforward approach—define a JSON schema that captures protocol structure:
{
"protocol": {
"name": "Cell Viability Assay",
"version": "1.0",
"steps": [
{
"id": "step_1",
"action": "aspirate",
"source": {
"labware": "reagent_plate",
"wells": ["A1"]
},
"volume_ul": 50,
"parameters": {
"speed": "normal",
"touch_tip": true
}
},
{
"id": "step_2",
"action": "dispense",
"destination": {
"labware": "sample_plate",
"wells": ["A1", "A2", "A3", "A4", "A5", "A6"]
},
"volume_ul": 8.33,
"parameters": {
"speed": "slow",
"mix_after": {
"cycles": 3,
"volume_ul": 25
}
}
},
{
"id": "step_3",
"action": "incubate",
"labware": "sample_plate",
"parameters": {
"temperature_c": 37,
"duration_min": 30,
"humidity_percent": 95
}
}
],
"metadata": {
"estimated_duration_min": 45,
"required_labware": ["reagent_plate", "sample_plate"],
"required_equipment": ["pipette_8channel", "incubator"]
}
}
}
Pros:
- Easy to validate with JSON Schema
- Widely supported, language-agnostic
- Human-readable
Cons:
- Verbose for complex protocols
- Limited expressiveness for conditionals and loops
- No built-in semantics
Option 2: Domain-Specific Language (DSL)
A purpose-built language for laboratory procedures:
PROTOCOL "Cell Viability Assay" v1.0
LABWARE:
reagent_plate: 96-well plate at position 1
sample_plate: 96-well plate at position 2
PROCEDURE:
# Transfer reagent to samples
TRANSFER 50uL FROM reagent_plate:A1
TO sample_plate:A1-A6
USING 8-channel pipette
WITH mix_after(3, 25uL)
# Incubate
INCUBATE sample_plate
AT 37C FOR 30min
HUMIDITY 95%
# Measure
READ sample_plate
MODE fluorescence
WAVELENGTH 485/535nm
CONSTRAINTS:
- reagent_plate MUST be at 4C before use
- total_volume per well MUST NOT exceed 200uL
Pros:
- Concise, readable syntax
- Can encode domain-specific semantics
- Natural expression of constraints
Cons:
- Requires a custom parser (see the minimal sketch after this list)
- Learning curve for users
- Tooling must be built from scratch
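To give a sense of scale for that parser cost, here is a minimal sketch that handles just the TRANSFER statement from the example above, assuming multi-line statements are first joined onto single lines. A real parser would need the full grammar (USING, WITH, CONSTRAINTS, and so on):

import re

# Matches e.g. "TRANSFER 50uL FROM reagent_plate:A1 TO sample_plate:A1-A6"
TRANSFER_RE = re.compile(
    r"TRANSFER\s+(?P<vol>[\d.]+)uL\s+FROM\s+(?P<src>\S+)\s+TO\s+(?P<dst>\S+)"
)

def parse_transfer(line: str) -> dict:
    """Parse a single-line TRANSFER statement into the JSON step shape."""
    m = TRANSFER_RE.match(line.strip())
    if m is None:
        raise ValueError(f"Not a TRANSFER statement: {line!r}")
    src_labware, src_wells = m.group("src").split(":")
    dst_labware, dst_wells = m.group("dst").split(":")
    # Well ranges like "A1-A6" are left unexpanded in this sketch
    return {
        "action": "transfer",
        "volume_ul": float(m.group("vol")),
        "source": {"labware": src_labware, "wells": src_wells},
        "destination": {"labware": dst_labware, "wells": dst_wells},
    }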
Option 3: Existing Standards
Several standards exist for laboratory protocol exchange:
| Standard | Focus | Adoption |
|---|---|---|
| Autoprotocol | Cloud lab execution | Transcriptic/Strateos |
| SBOL | Synthetic biology | Academic |
| AnIML | Analytical data | Instrument vendors |
| SiLA 2 | Lab instrument communication | Industry consortium |
# Example: Autoprotocol-style representation
from autoprotocol import Protocol
p = Protocol()
reagent_plate = p.ref("reagent_plate", cont_type="96-deep", storage="cold_4")
sample_plate = p.ref("sample_plate", cont_type="96-pcr", storage="cold_4")
p.transfer(
source=reagent_plate.wells_from(0, 1),
dest=sample_plate.wells_from(0, 6),
volume="50:microliter",
mix_after=True
)
p.incubate(sample_plate, where="warm_37", duration="30:minute")
Pros:
- Industry backing
- Existing tooling
- Interoperability potential
Cons:
- May not fit your specific domain
- Can be over-engineered for simple cases
- Vendor lock-in risks
Recommendation
For most cases, start with JSON backed by a well-defined schema. It's the easiest format for LLMs to generate reliably, straightforward to validate, and can always be compiled to other formats if needed.
Natural Language → LLM → JSON Protocol → Validator → DSL/Autoprotocol → Execution
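To make the "compile to other formats" point concrete, a minimal sketch might walk the JSON steps and emit DSL-style lines. This covers only the incubate action, and field names follow the Option 1 schema above:

def compile_to_dsl(protocol_doc: dict) -> str:
    """Emit DSL-style lines from a JSON protocol (sketch; incubate only)."""
    protocol = protocol_doc["protocol"]
    lines = [f'PROTOCOL "{protocol["name"]}" v{protocol["version"]}']
    for step in protocol["steps"]:
        if step["action"] == "incubate":
            p = step["parameters"]
            lines.append(
                f'INCUBATE {step["labware"]} '
                f'AT {p["temperature_c"]}C FOR {p["duration_min"]}min'
            )
        else:
            lines.append(f'# TODO: translate action {step["action"]!r}')
    return "\n".join(lines)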
The LLM Pipeline
Converting natural language to structured protocols involves several stages, each with its own challenges.
Stage 1: Intent Extraction
Before generating a full protocol, understand what the user actually wants:
Example: “Run viability assay on A1-A6, half reagent volume”
| Extracted Field | Value |
|---|---|
| Assay Type | viability |
| Samples | A1-A6 |
| Modifier | 0.5x reagent |
| Template Resolution | standard_viability_v2 + apply modifications |
Implementation approach:
EXTRACTION_PROMPT = """
You are a laboratory protocol assistant. Extract structured information from the user's request.
Return JSON with:
- protocol_type: The type of assay/procedure requested
- samples: List of sample identifiers or ranges
- modifications: Any deviations from standard procedure
- ambiguities: Any unclear aspects that need clarification
User request: {user_input}
"""
import json

def extract_intent(user_input: str) -> dict:
    # `llm` is an abstract client here; swap in your provider's SDK call
    response = llm.generate(
        prompt=EXTRACTION_PROMPT.format(user_input=user_input),
        response_format={"type": "json_object"}
    )
    return json.loads(response)
Stage 2: Template Resolution
Most protocols are variations of standard procedures. Rather than generating from scratch, resolve to a template and apply modifications:
import copy

class ProtocolResolver:
    def __init__(self, template_library: dict):
        self.templates = template_library
    def resolve(self, intent: dict) -> dict:
        # Find matching template; deep-copy so modifications never
        # mutate the shared template library
        template_id = self.find_template(intent["protocol_type"])
        template = copy.deepcopy(self.templates[template_id])
# Apply sample mapping
template = self.apply_samples(template, intent["samples"])
# Apply modifications
for mod in intent["modifications"]:
template = self.apply_modification(template, mod)
return template
def apply_modification(self, template: dict, modification: dict) -> dict:
"""Apply a modification like 'half reagent volume'"""
if modification["type"] == "volume_scale":
scale = modification["factor"]
for step in template["steps"]:
if "volume_ul" in step:
step["volume_ul"] *= scale
return template
Stage 3: Full Generation (When Needed)
For truly novel protocols, generate from scratch with careful prompting:
GENERATION_PROMPT = """
You are a laboratory automation expert. Generate a structured protocol in JSON format.
## Protocol Requirements
{requirements}
## Available Equipment
{equipment_list}
## Available Labware
{labware_list}
## Output Format
Generate a JSON protocol following this schema:
{schema}
## Constraints
- All volumes must be between 1uL and 1000uL
- All temperatures must be between 4C and 95C
- Each step must reference existing labware
- Pipetting steps must specify source and destination
## Examples
{few_shot_examples}
Generate the protocol:
"""
def generate_protocol(requirements: str, context: dict) -> dict:
prompt = GENERATION_PROMPT.format(
requirements=requirements,
equipment_list=json.dumps(context["equipment"]),
labware_list=json.dumps(context["labware"]),
schema=json.dumps(PROTOCOL_SCHEMA),
few_shot_examples=format_examples(context["examples"])
)
response = llm.generate(
prompt=prompt,
response_format={"type": "json_object"},
temperature=0.2 # Lower temperature for more consistent output
)
return json.loads(response)
Few-Shot Examples Matter
The quality of generated protocols depends heavily on examples:
EXAMPLES = [
{
"input": "Add 100uL of PBS to wells A1-A12",
"output": {
"steps": [{
"action": "dispense",
"source": {"labware": "reagent_reservoir", "wells": ["A1"]},
"destination": {"labware": "plate_1", "wells": ["A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8", "A9", "A10", "A11", "A12"]},
"volume_ul": 100,
"reagent": "PBS"
}]
}
},
{
"input": "Serial dilution 1:2 across row A, starting with 200uL",
"output": {
"steps": [
{
"action": "dispense",
"destination": {"labware": "plate_1", "wells": ["A1"]},
"volume_ul": 200
},
{
"action": "serial_dilute",
"labware": "plate_1",
"wells": ["A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8", "A9", "A10", "A11", "A12"],
"dilution_factor": 2,
"transfer_volume_ul": 100
}
]
}
}
]
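Note that generate_protocol above references a format_examples helper that isn't shown; a minimal version just renders each pair as prompt text:

import json

def format_examples(examples: list[dict]) -> str:
    """Render few-shot examples as input/output blocks for the prompt.
    (Sketch of the helper referenced in generate_protocol above.)"""
    return "\n\n".join(
        f"Input: {ex['input']}\nOutput: {json.dumps(ex['output'], indent=2)}"
        for ex in examples
    )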
Validation Pipeline
Generated protocols must be validated before execution. This is non-negotiable in laboratory environments where errors can waste expensive reagents, damage equipment, or compromise safety.
Layer 1: Schema Validation
First, ensure the output is structurally valid:
from jsonschema import validate, ValidationError
PROTOCOL_SCHEMA = {
"type": "object",
"required": ["steps"],
"properties": {
"steps": {
"type": "array",
"items": {
"type": "object",
"required": ["action"],
"properties": {
"action": {"enum": ["aspirate", "dispense", "transfer", "incubate", "mix", "read"]},
"volume_ul": {"type": "number", "minimum": 0.1, "maximum": 1000},
"temperature_c": {"type": "number", "minimum": 4, "maximum": 95}
}
}
}
}
}
def validate_schema(protocol: dict) -> list[str]:
errors = []
try:
validate(instance=protocol, schema=PROTOCOL_SCHEMA)
except ValidationError as e:
errors.append(f"Schema error: {e.message}")
return errors
Layer 2: Semantic Validation
Check that the protocol makes sense in the domain:
from collections import defaultdict

class SemanticValidator:
def validate(self, protocol: dict, context: dict) -> list[str]:
errors = []
# Check labware references
errors.extend(self.validate_labware_refs(protocol, context["labware"]))
# Check volume constraints
errors.extend(self.validate_volumes(protocol))
# Check liquid tracking
errors.extend(self.validate_liquid_tracking(protocol))
# Check equipment availability
errors.extend(self.validate_equipment(protocol, context["equipment"]))
return errors
def validate_liquid_tracking(self, protocol: dict) -> list[str]:
"""Ensure we don't aspirate more than what's in a well"""
errors = []
well_volumes = defaultdict(float)
for step in protocol["steps"]:
if step["action"] == "dispense":
for well in step["destination"]["wells"]:
well_volumes[well] += step["volume_ul"]
elif step["action"] == "aspirate":
for well in step["source"]["wells"]:
if well_volumes[well] < step["volume_ul"]:
errors.append(
f"Aspirating {step['volume_ul']}uL from {well} "
f"but only {well_volumes[well]}uL available"
)
well_volumes[well] -= step["volume_ul"]
return errors
Layer 3: Safety Validation
Enforce safety rules that should never be violated:
class SafetyValidator:
def __init__(self, safety_rules: list):
self.rules = safety_rules
def validate(self, protocol: dict) -> list[str]:
errors = []
for rule in self.rules:
if not rule.check(protocol):
errors.append(f"Safety violation: {rule.description}")
return errors
# Example safety rules
SAFETY_RULES = [
MaxVolumeRule(max_ul=200, description="Well volume must not exceed 200uL"),
IncompatibleReagentsRule(
pairs=[("bleach", "acid"), ("oxidizer", "reducer")],
description="Incompatible reagents must not be mixed"
),
TemperatureRangeRule(
min_c=4, max_c=65,
description="Temperature must be in safe range"
),
RequiredPPERule(
hazardous_reagents=["formaldehyde", "ethidium_bromide"],
description="Hazardous reagents require appropriate PPE"
)
]
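The rule classes above are assumed rather than defined; a minimal MaxVolumeRule that satisfies the check()/description interface might look like this (the other rules would follow the same shape):

from collections import defaultdict
from dataclasses import dataclass

@dataclass
class MaxVolumeRule:
    """Sketch of one safety rule implementing check()/description."""
    max_ul: float
    description: str

    def check(self, protocol: dict) -> bool:
        # Track cumulative dispensed volume per well; fail on overflow
        well_volumes = defaultdict(float)
        for step in protocol.get("steps", []):
            if step.get("action") == "dispense":
                for well in step["destination"]["wells"]:
                    well_volumes[well] += step["volume_ul"]
                    if well_volumes[well] > self.max_ul:
                        return False
        return True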
Layer 4: Simulation/Dry Run
Before touching real equipment, simulate the protocol:
class ProtocolSimulator:
def __init__(self, deck_layout: dict):
self.deck = DeckState(deck_layout)
self.pipette = PipetteState()
def simulate(self, protocol: dict) -> SimulationResult:
timeline = []
warnings = []
for step in protocol["steps"]:
try:
result = self.execute_step(step)
timeline.append({
"step": step,
"state": self.deck.snapshot(),
"duration_s": result.duration
})
except SimulationError as e:
return SimulationResult(
success=False,
error=str(e),
timeline=timeline
)
return SimulationResult(
success=True,
timeline=timeline,
total_duration_s=sum(t["duration_s"] for t in timeline),
warnings=warnings
)
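DeckState, PipetteState, SimulationError, and SimulationResult are assumed above; the error and result types can be as small as:

from dataclasses import dataclass, field

class SimulationError(Exception):
    """Raised when a step cannot execute against the simulated deck state."""

@dataclass
class SimulationResult:
    success: bool
    timeline: list = field(default_factory=list)
    error: str | None = None
    total_duration_s: float = 0.0
    warnings: list = field(default_factory=list)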
Validation Pipeline Integration
| Layer | Check | On Fail |
|---|---|---|
| 1. Schema | JSON structure valid? | Regenerate |
| 2. Semantic | Labware exists? Volumes feasible? | Regenerate |
| 3. Safety | No hazardous combinations? Within limits? | Reject + Alert |
| 4. Simulation | Dry-run succeeds? | Regenerate |
| ✓ All pass | Protocol valid | Proceed to human review |
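Wired together, the four layers run in order and stop at the first failure; the caller decides regenerate vs. reject based on which layer failed. This sketch uses the validators above plus an assumed context["deck_layout"] entry:

def run_validation_pipeline(protocol: dict, context: dict) -> tuple[bool, list[str]]:
    """Run layers 1-4 in order, short-circuiting on the first failing layer."""
    for name, run in [
        ("schema", lambda: validate_schema(protocol)),
        ("semantic", lambda: SemanticValidator().validate(protocol, context)),
        ("safety", lambda: SafetyValidator(SAFETY_RULES).validate(protocol)),
    ]:
        errors = run()
        if errors:
            return False, [f"{name}: {e}" for e in errors]
    sim = ProtocolSimulator(context["deck_layout"]).simulate(protocol)
    if not sim.success:
        return False, [f"simulation: {sim.error}"]
    return True, []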
Human-in-the-Loop
Even with thorough validation, human review should be part of the workflow—especially for novel protocols or high-stakes experiments.
Review Interface
Present generated protocols in a way that facilitates quick review:
┌─────────────────────────────────────────────────────────────┐
│ Generated Protocol: Cell Viability Assay (Modified) │
├─────────────────────────────────────────────────────────────┤
│ │
│ Original Request: │
│ "Run standard viability on A1-A6, half reagent" │
│ │
│ ⚠ Modifications from Standard: │
│ • Reagent volume: 50µL → 25µL (0.5x) │
│ │
│ Steps: │
│ ┌─────┬────────────┬──────────────┬─────────────┐ │
│ │ # │ Action │ Details │ Duration │ │
│ ├─────┼────────────┼──────────────┼─────────────┤ │
│ │ 1 │ Transfer │ 25µL to A1-6 │ 30s │ │
│ │ 2 │ Mix │ 3x at 20µL │ 45s │ │
│ │ 3 │ Incubate │ 37°C, 30min │ 30min │ │
│ │ 4 │ Read │ FL 485/535 │ 2min │ │
│ └─────┴────────────┴──────────────┴─────────────┘ │
│ │
│ Validation: ✓ Schema ✓ Semantic ✓ Safety ✓ Simulation │
│ Estimated Duration: 33 minutes │
│ Required Reagents: Viability Reagent (150µL total) │
│ │
│ [Approve & Execute] [Edit] [Reject] [Save as Template] │
└─────────────────────────────────────────────────────────────┘
Approval Workflow
from datetime import datetime

class ProtocolApproval:
    def __init__(self, protocol: dict, user: str):
        self.protocol = protocol
        self.requested_by = user
        self.status = "pending_review"
        self.reviews = []
    def submit_for_review(self, reviewer: str):
        self.status = "under_review"
        self.assigned_reviewer = reviewer
        notify_reviewer(reviewer, self)  # notification hook; implementation-specific
def approve(self, reviewer: str, comments: str = ""):
self.reviews.append({
"reviewer": reviewer,
"decision": "approved",
"comments": comments,
"timestamp": datetime.now()
})
self.status = "approved"
def request_changes(self, reviewer: str, changes: list[str]):
self.reviews.append({
"reviewer": reviewer,
"decision": "changes_requested",
"changes": changes,
"timestamp": datetime.now()
})
self.status = "changes_requested"
Handling Ambiguity
Natural language is inherently ambiguous. Good text-to-protocol systems must detect and resolve ambiguity rather than guessing.
Common Ambiguities
| Ambiguity Type | Example | Resolution Strategy |
|---|---|---|
| Implicit references | "the samples" | Ask for clarification or infer from context |
| Undefined terms | "standard protocol" | Look up in the protocol library |
| Missing parameters | "incubate briefly" | Use defaults (see the sketch below) or ask |
| Conflicting instructions | "quickly but gently" | Prioritize or ask |
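For the missing-parameters case, one lightweight option is a table of house defaults. The values here are placeholders; a real lab would pin these to its SOPs:

# Placeholder defaults; pin these to your lab's SOPs
VAGUE_DURATION_DEFAULTS_MIN = {
    "briefly": 5,
    "short": 10,
    "overnight": 16 * 60,
}

def resolve_duration_min(phrase: str) -> int | None:
    """Map a vague duration word to minutes, or None to trigger a clarification."""
    return VAGUE_DURATION_DEFAULTS_MIN.get(phrase.strip().lower())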
Clarification Dialog
class ClarificationHandler:
def __init__(self, llm):
self.llm = llm
def detect_ambiguities(self, intent: dict) -> list[Ambiguity]:
"""Use LLM to identify ambiguous elements"""
prompt = f"""
Analyze this protocol request for ambiguities:
{json.dumps(intent)}
Identify any:
- Undefined references (e.g., "the plate" without specifying which)
- Missing required parameters
- Vague quantities or durations
- Implicit assumptions
Return JSON list of ambiguities with suggested clarifying questions.
"""
response = self.llm.generate(prompt)
return [Ambiguity(**a) for a in json.loads(response)]
def generate_clarification(self, ambiguities: list[Ambiguity]) -> str:
"""Generate user-friendly clarification request"""
questions = []
for amb in ambiguities:
questions.append(f"• {amb.question}")
if amb.suggestions:
questions.append(f" Options: {', '.join(amb.suggestions)}")
return "I need a few clarifications:\n" + "\n".join(questions)
Example Dialog
User: Run the assay on today's samples
System: I need a few clarifications:
• Which assay would you like to run?
Options: Cell Viability, ELISA, Bradford Protein
• Where are today's samples located?
Options: Plate 1 (Position A), Plate 2 (Position B)
User: Viability assay, samples are in Plate 1
System: Got it. Generating protocol for Cell Viability Assay
on Plate 1 samples...
Error Recovery
When generation fails or produces invalid output, the system should recover gracefully.
class ResilientGenerator:
def __init__(self, llm, max_retries: int = 3):
self.llm = llm
self.max_retries = max_retries
def generate_with_recovery(self, request: str, context: dict) -> dict:
errors_encountered = []
for attempt in range(self.max_retries):
# Generate protocol
protocol = self.generate(request, context, errors_encountered)
# Validate
validation_errors = self.validate(protocol)
if not validation_errors:
return protocol
# Log errors for next attempt
errors_encountered.extend(validation_errors)
            # Adjust prompt based on error type
            hints = context.setdefault("hints", [])
            if any("volume" in e for e in validation_errors):
                hints.append("Pay careful attention to volume constraints")
            if any("labware" in e for e in validation_errors):
                hints.append("Only use labware from the provided list")
# All retries failed
raise GenerationError(
f"Failed to generate valid protocol after {self.max_retries} attempts",
errors=errors_encountered
)
Practical Architecture
Putting it all together for air-gapped deployment:
| Layer | Components | Responsibilities |
|---|---|---|
| User Interface | Input, Review UI | Natural language input, approval workflow |
| Processing | Intent Extractor, Template Resolver, Generator, Validator | Parse → Resolve → Generate → Validate |
| Resources | Protocol Templates, Local LLM (7B), Safety Rules | Domain knowledge, inference, constraints |
| Execution | Simulator, Lab Equipment | Dry-run verification, physical execution |
Data Flow: Input → Intent Extraction → Template Resolution → LLM Generation → Validation → Human Review → Simulation → Equipment
Model Selection for Protocol Generation
| Task | Recommended Model | Why |
|---|---|---|
| Intent extraction | Qwen3-0.6B / Phi-3 | Fast, simple structured output |
| Protocol generation | Llama3.2-3B / Mistral-7B | Balance of capability and speed |
| Ambiguity detection | Same as generation | Requires reasoning |
| Validation | Rule-based (no LLM) | Deterministic, auditable |
Considerations for Adoption
Start Simple
Don’t try to handle all possible protocols. Start with:
- A single, well-understood protocol type
- Clear template with parameterized variations
- Strict validation rules
- Mandatory human review
Build Your Template Library
The more templates you have, the less the LLM needs to generate from scratch:
TEMPLATES = {
"cell_viability_standard": {...},
"cell_viability_extended": {...},
"elisa_sandwich": {...},
"elisa_competitive": {...},
"pcr_standard": {...},
# etc.
}
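The entries are elided above; purely as an illustration, a parameterized entry might pair default parameters with steps containing placeholders for the resolver to fill. The field names here are hypothetical:

TEMPLATES["cell_viability_standard"] = {
    "name": "Cell Viability Assay",
    "parameters": {  # Defaults the resolver can override (e.g., 0.5x volume)
        "reagent_volume_ul": 50,
        "incubation_min": 30,
    },
    "steps": [
        {
            "action": "transfer",
            "source": {"labware": "reagent_plate", "wells": ["A1"]},
            "destination": {"labware": "sample_plate", "wells": "{samples}"},
            "volume_ul": "{reagent_volume_ul}",
        },
        {
            "action": "incubate",
            "labware": "sample_plate",
            "parameters": {"temperature_c": 37, "duration_min": "{incubation_min}"},
        },
    ],
}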
Log Everything
Every generation, validation, and execution should be logged:
from dataclasses import dataclass
from datetime import datetime

@dataclass
class ProtocolAuditLog:
    timestamp: datetime
request_id: str
user_input: str
extracted_intent: dict
generated_protocol: dict
validation_results: list
human_review: dict
execution_result: dict
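A minimal sketch of persisting these records as append-only JSON lines; GxP deployments would add tamper-evidence (e.g., hash chaining) on top:

import json
from dataclasses import asdict

def append_audit_record(record: ProtocolAuditLog, path: str = "audit_log.jsonl") -> None:
    """Append one audit record per line; default=str serializes datetimes."""
    with open(path, "a") as f:
        f.write(json.dumps(asdict(record), default=str) + "\n")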
Regulatory Considerations
For GxP environments:
- Generated protocols may need to be treated as controlled documents
- The LLM itself may need validation
- Audit trails must be complete and tamper-evident
- Human approval workflows must be documented
Closing Thoughts
Text-to-Protocol systems represent a significant step toward making lab automation accessible. The core pieces are ready—LLMs can generate structured output reliably, and layered validation can catch most errors before they reach hardware.
The challenge is building the infrastructure around the LLM: robust validation, sensible defaults, clear error messages, and appropriate human oversight. Get these right, and you have a system that genuinely accelerates scientific work.
Get them wrong, and you have a system that generates plausible-looking protocols that fail in subtle, expensive ways.
The key insight: the LLM is just one component. Template libraries, validation rules, and human review processes are equally important. Invest in all of them.
Protocol generation is advancing rapidly. The specific models and techniques will evolve, but the architectural principles—structured output, multi-layer validation, human oversight—will remain relevant.