Large Language Models excel at pattern recognition and natural language understanding. They fail catastrophically at business logic execution. This paper explores why, and introduces Isomorphism of Intent as the only viable solution for systems where failure is not an option.
LLMs are probabilistic systems. They generate the next token based on probability distributions:
Input: "Process a refund for $100"
Output: [token1 with 0.8 probability, token2 with 0.15 probability, ...]
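This probabilistic behavior can be sketched in a few lines. The vocabulary and probabilities below are purely illustrative, not taken from any real model:

```python
import random

# Toy next-token distribution for the prompt "Process a refund for $100".
# The tokens and weights here are made up for illustration.
next_token_probs = {
    '"The': 0.80,
    '"Refund': 0.15,
    '"Error': 0.05,
}

def sample_next_token(probs):
    """Sample one token from a probability distribution, as an LLM decoder does."""
    tokens = list(probs)
    weights = list(probs.values())
    return random.choices(tokens, weights=weights, k=1)[0]

samples = [sample_next_token(next_token_probs) for _ in range(5)]
print(samples)  # output varies from run to run -- that is the point
```

Run it twice and the output differs. That variability is a feature for creative text and a defect for state transitions.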
This is a good fit for open-ended tasks such as drafting, summarization, and conversation, where many different outputs are acceptable.
Business logic is deterministic. It requires:
Input: "Process a refund for $100"
Output: Merchant balance decreased by $100, Customer balance increased by $100, Email sent
No probability. No variation. Same input → Same output, always.
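The deterministic contract can be expressed as a pure function over explicit state. The names below (`RefundResult`, balances in integer cents) are illustrative choices, not a prescribed API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RefundResult:
    merchant_delta: int   # change to merchant balance, in cents
    customer_delta: int   # change to customer balance, in cents
    email_sent: bool

def process_refund(amount_cents: int) -> RefundResult:
    """Pure function: the same input always yields the same result."""
    return RefundResult(
        merchant_delta=-amount_cents,
        customer_delta=+amount_cents,
        email_sent=True,
    )

# Same input -> same output, always.
assert process_refund(10_000) == process_refund(10_000)
```

Because the function has no hidden randomness, equality of inputs implies equality of outputs, which is exactly the property an LLM cannot offer.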
When an LLM narrates execution instead of performing it, the report and the actual system state diverge:
LLM Output: "The refund was processed"
Actual State: Merchant balance unchanged, Customer not credited, Email failed
Result: Silent failure
LLMs generate plausible-sounding but false information:
Prompt: "Check if the payment was processed"
LLM Response: "Yes, the payment was processed successfully"
Actual State: Payment failed, no transaction created
Cost: Regulatory violations, customer disputes, fraud
LLMs lose context in long workflows:
Step 1: "Create a transaction for $100"
Step 2: "Verify the transaction"
Step 3: "Process the refund"
...
Step 50: "What was the original amount?"
LLM Response: "I don't remember"
Cost: Inconsistent state, broken workflows
LLMs don’t understand constraints:
Constraint: "Merchant balance must never go negative"
Prompt: "Deduct $500 from a merchant with $100 balance"
LLM Response: "Deducted $500 successfully"
Actual State: Merchant balance is now -$400
Cost: Financial violations, audit failures
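A deterministic runtime makes this constraint impossible to violate by checking it before committing the state change. A minimal sketch, assuming balances in integer cents and a hypothetical `ConstraintViolation` error type:

```python
class ConstraintViolation(Exception):
    """Raised when an operation would break an invariant."""

def deduct(balance_cents: int, amount_cents: int) -> int:
    """Deduct amount from balance, enforcing: balance must never go negative."""
    if amount_cents > balance_cents:
        raise ConstraintViolation(
            f"deducting {amount_cents} from {balance_cents} would go negative"
        )
    return balance_cents - amount_cents

try:
    deduct(100_00, 500_00)  # $500 from a $100 balance
except ConstraintViolation as e:
    print("rejected:", e)   # the invalid state is never created
```

The invalid state is rejected before it exists, rather than discovered (or hallucinated away) after the fact.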
Same input produces different outputs:
Prompt: "Process a refund"
Run 1: "Refund processed, email sent"
Run 2: "Refund processed, email failed"
Run 3: "Refund failed, no action taken"
Cost: Impossible to debug, impossible to audit
LLMs don’t produce verifiable execution traces:
Prompt: "Process a refund"
LLM Response: "Done"
Question: "Prove it was done correctly"
LLM Response: "I can't show you the proof"
Cost: Regulatory non-compliance, no liability trail
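One way a runtime can make "Done" provable is to emit a hash-chained trace: each entry commits to the one before it, so the log cannot be silently rewritten. A sketch with hypothetical step names:

```python
import hashlib
import json

audit_log = []

def record(step: str, **details):
    """Append a tamper-evident entry: each entry's hash covers the previous hash."""
    prev = audit_log[-1]["hash"] if audit_log else ""
    entry = {"step": step, "details": details}
    payload = prev + json.dumps(entry, sort_keys=True)
    entry["hash"] = hashlib.sha256(payload.encode()).hexdigest()
    audit_log.append(entry)

record("verify_transaction", txn_id="txn_123", status="completed")
record("deduct_from_merchant", amount_cents=10_000)
record("credit_customer", amount_cents=10_000)
record("send_email", to="customer@example.com")

# "Prove it was done correctly" -> replay the chain and check every link.
for i, entry in enumerate(audit_log):
    prev = audit_log[i - 1]["hash"] if i else ""
    body = {"step": entry["step"], "details": entry["details"]}
    expected = hashlib.sha256((prev + json.dumps(body, sort_keys=True)).encode()).hexdigest()
    assert entry["hash"] == expected
print("trace verified:", len(audit_log), "steps")
```

An auditor can rerun the replay loop independently; a single altered or missing step breaks every subsequent hash.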
Problem: LLM-based test automation produces flaky tests
Test: "Verify login works"
Run 1: PASS
Run 2: FAIL (same code, same environment)
Run 3: PASS
Cost: False negatives ship bugs, false positives waste time
Problem: LLM-based payment processing has no guarantees
Transaction: "Process $1000 payment"
LLM Response: "Payment processed"
Actual State: Payment failed, customer charged twice
Cost: Regulatory violations, customer disputes, fraud liability
Problem: LLM-based drone control is unsafe
Command: "Fly to coordinates and land"
LLM Response: "Flying to coordinates"
Actual State: Drone crashes into building
Cost: Physical damage, safety violations, liability
Problem: LLM-based incident response is unreliable
Alert: "Suspicious login from unknown IP"
LLM Response: "Threat detected, account locked"
Actual State: Account not locked, attacker gains access
Cost: Breach, data loss, compliance violation
Attempted fix: more detailed prompts.
Prompt: "Process a refund. Make sure to:
1. Verify the transaction
2. Deduct from merchant
3. Credit customer
4. Send email
5. Log the audit trail"
Why it fails: Prompts are still interpreted, not executed. No guarantees.
Attempted fix: chain-of-thought prompting.
Prompt: "Let's think step by step:
1. Is the transaction valid?
2. Can we deduct from the merchant?
3. Can we credit the customer?
..."
Why it fails: Thinking doesn’t guarantee execution. The LLM can think correctly but execute incorrectly.
Attempted fix: retrieval-augmented generation (RAG).
Prompt: "Here's the business logic: [rules]
Now process this refund: [transaction]"
Why it fails: RAG helps with context, but doesn’t solve the fundamental problem: LLMs are probabilistic, not deterministic.
Attempted fix: fine-tuning.
Fine-tune the LLM on thousands of refund examples.
Why it fails: Fine-tuning improves average performance, but doesn’t eliminate failure modes. You still get hallucinations, constraint violations, and non-determinism.
Use LLMs only for understanding intent, not for executing it.
LLM: "Understand the intent"
↓
Parser: "Convert to canonical form"
↓
Runtime: "Execute deterministically"
↓
Verifier: "Prove fidelity"
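The four stages can be wired as a minimal pipeline. The LLM stage is stubbed with a hard-coded extraction (a real system would call a model there); every name below is illustrative, and only that first stage is probabilistic:

```python
def understand_intent(text: str) -> dict:
    """Stage 1 (LLM, stubbed): extract intent. The only probabilistic stage."""
    # A real system would call a model here; we hard-code the extraction.
    return {"operation": "refund", "amount_cents": 10_000}

def to_canonical_form(intent: dict) -> list:
    """Stage 2 (parser): map the intent to an ordered plan of named steps."""
    plans = {
        "refund": ["verify_transaction", "deduct_from_merchant",
                   "credit_customer", "send_email"],
    }
    return plans[intent["operation"]]

def execute(plan: list) -> list:
    """Stage 3 (runtime): run each step deterministically, logging as we go."""
    return [{"step": step, "status": "ok"} for step in plan]

def verify(plan: list, trace: list) -> bool:
    """Stage 4 (verifier): prove the trace matches the plan, step for step."""
    return [e["step"] for e in trace] == plan and all(e["status"] == "ok" for e in trace)

intent = understand_intent("Process a refund for $100")
plan = to_canonical_form(intent)
trace = execute(plan)
assert verify(plan, trace)  # fidelity proven, not assumed
```

Even if the stubbed LLM stage misreads the intent, everything downstream either executes the plan exactly or fails verification; there is no "plausible but wrong" middle ground.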
Step 1: Specification
Scenario: Process refund
  Given a completed transaction
  When the customer requests a refund
  Then deduct from merchant account
  And credit customer payment method
  And send confirmation email
Step 2: Semantic Extraction (LLM)
Extract entities: transaction, merchant, customer
Extract operations: deduct, credit, send_email
Extract constraints: transaction must be completed
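The extraction output can be captured as plain structured data and validated before anything executes. The schema and class name below are one possible shape, not a fixed format:

```python
from dataclasses import dataclass

@dataclass
class ExtractedSemantics:
    """Typed container for what the LLM extracted from the Gherkin scenario."""
    entities: list
    operations: list
    constraints: list

    def validate(self, known_operations: set):
        """Reject extractions that reference operations the runtime cannot execute."""
        unknown = [op for op in self.operations if op not in known_operations]
        if unknown:
            raise ValueError(f"unknown operations: {unknown}")

semantics = ExtractedSemantics(
    entities=["transaction", "merchant", "customer"],
    operations=["deduct", "credit", "send_email"],
    constraints=["transaction.status == 'completed'"],
)
semantics.validate({"deduct", "credit", "send_email", "verify"})
```

Validation at this boundary is what stops an LLM misreading from ever reaching the runtime: an extraction naming an unknown operation is rejected, not improvised around.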
Step 3: Canonical Form (DAG)
verify_transaction
↓
deduct_from_merchant
↓
credit_customer
↓
send_email
↓
verify_all_constraints
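This particular DAG is a straight chain, but a general canonical form is an adjacency map plus a topological sort. A sketch using the step names above and Python's standard-library `graphlib`:

```python
from graphlib import TopologicalSorter

# Map each step to the steps that must complete before it.
dag = {
    "verify_transaction": [],
    "deduct_from_merchant": ["verify_transaction"],
    "credit_customer": ["deduct_from_merchant"],
    "send_email": ["credit_customer"],
    "verify_all_constraints": ["send_email"],
}

# TopologicalSorter takes {node: predecessors}; static_order() yields a valid
# execution order and raises CycleError if the graph is not actually a DAG.
order = list(TopologicalSorter(dag).static_order())
print(order)
```

The cycle check matters: a plan that loops back on itself is rejected at parse time, before the runtime ever touches it.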
Step 4: Deterministic Execution
def execute_refund(transaction):
    verify_transaction(transaction)    # Fail if not completed
    deduct_from_merchant(transaction)  # Deterministic
    credit_customer(transaction)       # Deterministic
    send_email(transaction)            # Deterministic
    verify_all_constraints()           # Prove fidelity
    return audit_trail()               # Proof of execution
Step 5: Verification
✓ Transaction was verified
✓ Merchant was debited
✓ Customer was credited
✓ Email was sent
✓ All constraints maintained
Fidelity: 100%
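Verification is itself deterministic: replay the recorded facts against the specification's obligations. A sketch with the obligations hard-coded from the refund scenario and a flat dict standing in for the execution trace:

```python
def verify_refund(trace: dict) -> dict:
    """Check every obligation from the specification against the execution trace."""
    obligations = {
        "transaction_verified": trace.get("transaction_verified", False),
        "merchant_debited": trace.get("merchant_debited", False),
        "customer_credited": trace.get("customer_credited", False),
        "email_sent": trace.get("email_sent", False),
        "constraints_maintained": trace.get("constraints_maintained", False),
    }
    passed = sum(obligations.values())
    fidelity = 100 * passed // len(obligations)
    return {"checks": obligations, "fidelity_percent": fidelity}

report = verify_refund({
    "transaction_verified": True, "merchant_debited": True,
    "customer_credited": True, "email_sent": True, "constraints_maintained": True,
})
print(report["fidelity_percent"])  # 100
```

A missing obligation lowers the fidelity score instead of disappearing into a confident "Done".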
Determinism: Same input → Same output, always. No variation.
Auditability: Every step is logged and verifiable.
Safety: Invariants are checked at each step.
Composability: Behaviors can be combined without interference.
Scalability: The system handles complex workflows without degradation.
The wrong question: "How do we make LLMs better at business logic?"
Answer: You don’t. LLMs are probabilistic. Business logic is deterministic. They’re fundamentally incompatible.
The right question: "How do we use LLMs to understand intent, then execute deterministically?"
Answer: Agentic Workflows with Isomorphism of Intent.
This is not about replacing LLMs. It’s about using them where they excel: understanding intent, not executing it.
This is the only way to build reliable systems in critical domains.
“Orthogonal Orchestration: Why Gherkin and Figma are the Only Inputs You Need” — How to structure agentic workflows for maximum composability and reliability.