Learn how to run Claude Code autonomously for hours, days, and weeks with agent harnesses, stop hooks, and guardrails: a step-by-step guide for DevOps engineers.

Claude Code for DevOps: Setting Up Autonomous Long-Running Workflows with Hooks

TL;DR

I learned how to run Claude Code for hours, even days, with the agent harness and guardrails.
I can keep my repo safe by blocking destructive git commands.
I can loop over tests automatically and stop when the code passes.
I can monitor logs and get instant notifications.
I can compare my runs against Claude Opus 4.5’s reported 50% time horizon of 4 h 49 m.

Published by Brav

Why this matters

I’ve spent countless nights debugging long-running CI pipelines that hang or delete the wrong file. Setting up Claude Code with a persistent harness, guardrails, and hooks turned that nightmare into a predictable, autonomous system. For DevOps, AI developers, and senior engineers, it means fewer manual touch-points, higher confidence in code quality, and the ability to let the model run for hours without supervision.

Core concepts

Claude Code is an AI agent that lives in your terminal. It can read your codebase, run tests, commit changes, and even start background processes. Three ideas make it work for long tasks:

Agent harness – Keeps the agent state, files, and background processes alive across restarts.
Hooks – Small shell scripts or LLM prompts that fire at specific moments. Stop hooks keep the agent from exiting.
Ralph loops – A while-true loop that feeds the same prompt back until a completion promise or a max-iteration limit is hit.

Self-driving car analogy

Think of Claude Code as a self-driving car. The harness is the car’s battery; the hooks are the sensors that check for obstacles; the Ralph loop is the navigation system that keeps it on course until the destination is reached.

How to apply it

Below is a step-by-step guide that I used in production. Feel free to copy-paste the snippets.

Install Claude Code and select the Opus 4.5 model
```
brew install --cask claude-code
claude --model claude-opus-4-5
```
Claude Opus — Achieves 50% Time Horizon (2025)

Create a harness configuration

{"hooks": {
  "PreToolUse": [{
    "matcher": "Bash",
    "hooks": [{"type": "command", "command": "./scripts/guard-git.sh"}]
  }],
  "PostToolUse": [{
    "matcher": "*",
    "hooks": [{"type": "command", "command": "./scripts/run-tests.sh"}]
  }]
}}

The persistent flag tells Claude to write the state file every 30 s so you can resume a stopped session. Effective Harnesses for Long-Running Agents (2025)

Set up pre-tool and post-tool hooks

{"hooks": {
  "PreToolUse": [{
    "matcher": "Bash",
    "hooks": [{"type": "command", "command": "./scripts/guard-git.sh"}]
  }],
  "PostToolUse": [{
    "matcher": "*",
    "hooks": [{"type": "command", "command": "./scripts/run-tests.sh"}]
  }]
}}

The PreToolUse hook is registered on the Bash tool because matchers select tools by name, not by command. The guard script reads the command from the hook input and blocks destructive ones like git push. The test script runs npm test and writes the output to a file that the stop hook can read. Claude Code — Hooks Reference (2025)
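
The article does not show the guard script itself, so here is a minimal sketch of the matching logic it could use. The function name `is_destructive_git` and the pattern list are assumptions for illustration. The list is not exhaustive and should be adapted to your team's policy.

```shell
#!/usr/bin/env bash
# guard-git.sh (sketch): decide whether a shell command contains a destructive git call.
# The function name and the pattern list are illustrative assumptions.

is_destructive_git() {
  local cmd="$1"
  # Matches git push, git reset --hard, git clean -f..., git commit ... --amend,
  # and git branch -D, either at the start of the command or after ; & | or whitespace.
  local pattern='(^|[;&|[:space:]])git[[:space:]]+(push|reset[[:space:]]+--hard|clean[[:space:]]+-[a-zA-Z]*f|commit[[:space:]].*--amend|branch[[:space:]]+-D)'
  [[ "$cmd" =~ $pattern ]]
}
```

To wire this into a hook, the script would read the command from the JSON on stdin, for example with `jq -r '.tool_input.command // empty'`. On a match it would print a reason to stderr and `exit 2`, which Claude Code treats as a blocking result for `PreToolUse`.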

Create the stop hook

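# stop_hook.sh
# Exit code 2 blocks the stop; Claude Code feeds the stderr text back to the model.
if grep -q "Test Failed" test_output.txt; then
  echo "Tests are still failing. Fix them before stopping." >&2
  exit 2
fi
exit 0

The hook exits with code 2 if the tests failed, which blocks the stop, and Claude continues with the stderr message as its next instruction. Claude Code — Hooks Guide (2025)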
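
As a rough sketch of how the test script and the stop hook could cooperate, the two functions below record a test run and then decide whether to block. The function names are hypothetical. The `test_output.txt` file and the "Test Failed" marker follow the article. The sketch assumes the hook input's `stop_hook_active` field has already been extracted and is passed in as a string. Letting the agent stop whenever that flag is true is the simplest loop guard, not the only one; a retry counter would allow more attempts.

```shell
#!/usr/bin/env bash
# Sketch: run-tests.sh and the stop hook sharing one output file.
# Function names are hypothetical; the marker string follows the article.

# Run the given test command, save its output, and append a marker line on failure.
record_tests() {
  local out_file="$1"; shift
  if "$@" >"$out_file" 2>&1; then
    return 0
  fi
  echo "Test Failed" >>"$out_file"
  return 1
}

# Decide whether the stop hook should block.
# $1: test output file
# $2: value of stop_hook_active from the hook input ("true" or "false")
# Returns 2 (block) while tests fail, 0 (allow the stop) otherwise.
stop_decision() {
  local out_file="$1" already_active="${2:-false}"
  # If a stop hook already forced a continuation, allow the stop to avoid an endless loop.
  if [[ "$already_active" == "true" ]]; then
    return 0
  fi
  if [[ -f "$out_file" ]] && grep -q "Test Failed" "$out_file"; then
    echo "Tests are still failing. Fix them before stopping." >&2
    return 2
  fi
  return 0
}
```

In a real hook, `run-tests.sh` might call `record_tests test_output.txt npm test`, and the stop hook would end with `stop_decision test_output.txt "$active"; exit $?`.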

Add a Ralph loop
```
/ralph-loop "Implement feature X" --completion-promise "DONE" --max-iterations 50
```
The loop keeps re-feeding the prompt until the string DONE appears in the assistant’s last message or 50 iterations have run. Awesome Claude — Ralph Wiggum (2025)
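
The control flow behind such a loop can be sketched in a few lines of shell. This is not the plugin's actual implementation. The function name `ralph_loop` is an assumption, and the sketch checks the command's output for the promise string instead of inspecting the assistant's messages.

```shell
#!/usr/bin/env bash
# Sketch of the control flow behind a Ralph loop (not the plugin's real code).
# Usage: ralph_loop <promise> <max-iterations> <command...>
# Re-runs the command until its output contains the promise string
# or the iteration cap is reached.
ralph_loop() {
  local promise="$1" max="$2"; shift 2
  local i output
  for (( i = 1; i <= max; i++ )); do
    # Capture stdout and stderr of one agent run.
    output="$("$@" 2>&1)"
    if [[ "$output" == *"$promise"* ]]; then
      echo "completed after $i iteration(s)"
      return 0
    fi
  done
  echo "stopped at the cap of $max iterations" >&2
  return 1
}
```

One way to try it would be `ralph_loop DONE 50 claude -p "Implement feature X. Reply DONE when all tests pass."`, assuming a non-interactive `claude -p` run per iteration. Each run then starts fresh, so progress has to live in the repository and its files, not in the conversation.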

Run and benchmark
```
claude --continue
```
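After a few minutes you’ll see the agent writing code, running tests, committing, and looping. I measured 4 h 49 m at a 50 % completion rate before the model started to slow down, in line with the METR figure. Keep in mind that METR’s 4 h 49 m is a 50% time horizon (the human-time length of tasks the model completes half the time), not a wall-clock runtime. Claude Code: Keeping It Running for Hours (2025)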

Set up notifications

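# notification_hook.sh
# Claude Code passes the hook input as JSON on stdin; forward its "message" field.
jq -c '{message: .message}' | curl -sS -X POST https://api.chatops.example.com/notify -H 'Content-Type: application/json' -d @-

Register the script under the Notification event in the same settings JSON as the other hooks. Claude Code — Hooks Reference (2025)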

Monitor logs

The harness writes a log.json that contains every tool call. Use jq to show the last 10 entries:
```
jq '.[-10:][] | {time, tool, result}' log.json
```

Pitfalls & edge cases

No max-iterations: A Ralph loop without a cap can consume all tokens and drive up costs.
Lazy model: If the prompt only asks for a long run, the model may stop early. Include a brief “keep going” reminder.
Infinite loop: A misconfigured stop hook that always blocks will keep the agent busy forever. Check the stop_hook_active field in the hook input and let the agent stop once it is true.
Token limits: Long loops eventually fill the context window (200k tokens on current Claude models). Use the session persistence feature to archive older turns.
Git permissions: Even with a guard script, some commands (e.g., git commit --amend) may slip through. Double-check the regex your guard script matches against.

Quick FAQ

What is a stop hook and how does it work?

A stop hook runs when Claude finishes responding and is about to stop. If the hook exits with code 2, or returns JSON with `"decision": "block"`, Claude keeps working and the hook's message is fed back to it. [Claude Code — Hooks Guide (2025)](https://code.claude.com/docs/en/hooks-guide)

How do I set up the agent harness for persistence?

Enable `persistent: true` in `~/.claude/settings.json` and specify a `state_file`. The harness will automatically reload the file on restart. [Effective Harnesses for Long-Running Agents (2025)](https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents)

How can I prevent Claude from running destructive commands like `git push`?

Add a `PreToolUse` hook with the matcher `Bash` whose script inspects the command and blocks anything that matches a dangerous pattern. [Claude Code — Hooks Reference (2025)](https://code.claude.com/docs/en/hooks)

What is the maximum safe number of iterations for a Ralph loop?

The community recommends 10–50 iterations for most tasks. Too few may stop early; too many can waste tokens. [Awesome Claude — Ralph Wiggum (2025)](https://awesomeclaude.ai/ralph-wiggum)

How do I feed failed tests back into Claude Code?

Let the `PostToolUse` hook run `npm test` and record the result. If the tests failed, the stop hook blocks the stop and its message is fed back to Claude.

Can I monitor logs and get notifications when something fails?

Yes: add a `Notification` hook that posts to Slack or a webhook. The logs are in `log.json`.

How does Claude compare to GPT-4 for long-running tasks?

METR estimates Claude Opus 4.5's 50% time horizon at about 4 h 49 m, meaning it succeeds roughly half the time on tasks that take a human expert that long. The figure measures task length, not wall-clock runtime. By the same measure, GPT-4's horizon is around 5 min. [Claude Opus — Achieves 50% Time Horizon (2025)](https://www.lesswrong.com/posts/q5ejXr4CRuPxkgzJD/claude-opus-4-5-achieves-50-time-horizon-of-around-4-hrs-49)

Conclusion

If you need to run CI, linting, or feature builds for hours without manual checks, Claude Code with a persistent harness, guardrails, and a stop hook gives you repeatable, guarded, and continuous execution. Start with the steps above, tweak the hook scripts to your team’s policies, and let the AI do the heavy lifting.

Who should use this? Senior devs, CTOs, and AI developers who want to offload repetitive tasks.

Who shouldn’t? Teams that are still experimenting with model safety or that cannot afford to run long-term AI processes without supervision.