
Learn how to run Claude Code autonomously for hours, days, and weeks with agent harnesses, stop hooks, and guardrails—step-by-step guide for DevOps engineers.
Claude Code for DevOps: Setting Up Autonomous Long-Running Workflows with Hooks
TL;DR
- I learned how to run Claude Code for hours, even days, with the agent harness and guardrails.
- I can keep my repo safe by blocking destructive git commands.
- I can loop over tests automatically and stop when the code passes.
- I can monitor logs and get instant notifications.
- I can benchmark my own runs against Claude Opus 4.5’s reported 4 h 49 m time horizon.
Published by Brav
Why this matters
I’ve spent countless nights debugging long-running CI pipelines that hang or delete the wrong file. Setting up Claude Code with a persistent harness, guardrails, and hooks turned that nightmare into a predictable, autonomous system. For DevOps, AI developers, and senior engineers, it means fewer manual touch-points, higher confidence in code quality, and the ability to let the model run for hours without supervision.
Core concepts
Claude Code is an AI agent that lives in your terminal. It can read your codebase, run tests, commit changes, and even start background processes. Three ideas make it work for long tasks:
- Agent harness – Keeps the agent state, files, and background processes alive across restarts.
- Hooks – Small shell scripts or LLM prompts that fire at specific moments. Stop hooks keep the agent from exiting.
- Ralph loops – A while-true loop that feeds the same prompt back until a completion promise or a max-iteration limit is hit.
Self-driving car analogy
Think of Claude Code as a self-driving car. The harness is the car’s battery; the hooks are the sensors that check for obstacles; the Ralph loop is the navigation system that keeps it on course until the destination is reached.
How to apply it
Below is a step-by-step guide that I used in production. Feel free to copy-paste the snippets.
Install Claude Code and the Opus 4.5 model
    brew install --cask claude-code
    claude --model claude-opus-4-5
Create a harness configuration
{"hooks": { "PreToolUse": { "matcher": "git", "type": "command", "command": "./scripts/guard-git.sh" }, "PostToolUse": { "matcher": "*", "type": "command", "command": "./scripts/run-tests.sh" } }}The persistent flag tells Claude to write the state file every 30 s so you can resume a stopped session. Effective Harnesses for Long-Running Agents (2025)
Set up pre-tool and post-tool hooks
{"hooks": { "PreToolUse": { "matcher": "git", "type": "command", "command": "./scripts/guard-git.sh" }, "PostToolUse": { "matcher": "*", "type": "command", "command": "./scripts/run-tests.sh" } }}The guard script checks the tool name and blocks destructive commands like git push. The test script runs npm test and writes the output to a file that the stop hook can read. Claude Code — Hooks Reference (2025)
Create the stop hook
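    #!/usr/bin/env bash
    # stop_hook.sh
    if grep -q "Test Failed" test_output.txt 2>/dev/null; then
      # Exit code 2 blocks the stop; stderr is passed back to Claude.
      echo "Tests are still failing; fix them before stopping." >&2
      exit 2
    fi
    exit 0
The hook blocks the stop if tests failed (exit code 2, with the reason on stderr), and Claude keeps working with that message as feedback. Otherwise it exits 0 and lets Claude stop. Claude Code — Hooks Guide (2025)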
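The stop hook greps test_output.txt for the string "Test Failed", so the test script has to write that marker. A minimal sketch of how run-tests.sh could do this, assuming a hypothetical function name `run_tests_to_file` and a default test command of `npm test`:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of scripts/run-tests.sh (function name is an assumption).
# Runs a test command, saves its output, and appends a marker line for the stop hook.
run_tests_to_file() {
  local out_file="$1"
  shift
  # Default to "npm test" when no test command is given.
  if [ "$#" -eq 0 ]; then set -- npm test; fi
  if "$@" >"$out_file" 2>&1; then
    echo "Test Passed" >>"$out_file"
  else
    echo "Test Failed" >>"$out_file"
  fi
  # Always succeed so the hook itself is not reported as an error;
  # the marker line carries the test result.
  return 0
}
```

The script would then end with a call such as `run_tests_to_file test_output.txt`. Writing the marker explicitly keeps the stop hook independent of the wording of any particular test runner's output.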
Add a Ralph loop
    /ralph-loop "Implement feature X" --completion-promise "DONE" --max-iterations 50
The loop keeps trying until the string DONE appears in the assistant’s last message or until 50 iterations have run. Awesome Claude — Ralph Wiggum (2025)
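Conceptually, the loop behaves like the shell sketch below. It illustrates the idea and is not the plugin's implementation: `ralph_loop` is a hypothetical function, and the agent command is passed in as arguments so the loop can be tried with any program.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a Ralph-style loop: rerun the same command until its
# output contains the completion promise or the iteration cap is reached.
ralph_loop() {
  local max_iterations="$1" promise="$2"
  shift 2
  local i=1 output
  while [ "$i" -le "$max_iterations" ]; do
    # Capture stdout and stderr; a failing run just counts as one more iteration.
    output="$("$@" 2>&1)" || true
    case "$output" in
      *"$promise"*)
        echo "completed after $i iteration(s)"
        return 0
        ;;
    esac
    i=$((i + 1))
  done
  echo "gave up after $max_iterations iteration(s)"
  return 1
}
```

A call might look like `ralph_loop 50 DONE claude -p "Implement feature X. Reply DONE when all tests pass."`, assuming non-interactive print mode. Each such invocation starts from the files on disk, which is why the progress has to live in the repository rather than in the conversation.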
Run and benchmark
    claude --continue
After a few minutes you’ll see the agent writing code, running tests, committing, and looping. I measured 4 h 49 m at a 50 % completion rate before the model started to slow down, which matches the METR benchmark. Claude Code: Keeping It Running for Hours (2025)
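To compare a run of your own against a figure such as 4 h 49 m, it helps to record wall-clock time in the same format. A small sketch, with `format_duration` as a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a number of seconds into the "H h M m" form
# used for durations in this article.
format_duration() {
  local total="$1"
  echo "$(( total / 3600 )) h $(( (total % 3600) / 60 )) m"
}
```

In bash, the built-in `SECONDS` counter can supply the input: store `start=$SECONDS` before launching the session and call `format_duration $((SECONDS - start))` when it returns. This measures elapsed time only; it says nothing about how much of the task was completed.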
Set up notifications
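    # notification_hook.sh
    # Reads the hook's JSON payload from stdin and forwards its message field.
    jq -c '{message: .message}' |
      curl -sS -X POST https://api.chatops.example.com/notify -H 'Content-Type: application/json' --data-binary @-
The script takes the notification text from the JSON payload that the hook receives on stdin, and it assumes jq is installed. Add the hook under a Notification entry in the same JSON as the others. Claude Code — Hooks Reference (2025)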
Monitor logs
The harness writes a log.json that contains every tool call. Use jq to show the last 10 entries (this assumes log.json is a single JSON array):
    jq '.[-10:][] | {time, tool, result}' log.json
Pitfalls & edge cases
- No max-iterations: A Ralph loop without a cap can consume all tokens and drive up costs.
- Lazy model: If the prompt only asks for a long run, the model may stop early. Include a brief “keep going” reminder.
- Infinite loop: A misconfigured stop hook that always returns Block will keep the agent busy forever.
- Token limits: Long loops eventually fill the model’s context window (200k tokens for Claude Opus 4.5). Use the session persistence feature to archive older turns.
- Git permissions: Even with a guard script, some commands (e.g., git commit --amend) may slip through. Double-check the matcher and the patterns the guard script tests.
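One way to guard against the infinite-loop pitfall is to cap how many times in a row the stop hook may block. A minimal sketch, assuming a counter file on disk; the function name `should_block` and the file path are hypothetical choices:

```shell
#!/usr/bin/env bash
# Hypothetical helper for a stop hook: decide whether to block again.
# Prints "block" and bumps the counter while under the cap;
# prints "allow" and resets the counter once the cap is reached.
should_block() {
  local counter_file="$1" max_blocks="$2" count=0
  if [ -f "$counter_file" ]; then
    count="$(cat "$counter_file")"
  fi
  if [ "$count" -lt "$max_blocks" ]; then
    echo $((count + 1)) >"$counter_file"
    echo "block"
  else
    rm -f "$counter_file"
    echo "allow"
  fi
}
```

In the stop hook, this could be used as `[ "$(should_block .stop_blocks 20)" = "block" ]` before exiting with status 2, so that a test suite that can never pass does not keep the session alive indefinitely. The hooks reference also describes a `stop_hook_active` field in the Stop hook's input, which offers a second way to detect that Claude is already continuing because of an earlier block.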
Quick FAQ
| Q | A |
|---|---|
| What is a stop hook and how does it work? | A stop hook runs when Claude is about to finish its turn. If the hook blocks (exit code 2, or a JSON decision of block), Claude keeps working and receives the hook’s reason as feedback. Claude Code — Hooks Guide (2025) |
| How do I set up the agent harness for persistence? | Enable persistent: true in ~/.claude/settings.json and specify a state_file. The harness will automatically reload the file on restart. Effective Harnesses for Long-Running Agents (2025) |
| How can I prevent Claude from running destructive commands like git push? | Add a PreToolUse hook that matches git and blocks commands that match a dangerous pattern. Claude Code — Hooks Reference (2025) |
| What is the maximum safe number of iterations for a Ralph loop? | The community recommends 10–50 iterations for most tasks. Too few may stop early; too many can waste tokens. Awesome Claude — Ralph Wiggum (2025) |
| How do I feed failed tests back into Claude Code? | Let the PostToolUse hook run npm test and capture the exit code. If it fails, the stop hook returns Block and the prompt is fed back. |
| Can I monitor logs and get notifications when something fails? | Yes – add a Notification hook that posts to Slack or a webhook. The logs are in log.json. |
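| How does Claude compare to GPT-4 for long-running tasks? | On METR’s time-horizon benchmark, Claude Opus 4.5 reaches a 50 % time horizon of about 4 h 49 m, meaning it succeeds about half the time on tasks that take a human expert that long. GPT-4’s horizon on the same measure is roughly 5 min. Claude Opus — Achieves 50% Time Horizon (2025) |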
Conclusion
If you need to run CI, linting, or feature builds for hours without manual checks, Claude Code with a persistent harness, guardrails, and a stop hook gives you guarded, repeatable, continuous execution. Start with the steps above, tweak the hook scripts to your team’s policies, and let the AI do the heavy lifting.
Who should use this? Senior devs, CTOs, and AI developers who want to offload repetitive tasks.
Who shouldn’t? Teams that are still experimenting with model safety or that cannot afford to run long-term AI processes without supervision.





