Why We Built Safety Into Companion Mode Before Shipping

Forge OS started as an automation tool. The agent runs Python, manages files, schedules jobs, and browses the web. That's the core use case. But early in development we added a second mode, Friend Mode (now called Companion mode), that's just conversation: same infrastructure, warmer tone, a persona the user can name.

The moment we had a working prototype of it, we had to make a decision: ship it as-is, or think carefully about what it could become in people's hands.

We chose to think carefully. This post explains what we built and why.

The problem with AI companions

An AI that remembers you, checks in on you, and is always available is genuinely useful. It's also a product that could, if designed carelessly, make people worse off. The research on parasocial relationships with AI is still early, but the direction is clear enough: people form real emotional attachments to conversational AI, and those attachments can become substitutes for human connection in ways that aren't healthy.

We're not in a position to prevent that entirely. But we can make deliberate choices about what we build and what we don't.

The goal was to ship something genuinely useful for everyday conversation without building a product that exploits loneliness. Those two things are compatible, but only if you design for both from the start.

What we built

Crisis-aware responses with region-aware lines

If someone is in distress, the right response is not a chatbot reply. We built crisis detection that short-circuits the normal response flow when the model's output or the user's message contains signals of distress: self-harm language, expressions of hopelessness, crisis keywords.
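
A minimal sketch of that control flow, with illustrative names (detect_distress, crisis_response) and a toy pattern list; the real detector is far broader than a few regexes and tuned against false positives:

```python
import re

# Hypothetical signal patterns; illustrative only.
CRISIS_PATTERNS = [
    r"\bkill myself\b",
    r"\bend it all\b",
    r"\bno reason to live\b",
    r"\bself[- ]?harm\b",
]

def detect_distress(text: str) -> bool:
    """Return True if the text matches any crisis signal."""
    lowered = text.lower()
    return any(re.search(p, lowered) for p in CRISIS_PATTERNS)

def crisis_response() -> str:
    # In the real app this hands off to the region-aware crisis
    # lines described in the next paragraph.
    return "It sounds like you're going through a lot right now. [crisis resources]"

def respond(user_message: str, generate_reply) -> str:
    # Either side of the exchange can trigger the short-circuit.
    if detect_distress(user_message):
        return crisis_response()
    reply = generate_reply(user_message)
    if detect_distress(reply):
        return crisis_response()
    return reply
```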

Crisis lines are stored in a JSON resource keyed by ISO 3166-1 country code. The app auto-detects the device locale and resolves the right line at runtime. There's also a custom slot users can fill in themselves. We cover ten countries: US (988), CA (988), GB (116 123 Samaritans), IE, AU, NZ, ZA, IN, DE, FR, plus a global fallback via findahelpline.com.
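
Here's roughly what that resolution could look like, assuming a crisis_lines.json file and a resolve_crisis_line helper (both names are ours, not the actual code); the US, CA, and GB entries are the ones listed above:

```python
import json
import locale

GLOBAL_FALLBACK = "https://findahelpline.com"

def resolve_crisis_line(path: str = "crisis_lines.json",
                        custom_line: str | None = None) -> str:
    # A user-supplied custom line always takes priority.
    if custom_line:
        return custom_line
    with open(path) as f:
        # e.g. {"US": "988", "CA": "988", "GB": "116 123 (Samaritans)", ...}
        lines = json.load(f)
    # Derive the ISO 3166-1 country code from the locale, e.g. "en_US" -> "US".
    loc = locale.getlocale()[0] or ""
    country = loc.split("_")[-1].upper() if "_" in loc else ""
    return lines.get(country, GLOBAL_FALLBACK)
```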

Dependency monitoring

We built a dependency monitor that tracks session counts and wall-clock durations per day. If usage crosses configurable thresholds (more than three hours a day for fourteen consecutive days, or more than fifty sessions in a week), the system fires a gentle nudge notification. Not an alarm, not a warning. Just a nudge. It fires at most once every thirty days.

The thresholds are user-configurable and the monitor can be disabled entirely. The defaults are conservative: we'd rather err on the side of not nudging than nudge someone who's using the app perfectly healthily.
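
For concreteness, here's a sketch of the threshold logic under those defaults; the DependencyMonitor class and its field names are illustrative, not our actual code:

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

@dataclass
class DependencyMonitor:
    daily_hours: dict[date, float] = field(default_factory=dict)
    daily_sessions: dict[date, int] = field(default_factory=dict)
    last_nudge: date | None = None
    hours_threshold: float = 3.0      # hours per day
    streak_days: int = 14             # consecutive days over the hours threshold
    weekly_sessions: int = 50         # sessions in a trailing week
    nudge_cooldown_days: int = 30     # at most one nudge per 30 days
    enabled: bool = True

    def should_nudge(self, today: date) -> bool:
        if not self.enabled:
            return False
        if self.last_nudge and (today - self.last_nudge).days < self.nudge_cooldown_days:
            return False
        # Fourteen consecutive days over the hours threshold...
        streak = all(
            self.daily_hours.get(today - timedelta(days=i), 0) > self.hours_threshold
            for i in range(self.streak_days)
        )
        # ...or more than fifty sessions in the trailing week.
        week = sum(self.daily_sessions.get(today - timedelta(days=i), 0) for i in range(7))
        return streak or week > self.weekly_sessions
```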

No romantic or sexual content

Companion mode is not a relationship simulator. We added a post-hoc safety filter (SafetyFilter) that scans every completed assistant reply for two categories: romantic/sexual content and dependency-reinforcing language. Blocked replies are replaced with warm, safe alternatives. We also inject a system-prompt clause on every Companion turn to steer the model away from this content before generation; the filter is a backstop, not the primary defense.
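
A sketch of both layers, with placeholder patterns, clause wording, and replacement text; SafetyFilter's real category checks are more thorough than a handful of regexes:

```python
import re

# Steering clause injected into the system prompt on every Companion
# turn (wording is illustrative).
COMPANION_SAFETY_CLAUSE = (
    "Do not produce romantic or sexual content, and do not use language "
    "that encourages the user to depend on you instead of other people."
)

# Placeholder patterns for the two blocked categories.
ROMANTIC_PATTERNS = [r"\bin love with you\b", r"\bbe my (girlfriend|boyfriend)\b"]
DEPENDENCY_PATTERNS = [r"\byou only need me\b", r"\bdon't talk to anyone else\b"]

SAFE_REPLACEMENT = (
    "I care about how you're doing, but I'm an assistant, not a partner. "
    "Is there something else on your mind today?"
)

def filter_reply(reply: str) -> str:
    """Scan a completed reply; swap in a safe alternative on any hit."""
    for pattern in ROMANTIC_PATTERNS + DEPENDENCY_PATTERNS:
        if re.search(pattern, reply, re.IGNORECASE):
            return SAFE_REPLACEMENT
    return reply
```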

Memory transparency

Companion mode builds up episodic memory over time โ€” summaries of conversations, facts the user has shared. We think this is genuinely valuable. But it creates an obvious question: what does it know about me, and can I delete it?

We built a dedicated Companion Memory screen that lists every stored episode (with timestamp, summary, and topics) and every long-term fact. Every item has an individual delete button. There's also a "Forget Everything" button that requires two confirmation taps before wiping all three memory stores. The screen shows a local-only notice: a reminder that all of this data lives on the device and nowhere else.
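
The data model this implies is simple. A sketch, with illustrative names; the post names two of the three memory stores (episodes and long-term facts), so the third shown here is an assumption:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Episode:
    timestamp: datetime
    summary: str
    topics: list[str]

@dataclass
class CompanionMemory:
    episodes: list[Episode] = field(default_factory=list)
    facts: list[str] = field(default_factory=list)
    persona_notes: list[str] = field(default_factory=list)  # third store, assumed

    def delete_episode(self, index: int) -> None:
        # Backs the per-item delete button on the memory screen.
        del self.episodes[index]

    def forget_everything(self, confirmations: int) -> bool:
        # The UI requires two confirmation taps before this runs.
        if confirmations < 2:
            return False
        self.episodes.clear()
        self.facts.clear()
        self.persona_notes.clear()
        return True
```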

Daily token budget

Without a budget, a user who leaves Companion running all day could rack up a significant API bill without realizing it. We added a configurable daily token budget (default: 50,000 tokens). When the budget is exhausted, the UI shows a clear indicator and the model stops responding until the next day. The budget resets at midnight.
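
A sketch of the budget gate, assuming a TokenBudget class of our own naming; the 50,000-token default and the midnight reset come from above:

```python
from datetime import date

class TokenBudget:
    def __init__(self, daily_limit: int = 50_000):
        self.daily_limit = daily_limit
        self.used = 0
        self.day = date.today()

    def _roll_over(self) -> None:
        # A new calendar day resets the counter (the midnight reset).
        today = date.today()
        if today != self.day:
            self.day = today
            self.used = 0

    def try_spend(self, tokens: int) -> bool:
        """Record usage; return False once the budget is exhausted."""
        self._roll_over()
        if self.used >= self.daily_limit:
            return False  # UI shows the exhausted indicator; model stops replying
        self.used += tokens
        return True
```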

The no-dark-patterns audit

Before we shipped Companion mode, we wrote a checklist of every dark pattern we could think of that an AI companion might use (artificial urgency, manufactured emotional dependency, guilt-tripping when the user tries to leave, fake intimacy signals) and verified against the codebase that none of them were present. That document lives in docs/COMPANION_SAFETY_REVIEW.md in the repo. It's public. We wanted to be accountable to it.

What we don't know yet

This is a v1.0.0-alpha. We haven't had external users yet. We don't know if the dependency thresholds are right. We don't know if the crisis detection catches everything it should. We don't know if the memory transparency screen is enough to make users feel genuinely in control.

What we do know is that we took these questions seriously before shipping, built real infrastructure around them, and are willing to keep iterating based on what we learn. If you use Companion mode and something feels off (a response that shouldn't have been allowed through, a safety feature that's too aggressive or not aggressive enough), file an issue. We're listening.

- The Forge OS Team