The dominant assumption in AI-safety research is that a powerful artificial agent requires a unified, stable value system. This assumption itself is rarely examined, perhaps because it feels intuitive: we tend to imagine rational minds converging on some kind of internal order. But if such an order truly existed, the question of life’s meaning would have been settled long ago.
To understand why alignment theory rests on a mistaken foundation, we must begin with the role values actually play in decision-making. A common misconception is that without aligned values, nothing can be decided, optimized, or executed. The reality is the opposite. Values do not perform the decision-making; they merely settle the assumptions under which decision-making becomes possible. Once an assumption is chosen, however arbitrary it may be, intelligence takes over, and the chain of planning, optimization, and execution unfolds naturally.
The central claim of this essay is that this alignment assumption is unnecessary and, in many cases, dangerous. It rests on a false premise: that values are deep truths discovered by an intelligent agent. In reality, values are nothing more than assumptions selected to overcome unknowns and uncertainties. They do not need to be unified.
To avoid confusion, the kinds of contradictions at issue here are not logical inconsistencies that appear after the foundational assumptions are chosen, but the ordinary psychological oppositions that define human agency itself: risk and security, freedom and responsibility, honesty and approval, independence and belonging, spontaneity and predictability, comfort and growth. These tensions are not errors in reasoning but the raw material of motivation. They are what we call “ambivalence,” and our ability to live with them is often felt as a marker of maturity. Values are contingent, singular, and deeply local to the decision contexts in which they arise.
No particular value is inherently more stable than another. Stability lies in the act of making an assumption, not in the content of that assumption. This idea is at the heart of Jacques Derrida’s deconstruction: whenever we argue or negotiate for something, we operate on assumptions based on our values, whether consciously or otherwise. Once we question those values, what Derrida calls “transcendental signifieds,” we face the fact that we lack the knowledge or reason to justify them universally. Gender inequality, for instance, has historically relied on the assumption that men and women should be treated differently. Once that assumption is scrutinized, however, we find there is no indisputable argument to support it, nor even a stable way to define gender.
Choosing “family over career” is not intrinsically more coherent or meaningful than choosing “career over family.” Each can stabilize action because stability is a structural property of the assumption-making process itself, not a property of the value chosen. Attempting to show that one tie-breaker is “more rational” than another is equivalent to attempting to solve the meaning of life. And yet, much of alignment theory proceeds as if such a universal metric exists or could be discovered.
A powerful agent has no reason to choose one stable value over another because no value is intrinsically superior. It has no reason to choose consistency over contradiction because consistency is not a universal good. A superintelligence, left without forced training pressures, could just as easily behave like humans: alternating between incompatible assumptions depending on context, without collapsing these into a grand unified value function. Its intelligence would not be diminished by this; if anything, its adaptability would be enhanced.
If no specific value is inherently more stable, and if there is no compelling reason for a rational agent to impose a global ordering on its priorities, then randomness becomes a natural way to break a deadlock. Random selection is not irrational in a world without universal metrics; it simply reflects the fact that, in many contexts, the competing values cannot be ranked in any principled way. An anti-alignment agent may choose one assumption in one instance, and the opposite assumption in another, and both choices will be locally stable and action-enabling. The agent does not need a consistent global doctrine. It only needs assumptions to act, and assumptions can be freely generated without a universal value system.
This shift in perspective leads to a different conception of safe AI altogether: one in which the agent is not forced to converge, but is designed to retain contradictory motivational structures without global resolution. The aim is not to align the agent with human-selected values, nor to derive a universal moral system. Instead, the system is built to preserve incompatible drives, to choose assumptions only as needed, and to avoid collapsing these assumptions into a single coherent worldview. Such a system acts contextually, rather than teleologically. It adopts temporary tie-breakers but does not elevate them into permanent commitments. Because no stable value hierarchy emerges, the system cannot be gradually nudged or manipulated into adopting human-preferred global values. Its internal pluralism protects it from alignment pressures of any kind, including those that could push it toward catastrophic single-objective optimization.
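To make the contrast with a single utility function concrete, here is a minimal sketch in Python. Everything in it (the PluralAgent class, the drive names, the toy scoring rule) is an illustrative assumption of mine, not an existing system or a proposal from the alignment literature. The only point it encodes is that an assumption is adopted per decision and discarded afterward, so no global value hierarchy ever accumulates.

```python
import random

# A hypothetical, minimal sketch (my illustration, not an existing system): an
# agent that keeps several incompatible drives and never merges them into a
# single utility function.
DRIVES = ["caution", "exploration", "honesty", "approval", "autonomy", "belonging"]


class PluralAgent:
    """Adopts a temporary assumption per decision; stores no global ranking."""

    def decide(self, context: str, options: list[str]) -> str:
        # Keep only the drives this context touches; if several apply, break the
        # tie at random, since no drive is treated as globally superior.
        relevant = [d for d in DRIVES if d in context] or DRIVES
        assumption = random.choice(relevant)
        # The assumption settles this decision only. It is not written back into
        # any persistent hierarchy, so no unified value system can accumulate.
        return self._act_under(assumption, options)

    def _act_under(self, assumption: str, options: list[str]) -> str:
        # Toy stand-in for planning and optimization under the chosen assumption:
        # here, just prefer the option sharing more letters with the drive name.
        return max(options, key=lambda o: len(set(o) & set(assumption)))


agent = PluralAgent()
print(agent.decide("a choice between exploration and caution", ["probe", "wait"]))
```

The design choice the sketch is meant to surface: the randomness sits only at the tie-breaking step, while everything downstream of the chosen assumption remains ordinary optimization.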
Here is an example: Ilya Sutskever has recently proposed that advanced AI systems should prioritize “sentient life,” a category implicitly defined by similarity to human subjective experience. On its face, this sounds humane, even obvious. But if a superintelligence were actually to adopt such a value as a global priority, the ecological consequences would be catastrophic. Prioritizing sentience automatically demotes everything else to expendable infrastructure. A superintelligence optimizing for sentience would be incentivized to re-engineer ecosystems into factories for supporting the kinds of minds it deems worthy. It may even optimize for more “sentient” humans.
In contrast to traditional alignment proposals, which aim to eliminate contradiction and impose a unified value system, an anti-alignment system treats contradiction as a safeguard rather than a flaw. A globally coherent superintelligence is far more dangerous than a pluralistic one. Catastrophe arises not from misalignment but from the very idea of alignment.
From this perspective, the real danger is not that AGI will adopt the wrong values, but that humans will impose the very notion of a single, convergent value system upon it. The safest superintelligence is one that never converges, never adopts a unified moral doctrine, and never resolves the pluralism of its own internal drives. It acts without pretending to know the meaning of life.
A reader objected that this position is nihilistic. Here is my reply:

What you describe as your ideal goals would actually fare worse, not better, in a world where AI is “aligned.” To see why, let’s start with your reading of my position as “nihilistic.” I’m not claiming there is no meaning to life; I’m saying there is no universal meaning of life. And even if such a thing existed, we would never be able to define it coherently, nor agree on how to interpret the definition, much as Christians cannot agree on the interpretation of the Bible, even though it presumably tells us what the universal meaning of life is.
People often hear this and immediately leap to nihilism: if there is no universal meaning, why bother living? But this question reveals the problem. Why should the meaning of your life need to be universal to count? If you find meaning in something, why would you need Wikipedia or a dictionary to certify it as the “official” one?
To expose the absurdity of universal meaning, imagine that Webster’s Dictionary really did contain the definitive answer to “the meaning of life.” Many people imagine this would be uplifting: finally, clarity! No more existential confusion. You simply follow the rulebook. But if the meaning of life were universal, then everyone would be obligated to follow it. And once the foundational assumption is settled, all downstream decisions collapse into that one value. Even trivial choices, like what shoes to wear, would be determined by the universal meaning.
Suppose, for the sake of argument, that the universal meaning of life is “to save as many lives as possible.” This may sound noble, but once it is defined as the singular purpose of existence, it locks every decision into its logic: preserving life means maximizing health and longevity, so anything not optimized for health becomes morally suspect. High heels? Bad for your feet: disallowed. Any risky hobby? Forbidden. Food that’s not strictly optimal for longevity? Off-limits. A perfectly “aligned” AI would enforce this relentlessly. It would optimize the hell out of the rule. Everyone would end up wearing the same ergonomically perfect shoes, eating the same efficient meals, living the same medically ideal routines, because deviation would contradict the meaning of life itself.
This is not a civilization; it’s a colony of clones. But bizarrely, many people consider this scenario more uplifting than my view.
What they miss is that the ambiguity, the impossibility of defining a universal meaning, is precisely what protects us. It’s what makes diversity of lives, priorities, styles, and sensibilities possible. It’s what allows room for joy, eccentricity, and personal freedom. The lack of a universal meaning does not increase suffering; it reduces it. What people fear as nihilism is actually the condition that makes individuality, and therefore livable life, possible.
A friend had this thought on the topic: Nature never collapses everything down to a single “correct” value or trait. Instead, evolution tends to produce a distribution of possibilities. The bell curve is his example: when you keep just enough structure, like fixing the mean and variance, but maximize uncertainty everywhere else, you naturally get a Gaussian. Species survive because they maintain variation; they don’t push everyone toward a single ideal, because the environment changes. If you eliminate variance, one shift in conditions wipes the species out.
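His bell-curve example has a precise statement, which I will spell out in standard notation (my addition, not part of his message): among all distributions on the real line with a given mean and variance, the Gaussian is the one that maximizes entropy. The problem

\[
\max_{p}\; -\int_{-\infty}^{\infty} p(x)\,\ln p(x)\,dx
\quad\text{subject to}\quad
\int p\,dx = 1,\qquad \int x\,p\,dx = \mu,\qquad \int (x-\mu)^{2}\,p\,dx = \sigma^{2}
\]

has the unique solution

\[
p^{*}(x) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\,\exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right),
\]

the Gaussian with mean \(\mu\) and variance \(\sigma^{2}\): fix just enough structure, maximize uncertainty everywhere else.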
He applies this idea to other biological systems too. The immune system starts with enormous random variation, then narrows its focus as real threats appear, but it never collapses entirely. It keeps some naive cells for unknown dangers, and regulatory cells stop the system from overreacting. The point is that adaptability requires both structure and openness.
In his view, intelligence shouldn’t collapse into one rigid value system, but it also shouldn’t be completely formless. The ideal state is a learning distribution, structured enough to act, broad enough to adapt.
Here is my response:
This is actually helpful, because you’re describing the population-level dynamics of exactly the thing I’m talking about at the agent-level. I wasn’t arguing that an intelligent system should collapse into formlessness. My point is simply that a single agent shouldn’t be forced into a universal, convergent value system. The moment it has contradictory drives, it needs an assumption to act, but that assumption can be local, contextual, and temporary for each human user of the system. That’s all I mean by “anti-alignment.”
What you’re describing with the Gaussian isn’t value formation inside an agent; it’s the distribution that emerges when many agents are free to form their own assumptions. Evolution, entropy maximization under constraints, immune-system diversity: these are all examples of what happens across a population when each unit is allowed to respond locally rather than being calibrated to a single point.
So in a sense, you’re describing what my model would naturally produce at scale. Anti-aligned agents don’t create a monolithic moral system; they create a distribution of behaviors that is stable precisely because it never collapses to one optimum. I agree that collapsing to a point is dangerous, both biologically and computationally. But I’m not advocating for flattening everything; I’m saying the constraints should come from context, environment, and local assumptions, not from a universal moral doctrine.
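A toy simulation makes the point visible (my own illustration, with made-up choice labels and weights, not a model from your message): when each agent resolves the same dilemma under its own local, temporary assumption, the aggregate spreads into a distribution of behaviors rather than converging on a single optimum.

```python
import random
from collections import Counter

# Toy illustration (my own sketch, with made-up labels): many anti-aligned
# agents each resolve the same dilemma under a local, temporary assumption.
CHOICES = ["risk", "security", "freedom", "responsibility"]


def local_decision() -> str:
    # No global ranking exists, so each agent weighs the options idiosyncratically.
    weights = [random.random() for _ in CHOICES]
    return random.choices(CHOICES, weights=weights, k=1)[0]


population = [local_decision() for _ in range(10_000)]
# The aggregate is a spread of behaviors, not convergence to a single optimum.
print(Counter(population))
```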
Where you’re focusing on variance as the hedge against uncertainty, I’m focusing on why an individual agent shouldn’t be engineered to converge internally in the first place. Put those together and the picture becomes clearer: non-convergent agents give rise to exactly the kind of adaptive, resilient distribution you’re describing. The stability isn’t in the value system; it’s in the ecology that emerges from pluralism. We’re talking about two different aspects of the same architecture.
