
August 21, 2025

The AI of Today Is Not Yours: Why Centralized Models Are Misaligned by Design

Artificial intelligence systems today are powerful but not personalized. They are trained centrally by platforms scraping massive amounts of public data and optimized for platform goals, not for your needs. In this article we explore why that is misaligned by design, and how a human digital twin built around local intelligence and personal alignment offers a fundamentally better path.

Centralized AI Today: Scraping the Internet and Opaque Objectives

Leading AI models are built on enormous datasets drawn from public web content, code repositories, and forums. For instance, Meta’s Llama 3 was trained on over 15 trillion tokens drawn from publicly available sources, and it explicitly avoided training on private Meta user data.

At the same time, model developers offer minimal transparency. The GPT-4 technical report intentionally omits details about architecture, training compute, and dataset construction, citing safety and competitive reasons. These practices underline that centralized models are opaque by design.

The reliance on large-scale scraping has triggered legal challenges. The New York Times sued OpenAI and Microsoft over the use of NYT journalism in model training without permission. Separately, Getty Images sued Stability AI in both the US and UK for alleged unauthorized copying of millions of images to train Stable Diffusion. These disputes expose how the status quo depends on the sweeping ingestion of copyrighted content.

Regulators are responding. The EU AI Act entered into force on August 1, 2024, introducing the first comprehensive AI rulebook, with new obligations for model transparency, data governance, and risk management.

Misalignment: AI Optimized for Platforms Not People

Centralized AI systems are tuned to serve provider interests such as brand safety, cost efficiency, user retention, and compliance, not necessarily your objectives. Techniques such as Reinforcement Learning from Human Feedback (RLHF) and Constitutional AI do make models safer or more helpful on average, but they still reflect provider-defined values, not yours.

Because models are proprietary, you cannot easily inspect or adjust what they are optimizing. DeepMind’s documentation on specification gaming shows how even well-defined objectives can be exploited in unintended ways. That reinforces how hard it is for a typical user to assert meaningful alignment on their own terms.
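
As a toy illustration (ours, not one of DeepMind’s documented cases), consider an agent rewarded for picking items up: the stated objective keeps paying out while the intended goal, a clean room, is never achieved.

```python
# Toy specification-gaming example: the stated objective ("maximize pickups")
# is satisfied without achieving the intended goal ("a clean room").
pickups = 0
item_on_floor = True
for _ in range(10):
    if item_on_floor:
        pickups += 1          # the rewarded action
        item_on_floor = False
    else:
        item_on_floor = True  # drop the item again to earn more reward
print(pickups)  # reward keeps accruing, but the room never stays clean
```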

The Data Relevance Problem

General-purpose models do not know about you. Without access to your calendar, your documents, your device data, or your personal preferences, the assistance they provide is necessarily superficial. When personal context is lacking, models hallucinate or fall back on generic assumptions.

Research on retrieval-augmented generation (RAG) shows that grounding responses in relevant external documents at inference time significantly improves accuracy and recency. Extending RAG to your own private data would yield far more useful results, but centralized assistants typically do not have consented access to that context.
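
For readers who want a feel for the mechanics, here is a minimal RAG sketch over personal notes. The word-overlap scorer and the prompt-building `answer` function are deliberate placeholders for a real local embedding model and LLM call.

```python
# Minimal sketch of retrieval-augmented generation (RAG) over personal notes.
# The scorer and the final generation step are placeholders for local models.

def score(query: str, doc: str) -> float:
    """Toy relevance score via word overlap; a real system would use embeddings."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / (len(q) or 1)

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most relevant to the query."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

def answer(query: str, docs: list[str]) -> str:
    """Ground the prompt in retrieved personal context before generation."""
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"  # a local LLM would consume this

notes = [
    "Dentist appointment on Tuesday at 15:00.",
    "Project deadline moved to Friday.",
    "Grocery list: oat milk, coffee, apples.",
]
print(answer("When is my dentist appointment?", notes))
```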

Some vendors are attempting limited personalization. For example, Google’s Gemini offers a “personal context” memory, but it remains confined within Google’s own policies and infrastructure, not under your full control.

Lack of Agency

Current assistants offer settings and toggles for safety or style, but most steering happens upstream, in training pipelines, red-team policies, and hidden reward functions. RLHF and constitution-based alignment remain general solutions, not ones you can tailor to your personal goals with verifiable control.

Local Intelligence and Personal Alignment Through the Twin

Our proposal is to pair general AI capability with a Human Digital Twin: a privacy-preserving reflection of your data identity that operates locally or within a user-controlled enclave. This shifts the boundary so that your context, your rules, and your motives come first.

This vision builds on existing technologies:

  • On-device AI, such as Google’s Gemini Nano running generative features entirely on Pixel devices, reducing latency and avoiding cloud exposure
  • Federated learning, as used in Gboard, where model improvements are made without aggregating private typing data centrally (see the sketch after this list)
  • Private cloud compute, such as Apple’s Private Cloud Compute, which handles heavier tasks in Apple-managed infrastructure with strict data-handling guarantees
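
The federated idea is simple to state in code. Below is a minimal federated-averaging sketch with a toy model and made-up data; it illustrates the pattern, not Gboard’s actual training stack.

```python
import numpy as np

# Federated-averaging sketch: each device computes a model update locally;
# only the updates, never the raw data, are shared and averaged.

def local_update(weights: np.ndarray, local_data: np.ndarray, lr: float = 0.1) -> np.ndarray:
    """One toy gradient step on data that never leaves the device."""
    grad = weights - local_data.mean(axis=0)  # stand-in for a real gradient
    return weights - lr * grad

def federated_average(updates: list[np.ndarray]) -> np.ndarray:
    """The server aggregates updates without seeing the underlying data."""
    return np.mean(updates, axis=0)

global_weights = np.zeros(3)
device_data = [np.random.rand(10, 3) for _ in range(5)]  # stays on each device
updates = [local_update(global_weights, d) for d in device_data]
global_weights = federated_average(updates)
print(global_weights)
```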

A Twin architecture would combine these advantages (a minimal sketch of the consent layer follows the list):

  • Run models locally when possible for speed and privacy
  • Use RAG over your own encrypted local data sources
  • Let you define explicit objectives or personal “constitutions” for the assistant
  • Provide audit logs and consent prompts for any data crossing beyond your control
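
Here is one hypothetical shape such a consent layer could take; every class and method name is our own illustration, not a shipping API.

```python
from dataclasses import dataclass, field

# Hypothetical consent layer for a Human Digital Twin: every request is
# checked against user-defined rules, and every decision is audit-logged.

@dataclass
class TwinPolicy:
    allowed_sources: set[str]            # data the assistant may always read
    require_consent: set[str]            # data that needs an explicit prompt
    audit_log: list[str] = field(default_factory=list)

    def authorize(self, source: str, ask_user) -> bool:
        """Decide whether a data source may be used, logging the outcome."""
        if source in self.allowed_sources:
            decision = True
        elif source in self.require_consent:
            decision = ask_user(f"Allow access to {source}?")
        else:
            decision = False
        self.audit_log.append(f"{source}: {'granted' if decision else 'denied'}")
        return decision

policy = TwinPolicy(allowed_sources={"notes"}, require_consent={"calendar"})
if policy.authorize("calendar", ask_user=lambda prompt: True):  # user says yes
    pass  # run local RAG over the calendar here
print(policy.audit_log)  # ['calendar: granted']
```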

Ethical and Strategic Benefits

The shift to local and personal alignment offers several benefits:

  • Privacy by design, not by permission: Keeping private context local aligns with GDPR principles of data minimization and purpose limitation, and builds user trust
  • Regulatory resilience: Transparent computation and data flows help meet EU AI Act audit and governance requirements
  • Lower latency and better availability: Edge computation like Gemini Nano means faster responses and reliability even under poor network conditions
  • Better relevance with less exposure: RAG over private data yields more accurate results without centralizing sensitive information
  • Clearer incentives: Your Twin becomes the optimization target, so day-to-day behavior adapts to your preferences, not generic platform metrics

What This Means in Practice

  • Your data, your rules: Every model request passes through your Twin’s policy and consent layer
  • Context that helps: Your calendar, emails, notes, or sensor logs are instantly available when you choose to share them
  • Portable alignment: Your personal preferences move with you across devices and services, not the reverse
  • Auditable by default: Clear logs show what ran where, against which data, under what rules (illustrated below)
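
As a concrete, hypothetical illustration, an audit record could capture exactly that what-ran-where-under-which-rules triple; the field names below are ours.

```python
import datetime
import json

# Hypothetical audit record: what ran, where, against which data, under what rule.
entry = {
    "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    "model": "local-llm-v1",        # what ran
    "location": "on-device",        # where it ran
    "data_sources": ["calendar"],   # which data it touched
    "rule": "require_consent",      # the policy that permitted it
    "consent": "granted",
}
print(json.dumps(entry, indent=2))
```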

Conclusion

Centralized AI will remain core infrastructure, but it is fundamentally misaligned with individual users: trained on everyone, optimized for somebody else, and blind to your real-world needs. The solution is not a better simulation of your body but a system centered on your data identity. A local-first Human Digital Twin, anchored in personal retrieval, explicit objectives, and verifiable boundaries, transforms generic intelligence into your intelligence.

Author: Sebastian Thum