What's New in Privacy AI

Stay up to date with the latest features and improvements

What’s New

v1.3.2

Version 1.3.2

Features

MLX Model Integration
Privacy AI now supports the MLX model engine, enabling both text and vision models to run locally. You can directly download models from Hugging Face by entering a repository ID and access token (if required).
The new download manager adds resume-on-failure, background downloading, and a model integrity verifier for reliable large-model transfers.
MLX models now fully support local tool calls and MCP tool calls, just like GGUF and remote API models.
Best of all, MLX model support is included in the Free Plan.
Clipboard and Drag-and-Drop Enhancements
The Chat and Reader editors now allow pasting images, videos, and files directly from the clipboard. On iPad, you can also drag and drop files or images into the chat editor — a smoother, more natural way to attach media.

Improvements

llama.cpp Upgrade
Updated llama.cpp from b6558 → b6692, adding compatibility for new models such as Qwen3 Reranker and LiquidAI LFM2-2.6B, plus broader quantization format support.
Text Rendering Refinement
Improved text layout and rendering during “thinking” mode for better readability and consistency across devices.
Smarter Model Switching
Switching local models now filters to show only models of the same type — MLX, GGUF, or Apple Intelligence — for a cleaner experience.

Bug Fixes

Thinking Text Display
Fixed an issue where “thinking” models occasionally displayed incomplete or cut-off lines when collapsed.
Thinking Mode Persistence
Fixed a bug where the local model’s thinking-mode setting was not properly saved or applied to new chat sessions.
Single Local Model Enforcement
Resolved a critical issue where MLX and GGUF models could run simultaneously, potentially causing memory overflow.

v1.2.4

Version 1.2.4

Improvements

GLM-4.6 Integration
Added support for the latest GLM-4.6 model under the Z.ai provider, giving users access to improved performance and reasoning capabilities.
IBM Model Branding
Introduced official IBM logos for the Granite series and other IBM models, making it easier to visually identify them in the model list.
Enhanced iPad Experience
Improved the remote model configuration UI on iPad with a cleaner layout and better usability for larger screens.

Bug Fixes

Remote Parameter Saving
Fixed an issue where key parameters—including temp, seed, top_p, repeat_penalty, presence_penalty, frequence_penalty, and reasoningEffort—were not being saved correctly for remote models. These settings are now preserved as expected.

v1.2.3

Version 1.2.3

Improvements

Refined switch model behavior
The switch model button now switches between models from the same provider instead of toggling between local and remote APIs.

Bug Fixes

Consistent model picker display
Fixed an issue where the model picker in chat did not consistently show both remote and local models.
Removed duplicate switch button
Eliminated the extra switch model button in the chat UI. Users can now rely on the Clone and Fork Chat features for a better way to continue a conversation with another model.

v1.2.2

Version 1.2.2

Features

Free Plan launched
Most core features are now permanently free, including Local Models, Apple’s on-device Foundation Model, iCloud sync, natural language chat, Reader, 25+ built-in tools (Search, News, Stocks, Weather, Health, Email, Calendar, etc.), conversation cloning, advanced export, audio/video transcription, Siri & Shortcuts, and more.
Only advanced features—Cloud models, MCP, and custom API providers—require a Pro Plan subscription.
Configurable text rendering refresh rate
Control how often text refreshes under throttle to prevent UI lag and battery drain with very fast models.
llama.cpp engine upgrade
Upgraded to version b6558 with support for Liquid AI series models.
Model settings copy on chat clone
When cloning or forking a chat with the same model, all settings (temperature, top_p, top_k, context, etc.) are preserved.

Improvements

Optimized typewriter effect
Optimize the typing effect with dynamic throttling and caching for smoother screen updates and longer battery life.
Auto-scroll enabled by default
Replies now scroll automatically when new text appears.
Updated default model parameters
Set Temperature to 0.3 (from 0.6) and Top P to 1.0 (from 0.9) based on community feedback for better defaults. Fully customizable.
Streaming support for Apple Foundation Model
The on-device model now streams responses for faster interaction.

Bug Fixes

Correct model categorization
Fixed an issue where the Apple on-device Foundation Model was mistakenly treated as a cloud model.
Renaming imported models
Fixed a bug that prevented local or imported models from changing their titles.

v1.1.33

Version 1.1.33

Feature

Added Apple’s on device Foundation model Added Apple’s on device Foundation model to the local model list. Integrated with 25 built-in tools (search, email, news, and more) and extended support for external MCP tool calling.

Improvement

Customizable think tags Users can now define their own start and end tags for self-hosted or cloud models.

Bug Fix

Resolved system prompt issue Fixed a bug where system prompts for self-hosted and cloud models could not be saved.

v1.1.32

Version 1.1.32

Features

iOS 26 Support Integration with the latest iOS 26 UI and Apple’s on-device Foundation Model.
Natural Talk UI A new hands-free interface that lets you talk naturally with any model.
Update Highlights The “What’s New” screen now appears automatically after each app update.
Smarter History Rebuilt history handling now supports binary attachments (images, PDFs, and more) directly in chat history, so you can reference them later.
Richer Previews Attachments now preview Microsoft Office, Apple iWork, PDF, video, and audio files.
New Image Models Added support for OpenAI’s 'gpt-image-1' and HuggingFace’s 'black-forest-labs/FLUX.1-dev'.
TTS Model Support Added OpenAI protocol text-to-speech endpoints, including gpt-4o-tts' and 'gpt-4o-mini-tts'.
Universal Preview Tool Open and export AI-generated PDFs, images, Office 365 documents, iWork files, videos, and audio.

Improvements

OS Requirement Raised minimum requirement to iOS 18.6, keeping APIs aligned with current and previous iOS versions.
Model Selection Prevents starting a remote chat session without choosing a model.
File Compatibility Improved binary file handling (e.g., PDFs) with full support for OpenAI-compatible servers like OpenRouter. Raw PDFs can now be uploaded directly.
Protocol Inspector Limited displayed text to 2,000 characters to avoid freezes from large Base64-encoded requests.
Token Management Smarter history trimming to stay within each model’s token limits.
API Pricing Added pricing data for TTS, image generation, and transcription models, including hybrid cost methods (per minute, per image, etc.).
Tool Calls Improved UI for tool calls, which are now saved and reloaded with chats.

Bug Fixes

Audio Playback Fixed an issue where the mini player bar stopped playback when scrolling through chat.
Image Queries Fixed a bug where refreshing an image query ignored updated parameters (count and size).

v1.1.31

Version 1.1.31

Features

Super Siri Tool Control – Configure tools when invoking Siri. Enable web search or calendar management directly by voice.
Smarter Local TTS – Pause and resume local text-to-speech from the current position.
Upgraded Speech Engine – Updated sherpa-onnx to 1.12.11 for better pause handling at punctuation.
Large File Processing – Oversized files are now split into chunks with preserved context, enabling summaries, translations, and more at any scale.

Improvements

Mini TTS Player – A new toolbar player replaces the blocking dialog, letting you read while listening. It also highlights the current text being spoken.
Multi-File Import – Import multiple files into a single chat for richer workflows.
Smarter Image Text Extraction – When enabled, extracted text is added as the alt attribute in Markdown image links, improving clarity for AI.

Bug Fixes

Markdown attachments now render in the Markdown viewer instead of plain text.
URL submissions now extract images according to Reader settings (previously all images were included).
Fixed X.com link handling: sharing or opening X posts in Reader now extracts the full post content.

v1.1.30

Version 1.1.30

Features

Multiple Attachments in Chats

All models now support multiple attachments in conversations. You can upload one or more files of different types, such as documents, spreadsheets, or images, into a single chat. Privacy AI will process all of them together, giving you richer context and more accurate results.
Process Multiple URLs at Once

Instead of pasting links one by one, you can now include multiple URLs in a single message. Privacy AI automatically fetches and processes all of them, merges the content, and sends it to the AI model for unified analysis.
Latest Office 365 Excel Support

Reader and chat now support the newest Office 365 XLSX file formats. Even the most recent Excel spreadsheets can be converted directly into clean Markdown for further reading, summarization, or analysis.
Smarter Reader Settings

Two new configuration options: “JS Check Interval” and “JS Idle Threshold”, let you control how Privacy AI handles JavaScript execution when loading web pages. This fine-tuning improves accuracy and efficiency when converting remote web content into Markdown.
Interactive AI Images

AI-generated images are no longer static. You can now rotate, flip, zoom in, or zoom out within the app, making it easier to explore details and adjust perspective.
New Image Editor

A built-in Image Editor lets you draw entirely new images or make edits to existing ones. Use Apple Pencil or your fingers to sketch, highlight, or modify an image, then send it directly into a chat for AI processing.
Edit Sent Messages

Messages you’ve already sent can now be edited. When you make changes, all attachments linked to the original message are automatically preserved and copied to the updated message, saving time and preventing accidental data loss.
Context Usage Indicator

A new indicator shows live token usage in each chat. If the connected server supports usage reporting, Privacy AI will display accurate values; otherwise, it estimates them for you. This helps you track conversation length and know when it’s time to compress history or start a new chat.

Improvements

Friendly Error Messages

Technical errors are now automatically reformatted into plain, human-readable messages, so you can quickly understand what went wrong without needing to parse raw logs.
Upgraded Inspector

The Inspector tool is more powerful than ever. You can now export complete logs, copy them in full, or select and copy partial segments. This makes debugging model calls and protocol flows faster and more convenient.
Smarter Remote Error Handling

When remote AI servers return errors, Privacy AI will now reformat them into simple, explanatory messages rather than showing dense technical stack traces.
Better Message Editing

Edited messages now automatically retain their original attachments. This ensures that no files are lost when you make corrections or updates.
Optimized Token Allocation

The history management algorithm has been redesigned. It maximizes the amount of past conversation that can be preserved while maintaining safe margins for the current query and response, improving both reliability and model performance.
Improved File Upload Dialog

The file chooser now includes clear descriptions of each processing mode. These explain the technical details (such as BASE64 encoding or local vs. remote handling), the token cost implications, and the best situations for each option—helping you make smarter decisions.
Provider Memory

When you create a new remote AI model, Privacy AI now remembers the last provider you used. This removes the need to reselect your preferred provider each time, saving clicks and setup time.
Transparent Subscription UI

The subscription interface now shows detailed explanations of how free trials work, including timing and conditions, so you can make informed decisions before committing.
Faster Siri & Shortcuts

The AI workflow for Siri and Shortcuts has been streamlined. This stateless mode now runs more efficiently, improving performance and responsiveness in automations.

Bug Fixes

Resolved an issue where Siri and Shortcuts could not connect to AI models if the application was not already running in the background.

v1.1.29

Version 1.1.29

Features

OpenRouter Image Generation: Generate images directly through OpenRouter. Try the free Gemini 2.5 Flash Image Preview model today.
Smarter File Import for Media: When importing videos or images, a new dialog lets you configure dimensions. The app automatically scales the shorter side or uses iOS APIs to extract text, saving tokens and costs. You can also choose to send files unchanged to remote AI.
Flexible File Handling: For non-media files, you can upload them as-is or convert them to Markdown locally. Markdown conversion reduces cost but may trade off some accuracy.
iCloud Model Sync: Download a GGUF model once and use it across all your devices with the same iCloud account. Models load automatically in the background—no need for multiple downloads.
Reader Enhancements: Added support for capturing photos directly in the reader and selecting processing methods before importing documents, images, or videos. The reader UI has also been refreshed.
Expanded Self-Hosted Server Support: Now works with llama.cpp, vLLM, LocalAI, and Jan AI.
HTML in Code Blocks: Code blocks containing HTML can now be rendered and captured as screenshots for easy sharing.
Rich Chat History: Chat messages now preserve attachments and generated media, creating a more complete conversation record.
Feature Overview: Added a section that lists all features and highlights of the app for easier discovery.
Improved Model Pricing Display – Added new fields and improved the UI of model price information shown in the Remote Service and API Keys views.

Improvements

Updated llama.cpp to b6301 for improved performance and compatibility.
Remote Services now remember the collapsed/expanded status for each category.

Bug Fixes

Fixed an issue where newly installed apps sometimes failed to display local AI models in the selection list.
Fixed an issue where the application failed to refresh the local cache after remote configuration changes.

v1.1.26

Version 1.1.26

Features

Local vision models on device. You can now run vision-capable GGUF models locally via our llama.cpp mtmb integration (for example, Qwen2.5-VL 3B Instruct). Add images to a chat without sending data to the cloud.
Add our first vision model, "Qwen2.5-VL 3B Instruct."
Add new suggested models. Added "gemma-3-270m" to the recommended list

Improvements

Faster and more reliable downloads. Local model downloads are quicker and more stable; we also fixed an issue that could start extra download workers when the app moved between background and foreground.
Clearer remote model pricing. The price panel now shows "Cache Read (per 1M tokens)" and "Cache Write (per 1M tokens)" so you can estimate costs more accurately when KV cache is used.
Easier ways to contact us. In Feedback, you can open your default email app or DM our X account. There's also an optional email field if you want a reply. Because the app has no user accounts, we can only respond if you leave contact info.

Bug Fixes

Model list refresh. Newly downloaded local models now appear immediately when you switch models inside a chat.

v1.1.23

Version 1.1.23

Improvements

llama.cpp Upgrade – Updated from b5950 to b6131 for better GPU performance on Apple devices, more efficient KV cache handling, and support for new models: GLM-4.5, SmallThinker, Qwen3-Embedding, and Hunyuan Dense.
Faster Large Model Imports – Refactored the GGUF file processor to handle 4B+ models with the latest llama.cpp engine, importing quickly without memory overflows.
Perplexity API Update – Added support for the latest models: sonar, sonar-pro, sonar-deep-research, sonar-reasoning, and sonar-reasoning-pro. Model names can now be edited in Perplexity API settings. (Tip: If you already use the Perplexity API, remove it from the API list and restart the app to apply changes.)
TTS Settings Enhancement – The TTS settings view now displays your device's CPU core count to help you choose the optimal thread count.
Clearer X.com API Key Guide – Added more detailed instructions in Settings for using your X.com API key.
Tool Description Improvements – Updated the search_contact local tool description so AI knows when and how to use it. Related tools like send_email and send_sms now rely on it for recipient lookup. For example: "Send my wife an SMS and tell her I love her" will first trigger search_contact to find the recipient, then create the message.
Better HuggingFace Downloads – Model downloads now run in the background and can resume after interruptions. A prompt will notify you when the download is complete—ideal for large models like Qwen3-4B-Thinking/Instruct-2507.
Tool Search Bar – Added a search bar for local and MCP tools to quickly find what you need as the tool list grows.
UI Tips Added – Added quick usage tips for API Key Management, Tools, and Remote Services to help new users get started faster.

Bug Fixes

Search bar in Remote Services now filters models correctly.
Removed the "Add" menu from Local Tools, since adding external tools is no longer supported.
Resolved a crash when sending SMS with the send_sms tool.

v1.1.21

Version 1.1.21

Features

Offline Text-to-Speech – An offline TTS model Kokoros-82M (53 distinct voice styles) is now bundled directly in the app. Any AI reply or Reader article can be spoken aloud entirely on-device, so nothing is sent to the cloud and there are no per-character fees. You can also export the generated audio (M4A,WAV,AIFF) to Files, AirDrop, or any media player for later listening.
API Server Templates and Cloning – You can duplicate an existing server profile—such as the HuggingFace—to create custom endpoints in seconds. All models, headers, tokens, and endpoint settings are copied automatically. You only adjust what differs (base URL, model path, etc.). This is ideal when a provider exposes multiple endpoints under a single API key or when you run several private vLLM clusters.
Built-in GitHub Provider – GitHub has been added to the list of internal API provider.
Flexible Model Selection for Forked or Cloned Chats – When you fork an existing conversation or clone it into a new thread, you can now change the underlying model—local or remote—before the next message is generated. The original chat remains intact, and the new branch inherits the full context and tool settings while letting you compare answers or continue with a faster or cheaper engine.

Bug Fixes

The file-import button that disappeared inside chats is back.
Manually entered model names now save correctly in API Key configuration screen.

v1.1.17

Version 1.1.17

Features

Built-in API Access to z.ai – We've embedded native support for https://api.z.ai/api/paas/v4/. You can now call Z.AI services directly from Privacy AI with your token.
Major Siri Integration Upgrade – Local Models Now Work with Siri: You can now trigger local models using Siri voice commands. This is made possible by enhanced performance of llama.cpp-based models. Faster AI Replies for Siri: Adjusted prompt logic helps AI respond within Siri's strict time limits (typically under 8 seconds). For best results, use fast-response models—avoid long-thinking agents.
Search Tool Now Has Speed/Balance/Quality Modes – Choose your preferred mode for the searching_tool. "Speed" mode delivers up to 3× faster results, perfect for quick lookups.

Improvements

Perplexity Model Deprecation – The outdated r1-1776 model from Perplexity has been removed. Please switch to sonar-reasoning or sonar-reasoning-pro for continued access via OpenRouter.
Expanded OpenRouter Protocol Compatibility – Improved protocol handling ensures better performance and compatibility with the latest OpenRouter models and backends.
Improved Thinking Text View – When thinking models produce long outputs, the text now scrolls smoothly with a visible scrollbar—no more UI lockups on lengthy thoughts.
Code Block UI Stability – Markdown rendering has been optimized to remain responsive even when AI outputs include very large code blocks (1000+ lines). No more freezing or slowdowns in the chat UI.

v1.1.16

Version 1.1.16

Improvements

Fixed API Endpoint Save Bug – Resolved an issue where custom API base URLs were not being saved properly in the API settings screen.
Improved URL Input Experience – Disabled automatic capitalization for URL fields to prevent input errors when entering API endpoints.
Streamlined API Server Creation – You can now create a new API server directly from the API Key detail view—making it faster to configure your remote models.
Enhanced Launch Feedback – Added detailed progress indicators during app startup to show what's being synced or initialized.

v1.1.6

Version 1.1.6

Improvements

Upgraded to llama.cpp b5950 with expanded local models support – Added support for the following new local GGUF models: Menlo_Lucy, SmolLM3, OpenReasoning-Nemotron-1.5B.
Comprehensive rebuild and optimization of llama.cpp for iOS – Rebuilt the entire iOS build system for llama.cpp, generating a smaller and faster xcframework optimized for Apple devices. The Swift wrapper has been completely rewritten to improve memory handling and inference throughput. Benchmark results show a ~30% performance boost in local model prediction on Apple chips.
Improved YouTube subtitle handling – The YouTube caption downloader now automatically falls back to the default subtitle track when the English caption is unavailable, improving compatibility with non-English content.
Enhanced Markdown conversion for specific blog websites in Reader – Reader now better supports various blog layouts and formats, producing cleaner, more accurate Markdown output for improved content processing and analysis.

v1.1.5

Version 1.1.5

Features

Groq Cloud Support – You can now connect directly to Groq API inference service.

Improvements

Chat Prompt Sync with iCloud – Each chat now syncs its prompt properly across devices via iCloud.
Faster First-Time Setup on New Devices – We've optimized iCloud sync performance when launching the app for the first time on a new device. Chats, settings, and models now load faster and more reliably.
Improved Database Readiness for iCloud Devices – All critical data — including API keys and remote server configs — are now synced before the app starts. We've also added a "Refresh" button to manually trigger sync if needed.

v1.1.4

Version 1.1.4

Bug Fixes

Fixed a bug that prevented users from switching between text-only and text-to-image chat modes without creating a new session.
Resolved a serious performance issue on iPad that caused scrolling to drop below 10 FPS in some cases.
Added a new cache management system that significantly improves app launch speed.

v1.1.3

Version 1.1.3

Features

Scanned PDF OCR – Extract text from scanned PDFs — fully offline, no cloud involved.
Moonshot API Integration – Now supports Moonshot servers like Kimi-K2 natively.
Parallel Conversations – iPhone: run 8 AI chats at once. iPad: up to 12. Seamless multitasking.

Improvements

iPad Split View Enhanced – Smarter layout adaptation when multitasking on iPad.
iCloud Key Fix – Resolved API key sync issues across your Apple devices.
Chat Launch Boosted – Chats open faster than ever.
LiquidAI Ready – Upgraded llama.cpp to support Liquid series models for offline use.

v1.1.2

Version 1.1.2

Improvements

Qwen Model Optimization – Switched from 1.7B to 0.6B for better performance on older devices, with strong summarization and tool execution still intact
YouTube Caption Summarization – Quickly summarize videos using available captions
Improved Reading – Remembers your last reading position and features a smoother outline view
llama.cpp Upgrade – Updated to b5846, with support for Baidu's ERNIE-4.5 models

Bug Fixes

Photo Sharing Fixes – Sharing images into Privacy AI now works reliably across apps

v1.1.1

Version 1.1.1

Features

HuggingFace Integration – Connect to any Inference Endpoint with your token
Polymarket Tool – Analyze real-time prediction markets for research and strategy
Statistical Toolkit – Run advanced Bayesian and frequentist analysis with any tool-capable model
MCP Upgrade – Now supports Authorization headers for secure remote access

v1.1.0

Version 1.1.0

Improvements

Upgraded OpenAI Protocol Support – Compatibility and responsiveness with services like Perplexity, Gemini, Anthropic, Mistral, and xAI have been significantly improved by updating to the latest OpenAI-compatible protocol version.
Faster Web Search Tool – The search_web tool has been completely rewritten, resulting in a 60% boost in search speed and responsiveness.
llama.cpp Core Upgrade – Updated to b5760, this release brings full support for the latest Gemma 3n open-source model, enabling faster and smarter on-device AI performance.
Fork a Chat Anytime – Now you can instantly fork any conversation into a new thread—preserving all previous messages and tool settings for seamless exploration.
Improved Academic Search Accuracy – The search_arxiv tool has been fine-tuned for more accurate academic paper search, delivering better results from the ArXiv database.

Ready to Experience Privacy AI?

Download now and take control of your AI experience with complete privacy.

Download Privacy AI