Local inference profiler

What can your machine actually run?

Reads your CPU, memory, and GPU, then tells you which open models fit, at what quantization, with a ready-to-run Ollama command.


03 / FAQ

What does What Model do?
What Model scans your system hardware (CPU, RAM, GPU, and VRAM) and recommends open-weight LLMs that fit your machine, including the best quantization level and copy-ready Ollama pull and run commands.
Do I need Ollama installed?
No. Recommendations work without Ollama. If Ollama is installed and running locally, the app detects pulled models and marks them on each card.
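As an illustration of how locally pulled models can be detected, the sketch below queries Ollama's local REST endpoint `GET /api/tags`, which lists the models already pulled. The page does not say how What Model performs this check, so treat this as one possible approach. The function names, the default address `http://localhost:11434`, and the timeout are assumptions made for the example.

```python
import json
import urllib.request


def parse_pulled_models(payload: dict) -> set[str]:
    """Extract model names from an /api/tags style response body."""
    return {m["name"] for m in payload.get("models", []) if "name" in m}


def list_pulled_models(base_url: str = "http://localhost:11434",
                       timeout: float = 2.0) -> set[str]:
    """Return the names of pulled models, or an empty set if Ollama is unreachable.

    base_url and timeout are illustrative defaults, not values taken from What Model.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return parse_pulled_models(json.load(resp))
    except (OSError, ValueError):
        # Connection refused, timeout, or a malformed body: treat as "not running".
        return set()


if __name__ == "__main__":
    pulled = list_pulled_models()
    print("Pulled models:", sorted(pulled) if pulled else "none detected")
```

Returning an empty set on failure matches the behavior described above: recommendations still work when Ollama is absent, and the "already pulled" marker simply never appears.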
How accurate are the memory estimates?
Estimates are based on parameter count, quantization format, and context length. They are approximate: actual usage depends on the runtime you use (for example Ollama, llama.cpp, or LM Studio).
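To show how those three inputs combine, here is a rough back-of-the-envelope sketch. It is not What Model's actual formula, which the page does not give. It assumes a standard transformer: weight memory is the parameter count times the bits per weight, and the KV cache grows linearly with context length. The function name, the 10% overhead allowance, and the example model shape are all assumptions.

```python
def estimate_memory_gib(params_billion: float,
                        bits_per_weight: float,
                        n_layers: int,
                        n_kv_heads: int,
                        head_dim: int,
                        context_len: int,
                        kv_bytes_per_value: int = 2,
                        overhead_fraction: float = 0.10) -> float:
    """Rough memory estimate in GiB for running a quantized model.

    All defaults are illustrative assumptions, not values used by What Model.
    """
    # Weights: one quantized value per parameter.
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    # KV cache: a key and a value vector per layer, per KV head, per token.
    kv_bytes = 2 * n_layers * n_kv_heads * head_dim * context_len * kv_bytes_per_value
    # Flat allowance for runtime buffers and activations (assumed).
    total_bytes = (weight_bytes + kv_bytes) * (1 + overhead_fraction)
    return total_bytes / 2**30


if __name__ == "__main__":
    # Hypothetical 8B-parameter model at 4 bits per weight with an 8192-token context.
    gib = estimate_memory_gib(8, 4, n_layers=32, n_kv_heads=8,
                              head_dim=128, context_len=8192)
    print(f"Estimated memory: {gib:.1f} GiB")
```

Real quantization formats store scales and other metadata alongside the weights, so their effective bits per weight is somewhat higher than the nominal figure, and each runtime allocates buffers differently. That is part of why any such estimate stays approximate.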
Should I run this on the machine I use for inference?
Yes. For accurate results, run What Model on the same PC or laptop where you plan to run local models. On a phone or tablet, enter the VRAM and RAM values manually instead.