Local inference profiler
What can your machine actually run?
Reads your CPU, memory, and GPU, then tells you which open models fit, at what quantization, with a ready-to-run Ollama command.
03 / FAQ
- What does What Model do?
- What Model scans your system hardware (CPU, RAM, GPU, and VRAM) and recommends open-weight LLMs that fit your machine, including the best quantization level and copy-ready Ollama pull and run commands.
- Do I need Ollama installed?
- No. Recommendations work without Ollama. If Ollama is installed and running locally, the app detects pulled models and marks them on each card.
- How accurate are the memory estimates?
- Estimates are based on parameter count, quantization format, and context length. They are approximate: actual usage varies with the runtime you use (for example Ollama, llama.cpp, or LM Studio).
- Should I run this on the machine I use for inference?
- Yes. For accurate results, run What Model on the same PC or laptop where you plan to run local models. On phones and tablets, enter the VRAM and RAM values manually instead.
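
To make the memory-estimate answer above more concrete, here is a minimal sketch of how an estimate could be derived from parameter count, quantization format, and context length. It is not What Model's actual formula. The names `ModelSpec`, `estimate_memory_gib`, and `best_quant` are hypothetical, the fixed 0.5 GiB runtime overhead is an assumption, and the KV cache is assumed to be stored in 16-bit precision. The bits-per-weight figures for `Q8_0` and `Q4_0` follow the llama.cpp block layouts (32 weights plus one 16-bit scale per block).

```python
from dataclasses import dataclass

# Bits per weight, including per-block scale overhead.
# Q8_0: 34 bytes per 32 weights; Q4_0: 18 bytes per 32 weights.
BITS_PER_WEIGHT = {"F16": 16.0, "Q8_0": 8.5, "Q4_0": 4.5}


@dataclass
class ModelSpec:
    """Hypothetical description of a model's shape."""
    params_billion: float  # total parameters, in billions
    n_layers: int          # number of transformer layers
    n_kv_heads: int        # key/value heads (fewer than query heads with GQA)
    head_dim: int          # dimension of each attention head


def estimate_memory_gib(spec, quant, context_len, kv_bytes=2, overhead_gib=0.5):
    """Rough memory need in GiB: weights + KV cache + assumed fixed overhead."""
    weight_bytes = spec.params_billion * 1e9 * BITS_PER_WEIGHT[quant] / 8
    # One key and one value vector per layer, per KV head, per token.
    kv_cache_bytes = (
        2 * spec.n_layers * spec.n_kv_heads * spec.head_dim * context_len * kv_bytes
    )
    return (weight_bytes + kv_cache_bytes) / 2**30 + overhead_gib


def best_quant(spec, context_len, budget_gib):
    """Return the highest-precision format that fits the budget, or None."""
    by_precision = sorted(BITS_PER_WEIGHT, key=BITS_PER_WEIGHT.get, reverse=True)
    for quant in by_precision:
        if estimate_memory_gib(spec, quant, context_len) <= budget_gib:
            return quant
    return None


if __name__ == "__main__":
    # Illustrative 8B-parameter model shape; the values are assumptions.
    spec = ModelSpec(params_billion=8.0, n_layers=32, n_kv_heads=8, head_dim=128)
    for budget in (4.0, 8.0, 12.0, 24.0):
        print(budget, best_quant(spec, context_len=8192, budget_gib=budget))
```

Under these assumptions the weights dominate at short contexts, while the KV cache grows linearly with context length, which is one reason the same model can fit at 8K tokens and not at 32K. Real runtimes add their own buffers, so treat the result as a lower-bound guide only.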