The Qwen team has introduced QwQ-32B-Preview, an experimental research model designed to improve AI reasoning and analytical capabilities. Featuring a 32,768-token context window and a cutting-edge transformer architecture, it excels on math, programming, and scientific benchmarks such as GPQA and MATH-500.
QwQ-32B-Preview is a causal language model built on an advanced transformer architecture. It features Rotary Positional Embedding (RoPE), SwiGLU, RMSNorm, and attention QKV bias. With 64 layers and 40 attention heads, it is optimized for tasks requiring deep reasoning, and its extended context length of 32,768 tokens allows the model to process large inputs and tackle intricate multi-step problems.
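If you want to check these specifications yourself, here is a minimal sketch using the Hugging Face transformers library to read the published configuration without downloading the weights; the repository name Qwen/QwQ-32B-Preview matches the official release, but the exact attribute values shown in the comments are assumptions based on the figures above, so verify them against the model card.

```python
from transformers import AutoConfig

# Fetch only the model config (no weights) and print the headline figures.
# The values in the comments are assumptions; confirm against the model card.
config = AutoConfig.from_pretrained("Qwen/QwQ-32B-Preview")

print(config.num_hidden_layers)         # expected: 64 layers
print(config.num_attention_heads)       # expected: 40 attention heads
print(config.max_position_embeddings)   # expected: 32768-token context
```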
I ran a short test on my M3 Mac, and the speed is excellent given the model's capabilities. For local applications, hybrid architectures are ideal, combining reasoning power with tailored precision. As these models evolve, they open the door to more intelligent, localized AI solutions that work alongside more powerful cloud capabilities.
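For reference, a local run like mine on Apple Silicon can be reproduced with the mlx-lm package. This is only a sketch: the 4-bit community conversion name below is an assumption, so substitute whichever MLX-format build of QwQ-32B-Preview you find on the Hugging Face Hub.

```python
from mlx_lm import load, generate

# Assumed repository name for a 4-bit MLX conversion; replace with an actual
# MLX-format build of QwQ-32B-Preview from the Hugging Face Hub.
model, tokenizer = load("mlx-community/QwQ-32B-Preview-4bit")

prompt = "Explain, step by step, why the square root of 2 is irrational."
response = generate(model, tokenizer, prompt=prompt, max_tokens=512)
print(response)
```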
QwQ-32B-Preview was tested on multiple challenging benchmarks, achieving notable results:

- GPQA (Graduate-Level Google-Proof Q&A): Scored 65.2%, showcasing strong reasoning in scientific problem-solving.
- AIME (American Invitational Mathematics Examination): Achieved 50.0%, solving advanced mathematical problems in algebra, geometry, and probability.
- MATH-500: Performed exceptionally well with a 90.6% score, demonstrating comprehension across various mathematical topics.
- LiveCodeBench: Reached 50.0%, validating its ability to generate and analyze code in real-world programming scenarios.
QwQ-32B-Preview, as an experimental model, comes with several known challenges and limitations. One issue is its tendency to mix languages or switch between them unexpectedly, which can reduce the clarity of its responses. Additionally, the model sometimes enters recursive reasoning loops, leading to circular arguments and generating lengthy outputs without reaching definitive conclusions. While it excels in specialized tasks, it has room for improvement in general reasoning, particularly in areas like common sense and nuanced language understanding. Another significant concern is the need for enhanced safety measures to ensure its reliable and ethical deployment, especially in applications requiring high levels of trust and accountability.
How to Access QwQ-32B-Preview?
You can access QwQ-32B-Preview through HuggingChat, where it's currently running unquantized for free. To use QwQ-32B-Preview:
- Visit HuggingChat: https://huggingface.co/chat/
- Select QwQ-32B-Preview from the available models
- Start interacting with the model
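
If you would rather run the model outside HuggingChat, the weights are available on the Hugging Face Hub and can be loaded with the transformers library. The sketch below is only illustrative: the repository name Qwen/QwQ-32B-Preview comes from the official release, but the dtype choice and the implied memory requirement (roughly 65+ GB for unquantized bfloat16 weights) are assumptions, so adjust for your hardware or pick a quantized variant.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/QwQ-32B-Preview"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # assumption: adjust dtype/quantization for your hardware
    device_map="auto",
)

# Build a chat-formatted prompt and generate a response.
messages = [{"role": "user", "content": "How many positive divisors does 360 have?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```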

Future updates aim to address its current limitations and enhance its performance in broader AI applications.