Alibaba (BABA) has recently announced the launch of its new large language model, QwQ-32B. With only 32 billion parameters, the model has demonstrated performance comparable to DeepSeek-R1, which has 671 billion parameters (37 billion of which are activated per token), and in certain tests it even surpasses the latter. The announcement sent Alibaba's stock price up more than 7%, further driving the shift in AI large models from "quantity" to "quality."
The release of QwQ-32B highlights that smaller-parameter models can also achieve high performance. The Alibaba Qwen team noted that this result showcases the effectiveness of applying reinforcement learning (RL) to large-scale pre-trained models, suggesting that this approach may be a viable path toward artificial general intelligence. In addition to strong foundational reasoning abilities, QwQ-32B integrates agent-related capabilities, enabling it to think critically while using tools and to adjust its reasoning process based on environmental feedback.
According to official testing results, QwQ-32B excelled across multiple key evaluations. In the AIME24 mathematical reasoning assessment, QwQ-32B performed comparably to DeepSeek-R1 and significantly outperformed similar models such as o1-mini; its performance on the LiveCodeBench code evaluation was also on par with DeepSeek-R1. On LiveBench, billed as the "most difficult LLM evaluation," QwQ-32B scored higher than DeepSeek-R1, and it likewise outperformed DeepSeek-R1 on the IFEval instruction-following assessment and the BFCL tool-calling test. QwQ-32B's LiveBench score is approximately 72.5 at a cost of just $0.25; by comparison, R1 scores around 70 at $2.50, while o3-mini scores 75 at $5.00. This indicates that QwQ-32B has struck a good balance between performance and cost.
The outstanding performance of QwQ-32B is primarily attributed to its use of large-scale reinforcement learning. The Alibaba team took a phased approach to RL training, starting with a focus on mathematical and programming tasks and providing feedback by verifying the correctness of generated answers and the successful execution of generated code. In the expansion phase, RL training for general capabilities was added, using a general reward model and rule-based validators to enhance the model's overall abilities. The team reports that as the number of RL training rounds increased, the model's performance in math and programming consistently improved, validating the effectiveness of this approach.
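The outcome-based feedback described above (verified answer correctness for math, successful execution for code) can be sketched as simple reward functions. The function names and exact checks below are illustrative assumptions, not Alibaba's actual implementation:

```python
def math_reward(model_answer: str, reference: str) -> float:
    """Reward 1.0 when the model's final answer matches the reference answer."""
    return 1.0 if model_answer.strip() == reference.strip() else 0.0


def code_reward(generated_code: str, test_snippet: str) -> float:
    """Reward 1.0 when the generated code runs and passes its test snippet."""
    namespace: dict = {}
    try:
        exec(generated_code, namespace)  # run the candidate solution
        exec(test_snippet, namespace)    # run the checks against it
        return 1.0
    except Exception:
        return 0.0


# Usage: a correct solution earns the full reward, a wrong one earns zero.
print(math_reward("42", " 42 "))                                        # 1.0
print(code_reward("def add(a, b):\n    return a + b",
                  "assert add(2, 3) == 5"))                             # 1.0
print(code_reward("def add(a, b):\n    return a - b",
                  "assert add(2, 3) == 5"))                             # 0.0
```

In a real RL loop these scalar rewards would be fed back into a policy-gradient update; the sketch only shows the verification step the article describes.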
QwQ-32B has now been open-sourced on platforms such as Hugging Face and ModelScope under the Apache 2.0 license, and users can also try the model through Qwen Chat. Commentators in the tech media have remarked on the significance of the open-source release, highlighting the potential of the RL approach and dispelling pessimistic expectations about AI model development. Alibaba also recently announced plans to invest over 380 billion yuan in cloud and AI hardware infrastructure over the next three years, exceeding its total investment of the past decade. The launch of QwQ-32B aligns closely with Alibaba's AI strategy and further solidifies its position among the top global open-source models. Looking ahead, Alibaba says it will continue to introduce larger models to further advance AI technology.
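Because the weights are public, a single chat turn can be sketched with the widely used Hugging Face `transformers` library. The repo id `Qwen/QwQ-32B` and the generation settings below are illustrative assumptions, and running the model requires substantial GPU memory, so the heavy import is deferred into the function:

```python
def chat_with_qwq(user_message: str, max_new_tokens: int = 512) -> str:
    """Illustrative sketch: one chat turn against the open QwQ-32B weights.

    Assumes the Hugging Face repo id "Qwen/QwQ-32B" and enough GPU memory;
    this is not an official client. The import is deferred so the function
    can be defined without transformers installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Qwen/QwQ-32B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )

    # Build the prompt with the tokenizer's own chat template.
    messages = [{"role": "user", "content": user_message}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        output[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True
    )
```

This follows the standard `transformers` chat workflow (tokenize with a chat template, generate, decode); any production use would add batching, sampling parameters, and error handling.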
After logging into the uSMART HK APP, tap the search icon at the top right of the screen. Enter a stock code, such as "09988.HK", to view detailed information, trading history, and trends. Tap the "Trade" button at the bottom right, select the "Buy/Sell" function, fill in the order conditions, and submit your order.
(Source: uSMART HK)