Meta's Muse Spark: A New Multimodal AI on the Path to Personal Superintelligence

Meta's Muse Spark: A Leap Toward Personal Superintelligence

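Today Meta announced Muse Spark, the first model in its ambitious Muse family. Muse Spark is not just another AI model; it represents a fundamental shift in how AI interacts with and understands the world. As a natively multimodal reasoning model, it integrates and processes diverse data types, from text to complex visual information, which makes it a versatile and powerful tool.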

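At the core of Muse Spark are robust tool-use support, which lets it interact with external systems and environments, and visual chain-of-thought processing, which enables more transparent and sophisticated problem solving. Its multi-agent orchestration coordinates multiple AI agents to solve complex tasks jointly. The release is the first concrete result of a comprehensive overhaul of Meta's AI strategy, backed by strategic investments across the entire AI stack, from fundamental research and model training to infrastructure such as the Hyperion data center. Muse Spark is available immediately through meta.ai and the Meta AI app, and a private API preview is offered to select users.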

Unlocking Advanced Reasoning with Muse Spark

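Muse Spark shows competitive performance across a wide range of AI tasks, including multimodal perception, complex reasoning, health applications, and advanced agentic workflows. Meta acknowledges that performance gaps remain in areas such as long-horizon agentic systems and complex coding workflows and continues to invest there, while the early results support the effectiveness of its new scaling stack. Contemplating mode raises Muse Spark's reasoning capability further: it orchestrates multiple AI agents that reason in parallel, which substantially improves performance on difficult tasks.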

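Contemplating mode scores 58% on Humanity's Last Exam and 38% on FrontierScience Research, placing Muse Spark in competition with the extreme reasoning modes of leading frontier models such as Gemini Deep Think and GPT Pro. The parallel approach lets the model explore multiple solution paths at once, leading to more robust and accurate results. Contemplating mode is rolling out gradually on meta.ai, opening these capabilities to users in stages and offering a glimpse of the future of personal superintelligence.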
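The parallel-reasoning idea can be sketched in a few lines of Python. This is a minimal illustration, not Meta's implementation: the source does not say how Contemplating mode combines the agents' outputs, so the majority vote below is an assumption, and `Agent`, `orchestrate`, and `make_stub` are hypothetical names. The stub agents stand in for independent reasoning runs.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical agent interface: takes a question, returns an answer string.
Agent = Callable[[str], str]


def orchestrate(question: str, agents: List[Agent]) -> str:
    """Run every agent on the same question in parallel, then majority-vote."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        answers = list(pool.map(lambda agent: agent(question), agents))
    # Assumed aggregation rule: the most common answer wins.
    winner, _count = Counter(answers).most_common(1)[0]
    return winner


def make_stub(answer: str) -> Agent:
    """Build a stub agent that always returns the same answer."""
    return lambda question: answer


# Three stub agents; two of them agree.
stub_agents = [make_stub("42"), make_stub("41"), make_stub("42")]
result = orchestrate("What is 6 * 7?", stub_agents)
print(result)
```

Because the agents run concurrently, wall-clock time is governed by the slowest agent rather than by the sum of all runs, which is the latency argument made for parallel reasoning.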

Real-World Applications: Muse Spark in Action

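Muse Spark is designed to bring personal superintelligence into daily life by understanding and assisting users in a highly personalized way. Its reasoning and multimodal capabilities open up a wide range of practical applications.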

Multimodal Interaction

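Built from the ground up for multimodal integration, Muse Spark excels at processing visual information across domains and tools. It performs strongly on visual STEM problems, entity recognition, and localization. Together, these strengths enable interactive experiences that were previously out of reach.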

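Interactive learning: Imagine turning a complex diagram into a fun mini-game, or asking Muse Spark to help troubleshoot a home appliance. It can identify the parts, create an interactive tutorial, and highlight specific regions with dynamic annotations when the user hovers over a step.
Example prompt: 'Identify the main components of the coffee machine and grinder, and create an interactive tutorial, as a simple web page, for making a latte with this machine. When I hover over a step, highlight the bounding box of the component.'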

Personalized Health Insights

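An important application of personal superintelligence is helping individuals better understand and manage their own health. To ensure factual and comprehensive responses, Meta worked with more than 1,000 physicians to curate specialized training data for Muse Spark's health reasoning. As a result, the model can do the following.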

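Explaining health information: It generates interactive displays that break down and explain health data, such as the nutritional content of different foods or the muscles activated during a particular exercise.
Personalized dietary guidance: It provides tailored dietary advice based on an individual's health profile and can visually annotate food items in an image with personalized recommendations and health scores.
Example prompt: 'I am a pescatarian with high cholesterol. Put green dots on recommended foods and red dots on foods that are not recommended. Make sure the dots do not overlap and are properly localized. When I hover over a dot, show a personal justification and a "health score" out of 10, plus calories, carbohydrates, protein, and fat. The health score number should be displayed directly above the dot even without hovering. The description shown on hover should appear on top of all other dots.'
Fitness feedback: It analyzes exercise posture, identifies the muscle groups being stretched, rates difficulty, and provides real-time feedback on form. It can also compare performance with a partner.
Example prompt: 'For both images, show which muscles are being stretched and how difficult the pose is. When I hover over a dot, tell me more about that muscle group, including how I should correct my form. I want to get better at yoga. Show me side by side with my partner and rate both of us on a scale of 1 to 10.'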

Scaling Axes: What Drives Muse Spark's Growth

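Meta's pursuit of personal superintelligence depends on scaling models predictably and efficiently. Developing Muse Spark yielded insights along three scaling axes: pretraining, reinforcement learning, and test-time reasoning.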

Pretraining Efficiency

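Pretraining is where Muse Spark acquires its core multimodal understanding, reasoning, and coding abilities. Over the past nine months, Meta rebuilt its pretraining stack from the ground up, with significant improvements in model architecture, optimization, and data curation. Together, these advances increase the capability obtained from each unit of compute. Meta evaluated the new stack by fitting scaling laws to a series of small models, and the results show a large efficiency gain: Muse Spark reaches the same capability as its predecessor, Llama 4 Maverick, with more than an order of magnitude less compute. This makes Muse Spark significantly more efficient than existing leading base models.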

Metric	Llama 4 Maverick (baseline)	Muse Spark (compute efficiency)	Improvement
Compute for Capability	X FLOPs	< 0.1X FLOPs	> 10x
Performance Equivalence	Achieved Baseline	Achieved Baseline	N/A
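A compute-efficiency comparison of this kind can be sketched as follows. The sketch assumes that capability is tracked by a loss that follows a power law in training compute, which the source does not state; the data points are synthetic and purely illustrative, and `fit_power_law` and `compute_for_loss` are hypothetical helper names. It fits one line per training stack in log-log space and asks how much compute each stack needs to reach the same target loss.

```python
import math
from typing import List, Tuple


def fit_power_law(compute: List[float], loss: List[float]) -> Tuple[float, float]:
    """Least-squares fit of log(loss) = intercept + slope * log(compute)."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(v) for v in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope


def compute_for_loss(target_loss: float, intercept: float, slope: float) -> float:
    """Invert the fitted line: compute needed to reach target_loss."""
    return math.exp((math.log(target_loss) - intercept) / slope)


# Synthetic small-model runs (illustrative only, not Meta's measurements).
# The "new" stack is constructed to match the "old" one at 10x less compute.
compute = [1e18, 1e19, 1e20, 1e21]
old_stack_loss = [3.0 * (c / 1e18) ** -0.05 for c in compute]
new_stack_loss = [3.0 * (c / 1e17) ** -0.05 for c in compute]

old_fit = fit_power_law(compute, old_stack_loss)
new_fit = fit_power_law(compute, new_stack_loss)

target = 2.5  # arbitrary target loss used for the comparison
ratio = compute_for_loss(target, *old_fit) / compute_for_loss(target, *new_fit)
print(f"compute multiplier at equal loss: {ratio:.2f}x")
```

Fitting on small models and extrapolating is what makes the comparison affordable: the efficiency multiplier is read off the fitted curves rather than measured by training two full-size models.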

Reinforcement Learning (RL) Gains

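After pretraining, reinforcement learning amplifies Muse Spark's capabilities in a scalable way. Large-scale RL is often unstable, yet Meta's new stack delivers smooth, predictable gains. Plots of the training data show log-linear growth in metrics such as pass@1 and pass@16 (at least one success in 16 attempts), which indicates that the model becomes more reliable without losing reasoning diversity. Importantly, accuracy gains on a held-out evaluation set confirm that these RL gains generalize predictably: Muse Spark improves smoothly even on tasks it never saw explicitly during training. The improvements are therefore robust and broadly applicable.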
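For readers unfamiliar with pass@k, the sketch below computes it with the combinatorial estimator commonly used in code-generation evaluations: given n attempts of which c are correct, it returns the probability that at least one of k attempts drawn without replacement succeeds. The source does not say which estimator Meta uses, so this choice is an assumption, and the per-problem counts are hypothetical.

```python
from math import comb
from typing import List


def pass_at_k(n: int, c: int, k: int) -> float:
    """Chance that at least one of k attempts, drawn from n with c correct, succeeds."""
    if n - c < k:
        # Fewer than k incorrect attempts exist, so any draw of k includes a success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: List[int], n: int, k: int) -> float:
    """Average pass@k over a set of problems with n attempts each."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Hypothetical results: four problems, 16 attempts each, number correct per problem.
correct_counts = [16, 4, 1, 0]
p1 = mean_pass_at_k(correct_counts, n=16, k=1)
p16 = mean_pass_at_k(correct_counts, n=16, k=16)
print(p1, p16)
```

pass@1 rewards a model that is right on a typical attempt, while pass@16 rewards a model that can reach a correct answer at all within 16 tries. Both rising together is the signal the text describes: reliability improving while the range of explored solutions is preserved.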

Optimizing Test-Time Reasoning

To deliver intelligence efficiently to billions of users, Muse Spark's test-time reasoning has to be optimized. Meta uses two main strategies.

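Thinking-time penalties and thought compression: During RL training, a penalty is applied as thinking time grows, pushing the model to maximize correctness while economizing on tokens. On some evaluations this produces a "phase transition": after an initial period in which the model improves by thinking longer, the length penalty induces thought compression, and Muse Spark learns to condense its reasoning and solve problems with significantly fewer tokens. After this compression, the model can extend its solutions again to reach even stronger performance.
Multi-agent orchestration: To improve test-time reasoning without a large increase in latency, Meta scales the number of parallel agents that collaborate. In standard test-time scaling a single agent thinks for longer; Muse Spark's multi-agent approach instead delivers better performance at comparable response times. This parallelism is essential for delivering complex reasoning at user-friendly speeds.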
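The thinking-time penalty in the first strategy can be illustrated with a simple shaped reward. The source does not give the form of the penalty, so the linear per-token cost below is an assumption, and `shaped_reward` and `penalty_per_token` are hypothetical names with an arbitrary illustrative value.

```python
def shaped_reward(
    is_correct: bool,
    thinking_tokens: int,
    penalty_per_token: float = 0.0001,  # assumed value, for illustration only
) -> float:
    """Correctness reward minus a linear cost on the number of thinking tokens."""
    base = 1.0 if is_correct else 0.0
    return base - penalty_per_token * thinking_tokens


# Three hypothetical rollouts on the same problem.
short_correct = shaped_reward(True, 1000)
long_correct = shaped_reward(True, 4000)
short_wrong = shaped_reward(False, 500)
print(short_correct, long_correct, short_wrong)
```

Under this shaping, a correct answer always beats an incorrect one as long as the penalty stays small relative to the correctness reward, but among correct answers the shorter one earns more. That ordering is what pushes a policy toward compressing its reasoning once it has learned to solve the problem.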

Meta's Vision: The Path to Personal Superintelligence

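The introduction of Muse Spark is a major step in Meta's long-term vision of creating personal superintelligence. By refining every layer of the AI stack, from fundamental research and infrastructure to advanced training techniques, Meta is building toward a future in which AI deeply understands and augments human capabilities. With its multimodal reasoning, tool use, and efficient scaling, Muse Spark lays a solid foundation for larger future models that bring us closer to a truly personalized, intelligent AI companion. This commitment to scalable, intelligent AI will shape how technology engages with our world for years to come and bring AI that scales to everyone closer to reality.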

Original source

https://ai.meta.com/blog/introducing-muse-spark-msl/

Frequently Asked Questions

What is Muse Spark and what makes it unique?

Muse Spark is Meta's inaugural model in the 'Muse' family, developed by Meta Superintelligence Labs. It stands out as a natively multimodal reasoning model, meaning it seamlessly integrates and processes information from various modalities like text and vision. Its unique capabilities include robust tool-use functionality, visual chain of thought for complex problem-solving, and sophisticated multi-agent orchestration, enabling it to coordinate multiple AI agents for enhanced performance. This model marks a significant step in Meta's ambitious journey towards developing personal superintelligence, aiming to understand and interact with users' worlds on a deeply personal level. Its introduction signifies a foundational shift in Meta's AI strategy, built on a ground-up overhaul of their AI efforts.

What are the core capabilities of Muse Spark, particularly 'Contemplating mode'?

Muse Spark offers competitive performance across a wide array of domains, including multimodal perception, complex reasoning tasks, health-related applications, and sophisticated agentic workflows. A standout feature is its 'Contemplating mode,' which represents a significant leap in AI reasoning. This mode orchestrates multiple AI agents to reason in parallel, allowing Muse Spark to tackle highly challenging problems with enhanced depth and accuracy. This parallel processing capability positions Muse Spark to compete with the extreme reasoning modes found in other frontier models, demonstrated by its impressive scores of 58% on 'Humanity’s Last Exam' and 38% on 'FrontierScience Research.' This mode allows for more deliberate and thorough problem-solving, crucial for achieving advanced cognitive functions.

How does Muse Spark apply its multimodal capabilities in real-world scenarios?

Muse Spark leverages its native multimodal integration to create highly interactive and practical applications. For instance, it can dynamically analyze and interact with visual information to troubleshoot home appliances, offering interactive tutorials with bounding box highlights and step-by-step guidance. In the realm of health, it can process visual data of food items or exercise routines to provide personalized insights, such as nutritional content, muscle activation, and even health scores with justifications, curated in collaboration with medical professionals. These capabilities enable Muse Spark to analyze immediate environments, support wellness, and generate engaging interactive experiences like mini-games, making AI more intuitive and helpful in daily life.

What strategic investments has Meta made to scale Muse Spark and future AI models?

To support the continued scaling of Muse Spark and its successors, Meta has undertaken strategic investments across its entire AI stack. This includes a comprehensive overhaul of its research methodologies, optimizing model training pipelines, and significantly upgrading its infrastructure, notably through the development of the Hyperion data center. A key aspect of these investments is a complete rebuild of the pretraining stack, which has led to substantial improvements in model architecture, optimization algorithms, and data curation techniques. These advancements have dramatically increased the efficiency of Meta's AI development, allowing them to extract greater capabilities from every unit of computational power and ensure predictable, efficient scaling towards the goal of personal superintelligence.

How has Meta achieved significant compute efficiency with Muse Spark compared to previous models?

Meta has achieved remarkable compute efficiency with Muse Spark through a rigorous overhaul of its pretraining stack. By implementing improvements in model architecture, optimization strategies, and data curation, they can now extract significantly more capability from the same amount of computational resources. Evaluations have shown that Muse Spark can reach the same performance levels with over an order of magnitude less compute compared to Meta's previous model, Llama 4 Maverick. This efficiency gain is not only a testament to their innovative engineering but also positions Muse Spark as a highly competitive model in terms of resource utilization against other leading base models. This breakthrough is critical for accelerating the development of larger, more powerful models.

Explain the role of Reinforcement Learning (RL) in Muse Spark's development.

Reinforcement Learning (RL) plays a crucial role in amplifying Muse Spark's capabilities post-pretraining. Despite the inherent instability often associated with large-scale RL, Meta's new stack ensures smooth and predictable gains. RL improves the model's reliability without compromising reasoning diversity, as evidenced by log-linear growth in pass@1 and pass@16 metrics on training data. Crucially, accuracy gains on a held-out evaluation set show that these improvements generalize to unseen tasks, demonstrating that the gains from RL are not merely rote memorization but true capability enhancements. This predictable scaling of RL compute allows Muse Spark to continuously improve its ability to perform complex tasks, ensuring the model remains adaptable and performs well beyond its initial training scope.

What is 'thought compression' and 'multi-agent orchestration' in the context of Muse Spark's test-time reasoning?

In Muse Spark's test-time reasoning, 'thought compression' refers to the model's ability to condense its reasoning process to solve problems using significantly fewer tokens, driven by 'thinking time penalties' during RL training. Initially, the model might 'think longer' to improve, but as penalties increase, it learns to achieve similar or better results more concisely. After this compression phase, it can then extend its solutions for even stronger performance. 'Multi-agent orchestration' is a technique to scale test-time reasoning without drastically increasing latency. Instead of a single agent thinking longer, multiple parallel agents collaborate to solve complex problems, allowing Muse Spark to achieve superior performance with comparable response times. Both methods aim to maximize intelligence per token and per unit of time, making the AI efficient and responsive.

How can users access Muse Spark, and what are Meta's future plans for it?

Muse Spark is available today to the general public via [meta.ai](https://meta.ai/) and the Meta AI app. Additionally, Meta is extending access to select users through a private API preview, allowing developers and researchers to integrate and experiment with its advanced capabilities. As the first model in the Muse family, Muse Spark represents an initial step on Meta's ambitious scaling ladder towards achieving 'personal superintelligence.' Meta continues to invest heavily in developing larger, more capable models building upon Spark's foundation, with ongoing research focused on addressing current performance gaps in areas like long-horizon agentic systems and complex coding workflows. 'Contemplating mode' is also rolling out gradually on meta.ai.