Comparing changes

Implement complete Qwen3-Next architecture integration with NeMo Megatron framework:

- Add hybrid attention mechanism supporting both full and linear attention layers
- Implement gated delta rule for efficient linear attention with O(n) complexity
- Create modular layer specifications with dynamic attention type selection
- Add custom transformer layer supporting mixed attention mechanisms
- Integrate with existing NeMo model infrastructure via megatron_gpt_model.py

Key features:

- Linear attention with chunk-based and recurrent processing algorithms
- L2 normalization and clipping mechanisms for numerical stability
- Full tensor parallelism support for distributed training
- Memory-efficient processing for long sequences (4K+ tokens)
- Configurable layer type patterns for optimal performance/efficiency balance

Architecture allows 3B total parameters with ~1B active during inference, providing substantial efficiency gains while maintaining model expressivity.

Files added:

- nemo/collections/nlp/models/language_modeling/megatron/qwen3_next/__init__.py
- nemo/collections/nlp/models/language_modeling/megatron/qwen3_next/qwen3_next_modules.py
- nemo/collections/nlp/models/language_modeling/megatron/qwen3_next/qwen3_next_spec.py
- nemo/collections/nlp/models/language_modeling/megatron/qwen3_next/qwen3_next_layer.py
- nemo/collections/nlp/models/language_modeling/megatron/qwen3_next/README.md

Files modified:

- nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py
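For readers unfamiliar with the gated delta rule named in this commit, the recurrent form can be sketched in plain Python. This is an illustrative sketch, not the code in qwen3_next_modules.py: the function names, the single-head list-of-lists layout, and the scalar gates `alpha` (decay) and `beta` (write strength) are all assumptions. It uses the commonly published update S_t = alpha_t * S_{t-1} (I - beta_t k_t k_t^T) + beta_t v_t k_t^T with L2-normalized queries and keys.

```python
import math


def l2_normalize(vec, eps=1e-6):
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / (norm + eps) for x in vec]


def gated_delta_step(state, q, k, v, alpha, beta):
    """Apply one recurrent step and return (new_state, output).

    state: d_v x d_k matrix (list of rows) acting as a key -> value memory.
    alpha: decay gate in (0, 1]; beta: write strength in [0, 1].
    """
    q, k = l2_normalize(q), l2_normalize(k)
    new_state = []
    for row, v_i in zip(state, v):
        # What the memory currently returns for key k (one value coordinate).
        pred = sum(s * k_j for s, k_j in zip(row, k))
        # Erase the old prediction along k, decay the row, write the new value.
        new_state.append(
            [alpha * (s - beta * pred * k_j) + beta * v_i * k_j
             for s, k_j in zip(row, k)]
        )
    # Read out with the query: o = S q.
    output = [sum(s * q_j for s, q_j in zip(row, q)) for row in new_state]
    return new_state, output


def gated_delta_scan(qs, ks, vs, alphas, betas):
    """Run the recurrence over a sequence; cost grows linearly with length."""
    d_k, d_v = len(ks[0]), len(vs[0])
    state = [[0.0] * d_k for _ in range(d_v)]
    outputs = []
    for q, k, v, a, b in zip(qs, ks, vs, alphas, betas):
        state, out = gated_delta_step(state, q, k, v, a, b)
        outputs.append(out)
    return outputs
```

Because the state has a fixed size of d_v x d_k, each token costs the same amount of work regardless of position, which is where the O(n) claim comes from. With `alpha = 1` and `beta = 1`, writing a second value under the same key overwrites the first, which is the behavior that distinguishes the delta rule from purely additive linear attention.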

Add comprehensive FineWeb dataset integration for NeMo pre-training:

- Create preprocessing script to convert HuggingFace FineWeb to NeMo binary format
- Implement streaming dataset support for large-scale web data processing
- Add optimized data loading with configurable batch sizes and workers
- Support for subset selection (sample-10BT, sample-350BT) for experimentation
- Integrate Qwen2 tokenizer for consistent text processing

Key features:

- Efficient streaming from HuggingFace datasets hub
- Automatic text filtering and length validation
- Parallel processing with configurable worker count
- Memory-efficient JSONL intermediate format
- Compatible with NeMo's standard indexed dataset pipeline

Files added:

- scripts/nlp_language_modeling/preprocess_fineweb_for_qwen3_next.py
- nemo/collections/nlp/data/language_modeling/megatron/hf_streaming_dataset.py
- nemo/collections/nlp/data/language_modeling/megatron/gpt_fineweb_dataset.py

Enables training on web-scale datasets with proper tokenization and formatting for foundation model pre-training workflows.
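The filtering and JSONL steps described in this commit might look roughly like the sketch below. It is not taken from preprocess_fineweb_for_qwen3_next.py: the function name, the `min_chars` and `max_chars` thresholds, and the assumption that each record carries its content under a `"text"` key are illustrative. An in-memory list stands in for the streamed dataset so the example runs without network access.

```python
import io
import json


def write_filtered_jsonl(records, out, min_chars=1, max_chars=100_000):
    """Write records with acceptable text length as one JSON object per line.

    records: any iterable of dicts, e.g. a streamed dataset iterator.
    Returns (kept, skipped) counts.
    """
    kept = skipped = 0
    for record in records:
        text = record.get("text")
        # Skip records whose text is missing or not a string.
        if not isinstance(text, str):
            skipped += 1
            continue
        text = text.strip()
        # Skip records outside the configured length bounds.
        if not (min_chars <= len(text) <= max_chars):
            skipped += 1
            continue
        out.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")
        kept += 1
    return kept, skipped


# Usage with an in-memory stand-in for the streamed dataset.
sample = [{"text": "  hello web  "}, {"text": ""}, {"text": None}, {"text": "x" * 50}]
buffer = io.StringIO()
counts = write_filtered_jsonl(sample, buffer, min_chars=5, max_chars=20)
```

Processing one record at a time keeps memory use flat no matter how large the source is, which is the point of pairing streaming with a line-oriented intermediate format.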

Add production-ready training setup for 3B parameter Qwen3-Next model:

- Complete training configuration for 8-GPU setup with optimal parallelism
- Hybrid attention pattern optimized for 3B total/1B active parameters
- BFloat16 mixed precision and gradient accumulation for memory efficiency
- Integration with preprocessed FineWeb dataset using NeMo binary format
- Comprehensive documentation with step-by-step training guide

Training configuration features:

- Tensor parallelism across 2 GPUs per model replica
- Sequence parallel processing for 4K sequence lengths
- Activation checkpointing for memory optimization
- Cosine annealing scheduler with appropriate warmup
- Validation and checkpointing with best model selection

Documentation includes:

- Complete preprocessing pipeline instructions
- Training command examples and parameter explanations
- Performance tuning guidelines for different GPU configurations
- Troubleshooting section for common training issues
- Architecture details and scaling recommendations

Files added:

- qwen3_next_3b_fineweb_training.yaml
- QWEN3_NEXT_TRAINING_GUIDE.md

Ready for production training of hybrid attention language models on web-scale data.
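The parallelism arithmetic implied by this configuration (8 GPUs, tensor parallelism of 2) can be checked with a short sketch. The helper names are hypothetical, and the global batch size of 256 and micro batch size of 4 are made-up values chosen only to illustrate the calculation; the actual settings live in qwen3_next_3b_fineweb_training.yaml and are not shown in this commit message.

```python
def data_parallel_size(world_size, tensor_parallel, pipeline_parallel=1):
    """Return how many model replicas fit on the available GPUs."""
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError("world_size must be divisible by TP * PP")
    return world_size // model_parallel


def accumulation_steps(global_batch, micro_batch, data_parallel):
    """Return the gradient accumulation steps needed per optimizer update."""
    per_step = micro_batch * data_parallel
    if global_batch % per_step != 0:
        raise ValueError("global_batch must be divisible by micro_batch * DP")
    return global_batch // per_step


# 8 GPUs with tensor parallelism of 2 (from the commit message).
dp = data_parallel_size(world_size=8, tensor_parallel=2)
# Batch sizes below are hypothetical, for illustration only.
steps = accumulation_steps(global_batch=256, micro_batch=4, data_parallel=dp)
```

With these numbers the setup holds 4 model replicas, and each optimizer update accumulates gradients over 16 micro-batches per replica. Changing the GPU count or tensor parallel size changes the replica count, so the global batch size must stay divisible by the new product.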

Automatic merge of Dependabot PR

Automatic merge of Dependabot PR

Move the .yml and .yaml files in .github/workflows/ to .github/workflows-disabled/ to neutralize them ahead of the org-wide GitHub Actions enablement. Restore a workflow via the re-enable-workflow process when it is intentionally re-enabled.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Automatic merge of Dependabot PR

Automatic merge of Dependabot PR

Automatic merge of Dependabot PR


Commits on Sep 25, 2025

Commits on Sep 30, 2025

Commits on Oct 1, 2025

Commits on Oct 8, 2025

Commits on Oct 21, 2025

Commits on Mar 3, 2026

Commits on Mar 26, 2026

Commits on Apr 8, 2026

Commits on Apr 15, 2026

Commits on Apr 21, 2026

Commits on Jun 3, 2026

Commits on Jun 4, 2026

Commits on Jun 10, 2026

Commits on Jun 16, 2026

Commits on Jun 17, 2026
