pytorch
PyTorch deep learning framework: tensors, autograd, and nn modules.
Technical explanation of how PyTorch autograd records the dynamic computation graph during the forward pass, and how backward() traverses it to accumulate gradients into linear layer weights and biases for SGD.
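A minimal sketch of that mechanism: the forward pass through an nn.Linear records the graph, backward() fills .grad on the weight and bias, and an SGD step consumes them. The layer sizes and learning rate are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(in_features=4, out_features=2)
opt = torch.optim.SGD(layer.parameters(), lr=0.1)

x = torch.randn(8, 4)           # batch of 8 samples
target = torch.randn(8, 2)

y = layer(x)                    # forward pass: autograd records the graph here
loss = nn.functional.mse_loss(y, target)

loss.backward()                 # traverse the graph; gradients ACCUMULATE into .grad
print(layer.weight.grad.shape)  # torch.Size([2, 4]) -- same shape as the weight
print(layer.bias.grad.shape)    # torch.Size([2])

# A second backward() without zeroing would ADD to the existing .grad,
# which is why training loops call opt.zero_grad() every iteration.
opt.step()                      # SGD update: p -= lr * p.grad
opt.zero_grad()                 # reset accumulated gradients for the next step
```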
Fix a PyTorch Dataset and DataLoader pipeline for multivariate time-series preprocessing from CSV: ensure (B, V, L) tensor shapes, avoid data leakage by fitting scalers on the training split only, and validate sliding windows for Mamba models.
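A hedged sketch of such a pipeline, assuming a hypothetical series.csv with one timestamp column plus V feature columns; the window/horizon lengths and the 80/20 chronological split are illustrative choices, not requirements.

```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader

class SlidingWindowDataset(Dataset):
    """Yields (V, L) windows; DataLoader batching adds B -> (B, V, L)."""
    def __init__(self, values: np.ndarray, window: int, horizon: int,
                 mean: np.ndarray, std: np.ndarray):
        # values: (T, V). mean/std must be fitted on the TRAINING split only,
        # so validation/test statistics never leak into the scaling.
        self.values = (values - mean) / std
        self.window, self.horizon = window, horizon

    def __len__(self):
        return len(self.values) - self.window - self.horizon + 1

    def __getitem__(self, i):
        x = self.values[i : i + self.window]                              # (L, V)
        y = self.values[i + self.window : i + self.window + self.horizon]
        # Transpose to channels-first (V, L), the layout assumed here for
        # sequence models that expect (B, V, L) input.
        return (torch.from_numpy(x.T).float(),
                torch.from_numpy(y.T).float())

# Hypothetical CSV: a "timestamp" column plus V feature columns.
df = pd.read_csv("series.csv", parse_dates=["timestamp"]).sort_values("timestamp")
values = df.drop(columns=["timestamp"]).to_numpy()        # (T, V)

split = int(len(values) * 0.8)                            # chronological split
mean = values[:split].mean(axis=0)
std = values[:split].std(axis=0) + 1e-8                   # guard zero variance

train_ds = SlidingWindowDataset(values[:split], window=96, horizon=24,
                                mean=mean, std=std)
train_dl = DataLoader(train_ds, batch_size=32, shuffle=True, drop_last=True)

xb, yb = next(iter(train_dl))
assert xb.shape == (32, values.shape[1], 96)              # (B, V, L)
```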
DGL compatibility with PyTorch 2.8 and RTX 50-series GPUs: recommended DGL/PyTorch/CUDA combinations, Docker images, prebuilt wheels, or building from source for Blackwell support.
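Rather than guessing at wheel URLs, a small runtime probe can confirm whether the installed PyTorch and DGL builds actually target the card; the note that RTX 50-series reports compute capability 12.x (sm_120) is an assumption to verify against NVIDIA's documentation.

```python
import torch

# Probe whether this PyTorch build can target the installed GPU.
# Blackwell consumer cards are assumed to need sm_120 kernels (or PTX
# for a compatible arch) baked into the wheel.
print("torch:", torch.__version__, "| CUDA runtime:", torch.version.cuda)
print("compiled arches:", torch.cuda.get_arch_list())

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(f"device: {torch.cuda.get_device_name(0)} (sm_{major}{minor})")

try:
    import dgl
    print("dgl:", dgl.__version__)
    # Quick smoke test: build a tiny graph and move it to the GPU.
    # Fails with an arch mismatch if the DGL build lacks Blackwell kernels.
    g = dgl.graph(([0, 1], [1, 2]))
    g = g.to("cuda")
    print("DGL CUDA smoke test passed")
except Exception as e:  # ImportError or a CUDA arch mismatch at runtime
    print("DGL/CUDA check failed:", e)
```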
Diagnose out-of-memory (OOM) errors when training Qwen3-0.6B (16 attention heads) on an A100 48GB at 32k sequence length with FlashAttention 2: correct attention-matrix memory estimates, quick fixes such as windowed attention and ZeRO-3 offload, and scaling strategies for long sequences.
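A back-of-envelope calculator for the competing memory terms; the layer count, hidden size, and activation multiplier below are assumptions standing in for the real Qwen3-0.6B config (check its config.json), not published values.

```python
# Rough memory estimate; model dims marked "assumed" are placeholders.
GiB = 1024**3

params     = 0.6e9
layers     = 28        # assumed
heads      = 16
seq_len    = 32_768
batch      = 1
hidden     = 1024      # assumed
bytes_fp16 = 2

# Weights + Adam states (fp16 weights, fp32 master copy + 2 moments):
# roughly (2 + 4 + 4 + 4) = 14 bytes per parameter.
states = params * 14 / GiB
print(f"weights + optimizer states: ~{states:.1f} GiB")        # ~7.8 GiB

# Naive attention would materialize an (L x L) score matrix per head:
naive_scores = batch * heads * seq_len**2 * bytes_fp16 / GiB
print(f"naive attention scores, ONE layer: ~{naive_scores:.0f} GiB")  # ~32 GiB

# FlashAttention 2 never materializes that matrix; its memory is O(L).
# What remains at 32k are activations saved for backward, roughly a
# handful of (B, L, hidden) fp16 tensors per layer (multiplier assumed):
act_per_layer = batch * seq_len * hidden * bytes_fp16 * 8 / GiB
print(f"activations, all layers: ~{act_per_layer * layers:.1f} GiB")  # ~14 GiB
# When these terms exceed device memory, the usual levers are gradient
# checkpointing, ZeRO-3 offload, or windowed/shorter attention.
```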