Commit History

Author SHA1 Message Date
  Mayank Mishra f1d2a15b50 better eval sampler (#2907) 1 year ago
  Olatunji Ruwase 81b4d5db06 Make z3 respect comm dtype (#2807) 1 year ago
  Conglong Li 7c99def0f0 Data efficiency library update (#2866) 1 year ago
  Stas Bekman d323abd80f remove outdated comment (#2786) 1 year ago
  Logan Adams 86477538a6 Fix hardcoded instances to fp16 in optimizer creation log messages to the correct dtype. (#2743) 1 year ago
  Bing Xie 8d3b42c230 Bing/formatting correction (#2764) 1 year ago
  Jeff Rasley a60e31a7f2 [zero] remove misleading dtype log (#2732) 1 year ago
  Dashiell Stander d4bfae415d Fix autotuning so that it records Floating Point Operations per second, not microsecond (#2711) 1 year ago
  Ma, Guokai 98cc35b6a8 Abstract accelerator (step 3) (#2677) 1 year ago
  Joe Mayer 4be8df721a fixing optimizer sanity check (#2742) 1 year ago
  Ammar Ahmad Awan 867da307d0 Inference Refactor (replace_with_policy, model_implementations) (#2554) 1 year ago
  Joe Mayer 8d87c89e42 BF16 optimizer for BF16+ZeRO Stage 1 (#2706) 1 year ago
  Jeff Rasley e4ba722297 non-MoE stage 1 requires CG disabled (#2703) 1 year ago
  Alexander Jipa 0f0e38c520 fixes #2498 (#2603) 1 year ago
  Conglong Li ef869377e9 DeepSpeed Data Efficiency Library (#2585) 1 year ago
  Ma, Guokai 06938835eb Support fp32 gradaccum for bf16 model (#2566) 1 year ago
  Cheng Li abe4fc6b55 encoded ds config into command line argument when launching child processes in autotuning (#2524) 1 year ago
  ShijieZZZZ 340fc0cf19 Report progress at gradient accumulation boundary (#2553) 1 year ago
  Joe Mayer 21c2802964 Adding Gradient Accumulation Data Type Config (#2512) 1 year ago
  Olatunji Ruwase ee39187d8f Make bf16_optimizer work for non pipeline (#2470) 1 year ago
  Joe Mayer 7d113633e4 Fix Bug #2319 (#2438) 2 years ago
  Adam Moody b8fb9c3f1a parallelize writing of layer checkpoint files across data parallel instances (#1419) 2 years ago
  Olatunji Ruwase 799120e7e4 Universal checkpoint for zero stage 1 (#2284) 2 years ago
  Joe Mayer 906b4a025f Fixing bug 2361 (#2410) 2 years ago
  Alexander Jipa cfead55132 fixes #2389 (#2411) 2 years ago
  Matt Smith b609a29412 fix an exception when recursively casting dicts to fp16 (#2370) 2 years ago
  叶志晟 80f94c10c5 fix #2240: wrong time unit in flops_profiler (#2241) 2 years ago
  Olatunji Ruwase cb5e05fe55 Correctly detect CPU optimizer usage (#2257) 2 years ago
  Siddharth Singh b288cf1b9b Enable contiguous gradients with Z1+MoE (#2250) 2 years ago
  Olatunji Ruwase 217338beb6 Refactor dist tests: Checkpointing (#2202) 2 years ago