PySlowFast Model Zoo and Baselines

Kinetics 400 and 600

architecture	size	crops x clips	frame length x sample rate	top1	top5	model	config	dataset
C2D	R50	3 x 10	8 x 8	67.2	87.8	`link`	Kinetics/c2/C2D_NOPOOL_8x8_R50	K400
I3D	R50	3 x 10	8 x 8	73.5	90.8	`link`	Kinetics/c2/I3D_8x8_R50	K400
I3D NLN	R50	3 x 10	8 x 8	74.0	91.1	`link`	Kinetics/c2/I3D_NLN_8x8_R50	K400
Slow	R50	3 x 10	4 x 16	72.7	90.3	`link`	Kinetics/c2/SLOW_4x16_R50	K400
Slow	R50	3 x 10	8 x 8	74.8	91.6	`link`	Kinetics/c2/SLOW_8x8_R50	K400
SlowFast	R50	3 x 10	4 x 16	75.6	92.0	`link`	Kinetics/c2/SLOWFAST_4x16_R50	K400
SlowFast	R50	3 x 10	8 x 8	77.0	92.6	`link`	Kinetics/c2/SLOWFAST_8x8_R50	K400
MViTv1	B-Conv	1 x 5	16 x 4	78.4	93.5	`link`	Kinetics/MVIT_B_16x4_CONV	K400
rev-MViT	B-Conv	1 x 5	16 x 4	78.4	93.4	`link`	Kinetics/REV_MVIT_B_16x4_CONV	K400
MViTv1	B-Conv	1 x 5	32 x 3	80.4	94.8	`link`	Kinetics/MVIT_B_32x3_CONV	K400
MViTv1	B-Conv	1 x 5	32 x 3	83.9	96.5	`link`	Kinetics/MVIT_B_32x3_CONV_K600	K600
MViTv2	S	1 x 5	16 x 4	81.0	94.6	`link`	Kinetics/MVITv2_S_16x4	K400
MViTv2	B	1 x 5	32 x 3	82.9	95.7	`link`	Kinetics/MVITv2_B_32x3	K400
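
The two "a x b" columns in the table above can be turned into numbers that are easier to compare across rows. The following is a minimal sketch, assuming that "crops x clips" means spatial crops times temporal clips per video (so `3 x 10` is 30 test views) and that "frame length x sample rate" means frames per clip times the stride between sampled frames (so the clip spans roughly that many raw frames). The names `EvalSetting`, `parse_pair`, and `parse_setting` are hypothetical helpers for illustration and are not part of PySlowFast.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EvalSetting:
    """One row's evaluation setting (hypothetical helper, not a PySlowFast class)."""

    crops: int        # spatial crops per clip (assumed meaning of the column)
    clips: int        # temporal clips per video (assumed meaning of the column)
    frames: int       # frames fed to the network per clip
    sample_rate: int  # stride between sampled frames

    @property
    def views(self) -> int:
        # Total number of views evaluated per video.
        return self.crops * self.clips

    @property
    def temporal_span(self) -> int:
        # Approximate number of raw video frames covered by one clip.
        return self.frames * self.sample_rate


def parse_pair(cell: str) -> Tuple[int, int]:
    """Parse a table cell such as '3 x 10' into (3, 10)."""
    left, right = cell.split("x")
    return int(left), int(right)


def parse_setting(crops_x_clips: str, frames_x_rate: str) -> EvalSetting:
    crops, clips = parse_pair(crops_x_clips)
    frames, rate = parse_pair(frames_x_rate)
    return EvalSetting(crops, clips, frames, rate)


# Values copied from the SlowFast R50 8 x 8 and MViTv1 B-Conv 32 x 3 rows.
slowfast = parse_setting("3 x 10", "8 x 8")
mvit = parse_setting("1 x 5", "32 x 3")
print(slowfast.views, slowfast.temporal_span)
print(mvit.views, mvit.temporal_span)
```

Under these assumptions the SlowFast rows are evaluated with 30 views per video and the MViT rows with 5, which is worth keeping in mind when comparing accuracy against inference cost.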

X3D models (details in projects/x3d)

architecture	size	pretrain	frame length x sample rate	top1 10-view	top1 30-view	parameters (M)	FLOPs (G)	model	config
X3D	XS	-	4 x 12	68.7	69.5	3.8	0.60	`link`	Kinetics/X3D_XS
X3D	S	-	13 x 6	73.1	73.5	3.8	1.96	`link`	Kinetics/X3D_S
X3D	M	-	16 x 5	75.1	76.2	3.8	4.73	`link`	Kinetics/X3D_M
X3D	L	-	16 x 5	76.9	77.5	6.2	18.37	`link`	Kinetics/X3D_L

AVA

architecture	size	pretrain	frame length x sample rate	mAP	AVA version	model
Slow	R50	Kinetics 400	4 x 16	19.5	2.2	`link`
SlowFast	R101	Kinetics 600	8 x 8	28.2	2.1	`link`
SlowFast	R101	Kinetics 600	8 x 8	29.1	2.2	`link`
SlowFast	R101	Kinetics 600	16 x 8	29.4	2.2	`link`

Multigrid Training

Update June 2020: We provide reimplemented models from the paper "A Multigrid Method for Efficiently Training Video Models". The multigrid method trains about 3-6x faster than standard training on multiple datasets. See projects/multigrid for more information. Models, results, and example config files are listed below.

Kinetics:

architecture	size	pretrain	frame length x sample rate	training	top1	top5	model	config
SlowFast	R50	-	8 x 8	Standard	76.8	92.7	`link`	Kinetics/SLOWFAST_8x8_R50_stepwise
SlowFast	R50	-	8 x 8	Multigrid	76.6	92.7	`link`	Kinetics/SLOWFAST_8x8_R50_stepwise_multigrid

(These models use a stepwise learning rate schedule.)

Something-Something V2:

architecture	size	pretrain	frame length x sample rate	training	top1	top5	model	config
SlowFast	R50	Kinetics 400	16 x 8	Standard	63.0	88.5	`link`	SSv2/SLOWFAST_16x8_R50
SlowFast	R50	Kinetics 400	16 x 8	Multigrid	63.5	88.7	`link`	SSv2/SLOWFAST_16x8_R50_multigrid

Charades

architecture	size	pretrain	frame length x sample rate	training	mAP	model	config
SlowFast	R50	Kinetics 400	16 x 8	Standard	38.9	`link`	Charades/SLOWFAST_16x8_R50
SlowFast	R50	Kinetics 400	16 x 8	Multigrid	38.6	`link`	Charades/SLOWFAST_16x8_R50_multigrid
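
To make the accuracy trade-off of multigrid training easier to read, the headline metric of each Standard/Multigrid pair can be compared directly. This is a small sketch using only the numbers from the three tables above; the function name `multigrid_deltas` is a hypothetical helper for illustration.

```python
# (dataset, metric, standard, multigrid), copied from the tables above.
RESULTS = [
    ("Kinetics", "top1", 76.8, 76.6),
    ("Something-Something V2", "top1", 63.0, 63.5),
    ("Charades", "mAP", 38.9, 38.6),
]


def multigrid_deltas(results):
    """Return {dataset: multigrid minus standard}, rounded to one decimal."""
    return {
        dataset: round(multigrid - standard, 1)
        for dataset, _metric, standard, multigrid in results
    }


deltas = multigrid_deltas(RESULTS)
for dataset, delta in deltas.items():
    # A positive value means the multigrid model scored higher.
    print(f"{dataset}: {delta:+.1f}")
```

On these three datasets the multigrid models land within half a point of the standard-schedule models in either direction.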

ImageNet

We also release ImageNet-pretrained models for cases where fine-tuning from ImageNet is preferred. The reported accuracy is obtained by center-crop testing on the validation set.

architecture	size	top1	top5	model	config
ResNet	R50	76.4	93.2	`link`	ImageNet/RES_R50
MVIT	B-16-Conv	82.9	96.3	`link`	ImageNet/MVIT_B_16_CONV
rev-VIT	Small	79.9	94.9	`link`	ImageNet/REV_VIT_S.yaml
rev-VIT	Base	81.8	95.6	`link`	ImageNet/REV_VIT_B.yaml
rev-MVIT	Base	82.9*	96.3	`link`	ImageNet/REV_MVIT_B_16_CONV.yaml

* Please refer to the Reversible Model Zoo.
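
The config column is written inconsistently across the tables: most entries omit the file extension, while the reversible models include `.yaml`. The following is a minimal sketch of normalizing an entry into a file path, assuming the configs are YAML files stored under a top-level `configs/` directory (an assumption, since this page does not state the location). `CONFIG_ROOT` and `config_path` are hypothetical names for illustration.

```python
from pathlib import PurePosixPath

# Assumed location of the config files; adjust if your checkout differs.
CONFIG_ROOT = PurePosixPath("configs")


def config_path(entry: str) -> str:
    """Map a config-column entry to a path, appending '.yaml' when it is missing."""
    path = CONFIG_ROOT / entry.strip()
    if path.suffix != ".yaml":
        path = path.with_name(path.name + ".yaml")
    return str(path)


print(config_path("ImageNet/RES_R50"))
print(config_path("ImageNet/REV_VIT_S.yaml"))
print(config_path("Kinetics/c2/SLOWFAST_8x8_R50"))
```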

PyTorchVideo

We support and benchmark PyTorchVideo models and datasets in PySlowFast. See projects/pytorchvideo for more information about the PyTorchVideo Model Zoo.
