Process dataset

We use Talking-Head-1K-Hour as the example dataset.

Download and crop the talking-person video clips

Resample and resize video clips to 512x512 resolution and 25 FPS

  • You can use the example code in data_gen/utils/process_video/resample_video_to_25fps_resize_to_512.py
  • It will generate processed video clips in /home/xxx/TH1KH_512/video/*.mp4
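
As a rough illustration of what this step does, the sketch below builds an ffmpeg command that resamples a clip to 25 FPS and scales it to 512x512. It is not the repository script. It assumes `ffmpeg` is installed and on `PATH`; the function names are hypothetical, and stretching the frame to a square (rather than cropping or padding) is an assumption.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(src, dst, fps=25, size=512):
    """Build an ffmpeg command that resamples to `fps` and scales to size x size."""
    # fps=... resamples the frame rate; scale=W:H resizes (stretching if not square).
    vf = f"fps={fps},scale={size}:{size}"
    return ["ffmpeg", "-y", "-i", str(src), "-vf", vf, str(dst)]


def process_dir(src_dir, dst_dir):
    """Run the command for every .mp4 in src_dir (hypothetical helper)."""
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(src_dir).glob("*.mp4")):
        # check=True raises CalledProcessError if ffmpeg exits with an error.
        subprocess.run(build_ffmpeg_cmd(src, dst_dir / src.name), check=True)
```

For example, `process_dir("/home/xxx/raw_clips", "/home/xxx/TH1KH_512/video")` would write one processed clip per input file, where `raw_clips` is a placeholder for wherever the cropped clips were saved.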

Extract segment images

  • You can use the example code in data_gen/utils/process_video/extract_segment_imgs.py
  • It will generate segment images in /home/xxx/TH1KH_512/{gt_imgs, head_imgs, inpaint_torso_imgs, com_imgs}/*
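
A quick way to confirm that the four output directories were populated is to count their entries. This is a minimal sketch using only the directory names listed above; the function name and the convention of returning -1 for a missing directory are assumptions for illustration.

```python
from pathlib import Path

# Output directories named in this step.
SEGMENT_DIRS = ("gt_imgs", "head_imgs", "inpaint_torso_imgs", "com_imgs")


def count_segment_outputs(root):
    """Return {dir_name: number of entries}, with -1 for a missing directory."""
    root = Path(root)
    counts = {}
    for name in SEGMENT_DIRS:
        d = root / name
        counts[name] = sum(1 for _ in d.iterdir()) if d.is_dir() else -1
    return counts
```

Calling `count_segment_outputs("/home/xxx/TH1KH_512")` after the script finishes should report a non-negative count for each directory; a -1 or 0 suggests the step did not complete.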

Extract 2D facial landmarks

  • You can use the example code in data_gen/utils/process_video/extract_lm2d.py
  • It will generate 2d landmarks in /home/xxx/TH1KH_512/lms_2d/*_lms_2d.npy

Extract 3DMM coefficients

  • You can use the example code in data_gen/utils/process_video/fit_3dmm_landmark.py
  • It will generate 3dmm coefficients in /home/xxx/TH1KH_512/coeff_fit_mp/*_coeff_fit_mp.npy

Extract audio features

  • You can use the example code in data_gen/utils/process_audio/extract_mel_f0.py
  • It will generate raw WAV files in /home/xxx/TH1KH_512/audio/*.wav and mel-spectrogram/F0 features in /home/xxx/TH1KH_512/mel_f0/*_mel_f0.npy
  • You can use the example code in data_gen/utils/process_audio/extract_hubert.py
  • It will generate hubert in /home/xxx/TH1KH_512/hubert/*_hubert.npy
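
For orientation, the WAV files in this step are audio tracks pulled out of the video clips. The sketch below shows one way to build such an ffmpeg command; it is not taken from extract_mel_f0.py. The 16 kHz mono settings are an assumption (16 kHz is the rate HuBERT models are typically trained on), the function names are hypothetical, and the mapping from video name to WAV name is assumed.

```python
from pathlib import Path


def build_wav_cmd(video_path, wav_path, sample_rate=16000):
    """Build an ffmpeg command that extracts a mono WAV track from a video."""
    return [
        "ffmpeg", "-y",
        "-i", str(video_path),
        "-vn",                    # drop the video stream
        "-ac", "1",               # mono (assumption)
        "-ar", str(sample_rate),  # sample rate in Hz (assumption)
        str(wav_path),
    ]


def wav_path_for(video_path, audio_dir):
    """Map video/<name>.mp4 to audio/<name>.wav (naming is an assumption)."""
    return Path(audio_dir) / (Path(video_path).stem + ".wav")
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. If the repository script uses different audio settings, those should take precedence.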

Binarize the dataset

  • You can use the example code in data_gen/runs/binarizer_th1kh.py
  • You will see a binarized dataset at data/binary/th1kh
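
Before running the binarizer, it can help to confirm that every clip has all of its per-clip outputs. The sketch below checks the file patterns listed in the earlier steps. It assumes the `*` in each pattern is the video file name without its extension, which the steps above do not state explicitly; the function name is hypothetical.

```python
from pathlib import Path

# Sub-directory and file suffix for each per-clip output named above.
FEATURES = {
    "lms_2d": "_lms_2d.npy",
    "coeff_fit_mp": "_coeff_fit_mp.npy",
    "audio": ".wav",
    "mel_f0": "_mel_f0.npy",
    "hubert": "_hubert.npy",
}


def missing_features(root):
    """Return {clip_name: [feature dirs lacking a file]} for incomplete clips."""
    root = Path(root)
    report = {}
    for video in sorted((root / "video").glob("*.mp4")):
        missing = [
            sub for sub, suffix in FEATURES.items()
            if not (root / sub / (video.stem + suffix)).is_file()
        ]
        if missing:
            report[video.stem] = missing
    return report
```

An empty result from `missing_features("/home/xxx/TH1KH_512")` means every clip in video/ has a matching file in each feature directory.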