Our image-warping-based method implements a highly robust motion transfer model, the Filter Deform Attention Generative Adversarial Network (FDA-GAN), which can transfer complex human motion to video using only a small number of images of the target person. We replace traditional 2D pose estimation with a 3D pose-shape estimator, and we design a new attention mechanism fused into the GAN to build a network that extracts image features well and generates human motion videos.

The FDA-GAN pipeline has three stages. First, the 3D pose-shape estimator recovers the pose and shape of the person. Second, we analyze the pose and shape to obtain the foreground and background images of the person, as well as the motion-transformation flows. Finally, the generative adversarial network fuses the foreground, background, and texture features into images of the new person. As shown in the experimental section, our method outperforms recent methods both overall and on the individual evaluation metrics.
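Since the method is warping-based, its core operation is resampling source pixels along a dense motion flow. The sketch below illustrates only that generic warping step with PyTorch's `grid_sample`; it is a minimal, self-contained illustration of the technique, not code from this repository.

```python
# Minimal illustration of flow-based image warping, the core operation of a
# warping-based transfer method. Illustrative only; not this repository's code.
import torch
import torch.nn.functional as F

def warp(source, grid):
    """Resample a source image (N, C, H, W) at grid locations (N, H, W, 2)
    given in normalized [-1, 1] coordinates."""
    return F.grid_sample(source, grid, align_corners=True)

# Toy check: warping with the identity grid returns the image unchanged.
n, c, h, w = 1, 3, 64, 64
source = torch.rand(n, c, h, w)
theta = torch.tensor([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]])  # identity affine
identity_grid = F.affine_grid(theta, size=(n, c, h, w), align_corners=True)
assert torch.allclose(warp(source, identity_grid), source, atol=1e-5)
```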
The iPER dataset is available at: https://svip-lab.github.io/dataset/iPER_dataset.html
This repository has been tested on the following platform: Python 3.8.15, PyTorch 1.7.0 with CUDA 11.0 and cuDNN 8.0, Ubuntu 22.04
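You can quickly confirm that your environment matches this configuration with a check like the following (assumes PyTorch is already installed):

```python
# Print the installed PyTorch / CUDA / cuDNN versions for comparison with
# the tested configuration above.
import torch

print("PyTorch:", torch.__version__)             # tested: 1.7.0
print("CUDA available:", torch.cuda.is_available())
print("CUDA:", torch.version.cuda)               # tested: 11.0
print("cuDNN:", torch.backends.cudnn.version())  # tested: 8.0 (reported as e.g. 8003)
```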
To clone the repo, run:
git clone --recursive https://github.com/mioyeah/FDA-GAN.git
Next, make sure that you have all dependencies in place. The simplest way to do so is to use Anaconda.
conda env create -f environment.yml -n my-env
conda activate my-env
Download the following model and place it in its corresponding directory:
- FDAGAN.pth in assets/checkpoints/neural_reders
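Before running, it can help to verify that the checkpoint landed in the expected location and loads cleanly. A small sanity check, assuming FDAGAN.pth is an ordinary PyTorch-serialized checkpoint:

```python
# Verify the downloaded FDA-GAN checkpoint is in place and readable.
# Assumes a standard torch-serialized file; adjust if the format differs.
import os
import torch

ckpt_path = "assets/checkpoints/neural_reders/FDAGAN.pth"
assert os.path.isfile(ckpt_path), f"checkpoint not found: {ckpt_path}"

state = torch.load(ckpt_path, map_location="cpu")  # load on CPU for the check
print(type(state))  # typically a dict of weights / nested state dicts
```

With the checkpoint in place, run motion transfer on the provided samples: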
python motiontransfer.py \
--gpu_ids 0 \
--image_size 512 \
--num_source 2 \
--output_dir "./results" \
--assets_dir "./assets" \
--model_id "Av37667655_2" \
--src_path "path?=./assets/samples/sources/fange_1/fange_1_ns=6,name?=fange_1_ns=6" \
--ref_path "path?=./assets/samples/references/Av37667655_2.mp4,name?=Av37667655_2,pose_fc?=300"