diff --git a/README.md b/README.md
index 9f2e194..a9d5bde 100644
--- a/README.md
+++ b/README.md
@@ -230,7 +230,6 @@
 ## 8. Citation
 
 
 -
 -
-
-http://gofile.me/78ddm/vIf5YG74w
\ No newline at end of file
+
\ No newline at end of file
diff --git a/mae/README.md b/mae/README.md
deleted file mode 100644
index bcb6d16..0000000
--- a/mae/README.md
+++ /dev/null
@@ -1,51 +0,0 @@
-## Pre-training Using MAE
-We adopt the framework of [MAE](http://openaccess.thecvf.com/content/CVPR2022/html/He_Masked_Autoencoders_Are_Scalable_Vision_Learners_CVPR_2022_paper.html) for pre-training. The code is heavily borrowed from [Masked Autoencoders: A PyTorch Implementation](https://github.com/facebookresearch/mae).
-
-### 1. Install
-```bash
-conda create -n mae python=3.7
-conda activate mae
-pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html
-pip install -r requirements.txt
-```
-- **Attention**: This repo is based on `timm==0.3.2`, for which a [fix](https://github.com/huggingface/pytorch-image-models/issues/420#issuecomment-776459842) is needed to work with PyTorch 1.8.1+.
-
-### 2. Prepare dataset
-- You need to prepare the dataset(s) in torchvision.datasets.ImageFolder format. The basic structure of a dataset is as follows:
-  ```text
-  |--dataset
-    |--subfolder1
-      |--image1.jpg
-      |--image2.jpg
-      |--...
-    |--subfolder2
-      |--image1.jpg
-      |--image2.jpg
-      |--...
-  ```
-- You can also download [Union14M-U](../README.md/#34-download) for pre-training, which is already organized in ImageFolder format.
-
-### 3. Pre-training
-- Pre-training ViT-Small on Union14M-U with 4 GPUs:
-  ```bash
-  CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch \
-    --nproc_per_node=4 main_pretrain.py \
-    --batch_size 256 \
-    --model mae_vit_small_patch4 \
-    --mask_ratio 0.75 \
-    --epochs 20 \
-    --warmup_epochs 2 \
-    --norm_pix_loss \
-    --blr 1.5e-4 \
-    --weight_decay 0.05 \
-    --data_path Union14M-U/book32 Union14M-U/openvino Union14M-U/CC
-  ```
-- Here the effective batch size is 256 (batch size per GPU) * 1 (node) * 4 (GPUs per node) = 1024. If memory or the number of GPUs is limited, use --accum_iter to maintain the effective batch size, which is batch_size (per GPU) * nodes * GPUs per node * accum_iter.
-- We use --norm_pix_loss (normalized pixels as the reconstruction target) for better representation learning. To train a baseline model (e.g., for visualization), use pixel-based reconstruction and turn off --norm_pix_loss.
-- To train ViT-Base, set --model mae_vit_base_patch4.
-- We also support TensorBoard for visualization during pre-training. The learning rate, loss, and reconstructed images are logged every 200 iterations.
-Note that when using --norm_pix_loss, the reconstructed images are not the original images but their normalized counterparts. To launch TensorBoard:
-  ```bash
-  tensorboard --logdir=output_dir
-  ```
-- Pre-training takes about 2 hours per epoch on 4 A6000 GPUs (48 GB).
\ No newline at end of file
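
For reference, a minimal sketch of consuming a dataset in the torchvision.datasets.ImageFolder layout described in section 2 of the removed mae/README.md. The path, image size, and loader settings below are illustrative assumptions, not taken from main_pretrain.py:

```python
# Minimal sketch (not from this repo): reading a dataset laid out in the
# ImageFolder format shown above. Path, image size, and loader settings
# are illustrative assumptions.
import torch
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),  # assumed size; use whatever the model expects
    transforms.ToTensor(),
])

# Each subfolder becomes a class label; MAE pre-training ignores the labels.
dataset = datasets.ImageFolder("Union14M-U/book32", transform=transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=256, shuffle=True, num_workers=8)

images, _ = next(iter(loader))
print(images.shape)  # e.g. torch.Size([256, 3, 224, 224])
```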
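
Likewise, a tiny sketch of the effective-batch-size bookkeeping from section 3; the helper name is hypothetical, but the formula is the one stated in the removed README:

```python
# Hypothetical helper illustrating the effective-batch-size formula above:
# batch_size (per GPU) * nodes * GPUs per node * accum_iter.
def effective_batch_size(batch_per_gpu: int, nodes: int, gpus_per_node: int,
                         accum_iter: int = 1) -> int:
    return batch_per_gpu * nodes * gpus_per_node * accum_iter

# The documented setup: 256 per GPU, 1 node, 4 GPUs -> 1024.
assert effective_batch_size(256, 1, 4) == 1024
# With only 2 GPUs, --accum_iter 2 keeps the effective batch size at 1024.
assert effective_batch_size(256, 1, 2, accum_iter=2) == 1024
```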