From paper "DEPICT: Towards Holistic Long Video-Language Understanding" (Under review)
In this work, we curate the DEPICT dataset, a large-scale, high-quality video caption dataset of untrimmed videos, consisting of 304K videos and 81.3M caption tokens. It provides richer annotations than existing video caption datasets, including 8x more caption tokens, 2x more unique tokens per video on average, and more modalities.
We provide two download options:
- Google drive
- Baidu drive (code: 759q)
Unzip the files to obtain the following directory structure:
data
└── depict
    ├── annotations
    │   ├── train.json
    │   ├── val.json
    │   └── test.json
    └── videos
We provide code for dataset visualization in ./visualization.ipynb.
If the videos are unplayable in the Jupyter notebook above, you may be missing the HEVC/H.265 codec. Install it, or convert the videos to H.264 for proper visualization. Note that this issue does not affect the baseline code, as both video decoding packages, pyav and decord, can handle HEVC.
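As a quick sanity check that decoding works, below is a minimal sketch that reads a few frames with decord; the video filename is a placeholder, so substitute any file from data/depict/videos.

```python
from decord import VideoReader, cpu

# Placeholder path: replace with an actual file from data/depict/videos.
video_path = "data/depict/videos/BV1hT421X7xV.mp4"

# decord decodes HEVC/H.265 through FFmpeg, so no extra codec setup is needed.
vr = VideoReader(video_path, ctx=cpu(0))
print(f"{len(vr)} frames at {vr.get_avg_fps():.2f} fps")

# Grab roughly one frame per second for a quick visual check (e.g. with matplotlib).
step = max(1, int(round(vr.get_avg_fps())))
frames = vr.get_batch(list(range(0, len(vr), step))).asnumpy()  # (N, H, W, 3) uint8
print(frames.shape)
```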
We provide instructions to run the baselines in the paper to reproduce all experiments, including ablation studies. After finishing inference, run evaluation.py to get the evaluation results.
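The metrics reported by evaluation.py are those defined in the paper. Purely as an illustration, the sketch below scores generated captions against the reference summarizations with corpus-level BLEU via the sacrebleu package; the prediction file name and its format (a BVid-to-caption mapping) are assumptions, not the official protocol.

```python
import json
import sacrebleu

# Assumed format: predictions.json maps BVid -> generated caption (not the official format).
with open("predictions.json", encoding="utf-8") as f:
    predictions = json.load(f)
with open("data/depict/annotations/test.json", encoding="utf-8") as f:
    references = {item["BVid"]: item["summarization"] for item in json.load(f)}

bvids = sorted(set(predictions) & set(references))
hyps = [predictions[b] for b in bvids]
refs = [[references[b] for b in bvids]]  # sacrebleu expects one list per reference set

bleu = sacrebleu.corpus_bleu(hyps, refs)
print(f"BLEU on {len(bvids)} videos: {bleu.score:.2f}")
```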
The annotations are organized in a nested structure, as shown below; a short loading example follows the sample entry:
[
    {
        "BVid": "BV1hT421X7xV",
        "video_duration": 158,
        "video_title": "Both a Home-cooked Dish and a Street-side Specialty Snack, Master teaches you how to make Salt and Pepper Mushrooms",
        "summarization": "This is a simple and easy-to-learn home-cooked dish - the recipe for Salt and Pepper Mushrooms, suitable for selling at a stall ...",
        "asr_results": "Do we have a familiar and delicious home-cooked dish that can be made into a street snack, light and swift to set up a stall ..."
    },
    ...
]
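For reference, here is a minimal sketch of loading and inspecting the annotations, using the field names from the example above and the directory layout shown earlier:

```python
import json

# Path follows the directory layout shown above.
with open("data/depict/annotations/train.json", encoding="utf-8") as f:
    annotations = json.load(f)

print(f"{len(annotations)} annotated videos in the training split")

sample = annotations[0]
print(sample["BVid"], f'{sample["video_duration"]}s')  # video id and duration in seconds
print(sample["video_title"])                           # video title
print(sample["summarization"][:120], "...")            # dense caption / summary
print(sample["asr_results"][:120], "...")              # ASR transcript
```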