Train Scene Graph Generation for Visual Genome and GQA in PyTorch >= 1.2 with improved zero- and few-shot generalization.
Updated Jun 18, 2023 - Jupyter Notebook
Predicting a subgraph alongside the answer in a graph-based VQA model
Vision-language: solving the GQA (Visual Reasoning in the Real World) dataset.
A RAG-based question-answering system that processes user queries using local documents. It extracts relevant information to answer questions, falling back to a large language model when local sources are insufficient, ensuring accurate and contextual responses.
LaTeX files for my honours thesis: "Graph Attention Networks for Compositional Visual Question Answering"
Source code for my honours thesis: "Graph Attention Networks for Compositional Visual Question Answering"
A toolkit for vision-language processing to support the increasing popularity of multi-modal transformer-based models
Case study of multi-layer perceptron and random forest techniques as applied to a subset of the GQA dataset.
Simple Llama-architecture LLM in PyTorch