Analysis of line charts through CNNs

This repository contains the source code for a neural network model that performs multiple tasks on line chart images. Three tasks are addressed:

Localizing the legend
Localizing the axes
Counting the number of lines

Frameworks and languages

The project was developed in Python 3.8. Keras/TensorFlow 2 was used for preprocessing and for building and training the models, and Matplotlib for visualizing the dataset. Development was done locally, with Google Colab used to speed up the process.

Dataset

The study was conducted on 927 line chart images from the FigureSeer dataset. The images vary in size, and each contains a legend and between 2 and 10 lines.

Description of the study

As a first step, a custom model for legend localization was implemented in legend_localization_custom_model.ipynb, using standard preprocessing. This custom model was then compared with a pre-trained vgg16 model, this time with custom preprocessing, in legend_localization_model.ipynb. The fine-tuned vgg16 model outperformed the custom model, so fine-tuning was adopted for the rest of the study.
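
The preprocessing steps themselves are not detailed in this README. Since the images come in different sizes, one common step in bounding-box regression is to express each box relative to the image dimensions, so that the targets stay valid after the image is resized to the network input. The sketch below illustrates that idea only; the function names and the (x_min, y_min, x_max, y_max) pixel format are assumptions, not taken from the notebooks.

```python
def normalize_box(box, width, height):
    """Scale a pixel box (x_min, y_min, x_max, y_max) to the [0, 1] range.

    Hypothetical helper: the box format is an assumption for illustration.
    """
    x_min, y_min, x_max, y_max = box
    return (x_min / width, y_min / height, x_max / width, y_max / height)


def denormalize_box(box, width, height):
    """Map a normalized box back to integer pixel coordinates."""
    x_min, y_min, x_max, y_max = box
    return (round(x_min * width), round(y_min * height),
            round(x_max * width), round(y_max * height))


# A legend box in a 200x100 image, expressed independently of image size.
relative = normalize_box((50, 20, 150, 80), width=200, height=100)
# The same relative box mapped onto a 224x224 network input.
resized = denormalize_box(relative, width=224, height=224)
print(relative, resized)
```

Normalized targets also pair naturally with a bounded output activation, although whether the notebooks do this is not stated here.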

Notebooks 2_task_vgg_fine_tuning.ipynb and 2_task_resnet_fine_tuning.ipynb implement multitask models that localize both the legend and the axes in line chart images, by fine-tuning pre-trained vgg16 and ResNet50 respectively. The two models perform similarly, but vgg16 overfit less, so it was selected for the rest of the study.

Notebook get_num_lines_from_legend.ipynb implements the line counting task with the fine-tuned vgg16 model, applied to legend-only images cropped from the original dataset. This approach gave interesting results, but it was less reliable when the line entries inside the legend were laid out horizontally.
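
Producing legend-only images amounts to cutting the legend's bounding box out of each chart. A minimal sketch of that cropping step follows, using a plain list of pixel rows so it runs without any image library; the helper name `crop_box` and the (x_min, y_min, x_max, y_max) pixel format are assumptions for illustration.

```python
def crop_box(image, box):
    """Return the region of `image` inside box = (x_min, y_min, x_max, y_max).

    `image` is a list of pixel rows; the max coordinates are exclusive and
    all coordinates are clamped to the image bounds.
    """
    height, width = len(image), len(image[0])
    x_min, y_min, x_max, y_max = box
    x_min, x_max = max(0, x_min), min(width, x_max)
    y_min, y_max = max(0, y_min), min(height, y_max)
    return [row[x_min:x_max] for row in image[y_min:y_max]]


# Toy 4x6 "image" whose pixel values encode (row, column).
image = [[(r, c) for c in range(6)] for r in range(4)]
legend = crop_box(image, (2, 1, 5, 3))
print(len(legend), "rows x", len(legend[0]), "columns")
```

With an array-based image the same crop would be a single slice over rows and columns.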

Notebook 3_task_fine_tuning_model.ipynb implements a single model that predicts all three tasks. This model, however, overfit on the line counting task, which noticeably degraded the performance of legend localization.

Results

Since localization is a bounding-box regression problem, the accuracy of each model is measured as the fraction of predictions in each set whose IoU with the ground truth exceeds a given threshold. IoU (Intersection over Union) is the area of overlap between the predicted and ground-truth boxes divided by the area of their union. The results below are those of the fine-tuned vgg16 model in each of the scenarios described above.
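
The accuracy measure can be sketched as follows. This is a minimal illustration, assuming boxes are given as (x_min, y_min, x_max, y_max); the function names and the sample boxes are made up for the example and are not taken from the notebooks.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def accuracy_at(threshold, predictions, ground_truths):
    """Fraction of predictions whose IoU with the ground truth exceeds threshold."""
    hits = sum(iou(p, g) > threshold for p, g in zip(predictions, ground_truths))
    return hits / len(predictions)


# Made-up boxes: the first prediction is close, the second is far off.
truths = [(10, 10, 50, 30), (0, 0, 20, 20)]
preds = [(12, 10, 50, 30), (10, 10, 30, 30)]
for t in (0.6, 0.7, 0.8):
    print(f"IoU>{t}: {accuracy_at(t, preds, truths):.2%}")
```

Raising the threshold can only lower the reported accuracy, which is why each table lists thresholds from least to most strict.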

Legend localization:

| IoU | Training set accuracy | Validation set accuracy | Test set accuracy |
|---|---|---|---|
| >0.6 | 95.27% | 82.55% | 86.02% |
| >0.7 | 88.01% | 71.81% | 71.51% |
| >0.8 | 68.58% | 42.95% | 44.09% |

Legend and axis localization:

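| Task | IoU | Training set accuracy | Validation set accuracy | Test set accuracy |
|---|---|---|---|---|
| Legend localization | >0.6 | 93.58% | 74.50% | 84.95% |
| Legend localization | >0.7 | 85.30% | 60.40% | 68.28% |
| Legend localization | >0.8 | 61.66% | 32.21% | 37.10% |
| Axis localization | >0.6 | 98.14% | 97.32% | 97.31% |
| Axis localization | >0.7 | 97.80% | 91.95% | 96.77% |
| Axis localization | >0.8 | 94.09% | 83.89% | 90.32% |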

Legend-axis localization and line counting:

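| Task | Criterion | Training set accuracy | Validation set accuracy | Test set accuracy |
|---|---|---|---|---|
| Legend localization | IoU>0.6 | 73.31% | 52.35% | 57.53% |
| Legend localization | IoU>0.7 | 46.96% | 26.85% | 33.33% |
| Legend localization | IoU>0.8 | 17.06% | 8.05% | 12.37% |
| Axis localization | IoU>0.6 | 97.97% | 96.64% | 97.85% |
| Axis localization | IoU>0.7 | 95.78% | 92.62% | 94.09% |
| Axis localization | IoU>0.8 | 91.05% | 81.88% | 83.87% |
| Line counting | None | 100% | 74.50% | 71.51% |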
