The Citation Extractor and Classifier (CEC) is a tool that automatically annotates in-text citations in academic papers provided as PDFs. Citations can be classified with two main ensemble models, one that utilizes section titles and one that does not; a mixed strategy that chooses between the two on a per-sentence basis is also available (and suggested). Within the API, these are selected as WoS mode (the model without section titles), WS mode (the model with section titles), and M mode (the mixed model).
This page describes the Citation Intent Classifier (CIC) component, which identifies the citation intent of one or more citations given as input. Citations are classified according to the CiTO ontology; four classes are currently recognized: UsesMethodIn, ObtainsBackgroundFrom, UsesConclusionsFrom, and CitesForInformation.
Classification is carried out by an Ensemble Model: a combination of six binary classifiers (in the Beta release) and a meta-classifier built on top of them. The meta-classifier carries out the voting process and returns the final classification result. In addition, a confidence threshold of 90% filters out results on which the ensemble is not confident enough.
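The voting-plus-threshold step can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the six binary classifiers are stood in for by per-class probability dictionaries, and simple averaging plays the role of the meta-classifier.

```python
# Hypothetical sketch of the ensemble's decision step: six binary
# classifiers each emit a probability per class; a stand-in
# meta-classifier (plain averaging here) picks the final label,
# which is kept only when confidence reaches the 90% threshold.
CLASSES = ["UsesMethodIn", "ObtainsBackgroundFrom",
           "UsesConclusionsFrom", "CitesForInformation"]
THRESHOLD = 0.90

def vote(binary_scores):
    """binary_scores: list of six dicts mapping class name -> probability."""
    avg = {c: sum(s[c] for s in binary_scores) / len(binary_scores)
           for c in CLASSES}
    label = max(avg, key=avg.get)
    confidence = avg[label]
    # Results below the confidence threshold are filtered out.
    return (label, confidence) if confidence >= THRESHOLD else (None, confidence)
```

In the real ensemble the meta-classifier is itself a trained model rather than an average, but the filtering behaviour at 90% confidence is the same idea.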
The baseline model surpasses the current SOTA Macro-F1 score for the citation intent classification task on the SciCite dataset.
The tool can classify any number of input sentences, provided either as a list of tuples or as a JSON file, and lets you download the results in JSON format.
You can select one of three working modes:
- With Sections: select this mode if ALL your sentences also include the title of the section in which the citation is contained;
- Without Sections: select this mode if NONE of your sentences includes the title of the section in which the citation is contained, or if you want to try a classification based purely on the semantics of the sentence at hand;
- Mixed: select this mode if SOME of your sentences include the section title and others do not. The tool carries out the entire filtering process and returns the results.
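To make the three modes concrete, here is an illustrative sketch of how a batch of input sentences might be laid out and matched to a mode. The tuple layout and the `pick_mode` helper are assumptions for illustration, not the tool's documented schema.

```python
# Hypothetical input layout: each citation sentence is a
# (sentence, section_title) tuple, with None standing in for a
# missing section title.
sentences = [
    ("We follow the procedure described in [1].", "Methods"),
    ("Prior work has explored this topic extensively [2].", None),
]

def pick_mode(batch):
    """Illustrative logic choosing the working mode that fits a batch."""
    titles = [title for _, title in batch]
    if all(titles):
        return "With Sections"   # every sentence carries a section title
    if not any(titles):
        return "Without Sections"  # no sentence carries a section title
    return "Mixed"               # some do, some don't
```

A batch like `sentences` above, where only some tuples carry a section title, would fall under the Mixed mode.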
The leaderboard is based on Macro-F1 scores of the models tested on the test set of the SciCite dataset. Highlighted models are the classifiers resulting from this project. The WS models utilize section titles to classify citation sentences, while the WoS models do not and classify raw citation sentences. Models are also presented as different outputs of the Alpha (described here) and Beta (described here) releases.
| # | Model | Macro-F1 Score | Accuracy Score |
|---|---|---|---|
| 1 | EnsIntWS - Beta Release | 89.46 | 90.75 |
| 2 | EnsCICWS - Alpha Release | 88.99 | 90.32 |
| 3 | ImpactCite | 88.93 | \ |
| 4 | EnsIntWoS - Beta Release | 88.48 | 89.73 |
| 5 | EnsCICWoS - Alpha Release | 87.75 | 88.86 |
| 6 | CitePrompt | 86.33 | 87.56 |
| 7 | SciBERT | 86.32 | \ |
Final Release:
- Release new template for the web application
- Add a better threshold definition mechanism for the classifiers
- Release API:
  - Write API
  - Add support for compressed files and folders
  - Write documentation
  - Write usage examples

Beta Release:
- Add structured README.md
- Add Changelog
- Publish article
- Update base software:
  - Update ensemble models
  - Improve classifier score

Alpha Release:
- Release web application:
  - Design web interface
  - Develop and publish classification model
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have any suggestions that would make this project better, please fork the repo and create a pull request. If that sounds too complex, you can simply open an issue with the tag "enhancement". Don't forget to give the project a star!
Distributed under the ISC License. See LICENCE for more information.