User story: I want to perform KPI extraction #205

Open
JeremyGohBNP opened this issue Sep 14, 2022 · 1 comment
Labels: development (Indicates that the issue is about software development), enhancement (New feature or request)

Comments

@JeremyGohBNP

If I want to extract KPIs from a PDF report, the procedure is as follows:

  • I log in to the extraction tool/interface (credentials TBD).
  • I look for the company/report I am interested in, searching for it by name or ID (and year?).
  • I am presented with a list of matching reports.
  • I select the correct report if it exists in the catalogue. If it does not, I can add the report manually (see the user story "I want to add a report").
  • I configure the extraction according to my needs, e.g. the subset of KPIs to extract and the number of suggestions returned by the models.
  • I trigger the extraction process: all intermediate steps (text extraction, relevance detection, KPI extraction) are launched automatically. Intermediate outputs should be saved to the S3 bucket automatically for later use.
  • I should be aware that if I try to extract KPIs that are not relevant for a given report/company/sector (e.g. because the models have not been trained on that sector), the extraction may perform poorly.
  • I can review the results (see the user story "I want to perform annotations").
  • Once the annotations are validated, the results are exposed in Superset, and I can later use Superset directly to retrieve the KPIs.
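The configure-and-trigger steps above could be sketched roughly as follows. This is a minimal, hypothetical Python sketch, not the project's implementation: `ExtractionConfig`, `run_pipeline`, and all three stage bodies are placeholders assumed for illustration (a keyword match stands in for the relevance-detection model, and keeping the first matching paragraphs stands in for KPI extraction), and a plain dict stands in for the S3 bucket.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ExtractionConfig:
    """Hypothetical user-facing options for one extraction run."""
    kpis: List[str]         # subset of KPIs to extract
    n_suggestions: int = 3  # number of candidate answers kept per KPI


def run_pipeline(report_id: str, pdf_text: str, config: ExtractionConfig,
                 store: Dict[str, object]) -> Dict[str, List[str]]:
    """Chain the three stages and persist each intermediate output.

    `store` is a plain dict standing in for the S3 bucket; keys mimic
    object paths of the form "<report_id>/<stage>".
    """
    # Stage 1: text extraction (placeholder: split raw text into paragraphs).
    paragraphs = [p.strip() for p in pdf_text.split("\n\n") if p.strip()]
    store[f"{report_id}/text"] = paragraphs

    # Stage 2: relevance detection (placeholder: case-insensitive keyword match).
    relevant = {
        kpi: [p for p in paragraphs if kpi.lower() in p.lower()]
        for kpi in config.kpis
    }
    store[f"{report_id}/relevance"] = relevant

    # Stage 3: KPI extraction (placeholder: keep the first n candidates).
    kpis = {kpi: hits[: config.n_suggestions] for kpi, hits in relevant.items()}
    store[f"{report_id}/kpis"] = kpis
    return kpis


if __name__ == "__main__":
    bucket: Dict[str, object] = {}
    text = ("Scope 1 emissions were 120 kt.\n\nRevenue grew 5%.\n\n"
            "Scope 1 target: -30% by 2030.")
    cfg = ExtractionConfig(kpis=["Scope 1"], n_suggestions=1)
    print(run_pipeline("acme-2021", text, cfg, bucket))
    # {'Scope 1': ['Scope 1 emissions were 120 kt.']}
    print(sorted(bucket))
    # ['acme-2021/kpis', 'acme-2021/relevance', 'acme-2021/text']
```

Writing every stage's output under a per-report key is what would let a later step (e.g. annotation review) pick up intermediate results without re-running the earlier stages.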
JeremyGohBNP added the enhancement (New feature or request) label on Sep 14, 2022
@erikerlandson
Contributor

I want to clarify that Red Hat Emerging Tech Data Science does not have the engineering bandwidth or the application development skill set to address these user story elements. The work @Shreyanand did to update the existing notebooks, so that PDF directories can be selected and processed, should provide a good baseline for most of these elements and can serve as a starting point for further application development.

Shreyanand added the development (Indicates that the issue is about software development) label on Sep 21, 2022