If I want to extract KPIs from a PDF report, the procedure is as follows:
1. I log in to the extraction tool/interface (credentials TBD).
2. I search for the company/report I am interested in, by name or ID (and year?).
3. I am presented with a list of matching reports.
4. I select the correct report if it exists in the catalogue; if not, I can add the report manually (refer to *I want to add a report*).
5. I configure the extraction according to my needs, e.g. the subset of KPIs and the number of suggestions from the models.
6. I trigger the extraction process: all intermediary steps run automatically (text extraction, relevance detection, KPI extraction), and intermediary outputs should be saved to the S3 bucket automatically for further use.
7. I should be aware that if I try to extract KPIs that are not relevant for a given report/company/sector (e.g. because the models have not been trained on that sector), the extraction may show disappointing performance.
8. I can review the results (refer to *I want to perform annotations*).
9. Once the annotations are validated, the results will be exposed in SuperSet, and I can later retrieve the KPIs directly from SuperSet.
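The extraction step can be sketched as a small pipeline. This is only an illustration of the staged flow described above: the function names (`extract_text`, `detect_relevant_pages`, `extract_kpis`), the `ExtractionConfig` fields, and the in-memory `bucket` dict standing in for the S3 bucket are all hypothetical, not the project's actual API.

```python
# Hypothetical sketch of the staged extraction pipeline; not the real tool's API.
from dataclasses import dataclass


@dataclass
class ExtractionConfig:
    kpis: list              # subset of KPIs to extract
    n_suggestions: int = 3  # number of suggestions from the models per KPI


def extract_text(pdf_path):
    # Placeholder: a real implementation would parse the PDF here.
    return [f"page text from {pdf_path}"]


def detect_relevant_pages(pages, kpis):
    # Placeholder relevance detection: a real model would filter pages.
    return pages


def extract_kpis(pages, config):
    # Placeholder KPI extraction: return dummy model suggestions per KPI.
    return {
        kpi: [f"suggestion {i + 1}" for i in range(config.n_suggestions)]
        for kpi in config.kpis
    }


def run_extraction(pdf_path, config, bucket):
    """Run all intermediary steps, saving each output to the bucket."""
    pages = extract_text(pdf_path)
    bucket["text"] = pages                        # intermediary output 1
    relevant = detect_relevant_pages(pages, config.kpis)
    bucket["relevance"] = relevant                # intermediary output 2
    results = extract_kpis(relevant, config)
    bucket["kpis"] = results                      # final output
    return results


bucket = {}  # stands in for the S3 bucket
config = ExtractionConfig(kpis=["scope_1_emissions"], n_suggestions=2)
results = run_extraction("report.pdf", config, bucket)
```

The point of the sketch is that each intermediary output is persisted as it is produced, so a later stage (or a re-run) can reuse it without repeating earlier steps.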
I wanted to clarify that Red Hat Emerging Tech Data Science does not have the engineering bandwidth or the application-development skill set to address these user story elements. The work that @Shreyanand did updating the existing notebooks to allow choosing PDF directories and running against them should provide a good baseline capability for most of these, and can serve as a starting point for further application development.