In the final output the page number is missing:
In the output prior to cleaning, the page number is still missing:
In the reshaped and processed XML the page number is also not present:
Although the page number is present in the XML for the item just below it:
This issue is likely to do with the positioning of the `<page>` tag, or with the positioning of the `<start_date>` tag in cases where the start and end date values are compressed into just the `<start_date>` tag (which we handle later on). Either may cause the item to be missing a closing `</training>` tag. This matters because we append the final `<page_number>` tag to each item in the XML by inserting it just before the `</training>` tag. So no closing tag, no `<page_number>`.

Initial fix
The aim is to fix those cases at the earliest possible stage. This does the job for about 1100 items in the 2009-2010 report:
fmtrpt_data/scrapers/fmtrpt_2009_2010/1_scrape_extract/src/extractor_2009_2010_all_sections.sh
Lines 100 to 101 in d1e119b
However, this still leaves about 200 items without page numbers.
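
To make the dependency described above concrete, here is a minimal Python sketch of it. This is not the code in the extractor script. It only illustrates why an item with no closing `</training>` tag ends up without a `<page_number>`, and how such items could be listed for inspection. The function names, the sample items, the date format, and the page value `12` are all assumptions made for illustration.

```python
CLOSE_TAG = "</training>"


def add_page_number(item: str, page: str) -> str:
    """Insert <page_number> just before the closing </training> tag.

    If the closing tag is missing, the item is returned unchanged,
    which is the failure mode described in this issue.
    """
    idx = item.rfind(CLOSE_TAG)
    if idx == -1:
        return item
    return item[:idx] + f"<page_number>{page}</page_number>" + item[idx:]


def items_missing_close(items: list[str]) -> list[int]:
    """Return the positions of items that have no closing </training> tag."""
    return [i for i, item in enumerate(items) if CLOSE_TAG not in item]


# Hypothetical sample items; the second one has lost its closing tag.
items = [
    "<training><start_date>01/02/2009</start_date></training>",
    "<training><start_date>01/02/2009 03/04/2009</start_date>",
]

tagged = [add_page_number(item, "12") for item in items]
print(items_missing_close(items))
```

Running a check like `items_missing_close` over the remaining items might help confirm whether they all share the missing closing tag, or whether a second cause is involved.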