Broken internal links priority #528

Merged: 11 commits merged on Dec 17, 2024
src/internal-links/handler.js: 78 changes (32 additions, 46 deletions)

```diff
@@ -18,56 +18,41 @@ import { noopUrlResolver } from '../common/audit.js';
 import { syncSuggestions } from '../utils/data-access.js';
 
 const INTERVAL = 30; // days
-const DAILY_PAGEVIEW_THRESHOLD = 100;
 const AUDIT_TYPE = 'broken-internal-links';
 
 /**
- * Determines if the URL has the same host as the current host.
- * @param {*} url
- * @param {*} currentHost
- * @returns
+ * Classifies links into priority categories based on views
+ * High: top 25%, Medium: next 25%, Low: bottom 50%
+ * @param {Array} links - Array of objects with views property
+ * @returns {Array} - Links with priority classifications included
  */
-function hasSameHost(url, currentHost) {
-  const host = new URL(url).hostname;
-  return host === currentHost;
-}
-
-/**
- * Filter out the 404 links that:
- * - have less than 100 views and do not have a URL.
- * - do not have any sources from the same domain.
- * @param {*} links - all 404 links Data
- * @param {*} hostUrl - the host URL of the domain
- * @param {*} auditUrl - the URL to run audit against
- * @param {*} log - the logger object
- * @returns {Array} - Returns an array of 404 links that meet the criteria.
- */
-
-function transform404LinksData(responseData, hostUrl, auditUrl, log) {
-  return responseData.reduce((result, { url, views, all_sources: allSources }) => {
-    try {
-      if (!url || views < DAILY_PAGEVIEW_THRESHOLD) {
-        return result;
-      }
-      const sameDomainSources = allSources.filter(
-        (source) => source && hasSameHost(source, hostUrl),
-      );
-
-      for (const source of sameDomainSources) {
-        result.push({
-          url_to: url,
-          url_from: source,
-          traffic_domain: views,
-        });
-      }
-    } catch {
-      log.error(
-        `Error occurred for audit type broken-internal-links for url ${auditUrl}, while processing sources for link ${url}`,
-      );
+function calculatePriority(links) {
+  // Sort links by views
+  const sortedLinks = [...links].sort((a, b) => b.views - a.views);
+
+  // Calculate total views
+  const totalViews = sortedLinks.reduce((sum, link) => sum + link.views, 0);
+
+  // Map through sorted links and assign priority based on contribution percentage
+  return sortedLinks.map((link) => {
+    const contributionPercentage = (link.views / totalViews) * 100;
+
+    let priority;
+    if (contributionPercentage >= 75) {
+      priority = 'high';
+    } else if (contributionPercentage >= 50) {
+      priority = 'medium';
+    } else {
+      priority = 'low';
     }
-    return result;
-  }, []);
+
+    return {
+      ...link,
+      priority,
+    };
+  });
 }
 
 /**
  * Perform an audit to check which internal links for domain are broken.
  *
```
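The JSDoc on `calculatePriority` describes rank-based tiers (top 25% high, next 25% medium, bottom 50% low), but the body tiers each link by its share of the summed views (at least 75% high, at least 50% medium, otherwise low). Under the share rule at most one link can be `high`, and when several links have similar traffic every link ends up `low`. The sketch below contrasts the two readings on the same input. `priorityByShare` restates the merged logic; `priorityByRank` is one hypothetical interpretation of the JSDoc, not code from this PR. Both assume each link carries a numeric `views` field, as the JSDoc states, and the sample links are made up.

```javascript
// Share-based tiering, as merged: a link's tier depends on its fraction of
// the summed views. Assumes each link has a numeric `views` field.
function priorityByShare(links) {
  const sorted = [...links].sort((a, b) => b.views - a.views);
  const total = sorted.reduce((sum, link) => sum + link.views, 0);
  return sorted.map((link) => {
    const share = (link.views / total) * 100;
    let priority = 'low';
    if (share >= 75) {
      priority = 'high';
    } else if (share >= 50) {
      priority = 'medium';
    }
    return { ...link, priority };
  });
}

// Rank-based tiering (hypothetical reading of the JSDoc): a link's tier
// depends on its position in the list sorted by views, descending.
// Links with equal views can land in different tiers.
function priorityByRank(links) {
  const sorted = [...links].sort((a, b) => b.views - a.views);
  return sorted.map((link, index) => {
    const position = index / sorted.length; // 0 for the most-viewed link
    let priority = 'low';
    if (position < 0.25) {
      priority = 'high';
    } else if (position < 0.5) {
      priority = 'medium';
    }
    return { ...link, priority };
  });
}

// Made-up input: four links with similar traffic (400 views in total).
const links = [
  { url_to: '/a', views: 120 },
  { url_to: '/b', views: 100 },
  { url_to: '/c', views: 90 },
  { url_to: '/d', views: 90 },
];

console.log(priorityByShare(links).map((l) => l.priority)); // [ 'low', 'low', 'low', 'low' ]
console.log(priorityByRank(links).map((l) => l.priority)); // [ 'high', 'medium', 'low', 'low' ]
```

The rank-based variant gives any non-empty list a spread of tiers but splits ties arbitrarily; the share-based variant only flags links that dominate total traffic.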
```diff
@@ -93,9 +78,10 @@ export async function internalLinksAuditRunner(auditUrl, context, site) {
 
   log.info('broken-internal-links: Options for RUM call: ', JSON.stringify(options));
 
-  const all404Links = await rumAPIClient.query('404', options);
+  const internal404Links = await rumAPIClient.query('404-internal-links', options);
+  const priorityLinks = calculatePriority(internal404Links);
   const auditResult = {
-    brokenInternalLinks: transform404LinksData(all404Links, finalUrl, auditUrl, log),
+    brokenInternalLinks: priorityLinks,
     fullAuditRef: auditUrl,
     finalUrl,
     auditContext: {
```
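The runner now passes the `404-internal-links` rows straight to `calculatePriority`, which reads a `views` field. The test fixture in this PR describes result rows with `traffic_domain` and no `views`. If the query rows have that shape (an assumption, not verified here), both the sort comparator and the running sum produce `NaN`, so every row falls through to `'low'`. The sketch demonstrates that effect on fixture-shaped rows; `calculatePriorityWith` and its `viewsOf` accessor are hypothetical, introduced only so the two field readings can be compared side by side.

```javascript
// Share-based tiering as in the merged calculatePriority, except that the
// view count is read through a caller-supplied accessor (hypothetical).
function calculatePriorityWith(links, viewsOf) {
  const sortedLinks = [...links].sort((a, b) => viewsOf(b) - viewsOf(a));
  const totalViews = sortedLinks.reduce((sum, link) => sum + viewsOf(link), 0);
  return sortedLinks.map((link) => {
    const share = (viewsOf(link) / totalViews) * 100; // NaN if views are missing
    let priority = 'low';
    if (share >= 75) {
      priority = 'high';
    } else if (share >= 50) {
      priority = 'medium';
    }
    return { ...link, priority };
  });
}

// Made-up rows shaped like AUDIT_RESULT_DATA in the test file: no `views`.
const rows = [
  { url_to: '/a', url_from: '/b', traffic_domain: 300 },
  { url_to: '/c', url_from: '/d', traffic_domain: 100 },
];

// Reading `views` directly, as the merged code does: every share is NaN.
const asMerged = calculatePriorityWith(rows, (link) => link.views);
// Hypothetical accessor: prefer `views`, fall back to `traffic_domain`.
const withFallback = calculatePriorityWith(
  rows,
  (link) => link.views ?? link.traffic_domain ?? 0,
);

console.log(asMerged.map((l) => l.priority)); // [ 'low', 'low' ]
console.log(withFallback.map((l) => l.priority)); // [ 'high', 'low' ]
```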
test/audits/internal-links.test.js: 2 changes (2 additions, 0 deletions)

```diff
@@ -26,11 +26,13 @@ const AUDIT_RESULT_DATA = [
     url_to: 'https://www.example.com/article/dogs/breeds/choosing-an-irish-setter',
     url_from: 'https://www.example.com/article/dogs/just-for-fun/dogs-good-for-men-13-manly-masculine-dog-breeds',
     traffic_domain: 100,
+    priority: 0,
```

Member:

In the implementation, priority is set to either high, medium, or low, so why does it end up as an integer here?

Member Author:

This is a mistake, leftover code from the initial implementation of priorities. I will take care of it while implementing the test cases, which are not done yet. Thanks.

```diff
   },
   {
     url_to: 'https://www.example.com/article/dogs/breeds/choosing-a-miniature-poodle',
     url_from: 'https://www.example.com/article/dogs/pet-care/when-is-a-dog-considered-senior',
     traffic_domain: 100,
+    priority: 0,
   },
 ];
```
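The review thread settles that fixture priorities should be strings, since `calculatePriority` only ever assigns `'high'`, `'medium'` or `'low'`. A framework-neutral sketch of a guard for that invariant follows; `invalidPriorityEntries` and `fixtureFixed` are hypothetical names, not part of the PR, and the project's actual test framework is not shown in this section. The `'medium'` value in the corrected fixture assumes the ranking reads these traffic figures, in which case two equal rows each hold 50% of the total.

```javascript
// The three tiers calculatePriority can assign.
const PRIORITIES = new Set(['high', 'medium', 'low']);

// Returns the entries whose `priority` is not one of the string tiers.
function invalidPriorityEntries(entries) {
  return entries.filter((entry) => !PRIORITIES.has(entry.priority));
}

// Shaped like AUDIT_RESULT_DATA as merged (URLs shortened for the sketch).
const fixtureAsMerged = [
  { url_to: '/choosing-an-irish-setter', traffic_domain: 100, priority: 0 },
  { url_to: '/choosing-a-miniature-poodle', traffic_domain: 100, priority: 0 },
];

// Hypothetical corrected fixture: same rows, string tier instead of 0.
const fixtureFixed = fixtureAsMerged.map((entry) => ({ ...entry, priority: 'medium' }));

console.log(invalidPriorityEntries(fixtureAsMerged).length); // 2
console.log(invalidPriorityEntries(fixtureFixed).length); // 0
```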