Commit

Fix handling atomic tags
curdopet committed May 9, 2024
1 parent d19960c commit ec64972
Showing 4 changed files with 19 additions and 7 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -65,7 +65,7 @@ of these three parameters it will be ignored:
not be compared - the entire tag should be treated as one token. This is useful for tags
where it does not make sense to insert `<ins>` and `<del>` tags. If not used, the default
list will be used:
- `iframe,object,math,svg,script,video,head,style`.
+ `iframe,object,math,svg,script,video,head,style,a,img`.
### Example
2 changes: 1 addition & 1 deletion htmldiff-cli.ts
@@ -53,7 +53,7 @@ Options:
child nodes should not be compared - the entire tag should be treated
as one token. This is useful for tags where it does not make sense to
insert <ins> and <del> tags. If not used, this default list will be used:
- "iframe,object,math,svg,script,video,head,style".`;
+ "iframe,object,math,svg,script,video,head,style,a,img".`;
console.log(usage);
}

2 changes: 1 addition & 1 deletion js/htmldiff.d.ts
@@ -10,7 +10,7 @@
* `tag1,tag2,...` e. g. `head,script,style`. An atomic tag is one whose child nodes should not be
* compared - the entire tag should be treated as one token. This is useful for tags where it does
* not make sense to insert `<ins>` and `<del>` tags. If not used, the default list
- * `iframe,object,math,svg,script,video,head,style` will be used.
+ * `iframe,object,math,svg,script,video,head,style,a,img` will be used.
* @return The combined HTML content with differences wrapped in `<ins>` and `<del>` tags.
*/
declare function diff(before: string, after: string, className?: string | null, dataPrefix?: string | null, atomicTags?: string | null): string;
20 changes: 16 additions & 4 deletions js/htmldiff.js
@@ -70,7 +70,8 @@
*/
var atomicTagsRegExp;
// Added head and style (for style tags inside the body)
- var defaultAtomicTagsRegExp = new RegExp('^<(iframe|object|math|svg|script|video|head|style|a)\b');
+ var defaultAtomicTagsRegExp = new RegExp('^<(iframe|object|math|svg|script|video|head|style|a|img)\\b');
+ var maxAtomicTagLength;

/**
* Checks if the current word is the beginning of an atomic tag. An atomic tag is one whose
@@ -98,7 +99,8 @@
* false otherwise.
*/
function isEndOfAtomicTag(word, tag){
- return word.substring(word.length - tag.length - 2) === ('</' + tag);
+ // todo: implement better handling of atomic tags - not all tags have a closing tag
+ return tag === "img" || word.substring(word.length - tag.length - 2) === ('</' + tag);
}

/**
@@ -179,7 +181,7 @@
var char = html[i];
switch (mode){
case 'tag':
- var atomicTag = isStartOfAtomicTag(currentWord);
+ var atomicTag = currentWord.length - 1 <= maxAtomicTagLength ? isStartOfAtomicTag(currentWord) : null;
if (atomicTag){
mode = 'atomic_tag';
currentAtomicTag = atomicTag;
@@ -940,6 +942,15 @@
}, '');
}

+ /**
+ * Finds the longest atomic tag and saves it to maxAtomicTagLength.
+ */
+ function calculateMaxAtomicTagLength(){
+ const pattern = atomicTagsRegExp.source;
+ const atomicTagsList = pattern.match(/\b\w+\b/g);
+ maxAtomicTagLength = atomicTagsList.reduce((max, tag) => Math.max(max, tag.length), 0);
+ }
+
/**
* Compares two pieces of HTML content and returns the combined content with differences
* wrapped in <ins> and <del> tags.
@@ -951,7 +962,7 @@
* operation index data attribute will be named `data-${dataPrefix-}operation-index`.
* @param {string} atomicTags (Optional) Comma separated list of atomic tag names. The
* list has to be in the form `tag1,tag2,...` e. g. `head,script,style`. If not used,
- * the default list `iframe,object,math,svg,script,video,head,style` will be used.
+ * the default list `iframe,object,math,svg,script,video,head,style,a,img` will be used.
*
* @return {string} The combined HTML content with differences wrapped in <ins> and <del> tags.
*/
@@ -962,6 +973,7 @@
atomicTags ?
(atomicTagsRegExp = new RegExp('^<(' + atomicTags.replace(/\s*/g, '').replace(/,/g, '|') + ')\b'))
: (atomicTagsRegExp = defaultAtomicTagsRegExp);
+ calculateMaxAtomicTagLength();

before = htmlToTokens(before);
after = htmlToTokens(after);
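
Note on the `\b` to `\\b` change in `defaultAtomicTagsRegExp`: inside a JavaScript string literal, `'\b'` is the backspace character, so the `RegExp` constructor never receives a word-boundary assertion unless the backslash is doubled. The sketch below illustrates the difference with a shortened tag list; the variable names `unescaped` and `escaped` are illustrative and do not appear in the library. The custom `atomicTags` branch in the last hunk appears to keep the single-backslash form.

```javascript
// In a string literal, '\b' is the backspace character (U+0008),
// so this pattern expects a literal backspace after the tag name.
var unescaped = new RegExp('^<(a|img)\b');

// Doubling the backslash passes the two characters `\b` to RegExp,
// which then reads them as a word-boundary assertion.
var escaped = new RegExp('^<(a|img)\\b');

console.log(unescaped.test('<img src="x.png">')); // false
console.log(escaped.test('<img src="x.png">'));   // true
console.log(escaped.test('<article>'));           // false
```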

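Note on `maxAtomicTagLength`: a minimal sketch of how the length limit could be derived and applied, assuming the default tag list from this commit. `maxTagLength` and `mayStartAtomicTag` are hypothetical helpers written for illustration; the commit itself stores the value in a shared variable and inlines the comparison in the `'tag'` case.

```javascript
// Hypothetical helper mirroring calculateMaxAtomicTagLength from the diff,
// but returning the value instead of writing to a shared variable.
function maxTagLength(atomicTagsRegExp) {
    // Pull every word out of the pattern source, e.g. "iframe", "img".
    // The trailing `\b` also yields a stray one-letter "b".
    var names = atomicTagsRegExp.source.match(/\b\w+\b/g) || [];
    return names.reduce(function (max, name) {
        return Math.max(max, name.length);
    }, 0);
}

var defaultRegExp = new RegExp('^<(iframe|object|math|svg|script|video|head|style|a|img)\\b');
var limit = maxTagLength(defaultRegExp);

// Hypothetical guard: currentWord includes the leading '<', so subtract 1
// before comparing against the longest tag name.
function mayStartAtomicTag(currentWord) {
    return currentWord.length - 1 <= limit;
}

console.log(limit);                         // 6
console.log(mayStartAtomicTag('<iframe'));  // true
console.log(mayStartAtomicTag('<section')); // false
```

With this guard, the regular expression is no longer tested once the characters after `<` outnumber the longest atomic tag name.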