PresentationBuilder.PublishSlides generates slides with different data #40

Open
f1nzer opened this issue Nov 12, 2021 · 3 comments
Labels
enhancement (New feature or request), PowerPoint (PowerPoint related tasks)


@f1nzer
Contributor

f1nzer commented Nov 12, 2021

I'm using PresentationBuilder.PublishSlides to generate slides from the original pptx file.
The problem is that this method returns non-deterministic results from run to run: each slide's DocumentByteArray contains different data, differing in several bytes.

Is this expected behavior? Thanks.

Simple repro code (.NET SDK 6.0.100, Clippit 1.8.1):

```csharp
using System.IO;
using System.Linq;
using Clippit.PowerPoint;
using DocumentFormat.OpenXml.Packaging;
using Xunit;

namespace PptxTest;

public class UnitTest1
{
    [Fact]
    public void PublishSlides_Should_GenerateSameDataInTwoRuns()
    {
        const string filePath = @"use any pptx file path";

        var sizesForSlides1 = SplitPptxAndGetByteSizesForSlides(filePath);
        var sizesForSlides2 = SplitPptxAndGetByteSizesForSlides(filePath);

        Assert.Equal(sizesForSlides1, sizesForSlides2);
    }

    private static int[] SplitPptxAndGetByteSizesForSlides(string filePath)
    {
        using var fileContentStream = File.OpenRead(filePath);
        using var document = PresentationDocument.Open(fileContentStream, false);
        var slides = PresentationBuilder.PublishSlides(document, null);

        // ToArray() materializes the result before the document is disposed.
        // Only lengths are compared: equal lengths do not guarantee equal bytes.
        return slides.Select(slide => slide.DocumentByteArray.Length).ToArray();
    }
}
```
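Equal lengths do not imply equal bytes, and differing bytes alone do not say which part changed. A stricter check, sketched below under the assumption that each `DocumentByteArray` is a complete ZIP-based .pptx package, unzips both outputs with `System.IO.Compression.ZipArchive` and lists the parts whose uncompressed content differs. `PackageDiff` and `DifferingParts` are hypothetical helper names, not part of Clippit.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical diagnostic helper (not a Clippit API).
public static class PackageDiff
{
    // Returns the names of package parts whose uncompressed content differs,
    // or that exist in only one package. ZIP metadata such as entry timestamps
    // is ignored because only entry contents are compared.
    public static IReadOnlyList<string> DifferingParts(byte[] packageA, byte[] packageB)
    {
        var a = ReadEntries(packageA);
        var b = ReadEntries(packageB);
        return a.Keys.Union(b.Keys)
            .Where(name => !a.TryGetValue(name, out var x)
                        || !b.TryGetValue(name, out var y)
                        || !x.SequenceEqual(y))
            .OrderBy(name => name, StringComparer.Ordinal)
            .ToList();
    }

    // Reads every entry of a ZIP package into memory, keyed by its full path.
    private static Dictionary<string, byte[]> ReadEntries(byte[] package)
    {
        using var archive = new ZipArchive(new MemoryStream(package), ZipArchiveMode.Read);
        var result = new Dictionary<string, byte[]>(StringComparer.Ordinal);
        foreach (var entry in archive.Entries)
        {
            using var input = entry.Open();
            using var copy = new MemoryStream();
            input.CopyTo(copy);
            result[entry.FullName] = copy.ToArray();
        }
        return result;
    }
}
```

For example, passing the first slide's `DocumentByteArray` from two separate runs should list parts such as `_rels/.rels` if their XML actually changes between runs, while differences caused only by ZIP headers are not reported.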
@sergey-tihon
Owner

I am not quite sure, but I think ZIP archives (*.pptx, *.docx, *.xlsx) are not deterministic by nature.

According to Wikipedia (http://en.wikipedia.org/wiki/Zip_(file_format)), ZIP files have headers for the file's last modification time and last modification date, so any ZIP file checked into git will appear to git to have changed if the ZIP is rebuilt from the same content. And it seems there is no flag to tell the tool not to set those headers.

From SO
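If entry timestamps were the only source of variation, pinning them would remove it. The sketch below assumes plain `System.IO.Compression.ZipArchive`, where a newly created entry's `LastWriteTime` defaults to the current time and can be overwritten before the entry is opened for writing. `DeterministicZip.Build` is a hypothetical helper; it does not show how, or whether, the packaging layer used by Clippit exposes entry timestamps.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper (not part of Clippit or the Open XML SDK).
public static class DeterministicZip
{
    // Writes the given (name, content) pairs into an in-memory ZIP. Every entry gets
    // the same fixed LastWriteTime instead of the default (the current time), so
    // building the same input twice should produce identical bytes.
    public static byte[] Build(IEnumerable<(string Name, string Content)> entries, DateTimeOffset fixedTime)
    {
        using var buffer = new MemoryStream();
        using (var archive = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
        {
            foreach (var (name, content) in entries)
            {
                var entry = archive.CreateEntry(name, CompressionLevel.Optimal);
                // Set before Open(): in Create mode the timestamp cannot be changed
                // once the entry has been opened for writing.
                entry.LastWriteTime = fixedTime;
                using var writer = new StreamWriter(entry.Open(), new UTF8Encoding(false));
                writer.Write(content);
            }
        }
        return buffer.ToArray();
    }
}
```

The fixed date must be representable as a DOS date (1980 or later), which is why a year such as 2000 is a safe choice.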

@f1nzer
Contributor Author

f1nzer commented Nov 12, 2021

That's interesting.

In addition, in my case the _rels/.rels file contains several <Relationship> elements whose Id values are unique to each run (the file from another run has a different set of IDs).
The same is true for the other .rels files (see the ppt folder).

@sergey-tihon
Copy link
Owner

Ha! You are right: OpenXmlPowerTools has historically used GUIDs as relationship IDs:

```csharp
internal static string NewRelationshipId() =>
    "rcId" + Guid.NewGuid().ToString().Replace("-", "").Substring(0, 16);
```
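One possible direction for the enhancement, sketched here as an assumption rather than Clippit's actual design, is to derive relationship IDs from a counter instead of `Guid.NewGuid()`, skipping IDs already present in the target part. `SequentialRelationshipIdGenerator` is a hypothetical name; only the `rcId` prefix comes from the snippet in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical, deterministic alternative to a GUID-based NewRelationshipId().
// IDs come from a counter, so the same sequence of calls yields the same IDs on every run.
public sealed class SequentialRelationshipIdGenerator
{
    private readonly HashSet<string> _taken;
    private int _next = 1;

    // existingIds: relationship IDs already used in the target .rels part.
    public SequentialRelationshipIdGenerator(IEnumerable<string> existingIds)
    {
        _taken = new HashSet<string>(existingIds, StringComparer.Ordinal);
    }

    public string NewRelationshipId()
    {
        string id;
        do
        {
            id = "rcId" + _next.ToString(CultureInfo.InvariantCulture);
            _next++;
        } while (!_taken.Add(id)); // skip IDs that are already taken
        return id;
    }
}
```

The resulting IDs are only reproducible if parts and relationships are processed in a stable order. Relationship IDs must also stay unique within each .rels part, which is why the existing IDs are passed in.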

@sergey-tihon added the labels enhancement (New feature or request) and PowerPoint (PowerPoint related tasks) on Nov 21, 2021