ConvertLearnToDoc

ConvertLearnToDoc is a set of tools that convert Microsoft Word documents into Microsoft Learn articles or training modules that can be published on the Microsoft Learn site. It is primarily intended for contributors to the platform who would rather work in a Word document than in Markdown or YAML. The tools also have an experimental capability to load an article or training module from https://learn.microsoft.com and convert it into a Microsoft Word document.

Important: This tool does not push content into GitHub. The step to actually publish content requires a GitHub account connected to the https://github.com/MicrosoftDocs organization. This tool only converts documents into the Markdown and YAML files necessary for publishing.

Project structure

The GitHub repo has several related projects:

| Project | Description |
| --- | --- |
| ConvertLearnToDoc | Blazor Server version of the conversion tool. This is the most current version and includes authentication through either MSA or GitHub and logging through Azure Log Analytics. |
| ConvertAll | A CLI tool that walks a local clone of a MicrosoftDocs GitHub repository and creates Word docs from each located training module. |
| ConvertDocx | A CLI tool to convert a single Learn module or Docs page to a Word doc, or vice versa. It can take a URL, GitHub details, or a local folder/file. |

In addition, the projects above use five libraries.

| Library project | Description |
| --- | --- |
| Docx.Renderer.Markdown | A library to convert a .docx file to Markdown. |
| GenMarkdown.DocFX.Extensions | A library of GenMarkdown extensions to render DocFX extensions. |
| LearnDocUtils | The main conversion library. |
| Markdig.Renderer.Docx | A Markdig library to read a Markdig document and turn it into a .docx file. |
| DocsToMarkdown | A library that converts a Learn URL into Markdown. This is used to convert from Learn to Word or Markdown in the web tool. |

Project dependencies

The project also depends on several NuGet packages:

| Package | Description |
| --- | --- |
| DxPlus | A library to read/write .docx files. |
| GenMarkdown | A library to generate Markdown content. |
| MSLearnRepos | A .NET library to work with GitHub and the Learn repo structure. |
| Markdig | A CommonMark Markdown parsing library for .NET. |
| Microsoft.DocAsCode.MarkdigEngine.Extensions | Extensions for Markdig and DocFX. |

Converting a Learn module to a Word document

To try out the tools locally, clone the repository and navigate to the src\ConvertDocx project folder. Running the tool with no parameters will list the options:

Input File            Required. Input file or folder.
Output FIle           Required. Output file or folder.
-s, --singlePage      Output should be a single page (Markdown file).
-g, --Organization    GitHub organization
-r, --Repo            GitHub repo
-b, --Branch          GitHub branch, defaults to 'live'
-t, --Token           GitHub access token
-d, --Debug           Debug output, save temp files
-p, --Pivot           Zone pivot to render to doc, defaults to all
-z, --zipOutput       Zip output folder, defaults to false
-n, --Notebook        Convert notebooks into document, only used on MS Learn content
-f, --OutputFormat    The output format when the input is a URL. Valid values are [Markdown, Docx], defaults to Docx.
--help                Display help.
--version             Display version information.

| Option | Description |
| --- | --- |
| First parameter | Specifies a local Learn module folder or docs Markdown page, a URL to a Learn module/docs conceptual page, or a local .docx file. |
| Second parameter | Specifies a local folder or file to output a Learn module/docs page to, or a .docx filename. |
| -g | GitHub organization to get content from. This allows a fork of MicrosoftDocs to be used and requires a token. |
| -r | Repository to pull content from. If provided, the input parameter should be a folder in this repo. This requires a token. |
| -b | Branch to pull content from, for use when the content is not public; defaults to 'live'. If provided, the input parameter should be a folder in this repo. This requires a token. |
| -t | GitHub token. If supplied, the token must have access to the specified repository, and the tool will fetch the Markdown from there. |
| -d | Debug. Keeps all intermediary files. |
| -p | Zone pivot to render when going from Learn to a .docx. If not supplied, all pivots are rendered. |
| -n | If supplied, any notebooks in the module will be rendered in place. |
| -f | Output format when the input is a URL. Valid values are [Markdown, Docx]; defaults to Docx. |
| -s | Renders to a single page. This is only necessary if the input is a Word doc and the output filename does not indicate it should be a Markdown file. |
| -z | If supplied when converting from Learn to .docx, zips the generated folder. |
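
As an illustration, the documented options could be combined as in the sketch below. It assumes the tool is started with dotnet run from the src/ConvertDocx folder. The convert_docx helper, the DRY_RUN variable, the module URL, and the file names are hypothetical placeholders and are not part of the repository.

```shell
# Hypothetical helper, not part of the repository. It prints the command by
# default; set DRY_RUN=0 to actually run it from the src/ConvertDocx folder.
convert_docx() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "dotnet run -- $*"
  else
    # "--" makes dotnet forward the remaining arguments to the tool.
    dotnet run -- "$@"
  fi
}

# Learn module URL (placeholder) to a Word document.
convert_docx "https://learn.microsoft.com/training/modules/example-module/" example-module.docx

# Same placeholder module, written out as Markdown instead of a .docx file.
convert_docx "https://learn.microsoft.com/training/modules/example-module/" example-module -f Markdown

# Word document to a single-page article; the .md name implies a single page.
convert_docx article.docx article.md
```

Printing the command first makes it easy to check the arguments before a real conversion is started.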

Running the web version

The Blazor server version of the app consists of a Blazor WebAssembly client with an ASP.NET Web API backend host. You can run the host and client together by starting the server application and then pointing a web browser at https://localhost:5001/.

cd src/ConvertLearnToDoc
dotnet run .

[Screenshot: main page of the web app]

The Blazor app offers four options:

  1. Word to Training - convert a Word .docx file to a Microsoft Learn training module (YAML and Markdown).
  2. Word to Article - convert a Word .docx file to a Microsoft Learn single-page conceptual article (Markdown).
  3. Learn to Word - convert a Microsoft Learn article or training module to a Word .docx file.
  4. Learn to Markdown - convert a Microsoft Learn article or training module to Markdown/YAML.

For options 1 and 2, you can edit the metadata. The app extracts the metadata from the Word document, lets you edit it, and passes the edited metadata back when the document is converted.

Restrictions

The Blazor app limits the Word .docx file to 1 GB. The limit is defined as a constant in ArticleOrModuleRef.cs:

private const int MAX_FILE_SIZE = 1024 * 1024 * 1024; // 1gb max size

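You can change this value to allow larger files. Because the constant is an int, a limit of 2 GB or more does not fit and also requires changing its type (for example, to long).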
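
Before uploading, a document's size can be compared with this limit. The following is a minimal sketch, assuming the limit is inclusive and counted in bytes. The function name fits_upload_limit and the file name module.docx are hypothetical.

```shell
# Same value as MAX_FILE_SIZE above: 1024 * 1024 * 1024 = 1073741824 bytes.
MAX_FILE_SIZE=$((1024 * 1024 * 1024))

# Hypothetical helper: succeeds when the file exists and is within the limit.
fits_upload_limit() {
  [ -f "$1" ] || return 2
  # wc -c counts bytes; the arithmetic expansion strips any padding.
  size=$(( $(wc -c < "$1") ))
  [ "$size" -le "$MAX_FILE_SIZE" ]
}

if fits_upload_limit module.docx; then
  echo "module.docx is within the upload limit"
else
  echo "module.docx is missing or larger than $MAX_FILE_SIZE bytes"
fi
```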

Copyright (C) 2024 julmar.com