Parse Code from Markdown Files

Screenshot of Pester test output highlighting the use of aliases within markdown documentation code blocks

Introduction

Are you testing your documentation? If you write PowerShell scripts or modules, you are hopefully using Pester to test your code. And if you use PlatyPS to generate markdown documentation like I do, then you have a bunch of example PowerShell code sitting in .md files. But what happens if you rename a command, a parameter, or make a breaking change?

Your documentation is the face of your product. It's the source of truth for the people who use it, whether it's a PowerShell module or something entirely unrelated. When your examples contain errors, it won't be obvious to everyone. Some people may copy and paste an example, see an error, and move on. Others may see aliases and other coding patterns that are generally discouraged in source code and documentation and pick up those habits, or they may become unsure about the overall quality of the product behind the documentation.

The MilestonePSTools PowerShell module I work on has 413 markdown files under the docs folder, and 394 of those files were generated by PlatyPS for commands in the module (in English and Spanish). I have a bunch of tests for the module itself, but until today I was not testing any examples or other PowerShell code blocks found in the documentation.

Oh aliases...

My PowerShell journey started in 2019 when I began building a module. I was learning PowerShell at the same time I was building what would become a commercially used module, and learning best practices and common patterns from the community. One important best practice I failed to learn early on was to use a prefix for the nouns in command names to prevent collisions with commands from other modules. So after a while, I started to add a "Vms" prefix to the commands in the module. Each time I renamed a command, I added an alias matching the old name to the new command to help prevent breaking changes.

The Pester test screenshot at the top of this post shows that there are some old pre-prefix commands still in use. At the time the documentation was written, these weren't aliases at all. But they are now, and people reading this documentation might be confused about the command names, or they may just naturally start using the alias version of those commands because it's in the documentation so it must be right!

Screenshot of PowerShell documentation on a website where the aliases Get-Token and Get-RecordingServer are used.
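
For readers who haven't used this rename-and-alias pattern, here is a minimal sketch of how an old command name can stay alive after a rename. The names Get-VmsExample and Get-Example are hypothetical and are not taken from MilestonePSTools.

```powershell
# Hypothetical example: 'Get-Example' was renamed to 'Get-VmsExample'.
function Get-VmsExample {
    [CmdletBinding()]
    [Alias('Get-Example')]   # the old, pre-prefix name keeps working as an alias
    param()

    'Hello from Get-VmsExample'
}

# Both names run the same function...
Get-VmsExample
Get-Example

# ...but Get-Command shows that the old name is only an alias for the new one.
Get-Command -Name Get-Example | Select-Object CommandType, Name, ResolvedCommandName
```

Old scripts keep working, which is the point, but it also means nothing forces old documentation examples to be updated.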

Demonstration

Let's take a look at an excerpt of the docs from another command, this time in markdown format. In the first example for Update-Bookmark, I used the "%" alias in place of ForEach-Object. To be fair, I wanted to keep the example line from being too long. But I know there are better strategies to achieve that.

Update-Bookmark.md
# Update-Bookmark

## SYNOPSIS

Updates the properties of a bookmark

## SYNTAX

```
Update-Bookmark -Bookmark <Bookmark> [<CommonParameters>]
```

## DESCRIPTION

The `Update-Bookmark` command updates a bookmark in the VMS by pushing changes
to the bookmark object up to the Management Server.

The expected workflow is that a bookmark is retrieved using Get-Bookmark.
Then properties of the local bookmark object are changed as desired.
Finally the modified local bookmark object is used to update the record on the Management Server by piping it to this cmdlet.

REQUIREMENTS

- Requires VMS connection and will attempt to connect automatically

## EXAMPLES

### EXAMPLE 1

```powershell
Get-Bookmark -Timestamp '2019-06-04 14:00:00' -Minutes 120 | % { $_.Description = 'Testing'; $_ | Update-Bookmark }
```

Gets all bookmarks for any device where the bookmark time is between 2PM and 4PM local time on the 4th of June, changes the Description to 'Testing', and sends the updated bookmark to the Management Server.

The Get-MdCodeBlock command (full source in the Code section below) uses regular expressions to determine whether a line represents the beginning or end of a code fence, and whether inline code is present in that line. If a language shortcode is used, it is captured and returned with each code block. For the markdown example above, the output looks like this:

Get-MdCodeBlock -Path .\Update-Bookmark.md | Select-Object Source, LineNumber, Position, Inline, Language | Format-Table

# Source                  LineNumber Position Inline Language
# ------                  ---------- -------- ------ --------
# Update-Bookmark.md               9        0  False
# Update-Bookmark.md              15        4   True
# Update-Bookmark.md              30        0  False powershell

For brevity I didn't include the Content property in the example output above, but you can probably see the value in checking all of the example code you wrote years ago and never looked at again, despite the code base seeing dramatic changes and growth over time.
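
To make the line classification more concrete, here is a small standalone sketch of the two regular expressions at work. The inline pattern is the one used in Get-MdCodeBlock. The fence pattern is written with a `{3}` quantifier instead of three literal backticks so it can sit inside this fenced block, and it should match the same lines. The sample lines are made up for illustration.

```powershell
# Patterns based on Get-MdCodeBlock; the sample lines below are invented.
$fencePattern  = '^\s*`{3}(?<lang>\w+)?'
$inlinePattern = '(?<!`)`(#!(?<lang>\w+) )?(?<code>[^`]+)`(?!`)'

$sampleLines = @(
    (('`' * 3) + 'powershell')                         # an opening fence with a language shortcode
    'Some text with `Get-Bookmark` inline.'            # plain inline code
    'Highlighted: `#!powershell Get-ChildItem *.md`'   # InlineHilite-style shebang
)

$results = foreach ($line in $sampleLines) {
    if ($line -match $fencePattern) {
        # $Matches.lang is only populated when a shortcode follows the fence
        [pscustomobject]@{ Kind = 'Fence'; Language = $Matches.lang; Code = $null; Position = 0 }
        continue
    }
    foreach ($m in [regex]::Matches($line, $inlinePattern)) {
        [pscustomobject]@{
            Kind     = 'Inline'
            Language = $m.Groups['lang'].Value
            Code     = $m.Groups['code'].Value
            Position = $m.Index   # zero-based offset of the opening backtick
        }
    }
}

$results | Format-Table
```

The Position value is the zero-based offset of the opening backtick, which is why the inline `Update-Bookmark` in the sample document is reported at position 4.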

Sample Pester Test

Here's a basic Pester test which uses Get-MdCodeBlock to extract the PowerShell code blocks from the markdown files in the current directory and pass the content of each one to Invoke-ScriptAnalyzer.

markdown.tests.ps1
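Describe 'Markdown Tests' {
    Context 'PowerShell Code Blocks are Valid' {
        BeforeDiscovery {
            # Dot-source the function and collect the code blocks during discovery
            # so that Pester can generate one test per block with -ForEach.
            . $PSScriptRoot\Get-MdCodeBlock.ps1
            $script:codeBlocks = Get-ChildItem '*.md' | Get-MdCodeBlock -Language powershell
        }

        # <_> is replaced with the current CodeBlock, whose ToString() returns 'Source:LineNumber:Language'
        It 'Analyze codeblock at <_>' -ForEach $script:codeBlocks {
            $analysis = Invoke-ScriptAnalyzer -ScriptDefinition $_.Content -Settings PSGallery
            # Fail on anything at Warning severity or higher, and surface the findings in the failure message
            $analysis | Where-Object Severity -ge 'Warning' | Out-String | Should -BeNullOrEmpty
        }
    }
}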
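
To see what a single finding looks like outside of Pester, the example line from Update-Bookmark.md can be passed to the analyzer directly. This is a sketch that assumes the PSScriptAnalyzer module is installed. Limiting the run to the PSAvoidUsingCmdletAliases rule is my choice for keeping the output focused and is not part of the test above.

```powershell
# Assumes PSScriptAnalyzer is available, e.g. Install-Module PSScriptAnalyzer -Scope CurrentUser
Import-Module PSScriptAnalyzer

# The example line from Update-Bookmark.md, which uses the '%' alias
$example = @'
Get-Bookmark -Timestamp '2019-06-04 14:00:00' -Minutes 120 | % { $_.Description = 'Testing'; $_ | Update-Bookmark }
'@

# Only run the alias rule so that unrelated findings do not clutter the output
$findings = Invoke-ScriptAnalyzer -ScriptDefinition $example -IncludeRule PSAvoidUsingCmdletAliases
$findings | Select-Object RuleName, Severity, Line, Message | Format-List
```

The script is only analyzed, never executed, so Get-Bookmark and Update-Bookmark do not need to be available for this to work.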

I absolutely love having this improved visibility into the health of the documentation. The tests call out the file, line number, and give me the formatted output from PSScriptAnalyzer. And you can get even more creative by using the PowerShell language parser to extract an abstract syntax tree and inspect all code hiding in markdown files for just about anything.

A screenshot of the output from the above Pester tests
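
As one possible take on the abstract syntax tree idea, the sketch below lists every alias used in a piece of code. The `$code` string stands in for the Content property of a CodeBlock and is made up for illustration. The lookup goes through `$ExecutionContext.InvokeCommand.GetCommand`, which only knows about aliases in the current session, so aliases exported by a module would only be found once that module is imported.

```powershell
# Stand-in for the Content of a CodeBlock returned by Get-MdCodeBlock (invented sample)
$code = 'Get-ChildItem *.md | % { $_.Name } | select -First 1'

$tokens = $errors = $null
$ast = [System.Management.Automation.Language.Parser]::ParseInput($code, [ref]$tokens, [ref]$errors)

# Find every command invocation; $true also searches nested script blocks
$commands = $ast.FindAll({
    param($node)
    $node -is [System.Management.Automation.Language.CommandAst]
}, $true)

$aliasesUsed = foreach ($command in $commands) {
    $name = $command.GetCommandName()
    if (-not $name) { continue }   # e.g. commands invoked through a variable or expression

    # Look the name up as an alias only; returns $null when it is not an alias
    $alias = $ExecutionContext.InvokeCommand.GetCommand($name, [System.Management.Automation.CommandTypes]::Alias)
    if ($alias) {
        [pscustomobject]@{
            Alias      = $name
            Definition = $alias.Definition
            Line       = $command.Extent.StartLineNumber
        }
    }
}

$aliasesUsed | Format-Table
```

Adding the CodeBlock's LineNumber to the Line value would give an approximate location in the original markdown file.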

Code

using namespace System.Text
using namespace System.Text.RegularExpressions

enum MdState {
    Undefined
    InCodeBlock
}

class CodeBlock {
    [string] $Source
    [string] $Language
    [string] $Content
    [int]    $LineNumber
    [int]    $Position
    [bool]   $Inline

    [string] ToString() {
        return '{0}:{1}:{2}' -f $this.Source, $this.LineNumber, $this.Language
    }
}

function Get-MdCodeBlock {
    <#
    .SYNOPSIS
    Gets code from inline code and fenced code blocks in markdown files.

    .DESCRIPTION
    Gets code from inline code and fenced code blocks in markdown files with
    support for simple PyMdown Snippets syntax, and the PyMdown InlineHilite
    extension which allows you to use a "shebang" like `#!powershell Get-ChildItem *.md -Recurse | Get-MdCodeBlock`.

    .PARAMETER Path
    Specifies the path to the markdown file from which to extract code blocks.

    .PARAMETER BasePath
    Specifies the base path to use when resolving relative file paths for the CodeBlock object's Source property.

    .PARAMETER Language
    Specifies that only the codeblocks with the named language shortcode should be returned.

    .EXAMPLE
    Get-ChildItem -Path .\*.md -Recurse | Get-MdCodeBlock

    Gets information about inline and fenced code from all .md files in the current directory and any subdirectories
    recursively.

    .EXAMPLE
    Get-MdCodeBlock -Path docs\*.md -BasePath docs\

    Gets information about inline and fenced code from all .md files in the "docs" subdirectory. The Source property
    on each CodeBlock object returned will be relative to the docs subdirectory.

    .EXAMPLE
    Get-MDCodeBlock -Path docs\*.md -BasePath docs\ -Language powershell | ForEach-Object {
        Invoke-ScriptAnalyzer -ScriptDefinition $_.Content
    }

    Gets all inline and fenced PowerShell code from all .md files in the docs\ directory, and runs each of them through
    PSScriptAnalyzer using `Invoke-ScriptAnalyzer`.

    .EXAMPLE
    Get-ChildItem -Path *.md -Recurse | Get-MdCodeBlock | Where-Object Language -eq 'powershell' | ForEach-Object {
        $tokens = $errors = $null
        $ast = [management.automation.language.parser]::ParseInput($_.Content, [ref]$tokens, [ref]$errors)
        [pscustomobject]@{
            CodeBlock = $_
            Tokens    = $tokens
            Errors    = $errors
            Ast       = $ast
        }
    }

    Gets all inline and fenced powershell code from all markdown files in the current directory and all subdirectories,
    and runs them through the PowerShell language parser to return a PSCustomObject with the original CodeBlock, and the
    tokens, errors, and Abstract Syntax Tree returned by the language parser. You might use this to locate errors in
    your documentation, or find very specific elements of PowerShell code.

    .NOTES
    [Pymdown Snippets extension](https://facelessuser.github.io/pymdown-extensions/extensions/snippets/)
    [Pymdown InlineHilite extension](https://facelessuser.github.io/pymdown-extensions/extensions/inlinehilite/)
    #>
    [CmdletBinding()]
    [OutputType([CodeBlock])]
    param (
        [Parameter(Mandatory, ValueFromPipeline, Position = 0)]
        [string[]]
        [SupportsWildcards()]
        $Path,

        [Parameter()]
        [string]
        $BasePath = '.',

        [Parameter()]
        [string]
        $Language
    )

    process {
        foreach ($unresolved in $Path) {
            foreach ($file in (Resolve-Path -Path $unresolved).Path) {
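                # Resolve the base path, then strip it (plus one path separator) from the
                # front of the file path. Trimming first means a trailing separator on
                # -BasePath, such as 'docs\', does not prevent the match.
                $BasePath = (Resolve-Path -Path $BasePath).Path
                $escapedRoot = [regex]::Escape($BasePath.TrimEnd('\', '/'))
                $relativePath = $file -replace "^${escapedRoot}[\\/]", ''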


                # This section imports files referenced by PyMdown snippet syntax
                # Example: --8<-- "abbreviations.md"
                # Note: This function only supports very basic snippet syntax.
                # See https://facelessuser.github.io/pymdown-extensions/extensions/snippets/ for documentation on the Snippets PyMdown extension
                $lines = [io.file]::ReadAllLines($file, [encoding]::UTF8) | ForEach-Object {
                    if ($_ -match '--8<-- "(?<file>[^"]+)"') {
                        $snippetPath = Join-Path -Path $BasePath -ChildPath $Matches.file
                        if (Test-Path -Path $snippetPath) {
                            Get-Content -Path $snippetPath
                        } else {
                            Write-Warning "Snippet not found: $snippetPath"
                        }
                    } else {
                        $_
                    }
                }


                $lineNumber = 0
                $code = $null
                $state = [MdState]::Undefined
                $content = [stringbuilder]::new()

                foreach ($line in $lines) {
                    $lineNumber++
                    switch ($state) {
                        'Undefined' {
                            if ($line -match '^\s*```(?<lang>\w+)?' -and ([string]::IsNullOrWhiteSpace($Language) -or $Matches.lang -eq $Language)) {
                                $state = [MdState]::InCodeBlock
                                $code = [CodeBlock]@{
                                    Source     = $relativePath
                                    Language   = $Matches.lang
                                    LineNumber = $lineNumber
                                }
                            } elseif (($inlineMatches = [regex]::Matches($line, '(?<!`)`(#!(?<lang>\w+) )?(?<code>[^`]+)`(?!`)')).Count -gt 0) {
                                foreach ($inlineMatch in $inlineMatches) {
                                    # Read the named groups through the indexer; 'lang' is empty when no shebang is present
                                    $inlineLanguage = $inlineMatch.Groups['lang'].Value
                                    if (-not [string]::IsNullOrWhiteSpace($Language) -and $inlineLanguage -ne $Language) {
                                        # Skip inline code that does not match the requested language
                                        continue
                                    }
                                    [CodeBlock]@{
                                        Source     = $relativePath
                                        Language   = $inlineLanguage
                                        Content    = $inlineMatch.Groups['code'].Value
                                        LineNumber = $lineNumber
                                        Position   = $inlineMatch.Index
                                        Inline     = $true
                                    }
                                }
                            }
                        }

                        'InCodeBlock' {
                            if ($line -match '^\s*```') {
                                $state = [MdState]::Undefined
                                $code.Content = $content.ToString()
                                $code
                                $code = $null
                                $null = $content.Clear()
                            } else {
                                $null = $content.AppendLine($line)
                            }
                        }
                    }
                }
            }
        }
    }
}