Deferred profile loading for better performance

Updated 2023-11-25: the initial code sample broke argument completers. The sample at the bottom is amended. It needed reflection code… <sigh>

I have pared down my Powershell profile to improve performance, but it does still take the best part of a second to load on my desktop machine.

As a powershell-everywhere person, my profile can be unacceptably slow on less-powerful machines. But I don’t want to lose any functionality.

When I open a shell, I typically want to start running simple commands soonest or sooner.

For a long time, my profile did nothing but dot-source scripts from my chezmoi repo and set the path to my git project root. My plan is to load most of these asynchronously, to reduce the time-to-interactive-prompt while still keeping all the functionality (if I just wait a second).

First approach ❌

Powershell compiles all scriptblocks and caches them, and caches all imported modules. That’s why the second import of a module is much faster.

So my first approach was:

open a runspace and dot-source all the scripts to warm up the compiled scriptblock and module cache
wait for the event of that runspace completing
re-run the script synchronously, with the benefit of the cache

This approach failed, because the action you can attach to an event handler will not run in the default runspace. It’s very challenging to take over the default runspace - I won’t say impossible, but it’s certainly designed to be impossible. And I’m not willing to churn out a huge pile of janky reflection code. So the only way would be to Wait-Event for the runspace, which blocks the current runspace… which defeats the point.

Second approach 🚀

Like so many times before, I found a pointer from SeeminglyScience - specifically, a way to pass global session state around as a variable.

Briefly: when you are at an interactive prompt and you import a module or function, you are adding it to the global SessionState. That is what I need to do to load my profile, but can’t normally do in an event handler or runspace.

So, I capture the global state:

$GlobalState = [psmoduleinfo]::new($false)
$GlobalState.SessionState = $ExecutionContext.SessionState

Then I start a runspace and pass in the global state:

Start-ThreadJob pulls in the ThreadJob module, but it is very fast to import

$Job = Start-ThreadJob -Name TestJob -ArgumentList $GlobalState -ScriptBlock {
    $GlobalState = $args[0]
}

In the runspace, I dot-source my profile in the context of the global state:

$Job = Start-ThreadJob -Name TestJob -ArgumentList $GlobalState -ScriptBlock {
    $GlobalState = $args[0]
    . $GlobalState {
        . "$HOME/.local/share/chezmoi/Console.ps1"
        . "$HOME/.local/share/chezmoi/git_helpers.ps1"
    }
}

…and bingo, when the job completes, the functions, aliases, variables and modules from my dot-sourced scripts are all available!

Even better: the functionality is incrementally available, so I can put rarely-used stuff towards the tail and still have my git helpers imported in milliseconds.

There is a bug, though. As written, the job always errors with, typically, “Unable to find command Import-Module” or similar. You see this error when you call Receive-Job. This only happens when starting a new shell, not when testing in a warm shell, so I suspect it’s related to Powershell’s initialisation sequence. A wait fixes the issue:

do {Start-Sleep -Milliseconds 200} until (Get-Command Import-Module -ErrorAction Ignore)

That wait needs to go inside the scriptblock that’s running in the global session state, and it needs to be a do-until loop, which always runs the body once before checking the condition - I could not get it to work without an initial wait.
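As a minimal sketch of the same wait with a safety net: the version below adds a deadline, so that a command which never appears cannot stall the deferred block forever. The 5-second deadline and the $Attempts counter are assumptions for illustration, not something from my profile.

```powershell
# Hypothetical deadline; tune or remove to taste
$Deadline = [datetime]::UtcNow.AddSeconds(5)
$Attempts = 0

do
{
    # The body runs at least once, so there is always an initial 200ms wait
    $Attempts++
    Start-Sleep -Milliseconds 200
}
until ((Get-Command Import-Module -ErrorAction Ignore) -or ([datetime]::UtcNow -gt $Deadline))
```

In a warm shell, Import-Module is found straight away, so the loop exits after a single pass.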

To complete the implementation, I promote console configuration to the body of the profile script, start the job, and add an event handler to read any errors and clean up.

My profile has gone from ~990ms down to ~210ms, and 100ms of that is my starship init (which I could optimise further), so I call this a win. The asynchronous stuff is available within a second, maybe two. To test:

pwsh -NoProfile
Measure-Command {. $PROFILE.CurrentUserAllHosts}
exit

Full sample of a minimal solution

This breaks with VS Code shell integration. To disable it, set “terminal.integrated.shellIntegration.enabled” to “false” in your settings.

The full profile script:

<#
    ...cd to my preferred PWD
    ...set up my prompt
    ...run code that doesn't work in the deferred scriptblock
          (e.g. setting [console]::OutputEncoding)
#>


$Deferred = {
    . "/home/freddie/.local/share/chezmoi/PSHelpers/Console.ps1"
    . "/home/freddie/.local/share/chezmoi/PSHelpers/git_helpers.ps1"
    # ...other slow code...
}


# https://seeminglyscience.github.io/powershell/2017/09/30/invocation-operators-states-and-scopes
$GlobalState = [psmoduleinfo]::new($false)
$GlobalState.SessionState = $ExecutionContext.SessionState

# to run our code asynchronously
$Runspace = [runspacefactory]::CreateRunspace($Host)
$Powershell = [powershell]::Create($Runspace)
$Runspace.Open()
$Runspace.SessionStateProxy.PSVariable.Set('GlobalState', $GlobalState)

# ArgumentCompleters are set on the ExecutionContext, not the SessionState
# Note that $ExecutionContext is not an ExecutionContext, it's an EngineIntrinsics 😡
$Private = [Reflection.BindingFlags]'Instance, NonPublic'
$ContextField = [Management.Automation.EngineIntrinsics].GetField('_context', $Private)
$Context = $ContextField.GetValue($ExecutionContext)

# Get the ArgumentCompleters. If null, initialise them.
$ContextCACProperty = $Context.GetType().GetProperty('CustomArgumentCompleters', $Private)
$ContextNACProperty = $Context.GetType().GetProperty('NativeArgumentCompleters', $Private)
$CAC = $ContextCACProperty.GetValue($Context)
$NAC = $ContextNACProperty.GetValue($Context)
if ($null -eq $CAC)
{
    $CAC = [Collections.Generic.Dictionary[string, scriptblock]]::new()
    $ContextCACProperty.SetValue($Context, $CAC)
}
if ($null -eq $NAC)
{
    $NAC = [Collections.Generic.Dictionary[string, scriptblock]]::new()
    $ContextNACProperty.SetValue($Context, $NAC)
}

# Get the AutomationEngine and ExecutionContext of the runspace
$RSEngineField = $Runspace.GetType().GetField('_engine', $Private)
$RSEngine = $RSEngineField.GetValue($Runspace)
$EngineContextField = $RSEngine.GetType().GetFields($Private) | Where-Object {$_.FieldType.Name -eq 'ExecutionContext'}
$RSContext = $EngineContextField.GetValue($RSEngine)

# Set the runspace to use the global ArgumentCompleters
$ContextCACProperty.SetValue($RSContext, $CAC)
$ContextNACProperty.SetValue($RSContext, $NAC)

$Wrapper = {
    # Without a sleep, you get issues:
    #   - occasional crashes
    #   - prompt not rendered
    #   - no highlighting
    # Assumption: this is related to PSReadLine.
    # 20ms seems to be enough on my machine, but let's be generous - this is non-blocking
    Start-Sleep -Milliseconds 200

    . $GlobalState {. $Deferred; Remove-Variable Deferred}
}

$null = $Powershell.AddScript($Wrapper.ToString()).BeginInvoke()
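Because the result of BeginInvoke() is discarded, any errors from the deferred code are silent. As a sketch of how they could be surfaced: a [powershell] instance collects non-terminating errors in its Streams.Error collection. The example below uses a throwaway runspace and a made-up script, and it calls EndInvoke() to wait for completion, which a profile should not do because it blocks.

```powershell
$DemoRunspace = [runspacefactory]::CreateRunspace()
$DemoRunspace.Open()
$DemoPowershell = [powershell]::Create()
$DemoPowershell.Runspace = $DemoRunspace

# Made-up script: writes one non-terminating error, then outputs a value
$Handle = $DemoPowershell.AddScript('Write-Error "boom"; 42').BeginInvoke()

# EndInvoke blocks until the script finishes; fine for a demo, not for a profile
$DemoOutput = @($DemoPowershell.EndInvoke($Handle))
$DemoErrors = @($DemoPowershell.Streams.Error)

$DemoOutput[0]
$DemoErrors | ForEach-Object {$_.Exception.Message}

$DemoPowershell.Dispose()
$DemoRunspace.Dispose()
```

In the profile itself, the same Streams.Error collection could be read later, once the deferred code has finished.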

Timings

Note that it takes a few milliseconds to parse and start executing the profile. I need more than 74ms to get to a blinking cursor.

Time (s)	Waypoint
00.000	Just before invoking the asynchronous code
00.074	At the bottom of the profile; shell is interactive
00.275	Starting the deferred code, after the 200ms sleep (is it a PSReadLine issue?)
00.802	Completed the deferred code; all functions and modules available

Using reflection to get round PSv2 lack of PSSerializer class

I had some code that used PSSerializer to serialize objects into XML. This blew up when run on PSv2, because PSv2 doesn’t expose the System.Management.Automation.PSSerializer class - leaving me looking at an annoying refactor to use Export-Clixml everywhere, which only ever writes to the filesystem.

I thought I’d have a look at what PSSerializer does under the hood, so I opened the source code for Powershell - which you can clone from Github - and searched for it.

I found it in serialization.cs. I’m really only interested in the Serialize and Deserialize methods and, fortunately, they turn out to be quite simple. Here’s the method declaration for Serialize:

public static string Serialize(Object source, int depth)
{
    // Create an xml writer
    StringBuilder sb = new StringBuilder();
    XmlWriterSettings xmlSettings = new XmlWriterSettings();
    xmlSettings.CloseOutput = true;
    xmlSettings.Encoding = System.Text.Encoding.Unicode;
    xmlSettings.Indent = true;
    xmlSettings.OmitXmlDeclaration = true;
    XmlWriter xw = XmlWriter.Create(sb, xmlSettings);

    // Serialize the objects
    Serializer serializer = new Serializer(xw, depth, true);
    serializer.Serialize(source);
    serializer.Done();
    serializer = null;

    // Return the output
    return sb.ToString();
}

Pretty simple, right… if I can also use those other classes. Well, StringBuilder and XmlWriterSettings are public, and I can find them even in PSv2, but Serializer is declared as an internal class, so I just get an error:

[System.Management.Automation.Serializer]
Unable to find type [System.Management.Automation.Serializer].
At line:1 char:1
+ [System.Management.Automation.Serializer]
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (System.Management.Automation.Serializer:TypeName) [], RuntimeException
    + FullyQualifiedErrorId : TypeNotFound

If I can access this class, then I can brew up a sort-of monkeypatch of PSSerializer’s Serialize() and Done() methods. This is where reflection comes in.

First, we use the assembly containing the class to get the type:

# System.Management.Automation
# Quickest way to get the right assembly
# is from a type in the same assembly
$SmaAssembly = [powershell].Assembly
$Type = $SmaAssembly.GetType('System.Management.Automation.Serializer')

$Type

IsPublic IsSerial Name           BaseType
-------- -------- ----           --------
False    False    Serializer     System.Object

This is a RuntimeType, just like the output from calling GetType() with no arguments on any object, except that it would be otherwise inaccessible (because it was declared internal rather than public).
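To see the two lookups side by side, here is a small sketch. It assumes the internal Serializer class is present in the running version of System.Management.Automation; internal types can move between releases.

```powershell
$TypeName = 'System.Management.Automation.Serializer'

# Type literals and -as [type] only resolve public types
$ViaLanguage = $TypeName -as [type]

# Assembly.GetType() also returns non-public types
$ViaAssembly = [powershell].Assembly.GetType($TypeName)

$null -eq $ViaLanguage      # expected to be True, matching the TypeNotFound error above
$ViaAssembly.IsNotPublic    # True if the type exists in this version
```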

Next we get a constructor (note that ‘.ctor’ is a common abbreviation for ‘constructor’):

$Ctor = $Type.GetConstructors('Instance, NonPublic') |
    Where-Object {$_.GetParameters().Count -eq 3}

I have Where-Object {$_.GetParameters().Count -eq 3} because Serializer has three constructors, and I want the one that matches the signature of the one used in the PSv3+ declaration of the PSSerializer class, which is new Serializer(xw, depth, true) in the C# source code.

The GetConstructors method takes an argument of type System.Reflection.BindingFlags. That is an enum. These are the values of the enum:

[Enum]::GetValues([System.Reflection.BindingFlags])

Default
IgnoreCase
DeclaredOnly
Instance
Static
Public
NonPublic
  ... etc ...

As the name suggests, this is a flag-type enum, which means that you can combine the values. This is usually the case wherever you see options that have power-of-two values, like 0x400, 0x800 etc etc. You bitwise-combine these to send the combination of options that you want - so 0x400 and 0x800 would be 0xC00. We want the Instance and the NonPublic options. In Powershell, the long way to write this out would be:

[System.Reflection.BindingFlags]::Instance -bor [System.Reflection.BindingFlags]::NonPublic

Fortunately, a single string containing comma-separated enum names will be combined, so we can just pass in 'Instance, NonPublic' to get the same effect.
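A quick check that the two spellings really are equivalent. The variable names are just for illustration:

```powershell
$Long  = [System.Reflection.BindingFlags]::Instance -bor [System.Reflection.BindingFlags]::NonPublic
$Short = [System.Reflection.BindingFlags]'Instance, NonPublic'

$Long -eq $Short    # True
[int]$Short         # 36, i.e. Instance (4) + NonPublic (32)
$Short.HasFlag([System.Reflection.BindingFlags]::NonPublic)    # True
```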

To get back from our digression, we now have the constructor that we want and can invoke it:

# Constructor params
$Depth = 10    # like -Depth in ConvertTo-Json
$OutputBuilder = New-Object System.Text.StringBuilder   # ::new() is not available in PSv2
$XmlWriter = [System.Xml.XmlWriter]::Create($OutputBuilder)

$Serializer = $Ctor.Invoke(@($XmlWriter, $Depth, $true))

We’re not done with reflection, unfortunately. To use this object, we need to call Serialize followed by Done. And those methods are also nonpublic. So we need to grab those:

$Methods = $Type.GetMethods('Instance, NonPublic')
$SerializeMethod = $Methods | Where-Object {$_.Name -eq 'Serialize' -and $_.GetParameters().Count -eq 1}
$DoneMethod = $Methods | Where-Object {$_.Name -eq 'Done'}

Now we can Do The Thing:

$DataToSerialize = "Foo"

$SerializeMethod.Invoke($Serializer, @($DataToSerialize))   # single param for .Serialize(data)
$DoneMethod.Invoke($Serializer, @())                        # empty list of params for .Done()

return $OutputBuilder.ToString()
# <?xml version="1.0" encoding="utf-16"?>
# <Objs Version="1.1.0.1" xmlns="http://schemas.microsoft.com/powershell/2004/04">
#     <S>Foo</S>
# </Objs>
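Pulling the steps together, here is a sketch of a reusable function. The name ConvertTo-CliXmlString and the default depth of 2 are my own choices for illustration, and it leans on the same internal members as the code above, so treat it as a starting point rather than a supported API. The writer settings mirror two of the options from the C# source (Indent and OmitXmlDeclaration).

```powershell
function ConvertTo-CliXmlString
{
    param
    (
        $InputObject,
        [int]$Depth = 2    # hypothetical default
    )

    $Flags = [System.Reflection.BindingFlags]'Instance, NonPublic'
    $Type  = [powershell].Assembly.GetType('System.Management.Automation.Serializer')

    $Ctor            = $Type.GetConstructors($Flags) | Where-Object {$_.GetParameters().Count -eq 3}
    $Methods         = $Type.GetMethods($Flags)
    $SerializeMethod = $Methods | Where-Object {$_.Name -eq 'Serialize' -and $_.GetParameters().Count -eq 1}
    $DoneMethod      = $Methods | Where-Object {$_.Name -eq 'Done'}

    # New-Object rather than ::new(), which PSv2 does not have
    $OutputBuilder = New-Object System.Text.StringBuilder
    $Settings      = New-Object System.Xml.XmlWriterSettings
    $Settings.Indent = $true
    $Settings.OmitXmlDeclaration = $true
    $XmlWriter = [System.Xml.XmlWriter]::Create($OutputBuilder, $Settings)

    $Serializer = $Ctor.Invoke(@($XmlWriter, $Depth, $true))

    # The unary comma keeps an array input as a single argument.
    # Invoke() returns $null for void methods; discard it so that it
    # does not leak into the function's output.
    $null = $SerializeMethod.Invoke($Serializer, (,$InputObject))
    $null = $DoneMethod.Invoke($Serializer, @())

    return $OutputBuilder.ToString()
}

$Xml = ConvertTo-CliXmlString -InputObject 'Foo'
$Xml
```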

Lazy loading for API attributes

Typically, when you implement an API client module, you’ll query the API and output the objects you get back - possibly with some changes, possibly just as they come from the API.

However, some APIs have large surface areas and a lot of attributes that an object might have. You may never need all of them, and you may well only need a couple.

What you want in this scenario is the ability to create the object without having to populate all the attributes at creation time, but where they can be fetched on first access. But you still want them to look and feel like properties, because that’s what your callers expect.

The demo code is in https://github.com/fsackur/LazyLoading. In this demo I’ll use a class called Server, but these techniques will also work for PSCustomObjects. Here’s the start:

class Server
{
    Server ()
    {
        foreach ($Prop in [Server]::_properties)
        {
            $this | Add-Member ScriptProperty -Name $Prop -Value {
                2 + 2
            }
        }
    }


    hidden static [string[]] $_properties = @(
        'Foo',
        'Bar',
        'Baz'
    )

}


$s = [Server]::new()
$s


# Output:

# Foo Bar Baz
# --- --- ---
#   4   4   4

Clearly, we’ve created an object with Foo, Bar and Baz properties. But we cannot add these to the class in the normal way, because we need to have properties that are implemented in code - we need some code to happen when we access a property. So we need to use members of type ScriptProperty, which are accessed like properties but the value is a scriptblock. Powershell classes don’t have a keyword to define these, so we have to add the scriptproperties dynamically in the constructor. When we look at any of the scriptproperties, the 2 + 2 scriptblock runs, giving us the 4s we see in the output.

This gives us a base for code-defined properties. We’re looping over the list of properties and defining a generic code block for each. But when we’re accessing the IpAddress property of a server object and we want it to run code to query for the IP address, we need that generic code to know that it’s running on the IpAddress property and not the SerialNumber property. And a scriptproperty doesn’t have a handy $PSCmdlet or $MyInvocation. So we’re going to use closures (see my post on closures for a quick intro):

class Server
{
    Server ()
    {
        foreach ($Prop in [Server]::_properties)
        {
            $this | Add-Member ScriptProperty -Name $Prop -Value {
                $Prop
            }.GetNewClosure()
        }
    }


    hidden static [string[]] $_properties = @(
        'Foo',
        'Bar',
        'Baz'
    )

}


$s = [Server]::new()
$s


# Output:

# Foo Bar Baz
# --- --- ---
# Foo Bar Baz

We’ve added the GetNewClosure() call when creating the scriptblock, and that seals in the value of $Prop as it was at the time we called that method. Now $Prop contains the property name, and we can see that each property now outputs its own name.

Now let’s complete the sketch with an internal dictionary that holds the properties that have been fetched:

class Server
{
    Server ()
    {
        foreach ($Prop in [Server]::_properties)
        {
            $this | Add-Member ScriptProperty -Name $Prop -Value {

                if (-not $this._dict.ContainsKey($Prop))
                {
                    $this._dict[$Prop] = Get-ServerProperty $Prop -Server $this
                }

                return $this._dict[$Prop]

            }.GetNewClosure()
        }
    }


    hidden static [string[]] $_properties = @(
        'Foo',
        'Bar',
        'Baz'
    )


    hidden [hashtable] $_dict = @{}

}


$s = [Server]::new()
$s


# Output
# (for impl of Get-ServerProperty in the repo)

#       Foo       Bar       Baz
#       ---       ---       ---
# 230636006 145716468 285402623

We’ve implemented an inner dictionary to keep track of the properties we’ve already queried. If any property is accessed and already exists in the dictionary, it’s returned from the dictionary. If it doesn’t exist, it’s queried using the Get-ServerProperty function. (The function would implement your API code.)

If we declare _dict as System.Collections.Generic.Dictionary instead of hashtable, we can use TryGetValue(), which checks for the existence of a key and retrieves the value in one call.
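As a sketch of that pattern outside the class: the $Cache variable, the Get-CachedValue function and the fetch counter are all hypothetical names, and the string assignment stands in for the API call.

```powershell
$Cache = [System.Collections.Generic.Dictionary[string, object]]::new()
$script:FetchCount = 0

function Get-CachedValue
{
    param ([string]$Name)

    $Value = $null
    # TryGetValue checks for the key and retrieves the value in one call
    if (-not $Cache.TryGetValue($Name, [ref]$Value))
    {
        # Stand-in for the API call
        $script:FetchCount++
        $Value = "value of $Name"
        $Cache[$Name] = $Value
    }
    return $Value
}

$First  = Get-CachedValue Foo    # fetches
$Second = Get-CachedValue Foo    # served from the dictionary
$script:FetchCount               # 1
```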

Pre-fetching

You would very likely have a base set of attributes which you always fetch. It would be rare for all attributes to be lazy-loaded. Anything like an ID or an index property would have to be guaranteed to be present because it’s probably going to be required for subsequent API calls. You might fetch these eagerly in the constructor, or you could check on each property access that results in an API call whether the important properties have been fetched yet and include them in the call if not.

If you use, say, 3 attributes, this technique will result in 3 separate API calls, which is inefficient. You would likely need to provide a way for a caller to declare a list of attributes which it knows it will be using and fetch all of those attributes in one call. This might be done with a constructor parameter or by providing a method for the purpose.
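Here is a rough sketch of the method-based option. Get-ServerProperties is a hypothetical stand-in for an API call that can return several attributes at once, and the Prefetch method name is my own; the point is only that the second call asks for the attributes that are still missing.

```powershell
# Hypothetical stand-in for an API call that returns several attributes at once
$script:ApiCalls = 0
function Get-ServerProperties
{
    param ([string[]]$Name)

    $script:ApiCalls++
    $Result = @{}
    foreach ($N in $Name) {$Result[$N] = "value of $N"}
    return $Result
}

class Server
{
    hidden [hashtable] $_dict = @{}

    # Fetch any of the named attributes that are not cached yet, in one call
    [void] Prefetch([string[]]$Names)
    {
        $Missing = [string[]]@(foreach ($N in $Names) {if (-not $this._dict.ContainsKey($N)) {$N}})
        if ($Missing.Count -gt 0)
        {
            $Fetched = Get-ServerProperties -Name $Missing
            foreach ($Key in $Fetched.Keys) {$this._dict[$Key] = $Fetched[$Key]}
        }
    }
}

$s = [Server]::new()
$s.Prefetch(@('Foo', 'Bar'))
$s.Prefetch(@('Bar', 'Baz'))    # only Baz is requested this time
$script:ApiCalls                # 2
```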

Finally, note that the properties were populated merely by outputting the object to the pipeline. When $s is output, the default formatter (and this would be the same if you had explicitly piped to Format-List or what-have-you) looks at the properties in order to display them. This would result in 3 API calls for each object - your Format-List would get very slow!

As a mitigation strategy for this, you would likely define a default view with a named set of properties, and ensure that those attributes are pre-fetched on object creation (or first access). I’d recommend a Format.ps1xml for that task, but you can also dynamically create a DisplayPropertySet in the constructor.
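For the dynamic option, this is a minimal sketch using a PSCustomObject with made-up values. In the Server class, the same Add-Member call would be applied to $this in the constructor.

```powershell
$Server = [pscustomobject]@{Foo = 1; Bar = 2; Baz = 3}

# Only Foo and Bar are shown by default; Baz stays available on request
$DisplaySet = [System.Management.Automation.PSPropertySet]::new(
    'DefaultDisplayPropertySet',
    [string[]]('Foo', 'Bar')
)
$StandardMembers = [System.Management.Automation.PSMemberInfo[]]@($DisplaySet)
$Server | Add-Member -MemberType MemberSet -Name PSStandardMembers -Value $StandardMembers

$Server        # the default view uses the display property set
$Server.Baz    # 3
```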

Closures - scriptblocks with baggage

You may well expect what is going to happen here:

$a = 12

$sb = {$a}

$a = 33

& $sb

# Outputs: 33

When we execute the scriptblock $sb, it references $a but doesn’t declare it within its own body. Therefore, it looks at the parent scope and finds $a. At the time we execute $sb, the value of $a is 33. So the scriptblock outputs 33.

But what if we want to put some values into the scriptblock at creation time?

You could do something like this:

$a = 12

$sb = [scriptblock]::Create(
@"
    `$a = $a
    `$a
"@
)

$a = 33

& $sb

# Outputs: 12

but please don’t. It will be a pain to develop scriptblocks of any complexity, and your linter won’t parse your code. What you need is a closure:

$a = 12

$cl = {$a}.GetNewClosure()

$a = 33

& $cl

# Outputs: 12

So what is a closure? Well, it’s a scriptblock! But it specifically is a scriptblock that executes within the scope where it was created. GetNewClosure() takes the scriptblock that it’s called on, and associates it with the variables defined within whatever scope is active at the time. When we called GetNewClosure(), the value of $a was 12. That variable is captured and bound to the scriptblock.

Closures are found in many languages. They are very common in Javascript.

It works in function scopes too:

function Get-Closure
{
    param ($a)

    return {$a}.GetNewClosure()
}

$cl = Get-Closure -a 12

& $cl

# Outputs: 12

But watch out, because in Powershell, closure scopes only capture a single level of the enclosing scope, and not all the parent scopes all the way down:

$a = 12

function Get-Closure
{
    return {$a}.GetNewClosure()
}

$cl = Get-Closure

$a = 33

& $cl

# Outputs: 33

If we were to access $a in the function, we’d get 12. But it is not captured in the closure. Only variables declared in the function body are captured by the GetNewClosure method call - it’s not sufficient for a variable to be accessible within the function body. (And for “function body”, read “whatever scope we called GetNewClosure() in”.)
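One way round this, as a sketch: assign the outer variable to a variable of the same name inside the function. That creates a variable in the function's own scope, which is then captured.

```powershell
$a = 12

function Get-Closure
{
    # Assigning to $a creates a variable in the function's own scope,
    # which GetNewClosure() can then capture
    $a = $a
    return {$a}.GetNewClosure()
}

$cl = Get-Closure

$a = 33

& $cl

# Outputs: 12
```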

So, what can you do with it?

You can have a long and storied career without ever touching closures in Powershell. Closures are only really used in Powershell when doing metaprogramming, and metaprogramming is the solution to a very small subset of problems. (‘Metaprogramming’ is one term used for doing LISPy Javascripty stuff where you manipulate code with code.)

In my next blogpost (about lazy loading), I’ll show you a use case. But for now, here’s a quick one. Let’s say you have a filter:

$filter = {$_.Id -eq 'a8124ec8-0e5d-461e-a8f9-3c6359d44397'}

You’d commonly use something like that in a Where-Object statement:

$MyCat = $Cats | Where-Object $filter

Well, you can parameterise that and get a filter that you can pass around that will always find the same cat:

$Id = 'a8124ec8-0e5d-461e-a8f9-3c6359d44397'
$filter = {$_.Id -eq $Id}.GetNewClosure()

Save on “lost cat” posters!
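For the filter to keep finding the same cat after $Id is reused for something else, the scriptblock has to be a closure. A small sketch with invented cats:

```powershell
# Invented sample data
$Cats = @(
    [pscustomobject]@{Name = 'Tibbles'; Id = 'a8124ec8-0e5d-461e-a8f9-3c6359d44397'}
    [pscustomobject]@{Name = 'Mog';     Id = '00000000-0000-0000-0000-000000000000'}
)

$Id = 'a8124ec8-0e5d-461e-a8f9-3c6359d44397'
$filter = {$_.Id -eq $Id}.GetNewClosure()

# Reusing the variable later does not affect the filter
$Id = 'something else entirely'

$MyCat = $Cats | Where-Object $filter
$MyCat.Name

# Outputs: Tibbles
```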

Apart from that, a function is just a named scriptblock that’s registered in the session function table. And a function that’s exported from a module (and can therefore access private functions and variables) is just a closure on the module’s scope. So if you wanted to dynamically export a function from a module, you could create the function body as a closure in the module’s scope and then register it in the session, like this:

# Contents of Module.psm1
function foo
{
    "As usual, foo."
}

Export-ModuleMember @()

$M = Import-Module .\Module.psm1 -PassThru
$M

# Outputs:

# ModuleType Version Name   ExportedCommands
# ---------- ------- ----   ----------------
# Script     0.0     Module

$BarBody = & $M {
    $FooCmd = Get-Command foo
    {
        $FooString = & $FooCmd
        $FooString, 'Bar.' -join ' '
    }.GetNewClosure()
}

The & $M { ... } formulation executes the outer scriptblock in the scope of the module.
In the outer scriptblock, we get the private function foo into a variable, so that it’s available to the closure
In the inner scriptblock, we call foo by referring to that variable
We convert the inner scriptblock into a closure and output it
We store the closure in $BarBody
Last, we register the function in the session using the function:\ PSProvider, as follows:

Set-Item function:\Global:bar $BarBody

bar

# Outputs: "As usual, foo. Bar."

Design Patterns presentation

I gave a talk at the PSUG-UK group on April 10th.

PSUG-UK is a Powershell user group with meetups across England and Scotland. They are a lovely community and you should really check them out when you’re in the UK. Follow them on Twitter, get a Slack invite, or check out the London meetups.

Software Engineering in Powershell

My talk was on the topic of Powershell and Principles of Software Engineering.

This was a bit cheeky of me, since I have no formal background in SE. But I do see the occasional bit of spaghetti code, and I do know something about the challenges that come with software that has grown organically.

In larger applications, software engineers manage this by following a few design principles, and by using design patterns. These things should be of interest to us writing in Powershell, too - not so much if we write only small scripts, but definitely if we create anything over a few hundred lines of code.

The bulk of the talk is a walkthrough of refactoring a project using SE principles. Slides, notes and demo code are on Github, and the full recording is on YouTube.

But why not start a few minutes in, and see two audience members demonstrate the concept of polymorphism using string and animal noises:

Enjoy!

Proxying Out-Default to apply custom formatting

I had the task of updating a CLI module for an API with the following goals:

items should be grouped by type
items representing errors should be displayed inline, yet with Powershell error formatting

The API runs one or more commands on remote systems and returns the command output, which may be a single object or multiple ones.

The CLI, as it stands, adds a PSTypeName to each item that names the command that generated that output. It also saves output into a global variable, so the user can easily retrieve output without having to rerun a command through the API.

Let’s shelve the error presentation for now and concentrate on the output grouping. This is the problem we’re trying to solve:

User runs Get-BootTime and Get-InstalledUpdates on devices 70061c9bb840 and 6a27a4067dc1
$LastOutput now contains four objects:

PS C:\dev> $LastOutput[0,1]

Device       BootTime
------       --------
70061c9bb840 22/02/2019 20:39:55
6a27a4067dc1 12/02/2019 07:29:52


PS C:\dev> $LastOutput[2,3]

Device       Updates
------       -------
70061c9bb840 {KB3001261, KB3120289}
6a27a4067dc1 {KB3120289, KB2600183, KB3100460}

But if we just pass $LastOutput down the pipeline, we lose information:

PS C:\dev> $LastOutput

Device       BootTime
------       --------
70061c9bb840 22/02/2019 20:39:55
6a27a4067dc1 12/02/2019 07:29:52
70061c9bb840
6a27a4067dc1

This is normal Powershell behaviour. When you don’t specify a formatting or output command at the end of a pipeline:

the engine calls the default formatter
the default formatter inspects the first object coming down the pipeline and sends all output to Format-Table if the first object has four visible properties or fewer, or to Format-List if it has five or more properties
Format-Table derives table headers for all the output coming down the pipeline from the first object

“Aha!” you cry. “So just use Group-Object.” This doesn’t achieve our goals, because the output is still going to go down a single pipeline:

PS C:\dev> $LastOutput | Group-Object {$_.PSTypeNames[0]} | Select-Object -ExpandProperty Group

Device       BootTime
------       --------
70061c9bb840 22/02/2019 20:39:55
6a27a4067dc1 12/02/2019 07:29:52
70061c9bb840
6a27a4067dc1

We are going to have to be a bit cleverer. This is where Out-Default comes in. Here’s what Get-Help has to say about Out-Default:

The Out-Default cmdlet sends output to the default formatter and the default output cmdlet. This cmdlet has no effect on the formatting or output of Windows PowerShell commands. It is a placeholder that lets you write your own Out-Default function or cmdlet.

In other words, it exists as a hook that you can access by clobbering the cmdlet with your own function.
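The clobbering works because of command precedence: a function wins over a cmdlet of the same name, while the module-qualified name still reaches the original. Here is a harmless sketch that uses Get-Date as a stand-in for Out-Default:

```powershell
# A function with the same name takes precedence over the cmdlet
function Get-Date {'shadowed'}

$Plain     = Get-Date
$Qualified = Microsoft.PowerShell.Utility\Get-Date

$Plain                      # shadowed
$Qualified.GetType().Name   # DateTime: the qualified name still reaches the cmdlet

# Remove the function again so that the session is back to normal
Remove-Item function:\Get-Date
```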

WARNING: Any function that you write will almost certainly perform slower than the equivalent cmdlet in compiled code, and this will affect everything that outputs to the console where the user did not specify a format or output command. In tests, my version slows down commands that output to the console by 5-10%.

We need to generate a high-fidelity proxy for Out-Default. That means that we have to faithfully implement the parameters of the command that we are clobbering. For Out-Default, this is easy because it only has two parameters, but I’ll show you the technique anyway:

$Command = Get-Command Out-Default
$MetaData = [System.Management.Automation.CommandMetaData]::new($Command)
$ProxyDef = [System.Management.Automation.ProxyCommand]::Create($MetaData)

If you have PSScriptAnalyzer installed, you can format the output further - my team uses Allman bracket style, and I find the OTBS format that ProxyCommand.Create generates to be less legible.

$ProxyDef = Invoke-Formatter -ScriptDefinition $ProxyDef -Settings 'CodeFormattingAllman'

Let’s see what we have in $ProxyDef now:

[CmdletBinding(HelpUri = 'https://go.microsoft.com/fwlink/?LinkID=113362', RemotingCapability = 'None')]
param(
    [switch]
    ${Transcript},

    [Parameter(ValueFromPipeline = $true)]
    [psobject]
    ${InputObject})

begin
{
    try
    {
        $outBuffer = $null
        if ($PSBoundParameters.TryGetValue('OutBuffer', [ref]$outBuffer))
        {
            $PSBoundParameters['OutBuffer'] = 1
        }
        $wrappedCmd = $ExecutionContext.InvokeCommand.GetCommand('Microsoft.PowerShell.Core\Out-Default', [System.Management.Automation.CommandTypes]::Cmdlet)
        $scriptCmd = {& $wrappedCmd @PSBoundParameters }
        $steppablePipeline = $scriptCmd.GetSteppablePipeline($myInvocation.CommandOrigin)
        $steppablePipeline.Begin($PSCmdlet)
    }
    catch
    {
        throw
    }
}

process
{
    try
    {
        $steppablePipeline.Process($_)
    }
    catch
    {
        throw
    }
}

end
{
    try
    {
        $steppablePipeline.End()
    }
    catch
    {
        throw
    }
}
<#

.ForwardHelpTargetName Microsoft.PowerShell.Core\Out-Default
.ForwardHelpCategory Cmdlet

#>

Let’s break down the components of this auto-generated command definition:

Help

The HelpUri parameter of CmdletBinding and the ForwardHelpTargetName and ForwardHelpCategory components of the function help simply redirect help to the built-in command. We can edit the help functionality if we want. Note that it is perfectly valid to place the comment-based help block at the bottom, but it’s unusual. We’ll move it to the top as we edit the proxy command.

OutBuffer

This is a topic worthy of a separate post, so I won’t go into it here. Suffice to say, we don’t need it for this command, so I’ll delete it.

WrappedCommand

$wrappedCmd is populated by a call to $ExecutionContext.InvokeCommand.GetCommand() that specifies the original command with its module name, namely, Microsoft.PowerShell.Core\Out-Default. If you fail to specify the module, you’ll get a handle to yourself, resulting in infinite recursion. The result is exactly the same as calling Get-Command. The method call is slightly faster, but I prefer using the cmdlet for readability.

Scriptblock

$scriptCmd is a scriptblock that calls the original command with the parameters that were passed into our proxy function. When writing proxies that add functionality, you will need to remove any parameters that you’ve introduced from $PSBoundParameters before this line. However, we aren’t adding any parameters.

SteppablePipeline

We get the steppable pipeline from the scriptblock. This is an object that exposes the Begin, Process and End blocks of the wrapped command, so that we can call the wrapped command in a very granular way.
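To get a feel for that granularity, here is a sketch that drives Sort-Object by hand, outside of any proxy function. Sort-Object is only a convenient example: it has to see all of its input before it can emit anything, so the output arrives at End() rather than during Process().

```powershell
$Pipeline = {Sort-Object}.GetSteppablePipeline()

$Pipeline.Begin($true)    # $true: we will be feeding it input

# Sort-Object cannot emit anything until it has seen all its input
$DuringProcess  = @($Pipeline.Process(3))
$DuringProcess += @($Pipeline.Process(1))
$DuringProcess += @($Pipeline.Process(2))

# ...so the sorted output only appears when we call End()
$AtEnd = @($Pipeline.End())

$Pipeline.Dispose()

$DuringProcess.Count    # 0
$AtEnd -join ', '       # 1, 2, 3
```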

try/catches

I suspect that the ProxyCommand.Create method is adding these to provide a place to edit the exception handling, because a bare throw statement in a catch block simply passes on the exception unchanged in Powershell. We will delete these.

After tidying up, we are left with:

function Out-Default
{
    <#
        .SYNOPSIS
        A proxy for Out-Default that aggregates API output and processes it into groups, leaving all other input untouched.
    #>
    [CmdletBinding()]
    param
    (
        [Parameter(ValueFromPipeline = $true)]
        [psobject]$InputObject,

        [switch]$Transcript
    )

    begin
    {
        $OutDefaultCommand = Get-Command 'Microsoft.PowerShell.Core\Out-Default' -CommandType Cmdlet
        $OutDefaultSB      = {& $OutDefaultCommand @PSBoundParameters}
        $SteppablePipeline = $OutDefaultSB.GetSteppablePipeline($MyInvocation.CommandOrigin)

        $SteppablePipeline.Begin($PSCmdlet)
    }

    process
    {
        $SteppablePipeline.Process($_)
    }

    end
    {
        $SteppablePipeline.End()
    }
}

Hopefully this should be fairly easy to read. Sadly, it adds no value whatever! We need to add the functionality that led us to creating this proxy:

begin
{
    $OutDefaultCommand = Get-Command 'Microsoft.PowerShell.Core\Out-Default' -CommandType Cmdlet
    $OutDefaultSB      = {& $OutDefaultCommand @PSBoundParameters}
    $SteppablePipeline = $OutDefaultSB.GetSteppablePipeline($MyInvocation.CommandOrigin)

    $SteppablePipeline.Begin($true)

    $GroupedOutput = [System.Collections.Generic.Dictionary[string, System.Collections.ArrayList]]::new()
}

process
{
    if ($_.PSTypeNames -contains 'ApiOutput')
    {
        $TypeName = $_.PSTypeNames[0]
        if (-not $GroupedOutput.ContainsKey($TypeName))
        {
            $GroupedOutput.Add($TypeName, [System.Collections.ArrayList]::new())
        }
        $null = $GroupedOutput[$TypeName].Add($_)
    }
    else
    {
        $SteppablePipeline.Process($_)
    }
}

end
{
    $GroupedOutput.Values |
        Format-Table |
        ForEach-Object {$SteppablePipeline.Process($_)}

    $SteppablePipeline.End()
    $SteppablePipeline.Dispose()
}

In short, we accumulate output into a dictionary of arraylists, then we pass each arraylist through Format-Table in the end block - which lets us do our grouping.

There’s a lot more to the production code; for example, I identify “error” objects and intersperse them in the output, and I detect whether we should be formatting as a table or list. But this is enough to demonstrate the principles.
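The process block above relies on objects carrying 'ApiOutput' somewhere in PSTypeNames, with a more specific name in first position to act as the grouping key. A minimal sketch of how an object could be tagged that way follows; the type name 'ApiOutput.User' and the properties are hypothetical, not taken from the production code.

```powershell
# A plain object standing in for something returned by an API call.
$User = [pscustomobject]@{ Name = 'Freddie'; Role = 'Blogger' }

# Insert the marker name first, then the specific name, so that the
# specific name ends up at index 0 and becomes the grouping key.
$User.PSTypeNames.Insert(0, 'ApiOutput')
$User.PSTypeNames.Insert(0, 'ApiOutput.User')

# These are the two checks that the proxy's process block performs.
$IsApiOutput = $User.PSTypeNames -contains 'ApiOutput'
$GroupKey    = $User.PSTypeNames[0]

$IsApiOutput
$GroupKey
```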

Reflecting on types

Sometimes, we need to define classes in inline C#, like so:

Add-Type -TypeDefinition @'
using System.Collections;

public class MyClass
{
    private Int32 myInt;
        //etc

Typical reasons are that we might need to support a version of Powershell prior to the introduction of native classes, or we need to use P/Invoke to access native Win32 libraries.

When developing like this, the second time we run our code, we bang our heads on the type name already exists error:

Add-Type : Cannot add type. The type name 'MyClass' already exists.
At line:3 char:1
+ Add-Type -TypeDefinition @'
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (MyClass:String) [Add-Type], Exception
    + FullyQualifiedErrorId : TYPE_ALREADY_EXISTS,Microsoft.PowerShell.Commands.AddTypeCommand

This is because of AppDomains. You absolutely, massively and totally cannot unload a class from an AppDomain. In general in .NET, you can create a new AppDomain and unload it when you are done, but this is impossible in Powershell because the engine is still in the first AppDomain or, more accurately, if it is possible, it’s beyond me!

I often like to sketch code snippets before formally introducing them into the project, so I increment the class name each time with a hack like this:

if (-not $Global:i) {$Global:i = 1} else {$Global:i++}
Add-Type -TypeDefinition (@'
using System;
using System.Management.Automation;

public class Foo
{
    //code here
}

'@ -replace 'Foo', "Foo$i")

which saves me reloading my session every time I update the inline C#. But of course, then my class becomes Foo1, Foo2, Foo3.. and I have to keep editing my test code to reflect the changing typenames. This change saves me that:

$Type = Add-Type -PassThru -TypeDefinition (@'
    // code here
'@)
$TypeName = $Type.FullName

so I can have my test object of the latest class:

$Obj = New-Object $TypeName

or, in Powershell 5 and above:

$Obj = $Type::new()

(You can get at all public static members with this syntax.)
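For instance, here is a small sketch with a static field and a static method. The class name Greeter and its members are invented for the example.

```powershell
# -PassThru hands back the compiled type, so we never need to spell out its name again.
$Type = Add-Type -PassThru -TypeDefinition @'
public class Greeter
{
    public static string Prefix = "Hello, ";

    public static string Greet(string name)
    {
        return Prefix + name;
    }
}
'@

# Static members are reached with :: on the type object itself.
$Greeting = $Type::Greet('Freddie')
$Greeting
$Type::Prefix
```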

Ersatz classes in Powershell v2 part 2

In part 1, I outlined how you can use Import-Module’s -AsCustomObject to create a hacky class definition in versions of Powershell that don’t support native classes (i.e., before v5).

Should you do this? Proooobably not - it is a bit odd to read to anyone who is familiar with OOP, and also to anyone who is familiar with Powershell. But it does give you an ability that you may not have otherwise, and it does sidestep the serious problems with native classes in Powershell (issues reloading, issues exporting defined types, issues with testability).

Here’s a hacky way to achieve a kind of inheritance:

$ParentDef = {
    # A type definition for a blogger

    # Constructor args
    param
    (
        [string]$Name,
        [string]$Topic
    )

    # Method definition
    function GetBlogpost([uint16] $Length)
    {
        return (@($Topic) * $Length) -join ' '
    }
}

$ChildDef = {
    # A type definition for a blogger named "Freddie" who posts about Powershell
    param ()

    #region
    # Code in the body is run at initialisation-time; in other words,
    # the body IS the constructor
    $Name = "Freddie"
    $Topic = "Powershell"

    # Constructor chaining
    New-Module $ParentDef -ArgumentList ($Name, $Topic)
    #endregion
}

$Child = New-Module $ChildDef -AsCustomObject
$Child.GetBlogpost(7)

Powershell Powershell Powershell Powershell Powershell Powershell Powershell

Similarly, we can override methods from the parent:

$ParentDef = {
    # A type definition for a blogger

    # Constructor args
    param
    (
        [string]$Name,
        [string]$Topic
    )

    # Method definition
    function GetBlogpost([uint16] $Length)
    {
        return (@($Topic) * $Length) -join ' '
    }
}

$ChildDef = {
    # A type definition for a blogger named "Freddie" who posts about Powershell
    param ()

    $Name = "Freddie"
    $Topic = "Powershell"

    # Keep a reference to the parent module so the override can call it
    $Base = New-Module $ParentDef -ArgumentList ($Name, $Topic)

    # Method override
    function GetBlogpost([uint16] $Length)
    {
        return (& $Base.ExportedCommands.GetBlogpost ($Length)).ToLower()
    }
}

$Child = New-Module $ChildDef -AsCustomObject
$Child.GetBlogpost(7)

powershell powershell powershell powershell powershell powershell powershell

As you see, the child now lower-cases the method return from the parent “class”.

This is such a messy way to write code that it’s probably better to upgrade target systems to powershell 5 and use native classes.

Ersatz classes in Powershell v2 part 1

Part 2 is here

There are some types of problem that object-oriented programming is very well suited for. You don’t particularly need it if you are running through a linear sequence of tasks, but when you have a problem that’s characterised by different types of thing and the relationships between your things are important, then OOP can let you write much more concise code.

Case in point: I have a project that’s for use on members of an AD domain, to identify domain controllers and test connectivity to each on AD ports. If the member is itself a domain controller, then the ports to be tested must be a superset of the ports needed by a non-DC member. I also want discovery to be done through DNS.

In my first attempt, I created a lot of hashtables to hold DNS servers and domain controllers. There were lots of loops over the items in hashtables, and variable bloat. I had to choose what to use as my keys, and settled on IP addresses, which brought further questions. The code got pretty unwieldy, and it wasn’t much fun to work on past a certain point.

There are a few aspects of the problem that cry out for a class-based solution:

DNS servers have behaviours such as query
Domain controllers have ports, which are accessible or not
Domain controllers are just a special case of domain members - they have all the behaviours and properties that members have
It would be nice to have some control over how we determine if two domain controllers are the same server
Domain members have authenticating domain controllers and domain controllers that are in the same site
Domain controllers have current replication partners and potential future replication partners

The project needs to be rewritten with classes. But I have further constraints: I want to support Powershell v3 and, ideally, v2. Native powershell classes were introduced in v5.

It’s perfectly valid to write classfiles in C# and import them directly into Powershell. Against this approach, I did not want to limit the supportability of the project to only people familiar with C#, because this project is a tool for sysads.

I don’t mind if the simpler elements of the class are written in C# as long as the methods (where debuggers will typically spend their time) are in PS, but I couldn’t find any way to write the methods in the class such that they run in the hosting Powershell process’s first runspace.

I also considered this approach:

Add-Type -TypeDefinition @'
namespace Test
{
    public class Test
    {
        public string Name = "Freddie";
    }
}
'@ -IgnoreWarnings

$T = New-Object Test.Test
$T | Add-Member -MemberType ScriptMethod -Name 'GetName' -Value {
    return $this.Name
}
$T.GetName()

Freddie

This is b’fugly, and I don’t fancy having to answer questions about it.

The above snippet takes advantage of the fact that, unlike C#, Powershell is dynamically-typed. You can add members to an instance that are not present in the type definition. This can’t be done in C#.

Where I found some joy was in the -AsCustomObject switch available to the Import-Module and New-Module cmdlets. I’ve never seen this used, or even used it myself (until now), so I’ll take a minute to explore the meaning of this switch.

Module definition

A module definition typically lives in its own file with a .psm1 extension. As a rule, the primary function is to contain function definitions which are then exported into the calling scope when you import the module. You can control which elements of a module are exported and which remain hidden, which is similar in effect to declaring a member private in C# (we won’t dive into this rabbit-hole now). You can also define variables, which then exist, by default, in the script scope (effectively, private to the module functions) but can optionally be exported as well. There are use cases for defining script variables, but in the majority of cases, a powershell module is purely a way of bundling functions together for code organisation.

However, if we look at what we’ve described, we have behaviours and properties, both of which can be public or private. Is that starting to sound like a type definition to you?

Let’s also add in that module definitions can have parameter blocks and accept arguments, and we have a rough implementation of the most important parts of a class system. So our friends in the Microsoft Powershell team reached a little further and furnished us with the…

AsCustomObject

… switch parameter.

When you import a module definition with Import-Module -AsCustomObject, instead of “importing” a “module”, Import-Module returns a powershell object.

The functions in the module definition become the methods of the object.

The variables in the module definition become the properties of the object.

The parameter block in the module definition has an equivalence to the constructor of a class, and Import-Module does the instantiation (instead of New-Object).

This language feature has been present in Powershell since at least v2, but hasn’t had much attention because it’s so completely different to the normal usage of Import-Module, and because it’s so completely different to how operations people started out using powershell.

It’s also pretty ugly.

Let me show you with some examples:

#Contents of ModuleDef.psm1
function Get-Month {
    $Date = Get-Date
    return $Date.ToString('MMMM')
}
#End of ModuleDef.psm1

$MyModuleObject = Import-Module .\ModuleDef.psm1 -AsCustomObject
$MyModuleObject.'Get-Month'()

February

Absolutely crazy. You can see that I am running the Get-Month function using dot-and-bracket syntax. (Because Get-Month has a hyphen, I have to enclose it in quotes. For this reason, when using this technique, you should name your functions in the accepted method style with no hyphen.)

Here is a longer example:

#Contents of MyAge.psm1
param (
    [int]$MyAge
)

$MinAgeToVote = 18
$MaxAgeToDance = 32

function AmICool {
    return ($MyAge -ge $MinAgeToVote -and
            $MyAge -lt $MaxAgeToDance)
}

Export-ModuleMember -Function * -Variable *

#End of MyAge.psm1

$MyAgeObject = Import-Module .\MyAge.psm1 -AsCustomObject `
                                            -ArgumentList (37)
$MyAgeObject.MyAge

$MyAgeObject.MinAgeToVote

$MyAgeObject.AmICool()

False

‘Elegant’ would be a stretch for this syntax, but it’s not bad if you don’t have powershell 5 on your systems. It’s worth diving into some more in part 2 of this blogpost.
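If you want to try this without creating a .psm1 file, the same definition can be held in a scriptblock and instantiated with New-Module, which also takes -AsCustomObject. This is only a sketch of the MyAge example in that form; the variable names $MyAgeDef, $Young and $Old are mine.

```powershell
# The same "class" as MyAge.psm1, held in a scriptblock instead of a file.
$MyAgeDef = {
    param (
        [int]$MyAge
    )

    $MinAgeToVote  = 18
    $MaxAgeToDance = 32

    function AmICool {
        return ($MyAge -ge $MinAgeToVote -and
                $MyAge -lt $MaxAgeToDance)
    }

    Export-ModuleMember -Function * -Variable *
}

# Each call to New-Module builds a separate object with its own state.
$Young = New-Module $MyAgeDef -AsCustomObject -ArgumentList 25
$Old   = New-Module $MyAgeDef -AsCustomObject -ArgumentList 37

$Young.AmICool()
$Old.AmICool()
```

Creating two objects from one definition is the closest this technique gets to instantiating a class twice.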

Part 2

Regex to split a Pascal-case string

Postcard from the bowels of the regex beast!

I want to derive some exception classes and pass in a message that comes from the relevant value of ErrorCategory.

ErrorCategory is an enum with 32 values that are no-whitespace strings in Pascal case:

[Enum]::GetValues(
    [System.Management.Automation.ErrorCategory])

NotSpecified
OpenError
CloseError
DeviceError
DeadlockDetected
# ...etc

That [System.Enum]::GetValues() method works on any enum, e.g. [System.DayOfWeek]

I don’t want to text-edit them all myself, we have computers for that.

Step 1, split them:

[System.Enum]::GetValues(
    [System.Management.Automation.ErrorCategory]) |
    select -First 1 | foreach {
        [regex]::Matches(
            $_,
            '[A-Z][a-z]*'
        ).Value
    }

Not
Specified

Does anyone else select only the first item while work is in progress? Saves some scrolling. On which note, sorry about the awkward spacing, this blog theme makes horizontal space very precious.

Complete snippet:

[System.Enum]::GetValues(
    [System.Management.Automation.ErrorCategory]) |
    foreach {
        $Words = [regex]::Matches(
            $_,
            '[A-Z][a-z]*'
        ).Value

        $Words = @($Words[0]) + ($Words | select -Skip 1 | %{$_.ToLower()})
        $Words -join ' '
    }

Not specified
Open error
Close error
Device error
Deadlock detected
# ...etc
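One caveat with '[A-Z][a-z]*' is that it treats every capital letter as the start of a new word. A possible alternative, sketched here and not what I used above, is to split only where a lowercase letter is followed by a capital, using the case-sensitive -csplit operator. The function name ConvertTo-SentenceCase is made up for the example.

```powershell
function ConvertTo-SentenceCase
{
    param
    (
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
        [string]$PascalString
    )

    process
    {
        # Zero-width split: the position after a lowercase letter and before a capital.
        $Words = $PascalString -csplit '(?<=[a-z])(?=[A-Z])'

        # Keep the first word as-is and lower-case the rest, as in the snippet above.
        $Rest = $Words | Select-Object -Skip 1 | ForEach-Object {$_.ToLower()}
        (@($Words[0]) + @($Rest)) -join ' '
    }
}

[System.Enum]::GetValues([System.Management.Automation.ErrorCategory]) |
    ConvertTo-SentenceCase
```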

My love/hate relationship with regex continues.
