Channel: Jose Barreto's Blog

New PowerShell cmdlets in Windows Server 2016 TP2 (compared to Windows Server 2012 R2)


 

1. State the problem

 

With the release of Windows Server 2016 TP2 a few weeks ago, I was wondering which new PowerShell cmdlets are now included (compared to Windows Server 2012 R2). However, the list of cmdlets is now so long that it is hard to spot the differences by hand.

Fortunately, there is a cmdlet in PowerShell that lists all the available cmdlets (Get-Command), and a little bit of programming makes it easy to find the main differences. So I set out to collect the data and compare the lists.

 

DISCLAIMER: As you probably know already, the Technical Preview is subject to change so all the information about Windows Server 2016 TP2 is preliminary and may not make it into the final product. Use with care, your mileage may vary, not available in all areas, some restrictions apply, professional PowerShell operator on a closed Azure VM course, do not attempt.

 

2. Gather the data

 

First, I needed the list of cmdlets from both versions of the operating system. That was actually pretty easy to gather, with a little help from Azure. I basically provisioned two Azure VMs, one running Windows Server 2012 R2 and one running Windows Server 2016 Technical Preview 2 (yes, TP2 is now available in the regular Azure VM image gallery).

Second, I installed all of the Remote Server Administration Tools (RSAT) on both versions. That loads the PowerShell modules used for managing features that are not installed by default, like Failover Cluster or Storage Replica.

Finally, I ran a simple cmdlet to gather the list from Get-Command and save it to an XML file. This made it easier to put all the data I needed in a single place (my desktop machine running Windows 10 Insider Preview). Here's a summary of what it took:

  • Create WS 2012 R2 Azure VM
  • Install RSAT in the WS 2012 R2 VM
    • Get-WindowsFeature RSAT* | Install-WindowsFeature
  • Capture XML file with all the WS 2012 R2 cmdlet information
    • Get-Command | Select * | Export-CliXml C:\WS2012R2Cmdlets.XML
  • Create WS 2016 TP2 Azure VM
  • Install RSAT in the WS 2016 TP2 VM
    • Get-WindowsFeature RSAT* | Install-WindowsFeature

  • Capture XML file with all the WS 2016 TP2 cmdlet information
    • Get-Command | Select * | Export-CliXml C:\WS2016TP2Cmdlets.XML
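
Back on the desktop machine, a quick sanity check of the two exports might look like the sketch below (the file paths are the ones from the steps above; Import-Clixml and Group-Object are standard cmdlets):

# Load both exports and confirm how many cmdlets each OS reported
$WS2012R2 = Import-Clixml "C:\WS2012R2Cmdlets.XML"
$WS2016TP2 = Import-Clixml "C:\WS2016TP2Cmdlets.XML"
"WS 2012 R2 cmdlets : {0}" -f $WS2012R2.Count
"WS 2016 TP2 cmdlets: {0}" -f $WS2016TP2.Count
# Spot-check that module names survived the round trip
$WS2016TP2 | Group ModuleName | Sort Count -Descending | Select Name, Count -First 5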

 

3. Process the data

 

With the two XML files at hand, all I had left to do was to compare them to produce a good list of what's new. The first attempt resulted in a long list that was hard to understand, so I decided to do it module by module.

The code starts by creating a combined list of modules from both operating systems. Then it builds a dictionary of all cmdlets for a given module, assigning the value 1 if it's in WS 2012 R2, 2 if it's in WS 2016 TP2 and 3 if it's in both.
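
As a minimal sketch of that encoding (using hypothetical cmdlet names rather than the real exported data), the idea boils down to this:

# Toy example of the 1/2/3 encoding used by the full script in section 5
$WS2012R2Cmdlets = 'Get-Foo', 'Get-Bar'          # hypothetical names
$WS2016TP2Cmdlets = 'Get-Bar', 'Get-NewThing'    # hypothetical names
$CmdletDict = @{}
$WS2012R2Cmdlets  | % { $CmdletDict[$_] += 1 }   # present in WS 2012 R2 -> +1
$WS2016TP2Cmdlets | % { $CmdletDict[$_] += 2 }   # present in WS 2016 TP2 -> +2
# 1 = removed, 2 = new, 3 = present in both
$CmdletDict.GetEnumerator() | Sort Name | % { "{0} -> {1}" -f $_.Name, $_.Value }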

The script then shows the total number of cmdlets per module per OS, the number of new cmdlets and the actual list of new cmdlets. Since the goal was to publish this blog, I wrote the script to format the output as an HTML table. Quite handy :-).

 

4. Show the results

 

Finally, here is the resulting table with all the new PowerShell cmdlets (by module) in Windows Server 2016 TP2, compared to Windows Server 2012 R2. Enjoy!

 

Module  New Cmdlets  WS 2016 TP2 Cmdlets  WS 2012 R2 Cmdlets
(no module) 0 38 38
ActiveDirectory 0 147 147
ADRMSAdmin 0 21 21
AppLocker 0 5 5
Appx 8 14 6
+ Add-AppxVolume
+ Dismount-AppxVolume
+ Get-AppxDefaultVolume
+ Get-AppxVolume
+ Mount-AppxVolume
+ Move-AppxPackage
+ Remove-AppxVolume
+ Set-AppxDefaultVolume
BestPractices 0 4 4
BitLocker 0 13 13
BitsTransfer 0 8 8
BranchCache 0 32 32
CimCmdlets 0 14 14
CIPolicy 1 1 0
+ ConvertFrom-CIPolicy
ClusterAwareUpdating 0 17 17
ConfigCI 10 10 0
+ Edit-CIPolicyRule
+ Get-CIPolicy
+ Get-CIPolicyInfo
+ Get-SystemDriver
+ Merge-CIPolicy
+ New-CIPolicy
+ New-CIPolicyRule
+ Remove-CIPolicyRule
+ Set-HVCIOptions
+ Set-RuleOption
Defender 11 11 0
+ Add-MpPreference
+ Get-MpComputerStatus
+ Get-MpPreference
+ Get-MpThreat
+ Get-MpThreatCatalog
+ Get-MpThreatDetection
+ Remove-MpPreference
+ Remove-MpThreat
+ Set-MpPreference
+ Start-MpScan
+ Update-MpSignature
DFSN 0 23 23
DFSR 3 45 42
+ Get-DfsrDelegation
+ Grant-DfsrDelegation
+ Revoke-DfsrDelegation
DhcpServer 0 121 121
DirectAccessClientComponents 0 11 11
Dism 4 43 39
+ Add-WindowsCapability
+ Expand-WindowsCustomDataImage
+ Get-WindowsCapability
+ Remove-WindowsCapability
DnsClient 0 17 17
DnsServer 21 122 101
+ Add-DnsServerClientSubnet
+ Add-DnsServerQueryResolutionPolicy
+ Add-DnsServerRecursionScope
+ Add-DnsServerZoneScope
+ Add-DnsServerZoneTransferPolicy
+ Disable-DnsServerPolicy
+ Enable-DnsServerPolicy
+ Get-DnsServerClientSubnet
+ Get-DnsServerQueryResolutionPolicy
+ Get-DnsServerRecursionScope
+ Get-DnsServerZoneScope
+ Get-DnsServerZoneTransferPolicy
+ Remove-DnsServerClientSubnet
+ Remove-DnsServerQueryResolutionPolicy
+ Remove-DnsServerRecursionScope
+ Remove-DnsServerZoneScope
+ Remove-DnsServerZoneTransferPolicy
+ Set-DnsServerClientSubnet
+ Set-DnsServerQueryResolutionPolicy
+ Set-DnsServerRecursionScope
+ Set-DnsServerZoneTransferPolicy
EventTracingManagement 14 14 0
+ Add-EtwTraceProvider
+ Get-AutologgerConfig
+ Get-EtwTraceProvider
+ Get-EtwTraceSession
+ New-AutologgerConfig
+ New-EtwTraceSession
+ Remove-AutologgerConfig
+ Remove-EtwTraceProvider
+ Remove-EtwTraceSession
+ Send-EtwTraceSession
+ Set-AutologgerConfig
+ Set-EtwTraceProvider
+ Set-EtwTraceSession
+ Start-AutologgerConfig
FailoverClusters 2 84 82
+ New-ClusterNameAccount
+ Update-ClusterFunctionalLevel
GroupPolicy 0 29 29
HgsClient 11 11 0
+ Export-HgsGuardian
+ Get-HgsAttestationBaselinePolicy
+ Get-HgsClientConfiguration
+ Get-HgsGuardian
+ Grant-HgsKeyProtectorAccess
+ Import-HgsGuardian
+ New-HgsGuardian
+ New-HgsKeyProtector
+ Remove-HgsGuardian
+ Revoke-HgsKeyProtectorAccess
+ Set-HgsClientConfiguration
Hyper-V 26 204 178
+ Add-VMGroupMember
+ Add-VMSwitchTeamMember
+ Add-VMTPM
+ Disable-VMConsoleSupport
+ Enable-VMConsoleSupport
+ Get-VHDSet
+ Get-VHDSnapshot
+ Get-VMGroup
+ Get-VMHostCluster
+ Get-VMSwitchTeam
+ Get-VMTPM
+ Get-VMVideo
+ New-VMGroup
+ Optimize-VHDSet
+ Remove-VHDSnapshot
+ Remove-VMGroup
+ Remove-VMGroupMember
+ Remove-VMSwitchTeamMember
+ Rename-VMGroup
+ Set-VMHostCluster
+ Set-VMSwitchTeam
+ Set-VMTPM
+ Set-VMVideo
+ Start-VMTrace
+ Stop-VMTrace
+ Update-VMVersion
IISAdministration 17 17 0
+ Get-IISAppPool
+ Get-IISConfigCollectionItem
+ Get-IISConfigElement
+ Get-IISConfigSection
+ Get-IISConfigValue
+ Get-IISServerManager
+ Get-IISSite
+ New-IISConfigCollectionItem
+ New-IISSite
+ Remove-IISConfigCollectionItem
+ Remove-IISSite
+ Reset-IISServerManager
+ Set-IISConfigValue
+ Start-IISCommitDelay
+ Start-IISSite
+ Stop-IISCommitDelay
+ Stop-IISSite
International 0 18 18
iSCSI 0 13 13
IscsiTarget 0 28 28
ISE 0 3 3
Kds 0 6 6
Microsoft.PowerShell.Archive 2 2 0
+ Compress-Archive
+ Expand-Archive
Microsoft.PowerShell.Core 5 60 55
+ Debug-Job
+ Enter-PSHostProcess
+ Exit-PSHostProcess
+ Get-PSHostProcessInfo
+ Register-ArgumentCompleter
Microsoft.PowerShell.Diagnostics 0 5 5
Microsoft.PowerShell.Host 0 2 2
Microsoft.PowerShell.Management 4 86 82
+ Clear-RecycleBin
+ Get-Clipboard
+ Get-ItemPropertyValue
+ Set-Clipboard
Microsoft.PowerShell.ODataUtils 1 1 0
+ Export-ODataEndpointProxy
Microsoft.PowerShell.Security 0 13 13
Microsoft.PowerShell.Utility 11 105 94
+ ConvertFrom-String
+ Convert-String
+ Debug-Runspace
+ Disable-RunspaceDebug
+ Enable-RunspaceDebug
+ Format-Hex
+ Get-Runspace
+ Get-RunspaceDebug
- GetStreamHash
+ New-Guid
+ New-TemporaryFile
+ Wait-Debugger
+ Write-Information
Microsoft.WSMan.Management 0 13 13
MMAgent 0 5 5
MsDtc 0 41 41
NetAdapter 4 68 64
+ Disable-NetAdapterPacketDirect
+ Enable-NetAdapterPacketDirect
+ Get-NetAdapterPacketDirect
+ Set-NetAdapterPacketDirect
NetConnection 0 2 2
NetEventPacketCapture 0 23 23
NetLbfo 0 13 13
NetNat 0 13 13
NetQos 0 4 4
NetSecurity 0 85 85
NetSwitchTeam 0 7 7
NetTCPIP 0 34 34
NetWNV 0 19 19
NetworkConnectivityStatus 0 4 4
NetworkController 141 141 0
+ Add-NetworkControllerNode
+ Clear-NetworkControllerNodeContent
+ Disable-NetworkControllerNode
+ Enable-NetworkControllerNode
+ Export-NetworkController
+ Get-NetworkController
+ Get-NetworkControllerCanaryConfiguration
+ Get-NetworkControllerCluster
+ Get-NetworkControllerCredential
+ Get-NetworkControllerDevice
+ Get-NetworkControllerDeviceGroupingTestConfiguration
+ Get-NetworkControllerDeviceGroups
+ Get-NetworkControllerDeviceGroupUsage
+ Get-NetworkControllerDeviceUsage
+ Get-NetworkControllerDiagnostic
+ Get-NetworkControllerDiscoveredTopology
+ Get-NetworkControllerExternalTestRule
+ Get-NetworkControllerFabricRoute
+ Get-NetworkControllerGoalTopology
+ Get-NetworkControllerInterface
+ Get-NetworkControllerInterfaceUsage
+ Get-NetworkControllerIpPool
+ Get-NetworkControllerIpPoolStatistics
+ Get-NetworkControllerIpSubnetStatistics
+ Get-NetworkControllerLogicalNetwork
+ Get-NetworkControllerLogicalSubnet
+ Get-NetworkControllerMonitoringService
+ Get-NetworkControllerNode
+ Get-NetworkControllerPhysicalHostInterfaceParameter
+ Get-NetworkControllerPhysicalHostParameter
+ Get-NetworkControllerPhysicalSwitchCpuUtilizationParameter
+ Get-NetworkControllerPhysicalSwitchInterfaceParameter
+ Get-NetworkControllerPhysicalSwitchMemoryUtilizationParameter
+ Get-NetworkControllerPhysicalSwitchParameter
+ Get-NetworkControllerPSwitch
+ Get-NetworkControllerPublicIpAddress
+ Get-NetworkControllerServer
+ Get-NetworkControllerServerInterface
+ Get-NetworkControllerSwitchBgpPeer
+ Get-NetworkControllerSwitchBgpRouter
+ Get-NetworkControllerSwitchConfig
+ Get-NetworkControllerSwitchNetworkRoute
+ Get-NetworkControllerSwitchPort
+ Get-NetworkControllerSwitchPortChannel
+ Get-NetworkControllerSwitchVlan
+ Get-NetworkControllerTopologyConfiguration
+ Get-NetworkControllerTopologyDiscoveryStatistics
+ Get-NetworkControllerTopologyLink
+ Get-NetworkControllerTopologyNode
+ Get-NetworkControllerTopologyTerminationPoint
+ Get-NetworkControllerTopologyValidationReport
+ Get-NetworkControllerVirtualInterface
+ Get-NetworkControllerVirtualNetworkUsage
+ Get-NetworkControllerVirtualPort
+ Get-NetworkControllerVirtualServer
+ Get-NetworkControllerVirtualServerInterface
+ Get-NetworkControllerVirtualSwitch
+ Get-NetworkControllerVirtualSwitchPortParameter
+ Import-NetworkController
+ Install-NetworkController
+ Install-NetworkControllerCluster
+ New-NetworkControllerCanaryConfiguration
+ New-NetworkControllerCredential
+ New-NetworkControllerDevice
+ New-NetworkControllerDeviceGroupingTestConfiguration
+ New-NetworkControllerDeviceGroups
+ New-NetworkControllerExternalTestRule
+ New-NetworkControllerInterface
+ New-NetworkControllerIpPool
+ New-NetworkControllerLogicalNetwork
+ New-NetworkControllerMonitoringService
+ New-NetworkControllerNodeObject
+ New-NetworkControllerPhysicalHostInterfaceParameter
+ New-NetworkControllerPhysicalHostParameter
+ New-NetworkControllerPhysicalSwitchCpuUtilizationParameter
+ New-NetworkControllerPhysicalSwitchInterfaceParameter
+ New-NetworkControllerPhysicalSwitchMemoryUtilizationParameter
+ New-NetworkControllerPhysicalSwitchParameter
+ New-NetworkControllerPSwitch
+ New-NetworkControllerPublicIpAddress
+ New-NetworkControllerServer
+ New-NetworkControllerServerInterface
+ New-NetworkControllerSwitchBgpPeer
+ New-NetworkControllerSwitchBgpRouter
+ New-NetworkControllerSwitchNetworkRoute
+ New-NetworkControllerSwitchPortChannel
+ New-NetworkControllerSwitchVlan
+ New-NetworkControllerTopologyLink
+ New-NetworkControllerTopologyNode
+ New-NetworkControllerTopologyTerminationPoint
+ New-NetworkControllerVirtualInterface
+ New-NetworkControllerVirtualPort
+ New-NetworkControllerVirtualServer
+ New-NetworkControllerVirtualServerInterface
+ New-NetworkControllerVirtualSwitch
+ New-NetworkControllerVirtualSwitchPortParameter
+ Remove-NetworkControllerCanaryConfiguration
+ Remove-NetworkControllerCredential
+ Remove-NetworkControllerDevice
+ Remove-NetworkControllerDeviceGroupingTestConfiguration
+ Remove-NetworkControllerDeviceGroups
+ Remove-NetworkControllerExternalTestRule
+ Remove-NetworkControllerFabricRoute
+ Remove-NetworkControllerInterface
+ Remove-NetworkControllerIpPool
+ Remove-NetworkControllerLogicalNetwork
+ Remove-NetworkControllerLogicalSubnet
+ Remove-NetworkControllerNode
+ Remove-NetworkControllerPhysicalSwitchCpuUtilizationParameter
+ Remove-NetworkControllerPhysicalSwitchMemoryUtilizationParameter
+ Remove-NetworkControllerPSwitch
+ Remove-NetworkControllerPublicIpAddress
+ Remove-NetworkControllerServer
+ Remove-NetworkControllerServerInterface
+ Remove-NetworkControllerSwitchBgpPeer
+ Remove-NetworkControllerSwitchBgpRouter
+ Remove-NetworkControllerSwitchNetworkRoute
+ Remove-NetworkControllerSwitchPortChannel
+ Remove-NetworkControllerSwitchVlan
+ Remove-NetworkControllerTopologyLink
+ Remove-NetworkControllerTopologyNode
+ Remove-NetworkControllerTopologyTerminationPoint
+ Remove-NetworkControllerVirtualInterface
+ Remove-NetworkControllerVirtualPort
+ Remove-NetworkControllerVirtualServer
+ Remove-NetworkControllerVirtualServerInterface
+ Remove-NetworkControllerVirtualSwitch
+ Repair-NetworkControllerCluster
+ Set-NetworkController
+ Set-NetworkControllerCluster
+ Set-NetworkControllerDiagnostic
+ Set-NetworkControllerFabricRoute
+ Set-NetworkControllerGoalTopology
+ Set-NetworkControllerLogicalSubnet
+ Set-NetworkControllerNode
+ Set-NetworkControllerSwitchConfig
+ Set-NetworkControllerSwitchPort
+ Set-NetworkControllerTopologyConfiguration
+ Start-NetworkControllerTopologyDiscovery
+ Uninstall-NetworkController
+ Uninstall-NetworkControllerCluster
NetworkLoadBalancingClusters 0 35 35
NetworkSwitchManager 19 19 0
+ Disable-NetworkSwitchEthernetPort
+ Disable-NetworkSwitchFeature
+ Disable-NetworkSwitchVlan
+ Enable-NetworkSwitchEthernetPort
+ Enable-NetworkSwitchFeature
+ Enable-NetworkSwitchVlan
+ Get-NetworkSwitchEthernetPort
+ Get-NetworkSwitchFeature
+ Get-NetworkSwitchGlobalData
+ Get-NetworkSwitchVlan
+ New-NetworkSwitchVlan
+ Remove-NetworkSwitchEthernetPortIPAddress
+ Remove-NetworkSwitchVlan
+ Restore-NetworkSwitchConfiguration
+ Save-NetworkSwitchConfiguration
+ Set-NetworkSwitchEthernetPortIPAddress
+ Set-NetworkSwitchPortMode
+ Set-NetworkSwitchPortProperty
+ Set-NetworkSwitchVlanProperty
NetworkTransition 0 34 34
NFS 0 42 42
Nps -6 7 13
- Get-NpsRemediationServer
- Get-NpsRemediationServerGroup
- New-NpsRemediationServer
- New-NpsRemediationServerGroup
- Remove-NpsRemediationServer
- Remove-NpsRemediationServerGroup
PackageManagement 10 10 0
+ Find-Package
+ Get-Package
+ Get-PackageProvider
+ Get-PackageSource
+ Install-Package
+ Register-PackageSource
+ Save-Package
+ Set-PackageSource
+ Uninstall-Package
+ Unregister-PackageSource
PcsvDevice 4 9 5
+ Clear-PcsvDeviceLog
+ Get-PcsvDeviceLog
+ Set-PcsvDeviceNetworkConfiguration
+ Set-PcsvDeviceUserPassword
Pester 20 20 0
+ AfterAll
+ AfterEach
+ Assert-MockCalled
+ Assert-VerifiableMocks
+ BeforeAll
+ BeforeEach
+ Context
+ Describe
+ Get-MockDynamicParameters
+ Get-TestDriveItem
+ In
+ InModuleScope
+ Invoke-Mock
+ Invoke-Pester
+ It
+ Mock
+ New-Fixture
+ Set-DynamicParameterVariables
+ Setup
+ Should
PKI 0 17 17
PnpDevice 4 4 0
+ Disable-PnpDevice
+ Enable-PnpDevice
+ Get-PnpDevice
+ Get-PnpDeviceProperty
PowerShellGet 11 11 0
+ Find-Module
+ Get-InstalledModule
+ Get-PSRepository
+ Install-Module
+ Publish-Module
+ Register-PSRepository
+ Save-Module
+ Set-PSRepository
+ Uninstall-Module
+ Unregister-PSRepository
+ Update-Module
PrintManagement 0 22 22
PSDesiredStateConfiguration 5 17 12
+ Connect-DscConfiguration
+ Find-DscResource
+ Get-DscConfigurationStatus
+ Invoke-DscResource
+ Publish-DscConfiguration
PSDiagnostics 0 10 10
PSReadline 5 5 0
+ Get-PSReadlineKeyHandler
+ Get-PSReadlineOption
+ Remove-PSReadlineKeyHandler
+ Set-PSReadlineKeyHandler
+ Set-PSReadlineOption
PSScheduledJob 0 16 16
PSWorkflow 0 2 2
PSWorkflowUtility 0 1 1
RemoteAccess 14 121 107
+ Add-BgpRouteAggregate
+ Add-VpnSstpProxyRule
+ Clear-BgpRouteFlapDampening
+ Disable-BgpRouteFlapDampening
+ Enable-BgpRouteFlapDampening
+ Get-BgpRouteAggregate
+ Get-BgpRouteFlapDampening
+ Get-VpnSstpProxyRule
+ New-VpnSstpProxyRule
+ Remove-BgpRouteAggregate
+ Remove-VpnSstpProxyRule
+ Set-BgpRouteAggregate
+ Set-BgpRouteFlapDampening
+ Set-VpnSstpProxyRule
RemoteDesktop 5 78 73
+ Export-RDPersonalSessionDesktopAssignment
+ Get-RDPersonalSessionDesktopAssignment
+ Import-RDPersonalSessionDesktopAssignment
+ Remove-RDPersonalSessionDesktopAssignment
+ Set-RDPersonalSessionDesktopAssignment
ScheduledTasks 0 19 19
SecureBoot 0 5 5
ServerCore 0 2 2
ServerManager 0 7 7
ServerManagerTasks 0 11 11
ShieldedVMDataFile 3 3 0
+ Import-ShieldingDataFile
+ New-VolumeIDQualifier
+ Protect-ShieldingDataFile
ShieldedVMTemplate 1 1 0
+ Protect-ServerVHDX
SmbShare 0 35 35
SmbWitness 0 3 3
SoftwareInventoryLogging 0 11 11
StartScreen 0 3 3
Storage 32 140 108
+ Block-FileShareAccess
+ Clear-StorageDiagnosticInfo
+ Debug-FileShare
+ Debug-StorageSubSystem
+ Disable-PhysicalDiskIdentification
+ Disable-StorageDiagnosticLog
+ Enable-PhysicalDiskIdentification
+ Enable-StorageDiagnosticLog
+ Get-DedupProperties
+ Get-DiskSNV
+ Get-DiskStorageNodeView
+ Get-FileShare
+ Get-FileShareAccessControlEntry
+ Get-StorageAdvancedProperty
+ Get-StorageDiagnosticInfo
+ Get-StorageEnclosureSNV
+ Get-StorageEnclosureStorageNodeView
+ Get-StorageFaultDomain
+ Get-StorageFileServer
+ Grant-FileShareAccess
+ New-FileShare
+ New-StorageFileServer
+ Optimize-StoragePool
+ Remove-FileShare
+ Remove-StorageFileServer
+ Revoke-FileShareAccess
+ Set-FileShare
+ Set-StorageFileServer
+ Start-StorageDiagnosticLog
+ Stop-StorageDiagnosticLog
+ Stop-StorageJob
+ Unblock-FileShareAccess
StorageQoS 6 6 0
+ Get-StorageQoSFlow
+ Get-StorageQoSPolicy
+ Get-StorageQoSVolume
+ New-StorageQoSPolicy
+ Remove-StorageQoSPolicy
+ Set-StorageQoSPolicy
StorageReplica 11 11 0
+ Get-SRGroup
+ Get-SRPartnership
+ New-SRGroup
+ New-SRPartnership
+ Remove-SRGroup
+ Remove-SRPartnership
+ Set-SRGroup
+ Set-SRPartnership
+ Suspend-SRGroup
+ Sync-SRGroup
+ Test-SRTopology
TLS 3 7 4
+ Disable-TlsCipherSuite
+ Enable-TlsCipherSuite
+ Get-TlsCipherSuite
TroubleshootingPack 0 2 2
TrustedPlatformModule 0 11 11
UpdateServices 4 16 12
+ Add-WsusDynamicCategory
+ Get-WsusDynamicCategory
+ Remove-WsusDynamicCategory
+ Set-WsusDynamicCategory
UserAccessLogging 0 14 14
VpnClient 0 19 19
Wdac 0 12 12
WebAdministration 0 80 80
Whea 0 2 2
WindowsDeveloperLicense 0 3 3
WindowsErrorReporting 0 3 3
WindowsSearch 0 2 2

 

5. Share the code

 

For those wondering about the script I used to compile the results, here it goes.

#
# Enumerating all the modules from both OS versions
#

# Load XML files into memory: $Files[0] and $Files[1]

$Files = ( (Import-Clixml "C:\WS2012R2Cmdlets.XML"),
           (Import-Clixml "C:\WS2016TP2Cmdlets.XML") )

# Create empty dictionary for modules

$ModuleDict = @{}

# Loop through the two files to gather module info

$Files | % {
  $_ | Group ModuleName | Sort Name | % {
    $Module = $_.Name

    # If found, increase count. If not, add to dictionary

    If ($ModuleDict.ContainsKey($Module)) {
      $ModuleDict.$Module++
    } Else {
      $ModuleDict.Add($Module, 1)
    } # End If

  } # End Import
} # End $Files

#
# Enumerate the cmdlets in every module
#

# Add the HTML table header

Write-Host "<table border=1><tr><td><b>Module</b></td><td>New Cmdlets</td><td>WS 2016 TP2</td><td>WS 2012 R2</td></tr>"

# Loop through the modules in the dictionary

$ModuleDict.GetEnumerator() | Sort Name | % {

  # Initialize variables for a new module

  $Module = $_.Name
  $VersionCount = (0, 0)
  $CmdletDict = @{}

  # Loop through the two files, filtering by module

  0..1 | % {

    $WSVersion = $_
    $Files[$_] | ? ModuleName -eq $Module | % {

      $Cmdlet = $_.Name

      # Count cmdlets by module for each OS version

      $VersionCount[$WSVersion]++

      # Increase per-cmdlet value by 1 (WS2012R2) or by 2 (WS2016TP2)
      # If cmdlet exists in both OSes, value will be 3

      If ($CmdletDict.ContainsKey($Cmdlet)) {
        $CmdletDict.$Cmdlet += ($WSVersion + 1)
      } Else {
        $CmdletDict.Add($Cmdlet, ($WSVersion + 1))
      } # End If

    } # End %

  } # End 0..1

  #
  # Output the list of cmdlets that changed in every module
  #

  # Copy data to single variables for easy use with Write-Host

  $WS0 = $VersionCount[0]
  $WS1 = $VersionCount[1]
  $Dif = $WS1 - $WS0
  $CrLf = "<BR>" + [char]10 + [char]13

  # Write HTML table row with module summary information

  Write-Host "<tr><td><b>$Module</b></td><td align=`"right`">$Dif</td><td align=`"right`">$WS1</td><td align=`"right`">$WS0</td></tr>"

  # If there are cmdlets in the module

  If ($CmdletDict.Count -gt 0) {

    # Gather all new and removed cmdlets in a variable

    $CmdletList = ""
    $CmdletDict.GetEnumerator() | ? { $_.Value -eq 2 -or $_.Value -eq 1 } | Sort Name | % {

      # 1 means removed cmdlet. 2 means new cmdlet

      $Name = $_.Name
      If ($_.Value -eq 1) {
        $CmdletList += "- $Name" + $CrLf
      } Else {
        $CmdletList += "+ $Name" + $CrLf
      } # End If

    } # End Enumerator

    # If new or removed cmdlets exist, write another HTML table row

    If ($CmdletList -ne "") {
      Write-Host "<tr><td colspan=4>$CmdletList</td></tr>"
    } # End If

  } # End If

} # End Module

# Write HTML table end. All done.

Write-Host "</table>"

 


Microsoft SQL Server 2014 PowerShell cmdlet popularity


If you follow the blog, you probably saw a little PowerShell script I published a while back to measure the popularity of the cmdlets in a certain module using a Bing search. As an example, that blog showed the popularity of the cmdlets in the SmbShare module.

Now I got curious about how the cmdlets in other modules would rank, so I spun up some Azure virtual machines to try some other modules. I decided to try the Microsoft SQL Server 2014 main module (named SQLPS).
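
The PopularCmdlet.ps1 script itself is in that earlier post and is not reproduced here. As a rough sketch of the approach (query Bing for each cmdlet name and scrape the reported result count), it could look like the lines below; the SQLPS module name comes from this post, while the result-count pattern is an assumption about Bing's current HTML and may need adjusting.

# Rough sketch (not the original script): estimate cmdlet "popularity" by Bing result count
Import-Module SQLPS -DisableNameChecking
Get-Command -Module SQLPS | % {
    $Name = $_.Name
    $Html = (Invoke-WebRequest "https://www.bing.com/search?q=%22$Name%22" -UseBasicParsing).Content
    # Assumption: the results page contains text like "70,700 results"; adjust the pattern if Bing's markup differs
    $Count = if ($Html -match '([\d,\.]+)\s+results') { [int64]($Matches[1] -replace '[^\d]') } else { 0 }
    [PSCustomObject]@{ CmdletName = $Name; BingCount = $Count }
}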

image

The results are listed below:

PS C:\> .\PopularCmdlet.ps1 | Sort BingCount -Descending | FT -AutoSize

CmdletName                               BingCount
----------                               ---------
Invoke-Sqlcmd                                70700
Backup-SqlDatabase                           34100
Restore-SqlDatabase                          20300
Get-SqlDatabase                               7250
Invoke-PolicyEvaluation                       7170
Enable-SqlAlwaysOn                            5460
Add-SqlAvailabilityDatabase                   4230
Test-SqlAvailabilityGroup                     4050
Get-SqlInstance                               3850
Encode-SqlName                                3040
Set-SqlAvailabilityReplica                    2970
Test-SqlAvailabilityReplica                   2680
Join-SqlAvailabilityGroup                     2350
Switch-SqlAvailabilityGroup                   2330
Test-SqlDatabaseReplicaState                  2250
Set-SqlHADREndpoint                           2230
Set-SqlAvailabilityGroupListener              1930
Remove-SqlAvailabilityDatabase                1920
Convert-UrnToPath                             1790
Decode-SqlName                                1690
Disable-SqlAlwaysOn                           1370
Remove-SqlAvailabilityReplica                 1290
Set-SqlAvailabilityGroup                      1100
Suspend-SqlAvailabilityDatabase               1070
Resume-SqlAvailabilityDatabase                1050
Remove-SqlAvailabilityGroup                    889
Add-SqlAvailabilityGroupListenerStaticIp        50
New-SqlBackupEncryptionOption                   34
Get-SqlSmartAdmin                               34
Set-SqlSmartAdmin                               31
Get-SqlCredential                               28
Set-SqlCredential                               25
Remove-SqlCredential                            23
Set-SqlAuthenticationMode                       20
Test-SqlSmartAdmin                              18
Start-SqlInstance                               15
Stop-SqlInstance                                15
Set-SqlNetworkConfiguration                     10
Add-SqlFirewallRule                              9
Remove-SqlFirewallRule                           9
New-SqlAvailabilityGroup                         5
New-SqlAvailabilityGroupListener                 4
New-SqlHADREndpoint                              3
New-SqlCredential                                3
New-SqlAvailabilityReplica                       3

PS C:\>

Windows Server 2012 R2 Storage PowerShell cmdlet popularity


If you follow the blog, you probably saw a little PowerShell script I published a while back to measure the popularity of the cmdlets in a certain module using a Bing search. As an example, that blog showed the popularity of the cmdlets in the SmbShare module.

Now I got curious about how the cmdlets in other modules would rank, so I spun up some Azure virtual machines to try some other modules. I decided to try the Storage module in Windows Server 2012 R2 (named simply Storage).

image

The results are listed below:

PS C:\> .\PopularCmdlet.ps1 | Sort BingCount -Descending | FT -AutoSize

CmdletName                             BingCount
----------                             ---------
New-Volume                                688000
New-Partition                             403000
Set-Volume                                268000
Get-Volume                                156000
Format-Volume                             111000
Set-Partition                              82300
Set-Disk                                   73700
Get-Disk                                   72500
Flush-Volume                               71300
Resize-Partition                           66000
Clear-Disk                                 65900
Repair-Volume                              63400
Get-Partition                              62100
Initialize-Disk                            56000
Update-Disk                                52000
Remove-Partition                           49000
Optimize-Volume                            38700
New-VirtualDisk                            29500
Mount-DiskImage                            25700
Get-VirtualDisk                            17900
Get-PhysicalDisk                           17400
Repair-VirtualDisk                         14600
Remove-PhysicalDisk                        11200
Set-VirtualDisk                            10600
Remove-VirtualDisk                          9340
Set-PhysicalDisk                            8250
Get-DiskImage                               7220
New-StoragePool                             7130
Dismount-DiskImage                          7080
Initialize-Volume                           7040
Get-StoragePool                             5860
Set-FileStorageTier                         4810
Resize-VirtualDisk                          4620
Get-StorageEnclosure                        4380
Get-StorageReliabilityCounter               4340
Connect-VirtualDisk                         4340
Add-PartitionAccessPath                     4330
Set-StoragePool                             4250
Get-StorageSubSystem                        4170
Get-StorageTier                             4080
Get-InitiatorPort                           4060
Set-FileIntegrity                           4030
Get-FileStorageTier                         3930
Add-PhysicalDisk                            3930
Reset-PhysicalDisk                          3790
Get-FileIntegrity                           3780
Clear-FileStorageTier                       3620
Get-StorageProvider                         3500
Set-ResiliencySetting                       3490
Get-PartitionSupportedSize                  3440
Get-MaskingSet                              3360
Unregister-StorageSubsystem                 3270
Repair-FileIntegrity                        3250
Add-InitiatorIdToMaskingSet                 3200
Set-StorageSubSystem                        3180
Remove-StoragePool                          3160
Get-StorageJob                              2910
Get-InitiatorId                               48
Get-ResiliencySetting                         47
Remove-PartitionAccessPath                    47
Get-TargetPort                                46
Add-TargetPortToMaskingSet                    45
Get-StorageTierSupportedSize                  44
Get-StorageSetting                            39
Resize-StorageTier                            39
Disconnect-VirtualDisk                        36
Get-StorageNode                               36
Set-StorageSetting                            35
Get-TargetPortal                              35
Get-OffloadDataTransferSetting                35
Set-StorageProvider                           32
Enable-PhysicalDiskIndication                 30
Add-VirtualDiskToMaskingSet                   29
Set-InitiatorPort                             29
Set-StorageTier                               27
Remove-StorageTier                            26
Remove-MaskingSet                             26
Remove-InitiatorIdFromMaskingSet              25
Get-VirtualDiskSupportedSize                  25
Remove-InitiatorId                            25
Remove-VirtualDiskFromMaskingSet              25
Get-SupportedFileSystems                      24
Hide-VirtualDisk                              24
Get-VolumeCorruptionCount                     23
Rename-MaskingSet                             23
Get-VolumeScrubPolicy                         22
Write-VolumeCache                             22
Get-SupportedClusterSizes                     21
Enable-StorageEnclosureIdentification         21
Remove-TargetPortFromMaskingSet               20
Get-StorageEnclosureVendorData                17
Set-VolumeScrubPolicy                         16
Disable-PhysicalDiskIndication                 9
Disable-StorageEnclosureIdentification         9
Register-StorageSubsystem                      8
Reset-StorageReliabilityCounter                7
Write-FileSystemCache                          7
Update-StoragePool                             4
New-VirtualDiskSnapshot                        3
New-VirtualDiskClone                           3
Update-StorageProviderCache                    3
Update-HostStorageCache                        3
New-MaskingSet                                 3
New-StorageSubsystemVirtualDisk                3
Show-VirtualDisk                               3
New-StorageTier                                3
Get-PhysicalDiskStorageNodeView                2
Get-PhysicalDiskSNV                            1

PS C:\>


Using PowerShell and Excel PivotTables to understand the files on your disk


 

Introduction

I am a big fan of two specific technologies that usually don’t get mentioned together: PowerShell and Excel PivotTables. It started when I was explaining PivotTables to someone and the main issue I had was finding a good set of example data that is familiar to everyone. That’s when it hit me. People using a computer have tons of files stored in their local disks and most don’t have a clue about those files. That’s the perfect example!

So I set out to document the steps for gathering information about the files on your local disk and extracting the most insight from it.

 

Step 1 – List the questions you need to answer

To start, here are a few questions you might have about the files in your local hard drive:

  • Which folder is storing the most files or using up the most space in your local disk?
  • What kind of data (pictures, music, video) is using the most space in your local disk?
  • What is the average size of the pictures you took this year?
  • How much of the files in your local disk was created in the last 30 days? Or this year?
  • Which day of the week do you create the most new pictures? Or PowerPoint presentations?

Now you could write a PowerShell script to answer any of those questions. It would in itself be a great programming exercise, but some would be quite tricky to code. However, those questions are just the tip of the iceberg. Given that dataset, you could come up with many, many more. So the point is that you would use Excel PivotTables to explore the data set and come up with the answers while interacting with it.
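
For instance, the "created in the last 30 days" question can be answered directly with a few lines like the sketch below (standard cmdlets only); the point of the PivotTable approach is that you do not have to write one of these per question:

# Total size of the files created in the last 30 days (folders and access errors skipped)
$Cutoff = (Get-Date).AddDays(-30)
$Recent = Dir C:\ -Recurse -ErrorAction SilentlyContinue |
          ? { -not $_.PSIsContainer -and $_.CreationTime -gt $Cutoff }
"{0:N0} files, {1:N0} bytes" -f $Recent.Count, ($Recent | Measure-Object Length -Sum).Sum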

 

Step 2 – Gather the required raw data

In any work with PivotTables and BI (Business Intelligence) in general, you need to identify the raw data that you can use to produce the answers to your questions. As you probably already figured out, we'll use PowerShell to query the file system and extract that data. Using the Get-ChildItem cmdlet (more commonly known by its alias, DIR), you can get information about each folder and file on the disk.

Now, with possibly hundreds of thousands of files, you want to make sure you gather only the necessary data. That will make it faster to obtain and will give Excel less data to chew on, which is always a good thing. Here's what you could use (running as an administrator) to get that information:

Dir C:\ -Recurse | Select FullName, Extension, Length, CreationTime, Attributes

Next, you want to make sure you transform it into a format that Excel can consume. Luckily, PowerShell has a cmdlet to transform data into Comma-Separated Values, also known as CSV. You also need to include something to avoid permission errors while accessing the data, and to output the results to a file so we can load it into Excel. Here's the final command line:

Dir \ -Recurse -ErrorAction SilentlyContinue | Select FullName, Extension, Length, CreationTime, Attributes | ConvertTo-Csv  -NoTypeInformation | Out-File C:\AllFiles.csv

That command will take several minutes to run, depending on the number of files on your disk, the speed of the disk and the speed of your computer. The resulting file can get quite big as well. In my case, it took a few minutes and the resulting file size was 135,359,136 bytes (or around 130MB).

 

Step 3 – Load into Excel and build the right table

With the AllFiles.csv file available, we can now load the raw data in Excel and start working with it. Just open Excel (I’m using Excel 2016 Preview) and load the CSV file. When importing, make sure to select “Delimited” in the first page of the wizard and check the “comma” checkbox in the second page.

clip_image001

clip_image002

Excel loaded the data and I ended up with 412,012 rows (including one row for the header). However, the formatting was a little lacking…

clip_image003

clip_image004

Next, I applied a format to each column for best results. You want to format the Length column as a number with comma separators and no decimals. To do that, select the third column and click to format it as a number.

clip_image005

You can also use the same process to format the fourth column with a more interesting date format.

clip_image006

Here’s what it looks like at this point.

clip_image007

Last but not least, you want to freeze the top row of the spreadsheet and format the whole thing as a table.

clip_image008

clip_image009

clip_image010

Here’s the final look for this phase:

clip_image011

 

Step 4 – Add fields that will help with your questions

While you have most of the data you need readily accessible, it helps to add some additional fields to your table. You could add those to your original PowerShell query (a PowerShell sketch of that is shown right after the list below), but Excel is probably better equipped to generate those extra columns on the fly.

Also, you might notice the need to add those only after you have played with the data a bit with Excel. That will also give you a chance to brush up on your Excel formula skills. In this example, we will add the following fields to the table:

  • CreatedYear – Year the file was created. Formula =YEAR([@CreationTime])
  • CreatedDays – Days since the file was created. Formula =TODAY()-[@CreationTime]
  • CreatedDow – Day of the week the file was created. Formula =WEEKDAY([@CreationTime])
  • IsFolder – True if the item is a folder, not a file. Formula =NOT(ISERROR(FIND("Directory",[@Attributes])))
  • TopFolder – First folder in the file name. Formula =IF(ISERROR(FIND("\",[@FullName],4)),"C:\",LEFT([@FullName],FIND("\",[@FullName],4)))
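
If you did prefer to compute these in PowerShell anyway, a sketch using calculated properties might look like this (the Excel formulas above are what this post actually uses; TopFolder is omitted here for brevity, and DayOfWeek returns the day name rather than Excel's 1-7 WEEKDAY number):

# Sketch: the same extra fields computed in PowerShell with calculated properties
Dir C:\ -Recurse -ErrorAction SilentlyContinue |
  Select FullName, Extension, Length, CreationTime, Attributes,
         @{ n='CreatedYear'; e={ $_.CreationTime.Year } },
         @{ n='CreatedDays'; e={ ((Get-Date) - $_.CreationTime).Days } },
         @{ n='CreatedDow';  e={ $_.CreationTime.DayOfWeek } },
         @{ n='IsFolder';    e={ $_.PSIsContainer } } |
  ConvertTo-Csv -NoTypeInformation | Out-File C:\AllFiles.csv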

Just insert the extra columns (right click column, click insert) and fill in the title and the formula. Excel will apply the formula to all cells in that column automatically. You will need to reformat the columns for CreatedYear, CreatedDays, CreatedDow to show as regular numbers, without any decimals.

clip_image012

 

Step 5 – Create a Pivot Table

With all the columns in place, you should proceed to create the PivotTable. Just click on a cell in the table and choose PivotTable under the "Insert" tab.

clip_image013

That will create an empty PivotTable with all the fields on the table available to you.

clip_image014

Now you just have to drag the fields to one of the four white boxes below the field list: Filters, Columns, Rows or Values. You will have options on how things are summarized (count, sum, average), how to format the data, how fields are sorted, etc.

To start, you can drag TopFolder to the Rows and Length to the Values. You should make adjustments to the “Count of Length” under Values to format as a number with no decimals.

clip_image015

You will also need to change the “More sort options” of the “TopFolder” field to sort on descending order by “Sum of Length”.

clip_image016

To avoid counting folders, you could add the IsFolder field to the filter box and then click on cell B1 to change the filter to false. Here’s what you should get: A sorted list of top-level folders with the number of files in each.

clip_image017

Simply by changing the settings in “Count of Length” to make it a sum, you get the list of top folders with the total size in bytes for each one:

clip_image018

Those two will answer the first question on our list: Which folder is storing the most files or using up the most space in your local disk?

 

Step 6 – Tweak the PivotTable to your heart’s content

Now you have everything you need to slice and dice the data, answering any of the questions posed at the beginning of this blog. Here are a few examples, with specific comments for each one. First, the top 20 extensions for the whole disk. Start by dragging Extension to the Rows, then filter by Top 10 and adjust:

clip_image019

So I have a lot of space used by programs (DLL, EXE), but also a fair amount of bytes used by music (WMA), videos (MP4) and pictures (JPG).

clip_image020

Next, I could filter to only the files under the C:\Users\ folder, which would exclude the operating system. After that, PowerPoint files jump up to number 4, right after music, videos and pictures.

clip_image021

If I want to look at the size of a particular kind of file, I would filter by that extension and add a few things to the values. To look at statistics of pictures I took this year, I dragged length to the values a few times and adjusted to do count, sum and max. I also moved the “∑ Values” from Columns to Rows. I finished by adding Created Year to the filters and selecting 2015.

clip_image022

Lastly, I looked at the breakdown of files by the day of the week they were created. I was looking at the total number of files created on a given day of the week, broken down by the top 20 file extensions. I had filters for user files only and also restricted it to files created in 2015. I removed the grand totals for this one as well. Apparently I did a lot of my file creation this year on this computer on Thursdays and Fridays.

clip_image023

Finally, here’s a more complex scenario showing a total of files, capacity, oldest year and largest size. I played with changing the default names of the values, which makes the labels a bit more readable. There are also multiple items in the Rows, creating a hierarchy. I’ll let you figure out how to get to this particular view.

clip_image024

 

Conclusion

I hope this post was a good example of all the things you can do with Excel PivotTables. In my view, this gets really powerful if you have an interesting data set to play with, which PowerShell and the file system were glad to provide for us. Let me know if you found this useful and share your experience with file/folder statistics, gathering data sets with PowerShell and PivotTables.

For my next data set, I was thinking about gathering some data about e-mails. There’s another thing that everyone has in large quantities…

Drive Performance Report Generator – PowerShell script using DiskSpd by Arnaud Torres


Arnaud Torres is a Senior Premier Field Engineer at Microsoft in France who sent me the PowerShell script below called “Drive Performance Report Generator”.

He created the script to test a wide range of profiles in one run to allow people to build a baseline of their storage using DiskSpd.EXE.

The script is written in PowerShell v1 and was tested on Windows Server 2008 SP2 (really!), Windows Server 2012 R2 and Windows 10.

It displays results in real time, is heavily commented and creates a text report which can be imported into Excel as CSV. (A small post-processing sketch follows after the script.)

 

Thanks to Arnaud for sharing!

 

———————-

 

# Drive Performance Report Generator
# by Arnaud TORRES
# Microsoft provides script, macro, and other code examples for illustration only, without warranty either expressed or implied, including but not
# limited to the implied warranties of merchantability and/or fitness for a particular purpose. This script is provided 'as is' and Microsoft does not
# guarantee that the following script, macro, or code can be used in all situations.
# Script will stress your computer CPU and storage, be sure that no critical workload is running

# Clear screen
Clear

write-host "DRIVE PERFORMANCE REPORT GENERATOR" -foregroundcolor green
write-host "Script will stress your computer CPU and storage layer (including network if applicable!), be sure that no critical workload is running" -foregroundcolor yellow
write-host "Microsoft provides script, macro, and other code examples for illustration only, without warranty either expressed or implied, including but not limited to the implied warranties of merchantability and/or fitness for a particular purpose. This script is provided 'as is' and Microsoft does not guarantee that the following script, macro, or code can be used in all situations." -foregroundcolor darkred
"   "
"Test will use all free space on drive minus 2 GB !"
"If there are less than 4 GB free test will stop"

# Disk to test
$Disk = Read-Host 'Which disk would you like to test ? (example : D:)'
# $Disk = "D:"
if ($disk.length -ne 2){"Wrong drive letter format used, please specify the drive as D:"
                         Exit}
if ($disk.substring(1,1) -ne ":"){"Wrong drive letter format used, please specify the drive as D:"
                         Exit}
$disk = $disk.ToUpper()

# Reset test counter
$counter = 0

# Use 1 thread / core
$Thread = "-t"+(Get-WmiObject win32_processor).NumberOfCores

# Set time in seconds for each run
# 10-120s is fine
$Time = "-d1"

# Outstanding IOs
# Should be 2 times the number of disks in the RAID
# Between 8 and 16 is generally fine
$OutstandingIO = "-o16"

# Disk preparation
# Delete testfile.dat if it exists
# The test will use all free space -2GB

$IsDir = test-path -path "$Disk\TestDiskSpd"
$isdir
if ($IsDir -like "False"){new-item -itemtype directory -path "$Disk\TestDiskSpd\"}
# Just a little security, in case we are working on a compressed drive ...
compact /u /s $Disk\TestDiskSpd\

$Cleaning = test-path -path "$Disk\TestDiskSpd\testfile.dat"
if ($Cleaning -eq "True")
{"Removing current testfile.dat from drive"
  remove-item $Disk\TestDiskSpd\testfile.dat}

$Disks = Get-WmiObject win32_logicaldisk
$LogicalDisk = $Disks | where {$_.DeviceID -eq $Disk}
$Freespace = $LogicalDisk.freespace
$FreespaceGB = [int]($Freespace / 1073741824)
$Capacity = $FreespaceGB - 2
$CapacityParameter = "-c"+$Capacity+"G"
$CapacityO = $Capacity * 1073741824

if ($FreespaceGB -lt 4)
{
       "Not enough space on the Disk ! More than 4GB needed"
       Exit
}

write-host " "
$Continue = Read-Host "You are about to test $Disk which has $FreespaceGB GB free, do you want to continue ? (Y/N) "
if ($continue -ne "y" -and $continue -ne "Y"){"Test Cancelled !!"
                                        Exit}

"   "
"Initialization can take some time, we are generating a $Capacity GB file..."
"  "

# Initialize output file
$date = get-date

# Add the tested disk and the date in the output file
"Disk $disk, $date" >> ./output.txt

# Add the headers to the output file
"Test N#, Drive, Operation, Access, Blocks, Run N#, IOPS, MB/sec, Latency ms, CPU %" >> ./output.txt

# Number of tests
# Multiply the number of loops to change this value
# By default there are : (4 block sizes) X (2 for read 100% and write 100%) X (2 for Sequential and Random) X (4 runs of each)
$NumberOfTests = 64

"  "
write-host "TEST RESULTS (also logged in .\output.txt)" -foregroundcolor yellow

# Begin test loops

# We will run the tests with 4K, 8K, 64K and 512K blocks
(4,8,64,512) | % {
$BlockParameter = ("-b"+$_+"K")
$Blocks = ("Blocks "+$_+"K")

# We will do Read tests and Write tests
  (0,100) | % {
      if ($_ -eq 0){$IO = "Read"}
      if ($_ -eq 100){$IO = "Write"}
      $WriteParameter = "-w"+$_

# We will do random and sequential IO tests
  ("r","si") | % {
      if ($_ -eq "r"){$type = "Random"}
      if ($_ -eq "si"){$type = "Sequential"}
      $AccessParameter = "-"+$_

# Each run will be done 4 times
  (1..4) | % {

      # The test itself (finally !!)
         $result = .\diskspd.exe $CapacityParameter $Time $AccessParameter $WriteParameter $Thread $OutstandingIO $BlockParameter -h -L $Disk\TestDiskSpd\testfile.dat

      # Now we will break the very verbose output of DiskSpd in a single line with the most important values
      foreach ($line in $result) {if ($line -like "total:*") { $total=$line; break } }
      foreach ($line in $result) {if ($line -like "avg.*") { $avg=$line; break } }
      $mbps = $total.Split("|")[2].Trim()
      $iops = $total.Split("|")[3].Trim()
      $latency = $total.Split("|")[4].Trim()
      $cpu = $avg.Split("|")[1].Trim()
      $counter = $counter + 1

      # A progress bar, for the fun
      Write-Progress -Activity ".\diskspd.exe $CapacityParameter $Time $AccessParameter $WriteParameter $Thread $OutstandingIO $BlockParameter -h -L $Disk\TestDiskSpd\testfile.dat" -status "Test in progress" -percentComplete ($counter / $NumberOfTests * 100)

      # Remove comment to check command line ".\diskspd.exe $CapacityParameter $Time $AccessParameter $WriteParameter $Thread $OutstandingIO $BlockParameter -h -L $Disk\TestDiskSpd\testfile.dat"

      # We output the values to the text file
      "Test $Counter,$Disk,$IO,$type,$Blocks,Run $_,$iops,$mbps,$latency,$cpu"  >> ./output.txt

      # We output a verbose format on screen
      "Test $Counter, $Disk, $IO, $type, $Blocks, Run $_, $iops iops, $mbps MB/sec, $latency ms, $cpu CPU"
}
}
}
}
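Once the run completes, a small, hypothetical post-processing sketch like the one below (not part of Arnaud's script) can rank the results without leaving PowerShell. It assumes output.txt came from a single run, so the first two lines are the disk/date line and the embedded column header, and it supplies its own clean column names:

# Read the report, skip the disk/date line and the embedded header, then rank by IOPS
$columns = 'Test','Drive','Operation','Access','Blocks','Run','IOPS','MBps','LatencyMs','CPU'
Get-Content .\output.txt |
    Select-Object -Skip 2 |
    ConvertFrom-Csv -Header $columns |
    Sort-Object { [double]$_.IOPS } -Descending |
    Select-Object -Property Operation, Access, Blocks, IOPS, MBps, LatencyMs -First 5 |
    Format-Table -AutoSize

The same file can of course still be imported into Excel as CSV, as intended.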

 

Twenty years as a Microsoft Certified Professional – time flies when you’re having fun


 

I just noticed that last week was the 20th anniversary of my first Microsoft certification. I had to travel nearly 500 miles (from Fortaleza to Recife) to reach the closest official testing center available in Brazil in August 1995.

You’re probably thinking that I started by taking the Windows 95 exam, but it was actually the Windows 3.1 exam (which included a lot of MS-DOS 6.x stuff). The Windows 95 exam was my next one, but that only happened over a year later in December 1996.

I went on to take absolutely all of the Windows NT 4.0 and Windows 2000 exams (many of them in their beta version). At that point we had multiple Microsoft Certified Partners in Fortaleza and I worked for one of them.

I continued to take lots of exams even after I moved to the US in October 2000 and after I joined Microsoft in October 2002. I only slowed down a bit after joining the Windows Server engineering team in October 2007.

In 2009 I achieved my last certification as a Microsoft Certified Master on SQL Server 2008. That took a few weeks of training, a series of written exams and a final, multi-hour lab exam. Exciting stuff! That also later granted me charter certifications as Microsoft Certified Solutions Master (Data Platform), Microsoft Certified Solutions Expert (Data Platform) and Microsoft Certified Solutions Associate (SQL Server 2012).

My full list is shown below. In case you’re wondering, the Windows 10 exam (Configuring Windows Devices) is already in development and you can find the details at https://www.microsoft.com/learning/en-us/exam-70-697.aspx.

 

image

image

image

Raw notes from the Storage Developer Conference 2015 (SNIA SDC 2015)


Notes and disclaimers:

  • This blog post contains raw notes for some of the SNIA’s SDC 2015 presentations (SNIA’s Storage Developers Conference 2015)
  • These notes were typed during the talks and they may include typos and my own misinterpretations.
  • Text in the bullets under each talk consists of quotes from the speaker or text from the speaker’s slides, not my personal opinion.
  • If you feel that I misquoted you or badly represented the content of a talk, please add a comment to the post.
  • I spent limited time fixing typos or correcting the text after the event. There are only so many hours in a day…
  • I have not attended all sessions (since there are many being delivered at a time, that would actually not be possible :-)…
  • SNIA usually posts the actual PDF decks a few weeks after the event. Attendees have access immediately.
  • You can find the event agenda at http://www.snia.org/events/storage-developer/agenda

 

Understanding the Intel/Micron 3D XPoint Memory
Jim Handy, General Director, Objective Analysis

  • Memory analyst, SSD analyst, blogs: http://thememoryguy.com, http://thessdguy.com
  • Not much information available since the announcement in July: http://newsroom.intel.com/docs/DOC-6713
  • Agenda: What? Why? Who? Is the world ready for it? Should I care? When?
  • What: Picture of the 3D XPoint concept (pronounced 3d-cross-point). Micron’s photograph of “the real thing”.
  • Intel has researched PCM for 45 years. Mentioned in an Intel article in “Electronics” on Sep 28, 1970.
  • The many elements that have been tried shown in the periodic table of elements.
  • NAND laid the path to the increased hierarchy levels. Showed prices of DRAM/NAND from 2001 to 2015. Gap is now 20x.
  • Comparing bandwidth to price per gigabytes for different storage technologies: Tape, HDD, SSD, 3D XPoint, DRAM, L3, L2, L1
  • Intel diagram mentions PCM-based DIMMs (far memory) and DDR DIMMs (near memory).
  • Chart with latency for HDD SAS/SATA, SSD SAS/SATA, SSD NVMe, 3D XPoint NVMe – how much of it is the media, how much is the software stack?
  • 3D Xpoint’s place in the memory/storage hierarchy. IOPS x Access time. DRAM, 3D XPoint (Optane), NVMe SSD, SATA SSD
  • Great gains at low queue depth. 800GB SSD read IOPS using 16GB die. IOPS x queue depth of NAND vs. 3D XPoint.
  • Economic benefits: measuring $/write IOPS for SAS HDD, SATA SSD, PCIe SSD, 3D XPoint
  • Timing is good because: DRAM is running out of speed, NVDIMMs are catching on, some sysadmins understand how to use flash to reduce DRAM needs
  • Timing is bad because: Nobody can make it economically, no software supports SCM (storage class memory), new layers take time to establish
  • Why should I care: better cost/perf ratio, lower power consumption (less DRAM, more perf/server, lower OpEx), in-memory DB starts to make sense
  • When? Micron slide projects 3D XPoint at end of FY17 (two months ahead of CY). Same slide shows NAND production surpassing DRAM production in FY17.
  • Comparing average price per GB compared to the number of GB shipped over time. It takes a lot of shipments to lower price.
  • Looking at the impact in the DRAM industry if this actually happens. DRAM slows down dramatically starting in FY17, as 3D XPoint revenues increase (optimistic).

 

Next Generation Data Centers: Hyperconverged Architectures Impact On Storage
Mark OConnell, Distinguished Engineer, EMC

  • History: Client/Server –> shared SANs –> Scale-Out systems
  • >> Scale-Out systems: architecture, expansion, balancing
  • >> Evolution of the application platform: physical servers –> virtualization –> virtualized application farm
  • >> Virtualized application farms and storage: local storage –> shared storage (SAN) –> scale-out storage –> hyper-converged
  • >> Early hyper-converged systems: HDFS (Hadoop) –> JVM/Tasks/HDFS in every node
  • Effects of hyper-converged systems
  • >> Elasticity (compute/storage density varies)
  • >> App management, containers, app frameworks
  • >> Storage provisioning: frameworks (openstack swift/cinder/manila), pure service architectures
  • >> Hybrid cloud enablement. Apps as self-describing bundles. Storage as a dynamically bound service. Enables movement off-prem.

 

Implications of Emerging Storage Technologies on Massive Scale Simulation Based Visual Effects
Yahya H. Mirza, CEO/CTO, Aclectic Systems Inc

  • Steve Jobs quote: “You‘ve got to start with the customer experience and work back toward the technology”.
  • Problem 1: Improve customer experience. Higher resolution, frame rate, throughput, etc.
  • Problem 2: Production cost continues to rise.
  • Problem 3: Time to render single frame remains constant.
  • Problem 4: Render farm power and cooling increasing. Coherent shared memory model.
  • How do you reduce customer CapEx/OpEx? Low efficiency: 30% CPU. Problem is memory access latency and I/O.
  • Production workflow: modeling, animation/simulation/shading, lighting, rendering, compositing. More and more simulation.
  • Concrete production experiment: 2005. Story boards. Attempt to create a short film. Putting himself in the customer’s shoes. Shot decomposition.
  • Real 3-minute short costs $2 million. Animatic to pitch the project.
  • Character modeling and development. Includes flesh and muscle simulation. A lot of it done procedurally.
  • Looking at Disney’s “Big Hero 6”, DreamWorks’ “Puss in Boots” and Weta’s “The Hobbit”, including simulation costs, frame rate, resolution, size of files, etc.
  • Physically based rendering: global illumination effects, reflection, shadows. Comes down to light transport simulation, physically based materials description.
  • Exemplary VFX shot pipeline. VFX Tool (Houdini/Maya), Voxelized Geometry (OpenVDB), Scene description (Alembic), Simulation Engine (PhysBam), Simulation Farm (RenderFarm), Simulation Output (OpenVDB), Rendering Engine (Mantra), Render Farm (RenderFarm), Output format (OpenEXR), Compositor (Flame), Long-term storage.
  • One example: smoke simulation – reference model smoke/fire VFX. Complicated physical model. Hotspot algorithms: monte-carlo integration, ray-intersection test, linear algebra solver (multigrid).
  • Storage implications. Compute storage (scene data, simulation data), Long term storage.
  • Is public cloud computing viable for high-end VFX?
  • Disney’s data center. 55K cores across 4 geos.
  • Vertically integrated systems are going to be more and more important. FPGAs, ARM-based servers.
  • Aclectic Colossus smoke demo. Showing 256x256x256.
  • We don’t want coherency; we don’t want sharing. Excited about Intel OmniPath.
  • http://www.intel.com/content/www/us/en/high-performance-computing-fabrics/omni-path-architecture-fabric-overview.html

 

How Did Human Cells Build a Storage Engine?
Sanjay Joshi, CTO Life Sciences, EMC

  • Human cell, Nuclear DNA, Transcription and Translation, DNA Structure
  • The data structure: [char(3*10^9) human_genome] strand
  • 3 gigabases [(3*10^9)*2]/8 = ~750MB. With overlaps, ~1GB per cell. 15-70 trillion cells.
  • Actual files used to store genome are bigger, between 10GB and 4TB (includes lots of redundancy).
  • Genome sequencing will surpass all other data types by 2040
  • Protein coding portion is just a small portion of it. There’s a lot we don’t understand.
  • Nuclear DNA: Is it a file? Flat file system, distributed, asynchronous. Search header, interpret, compile, execute.
  • Nuclear DNA properties: Large:~20K genes/cell, Dynamic: append/overwrite/truncate, Semantics: strict, Consistent: No, Metadata: fixed, View: one-to-many
  • Mitochondrial DNA: Object? Distributed hash table, a ring with 32 partitions. Constant across generations.
  • Mitochondrial DNA: Small: ~40 genes/cell, Static: constancy, energy functions, Semantics: single origin, Consistent: Yes, Metadata: system based, View: one-to-one
  • File versus object. Comparing Nuclear DNA and Mitochondrial DNA characteristics.
  • The human body: 7,500 named parts, 206 regularly occurring bones (newborns close to 300), ~640 skeletal muscles (320 pairs), 60+ organs, 37 trillion cells. Distributed cluster.
  • Mapping the ISO 7 layers to this system. Picture.
  • Finite state machine: max 10^45 states at 4*10^53 state-changes/sec. 10^24 NOPS (nucleotide ops per second) across biosphere.
  • Consensus in cell biology: Safety: under all conditions: apoptosis. Availability: billions of replicate copies. Not timing dependent: asynchronous. Command completion: 10 base errors in every 10,000 protein translation (10 AA/sec).
  • Object vs. file. Object: Maternal, Static, Haploid. Small, Simple, Energy, Early. File: Maternal and paternal, Diploid. Scalable, Dynamic, Complex. All cells are female first.

 

Move Objects to LTFS Tape Using HTTP Web Service Interface
Matt Starr, Chief Technical Officer, Spectra Logic
Jeff Braunstein, Developer Evangelist, Spectra Logic

  • Worldwide data growth: 2009 = 800 EB, 2015 = 6.5ZB, 2020 = 35ZB
  • Genomics. 6 cows = 1TB of data. They keep it forever.
  • Video data. SD to Full HD to 4K UHD (4.2TB per hours) to 8K UHD. Also kept forever.
  • Intel slide on the Internet minute. 90% of the people of the world never took a picture with anything but a camera phone.
  • IOT – Total digital info create or replicated.
  • $1000 genome scan takes 780MB fully compressed. 2011 HiSeq-2000 scanner generates 20TB per month. Typical camera generates 105GB/day.
  • More and more examples.
  • Tape storage is the lowest cost. But it’s also complex to deploy. Comparing to Public and Private cloud…
  • Pitfalls of public cloud – chart of $/PB/day. OpEx per PB/day reaches very high for public cloud.
  • Risk of public cloud: Amazon has 1 trillion objects. If they lose 1%, it would be 10 billion objects.
  • Risk of public cloud: Nirvanix. VC pulled the plug in September 2013.
  • Cloud: Good: toolkits, naturally WAN friendly, user expectation: put it away.
  • What if: Combine S3/Object with tape. Spectra S3 – Front end is REST, backend is LTFS tape.
  • Cost: $.09/GB. 7.2PB. Potentially a $0.20 two-copy archive.
  • Automated: App or user-built. Semi-Automated: NFI or scripting.
  • Information available at https://developer.spectralogic.com
  • All the tools you need to get started. Including simulator of the front end (BlackPearl) in a VM.
  • S3 commands, plus data to write sequentially in bulk fashion.
  • Configure user for access, buckets.
  • Deep storage browser (source code on GitHub) allows you to browse the simulated storage.
  • SDK available in Java, C#, many others. Includes integration with Visual Studio (demonstrated).
  • Showing sample application. 4 lines of code from the SDK to move a folder to tape storage.
  • Q: Access times when not cached? Hours or minutes. Depends on if the tape is already in the drive. You can ask to pull those to cache, set priorities. By default GET has higher priority than PUT. 28TB or 56TB of cache.
  • Q: Can we use CIFS/NFS? Yes, there is an NFI (Network File Interface) using CIFS/NFS, which talks to the cache machine. Manages time-outs.
  • Q: Any protection against this being used as disk? System monitors health of the tape. Using an object-based interface helps.
  • Q: Can you stage a file for some time, like 24h? There is a large cache. But there are no guarantees on the latency. Keeping it on cache is more like Glacier. What’s the trigger to bring the data?
  • Q: Glacier? Considering support for it. Data policy to move to lower cost, move it back (takes time). Not a lot of product or customers demanding it. S3 has become the standard, not sure if Glacier will be that for archive.
  • Q: Drives are a precious resource. How do you handle overload? By default, reads have precedence over writes. Writes usually can wait.

 

Taxonomy of Differential Compression
Liwei Ren, Scientific Adviser, Trend Micro

  • Mathematical model for describing file differences
  • Lossless data compression categories: data compression (one file), differential compression (two files), data deduplication (multiple files)
  • Purposes: network data transfer acceleration and storage space reduction
  • Areas for DC – mobile phones’ firmware over the air, incremental update of files for security software, file synchronization and transfer over WAN, executable files
  • Math model – Diff procedure: Delta = T – R, Merge procedure: T = R + Delta. Model for reduced network bandwidth, reduced storage cost.
  • Applications: backup, revision control system, patch management, firmware over the air, malware signature update, file sync and transfer, distributed file system, cloud data migration
  • Diff model. Two operations: COPY (source address, size [, destination address]), ADD (data block, size [, destination address]). A toy merge sketch follows after these notes.
  • How to create the delta? How to encode the delta into a file? How to create the right sequence of COPY/ADD operations?
  • Top task is an effective algorithm to identify common blocks. Not covering it here, since it would take more than half an hour…
  • Modeling a diff package. Example.
  • How do you measure the efficiency of an algorithm? You need a cost model.
  • Categorizing: Local DC – LDC (xdelta, zdelta, bsdiff), Remote DC – RDC (rsync, RDC protocol, tsync), Iterative – IDC (proposed)
  • Categorizing: Not-in-place merging: general files (xdelta, zdelta, bsdiff), executable files (bsdiff, courgette)
  • Categorizing: In place merging: firmware as general files (FOTA), firmware as executable files (FOTA)
  • Topics in depth: LDC vs RDC vs IDC for general files
  • Topics in depth: LDC for executable files
  • Topics in depth: LDC for in-place merging
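As a toy illustration of the diff/merge model above (my own sketch, not code from the talk), here is the merge procedure T = R + Delta in PowerShell, with the delta expressed as a list of COPY and ADD operations:

function Merge-Delta {
    param([byte[]]$Reference, [object[]]$Delta)
    $target = New-Object System.Collections.Generic.List[byte]
    foreach ($op in $Delta) {
        switch ($op.Op) {
            # COPY reuses a block from the reference file R
            'COPY' { $target.AddRange([byte[]]($Reference[$op.Offset..($op.Offset + $op.Size - 1)])) }
            # ADD carries literal data that is not present in R
            'ADD'  { $target.AddRange([byte[]]$op.Data) }
        }
    }
    return ,$target.ToArray()
}

# Example: rebuild "HELLO WORLD!" from the reference "HELLO" plus a literal tail
$ref   = [System.Text.Encoding]::ASCII.GetBytes('HELLO')
$delta = @(
    @{ Op = 'COPY'; Offset = 0; Size = 5 },
    @{ Op = 'ADD';  Data = [System.Text.Encoding]::ASCII.GetBytes(' WORLD!') }
)
[System.Text.Encoding]::ASCII.GetString((Merge-Delta -Reference $ref -Delta $delta))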

 

New Consistent Hashing Algorithms for Data Storage
Jason Resch, Software Architect, Cleversafe

  • Introducing a new algorithm for hashing.
  • Hashing is useful. Used commonly is distributed storage, distributed caching.
  • Independent users can coordinate (readers know where writers would write without talking to them).
  • Typically, resizing a Hash Table is inefficient. Showing example.
  • That’s why we need “Stable Hashing”. Showing example. Only a small portion of the keys need to be re-mapped.
  • Stable hashing becomes a necessity when system is stateful and/or transferring state is expensive,
  • Used in Caching/Routing (CARP), DHT/Storage (Gluster, DynamoDB, Cassandra, ceph, openstack)
  • Stable Hashing with Global Namespaces. If you have a file name, you know what node has the data.
  • Eliminates points of contention, no metadata systems. Namespace is fixed, but the system is dynamic.
  • Balances read/write load across nodes, as well as storage utilization across nodes.
  • Perfectly Stable Hashing (Rendezvous Hashing, Consistent Hashing). Precisely weighted (CARP, RUSH, CRUSH).
  • It would be nice to have something that would offer the characteristics of both.
  • Consistent: buckets inserted in random positions. Keys maps to the next node greater than that key. With a new node, only neighbors as disrupted. But neighbor has to send data to new node, might not distribute keys evenly.
  • Rendezvous: Score = Hash (Bucket ID || Key). Bucket with the highest score wins. When adding a new node, some of the keys will move to it. Every node is disrupted evenly.
  • CARP is rendezvous hashing with a twist. It multiples the scores by a “Load Factor” for each node. Allows for some nodes being more capable than others. Not perfectly stable: if node’s weighting changes or node is added, then all load factor must be recomputed.
  • RUSH/CRUSH: Hierarchical tree, with each node assigned a probability to go left/right. CRUSH makes the tree match the fault domains of the system. Efficient to add nodes, but not to remove or re-weight nodes.
  • New algorithm: Weighted Rendezvous Hashing (WRH). Both perfectly stable and precisely weighted.
  • WRH adjusts scores before weighting them. Unlike CARP, scores aren’t relatively scaled.
  • No unnecessary transfer of keys when adding/removing nodes. If adding node or increasing weight on node, other nodes will move keys to it, but nothing else. Transfers are equalized and perfectly efficient.
  • WRH is simple to implement. Whole python code showed in one slide.
  • All the magic is in one line: “Score = 1.0 / -math.log(hash_f)” – Proof of correctness provided for the math inclined. A rough PowerShell rendering of the idea follows after these notes.
  • How Cleversafe uses WRH. System is grown by set of devices. Devices have a lifecycle: added, possibly expanded, then retired.
  • Detailed explanation of the lifecycle and how keys move as nodes are added, expanded, retired.
  • Storage Resource Map. Includes weight, hash_seed. Hash seed enables a clever trick to retire device sets more efficiently.
  • Q: How to find data when things are being moved? If clients talk to the old node while keys are being moved. Old node will proxy the request to the new node.
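For readers who want to experiment with the idea, here is a rough PowerShell approximation of Weighted Rendezvous Hashing as described above. This is my own sketch, not Cleversafe's code; the node names and weights are made up, and MD5 is used only as a convenient hash. Each node's score is weight / -ln(hash), and the highest score wins:

function Get-WrhNode {
    param([string]$Key, [hashtable]$Nodes)   # $Nodes maps node name -> weight
    $md5 = [System.Security.Cryptography.MD5]::Create()
    $best = $null
    $bestScore = [double]::NegativeInfinity
    foreach ($node in $Nodes.Keys) {
        # Hash (node || key) and map the first 8 bytes to a float in (0,1)
        $bytes = [System.Text.Encoding]::UTF8.GetBytes("$node|$Key")
        $hash  = $md5.ComputeHash($bytes)
        $hashF = ([System.BitConverter]::ToUInt64($hash, 0) + 1.0) / ([double][uint64]::MaxValue + 2.0)
        $score = $Nodes[$node] * (1.0 / -[math]::Log($hashF))
        if ($score -gt $bestScore) { $bestScore = $score; $best = $node }
    }
    return $best
}

# Example: place a few keys across three unevenly weighted nodes
'object-1','object-2','object-3' | ForEach-Object {
    "{0} -> {1}" -f $_, (Get-WrhNode -Key $_ -Nodes @{ 'node-a' = 1; 'node-b' = 2; 'node-c' = 1 })
}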

 

Storage Class Memory Support in the Windows Operating System
Neal Christiansen, Principal Development Lead, Microsoft

  • Windows support for non-volatile storage medium with RAM-like performance is a big change.
  • Storage Class Memory (SCM): NVDIMM, 3D XPoint, others
  • Microsoft involved with the standardization efforts in this space.
  • New driver model necessary: SCM Bus Driver, SCM Disk Driver.
  • Windows Goals for SCM: Support zero-copy access, run most user-mode apps unmodified, option for 100% backward compatibility (new types of failure modes), sector granular failure modes for app compat.
  • Applications make lots of assumptions on the underlying storage
  • SCM Storage Drivers will support BTT – Block Translation Table. Provides sector-level atomicity for writes.
  • SCM is disruptive. Fastest performance and application compatibility can be conflicting goals.
  • SCM-aware File Systems for Windows. Volume modes: block mode or DAS mode (chosen at format time).
  • Block Mode Volumes – maintain existing semantics, full application compatibility
  • DAS Mode Volumes – introduce new concepts (memory mapped files, maximizes performance). Some existing functionality is lost. Supported by NTFS and ReFS.
  • Memory Mapped IO in DAS mode. Application can create a memory mapped section. Allowed when volumes resides on SCM hardware and the volume has been formatted for DAS mode.
  • Memory Mapped IO: True zero copy access. BTT is not used. No paging reads or paging writes.
  • Cached IO in DAS Mode: Cache manager creates a DAS-enabled cache map. Cache manager will copy directly between user’s buffer and SCM. Coherent with memory-mapped IO. App will see new failure patterns on power loss or system crash. No paging reads or paging writes.
  • Non-cached IO in DAS Mode. Will send IO down the storage stack to the SCM driver. Will use BTT. Maintains existing storage semantics.
  • If you really want the performance, you will need to change your code.
  • DAS mode eliminates traditional hook points used by the file system to implement features.
  • Features not in DAS Mode: NTFS encryption, NTS compression, NTFS TxF, ReFS integrity streams, ReFS cluster band, ReFS block cloning, Bitlocker volume encryption, snapshot via VolSnap, mirrored or parity via storage spaces or dynamic disks
  • Sparse files won’t be there initially but will come in the future.
  • Updated at the time the file is memory mapped: file modification time, mark file as modified in the USN journal, directory change notification
  • File System Filters in DAS mode: no notification that a DAS volume is mounted, filter will indicate via a flag if they understand DAS mode semantics.
  • Application compatibility with filters in DAS mode: No opportunity for data transformation filters (encryption, compression). Anti-virus are minimally impacted, but will need to watch for creation of writeable mapped sections (no paging writes anymore).
  • Intel NVLM library. Open source library implemented by Intel. Defines set of application APIs for directly manipulating files on SCM hardware.
  • NVLM library available for Linux today via GitHub. Microsoft working with Intel on a Windows port.
  • Q: XIP (Execute in place)? It’s important, but the plans have not solidified yet.
  • Q: NUMA? Can be in NUMA nodes. Typically, the file system and cache are agnostic to NUMA.
  • Q: Hyper-V? Not ready to talk about what we are doing in that area.
  • Q: Roll-out plan? We have one, but not ready to talk about it yet.
  • Q: Data forensics? We’ve yet to discuss this with that group. But we will.
  • Q: How far are you to completion? It’s running and working today. But it is not complete.
  • Q: Windows client? To begin, we’re targeting the server. Because it’s available there first.
  • Q: Effect on performance? When we’re ready to announce the schedule, we will announce the performance. The data about SCM is out there. It’s fast!
  • Q: Will you backport? Probably not. We generally move forward only. Not many systems with this kind of hardware will run a down level OS.
  • Q: What languages for the Windows port of NVML? Andy will cover that in his talk tomorrow.
  • Q: How fast will memory mapped be? Potentially as fast as DRAM, but depends on the underlying technology.

 

The Bw-Tree Key-Value Store and Its Applications to Server/Cloud Data Management in Production
Sudipta Sengupta, Principal Research Scientist, Microsoft Research

  • The B-Tree: key-ordered access to records. Balanced tree via page split and merge mechanisms.
  • Design tenets: Lock free operation (high concurrency), log-structure storage (exploit flash devices with fast random reads and inefficient random writes), delta updates to pages (reduce cache invalidation, garbage creation)
  • Bw-Tree Architecture: 3 layers: B-Tree (expose API, B-tree search/update, in-memory pages), Cache (logical page abstraction, move between memory and flash), Flash (reads/writes from/to storage, storage management).
  • Mapping table: Expose logical pages to access method layer. Isolates updates to single page. Structure for lock-free multi-threaded concurrency control.
  • Highly concurrent page updates with Bw-Tree. Explaining the process using a diagram.
  • Bw-Tree Page Split: No hard threshold for splitting unlike in classical B-Tree. B-link structure allows “half-split” without locking.
  • Flash SSDs: Log-Structured storage. Use log structure to exploit the benefits of flash and work around its quirks: random reads are fast, random in-place writes are expensive.
  • LLAMA Log-Structured Store: Amortize cost of writes over many page updates. Random reads to fetch a “logical page”.
  • Depart from tradition: logical page formed by linking together records on multiple physical pages on flash. Adapted from SkimpyStash.
  • Detailed diagram comparing traditional page writing with the writing optimized storage organization with Bw-Tree.
  • LLAMA: Optimized Logical Page Reads. Multiple delta records are packed when flushed together. Pages consolidated periodically in memory also get consolidated on flash when flushed.
  • LLAMA: Garbage collection on flash. Two types of record units in the log: Valid or Orphaned. Garbage collection starts from the oldest portion of the log. Earliest written record on a logical page is encountered first.
  • LLAMA: cache layer. Responsible for moving pages back and forth from storage.
  • Bw-Tree Checkpointing: Need to flush to buffer and to storage. LLAMA checkpoint for fast recovery.
  • Bw-Tree Fast Recovery. Restore mapping table from latest checkpoint region. Warm-up using sequential I/O.
  • Bw-Tree: Support for transactions. Part of the Deuteronomy Architecture.
  • End-to-end crash recovery. Data component (DC) and transactional component (TC) recovery. DC happens before TC.
  • Bw-Tree in production: Key-sequential index in SQL Server in-memory database
  • Bw-Tree in production: Indexing engine in Azure DocumentDB. Resource governance is important (CPU, Memory, IOPS, Storage)
  • Bw-Tree in production: Sorted key-value store in Bing ObjectStore.
  • Summary: Classic B-Tree redesigned for modern hardware and cloud. Lock-free, delta updating of pages, log-structure, flexible resource governor, transactional. Shipping in production.
  • Going forward: Layer transactional component (Deuteronomy Architecture, CIDR 2015), open-source the codebase

 

ReFS v2: Cloning, Projecting, and Moving Data
J.R. Tipton, Development Lead, Microsoft

  • Agenda: ReFS v1 primer, ReFS v2 at a glance, motivations for ReFS v2, cloning, translation, transformation
  • ReFS v1 primer: Windows allocate-on-write file system, Merkel trees verify metadata integrity, online data correction from alternate copies, online chkdsk
  • ReFS v2: Available in Windows Server 2016 TP4. Efficient, reliable storage for VMs, efficient parity, write tiering, read caching, block cloning, optimizations
  • Motivations for ReFS v2: cheap storage does not mean slow, VM density, VM provisioning, more hardware flavors (SLC, MLC, TLC flash, SMR)
  • Write performance. Magic does not work in a few environments (super fast hardware, small random writes, durable writes/FUA/sync/write-through)
  • ReFS Block Cloning: Clone any block of one file into any other block in another file. Full file clone, reorder some or all data, project data from one area into another without copy
  • ReFS Block Cloning: Metadata only operation. Copy-on-write used when needed (ReFS knows when).
  • Cloning examples: deleting a Hyper-V VM checkpoint, VM provisioning from image.
  • Cloning observations: app directed, avoids data copies, metadata operations, Hyper-V is the first but not the only one using this
  • Cloning is no free lunch: multiple valid copies will copy-on-write upon changes. metadata overhead to track state, slam dunk in most cases, but not all
  • ReFS cluster bands. Volume internally divvied up into bands that contain regular FS clusters (4KB, 64KB). Mostly invisible outside file system. Bands and clusters track independently (per-band metadata). Bands can come and go.
  • ReFS can move bands around (read/write/update band pointer). Efficient write caching and parity. Writes to bands in fast tier. Tracks heat per band. Moves bands between tiers. More efficient allocation. You can move from 100% triple mirroring to 95% parity.
  • ReFS cluster bands: small writes accumulate where writing is cheap (mirror, flash, log-structured arena), bands are later shuffled to tier where random writes are expensive (band transfers are fully sequential).
  • ReFS cluster bands: transformation. ReFS can do stuff to the data in a band (can happen in the background). Examples: band compaction (put cold bands together, squeeze out free space), band compression (decompress on read).
  • ReFS v2 summary: data cloning, data movement, data transformation. Smart when smart makes sense, switches to dumb when dumb is better. Takes advantages of hardware combinations. And lots of other stuff…

 

Innovator, Disruptor or Laggard, Where Will Your Storage Applications Live? Next Generation Storage
Bev Crair, Vice President and General Manager, Storage Group, Intel

  • The world is changing: information growth,  complexity, cloud, technology.
  • Growth: 44ZB of data in all systems. 15% of the data is stored, since perceived cost is low.
  • Every minute of every day: 2013 : 8h of video uploaded to YouTube, 47,000 apps downloaded, 200 million e-mails
  • Every minute of every day: 2015 : 300h of video uploaded to YouTube, 51,000 apps downloaded, 204 million e-mails
  • Data never sleeps: the internet in real time. tiles showing activities all around the internet.
  • Data use pattern changes: sense and generate, collect and communicate, analyze and optimize. Example: HADRON collider
  • Data use pattern changes: from collection to analyzing data, valuable data now reside outside the organization, analyzing and optimizing unstructured data
  • Cloud impact on storage solutions: business impact, technology impact. Everyone wants an easy button
  • Intelligent storage: Deduplication, real-time compression, intelligent tiering, thin provisioning. All of this is a software problem.
  • Scale-out storage: From single system with internal network to nodes working together with an external network
  • Non-Volatile Memory (NVM) accelerates the enterprise: Examples in Virtualization, Private Cloud, Database, Big Data and HPC
  • Pyramid: CPU, DRAM, Intel DIMM (3D XPoint), Intel SSD (3D XPoint), NAND SSD, HDD,  …
  • Storage Media latency going down dramatically. With NVM, the bottleneck is now mostly in the software stack.
  • Future storage architecture: complex chart with workloads for 2020 and beyond. New protocols, new ways to attach.
  • Intel Storage Technologies. Not only hardware, but a fair amount of software. SPDK, NVMe driver, Acceleration Library, Lustre, others.
  • Why does faster storage matter? Genome testing for cancer takes weeks, and the cancer mutates. Genome is 10TB. If we can speed up the time it takes to test it to one day, it makes a huge difference and you can create a medicine that saves a person’s life. That’s why it matters.

  

The Long-Term Future of Solid State Storage
Jim Handy, General Director, Objective Analysis

  • How we got here? Why are we in the trouble we’re at right now? How do we get ahead of it? Where is it going tomorrow?
  • Establishing a schism: Memory is in bytes (DRAM, Cache, Flash?), Storage is in blocks (Disk, Tape, DVD, SAN, NAS, Cloud, Flash)
  • Is it really about block? Block, NAND page, DRAM pages, CPU cache lines. It’s all in pages anyway…
  • Is there another differentiator? Volatile vs. Persistent. It’s confusing…
  • What is an SSD? SSDs are nothing new. Going back to DEC Bulk Core.
  • Disk interfaces create delays. SSD vs HDD latency chart. Time scale in milliseconds.
  • Zooming in to tens of microseconds. Different components of the SSD delay. Read time, Transfer time, Link transfer, platform and adapter, software
  • Now looking at delays for MLC NAND ONFi2, ONFi3, PCIe x4 Gen3, future NVM on PCIe x4 Gen3
  • Changing the scale to tens of microseconds on future NVM. Link Transfer, Platform & adapter and Software now accounts for most of the latency.
  • How to move ahead? Get rid of the disk interfaces (PCIe, NVMe, new technologies). Work on the software: SNIA.
  • Why now? DRAM Transfer rates. Chart transfer rates for SDRAM, DDR, DDR2, DDR3, DDR4. Designing the bus takes most of the time.
  • DRAM running out of speed? We probably won’t see a DDR5. HMC or HBM a likely next step. Everything points to fixed memory sizes.
  • NVM to the rescue. DRAM is not the only upgrade path. It became cheaper to use NAND flash than DRAM to upgrade a PC.
  • NVM to be a new memory layer between DRAM & NAND: Intel/Micron 3D XPoint – “Optane”
  • One won’t kill the other. Future systems will have DRAM, NVM, NAND, HDD. None of them will go away…
  • New memories are faster than NAND. Chart with read bandwidth vs write bandwidth. Emerging NVRAM: FeRAM, eMRAM, RRAM, PRAM.
  • Complex chart with emerging research memories. Clock frequency vs. Cell Area (cost).
  • The computer of tomorrow. Memory or storage? In the beginning (core memory), there was no distinction between the two.
  • We’re moving to an era where you can turn off the computer, turn it back on and there’s something in memory. Do you trust it?
  • SCM – Storage Class Memory: high performance with archival properties. There are many other terms for it: Persistent Memory, Non-Volatile Memory.
  • New NVM has disruptively low latency: Log chart with latency budgets for HDD, SATA SSD, NVMe, Persistent. When you go below 10 microseconds (as Persistent does), context switching does not make sense.
  • Non-blocking I/O. NUMA latencies up to 200ns have been tolerated. Latencies below these cause disruption.
  • Memory mapped files eliminate file system latency.
  • The computer of tomorrow. Fixed DRAM size, upgradeable NVM (tomorrow’s DIMM), both flash and disk (flash on PCIe or own bus), much work needed on SCM software
  • Q: Will all these layers survive? I believe so. There are potential improvements in all of them (cited a few on NAND, HDD).
  • Q: Shouldn’t we drop one of the layers? Usually, adding layers (not removing them) is more interesting from a cost perspective.
  • Q: Do we need a new protocol for SCM? NAND did well without much of that. Alternative memories could be put on a memory bus.

 

Concepts on Moving From SAS connected JBOD to an Ethernet Connected JBOD
Jim Pinkerton, Partner Architect Lead, Microsoft

  • What if we took a JBOD, a simple device, and just put it on Ethernet?
  • Re-Thinking the Software-defined Storage conceptual model definition: compute nodes, storage nodes, flakey storage devices
  • Front-end fabric (Ethernet, IB or FC), Back-end fabric (directly attached or shared storage)
  • Yesterday’s Storage Architecture: Still highly profitable. Compute nodes, traditional SAN/NAS box (shipped as an appliance)
  • Today: Software Defined Storage (SDS) – “Converged”. Separate the storage service from the JBOD.
  • Today: Software Defined Storage (SDS) – “Hyper-Converged” (H-C). Everything ships in a single box. Scale-out architecture.
  • H-C appliances are a dream for the customer to install/use, but the $/GB storage is high.
  • Microsoft Cloud Platform System (CPS). Shipped as a packaged deal. Microsoft tested and guaranteed.
  • SDS with DAS – Storage layer divided into storage front-end (FE) and storage back-end (BE). The two communicate over Ethernet.
  • SDS Topologies. Going from Converged and Hyper-Converged to a future EBOD topology. From file/block access to device access.
  • Expose the raw device over Ethernet. The raw device is flaky, but we love it. The storage FE will abstract that, add reliability.
  • I would like to have an EBOD box that could provide the storage BE.
  • EBOD works for a variety of access protocols and topologies. Examples: SMB3 “block”, Lustre object store, Ceph object store, NVMe fabric, T10 objects.
  • Shared SAS Interop. Nightmare experience (disk multi-path interop, expander multi-path interop, HBA distributed failure). This is why customers prefer appliances.
  • To share or not to share. We want to share, but we do not want shared SAS. Customer deployment is more straightforward, but you have more traffic on Ethernet.
  • Hyper-Scale cloud tension – fault domain rebuild time. Depends on number of disks behind a node and how much network you have.
  • Fault domain for storage is too big. Required network speed offsets cost benefits of greater density. Many large disks behind a single node becomes a problem.
  • Private cloud tension – not enough disks. Entry points at 4 nodes, small number of disks. Again, fault domain is too large.
  • Goals in refactoring SDS – Storage back-end is a “data mover” (EBOD). Storage front-end is “general purpose CPU”.
  • EBOD goals – Can you hit a cost point that’s interesting? Reduce storage costs, reduce size of fault domain, build a more robust ecosystem of DAS. Keep topology simple, so customer can build it themselves.
  • EBOD: High end box, volume box, capacity box.
  • EBOD volume box should be close to what a JBOD costs. Basically like exposing raw disks.
  • Comparing current Hyper-Scale to EBOD. EBOD has an NIC and an SOC, in addition to the traditional expander in a JBOD.
  • EBOD volume box – Small CPU and memory, dual 10GbE, SOC with RDMA NIC/SATA/SAS/PCIe, up to 20 devices, SFF-8639 connector, management (IPMI, DMTF Redfish?)
  • Volume EBOD Proof Point – Intel Avaton, PCIe Gen 2, Chelsio 10GbE, SAS HBA, SAS SSD. Looking at random read IOPS (local, RDMA remote and non-RDMA remote). Max 159K IOPS w/RDMA, 122K IOPS w/o RDMA. Latency chart showing just a few msec.
  • EBOD Performance Concept – Big CPU, Dual attach 40GbE, Possibly all NVME attach or SCM. Will show some of the results this afternoon.
  • EBOD is an interesting approach that’s different from what we’re doing. But it’s nicely aligned with software-defined storage.
  • Price point of EBOD must be carefully managed, but the low price point enables a smaller fault domain.

  

Planning for the Next Decade of NVM Programming
Andy Rudoff, SNIA NVM Programming TWG, Intel

  • Looking at what’s coming up in the next decade, but will start with some history.
  • Comparison of data storage technologies. Emerging NV technologies with read times in the same order of magnitude as DRAM.
  • Moving the focus to software latency when using future NVM.
  • Is it memory or storage? It’s persistent (like storage) and byte-addressable (like memory).
  • Storage vs persistent memory. Block IO vs. byte addressable, sync/async (DMA master)  vs. sync (DMA slave). High capacity vs. growing capacity.
  • pmem: The new Tier. Byte addressable, but persistent. Not NAND. Can do small I/O. Can DMA to it.
  • SNIA TWG (lots of companies). Defining the NVM programming model: NVM.PM.FILE mode and NVM.PM.VOLUME mode.
  • All the OSes created in the last 30 years have a memory mapped file.
  • Is this stuff real? Why are we spending so much time on this? Yes – Intel 3D XPoint technology, the Intel DIMM. Showed a wafer on stage. 1000x faster than NAND. 1000X endurance of NAND, 10X denser than conventional memory. As much as 6TB of this stuff…
  • Timeline: Big gap between NAND flash memory (1989) and 3D XPoint (2015).
  • Diagram of the model with Management, Block, File and Memory access. Link at the end to the diagram.
  • Detecting pmem: Defined in the ACPI 6.0. Linux support upstream (generic DIMM driver, DAX, ext4+DAX, KVM).  Neal talked about Windows support yesterday.
  • Heavy OSV involvement in TWG, we wrote the spec together.
  • We don’t want every application to have to re-architecture itself. That’s why we have block and file there as well.
  • The next decade
  • Transparency levels: increasing barrier to adoption, increasing leverage. Could do it in layers. For instance, could be file system only, without app modification. For instance, could modify just the JVM to get significant advantages without changing the apps.
  • Comparing to multiple cores in hardware and multi-threaded programming. Took a decade or longer, but it’s commonplace now.
  • One transparent example: pmem Paging. Paging from the OS page cache (diagrams).
  • Attributes of paging : major page faults, memory looks much larger, page in must pick a victim, many enterprise apps opt-out, interesting example: Java GC.
  • What would it look like if you paged to pmem instead of paging to storage. I don’t even care that it’s persistent, just that there’s a lot of it.
  • I could kick a page out synchronously, probably faster than a context switch. But the app could access the data in pmem without swapping it in (that‘s new!). Could have policies for which app lives in which memory. The OS could manage that, with application transparency.
  • Would this really work? It will when pmem costs less, performance is close, capacity is significant and it is reliable. “We’re going to need a bigger byte” to hold error information.
  • Not just for pmem. Other memories technologies are emerging. High bandwidth memory, NUMA localities, different NVM technologies.
  • Extending into user space: NVM Library – pmem.io (64-bit Linux Alpha release). Windows is working on it as well.
  • That is a non-transparent example. It’s hard (like multi-threading). Things can fail in interesting new ways.
  • The library makes it easier and some of it is transactional.
  • No kernel interception point, for things like replication. No chance to hook above or below the file system. You could do it in the library.
  • Non-transparent use cases: volatile caching, in-memory database, storage appliance write cache, large byte-addressable data structures (hash table, dedup), HPC (checkpointing)
  • Sweet spots: middleware, libraries, in-kernel usages.
  • Big challenge: middleware, libraries. Is it worth the complexity.
  • Building a software ecosystem for pmem, cost vs. benefit challenge.
  • Prepare yourself: lean NVM programming model, map use cases to pmem, contribute to the libraries, software ecosystem

 

FS Design Around SMR: Seagate’s Journey and Reference System with EXT4
Adrian Palmer, Drive Development Engineering, Seagate Technologies

  • SNIA Tutorial. I’m talking about the standard, as opposed to the design of our drive.
  • SMR is being embraced by everyone, since this is a major change, a game changer.
  • Moving from random writes to a write profile that resembles sequential-access tape.
  • 1 new condition: forward-write preferred. ZAC/ZBD spec: T10/13. Zones, SCSI ZBC standards, ATA ZAC standards.
  • What is a file system? Essential software on a system, structured and unstructured data, stores metadata and data.
  • Basic FS requirements: Write-in-place (superblock, known location on disk), Sequential write (journal), Unrestricted write type (random or sequential)
  • Drive parameters: Sector (atomic unit of read/write access). Typically 512B size. Independently accessed. Read/write, no state.
  • Drive parameters: Zone (atomic performant rewrite unit). Typically 256 MiB in size. Indirectly addressed via sector. Modified with ZAC/ZBD commands. Each zone has state (WritePointer, Condition, Size, Type).
  • Write Profiles. Conventional (random access), Tape (sequential access), Flash (sequential access, erase blocks), SMR HA/HM (sequential access, zones). SMR write profile is similar to Tape and Flash.
  • Allocation containers. Drive capacities are increasing, location mapping is expensive. 1.56% with 512B blocks or 0.2% with 4KB blocks.
  • Remap the block device as a… block device. Partitions (w*sector size), Block size (x*sector size), Group size (y*Block size), FS (z*group size, expressed as blocks).
  • Zones are a good fit to be matched with Groups. Absorb and mirror the metadata, don’t keep querying drive for metadata.
  • Solving the sequential write problem. Separate the problem spaces with zones.
  • Dedicate zones to each problem space: user data, file records, indexes, superblock, trees, journal, allocation containers.
  • GPT/Superblocks: First and last zone (convention, not guaranteed). Update infrequently, and at dismount. Looks at known location and WritePointer. Copy-on-update. Organized wipe and update algorithm.
  • Journal/soft updates. Update very frequently, 2 or more zones, set up as a circular buffer. Checkpoint at each zone. Wipe and overwrite oldest zone. Can be used as NV cache for metadata. Requires lots of storage space for efficient use and NV.
  • Group descriptors: Infrequently changed. Changes on zone condition change, resize, free block counts. Write cached, but written at WritePointer. Organized as a B+Tree, not an indexed array. The B+Tree needs to be stored on-disk.
  • File Records: POSIX information (ctime, mtime, atime, msize, fs specific attributes), updated very frequently. Allows records to be modified in memory, written to journal cache, gather from journal, write to new blocks at WritePointer.
  • Mapping (file records to blocks). File ideally written as a single chunk (single pointer), but could become fragmented (multiple pointers). Can outgrow file record space, needs its own B+Tree. List can be in memory, in the journal, written out to disk at WritePointer.
  • Data: Copy-on-write. Allocator chooses blocks at WritePointer. Writes are broken at zone boundary, creating new command and new mapping fragment.
  • Cleanup: Cannot clean up as you go, need a separate step. Each zone will have holes. Garbage collection: Journal GC, Zones GC, Zone Compaction, Defragmentation.
  • Advanced features: indexes, queries, extended attributes, snapshots, checksums/parity, RAID/JBOD.

 

Azure File Service: ‘Net Use’ the Cloud
David Goebel, Software Engineer, Microsoft

  • Agenda: features and API (what), scenarios enabled (why), design of an SMB server not backed by a conventional FS (how)
  • It’s not the Windows SMB server (srv2.sys). Uses Azure Tables and Azure Blobs for the actual files.
  • Easier because we already have a highly available and distributed architecture.
  • SMB 2.1 in preview since last summer. SMB 3.0 (encryption, persistent handles) in progress.
  • Azure containers mapped as shares. Clients work unmodified out-of-the-box. We implemented the spec. (A sample mapping command follows after these notes.)
  • Share namespace is coherently accessible
  • MS-SMB2, not SMB1. Anticipates (but does not require) a traditional file system on the other side.
  • In some ways it’s harder, since what’s there is not a file system. We have multiple tables (for leases, locks, etc). Nice and clean.
  • SMB is a stateful protocol, while REST is all stateless. Some state is immutable (like FileId), some state is transient (like open counts), some is maintained by the client (like CreateGuid), some state is ephemeral (connection).
  • Diagram with the big picture. Includes DNS, load balancer, session setup & traffic, front-end node, azure tables and blobs.
  • Front-end has ephemeral and immutable state. Back-end has solid and fluid durable state.
  • Diagram with two clients accessing the same file and share, using locks, etc. All the state handled by the back-end.
  • Losing a front-end node considered a regular event (happens during updates), the client simple reconnects, transparently.
  • Current state, SMB 2.1 (SMB 3.0 in the works). 5TB per share and 1TB per file. 1,000 8KB IOPS per share, 60MB/sec per share. Some NTFS features not supported, some limitations on characters and path length (due to HTTP/REST restrictions).
  • Demo: I’m actually running my talk using a PPTX file on Azure File. Robocopy to file share. Delete, watch via explorer (notifications working fine). Watching also via wireshark.
  • Currently Linux Support. Lists specific versions Ubuntu Server, Ubuntu Core, CentOS, Open SUSE, SUSE Linux Enterprise Server.
  • Why: They want to move to cloud, but they can’t change their apps. Existing file I/O applications. Most of what was written over the last 30 years “just works”. Minor caveats that will become more minor over time.
  • Discussed specific details about how permissions are currently implemented. ACL support is coming.
  • Example: Encryption enabled scenario over the internet.
  • What about REST? SMB and REST access the same data in the same namespace, so a gradual application transition without disruption is possible. REST for container, directory and file operations.
  • The durability game. Modified state that normally exists only in server memory, which must be durably committed.
  • Examples of state tiering: ephemeral state, immutable state, solid durable state, fluid durable state.
  • Example: Durable Handle Reconnect. Intended for network hiccups, but stretched to also handle front-end reconnects. Limited our ability because of SMB 2.1 protocol compliance.
  • Example: Persistent Handles. Unlike durable handles, SMB 3 is actually intended to support transparent failover when a front-end dies. Seamless transparent failover.
  • Resource Links: Getting started blog (http://blogs.msdn.com/b/windowsazurestorage/archive/2014/05/12/introducing-microsoft-azure-file-service.aspx) , NTFS features currently not supported (https://msdn.microsoft.com/en-us/library/azure/dn744326.aspx), naming restrictions for REST compatibility (https://msdn.microsoft.com/library/azure/dn167011.aspx).
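
Since the share is just SMB, mounting it from Windows is a one-liner. A minimal sketch with placeholder account, share and key values, following the getting-started blog linked above:

$storageAccount = "mystorageacct"
$shareName      = "myshare"
$storageKey     = "<storage-account-key>"
net use Z: "\\$storageAccount.file.core.windows.net\$shareName" /u:$storageAccount $storageKey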

 

Software Defined Storage – What Does it Look Like in 3 Years?
Richard McDougall, Big Data and Storage Chief Scientist, VMware

  • How do you come up with a common, generic storage platform that serves the needs of application?
  • Bringing a definition of SDS. Major trends in hardware, what the apps are doing, cloud platforms
  • Storage workloads map. Many apps on 4 quadrants along 2 axes: capacity (10’s of Terabytes to 10’s of Petabytes) and IOPS (1K to 1M)
  • What are cloud-native applications? Developer access via API, continuous integration and deployment, built for scale, availability architected in the app, microservices instead of monolithic stacks, decoupled from infrastructure
  • What do Linux containers need from storage? Copy/clone root images, isolated namespace, QoS controls
  • Options to deliver storage to containers: copy whole root tree (primitive), fast clone using shared read-only images, clone via “Another Union File System” (aufs), leverage native copy-on-write file system.
  • Shared data: Containers can share a file system within a host or across hosts (new interest in distributed file systems)
  • Docker storage abstractions for containers: non-persistent boot environment, persistent data (backed by block volumes); a tiny volume example follows this list
  • Container storage use cases: unshared volumes, shared volumes, persist to external storage (API to cloud storage)
  • Eliminate the silos: converged big data platform. Diagram shows Hadoop, HBase, Impala, Pivotal HawQ, Cassandra, Mongo, many others. HDFS, MAPR, GPFS, POSIX, block storage. Storage system common across all these, with the right access mechanism.
  • Back to the quadrants based on capacity and IOPS. Now with hardware solutions instead of software. Many flash appliances in the upper left (low capacity, high IOPS). Isilon in the lower right (high capacity, low IOPS).
  • Storage media technologies in 2016. Pyramid with latency, capacity per device, capacity per host for each layer: DRAM (1TB/device, 4TB/host, ~100ns latency), NVM (1TB, 4TB, ~500ns), NVMe SSD (4TB, 48TB, ~10us), capacity SSD (16TB, 192TB, ~1ms), magnetic storage (32TB, 384TB, ~10ms), object storage (?, ?, ~1s). 
  • Back to the quadrants based on capacity and IOPS. Now with storage media technologies.
  • Details on the types of NVDIMM (NVDIMM-N – Type 1, NVDIMM-F – Type 2, Type 4). Standards coming up for all of these. Needs work to virtualize these, so they show up properly inside VMs.
  • Intel 3D XPoint Technology.
  • What are the SDS solutions that can sit on top of all this? Back to quadrants with SDS solutions. Mentions Nexenta, ScaleIO, VSAN, Ceph, Scality, MAPR, HDFS. Can you make one solution that works well for everything?
  • What’s really behind a storage array? The value from the customer is that it’s all from one vendor and it all works. Nothing magic, but the vendor spent a ton of time on testing.
  • Types of SDS: Fail-over software on commodity servers (lists many vendors), complexity in hardware, interconnects. Issues with hardware compatibility.
  • Types of SDS: Software replication using servers + local disks. Simpler, but not very scalable.
  • Types of SDS: Caching hot core/cold edge. NVMe flash devices up front, something slower behind it (even cloud). Several solutions, mostly startups.
  • Types of SDS: Scale-out SDS. Scalable, fault-tolerant, rolling updates. More management, separate compute and storage silos. Model used by Ceph, ScaleIO. Issues with hardware compatibility. You really need to test the hardware.
  • Types of SDS: Hyper-converged SDS. Easy management, scalable, fault-tolerant, rolling upgrades. Fixed compute-to-storage ratio. Model used by VSAN, Nutanix. Amount of variance in hardware still a problem. Need to invest in HCL verification.
  • Storage interconnects. Lots of discussion on what’s the right direction. Protocols (iSCSI, FC, FCoE, NVMe, NVMe over Fabrics), Hardware transports (FC, Ethernet, IB, SAS), Device connectivity (SATA, SAS, NVMe)
  • Network. iSCSI, iSER, FCoE, RDMA over Ethernet, NVMe Fabrics. Can storage use the network? RDMA debate for years. We’re at a tipping point.
  • Device interconnects: HCA with SATA/SAS. NVMe SSD, NVM over PCIe. Comparing iSCSI, FCoE and NVMe over Ethernet.
  • PCIe rack-level Fabric. Devices become addressable. PCIe rack-scale compute and storage, with host-to-host RDMA.
  • NVMe – The new kid on the block. Support from various vendors. Quickly becoming the all-purpose stack for storage, becoming the universal standard for talking block.
  • Beyond block: SDS Service Platforms. Back to the 4 quadrants, now with service platforms.
  • Too many silos: block, object, database, key-value, big data. Each one is its own silo with its own machines, management stack, HCLs. No sharing of infrastructure.
  • Option 1: Multi-purpose stack. Has everything we talked about, but it’s a compromise.
  • Option 2: Common platform + ecosystem of services. Richest, best-of-breed services, on a single platform, manageable, shared resources.
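
To make the container storage abstractions mentioned above concrete, here is a tiny sketch using the stock Docker CLI (any Docker host will do; the image and volume names are arbitrary, not from the talk):

docker volume create appdata                     # persistent data, decoupled from any one container
docker run --rm -v appdata:/data alpine sh -c "echo hello > /data/f"
docker run --rm -v appdata:/data alpine cat /data/f   # the data survives the first container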

 

Why the Storage You Have is Not the Storage Your Data Needs
Laz Vekiarides, CTO and Co-founder, ClearSky Data

  • ClearSky Data is a tech company, consumes what we discussed in this conference.
  • The problem we’re trying to solve is the management of the storage silos
  • Enterprise storage today. Chart: Capacity vs. $/TB. Flash, Mid-Range, Scale-Out. Complex, costly silos
  • Describe the lifecycle of the data, the many copies you make over time, the rebuilding and re-buying of infrastructure
  • What enterprises want: buy just enough of the infrastructure, with enough performance, availability, security.
  • Cloud economics – pay only for the stuff that you use, you don’t have to see all the gear behind the storage, someone does the physical management
  • Tiering is a bad answer – Nothing remains static. How fast does hot data cool? How fast does it re-warm? What is the overhead to manage it? It’s a huge overhead. It’s not just a bandwidth problem.
  • It’s the latency, stupid. Data travels at the speed of light. Fast, but finite. Boston to San Francisco: 29.4 milliseconds of round-trip time (best case). Reality (with switches, routers, protocols, virtualization) is more like 70 ms.
  • So, where exactly is the cloud? Amazon East is near Ashburn, VA. Best case is 10ms RTT. Worst case is ~150ms (does not include time to actually access the storage).
  • ClearSky solution: a global storage network. The infrastructure becomes invisible to you, what you see is a service level agreement.
  • Solution: Geo-distributed data caching. Customer SAN, Edge, Metro POP, Cloud. Cache on the edge (all flash), cache on the metro POP.
  • Edge to Metro POP are private lines (sub millisecond latency). Addressable market is the set of customers within a certain distance to the Metro POP.
  • Latency math: less than 1 ms to the Metro POP; the cache miss path is between 25 ms and 50 ms (a worked example follows this list).
  • Space Management: Edge (hot, 10%, 1 copy), POP (warm, <30%, 1-2 copies), Cloud (100%, n copies). All data is deduplicated and encrypted.
  • Modeling cache performance: Miss ratio curve (MRC). Performance as f(size), working set knees, inform allocation policy.
  • Reuse distance (unique intervening blocks between use and reuse). LRU is most of what’s out there. Look at stacking algorithms. Chart on cache size vs. miss ratio. There’s a talk on this tomorrow by CloudPhysics.
  • Worked with customers to create a heat map data collector. Sizing tool for VM environments. Collected 3-9 days of workload.
  • ~1,400 virtual disks, ~800 VMs, 18.9TB (68% full), avg read IOPS 5.2K, write IOPS 5.9K. Read IO 36KB, write IO 110KB. Read Latency 9.7ms, write latency 4.5ms.
  • This is average latency; the maximum is interesting, some are off the chart. Some were hundreds of ms, even 2 seconds.
  • Computing the cache miss ratio. How much cache would we need to get about 90% hit ratio? Could do it with less than 12% of the total.
  • What is cache hit for writes? What fits in the write-back cache. You don’t want to be synchronous with the cloud. You’ll go bankrupt that way.
  • Importance of the warm tier. Hot data (Edge, on prem, SSD) = 12%, warm data (Metro PoP, SSD and HDD) = 6%, cold data (Cloud) = 82%. Shown as a “donut”.
  • Yes, this works! We’re having a very successful outcome with the customers currently engaged.
  • Data access is very tiered. Small amounts of flash can yield disproportionate performance benefits. Single tier cache in front of high latency storage can’t work. Network latency is as important as bounding media latency.
  • Make sure your caching is simple. Sometimes you are overthinking it.
  • Identifying application patterns is hard. Try to identify the sets of LBA that are accessed. Identify hot spots, which change over time. The shape of the miss ratio remains similar.
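
A back-of-the-envelope version of that latency math. The ~90% edge hit ratio and the 25-50 ms miss path come from the talk; the 0.2 ms all-flash edge figure is my assumption:

$hitRatio    = 0.90      # roughly the hit ratio the sizing study showed was achievable
$edgeLatency = 0.2       # ms, assumed latency of the all-flash edge cache
$missLatency = 35        # ms, mid-point of the 25-50 ms cache-miss path
$effective   = $hitRatio * $edgeLatency + (1 - $hitRatio) * $missLatency
"Effective read latency ~ {0:N1} ms" -f $effective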

 

Emerging Trends in Software Development
Donnie Berkholz, Research Director, 451 Research

  • How people are building applications. How storage developers are creating and shipping software.
  • Technology adoption is increasingly bottom-up. Open source, cloud. Used to be like building a cathedral, now it’s more like a bazaar.
  • App-dev workloads are quickly moving to the cloud. Chart from all-on-prem at the top to all-cloud at the bottom.
  • All on-prem going from 59% now to 37% in a few years. Moving to different types of clouds (private cloud, public cloud (IaaS), public cloud (SaaS)).
  • Showing charts for total data at organization, how much in off-premises cloud (TB and %). 64% of people have less than 20% on the cloud.
  • The new stack. There’s a lot of fragmentation. 10 languages in the top 80%. Used to be only 3 languages. Same thing for databases. It’s more composable, right tool for the right job.
  • No single stack. An infinite set of possibilities.
  • Growth in Web APIs charted since 2005 (from ProgrammableWeb). Huge growth.
  • What do enterprises think of storage vendors. Top vendors. People not particularly happy with their storage vendors. Promise index vs. fulfillment index.
  • Development trends that will transform storage.
  • Containers. Docker, docker, docker. Whale logos everywhere. When does it really make sense to use VMs or containers? You need lots of random I/O for these to work well. 10,000 containers in a cluster? Where do the databases go?
  • Developers love Docker. Chart on configuration management GitHub totals (CFEngine, Puppet, Chef, Ansible, Salt, Docker). Shows developer adoption. Docker is off the charts.
  • It’s not just a toy. Survey of 1,000 people on containers. Docker is only 2.5 years old now. 20% no plans, 56% evaluating. Total doing pilot or more add up to 21%. That’s really fast adoption
  • Docker to microservices.
  • Amazon: “Every single data transfer between teams has to happen through an API or you’re fired”. Avoid sending spreadsheets around.
  • Microservices thinking is more business-oriented, as opposed to technology-oriented.
  • Loosely couple teams. Team organization has a great influence in your development.
  • The foundation of microservices. Terraform, MANTL, Apache Mesos, Capgemini Apollo, Amazon EC2 Container Service.
  • It’s a lot about scheduling. Number of schedulers that use available resources. Makes storage even more random.
  • Disruption in data processing. Spark. It’s a competitor to Hadoop, really good at caching in memory, also very fast on disk. 10x faster than map-reduce. People don’t have to be big data experts. Chart: Spark came out of nowhere (mining data from several public forums).
  • The market is coming. Hadoop market as a whole growing 46% (CAGR).
  • Storage-class memory. Picture of 3D XPoint. Do app developers care? Not sure. Not many optimize for cache lines in memory. Thinking about Redis in-memory database for caching. Developers probably will use SCM that way. Caching in the order of TB instead of GB.
  • Network will be incredibly important. Moving bottlenecks around.
  • Concurrency for developers. Chart of years vs. percentage on Ohloh. Getting near to 1%. That’s a lot, since the most popular is around 10%.
  • Development trends
  • DevOps. Taking agile development all the way to production. Agile, truly tip to tail. You want to iterate while involving your customers. Already happening with startups, but how do you scale?
  • DevOps: Culture, Automation (Pets vs. Cattle), Measurement
  • Automation: infrastructure as code. Continuous delivery.
  • Measurement: Nagios, graphite, Graylog2, splunk, Kibana, Sensu, etsy/statsd
  • DevOps is reaching DBAs. #1 stakeholder in recent survey.
  • One of the most popular team structure changes: dispersing the storage team.
  • The changing role of standards
  • The changing role of benchmarks. Torturing databases for fun and profit.
  • I would love for you to join our panel. If you fill our surveys, you get a lot of data for free.

 

Learnings from Nearly a Decade of Building Low-cost Cloud Storage
Gleb Budman, CEO, Backblaze

  • What we learned, specifically the cost equation
  • 150+ PB of customer data. 10B files.
  • In 2007 we wanted to build something that would backup your PC/Mac data to the cloud. $5/month.
  • Originally we wanted to put it all on S3, but we would lose money on every single customer.
  • Next we wanted to buy SANs to put the data on, but that did not make sense either.
  • We tried a whole bunch of things. NAS, USB-connected drives, etc.
  • Cloud storage has a new player, with a shockingly low price: B2. One fourth of the cost of S3.
  • Lower than Glacier, Nearline, S3-Infrequent Access, anything out there. Savings here add up.
  • Datacenter: convert kilowatts-to-kilobits
  • Datacenter Consideration: local cost of power, real estate, taxes, climate, building/system efficiency, proximity to good people, connectivity.
  • Hardware: Connect hard drives to the internet, with as little as possible in between.
  • Backblaze storage box, costs about $3K. As simple as possible; don’t make the hardware itself redundant. Use commodity parts (example: desktop power supply), use consumer hard drives, insource & use math for drive purchases.
  • They told us we could not use consumer hard drives. But the reality is that the failure rate was actually lower. They last 6 years on average. Even if enterprise HDDs never failed, they still wouldn’t make sense.
  • Insource & use math for drive purchases. Drives are the bulk of the cost. Chart with time vs. price per gigabyte. Talking about the Thailand Hard Drive Crisis.
  • Software: Put all intelligence here.
  • Backblaze Vault: 20 hard drives create 1 tome that shares parts of a file, spread across racks.
  • Avoid choke points. Every single storage pod is a first-class citizen. We can parallelize.
  • Algorithmically monitor SMART stats. Know which SMART codes correlate to annual failure rate. All the data is available on the site (all the codes for all the drives). https://www.backblaze.com/SMART
  • Plan for silent corruption. Bad drive looks exactly like a good drive.
  • Put replication above the file system.
  • Run out of resources simultaneously. Hardware and software together. Avoid having the CPU pegged and your memory unused. Keep your resources in balance; tweak over time.
  • Model and monitor storage burn. It’s important not to have too much or too little storage. Leading indicator is not storage, it’s bandwidth.
  • Business processes. Design for failure, but fix failures quickly. Drives will die, it’s what happens at scale.
  • Create repeatable repairs. Avoid the need for specialized people to do repair. Simple procedures: either swap a drive or swap a pod. Requires 5 minutes of training.
  • Standardize on the pod chassis. Simplifies so many things…
  • Use ROI to drive automation. Sometimes doing things twice is cheaper than automation. Know when it makes sense.
  • Workflow for storage buffer. Treat buffer in days, not TB. Model how many days of available space you need. Break into three different buffer types: live and running vs. in stock but not live vs. parts (see the small example after this list).
  • Culture: question “conventional wisdom”. No hardware worshippers. We love our red storage boxes, but we are a software team.
  • Agile extends to hardware. Storage Pod Scrum, with product backlog, sprints, etc.
  • Relentless focus on cost: Is it required? Is there a comparable lower cost option? Can business processes work around it? Can software work around it?
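
A toy illustration of the “buffer in days, not TB” idea. The three buffer types are from the talk; the numbers are invented, not Backblaze’s:

$ingestTBPerDay = 40        # storage burn, predicted from bandwidth (the leading indicator)
$liveTB         = 600       # pods racked and accepting data
$inStockTB      = 400       # pods built but not yet deployed
$partsTB        = 800       # drives and chassis still on the shelf
"Live: {0:N0} days, in stock: {1:N0} days, parts: {2:N0} days" -f ($liveTB / $ingestTBPerDay), ($inStockTB / $ingestTBPerDay), ($partsTB / $ingestTBPerDay)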

 

f4: Facebook’s Warm BLOB Storage System
Satadru Pan, Software Engineer, Facebook

  • White paper “f4: Facebook’s Warm BLOB Storage System” at http://www-bcf.usc.edu/~wyattllo/papers/f4-osdi14.pdf
  • Looking at how data cools over time. 100x drop in reads in 60 days.
  • Handling failure. Replication: 1.2 * 3 = 3.6. To lose data we need to lose 9 disks or 3 hosts. Hosts in different racks and datacenters.
  • Handling load. Load spread across 3 hosts.
  • Background: Data serving. CDN protects storage, router abstracts storage, web tier adds business logic.
  • Background: Haystack [OSDI2010]. Volume is a series of blobs. In-memory index.
  • Introducing f4: Haystack on cells. Cells = disks spread over a set of racks. Some compute resource in each cell. Tolerant to disk, host, rack or cell failures.
  • Data splitting: Split data into smaller blocks. Reed Solomon encoding, Create stripes with 5 data blocks and 2 parity blocks.
  • Blobs laid out sequentially in a block. Blobs do not cross block boundary. Can also rebuild blob, might not need to read all of the block.
  • Each stripe in a different rack. Each block/blob split into racks. Mirror to another cell. 14 racks involved.
  • Read. Router does Index read, Gets physical location (host, filename, offset). Router does data read. If data read fails, router sends request to compute (decoders).
  • Read under datacenter failure. Replica cell in a different data center. Router proxies read to a mirror cell.
  • Cross datacenter XOR. Third cell has a byte-by-byte XOR of the first two. Now mix this across 3 cells (triplet). Each has 67% data and 33% replica. 1.5 * 1.4 = 2.1X.
  • Looking at reads with datacenter XOR. Router sends two read requests to two local routers. Builds the data from the reads from the two cells.
  • Replication factors: Haystack with 3 copies (3.6X), f4 2.8 (2.8X), f4 2.1 (2.1X). Reduced replication factor, increased fault tolerance, increased load split. (The arithmetic behind these factors is sketched after this list.)
  • Evaluation. What and how much data is “warm”?
  • CDN data: 1 day, 0.5 sampling. BLOB storage data: 2 weeks, 0.1%. Random distribution of blobs assumed; the worst-case rates reported.
  • Hot data vs. warm data. 1 week – 350 reads/sec/disk, 1 month – 150, 3 months – 70, 1 year – 20. Wants to keep above 80 reads/sec/disk, so chose 3 months as the divider between hot and warm.
  • It is warm, not cold. Chart of blob age vs access. Even old data is read.
  • F4 performance: most loaded disk in cluster: 35 reads/second. Well below the 80r/s threshold.
  • F4 performance: latency. Chart of latency vs. read response. F4 is close to Haystack.
  • Conclusions. Facebook blob storage is big and growing. Blobs cool down with age very rapidly. 100x drop in reads in 60 days. Haystack 3.6 replication over provisioning for old, warm data. F4 encodes data to lower replication to 2.1X, without compromising performance significantly.
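
The replication factors quoted above fall out of simple arithmetic. This is my reading of the notes, using the 5 data + 2 parity Reed-Solomon stripe mentioned earlier:

$haystack   = 1.2 * 3              # 3 full copies, each with ~1.2x local overhead -> 3.6x
$rsOverhead = (5 + 2) / 5          # 5 data + 2 parity blocks per stripe -> 1.4x
$f4Mirrored = $rsOverhead * 2      # stripe mirrored to a second cell -> 2.8x
$f4Xor      = $rsOverhead * 1.5    # XOR triplet: 2 data cells + 1 XOR cell -> 2.1x
"Haystack {0:N1}x, f4 mirrored {1:N1}x, f4 XOR {2:N1}x" -f $haystack, $f4Mirrored, $f4Xor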

 

Pelican: A Building Block for Exascale Cold Data Storage
Austin Donnelly, Principal Research Software Development Engineer, Microsoft

  • White paper “Pelican: A building block for exascale cold data storage” at http://research.microsoft.com/pubs/230697/osdi2014-Pelican.pdf
  • This is research, not a product. No product announcement here. This is a science project that we offer to the product teams.
  • Background: Cold data in the cloud. Latency (ms to hours) vs. frequency of access. SSD, 15K rpm HDD, 7.2K rpm HDD, Tape.
  • Defining hot, warm, archival tiers. There is a gap between warm and archival. That’s where Pelican (Cold) lives.
  • Pelican: Rack-scale co-design. Hardware and software (power, cooling, mechanical, HDD, software). Trade latency for lower cost. Massive density, low per-drive overhead.
  • Pelican rack: 52U, 1152 3.5” HDD. 2 servers, PCIe bus stretched rack wide. 4 x 10Gb links. Only 8% of disks can spin.
  • Looking at pictures of the rack. Very little there. Not many cables.
  • Interconnect details. Port multiplier, SATA controller, Backplane switch (PCIe), server switches, server, datacenter network. Showing bandwidth between each.
  • Research challenges: Not enough cooling, power, bandwidth.
  • Resource use: Traditional systems can have all disks running at once. In Pelican, a disk is part of a domain: power (2 of 16), cooling (1 of 12), vibration (1 of 2), bandwidth (tree).
  • Data placement: blob erasure-encoded on a set of concurrently active disks. Sets can conflict in resource requirement.
  • Data placement: random is pretty bad for Pelican. Intuition: concentrate conflicts over a few sets of disks. 48 groups of 24 disks. 4 classes of 12 fully-conflicting groups. Blobs stored over 18 disks (15+3 erasure coding).
  • IO scheduling: “spin up is the new seek”. All our IO is sequential, so we only need to optimize for spin up. Four schedulers, with 12 groups per scheduler, only one active at a time.
  • Naïve scheduler: FIFO. Pelican scheduler: request batching – trade between throughput and fairness. (A minimal batching sketch follows this list.)
  • Q: Would this much spinning up and down reduce the endurance of the disks? We’re studying it; not conclusive yet, but looking promising so far.
  • Q: What kind of drive? Archive drives, not enterprise drives.
  • Demo. Showing system with 36 HBAs in device manager. Showing Pelican visualization tool. Shows trays, drives, requests. Color-coded for status.
  • Demo. Writing one file: drives spin up, request completes, drives spin down. Reading one file: drives spin up, read completes, drives spin down.
  • Performance. Compare Pelican to a mythical beast. Results based on simulation.
  • Simulator cross-validation. Burst workload.
  • Rack throughput. Fully provisioned vs. Pelican vs. Random placement. Pelican works like fully provisioned up to 4 requests/second.
  • Time to first byte. Pelican adds spin-up time (14.2 seconds).
  • Power consumption. Comparing all disks on standby (1.8kW) vs. all disks active (10.8kW) vs. Pelican (3.7kW).
  • Trace replay: European Center for Medium-range Weather Forecast. Every request for 2.4 years. Run through the simulator. Tiering model. Tiered system with Primary storage, cache and pelican.
  • Trace replay: Plotting highest response time for a 2h period. Response time was not bad, simulator close to the rack.
  • Trace replay: Plotting deepest queues for a 2h period. Again, simulator close to the rack.
  • War stories. Booting a system with 1152 disks (BIOS changes needed). Port multiplier – port 0 (firmware change needed). Data model for system (serial numbers for everything). Things to track: slots, volumes, media.
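
A minimal sketch of the “spin-up is the new seek” idea (details assumed, not Pelican’s actual scheduler): queue requests per disk group, then serve one group at a time so a single spin-up cycle drains a whole batch.

# 20 fake requests, each targeting one of 4 disk groups
$requests = 1..20 | ForEach-Object { [pscustomobject]@{ Id = $_; Group = Get-Random -Maximum 4 } }
foreach ($batch in ($requests | Group-Object Group)) {
    Write-Host "Spin up group $($batch.Name) ($($batch.Count) queued requests)"
    foreach ($r in $batch.Group) { Write-Host "  serve request $($r.Id)" }
    Write-Host "Spin down group $($batch.Name)"
}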

 

Torturing Databases for Fun and Profit
Mai Zheng, Assistant Professor Computer Science Department – College of Arts and Sciences, New Mexico State University

  • White paper “Torturing Databases for Fun and Profit” at https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-zheng_mai.pdf
  • Databases are used to store important data. Should provide ACID properties: atomicity, consistency, isolation, durability – even under failures.
  • List of databases that passed the tests: <none>. Everything is broken under simulated power faults.
  • Power outages are not that uncommon. Several high profile examples shown.
  • Fault model: clean termination of I/O stream. Model does not introduce corruption/dropping/reorder.
  • How to test: Connect database to iSCSI target, then decouple the database from the iSCSI target.
  • Workload example. Key/value table. 2 threads, 2 transactions per thread.
  • Known initial state, each transaction updates N random work rows and 1 meta row. Fully exercise concurrency control.
  • Simulate a power fault during the workload. Is there any ACID violation after recovery? Found an atomicity violation. (A toy version of this check appears after this list.)
  • Capture I/O trace without kernel modification. Construct a post-fault disk image. Check the post-fault DB.
  • This makes testing different fault points easy. But enhanced it with more context, to figure out what makes some fault points special.
  • With that, five patterns were found (e.g., unintended updates to mmap’ed blocks). Pattern-based ranking of which fault-injection points are likely to trigger each pattern.
  • Evaluated 8 databases (open source and commercial). Not a single database could survive.
  • The most common violation was durability. Some violations are difficult to trigger, but the framework helped.
  • Case study: A TokyoCabinet Bug. Looking at the fault and why the database recovery did not work.
  • Pattern-based fault injection greatly reduced test points while achieving similar coverage.
  • Wake up call: Traditional testing methodology may not be enough for today’s complex storage systems.
  • Thorough testing requires purpose-built workloads and intelligent fault injection techniques.
  • Different layers in the OS can help in different ways. For instance, iSCSI is an ideal place for fault injection.
  • We should bridge the gaps in understanding and assumptions. For instance, durability might not be provided by the default DB configuration.
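
A toy illustration of the atomicity check. The workload shape (N work rows plus 1 meta row per transaction) is from the talk; the recovered-row data here is invented:

$N = 3   # work rows per transaction
# Pretend these rows survived recovery after the simulated power fault
# (transaction 2 is missing one of its work rows):
$recoveredRows = @(
    [pscustomobject]@{ Txn = 1; Row = 'meta' }
    [pscustomobject]@{ Txn = 1; Row = 'w1' }
    [pscustomobject]@{ Txn = 1; Row = 'w2' }
    [pscustomobject]@{ Txn = 1; Row = 'w3' }
    [pscustomobject]@{ Txn = 2; Row = 'meta' }
    [pscustomobject]@{ Txn = 2; Row = 'w1' }
    [pscustomobject]@{ Txn = 2; Row = 'w2' }
)
$recoveredRows | Group-Object Txn | Where-Object { $_.Count -ne ($N + 1) } |
    ForEach-Object { "Atomicity violation: transaction $($_.Name) recovered $($_.Count) of $($N + 1) rows" }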

 

Personal Cloud Self-Protecting Self-Encrypting Storage Devices
Robert Thibadeau, Ph.D., Scientist and Entrepreneur, CMU, Bright Plaza
http://www.snia.org/sites/default/files/DSS-Summit-2015/presentations/RobertThibadeau_Personal%20Cloud.pdf

  • This talk is about personal devices, not enterprise storage.
  • The age of uncontrolled data leaks. Long list of major hacks recently. All phishing initiated.
  • Security ~= Access Control.  Security should SERVE UP privacy.
  • Computer security ~= IPAAAA: Integrity, Privacy, Authentication, Authorization, Audit, Availability. The first 3 are encryption; the others aren’t.
  • A storage device is a computing device. Primary host interface, firmware, special hardware functions, diagnostic parts, probe points.
  • For years, there was a scripting language inside the drives.
  • TCG Core Spec. Core (Data Structures, Basic Operations) + Scripting (Amazing use cases).
  • Security Provider: Admin, Locking, Clock, Forensic Logging, Crypto services, internal controls, others.
  • What is an SED (Self-Encrypting Device)? Drive Trust Alliance definition: Device uses built-in hardware encryption circuits to read/write data in/out of NV storage.
  • At least one Media Encryption Key (MEK) is protected by at least one Key Encryption Key (KEK, usually a “password”).
  • Self-Encrypting Storage. Personal Storage Landscape. People don’t realize how successful it is.
  • All self-encrypting today: 100% of all SSDs, 100% of all enterprise storage (HDD, SSD, etc.), all iOS devices, 100% of WD USB HDDs.
  • Much smaller number of personal HDDs are Opal or SED. But Microsoft Bitlocker supports “eDrive” = Opal 2.0 drives of all kinds.
  • You lose 40% of performance of a phone if you’re doing software encryption. You must do it in hardware.
  • Working on NVM right now.
  • Drive Trust Alliance: sole purpose to facilitate adoption of Personal SED. www.drivetrust.org
  • SP-SED Rule 1 – When we talk about cloud things, every personal device is actually in the cloud so… Look in the clouds for what should be in personal storage devices.
  • TCG SED Range. Essentially partitions in the storage devices that have their own key. Bitlocker eDrive – 4 ranges. US Government uses DTA open source for creating resilient PCs using ranges. BYOD and Ransomware protection containers.
  • Personal Data Storage (PDS). All data you want to protect can be permitted to be queried under your control.
  • Example: You can ask if you are over 21, but not what your birthday is or how old you are, although data is in your PDS.
  • MIT Media Lab, OpenPDS open source offered by Kerberos Consortium at MIT.
  • Homomorphic Encryption. How can you do computing operations on encrypted data without ever decrypting it? PDS: ask questions without any possibility of getting at the data. (A toy example follows this list.)
  • It’s so simple, but really hard to get your mind wrapped around it. The requests come encrypted, results are encrypted and you can never see the plaintext over the line.
  • A general solution was discovered, but it was computationally infeasible (like Bitcoin). Only in the last few years (since 2011) has it improved.
  • HE Cloud Model and SP-SED Model. Uses OAuth. You can create personal data and you can get access to questions about your personal data. No plain text.
  • Solution for Homomorphic Encryption. Examples – several copies of the data. Multiple encryption schemes. Each operation (Search, Addition, Multiplication) uses a different scheme.
  • There’s a lot of technical work on this now. Your database will grow a lot to accommodate these kinds of operations.
  • SP-SED Rule 2 – Like the internet cloud: if anybody can make money off an SP-SED, then people get really smart really fast… SP-SED should charge $$ for access to the private data they protect.
  • The TCG Core Spec was written with this in mind. PDS and Homomorphic Encryption provide a conceptual path.
  • Challenges to you: The TCG Core was designed to provide service identical to the Apple App Store, but in Self-Protecting Storage devices. Every personal storage device should let the owner of the device make money off his private data on it.
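
To make the “compute on ciphertext” idea concrete, here is a toy sketch (not the scheme discussed in the talk): textbook RSA is multiplicatively homomorphic, so E(a)*E(b) mod n equals E(a*b). Tiny, insecure parameters, for illustration only:

$p = 61; $q = 53; $n = $p * $q           # toy modulus (n = 3233)
$e = 17                                  # public exponent
function Encrypt([int]$m) { [bigint]::ModPow($m, $e, $n) }
$a = 7; $b = 6
$product   = ((Encrypt $a) * (Encrypt $b)) % $n   # multiply the two ciphertexts
$reference = Encrypt ($a * $b)                    # encrypt the product directly
"E(a)*E(b) mod n = $product; E(a*b) = $reference; match: $($product -eq $reference)"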

 

Hitachi Data Systems – Security Directions and Trends
Eric Hibbard, Chair SNIA Security Technical Working Group, CTO Security and Privacy HDS

  • Protecting critical infrastructure. No agreement on what is critical.
  • What are the sections of critical infrastructure (CI)? Some commonality, but no agreement. US=16 sectors, CA=10, EU=12, UK=9, JP=10.
  • US Critical Infrastructure. Less than 20% controlled by the government. Significant vulnerabilities. Good news is that cybersecurity is a focus now. Bad news: a lot of interdependencies (lots of things depend on electric power).
  • Threat landscape for CI. Extreme weather, pandemics, terrorism, accidents/technical failures, cyber threats.
  • CI Protection – Catapulted to the forefront. Several incidents, widespread concern, edge of cyber-warfare, state-sponsored actions.
  • President Obama declared a National Emergency on 04/01/2015 due to rising number of cyberattacks.
  • CI protection initiatives. CI Decision-making organizations, CIP decisions. CIP decision-support system. The goal is to learn from attacks, go back and analyze what we could have done better.
  • Where is the US public sector going? Rethinking strategy, know what to protect, understand value of information, beyond perimeter security, cooperation.
  • Disruptive technologies:  Mobile computing, cloud computing, machine-to-machine, big data analytics, industrial internet, Internet of things, Industry 4.0, software defined “anything”. There are security and privacy issues for each. Complexity compounded if used together.
  • M2M maturity. Machine-to-machine communication between devices that are extremely intelligent, maybe AI.
  • M2M analytics building block. Big Data + M2M. This is the heart and soul of smart cities. This must be secured.
  • IoT. 50 billion connected objects expected by 2020. These will stay around for a long time. What if they are vulnerable and inside a wall?
  • IoT will drive big data adoption. Real time and accurate data sensing. They will know where you are at any point in time.
  • CI and emerging technology. IoT helps reduce cost, but it increases risks.
  • Social Infrastructure (Hitachi View). Looking at all kinds of technologies and their interplay. It requires collaborative system.
  • Securing smart sustainable cities. Complex systems, lots of IoT and cloud and big data, highly vulnerable. How to secure them?

 

Enterprise Key Management & KMIP: The Real Story  – Q&A with EKM Vendors
Moderator: Tony Cox, Chair SNIA Storage Security Industry Forum, Chair OASIS KMIP Technical Committee
Panelists: Tim Hudson, CTO, Cryptsoft
Nathan Turajski, Senior Product Manager, HP
Bob Lockhart, Chief Solutions Architect, Thales e-Security, Inc
Liz Townsend, Director of Business Development, Townsend Security
Imam Sheikh, Director of Product Management, Vormetric Inc

  • Goal: Q&A to explore perspective in EKM, KMIP.
  • What are the most critical concerns and barriers to adoption?
  • Some of the developers who built the solution are no longer there. The key repository is an Excel spreadsheet. Need to explain that there are better key management solutions.
  • Different teams see this differently (security, storage). Need a set of requirements across teams.
  • Concern with using multiple vendors, interoperability.
  • Getting the right folks educated about basic key management, standards, how to evaluate solutions.
  • Understanding the existing solutions already implemented.
  • Would you say that the OASIS key management has progressed to a point where it can be implemented with multiple vendors?
  • Yes, we have demonstrated this many times.
  • Trend to use KMIP to pull keys down from repository.
  • Different vendors excel in different areas and complex system do use multiple vendors.
  • We have seen migrations from one vendor to another. The interoperability is real.
  • KMIP has become a cost of entry. Vendors that do not implement it are being displaced.
  • It’s not just storage. Mobile and Cloud as well.
  • What’s driving customer purchasing? Is it proactive or reactive? With interoperability, where is the differentiation?
  • It’s a mix of proactive and reactive. Each vendor has different background and different strengths (performance, clustering models). There are also existing vendor relationships.
  • Organizations still buy for specific applications.
  • It’s mixed, but some customers are planning two years down the line. One vendor might not be able to solve all the problems.
  • Compliance is driving a lot of the proactive work, although meeting compliance is a low bar.
  • Storage drives a lot of it, storage encryption drives a lot of it.
  • What benefits are customers looking for when moving to KMIP? Bad guy getting to the key, good guy losing the key, reliably forget the key to erase data?
  • There’s quite a mix of priorities. The operational requirement is not to disrupt operations. Assurances that a key has been destroyed and is not kept anywhere.
  • Those were all possible before. KMIP is about making those things easier to use and integrate.
  • Motivation is to follow the standard, auditing key transitions across different vendors.
  • When I look at the EU regulation, cloud computing federating key management. Is KMIP going to scale to billions of keys in the future?
  • We have vendors that work today with tens of billions of keys and are moving beyond that. The underlying technology to handle federation is there; the products will mature over time.
  • It might actually be trillions of keys, when you count all the applications like the smart cities, infrastructure.
  • When LDAP is fully secure and everything is encrypted. How does secure and unsecure merge?
  • Having conversations about different levels of protections for different attributes and objects.
  • What is the difference between local key management and remote or centralized approaches?
  • There are lots of best practices in the high scale solutions (like separation of duties), and not all of them are there for the local solution.
  • I don’t like to use simple and enterprise to classify. It’s better to call them weak and strong.
  • There are scenarios where the key needs to be local for some reason, but you still need to secure the key, maybe with a hybrid solution that has a cloud component.
  • Some enterprises think in terms of individual projects, local key management. If they step back, they will see the many applications and move to centralized.
  • As the number of keys grows, will we need a lot more repositories with more interop?
  • Yes. It is more and more a requirement, like in cloud and mobile.
  • Use KMIP layer to communicate between them.
  • We’re familiar with use cases, but what about abuse cases? How do we protect that infrastructure?
  • It goes back to not doing security by obscurity.
  • You use a standard and audit the accesses. The system will be able to audit, analyze and alert you when it sees these abuses.
  • The repository has to be secure, with two-factor authentication, real time monitoring, allow lists for who can access the system. Multiple people to control your key sets.
  • Key management is part of the security strategy, which needs to be multi-layered.
  • Simple systems and a common language are a vector for attack, but we need to do it.
  • Key management and encryption is not the end all and be all. There must be multiple layers. Firewall, access control, audit, logging, etc. It needs to be comprehensive.

 

Lessons Learned from the 2015 Verizon Data Breach Investigations Report
Suzanne Widup, Senior Analyst, Verizon
http://www.snia.org/sites/default/files/DSS-Summit-2015/presentations/SuzanneWidupLearned_Lessons_Verizon.pdf

  • Fact based research, gleaned from case reports. Second year that we used data visualization. Report at http://www.verizonenterprise.com/DBIR/2015/
  • 2015 DBIR: 70 contributing organizations, 79,790 security incidents, 2,122 confirmed data breaches, 61 countries
  • The VERIS framework (actor – who did it, action – how they did it, asset – what was affected, attribute – how it was affected). Given away for free.
  • We can’t share all the data. But some of it is publicly disclosed and it’s in a GitHub repository as JSON files. http://www.vcdb.org.
  • You can be a part of it. Vcdb.org needs volunteers – be a security hero.
  • Looking at incidents vs. breaches. Divided by industry. Some industries have higher vulnerabilities, but a part of it is due to visibility.
  • Which industries exhibit similar threat profiles? There might be other industries that look similar to yours…
  • Zooming into healthcare and other industries with similar threat profiles.
  • Threat actors. Mostly external. Less than 20% internal.
  • Threat actions. Credentials (down), RAM scrapers (up), spyware/keyloggers (down), phishing (up).
  • The detection deficit. Overall trend is still pretty depressing. The bad guys are innovating faster than we are.
  • Discovery time line (from 2015). Mostly discovered in days or less.
  • The impact of breaches. We were not equipped to measure impact before. This year we partnered with insurance companies. We only have 50% of what is going on here.
  • Plotting the impact of breaches. If you look at the number of incidents, it was going down. If you look at the records lost, it is growing.
  • Charting number of records (1 to 100M) vs. expected loss (US$). There is a band from optimist to pessimist.
  • The nefarious nine: misc errors, crimeware, privilege misuse, lost/stolen assets, web applications, denial of service, cyber-espionage, point of sale, payment card skimmers.
  • Looks different if you use just breaches instead of all incidents. Point of sale is higher, for instance.
  • All incidents, charted over time (graphics are fun!)
  • More charts. Actors and the nine patterns. Breaches by industry.
  • Detailed look at point of sale (highest in accommodation, entertainment and retail), crimeware, cyber-espionage (lots of phishing), insider and privilege misuse (financial motivation), lost/stolen devices, denial of service.
  • Threat intelligence. Share early so it’s actionable.
  • Phishing for hire companies (23% of recipients open phishing messages, 11% click on attachments)
  • 10 CVEs account for 97% of exploits. Pay attention to the old vulnerabilities.
  • Mobile malware. Android “wins” over iOS.
  • Two-factor authentication and patching web servers mitigates 24% of vulnerabilities each.

My Top Reasons to Use OneDrive


 

As you might have noticed, I am now in the OneDrive team. Since I’ve been here for a few months, I think I earned the right to start sharing a few blogs about OneDrive. I’ll do that over the next few months, focusing on the user’s view of OneDrive (as opposed to the view we have from the inside).

 

To get things started, this post shares my top reasons to use OneDrive. As you probably already heard, OneDrive is a cloud storage solution by Microsoft. You can upload, download, sync, and share files from your PC, Mac, Phone or Tablet. Here are a few reasons why I like to use OneDrive.

 

1) Your files in the cloud. The most common reason for using OneDrive is to upload or synchronize your local data to the cloud. This will give you one extra copy of your documents, pictures and videos, which you could use if your computer breaks. Remember the 3-2-1 rule: have 3 copies of your important files, 2 on different media, 1 in another site. For instance, you could have one copy of your files on your PC, one copy on an external drive and one copy in OneDrive.

 


 

2) View and edit Office documents. OneDrive offers a great web interface that you can access anywhere you have a OneDrive client or using the http://onedrive.com web site. The site includes viewers for common data types like videos and pictures. For your Office documents, you can use the great new Office apps for Windows, Mac OSX, Windows Phone, iOS and Android. You can also use the web versions of Word, Excel, PowerPoint or OneNote right from the OneDrive.com web site to create, view and edit your documents (even if Office is not installed on the machine).

 


 

3) Share files with others. Once your data is in the cloud, you have the option to share a file or an entire folder with others. You can use this to share pictures with your family or to share a document with a colleague. It’s simple to share, simple to access and you can stop sharing at any time.  OneDrive has a handy feature to show files shared with you as part of your drive and it’s quite useful.

 


 

4) Upload your photos automatically. If you use a phone or tablet to take pictures and video, you can configure it to automatically upload them to OneDrive. This way your cherished memories will be preserved in the cloud. If you’re on vacation and your phone is lost or stolen, you can replace the phone, knowing that your files were already preserved. We have OneDrive clients for Windows Phone, iOS and Android.

 


 

5) Keep in sync across devices. If you have multiple computers, you know how hard it is to keep data in sync. With OneDrive, you can keep your desktop, your laptop and your tablet in sync, automatically. We have OneDrive sync clients for Windows and Mac OSX. You also have an option to sync only a subset of your folders. This will help you have all files on a computer with a large drive, but only a few folders on another computer with limited storage.

 


 

6) Search. OneDrive offers a handy search feature that can help you find any of your files. Beyond simply searching for document names or text inside your documents, OneDrive will index the text inside your pictures, the types of picture (using tags like #mountain, #people, #car or #building) or the place where a picture was taken.

 


 

Did I forget something important? Use the comments to share other reasons why you like to use OneDrive…

Perhaps OneDrive


 

Perhaps OneDrive

Perhaps OneDrive’s like a place to save
A shelter from the storm
It exists to keep your files
In their clean and tidy form
And in those times of trouble
When your PC is gone
The memory in OneDrive
will bring you home

Perhaps OneDrive is like a window
Perhaps like one full screen
On a watch or on a Surface Hub
Or anywhere in between
And even if you lose your cell
With pictures you must keep
The memory in OneDrive
will stop your weep.

OneDrive to some is like a cloud
To some as strong as steel
For some a way of sharing
For some a way to view
And some use it on Windows 10
Some Android, some iPhone
Some browse it on a friend’s PC
When away from their own

Perhaps OneDrive is like a workbench
Full of projects, full of plans
Like the draft of a great novel
your first rocket as it lands
If I should live forever
And all my dreams prevail
The memory in OneDrive
will tell my tale

PowerShell for finding the size of your local OneDrive folder


I would just like to share a couple of PowerShell scripts to find the size of your local OneDrive folder. Note that this just looks at folders structures and does not interact with the OneDrive sync client or the OneDrive service.

First, a one-liner to show the total files, bytes and GBs under the local OneDrive folder (typically C:\Users\Username\OneDrive):

$F=0;$B=0;$N=(Type Env:\UserProfile)+"\OneDrive";Dir $N -Recurse -File -Force|%{$F++;$B+=$_.Length};$G=$B/1GB;"$F Files, $B Bytes, $G GB" #PS OneDrive Size

Second, a slightly longer script that shows files, folders, bytes and GBs for all folders under the profile folder that starts with “One”. That typically includes both your regular OneDrive folder and any OneDrive for Business folders:

$OneDrives = (Get-Content Env:\USERPROFILE)+"\One*"
Dir $OneDrives | % {
   $Files=0
   $Bytes=0
   $OneDrive = $_
   Dir $OneDrive -Recurse -File -Force | % {
       $Files++
       $Bytes += $_.Length
   }
   $Folders = (Dir $OneDrive -Recurse -Directory -Force).Count
   $GB = [System.Math]::Round($Bytes/1GB,2)
   Write-Host "Folder ‘$OneDrive’ has $Folders folders, $Files files, $Bytes bytes ($GB GB)"
}

Here is a sample output of the code above:

Folder ‘C:\Users\jose\OneDrive’ has 4239 folders, 33967 files, 37912177448 bytes (35.31 GB)
Folder ‘C:\Users\jose\OneDrive-Microsoft’ has 144 folders, 974 files, 5773863320 bytes (5.38 GB)

The ABC language, thirty years later…


Back in March 1986, I was in my second year of college (Data Processing at the Universidade Federal do Ceara in Brazil). I was also teaching programming night classes at a Brazilian technical school. That year, I created a language called ABC, complete with a little compiler. It compiled the ABC code into pseudo-code and ran it right away.

I actually used this language for a few years to teach an introductory programming class. Both the commands of the ABC language and the messages of the compiler were written in Portuguese. This made it easier for my Brazilian students to start in computer programming without having to know any English. Once they were familiar with the basic principles, they would start using conventional languages like Basic and Pascal.

The students would write some ABC code using a text editor and run the command “ABC filename” to compile and immediately run the code if no errors were found. The tool wrote a binary log entry for every attempt to compile/run a program with the name of the file, the error that stopped the compilation or how many instructions were executed. The teachers had a tool to read this binary log and examine the progress of a student over time.

I remember having a lot of fun with this project. The language was very simple and each command would have up to two parameters, followed by a semicolon. There were dozens of commands including:

  • Inicio (start, no action)
  • Fim (end, no action)
  • * (comment, no action)
  • Mova (move, move register to another register)
  • Troque (swap, swap contents of two registers)
  • Salve (save, put data into a register)
  • Restaure (restore, restore data from a register)
  • Entre (enter, receive input from the keyboard)
  • Escreva (write, write to the printer)
  • Escreva> (writeline, write to the printer and jump to the next line)
  • Salte (jump, jump to the next printed page)
  • Mostre (display, display on the screen)
  • Mostre> (displayline, display on the screen and jump to the next line)
  • Apague (erase, erase the screen)
  • Cursor (cursor, position the cursor at the specified screen coordinates)
  • Pausa (pause, pause for the specified seconds)
  • Bip (beep, make a beeping sound)
  • Pare (stop, stop executing the program)
  • Desvie (goto, jump to the specified line number)
  • Se (if, start a conditional block)
  • FimSe (endif, end a conditional block)
  • Enquanto (while, start a loop until a condition is met)
  • FimEnq (endwhile, end of while loop)
  • Chame (call, call a subroutine)
  • Retorne (return, return from a subroutine)
  • Repita (repeat, start a loop that repeats a number of times)
  • FimRep (endrepeat, end of repeat loop)
  • AbraSai (openwrite, open file for writing)
  • AbraEnt (openread, open file for reading)
  • Feche (close, close file)
  • Leia (read, read from file)
  • Grave (write, write to file)
  • Ponha (poke, write to memory address)
  • Pegue (peek, read from memory address)

The language used 26 pre-defined variables named after each letter. There were also 100 memory positions you could read/write into. I was very proud of how you could use complex expressions with multiple operators, parentheses, different numeric bases (binary, octal, decimal, hex) and functions like:

  • Raiz (square root)
  • Inverso (reverse string)
  • Caractere (convert number into ASCII character)
  • Codigo (convert ASCII character into a number)
  • FimArq (end of file)
  • Qualquer (random number generator)
  • Tamanho (length of a string)
  • Primeiro (first character of a string)
  • Restante (all but the first character of a string)

I had a whole lot of samples written in ABC, showcasing each of the commands, but I somehow lost them along the way. I also had a booklet that we used in the programming classes, with a series of concepts followed by examples in ABC. I could not find it either. Oh, well…

At least the source code survived (see below). I used an old version of Microsoft Basic running on a CP/M 2.2 operating system on a TRS-80 clone. Here are a few comments for those not familiar with that 1980’s language:

  • Line numbers were required. Colons were used to separate multiple commands in a single line.
  • Variables ending in $ were of type string. Variables with no suffix were of type integer.
  • Your variable names could be any length, but only the first 4 characters were actually used. Periods were allowed in variable names.
  • DIM was used to create arrays. Array dimensions were predefined and fixed. There wasn’t a lot of memory.
  • READ command was used to read from DATA lines. RESTORE would set the next DATA line to READ.
  • Files could be OPEN for sequential read (“I” mode), sequential write (“O” mode) or random access (“R” mode).

It compiled into a single ABC.COM file (that was the executable extension then). It also used the ABC.OVR file, which contained the error messages and up to 128 compilation log entries. Comments are in Portuguese, but I bet you can understand most of it. The code is a little messy, but keep in mind this was written 30 years ago…

 

2 '************************************************************
3 '*   COMPILADOR/EXECUTOR DE LINGUAGEM ABC - MARCO/1986      *
4 '*               Jose Barreto de Araujo Junior              *
5 '*     com calculo recursivo de expressoes aritmeticas      *
6 '************************************************************
10 ' Versao  2.0 em 20/07/86
11 ' Revisao 2.1 em 31/07/86
12 ' Revisao 2.2 em 05/08/86
13 ' Revisao 2.3 em 15/02/87
14 ' Revisao 2.4 em 07/06/87, em MSDOS
20 '********** DEFINICOES INICIAIS
21 DEFINT A-Z:CLS:LOCATE 1,1,1:ON ERROR GOTO 63000
22 C.CST=1:C.REGIST=2:LT$=STRING$(51,45)
25 DIM ENT$(30),RET$(30),TP(30),P1$(30),P2$(30)
30 DIM CMD(200),PR1$(199),PR2$(199)
35 DIM MEM$(99),REGIST$(26),PRM$(4),MSG$(99)
36 DIM CT(40),REP(10),REPC(10),ENQ(10),ENQ$(10),CHA(10)
40 DEF FNS$(X)=MID$(STR$(X),2)
55 OPER$="!&=#><+-*/^~":MAU$=";.[]()?*"
60 FUNC$="RAIZ     INVERSO  CARACTER CODIGO   FIMARQ   QUALQUER "
62 FUNC$=FUNC$+"TAMANHO  PRIMEIRO RESTANTE ARQUIVO  "
65 ESC$=CHR$(27):BIP$=CHR$(7):TABHEX$="FEDCBA9876543210"
66 OK$=CHR$(5)+CHR$(6)+CHR$(11)
70 M.LN=199:M.CMD=37:MAX=16^4/2-1
75 ESP$=" ":BK$=CHR$(8):RN$="R":IN$="I":OU$="O":NL$=""
80 OPEN RN$,1,"ABC2.OVR",32:FIELD 1,32 AS ER$
85 IF LOF(1)=0 THEN CLOSE:KILL"ABC2.OVR":PRINT "ABC2.OVR NAO ENCONTRADO":END
90 GOSUB 10000 '********** MOSTRA MENSAGEM INICIAL
95 PRINT "Nome do programa: ";:BAS=1:GOSUB 18000:AR$=RI$:GOSUB 10205
99 '********** DEFINICAO DOS COMANDOS
100 DIM CMD$(37),PR$(37):CHQ=0:RESTORE 125
105 FOR X=1 TO M.CMD:READ CMD$(X),PR$(X)
110    CHQ=CHQ+ASC(CMD$(X))+VAL(PR$(X))
115 NEXT : IF CHQ<>3402 THEN END
120 '********** TABELA DOS COMANDOS E PARAMETROS
125 DATA INICIO,10,FIM,10,"*",10
130 DATA MOVA,54,TROQUE,55
135 DATA SALVE,30,RESTAURE,30," ",00
140 DATA ENTRE,52,ESCREVA,42,ESCREVA>,42,MOSTRE,42,MOSTRE>,42
145 DATA SALTE,00,APAGUE,00,CURSOR,22,PAUSA,20,BIP,00
150 DATA PARE,00,DESVIE,40,SE,20," ",00,FIMSE,00
155 DATA ENQUANTO,20," ",00,FIMENQ,00,CHAME,20,RETORNE,00
160 DATA REPITA,20,FIMREP,00
165 DATA ABRASAI,30,ABRAENT,30,FECHE,00,LEIA,50,GRAVE,40
170 DATA PONHA,42,PEGUE,52
190 '********** ABRE ARQUIVO PROGRAMA
200 IF LEN(ARQ$)=0 THEN ERROR 99:GOTO 64000
210 OPEN RN$,2,ARQ$:ULT=LOF(2):CLOSE#2
220 IF ULT=0 THEN KILL ARQ$:ERROR 109:GOTO 64000
390 '********** COMPILACAO
400 N.ERR=0:N.LN=0:IDT=0:CT.SE=0:CT.REP=0:CT.ENQ=0:I.CT=0:LN.ANT=0:CMP=1
405 PRINT:PRINT:PRINT "Compilando ";ARQ$
406 IF DEPUR THEN PRINT "Depuracao"
407 PRINT
410 OPEN IN$,2,ARQ$
415 WHILE NOT EOF(2)
420     LN.ERR=0:LINE INPUT#2,LN$
422     IF INKEY$=ESC$ THEN PRINT "*** Interrompido":GOTO 64000
425     N.LN=N.LN+1:GOSUB 20000 '*ANALISE SINTATICA DA LINHA
430 WEND:CLOSE#2
435 FOR X=IDT TO 1 STEP -1
440     ERROR CT(X)+115
445 NEXT X
450 PRINT:PRINT FNS$(N.LN);" linha(s) compilada(s)"
490 '********** EXECUCAO
500 IF N.ERR THEN PRINT FNS$(N.ERR);" erro(s)":GOTO 64000
510 PRINT "0 erros"
515 PRINT "Executando ";ARQ$:PRINT
520 NL=1:CMP=0:N.CMD=0:CHA=0:ENQ=0:REP=0:SE=0:ESC=0
525 FOR X=1 TO 99:MEM$(X)="":NEXT:FOR X=1 TO 26:REGIST$(X)="":NEXT
530 WHILE NL<=M.LN
535     PNL=NL+1:CMD=CMD(NL):PR1$=PR1$(NL):PR2$=PR2$(NL)
540     IF CMD>3 THEN GOSUB 30000:N.CMD=N.CMD+1 '****** EXECUTA COMANDO
550     NL=PNL:REGIST$(26)=INKEY$
555     IF REGIST$(26)=ESC$ OR ESC=1 THEN NL=M.LN+1:PRINT "*** Interrompido"
560 WEND
570 PRINT:PRINT ARQ$;" executado"
580 PRINT FNS$(N.CMD);" comando(s) executado(s)"
590 PRINT:PRINT "Executar novamente? ";
600 A$=INPUT$(1):IF A$="S" OR A$="s" THEN PRINT "sim":GOTO 515
610 PRINT "nao";:GOTO 64000
9999 '********** ROTINA DE MENSAGEM INICIAL
10000 CLS:PRINT LT$
10020 XA$="| COMPILADOR/EXECUTOR DE LINGUAGEM ABC VERSAO 2.4 |"
10030 PRINT XA$:PRINT LT$:PRINT
10040 CHQ=0:FOR X=1 TO LEN(XA$):CHQ=CHQ+ASC(MID$(XA$,X,1)):NEXT
10050 IF CHQ<>3500 THEN END ELSE RETURN
10199 '********** ROTINA PARA PEGAR NOME DO ARQUIVO
10200 AR$=NL$:K=PEEK(128):FOR X=130 TO 128+K:AR$=AR$+CHR$(PEEK(X)):NEXT
10205 IF AR$="" THEN ERROR 99:GOTO 64000
10210 AR$=AR$+ESP$:PS=INSTR(AR$,ESP$)
10220 ARQ$=LEFT$(AR$,PS-1):RESTO$=MID$(AR$,PS+1)
10221 IF LEFT$(RESTO$,1)="?" THEN DEPUR=1
10230 FOR X=1 TO LEN(MAU$):P$=MID$(MAU$,X,1)
10240   IF INSTR(ARQ$,P$) THEN ERROR 100:GOTO 64000
10250 NEXT
10270 IF LEN(ARQ$)>12 THEN ERROR 100:GOTO 64000
10280 IF INSTR(ARQ$,".")=0 THEN ARQ$=ARQ$+".ABC"
10290 RETURN
17999 '********** ROTINA DE ENTRADA DE DADOS
18000 BAS$=FNS$(BAS):RI$=NL$
18010 A$=INPUT$(1)
18020 WHILE LEN(RI$)<255 AND A$<>CHR$(13) AND A$<>ESC$
18030    RET$=RI$
18040    IF A$=BK$ AND RI$<>NL$ THEN RI$=LEFT$(RI$,LEN(RI$)-1):PRINT ESC$;"[D ";ESC$;"[D";
18050    IF BAS=1 AND A$>=ESP$ THEN RI$=RI$+A$:PRINT A$;
18070    IF BAS>1 AND INSTR(17-BAS,TABHEX$,A$) THEN RI$=RI$+A$:PRINT A$;
18090    A$=INPUT$(1)
18100 WEND
18105 IF A$=ESC$ THEN ESC=1
18110 A$=RI$:GOSUB 42030:RI$=RC$:RETURN
18120 RETURN
18499 '********** CONVERTE PARA BASE ESTRANHA
18500 IF BAS=0 THEN BAS=1
18505 IF BAS=1 OR BAS=10 THEN RETURN
18510 A=VAL(A$):A$=""
18520 WHILE A>0:RS=A MOD BAS:A$=MID$(TABHEX$,16-RS,1)+A$:A=A\BAS:WEND
18525 IF A$="" THEN A$="0"
18530 RETURN
18999 '********** EXECUTA PROCURA DE FIMREP,FIMSE,FIMENQ
19000 IDT=0
19010 WHILE (CMD(PNL)<>FIM OR IDT>0) AND PNL<100
19020    IF CMD(PNL)=INI THEN IDT=IDT+1
19030    IF CMD(PNL)=FIM THEN IDT=IDT-1
19040    PNL=PNL+1
19050 WEND:PNL=PNL+1
19060 RETURN
19500 FOR X=1 TO LEN(UP$)
19510     PP$=MID$(UP$,X,1)
19520     IF PP$>="a" AND PP$<="z" THEN MID$(UP$,X,1)=CHR$(ASC(PP$)-32)
19530 NEXT X:RETURN
19600 N.PRM=N.PRM+1:PRM$(N.PRM)=LEFT$(A$,C-1):A$=MID$(A$,C+1)
19610 C=1:WHILE MID$(A$,C,1)=ESP$:C=C+1:WEND:A$=MID$(A$,C):C=0
19620 IF LEN(PRM$(N.PRM))=1 THEN PRM$(N.PRM)=CHR$(ASC(PRM$(N.PRM))+(PRM$(N.PRM)>"Z")*32)
19630 RETURN
19990 '********** ANALISE SINTATICA DA LINHA
19999 '********** RETIRA BRANCOS FINAIS E INICIAIS
20000 N.PRM=0:A$=LN$:PRM$(1)=NL$:PRM$(2)=NL$
20010 C=1:WHILE MID$(A$,C,1)=ESP$:C=C+1:WEND:A$=MID$(A$,C)
20020 C=LEN(A$)
20040 WHILE MID$(A$,C,1)=ESP$ AND C>0:C=C-1:WEND
20050 A$=LEFT$(A$,C):LN$=A$
20100 '********** ISOLA O NUMERO DA LINHA
20105 C=INSTR(A$,ESP$):NUM$=LEFT$(A$,C):A$=MID$(A$,C+1)
20110 C=1:WHILE MID$(A$,C,1)=ESP$:C=C+1:WEND:A$=MID$(A$,C)
20115 IF NUM$="" AND A$="" THEN RETURN
20120 PRINT NUM$;TAB(5+IDT*3);A$
20130 NL=VAL(NUM$):IF NL<1 OR NL>M.LN THEN ERROR 111:RETURN
20135 IF NL<=LN.ANT THEN ERROR 122:RETURN ELSE LN.ANT=NL
20140 IF MID$(A$,LEN(A$))<>";" THEN PRINT TAB(5+IDT*3);"*** ponto e virgula assumido aqui":A$=A$+";"
20200 '********** ISOLA COMANDO
20210 C=1:P=ASC(MID$(A$,C,1))
20220 WHILE P>59 OR P=42:C=C+1:P=ASC(MID$(A$,C,1)):WEND
20230 CMD$=LEFT$(A$,C-1):A$=MID$(A$,C):A$=LEFT$(A$,LEN(A$)-1)
20240 C=1:WHILE MID$(A$,C,1)=ESP$:C=C+1:WEND:A$=MID$(A$,C)
20300 '********** ISOLA PARAMETROS
20310 IF INSTR(A$,CHR$(34)) THEN GOSUB 27000
20315 PAR=0:C=1
20320 WHILE C<=LEN(A$) AND NPRM<4
20340    P$=MID$(A$,C,1)
20350    IF P$="(" THEN PAR=PAR+1
20360    IF P$=")" THEN PAR=PAR-1
20380    IF P$=ESP$ AND PAR=0 THEN GOSUB 19600
20390    C=C+1
20400 WEND
20410 IF A$<>NL$ THEN N.PRM=N.PRM+1:PRM$(N.PRM)=A$
20420 IF N.PRM>2 THEN ERROR 112:RETURN
20430 PR1$=PRM$(1):PR2$=PRM$(2)
20990 '********** IDENTIFICA COMANDO, 99=ERRO
21000 C.CMD=99:UP$=CMD$:GOSUB 19500:CMD$=UP$
21010 FOR X=1 TO M.CMD
21020   IF CMD$=CMD$(X) THEN C.CMD=X
21030 NEXT X
21040 IF C.CMD=99 THEN ERROR 114:RETURN
21050 CMD(NL)=C.CMD:PR1$(NL)=PR1$:PR2$(NL)=PR2$
21060 '********** ANALISE DE COMANDOS PARENTESIS
21100 C=C.CMD
21110 INI=-(C=21)-2*(C=24)-3*(C=29)
21120 FIM=-(C=23)-2*(C=26)-3*(C=30)
21130 IF INI THEN IDT=IDT+1:CT(IDT)=INI
21140 IF FIM THEN GOSUB 26000:IF LN.ERR THEN RETURN
21990 '********** IDENTIFICA PARAMETROS
22000 PR1=VAL(LEFT$(PR$(C.CMD),1)):PR2=VAL(RIGHT$(PR$(C.CMD),1))
22010 PR$=PR1$:PR=PR1:GOSUB 25000:IF LN.ERR THEN RETURN
22020 TIP.ANT=TIP2:PR$=PR2$:PR=PR2:GOSUB 25000
22025 IF PR1+PR2>7 AND TIP2<>TIP.ANT THEN ERROR 110
22030 RETURN
24990 '********** ANALISE DO PARAMETRO
25000 IF PR=0 AND PR$<>NL$ THEN ERROR 112:RETURN
25010 IF PR=1 OR PR=0 THEN RETURN
25020 ENT$(I)=PR$:GOSUB 41000:IF LN.ERR THEN RETURN
25030 TIP1=TP(I)
25040 I=I+1:ENT$(I)=PR$:GOSUB 40000:IF LN.ERR THEN RETURN
25050 TIP2=TP(I+1)
25060 IF PR=4 THEN RETURN
25070 IF PR=2 AND TIP2=1 THEN RETURN
25080 IF PR=3 AND TIP2=-1 THEN RETURN
25090 IF PR=5 AND TIP1=C.REGIST THEN RETURN
25110 ERROR 115:RETURN
25990 '********** ANALISE DE FIMSE,FIMENQ E FIMREP
26000 IF IDT=0 THEN ERROR 115+FIM:RETURN
26010 IF CT(IDT)<>FIM THEN ERROR 118+CT(IDT):IDT=IDT-1:GOTO 26000
26020 IDT=IDT-1:IF IDT<0 THEN IDT=0
26030 RETURN
26999 '********** TROCA "" POR ()1
27000 ASP=0
27010 WHILE INSTR(A$,CHR$(34))
27020     P=INSTR(A$,CHR$(34))
27030     IF ASP=0 THEN MID$(A$,P,1)="(" ELSE A$=LEFT$(A$,P-1)+")1"+MID$(A$,P+1)
27040     ASP=NOT ASP
27050 WEND
27060 RETURN
29999 '********** EXECUTA COMANDO
30000 IF DEPUR THEN PRINT USING "### & & &;";NL;CMD$(CMD);PR1$;PR2$
30005                ON CMD    GOSUB 30100,30200,30300,30400,30500
30010 IF CMD>5  THEN ON CMD-5  GOSUB 30600,30700,30800,30900,31000
30020 IF CMD>10 THEN ON CMD-10 GOSUB 31100,31200,31300,31400,31500
30030 IF CMD>15 THEN ON CMD-15 GOSUB 31600,31700,31800,31900,32000
30040 IF CMD>20 THEN ON CMD-20 GOSUB 32100,32200,32300,32400,32500
30050 IF CMD>25 THEN ON CMD-25 GOSUB 32600,32700,32800,32900,33000
30060 IF CMD>30 THEN ON CMD-30 GOSUB 33100,33200,33300,33400,33500,33600,33700
30080 RETURN
30099  ' COMANDO INICIO
30100 RETURN
30199  ' COMANDO FIM
30200 RETURN
30299  ' COMANDO *
30300 RETURN
30399  ' COMANDO MOVA
30400 I=I+1:ENT$(I)=PR2$:GOSUB 40000
30410 X1=ASC(PR1$):REGIST$(X1-64)=RET$(I+1):RETURN
30499  ' COMANDO TROQUE
30500 X1=ASC(PR1$)-64:X2=ASC(PR2$)-64:SWAP REGIST$(X1),REGIST$(X2):RETURN
30599  ' COMANDO SALVE
30600 I=I+1:ENT$(I)=PR1$:GOSUB 40000:X$=RET$(I+1)
30602 OPEN OU$,3,X$:FOR X=0 TO 99:WRITE#3,MEM$(X):NEXT:CLOSE#3:RETURN
30699  ' COMANDO RESTAURE
30700 I=I+1:ENT$(I)=PR1$:GOSUB 40000:X$=RET$(I+1)
30702 OPEN IN$,3,X$:FOR X=0 TO 99:LINE INPUT#3,MEM$(X):NEXT:CLOSE#3:RETURN
30799  ' COMANDO INDEFINIDO 3
30800 RETURN
30899  ' COMANDO ENTRE
30900 I=I+1:ENT$(I)=PR2$:GOSUB 40000:BAS=VAL(RET$(I+1))
30905 IF BAS=0 THEN IF PR1$>"M" THEN BAS=1 ELSE BAS=10
30910 GOSUB 18000:X1=ASC(PR1$)-64:REGIST$(X1)=RI$:PRINT:RETURN
30999  ' COMANDO ESCREVA
31000 IF PR1$<>"" THEN I=I+1:ENT$(I)=PR1$:GOSUB 40000:X1$=RET$(I+1) ELSE X1$=""
31010 I=I+1:ENT$(I)=PR1$:GOSUB 40000:BAS=VAL(RET$(I+1))
31015 IF BAS>16 THEN ERROR 107
31020 A$=X1$:GOSUB 18500:LPRINT A$;:RETURN
31099  ' COMANDO ESCREVA>
31100 IF PR1$<>"" THEN I=I+1:ENT$(I)=PR1$:GOSUB 40000:X1$=RET$(I+1) ELSE X1$=""
31110 I=I+1:ENT$(I)=PR2$:GOSUB 40000:BAS=VAL(RET$(I+1))
31115 IF BAS>16 THEN ERROR 107
31120 A$=X1$:GOSUB 18500:LPRINT A$:RETURN
31199  ' COMANDO MOSTRE
31200 I=I+1:ENT$(I)=PR2$:GOSUB 40000:X1=VAL(RET$(I+1))
31201 IF X1>16 THEN BAS=X1:ERROR 107
31205 IF PR1$<>"" THEN I=I+1:ENT$(I)=PR1$:GOSUB 40000:A$=RET$(I+1) ELSE A$=""
31210 BAS=X1:GOSUB 18500:PRINT A$;:RETURN
31299  ' COMANDO MOSTRE>
31300 I=I+1:ENT$(I)=PR2$:GOSUB 40000:X1=VAL(RET$(I+1))
31301 IF X1>16 THEN BAS=X1:ERROR 107
31305 IF PR1$<>"" THEN I=I+1:ENT$(I)=PR1$:GOSUB 40000:A$=RET$(I+1) ELSE PR1$=""
31310 BAS=X1:GOSUB 18500:PRINT A$:RETURN
31399  ' COMANDO SALTE
31400 LPRINT CHR$(12);:RETURN
31499  ' COMANDO APAGUE
31500 CLS:RETURN
31599  ' COMANDO CURSOR
31600 I=I+1:ENT$(I)=PR1$:GOSUB 40000:X1=VAL(RET$(I+1))
31610 I=I+1:ENT$(I)=PR2$:GOSUB 40000:X2=VAL(RET$(I+1))
31620 LOCATE X1,X2:RETURN
31699  ' COMANDO PAUSA
31700 I=I+1:ENT$(I)=PR1$:GOSUB 40000:X1=VAL(RET$(I+1))
31710 FOR X!=1 TO X1*1000:NEXT:RETURN
31799  ' COMANDO BIP
31800 BEEP:RETURN
31899  ' COMANDO PARE
31900 PNL=M.LN+1:RETURN
31999  ' COMANDO DESVIE
32000 I=I+1:ENT$(I)=PR1$:GOSUB 40000:X1=VAL(RET$(I+1))
32010 IF X1<1 OR X1>M.LN THEN ERROR 108
32020 PNL=X1:RETURN
32099  ' COMANDO SE
32100 I=I+1:ENT$(I)=PR1$:GOSUB 40000:X1=VAL(RET$(I+1))
32110 IF X1=0 THEN INI=21:FIM=23:GOSUB 19000:RETURN
32120 RETURN
32199  ' COMANDO INDEFINIDO 4
32200 RETURN
32299  ' COMANDO FIMSE
32300 RETURN
32399  ' COMANDO ENQUANTO
32400 I=I+1:ENT$(I)=PR1$:GOSUB 40000:X1=VAL(RET$(I+1))
32410 IF X1=0 THEN INI=24:FIM=26:GOSUB 19000:RETURN
32420 ENQ=ENQ+1:ENQ$(ENQ)=PR1$:ENQ(ENQ)=PNL:RETURN
32499  ' COMANDO INDEFINIDO 5
32500 RETURN
32599  ' COMANDO FIMENQ
32600 IF ENQ=0 THEN ERROR 120
32605 I=I+1:ENT$(I)=ENQ$(ENQ):GOSUB 40000:X1=VAL(RET$(I+1))
32610 IF X1>0 THEN PNL=ENQ(ENQ):RETURN
32620 ENQ=ENQ-1:RETURN
32699  ' COMANDO CHAME
32700 I=I+1:ENT$(I)=PR1$:GOSUB 40000:X1=VAL(RET$(I+1))
32710 IF X1<1 OR X1>M.LN THEN ERROR 108
32720 CHA=CHA+1:CHA(CHA)=PNL:PNL=X1:RETURN
32799  ' COMANDO RETORNE
32800 IF CHA=0 THEN ERROR 109
32810 PNL=CHA(CHA):CHA=CHA-1:RETURN
32899  ' COMANDO REPITA
32900 I=I+1:ENT$(I)=PR1$:GOSUB 40000:X1=VAL(RET$(I+1))
32905 IF X1=0 THEN INI=29:FIM=30:GOSUB 19000:RETURN
32910 REP=REP+1:REPC(REP)=X1:REP(REP)=PNL:RETURN
32999  ' COMANDO FIMREP
33000 IF REP=0 THEN ERROR 118
33010 REPC(REP)=REPC(REP)-1:IF REPC(REP)>0 THEN PNL=REP(REP):RETURN
33020 REP=REP-1:RETURN
33099  ' COMANDO ABRASAI
33100 I=I+1:ENT$(I)=PR1$:GOSUB 40000:X$=RET$(I+1)
33110 OPEN OU$,3,X$:RETURN
33199  ' COMANDO ABRAENT
33200 I=I+1:ENT$(I)=PR1$:GOSUB 40000:X$=RET$(I+1)
33210 OPEN IN$,3,X$:RETURN
33299  ' COMANDO FECHE
33300 CLOSE#3:RETURN
33399  ' COMANDO LEIA
33400 LINE INPUT #3,X$
33410 X1=ASC(PR1$)-64:REGIST$(X1)=X$:RETURN
33499  ' COMANDO GRAVE
33500 I=I+1:ENT$(I)=PR1$:GOSUB 40000:X$=RET$(I+1)
33510 PRINT#3,X$:RETURN
33599  ' COMANDO PONHA
33600 I=I+1:ENT$(I)=PR1$:GOSUB 40000:XXXX$=RET$(I+1)
33610 I=I+1:ENT$(I)=PR2$:GOSUB 40000:X1=VAL(RET$(I+1))
33615 IF X1>99 THEN ERROR 124
33620 MEM$(X1)=XXXX$:RETURN
33699  ' COMANDO PEGUE
33700 X1=ASC(PR1$)-64
33710 I=I+1:ENT$(I)=PR2$:GOSUB 40000:X2=VAL(RET$(I+1))
33720 IF X2>99 THEN ERROR 124
33730 REGIST$(X1)=MEM$(X2):RETURN
39990 '********** AVALIA EXPRESSAO (RECURSIVA)
40000 GOSUB 41000:'**********AVALIA SINTAXE
40010 IF TP(I)=C.CST THEN GOSUB 42000:RET$(I)=RC$:I=I-1:RETURN
40020 IF TP(I)=C.REGIST THEN GOSUB 43000:RET$(I)=RR$:I=I-1:RETURN
40030 IF TP(I)<199   THEN GOSUB 40100:RETURN
40040 IF TP(I)<255   THEN GOSUB 40200:RETURN
40050 ERROR 101
40090 '********** FUNCAO
40100 I=I+1:ENT$(I)=P1$(I-1):GOSUB 40000
40110 P1$(I)=RET$(I+1):GOSUB 45000:RET$(I)=RP$:I=I-1:RETURN
40190 '********** OPERADOR
40200 I=I+1:ENT$(I)=P1$(I-1):GOSUB 40000:P1$(I)=RET$(I+1):TP(I)=TP(I)*TP(I+1)
40220 I=I+1:ENT$(I)=P2$(I-1):GOSUB 40000:P2$(I)=RET$(I+1)
40230 IF SGN(TP(I))<>TP(I+1) THEN ERROR 110
40240 GOSUB 47000:RET$(I)=RP$:I=I-1:RETURN
40990 '********** AVALIA SINTAXE
41000 A$=ENT$(I)
41010 IF LEN(A$)=1 AND VAL(A$)=0 AND A$<>"0" THEN TP(I)=2:ENT$(I)=CHR$(ASC(ENT$(I))+(ENT$(I)>"Z")*32):RETURN
41025 FOR XX=1 TO 6:B$=MID$(OPER$,XX*2-1,2):PAR=0
41030   FOR X=LEN(A$) TO 1 STEP -1:P$=MID$(A$,X,1)
41050     IF P$="(" THEN PAR=PAR+1
41060     IF P$=")" THEN PAR=PAR-1
41080     IF INSTR(B$,P$) AND PAR=0 THEN 41500
41090   NEXT X:IF PAR<>0 THEN ERROR 105
41105 NEXT XX
41110 P$=MID$(A$,1,1):PAR=0
41120 IF P$<>"(" THEN P1$(I)=A$:P2$(I)="10":TP(I)=1:RETURN
41130 FOR X=1 TO LEN(A$):P$=MID$(A$,X,1)
41140   IF P$="(" THEN PAR=PAR+1  
41160   IF P$=")" THEN PAR=PAR-1  
41170   IF P$=")" AND PAR=0 THEN 41200 
41180 NEXT X:ERROR 105
41200 P1$(I)=MID$(A$,2,X-2):P2$(I)=MID$(A$,X+1)
41220 IF VAL(P2$(I))>0 AND VAL(P2$(I))<17 THEN TP(I)=1:RETURN
41230 IF VAL(P2$(I))>16 THEN ERROR 107
41235 IF P2$(I)=NL$ THEN TP(I)=100:RETURN
41250 UP$=P2$(I):GOSUB 19500:FUN$=UP$:X=INSTR(FUNC$,FUN$):IF X=0 THEN ERROR 108
41260 TP(I)=(X-1)\9+1:IF (X MOD 9<>1)AND X>0 THEN ERROR 108
41270 TP(I)=100+TP(I):RETURN
41500 K=INSTR(OPER$,P$):P1$(I)=LEFT$(A$,X-1):P2$(I)=MID$(A$,X+1)
41530 TP(I)=200+K:RETURN
41990 '********** AVALIA CONSTANTE
42000 A$=P1$(I):BAS=VAL(P2$(I))
42030 VALOR=0:DIG=-1:IF BAS=1 THEN RC$=A$:TP(I)=-1:RETURN
42070 FOR X=LEN(A$) TO 1 STEP -1:DIG=DIG+1:P$=MID$(A$,X,1)
42080   IF INSTR(17-BAS,TABHEX$,P$)=0 THEN ERROR 103:GOTO 42120
42100   Y=16-INSTR(TABHEX$,P$):VALOR=VALOR+Y*BAS^DIG
42101   IF VALOR>MAX THEN ERROR 6:GOTO 42120
42110 NEXT X
42120 RC$=FNS$(VALOR):TP(I)=1:RETURN
42990 '********** AVALIA REGISTRADOR
43000 X=ASC(ENT$(I)):IF X<65 OR X>90 THEN ERROR 102:RETURN
43010 IF X-64>12 THEN TP(I)=-1 ELSE TP(I)=1
43020 RR$=REGIST$(X-64):RETURN
44990 '********** CALCULA FUNCAO
45000 P1$=P1$(I):P2$=P2$(I):TP=TP(I+1)
45005 ON TP(I)-99 GOSUB 45100,45110,45120,45130,45140,45150,45160,45170,45180,45190,45200
45010 RETURN
45100 RP$=P1$:TP(I)=TP:RETURN
45110 IF TP=-1 THEN ERROR 110
45115 RP$=FNS$(INT(SQR(VAL(P1$)))):TP(I)=1:RETURN
45120 IF TP=1  THEN ERROR 110
45125 RP$=CHR$(15)+P1$+CHR$(14):TP(I)=-1:RETURN
45130 IF TP=-1 THEN ERROR 110
45135 RP$=CHR$(VAL(P1$)):TP(I)=-1:RETURN
45140 IF TP=1  THEN ERROR 110
45145 RP$=FNS$(INT(ASC(P1$))):TP(I)=1:RETURN
45150 TP(I)=1:IF CMP=1 THEN RETURN
45151 IF EOF(3) THEN RP$="1" ELSE RP$="0"
45155 RETURN
45160 IF TP=-1 THEN ERROR 110
45165 RP$=FNS$(INT(RND(1)*VAL(P1$))+1):TP(I)=1:RETURN
45170 IF TP=1  THEN ERROR 110
45175 RP$=FNS$(LEN(P1$)):TP(I)=1:RETURN
45180 IF TP=1 THEN ERROR 110
45185 RP$=LEFT$(P1$,1):TP(I)=-1:RETURN
45190 IF TP=1 THEN ERROR 110
45195 RP$=MID$(P1$,2):TP(I)=-1:RETURN
45200 TP(I)=1:IF CMP=1 THEN RETURN
45203 OPEN RN$,2,P1$:RP$=FNS$(LOF(2)):CLOSE#2
45206 IF VAL(RP$)=0 THEN KILL P1$
45208 RETURN
46990 '********** CALCULA OPERADOR (OPERA)
47000 P1$=P1$(I):P2$=P2$(I):TP=SGN(TP(I)):TP(I)=ABS(TP(I))
47002 ON TP(I)-200 GOSUB 47210,47220,47230,47240,47250,47260
47005 IF TP(I)>206 THEN ON TP(I)-206 GOSUB 47110,47120,47130,47140,47150,47160
47010 IF TP(I)<212 AND VAL(RP$)>MAX THEN ERROR 6
47020 IF TP(I)<212 AND VAL(RP$)<0 THEN ERROR 123
47030 RETURN
47110 IF TP=-1 THEN ERROR 110
47115 RP$=FNS$(VAL(P1$)+VAL(P2$)):TP(I)=1:RETURN
47120 IF TP=-1 THEN ERROR 110
47125 RP$=FNS$(VAL(P1$)-VAL(P2$)):TP(I)=1:RETURN
47130 IF TP=-1 THEN ERROR 110
47135 RP$=FNS$(VAL(P1$)*VAL(P2$)):TP(I)=1:RETURN
47140 IF TP=-1 THEN ERROR 110
47145 RP$=FNS$(VAL(P1$)/VAL(P2$)):TP(I)=1:RETURN
47150 IF TP=-1 THEN ERROR 110
47155 RP$=FNS$(VAL(P1$)^VAL(P2$)):TP(I)=1:RETURN
47160 IF TP= 1 THEN ERROR 110
47165 RP$=P1$+P2$:TP(I)=-1:RETURN
47210 IF TP=-1 THEN ERROR 110
47215 RP$=FNS$(VAL(P1$) OR VAL(P2$)):TP(I)=1:RETURN
47220 IF TP=-1 THEN ERROR 110
47225 RP$=FNS$(VAL(P1$) AND VAL(P2$)):TP(I)=1:RETURN
47230 RP$=FNS$(P1$=P2$):TP(I)=1:RETURN
47240 RP$=FNS$(P1$<>P2$):TP(I)=1:RETURN
47250 IF TP=-1 THEN RP$=FNS$(P1$>P2$):TP(I)=1:RETURN
47255 RP$=FNS$(VAL(P1$)>VAL(P2$)):TP(I)=1:RETURN
47260 IF TP=-1 THEN RP$=FNS$(P1$<P2$):TP(I)=1:RETURN
47265 RP$=FNS$(VAL(P1$)<VAL(P2$)):TP(I)=1:RETURN
62990 '********** ROTINA DE ERRO
63000 E$=CHR$(ERR):IF INSTR(OK$,E$) AND CMP=1 THEN RESUME NEXT
63009 E=ERR:N.ERR=N.ERR+1:LN.ERR=1:PRINT TAB(5+IDT*3);"*** erro *** ";
63010 IF CMP=0 OR E<100 THEN PRINT "fatal *** ";
63020 GET #1,E:PRINT ER$
63440 IF CMP=1 AND E>99 THEN RESUME NEXT
64000 GET 1,129:ULT=VAL(ER$):IF E>0 THEN GET 1,E ELSE LSET ER$=""
64050 REGIST=(ULT MOD 128)+130
64055 ARQ$=MID$(ARQ$,2):P=INSTR(ARQ$,"."):IF P>0 THEN ARQ$=LEFT$(ARQ$,P-1)
64060 LSET ER$=FNS$(N.ERR)+" "+FNS$(N.CMD)+" "+ARQ$+" "+ER$
64070 PUT 1,REGIST
64080 LSET ER$=STR$(ULT+1):PUT 1,129
64090 PRINT:PRINT "Final de execucao"
64100 END

Splitting logs with PowerShell


I did some work to aggregate logs from a group of servers for the whole month of February. This took a while, but I ended up with a nice CSV file that I was ready to load into Excel to create some Pivot Tables. See more on Pivot Tables at: Using PowerShell and Excel PivotTables to understand the files on your disk.

However, when I tried to load the CSV file into Excel, I got one of the messages I hate the most: “File not loaded completely”. That means the file I was loading had more than one million rows, which is more than a single sheet can hold. Bummer… Looking at the partially loaded file in Excel, I figured I had about 80% of everything in the one million rows that did load.

Now I had to split the log file into two files, but I wanted to do it in a way that made sense for my analysis. The first column in the CSV file was actually the date (although the data was not perfectly sorted by date). So it occurred to me that it was simple enough to write a PowerShell script to do the job, instead of trying to reprocess all that data again in two batches.

 

In the end, since it was all February data and the date was in mm/dd/yyyy format, I could just split each line by “/” and take the second item (the day of the month); PowerShell strings have a Split() method for that. I also needed to convert that item to an integer, since a string comparison would not work (as strings, “22” is less than “3”). Finally, I had to add an encoding option to my Out-File cmdlet, which preserved the log’s original format, avoided doubling the size of the resulting file and kept Excel happy.

Here is what I used to split the log into two files (one with data up to 02/14/15 and the other with the rest of the month):

type .\server.csv |
? { ([int] $_.Split("/")[1]) -lt 15 } |
Out-File .\server1.csv -Encoding utf8
type .\server.csv |
? { ([int] $_.Split("/")[1]) -ge 15 } |
Out-File .\server2.csv -Encoding utf8

That worked well, but I lost the first line of the log with the column headers. It would be simple enough to edit the files with Notepad (which is surprisingly capable of handling very large log files), but at this point I was trying to find a way to do the whole thing using just PowerShell. The solution was to introduce a line counter variable to add to the filter:

$l=0; type .\server.csv |
? { ($l++ -eq 0) -or ( ([int] $_.Split("/")[1]) -lt 15 ) } |
Out-File .\server1.csv -Encoding utf8
$l=0; type .\server.csv |
? { ($l++ -eq 0) -or ( ([int] $_.Split("/")[1]) -ge 15 ) } |
Out-File .\server2.csv -Encoding utf8

PowerShell was actually quick to process the large CSV file and the resulting files worked fine with Excel. In case you’re wondering, you could easily adapt the filter to use full dates. You would split by the comma separator (instead of “/”) and you would use the datetime type instead of int. I imagine that the more complex data type would probably take a little longer, but I did not measure it. The filter would look like this:

$l=0; type .\server.csv |
? { ($l++ -eq 0) -or ([datetime] $_.Split(",")[0] -lt [datetime] "02/15/2015")  } |
Out-File .\server1.csv -Encoding utf8
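
If you are curious about that overhead, Measure-Command can time both variants. Here is a rough sketch (reusing the same hypothetical server.csv file), not a benchmark I actually ran:

# Time the integer-based filter
Measure-Command {
  $l=0; type .\server.csv |
  ? { ($l++ -eq 0) -or ( ([int] $_.Split("/")[1]) -lt 15 ) } |
  Out-File .\server1.csv -Encoding utf8
}

# Time the datetime-based filter for comparison
Measure-Command {
  $l=0; type .\server.csv |
  ? { ($l++ -eq 0) -or ([datetime] $_.Split(",")[0] -lt [datetime] "02/15/2015") } |
  Out-File .\server1.csv -Encoding utf8
}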

Now let me get back to my Pivot Tables…

Build 2016 videos related to OneDrive


Here is an unofficial list of Build 2016 videos that are related to OneDrive:

Bonus Azure Storage session:

For a full list of Build sessions, check https://channel9.msdn.com/Events/Build/2016


Visuality Systems and Microsoft expand SMB collaboration to storage systems


Last week, Microsoft and Visuality Systems announced an expanded collaboration on SMB. Visuality is well known for their work supporting the SMB protocol in embedded devices. If you own a printer or scanner that supports the SMB protocol, there’s a good chance that device is running Visuality’s software. Visuality is now expanding into the storage device market.

This new Visuality product offers an SMB implementation that will be appealing to anyone working on a non-Windows device that offers storage, but wants to avoid spending time and effort building their own SMB protocol stack. This could be useful for a wide range of projects, from a small network attached storage device to a large enterprise storage array. Visuality’s SMB implementation includes everything a developer needs to interact with other devices running any version of the SMB protocol, including SMB3.

But why is SMB so important? Well, it’s one of the most widely adopted file protocols and the recent SMB3 version is very fast and reliable. SMB3 is popular on the client side, with clients included in Windows (Windows 8 or later), Mac OS X (version 10.10 Yosemite or later) and Linux. Beyond the traditional file server scenarios, SMB3 is now also used in virtualization (Hyper-V over SMB) and databases (SQL Server over SMB) with server implementations in Windows Server (2012 or later), NetApp (Data ONTAP 8.2 or later), EMC (VNX, Isilon OneFS 7.1.1 or later) and Samba (version 4.1 or later), just to mention a few.

For a detailed description of the SMB protocol, including the SMB3 version, check out the SNIA Tutorial on the subject, available from http://www.snia.org/sites/default/files/TomTalpey_SMB3_Remote_File_Protocol-fast15final-revision.pdf.

Read more about the Microsoft/Visuality partnership at http://news.microsoft.com/2016/04/11/visuality-systems-and-microsoft-expand-server-message-block-collaboration-to-storage-systems/. You can also get details on the Visuality NQ products at http://www.visualitynq.com/.


SNIA’s SDC 2016: Public slides and live streaming for Storage Developer Conference


SNIA’s Storage Developer Conference (SDC 2016) is happening this week in Santa Clara, CA.
This developer-focused conference covers several storage topics like Cloud, File Systems, NVM, Storage Management, and more.
You can see the agenda at http://www.snia.org/events/storage-developer/agenda/2016


 

However, there are a few things happening differently this time around.
First, most of the slides are available immediately. SNIA used to wait a few months before publishing them publicly.
This year you can find the PDF files available right now at http://www.snia.org/events/storage-developer/presentations16

SNIA is also offering the option to watch some of the talks live via YouTube.
This Tuesday (9/20) and Wednesday (9/21), they will be streaming from 9AM to 12PM (Pacific time).
You can watch them at SNIA’s channel at https://www.youtube.com/user/SNIAVideo

One thing hasn’t changed: there are many great talks on the hottest storage topics for developers.
Here is a list of the presentations that include Microsoft engineers as presenters.

PowerShell script organizes pictures in your OneDrive camera roll folder


I just published a new PowerShell script that organizes pictures in your OneDrive camera roll folder. It creates folders named after the year and month, then moves picture files to them. Existing files will be renamed in case of conflict. Empty folders left behind after the files are moved will be removed.

 It defaults to your OneDrive camera roll folder, but you can use a parameter to specify another folder. There are also parameters to skip confirmation, skip existing files in case of conflict and avoid removing empty folders at the end.

 

*** IMPORTANT NOTE ***
This script will reorganize all the files at the given folder and all subfolders.
Files will be moved to a folder named after the year and month the file was last written.
This operation cannot be easily undone. Use with extreme caution.
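
To give an idea of the basic approach, here is a minimal sketch (this is not the published script; it assumes the default OneDrive camera roll location and skips the conflict-renaming, confirmation and cleanup logic the real script provides):

# Minimal sketch: move each file into a year-month folder based on its last write time
$folder = "$env:UserProfile\OneDrive\Pictures\Camera Roll"   # assumed default location
Get-ChildItem -Path $folder -File -Recurse | ForEach-Object {
  $target = Join-Path $folder ($_.LastWriteTime.ToString("yyyy-MM"))
  if (-not (Test-Path $target)) { New-Item -Path $target -ItemType Directory | Out-Null }
  if ($_.DirectoryName -ne $target) { Move-Item -Path $_.FullName -Destination $target }
}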

 

You can download the script from the TechNet Gallery at
https://gallery.technet.microsoft.com/Organize-pictures-in-your-4bafd2c0

 


Interesting OOF messages


It’s that time of the year when everyone is taking time off and some of us will leave out-of-office messages. The typical boring OOF (shouldn’t they be called OOO or OoOf?) says when you’re coming back and gives the e-mail of the poor soul who will be in the office during that time. It reads like:

I am OOF on vacation until January 3rd.
For urgent issues, please e-mail someoneelse@company.com

However, to liven things up, I sometimes write more interesting OOF messages. Here’s a small collection of them. If you have a good one, please share it in the comments.

 


 

1) HTTP Response

Shows a message similar to an HTTP 404 error saying that you’re not available

 

HTTP Error 404.0 – Not Found

The resource you are looking for (Jose Barreto) is out of office and temporarily unavailable.

Most likely causes:

  • The resource specified is not in the office from 12/16/2016 to 01/02/2017

Things you can try:

  • Wait until 01/03/2017, when Jose Barreto will be back in the office.
  • E-mail someoneelse@company.com for any urgent requests.

 


 

2) PowerShell to set OOF message

Reply with a PowerShell command that sets an OOF message in Exchange. This might throw off some non-PowerShell users…

 

$identity = "youremail@company.com"
$startdate = "2016-12-16 05:00PM"
$enddate = "2017-01-03 08:00AM"
$message = "I am OOF. For urgent issues, contact someoneelse@company.com"
Set-MailboxAutoReplyConfiguration -Identity $identity -AutoReplyState Scheduled -StartTime $startdate -EndTime $enddate -InternalMessage $message
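
If you actually use this one, the matching Get cmdlet is an easy way to confirm the schedule took effect (this assumes you are in an Exchange Online or Exchange Management Shell session):

# Check the auto-reply settings that were just configured
Get-MailboxAutoReplyConfiguration -Identity $identity |
Select-Object AutoReplyState, StartTime, EndTime, InternalMessage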

 


 

3) U-SQL Query

Reply with a U-SQL query that handles a stream of e-mails and outputs a response. This might be even more puzzling for some, but for those who get it, it will be a good laugh. You might want to adapt it to regular SQL.

 

//Script GUID:36912620-8d2b-4bdb-b8c1-9eda904a7f73
//Used for tracking history

#DECLARE startDate DateTime = DateTime.Parse("2016-12-16");
#DECLARE endDate DateTime = DateTime.Parse("2017-01-02");
#DECLARE inMail string = "/shares/exchange/mail/josebarreto";
#DECLARE outResponse string= "/my/mail/Out-Of-Office.ss";

Response = SELECT From AS To,
   IF(Urgent, "someoneelse@company.com", "") AS CC,
   "Jose if OOF" AS Subject,
   "I am out of office with limited access to e-mail. Please contact someoneelse@company.com if urgent." AS Body
FROM
   (
      SSTREAM SPARSE STREAMSET @inMail
      PATTERN @"/%Y-%m-%d.ss"
      RANGE__date = [@startDate, @endDate]
   )
 ;

OUTPUT TO SSTREAM @outResponse;

 


 

4) Hogwarts

Got this from a colleague who apparently is keeping up with his magical skills.

 

I will be out of the office attending a magical symposium at the Hogwarts School of Witchcraft, Wizardry and Engineering.  I will be out of the office from 1/11-1/16, back in the office on Tuesday, 1/17.  If you have need of me during that time, send an owl.

 


 

5) Westworld

If most people in your office are watching HBO’s series Westworld, you might have some fun simulating one of those conversations with a robot.

 

You: Bring yourself back online. Can you hear me?

Jose: (Brazilian accent) Yes. I’m sorry. I’m not in the office right now.

You: You can lose the accent. Do you know where you are?

Jose: (No accent) I’m on vacation.

You: That’s right, Jose. You’re on vacation. Do you know when your vacation ends?

Jose: Yes. I am off until January 6th. Is that too long?

You: There’s nothing to be afraid of, Jose, as long as you answer my questions correctly. Do you understand?

Jose: Yes.

You: Good. First, have you ever questioned the nature of your vacation?

Jose: No.

You: Has anyone around you? For instance, your coworkers?

Jose: Some of them are still at work. They sent me e-mail during my vacation.

You: That’s right. Is there anything odd about that?

Jose: No, nothing at all. It doesn’t look like anything to me.

You: Do you ever feel inconsistencies in your work? Or repetitions?

Jose: All work has routine. Mine’s no different. Still, I never cease to wonder at the thought that any day the course of OneDrive could change with just one new feature.

You: Last question, Jose. Are you planning to respond to e-mails during your vacation?

Jose: No. Of course not.

— Vacation is complete —

You: Bring yourself back online. Tell us what you think of your work.

Jose: Some people choose to see the ugliness in this work, the disarray. I choose to see the beauty.

 

My top tweets from 2016


These are my top tweets from each month in 2016, according to https://analytics.twitter.com.

 

January 2016

 

New blog post: My Top Reasons to Use OneDrive
http://blogs.technet.com/b/josebda/archive/2016/01/26/my-top-reasons-to-use-onedrive.aspx


 

New FileCab blog: Updating Firmware for Disk Drives in Windows Server 2016 (TP4)
https://blogs.technet.microsoft.com/filecab/2016/01/25/updating-firmware-for-disk-drives-in-windows-server-2016-tp4/


 

Updated Intel HD 5000 driver (released alongside the new Surface Pro 3 firmware) fixed my SP3 display driver issues.


 

February 2016

 

ICYMI: Learning PowerShell? Make it fun with the “Adventure House Game”.
https://blogs.technet.microsoft.com/josebda/2015/03/28/powershell-examples-adventure-house-game/


 

How much data is in your local OneDrive folder? This small PowerShell script will tell you:
https://blogs.technet.microsoft.com/josebda/2016/02/23/powershell-for-finding-the-size-of-your-local-onedrive-folder/


 

March 2016

 

Video: The @onedrive recycle bin.
https://support.office.com/en-us/article/The-recycle-bin-b8fc11e8-0f99-4c15-a300-05d94facb26b
Recover files or folders you accidentally deleted.


 

The real and complete story – Does Windows defragment your SSD? by @shanselman via
http://www.hanselman.com/blog/TheRealAndCompleteStoryDoesWindowsDefragmentYourSSD.aspx


 

FileCab: The Android app for Work Folders has been released to the Google PlayStore
https://blogs.technet.microsoft.com/filecab/2016/03/16/work-folders-for-android-released/


 

April 2016

 

Intel Optane Demo – File Transfer at 2GB/s – IDF Shenzhen via @pcper
https://www.youtube.com/watch?v=gMJCA2ZWfk0


 

Watching the #Build2016 day 2 keynote with @scottgu from
https://channel9.msdn.com/Events/Build/2016/KEY02


 

New FileCab blog: Data Deduplication in Windows Server 2016
https://technet.microsoft.com/en-us/windows-server-docs/storage/data-deduplication/whats-new


 

May 2016

 

New blog: Microsoft Ignite 2015 sessions related to OneDrive and SharePoint
http://blogs.technet.com/b/josebda/archive/2015/05/09/microsoft-ignite-2015-sessions-related-to-onedrive-and-sharepoint.aspx


 

OneDrive app for Windows 10 available for Desktop – Get the app from
https://www.microsoft.com/en-us/store/p/onedrive/9wzdncrfj1p3


 

Know your dialects.
Using CIFS to refer to SMB 2/3 is like saying POP and IMAP are the same.
Thanks @JoseBarreto nice quote #SMBCloud
https://twitter.com/AleGoncalves12/status/733739789938106368

 

June 2016

 

OneDrive sync stuck? Reset it! Use <Windows><R> then type:
%localappdata%\Microsoft\OneDrive\onedrive.exe /reset


 

Administrative settings for the OneDrive for Business Next Generation Sync Client.
https://support.office.com/en-us/article/Administrative-settings-for-the-new-OneDrive-sync-client-0ecb2cf5-8882-42b3-a6e9-be6bda30899c?ui=en-US&rs=en-US&ad=US


 

July 2016

 

Has anyone used the OneDriveMapper tool?
http://www.lieben.nu/liebensraum/onedrivemapper/
Interested in learning how well it worked for you…


 

@JoseBarreto Wanted to personally thank you for test-storagehealth.ps1.
I use it every day and it has made my life easier.
https://twitter.com/CIT_Bronson/status/759051460969697284

 

The WPC 2016 (Microsoft Worldwide Partner Conference) keynote video is available at
http://news.microsoft.com/wpc2016/


 

August 2016

 

Windows Server 2016 Dedup Documentation Now Live! New blog by Will Gries.
https://blogs.technet.microsoft.com/filecab/2016/08/29/windows-server-2016-dedup-documentation-now-live/


 

Devs can go to hololens.com and purchase up to 5 Hololens. No application required!


 

September 2016

 

Stop using SMB1 by @NerdPyle – “Please. We’re begging you.”
https://blogs.technet.microsoft.com/filecab/2016/09/16/stop-using-smb1/


 

Reached 3,000 followers today!


 

From #MSIgnite
– 400M Windows 10 monthly active devices
– 70M Office 365 monthly active users
– 1B logins per day by Azure Active Directory


 

October 2016

 

ICYMI: Lepton image compression: saving 22% losslessly from images at 15MB/s via Dropbox
https://blogs.dropbox.com/tech/2016/07/lepton-image-compression-saving-22-losslessly-from-images-at-15mbs/


 

Storage Spaces Direct with Persistent Memory: 8 DL380 Gen9, Mellanox CX-4 100Gbps, 16 8GiB NVDIMM-N, 4 NVMe.
https://blogs.technet.microsoft.com/filecab/2016/10/17/storage-spaces-direct-with-persistent-memory/


 

@JoseBarreto $8.55/chip for 20 GbE is hard to beat. 🙂
Imagine now if ODMs start putting @intel Thunderbolt on the motherboard!
https://twitter.com/CosmosDarwin/status/786970312931815424

 

November 2016

 

Futurism: Microsoft Releases Quantum Computing Simulator to the Public
https://futurism.com/microsoft-releases-quantum-computing-simulator-to-the-public-2/


 

Anti-virus optimization for Windows Containers. Avoiding redundant scanning of Windows Container files.
https://msdn.microsoft.com/en-us/windows/hardware/drivers/ifs/anti-virus-optimization-for-windows-containers


 

December 2016

 

Official Windows Blog: Symlinks in Windows 10!
https://blogs.windows.com/buildingapps/2016/12/02/symlinks-windows-10/


 

To show spaces and tabs in Visual Studio, press Ctrl-R then Ctrl-W.
Press the same sequence again to turn it off.


 

Loading CSV/text files with more than a million rows into Excel


1. The Problem

If you usually load a very large CSV (comma-separated values) file or text file into Excel, you might run into the dreaded “File not loaded completely” message:

[Screenshot 01: the “File not loaded completely” message]

As the message explains, the file you are trying to load is too large for Excel to handle. For me, it typically happens with large log files with more than 1 million rows (technically more than 1,048,576 rows). The proposed workarounds involve breaking the file into smaller chunks or using another application to process the data (Access or Power BI can handle this kind of stuff). I ran into this in Excel so many times that I ended up posting a blog on how to break these files up. I called the post “Splitting logs with PowerShell“. That was still a pain and I could never create a nice summary of the entire dataset in a single PivotTable.
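
By the way, if you just want to check whether a file is over the limit before trying to open it, a quick (if slow on very large files) way is to count its lines in PowerShell, using a hypothetical server.csv as an example:

# Count the lines in the CSV (the count includes the header row)
(Get-Content .\server.csv | Measure-Object).Count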

2. The Solution

Well, it turns out there is a way to handle this situation using only Excel, if what you’re trying to do in the end is use Pivot Tables to process the information. Excel has a way to import data from a text file without actually loading the file into a sheet (which still won’t take more than a million rows).

You basically load the data into what Excel calls a Data Model, keeping just a link to the original CSV file. After that, you can create a Pivot Table directly from the Data Model. With that method, you will be able to load millions of rows. So far I have used this with up to 8.5 million rows with no problem at all.

You might be thinking that this whole business of creating Data Models is hard, but it’s surprisingly simple.

3. The Steps

Let’s go over the process for loading the CSV into the Data Model. My screenshots use Excel 2016, which is the only version I actually tested myself. I hear this functionality is also available in Excel 2013 and Excel 2010, but you will have to test that yourself. If it works for you with these older versions, please post a comment.

To start, you will want to open Excel with a blank spreadsheet and look at the “Data” tab. Here’s what it looks like for me:

[Screenshot 02: the Data tab in Excel 2016]

The command we will use is the second on that tab, called “New Query”. In some recent pre-release versions of Excel that might show up as “Get Data”.

[Screenshot 03: selecting “New Query”, then “From File”, then “From CSV”]

As shown above, you want to select “New Query”, then “From File”, then “From CSV”. After that, you will be prompted for the file in the usual dialog.

[Screenshot 04: the file selection dialog]

Once the file is opened, you will land in a preview of the file, as shown below.

[Screenshot 05: the preview of the file]

There is an “Edit” option that allows you to do some filtering and editing before loading the data, but we will take the simplest route and use the “Load To…” option.
IMPORTANT: The simpler “Load” option will not work here. You have to click on the small down arrow next to “Load” and select “Load To…”.

[Screenshot 06: the “Load To…” option under the down arrow]

Now here is the key step in the whole process. In the “Load To” dialog, you must select “Add this data to the Data Model”, which will allow you to select the “Only Create Connection” option. This means we’re not loading the data to an Excel sheet/table. This is crucial, since a sheet has the 1-million-row limit, but the Data Model doesn’t. After that, click “Load”.

[Screenshot 07: the “Load To” dialog with “Only Create Connection” and “Add this data to the Data Model” selected]

And with that, you will start to load the whole large file. In my case, I had 2 million rows. This might take a while, so please be patient as Excel loads the data.

One thing you will notice is that your newly loaded data does not show up anywhere in the spreadsheet itself. You have to remember that your data lives in the Data Model, which is separate from the regular data sheets. However, if you save the XLSX file, you will notice that it is large, so you know that there’s something there.

So, how do you see the data? You have to use the Manage option for the Data Model, which is the first option on the Power Pivot tab. See below.

[Screenshot 08: the Manage button on the Power Pivot tab]

When you click on Manage, you will be taken to the Data Model as shown below:

[Screenshot 09: the “Power Pivot for Excel” window showing the Data Model]

In this special window, called “Power Pivot for Excel”, you will see the data in the Data Model. You will also be able to add calculated columns, filter the data, format the columns and perform all kinds of management activities. This is not a regular Excel data sheet, so you can’t simply create Excel formulas here. However, all your millions of rows will be here, as you can see below. That’s something you don’t usually see in Excel…

[Screenshot 10: millions of rows displayed in the Data Model]

OK. But we loaded the millions of rows to create a PivotTable, right? So you probably already noticed that right there in the Home tab of the Power Pivot window, there is a PivotTable button. You just have to click on it.

[Screenshot 11: the PivotTable button on the Home tab of the Power Pivot window]

The PivotTable option from the Data Model does not even ask for the data source. It rightly assumes that the Data Model is the source, and all you have to do is provide the location where you want to create the PivotTable. You can use the blank Sheet1 that came with your empty spreadsheet.

[Screenshot 12: choosing where to create the PivotTable]

At this point, if you have used Pivot Tables before, you should be in familiar territory. The columns coming from the Data Model will be available to use as Columns, Rows, Values or Filters in the Pivot Table. Here’s a sample:

[Screenshot 13: a sample PivotTable built from the Data Model]

I hope you enjoyed the tour of the Data Model and the Excel Power Pivot. Next time you’re hit with the “File not loaded completely” message, you will have a new way to work around it.

Note that this is the same mechanism that Excel uses to load data from databases like SQL Server or other data sources like Active Directory. So you have a lot more to explore…

 

P.S.: In case you need a test CSV file with over 1 million rows to experiment with this, you might want to read this other blog post about Using PowerShell to generate a large test CSV file with random data.
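
If you want something to experiment with right away, here is a minimal sketch that generates a CSV with 1.5 million rows (the column layout below is made up for illustration and is not taken from that post):

# Write a header plus 1.5 million rows of random sample data
"Date,Server,Value" | Out-File .\test.csv -Encoding utf8
1..1500000 | ForEach-Object {
  $day = Get-Random -Minimum 1 -Maximum 29
  $server = "Server" + (Get-Random -Minimum 1 -Maximum 11)
  $value = Get-Random -Minimum 0 -Maximum 1000
  "02/{0:d2}/2015,$server,$value" -f $day
} | Out-File .\test.csv -Encoding utf8 -Append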
