Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Fix loop cloning array index recognition for arm64 #20472

Merged
merged 1 commit into from
Oct 19, 2018

Conversation

BruceForstall
Copy link
Member

arm64 Checked PMI altjit frameworks asm diffs:

Found 95 files with textual diffs.

Summary:
(Lower is better)

Total bytes of diff: 30144 (0.15% of base)
    diff is a regression.

Top file regressions by size (bytes):
       11772 : System.Private.CoreLib.dasm (2.06% of base)
        1768 : Microsoft.CodeAnalysis.dasm (0.26% of base)
        1764 : System.Private.DataContractSerialization.dasm (0.19% of base)
        1692 : Microsoft.CSharp.dasm (0.49% of base)
        1436 : System.Collections.Immutable.dasm (0.16% of base)

Top file improvements by size (bytes):
         -80 : NuGet.Packaging.dasm (-0.04% of base)
         -28 : xunit.execution.dotnet.dasm (-0.10% of base)
         -24 : Microsoft.Win32.Registry.dasm (-0.10% of base)
         -20 : Microsoft.DotNet.ProjectModel.dasm (-0.01% of base)
          -4 : System.IO.Compression.dasm (-0.01% of base)

39 total files with size differences (6 improved, 33 regressed), 90 unchanged.

Top method regressions by size (bytes):
        5908 (83.87% of base) : System.Private.CoreLib.dasm - System.DefaultBinder:BindToMethod(int,ref,byref,ref,ref,ref,byref):ref:this
         884 (62.96% of base) : Microsoft.CSharp.dasm - Microsoft.CSharp.RuntimeBinder.Errors.ErrorHandling:Error(int,ref):ref
         740 (38.14% of base) : Microsoft.VisualBasic.dasm - Microsoft.VisualBasic.CompilerServices.OverloadResolution:MoreSpecificProcedure(ref,ref,ref,ref,int,byref,bool):ref
         728 (48.02% of base) : System.Private.CoreLib.dasm - System.DefaultBinder:SelectMethod(int,ref,ref,ref):ref:this
         668 (38.84% of base) : System.Private.CoreLib.dasm - System.DefaultBinder:SelectProperty(int,ref,ref,ref,ref):ref:this

Top method improvements by size (bytes):
         -80 (-2.42% of base) : NuGet.Packaging.dasm - <<InstallFromSourceAsync>b__0>d:MoveNext():this
         -40 (-10.10% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Utilities.DirectoryUtilities:Clean(ref):int
         -32 (-5.63% of base) : System.Private.DataContractSerialization.dasm - System.Xml.XmlDictionaryReader:ReadContentAs(ref,ref):ref:this
         -28 (-3.24% of base) : xunit.execution.dotnet.dasm - Xunit.Serialization.XunitSerializationInfo:CanSerializeObject(ref):bool
         -24 (-2.38% of base) : Microsoft.CSharp.dasm - Microsoft.CSharp.RuntimeBinder.SymbolTable:AddConversionsForOneType(ref)

Top method regressions by size (percentage):
         468 (110.38% of base) : System.Data.Common.dasm - System.Data.ConstraintCollection:BaseGroupSwitch(ref,int,ref,int):this
        5908 (83.87% of base) : System.Private.CoreLib.dasm - System.DefaultBinder:BindToMethod(int,ref,byref,ref,ref,ref,byref):ref:this
         480 (82.76% of base) : System.Web.HttpUtility.dasm - System.Web.Util.HttpEncoder:UrlEncodeUnicode(ref):ref
         624 (78.00% of base) : System.Private.CoreLib.dasm - System.DefaultBinder:FindMostSpecific(ref,ref,ref,ref,ref,ref,ref,ref):int
         148 (74.00% of base) : System.Private.Xml.dasm - System.Xml.ValidateNames:ParseNmtokenNoNamespaces(ref,int):int

Top method improvements by size (percentage):
          -4 (-14.29% of base) : System.Collections.Immutable.dasm - System.Collections.Immutable.ImmutableStack`1[Int32][System.Int32]:System.Collections.Immutable.IImmutableStack<T>.Pop():ref:this
         -40 (-10.10% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Utilities.DirectoryUtilities:Clean(ref):int
          -4 (-6.67% of base) : System.Collections.Immutable.dasm - System.Collections.Immutable.ImmutableStack`1[Int32][System.Int32]:Pop(byref):ref:this
         -20 (-6.58% of base) : Microsoft.DotNet.ProjectModel.dasm - Microsoft.DotNet.ProjectModel.Resolution.FrameworkReferenceResolver:GetFrameworkInformation(ref):ref:this
         -32 (-5.63% of base) : System.Private.DataContractSerialization.dasm - System.Xml.XmlDictionaryReader:ReadContentAs(ref,ref):ref:this

180 total methods with size differences (36 improved, 144 regressed), 100761 unchanged.

18 files had text diffs but not size diffs.
System.Resources.Writer.dasm had 171 diffs
System.Security.Permissions.dasm had 76 diffs
System.IO.FileSystem.dasm had 40 diffs
System.IO.IsolatedStorage.dasm had 34 diffs
NuGet.Frameworks.dasm had 26 diffs

arm64 Checked PMI altjit frameworks asm diffs:
```
Found 95 files with textual diffs.

Summary:
(Lower is better)

Total bytes of diff: 30144 (0.15% of base)
    diff is a regression.

Top file regressions by size (bytes):
       11772 : System.Private.CoreLib.dasm (2.06% of base)
        1768 : Microsoft.CodeAnalysis.dasm (0.26% of base)
        1764 : System.Private.DataContractSerialization.dasm (0.19% of base)
        1692 : Microsoft.CSharp.dasm (0.49% of base)
        1436 : System.Collections.Immutable.dasm (0.16% of base)

Top file improvements by size (bytes):
         -80 : NuGet.Packaging.dasm (-0.04% of base)
         -28 : xunit.execution.dotnet.dasm (-0.10% of base)
         -24 : Microsoft.Win32.Registry.dasm (-0.10% of base)
         -20 : Microsoft.DotNet.ProjectModel.dasm (-0.01% of base)
          -4 : System.IO.Compression.dasm (-0.01% of base)

39 total files with size differences (6 improved, 33 regressed), 90 unchanged.

Top method regressions by size (bytes):
        5908 (83.87% of base) : System.Private.CoreLib.dasm - System.DefaultBinder:BindToMethod(int,ref,byref,ref,ref,ref,byref):ref:this
         884 (62.96% of base) : Microsoft.CSharp.dasm - Microsoft.CSharp.RuntimeBinder.Errors.ErrorHandling:Error(int,ref):ref
         740 (38.14% of base) : Microsoft.VisualBasic.dasm - Microsoft.VisualBasic.CompilerServices.OverloadResolution:MoreSpecificProcedure(ref,ref,ref,ref,int,byref,bool):ref
         728 (48.02% of base) : System.Private.CoreLib.dasm - System.DefaultBinder:SelectMethod(int,ref,ref,ref):ref:this
         668 (38.84% of base) : System.Private.CoreLib.dasm - System.DefaultBinder:SelectProperty(int,ref,ref,ref,ref):ref:this

Top method improvements by size (bytes):
         -80 (-2.42% of base) : NuGet.Packaging.dasm - <<InstallFromSourceAsync>b__0>d:MoveNext():this
         -40 (-10.10% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Utilities.DirectoryUtilities:Clean(ref):int
         -32 (-5.63% of base) : System.Private.DataContractSerialization.dasm - System.Xml.XmlDictionaryReader:ReadContentAs(ref,ref):ref:this
         -28 (-3.24% of base) : xunit.execution.dotnet.dasm - Xunit.Serialization.XunitSerializationInfo:CanSerializeObject(ref):bool
         -24 (-2.38% of base) : Microsoft.CSharp.dasm - Microsoft.CSharp.RuntimeBinder.SymbolTable:AddConversionsForOneType(ref)

Top method regressions by size (percentage):
         468 (110.38% of base) : System.Data.Common.dasm - System.Data.ConstraintCollection:BaseGroupSwitch(ref,int,ref,int):this
        5908 (83.87% of base) : System.Private.CoreLib.dasm - System.DefaultBinder:BindToMethod(int,ref,byref,ref,ref,ref,byref):ref:this
         480 (82.76% of base) : System.Web.HttpUtility.dasm - System.Web.Util.HttpEncoder:UrlEncodeUnicode(ref):ref
         624 (78.00% of base) : System.Private.CoreLib.dasm - System.DefaultBinder:FindMostSpecific(ref,ref,ref,ref,ref,ref,ref,ref):int
         148 (74.00% of base) : System.Private.Xml.dasm - System.Xml.ValidateNames:ParseNmtokenNoNamespaces(ref,int):int

Top method improvements by size (percentage):
          -4 (-14.29% of base) : System.Collections.Immutable.dasm - System.Collections.Immutable.ImmutableStack`1[Int32][System.Int32]:System.Collections.Immutable.IImmutableStack<T>.Pop():ref:this
         -40 (-10.10% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - Microsoft.Diagnostics.Utilities.DirectoryUtilities:Clean(ref):int
          -4 (-6.67% of base) : System.Collections.Immutable.dasm - System.Collections.Immutable.ImmutableStack`1[Int32][System.Int32]:Pop(byref):ref:this
         -20 (-6.58% of base) : Microsoft.DotNet.ProjectModel.dasm - Microsoft.DotNet.ProjectModel.Resolution.FrameworkReferenceResolver:GetFrameworkInformation(ref):ref:this
         -32 (-5.63% of base) : System.Private.DataContractSerialization.dasm - System.Xml.XmlDictionaryReader:ReadContentAs(ref,ref):ref:this

180 total methods with size differences (36 improved, 144 regressed), 100761 unchanged.

18 files had text diffs but not size diffs.
System.Resources.Writer.dasm had 171 diffs
System.Security.Permissions.dasm had 76 diffs
System.IO.FileSystem.dasm had 40 diffs
System.IO.IsolatedStorage.dasm had 34 diffs
NuGet.Frameworks.dasm had 26 diffs
```
@BruceForstall
Copy link
Member Author

@dotnet-bot test Ubuntu16.04 arm64 Cross Checked corefx_baseline Build and Test
@dotnet-bot test Ubuntu16.04 arm64 Cross Checked corefx_jitstress1 Build and Test
@dotnet-bot test Ubuntu16.04 arm64 Cross Checked corefx_jitstress2 Build and Test

@dotnet-bot test Ubuntu16.04 arm64 Cross Checked jitstress1 Build and Test
@dotnet-bot test Ubuntu16.04 arm64 Cross Checked jitstress2 Build and Test

@dotnet-bot test Ubuntu16.04 arm64 Cross Checked r2r Build and Test

@dotnet-bot test Windows_NT arm64 Cross Checked corefx_baseline Build and Test
@dotnet-bot test Windows_NT arm64 Cross Checked corefx_jitstress1 Build and Test
@dotnet-bot test Windows_NT arm64 Cross Checked corefx_jitstress2 Build and Test

@dotnet-bot test Windows_NT arm64 Cross Checked jitstress1 Build and Test
@dotnet-bot test Windows_NT arm64 Cross Checked jitstress2 Build and Test

@dotnet-bot test Windows_NT arm64 Cross Checked r2r Build and Test

@BruceForstall
Copy link
Member Author

@AndyAyersMS I don't think we have any lab support for arm64 perf testing. Do you recommend any kind of perf testing here?

@BruceForstall
Copy link
Member Author

@AndyAyersMS PTAL
cc @dotnet/jit-contrib

@@ -8410,7 +8410,7 @@ bool Compiler::optExtractArrIndex(GenTree* tree, ArrIndex* result, unsigned lhsN
{
return false;
}
#ifdef _TARGET_AMD64_ // TODO-ARM64: should this be _TARGET_64BIT_? If not, add comment why not.
#ifdef _TARGET_64BIT_
if (index->gtOper != GT_CAST)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What cast do we expect here for _TARGET_64BIT_?

#else // !_TARGET_64BIT_

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Casts from int to long on 64 bit platforms.

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you find this by auditing ifdefs or looking at some kind of structural diff between arm64 and x64 codegen?

@AndyAyersMS
Copy link
Member

For perf you can try and verify that the usual suspects get cloned. Some examples to play with over in #12836.

I wonder if we should have some kind of "flow graph" diff mode where we can verify that the jit produces similar flow graphs cross-arch.

@BruceForstall
Copy link
Member Author

Did you find this by auditing ifdefs or looking at some kind of structural diff between arm64 and x64 codegen?

I was working in this code for #20446, and it struck me the AMD64 #ifdef would apply to ARM64.

@BruceForstall
Copy link
Member Author

I wonder if we should have some kind of "flow graph" diff mode where we can verify that the jit produces similar flow graphs cross-arch.

Interesting concept. I wonder how fragile it would be.

There's always a worry that various optimizations just stop firing due to some minor bug like this, and we don't notice it (although this never fired for arm64, I'm sure).

@BruceForstall
Copy link
Member Author

test Windows_NT arm64 Cross Checked Innerloop Build and Test
test Windows_NT arm64 Cross Checked jitstress1 Build and Test
test Windows_NT arm64 Cross Checked jitstress2 Build and Test
test Windows_NT arm64 Cross Checked r2r Build and Test

Windows arm64 corefx tests have script error

@AndyAyersMS
Copy link
Member

I'll do some prototyping to see if diffing flowgraphs is a viable idea.

@BruceForstall
Copy link
Member Author

Jenkins reboot

test Windows_NT arm64 Cross Checked Innerloop Build and Test
test Windows_NT arm64 Cross Checked jitstress1 Build and Test
test Windows_NT arm64 Cross Checked jitstress2 Build and Test
test Windows_NT arm64 Cross Checked r2r Build and Test

@BruceForstall
Copy link
Member Author

@dotnet-bot test this please

@BruceForstall
Copy link
Member Author

test Windows_NT arm64 Cross Checked Innerloop Build and Test
test Windows_NT arm64 Cross Checked jitstress1 Build and Test
test Windows_NT arm64 Cross Checked jitstress2 Build and Test
test Windows_NT arm64 Cross Checked r2r Build and Test

@AndyAyersMS
Copy link
Member

Seems like maybe cross-arch FG diffing is viable (we'll need to revise the flow graph dump format to reduce noise a bit, and make it alt jit aware, and decide when to dump it). Opened dotnet/jitutils#177 with the intent of building something real.

Here's an odd diff in WindowsConsoleStream:WriteFileNative(long,ref,int,int,bool):int where arm64 has kept some unreachable throw blocks:

;;; x64 -- tail end of method

       C3                   ret      

G_M43912_IG12:
       E8D3DE395F           call     CORINFO_HELP_RNGCHKFAIL
       CC                   int3  
;;; arm64 -- tail end of method

        D65F03C0          ret     lr

G_M43912_IG12:
        94000000          bl      CORINFO_HELP_OVERFLOW

G_M43912_IG13:
        94000000          bl      CORINFO_HELP_THROWDIVZERO

G_M43912_IG14:
        94000000          bl      CORINFO_HELP_RNGCHKFAIL
        D43E0000          bkpt 

@BruceForstall
Copy link
Member Author

Benchmarks arm64 PMI Checked JIT (Release other components) altjit diffs:

Found 21 files with textual diffs.
PMI Diffs for benchstones and benchmarks game in F:\gh\coreclr10\bin\tests\Windows_NT.x64.Release for x64 protononjit.dll
Summary:
(Lower is better)
Total bytes of diff: 28020 (4.86% of base)
    diff is a regression.
Top file regressions by size (bytes):
       13100 : Bytemark\Bytemark\Bytemark.dasm (13.69% of base)
        4748 : SciMark\SciMark\SciMark.dasm (27.22% of base)
        3004 : Benchstones\BenchI\MulMatrix\MulMatrix\MulMatrix.dasm (92.49% of base)
        2164 : Benchstones\BenchI\Array2\Array2\Array2.dasm (75.24% of base)
        1172 : Benchstones\BenchF\InProd\InProd\InProd.dasm (45.64% of base)
19 total files with size differences (0 improved, 19 regressed), 63 unchanged.
Top method regressions by size (bytes):
        3332 (261.13% of base) : Bytemark\Bytemark\Bytemark.dasm - LUDecomp:ludcmp(ref,int,ref,byref):int
        3004 (174.65% of base) : Benchstones\BenchI\MulMatrix\MulMatrix\MulMatrix.dasm - MulMatrix:Inner(ref,ref,ref)
        2288 (195.89% of base) : Bytemark\Bytemark\Bytemark.dasm - AssignJagged:second_assignments(ref,ref)
        2036 (166.34% of base) : Bytemark\Bytemark\Bytemark.dasm - AssignRect:second_assignments(ref,ref)
         888 (163.24% of base) : Bytemark\Bytemark\Bytemark.dasm - LUDecomp:DoLUIteration(ref,ref,ref,ref,int):long
Top method regressions by size (percentage):
        3332 (261.13% of base) : Bytemark\Bytemark\Bytemark.dasm - LUDecomp:ludcmp(ref,int,ref,byref):int
         516 (198.46% of base) : Benchstones\BenchI\Array2\Array2\Array2.dasm - Array2:VerifyCopy(ref,ref):bool
        2288 (195.89% of base) : Bytemark\Bytemark\Bytemark.dasm - AssignJagged:second_assignments(ref,ref)
        3004 (174.65% of base) : Benchstones\BenchI\MulMatrix\MulMatrix\MulMatrix.dasm - MulMatrix:Inner(ref,ref,ref)
         652 (168.04% of base) : Benchstones\BenchF\InProd\InProd\InProd.dasm - InProd:Bench():bool
77 total methods with size differences (0 improved, 77 regressed), 1801 unchanged.
2 files had text diffs but not size diffs.
BenchmarksGame\fasta\fasta-2\fasta-2.dasm had 34 diffs
BenchmarksGame\binarytrees\binarytrees-5\binarytrees-5.dasm had 10 diffs

@AndyAyersMS
Copy link
Member

A big size regression in ludcmp is a pretty good indication that loop cloning is "working."

@BruceForstall BruceForstall merged commit 679a823 into dotnet:master Oct 19, 2018
@BruceForstall BruceForstall deleted the ImproveLoopCloningArm64 branch October 19, 2018 17:28
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…pCloningArm64

Fix loop cloning array index recognition for arm64

Commit migrated from dotnet/coreclr@679a823
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants