- 06 May, 2016 1 commit
Serguei Katkov authored
Change-Id: I43f41ef2fdf6475238f0987842aefb1c2eb6a36d
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>

- 13 Apr, 2016 1 commit
Vladimir Marko authored
And clean up some APIs to return std::unique_ptr<> instead of raw pointers that don't communicate ownership.
Change-Id: I3017302307a0253d661240750298802fb0d9585e

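The ownership point in miniature, as a hedged sketch (the `Arena` factory below is a stand-in, not the API this CL actually changed):

```cpp
#include <memory>

struct Arena {};  // hypothetical payload type, for illustration only

// Before: Arena* CreateArena();  -- the caller must guess who calls delete.
// After: the signature itself states that ownership transfers to the caller,
// and the object is released automatically when the unique_ptr dies.
std::unique_ptr<Arena> CreateArena() {
  return std::make_unique<Arena>();
}
```
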
- 12 Feb, 2016 1 commit
Mark Mendell authored
Add support for the memory form of CMOV. Add tests.
Change-Id: Ib9f5dbd3031c7e235ee3f2afdb7db75eed46277a
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>

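For context, CMOV computes a branchless select; the memory form lets the selected source be read straight from memory instead of being loaded into a register first. A sketch of the pattern (illustrative only, not the generated code):

```cpp
#include <cstdint>

// A branchless select like this is what CMOV computes; with the memory
// form, a compiler can emit something like "cmovl reg, [mem]" and read
// *b directly, saving the separate load.
int32_t Max(int32_t a, const int32_t* b) {
  int32_t result = a;
  if (result < *b) {
    result = *b;
  }
  return result;
}
```
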
- 21 Jan, 2016 1 commit
Aart Bik authored
Rationale: recognizing this important operation as an intrinsic has various advantages:
(1) the no-side-effects/no-throw properties allow for much more GVN/LICM/BCE;
(2) some architectures, like x86_64, provide direct support for this operation.
Performance improvements on x86_64:
CheckersEvalBench (32-bit bitboard): 27,210 KNS -> 36,798 KNS = +35%
ReversiEvalBench (64-bit bitboard): 52,562 KNS -> 89,086 KNS = +69%
Change-Id: I65d549b0469b7909b12c6611cdc34a8640a5751f

- 14 Oct, 2015 1 commit
Mark Mendell authored
Implement PackedSwitch using a jump table of offsets to blocks.
Bug: 24092914
Bug: 21119474
Change-Id: I83430086c03ef728d30d79b4022607e9245ef98f
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>

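A sketch of the jump-table idea behind this lowering (names are illustrative, not ART's): bias the switch value by the lowest case, bounds-check with one unsigned compare, then index a table of block offsets:

```cpp
#include <cstdint>

// Returns the offset of the block to jump to; 'offsets' stands in for the
// jump table emitted next to the code. Doing the subtraction in unsigned
// arithmetic folds the "< low" and ">= high" checks into one comparison.
int32_t PackedSwitchTarget(int32_t value, int32_t low,
                           const int32_t* offsets, uint32_t num_entries,
                           int32_t default_offset) {
  uint32_t index = static_cast<uint32_t>(value) - static_cast<uint32_t>(low);
  if (index >= num_entries) {
    return default_offset;  // out of range: fall through to the default block
  }
  return offsets[index];
}
```
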
- 17 Sep, 2015 1 commit
Andreas Gampe authored
Refactor slow paths so that there is a default implementation for the common cases (only arm64 with vixl is special). Write a generic intrinsic slow path that can be reused by the specific architectures. Move helper functions into CodeGenerator so that they are accessible.
Change-Id: Ibd788dce432601c6a9f7e6f13eab31f28dcb8550

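A minimal sketch of the shape this refactoring gives slow paths; the class and method names below are approximations, not ART's exact API:

```cpp
// Base slow path: a default EmitNativeCode() covers the common pattern
// (save live registers, call the runtime, restore, jump back), so only
// a genuinely special backend -- arm64 with vixl -- overrides the flow.
class SlowPathSketch {
 public:
  virtual ~SlowPathSketch() {}
  virtual void EmitNativeCode() {
    SaveLiveRegisters();
    CallRuntimeEntrypoint();
    RestoreLiveRegisters();
    JumpToExit();
  }

 protected:
  // Helpers like these live on the code generator so slow paths can reach them.
  virtual void SaveLiveRegisters() {}
  virtual void CallRuntimeEntrypoint() {}
  virtual void RestoreLiveRegisters() {}
  virtual void JumpToExit() {}
};

// One generic intrinsic slow path, shared across architectures.
class IntrinsicSlowPathSketch : public SlowPathSketch {};
```
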
- 16 Sep, 2015 1 commit
Mark Mendell authored
These are for use in new intrinsics. Bsf (Bit Scan Forward) is used in {Long,Integer}NumberOfTrailingZeros, and the rotates are used in {Long,Integer}Rotate{Left,Right}.
Change-Id: Icb599d7e1eec4e4ea9e5b4f0b1654c7b8d4de678
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>

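What these map to at the Java level, in portable form (a sketch of the semantics; the intrinsics emit the instructions directly rather than loops like these):

```cpp
#include <cstdint>

// numberOfTrailingZeros: bsf finds the lowest set bit in one instruction;
// it leaves its result undefined for 0, hence the explicit guard.
uint32_t NumberOfTrailingZeros(uint32_t x) {
  if (x == 0u) return 32u;
  uint32_t n = 0u;
  while ((x & 1u) == 0u) {
    x >>= 1;
    ++n;
  }
  return n;
}

// rotateLeft: masking the distance mirrors how x86 rotate instructions
// treat the count modulo the operand width, and avoids a shift by 32.
uint32_t RotateLeft(uint32_t x, uint32_t distance) {
  distance &= 31u;
  return (x << distance) | (x >> ((32u - distance) & 31u));
}
```
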
- 26 Aug, 2015 1 commit
Mark Mendell authored
The optimizing compiler uses 32-bit relative jumps for all forward jumps, just in case the offset is too large to fit in one byte. Some of the generated code knows that the jumps will in fact fit.
Add a 'NearLabel' class to the x86 and x86_64 assemblers. This will be used to generate known-short forward branches.
Add the jecxz/jrcxz instructions, which only handle a short offset. They will be used for intrinsics.
Add tests for the new instructions and NearLabel.
Change-Id: I11177f36394d35d63b32364b0e6289ee6d97de46
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>

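The payoff is encoding size: a short conditional jump takes 2 bytes versus 6 for the conservative rel32 form, and jecxz/jrcxz have no long form at all. A sketch of the bound check a NearLabel relies on:

```cpp
#include <cstdint>

// A jcc rel8 encodes in 2 bytes; the rel32 form needs 6. Binding a
// NearLabel asserts that the forward branch distance satisfies this.
bool FitsInRel8(int64_t displacement) {
  return displacement >= INT8_MIN && displacement <= INT8_MAX;
}
```
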
- 14 Aug, 2015 2 commits
Mark Mendell authored
Add support for the 'bsr' instruction. Add tests.
Change-Id: I1cd8b30d7f3f5ee7fbeef8124cc6a31bf8ce59d5
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>

Mark Mendell authored
Add 'REP MOVSW' as a supported instruction for 32- and 64-bit x86. Add tests.
Change-Id: I1c615ac1e7fa46c48983c90f791b92be0375c8b8
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>

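What 'rep movsw' does, expressed in C++ (semantics only; sized-word copies like this suit Java char data):

```cpp
#include <cstddef>
#include <cstdint>

// 'rep movsw' copies a count of 16-bit words from the source pointer to
// the destination pointer, advancing both as it goes.
void RepMovsw(uint16_t* dst, const uint16_t* src, size_t count) {
  while (count-- != 0u) {
    *dst++ = *src++;  // one movsw step; 'rep' repeats it count times
  }
}
```
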
- 11 Aug, 2015 1 commit
Mark Mendell authored
Use the constant area some more, use 32-bit immediates in movq instructions when possible, and make other small tweaks. Remove the commented-out code for Math.Abs(float/double), as it would fail for the baseline compiler due to the output being the same as the input.
Change-Id: Ifa39f1865b94cec2e1c0a99af3066a645e9d3618
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>

- 04 Aug, 2015 1 commit
agicsaki authored
Change-Id: I9085694fd3313581b2775a8267ccda58fec19a1a

- 31 Jul, 2015 1 commit
- 30 Jul, 2015 1 commit
Mark Mendell authored
Add moves that don't pollute the data cache. These can be used for assigning large data structures.
Change-Id: I14d91ba6264f5ce2f128033d65d59b2536426643
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>

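The idea behind these non-temporal ("streaming") moves, sketched with SSE2 intrinsics; ART emits the instructions from its code generator, so this shows only the semantics:

```cpp
#include <emmintrin.h>  // _mm_stream_si32 (movnti), _mm_sfence

// Streaming stores write around the cache, so filling a large structure
// that will not be read back soon does not evict useful cache lines.
void StreamFill(int* dst, int value, int count) {
  for (int i = 0; i < count; ++i) {
    _mm_stream_si32(dst + i, value);
  }
  _mm_sfence();  // order the streaming stores before later normal accesses
}
```
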
- 28 Jul, 2015 1 commit
Mark Mendell authored
The X86_64 code generator generated 3-operand multiplies for long multiplication only. Add support for 3-operand multiplication for int as well, for both X86 and X86_64.
Note that the RHS operand must be a 32-bit constant, and that it is possible for the constant to end up in a register (!) due to a previous use by another instruction. Handle this case by checking the operand; otherwise the first input might not be the same as the output, due to the use of Any().
Also allow stack operands for multiplication.
Change-Id: I8f3d14cc01e9a91210f418258aa18065ee87979d
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>

- 27 Jul, 2015 1 commit
agicsaki authored
Change-Id: I7634959eebb64d607f47497db320d5c2afdef16b

- 01 Jul, 2015 1 commit
Roland Levillain authored
- Instrument the ARM, ARM64, x86 and x86-64 code generators.
- Note: to turn heap poisoning on in Optimizing, set the environment variable `ART_HEAP_POISONING` to "true" before compiling ART.
Bug: 12687968
Change-Id: Ib3120b38cf805a8a50207a314b9ccc90c8d93740

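The gist of heap poisoning, as a sketch; that the transform is a simple negation is my assumption here, not something this log states:

```cpp
#include <cstdint>

// References are stored in the heap in a transformed ("poisoned") form.
// Instrumented code generators wrap every heap reference load/store in
// Unpoison/Poison; any path that forgets fails loudly, because the raw
// stored value is not a usable pointer. Negation is assumed here.
inline uint32_t PoisonReference(uint32_t ref) { return 0u - ref; }
inline uint32_t UnpoisonReference(uint32_t poisoned) { return 0u - poisoned; }
```
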
- 02 Jun, 2015 1 commit
Mathieu Chartier authored
Optimizing + quick tests are passing, devices boot. TODO: test and fix bugs in mips64. Saves 16 bytes for most ArtMethods, 7.5MB reduction in system PSS. Some of the savings are from the removal of the virtual-methods and direct-methods object arrays.
Bug: 19264997
(cherry picked from commit e401d146)
Change-Id: I622469a0cfa0e7082a2119f3d6a9491eb61e3f3d

Fix some ArtMethod related bugs
Added root visiting for runtime methods, not currently required since the GcRoots in these methods are null. Added missing GetInterfaceMethodIfProxy in GetMethodLine; fixes --trace run-tests 005 and 044. Fixed an optimizing-compiler bug where we used a normal stack location instead of a double on ARM64; this fixes the debuggable tests. TODO: fix JDWP tests.
Bug: 19264997
Change-Id: I7c55f69c61d1b45351fd0dc7185ffe5efad82bd3

ART: Fix casts for 64-bit pointers on 32-bit compiler.
Bug: 19264997
Change-Id: Ief45cdd4bae5a43fc8bfdfa7cf744e2c57529457

Fix JDWP tests after ArtMethod change
Fixes Throwable::GetStackDepth for exception event detection after the internal stack trace representation change. Adds a missing ArtMethod::GetInterfaceMethodIfProxy call in the case of a proxy method.
Bug: 19264997
Change-Id: I363e293796848c3ec491c963813f62d868da44d2

Fix accidental IMT and root marking regression
Was always using the conflict trampoline. Also included a fix for a regression in GC time caused by extra roots; most of the regression was IMT. Fixed a bug in DumpGcPerformanceInfo where we would get SIGABRT due to a detached thread. EvaluateAndApplyChanges: from ~2500 -> ~1980. GC time: 8.2s -> 7.2s due to 1s less of MarkConcurrentRoots.
Bug: 19264997
Change-Id: I4333e80a8268c2ed1284f87f25b9f113d4f2c7e0

Fix bogus image test assert
Previously we were comparing the size of the non-moving space to the size of the image file. Now we properly compare the size of the image space against the size of the image file.
Bug: 19264997
Change-Id: I7359f1f73ae3df60c5147245935a24431c04808a

[MIPS64] Fix art_quick_invoke_stub argument offsets.
ArtMethod references got bigger, so we need to move the other args and leave enough space for ArtMethod* and the 'this' pointer. This fixes mips64 boot.
Bug: 19264997
Change-Id: I47198d5f39a4caab30b3b77479d5eedaad5006ab

- 30 May, 2015 1 commit
Mathieu Chartier authored
Optimizing + quick tests are passing, devices boot. TODO: test and fix bugs in mips64. Saves 16 bytes for most ArtMethods, 7.5MB reduction in system PSS. Some of the savings are from the removal of the virtual-methods and direct-methods object arrays.
Bug: 19264997
Change-Id: I622469a0cfa0e7082a2119f3d6a9491eb61e3f3d

- 26 May, 2015 2 commits
Vladimir Marko authored
Avoid undefined behavior for arm64 stemming from 1u << 32 in loops with upper bound kNumberOfXRegisters. Create iterators for enumerating the bits in an integer either from high to low or from low to high, and use them for <arch>Context::FillCalleeSaves() on all architectures. Refactor runtime/utils.{h,cc} by moving all bit-fiddling functions to runtime/base/bit_utils.{h,cc} (together with the new bit iterators) and all time-related functions to runtime/base/time_utils.{h,cc}. Improve test coverage and fix some corner cases for the bit-fiddling functions.
Bug: 13925192
(cherry picked from commit 80afd020)
Change-Id: I905257a21de90b5860ebe1e39563758f721eab82

Vladimir Marko authored
Avoid undefined behavior for arm64 stemming from 1u << 32 in loops with upper bound kNumberOfXRegisters. Create iterators for enumerating the bits in an integer either from high to low or from low to high, and use them for <arch>Context::FillCalleeSaves() on all architectures. Refactor runtime/utils.{h,cc} by moving all bit-fiddling functions to runtime/base/bit_utils.{h,cc} (together with the new bit iterators) and all time-related functions to runtime/base/time_utils.{h,cc}. Improve test coverage and fix some corner cases for the bit-fiddling functions.
Bug: 13925192
Change-Id: I704884dab15b41ecf7a1c47d397ab1c3fc7ee0f7

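The undefined behavior and the iterator-style fix, sketched below; this is not ART's actual bit_utils API, and `__builtin_ctz` assumes GCC/Clang:

```cpp
#include <cstdint>

// Buggy pattern: a loop whose guard computes (mask & (1u << i)) with i
// running up to kNumberOfXRegisters eventually evaluates 1u << 32, a
// shift by the full operand width, which is undefined behavior.
// Iterating over the set bits themselves never shifts by the full width.
template <typename Visitor>
void ForEachBitLowToHigh(uint32_t mask, Visitor visit) {
  while (mask != 0u) {
    visit(static_cast<uint32_t>(__builtin_ctz(mask)));  // lowest set bit
    mask &= mask - 1u;                                  // clear that bit
  }
}
```
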
- 11 May, 2015 1 commit
Andreas Gampe authored
Add intrinsics implementations for indexOf in the optimizing compiler. These are mostly ported from Quick. Add instruction support to assemblers where necessary.
Change-Id: Ife90ed0245532a5c436a26fe84715dc357f353c8

- 22 Apr, 2015 1 commit
Mathieu Chartier authored
Also fixed some lines that were too long, and a few other minor details.
Change-Id: I6efba5fb6e03eb5d0a300fddb2a75bf8e2f175cb

- 21 Apr, 2015 1 commit
Mark Mendell authored
Allow constant operands and memory addresses for more X86_64 instructions. Add memory formats to X86_64 instructions to match. Fix a bug in cmpq(CpuRegister, const Address&). Allow mov <addr>, immediate (instruction 0xC7) to be a valid faulting instruction.
Change-Id: I5b8a409444426633920cd08e09f687a7afc88a39
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>

- 13 Apr, 2015 3 commits
Roland Levillain authored
- Ensure the double- and quadword x87 (FPU) instructions for integer loading (resp. fildl and fildll) are properly generated by the x86 and x86-64 generators (resp. X86Assembler::filds/X86_64Assembler::filds and X86Assembler::fildl/X86_64Assembler::fildl).
- Ensure the double- and quadword x87 (FPU) instructions for integer storing & popping (resp. fistpl and fistpll) are properly generated by the x86 and x86-64 generators (resp. X86Assembler::fistps/X86_64Assembler::fistps and X86Assembler::fistpl/X86_64Assembler::fistpl).
These instructions can be used in the implementation of the long-to-float and long-to-double Dex type conversions.
Change-Id: Iade52a9aee326d189d77d3dbd352a2b5dab52e46

Nicolas Geoffray authored
Test fails on arm. This reverts commit 2d45b4df.
Change-Id: Id2864917b52f7ffba459680303a2d15b34f16a4e

Serguei Katkov authored
The long-to-fp conversion implemented using SSE loses precision. The test is included. This CL uses the FPU to produce the correct result.
Change-Id: I8eaf3c46819a8cb52642a7e7d7c4e3e0edbc88db
Signed-off-by: Serguei Katkov <serguei.i.katkov@intel.com>

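A worked example of the failure mode (the exact buggy instruction sequence isn't stated here; this shows how a long-to-float conversion that goes through double rounds twice and can disagree with the single, correctly-rounded conversion that the FPU's 64-bit fild path provides):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  // 2^53 + 2^29 + 1: exactly representable neither as double nor as float.
  int64_t x = (INT64_C(1) << 53) + (INT64_C(1) << 29) + 1;
  float once  = static_cast<float>(x);                       // single rounding
  float twice = static_cast<float>(static_cast<double>(x));  // double rounding
  // Prints two different values: 9007200328482816 vs 9007199254740992.
  std::printf("%.0f\n%.0f\n", once, twice);
  return 0;
}
```
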
- 10 Apr, 2015 1 commit
Mark Mendell authored
Nicolas had some comments after the patch https://android-review.googlesource.com/#/c/144100 had merged. Fix the problems that he found.
Change-Id: I40e8a4273997860db7511dc8f1986281b72bead2
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>

- 09 Apr, 2015 2 commits
Mark Mendell authored
Support a constant area addressed using RIP on x86_64. Use it for FP operations to avoid loading constants into a CPU register and then moving them to an XMM register.
Change-Id: I58421759ef2a8475538876c20e696ec787015a72
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>

Guillaume Sanchez authored
This is done using the algorithms in Hacker's Delight, chapter 10.
Change-Id: I7bacefe10067569769ed31a1f7834f796fb41119

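The flavor of the technique, in its simplest unsigned form: division by a constant becomes a multiply by a precomputed "magic" reciprocal plus a shift. A sketch for n / 3 (the book derives magics and shift amounts for arbitrary divisors, plus the signed cases):

```cpp
#include <cstdint>

// 0xAAAAAAAB is ceil(2^33 / 3); the 64-bit product's high bits, shifted
// down by 33, equal n / 3 exactly for every uint32_t n.
uint32_t DivideBy3(uint32_t n) {
  return static_cast<uint32_t>((static_cast<uint64_t>(n) * 0xAAAAAAABu) >> 33);
}
```

One multiply and one shift generally cost far less than a hardware div, which is the point of strength-reducing these divisions.
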
- 07 Apr, 2015 2 commits
David Srbecky authored
Change-Id: I12a17a8a1c39ffccaa499c328ebac36e4d74dc4e

Mark Mendell authored
Implement the CAS, bit-reverse, and byte-reverse intrinsics that were missing from the x86 and x86_64 implementations. Add assembler tests and a compareAndSwapLong test.
Change-Id: Iabb2ff46036645df0a91f640288ef06090a64ee3
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>

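The contract the compareAndSwap intrinsic implements, in standard C++; on x86 the whole operation is a single lock cmpxchg (lock cmpxchg8b covers the 64-bit case on 32-bit x86):

```cpp
#include <atomic>
#include <cstdint>

// Atomically: if *addr == expected, store new_value and return true;
// otherwise leave *addr unchanged and return false.
bool CompareAndSwapLong(std::atomic<int64_t>* addr,
                        int64_t expected, int64_t new_value) {
  return addr->compare_exchange_strong(expected, new_value);
}
```
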
- 01 Apr, 2015 1 commit
Mark Mendell authored
Implement floor/ceil/round/RoundFloat on x86 and x86_64, and RoundDouble on x86_64. Add support for roundss and roundsd on both architectures, and support them in the disassembler as well. Add the instruction set features for x86, as the 'round' instruction is only supported if SSE4.1 is supported. Fix the tests to handle the addition of passing the instruction set features to x86 and x86_64. Add assembler tests for roundsd and roundss to the x86_64 assembler tests.
Change-Id: I9742d5930befb0bbc23f3d6c83ce0183ed9fe04f
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>

- 13 Mar, 2015 1 commit
Mark Mendell authored
Tweak the generated code to allow more use of constants and other small changes:
- Use test vs. compare to 0.
- EmitMove of 0.0 should use xorps.
- VisitCompare kPrimLong can use constants.
- cmp/add/sub/mul on x86_64 can use constants if in int32_t range.
- Long bit operations on x86 examine the long constant's high/low words to optimize.
- Use 3-operand imulq if the constant is in int32_t range.
Change-Id: I2dd4010fdffa129fe00905b0020590fe95f3f926
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>

- 19 Feb, 2015 1 commit
Andreas Gampe authored
Ensure that things are used correctly.
Change-Id: I76f082b32dcee28bbfb4c519daa401ac595873b3

- 06 Feb, 2015 1 commit
Nicolas Geoffray authored
- Use test instead of cmp when comparing against 0.
- Make it possible to use lea for add.
- Use xor instead of mov when loading 0.
Change-Id: Ide95c4e2d9b773e952412892f2df6869600c324e

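The three rewrites spelled out, as a before/after sketch in Intel syntax:

```cpp
// Peephole rewrites, shown as before -> after:
//
//   cmp eax, 0   ->   test eax, eax        // same flags, shorter encoding
//   mov eax, 0   ->   xor eax, eax         // 2 bytes vs 5, breaks dependencies
//   add eax, 16  ->   lea eax, [eax + 16]  // sum computed in the address unit
//
// lea leaves EFLAGS untouched, so it can also be scheduled between a
// compare and the branch that consumes the flags.
```
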
- 21 Jan, 2015 2 commits
Nicolas Geoffray authored
Will work on other architectures and FP support in other CLs.
Change-Id: I8cef0343eedc7202d206f5217fdf0349035f0e4d

Mark Mendell authored
Replace the calls to fmod/fmodf with inline code, as is done in the Quick compiler. Remove the quick fmod/fmodf runtime entries, as they are no longer in use. The 64-bit code generator's Move() routine needed to be enhanced to handle constants, as Location::Any() allows them to be generated.
Change-Id: I6b6a42f6faeed4b0b3c940453e487daf5b25d184
Signed-off-by: Mark Mendell <mark.p.mendell@intel.com>

- 16 Jan, 2015 1 commit
Calin Juravle authored
- For backends: arm, arm64, x86, x86_64.
- Fixed parameter passing for CodeGenerator.
- The 003-omnibus-opcodes test verifies that NullPointerExceptions work as expected.
Change-Id: I1b302acd353342504716c9169a80706cf3aba2c8

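The mechanism in miniature, as a toy sketch (ART's real handler inspects the faulting PC and transfers control to the runtime's exception machinery rather than exiting):

```cpp
#include <csignal>
#include <cstdlib>

// With implicit null checks, no "if (ref == null) throw" is emitted before
// each access: the load itself faults on null, and the SIGSEGV handler
// converts the fault into a NullPointerException.
void Handler(int) {
  // A real runtime re-enters managed code to throw; exiting stands in here.
  std::_Exit(42);
}

int main() {
  std::signal(SIGSEGV, Handler);
  volatile int* null_ref = nullptr;
  return *null_ref;  // the faulting load *is* the null check
}
```
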
- 15 Jan, 2015 1 commit
Andreas Gampe authored
Add intrinsics infrastructure to the optimizing compiler. Add almost all intrinsics supported by Quick to the x86-64 backend. Further intrinsics require more assembler support.
Change-Id: I48de9b44c82886bb298d16e74e12a9506b8e8807
