1040 Commits

Author SHA1 Message Date
Rob Clark
c4f423aa36 freedreno/ir3: lower load_barycentric_at_sample
This lowers load_barycentric_at_sample to load_sample_pos_from_id plus
load_barycentric_at_offset.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark
4e3ce224a7 freedreno: update generated headers
Pull in updates for sample shading.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark
6d6ec2d4d2 freedreno/ir3: cleanup instruction builder macros
De-duplicate the "normal" and "flags" versions of the macros, and while
at it go ahead and add "flags" versions for all the remaining macros,
since we'll at least need INSTR1F in a following commit.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark
77b3b96a3b freedreno/ir3: more emit-cat5 fixes
Couple more opcodes which don't take a sampler id as first arg.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark
9032f0690c freedreno/ir3: fix rgetpos decoding
It takes an argument.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark
4d08c1b595 compiler: rename SYSTEM_VALUE_VARYING_COORD
And add corresponding enums for different sorts of varying
interpolation.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark
6503918689 freedreno/drm: update for robustness
Update UABI header and add FD_PP_PGTABLE and FD_NR_FAULTS params.

Robustness can be supported by a kernel which provides the new ABI if it
also indicates that per-process pagetables are in use.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:07 -07:00
Bas Nieuwenhuizen
f2e0f5c3c4 vulkan/wsi: Add X11 adaptive sync support based on dri options.
The dri options are optional. When the dri options are not provided
the WSI will not  use adaptive sync.

FWIW I think for xf86-video-amdgpu this still requires an X11 config
option, so only people who opt in can get possible regressions from this.

So then the remaining question is: why do this in the WSI?

It has been suggested in another MR that the application sets this.
However, I disagree with that as I don't think we'll ever get a
reasonable set of applications setting it.

The next questions is whether this can be a layer. It definitely
can be as implemented now. However, I think this generally fits
well with the function of the WSI. Furthemore, for e.g. the DISPLAY
WSI this is much harder to do in a layer.

Of course, most of the WSI could almost be a layer, but I think
this still fits best in the WSI.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-23 23:49:39 +00:00
Rob Clark
a9241edfa3 freedreno/ir3: fix const assert
Fixes: fe8c57e859 freedreno/ir3: use nir_src_as_uint in a few places
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-19 12:36:06 -07:00
Kristian H. Kristensen
18ce6ac632 freedreno/ir3: Mark ir3_context_error() as NORETURN
Fixes a few warnings.

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-04-18 11:46:13 -07:00
Dylan Baker
95aefc94a9 Delete autotools
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Matt Turner <mattst88@gmail.com>
2019-04-15 13:44:29 -07:00
Karol Herbst
14531d676b nir: make nir_const_value scalar
v2: remove & operator in a couple of memsets
    add some memsets
v3: fixup lima

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
2019-04-14 22:25:56 +02:00
Karol Herbst
fe8c57e859 freedreno/ir3: use nir_src_as_uint in a few places
v2 (Jason Ekstrand):
 - Add even more places

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-14 22:25:56 +02:00
Timothy Arceri
035759b61b nir/i965/freedreno/vc4: add a bindless bool to type size functions
This required to calculate sizes correctly when we have bindless
samplers/images.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-04-12 09:02:59 +02:00
Jason Ekstrand
6279074de1 nir: Get rid of global registers
We have a pass to lower global registers to locals and many drivers
dutifully call it.  However, no one ever creates a global register ever
so it's all dead code.  It's time we bury it.

Acked-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-04-09 00:29:36 -05:00
Timothy Arceri
e30804c602 nir/radv: remove restrictions on opt_if_loop_last_continue()
When I implemented opt_if_loop_last_continue() I had restricted
this pass from moving other if-statements inside the branch opposite
the continue. At the time it was causing a bunch of spilling in
shader-db for i965.

However Samuel Pitoiset noticed that making this pass more aggressive
significantly improved the performance of Doom on RADV. Below are
the statistics he gathered.

28717 shaders in 14931 tests
Totals:
SGPRS: 1267317 -> 1267549 (0.02 %)
VGPRS: 896876 -> 895920 (-0.11 %)
Spilled SGPRs: 24701 -> 26367 (6.74 %)
Code Size: 48379452 -> 48507880 (0.27 %) bytes
Max Waves: 241159 -> 241190 (0.01 %)

Totals from affected shaders:
SGPRS: 23584 -> 23816 (0.98 %)
VGPRS: 25908 -> 24952 (-3.69 %)
Spilled SGPRs: 503 -> 2169 (331.21 %)
Code Size: 2471392 -> 2599820 (5.20 %) bytes
Max Waves: 586 -> 617 (5.29 %)

The codesize increases is related to Wolfenstein II it seems largely
due to an increase in phis rather than the existing jumps.

This gives +10% FPS with Doom on my Vega56.

Rhys Perry also benchmarked Doom on his VEGA64:

Before: 72.53 FPS
After:  80.77 FPS

v2: disable pass on non-AMD drivers

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-09 11:29:41 +10:00
Rob Clark
7ff6705b8d freedreno/ir3: convert to "new style" frag inputs
Add support for load_barycentric_pixel, load_interpolated_input, and
friends.  For now, this retains support for old-style inputs, which can
probably be dropped with some ttn work.

Prep work for sample-shading support.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-30 12:56:01 -04:00
Rob Clark
fc865de777 freedreno/ir3: add pass to move varying loads
Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-30 12:56:01 -04:00
Rob Clark
831f1a05c0 freedreno/ir3: rework varying packing
Originally we kept track of a table of inputs.  But with new-style frag
inputs this becomes awkward.  Re-work it so that initially we assigned
un-packed varying locations, and then after the shader is compiled scan
to find actual used inputs, and re-pack.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-30 12:56:01 -04:00
Rob Clark
91a1354cd6 freedreno/ir3: re-indent comment
Make it more clear that it applies to the following 'case' statements,
rather than the previous one.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-30 12:56:01 -04:00
Rob Clark
26e2906382 freedreno/ir3: reads/writes to unrelated arrays are not dependent
Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-28 14:36:24 -04:00
Rob Clark
d71ce69d9c freedreno/ir3: sched fix
Not sure why new-style frag inputs start triggering this.  But we
probably shouldn't consider src's from other blocks.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-28 14:36:24 -04:00
Kristian H. Kristensen
107a8ec3b3 freedreno/ir3: Add workaround for VS samgq
This instruction needs a workaround when used from vertex shaders.

Fixes:

  dEQP-GLES3.functional.shaders.texture_functions.texturegradoffset.sampler2dshadow_vertex
  dEQP-GLES3.functional.shaders.texture_functions.texturegradoffset.sampler3d_fixed_vertex
  dEQP-GLES3.functional.shaders.texture_functions.texturegradoffset.sampler3d_float_vertex
  dEQP-GLES3.functional.shaders.texture_functions.textureprojgradoffset.sampler2dshadow_vertex
  dEQP-GLES3.functional.shaders.texture_functions.textureprojgradoffset.sampler3d_fixed_vertex
  dEQP-GLES3.functional.shaders.texture_functions.textureprojgradoffset.sampler3d_float_vertex
  dEQP-GLES3.functional.shaders.texture_functions.textureprojgrad.sampler2dshadow_vertex

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-03-28 10:26:32 -07:00
Kristian H. Kristensen
f30d4a1cca freedreno/ir3: Don't access beyond available regs
emit_cat5() needs to check if the last optional reg is there before it
accesses it.

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-03-28 10:26:32 -07:00
Kristian H. Kristensen
893425a607 freedreno/ir3: Push UBOs to constant file
We have a rather big constant file and it seems that the best way to
use it is to upload all UBOs and lower UBO access the load_uniform.

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-03-27 13:26:02 -07:00
Kristian H. Kristensen
3c8779af32 freedreno/ir3: Enable PIPE_CAP_PACKED_UNIFORMS
This commit turns on the gallium cap and adds a pass to lower the
load_ubo intrinsics for block 0 back to load_uniform intrinsics and
adjust the backend where the cap switches units from vec4s to dwords.

As we stop using ir3_glsl_type_size() for uniform layout, this also
corrects an issue where we would allocate a vec4 slot for samplers in
uniforms, fixing:

  dEQP-GLES3.functional.shaders.struct.uniform.sampler_array_fragment
  dEQP-GLES3.functional.shaders.struct.uniform.sampler_array_vertex
  dEQP-GLES3.functional.shaders.struct.uniform.sampler_nested_fragment
  dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_vertex
  dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_fragment

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-03-27 13:26:02 -07:00
Kristian H. Kristensen
c7c432738a freedreno/ir3: Fix operand order for DSX/DSY
Most cat5 instructions are constructed using ir3_SAM, which uses
regs[1] for the (sampler, tex) src. Not DSX/DSY though, so we look up
src1 and src2 differently for those two.

Fixes: 1dffb089 ("freedreno/ir3: fix sam.s2en encoding")
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-03-25 18:36:48 -07:00
Kristian H. Kristensen
a752422bd4 freedreno/ir3: Track whether shader needs derivatives
In 1088b788 ("freedreno/ir3: find # of samplers from uniform vars") we
started counting number of samplers based on the uniform vars instead
of number of cat5 instructions.  We used the number of samplers to
determine whether to enable derivatives, but when we only use
derivatives and no samplers, that now breaks.  Track whether we need
derivatives explicitly and use that to enable the state.

Fixes: 1088b788 ("freedreno/ir3: find # of samplers from uniform vars")
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-03-25 18:36:48 -07:00
Samuel Pitoiset
23d30f4099 spirv,nir: lower frexp_exp/frexp_sig inside a new NIR pass
This lowering isn't needed for RADV because AMDGCN has two
instructions. It will be disabled for RADV in an upcoming series.

While we are at it, factorize a little bit.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-22 19:41:46 +01:00
Rob Clark
bf5a92811d freedreno/ir3: disable early-z for SSBO/image writes
Fixes:

dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_depth
dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_stencil
dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_depth_fbo
dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_stencil_fbo

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-22 08:53:28 -04:00
Rob Clark
dbac1a80d1 freedreno/ir3: rename has_kill to no_earlyz
There are other cases where we need to disable early-z, like image
writes.  So rename to something more generic.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-22 08:53:28 -04:00
Rob Clark
6e781a01b9 freedreno/ir3: dynamic UBO indexing vs 64b pointers
Fixes dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.uniform_fragment
and similar things with multiple UBOs

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-21 09:13:05 -04:00
Rob Clark
2e01c534f4 freedreno/ir3: fix bit_count
Seems like it can only work 16b at a time.  Fixes
dEQP-GLES31.functional.shaders.builtin_functions.integer.bitcount.*

TODO need to check if this limitation applies to a3xx as well.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-21 09:13:05 -04:00
Rob Clark
3d8349048b freedreno/ir3: additional lowering
For some things that show up when we expose higher glsl

TODO check blob traces to see if we have instructions for some of this?
I guess we don't but worth a check..

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-21 09:13:05 -04:00
Rob Clark
bcd81d2387 freedreno/ir3: optimize sam.s2en to sam
Detect when sampler/texture idx are immediate and switch to non s2en
encoding.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-21 09:13:05 -04:00
Rob Clark
1443694ee5 freedreno/ir3: enable indirect tex/samp (sam.s2en)
For now it uses indirect for everything.  The next step is for the
ir3_cp pass to detect the case that tex and samp idx are immediate
and convert the sam instruction back to the non .s2en variant.  But
doing that in a following patch so we can shake out the bugs with
.s2en more easily.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-21 09:13:05 -04:00
Rob Clark
1088b788d8 freedreno/ir3: find # of samplers from uniform vars
When we have indirect samplers, we cannot tell the max sampler
referenced.  Instead just refer to the number of sampler uniforms.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-21 09:13:05 -04:00
Rob Clark
8eb16ae8bf freedreno/ir3: fix regmask for merged regs
On a6xx+ with half-regs conflicting with full-regs, the legalize pass
needs to set appropriate sync bits, such as (sy), on writes to full regs
that conflict with half regs, and visa-versa.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-21 09:13:05 -04:00
Rob Clark
1dffb089f9 freedreno/ir3: fix sam.s2en encoding
Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-21 09:13:05 -04:00
Rob Clark
45b7a581b4 freedreno/ir3: fix sam.s2en decoding
Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-21 09:13:05 -04:00
Rob Clark
2d31cf9d3b freedreno/ir3/ra: fix half-class conflicts
On a6xx, half-regs conflict with full-regs.  But we were only setting up
conflicts for the first class (ie. scalar, but not hvec2/hvec3/hvec4),
resulting in higher half-reg classes getting assigned to regs that
overwrite full-regs.

Noticed while trying to enable indirect-sampler (sam.s2en) which uses an
hvec2 argument to pass the sampler/tex index.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-21 09:13:05 -04:00
Rob Clark
cc5ca9391c freedreno/ir3 better cat6 encoding detection
These two bits seem to be a better way to detect which encoding we are
looking at.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-21 09:13:05 -04:00
Jason Ekstrand
08f804ec0c anv,radv,turnip: Lower TG4 offsets with nir_lower_tex
v2: turn on for turnip as well (Karol Herbst)

Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-03-21 02:58:41 +00:00
Rob Clark
70904eb99a freedreno/ir3/a6xx: fix ssbo comp_swap
One line left out of the conversion to ir3 ssbo intrinsics on a6xx.

Fixes: 2e4525883f ir3/compiler: Enable lower_io_offsets pass and handle new SSBO intrinsics
Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-03-20 11:48:13 -04:00
Bas Nieuwenhuizen
42ed6d9789 turnip: Deconflict vk_format_table regeneration
Avoids

src/freedreno/vulkan/meson.build:42:0: ERROR:  Tried to create target "vk_format_table.c", but a target of that name already exists.

when building both radv and turnip.

Fixes: 26380b3a9f "turnip: Add driver skeleton (v2)"
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-03-16 14:38:51 +00:00
Bas Nieuwenhuizen
e1161d2ea7 turnip: Fix GCC compiles.
Apparently GCC does not consider static const variables to be
integer constants, and hence the array size and the static assert
result in compile failures.

Fixes: 4b9f967cd1 "turnip: add a more complete format table"
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-03-16 14:38:51 +00:00
Rob Clark
ca11f9263e freedreno/ir3/cp: fix ldib bug
Something that we didn't hit earlier because of the extra shr.b

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-03-15 10:52:11 -07:00
Eduardo Lima Mitev
759ceda07e ir3/lower_io_offsets: Try propagate SSBO's SHR into a previous shift instruction
While we lack value range tracking, this patch tries to 'manually' propogate
the division by 4 to calculate SSBO element-offset, into a possible previous
shift operation (shift left or right); checking that it is safe to do so.

This should help in cases like ie. when accessing a field in an array of
structs, where the offset is likely defined as base plus a multiplication
by a struct or array element size.

See dEQP test 'dEQP-GLES31.functional.ssbo.atomic.xor.highp_uint'
for an example of a shader that benefits from this.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-03-13 21:19:44 +01:00
Eduardo Lima Mitev
2e4525883f ir3/compiler: Enable lower_io_offsets pass and handle new SSBO intrinsics
These intrinsics have the offset in dwords already computed in the last
source, so the change here is basically using that instead of emitting
the ir3_SHR to divide the byte-offset by 4.

The improvement in shader stats is significant, of up to ~15% in
instruction count in some cases. Tested only on a5xx.

shader-db is unfortunately not very useful here because shaders that use
SSBO require GLSL versions that are not supported by freedreno yet.

For examples, most Khronos CTS tests under 'dEQP-GLES31.functional.ssbo.*'
are helped.

A random case:

dEQP-GLES31.functional.ssbo.layout.2_level_array.packed.row_major_mat3x2

with current master:

; CL prog 14/1: 1252 instructions, 0 half, 48 full
; 8 const, 8 constlen
; 61 (ss), 43 (sy)

with the SSBO dword-offset moved to NIR:

; CL prog 14/1: 1053 instructions, 0 half, 45 full
; 7 const, 7 constlen
; 34 (ss), 73 (sy)

The SHR previously emitted for every single SSBO instruction disappears
in most cases, and the dword-offset ends up embedded in the STGB
instruction as immediate in many cases as well.

There are also a few of those tests that are currently failing on register
allocation, that start to pass as a result of reducing the pressure. At least
these, probably more:

dEQP-GLES31.functional.ssbo.layout.random.unsized_arrays.24
dEQP-GLES31.functional.ssbo.layout.random.arrays_of_arrays.6
dEQP-GLES31.functional.ssbo.layout.random.arrays_of_arrays.17
dEQP-GLES31.functional.ssbo.layout.random.nested_structs_arrays.14
dEQP-GLES31.functional.ssbo.layout.random.nested_structs_arrays_instance_arrays.5
dEQP-GLES31.functional.ssbo.layout.random.nested_structs_arrays_instance_arrays.7

No regressions observed with relevant CTS and piglit tests.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-03-13 21:19:44 +01:00
Eduardo Lima Mitev
9dd0cfafc9 ir3/nir: Add a new pass 'ir3_nir_lower_io_offsets'
This NIR->NIR pass implements offset computations that are currently
done on the IR3 backend compiler, to give NIR a better chance of
optimizing them.

For now, it supports lowering the dword-offset computation for SSBO
instructions. It will take an SSBO intrinsic and replace it with the
new ir3-specific version that adds an extra source. That source will
hold the SSA value resulting from inserting a division by 4 (an SHR op)
of the original byte-offset source already provided by NIR in one of
the intrinsic sources.

Note that on a6xx the original byte-offset is not needed, so we could
potentially replace that source instead of adding a new one. But to
keep things simple and consistent we always add the new source and
a6xx will just ignore the original one.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-03-13 21:19:44 +01:00