Compare commits

1484 Commits

Author SHA1 Message Date
Juan A. Suarez Romero
b43b55d461 nir/spirv: return after emitting a branch in block
When emitting a branch in a block, it does not make sense to continue
processing further instructions, as they will not be reachable.

This fixes a nasty case of a loop containing a branch whose then-part
and else-part both exit the loop:

%1 = OpLabel
     OpLoopMerge %2 %3 None
     OpBranchConditional %false %2 %2
%3 = OpLabel
     OpBranch %1
%2 = OpLabel
    [...]

We know that block %1 will always branch to block %2, which is the merge
block for the loop, and thus a break is emitted. If we keep processing
further instructions, we will process the branch conditional and emit
the proper NIR conditional, which leads to instructions after the
break.

This fixes dEQP-VK.graphicsfuzz.continue-and-merge.

CC: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-28 09:47:06 +01:00
Eric Engestrom
0c3287e94d egl/android: replace magic 0=CbCr,1=CrCb with simple enum
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-02-28 07:44:46 +00:00
Caio Marcelo de Oliveira Filho
6a553bedcc st/nir: count num_uniforms for FS bultin shader
Usually the uniforms will be assigned locations and have their slots
counted automatically, but for builtin shaders the location assignment
is manual.  So count them too; otherwise we get num_uniforms == 0.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-02-27 22:18:24 -08:00
Ray Zhang
b344e32cdf glx: fix shared memory leak in X11
Call XShmDetach to allow the X server to free the shared memory.

Fixes: bcd80be49a "drisw/glx: use XShm if possible"
Signed-off-by: Ray Zhang <zhanglei002@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-02-28 14:23:02 +10:00
Timothy Arceri
e907337fad radeonsi/nir: move si_lower_nir() call into compiler thread
This helps improve compile times. For example the shader-db dolphin
shader shaders/dolphin/ubershaders/120.shader_test goes from
~1.69 -> ~1.57 seconds on my machine with this change.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-02-28 11:54:06 +11:00
Timothy Arceri
7536af670b glsl: fix shader cache for packed param list
Some types of params such as some builtins are always padded. We
need to keep track of this so we can restore the list correctly.

Here we also remove a couple of cache entries that are not actually
required as they get rebuilt by the _mesa_add_parameter() calls.

This patch fixes a bunch of arb_texture_multisample and
arb_sample_shading piglit tests for the radeonsi NIR backend.

Fixes: edded12376 ("mesa: rework ParameterList to allow packing")

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-02-28 11:47:37 +11:00
Yevhenii Kolesnikov
07f4b4e403 i965: Fix allow_higher_compat_version workaround limited by OpenGL 3.0
Added check for higher compat profile being allowed
before assigning certain extensions.

Fixes: 272fe94942 (mesa: enable ARB_texture_buffer_* extensions in the Compatibility profile)

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107052
2019-02-28 10:25:16 +11:00
Lionel Landwerlin
6e184147dd intel/compiler: use correct swizzle for replacement
The optimization in 4cd1a0be76 introduced a replacement of :

cmp(8).z.f0.0 vgrf11.y:D, vgrf10.xxxx:D, vgrf2.xyyy:D
...
cmp(8).nz.f0.0 null.x:D, vgrf11.yyyy:D, 0D

By :

cmp(8).z.f0.0 vgrf15.x:D, vgrf10.xxxx:D, vgrf2.yyyy:D
...
mov(8) vgrf11.y:D, vgrf15.yyyy:D

The first cmp instruction stores to x while the replacement mov sources
from y. We need to take into account where the replacement on the
scan_inst destination is going to store things, so that the
replacement mov can source from the correct location.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 4cd1a0be76 ("i965/vec4: Propagate conditional modifiers from more compares to other compares")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109759
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-02-27 20:06:42 +00:00
Jonathan Marek
61e3188633 freedreno: catch failing fd_blit and fallback to software blit
Fixes cases where the fd_blit fails and never happens (e.g. blit to etc1)

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
2019-02-27 18:46:28 +00:00
Jonathan Marek
e3591b0339 freedreno: use renderonly path for buffers allocated with modifiers
Now that freedreno has create_with_modifiers(), this "hack" is needed to
make some cases work. Copied from vc4.

Fixes: 41ddf1d1

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
2019-02-27 18:46:28 +00:00
Jonathan Marek
6c0fefb448 freedreno: a2xx: fix mipmapping for NPOT textures
Fixes: 3a273a4a

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
2019-02-27 18:46:28 +00:00
Jonathan Marek
4f23767590 freedreno: a2xx: fix fast clear for some gmem configurations
In freedreno_gmem.c, gmem_align of 0x8000 is used. Alignment used here
should be the same.

Fixes: 912a9c8d

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
2019-02-27 18:46:28 +00:00
Jonathan Marek
8eca6df5ed freedreno: a2xx: add use_hw_binning function
Fixes: cb2322c7

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
2019-02-27 18:46:28 +00:00
Jonathan Marek
357313ab0f freedreno: a2xx: don't write 4th vertex in mem2gmem
There is only room for 3 vertices now (RECT has 3 vertices).

Fixes: 6ef7700a

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
2019-02-27 18:46:28 +00:00
Erik Faye-Lund
71a76a47cc swr/codegen: fix autotools build
When the output directory was changed, the BUILT_SOURCES and build-rule
target paths were no longer correct, leading to races between generating
the sources and compiling them.

Fix this by updating both sets of paths, so automake sees what's going on
here.

Fixes: 773b3ceaca ("swr/rast: Fix autotools and scons codegen")
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Alok Hota <alok.hota@intel.com>
2019-02-27 17:59:06 +00:00
Timo Aaltonen
738626daca util/os_misc: Add check for PIPE_OS_HURD
Fix build on Hurd.

Signed-off-by: Timo Aaltonen <tjaalton@debian.org>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2019-02-27 14:56:48 +00:00
Lionel Landwerlin
2fff5966d6 vulkan/overlay: install layer binary in libdir
This will allow multilib.

v2: Drop path from json file, dlopen should be able to locate the lib in libdir

v3: Switch from configure_file to install_data (Dylan)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109788
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-27 11:45:42 +00:00
Eric Engestrom
7763e664ce meson/swr: replace hard-coded path with current_build_dir()
Fixes: 93cd9905c8 "swr/rast: Cleanup and generalize gen_archrast"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Alok Hota <alok.hota@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-02-27 11:13:05 +00:00
Gert Wollny
b7201a468d nir: Add posibility to not lower to source mod 'abs' for ops with three sources
This is useful for r600 since there the abs source modifier is not supported
for ops with three sources

v2: Use correct logic to enable lowering to abs source mod (Eric Anholt)

Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-02-27 11:04:06 +00:00
Gurchetan Singh
ce112fcc87 virgl/vtest: deprecate protocol version 1
This is a partial revert of 9d81cd ("virgl: Pass resource size and
transfer offsets").

The adjustments made in the client code mean there are various
mismatches when transferring data.

Let's fall back to protocol version 0 and deprecate protocol
version 1.  We can still use the protocol version 1 slots for
a shared memory transfer mechanism later.

Fixes:
  dEQP-GLES31.functional.copy_image.mixed.viewclass_128_bits_mixed.*_renderbuffer

Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
2019-02-27 11:02:29 +00:00
Tapani Pälli
b9acfef337 util: fix a warning when building against clang7 headers
Header xmmintrin.h conditionally includes emmintrin.h that defines
_MM_DENORMALS_ZERO_MASK, add ifndef to fix this warning.
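
The guard described above can be sketched as follows. This is a minimal
illustration rather than the actual patch: the value 0x0040 is assumed here
to be the MXCSR denormals-are-zero bit, and set_daz_bit() is a hypothetical
helper added only to show the macro in use.

```c
#include <assert.h>

/* Only define the mask when no intrinsics header has already done so;
 * this avoids a macro redefinition warning. */
#ifndef _MM_DENORMALS_ZERO_MASK
#define _MM_DENORMALS_ZERO_MASK 0x0040
#endif

/* Hypothetical helper: return an MXCSR value with the DAZ bit set. */
static unsigned set_daz_bit(unsigned mxcsr)
{
   return mxcsr | _MM_DENORMALS_ZERO_MASK;
}
```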

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-02-27 08:57:41 +02:00
Tapani Pälli
d1af8115f8 iris: add libmesa_iris_gen8 library to the build
Patch fixes iris build on Android.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-02-27 08:57:41 +02:00
Tapani Pälli
5e52184f72 android: make libbacktrace optional on USE_LIBBACKTRACE
Otherwise with VNDK enabled we fail linking:
   src/gallium/targets/dri/Android.mk: error: gallium_dri (native:vendor)
   should not link to libbacktrace.vendor (native:vndk_private)

Option makes it possible to use libbacktrace only when VNDK is not
enabled.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-02-27 08:56:46 +02:00
Tapani Pälli
a3c366c4b2 android: add liblog to libmesa_intel_common build
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-02-27 08:53:09 +02:00
Alyssa Rosenzweig
b7a5b81d14 panfrost/midgard: Allow flt to run on most units
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-27 03:56:56 +00:00
Alyssa Rosenzweig
4c82abb9b6 panfrost: Expose perf counters in environment
Previously, they were guarded by an #ifdef, which is generally bad form.
This patch instead guards them behind an environment variable.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-27 03:56:38 +00:00
Alyssa Rosenzweig
60270c83b5 panfrost: Identify 4-bit channel texture formats
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-27 03:56:17 +00:00
Alyssa Rosenzweig
90fd82c540 panfrost: Add RGB565, RGB5A1 texture formats
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-27 03:55:19 +00:00
Jose Maria Casanova Crespo
4122665dd9 iris: Enable ARB_shader_draw_parameters support
Additional VERTEX_ELEMENT_STATE are used to store basevertex and
baseinstance and drawid updating the DWordLength of the
3DSTATE_VERTEX_ELEMENTS command.

This passes all piglit tests for spec.*draw_parameters.* tests
and VK-GL-CTS KHR-GL45.shader_draw_parameters_tests.* tests.

Now we only mark a dirty_update when parameters are changed or
when we have an indirect draw.

We enable PIPE_CAP_DRAW_PARAMETERS on Iris.

There is no edge flag support in the Vertex Elements setup.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-02-26 13:28:38 -08:00
Pierre Moreau
1c9fdcefd4 clover: Fix indentation issues
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2019-02-26 21:02:07 +01:00
Pierre Moreau
5285fff5f9 clover: Only use devices supporting IR_NATIVE
Currently clover will advertise any device that advertises
PIPE_CAP_COMPUTE, even if they do not support PIPE_SHADER_IR_NATIVE,
which is the IR used internally by clover.
This avoids clover advertising devices as available even though they
actually are not supported.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2019-02-26 21:02:07 +01:00
Pierre Moreau
8f9b4a2be6 clover: Move platform extensions definitions to clover/platform.cpp
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Aaron Watry <awatry@gmail.com>
2019-02-26 21:02:07 +01:00
Pierre Moreau
b033620abf clover: Move device extensions definitions to core/device.cpp
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Aaron Watry <awatry@gmail.com>
2019-02-26 21:02:07 +01:00
Pierre Moreau
d42f5896c5 clover: Validate program and library linking options
Program linking options are only valid when creating an executable, and
only if the library was created with the `-enable-link-options` option,
which itself is only valid when creating a library.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2019-02-26 21:02:07 +01:00
Pierre Moreau
fccc6ecb52 clover: Disallow creating libraries from other libraries
If creating a library, do not allow non-compiled objects in it, as
executables are not allowed, and libraries would make it really hard to
enforce the "-enable-link-options" flag.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Aaron Watry <awatry@gmail.com>
2019-02-26 21:02:07 +01:00
Pierre Moreau
bad161c894 clover/api: Fail if trying to build a non-executable binary
From the OpenCL 1.2 Specification, Section 5.6.2 (about clBuildProgram):

> If program is created with clCreateProgramWithBinary, then the
> program binary must be an executable binary (not a compiled binary or
> library).

Reviewed-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2019-02-26 21:02:07 +01:00
Pierre Moreau
25d4e65eb7 clover/api: Rework the validation of devices for building
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2019-02-26 21:02:07 +01:00
Pierre Moreau
505ec3a530 clover: Add an helper for checking if an IR is supported
Reviewed-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2019-02-26 21:02:07 +01:00
Pierre Moreau
67769c913f clover: Remove the TGSI backend as unused
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2019-02-26 21:02:07 +01:00
Pierre Moreau
669d00ba4c clover: Avoid warnings from new OpenCL headers
* Avoid warnings from references to deprecated CL 1.0, 1.2, 2.0 and 2.1 APIs.
* Avoid warnings from not defining CL_TARGET_OPENCL_VERSION.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2019-02-26 21:02:07 +01:00
Karol Herbst
ba8d21a8d3 clover: update ICD table to support everything up to 2.2
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
2019-02-26 21:02:07 +01:00
Pierre Moreau
dddc5649bf include/CL: Update to the latest OpenCL 2.2 headers
Acked-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-02-26 21:02:07 +01:00
Marek Olšák
2ae07830e7 gallium/u_tests: use a compute-only context to test GCN compute ring
2019-02-26 14:58:55 -05:00
Marek Olšák
a1378639ab radeonsi: always use compute rings for clover on CI and newer (v2)
Initialize all non-compute context functions to NULL.

v2: fix SI
2019-02-26 14:58:55 -05:00
Bas Nieuwenhuizen
c0110477b5 radv: Interpolate less aggressively.
Seems like dxvk used integer builtins without setting the flat
interpolation decoration.

I believe in the current spec the app is required to set these,
but in the meantime to avoid breaking things in stable releases
(and so close to release for 19.0), only expand the interpolation
to float16 and struct (which cannot be builtins as our spirv parser
lowers the builtin block).

Fixes: f324784104 "radv: Allow interpolation on non-float types."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-26 18:51:35 +00:00
Drew Davenport
1fd79b4b6d util: Don't block SIGSYS for new threads
SIGSYS is needed for programs using seccomp for sandboxing.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-02-26 19:39:14 +01:00
Rob Clark
64206102fc freedreno/ir3: gsampler2DMSArray fixes
Array index should come before sample-id.  And exclude all isam variants
(which take integer texel coords) from adding of offset.

Fixes dEQP-GLES31.functional.texture.multisample.samples_1.use_texture_*_2d_array

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-26 13:19:44 -05:00
Rob Clark
a06bb486b0 freedreno/ir3/a6xx: fix atomic shader outputs
We also need to put in the output mov.  Possibly we could just fixup the
output register to read it directly from the dummy, but that is more
work and I guess dEQP is probably the only time you encounter this.

Fixes dEQP-GLES31.functional.shaders.opaque_type_indexing.atomic_counter.const_literal_fragment

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-26 13:19:44 -05:00
Rob Clark
db1fa21374 freedreno/a6xx: vertex_id is not _zero_based
Fixes dEQP-GLES31.functional.draw_base_vertex.draw_elements_base_vertex.builtin_variable.vertex_id

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-26 13:19:44 -05:00
Rob Clark
79180a0566 freedreno/a6xx: fix DRAW_IDX_INDIRECT max_indicies
The indirect offset does not affect the index buffer size.  Fixes all of
dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_combined_grid_100x100_drawcount_*
with drawcount > 1.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-26 13:19:44 -05:00
Rob Clark
cabe55a2e7 freedreno/ir3/a6xx: fix non-ssa atomic dst
We weren't propagating the array info for cases where result of atomic
is array/reg.  This can happen, for example, if result is part of a phi
web lowered to regs.

Fixes dEQP-GLES31.functional.ssbo.atomic.compswap.*

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-26 13:19:44 -05:00
Rob Clark
edd5b3126d freedreno/a6xx: fix ssbo alignment
Fixes a bunch of deqp ssbo tests that use multiple ssbo blocks packed
into a single buffer.

Note the a5xx value seems suspicious, but this is what blob seems to
advertise.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-26 13:19:44 -05:00
Rob Clark
cb884d8ab2 freedreno/ir3: use nopN encoding when possible
Use the (nopN) encoding for slightly denser shaders. This lets us fold
nop instructions into the previous alu instruction in certain cases.

Shouldn't change the # of cycles a shader takes to execute, but reduces
the size.  (e.g. glmark2 refract goes from 168 to 116 instructions)

Currently only enabled for a6xx, but I think we could enable this for
a5xx and possibly a4xx.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-26 13:19:44 -05:00
Rob Clark
04c2520d91 freedreno/a6xx: fix hangs with large shaders
We were overflowing instrlen (which is # of groups of 16 instructions)
in a couple dEQP tests, causing gpu hangs:

dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13
dEQP-GLES31.functional.ubo.random.all_per_block_buffers.20

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-26 13:19:44 -05:00
Brian Paul
6dabcb5bcf mesa: fix display list corner case assertion
This fixes a failed assertion in glDeleteLists() for the following
case:

list = glGenLists(1);
glDeleteLists(list, 1);

when those are the first display list commands issued by the
application.

When we generate display lists, we plug in empty lists created with
the make_list() helper.  This function uses the OPCODE_END_OF_LIST
opcode but does not call dlist_alloc() which would set the
InstSize[OPCODE_END_OF_LIST] element to non-zero.

When the empty list was deleted, we failed the InstSize[opcode] > 0
assertion.

Typically, display lists are created with glNewList/glEndList so we
set InstSize[OPCODE_END_OF_LIST] = 1 in dlist_alloc().  That's why
this bug wasn't found before.

To fix this failure, simply initialize the InstSize[OPCODE_END_OF_LIST]
element in make_list().
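
The failure mode and the fix can be modelled with a much-reduced sketch.
Everything below is a stand-in written for illustration: the two-entry
opcode enum, the inst_size table, and the *_sketch functions are
assumptions, not the real dlist.c code.

```c
#include <assert.h>

/* Reduced, hypothetical opcode set; the real table is far larger. */
enum opcode { OPCODE_NOP, OPCODE_END_OF_LIST, OPCODE_COUNT };

/* Stand-in for InstSize[]: zero until an opcode is first allocated. */
static unsigned inst_size[OPCODE_COUNT];

struct dlist { enum opcode head; };

/* Model of make_list(): creates an empty list without going through
 * the allocation path, so it must initialize the size entry itself. */
static struct dlist make_list_sketch(void)
{
   struct dlist list = { OPCODE_END_OF_LIST };
   inst_size[OPCODE_END_OF_LIST] = 1;   /* the fix described above */
   return list;
}

/* Model of list deletion: walking the list asserts a non-zero size. */
static unsigned delete_list_sketch(const struct dlist *list)
{
   assert(inst_size[list->head] > 0);
   return inst_size[list->head];
}
```

Without the assignment in make_list_sketch(), deleting a freshly generated
list would trip the assertion, which mirrors the reported case.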

The game oolite was hitting this.

Fixes: https://github.com/OoliteProject/oolite/issues/325
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-02-26 09:56:45 -07:00
Brian Paul
cb52d4482d svga: fix dma.pending > 0 test
The dma.pending field is boolean, so testing for > 0 isn't right.

Reviewed-by: Neha Bhende <bhenden@vmware.com>
2019-02-26 09:56:45 -07:00
Brian Paul
96ea977c79 svga: assorted whitespace and formatting fixes
Remove trailing whitespace, etc.

Trivial.
2019-02-26 09:56:45 -07:00
Brian Paul
a81eebf9bc st/mesa: whitespace/formatting fixes in st_cb_texture.c
Remove trailing whitespace, replace tabs w/ spaces, etc.

Trivial.
2019-02-26 09:56:45 -07:00
Eleni Maria Stea
fd37a19ac4 i965: fixed clamping in set_scissor_bits when the y is flipped
Calculating the scissor rectangle fields with the y flipped (0 on top)
can generate negative values that will cause assertion failure later on
as the scissor fields are all unsigned. We must clamp the bbox values
again to make sure they don't exceed the fb_height. Also fixed a
calculation error.
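
A minimal sketch of the clamping idea, assuming a simplified model of the
y-flipped path: the struct, the helper names, and the empty-box handling
are hypothetical and only illustrate why the lower bound must be clamped
to the framebuffer height before flipping.

```c
#include <assert.h>
#include <stdbool.h>

struct scissor_y {
   bool empty;      /* true when the clamped box has zero height */
   unsigned ymin;
   unsigned ymax;
};

static int clamp_int(int v, int lo, int hi)
{
   return v < lo ? lo : (v > hi ? hi : v);
}

/* Hypothetical helper: compute the flipped (0 on top) scissor y-range.
 * Clamping y0 to [0, fb_height] keeps fb_height - y0 - 1 from going
 * negative before it is stored in an unsigned field. */
static struct scissor_y flipped_scissor_y(int y, int height, int fb_height)
{
   struct scissor_y out = { true, 0, 0 };
   assert(height >= 0 && fb_height > 0);

   int y0 = clamp_int(y, 0, fb_height);
   int y1 = (y0 + height < fb_height) ? y0 + height : fb_height;
   if (y0 == y1)
      return out;   /* nothing visible */

   out.empty = false;
   out.ymin = (unsigned)(fb_height - y1);
   out.ymax = (unsigned)(fb_height - y0 - 1);
   return out;
}
```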

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108999
          https://bugs.freedesktop.org/show_bug.cgi?id=109594

v2:
   - I initially clamped the values inside the if (Y is flipped) case
   and I made a mistake in the calculation: the clamp of the bbox[2] should
   be a check if (bbox[2] >= fbheight) bbox[2] = fbheight - 1 instead and I
   shouldn't have changed the ScissorRectangleYMax calculation. As the
   fixed code is equivalent with using CLAMP instead of MAX2 at the top of
   the function when bbox[2] and bbox[3] are calculated, and the 2nd is more
   clear, I replaced it. (Nanley Chery)

v3:
   - Reversed the CLAMP change in bbox[3] as the API guarantees that the
   viewport height is positive. (Nanley Chery)

v4:
  - Added nomination for the mesa-stable branch and the link to the second
  bugzilla bug (Nanley Chery)

CC: <mesa-stable@lists.freedesktop.org>
Tested-by: Paul Chelombitko <qamonstergl@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2019-02-26 08:23:26 -08:00
Eduardo Lima Mitev
0bf667984b freedreno/a6xx: Silence compiler warnings
util_format_compose_swizzles() expects 'const unsigned char' and we
are feeding it 'char'.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-02-26 14:15:33 +01:00
Kasireddy, Vivek
7cab8d3661 i965: Add support for sampling from XYUV images
Add support to the i965 DRI driver to sample from XYUV8888 buffers.

Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-26 13:08:52 +00:00
Kasireddy, Vivek
65600d0946 dri: Add XYUV8888 format
In addition to adding this format to the dri_interface header,
add an entry in the android and wayland backends as well.

Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-26 13:08:52 +00:00
Vivek Kasireddy
ff14d06be5 drm-uapi: Update headers from drm-next
Pull new updates from drm-next as of the following commit:

commit a5f2fafece141ef3509e686cea576366d55cabb6
Merge: 71f4e45a4ed3 860433ed2a55
Author: Dave Airlie <airlied@redhat.com>
Date:   Wed Feb 20 12:16:30 2019 +1000

    Merge https://gitlab.freedesktop.org/drm/msm into drm-next

Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-26 13:08:51 +00:00
Kasireddy, Vivek
78fb3fd17e nir/lower_tex: Add support for XYUV lowering
The memory layout associated with this format would be:
Byte:      0 1 2 3
Component: V U Y X

Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-26 13:08:51 +00:00
Lionel Landwerlin
913d711e0f imgui: update memory editor
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Tapani Pälli <tapani.palli@intel.com>
2019-02-26 12:49:07 +00:00
Lionel Landwerlin
ab9ae080ec imgui: update commit
In commit 3950e7c11e ("imgui: bump copy") I forgot to update the
README about what copy of imgui we carry.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Tapani Pälli <tapani.palli@intel.com>
2019-02-26 12:49:04 +00:00
Eric Engestrom
a213b927f2 driinfo: add DTD to allow the xml to be validated
This DTD can be used to validate the output and make sure any parsers
out there can handle it:
$ xmllint --noout --valid driinfo.xml

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-02-26 12:48:28 +00:00
Lionel Landwerlin
9646750822 vulkan/overlay: fix includes
The Loader/Validation-Layers repository allows the user to choose where
header files are installed. On my system I chose /usr/include
thinking it was the obvious "base" location, but it turns out the
headers end up being installed right there rather than in a vulkan
subdirectory. On Debian/Ubuntu the selected installation path is
/usr/include/vulkan, so just go with that.

Hopefully other distros don't choose another path.

Note that the validation layer doesn't provide a .pc file so we have
no way of querying where the headers are installed.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109739
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-26 12:29:54 +00:00
Lionel Landwerlin
47ef52d333 vulkan/overlay: fix missing installation of layer
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109739
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-26 12:29:46 +00:00
Eric Engestrom
318e550549 dri_interface: add missing #include
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-02-26 12:03:20 +00:00
Eric Engestrom
7f5d9c2757 gitlab-ci: always run the containers build
If the job creating the containers was manually cancelled the first time
a fork was created, the fork would be left unable to use the CI
(until the next automatic regeneration of the container).

Avoid this by always running the container-generation job, even though
99% of the time it will spin up, see that the container exists and shut
down.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2019-02-26 12:02:14 +00:00
Emil Velikov
40a82e6463 docs: mention "Allow commits from members who can merge..."
Mention the tick-box; otherwise only the MR author can rebase the series.

Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
2019-02-26 11:27:10 +00:00
Emil Velikov
d9d1cb43d7 egl/android: bump the number of drmDevices to 64
It's the current maximum supported by the kernel. Stay consistent with
the rest of Mesa and use the same number.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2019-02-26 11:07:23 +00:00
Emil Velikov
02344fe80b loader: use loader_open_device() to handle O_CLOEXEC
Some platforms lack O_CLOEXEC. loader_open_device() handles those
appropriately, so use the helper.
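
One common way to get that behaviour is sketched below. This is an
assumption about the general pattern, not a copy of the loader code, and
open_device_cloexec() is a hypothetical name: try O_CLOEXEC when the
platform defines it, and otherwise set FD_CLOEXEC with fcntl() after
opening.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open a device node with close-on-exec set,
 * even on platforms where O_CLOEXEC is missing or rejected. */
static int open_device_cloexec(const char *path)
{
   int fd;
   assert(path != NULL);

#ifdef O_CLOEXEC
   fd = open(path, O_RDWR | O_CLOEXEC);
   if (fd == -1 && errno == EINVAL)
#endif
   {
      /* Fallback: open first, then mark the descriptor close-on-exec. */
      fd = open(path, O_RDWR);
      if (fd != -1) {
         int flags = fcntl(fd, F_GETFD);
         if (flags != -1)
            fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
      }
   }
   return fd;
}
```

The fallback leaves a short window between open() and fcntl() in which a
concurrent fork/exec could inherit the descriptor, which is why the flag
is preferred where available.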

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-26 11:07:23 +00:00
Emil Velikov
f0a7b463b5 meson: egl: correctly manage loader/xmlconfig
An earlier commit introduced support for haiku yet did not properly
annotate the loader/xmlconfig dependencies.

Thus we ended up adding inc_loader for each !haiku platform - see
659910eda0 9a96bf0ecd c731508b98 ec6cb01e21.

One piece remained though - the wayland platform. Hence the following
would fail:

 meson -Dgallium-drivers=etnaviv -Ddri-drivers=''\
       -Dtools=etnaviv -Dplatforms=wayland -Dglx=disabled \
       build/

Cc: Alexander von Gluck IV <kallisti5@unixzen.com>
Reported-by: Boris Brezillon <boris.brezillon@collabora.com>
Fixes: 834d221512 ("meson: Add Haiku platform support v4")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-02-26 11:07:23 +00:00
Emil Velikov
9d84a922b8 egl/dri: de-duplicate dri2_load_driver*
The difference between the three functions is the list of mandatory
driver extensions. Pass that as an argument to the common helper.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-26 11:07:23 +00:00
Samuel Pitoiset
4924dfc851 radv: don't copy buffer descriptors list for samplers
Sampler descriptors don't have a buffer list.

This fixes some crashes with new CTS
dEQP-VK.binding_model.descriptor_copy.*.sampler_*.

Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-26 11:22:28 +01:00
Samuel Pitoiset
9256e0a09d radv: fix out-of-bounds access when copying descriptors BO list
We shouldn't increment the buffer list pointers twice.

This fixes some crashes with new CTS
dEQP-VK.binding_model.descriptor_copy.*.

Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-26 11:22:22 +01:00
Tapani Pälli
1d5e5ec30a nir: use nir_variable_create instead of open-coding the logic
Fixes: 3d7611e9 "st/nir: use NIR for asm programs"
Reported-by: Matthias Lorenz <oschowa@web.de>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-26 09:00:36 +02:00
Tapani Pälli
22267feff1 nir: initialize value in copy_prop_vars_block
Fixes following valgrind warning:

   ==27561== Conditional jump or move depends on uninitialised value(s)
   ==27561==    at 0x667856B: value_set_ssa_components (nir_opt_copy_prop_vars.c:78)
   ==27561==    by 0x667A1C4: copy_prop_vars_block (nir_opt_copy_prop_vars.c:797)

Fixes: 62332d139c "nir: Add a local variable-based copy propagation pass"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-26 08:56:25 +02:00
Eric Anholt
97566efe5c v3d: Rematerialize MOVs of uniforms instead of spilling them.
If we have a MOV of a uniform value available to spill, that's one of our
best choices.  We can just not spill the value, and emit a new load of the
uniform as the fill.  This saves bothering the TMU and the thrsw, and is
the same cost in uniforms (since the spill offset is a uniform anyway).

This doesn't have a huge impact on shader-db, since there aren't a whole
lot of spills and we usually copy-prop the uniforms at the VIR level such
that the only uniform MOVs are from vir_lower_uniforms:

total instructions in shared programs: 6430292 -> 6430279 (<.01%)
total uniforms in shared programs: 2386023 -> 2385787 (<.01%)
total spills in shared programs: 4961 -> 4960 (-0.02%)
total fills in shared programs: 6352 -> 6350 (-0.03%)

However, I'm interested in dropping the uniforms copy-prop in the backend,
since it would be cheaper to not load repeated uniforms if we have the
registers to spare.  This also saves many spills on
dEQP-GLES31.functional.ubo.random.all_per_block_buffers.20, which is what
motivated a bunch of my recent backend work in the first place:

before: 46 spills, 106 fills, 3062 instructions
after: 0 spills, 0 fills, 2611 instructions
2019-02-25 21:33:47 -08:00
Eric Anholt
e0fada983d v3d: Dump the VIR after register spilling if we were forced to.
Spilling is unusual, but one often has to debug it when it happens, so
dump it.
2019-02-25 21:26:24 -08:00
Eric Anholt
2786d2161a v3d: Fix vir_is_raw_mov() for input unpacks.
There are no users at the moment, but I wanted to start using this in
register spilling.
2019-02-25 21:26:24 -08:00
Mathias Fröhlich
1ab2159249 st/mesa: Reduce array updates due to current changes.
Since we use bitmasks, we can easily check whether any
current value is potentially uploaded on array setup.
So check for any potential vertex program input that is not
already a vao enabled array. Only flag array update if there is
a potential overlap.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
2019-02-26 05:42:04 +01:00
Dylan Baker
6f42303646 meson/iris: Use current coding style
Just a few minor style things.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-02-25 23:37:27 +00:00
Timothy Arceri
603206d0a6 radeonsi: fix query buffer allocation
Fix the logic for buffer full check on alloc.

This patch just takes the fix Nicolai attached to the bug report
and updates it to work on master.

Fixes: e0f0d3675d ("radeonsi: factor si_query_buffer logic out of si_query_hw")

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109561
2019-02-26 09:55:41 +11:00
Eric Anholt
7c1bf075f3 nir: Just return when asked to rewrite uses of an SSA def to itself.
The nir_builder swizzling improvement to not emit extra MOVs resulted in
nir_lower_tex() trying to rewrite an SSA def to itself, triggering the
assert on all texturing in v3d.  There's no work to be done in this case,
so just stop asserting.
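
The early return described above can be sketched as follows. This is a toy model in Python, not NIR's C code: `SSADef`, its `uses` list, and `rewrite_uses` are hypothetical stand-ins for the real data structures.

```python
class SSADef:
    """Toy SSA definition that records the sources that read it."""

    def __init__(self, name):
        self.name = name
        self.uses = []  # each use is a dict standing in for an instruction source


def rewrite_uses(old_def, new_def):
    """Point every use of old_def at new_def and return how many were moved.

    Rewriting a def to itself leaves nothing to do, so return early
    instead of asserting that the two defs differ.
    """
    if old_def is new_def:
        return 0

    moved = 0
    for use in list(old_def.uses):
        use["src"] = new_def
        new_def.uses.append(use)
        moved += 1
    old_def.uses.clear()
    return moved
```

With this shape, a caller that happens to pass the same def twice gets a cheap no-op rather than an assertion failure.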

Fixes: 743700be1f ("nir/builder: Don't emit no-op swizzles")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-25 21:25:24 +00:00
Samuel Pitoiset
5671f38085 radv: fix clearing attachments in secondary command buffers
If no framebuffer is bound, get the number of samples and the
image format from the render pass.

This fixes new CTS dEQP-VK.geometry.layered.*.secondary_cmd_buffer.

Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-25 21:42:50 +01:00
Alok Hota
773b3ceaca swr/rast: Fix autotools and scons codegen
Use new input flags for gen_archrast.py

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2019-02-25 13:05:39 -06:00
Alok Hota
16e10b8c30 swr/rast: Add general SWTag statistics
Update Archrast parser to use stats, used with an internal tool

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2019-02-25 13:05:36 -06:00
Alok Hota
b45a15a39f swr/rast: Add string handling to AR event framework
For use by an internal tool

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2019-02-25 13:05:31 -06:00
Alok Hota
8608a747aa swr/rast: Add initial SWTag proto definitions
Update gen_archrast.py to properly generate event IDs

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2019-02-25 13:05:17 -06:00
Alok Hota
93cd9905c8 swr/rast: Cleanup and generalize gen_archrast
Update meson.build to accomodate

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2019-02-25 13:05:07 -06:00
Daniel Schürmann
0bd45f96b9 nir: Use SM5 properties to optimize shift(a@32, iand(31, b))
This is a common pattern from HLSL->SPIRV translation
and supported in HW by all current NIR backends.

vkpipeline-db results anv (SKL):

    total instructions in shared programs: 6403130 -> 6402380 (-0.01%)
    instructions in affected programs: 204084 -> 203334 (-0.37%)
    helped: 208
    HURT: 0

    total cycles in shared programs: 1915629582 -> 1918198408 (0.13%)
    cycles in affected programs: 1158892682 -> 1161461508 (0.22%)
    helped: 107
    HURT: 86

shader-db results on i965 (KBL):

    total instructions in shared programs: 15284592 -> 15284568 (<.01%)
    instructions in affected programs: 81683 -> 81659 (-0.03%)
    helped: 24
    HURT: 0

    total cycles in shared programs: 375013622 -> 375013932 (<.01%)
    cycles in affected programs: 40169618 -> 40169928 (<.01%)
    helped: 13
    HURT: 9

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-25 12:59:44 -06:00
Daniel Schürmann
0525bdc225 nir: Define shifts according to SM5 specification.
SPIR-V shifts are undefined for values >= bitsize, but SM5 shifts
are defined to only use the least significant bits.
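
A minimal sketch of the SM5-style semantics, written in Python for illustration only. The helper names `shift_count`, `ishl`, and `ushr` are assumptions made for this example and are not taken from the NIR sources.

```python
def shift_count(count, bit_size):
    """Keep only the low log2(bit_size) bits of the shift count."""
    return count & (bit_size - 1)


def ishl(value, count, bit_size=32):
    """Left shift that wraps the count the way SM5 defines it."""
    mask = (1 << bit_size) - 1
    return (value << shift_count(count, bit_size)) & mask


def ushr(value, count, bit_size=32):
    """Logical right shift with the same count wrapping."""
    mask = (1 << bit_size) - 1
    return (value & mask) >> shift_count(count, bit_size)
```

Under this definition a 32-bit shift by `b & 31` gives the same result as a shift by `b`, which is why an explicit mask of that form carries no extra information.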

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-25 12:59:43 -06:00
Jason Ekstrand
c4fb6b0c81 intel/eu: Add an EOT parameter to send_indirect_[split]_message
For split indirect sends we have to put the EOT parameter in the
extended descriptor as well as the instruction itself so just calling
brw_inst_set_eot is insufficient.  Moving the EOT handling into
the send_indirect_[split]_message helper lets us handle it properly.
2019-02-25 11:35:12 -06:00
Sergii Romantsov
dcc4866419 d3d: meson: do not prefix user provided d3d-drivers-path
The user can select the location where the d3d drivers
are installed by the d3d-drivers-path meson option.

By default path will be $prefix/$libdir/d3d.

Currently we add $prefix to the user provided path.
Resulting in an incorrect or even missing path.

Based on logic of
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109698
CC: Kenneth Graunke <kenneth@whitecape.org>
CC: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-02-25 16:07:02 +00:00
Sergii Romantsov
f6556ec7d1 dri: meson: do not prefix user provided dri-drivers-path
The user can select the location where the dri drivers
are installed by the dri-drivers-path meson option.

By default path will be $prefix/$libdir/dri.

Currently we add $prefix to the user provided path.
Resulting in an incorrect or even missing path.
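
The intended behaviour can be sketched like this. It is an illustration in Python, not the actual meson.build logic: `driver_dir` and its parameters are hypothetical, and the sketch assumes an empty option value means "not set by the user".

```python
import posixpath


def driver_dir(prefix, libdir, user_path=""):
    """Return the directory the dri drivers are installed to."""
    if user_path:
        # A user-provided path is used exactly as given, without $prefix.
        return user_path
    # Default: $prefix/$libdir/dri
    return posixpath.join(prefix, libdir, "dri")
```

For example, `driver_dir("/usr", "lib64")` yields `/usr/lib64/dri`, while a user-provided `/opt/mesa/dri` is returned unchanged.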

v2: fixed dri_search_path by default, rebased to master

v3: new commit-message (Emil Velikov), cc mesa-stable

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109698
CC: Rafael Antognolli <rafael.antognolli@intel.com>
CC: Dylan Baker <dylan@pnwbakers.com>
Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
Fixes: 306914db92 (meson: Add dridriverdir variable to dri.pc.)
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-02-25 16:07:02 +00:00
Lionel Landwerlin
30828f4646 intel/aub_viewer: silence more compiler warnings
format not a string literal and no format arguments.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-25 13:11:16 +00:00
Lionel Landwerlin
91df8b1780 intel/aub_viewer: silence compiler warning
buffer_addr may be used uninitialized.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-25 13:11:13 +00:00
Lionel Landwerlin
f1da10e0c5 intel/aub_viewer: printout 48bits addresses
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-25 13:11:05 +00:00
Gert Wollny
875942c059 mesa/core: Enable EXT_depth_clamp for GLES >= 2.0
The extension NV_depth_clamp is written against OpenGL 1.2.1, and
since GLES 2.0 is based on GL 2.0 there is no reason not to enable
this extension also for GLES >= 2.0.

v2: Use EXT_depth_clamp that has been proposed to Khronos

v3: - Fix check for extension availability (Erik Faye-Lund)
    - Also fix the test in is_enabled
v4: - Test both, ARB and EXT extension (Erik)
v5: - Fix white space errors (Erik)

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
2019-02-25 09:44:27 +00:00
Kenneth Graunke
b45186a6cd iris: Properly allow rendering to RGBX formats.
I was converting them at pipe_surface creation time, but not when
answering queries about whether formats support rendering.  This caused
a lot of FBO incomplete errors for formats that ought to be supported.

Fixes "Child of Light", which uses PIPE_FORMAT_R8G8B8X8_UNORM_SRGB.

Also fixes Witcher 1 using wined3d (GL) according to Timur Kristóf.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109738
2019-02-25 01:11:27 -08:00
Kenneth Graunke
fce089c8a2 iris: Drop RGBX -> RGBA for storage image usages
GLSL doesn't expose RGB/RGBX image formats, so this isn't needed.
2019-02-25 00:57:50 -08:00
Kenneth Graunke
6921588d54 mesa: Fix RGBBuffers for renderbuffers with sized internal formats
For texture attachments, 'f' is texImg->_BaseFormat, but for
renderbuffer attachments, 'f' is att->Renderbuffer->InternalFormat.

InternalFormat may be something like GL_RGB8, which causes our
(f == GL_RGB) check to fail.  Switch to using a proper _BaseFormat,
which drops the size.
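
A small sketch of why the comparison fails on a sized format and succeeds once the size is dropped. The table is an illustrative subset with format names as strings, and `base_format` and `is_rgb_buffer` are hypothetical helpers, not Mesa functions.

```python
# Illustrative subset only; real GL defines many more sized formats.
BASE_FORMAT = {
    "GL_RGB8": "GL_RGB",
    "GL_RGB16F": "GL_RGB",
    "GL_RGBA8": "GL_RGBA",
}


def base_format(internal_format):
    """Drop the size from a sized internal format.

    Formats that are not in the table are returned unchanged.
    """
    return BASE_FORMAT.get(internal_format, internal_format)


def is_rgb_buffer(internal_format):
    """Compare against GL_RGB using the base format, not the sized one."""
    return base_format(internal_format) == "GL_RGB"
```

Comparing the raw value `"GL_RGB8"` against `"GL_RGB"` is false, whereas `is_rgb_buffer("GL_RGB8")` is true.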

Fixes dEQP-GLES31.functional.draw_buffers_indexed.random.
max_required_draw_buffers.15 on iris when combined with a driver fix.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
2019-02-25 00:57:42 -08:00
Oscar Blumberg
da9c030763 glsl: Fix function return typechecking
apply_implicit_conversion only converts and checks base types but we
need actual type equality for function returns, otherwise you can
return a vec2 from a function declared as returning a float.
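
The difference between a base-type check and full type equality can be sketched with a simplified type model. Everything here is an assumption made for illustration: `GLSLType`, `base_types_compatible`, and `return_type_ok` are not the GLSL compiler's names, and only the int-to-float implicit conversion is modelled.

```python
from collections import namedtuple

# Simplified GLSL type: a base type plus a component count.
GLSLType = namedtuple("GLSLType", ["base", "components"])

FLOAT = GLSLType("float", 1)
VEC2 = GLSLType("float", 2)
INT = GLSLType("int", 1)


def base_types_compatible(declared, actual):
    """Check that looks at base types only (allowing int -> float)."""
    if actual.base == declared.base:
        return True
    return actual.base == "int" and declared.base == "float"


def return_type_ok(declared, actual):
    """Require full type equality after the base type has been converted."""
    if not base_types_compatible(declared, actual):
        return False
    converted = GLSLType(declared.base, actual.components)
    return converted == declared
```

Here `base_types_compatible(FLOAT, VEC2)` is true, which is the hole described above, while `return_type_ok(FLOAT, VEC2)` is false.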

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-02-25 08:49:06 +02:00
Jordan Justen
bd0ad651e0 iris: Always use in-tree i915_drm.h
Ref: f1374805a8 "drm-uapi: use local files, not system libdrm"
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-02-24 21:06:40 -08:00
Alyssa Rosenzweig
f943047e48 panfrost: Decode render target swizzle/channels
On MRT-capable systems, the framebuffer format is encoded as a 64-bit
word in the render target descriptor. Previously, the two 32-bit
words were exposed as opaque hex values. This commit identifies a 12-bit
Mali swizzle and a 2-bit channel counter, removing some of the magic. It
also adds decoding support for the AFBC and MSAA enable bits, which were
already known but otherwise ignored in pandecode.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-25 04:49:50 +00:00
Alyssa Rosenzweig
c6be9969d2 panfrost/midgard: Add fround(_even), ftrunc, ffma
These ops were discovered by invoking the correspondingly named GLSL
functions. The rounding ops here behave exactly as expected and are mapped
to their corresponding NIR ops where applicable. The ffma behaves as a
LUT instruction and requires some special argument packing (since
Midgard normally only allows for 2 arguments); this quirk will be
addressed in the future, but for now FMA is still lowered.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-25 02:36:26 +00:00
Alyssa Rosenzweig
4a4726af3c panfrost/nondrm: Split out dump_counters
Previously, this function was implicitly part of the job submit.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-25 02:34:16 +00:00
Alyssa Rosenzweig
cdca103d43 panfrost/nondrm: Make COHERENT_LOCAL explicit
This flag corresponds to what was MEM_COHERENT_LOCAL in the vendor
driver, which seems to influence the cache policy, necessary for the
varying temporary storage but nothing else.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-25 02:32:45 +00:00
Alyssa Rosenzweig
f44d4653a9 panfrost/nondrm: Flag CPU-invisible regions
Potentially, the kernel could optimize these allocations, or perhaps we
can save on mapping costs.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-25 02:31:09 +00:00
Alyssa Rosenzweig
10cc251842 panfrost/meson: Remove subdir for nondrm
This change fixes cross builds with the (temporary) non-DRM overlay.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-25 02:27:26 +00:00
Alyssa Rosenzweig
77fea552f6 panfrost: Use tiler fast path (performance boost)
For reasons that are still unclear (speculation included in the comment
added in this patch), the tiler? metadata has a fast path that we were
not enabling; there looks to be a possible time/memory tradeoff, but the
details remain unclear.

Regardless, this patch improves performance dramatically. Particular
wins are for geometry-heavy scenes. For instance, glmark2-es2's
Phong-shaded bunny, rendering at fullscreen (2400x1600) via GBM, jumped
from ~20fps to hitting vsync cap at 60fps. Gains are even more obvious
when vsync is disabled, as in glmark2-es2-wayland.

With this patch, on GLES 2.0 samples not involving FBOs, it appears
performance is converging with (and sometimes surpassing) the blob.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-25 02:25:50 +00:00
Jason Ekstrand
743700be1f nir/builder: Don't emit no-op swizzles
The nir_swizzle helper is used some on its own but it's also called by
nir_channel and nir_channels which are used everywhere.  It's pretty
quick to check while we're walking the swizzle anyway whether or not
it's an identity swizzle.  If it is, we now don't bother emitting the
instruction.  Sure, copy-prop will clean it up for us but there's no
sense making more work for the optimizer than we have to.
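
A rough sketch of the identity check, in Python rather than nir_builder's C. Vectors are modelled as tuples, and `is_identity_swizzle` and `swizzle` are hypothetical names chosen for this example.

```python
def is_identity_swizzle(swiz, src_components):
    """True if the swizzle selects every source component in order."""
    if len(swiz) != src_components:
        return False
    return all(chan == i for i, chan in enumerate(swiz))


def swizzle(src, swiz):
    """Return src itself for an identity swizzle, else a new vector.

    Returning the original value models "do not emit an instruction".
    """
    if is_identity_swizzle(swiz, len(src)):
        return src
    return tuple(src[chan] for chan in swiz)
```

The check is done in the same pass over the swizzle that a builder would make anyway, so it adds little cost.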

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-02-24 20:01:27 -06:00
Jason Ekstrand
724371c6b9 nir/split_vars: Don't compact vectors unnecessarily
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-02-24 20:01:18 -06:00
Erik Faye-Lund
7a6a5d4bfa st/mesa: remove unused header-file
This header has been unused since f8f2520e88 ("st/mesa: Remove
unnecessary headers"). And in the more than 8 years since, this
hasn't been useful. So let's just get rid of it.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-24 20:53:37 +01:00
Maya Rashish
021c496135 configure: fix test portability
From the bash manual:

string1 == string2
string1 = string2
       True if the strings are equal.  = should be used with the test
       command for POSIX conformance.
2019-02-24 19:26:15 +00:00
David Shao
6fa923a65d meson: ensure that xmlpool_options.h is generated for gallium targets that need it
Fixes: 68076b8747 "meson: build gallium vdpau state tracker"
Fixes: 22a817af8a "meson: build gallium xvmc state tracker"
Fixes: 5a785d51a6 "meson: build gallium va state tracker"
Fixes: 0ba909f0f1 "meson: build gallium xa state tracker"
Fixes: 1d36dc674d "meson: build gallium omx state tracker"
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-24 09:00:39 +00:00
Matthias Lorenz
f91654120b vulkan/overlay: Add fps counter
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109747
2019-02-24 01:07:26 +00:00
Lionel Landwerlin
239b0d8570 Revert "anv: add support for INTEL_DEBUG=bat"
This reverts commit e4d88396d2.

Apologies, I pushed the wrong commit.
2019-02-24 01:06:39 +00:00
Lionel Landwerlin
e4d88396d2 anv: add support for INTEL_DEBUG=bat
As requested by Ken ;)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-02-23 23:29:04 +00:00
Christian Gmeiner
c56e734496 etnaviv: blt: mark used src resource as read from
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-02-23 16:00:50 +01:00
Christian Gmeiner
7244e76804 etnaviv: rs: mark used src resource as read from
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
2019-02-23 16:00:25 +01:00
Vinson Lee
2bd08b8b9d gallium/auxiliary/vl: Fix duplicate symbol build errors.
CXXLD    gallium_dri.la
duplicate symbol _compute_shader_video_buffer in:
    ../../../../src/gallium/auxiliary/.libs/libgalliumvl.a(libgalliumvl_la-vl_compositor.o)
    ../../../../src/gallium/auxiliary/.libs/libgalliumvl.a(libgalliumvl_la-vl_compositor_cs.o)
duplicate symbol _compute_shader_weave in:
    ../../../../src/gallium/auxiliary/.libs/libgalliumvl.a(libgalliumvl_la-vl_compositor.o)
    ../../../../src/gallium/auxiliary/.libs/libgalliumvl.a(libgalliumvl_la-vl_compositor_cs.o)
duplicate symbol _compute_shader_rgba in:
    ../../../../src/gallium/auxiliary/.libs/libgalliumvl.a(libgalliumvl_la-vl_compositor.o)
    ../../../../src/gallium/auxiliary/.libs/libgalliumvl.a(libgalliumvl_la-vl_compositor_cs.o)

Fixes: 9364d66cb7 ("gallium/auxiliary/vl: Add video compositor compute shader render")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
2019-02-22 23:07:26 -08:00
Caio Marcelo de Oliveira Filho
4c160b6bd8 nir: fix MSVC build
Zero initialize struct with {0} instead of {}.
2019-02-22 22:38:05 -08:00
Caio Marcelo de Oliveira Filho
eb13211997 nir/copy_prop_vars: add tests for load/store elements of vectors
Test using array deref on vectors in loads and stores.  These are
marked DISABLED_ as this optimization is currently not done.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-22 21:00:50 -08:00
Caio Marcelo de Oliveira Filho
4f3809d389 nir: nir_build_deref_follower accept array derefs of vectors
Code itself already supports it, just make sure we can use it for
those cases.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-22 21:00:50 -08:00
Caio Marcelo de Oliveira Filho
c4beadd28e nir/copy_prop_vars: change test helper to get intrinsics
Replace find_next_intrinsic(intrinsic, after) with
get_intrinsic(intrinsic, index).  This makes it slightly more convenient
to check the resulting loads/stores/copies, since in most tests we
know which one we care about.  The cost is to perform more traversals,
but for such tests this is not a problem.
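
The index-based lookup can be sketched as below. Instructions are modelled as plain dicts with an `op` key, which is an assumption of this example; the real helper walks a NIR shader, and `count_intrinsics` is a hypothetical companion.

```python
def get_intrinsic(instructions, intrinsic, index):
    """Return the index-th instruction of the given kind, or None.

    Every call walks the list from the start, trading extra
    traversals for a simpler call site.
    """
    seen = 0
    for instr in instructions:
        if instr["op"] == intrinsic:
            if seen == index:
                return instr
            seen += 1
    return None


def count_intrinsics(instructions, intrinsic):
    """Count the instructions of the given kind."""
    return sum(1 for instr in instructions if instr["op"] == intrinsic)
```

A test would typically assert the count first, so that every index it queries afterwards is known to exist.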

Added the ASSERT_EQ() on count to some tests missing it, so the
indices queried are always expected to find something.

Also, drop two nir_print_shader leftover calls in a test.

v2: Remove redundant assertions.  nir_src_comp_as_uint already
    assert what we need.  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-22 21:00:50 -08:00
Caio Marcelo de Oliveira Filho
fdcb9779d9 nir/copy_prop_vars: keep track of components in copy_entry
When a copy_entry is SSA, store not only the nir_ssa_def* for each
component, but also the source component they come from.  At the
moment this is always a match (i.e. 'component[i] == i'), because all
the operations for a copy_entry happen using definitions with the same
size.  This prepares the code for array_derefs of vectors, in which
'component[i] != i'.

Also, extract setting all SSA components into a function of its own.
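
A toy model of the bookkeeping described above. The `Value` class and its `defs` and `component` fields are assumptions made for illustration, as is `value_set_from_vector_element`, which only hints at the future array_deref case.

```python
class Value:
    """Toy SSA value: one (def, source component) pair per component."""

    def __init__(self, num_components):
        self.defs = [None] * num_components
        self.component = [0] * num_components


def value_set_ssa_components(value, ssa_def, num_components):
    """Set every component from the same def, so component[i] == i."""
    for i in range(num_components):
        value.defs[i] = ssa_def
        value.component[i] = i


def value_set_from_vector_element(value, ssa_def, element):
    """Hypothetical future case: a scalar taken from one vector element."""
    value.defs[0] = ssa_def
    value.component[0] = element
```

In the second helper the single stored component points at `element` of the source def, which is the `component[i] != i` situation the commit prepares for.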

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-22 21:00:50 -08:00
Caio Marcelo de Oliveira Filho
6624decbb5 nir/copy_prop_vars: add debug helpers
Disabled by default, to be used during development.  Adding those
so I don't rewrite some ad-hoc version of them every time I'm working
with this pass.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-22 21:00:50 -08:00
Caio Marcelo de Oliveira Filho
60d9bb9ff5 nir/copy_prop_vars: don't get confused by array_deref of vectors
For now these derefs are not handled, so don't let these get into the
copies list -- which would cause wrong propagations.  For load_derefs,
do nothing.  For store_derefs, invalidate whatever the store is
writing to.  For copy_derefs, invalidate whatever the copy is writing
to.

These cases will happen once derefs to SSBOs/UBOs are kept around long
enough to get optimized by copy_prop_vars.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-22 21:00:50 -08:00
Timothy Arceri
f48527e51a nir: allow nir_lower_phis_to_scalar() on more src types
Rather than only lowering if all srcs are scalarizable we instead
check that at least one src is scalarizable.

We change undef type to return false otherwise it will cause
regressions when it is the only scalarizable src.
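
The change in the lowering condition can be sketched as a switch from "all" to "any". The source kinds and the `SCALARIZABLE` set below are made up for illustration and do not reproduce the pass's real rules.

```python
# Illustrative kinds only; "ssa_undef" is deliberately left out, matching
# the change that makes undefs report as not scalarizable.
SCALARIZABLE = {"alu", "load_const"}


def is_scalarizable(src_kind):
    return src_kind in SCALARIZABLE


def should_lower_phi_old(src_kinds):
    """Old rule: lower only if every source is scalarizable."""
    return all(is_scalarizable(kind) for kind in src_kinds)


def should_lower_phi(src_kinds):
    """New rule: lower if at least one source is scalarizable."""
    return any(is_scalarizable(kind) for kind in src_kinds)
```

A phi with sources `["alu", "tex"]` is skipped by the old rule and lowered by the new one, while `["ssa_undef", "tex"]` is skipped by both.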

total instructions in shared programs: 13219105 -> 13024547 (-1.47%)
instructions in affected programs: 1153797 -> 959239 (-16.86%)
helped: 581
HURT: 74

total cycles in shared programs: 333968972 -> 324807922 (-2.74%)
cycles in affected programs: 129809402 -> 120648352 (-7.06%)
helped: 571
HURT: 131

total spills in shared programs: 57947 -> 29130 (-49.73%)
spills in affected programs: 53364 -> 24547 (-54.00%)
helped: 351
HURT: 0

total fills in shared programs: 51310 -> 25468 (-50.36%)
fills in affected programs: 44882 -> 19040 (-57.58%)
helped: 351
HURT: 0

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-23 11:11:51 +11:00
Alok Hota
6053499f2e swr/rast: bypass size limit for non-sampled textures
This fixes a bug where SWR will fail to render in cases with large
buffer allocations, e.g. very large meshes whose vertex buffers exceed
2GB

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2019-02-22 23:35:11 +00:00
Marek Olšák
b326a15eda tgsi: don't set tgsi_info::uses_bindless_images for constbufs and hw atomics
This might have decreased performance for radeonsi/tgsi, because
most shaders claimed they used bindless.

Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2019-02-22 18:00:54 -05:00
Jordan Justen
cf652205cf iris: Add gitlab-ci build testing
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-22 14:08:21 -08:00
Rob Clark
fd360c82f0 freedreno/a6xx: cube image fix
Note that emit_intrinsic_load_image() already swaps a .3d flag with an
.a flag.  I tried doing things the other way around (going back to .3d)
but that didn't work.  And treating cube images as 2d array is also what
blob does, so let's just go with that.

Fixes dEQP-GLES31.functional.image_load_store.cube.load_store.*

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-22 14:05:32 -05:00
Rob Clark
f90c3b4485 freedreno/a6xx: fix border-color offset
Fixes nearly all of dEQP-GLES31.functional.texture.border_clamp.* when
run after a test that binds textures used in vertex shader.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-22 14:05:32 -05:00
Rob Clark
bdedb8277a freedreno/ir3: don't hardcode wrmask
Fixes dEQP-GLES31.functional.shaders.opaque_type_indexing.sampler.const_literal.vertex.samplercubeshadow
and few other similar tests that do multiple texture fetches into
individual components of a packet output.  Mostly works around the
issue mentioned in ra_block_find_definers().

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-22 14:05:32 -05:00
Rob Clark
5d4fa194b8 freedreno: fix race condition
rsc->write_batch can be cleared behind our back, so we need to acquire
the lock *before* deref'ing.
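
The ordering matters as in the following sketch, which uses Python threading purely for illustration. `Resource`, its per-object `lock`, and `flush_write_batch` are hypothetical; the driver's actual locking scheme is not shown in this log.

```python
import threading


class Resource:
    """Toy resource whose write_batch may be cleared by another thread."""

    def __init__(self):
        self.lock = threading.Lock()
        self.write_batch = None


def flush_write_batch(rsc):
    """Read and use write_batch only while holding the lock."""
    with rsc.lock:
        batch = rsc.write_batch  # read under the lock, not before taking it
        if batch is None:
            return None
        batch["flushed"] = True
        return batch
```

Reading `rsc.write_batch` before taking the lock would leave a window in which another thread clears it between the read and the use.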

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-22 14:05:32 -05:00
Kenneth Graunke
3090c6b9e9 vulkan: Fix 32-bit build for the new overlay layer
vulkan_core.h defines non-dispatchable handles as (struct object *)
on 64-bit systems, but uint64_t on 32-bit systems.  The former can be
implicitly cast to void *, but the latter requires an explicit cast.

While here, %lu is the wrong format specifier for uint64_t on 32-bit
systems, so use PRIu64, fixing a warning.

Reported-by: Mike Lothian <mike@fireburn.co.uk>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-02-22 08:56:54 -08:00
Juan A. Suarez Romero
4f917e6a61 anv: advertise 8 subpixel precision bits
On one side, when emitting 3DSTATE_SF, VertexSubPixelPrecisionSelect is
used to select between 8 bit subpixel precision (value 0) or 4 bit
subpixel precision (value 1). As this value is not set, it takes the
value 0, so 8 bits are used.

On the other side, the Vulkan CTS reference rasterizer, which is used to
compute the expected values for the tests, uses 8 bit precision. If it is
changed to use 4 bit, as ANV was advertising so far, some of the tests
fail.

So it seems ANV is actually using 8 bits.

v2: explicitly set 3DSTATE_SF::VertexSubPixelPrecisionSelect (Jason)

v3: use _8Bit definition as value (Jason)

v4: (by Jason)
anv: Explicitly set 3DSTATE_CLIP::VertexSubPixelPrecisionSelect

This field was added on gen8 even though there's an identically defined
one in 3DSTATE_SF.

CC: Jason Ekstrand <jason@jlekstrand.net>
CC: Kenneth Graunke <kenneth@whitecape.org>
CC: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-22 17:53:55 +01:00
Juan A. Suarez Romero
3b423eeb2d genxml: add missing field values for 3DSTATE_SF
Fill out "Vertex Sub Pixel Precision Select" possible values.

CC: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-22 17:53:45 +01:00
Bas Nieuwenhuizen
f324784104 radv: Allow interpolation on non-float types.
In particular structs containing floats and 16-bit floating point
types.

Fixes: 62024fa775 "radv: enable VK_KHR_16bit_storage extension / 16bit storage features"
Fixes: da29594636 "spirv: Only split blocks"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109735
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-22 17:06:55 +01:00
Bas Nieuwenhuizen
a1fdd4a4a7 radv: Fix float16 interpolation set up.
float16 types can have non-flat interpolation so set up the HW
correctly for that.

Fixes: 62024fa775 "radv: enable VK_KHR_16bit_storage extension / 16bit storage features"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-22 17:06:55 +01:00
Ilia Mirkin
ae2cb72804 nv50: disable compute
It causes more trouble than it's worth. Now vl tries to create compute
shaders without all the proper checking. Since there's really no
(current) way to use compute on nv50, just mark it disabled.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109742
Fixes: f6ac0b5d71 ("gallium/auxiliary/vl: Add compute shader to support video compositor render")
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2019-02-22 09:42:41 -05:00
Lionel Landwerlin
1d626fc028 intel: fix urb size for CFL GT1
Same 192Kb amount as SKL/KBL GT1 applies.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Fixes: de7ed0ba55 ("i965/CFL: Add PCI Ids for Coffee Lake.")
2019-02-22 11:53:49 +00:00
Samuel Iglesias Gonsálvez
bd2c5a8203 isl: the display engine requires 64B alignment for linear surfaces
v2: Add PRM quote (Lionel)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-02-22 11:45:45 +00:00
Gert Wollny
2ee197d6e8 virgl: Enable mixed color FBO attachments only when the host supports
it

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
2019-02-22 10:44:08 +01:00
Mauro Rossi
338dacc341 android: intel/isl: remove redundant building rules
Fixes the following building error:

including ./external/mesa/Android.mk ...
build/core/base_rules.mk:183: *** external/mesa/src/intel:
MODULE.TARGET.STATIC_LIBRARIES.libmesa_isl_tiled_memcpy already defined by external/mesa/src/intel.
make: *** [build/core/ninja.mk:164: out/build-android_x86_64.ninja] Error 1

ISL_TILED_MEMCPY_FILES is isl/isl_tiled_memcpy_normal.c
and that source file includes isl_tiled_memcpy.c source

Fixes: 96bb328 ("iris: add Android build")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-02-22 07:56:11 +02:00
Kenneth Graunke
b21de090d6 Revert "iris: Enable auxiliary buffer support"
This reverts commit cd0ced49e7.

It breaks glxgears rendering.
2019-02-21 15:50:46 -08:00
Kenneth Graunke
e2cb0c5e0e iris: Enable -msse2 and -mstackrealign
This is needed for gen_clflush.h intrinsics to work on 32-bit builds.
i965 and anv both set these, and iris needs to as well.

Tested-by: Mark Janes <mark.a.janes@intel.com>
2019-02-21 14:51:15 -08:00
Francisco Jerez
7272fe9c08 intel/fs: Rely on undocumented unrestricted regioning for 32x16-bit integer multiply.
Even though the hardware spec claims that any "integer DWord multiply"
operation is affected by the regioning restrictions of CHV/BXT/GLK,
this is inconsistent with the behavior of the simulator and with
empirical evidence -- Return false from has_dst_aligned_region_restriction()
for such instructions as a micro-optimization.

Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-21 14:07:25 -08:00
Francisco Jerez
e03be78252 intel/fs: Implement extended strides greater than 4 for IR source regions.
Strides up to 32B can be implemented for the source regions of most
instructions by leveraging either the vertical or the horizontal
stride of the hardware Align1 region.  The main motivation for this is
that currently the lower_integer_multiplication() pass will happily
double the stride of one of the 32-bit sources, which can blow up if
the stride of the original source was already the maximum value
allowed by the hardware.

An alternative would be to use the regioning legalization pass in
order to lower such strides into the composition of multiple legal
strides, but that would be somewhat less efficient.

This showed up as a regression from my commit cbea91eb57
in Vulkan 1.1 CTS tests on CHV/BXT platforms, however it was really a
pre-existing problem that had affected conformance on other platforms
without native support for integer multiplication.  CHV/BXT were
getting around it because the code I removed in that commit had the
"fortunate" side effect of emitting narrower regions that didn't hit
the hardware stride limit after lowering.  Beyond fixing the
regression this fixes ~90 additional Vulkan 1.1 subgroup CTS tests on
ICL (that's why this patch is marked for inclusion in mesa-stable even
though the original regressing patch was not).

According to Jason, a nearly equivalent change had been committed
previously as e8c9e65185 and then (mistakenly?) reverted as
a31d038208.

Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109328
Reported-by: Mark Janes <mark.a.janes@intel.com>
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-21 14:07:25 -08:00
Francisco Jerez
7f9f6263c1 intel/fs: Cap dst-aligned region stride to maximum representable hstride value.
This is required in combination with the following commit, because
otherwise if a source region with an extended 8+ stride is present in
the instruction (which we're about to declare legal) we'll end up
emitting code that attempts to write to such a region, even though
strides greater than four are still illegal for the destination.

Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-21 14:07:25 -08:00
Francisco Jerez
e2f475ddff intel/fs: Lower integer multiply correctly when destination stride equals 4.
Because the "low" temporary needs to be accessed with word type and
twice the original stride, attempting to preserve the alignment of the
original destination can potentially lead to instructions with illegal
destination stride greater than four.  Because the CHV/BXT alignment
restrictions are now being enforced by the regioning lowering pass run
after lower_integer_multiplication(), there is no real need to
preserve the original strides anymore.

Note that this bug can be reproduced on stable branches, but
back-porting would be non-trivial, because the fix relies on the
regioning lowering pass recently introduced.

Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-21 14:07:25 -08:00
Francisco Jerez
c3c27762f7 intel/fs: Exclude control sources from execution type and region alignment calculations.
Currently the execution type calculation will return a bogus value in
cases like:

  mov_indirect(8) vgrf0:w, vgrf1:w, vgrf2:ud, 32u

Which will be considered to have a 32-bit integer execution type even
though the actual indirect move operation will be carried out with
16-bit precision.
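
A much simplified sketch of the idea, assuming a hypothetical ex_src record that marks which operands are control sources (the real calculation in the compiler considers more than type size):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of an instruction source. */
struct ex_src {
   unsigned type_bytes;  /* 2 for :w, 4 for :ud, ... */
   bool is_control;      /* true for indices, offsets and similar operands */
};

/* Width in bytes of the widest source that takes part in the operation.
 * Control sources are skipped so they cannot widen the execution type. */
static unsigned ex_exec_type_bytes(const struct ex_src *srcs, size_t n)
{
   assert(srcs != NULL || n == 0);

   unsigned widest = 0;
   for (size_t i = 0; i < n; i++) {
      if (srcs[i].is_control)
         continue;
      if (srcs[i].type_bytes > widest)
         widest = srcs[i].type_bytes;
   }
   return widest;
}
```

For the mov_indirect example above, the :w data source gives 2 bytes while the :ud index and the immediate are ignored.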

Similarly there's no need to apply the CHV/BXT double-precision region
alignment restrictions to such control sources, since they aren't
directly involved in the double-precision arithmetic operations
emitted by these virtual instructions.  Applying the CHV/BXT
restrictions to control sources was expected to be harmless if mildly
inefficient, but unfortunately it exposed problems at codegen level
for virtual instructions (namely the SHUFFLE instruction used for the
Vulkan 1.1 subgroup feature) that weren't prepared to accept control
sources with an arbitrary strided region.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109328
Reported-by: Mark Janes <mark.a.janes@intel.com>
Fixes: efa4e4bc5f "intel/fs: Introduce regioning lowering pass."
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-21 14:07:25 -08:00
Timothy Arceri
d9e08e753b nir: clone instruction set rather than removing individual entries
This reduces the time spent in nir_opt_cse() by almost a half.

The massif tool from Valgrind reported no change in peak
memory use with the large Dolphin uber shaders I used for
testing.

Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-22 08:36:36 +11:00
Jordan Justen
cd0ac3a6af genxml: Remove extra space in gen4/45/5 field name
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-21 13:17:10 -08:00
Jordan Justen
a9b0b72a78 genxml/gen_bits_header.py: Use regex to strip non-alphanum chars
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-21 13:15:59 -08:00
Kenneth Graunke
cd0ced49e7 iris: Enable auxiliary buffer support
This currently regresses KHR-GL4x.compute_shader.resource-texture,
but that's a pre-existing bug (https://bugs.freedesktop.org/109113)
which should be fixed up once we have fast clear support.
2019-02-21 10:26:12 -08:00
Rafael Antognolli
db81445837 iris: Flag ALL_DIRTY_BINDINGS on aux state change.
If we change the aux state for a given resource, we need to re-emit the
binding table pointers for any stage that has such resource bound. Since
we don't track that, flag IRIS_ALL_DIRTY_BINDINGS and emit all of them.
2019-02-21 10:26:12 -08:00
Rafael Antognolli
95589652a1 iris: Skip resolve if there's no context.
If iris_resource_get_handle() gets called without a context, we can't
resolve the resource. Hopefully it shouldn't be compressed anyway, so
let's just add an assert to ensure it's correct.
2019-02-21 10:26:12 -08:00
Rafael Antognolli
36138bb7fc iris/clear: Pass on render_condition_enabled. 2019-02-21 10:26:12 -08:00
Rafael Antognolli
8190165d13 iris: Avoid leaking if we fail to allocate the aux buffer.
Otherwise we could leak the aux state map or the aux BO.
2019-02-21 10:26:12 -08:00
Kenneth Graunke
7da53d7188 iris: Only resolve compute resources for compute shaders 2019-02-21 10:26:12 -08:00
Kenneth Graunke
95a36bd55c iris: Fix aux usage in render resolve code 2019-02-21 10:26:12 -08:00
Rafael Antognolli
4f191feb0c iris: Pin HiZ buffers when rendering. 2019-02-21 10:26:12 -08:00
Rafael Antognolli
dfd54f9954 iris: Flush before hiz_exec. 2019-02-21 10:26:12 -08:00
Kenneth Graunke
f3f7d45a63 iris: Allow disabling aux via INTEL_DEBUG options 2019-02-21 10:26:12 -08:00
Kenneth Graunke
4634b754f4 iris: do flush for buffers still 2019-02-21 10:26:12 -08:00
Kenneth Graunke
15822f33ad iris: make surface states for CCS_D too
CCS_E can fall back to CCS_D with incompatible format views

CCS_D is pretty useless without fast clears and we may as well use NONE,
but we're surely going to hook those up at some point, so may as well
just go ahead and do it now...
2019-02-21 10:26:12 -08:00
Rafael Antognolli
689b590069 iris: Skip msaa16 on gen < 9.
Also needed to add gen information to KEY_INIT.
2019-02-21 10:26:12 -08:00
Kenneth Graunke
fd2038b22a iris: Set program key fields for MCS 2019-02-21 10:26:12 -08:00
Kenneth Graunke
92c310fd3f iris: don't use hiz for MSAA buffers 2019-02-21 10:26:12 -08:00
Kenneth Graunke
2cddc953cd iris: some initial HiZ bits 2019-02-21 10:26:12 -08:00
Kenneth Graunke
9b1126c990 iris: disable aux for external things 2019-02-21 10:26:12 -08:00
Kenneth Graunke
45f4dab62b iris: Resolves for compute 2019-02-21 10:26:12 -08:00
Kenneth Graunke
ecc897b8ad iris: consider framebuffer parameter for aux usages 2019-02-21 10:26:12 -08:00
Kenneth Graunke
b77d2dc71b iris: Make blit code use actual aux usages 2019-02-21 10:26:12 -08:00
Kenneth Graunke
bfc76d3525 iris: store modifier info in res 2019-02-21 10:26:12 -08:00
Kenneth Graunke
56f1fe3eac iris: pin the buffers 2019-02-21 10:26:12 -08:00
Kenneth Graunke
f8aa9aa353 iris: resolve before transfer maps 2019-02-21 10:26:12 -08:00
Kenneth Graunke
c53a67d469 iris: be sure to skip buffers in resolve code
Buffers don't have ISL surfaces, and this can get us into trouble.
2019-02-21 10:26:12 -08:00
Kenneth Graunke
5eb75345b8 iris: try to fix copyimage vs copybuffers 2019-02-21 10:26:12 -08:00
Kenneth Graunke
d8f3bc1c4c iris: actually use the multiple surf states for aux modes 2019-02-21 10:26:12 -08:00
Kenneth Graunke
3c979b0e6d iris: add some draw resolve hooks 2019-02-21 10:26:12 -08:00
Kenneth Graunke
53c484ba8a iris: blorp using resolve hooks 2019-02-21 10:26:12 -08:00
Kenneth Graunke
77a1070d36 iris: Initial import of resolve code 2019-02-21 10:26:12 -08:00
Kenneth Graunke
f879349398 iris: create aux surface if needed 2019-02-21 10:26:12 -08:00
Kenneth Graunke
3efd5299af iris: Fill out SURFACE_STATE entries for each possible aux usage 2019-02-21 10:26:12 -08:00
Kenneth Graunke
3cfc6a207b iris: Fill out res->aux.possible_usages 2019-02-21 10:26:12 -08:00
Kenneth Graunke
a7bc4d6074 iris: Add iris_resource fields for aux surfaces
But without fast clears or HiZ per-level tracking just yet.
2019-02-21 10:26:12 -08:00
Jordan Justen
d0996d5fab iris: Emit default L3 config for the render pipeline
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2019-02-21 10:26:12 -08:00
Kenneth Graunke
51ddc40084 iris: Always emit at least one BLEND_STATE 2019-02-21 10:26:12 -08:00
Kenneth Graunke
d6dd57d43c iris: Add missing depth cache flushes 2019-02-21 10:26:12 -08:00
Kenneth Graunke
1b5c342f33 iris: Simplify iris_get_depth_stencil_resources
We can safely assume that the given resource is depth, depth/stencil,
or stencil already.  The stencil-only case is easily detectable with
a single format check, and all other cases are handled identically.

This saves some CPU overhead.
2019-02-21 10:26:12 -08:00
Kenneth Graunke
07ec1f0b25 iris: Make an IRIS_MAX_MIPLEVELS define 2019-02-21 10:26:12 -08:00
Rafael Antognolli
455c959689 iris: Store internal_format when getting resource from handle. 2019-02-21 10:26:12 -08:00
Kenneth Graunke
973f01d55a iris: Move create and bind driver hooks to the end of iris_program.c
This just moves the code for dealing with pipe_shader_state /
pipe_compute_state / iris_uncompiled_shader to the end of the file.
Now that those do precompiles, they want to call the actual compile
functions.  Putting them at the end eliminates the need for a bunch
of prototypes.
2019-02-21 10:26:12 -08:00
Timur Kristóf
cacf84ed5f iris: implement clearing render target and depth stencil
v2 (Kenneth Graunke): split color/depthstencil cases, fix iris_clear
2019-02-21 10:26:12 -08:00
Kenneth Graunke
8ab82bd1fd iris: Drop XXX about checking for swizzling
Caio noted that this is not necessary on Gen8+:

   "Before Gen8, there was a historical configuration control field to
    swizzle address bit[6] for in X/Y tiling modes.  This was set in
    three different places: TILECTL[1:0], ARB_MODE[5:4], and
    DISP_ARB_CTL[14:13].  For Gen8 and subsequent generations, the
    swizzle fields are all reserved, and the CPU's memory controller
    performs all address swizzling modifications."

Since we don't support earlier hardware, we can skip it entirely.
2019-02-21 10:26:12 -08:00
Kenneth Graunke
bf23e79629 iris: Set HasWriteableRT correctly
A bit of irritating state cross dependency here, but nothing too hard
2019-02-21 10:26:12 -08:00
Kenneth Graunke
d612cd1bf8 iris: Set 3DSTATE_WM::ForceThreadDispatchEnable
The Vulkan driver only sets this if color writes are disabled, which
is more conservative - but would require us to inspect blend state.

(If color writes are enabled, we don't need to force anything, because
the internal signal is already correct.  But it shouldn't hurt to do so.)
2019-02-21 10:26:12 -08:00
Kenneth Graunke
27d751cdd8 iris: Drop XXX about alpha testing
I was misreading i965 - the 3DSTATE_WM::PixelShaderKillsPixel bit from
Gen < 8 needed all of this, but the 3DSTATE_PS_EXTRA bit only needs
prog_data->uses_kill.
2019-02-21 10:26:12 -08:00
Andre Heider
bffb65d28e iris: improve PIPE_CAP_VIDEO_MEMORY bogus value
-1 is a little too bogus for most games ;)

Signed-off-by: Andre Heider <a.heider@gmail.com>
2019-02-21 10:26:12 -08:00
Andre Heider
f89a578818 iris: fix build with gallium nine
Signed-off-by: Andre Heider <a.heider@gmail.com>
2019-02-21 10:26:12 -08:00
Kenneth Graunke
be49fb051d iris: Stop chopping off the first nine characters of the renderer string 2019-02-21 10:26:12 -08:00
Kenneth Graunke
15341778ba iris: rework num textures to util_lastbit 2019-02-21 10:26:12 -08:00
Kenneth Graunke
974229df46 iris: Add PIPE_CAP_MAX_VARYINGS 2019-02-21 10:26:11 -08:00
Kenneth Graunke
1cd001aa63 iris: Make an iris_batch_reference_signal_syncpt helper function.
Suggested by Chris Wilson.  More obvious what's going on.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
9376799bd6 iris: Use READ_ONCE and WRITE_ONCE for snapshots_landed
Suggested by Chris Wilson, if only to make it obvious to the human
readers that these are volatile reads.  It may also be necessary for
the compiler in a few cases.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
18e31a9b31 iris: Fix accidental busy-looping in query waits
When switching from bo_wait to sync-points, I missed that we turned an
if (not landed) bo_wait into a while (not landed) check_syncpt(), which
has a timeout of 0.  This meant, rather than sleeping until the batch
is complete, we'd busy-loop, continually asking the kernel "is the batch
done yet???".  This is not what we want at all - if we wanted a busy
loop, we'd just loop on !snapshots_landed.  We want to sleep.

Add an effectively infinite timeout so that we sleep.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
3b1ac8244e iris: Add a timeout_nsec parameter, rename check_syncpt to wait_syncpt
I want to be able to wait with a non-zero timeout from elsewhere.
2019-02-21 10:26:11 -08:00
Sagar Ghuge
c24a574e6c iris: Don't allocate a BO per query object
Instead of allocating a 4K BO per query object, we can create a large
blob of memory and split it into pieces as required.

With one BO shared by multiple query objects, we don't want to wait on
all of them. Instead, when we write the last snapshot, we create a sync
point, and check sync points while waiting on a particular object.
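
A minimal sketch of carving query storage out of one larger buffer, assuming a simple bump allocator. The ex_query_pool type and its fields are hypothetical and say nothing about how the driver actually suballocates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pool describing one large buffer shared by many queries. */
struct ex_query_pool {
   size_t size;   /* total bytes in the backing buffer */
   size_t next;   /* first unused byte */
};

/* Returns the offset of a new piece, or SIZE_MAX if the pool is full.
 * 'align' must be a power of two. */
static size_t ex_query_pool_alloc(struct ex_query_pool *pool,
                                  size_t bytes, size_t align)
{
   assert(align != 0 && (align & (align - 1)) == 0);

   size_t offset = (pool->next + align - 1) & ~(align - 1);
   if (offset > pool->size || bytes > pool->size - offset)
      return SIZE_MAX;

   pool->next = offset + bytes;
   return offset;
}
```

Each query object would then remember its offset into the shared buffer plus its own sync point, instead of owning a whole buffer.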

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
2019-02-21 10:26:11 -08:00
Kenneth Graunke
a1ebac3750 iris: Implement ALT mode for ARB_{vertex,fragment}_shader
Fixes gl-1.0-spot-light
2019-02-21 10:26:11 -08:00
Kenneth Graunke
732c3a90a4 iris: Fix bug in bound vertex buffer tracking
res might be NULL, at which point this is an unbind.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
4bfd12bbf7 iris: minor tidying 2019-02-21 10:26:11 -08:00
Kenneth Graunke
b1bacbf038 iris: Unreference some more things on state module teardown 2019-02-21 10:26:11 -08:00
Kenneth Graunke
e092ed9213 iris: Drop dead state_size hash table
I inherited this from i965.  It would be nice to track the state size
so INTEL_DEBUG=color,bat decoding can print the right number of e.g.
binding table entries or blend states, but...without a single point
of entry for state, it's a little tricky to get right.  Punt for now,
and drop the dead code in the meantime.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
6e41f1b459 iris: Drop comment about ISP_DIS
i965 re-emits 3DSTATE_CONSTANT_* on every batch, so there's no point in
restoring the constants from the context.  Iris actually re-pins the
constant buffers properly across the batch, and avoids re-emitting the
constant packets unless it's necessary.  So, we don't want ISP_DIS.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
edd3ce5a63 iris: Enable PIPE_CAP_COMPACT_ARRAYS 2019-02-21 10:26:11 -08:00
Kenneth Graunke
1db394f46b iris: Remap stream output indexes back to VARYING_SLOT_*.
Previously I had a hack in st/mesa to make it stop remapping
VARYING_SLOT_* into the naively compacted slots, which aren't
what we want.  But that wasn't very feasible, as we'd have to
update all drivers, or add capability bits, and it gets messy fast.

It turns out that I can map back to VARYING_SLOT_* in about 5 LOC,
so let's just do that.  It removes the need for hacks, and is easy.

This also fixes KHR-GL46.enhanced_layouts.xfb_capture_struct, which
apparently with my hack was still getting the wrong slot info.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
5d3d757178 iris: Zero the compute predicate when changing the render condition
1. Set a render condition.  We emit it immediately on the render
   engine, and stash q->bo as ice->state.compute_predicate in case
   the compute engine needs it.

2. Clear the render condition.  We were incorrectly leaving a stale
   compute_predicate kicking around...

3. Dispatch compute.  We would then read the stale compute predicate,
   and try to load it into MI_PREDICATE_DATA.  But q->bo may have been
   freed altogether, causing us to try and use garbage memory as a BO,
   adding it to the validation list, failing asserts, and tripping
   EINVALs in execbuf.
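
The fix for the sequence above amounts to something like the following sketch. The ex_* types are hypothetical stand-ins for the context and query structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures. */
struct ex_bo { int handle; };
struct ex_query { struct ex_bo *bo; };
struct ex_context { struct ex_bo *compute_predicate; };

/* Called whenever the render condition changes; q == NULL clears it. */
static void ex_set_render_condition(struct ex_context *ctx,
                                    const struct ex_query *q)
{
   assert(ctx != NULL);

   /* Always drop the old predicate first so a cleared condition can
    * never leave a pointer to a possibly freed BO behind. */
   ctx->compute_predicate = NULL;

   if (q != NULL)
      ctx->compute_predicate = q->bo;
}
```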

Huge thanks to Mark Janes for narrowing this sporadic GL CTS failure
down to a list of 48 tests I could easily run to reproduce it.  Huge
thanks to the Valgrind authors for the memcheck tool that immediately
pinpointed the problem.
2019-02-21 10:26:11 -08:00
Caio Marcelo de Oliveira Filho
4fd1f70e62 iris: always include an extra constbuf0 if using UBOs
In st_nir_lower_uniforms_to_ubo(), every UBO access in the shader has
its index incremented to make room for uniforms in constbuf0.  So if
we use UBOs, we always need to include the extra binding entry in the
table.

To avoid doing these checks both when compiling the shader and when
assigning binding tables, store num_cbufs in iris_compiled_shader.

Fixes a bunch of tests from Piglit and CTS that use UBOs but don't use
uniforms or system values.  Note that some tests fitting these criteria
were passing because the UBOs were moved to be push
constants (avoiding the problem).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-02-21 10:26:11 -08:00
Kenneth Graunke
4801af2f26 iris: Do binder address allocations per-context, not globally.
iris_bufmgr allocates addresses across the entire screen, since buffers
may be shared between multiple contexts.  There used to be a single
special address, IRIS_BINDER_ADDRESS, that was per-context - and all
contexts used the same address.  When I moved to the multi-binder
system, I made a separate memory zone for them.  I wanted there to be
2-3 binders per context, so we could cycle them to avoid the stalls
inherent in pinning two buffers to the same address in back-to-back
batches.  But I figured I'd allow 100 binders just to be wildly
excessive/cautious.

What I didn't realize was that we need 2-3 binders per *context*,
and what I did was allocate 100 binders per *screen*.  Web browsers,
for example, might have 1-2 contexts per tab, leading to hundreds of
contexts, and thus binders.

To fix this, we stop allocating VMA for binders in bufmgr, and let
the binder handle it itself.  Binders are per-context, and they can
assign context-local addresses for the buffers by simply doing a
ringbuffer style approach.  We only hold on to one binder BO at a
time, so we won't ever have a conflicting address.
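
A ring-buffer style address assignment could look roughly like this. It is a sketch under assumptions: ex_binder, its fields and the zone values used with it are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-context binder address state. */
struct ex_binder {
   uint64_t zone_start;   /* first address of the binder zone */
   uint64_t zone_size;    /* bytes available in the zone */
   uint64_t next_offset;  /* where the next binder BO will be placed */
};

/* Hand out the address for a new binder BO, wrapping to the start of the
 * zone when the end is reached.  Since only one binder BO is held at a
 * time, the address being reused is no longer in use by this context. */
static uint64_t ex_binder_next_address(struct ex_binder *b, uint64_t bo_size)
{
   assert(bo_size > 0 && bo_size <= b->zone_size);

   if (b->next_offset + bo_size > b->zone_size)
      b->next_offset = 0;

   uint64_t address = b->zone_start + b->next_offset;
   b->next_offset += bo_size;
   return address;
}
```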

This fixes dEQP-EGL.functional.multicontext.non_shared_clear.

Huge thanks to Tapani Pälli for debugging this whole mess and
figuring out what was going wrong.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-02-21 10:26:11 -08:00
Kenneth Graunke
0f33204f05 iris: Fix memzone_for_address for the surface and binder zones
We use > for IRIS_MEMZONE_DYNAMIC because IRIS_BORDER_COLOR_POOL_ADDRESS
lives at the very start of that zone.  However, IRIS_MEMZONE_SURFACE and
IRIS_MEMZONE_BINDER are normal zones.  They used to be a single zone
(surface) with a single binder BO at the beginning, similar to the
border color pool.  But when I moved us to multiple binders, I made them
have a real zone (if a small one).  So both zones should use >=.
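
The comparison rule can be sketched as below. The zone layout, sizes and EX_* names are invented for the example and are not the driver's actual values.

```c
#include <assert.h>
#include <stdint.h>

/* Made-up zone layout, for illustration only. */
#define EX_BINDER_START               (1ull << 32)
#define EX_SURFACE_START              (EX_BINDER_START + (1ull << 20))
#define EX_DYNAMIC_START              (2ull << 32)
#define EX_BORDER_COLOR_POOL_ADDRESS  EX_DYNAMIC_START

enum ex_memzone {
   EX_MEMZONE_SHADER,
   EX_MEMZONE_BINDER,
   EX_MEMZONE_SURFACE,
   EX_MEMZONE_BORDER_COLOR_POOL,
   EX_MEMZONE_DYNAMIC,
};

static enum ex_memzone ex_memzone_for_address(uint64_t address)
{
   assert(EX_BINDER_START < EX_SURFACE_START);

   /* The pool sits at the first address of the dynamic zone, so that
    * zone uses a strict '>' and the pool is matched separately. */
   if (address == EX_BORDER_COLOR_POOL_ADDRESS)
      return EX_MEMZONE_BORDER_COLOR_POOL;
   if (address > EX_DYNAMIC_START)
      return EX_MEMZONE_DYNAMIC;

   /* Ordinary zones own their first address, hence '>='. */
   if (address >= EX_SURFACE_START)
      return EX_MEMZONE_SURFACE;
   if (address >= EX_BINDER_START)
      return EX_MEMZONE_BINDER;

   return EX_MEMZONE_SHADER;
}
```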

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-02-21 10:26:11 -08:00
Kenneth Graunke
3bcb1a7fcd iris: Don't whack SO dirty bits when finishing a BLORP op
Re-emitting 3DSTATE_SO_BUFFERS can be hazardous, as it could zero
offsets.  Plus, it's just not necessary - BLORP doesn't change these.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
b9697dd820 iris: Fix SO issue with INTEL_DEBUG=reemit, set fewer bits
INTEL_DEBUG=reemit was breaking streamout tests, by re-emitting
3DSTATE_SO_BUFFER commands that tell the HW to zero the SO write
offsets.  We would need to alter them to use 0xFFFFFFFF for the offset.

Also, have each upload function only flag bits relevant to its own
pipeline.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
61798e3c88 iris: CS stall on VF cache invalidate workarounds
See commit 31e4c9ce40 in i965.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
c81941f1e7 iris: Pay attention to blit masks
For combined depth/stencil formats, we may want to only blit one half.
If PIPE_BLIT_Z is set, blit depth; if PIPE_BLIT_S is set, blit stencil.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
7837fec740 iris: Assert about blits with color masking
st/mesa never asks for this today, but in theory someone might, and we
don't support it.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
0f677b0d87 iris: Don't enable smooth points when point sprites are enabled
dEQP-GLES3.functional.rasterization.fbo.rbo_multisample_*.primitives.points
2019-02-21 10:26:11 -08:00
Kenneth Graunke
3b336a1513 iris: Allow sample mask of 0
I think this was an attempt to work around various sample mask bugs I
had early on.  It's not correct.  A sample mask of 0 is legal and means
to disable all samples.

Fixes dEQP-GLES31.functional.texture.multisample.*.*sample_mask*
2019-02-21 10:26:11 -08:00
Kenneth Graunke
e17333ea1e iris: fail to create screen for older unsupported HW
loader shouldn't try, but let's be paranoid
2019-02-21 10:26:11 -08:00
Kenneth Graunke
1f91f688e8 iris: Switch to the new PIPELINE_STATISTICS_QUERY_SINGLE capability
I had a hack in place earlier to pass the query type as q->index
for the regular statistics query, but we ended up adjusting the
interface and adding a new query type.  Use that instead, fixing
pipeline statistics queries since the rebase.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
a23c06cabc iris: Use new PIPE_STAT_QUERY enums rather than hardcoded numbers. 2019-02-21 10:26:11 -08:00
Kenneth Graunke
5aef30b886 iris: Fix Broadwell WaDividePSInvocationCountBy4
We were dividing by 4 in calculate_result_on_gpu(), and also in
iris_get_query_result().  We should stop doing the latter, and instead
divide by 4 in calculate_result_on_cpu() as well.

Otherwise, if snapshots were available, and you hit the
calculate_result_on_cpu() path, but requested it be written to a QBO,
you'd fail to get a divide.
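
In sketch form, the divide moves into the calculation step so that every consumer of the result sees it exactly once. The function names mirror the ones mentioned above but the signatures are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical CPU-side calculation from two snapshots.  The caller says
 * whether the PS invocation count workaround applies to this query. */
static uint64_t ex_calculate_result_on_cpu(uint64_t begin, uint64_t end,
                                           bool divide_ps_invocations_by_4)
{
   assert(end >= begin);

   uint64_t result = end - begin;
   if (divide_ps_invocations_by_4)
      result /= 4;
   return result;
}

/* The getter only reports the calculated value; it must not divide again. */
static uint64_t ex_get_query_result(uint64_t calculated)
{
   return calculated;
}
```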
2019-02-21 10:26:11 -08:00
Kenneth Graunke
7f318bf2ac iris: Delete genx->bound_vertex_buffers
This is actually stored in ice->state, as it isn't gen-specific
2019-02-21 10:26:11 -08:00
Kenneth Graunke
02991e2878 iris: Drop a dead comment 2019-02-21 10:26:11 -08:00
Kenneth Graunke
572fad1e84 iris: Don't check other batches for our batch BO
This is an awkward corner case.  We create batches in order, each of
which creates and pins a BO.  The other batches may not be set up yet,
so it may not be safe to ask whether they reference a BO.

Just avoid this for now.  We could avoid it for other context-local BOs
too, but we currently don't have a flag for that (and I'm not certain
whether it's worth it).
2019-02-21 10:26:11 -08:00
Kenneth Graunke
8eda6f2288 iris: Handle PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE somewhat
Various places in the transfer code need to know whether they must
read the existing resource's values.  Rather than checking both flags
everywhere, just make PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE also flag
PIPE_TRANSFER_DISCARD_RANGE - if we can discard everything, we can
discard a subrange, too.
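
A sketch of the flag normalization, using made-up EX_* bit values rather than the real PIPE_TRANSFER_* definitions:

```c
#include <assert.h>
#include <stdbool.h>

/* Made-up flag bits standing in for the PIPE_TRANSFER_DISCARD_* flags. */
#define EX_DISCARD_RANGE          (1u << 0)
#define EX_DISCARD_WHOLE_RESOURCE (1u << 1)

/* Discarding the whole resource implies discarding any subrange of it. */
static unsigned ex_normalize_usage(unsigned usage)
{
   if (usage & EX_DISCARD_WHOLE_RESOURCE)
      usage |= EX_DISCARD_RANGE;
   return usage;
}

/* After normalization, later code only has to test a single flag. */
static bool ex_must_preserve_contents(unsigned normalized_usage)
{
   /* Check the invariant established by ex_normalize_usage(). */
   assert(!(normalized_usage & EX_DISCARD_WHOLE_RESOURCE) ||
          (normalized_usage & EX_DISCARD_RANGE));

   return !(normalized_usage & EX_DISCARD_RANGE);
}
```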

Obviously, we can do better for PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE,
but eventually u_threaded_context should handle swapping out buffers
for new idle buffers, anyway.  In the meantime, this is at least better.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
bacc722d13 iris: Flush the render cache in flush_and_dirty_for_history
BLORP uses the render engine to write to buffers, and we need to flush
that data out to the actual surface (finishing the write).  Then, the
rest of this function invalidates any caches that might have stale data
which needs to be refetched.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
7a9e87c224 iris: Implement multi-slice copy_region
I don't know if this is required - surprisingly, I haven't seen it
matter - but I'd like to use it for multi-slice transfer maps.  We may
as well do the right thing.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
307f3f9924 iris: Leave a comment about why Broadwell images are broken
There are a variety of ways to fix this, many of which are simple, but
I could use some advice on which ones other people prefer, and so we'll
punt until after the holidays.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
7ed1383c0a iris: Fix surface states for Gen8 lowered-to-untyped images
We have to use SURFTYPE_BUFFER and ISL_FORMAT_RAW for these.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
477e7d575b iris: Fill out brw_image_params for storage images on Broadwell 2019-02-21 10:26:11 -08:00
Kenneth Graunke
7e35333c73 iris: Don't make duplicate system values
We were relying on CSE/GVN/etc to coalesce all intrinsics that load the
same value, but that's a bad idea.  We might have a couple intrinsics
that reload the same value.  If so, we only want to set up the uniform
on the first one we see.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
bc3bb28645 iris: Don't enable push constants just because there are system values
System values are built-in uniforms.  We set them up as UBO values, and
might pull or push them.  UBO push analysis will take care of that.  We
only want to enable push constants if there's an actual range being
pushed.  Otherwise, we might get into a scenario where 3DSTATE_PS
enables push constants but 3DSTATE_CONSTANT_PS isn't pushing anything.

This fixes GPU hangs in Broadwell image load store tests which have
unused image param system values but no other uniforms.  (We shouldn't
be making those anyway, but that's a separate fix...)
2019-02-21 10:26:11 -08:00
Kenneth Graunke
2ca0d913ea iris: Fix framebuffer layer count
cso_fb->layers is only valid for no-attachment framebuffers.  Use the
helper function to get the real value, then stash it so we don't have
to call the helper function on the old value for comparison, or at draw
time for Force Zero RTA Index setting.

This fixes Force Zero RTA Index being set even when attempting layered
rendering.
2019-02-21 10:26:11 -08:00
Dave Airlie
df60241ff7 iris: handle qbo fragment shader invocation workaround 2019-02-21 10:26:11 -08:00
Dave Airlie
5ae2e5aa94 iris: add fs invocations query workaround for broadwell 2019-02-21 10:26:11 -08:00
Dave Airlie
8806b29e16 iris: setup gen8 caps 2019-02-21 10:26:11 -08:00
Dave Airlie
1bbf095473 iris: limit gen8 to 8 samples 2019-02-21 10:26:11 -08:00
Dave Airlie
823609b1a3 iris/WIP: add broadwell support
This adds all the state changes, MOCS changes,
2019-02-21 10:26:11 -08:00
Kenneth Graunke
5be72d9a20 iris: Delete bogus comment about cube array counting.
Both 'z' and 'depth' are counted in slices, according to the Gallium
docs (context.rst).  In our temporary memory, we allocate `box.depth`
slices, so we need to rebase the starting slice (box.z) down to 0,
and back again when writing on unmap.

There's nothing strange about cubes here.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
73709be0c3 iris: Fix compute scratch pinning
Thanks to Eero Tamminen for helping catch this.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
3ab3aa23c2 iris: Add a more long term TODO about timebase scaling 2019-02-21 10:26:11 -08:00
Kenneth Graunke
7ddc1f8ded iris: Only resolve inputs for actual shader stages
We don't need to consider compute at render time, and don't need to
consider disabled stages.  4% on drawoverhead.
2019-02-21 10:26:11 -08:00
Rhys Kidd
6c17e7d95f iris: Fix assertion in iris_resource_from_handle() tiling usage
Assertion error:

  iris_resource_from_handle: Assertion `res->bo->tiling_mode ==
      isl_tiling_to_i915_tiling(res->surf.tiling)' failed.

This patch fixes 16 piglit tests on KBL:
glx/glx-multithread-texture
glx/glx-query-drawable-glx_fbconfig_id-glxpbuffer
glx/glx-query-drawable-glx_fbconfig_id-glxpixmap
glx/glx-query-drawable-glx_preserved_contents
glx/glx-query-drawable-glxpbuffer-glx_height
glx/glx-query-drawable-glxpbuffer-glx_width
glx/glx-query-drawable-glxpixmap-glx_height
glx/glx-query-drawable-glxpixmap-glx_width
glx/glx-swap-pixmap
glx/glx-swap-pixmap-bad
glx/glx-tfp
glx/glx-visuals-depth -pixmap
glx/glx-visuals-stencil -pixmap
spec/egl 1.4/eglcreatepbuffersurface and then glclear
spec/egl 1.4/largest possible eglcreatepbuffersurface and then glclear
spec/egl_nok_texture_from_pixmap/basic

Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
2019-02-21 10:26:11 -08:00
Kenneth Graunke
73d525f188 iris: Fix scratch space allocation on Icelake.
Gen9-10 have fewer than 4 subslices per slice, so they need this to be
rounded up.  Gen11 isn't documented as needing this hack, and it can
also have more than 4 subslices, so the hack actually can break things.

Fixes tests/spec/arb_enhanced_layouts/execution/component-layout/
sso-vs-gs-fs-array-interleave
2019-02-21 10:26:11 -08:00
Kenneth Graunke
154e3e45bb iris: better MOCS 2019-02-21 10:26:11 -08:00
Dave Airlie
aaaf611130 iris: fix gpu calcs for timestamp queries 2019-02-21 10:26:11 -08:00
Kenneth Graunke
3c45d03049 iris: only mark depth/stencil as writable if writes are actually enabled 2019-02-21 10:26:11 -08:00
Kenneth Graunke
3a938a4b23 iris: more dead comments 2019-02-21 10:26:11 -08:00
Kenneth Graunke
e169cb09c3 iris: pin and re-pin the scratch BO 2019-02-21 10:26:11 -08:00
Kenneth Graunke
dd0d47a5d2 iris: delete finished comments 2019-02-21 10:26:11 -08:00
Kenneth Graunke
32ee2e4c27 iris: always pin the binder...in the compute context, too.
not sure why this hasn't tripped things up
2019-02-21 10:26:11 -08:00
Kenneth Graunke
fbfe07c4f3 iris: Track blend enables, save outbound for resolve code 2019-02-21 10:26:11 -08:00
Kenneth Graunke
5481887ca8 iris: whitespace fixes 2019-02-21 10:26:11 -08:00
Kenneth Graunke
b2fa90706e iris: Make an alloc_surface_state helper
This does the gtt_offset addition for us
2019-02-21 10:26:11 -08:00
Kenneth Graunke
b358c4b92b iris: Use a surface state fill helper
This will check aux_usage eventually
2019-02-21 10:26:11 -08:00
Kenneth Graunke
b92ca4d0f6 iris: don't print the pointer in INTEL_DEBUG=submit
Lots of noise in the diff.  The hope was that it would be useful for
gdb, but the GEM handle is good enough.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
ad969a00c0 iris: Fix the prototype for iris_bo_alloc_tiled
This now matches the actual function in iris_bufmgr.c, as well as the
equivalent brw_bufmgr.c function...
2019-02-21 10:26:11 -08:00
Kenneth Graunke
598a78849e iris: Fix for PIPE_CAP_SIGNED_VERTEX_BUFFER_OFFSET
This fixes ext_transform_feedback-builtin-varyings gl_Position after the
combination of my transform feedback reworks and my vertex buffer
reworks (?)
2019-02-21 10:26:11 -08:00
Kenneth Graunke
392fba5f31 iris: drop unnecessary genx->streamout field 2019-02-21 10:26:11 -08:00
Kenneth Graunke
5307ff6a5f iris: Implement DrawTransformFeedback()
We get the count by dividing the offset by the stride.
2019-02-21 10:26:11 -08:00
Jason Ekstrand
2e103fff63 iris: Copy anv's MI_MATH helpers for multiplication and division
(import done by Ken but with author set to Jason because it's his
code that's being imported, so he deserves the credit)
2019-02-21 10:26:11 -08:00
Kenneth Graunke
52baba80f3 iris: only get space for one offset in stream output targets
A target corresponds to a buffer, and a buffer only records one offset,
not multiple.
2019-02-21 10:26:10 -08:00
Kenneth Graunke
31357bae4b iris: Move iris_stream_output_target def to iris_context.h
now that it doesn't have genxml
2019-02-21 10:26:10 -08:00
Kenneth Graunke
cf4931e586 iris: Don't bother packing 3DSTATE_SO_BUFFER at create time
We have to do half the packet late anyway, we may as well just do it
all at set time.  This also lets us move the struct def out of genxml
2019-02-21 10:26:10 -08:00
Kenneth Graunke
754d678b0a iris: Add _MI_ALU helpers that don't paste
This lets you pass arguments as function parameters
2019-02-21 10:26:10 -08:00
Kenneth Graunke
5094062bbe iris: Reorder LRR parameters to have dst first.
LRI and LRM both put dst first, be consistent.
2019-02-21 10:26:10 -08:00
Kenneth Graunke
2f5d85661f iris: rewrite set_vertex_buffer and VB handling
I was using the Gallium API wrong.  set_* functions with start_slot
and count parameters are supposed to update a subrange of the items.
I had been trashing all bound vertex buffers and starting over.
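
The subrange semantics can be sketched like this. The ex_vb_state layout and EX_MAX_VBS are hypothetical; only the start_slot/count behaviour is the point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_MAX_VBS 8   /* hypothetical number of vertex buffer slots */

/* Hypothetical bound vertex buffer state. */
struct ex_vb_state {
   const void *buffers[EX_MAX_VBS];
   uint32_t bound_mask;   /* bit i set when slot i has a buffer */
};

/* Update only slots [start_slot, start_slot + count).  A NULL array or a
 * NULL entry unbinds that slot; all other slots are left untouched. */
static void ex_set_vertex_buffers(struct ex_vb_state *state,
                                  unsigned start_slot, unsigned count,
                                  const void *const *buffers)
{
   assert(start_slot + count <= EX_MAX_VBS);

   for (unsigned i = 0; i < count; i++) {
      const unsigned slot = start_slot + i;
      const void *buf = buffers ? buffers[i] : NULL;

      state->buffers[slot] = buf;
      if (buf)
         state->bound_mask |= 1u << slot;
      else
         state->bound_mask &= ~(1u << slot);
   }
}
```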

This should hopefully also make it easier to slot in additional
VERTEX_BUFFER_STATEs at draw time, say, for shader draw parameters.
2019-02-21 10:26:10 -08:00
Kenneth Graunke
286b8b8f99 iris: handle PatchVerticesIn as a system value. 2019-02-21 10:26:10 -08:00
Tapani Pälli
96bb328e9b iris: add Android build
Note that at least following additional libs/components require changes
since they refer to BOARD_GPU_DRIVERS variable which is used to select
the driver:

  - mixins
  - minigbm
  - libdrm
  - drm_gralloc

v2: (feedback by Gustaw Smolarczyk) Fix trailing \ in a few cases

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
2019-02-21 10:26:10 -08:00
Kenneth Graunke
97e82e80f9 iris: override alpha to one src1 blend factors
No idea why this used to pass and doesn't after updating...seems like
we should have been handling it all along...
2019-02-21 10:26:10 -08:00
Kenneth Graunke
90b2745148 iris: Always do rasterizer discard in clipper
but continue doing it in SOL if possible because it's faster

Fixes ./bin/ext_transform_feedback-discard-drawarrays - simpler too
2019-02-21 10:26:10 -08:00
Kenneth Graunke
5f511798d0 iris: Fix primitive generated query active flag 2019-02-21 10:26:10 -08:00
Kenneth Graunke
99cab4d381 iris: Enable guardband clipping 2019-02-21 10:26:10 -08:00
Kenneth Graunke
f062dcdfbb iris: Clamp viewport extents to the framebuffer dimensions
Fixes arb_framebuffer_no_attachments-query's resize subtest.
2019-02-21 10:26:10 -08:00
Kenneth Graunke
fb2df1b5d5 iris: Fix clear dimensions
Fixes depthstencil-render-miplevels 1024 s=z24_s8
2019-02-21 10:26:10 -08:00
Kenneth Graunke
2e79e46d23 iris: Drop continues in resolve
Now that we u_bit_scan we know it exists
2019-02-21 10:26:10 -08:00
Kenneth Graunke
5fde1fa988 iris: Replace num_textures etc with a bitmask we can scan
More accurate bounds, plus can skip dead ones
2019-02-21 10:26:10 -08:00
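A minimal sketch of the bitmask approach from 5fde1fa988 and 2e79e46d23, assuming one bit per bound slot. `bit_scan` here is a portable stand-in written for this example (it is not Mesa's helper), and `visit_bound` is a hypothetical caller.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in helper: return the index of the lowest set bit and clear it. */
static int
bit_scan(unsigned *mask)
{
   assert(*mask != 0);

   int i = 0;
   while (!((*mask >> i) & 1u))
      i++;
   *mask &= ~(1u << i);
   return i;
}

/* Visit only the slots whose bit is set; unbound slots are never touched. */
static unsigned
visit_bound(unsigned bound_mask, const void *const *slots)
{
   unsigned visited = 0;

   while (bound_mask) {
      int i = bit_scan(&bound_mask);
      /* A set bit implies the slot is populated, so no "continue" on NULL. */
      assert(slots[i] != NULL);
      visited++;
   }
   return visited;
}
```

Compared with looping from 0 to a stored count, the mask skips holes in the middle and stops at the highest bound slot.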
Kenneth Graunke
7ad7d0beea iris: Fix set_sampler_views with start > 0 2019-02-21 10:26:10 -08:00
Kenneth Graunke
1c6fea8e7b iris: fix set_sampler_views to not unbind, be better about bounds 2019-02-21 10:26:10 -08:00
Kenneth Graunke
598ce8e88e iris: fix overhead regression from flushing for storage images
st calls us with count = 32 but a NULL pointer...we only really care
about the highest non-NULL image...
2019-02-21 10:26:10 -08:00
Kenneth Graunke
4749f6cc4f iris: Fix NOS mechanism
Set bits, not values
2019-02-21 10:26:10 -08:00
Kenneth Graunke
a24734a2d7 iris: re-pin inherited streamout buffers 2019-02-21 10:26:10 -08:00
Kenneth Graunke
19803d0aa7 iris: reemit SBE when sprite coord origin changes
fixes arb_point_sprite-checkerboard
2019-02-21 10:26:10 -08:00
Kenneth Graunke
480c62bc7e iris: omask can kill 2019-02-21 10:26:10 -08:00
Kenneth Graunke
bd031eb2e8 iris: reject all clipping when we can't use streamout render disabled 2019-02-21 10:26:10 -08:00
Kenneth Graunke
72cf2185c8 iris: make clipper statistics dynamic 2019-02-21 10:26:10 -08:00
Kenneth Graunke
1114f0c1ce iris: CS stall for stream out -> VB
i965 doesn't do this, but I suspect it just stalls a lot and doesn't hit
this.  Fixes ext_transform_feedback-position render among others.
2019-02-21 10:26:10 -08:00
Kenneth Graunke
c03fbb41aa iris: fix dma buf import strides 2019-02-21 10:26:10 -08:00
Kenneth Graunke
90274bd48f iris: fix alpha channel for RGB BC1 formats 2019-02-21 10:26:10 -08:00
Jason Ekstrand
47d4ea1a16 iris: Allocate buffer resources separately
(cleaned up by Ken - make sure a bunch of things were more obviously
not using res->surf, do allow checking res->surf.tiling == LINEAR,
drop format cpp checks that aren't needed, drop memzone handling for
images, assume buffers / non-buffers in a few places...)
2019-02-21 10:26:10 -08:00
Kenneth Graunke
585c95f8cc iris: Don't bother considering if the underlying surface is a cube
Dave fixed it to consider whether the sampler view is a cube.
With that, there's no point (possibly harm) in looking if the original
resource was a cube...if it's an array view, we don't want to treat it
as a cube anymore...
2019-02-21 10:26:10 -08:00
Kenneth Graunke
773adeb9e9 iris: move some non-buffer case code in a bit 2019-02-21 10:26:10 -08:00
Kenneth Graunke
2c0f001295 iris: Stop leaking iris_uncompiled_shaders like mad
Now shader-db actually executes.  We still need a plan for culling
dead iris_compiled_shaders...
2019-02-21 10:26:10 -08:00
Kenneth Graunke
68d531d7d7 iris: Destroy the bufmgr
Plugs a 12360 byte leak
2019-02-21 10:26:10 -08:00
Kenneth Graunke
7c29c3d01e iris: Fix IRIS_MEMZONE_COUNT to exclude the border color pool
This is supposed to exclude single address zones.  We were getting
too many VMA allocators but failing to set them up, which worked out
because we also forgot to destroy them...
2019-02-21 10:26:10 -08:00
Kenneth Graunke
6cb211121b iris: Unref unbound_tex resource
Plugs a 12536 byte leak
2019-02-21 10:26:10 -08:00
Kenneth Graunke
f73fdb4001 iris: Destroy the border color pool
This plugs a 12224 byte leak
2019-02-21 10:26:10 -08:00
Kenneth Graunke
3d55e9a2aa iris: Destroy transfer helper on screen teardown
Plugs a 16 byte leak
2019-02-21 10:26:10 -08:00
Kenneth Graunke
bdc1269eb2 iris: Fix failed to compile TCS message 2019-02-21 10:26:10 -08:00
Kenneth Graunke
fbf3124771 iris: Rework tiling/modifiers handling
We were being very picky about things being Y tiled.  But, not
everything can be - for example, > 16382 surfaces on SKL GT1-3
have to fall back to linear.

Instead, give ISL options and let it pick.
2019-02-21 10:26:10 -08:00
Kenneth Graunke
761a5fb36a iris: fix conditional compute, don't stomp predicate for pipelined queries 2019-02-21 10:26:10 -08:00
Kenneth Graunke
40b12c103c iris: check query first
this lets us avoid the predicate bit in more cases, which is nice
2019-02-21 10:26:10 -08:00
Kenneth Graunke
0c3ea03e4b iris: for BLORP, only use the predicate enable bit when USE_BIT 2019-02-21 10:26:10 -08:00
Dave Airlie
7bbf3ff4a9 iris: add conditional render support 2019-02-21 10:26:10 -08:00
Kenneth Graunke
dbe198d6ba iris: drop key_size_for_cache
dead since my program cache API rework.  we could still use it for one
function, but it's so trivial to pass the size, that it's probably not
worth the extra code
2019-02-21 10:26:10 -08:00
Dave Airlie
e4115eaca0 iris: iris add load register reg32/64
These will be needed for broadwell and conditional render
2019-02-21 10:26:10 -08:00
Dave Airlie
311a1b3198 iris: execute compute related query on compute batch.
This only happens for the compute invocations query.
2019-02-21 10:26:10 -08:00
Dave Airlie
00645ea01c iris: fix cube texture view 2019-02-21 10:26:10 -08:00
Kenneth Graunke
39d1056d10 iris: fix some SO overflow query bugs and tidy the code a bit 2019-02-21 10:26:10 -08:00
Dave Airlie
527e5bcdc7 iris: add initial transform feedback overflow query paths (V3)
v2: fix cpu overflow calc
v3: use a struct
2019-02-21 10:26:10 -08:00
Kenneth Graunke
0ded23a552 iris: actually flush for storage images 2019-02-21 10:26:10 -08:00
Kenneth Graunke
69e97670bc iris: add an extra BT assert from Chris Wilson 2019-02-21 10:26:10 -08:00
Kenneth Graunke
4312784674 iris: add assertions about binding table starts 2019-02-21 10:26:10 -08:00
Kenneth Graunke
240615695d iris: drop pull constant binding table entry
nothing uses this
2019-02-21 10:26:10 -08:00
Kenneth Graunke
10d04cdaa4 iris: Use program's num textures not the state tracker's bound
the state tracker might bind more textures than the program is using.
2019-02-21 10:26:10 -08:00
Kenneth Graunke
855ff47d36 iris: Enable precompiles 2019-02-21 10:26:10 -08:00
Kenneth Graunke
ed4ffb9715 iris: rework program cache interface
This exposes iris_upload_shader() without having to bind it, which will
be useful for precompiles.  It also lets us examine the old programs and
flag dirty bits at a higher level, rather than cramming all that
knowledge into the cache layer.
2019-02-21 10:26:10 -08:00
Kenneth Graunke
701a6b6006 iris: Use wrappers for create_xs_state rather than a switch statement 2019-02-21 10:26:10 -08:00
Kenneth Graunke
e628095b9a iris: fix comment location 2019-02-21 10:26:10 -08:00
Kenneth Graunke
e5df8913e1 iris: export iris_upload_shader 2019-02-21 10:26:10 -08:00
Kenneth Graunke
d525b3dfad iris: fix prototype warning 2019-02-21 10:26:10 -08:00
Kenneth Graunke
84a8c63527 iris: Re-pin even if nothing is dirty 2019-02-21 10:26:10 -08:00
Kenneth Graunke
415ede346d iris: Flush for history at various moments
When we blit, transfer, or copy_resource to a buffer, we need to flush
to ensure any stale data for that buffer is invalidated in the caches.

bind_history will inform us which caches need to be flushed.

Also, for any push constant buffers, we need to flag those dirty so
that we re-emit 3DSTATE_CONSTANT_*, causing the data to be re-pushed.
2019-02-21 10:26:10 -08:00
Kenneth Graunke
c8579e708e iris: add iris_flush_and_dirty_for_history 2019-02-21 10:26:10 -08:00
Kenneth Graunke
d169747a3e iris: Track a binding history for buffer resources
This will let us know what caches to flush / state to dirty when
altering the contents of a buffer.
2019-02-21 10:26:10 -08:00
Kenneth Graunke
f49f506b13 iris: drop long dead XXX comment 2019-02-21 10:26:10 -08:00
Kenneth Graunke
5dbd6df9f7 iris: Do the 48-bit vertex buffer address invalidation workaround 2019-02-21 10:26:10 -08:00
Kenneth Graunke
1b1ea23766 iris: Fix VIEWPORT/LAYER in stream output info
Fixes glsl-1.50-transform-feedback-builtins and
ext_transform_feedback-builtin-varyings gl_PointSize
2019-02-21 10:26:10 -08:00
Kenneth Graunke
c5b22441f1 iris: Fix buffer -> buffer copy_region
Size can be too large for a surf, blorp_buffer_copy chops things up
into segments we can actually handle

Fixes map_buffer_range_test and copy_buffer_coherency
2019-02-21 10:26:10 -08:00
Kenneth Graunke
beb2d5e065 iris: Lie about indirects
fixes interpolateAt tests
2019-02-21 10:26:10 -08:00
Kenneth Graunke
b9ccb00e2c iris: Enable ctx->Const.UseSTD430AsDefaultPacking
hooray for obscurely named pipe caps with bizarre descriptions!
2019-02-21 10:26:10 -08:00
Kenneth Graunke
39cb10613c iris: update comment 2019-02-21 10:26:10 -08:00
Kenneth Graunke
f9612e7682 iris: RT flush for memorybarrier with texture bit
PIXEL_BUFFER_BARRIER_BIT turns into PIPE_BARRIER_TEXTURE and it ought
to trigger an RT flush, according to brw_memory_barrier
2019-02-21 10:26:10 -08:00
Kenneth Graunke
2c23721397 iris: PIPE_CONTROL workarounds for GPGPU mode 2019-02-21 10:26:10 -08:00
Kenneth Graunke
f1a7392be1 iris: Put batches in an array
We keep re-making this array all over the place
2019-02-21 10:26:10 -08:00
Kenneth Graunke
c2a77efa71 iris: put render batch first in fence code
this shouldn't matter, but it will make the next refactor easier
2019-02-21 10:26:10 -08:00
Kenneth Graunke
d918c09975 iris: flush the compute batch too if border pool is redone 2019-02-21 10:26:10 -08:00
Kenneth Graunke
017b556609 iris: leave a TODO 2019-02-21 10:26:10 -08:00
Chris Wilson
f459c56be6 iris: Add fence support using drm_syncobj 2019-02-21 10:26:10 -08:00
Kenneth Graunke
db199d9d07 iris: Add wait fences to properly sync between render/compute
When flushing a batch due to a data dependency, we need to not only
kick off the other batch's work, but stall our execution until it
completes.  Just wait on last_syncpt after flushing it.
2019-02-21 10:26:10 -08:00
Kenneth Graunke
d69bc4ac12 iris: Hang on to the last batch's sync-point, so we can wait on it 2019-02-21 10:26:10 -08:00
Chris Wilson
fae74234d9 iris: Tag each submitted batch with a syncobj
(adjusted by Ken to make the signalling sync object immediately on
batch reset, rather than batch finish time.  this will work better
with deferred flushes...)
2019-02-21 10:26:10 -08:00
Kenneth Graunke
3e332af611 iris: Drop vestiges of throttling code 2019-02-21 10:26:10 -08:00
Chris Wilson
54347c078e iris: Merge two walks of the exec_bos list 2019-02-21 10:26:10 -08:00
Kenneth Graunke
3455f57575 iris: replace vestiges of fence fds with newer exec_fence API
patch by me and Chris Wilson
2019-02-21 10:26:10 -08:00
Kenneth Graunke
11da219be9 iris: Avoid synchronizing due to the workaround BO 2019-02-21 10:26:10 -08:00
Kenneth Graunke
30d7bebc8a iris: Avoid cross-batch synchronization on read/reads
This avoids flushing batches just because e.g. both are reading the same
dynamic state streaming buffer, or shader assembly buffer.
2019-02-21 10:26:10 -08:00
Kenneth Graunke
b21e916a62 iris: Combine iris_use_pinned_bo and add_exec_bo 2019-02-21 10:26:10 -08:00
Kenneth Graunke
fb4c898842 iris: Use iris_use_pinned_bo rather than add_exec_bo directly
less special this way
2019-02-21 10:26:10 -08:00
Chris Wilson
e5528151a7 iris: Fix assigning the output handle for exporting for KMS
Fixes gbm_bo_get_handle() used for KMS in glamor.
2019-02-21 10:26:10 -08:00
Chris Wilson
01e729f883 iris: Tidy exporting the flink handle 2019-02-21 10:26:10 -08:00
Kenneth Graunke
1b69b14c2a iris: Fix SLM
Now that Jason has set up the L3 we can do this.  Also, my assert was
useless because we hadn't set up the field in the first place.  Oops.
2019-02-21 10:26:10 -08:00
Jason Ekstrand
f9c5e277ac iris: Don't set constant read lengths at upload time
They're set in derived_data as part of store_cs_state
2019-02-21 10:26:10 -08:00
Jason Ekstrand
a90a0e22cb iris: Configure the L3$ on the compute context 2019-02-21 10:26:10 -08:00
Kenneth Graunke
25a41b1aef iris: properly pin stencil buffers 2019-02-21 10:26:10 -08:00
Kenneth Graunke
8545e39808 iris: Fix TCS/TES slot unification
TCS outputs, TES inputs...not TCS inputs

Fixes some barrier tests
2019-02-21 10:26:10 -08:00
Kenneth Graunke
da5590496e iris: more todo notes 2019-02-21 10:26:10 -08:00
Kenneth Graunke
9878ea842f iris: scissored and mirrored blits 2019-02-21 10:26:10 -08:00
Kenneth Graunke
25f194d5ac iris: more TODO 2019-02-21 10:26:10 -08:00
Kenneth Graunke
5207a5f5d5 iris: Fix independent alpha blending.
independent_blend_enable means per-RT blending, not RGB != A
2019-02-21 10:26:10 -08:00
Kenneth Graunke
c06f6d12a5 iris: "Fix" transfer maps of buffers
x should be in bytes, not cpp units

This generally worked out because PIPE_BUFFER is supposedly required
to be R8_UINT or R8_UNORM.  I hear some state trackers pass
PIPE_FORMAT_NONE instead, however, which would make this break.

Just do the right thing directly, to be defensive and clear.
2019-02-21 10:26:10 -08:00
Kenneth Graunke
b2c04aa3a0 iris: Fix SourceAlphaBlendFactor 2019-02-21 10:26:10 -08:00
Kenneth Graunke
89833eddab iris: leave another TODO 2019-02-21 10:26:10 -08:00
Kenneth Graunke
983e2ae7d2 iris: only clip lower if there's something to clip against 2019-02-21 10:26:10 -08:00
Kenneth Graunke
e11c497fc6 iris: fix sysval only binding tables 2019-02-21 10:26:10 -08:00
Kenneth Graunke
2ddbc1025e iris: don't forget to upload CS consts 2019-02-21 10:26:10 -08:00
Kenneth Graunke
f1f84a1ae7 iris: drop param stuffs 2019-02-21 10:26:10 -08:00
Kenneth Graunke
1b5d35319e iris: don't trip on param asserts
I'd rather not rewrite i965's compute system value handling right now :(
2019-02-21 10:26:10 -08:00
Kenneth Graunke
f4829a2fe1 iris: don't support pull constants.
I don't think it matters, we won't have any params anyway, but let's
be sure it doesn't try
2019-02-21 10:26:10 -08:00
Kenneth Graunke
911f9e8f3f iris: regather info so we get CLIP_DIST slots, not CLIP_VERTEX 2019-02-21 10:26:09 -08:00
Kenneth Graunke
6d19fe376d iris: enable push constants if we have sysvals but no uniforms 2019-02-21 10:26:09 -08:00
Kenneth Graunke
1ef68d77c0 iris: drop iris_setup_push_uniform_range
it doesn't do anything, we have no params.  I guess I thought there
would be some, but they all get dead code eliminated even if we try
to make them exist in the first place.
2019-02-21 10:26:09 -08:00
Kenneth Graunke
7eeb124c02 iris: fix more uniform setup 2019-02-21 10:26:09 -08:00
Kenneth Graunke
50743eb748 iris: fix num clip plane consts 2019-02-21 10:26:09 -08:00
Kenneth Graunke
a98634a28f iris: actually upload clip planes. 2019-02-21 10:26:09 -08:00
Kenneth Graunke
c60ce3f4fd iris: bypass params and do it ourselves
the backend keeps dead code eliminating them all, so we can't do that,
plus we don't want to because params[] is lame
2019-02-21 10:26:09 -08:00
Kenneth Graunke
78fc760bab iris: dodge backend UCP lowering 2019-02-21 10:26:09 -08:00
Kenneth Graunke
deb6d588a6 iris: fix system value remapping 2019-02-21 10:26:09 -08:00
Kenneth Graunke
2b0a2915dc iris: hook up key stuff for clip plane lowering 2019-02-21 10:26:09 -08:00
Kenneth Graunke
2876dd1a37 iris: lower user clip planes 2019-02-21 10:26:09 -08:00
Kenneth Graunke
80c856cbee iris: only bother with params if there are any... 2019-02-21 10:26:09 -08:00
Kenneth Graunke
2186d83185 iris: fill out params array with built-ins, like clip planes 2019-02-21 10:26:09 -08:00
Kenneth Graunke
d3e8ff143d iris: add param domain defines 2019-02-21 10:26:09 -08:00
Kenneth Graunke
ecb28b2802 iris: drop unnecessary param[] setup from iris_setup_uniforms
the backend just considers these dead anyway
2019-02-21 10:26:09 -08:00
Kenneth Graunke
ed08f022f0 iris: Defer cbuf0 upload to draw time 2019-02-21 10:26:09 -08:00
Kenneth Graunke
e98cf9c24b iris: Clone the NIR
The backend compiler used to do this for us, but after a rebase, it's
now the driver's responsibility.  This lets us alter it for say, clip
vertex lowering, at the global level rather than the per-variant level.
2019-02-21 10:26:09 -08:00
Kenneth Graunke
587e438128 iris: Print the batch name when decoding 2019-02-21 10:26:09 -08:00
Kenneth Graunke
2727a942a4 iris: partial set_query_active_state
used to avoid OQ during clears for example

fixes occlusion_query_meta_no_fragments
2019-02-21 10:26:09 -08:00
Kenneth Graunke
64af1d9248 iris: Fix multiple RTs with non-independent blending
rt[i] isn't filled out in this case, so we have to use rt[0]
2019-02-21 10:26:09 -08:00
Kenneth Graunke
58507c02ce iris: Fix TextureBarrier
I don't know how I came up with the old one, this is now what i965 does
Also we now do compute batches too
2019-02-21 10:26:09 -08:00
Kenneth Graunke
e5d84bbd36 iris: Fix MSAA smooth points
Fixes bin/ext_framebuffer_multisample-point-smooth 2 -auto -fbo
2019-02-21 10:26:09 -08:00
Kenneth Graunke
4d219b0eb3 iris: implement scratch space!
we borrow the approach from anv rather than i965, as it works better
with pre-baked state that needs to contain scratch BO addresses

fixes a bunch of varying packing tests
2019-02-21 10:26:09 -08:00
Kenneth Graunke
9511b89ef9 iris: tidy more warnings 2019-02-21 10:26:09 -08:00
Kenneth Graunke
846316b258 iris: Enable msaa_map transfer helpers
This does the downsampling for us.  It'll use BLORP anyway because
it uses blit(), and that uses BLORP.
2019-02-21 10:26:09 -08:00
Kenneth Graunke
9ec927497e iris: Actually create/destroy HW contexts
The intention is that render and compute use their own contexts,
and each is PIPELINE_SELECT'd to the right pipeline.  But we hadn't
actually made them, so we got the fd-default context.

Thanks to Chris Wilson for catching this!
2019-02-21 10:26:09 -08:00
Kenneth Graunke
cb5f47f585 iris: Don't leak the compute batch 2019-02-21 10:26:09 -08:00
Kenneth Graunke
fbe5d75f11 iris: cross batch flushing 2019-02-21 10:26:09 -08:00
Kenneth Graunke
c3cc525c7a iris: Cross-link iris_batches so they can potentially flush each other
This makes e.g. the render batch aware of the compute batch, so it can
ask questions like "is this BO referenced by some other batch?" and do
something about that.
2019-02-21 10:26:09 -08:00
Dave Airlie
ed016b2a0b iris: fix crash in sparse vertex array
this fixes crash in array-stride piglit.
2019-02-21 10:26:09 -08:00
Kenneth Graunke
bcac11c8f1 iris: Use at least 1x1 size for null FB surface state.
Otherwise we get 0 - 1 = 0xffffffff and fail to pack SURFACE_STATE.

Fixes some object namespace pollution gltexsubimage2d tests
2019-02-21 10:26:09 -08:00
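The wraparound mentioned in bcac11c8f1 can be illustrated with a small sketch, assuming the dimension is stored as an unsigned "value minus one" field. `encode_dim_minus_one` and `null_surface_dim` are hypothetical helpers for this example, not the driver's packing code.

```c
#include <assert.h>
#include <stdint.h>

/* "Minus one" encoding: a stored 0 means a dimension of 1.  A dimension of
 * 0 cannot be represented, since 0 - 1 wraps to 0xffffffff in uint32_t.
 */
static uint32_t
encode_dim_minus_one(uint32_t dim)
{
   assert(dim >= 1);
   return dim - 1;
}

/* Clamp a possibly-zero framebuffer dimension to at least 1. */
static uint32_t
null_surface_dim(uint32_t fb_dim)
{
   return fb_dim > 1 ? fb_dim : 1;
}
```

With the clamp, an empty framebuffer encodes as encode_dim_minus_one(null_surface_dim(0)), which is 0, instead of an out-of-range value.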
Kenneth Graunke
9c8fdf8133 iris: Drop B5G5R5X1 support
This is oddly renderable but not supported for sampling, which is the
opposite of other X formats.  Just skip it and fall back to BGRA.
2019-02-21 10:26:09 -08:00
Kenneth Graunke
4b31f506f8 iris: Enable A8/A16_UNORM in an inefficient manner
These currently just use the 'A' hardware formats, rather than the
faster 'R' formats.  glBitmap handling needs these, it seems. :(
2019-02-21 10:26:09 -08:00
Kenneth Graunke
80497af192 iris: Enable ARB_shader_stencil_export 2019-02-21 10:26:09 -08:00
Kenneth Graunke
3e6aaa1ba5 iris: Disable a PIPE_CONTROL workaround on Icelake 2019-02-21 10:26:09 -08:00
Kenneth Graunke
84a419432d iris: Flag constants dirty on program changes
3DSTATE_CONSTANT_* looks at prog_data->ubo_ranges.  We were getting
saved by iris_set_constant_buffers() usually happening when changing
programs (as they usually change uniforms too), but with the clear
shader that doesn't use uniforms, we weren't getting one and were
leaving push constants enabled, screwing things up.

Also clean up a bit of a mess left by the hacks - we were missing
bindings in the VS/FS/CS case, among other issues...
2019-02-21 10:26:09 -08:00
Kenneth Graunke
317ba8796f iris: allow binding a null vertex buffer
PBO upload apparently does this...
2019-02-21 10:26:09 -08:00
Kenneth Graunke
aef1ba5ce4 iris: fix overhead regression from "don't stomp each other's dirty bits"
The change from dirty = 0ull to dirty &= ~NOT_MY_BITS broke the "nothing
to do?  skip it!" optimization.  thanks to Chris for noticing this!
2019-02-21 10:26:09 -08:00
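One way to read aef1ba5ce4, sketched under assumptions: once each batch clears only its own dirty bits, the shared dirty word may stay nonzero because of the other batch's bits, so an early-out has to test only the bits this batch owns. The flag names and `process_render` below are hypothetical, and the real fix may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical dirty flags, split by which batch consumes them. */
#define DIRTY_RENDER_A   (1ull << 0)
#define DIRTY_RENDER_B   (1ull << 1)
#define DIRTY_COMPUTE_A  (1ull << 2)

#define ALL_RENDER_BITS  (DIRTY_RENDER_A | DIRTY_RENDER_B)
#define ALL_COMPUTE_BITS (DIRTY_COMPUTE_A)

/* Process render state.  Returns the bits that were handled, or 0 if the
 * call was skipped because no render bit was dirty.  Compute bits are left
 * in place for the compute batch.
 */
static uint64_t
process_render(uint64_t *dirty)
{
   assert((ALL_RENDER_BITS & ALL_COMPUTE_BITS) == 0);

   const uint64_t mine = *dirty & ALL_RENDER_BITS;
   if (mine == 0)
      return 0;               /* nothing to do for this batch: skip it */

   /* ...state upload for `mine` would go here... */

   *dirty &= ~ALL_RENDER_BITS; /* clear only what we processed */
   return mine;
}
```

Testing `*dirty == 0` instead of `mine == 0` would defeat the skip whenever a compute bit is pending, which matches the kind of overhead regression described in the commit message.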
Kenneth Graunke
525d89cafc iris: delete dead code 2019-02-21 10:26:09 -08:00
Kenneth Graunke
8a98e90415 iris: Fix refcounting of grid surface 2019-02-21 10:26:09 -08:00
Jason Ekstrand
8e8868d5ad iris/compute: Zero out the last grid size on indirect dispatches 2019-02-21 10:26:09 -08:00
Jason Ekstrand
c16e711ff2 iris/compute: Don't increment the grid size offset
It may be in the dynamic state buffer but the fact that we have a
resource takes care of that.  We don't need to add in the address of
the dynamic state buffer again.
2019-02-21 10:26:09 -08:00
Kenneth Graunke
a3e813c5af iris: SO_DECL_LIST fix 2019-02-21 10:26:09 -08:00
Kenneth Graunke
927c4a21bd iris: Fall back to 1x1x1 null surface if no framebuffer supplied
If the state tracker never gave us the framebuffer dimensions via
a set_framebuffer_state() call, just fall back to the unbound texture
null surface, which is 1x1x1.  Otherwise we'd use a NULL resource
(no pun intended).
2019-02-21 10:26:09 -08:00
Kenneth Graunke
5d1a9db720 iris: Fix off by one in scissoring, empty scissors, default scissors 2019-02-21 10:26:09 -08:00
Kenneth Graunke
938d63b2e8 iris: Move snapshots_landed to the front.
Transform feedback overflow queries need to write additional data,
and it would be nice to have this field remain at a consistent offset.
2019-02-21 10:26:09 -08:00
Kenneth Graunke
ba2a4207f9 iris: Clamp UBO and SSBO access to the actual BO size, for safety 2019-02-21 10:26:09 -08:00
Kenneth Graunke
a9b32f2bbf iris: Fix texture buffer / image buffer sizes.
Also fix image buffers with offsets.
2019-02-21 10:26:09 -08:00
Kenneth Graunke
d1f8947792 iris: fix SF_CLIP_VIEWPORT array indexing with multiple VPs
fixes bunches of viewport stuffs
2019-02-21 10:26:09 -08:00
Kenneth Graunke
5bd49a47b6 iris: flag CC_VIEWPORT when changing num viewports
this also has a loop over num_viewports
2019-02-21 10:26:09 -08:00
Kenneth Graunke
d98967d936 iris: fix UBOs with bindings that have an offset 2019-02-21 10:26:09 -08:00
Kenneth Graunke
3f70956a4e iris: try and avoid pointless compute submissions
if apps don't use compute shaders, we don't even want to kick off the
compute initialization batch
2019-02-21 10:26:09 -08:00
Kenneth Graunke
97125e9bb3 iris: fix SBA flushing by refactoring code 2019-02-21 10:26:09 -08:00
Kenneth Graunke
8fa99481e7 iris: do PIPELINE_SELECT for render engine, add flushes, GLK hacks 2019-02-21 10:26:09 -08:00
Kenneth Graunke
b2d223b6bf iris: hack to avoid memorybarriers out the wazoo
we don't want to emit piles of pipe controls to a compute batch if
it isn't necessary...

prevents double-batch-wraps in cs-op-selection-bool-bvec4-bvec4
(but it's still kinda a big ol' hack...)
2019-02-21 10:26:09 -08:00
Kenneth Graunke
b3a40c27a2 iris: don't let render/compute contexts stomp each other's dirty bits
only clear what you process
2019-02-21 10:26:09 -08:00
Kenneth Graunke
f8796079da iris: better dirty checking 2019-02-21 10:26:09 -08:00
Kenneth Graunke
06a993dac2 iris: rewrite grid surface handling
now we only upload a new grid when it's actually changed, which saves us
from having to emit a new binding table every time it changes.

this also moves a bunch of non-gen-specific stuff out of iris_state.c
2019-02-21 10:26:09 -08:00
Kenneth Graunke
155e1a63d5 iris: XXX for compute state tracking :/
Maybe we should just move dirty to batch, it would help with the
reset stuff too
2019-02-21 10:26:09 -08:00
Kenneth Graunke
643030f4fb iris: fix whitespace 2019-02-21 10:26:09 -08:00
Kenneth Graunke
b0dc11993e iris: bail if SLM is needed 2019-02-21 10:26:09 -08:00
Kenneth Graunke
973b937cac iris: leave XXX about unnecessary binding table uploads 2019-02-21 10:26:09 -08:00
Kenneth Graunke
7fb8c20d7b iris: drop unnecessary #ifdefs 2019-02-21 10:26:09 -08:00
Kenneth Graunke
549db5b90e iris: drop XXX that Jordan handled 2019-02-21 10:26:09 -08:00
Jordan Justen
942bdb2906 iris/compute: Support indirect compute dispatch
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2019-02-21 10:26:09 -08:00
Jordan Justen
b35c8f2182 iris/compute: Push subgroup-id
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2019-02-21 10:26:09 -08:00
Jordan Justen
229450a2a6 iris/compute: Flush compute batch on memory-barriers
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2019-02-21 10:26:09 -08:00
Jordan Justen
fb4637797e iris/compute: Provide binding table entry for gl_NumWorkGroups
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2019-02-21 10:26:09 -08:00
Jordan Justen
fcd0364857 iris/compute: Wait on compute batch when mapping
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2019-02-21 10:26:09 -08:00
Jordan Justen
ea416d0b5d iris/program: Don't try to push ubo ranges for compute
We can only push constants for compute shaders from one range.

Gallium glsl-to-nir (src/mesa/state_tracker/st_glsl_to_nir.cpp) lowers
all uniform accesses to a ubo.

Unfortunately we also load the subgroup-id as a uniform in the
compiler. Since we use the 1 push range for this subgroup-id, we then
lose the ability to actually push the ubo with all the normal user
uniform values.

In other words, there is lots of room for performance improvement, but
at least retrieving the uniforms as pull-constants is functional for
now.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2019-02-21 10:26:09 -08:00
Jordan Justen
c7cfa4000f iris/compute: Get group counts from grid->grid
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2019-02-21 10:26:09 -08:00
Jordan Justen
fd9ccd8b5d iris/compute: Flush compute batches
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2019-02-21 10:26:09 -08:00
Jordan Justen
9b5cda95aa iris/compute: Add MEDIA_STATE_FLUSH following WALKER
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2019-02-21 10:26:09 -08:00
Jordan Justen
6ebd04ac8f iris: Add iris_restore_compute_saved_bos
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2019-02-21 10:26:09 -08:00
Jordan Justen
622aaa290f iris: Add IRIS_DIRTY_CONSTANTS_CS
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2019-02-21 10:26:09 -08:00
Jordan Justen
25f1625edf iris/compute: Set mask bits on PIPELINE_SELECT
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2019-02-21 10:26:09 -08:00
Kenneth Graunke
9fc672428d iris: little bits of compute basics 2019-02-21 10:26:09 -08:00
Kenneth Graunke
860ce6af3f iris: drop XXX's about swizzling
pretty sure this is unnecessary on modern HW
2019-02-21 10:26:09 -08:00
Kenneth Graunke
12de56f53d iris: drop dead format //'s
these just aren't supported
2019-02-21 10:26:09 -08:00
Kenneth Graunke
f6c68066a6 iris: yes 2019-02-21 10:26:09 -08:00
Kenneth Graunke
752abeb690 iris: initial compute caps
RET macro borrowed from freedreno
2019-02-21 10:26:09 -08:00
Kenneth Graunke
4da28c2c22 iris: Enable fb fetch
needed for ES 3.2
2019-02-21 10:26:09 -08:00
Kenneth Graunke
be905bd461 iris: advertise GL_ARB_shader_texture_image_samples 2019-02-21 10:26:09 -08:00
Jordan Justen
6441e906e8 iris: Set num_uniforms in bytes
Ref: brw_nir_lower_uniforms, type_size_scalar_bytes

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
2019-02-21 10:26:09 -08:00
Kenneth Graunke
c29fd34259 iris: move images next to textures in binding table 2019-02-21 10:26:09 -08:00
Kenneth Graunke
0d9c5b4e7e iris: null for non-existent cbufs
prevents BTs from being shifted down incorrectly
2019-02-21 10:26:09 -08:00
Kenneth Graunke
98e8f80e7d iris: actually set image access 2019-02-21 10:26:09 -08:00
Jason Ekstrand
d9aee25a46 iris: Don't lower image formats for write-only images 2019-02-21 10:26:09 -08:00
Kenneth Graunke
a06f0fe517 iris: set image access correctly 2019-02-21 10:26:09 -08:00
Kenneth Graunke
5d1dadfc38 iris: bother with BTIs 2019-02-21 10:26:09 -08:00
Kenneth Graunke
f5b887da6c iris: implement set_shader_images hook 2019-02-21 10:26:09 -08:00
Kenneth Graunke
26a54ae4b2 iris: lower storage image derefs 2019-02-21 10:26:09 -08:00
Kenneth Graunke
e97a24da89 iris: set the binding table size
we weren't doing mark_surface_used on images (i965 does it while
uploading the unnecessary image uniforms), so our binding tables were
too small...
2019-02-21 10:26:09 -08:00
Kenneth Graunke
28b41992c8 iris: X32_S8X24 :/
This can happen when faking Z32_S8X24 and setting StencilSampling = true

I guess we'll just turn it into S8_UINT...

Fixes KHR-GL45.texture_swizzle.functional
2019-02-21 10:26:09 -08:00
Kenneth Graunke
6e7957a22d iris: enable I/L formats 2019-02-21 10:26:09 -08:00
Kenneth Graunke
bfbebbaa36 iris: Use R/RG instead of I/L/A when sampling 2019-02-21 10:26:09 -08:00
Kenneth Graunke
94569a6458 iris: rework format translation apis 2019-02-21 10:26:09 -08:00
Kenneth Graunke
b9eeed3e8f iris: Allow PIPE_CONTROL with Stall at Scoreboard and RT flush
It's nonsensical, but not illegal, and mandatory on Icelake
2019-02-21 10:26:09 -08:00
Kenneth Graunke
65d1cda995 iris: add gen11 to genX_call 2019-02-21 10:26:09 -08:00
Kenneth Graunke
0fdcb20803 iris: inline stage_from_pipe to avoid unused warnings 2019-02-21 10:26:09 -08:00
Kenneth Graunke
6fbb6ba290 iris: pipe to scs -> iris_pipe.h 2019-02-21 10:26:09 -08:00
Kenneth Graunke
87351b8dfe iris: force persample interp cap 2019-02-21 10:26:09 -08:00
Kenneth Graunke
90b9efc1f9 iris: stencil texturing 2019-02-21 10:26:09 -08:00
Kenneth Graunke
9b229d266d iris: fix Z32_S8 depth sampling
We were accidentally using the ISL_FORMAT_R32_FLOAT_X8X24_TYPELESS
format, which is NOT what we use.  We just store R32_FLOAT depth.

fixes Piglit's texwrap GL_ARB_depth_buffer_float
2019-02-21 10:26:09 -08:00
Kenneth Graunke
822f91508e iris: don't mark contains_draw = false when chaining batches
chaining to a new batch reuses create_batch(), but we don't need to do
the work of pinning BOs we inherit from a previous batch...when that is
actually part of the same execbuf invocation.

instead, just flag it when setting primary_batch_size = 0, in
iris_batch_reset
2019-02-21 10:26:09 -08:00
Kenneth Graunke
294ce58a30 iris: vma_free bo->size, not bo_size
this is more obviously correct.  I think the two end up being the same
in practice, since this is in the alloc_from_cache case, and presumably
bo from the bucket has bo->size == bucket->size, and bo_size also is
bucket->size...

still.  better to do the obvious thing.

brw_bufmgr already does it this way.
2019-02-21 10:26:09 -08:00
Kenneth Graunke
2f24000662 iris: drop a bunch of pipe_sampler_state stuff we don't need 2019-02-21 10:26:09 -08:00
Kenneth Graunke
c6016d3761 iris: just mark snapshots_landed from the CPU
otherwise, get results may check q->map->snapshots_landed...before our
commands to initialize it to false have actually executed...so it'd get
some random garbage from the BO...
2019-02-21 10:26:09 -08:00
Kenneth Graunke
3c0ef22edb iris: Enable ARB_shader_vote
The easiest get out the vote campaign ever
2019-02-21 10:26:08 -08:00
Kenneth Graunke
0395eba20f iris: magic number 36 -> #define 2019-02-21 10:26:08 -08:00
Kenneth Graunke
57f8a623c5 iris: better query file comment 2019-02-21 10:26:08 -08:00
Kenneth Graunke
d3a5d87219 iris: early return properly 2019-02-21 10:26:08 -08:00
Kenneth Graunke
07ff8c752f iris: 36-bit overflow fixes 2019-02-21 10:26:08 -08:00
Kenneth Graunke
dff174c103 iris: Need to | 1 when asking for timestamps 2019-02-21 10:26:08 -08:00
Kenneth Graunke
1d91eba7dc iris: glGet timestamps, more correct timestamps 2019-02-21 10:26:08 -08:00
Kenneth Graunke
36fbcfb06c iris: ...and SO prims emitted queries
looks like we have queries

some fails still due to races between snapshots_written and start/end
not being garbage...not sure what that's about
2019-02-21 10:26:08 -08:00
Kenneth Graunke
ec82be57e8 iris: timestamps 2019-02-21 10:26:08 -08:00
Kenneth Graunke
23572cdd07 iris: drop explicit pinning
writes will already rw_bo or ro_bo that
2019-02-21 10:26:08 -08:00
Kenneth Graunke
d8875fe406 iris: primitives generated query support 2019-02-21 10:26:08 -08:00
Kenneth Graunke
ffae6e3105 iris: pipeline stats 2019-02-21 10:26:08 -08:00
Kenneth Graunke
7840d0e091 iris: play chicken with timer queries for now
they have been crashy in the past and I don't want to risk tanking my
laptop right before my XDC talk
2019-02-21 10:26:08 -08:00
Kenneth Graunke
0b095c665d iris: gpr0 to bool
I think OQ is basically working now.
2019-02-21 10:26:08 -08:00
Kenneth Graunke
f5a8908bd1 iris: fix random failures via CS stall...but why? 2019-02-21 10:26:08 -08:00
Kenneth Graunke
ad14795805 iris: flush batch when asking for result via QBO 2019-02-21 10:26:08 -08:00
Kenneth Graunke
cf261caad9 iris: results write 2019-02-21 10:26:08 -08:00
Kenneth Graunke
d4e4517569 iris: gen10+ workarounds and break fix 2019-02-21 10:26:08 -08:00
Kenneth Graunke
dca5632de1 iris: initial query code 2019-02-21 10:26:08 -08:00
Kenneth Graunke
dd478913d5 iris: LRM/SRM/SDI hooks 2019-02-21 10:26:08 -08:00
Kenneth Graunke
af9fe0d472 iris: rw_bo for pipe controls
this is used for WRITE IMMEDIATE...
but maybe we don't want to for the workaround BO?
2019-02-21 10:26:08 -08:00
Kenneth Graunke
30c370ed4b iris: use 0 for TCS passthrough program string ID
the passthrough shader doesn't need a real program string ID - that's
basically used for ARB programs indicating total program source code
changes, or other pre-baked uniform changes, etc...none of which a
passthrough shader has...so we don't need a unique identifier to
distinguish them.  We want to use a consistent value so we find
existing passthrough shaders in the cache.
2019-02-21 10:26:08 -08:00
Caio Marcelo de Oliveira Filho
54e23442e2 iris: Add support for TCS passthrough
If no TCS is provided, create a "passthrough" TCS that will take the
default values set in the API as constants and pass to the TES, along
with any other inputs it expects.  The code to create the NIR shader
is the same as in i965.

Tested with

    ./piglit run -t 'tess' quick_shader r

and fixed a dozen crashes from that list.
2019-02-21 10:26:08 -08:00
Kenneth Graunke
5395658c61 iris: inherit the index buffer properly 2019-02-21 10:26:08 -08:00
Kenneth Graunke
a858b69880 iris: delete bogus comment
Caio asked what was wrong.  There is nothing wrong.  :)
2019-02-21 10:26:08 -08:00
Kenneth Graunke
f2f506fa43 iris: properly re-pin stencil buffers 2019-02-21 10:26:08 -08:00
Kenneth Graunke
aaced066e8 iris: fix context restore of 3DSTATE_CONSTANT ranges
if clean we want to DO the pinning...not SKIP the pinning.

thanks to Jordan Justen for catching this!
2019-02-21 10:26:08 -08:00
Kenneth Graunke
58a6c99ebe iris: silence const warning
not sure why this is labeled const, I'm pretty sure we are taking the
reference and owning this, so there's no particular reason we can't
change it.  it certainly seems to be working for non-compute.  and,
freedreno's ir3_shader.c seems to do this as well.  still...gross :/
2019-02-21 10:26:08 -08:00
Kenneth Graunke
897f8d9232 iris: refactor program CSO stuff 2019-02-21 10:26:08 -08:00
Caio Marcelo de Oliveira Filho
fb4a3e2736 iris: Fix uses of gl_TessLevel*
The backend compiler expects the gl_TessLevel* variables to be mapped
as inputs instead of system values.  Use the new PIPE_CAP to get this
behavior from GLSL compiler.

Tested with:
tests/spec/arb_tessellation_shader/execution/vs-tcs-tes-tessinner-tessouter-inputs-quads.shader_test
2019-02-21 10:26:08 -08:00
Kenneth Graunke
2b956a093a iris: totally untested icelake support 2019-02-21 10:26:08 -08:00
Kenneth Graunke
921790b080 iris: initialize "don't suck" bits, as Ben likes to call them 2019-02-21 10:26:08 -08:00
Kenneth Graunke
73a4cef220 iris: refactor LRIs in context setup
we're going to have more of them, so reduce the boilerplate
2019-02-21 10:26:08 -08:00
Kenneth Graunke
2d1db44e8e iris: enable ARB_enhanced_layouts 2019-02-21 10:26:08 -08:00
Kenneth Graunke
c0422d623c iris: re-pin binding table contents if we didn't re-emit them
fixes glsl-vs-loop and other regressions from multibinder.
2019-02-21 10:26:08 -08:00
Kenneth Graunke
2963276a58 iris: move binder pinning outside the dirty == 0 check
This might be a new batch with back to back non-dirty calls, if so we
need to inherit the old binder...
2019-02-21 10:26:08 -08:00
Chris Wilson
1a61a211f0 iris: fix memzone_for_address since multibinder changes 2019-02-21 10:26:08 -08:00
Kenneth Graunke
f6924e2379 iris: update comments for multibinder 2019-02-21 10:26:08 -08:00
Kenneth Graunke
5cb0527c4f iris: fix SO offset writes for multiple streams 2019-02-21 10:26:08 -08:00
Kenneth Graunke
eff081cdd9 iris: Support multiple binder BOs, update Surface State Base Address 2019-02-21 10:26:08 -08:00
Kenneth Graunke
148e315d96 iris: fix null FB and unbound tex surface state addresses 2019-02-21 10:26:08 -08:00
Kenneth Graunke
f838400a59 iris: set EXEC_OBJECT_CAPTURE on all driver internal buffers 2019-02-21 10:26:08 -08:00
Kenneth Graunke
938afd484a iris: fix constant buffer 0 to be absolute
thanks to Jason for catching this.  Fixes some va64 tests.  Surprisingly
not much else, as apparently getting to UBO range 4 is uncommon!
2019-02-21 10:26:08 -08:00
Kenneth Graunke
5a2257bb2f iris: don't unconditionally emit 3DSTATE_VF / 3DSTATE_VF_TOPOLOGY
this was just laziness on my part
2019-02-21 10:26:08 -08:00
Kenneth Graunke
4c27cb031c iris: skip over whole function if dirty == 0
kinda pointless in non-pathological cases, but does boost our score in
the drawarrays case by 50%...
2019-02-21 10:26:08 -08:00
Kenneth Graunke
888efcd192 iris: Allow inlining of require/get_command_space
eliminates so many callqs for ptr++
2019-02-21 10:26:08 -08:00
Kenneth Graunke
2ebce6f8c8 iris: use Eric's new caps helper
this does change a couple caps...PRIMITIVE_RESTART_FOR_PATCHES...
2019-02-21 10:26:08 -08:00
Kenneth Graunke
3e7a41f228 iris: new caps 2019-02-21 10:26:08 -08:00
Kenneth Graunke
52eb8d5593 iris: fix blend state memcpy
thanks to Jason for noticing grumpy valgrind
2019-02-21 10:26:08 -08:00
Kenneth Graunke
9ce92fa036 iris: Skip primitive ID overrides if the shader wrote a custom value
Fixes glsl-1.50/execution/geometry/primitive-id-out
2019-02-21 10:26:08 -08:00
Kenneth Graunke
47d3019c4a iris: fix crash when binding optional shader for the first time 2019-02-21 10:26:08 -08:00
Kenneth Graunke
6331b754df iris: handle level/layer in direct maps
needed now that we do 1D linear
2019-02-21 10:26:08 -08:00
Kenneth Graunke
9f7654139b iris: use linear for 1D textures
This gets us the gen9 compact linear storage
2019-02-21 10:26:08 -08:00
Kenneth Graunke
b2a5e1ebb3 iris: big old hack for tex-miplevel-selection
copied from ilo.  I don't understand this at all..
2019-02-21 10:26:08 -08:00
Kenneth Graunke
e4d22b16c8 iris: fix sampler state setting 2019-02-21 10:26:08 -08:00
Kenneth Graunke
b3bb33c4c1 iris: try to hack around binder issue 2019-02-21 10:26:08 -08:00
Kenneth Graunke
d2516358f9 iris: fix line-aa-width
we should probably move the roundf to st_atom_raster
2019-02-21 10:26:08 -08:00
Kenneth Graunke
701b47a197 iris: implement get_sample_position
Fixes arb_sample_shading/builtin-gl-sample-position
2019-02-21 10:26:08 -08:00
Kenneth Graunke
7ed4b80233 iris: z_res -> s_res
fixes crashes introduced a few commits ago
2019-02-21 10:26:08 -08:00
Kenneth Graunke
d1cb4b330a iris: reenable R32G32B32 texture buffers
This dropped us from GL 4.2 to GL 3.3 by mistake.  Thanks to Dave for
catching this!
2019-02-21 10:26:08 -08:00
Chris Wilson
367f6bbd01 iris: Record reusability of bo on construction
We know that if the bufmgr->reuse is set to false or if the bo is too
large for a bucket, the same will be true when we come to free the bo.
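
A minimal sketch of the idea, assuming a simplified buffer manager: the names `struct bufmgr`, `struct bo`, `max_bucket_size` and `bo_init` are hypothetical stand-ins, not the driver's actual structures. The reusability decision is made once at construction and only read back at free time:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the buffer manager and BO. */
struct bufmgr {
   bool bo_reuse;             /* is the BO cache enabled at all? */
   uint64_t max_bucket_size;  /* largest size any cache bucket holds */
};

struct bo {
   uint64_t size;
   bool reusable;             /* decided once, at construction */
};

/* Record at creation whether this BO can ever go back into the cache. */
static inline void
bo_init(struct bo *bo, const struct bufmgr *mgr, uint64_t size)
{
   bo->size = size;
   bo->reusable = mgr->bo_reuse && size <= mgr->max_bucket_size;
}

/* At free time the stored flag is consulted; nothing is recomputed. */
static inline bool
bo_free_goes_to_cache(const struct bo *bo)
{
   return bo->reusable;
}
```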
2019-02-21 10:26:08 -08:00
Kenneth Graunke
abe7dbfa4a iris: Reduce binder alignment from 64 to 32
3DSTATE_BINDING_TABLE_POINTER_XS's alignment requirement is only 32B.

Makes us waste less precious binder space.
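
To illustrate the saving, here is a small sketch of the alignment arithmetic. The helper names `align_up` and `padding_for` are hypothetical and not taken from the driver. It computes how many padding bytes follow a binding table when the next one must start on a 64-byte or a 32-byte boundary:

```c
#include <assert.h>
#include <stdint.h>

/* Round value up to the next multiple of alignment (a power of two). */
static inline uint32_t
align_up(uint32_t value, uint32_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Padding bytes needed after a table so the next table starts aligned. */
static inline uint32_t
padding_for(uint32_t table_size, uint32_t alignment)
{
   return align_up(table_size, alignment) - table_size;
}
```

For a 24-byte table (six 4-byte entries, an arbitrary example), 64-byte alignment leaves 40 bytes of padding while 32-byte alignment leaves 8.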
2019-02-21 10:26:08 -08:00
Kenneth Graunke
04e8c5bb43 iris: precompute hashes for cache tracking
saves a touch of cpu overhead in the new resolve tracking
2019-02-21 10:26:08 -08:00
Chris Wilson
d209cc5170 iris: AMD_pinned_memory
(rebased by Ken, mainly set res->internal_format)
2019-02-21 10:26:08 -08:00
Kenneth Graunke
93c1921ce2 iris: proper cache tracking
this is copied from the i965 aux resolve stuff...minus the aux resolves
2019-02-21 10:26:08 -08:00
Kenneth Graunke
5e30b1083b iris: Move cache tracking to iris_resolve.c 2019-02-21 10:26:08 -08:00
Kenneth Graunke
42dccb1233 iris: use consistent copyright formatting
some of them had typos, didn't say 'authors or copyright holders',
or other mistakes.  This is now https://opensource.org/licenses/MIT
text, formatted consistently.
2019-02-21 10:26:08 -08:00
Kenneth Graunke
1d33982e9b iris: track depth/stencil writes enabled 2019-02-21 10:26:08 -08:00
Kenneth Graunke
3fecb1c44d iris: Move iris_sampler_view declaration to iris_resource.h
We'll need this for resolve tracking.  There's also no genxml stuff here
2019-02-21 10:26:08 -08:00
Kenneth Graunke
b75b52530a iris: Move things to iris_shader_state
We didn't originally have this struct, so we had lots of ad-hoc arrays.
Now that we have it, it makes sense to group things there.
2019-02-21 10:26:08 -08:00
Kenneth Graunke
410a555bfb iris: move iris_shader_state from ice->shaders.state to ice->state.shaders
it's more state related...
2019-02-21 10:26:08 -08:00
Kenneth Graunke
33701d5341 iris: Drop bogus sampler state saving
We do this in an earlier loop.  This was just reading things out of the
array, and saving them back over the same array...but in the wrong slots
2019-02-21 10:26:08 -08:00
Kenneth Graunke
aba2cee711 iris: rename pipe to base 2019-02-21 10:26:08 -08:00
Kenneth Graunke
7705f62cb6 iris: don't emit SBE all the time 2019-02-21 10:26:08 -08:00
Kenneth Graunke
630d602900 iris: port non-bucket alignment bugfix
Sergii's 24839663a4
2019-02-21 10:26:08 -08:00
Kenneth Graunke
ad6ba5a712 iris: drop pwrite
nobody uses it
2019-02-21 10:26:08 -08:00
Kenneth Graunke
aad70ad8a1 iris: drop dead assignments
Eric's commit 9a6a631762
2019-02-21 10:26:08 -08:00
Kenneth Graunke
2bd7d6fa71 iris: last VUE map NOS, handle > 16 FS inputs
not sure if the UNCOMPILED_FS flagging is still needed, should
reevaluate those hacks at some point
2019-02-21 10:26:08 -08:00
Kenneth Graunke
ee8cb7e0ee iris: implement ARB_clear_texture 2019-02-21 10:26:08 -08:00
Kenneth Graunke
84b30a2900 iris: call maybe_flush for each blorp operation
otherwise with high layer counts we may exceed two batches worth of
commands... (!)
2019-02-21 10:26:08 -08:00
Kenneth Graunke
0e059e4829 iris: assert depth is 1 in resource_copy_region
given the dstz parameter I don't think it does multiple slices..
2019-02-21 10:26:08 -08:00
Kenneth Graunke
03933a2d1b iris: blorp blit multiple slices
fixes getteximage-depth
2019-02-21 10:26:08 -08:00
Kenneth Graunke
84832ab7d4 iris: Fix tiled memcpy for cubes...and for array slices
tiled_memcpy_map was not offsetting map->ptr based on the slice,
while unmap was.  also, we were doing offsetting wrong for cubes.
2019-02-21 10:26:08 -08:00
Kenneth Graunke
bce7398646 iris: disallow RGB32 formats too 2019-02-21 10:26:08 -08:00
Kenneth Graunke
ea19d359cc iris: Convert RGBX to RGBA for rendering.
Fixes a bunch of RGB bugs.
2019-02-21 10:26:08 -08:00
Kenneth Graunke
906becec70 iris: we can do multisample Z resolves 2019-02-21 10:26:08 -08:00
Kenneth Graunke
1f156f004b iris: deal with Marek's new MSAA caps
storage sample count is equal to sample count for us, for now,
so 0 the pipe cap and ignore the new parameter
2019-02-21 10:26:08 -08:00
Kenneth Graunke
532cf23d25 iris: say no to more formats
copied from brw_surface_formats.c
2019-02-21 10:26:08 -08:00
Kenneth Graunke
d5146ba670 iris: actually do stencil blits 2019-02-21 10:26:08 -08:00
Kenneth Graunke
ad76389f88 iris: refcounting, who needs it?
that's right, we do!
2019-02-21 10:26:08 -08:00
Kenneth Graunke
be60e3247c iris: drop stencil handling now that u_transfer_helper does it 2019-02-21 10:26:08 -08:00
Kenneth Graunke
b932938d01 iris: use u_transfer_helper for depth stencil packing/unpacking 2019-02-21 10:26:08 -08:00
Kenneth Graunke
853230b5e6 iris: WTF transfers
stencil unfortunately is stored in the Weird Tile Format (WTF or Tile-W)
which needs special CPU detiling code.
2019-02-21 10:26:08 -08:00
Kenneth Graunke
d93a20e258 iris: allow S8 as a stencil format 2019-02-21 10:26:08 -08:00
Kenneth Graunke
7972599eab iris: actually emit stencil packets 2019-02-21 10:26:08 -08:00
Kenneth Graunke
753646dd6b iris: clear stencil 2019-02-21 10:26:08 -08:00
Kenneth Graunke
9ec2d3640e iris: depth or stencil fixes 2019-02-21 10:26:08 -08:00
Kenneth Graunke
763f9095ea iris: fill out more caps 2019-02-21 10:26:08 -08:00
Kenneth Graunke
2d578e71d5 iris: get angry about execbuf failures
want this to be easy to detect for now
2019-02-21 10:26:08 -08:00
Kenneth Graunke
a378ee3607 iris: simplify batch len qword alignment
Split from a patch by Chris Wilson so I can test it independently
2019-02-21 10:26:08 -08:00
Kenneth Graunke
621cb43f41 iris: rename ring to engine
makes more sense these days.  split from a patch by Chris Wilson
2019-02-21 10:26:08 -08:00
Kenneth Graunke
1a9651f29a iris: remember to set bo->userptr 2019-02-21 10:26:08 -08:00
Chris Wilson
796ad6fe97 iris: Wrap userptr for creating bo 2019-02-21 10:26:08 -08:00
Kenneth Graunke
5911fb8801 iris: sync bugfixes from brw_bufmgr
I wrote softpin support here first, then debugged and landed it in brw;
some of those fixes need to get brought back.
2019-02-21 10:26:08 -08:00
Kenneth Graunke
dfe1ee4f6f iris: comment everything
1. Write the code
2. Add comments
3. PROFIT (or just avoid cost of explaining or relearning things...)
2019-02-21 10:26:08 -08:00
Kenneth Graunke
387a414f2c iris: add minor comments 2019-02-21 10:26:08 -08:00
Dave Airlie
9d39e69219 iris: fix some hangs around null framebuffers
This fixes some cases in fbo-none* and framebuffer_no_attachments.

I'm not sure this is correct otherwise, the tests don't all pass yet

No idea if this is in any way the correct answer
2019-02-21 10:26:08 -08:00
Chris Wilson
02b82fe80a iris: Set resource modifier on handle
Required for gbm_bo_create_with_modifiers
2019-02-21 10:26:08 -08:00
Kenneth Graunke
682aeff8d0 iris: we don't support textureGatherOffsets, need it lowered 2019-02-21 10:26:08 -08:00
Kenneth Graunke
03dc99475d iris: cube arrays are cubes too 2019-02-21 10:26:08 -08:00
Kenneth Graunke
80c7096672 iris: fix sample mask
0xffffffff does not mean 1, it means enable as many as there actually
are.  we don't get set_sample_mask() calls until some masking is
actually applied...i.e. it doesn't get updated based on # of samples
in the FBO changing.
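
One way to read this, as a hedged sketch rather than the actual fix: keep the API mask as given and intersect it with the framebuffer's current sample count at emit time. The function name `effective_sample_mask` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Restrict the API sample mask to the samples the framebuffer actually
 * has.  A sample count of 0 or 1 is treated as single-sampled.
 */
static inline uint32_t
effective_sample_mask(uint32_t api_mask, unsigned num_samples)
{
   if (num_samples < 1)
      num_samples = 1;
   assert(num_samples <= 32);

   /* Avoid shifting by 32, which is undefined for a 32-bit value. */
   uint32_t valid = num_samples == 32 ? ~0u : (1u << num_samples) - 1;
   return api_mask & valid;
}
```

Because the result is computed from the current sample count, it stays correct when the FBO's sample count changes without a new set_sample_mask() call.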
2019-02-21 10:26:08 -08:00
Kenneth Graunke
e990558152 iris: drop pipe_shader_state
looking at the freedreno code, this is totally unnecessary!  we can just
store the NIR and be happy, and not have any vestiges of TGSI.

plus we can reuse this structure for compute shaders, without needing a
pipe_compute_state base.
2019-02-21 10:26:08 -08:00
Kenneth Graunke
834b97c34b iris: fix GS output component limit
this is total, so should be 1024, not 128
2019-02-21 10:26:08 -08:00
Kenneth Graunke
c9f9a6f61b iris: Avoid croaking when trying to create FBO surfaces with bad formats
create_surface happens before st_validate_attachment, which actually
does the "hey, this is a render target now, is that OK?" check

Fixes asserts in ./bin/arb_texture_view-rendering-formats, allowing the
rest of the tests to run.
2019-02-21 10:26:08 -08:00
Kenneth Graunke
8da91ebb68 iris: enable texture gather 2019-02-21 10:26:08 -08:00
Kenneth Graunke
f3dd70182d iris: BIG OL' HACK for UBO updates
We need to re-push data when UBO changes.  This will need to be replaced
with a usage history based flushing system later.
2019-02-21 10:26:08 -08:00
Kenneth Graunke
a7311ef068 iris: update a todo comment 2019-02-21 10:26:07 -08:00
Kenneth Graunke
8e7b0deee2 iris: Don't reserve new binding table section unless things are dirty 2019-02-21 10:26:07 -08:00
Kenneth Graunke
870f2e8434 iris: implement texture/memory barriers 2019-02-21 10:26:07 -08:00
Kenneth Graunke
82ee971497 iris: drop unused bo parameter 2019-02-21 10:26:07 -08:00
Kenneth Graunke
f0159d5ca3 iris: update bindings when changing programs
the binding table layout depends on program info.

not known to fix anything yet.
2019-02-21 10:26:07 -08:00
Kenneth Graunke
b0e9c5797b iris: fix for disabling ssbos 2019-02-21 10:26:07 -08:00
Kenneth Graunke
b7b061c4e2 iris: fix SSBO indexing
st/nir offsets SSBO indexes by MaxABOs.  This is not what we want,
as it bloats the binding tables.  We'll need to adjust it to use
info->num_abos as the offset and buffer base instead.  For now,
just use the inefficient format to get us rolling.  We can add a
PIPE_CAP later.
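
A rough sketch of the two indexing schemes, assuming atomic buffers occupy the slots before SSBOs. The function names are hypothetical, and the numbers in the note below are only examples:

```c
#include <assert.h>

/* Current scheme: SSBO slots start after the maximum possible ABO count. */
static inline unsigned
ssbo_slot_padded(unsigned max_abos, unsigned ssbo_index)
{
   return max_abos + ssbo_index;
}

/* Proposed scheme: SSBO slots start right after the ABOs actually used. */
static inline unsigned
ssbo_slot_packed(unsigned num_abos, unsigned ssbo_index)
{
   assert(num_abos > 0 || ssbo_index < ~0u);
   return num_abos + ssbo_index;
}
```

With a maximum of 16 ABOs and a shader that uses only one, the first SSBO lands in slot 16 under the padded scheme and in slot 1 under the packed one, so the packed table is 15 entries shorter.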
2019-02-21 10:26:07 -08:00
Kenneth Graunke
376c7253f8 iris: enable SSBOs 2019-02-21 10:26:07 -08:00
Kenneth Graunke
75709d982b iris: fix TBO alignment to match 965 2019-02-21 10:26:07 -08:00
Kenneth Graunke
77b9219818 iris: unbind compiled shaders if none are present
avoids the case where you have a stale compiled shader bound, but no
uncompiled shader bound, which is not just boats, but an entire marina
2019-02-21 10:26:07 -08:00
Kenneth Graunke
fd5ed7b46b iris: shorten loop
num_ubos doesn't include Tim's magic UBO for regular uniforms, so +1
2019-02-21 10:26:07 -08:00
Kenneth Graunke
bf795b0244 iris: emit binding table for atomic counters and SSBOs 2019-02-21 10:26:07 -08:00
Kenneth Graunke
2d5f545464 iris: implement set_shader_buffers
for SSBOs/ABOs.  We just stream out SURFACE_STATE for now...since it's
a set_* API...and the buffer offset may change...not sure where else
we'd do it.
2019-02-21 10:26:07 -08:00
Kenneth Graunke
541cb60e7e iris: export get_shader_info 2019-02-21 10:26:07 -08:00
Kenneth Graunke
f0558ca22c iris: fix msaa flipping filters 2019-02-21 10:26:07 -08:00
Kenneth Graunke
2c73d7e3f1 iris: expose more things that we already support 2019-02-21 10:26:07 -08:00
Kenneth Graunke
5b8dd5f303 iris: fix blorp filters
we have to switch to blorp enums after the rebase, but also we were
probably doing it wrong for MSAA before this.
2019-02-21 10:26:07 -08:00
Kenneth Graunke
3aa1fcc65a iris: hack around samples confusion 2019-02-21 10:26:07 -08:00
Kenneth Graunke
2c15f38a29 iris: point sprite enables 2019-02-21 10:26:07 -08:00
Kenneth Graunke
c60a4de1f5 iris: reemit blend state for alpha test function changes
fixes bin/fbo-alphatest-formats GL_EXT_texture_snorm
2019-02-21 10:26:07 -08:00
Kenneth Graunke
a4036635b1 iris: fix Z24
This was backwards.

thanks to Jason Ekstrand for realizing that I was seeing the wrong bits.
2019-02-21 10:26:07 -08:00
Kenneth Graunke
a12a370d7b iris: fix EmitNoIndirect
we were using pipe stages, which are ordered dumbly for historical
reasons.  we want gl_shader_stage here.  this got us the wrong options
2019-02-21 10:26:07 -08:00
Kenneth Graunke
5bd861de8b iris: assert about passthrough shaders to make this easier to detect
otherwise it just silently fails and looks like some obscure problem
2019-02-21 10:26:07 -08:00
Kenneth Graunke
5e19885d5a iris: fill out MAX_PATCH_VERTICES 2019-02-21 10:26:07 -08:00
Kenneth Graunke
3e9e3121e5 iris: fix SGVS when there are no valid vertex elements
tessellation nop.shader_test now passes
2019-02-21 10:26:07 -08:00
Kenneth Graunke
5520a54bc5 iris: vertex ID, instance ID 2019-02-21 10:26:07 -08:00
Kenneth Graunke
a9083bdb71 iris: don't emit SO_BUFFERS and SO_DECL_LIST unless streamout is enabled
Otherwise on the first draw, if XFB isn't enabled, we get a pile of
MI_NOOPS where SO_BUFFERS should be
2019-02-21 10:26:07 -08:00
Kenneth Graunke
ebb960c6d3 iris: compile a TCS...don't bother with passthrough yet 2019-02-21 10:26:07 -08:00
Kenneth Graunke
9aa8be3d8e iris: TES program key inputs 2019-02-21 10:26:07 -08:00
Kenneth Graunke
fcee21da6b iris: fix texture buffer stride 2019-02-21 10:26:07 -08:00
Kenneth Graunke
3c41d4cf3f iris: fix sampler views of TBOs
we can't read levels/layers, they're invalid for PIPE_BUFFER
2019-02-21 10:26:07 -08:00
Kenneth Graunke
6e7e49cc4f iris: fix crash 2019-02-21 10:26:07 -08:00
Kenneth Graunke
841fc3e3ca iris: record FS NOS 2019-02-21 10:26:07 -08:00
Kenneth Graunke
d223b316ad iris: NOS mechanics 2019-02-21 10:26:07 -08:00
Kenneth Graunke
a6d480f892 iris: bind state helper function 2019-02-21 10:26:07 -08:00
Kenneth Graunke
48b826cdaf iris: s/hwcso/state/g 2019-02-21 10:26:07 -08:00
Kenneth Graunke
aeb6fc8782 iris: bits of multisample program key 2019-02-21 10:26:07 -08:00
Kenneth Graunke
e6b1cc2106 iris: save query type 2019-02-21 10:26:07 -08:00
Kenneth Graunke
44ba48eba7 iris: draw indirect support? 2019-02-21 10:26:07 -08:00
Kenneth Graunke
b030671298 iris: fix CC_VIEWPORT
I was confusing depth bounds test with depth clamping
2019-02-21 10:26:07 -08:00
Kenneth Graunke
fdbc205552 iris: multislice transfer maps 2019-02-21 10:26:07 -08:00
Kenneth Graunke
44248d16d2 iris: disable 6x MSAA support 2019-02-21 10:26:07 -08:00
Kenneth Graunke
bc1b4db3b3 iris: fix sample mask for MSAA-off 2019-02-21 10:26:07 -08:00
Kenneth Graunke
7b8c0f058e iris: actually pin the buffers 2019-02-21 10:26:07 -08:00
Kenneth Graunke
5635abadef iris: fix SO_DECL_LIST 2019-02-21 10:26:07 -08:00
Kenneth Graunke
dc3b927e97 iris: bother setting program_string_id...
not sure how useful this really is...

./bin/ext_transform_feedback-tessellation triangles flat_first
is hitting a case where we rebind the same VS program, but with
different streamout info...which isn't in the key...but is in the
cache...so we don't rebuild it...
2019-02-21 10:26:07 -08:00
Kenneth Graunke
9c1cefff52 iris: set even if no outputs 2019-02-21 10:26:07 -08:00
Kenneth Graunke
cef0b8b13b iris: streamout 2019-02-21 10:26:07 -08:00
Kenneth Graunke
059c096eff iris: SO buffers 2019-02-21 10:26:07 -08:00
Kenneth Graunke
5c00f5fdca iris: Implement 3DSTATE_SO_DECL_LIST 2019-02-21 10:26:07 -08:00
Kenneth Graunke
6794f1ffb9 iris: rearrange iris_resource.h 2019-02-21 10:26:07 -08:00
Kenneth Graunke
a3f77eceb4 iris: slab allocate transfers
apparently we need this for u_threaded_context
2019-02-21 10:26:07 -08:00
Kenneth Graunke
5165308169 iris: don't crash on shader perf logs 2019-02-21 10:26:07 -08:00
Kenneth Graunke
f20fc950a7 iris: fix depth bounds clamp enables
fixes depthrange-clear among others
2019-02-21 10:26:07 -08:00
Kenneth Graunke
eb274a31bc iris: fix clip flagging on fb changes 2019-02-21 10:26:07 -08:00
Kenneth Graunke
0232fbc2c4 iris: comment out l/a/i/la
in hopes of r/rg fallbacks
2019-02-21 10:26:07 -08:00
Kenneth Graunke
cf34dd7a61 iris: actually handle array layers in blits 2019-02-21 10:26:07 -08:00
Kenneth Graunke
33a17d566f iris: keep DISCARD_RANGE
this isn't really an iris_bo_map flag, but the various resource mappers
want to check for it to avoid making temp copies.
2019-02-21 10:26:07 -08:00
Kenneth Graunke
c0ab9c9890 iris: actually set cube bit properly 2019-02-21 10:26:07 -08:00
Kenneth Graunke
d849501f4c iris: rename map->stride 2019-02-21 10:26:07 -08:00
Kenneth Graunke
36301bbe40 iris: fix zoffset asserts with 2DArray/Cube 2019-02-21 10:26:07 -08:00
Kenneth Graunke
7f39f4843f iris: SBE change stash
not used yet, but want to flag it so I don't forget
2019-02-21 10:26:07 -08:00
Kenneth Graunke
8a080223e6 iris: just malloc one iris_genx_state instead of a bunch of oddball pieces
Things that are gen-specific can go in iris_genx_state.  Things that are
gen-agnostic can go directly in ice->state.
2019-02-21 10:26:07 -08:00
Kenneth Graunke
a7e0edffb6 iris: dead pointer 2019-02-21 10:26:07 -08:00
Kenneth Graunke
ccec5bab5b iris: implement border color, fix other sampler nonsense 2019-02-21 10:26:07 -08:00
Kenneth Graunke
8a16249285 iris: border color memory zone :(
They took away our pointer bits, so now we need a pile of special code
to handle this instead of just using u_upload_mgr. :(
2019-02-21 10:26:07 -08:00
Kenneth Graunke
1c19e3b21f iris: don't include binder in surface VMA range 2019-02-21 10:26:07 -08:00
Kenneth Graunke
1cea195a95 iris: state ref tuple 2019-02-21 10:26:07 -08:00
Kenneth Graunke
c0e80a8d0a iris: null surface for unbound textures
avoids crashes...may not be really right
2019-02-21 10:26:07 -08:00
Kenneth Graunke
d358a4a040 iris: depth clears 2019-02-21 10:26:07 -08:00
Kenneth Graunke
470fb01a7a iris: fix GS dispatch mode 2019-02-21 10:26:07 -08:00
Kenneth Graunke
01483c7933 iris: fix 3DSTATE_VERTEX_ELEMENTS / VF_INSTANCING for 0 elements 2019-02-21 10:26:07 -08:00
Kenneth Graunke
4c9067ae1d iris: don't emit garbage 3DSTATE_VERTEX_BUFFERS when there aren't any 2019-02-21 10:26:07 -08:00
Kenneth Graunke
adf0c20461 iris: geometry shader support 2019-02-21 10:26:07 -08:00
Kenneth Graunke
de08ac9b0f iris: TES uniform fixes
not that we have a TES, but...
2019-02-21 10:26:07 -08:00
Kenneth Graunke
d207f97840 iris: larger polygon offset 2019-02-21 10:26:07 -08:00
Kenneth Graunke
5188e54e97 iris: fix provoking vertex ordering
had this backwards
2019-02-21 10:26:07 -08:00
Kenneth Graunke
cbbd6a61c4 iris: maybe-flush before blorp operations
otherwise if we have a lot of back-to-back blorp operations we can
potentially overflow even the chained batch
2019-02-21 10:26:07 -08:00
Kenneth Graunke
e0f3971280 iris: lightmodel flat 2019-02-21 10:26:07 -08:00
Kenneth Graunke
4d04111bfb iris: implement copy image 2019-02-21 10:26:07 -08:00
Kenneth Graunke
40fd2fd603 iris: fall back to u_generate_mipmap
It just does blits between layers, which is all we'd do anyway,
and it already should use BLORP because of iris_blit().  Plus it
handles 3D, which our code in i965 doesn't.
2019-02-21 10:26:07 -08:00
Kenneth Graunke
6cf04c6ded iris: clear fix 2019-02-21 10:26:07 -08:00
Kenneth Graunke
d416b81779 iris: shader dirty bits 2019-02-21 10:26:07 -08:00
Kenneth Graunke
b7cd3a083a iris: rework DEBUG_REEMIT
don't want to have to special case this everywhere
2019-02-21 10:26:07 -08:00
Kenneth Graunke
72416a2d0d iris: clears 2019-02-21 10:26:07 -08:00
Kenneth Graunke
eef0d33cee iris: better boxing on maps 2019-02-21 10:26:07 -08:00
Kenneth Graunke
419fac2fc6 iris: fix fragcoord ytransform
the TGSI in the name is a misnomer, it actually controls wpos_ytransform
lowering in NIR these days.
2019-02-21 10:26:07 -08:00
Kenneth Graunke
e67951227d iris: Disable unsupported mirror clamp modes 2019-02-21 10:26:07 -08:00
Kenneth Graunke
234cf647a4 iris: tidy comments about mirroring modes 2019-02-21 10:26:07 -08:00
Kenneth Graunke
a3a998f19a iris: fix QWord aligned endings after batch chaining rework
I need to save the primary batch size after expanding it to include
MI_BATCH_BUFFER_END and the QWord padding NOP
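
A minimal sketch of the ordering described above, under stated assumptions: the `struct batch` layout and helper names are hypothetical, and the opcode values (MI_NOOP as 0, MI_BATCH_BUFFER_END as 0xA << 23) are assumed rather than taken from this log. The size is recorded only after the end command and any padding have been emitted:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed command encodings; treat these values as illustrative. */
#define MI_NOOP             0u
#define MI_BATCH_BUFFER_END (0xAu << 23)

#define BATCH_DWORDS 64

struct batch {
   uint32_t cmds[BATCH_DWORDS];
   unsigned used_dwords;
   unsigned primary_batch_size;   /* in bytes */
};

static inline void
batch_emit(struct batch *b, uint32_t dw)
{
   assert(b->used_dwords < BATCH_DWORDS);
   b->cmds[b->used_dwords++] = dw;
}

static inline void
batch_finish(struct batch *b)
{
   batch_emit(b, MI_BATCH_BUFFER_END);

   /* A QWord is two DWords: pad with a NOP if the length is odd. */
   if (b->used_dwords % 2 != 0)
      batch_emit(b, MI_NOOP);

   /* Save the size last, so it includes the END and the padding. */
   b->primary_batch_size = b->used_dwords * 4;
}
```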
2019-02-21 10:26:07 -08:00
Kenneth Graunke
aacbcbbf47 iris: colorize batchbuffer failures to make them stand out 2019-02-21 10:26:07 -08:00
Kenneth Graunke
8e2b71b190 iris: bad inherited comments 2019-02-21 10:26:07 -08:00
Kenneth Graunke
8c54433275 iris: Handle batch submission failure "better"
We used to not reset the batch, and just keep appending to it, so you'd
get the same invalid contents over and over.

I'd also really like to know about this, so aborting seems wise for now,
if not for the long term
2019-02-21 10:26:07 -08:00
Kenneth Graunke
d0b55ca782 iris: don't always flush 2019-02-21 10:26:07 -08:00
Kenneth Graunke
9226ebfa85 iris: print second batch size separately 2019-02-21 10:26:07 -08:00
Kenneth Graunke
f12b079c0e iris: actually init num_viewports
fixes regressions
2019-02-21 10:26:07 -08:00
Kenneth Graunke
81f899c148 iris: scissor count fixes 2019-02-21 10:26:07 -08:00
Kenneth Graunke
92d6a70853 iris: fix VP iteration 2019-02-21 10:26:07 -08:00
Kenneth Graunke
4a94628513 iris: fix num viewports to be based on programs 2019-02-21 10:26:07 -08:00
Kenneth Graunke
b17215800c iris: fix viewport counts and settings
seeing

   set_viewport_state 0 1
   set_viewport_state 1 15

which gives us a total of 16 viewports, updated incrementally
so keep old values around and update them...
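
The incremental update can be sketched as follows. This is an illustration only: `struct viewport`, `struct vp_state` and `set_viewport_states` are simplified, hypothetical stand-ins. Each call overwrites just the slots it names and leaves the others as they were:

```c
#include <assert.h>
#include <string.h>

#define MAX_VIEWPORTS 16

struct viewport {
   float scale[3];
   float translate[3];
};

struct vp_state {
   struct viewport viewports[MAX_VIEWPORTS];
};

/* Update only slots [start_slot, start_slot + count); keep the rest. */
static inline void
set_viewport_states(struct vp_state *st, unsigned start_slot,
                    unsigned count, const struct viewport *vps)
{
   assert(start_slot + count <= MAX_VIEWPORTS);
   memcpy(&st->viewports[start_slot], vps, count * sizeof(*vps));
}
```

Calling it with (0, 1) and then (1, 15), as in the trace above, fills all 16 slots without discarding slot 0.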
2019-02-21 10:26:07 -08:00
Kenneth Graunke
636cf8971e iris: max VP index 2019-02-21 10:26:07 -08:00
Kenneth Graunke
7cdc6b1173 iris: emit 3DSTATE_SBE_SWIZ 2019-02-21 10:26:07 -08:00
Kenneth Graunke
26db2ea782 iris: avoid crashing on unbound constant resources
instead, read from the workaround BO
2019-02-21 10:26:07 -08:00
Kenneth Graunke
a7770501a7 iris: fix caps so tests run again 2019-02-21 10:26:07 -08:00
Kenneth Graunke
a6aeca9727 iris: fix major refcounting bug with resources
DONTBLOCK -> NULL was happening after taking a reference, causing those
to live forever

This resolves the OOM problems
2019-02-21 10:26:07 -08:00
Kenneth Graunke
49f9c88801 iris: support signed vertex buffer offsets 2019-02-21 10:26:07 -08:00
Kenneth Graunke
0a43c9defa iris: print refcounts in INTEL_DEBUG=submit 2019-02-21 10:26:07 -08:00
Kenneth Graunke
7d1e6f1fa1 iris: redo VB CSO a bit 2019-02-21 10:26:07 -08:00
Kenneth Graunke
432790bacd iris: print binder utilization in INTEL_DEBUG=submit 2019-02-21 10:26:07 -08:00
Kenneth Graunke
f8179dc760 iris: clean up some warnings so I can see through the noise 2019-02-21 10:26:07 -08:00
Kenneth Graunke
5f3a7ee701 iris: use pipe resources not direct BOs 2019-02-21 10:26:07 -08:00
Kenneth Graunke
5619c15ecc iris: indentation 2019-02-21 10:26:07 -08:00
Kenneth Graunke
27d45eb2f2 iris: don't leak keyboxes when searching for an existing program 2019-02-21 10:26:07 -08:00
Kenneth Graunke
7d504f3d52 iris: don't leak sampler state table resources 2019-02-21 10:26:07 -08:00
Kenneth Graunke
8e186cef2c iris: rzalloc iris_compiled_shader so memcmp works even if padding creeps in 2019-02-21 10:26:07 -08:00
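
A small sketch of why zeroed allocation helps, using plain `calloc` as a stand-in for rzalloc; `struct record` and the helpers are hypothetical. If the compiler inserts padding between fields, a zeroing allocator gives those bytes a known value, so in practice a whole-struct `memcmp` compares equal for equal field values:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* On common ABIs there are 3 padding bytes between the two fields. */
struct record {
   uint8_t stage;
   uint32_t program_id;
};

/* Zero the whole allocation, padding included, before filling fields. */
static inline struct record *
record_create(uint8_t stage, uint32_t program_id)
{
   struct record *r = calloc(1, sizeof(*r));
   if (!r)
      return NULL;
   r->stage = stage;
   r->program_id = program_id;
   return r;
}

/* Byte-wise comparison: only reliable when padding bytes are zeroed. */
static inline bool
records_equal(const struct record *a, const struct record *b)
{
   return memcmp(a, b, sizeof(*a)) == 0;
}
```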
Kenneth Graunke
5f722bf7c4 iris: remove 4 bytes of padding in iris_compiled_shader 2019-02-21 10:26:07 -08:00
Kenneth Graunke
0db86016f7 iris: pc fixes 2019-02-21 10:26:07 -08:00
Kenneth Graunke
f9f8ea7070 iris: more leak fixes 2019-02-21 10:26:07 -08:00
Kenneth Graunke
c763ecaa65 iris: plug leaks 2019-02-21 10:26:07 -08:00
Kenneth Graunke
477ea6c39a iris: clear dirty 2019-02-21 10:26:07 -08:00
Kenneth Graunke
23987df412 iris: some dirty fixes
two scissor bits, constants not being flagged, ZeroRTA, clip not being
flagged
2019-02-21 10:26:07 -08:00
Kenneth Graunke
ccf37c7da9 iris: bindings dirty tracking 2019-02-21 10:26:07 -08:00
Kenneth Graunke
bbc6d15b59 iris: flag DIRTY_WM properly 2019-02-21 10:26:06 -08:00
Kenneth Graunke
3f863cf680 iris: fix the validation list on new batches 2019-02-21 10:26:06 -08:00
Kenneth Graunke
80dee31846 iris: save pointers to streamed state resources
will be used for cross-batch validation list fixing
2019-02-21 10:26:06 -08:00
Kenneth Graunke
daceb04bc0 iris: put back the always flush - fixes some things :( 2019-02-21 10:26:06 -08:00
Kenneth Graunke
149408a360 iris: untested SAMPLER_STATE pin BO fix 2019-02-21 10:26:06 -08:00
Kenneth Graunke
de782e5b39 iris: delete some pointless STATIC_ASSERTS
these were useful when I was patching relocs
2019-02-21 10:26:06 -08:00
Kenneth Graunke
3eebea88dc iris: untested index buffer upload 2019-02-21 10:26:06 -08:00
Kenneth Graunke
9247546181 iris: state cleaning 2019-02-21 10:26:06 -08:00
Kenneth Graunke
7c40cdc12f iris: comment about reemitting and flushing 2019-02-21 10:26:06 -08:00
Kenneth Graunke
d46c5b7c6c iris: allow mapped buffers during execution (faster) 2019-02-21 10:26:06 -08:00
Kenneth Graunke
92de0f5aa6 iris: disable __gen_validate_value in release mode 2019-02-21 10:26:06 -08:00
Kenneth Graunke
08d1f13818 iris: drop assert for now 2019-02-21 10:26:06 -08:00
Kenneth Graunke
a9e357caac iris: fix release builds 2019-02-21 10:26:06 -08:00
Kenneth Graunke
73f3c2cad0 iris: better VFI 2019-02-21 10:26:06 -08:00
Chris Wilson
2cbd42cddd iris: IndexFormat = size/2
brw uses:
  IndexFormat = index_size >> 1

anv uses:
  IndexFormat = index_type[index_size]
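
The two approaches can be compared in a short sketch. The enum names and the encodings 0, 1 and 2 for byte, word and dword indices are assumptions made for illustration, as are the function names:

```c
#include <assert.h>

/* Assumed hardware encodings for the index format field. */
enum index_format {
   INDEX_BYTE  = 0,
   INDEX_WORD  = 1,
   INDEX_DWORD = 2,
};

/* Shift-based mapping: 1 >> 1 = 0, 2 >> 1 = 1, 4 >> 1 = 2. */
static inline unsigned
index_format_shift(unsigned index_size)
{
   assert(index_size == 1 || index_size == 2 || index_size == 4);
   return index_size >> 1;
}

/* Table-based mapping, indexed by the index size in bytes. */
static inline unsigned
index_format_table(unsigned index_size)
{
   static const unsigned index_type[] = {
      [1] = INDEX_BYTE,
      [2] = INDEX_WORD,
      [4] = INDEX_DWORD,
   };
   assert(index_size == 1 || index_size == 2 || index_size == 4);
   return index_type[index_size];
}
```

Under these assumed encodings both mappings agree for the three valid index sizes, so the shift is simply the shorter way to write the table.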
2019-02-21 10:26:06 -08:00
Kenneth Graunke
5dcf62bb43 iris: use u_transfer helpers for now 2019-02-21 10:26:06 -08:00
Kenneth Graunke
48dc8bd4b0 iris: fix pull bufs that aren't the first user upload 2019-02-21 10:26:06 -08:00
Kenneth Graunke
eed7f7253e iris: fill out pull constant buffers 2019-02-21 10:26:06 -08:00
Kenneth Graunke
90046b43cc iris: make surface states for cbufs 2019-02-21 10:26:06 -08:00
Kenneth Graunke
4e007dbb30 iris: have more than one const_offset 2019-02-21 10:26:06 -08:00
Kenneth Graunke
9ea05ccf1f iris: completely rewrite binder
now we get a new one per batch, and flush if it fills up
2019-02-21 10:26:06 -08:00
Kenneth Graunke
26cc609927 iris: better ubo handling 2019-02-21 10:26:06 -08:00
Chris Wilson
a504b98e72 iris: fix import from dri2/3 2019-02-21 10:26:06 -08:00
Kenneth Graunke
badefe50a0 iris: fix constant packet length to match i965 2019-02-21 10:26:06 -08:00
Kenneth Graunke
201a4d923c iris: maybe slightly less boats uniforms 2019-02-21 10:26:06 -08:00
Kenneth Graunke
a6dd9caf0d iris: flush always 2019-02-21 10:26:06 -08:00
Kenneth Graunke
04d1a3a7de iris: transfers 2019-02-21 10:26:06 -08:00
Kenneth Graunke
7437c28c0d iris: util_copy_framebuffer_state (ported from Rob's v3d patches) 2019-02-21 10:26:06 -08:00
Kenneth Graunke
f6017da83f iris: fix VF INSTANCING length 2019-02-21 10:26:06 -08:00
Kenneth Graunke
7fb7704b2e iris: more depth stuffs...
still missing stencil
2019-02-21 10:26:06 -08:00
Kenneth Graunke
02890c75b5 iris: fix 3DSTATE_VERTEX_ELEMENTS length 2019-02-21 10:26:06 -08:00
Kenneth Graunke
601ee4c189 iris: fix whitespace 2019-02-21 10:26:06 -08:00
Kenneth Graunke
4d24874236 iris: Lower the max number of decoded VBO lines
saint foo, vbo lines!
2019-02-21 10:26:06 -08:00
Kenneth Graunke
48ddd7212d iris: fix decoding and undo testing code 2019-02-21 10:26:06 -08:00
Kenneth Graunke
f31eea1f00 iris: fix batch chaining...
don't chain a batch just for the end
2019-02-21 10:26:06 -08:00
Kenneth Graunke
5b914a6d58 iris: caps 2019-02-21 10:26:06 -08:00
Kenneth Graunke
604a1a1614 iris: chaining not growing 2019-02-21 10:26:06 -08:00
Kenneth Graunke
053fb51125 iris: just turn batch reset_and_clear_caches into reset 2019-02-21 10:26:06 -08:00
Kenneth Graunke
ca735c5e0c iris: delete growing code and just die for now
we need proper batch chaining.  without relocations, we can't grow,
since we've only allocated so much VMA for the batch, and the mechanism
only works if we can pin it at the old address
2019-02-21 10:26:06 -08:00
Kenneth Graunke
7167c6d508 iris: blorp bug fixes
I wrote this earlier, but it got lost somehow...
2019-02-21 10:26:06 -08:00
Kenneth Graunke
3650f8dfa1 iris: properly reject formats, fixes RGB32 rendering with texture float 2019-02-21 10:26:06 -08:00
Kenneth Graunke
4510098b9c iris: proper # of uniforms
or at least closer...we were using bytes, we want 256-bit units...
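
The unit conversion described above could look roughly like this. It is only a sketch: the helper name is hypothetical, and rounding up to a whole 256-bit (32-byte) unit is an assumption, not something the commit message states:

```c
/* Convert a size in bytes to a count of 256-bit (32-byte) units,
 * rounding up so that a partial unit still gets counted.
 * Hypothetical helper for illustration only. */
static unsigned example_bytes_to_256bit_units(unsigned bytes)
{
   return (bytes + 31) / 32;
}
```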
2019-02-21 10:26:06 -08:00
Kenneth Graunke
6091dc470f iris: proper length for VE packet? 2019-02-21 10:26:06 -08:00
Kenneth Graunke
64a3f7423a iris: uniforms for VS 2019-02-21 10:26:06 -08:00
Kenneth Graunke
d4a64e0a64 iris: bump GL version to 4.2 2019-02-21 10:26:06 -08:00
Kenneth Graunke
44993d451c iris: some depth stuff :( 2019-02-21 10:26:06 -08:00
Kenneth Graunke
eb12cc70f0 iris: assert surf init 2019-02-21 10:26:06 -08:00
Kenneth Graunke
a4a426008b iris: no more drawing rectangle in blorp
there's some bug here as Jason's patches for only emitting 3DS_DR once
got reverted by Mark later on, apparently they regressed MSAA tests.

need to sort that out.
2019-02-21 10:26:06 -08:00
Kenneth Graunke
0e3870b9de iris: blorp URB 2019-02-21 10:26:06 -08:00
Kenneth Graunke
01fe6df0ed iris: make blorp pin the binder 2019-02-21 10:26:06 -08:00
Kenneth Graunke
063fc7bbb0 iris: linear staging buffers - fast CPU access... 2019-02-21 10:26:06 -08:00
Kenneth Graunke
84abf77c67 iris: hacky flushing for now 2019-02-21 10:26:06 -08:00
Kenneth Graunke
75a1639262 iris: drop the 48b printout, we never use anything else 2019-02-21 10:26:06 -08:00
Kenneth Graunke
86d7fd71f4 iris: add INTEL_DEBUG=reemit 2019-02-21 10:26:06 -08:00
Kenneth Graunke
b8a11ad256 iris: fix blorp prog data crashes 2019-02-21 10:26:06 -08:00
Kenneth Graunke
e2ba98ba39 iris: more blorp 2019-02-21 10:26:06 -08:00
Kenneth Graunke
1bba60a4bf iris: fix sampler view crashes 2019-02-21 10:26:06 -08:00
Kenneth Graunke
e22da1e7b1 iris: drop bogus binder free
I was malloc'ing it but then I changed my mind and embedded it directly
2019-02-21 10:26:06 -08:00
Kenneth Graunke
698d45b725 iris: more blitting code to make readpixels work 2019-02-21 10:26:06 -08:00
Kenneth Graunke
c9d9e44720 iris: bits of blorp code 2019-02-21 10:26:06 -08:00
Kenneth Graunke
79466c1313 iris: move bo_offset_from_sba
for wider use
2019-02-21 10:26:06 -08:00
Kenneth Graunke
60d708bb80 iris: copy over i965's cache tracking
needed to split out vtbl so I can pipe control without ice
2019-02-21 10:26:06 -08:00
Kenneth Graunke
dbd4770397 iris: pull in newer comments 2019-02-21 10:26:06 -08:00
Kenneth Graunke
841b3b9003 iris: Defines for base addresses rather than numbers everywhere 2019-02-21 10:26:06 -08:00
Kenneth Graunke
c75a1254a4 iris: Move get_command_space to iris_batch.c
for reuse in blorp.  it's a better interface anyway.
2019-02-21 10:26:06 -08:00
Kenneth Graunke
39e795d473 iris: fix texturing! 2019-02-21 10:26:06 -08:00
Kenneth Graunke
4929f020c3 iris: better SBE 2019-02-21 10:26:06 -08:00
Kenneth Graunke
8bf167c9e9 iris: vma - fix assert 2019-02-21 10:26:06 -08:00
Kenneth Graunke
10e4f1e68c iris: vma fixes - don't free binder address 2019-02-21 10:26:06 -08:00
Kenneth Graunke
5a101e6434 iris: bo reuse 2019-02-21 10:26:06 -08:00
Kenneth Graunke
21acc00490 iris: crazy pipe control code
imported from ~kwg/mesa pcx-2, gen < 8 code dropped
2019-02-21 10:26:06 -08:00
Kenneth Graunke
87aa880795 iris: fixes 2019-02-21 10:26:06 -08:00
Kenneth Graunke
3fbf7294b1 iris: fixes from i965 2019-02-21 10:26:06 -08:00
Kenneth Graunke
999ed6e213 iris: port bug fix from i965 2019-02-21 10:26:05 -08:00
Kenneth Graunke
19d11a6df3 iris: fix index 2019-02-21 10:26:05 -08:00
Kenneth Graunke
010e845af7 iris: increase allocator alignment 2019-02-21 10:26:05 -08:00
Kenneth Graunke
35afa8c8f3 iris: better BT asserts
Probably nothing is working because texture upload isn't implemented
2019-02-21 10:26:05 -08:00
Kenneth Graunke
0148bd6839 iris: decoder fixes 2019-02-21 10:26:05 -08:00
Kenneth Graunke
5d2673ba7e iris: set sampler views 2019-02-21 10:26:05 -08:00
Kenneth Graunke
34164ce622 iris: isv freeing fixes 2019-02-21 10:26:05 -08:00
Kenneth Graunke
012154c20f iris: TES stash
TODO: key setup
2019-02-21 10:26:05 -08:00
Kenneth Graunke
d890aee15d iris: SBA once at context creation, not per batch
hooray!
2019-02-21 10:26:05 -08:00
Kenneth Graunke
e0eac28bd4 iris: fix a scissor bug 2019-02-21 10:26:05 -08:00
Kenneth Graunke
0707ff3f2f iris: assemble SAMPLER_STATE table at bind time
It's useless to allocate SAMPLER_STATEs in GPU memory on creation like
we do for SURFACE_STATES, because they need to be organized into a
contiguous block of memory.  But we can do that at bind time, rather
than draw time.
2019-02-21 10:26:05 -08:00
Kenneth Graunke
199c080926 iris: same treatment for sampler views 2019-02-21 10:26:05 -08:00
Kenneth Graunke
f51204a160 iris: allocate SURFACE_STATEs up front and stop streaming them 2019-02-21 10:26:05 -08:00
Kenneth Graunke
bf90d8a125 iris: delete more trash 2019-02-21 10:26:05 -08:00
Kenneth Graunke
1398c99aff iris: canonicalize addresses.
Back to working!  Woo!
2019-02-21 10:26:05 -08:00
Kenneth Graunke
b69a85bc4d iris: validation dumping improvements
backported from i965.  don't bother with (pinned) because everything is.
2019-02-21 10:26:05 -08:00
Kenneth Graunke
24bcf1054b iris: update vb BO handling now that we have softpin 2019-02-21 10:26:05 -08:00
Kenneth Graunke
9ac81f1890 iris: decoder fixes 2019-02-21 10:26:05 -08:00
Kenneth Graunke
9955e8334b iris: binder fixes 2019-02-21 10:26:05 -08:00
Kenneth Graunke
65073c2217 iris: hook up batch decoder 2019-02-21 10:26:05 -08:00
Kenneth Graunke
6cbd1d1692 iris: binders 2019-02-21 10:26:05 -08:00
Kenneth Graunke
209692c716 iris: include p_defines.h in iris_bufmgr.h
for PIPE_TRANSFER_WRITE and friends
2019-02-21 10:26:05 -08:00
Kenneth Graunke
1af84d345a iris: set EXEC_OBJECT_WRITE 2019-02-21 10:26:05 -08:00
Kenneth Graunke
651be7cf3d iris: rewrite to use memzones and not relocs 2019-02-21 10:26:05 -08:00
Kenneth Graunke
68229caa38 iris: more uploaders 2019-02-21 10:26:05 -08:00
Kenneth Graunke
3861d24e23 iris: Also set SUPPORTS_48B? Not sure if necessary. 2019-02-21 10:26:05 -08:00
Kenneth Graunke
e95ad5994a iris: dump gtt offset in dump_validation_list 2019-02-21 10:26:05 -08:00
Kenneth Graunke
d78be0188e iris: fix icache memzone 2019-02-21 10:26:05 -08:00
Kenneth Graunke
e4aa8338c3 iris: Soft-pin the universe
Breaks everything, woo!
2019-02-21 10:26:05 -08:00
Kenneth Graunke
3693307670 iris: some thinking about binding tables 2019-02-21 10:26:05 -08:00
Kenneth Graunke
f6be3d4f3a iris: bufmgr updates.
Drop BO_ALLOC_BUSY (best not to hand people a loaded gun...)
Drop vestiges of alignment
2019-02-21 10:26:05 -08:00
Kenneth Graunke
902a122404 iris: stop adding 9 to our varyings 2019-02-21 10:26:05 -08:00
Kenneth Graunke
a235da3e68 iris: set strides on transfers 2019-02-21 10:26:05 -08:00
Kenneth Graunke
6891f70d87 iris: enable a few more formats 2019-02-21 10:26:05 -08:00
Kenneth Graunke
7130c43d96 iris: decode batches if they fail to submit 2019-02-21 10:26:05 -08:00
Kenneth Graunke
23367688e9 iris: NOOP pad batches correctly 2019-02-21 10:26:05 -08:00
Kenneth Graunke
f3150e9ecd iris: warn if execbuf fails 2019-02-21 10:26:05 -08:00
Kenneth Graunke
a50a3a8edf iris: uniform bits...badly 2019-02-21 10:26:05 -08:00
Kenneth Graunke
213b70a222 iris: sample mask...not 0.
We now have a first triangle!
2019-02-21 10:26:05 -08:00
Kenneth Graunke
1a6bb266cf iris: write DISABLES are not write ENABLES...whoops 2019-02-21 10:26:05 -08:00
Kenneth Graunke
50a2596f46 iris: fix extents 2019-02-21 10:26:05 -08:00
Kenneth Graunke
ffcd84f55a iris: catastrophic state pointer mistake 2019-02-21 10:26:05 -08:00
Kenneth Graunke
1739dc0d5e iris: more SF CL VPs 2019-02-21 10:26:05 -08:00
Kenneth Graunke
ade381fb9c iris: fix dmabuf retval comparisons
0 means success
2019-02-21 10:26:05 -08:00
Kenneth Graunke
ed42ae2f9b iris: more sketchy SBE 2019-02-21 10:26:05 -08:00
Kenneth Graunke
9be4b3baaf iris: compctrl
oh, also run things
2019-02-21 10:26:05 -08:00
Kenneth Graunke
db15993cfd iris: actually pin the instruction cache buffers 2019-02-21 10:26:05 -08:00
Kenneth Graunke
bda9a77b47 iris: smaller blend state 2019-02-21 10:26:05 -08:00
Kenneth Graunke
f9d834d588 iris: don't do samplers for disabled stages 2019-02-21 10:26:05 -08:00
Kenneth Graunke
e21bddeb4f iris: render targets! 2019-02-21 10:26:05 -08:00
Kenneth Graunke
8503578e82 iris: fix silly unused batch with addr macro 2019-02-21 10:26:05 -08:00
Kenneth Graunke
352ec1f378 iris: warning fixes 2019-02-21 10:26:05 -08:00
Kenneth Graunke
54ba8a60d5 iris: basic SBE code 2019-02-21 10:26:05 -08:00
Kenneth Graunke
5af16f5e20 iris: alpha testing in PSB 2019-02-21 10:26:05 -08:00
Kenneth Graunke
c96132d5fd iris: blend state 2019-02-21 10:26:05 -08:00
Kenneth Graunke
bb3c0be7a8 iris: dummy constants 2019-02-21 10:26:05 -08:00
Kenneth Graunke
538decc0de iris: URB configs. 2019-02-21 10:26:05 -08:00
Kenneth Graunke
b1115799e6 iris: actually set KSP offsets 2019-02-21 10:26:05 -08:00
Kenneth Graunke
6f1c07d7dd iris: actually softpin at an address 2019-02-21 10:26:05 -08:00
Kenneth Graunke
acdff2f9a6 iris: actually destroy the cache 2019-02-21 10:26:05 -08:00
Kenneth Graunke
9437e135ed iris: rewrite program cache to use u_upload_mgr 2019-02-21 10:26:05 -08:00
Kenneth Graunke
67ca2be992 iris: no NEW_SBA 2019-02-21 10:26:05 -08:00
Kenneth Graunke
e7a729ba34 iris: shuffle comments 2019-02-21 10:26:05 -08:00
Kenneth Graunke
6ecc93f764 iris: bits of WM key 2019-02-21 10:26:05 -08:00
Kenneth Graunke
bba13b1501 iris: move key pop to state module
shader key population needs to read state
2019-02-21 10:26:05 -08:00
Kenneth Graunke
5864c9414a iris: fix SBA 2019-02-21 10:26:05 -08:00
Kenneth Graunke
5ae278da18 iris: use vtbl to avoid multiple symbols, fix state base address 2019-02-21 10:26:05 -08:00
Kenneth Graunke
876417f9e8 iris: softpin some things 2019-02-21 10:26:05 -08:00
Kenneth Graunke
c493fee73f iris: drop const from prog data parameters
we ralloc steal things, which makes it a little bogus
2019-02-21 10:26:05 -08:00
Kenneth Graunke
cf7ba838ad iris: more comes from bits filled in
tomorrow, fix the build system to avoid symbol clashes somehow...
we're getting gen9 functions because they happen to be listed before 10
in the link list.
2019-02-21 10:26:05 -08:00
Kenneth Graunke
8dffc9b195 iris: index buffer BO 2019-02-21 10:26:05 -08:00
Kenneth Graunke
8665dfd602 iris: WM.
I could have added a dirty bit for this, but it doesn't seem worth it
2019-02-21 10:26:05 -08:00
Kenneth Graunke
bae5414594 iris: initial gpu state 2019-02-21 10:26:05 -08:00
Kenneth Graunke
0477591355 iris: reorganize commands to match brw 2019-02-21 10:26:05 -08:00
Kenneth Graunke
3e684d0eb7 iris: don't forget about TE 2019-02-21 10:26:05 -08:00
Kenneth Graunke
d71d2028ef iris: convert IRIS_DIRTY_* to #defines
enums are SIGNED.  so IRIS_DIRTY_VS << 4 gets sign extended, making it
not equal to IRIS_DIRTY_FS.  Surprising!
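
The general pitfall can be sketched as follows. This illustrates sign extension in C and is not the actual iris code: the helper and macro names are hypothetical, and it assumes a flag value that ends up stored in a signed 32-bit integer with its top bit set:

```c
#include <stdint.h>

/* Widening a signed 32-bit value to 64 bits sign-extends it, so a flag
 * with the top bit set turns into a value with all upper bits set. */
static uint64_t example_widen_signed(int32_t flag)
{
   return (uint64_t)flag;
}

/* Widening an unsigned 32-bit value zero-extends it instead. */
static uint64_t example_widen_unsigned(uint32_t flag)
{
   return (uint64_t)flag;
}

/* A #define with an unsigned 64-bit literal avoids the problem, because
 * the constant is 64 bits wide and unsigned from the start. */
#define EXAMPLE_DIRTY_BIT(n) (1ull << (n))
```

With these definitions, a signed flag holding INT32_MIN widens to 0xFFFFFFFF80000000, while the unsigned and macro forms of bit 31 stay at 0x80000000, so comparisons between the two disagree.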
2019-02-21 10:26:05 -08:00
Kenneth Graunke
cfd5fcc256 iris: emit shader packets 2019-02-21 10:26:05 -08:00
Kenneth Graunke
1cf21cc813 iris: actually save derived state 2019-02-21 10:26:05 -08:00
Kenneth Graunke
57c1b71418 iris: promote iris_program_cache_item to iris_compiled_shader 2019-02-21 10:26:05 -08:00
Kenneth Graunke
581459a9fe iris: some shader bits 2019-02-21 10:26:05 -08:00
Kenneth Graunke
df401aaa11 iris: scissor slots 2019-02-21 10:26:05 -08:00
Kenneth Graunke
dc4453d886 iris: bind_state -> compute state 2019-02-21 10:26:05 -08:00
Kenneth Graunke
2f100c6e31 iris: 3DPRIMITIVE fields 2019-02-21 10:26:05 -08:00
Kenneth Graunke
b3646e2b48 iris: fix VF instancing length so we don't get garbage in batch 2019-02-21 10:26:05 -08:00
Kenneth Graunke
317263ab11 iris: vertex packet fixes 2019-02-21 10:26:05 -08:00
Kenneth Graunke
129fae5a90 iris: fix VBs 2019-02-21 10:26:05 -08:00
Kenneth Graunke
fc5ddc64f9 iris: fix assert 2019-02-21 10:26:05 -08:00
Kenneth Graunke
e91289908a iris: fix indentation 2019-02-21 10:26:05 -08:00
Kenneth Graunke
41b32a4eda iris: hack to stop crashing on samplers for now 2019-02-21 10:26:05 -08:00
Kenneth Graunke
dcfb06375a iris: initialize dirty bits to ~0ull 2019-02-21 10:26:05 -08:00
Kenneth Graunke
0a513d63a1 iris: actually advance forward when emitting commands 2019-02-21 10:26:05 -08:00
Kenneth Graunke
24cc627612 iris: actually flush the commands 2019-02-21 10:26:05 -08:00
Kenneth Graunke
082911409e iris: actually APPEND commands, not stomp over the top and never incr 2019-02-21 10:26:05 -08:00
Kenneth Graunke
b332ff489c iris: VB fixes 2019-02-21 10:26:05 -08:00
Kenneth Graunke
50b1e01996 iris: DEBUG=bat
Deleted in the interest of making the branch compile at each step
2019-02-21 10:26:05 -08:00
Kenneth Graunke
6e01bc0637 iris: VB addresses 2019-02-21 10:26:05 -08:00
Kenneth Graunke
b574b56325 iris: reference VB BOs 2019-02-21 10:26:05 -08:00
Kenneth Graunke
4dc683f64b iris: so, sba then. 2019-02-21 10:26:05 -08:00
Kenneth Graunke
d900a235b1 iris: try and have an iris address 2019-02-21 10:26:05 -08:00
Kenneth Graunke
f31ae76216 iris: flag SBA updates when instruction BO changes 2019-02-21 10:26:05 -08:00
Kenneth Graunke
7d90cc8da4 iris: bit of SBA code
genxml MOCS is stupid, addresses are hard news at 11
2019-02-21 10:26:05 -08:00
Kenneth Graunke
ff5c886fb3 iris: move MAX defines to iris_batch.h
for SBA
2019-02-21 10:26:05 -08:00
Kenneth Graunke
7bfc8f7d7d iris: kill iris_new_batch
reset and new are too similar, and this had exactly one caller
2019-02-21 10:26:05 -08:00
Kenneth Graunke
b701096ab9 iris: make iris_batch target a particular ring 2019-02-21 10:26:05 -08:00
Kenneth Graunke
64f043570d iris: lower io 2019-02-21 10:26:05 -08:00
Kenneth Graunke
695bd55d1a iris: do the FS...asserts because we don't lower uniforms yet 2019-02-21 10:26:05 -08:00
Kenneth Graunke
6aa15cadf3 iris: import program cache code 2019-02-21 10:26:05 -08:00
Kenneth Graunke
4525dda75f iris: reworks, FS compile pieces 2019-02-21 10:26:05 -08:00
Kenneth Graunke
628a71c2e3 iris: parse INTEL_DEBUG 2019-02-21 10:26:05 -08:00
Kenneth Graunke
d62b0b9ee8 iris: draw->restart_index is uninitialized if PR is not enabled 2019-02-21 10:26:05 -08:00
Kenneth Graunke
5fad62cef1 iris: fix bogus index buffer reference 2019-02-21 10:26:05 -08:00
Kenneth Graunke
95fe254cf2 iris: fix prim type 2019-02-21 10:26:05 -08:00
Kenneth Graunke
793276cd8b iris: msaa sample count packing problems
0 -> ffffffffffffffffffffffffffff
2019-02-21 10:26:05 -08:00
Kenneth Graunke
0252fb36e9 iris: actually save VBs 2019-02-21 10:26:05 -08:00
Kenneth Graunke
ed6ee3e270 iris: fix/rework line stipple 2019-02-21 10:26:05 -08:00
Kenneth Graunke
231935efa2 iris: init the batch! 2019-02-21 10:26:05 -08:00
Kenneth Graunke
9ca58ca517 iris: delete iris_pipe.c, shuffle code around 2019-02-21 10:26:05 -08:00
Kenneth Graunke
455e2d6dce iris: disable execbuf for now 2019-02-21 10:26:05 -08:00
Kenneth Graunke
86e0c08b14 iris: make an ice->render_batch field
we may want a second one for transfers
2019-02-21 10:26:05 -08:00
Kenneth Graunke
ffd7f13b4d iris: drop unused field 2019-02-21 10:26:05 -08:00
Kenneth Graunke
8097dc9dd9 iris: shader debug log 2019-02-21 10:26:05 -08:00
Kenneth Graunke
6c7a276470 iris: maps 2019-02-21 10:26:05 -08:00
Kenneth Graunke
49896861ce iris: linear resources 2019-02-21 10:26:05 -08:00
Kenneth Graunke
c820f5a4bd iris: some program code 2019-02-21 10:26:04 -08:00
Kenneth Graunke
d48dc416fa iris: basic push constant alloc 2019-02-21 10:26:04 -08:00
Kenneth Graunke
21c016b496 iris: emit 3DSTATE_SAMPLER_STATE_POINTERS 2019-02-21 10:26:04 -08:00
Kenneth Graunke
7b80f4587d iris: sampler states 2019-02-21 10:26:04 -08:00
Kenneth Graunke
60208d12b4 iris: COLOR_CALC_STATE 2019-02-21 10:26:04 -08:00
Kenneth Graunke
9367c44639 iris: fix crash - CSO binding can be NULL (when destroying context) 2019-02-21 10:26:04 -08:00
Kenneth Graunke
efea4d96d9 iris: some draw info, vbs, sample mask 2019-02-21 10:26:04 -08:00
Kenneth Graunke
d6ad9f4732 iris: a bit of depth
still need to allocate separate stencil
2019-02-21 10:26:04 -08:00
Kenneth Graunke
7abe5aefd3 iris: fix SF_CL length 2019-02-21 10:26:04 -08:00
Kenneth Graunke
c1c6c3a18a iris: don't segfault on !old_cso 2019-02-21 10:26:04 -08:00
Kenneth Graunke
3eadb1b3a1 iris: framebuffers 2019-02-21 10:26:04 -08:00
Kenneth Graunke
e7c9bddda7 iris: stipples and vertex elements 2019-02-21 10:26:04 -08:00
Kenneth Graunke
d0aab78dc3 iris: sampler views 2019-02-21 10:26:04 -08:00
Kenneth Graunke
831d630b8b iris: Surfaces! 2019-02-21 10:26:04 -08:00
Kenneth Graunke
4ec5f8be3e iris: SF_CLIP_VIEWPORT 2019-02-21 10:26:04 -08:00
Kenneth Graunke
970836c34e iris: scissors 2019-02-21 10:26:04 -08:00
Kenneth Graunke
7c875deaf0 iris: RASTER + SF + some CLIP, fix DIRTY vs. NEW 2019-02-21 10:26:04 -08:00
Kenneth Graunke
02f583b0a0 iris: initial gpu state, merges 2019-02-21 10:26:04 -08:00
Kenneth Graunke
a13d417ac1 iris: merge pack
this lets us merge dynamic and pre-baked state, also like anv
2019-02-21 10:26:04 -08:00
Kenneth Graunke
aee39df710 iris: packing with valgrind.
borrowed macros from anv!
2019-02-21 10:26:04 -08:00
Kenneth Graunke
d3d6ef37f6 iris: initial render state upload 2019-02-21 10:26:04 -08:00
Kenneth Graunke
26fb5a8ae2 iris: port over batchbuffer updates 2019-02-21 10:26:04 -08:00
Kenneth Graunke
14ca30507f iris: viewport state, sort of 2019-02-21 10:26:04 -08:00
Kenneth Graunke
2dce0e94a3 iris: Initial commit of a new 'iris' driver for Intel Gen8+ GPUs.
This commit introduces a new Gallium driver for Intel Gen8+ GPUs,
named 'iris_dri.so' after the hardware.

Developed by:
- Kenneth Graunke (overall driver)
- Dave Airlie (shaders, conditional render, overflow query, Gen8 port)
- Chris Wilson (fencing, pinned memory, ...)
- Jordan Justen (compute shaders)
- Jason Ekstrand (image load store)
- Caio Marcelo de Oliveira Filho (tessellation control passthrough)
- Rafael Antognolli (auxiliary buffer fixes)
- The rest of the i965 contributors and the Mesa community
2019-02-21 10:26:04 -08:00
James Zhu
eac822eac1 gallium/auxiliary/vl: Fix transparent issue on compute shader with rgba
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109646
Problem 1,4: they are caused by an incomplete blend compute shader
implementation, so revert rgba back to the fragment shader.

Fixes: 9364d66cb7 (Add video compositor compute shader render)
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Tested-by: Bruno Milreu <bmilreu@gmail.com>
2019-02-21 13:11:53 -05:00
Lionel Landwerlin
20c370c6b1 vulkan: add an overlay layer
Just a starting point to display frame timings & drawcalls/submissions
per frame.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
+1-by: Mike Lothian <mike@fireburn.co.uk>
+1-by: Tapani Pälli <tapani.palli@intel.com>
+1-by: Eric Engestrom <eric.engestrom@intel.com>
+1-by: Yurii Kolesnykov <root@yurikoles.com>
+1-by: myfreeweb <greg@unrelenting.technology>
+1-by: Kenneth Graunke <kenneth@whitecape.org>
2019-02-21 18:06:05 +00:00
Lionel Landwerlin
89f03d1872 imgui: make sure our copy of imgui doesn't clash with others in the same process
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
+1-by: Mike Lothian <mike@fireburn.co.uk>
+1-by: Tapani Pälli <tapani.palli@intel.com>
+1-by: Eric Engestrom <eric.engestrom@intel.com>
+1-by: Yurii Kolesnykov <root@yurikoles.com>
+1-by: myfreeweb <greg@unrelenting.technology>
+1-by: Kenneth Graunke <kenneth@whitecape.org>
2019-02-21 18:06:05 +00:00
Lionel Landwerlin
3950e7c11e imgui: bump copy
Updated at :

commit f977871854af941289f2a9090dcc90f7aa3449a8
Author: omar <omarcornut@gmail.com>
Date:   Fri Feb 15 13:10:22 2019 +0100

    ImFont: Minor adjustment to the structure.
    Examples: Removed unused variable.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
+1-by: Mike Lothian <mike@fireburn.co.uk>
+1-by: Tapani Pälli <tapani.palli@intel.com>
+1-by: Eric Engestrom <eric.engestrom@intel.com>
+1-by: Yurii Kolesnykov <root@yurikoles.com>
+1-by: myfreeweb <greg@unrelenting.technology>
+1-by: Kenneth Graunke <kenneth@whitecape.org>
2019-02-21 18:06:05 +00:00
Lionel Landwerlin
51047cd2e8 build: move imgui out of src/intel/tools to be reused
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
+1-by: Mike Lothian <mike@fireburn.co.uk>
+1-by: Tapani Pälli <tapani.palli@intel.com>
+1-by: Eric Engestrom <eric.engestrom@intel.com>
+1-by: Yurii Kolesnykov <root@yurikoles.com>
+1-by: myfreeweb <greg@unrelenting.technology>
+1-by: Kenneth Graunke <kenneth@whitecape.org>
2019-02-21 18:06:05 +00:00
Jason Ekstrand
f98fd9d15a nir/lower_clip_cull: Fix an incorrect assert
Copy+paste error.  It was supposed to test cull and not clip.

Fixes: 4e69fba534 "nir: Rewrite lower_clip_cull_distance_arrays..."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109717
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-02-21 12:05:12 -06:00
Jason Ekstrand
f9b2f10a41 nir: Fix a compile warning 2019-02-21 09:44:42 -06:00
Rob Clark
908d5ee9eb freedreno/a6xx: enable tiled images
Turns out we can write to tiled images as well as read.  This avoids
having to linearize or do the tiling in the shader.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-21 09:06:06 -05:00
Alejandro Piñeiro
0629b2a462 nir, glsl: move pixel_center_integer/origin_upper_left to shader_info.fs
On GLSL that info is set as a layout qualifier when redeclaring
gl_FragCoord, so somehow tied to a specific variable. But in practice,
they behave as a global of the shader. On ARB programs they are set
using a global OPTION (defined at ARB_fragment_coord_conventions), and
on SPIR-V using ExecutionModes, that are also not tied specifically to
the builtin.

This patch moves that info from nir variable and ir variable to nir
shader and gl_program shader_info respectively, so the map is more
similar to SPIR-V, and ARB programs, instead of more similar to GLSL.

FWIW, shader_info.fs already had pixel_center_integer, so this change
also removes some redundancy. Also, as struct gl_program also includes
a shader_info, we removed gl_program::OriginUpperLeft and
PixelCenterInteger, as it would be superfluous.

This change was needed because recently spirv_to_nir changed the order
in which execution modes and variables are handled, so the variables
didn't get the correct values. Now the info is set on the shader
itself, and we don't need to go back to the builtin variable to set
it.

Fixes: e68871f6a ("spirv: Handle constants and types before execution
                   modes")

v2: (Jason)
   * glsl_to_nir: get the info before glsl_to_nir, while all the rest
     of the info gathering is happening
   * prog_to_nir: gather the info on a general info-gathering pass,
     not on variable setup.

v3: (Jason)
   * Squash with the patch that removes that info from ir variable
   * anv: assert that OriginUpperLeft is true. It should be already
     set by spirv_to_nir.
   * blorp: set origin_upper_left on its core "compile fragment
     shader", not just in some specific places (for this we added a
     helper in a previous patch).
   * prog_to_nir: no need to gather these fragcoord modes specifically,
     as the full gl_program shader_info is copied.
   * spirv_to_nir: assert that we are a fragment shader when handling
     these execution modes.

v4: (reported by failing gitlab pipeline #18750)
   * state_tracker: update too, due to changes in ir.h/gl_program

v5:
   * blorp: minor change after change on previous patch
   * radeonsi: update due to this change.

v6: (Timothy Arceri)
   * prog_to_nir: remove extra whitespace
   * shader_info: don't use :1 on origin_upper_left
   * glsl: program.fs.origin_upper_left/pixel_center_integer can be
     moved out of the shader list loop
2019-02-21 11:47:59 +01:00
Alejandro Piñeiro
675eabb560 blorp: introduce helper method blorp_nir_init_shader
This initializes the nir shader that will be used by blorp. Right now
it doesn't do too much beyond calling nir_builder_init_simple_shader,
and setting a name. More stuff will be added on following patches.

v2: there is a case where a VERTEX_SHADER is used (Alejandro)
2019-02-21 11:47:51 +01:00
Alyssa Rosenzweig
705723e6be panfrost: Verify and print brx condition in disasm
The condition code in extended branches is repeated 8 times for unclear
reasons; accordingly, the code would be disassembled as "unknown5555",
"unknownAAAA", etc. This patch correctly masks off the lower two bits to
find the true code to print, verifying that the code is repeated as
believed to be necessary (providing some assurance for compiler quality
and an assert trip in case we encounter a shader in the wild that breaks
the convention).
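
The masking and verification described above can be sketched as follows. This is not the actual disassembler code: the function name is hypothetical, and the 16-bit field width is an assumption derived from a 2-bit code repeated 8 times:

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract the 2-bit condition code from a field in which the code is
 * expected to be repeated 8 times. Returns false (leaving *cond
 * untouched) if any repetition disagrees with the lowest two bits. */
static bool example_decode_repeated_cond(uint16_t field, unsigned *cond)
{
   unsigned code = field & 0x3;

   for (unsigned i = 0; i < 8; i++) {
      if (((field >> (2 * i)) & 0x3) != code)
         return false;
   }

   *cond = code;
   return true;
}
```

Under these assumptions, the raw values 0x5555 and 0xAAAA mentioned above decode to codes 1 and 2 respectively, and a field such as 0x5556 is rejected.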

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-21 07:09:06 +00:00
Alyssa Rosenzweig
779e140b1a panfrost: Dynamically set discard branch targets
discard and discard_if are both implemented with the branching pipeline
on Midgard; essentially, we branch to the end of the fragment shader in
a special "discard" mode, setting the condition as necessary.
Previously, we hardcoded the form of this instruction, which worked for
very simple shaders but was incorrect for anything remotely interesting.
This patch instead emits logical branches in the IR, which are flattened
to real discard ops the same way other branches are, allowing targets to
be computed correctly.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-21 07:08:59 +00:00
Alyssa Rosenzweig
5abb7b559e panfrost/midgard: Emit extended branches
Previously, we only emitted compact branches; however, the offset range
of these branches is too small for many real world shaders. This patch
implements support for emitting extended branches and switches to always
using them for control flow. This incurs a code size and possibly
performance penalty, but expands the range of working shaders and
provides opportunity for further optimization.

Support for emitting compact branches is retained but this code path is
presently unused. In the future, we'll want to heuristically determine
which type of branch should be emitted for optimal codegen.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-21 07:08:47 +00:00
Alyssa Rosenzweig
813bb34fd8 panfrost: Rectify doubleplusungood extended branch
Midgard features "compact branches" and "extended branches", which
correspond to short jumps and far jumps. The form of the extended
branch was previously incorrect in the ISA headers; this patch corrects
it and updates the disassembler (simultaneous to preserve
bisectability).

Additionally, we fix a corner case in the disassembly of extended
branches, and we now prefix extended branches with "brx", to visually
differentiate from compact branches prefixed with "br".

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-21 07:07:39 +00:00
Alyssa Rosenzweig
2c74709517 panfrost/midgard: Fix nested/chained if-else
An if-else statement is compiled to a conditional branch (from the start
to the second block) and an unconditional branch (from the end of the
first block to the end of the else). We previously incorrectly computed
the block index of the unconditional branch to be exactly one after that
of the conditional branch, valid for a single if-else statement but
nothing fancier. This patch correctly computes the unconditional branch
target, fixing more complex if-else chains.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-21 07:06:26 +00:00
Alyssa Rosenzweig
5e55c11a1b panfrost/midgard: Refactor tag lookahead code
Each Midgard instruction is scheduled to a particular instruction type
("tag"). Presumably the hardware prefetches memory based on tag, so it
is required to report out the first tag to the command stream and the
next tag of a branch target. This procedure was implemented in two
separate parts of the compiler (one time with a slight bug relating to
empty blocks); this patch refactors to unite the two routines and solve
the bug when branching to empty blocks.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-21 07:05:59 +00:00
Alyssa Rosenzweig
396eb1440a panfrost: Implement pantrace (command stream dump)
Historically, Panfrost debugging entailed the use of the LD_PRELOADable
`panwrap` tool. This setup is a tad fragile; Panfrost can be traced
directly without the intermediate layer. pantrace implements the
equivalent functionality of panwrap in Panfrost proper, allowing dumps
to work regardless of the kernel layer in use.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-21 07:03:21 +00:00
Alyssa Rosenzweig
f611782045 panfrost: Add pandecode (command stream debugger)
The `panwrap` utility can be LD_PRELOAD'd into a GLES app, intercepting
communication between the driver and the kernel. Modern panwrap versions
do no processing of their own; instead, they create a trace directory.
This directory contains the following files:

 - control.log: a line-by-line plain text file, denoting important
   syscalls (mmaps and job submits) along with their arguments

 - memory_*.bin, shader_*.bin: binary dumps of mapped memory

Together, these files contain enough information to reconstruct the
command stream and shaders of (at minimum) a single frame.

The `pandecode` utility takes this directory structure as input,
reconstructing the mapped memory and using the job submit command as an
entrypoint. It then walks the descriptors as the hardware would, parsing
and pretty-printing. Its final output is the pretty-printed command
stream interleaved with the disassembled shaders, suitable for driver
debugging. For instance, the behaviour of two driver versions (one
working, one broken) can be compared by diff'ing their decoded logs.

pandecode/decode.c was originally a part of `panwrap`; it is the oldest
living code in the project. Its history is generally not worth
preserving.

panwrap itself will continue to live downstream for the foreseeable
future, as it is specifically written for the vendor kernel. It is
possible, however, to produce equivalent traces directly from Panfrost,
bypassing the intermediate wrapping layer for well-behaved drivers.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-21 07:01:48 +00:00
Alyssa Rosenzweig
fb3bbd0c1c panfrost: Stub out separate stencil functions
This is not yet functional, but it resolves a crash in various apps and
provides a framework for further work.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-21 06:58:50 +00:00
Marek Olšák
edbd2c1ff5 radeonsi: use SDMA for uploading data through const_uploader
v2: use tc.stream_uploader in si buffer_transfer_map if not called from
    the driver thread

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-02-20 21:04:29 -05:00
Marek Olšák
54f7545cd7 gallium/u_upload_mgr: allow use of FLUSH_EXPLICIT with persistent mappings
for radeonsi

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-02-20 21:04:29 -05:00
Marek Olšák
dc8a2c139d gallium/u_threaded: always unmap const_uploader
radeonsi will require this. It's a no-op for drivers supporting persistent
mappings.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-02-20 21:04:29 -05:00
Marek Olšák
8ef6f68fa5 st/mesa: always unmap the uploader in st_atom_array.c
This is a no-op for drivers supporting persistent mappings.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2019-02-20 21:04:29 -05:00
Jason Ekstrand
1a93fc382b nir/xfb: Handle compact arrays in gather_xfb_info
This makes us properly handle gl_ClipDistance and gl_CullDistance.

Fixes: 19064b8c "nir: Add a pass for gathering transform feedback info"
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-02-21 00:08:42 +00:00
Jason Ekstrand
558c314504 nir/xfb: Work in terms of components rather than slots
We needed to better handle cases where a chunk of a variable starts at
some non-zero location_frac and rolls over into the next slot but may
not be more than 4 dwords.  For example, if gl_CullDistance is an array
of 3 things and has location_frac = 2, it will span across two vec4s but
is not, itself, bigger than a vec4.  If you ignore the clip/cull special
case, it's not allowed to happen for anything else because the only
things that can span more than one slot are dvec3 and dvec4 and they're
both bigger than a vec4.  The current code uses this attrib_slot thing
where we count attribute slots and iterate over them.  However, that
doesn't work in the case above because gl_CullDistance will have an
attrib_slot count of 1 even though it does span two slots.  We could fix
this by adjusting attrib_slot but we already have comp_mask and it's
easier to just handle it that way.
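
The gl_CullDistance case above can be checked with a little arithmetic. The following is only an illustrative sketch: slots_spanned() and comp_mask_for_slot() are hypothetical helpers written for this note, not functions from the xfb gathering pass.

```c
#include <stdint.h>

/* Number of vec4 slots touched by `count` 32-bit components that start at
 * component `location_frac` (0..3) of the first slot. */
unsigned
slots_spanned(unsigned location_frac, unsigned count)
{
   return (location_frac + count + 3) / 4;
}

/* 4-bit mask of the components used in slot number `slot` (0 = first slot)
 * by the same run of components. */
uint8_t
comp_mask_for_slot(unsigned location_frac, unsigned count, unsigned slot)
{
   unsigned first = location_frac;        /* first used component, flat index */
   unsigned end = location_frac + count;  /* one past the last used component */
   uint8_t mask = 0;

   for (unsigned c = 0; c < 4; c++) {
      unsigned flat = slot * 4 + c;
      if (flat >= first && flat < end)
         mask |= (uint8_t)(1u << c);
   }
   return mask;
}
```

For an array of 3 with location_frac = 2, this gives two slots with masks 0xC (z, w) and 0x1 (x), even though the array is smaller than a vec4.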

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-02-21 00:08:42 +00:00
Jason Ekstrand
4e69fba534 nir: Rewrite lower_clip_cull_distance_arrays to do a lot less lowering
Instead of going to all the work of combining them into one array, just
make two arrays and use location_frac to colocate them within CLIP0.
Then the back-end can sort things out and stack them on top of each
other.  Thanks to ef99f4c8, we also don't need to set compact anymore.
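
A rough sketch of the colocation idea, assuming the cull array is placed immediately after the clip array. The enum values, struct placement and place_cull_array() are made up for illustration and are not the pass's actual code.

```c
/* Hypothetical stand-in for the varying slot enum; only the order matters. */
enum { SLOT_CLIP_DIST0 = 0, SLOT_CLIP_DIST1 = 1 };

struct placement {
   unsigned location;      /* vec4 slot the array starts in */
   unsigned location_frac; /* starting component within that slot */
};

/* Place the cull array directly after `clip_size` clip distances. */
struct placement
place_cull_array(unsigned clip_size)
{
   struct placement p;
   p.location = SLOT_CLIP_DIST0 + clip_size / 4;
   p.location_frac = clip_size % 4;
   return p;
}
```

With 2 clip distances the cull array would start at CLIP0 with location_frac = 2; with 5 it would start in the second slot at component 1.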

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-02-21 00:08:42 +00:00
Jason Ekstrand
8f0fe71cc5 nir/xfb: Properly align 64-bit values
Fixes: 19064b8c "nir: Add a pass for gathering transform feedback info"
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-02-21 00:08:42 +00:00
Jason Ekstrand
30b548fc62 compiler/types: Add a contains_64bit helper
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-02-21 00:08:42 +00:00
Rob Clark
323958908e freedreno/a6xx: samplerBuffer fixes
Use the 'UNK31' bit (which should probably be called 'BUFFER') for
samplerBuffer case, which increases the size of supported buffer
texture beyond 2^15 elements.

Also need to fix the 2nd coord injected to handle the tex instructions
that take integer coords.

Fixes dEQP-GLES31.functional.texture.texture_buffer.render.as_fragment_texture.buffer_size_131071
and similar

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-20 18:50:08 -05:00
Rob Clark
50dd773a2d freedreno/ir3/a6xx: use ldib for ssbo reads
... instead of isam.  It seems like when using isam, plus atomics, we
can have the problem of old data being in the texture cache.  Plus this
way we don't have to load a component at a time.

Note that blob still seems to use isam in some cases.  I suppose it might
be preferable in the case of loading a single component, when atomics
are not in the picture (or that the ssbo does not need to otherwise be
coherent).

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-20 18:50:08 -05:00
Rob Clark
c543a2cf6f freedreno/ir3: sync instr/disasm and add ldib encoding
Resync disasm and instr header from envytools, and add ldib encoding.
This replaces an opcode from a3xx which was never seen in practice,
since that seemed easier than dealing with the same opc # meaning a
different thing on a6xx.  (Not really sure if 'sti' was actually a
real thing, I think it was only seen in fuzzing.)

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-20 18:50:08 -05:00
Rob Clark
cadf6def0c freedreno/ir3/a6xx: fix load_ssbo barrier type.
Silly copy/pasta bug, since load_image is actually the same instruction
but different barrier class.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-20 18:50:08 -05:00
Rob Clark
0df0fc28a5 freedreno/ir3: rename put_dst()
This was overlooked when it moved to ir3_context.c and ceased to be
static.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-20 18:50:08 -05:00
Rob Clark
7fe9e790e7 freedreno: fix crash w/ masked non-SSA dst
Fixes
dEQP-GLES3.functional.shaders.indexing.varying_array.vec3_dynamic_write_dynamic_loop_read
regression.

Fixes: c1a27ba9ba freedreno/ir3: HIGH reg w/a for a6xx
Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-20 18:50:08 -05:00
Rob Clark
8c486083d0 freedreno/a6xx: 3d and cube image fixes
Fixes dEQP-GLES31.functional.image_load_store.{3d,cube}.store.*
and a bunch more

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-20 18:50:08 -05:00
Rob Clark
97479df8aa freedreno/ir3: fix crash in compile fail case
The variant will be NULL if RA failed, which isn't ideal, but at least
let's not segfault and bring down the rest of the dEQP run with us.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-20 18:50:08 -05:00
Rob Clark
f5ee8c54ed freedreno/ir3: fix legalize for vecN inputs
The wrmask is handled in regmask_get()/regmask_set(), but it wasn't
being propagated from SSA src to dst.  So for example, an SSBO read
value that is passed in as src2.y component to atomic op, wasn't
getting the (sy) flag set.  Causing lots of fail.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-20 18:50:08 -05:00
Bas Nieuwenhuizen
688f5e456a radv: Disable depth clamping even without EXT_depth_range_unrestricted.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-20 23:24:31 +00:00
Bas Nieuwenhuizen
9f7e0523ce radv: Implement VK_EXT_depth_clip_enable.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-20 23:24:31 +00:00
Timothy Arceri
03783253b1 nir: remove non-ssa support from nir_copy_prop()
Even in a very basic shader this reduces the time spent in
nir_copy_prop() by ~17%.

No shader-db changes for radeonsi NIR or i965.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-21 10:18:24 +11:00
Bas Nieuwenhuizen
1ef2855692 radv: Handle clip+cull distances more generally as compact arrays.
Needed for https://gitlab.freedesktop.org/mesa/mesa/merge_requests/248 .

That MR keeps the clip and cull arrays split.

So we have to handle
 - compact arrays with location_frac != 0
 - VARYING_SLOT_CLIP_DIST1

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-20 22:49:52 +00:00
Eric Anholt
8cfc17bdda kmsro: Add the rest of the current set of tinydrm drivers.
While I haven't tested them all, given that they're all using the same
allocation paths and modifiers in the kernel they should be fine to use in
the same way.

v2: Rebase on other kmsro changes.
v3: Skip repeated '[with_gallium_kmsro,' in the meson build.

Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-20 21:49:41 +00:00
Andrii Simiklit
f4f4ec941e i965: re-emit index buffer state on a reset option change.
It seems we forget to update the index buffer (ib) state, so the
IndexedDrawCutIndexEnable or CutIndexEnable flag is left unchanged. In
some cases this causes glEnable/glDisable of GL_PRIMITIVE_RESTART to be
ignored. The index buffer (ib) state should be re-emitted after the
reset option changes to avoid this unexpected behavior.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109451
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Signed-off-by: Andrii Simiklit <asimiklit.work@gmail.com>
2019-02-20 20:27:56 +02:00
Kenneth Graunke
d6337b59f6 nir: Don't forget if-uses in new nir_opt_dead_cf liveness check
Commit 08bfd710a2 (nir/dead_cf: Stop
relying on liveness analysis) introduced a new check that iterated
through a SSA def's uses, to see if it's used.  But it only checked
normal uses, and not uses which are part of an 'if' condition.  This
led to it thinking more nodes were dead than possible.
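
The shape of the bug can be sketched as follows. struct def_sketch is a hypothetical, simplified stand-in for an SSA def that just counts the two kinds of uses; it is not NIR's real data structure.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an SSA def: normal uses and uses as an 'if'
 * condition are tracked separately. */
struct def_sketch {
   unsigned num_uses;     /* uses as an instruction source */
   unsigned num_if_uses;  /* uses as the condition of an 'if' */
};

/* Incomplete check: a def that only feeds an 'if' looks unused. */
bool
def_is_unused_incomplete(const struct def_sketch *def)
{
   return def->num_uses == 0;
}

/* Complete check: both kinds of use must be absent. */
bool
def_is_unused(const struct def_sketch *def)
{
   return def->num_uses == 0 && def->num_if_uses == 0;
}
```

A def with no normal uses and one if-use is reported dead by the first function and live by the second.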

Fixes Piglit's variable-indexing/tcs-output-array-float-index-wr test
(and related tests) with the out-of-tree Iris driver.

Fixes: 08bfd710a2 nir/dead_cf: Stop relying on liveness analysis
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-20 09:44:06 -08:00
Kristian H. Kristensen
b9eed05e7f freedreno/a6xx: Support MSAA resolve blits on blitter
This gets stencil and depth resolves working properly.

Fixes:

  dEQP-GLES3.functional.fbo.msaa.2_samples.depth32f_stencil8
  dEQP-GLES3.functional.fbo.msaa.2_samples.depth24_stencil8
  dEQP-GLES3.functional.fbo.msaa.4_samples.depth32f_stencil8
  dEQP-GLES3.functional.fbo.msaa.4_samples.depth24_stencil8
  dEQP-GLES3.functional.fbo.invalidate.whole.unbind_blit_msaa_color
  dEQP-GLES3.functional.fbo.invalidate.sub.unbind_blit_msaa_color

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-02-20 08:56:21 -08:00
Kristian H. Kristensen
686211f4c9 freedreno/a6xx: Copy stencil as R8_UINT
Blitter does support it after all. Previous attempt to use R8_UINT
failed because we overwrote the a6xx format in emit_blit_texture(),
but some of the later setup still looked at the gallium format.

If we overwrite it in the pipe_blit_info before we even call into
emit_blit_texture() it works properly.

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-02-20 08:56:21 -08:00
Kristian H. Kristensen
e827ea8c83 freedreno: Update headers
Add support for multisampled sources for the blitter.

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-02-20 08:56:21 -08:00
Eric Engestrom
a16c398668 anv: use anv_shader_bin_write_to_blob()'s return value
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-20 16:40:13 +00:00
Eric Engestrom
d3115f34a6 anv: drop unused imports
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-20 14:28:55 +00:00
Eric Engestrom
8cbfcab425 anv: make sure the extensions stay sorted
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-20 14:28:55 +00:00
Eric Engestrom
bc76ce1033 anv: sort vendors extensions after KHR and EXT
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-20 14:28:55 +00:00
Eric Engestrom
427aa9d154 anv: sort extensions alphabetically
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-20 14:28:55 +00:00
Tapani Pälli
886cee1f96 anv: anv: refactor error handling in anv_shader_bin_write_to_blob()
v2: blob manages error state internally, just return
    true if errors did not occur (Jason)

CID: 1442546
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-20 15:39:19 +02:00
Carlos Garnacho
30a01cd923 wayland/egl: Ensure EGL surface is resized on DRI update_buffers()
Fullscreening and unfullscreening a totem window while playing a video
sometimes results in the video subsurface not changing size with it. This
is also reproducible with epiphany.

If a surface gets resized while we have an active back buffer for it, the
resized dimensions are neither applied immediately in the resize
callback nor correctly synchronized in update_buffers(), as the
(now stale) surface size and currently attached buffer size still match.

There are actually 2 things to synchronize here: first, the surface query
size might not be updated yet to the wl_egl_window's (i.e. resize_callback
happened while there is a back buffer), and second, the wayland buffers
need dropping if the new surface size differs from the currently attached
buffer's. These are done in separate steps now.

https://bugzilla.redhat.com/show_bug.cgi?id=1650929
https://bugs.freedesktop.org/show_bug.cgi?id=109594

Fixes: a9fb331ea7 ("wayland/egl: update surface size on window resize")
Signed-off-by: Carlos Garnacho <carlosg@gnome.org>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Tested-by: Bastien Nocera <hadess@hadess.net>
Tested-by: Denys Kostin <denys.kostin@globallogic.com>
2019-02-20 12:04:33 +01:00
Lionel Landwerlin
f509213675 anv: implement VK_EXT_depth_clip_enable
A new extension allowing the user to explicitly specify the clipping
behavior.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-02-20 09:57:58 +00:00
Lionel Landwerlin
fa4e103c32 vulkan: Update the XML and headers to 1.1.101
2019-02-20 09:57:58 +00:00
Samuel Iglesias Gonsálvez
63a919a3ce isl: remove the cache line size alignment requirement
The cacheline size was a requirement for using the BLT engine, which
we don't use anymore except for a few things on old HW, so we drop it.

Fixes CTS's CL#3500 test:

dEQP-VK.api.image_clearing.core.clear_color_image.2d.linear.single_layer.r8g8b8_unorm

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-20 08:28:31 +01:00
Bas Nieuwenhuizen
572854e706 radv: Clean up a bunch of compiler warnings.
Random unused vars.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-02-20 03:21:09 +01:00
Bas Nieuwenhuizen
7631feaa00 radv: Sync ETC2 whitelisted devices.
Fixes: 4bb6c49375 "radv: Allow ETC2 on RAVEN and VEGA10 instead of all GFX9."
Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-02-20 02:55:41 +01:00
Timothy Arceri
3d7611e9a6 st/nir: use NIR for asm programs
This uses prog_to_nir to translate ARB assembly programs to NIR.

Co-authored by Tim Arceri, Dave Airlie, and Ken Graunke:

 - [Tim Arceri]: original patch
 - [Dave Airlie]: fix crashes with parameter names
 - [Ken Graunke]:
   - Rebase on SCALAR_ISA cap, lower wpos_ytransform too.
   - Rebase on streamout fixes.
   - Lower system values for fragcoord support.
   - Don't try to use prog_to_nir for ATI_fragment_shader programs.
   - Create TGSI for fixed-function or ARB vertex shaders even if the
     driver prefers NIR, so we can create draw module shaders for
     feedback/select emulation, which rely on TGSI.

Tested on:
- iris (Intel Skylake/Kabylake): Piglit & GL CTS - Ken Graunke
- radeonsi (AMD Vega 64): Piglit - Ken Graunke
- vc4/v3d - Piglit - Eric Anholt
- freedreno - dEQP - Kristian Høgsberg

Fixes lit_degenerate_case on vc4 and v3d, and vp-address-01,
vp-arl-constant-array-huge-offset-neg, and vp-arl-neg-array on v3d.
No Piglit regressions on radeonsi; no dEQP regressions on freedreno.

Acked-by: Eric Anholt <eric@anholt.net>
Tested-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-02-19 15:56:26 -08:00
Kenneth Graunke
3b4929ec6e st/mesa: Copy VP TGSI tokens if they exist, even for NIR shaders.
Even if the driver wants to use NIR shaders, we may need to have TGSI
tokens for creating draw module vertex shaders for the feedback/select
render modes.

So...if the st_vertex_program has any TGSI...copy it to the variant.

Acked-by: Eric Anholt <eric@anholt.net>
Tested-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-02-19 15:56:19 -08:00
Kenneth Graunke
ba7519ca36 radeonsi: Go back to using llvm.pow intrinsic for nir_op_fpow
ARB_vertex_program and ARB_fragment_program define 0^0 = 1 (while GLSL
leaves it undefined).  Performing fpow lowering in NIR would break this
behavior, preventing us from using prog_to_nir.

According to llvm/lib/Target/AMDGPU/SIInstructions.td, POW_common
expands to <V_LOG_F32_e32, V_EXP_F32_e32, V_MUL_LEGACY_F32_e32>,
which presumably does a zero-wins multiply.

Lowering in NIR results in a non-legacy multiply, where:

   pow(0, 0) = 2^(log2(0) * 0)
             = 2^(-INF * 0)
             = 2^(-NaN)
             = -NaN

which isn't the desired result.
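
The difference between the two multiplies can be sketched in plain C. mul_zero_wins() encodes an assumption about the legacy multiply (the "presumably" above) and both function names are made up for this note.

```c
#include <math.h>   /* INFINITY */

/* IEEE multiply: infinity times zero is NaN. */
float
mul_ieee(float a, float b)
{
   return a * b;
}

/* "Zero-wins" multiply (assumed behaviour): any zero operand gives zero,
 * even when the other operand is infinite. */
float
mul_zero_wins(float a, float b)
{
   if (a == 0.0f || b == 0.0f)
      return 0.0f;
   return a * b;
}
```

With log2(0) = -INF and an exponent of 0, the first function yields NaN, so 2^NaN is NaN; the second yields 0, so 2^0 = 1 as the ARB program specs require.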

This reverts:
- commit d6b7539206
  (ac/nir: remove emission of nir_op_fpow)
- commit 22430224fe
  (radeonsi/nir: enable lowering of fpow)

and prevents a regression in gl-1.0-spot-light with AMD_DEBUG=nir
after enabling prog_to_nir in st/mesa later in this series.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-02-19 15:56:19 -08:00
Timothy Arceri
9c4d5926aa radeonsi/nir: set shader_buffers_declared properly
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-02-20 10:46:19 +11:00
Timothy Arceri
94a3df62d7 radeonsi/nir: set colors_read properly
shader-db results for VEGA64:

Totals from affected shaders:
SGPRS: 1976 -> 1976 (0.00 %)
VGPRS: 1240 -> 1144 (-7.74 %)
Spilled SGPRs: 145 -> 145 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 34632 -> 34604 (-0.08 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 261 -> 285 (9.20 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-02-20 10:46:19 +11:00
Timothy Arceri
05cc1dd764 radeonsi/nir: set input_usage_mask properly
shader-db results for VEGA64:

Totals from affected shaders:
SGPRS: 791528 -> 792616 (0.14 %)
VGPRS: 421624 -> 410784 (-2.57 %)
Spilled SGPRs: 1639 -> 1674 (2.14 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 16103516 -> 16063696 (-0.25 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 136307 -> 137830 (1.12 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-02-20 10:46:19 +11:00
Timur Kristóf
9429bcc4b0 radeonsi/nir: Use uniform location when calculating const_file_max.
The nine state tracker can produce NIR uniform variables
whose location is explicitly set. radeonsi did not take that
into account when calculating const_file_max, resulting in
rendering glitches. This patch fixes that.
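
As an illustration only, here is one way a count-based and a location-based calculation can disagree. struct uniform_sketch and both functions are hypothetical; how radeonsi computed the value before this patch is not stated above.

```c
/* Hypothetical, simplified description of one uniform variable. */
struct uniform_sketch {
   unsigned location; /* explicitly assigned constant slot */
   unsigned slots;    /* number of constant slots it occupies */
};

/* Assumes uniforms are packed from slot 0: highest index = total - 1. */
int
const_file_max_by_count(const struct uniform_sketch *u, unsigned n)
{
   unsigned total = 0;
   for (unsigned i = 0; i < n; i++)
      total += u[i].slots;
   return (int)total - 1;
}

/* Honours explicit locations: highest slot index actually touched. */
int
const_file_max_by_location(const struct uniform_sketch *u, unsigned n)
{
   int max = -1;
   for (unsigned i = 0; i < n; i++) {
      int last = (int)(u[i].location + u[i].slots) - 1;
      if (last > max)
         max = last;
   }
   return max;
}
```

For one uniform at location 10 and a two-slot uniform at location 0, the count-based result is 2 while the location-based result is 10.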

Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-02-20 10:37:47 +11:00
Mario Kleiner
afb15d14ca drirc: Add sddm-greeter to adaptive_sync blacklist.
This is the sddm login screen.

Fixes: a9c36dbf9c ("drirc: Initial blacklist for adaptive sync")
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-02-19 18:03:05 -05:00
Marek Olšák
bff8da6c59 driconf: add Civ6Sub executable for Civilization 6
I'm getting Civ6Sub instead of Civ6.

Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-02-19 17:59:17 -05:00
Marek Olšák
ae21bdf47c radeonsi: always enable NIR for Civilization 6 to fix corruption
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104602

Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-02-19 17:59:17 -05:00
Marek Olšák
ccbfe44e5f radeonsi: add driconf option radeonsi_enable_nir
Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-02-19 17:59:17 -05:00
Kenneth Graunke
f9c835eb56 mesa: Align doubles to a 64-bit starting boundary, even if packing.
In the new Intel Iris driver, I am using Tim's new packed uniform
storage system.  It works great, with one caveat: our scalar compiler
backend assumes that uniform offsets will be aligned to the underlying
data type.  For example, doubles must be 64-bit aligned, floats 32-bit,
half-floats 16-bit, and so on.  It does not need any other padding.

Currently, _mesa_add_parameter aligns everything to 32-bit offsets,
creating doubles that have an unaligned offset.  This patch alters
that code to align doubles to 64-bit offsets.

This may be slightly less optimal for drivers which can support full
packing, and allow reads from unaligned offsets at no penalty.  We could
make this extra alignment optional.  However, it only comes into play
when intermixing double and single precision uniforms.  Doubles are
already not too common, and intermixed values (floats then doubles)
are probably even less common.  At most, we burn a single 32-bit slot
to the alignment, which is not that expensive.  So, it doesn't seem
worthwhile to add the extra complexity.
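
A minimal sketch of the alignment rule, with offsets counted in 32-bit slots. align_param_offset() is a hypothetical helper written for this note, not the code in _mesa_add_parameter.

```c
#include <stdbool.h>

/* Offsets are in 32-bit slots. A 64-bit value must start on an even slot;
 * everything else stays tightly packed. */
unsigned
align_param_offset(unsigned offset, bool is_64bit)
{
   if (is_64bit)
      return (offset + 1) & ~1u;
   return offset;
}
```

A double following a single float at slot 0 would move from slot 1 to slot 2, which is the one slot of padding mentioned above.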

Eventually, we'll likely want to update this code to allow half-float
values to be packed tighter than 32-bit offsets.  At that point, we'll
probably want to revisit what drivers ultimately want, and add options.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-02-19 13:26:58 -08:00
Kenneth Graunke
3c2c6bd1c7 compiler: Make is_64bit(GL_*) helper more broadly available
I'd like to use this in the prog_parameter.c code, so I need to move it
into C, make it non-static, and so on.  This probably isn't the ideal
place for it, but I couldn't think of a better one.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-02-19 13:26:58 -08:00
Eric Engestrom
daf8ada08d gitlab-ci: automatically run the CI on pushes to ci/* branches
Last commit limited the CI to master and MRs, but to avoid having to
manually trigger CI runs, let's add a 3rd, automatic way: by pushing to
a branch named `ci/*` (or `ci-*` or just `ci`) (which you can delete
afterwards, the pipeline results will remain).

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
2019-02-19 16:57:32 +00:00
Eric Engestrom
861ade7042 gitlab-ci: limit the automatic CI to master and MRs
Runs on random other branches (stable RCs, personal forks) can still be
triggered manually via the web interface, or an app using the API.

This should massively help with the current voracious state of our CI.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
2019-02-19 16:57:28 +00:00
Eric Engestrom
f84f833981 tegra/autotools: add missing libdrm cflags
Fixes: f1374805a8 "drm-uapi: use local files, not system libdrm"
Bug: https://bugs.freedesktop.org/show_bug.cgi?id=109647
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-02-19 13:29:05 +00:00
Eric Engestrom
b787403a21 tegra/meson: add missing dep_libdrm
Fixes: f1374805a8 "drm-uapi: use local files, not system libdrm"
Bug: https://bugs.freedesktop.org/show_bug.cgi?id=109645
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-02-19 13:29:00 +00:00
Rhys Perry
238730daef ac/nir: implement half-float nir_op_ldexp
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-19 11:04:46 +00:00
Rhys Perry
6971e8d342 ac/nir: implement half-float nir_op_frsq
v2: don't use ac_get_onef()

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-19 11:04:41 +00:00
Rhys Perry
2038aec22a ac/nir: implement half-float nir_op_frcp
v2: don't use ac_get_onef()

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-19 11:04:35 +00:00
Rhys Perry
4261edc067 ac/nir: make ac_build_fdiv support 16-bit floats
v2: don't use ac_get_onef()

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-19 11:04:29 +00:00
Rhys Perry
6790b3a8db ac/nir: make ac_build_isign work on all bit sizes
v2: don't use ac_get_zero(), ac_get_one() and ac_int_of_size()

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-19 11:04:20 +00:00
Rhys Perry
bbbfdef683 ac/nir: make ac_build_clamp work on all bit sizes
v2: don't use ac_get_zerof() and ac_get_onef()
v3: rename "intr" to "name"

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-19 11:03:58 +00:00
Rhys Perry
7e5004e30a ac/nir: fix 64-bit nir_op_f2f16_rtz
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-19 11:03:44 +00:00
Rhys Perry
c4ea20c0a0 ac/nir: implement 8-bit nir_load_const_instr
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-19 11:03:33 +00:00
Rhys Perry
0ca550e01a radv: ensure export arguments are always float
So that the signature is correct and consistent, the inputs to an export
intrinsic should always be 32-bit floats.

This and the previous commit fix a large number of crashes from
dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.input_output_int_*
tests

Fixes: b722b29f10 ('radv: add support for 16bit input/output')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-19 11:03:22 +00:00
Rhys Perry
64065aa504 radv: bitcast 16-bit outputs to integers
16-bit outputs are stored as 16-bit floats in the outputs array, so they
have to be bitcast.

Fixes: b722b29f10 ('radv: add support for 16bit input/output')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-19 11:03:18 +00:00
Eric Engestrom
23b485c920 gitlab-ci: use ccache to speed up builds
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-19 10:09:51 +00:00
Eric Anholt
dbe3af67a4 v3d: Move i2b and f2b support into emit_comparison.
This lets us save a resolve to NIR true/false for ifs and discard_if.  No
change in shader-db.
2019-02-18 18:18:37 -08:00
Eric Anholt
0bba9c8489 v3d: Emit a simpler negate for the iabs implementation.
One program affected in my shader-db.

instructions in affected programs: 110 -> 108 (-1.82%)
2019-02-18 18:13:09 -08:00
Eric Anholt
1a775d43c9 v3d: Delay emitting ldvpm on V3D 4.x until it's actually used.
For V3D 3.x, we emitted the ldvpms all at the top so that we didn't need
to do VPM setup when the load_inputs are out of order.  For V3D 4.x, we
can reduce register pressure by delaying our loads until they're actually
needed.  This also avoids a bunch of silly MOVs in the pre-opt VIR dump.

total instructions in shared programs: 6421415 -> 6419933 (-0.02%)
total uniforms in shared programs: 2393139 -> 2393140 (<.01%)
total threads in shared programs: 153864 -> 153906 (0.03%)
2019-02-18 18:09:07 -08:00
Eric Anholt
5a84d46896 v3d: Stop tracking num_inputs for VPM loads.
It's unused in the VS (since we need vattr_sizes[] anyway), so move it to
FS prog data.
2019-02-18 18:09:07 -08:00
Eric Anholt
581eba072d v3d: Add a function to describe what the c->execute.file check means.
This is what pointed out that we were misusing the check for last_thrsw in
the previous commit.
2019-02-18 18:09:07 -08:00
Eric Anholt
441294962c v3d: Fix the check for "is the last thrsw inside control flow"
The execute.file check used to be good enough, until I stopped setting up
the execute mask for uniform ifs.

No known tests fixed, noticed while doing a refactor.

Fixes: 0805060573 ("v3d: Handle dynamically uniform IF statements with uniform control flow.")
2019-02-18 18:09:07 -08:00
Eric Anholt
07d5b5a972 v3d: Fix f2b32 behavior.
Now that we don't have the vir_PF() magic, it's obvious that we were doing
the wrong thing for f2b32 by allowing -0.0 to produce true instead of
false.
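
The -0.0 pitfall can be reproduced in plain C. This is a sketch of one way such a result can arise (testing the raw bits instead of the float value), not the VIR code itself; the function names are made up.

```c
#include <stdint.h>
#include <string.h>

/* 32-bit booleans: 0 for false, all ones for true. */
#define B32_TRUE  0xffffffffu
#define B32_FALSE 0u

/* Integer-style test on the raw bits: -0.0 is 0x80000000, so it reads
 * as true. */
uint32_t
f2b32_int_compare(float x)
{
   uint32_t bits;
   memcpy(&bits, &x, sizeof(bits));
   return bits != 0 ? B32_TRUE : B32_FALSE;
}

/* Float comparison: +0.0 and -0.0 compare equal, so both give false. */
uint32_t
f2b32_float_compare(float x)
{
   return x != 0.0f ? B32_TRUE : B32_FALSE;
}
```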
2019-02-18 18:09:07 -08:00
Eric Anholt
3022b4bd82 v3d: Kill off vir_PF(), which is hard to use right.
You were allowed to pass in any old temp so that you could hopefully fold
the PF up into the def of the temp.  If we couldn't find one, it
implicitly generated a MOV(nop, reg).  However, that PF could have
different behavior depending on whether the def being folded into was a
float or int opcode, which the caller doesn't necessarily control.

Due to the fragility of the function, just switch all callers over to
vir_set_pf().  This also encourages the callers to use a _dest call for
the inst they're putting the PF on, eliminating a bunch of temps in the
pre-optimization VIR.

shader-db says the change is in the noise:

total instructions in shared programs: 6226247 -> 6227184 (0.02%)
instructions in affected programs: 851068 -> 852005 (0.11%)
2019-02-18 18:09:06 -08:00
Eric Anholt
6186a8d44e v3d: Do bool-to-cond for discard_if as well.
Turns this minimal conditional discard (glsl-fs-discard-01.shader_test):

0x3de0b086c5fe9000 fcmp.pushn  -, r1, r5; mov  r2, 0
0x3dec3086bbfc001f nop                  ; mov.ifa  r2, -1
0x3c047186bbe80000 nop                  ; mov.pushz  -, r2
0x3dea3186ba837000 setmsf.ifna  -, 0    ; nop

into:

0x3c00b186c582a000 fcmp.pushn  -, r2, r5; nop
0x3de83186ba837000 setmsf.ifa  -, 0     ; nop

total instructions in shared programs: 6229820 -> 6226247 (-0.06%)
2019-02-18 18:09:06 -08:00
Eric Anholt
718eef62cb v3d: Refactor bcsel and if condition handling.
Both were doing the same thing to try to get a condition to predicate on.
Noticed when I wanted to do this for discard_if as well.

No change in shader-db.
2019-02-18 18:09:06 -08:00
Eric Anholt
4586f9f902 v3d: Add a helper function for getting a nop register.
Just a little refactor to explain what's going on with QFILE_NULL.
2019-02-18 18:09:06 -08:00
Eric Anholt
339155122b v3d: Drop our hand-lowered nir_op_ffract.
The NIR lowering works fine, though it causes some slight noise due to
what looks like choices about propagating constants up multiply chains
changing.

total instructions in shared programs: 6229671 -> 6229820 (<.01%)
total uniforms in shared programs: 2312171 -> 2312324 (<.01%)
2019-02-18 18:09:06 -08:00
Eric Anholt
16f5085490 v3d: Drop a perf note about merging unpack_half_*, which has been implemented.
This is handled with copy-propagation now.
2019-02-18 18:09:06 -08:00
Eric Anholt
146e432b49 v3d: Fix incorrect flagging of ldtmu as writing r4 on v3d 4.x.
Fixes some stalls in 3DMMES's main vertex shader.

total instructions in shared programs: 6280751 -> 6211270 (-1.11%)
instructions in affected programs: 2935050 -> 2865569 (-2.37%)
2019-02-18 18:09:06 -08:00
Eric Anholt
cd5e0b2729 v3d: Use the early_fragment_tests flag for the shader's disable-EZ field.
Apparently we need disable-EZ flagged, not just "does Z writes".

Fixes
dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_depth_fbo
on 7278, even though it passed in simulation.

Signed-off-by: Eric Anholt <eric@anholt.net>
Fixes: 051a41d3d5 ("v3d: Add support for the early_fragment_tests flag.")
2019-02-18 18:09:06 -08:00
Eric Anholt
332b969c4e v3d: Sync indirect draws on the last rendering.
Fixes intermittent fails in
dEQP-GLES31.functional.draw_indirect.compute_interop.separate.drawelements_compute_cmd_and_data_and_indices
and others (particularly when run as part of a CTS run)
2019-02-18 18:09:06 -08:00
Eric Anholt
32f16b0b1e v3d: Clear the GMP on initialization of the simulator.
Otherwise, we might have pages accessible that shouldn't be and miss out
on errors.  This is unlikely for most tests since v3d_hw_get_mem() is big
enough that it'll be a freshly zeroed mmap, but if screens are destroyed
and recreated then we'd be reusing the old v3d_hw_get_mem() contents.
2019-02-18 18:09:06 -08:00
Emil Velikov
ba652394a3 docs: update calendar, add news item and link release notes for 18.3.4
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2019-02-18 18:38:14 +00:00
Emil Velikov
d7108dac73 docs: add sha256 checksums for 18.3.4
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit bfb5bdaa97)
2019-02-18 18:36:23 +00:00
Emil Velikov
a1ccff4aaf docs: add release notes for 18.3.4
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit b26488dead)
2019-02-18 18:36:21 +00:00
Ilia Mirkin
57441af8bf i965: always enable EXT_float_blend
From the table in isl_format.c, it appears that all generations
support blending on 32-bit float surfaces.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-18 12:13:54 -05:00
Ilia Mirkin
9fec653093 st/mesa: enable GL_EXT_float_blend when possible
If the driver supports PIPE_BIND_BLENDABLE on RGBA32F, flip
EXT_float_blend on (which will affect ES3 contexts).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
2019-02-18 12:13:54 -05:00
Ilia Mirkin
070a5e5d92 mesa: add explicit enable for EXT_float_blend, and error condition
If EXT_float_blend is not supported, error out on blending of FP32
attachments in an ES2 context.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-18 12:13:54 -05:00
Samuel Pitoiset
47616810ed radv: fix writing the alpha channel of MRT0 when alpha coverage is enabled
This version is better and safer.

Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-18 18:06:07 +01:00
Rob Clark
d6c43cceff freedreno/ir3: handle quirky atomic dst for a6xx
The new encoding returns a value via the 2nd src.  The legalize pass
needs to be aware of this to set the correct needs_sy flag.  Otherwise,
in cases where the atomic dst is not used, we can overwrite the register
that the hardware will asynchronously load the result into without a
(sy) flag, so it gets clobbered by the atomic result.

This fixes a whole lot of rando ssbo+atomic fails, like
dEQP-GLES31.functional.ssbo.layout.single_basic_type.packed.highp_vec4.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-18 12:01:36 -05:00
Rob Clark
28fc6733cd freedreno/a6xx: fix helper_invocation (sampler mask/id)
Since gl_HelperInvocation is lowered to:

  !((1 << sample_id) & sample_mask_in)

Not setting these enable bits was causing it to be broken.  (And probably a
bunch of other stuff too.)
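
As a rough illustration of the lowering above, the expression can be
evaluated on plain integers.  This is only a sketch: the function name and
the 32-bit mask width are assumptions made for the example, not the ir3
code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: an invocation counts as a helper when the bit for
 * its sample is clear in the incoming sample mask. */
static bool
is_helper_invocation(uint32_t sample_id, uint32_t sample_mask_in)
{
   assert(sample_id < 32);   /* assumed mask width for this sketch */
   return !((UINT32_C(1) << sample_id) & sample_mask_in);
}
```

If sample_id or sample_mask_in is never provided to the shader, the result
of this expression is meaningless, which matches the breakage described
above.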

Fixes dEQP-GLES31.functional.shaders.helper_invocation.*

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-18 10:37:54 -05:00
Samuel Pitoiset
32ab7a59bb radv: remove unused variable in gather_push_constant_info()
Trivial.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-18 13:30:16 +01:00
Lionel Landwerlin
8c87d029bc i965: scale factor changes should trigger recompile
Found by inspection.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 3da858a6b9 ("intel/compiler: add scale_factors to sampler_prog_key_data")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-02-18 12:18:13 +00:00
Samuel Pitoiset
0d8f096293 radv: write the alpha channel of MRT0 when alpha coverage is enabled
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109597
Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-18 12:14:22 +01:00
Samuel Pitoiset
2cf5433b99 ac: use new LLVM 8 intrinsic when loading 16-bit values
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-18 12:14:20 +01:00
Samuel Pitoiset
f0223143a8 ac: add ac_build_llvm8_tbuffer_load() helper
It uses the new LLVM intrinsics.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-18 12:14:17 +01:00
Tapani Pälli
9762a9f893 mesa: return NULL if we exceed MaxColorAttachments in get_fb_attachment
This fixes invalid access to Attachment array which would occur if caller
would exceed MaxColorAttachments. In practice this should not ever happen
because DiscardFramebufferEXT specifies only GL_COLOR_ATTACHMENT0 to be
valid and InvalidateFramebuffer will error out before but this should
make coverity happy.
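
The shape of the check is roughly the following.  This is a minimal
sketch with made-up struct and function names (the real
get_fb_attachment works on Mesa's framebuffer types), and the array size
is an arbitrary value chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_COLOR_ATTACHMENTS 8   /* illustrative array size, not Mesa's */

struct attachment { int id; };

struct framebuffer {
   unsigned max_color_attachments;                  /* runtime limit */
   struct attachment color[MAX_COLOR_ATTACHMENTS];
};

/* Hypothetical lookup: return NULL instead of indexing past the limit. */
static struct attachment *
get_color_attachment(struct framebuffer *fb, unsigned index)
{
   assert(fb->max_color_attachments <= MAX_COLOR_ATTACHMENTS);
   if (index >= fb->max_color_attachments)
      return NULL;
   return &fb->color[index];
}
```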

v2: const, remove _EXT (Ian)

CID: 1442559
Fixes: 0c42b5f3cb "mesa: wire up InvalidateFramebuffer"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-02-18 07:51:55 +02:00
Alyssa Rosenzweig
2c6a7fbeb7 panfrost: Fix clipping region
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-18 05:13:50 +00:00
Alyssa Rosenzweig
fa1b36ddc2 panfrost: Preserve w sign in perspective division
This fixes issues where polygons that should be culled (due to negative
w, for instance) may not be.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-18 05:13:34 +00:00
Alyssa Rosenzweig
49985cebea panfrost: Cleanup mali_viewport (clipping) code
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-18 05:13:03 +00:00
Alyssa Rosenzweig
a94463732a panfrost: Swap order of tiled texture (de)alloc
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-18 05:10:33 +00:00
Alyssa Rosenzweig
4a4ed53c01 panfrost: Free imported BOs
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-18 05:10:06 +00:00
Alyssa Rosenzweig
b5a01296f4 panfrost: Fix various leaks unmapping resources
v2: Don't check for NULL before free()

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-18 05:09:41 +00:00
Kenneth Graunke
535251487b nir: Don't reassociate add/mul chains containing only constants
The idea here is to reassociate a * (b * c) into (a * c) * b, when
b is a non-constant value, but a and c are constants, allowing them
to be combined.

But nothing was enforcing that 'b' must be non-constant, which meant
that running opt_algebraic in a loop would never terminate if the IR
contained non-folded constant expressions like 256 * 0.5 * 2.  Normally,
we call constant folding in such a loop too, but IMO it's better for
nir_opt_algebraic to be robust and not rely on that.
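
To make the missing condition concrete, here is a small sketch of the
guard in plain C.  The operand struct and function names are invented
for the example; the real rule lives in the nir_opt_algebraic pattern
table, not in code like this.

```c
#include <assert.h>
#include <stdbool.h>

struct operand {
   bool is_const;
   double value;   /* only meaningful when is_const is true */
};

/* a * (b * c) -> (a * c) * b only pays off when a and c are constants
 * and b is not.  Without the !b->is_const test, an all-constant chain
 * matches again after every rewrite, so a fixed-point loop never ends. */
static bool
should_reassociate(const struct operand *a,
                   const struct operand *b,
                   const struct operand *c)
{
   return a->is_const && c->is_const && !b->is_const;
}

/* Combine the two constants; the caller multiplies the result by b. */
static double
fold_constants(const struct operand *a,
               const struct operand *b,
               const struct operand *c)
{
   assert(should_reassociate(a, b, c));
   return a->value * c->value;
}
```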

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109581
Fixes: 32e266a9a5 i965: Compile fp64 funcs only if we do not have 64-bit hardware support

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-02-16 23:36:14 -08:00
Chris Wilson
e9882b879b i965: Assert the execobject handles match for this device
Object handles are local to the device fd, so double check we are not
mixing together objects from multiple screens on execbuf submission.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-02-16 23:35:29 -08:00
Rob Clark
99b90ecd35 freedreno/a6xx: cache flush harder
Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-16 16:28:00 -05:00
Rob Clark
1af0c5d320 freedreno/a6xx: compute support
Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-16 16:28:00 -05:00
Rob Clark
5118dcf8c3 freedreno/a6xx: image/ssbo state emit
Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-16 16:28:00 -05:00
Rob Clark
2183d9cff7 freedreno/a6xx: border-color offset helper
Soon we'll need this logic to deal w/ image/SSBO case, so split out a
helper rather than duplicate the logic.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-16 16:28:00 -05:00
Rob Clark
c1a27ba9ba freedreno/ir3: HIGH reg w/a for a6xx
It seems like some instructions (noticed this w/ cat3) cannot read HIGH
regs.  cat1 (mov/cov) can, and possibly some/all of cat2.

The blob seems to stick w/ an extra mov into low regs.  So let's do the
same.

This fixes WGID on a6xx, which unsurprisingly is related to a lot of
deqp compute fails.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-16 16:28:00 -05:00
Rob Clark
947848524d freedreno/ir3: add a6xx+ SSBO/image support
Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-16 16:28:00 -05:00
Rob Clark
b46d5b8a84 freedreno/ir3: add a6xx instruction encoding
For the handful of instructions that use a new encoding.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-16 16:27:59 -05:00
Rob Clark
2e0ea3f09c freedreno/ir3: add image/ssbo <-> ibo/tex mapping
Images and SSBOs don't map directly to the hw.  They end up being part
texture and part something else.  Starting with a6xx, the hack used for
a5xx to smash the image tex state into hw texture state starting from
MAX counting down won't work, because we start using tex state also for
SSBO read.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-16 16:27:59 -05:00
Rob Clark
75f3a5245e freedreno/ir3: fix ncomp for _store_image() src
Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-16 16:27:59 -05:00
Rob Clark
feee3050d3 freedreno/ir3: split out a4xx+ instructions
Note that image/ssbo support is currently only implemented for a5xx.
But the instruction encoding is the same for a4xx.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-16 16:27:59 -05:00
Rob Clark
42af0640f6 freedreno/ir3: split out image helpers
Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-16 16:27:59 -05:00
Rob Clark
aefdb9bed2 freedreno/a6xx: clean up some open-coded bits
Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-16 16:27:59 -05:00
Rob Clark
b51de44dea freedreno/a6xx: move stream-out emit to helper
Split out of the main fd6_emit() code, since it was already getting to
be a pretty giant function.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-16 16:26:14 -05:00
Rob Clark
c0d6be11d6 freedreno/ir3: fix varying packing vs. tex sharp edge
We probably need to rethink how we detect which instruction first
defines higher register classes.  But for now, this at least fixes
the symptom.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-16 16:26:14 -05:00
Samuel Pitoiset
52bdb043af radv: fix invalid element type when filling vertex input default values
The elements added into a vector should have the same type as the
first one, otherwise this hits an assertion in LLVM.

Fixes: 4b3549c084 ("radv: reduce the number of loaded channels for vertex input fetches")
reported-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-16 15:33:18 +01:00
Eleni Maria Stea
7188e2ba15 i965: Removed the field etc_format from the struct intel_mipmap_tree
After the previous changes to emulate the ETC/EAC formats using the
secondary shadow miptree, the etc_format field of the intel_mipmap_tree
struct became redundant and the remaining check that used it has been
replaced. (Nanley Chery)

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2019-02-15 15:54:41 -08:00
Eleni Maria Stea
248f2e7888 i965: Enabled the OES_copy_image extension on Gen 7 GPUs
OES_copy_image extension was disabled on Gen7 due to the lack of support
for ETC2 images. Enabled it back. (Kenneth Graunke)

v2:
  - Removed the blank lines in the comments above OES_copy_image and
  OES_texture_view extensions in intel_extensions.c (Nanley Chery)

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2019-02-15 15:54:41 -08:00
Eleni Maria Stea
db0c379c06 i965: Fixed the CopyImageSubData for ETC2 on Gen < 8
For CopyImageSubData to copy the data during the 1st draw call, we need
to update the shadow tree right before the rendering.

v2:
  - Added assertion that the miptree doesn't need update at the time we
  update the texture surface. (Nanley Chery)

v3:
  - As we now update the tree before the rendering we don't need to copy
  the data during the unmap anymore. Removed the unnecessary update from
  the intel_miptree_unmap in intel_mipmap_tree.c (Nanley Chery)

v4:
  - Fixed unrelated empty line removal (Nanley Chery)
  - As intel_miptree_update_etc_shadow in intel_mipmap_tree.c is now only
  called inside the function that follows it, we don't need to declare it
  at the top of the file anymore. (Nanley Chery)

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2019-02-15 15:54:41 -08:00
Eleni Maria Stea
d8eb7287fe i965: Faking the ETC2 compression on Gen < 8 GPUs using two miptrees.
GPUs Gen < 8 cannot sample ETC2 formats. So far, they converted the
compressed EAC/ETC2 images to non-compressed RGBA images. When
GetCompressed* functions were called, the pixels were returned in this
RGBA format and not the compressed format that was expected.

Trying to fix this problem, we use a secondary shadow miptree to store the
decompressed data for the rendering and the main miptree to store the
compressed for the Get functions to work. Each time that the main miptree
is written with compressed data, we decompress them to RGB and update the
shadow. Then we use the shadow for rendering.
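
The scheme described above can be sketched in a few lines of C.
Everything here is a stand-in: the struct, the function names and the
byte-inverting "decompression" are assumptions for illustration, and
only the shadow_needs_update flag name is taken from the notes below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TEX_BYTES 16   /* arbitrary size for the example */

struct texture {
   unsigned char main_data[TEX_BYTES];     /* compressed data, as uploaded */
   unsigned char shadow_data[TEX_BYTES];   /* decompressed copy for rendering */
   bool shadow_needs_update;
};

/* Stand-in for the real decompression step: it just inverts each byte. */
static void
fake_decompress(unsigned char *dst, const unsigned char *src, size_t n)
{
   for (size_t i = 0; i < n; i++)
      dst[i] = (unsigned char)~src[i];
}

/* Every write goes to the main copy and marks the shadow as stale. */
static void
texture_write(struct texture *t, const unsigned char *src, size_t n)
{
   assert(n <= TEX_BYTES);
   memcpy(t->main_data, src, n);
   t->shadow_needs_update = true;
}

/* Rendering reads the shadow, refreshing it first if it is stale. */
static const unsigned char *
texture_prepare_for_render(struct texture *t)
{
   if (t->shadow_needs_update) {
      fake_decompress(t->shadow_data, t->main_data, TEX_BYTES);
      t->shadow_needs_update = false;
   }
   return t->shadow_data;
}
```

Reads of the compressed data would use main_data directly, so the Get
functions see exactly what was uploaded.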

v2:
   - Fixes in the commit message (Nanley Chery)
   - Reversed the changes in brw_get_texture_swizzle and swapped the b, g
   values at the time that we decompress the data in the function:
   intel_miptree_update_etc_shadow of intel_mipmap_tree.c (Nanley Chery)
   - Simplified the format checks in the miptree_create function of the
   intel_mipmap_tree.c and reserved the call of the
   intel_lower_compressed_format for the case that we are faking the ETC
   support (Nanley Chery)
   - Removed the check for the auxiliary usage for the shadow miptree at
   creation (miptree_create of intel_mipmap_tree.c) as we won't use
   auxiliary buffers with these types of trees (Nanley Chery)
   - Set the etc_format of the non-ETC miptrees to MESA_FORMAT_NONE and
   removed the unnecessary checks (Nanley Chery)
   - Fixed an unrelated indentation change (Nanley Chery)
   - Modified the function intel_miptree_finish_write to set the
   mt->shadow_needs_update to true to catch all the cases when we need to
   update the miptree (Nanley Chery)
   - In order to update the shadow miptree during the unmap of the
   main miptree, and to always map the main miptree (Nanley Chery), the
   following change was necessary: split the previous update function,
   which updated all the mipmap levels, into two functions: one that
   updates one level and one that updates all of them. The first is used
   during unmap and the second before the rendering.
   - Removed the BRW_MAP_ETC_BIT flag and the mechanism to decide which
   miptree should be mapped each time and reversed all the changes in the
   higher level texture functions that upload data to textures as they
   aren't needed anymore.
   - Replaced the boolean needs_fake_etc with an inline function that
   checks when we need to fake the ETC compression (Nanley Chery)
   - Removed the initialization of the strides in the update function as
   the values will be overwritten by the intel_miptree_map call (Nanley
   Chery)
   - Used minify instead of division in the new update function
   intel_miptree_update_etc_shadow_levels in intel_mipmap_tree.c (Nanley
   Chery)
   - Removed the depth from the calculation of the number of slices in
   the new update function (intel_miptree_update_etc_shadow_levels of
   intel_mipmap_tree.c) as we don't need to support 3D ETC images.
   (Nanley Chery)
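
For the "minify instead of division" item in the v2 notes, the level
size computation has roughly this shape.  The function below is a
self-contained sketch written for this example; it assumes, as the
helper name suggests, that a dimension is halved per level and clamped
so it never drops below 1.

```c
#include <assert.h>

/* Size of one dimension at a given mip level.  A plain shift or
 * division would yield 0 for small textures at high levels; the clamp
 * keeps the result at 1 or more. */
static unsigned
minify_sketch(unsigned value, unsigned level)
{
   unsigned shifted;

   assert(level < 32);   /* assumes a 32-bit unsigned for this sketch */
   shifted = value >> level;
   return shifted > 0 ? shifted : 1;
}
```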

v3:
  - Renamed the rgba_fmt in function miptree_create
  (intel_mipmap_tree.c) to decomp_format as the format is not always in
  rgba order. (Nanley Chery)
  - Documented the new usage for the shadow miptree in the comment above
  the field in the intel_miptree struct in intel_mipmap_tree.h (Nanley
  Chery)
  - Removed the redundant flags from the mapping of the miptrees in
  intel_miptree_update_etc_shadow of intel_mipmap_tree.c (Nanley Chery)
  - Fixed the switch from surface's logical level to physical level in
  the intel_miptree_update_etc_shadow_levels of intel_mipmap_tree.c
  (Nanley Chery)
  - Excluded the Baytrail GPUs from the check for the ETC emulation as
  they support the ETC formats natively. (Nanley Chery)
  - Simplified the check if the format is BGRA in
  intel_miptree_update_etc_shadow of intel_mipmap_tree.c (Nanley Chery)

v4:
  - Removed the functions intel_miptree_(map|unmap)_etc and the check if
   we need to call them as with the new changes, they became unreachable.
   (Nanley Chery)
  - We'd rather calculate the level width and height using the shadow
  miptree instead of the main in intel_miptree_update_etc_shadow_levels of
  intel_mipmap_tree.c (Nanley Chery)
  - Fixed the format in the mt_surface_usage, set at the miptree creation,
   in miptree_create of intel_mipmap_tree.c (Nanley Chery)

v5:
  - Fixed the levels calculations in intel_mipmap_tree.c (Nanley Chery)
  - Update the flag shadow_needs_update outside the function
  intel_miptree_update_etc_shadow (Nanley Chery)
  - Fixed indentation error (Nanley Chery)

v6:
  - Fixed typo in commit message (Nanley Chery)
  - Simplified the assignment of the mt_fmt in the miptree_create of the
  intel_mipmap_tree.c (Nanley Chery)
  - Combined declarations and assignments where it was possible in the
  intel_miptree_update_etc_shadow and
  intel_miptree_update_etc_shadow_levels of the intel_mipmap_tree.c
  (Nanley Chery)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81843
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104272
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2019-02-15 15:54:41 -08:00
Nanley Chery
c6dada70f0 i965: Rename intel_mipmap_tree::r8stencil_* -> ::shadow_*
Use more generic field names. We'll reuse these fields for a workaround
with ASTC miptrees.

Reviewed-by: Eleni Maria Stea <estea@igalia.com>
2019-02-15 15:54:41 -08:00
Timothy Arceri
a801196ec9 nir: remove simple dead if detection from nir_opt_dead_cf()
This was probably useful when it was first written, however it
looks to be no longer necessary.

As far as I can tell these days dce is smart enough to remove useless
instructions from if branches. Once this is done
nir_opt_peephole_select() will end up removing the empty if.

Removing this support reduces the dolphin uber shader compilation
time spent in nir_opt_dead_cf() by a little over 7x.

No shader-db changes on i965 or radeonsi.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-02-16 10:45:31 +11:00
Alok Hota
f695e43354 swr/rast: Add translation support to streamout
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2019-02-15 14:54:29 -06:00
Alok Hota
a7fa0cc0a5 swr/rast: simdlib cleanup, clipper stack space fixes
Reduce stack space used by clipper, which had led to crashes in some
versions of MSVC

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2019-02-15 14:54:23 -06:00
Alok Hota
f9c29a301a swr/rast: convert DWORD->uint32_t, QWORD->uint64_t
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2019-02-15 14:54:19 -06:00
Alok Hota
c503b58878 swr/rast: Refactor scratch space variable names
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2019-02-15 14:54:14 -06:00
Alok Hota
0b4db43705 swr/rast: FP consistency between POSH/RENDER pipes
- Ensure all threads have optimal floating-point control state
- Disable auto-generation of fused FP ops for VERTEX shader stage
- Disable "fast" FP ops for VERTEX shader stage

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2019-02-15 14:54:09 -06:00
Alok Hota
dc7b3c95a4 swr/rast: Move knob defaults to generated cpp file
Reduces amount of compile churn when testing different default values

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2019-02-15 14:54:04 -06:00
Alok Hota
05e4ff33f5 swr/rast: Flip BitScanReverse index calculation
The intrinsic returns the number of leading zeros, not the bit number of
the first nonzero, so just flip it based on the mask size
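
The flip amounts to subtracting the leading-zero count from the highest
bit index.  A minimal sketch, assuming a GCC/Clang toolchain for the
__builtin_clz family; the function names are made up and this is not
the swr code.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the highest set bit in a 32-bit mask. */
static unsigned
bit_scan_reverse32(uint32_t mask)
{
   assert(mask != 0);   /* __builtin_clz is undefined for 0 */
   return 31u - (unsigned)__builtin_clz(mask);
}

/* Same idea for a 64-bit mask: flip against 63 instead of 31. */
static unsigned
bit_scan_reverse64(uint64_t mask)
{
   assert(mask != 0);
   return 63u - (unsigned)__builtin_clzll(mask);
}
```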

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2019-02-15 14:53:58 -06:00
Alok Hota
ae400a9b11 swr/rast: Correctly align 64-byte spills/fills
Fixes crashes on some compute shaders when running on AVX512

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2019-02-15 14:53:54 -06:00
Alok Hota
78bab66479 swr/rast: Disable use of __forceinline by default
- Was not useful to inline in release builds
- FORCEINLINE can be used if absolutely necessary

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2019-02-15 14:52:51 -06:00
Alok Hota
20d5c88760 swr/rast: Convert system memory pointers to gfxptr_t
Fulfills an unused internal interface

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2019-02-15 14:52:32 -06:00
Bas Nieuwenhuizen
4b03a19a0b radv: Use correct num formats to detect whether we should use 1.0 or 1.
normalized and scaled formats also return floats.

Fixes: 4b3549c084 ("radv: reduce the number of loaded channels for vertex input fetches")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-15 20:24:16 +00:00
Ian Romanick
979b43b347 nir/algebraic: Simplify comparison with sequential integers starting with 0
All of the affected shaders are Unreal4 demos.

All Gen6+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15437170 -> 15437001 (<.01%)
instructions in affected programs: 21536 -> 21367 (-0.78%)
helped: 43
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 3.93 x̃: 4
helped stats (rel) min: 0.68% max: 1.01% x̄: 0.80% x̃: 0.80%
95% mean confidence interval for instructions value: -4.07 -3.79
95% mean confidence interval for instructions %-change: -0.83% -0.77%
Instructions are helped.

total cycles in shared programs: 383007896 -> 383007378 (<.01%)
cycles in affected programs: 158640 -> 158122 (-0.33%)
helped: 38
HURT: 4
helped stats (abs) min: 1 max: 48 x̄: 13.89 x̃: 6
helped stats (rel) min: 0.03% max: 1.01% x̄: 0.33% x̃: 0.19%
HURT stats (abs)   min: 2 max: 3 x̄: 2.50 x̃: 2
HURT stats (rel)   min: 0.06% max: 0.09% x̄: 0.08% x̃: 0.08%
95% mean confidence interval for cycles value: -16.90 -7.77
95% mean confidence interval for cycles %-change: -0.39% -0.19%
Cycles are helped.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8213746 -> 8213745 (<.01%)
instructions in affected programs: 127 -> 126 (-0.79%)
helped: 1
HURT: 0

total cycles in shared programs: 187734146 -> 187734144 (<.01%)
cycles in affected programs: 2132 -> 2130 (-0.09%)
helped: 1
HURT: 0

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-15 11:11:02 -08:00
Ian Romanick
ad05920258 nir/algebraic: Convert some f2u to f2i
Section 5.4.1 (Conversion and Scalar Constructors) of the GLSL 4.60 spec
says:

     It is undefined to convert a negative floating-point value to an
     uint.

Assuming that (uint)some_float behaves like (uint)(int)some_float allows
some optimizations in the i965 backend to proceed.
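
The equivalence being relied on can be checked on the host for the
range where both conversions are well defined.  This is a sketch in C
rather than GLSL, with function names invented for the example; it
deliberately says nothing about negative inputs, which is the case the
spec leaves undefined.

```c
#include <assert.h>
#include <stdint.h>

/* Direct float -> uint conversion, restricted to 0 <= f < 2^31. */
static uint32_t
f2u(float f)
{
   assert(f >= 0.0f && f < 2147483648.0f);
   return (uint32_t)f;
}

/* Conversion through a signed int, as in (uint)(int)some_float. */
static uint32_t
f2i_as_u(float f)
{
   assert(f >= 0.0f && f < 2147483648.0f);
   return (uint32_t)(int32_t)f;
}
```

Within that range both functions truncate toward zero and return the
same value, so a backend is free to pick whichever form optimizes
better.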

This basically undoes the small amount of damage done by
"intel/compiler: Avoid propagating inequality cmods if types are
different".

v2: Replicate part of the commit message as a comment in the code.
Suggested by Jason.

shader-db results comparing *before* "intel/compiler: Avoid propagating
inequality cmods if types are different" and after this commit:

Skylake
total cycles in shared programs: 383007996 -> 383007896 (<.01%)
cycles in affected programs: 85208 -> 85108 (-0.12%)
helped: 13
HURT: 8
helped stats (abs) min: 2 max: 26 x̄: 10.77 x̃: 6
helped stats (rel) min: 0.09% max: 0.65% x̄: 0.28% x̃: 0.14%
HURT stats (abs)   min: 2 max: 12 x̄: 5.00 x̃: 3
HURT stats (rel)   min: 0.04% max: 0.32% x̄: 0.12% x̃: 0.07%
95% mean confidence interval for cycles value: -9.31 -0.21
95% mean confidence interval for cycles %-change: -0.24% <.01%
Cycles are helped.

Broadwell
total cycles in shared programs: 415251194 -> 415251370 (<.01%)
cycles in affected programs: 83750 -> 83926 (0.21%)
helped: 7
HURT: 13
helped stats (abs) min: 10 max: 12 x̄: 11.43 x̃: 12
helped stats (rel) min: 0.30% max: 0.30% x̄: 0.30% x̃: 0.30%
HURT stats (abs)   min: 2 max: 36 x̄: 19.69 x̃: 22
HURT stats (rel)   min: 0.05% max: 0.89% x̄: 0.44% x̃: 0.47%
95% mean confidence interval for cycles value: 0.76 16.84
95% mean confidence interval for cycles %-change: <.01% 0.37%
Inconclusive result (%-change mean confidence interval includes 0).

Haswell
total instructions in shared programs: 13823885 -> 13823886 (<.01%)
instructions in affected programs: 2249 -> 2250 (0.04%)
helped: 0
HURT: 1

total cycles in shared programs: 390094243 -> 390094001 (<.01%)
cycles in affected programs: 85640 -> 85398 (-0.28%)
helped: 15
HURT: 6
helped stats (abs) min: 4 max: 26 x̄: 18.53 x̃: 18
helped stats (rel) min: 0.09% max: 0.66% x̄: 0.47% x̃: 0.42%
HURT stats (abs)   min: 2 max: 14 x̄: 6.00 x̃: 2
HURT stats (rel)   min: 0.04% max: 0.37% x̄: 0.15% x̃: 0.04%
95% mean confidence interval for cycles value: -17.36 -5.69
95% mean confidence interval for cycles %-change: -0.44% -0.14%
Cycles are helped.

Ivy Bridge
total cycles in shared programs: 180986448 -> 180986552 (<.01%)
cycles in affected programs: 34835 -> 34939 (0.30%)
helped: 0
HURT: 10
HURT stats (abs)   min: 2 max: 18 x̄: 10.40 x̃: 10
HURT stats (rel)   min: 0.06% max: 0.36% x̄: 0.28% x̃: 0.30%
95% mean confidence interval for cycles value: 4.67 16.13
95% mean confidence interval for cycles %-change: 0.20% 0.35%
Cycles are HURT.

Sandy Bridge
total cycles in shared programs: 154603969 -> 154603970 (<.01%)
cycles in affected programs: 171514 -> 171515 (<.01%)
helped: 25
HURT: 14
helped stats (abs) min: 1 max: 4 x̄: 1.80 x̃: 1
helped stats (rel) min: 0.02% max: 0.10% x̄: 0.04% x̃: 0.04%
HURT stats (abs)   min: 1 max: 8 x̄: 3.29 x̃: 3
HURT stats (rel)   min: 0.03% max: 0.28% x̄: 0.10% x̃: 0.11%
95% mean confidence interval for cycles value: -0.91 0.96
95% mean confidence interval for cycles %-change: -0.02% 0.04%
Inconclusive result (value mean confidence interval includes 0).

No changes on Iron Lake or GM45.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-15 11:11:02 -08:00
Matt Turner
ac21dd4aee intel/compiler/test: Add unit test for mismatched signedness comparison
v2 (idr): Move adding the test to after adding the fix.  Reordering the
two commits prevents possible headaches for git-bisect with scripts that
always do 'ninja check'.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109404
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-02-15 11:11:02 -08:00
Matt Turner
2dff9a66b6 intel/compiler: Avoid propagating inequality cmods if types are different
v2: Fix silly bug in logic.  s/||/&&/

All but one of the affected shaders is in an Unreal4 demo.  The other is
in Tomb Raider.  All of the cases that Ian investigated appear to be
sequences like the following

    if (int(uint(some_float)) < 0) /* other relations too */
        ...

At least in Tomb Raider, it's not obvious that this sequence came from
the original shader.

In some of the Unreal demos, the shader contains code like

    if (int(uint(textureLod(...))) > 0)
        ...

which explicitly generates the offending sequence.
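
Why the types matter for inequalities: the same 32 bits can compare
differently depending on whether they are read as signed or unsigned.
The sketch below shows this on the host; the function names are
assumptions for the example and this is not the i965 cmod propagation
code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits as signed before comparing. */
static bool
cmp_lt_signed(uint32_t a_bits, uint32_t b_bits)
{
   int32_t a, b;
   memcpy(&a, &a_bits, sizeof a);
   memcpy(&b, &b_bits, sizeof b);
   return a < b;
}

/* Compare the same bits as unsigned values. */
static bool
cmp_lt_unsigned(uint32_t a_bits, uint32_t b_bits)
{
   return a_bits < b_bits;
}
```

With a_bits = 0x80000000 and b_bits = 0, the signed comparison is true
and the unsigned one is false, so a "less than" result computed for one
type cannot stand in for the other.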

All Gen6+ platforms had similar results (Skylake shown):
total instructions in shared programs: 15437170 -> 15437187 (<.01%)
instructions in affected programs: 4492 -> 4509 (0.38%)
helped: 0
HURT: 17
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.05% max: 0.73% x̄: 0.66% x̃: 0.73%
95% mean confidence interval for instructions value: 1.00 1.00
95% mean confidence interval for instructions %-change: 0.57% 0.75%
Instructions are HURT.

total cycles in shared programs: 383007996 -> 383007992 (<.01%)
cycles in affected programs: 20542 -> 20538 (-0.02%)
helped: 6
HURT: 7
helped stats (abs) min: 2 max: 6 x̄: 5.33 x̃: 6
helped stats (rel) min: 0.11% max: 0.36% x̄: 0.32% x̃: 0.36%
HURT stats (abs)   min: 4 max: 4 x̄: 4.00 x̃: 4
HURT stats (rel)   min: 0.27% max: 0.27% x̄: 0.27% x̃: 0.27%
95% mean confidence interval for cycles value: -3.30 2.69
95% mean confidence interval for cycles %-change: -0.19% 0.19%
Inconclusive result (value mean confidence interval includes 0).

No changes on Iron Lake or GM45.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109404
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: nagrigoriadis@gmail.com
Tested-by: Danylo Piliaiev <danylo.piliaiev@gmail.com>
2019-02-15 11:11:02 -08:00
Matt Turner
e50db60d16 intel/compiler/test: Set devinfo->gen = 7
We emit an FBL instruction which only exists since Gen7. This prevents
the test from segfaulting when run with TEST_DEBUG=1.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-02-15 11:11:02 -08:00
James Zhu
9364d66cb7 gallium/auxiliary/vl: Add video compositor compute shader render
Add compute shader initialization, assign and cleanup in vl_compositor API.
Set video compositor compute shader render as default when pipe support it.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2019-02-15 10:07:03 -05:00
James Zhu
f6ac0b5d71 gallium/auxiliary/vl: Add compute shader to support video compositor render
Add compute shader to support video compositor render.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
2019-02-15 10:07:03 -05:00
James Zhu
299e2bc046 gallium/auxiliary/vl: Rename csc_matrix and increase its size.
Rename csc_matrix to shader_params, and increase shader_params size
to store more constants for compute shader.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2019-02-15 10:07:03 -05:00
James Zhu
7b7b5f2029 gallium/auxiliary/vl: Split vl_compositor graphic shaders from vl_compositor API
Split vl_compositor graphic shaders from vl_compositor API in order to share
vl_compositor API with vl_compositor compute shader later.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2019-02-15 10:07:03 -05:00
James Zhu
b34d7c5daa gallium/auxiliary/vl: Move dirty define to header file
Move dirty define to header file to share with compute shader.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
2019-02-15 10:07:03 -05:00
Juan A. Suarez Romero
1fb24080b7 nir: remove jump from two merging jump-ending blocks
In the opt_peel_initial_if optimization, when moving the continue list to
the end of the continue block, before the jump, it can happen that the
continue list itself also ends with a jump.

This would mean that we would have two jump instructions in a row: the
first one from the continue list and the second one from the continue
block.

As inserting an instruction after a jump is not allowed (and it does not
make sense, as it will not be executed), remove the jump from the
continue block and keep the one from continue list, as it will be
executed first.
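
The rule above can be illustrated with a minimal sketch. It is written in
Python rather than NIR's C, instructions are modeled as plain strings, and
merge_before_jump and JUMPS are hypothetical names that do not correspond to
any NIR function.

```python
# Hypothetical model: an instruction list is a list of strings, and these
# strings stand in for jump instructions.
JUMPS = {"break", "continue"}


def ends_with_jump(instrs):
    """Return True if the last instruction in the list is a jump."""
    return bool(instrs) and instrs[-1] in JUMPS


def merge_before_jump(continue_block, continue_list):
    """Move continue_list to the end of continue_block, before its jump.

    If continue_list itself ends with a jump, the block's own jump is
    dropped so that no instruction is ever placed after a jump.
    """
    body = list(continue_block)
    tail = []
    if ends_with_jump(body):
        tail = [body.pop()]
    if ends_with_jump(continue_list):
        # The jump from the continue list executes first; keep only that one.
        tail = []
    return body + list(continue_list) + tail
```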

CC: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-15 15:16:24 +01:00
Juan A. Suarez Romero
69be9934a7 nir: move ALU instruction before the jump instruction
opt_split_alu_of_phi moves the ALU instruction to the end of the continue block.

But if the continue block ends with a jump instruction (an explicit
"continue" instruction) then the ALU must be inserted before the jump,
as it is illegal to add instructions after the jump.
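
A minimal sketch of the insertion rule, assuming a block is modeled as a
Python list of instruction names; insert_at_block_end and JUMP_OPS are
hypothetical and only illustrate the idea, not the NIR cursor API.

```python
# Hypothetical set of instruction names that terminate a block.
JUMP_OPS = {"break", "continue", "return"}


def insert_at_block_end(block, instr):
    """Return a copy of block with instr at the end, before any trailing jump."""
    result = list(block)
    if result and result[-1] in JUMP_OPS:
        # Slot the new instruction in front of the jump.
        result.insert(len(result) - 1, instr)
    else:
        result.append(instr)
    return result
```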

CC: Ian Romanick <ian.d.romanick@intel.com>
Fixes: 0881e90c09 ("nir: Split ALU instructions in loops that read phis")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-02-15 15:14:36 +01:00
Andres Gomez
a43596df62 mesa: INVALID_VALUE for wrong type or format in Clear*Buffer*Data
Instead of generating a GL_INVALID_ENUM error when the type or format
is incorrect while using glClear{Named}Buffer{Sub}Data, generate
GL_INVALID_VALUE.

From page 72 (page 94 of the PDF) of the OpenGL 4.6 spec:

  " An INVALID_VALUE error is generated if type is not one of the
    types in table 8.2.

    An INVALID_VALUE error is generated if format is not one of the
    formats in table 8.3."

Fixes the following test:
KHR-GL45.direct_state_access.buffers_errors

v2: correct the doxygen documentation.

Cc: Pi Tabred <servuswiegehtz@yahoo.de>
Cc: Brian Paul <brianp@vmware.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-02-15 14:28:06 +02:00
Gurchetan Singh
67426ccd42 virgl: use virgl_transfer_inline_write even less
We've noticed the Team Fortress 2 engine seems to do many small
calls to glBufferSubData(..). Let's pick our heuristic based on the
resource base width, not the size of a particular upload.
This will cause transfers to be batched together in the transfer
queue.

Relevant glbench microbenchmark --

Before: buffer_upload_dynamic_element_array_131072 = 131.17 mbytes_sec
After: buffer_upload_dynamic_element_array_131072 = 6828.24 mbytes_sec
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
2019-02-15 11:19:05 +01:00
Gurchetan Singh
f0e71b1088 virgl: use transfer queue
This improves Unigine Valley benchmark by 3 to 10 fps (depending
on the scene).

It also improves the Team Fortress 2 benchmark from 6 fps to 13
fps (host: 20 fps).

Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
2019-02-15 11:19:05 +01:00
Gurchetan Singh
4a7857b377 virgl: introduce transfer queue
Transfers will be placed here at unmap time instead of incurring
a VM exit. There's an attempt to deduplicate intersecting 1D transfers,
which are surprisingly common.
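
The message does not spell out how intersecting transfers are deduplicated.
One plausible approach, sketched here in Python under the assumption that a
1D transfer is just an (offset, size) byte range on a single resource, is to
merge a new transfer into any queued range it overlaps. queue_transfer is a
hypothetical name, not a virgl function.

```python
def queue_transfer(queue, offset, size):
    """Add a 1D transfer to queue, merging it with intersecting entries.

    queue is a list of (offset, size) tuples that are pairwise disjoint;
    the returned list keeps that property.
    """
    start, end = offset, offset + size
    remaining = []
    for q_off, q_size in queue:
        q_end = q_off + q_size
        if q_off < end and start < q_end:
            # Ranges intersect: grow the new transfer to cover both.
            start, end = min(start, q_off), max(end, q_end)
        else:
            remaining.append((q_off, q_size))
    remaining.append((start, end - start))
    return remaining
```

With this sketch, two overlapping uploads collapse into one queued entry,
while an upload to a distant offset stays separate.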

This can also help with mipmapped texture upload and smaller
textures, where the majority of the time is spent in the guest
kernel / QEMU -- not virglrenderer.  This is shown by the GLbench
texture upload benchmark:

Before:
    texture_upload_rgba_teximage2d_32 = 64.23 mtexel_sec
After:
    texture_upload_rgba_teximage2d_32 = 367.44 mtexel_sec

v2: Split up list iteration functions (@gerddie)
v3: Support for optimizing glBufferSubData
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
2019-02-15 11:19:05 +01:00
Gurchetan Singh
9c4930946a virgl: add encoder functions for new protocol
Let's encode the new protocol with new helper functions.

Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
2019-02-15 11:19:05 +01:00
Gurchetan Singh
5510cc67e0 virgl: make winsys modifications for encoded transfers
The idea is to have two command buffers:

1) One for transfers
2) One for commands, which can include transfers

At flush time, (2) will be filled.  Otherwise, (1) will be
used to submit transfers if there are enough of them.

v2: Pass size directly to cmd_buf_create (@gerddie)
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
2019-02-15 11:19:05 +01:00
Gurchetan Singh
90e9650585 virgl: add extra checks in virgl_res_needs_flush_wait
This is motivated by the following scenario:

glBufferSubData(GL_ARRAY_BUFFER, ...)
glFlush(..)
glBufferSubData(GL_ARRAY_BUFFER, ...)
glBufferSubData(GL_ARRAY_BUFFER, ...)
glBufferSubData(GL_ARRAY_BUFFER, ...)

This increases @davidriley's Team Fortress 2 apitrace from
1 fps to 6 fps and helps with the Chromium glbench
microbenchmarks:

Before: texture_update_rgba_texsubimage2d_2048 = 554.96 mtexel_sec
   buffer_upload_dynamic_array_12 = 0.02 mbytes_sec
   buffer_upload_dynamic_array_576 = 1.07 mbytes_sec
After: texture_update_rgba_texsubimage2d_2048 = 612.29 mtexel_sec
   buffer_upload_dynamic_array_12 = 2.22 mbytes_sec
   buffer_upload_dynamic_array_576 = 164.89 mbytes_sec
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
2019-02-15 11:19:05 +01:00
Gurchetan Singh
ab6ea6e9ce virgl: pass virgl transfer to virgl_res_needs_flush_wait
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
2019-02-15 11:19:05 +01:00
Gurchetan Singh
d98fbd9c92 virgl: keep track of number of computations
It's good to keep track of these things.

Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
2019-02-15 11:19:05 +01:00
Gurchetan Singh
35515985a9 virgl: limit command length to 16 bits
Much of our logic is based around the idea that the upper 16 bits
of a command dword can encode the length of the command.

Now that the command buffer size can be >= 2^16 - 1, we should check for
this.
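
A minimal sketch of the constraint, in Python. Only the "length in the upper
16 bits" part comes from the message above; the use of the low 16 bits for an
opcode, and the names encode_header, decode_length and MAX_CMD_LENGTH, are
assumptions made for illustration.

```python
MAX_CMD_LENGTH = 0xFFFF  # largest length that fits in the upper 16 bits


def encode_header(opcode, length):
    """Pack opcode (low 16 bits, assumed) and length (high 16 bits) in a dword."""
    if not 0 <= opcode <= 0xFFFF:
        raise ValueError("opcode does not fit in 16 bits")
    if not 0 <= length <= MAX_CMD_LENGTH:
        raise ValueError("command length does not fit in 16 bits")
    return (length << 16) | opcode


def decode_length(header):
    """Recover the command length from the upper 16 bits of the header."""
    return (header >> 16) & 0xFFFF
```

Without the range check, a length of 0x10000 would spill past bit 31 and no
longer fit in a single header dword.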

v2: alignment, and only check VIRGL_ENCODE_MAX_DWORDS
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
2019-02-15 11:19:05 +01:00
Gurchetan Singh
503ffe46bb virgl: use virgl_transfer in inline write
Let's define a helper function and use it.

This commit also allows resources to be emitted into different command
buffers.

Like the ioctls, send 0 for layer_stride and stride.  If we actually
send the real values, there are various assumptions in virglrenderer
for non-1D buffers that may need to be modified.

Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
2019-02-15 11:19:05 +01:00
Gurchetan Singh
0fcd48bac5 virgl: add protocol for resource transfers
Mostly similar to VIRGL_CCMD_RESOURCE_INLINE_WRITE.  However, this
uses the resource's already attached iovecs rather than the command
buffer to transfer the data.

v2: Used (1 << 16) not (1 << 15) [@gerddie]
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
2019-02-15 11:19:05 +01:00
Gurchetan Singh
168c3ffce3 virgl: when creating / freeing transfers, pass slab pool directly
This will allow us to destroy transfers w/o having a pointer
to the context.

Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
2019-02-15 11:19:04 +01:00
Gurchetan Singh
d5c2dacc15 virgl: unmap uploader at flush time
This should save some memory when allocating and freeing transfers.

Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
2019-02-15 11:19:04 +01:00
Gurchetan Singh
14f265b533 virgl: make alignment smaller when uploading index user buffers
Since we're just uploading to guest memory, let's just align to dword
size.

Fixes: e0f932 ("u_upload_mgr: pass alignment to u_upload_data manually")
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
2019-02-15 11:19:04 +01:00
Gurchetan Singh
7626e6e189 virgl: track level cleanliness rather than resource cleanliness
This allows a minor optimization for texture upload.

Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
2019-02-15 11:19:04 +01:00
Gurchetan Singh
c19aedcf1a virgl: don't mark unclean after a flush
The guest memory is still clean until host GL touches it,
which we should track elsewhere.

Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
2019-02-15 11:19:04 +01:00
Gurchetan Singh
5b6a2ae987 virgl: use virgl_resource_dirty helper
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
2019-02-15 11:19:04 +01:00
Gurchetan Singh
1d294ad264 virgl: add ability to do finer grain dirty tracking
There are levels to cleanliness.

Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
2019-02-15 11:19:04 +01:00
Alyssa Rosenzweig
acc52fff20 panfrost: Improve logging and patch memory leaks
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-15 07:47:54 +00:00
Alyssa Rosenzweig
c70ed4ca18 panfrost: Don't align framebuffer dims
Fixes regressions with EGL clients

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-15 07:46:30 +00:00
Alyssa Rosenzweig
5155bcf099 panfrost: Implement PIPE_QUERY_OCCLUSION_COUNTER
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-15 07:46:02 +00:00
Alyssa Rosenzweig
2d22b5380c panfrost: Identify MALI_OCCLUSION_PRECISE bit
Setting this is required for desktop-style occlusion queries.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-15 07:45:56 +00:00
Tapani Pälli
595af46f0f drirc/i965: add option to disable 565 configs and visuals
We have cases where we would not like to expose these.

v2: call the option allow_rgb565_configs for consistency
    with existing allow_rgb10_configs (Eric, Jason)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-15 09:38:36 +02:00
Alyssa Rosenzweig
97aa05470a panfrost: Backport driver to Mali T600/T700
There are a few differences between Mali T860 (Panfrost's primary
reference target) and the older Midgard generations (T600/T700):

 - Miscellaneous different magic numbers. It's not clear what these
numbers mean on either the old or new configurations yet.

 - Errata fixes. T800 is the final Midgard generation and presumably the
least buggy. Older Midgard has some extra hardware errata we have to
work around.

 - SFBD vs MFBD split. Essentially, older Midgard uses a Single
FrameBuffer Descriptor (SFBD), which corresponds to single
render-target rendering. Newer Midgard (T760+) uses a Multiple
FrameBuffer Descriptor (MFBD), allowing multiple RTs. On ES 2.0, these
descriptors serve the same function, but we implement both, depending on
the version of the hardware.

 - CPU bitness. 32-bit systems generally use 32-bit GPU descriptors, and
64-bit systems use 64-bit descriptors. Our target T760 systems are 32-bit
whereas our target T860 systems are 64-bit. More work is needed in this area.

This patch fixes up these areas to support older Midgard
hardware. It is tested on Mali T760 and Mali T860.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-15 07:22:42 +00:00
Alyssa Rosenzweig
f96e871c26 panfrost: Fix build; depend on libdrm
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-15 07:19:43 +00:00
Jason Ekstrand
08bfd710a2 nir/dead_cf: Stop relying on liveness analysis
The liveness analysis pass is fairly expensive because it has to build
large bit-sets and run a fix-point algorithm on them.  Instead of
requiring liveness for detecting if values escape a CF node, just take
advantage of the structured nature of NIR and use block indices instead.
This only requires the block index metadata, which is the fastest of our
metadata to generate.
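
A rough sketch of the idea in Python, assuming blocks are numbered in program
order so that a CF node covers one contiguous range of block indices.
value_escapes is a hypothetical helper and not the function used in the pass.

```python
def value_escapes(use_block_indices, first_index, last_index):
    """Return True if any use of a value lies outside the CF node.

    The node is assumed to span blocks first_index..last_index inclusive,
    and use_block_indices holds the block index of every use of the value.
    """
    return any(i < first_index or i > last_index for i in use_block_indices)
```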

No shader-db changes on Kaby Lake

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-02-14 23:06:29 -06:00
Jason Ekstrand
b50465d197 nir/dead_cf: Inline cf_node_has_side_effects
We want to handle live SSA values differently and it's going to involve
walking the instructions.  We can make it a single instruction walk if
we combine it with cf_node_has_side_effects.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-02-14 23:05:28 -06:00
Jason Ekstrand
367b0ede4d intel/fs: Bail in optimize_extract_to_float if we have modifiers
This fixes a bug in RuneScape where we were optimizing x >> 16 to an
extract and then negating and converting to float.  The NIR to fs pass
was dropping the negate on the floor breaking a geometry shader and
causing it to render nothing.

Fixes: 1f862e923c "i965/fs: Optimize float conversions of byte/word..."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109601
Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-02-14 23:02:44 -06:00
Ilia Mirkin
8c859367df swr: set PIPE_CAP_MAX_VARYINGS correctly
Unfortunately swr was missed in the original commit. The number of
varyings should generally match up to what's reported as the shader
caps for fragment inputs.

Fixes: 6010d7b8e8 (gallium: add PIPE_CAP_MAX_VARYINGS)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Alok Hota <alok.hota@intel.com>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
2019-02-14 20:29:36 -05:00
Jason Ekstrand
5064464931 intel/fs: Silence a compiler warning
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-14 16:04:47 -06:00
Jason Ekstrand
9b202239ba anv: Silence some compiler warnings in release builds
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-14 16:04:45 -06:00
Jason Ekstrand
cd60c995a6 anv/blorp: Delete a pointless assert
Just a little higher up in the function we assert that the aspect masks
are actually equal so there's no reason for the weaker check.  Also, the
temporary variables were causing compiler warnings in release builds.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-14 16:04:42 -06:00
Jason Ekstrand
b14d7a6b60 nir: Silence a couple of warnings in release builds
[28/716] Compiling C object 'src/compiler/nir/068b2c8@@nir@sta/nir_gather_xfb_info.c.o'.
../src/compiler/nir/nir_gather_xfb_info.c: In function ‘nir_gather_xfb_info’:
../src/compiler/nir/nir_gather_xfb_info.c:171:13: warning: variable ‘max_offset’ set but not used [-Wunused-but-set-variable]
    unsigned max_offset[NIR_MAX_XFB_BUFFERS] = {0};
             ^~~~~~~~~~
[36/716] Compiling C object 'src/compiler/nir/068b2c8@@nir@sta/nir_instr_set.c.o'.
../src/compiler/nir/nir_instr_set.c:502:1: warning: ‘instr_each_src_and_dest_is_ssa’ defined but not used [-Wunused-function]
 instr_each_src_and_dest_is_ssa(nir_instr *instr)
 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-14 16:04:35 -06:00
Kenneth Graunke
6775665e5e spirv: Eliminate dead input/output variables after translation.
spirv_to_nir can generate input/output variables which are illegal
for the current shader stage, which would cause nir_validate_shader
to balk.  After my recent commit to start decorating arrays as compact,
dEQP-VK.spirv_assembly.instruction.graphics.module.same_module started
hitting validation errors due to outputs in a TCS (not intended for the
TCS at all) not being per-vertex arrays.

Thanks to Jason Ekstrand for suggesting this approach.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109573
Fixes: ef99f4c8d1 compiler: Mark clip/cull distance arrays as compact before lowering.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
2019-02-14 11:03:56 -08:00
Kenneth Graunke
39aee57523 anv: Put MOCS in the correct location
My patch to switch from struct-based MOCS to numeric MOCS accidentally
divided all MOCS entries by 2 in the Vulkan driver.

MOCS on Gen9+ is just an array index into a table.  But in the hardware
packets, the index starts at bit 1.  So we need to shift it.
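
The effect of the missing shift can be shown with a small Python sketch. The
bit position comes from the message above; pack_mocs, hw_read_mocs and
MOCS_SHIFT are hypothetical names for illustration only.

```python
MOCS_SHIFT = 1  # per the message above, the index starts at bit 1


def pack_mocs(index):
    """Place a MOCS table index into the packet field."""
    return index << MOCS_SHIFT


def hw_read_mocs(field):
    """Model how the hardware would interpret the packet field."""
    return field >> MOCS_SHIFT
```

Writing the index unshifted means the hardware sees index >> 1, which matches
the "divided all MOCS entries by 2" symptom described above.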

Fixes: 0b44644ca6 (genxml: Consistently use a numeric "MOCS" field)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-14 11:03:28 -08:00
Ian Romanick
9a918050e0 spirv: Add missing break
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes: c6465fec0c ("spirv: add SpvCapabilityInt64Atomics")
CID: 1442555
2019-02-14 08:35:59 -08:00
Eric Engestrom
c2b4b46fa9 util/tests: compile to something sensible in release builds
assert()-based tests make no sense without asserts, so make sure asserts
are compiled in, even if the rest of the code has asserts turned off.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-02-14 12:52:34 +00:00
Eric Engestrom
f7c56475d2 anv/tests: compile to something sensible in release builds
assert()-based tests make no sense without asserts, so make sure asserts
are compiled in, even if the rest of the code has asserts turned off.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-02-14 12:52:34 +00:00
Eric Engestrom
4c1ca5b074 etnaviv: drop duplicate #define
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-02-14 11:20:00 +00:00
Eric Engestrom
7f68b38439 st/dri: drop duplicate #define
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-02-14 11:20:00 +00:00
Eric Engestrom
2fa165e757 gbm: drop duplicate #defines
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-02-14 11:20:00 +00:00
Eric Engestrom
f1374805a8 drm-uapi: use local files, not system libdrm
There was an issue recently caused by the system header being included
by mistake, so let's just get rid of this include path and always
explicitly #include "drm-uapi/FOO.h"

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-02-14 11:20:00 +00:00
Eric Engestrom
69e4c273c4 drm-uapi/README: remove explicit list of driver names
These headers are used by a lot more than just the intel drivers nowadays.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-02-14 11:20:00 +00:00
Samuel Pitoiset
227df98fa6 radv: fix radv_fixup_vertex_input_fetches()
We should check that num_channels is 4, otherwise that breaks
the world. Sorry for the short breakage.

Fixes: 4b3549c084 ("radv: reduce the number of loaded channels for vertex input fetches")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-14 09:44:35 +01:00
Samuel Pitoiset
4b3549c084 radv: reduce the number of loaded channels for vertex input fetches
It's unnecessary to load more channels than the vertex attribute
format. The remaining channels are filled with 0 for y and z,
and 1 for w.
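
As an illustration of the fill rule, here is a short Python sketch that pads
the channels actually fetched up to four components. It assumes float
attributes, and expand_to_vec4 and DEFAULTS are hypothetical names, not the
helper used in the driver.

```python
# Default values for x, y, z, w; only y, z and w are ever used as padding.
DEFAULTS = (0.0, 0.0, 0.0, 1.0)


def expand_to_vec4(loaded):
    """Pad 1 to 4 loaded channels to a vec4: 0 for y and z, 1 for w."""
    n = len(loaded)
    if not 1 <= n <= 4:
        raise ValueError("expected between 1 and 4 channels")
    return tuple(loaded) + DEFAULTS[n:]
```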

29077 shaders in 15096 tests
Totals:
SGPRS: 1321605 -> 1318869 (-0.21 %)
VGPRS: 935236 -> 932252 (-0.32 %)
Spilled SGPRs: 24860 -> 24776 (-0.34 %)
Code Size: 49832348 -> 49819464 (-0.03 %) bytes
Max Waves: 242101 -> 242611 (0.21 %)

Totals from affected shaders:
SGPRS: 93675 -> 90939 (-2.92 %)
VGPRS: 58016 -> 55032 (-5.14 %)
Spilled SGPRs: 172 -> 88 (-48.84 %)
Code Size: 2862740 -> 2849856 (-0.45 %) bytes
Max Waves: 15474 -> 15984 (3.30 %)

This mostly helps Croteam games (Talos/Sam2017).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-14 09:10:56 +01:00
Samuel Pitoiset
210aec3612 radv: store vertex attribute formats as pipeline keys
The formats will be used for reducing the number of loaded channels.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-14 09:10:09 +01:00
Samuel Pitoiset
45382baef6 radv: use MAX_{VBS,VERTEX_ATTRIBS} when defining max vertex input limits
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-14 09:09:51 +01:00
Samuel Pitoiset
2154fac6f3 ac: make use of ac_build_expand_to_vec4() in visit_image_store()
And make ac_build_expand() a static function.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-14 09:09:48 +01:00
Eric Anholt
338d399fd0 freedreno: Use the NIR lowering for isign.
I think this will save an instruction and hopefully not increase any other
costs (possibly the immediate -1 and 1?), but I haven't actually tested.

Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-02-14 00:32:30 +00:00
Eric Anholt
8f3694e1ab intel: Use the NIR lowering for isign.
Drops one instruction from fs-sign-int.shader_test.  No change in
shader-db due to it having 0 instances of sign(genIType).  This may hurt
isign64 if algebraic runs before int64 lowering, but I wasn't sure how to
mark the algebraic opt as "every bit size but 64".

v2: Update commit message about shader-db.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
2019-02-14 00:32:30 +00:00
Eric Anholt
3f22b35a43 v3d: Use the NIR lowering for isign instead of rolling our own.
min/max instead of comparisons saves 2 instructions on
fs-sign-int.shader_test.
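
For reference, a Python sketch of the two formulations. The min/max form
clamps the input to [-1, 1]; the comparison form is only an assumption about
what a hand-rolled lowering might look like, and both function names are
hypothetical.

```python
def isign_cmp(x):
    """Integer sign built from two comparisons and a subtract."""
    return (1 if x > 0 else 0) - (1 if x < 0 else 0)


def isign_minmax(x):
    """Integer sign built by clamping x to the range [-1, 1]."""
    return min(max(x, -1), 1)
```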
2019-02-14 00:32:30 +00:00
Eric Anholt
42d2cae907 nir: Move panfrost's isign lowering to nir_opt_algebraic.
I wanted to reuse this from v3d.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-02-14 00:32:30 +00:00
Timothy Arceri
68baf96824 nir: turn an ssa check in nir_search into an assert
Everything should be in ssa form when we call this. This is a
hotpath so replace the check with an assert.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-02-14 09:35:32 +11:00
Timothy Arceri
46a4d2c867 nir: turn ssa check into an assert
Everything should be in ssa form when this is called. Checking
for it here is expensive so turn this into an assert instead.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-02-14 09:35:32 +11:00
Timothy Arceri
0a89c9779a nir: prehash instruction in nir_instr_set_add_or_rewrite()
There is no need to hash the instruction twice, especially as we
end up adding it in the majority of cases.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-14 09:35:32 +11:00
Dylan Baker
279060cd32 meson: Add dependency on genxml to anvil
Currently the Intel "anvil" driver races with the generation of genxml
files, while i965 has an explicit dependency. This patch adds the same
dependency to anvil.

Fixes: d1992255bb
       ("meson: Add build Intel "anv" vulkan driver")
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-13 22:01:00 +00:00
Samuel Pitoiset
334da034d8 radv: always export gl_SampleMask when the fragment shader uses it
For some reason, exporting it only when MSAA is enabled breaks tree
rendering in Project Cars.

Fixes: 85010585cd ("radv: only enable gl_SampleMask if MSAA is enabled too")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109401
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-13 23:01:30 +01:00
Alok Hota
736241892f gallium/aux: add PIPE_CAP_MAX_VARYINGS to u_screen
Allows drivers using `u_pipe_screen_get_param_defaults` to use a
fallback value for the new pipe cap. Default value of 8 based on GL 2.1
MAX_VARYING_FLOATS

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-02-13 15:08:14 -06:00
Kristian H. Kristensen
e8566d7098 .mailmap: Add a few more aliases for myself
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-02-13 12:03:41 -08:00
Samuel Pitoiset
5e18000d1b radv/winsys: fix BO list creation when RADV_DEBUG=allbos is set
Fixes: 50fd253bd6 ("radv/winsys: Add priority handling during submit.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-13 20:51:40 +01:00
Kristian H. Kristensen
0a41ddbd4e freedreno/a6xx: Fix point coord
Use ir3_next_varying() for iterating through varyings and unset the
global point coord invert bit.

Fixes:

  dEQP-GLES3.functional.shaders.builtin_variable.pointcoord

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-02-13 11:14:06 -08:00
Kristian H. Kristensen
2fbd2d5f58 freedreno/a6xx: Front facing needs UNK3 bit
We need to set UNK3 in GRAS_CNTL and RB_RENDER_CONTROL0 for the value
to be reliably delivered.

Fixes:

  dEQP-GLES3.functional.shaders.builtin_variable.frontfacing

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-02-13 11:14:06 -08:00
Kristian H. Kristensen
1831238c8e freedreno/a6xx: Update headers
This pulls in changes for compute shaders and a6xx ssbo/image support.
FACENESS bit moved from position 1 to 2 and there's a global invert
bit for point coord.

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-02-13 11:14:06 -08:00
Kristian H. Kristensen
182e5c011f freedreno/a6xx: Clean up mixed use of swap and swizzle for texture state
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-02-13 11:03:29 -08:00
Rob Clark
61094629cb freedreno/a6xx: small compiler warning fix
Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-02-13 13:54:05 -05:00
Dylan Baker
aff52dd2c6 get-pick-list: Add --pretty=medium to the arguments for Cc patches
Because none of them have been picked up for 19.0 due to this bug
being reintroduced.

v2: - Fix fixes tags

Fixes: e6b3a3b201
       ("bin/get-pick-list.sh: handle "typod" usecase.")
Fixes: fac10169bb
       ("bin/get-pick-list.sh: prefix output with "[stable] "")
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-02-13 08:59:30 -08:00
Eric Engestrom
68a9383c6f gitlab-ci: limit ninja to 4 threads max
I tried bumping the limit on make and scons instead, but that just
thrashed the runners, so let's not do that (sorry @daniels :]).

Instead, remove the automatic thread management from ninja and limit it
to 4 instead, in line with make and scons.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-13 16:15:43 +00:00
Konstantin Kharlamov
fccc9d3de6 mapi: work around GCC LTO dropping assembly-defined functions
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109391

Signed-off-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-13 14:20:51 +00:00
Caio Marcelo de Oliveira Filho
017349997f nir: fix example in opt_peel_loop_initial_if description
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-12 20:33:20 -08:00
Karol Herbst
7e08f22a72 nir/opt_if: don't mark progress if nothing changes
If we have something like this:

loop {
   ...
   if x {
      break;
   } else {
      continue;
   }
}

opt_if_loop_last_continue returns true, marking progress although nothing
changes.

Fixes: 5921a19d4b "nir: add if opt opt_if_loop_last_continue()"
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-13 00:21:35 +01:00
Oscar Blumberg
3c540e0a74 radeonsi: Fix guardband computation for large render targets
Stop using 12.12 quantization for viewports that are not contained in
the lower 4k corner of the render target as the hardware needs to keep
both absolute and relative coordinates representable.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
2019-02-12 17:21:46 -05:00
Chia-I Wu
2f8734e13b egl: fix KHR_partial_update without EXT_buffer_age
EGL_BUFFER_AGE_EXT can be queried without EXT_buffer_age.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-12 19:14:34 +00:00
Kenneth Graunke
5a006b026d mesa: Advertise EXT_float_blend in ES 3.0+ contexts.
This extension simply drops a draw time restriction:

    "Furthermore, an INVALID_OPERATION error is generated by
     DrawArrays and the other drawing commands defined in section
     2.8.3 (10.5 in ES 3.1) if blending is enabled (see below) and
     any draw buffer has 32-bit floating-point format components."

We never correctly enforced this restriction anyway, so we were
basically already implementing it.  We just need to advertise it
for our behavior to be correct.

The extension requires EXT_color_buffer_float, but we already enable
that via dummy_true.  So we can dummy_true this one as well.

Found while debugging WebGL conformance tests.  Does not fix any.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-02-12 10:57:25 -08:00
Alok Hota
d3dfa86a30 gallium/swr: Param defaults for unhandled PIPE_CAPs
Without using this function, we fail the -Wswitch flag when compiling
the default debugoptimized mode in Meson

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2019-02-12 18:55:14 +00:00
Juan A. Suarez Romero
1ad26f9417 anv/cmd_buffer: check for NULL framebuffer
This can happen when we record a vkCmdDraw in a secondary buffer that
was created inheriting from the primary buffer, but with the framebuffer
set to NULL in the VkCommandBufferInheritanceInfo.

Vulkan 1.1.81 spec says that "the application must ensure (using scissor
if necessary) that all rendering is contained in the render area [...]
[which] must be contained within the framebuffer dimensions".

While this should be done by the application, commit 465e5a86 added the
clamp to the framebuffer size, in case the application does not do it.
But this requires knowing the framebuffer dimensions.

If we do not have a framebuffer at that moment, the best compromise we
can make is to just apply the scissor as it is, and let the application
ensure the rendering is contained in the render area.
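
The behaviour described above can be sketched as follows. This is a Python
illustration with rectangles as (x, y, width, height) tuples; clamp_scissor
is a hypothetical name, and bounds stands for whatever area is known
(framebuffer or render area), or None when nothing is available.

```python
def clamp_scissor(scissor, bounds):
    """Clamp scissor to bounds, or return it unchanged if bounds is None."""
    if bounds is None:
        # No framebuffer known: apply the scissor as it is.
        return scissor
    x, y, w, h = scissor
    bx, by, bw, bh = bounds
    # Clamp both corners of the scissor into the bounding rectangle.
    x0 = min(max(x, bx), bx + bw)
    y0 = min(max(y, by), by + bh)
    x1 = min(max(x + w, bx), bx + bw)
    y1 = min(max(y + h, by), by + bh)
    return (x0, y0, x1 - x0, y1 - y0)
```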

v2: do not clamp to framebuffer if there isn't a framebuffer

v3 (Jason):
- clamp earlier in the conditional
- clamp to render area if command buffer is primary

v4: clamp also x and y to render area (Jason)

v5: rename used variables (Jason)

Fixes: 465e5a86 ("anv: Clamp scissors to the framebuffer boundary")
CC: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-12 19:19:13 +01:00
Marek Olšák
6c64413b6f radeonsi: use MEM instead of MEM_GRBM in COPY_DATA.DST_SEL
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-02-12 13:08:54 -05:00
Marek Olšák
f8e4c9df47 radeonsi: add AMD_DEBUG env var as an alternative to R600_DEBUG
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-02-12 13:08:54 -05:00
Samuel Pitoiset
1b8983c25b radv: fix using LOAD_CONTEXT_REG with old GFX ME firmwares on GFX8
This fixes a critical issue.

Cc: <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109575
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-12 17:39:30 +01:00
Samuel Pitoiset
bd1186572f radv: add support for push constants inlining when possible
This removes some scalar loads from shaders, but it increases
the number of SET_SH_REG packets. This is currently basic but
it could be improved if needed. Inlining dynamic offsets might
also help.

Original idea from Dave Airlie.

29077 shaders in 15096 tests
Totals:
SGPRS: 1321325 -> 1357101 (2.71 %)
VGPRS: 936000 -> 932576 (-0.37 %)
Spilled SGPRs: 24804 -> 24791 (-0.05 %)
Code Size: 49827960 -> 49642232 (-0.37 %) bytes
Max Waves: 242007 -> 242700 (0.29 %)

Totals from affected shaders:
SGPRS: 290989 -> 326765 (12.29 %)
VGPRS: 244680 -> 241256 (-1.40 %)
Spilled SGPRs: 1442 -> 1429 (-0.90 %)
Code Size: 8126688 -> 7940960 (-2.29 %) bytes
Max Waves: 80952 -> 81645 (0.86 %)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-12 17:25:54 +01:00
Samuel Pitoiset
8364ffe823 radv: keep track of the number of remaining user SGPRs
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-12 17:25:52 +01:00
Samuel Pitoiset
5f9379ca35 radv: gather if shaders load dynamic offsets separately
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-12 17:25:49 +01:00
Samuel Pitoiset
5806d99984 radv: gather more info about push constants
This is needed in order to inline some push constants when possible.
This also adds a new helper for initializing the pass.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-12 17:25:34 +01:00
Samuel Pitoiset
129a9f4937 radv: fix compiler issues with GCC 9
"The C standard says that compound literals which occur inside of
the body of a function have automatic storage duration associated
with the enclosing block. Older GCC releases were putting such
compound literals into the scope of the whole function, so their
lifetime actually ended at the end of containing function. This
has been fixed in GCC 9. Code that relied on this extended lifetime
needs to be fixed, move the compound literals to whatever scope
they need to accessible in."
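
A minimal sketch of the pattern the quote describes, assuming a hypothetical `struct range` and helper names that are not taken from the radv sources. The broken form is shown only in a comment; the compiled form declares named objects in the scope where the pointer is used:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type for illustration; not taken from radv. */
struct range { int offset; int size; };

static int range_end(const struct range *r)
{
    assert(r != NULL);
    return r->offset + r->size;
}

int compute_end(int use_custom)
{
    /* Pattern that relied on the old GCC behaviour:
     *
     *     const struct range *r = &fallback;
     *     if (use_custom) {
     *         r = &(struct range){ .offset = 16, .size = 8 };
     *     }                       // the literal's lifetime ends here
     *     return range_end(r);    // dangling pointer with GCC 9
     */

    /* Fixed form: both objects live in the enclosing function block,
     * so the pointer stays valid until the call below. */
    struct range custom = { .offset = 16, .size = 8 };
    struct range fallback = { .offset = 0, .size = 4 };
    const struct range *r = use_custom ? &custom : &fallback;

    return range_end(r);
}
```

Moving the object rather than copying its contents keeps the change small, which is presumably why this kind of fix is easy to backport.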

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109543
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-12 14:48:08 +01:00
Tapani Pälli
2a2e69f975 i965: add P0x formats and propagate required scaling factors
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Lin Johnson <johnson.lin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-02-12 08:43:04 +02:00
Tapani Pälli
3da858a6b9 intel/compiler: add scale_factors to sampler_prog_key_data
Patch propagates given scale_factors to lowering options.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-12 08:42:25 +02:00
Tapani Pälli
722f96bfc8 dri: add P010, P012, P016 for 10bit/12bit/16bit YUV420 formats
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Lin Johnson <johnson.lin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-02-12 08:42:02 +02:00
Tapani Pälli
19a85a704b nir: add option to use scaling factor when sampling planes YUV lowering
Patch adds nir_lower_tex_options as parameter to sample_plane so that
we don't need to extend nir_tex_instr for this.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-12 08:41:20 +02:00
Kenneth Graunke
3eedc8f7b1 i965: Use info->textures_used instead of prog->SamplersUsed.
prog->SamplersUsed is set by the linker when validating resource limits,
while info->textures_used is gathered after NIR optimizations, which may
have eliminated some unused surfaces.

This may let us skip some work.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-02-11 21:34:50 -08:00
Kenneth Graunke
59ae985631 i965: Drop unnecessary 'and' with prog->SamplerUnits
textures_used_by_txf is a subset of textures_used which is a subset
of prog->SamplerUnits.  This should do nothing.
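
The reasoning is plain bit arithmetic. A small sketch, using made-up helper names rather than anything from i965, shows why masking a subset with its superset is a no-op:

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero when every bit set in `subset` is also set in `superset`. */
static int is_subset(uint32_t subset, uint32_t superset)
{
    return (subset & ~superset) == 0;
}

/* Masking a subset with its superset returns the subset unchanged. */
static uint32_t mask_with_superset(uint32_t subset, uint32_t superset)
{
    assert(is_subset(subset, superset));
    return subset & superset;
}
```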

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-02-11 21:34:48 -08:00
Kenneth Graunke
f5c7df4dc9 nir: Gather texture bitmasks in gl_nir_lower_samplers_as_deref.
Eric and I would like a bitmask of which samplers are used, similar to
prog->SamplersUsed, but available in NIR.  The linker uses SamplersUsed
for resource limit checking, but later optimizations may eliminate more
samplers.  So instead of propagating it through, we gather a new one.
While there, we also gather the existing textures_used_by_txf bitmask.

Gathering these bitfields in nir_shader_gather_info is awkward at best.
The main reason is that it introduces an ordering dependency between the
two passes.  If gathering runs before lower_samplers_as_deref, it can't
look at var->data.binding.  If the driver doesn't use the full lowering
to texture_index/texture_array_size (like radeonsi), then the gathering
can't use those fields.  Gathering might be run early /and/ late, first
to get varying info, and later to update it after variant lowering.  At
this point, should gathering work on pre-lowered or post-lowered code?
Pre-lowered is also harder due to the presence of structure types.

Just doing the gathering when we do the lowering alleviates these
ordering problems.  This fixes ordering issues in i965 and makes the
txf info gathering work for radeonsi (though they don't use it).

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-02-11 21:34:45 -08:00
Kenneth Graunke
120f9b8362 nir: Use sampler derefs in drawpixels and bitmap lowering.
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-02-11 21:34:44 -08:00
Kenneth Graunke
04bdc56872 program: Make prog_to_nir create texture/sampler derefs.
Until now, prog_to_nir has been setting texture_index and sampler_index
directly.  This is different than GLSL shaders, which create variable
dereferences and rely on lowering passes to reach this final form.

radeonsi uses variable dereferences for samplers rather than
texture_index and sampler_index, so it doesn't even make sense to set
them there.  By moving to derefs, we ensure that both GLSL and ARB
programs produce the same final form that the driver desires.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-02-11 21:34:40 -08:00
Kenneth Graunke
6a4be25a90 st/nir: Use sampler derefs in built-in shaders.
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-02-11 21:34:38 -08:00
Kenneth Graunke
ba9c1c8217 st/nir: Lower sampler derefs for builtin shaders.
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-02-11 21:34:36 -08:00
Kenneth Graunke
8d1646e0e1 st/nir: Pull sampler lowering into a helper function.
This will make it easier to reuse across GLSL / ARB / built-ins.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-02-11 21:34:35 -08:00
Kenneth Graunke
243c11dc16 i965: Call nir_lower_samplers for ARB programs.
An upcoming patch will start building derefs in prog_to_nir, at which
point we'll need to lower them to indexes.

This gets both GLSL and non-GLSL shaders using the same paths.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-02-11 21:34:30 -08:00
Kenneth Graunke
529a0711c1 glsl: Don't look at sampler uniform storage for internal vars
Passes like nir_lower_drawpixels add additional sampler variables,
and set an explicit binding which never changes.  These extra samplers
don't have proper uniform storage associated with them, and there is no
way to update bindings via the API.  So, for any 'hidden' variables,
just trust that there's an explicit binding set.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-02-11 21:34:28 -08:00
Kenneth Graunke
d34e434989 glsl: Allow gl_nir_lower_samplers*() without a gl_shader_program
I would like to be able to run gl_nir_lower_samplers() to turn texture
and sampler variable dereferences into indexes and offsets, even for
ARB programs, and built-in shaders.  This would make sampler handling
more consistent across the various types of shaders.

For GLSL programs, the gl_nir_lower_samplers_as_deref() pass looks up
the variable bindings in the shader program's uniform storage.  But
ARB programs and built-in shaders don't have a gl_shader_program, and
uniform storage doesn't exist.  In this case, we simply skip that
lookup, and trust var->data.binding to be set correctly by whoever
created the shader.
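
A rough sketch of the fallback described above. The struct layouts and function name are hypothetical stand-ins, not the real gl_shader_program or nir_variable definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a sampler variable and a linked program. */
struct sketch_var {
    int binding;       /* explicit binding set by whoever built the shader */
    int uniform_slot;  /* index into the program's uniform storage */
};

struct sketch_program {
    const int *uniform_bindings;
    size_t num_uniforms;
};

/* With a program, look the binding up in uniform storage.
 * Without one (ARB programs, built-ins), trust the variable itself. */
static int sketch_sampler_binding(const struct sketch_program *prog,
                                  const struct sketch_var *var)
{
    assert(var != NULL);
    if (prog == NULL)
        return var->binding;

    assert(var->uniform_slot >= 0 &&
           (size_t)var->uniform_slot < prog->num_uniforms);
    return prog->uniform_bindings[var->uniform_slot];
}
```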

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-02-11 21:34:22 -08:00
Kenneth Graunke
f45dd6d31b st/mesa: Limit GL_MAX_[NATIVE_]PROGRAM_PARAMETERS_ARB to 2048
Piglit's vp-max-array test creates a vertex program containing a uniform
array sized to the value of GL_MAX_NATIVE_PROGRAM_PARAMETERS_ARB.  Mesa
will then add additional state-var parameters for things like the MVP
matrix.

radeonsi currently exposes a value of 4096, derived from constant buffer
upload size.  This means the array will have 4096 elements, and the
extra MVP state-vars would get a prog_src_register::Index of 4096 or more.

Unfortunately, prog_src_register::Index is a signed 13-bit integer, so
values beyond 4095 end up turning into negative numbers.  Negative
source indexes are only valid for relative addressing, so this ends up
generating illegal IR.
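
The wraparound can be checked with a small sketch. The helper below is illustrative only and does the sign extension by hand instead of relying on how a compiler converts out-of-range values stored to a bitfield:

```c
#include <assert.h>

/* Interpret the low 13 bits of `raw` as a two's complement signed value,
 * which is what storing an index into a signed 13-bit field amounts to. */
static int sign_extend_13(unsigned raw)
{
    raw &= 0x1fffu;                 /* keep 13 bits */
    if (raw & 0x1000u)              /* bit 12 is the sign bit */
        return (int)raw - 0x2000;
    return (int)raw;
}
```

The representable range is -4096 to 4095, so the first state-var placed after a 4096-element array already lands on a negative index.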

In prog_to_nir, this would cause an out of bounds array access.
st_mesa_to_tgsi checks for a negative value, assumes it's bogus,
and remaps it to parameter 0 in order to get something in-range.
This isn't right - instead of reading the MVP matrix, it would read
the first element of the vertex program's large array.  But the test
only checks that the program compiles, so we never noticed that it
was broken.

This patch limits the size of the program limits, with the understanding
that we may need to generate additional state-vars internally.  i965 has
exposed 1024 for this limit for years, so I don't expect lowering it to
2048 will cause any practical problems for radeonsi or other drivers.

Fixes vp-max-array with prog_to_nir.c.

Cc: "19.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-02-11 21:09:51 -08:00
Francisco Jerez
374eb3cd6f intel/dump_gpu: Disambiguate between BOs from different GEM handle spaces.
This fixes a rather astonishing problem that came up while debugging
an issue in the Vulkan CTS.  Apparently the Vulkan CTS framework has
the tendency to create multiple VkDevices, each one with a separate
DRM device FD and therefore a disjoint GEM buffer object handle space.
Because the intel_dump_gpu tool wasn't making any distinction between
buffers from the different handle spaces, it was confusing the
instruction state pools from both devices, which happened to have the
exact same GEM handle and PPGTT virtual address, but completely
different shader contents.  This was causing the simulator to believe
that the vertex pipeline was executing a fragment shader, which didn't
end up well.
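
One way to picture the fix is a lookup keyed on the device FD as well as the GEM handle. This is a sketch with hypothetical names, not the actual intel_dump_gpu data structure:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical BO record: the handle alone is not unique across FDs. */
struct sketch_bo {
    int fd;
    uint32_t handle;
    const char *label;
};

/* Find a BO by (fd, handle); returns NULL when there is no match. */
static const struct sketch_bo *
sketch_find_bo(const struct sketch_bo *bos, size_t count,
               int fd, uint32_t handle)
{
    assert(bos != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (bos[i].fd == fd && bos[i].handle == handle)
            return &bos[i];
    }
    return NULL;
}
```

With only the handle as the key, the two pools in the scenario above would collapse into one entry.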

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-02-11 12:27:22 -08:00
Kristian H. Kristensen
e404c6879d freedreno/a6xx: Fall back to masked RGBA blits for depth/stencil
The blitter doesn't seem to have a write mask, so for depth only and
stencil only blits to Z24S8 we cast the Z24S8 buffer to an RGBA UNORM8
buffer and fall back to pipeline blits with corresponding write mask.
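
A sketch of the write-mask idea on one packed 32-bit texel. It assumes depth occupies the low 24 bits and stencil the high 8 bits, so depth maps to three 8-bit channels and stencil to the fourth; that layout and all names here are assumptions for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flags for which aspect a blit should write. */
enum { SKETCH_MASK_Z = 1, SKETCH_MASK_S = 2 };

/* Bits of the packed texel that a depth and/or stencil blit may touch. */
static uint32_t sketch_write_bits(unsigned zs_mask)
{
    uint32_t bits = 0;

    assert((zs_mask & ~(unsigned)(SKETCH_MASK_Z | SKETCH_MASK_S)) == 0);
    if (zs_mask & SKETCH_MASK_Z)
        bits |= 0x00ffffffu;  /* three colour channels */
    if (zs_mask & SKETCH_MASK_S)
        bits |= 0xff000000u;  /* fourth channel */
    return bits;
}

/* Copy only the selected bits from src into dst. */
static uint32_t sketch_masked_write(uint32_t dst, uint32_t src,
                                    unsigned zs_mask)
{
    uint32_t bits = sketch_write_bits(zs_mask);
    return (dst & ~bits) | (src & bits);
}
```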

Fixes

  dEQP-GLES3.functional.fbo.blit.depth_stencil.depth24_stencil8_stencil_only
  dEQP-GLES3.functional.fbo.invalidate.sub.unbind_blit_depth
  dEQP-GLES3.functional.fbo.invalidate.sub.unbind_blit_msaa_depth
  dEQP-GLES3.functional.fbo.invalidate.whole.unbind_blit_depth
  dEQP-GLES3.functional.fbo.invalidate.whole.unbind_blit_msaa_depth
  dEQP-GLES3.functional.fbo.msaa.2_samples.stencil_index8
  dEQP-GLES3.functional.fbo.msaa.4_samples.stencil_index8

Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-02-11 12:26:21 -08:00
Kristian H. Kristensen
f03ba155d5 freedreno/a6xx: Add format argument to fd6_tex_swiz()
We need to allow overriding the format with that of the image or
sampler view, so we can't take it from the resource in fd6_tex_swiz().

Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-02-11 12:26:21 -08:00
Kristian H. Kristensen
bc8c813d5a freedreno/a6xx: Support y-inverted blits
The src coordinates are s24.8. For an inverted blit that ends at y=0
we need to program -1 for sy2, so we need to handle negative values
correctly.
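
As a sketch of what handling negative values means for an s24.8 coordinate, the hypothetical helper below encodes a whole-pixel value as two's complement fixed point. How the result is packed into the real register field is not shown:

```c
#include <assert.h>
#include <stdint.h>

/* Encode an integer coordinate as signed 24.8 fixed point.
 * Multiplying (rather than shifting) keeps negative inputs well defined. */
static uint32_t sketch_to_s24_8(int32_t value)
{
    assert(value >= -(1 << 23) && value < (1 << 23));
    return (uint32_t)(value * 256);
}
```

An unsigned-only encoding would turn -1 into a huge positive coordinate, which is the kind of result an inverted blit ending at y=0 has to avoid.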

Fixes

  dEQP-GLES3.functional.fbo.blit.rect.nearest_consistency_mag_reverse_dst_y
  dEQP-GLES3.functional.fbo.blit.rect.nearest_consistency_min_reverse_dst_y
  dEQP-GLES3.functional.fbo.blit.rect.nearest_consistency_min_reverse_src_y
  dEQP-GLES3.functional.fbo.invalidate.sub.unbind_blit_color
  dEQP-GLES3.functional.fbo.invalidate.whole.unbind_blit_color

Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-02-11 12:26:21 -08:00
Kristian H. Kristensen
03a01e5d23 freedreno/a6xx: Support some depth/stencil blits on blitter
We can rewrite almost all depth stencil blits to various red-only
blits.  The exception is depth-only or stencil-only blits into z24s8
combined depth stencil buffer. We can fall back for depth-only, but
stencil-only remains broken.

Fixes

  dEQP-GLES3.functional.fbo.blit.depth_stencil.depth24_stencil8_basic
  dEQP-GLES3.functional.fbo.blit.depth_stencil.depth24_stencil8_scale
  dEQP-GLES3.functional.fbo.blit.depth_stencil.depth32f_stencil8_basic
  dEQP-GLES3.functional.fbo.blit.depth_stencil.depth32f_stencil8_scale
  dEQP-GLES3.functional.fbo.blit.depth_stencil.depth32f_stencil8_stencil_only

Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-02-11 12:26:21 -08:00
Kristian H. Kristensen
e9592da2b4 freedreno/a6xx: Move blit check so as to restore comment
The explanation for the compressed format check is broken across two
comments:

	/* We can blit if both or neither formats are compressed formats... */
	/* ... but only if they're the same compression format. */

but the ok_format() checks were inserted between, breaking up the flow
of the sentence.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-02-11 12:26:21 -08:00
Kristian H. Kristensen
d2639f2eac freedreno: Don't tell the blitter what it can't do
Call ctx->blit() and let it reject blits it can't do instead of giving
up on stencil blits and blits u_blitter can't do.
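
The control flow can be sketched as a list of blit paths that each report whether they handled the request. All types and function names below are hypothetical, not the freedreno API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical blit description and path signature. */
struct sketch_blit { bool stencil_only; };
typedef bool (*sketch_blit_fn)(const struct sketch_blit *);

/* A hardware path that rejects what it cannot do. */
static bool sketch_hw_blit(const struct sketch_blit *info)
{
    return !info->stencil_only;
}

/* A generic fallback that accepts everything in this sketch. */
static bool sketch_fallback_blit(const struct sketch_blit *info)
{
    (void)info;
    return true;
}

/* Try each path in order; return the index that handled it, or -1. */
static int sketch_do_blit(const sketch_blit_fn *paths, size_t count,
                          const struct sketch_blit *info)
{
    assert(paths != NULL && info != NULL);
    for (size_t i = 0; i < count; i++) {
        if (paths[i](info))
            return (int)i;
    }
    return -1;
}
```

The caller no longer needs to know each path's limits; it only needs the order in which to try them.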

Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-02-11 12:26:21 -08:00
Kristian H. Kristensen
8cf1303698 freedreno: Consolidate u_blitter functions in freedreno_blitter.c
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-02-11 12:26:21 -08:00
Kristian H. Kristensen
701d30dda8 freedreno/a6xx: Combine emit_blit and fd6_blit
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-02-11 12:26:21 -08:00
Kristian H. Kristensen
6d1a7bdba3 freedreno/a6xx: Use the right resource for separate stencil stride
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-02-11 12:26:21 -08:00
Kristian H. Kristensen
24b4172375 freedreno: Log number of draws for sysmem passes
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-02-11 12:26:21 -08:00
Kristian H. Kristensen
a201cb157d freedreno/a6xx: Drop render condition check in blitter
We already check earlier in the call chain in fd_blit().
glBlitFramebuffer always sets render_condition_enable and thus we
would never try the blitter path for that.

Now that we get all of dEQP-GLES3.functional.fbo.blit.conversion.*
down this path, it turns out that the

  fail_if(info->mask != util_format_get_mask(info->src.format));
  fail_if(info->mask != util_format_get_mask(info->dst.format));

conditions weren't accurate.  util_format_get_mask() returns
PIPE_MASK_RGBA for any format with any color channels, while
info->mask is the exact set of channels to blit.  So we reject things
we could blit - for example, PIPE_FORMAT_R16G16_FLOAT where info->mask
is RG while util_format_get_mask() returns RGBA - and accept things we
can't.  It turns out that the blitter is happy to blit different
number of channels, but fails to blit formats with different numerical
formats and srgb formats.
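
The mismatch can be reproduced in a few lines. The mask constants and the coarse query below are local stand-ins that mimic the behaviour described above, not the gallium definitions:

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for per-channel mask bits. */
enum {
    SKETCH_R = 1, SKETCH_G = 2, SKETCH_B = 4, SKETCH_A = 8,
    SKETCH_RGBA = SKETCH_R | SKETCH_G | SKETCH_B | SKETCH_A
};

/* Coarse query: any format with a colour channel reports all four. */
static unsigned sketch_coarse_mask(unsigned format_channels)
{
    return format_channels ? SKETCH_RGBA : 0;
}

/* The old style of check: exact blit mask compared with the coarse mask. */
static bool sketch_old_check_passes(unsigned blit_mask,
                                    unsigned format_channels)
{
    return blit_mask == sketch_coarse_mask(format_channels);
}
```

For a two-channel format the blit mask is RG while the coarse mask is RGBA, so the comparison fails even though the blit itself would be fine.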

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-02-11 12:26:21 -08:00
Kristian H. Kristensen
4f7a9c23ed freedreno/a6xx: regen headers
Update for a6xx.xml.h to incorporate a few new bits and changes to
blit src rect coordinate types.

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-02-11 12:26:21 -08:00
Leo Liu
a0a52a0367 st/va/vp9: set max reference as default of VP9 reference number
If there is no information about number of render targets

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
2019-02-11 14:44:16 -05:00
Leo Liu
21cdb828a3 st/va: fix the incorrect max profiles report
Add "PIPE_VIDEO_PROFILE_MAX" to the enum so that the reported maximum
stays correct when more profiles are added in the future.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109107

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
2019-02-11 14:44:16 -05:00
Guttula, Suresh
2cf2a56739 st/va: Add support for indirect manner by returning VA_STATUS_ERROR_OPERATION_FAILED
Based on the VA spec, DeriveImage() returns VA_STATUS_ERROR_OPERATION_FAILED if the
driver doesn't support the internal surface format. Currently vaDeriveImage()
fails for non-contiguous planes, and the operation-failed error string is
required to support the indirect manner, i.e. vaCreateImage()+vaPutImage(),
in case vaDeriveImage() fails with VA_STATUS_ERROR_OPERATION_FAILED.

This patch notifies the client that the operation failed, with the proper
error string, so that the client falls back to vaCreateImage()+vaPutImage().

v2: updated commit message based on VA spec.

Signed-off-by: suresh guttula <suresh.guttula@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
2019-02-11 14:44:16 -05:00
Marek Olšák
114a899cc8 winsys/amdgpu: cs_check_space sets the minimum IB size for future IBs
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-02-11 12:35:48 -05:00
Marek Olšák
766e920cdb winsys/amdgpu: clean up IB buffer size computation
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-02-11 12:35:48 -05:00
Marek Olšák
8c1cb393fc winsys/amdgpu: remove occurrence of INDIRECT_BUFFER_CONST
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-02-11 12:35:48 -05:00
Marek Olšák
881ef14b32 winsys/amdgpu: use a separate fence list for syncobjs
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-02-11 12:35:48 -05:00
Marek Olšák
9f00123d51 winsys/amdgpu: unify fence list code
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-02-11 12:35:48 -05:00
Marek Olšák
ddfe209a0d winsys/amdgpu: don't drop manually added fence dependencies
wow, it's hard to believe that fence and syncobjs dependencies were ignored.

Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-02-11 12:35:48 -05:00
Marek Olšák
61c678d4bc radeonsi: fix EXPLICIT_FLUSH for flush offsets > 0
Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-02-11 12:35:06 -05:00
Marek Olšák
4522f01d4e gallium/u_threaded: fix EXPLICIT_FLUSH for flush offsets > 0
Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-02-11 12:35:04 -05:00
Jason Ekstrand
9e6a6ef0d4 nir/deref: Rematerialize parents in rematerialize_derefs_in_use_blocks
When nir_rematerialize_derefs_in_use_blocks_impl was first written, I
attempted to optimize things a bit by not bothering to re-materialize
the sources of deref instructions figuring that the final caller would
take care of that.  However, in the case of more complex deref chains
where the first link or two lives in block A and then another link and
the load/store_deref intrinsic live in block B it doesn't work.  The
code in rematerialize_deref_in_block looks at the tail of the chain,
sees that it's already in block B and skips it, not realizing that part
of the chain also lives in block A.

The easy solution here is to just rematerialize deref sources of deref
instructions as well.  This may potentially lead to a few more deref
instructions being created, but the conditions required for that to
actually happen are fairly unlikely and, thanks to the caching, it's all
linear time regardless.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109603
Fixes: 7d1d1208c2 "nir: Add a small pass to rematerialize derefs per-block"
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-02-11 10:57:23 -06:00
Jason Ekstrand
fd77606b5b intel/fs: Use enumerated array assignments in fb read TXF setup
It's more clear and means we don't have to update the array every time
we add an optional texture instruction argument
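
"Enumerated array assignments" here reads as C designated initializers indexed by enum values. A small sketch with a hypothetical argument enum, not the real one from the Intel compiler:

```c
#include <assert.h>

/* Hypothetical list of texture instruction arguments. */
enum sketch_tex_arg {
    SKETCH_ARG_COORD,
    SKETCH_ARG_LOD,
    SKETCH_ARG_SAMPLE_INDEX,
    SKETCH_ARG_MCS,
    SKETCH_ARG_COUNT
};

/* Build the argument array by name; unnamed slots default to zero, so
 * adding a new enum entry does not require touching this initializer. */
static int sketch_arg_value(enum sketch_tex_arg which,
                            int coord, int sample_index)
{
    const int args[SKETCH_ARG_COUNT] = {
        [SKETCH_ARG_COORD] = coord,
        [SKETCH_ARG_SAMPLE_INDEX] = sample_index,
    };

    assert((int)which >= 0 && (int)which < SKETCH_ARG_COUNT);
    return args[which];
}
```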

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-11 10:57:09 -06:00
Michel Dänzer
d6c55f6c62 gitlab-ci: Re-use docker image from the main repo in forked repos
Instead of generating it from scratch in each forked repo. This should
save time, energy and storage. (The xserver & xf86-video-amdgpu CI
scripts do basically the same)

v2:
* Hardcode "mesa" instead of using $CI_PROJECT_NAME, to avoid breakage
  if the project name is changed after forking (Eric Engestrom)

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-11 12:24:31 +01:00
Ilia Mirkin
cc79a1483f nvc0: we have 16k-sized framebuffers, fix default scissors
For some reason we don't use view volume clipping by default, and use
scissors instead. These scissors were set to an 8k max fb size, while
the driver advertises 16k-sized framebuffers.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
2019-02-10 23:36:23 -05:00
Alyssa Rosenzweig
85e2bb58ca panfrost: Specify supported draw modes per-context
Midgard has native support for QUADS and POLYGONS; Bifrost seemingly
does not. Thus, Midgard generally skips prim_convert whereas Bifrost
needs the pass; this patch allows the setting of allowed primitives to
occur on a per-context basis (for runtime hardware selection).

v2: Use (POLYGONS + 1) instead of LINES_ADJACENCY.
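
A sketch of how a per-context bitmask of allowed primitives could be built with the (POLYGONS + 1) form. The enum below is a local stand-in whose ordering is an assumption, and the helper names are hypothetical:

```c
#include <assert.h>

/* Local stand-in for a primitive enum; the ordering is assumed. */
enum sketch_prim {
    SKETCH_PRIM_POINTS,
    SKETCH_PRIM_LINES,
    SKETCH_PRIM_LINE_LOOP,
    SKETCH_PRIM_LINE_STRIP,
    SKETCH_PRIM_TRIANGLES,
    SKETCH_PRIM_TRIANGLE_STRIP,
    SKETCH_PRIM_TRIANGLE_FAN,
    SKETCH_PRIM_QUADS,
    SKETCH_PRIM_QUAD_STRIP,
    SKETCH_PRIM_POLYGON,
    SKETCH_PRIM_LINES_ADJACENCY
};

/* Bitmask with one bit set for every primitive up to and including `last`. */
static unsigned sketch_modes_through(enum sketch_prim last)
{
    assert((int)last >= 0 && (int)last < 31);
    return (1u << ((unsigned)last + 1)) - 1;
}

/* Nonzero when `prim` is allowed by the mask. */
static int sketch_mode_supported(unsigned modes, enum sketch_prim prim)
{
    return (int)((modes >> (unsigned)prim) & 1u);
}
```

Writing the bound as the last supported primitive plus one states the intent directly and does not depend on which primitive happens to follow it in the enum.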

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Robert Foss <robert.foss@collabora.com>
2019-02-11 03:23:00 +00:00
Dave Airlie
90c6880df7 radv: remove alloc parameter from pipeline init
clang points out this isn't used.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-11 10:04:40 +10:00
Dave Airlie
a523ae0cac radv/llvm: initialise passes member.
Fixes coverity warning

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-11 08:59:02 +10:00
Dave Airlie
d2e82c2682 glsl: glsl to nir fix uninit class member.
The constructor should init this to NULL

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-02-11 08:55:07 +10:00
Alyssa Rosenzweig
2458797256 panfrost: Elucidate texture op scheduling comment
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-10 00:51:57 +00:00
Alyssa Rosenzweig
658961aec3 panfrost: Remove speculative if 0'd format bit code
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-10 00:51:51 +00:00
Alyssa Rosenzweig
b1213a3947 panfrost: Remove if 0'd dead code
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-10 00:50:35 +00:00
Alyssa Rosenzweig
e91e1786c5 panfrost: Add kernel-agnostic resource management
Various methods relating to resource management were previously marked
as kernel-specific, forcing them to stay downstream in the vendor
overlay and eventually be duplicated for DRM code. This patch adds back
this code in kernel-neutral space, allowing for code sharing and
minimising the diff to downstream.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-10 00:44:32 +00:00
Alyssa Rosenzweig
4ed23b193a panfrost: Don't hardcode number of nir_ssa_defs
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-10 00:42:52 +00:00
Alyssa Rosenzweig
97dcad8d3e panfrost: Clean-up one-argument passing quirk
Most Midgard instructions take two arguments logically; there are always
two arguments at the assembly level. For the few instructions that take
only a single argument, generally the second argument slot is unused,
with a zero inline constant occupying the space. fmov/imov are the
exception, where the first argument is filled with r24 and the logical
argument is in the second slot.

Previously, these constraints were handled by a delicate, buggy series
of hacks. This commit removes these hacks. Instead, we look at the
logical number of arguments (from NIR), switching between two argument
and one-argument-one-zero style. We then introduce a quirk for the
flipped style, which applies to fmov/imov.
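
The three operand layouts can be sketched as follows. The struct, the zero-constant marker and the function name are hypothetical; only r24 and the three styles come from the description above:

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_R24 24           /* register named in the text above */
#define SKETCH_ZERO_CONST (-1)  /* hypothetical marker: zero inline constant */

/* The two assembly-level argument slots. */
struct sketch_operands { int src0; int src1; };

/* Choose slot contents from the logical argument count and the
 * "flipped" quirk that applies to fmov/imov. */
static struct sketch_operands
sketch_pick_operands(unsigned logical_args, bool flipped, int a, int b)
{
    struct sketch_operands ops;

    assert(logical_args == 1 || logical_args == 2);
    if (logical_args == 2) {
        ops.src0 = a;                  /* two-argument style */
        ops.src1 = b;
    } else if (flipped) {
        ops.src0 = SKETCH_R24;         /* flipped style: r24 goes first */
        ops.src1 = a;
    } else {
        ops.src0 = a;                  /* one-argument-one-zero style */
        ops.src1 = SKETCH_ZERO_CONST;
    }
    return ops;
}
```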

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-10 00:41:25 +00:00
Karol Herbst
49397a3c84 glsl_type: initialize offset and location to -1 for glsl_struct_field
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-09 13:52:15 +01:00
Kenneth Graunke
55e00a2ea8 nouveau: Silence unhandled cap warnings
Nouveau apparently uses the u_screen helper but prints a warning in the
default case, so running any GL program would start grumbling.

Fixes: 8fa54bc549 gallium: Add a PIPE_CAP_NIR_COMPACT_ARRAYS capability bit.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
2019-02-08 16:26:00 -08:00
Caio Marcelo de Oliveira Filho
ee670d09af intel/compiler: use 0 as sampler in emit_mcs_fetch
The sampler will be ignored since the underlying 'ld_mcs' operation
won't use it, so just fill the field with 0 instead of the texture to
make it clearer that's the case.

This will also keep is_high_sampler() from kicking in unnecessarily when
we are using the operation for a texture with index >= 16.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-08 14:51:56 -08:00
Eric Engestrom
e8e544436c wsi: query the ICD's max dimensions instead of hard-coding them
anv and radv both happened to already return 2^14 for these, but
querying the ICD is safer and will help if vdreno (or whatever it's
called) doesn't have the same max.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-08 18:54:57 +00:00
Ian Romanick
b031c64349 nir: Convert a bcsel with only phi node sources to a phi node
v2: Remove the original ALU instruction after all of its readers are
modified to read the new ALU instruction.

v3: Fix an issue where a bcsel that may not be executed on a loop
iteration due to a break statement is converted to a phi (and therefore
incorrectly "executed").  Noticed by Tim.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109216
Fixes: 8fb8ebfbb0 ("intel/compiler: More peephole select")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-02-08 10:37:06 -08:00
Ian Romanick
0881e90c09 nir: Split ALU instructions in loops that read phis
A single shader in Unigine Superposition is affected by this change.
A single iadd is moved to the end of a loop.  This iadd is involved in
a complex set of logic to terminate the loop, and an extra mov
instruction is inserted.  This shader really needs the optimization
suggested by bugzilla #94747, and I expect that to make this tiny
regression go away.

All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15047543 -> 15047545 (<.01%)
instructions in affected programs: 565 -> 567 (0.35%)
helped: 0
HURT: 2

total cycles in shared programs: 369977253 -> 369978253 (<.01%)
cycles in affected programs: 127910 -> 128910 (0.78%)
helped: 0
HURT: 2

v2: Skip nir_op_vec{2,3,4} and nir_op_[fi]mov instructions to avoid
infinite optimization loops.  Remove the original ALU instruction after
all of its readers are modified to read the new ALU instruction.

v3: Extend to the more general case.  If the prev-block value from
the phi is not undef, this means the ALU instruction has to be
duplicated in both the prev-block and the continue-block.

Fixes: 8fb8ebfbb0 ("intel/compiler: More peephole select")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-02-08 10:37:06 -08:00
Ian Romanick
0c0c69729b nir: Select phi nodes using prev_block instead of continue_block
This simplifies some changes coming later.

Fixes: 8fb8ebfbb0 ("intel/compiler: More peephole select")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-02-08 10:37:06 -08:00
Ian Romanick
8d8f80af3a nir: Refactor code that checks phi nodes in opt_peel_loop_initial_if
This will be used in a couple more places soon.

The function name is... horribly long.  Neither Matt nor I could think
of anything that was shorter and still more descriptive than
"is_phi_foo".  I'm willing to entertain suggestions.

Fixes: 8fb8ebfbb0 ("intel/compiler: More peephole select")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-02-08 10:37:06 -08:00
Ian Romanick
4d65d2b12e nir: Document some fields of nir_loop_terminator
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-02-08 10:37:06 -08:00
Ian Romanick
28ef5bb74c intel/compiler: Silence warning about value that may be used uninitialized
For some reason, this warning only occurs for me in release builds.

In file included from src/intel/compiler/brw_nir_lower_mem_access_bit_sizes.c:25:0:
src/intel/compiler/brw_nir_lower_mem_access_bit_sizes.c: In function ‘brw_nir_lower_mem_access_bit_sizes’:
src/compiler/nir/nir_builder.h:501:26: warning: ‘src_swiz[2]’ may be used uninitialized in this function [-Wmaybe-uninitialized]
       alu_src.swizzle[i] = swiz[i];
       ~~~~~~~~~~~~~~~~~~~^~~~~~~~~
src/intel/compiler/brw_nir_lower_mem_access_bit_sizes.c:225:16: note: ‘src_swiz[2]’ was declared here
       unsigned src_swiz[4];
                ^~~~~~~~

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-02-08 10:37:06 -08:00
Ian Romanick
78169870e4 nir: Silence zillions of unused parameter warnings in release builds
Fixes: cd56d79b59 "nir: check NIR_SKIP to skip passes by name"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-02-08 10:37:06 -08:00
Eric Engestrom
3dc5faf523 gitlab-ci: workaround docker bug for users with uppercase characters
CI_REGISTRY_IMAGE == lower($CI_REGISTRY/$CI_PROJECT_PATH)

Suggested-by: Daniel Stone <daniels@collabora.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-08 17:45:57 +00:00
Andrii Simiklit
2b7d5c3217 i965: consider a 'base level' when calculating width0, height0, depth0
I guess that when we are calculating the width0, height0, depth0
to use for function 'intel_miptree_create' we need to consider
the 'base level' like it is done in the 'intel_miptree_create_for_teximage'
function.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107987
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-02-07 21:40:50 -08:00
Timothy Arceri
26aa460940 nir: rewrite varying component packing
There are a number of reasons for the rewrite.

1. Adding support for packing tess patch varyings in a sane way.

2. Making use of qsort allowing the code to be much easier to
   follow.

3. Fixes a bug where different interp types caused component
   packing to be skipped for all varyings in some scenarios.

4. Allows us to add a crude live range analysis for deciding
   which components should be packed together. This support can
   optionally be added in a future patch.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-08 02:54:56 +00:00
Timothy Arceri
2f53260417 nir: add is_packing_supported_for_type() helper
This will be used in the following patches to determine if we
support packing the components of a varying.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-08 02:54:56 +00:00
Timothy Arceri
e041123841 nir: add glsl_type_is_32bit() helper
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-08 02:54:56 +00:00
Timothy Arceri
7b01d5c354 nir: add support for marking used patches when packing varyings
This adds support needed for marking the varyings as used but we
don't actually support packing patches in this patch.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-08 02:54:56 +00:00
Timothy Arceri
d0af13cfb4 st/glsl_to_nir: call nir_remove_dead_variables() after lowering local indirects
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-08 02:54:56 +00:00
Timothy Arceri
d0abbaa528 util: move BITFIELD macros to util/macros.h
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-08 02:54:56 +00:00
Karol Herbst
cbd1ad6165 st/mesa: require RGBA2, RGB4, and RGBA4 to be renderable
If the driver does not support rendering to these formats but does
support texturing, we can end up in incompatibilities between textures
and renderbuffers that are then copied to.

Fixes KHR-GL45.copy_image.functional on nvc0

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
2019-02-07 21:51:45 -05:00
Karol Herbst
6010d7b8e8 gallium: add PIPE_CAP_MAX_VARYINGS
Some NVIDIA hardware can accept 128 fragment shader input components,
but only have up to 124 varying-interpolated input components. We add a
new cap to express this cleanly. For most drivers, this will have the
same value as PIPE_SHADER_CAP_MAX_INPUTS for the fragment shader.

Fixes KHR-GL45.limits.max_fragment_input_components

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
[imirkin: rebased, improved docs/commit message]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
2019-02-07 21:51:45 -05:00
Alyssa Rosenzweig
738346fa23 kmsro: Silence warning if missing
Regardless of whether the build uses kmsro, kmsro is the default driver
descriptor when the static loader is used. Thus, in an edge case where
the static loader is used, no static targets are loaded, and kmsro is
not compiled, a spurious warning is printed. There's no harm in
executing the stub function in this case, but it's not "an error" to not
have kmsro in the build; the driver-missing warning should not be printed
for kmsro.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-02-08 01:48:37 +00:00
Lionel Landwerlin
f1bcb9be46 radv: assert that colorAttachment is valid for CmdClearAttachment
This partially reverts a change from b7a93cbded ("radv: Handle
VK_ATTACHMENT_UNUSED in CmdClearAttachment") which fixed actual issues
but also started to accept invalid values for the colorAttachment
field.

This change asserts that the field is valid for the current pass.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: b7a93cbded ("radv: Handle VK_ATTACHMENT_UNUSED in CmdClearAttachment")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-08 00:18:16 +00:00
Lionel Landwerlin
a934a3d124 anv: assert that color attachments are valid
This reverts commit d76e777988.

Let's make it obvious that there is an application issue if it tries
to access an attachment that doesn't exist in the current pass.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: d76e777988 ("anv: Handle VK_ATTACHMENT_UNUSED in colorAttachment")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-08 00:18:16 +00:00
Dave Airlie
3c153b3982 docs: update qbo support for virgl
Signed-off-by: Dave Airlie <airlied@redhat.com>
2019-02-08 09:06:36 +10:00
Eric Engestrom
6e0effbd34 travis: fix osx make build
This variable was removed in commit 087af992a2 "travis: remove
unused linux code path" because it looked like it was only used by the
Linux build. Turns out I was wrong, so let's restore it.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-02-07 20:14:14 +00:00
Jason Ekstrand
eaf5e4a24d README: Drop the badges from the readme
They have been added as badges directly to the GitLab project.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-07 12:46:17 -06:00
Eric Engestrom
358d0cfab2 driconf: drop unused macro
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-02-07 13:40:26 +00:00
Eric Engestrom
00be88aab8 meson: add script to print the options before configuring a builddir
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-07 13:22:41 +00:00
Alyssa Rosenzweig
d43ec104b7 panfrost: Include glue for out-of-tree legacy code
In addition to the DRM interface in active development, for legacy
kernels Panfrost has a small, optional, out-of-tree glue repository. For
various reasons, this legacy code should not be included in Mesa proper,
but this commit allows it to coexist peacefully with upstream Panfrost.
If the nondrm repo is cloned/symlinked to the directory
`src/gallium/drivers/panfrost/nondrm`, legacy functionality will be
built. Otherwise, the driver will build normally, though a runtime error
message will be printed if a legacy kernel is detected.

This workaround is icky, but it allows a nearly-upstream Panfrost to
work on real hardware, today. Ideally, this patch will be reverted when
the Panfrost kernel module is mature and we drop legacy support.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-07 01:58:32 +00:00
Alyssa Rosenzweig
7da251fc72 panfrost: Check in sources for command stream
This patch includes the command stream portion of the driver,
complementing the earlier compiler. It provides a base for future work,
though it does not integrate with any particular winsys.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
2019-02-07 01:57:50 +00:00
Alyssa Rosenzweig
8f4485ef1a panfrost: Use u_pipe_screen_get_param_defaults
Switching to the defaults function cleans up pan_screen.h markedly and
futureproofs for when new PIPE_CAPs are added.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Suggested-by: Eric Anholt <eric@anholt.net>
2019-02-07 01:57:19 +00:00
Alyssa Rosenzweig
8f9f99d84d kmsro: Move DRM entrypoints to shared block
As kmsro allows an essentially mix-and-match hodgepodge of display
drivers and renderonly GPUs, it doesn't make sense to couple the display
driver entrypoint definition with the driver. Instead, we move *all*
kmsro entrypoints to a shared kmsro block at the end (avoiding clutter
and distraction since this list may snowball in the future).

v2: Alphabetize driver list.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-02-07 01:50:16 +00:00
Rhys Perry
5b6f522fc2 nvc0: add compute invocation counter
The strategy is to keep a CPU-side counter of the direct invocations,
and a GPU-side counter of the indirect invocations, and then add them
together for queries.

The specific technique is a macro which multiplies a list of integers
together and accumulates the product into SCRATCH registers held inside
of the context. Another macro will read those values out and add them to
the passed-in cpu-side counter to be stored in a query buffer the same
way that all the other statistics are stored.

Original implementation by Rhys Perry, redone by Ilia Mirkin to use the
SCRATCH temporaries.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2019-02-06 19:35:57 -05:00
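The counting scheme in 5b6f522fc2 can be sketched on the CPU as follows. This
is only a model under stated assumptions: the GPU-side SCRATCH accumulation
is represented by a plain struct field, and every name here (product,
struct invocation_counters, query_invocations) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply a list of integers together, e.g. the grid dimensions and the
 * block dimensions of one dispatch. */
static uint64_t
product(const uint32_t *values, size_t count)
{
   assert(values != NULL || count == 0);
   uint64_t result = 1;
   for (size_t i = 0; i < count; i++)
      result *= values[i];
   return result;
}

/* Hypothetical stand-in for the two counters described in the commit. */
struct invocation_counters {
   uint64_t cpu_direct;   /* accumulated on the CPU for direct dispatches */
   uint64_t gpu_indirect; /* models the value accumulated on the GPU */
};

/* A query result is the sum of both counters. */
static uint64_t
query_invocations(const struct invocation_counters *c)
{
   return c->cpu_direct + c->gpu_indirect;
}
```

Splitting the count this way means direct dispatches never need a GPU
round trip, while indirect dispatches, whose sizes the CPU does not know,
are still accounted for.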
Karol Herbst
cce4955721 gm107/ir: add fp64 rsq
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
2019-02-06 19:35:57 -05:00
Karol Herbst
815a8e59c6 gm107/ir: add fp64 rcp
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
2019-02-06 19:35:57 -05:00
Karol Herbst
12669d2970 gk104/ir: Use the new rcp/rsq in library
[imirkin: add a few more "long" prefixes to safen things up]
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
2019-02-06 19:35:57 -05:00
Boyan Ding
656ad06051 gk110/ir: Use the new rcp/rsq in library
v2: (Karol Herbst <kherbst@redhat.com>)
 * fix Value setup for the builtins

Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
[imirkin: track the fp64 flag when switching ops to calls]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
2019-02-06 19:35:57 -05:00
Boyan Ding
7937408052 gk110/ir: Add rsq f64 implementation
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
2019-02-06 19:35:57 -05:00
Boyan Ding
04593d9a73 gk110/ir: Add rcp f64 implementation
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
2019-02-06 19:35:57 -05:00
Ilia Mirkin
6adb9b38bf nvc0: stick zero values for the compute invocation counts
Not quite perfect, but at least we don't end up with random values in
the query buffer.

Fixes KHR-GL45.pipeline_statistics_query_tests_ARB.functional_default_qo_values

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
2019-02-06 19:35:57 -05:00
Ilia Mirkin
e00799d3dc nv50,nvc0: use condition for occlusion queries when already complete
For the NO_WAIT variants, we would jump into the ALWAYS case for both
nested and inverted occlusion queries. However if the query had
previously completed, the application could reasonably expect that the
render condition would follow that result.

To resolve this, we remove the nesting distinction which unnecessarily
created an imbalance between the regular and inverted cases (since
there's no "zero" condition mode). We also use the proper comparison if
we know that the query has completed (which could happen as a result of
an earlier get_query_result call).

Fixes KHR-GL45.conditional_render_inverted.functional

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
2019-02-06 19:35:57 -05:00
Ilia Mirkin
162352e671 nvc0: fix 3d images on kepler
Looks like SUBFM.3D and SUEAU are perfectly capable of dealing with 3d
tiling, they just need the correct inputs. Supply them.

We also have to deal with the case where a 2d "layer" of a 3d image is
bound. In this case, we supply the z coordinate separately to the
shader, which has to optionally treat every 2d case as if it could be a
slice of a 3d texture.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
2019-02-06 19:35:57 -05:00
Ilia Mirkin
5de5beedf2 nvc0/ir: fix second tex argument after levelZero optimization
We used to pre-set a bunch of extra arguments to a texture instruction
in order to force the RA to allocate a register at the boundary of 4.
However with the levelZero optimization, which removes a LOD argument
when it's uniformly equal to zero, we undid that logic by removing an
extra argument. As a result, we could end up with insufficient alignment
on the second wide texture argument.

Instead we switch to a different method of achieving the same result.
The logic runs during the constraint analysis of the RA, and adds unset
sources as necessary right before being merged into a wide argument.

Fixes MISALIGNED_REG errors in Hitman when run with bindless textures
enabled on a GK208.

Fixes: 9145873b15 ("nvc0/ir: use levelZero flag when the lod is set to 0")
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
2019-02-06 19:35:57 -05:00
Ilia Mirkin
4443b6ddf2 nvc0/ir: always use CG mode for loads from atomic-only buffers
Atomic operations don't update the local cache, which means that we
would have to issue CCTL operations in order to get the updated values.
When we know that a buffer is primarily used for atomic operations, it's
easier to just avoid the caching at that level entirely.

The same issue persists for non-atomic buffers, which will have to be
fixed separately.

Fixes the failing dEQP-GLES31.functional.atomic_counter.* tests.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
2019-02-06 19:35:57 -05:00
Ilia Mirkin
399215eb7a nvc0: add support for handling indirect draws with attrib conversion
The hardware does not natively support FIXED and DOUBLE formats. If
those are used in an indirect draw, they have to be converted. Our
conversion tries to be clever about only converting the data that's
needed. However for indirect, that won't work.

Given that DOUBLE or FIXED are highly unlikely to ever be used with
indirect draws, read the indirect buffer on the CPU and issue draws
directly.

Fixes the failing dEQP-GLES31.functional.draw_indirect.random.* tests.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
2019-02-06 19:35:57 -05:00
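The fallback described in 399215eb7a (read the indirect buffer on the CPU and
issue direct draws) might look roughly like the sketch below. The command
layout is the one OpenGL defines for DrawArraysIndirectCommand; everything
else, including the callback type and the function names, is a hypothetical
stand-in and not the nvc0 code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout of one non-indexed indirect command as defined by OpenGL. */
struct draw_arrays_indirect_cmd {
   uint32_t count;
   uint32_t instance_count;
   uint32_t first;
   uint32_t base_instance;
};

/* Hypothetical callback standing in for "issue one direct draw". */
typedef void (*direct_draw_fn)(const struct draw_arrays_indirect_cmd *cmd,
                               void *user);

/* Walk a CPU-mapped indirect buffer and replay each command directly.
 * This sketch expects an explicit stride of at least one command. */
static void
replay_indirect_on_cpu(const void *mapped, size_t offset, size_t stride,
                       unsigned draw_count, direct_draw_fn draw, void *user)
{
   assert(stride >= sizeof(struct draw_arrays_indirect_cmd));
   const uint8_t *ptr = (const uint8_t *)mapped + offset;

   for (unsigned i = 0; i < draw_count; i++) {
      struct draw_arrays_indirect_cmd cmd;
      /* memcpy avoids alignment assumptions about the mapped buffer. */
      memcpy(&cmd, ptr + (size_t)i * stride, sizeof(cmd));
      draw(&cmd, user);
   }
}

/* Example callback: accumulate the number of vertices that would be drawn. */
static void
count_vertices(const struct draw_arrays_indirect_cmd *cmd, void *user)
{
   uint64_t *total = user;
   *total += (uint64_t)cmd->count * cmd->instance_count;
}
```

Once the parameters are known on the CPU, the existing selective attribute
conversion can run exactly as it does for a direct draw.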
Kristian H. Kristensen
0f7a20e91e freedreno/a6xx: Use tiling for all resources
We used to restrict this to just PIPE_BIND_SAMPLER_VIEW resources, but
most resources benefit from being tiled.

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-02-06 15:28:48 -08:00
Kristian H. Kristensen
357ea7da51 freedreno/a6xx: Emit blitter dst with OUT_RELOCW
We're writing to the bo and the kernel needs to know for
fd_bo_cpu_prep() to work.

Fixes: f93e431272 ("freedreno/a6xx: Enable blitter")
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-02-06 15:22:25 -08:00
Bas Nieuwenhuizen
13ab63bb62 radv: Implement VK_EXT_buffer_device_address.
v2: Also update the release notes.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-06 22:37:38 +01:00
Bas Nieuwenhuizen
3259e7b036 radv: Do not use the bo list for local buffers.
The kernel already does it for us.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-06 22:36:19 +01:00
Bas Nieuwenhuizen
8a15950211 amd/common: Implement global memory accesses.
Needed for VK_EXT_buffer_device_address.

The pointers are implemented as i8*, since I could not figure
out how to emulate setting struct offsets in LLVM based on the
SPIR-V offsets (and more weird stuff like row major matrices).

Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-06 22:36:11 +01:00
Bas Nieuwenhuizen
5703ecf651 amd/common: Do not use 32-bit loads for shared memory.
We use a straight glsl->llvm type conversion so types should already be right.

Also even though the writemasks were changed we were not actually doing 32-bit
things, so this fails miserably.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-06 22:36:06 +01:00
Bas Nieuwenhuizen
8d1718590b amd/common: handle nir_deref_cast for shared memory from integers.
Can happen e.g. after a phi.

Fixes: a2b5cc3c39 "radv: enable variable pointers"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-06 22:36:02 +01:00
Bas Nieuwenhuizen
830fd0efc1 amd/common: Handle nir_deref_type_ptr_as_array for shared memory.
Fixes: a2b5cc3c39 "radv: enable variable pointers"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-06 22:35:58 +01:00
Bas Nieuwenhuizen
dbdb44d575 amd/common: Fix stores to derefs with unknown variable.
Fixes: a2b5cc3c39 "radv: enable variable pointers"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-06 22:35:54 +01:00
Bas Nieuwenhuizen
3c24fc64c7 amd/common: Use correct writemask for shared memory stores.
The check was for 1 bit being set, which is clearly not what we want.

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-06 22:35:49 +01:00
Bas Nieuwenhuizen
00253ab2c4 radv: Fix the shader info pass for not having the variable.
For example with VK_EXT_buffer_device_address or
 VK_KHR_variable_pointers.

Fixes: a2b5cc3c39 "radv: enable variable pointers"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-06 22:35:45 +01:00
Bas Nieuwenhuizen
58c8dadd32 amd/common: Implement ptr->int casts in ac_to_integer.
For the implicit casts inherent in nir.

This should probably have been done for shared memory for
VK_KHR_variable_pointers.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-06 22:35:40 +01:00
Bas Nieuwenhuizen
e00d9a9a72 amd/common: Add gep helper for pointer increment.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-06 22:35:36 +01:00
Bas Nieuwenhuizen
39ab4e12f7 radv: Only look at pImmutableSamples if the descriptor has a sampler.
Equivalent of ANV patch c7f4a2867c

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-06 22:35:32 +01:00
Eric Engestrom
40b53a7203 xvmc: fix string comparison
Fixes: 6fca18696d "g3dvl: Update XvMC unit tests."
Cc: Younes Manton <younes.m@gmail.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-06 18:15:43 +00:00
Eric Engestrom
110a6e1839 xvmc: fix string comparison
Fixes: c7b65dcaff "xvmc: Define some Xv attribs to allow users
                             to specify color standard and procamp"
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-06 18:15:43 +00:00
Eric Engestrom
ba26bc4ef0 gitlab-ci: add meson glvnd build
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-06 17:56:30 +00:00
Eric Engestrom
5459900f38 travis: remove unused scons code path
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-06 17:56:30 +00:00
Eric Engestrom
087af992a2 travis: remove unused linux code path
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-06 17:56:30 +00:00
Eric Engestrom
73275147fe gitlab-ci: add make Gallium ST Other build
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-06 17:56:30 +00:00
Eric Engestrom
360a7bfbe9 gitlab-ci: add make Gallium ST Clover LLVM-7 build
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-06 17:56:30 +00:00
Eric Engestrom
39315a747b gitlab-ci: add make Gallium ST Clover LLVM-6.0 build
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-06 17:56:30 +00:00
Eric Engestrom
e80f88c48a gitlab-ci: add make Gallium ST Clover LLVM-5.0 build
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-06 17:56:30 +00:00
Eric Engestrom
cc85f50029 gitlab-ci: add make Gallium ST Clover LLVM-4.0 build
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-06 17:56:30 +00:00
Eric Engestrom
984e295500 gitlab-ci: add make Gallium ST Clover LLVM-3.9 build
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-06 17:56:30 +00:00
Eric Engestrom
d0dff24cbb gitlab-ci: add make Gallium Drivers "Other" build
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-06 17:56:30 +00:00
Eric Engestrom
055cfbc6de gitlab-ci: add make Gallium Drivers RadeonSI build
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-06 17:56:30 +00:00
Eric Engestrom
7b26a19f31 gitlab-ci: add make Gallium Drivers SWR build
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-06 17:56:30 +00:00
Eric Engestrom
bbdc563c11 gitlab-ci: add make loaders/classic DRI build
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-06 17:56:30 +00:00
Eric Engestrom
f33517bda7 gitlab-ci: add meson gallium ST "Other" build
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-06 17:56:30 +00:00
Eric Engestrom
8dab707ab8 gitlab-ci: add meson gallium ST Clover (LLVM 7.0) build
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-06 17:56:30 +00:00
Eric Engestrom
8744ac0904 gitlab-ci: add meson gallium ST Clover (LLVM 6.0) build
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-06 17:56:30 +00:00
Eric Engestrom
b5a70af062 gitlab-ci: add meson gallium ST Clover (LLVM 5.0) build
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-06 17:56:30 +00:00
Eric Engestrom
d407ead204 gitlab-ci: add meson gallium "other drivers" build
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-06 17:56:30 +00:00
Eric Engestrom
06e8f1961b gitlab-ci: add meson gallium RadeonSI build
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-06 17:56:30 +00:00
Eric Engestrom
360c814bfe gitlab-ci: add meson gallium SWR build
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-06 17:56:30 +00:00
Eric Engestrom
d73265e20d gitlab-ci: add meson loader/classic DRI build
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-06 17:56:30 +00:00
Eric Engestrom
6a19ec9daa gitlab-ci: add scons SWR build
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-06 17:56:30 +00:00
Eric Engestrom
d4c6d4d5cb gitlab-ci: add scons llvm 3.5 build
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-06 17:56:30 +00:00
Eric Engestrom
06b245b438 gitlab-ci: add a scons no-llvm build
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-06 17:56:30 +00:00
Eric Engestrom
89a7467899 gitlab-ci: add a make vulkan build
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-06 17:56:30 +00:00
Eric Engestrom
46d23c0a46 gitlab-ci: add a meson vulkan build
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-06 17:56:30 +00:00
Eric Engestrom
329f5cd780 gitlab-ci: add ubuntu container
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-06 17:56:30 +00:00
Marek Olšák
42a1cd034d radeonsi: use local ws variable in si_need_dma_space
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-02-06 11:17:21 -05:00
Marek Olšák
2c4911c652 radeonsi: don't leak an index buffer if draw_vbo fails
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-02-06 11:17:21 -05:00
Marek Olšák
d72c319867 radeonsi: make allocator_zeroed_memory unmappable and use bigger buffers
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-02-06 11:17:21 -05:00
Marek Olšák
5068dec5de radeonsi: clear allocator_zeroed_memory with SDMA
so that it can be used in parallel IBs.

This also removes the SO_FILLED_SIZE hack.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-02-06 11:17:21 -05:00
Marek Olšák
7d4c935654 radeonsi: initialize textures using DCC to black when possible
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-02-06 11:17:21 -05:00
Jonathan Marek
3361305f57 freedreno: a2xx: fix fast clear
Fixes: 912a9c8d

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
2019-02-06 14:34:57 +00:00
Eric Engestrom
54fa5eceae egl: use coherent variable names
`EGLDisplay` variables (the opaque Khronos type) have mostly been
consistently called `dpy`, as this is the name used in the Khronos
specs.

However, `_EGLDisplay` variables (our internal struct) have been
randomly called `dpy` when there was no local variable clash with
`EGLDisplay`s, and `disp` otherwise.

Let's be consistent and use `dpy` for the Khronos type, and `disp`
for our struct.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Eric Anholt <eric@anholt.net>
2019-02-06 11:53:24 +00:00
Alyssa Rosenzweig
a81d5587d6 meson: Remove panfrost from default driver list
Until the kernel side matures and the full driver is upstreamed, to
avoid end-user surprises, Panfrost should only be built for the
adventurous.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-02-06 02:59:00 +00:00
Eric Anholt
3c08ecf147 v3d: Whitespace consistency fix.
2019-02-05 15:46:42 -08:00
Eric Anholt
940501a446 v3d: Fix copy-propagation of input unpacks.
I had a single function for "does this do float input unpacking" with two
major flaws: It was missing the most common thing to try to copy propagate
a f32 input unpack to (the VFPACK to an FP16 render target) along with
several other ALU ops, and also would try to propagate an f32 unpack into
a VFMUL which only does f16 unpacks.

instructions in affected programs: 659232 -> 655895 (-0.51%)
uniforms in affected programs: 132613 -> 135336 (2.05%)

and a couple of programs increase their thread counts.

The uniforms hit appears to be a pattern in generated code of doing (-a >=
a) comparisons, which when a is abs(b) can result in the abs instruction
being copy propagated once but not fully DCEed.
2019-02-05 15:46:04 -08:00
Eric Anholt
e5c6938590 v3d: Fix input packing of .l for rounding/fdx/fdy.
Avoids a regression in
dEQP-GLES3.functional.shaders.derivate.fwidth.texture.* once we start
copy-propagating more input packs.
2019-02-05 15:45:23 -08:00
Eric Anholt
1a4170952d v3d: Fix pack/unpack of VFPACK operand unpacks.
We want to be able to copy propagate our texture unpacks into the vfpack.
2019-02-05 15:45:23 -08:00
Eric Anholt
d0fdbd4211 v3d: Fix dumping of shaders with alpha test.
We were trying to print a NULL entry from the table.
2019-02-05 15:42:14 -08:00
Eric Anholt
bdef17b052 v3d: Store the actual mask of color buffers present in the key.
If you only bound rt 1+, we'd still emit a write to the rt0 that isn't
present (noticed while debugging an
ext_framebuffer_multisample-alpha-to-coverage-no-draw-buffer-zero
regression in another change).
2019-02-05 15:42:04 -08:00
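A hedged sketch of the point made in bdef17b052: a bitmask can express "only
rt 1 is bound", which a plain count of color buffers cannot. MAX_RTS and both
function names are hypothetical and are not taken from the v3d driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_RTS 8 /* hypothetical limit for this sketch */

/* Build a bitmask with bit i set only when render target i is bound. */
static uint8_t
color_buffer_mask(const bool bound[MAX_RTS])
{
   uint8_t mask = 0;
   for (unsigned i = 0; i < MAX_RTS; i++) {
      if (bound[i])
         mask |= (uint8_t)(1u << i);
   }
   return mask;
}

/* Decide whether the shader should emit a write to render target `rt`. */
static bool
should_write_rt(uint8_t mask, unsigned rt)
{
   assert(rt < MAX_RTS);
   return (mask >> rt) & 1u;
}
```

With only rt 1 bound, the mask is 0x02, so no write is emitted for the
absent rt 0.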
Eric Anholt
17a649af05 v3d: Fix precompile of FRAG_RESULT_DATA1 and higher outputs.
I was just leaving the other MRT targets than DATA0 out, by accident.
2019-02-05 15:35:49 -08:00
Kristian H. Kristensen
ba4b22011a st/nir: Use src/ relative include path for autotools
Fixes: cdc53fa81c
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
2019-02-05 14:19:51 -08:00
Kenneth Graunke
8fa54bc549 gallium: Add a PIPE_CAP_NIR_COMPACT_ARRAYS capability bit.
Iris would like to use compact arrays for tesslevels and clip/cull
distances.  radeonsi will likely want to switch to these at some point,
since it'll be necessary for GL_ARB_gl_spirv support, but it's not ready
for them just yet.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-02-05 13:58:46 -08:00
Kenneth Graunke
cf731564e6 st/nir: Call nir_lower_clip_cull_distance_arrays().
Today, st always sets LowerCombinedClipCullDistance, causing the GLSL IR
lowering to run, giving us vec4[2] arrays.  I would like to disable this
and instead run the NIR lowering so that we get compact float[] arrays
instead.

Calling the new pass is a noop if the GLSL IR pass has already run, so
it's safe to call the pass unconditionally.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-02-05 13:58:46 -08:00
Kenneth Graunke
15c6902117 nir: Avoid splitting compact arrays into per-element variables.
Compact arrays are used for special variables like clip and cull
distances, or tessellation levels.  Drivers using compact arrays
assume that these values will always be actual arrays.  We don't
want to turn a float[1] gl_CullDistance into a single float; that
would confuse drivers.

Today, i965 uses compact arrays, and Gallium drivers use
nir_lower_io_arrays_to_elements, so we haven't had any overlap
that would demonstrate the issue.  Iris will use both.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-02-05 13:58:46 -08:00
Kenneth Graunke
ba9dcc80fb nir: Avoid clip/cull distance lowering multiple times.
A couple places in st/nir assume that cull distances have been lowered
away, so it will need to call this lowering pass for drivers which opt
out of the GLSL IR lowering.  The Intel backend also calls this pass,
for i965 and anv.  We need to only do it once.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-02-05 13:58:46 -08:00
Kenneth Graunke
5730364d69 nir: Bail on clip/cull distance lowering if GLSL IR already did it.
We have a GLSL IR pass to convert clip/cull distance float[] arrays
into vec4[2] arrays.  In ff281e6204, we attempted to skip this pass
if the GLSL IR lowering had already run.  But, that code was not quite
right, as we forgot to strip away the per-vertex IO array layer for
geometry and tessellation shader varyings.

If the GLSL IR pass has run, the variables will not be marked as
"compact".  So we can simply check that and bail.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-02-05 13:58:46 -08:00
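For readers unfamiliar with the two layouts mentioned in 5730364d69, the
sketch below shows how an element of a float[] distance array would map onto
a vec4[2] array. It assumes that cull distances are placed directly after
the clip distances and that at most 8 distances exist in total; the struct
and function names are hypothetical.

```c
#include <assert.h>

/* Position of one distance inside a vec4[2] array. */
struct vec4_slot {
   unsigned array_index; /* which vec4: 0 or 1 */
   unsigned component;   /* 0..3, i.e. x, y, z, w */
};

/* Map element i of a combined float[] array to its vec4[2] position. */
static struct vec4_slot
compact_to_vec4(unsigned i)
{
   assert(i < 8);
   struct vec4_slot s = { i / 4, i % 4 };
   return s;
}

/* Assumption: cull distance j follows the num_clip clip distances. */
static struct vec4_slot
cull_to_vec4(unsigned num_clip, unsigned j)
{
   return compact_to_vec4(num_clip + j);
}
```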
Kenneth Graunke
ef99f4c8d1 compiler: Mark clip/cull distance arrays as compact before lowering.
nir_lower_clip_cull_distance_arrays() marks the combined clip/cull
distance array as compact.  However, when translating in from GLSL
or SPIR-V, we were not marking the original float[] arrays as compact.

We should do so.  That way, we can detect these corner cases properly.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-02-05 13:58:46 -08:00
Kenneth Graunke
3327c93510 nir: Record info->fs.pixel_center_integer in lower_system_values
radeonsi uses a system value for gl_FragCoord rather than an input var.
These get translated into load_frag_coord NIR intrinsics, which lose the
pixel_center_integer and origin_upper_left decorations.  To cope with
this, Tim added a shader_info field for pixel_center_integer, and made
glsl_to_nir set it accordingly.

prog_to_nir also needs to handle these fragcoord conventions.  Instead
of duplicating the logic to set the info field, just move it to
nir_lower_system_values so it'll happen regardless of who makes the NIR.

(For what it's worth, we don't need an info flag for origin_upper_left,
because radeonsi lowers origin conventions in nir_lower_wpos_ytransform
before nir_lower_system_values destroys the variable and qualifiers.)

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-02-05 13:51:52 -08:00
Kenneth Graunke
536abd453b program: Extend prog_to_nir to handle system values.
Some drivers, such as radeonsi, use a system value for gl_FragCoord
rather than an input variable.  In this case, our Mesa IR will have
a PROGRAM_SYSTEM_VALUE register, which we need to translate.

This makes prog_to_nir work for Gallium drivers which expose the
PIPE_CAP_TGSI_FS_POSITION_IS_SYSVAL capability bit.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-02-05 13:51:51 -08:00
Kenneth Graunke
fa38ca25f6 program: Use u_bit_scan64 in prog_to_nir.
We can simply iterate the bits rather than using util_last_bit and
checking each one up until that point.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-02-05 13:51:50 -08:00
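The iteration pattern that fa38ca25f6 switches to can be sketched like this.
bit_scan64 below is a hypothetical stand-in written for this example, not
Mesa's helper; it relies on the GCC/Clang builtin __builtin_ctzll.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a bit-scan helper: return the index of the
 * lowest set bit in *mask and clear that bit. */
static int
bit_scan64(uint64_t *mask)
{
   assert(*mask != 0);
   int i = __builtin_ctzll(*mask);
   *mask &= *mask - 1; /* clear the lowest set bit */
   return i;
}

/* Visit only the set bits instead of testing every index up to the last
 * set bit. Returns how many indices were written to `out`. */
static unsigned
collect_set_bits(uint64_t mask, int out[64])
{
   unsigned n = 0;
   while (mask)
      out[n++] = bit_scan64(&mask);
   return n;
}
```

The loop runs once per set bit, so a sparse mask costs far fewer iterations
than a scan over every index below the highest set bit.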
Kenneth Graunke
a01ad3110a st/mesa: Add NIR versions of the PBO upload/download shaders.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Tested-by: Eric Anholt <eric@anholt.net>
2019-02-05 13:43:42 -08:00
Kenneth Graunke
a02349b9e7 st/mesa: Add a NIR version of the OES_draw_texture built-in shaders.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Tested-by: Eric Anholt <eric@anholt.net>
2019-02-05 13:43:41 -08:00
Kenneth Graunke
be492affa8 st/mesa: Add NIR versions of the clear shaders.
We implement the basic VS and FS, as well as the VS that does layered
clears by writing gl_Layer from the vertex shader.  Drivers which need
a geometry shader for writing layer continue falling back to TGSI, as
I didn't need this and so didn't bother implementing it.  (We certainly
could, however, if people want to add it in the future.)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Tested-by: Eric Anholt <eric@anholt.net>
2019-02-05 13:43:39 -08:00
Kenneth Graunke
3f28b245b5 st/mesa: Add NIR versions of the drawpixels Z/stencil fragment shaders.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Tested-by: Eric Anholt <eric@anholt.net>
2019-02-05 13:43:37 -08:00
Kenneth Graunke
2d45f9fa25 st/mesa: Add a NIR version of the drawpixels/bitmap VS copy shader.
This provides a native NIR version of the DrawPixels/Bitmap passthrough
vertex shader.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Tested-by: Eric Anholt <eric@anholt.net>
2019-02-05 13:43:36 -08:00
Kenneth Graunke
cdc53fa81c st/nir: Make new helpers for constructing built-in NIR shaders.
The state tracker generates several built-in shaders in order to
perform scissored clears, upload/download PBOs, and so on.  These
are currently constructed using TGSI, using ureg and u_simple_shader.

I want to have NIR versions of these shaders, for my Gallium driver
that has a NIR backend but no TGSI support.  To that end, we'll want
a few helpers to help construct simple shaders.

This patch adds two new helpers:

- st_nir_finish_builtin_shader() takes a manually constructed NIR
  shader, applies lowering passes (like st_link_nir would do for GLSL),
  and constructs the pipe_shader_state.

- st_nir_make_passthrough_shader() makes a simple passthrough shader,
  which copies inputs to outputs.  This is similar to u_simple_shaders.

v2: Set info->fs.untyped_color_outputs for vc4/v3d (thanks Eric!).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Tested-by: Eric Anholt <eric@anholt.net>
2019-02-05 13:43:33 -08:00
Kenneth Graunke
4f799264d1 st/nir: Move varying setup code to a helper function.
I want to reuse this for built-in shaders.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Tested-by: Eric Anholt <eric@anholt.net>
2019-02-05 13:43:02 -08:00
Jason Ekstrand
36734987a5 nir/deref: Drop zero ptr_as_array derefs
They are effectively (&x)[0] or *&x which does nothing.
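
A minimal plain-C sketch of why such a deref is a no-op. This is only a language-level analogy, not NIR code, and the function name is hypothetical:

```c
#include <stdbool.h>

/* Returns true when indexing &x with 0, or dereferencing &x,
 * names the very same object as x. */
static bool zero_index_is_identity(void)
{
    int x = 42;
    int *via_index = &(&x)[0];  /* (&x)[0] is x, so its address is &x */
    int *via_deref = &*&x;      /* *&x is x as well */

    return via_index == &x && via_deref == &x;
}
```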

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-02-05 15:17:19 -06:00
Eric Anholt
aaef12702f nir: Move V3D's "the shader was TGSI, ignore FS output types" flag to NIR.
Ken's rework of mesa/st builtins to NIR means that we'll have more NIR
shaders with color output types that are mismatched with the render target
types.  Since this is behavior that GLSL doesn't require, add it as a
shader_info option so the driver can know that it needs to ignore the FS
output's base type in favor of the actual render target's.  This prevents
needing additional variants in several mesa/st paths (clear, pbo upload,
pbo download), given that the driver already has to handle the variants
for any TGSI being passed to it (from u_blitter, for example).

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-02-05 12:12:33 -08:00
Emil Velikov
8943eb8f03 anv: wire up the state_pool_padding test
Cc: Jason Ekstrand <jason@jlekstrand.net>
Fixes: 927ba12b53 ("anv/tests: Adding test for the state_pool padding.")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-02-05 11:39:36 -08:00
Karol Herbst
a61c388d07 nvc0/ir: replace cvt instructions with add to improve shader performance
gives me a performance boost of 0.2% in pixmark_piano on my gk106, gm204 and
gp107.

reduces the amount of generated convert instructions by roughly 30% in
shader-db.

v2: only for 32 bit operations
    move some common code out of the switch
    handle OP_SAT with modifiers
v3: only for registers and const memory
    rework if clauses
    merge isCvt into this patch
v4: merge isCvt into its use

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2019-02-05 20:35:38 +01:00
Bart Oldeman
a203eaa4f4 gallium-xlib: query MIT-SHM before using it.
When Mesa is compiled for gallium-xlib using e.g.
./configure --enable-glx=gallium-xlib --disable-dri --disable-gbm
--disable-egl
and is used by an X server (usually remotely via SSH X11 forwarding)
that does not support MIT-SHM such as XMing or MobaXterm, OpenGL
clients report error messages such as
Xlib:  extension "MIT-SHM" missing on display "localhost:11.0".
ad infinitum.

The reason is that the code in src/gallium/winsys/sw/xlib uses
MIT-SHM without checking for its existence, unlike the code
in src/glx/drisw_glx.c and src/mesa/drivers/x11/xm_api.c.
I copied the same check using XQueryExtension, and tested with
glxgears on MobaXterm.

This issue was reported before here:
https://lists.freedesktop.org/archives/mesa-users/2016-July/001183.html

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: <mesa-stable@lists.freedesktop.org>
2019-02-05 17:53:35 +00:00
Alok Hota
6e5eb4ead6 swr/rast: update SWR rasterizer shader stats
Primarily refactoring internal stats types

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2019-02-05 11:41:25 -06:00
Michel Dänzer
c0a540f320 loader/dri3: Use strlen instead of sizeof for creating VRR property atom
sizeof counts the terminating null character as well, so that also
contributed to the ID computed for the X11 atom. But the convention is
for only the non-null characters to contribute to the atom ID.
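
A minimal C sketch of the off-by-one described above. The array name `prop_name` and the helper names are illustrative; the snippet only compares the two length computations and does not call any XCB function:

```c
#include <stddef.h>
#include <string.h>

/* Illustrative property name, taken from the Fixes: tag of this commit. */
static const char prop_name[] = "_VARIABLE_REFRESH";

/* sizeof on the array counts the terminating NUL byte. */
static size_t name_len_sizeof(void)
{
    return sizeof(prop_name);
}

/* strlen counts only the characters before the NUL byte. */
static size_t name_len_strlen(void)
{
    return strlen(prop_name);
}
```

Passing the `sizeof` value as a name length would include one extra byte, which is the mismatch the commit removes.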

Fixes: 2e12fe425f "loader/dri3: Enable adaptive_sync via
                     _VARIABLE_REFRESH property"
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-02-05 17:18:44 +00:00
Jonathan Marek
4f0a3c9f9e nir: add missing vec opcodes in lower_bool_to_float
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-05 15:34:15 +00:00
Gert Wollny
b0b3de2be7 mesa: release references to image textures when a context is destroyed
When a texture is still bound as an image and the context it was bound in
is destroyed but not the texture, then the texture will still hold the
resource and will not be freed when it is finally destroyed. Hence, release
these references when the context is destroyed.

This leak was triggered by virglrenderer:
https://gitlab.freedesktop.org/virgl/virglrenderer/issues/86

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-02-05 10:53:41 +00:00
Gert Wollny
f1f3640f6f radeonsi: release tokens after creating the shader program
ureg_get_tokens clears the reference to the tokens, and create_compute_state makes
a copy, hence the tokens must be explicitly released.

Fixes: Direct leak of 256 byte(s) in 1 object(s) allocated from:
    #0 0x7ff729cf3c60 in realloc (/usr/lib64/gcc/x86_64-pc-linux-gnu/7.3.0/libasan.so+0xdbc60)
    #1 0x7ff721b1240c in tokens_expand ../../samba/mesa/src/gallium/auxiliary/tgsi/tgsi_ureg.c:234
    #2 0x7ff721b1c9c0 in get_tokens ../../samba/mesa/src/gallium/auxiliary/tgsi/tgsi_ureg.c:257
    #3 0x7ff721b1c9c0 in copy_instructions ../../samba/mesa/src/gallium/auxiliary/tgsi/tgsi_ureg.c:2040
    #4 0x7ff721b1c9c0 in ureg_finalize ../../samba/mesa/src/gallium/auxiliary/tgsi/tgsi_ureg.c:2090
    #5 0x7ff721b1e919 in ureg_get_tokens ../../samba/mesa/src/gallium/auxiliary/tgsi/tgsi_ureg.c:2167
    #6 0x7ff721f8b35a in si_create_dma_compute_shader ../../samba/mesa/src/gallium/drivers/radeonsi/si_shaderlib_tgsi.c:219
    #7 0x7ff722043ed9 in si_compute_do_clear_or_copy ../../samba/mesa/src/gallium/drivers/radeonsi/si_compute_blit.c:156
    #8 0x7ff7220448d3 in si_clear_buffer ../../samba/mesa/src/gallium/drivers/radeonsi/si_compute_blit.c:247
    #9 0x7ff7220350e8 in vi_dcc_clear_level ../../samba/mesa/src/gallium/drivers/radeonsi/si_clear.c:274

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-02-05 11:50:54 +01:00
Caio Marcelo de Oliveira Filho
8c7c543936 isl: assert that Gen8+ don't have bit6_swizzling
v2: Rewrite the condition to more clearly match the comment. (Jordan)

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-04 20:44:41 -08:00
Caio Marcelo de Oliveira Filho
5299c9cbcc anv: skip bit6 swizzle detection in Gen8+
It is always false on Gen8+.  Also, move the variable definition near
its use.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-04 20:44:41 -08:00
Caio Marcelo de Oliveira Filho
60740eade3 i965: skip bit6 swizzle detection in Gen8+
It is always false on Gen8+.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-04 20:44:41 -08:00
Caio Marcelo de Oliveira Filho
51547bbc5a nir: keep the phi order when splitting blocks
All things being equal, it is better to keep the original order.  Since
the new block is empty, push the phis to its tail in order.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Daniel Schürmann <daniel.schuermann@campus.tu-berlin.de>
2019-02-04 20:41:13 -08:00
Ilia Mirkin
38f542783f nv50,nvc0: add explicit settings for recent caps
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
2019-02-04 23:36:46 -05:00
Alyssa Rosenzweig
e67e072637 panfrost: Implement Midgard shader toolchain
This patch implements the free Midgard shader toolchain: the assembler,
the disassembler, and the NIR-based compiler. The assembler is a
standalone inaccessible Python script for reference purposes. The
disassembler and the compiler are implemented in C, accessible via the
standalone `midgard_compiler` binary. Later patches will use these
interfaces from the driver for online compilation.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Eric Anholt <eric@anholt.net>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2019-02-05 01:26:28 +00:00
Alyssa Rosenzweig
61d3ae6e0b panfrost: Initial stub for Panfrost driver
This patch adds an initial stub for the Gallium driver, containing
simple screen functions and the majority of the driver headers but no
actual functionality. It further adds the winsys glue for linking in
this stub driver via kmsro on Rockchip/Amlogic boards.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Eric Anholt <eric@anholt.net>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2019-02-05 01:19:30 +00:00
Marek Olšák
742d6cdb42 radeonsi: fix crashing performance counters (division by zero)
Fixes: e2b9329f17 "radeonsi: move remaining perfcounter code into si_perfcounter.c"
2019-02-04 18:46:25 -05:00
Marek Olšák
a03ecbaeec radeonsi: handle render_condition_enable in si_compute_clear_render_target
2019-02-04 18:46:25 -05:00
Sonny Jiang
984fd73515 radeonsi: use compute for clear_render_target when possible
Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
2019-02-04 18:46:25 -05:00
Kenneth Graunke
dc46317d1a st/mesa: Set pipe_image_view::shader_access in PBO readpixels.
Commit 8b626a22b2 introduced a new
pipe_image_view::shader_access field, indicating the access mode
specified in the shader.  st/mesa's built-in PBO download shader
creates a write-only image buffer, so we should flag it as such.

Nobody uses this field yet (Iris will), so we don't need to backport
this fix to stable branches.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-02-04 11:17:56 -08:00
Rodrigo Vivi
56c3b4971d intel: Add more PCI Device IDs for Coffee Lake and Ice Lake.
Align with kernel commits:

5e0f5a58b167 ("drm/i915/cfl: Adding another PCI Device ID.")
03ca3cf8e9aa ("drm/i915/icl: Adding few more device IDs for Ice Lake")

Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-02-04 10:05:25 -08:00
Danylo Piliaiev
64d3b148fe anv: Fix VK_EXT_transform_feedback working with varyings packed in PSIZ
Transform feedback did not set correct SO_DECL.ComponentMask for
varyings packed in VARYING_SLOT_PSIZ:
 gl_Layer         - VARYING_SLOT_LAYER    in VARYING_SLOT_PSIZ.y
 gl_ViewportIndex - VARYING_SLOT_VIEWPORT in VARYING_SLOT_PSIZ.z
 gl_PointSize     - VARYING_SLOT_PSIZ     in VARYING_SLOT_PSIZ.w
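
The packing listed above can be sketched as a small lookup. The enum, the function name, and the bit layout (x = bit 0 through w = bit 3) are assumptions made for illustration; they are not the driver's actual types:

```c
#include <stdint.h>

/* Hypothetical stand-ins for the varying slots named in the commit. */
enum example_slot {
    EXAMPLE_SLOT_PSIZ,
    EXAMPLE_SLOT_LAYER,
    EXAMPLE_SLOT_VIEWPORT,
    EXAMPLE_SLOT_OTHER,
};

/* Assumed mask layout: x = bit 0, y = bit 1, z = bit 2, w = bit 3.
 * Slots packed into PSIZ get the single component they live in;
 * any other slot keeps the mask the shader actually wrote. */
static uint8_t example_component_mask(enum example_slot slot,
                                      uint8_t written_mask)
{
    switch (slot) {
    case EXAMPLE_SLOT_LAYER:    return 1u << 1; /* PSIZ.y */
    case EXAMPLE_SLOT_VIEWPORT: return 1u << 2; /* PSIZ.z */
    case EXAMPLE_SLOT_PSIZ:     return 1u << 3; /* PSIZ.w */
    default:                    return written_mask;
    }
}
```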

Fixes: 36ee2fd61c "anv: Implement the basic form of VK_EXT_transform_feedback"

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-04 15:30:43 +00:00
Danylo Piliaiev
b7a93cbded radv: Handle VK_ATTACHMENT_UNUSED in CmdClearAttachment
From the Vulkan 1.0.98 spec for vkCmdClearAttachments:

"If any attachment to be cleared in the current subpass is VK_ATTACHMENT_UNUSED,
then the clear has no effect on that attachment."

"If the aspectMask member of any element of pAttachments contains
VK_IMAGE_ASPECT_COLOR_BIT, then the colorAttachment member of that
element must either refer to a color attachment which is VK_ATTACHMENT_UNUSED,
or must be a valid color attachment."

"If the aspectMask member of any element of pAttachments contains
VK_IMAGE_ASPECT_DEPTH_BIT, then the current subpass' depth/stencil attachment
must either be VK_ATTACHMENT_UNUSED, or must have a depth component"

"If the aspectMask member of any element of pAttachments contains
VK_IMAGE_ASPECT_STENCIL_BIT, then the current subpass' depth/stencil attachment
must either be VK_ATTACHMENT_UNUSED, or must have a stencil component"
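
A minimal sketch of the check implied by the first quoted rule. `EXAMPLE_ATTACHMENT_UNUSED` mirrors the Vulkan header value of `VK_ATTACHMENT_UNUSED` (`~0U`) so the snippet builds without the Vulkan headers; the function name is hypothetical and not radv's:

```c
#include <stdbool.h>
#include <stdint.h>

/* Same value as VK_ATTACHMENT_UNUSED in the Vulkan headers. */
#define EXAMPLE_ATTACHMENT_UNUSED (~0u)

/* A clear only has an effect when the subpass attachment is in use. */
static bool example_should_clear(uint32_t attachment_index)
{
    return attachment_index != EXAMPLE_ATTACHMENT_UNUSED;
}
```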

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-04 14:50:43 +02:00
Danylo Piliaiev
d76e777988 anv: Handle VK_ATTACHMENT_UNUSED in colorAttachment
From the Vulkan 1.0.98 spec for vkCmdClearAttachments:

"If the aspectMask member of any element of pAttachments contains
VK_IMAGE_ASPECT_COLOR_BIT, then the colorAttachment member of that
element must either refer to a color attachment which is VK_ATTACHMENT_UNUSED,
or must be a valid color attachment."

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-02-04 14:49:50 +02:00
Samuel Pitoiset
0d0affad3c radv: don't flush src stages when dstStageMask == BOTTOM_OF_PIPE
Original patch by Fredrik Höglund.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-04 13:19:14 +01:00
Samuel Pitoiset
9efa3405a7 radv: do not set preserveAttachments for internal render passes
We don't use that.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-04 13:19:14 +01:00
Samuel Pitoiset
80e809d993 radv: drop useless checks when resolving subpass color attachments
The Vulkan spec says:
   "If pResolveAttachments is not NULL, for each resolve attachment
    that does not have the value VK_ATTACHMENT_UNUSED, the
    corresponding color attachment must not have the value
    VK_ATTACHMENT_UNUSED."

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-04 13:19:14 +01:00
Samuel Pitoiset
76c17cfd8d radv: execute external subpass barriers after ending subpasses
Outgoing dependencies (ie. external) should happen after the subpass.
This doesn't change anything for subpass resolves as we already
make sure that attachments are shader readable.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-04 13:19:14 +01:00
Samuel Pitoiset
b482c030f5 radv: accumulate all ingoing external dependencies to the first subpass
In case two or more subpasses declare ingoing external dependencies.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-04 13:19:14 +01:00
Samuel Pitoiset
eaab35e5e3 radv: handle subpass dependencies correctly
The different masks should be accumulated. For example if two
subpasses declare an outgoing dependency (ie. dst ==
VK_SUBPASS_EXTERNAL).
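
A minimal sketch of what accumulating the masks means, assuming a hypothetical barrier struct. The field and function names are illustrative and are not radv's:

```c
#include <stdint.h>

/* Hypothetical per-subpass barrier state. */
struct example_barrier {
    uint32_t src_stage_mask;
    uint32_t src_access_mask;
    uint32_t dst_access_mask;
};

/* OR each new dependency into the existing masks instead of
 * overwriting them, so that a second dependency cannot discard
 * the bits recorded by the first one. */
static void example_add_dep(struct example_barrier *barrier,
                            uint32_t src_stage,
                            uint32_t src_access,
                            uint32_t dst_access)
{
    barrier->src_stage_mask  |= src_stage;
    barrier->src_access_mask |= src_access;
    barrier->dst_access_mask |= dst_access;
}
```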

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-04 13:19:14 +01:00
Samuel Pitoiset
6430616e77 radv: track if subpasses have color attachments
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-04 13:19:14 +01:00
Samuel Pitoiset
1e810f1c53 radv: add radv_render_pass_add_subpass_dep() helper
To share common code that handles subpass dependencies.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-04 13:19:14 +01:00
Samuel Pitoiset
2472907563 radv: move some render pass things to radv_render_pass_compile()
radv_render_pass_compile() is common to vkCreateRenderPass()
and vkCreateRenderPass2().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-04 13:19:14 +01:00
Samuel Pitoiset
b509013060 radv: handle final layouts at end of every subpass and render pass
That shouldn't change anything as we check if the last
subpass id is the final subpass.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-04 13:18:38 +01:00
Samuel Pitoiset
5699ac0078 radv: determine the last subpass id for every attachments
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-04 13:17:59 +01:00
Samuel Pitoiset
e1a0a268c6 radv: use the new attachments array when starting subpasses
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-04 13:17:57 +01:00
Samuel Pitoiset
a20c2e38d8 radv: store the list of attachments for every subpass
This reworks how the depth stencil attachment is used for
simplicity. This also introduces radv_render_pass_compile()
helper that will be used for further optimizations.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-04 13:17:54 +01:00
Samuel Pitoiset
a7c7d811f1 radv: move subpass image transitions to radv_cmd_buffer_begin_subpass()
Instead of doing them in radv_cmd_buffer_set_subpass().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-04 13:17:52 +01:00
Samuel Pitoiset
291a933786 radv: add radv_cmd_buffer_begin_subpass() helper
To unify some code in BeginRenderPass() and NextSubpass().
Based on Intel ANV driver.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-04 13:17:50 +01:00
Samuel Pitoiset
41199e2eeb radv: remove useless MAYBE_UNUSED in CmdBeginRenderPass()
Trivial.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-04 13:17:46 +01:00
Samuel Pitoiset
545552c9b9 radv: remove unused radv_render_pass_attachment::view_mask
Trivial.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-04 13:17:42 +01:00
Samuel Pitoiset
0f932bbede radv: bail out when no image transitions will be performed
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-04 13:17:40 +01:00
Marek Olšák
1e85cfb91a meson: drop the xcb-xrandr version requirement
autotools doesn't have any requirement. This fixes meson on Ubuntu 16.04.

Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
2019-02-03 18:39:57 -05:00
Eric Engestrom
808bf59cac wsi/display: add comment
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Keith Packard <keithp@keithp.com>
2019-02-02 23:08:03 +00:00
Jason Ekstrand
0aa5a97b03 relnotes: Add VK_EXT_buffer_device_address
2019-02-02 08:42:14 -06:00
Jason Ekstrand
48ed2a7bb0 anv: Implement VK_EXT_buffer_device_address
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-02-01 17:09:42 -06:00
Jason Ekstrand
e644ed468f intel/fs: Implement nir_intrinsic_global_atomic_*
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-02-01 16:11:00 -06:00
Jason Ekstrand
a91f392073 intel/fs: Use SENDS for A64 writes on gen9+
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-02-01 16:11:00 -06:00
Jason Ekstrand
1c25bf4373 intel/fs: Implement load/store_global with A64 untyped messages
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-02-01 16:11:00 -06:00
Jason Ekstrand
b4f0d062cd intel/fs: Do the grf127 hack on SIMD8 instructions in SIMD16 mode
Previously, we only applied the fix to shaders with a dispatch mode of
SIMD8 but the code it relies on for SIMD16 mode only applies to SIMD16
instructions.  If you have a SIMD8 instruction in a SIMD16 shader,
neither would trigger and the restriction could still be hit.

Fixes: 232ed89802 "i965/fs: Register allocator shoudn't use grf127..."
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-02-01 16:11:00 -06:00
Jason Ekstrand
79724a0756 intel/fs: Properly handle 64-bit types in LOAD_PAYLOAD
By just assigning dst.type to src[i].type, we ensure that the offset at
the end of the loop actually offsets it by the right number of
registers.  Otherwise, we'll get into a case where we copy with a Q type
and then offset with a D type and things get out of sync.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-02-01 16:10:57 -06:00
Jason Ekstrand
f02914a991 intel/fs/cse: Split create_copy_instr into three cases
Previously, we tried to combine all cases where the instruction being
CSE'd writes to more than one MOV worth of registers into one case with
a bit of special casing for LOAD_PAYLOAD.  This commit splits things so
that LOAD_PAYLOAD is entirely its own case.  This makes tweaking the
LOAD_PAYLOAD case simpler in the next commit.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-02-01 16:10:40 -06:00
Jason Ekstrand
f409a08e5f intel/nir: Add global support to lower_mem_access_bit_sizes
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-02-01 16:08:29 -06:00
Oscar Blumberg
fea5b8e5ad intel/fs: Fix memory corruption when compiling a CS
Missing check for shader stage in the fs_visitor would corrupt the
cs_prog_data.push information and trigger crashes / corruption later
when uploading the CS state.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-02-01 10:53:33 -08:00
Jason Ekstrand
ab940b0d97 spirv: Support LocalSizeId and LocalSizeHintId execution modes
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-01 17:34:02 +00:00
Jason Ekstrand
7223590c42 spirv: Handle OpExecutionModeId
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-01 17:34:02 +00:00
Jason Ekstrand
e68871f6a4 spirv: Handle constants and types before execution modes
We already defer handling the actual execution modes until after we've
created the shader.  This just moves it a tiny bit further so we
actually have constants and types and can handle OpExecutionModeId.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-01 17:34:02 +00:00
Jason Ekstrand
7d862ef530 spirv: Rework handling of spec constant workgroup size built-ins
Instead of handling it as part of the handling of constant instructions,
just stash the vtn_value when we see the decoration and handle it
explicitly later.  This will let us re-order handling of constant
instructions without breaking the Vulkan SPIR-V requirement that
decorating a specialization constant as the WorkgroupSize built-in
overrides the workgroup size set as an execution mode.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-01 17:34:02 +00:00
Jason Ekstrand
9b37e93e42 spirv: Replace vtn_constant_value with vtn_constant_uint
The uint version is less typing, supports different bit sizes, and is
probably a bit more safe because we're actually verifying that the
SPIR-V value is an integer scalar constant.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-01 17:34:02 +00:00
Samuel Pitoiset
5e7f800f32 radv: fix build
Fixes: 9b9ccee4d6 ("radv: take LDS into account for compute shader occupancy stats")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-01 15:31:55 +01:00
Timothy Arceri
9b9ccee4d6 radv: take LDS into account for compute shader occupancy stats
Ported from d205faeb6c.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-01 22:25:30 +11:00
Timothy Arceri
a53d68d318 ac/radv/radeonsi: add ac_get_num_physical_sgprs() helper
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-01 22:25:30 +11:00
Gurchetan Singh
574186f0e8 docs: add GL_EXT_texture_compression_s3tc_srgb to release notes
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-02-01 10:01:59 +00:00
Gurchetan Singh
dc9a15aefb st/mesa: expose EXT_texture_compression_s3tc_srgb
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
2019-02-01 10:01:59 +00:00
Gurchetan Singh
a2ab400719 i965: Set flag for EXT_texture_compression_s3tc_srgb
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-02-01 10:01:59 +00:00
Gurchetan Singh
db24132d80 mesa/main: Expose EXT_texture_compression_s3tc_srgb
Required for the following test to pass when emulating GL on GLES:

bin/compressedteximage GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT1_EXT

Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
2019-02-01 10:01:59 +00:00
Timothy Arceri
0f3a8e1b64 st/glsl_to_nir: remove dead local variables
Without this we do not end up with a deterministic NIR because
temporary register variables are added in random order. NIR must
be deterministic because we use it to produce a sha for the
radeonsi backend's disk cache.

This fixes the shader cache for a bunch of shaders.

Another positive is that this results in a large reduction in the
size of the NIR that the state tracker stores to the disk cache.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-02-01 15:56:02 +11:00
Dylan Baker
4052142de7 meson: remove -std=c++11 from intel/tools
For meson, all C++ code is already compiled as C++11, so it's
unnecessary. It's also the wrong way to do this; if we really needed
this, the correct way would be to set:

```meson
executable(
  ...
  override_options : ['cpp_std=c++11'],
)
```

Which ensures not only that the correct syntax for the current
compiler is used, but also that meson doesn't create arguments like
`-std=c++14 ... -std=c++11`

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-01-31 21:42:16 +00:00
Dylan Baker
8e49b32f63 meson: fix style in intel/tools
The `:` in options should always have one space before and after `foo
: bar`, and lists do not get spaces around the braces: `[foo]` not `[
foo ]`

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-01-31 21:42:16 +00:00
Dylan Baker
d93d53fa72 meson: remove build_by_default : true
Which is and has always been the default. This is largely an artifact
of how the building of these tools was controlled when the meson build
was originally created.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-01-31 21:42:16 +00:00
Emil Velikov
1240c3cb10 docs: update calendar, add news item and link release notes for 18.3.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2019-01-31 21:17:38 +00:00
Emil Velikov
83160c6c05 docs: add sha256 checksums for 18.3.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 7475d7727f)
2019-01-31 21:15:20 +00:00
Emil Velikov
4d0732dc39 docs: add release notes for 18.3.3
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 190a79f462)
[Emil: drop VERSION hunk]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

 Conflicts:
	VERSION
2019-01-31 21:14:56 +00:00
Neha Bhende
69d736b17a st/mesa: Fix topogun-1.06-orc-84k-resize.trace crash
We need to initialize all fields in rs->prim explicitly while
creating new rastpos stage.

Fixes: bac8534267 ("st/mesa: allow glDrawElements to work with GL_SELECT
feedback")

v2: Initializing all fields in rs->prim as per Ilia.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2019-01-31 12:21:59 -07:00
Dylan Baker
c812c740e6 android,autotools,i965: Fix location of float64_glsl.h
Android.mk and autotools disagree about where generated files should
go, which wasn't a problem until we wanted to build a dist
tarball. This corrects the problem by changing the output and include
paths to be the same on android and autotools (meson already has the
correct include path).

Fixes: 7d7b30835c
       ("automake: Fix path to generated source")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2019-01-31 19:04:30 +00:00
Marek Olšák
d49c16a597 gallium: allow more PIPE_RESOURCE_ driver flags
radeonsi has 8 and will probably have 9 soon.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-01-31 13:10:42 -05:00
Eric Anholt
ab4d5775b0 v3d: Fix image_load_store clamping of signed integer stores.
This was a copy-and-paste failure that oddly showed up in the CTS's
reinterprets of r32f, rgba8, and srgba8 to rgba8i, but not r32ui and r32i
to rgba8i or reinterprets to other signed int formats.

Fixes: 6281f26f06 ("v3d: Add support for shader_image_load_store.")
2019-01-31 08:39:40 -08:00
Eric Anholt
db2ae51121 mesa: Skip partial InvalidateFramebuffer of packed depth/stencil.
One of the CTS cases tries to invalidate just stencil of packed
depth/stencil, and we incorrectly lost the depth contents.
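
A minimal sketch of the idea, assuming hypothetical aspect bits and a hypothetical helper; this is not the actual Mesa code path. When depth and stencil share one packed buffer, a request naming only one of them is dropped so the other aspect keeps its contents:

```c
#include <stdbool.h>

/* Hypothetical aspect bits for this sketch. */
#define EXAMPLE_DEPTH_BIT   0x1u
#define EXAMPLE_STENCIL_BIT 0x2u

/* Returns the aspects that may actually be invalidated. For a packed
 * depth/stencil buffer, a partial request (only one of the two
 * aspects) is filtered out; other bits pass through unchanged. */
static unsigned example_filter_invalidate(unsigned requested,
                                          bool packed_depth_stencil)
{
    const unsigned both = EXAMPLE_DEPTH_BIT | EXAMPLE_STENCIL_BIT;

    if (packed_depth_stencil && (requested & both) != both)
        return requested & ~both;

    return requested;
}
```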

Fixes dEQP-GLES3.functional.fbo.invalidate.whole.unbind_read_stencil
Fixes: 0c42b5f3cb ("mesa: wire up InvalidateFramebuffer")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-01-31 08:37:46 -08:00
Rob Clark
39cfdf9930 freedreno: more fixing release tarball
Fixes: aa0fed10d3 freedreno: move ir3 to common location
Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-01-31 09:59:18 -05:00
Rob Clark
e252656d14 freedreno: fix release tarball
Fixes: b4476138d5 freedreno: move drm to common location
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2019-01-31 09:59:18 -05:00
Emmanuel Gil Peyrot
0d4dd59ae5 docs: make bugs.html easier to find
Thanks to Yann Kervran for the report and suggestions.

Signed-off-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-01-31 14:31:48 +00:00
Dave Airlie
9279a28f07 virgl: ARB_query_buffer_object support
v1.1: fix size define.

Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2019-01-31 11:23:38 +10:00
Dave Airlie
38658c6d4d virgl: enable elapsed time queries
GL underneath always has GL_TIME_ELAPSED so always enable these.

Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
2019-01-31 11:23:30 +10:00
Dylan Baker
da48cba61e automake: Add --enable-autotools to distcheck flags
Fixes: e68777c87c
       ("autotools: Deprecate the use of autotools")
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-01-30 19:32:44 +00:00
Marek Olšák
ffbd37d8e9 radeonsi: fix a comment typo in si_fine_fence_set
2019-01-30 14:32:05 -05:00
Marek Olšák
f4eb746ef7 r600: add -Wstrict-overflow=0 to meson to silence the warning
same as radeonsi
2019-01-30 12:49:45 -05:00
Marek Olšák
d50bef9831 winsys/amdgpu: remove amdgpu_drm.h definitions
trivial
2019-01-30 12:38:56 -05:00
Marek Olšák
16672f16da radeonsi: unify error paths in si_texture_create_object
2019-01-30 12:35:22 -05:00
Marek Olšák
2361558eb7 radeonsi: merge & rename texture BO metadata functions
2019-01-30 12:35:22 -05:00
Marek Olšák
1c12d56e4d radeonsi: enable dithered alpha-to-coverage for better quality
same as AMDVLK.

GL_NV_alpha_to_coverage_dither_control allows controlling this behavior.
The default is implementation-dependent.
2019-01-30 12:35:22 -05:00
Dylan Baker
b4986d2e0c gallium: wrap u_screen in extern "C" for c++
Some drivers (notably SWR) are written in C++, and as such they need
access to C headers with extern "C". So let's add that.
2019-01-30 15:12:27 +00:00
Gert Wollny
45903cddc3 mesa/core: Enable EXT_texture_sRGB_R8 also for desktop GL
As of Nov/30/2018 the extension is also valid for OpenGL >= 1.2, so
enable it accordingly and also add the required view class entry.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-01-30 11:32:40 +00:00
Samuel Pitoiset
9c762c01c8 radv/winsys: fix hash when adding internal buffers
This fixes serious stuttering in Shadow Of The Tomb Raider.

Fixes: 50fd253bd6 ("radv/winsys: Add priority handling during submit.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-01-30 12:29:10 +01:00
Erik Faye-Lund
3b6f95ad66 mesa: expose NV_conditional_render on GLES
The extension spec has been updated to include GLES 2 support, so let's
enable it there.

v2: fixup ABI-check as well

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-01-30 09:43:44 +01:00
Ernestas Kulik
90458bef54 v3d: Fix leak in resource setup error path
Reported by Coverity: in the case of unsupported modifier request, the
code does not jump to the “fail” label to destroy the acquired resource.

CID: 1435704
Signed-off-by: Ernestas Kulik <ernestas.kulik@gmail.com>
Fixes: 45bb8f2957 ("broadcom: Add V3D 3.3 gallium driver called "vc5", for BCM7268.")
2019-01-29 16:14:13 -08:00
Ernestas Kulik
f6e49d5ad0 vc4: Fix leak in HW queries error path
Reported by Coverity: in the case where there exist hardware and
non-hardware queries, the code does not jump to err_free_query and leaks
the query.

CID: 1430194
Signed-off-by: Ernestas Kulik <ernestas.kulik@gmail.com>
Fixes: 9ea90ffb98 ("broadcom/vc4: Add support for HW perfmon")
2019-01-29 16:14:13 -08:00
Eric Anholt
6053c7bb43 v3d: Fix a release build set-but-unused compiler warning. 2019-01-29 16:02:51 -08:00
Eric Anholt
0c05198d6b v3d: Always enable the NEON utile load/store code.
I can't imagine the new HW block being paired with a v6 CPU, so don't
bother with the CPU detection that vc4 had to do.

Improves 1024x1024 TexImage on my 7278 by 47.3229% +/- 0.679632%
2019-01-29 16:00:25 -08:00
Emil Velikov
385843ac3c vc4: Declare the last cpu pointer as being modified in NEON asm.
Earlier commit addressed 7 of the 8 instances available.

v2: Rebase patch back to master (by anholt)

Cc: Carsten Haitzler (Rasterman) <raster@rasterman.com>
Cc: Eric Anholt <eric@anholt.net>
Fixes: 300d3ae8b1 ("vc4: Declare the cpu pointers as being modified in NEON asm.")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2019-01-29 16:00:25 -08:00
Dylan Baker
75ad254acf docs: Add relnotes stub for 19.1 2019-01-29 15:32:16 -08:00
Dylan Baker
dba0989ac1 bump version for 19.0 branch 2019-01-29 15:30:25 -08:00
839 changed files with 96743 additions and 28194 deletions

499
.gitlab-ci.yml Normal file

@@ -0,0 +1,499 @@
# This is the tag of the docker image used for the build jobs. If the
# image doesn't exist yet, the containers-build stage generates it.
#
# In order to generate a new image, one should generally change the tag.
# While removing the image from the registry would also work, that's not
# recommended except for ephemeral images during development: Replacing
# an image after a significant amount of time might pull in newer
# versions of gcc/clang or other packages, which might break the build
# with older commits using the same tag.
#
# After merging a change resulting in generating a new image to the
# main repository, it's recommended to remove the image from the source
# repository's container registry, so that the image from the main
# repository's registry will be used there as well.
#
# The format of the tag is "%Y-%m-%d-${counter}" where ${counter} stays
# at "01" unless you have multiple updates on the same day :)
variables:
  UBUNTU_TAG: 2019-02-12-01
  UBUNTU_IMAGE: "$CI_REGISTRY_IMAGE/ubuntu:$UBUNTU_TAG"
  UBUNTU_IMAGE_MAIN: "registry.freedesktop.org/mesa/mesa/ubuntu:$UBUNTU_TAG"
cache:
  paths:
    - ccache
stages:
  - containers-build
  - build+test
# When to automatically run the CI
.ci-run-policy:
  only:
    - master
    - merge_requests
    - /^ci([-/].*)?$/
# CONTAINERS
containers:ubuntu:
  extends: .ci-run-policy
  stage: containers-build
  image: docker:stable
  services:
    - docker:dind
  variables:
    DOCKER_HOST: tcp://docker:2375
    DOCKER_DRIVER: overlay2
  script:
    # Enable experimental features such as `docker manifest inspect`
    - mkdir -p ~/.docker
    - "echo '{\"experimental\": \"enabled\"}' > ~/.docker/config.json"
    - docker login -u gitlab-ci-token -p $CI_JOB_TOKEN $CI_REGISTRY
    # Check if the image (with the specific tag) already exists
    - docker manifest inspect $UBUNTU_IMAGE && exit || true
    # Try to re-use the image from the main repository's registry
    - docker image pull $UBUNTU_IMAGE_MAIN &&
      docker image tag $UBUNTU_IMAGE_MAIN $UBUNTU_IMAGE &&
      docker image push $UBUNTU_IMAGE && exit || true
    - docker build -t $UBUNTU_IMAGE -f .gitlab-ci/Dockerfile.ubuntu .
    - docker push $UBUNTU_IMAGE
# BUILD
.build:
  extends: .ci-run-policy
  image: $UBUNTU_IMAGE
  stage: build+test
  artifacts:
    when: on_failure
    untracked: true
  # Use ccache transparently, and print stats before/after
  before_script:
    - export PATH="/usr/lib/ccache:$PATH"
    - export CCACHE_BASEDIR="$PWD"
    - export CCACHE_DIR="$PWD/ccache"
    - export CCACHE_COMPILERCHECK=content
    - ccache --zero-stats || true
    - ccache --show-stats || true
  after_script:
    - export CCACHE_DIR="$PWD/ccache"
    - ccache --show-stats
.meson-build:
  extends: .build
  script:
    # We need to control the version of llvm-config we're using, so we'll
    # generate a native file to do so. This requires meson >=0.49
    - if test -n "$LLVM_VERSION"; then
        LLVM_CONFIG="llvm-config-${LLVM_VERSION}";
        echo -e "[binaries]\nllvm-config = '`which $LLVM_CONFIG`'" > native.file;
        $LLVM_CONFIG --version;
      else
        touch native.file;
      fi
    - meson --version
    - meson _build
            --native-file=native.file
            -D build-tests=true
            -D libunwind=${UNWIND}
            ${DRI_LOADERS}
            -D dri-drivers=${DRI_DRIVERS:-[]}
            ${GALLIUM_ST}
            -D gallium-drivers=${GALLIUM_DRIVERS:-[]}
            -D vulkan-drivers=${VULKAN_DRIVERS:-[]}
    - cd _build
    - meson configure
    - ninja -j4
    - ninja test
.make-build:
  extends: .build
  variables:
    MAKEFLAGS: "-j4"
  script:
    - if test -n "$LLVM_VERSION"; then
        export LLVM_CONFIG="llvm-config-${LLVM_VERSION}";
      fi
    - mkdir build
    - cd build
    - ../autogen.sh
        --enable-autotools
        --enable-debug
        $LIBUNWIND_FLAGS
        $DRI_LOADERS
        --with-dri-drivers=$DRI_DRIVERS
        $GALLIUM_ST
        --with-gallium-drivers=$GALLIUM_DRIVERS
        --with-vulkan-drivers=$VULKAN_DRIVERS
        --disable-llvm-shared-libs
    - make
    - eval $MAKE_CHECK_COMMAND
.scons-build:
  extends: .build
  variables:
    SCONSFLAGS: "-j4"
  script:
    - if test -n "$LLVM_VERSION"; then
        export LLVM_CONFIG="llvm-config-${LLVM_VERSION}";
      fi
    - scons $SCONS_TARGET
    - eval $SCONS_CHECK_COMMAND
build:meson-vulkan:
  extends: .meson-build
  variables:
    UNWIND: "false"
    DRI_LOADERS: >
      -D glx=disabled
      -D gbm=false
      -D egl=false
      -D platforms=x11,wayland,drm
      -D osmesa=none
    GALLIUM_ST: >
      -D dri3=true
      -D gallium-vdpau=false
      -D gallium-xvmc=false
      -D gallium-omx=disabled
      -D gallium-va=false
      -D gallium-xa=false
      -D gallium-nine=false
      -D gallium-opencl=disabled
    VULKAN_DRIVERS: intel,amd
    LLVM_VERSION: "7"
build:meson-loader-classic-dri:
  extends: .meson-build
  variables:
    UNWIND: "false"
    DRI_LOADERS: >
      -D glx=dri
      -D gbm=true
      -D egl=true
      -D platforms=x11,wayland,drm,surfaceless
      -D osmesa=classic
    DRI_DRIVERS: "i915,i965,r100,r200,swrast,nouveau"
    GALLIUM_ST: >
      -D dri3=true
      -D gallium-vdpau=false
      -D gallium-xvmc=false
      -D gallium-omx=disabled
      -D gallium-va=false
      -D gallium-xa=false
      -D gallium-nine=false
      -D gallium-opencl=disabled
build:meson-glvnd:
  extends: .meson-build
  variables:
    UNWIND: "true"
    DRI_LOADERS: >
      -D glvnd=true
      -D egl=true
      -D gbm=true
      -D glx=dri
    DRI_DRIVERS: "i965"
    GALLIUM_ST: >
      -D gallium-vdpau=false
      -D gallium-xvmc=false
      -D gallium-omx=disabled
      -D gallium-va=false
      -D gallium-xa=false
      -D gallium-nine=false
      -D gallium-opencl=disabled
# NOTE: Building SWR is 2x (yes two) times slower than all the other
# gallium drivers combined.
# Start this early so that it doesn't hinder the run time.
build:meson-gallium-swr:
  extends: .meson-build
  variables:
    UNWIND: "true"
    DRI_LOADERS: >
      -D glx=disabled
      -D egl=false
      -D gbm=false
    GALLIUM_ST: >
      -D dri3=false
      -D gallium-vdpau=false
      -D gallium-xvmc=false
      -D gallium-omx=disabled
      -D gallium-va=false
      -D gallium-xa=false
      -D gallium-nine=false
      -D gallium-opencl=disabled
    GALLIUM_DRIVERS: "swr"
    LLVM_VERSION: "6.0"
build:meson-gallium-radeonsi:
  extends: .meson-build
  variables:
    UNWIND: "true"
    DRI_LOADERS: >
      -D glx=disabled
      -D egl=false
      -D gbm=false
    GALLIUM_ST: >
      -D dri3=false
      -D gallium-vdpau=false
      -D gallium-xvmc=false
      -D gallium-omx=disabled
      -D gallium-va=false
      -D gallium-xa=false
      -D gallium-nine=false
      -D gallium-opencl=disabled
    GALLIUM_DRIVERS: "radeonsi"
    LLVM_VERSION: "7"
build:meson-gallium-drivers-other:
  extends: .meson-build
  variables:
    UNWIND: "true"
    DRI_LOADERS: >
      -D glx=disabled
      -D egl=false
      -D gbm=false
    GALLIUM_ST: >
      -D dri3=false
      -D gallium-vdpau=false
      -D gallium-xvmc=false
      -D gallium-omx=disabled
      -D gallium-va=false
      -D gallium-xa=false
      -D gallium-nine=false
      -D gallium-opencl=disabled
    GALLIUM_DRIVERS: "i915,iris,nouveau,kmsro,r300,r600,freedreno,svga,swrast,v3d,vc4,virgl,etnaviv"
    LLVM_VERSION: "5.0"
build:meson-gallium-clover-llvm5:
  extends: .meson-build
  variables:
    UNWIND: "true"
    DRI_LOADERS: >
      -D glx=disabled
      -D egl=false
      -D gbm=false
    GALLIUM_ST: >
      -D dri3=false
      -D gallium-vdpau=false
      -D gallium-xvmc=false
      -D gallium-omx=disabled
      -D gallium-va=false
      -D gallium-xa=false
      -D gallium-nine=false
      -D gallium-opencl=icd
    GALLIUM_DRIVERS: "r600"
    LLVM_VERSION: "5.0"
build:meson-gallium-clover-llvm6:
  extends: build:meson-gallium-clover-llvm5
  variables:
    LLVM_VERSION: "6.0"
build:meson-gallium-clover-llvm7:
  extends: build:meson-gallium-clover-llvm5
  variables:
    GALLIUM_DRIVERS: "r600,radeonsi"
    LLVM_VERSION: "7"
build:meson-gallium-st-other:
  extends: .meson-build
  variables:
    UNWIND: "true"
    DRI_LOADERS: >
      -D glx=disabled
      -D egl=false
      -D gbm=false
    GALLIUM_ST: >
      -D dri3=true
      -D gallium-vdpau=true
      -D gallium-xvmc=true
      -D gallium-omx=bellagio
      -D gallium-va=true
      -D gallium-xa=true
      -D gallium-nine=true
      -D gallium-opencl=disabled
      -D osmesa=gallium
    GALLIUM_DRIVERS: "nouveau,swrast"
    LLVM_VERSION: "5.0"
build:make-vulkan:
  extends: .make-build
  variables:
    MAKE_CHECK_COMMAND: "make -C src/gtest check && make -C src/intel check"
    LLVM_VERSION: "7"
    DRI_LOADERS: >
      --disable-glx
      --disable-gbm
      --disable-egl
      --with-platforms=x11,wayland,drm
    DRI_DRIVERS: ""
    GALLIUM_ST: >
      --enable-dri
      --enable-dri3
      --disable-opencl
      --disable-xa
      --disable-nine
      --disable-xvmc
      --disable-vdpau
      --disable-va
      --disable-omx-bellagio
      --disable-gallium-osmesa
    VULKAN_DRIVERS: intel,radeon
    LIBUNWIND_FLAGS: --disable-libunwind
build:make-loader-classic-dri:
  extends: .make-build
  variables:
    MAKE_CHECK_COMMAND: "make check"
    DRI_LOADERS: >
      --enable-glx
      --enable-gbm
      --enable-egl
      --with-platforms=x11,wayland,drm,surfaceless
      --enable-osmesa
    DRI_DRIVERS: "i915,i965,radeon,r200,swrast,nouveau"
    GALLIUM_ST: >
      --enable-dri
      --disable-opencl
      --disable-xa
      --disable-nine
      --disable-xvmc
      --disable-vdpau
      --disable-va
      --disable-omx-bellagio
      --disable-gallium-osmesa
    LIBUNWIND_FLAGS: --disable-libunwind
# NOTE: Building SWR is 2x (yes two) times slower than all the other
# gallium drivers combined.
# Start this early so that it doesn't hinder the run time.
build:make-gallium-drivers-swr:
  extends: .make-build
  variables:
    MAKE_CHECK_COMMAND: "true"
    LLVM_VERSION: "6.0"
    DRI_LOADERS: >
      --disable-glx
      --disable-gbm
      --disable-egl
    GALLIUM_ST: >
      --enable-dri
      --disable-opencl
      --disable-xa
      --disable-nine
      --disable-xvmc
      --disable-vdpau
      --disable-va
      --disable-omx-bellagio
      --disable-gallium-osmesa
    GALLIUM_DRIVERS: "swr"
    LIBUNWIND_FLAGS: --enable-libunwind
build:make-gallium-drivers-radeonsi:
  extends: build:make-gallium-drivers-swr
  variables:
    LLVM_VERSION: "7"
    GALLIUM_DRIVERS: "radeonsi"
build:make-gallium-drivers-other:
  extends: build:make-gallium-drivers-swr
  variables:
    LLVM_VERSION: "3.9"
    GALLIUM_DRIVERS: "i915,nouveau,kmsro,r300,r600,freedreno,svga,swrast,v3d,vc4,virgl,etnaviv"
build:make-gallium-st-clover-llvm-39:
  extends: .make-build
  variables:
    MAKE_CHECK_COMMAND: "true"
    LLVM_VERSION: "3.9"
    DRI_LOADERS: >
      --disable-glx
      --disable-gbm
      --disable-egl
    GALLIUM_ST: >
      --disable-dri
      --enable-opencl
      --enable-opencl-icd
      --enable-llvm
      --disable-xa
      --disable-nine
      --disable-xvmc
      --disable-vdpau
      --disable-va
      --disable-omx-bellagio
      --disable-gallium-osmesa
    GALLIUM_DRIVERS: "r600"
    LIBUNWIND_FLAGS: --enable-libunwind
build:make-gallium-st-clover-llvm-4:
  extends: build:make-gallium-st-clover-llvm-39
  variables:
    LLVM_VERSION: "4.0"
build:make-gallium-st-clover-llvm-5:
  extends: build:make-gallium-st-clover-llvm-39
  variables:
    LLVM_VERSION: "5.0"
build:make-gallium-st-clover-llvm-6:
  extends: build:make-gallium-st-clover-llvm-39
  variables:
    LLVM_VERSION: "6.0"
build:make-gallium-st-clover-llvm-7:
  extends: build:make-gallium-st-clover-llvm-39
  variables:
    LLVM_VERSION: "7"
    GALLIUM_DRIVERS: "r600,radeonsi"
build:make-gallium-st-other:
  extends: .make-build
  variables:
    MAKE_CHECK_COMMAND: "true"
    # We should be testing 3.3, but 3.9 is the oldest that still exists in ubuntu
    LLVM_VERSION: "3.9"
    DRI_LOADERS: >
      --disable-glx
      --disable-gbm
      --disable-egl
    GALLIUM_ST: >
      --enable-dri
      --disable-opencl
      --enable-xa
      --enable-nine
      --enable-xvmc
      --enable-vdpau
      --enable-va
      --enable-omx-bellagio
      --enable-gallium-osmesa
    # We need swrast for osmesa and nine.
    # i915 most likely doesn't work with most ST.
    # Regardless - we're doing a quick build test here.
    GALLIUM_DRIVERS: "i915,swrast"
    LIBUNWIND_FLAGS: --enable-libunwind
build:scons-nollvm:
  extends: .scons-build
  variables:
    SCONS_TARGET: "llvm=0"
    SCONS_CHECK_COMMAND: "scons llvm=0 check"
build:scons-llvm:
  extends: .scons-build
  variables:
    SCONS_TARGET: "llvm=1"
    SCONS_CHECK_COMMAND: "scons llvm=1 check"
    LLVM_VERSION: "3.9"
build:scons-swr:
  extends: .scons-build
  variables:
    SCONS_TARGET: "swr=1"
    SCONS_CHECK_COMMAND: "true"
    LLVM_VERSION: "6.0"

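The `.meson-build` template in `.gitlab-ci.yml` pins `llvm-config` by writing a Meson native file before configuring. A minimal standalone sketch of that step might look like the following. The function name `write_native_file` is hypothetical, `command -v` is swapped in for the backtick `which` call used in the job script, and the explicit error on a missing binary is an addition that the job script does not have.

```shell
#!/bin/sh
# Sketch only: write_native_file is a hypothetical helper, not part of Mesa.
# Writes a Meson native file at $1 that pins llvm-config to the version
# named by $LLVM_VERSION, or an empty file when LLVM_VERSION is unset.
write_native_file() {
    out="$1"
    if [ -n "${LLVM_VERSION:-}" ]; then
        llvm_config="llvm-config-${LLVM_VERSION}"
        # Resolve the versioned binary on PATH; fail loudly if it is missing
        # (the CI script would write an empty path in that case).
        if ! llvm_path="$(command -v "$llvm_config")"; then
            echo "error: $llvm_config not found on PATH" >&2
            return 1
        fi
        printf "[binaries]\nllvm-config = '%s'\n" "$llvm_path" > "$out"
    else
        # No LLVM requested: an empty native file keeps the meson
        # command line identical for every job.
        : > "$out"
    fi
}
```

With `LLVM_VERSION` unset, `write_native_file native.file` produces an empty file, which corresponds to the `touch native.file` branch in the job script; the result would then be passed to `meson _build --native-file=native.file` as in the template.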

@@ -0,0 +1,165 @@
FROM ubuntu:bionic
RUN apt-get update
RUN apt-get upgrade -y
RUN apt-get install -y \
curl \
wget \
gnupg \
software-properties-common
RUN curl -fsSL https://apt.llvm.org/llvm-snapshot.gpg.key | apt-key add -
RUN add-apt-repository "deb http://apt.llvm.org/bionic/ llvm-toolchain-bionic-7 main"
RUN apt-get update
RUN apt-get install -y \
pkg-config \
libdrm-dev \
libpciaccess-dev \
libxrandr-dev \
libxdamage-dev \
libxfixes-dev \
libxshmfence-dev \
libxxf86vm-dev \
libvdpau-dev \
libva-dev \
llvm-3.9-dev \
libclang-3.9-dev \
llvm-4.0-dev \
libclang-4.0-dev \
llvm-5.0-dev \
llvm-6.0-dev \
llvm-7-dev \
clang-5.0 \
libclang-5.0-dev \
clang-6.0 \
libclang-6.0-dev \
clang-7 \
libclang-7-dev \
libclc-dev \
libxvmc-dev \
libomxil-bellagio-dev \
xz-utils \
libexpat1-dev \
libx11-xcb-dev \
x11proto-xf86vidmode-dev \
libelf-dev \
libunwind8-dev \
libglvnd-dev \
python2.7 \
python-pip \
python-setuptools \
python3.5 \
python3-pip \
python3-setuptools
RUN apt-get install -y \
libxcb-randr0
# autotools build deps
RUN apt-get install -y \
autoconf \
automake \
xutils-dev \
libtool \
bison \
flex \
gettext \
make
# dependencies where we want a specific version
ENV XORG_RELEASES https://xorg.freedesktop.org/releases/individual
ENV XCB_RELEASES https://xcb.freedesktop.org/dist
ENV WAYLAND_RELEASES https://wayland.freedesktop.org/releases
ENV XORGMACROS_VERSION util-macros-1.19.0
ENV GLPROTO_VERSION glproto-1.4.17
ENV DRI2PROTO_VERSION dri2proto-2.8
ENV LIBPCIACCESS_VERSION libpciaccess-0.13.4
ENV LIBDRM_VERSION libdrm-2.4.97
ENV XCBPROTO_VERSION xcb-proto-1.13
ENV RANDRPROTO_VERSION randrproto-1.3.0
ENV LIBXRANDR_VERSION libXrandr-1.3.0
ENV LIBXCB_VERSION libxcb-1.13
ENV LIBXSHMFENCE_VERSION libxshmfence-1.3
ENV LIBVDPAU_VERSION libvdpau-1.1
ENV LIBVA_VERSION libva-1.7.0
ENV LIBWAYLAND_VERSION wayland-1.15.0
ENV WAYLAND_PROTOCOLS_VERSION wayland-protocols-1.8
RUN wget $XORG_RELEASES/util/$XORGMACROS_VERSION.tar.bz2
RUN tar -xvf $XORGMACROS_VERSION.tar.bz2 && rm $XORGMACROS_VERSION.tar.bz2
RUN (cd $XORGMACROS_VERSION && ./configure && make install) && rm -rf $XORGMACROS_VERSION
RUN wget $XORG_RELEASES/proto/$GLPROTO_VERSION.tar.bz2
RUN tar -xvf $GLPROTO_VERSION.tar.bz2 && rm $GLPROTO_VERSION.tar.bz2
RUN (cd $GLPROTO_VERSION && ./configure && make install) && rm -rf $GLPROTO_VERSION
RUN wget $XORG_RELEASES/proto/$DRI2PROTO_VERSION.tar.bz2
RUN tar -xvf $DRI2PROTO_VERSION.tar.bz2 && rm $DRI2PROTO_VERSION.tar.bz2
RUN (cd $DRI2PROTO_VERSION && ./configure && make install) && rm -rf $DRI2PROTO_VERSION
RUN wget $XCB_RELEASES/$XCBPROTO_VERSION.tar.bz2
RUN tar -xvf $XCBPROTO_VERSION.tar.bz2 && rm $XCBPROTO_VERSION.tar.bz2
RUN (cd $XCBPROTO_VERSION && ./configure && make install) && rm -rf $XCBPROTO_VERSION
RUN wget $XCB_RELEASES/$LIBXCB_VERSION.tar.bz2
RUN tar -xvf $LIBXCB_VERSION.tar.bz2 && rm $LIBXCB_VERSION.tar.bz2
RUN (cd $LIBXCB_VERSION && ./configure && make install) && rm -rf $LIBXCB_VERSION
RUN wget $XORG_RELEASES/lib/$LIBPCIACCESS_VERSION.tar.bz2
RUN tar -xvf $LIBPCIACCESS_VERSION.tar.bz2 && rm $LIBPCIACCESS_VERSION.tar.bz2
RUN (cd $LIBPCIACCESS_VERSION && ./configure && make install) && rm -rf $LIBPCIACCESS_VERSION
RUN wget https://dri.freedesktop.org/libdrm/$LIBDRM_VERSION.tar.bz2
RUN tar -xvf $LIBDRM_VERSION.tar.bz2 && rm $LIBDRM_VERSION.tar.bz2
RUN (cd $LIBDRM_VERSION && ./configure --enable-vc4 --enable-freedreno --enable-etnaviv-experimental-api && make install) && rm -rf $LIBDRM_VERSION
RUN wget $XORG_RELEASES/proto/$RANDRPROTO_VERSION.tar.bz2
RUN tar -xvf $RANDRPROTO_VERSION.tar.bz2 && rm $RANDRPROTO_VERSION.tar.bz2
RUN (cd $RANDRPROTO_VERSION && ./configure && make install) && rm -rf $RANDRPROTO_VERSION
RUN wget $XORG_RELEASES/lib/$LIBXRANDR_VERSION.tar.bz2
RUN tar -xvf $LIBXRANDR_VERSION.tar.bz2 && rm $LIBXRANDR_VERSION.tar.bz2
RUN (cd $LIBXRANDR_VERSION && ./configure && make install) && rm -rf $LIBXRANDR_VERSION
RUN wget $XORG_RELEASES/lib/$LIBXSHMFENCE_VERSION.tar.bz2
RUN tar -xvf $LIBXSHMFENCE_VERSION.tar.bz2 && rm $LIBXSHMFENCE_VERSION.tar.bz2
RUN (cd $LIBXSHMFENCE_VERSION && ./configure && make install) && rm -rf $LIBXSHMFENCE_VERSION
RUN wget https://people.freedesktop.org/~aplattner/vdpau/$LIBVDPAU_VERSION.tar.bz2
RUN tar -xvf $LIBVDPAU_VERSION.tar.bz2 && rm $LIBVDPAU_VERSION.tar.bz2
RUN (cd $LIBVDPAU_VERSION && ./configure && make install) && rm -rf $LIBVDPAU_VERSION
RUN wget https://www.freedesktop.org/software/vaapi/releases/libva/$LIBVA_VERSION.tar.bz2
RUN tar -xvf $LIBVA_VERSION.tar.bz2 && rm $LIBVA_VERSION.tar.bz2
RUN (cd $LIBVA_VERSION && ./configure --disable-wayland --disable-dummy-driver && make install) && rm -rf $LIBVA_VERSION
RUN wget $WAYLAND_RELEASES/$LIBWAYLAND_VERSION.tar.xz
RUN tar -xvf $LIBWAYLAND_VERSION.tar.xz && rm $LIBWAYLAND_VERSION.tar.xz
RUN (cd $LIBWAYLAND_VERSION && ./configure --enable-libraries --without-host-scanner --disable-documentation --disable-dtd-validation && make install) && rm -rf $LIBWAYLAND_VERSION
RUN wget $WAYLAND_RELEASES/$WAYLAND_PROTOCOLS_VERSION.tar.xz
RUN tar -xvf $WAYLAND_PROTOCOLS_VERSION.tar.xz && rm $WAYLAND_PROTOCOLS_VERSION.tar.xz
RUN (cd $WAYLAND_PROTOCOLS_VERSION && ./configure && make install) && rm -rf $WAYLAND_PROTOCOLS_VERSION
RUN apt-get install -y unzip
# Meson requires ninja >= 1.6, but xenial has 1.3.x
RUN wget https://github.com/ninja-build/ninja/releases/download/v1.6.0/ninja-linux.zip
RUN unzip ninja-linux.zip && rm ninja-linux.zip
RUN mv ninja /usr/bin/
RUN pip3 install 'meson>=0.49'
RUN pip2 install 'scons>=2.4'
RUN pip2 install mako
RUN pip3 install mako
# Use ccache to speed up builds
RUN apt-get install -y ccache
# Cleanup workdir
WORKDIR /

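The Dockerfile above repeats the same three steps (download, unpack, configure and install) for every pinned dependency. As an illustration only, that pattern could be expressed as a shell helper. The names `src_dir_from_url` and `fetch_and_install` are hypothetical and do not exist in the Mesa tree; the sketch assumes `wget`, `tar` and a build toolchain are available, as they are in the image.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical, not part of the Mesa tree.

# Derive the unpacked directory name from a tarball URL, e.g.
# .../libxcb-1.13.tar.bz2 gives libxcb-1.13
src_dir_from_url() {
    tarball="${1##*/}"                   # drop everything up to the last slash
    printf '%s\n' "${tarball%.tar.*}"    # drop the .tar.bz2 / .tar.xz suffix
}

# Download, unpack, configure and install one release tarball, then remove
# the tarball and the source directory. Any extra arguments are passed
# through to ./configure.
fetch_and_install() {
    url="$1"; shift
    dir="$(src_dir_from_url "$url")"
    wget "$url" &&
    tar -xf "${url##*/}" && rm "${url##*/}" &&
    (cd "$dir" && ./configure "$@" && make install) &&
    rm -rf "$dir"
}
```

A call such as `fetch_and_install "$XCB_RELEASES/$LIBXCB_VERSION.tar.bz2"` would stand in for the three `RUN` lines used for libxcb, and the libdrm flags could be appended as extra arguments. The Dockerfile's approach of separate `RUN` lines has the advantage that each step becomes its own cached image layer, so a helper like this trades layer granularity for brevity.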

@@ -265,6 +265,9 @@ Kristian Høgsberg <krh@bitplanet.net> <krh@hinata.boston.redhat.com>
Kristian Høgsberg <krh@bitplanet.net> <krh@sasori.boston.redhat.com>
Kristian Høgsberg <krh@bitplanet.net> <krh@temari.boston.redhat.com>
Kristian Høgsberg <krh@bitplanet.net> <kristian.h.kristensen@intel.com>
Kristian Høgsberg <krh@bitplanet.net> <hoegsberg@chromium.org>
Kristian Høgsberg <krh@bitplanet.net> <hoegsberg@google.com>
Kristian Høgsberg <krh@bitplanet.net> <hoegsberg@gmail.com>
Krzesimir Nowak <qdlacz@gmail.com> <krzesimir@kinvolk.io>


@@ -3,643 +3,14 @@ language: c
dist: xenial
cache:
apt: true
ccache: true
env:
global:
- XORG_RELEASES=https://xorg.freedesktop.org/releases/individual
- XCB_RELEASES=https://xcb.freedesktop.org/dist
- WAYLAND_RELEASES=https://wayland.freedesktop.org/releases
- XORGMACROS_VERSION=util-macros-1.19.0
- GLPROTO_VERSION=glproto-1.4.17
- DRI2PROTO_VERSION=dri2proto-2.8
- LIBPCIACCESS_VERSION=libpciaccess-0.13.4
- LIBDRM_VERSION=libdrm-2.4.97
- XCBPROTO_VERSION=xcb-proto-1.13
- RANDRPROTO_VERSION=randrproto-1.3.0
- LIBXRANDR_VERSION=libXrandr-1.3.0
- LIBXCB_VERSION=libxcb-1.13
- LIBXSHMFENCE_VERSION=libxshmfence-1.2
- LIBVDPAU_VERSION=libvdpau-1.1
- LIBVA_VERSION=libva-1.7.0
- LIBWAYLAND_VERSION=wayland-1.15.0
- WAYLAND_PROTOCOLS_VERSION=wayland-protocols-1.8
- PKG_CONFIG_PATH=$HOME/prefix/lib/pkgconfig:$HOME/prefix/share/pkgconfig
- LD_LIBRARY_PATH="$HOME/prefix/lib:$LD_LIBRARY_PATH"
- PATH="$HOME/prefix/bin:$PATH"
- PKG_CONFIG_PATH="$PKG_CONFIG_PATH"
matrix:
include:
- env:
- LABEL="meson Vulkan"
- BUILD=meson
- UNWIND="false"
- DRI_LOADERS="-Dglx=disabled -Dgbm=false -Degl=false -Dplatforms=x11,wayland,drm -Dosmesa=none"
- GALLIUM_ST="-Ddri3=true -Dgallium-vdpau=false -Dgallium-xvmc=false -Dgallium-omx=disabled -Dgallium-va=false -Dgallium-xa=false -Dgallium-nine=false -Dgallium-opencl=disabled"
- VULKAN_DRIVERS="intel,amd"
- LLVM_VERSION=7
- LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
addons:
apt:
sources:
- sourceline: 'deb http://apt.llvm.org/xenial/ llvm-toolchain-xenial-7 main'
key_url: https://apt.llvm.org/llvm-snapshot.gpg.key
packages:
- llvm-7-dev
# Common
- xz-utils
- libexpat1-dev
- libx11-xcb-dev
- libelf-dev
- python3.5
- python3-pip
- python3-setuptools
- env:
- LABEL="meson loaders/classic DRI"
- BUILD=meson
- UNWIND="false"
- DRI_LOADERS="-Dglx=dri -Dgbm=true -Degl=true -Dplatforms=x11,wayland,drm,surfaceless -Dosmesa=classic"
- DRI_DRIVERS="i915,i965,r100,r200,swrast,nouveau"
- GALLIUM_ST="-Ddri3=true -Dgallium-vdpau=false -Dgallium-xvmc=false -Dgallium-omx=disabled -Dgallium-va=false -Dgallium-xa=false -Dgallium-nine=false -Dgallium-opencl=disabled"
addons:
apt:
packages:
- xz-utils
- x11proto-xf86vidmode-dev
- libxxf86vm-dev
- libexpat1-dev
- libx11-xcb-dev
- libxdamage-dev
- libxfixes-dev
- python3.5
- python3-pip
- python3-setuptools
- env:
- LABEL="make loaders/classic DRI"
- BUILD=make
- MAKEFLAGS="-j4"
- MAKE_CHECK_COMMAND="make check"
- DRI_LOADERS="--enable-glx --enable-gbm --enable-egl --with-platforms=x11,drm,surfaceless,wayland --enable-osmesa"
- DRI_DRIVERS="i915,i965,radeon,r200,swrast,nouveau"
- GALLIUM_ST="--enable-dri --disable-opencl --disable-xa --disable-nine --disable-xvmc --disable-vdpau --disable-va --disable-omx-bellagio --disable-gallium-osmesa"
- GALLIUM_DRIVERS=""
- VULKAN_DRIVERS=""
- LIBUNWIND_FLAGS="--disable-libunwind"
addons:
apt:
packages:
- xz-utils
- x11proto-xf86vidmode-dev
- libxxf86vm-dev
- libexpat1-dev
- libx11-xcb-dev
- libxdamage-dev
- libxfixes-dev
- python3-pip
- python3-setuptools
- env:
# NOTE: Building SWR is 2x (yes two) times slower than all the other
# gallium drivers combined.
# Start this early so that it doesn't hinder the run time.
- LABEL="meson Gallium Drivers SWR"
- BUILD=meson
- UNWIND="true"
- DRI_LOADERS="-Dglx=disabled -Degl=false -Dgbm=false"
- GALLIUM_ST="-Ddri3=false -Dgallium-vdpau=false -Dgallium-xvmc=false -Dgallium-omx=disabled -Dgallium-va=false -Dgallium-xa=false -Dgallium-nine=false -Dgallium-opencl=disabled"
- GALLIUM_DRIVERS="swr"
- LLVM_VERSION=6.0
- LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
addons:
apt:
packages:
- llvm-6.0-dev
# Common
- xz-utils
- libexpat1-dev
- libx11-xcb-dev
- libelf-dev
- libunwind8-dev
- python3.5
- python3-pip
- python3-setuptools
- env:
- LABEL="meson Gallium Drivers RadeonSI"
- BUILD=meson
- UNWIND="true"
- DRI_LOADERS="-Dglx=disabled -Degl=false -Dgbm=false"
- GALLIUM_ST="-Ddri3=false -Dgallium-vdpau=false -Dgallium-xvmc=false -Dgallium-omx=disabled -Dgallium-va=false -Dgallium-xa=false -Dgallium-nine=false -Dgallium-opencl=disabled"
- GALLIUM_DRIVERS="radeonsi"
- LLVM_VERSION=7
- LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
addons:
apt:
sources:
- sourceline: 'deb http://apt.llvm.org/xenial/ llvm-toolchain-xenial-7 main'
key_url: https://apt.llvm.org/llvm-snapshot.gpg.key
packages:
# From sources above
- llvm-7-dev
# Common
- xz-utils
- libexpat1-dev
- libx11-xcb-dev
- libelf-dev
- libunwind8-dev
- python3.5
- python3-pip
- python3-setuptools
- env:
- LABEL="meson Gallium Drivers Other"
- BUILD=meson
- UNWIND="true"
- DRI_LOADERS="-Dglx=disabled -Degl=false -Dgbm=false"
- GALLIUM_ST="-Ddri3=false -Dgallium-vdpau=false -Dgallium-xvmc=false -Dgallium-omx=disabled -Dgallium-va=false -Dgallium-xa=false -Dgallium-nine=false -Dgallium-opencl=disabled"
- GALLIUM_DRIVERS="i915,nouveau,kmsro,r300,r600,freedreno,svga,swrast,v3d,vc4,virgl,etnaviv"
- LLVM_VERSION=5.0
- LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
addons:
apt:
packages:
# LLVM packaging is broken and misses these dependencies
- libedit-dev
- llvm-5.0-dev
# Common
- xz-utils
- libexpat1-dev
- libx11-xcb-dev
- libelf-dev
- libunwind8-dev
- python3.5
- python3-pip
- python3-setuptools
- env:
- LABEL="meson Gallium ST Clover LLVM-5.0"
- BUILD=meson
- UNWIND="true"
- DRI_LOADERS="-Dglx=disabled -Degl=false -Dgbm=false"
- GALLIUM_ST="-Ddri3=false -Dgallium-vdpau=false -Dgallium-xvmc=false -Dgallium-omx=disabled -Dgallium-va=false -Dgallium-xa=false -Dgallium-nine=false -Dgallium-opencl=icd"
- GALLIUM_DRIVERS="r600"
- LLVM_VERSION=5.0
- LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
addons:
apt:
packages:
- libclc-dev
# LLVM packaging is broken and misses these dependencies
- libedit-dev
- llvm-5.0-dev
- clang-5.0
- libclang-5.0-dev
# Common
- xz-utils
- libexpat1-dev
- libx11-xcb-dev
- libelf-dev
- libunwind8-dev
- python3-pip
- python3-setuptools
- env:
- LABEL="meson Gallium ST Clover LLVM-6.0"
- BUILD=meson
- UNWIND="true"
- DRI_LOADERS="-Dglx=disabled -Degl=false -Dgbm=false"
- GALLIUM_ST="-Ddri3=false -Dgallium-vdpau=false -Dgallium-xvmc=false -Dgallium-omx=disabled -Dgallium-va=false -Dgallium-xa=false -Dgallium-nine=false -Dgallium-opencl=icd"
- GALLIUM_DRIVERS="r600"
- LLVM_VERSION=6.0
- LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
addons:
apt:
packages:
- libclc-dev
- llvm-6.0-dev
- clang-6.0
- libclang-6.0-dev
# Common
- xz-utils
- libexpat1-dev
- libx11-xcb-dev
- libelf-dev
- libunwind8-dev
- python3.5
- python3-pip
- python3-setuptools
- env:
- LABEL="meson Gallium ST Clover LLVM-7"
- BUILD=meson
- UNWIND="true"
- DRI_LOADERS="-Dglx=disabled -Degl=false -Dgbm=false"
- GALLIUM_ST="-Ddri3=false -Dgallium-vdpau=false -Dgallium-xvmc=false -Dgallium-omx=disabled -Dgallium-va=false -Dgallium-xa=false -Dgallium-nine=false -Dgallium-opencl=icd"
- GALLIUM_DRIVERS="r600,radeonsi"
- LLVM_VERSION=7
- LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
addons:
apt:
sources:
- sourceline: 'deb http://apt.llvm.org/xenial/ llvm-toolchain-xenial-7 main'
key_url: https://apt.llvm.org/llvm-snapshot.gpg.key
packages:
- libclc-dev
# From sources above
- llvm-7-dev
- clang-7
- libclang-7-dev
# Common
- xz-utils
- libexpat1-dev
- libx11-xcb-dev
- libelf-dev
- libunwind8-dev
- python3.5
- python3-pip
- python3-setuptools
- env:
- LABEL="meson Gallium ST Other"
- BUILD=meson
- UNWIND="true"
- DRI_LOADERS="-Dglx=disabled -Degl=false -Dgbm=false"
- GALLIUM_ST="-Ddri3=true -Dgallium-vdpau=true -Dgallium-xvmc=true -Dgallium-omx=bellagio -Dgallium-va=true -Dgallium-xa=true -Dgallium-nine=true -Dgallium-opencl=disabled -Dosmesa=gallium"
# We need swrast for osmesa and nine.
# Nouveau supports, or builds at least against all ST.
- GALLIUM_DRIVERS="nouveau,swrast"
- LLVM_VERSION=5.0
- LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
addons:
apt:
packages:
- llvm-5.0-dev
# LLVM packaging is broken and misses these dependencies
- libedit-dev
# Nine requires gcc 4.6... which is the one we have right ?
- libxvmc-dev
# Build locally, for now.
#- libvdpau-dev
#- libva-dev
- libomxil-bellagio-dev
# Common
- xz-utils
- libexpat1-dev
- libx11-xcb-dev
- libelf-dev
- libunwind8-dev
- python3.5
- python3-pip
- python3-setuptools
- env:
# NOTE: Building SWR is 2x (yes two) times slower than all the other
# gallium drivers combined.
# Start this early so that it doesn't hinder the run time.
- LABEL="make Gallium Drivers SWR"
- BUILD=make
- MAKEFLAGS="-j4"
- MAKE_CHECK_COMMAND="true"
- LLVM_VERSION=6.0
- LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
- DRI_LOADERS="--disable-glx --disable-gbm --disable-egl"
- DRI_DRIVERS=""
- GALLIUM_ST="--enable-dri --disable-opencl --disable-xa --disable-nine --disable-xvmc --disable-vdpau --disable-va --disable-omx-bellagio --disable-gallium-osmesa"
- GALLIUM_DRIVERS="swr"
- VULKAN_DRIVERS=""
- LIBUNWIND_FLAGS="--enable-libunwind"
addons:
apt:
packages:
- llvm-6.0-dev
# Common
- xz-utils
- libexpat1-dev
- libx11-xcb-dev
- libelf-dev
- libunwind8-dev
- python3-pip
- python3-setuptools
- env:
- LABEL="make Gallium Drivers RadeonSI"
- BUILD=make
- MAKEFLAGS="-j4"
- MAKE_CHECK_COMMAND="true"
- LLVM_VERSION=7
- LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
- DRI_LOADERS="--disable-glx --disable-gbm --disable-egl"
- DRI_DRIVERS=""
- GALLIUM_ST="--enable-dri --disable-opencl --disable-xa --disable-nine --disable-xvmc --disable-vdpau --disable-va --disable-omx-bellagio --disable-gallium-osmesa"
- GALLIUM_DRIVERS="radeonsi"
- VULKAN_DRIVERS=""
- LIBUNWIND_FLAGS="--enable-libunwind"
addons:
apt:
sources:
- sourceline: 'deb http://apt.llvm.org/xenial/ llvm-toolchain-xenial-7 main'
key_url: https://apt.llvm.org/llvm-snapshot.gpg.key
packages:
# From sources above
- llvm-7-dev
# Common
- xz-utils
- libexpat1-dev
- libx11-xcb-dev
- libelf-dev
- libunwind8-dev
- python3-pip
- python3-setuptools
- env:
- LABEL="make Gallium Drivers Other"
- BUILD=make
- MAKEFLAGS="-j4"
- MAKE_CHECK_COMMAND="true"
- LLVM_VERSION=3.9
- LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
- DRI_LOADERS="--disable-glx --disable-gbm --disable-egl"
- DRI_DRIVERS=""
- GALLIUM_ST="--enable-dri --disable-opencl --disable-xa --disable-nine --disable-xvmc --disable-vdpau --disable-va --disable-omx-bellagio --disable-gallium-osmesa"
- GALLIUM_DRIVERS="i915,nouveau,kmsro,r300,r600,freedreno,svga,swrast,v3d,vc4,virgl,etnaviv"
- VULKAN_DRIVERS=""
- LIBUNWIND_FLAGS="--enable-libunwind"
addons:
apt:
packages:
# LLVM packaging is broken and misses these dependencies
- libedit-dev
- llvm-3.9-dev
# Common
- xz-utils
- libexpat1-dev
- libx11-xcb-dev
- libelf-dev
- libunwind8-dev
- python3-pip
- python3-setuptools
- env:
- LABEL="make Gallium ST Clover LLVM-3.9"
- BUILD=make
- MAKEFLAGS="-j4"
- MAKE_CHECK_COMMAND="true"
- LLVM_VERSION=3.9
- LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
- DRI_LOADERS="--disable-glx --disable-gbm --disable-egl"
- DRI_DRIVERS=""
- GALLIUM_ST="--disable-dri --enable-opencl --enable-opencl-icd --enable-llvm --disable-xa --disable-nine --disable-xvmc --disable-vdpau --disable-va --disable-omx-bellagio --disable-gallium-osmesa"
- GALLIUM_DRIVERS="r600"
- VULKAN_DRIVERS=""
- LIBUNWIND_FLAGS="--enable-libunwind"
addons:
apt:
packages:
- libclc-dev
# LLVM packaging is broken and misses these dependencies
- libedit-dev
- llvm-3.9-dev
- clang-3.9
- libclang-3.9-dev
# Common
- xz-utils
- libexpat1-dev
- libx11-xcb-dev
- libelf-dev
- libunwind8-dev
- python3-pip
- python3-setuptools
- env:
- LABEL="make Gallium ST Clover LLVM-4.0"
- BUILD=make
- MAKEFLAGS="-j4"
- MAKE_CHECK_COMMAND="true"
- LLVM_VERSION=4.0
- LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
- DRI_LOADERS="--disable-glx --disable-gbm --disable-egl"
- DRI_DRIVERS=""
- GALLIUM_ST="--disable-dri --enable-opencl --enable-opencl-icd --enable-llvm --disable-xa --disable-nine --disable-xvmc --disable-vdpau --disable-va --disable-omx-bellagio --disable-gallium-osmesa"
- GALLIUM_DRIVERS="r600"
- VULKAN_DRIVERS=""
- LIBUNWIND_FLAGS="--enable-libunwind"
addons:
apt:
packages:
- libclc-dev
# LLVM packaging is broken and misses these dependencies
- libedit-dev
- llvm-4.0-dev
- clang-4.0
- libclang-4.0-dev
# Common
- xz-utils
- libexpat1-dev
- libx11-xcb-dev
- libelf-dev
- libunwind8-dev
- python3-pip
- python3-setuptools
- env:
- LABEL="make Gallium ST Clover LLVM-5.0"
- BUILD=make
- MAKEFLAGS="-j4"
- MAKE_CHECK_COMMAND="true"
- LLVM_VERSION=5.0
- LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
- DRI_LOADERS="--disable-glx --disable-gbm --disable-egl"
- DRI_DRIVERS=""
- GALLIUM_ST="--disable-dri --enable-opencl --enable-opencl-icd --enable-llvm --disable-xa --disable-nine --disable-xvmc --disable-vdpau --disable-va --disable-omx-bellagio --disable-gallium-osmesa"
- GALLIUM_DRIVERS="r600"
- VULKAN_DRIVERS=""
- LIBUNWIND_FLAGS="--enable-libunwind"
addons:
apt:
packages:
- libclc-dev
# LLVM packaging is broken and misses these dependencies
- libedit-dev
- llvm-5.0-dev
- clang-5.0
- libclang-5.0-dev
# Common
- xz-utils
- libexpat1-dev
- libx11-xcb-dev
- libelf-dev
- libunwind8-dev
- python3-pip
- python3-setuptools
- env:
- LABEL="make Gallium ST Clover LLVM-6.0"
- BUILD=make
- MAKEFLAGS="-j4"
- MAKE_CHECK_COMMAND="true"
- LLVM_VERSION=6.0
- LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
- DRI_LOADERS="--disable-glx --disable-gbm --disable-egl"
- DRI_DRIVERS=""
- GALLIUM_ST="--disable-dri --enable-opencl --enable-opencl-icd --enable-llvm --disable-xa --disable-nine --disable-xvmc --disable-vdpau --disable-va --disable-omx-bellagio --disable-gallium-osmesa"
- GALLIUM_DRIVERS="r600"
- VULKAN_DRIVERS=""
- LIBUNWIND_FLAGS="--enable-libunwind"
addons:
apt:
packages:
- libclc-dev
- llvm-6.0-dev
- clang-6.0
- libclang-6.0-dev
# Common
- xz-utils
- libexpat1-dev
- libx11-xcb-dev
- libelf-dev
- libunwind8-dev
- python3-pip
- python3-setuptools
- env:
- LABEL="make Gallium ST Clover LLVM-7"
- BUILD=make
- MAKEFLAGS="-j4"
- MAKE_CHECK_COMMAND="true"
- LLVM_VERSION=7
- LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
- DRI_LOADERS="--disable-glx --disable-gbm --disable-egl"
- DRI_DRIVERS=""
- GALLIUM_ST="--disable-dri --enable-opencl --enable-opencl-icd --enable-llvm --disable-xa --disable-nine --disable-xvmc --disable-vdpau --disable-va --disable-omx-bellagio --disable-gallium-osmesa"
- GALLIUM_DRIVERS="r600,radeonsi"
- VULKAN_DRIVERS=""
- LIBUNWIND_FLAGS="--enable-libunwind"
addons:
apt:
sources:
- sourceline: 'deb http://apt.llvm.org/xenial/ llvm-toolchain-xenial-7 main'
key_url: https://apt.llvm.org/llvm-snapshot.gpg.key
packages:
- libclc-dev
# From sources above
- llvm-7-dev
- clang-7
- libclang-7-dev
# Common
- xz-utils
- libexpat1-dev
- libx11-xcb-dev
- libelf-dev
- libunwind8-dev
- env:
- LABEL="make Gallium ST Other"
- BUILD=make
- MAKEFLAGS="-j4"
- MAKE_CHECK_COMMAND="true"
- LLVM_VERSION=3.5
- LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
- DRI_LOADERS="--disable-glx --disable-gbm --disable-egl"
- DRI_DRIVERS=""
- GALLIUM_ST="--enable-dri --disable-opencl --enable-xa --enable-nine --enable-xvmc --enable-vdpau --enable-va --enable-omx-bellagio --enable-gallium-osmesa"
# We need swrast for osmesa and nine.
# i915 most likely doesn't work with most ST.
# Regardless - we're doing a quick build test here.
- GALLIUM_DRIVERS="i915,swrast"
- VULKAN_DRIVERS=""
- LIBUNWIND_FLAGS="--enable-libunwind"
addons:
apt:
packages:
# We actually want to test against llvm-3.3, yet 3.5 is available
- llvm-3.5-dev
# Nine requires gcc 4.6... which is the one we have right ?
- libxvmc-dev
# Build locally, for now.
#- libvdpau-dev
#- libva-dev
- libomxil-bellagio-dev
# LLVM packaging is broken and misses these dependencies
- libedit-dev
# Common
- xz-utils
- libexpat1-dev
- libx11-xcb-dev
- libelf-dev
- libunwind8-dev
- python3-pip
- python3-setuptools
- env:
- LABEL="make Vulkan"
- BUILD=make
- MAKEFLAGS="-j4"
- MAKE_CHECK_COMMAND="make -C src/gtest check && make -C src/intel check"
- LLVM_VERSION=7
- LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
- DRI_LOADERS="--disable-glx --disable-gbm --disable-egl --with-platforms=x11,wayland"
- DRI_DRIVERS=""
- GALLIUM_ST="--enable-dri --enable-dri3 --disable-opencl --disable-xa --disable-nine --disable-xvmc --disable-vdpau --disable-va --disable-omx-bellagio --disable-gallium-osmesa"
- GALLIUM_DRIVERS=""
- VULKAN_DRIVERS="intel,radeon"
- LIBUNWIND_FLAGS="--disable-libunwind"
addons:
apt:
sources:
- sourceline: 'deb http://apt.llvm.org/xenial/ llvm-toolchain-xenial-7 main'
key_url: https://apt.llvm.org/llvm-snapshot.gpg.key
packages:
# From sources above
- llvm-7-dev
# Common
- xz-utils
- libexpat1-dev
- libx11-xcb-dev
- libelf-dev
- python3-pip
- python3-setuptools
- env:
- LABEL="scons"
- BUILD=scons
- SCONSFLAGS="-j4"
# Explicitly disable.
- SCONS_TARGET="llvm=0"
# Keep it symmetrical to the make build.
- SCONS_CHECK_COMMAND="scons llvm=0 check"
addons:
apt:
packages:
# Common
- xz-utils
- x11proto-xf86vidmode-dev
- libexpat1-dev
- libx11-xcb-dev
- libelf-dev
- env:
- LABEL="scons LLVM"
- BUILD=scons
- SCONSFLAGS="-j4"
- SCONS_TARGET="llvm=1"
# Keep it symmetrical to the make build.
- SCONS_CHECK_COMMAND="scons llvm=1 check"
- LLVM_VERSION=3.5
- LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
addons:
apt:
packages:
# LLVM packaging is broken and misses these dependencies
- libedit-dev
# We actually want to test against llvm-3.3, yet 3.5 is available
- llvm-3.5-dev
# Common
- xz-utils
- x11proto-xf86vidmode-dev
- libexpat1-dev
- libx11-xcb-dev
- libelf-dev
- env:
- LABEL="scons SWR"
- BUILD=scons
- SCONSFLAGS="-j4"
- SCONS_TARGET="swr=1"
- LLVM_VERSION=6.0
- LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
# Keep it symmetrical to the make build. There's no actual SWR, yet.
- SCONS_CHECK_COMMAND="true"
addons:
apt:
packages:
- llvm-6.0-dev
# Common
- xz-utils
- x11proto-xf86vidmode-dev
- libexpat1-dev
- libx11-xcb-dev
- libelf-dev
- env:
- LABEL="macOS make"
- BUILD=make
@@ -691,114 +62,9 @@ install:
pip2 install --user mako;
fi
# Install a more modern scons from pip.
- if test "x$BUILD" = xscons; then
pip2 install --user "scons>=2.4";
pip2 install --user mako;
fi
# Install dependencies where we require specific versions (or where
# disallowed by Travis CI's package whitelisting).
- |
if [[ "$TRAVIS_OS_NAME" == "linux" ]]; then
wget $XORG_RELEASES/util/$XORGMACROS_VERSION.tar.bz2
tar -jxvf $XORGMACROS_VERSION.tar.bz2
(cd $XORGMACROS_VERSION && ./configure --prefix=$HOME/prefix && make install)
wget $XORG_RELEASES/proto/$GLPROTO_VERSION.tar.bz2
tar -jxvf $GLPROTO_VERSION.tar.bz2
(cd $GLPROTO_VERSION && ./configure --prefix=$HOME/prefix && make install)
wget $XORG_RELEASES/proto/$DRI2PROTO_VERSION.tar.bz2
tar -jxvf $DRI2PROTO_VERSION.tar.bz2
(cd $DRI2PROTO_VERSION && ./configure --prefix=$HOME/prefix && make install)
wget $XCB_RELEASES/$XCBPROTO_VERSION.tar.bz2
tar -jxvf $XCBPROTO_VERSION.tar.bz2
(cd $XCBPROTO_VERSION && ./configure --prefix=$HOME/prefix && make install)
wget $XCB_RELEASES/$LIBXCB_VERSION.tar.bz2
tar -jxvf $LIBXCB_VERSION.tar.bz2
(cd $LIBXCB_VERSION && ./configure --prefix=$HOME/prefix && make install)
wget $XORG_RELEASES/lib/$LIBPCIACCESS_VERSION.tar.bz2
tar -jxvf $LIBPCIACCESS_VERSION.tar.bz2
(cd $LIBPCIACCESS_VERSION && ./configure --prefix=$HOME/prefix && make install)
wget https://dri.freedesktop.org/libdrm/$LIBDRM_VERSION.tar.bz2
tar -jxvf $LIBDRM_VERSION.tar.bz2
(cd $LIBDRM_VERSION && ./configure --prefix=$HOME/prefix --enable-vc4 --enable-freedreno --enable-etnaviv-experimental-api && make install)
wget $XORG_RELEASES/proto/$RANDRPROTO_VERSION.tar.bz2
tar -jxvf $RANDRPROTO_VERSION.tar.bz2
(cd $RANDRPROTO_VERSION && ./configure --prefix=$HOME/prefix && make install)
wget $XORG_RELEASES/lib/$LIBXRANDR_VERSION.tar.bz2
tar -jxvf $LIBXRANDR_VERSION.tar.bz2
(cd $LIBXRANDR_VERSION && ./configure --prefix=$HOME/prefix && make install)
wget $XORG_RELEASES/lib/$LIBXSHMFENCE_VERSION.tar.bz2
tar -jxvf $LIBXSHMFENCE_VERSION.tar.bz2
(cd $LIBXSHMFENCE_VERSION && ./configure --prefix=$HOME/prefix && make install)
wget https://people.freedesktop.org/~aplattner/vdpau/$LIBVDPAU_VERSION.tar.bz2
tar -jxvf $LIBVDPAU_VERSION.tar.bz2
(cd $LIBVDPAU_VERSION && ./configure --prefix=$HOME/prefix && make install)
wget https://www.freedesktop.org/software/vaapi/releases/libva/$LIBVA_VERSION.tar.bz2
tar -jxvf $LIBVA_VERSION.tar.bz2
(cd $LIBVA_VERSION && ./configure --prefix=$HOME/prefix --disable-wayland --disable-dummy-driver && make install)
wget $WAYLAND_RELEASES/$LIBWAYLAND_VERSION.tar.xz
tar -axvf $LIBWAYLAND_VERSION.tar.xz
(cd $LIBWAYLAND_VERSION && ./configure --prefix=$HOME/prefix --enable-libraries --without-host-scanner --disable-documentation --disable-dtd-validation && make install)
wget $WAYLAND_RELEASES/$WAYLAND_PROTOCOLS_VERSION.tar.xz
tar -axvf $WAYLAND_PROTOCOLS_VERSION.tar.xz
(cd $WAYLAND_PROTOCOLS_VERSION && ./configure --prefix=$HOME/prefix && make install)
# Meson requires ninja >= 1.6, but xenial has 1.3.x
wget https://github.com/ninja-build/ninja/releases/download/v1.6.0/ninja-linux.zip
unzip ninja-linux.zip
mv ninja $HOME/prefix/bin/
# Generate this header since one is missing on the Travis instance
mkdir -p linux
printf "%s\n" \
"#ifndef _LINUX_MEMFD_H" \
"#define _LINUX_MEMFD_H" \
"" \
"#define MFD_CLOEXEC 0x0001U" \
"#define MFD_ALLOW_SEALING 0x0002U" \
"" \
"#endif /* _LINUX_MEMFD_H */" > linux/memfd.h
# Generate this header, including the missing SYS_memfd_create
# macro, which is not provided by the header in the Travis
# instance
mkdir -p sys
printf "%s\n" \
"#ifndef _SYSCALL_H" \
"#define _SYSCALL_H 1" \
"" \
"#include <asm/unistd.h>" \
"" \
"#ifndef _LIBC" \
"# include <bits/syscall.h>" \
"#endif" \
"" \
"#ifndef __NR_memfd_create" \
"# define __NR_memfd_create 319 /* Taken from <asm/unistd_64.h> */" \
"#endif" \
"" \
"#ifndef SYS_memfd_create" \
"# define SYS_memfd_create __NR_memfd_create" \
"#endif" \
"" \
"#endif" > sys/syscall.h
fi
script:
- if test "x$BUILD" = xmake; then
export CFLAGS="$CFLAGS -isystem`pwd`";
@@ -819,10 +85,6 @@ script:
make && eval $MAKE_CHECK_COMMAND;
fi
- if test "x$BUILD" = xscons; then
scons $SCONS_TARGET && eval $SCONS_CHECK_COMMAND;
fi
- |
if test "x$BUILD" = xmeson; then
if test -n "$LLVM_CONFIG"; then

View File

@@ -24,7 +24,7 @@
# BOARD_GPU_DRIVERS should be defined. The valid values are
#
# classic drivers: i915 i965
# gallium drivers: swrast freedreno i915g nouveau kmsro r300g r600g radeonsi vc4 virgl vmwgfx etnaviv
# gallium drivers: swrast freedreno i915g nouveau kmsro r300g r600g radeonsi vc4 virgl vmwgfx etnaviv iris
#
# The main target is libGLES_mesa. For each classic driver enabled, a DRI
# module will also be built. DRI modules will be loaded by libGLES_mesa.
@@ -59,7 +59,8 @@ gallium_drivers := \
vmwgfx.HAVE_GALLIUM_VMWGFX \
vc4.HAVE_GALLIUM_VC4 \
virgl.HAVE_GALLIUM_VIRGL \
etnaviv.HAVE_GALLIUM_ETNAVIV
etnaviv.HAVE_GALLIUM_ETNAVIV \
iris.HAVE_GALLIUM_IRIS
ifeq ($(BOARD_GPU_DRIVERS),all)
MESA_BUILD_CLASSIC := $(filter HAVE_%, $(subst ., , $(classic_drivers)))

View File

@@ -9,25 +9,6 @@ This repository lives at https://gitlab.freedesktop.org/mesa/mesa.
Other repositories are likely forks, and code found there is not supported.
Build status
------------
Travis:
.. image:: https://travis-ci.org/mesa3d/mesa.svg?branch=master
:target: https://travis-ci.org/mesa3d/mesa
Appveyor:
.. image:: https://img.shields.io/appveyor/ci/mesa3d/mesa.svg
:target: https://ci.appveyor.com/project/mesa3d/mesa
Coverity:
.. image:: https://scan.coverity.com/projects/139/badge.svg?flat=1
:target: https://scan.coverity.com/projects/mesa
Build & install
---------------

View File

@@ -1 +1 @@
19.0.8
19.1.0-devel

View File

@@ -1,47 +0,0 @@
# Both of these were already merged with different shas
da48cba61ef6fefb799bf96e6364b70dbf4ec712
c812c740e60c14060eb89db66039111881a0f42f
# The commit these fix was reverted from 19.0, but fixed for 19.1 due
# to the number of fixes required to make that commit work
8d8f80af3a17354508f2ec9d6559c915d5be351d
0c0c69729b6d72a5297122856c8fe48510e90764
0881e90c09965818b02e359474a6f7446b41d647
b031c643491a92a5574c7a4bd659df33f2d89bb6
# These were manually rebased by Jason, thanks!
8ab95b849e66f3221d80a67eef2ec6e3730901a8
5c30fffeec1732c21d600c036f95f8cdb1bb5487
# This doesn't actually apply to 19.0
29179f58c6ba8099859ea25900214dbbd3814a92
# This was superseded by a manual backport from Ken
6981069fc805da1afc867ca3c905075d146d7ff9
# This was manually backported
0bc1942c9ddce4e796322a7561f06af5dec0decd
# This doesn't need to be applied, it already seems to exist in stable.
80dc78407d0d1e03ceddf8889b217e8fd113568d
# This was backported manually
4f18c43d1df64135e8968a7d4fbfd2c9918b76ae
# These were de-nominated since they don't apply nicely
88105375c978f9de82af8c654051e5aa16d61614
c9358621276ae49162e58d4a16fe37abda6a347f
# These are only for 19.1
c3538ab5702ceeead284c2b5f9e700f3082c8135
d2aa65eb1892f7b300ac24560f9dbda6b600b5a7
78e35df52aa2f7d770f929a0866a0faa89c261a9
0f1b070bad34c46c4bcc6c679fa533bf6b4b79e5
ad2b4aa37806779bdfc15d704940136c3db21eb4
9dc57eebd578b976b94c54d083377ba0920d43a8
5820ac6756898a1bd30bde04555437a55c378726
ffd2f948fee271cbbce93708fc508dab7cb5d14c
# This was manually rebased and the script doesn't understand that for some
# reason
cb7c9b2a9352cc73a2d3becc0427c53c8baf153a
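The ignore list above follows a simple convention: lines starting with `#` are comments explaining why the commits below them are skipped, and every other non-empty line is a full 40-character commit SHA. A minimal sketch of reading such a file, assuming that convention holds throughout; the function name `read_ignored_shas` is hypothetical, and this is not the logic of Mesa's own pick-list scripts:

```python
import re

# A full Git SHA-1 is 40 lowercase hexadecimal characters.
SHA_RE = re.compile(r'^[0-9a-f]{40}$')


def read_ignored_shas(text):
    """Return the set of full commit SHAs listed in an ignore file's text."""
    shas = set()
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and explanatory comments.
        if not line or line.startswith('#'):
            continue
        if SHA_RE.match(line):
            shas.add(line)
    return shas


sample = """\
# This was manually backported
0bc1942c9ddce4e796322a7561f06af5dec0decd
# These were manually rebased
8ab95b849e66f3221d80a67eef2ec6e3730901a8
5c30fffeec1732c21d600c036f95f8cdb1bb5487
"""
ignored = read_ignored_shas(sample)
```

Returning a set makes the later "is this commit ignored?" check a constant-time membership test, and silently dropping lines that are not well-formed SHAs is one possible policy; a stricter reader might raise an error instead.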

View File

@@ -35,11 +35,7 @@ def main():
args = parser.parse_args()
if os.path.isabs(args.libdir):
destdir = os.environ.get('DESTDIR')
if destdir:
to = os.path.join(destdir, args.libdir[1:])
else:
to = args.libdir
to = os.path.join(os.environ.get('DESTDIR', '/'), args.libdir[1:])
else:
to = os.path.join(os.environ['MESON_INSTALL_DESTDIR_PREFIX'], args.libdir)
@@ -49,6 +45,7 @@ def main():
if os.path.lexists(to):
os.unlink(to)
os.makedirs(to)
shutil.copy(args.megadriver, master)
for driver in args.drivers:
abs_driver = os.path.join(to, driver)
@@ -70,14 +67,7 @@ def main():
name, ext = os.path.splitext(name)
finally:
os.chdir(ret)
# Remove meson-created master .so and symlinks
os.unlink(master)
name, ext = os.path.splitext(master)
while ext != '.so':
if os.path.lexists(name):
os.unlink(name)
name, ext = os.path.splitext(name)
if __name__ == '__main__':
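The first hunk above contains two ways of computing the install target for an absolute `libdir`: a branching form that checks whether `DESTDIR` is truthy, and a one-line form that passes `'/'` as the default to `os.environ.get`. The two differ when `DESTDIR` is set but empty, because the default is only used when the variable is missing. The sketch below illustrates this with hypothetical helper names, a plain dict standing in for `os.environ`, and `posixpath` so the result does not depend on the host platform:

```python
import posixpath


def target_with_default(environ, libdir):
    """One-line form: '/' is used only when DESTDIR is absent."""
    return posixpath.join(environ.get('DESTDIR', '/'), libdir[1:])


def target_with_check(environ, libdir):
    """Branching form: an unset and an empty DESTDIR are treated alike."""
    destdir = environ.get('DESTDIR')
    if destdir:
        return posixpath.join(destdir, libdir[1:])
    return libdir


libdir = '/usr/lib/dri'
unset = {}
empty = {'DESTDIR': ''}
staged = {'DESTDIR': '/tmp/stage'}

# Both forms agree when DESTDIR is unset or points at a staging directory.
same_unset = target_with_default(unset, libdir) == target_with_check(unset, libdir)
same_staged = target_with_default(staged, libdir) == target_with_check(staged, libdir)

# With an empty DESTDIR the one-line form yields a relative path.
relative = target_with_default(empty, libdir)   # 'usr/lib/dri'
absolute = target_with_check(empty, libdir)     # '/usr/lib/dri'
```

A relative result would be resolved against the installer's current working directory rather than the filesystem root, which is why the empty-string case deserves explicit handling.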

bin/meson-options.py (new executable file, 63 lines)

@@ -0,0 +1,63 @@
#!/usr/bin/env python3

from os import get_terminal_size
from textwrap import wrap
from mesonbuild import coredata
from mesonbuild import optinterpreter

(COLUMNS, _) = get_terminal_size()


def describe_option(option_name: str, option_default_value: str,
                    option_type: str, option_message: str) -> None:
    print('name: ' + option_name)
    print('default: ' + option_default_value)
    print('type: ' + option_type)
    for line in wrap(option_message, width=COLUMNS - 9):
        print(' ' + line)
    print('---')


oi = optinterpreter.OptionInterpreter('')
oi.process('meson_options.txt')

for (name, value) in oi.options.items():
    if isinstance(value, coredata.UserStringOption):
        describe_option(name,
                        value.value,
                        'string',
                        "You can type what you want, but make sure it makes sense")
    elif isinstance(value, coredata.UserBooleanOption):
        describe_option(name,
                        'true' if value.value else 'false',
                        'boolean',
                        "You can set it to 'true' or 'false'")
    elif isinstance(value, coredata.UserIntegerOption):
        describe_option(name,
                        str(value.value),
                        'integer',
                        "You can set it to any integer value between '{}' and '{}'".format(value.min_value, value.max_value))
    elif isinstance(value, coredata.UserUmaskOption):
        describe_option(name,
                        str(value.value),
                        'umask',
                        "You can set it to 'preserve' or a value between '0000' and '0777'")
    elif isinstance(value, coredata.UserComboOption):
        choices = '[' + ', '.join(["'" + v + "'" for v in value.choices]) + ']'
        describe_option(name,
                        value.value,
                        'combo',
                        "You can set it to any one of those values: " + choices)
    elif isinstance(value, coredata.UserArrayOption):
        choices = '[' + ', '.join(["'" + v + "'" for v in value.choices]) + ']'
        value = '[' + ', '.join(["'" + v + "'" for v in value.value]) + ']'
        describe_option(name,
                        value,
                        'array',
                        "You can set it to one or more of those values: " + choices)
    elif isinstance(value, coredata.UserFeatureOption):
        describe_option(name,
                        value.value,
                        'feature',
                        "You can set it to 'auto', 'enabled', or 'disabled'")
    else:
        print(name + ' is an option of a type unknown to this script')
    print('---')
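The script above sizes its wrapped help text from `os.get_terminal_size()`, which raises `OSError` when standard output is not attached to a terminal (for example when the output is piped into a pager). A minimal sketch of the same formatting step using `shutil.get_terminal_size`, which accepts a fallback size; the function name `format_option`, the `columns` parameter, and the 9-column indent (inferred from the `COLUMNS - 9` expression, since this listing does not preserve whitespace) are assumptions, not part of the script:

```python
import shutil
from textwrap import wrap

INDENT = ' ' * 9  # assumed width of the 'default: ' label column


def format_option(name, default, kind, message, columns=None):
    """Return the lines describing one option, wrapped to the terminal width."""
    if columns is None:
        # Falls back to 80 columns when no terminal size can be determined.
        columns = shutil.get_terminal_size(fallback=(80, 24)).columns
    lines = [
        'name:    ' + name,
        'default: ' + default,
        'type:    ' + kind,
    ]
    # Reserve room for the indent; textwrap requires a width of at least 1.
    width = max(columns - len(INDENT), 1)
    lines += [INDENT + part for part in wrap(message, width=width)]
    lines.append('---')
    return lines


for line in format_option('example', 'true', 'boolean',
                          "You can set it to 'true' or 'false'", columns=40):
    print(line)
```

Returning the lines instead of printing them directly keeps the formatting testable and leaves the choice of output stream to the caller.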

View File

@@ -122,7 +122,7 @@ LLVM_REQUIRED_OPENCL=3.9.0
LLVM_REQUIRED_R600=3.9.0
LLVM_REQUIRED_RADEONSI=7.0.0
LLVM_REQUIRED_RADV=7.0.0
LLVM_REQUIRED_SWR=7.0.0
LLVM_REQUIRED_SWR=6.0.0
dnl Check for progs
AC_PROG_CPP
@@ -2357,7 +2357,7 @@ if test "x$enable_xvmc" = xyes -o \
"x$enable_omx_tizonia" = xyes -o \
"x$enable_va" = xyes; then
if echo $platforms | grep -q "x11"; then
PKG_CHECK_MODULES([VL], [x11-xcb xcb xcb-dri2 >= $XCBDRI2_REQUIRED libdrm >= $LIBDRM_REQUIRED])
PKG_CHECK_MODULES([VL], [x11-xcb xcb xcb-dri2 >= $XCBDRI2_REQUIRED])
fi
need_gallium_vl_winsys=yes
fi
@@ -2845,8 +2845,8 @@ if test -n "$with_gallium_drivers"; then
fi
# XXX: Keep in sync with LLVM_REQUIRED_SWR
AM_CONDITIONAL(SWR_INVALID_LLVM_VERSION, test "x$LLVM_VERSION" != x7.0.0 -a \
"x$LLVM_VERSION" != x7.0.1)
AM_CONDITIONAL(SWR_INVALID_LLVM_VERSION, test "x$LLVM_VERSION" != x6.0.0 -a \
"x$LLVM_VERSION" != x6.0.1)
if test "x$enable_llvm" = "xyes" -a "$with_gallium_drivers"; then
llvm_require_version $LLVM_REQUIRED_GALLIUM "gallium"

View File

@@ -14,7 +14,7 @@
<iframe src="contents.html"></iframe>
<div class="content">
<h1>Bug Database</h1>
<h1>Report a bug</h1>
<p>
The Mesa bug database is hosted on

View File

@@ -49,10 +49,10 @@
<li><a href="precompiled.html" target="_parent">Precompiled Libraries</a>
</ul>
<b>Resources</b>
<b>Need help?</b>
<ul>
<li><a href="lists.html" target="_parent">Mailing Lists</a>
<li><a href="bugs.html" target="_parent">Bug Database</a>
<li><a href="bugs.html" target="_parent">Report a bug</a>
<li><a href="webmaster.html" target="_parent">Webmaster</a>
<li><a href="https://dri.freedesktop.org/" target="_parent">Mesa/DRI Wiki</a>
</ul>

View File

@@ -338,9 +338,6 @@ See src/mesa/state_tracker/st_debug.c for other options.
for details.
<li>SVGA_EXTRA_LOGGING - if set, enables extra logging to the vmware.log file,
such as the OpenGL program's name and command line arguments.
<li>SVGA_NO_LOGGING - if set, disables logging to the vmware.log file.
This is useful when using Valgrind because it otherwise crashes when
initializing the host log feature.
<li>See the driver code for other, lesser-used variables.
</ul>

View File

@@ -204,7 +204,7 @@ GL 4.4, GLSL 4.40 -- all DONE: i965/gen8+, nvc0, r600, radeonsi
- specified transform/feedback layout DONE
- input/output block locations DONE
GL_ARB_multi_bind DONE (all drivers)
GL_ARB_query_buffer_object DONE (i965/hsw+)
GL_ARB_query_buffer_object DONE (i965/hsw+, virgl)
GL_ARB_texture_mirror_clamp_to_edge DONE (i965, nv50, llvmpipe, softpipe, swr, virgl)
GL_ARB_texture_stencil8 DONE (freedreno, i965/hsw+, nv50, llvmpipe, softpipe, swr, virgl)
GL_ARB_vertex_type_10f_11f_11f_rev DONE (i965, nv50, llvmpipe, softpipe, swr, virgl)

View File

@@ -15,6 +15,18 @@
<div class="content">
<h1>News</h1>
<h2>February 18, 2019</h2>
<p>
<a href="relnotes/18.3.4.html">Mesa 18.3.4</a> is released.
This is a bug-fix release.
</p>
<h2>January 31, 2019</h2>
<p>
<a href="relnotes/18.3.3.html">Mesa 18.3.3</a> is released.
This is a bug-fix release.
</p>
<h2>January 17, 2019</h2>
<p>
<a href="relnotes/18.3.2.html">Mesa 18.3.2</a> is released.

View File

@@ -58,7 +58,9 @@ and your local settings.
<p>
Meson does not currently support listing options before configuring a build
directory, but this feature is being discussed upstream.
For now, the only way to see what options exist is to look at the
For now, we have a <code>bin/meson-options.py</code> script that prints
the options for you.
If that script doesn't work for some reason, you can always look in the
<code>meson_options.txt</code> file at the root of the project.
</p>

View File

@@ -49,19 +49,7 @@ if you'd like to nominate a patch in the next stable release.
<th>Notes</th>
</tr>
<tr>
<td rowspan="4">18.3</td>
<td>2019-01-30</td>
<td>18.3.3</td>
<td>Emil Velikov</td>
<td>
</tr>
<tr>
<td>2019-02-13</td>
<td>18.3.4</td>
<td>Emil Velikov</td>
<td>
</tr>
<tr>
<td rowspan="2">18.3</td>
<td>2019-02-27</td>
<td>18.3.5</td>
<td>Emil Velikov</td>

View File

@@ -21,6 +21,8 @@ The release notes summarize what's new or changed in each Mesa release.
</p>
<ul>
<li><a href="relnotes/18.3.4.html">18.3.4 release notes</a>
<li><a href="relnotes/18.3.3.html">18.3.3 release notes</a>
<li><a href="relnotes/18.3.2.html">18.3.2 release notes</a>
<li><a href="relnotes/18.2.8.html">18.2.8 release notes</a>
<li><a href="relnotes/18.2.7.html">18.2.7 release notes</a>

docs/relnotes/18.3.3.html (new file, 208 lines)

@@ -0,0 +1,208 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 18.3.3 Release Notes / January 31, 2019</h1>
<p>
Mesa 18.3.3 is a bug fix release which fixes bugs found since the 18.3.2 release.
</p>
<p>
Mesa 18.3.3 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation.
Compatibility contexts may report a lower version depending on each driver.
</p>
<h2>SHA256 checksums</h2>
<pre>
6b9893942fe8011c7736d51448deb6ef80ece2257e0fac27b02e997a6605d5e4 mesa-18.3.3.tar.gz
2ab6886a6966c532ccbcc3b240925e681464b658244f0cbed752615af3936299 mesa-18.3.3.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=108877">Bug 108877</a> - OpenGL CTS gl43 test cases were interrupted due to segment fault</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=109023">Bug 109023</a> - error: inlining failed in call to always_inline __m512 _mm512_and_ps(__m512, __m512): target specific option mismatch</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=109129">Bug 109129</a> - format_types.h:1220: undefined reference to `_mm256_cvtps_ph'</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=109229">Bug 109229</a> - glLinkProgram locks up for ~30 seconds</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=109242">Bug 109242</a> - [RADV] The Witcher 3 system freeze</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=109488">Bug 109488</a> - Mesa 18.3.2 crash on a specific fragment shader (assert triggered) / already fixed on the master branch.</li>
</ul>
<h2>Changes</h2>
<p>Andres Gomez (2):</p>
<ul>
<li>bin/get-pick-list.sh: fix the oneline printing</li>
<li>bin/get-pick-list.sh: fix redirection in sh</li>
</ul>
<p>Axel Davy (1):</p>
<ul>
<li>st/nine: Immediately upload user provided textures</li>
</ul>
<p>Bas Nieuwenhuizen (3):</p>
<ul>
<li>radv: Only use 32 KiB per threadgroup on Stoney.</li>
<li>radv: Set partial_vs_wave for pipelines with just GS, not tess.</li>
<li>nir: Account for atomics in copy propagation.</li>
</ul>
<p>Bruce Cherniak (1):</p>
<ul>
<li>gallium/swr: Fix multi-context sync fence deadlock.</li>
</ul>
<p>Carsten Haitzler (Rasterman) (2):</p>
<ul>
<li>vc4: Use named parameters for the NEON inline asm.</li>
<li>vc4: Declare the cpu pointers as being modified in NEON asm.</li>
</ul>
<p>Danylo Piliaiev (1):</p>
<ul>
<li>glsl: Fix copying function's out to temp if dereferenced by array</li>
</ul>
<p>Dave Airlie (3):</p>
<ul>
<li>dri_interface: add put shm image2 (v2)</li>
<li>glx: add support for putimageshm2 path (v2)</li>
<li>gallium: use put image shm2 path (v2)</li>
</ul>
<p>Dylan Baker (4):</p>
<ul>
<li>meson: allow building dri driver without window system if osmesa is classic</li>
<li>meson: fix swr KNL build</li>
<li>meson: Fix compiler checks for SWR with ICC</li>
<li>meson: Add warnings and errors when using ICC</li>
</ul>
<p>Emil Velikov (4):</p>
<ul>
<li>docs: add sha256 checksums for 18.3.2</li>
<li>cherry-ignore: radv: Fix multiview depth clears</li>
<li>cherry-ignore: spirv: Handle arbitrary bit sizes for deref array indices</li>
<li>cherry-ignore: WARNING: Commit XXX lists invalid sha</li>
</ul>
<p>Eric Anholt (2):</p>
<ul>
<li>vc4: Don't leak the GPU fd for renderonly usage.</li>
<li>vc4: Enable NEON asm on meson cross-builds.</li>
</ul>
<p>Eric Engestrom (2):</p>
<ul>
<li>configure: EGL requirements only apply if EGL is built</li>
<li>meson/vdpau: add missing soversion</li>
</ul>
<p>Iago Toral Quiroga (1):</p>
<ul>
<li>anv/device: fix maximum number of images supported</li>
</ul>
<p>Jason Ekstrand (3):</p>
<ul>
<li>anv/nir: Rework arguments to apply_pipeline_layout</li>
<li>anv: Only parse pImmutableSamplers if the descriptor has samplers</li>
<li>nir/xfb: Fix offset accounting for dvec3/4</li>
</ul>
<p>Karol Herbst (2):</p>
<ul>
<li>nv50/ir: disable tryCollapseChainedMULs in ConstantFolding for precise instructions</li>
<li>glsl/lower_output_reads: set invariant and precise flags on temporaries</li>
</ul>
<p>Lionel Landwerlin (1):</p>
<ul>
<li>anv: fix invalid binding table index computation</li>
</ul>
<p>Marek Olšák (4):</p>
<ul>
<li>radeonsi: also apply the GS hang workaround to draws without tessellation</li>
<li>radeonsi: fix a u_blitter crash after a shader with FBFETCH</li>
<li>radeonsi: fix rendering to tiny viewports where the viewport center is &gt; 8K</li>
<li>st/mesa: purge framebuffers when unbinding a context</li>
</ul>
<p>Niklas Haas (1):</p>
<ul>
<li>radv: correctly use vulkan 1.0 by default</li>
</ul>
<p>Pierre Moreau (1):</p>
<ul>
<li>meson: Fix with_gallium_icd to with_opencl_icd</li>
</ul>
<p>Rob Clark (1):</p>
<ul>
<li>loader: fix the no-modifiers case</li>
</ul>
<p>Samuel Pitoiset (1):</p>
<ul>
<li>radv: clean up setting partial_es_wave for distributed tess on VI</li>
</ul>
<p>Timothy Arceri (5):</p>
<ul>
<li>ac/nir_to_llvm: fix interpolateAt* for arrays</li>
<li>ac/nir_to_llvm: fix clamp shadow reference for more hardware</li>
<li>radv/ac: fix some fp16 handling</li>
<li>glsl: use remap location when serialising uniform program resource data</li>
<li>glsl: Copy function out to temp if we don't directly ref a variable</li>
</ul>
<p>Tomeu Vizoso (1):</p>
<ul>
<li>etnaviv: Consolidate buffer references from framebuffers</li>
</ul>
<p>Vinson Lee (1):</p>
<ul>
<li>meson: Fix typo.</li>
</ul>
</div>
</body>
</html>

docs/relnotes/18.3.4.html (new file, 180 lines)

@@ -0,0 +1,180 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 18.3.4 Release Notes / February 18, 2019</h1>
<p>
Mesa 18.3.4 is a bug fix release which fixes bugs found since the 18.3.3 release.
</p>
<p>
Mesa 18.3.4 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation.
Compatibility contexts may report a lower version depending on each driver.
</p>
<h2>SHA256 checksums</h2>
<pre>
e22e6fe4c3aca80fe872a0a7285b6c5523e0cfc0bfb57ffcc3b3d66d292593e4 mesa-18.3.4.tar.gz
32314da4365d37f80d84f599bd9625b00161c273c39600ba63b45002d500bb07 mesa-18.3.4.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=109107">Bug 109107</a> - gallium/st/va: change va max_profiles when using Radeon VCN Hardware</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=109401">Bug 109401</a> - [DXVK] Project Cars rendering problems</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=109543">Bug 109543</a> - After upgrade mesa to 19.0.0~rc1 all vulkan based application stop working [&quot;vulkan-cube&quot; received SIGSEGV in radv_pipeline_init_blend_state at ../src/amd/vulkan/radv_pipeline.c:699]</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=109603">Bug 109603</a> - nir_instr_as_deref: Assertion `parent &amp;&amp; parent-&gt;type == nir_instr_type_deref' failed.</li>
</ul>
<h2>Changes</h2>
<p>Bart Oldeman (1):</p>
<ul>
<li>gallium-xlib: query MIT-SHM before using it.</li>
</ul>
<p>Bas Nieuwenhuizen (2):</p>
<ul>
<li>radv: Only look at pImmutableSamples if the descriptor has a sampler.</li>
<li>amd/common: Use correct writemask for shared memory stores.</li>
</ul>
<p>Dylan Baker (2):</p>
<ul>
<li>get-pick-list: Add --pretty=medium to the arguments for Cc patches</li>
<li>meson: Add dependency on genxml to anvil</li>
</ul>
<p>Emil Velikov (5):</p>
<ul>
<li>docs: add sha256 checksums for 18.3.3</li>
<li>cherry-ignore: nv50,nvc0: add explicit settings for recent caps</li>
<li>cherry-ignore: add more 19.0 only nominations from Ilia</li>
<li>cherry-ignore: radv: fix using LOAD_CONTEXT_REG with old GFX ME firmwares on GFX8</li>
<li>Update version to 18.3.4</li>
</ul>
<p>Eric Anholt (1):</p>
<ul>
<li>vc4: Fix copy-and-paste fail in backport of NEON asm fixes.</li>
</ul>
<p>Eric Engestrom (2):</p>
<ul>
<li>xvmc: fix string comparison</li>
<li>xvmc: fix string comparison</li>
</ul>
<p>Ernestas Kulik (2):</p>
<ul>
<li>vc4: Fix leak in HW queries error path</li>
<li>v3d: Fix leak in resource setup error path</li>
</ul>
<p>Iago Toral Quiroga (1):</p>
<ul>
<li>intel/compiler: do not copy-propagate strided regions to ddx/ddy arguments</li>
</ul>
<p>Ilia Mirkin (1):</p>
<ul>
<li>nvc0: we have 16k-sized framebuffers, fix default scissors</li>
</ul>
<p>Jason Ekstrand (3):</p>
<ul>
<li>intel/fs: Handle IMAGE_SIZE in size_read() and is_send_from_grf()</li>
<li>intel/fs: Do the grf127 hack on SIMD8 instructions in SIMD16 mode</li>
<li>nir/deref: Rematerialize parents in rematerialize_derefs_in_use_blocks</li>
</ul>
<p>Juan A. Suarez Romero (1):</p>
<ul>
<li>anv/cmd_buffer: check for NULL framebuffer</li>
</ul>
<p>Kenneth Graunke (1):</p>
<ul>
<li>st/mesa: Limit GL_MAX_[NATIVE_]PROGRAM_PARAMETERS_ARB to 2048</li>
</ul>
<p>Kristian H. Kristensen (1):</p>
<ul>
<li>freedreno/a6xx: Emit blitter dst with OUT_RELOCW</li>
</ul>
<p>Leo Liu (2):</p>
<ul>
<li>st/va: fix the incorrect max profiles report</li>
<li>st/va/vp9: set max reference as default of VP9 reference number</li>
</ul>
<p>Marek Olšák (4):</p>
<ul>
<li>meson: drop the xcb-xrandr version requirement</li>
<li>gallium/u_threaded: fix EXPLICIT_FLUSH for flush offsets &gt; 0</li>
<li>radeonsi: fix EXPLICIT_FLUSH for flush offsets &gt; 0</li>
<li>winsys/amdgpu: don't drop manually added fence dependencies</li>
</ul>
<p>Mario Kleiner (2):</p>
<ul>
<li>egl/wayland: Allow client-&gt;server format conversion for PRIME offload. (v2)</li>
<li>egl/wayland-drm: Only announce formats via wl_drm which the driver supports.</li>
</ul>
<p>Oscar Blumberg (1):</p>
<ul>
<li>radeonsi: Fix guardband computation for large render targets</li>
</ul>
<p>Rob Clark (1):</p>
<ul>
<li>freedreno: stop frob'ing pipe_resource::nr_samples</li>
</ul>
<p>Rodrigo Vivi (1):</p>
<ul>
<li>intel: Add more PCI Device IDs for Coffee Lake and Ice Lake.</li>
</ul>
<p>Samuel Pitoiset (2):</p>
<ul>
<li>radv: fix compiler issues with GCC 9</li>
<li>radv: always export gl_SampleMask when the fragment shader uses it</li>
</ul>
</div>
</body>
</html>

File diff suppressed because it is too large

View File

@@ -1,159 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 19.0.1 Release Notes / March 27, 2019</h1>
<p>
Mesa 19.0.1 is a bug fix release which fixes bugs found since the 19.0.0 release.
</p>
<p>
Mesa 19.0.1 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation.
Compatibility contexts may report a lower version depending on each driver.
</p>
<h2>SHA256 checksums</h2>
<pre>
f1dd1980ed628edea3935eed7974fbc5d8353e9578c562728b880d63ac613dbd mesa-19.0.1.tar.gz
6884163c0ea9e4c98378ab8fecd72fe7b5f437713a14471beda378df247999d4 mesa-19.0.1.tar.xz
</pre>
<h2>New features</h2>
<p>None</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100316">Bug 100316</a> - Linking GLSL 1.30 shaders with invariant and deprecated variables triggers an 'mismatching invariant qualifiers' error</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=107563">Bug 107563</a> - [RADV] Broken rendering in Unity demos</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=109698">Bug 109698</a> - dri.pc contents invalid when built with meson</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=109980">Bug 109980</a> - [i915 CI][HSW] spec&#64;arb_fragment_shader_interlock&#64;arb_fragment_shader_interlock-image-load-store - fail</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=110211">Bug 110211</a> - If DESTDIR is set to an empty string, the dri drivers are not installed</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=110221">Bug 110221</a> - build error with meson</li>
</ul>
<h2>Changes</h2>
<p>Andres Gomez (4):</p>
<ul>
<li>glsl: correctly validate component layout qualifier for dvec{3,4}</li>
<li>glsl/linker: don't fail non static used inputs without matching outputs</li>
<li>glsl/linker: simplify xfb_offset vs xfb_stride overflow check</li>
<li>Revert "glsl: relax input-&gt;output validation for SSO programs"</li>
</ul>
<p>Bas Nieuwenhuizen (2):</p>
<ul>
<li>radv: Use correct image view comparison for fast clears.</li>
<li>ac/nir: Return frag_coord as integer.</li>
</ul>
<p>Danylo Piliaiev (2):</p>
<ul>
<li>anv: Treat zero size XFB buffer as disabled</li>
<li>glsl: Cross validate variable's invariance by explicit invariance only</li>
</ul>
<p>Dave Airlie (1):</p>
<ul>
<li>softpipe: fix texture view crashes</li>
</ul>
<p>Dylan Baker (5):</p>
<ul>
<li>docs: Add SHA256 sums for 19.0.0</li>
<li>cherry-ignore: Add commit that doesn't apply</li>
<li>bin/install_megadrivers.py: Correctly handle DESTDIR=''</li>
<li>bin/install_megadrivers.py: Fix regression for set DESTDIR</li>
<li>bump version for 19.0.1</li>
</ul>
<p>Eric Anholt (1):</p>
<ul>
<li>v3d: Fix leak of the renderonly struct on screen destruction.</li>
</ul>
<p>Jason Ekstrand (6):</p>
<ul>
<li>glsl/lower_vector_derefs: Don't use a temporary for TCS outputs</li>
<li>glsl/list: Add a list variant of insert_after</li>
<li>anv/pass: Flag the need for a RT flush for resolve attachments</li>
<li>nir/builder: Add a vector extract helper</li>
<li>nir: Add a new pass to lower array dereferences on vectors</li>
<li>intel/nir: Lower array-deref-of-vector UBO and SSBO loads</li>
</ul>
<p>Józef Kucia (2):</p>
<ul>
<li>radv: Fix driverUUID</li>
<li>mesa: Fix GL_NUM_DEVICE_UUIDS_EXT</li>
</ul>
<p>Kenneth Graunke (1):</p>
<ul>
<li>intel/fs: Fix opt_peephole_csel to not throw away saturates.</li>
</ul>
<p>Kevin Strasser (1):</p>
<ul>
<li>egl/dri: Avoid out of bounds array access</li>
</ul>
<p>Mark Janes (1):</p>
<ul>
<li>mesa: properly report the length of truncated log messages</li>
</ul>
<p>Plamena Manolova (1):</p>
<ul>
<li>i965: Disable ARB_fragment_shader_interlock for platforms prior to GEN9</li>
</ul>
<p>Samuel Pitoiset (3):</p>
<ul>
<li>radv: set the maximum number of IBs per submit to 192</li>
<li>radv: always initialize HTILE when the src layout is UNDEFINED</li>
<li>radv: fix binding transform feedback buffers</li>
</ul>
<p>Sergii Romantsov (1):</p>
<ul>
<li>d3d: meson: do not prefix user provided d3d-drivers-path</li>
</ul>
<p>Tapani Pälli (2):</p>
<ul>
<li>isl: fix automake build when sse41 is not supported</li>
<li>anv/radv: release memory allocated by glsl types during spirv_to_nir</li>
</ul>
</div>
</body>
</html>

View File

@@ -1,122 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 19.0.2 Release Notes / April 10, 2019</h1>
<p>
Mesa 19.0.2 is a bug fix release which fixes bugs found since the 19.0.1 release.
</p>
<p>
Mesa 19.0.2 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation.
Compatibility contexts may report a lower version depending on each driver.
</p>
<h2>SHA256 checksums</h2>
<pre>
eb972fc11d4e1261d34ec0b91a701f158d4870c0428fb108353ae7eab64b1118 mesa-19.0.2.tar.gz
1a2edc3ce56906a676c91e6851298db45903df1f5cb9827395a922c1452db802 mesa-19.0.2.tar.xz
</pre>
<h2>New features</h2>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=108766">Bug 108766</a> - Mesa built with meson has RPATH entries</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=109648">Bug 109648</a> - AMD Raven hang during va-api decoding</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=110257">Bug 110257</a> - Major artifacts in mpeg2 vaapi hw decoding</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=110259">Bug 110259</a> - radv: Sampling depth-stencil image in GENERAL layout returns nothing but zero (regression, bisected)</li>
</ul>
<h2>Changes</h2>
<p>Boyuan Zhang (1):</p>
<ul>
<li>st/va: reverse qt matrix back to its original order</li>
</ul>
<p>Caio Marcelo de Oliveira Filho (1):</p>
<ul>
<li>nir: Take if_uses into account when repairing SSA</li>
</ul>
<p>Dylan Baker (2):</p>
<ul>
<li>docs: Add SHA256 sums for mesa 19.0.1</li>
<li>VERSION: bump version for 19.0.2</li>
</ul>
<p>Eric Anholt (3):</p>
<ul>
<li>dri3: Return the current swap interval from glXGetSwapIntervalMESA().</li>
<li>v3d: Bump the maximum texture size to 4k for V3D 4.x.</li>
<li>v3d: Don't try to use the TFU blit path if a scissor is enabled.</li>
</ul>
<p>Eric Engestrom (1):</p>
<ul>
<li>meson: strip rpath from megadrivers</li>
</ul>
<p>Jason Ekstrand (1):</p>
<ul>
<li>Revert "anv/radv: release memory allocated by glsl types during spirv_to_nir"</li>
</ul>
<p>Karol Herbst (1):</p>
<ul>
<li>nir/print: fix printing the image_array intrinsic index</li>
</ul>
<p>Leo Liu (2):</p>
<ul>
<li>radeon/vcn: add H.264 constrained baseline support</li>
<li>radeon/vcn/vp9: search the render target from the whole list</li>
</ul>
<p>Lionel Landwerlin (1):</p>
<ul>
<li>intel: add dependency on genxml generated files</li>
</ul>
<p>Marek Olšák (1):</p>
<ul>
<li>radeonsi: fix assertion failure by using the correct type</li>
</ul>
<p>Samuel Pitoiset (2):</p>
<ul>
<li>radv: skip updating depth/color metadata for conditional rendering</li>
<li>radv: do not always initialize HTILE in compressed state</li>
</ul>
</div>
</body>
</html>

View File

@@ -1,148 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 19.0.3 Release Notes / April 24, 2019</h1>
<p>
Mesa 19.0.3 is a bug fix release which fixes bugs found since the 19.0.2 release.
</p>
<p>
Mesa 19.0.3 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation.
Compatibility contexts may report a lower version depending on each driver.
</p>
<h2>SHA256 checksums</h2>
<pre>
59543ec3c9f8c72990e77887f13d1678cb6739e5d5f56abc21ebf9e772389c5e mesa-19.0.3.tar.gz
f027244e38dc309a4c12db45ef79be81ab62c797a50a88d566e4edb6159fc4d5 mesa-19.0.3.tar.xz
</pre>
<h2>New features</h2>
<p>N/A</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=108879">Bug 108879</a> - [CIK] [regression] All opencl apps hangs indefinitely in si_create_context</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=110201">Bug 110201</a> - [ivb] mesa 19.0.0 breaks rendering in kitty</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=110356">Bug 110356</a> - install_megadrivers.py creates new dangling symlink [bisected]</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=110441">Bug 110441</a> - [llvmpipe] complex-loop-analysis-bug regression</li>
</ul>
<h2>Changes</h2>
<p>Andres Gomez (1):</p>
<ul>
<li>glsl/linker: location aliasing requires types to have the same width</li>
</ul>
<p>Bas Nieuwenhuizen (1):</p>
<ul>
<li>ac: Move has_local_buffers disable to radeonsi.</li>
</ul>
<p>Chia-I Wu (1):</p>
<ul>
<li>virgl: fix fence fd version check</li>
</ul>
<p>Danylo Piliaiev (1):</p>
<ul>
<li>intel/compiler: Do not reswizzle dst if instruction writes to flag register</li>
</ul>
<p>Dylan Baker (2):</p>
<ul>
<li>docs: Add sha256 sums for 19.0.2</li>
<li>Bump version for 19.0.3</li>
</ul>
<p>Eric Anholt (1):</p>
<ul>
<li>nir: Fix deref offset calculation for structs.</li>
</ul>
<p>Eric Engestrom (1):</p>
<ul>
<li>meson: remove meson-created megadrivers symlinks</li>
</ul>
<p>Jason Ekstrand (2):</p>
<ul>
<li>anv/pipeline: Fix MEDIA_VFE_STATE::PerThreadScratchSpace on gen7</li>
<li>anv: Add a #define for the max binding table size</li>
</ul>
<p>Juan A. Suarez Romero (1):</p>
<ul>
<li>meson: Add dependency on genxml to anvil genfiles</li>
</ul>
<p>Kenneth Graunke (2):</p>
<ul>
<li>glsl: Set location on structure-split sampler uniform variables</li>
<li>Revert "glsl: Set location on structure-split sampler uniform variables"</li>
</ul>
<p>Lionel Landwerlin (2):</p>
<ul>
<li>anv: fix uninitialized pthread cond clock domain</li>
<li>intel/devinfo: fix missing num_thread_per_eu on ICL</li>
</ul>
<p>Lubomir Rintel (2):</p>
<ul>
<li>gallivm: guess CPU features also on ARM</li>
<li>gallivm: disable NEON instructions if they are not supported</li>
</ul>
<p>Marek Olšák (1):</p>
<ul>
<li>radeonsi: use CP DMA for the null const buffer clear on CIK</li>
</ul>
<p>Rhys Perry (1):</p>
<ul>
<li>nir,ac/nir: fix cube_face_coord</li>
</ul>
<p>Roland Scheidegger (1):</p>
<ul>
<li>gallivm: fix bogus assert in get_indirect_index</li>
</ul>
<p>Samuel Pitoiset (2):</p>
<ul>
<li>ac/nir: only use the new raw/struct image atomic intrinsics with LLVM 9+</li>
<li>radv: do not load vertex attributes that are not provided by the pipeline</li>
</ul>
</div>
</body>
</html>

View File

@@ -1,243 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 19.0.4 Release Notes / May 9, 2019</h1>
<p>
Mesa 19.0.4 is a bug fix release which fixes bugs found since the 19.0.3 release.
</p>
<p>
Mesa 19.0.4 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation.
Compatibility contexts may report a lower version depending on each driver.
</p>
<h2>SHA256 checksums</h2>
<pre>
de361c76bf7aae09219f571b9ae77a34864a1cd9f6ba24c845b18b3cd5e4b9a2 mesa-19.0.4.tar.gz
39f9f32f448d77388ef817c6098d50eb0c1595815ce7e895dec09dd68774ce47 mesa-19.0.4.tar.xz
</pre>
<h2>New features</h2>
<p>N/A</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=99781">Bug 99781</a> - Some Unity games fail assertion on startup in glXCreateContextAttribsARB</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=100239">Bug 100239</a> - Incorrect rendering in CS:GO</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=108540">Bug 108540</a> - vkAcquireNextImageKHR blocks when timeout=0 in Wayland</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=110143">Bug 110143</a> - Doom 3: BFG Edition - Steam and GOG.com - white flickering screen</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=110291">Bug 110291</a> - Vega 64 GPU hang running Space Engineers</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=110355">Bug 110355</a> - radeonsi: GTK elements become invisible in some applications (GIMP, LibreOffice)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=110573">Bug 110573</a> - Mesa vulkan-radeon 19.0.3 system freeze and visual artifacts (RADV)</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=110590">Bug 110590</a> - [Regression][Bisected] GTAⅣ under wine fails with GLXBadFBConfig</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=110632">Bug 110632</a> - &quot;glx: Fix synthetic error generation in __glXSendError&quot; broke wine games on 32-bit</li>
</ul>
<h2>Changes</h2>
<p>Alejandro Piñeiro (1):</p>
<ul>
<li>docs: document MESA_GLSL=errors keyword</li>
</ul>
<p>Andrii Simiklit (1):</p>
<ul>
<li>egl: return correct error code for a case req ver &lt; 3 with forward-compatible</li>
</ul>
<p>Axel Davy (1):</p>
<ul>
<li>st/nine: Fix D3DWindowBuffer_release for old wine nine support</li>
</ul>
<p>Bas Nieuwenhuizen (1):</p>
<ul>
<li>radv: Disable VK_EXT_descriptor_indexing.</li>
</ul>
<p>Brian Paul (1):</p>
<ul>
<li>svga: add SVGA_NO_LOGGING env var (v2)</li>
</ul>
<p>Caio Marcelo de Oliveira Filho (1):</p>
<ul>
<li>spirv: Handle SpvOpDecorateId</li>
</ul>
<p>Charmaine Lee (1):</p>
<ul>
<li>svga: move host logging to winsys</li>
</ul>
<p>Chuck Atkins (1):</p>
<ul>
<li>meson: Fix missing glproto dependency for gallium-glx</li>
</ul>
<p>Daniel Stone (1):</p>
<ul>
<li>vulkan/wsi/wayland: Respect non-blocking AcquireNextImage</li>
</ul>
<p>Dave Airlie (2):</p>
<ul>
<li>r600: reset tex array override even when no view bound</li>
<li>util/bitset: fix bitset range mask calculations.</li>
</ul>
<p>Dylan Baker (7):</p>
<ul>
<li>docs: Add SHA256 sums for mesa 19.0.3</li>
<li>cherry-ignore: Add a patch that was manually backported</li>
<li>cherry-ignore: Add more backported patches</li>
<li>cherry-ignore: Add another patch</li>
<li>cherry-ignore: Add more patches</li>
<li>meson: Force the use of config-tool for llvm</li>
<li>VERSION: bump for 19.0.4 release</li>
</ul>
<p>Emil Velikov (3):</p>
<ul>
<li>vulkan/wsi: check if the display_fd given is master</li>
<li>vulkan/wsi: don't use DUMB_CLOSE for normal GEM handles</li>
<li>configure.ac: check for libdrm when using VL with X11</li>
</ul>
<p>Erik Faye-Lund (2):</p>
<ul>
<li>softpipe: setup pixel_offset for all primitive types</li>
<li>draw: flush when setting stream-out targets</li>
</ul>
<p>Francisco Jerez (2):</p>
<ul>
<li>intel/fs: Lower integer multiply correctly when destination stride equals 4.</li>
<li>intel/fs: Cap dst-aligned region stride to maximum representable hstride value.</li>
</ul>
<p>Hal Gentz (1):</p>
<ul>
<li>glx: Fix synthetic error generation in __glXSendError</li>
</ul>
<p>Ian Romanick (2):</p>
<ul>
<li>glsl: Silence may unused parameter warnings in glsl/ir.h</li>
<li>mesa: Add missing display list support for GL_FOG_COORDINATE_SOURCE</li>
</ul>
<p>Jason Ekstrand (1):</p>
<ul>
<li>anv/descriptor_set: Destroy sets before pool finalization</li>
</ul>
<p>Jon Turney (1):</p>
<ul>
<li>meson: Force '.so' extension for DRI drivers</li>
</ul>
<p>Juan A. Suarez Romero (2):</p>
<ul>
<li>spirv: add missing SPV_EXT_descriptor_indexing capabilities</li>
<li>radv: enable descriptor indexing capabilities</li>
</ul>
<p>Kenneth Graunke (6):</p>
<ul>
<li>glsl: Allow gl_nir_lower_samplers*() without a gl_shader_program</li>
<li>glsl: Don't look at sampler uniform storage for internal vars</li>
<li>i965: Ignore uniform storage for samplers or images, use binding info</li>
<li>i965: Fix BRW_MEMZONE_LOW_4G heap size.</li>
<li>i965: Force VMA alignment to be a multiple of the page size.</li>
<li>i965: leave the top 4Gb of the high heap VMA unused</li>
</ul>
<p>Lionel Landwerlin (4):</p>
<ul>
<li>anv: store heap address bounds when initializing physical device</li>
<li>anv: leave the top 4Gb of the high heap VMA unused</li>
<li>anv: fix argument name for vkCmdEndQuery</li>
<li>anv: rework queries writes to ensure ordering memory writes</li>
</ul>
<p>Marek Olšák (2):</p>
<ul>
<li>radeonsi/gfx9: set that window_rectangles always roll the context</li>
<li>radeonsi/gfx9: rework the gfx9 scissor bug workaround (v2)</li>
</ul>
<p>Nicolai Hähnle (1):</p>
<ul>
<li>radeonsi: add si_debug_options for convenient adding/removing of options</li>
</ul>
<p>Rhys Perry (1):</p>
<ul>
<li>radv: fix set_output_usage_mask() with composite and 64-bit types</li>
</ul>
<p>Ross Burton (1):</p>
<ul>
<li>Revert "meson: drop GLESv1 .so version back to 1.0.0"</li>
</ul>
<p>Samuel Pitoiset (8):</p>
<ul>
<li>radv: add missing VEGA20 chip in radv_get_device_name()</li>
<li>radv: do not need to force emit the TCS regs on Vega20</li>
<li>radv: fix color conversions for normalized uint/sint formats</li>
<li>radv: implement a workaround for VK_EXT_conditional_rendering</li>
<li>radv: set WD_SWITCH_ON_EOP=1 when drawing primitives from a stream output buffer</li>
<li>radv: only need to force emit the TCS regs on Vega10 and Raven1</li>
<li>radv: apply the indexing workaround for atomic buffer operations on GFX9</li>
<li>radv: fix setting the number of rectangles when it's dynamic</li>
</ul>
<p>Tapani Pälli (1):</p>
<ul>
<li>anv: expose VK_EXT_queue_family_foreign on Android</li>
</ul>
<p>Timothy Arceri (4):</p>
<ul>
<li>nir: fix nir_remove_unused_varyings()</li>
<li>util/drirc: add workarounds for bugs in Doom 3: BFG</li>
<li>radeonsi: add config entry for Counter-Strike Global Offensive</li>
<li>Revert "glx: Fix synthetic error generation in __glXSendError"</li>
</ul>
</div>
</body>
</html>

View File

@@ -1,137 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 19.0.5 Release Notes / May 21, 2019</h1>
<p>
Mesa 19.0.5 is a bug fix release which fixes bugs found since the 19.0.4 release.
</p>
<p>
Mesa 19.0.5 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation.
Compatibility contexts may report a lower version depending on each driver.
</p>
<h2>SHA256 checksums</h2>
<pre>
b6e6b78c23bec15d1e7887c78b7ad00ce395ea1b20ad8aab6ce441f55f724e70 mesa-19.0.5.tar.gz
6aecb7f67c136768692fb3c33a54196186c6c4fcafab7973516a355e1a54f831 mesa-19.0.5.tar.xz
</pre>
<h2>New features</h2>
<p>N/A</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=109659">Bug 109659</a> - Missing OpenGL symbols in OSMesa Gallium when building with meson</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=110134">Bug 110134</a> - SIGSEGV while playing large hevc video in mpv</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=110648">Bug 110648</a> - Dota2 will not open using vulkan since 19.0 series</li>
</ul>
<h2>Changes</h2>
<p>Caio Marcelo de Oliveira Filho (2):</p>
<ul>
<li>nir: Fix nir_opt_idiv_const when negatives are involved</li>
<li>nir: Fix clone of nir_variable state slots</li>
</ul>
<p>Charmaine Lee (2):</p>
<ul>
<li>st/mesa: purge framebuffers with current context after unbinding winsys buffers</li>
<li>mesa: unreference current winsys buffers when unbinding winsys buffers</li>
</ul>
<p>Dylan Baker (4):</p>
<ul>
<li>docs: Add SHA256 sums for mesa 19.0.4</li>
<li>cherry-ignore: add patches for panfrost</li>
<li>cherry-ignore: Add more 19.1 patches</li>
<li>bump version to 19.0.5</li>
</ul>
<p>Eric Engestrom (1):</p>
<ul>
<li>meson: expose glapi through osmesa</li>
</ul>
<p>Gert Wollny (2):</p>
<ul>
<li>softpipe/buffer: load only as many components as the buffer resource type provides</li>
<li>Revert "softpipe/buffer: load only as many components as the buffer resource type provides"</li>
</ul>
<p>Ian Romanick (1):</p>
<ul>
<li>Revert "nir: add late opt to turn inot/b2f combos back to bcsel"</li>
</ul>
<p>Jason Ekstrand (3):</p>
<ul>
<li>intel/fs/ra: Only add dest interference to sources that exist</li>
<li>intel/fs/ra: Stop adding RA interference to too many SENDS nodes</li>
<li>anv: Only consider minSampleShading when sampleShadingEnable is set</li>
</ul>
<p>Józef Kucia (1):</p>
<ul>
<li>radv: clear vertex bindings while resetting command buffer</li>
</ul>
<p>Kenneth Graunke (1):</p>
<ul>
<li>i965: Fix memory leaks in brw_upload_cs_work_groups_surface().</li>
</ul>
<p>Leo Liu (1):</p>
<ul>
<li>winsys/amdgpu: add VCN JPEG to no user fence group</li>
</ul>
<p>Lionel Landwerlin (1):</p>
<ul>
<li>anv: Use corresponding type from the vector allocation</li>
</ul>
<p>Marek Olšák (1):</p>
<ul>
<li>st/mesa: fix 2 crashes in st_tgsi_lower_yuv</li>
</ul>
<p>Nanley Chery (1):</p>
<ul>
<li>anv: Fix some depth buffer sampling cases on ICL+</li>
</ul>
<p>Samuel Pitoiset (1):</p>
<ul>
<li>radv: add a workaround for Monster Hunter World and LLVM 7&amp;8</li>
</ul>
</div>
</body>
</html>

View File

@@ -1,153 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 19.0.6 Release Notes / June 5, 2019</h1>
<p>
Mesa 19.0.6 is a bug fix release which fixes bugs found since the 19.0.5 release.
</p>
<p>
Mesa 19.0.6 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation.
Compatibility contexts may report a lower version depending on each driver.
</p>
<h2>SHA256 checksums</h2>
<pre>
ac8e9ea388ec5c69f5a690190edf8ede602afdbaeea62d49e108057737430ac7 mesa-19.0.6.tar.gz
2db2f2fcaa4048b16e066fad76b8a93944f7d06d329972b0f5fd5ce692ce3d24 mesa-19.0.6.tar.xz
</pre>
<h2>New features</h2>
<p>N/A</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=110721">Bug 110721</a> - graphics corruption on steam client with mesa 19.1.0 rc3 on polaris</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=110761">Bug 110761</a> - Huge problems between Mesa and Electron engine apps</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=110784">Bug 110784</a> - [regression][bisected] Reverting 'expose 0 shader binary formats for compat profiles for Qt' causes get_program_binary failures on Iris</li>
</ul>
<h2>Changes</h2>
<p>Alok Hota (2):</p>
<ul>
<li>gallium/swr: Param defaults for unhandled PIPE_CAPs</li>
<li>gallium/aux: add PIPE_CAP_MAX_VARYINGS to u_screen</li>
</ul>
<p>Bas Nieuwenhuizen (1):</p>
<ul>
<li>nir: Actually propagate progress in nir_opt_move_load_ubo.</li>
</ul>
<p>Chenglei Ren (1):</p>
<ul>
<li>anv/android: fix missing dependencies issue during parallel build</li>
</ul>
<p>Christian Gmeiner (1):</p>
<ul>
<li>etnaviv: use the correct uniform dirty bits</li>
</ul>
<p>Dave Airlie (1):</p>
<ul>
<li>Revert "mesa: unreference current winsys buffers when unbinding winsys buffers"</li>
</ul>
<p>Deepak Rawat (1):</p>
<ul>
<li>winsys/drm: Fix out of scope variable usage</li>
</ul>
<p>Dylan Baker (6):</p>
<ul>
<li>docs: Add Sha256 sums for 19.0.5</li>
<li>cherry-ignore: Add a commit that was manually backported</li>
<li>cherry-ignore: add another 19.1 only patch</li>
<li>cherry-ignore: add another 19.1 only patch</li>
<li>gallium: wrap u_screen in extern "C" for c++</li>
<li>VERSION: bump to 19.0.6</li>
</ul>
<p>Emil Velikov (1):</p>
<ul>
<li>egl/dri: flesh out and use dri2_create_drawable()</li>
</ul>
<p>Jan Zielinski (1):</p>
<ul>
<li>swr/rast: fix 32-bit compilation on Linux</li>
</ul>
<p>Lionel Landwerlin (1):</p>
<ul>
<li>vulkan: fix build dependency issue with generated files</li>
</ul>
<p>Marek Olšák (2):</p>
<ul>
<li>u_blitter: don't fail mipmap generation for depth formats containing stencil</li>
<li>ac: fix a typo in ac_build_wg_scan_bottom</li>
</ul>
<p>Philipp Zabel (1):</p>
<ul>
<li>etnaviv: fill missing offset in etna_resource_get_handle</li>
</ul>
<p>Rob Clark (3):</p>
<ul>
<li>freedreno/ir3: dynamic UBO indexing vs 64b pointers</li>
<li>freedreno/ir3: set more barrier bits</li>
<li>freedreno/a6xx: fix GPU crash on small render targets</li>
</ul>
<p>Sagar Ghuge (1):</p>
<ul>
<li>intel/compiler: Fix assertions in brw_alu3</li>
</ul>
<p>Samuel Pitoiset (2):</p>
<ul>
<li>radv: allocate more space in the CS when emitting events</li>
<li>radv: do not use gfx fast depth clears for layered depth/stencil images</li>
</ul>
<p>Timothy Arceri (2):</p>
<ul>
<li>Revert "st/mesa: expose 0 shader binary formats for compat profiles for Qt"</li>
<li>st/glsl: make sure to propagate initialisers to driver storage</li>
</ul>
</div>
</body>
</html>

View File

@@ -1,150 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 19.0.7 Release Notes / June 24, 2019</h1>
<p>
Mesa 19.0.7 is a bug fix release which fixes bugs found since the 19.0.6 release.
</p>
<p>
Mesa 19.0.7 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation.
Compatibility contexts may report a lower version depending on each driver.
</p>
<h2>SHA256 checksums</h2>
<pre>
81119f0cbbd1fbe7c0574e1e2690e0dae8868124d24c875f5fb76f165db3a54d mesa-19.0.7.tar.gz
d7bf3db2e442fe5eeb96144f8508d94f04aededdf37af477e644638d366b2b28 mesa-19.0.7.tar.xz
</pre>
<h2>New features</h2>
<p>N/A</p>
<h2>Bug fixes</h2>
<ul>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=110302">Bug 110302</a> - [bisected][regression] piglit egl-create-pbuffer-surface and egl-gl-colorspace regressions</li>
<li><a href="https://bugs.freedesktop.org/show_bug.cgi?id=110921">Bug 110921</a> - virgl on OpenGL 3.3 host regressed to OpenGL 2.1</li>
</ul>
<h2>Changes</h2>
<p>Bas Nieuwenhuizen (5):</p>
<ul>
<li>radv: Prevent out of bound shift on 32-bit builds.</li>
<li>radv: Decompress DCC when the image format is not allowed for buffers.</li>
<li>radv: Fix vulkan build in meson.</li>
<li>anv: Fix vulkan build in meson.</li>
<li>meson: Allow building radeonsi with just the android platform.</li>
</ul>
<p>Charmaine Lee (1):</p>
<ul>
<li>svga: Remove unnecessary check for the pre flush bit for setting vertex buffers</li>
</ul>
<p>Deepak Rawat (1):</p>
<ul>
<li>winsys/svga/drm: Fix 32-bit RPCI send message</li>
</ul>
<p>Dylan Baker (3):</p>
<ul>
<li>docs: Add SHA256 sums for 19.0.6</li>
<li>cherry-ignore: add additional 19.1 only patches</li>
<li>Bump version for 19.0.7 release</li>
</ul>
<p>Emil Velikov (1):</p>
<ul>
<li>mapi: correctly handle the full offset table</li>
</ul>
<p>Gert Wollny (2):</p>
<ul>
<li>virgl: Add a caps feature check version</li>
<li>virgl: Assume sRGB write control for older guest kernels or virglrenderer hosts</li>
</ul>
<p>Haihao Xiang (1):</p>
<ul>
<li>i965: support UYVY for external import only</li>
</ul>
<p>Jason Ekstrand (2):</p>
<ul>
<li>nir/propagate_invariant: Don't add NULL vars to the hash table</li>
<li>anv: Set STATE_BASE_ADDRESS upper bounds on gen7</li>
</ul>
<p>Kenneth Graunke (1):</p>
<ul>
<li>glsl: Fix out of bounds read in shader_cache_read_program_metadata</li>
</ul>
<p>Kevin Strasser (2):</p>
<ul>
<li>gallium/winsys/kms: Fix dumb buffer bpp</li>
<li>st/mesa: Add rgbx handling for fp formats</li>
</ul>
<p>Lionel Landwerlin (2):</p>
<ul>
<li>intel/perf: fix EuThreadsCount value in performance equations</li>
<li>intel/perf: improve dynamic loading config detection</li>
</ul>
<p>Mathias Fröhlich (1):</p>
<ul>
<li>egl: Don't add hardware device if there is no render node v2.</li>
</ul>
<p>Nanley Chery (1):</p>
<ul>
<li>anv/cmd_buffer: Initialize the clear color struct for CNL+</li>
</ul>
<p>Nataraj Deshpande (1):</p>
<ul>
<li>anv: Fix check for isl_fmt in assert</li>
</ul>
<p>Samuel Pitoiset (5):</p>
<ul>
<li>radv: fix alpha-to-coverage when there is unused color attachments</li>
<li>radv: fix setting CB_SHADER_MASK for dual source blending</li>
<li>radv: fix occlusion queries on VegaM</li>
<li>radv: fix VK_EXT_memory_budget if one heap isn't available</li>
<li>radv: fix FMASK expand with SRGB formats</li>
</ul>
</div>
</body>
</html>

View File

@@ -14,13 +14,15 @@
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 19.0.8 Release Notes / June 26, 2019</h1>
<h1>Mesa 19.1.0 Release Notes / TBD</h1>
<p>
Mesa 19.0.8 is an emergency bug fix release which fixes a critical bug found in the 19.0.7 release.
Mesa 19.1.0 is a new development release. People who are concerned
with stability and reliability should stick with a previous release or
wait for Mesa 19.1.1.
</p>
<p>
Mesa 19.0.8 implements the OpenGL 4.5 API, but the version reported by
Mesa 19.1.0 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
@@ -28,32 +30,29 @@ Some drivers don't support all the features required in OpenGL 4.5. OpenGL
Compatibility contexts may report a lower version depending on each driver.
</p>
<h2>SHA256 checksums</h2>
<pre>
TBD
TBD.
</pre>
<h2>New features</h2>
<p>N/A</p>
<ul>
<li>GL_EXT_texture_compression_s3tc_srgb on Gallium drivers and i965 (ES extension).</li>
<li>VK_EXT_buffer_device_address on Intel and RADV.</li>
</ul>
<h2>Bug fixes</h2>
<p>None</p>
<ul>
<li>TBD</li>
</ul>
<h2>Changes</h2>
<p>Dylan Baker (2):</p>
<ul>
<li>docs: Add SHA256 sums for 19.0.7</li>
<li>version: bump to 19.0.8</li>
</ul>
<p>Kenneth Graunke (1):</p>
<ul>
<li>egl/x11: calloc dri2_surf so it's properly zeroed</li>
<li>TBD</li>
</ul>
</div>

@@ -59,7 +59,6 @@ execution. These are generally used for debugging.
<li><b>nopfrag</b> - force fragment shader to be a simple shader that passes
through the color attribute.
<li><b>useprog</b> - log glUseProgram calls to stderr
<li><b>errors</b> - GLSL compilation and link errors will be reported to stderr.
</ul>
<p>
Example: export MESA_GLSL=dump,nopt

@@ -236,6 +236,11 @@ your email administrator for this.)
<li>Other tag examples: gallium, util
</ul>
</p>
<p>
Tick the following when creating the MR. It allows developers to
rebase your work on top of master.
<pre>Allow commits from members who can merge to the target branch</pre>
</p>
<p>
If you revise your patches based on code review and push an update
to your branch, you should maintain a <strong>clean</strong> history

9690
include/CL/cl2.hpp Normal file

File diff suppressed because it is too large

@@ -1,5 +1,5 @@
/**********************************************************************************
* Copyright (c) 2008-2012 The Khronos Group Inc.
* Copyright (c) 2008-2015 The Khronos Group Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and/or associated documentation files (the
@@ -12,6 +12,11 @@
* The above copyright notice and this permission notice shall be included
* in all copies or substantial portions of the Materials.
*
* MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS
* KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS
* SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT
* https://www.khronos.org/registry/
*
* THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.

@@ -1,5 +1,5 @@
/**********************************************************************************
* Copyright (c) 2008-2012 The Khronos Group Inc.
* Copyright (c) 2008-2015 The Khronos Group Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and/or associated documentation files (the
@@ -12,6 +12,11 @@
* The above copyright notice and this permission notice shall be included
* in all copies or substantial portions of the Materials.
*
* MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS
* KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS
* SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT
* https://www.khronos.org/registry/
*
* THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.

@@ -1,5 +1,5 @@
/**********************************************************************************
* Copyright (c) 2008-2012 The Khronos Group Inc.
* Copyright (c) 2008-2015 The Khronos Group Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and/or associated documentation files (the
@@ -12,6 +12,11 @@
* The above copyright notice and this permission notice shall be included
* in all copies or substantial portions of the Materials.
*
* MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS
* KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS
* SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT
* https://www.khronos.org/registry/
*
* THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
@@ -33,7 +38,7 @@
extern "C" {
#endif
/******************************************************************************
/******************************************************************************/
/* cl_khr_dx9_media_sharing */
#define cl_khr_dx9_media_sharing 1

@@ -0,0 +1,182 @@
/**********************************************************************************
* Copyright (c) 2008-2019 The Khronos Group Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and/or associated documentation files (the
* "Materials"), to deal in the Materials without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sublicense, and/or sell copies of the Materials, and to
* permit persons to whom the Materials are furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice shall be included
* in all copies or substantial portions of the Materials.
*
* MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS
* KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS
* SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT
* https://www.khronos.org/registry/
*
* THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
* IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
* CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
* MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS.
**********************************************************************************/
/*****************************************************************************\
Copyright (c) 2013-2019 Intel Corporation All Rights Reserved.
THESE MATERIALS ARE PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL INTEL OR ITS
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THESE
MATERIALS, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
File Name: cl_dx9_media_sharing_intel.h
Abstract:
Notes:
\*****************************************************************************/
#ifndef __OPENCL_CL_DX9_MEDIA_SHARING_INTEL_H
#define __OPENCL_CL_DX9_MEDIA_SHARING_INTEL_H
#include <CL/cl.h>
#include <CL/cl_platform.h>
#include <d3d9.h>
#include <dxvahd.h>
#include <wtypes.h>
#include <d3d9types.h>
#ifdef __cplusplus
extern "C" {
#endif
/***************************************
* cl_intel_dx9_media_sharing extension *
****************************************/
#define cl_intel_dx9_media_sharing 1
typedef cl_uint cl_dx9_device_source_intel;
typedef cl_uint cl_dx9_device_set_intel;
/* error codes */
#define CL_INVALID_DX9_DEVICE_INTEL -1010
#define CL_INVALID_DX9_RESOURCE_INTEL -1011
#define CL_DX9_RESOURCE_ALREADY_ACQUIRED_INTEL -1012
#define CL_DX9_RESOURCE_NOT_ACQUIRED_INTEL -1013
/* cl_dx9_device_source_intel */
#define CL_D3D9_DEVICE_INTEL 0x4022
#define CL_D3D9EX_DEVICE_INTEL 0x4070
#define CL_DXVA_DEVICE_INTEL 0x4071
/* cl_dx9_device_set_intel */
#define CL_PREFERRED_DEVICES_FOR_DX9_INTEL 0x4024
#define CL_ALL_DEVICES_FOR_DX9_INTEL 0x4025
/* cl_context_info */
#define CL_CONTEXT_D3D9_DEVICE_INTEL 0x4026
#define CL_CONTEXT_D3D9EX_DEVICE_INTEL 0x4072
#define CL_CONTEXT_DXVA_DEVICE_INTEL 0x4073
/* cl_mem_info */
#define CL_MEM_DX9_RESOURCE_INTEL 0x4027
#define CL_MEM_DX9_SHARED_HANDLE_INTEL 0x4074
/* cl_image_info */
#define CL_IMAGE_DX9_PLANE_INTEL 0x4075
/* cl_command_type */
#define CL_COMMAND_ACQUIRE_DX9_OBJECTS_INTEL 0x402A
#define CL_COMMAND_RELEASE_DX9_OBJECTS_INTEL 0x402B
/******************************************************************************/
extern CL_API_ENTRY cl_int CL_API_CALL
clGetDeviceIDsFromDX9INTEL(
cl_platform_id platform,
cl_dx9_device_source_intel dx9_device_source,
void* dx9_object,
cl_dx9_device_set_intel dx9_device_set,
cl_uint num_entries,
cl_device_id* devices,
cl_uint* num_devices) CL_EXT_SUFFIX__VERSION_1_1;
typedef CL_API_ENTRY cl_int (CL_API_CALL* clGetDeviceIDsFromDX9INTEL_fn)(
cl_platform_id platform,
cl_dx9_device_source_intel dx9_device_source,
void* dx9_object,
cl_dx9_device_set_intel dx9_device_set,
cl_uint num_entries,
cl_device_id* devices,
cl_uint* num_devices) CL_EXT_SUFFIX__VERSION_1_1;
extern CL_API_ENTRY cl_mem CL_API_CALL
clCreateFromDX9MediaSurfaceINTEL(
cl_context context,
cl_mem_flags flags,
IDirect3DSurface9* resource,
HANDLE sharedHandle,
UINT plane,
cl_int* errcode_ret) CL_EXT_SUFFIX__VERSION_1_1;
typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromDX9MediaSurfaceINTEL_fn)(
cl_context context,
cl_mem_flags flags,
IDirect3DSurface9* resource,
HANDLE sharedHandle,
UINT plane,
cl_int* errcode_ret) CL_EXT_SUFFIX__VERSION_1_1;
extern CL_API_ENTRY cl_int CL_API_CALL
clEnqueueAcquireDX9ObjectsINTEL(
cl_command_queue command_queue,
cl_uint num_objects,
const cl_mem* mem_objects,
cl_uint num_events_in_wait_list,
const cl_event* event_wait_list,
cl_event* event) CL_EXT_SUFFIX__VERSION_1_1;
typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueAcquireDX9ObjectsINTEL_fn)(
cl_command_queue command_queue,
cl_uint num_objects,
const cl_mem* mem_objects,
cl_uint num_events_in_wait_list,
const cl_event* event_wait_list,
cl_event* event) CL_EXT_SUFFIX__VERSION_1_1;
extern CL_API_ENTRY cl_int CL_API_CALL
clEnqueueReleaseDX9ObjectsINTEL(
cl_command_queue command_queue,
cl_uint num_objects,
cl_mem* mem_objects,
cl_uint num_events_in_wait_list,
const cl_event* event_wait_list,
cl_event* event) CL_EXT_SUFFIX__VERSION_1_1;
typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueReleaseDX9ObjectsINTEL_fn)(
cl_command_queue command_queue,
cl_uint num_objects,
cl_mem* mem_objects,
cl_uint num_events_in_wait_list,
const cl_event* event_wait_list,
cl_event* event) CL_EXT_SUFFIX__VERSION_1_1;
#ifdef __cplusplus
}
#endif
#endif /* __OPENCL_CL_DX9_MEDIA_SHARING_INTEL_H */

@@ -1,5 +1,5 @@
/*******************************************************************************
* Copyright (c) 2008-2010 The Khronos Group Inc.
* Copyright (c) 2008-2019 The Khronos Group Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and/or associated documentation files (the
@@ -12,6 +12,11 @@
* The above copyright notice and this permission notice shall be included
* in all copies or substantial portions of the Materials.
*
* MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS
* KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS
* SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT
* https://www.khronos.org/registry/
*
* THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
@@ -24,13 +29,7 @@
#ifndef __OPENCL_CL_EGL_H
#define __OPENCL_CL_EGL_H
#ifdef __APPLE__
#else
#include <CL/cl.h>
#include <EGL/egl.h>
#include <EGL/eglext.h>
#endif
#ifdef __cplusplus
extern "C" {
@@ -62,69 +61,69 @@ typedef intptr_t cl_egl_image_properties_khr;
#define cl_khr_egl_image 1
extern CL_API_ENTRY cl_mem CL_API_CALL
clCreateFromEGLImageKHR(cl_context /* context */,
CLeglDisplayKHR /* egldisplay */,
CLeglImageKHR /* eglimage */,
cl_mem_flags /* flags */,
const cl_egl_image_properties_khr * /* properties */,
cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0;
clCreateFromEGLImageKHR(cl_context context,
CLeglDisplayKHR egldisplay,
CLeglImageKHR eglimage,
cl_mem_flags flags,
const cl_egl_image_properties_khr * properties,
cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0;
typedef CL_API_ENTRY cl_mem (CL_API_CALL *clCreateFromEGLImageKHR_fn)(
cl_context context,
CLeglDisplayKHR egldisplay,
CLeglImageKHR eglimage,
cl_mem_flags flags,
const cl_egl_image_properties_khr * properties,
cl_int * errcode_ret);
cl_context context,
CLeglDisplayKHR egldisplay,
CLeglImageKHR eglimage,
cl_mem_flags flags,
const cl_egl_image_properties_khr * properties,
cl_int * errcode_ret);
extern CL_API_ENTRY cl_int CL_API_CALL
clEnqueueAcquireEGLObjectsKHR(cl_command_queue /* command_queue */,
cl_uint /* num_objects */,
const cl_mem * /* mem_objects */,
cl_uint /* num_events_in_wait_list */,
const cl_event * /* event_wait_list */,
cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0;
clEnqueueAcquireEGLObjectsKHR(cl_command_queue command_queue,
cl_uint num_objects,
const cl_mem * mem_objects,
cl_uint num_events_in_wait_list,
const cl_event * event_wait_list,
cl_event * event) CL_API_SUFFIX__VERSION_1_0;
typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueAcquireEGLObjectsKHR_fn)(
cl_command_queue command_queue,
cl_uint num_objects,
const cl_mem * mem_objects,
cl_uint num_events_in_wait_list,
const cl_event * event_wait_list,
cl_event * event);
cl_command_queue command_queue,
cl_uint num_objects,
const cl_mem * mem_objects,
cl_uint num_events_in_wait_list,
const cl_event * event_wait_list,
cl_event * event);
extern CL_API_ENTRY cl_int CL_API_CALL
clEnqueueReleaseEGLObjectsKHR(cl_command_queue /* command_queue */,
cl_uint /* num_objects */,
const cl_mem * /* mem_objects */,
cl_uint /* num_events_in_wait_list */,
const cl_event * /* event_wait_list */,
cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0;
clEnqueueReleaseEGLObjectsKHR(cl_command_queue command_queue,
cl_uint num_objects,
const cl_mem * mem_objects,
cl_uint num_events_in_wait_list,
const cl_event * event_wait_list,
cl_event * event) CL_API_SUFFIX__VERSION_1_0;
typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueReleaseEGLObjectsKHR_fn)(
cl_command_queue command_queue,
cl_uint num_objects,
const cl_mem * mem_objects,
cl_uint num_events_in_wait_list,
const cl_event * event_wait_list,
cl_event * event);
cl_command_queue command_queue,
cl_uint num_objects,
const cl_mem * mem_objects,
cl_uint num_events_in_wait_list,
const cl_event * event_wait_list,
cl_event * event);
#define cl_khr_egl_event 1
extern CL_API_ENTRY cl_event CL_API_CALL
clCreateEventFromEGLSyncKHR(cl_context /* context */,
CLeglSyncKHR /* sync */,
CLeglDisplayKHR /* display */,
cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0;
clCreateEventFromEGLSyncKHR(cl_context context,
CLeglSyncKHR sync,
CLeglDisplayKHR display,
cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0;
typedef CL_API_ENTRY cl_event (CL_API_CALL *clCreateEventFromEGLSyncKHR_fn)(
cl_context context,
CLeglSyncKHR sync,
CLeglDisplayKHR display,
cl_int * errcode_ret);
cl_context context,
CLeglSyncKHR sync,
CLeglDisplayKHR display,
cl_int * errcode_ret);
#ifdef __cplusplus
}

@@ -1,5 +1,5 @@
/*******************************************************************************
* Copyright (c) 2008-2013 The Khronos Group Inc.
* Copyright (c) 2008-2019 The Khronos Group Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and/or associated documentation files (the
@@ -12,6 +12,11 @@
* The above copyright notice and this permission notice shall be included
* in all copies or substantial portions of the Materials.
*
* MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS
* KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS
* SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT
* https://www.khronos.org/registry/
*
* THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
@@ -21,8 +26,6 @@
* MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS.
******************************************************************************/
/* $Revision: 11928 $ on $Date: 2010-07-13 09:04:56 -0700 (Tue, 13 Jul 2010) $ */
/* cl_ext.h contains OpenCL extensions which don't have external */
/* (OpenGL, D3D) dependencies. */
@@ -33,11 +36,13 @@
extern "C" {
#endif
#ifdef __APPLE__
#include <OpenCL/cl.h>
#include <AvailabilityMacros.h>
#else
#include <CL/cl.h>
#include <CL/cl.h>
/* cl_khr_fp64 extension - no extension #define since it has no functions */
/* CL_DEVICE_DOUBLE_FP_CONFIG is defined in CL.h for OpenCL >= 120 */
#if CL_TARGET_OPENCL_VERSION <= 110
#define CL_DEVICE_DOUBLE_FP_CONFIG 0x1032
#endif
/* cl_khr_fp16 extension - no extension #define since it has no functions */
@@ -47,12 +52,12 @@ extern "C" {
*
* Apple extension for use to manage externally allocated buffers used with cl_mem objects with CL_MEM_USE_HOST_PTR
*
* Registers a user callback function that will be called when the memory object is deleted and its resources
* freed. Each call to clSetMemObjectCallbackFn registers the specified user callback function on a callback
* stack associated with memobj. The registered user callback functions are called in the reverse order in
* which they were registered. The user callback functions are called and then the memory object is deleted
* and its resources freed. This provides a mechanism for the application (and libraries) using memobj to be
* notified when the memory referenced by host_ptr, specified when the memory object is created and used as
* Registers a user callback function that will be called when the memory object is deleted and its resources
* freed. Each call to clSetMemObjectCallbackFn registers the specified user callback function on a callback
* stack associated with memobj. The registered user callback functions are called in the reverse order in
* which they were registered. The user callback functions are called and then the memory object is deleted
* and its resources freed. This provides a mechanism for the application (and libraries) using memobj to be
* notified when the memory referenced by host_ptr, specified when the memory object is created and used as
* the storage bits for the memory object, can be reused or freed.
*
* The application may not call CL api's with the cl_mem object passed to the pfn_notify.
@@ -61,9 +66,9 @@ extern "C" {
* before using.
*/
#define cl_APPLE_SetMemObjectDestructor 1
cl_int CL_API_ENTRY clSetMemObjectDestructorAPPLE( cl_mem /* memobj */,
void (* /*pfn_notify*/)( cl_mem /* memobj */, void* /*user_data*/),
void * /*user_data */ ) CL_EXT_SUFFIX__VERSION_1_0;
cl_int CL_API_ENTRY clSetMemObjectDestructorAPPLE( cl_mem memobj,
void (* pfn_notify)(cl_mem memobj, void * user_data),
void * user_data) CL_EXT_SUFFIX__VERSION_1_0;
/* Context Logging Functions
@@ -72,29 +77,29 @@ cl_int CL_API_ENTRY clSetMemObjectDestructorAPPLE( cl_mem /* memobj */,
* Please check for the "cl_APPLE_ContextLoggingFunctions" extension using clGetDeviceInfo(CL_DEVICE_EXTENSIONS)
* before using.
*
* clLogMessagesToSystemLog fowards on all log messages to the Apple System Logger
* clLogMessagesToSystemLog forwards on all log messages to the Apple System Logger
*/
#define cl_APPLE_ContextLoggingFunctions 1
extern void CL_API_ENTRY clLogMessagesToSystemLogAPPLE( const char * /* errstr */,
const void * /* private_info */,
size_t /* cb */,
void * /* user_data */ ) CL_EXT_SUFFIX__VERSION_1_0;
extern void CL_API_ENTRY clLogMessagesToSystemLogAPPLE( const char * errstr,
const void * private_info,
size_t cb,
void * user_data) CL_EXT_SUFFIX__VERSION_1_0;
/* clLogMessagesToStdout sends all log messages to the file descriptor stdout */
extern void CL_API_ENTRY clLogMessagesToStdoutAPPLE( const char * /* errstr */,
const void * /* private_info */,
size_t /* cb */,
void * /* user_data */ ) CL_EXT_SUFFIX__VERSION_1_0;
extern void CL_API_ENTRY clLogMessagesToStdoutAPPLE( const char * errstr,
const void * private_info,
size_t cb,
void * user_data) CL_EXT_SUFFIX__VERSION_1_0;
/* clLogMessagesToStderr sends all log messages to the file descriptor stderr */
extern void CL_API_ENTRY clLogMessagesToStderrAPPLE( const char * /* errstr */,
const void * /* private_info */,
size_t /* cb */,
void * /* user_data */ ) CL_EXT_SUFFIX__VERSION_1_0;
extern void CL_API_ENTRY clLogMessagesToStderrAPPLE( const char * errstr,
const void * private_info,
size_t cb,
void * user_data) CL_EXT_SUFFIX__VERSION_1_0;
/************************
* cl_khr_icd extension *
/************************
* cl_khr_icd extension *
************************/
#define cl_khr_icd 1
@@ -105,16 +110,43 @@ extern void CL_API_ENTRY clLogMessagesToStderrAPPLE( const char * /* errstr */
#define CL_PLATFORM_NOT_FOUND_KHR -1001
extern CL_API_ENTRY cl_int CL_API_CALL
clIcdGetPlatformIDsKHR(cl_uint /* num_entries */,
cl_platform_id * /* platforms */,
cl_uint * /* num_platforms */);
clIcdGetPlatformIDsKHR(cl_uint num_entries,
cl_platform_id * platforms,
cl_uint * num_platforms);
typedef CL_API_ENTRY cl_int (CL_API_CALL *clIcdGetPlatformIDsKHR_fn)(
cl_uint /* num_entries */,
cl_platform_id * /* platforms */,
cl_uint * /* num_platforms */);
typedef CL_API_ENTRY cl_int
(CL_API_CALL *clIcdGetPlatformIDsKHR_fn)(cl_uint num_entries,
cl_platform_id * platforms,
cl_uint * num_platforms);
/*******************************
* cl_khr_il_program extension *
*******************************/
#define cl_khr_il_program 1
/* New property to clGetDeviceInfo for retrieving supported intermediate
* languages
*/
#define CL_DEVICE_IL_VERSION_KHR 0x105B
/* New property to clGetProgramInfo for retrieving for retrieving the IL of a
* program
*/
#define CL_PROGRAM_IL_KHR 0x1169
extern CL_API_ENTRY cl_program CL_API_CALL
clCreateProgramWithILKHR(cl_context context,
const void * il,
size_t length,
cl_int * errcode_ret);
typedef CL_API_ENTRY cl_program
(CL_API_CALL *clCreateProgramWithILKHR_fn)(cl_context context,
const void * il,
size_t length,
cl_int * errcode_ret) CL_EXT_SUFFIX__VERSION_1_2;
/* Extension: cl_khr_image2D_buffer
*
* This extension allows a 2D image to be created from a cl_mem buffer without a copy.
@@ -129,31 +161,33 @@ typedef CL_API_ENTRY cl_int (CL_API_CALL *clIcdGetPlatformIDsKHR_fn)(
* The pitch specified must be a multiple of CL_DEVICE_IMAGE_PITCH_ALIGNMENT pixels.
* The base address of the buffer must be aligned to CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT pixels.
*/
/*************************************
* cl_khr_initalize_memory extension *
*************************************/
#define CL_CONTEXT_MEMORY_INITIALIZE_KHR 0x200E
/**************************************
* cl_khr_initialize_memory extension *
**************************************/
#define CL_CONTEXT_MEMORY_INITIALIZE_KHR 0x2030
/**************************************
* cl_khr_terminate_context extension *
**************************************/
#define CL_DEVICE_TERMINATE_CAPABILITY_KHR 0x200F
#define CL_CONTEXT_TERMINATE_KHR 0x2010
#define CL_DEVICE_TERMINATE_CAPABILITY_KHR 0x2031
#define CL_CONTEXT_TERMINATE_KHR 0x2032
#define cl_khr_terminate_context 1
extern CL_API_ENTRY cl_int CL_API_CALL clTerminateContextKHR(cl_context /* context */) CL_EXT_SUFFIX__VERSION_1_2;
extern CL_API_ENTRY cl_int CL_API_CALL
clTerminateContextKHR(cl_context context) CL_EXT_SUFFIX__VERSION_1_2;
typedef CL_API_ENTRY cl_int
(CL_API_CALL *clTerminateContextKHR_fn)(cl_context context) CL_EXT_SUFFIX__VERSION_1_2;
typedef CL_API_ENTRY cl_int (CL_API_CALL *clTerminateContextKHR_fn)(cl_context /* context */) CL_EXT_SUFFIX__VERSION_1_2;
/*
* Extension: cl_khr_spir
*
* This extension adds support to create an OpenCL program object from a
* This extension adds support to create an OpenCL program object from a
* Standard Portable Intermediate Representation (SPIR) instance
*/
@@ -161,9 +195,30 @@ typedef CL_API_ENTRY cl_int (CL_API_CALL *clTerminateContextKHR_fn)(cl_context /
#define CL_PROGRAM_BINARY_TYPE_INTERMEDIATE 0x40E1
/*****************************************
* cl_khr_create_command_queue extension *
*****************************************/
#define cl_khr_create_command_queue 1
typedef cl_bitfield cl_queue_properties_khr;
extern CL_API_ENTRY cl_command_queue CL_API_CALL
clCreateCommandQueueWithPropertiesKHR(cl_context context,
cl_device_id device,
const cl_queue_properties_khr* properties,
cl_int* errcode_ret) CL_EXT_SUFFIX__VERSION_1_2;
typedef CL_API_ENTRY cl_command_queue
(CL_API_CALL *clCreateCommandQueueWithPropertiesKHR_fn)(cl_context context,
cl_device_id device,
const cl_queue_properties_khr* properties,
cl_int* errcode_ret) CL_EXT_SUFFIX__VERSION_1_2;
/******************************************
* cl_nv_device_attribute_query extension *
******************************************/
/* cl_nv_device_attribute_query extension - no extension #define since it has no functions */
#define CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV 0x4000
#define CL_DEVICE_COMPUTE_CAPABILITY_MINOR_NV 0x4001
@@ -173,88 +228,124 @@ typedef CL_API_ENTRY cl_int (CL_API_CALL *clTerminateContextKHR_fn)(cl_context /
#define CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV 0x4005
#define CL_DEVICE_INTEGRATED_MEMORY_NV 0x4006
/*********************************
* cl_amd_device_attribute_query *
*********************************/
#define CL_DEVICE_PROFILING_TIMER_OFFSET_AMD 0x4036
/*********************************
* cl_arm_printf extension
*********************************/
#define CL_PRINTF_CALLBACK_ARM 0x40B0
#define CL_PRINTF_BUFFERSIZE_ARM 0x40B1
#ifdef CL_VERSION_1_1
/***********************************
* cl_ext_device_fission extension *
***********************************/
#define cl_ext_device_fission 1
extern CL_API_ENTRY cl_int CL_API_CALL
clReleaseDeviceEXT( cl_device_id /*device*/ ) CL_EXT_SUFFIX__VERSION_1_1;
typedef CL_API_ENTRY cl_int
(CL_API_CALL *clReleaseDeviceEXT_fn)( cl_device_id /*device*/ ) CL_EXT_SUFFIX__VERSION_1_1;
extern CL_API_ENTRY cl_int CL_API_CALL
clRetainDeviceEXT( cl_device_id /*device*/ ) CL_EXT_SUFFIX__VERSION_1_1;
typedef CL_API_ENTRY cl_int
(CL_API_CALL *clRetainDeviceEXT_fn)( cl_device_id /*device*/ ) CL_EXT_SUFFIX__VERSION_1_1;
/***********************************
* cl_ext_device_fission extension
***********************************/
#define cl_ext_device_fission 1
typedef cl_ulong cl_device_partition_property_ext;
extern CL_API_ENTRY cl_int CL_API_CALL
clCreateSubDevicesEXT( cl_device_id /*in_device*/,
const cl_device_partition_property_ext * /* properties */,
cl_uint /*num_entries*/,
cl_device_id * /*out_devices*/,
cl_uint * /*num_devices*/ ) CL_EXT_SUFFIX__VERSION_1_1;
extern CL_API_ENTRY cl_int CL_API_CALL
clReleaseDeviceEXT(cl_device_id device) CL_EXT_SUFFIX__VERSION_1_1;
typedef CL_API_ENTRY cl_int
( CL_API_CALL * clCreateSubDevicesEXT_fn)( cl_device_id /*in_device*/,
const cl_device_partition_property_ext * /* properties */,
cl_uint /*num_entries*/,
cl_device_id * /*out_devices*/,
cl_uint * /*num_devices*/ ) CL_EXT_SUFFIX__VERSION_1_1;
typedef CL_API_ENTRY cl_int
(CL_API_CALL *clReleaseDeviceEXT_fn)(cl_device_id device) CL_EXT_SUFFIX__VERSION_1_1;
extern CL_API_ENTRY cl_int CL_API_CALL
clRetainDeviceEXT(cl_device_id device) CL_EXT_SUFFIX__VERSION_1_1;
typedef CL_API_ENTRY cl_int
(CL_API_CALL *clRetainDeviceEXT_fn)(cl_device_id device) CL_EXT_SUFFIX__VERSION_1_1;
typedef cl_ulong cl_device_partition_property_ext;
extern CL_API_ENTRY cl_int CL_API_CALL
clCreateSubDevicesEXT(cl_device_id in_device,
const cl_device_partition_property_ext * properties,
cl_uint num_entries,
cl_device_id * out_devices,
cl_uint * num_devices) CL_EXT_SUFFIX__VERSION_1_1;
typedef CL_API_ENTRY cl_int
(CL_API_CALL * clCreateSubDevicesEXT_fn)(cl_device_id in_device,
const cl_device_partition_property_ext * properties,
cl_uint num_entries,
cl_device_id * out_devices,
cl_uint * num_devices) CL_EXT_SUFFIX__VERSION_1_1;
/* cl_device_partition_property_ext */
#define CL_DEVICE_PARTITION_EQUALLY_EXT 0x4050
#define CL_DEVICE_PARTITION_BY_COUNTS_EXT 0x4051
#define CL_DEVICE_PARTITION_BY_NAMES_EXT 0x4052
#define CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN_EXT 0x4053
/* clDeviceGetInfo selectors */
#define CL_DEVICE_PARENT_DEVICE_EXT 0x4054
#define CL_DEVICE_PARTITION_TYPES_EXT 0x4055
#define CL_DEVICE_AFFINITY_DOMAINS_EXT 0x4056
#define CL_DEVICE_REFERENCE_COUNT_EXT 0x4057
#define CL_DEVICE_PARTITION_STYLE_EXT 0x4058
/* error codes */
#define CL_DEVICE_PARTITION_FAILED_EXT -1057
#define CL_INVALID_PARTITION_COUNT_EXT -1058
#define CL_INVALID_PARTITION_NAME_EXT -1059
/* CL_AFFINITY_DOMAINs */
#define CL_AFFINITY_DOMAIN_L1_CACHE_EXT 0x1
#define CL_AFFINITY_DOMAIN_L2_CACHE_EXT 0x2
#define CL_AFFINITY_DOMAIN_L3_CACHE_EXT 0x3
#define CL_AFFINITY_DOMAIN_L4_CACHE_EXT 0x4
#define CL_AFFINITY_DOMAIN_NUMA_EXT 0x10
#define CL_AFFINITY_DOMAIN_NEXT_FISSIONABLE_EXT 0x100
/* cl_device_partition_property_ext list terminators */
#define CL_PROPERTIES_LIST_END_EXT ((cl_device_partition_property_ext) 0)
#define CL_PARTITION_BY_COUNTS_LIST_END_EXT ((cl_device_partition_property_ext) 0)
#define CL_PARTITION_BY_NAMES_LIST_END_EXT ((cl_device_partition_property_ext) 0 - 1)
/***********************************
* cl_ext_migrate_memobject extension definitions
***********************************/
#define cl_ext_migrate_memobject 1
typedef cl_bitfield cl_mem_migration_flags_ext;
#define CL_MIGRATE_MEM_OBJECT_HOST_EXT 0x1
#define CL_COMMAND_MIGRATE_MEM_OBJECT_EXT 0x4040
extern CL_API_ENTRY cl_int CL_API_CALL
clEnqueueMigrateMemObjectEXT(cl_command_queue command_queue,
cl_uint num_mem_objects,
const cl_mem * mem_objects,
cl_mem_migration_flags_ext flags,
cl_uint num_events_in_wait_list,
const cl_event * event_wait_list,
cl_event * event);
typedef CL_API_ENTRY cl_int
(CL_API_CALL *clEnqueueMigrateMemObjectEXT_fn)(cl_command_queue command_queue,
cl_uint num_mem_objects,
const cl_mem * mem_objects,
cl_mem_migration_flags_ext flags,
cl_uint num_events_in_wait_list,
const cl_event * event_wait_list,
cl_event * event);
/* cl_device_partition_property_ext */
#define CL_DEVICE_PARTITION_EQUALLY_EXT 0x4050
#define CL_DEVICE_PARTITION_BY_COUNTS_EXT 0x4051
#define CL_DEVICE_PARTITION_BY_NAMES_EXT 0x4052
#define CL_DEVICE_PARTITION_BY_AFFINITY_DOMAIN_EXT 0x4053
/* clDeviceGetInfo selectors */
#define CL_DEVICE_PARENT_DEVICE_EXT 0x4054
#define CL_DEVICE_PARTITION_TYPES_EXT 0x4055
#define CL_DEVICE_AFFINITY_DOMAINS_EXT 0x4056
#define CL_DEVICE_REFERENCE_COUNT_EXT 0x4057
#define CL_DEVICE_PARTITION_STYLE_EXT 0x4058
/* error codes */
#define CL_DEVICE_PARTITION_FAILED_EXT -1057
#define CL_INVALID_PARTITION_COUNT_EXT -1058
#define CL_INVALID_PARTITION_NAME_EXT -1059
/* CL_AFFINITY_DOMAINs */
#define CL_AFFINITY_DOMAIN_L1_CACHE_EXT 0x1
#define CL_AFFINITY_DOMAIN_L2_CACHE_EXT 0x2
#define CL_AFFINITY_DOMAIN_L3_CACHE_EXT 0x3
#define CL_AFFINITY_DOMAIN_L4_CACHE_EXT 0x4
#define CL_AFFINITY_DOMAIN_NUMA_EXT 0x10
#define CL_AFFINITY_DOMAIN_NEXT_FISSIONABLE_EXT 0x100
/* cl_device_partition_property_ext list terminators */
#define CL_PROPERTIES_LIST_END_EXT ((cl_device_partition_property_ext) 0)
#define CL_PARTITION_BY_COUNTS_LIST_END_EXT ((cl_device_partition_property_ext) 0)
#define CL_PARTITION_BY_NAMES_LIST_END_EXT ((cl_device_partition_property_ext) 0 - 1)
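The two terminator values above serve different lists. As a rough sketch, a partition-by-names property list could be laid out and walked as shown below. The helper `count_partition_names` and the array `by_names` are hypothetical names introduced only for illustration, and the 64-bit unsigned stand-in for `cl_device_partition_property_ext` is an assumption, since that typedef is not part of this excerpt. The sketch also assumes that compute unit names start at 0, which would explain why the name list cannot be terminated with 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins copied from the header so the sketch builds without OpenCL.
   The 64-bit unsigned property type is an assumption of this sketch. */
#ifndef CL_DEVICE_PARTITION_BY_NAMES_EXT
typedef uint64_t cl_device_partition_property_ext;
#define CL_DEVICE_PARTITION_BY_NAMES_EXT 0x4052
#define CL_PROPERTIES_LIST_END_EXT ((cl_device_partition_property_ext) 0)
#define CL_PARTITION_BY_NAMES_LIST_END_EXT ((cl_device_partition_property_ext) 0 - 1)
#endif

/* Hypothetical helper: count the compute unit names that follow
   CL_DEVICE_PARTITION_BY_NAMES_EXT in a property list. */
static size_t
count_partition_names(const cl_device_partition_property_ext *props)
{
    size_t n = 0;

    assert(props != NULL);
    if (props[0] != CL_DEVICE_PARTITION_BY_NAMES_EXT)
        return 0;
    /* The name list ends at the all-ones terminator, not at 0. */
    while (props[1 + n] != CL_PARTITION_BY_NAMES_LIST_END_EXT)
        n++;
    return n;
}

/* Hypothetical list selecting compute units 0 and 2. */
static const cl_device_partition_property_ext by_names[] = {
    CL_DEVICE_PARTITION_BY_NAMES_EXT, 0, 2,
    CL_PARTITION_BY_NAMES_LIST_END_EXT,
    CL_PROPERTIES_LIST_END_EXT
};
```

Because the cast binds tighter than the subtraction, `(cl_device_partition_property_ext) 0 - 1` wraps around to the largest value of the unsigned type.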
/*********************************
* cl_qcom_ext_host_ptr extension
*********************************/
#define cl_qcom_ext_host_ptr 1
#define CL_MEM_EXT_HOST_PTR_QCOM (1 << 29)
#define CL_DEVICE_EXT_MEM_PADDING_IN_BYTES_QCOM 0x40A0
#define CL_DEVICE_PAGE_SIZE_QCOM 0x40A1
#define CL_IMAGE_ROW_ALIGNMENT_QCOM 0x40A2
#define CL_IMAGE_SLICE_ALIGNMENT_QCOM 0x40A3
@@ -280,12 +371,21 @@ typedef struct _cl_mem_ext_host_ptr
/* Type of external memory allocation. */
/* Legal values will be defined in layered extensions. */
cl_uint allocation_type;
/* Host cache policy for this external memory allocation. */
cl_uint host_cache_policy;
} cl_mem_ext_host_ptr;
/*******************************************
* cl_qcom_ext_host_ptr_iocoherent extension
********************************************/
/* Cache policy specifying io-coherence */
#define CL_MEM_HOST_IOCOHERENT_QCOM 0x40A9
/*********************************
* cl_qcom_ion_host_ptr extension
*********************************/
@@ -300,13 +400,339 @@ typedef struct _cl_mem_ion_host_ptr
/* ION file descriptor */
int ion_filedesc;
/* Host pointer to the ION allocated memory */
void* ion_hostptr;
} cl_mem_ion_host_ptr;
#endif /* CL_VERSION_1_1 */
/*********************************
* cl_qcom_android_native_buffer_host_ptr extension
*********************************/
#define CL_MEM_ANDROID_NATIVE_BUFFER_HOST_PTR_QCOM 0x40C6
typedef struct _cl_mem_android_native_buffer_host_ptr
{
/* Type of external memory allocation. */
/* Must be CL_MEM_ANDROID_NATIVE_BUFFER_HOST_PTR_QCOM for Android native buffers. */
cl_mem_ext_host_ptr ext_host_ptr;
/* Virtual pointer to the android native buffer */
void* anb_ptr;
} cl_mem_android_native_buffer_host_ptr;
/******************************************
* cl_img_yuv_image extension *
******************************************/
/* Image formats used in clCreateImage */
#define CL_NV21_IMG 0x40D0
#define CL_YV12_IMG 0x40D1
/******************************************
* cl_img_cached_allocations extension *
******************************************/
/* Flag values used by clCreateBuffer */
#define CL_MEM_USE_UNCACHED_CPU_MEMORY_IMG (1 << 26)
#define CL_MEM_USE_CACHED_CPU_MEMORY_IMG (1 << 27)
/******************************************
* cl_img_use_gralloc_ptr extension *
******************************************/
#define cl_img_use_gralloc_ptr 1
/* Flag values used by clCreateBuffer */
#define CL_MEM_USE_GRALLOC_PTR_IMG (1 << 28)
/* To be used by clGetEventInfo: */
#define CL_COMMAND_ACQUIRE_GRALLOC_OBJECTS_IMG 0x40D2
#define CL_COMMAND_RELEASE_GRALLOC_OBJECTS_IMG 0x40D3
/* Error code from clEnqueueReleaseGrallocObjectsIMG */
#define CL_GRALLOC_RESOURCE_NOT_ACQUIRED_IMG 0x40D4
extern CL_API_ENTRY cl_int CL_API_CALL
clEnqueueAcquireGrallocObjectsIMG(cl_command_queue command_queue,
cl_uint num_objects,
const cl_mem * mem_objects,
cl_uint num_events_in_wait_list,
const cl_event * event_wait_list,
cl_event * event) CL_EXT_SUFFIX__VERSION_1_2;
extern CL_API_ENTRY cl_int CL_API_CALL
clEnqueueReleaseGrallocObjectsIMG(cl_command_queue command_queue,
cl_uint num_objects,
const cl_mem * mem_objects,
cl_uint num_events_in_wait_list,
const cl_event * event_wait_list,
cl_event * event) CL_EXT_SUFFIX__VERSION_1_2;
/*********************************
* cl_khr_subgroups extension
*********************************/
#define cl_khr_subgroups 1
#if !defined(CL_VERSION_2_1)
/* For OpenCL 2.1 and newer, cl_kernel_sub_group_info is declared in CL.h.
In hindsight, there should have been a khr suffix on this type for
the extension, but keeping it un-suffixed to maintain backwards
compatibility. */
typedef cl_uint cl_kernel_sub_group_info;
#endif
/* cl_kernel_sub_group_info */
#define CL_KERNEL_MAX_SUB_GROUP_SIZE_FOR_NDRANGE_KHR 0x2033
#define CL_KERNEL_SUB_GROUP_COUNT_FOR_NDRANGE_KHR 0x2034
extern CL_API_ENTRY cl_int CL_API_CALL
clGetKernelSubGroupInfoKHR(cl_kernel in_kernel,
cl_device_id in_device,
cl_kernel_sub_group_info param_name,
size_t input_value_size,
const void * input_value,
size_t param_value_size,
void * param_value,
size_t * param_value_size_ret) CL_EXT_SUFFIX__VERSION_2_0_DEPRECATED;
typedef CL_API_ENTRY cl_int
(CL_API_CALL * clGetKernelSubGroupInfoKHR_fn)(cl_kernel in_kernel,
cl_device_id in_device,
cl_kernel_sub_group_info param_name,
size_t input_value_size,
const void * input_value,
size_t param_value_size,
void * param_value,
size_t * param_value_size_ret) CL_EXT_SUFFIX__VERSION_2_0_DEPRECATED;
/*********************************
* cl_khr_mipmap_image extension
*********************************/
/* cl_sampler_properties */
#define CL_SAMPLER_MIP_FILTER_MODE_KHR 0x1155
#define CL_SAMPLER_LOD_MIN_KHR 0x1156
#define CL_SAMPLER_LOD_MAX_KHR 0x1157
/*********************************
* cl_khr_priority_hints extension
*********************************/
/* This extension define is for backwards compatibility.
It shouldn't be required since this extension has no new functions. */
#define cl_khr_priority_hints 1
typedef cl_uint cl_queue_priority_khr;
/* cl_command_queue_properties */
#define CL_QUEUE_PRIORITY_KHR 0x1096
/* cl_queue_priority_khr */
#define CL_QUEUE_PRIORITY_HIGH_KHR (1<<0)
#define CL_QUEUE_PRIORITY_MED_KHR (1<<1)
#define CL_QUEUE_PRIORITY_LOW_KHR (1<<2)
/*********************************
* cl_khr_throttle_hints extension
*********************************/
/* This extension define is for backwards compatibility.
It shouldn't be required since this extension has no new functions. */
#define cl_khr_throttle_hints 1
typedef cl_uint cl_queue_throttle_khr;
/* cl_command_queue_properties */
#define CL_QUEUE_THROTTLE_KHR 0x1097
/* cl_queue_throttle_khr */
#define CL_QUEUE_THROTTLE_HIGH_KHR (1<<0)
#define CL_QUEUE_THROTTLE_MED_KHR (1<<1)
#define CL_QUEUE_THROTTLE_LOW_KHR (1<<2)
/*********************************
* cl_khr_subgroup_named_barrier
*********************************/
/* This extension define is for backwards compatibility.
It shouldn't be required since this extension has no new functions. */
#define cl_khr_subgroup_named_barrier 1
/* cl_device_info */
#define CL_DEVICE_MAX_NAMED_BARRIER_COUNT_KHR 0x2035
/**********************************
* cl_arm_import_memory extension *
**********************************/
#define cl_arm_import_memory 1
typedef intptr_t cl_import_properties_arm;
/* Default and valid properties name for cl_arm_import_memory */
#define CL_IMPORT_TYPE_ARM 0x40B2
/* Host process memory type default value for CL_IMPORT_TYPE_ARM property */
#define CL_IMPORT_TYPE_HOST_ARM 0x40B3
/* DMA BUF memory type value for CL_IMPORT_TYPE_ARM property */
#define CL_IMPORT_TYPE_DMA_BUF_ARM 0x40B4
/* Protected DMA BUF memory type value for CL_IMPORT_TYPE_ARM property */
#define CL_IMPORT_TYPE_PROTECTED_ARM 0x40B5
/* This extension adds a new function that allows for direct memory import into
* OpenCL via the clImportMemoryARM function.
*
* Memory imported through this interface will be mapped into the device's page
* tables directly, providing zero copy access. It will never fall back to copy
* operations and aliased buffers.
*
* Types of memory supported for import are specified as additional extension
* strings.
*
* This extension produces cl_mem allocations which are compatible with all other
* users of cl_mem in the standard API.
*
* This extension maps pages with the same properties as the normal buffer creation
* function clCreateBuffer.
*/
extern CL_API_ENTRY cl_mem CL_API_CALL
clImportMemoryARM( cl_context context,
cl_mem_flags flags,
const cl_import_properties_arm *properties,
void *memory,
size_t size,
cl_int *errcode_ret) CL_EXT_SUFFIX__VERSION_1_0;
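A minimal sketch of how the `properties` argument might be assembled, assuming it is a list of name/value pairs terminated by a single 0 (the excerpt above does not state the layout, so treat this as an assumption). The helper `import_property_or` and the array `dma_buf_props` are hypothetical names used only for illustration. The constants are copied from the definitions above so the snippet builds without the OpenCL headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the header above so the sketch builds without it. */
#ifndef CL_IMPORT_TYPE_ARM
typedef intptr_t cl_import_properties_arm;
#define CL_IMPORT_TYPE_ARM         0x40B2
#define CL_IMPORT_TYPE_HOST_ARM    0x40B3
#define CL_IMPORT_TYPE_DMA_BUF_ARM 0x40B4
#endif

/* Hypothetical helper: return the value paired with `name`, or `fallback`
   when the list is NULL or does not mention `name`. */
static cl_import_properties_arm
import_property_or(const cl_import_properties_arm *props,
                   cl_import_properties_arm name,
                   cl_import_properties_arm fallback)
{
    size_t i;

    assert(name != 0); /* 0 is the assumed terminator, never a name */
    if (props == NULL)
        return fallback;
    for (i = 0; props[i] != 0; i += 2) {
        if (props[i] == name)
            return props[i + 1];
    }
    return fallback;
}

/* Hypothetical list requesting a dma_buf import. */
static const cl_import_properties_arm dma_buf_props[] = {
    CL_IMPORT_TYPE_ARM, CL_IMPORT_TYPE_DMA_BUF_ARM,
    0
};
```

Falling back to `CL_IMPORT_TYPE_HOST_ARM` mirrors the header comment that describes host process memory as the default value of the `CL_IMPORT_TYPE_ARM` property.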
/******************************************
* cl_arm_shared_virtual_memory extension *
******************************************/
#define cl_arm_shared_virtual_memory 1
/* Used by clGetDeviceInfo */
#define CL_DEVICE_SVM_CAPABILITIES_ARM 0x40B6
/* Used by clGetMemObjectInfo */
#define CL_MEM_USES_SVM_POINTER_ARM 0x40B7
/* Used by clSetKernelExecInfoARM: */
#define CL_KERNEL_EXEC_INFO_SVM_PTRS_ARM 0x40B8
#define CL_KERNEL_EXEC_INFO_SVM_FINE_GRAIN_SYSTEM_ARM 0x40B9
/* To be used by clGetEventInfo: */
#define CL_COMMAND_SVM_FREE_ARM 0x40BA
#define CL_COMMAND_SVM_MEMCPY_ARM 0x40BB
#define CL_COMMAND_SVM_MEMFILL_ARM 0x40BC
#define CL_COMMAND_SVM_MAP_ARM 0x40BD
#define CL_COMMAND_SVM_UNMAP_ARM 0x40BE
/* Flag values returned by clGetDeviceInfo with CL_DEVICE_SVM_CAPABILITIES_ARM as the param_name. */
#define CL_DEVICE_SVM_COARSE_GRAIN_BUFFER_ARM (1 << 0)
#define CL_DEVICE_SVM_FINE_GRAIN_BUFFER_ARM (1 << 1)
#define CL_DEVICE_SVM_FINE_GRAIN_SYSTEM_ARM (1 << 2)
#define CL_DEVICE_SVM_ATOMICS_ARM (1 << 3)
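The capability flags above are single bits, so a value returned for `CL_DEVICE_SVM_CAPABILITIES_ARM` can be tested with a mask. The sketch below shows one way to do that; `svm_supports` is a hypothetical helper, and the 64-bit stand-in for the bitfield type is an assumption made so the snippet builds without the OpenCL headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins copied from the header; the 64-bit width is an assumption. */
#ifndef CL_DEVICE_SVM_COARSE_GRAIN_BUFFER_ARM
typedef uint64_t cl_device_svm_capabilities_arm;
#define CL_DEVICE_SVM_COARSE_GRAIN_BUFFER_ARM (1 << 0)
#define CL_DEVICE_SVM_FINE_GRAIN_BUFFER_ARM   (1 << 1)
#define CL_DEVICE_SVM_FINE_GRAIN_SYSTEM_ARM   (1 << 2)
#define CL_DEVICE_SVM_ATOMICS_ARM             (1 << 3)
#endif

/* Hypothetical helper: true only if every bit in `required` is set. */
static bool
svm_supports(cl_device_svm_capabilities_arm caps,
             cl_device_svm_capabilities_arm required)
{
    assert(required != 0); /* an empty request is treated as a caller bug */
    return (caps & required) == required;
}
```

Comparing against `required` rather than testing for a nonzero result matters when several flags are requested at once, since a partial match would otherwise pass.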
/* Flag values used by clSVMAllocARM: */
#define CL_MEM_SVM_FINE_GRAIN_BUFFER_ARM (1 << 10)
#define CL_MEM_SVM_ATOMICS_ARM (1 << 11)
typedef cl_bitfield cl_svm_mem_flags_arm;
typedef cl_uint cl_kernel_exec_info_arm;
typedef cl_bitfield cl_device_svm_capabilities_arm;
extern CL_API_ENTRY void * CL_API_CALL
clSVMAllocARM(cl_context context,
cl_svm_mem_flags_arm flags,
size_t size,
cl_uint alignment) CL_EXT_SUFFIX__VERSION_1_2;
extern CL_API_ENTRY void CL_API_CALL
clSVMFreeARM(cl_context context,
void * svm_pointer) CL_EXT_SUFFIX__VERSION_1_2;
extern CL_API_ENTRY cl_int CL_API_CALL
clEnqueueSVMFreeARM(cl_command_queue command_queue,
cl_uint num_svm_pointers,
void * svm_pointers[],
void (CL_CALLBACK * pfn_free_func)(cl_command_queue queue,
cl_uint num_svm_pointers,
void * svm_pointers[],
void * user_data),
void * user_data,
cl_uint num_events_in_wait_list,
const cl_event * event_wait_list,
cl_event * event) CL_EXT_SUFFIX__VERSION_1_2;
extern CL_API_ENTRY cl_int CL_API_CALL
clEnqueueSVMMemcpyARM(cl_command_queue command_queue,
cl_bool blocking_copy,
void * dst_ptr,
const void * src_ptr,
size_t size,
cl_uint num_events_in_wait_list,
const cl_event * event_wait_list,
cl_event * event) CL_EXT_SUFFIX__VERSION_1_2;
extern CL_API_ENTRY cl_int CL_API_CALL
clEnqueueSVMMemFillARM(cl_command_queue command_queue,
void * svm_ptr,
const void * pattern,
size_t pattern_size,
size_t size,
cl_uint num_events_in_wait_list,
const cl_event * event_wait_list,
cl_event * event) CL_EXT_SUFFIX__VERSION_1_2;
extern CL_API_ENTRY cl_int CL_API_CALL
clEnqueueSVMMapARM(cl_command_queue command_queue,
cl_bool blocking_map,
cl_map_flags flags,
void * svm_ptr,
size_t size,
cl_uint num_events_in_wait_list,
const cl_event * event_wait_list,
cl_event * event) CL_EXT_SUFFIX__VERSION_1_2;
extern CL_API_ENTRY cl_int CL_API_CALL
clEnqueueSVMUnmapARM(cl_command_queue command_queue,
void * svm_ptr,
cl_uint num_events_in_wait_list,
const cl_event * event_wait_list,
cl_event * event) CL_EXT_SUFFIX__VERSION_1_2;
extern CL_API_ENTRY cl_int CL_API_CALL
clSetKernelArgSVMPointerARM(cl_kernel kernel,
cl_uint arg_index,
const void * arg_value) CL_EXT_SUFFIX__VERSION_1_2;
extern CL_API_ENTRY cl_int CL_API_CALL
clSetKernelExecInfoARM(cl_kernel kernel,
cl_kernel_exec_info_arm param_name,
size_t param_value_size,
const void * param_value) CL_EXT_SUFFIX__VERSION_1_2;
/********************************
* cl_arm_get_core_id extension *
********************************/
#ifdef CL_VERSION_1_2
#define cl_arm_get_core_id 1
/* Device info property for bitfield of cores present */
#define CL_DEVICE_COMPUTE_UNITS_BITFIELD_ARM 0x40BF
#endif /* CL_VERSION_1_2 */
#ifdef __cplusplus
}

423
include/CL/cl_ext_intel.h Normal file

@@ -0,0 +1,423 @@
/*******************************************************************************
* Copyright (c) 2008-2019 The Khronos Group Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and/or associated documentation files (the
* "Materials"), to deal in the Materials without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sublicense, and/or sell copies of the Materials, and to
* permit persons to whom the Materials are furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice shall be included
* in all copies or substantial portions of the Materials.
*
* MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS
* KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS
* SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT
* https://www.khronos.org/registry/
*
* THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
* IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
* CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
* MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS.
******************************************************************************/
/*****************************************************************************\
Copyright (c) 2013-2019 Intel Corporation All Rights Reserved.
THESE MATERIALS ARE PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL INTEL OR ITS
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THESE
MATERIALS, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
File Name: cl_ext_intel.h
Abstract:
Notes:
\*****************************************************************************/
#ifndef __CL_EXT_INTEL_H
#define __CL_EXT_INTEL_H
#include <CL/cl.h>
#include <CL/cl_platform.h>
#ifdef __cplusplus
extern "C" {
#endif
/***************************************
* cl_intel_thread_local_exec extension *
****************************************/
#define cl_intel_thread_local_exec 1
#define CL_QUEUE_THREAD_LOCAL_EXEC_ENABLE_INTEL (((cl_bitfield)1) << 31)
/***********************************************
* cl_intel_device_partition_by_names extension *
************************************************/
#define cl_intel_device_partition_by_names 1
#define CL_DEVICE_PARTITION_BY_NAMES_INTEL 0x4052
#define CL_PARTITION_BY_NAMES_LIST_END_INTEL -1
/************************************************
* cl_intel_accelerator extension *
* cl_intel_motion_estimation extension *
* cl_intel_advanced_motion_estimation extension *
*************************************************/
#define cl_intel_accelerator 1
#define cl_intel_motion_estimation 1
#define cl_intel_advanced_motion_estimation 1
typedef struct _cl_accelerator_intel* cl_accelerator_intel;
typedef cl_uint cl_accelerator_type_intel;
typedef cl_uint cl_accelerator_info_intel;
typedef struct _cl_motion_estimation_desc_intel {
cl_uint mb_block_type;
cl_uint subpixel_mode;
cl_uint sad_adjust_mode;
cl_uint search_path_type;
} cl_motion_estimation_desc_intel;
/* error codes */
#define CL_INVALID_ACCELERATOR_INTEL -1094
#define CL_INVALID_ACCELERATOR_TYPE_INTEL -1095
#define CL_INVALID_ACCELERATOR_DESCRIPTOR_INTEL -1096
#define CL_ACCELERATOR_TYPE_NOT_SUPPORTED_INTEL -1097
/* cl_accelerator_type_intel */
#define CL_ACCELERATOR_TYPE_MOTION_ESTIMATION_INTEL 0x0
/* cl_accelerator_info_intel */
#define CL_ACCELERATOR_DESCRIPTOR_INTEL 0x4090
#define CL_ACCELERATOR_REFERENCE_COUNT_INTEL 0x4091
#define CL_ACCELERATOR_CONTEXT_INTEL 0x4092
#define CL_ACCELERATOR_TYPE_INTEL 0x4093
/* cl_motion_estimation_desc_intel flags */
#define CL_ME_MB_TYPE_16x16_INTEL 0x0
#define CL_ME_MB_TYPE_8x8_INTEL 0x1
#define CL_ME_MB_TYPE_4x4_INTEL 0x2
#define CL_ME_SUBPIXEL_MODE_INTEGER_INTEL 0x0
#define CL_ME_SUBPIXEL_MODE_HPEL_INTEL 0x1
#define CL_ME_SUBPIXEL_MODE_QPEL_INTEL 0x2
#define CL_ME_SAD_ADJUST_MODE_NONE_INTEL 0x0
#define CL_ME_SAD_ADJUST_MODE_HAAR_INTEL 0x1
#define CL_ME_SEARCH_PATH_RADIUS_2_2_INTEL 0x0
#define CL_ME_SEARCH_PATH_RADIUS_4_4_INTEL 0x1
#define CL_ME_SEARCH_PATH_RADIUS_16_12_INTEL 0x5
#define CL_ME_SKIP_BLOCK_TYPE_16x16_INTEL 0x0
#define CL_ME_CHROMA_INTRA_PREDICT_ENABLED_INTEL 0x1
#define CL_ME_LUMA_INTRA_PREDICT_ENABLED_INTEL 0x2
#define CL_ME_SKIP_BLOCK_TYPE_8x8_INTEL 0x4
#define CL_ME_FORWARD_INPUT_MODE_INTEL 0x1
#define CL_ME_BACKWARD_INPUT_MODE_INTEL 0x2
#define CL_ME_BIDIRECTION_INPUT_MODE_INTEL 0x3
#define CL_ME_BIDIR_WEIGHT_QUARTER_INTEL 16
#define CL_ME_BIDIR_WEIGHT_THIRD_INTEL 21
#define CL_ME_BIDIR_WEIGHT_HALF_INTEL 32
#define CL_ME_BIDIR_WEIGHT_TWO_THIRD_INTEL 43
#define CL_ME_BIDIR_WEIGHT_THREE_QUARTER_INTEL 48
#define CL_ME_COST_PENALTY_NONE_INTEL 0x0
#define CL_ME_COST_PENALTY_LOW_INTEL 0x1
#define CL_ME_COST_PENALTY_NORMAL_INTEL 0x2
#define CL_ME_COST_PENALTY_HIGH_INTEL 0x3
#define CL_ME_COST_PRECISION_QPEL_INTEL 0x0
#define CL_ME_COST_PRECISION_HPEL_INTEL 0x1
#define CL_ME_COST_PRECISION_PEL_INTEL 0x2
#define CL_ME_COST_PRECISION_DPEL_INTEL 0x3
#define CL_ME_LUMA_PREDICTOR_MODE_VERTICAL_INTEL 0x0
#define CL_ME_LUMA_PREDICTOR_MODE_HORIZONTAL_INTEL 0x1
#define CL_ME_LUMA_PREDICTOR_MODE_DC_INTEL 0x2
#define CL_ME_LUMA_PREDICTOR_MODE_DIAGONAL_DOWN_LEFT_INTEL 0x3
#define CL_ME_LUMA_PREDICTOR_MODE_DIAGONAL_DOWN_RIGHT_INTEL 0x4
#define CL_ME_LUMA_PREDICTOR_MODE_PLANE_INTEL 0x4
#define CL_ME_LUMA_PREDICTOR_MODE_VERTICAL_RIGHT_INTEL 0x5
#define CL_ME_LUMA_PREDICTOR_MODE_HORIZONTAL_DOWN_INTEL 0x6
#define CL_ME_LUMA_PREDICTOR_MODE_VERTICAL_LEFT_INTEL 0x7
#define CL_ME_LUMA_PREDICTOR_MODE_HORIZONTAL_UP_INTEL 0x8
#define CL_ME_CHROMA_PREDICTOR_MODE_DC_INTEL 0x0
#define CL_ME_CHROMA_PREDICTOR_MODE_HORIZONTAL_INTEL 0x1
#define CL_ME_CHROMA_PREDICTOR_MODE_VERTICAL_INTEL 0x2
#define CL_ME_CHROMA_PREDICTOR_MODE_PLANE_INTEL 0x3
/* cl_device_info */
#define CL_DEVICE_ME_VERSION_INTEL 0x407E
#define CL_ME_VERSION_LEGACY_INTEL 0x0
#define CL_ME_VERSION_ADVANCED_VER_1_INTEL 0x1
#define CL_ME_VERSION_ADVANCED_VER_2_INTEL 0x2
extern CL_API_ENTRY cl_accelerator_intel CL_API_CALL
clCreateAcceleratorINTEL(
cl_context context,
cl_accelerator_type_intel accelerator_type,
size_t descriptor_size,
const void* descriptor,
cl_int* errcode_ret) CL_EXT_SUFFIX__VERSION_1_2;
typedef CL_API_ENTRY cl_accelerator_intel (CL_API_CALL *clCreateAcceleratorINTEL_fn)(
cl_context context,
cl_accelerator_type_intel accelerator_type,
size_t descriptor_size,
const void* descriptor,
cl_int* errcode_ret) CL_EXT_SUFFIX__VERSION_1_2;
extern CL_API_ENTRY cl_int CL_API_CALL
clGetAcceleratorInfoINTEL(
cl_accelerator_intel accelerator,
cl_accelerator_info_intel param_name,
size_t param_value_size,
void* param_value,
size_t* param_value_size_ret) CL_EXT_SUFFIX__VERSION_1_2;
typedef CL_API_ENTRY cl_int (CL_API_CALL *clGetAcceleratorInfoINTEL_fn)(
cl_accelerator_intel accelerator,
cl_accelerator_info_intel param_name,
size_t param_value_size,
void* param_value,
size_t* param_value_size_ret) CL_EXT_SUFFIX__VERSION_1_2;
extern CL_API_ENTRY cl_int CL_API_CALL
clRetainAcceleratorINTEL(
cl_accelerator_intel accelerator) CL_EXT_SUFFIX__VERSION_1_2;
typedef CL_API_ENTRY cl_int (CL_API_CALL *clRetainAcceleratorINTEL_fn)(
cl_accelerator_intel accelerator) CL_EXT_SUFFIX__VERSION_1_2;
extern CL_API_ENTRY cl_int CL_API_CALL
clReleaseAcceleratorINTEL(
cl_accelerator_intel accelerator) CL_EXT_SUFFIX__VERSION_1_2;
typedef CL_API_ENTRY cl_int (CL_API_CALL *clReleaseAcceleratorINTEL_fn)(
cl_accelerator_intel accelerator) CL_EXT_SUFFIX__VERSION_1_2;
/******************************************
* cl_intel_simultaneous_sharing extension *
*******************************************/
#define cl_intel_simultaneous_sharing 1
#define CL_DEVICE_SIMULTANEOUS_INTEROPS_INTEL 0x4104
#define CL_DEVICE_NUM_SIMULTANEOUS_INTEROPS_INTEL 0x4105
/***********************************
* cl_intel_egl_image_yuv extension *
************************************/
#define cl_intel_egl_image_yuv 1
#define CL_EGL_YUV_PLANE_INTEL 0x4107
/********************************
* cl_intel_packed_yuv extension *
*********************************/
#define cl_intel_packed_yuv 1
#define CL_YUYV_INTEL 0x4076
#define CL_UYVY_INTEL 0x4077
#define CL_YVYU_INTEL 0x4078
#define CL_VYUY_INTEL 0x4079
/********************************************
* cl_intel_required_subgroup_size extension *
*********************************************/
#define cl_intel_required_subgroup_size 1
#define CL_DEVICE_SUB_GROUP_SIZES_INTEL 0x4108
#define CL_KERNEL_SPILL_MEM_SIZE_INTEL 0x4109
#define CL_KERNEL_COMPILE_SUB_GROUP_SIZE_INTEL 0x410A
/****************************************
* cl_intel_driver_diagnostics extension *
*****************************************/
#define cl_intel_driver_diagnostics 1
typedef cl_uint cl_diagnostics_verbose_level;
#define CL_CONTEXT_SHOW_DIAGNOSTICS_INTEL 0x4106
#define CL_CONTEXT_DIAGNOSTICS_LEVEL_ALL_INTEL ( 0xff )
#define CL_CONTEXT_DIAGNOSTICS_LEVEL_GOOD_INTEL ( 1 )
#define CL_CONTEXT_DIAGNOSTICS_LEVEL_BAD_INTEL ( 1 << 1 )
#define CL_CONTEXT_DIAGNOSTICS_LEVEL_NEUTRAL_INTEL ( 1 << 2 )
/********************************
* cl_intel_planar_yuv extension *
*********************************/
#define CL_NV12_INTEL 0x410E
#define CL_MEM_NO_ACCESS_INTEL ( 1 << 24 )
#define CL_MEM_ACCESS_FLAGS_UNRESTRICTED_INTEL ( 1 << 25 )
#define CL_DEVICE_PLANAR_YUV_MAX_WIDTH_INTEL 0x417E
#define CL_DEVICE_PLANAR_YUV_MAX_HEIGHT_INTEL 0x417F
/*******************************************************
* cl_intel_device_side_avc_motion_estimation extension *
********************************************************/
#define CL_DEVICE_AVC_ME_VERSION_INTEL 0x410B
#define CL_DEVICE_AVC_ME_SUPPORTS_TEXTURE_SAMPLER_USE_INTEL 0x410C
#define CL_DEVICE_AVC_ME_SUPPORTS_PREEMPTION_INTEL 0x410D
#define CL_AVC_ME_VERSION_0_INTEL 0x0 /* No support. */
#define CL_AVC_ME_VERSION_1_INTEL 0x1 /* First supported version. */
#define CL_AVC_ME_MAJOR_16x16_INTEL 0x0
#define CL_AVC_ME_MAJOR_16x8_INTEL 0x1
#define CL_AVC_ME_MAJOR_8x16_INTEL 0x2
#define CL_AVC_ME_MAJOR_8x8_INTEL 0x3
#define CL_AVC_ME_MINOR_8x8_INTEL 0x0
#define CL_AVC_ME_MINOR_8x4_INTEL 0x1
#define CL_AVC_ME_MINOR_4x8_INTEL 0x2
#define CL_AVC_ME_MINOR_4x4_INTEL 0x3
#define CL_AVC_ME_MAJOR_FORWARD_INTEL 0x0
#define CL_AVC_ME_MAJOR_BACKWARD_INTEL 0x1
#define CL_AVC_ME_MAJOR_BIDIRECTIONAL_INTEL 0x2
#define CL_AVC_ME_PARTITION_MASK_ALL_INTEL 0x0
#define CL_AVC_ME_PARTITION_MASK_16x16_INTEL 0x7E
#define CL_AVC_ME_PARTITION_MASK_16x8_INTEL 0x7D
#define CL_AVC_ME_PARTITION_MASK_8x16_INTEL 0x7B
#define CL_AVC_ME_PARTITION_MASK_8x8_INTEL 0x77
#define CL_AVC_ME_PARTITION_MASK_8x4_INTEL 0x6F
#define CL_AVC_ME_PARTITION_MASK_4x8_INTEL 0x5F
#define CL_AVC_ME_PARTITION_MASK_4x4_INTEL 0x3F
#define CL_AVC_ME_SEARCH_WINDOW_EXHAUSTIVE_INTEL 0x0
#define CL_AVC_ME_SEARCH_WINDOW_SMALL_INTEL 0x1
#define CL_AVC_ME_SEARCH_WINDOW_TINY_INTEL 0x2
#define CL_AVC_ME_SEARCH_WINDOW_EXTRA_TINY_INTEL 0x3
#define CL_AVC_ME_SEARCH_WINDOW_DIAMOND_INTEL 0x4
#define CL_AVC_ME_SEARCH_WINDOW_LARGE_DIAMOND_INTEL 0x5
#define CL_AVC_ME_SEARCH_WINDOW_RESERVED0_INTEL 0x6
#define CL_AVC_ME_SEARCH_WINDOW_RESERVED1_INTEL 0x7
#define CL_AVC_ME_SEARCH_WINDOW_CUSTOM_INTEL 0x8
#define CL_AVC_ME_SEARCH_WINDOW_16x12_RADIUS_INTEL 0x9
#define CL_AVC_ME_SEARCH_WINDOW_4x4_RADIUS_INTEL 0x2
#define CL_AVC_ME_SEARCH_WINDOW_2x2_RADIUS_INTEL 0xa
#define CL_AVC_ME_SAD_ADJUST_MODE_NONE_INTEL 0x0
#define CL_AVC_ME_SAD_ADJUST_MODE_HAAR_INTEL 0x2
#define CL_AVC_ME_SUBPIXEL_MODE_INTEGER_INTEL 0x0
#define CL_AVC_ME_SUBPIXEL_MODE_HPEL_INTEL 0x1
#define CL_AVC_ME_SUBPIXEL_MODE_QPEL_INTEL 0x3
#define CL_AVC_ME_COST_PRECISION_QPEL_INTEL 0x0
#define CL_AVC_ME_COST_PRECISION_HPEL_INTEL 0x1
#define CL_AVC_ME_COST_PRECISION_PEL_INTEL 0x2
#define CL_AVC_ME_COST_PRECISION_DPEL_INTEL 0x3
#define CL_AVC_ME_BIDIR_WEIGHT_QUARTER_INTEL 0x10
#define CL_AVC_ME_BIDIR_WEIGHT_THIRD_INTEL 0x15
#define CL_AVC_ME_BIDIR_WEIGHT_HALF_INTEL 0x20
#define CL_AVC_ME_BIDIR_WEIGHT_TWO_THIRD_INTEL 0x2B
#define CL_AVC_ME_BIDIR_WEIGHT_THREE_QUARTER_INTEL 0x30
#define CL_AVC_ME_BORDER_REACHED_LEFT_INTEL 0x0
#define CL_AVC_ME_BORDER_REACHED_RIGHT_INTEL 0x2
#define CL_AVC_ME_BORDER_REACHED_TOP_INTEL 0x4
#define CL_AVC_ME_BORDER_REACHED_BOTTOM_INTEL 0x8
#define CL_AVC_ME_SKIP_BLOCK_PARTITION_16x16_INTEL 0x0
#define CL_AVC_ME_SKIP_BLOCK_PARTITION_8x8_INTEL 0x4000
#define CL_AVC_ME_SKIP_BLOCK_16x16_FORWARD_ENABLE_INTEL ( 0x1 << 24 )
#define CL_AVC_ME_SKIP_BLOCK_16x16_BACKWARD_ENABLE_INTEL ( 0x2 << 24 )
#define CL_AVC_ME_SKIP_BLOCK_16x16_DUAL_ENABLE_INTEL ( 0x3 << 24 )
#define CL_AVC_ME_SKIP_BLOCK_8x8_FORWARD_ENABLE_INTEL ( 0x55 << 24 )
#define CL_AVC_ME_SKIP_BLOCK_8x8_BACKWARD_ENABLE_INTEL ( 0xAA << 24 )
#define CL_AVC_ME_SKIP_BLOCK_8x8_DUAL_ENABLE_INTEL ( 0xFF << 24 )
#define CL_AVC_ME_SKIP_BLOCK_8x8_0_FORWARD_ENABLE_INTEL ( 0x1 << 24 )
#define CL_AVC_ME_SKIP_BLOCK_8x8_0_BACKWARD_ENABLE_INTEL ( 0x2 << 24 )
#define CL_AVC_ME_SKIP_BLOCK_8x8_1_FORWARD_ENABLE_INTEL ( 0x1 << 26 )
#define CL_AVC_ME_SKIP_BLOCK_8x8_1_BACKWARD_ENABLE_INTEL ( 0x2 << 26 )
#define CL_AVC_ME_SKIP_BLOCK_8x8_2_FORWARD_ENABLE_INTEL ( 0x1 << 28 )
#define CL_AVC_ME_SKIP_BLOCK_8x8_2_BACKWARD_ENABLE_INTEL ( 0x2 << 28 )
#define CL_AVC_ME_SKIP_BLOCK_8x8_3_FORWARD_ENABLE_INTEL ( 0x1 << 30 )
#define CL_AVC_ME_SKIP_BLOCK_8x8_3_BACKWARD_ENABLE_INTEL ( 0x2 << 30 )
#define CL_AVC_ME_BLOCK_BASED_SKIP_4x4_INTEL 0x00
#define CL_AVC_ME_BLOCK_BASED_SKIP_8x8_INTEL 0x80
#define CL_AVC_ME_INTRA_16x16_INTEL 0x0
#define CL_AVC_ME_INTRA_8x8_INTEL 0x1
#define CL_AVC_ME_INTRA_4x4_INTEL 0x2
#define CL_AVC_ME_INTRA_LUMA_PARTITION_MASK_16x16_INTEL 0x6
#define CL_AVC_ME_INTRA_LUMA_PARTITION_MASK_8x8_INTEL 0x5
#define CL_AVC_ME_INTRA_LUMA_PARTITION_MASK_4x4_INTEL 0x3
#define CL_AVC_ME_INTRA_NEIGHBOR_LEFT_MASK_ENABLE_INTEL 0x60
#define CL_AVC_ME_INTRA_NEIGHBOR_UPPER_MASK_ENABLE_INTEL 0x10
#define CL_AVC_ME_INTRA_NEIGHBOR_UPPER_RIGHT_MASK_ENABLE_INTEL 0x8
#define CL_AVC_ME_INTRA_NEIGHBOR_UPPER_LEFT_MASK_ENABLE_INTEL 0x4
#define CL_AVC_ME_LUMA_PREDICTOR_MODE_VERTICAL_INTEL 0x0
#define CL_AVC_ME_LUMA_PREDICTOR_MODE_HORIZONTAL_INTEL 0x1
#define CL_AVC_ME_LUMA_PREDICTOR_MODE_DC_INTEL 0x2
#define CL_AVC_ME_LUMA_PREDICTOR_MODE_DIAGONAL_DOWN_LEFT_INTEL 0x3
#define CL_AVC_ME_LUMA_PREDICTOR_MODE_DIAGONAL_DOWN_RIGHT_INTEL 0x4
#define CL_AVC_ME_LUMA_PREDICTOR_MODE_PLANE_INTEL 0x4
#define CL_AVC_ME_LUMA_PREDICTOR_MODE_VERTICAL_RIGHT_INTEL 0x5
#define CL_AVC_ME_LUMA_PREDICTOR_MODE_HORIZONTAL_DOWN_INTEL 0x6
#define CL_AVC_ME_LUMA_PREDICTOR_MODE_VERTICAL_LEFT_INTEL 0x7
#define CL_AVC_ME_LUMA_PREDICTOR_MODE_HORIZONTAL_UP_INTEL 0x8
#define CL_AVC_ME_CHROMA_PREDICTOR_MODE_DC_INTEL 0x0
#define CL_AVC_ME_CHROMA_PREDICTOR_MODE_HORIZONTAL_INTEL 0x1
#define CL_AVC_ME_CHROMA_PREDICTOR_MODE_VERTICAL_INTEL 0x2
#define CL_AVC_ME_CHROMA_PREDICTOR_MODE_PLANE_INTEL 0x3
#define CL_AVC_ME_FRAME_FORWARD_INTEL 0x1
#define CL_AVC_ME_FRAME_BACKWARD_INTEL 0x2
#define CL_AVC_ME_FRAME_DUAL_INTEL 0x3
#define CL_AVC_ME_SLICE_TYPE_PRED_INTEL 0x0
#define CL_AVC_ME_SLICE_TYPE_BPRED_INTEL 0x1
#define CL_AVC_ME_SLICE_TYPE_INTRA_INTEL 0x2
#define CL_AVC_ME_INTERLACED_SCAN_TOP_FIELD_INTEL 0x0
#define CL_AVC_ME_INTERLACED_SCAN_BOTTOM_FIELD_INTEL 0x1
#ifdef __cplusplus
}
#endif
#endif /* __CL_EXT_INTEL_H */


@@ -1,5 +1,5 @@
/**********************************************************************************
* Copyright (c) 2008 - 2012 The Khronos Group Inc.
* Copyright (c) 2008-2019 The Khronos Group Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and/or associated documentation files (the
@@ -12,6 +12,11 @@
* The above copyright notice and this permission notice shall be included
* in all copies or substantial portions of the Materials.
*
* MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS
* KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS
* SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT
* https://www.khronos.org/registry/
*
* THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
@@ -24,11 +29,7 @@
#ifndef __OPENCL_CL_GL_H
#define __OPENCL_CL_GL_H
#ifdef __APPLE__
#include <OpenCL/cl.h>
#else
#include <CL/cl.h>
#endif
#ifdef __cplusplus
extern "C" {
@@ -44,110 +45,118 @@ typedef struct __GLsync *cl_GLsync;
#define CL_GL_OBJECT_TEXTURE2D 0x2001
#define CL_GL_OBJECT_TEXTURE3D 0x2002
#define CL_GL_OBJECT_RENDERBUFFER 0x2003
#ifdef CL_VERSION_1_2
#define CL_GL_OBJECT_TEXTURE2D_ARRAY 0x200E
#define CL_GL_OBJECT_TEXTURE1D 0x200F
#define CL_GL_OBJECT_TEXTURE1D_ARRAY 0x2010
#define CL_GL_OBJECT_TEXTURE_BUFFER 0x2011
#endif
/* cl_gl_texture_info */
#define CL_GL_TEXTURE_TARGET 0x2004
#define CL_GL_MIPMAP_LEVEL 0x2005
#ifdef CL_VERSION_1_2
#define CL_GL_NUM_SAMPLES 0x2012
#endif
extern CL_API_ENTRY cl_mem CL_API_CALL
clCreateFromGLBuffer(cl_context /* context */,
cl_mem_flags /* flags */,
cl_GLuint /* bufobj */,
int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0;
clCreateFromGLBuffer(cl_context context,
cl_mem_flags flags,
cl_GLuint bufobj,
cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0;
#ifdef CL_VERSION_1_2
extern CL_API_ENTRY cl_mem CL_API_CALL
clCreateFromGLTexture(cl_context /* context */,
cl_mem_flags /* flags */,
cl_GLenum /* target */,
cl_GLint /* miplevel */,
cl_GLuint /* texture */,
cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_2;
clCreateFromGLTexture(cl_context context,
cl_mem_flags flags,
cl_GLenum target,
cl_GLint miplevel,
cl_GLuint texture,
cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_2;
#endif
extern CL_API_ENTRY cl_mem CL_API_CALL
clCreateFromGLRenderbuffer(cl_context /* context */,
cl_mem_flags /* flags */,
cl_GLuint /* renderbuffer */,
cl_int * /* errcode_ret */) CL_API_SUFFIX__VERSION_1_0;
clCreateFromGLRenderbuffer(cl_context context,
cl_mem_flags flags,
cl_GLuint renderbuffer,
cl_int * errcode_ret) CL_API_SUFFIX__VERSION_1_0;
extern CL_API_ENTRY cl_int CL_API_CALL
clGetGLObjectInfo(cl_mem /* memobj */,
cl_gl_object_type * /* gl_object_type */,
cl_GLuint * /* gl_object_name */) CL_API_SUFFIX__VERSION_1_0;
extern CL_API_ENTRY cl_int CL_API_CALL
clGetGLTextureInfo(cl_mem /* memobj */,
cl_gl_texture_info /* param_name */,
size_t /* param_value_size */,
void * /* param_value */,
size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0;
clGetGLObjectInfo(cl_mem memobj,
cl_gl_object_type * gl_object_type,
cl_GLuint * gl_object_name) CL_API_SUFFIX__VERSION_1_0;
extern CL_API_ENTRY cl_int CL_API_CALL
clEnqueueAcquireGLObjects(cl_command_queue /* command_queue */,
cl_uint /* num_objects */,
const cl_mem * /* mem_objects */,
cl_uint /* num_events_in_wait_list */,
const cl_event * /* event_wait_list */,
cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0;
clGetGLTextureInfo(cl_mem memobj,
cl_gl_texture_info param_name,
size_t param_value_size,
void * param_value,
size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0;
extern CL_API_ENTRY cl_int CL_API_CALL
clEnqueueReleaseGLObjects(cl_command_queue /* command_queue */,
cl_uint /* num_objects */,
const cl_mem * /* mem_objects */,
cl_uint /* num_events_in_wait_list */,
const cl_event * /* event_wait_list */,
cl_event * /* event */) CL_API_SUFFIX__VERSION_1_0;
clEnqueueAcquireGLObjects(cl_command_queue command_queue,
cl_uint num_objects,
const cl_mem * mem_objects,
cl_uint num_events_in_wait_list,
const cl_event * event_wait_list,
cl_event * event) CL_API_SUFFIX__VERSION_1_0;
extern CL_API_ENTRY cl_int CL_API_CALL
clEnqueueReleaseGLObjects(cl_command_queue command_queue,
cl_uint num_objects,
const cl_mem * mem_objects,
cl_uint num_events_in_wait_list,
const cl_event * event_wait_list,
cl_event * event) CL_API_SUFFIX__VERSION_1_0;
/* Deprecated OpenCL 1.1 APIs */
extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_mem CL_API_CALL
clCreateFromGLTexture2D(cl_context /* context */,
cl_mem_flags /* flags */,
cl_GLenum /* target */,
cl_GLint /* miplevel */,
cl_GLuint /* texture */,
cl_int * /* errcode_ret */) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;
clCreateFromGLTexture2D(cl_context context,
cl_mem_flags flags,
cl_GLenum target,
cl_GLint miplevel,
cl_GLuint texture,
cl_int * errcode_ret) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;
extern CL_API_ENTRY CL_EXT_PREFIX__VERSION_1_1_DEPRECATED cl_mem CL_API_CALL
clCreateFromGLTexture3D(cl_context /* context */,
cl_mem_flags /* flags */,
cl_GLenum /* target */,
cl_GLint /* miplevel */,
cl_GLuint /* texture */,
cl_int * /* errcode_ret */) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;
clCreateFromGLTexture3D(cl_context context,
cl_mem_flags flags,
cl_GLenum target,
cl_GLint miplevel,
cl_GLuint texture,
cl_int * errcode_ret) CL_EXT_SUFFIX__VERSION_1_1_DEPRECATED;
/* cl_khr_gl_sharing extension */
#define cl_khr_gl_sharing 1
typedef cl_uint cl_gl_context_info;
/* Additional Error Codes */
#define CL_INVALID_GL_SHAREGROUP_REFERENCE_KHR -1000
/* cl_gl_context_info */
#define CL_CURRENT_DEVICE_FOR_GL_CONTEXT_KHR 0x2006
#define CL_DEVICES_FOR_GL_CONTEXT_KHR 0x2007
/* Additional cl_context_properties */
#define CL_GL_CONTEXT_KHR 0x2008
#define CL_EGL_DISPLAY_KHR 0x2009
#define CL_GLX_DISPLAY_KHR 0x200A
#define CL_WGL_HDC_KHR 0x200B
#define CL_CGL_SHAREGROUP_KHR 0x200C
extern CL_API_ENTRY cl_int CL_API_CALL
clGetGLContextInfoKHR(const cl_context_properties * /* properties */,
cl_gl_context_info /* param_name */,
size_t /* param_value_size */,
void * /* param_value */,
size_t * /* param_value_size_ret */) CL_API_SUFFIX__VERSION_1_0;
clGetGLContextInfoKHR(const cl_context_properties * properties,
cl_gl_context_info param_name,
size_t param_value_size,
void * param_value,
size_t * param_value_size_ret) CL_API_SUFFIX__VERSION_1_0;
typedef CL_API_ENTRY cl_int (CL_API_CALL *clGetGLContextInfoKHR_fn)(
const cl_context_properties * properties,
cl_gl_context_info param_name,


@@ -1,5 +1,5 @@
/**********************************************************************************
* Copyright (c) 2008-2012 The Khronos Group Inc.
* Copyright (c) 2008-2019 The Khronos Group Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and/or associated documentation files (the
@@ -12,6 +12,11 @@
* The above copyright notice and this permission notice shall be included
* in all copies or substantial portions of the Materials.
*
* MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS
* KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS
* SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT
* https://www.khronos.org/registry/
*
* THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
@@ -21,11 +26,6 @@
* MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS.
**********************************************************************************/
/* $Revision: 11708 $ on $Date: 2010-06-13 23:36:24 -0700 (Sun, 13 Jun 2010) $ */
/* cl_gl_ext.h contains vendor (non-KHR) OpenCL extensions which have */
/* OpenGL dependencies. */
#ifndef __OPENCL_CL_GL_EXT_H
#define __OPENCL_CL_GL_EXT_H
@@ -33,34 +33,17 @@
extern "C" {
#endif
#ifdef __APPLE__
#include <OpenCL/cl_gl.h>
#else
#include <CL/cl_gl.h>
#endif
/*
* For each extension, follow this template
* cl_VEN_extname extension */
/* #define cl_VEN_extname 1
* ... define new types, if any
* ... define new tokens, if any
* ... define new APIs, if any
*
* If you need GLtypes here, mirror them with a cl_GLtype, rather than including a GL header
* This allows us to avoid having to decide whether to include GL headers or GLES here.
*/
#include <CL/cl_gl.h>
/*
* cl_khr_gl_event extension
* See section 9.9 in the OpenCL 1.1 spec for more information
* cl_khr_gl_event extension
*/
#define CL_COMMAND_GL_FENCE_SYNC_OBJECT_KHR 0x200D
extern CL_API_ENTRY cl_event CL_API_CALL
clCreateEventFromGLsyncKHR(cl_context /* context */,
cl_GLsync /* cl_GLsync */,
cl_int * /* errcode_ret */) CL_EXT_SUFFIX__VERSION_1_1;
clCreateEventFromGLsyncKHR(cl_context context,
cl_GLsync cl_GLsync,
cl_int * errcode_ret) CL_EXT_SUFFIX__VERSION_1_1;
#ifdef __cplusplus
}

File diff suppressed because it is too large


@@ -0,0 +1,172 @@
/**********************************************************************************
* Copyright (c) 2008-2019 The Khronos Group Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and/or associated documentation files (the
* "Materials"), to deal in the Materials without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sublicense, and/or sell copies of the Materials, and to
* permit persons to whom the Materials are furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice shall be included
* in all copies or substantial portions of the Materials.
*
* MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS
* KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS
* SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT
* https://www.khronos.org/registry/
*
* THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
* IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
* CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
* MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS.
**********************************************************************************/
/*****************************************************************************\
Copyright (c) 2013-2019 Intel Corporation All Rights Reserved.
THESE MATERIALS ARE PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL INTEL OR ITS
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THESE
MATERIALS, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
File Name: cl_va_api_media_sharing_intel.h
Abstract:
Notes:
\*****************************************************************************/
#ifndef __OPENCL_CL_VA_API_MEDIA_SHARING_INTEL_H
#define __OPENCL_CL_VA_API_MEDIA_SHARING_INTEL_H
#include <CL/cl.h>
#include <CL/cl_platform.h>
#include <va/va.h>
#ifdef __cplusplus
extern "C" {
#endif
/******************************************
* cl_intel_va_api_media_sharing extension *
*******************************************/
#define cl_intel_va_api_media_sharing 1
/* error codes */
#define CL_INVALID_VA_API_MEDIA_ADAPTER_INTEL -1098
#define CL_INVALID_VA_API_MEDIA_SURFACE_INTEL -1099
#define CL_VA_API_MEDIA_SURFACE_ALREADY_ACQUIRED_INTEL -1100
#define CL_VA_API_MEDIA_SURFACE_NOT_ACQUIRED_INTEL -1101
/* cl_va_api_device_source_intel */
#define CL_VA_API_DISPLAY_INTEL 0x4094
/* cl_va_api_device_set_intel */
#define CL_PREFERRED_DEVICES_FOR_VA_API_INTEL 0x4095
#define CL_ALL_DEVICES_FOR_VA_API_INTEL 0x4096
/* cl_context_info */
#define CL_CONTEXT_VA_API_DISPLAY_INTEL 0x4097
/* cl_mem_info */
#define CL_MEM_VA_API_MEDIA_SURFACE_INTEL 0x4098
/* cl_image_info */
#define CL_IMAGE_VA_API_PLANE_INTEL 0x4099
/* cl_command_type */
#define CL_COMMAND_ACQUIRE_VA_API_MEDIA_SURFACES_INTEL 0x409A
#define CL_COMMAND_RELEASE_VA_API_MEDIA_SURFACES_INTEL 0x409B
typedef cl_uint cl_va_api_device_source_intel;
typedef cl_uint cl_va_api_device_set_intel;
extern CL_API_ENTRY cl_int CL_API_CALL
clGetDeviceIDsFromVA_APIMediaAdapterINTEL(
cl_platform_id platform,
cl_va_api_device_source_intel media_adapter_type,
void* media_adapter,
cl_va_api_device_set_intel media_adapter_set,
cl_uint num_entries,
cl_device_id* devices,
cl_uint* num_devices) CL_EXT_SUFFIX__VERSION_1_2;
typedef CL_API_ENTRY cl_int (CL_API_CALL * clGetDeviceIDsFromVA_APIMediaAdapterINTEL_fn)(
cl_platform_id platform,
cl_va_api_device_source_intel media_adapter_type,
void* media_adapter,
cl_va_api_device_set_intel media_adapter_set,
cl_uint num_entries,
cl_device_id* devices,
cl_uint* num_devices) CL_EXT_SUFFIX__VERSION_1_2;
extern CL_API_ENTRY cl_mem CL_API_CALL
clCreateFromVA_APIMediaSurfaceINTEL(
cl_context context,
cl_mem_flags flags,
VASurfaceID* surface,
cl_uint plane,
cl_int* errcode_ret) CL_EXT_SUFFIX__VERSION_1_2;
typedef CL_API_ENTRY cl_mem (CL_API_CALL * clCreateFromVA_APIMediaSurfaceINTEL_fn)(
cl_context context,
cl_mem_flags flags,
VASurfaceID* surface,
cl_uint plane,
cl_int* errcode_ret) CL_EXT_SUFFIX__VERSION_1_2;
extern CL_API_ENTRY cl_int CL_API_CALL
clEnqueueAcquireVA_APIMediaSurfacesINTEL(
cl_command_queue command_queue,
cl_uint num_objects,
const cl_mem* mem_objects,
cl_uint num_events_in_wait_list,
const cl_event* event_wait_list,
cl_event* event) CL_EXT_SUFFIX__VERSION_1_2;
typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueAcquireVA_APIMediaSurfacesINTEL_fn)(
cl_command_queue command_queue,
cl_uint num_objects,
const cl_mem* mem_objects,
cl_uint num_events_in_wait_list,
const cl_event* event_wait_list,
cl_event* event) CL_EXT_SUFFIX__VERSION_1_2;
extern CL_API_ENTRY cl_int CL_API_CALL
clEnqueueReleaseVA_APIMediaSurfacesINTEL(
cl_command_queue command_queue,
cl_uint num_objects,
const cl_mem* mem_objects,
cl_uint num_events_in_wait_list,
const cl_event* event_wait_list,
cl_event* event) CL_EXT_SUFFIX__VERSION_1_2;
typedef CL_API_ENTRY cl_int (CL_API_CALL *clEnqueueReleaseVA_APIMediaSurfacesINTEL_fn)(
cl_command_queue command_queue,
cl_uint num_objects,
const cl_mem* mem_objects,
cl_uint num_events_in_wait_list,
const cl_event* event_wait_list,
cl_event* event) CL_EXT_SUFFIX__VERSION_1_2;
#ifdef __cplusplus
}
#endif
#endif /* __OPENCL_CL_VA_API_MEDIA_SHARING_INTEL_H */

include/CL/cl_version.h

@@ -0,0 +1,86 @@
/*******************************************************************************
* Copyright (c) 2018 The Khronos Group Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and/or associated documentation files (the
* "Materials"), to deal in the Materials without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sublicense, and/or sell copies of the Materials, and to
* permit persons to whom the Materials are furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice shall be included
* in all copies or substantial portions of the Materials.
*
* MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS
* KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS
* SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT
* https://www.khronos.org/registry/
*
* THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
* IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
* CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
* MATERIALS OR THE USE OR OTHER DEALINGS IN THE MATERIALS.
******************************************************************************/
#ifndef __CL_VERSION_H
#define __CL_VERSION_H
/* Detect which version to target */
#if !defined(CL_TARGET_OPENCL_VERSION)
#pragma message("cl_version.h: CL_TARGET_OPENCL_VERSION is not defined. Defaulting to 220 (OpenCL 2.2)")
#define CL_TARGET_OPENCL_VERSION 220
#endif
#if CL_TARGET_OPENCL_VERSION != 100 && \
CL_TARGET_OPENCL_VERSION != 110 && \
CL_TARGET_OPENCL_VERSION != 120 && \
CL_TARGET_OPENCL_VERSION != 200 && \
CL_TARGET_OPENCL_VERSION != 210 && \
CL_TARGET_OPENCL_VERSION != 220
#pragma message("cl_version: CL_TARGET_OPENCL_VERSION is not a valid value (100, 110, 120, 200, 210, 220). Defaulting to 220 (OpenCL 2.2)")
#undef CL_TARGET_OPENCL_VERSION
#define CL_TARGET_OPENCL_VERSION 220
#endif
/* OpenCL Version */
#if CL_TARGET_OPENCL_VERSION >= 220 && !defined(CL_VERSION_2_2)
#define CL_VERSION_2_2 1
#endif
#if CL_TARGET_OPENCL_VERSION >= 210 && !defined(CL_VERSION_2_1)
#define CL_VERSION_2_1 1
#endif
#if CL_TARGET_OPENCL_VERSION >= 200 && !defined(CL_VERSION_2_0)
#define CL_VERSION_2_0 1
#endif
#if CL_TARGET_OPENCL_VERSION >= 120 && !defined(CL_VERSION_1_2)
#define CL_VERSION_1_2 1
#endif
#if CL_TARGET_OPENCL_VERSION >= 110 && !defined(CL_VERSION_1_1)
#define CL_VERSION_1_1 1
#endif
#if CL_TARGET_OPENCL_VERSION >= 100 && !defined(CL_VERSION_1_0)
#define CL_VERSION_1_0 1
#endif
/* Allow deprecated APIs for older OpenCL versions. */
#if CL_TARGET_OPENCL_VERSION <= 210 && !defined(CL_USE_DEPRECATED_OPENCL_2_1_APIS)
#define CL_USE_DEPRECATED_OPENCL_2_1_APIS
#endif
#if CL_TARGET_OPENCL_VERSION <= 200 && !defined(CL_USE_DEPRECATED_OPENCL_2_0_APIS)
#define CL_USE_DEPRECATED_OPENCL_2_0_APIS
#endif
#if CL_TARGET_OPENCL_VERSION <= 120 && !defined(CL_USE_DEPRECATED_OPENCL_1_2_APIS)
#define CL_USE_DEPRECATED_OPENCL_1_2_APIS
#endif
#if CL_TARGET_OPENCL_VERSION <= 110 && !defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS)
#define CL_USE_DEPRECATED_OPENCL_1_1_APIS
#endif
#if CL_TARGET_OPENCL_VERSION <= 100 && !defined(CL_USE_DEPRECATED_OPENCL_1_0_APIS)
#define CL_USE_DEPRECATED_OPENCL_1_0_APIS
#endif
#endif /* __CL_VERSION_H */
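
The version selection above can be sketched as follows. This is a minimal illustration, assuming a translation unit that targets OpenCL 1.2: the `#if` checks are copied from the hunk so the example builds without the OpenCL headers installed, and the four helper functions are hypothetical names used only to report which macros ended up defined. In real code the `#define` would simply precede `#include <CL/cl.h>`.

```c
/* The caller picks the target before any OpenCL header is included. */
#define CL_TARGET_OPENCL_VERSION 120

/* Copied from the cl_version.h hunk above; real code would include the
 * header instead of repeating these checks. */
#if CL_TARGET_OPENCL_VERSION >= 200 && !defined(CL_VERSION_2_0)
#define CL_VERSION_2_0 1
#endif
#if CL_TARGET_OPENCL_VERSION >= 120 && !defined(CL_VERSION_1_2)
#define CL_VERSION_1_2 1
#endif
#if CL_TARGET_OPENCL_VERSION <= 120 && !defined(CL_USE_DEPRECATED_OPENCL_1_2_APIS)
#define CL_USE_DEPRECATED_OPENCL_1_2_APIS
#endif
#if CL_TARGET_OPENCL_VERSION <= 110 && !defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS)
#define CL_USE_DEPRECATED_OPENCL_1_1_APIS
#endif

/* Hypothetical helpers: each returns 1 if the macro is defined, else 0. */
int targets_cl_1_2(void)
{
#ifdef CL_VERSION_1_2
    return 1;
#else
    return 0;
#endif
}

int targets_cl_2_0(void)
{
#ifdef CL_VERSION_2_0
    return 1;
#else
    return 0;
#endif
}

int allows_deprecated_1_2_apis(void)
{
#ifdef CL_USE_DEPRECATED_OPENCL_1_2_APIS
    return 1;
#else
    return 0;
#endif
}

int allows_deprecated_1_1_apis(void)
{
#ifdef CL_USE_DEPRECATED_OPENCL_1_1_APIS
    return 1;
#else
    return 0;
#endif
}
```

With a target of 120 the `>=` checks define the 1.2 macro but not the 2.0 one, and the `<=` checks enable only the deprecated-API macro whose threshold is at or above the target.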


@@ -1,5 +1,5 @@
/*******************************************************************************
* Copyright (c) 2008-2012 The Khronos Group Inc.
* Copyright (c) 2008-2015 The Khronos Group Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and/or associated documentation files (the
@@ -12,6 +12,11 @@
* The above copyright notice and this permission notice shall be included
* in all copies or substantial portions of the Materials.
*
* MODIFICATIONS TO THIS FILE MAY MEAN IT NO LONGER ACCURATELY REFLECTS
* KHRONOS STANDARDS. THE UNMODIFIED, NORMATIVE VERSIONS OF KHRONOS
* SPECIFICATIONS AND HEADER INFORMATION ARE LOCATED AT
* https://www.khronos.org/registry/
*
* THE MATERIALS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
@@ -30,25 +35,13 @@
extern "C" {
#endif
#ifdef __APPLE__
#include <OpenCL/cl.h>
#include <OpenCL/cl_gl.h>
#include <OpenCL/cl_gl_ext.h>
#include <OpenCL/cl_ext.h>
#else
#include <CL/cl.h>
#include <CL/cl_gl.h>
#include <CL/cl_gl_ext.h>
#include <CL/cl_ext.h>
#endif
#ifdef __cplusplus
}
#endif
#endif /* __OPENCL_H */


@@ -48,6 +48,8 @@ typedef unsigned int drm_drawable_t;
typedef struct drm_clip_rect drm_clip_rect_t;
#endif
#include <GL/gl.h>
#include <stdint.h>
/**
@@ -1345,6 +1347,7 @@ struct __DRIdri2ExtensionRec {
#define __DRI_IMAGE_FOURCC_YUYV 0x56595559
#define __DRI_IMAGE_FOURCC_UYVY 0x59565955
#define __DRI_IMAGE_FOURCC_AYUV 0x56555941
#define __DRI_IMAGE_FOURCC_XYUV8888 0x56555958
#define __DRI_IMAGE_FOURCC_YVU410 0x39555659
#define __DRI_IMAGE_FOURCC_YVU411 0x31315659
@@ -1352,6 +1355,10 @@ struct __DRIdri2ExtensionRec {
#define __DRI_IMAGE_FOURCC_YVU422 0x36315659
#define __DRI_IMAGE_FOURCC_YVU444 0x34325659
#define __DRI_IMAGE_FOURCC_P010 0x30313050
#define __DRI_IMAGE_FOURCC_P012 0x32313050
#define __DRI_IMAGE_FOURCC_P016 0x36313050
/**
* Queryable on images created by createImageFromNames.
*
@@ -1372,6 +1379,7 @@ struct __DRIdri2ExtensionRec {
#define __DRI_IMAGE_COMPONENTS_Y_XUXV 0x3005
#define __DRI_IMAGE_COMPONENTS_Y_UXVX 0x3008
#define __DRI_IMAGE_COMPONENTS_AYUV 0x3009
#define __DRI_IMAGE_COMPONENTS_XYUV 0x300A
#define __DRI_IMAGE_COMPONENTS_R 0x3006
#define __DRI_IMAGE_COMPONENTS_RG 0x3007


@@ -1,6 +1,6 @@
This directory contains a copy of the installed kernel headers
required by the anv & i965 drivers to communicate with the kernel.
Whenever either of those drivers needs new definitions for new kernel
required by several drivers to communicate with the kernel.
Whenever one of those drivers needs new definitions for new kernel
APIs, these files should be updated.
These files in master should only be updated once the changes have landed
@@ -13,9 +13,9 @@ $ make headers_install INSTALL_HDR_PATH=/path/to/install
The last update was done at the following kernel commit:
commit 78230c46ec0a91dd4256c9e54934b3c7095a7ee3
Merge: b65bd4031156 037f03155b7d
commit a5f2fafece141ef3509e686cea576366d55cabb6
Merge: 71f4e45a4ed3 860433ed2a55
Author: Dave Airlie <airlied@redhat.com>
Date: Wed Mar 21 14:07:03 2018 +1000
Date: Wed Feb 20 12:16:30 2019 +1000
Merge tag 'omapdrm-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux into drm-next
Merge https://gitlab.freedesktop.org/drm/msm into drm-next


@@ -674,6 +674,22 @@ struct drm_get_cap {
*/
#define DRM_CLIENT_CAP_ATOMIC 3
/**
* DRM_CLIENT_CAP_ASPECT_RATIO
*
* If set to 1, the DRM core will provide aspect ratio information in modes.
*/
#define DRM_CLIENT_CAP_ASPECT_RATIO 4
/**
* DRM_CLIENT_CAP_WRITEBACK_CONNECTORS
*
* If set to 1, the DRM core will expose special connectors to be used for
* writing back to memory the scene setup in the commit. Depends on client
* also supporting DRM_CLIENT_CAP_ATOMIC
*/
#define DRM_CLIENT_CAP_WRITEBACK_CONNECTORS 5
/** DRM_IOCTL_SET_CLIENT_CAP ioctl argument type */
struct drm_set_client_cap {
__u64 capability;


@@ -30,11 +30,50 @@
extern "C" {
#endif
/**
* DOC: overview
*
* In the DRM subsystem, framebuffer pixel formats are described using the
* fourcc codes defined in `include/uapi/drm/drm_fourcc.h`. In addition to the
* fourcc code, a Format Modifier may optionally be provided, in order to
* further describe the buffer's format - for example tiling or compression.
*
* Format Modifiers
* ----------------
*
* Format modifiers are used in conjunction with a fourcc code, forming a
* unique fourcc:modifier pair. This format:modifier pair must fully define the
* format and data layout of the buffer, and should be the only way to describe
* that particular buffer.
*
* Having multiple fourcc:modifier pairs which describe the same layout should
* be avoided, as such aliases run the risk of different drivers exposing
* different names for the same data format, forcing userspace to understand
* that they are aliases.
*
* Format modifiers may change any property of the buffer, including the number
* of planes and/or the required allocation size. Format modifiers are
* vendor-namespaced, and as such the relationship between a fourcc code and a
* modifier is specific to the modifier being used. For example, some modifiers
* may preserve meaning - such as number of planes - from the fourcc code,
* whereas others may not.
*
* Vendors should document their modifier usage in as much detail as
* possible, to ensure maximum compatibility across devices, drivers and
* applications.
*
* The authoritative list of format modifier codes is found in
* `include/uapi/drm/drm_fourcc.h`
*/
#define fourcc_code(a, b, c, d) ((__u32)(a) | ((__u32)(b) << 8) | \
((__u32)(c) << 16) | ((__u32)(d) << 24))
#define DRM_FORMAT_BIG_ENDIAN (1<<31) /* format is big endian instead of little endian */
/* Reserve 0 for the invalid format specifier */
#define DRM_FORMAT_INVALID 0
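
As a worked example of the `fourcc_code()` packing, the sketch below reproduces the macro with `uint32_t` standing in for the kernel's `__u32` (an assumption made so it builds in plain userspace). `FOURCC` and `fourcc_from_string` are hypothetical names for illustration. The first character lands in the lowest byte, which is why the `'X','Y','U','V'` code equals the `0x56555958` value used for `__DRI_IMAGE_FOURCC_XYUV8888` elsewhere in this diff.

```c
#include <stdint.h>

/* Same packing as fourcc_code(): a is the least significant byte. */
#define FOURCC(a, b, c, d) ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
                            ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

/* Hypothetical helper: build a code from a string.
 * The caller must pass at least four characters. */
uint32_t fourcc_from_string(const char *s)
{
    return FOURCC((unsigned char)s[0], (unsigned char)s[1],
                  (unsigned char)s[2], (unsigned char)s[3]);
}
```

The same arithmetic gives `0x30313050`, `0x32313050` and `0x36313050` for P010, P012 and P016, matching the `__DRI_IMAGE_FOURCC_*` defines added in this series.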
/* color index */
#define DRM_FORMAT_C8 fourcc_code('C', '8', ' ', ' ') /* [7:0] C */
@@ -112,6 +151,21 @@ extern "C" {
#define DRM_FORMAT_VYUY fourcc_code('V', 'Y', 'U', 'Y') /* [31:0] Y1:Cb0:Y0:Cr0 8:8:8:8 little endian */
#define DRM_FORMAT_AYUV fourcc_code('A', 'Y', 'U', 'V') /* [31:0] A:Y:Cb:Cr 8:8:8:8 little endian */
#define DRM_FORMAT_XYUV8888 fourcc_code('X', 'Y', 'U', 'V') /* [31:0] X:Y:Cb:Cr 8:8:8:8 little endian */
/*
* packed YCbCr420 2x2 tiled formats
* first 64 bits will contain Y,Cb,Cr components for a 2x2 tile
*/
/* [63:0] A3:A2:Y3:0:Cr0:0:Y2:0:A1:A0:Y1:0:Cb0:0:Y0:0 1:1:8:2:8:2:8:2:1:1:8:2:8:2:8:2 little endian */
#define DRM_FORMAT_Y0L0 fourcc_code('Y', '0', 'L', '0')
/* [63:0] X3:X2:Y3:0:Cr0:0:Y2:0:X1:X0:Y1:0:Cb0:0:Y0:0 1:1:8:2:8:2:8:2:1:1:8:2:8:2:8:2 little endian */
#define DRM_FORMAT_X0L0 fourcc_code('X', '0', 'L', '0')
/* [63:0] A3:A2:Y3:Cr0:Y2:A1:A0:Y1:Cb0:Y0 1:1:10:10:10:1:1:10:10:10 little endian */
#define DRM_FORMAT_Y0L2 fourcc_code('Y', '0', 'L', '2')
/* [63:0] X3:X2:Y3:Cr0:Y2:X1:X0:Y1:Cb0:Y0 1:1:10:10:10:1:1:10:10:10 little endian */
#define DRM_FORMAT_X0L2 fourcc_code('X', '0', 'L', '2')
/*
* 2 plane RGB + A
@@ -141,6 +195,27 @@ extern "C" {
#define DRM_FORMAT_NV24 fourcc_code('N', 'V', '2', '4') /* non-subsampled Cr:Cb plane */
#define DRM_FORMAT_NV42 fourcc_code('N', 'V', '4', '2') /* non-subsampled Cb:Cr plane */
/*
* 2 plane YCbCr MSB aligned
* index 0 = Y plane, [15:0] Y:x [10:6] little endian
* index 1 = Cr:Cb plane, [31:0] Cr:x:Cb:x [10:6:10:6] little endian
*/
#define DRM_FORMAT_P010 fourcc_code('P', '0', '1', '0') /* 2x2 subsampled Cr:Cb plane 10 bits per channel */
/*
* 2 plane YCbCr MSB aligned
* index 0 = Y plane, [15:0] Y:x [12:4] little endian
* index 1 = Cr:Cb plane, [31:0] Cr:x:Cb:x [12:4:12:4] little endian
*/
#define DRM_FORMAT_P012 fourcc_code('P', '0', '1', '2') /* 2x2 subsampled Cr:Cb plane 12 bits per channel */
/*
* 2 plane YCbCr MSB aligned
* index 0 = Y plane, [15:0] Y little endian
* index 1 = Cr:Cb plane, [31:0] Cr:Cb [16:16] little endian
*/
#define DRM_FORMAT_P016 fourcc_code('P', '0', '1', '6') /* 2x2 subsampled Cr:Cb plane 16 bits per channel */
/*
* 3 plane YCbCr
* index 0: Y plane, [7:0] Y
@@ -183,6 +258,9 @@ extern "C" {
#define DRM_FORMAT_MOD_VENDOR_QCOM 0x05
#define DRM_FORMAT_MOD_VENDOR_VIVANTE 0x06
#define DRM_FORMAT_MOD_VENDOR_BROADCOM 0x07
#define DRM_FORMAT_MOD_VENDOR_ARM 0x08
#define DRM_FORMAT_MOD_VENDOR_ALLWINNER 0x09
/* add more to the end as needed */
#define DRM_FORMAT_RESERVED ((1ULL << 56) - 1)
@@ -298,6 +376,15 @@ extern "C" {
*/
#define DRM_FORMAT_MOD_SAMSUNG_64_32_TILE fourcc_mod_code(SAMSUNG, 1)
/*
* Tiled, 16 (pixels) x 16 (lines) - sized macroblocks
*
* This is a simple tiled layout using tiles of 16x16 pixels in a row-major
* layout. For YCbCr formats Cb/Cr components are taken in such a way that
* they correspond to their 16x16 luma block.
*/
#define DRM_FORMAT_MOD_SAMSUNG_16_16_TILE fourcc_mod_code(SAMSUNG, 2)
/*
* Qualcomm Compressed Format
*
@@ -309,7 +396,7 @@ extern "C" {
* Pixel data height is aligned with macrotile height.
* Entire pixel data buffer is aligned with 4k(bytes).
*/
#define DRM_FORMAT_MOD_QCOM_COMPRESSED fourcc_mod_code(QCOM, 1)
#define DRM_FORMAT_MOD_QCOM_COMPRESSED fourcc_mod_code(QCOM, 1)
/* Vivante framebuffer modifiers */
@@ -498,6 +585,128 @@ extern "C" {
*/
#define DRM_FORMAT_MOD_BROADCOM_UIF fourcc_mod_code(BROADCOM, 6)
/*
* Arm Framebuffer Compression (AFBC) modifiers
*
* AFBC is a proprietary lossless image compression protocol and format.
* It provides fine-grained random access and minimizes the amount of data
* transferred between IP blocks.
*
* AFBC has several features which may be supported and/or used, which are
* represented using bits in the modifier. Not all combinations are valid,
* and different devices or use-cases may support different combinations.
*
* Further information on the use of AFBC modifiers can be found in
* Documentation/gpu/afbc.rst
*/
#define DRM_FORMAT_MOD_ARM_AFBC(__afbc_mode) fourcc_mod_code(ARM, __afbc_mode)
/*
* AFBC superblock size
*
* Indicates the superblock size(s) used for the AFBC buffer. The buffer
* size (in pixels) must be aligned to a multiple of the superblock size.
* The four least significant bits (LSBs) are reserved for block size.
*
* Where one superblock size is specified, it applies to all planes of the
* buffer (e.g. 16x16, 32x8). When multiple superblock sizes are specified,
* the first applies to the Luma plane and the second applies to the Chroma
* plane(s) (e.g. 32x8_64x4 means 32x8 Luma, with 64x4 Chroma).
* Multiple superblock sizes are only valid for multi-plane YCbCr formats.
*/
#define AFBC_FORMAT_MOD_BLOCK_SIZE_MASK 0xf
#define AFBC_FORMAT_MOD_BLOCK_SIZE_16x16 (1ULL)
#define AFBC_FORMAT_MOD_BLOCK_SIZE_32x8 (2ULL)
#define AFBC_FORMAT_MOD_BLOCK_SIZE_64x4 (3ULL)
#define AFBC_FORMAT_MOD_BLOCK_SIZE_32x8_64x4 (4ULL)
/*
* AFBC lossless colorspace transform
*
* Indicates that the buffer makes use of the AFBC lossless colorspace
* transform.
*/
#define AFBC_FORMAT_MOD_YTR (1ULL << 4)
/*
* AFBC block-split
*
* Indicates that the payload of each superblock is split. The second
* half of the payload is positioned at a predefined offset from the start
* of the superblock payload.
*/
#define AFBC_FORMAT_MOD_SPLIT (1ULL << 5)
/*
* AFBC sparse layout
*
* This flag indicates that the payload of each superblock must be stored at a
* predefined position relative to the other superblocks in the same AFBC
* buffer. This order is the same order used by the header buffer. In this mode
* each superblock is given the same amount of space as an uncompressed
* superblock of the particular format would require, rounding up to the next
* multiple of 128 bytes in size.
*/
#define AFBC_FORMAT_MOD_SPARSE (1ULL << 6)
/*
* AFBC copy-block restrict
*
* Buffers with this flag must obey the copy-block restriction. The restriction
* is such that there are no copy-blocks referring across the border of 8x8
* blocks. For the subsampled data the 8x8 limitation is also subsampled.
*/
#define AFBC_FORMAT_MOD_CBR (1ULL << 7)
/*
* AFBC tiled layout
*
* The tiled layout groups superblocks in 8x8 or 4x4 tiles, where all
* superblocks inside a tile are stored together in memory. 8x8 tiles are used
* for pixel formats up to and including 32 bpp while 4x4 tiles are used for
* larger bpp formats. The order between the tiles is scan line.
* When the tiled layout is used, the buffer size (in pixels) must be aligned
* to the tile size.
*/
#define AFBC_FORMAT_MOD_TILED (1ULL << 8)
/*
* AFBC solid color blocks
*
* Indicates that the buffer makes use of solid-color blocks, whereby bandwidth
* can be reduced if a whole superblock is a single color.
*/
#define AFBC_FORMAT_MOD_SC (1ULL << 9)
/*
* AFBC double-buffer
*
* Indicates that the buffer is allocated in a layout safe for front-buffer
* rendering.
*/
#define AFBC_FORMAT_MOD_DB (1ULL << 10)
/*
* AFBC buffer content hints
*
* Indicates that the buffer includes per-superblock content hints.
*/
#define AFBC_FORMAT_MOD_BCH (1ULL << 11)
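
The AFBC bits above combine into a single 64-bit modifier. The sketch below is an illustration under stated assumptions: the shortened macro names are hypothetical stand-ins for the `AFBC_FORMAT_MOD_*` defines, and `afbc_modifier()` assumes that `fourcc_mod_code()` (not shown in this hunk) places the vendor id in the top 8 bits and the value in the low 56 bits, which is consistent with `DRM_FORMAT_RESERVED` being `(1ULL << 56) - 1`.

```c
#include <stdint.h>

#define MOD_VENDOR_ARM        0x08ULL      /* DRM_FORMAT_MOD_VENDOR_ARM */
#define AFBC_BLOCK_SIZE_MASK  0xfULL       /* AFBC_FORMAT_MOD_BLOCK_SIZE_MASK */
#define AFBC_BLOCK_SIZE_16x16 1ULL         /* AFBC_FORMAT_MOD_BLOCK_SIZE_16x16 */
#define AFBC_YTR              (1ULL << 4)  /* AFBC_FORMAT_MOD_YTR */
#define AFBC_SPARSE           (1ULL << 6)  /* AFBC_FORMAT_MOD_SPARSE */
#define AFBC_TILED            (1ULL << 8)  /* AFBC_FORMAT_MOD_TILED */

/* Assumed layout: vendor in bits 63:56, AFBC mode in bits 55:0. */
uint64_t afbc_modifier(uint64_t mode)
{
    return (MOD_VENDOR_ARM << 56) | (mode & 0x00ffffffffffffffULL);
}

/* Superblock size lives in the four least significant bits. */
uint64_t afbc_block_size(uint64_t modifier)
{
    return modifier & AFBC_BLOCK_SIZE_MASK;
}

/* Returns 1 if the given feature bit is set in the modifier. */
int afbc_has(uint64_t modifier, uint64_t flag)
{
    return (modifier & flag) != 0;
}
```

For instance, a 16x16 sparse buffer using the lossless colorspace transform would carry the mode `1 | (1 << 4) | (1 << 6)`, that is `0x51`, under the ARM vendor prefix.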
/*
* Allwinner tiled modifier
*
* This tiling mode is implemented by the VPU found on all Allwinner platforms,
* codenamed sunxi. It is associated with a YUV format that uses either 2 or 3
* planes.
*
* With this tiling, the luminance samples are arranged in tiles representing
* 32x32 pixels and the chrominance samples in tiles representing 32x64 pixels.
* The pixel order in each tile is linear and the tiles are laid out linearly,
* both in row-major order.
*/
#define DRM_FORMAT_MOD_ALLWINNER_TILED fourcc_mod_code(ALLWINNER, 1)
#if defined(__cplusplus)
}
#endif


@@ -93,6 +93,15 @@ extern "C" {
#define DRM_MODE_PICTURE_ASPECT_NONE 0
#define DRM_MODE_PICTURE_ASPECT_4_3 1
#define DRM_MODE_PICTURE_ASPECT_16_9 2
#define DRM_MODE_PICTURE_ASPECT_64_27 3
#define DRM_MODE_PICTURE_ASPECT_256_135 4
/* Content type options */
#define DRM_MODE_CONTENT_TYPE_NO_DATA 0
#define DRM_MODE_CONTENT_TYPE_GRAPHICS 1
#define DRM_MODE_CONTENT_TYPE_PHOTO 2
#define DRM_MODE_CONTENT_TYPE_CINEMA 3
#define DRM_MODE_CONTENT_TYPE_GAME 4
/* Aspect ratio flag bitmask (4 bits 22:19) */
#define DRM_MODE_FLAG_PIC_AR_MASK (0x0F<<19)
@@ -102,6 +111,10 @@ extern "C" {
(DRM_MODE_PICTURE_ASPECT_4_3<<19)
#define DRM_MODE_FLAG_PIC_AR_16_9 \
(DRM_MODE_PICTURE_ASPECT_16_9<<19)
#define DRM_MODE_FLAG_PIC_AR_64_27 \
(DRM_MODE_PICTURE_ASPECT_64_27<<19)
#define DRM_MODE_FLAG_PIC_AR_256_135 \
(DRM_MODE_PICTURE_ASPECT_256_135<<19)
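
The aspect ratio occupies a 4-bit field at bits 22:19 of the mode flags, so reading or updating it is a mask-and-shift. The sketch below is illustrative only: the shortened macro names and both helper functions are hypothetical, and unsigned constants are used in place of the header's plain `int` literals.

```c
#include <stdint.h>

#define PIC_AR_SHIFT            19
#define PIC_AR_MASK             (0x0Fu << PIC_AR_SHIFT) /* DRM_MODE_FLAG_PIC_AR_MASK */
#define PICTURE_ASPECT_64_27    3u  /* DRM_MODE_PICTURE_ASPECT_64_27 */
#define PICTURE_ASPECT_256_135  4u  /* DRM_MODE_PICTURE_ASPECT_256_135 */

/* Extract the picture aspect ratio value from a mode's flags word. */
uint32_t mode_picture_aspect(uint32_t flags)
{
    return (flags & PIC_AR_MASK) >> PIC_AR_SHIFT;
}

/* Replace the aspect ratio field, leaving all other flag bits intact. */
uint32_t mode_set_picture_aspect(uint32_t flags, uint32_t aspect)
{
    return (flags & ~PIC_AR_MASK) | ((aspect << PIC_AR_SHIFT) & PIC_AR_MASK);
}
```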
#define DRM_MODE_FLAG_ALL (DRM_MODE_FLAG_PHSYNC | \
DRM_MODE_FLAG_NHSYNC | \
@@ -173,8 +186,9 @@ extern "C" {
/*
* DRM_MODE_REFLECT_<axis>
*
* Signals that the contents of a drm plane is reflected in the <axis> axis,
* Signals that the contents of a drm plane is reflected along the <axis> axis,
* in the same way as mirroring.
* See kerneldoc chapter "Plane Composition Properties" for more details.
*
* This define is provided as a convenience, looking up the property id
* using the name->prop id lookup is the preferred method.
@@ -338,6 +352,7 @@ enum drm_mode_subconnector {
#define DRM_MODE_CONNECTOR_VIRTUAL 15
#define DRM_MODE_CONNECTOR_DSI 16
#define DRM_MODE_CONNECTOR_DPI 17
#define DRM_MODE_CONNECTOR_WRITEBACK 18
struct drm_mode_get_connector {
@@ -873,6 +888,25 @@ struct drm_mode_revoke_lease {
__u32 lessee_id;
};
/**
* struct drm_mode_rect - Two dimensional rectangle.
* @x1: Horizontal starting coordinate (inclusive).
* @y1: Vertical starting coordinate (inclusive).
* @x2: Horizontal ending coordinate (exclusive).
* @y2: Vertical ending coordinate (exclusive).
*
* The DRM subsystem uses struct drm_rect to manage rectangular areas; this
* struct exports it to user-space.
*
* Currently used by drm_mode_atomic blob property FB_DAMAGE_CLIPS.
*/
struct drm_mode_rect {
__s32 x1;
__s32 y1;
__s32 x2;
__s32 y2;
};
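
Because `x1`/`y1` are inclusive and `x2`/`y2` are exclusive, the size of a rectangle is a plain difference with no `+ 1`. The sketch below mirrors the struct with `int32_t` in place of `__s32`; `struct mode_rect` and the helper functions are hypothetical names used for illustration.

```c
#include <stdint.h>

/* Userspace mirror of struct drm_mode_rect. */
struct mode_rect {
    int32_t x1;
    int32_t y1;
    int32_t x2;
    int32_t y2;
};

struct mode_rect make_rect(int32_t x1, int32_t y1, int32_t x2, int32_t y2)
{
    struct mode_rect r = { x1, y1, x2, y2 };
    return r;
}

/* Half-open ranges: width and height are simple differences. */
int32_t rect_width(struct mode_rect r)  { return r.x2 - r.x1; }
int32_t rect_height(struct mode_rect r) { return r.y2 - r.y1; }

/* A point on the x2 or y2 edge is outside the rectangle. */
int rect_contains(struct mode_rect r, int32_t x, int32_t y)
{
    return x >= r.x1 && x < r.x2 && y >= r.y1 && y < r.y2;
}
```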
#if defined(__cplusplus)
}
#endif


@@ -412,6 +412,14 @@ typedef struct drm_i915_irq_wait {
int irq_seq;
} drm_i915_irq_wait_t;
/*
* Different modes of per-process Graphics Translation Table,
* see I915_PARAM_HAS_ALIASING_PPGTT
*/
#define I915_GEM_PPGTT_NONE 0
#define I915_GEM_PPGTT_ALIASING 1
#define I915_GEM_PPGTT_FULL 2
/* Ioctl to query kernel params:
*/
#define I915_PARAM_IRQ_ACTIVE 1
@@ -529,6 +537,28 @@ typedef struct drm_i915_irq_wait {
*/
#define I915_PARAM_CS_TIMESTAMP_FREQUENCY 51
/*
* Once upon a time we supposed that writes through the GGTT would be
* immediately in physical memory (once flushed out of the CPU path). However,
* on a few different processors and chipsets, this is not necessarily the case
* as the writes appear to be buffered internally. Thus a read of the backing
* storage (physical memory) via a different path (with different physical tags
* to the indirect write via the GGTT) will see stale values from before
* the GGTT write. Inside the kernel, we can for the most part keep track of
* the different read/write domains in use (e.g. set-domain), but the assumption
* of coherency is baked into the ABI, hence reporting its true state in this
* parameter.
*
* Reports true when writes via mmap_gtt are immediately visible following an
* lfence to flush the WCB.
*
* Reports false when writes via mmap_gtt are indeterminately delayed in an
* internal buffer and are _not_ immediately visible to third parties accessing
* directly via mmap_cpu/mmap_wc. Use of mmap_gtt as part of an IPC
* communications channel when reporting false is strongly discouraged.
*/
#define I915_PARAM_MMAP_GTT_COHERENT 52
typedef struct drm_i915_getparam {
__s32 param;
/*
@@ -1456,9 +1486,73 @@ struct drm_i915_gem_context_param {
#define I915_CONTEXT_MAX_USER_PRIORITY 1023 /* inclusive */
#define I915_CONTEXT_DEFAULT_PRIORITY 0
#define I915_CONTEXT_MIN_USER_PRIORITY -1023 /* inclusive */
/*
* When using the following param, value should be a pointer to
* drm_i915_gem_context_param_sseu.
*/
#define I915_CONTEXT_PARAM_SSEU 0x7
__u64 value;
};
/**
* Context SSEU programming
*
* It may be necessary, for either functional or performance reasons, to
* configure a context to run with a reduced number of SSEU (where SSEU stands
* for Slice/Sub-slice/EU).
*
* This is done by setting the SSEU configuration through the
* @struct drm_i915_gem_context_param_sseu below, for every supported engine
* which userspace intends to use.
*
* Not all GPUs or engines support this functionality in which case an error
* code -ENODEV will be returned.
*
* Also, the set of permitted SSEU configurations varies between GPU
* generations and is further restricted by software-imposed limitations.
* Requesting an unsupported combination will return an error code of -EINVAL.
*
* NOTE: When perf/OA is active the context's SSEU configuration is ignored in
* favour of a single global setting.
*/
struct drm_i915_gem_context_param_sseu {
/*
* Engine class & instance to be configured or queried.
*/
__u16 engine_class;
__u16 engine_instance;
/*
* Unused for now. Must be cleared to zero.
*/
__u32 flags;
/*
* Mask of slices to enable for the context. Valid values are a subset
* of the bitmask value returned for I915_PARAM_SLICE_MASK.
*/
__u64 slice_mask;
/*
* Mask of subslices to enable for the context. Valid values are a
* subset of the bitmask value returned by I915_PARAM_SUBSLICE_MASK.
*/
__u64 subslice_mask;
/*
* Minimum/Maximum number of EUs to enable per subslice for the
* context. min_eus_per_subslice must be less than or equal to
* max_eus_per_subslice.
*/
__u16 min_eus_per_subslice;
__u16 max_eus_per_subslice;
/*
* Unused for now. Must be cleared to zero.
*/
__u32 rsvd;
};
enum drm_i915_oa_format {
I915_OA_FORMAT_A13 = 1, /* HSW only */
I915_OA_FORMAT_A29, /* HSW only */
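
The constraints spelled out in the comments on struct drm_i915_gem_context_param_sseu can be checked in userspace before the parameter is handed to the kernel. The following is only a sketch under stated assumptions: `struct sseu_param` is a local mirror of the uAPI struct built on `<stdint.h>` types, and `sseu_param_is_plausible()` is a hypothetical helper, not part of i915 or Mesa. It assumes the caller has already obtained the hardware masks (for example via I915_PARAM_SLICE_MASK and I915_PARAM_SUBSLICE_MASK). The kernel may still reject a configuration that passes these checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local mirror of struct drm_i915_gem_context_param_sseu, using <stdint.h>
 * types so the sketch builds without the kernel uAPI headers. */
struct sseu_param {
    uint16_t engine_class;
    uint16_t engine_instance;
    uint32_t flags;                 /* unused for now, must be zero */
    uint64_t slice_mask;
    uint64_t subslice_mask;
    uint16_t min_eus_per_subslice;
    uint16_t max_eus_per_subslice;
    uint32_t rsvd;                  /* unused for now, must be zero */
};

/* The fields pack without implicit padding: 2+2+4+8+8+2+2+4 bytes. */
static_assert(sizeof(struct sseu_param) == 32, "unexpected padding");

/* Hypothetical helper: applies only the rules stated in the header comments.
 * hw_slice_mask and hw_subslice_mask are the masks reported by the kernel. */
static bool sseu_param_is_plausible(const struct sseu_param *p,
                                    uint64_t hw_slice_mask,
                                    uint64_t hw_subslice_mask)
{
    if (p->flags != 0 || p->rsvd != 0)
        return false;                       /* reserved fields must be clear */
    if (p->slice_mask & ~hw_slice_mask)
        return false;                       /* not a subset of the hardware slices */
    if (p->subslice_mask & ~hw_subslice_mask)
        return false;                       /* not a subset of the hardware subslices */
    return p->min_eus_per_subslice <= p->max_eus_per_subslice;
}
```

A caller would zero-initialise the struct, fill in the engine and masks, and run the check before issuing the context-setparam ioctl.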


@@ -32,143 +32,615 @@ extern "C" {
#define DRM_TEGRA_GEM_CREATE_TILED (1 << 0)
#define DRM_TEGRA_GEM_CREATE_BOTTOM_UP (1 << 1)
/**
* struct drm_tegra_gem_create - parameters for the GEM object creation IOCTL
*/
struct drm_tegra_gem_create {
/**
* @size:
*
* The size, in bytes, of the buffer object to be created.
*/
__u64 size;
/**
* @flags:
*
* A bitmask of flags that influence the creation of GEM objects:
*
* DRM_TEGRA_GEM_CREATE_TILED
* Use the 16x16 tiling format for this buffer.
*
* DRM_TEGRA_GEM_CREATE_BOTTOM_UP
* The buffer has a bottom-up layout.
*/
__u32 flags;
/**
* @handle:
*
* The handle of the created GEM object. Set by the kernel upon
* successful completion of the IOCTL.
*/
__u32 handle;
};
/**
* struct drm_tegra_gem_mmap - parameters for the GEM mmap IOCTL
*/
struct drm_tegra_gem_mmap {
/**
* @handle:
*
* Handle of the GEM object to obtain an mmap offset for.
*/
__u32 handle;
/**
* @pad:
*
* Structure padding that may be used in the future. Must be 0.
*/
__u32 pad;
/**
* @offset:
*
* The mmap offset for the given GEM object. Set by the kernel upon
* successful completion of the IOCTL.
*/
__u64 offset;
};
/**
* struct drm_tegra_syncpt_read - parameters for the read syncpoint IOCTL
*/
struct drm_tegra_syncpt_read {
/**
* @id:
*
* ID of the syncpoint to read the current value from.
*/
__u32 id;
/**
* @value:
*
* The current syncpoint value. Set by the kernel upon successful
* completion of the IOCTL.
*/
__u32 value;
};
/**
* struct drm_tegra_syncpt_incr - parameters for the increment syncpoint IOCTL
*/
struct drm_tegra_syncpt_incr {
/**
* @id:
*
* ID of the syncpoint to increment.
*/
__u32 id;
/**
* @pad:
*
* Structure padding that may be used in the future. Must be 0.
*/
__u32 pad;
};
/**
* struct drm_tegra_syncpt_wait - parameters for the wait syncpoint IOCTL
*/
struct drm_tegra_syncpt_wait {
/**
* @id:
*
* ID of the syncpoint to wait on.
*/
__u32 id;
/**
* @thresh:
*
* Threshold value for which to wait.
*/
__u32 thresh;
/**
* @timeout:
*
* Timeout, in milliseconds, to wait.
*/
__u32 timeout;
/**
* @value:
*
* The new syncpoint value after the wait. Set by the kernel upon
* successful completion of the IOCTL.
*/
__u32 value;
};
#define DRM_TEGRA_NO_TIMEOUT (0xffffffff)
/**
* struct drm_tegra_open_channel - parameters for the open channel IOCTL
*/
struct drm_tegra_open_channel {
/**
* @client:
*
* The client ID for this channel.
*/
__u32 client;
/**
* @pad:
*
* Structure padding that may be used in the future. Must be 0.
*/
__u32 pad;
/**
* @context:
*
* The application context of this channel. Set by the kernel upon
* successful completion of the IOCTL. This context needs to be passed
* to the DRM_TEGRA_CHANNEL_CLOSE or the DRM_TEGRA_SUBMIT IOCTLs.
*/
__u64 context;
};
/**
* struct drm_tegra_close_channel - parameters for the close channel IOCTL
*/
struct drm_tegra_close_channel {
/**
* @context:
*
* The application context of this channel. This is obtained from the
* DRM_TEGRA_OPEN_CHANNEL IOCTL.
*/
__u64 context;
};
/**
* struct drm_tegra_get_syncpt - parameters for the get syncpoint IOCTL
*/
struct drm_tegra_get_syncpt {
/**
* @context:
*
* The application context identifying the channel for which to obtain
* the syncpoint ID.
*/
__u64 context;
/**
* @index:
*
* Index of the client syncpoint for which to obtain the ID.
*/
__u32 index;
/**
* @id:
*
* The ID of the given syncpoint. Set by the kernel upon successful
* completion of the IOCTL.
*/
__u32 id;
};
/**
* struct drm_tegra_get_syncpt_base - parameters for the get wait base IOCTL
*/
struct drm_tegra_get_syncpt_base {
/**
* @context:
*
* The application context identifying the channel for which to obtain
* the wait base.
*/
__u64 context;
/**
* @syncpt:
*
* ID of the syncpoint for which to obtain the wait base.
*/
__u32 syncpt;
/**
* @id:
*
* The ID of the wait base corresponding to the client syncpoint. Set
* by the kernel upon successful completion of the IOCTL.
*/
__u32 id;
};
/**
* struct drm_tegra_syncpt - syncpoint increment operation
*/
struct drm_tegra_syncpt {
/**
* @id:
*
* ID of the syncpoint to operate on.
*/
__u32 id;
/**
* @incrs:
*
* Number of increments to perform for the syncpoint.
*/
__u32 incrs;
};
/**
* struct drm_tegra_cmdbuf - structure describing a command buffer
*/
struct drm_tegra_cmdbuf {
/**
* @handle:
*
* Handle to a GEM object containing the command buffer.
*/
__u32 handle;
/**
* @offset:
*
* Offset, in bytes, into the GEM object identified by @handle at
* which the command buffer starts.
*/
__u32 offset;
/**
* @words:
*
* Number of 32-bit words in this command buffer.
*/
__u32 words;
/**
* @pad:
*
* Structure padding that may be used in the future. Must be 0.
*/
__u32 pad;
};
/**
* struct drm_tegra_reloc - GEM object relocation structure
*/
struct drm_tegra_reloc {
struct {
/**
* @cmdbuf.handle:
*
* Handle to the GEM object containing the command buffer for
* which to perform this GEM object relocation.
*/
__u32 handle;
/**
* @cmdbuf.offset:
*
* Offset, in bytes, into the command buffer at which to
* insert the relocated address.
*/
__u32 offset;
} cmdbuf;
struct {
/**
* @target.handle:
*
* Handle to the GEM object to be relocated.
*/
__u32 handle;
/**
* @target.offset:
*
* Offset, in bytes, into the target GEM object at which the
* relocated data starts.
*/
__u32 offset;
} target;
/**
* @shift:
*
* The number of bits by which to shift relocated addresses.
*/
__u32 shift;
/**
* @pad:
*
* Structure padding that may be used in the future. Must be 0.
*/
__u32 pad;
};
/**
* struct drm_tegra_waitchk - wait check structure
*/
struct drm_tegra_waitchk {
/**
* @handle:
*
* Handle to the GEM object containing a command stream on which to
* perform the wait check.
*/
__u32 handle;
/**
* @offset:
*
* Offset, in bytes, of the location in the command stream to perform
* the wait check on.
*/
__u32 offset;
/**
* @syncpt:
*
* ID of the syncpoint to wait check.
*/
__u32 syncpt;
/**
* @thresh:
*
* Threshold value for which to check.
*/
__u32 thresh;
};
/**
* struct drm_tegra_submit - job submission structure
*/
struct drm_tegra_submit {
/**
* @context:
*
* The application context identifying the channel to use for the
* execution of this job.
*/
__u64 context;
__u32 num_syncpts;
__u32 num_cmdbufs;
__u32 num_relocs;
__u32 num_waitchks;
__u32 waitchk_mask;
__u32 timeout;
__u64 syncpts;
__u64 cmdbufs;
__u64 relocs;
__u64 waitchks;
__u32 fence; /* Return value */
__u32 reserved[5]; /* future expansion */
/**
* @num_syncpts:
*
* The number of syncpoints operated on by this job. This defines the
* length of the array pointed to by @syncpts.
*/
__u32 num_syncpts;
/**
* @num_cmdbufs:
*
* The number of command buffers to execute as part of this job. This
* defines the length of the array pointed to by @cmdbufs.
*/
__u32 num_cmdbufs;
/**
* @num_relocs:
*
* The number of relocations to perform before executing this job.
* This defines the length of the array pointed to by @relocs.
*/
__u32 num_relocs;
/**
* @num_waitchks:
*
* The number of wait checks to perform as part of this job. This
* defines the length of the array pointed to by @waitchks.
*/
__u32 num_waitchks;
/**
* @waitchk_mask:
*
* Bitmask of valid wait checks.
*/
__u32 waitchk_mask;
/**
* @timeout:
*
* Timeout, in milliseconds, before this job is cancelled.
*/
__u32 timeout;
/**
* @syncpts:
*
* A pointer to an array of &struct drm_tegra_syncpt structures that
* specify the syncpoint operations performed as part of this job.
* The number of elements in the array must be equal to the value
* given by @num_syncpts.
*/
__u64 syncpts;
/**
* @cmdbufs:
*
* A pointer to an array of &struct drm_tegra_cmdbuf structures that
* define the command buffers to execute as part of this job. The
* number of elements in the array must be equal to the value given
* by @num_cmdbufs.
*/
__u64 cmdbufs;
/**
* @relocs:
*
* A pointer to an array of &struct drm_tegra_reloc structures that
* specify the relocations that need to be performed before executing
* this job. The number of elements in the array must be equal to the
* value given by @num_relocs.
*/
__u64 relocs;
/**
* @waitchks:
*
* A pointer to an array of &struct drm_tegra_waitchk structures that
* specify the wait checks to be performed while executing this job.
* The number of elements in the array must be equal to the value
* given by @num_waitchks.
*/
__u64 waitchks;
/**
* @fence:
*
* The threshold of the syncpoint associated with this job after it
* has been completed. Set by the kernel upon successful completion of
* the IOCTL. This can be used with the DRM_TEGRA_SYNCPT_WAIT IOCTL to
* wait for this job to be finished.
*/
__u32 fence;
/**
* @reserved:
*
* This field is reserved for future use. Must be 0.
*/
__u32 reserved[5];
};
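
The documented pairing of each count field with its array pointer in struct drm_tegra_submit can be illustrated as follows. This is a minimal sketch, not the driver's or libdrm's code: the `tegra_*` structs are local mirrors of the uAPI layout built on `<stdint.h>` types, and `tegra_submit_init()` is a hypothetical helper. Pointers are stored in the 64-bit fields through `uintptr_t`, and everything not set explicitly (including `reserved`) is zeroed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of the uAPI structs, for illustration only. */
struct tegra_syncpt { uint32_t id; uint32_t incrs; };
struct tegra_cmdbuf { uint32_t handle; uint32_t offset; uint32_t words; uint32_t pad; };

struct tegra_submit {
    uint64_t context;
    uint32_t num_syncpts;
    uint32_t num_cmdbufs;
    uint32_t num_relocs;
    uint32_t num_waitchks;
    uint32_t waitchk_mask;
    uint32_t timeout;
    uint64_t syncpts;
    uint64_t cmdbufs;
    uint64_t relocs;
    uint64_t waitchks;
    uint32_t fence;
    uint32_t reserved[5];
};

/* Hypothetical helper: fills a submit request with no relocations and no
 * wait checks. Each num_* field matches the length of the array it describes. */
static void tegra_submit_init(struct tegra_submit *s, uint64_t context,
                              const struct tegra_syncpt *syncpts, uint32_t num_syncpts,
                              const struct tegra_cmdbuf *cmdbufs, uint32_t num_cmdbufs,
                              uint32_t timeout_ms)
{
    memset(s, 0, sizeof(*s));               /* clears fence, reserved[] and unused arrays */
    s->context = context;
    s->num_syncpts = num_syncpts;
    s->syncpts = (uint64_t)(uintptr_t)syncpts;
    s->num_cmdbufs = num_cmdbufs;
    s->cmdbufs = (uint64_t)(uintptr_t)cmdbufs;
    s->timeout = timeout_ms;
}
```

After a successful submit the kernel writes `fence`, which the header says can be passed as the threshold to the DRM_TEGRA_SYNCPT_WAIT IOCTL.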
#define DRM_TEGRA_GEM_TILING_MODE_PITCH 0
#define DRM_TEGRA_GEM_TILING_MODE_TILED 1
#define DRM_TEGRA_GEM_TILING_MODE_BLOCK 2
/**
* struct drm_tegra_gem_set_tiling - parameters for the set tiling IOCTL
*/
struct drm_tegra_gem_set_tiling {
/* input */
/**
* @handle:
*
* Handle to the GEM object for which to set the tiling parameters.
*/
__u32 handle;
/**
* @mode:
*
* The tiling mode to set. Must be one of:
*
* DRM_TEGRA_GEM_TILING_MODE_PITCH
* pitch linear format
*
* DRM_TEGRA_GEM_TILING_MODE_TILED
* 16x16 tiling format
*
* DRM_TEGRA_GEM_TILING_MODE_BLOCK
* 16Bx2 tiling format
*/
__u32 mode;
/**
* @value:
*
* The value to set for the tiling mode parameter.
*/
__u32 value;
/**
* @pad:
*
* Structure padding that may be used in the future. Must be 0.
*/
__u32 pad;
};
/**
* struct drm_tegra_gem_get_tiling - parameters for the get tiling IOCTL
*/
struct drm_tegra_gem_get_tiling {
/* input */
/**
* @handle:
*
* Handle to the GEM object for which to query the tiling parameters.
*/
__u32 handle;
/* output */
/**
* @mode:
*
* The tiling mode currently associated with the GEM object. Set by
* the kernel upon successful completion of the IOCTL.
*/
__u32 mode;
/**
* @value:
*
* The tiling mode parameter currently associated with the GEM object.
* Set by the kernel upon successful completion of the IOCTL.
*/
__u32 value;
/**
* @pad:
*
* Structure padding that may be used in the future. Must be 0.
*/
__u32 pad;
};
#define DRM_TEGRA_GEM_BOTTOM_UP (1 << 0)
#define DRM_TEGRA_GEM_FLAGS (DRM_TEGRA_GEM_BOTTOM_UP)
/**
* struct drm_tegra_gem_set_flags - parameters for the set flags IOCTL
*/
struct drm_tegra_gem_set_flags {
/* input */
/**
* @handle:
*
* Handle to the GEM object for which to set the flags.
*/
__u32 handle;
/* input */
/**
* @flags:
*
* The flags to set for the GEM object.
*/
__u32 flags;
};
/**
* struct drm_tegra_gem_get_flags - parameters for the get flags IOCTL
*/
struct drm_tegra_gem_get_flags {
/* input */
/**
* @handle:
*
* Handle to the GEM object for which to query the flags.
*/
__u32 handle;
/* output */
/**
* @flags:
*
* The flags currently associated with the GEM object. Set by the
* kernel upon successful completion of the IOCTL.
*/
__u32 flags;
};
@@ -193,7 +665,7 @@ struct drm_tegra_gem_get_flags {
#define DRM_IOCTL_TEGRA_SYNCPT_INCR DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_SYNCPT_INCR, struct drm_tegra_syncpt_incr)
#define DRM_IOCTL_TEGRA_SYNCPT_WAIT DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_SYNCPT_WAIT, struct drm_tegra_syncpt_wait)
#define DRM_IOCTL_TEGRA_OPEN_CHANNEL DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_OPEN_CHANNEL, struct drm_tegra_open_channel)
#define DRM_IOCTL_TEGRA_CLOSE_CHANNEL DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_CLOSE_CHANNEL, struct drm_tegra_open_channel)
#define DRM_IOCTL_TEGRA_CLOSE_CHANNEL DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_CLOSE_CHANNEL, struct drm_tegra_close_channel)
#define DRM_IOCTL_TEGRA_GET_SYNCPT DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GET_SYNCPT, struct drm_tegra_get_syncpt)
#define DRM_IOCTL_TEGRA_SUBMIT DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_SUBMIT, struct drm_tegra_submit)
#define DRM_IOCTL_TEGRA_GET_SYNCPT_BASE DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GET_SYNCPT_BASE, struct drm_tegra_get_syncpt_base)
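
The DRM_IOCTL_TEGRA_CLOSE_CHANNEL change above swaps the argument type from struct drm_tegra_open_channel to struct drm_tegra_close_channel. Assuming DRM_IOWR follows the usual Linux _IOWR convention of encoding the argument size in the request number, the two types are not interchangeable, because they differ in size. The sketch below uses local mirrors of the two structs (names are illustrative) to make that difference visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of the two uAPI structs, for illustration only. */
struct tegra_open_channel {
    uint32_t client;
    uint32_t pad;
    uint64_t context;
};

struct tegra_close_channel {
    uint64_t context;
};

/* Sizes that an _IOWR-style macro would encode for each argument type. */
static size_t open_channel_arg_size(void)
{
    return sizeof(struct tegra_open_channel);
}

static size_t close_channel_arg_size(void)
{
    return sizeof(struct tegra_close_channel);
}
```

With these layouts the open-channel argument is 16 bytes and the close-channel argument is 8 bytes, so the old definition described a larger argument than the close IOCTL actually takes.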


@@ -52,6 +52,14 @@ extern "C" {
*
* This asks the kernel to have the GPU execute an optional binner
* command list, and a render command list.
*
* The L1T, slice, L2C, L2T, and GCA caches will be flushed before
* each CL executes. The VCD cache should be flushed (if necessary)
* by the submitted CLs. The TLB writes are guaranteed to have been
* flushed by the time the render done IRQ happens, which is the
* trigger for out_sync. Any dirtying of cachelines by the job (only
* possible using TMU writes) must be flushed by the caller using the
* CL's cache flush commands.
*/
struct drm_v3d_submit_cl {
/* Pointer to the binner command list.


@@ -18,7 +18,6 @@
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
inc_drm_uapi = include_directories('drm-uapi')
inc_vulkan = include_directories('vulkan')
inc_d3d9 = include_directories('D3D9')
inc_gl_internal = include_directories('GL/internal')
@@ -94,14 +93,19 @@ if with_gallium_opencl and not with_opencl_icd
install_headers(
'CL/cl.h',
'CL/cl.hpp',
'CL/cl2.hpp',
'CL/cl_d3d10.h',
'CL/cl_d3d11.h',
'CL/cl_dx9_media_sharing.h',
'CL/cl_dx9_media_sharing_intel.h',
'CL/cl_egl.h',
'CL/cl_ext.h',
'CL/cl_ext_intel.h',
'CL/cl_gl.h',
'CL/cl_gl_ext.h',
'CL/cl_platform.h',
'CL/cl_va_api_media_sharing_intel.h',
'CL/cl_version.h',
'CL/opencl.h',
subdir: 'CL'
)


@@ -1,3 +1,4 @@
#ifndef IRIS
CHIPSET(0x29A2, i965, "Intel(R) 965G")
CHIPSET(0x2992, i965, "Intel(R) 965Q")
CHIPSET(0x2982, i965, "Intel(R) 965G")
@@ -91,6 +92,11 @@ CHIPSET(0x0F32, byt, "Intel(R) Bay Trail")
CHIPSET(0x0F33, byt, "Intel(R) Bay Trail")
CHIPSET(0x0157, byt, "Intel(R) Bay Trail")
CHIPSET(0x0155, byt, "Intel(R) Bay Trail")
CHIPSET(0x22B0, chv, "Intel(R) HD Graphics (Cherrytrail)")
CHIPSET(0x22B1, chv, "Intel(R) HD Graphics XXX (Braswell)") /* Overridden in brw_get_renderer_string */
CHIPSET(0x22B2, chv, "Intel(R) HD Graphics (Cherryview)")
CHIPSET(0x22B3, chv, "Intel(R) HD Graphics (Cherryview)")
#endif
CHIPSET(0x1602, bdw_gt1, "Intel(R) Broadwell GT1")
CHIPSET(0x1606, bdw_gt1, "Intel(R) Broadwell GT1")
CHIPSET(0x160A, bdw_gt1, "Intel(R) Broadwell GT1")
@@ -109,10 +115,6 @@ CHIPSET(0x162A, bdw_gt3, "Intel(R) Iris Pro P6300 (Broadwell GT3e)")
CHIPSET(0x162B, bdw_gt3, "Intel(R) Iris 6100 (Broadwell GT3)")
CHIPSET(0x162D, bdw_gt3, "Intel(R) Broadwell GT3")
CHIPSET(0x162E, bdw_gt3, "Intel(R) Broadwell GT3")
CHIPSET(0x22B0, chv, "Intel(R) HD Graphics (Cherrytrail)")
CHIPSET(0x22B1, chv, "Intel(R) HD Graphics XXX (Braswell)") /* Overridden in brw_get_renderer_string */
CHIPSET(0x22B2, chv, "Intel(R) HD Graphics (Cherryview)")
CHIPSET(0x22B3, chv, "Intel(R) HD Graphics (Cherryview)")
CHIPSET(0x1902, skl_gt1, "Intel(R) HD Graphics 510 (Skylake GT1)")
CHIPSET(0x1906, skl_gt1, "Intel(R) HD Graphics 510 (Skylake GT1)")
CHIPSET(0x190A, skl_gt1, "Intel(R) Skylake GT1")
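
The `#ifndef IRIS` guard added above lets one consumer of the CHIPSET() list skip the older entries at preprocessing time. The sketch below imitates that pattern in a single file rather than through an included ID list. It is an assumption-laden illustration: `GUARDED_CHIPSETS`, `ALL_CHIPSETS`, `struct chipset_name` and `chipset_name_for()` are invented names, and only three IDs from the hunks above are reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Defining IRIS here stands in for a consumer that wants the guarded
 * entries left out; remove it to keep them. */
#define IRIS 1

#ifndef IRIS
#define GUARDED_CHIPSETS(X) \
    X(0x22B0, chv, "Intel(R) HD Graphics (Cherrytrail)")
#else
#define GUARDED_CHIPSETS(X)
#endif

/* Each entry is (PCI ID, family token, marketing name). */
#define ALL_CHIPSETS(X) \
    GUARDED_CHIPSETS(X) \
    X(0x1602, bdw_gt1, "Intel(R) Broadwell GT1") \
    X(0x1902, skl_gt1, "Intel(R) HD Graphics 510 (Skylake GT1)")

struct chipset_name {
    uint32_t pci_id;
    const char *name;
};

/* Expand the list into a lookup table; the family token is unused here. */
#define AS_TABLE_ENTRY(id, family, str) { id, str },
static const struct chipset_name chipset_names[] = {
    ALL_CHIPSETS(AS_TABLE_ENTRY)
};
#undef AS_TABLE_ENTRY

/* Returns the name for a PCI ID, or NULL if the ID is not in the table. */
static const char *chipset_name_for(uint32_t pci_id)
{
    for (size_t i = 0; i < sizeof(chipset_names) / sizeof(chipset_names[0]); i++) {
        if (chipset_names[i].pci_id == pci_id)
            return chipset_names[i].name;
    }
    return NULL;
}
```

With IRIS defined, looking up 0x22B0 finds nothing while the Broadwell and Skylake IDs still resolve.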


@@ -43,7 +43,7 @@ extern "C" {
#define VK_VERSION_MINOR(version) (((uint32_t)(version) >> 12) & 0x3ff)
#define VK_VERSION_PATCH(version) ((uint32_t)(version) & 0xfff)
// Version of this file
#define VK_HEADER_VERSION 97
#define VK_HEADER_VERSION 101
#define VK_NULL_HANDLE 0
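
The version macros in this hunk unpack a 32-bit value that carries a 12-bit patch field, a 10-bit minor field and the major number in the remaining high bits. As a self-contained sketch, the `SKETCH_*` macros below restate that packing without including the Vulkan header. The macro names, `struct version_triple` and `decode_version()` are illustrative and not part of the API.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit layout as the VK_VERSION_* macros shown in the hunk above. */
#define SKETCH_MAKE_VERSION(major, minor, patch) \
    (((uint32_t)(major) << 22) | ((uint32_t)(minor) << 12) | (uint32_t)(patch))
#define SKETCH_VERSION_MAJOR(v) ((uint32_t)(v) >> 22)
#define SKETCH_VERSION_MINOR(v) (((uint32_t)(v) >> 12) & 0x3ff)
#define SKETCH_VERSION_PATCH(v) ((uint32_t)(v) & 0xfff)

struct version_triple {
    uint32_t ver_major;
    uint32_t ver_minor;
    uint32_t ver_patch;
};

/* Splits a packed version number into its three components. */
static struct version_triple decode_version(uint32_t packed)
{
    struct version_triple t;
    t.ver_major = SKETCH_VERSION_MAJOR(packed);
    t.ver_minor = SKETCH_VERSION_MINOR(packed);
    t.ver_patch = SKETCH_VERSION_PATCH(packed);
    return t;
}
```

Packing major 1, minor 1 and the new header version 101 and then decoding the result returns the same three numbers.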
@@ -349,6 +349,8 @@ typedef enum VkStructureType {
VK_STRUCTURE_TYPE_PIPELINE_DISCARD_RECTANGLE_STATE_CREATE_INFO_EXT = 1000099001,
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_CONSERVATIVE_RASTERIZATION_PROPERTIES_EXT = 1000101000,
VK_STRUCTURE_TYPE_PIPELINE_RASTERIZATION_CONSERVATIVE_STATE_CREATE_INFO_EXT = 1000101001,
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_DEPTH_CLIP_ENABLE_FEATURES_EXT = 1000102000,
VK_STRUCTURE_TYPE_PIPELINE_RASTERIZATION_DEPTH_CLIP_STATE_CREATE_INFO_EXT = 1000102001,
VK_STRUCTURE_TYPE_HDR_METADATA_EXT = 1000105000,
VK_STRUCTURE_TYPE_ATTACHMENT_DESCRIPTION_2_KHR = 1000109000,
VK_STRUCTURE_TYPE_ATTACHMENT_REFERENCE_2_KHR = 1000109001,
@@ -431,6 +433,8 @@ typedef enum VkStructureType {
VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_INFO_NV = 1000165012,
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_REPRESENTATIVE_FRAGMENT_TEST_FEATURES_NV = 1000166000,
VK_STRUCTURE_TYPE_PIPELINE_REPRESENTATIVE_FRAGMENT_TEST_STATE_CREATE_INFO_NV = 1000166001,
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_IMAGE_VIEW_IMAGE_FORMAT_INFO_EXT = 1000170000,
VK_STRUCTURE_TYPE_FILTER_CUBIC_IMAGE_VIEW_IMAGE_FORMAT_PROPERTIES_EXT = 1000170001,
VK_STRUCTURE_TYPE_DEVICE_QUEUE_GLOBAL_PRIORITY_CREATE_INFO_EXT = 1000174000,
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_8BIT_STORAGE_FEATURES_KHR = 1000177000,
VK_STRUCTURE_TYPE_IMPORT_MEMORY_HOST_POINTER_INFO_EXT = 1000178000,
@@ -466,11 +470,15 @@ typedef enum VkStructureType {
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_MEMORY_BUDGET_PROPERTIES_EXT = 1000237000,
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_MEMORY_PRIORITY_FEATURES_EXT = 1000238000,
VK_STRUCTURE_TYPE_MEMORY_PRIORITY_ALLOCATE_INFO_EXT = 1000238001,
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_DEDICATED_ALLOCATION_IMAGE_ALIASING_FEATURES_NV = 1000240000,
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_BUFFER_ADDRESS_FEATURES_EXT = 1000244000,
VK_STRUCTURE_TYPE_BUFFER_DEVICE_ADDRESS_INFO_EXT = 1000244001,
VK_STRUCTURE_TYPE_BUFFER_DEVICE_ADDRESS_CREATE_INFO_EXT = 1000244002,
VK_STRUCTURE_TYPE_IMAGE_STENCIL_USAGE_CREATE_INFO_EXT = 1000246000,
VK_STRUCTURE_TYPE_VALIDATION_FEATURES_EXT = 1000247000,
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_COOPERATIVE_MATRIX_FEATURES_NV = 1000249000,
VK_STRUCTURE_TYPE_COOPERATIVE_MATRIX_PROPERTIES_NV = 1000249001,
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_COOPERATIVE_MATRIX_PROPERTIES_NV = 1000249002,
VK_STRUCTURE_TYPE_DEBUG_REPORT_CREATE_INFO_EXT = VK_STRUCTURE_TYPE_DEBUG_REPORT_CALLBACK_CREATE_INFO_EXT,
VK_STRUCTURE_TYPE_RENDER_PASS_MULTIVIEW_CREATE_INFO_KHR = VK_STRUCTURE_TYPE_RENDER_PASS_MULTIVIEW_CREATE_INFO,
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_MULTIVIEW_FEATURES_KHR = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_MULTIVIEW_FEATURES,
@@ -1141,6 +1149,7 @@ typedef enum VkFilter {
VK_FILTER_NEAREST = 0,
VK_FILTER_LINEAR = 1,
VK_FILTER_CUBIC_IMG = 1000015000,
VK_FILTER_CUBIC_EXT = VK_FILTER_CUBIC_IMG,
VK_FILTER_BEGIN_RANGE = VK_FILTER_NEAREST,
VK_FILTER_END_RANGE = VK_FILTER_LINEAR,
VK_FILTER_RANGE_SIZE = (VK_FILTER_LINEAR - VK_FILTER_NEAREST + 1),
@@ -1352,6 +1361,7 @@ typedef enum VkFormatFeatureFlagBits {
VK_FORMAT_FEATURE_SAMPLED_IMAGE_YCBCR_CONVERSION_CHROMA_RECONSTRUCTION_EXPLICIT_FORCEABLE_BIT_KHR = VK_FORMAT_FEATURE_SAMPLED_IMAGE_YCBCR_CONVERSION_CHROMA_RECONSTRUCTION_EXPLICIT_FORCEABLE_BIT,
VK_FORMAT_FEATURE_DISJOINT_BIT_KHR = VK_FORMAT_FEATURE_DISJOINT_BIT,
VK_FORMAT_FEATURE_COSITED_CHROMA_SAMPLES_BIT_KHR = VK_FORMAT_FEATURE_COSITED_CHROMA_SAMPLES_BIT,
VK_FORMAT_FEATURE_SAMPLED_IMAGE_FILTER_CUBIC_BIT_EXT = VK_FORMAT_FEATURE_SAMPLED_IMAGE_FILTER_CUBIC_BIT_IMG,
VK_FORMAT_FEATURE_FLAG_BITS_MAX_ENUM = 0x7FFFFFFF
} VkFormatFeatureFlagBits;
typedef VkFlags VkFormatFeatureFlags;
@@ -6244,7 +6254,7 @@ typedef struct VkPhysicalDeviceDepthStencilResolvePropertiesKHR {
#define VK_KHR_vulkan_memory_model 1
#define VK_KHR_VULKAN_MEMORY_MODEL_SPEC_VERSION 2
#define VK_KHR_VULKAN_MEMORY_MODEL_SPEC_VERSION 3
#define VK_KHR_VULKAN_MEMORY_MODEL_EXTENSION_NAME "VK_KHR_vulkan_memory_model"
typedef struct VkPhysicalDeviceVulkanMemoryModelFeaturesKHR {
@@ -6252,6 +6262,7 @@ typedef struct VkPhysicalDeviceVulkanMemoryModelFeaturesKHR {
void* pNext;
VkBool32 vulkanMemoryModel;
VkBool32 vulkanMemoryModelDeviceScope;
VkBool32 vulkanMemoryModelAvailabilityVisibilityChains;
} VkPhysicalDeviceVulkanMemoryModelFeaturesKHR;
@@ -7456,6 +7467,27 @@ typedef struct VkPipelineRasterizationConservativeStateCreateInfoEXT {
#define VK_EXT_depth_clip_enable 1
#define VK_EXT_DEPTH_CLIP_ENABLE_SPEC_VERSION 1
#define VK_EXT_DEPTH_CLIP_ENABLE_EXTENSION_NAME "VK_EXT_depth_clip_enable"
typedef VkFlags VkPipelineRasterizationDepthClipStateCreateFlagsEXT;
typedef struct VkPhysicalDeviceDepthClipEnableFeaturesEXT {
VkStructureType sType;
void* pNext;
VkBool32 depthClipEnable;
} VkPhysicalDeviceDepthClipEnableFeaturesEXT;
typedef struct VkPipelineRasterizationDepthClipStateCreateInfoEXT {
VkStructureType sType;
const void* pNext;
VkPipelineRasterizationDepthClipStateCreateFlagsEXT flags;
VkBool32 depthClipEnable;
} VkPipelineRasterizationDepthClipStateCreateInfoEXT;
#define VK_EXT_swapchain_colorspace 1
#define VK_EXT_SWAPCHAIN_COLOR_SPACE_SPEC_VERSION 3
#define VK_EXT_SWAPCHAIN_COLOR_SPACE_EXTENSION_NAME "VK_EXT_swapchain_colorspace"
@@ -8551,6 +8583,25 @@ typedef struct VkPipelineRepresentativeFragmentTestStateCreateInfoNV {
#define VK_EXT_filter_cubic 1
#define VK_EXT_FILTER_CUBIC_SPEC_VERSION 1
#define VK_EXT_FILTER_CUBIC_EXTENSION_NAME "VK_EXT_filter_cubic"
typedef struct VkPhysicalDeviceImageViewImageFormatInfoEXT {
VkStructureType sType;
void* pNext;
VkImageViewType imageViewType;
} VkPhysicalDeviceImageViewImageFormatInfoEXT;
typedef struct VkFilterCubicImageViewImageFormatPropertiesEXT {
VkStructureType sType;
void* pNext;
VkBool32 filterCubic;
VkBool32 filterCubicMinmax;
} VkFilterCubicImageViewImageFormatPropertiesEXT;
#define VK_EXT_global_priority 1
#define VK_EXT_GLOBAL_PRIORITY_SPEC_VERSION 2
#define VK_EXT_GLOBAL_PRIORITY_EXTENSION_NAME "VK_EXT_global_priority"
@@ -9003,6 +9054,18 @@ typedef struct VkMemoryPriorityAllocateInfoEXT {
#define VK_NV_dedicated_allocation_image_aliasing 1
#define VK_NV_DEDICATED_ALLOCATION_IMAGE_ALIASING_SPEC_VERSION 1
#define VK_NV_DEDICATED_ALLOCATION_IMAGE_ALIASING_EXTENSION_NAME "VK_NV_dedicated_allocation_image_aliasing"
typedef struct VkPhysicalDeviceDedicatedAllocationImageAliasingFeaturesNV {
VkStructureType sType;
void* pNext;
VkBool32 dedicatedAllocationImageAliasing;
} VkPhysicalDeviceDedicatedAllocationImageAliasingFeaturesNV;
#define VK_EXT_buffer_device_address 1
typedef uint64_t VkDeviceAddress;
@@ -9089,6 +9152,76 @@ typedef struct VkValidationFeaturesEXT {
#define VK_NV_cooperative_matrix 1
#define VK_NV_COOPERATIVE_MATRIX_SPEC_VERSION 1
#define VK_NV_COOPERATIVE_MATRIX_EXTENSION_NAME "VK_NV_cooperative_matrix"
typedef enum VkComponentTypeNV {
VK_COMPONENT_TYPE_FLOAT16_NV = 0,
VK_COMPONENT_TYPE_FLOAT32_NV = 1,
VK_COMPONENT_TYPE_FLOAT64_NV = 2,
VK_COMPONENT_TYPE_SINT8_NV = 3,
VK_COMPONENT_TYPE_SINT16_NV = 4,
VK_COMPONENT_TYPE_SINT32_NV = 5,
VK_COMPONENT_TYPE_SINT64_NV = 6,
VK_COMPONENT_TYPE_UINT8_NV = 7,
VK_COMPONENT_TYPE_UINT16_NV = 8,
VK_COMPONENT_TYPE_UINT32_NV = 9,
VK_COMPONENT_TYPE_UINT64_NV = 10,
VK_COMPONENT_TYPE_BEGIN_RANGE_NV = VK_COMPONENT_TYPE_FLOAT16_NV,
VK_COMPONENT_TYPE_END_RANGE_NV = VK_COMPONENT_TYPE_UINT64_NV,
VK_COMPONENT_TYPE_RANGE_SIZE_NV = (VK_COMPONENT_TYPE_UINT64_NV - VK_COMPONENT_TYPE_FLOAT16_NV + 1),
VK_COMPONENT_TYPE_MAX_ENUM_NV = 0x7FFFFFFF
} VkComponentTypeNV;
typedef enum VkScopeNV {
VK_SCOPE_DEVICE_NV = 1,
VK_SCOPE_WORKGROUP_NV = 2,
VK_SCOPE_SUBGROUP_NV = 3,
VK_SCOPE_QUEUE_FAMILY_NV = 5,
VK_SCOPE_BEGIN_RANGE_NV = VK_SCOPE_DEVICE_NV,
VK_SCOPE_END_RANGE_NV = VK_SCOPE_QUEUE_FAMILY_NV,
VK_SCOPE_RANGE_SIZE_NV = (VK_SCOPE_QUEUE_FAMILY_NV - VK_SCOPE_DEVICE_NV + 1),
VK_SCOPE_MAX_ENUM_NV = 0x7FFFFFFF
} VkScopeNV;
typedef struct VkCooperativeMatrixPropertiesNV {
VkStructureType sType;
void* pNext;
uint32_t MSize;
uint32_t NSize;
uint32_t KSize;
VkComponentTypeNV AType;
VkComponentTypeNV BType;
VkComponentTypeNV CType;
VkComponentTypeNV DType;
VkScopeNV scope;
} VkCooperativeMatrixPropertiesNV;
typedef struct VkPhysicalDeviceCooperativeMatrixFeaturesNV {
VkStructureType sType;
void* pNext;
VkBool32 cooperativeMatrix;
VkBool32 cooperativeMatrixRobustBufferAccess;
} VkPhysicalDeviceCooperativeMatrixFeaturesNV;
typedef struct VkPhysicalDeviceCooperativeMatrixPropertiesNV {
VkStructureType sType;
void* pNext;
VkShaderStageFlags cooperativeMatrixSupportedStages;
} VkPhysicalDeviceCooperativeMatrixPropertiesNV;
typedef VkResult (VKAPI_PTR *PFN_vkGetPhysicalDeviceCooperativeMatrixPropertiesNV)(VkPhysicalDevice physicalDevice, uint32_t* pPropertyCount, VkCooperativeMatrixPropertiesNV* pProperties);
#ifndef VK_NO_PROTOTYPES
VKAPI_ATTR VkResult VKAPI_CALL vkGetPhysicalDeviceCooperativeMatrixPropertiesNV(
VkPhysicalDevice physicalDevice,
uint32_t* pPropertyCount,
VkCooperativeMatrixPropertiesNV* pProperties);
#endif
#ifdef __cplusplus
}
#endif
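
The new feature structs above all begin with an sType/pNext pair, which is what allows them to be linked into a chain and located by type. The sketch below walks such a chain. It is not taken from Mesa or the Vulkan loader: the struct and function names are invented, the layouts are simplified mirrors using `uint32_t` in place of VkStructureType and VkBool32, and only the two sType values are copied from the enum hunk above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* sType values copied from the VkStructureType additions above. */
enum {
    STYPE_DEPTH_CLIP_ENABLE_FEATURES = 1000102000,
    STYPE_COOPERATIVE_MATRIX_FEATURES = 1000249000
};

/* Common prefix shared by every chainable struct in this sketch. */
struct chain_header {
    uint32_t sType;
    void *pNext;
};

struct depth_clip_features {
    uint32_t sType;
    void *pNext;
    uint32_t depthClipEnable;
};

struct coop_matrix_features {
    uint32_t sType;
    void *pNext;
    uint32_t cooperativeMatrix;
    uint32_t cooperativeMatrixRobustBufferAccess;
};

/* Returns the first struct in the chain whose sType matches, or NULL.
 * The header is copied out with memcpy so no struct is accessed through a
 * pointer of a different type. */
static void *find_in_chain(void *chain, uint32_t wanted)
{
    void *p = chain;
    while (p != NULL) {
        struct chain_header hdr;
        memcpy(&hdr, p, sizeof(hdr));
        if (hdr.sType == wanted)
            return p;
        p = hdr.pNext;
    }
    return NULL;
}
```

A driver-side query handler could use such a lookup to find the feature struct it knows how to fill and ignore the rest of the chain.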


@@ -132,7 +132,7 @@ if _drivers.contains('auto')
elif ['arm', 'aarch64'].contains(host_machine.cpu_family())
_drivers = [
'kmsro', 'v3d', 'vc4', 'freedreno', 'etnaviv', 'nouveau',
'tegra', 'virgl', 'swrast',
'tegra', 'virgl', 'swrast'
]
else
error('Unknown architecture @0@. Please pass -Dgallium-drivers to set driver options. Patches gladly accepted to fix this.'.format(
@@ -154,8 +154,10 @@ with_gallium_freedreno = _drivers.contains('freedreno')
with_gallium_softpipe = _drivers.contains('swrast')
with_gallium_vc4 = _drivers.contains('vc4')
with_gallium_v3d = _drivers.contains('v3d')
with_gallium_panfrost = _drivers.contains('panfrost')
with_gallium_etnaviv = _drivers.contains('etnaviv')
with_gallium_tegra = _drivers.contains('tegra')
with_gallium_iris = _drivers.contains('iris')
with_gallium_i915 = _drivers.contains('i915')
with_gallium_svga = _drivers.contains('svga')
with_gallium_virgl = _drivers.contains('virgl')
@@ -209,8 +211,8 @@ endif
if with_dri_i915 and with_gallium_i915
error('Only one i915 provider can be built')
endif
if with_gallium_kmsro and not (with_gallium_vc4 or with_gallium_etnaviv or with_gallium_freedreno)
error('kmsro driver requires one or more renderonly drivers (vc4, etnaviv, freedreno)')
if with_gallium_kmsro and not (with_gallium_vc4 or with_gallium_etnaviv or with_gallium_freedreno or with_gallium_panfrost)
error('kmsro driver requires one or more renderonly drivers (vc4, etnaviv, freedreno, panfrost)')
endif
if with_gallium_tegra and not with_gallium_nouveau
error('tegra driver requires nouveau driver')
@@ -327,12 +329,12 @@ else
with_egl = false
endif
if with_egl and not (with_platform_drm or with_platform_surfaceless or with_platform_android)
if with_egl and not (with_platform_drm or with_platform_surfaceless)
if with_gallium_radeonsi
error('RadeonSI requires the drm, surfaceless or android platform when using EGL')
error('RadeonSI requires drm or surfaceless platform when using EGL')
endif
if with_gallium_virgl
error('Virgl requires the drm, surfaceless or android platform when using EGL')
error('Virgl requires drm or surfaceless platform when using EGL')
endif
endif
@@ -616,7 +618,8 @@ if with_gallium_st_nine
if not with_gallium_softpipe
error('The nine state tracker requires gallium softpipe/llvmpipe.')
elif not (with_gallium_radeonsi or with_gallium_nouveau or with_gallium_r600
or with_gallium_r300 or with_gallium_svga or with_gallium_i915)
or with_gallium_r300 or with_gallium_svga or with_gallium_i915
or with_gallium_iris)
error('The nine state tracker requires at least one non-swrast gallium driver.')
endif
if not with_dri3
@@ -1213,7 +1216,6 @@ if _llvm != 'false'
with_gallium_opencl or _llvm == 'true'
),
static : not _shared_llvm,
method : 'config-tool',
)
with_llvm = dep_llvm.found()
endif
@@ -1388,14 +1390,12 @@ if with_platform_x11
dep_xshmfence = dependency('xshmfence', version : '>= 1.1')
endif
endif
if with_glx == 'dri' or with_glx == 'gallium-xlib'
dep_glproto = dependency('glproto', version : '>= 1.4.14')
endif
if with_glx == 'dri'
if with_glx == 'dri'
if with_dri_platform == 'drm'
dep_dri2proto = dependency('dri2proto', version : '>= 2.8')
dep_xxf86vm = dependency('xxf86vm')
endif
dep_glproto = dependency('glproto', version : '>= 1.4.14')
endif
if (with_egl or (
with_gallium_vdpau or with_gallium_xvmc or with_gallium_xa or
@@ -1467,6 +1467,10 @@ pkg = import('pkgconfig')
env_test = environment()
env_test.set('NM', find_program('nm').path())
# This quirk needs to be applied to sources with functions defined in assembly
# as GCC LTO drops them. See: https://bugs.freedesktop.org/show_bug.cgi?id=109391
gcc_lto_quirk = (cc.get_id() == 'gcc') ? ['-fno-lto'] : []
subdir('include')
subdir('bin')
subdir('src')


@@ -60,7 +60,7 @@ option(
choices : [
'', 'auto', 'kmsro', 'radeonsi', 'r300', 'r600', 'nouveau', 'freedreno',
'swrast', 'v3d', 'vc4', 'etnaviv', 'tegra', 'i915', 'svga', 'virgl',
'swr',
'swr', 'panfrost', 'iris'
],
description : 'List of gallium drivers to build. If this is set to auto all drivers applicable to the target OS/architecture will be built'
)
@@ -167,6 +167,12 @@ option(
value : '',
description : 'Location relative to prefix to put vulkan icds on install. Default: $datadir/vulkan/icd.d'
)
option(
'vulkan-overlay-layer',
type : 'boolean',
value : false,
description : 'Whether to build the vulkan overlay layer'
)
option(
'shared-glapi',
type : 'boolean',


@@ -48,12 +48,7 @@ import source_list
# a path directly. We want to support both, so we need to detect the SCons version,
# for which no API is provided by SCons 8-P
# Scons version string has consistently been in this format:
# MajorVersion.MinorVersion.Patch[.alpha/beta.yyyymmdd]
# so this formula should cover all versions regardless of type
# stable, alpha or beta.
# For simplicity alpha and beta flags are removed.
scons_version = tuple(map(int, SCons.__version__.split('.')[:3]))
scons_version = tuple(map(int, SCons.__version__.split('.')))
def quietCommandLines(env):
# Quiet command lines


@@ -308,20 +308,7 @@ def generate(env):
if env.GetOption('num_jobs') <= 1:
env.SetOption('num_jobs', num_jobs())
# Speed up dependency checking. See
# - https://github.com/SCons/scons/wiki/GoFastButton
# - https://bugs.freedesktop.org/show_bug.cgi?id=109443
# Scons version string has consistently been in this format:
# MajorVersion.MinorVersion.Patch[.alpha/beta.yyyymmdd]
# so this formula should cover all versions regardless of type
# stable, alpha or beta.
# For simplicity alpha and beta flags are removed.
scons_version = distutils.version.StrictVersion('.'.join(SCons.__version__.split('.')[:3]))
if scons_version < distutils.version.StrictVersion('3.0.2') or \
scons_version > distutils.version.StrictVersion('3.0.4'):
env.Decider('MD5-timestamp')
env.Decider('MD5-timestamp')
env.SetOption('max_drift', 60)
# C preprocessor options


@@ -136,3 +136,18 @@ libglsl_util_la_SOURCES = \
mesa/program/prog_parameter.c \
mesa/program/symbol_table.c \
mesa/program/dummy_errors.c
EXTRA_DIST += \
tools/imgui/imconfig.h \
tools/imgui/imgui.cpp \
tools/imgui/imgui.h \
tools/imgui/imgui_draw.cpp \
tools/imgui/imgui_demo.cpp \
tools/imgui/imgui_internal.h \
tools/imgui/imgui_memory_editor.h \
tools/imgui/stb_rect_pack.h \
tools/imgui/stb_textedit.h \
tools/imgui/stb_truetype.h \
tools/imgui/README \
tools/imgui/LICENSE.txt \
tools/imgui/meson.build


@@ -367,7 +367,9 @@ bool ac_query_gpu_info(int fd, amdgpu_device_handle dev,
info->has_syncobj_wait_for_submit = info->has_syncobj && info->drm_minor >= 20;
info->has_fence_to_handle = info->has_syncobj && info->drm_minor >= 21;
info->has_ctx_priority = info->drm_minor >= 22;
info->has_local_buffers = info->drm_minor >= 20;
/* TODO: Enable this once the kernel handles it efficiently. */
info->has_local_buffers = info->drm_minor >= 20 &&
!info->has_dedicated_vram;
info->kernel_flushes_hdp_before_ib = true;
info->htile_cmask_support_1d_tiling = true;
info->si_TA_CS_BC_BASE_ADDR_allowed = true;


@@ -172,6 +172,12 @@ static inline unsigned ac_get_max_simd_waves(enum radeon_family family)
}
}
static inline uint32_t
ac_get_num_physical_sgprs(enum chip_class chip_class)
{
return chip_class >= VI ? 800 : 512;
}
#ifdef __cplusplus
}
#endif


@@ -219,6 +219,16 @@ ac_to_integer_type(struct ac_llvm_context *ctx, LLVMTypeRef t)
return LLVMVectorType(to_integer_type_scalar(ctx, elem_type),
LLVMGetVectorSize(t));
}
if (LLVMGetTypeKind(t) == LLVMPointerTypeKind) {
switch (LLVMGetPointerAddressSpace(t)) {
case AC_ADDR_SPACE_GLOBAL:
return ctx->i64;
case AC_ADDR_SPACE_LDS:
return ctx->i32;
default:
unreachable("unhandled address space");
}
}
return to_integer_type_scalar(ctx, t);
}
@@ -226,6 +236,9 @@ LLVMValueRef
ac_to_integer(struct ac_llvm_context *ctx, LLVMValueRef v)
{
LLVMTypeRef type = LLVMTypeOf(v);
if (LLVMGetTypeKind(type) == LLVMPointerTypeKind) {
return LLVMBuildPtrToInt(ctx->builder, v, ac_to_integer_type(ctx, type), "");
}
return LLVMBuildBitCast(ctx->builder, v, ac_to_integer_type(ctx, type), "");
}
@@ -535,10 +548,11 @@ ac_build_gather_values(struct ac_llvm_context *ctx,
/* Expand a scalar or vector to <dst_channels x type> by filling the remaining
* channels with undef. Extract at most src_channels components from the input.
*/
LLVMValueRef ac_build_expand(struct ac_llvm_context *ctx,
LLVMValueRef value,
unsigned src_channels,
unsigned dst_channels)
static LLVMValueRef
ac_build_expand(struct ac_llvm_context *ctx,
LLVMValueRef value,
unsigned src_channels,
unsigned dst_channels)
{
LLVMTypeRef elemtype;
LLVMValueRef chan[dst_channels];
@@ -606,7 +620,7 @@ ac_build_fdiv(struct ac_llvm_context *ctx,
* If we do (num * (1 / den)), LLVM does:
* return num * v_rcp_f32(den);
*/
LLVMValueRef one = LLVMTypeOf(num) == ctx->f64 ? ctx->f64_1 : ctx->f32_1;
LLVMValueRef one = LLVMConstReal(LLVMTypeOf(num), 1.0);
LLVMValueRef rcp = LLVMBuildFDiv(ctx->builder, one, den, "");
LLVMValueRef ret = LLVMBuildFMul(ctx->builder, num, rcp, "");
@@ -1364,23 +1378,74 @@ ac_build_tbuffer_load_short(struct ac_llvm_context *ctx,
LLVMValueRef immoffset,
LLVMValueRef glc)
{
const char *name = "llvm.amdgcn.tbuffer.load.i32";
LLVMTypeRef type = ctx->i32;
LLVMValueRef params[] = {
rsrc,
vindex,
voffset,
soffset,
immoffset,
LLVMConstInt(ctx->i32, V_008F0C_BUF_DATA_FORMAT_16, false),
LLVMConstInt(ctx->i32, V_008F0C_BUF_NUM_FORMAT_UINT, false),
glc,
ctx->i1false,
};
LLVMValueRef res = ac_build_intrinsic(ctx, name, type, params, 9, 0);
unsigned dfmt = V_008F0C_BUF_DATA_FORMAT_16;
unsigned nfmt = V_008F0C_BUF_NUM_FORMAT_UINT;
LLVMValueRef res;
if (HAVE_LLVM >= 0x0800) {
voffset = LLVMBuildAdd(ctx->builder, voffset, immoffset, "");
res = ac_build_llvm8_tbuffer_load(ctx, rsrc, vindex, voffset,
soffset, 1, dfmt, nfmt, glc,
false, true, true);
} else {
const char *name = "llvm.amdgcn.tbuffer.load.i32";
LLVMTypeRef type = ctx->i32;
LLVMValueRef params[] = {
rsrc,
vindex,
voffset,
soffset,
immoffset,
LLVMConstInt(ctx->i32, dfmt, false),
LLVMConstInt(ctx->i32, nfmt, false),
glc,
ctx->i1false,
};
res = ac_build_intrinsic(ctx, name, type, params, 9, 0);
}
return LLVMBuildTrunc(ctx->builder, res, ctx->i16, "");
}
LLVMValueRef
ac_build_llvm8_tbuffer_load(struct ac_llvm_context *ctx,
LLVMValueRef rsrc,
LLVMValueRef vindex,
LLVMValueRef voffset,
LLVMValueRef soffset,
unsigned num_channels,
unsigned dfmt,
unsigned nfmt,
bool glc,
bool slc,
bool can_speculate,
bool structurized)
{
LLVMValueRef args[6];
int idx = 0;
args[idx++] = LLVMBuildBitCast(ctx->builder, rsrc, ctx->v4i32, "");
if (structurized)
args[idx++] = vindex ? vindex : ctx->i32_0;
args[idx++] = voffset ? voffset : ctx->i32_0;
args[idx++] = soffset ? soffset : ctx->i32_0;
args[idx++] = LLVMConstInt(ctx->i32, dfmt | (nfmt << 4), 0);
args[idx++] = LLVMConstInt(ctx->i32, (glc ? 1 : 0) + (slc ? 2 : 0), 0);
unsigned func = CLAMP(num_channels, 1, 3) - 1;
LLVMTypeRef types[] = {ctx->i32, ctx->v2i32, ctx->v4i32};
const char *type_names[] = {"i32", "v2i32", "v4i32"};
const char *indexing_kind = structurized ? "struct" : "raw";
char name[256];
snprintf(name, sizeof(name), "llvm.amdgcn.%s.tbuffer.load.%s",
indexing_kind, type_names[func]);
return ac_build_intrinsic(ctx, name, types[func], args,
idx,
ac_get_load_intr_attribs(can_speculate));
}
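The function above packs the two format enums into one immediate, folds glc/slc into a cache-policy immediate, and picks the intrinsic name from the clamped channel count. A standalone sketch of just that arithmetic and name construction, with no LLVM dependency; the helper names are hypothetical and the numeric format values used in any call are up to the caller:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Pack data format and numeric format as the hunk does: dfmt | (nfmt << 4). */
static unsigned pack_tbuffer_format(unsigned dfmt, unsigned nfmt)
{
    return dfmt | (nfmt << 4);
}

/* Cache policy immediate: bit 0 = glc, bit 1 = slc. */
static unsigned pack_cache_policy(bool glc, bool slc)
{
    return (glc ? 1u : 0u) + (slc ? 2u : 0u);
}

/* Build the intrinsic name. The channel count is clamped to 1..3 before
 * indexing, so both 3 and 4 channels select the v4i32 variant. */
static void tbuffer_load_name(char *buf, size_t size,
                              unsigned num_channels, bool structurized)
{
    static const char *type_names[] = { "i32", "v2i32", "v4i32" };
    unsigned clamped = num_channels < 1 ? 1 :
                       (num_channels > 3 ? 3 : num_channels);

    assert(size > 0);
    snprintf(buf, size, "llvm.amdgcn.%s.tbuffer.load.%s",
             structurized ? "struct" : "raw", type_names[clamped - 1]);
}
```

With structurized set and four channels, the name comes out as llvm.amdgcn.struct.tbuffer.load.v4i32.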
/**
* Set range metadata on an instruction. This can only be used on load and
* call instructions. If you know an instruction can only produce the values
@@ -1570,16 +1635,20 @@ ac_build_umsb(struct ac_llvm_context *ctx,
LLVMValueRef ac_build_fmin(struct ac_llvm_context *ctx, LLVMValueRef a,
LLVMValueRef b)
{
char name[64];
snprintf(name, sizeof(name), "llvm.minnum.f%d", ac_get_elem_bits(ctx, LLVMTypeOf(a)));
LLVMValueRef args[2] = {a, b};
return ac_build_intrinsic(ctx, "llvm.minnum.f32", ctx->f32, args, 2,
return ac_build_intrinsic(ctx, name, LLVMTypeOf(a), args, 2,
AC_FUNC_ATTR_READNONE);
}
LLVMValueRef ac_build_fmax(struct ac_llvm_context *ctx, LLVMValueRef a,
LLVMValueRef b)
{
char name[64];
snprintf(name, sizeof(name), "llvm.maxnum.f%d", ac_get_elem_bits(ctx, LLVMTypeOf(a)));
LLVMValueRef args[2] = {a, b};
return ac_build_intrinsic(ctx, "llvm.maxnum.f32", ctx->f32, args, 2,
return ac_build_intrinsic(ctx, name, LLVMTypeOf(a), args, 2,
AC_FUNC_ATTR_READNONE);
}
@@ -1606,8 +1675,9 @@ LLVMValueRef ac_build_umin(struct ac_llvm_context *ctx, LLVMValueRef a,
LLVMValueRef ac_build_clamp(struct ac_llvm_context *ctx, LLVMValueRef value)
{
return ac_build_fmin(ctx, ac_build_fmax(ctx, value, ctx->f32_0),
ctx->f32_1);
LLVMTypeRef t = LLVMTypeOf(value);
return ac_build_fmin(ctx, ac_build_fmax(ctx, value, LLVMConstReal(t, 0.0)),
LLVMConstReal(t, 1.0));
}
void ac_build_export(struct ac_llvm_context *ctx, struct ac_export_args *a)
@@ -2039,30 +2109,11 @@ LLVMValueRef ac_build_fract(struct ac_llvm_context *ctx, LLVMValueRef src0,
LLVMValueRef ac_build_isign(struct ac_llvm_context *ctx, LLVMValueRef src0,
unsigned bitsize)
{
LLVMValueRef cmp, val, zero, one;
LLVMTypeRef type;
switch (bitsize) {
case 64:
type = ctx->i64;
zero = ctx->i64_0;
one = ctx->i64_1;
break;
case 32:
type = ctx->i32;
zero = ctx->i32_0;
one = ctx->i32_1;
break;
case 16:
type = ctx->i16;
zero = ctx->i16_0;
one = ctx->i16_1;
break;
default:
unreachable(!"invalid bitsize");
break;
}
LLVMTypeRef type = LLVMIntTypeInContext(ctx->context, bitsize);
LLVMValueRef zero = LLVMConstInt(type, 0, false);
LLVMValueRef one = LLVMConstInt(type, 1, false);
LLVMValueRef cmp, val;
cmp = LLVMBuildICmp(ctx->builder, LLVMIntSGT, src0, zero, "");
val = LLVMBuildSelect(ctx->builder, cmp, one, src0, "");
cmp = LLVMBuildICmp(ctx->builder, LLVMIntSGE, val, zero, "");
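The compare/select pairs in ac_build_isign can be modelled on plain integers. The sketch below fixes the width at 32 bits for brevity, whereas the hunk builds the type from bitsize. The final select is cut off by the hunk boundary, so mapping negative inputs to -1 is an assumption here, and isign32 is a hypothetical name:

```c
#include <stdint.h>

/* Scalar model of the compare/select sequence. */
static int32_t isign32(int32_t x)
{
    int32_t val = x > 0 ? 1 : x;   /* first pair: positive values become 1 */
    return val >= 0 ? val : -1;    /* second pair (assumed): negatives become -1 */
}
```

Zero passes through both selects unchanged, so the result is always one of -1, 0 or 1.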
@@ -3455,7 +3506,7 @@ ac_build_wg_scan_bottom(struct ac_llvm_context *ctx, struct ac_wg_scan *ws)
/* ws->result_reduce is already the correct value */
if (ws->enable_inclusive)
ws->result_inclusive = ac_build_alu_op(ctx, ws->result_inclusive, ws->src, ws->op);
ws->result_inclusive = ac_build_alu_op(ctx, ws->result_exclusive, ws->src, ws->op);
if (ws->enable_exclusive)
ws->result_exclusive = ac_build_alu_op(ctx, ws->result_exclusive, ws->extra, ws->op);
}
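One side of this hunk derives the inclusive scan result from the exclusive result combined with the lane's own source value. The relationship is easy to check on a serial prefix sum. The following is a sketch using addition as the operator and arrays in place of lanes; both function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Exclusive prefix sum: out[i] = src[0] + ... + src[i-1], with out[0] = 0. */
static void exclusive_scan_add(const uint32_t *src, uint32_t *out, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = acc;
        acc += src[i];
    }
}

/* Inclusive result from the exclusive one: combine each element's
 * exclusive value with its own source value. */
static void inclusive_from_exclusive_add(const uint32_t *src,
                                         const uint32_t *excl,
                                         uint32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = excl[i] + src[i];
}
```

Applying the operator to an already inclusive value instead would count the element's own contribution twice.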


@@ -171,9 +171,6 @@ LLVMValueRef
ac_build_gather_values(struct ac_llvm_context *ctx,
LLVMValueRef *values,
unsigned value_count);
LLVMValueRef ac_build_expand(struct ac_llvm_context *ctx,
LLVMValueRef value,
unsigned src_channels, unsigned dst_channels);
LLVMValueRef ac_build_expand_to_vec4(struct ac_llvm_context *ctx,
LLVMValueRef value,
unsigned num_channels);
@@ -309,6 +306,20 @@ ac_build_tbuffer_load_short(struct ac_llvm_context *ctx,
LLVMValueRef immoffset,
LLVMValueRef glc);
LLVMValueRef
ac_build_llvm8_tbuffer_load(struct ac_llvm_context *ctx,
LLVMValueRef rsrc,
LLVMValueRef vindex,
LLVMValueRef voffset,
LLVMValueRef soffset,
unsigned num_channels,
unsigned dfmt,
unsigned nfmt,
bool glc,
bool slc,
bool can_speculate,
bool structurized);
LLVMValueRef
ac_get_thread_id(struct ac_llvm_context *ctx);


@@ -151,14 +151,13 @@ static LLVMTargetMachineRef ac_create_target_machine(enum radeon_family family,
LLVMTargetRef target = ac_get_llvm_target(triple);
snprintf(features, sizeof(features),
"+DumpCode,-fp32-denormals,+fp64-denormals%s%s%s%s%s%s",
"+DumpCode,-fp32-denormals,+fp64-denormals%s%s%s%s%s",
HAVE_LLVM >= 0x0800 ? "" : ",+vgpr-spilling",
tm_options & AC_TM_SISCHED ? ",+si-scheduler" : "",
tm_options & AC_TM_FORCE_ENABLE_XNACK ? ",+xnack" : "",
tm_options & AC_TM_FORCE_DISABLE_XNACK ? ",-xnack" : "",
tm_options & AC_TM_PROMOTE_ALLOCA_TO_SCRATCH ? ",-promote-alloca" : "",
tm_options & AC_TM_NO_LOAD_STORE_OPT ? ",-load-store-opt" : "");
tm_options & AC_TM_PROMOTE_ALLOCA_TO_SCRATCH ? ",-promote-alloca" : "");
LLVMTargetMachineRef tm = LLVMCreateTargetMachine(
target,
triple,


@@ -65,7 +65,6 @@ enum ac_target_machine_options {
AC_TM_CHECK_IR = (1 << 5),
AC_TM_ENABLE_GLOBAL_ISEL = (1 << 6),
AC_TM_CREATE_LOW_OPT = (1 << 7),
AC_TM_NO_LOAD_STORE_OPT = (1 << 8),
};
enum ac_float_mode {


@@ -657,8 +657,7 @@ static void visit_alu(struct ac_nir_context *ctx, const nir_alu_instr *instr)
break;
case nir_op_frcp:
src[0] = ac_to_float(&ctx->ac, src[0]);
result = ac_build_fdiv(&ctx->ac, instr->dest.dest.ssa.bit_size == 32 ? ctx->ac.f32_1 : ctx->ac.f64_1,
src[0]);
result = ac_build_fdiv(&ctx->ac, LLVMConstReal(LLVMTypeOf(src[0]), 1.0), src[0]);
break;
case nir_op_iand:
result = LLVMBuildAnd(ctx->ac.builder, src[0], src[1], "");
@@ -789,8 +788,7 @@ static void visit_alu(struct ac_nir_context *ctx, const nir_alu_instr *instr)
case nir_op_frsq:
result = emit_intrin_1f_param(&ctx->ac, "llvm.sqrt",
ac_to_float_type(&ctx->ac, def_type), src[0]);
result = ac_build_fdiv(&ctx->ac, instr->dest.dest.ssa.bit_size == 32 ? ctx->ac.f32_1 : ctx->ac.f64_1,
result);
result = ac_build_fdiv(&ctx->ac, LLVMConstReal(LLVMTypeOf(result), 1.0), result);
break;
case nir_op_frexp_exp:
src[0] = ac_to_float(&ctx->ac, src[0]);
@@ -803,6 +801,10 @@ static void visit_alu(struct ac_nir_context *ctx, const nir_alu_instr *instr)
result = ac_build_intrinsic(&ctx->ac, "llvm.amdgcn.frexp.mant.f64",
ctx->ac.f64, src, 1, AC_FUNC_ATTR_READNONE);
break;
case nir_op_fpow:
result = emit_intrin_2f_param(&ctx->ac, "llvm.pow",
ac_to_float_type(&ctx->ac, def_type), src[0], src[1]);
break;
case nir_op_fmax:
result = emit_intrin_2f_param(&ctx->ac, "llvm.maxnum",
ac_to_float_type(&ctx->ac, def_type), src[0], src[1]);
@@ -831,8 +833,10 @@ static void visit_alu(struct ac_nir_context *ctx, const nir_alu_instr *instr)
break;
case nir_op_ldexp:
src[0] = ac_to_float(&ctx->ac, src[0]);
if (ac_get_elem_bits(&ctx->ac, LLVMTypeOf(src[0])) == 32)
if (ac_get_elem_bits(&ctx->ac, def_type) == 32)
result = ac_build_intrinsic(&ctx->ac, "llvm.amdgcn.ldexp.f32", ctx->ac.f32, src, 2, AC_FUNC_ATTR_READNONE);
else if (ac_get_elem_bits(&ctx->ac, def_type) == 16)
result = ac_build_intrinsic(&ctx->ac, "llvm.amdgcn.ldexp.f16", ctx->ac.f16, src, 2, AC_FUNC_ATTR_READNONE);
else
result = ac_build_intrinsic(&ctx->ac, "llvm.amdgcn.ldexp.f64", ctx->ac.f64, src, 2, AC_FUNC_ATTR_READNONE);
break;
@@ -884,6 +888,8 @@ static void visit_alu(struct ac_nir_context *ctx, const nir_alu_instr *instr)
break;
case nir_op_f2f16_rtz:
src[0] = ac_to_float(&ctx->ac, src[0]);
if (LLVMTypeOf(src[0]) == ctx->ac.f64)
src[0] = LLVMBuildFPTrunc(ctx->ac.builder, src[0], ctx->ac.f32, "");
LLVMValueRef param[2] = { src[0], ctx->ac.f32_0 };
result = ac_build_cvt_pkrtz_f16(&ctx->ac, param);
result = LLVMBuildExtractElement(ctx->ac.builder, result, ctx->ac.i32_0, "");
@@ -1019,17 +1025,10 @@ static void visit_alu(struct ac_nir_context *ctx, const nir_alu_instr *instr)
LLVMValueRef in[3];
for (unsigned chan = 0; chan < 3; chan++)
in[chan] = ac_llvm_extract_elem(&ctx->ac, src[0], chan);
results[0] = ac_build_intrinsic(&ctx->ac, "llvm.amdgcn.cubesc",
results[0] = ac_build_intrinsic(&ctx->ac, "llvm.amdgcn.cubetc",
ctx->ac.f32, in, 3, AC_FUNC_ATTR_READNONE);
results[1] = ac_build_intrinsic(&ctx->ac, "llvm.amdgcn.cubetc",
results[1] = ac_build_intrinsic(&ctx->ac, "llvm.amdgcn.cubesc",
ctx->ac.f32, in, 3, AC_FUNC_ATTR_READNONE);
LLVMValueRef ma = ac_build_intrinsic(&ctx->ac, "llvm.amdgcn.cubema",
ctx->ac.f32, in, 3, AC_FUNC_ATTR_READNONE);
results[0] = ac_build_fdiv(&ctx->ac, results[0], ma);
results[1] = ac_build_fdiv(&ctx->ac, results[1], ma);
LLVMValueRef offset = LLVMConstReal(ctx->ac.f32, 0.5);
results[0] = LLVMBuildFAdd(ctx->ac.builder, results[0], offset, "");
results[1] = LLVMBuildFAdd(ctx->ac.builder, results[1], offset, "");
result = ac_build_gather_values(&ctx->ac, results, 2);
break;
}
@@ -1121,6 +1120,10 @@ static void visit_load_const(struct ac_nir_context *ctx,
for (unsigned i = 0; i < instr->def.num_components; ++i) {
switch (instr->def.bit_size) {
case 8:
values[i] = LLVMConstInt(element_type,
instr->value.u8[i], false);
break;
case 16:
values[i] = LLVMConstInt(element_type,
instr->value.u16[i], false);
@@ -1399,10 +1402,31 @@ static LLVMValueRef visit_load_push_constant(struct ac_nir_context *ctx,
nir_intrinsic_instr *instr)
{
LLVMValueRef ptr, addr;
LLVMValueRef src0 = get_src(ctx, instr->src[0]);
unsigned index = nir_intrinsic_base(instr);
addr = LLVMConstInt(ctx->ac.i32, nir_intrinsic_base(instr), 0);
addr = LLVMBuildAdd(ctx->ac.builder, addr,
get_src(ctx, instr->src[0]), "");
addr = LLVMConstInt(ctx->ac.i32, index, 0);
addr = LLVMBuildAdd(ctx->ac.builder, addr, src0, "");
/* Load constant values from user SGPRs when possible, otherwise
* fall back to the default path that loads directly from memory.
*/
if (LLVMIsConstant(src0) &&
instr->dest.ssa.bit_size == 32) {
unsigned count = instr->dest.ssa.num_components;
unsigned offset = index;
offset += LLVMConstIntGetZExtValue(src0);
offset /= 4;
offset -= ctx->abi->base_inline_push_consts;
if (offset + count <= ctx->abi->num_inline_push_consts) {
return ac_build_gather_values(&ctx->ac,
ctx->abi->inline_push_consts + offset,
count);
}
}
ptr = ac_build_gep0(&ctx->ac, ctx->abi->push_constants, addr);
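The inline push-constant path converts a byte offset into a dword slot and then checks that the requested components fit inside the inlined range. A minimal sketch of that index computation on plain integers; inline_push_const_slot is a hypothetical helper, and the explicit lower-bound test is an addition of this sketch rather than something visible in the hunk:

```c
#include <stdbool.h>

/* base and const_offset are byte offsets; slots are 32-bit dwords.
 * first_inline and num_inline describe the inlined dword range.
 * Returns true and writes the slot index if all `count` dwords are inlined. */
static bool inline_push_const_slot(unsigned base, unsigned const_offset,
                                   unsigned count,
                                   unsigned first_inline, unsigned num_inline,
                                   unsigned *slot)
{
    unsigned dword = (base + const_offset) / 4;

    if (dword < first_inline)
        return false;              /* starts before the inlined range */
    dword -= first_inline;
    if (dword + count > num_inline)
        return false;              /* runs past the inlined range */
    *slot = dword;
    return true;
}
```

When the helper returns false, the caller would take the memory load path that follows in the function.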
@@ -1885,10 +1909,19 @@ static LLVMValueRef load_tess_varyings(struct ac_nir_context *ctx,
return LLVMBuildBitCast(ctx->ac.builder, result, dest_type, "");
}
static unsigned
type_scalar_size_bytes(const struct glsl_type *type)
{
assert(glsl_type_is_vector_or_scalar(type) ||
glsl_type_is_matrix(type));
return glsl_type_is_boolean(type) ? 4 : glsl_get_bit_size(type) / 8;
}
static LLVMValueRef visit_load_var(struct ac_nir_context *ctx,
nir_intrinsic_instr *instr)
{
nir_variable *var = nir_deref_instr_get_variable(nir_instr_as_deref(instr->src[0].ssa->parent_instr));
nir_deref_instr *deref = nir_instr_as_deref(instr->src[0].ssa->parent_instr);
nir_variable *var = nir_deref_instr_get_variable(deref);
LLVMValueRef values[8];
int idx = 0;
@@ -1898,7 +1931,7 @@ static LLVMValueRef visit_load_var(struct ac_nir_context *ctx,
LLVMValueRef ret;
unsigned const_index;
unsigned stride = 4;
int mode = nir_var_mem_shared;
int mode = deref->mode;
if (var) {
bool vs_in = ctx->stage == MESA_SHADER_VERTEX &&
@@ -1907,7 +1940,7 @@ static LLVMValueRef visit_load_var(struct ac_nir_context *ctx,
comp = var->data.location_frac;
mode = var->data.mode;
get_deref_offset(ctx, nir_instr_as_deref(instr->src[0].ssa->parent_instr), vs_in, NULL, NULL,
get_deref_offset(ctx, deref, vs_in, NULL, NULL,
&const_index, &indir_index);
if (var->data.compact) {
@@ -1917,7 +1950,10 @@ static LLVMValueRef visit_load_var(struct ac_nir_context *ctx,
}
}
if (instr->dest.ssa.bit_size == 64)
if (instr->dest.ssa.bit_size == 64 &&
(deref->mode == nir_var_shader_in ||
deref->mode == nir_var_shader_out ||
deref->mode == nir_var_function_temp))
ve *= 2;
switch (mode) {
@@ -1931,8 +1967,8 @@ static LLVMValueRef visit_load_var(struct ac_nir_context *ctx,
LLVMTypeRef type = LLVMIntTypeInContext(ctx->ac.context, instr->dest.ssa.bit_size);
LLVMValueRef indir_index;
unsigned const_index, vertex_index;
get_deref_offset(ctx, nir_instr_as_deref(instr->src[0].ssa->parent_instr),
false, &vertex_index, NULL, &const_index, &indir_index);
get_deref_offset(ctx, deref, false, &vertex_index, NULL,
&const_index, &indir_index);
return ctx->abi->load_inputs(ctx->abi, var->data.location,
var->data.driver_location,
@@ -2006,6 +2042,32 @@ static LLVMValueRef visit_load_var(struct ac_nir_context *ctx,
}
}
break;
case nir_var_mem_global: {
LLVMValueRef address = get_src(ctx, instr->src[0]);
unsigned explicit_stride = glsl_get_explicit_stride(deref->type);
unsigned natural_stride = type_scalar_size_bytes(deref->type);
unsigned stride = explicit_stride ? explicit_stride : natural_stride;
LLVMTypeRef result_type = get_def_type(ctx, &instr->dest.ssa);
if (stride != natural_stride) {
LLVMTypeRef ptr_type = LLVMPointerType(LLVMGetElementType(result_type),
LLVMGetPointerAddressSpace(LLVMTypeOf(address)));
address = LLVMBuildBitCast(ctx->ac.builder, address, ptr_type , "");
for (unsigned i = 0; i < instr->dest.ssa.num_components; ++i) {
LLVMValueRef offset = LLVMConstInt(ctx->ac.i32, i * stride / natural_stride, 0);
values[i] = LLVMBuildLoad(ctx->ac.builder,
ac_build_gep_ptr(&ctx->ac, address, offset), "");
}
return ac_build_gather_values(&ctx->ac, values, instr->dest.ssa.num_components);
} else {
LLVMTypeRef ptr_type = LLVMPointerType(result_type,
LLVMGetPointerAddressSpace(LLVMTypeOf(address)));
address = LLVMBuildBitCast(ctx->ac.builder, address, ptr_type , "");
LLVMValueRef val = LLVMBuildLoad(ctx->ac.builder, address, "");
return val;
}
}
default:
unreachable("unhandled variable mode");
}
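In the nir_var_mem_global case, a vector whose explicit stride differs from its natural scalar size is loaded one component at a time, at element index i * stride / natural_stride. A sketch of that addressing on an ordinary array of 32-bit values; load_strided_u32 is a hypothetical name and the component type is fixed here for brevity:

```c
#include <stdint.h>

/* Gather `count` 32-bit components whose starts are `stride_bytes` apart.
 * The pointer is typed as the scalar, so the element index is the byte
 * offset divided by the scalar size. */
static void load_strided_u32(const uint32_t *base, unsigned stride_bytes,
                             unsigned count, uint32_t *out)
{
    const unsigned natural_stride = sizeof(uint32_t);

    for (unsigned i = 0; i < count; i++)
        out[i] = base[i * stride_bytes / natural_stride];
}
```

With stride_bytes equal to the natural stride the loop reads consecutive elements, which is the case the diff handles with a single vector load.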
@@ -2040,7 +2102,9 @@ visit_store_var(struct ac_nir_context *ctx,
}
}
if (ac_get_elem_bits(&ctx->ac, LLVMTypeOf(src)) == 64) {
if (ac_get_elem_bits(&ctx->ac, LLVMTypeOf(src)) == 64 &&
(deref->mode == nir_var_shader_out ||
deref->mode == nir_var_function_temp)) {
src = LLVMBuildBitCast(ctx->ac.builder, src,
LLVMVectorType(ctx->ac.f32, ac_get_llvm_num_components(src) * 2),
@@ -2124,33 +2188,52 @@ visit_store_var(struct ac_nir_context *ctx,
}
}
break;
case nir_var_mem_global:
case nir_var_mem_shared: {
int writemask = instr->const_index[0];
LLVMValueRef address = get_src(ctx, instr->src[0]);
LLVMValueRef val = get_src(ctx, instr->src[1]);
if (writemask == (1u << ac_get_llvm_num_components(val)) - 1) {
val = LLVMBuildBitCast(
ctx->ac.builder, val,
LLVMGetElementType(LLVMTypeOf(address)), "");
unsigned explicit_stride = glsl_get_explicit_stride(deref->type);
unsigned natural_stride = type_scalar_size_bytes(deref->type);
unsigned stride = explicit_stride ? explicit_stride : natural_stride;
LLVMTypeRef ptr_type = LLVMPointerType(LLVMTypeOf(val),
LLVMGetPointerAddressSpace(LLVMTypeOf(address)));
address = LLVMBuildBitCast(ctx->ac.builder, address, ptr_type , "");
if (writemask == (1u << ac_get_llvm_num_components(val)) - 1 &&
stride == natural_stride) {
LLVMTypeRef ptr_type = LLVMPointerType(LLVMTypeOf(val),
LLVMGetPointerAddressSpace(LLVMTypeOf(address)));
address = LLVMBuildBitCast(ctx->ac.builder, address, ptr_type , "");
val = LLVMBuildBitCast(ctx->ac.builder, val,
LLVMGetElementType(LLVMTypeOf(address)), "");
LLVMBuildStore(ctx->ac.builder, val, address);
} else {
LLVMTypeRef ptr_type = LLVMPointerType(LLVMGetElementType(LLVMTypeOf(val)),
LLVMGetPointerAddressSpace(LLVMTypeOf(address)));
address = LLVMBuildBitCast(ctx->ac.builder, address, ptr_type , "");
for (unsigned chan = 0; chan < 4; chan++) {
if (!(writemask & (1 << chan)))
continue;
LLVMValueRef ptr =
LLVMBuildStructGEP(ctx->ac.builder,
address, chan, "");
LLVMValueRef offset = LLVMConstInt(ctx->ac.i32, chan * stride / natural_stride, 0);
LLVMValueRef ptr = ac_build_gep_ptr(&ctx->ac, address, offset);
LLVMValueRef src = ac_llvm_extract_elem(&ctx->ac, val,
chan);
src = LLVMBuildBitCast(
ctx->ac.builder, src,
LLVMGetElementType(LLVMTypeOf(ptr)), "");
src = LLVMBuildBitCast(ctx->ac.builder, src,
LLVMGetElementType(LLVMTypeOf(ptr)), "");
LLVMBuildStore(ctx->ac.builder, src, ptr);
}
}
break;
}
default:
abort();
break;
}
}
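The store path falls back to per-channel stores when the writemask is partial or the stride is not the natural one. A sketch of that loop on plain memory, assuming 32-bit components and at most four channels as in the hunk; store_masked_strided_u32 is a hypothetical name:

```c
#include <stdint.h>

/* Store only the channels selected by writemask, placing channel `chan`
 * at byte offset chan * stride_bytes from base. */
static void store_masked_strided_u32(uint32_t *base, unsigned stride_bytes,
                                     unsigned writemask, const uint32_t val[4])
{
    for (unsigned chan = 0; chan < 4; chan++) {
        if (!(writemask & (1u << chan)))
            continue;              /* channel not written */
        base[chan * stride_bytes / sizeof(uint32_t)] = val[chan];
    }
}
```

Channels outside the mask leave memory untouched, which is why a full-mask, natural-stride store can be emitted as one vector store instead.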
@@ -2359,12 +2442,10 @@ static void get_image_coords(struct ac_nir_context *ctx,
}
static LLVMValueRef get_image_buffer_descriptor(struct ac_nir_context *ctx,
const nir_intrinsic_instr *instr,
bool write, bool atomic)
const nir_intrinsic_instr *instr, bool write)
{
LLVMValueRef rsrc = get_image_descriptor(ctx, instr, AC_DESC_BUFFER, write);
if (ctx->abi->gfx9_stride_size_workaround ||
(ctx->abi->gfx9_stride_size_workaround_for_atomic && atomic)) {
if (ctx->abi->gfx9_stride_size_workaround) {
LLVMValueRef elem_count = LLVMBuildExtractElement(ctx->ac.builder, rsrc, LLVMConstInt(ctx->ac.i32, 2, 0), "");
LLVMValueRef stride = LLVMBuildExtractElement(ctx->ac.builder, rsrc, LLVMConstInt(ctx->ac.i32, 1, 0), "");
stride = LLVMBuildLShr(ctx->ac.builder, stride, LLVMConstInt(ctx->ac.i32, 16, 0), "");
@@ -2397,7 +2478,7 @@ static LLVMValueRef visit_image_load(struct ac_nir_context *ctx,
unsigned num_channels = util_last_bit(mask);
LLVMValueRef rsrc, vindex;
rsrc = get_image_buffer_descriptor(ctx, instr, false, false);
rsrc = get_image_buffer_descriptor(ctx, instr, false);
vindex = LLVMBuildExtractElement(ctx->ac.builder, get_src(ctx, instr->src[1]),
ctx->ac.i32_0, "");
@@ -2441,12 +2522,12 @@ static void visit_image_store(struct ac_nir_context *ctx,
if (dim == GLSL_SAMPLER_DIM_BUF) {
char name[48];
const char *types[] = { "f32", "v2f32", "v4f32" };
LLVMValueRef rsrc = get_image_buffer_descriptor(ctx, instr, true, false);
LLVMValueRef rsrc = get_image_buffer_descriptor(ctx, instr, true);
LLVMValueRef src = ac_to_float(&ctx->ac, get_src(ctx, instr->src[3]));
unsigned src_channels = ac_get_llvm_num_components(src);
if (src_channels == 3)
src = ac_build_expand(&ctx->ac, src, 3, 4);
src = ac_build_expand_to_vec4(&ctx->ac, src, 3);
params[0] = src; /* data */
params[1] = rsrc;
@@ -2537,14 +2618,11 @@ static LLVMValueRef visit_image_atomic(struct ac_nir_context *ctx,
params[param_count++] = get_src(ctx, instr->src[3]);
if (glsl_get_sampler_dim(type) == GLSL_SAMPLER_DIM_BUF) {
params[param_count++] = get_image_buffer_descriptor(ctx, instr, true, true);
params[param_count++] = get_image_buffer_descriptor(ctx, instr, true);
params[param_count++] = LLVMBuildExtractElement(ctx->ac.builder, get_src(ctx, instr->src[1]),
ctx->ac.i32_0, ""); /* vindex */
params[param_count++] = ctx->ac.i32_0; /* voffset */
if (HAVE_LLVM >= 0x900) {
/* XXX: The new raw/struct atomic intrinsics are buggy
* with LLVM 8, see r358579.
*/
if (HAVE_LLVM >= 0x800) {
params[param_count++] = ctx->ac.i32_0; /* soffset */
params[param_count++] = ctx->ac.i32_0; /* slc */
@@ -3105,8 +3183,7 @@ static void visit_intrinsic(struct ac_nir_context *ctx,
ctx->abi->frag_pos[2],
ac_build_fdiv(&ctx->ac, ctx->ac.f32_1, ctx->abi->frag_pos[3])
};
result = ac_to_integer(&ctx->ac,
ac_build_gather_values(&ctx->ac, values, 4));
result = ac_build_gather_values(&ctx->ac, values, 4);
break;
}
case nir_intrinsic_load_front_face:
@@ -3915,7 +3992,8 @@ glsl_to_llvm_type(struct ac_llvm_context *ac,
static void visit_deref(struct ac_nir_context *ctx,
nir_deref_instr *instr)
{
if (instr->mode != nir_var_mem_shared)
if (instr->mode != nir_var_mem_shared &&
instr->mode != nir_var_mem_global)
return;
LLVMValueRef result = NULL;
@@ -3926,22 +4004,79 @@ static void visit_deref(struct ac_nir_context *ctx,
break;
}
case nir_deref_type_struct:
result = ac_build_gep0(&ctx->ac, get_src(ctx, instr->parent),
LLVMConstInt(ctx->ac.i32, instr->strct.index, 0));
if (instr->mode == nir_var_mem_global) {
nir_deref_instr *parent = nir_deref_instr_parent(instr);
uint64_t offset = glsl_get_struct_field_offset(parent->type,
instr->strct.index);
result = ac_build_gep_ptr(&ctx->ac, get_src(ctx, instr->parent),
LLVMConstInt(ctx->ac.i32, offset, 0));
} else {
result = ac_build_gep0(&ctx->ac, get_src(ctx, instr->parent),
LLVMConstInt(ctx->ac.i32, instr->strct.index, 0));
}
break;
case nir_deref_type_array:
result = ac_build_gep0(&ctx->ac, get_src(ctx, instr->parent),
get_src(ctx, instr->arr.index));
if (instr->mode == nir_var_mem_global) {
nir_deref_instr *parent = nir_deref_instr_parent(instr);
unsigned stride = glsl_get_explicit_stride(parent->type);
if ((glsl_type_is_matrix(parent->type) &&
glsl_matrix_type_is_row_major(parent->type)) ||
(glsl_type_is_vector(parent->type) && stride == 0))
stride = type_scalar_size_bytes(parent->type);
assert(stride > 0);
LLVMValueRef index = get_src(ctx, instr->arr.index);
if (LLVMTypeOf(index) != ctx->ac.i64)
index = LLVMBuildZExt(ctx->ac.builder, index, ctx->ac.i64, "");
LLVMValueRef offset = LLVMBuildMul(ctx->ac.builder, index, LLVMConstInt(ctx->ac.i64, stride, 0), "");
result = ac_build_gep_ptr(&ctx->ac, get_src(ctx, instr->parent), offset);
} else {
result = ac_build_gep0(&ctx->ac, get_src(ctx, instr->parent),
get_src(ctx, instr->arr.index));
}
break;
case nir_deref_type_ptr_as_array:
result = ac_build_gep_ptr(&ctx->ac, get_src(ctx, instr->parent),
get_src(ctx, instr->arr.index));
if (instr->mode == nir_var_mem_global) {
unsigned stride = nir_deref_instr_ptr_as_array_stride(instr);
LLVMValueRef index = get_src(ctx, instr->arr.index);
if (LLVMTypeOf(index) != ctx->ac.i64)
index = LLVMBuildZExt(ctx->ac.builder, index, ctx->ac.i64, "");
LLVMValueRef offset = LLVMBuildMul(ctx->ac.builder, index, LLVMConstInt(ctx->ac.i64, stride, 0), "");
result = ac_build_gep_ptr(&ctx->ac, get_src(ctx, instr->parent), offset);
} else {
result = ac_build_gep_ptr(&ctx->ac, get_src(ctx, instr->parent),
get_src(ctx, instr->arr.index));
}
break;
case nir_deref_type_cast: {
result = get_src(ctx, instr->parent);
LLVMTypeRef pointee_type = glsl_to_llvm_type(&ctx->ac, instr->type);
LLVMTypeRef type = LLVMPointerType(pointee_type, AC_ADDR_SPACE_LDS);
/* We can't use the structs from LLVM because the shader
* specifies its own offsets. */
LLVMTypeRef pointee_type = ctx->ac.i8;
if (instr->mode == nir_var_mem_shared)
pointee_type = glsl_to_llvm_type(&ctx->ac, instr->type);
unsigned address_space;
switch(instr->mode) {
case nir_var_mem_shared:
address_space = AC_ADDR_SPACE_LDS;
break;
case nir_var_mem_global:
address_space = AC_ADDR_SPACE_GLOBAL;
break;
default:
unreachable("Unhandled address space");
}
LLVMTypeRef type = LLVMPointerType(pointee_type, address_space);
if (LLVMTypeOf(result) != type) {
if (LLVMGetTypeKind(LLVMTypeOf(result)) == LLVMVectorTypeKind) {
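For global memory the deref code above works on byte pointers, because the shader supplies its own offsets and strides rather than relying on LLVM struct layout. The resulting address arithmetic amounts to the following sketch; both helper names are hypothetical:

```c
#include <stdint.h>

/* Array element: base plus index times the explicit stride, in bytes. */
static const uint8_t *array_elem_addr(const uint8_t *base,
                                      uint64_t index, unsigned stride)
{
    return base + index * stride;
}

/* Struct member: base plus the field's explicit byte offset. */
static const uint8_t *struct_field_addr(const uint8_t *base,
                                        unsigned field_offset)
{
    return base + field_offset;
}
```

Widening the index to 64 bits before the multiply, as the hunk does with a zero-extend, keeps the product from wrapping at 32 bits.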


@@ -32,6 +32,8 @@ struct nir_variable;
#define AC_LLVM_MAX_OUTPUTS (VARYING_SLOT_VAR31 + 1)
#define AC_MAX_INLINE_PUSH_CONSTS 8
enum ac_descriptor_type {
AC_DESC_IMAGE,
AC_DESC_FMASK,
@@ -66,6 +68,9 @@ struct ac_shader_abi {
/* Vulkan only */
LLVMValueRef push_constants;
LLVMValueRef inline_push_consts[AC_MAX_INLINE_PUSH_CONSTS];
unsigned num_inline_push_consts;
unsigned base_inline_push_consts;
LLVMValueRef view_index;
LLVMValueRef outputs[AC_LLVM_MAX_OUTPUTS * 4];
@@ -195,7 +200,6 @@ struct ac_shader_abi {
/* Whether to workaround GFX9 ignoring the stride for the buffer size if IDXEN=0
* and LLVM optimizes an indexed load with constant index to IDXEN=0. */
bool gfx9_stride_size_workaround;
bool gfx9_stride_size_workaround_for_atomic;
};
#endif /* AC_SHADER_ABI_H */


@@ -128,26 +128,21 @@ if with_xlib_lease
radv_flags += '-DVK_USE_PLATFORM_XLIB_XRANDR_EXT'
endif
if with_platform_android
radv_flags += [
'-DVK_USE_PLATFORM_ANDROID_KHR'
]
libradv_files += files('radv_android.c')
endif
libvulkan_radeon = shared_library(
'vulkan_radeon',
[libradv_files, radv_entrypoints, radv_extensions_c, vk_format_table_c, sha1_h],
include_directories : [
inc_common, inc_amd, inc_amd_common, inc_compiler, inc_vulkan_wsi,
inc_common, inc_amd, inc_amd_common, inc_compiler, inc_vulkan_util,
inc_vulkan_wsi,
],
link_with : [
libamd_common, libamdgpu_addrlib, libvulkan_wsi, libmesa_util,
libamd_common, libamdgpu_addrlib, libvulkan_util, libvulkan_wsi,
libmesa_util,
],
dependencies : [
dep_llvm, dep_libdrm_amdgpu, dep_thread, dep_elf, dep_dl, dep_m,
dep_valgrind, radv_deps,
idep_nir, idep_vulkan_util,
idep_nir,
],
c_args : [c_vis_args, no_override_init_args, radv_flags],
cpp_args : [cpp_vis_args, radv_flags],


@@ -301,6 +301,7 @@ radv_cmd_buffer_destroy(struct radv_cmd_buffer *cmd_buffer)
static VkResult
radv_reset_cmd_buffer(struct radv_cmd_buffer *cmd_buffer)
{
cmd_buffer->device->ws->cs_reset(cmd_buffer->cs);
list_for_each_entry_safe(struct radv_cmd_buffer_upload, up,
@@ -325,8 +326,6 @@ radv_reset_cmd_buffer(struct radv_cmd_buffer *cmd_buffer)
cmd_buffer->record_result = VK_SUCCESS;
memset(cmd_buffer->vertex_bindings, 0, sizeof(cmd_buffer->vertex_bindings));
for (unsigned i = 0; i < VK_PIPELINE_BIND_POINT_RANGE_SIZE; i++) {
cmd_buffer->descriptors[i].dirty = 0;
cmd_buffer->descriptors[i].valid = 0;
@@ -339,15 +338,14 @@ radv_reset_cmd_buffer(struct radv_cmd_buffer *cmd_buffer)
unsigned fence_offset, eop_bug_offset;
void *fence_ptr;
radv_cmd_buffer_upload_alloc(cmd_buffer, 8, 8, &fence_offset,
radv_cmd_buffer_upload_alloc(cmd_buffer, 8, 0, &fence_offset,
&fence_ptr);
cmd_buffer->gfx9_fence_va =
radv_buffer_get_va(cmd_buffer->upload.upload_bo);
cmd_buffer->gfx9_fence_va += fence_offset;
/* Allocate a buffer for the EOP bug on GFX9. */
radv_cmd_buffer_upload_alloc(cmd_buffer, 16 * num_db, 8,
radv_cmd_buffer_upload_alloc(cmd_buffer, 16 * num_db, 0,
&eop_bug_offset, &fence_ptr);
cmd_buffer->gfx9_eop_bug_va =
radv_buffer_get_va(cmd_buffer->upload.upload_bo);
@@ -418,8 +416,6 @@ radv_cmd_buffer_upload_alloc(struct radv_cmd_buffer *cmd_buffer,
unsigned *out_offset,
void **ptr)
{
assert(util_is_power_of_two_nonzero(alignment));
uint64_t offset = align(cmd_buffer->upload.offset, alignment);
if (offset + size > cmd_buffer->upload.size) {
if (!radv_cmd_buffer_resize_upload_buf(cmd_buffer, size))
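The two sides of this diff differ in the alignment passed to radv_cmd_buffer_upload_alloc (8 versus 0) and in whether a power-of-two assertion guards the call. The sketch below shows why that matters, assuming the common mask-based round-up formula; whether align() is implemented exactly this way is an assumption of this example, and both helper names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool is_power_of_two_nonzero(uint64_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Round value up to the next multiple of alignment. The mask trick is
 * only valid for a non-zero power-of-two alignment: with alignment 0 the
 * mask ~(0 - 1) is zero and every value would collapse to 0. */
static uint64_t align_pot(uint64_t value, uint64_t alignment)
{
    assert(is_power_of_two_nonzero(alignment));
    return (value + alignment - 1) & ~(alignment - 1);
}
```

Under that assumption, an alignment of 0 would place every allocation at offset 0, which the assertion catches in debug builds.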
@@ -566,8 +562,8 @@ radv_save_descriptors(struct radv_cmd_buffer *cmd_buffer,
for_each_bit(i, descriptors_state->valid) {
struct radv_descriptor_set *set = descriptors_state->sets[i];
data[i * 2] = (uint64_t)(uintptr_t)set;
data[i * 2 + 1] = (uint64_t)(uintptr_t)set >> 32;
data[i * 2] = (uintptr_t)set;
data[i * 2 + 1] = (uintptr_t)set >> 32;
}
radv_emit_write_data_packet(cmd_buffer, va, MAX_SETS * 2, data);
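The two variants in this hunk differ in whether the pointer is widened to uint64_t before the shift. On a target where uintptr_t is 32 bits wide, shifting it right by 32 is undefined behaviour in C, so widening first is the portable form. A minimal sketch; split_pointer is a hypothetical helper:

```c
#include <stdint.h>

/* Split a pointer into low and high 32-bit words. */
static void split_pointer(const void *ptr, uint32_t out[2])
{
    uint64_t v = (uint64_t)(uintptr_t)ptr;  /* widen before shifting */

    out[0] = (uint32_t)v;          /* low dword */
    out[1] = (uint32_t)(v >> 32);  /* high dword, zero on 32-bit targets */
}
```

Recombining the two words as ((uint64_t)out[1] << 32) | out[0] gives back the original address value.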
@@ -632,6 +628,23 @@ radv_emit_descriptor_pointers(struct radv_cmd_buffer *cmd_buffer,
}
}
static void
radv_emit_inline_push_consts(struct radv_cmd_buffer *cmd_buffer,
struct radv_pipeline *pipeline,
gl_shader_stage stage,
int idx, int count, uint32_t *values)
{
struct radv_userdata_info *loc = radv_lookup_user_sgpr(pipeline, stage, idx);
uint32_t base_reg = pipeline->user_data_0[stage];
if (loc->sgpr_idx == -1)
return;
assert(loc->num_sgprs == count);
radeon_set_sh_reg_seq(cmd_buffer->cs, base_reg + loc->sgpr_idx * 4, count);
radeon_emit_array(cmd_buffer->cs, values, count);
}
static void
radv_update_multisample_state(struct radv_cmd_buffer *cmd_buffer,
struct radv_pipeline *pipeline)
@@ -1209,10 +1222,10 @@ radv_update_bound_fast_clear_ds(struct radv_cmd_buffer *cmd_buffer,
if (!framebuffer || !subpass)
return;
att_idx = subpass->depth_stencil_attachment.attachment;
if (att_idx == VK_ATTACHMENT_UNUSED)
if (!subpass->depth_stencil_attachment)
return;
att_idx = subpass->depth_stencil_attachment->attachment;
att = &framebuffer->attachments[att_idx];
if (att->attachment->image != image)
return;
@@ -1226,7 +1239,7 @@ radv_update_bound_fast_clear_ds(struct radv_cmd_buffer *cmd_buffer,
*/
if ((aspects & VK_IMAGE_ASPECT_DEPTH_BIT) &&
ds_clear_value.depth == 0.0) {
VkImageLayout layout = subpass->depth_stencil_attachment.layout;
VkImageLayout layout = subpass->depth_stencil_attachment->layout;
radv_update_zrange_precision(cmd_buffer, &att->ds, image,
layout, false);
@@ -1259,7 +1272,7 @@ radv_set_ds_clear_metadata(struct radv_cmd_buffer *cmd_buffer,
if (aspects & VK_IMAGE_ASPECT_DEPTH_BIT)
++reg_count;
radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 2 + reg_count, cmd_buffer->state.predicating));
radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 2 + reg_count, 0));
radeon_emit(cs, S_370_DST_SEL(V_370_MEM) |
S_370_WR_CONFIRM(1) |
S_370_ENGINE_SEL(V_370_PFP));
@@ -1283,7 +1296,7 @@ radv_set_tc_compat_zrange_metadata(struct radv_cmd_buffer *cmd_buffer,
uint64_t va = radv_buffer_get_va(image->bo);
va += image->offset + image->tc_compat_zrange_offset;
radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 3, cmd_buffer->state.predicating));
radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 3, 0));
radeon_emit(cs, S_370_DST_SEL(V_370_MEM) |
S_370_WR_CONFIRM(1) |
S_370_ENGINE_SEL(V_370_PFP));
@@ -1477,7 +1490,7 @@ radv_set_color_clear_metadata(struct radv_cmd_buffer *cmd_buffer,
assert(radv_image_has_cmask(image) || radv_image_has_dcc(image));
radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 4, cmd_buffer->state.predicating));
radeon_emit(cs, PKT3(PKT3_WRITE_DATA, 4, 0));
radeon_emit(cs, S_370_DST_SEL(V_370_MEM) |
S_370_WR_CONFIRM(1) |
S_370_ENGINE_SEL(V_370_PFP));
@@ -1578,9 +1591,9 @@ radv_emit_framebuffer_state(struct radv_cmd_buffer *cmd_buffer)
num_bpp64_colorbufs++;
}
if(subpass->depth_stencil_attachment.attachment != VK_ATTACHMENT_UNUSED) {
int idx = subpass->depth_stencil_attachment.attachment;
VkImageLayout layout = subpass->depth_stencil_attachment.layout;
if (subpass->depth_stencil_attachment) {
int idx = subpass->depth_stencil_attachment->attachment;
VkImageLayout layout = subpass->depth_stencil_attachment->layout;
struct radv_attachment_info *att = &framebuffer->attachments[idx];
struct radv_image *image = att->attachment->image;
radv_cs_add_buffer(cmd_buffer->device->ws, cmd_buffer->cs, att->attachment->bo);
@@ -1904,6 +1917,7 @@ radv_flush_constants(struct radv_cmd_buffer *cmd_buffer,
radv_get_descriptors_state(cmd_buffer, bind_point);
struct radv_pipeline_layout *layout = pipeline->layout;
struct radv_shader_variant *shader, *prev_shader;
bool need_push_constants = false;
unsigned offset;
void *ptr;
uint64_t va;
@@ -1913,37 +1927,56 @@ radv_flush_constants(struct radv_cmd_buffer *cmd_buffer,
(!layout->push_constant_size && !layout->dynamic_offset_count))
return;
if (!radv_cmd_buffer_upload_alloc(cmd_buffer, layout->push_constant_size +
16 * layout->dynamic_offset_count,
256, &offset, &ptr))
return;
memcpy(ptr, cmd_buffer->push_constants, layout->push_constant_size);
memcpy((char*)ptr + layout->push_constant_size,
descriptors_state->dynamic_buffers,
16 * layout->dynamic_offset_count);
va = radv_buffer_get_va(cmd_buffer->upload.upload_bo);
va += offset;
MAYBE_UNUSED unsigned cdw_max = radeon_check_space(cmd_buffer->device->ws,
cmd_buffer->cs, MESA_SHADER_STAGES * 4);
prev_shader = NULL;
radv_foreach_stage(stage, stages) {
shader = radv_get_shader(pipeline, stage);
if (!pipeline->shaders[stage])
continue;
/* Avoid redundantly emitting the address for merged stages. */
if (shader && shader != prev_shader) {
radv_emit_userdata_address(cmd_buffer, pipeline, stage,
AC_UD_PUSH_CONSTANTS, va);
need_push_constants |= pipeline->shaders[stage]->info.info.loads_push_constants;
need_push_constants |= pipeline->shaders[stage]->info.info.loads_dynamic_offsets;
prev_shader = shader;
uint8_t base = pipeline->shaders[stage]->info.info.base_inline_push_consts;
uint8_t count = pipeline->shaders[stage]->info.info.num_inline_push_consts;
radv_emit_inline_push_consts(cmd_buffer, pipeline, stage,
AC_UD_INLINE_PUSH_CONSTANTS,
count,
(uint32_t *)&cmd_buffer->push_constants[base * 4]);
}
if (need_push_constants) {
if (!radv_cmd_buffer_upload_alloc(cmd_buffer, layout->push_constant_size +
16 * layout->dynamic_offset_count,
256, &offset, &ptr))
return;
memcpy(ptr, cmd_buffer->push_constants, layout->push_constant_size);
memcpy((char*)ptr + layout->push_constant_size,
descriptors_state->dynamic_buffers,
16 * layout->dynamic_offset_count);
va = radv_buffer_get_va(cmd_buffer->upload.upload_bo);
va += offset;
MAYBE_UNUSED unsigned cdw_max =
radeon_check_space(cmd_buffer->device->ws,
cmd_buffer->cs, MESA_SHADER_STAGES * 4);
prev_shader = NULL;
radv_foreach_stage(stage, stages) {
shader = radv_get_shader(pipeline, stage);
/* Avoid redundantly emitting the address for merged stages. */
if (shader && shader != prev_shader) {
radv_emit_userdata_address(cmd_buffer, pipeline, stage,
AC_UD_PUSH_CONSTANTS, va);
prev_shader = shader;
}
}
assert(cmd_buffer->cs->cdw <= cdw_max);
}
cmd_buffer->push_constant_stages &= ~stages;
assert(cmd_buffer->cs->cdw <= cdw_max);
}
static void
@@ -2158,7 +2191,6 @@ radv_emit_draw_registers(struct radv_cmd_buffer *cmd_buffer,
ia_multi_vgt_param =
si_get_ia_multi_vgt_param(cmd_buffer, draw_info->instance_count > 1,
draw_info->indirect,
!!draw_info->strmout_buffer,
draw_info->indirect ? 0 : draw_info->count);
if (state->last_ia_multi_vgt_param != ia_multi_vgt_param) {
@@ -2429,28 +2461,8 @@ static void radv_handle_subpass_image_transition(struct radv_cmd_buffer *cmd_buf
void
radv_cmd_buffer_set_subpass(struct radv_cmd_buffer *cmd_buffer,
const struct radv_subpass *subpass, bool transitions)
const struct radv_subpass *subpass)
{
if (transitions) {
radv_subpass_barrier(cmd_buffer, &subpass->start_barrier);
for (unsigned i = 0; i < subpass->color_count; ++i) {
if (subpass->color_attachments[i].attachment != VK_ATTACHMENT_UNUSED)
radv_handle_subpass_image_transition(cmd_buffer,
subpass->color_attachments[i]);
}
for (unsigned i = 0; i < subpass->input_count; ++i) {
radv_handle_subpass_image_transition(cmd_buffer,
subpass->input_attachments[i]);
}
if (subpass->depth_stencil_attachment.attachment != VK_ATTACHMENT_UNUSED) {
radv_handle_subpass_image_transition(cmd_buffer,
subpass->depth_stencil_attachment);
}
}
cmd_buffer->state.subpass = subpass;
cmd_buffer->state.dirty |= RADV_CMD_DIRTY_FRAMEBUFFER;
@@ -2633,7 +2645,7 @@ VkResult radv_BeginCommandBuffer(
if (result != VK_SUCCESS)
return result;
radv_cmd_buffer_set_subpass(cmd_buffer, subpass, false);
radv_cmd_buffer_set_subpass(cmd_buffer, subpass);
}
if (unlikely(cmd_buffer->device->trace_bo)) {
@@ -3413,6 +3425,69 @@ void radv_TrimCommandPool(
}
}
static uint32_t
radv_get_subpass_id(struct radv_cmd_buffer *cmd_buffer)
{
struct radv_cmd_state *state = &cmd_buffer->state;
uint32_t subpass_id = state->subpass - state->pass->subpasses;
/* The id of this subpass shouldn't exceed the number of subpasses in
* this render pass minus 1.
*/
assert(subpass_id < state->pass->subpass_count);
return subpass_id;
}
static void
radv_cmd_buffer_begin_subpass(struct radv_cmd_buffer *cmd_buffer,
uint32_t subpass_id)
{
struct radv_cmd_state *state = &cmd_buffer->state;
struct radv_subpass *subpass = &state->pass->subpasses[subpass_id];
MAYBE_UNUSED unsigned cdw_max = radeon_check_space(cmd_buffer->device->ws,
cmd_buffer->cs, 2048);
radv_subpass_barrier(cmd_buffer, &subpass->start_barrier);
for (uint32_t i = 0; i < subpass->attachment_count; ++i) {
const uint32_t a = subpass->attachments[i].attachment;
if (a == VK_ATTACHMENT_UNUSED)
continue;
radv_handle_subpass_image_transition(cmd_buffer,
subpass->attachments[i]);
}
radv_cmd_buffer_set_subpass(cmd_buffer, subpass);
radv_cmd_buffer_clear_subpass(cmd_buffer);
assert(cmd_buffer->cs->cdw <= cdw_max);
}
static void
radv_cmd_buffer_end_subpass(struct radv_cmd_buffer *cmd_buffer)
{
struct radv_cmd_state *state = &cmd_buffer->state;
const struct radv_subpass *subpass = state->subpass;
uint32_t subpass_id = radv_get_subpass_id(cmd_buffer);
radv_cmd_buffer_resolve_subpass(cmd_buffer);
for (uint32_t i = 0; i < subpass->attachment_count; ++i) {
const uint32_t a = subpass->attachments[i].attachment;
if (a == VK_ATTACHMENT_UNUSED)
continue;
if (state->pass->attachments[a].last_subpass_idx != subpass_id)
continue;
VkImageLayout layout = state->pass->attachments[a].final_layout;
radv_handle_subpass_image_transition(cmd_buffer,
(struct radv_subpass_attachment){a, layout});
}
}
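Note on `radv_get_subpass_id()` above: the index is recovered by subtracting the base of the subpass array from the current subpass pointer, which only works because the subpasses live in one contiguous array. A small sketch of the same idea, using hypothetical stand-in structs that model only the fields needed here:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the radv structures. */
struct subpass {
    uint32_t attachment_count;
};

struct render_pass {
    uint32_t subpass_count;
    struct subpass *subpasses; /* contiguous array of subpass_count entries */
};

/* Recover the index of `current` inside pass->subpasses. */
static uint32_t subpass_id(const struct render_pass *pass,
                           const struct subpass *current)
{
    /* Pointer subtraction yields the element distance, not a byte count. */
    ptrdiff_t id = current - pass->subpasses;

    assert(id >= 0 && (uint32_t)id < pass->subpass_count);
    return (uint32_t)id;
}
```

With the index available, moving to the next subpass reduces to "end the current one, begin `id + 1`", which appears to be the shape `radv_CmdNextSubpass` takes in this comparison.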
void radv_CmdBeginRenderPass(
VkCommandBuffer commandBuffer,
const VkRenderPassBeginInfo* pRenderPassBegin,
@@ -3421,10 +3496,7 @@ void radv_CmdBeginRenderPass(
RADV_FROM_HANDLE(radv_cmd_buffer, cmd_buffer, commandBuffer);
RADV_FROM_HANDLE(radv_render_pass, pass, pRenderPassBegin->renderPass);
RADV_FROM_HANDLE(radv_framebuffer, framebuffer, pRenderPassBegin->framebuffer);
MAYBE_UNUSED unsigned cdw_max = radeon_check_space(cmd_buffer->device->ws,
cmd_buffer->cs, 2048);
MAYBE_UNUSED VkResult result;
VkResult result;
cmd_buffer->state.framebuffer = framebuffer;
cmd_buffer->state.pass = pass;
@@ -3434,10 +3506,7 @@ void radv_CmdBeginRenderPass(
if (result != VK_SUCCESS)
return;
radv_cmd_buffer_set_subpass(cmd_buffer, pass->subpasses, true);
assert(cmd_buffer->cs->cdw <= cdw_max);
radv_cmd_buffer_clear_subpass(cmd_buffer);
radv_cmd_buffer_begin_subpass(cmd_buffer, 0);
}
void radv_CmdBeginRenderPass2KHR(
@@ -3455,13 +3524,9 @@ void radv_CmdNextSubpass(
{
RADV_FROM_HANDLE(radv_cmd_buffer, cmd_buffer, commandBuffer);
radv_cmd_buffer_resolve_subpass(cmd_buffer);
radeon_check_space(cmd_buffer->device->ws, cmd_buffer->cs,
2048);
radv_cmd_buffer_set_subpass(cmd_buffer, cmd_buffer->state.subpass + 1, true);
radv_cmd_buffer_clear_subpass(cmd_buffer);
uint32_t prev_subpass = radv_get_subpass_id(cmd_buffer);
radv_cmd_buffer_end_subpass(cmd_buffer);
radv_cmd_buffer_begin_subpass(cmd_buffer, prev_subpass + 1);
}
void radv_CmdNextSubpass2KHR(
@@ -4327,16 +4392,10 @@ void radv_CmdEndRenderPass(
{
RADV_FROM_HANDLE(radv_cmd_buffer, cmd_buffer, commandBuffer);
radv_cmd_buffer_end_subpass(cmd_buffer);
radv_subpass_barrier(cmd_buffer, &cmd_buffer->state.pass->end_barrier);
radv_cmd_buffer_resolve_subpass(cmd_buffer);
for (unsigned i = 0; i < cmd_buffer->state.framebuffer->attachment_count; ++i) {
VkImageLayout layout = cmd_buffer->state.pass->attachments[i].final_layout;
radv_handle_subpass_image_transition(cmd_buffer,
(struct radv_subpass_attachment){i, layout});
}
vk_free(&cmd_buffer->pool->alloc, cmd_buffer->state.attachments);
cmd_buffer->state.pass = NULL;
@@ -4408,15 +4467,10 @@ static void radv_handle_depth_image_transition(struct radv_cmd_buffer *cmd_buffe
if (!radv_image_has_htile(image))
return;
if (src_layout == VK_IMAGE_LAYOUT_UNDEFINED) {
uint32_t clear_value = vk_format_is_stencil(image->vk_format) ? 0xfffff30f : 0xfffc000f;
if (radv_layout_is_htile_compressed(image, dst_layout,
dst_queue_mask)) {
clear_value = 0;
}
radv_initialize_htile(cmd_buffer, image, range, clear_value);
if (src_layout == VK_IMAGE_LAYOUT_UNDEFINED &&
radv_layout_has_htile(image, dst_layout, dst_queue_mask)) {
/* TODO: merge with the clear if applicable */
radv_initialize_htile(cmd_buffer, image, range, 0);
} else if (!radv_layout_is_htile_compressed(image, src_layout, src_queue_mask) &&
radv_layout_is_htile_compressed(image, dst_layout, dst_queue_mask)) {
uint32_t clear_value = vk_format_is_stencil(image->vk_format) ? 0xfffff30f : 0xfffc000f;
@@ -4601,6 +4655,9 @@ static void radv_handle_image_transition(struct radv_cmd_buffer *cmd_buffer,
return;
}
if (src_layout == dst_layout)
return;
unsigned src_queue_mask =
radv_image_queue_family_mask(image, src_family,
cmd_buffer->queue_family_index);
@@ -4625,6 +4682,7 @@ struct radv_barrier_info {
uint32_t eventCount;
const VkEvent *pEvents;
VkPipelineStageFlags srcStageMask;
VkPipelineStageFlags dstStageMask;
};
static void
@@ -4676,7 +4734,19 @@ radv_barrier(struct radv_cmd_buffer *cmd_buffer,
image);
}
radv_stage_flush(cmd_buffer, info->srcStageMask);
/* The Vulkan spec 1.1.98 says:
*
* "An execution dependency with only
* VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT in the destination stage mask
* will only prevent that stage from executing in subsequently
* submitted commands. As this stage does not perform any actual
* execution, this is not observable - in effect, it does not delay
* processing of subsequent commands. Similarly an execution dependency
* with only VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT in the source stage mask
* will effectively not wait for any prior commands to complete."
*/
if (info->dstStageMask != VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT)
radv_stage_flush(cmd_buffer, info->srcStageMask);
cmd_buffer->state.flush_bits |= src_flush_bits;
for (uint32_t i = 0; i < imageMemoryBarrierCount; i++) {
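Note on the `dstStageMask` check above: the stage flush is skipped only when the destination mask is exactly `VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT`. A minimal sketch of that predicate follows; the function name is hypothetical and the stage values are local copies of the Vulkan enum values so the snippet builds without the Vulkan headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of VkPipelineStageFlagBits values (assumed here so the
 * sketch does not need vulkan.h). */
#define STAGE_FRAGMENT_SHADER 0x00000080u
#define STAGE_BOTTOM_OF_PIPE  0x00002000u

/* Hypothetical predicate mirroring the check in the hunk: a barrier whose
 * destination is only bottom-of-pipe does not delay later commands, so the
 * source stages need not be flushed. */
static bool barrier_needs_stage_flush(uint32_t dst_stage_mask)
{
    return dst_stage_mask != STAGE_BOTTOM_OF_PIPE;
}
```

The comparison is an exact equality rather than a bit test, so a mask that combines bottom-of-pipe with any other stage still takes the flush path.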
@@ -4717,6 +4787,7 @@ void radv_CmdPipelineBarrier(
info.eventCount = 0;
info.pEvents = NULL;
info.srcStageMask = srcStageMask;
info.dstStageMask = destStageMask;
radv_barrier(cmd_buffer, memoryBarrierCount, pMemoryBarriers,
bufferMemoryBarrierCount, pBufferMemoryBarriers,
@@ -4736,7 +4807,7 @@ static void write_event(struct radv_cmd_buffer *cmd_buffer,
radv_cs_add_buffer(cmd_buffer->device->ws, cs, event->bo);
MAYBE_UNUSED unsigned cdw_max = radeon_check_space(cmd_buffer->device->ws, cs, 21);
MAYBE_UNUSED unsigned cdw_max = radeon_check_space(cmd_buffer->device->ws, cs, 18);
/* Flags that only require a top-of-pipe event. */
VkPipelineStageFlags top_of_pipe_flags =
@@ -4846,11 +4917,8 @@ void radv_CmdBeginConditionalRenderingEXT(
{
RADV_FROM_HANDLE(radv_cmd_buffer, cmd_buffer, commandBuffer);
RADV_FROM_HANDLE(radv_buffer, buffer, pConditionalRenderingBegin->buffer);
struct radeon_cmdbuf *cs = cmd_buffer->cs;
bool draw_visible = true;
uint64_t pred_value = 0;
uint64_t va, new_va;
unsigned pred_offset;
uint64_t va;
va = radv_buffer_get_va(buffer->bo) + pConditionalRenderingBegin->offset;
@@ -4866,51 +4934,13 @@ void radv_CmdBeginConditionalRenderingEXT(
si_emit_cache_flush(cmd_buffer);
/* From the Vulkan spec 1.1.107:
*
* "If the 32-bit value at offset in buffer memory is zero, then the
* rendering commands are discarded, otherwise they are executed as
* normal. If the value of the predicate in buffer memory changes while
* conditional rendering is active, the rendering commands may be
* discarded in an implementation-dependent way. Some implementations
* may latch the value of the predicate upon beginning conditional
* rendering while others may read it before every rendering command."
*
* But, the AMD hardware treats the predicate as a 64-bit value which
* means we need a workaround in the driver. Luckily, it's not required
* to support if the value changes when predication is active.
*
* The workaround is as follows:
* 1) allocate a 64-bit value in the upload BO and initialize it to 0
* 2) copy the 32-bit predicate value to the upload BO
* 3) use the new allocated VA address for predication
*
* Based on the conditionalrender demo, it's faster to do the COPY_DATA
* in ME (+ sync PFP) instead of PFP.
*/
radv_cmd_buffer_upload_data(cmd_buffer, 8, 16, &pred_value, &pred_offset);
new_va = radv_buffer_get_va(cmd_buffer->upload.upload_bo) + pred_offset;
radeon_emit(cs, PKT3(PKT3_COPY_DATA, 4, 0));
radeon_emit(cs, COPY_DATA_SRC_SEL(COPY_DATA_SRC_MEM) |
COPY_DATA_DST_SEL(COPY_DATA_DST_MEM) |
COPY_DATA_WR_CONFIRM);
radeon_emit(cs, va);
radeon_emit(cs, va >> 32);
radeon_emit(cs, new_va);
radeon_emit(cs, new_va >> 32);
radeon_emit(cs, PKT3(PKT3_PFP_SYNC_ME, 0, 0));
radeon_emit(cs, 0);
/* Enable predication for this command buffer. */
si_emit_set_predication_state(cmd_buffer, draw_visible, new_va);
si_emit_set_predication_state(cmd_buffer, draw_visible, va);
cmd_buffer->state.predicating = true;
/* Store conditional rendering user info. */
cmd_buffer->state.predication_type = draw_visible;
cmd_buffer->state.predication_va = new_va;
cmd_buffer->state.predication_va = va;
}
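Note on the predicate workaround described in the comment above: the problem is that a 64-bit predicate fetch also sees the dword that follows the application's 32-bit value. The sketch below models memory as an array of dwords to show the effect; every name in it is hypothetical and it stands in for the COPY_DATA packet sequence rather than reproducing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* What a 64-bit predicate fetch observes: the 32-bit predicate plus
 * whatever dword happens to follow it in memory. */
static bool predicate_64(const uint32_t *mem)
{
    return (mem[0] | mem[1]) != 0;
}

/* The workaround from the comment: copy only the 32-bit predicate into a
 * zero-initialized 64-bit slot and evaluate the predicate on that slot. */
static bool predicate_via_slot(const uint32_t *app_mem)
{
    uint32_t slot[2] = { 0, 0 };

    slot[0] = app_mem[0];
    return predicate_64(slot);
}
```

When the application's predicate is zero but the neighbouring dword is not, the direct 64-bit read reports "draw" while the slot-based read reports "discard", which is the behaviour the Vulkan text quoted in the comment asks for.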
void radv_CmdEndConditionalRenderingEXT(
@@ -4954,7 +4984,7 @@ void radv_CmdBindTransformFeedbackBuffersEXT(
enabled_mask |= 1 << idx;
}
cmd_buffer->state.streamout.enabled_mask |= enabled_mask;
cmd_buffer->state.streamout.enabled_mask = enabled_mask;
cmd_buffer->state.dirty |= RADV_CMD_DIRTY_STREAMOUT_BUFFER;
}


@@ -51,7 +51,6 @@ enum {
RADV_DEBUG_CHECKIR = 0x200000,
RADV_DEBUG_NOTHREADLLVM = 0x400000,
RADV_DEBUG_NOBINNING = 0x800000,
RADV_DEBUG_NO_LOAD_STORE_OPT = 0x1000000,
};
enum {


@@ -111,7 +111,6 @@ radv_get_device_name(enum radeon_family family, char *name, size_t name_len)
case CHIP_VEGAM: chip_string = "AMD RADV VEGA M"; break;
case CHIP_VEGA10: chip_string = "AMD RADV VEGA10"; break;
case CHIP_VEGA12: chip_string = "AMD RADV VEGA12"; break;
case CHIP_VEGA20: chip_string = "AMD RADV VEGA20"; break;
case CHIP_RAVEN: chip_string = "AMD RADV RAVEN"; break;
case CHIP_RAVEN2: chip_string = "AMD RADV RAVEN2"; break;
default: chip_string = "AMD RADV unknown"; break;
@@ -338,7 +337,7 @@ radv_physical_device_init(struct radv_physical_device *device,
device->rad_info.chip_class > GFX9)
fprintf(stderr, "WARNING: radv is not a conformant vulkan implementation, testing use only.\n");
radv_get_driver_uuid(&device->driver_uuid);
radv_get_driver_uuid(&device->device_uuid);
radv_get_device_uuid(&device->rad_info, &device->device_uuid);
if (device->rad_info.family == CHIP_STONEY ||
@@ -466,7 +465,6 @@ static const struct debug_control radv_debug_options[] = {
{"checkir", RADV_DEBUG_CHECKIR},
{"nothreadllvm", RADV_DEBUG_NOTHREADLLVM},
{"nobinning", RADV_DEBUG_NOBINNING},
{"noloadstoreopt", RADV_DEBUG_NO_LOAD_STORE_OPT},
{NULL, 0}
};
@@ -512,13 +510,6 @@ radv_handle_per_app_options(struct radv_instance *instance,
} else if (!strcmp(name, "DOOM_VFR")) {
/* Work around a Doom VFR game bug */
instance->debug_flags |= RADV_DEBUG_NO_DYNAMIC_BOUNDS;
} else if (!strcmp(name, "MonsterHunterWorld.exe")) {
/* Workaround for a WaW hazard when LLVM moves/merges
* load/store memory operations.
* See https://reviews.llvm.org/D61313
*/
if (HAVE_LLVM < 0x900)
instance->debug_flags |= RADV_DEBUG_NO_LOAD_STORE_OPT;
}
}
@@ -882,6 +873,20 @@ void radv_GetPhysicalDeviceFeatures2(
features->memoryPriority = VK_TRUE;
break;
}
case VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_BUFFER_ADDRESS_FEATURES_EXT: {
VkPhysicalDeviceBufferAddressFeaturesEXT *features =
(VkPhysicalDeviceBufferAddressFeaturesEXT *)ext;
features->bufferDeviceAddress = true;
features->bufferDeviceAddressCaptureReplay = false;
features->bufferDeviceAddressMultiDevice = false;
break;
}
case VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_DEPTH_CLIP_ENABLE_FEATURES_EXT: {
VkPhysicalDeviceDepthClipEnableFeaturesEXT *features =
(VkPhysicalDeviceDepthClipEnableFeaturesEXT *)ext;
features->depthClipEnable = true;
break;
}
default:
break;
}
@@ -939,8 +944,8 @@ void radv_GetPhysicalDeviceProperties(
.maxDescriptorSetSampledImages = max_descriptor_set_size,
.maxDescriptorSetStorageImages = max_descriptor_set_size,
.maxDescriptorSetInputAttachments = max_descriptor_set_size,
.maxVertexInputAttributes = 32,
.maxVertexInputBindings = 32,
.maxVertexInputAttributes = MAX_VERTEX_ATTRIBS,
.maxVertexInputBindings = MAX_VBS,
.maxVertexInputAttributeOffset = 2047,
.maxVertexInputBindingStride = 2048,
.maxVertexOutputComponents = 128,
@@ -1011,7 +1016,7 @@ void radv_GetPhysicalDeviceProperties(
.maxCullDistances = 8,
.maxCombinedClipAndCullDistances = 8,
.discreteQueuePriorities = 2,
.pointSizeRange = { 0.0, 8192.0 },
.pointSizeRange = { 0.125, 255.875 },
.lineWidthRange = { 0.0, 7.9921875 },
.pointSizeGranularity = (1.0 / 8.0),
.lineWidthGranularity = (1.0 / 128.0),
@@ -1142,7 +1147,7 @@ void radv_GetPhysicalDeviceProperties2(
/* SGPR. */
properties->sgprsPerSimd =
radv_get_num_physical_sgprs(pdevice);
ac_get_num_physical_sgprs(pdevice->rad_info.chip_class);
properties->minSgprAllocation =
pdevice->rad_info.chip_class >= VI ? 16 : 8;
properties->maxSgprAllocation =
@@ -1389,46 +1394,40 @@ radv_get_memory_budget_properties(VkPhysicalDevice physicalDevice,
* Note that the application heap usages are not really accurate (eg.
* in presence of shared buffers).
*/
for (int i = 0; i < device->memory_properties.memoryTypeCount; i++) {
uint32_t heap_index = device->memory_properties.memoryTypes[i].heapIndex;
if (vram_size) {
heap_usage = device->ws->query_value(device->ws,
RADEON_ALLOCATED_VRAM);
switch (device->mem_type_indices[i]) {
case RADV_MEM_TYPE_VRAM:
heap_usage = device->ws->query_value(device->ws,
RADEON_ALLOCATED_VRAM);
heap_budget = vram_size -
device->ws->query_value(device->ws, RADEON_VRAM_USAGE) +
heap_usage;
heap_budget = vram_size -
device->ws->query_value(device->ws, RADEON_VRAM_USAGE) +
heap_usage;
memoryBudget->heapBudget[RADV_MEM_HEAP_VRAM] = heap_budget;
memoryBudget->heapUsage[RADV_MEM_HEAP_VRAM] = heap_usage;
}
memoryBudget->heapBudget[heap_index] = heap_budget;
memoryBudget->heapUsage[heap_index] = heap_usage;
break;
case RADV_MEM_TYPE_VRAM_CPU_ACCESS:
heap_usage = device->ws->query_value(device->ws,
RADEON_ALLOCATED_VRAM_VIS);
if (visible_vram_size) {
heap_usage = device->ws->query_value(device->ws,
RADEON_ALLOCATED_VRAM_VIS);
heap_budget = visible_vram_size -
device->ws->query_value(device->ws, RADEON_VRAM_VIS_USAGE) +
heap_usage;
heap_budget = visible_vram_size -
device->ws->query_value(device->ws, RADEON_VRAM_VIS_USAGE) +
heap_usage;
memoryBudget->heapBudget[heap_index] = heap_budget;
memoryBudget->heapUsage[heap_index] = heap_usage;
break;
case RADV_MEM_TYPE_GTT_WRITE_COMBINE:
heap_usage = device->ws->query_value(device->ws,
RADEON_ALLOCATED_GTT);
memoryBudget->heapBudget[RADV_MEM_HEAP_VRAM_CPU_ACCESS] = heap_budget;
memoryBudget->heapUsage[RADV_MEM_HEAP_VRAM_CPU_ACCESS] = heap_usage;
}
heap_budget = gtt_size -
device->ws->query_value(device->ws, RADEON_GTT_USAGE) +
heap_usage;
if (gtt_size) {
heap_usage = device->ws->query_value(device->ws,
RADEON_ALLOCATED_GTT);
memoryBudget->heapBudget[heap_index] = heap_budget;
memoryBudget->heapUsage[heap_index] = heap_usage;
break;
default:
break;
}
heap_budget = gtt_size -
device->ws->query_value(device->ws, RADEON_GTT_USAGE) +
heap_usage;
memoryBudget->heapBudget[RADV_MEM_HEAP_GTT] = heap_budget;
memoryBudget->heapUsage[RADV_MEM_HEAP_GTT] = heap_usage;
}
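Note on the budget arithmetic above: each heap's budget is computed as the heap size, minus the system-wide usage of that heap, plus this process's own allocations. A tiny sketch of that formula, with a hypothetical helper name and plain integers in place of the winsys queries:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper following the arithmetic in the hunk:
 *   budget = heap size - usage by all processes + usage by this process
 * All quantities are in bytes. */
static uint64_t heap_budget(uint64_t heap_size,
                            uint64_t system_usage,
                            uint64_t own_usage)
{
    return heap_size - system_usage + own_usage;
}
```

The arithmetic is unsigned, so if the reported system-wide usage ever exceeded the heap size plus the process's own usage, the result would wrap; the sketch assumes, as the hunk appears to, that this does not happen.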
/* The heapBudget and heapUsage values must be zero for array elements
@@ -1570,6 +1569,9 @@ static VkResult radv_bo_list_add(struct radv_device *device,
{
struct radv_bo_list *bo_list = &device->bo_list;
if (bo->is_local)
return VK_SUCCESS;
if (unlikely(!device->use_global_bo_list))
return VK_SUCCESS;
@@ -1597,6 +1599,9 @@ static void radv_bo_list_remove(struct radv_device *device,
{
struct radv_bo_list *bo_list = &device->bo_list;
if (bo->is_local)
return;
if (unlikely(!device->use_global_bo_list))
return;
@@ -1707,7 +1712,8 @@ VkResult radv_CreateDevice(
* from the descriptor set anymore, so we have to use a global BO list.
*/
device->use_global_bo_list =
device->enabled_extensions.EXT_descriptor_indexing;
device->enabled_extensions.EXT_descriptor_indexing ||
device->enabled_extensions.EXT_buffer_device_address;
mtx_init(&device->shader_slab_mutex, mtx_plain);
list_inithead(&device->shader_slabs);
@@ -2809,7 +2815,7 @@ VkResult radv_QueueSubmit(
struct radeon_winsys_fence *base_fence = fence ? fence->fence : NULL;
struct radeon_winsys_ctx *ctx = queue->hw_ctx;
int ret;
uint32_t max_cs_submission = queue->device->trace_bo ? 1 : RADV_MAX_IBS_PER_SUBMIT;
uint32_t max_cs_submission = queue->device->trace_bo ? 1 : UINT32_MAX;
uint32_t scratch_size = 0;
uint32_t compute_scratch_size = 0;
uint32_t esgs_ring_size = 0, gsvs_ring_size = 0;
@@ -4045,6 +4051,15 @@ void radv_DestroyBuffer(
vk_free2(&device->alloc, pAllocator, buffer);
}
VkDeviceAddress radv_GetBufferDeviceAddressEXT(
VkDevice device,
const VkBufferDeviceAddressInfoEXT* pInfo)
{
RADV_FROM_HANDLE(radv_buffer, buffer, pInfo->buffer);
return radv_buffer_get_va(buffer->bo) + buffer->offset;
}
static inline unsigned
si_tile_mode_index(const struct radv_image *image, unsigned level, bool stencil)
{


@@ -93,14 +93,16 @@ EXTENSIONS = [
Extension('VK_KHR_display', 23, 'VK_USE_PLATFORM_DISPLAY_KHR'),
Extension('VK_EXT_direct_mode_display', 1, 'VK_USE_PLATFORM_DISPLAY_KHR'),
Extension('VK_EXT_acquire_xlib_display', 1, 'VK_USE_PLATFORM_XLIB_XRANDR_EXT'),
Extension('VK_EXT_buffer_device_address', 1, True),
Extension('VK_EXT_calibrated_timestamps', 1, True),
Extension('VK_EXT_conditional_rendering', 1, True),
Extension('VK_EXT_conservative_rasterization', 1, 'device->rad_info.chip_class >= GFX9'),
Extension('VK_EXT_display_surface_counter', 1, 'VK_USE_PLATFORM_DISPLAY_KHR'),
Extension('VK_EXT_display_control', 1, 'VK_USE_PLATFORM_DISPLAY_KHR'),
Extension('VK_EXT_debug_report', 9, True),
Extension('VK_EXT_depth_clip_enable', 1, True),
Extension('VK_EXT_depth_range_unrestricted', 1, True),
Extension('VK_EXT_descriptor_indexing', 2, False),
Extension('VK_EXT_descriptor_indexing', 2, True),
Extension('VK_EXT_discard_rectangles', 1, True),
Extension('VK_EXT_external_memory_dma_buf', 1, True),
Extension('VK_EXT_external_memory_host', 1, 'device->rad_info.has_userptr'),


@@ -524,7 +524,7 @@ static bool radv_is_storage_image_format_supported(struct radv_physical_device *
}
}
bool radv_is_buffer_format_supported(VkFormat format, bool *scaled)
static bool radv_is_buffer_format_supported(VkFormat format, bool *scaled)
{
const struct vk_format_description *desc = vk_format_description(format);
unsigned data_format, num_format;
@@ -536,8 +536,7 @@ bool radv_is_buffer_format_supported(VkFormat format, bool *scaled)
num_format = radv_translate_buffer_numformat(desc,
vk_format_get_first_non_void_channel(format));
if (scaled)
*scaled = (num_format == V_008F0C_BUF_NUM_FORMAT_SSCALED) || (num_format == V_008F0C_BUF_NUM_FORMAT_USCALED);
*scaled = (num_format == V_008F0C_BUF_NUM_FORMAT_SSCALED) || (num_format == V_008F0C_BUF_NUM_FORMAT_USCALED);
return data_format != V_008F0C_BUF_DATA_FORMAT_INVALID &&
num_format != ~0;
}
@@ -991,22 +990,10 @@ bool radv_format_pack_clear_color(VkFormat format,
assert(channel->size == 8);
v = util_format_linear_float_to_srgb_8unorm(value->float32[c]);
} else {
float f = MIN2(value->float32[c], 1.0f);
if (channel->type == VK_FORMAT_TYPE_UNSIGNED) {
f = MAX2(f, 0.0f) * ((1ULL << channel->size) - 1);
} else {
f = MAX2(f, -1.0f) * ((1ULL << (channel->size - 1)) - 1);
}
/* The hardware rounds before conversion. */
if (f > 0)
f += 0.5f;
else
f -= 0.5f;
v = (uint64_t)f;
} else if (channel->type == VK_FORMAT_TYPE_UNSIGNED) {
v = MAX2(MIN2(value->float32[c], 1.0f), 0.0f) * ((1ULL << channel->size) - 1);
} else {
v = MAX2(MIN2(value->float32[c], 1.0f), -1.0f) * ((1ULL << (channel->size - 1)) - 1);
}
} else if (channel->type == VK_FORMAT_TYPE_FLOAT) {
if (channel->size == 32) {
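Note on the normalized-channel packing above: the two variants in the hunk differ in whether 0.5 is added (or subtracted for negative values) before the float is converted to an integer. A sketch of both behaviours, using hypothetical function names and assuming channel widths small enough (24 bits or fewer) for the scale factor to be exact in a `float`:

```c
#include <assert.h>
#include <stdint.h>

static float clampf(float f, float lo, float hi)
{
    return f < lo ? lo : (f > hi ? hi : f);
}

/* Unsigned normalized, truncating: the fractional part is dropped. */
static uint64_t pack_unorm_trunc(float f, unsigned bits)
{
    float scaled = clampf(f, 0.0f, 1.0f) * (float)((1ULL << bits) - 1);
    return (uint64_t)scaled;
}

/* Unsigned normalized, rounding to nearest by adding 0.5 first. */
static uint64_t pack_unorm_round(float f, unsigned bits)
{
    float scaled = clampf(f, 0.0f, 1.0f) * (float)((1ULL << bits) - 1);
    return (uint64_t)(scaled + 0.5f);
}

/* Signed normalized, rounding away from zero before conversion. The
 * conversion goes through a signed type because converting a negative
 * float directly to an unsigned integer is undefined in C. */
static int64_t pack_snorm_round(float f, unsigned bits)
{
    float scaled = clampf(f, -1.0f, 1.0f) * (float)((1ULL << (bits - 1)) - 1);

    scaled += scaled > 0 ? 0.5f : -0.5f;
    return (int64_t)scaled;
}
```

For an 8-bit channel and an input of 0.5, the scaled value is 127.5, so the truncating variant yields 127 and the rounding variant yields 128; that one-step difference is what the "hardware rounds before conversion" comment in the hunk is about.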


@@ -29,7 +29,7 @@ class radv_llvm_per_thread_info {
public:
radv_llvm_per_thread_info(enum radeon_family arg_family,
enum ac_target_machine_options arg_tm_options)
: family(arg_family), tm_options(arg_tm_options) {}
: family(arg_family), tm_options(arg_tm_options), passes(NULL) {}
~radv_llvm_per_thread_info()
{


@@ -956,8 +956,8 @@ radv_device_init_meta_blit_color(struct radv_device *device, bool on_demand)
.attachment = VK_ATTACHMENT_UNUSED,
.layout = VK_IMAGE_LAYOUT_GENERAL,
},
.preserveAttachmentCount = 1,
.pPreserveAttachments = (uint32_t[]) { 0 },
.preserveAttachmentCount = 0,
.pPreserveAttachments = NULL,
},
.dependencyCount = 0,
}, &device->meta_state.alloc, &device->meta_state.blit.render_pass[key][j]);
@@ -1016,8 +1016,8 @@ radv_device_init_meta_blit_depth(struct radv_device *device, bool on_demand)
.attachment = 0,
.layout = layout,
},
.preserveAttachmentCount = 1,
.pPreserveAttachments = (uint32_t[]) { 0 },
.preserveAttachmentCount = 0,
.pPreserveAttachments = NULL,
},
.dependencyCount = 0,
}, &device->meta_state.alloc, &device->meta_state.blit.depth_only_rp[ds_layout]);
@@ -1073,8 +1073,8 @@ radv_device_init_meta_blit_stencil(struct radv_device *device, bool on_demand)
.attachment = 0,
.layout = layout,
},
.preserveAttachmentCount = 1,
.pPreserveAttachments = (uint32_t[]) { 0 },
.preserveAttachmentCount = 0,
.pPreserveAttachments = NULL,
},
.dependencyCount = 0,
}, &device->meta_state.alloc, &device->meta_state.blit.stencil_only_rp[ds_layout]);


@@ -807,8 +807,8 @@ blit2d_init_color_pipeline(struct radv_device *device,
.attachment = VK_ATTACHMENT_UNUSED,
.layout = layout,
},
.preserveAttachmentCount = 1,
.pPreserveAttachments = (uint32_t[]) { 0 },
.preserveAttachmentCount = 0,
.pPreserveAttachments = NULL,
},
.dependencyCount = 0,
}, &device->meta_state.alloc, &device->meta_state.blit2d_render_passes[fs_key][dst_layout]);
@@ -978,8 +978,8 @@ blit2d_init_depth_only_pipeline(struct radv_device *device,
.attachment = 0,
.layout = layout,
},
.preserveAttachmentCount = 1,
.pPreserveAttachments = (uint32_t[]) { 0 },
.preserveAttachmentCount = 0,
.pPreserveAttachments = NULL,
},
.dependencyCount = 0,
}, &device->meta_state.alloc, &device->meta_state.blit2d_depth_only_rp[ds_layout]);
@@ -1148,8 +1148,8 @@ blit2d_init_stencil_only_pipeline(struct radv_device *device,
.attachment = 0,
.layout = layout,
},
.preserveAttachmentCount = 1,
.pPreserveAttachments = (uint32_t[]) { 0 },
.preserveAttachmentCount = 0,
.pPreserveAttachments = NULL,
},
.dependencyCount = 0,
}, &device->meta_state.alloc, &device->meta_state.blit2d_stencil_only_rp[ds_layout]);


@@ -232,8 +232,8 @@ create_color_renderpass(struct radv_device *device,
.attachment = VK_ATTACHMENT_UNUSED,
.layout = VK_IMAGE_LAYOUT_GENERAL,
},
.preserveAttachmentCount = 1,
.pPreserveAttachments = (uint32_t[]) { 0 },
.preserveAttachmentCount = 0,
.pPreserveAttachments = NULL,
},
.dependencyCount = 0,
}, &device->meta_state.alloc, pass);
@@ -438,10 +438,10 @@ emit_color_clear(struct radv_cmd_buffer *cmd_buffer,
.color_attachments = (struct radv_subpass_attachment[]) {
subpass->color_attachments[clear_att->colorAttachment]
},
.depth_stencil_attachment = (struct radv_subpass_attachment) { VK_ATTACHMENT_UNUSED, VK_IMAGE_LAYOUT_UNDEFINED }
.depth_stencil_attachment = NULL,
};
radv_cmd_buffer_set_subpass(cmd_buffer, &clear_subpass, false);
radv_cmd_buffer_set_subpass(cmd_buffer, &clear_subpass);
radv_CmdBindPipeline(cmd_buffer_h, VK_PIPELINE_BIND_POINT_GRAPHICS,
pipeline);
@@ -465,7 +465,7 @@ emit_color_clear(struct radv_cmd_buffer *cmd_buffer,
radv_CmdDraw(cmd_buffer_h, 3, clear_rect->layerCount, 0, clear_rect->baseArrayLayer);
}
radv_cmd_buffer_set_subpass(cmd_buffer, subpass, false);
radv_cmd_buffer_set_subpass(cmd_buffer, subpass);
}
@@ -547,8 +547,8 @@ create_depthstencil_renderpass(struct radv_device *device,
.attachment = 0,
.layout = VK_IMAGE_LAYOUT_GENERAL,
},
.preserveAttachmentCount = 1,
.pPreserveAttachments = (uint32_t[]) { 0 },
.preserveAttachmentCount = 0,
.pPreserveAttachments = NULL,
},
.dependencyCount = 0,
}, &device->meta_state.alloc, render_pass);
@@ -650,9 +650,8 @@ static bool depth_view_can_fast_clear(struct radv_cmd_buffer *cmd_buffer,
if (radv_image_has_htile(iview->image) &&
iview->base_mip == 0 &&
iview->base_layer == 0 &&
iview->layer_count == iview->image->info.array_size &&
radv_layout_is_htile_compressed(iview->image, layout, queue_mask) &&
radv_image_extent_compare(iview->image, &iview->extent))
!radv_image_extent_compare(iview->image, &iview->extent))
return true;
return false;
}
@@ -721,7 +720,7 @@ emit_depthstencil_clear(struct radv_cmd_buffer *cmd_buffer,
struct radv_meta_state *meta_state = &device->meta_state;
const struct radv_subpass *subpass = cmd_buffer->state.subpass;
const struct radv_framebuffer *fb = cmd_buffer->state.framebuffer;
const uint32_t pass_att = subpass->depth_stencil_attachment.attachment;
const uint32_t pass_att = subpass->depth_stencil_attachment->attachment;
VkClearDepthStencilValue clear_value = clear_att->clearValue.depthStencil;
VkImageAspectFlags aspects = clear_att->aspectMask;
const struct radv_image_view *iview = fb ? fb->attachments[pass_att].attachment : NULL;
@@ -761,7 +760,7 @@ emit_depthstencil_clear(struct radv_cmd_buffer *cmd_buffer,
iview,
samples_log2,
aspects,
subpass->depth_stencil_attachment.layout,
subpass->depth_stencil_attachment->layout,
clear_rect,
clear_value);
if (!pipeline)
@@ -771,7 +770,7 @@ emit_depthstencil_clear(struct radv_cmd_buffer *cmd_buffer,
pipeline);
if (depth_view_can_fast_clear(cmd_buffer, iview, aspects,
subpass->depth_stencil_attachment.layout,
subpass->depth_stencil_attachment->layout,
clear_rect, clear_value))
radv_update_ds_clear_metadata(cmd_buffer, iview->image,
clear_value, aspects);
@@ -1321,6 +1320,7 @@ radv_clear_cmask(struct radv_cmd_buffer *cmd_buffer,
image->cmask.size, value);
}
uint32_t
radv_clear_fmask(struct radv_cmd_buffer *cmd_buffer,
struct radv_image *image, uint32_t value)
@@ -1555,7 +1555,11 @@ emit_clear(struct radv_cmd_buffer *cmd_buffer,
if (aspects & VK_IMAGE_ASPECT_COLOR_BIT) {
const uint32_t subpass_att = clear_att->colorAttachment;
assert(subpass_att < subpass->color_count);
const uint32_t pass_att = subpass->color_attachments[subpass_att].attachment;
if (pass_att == VK_ATTACHMENT_UNUSED)
return;
VkImageLayout image_layout = subpass->color_attachments[subpass_att].layout;
const struct radv_image_view *iview = fb ? fb->attachments[pass_att].attachment : NULL;
VkClearColorValue clear_value = clear_att->clearValue.color;
@@ -1569,11 +1573,11 @@ emit_clear(struct radv_cmd_buffer *cmd_buffer,
emit_color_clear(cmd_buffer, clear_att, clear_rect, view_mask);
}
} else {
const uint32_t pass_att = subpass->depth_stencil_attachment.attachment;
const uint32_t pass_att = subpass->depth_stencil_attachment->attachment;
if (pass_att == VK_ATTACHMENT_UNUSED)
return;
VkImageLayout image_layout = subpass->depth_stencil_attachment.layout;
VkImageLayout image_layout = subpass->depth_stencil_attachment->layout;
const struct radv_image_view *iview = fb ? fb->attachments[pass_att].attachment : NULL;
VkClearDepthStencilValue clear_value = clear_att->clearValue.depthStencil;
@@ -1616,7 +1620,10 @@ radv_subpass_needs_clear(struct radv_cmd_buffer *cmd_buffer)
return true;
}
a = cmd_state->subpass->depth_stencil_attachment.attachment;
if (!cmd_state->subpass->depth_stencil_attachment)
return false;
a = cmd_state->subpass->depth_stencil_attachment->attachment;
return radv_attachment_needs_clear(cmd_state, a);
}
@@ -1685,17 +1692,19 @@ radv_cmd_buffer_clear_subpass(struct radv_cmd_buffer *cmd_buffer)
&post_flush);
}
uint32_t ds = cmd_state->subpass->depth_stencil_attachment.attachment;
if (radv_attachment_needs_clear(cmd_state, ds)) {
VkClearAttachment clear_att = {
.aspectMask = cmd_state->attachments[ds].pending_clear_aspects,
.clearValue = cmd_state->attachments[ds].clear_value,
};
if (cmd_state->subpass->depth_stencil_attachment) {
uint32_t ds = cmd_state->subpass->depth_stencil_attachment->attachment;
if (radv_attachment_needs_clear(cmd_state, ds)) {
VkClearAttachment clear_att = {
.aspectMask = cmd_state->attachments[ds].pending_clear_aspects,
.clearValue = cmd_state->attachments[ds].clear_value,
};
radv_subpass_clear_attachment(cmd_buffer,
&cmd_state->attachments[ds],
&clear_att, &pre_flush,
&post_flush);
radv_subpass_clear_attachment(cmd_buffer,
&cmd_state->attachments[ds],
&clear_att, &pre_flush,
&post_flush);
}
}
radv_meta_restore(&saved_state, cmd_buffer);

@@ -189,24 +189,6 @@ meta_copy_buffer_to_image(struct radv_cmd_buffer *cmd_buffer,
layout,
&pRegions[r].imageSubresource);
if (!radv_is_buffer_format_supported(img_bsurf.format, NULL)) {
uint32_t queue_mask = radv_image_queue_family_mask(image,
cmd_buffer->queue_family_index,
cmd_buffer->queue_family_index);
MAYBE_UNUSED bool compressed = radv_layout_dcc_compressed(image, layout, queue_mask);
if (compressed) {
radv_decompress_dcc(cmd_buffer, image, &(VkImageSubresourceRange) {
.aspectMask = pRegions[r].imageSubresource.aspectMask,
.baseMipLevel = pRegions[r].imageSubresource.mipLevel,
.levelCount = 1,
.baseArrayLayer = pRegions[r].imageSubresource.baseArrayLayer,
.layerCount = pRegions[r].imageSubresource.layerCount,
});
}
img_bsurf.format = vk_format_for_size(vk_format_get_blocksize(img_bsurf.format));
img_bsurf.current_layout = VK_IMAGE_LAYOUT_GENERAL;
}
struct radv_meta_blit2d_buffer buf_bsurf = {
.bs = img_bsurf.bs,
.format = img_bsurf.format,
@@ -332,24 +314,6 @@ meta_copy_image_to_buffer(struct radv_cmd_buffer *cmd_buffer,
layout,
&pRegions[r].imageSubresource);
if (!radv_is_buffer_format_supported(img_info.format, NULL)) {
uint32_t queue_mask = radv_image_queue_family_mask(image,
cmd_buffer->queue_family_index,
cmd_buffer->queue_family_index);
MAYBE_UNUSED bool compressed = radv_layout_dcc_compressed(image, layout, queue_mask);
if (compressed) {
radv_decompress_dcc(cmd_buffer, image, &(VkImageSubresourceRange) {
.aspectMask = pRegions[r].imageSubresource.aspectMask,
.baseMipLevel = pRegions[r].imageSubresource.mipLevel,
.levelCount = 1,
.baseArrayLayer = pRegions[r].imageSubresource.baseArrayLayer,
.layerCount = pRegions[r].imageSubresource.layerCount,
});
}
img_info.format = vk_format_for_size(vk_format_get_blocksize(img_info.format));
img_info.current_layout = VK_IMAGE_LAYOUT_GENERAL;
}
struct radv_meta_blit2d_buffer buf_info = {
.bs = img_info.bs,
.format = img_info.format,

@@ -24,7 +24,6 @@
#include "radv_meta.h"
#include "radv_private.h"
#include "vk_format.h"
static nir_shader *
build_fmask_expand_compute_shader(struct radv_device *device, int samples)
@@ -133,7 +132,7 @@ radv_expand_fmask_image_inplace(struct radv_cmd_buffer *cmd_buffer,
.sType = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO,
.image = radv_image_to_handle(image),
.viewType = radv_meta_get_view_type(image),
.format = vk_format_no_srgb(image->vk_format),
.format = image->vk_format,
.subresourceRange = {
.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
.baseMipLevel = 0,

@@ -633,8 +633,7 @@ radv_cmd_buffer_resolve_subpass(struct radv_cmd_buffer *cmd_buffer)
struct radv_subpass_attachment src_att = subpass->color_attachments[i];
struct radv_subpass_attachment dest_att = subpass->resolve_attachments[i];
if (src_att.attachment == VK_ATTACHMENT_UNUSED ||
dest_att.attachment == VK_ATTACHMENT_UNUSED)
if (dest_att.attachment == VK_ATTACHMENT_UNUSED)
continue;
struct radv_image *dst_img = cmd_buffer->state.framebuffer->attachments[dest_att.attachment].attachment->image;
@@ -661,8 +660,7 @@ radv_cmd_buffer_resolve_subpass(struct radv_cmd_buffer *cmd_buffer)
struct radv_subpass_attachment src_att = subpass->color_attachments[i];
struct radv_subpass_attachment dest_att = subpass->resolve_attachments[i];
if (src_att.attachment == VK_ATTACHMENT_UNUSED ||
dest_att.attachment == VK_ATTACHMENT_UNUSED)
if (dest_att.attachment == VK_ATTACHMENT_UNUSED)
continue;
struct radv_image *dst_img = cmd_buffer->state.framebuffer->attachments[dest_att.attachment].attachment->image;
@@ -675,10 +673,10 @@ radv_cmd_buffer_resolve_subpass(struct radv_cmd_buffer *cmd_buffer)
struct radv_subpass resolve_subpass = {
.color_count = 2,
.color_attachments = (struct radv_subpass_attachment[]) { src_att, dest_att },
.depth_stencil_attachment = { .attachment = VK_ATTACHMENT_UNUSED },
.depth_stencil_attachment = NULL,
};
radv_cmd_buffer_set_subpass(cmd_buffer, &resolve_subpass, false);
radv_cmd_buffer_set_subpass(cmd_buffer, &resolve_subpass);
VkResult ret = build_resolve_pipeline(cmd_buffer->device, radv_format_meta_fs_key(dst_img->vk_format));
if (ret != VK_SUCCESS) {
@@ -710,8 +708,7 @@ radv_decompress_resolve_subpass_src(struct radv_cmd_buffer *cmd_buffer)
struct radv_subpass_attachment src_att = subpass->color_attachments[i];
struct radv_subpass_attachment dest_att = subpass->resolve_attachments[i];
if (src_att.attachment == VK_ATTACHMENT_UNUSED ||
dest_att.attachment == VK_ATTACHMENT_UNUSED)
if (dest_att.attachment == VK_ATTACHMENT_UNUSED)
continue;
struct radv_image *src_image =

@@ -232,8 +232,8 @@ create_resolve_pipeline(struct radv_device *device,
.attachment = VK_ATTACHMENT_UNUSED,
.layout = VK_IMAGE_LAYOUT_GENERAL,
},
.preserveAttachmentCount = 1,
.pPreserveAttachments = (uint32_t[]) { 0 },
.preserveAttachmentCount = 0,
.pPreserveAttachments = NULL,
},
.dependencyCount = 0,
}, &device->meta_state.alloc, rp + dst_layout);
@@ -610,8 +610,7 @@ radv_cmd_buffer_resolve_subpass_fs(struct radv_cmd_buffer *cmd_buffer)
struct radv_subpass_attachment src_att = subpass->color_attachments[i];
struct radv_subpass_attachment dest_att = subpass->resolve_attachments[i];
if (src_att.attachment == VK_ATTACHMENT_UNUSED ||
dest_att.attachment == VK_ATTACHMENT_UNUSED)
if (dest_att.attachment == VK_ATTACHMENT_UNUSED)
continue;
struct radv_image_view *dest_iview = cmd_buffer->state.framebuffer->attachments[dest_att.attachment].attachment;
@@ -620,10 +619,10 @@ radv_cmd_buffer_resolve_subpass_fs(struct radv_cmd_buffer *cmd_buffer)
struct radv_subpass resolve_subpass = {
.color_count = 1,
.color_attachments = (struct radv_subpass_attachment[]) { dest_att },
.depth_stencil_attachment = { .attachment = VK_ATTACHMENT_UNUSED },
.depth_stencil_attachment = NULL,
};
radv_cmd_buffer_set_subpass(cmd_buffer, &resolve_subpass, false);
radv_cmd_buffer_set_subpass(cmd_buffer, &resolve_subpass);
emit_resolve(cmd_buffer,
src_iview,

@@ -589,6 +589,7 @@ set_loc_desc(struct radv_shader_context *ctx, int idx, uint8_t *sgpr_idx)
struct user_sgpr_info {
bool need_ring_offsets;
bool indirect_all_descriptor_sets;
uint8_t remaining_sgprs;
};
static bool needs_view_index_sgpr(struct radv_shader_context *ctx,
@@ -627,6 +628,50 @@ count_vs_user_sgprs(struct radv_shader_context *ctx)
return count;
}
static void allocate_inline_push_consts(struct radv_shader_context *ctx,
struct user_sgpr_info *user_sgpr_info)
{
uint8_t remaining_sgprs = user_sgpr_info->remaining_sgprs;
/* Only supported if shaders use push constants. */
if (ctx->shader_info->info.min_push_constant_used == UINT8_MAX)
return;
/* Only supported if shaders don't have indirect push constants. */
if (ctx->shader_info->info.has_indirect_push_constants)
return;
/* Only supported for 32-bit push constants. */
if (!ctx->shader_info->info.has_only_32bit_push_constants)
return;
uint8_t num_push_consts =
(ctx->shader_info->info.max_push_constant_used -
ctx->shader_info->info.min_push_constant_used) / 4;
/* Check if the number of user SGPRs is large enough. */
if (num_push_consts < remaining_sgprs) {
ctx->shader_info->info.num_inline_push_consts = num_push_consts;
} else {
ctx->shader_info->info.num_inline_push_consts = remaining_sgprs;
}
/* Clamp to the maximum number of allowed inlined push constants. */
if (ctx->shader_info->info.num_inline_push_consts > AC_MAX_INLINE_PUSH_CONSTS)
ctx->shader_info->info.num_inline_push_consts = AC_MAX_INLINE_PUSH_CONSTS;
if (ctx->shader_info->info.num_inline_push_consts == num_push_consts &&
!ctx->shader_info->info.loads_dynamic_offsets) {
/* Disable the default push constants path if all constants are
* inlined and if shaders don't use dynamic descriptors.
*/
ctx->shader_info->info.loads_push_constants = false;
}
ctx->shader_info->info.base_inline_push_consts =
ctx->shader_info->info.min_push_constant_used / 4;
}
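The sizing logic in allocate_inline_push_consts() can be read in isolation as the sketch below. It is only an illustration under stated assumptions: struct push_const_info, count_inline_push_consts() and the value of MAX_INLINE_PUSH_CONSTS are hypothetical stand-ins (the real AC_MAX_INLINE_PUSH_CONSTS value is not shown in this diff), and max_used is assumed to be the byte offset just past the last push constant the shader reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for AC_MAX_INLINE_PUSH_CONSTS; the real value
 * is not visible in this diff. */
#define MAX_INLINE_PUSH_CONSTS 8

/* Hypothetical, trimmed-down view of the shader info fields used above. */
struct push_const_info {
	uint8_t min_used;   /* first used byte offset, UINT8_MAX if none */
	uint8_t max_used;   /* assumed: byte offset past the last used one */
	bool has_indirect;  /* any indirectly addressed push constant */
	bool only_32bit;    /* every push constant load is 32-bit */
};

/* Returns how many 32-bit push constants would be passed in user SGPRs. */
static uint8_t
count_inline_push_consts(const struct push_const_info *info,
			 uint8_t remaining_sgprs)
{
	/* Same three early-outs as the function in the diff. */
	if (info->min_used == UINT8_MAX)
		return 0;
	if (info->has_indirect)
		return 0;
	if (!info->only_32bit)
		return 0;

	assert(info->max_used >= info->min_used);
	uint8_t num = (info->max_used - info->min_used) / 4;

	/* Limited by the free user SGPRs, then by the global maximum. */
	uint8_t count = num < remaining_sgprs ? num : remaining_sgprs;
	if (count > MAX_INLINE_PUSH_CONSTS)
		count = MAX_INLINE_PUSH_CONSTS;
	return count;
}
```

With a used range of bytes 0 to 16 and six free SGPRs, all four dwords fit; with only two free SGPRs the count drops to two and the regular push constant path stays enabled for the rest.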
static void allocate_user_sgprs(struct radv_shader_context *ctx,
gl_shader_stage stage,
bool has_previous_stage,
@@ -702,7 +747,12 @@ static void allocate_user_sgprs(struct radv_shader_context *ctx,
if (remaining_sgprs < num_desc_set) {
user_sgpr_info->indirect_all_descriptor_sets = true;
user_sgpr_info->remaining_sgprs = remaining_sgprs - 1;
} else {
user_sgpr_info->remaining_sgprs = remaining_sgprs - num_desc_set;
}
allocate_inline_push_consts(ctx, user_sgpr_info);
}
static void
@@ -732,6 +782,13 @@ declare_global_input_sgprs(struct radv_shader_context *ctx,
add_arg(args, ARG_SGPR, type, &ctx->abi.push_constants);
}
for (unsigned i = 0; i < ctx->shader_info->info.num_inline_push_consts; i++) {
add_arg(args, ARG_SGPR, ctx->ac.i32,
&ctx->abi.inline_push_consts[i]);
}
ctx->abi.num_inline_push_consts = ctx->shader_info->info.num_inline_push_consts;
ctx->abi.base_inline_push_consts = ctx->shader_info->info.base_inline_push_consts;
if (ctx->shader_info->info.so.num_outputs) {
add_arg(args, ARG_SGPR,
ac_array_in_const32_addr_space(ctx->ac.v4i32),
@@ -850,6 +907,11 @@ set_global_input_locs(struct radv_shader_context *ctx,
set_loc_shader_ptr(ctx, AC_UD_PUSH_CONSTANTS, user_sgpr_idx);
}
if (ctx->shader_info->info.num_inline_push_consts) {
set_loc_shader(ctx, AC_UD_INLINE_PUSH_CONSTANTS, user_sgpr_idx,
ctx->shader_info->info.num_inline_push_consts);
}
if (ctx->streamout_buffers) {
set_loc_shader_ptr(ctx, AC_UD_STREAMOUT_BUFFERS,
user_sgpr_idx);
@@ -1976,6 +2038,70 @@ adjust_vertex_fetch_alpha(struct radv_shader_context *ctx,
return alpha;
}
static unsigned
get_num_channels_from_data_format(unsigned data_format)
{
switch (data_format) {
case V_008F0C_BUF_DATA_FORMAT_8:
case V_008F0C_BUF_DATA_FORMAT_16:
case V_008F0C_BUF_DATA_FORMAT_32:
return 1;
case V_008F0C_BUF_DATA_FORMAT_8_8:
case V_008F0C_BUF_DATA_FORMAT_16_16:
case V_008F0C_BUF_DATA_FORMAT_32_32:
return 2;
case V_008F0C_BUF_DATA_FORMAT_10_11_11:
case V_008F0C_BUF_DATA_FORMAT_11_11_10:
case V_008F0C_BUF_DATA_FORMAT_32_32_32:
return 3;
case V_008F0C_BUF_DATA_FORMAT_8_8_8_8:
case V_008F0C_BUF_DATA_FORMAT_10_10_10_2:
case V_008F0C_BUF_DATA_FORMAT_2_10_10_10:
case V_008F0C_BUF_DATA_FORMAT_16_16_16_16:
case V_008F0C_BUF_DATA_FORMAT_32_32_32_32:
return 4;
default:
break;
}
return 4;
}
static LLVMValueRef
radv_fixup_vertex_input_fetches(struct radv_shader_context *ctx,
LLVMValueRef value,
unsigned num_channels,
bool is_float)
{
LLVMValueRef zero = is_float ? ctx->ac.f32_0 : ctx->ac.i32_0;
LLVMValueRef one = is_float ? ctx->ac.f32_1 : ctx->ac.i32_1;
LLVMValueRef chan[4];
if (LLVMGetTypeKind(LLVMTypeOf(value)) == LLVMVectorTypeKind) {
unsigned vec_size = LLVMGetVectorSize(LLVMTypeOf(value));
if (num_channels == 4 && num_channels == vec_size)
return value;
num_channels = MIN2(num_channels, vec_size);
for (unsigned i = 0; i < num_channels; i++)
chan[i] = ac_llvm_extract_elem(&ctx->ac, value, i);
} else {
if (num_channels) {
assert(num_channels == 1);
chan[0] = value;
}
}
for (unsigned i = num_channels; i < 4; i++) {
chan[i] = i == 3 ? one : zero;
chan[i] = ac_to_float(&ctx->ac, chan[i]);
}
return ac_build_gather_values(&ctx->ac, chan, 4);
}
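The padding rule applied by radv_fixup_vertex_input_fetches() is easier to see without the LLVM builder calls. The sketch below is a plain-C analogue, not the driver code: pad_to_vec4() is a hypothetical helper, and it covers only the float case, whereas the function above picks an integer or float 0/1 depending on is_float.

```c
#include <assert.h>

/* Hypothetical helper: copy the channels that were actually fetched and
 * fill the missing ones with the (0, 0, 0, 1) defaults, so that y and z
 * default to 0 and w defaults to 1. */
static void
pad_to_vec4(const float *fetched, unsigned num_channels, float out[4])
{
	assert(num_channels <= 4);

	for (unsigned i = 0; i < num_channels; i++)
		out[i] = fetched[i];

	for (unsigned i = num_channels; i < 4; i++)
		out[i] = (i == 3) ? 1.0f : 0.0f;
}
```

A two-channel fetch of (0.25, 0.5) therefore expands to (0.25, 0.5, 0, 1), while a four-channel fetch passes through unchanged.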
static void
handle_vs_input_decl(struct radv_shader_context *ctx,
struct nir_variable *variable)
@@ -1988,7 +2114,7 @@ handle_vs_input_decl(struct radv_shader_context *ctx,
unsigned attrib_count = glsl_count_attribute_slots(variable->type, true);
uint8_t input_usage_mask =
ctx->shader_info->info.vs.input_usage_mask[variable->data.location];
unsigned num_channels = util_last_bit(input_usage_mask);
unsigned num_input_channels = util_last_bit(input_usage_mask);
variable->data.driver_location = variable->data.location * 4;
@@ -1996,6 +2122,11 @@ handle_vs_input_decl(struct radv_shader_context *ctx,
for (unsigned i = 0; i < attrib_count; ++i) {
LLVMValueRef output[4];
unsigned attrib_index = variable->data.location + i - VERT_ATTRIB_GENERIC0;
unsigned attrib_format = ctx->options->key.vs.vertex_attribute_formats[attrib_index];
unsigned data_format = attrib_format & 0x0f;
unsigned num_format = (attrib_format >> 4) & 0x07;
bool is_float = num_format != V_008F0C_BUF_NUM_FORMAT_UINT &&
num_format != V_008F0C_BUF_NUM_FORMAT_SINT;
if (ctx->options->key.vs.instance_rate_inputs & (1u << attrib_index)) {
uint32_t divisor = ctx->options->key.vs.instance_rate_divisors[attrib_index];
@@ -2027,34 +2158,19 @@ handle_vs_input_decl(struct radv_shader_context *ctx,
t_list = ac_build_load_to_sgpr(&ctx->ac, t_list_ptr, t_offset);
if (ctx->options->key.vs.vertex_attribute_provided & (1u << attrib_index)) {
input = ac_build_buffer_load_format(&ctx->ac, t_list,
buffer_index,
ctx->ac.i32_0,
num_channels, false, true);
} else {
/* Per the Vulkan spec, it's invalid to consume vertex
* attributes that are not provided by the pipeline but
* some (invalid) apps appear to do that. Fill the
* input array with (eg. (0, 0, 0, 1)) to workaround
* the problem and to avoid possible GPU hangs.
*/
LLVMValueRef chan[4];
/* Adjust the number of channels to load based on the vertex
* attribute format.
*/
unsigned num_format_channels = get_num_channels_from_data_format(data_format);
unsigned num_channels = MIN2(num_input_channels, num_format_channels);
/* The input_usage mask might be 0 if input variables
* are not removed by the compiler.
*/
num_channels = CLAMP(num_channels, 1, 4);
input = ac_build_buffer_load_format(&ctx->ac, t_list,
buffer_index,
ctx->ac.i32_0,
num_channels, false, true);
for (unsigned i = 0; i < num_channels; i++) {
chan[i] = i == 3 ? ctx->ac.f32_1 : ctx->ac.f32_0;
chan[i] = ac_to_float(&ctx->ac, chan[i]);
}
input = ac_build_gather_values(&ctx->ac, chan, num_channels);
}
input = ac_build_expand_to_vec4(&ctx->ac, input, num_channels);
input = radv_fixup_vertex_input_fetches(ctx, input, num_channels,
is_float);
for (unsigned chan = 0; chan < 4; chan++) {
LLVMValueRef llvm_chan = LLVMConstInt(ctx->ac.i32, chan, false);
@@ -3402,9 +3518,9 @@ ac_setup_rings(struct radv_shader_context *ctx)
}
}
static unsigned
ac_nir_get_max_workgroup_size(enum chip_class chip_class,
const struct nir_shader *nir)
unsigned
radv_nir_get_max_workgroup_size(enum chip_class chip_class,
const struct nir_shader *nir)
{
switch (nir->info.stage) {
case MESA_SHADER_TESS_CTRL:
@@ -3469,6 +3585,8 @@ LLVMModuleRef ac_translate_nir_to_llvm(struct ac_llvm_compiler *ac_llvm,
memset(shader_info, 0, sizeof(*shader_info));
radv_nir_shader_info_init(&shader_info->info);
for(int i = 0; i < shader_count; ++i)
radv_nir_shader_info_pass(shaders[i], options, &shader_info->info);
@@ -3480,7 +3598,7 @@ LLVMModuleRef ac_translate_nir_to_llvm(struct ac_llvm_compiler *ac_llvm,
ctx.max_workgroup_size = 0;
for (int i = 0; i < shader_count; ++i) {
ctx.max_workgroup_size = MAX2(ctx.max_workgroup_size,
ac_nir_get_max_workgroup_size(ctx.options->chip_class,
radv_nir_get_max_workgroup_size(ctx.options->chip_class,
shaders[i]));
}
@@ -3497,17 +3615,10 @@ LLVMModuleRef ac_translate_nir_to_llvm(struct ac_llvm_compiler *ac_llvm,
ctx.abi.clamp_shadow_reference = false;
ctx.abi.gfx9_stride_size_workaround = ctx.ac.chip_class == GFX9 && HAVE_LLVM < 0x800;
/* Because the new raw/struct atomic intrinsics are buggy with LLVM 8,
* we fallback to the old intrinsics for atomic buffer image operations
* and thus we need to apply the indexing workaround...
*/
ctx.abi.gfx9_stride_size_workaround_for_atomic = ctx.ac.chip_class == GFX9 && HAVE_LLVM < 0x900;
if (shader_count >= 2)
ac_init_exec_full_mask(&ctx.ac);
if ((ctx.ac.family == CHIP_VEGA10 ||
ctx.ac.family == CHIP_RAVEN) &&
if (ctx.ac.chip_class == GFX9 &&
shaders[shader_count - 1]->info.stage == MESA_SHADER_TESS_CTRL)
ac_nir_fixup_ls_hs_input_vgprs(&ctx);

@@ -28,6 +28,116 @@
#include "vk_util.h"
static void
radv_render_pass_add_subpass_dep(struct radv_render_pass *pass,
const VkSubpassDependency2KHR *dep)
{
uint32_t src = dep->srcSubpass;
uint32_t dst = dep->dstSubpass;
/* Ignore subpass self-dependencies as they allow the app to call
* vkCmdPipelineBarrier() inside the render pass and the driver should
* only do the barrier when called, not when starting the render pass.
*/
if (src == dst)
return;
/* Accumulate all ingoing external dependencies to the first subpass. */
if (src == VK_SUBPASS_EXTERNAL)
dst = 0;
if (dst == VK_SUBPASS_EXTERNAL) {
if (dep->dstStageMask != VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT)
pass->end_barrier.src_stage_mask |= dep->srcStageMask;
pass->end_barrier.src_access_mask |= dep->srcAccessMask;
pass->end_barrier.dst_access_mask |= dep->dstAccessMask;
} else {
if (dep->dstStageMask != VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT)
pass->subpasses[dst].start_barrier.src_stage_mask |= dep->srcStageMask;
pass->subpasses[dst].start_barrier.src_access_mask |= dep->srcAccessMask;
pass->subpasses[dst].start_barrier.dst_access_mask |= dep->dstAccessMask;
}
}
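The practical difference from the removed loops is that masks are now OR-ed together instead of overwritten, so several dependencies targeting the same subpass all contribute. The sketch below shows that accumulation with simplified types. All struct and function names are hypothetical; SUBPASS_EXTERNAL and STAGE_BOTTOM_OF_PIPE stand in for VK_SUBPASS_EXTERNAL and VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT; and because indentation is lost in this view, it assumes the unbraced stage-mask check guards only the src_stage_mask update.

```c
#include <assert.h>
#include <stdint.h>

#define SUBPASS_EXTERNAL     UINT32_MAX /* stand-in for VK_SUBPASS_EXTERNAL */
#define STAGE_BOTTOM_OF_PIPE 0x2000u    /* stand-in for the bottom-of-pipe bit */
#define MAX_SUBPASSES        4          /* arbitrary size for this sketch */

struct barrier {
	uint32_t src_stage_mask;
	uint32_t src_access_mask;
	uint32_t dst_access_mask;
};

struct dep {
	uint32_t src_subpass, dst_subpass;
	uint32_t src_stage_mask, dst_stage_mask;
	uint32_t src_access_mask, dst_access_mask;
};

struct pass {
	struct barrier start[MAX_SUBPASSES]; /* per-subpass start barrier */
	struct barrier end;                  /* barrier at render pass end */
};

static void
add_subpass_dep(struct pass *pass, const struct dep *dep)
{
	uint32_t src = dep->src_subpass;
	uint32_t dst = dep->dst_subpass;
	struct barrier *b;

	/* Self-dependencies are handled at vkCmdPipelineBarrier() time. */
	if (src == dst)
		return;

	/* External incoming dependencies land on the first subpass. */
	if (src == SUBPASS_EXTERNAL)
		dst = 0;

	if (dst == SUBPASS_EXTERNAL) {
		b = &pass->end;
	} else {
		assert(dst < MAX_SUBPASSES);
		b = &pass->start[dst];
	}

	/* Accumulate rather than overwrite. */
	if (dep->dst_stage_mask != STAGE_BOTTOM_OF_PIPE)
		b->src_stage_mask |= dep->src_stage_mask;
	b->src_access_mask |= dep->src_access_mask;
	b->dst_access_mask |= dep->dst_access_mask;
}
```

Feeding two dependencies with the same destination subpass leaves the union of both mask sets in that subpass's start barrier, which the old assignment-based code would have reduced to the last dependency only.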
static void
radv_render_pass_compile(struct radv_render_pass *pass)
{
for (uint32_t i = 0; i < pass->subpass_count; i++) {
struct radv_subpass *subpass = &pass->subpasses[i];
uint32_t color_sample_count = 1, depth_sample_count = 1;
/* We don't allow depth_stencil_attachment to be non-NULL and
* be VK_ATTACHMENT_UNUSED. This way something can just check
* for NULL and be guaranteed that they have a valid
* attachment.
*/
if (subpass->depth_stencil_attachment &&
subpass->depth_stencil_attachment->attachment == VK_ATTACHMENT_UNUSED)
subpass->depth_stencil_attachment = NULL;
for (uint32_t j = 0; j < subpass->attachment_count; j++) {
struct radv_subpass_attachment *subpass_att =
&subpass->attachments[j];
if (subpass_att->attachment == VK_ATTACHMENT_UNUSED)
continue;
struct radv_render_pass_attachment *pass_att =
&pass->attachments[subpass_att->attachment];
pass_att->last_subpass_idx = i;
}
subpass->has_color_att = false;
for (uint32_t j = 0; j < subpass->color_count; j++) {
struct radv_subpass_attachment *subpass_att =
&subpass->color_attachments[j];
if (subpass_att->attachment == VK_ATTACHMENT_UNUSED)
continue;
subpass->has_color_att = true;
struct radv_render_pass_attachment *pass_att =
&pass->attachments[subpass_att->attachment];
color_sample_count = pass_att->samples;
}
if (subpass->depth_stencil_attachment) {
const uint32_t a =
subpass->depth_stencil_attachment->attachment;
struct radv_render_pass_attachment *pass_att =
&pass->attachments[a];
depth_sample_count = pass_att->samples;
}
subpass->max_sample_count = MAX2(color_sample_count,
depth_sample_count);
/* We have to handle resolve attachments specially */
subpass->has_resolve = false;
if (subpass->resolve_attachments) {
for (uint32_t j = 0; j < subpass->color_count; j++) {
struct radv_subpass_attachment *resolve_att =
&subpass->resolve_attachments[j];
if (resolve_att->attachment == VK_ATTACHMENT_UNUSED)
continue;
subpass->has_resolve = true;
}
}
}
}
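The NULL normalization at the top of radv_render_pass_compile() is what lets the call sites elsewhere in this diff replace `.attachment != VK_ATTACHMENT_UNUSED` with a plain pointer check. A minimal sketch of that invariant, using hypothetical simplified types (att_ref and subpass are not the driver's structs, and ATTACHMENT_UNUSED stands in for VK_ATTACHMENT_UNUSED):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ATTACHMENT_UNUSED UINT32_MAX /* stand-in for VK_ATTACHMENT_UNUSED */

struct att_ref {
	uint32_t attachment;
};

struct subpass {
	struct att_ref *depth_stencil; /* NULL when there is none */
};

/* Collapse "present but unused" into NULL once, at compile time. */
static void
normalize_depth_stencil(struct subpass *subpass)
{
	if (subpass->depth_stencil &&
	    subpass->depth_stencil->attachment == ATTACHMENT_UNUSED)
		subpass->depth_stencil = NULL;
}

/* After normalization a non-NULL pointer implies a valid index. */
static bool
has_depth_stencil(const struct subpass *subpass)
{
	return subpass->depth_stencil != NULL;
}
```

Doing the check once when the render pass is built means later code cannot forget the second condition and index the attachment array with the unused sentinel.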
static unsigned
radv_num_subpass_attachments(const VkSubpassDescription *desc)
{
return desc->inputAttachmentCount +
desc->colorAttachmentCount +
(desc->pResolveAttachments ? desc->colorAttachmentCount : 0) +
(desc->pDepthStencilAttachment != NULL);
}
VkResult radv_CreateRenderPass(
VkDevice _device,
const VkRenderPassCreateInfo* pCreateInfo,
@@ -82,13 +192,8 @@ VkResult radv_CreateRenderPass(
uint32_t subpass_attachment_count = 0;
struct radv_subpass_attachment *p;
for (uint32_t i = 0; i < pCreateInfo->subpassCount; i++) {
const VkSubpassDescription *desc = &pCreateInfo->pSubpasses[i];
subpass_attachment_count +=
desc->inputAttachmentCount +
desc->colorAttachmentCount +
(desc->pResolveAttachments ? desc->colorAttachmentCount : 0) +
(desc->pDepthStencilAttachment != NULL);
radv_num_subpass_attachments(&pCreateInfo->pSubpasses[i]);
}
if (subpass_attachment_count) {
@@ -106,11 +211,13 @@ VkResult radv_CreateRenderPass(
p = pass->subpass_attachments;
for (uint32_t i = 0; i < pCreateInfo->subpassCount; i++) {
const VkSubpassDescription *desc = &pCreateInfo->pSubpasses[i];
uint32_t color_sample_count = 1, depth_sample_count = 1;
struct radv_subpass *subpass = &pass->subpasses[i];
subpass->input_count = desc->inputAttachmentCount;
subpass->color_count = desc->colorAttachmentCount;
subpass->attachment_count = radv_num_subpass_attachments(desc);
subpass->attachments = p;
if (multiview_info)
subpass->view_mask = multiview_info->pViewMasks[i];
@@ -123,8 +230,6 @@ VkResult radv_CreateRenderPass(
.attachment = desc->pInputAttachments[j].attachment,
.layout = desc->pInputAttachments[j].layout,
};
if (desc->pInputAttachments[j].attachment != VK_ATTACHMENT_UNUSED)
pass->attachments[desc->pInputAttachments[j].attachment].view_mask |= subpass->view_mask;
}
}
@@ -137,76 +242,61 @@ VkResult radv_CreateRenderPass(
.attachment = desc->pColorAttachments[j].attachment,
.layout = desc->pColorAttachments[j].layout,
};
if (desc->pColorAttachments[j].attachment != VK_ATTACHMENT_UNUSED) {
pass->attachments[desc->pColorAttachments[j].attachment].view_mask |= subpass->view_mask;
color_sample_count = pCreateInfo->pAttachments[desc->pColorAttachments[j].attachment].samples;
}
}
}
subpass->has_resolve = false;
if (desc->pResolveAttachments) {
subpass->resolve_attachments = p;
p += desc->colorAttachmentCount;
for (uint32_t j = 0; j < desc->colorAttachmentCount; j++) {
uint32_t a = desc->pResolveAttachments[j].attachment;
subpass->resolve_attachments[j] = (struct radv_subpass_attachment) {
.attachment = desc->pResolveAttachments[j].attachment,
.layout = desc->pResolveAttachments[j].layout,
};
if (a != VK_ATTACHMENT_UNUSED) {
subpass->has_resolve = true;
pass->attachments[desc->pResolveAttachments[j].attachment].view_mask |= subpass->view_mask;
}
}
}
if (desc->pDepthStencilAttachment) {
subpass->depth_stencil_attachment = (struct radv_subpass_attachment) {
subpass->depth_stencil_attachment = p++;
*subpass->depth_stencil_attachment = (struct radv_subpass_attachment) {
.attachment = desc->pDepthStencilAttachment->attachment,
.layout = desc->pDepthStencilAttachment->layout,
};
if (desc->pDepthStencilAttachment->attachment != VK_ATTACHMENT_UNUSED) {
pass->attachments[desc->pDepthStencilAttachment->attachment].view_mask |= subpass->view_mask;
depth_sample_count = pCreateInfo->pAttachments[desc->pDepthStencilAttachment->attachment].samples;
}
} else {
subpass->depth_stencil_attachment.attachment = VK_ATTACHMENT_UNUSED;
}
subpass->max_sample_count = MAX2(color_sample_count,
depth_sample_count);
}
for (unsigned i = 0; i < pCreateInfo->dependencyCount; ++i) {
uint32_t src = pCreateInfo->pDependencies[i].srcSubpass;
uint32_t dst = pCreateInfo->pDependencies[i].dstSubpass;
/* Ignore subpass self-dependencies as they allow the app to
* call vkCmdPipelineBarrier() inside the render pass and the
* driver should only do the barrier when called, not when
* starting the render pass.
*/
if (src == dst)
continue;
if (dst == VK_SUBPASS_EXTERNAL) {
pass->end_barrier.src_stage_mask = pCreateInfo->pDependencies[i].srcStageMask;
pass->end_barrier.src_access_mask = pCreateInfo->pDependencies[i].srcAccessMask;
pass->end_barrier.dst_access_mask = pCreateInfo->pDependencies[i].dstAccessMask;
} else {
pass->subpasses[dst].start_barrier.src_stage_mask = pCreateInfo->pDependencies[i].srcStageMask;
pass->subpasses[dst].start_barrier.src_access_mask = pCreateInfo->pDependencies[i].srcAccessMask;
pass->subpasses[dst].start_barrier.dst_access_mask = pCreateInfo->pDependencies[i].dstAccessMask;
}
/* Convert to a Dependency2KHR */
struct VkSubpassDependency2KHR dep2 = {
.srcSubpass = pCreateInfo->pDependencies[i].srcSubpass,
.dstSubpass = pCreateInfo->pDependencies[i].dstSubpass,
.srcStageMask = pCreateInfo->pDependencies[i].srcStageMask,
.dstStageMask = pCreateInfo->pDependencies[i].dstStageMask,
.srcAccessMask = pCreateInfo->pDependencies[i].srcAccessMask,
.dstAccessMask = pCreateInfo->pDependencies[i].dstAccessMask,
.dependencyFlags = pCreateInfo->pDependencies[i].dependencyFlags,
};
radv_render_pass_add_subpass_dep(pass, &dep2);
}
radv_render_pass_compile(pass);
*pRenderPass = radv_render_pass_to_handle(pass);
return VK_SUCCESS;
}
static unsigned
radv_num_subpass_attachments2(const VkSubpassDescription2KHR *desc)
{
return desc->inputAttachmentCount +
desc->colorAttachmentCount +
(desc->pResolveAttachments ? desc->colorAttachmentCount : 0) +
(desc->pDepthStencilAttachment != NULL);
}
VkResult radv_CreateRenderPass2KHR(
VkDevice _device,
const VkRenderPassCreateInfo2KHR* pCreateInfo,
@@ -250,13 +340,8 @@ VkResult radv_CreateRenderPass2KHR(
uint32_t subpass_attachment_count = 0;
struct radv_subpass_attachment *p;
for (uint32_t i = 0; i < pCreateInfo->subpassCount; i++) {
const VkSubpassDescription2KHR *desc = &pCreateInfo->pSubpasses[i];
subpass_attachment_count +=
desc->inputAttachmentCount +
desc->colorAttachmentCount +
(desc->pResolveAttachments ? desc->colorAttachmentCount : 0) +
(desc->pDepthStencilAttachment != NULL);
radv_num_subpass_attachments2(&pCreateInfo->pSubpasses[i]);
}
if (subpass_attachment_count) {
@@ -274,11 +359,12 @@ VkResult radv_CreateRenderPass2KHR(
p = pass->subpass_attachments;
for (uint32_t i = 0; i < pCreateInfo->subpassCount; i++) {
const VkSubpassDescription2KHR *desc = &pCreateInfo->pSubpasses[i];
uint32_t color_sample_count = 1, depth_sample_count = 1;
struct radv_subpass *subpass = &pass->subpasses[i];
subpass->input_count = desc->inputAttachmentCount;
subpass->color_count = desc->colorAttachmentCount;
subpass->attachment_count = radv_num_subpass_attachments2(desc);
subpass->attachments = p;
subpass->view_mask = desc->viewMask;
if (desc->inputAttachmentCount > 0) {
@@ -290,8 +376,6 @@ VkResult radv_CreateRenderPass2KHR(
.attachment = desc->pInputAttachments[j].attachment,
.layout = desc->pInputAttachments[j].layout,
};
if (desc->pInputAttachments[j].attachment != VK_ATTACHMENT_UNUSED)
pass->attachments[desc->pInputAttachments[j].attachment].view_mask |= subpass->view_mask;
}
}
@@ -304,71 +388,38 @@ VkResult radv_CreateRenderPass2KHR(
.attachment = desc->pColorAttachments[j].attachment,
.layout = desc->pColorAttachments[j].layout,
};
if (desc->pColorAttachments[j].attachment != VK_ATTACHMENT_UNUSED) {
pass->attachments[desc->pColorAttachments[j].attachment].view_mask |= subpass->view_mask;
color_sample_count = pCreateInfo->pAttachments[desc->pColorAttachments[j].attachment].samples;
}
}
}
subpass->has_resolve = false;
if (desc->pResolveAttachments) {
subpass->resolve_attachments = p;
p += desc->colorAttachmentCount;
for (uint32_t j = 0; j < desc->colorAttachmentCount; j++) {
uint32_t a = desc->pResolveAttachments[j].attachment;
subpass->resolve_attachments[j] = (struct radv_subpass_attachment) {
.attachment = desc->pResolveAttachments[j].attachment,
.layout = desc->pResolveAttachments[j].layout,
};
if (a != VK_ATTACHMENT_UNUSED) {
subpass->has_resolve = true;
pass->attachments[desc->pResolveAttachments[j].attachment].view_mask |= subpass->view_mask;
}
}
}
if (desc->pDepthStencilAttachment) {
subpass->depth_stencil_attachment = (struct radv_subpass_attachment) {
subpass->depth_stencil_attachment = p++;
*subpass->depth_stencil_attachment = (struct radv_subpass_attachment) {
.attachment = desc->pDepthStencilAttachment->attachment,
.layout = desc->pDepthStencilAttachment->layout,
};
if (desc->pDepthStencilAttachment->attachment != VK_ATTACHMENT_UNUSED) {
pass->attachments[desc->pDepthStencilAttachment->attachment].view_mask |= subpass->view_mask;
depth_sample_count = pCreateInfo->pAttachments[desc->pDepthStencilAttachment->attachment].samples;
}
} else {
subpass->depth_stencil_attachment.attachment = VK_ATTACHMENT_UNUSED;
}
subpass->max_sample_count = MAX2(color_sample_count,
depth_sample_count);
}
for (unsigned i = 0; i < pCreateInfo->dependencyCount; ++i) {
uint32_t src = pCreateInfo->pDependencies[i].srcSubpass;
uint32_t dst = pCreateInfo->pDependencies[i].dstSubpass;
/* Ignore subpass self-dependencies as they allow the app to
* call vkCmdPipelineBarrier() inside the render pass and the
* driver should only do the barrier when called, not when
* starting the render pass.
*/
if (src == dst)
continue;
if (dst == VK_SUBPASS_EXTERNAL) {
pass->end_barrier.src_stage_mask = pCreateInfo->pDependencies[i].srcStageMask;
pass->end_barrier.src_access_mask = pCreateInfo->pDependencies[i].srcAccessMask;
pass->end_barrier.dst_access_mask = pCreateInfo->pDependencies[i].dstAccessMask;
} else {
pass->subpasses[dst].start_barrier.src_stage_mask = pCreateInfo->pDependencies[i].srcStageMask;
pass->subpasses[dst].start_barrier.src_access_mask = pCreateInfo->pDependencies[i].srcAccessMask;
pass->subpasses[dst].start_barrier.dst_access_mask = pCreateInfo->pDependencies[i].dstAccessMask;
}
radv_render_pass_add_subpass_dep(pass,
&pCreateInfo->pDependencies[i]);
}
radv_render_pass_compile(pass);
*pRenderPass = radv_render_pass_to_handle(pass);
return VK_SUCCESS;

@@ -524,7 +524,7 @@ radv_pipeline_compute_spi_color_formats(struct radv_pipeline *pipeline,
col_format |= cf << (4 * i);
}
if (!(col_format & 0xf) && blend->need_src_alpha & (1 << 0)) {
if (!col_format && blend->need_src_alpha & (1 << 0)) {
/* When a subpass doesn't have any color attachments, write the
* alpha channel of MRT0 when alpha coverage is enabled because
* the depth attachment needs it.
@@ -542,13 +542,10 @@ radv_pipeline_compute_spi_color_formats(struct radv_pipeline *pipeline,
}
}
/* The output for dual source blending should have the same format as
* the first output.
*/
blend->cb_shader_mask = ac_get_cb_shader_mask(col_format);
if (blend->mrt0_is_dual_src)
col_format |= (col_format & 0xf) << 4;
blend->cb_shader_mask = ac_get_cb_shader_mask(col_format);
blend->spi_shader_col_format = col_format;
}
@@ -978,11 +975,11 @@ radv_pipeline_out_of_order_rast(struct radv_pipeline *pipeline,
};
if (pCreateInfo->pDepthStencilState &&
subpass->depth_stencil_attachment.attachment != VK_ATTACHMENT_UNUSED) {
subpass->depth_stencil_attachment) {
const VkPipelineDepthStencilStateCreateInfo *vkds =
pCreateInfo->pDepthStencilState;
struct radv_render_pass_attachment *attachment =
pass->attachments + subpass->depth_stencil_attachment.attachment;
pass->attachments + subpass->depth_stencil_attachment->attachment;
bool has_stencil = vk_format_is_stencil(attachment->format);
struct radv_dsa_order_invariance order_invariance[2];
struct radv_shader_variant *ps =
@@ -1387,15 +1384,7 @@ radv_pipeline_init_dynamic_state(struct radv_pipeline *pipeline,
* disabled or if the subpass of the render pass the pipeline is
* created against does not use any color attachments.
*/
bool uses_color_att = false;
for (unsigned i = 0; i < subpass->color_count; ++i) {
if (subpass->color_attachments[i].attachment != VK_ATTACHMENT_UNUSED) {
uses_color_att = true;
break;
}
}
if (uses_color_att && states & RADV_DYNAMIC_BLEND_CONSTANTS) {
if (subpass->has_color_att && states & RADV_DYNAMIC_BLEND_CONSTANTS) {
assert(pCreateInfo->pColorBlendState);
typed_memcpy(dynamic->blend_constants,
pCreateInfo->pColorBlendState->blendConstants, 4);
@@ -1413,8 +1402,7 @@ radv_pipeline_init_dynamic_state(struct radv_pipeline *pipeline,
* disabled or if the subpass of the render pass the pipeline is created
* against does not use a depth/stencil attachment.
*/
if (needed_states &&
subpass->depth_stencil_attachment.attachment != VK_ATTACHMENT_UNUSED) {
if (needed_states && subpass->depth_stencil_attachment) {
assert(pCreateInfo->pDepthStencilState);
if (states & RADV_DYNAMIC_DEPTH_BOUNDS) {
@@ -1448,13 +1436,11 @@ radv_pipeline_init_dynamic_state(struct radv_pipeline *pipeline,
const VkPipelineDiscardRectangleStateCreateInfoEXT *discard_rectangle_info =
vk_find_struct_const(pCreateInfo->pNext, PIPELINE_DISCARD_RECTANGLE_STATE_CREATE_INFO_EXT);
if (needed_states & RADV_DYNAMIC_DISCARD_RECTANGLE) {
if (states & RADV_DYNAMIC_DISCARD_RECTANGLE) {
dynamic->discard_rectangle.count = discard_rectangle_info->discardRectangleCount;
if (states & RADV_DYNAMIC_DISCARD_RECTANGLE) {
typed_memcpy(dynamic->discard_rectangle.rectangles,
discard_rectangle_info->pDiscardRectangles,
discard_rectangle_info->discardRectangleCount);
}
typed_memcpy(dynamic->discard_rectangle.rectangles,
discard_rectangle_info->pDiscardRectangles,
discard_rectangle_info->discardRectangleCount);
}
pipeline->dynamic_state.mask = states;
@@ -1897,13 +1883,27 @@ radv_generate_graphics_pipeline_key(struct radv_pipeline *pipeline,
}
for (unsigned i = 0; i < input_state->vertexAttributeDescriptionCount; ++i) {
unsigned location = input_state->pVertexAttributeDescriptions[i].location;
unsigned binding = input_state->pVertexAttributeDescriptions[i].binding;
const VkVertexInputAttributeDescription *desc =
&input_state->pVertexAttributeDescriptions[i];
const struct vk_format_description *format_desc;
unsigned location = desc->location;
unsigned binding = desc->binding;
unsigned num_format, data_format;
int first_non_void;
if (binding_input_rate & (1u << binding)) {
key.instance_rate_inputs |= 1u << location;
key.instance_rate_divisors[location] = instance_rate_divisors[binding];
}
format_desc = vk_format_description(desc->format);
first_non_void = vk_format_get_first_non_void_channel(desc->format);
num_format = radv_translate_buffer_numformat(format_desc, first_non_void);
data_format = radv_translate_buffer_dataformat(format_desc, first_non_void);
key.vertex_attribute_formats[location] = data_format | (num_format << 4);
if (pipeline->device->physical_device->rad_info.chip_class <= VI &&
pipeline->device->physical_device->rad_info.family != CHIP_STONEY) {
VkFormat format = input_state->pVertexAttributeDescriptions[i].format;
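
The key stores one byte per vertex attribute, built as `data_format | (num_format << 4)`. The sketch below shows the packing and the matching unpacking. It assumes that both values fit in 4 bits, which the `uint8_t` storage implies but the diff does not state; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Pack two 4-bit hardware format codes into one byte (assumed ranges). */
static uint8_t pack_vertex_format(unsigned data_format, unsigned num_format)
{
    assert(data_format < 16 && num_format < 16);
    return (uint8_t)(data_format | (num_format << 4));
}

/* Low nibble: data format. */
static unsigned unpack_data_format(uint8_t packed)
{
    return packed & 0xf;
}

/* High nibble: numeric format. */
static unsigned unpack_num_format(uint8_t packed)
{
    return packed >> 4;
}
```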
@@ -1927,8 +1927,6 @@ radv_generate_graphics_pipeline_key(struct radv_pipeline *pipeline,
}
key.vertex_alpha_adjust |= adjust << (2 * location);
}
key.vertex_attribute_provided |= 1 << location;
}
if (pCreateInfo->pTessellationState)
@@ -1957,9 +1955,10 @@ radv_fill_shader_keys(struct radv_shader_variant_key *keys,
{
keys[MESA_SHADER_VERTEX].vs.instance_rate_inputs = key->instance_rate_inputs;
keys[MESA_SHADER_VERTEX].vs.alpha_adjust = key->vertex_alpha_adjust;
keys[MESA_SHADER_VERTEX].vs.vertex_attribute_provided = key->vertex_attribute_provided;
for (unsigned i = 0; i < MAX_VERTEX_ATTRIBS; ++i)
for (unsigned i = 0; i < MAX_VERTEX_ATTRIBS; ++i) {
keys[MESA_SHADER_VERTEX].vs.instance_rate_divisors[i] = key->instance_rate_divisors[i];
keys[MESA_SHADER_VERTEX].vs.vertex_attribute_formats[i] = key->vertex_attribute_formats[i];
}
if (nir[MESA_SHADER_TESS_CTRL]) {
keys[MESA_SHADER_VERTEX].vs.as_ls = true;
@@ -2523,8 +2522,8 @@ radv_compute_bin_size(struct radv_pipeline *pipeline, const VkGraphicsPipelineCr
extent = color_entry->extent;
if (subpass->depth_stencil_attachment.attachment != VK_ATTACHMENT_UNUSED) {
struct radv_render_pass_attachment *attachment = pass->attachments + subpass->depth_stencil_attachment.attachment;
if (subpass->depth_stencil_attachment) {
struct radv_render_pass_attachment *attachment = pass->attachments + subpass->depth_stencil_attachment->attachment;
/* Coefficients taken from AMDVLK */
unsigned depth_coeff = vk_format_is_depth(attachment->format) ? 5 : 0;
@@ -2615,8 +2614,8 @@ radv_pipeline_generate_depth_stencil_state(struct radeon_cmdbuf *ctx_cs,
uint32_t db_render_control = 0, db_render_override2 = 0;
uint32_t db_render_override = 0;
if (subpass->depth_stencil_attachment.attachment != VK_ATTACHMENT_UNUSED)
attachment = pass->attachments + subpass->depth_stencil_attachment.attachment;
if (subpass->depth_stencil_attachment)
attachment = pass->attachments + subpass->depth_stencil_attachment->attachment;
bool has_depth_attachment = attachment && vk_format_is_depth(attachment->format);
bool has_stencil_attachment = attachment && vk_format_is_stencil(attachment->format);
@@ -2658,8 +2657,7 @@ radv_pipeline_generate_depth_stencil_state(struct radeon_cmdbuf *ctx_cs,
db_render_override |= S_02800C_FORCE_HIS_ENABLE0(V_02800C_FORCE_DISABLE) |
S_02800C_FORCE_HIS_ENABLE1(V_02800C_FORCE_DISABLE);
if (pipeline->device->enabled_extensions.EXT_depth_range_unrestricted &&
!pCreateInfo->pRasterizationState->depthClampEnable &&
if (!pCreateInfo->pRasterizationState->depthClampEnable &&
ps->info.info.ps.writes_z) {
/* From VK_EXT_depth_range_unrestricted spec:
*
@@ -2728,11 +2726,18 @@ radv_pipeline_generate_raster_state(struct radeon_cmdbuf *ctx_cs,
const VkConservativeRasterizationModeEXT mode =
radv_get_conservative_raster_mode(vkraster);
uint32_t pa_sc_conservative_rast = S_028C4C_NULL_SQUAD_AA_MASK_ENABLE(1);
bool depth_clip_disable = vkraster->depthClampEnable;
const VkPipelineRasterizationDepthClipStateCreateInfoEXT *depth_clip_state =
vk_find_struct_const(vkraster->pNext, PIPELINE_RASTERIZATION_DEPTH_CLIP_STATE_CREATE_INFO_EXT);
if (depth_clip_state) {
depth_clip_disable = !depth_clip_state->depthClipEnable;
}
radeon_set_context_reg(ctx_cs, R_028810_PA_CL_CLIP_CNTL,
S_028810_DX_CLIP_SPACE_DEF(1) | // vulkan uses DX conventions.
S_028810_ZCLIP_NEAR_DISABLE(vkraster->depthClampEnable ? 1 : 0) |
S_028810_ZCLIP_FAR_DISABLE(vkraster->depthClampEnable ? 1 : 0) |
S_028810_ZCLIP_NEAR_DISABLE(depth_clip_disable ? 1 : 0) |
S_028810_ZCLIP_FAR_DISABLE(depth_clip_disable ? 1 : 0) |
S_028810_DX_RASTERIZATION_KILL(vkraster->rasterizerDiscardEnable ? 1 : 0) |
S_028810_DX_LINEAR_ATTR_CLIP_ENA(1));
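
In the hunk above, depth clipping defaults to being disabled whenever depth clamping is enabled, unless a `VkPipelineRasterizationDepthClipStateCreateInfoEXT` struct is chained in, in which case its `depthClipEnable` decides. A minimal sketch of that precedence, with `struct depth_clip_state` as a hypothetical stand-in for the Vulkan struct:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for VkPipelineRasterizationDepthClipStateCreateInfoEXT. */
struct depth_clip_state {
    bool depth_clip_enable;
};

/* ext may be NULL when the application did not chain the extension struct. */
static bool depth_clip_disabled(bool depth_clamp_enable,
                                const struct depth_clip_state *ext)
{
    bool disable = depth_clamp_enable; /* default: clamping implies no clipping */

    if (ext)
        disable = !ext->depth_clip_enable; /* explicit state overrides the default */
    return disable;
}
```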
@@ -3205,7 +3210,6 @@ radv_compute_db_shader_control(const struct radv_device *device,
const struct radv_pipeline *pipeline,
const struct radv_shader_variant *ps)
{
const struct radv_multisample_state *ms = &pipeline->graphics.ms;
unsigned z_order;
if (ps->info.fs.early_fragment_test || !ps->info.info.ps.writes_memory)
z_order = V_02880C_EARLY_Z_THEN_LATE_Z;
@@ -3575,8 +3579,7 @@ radv_pipeline_init(struct radv_pipeline *pipeline,
struct radv_device *device,
struct radv_pipeline_cache *cache,
const VkGraphicsPipelineCreateInfo *pCreateInfo,
const struct radv_graphics_pipeline_create_info *extra,
const VkAllocationCallbacks *alloc)
const struct radv_graphics_pipeline_create_info *extra)
{
VkResult result;
bool has_view_index = false;
@@ -3585,8 +3588,6 @@ radv_pipeline_init(struct radv_pipeline *pipeline,
struct radv_subpass *subpass = pass->subpasses + pCreateInfo->subpass;
if (subpass->view_mask)
has_view_index = true;
if (alloc == NULL)
alloc = &device->alloc;
pipeline->device = device;
pipeline->layout = radv_pipeline_layout_from_handle(pCreateInfo->layout);
@@ -3714,7 +3715,7 @@ radv_graphics_pipeline_create(
return vk_error(device->instance, VK_ERROR_OUT_OF_HOST_MEMORY);
result = radv_pipeline_init(pipeline, device, cache,
pCreateInfo, extra, pAllocator);
pCreateInfo, extra);
if (result != VK_SUCCESS) {
radv_pipeline_destroy(device, pipeline, pAllocator);
return result;


@@ -365,7 +365,7 @@ struct radv_pipeline_cache {
struct radv_pipeline_key {
uint32_t instance_rate_inputs;
uint32_t instance_rate_divisors[MAX_VERTEX_ATTRIBS];
uint32_t vertex_attribute_provided;
uint8_t vertex_attribute_formats[MAX_VERTEX_ATTRIBS];
uint64_t vertex_alpha_adjust;
unsigned tess_input_vertices;
uint32_t col_format;
@@ -1148,7 +1148,6 @@ void si_write_scissors(struct radeon_cmdbuf *cs, int first,
const VkViewport *viewports, bool can_use_guardband);
uint32_t si_get_ia_multi_vgt_param(struct radv_cmd_buffer *cmd_buffer,
bool instanced_draw, bool indirect_draw,
bool count_from_stream_output,
uint32_t draw_vertex_count);
void si_cs_emit_write_event_eop(struct radeon_cmdbuf *cs,
enum chip_class chip_class,
@@ -1188,8 +1187,7 @@ radv_cmd_buffer_upload_alloc(struct radv_cmd_buffer *cmd_buffer,
void **ptr);
void
radv_cmd_buffer_set_subpass(struct radv_cmd_buffer *cmd_buffer,
const struct radv_subpass *subpass,
bool transitions);
const struct radv_subpass *subpass);
bool
radv_cmd_buffer_upload_data(struct radv_cmd_buffer *cmd_buffer,
unsigned size, unsigned alignmnet,
@@ -1448,7 +1446,6 @@ uint32_t radv_translate_buffer_dataformat(const struct vk_format_description *de
int first_non_void);
uint32_t radv_translate_buffer_numformat(const struct vk_format_description *desc,
int first_non_void);
bool radv_is_buffer_format_supported(VkFormat format, bool *scaled);
uint32_t radv_translate_colorformat(VkFormat format);
uint32_t radv_translate_color_numformat(VkFormat format,
const struct vk_format_description *desc,
@@ -1823,16 +1820,22 @@ struct radv_subpass_attachment {
};
struct radv_subpass {
uint32_t attachment_count;
struct radv_subpass_attachment * attachments;
uint32_t input_count;
uint32_t color_count;
struct radv_subpass_attachment * input_attachments;
struct radv_subpass_attachment * color_attachments;
struct radv_subpass_attachment * resolve_attachments;
struct radv_subpass_attachment depth_stencil_attachment;
struct radv_subpass_attachment * depth_stencil_attachment;
/** Subpass has at least one resolve attachment */
bool has_resolve;
/** Subpass has at least one color attachment */
bool has_color_att;
struct radv_subpass_barrier start_barrier;
uint32_t view_mask;
@@ -1846,7 +1849,9 @@ struct radv_render_pass_attachment {
VkAttachmentLoadOp stencil_load_op;
VkImageLayout initial_layout;
VkImageLayout final_layout;
uint32_t view_mask;
/* The subpass id in which the attachment will be used last. */
uint32_t last_subpass_idx;
};
struct radv_render_pass {
@@ -1941,6 +1946,9 @@ void radv_compile_nir_shader(struct ac_llvm_compiler *ac_llvm,
int nir_count,
const struct radv_nir_compiler_options *options);
unsigned radv_nir_get_max_workgroup_size(enum chip_class chip_class,
const struct nir_shader *nir);
/* radv_shader_info.h */
struct radv_shader_info;
@@ -1948,6 +1956,8 @@ void radv_nir_shader_info_pass(const struct nir_shader *nir,
const struct radv_nir_compiler_options *options,
struct radv_shader_info *info);
void radv_nir_shader_info_init(struct radv_shader_info *info);
struct radeon_winsys_sem;
#define RADV_DEFINE_HANDLE_CASTS(__radv_type, __VkType) \


@@ -40,6 +40,18 @@
static const int pipelinestat_block_size = 11 * 8;
static const unsigned pipeline_statistics_indices[] = {7, 6, 3, 4, 5, 2, 1, 0, 8, 9, 10};
static unsigned get_max_db(struct radv_device *device)
{
unsigned num_db = device->physical_device->rad_info.num_render_backends;
MAYBE_UNUSED unsigned rb_mask = device->physical_device->rad_info.enabled_rb_mask;
/* Otherwise we need to change the query reset procedure */
assert(rb_mask == ((1ull << num_db) - 1));
return num_db;
}
static nir_ssa_def *nir_test_flag(nir_builder *b, nir_ssa_def *flags, uint32_t flag)
{
return nir_i2b(b, nir_iand(b, flags, nir_imm_int(b, flag)));
@@ -96,14 +108,12 @@ build_occlusion_query_shader(struct radv_device *device) {
* uint64_t dst_offset = dst_stride * global_id.x;
* bool available = true;
* for (int i = 0; i < db_count; ++i) {
* if (enabled_rb_mask & (1 << i)) {
* uint64_t start = src_buf[src_offset + 16 * i];
* uint64_t end = src_buf[src_offset + 16 * i + 8];
* if ((start & (1ull << 63)) && (end & (1ull << 63)))
* result += end - start;
* else
* available = false;
* }
* uint64_t start = src_buf[src_offset + 16 * i];
* uint64_t end = src_buf[src_offset + 16 * i + 8];
* if ((start & (1ull << 63)) && (end & (1ull << 63)))
* result += end - start;
* else
* available = false;
* }
* uint32_t elem_size = flags & VK_QUERY_RESULT_64_BIT ? 8 : 4;
* if ((flags & VK_QUERY_RESULT_PARTIAL_BIT) || available) {
@@ -129,8 +139,7 @@ build_occlusion_query_shader(struct radv_device *device) {
nir_variable *start = nir_local_variable_create(b.impl, glsl_uint64_t_type(), "start");
nir_variable *end = nir_local_variable_create(b.impl, glsl_uint64_t_type(), "end");
nir_variable *available = nir_local_variable_create(b.impl, glsl_bool_type(), "available");
unsigned enabled_rb_mask = device->physical_device->rad_info.enabled_rb_mask;
unsigned db_count = device->physical_device->rad_info.num_render_backends;
unsigned db_count = get_max_db(device);
nir_ssa_def *flags = radv_load_push_int(&b, 0, "flags");
@@ -176,16 +185,6 @@ build_occlusion_query_shader(struct radv_device *device) {
nir_ssa_def *current_outer_count = nir_load_var(&b, outer_counter);
radv_break_on_count(&b, outer_counter, nir_imm_int(&b, db_count));
nir_ssa_def *enabled_cond =
nir_iand(&b, nir_imm_int(&b, enabled_rb_mask),
nir_ishl(&b, nir_imm_int(&b, 1), current_outer_count));
nir_if *enabled_if = nir_if_create(b.shader);
enabled_if->condition = nir_src_for_ssa(nir_i2b(&b, enabled_cond));
nir_cf_node_insert(b.cursor, &enabled_if->cf_node);
b.cursor = nir_after_cf_list(&enabled_if->then_list);
nir_ssa_def *load_offset = nir_imul(&b, current_outer_count, nir_imm_int(&b, 16));
load_offset = nir_iadd(&b, input_base, load_offset);
@@ -1039,7 +1038,7 @@ VkResult radv_CreateQueryPool(
switch(pCreateInfo->queryType) {
case VK_QUERY_TYPE_OCCLUSION:
pool->stride = 16 * device->physical_device->rad_info.num_render_backends;
pool->stride = 16 * get_max_db(device);
break;
case VK_QUERY_TYPE_PIPELINE_STATISTICS:
pool->stride = pipelinestat_block_size * 2;
@@ -1153,17 +1152,12 @@ VkResult radv_GetQueryPoolResults(
}
case VK_QUERY_TYPE_OCCLUSION: {
volatile uint64_t const *src64 = (volatile uint64_t const *)src;
uint32_t db_count = device->physical_device->rad_info.num_render_backends;
uint32_t enabled_rb_mask = device->physical_device->rad_info.enabled_rb_mask;
uint64_t sample_count = 0;
int db_count = get_max_db(device);
available = 1;
for (int i = 0; i < db_count; ++i) {
uint64_t start, end;
if (!(enabled_rb_mask & (1 << i)))
continue;
do {
start = src64[2 * i];
end = src64[2 * i + 1];
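
The occlusion query code in this file reads one `(start, end)` pair of 64-bit counters per render backend, treats bit 63 as a "value written" flag, and sums `end - start` over all backends. The sketch below restates that accumulation on the CPU together with the mask check from `get_max_db()`. The function names are hypothetical, and `num_db` is assumed to be smaller than 64.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when every one of the num_db backends is enabled (contiguous mask). */
static bool all_backends_enabled(unsigned num_db, uint64_t rb_mask)
{
    return rb_mask == ((1ull << num_db) - 1);
}

/* src holds db_count pairs laid out as {start, end}. Returns whether every
 * pair had bit 63 set in both counters; *result receives the sum over the
 * pairs that did. */
static bool accumulate_occlusion(const uint64_t *src, unsigned db_count,
                                 uint64_t *result)
{
    bool available = true;
    uint64_t sum = 0;

    for (unsigned i = 0; i < db_count; ++i) {
        uint64_t start = src[2 * i];
        uint64_t end = src[2 * i + 1];

        if ((start & (1ull << 63)) && (end & (1ull << 63)))
            sum += end - start; /* the flag bits cancel in the subtraction */
        else
            available = false;
    }
    *result = sum;
    return available;
}
```

With a contiguous mask the loop can visit every slot unconditionally, so the per-iteration test against `enabled_rb_mask` is not needed.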


@@ -159,7 +159,7 @@ radv_optimize_nir(struct nir_shader *shader, bool optimize_conservatively,
NIR_PASS(progress, shader, nir_opt_if);
NIR_PASS(progress, shader, nir_opt_dead_cf);
NIR_PASS(progress, shader, nir_opt_cse);
NIR_PASS(progress, shader, nir_opt_peephole_select, 8, true);
NIR_PASS(progress, shader, nir_opt_peephole_select, 8, true, true);
NIR_PASS(progress, shader, nir_opt_algebraic);
NIR_PASS(progress, shader, nir_opt_constant_folding);
NIR_PASS(progress, shader, nir_opt_undef);
@@ -222,8 +222,6 @@ radv_shader_compile_to_nir(struct radv_device *device,
.lower_ubo_ssbo_access_to_offsets = true,
.caps = {
.descriptor_array_dynamic_indexing = true,
.descriptor_array_non_uniform_indexing = true,
.descriptor_indexing = true,
.device_group = true,
.draw_parameters = true,
.float64 = true,
@@ -234,6 +232,7 @@ radv_shader_compile_to_nir(struct radv_device *device,
.int16 = true,
.int64 = true,
.multiview = true,
.physical_storage_buffer_address = true,
.runtime_descriptor_array = true,
.shader_viewport_index_layer = true,
.stencil_export = true,
@@ -252,6 +251,7 @@ radv_shader_compile_to_nir(struct radv_device *device,
},
.ubo_ptr_type = glsl_vector_type(GLSL_TYPE_UINT, 2),
.ssbo_ptr_type = glsl_vector_type(GLSL_TYPE_UINT, 2),
.phys_ssbo_ptr_type = glsl_vector_type(GLSL_TYPE_UINT64, 1),
.push_const_ptr_type = glsl_uint_type(),
.shared_ptr_type = glsl_uint_type(),
};
@@ -612,8 +612,6 @@ shader_variant_create(struct radv_device *device,
tm_options |= AC_TM_SISCHED;
if (options->check_ir)
tm_options |= AC_TM_CHECK_IR;
if (device->instance->debug_flags & RADV_DEBUG_NO_LOAD_STORE_OPT)
tm_options |= AC_TM_NO_LOAD_STORE_OPT;
thread_compiler = !(device->instance->debug_flags & RADV_DEBUG_NOTHREADLLVM);
radv_init_llvm_once();
@@ -737,7 +735,8 @@ generate_shader_stats(struct radv_device *device,
gl_shader_stage stage,
struct _mesa_string_buffer *buf)
{
unsigned lds_increment = device->physical_device->rad_info.chip_class >= CIK ? 512 : 256;
enum chip_class chip_class = device->physical_device->rad_info.chip_class;
unsigned lds_increment = chip_class >= CIK ? 512 : 256;
struct ac_shader_config *conf;
unsigned max_simd_waves;
unsigned lds_per_wave = 0;
@@ -750,12 +749,17 @@ generate_shader_stats(struct radv_device *device,
lds_per_wave = conf->lds_size * lds_increment +
align(variant->info.fs.num_interp * 48,
lds_increment);
} else if (stage == MESA_SHADER_COMPUTE) {
unsigned max_workgroup_size =
radv_nir_get_max_workgroup_size(chip_class, variant->nir);
lds_per_wave = (conf->lds_size * lds_increment) /
DIV_ROUND_UP(max_workgroup_size, 64);
}
if (conf->num_sgprs)
max_simd_waves =
MIN2(max_simd_waves,
radv_get_num_physical_sgprs(device->physical_device) / conf->num_sgprs);
ac_get_num_physical_sgprs(chip_class) / conf->num_sgprs);
if (conf->num_vgprs)
max_simd_waves =
@@ -840,7 +844,7 @@ radv_GetShaderInfoAMD(VkDevice _device,
VkShaderStatisticsInfoAMD statistics = {};
statistics.shaderStageMask = shaderStage;
statistics.numPhysicalVgprs = RADV_NUM_PHYSICAL_VGPRS;
statistics.numPhysicalSgprs = radv_get_num_physical_sgprs(device->physical_device);
statistics.numPhysicalSgprs = ac_get_num_physical_sgprs(device->physical_device->rad_info.chip_class);
statistics.numAvailableSgprs = statistics.numPhysicalSgprs;
if (stage == MESA_SHADER_COMPUTE) {
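
For compute shaders, the statistics code in this file divides the workgroup's LDS allocation by the number of 64-wide waves in the largest possible workgroup. A minimal sketch of that arithmetic follows. `compute_lds_per_wave()` is a hypothetical name, and `max_workgroup_size` is assumed to be non-zero.

```c
#define DIV_ROUND_UP(a, b) (((a) + (b) - 1) / (b))

/* lds_size is in allocation granules, lds_increment is the granule size in
 * bytes (512 or 256 in the diff, depending on the chip class). */
static unsigned compute_lds_per_wave(unsigned lds_size, unsigned lds_increment,
                                     unsigned max_workgroup_size)
{
    unsigned waves = DIV_ROUND_UP(max_workgroup_size, 64);

    return (lds_size * lds_increment) / waves;
}
```

For example, 4 granules of 512 bytes shared by a 256-invocation workgroup (4 waves) come to 512 bytes per wave.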


@@ -65,9 +65,7 @@ enum {
struct radv_vs_variant_key {
uint32_t instance_rate_inputs;
uint32_t instance_rate_divisors[MAX_VERTEX_ATTRIBS];
/* Mask of vertex attributes that are provided by the pipeline. */
uint32_t vertex_attribute_provided;
uint8_t vertex_attribute_formats[MAX_VERTEX_ATTRIBS];
/* For 2_10_10_10 formats the alpha is handled as unsigned by pre-vega HW.
* so we may need to fix it up. */
@@ -132,10 +130,11 @@ struct radv_nir_compiler_options {
enum radv_ud_index {
AC_UD_SCRATCH_RING_OFFSETS = 0,
AC_UD_PUSH_CONSTANTS = 1,
AC_UD_INDIRECT_DESCRIPTOR_SETS = 2,
AC_UD_VIEW_INDEX = 3,
AC_UD_STREAMOUT_BUFFERS = 4,
AC_UD_SHADER_START = 5,
AC_UD_INLINE_PUSH_CONSTANTS = 2,
AC_UD_INDIRECT_DESCRIPTOR_SETS = 3,
AC_UD_VIEW_INDEX = 4,
AC_UD_STREAMOUT_BUFFERS = 5,
AC_UD_SHADER_START = 6,
AC_UD_VS_VERTEX_BUFFERS = AC_UD_SHADER_START,
AC_UD_VS_BASE_VERTEX_START_INSTANCE,
AC_UD_VS_MAX_UD,
@@ -165,6 +164,13 @@ struct radv_streamout_info {
struct radv_shader_info {
bool loads_push_constants;
bool loads_dynamic_offsets;
uint8_t min_push_constant_used;
uint8_t max_push_constant_used;
bool has_only_32bit_push_constants;
bool has_indirect_push_constants;
uint8_t num_inline_push_consts;
uint8_t base_inline_push_consts;
uint32_t desc_set_used_mask;
bool needs_multiview_view_index;
bool uses_invocation_id;
@@ -413,10 +419,4 @@ static inline unsigned shader_io_get_unique_index(gl_varying_slot slot)
unreachable("illegal slot in get unique index\n");
}
static inline uint32_t
radv_get_num_physical_sgprs(struct radv_physical_device *physical_device)
{
return physical_device->rad_info.chip_class >= VI ? 800 : 512;
}
#endif


@@ -115,15 +115,6 @@ gather_intrinsic_load_deref_info(const nir_shader *nir,
}
}
static uint32_t
widen_writemask(uint32_t wrmask)
{
uint32_t new_wrmask = 0;
for(unsigned i = 0; i < 4; i++)
new_wrmask |= (wrmask & (1 << i) ? 0x3 : 0x0) << (i * 2);
return new_wrmask;
}
static void
set_output_usage_mask(const nir_shader *nir, const nir_intrinsic_instr *instr,
uint8_t *output_usage_mask)
@@ -131,7 +122,7 @@ set_output_usage_mask(const nir_shader *nir, const nir_intrinsic_instr *instr,
nir_deref_instr *deref_instr =
nir_instr_as_deref(instr->src[0].ssa->parent_instr);
nir_variable *var = nir_deref_instr_get_variable(deref_instr);
unsigned attrib_count = glsl_count_attribute_slots(deref_instr->type, false);
unsigned attrib_count = glsl_count_attribute_slots(var->type, false);
unsigned idx = var->data.location;
unsigned comp = var->data.location_frac;
unsigned const_offset = 0;
@@ -139,19 +130,15 @@ set_output_usage_mask(const nir_shader *nir, const nir_intrinsic_instr *instr,
get_deref_offset(deref_instr, &const_offset);
if (var->data.compact) {
assert(!glsl_type_is_64bit(deref_instr->type));
const_offset += comp;
output_usage_mask[idx + const_offset / 4] |= 1 << (const_offset % 4);
return;
}
uint32_t wrmask = nir_intrinsic_write_mask(instr);
if (glsl_type_is_64bit(deref_instr->type))
wrmask = widen_writemask(wrmask);
for (unsigned i = 0; i < attrib_count; i++)
for (unsigned i = 0; i < attrib_count; i++) {
output_usage_mask[idx + i + const_offset] |=
((wrmask >> (i * 4)) & 0xf) << comp;
instr->const_index[0] << comp;
}
}
static void
@@ -197,6 +184,32 @@ gather_intrinsic_store_deref_info(const nir_shader *nir,
}
}
static void
gather_push_constant_info(const nir_shader *nir,
const nir_intrinsic_instr *instr,
struct radv_shader_info *info)
{
nir_const_value *cval = nir_src_as_const_value(instr->src[0]);
int base = nir_intrinsic_base(instr);
if (!cval) {
info->has_indirect_push_constants = true;
} else {
uint32_t min = base + cval->u32[0];
uint32_t max = min + instr->num_components * 4;
info->max_push_constant_used =
MAX2(max, info->max_push_constant_used);
info->min_push_constant_used =
MIN2(min, info->min_push_constant_used);
}
if (instr->dest.ssa.bit_size != 32)
info->has_only_32bit_push_constants = false;
info->loads_push_constants = true;
}
static void
gather_intrinsic_info(const nir_shader *nir, const nir_intrinsic_instr *instr,
struct radv_shader_info *info)
@@ -250,7 +263,7 @@ gather_intrinsic_info(const nir_shader *nir, const nir_intrinsic_instr *instr,
info->uses_prim_id = true;
break;
case nir_intrinsic_load_push_constant:
info->loads_push_constants = true;
gather_push_constant_info(nir, instr, info);
break;
case nir_intrinsic_vulkan_resource_index:
info->desc_set_used_mask |= (1 << nir_intrinsic_desc_set(instr));
@@ -512,6 +525,14 @@ gather_xfb_info(const nir_shader *nir, struct radv_shader_info *info)
ralloc_free(xfb);
}
void
radv_nir_shader_info_init(struct radv_shader_info *info)
{
/* Assume that shaders only have 32-bit push constants by default. */
info->min_push_constant_used = UINT8_MAX;
info->has_only_32bit_push_constants = true;
}
void
radv_nir_shader_info_pass(const struct nir_shader *nir,
const struct radv_nir_compiler_options *options,
@@ -523,6 +544,7 @@ radv_nir_shader_info_pass(const struct nir_shader *nir,
if (options->layout && options->layout->dynamic_offset_count &&
(options->layout->dynamic_shader_stages & mesa_to_vk_shader_stage(nir->info.stage))) {
info->loads_push_constants = true;
info->loads_dynamic_offsets = true;
}
nir_foreach_variable(variable, &nir->inputs)
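
The push-constant gathering in this file tracks the lowest and highest byte offsets that constant-offset loads touch, starting from `min = UINT8_MAX`. The sketch below shows the same bookkeeping in isolation. The struct and function names are hypothetical; it assumes 4-byte components, as the `num_components * 4` term in the diff does, and offsets small enough to fit the 8-bit fields.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the shader info fields used for range tracking. */
struct push_const_range {
    uint8_t min_used;
    uint8_t max_used;
    bool has_indirect;
};

static void push_const_range_init(struct push_const_range *r)
{
    r->min_used = UINT8_MAX; /* so the first load always lowers the minimum */
    r->max_used = 0;
    r->has_indirect = false;
}

/* Record a load of num_components 32-bit values at a constant byte offset. */
static void push_const_range_add(struct push_const_range *r, uint32_t offset,
                                 uint32_t num_components)
{
    uint32_t min = offset;
    uint32_t max = min + num_components * 4;

    if (max > r->max_used)
        r->max_used = (uint8_t)max; /* assumes max <= 255 */
    if (min < r->min_used)
        r->min_used = (uint8_t)min;
}
```

A load with a non-constant offset cannot be folded into this range, which is why the diff records it in a separate `has_indirect_push_constants` flag.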


@@ -561,7 +561,6 @@ radv_prims_for_vertices(struct radv_prim_vertex_count *info, unsigned num)
uint32_t
si_get_ia_multi_vgt_param(struct radv_cmd_buffer *cmd_buffer,
bool instanced_draw, bool indirect_draw,
bool count_from_stream_output,
uint32_t draw_vertex_count)
{
enum chip_class chip_class = cmd_buffer->device->physical_device->rad_info.chip_class;
@@ -623,12 +622,6 @@ si_get_ia_multi_vgt_param(struct radv_cmd_buffer *cmd_buffer,
(instanced_draw || indirect_draw))
partial_vs_wave = true;
/* Hardware requirement when drawing primitives from a stream
* output buffer.
*/
if (count_from_stream_output)
wd_switch_on_eop = true;
/* If the WD switch is false, the IA switch must be false too. */
assert(wd_switch_on_eop || !ia_switch_on_eop);
}


@@ -29,13 +29,6 @@
#ifndef RADV_AMDGPU_WINSYS_PUBLIC_H
#define RADV_AMDGPU_WINSYS_PUBLIC_H
/* The number of IBs per submit isn't infinite, it depends on the ring type
* (ie. some initial setup needed for a submit) and the number of IBs (4 DW).
* This limit is arbitrary but should be safe for now. Ideally, we should get
* this limit from the KMD.
*/
#define RADV_MAX_IBS_PER_SUBMIT 192
struct radeon_winsys *radv_amdgpu_winsys_create(int fd, uint64_t debug_flags,
uint64_t perftest_flags);


@@ -651,7 +651,8 @@ v3d_spec_load(const struct v3d_device_info *devinfo)
struct parser_context ctx;
void *buf;
uint8_t *text_data = NULL;
uint32_t text_offset = 0, text_length = 0, total_length;
uint32_t text_offset = 0, text_length = 0;
MAYBE_UNUSED uint32_t total_length;
for (int i = 0; i < ARRAY_SIZE(genxml_files_table); i++) {
if (i != 0) {


@@ -820,8 +820,8 @@
<packet code="120" name="Tile Binning Mode Cfg" min_ver="41">
<field name="Height (in pixels)" size="16" start="48" type="uint" minus_one="true"/>
<field name="Width (in pixels)" size="16" start="32" type="uint" minus_one="true"/>
<field name="Height (in pixels)" size="12" start="48" type="uint" minus_one="true"/>
<field name="Width (in pixels)" size="12" start="32" type="uint" minus_one="true"/>
<field name="Double-buffer in non-ms mode" size="1" start="15" type="bool"/>
<field name="Multisample Mode (4x)" size="1" start="14" type="bool"/>


@@ -32,8 +32,7 @@
*/
#define V3D_MAX_TEXTURE_SAMPLERS 16
/* The HW can do 16384 (15), but we run into hangs when we expose that. */
#define V3D_MAX_MIP_LEVELS 13
#define V3D_MAX_MIP_LEVELS 12
#define V3D_MAX_SAMPLES 4
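
The comment in this hunk pairs a 16384-pixel texture with 15 mip levels, so a limit of N levels corresponds to a largest dimension of 2^(N-1). A minimal sketch of that relation, with `max_texture_size()` as a hypothetical helper that assumes `mip_levels` is between 1 and 32:

```c
/* Largest texture dimension that a full mip chain of mip_levels levels
 * can describe (level 0 down to the 1-pixel level). */
static unsigned max_texture_size(unsigned mip_levels)
{
    return 1u << (mip_levels - 1);
}
```

By this relation the two values of `V3D_MAX_MIP_LEVELS` shown in the hunk, 13 and 12, correspond to 4096 and 2048 pixels.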


@@ -283,8 +283,10 @@ ntq_emit_tmu_general(struct v3d_compile *c, nir_intrinsic_instr *instr,
instr->num_components - 2);
}
if (c->execute.file != QFILE_NULL)
vir_PF(c, c->execute, V3D_QPU_PF_PUSHZ);
if (vir_in_nonuniform_control_flow(c)) {
vir_set_pf(vir_MOV_dest(c, vir_nop_reg(), c->execute),
V3D_QPU_PF_PUSHZ);
}
struct qreg dest;
if (config == ~0)
@@ -307,7 +309,7 @@ ntq_emit_tmu_general(struct v3d_compile *c, nir_intrinsic_instr *instr,
vir_uniform_ui(c, config);
}
if (c->execute.file != QFILE_NULL)
if (vir_in_nonuniform_control_flow(c))
vir_set_cond(tmu, V3D_QPU_COND_IFA);
vir_emit_thrsw(c);
@@ -392,13 +394,14 @@ ntq_store_dest(struct v3d_compile *c, nir_dest *dest, int chan,
/* If we're in control flow, then make this update of the reg
* conditional on the execution mask.
*/
if (c->execute.file != QFILE_NULL) {
if (vir_in_nonuniform_control_flow(c)) {
last_inst->dst.index = qregs[chan].index;
/* Set the flags to the current exec mask.
*/
c->cursor = vir_before_inst(last_inst);
vir_PF(c, c->execute, V3D_QPU_PF_PUSHZ);
vir_set_pf(vir_MOV_dest(c, vir_nop_reg(), c->execute),
V3D_QPU_PF_PUSHZ);
c->cursor = vir_after_inst(last_inst);
vir_set_cond(last_inst, V3D_QPU_COND_IFA);
@@ -540,26 +543,13 @@ ntq_fsign(struct v3d_compile *c, struct qreg src)
struct qreg t = vir_get_temp(c);
vir_MOV_dest(c, t, vir_uniform_f(c, 0.0));
vir_PF(c, vir_FMOV(c, src), V3D_QPU_PF_PUSHZ);
vir_set_pf(vir_FMOV_dest(c, vir_nop_reg(), src), V3D_QPU_PF_PUSHZ);
vir_MOV_cond(c, V3D_QPU_COND_IFNA, t, vir_uniform_f(c, 1.0));
vir_PF(c, vir_FMOV(c, src), V3D_QPU_PF_PUSHN);
vir_set_pf(vir_FMOV_dest(c, vir_nop_reg(), src), V3D_QPU_PF_PUSHN);
vir_MOV_cond(c, V3D_QPU_COND_IFA, t, vir_uniform_f(c, -1.0));
return vir_MOV(c, t);
}
static struct qreg
ntq_isign(struct v3d_compile *c, struct qreg src)
{
struct qreg t = vir_get_temp(c);
vir_MOV_dest(c, t, vir_uniform_ui(c, 0));
vir_PF(c, vir_MOV(c, src), V3D_QPU_PF_PUSHZ);
vir_MOV_cond(c, V3D_QPU_COND_IFNA, t, vir_uniform_ui(c, 1));
vir_PF(c, vir_MOV(c, src), V3D_QPU_PF_PUSHN);
vir_MOV_cond(c, V3D_QPU_COND_IFA, t, vir_uniform_ui(c, -1));
return vir_MOV(c, t);
}
static void
emit_fragcoord_input(struct v3d_compile *c, int attr)
{
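
`ntq_fsign()` in the hunk above builds its result from a default of 0.0 and two predicated moves: one that writes 1.0 where the zero flag is not set, and one that writes -1.0 where the negative flag is set. The scalar sketch below mirrors that select chain. It is a simplification that assumes ordinary finite inputs; how the hardware flags treat -0.0 and NaN is not modelled.

```c
/* Scalar model of the select chain: start at 0, overwrite with 1 for any
 * non-zero input, then overwrite with -1 for any negative input. */
static float fsign_select(float x)
{
    float t = 0.0f;

    if (x != 0.0f) /* models PUSHZ followed by a move on IFNA */
        t = 1.0f;
    if (x < 0.0f)  /* models PUSHN followed by a move on IFA */
        t = -1.0f;
    return t;
}
```

The order matters: the negative case has to come last, because a negative input also passes the non-zero test.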
@@ -711,7 +701,7 @@ ntq_emit_comparison(struct v3d_compile *c,
if (nir_op_infos[compare_instr->op].num_inputs > 1)
src1 = ntq_get_alu_src(c, compare_instr, 1);
bool cond_invert = false;
struct qreg nop = vir_reg(QFILE_NULL, 0);
struct qreg nop = vir_nop_reg();
switch (compare_instr->op) {
case nir_op_feq32:
@@ -756,6 +746,16 @@ ntq_emit_comparison(struct v3d_compile *c,
vir_set_pf(vir_SUB_dest(c, nop, src0, src1), V3D_QPU_PF_PUSHC);
break;
case nir_op_i2b32:
vir_set_pf(vir_MOV_dest(c, nop, src0), V3D_QPU_PF_PUSHZ);
cond_invert = true;
break;
case nir_op_f2b32:
vir_set_pf(vir_FMOV_dest(c, nop, src0), V3D_QPU_PF_PUSHZ);
cond_invert = true;
break;
default:
return false;
}
@@ -789,28 +789,24 @@ ntq_get_alu_parent(nir_src src)
return instr;
}
/**
* Attempts to fold a comparison generating a boolean result into the
* condition code for selecting between two values, instead of comparing the
* boolean result against 0 to generate the condition code.
*/
static struct qreg ntq_emit_bcsel(struct v3d_compile *c, nir_alu_instr *instr,
struct qreg *src)
/* Turns a NIR bool into a condition code to predicate on. */
static enum v3d_qpu_cond
ntq_emit_bool_to_cond(struct v3d_compile *c, nir_src src)
{
nir_alu_instr *compare = ntq_get_alu_parent(instr->src[0].src);
nir_alu_instr *compare = ntq_get_alu_parent(src);
if (!compare)
goto out;
enum v3d_qpu_cond cond;
if (ntq_emit_comparison(c, compare, &cond))
return vir_MOV(c, vir_SEL(c, cond, src[1], src[2]));
return cond;
out:
vir_PF(c, src[0], V3D_QPU_PF_PUSHZ);
return vir_MOV(c, vir_SEL(c, V3D_QPU_COND_IFNA, src[1], src[2]));
vir_set_pf(vir_MOV_dest(c, vir_nop_reg(), ntq_get_src(c, src, 0)),
V3D_QPU_PF_PUSHZ);
return V3D_QPU_COND_IFNA;
}
static void
ntq_emit_alu(struct v3d_compile *c, nir_alu_instr *instr)
{
@@ -889,13 +885,6 @@ ntq_emit_alu(struct v3d_compile *c, nir_alu_instr *instr)
case nir_op_b2i32:
result = vir_AND(c, src[0], vir_uniform_ui(c, 1));
break;
case nir_op_i2b32:
case nir_op_f2b32:
vir_PF(c, src[0], V3D_QPU_PF_PUSHZ);
result = vir_MOV(c, vir_SEL(c, V3D_QPU_COND_IFNA,
vir_uniform_ui(c, ~0),
vir_uniform_ui(c, 0)));
break;
case nir_op_iadd:
result = vir_ADD(c, src[0], src[1]);
@@ -958,6 +947,8 @@ ntq_emit_alu(struct v3d_compile *c, nir_alu_instr *instr)
break;
}
case nir_op_i2b32:
case nir_op_f2b32:
case nir_op_feq32:
case nir_op_fne32:
case nir_op_fge32:
@@ -978,10 +969,15 @@ ntq_emit_alu(struct v3d_compile *c, nir_alu_instr *instr)
}
case nir_op_b32csel:
result = ntq_emit_bcsel(c, instr, src);
result = vir_MOV(c,
vir_SEL(c,
ntq_emit_bool_to_cond(c, instr->src[0].src),
src[1], src[2]));
break;
case nir_op_fcsel:
vir_PF(c, src[0], V3D_QPU_PF_PUSHZ);
vir_set_pf(vir_MOV_dest(c, vir_nop_reg(), src[0]),
V3D_QPU_PF_PUSHZ);
result = vir_MOV(c, vir_SEL(c, V3D_QPU_COND_IFNA,
src[1], src[2]));
break;
@@ -1011,9 +1007,6 @@ ntq_emit_alu(struct v3d_compile *c, nir_alu_instr *instr)
case nir_op_ftrunc:
result = vir_FTRUNC(c, src[0]);
break;
case nir_op_ffract:
result = vir_FSUB(c, src[0], vir_FFLOOR(c, src[0]));
break;
case nir_op_fsin:
result = ntq_fsincos(c, src[0], false);
@@ -1025,9 +1018,6 @@ ntq_emit_alu(struct v3d_compile *c, nir_alu_instr *instr)
case nir_op_fsign:
result = ntq_fsign(c, src[0]);
break;
case nir_op_isign:
result = ntq_isign(c, src[0]);
break;
case nir_op_fabs: {
result = vir_FMOV(c, src[0]);
@@ -1036,8 +1026,7 @@ ntq_emit_alu(struct v3d_compile *c, nir_alu_instr *instr)
}
case nir_op_iabs:
result = vir_MAX(c, src[0],
vir_SUB(c, vir_uniform_ui(c, 0), src[0]));
result = vir_MAX(c, src[0], vir_NEG(c, src[0]));
break;
case nir_op_fddx:
@@ -1053,7 +1042,8 @@ ntq_emit_alu(struct v3d_compile *c, nir_alu_instr *instr)
break;
case nir_op_uadd_carry:
vir_PF(c, vir_ADD(c, src[0], src[1]), V3D_QPU_PF_PUSHC);
vir_set_pf(vir_ADD_dest(c, vir_nop_reg(), src[0], src[1]),
V3D_QPU_PF_PUSHC);
result = vir_MOV(c, vir_SEL(c, V3D_QPU_COND_IFA,
vir_uniform_ui(c, ~0),
vir_uniform_ui(c, 0)));
@@ -1064,9 +1054,6 @@ ntq_emit_alu(struct v3d_compile *c, nir_alu_instr *instr)
break;
case nir_op_unpack_half_2x16_split_x:
/* XXX perf: It would be good to be able to merge this unpack
* with whatever uses our result.
*/
result = vir_FMOV(c, src[0]);
vir_set_unpack(c->defs[result.index], 0, V3D_QPU_UNPACK_L);
break;
@@ -1129,8 +1116,8 @@ emit_frag_end(struct v3d_compile *c)
*/
bool has_any_tlb_color_write = false;
for (int rt = 0; rt < c->fs_key->nr_cbufs; rt++) {
if (c->output_color_var[rt])
for (int rt = 0; rt < V3D_MAX_DRAW_BUFFERS; rt++) {
if (c->fs_key->cbufs & (1 << rt) && c->output_color_var[rt])
has_any_tlb_color_write = true;
}
@@ -1138,7 +1125,7 @@ emit_frag_end(struct v3d_compile *c)
struct nir_variable *var = c->output_color_var[0];
struct qreg *color = &c->outputs[var->data.driver_location * 4];
vir_SETMSF_dest(c, vir_reg(QFILE_NULL, 0),
vir_SETMSF_dest(c, vir_nop_reg(),
vir_AND(c,
vir_MSF(c),
vir_FTOC(c, color[3])));
@@ -1175,7 +1162,7 @@ emit_frag_end(struct v3d_compile *c)
struct qinst *inst = vir_MOV_dest(c,
vir_reg(QFILE_TLBU, 0),
vir_reg(QFILE_NULL, 0));
vir_nop_reg());
uint8_t tlb_specifier = TLB_TYPE_DEPTH;
if (c->devinfo->ver >= 42) {
@@ -1197,8 +1184,8 @@ emit_frag_end(struct v3d_compile *c)
* uniform setup
*/
for (int rt = 0; rt < c->fs_key->nr_cbufs; rt++) {
if (!c->output_color_var[rt])
for (int rt = 0; rt < V3D_MAX_DRAW_BUFFERS; rt++) {
if (!(c->fs_key->cbufs & (1 << rt)) || !c->output_color_var[rt])
continue;
nir_variable *var = c->output_color_var[rt];
@@ -1458,7 +1445,7 @@ v3d_optimize_nir(struct nir_shader *s)
NIR_PASS(progress, s, nir_opt_dce);
NIR_PASS(progress, s, nir_opt_dead_cf);
NIR_PASS(progress, s, nir_opt_cse);
NIR_PASS(progress, s, nir_opt_peephole_select, 8, true);
NIR_PASS(progress, s, nir_opt_peephole_select, 8, true, true);
NIR_PASS(progress, s, nir_opt_algebraic);
NIR_PASS(progress, s, nir_opt_constant_folding);
NIR_PASS(progress, s, nir_opt_undef);
@@ -1492,7 +1479,6 @@ ntq_emit_vpm_read(struct v3d_compile *c,
if (*num_components_queued != 0) {
(*num_components_queued)--;
c->num_inputs++;
return vir_MOV(c, vpm);
}
@@ -1502,7 +1488,6 @@ ntq_emit_vpm_read(struct v3d_compile *c,
*num_components_queued = num_components - 1;
*remaining -= num_components;
c->num_inputs++;
return vir_MOV(c, vpm);
}
@@ -1550,6 +1535,12 @@ ntq_setup_vpm_inputs(struct v3d_compile *c)
&num_components, ~0);
}
/* The actual loads will happen directly in nir_intrinsic_load_input
* on newer versions.
*/
if (c->devinfo->ver >= 40)
return;
for (int loc = 0; loc < ARRAY_SIZE(c->vattr_sizes); loc++) {
resize_qreg_array(c, &c->inputs, &c->inputs_array_size,
(loc + 1) * 4);
@@ -1855,7 +1846,7 @@ ntq_emit_intrinsic(struct v3d_compile *c, nir_intrinsic_instr *instr)
break;
case nir_intrinsic_load_helper_invocation:
vir_PF(c, vir_MSF(c), V3D_QPU_PF_PUSHZ);
vir_set_pf(vir_MSF_dest(c, vir_nop_reg()), V3D_QPU_PF_PUSHZ);
ntq_store_dest(c, &instr->dest, 0,
vir_MOV(c, vir_SEL(c, V3D_QPU_COND_IFA,
vir_uniform_ui(c, ~0),
@@ -1881,12 +1872,43 @@ ntq_emit_intrinsic(struct v3d_compile *c, nir_intrinsic_instr *instr)
break;
case nir_intrinsic_load_input:
for (int i = 0; i < instr->num_components; i++) {
offset = (nir_intrinsic_base(instr) +
nir_src_as_uint(instr->src[0]));
int comp = nir_intrinsic_component(instr) + i;
ntq_store_dest(c, &instr->dest, i,
vir_MOV(c, c->inputs[offset * 4 + comp]));
offset = (nir_intrinsic_base(instr) +
nir_src_as_uint(instr->src[0]));
if (c->s->info.stage != MESA_SHADER_FRAGMENT &&
c->devinfo->ver >= 40) {
/* Emit the LDVPM directly now, rather than at the top
* of the shader like we did for V3D 3.x (which needs
* vpmsetup when not just taking the next offset).
*
* Note that delaying like this may introduce stalls,
* as LDVPMV takes a minimum of 1 instruction but may
* be slower if the VPM unit is busy with another QPU.
*/
int index = 0;
if (c->s->info.system_values_read &
(1ull << SYSTEM_VALUE_INSTANCE_ID)) {
index++;
}
if (c->s->info.system_values_read &
(1ull << SYSTEM_VALUE_VERTEX_ID)) {
index++;
}
for (int i = 0; i < offset; i++)
index += c->vattr_sizes[i];
index += nir_intrinsic_component(instr);
for (int i = 0; i < instr->num_components; i++) {
struct qreg vpm_offset =
vir_uniform_ui(c, index++);
ntq_store_dest(c, &instr->dest, i,
vir_LDVPMV_IN(c, vpm_offset));
}
} else {
for (int i = 0; i < instr->num_components; i++) {
int comp = nir_intrinsic_component(instr) + i;
ntq_store_dest(c, &instr->dest, i,
vir_MOV(c, c->inputs[offset * 4 +
comp]));
}
}
break;
@@ -1908,38 +1930,35 @@ ntq_emit_intrinsic(struct v3d_compile *c, nir_intrinsic_instr *instr)
break;
case nir_intrinsic_discard:
if (c->execute.file != QFILE_NULL) {
vir_PF(c, c->execute, V3D_QPU_PF_PUSHZ);
vir_set_cond(vir_SETMSF_dest(c, vir_reg(QFILE_NULL, 0),
if (vir_in_nonuniform_control_flow(c)) {
vir_set_pf(vir_MOV_dest(c, vir_nop_reg(), c->execute),
V3D_QPU_PF_PUSHZ);
vir_set_cond(vir_SETMSF_dest(c, vir_nop_reg(),
vir_uniform_ui(c, 0)),
V3D_QPU_COND_IFA);
} else {
vir_SETMSF_dest(c, vir_reg(QFILE_NULL, 0),
vir_SETMSF_dest(c, vir_nop_reg(),
vir_uniform_ui(c, 0));
}
break;
case nir_intrinsic_discard_if: {
/* true (~0) if we're discarding */
struct qreg cond = ntq_get_src(c, instr->src[0], 0);
enum v3d_qpu_cond cond = ntq_emit_bool_to_cond(c, instr->src[0]);
if (c->execute.file != QFILE_NULL) {
/* execute == 0 means the channel is active. Invert
* the condition so that we can use zero as "executing
* and discarding."
*/
vir_PF(c, vir_OR(c, c->execute, vir_NOT(c, cond)),
V3D_QPU_PF_PUSHZ);
vir_set_cond(vir_SETMSF_dest(c, vir_reg(QFILE_NULL, 0),
vir_uniform_ui(c, 0)),
V3D_QPU_COND_IFA);
} else {
vir_PF(c, cond, V3D_QPU_PF_PUSHZ);
vir_set_cond(vir_SETMSF_dest(c, vir_reg(QFILE_NULL, 0),
vir_uniform_ui(c, 0)),
V3D_QPU_COND_IFNA);
if (vir_in_nonuniform_control_flow(c)) {
struct qinst *exec_flag = vir_MOV_dest(c, vir_nop_reg(),
c->execute);
if (cond == V3D_QPU_COND_IFA) {
vir_set_uf(exec_flag, V3D_QPU_UF_ANDZ);
} else {
vir_set_uf(exec_flag, V3D_QPU_UF_NORNZ);
cond = V3D_QPU_COND_IFA;
}
}
vir_set_cond(vir_SETMSF_dest(c, vir_nop_reg(),
vir_uniform_ui(c, 0)), cond);
break;
}
@@ -2030,7 +2049,7 @@ ntq_emit_intrinsic(struct v3d_compile *c, nir_intrinsic_instr *instr)
static void
ntq_activate_execute_for_block(struct v3d_compile *c)
{
vir_set_pf(vir_XOR_dest(c, vir_reg(QFILE_NULL, 0),
vir_set_pf(vir_XOR_dest(c, vir_nop_reg(),
c->execute, vir_uniform_ui(c, c->cur_block->index)),
V3D_QPU_PF_PUSHZ);
@@ -2054,14 +2073,7 @@ ntq_emit_uniform_if(struct v3d_compile *c, nir_if *if_stmt)
else_block = vir_new_block(c);
/* Set up the flags for the IF condition (taking the THEN branch). */
nir_alu_instr *if_condition_alu = ntq_get_alu_parent(if_stmt->condition);
enum v3d_qpu_cond cond;
if (!if_condition_alu ||
!ntq_emit_comparison(c, if_condition_alu, &cond)) {
vir_PF(c, ntq_get_src(c, if_stmt->condition, 0),
V3D_QPU_PF_PUSHZ);
cond = V3D_QPU_COND_IFNA;
}
enum v3d_qpu_cond cond = ntq_emit_bool_to_cond(c, if_stmt->condition);
/* Jump to ELSE. */
vir_BRANCH(c, cond == V3D_QPU_COND_IFA ?
@@ -2107,20 +2119,13 @@ ntq_emit_nonuniform_if(struct v3d_compile *c, nir_if *if_stmt)
else_block = vir_new_block(c);
bool was_uniform_control_flow = false;
if (c->execute.file == QFILE_NULL) {
if (!vir_in_nonuniform_control_flow(c)) {
c->execute = vir_MOV(c, vir_uniform_ui(c, 0));
was_uniform_control_flow = true;
}
/* Set up the flags for the IF condition (taking the THEN branch). */
nir_alu_instr *if_condition_alu = ntq_get_alu_parent(if_stmt->condition);
enum v3d_qpu_cond cond;
if (!if_condition_alu ||
!ntq_emit_comparison(c, if_condition_alu, &cond)) {
vir_PF(c, ntq_get_src(c, if_stmt->condition, 0),
V3D_QPU_PF_PUSHZ);
cond = V3D_QPU_COND_IFNA;
}
enum v3d_qpu_cond cond = ntq_emit_bool_to_cond(c, if_stmt->condition);
/* Update the flags+cond to mean "Taking the ELSE branch (!cond) and
* was previously active (execute Z) for updating the exec flags.
@@ -2128,8 +2133,7 @@ ntq_emit_nonuniform_if(struct v3d_compile *c, nir_if *if_stmt)
if (was_uniform_control_flow) {
cond = v3d_qpu_cond_invert(cond);
} else {
struct qinst *inst = vir_MOV_dest(c, vir_reg(QFILE_NULL, 0),
c->execute);
struct qinst *inst = vir_MOV_dest(c, vir_nop_reg(), c->execute);
if (cond == V3D_QPU_COND_IFA) {
vir_set_uf(inst, V3D_QPU_UF_NORNZ);
} else {
@@ -2145,7 +2149,7 @@ ntq_emit_nonuniform_if(struct v3d_compile *c, nir_if *if_stmt)
/* Jump to ELSE if nothing is active for THEN, otherwise fall
* through.
*/
vir_PF(c, c->execute, V3D_QPU_PF_PUSHZ);
vir_set_pf(vir_MOV_dest(c, vir_nop_reg(), c->execute), V3D_QPU_PF_PUSHZ);
vir_BRANCH(c, V3D_QPU_BRANCH_COND_ALLNA);
vir_link_blocks(c->cur_block, else_block);
vir_link_blocks(c->cur_block, then_block);
@@ -2159,14 +2163,16 @@ ntq_emit_nonuniform_if(struct v3d_compile *c, nir_if *if_stmt)
* active channels update their execute flags to point to
* ENDIF
*/
vir_PF(c, c->execute, V3D_QPU_PF_PUSHZ);
vir_set_pf(vir_MOV_dest(c, vir_nop_reg(), c->execute),
V3D_QPU_PF_PUSHZ);
vir_MOV_cond(c, V3D_QPU_COND_IFA, c->execute,
vir_uniform_ui(c, after_block->index));
/* If everything points at ENDIF, then jump there immediately. */
vir_PF(c, vir_XOR(c, c->execute,
vir_uniform_ui(c, after_block->index)),
V3D_QPU_PF_PUSHZ);
vir_set_pf(vir_XOR_dest(c, vir_nop_reg(),
c->execute,
vir_uniform_ui(c, after_block->index)),
V3D_QPU_PF_PUSHZ);
vir_BRANCH(c, V3D_QPU_BRANCH_COND_ALLA);
vir_link_blocks(c->cur_block, after_block);
vir_link_blocks(c->cur_block, else_block);
@@ -2190,7 +2196,7 @@ ntq_emit_if(struct v3d_compile *c, nir_if *nif)
{
bool was_in_control_flow = c->in_control_flow;
c->in_control_flow = true;
if (c->execute.file == QFILE_NULL &&
if (!vir_in_nonuniform_control_flow(c) &&
nir_src_is_dynamically_uniform(nif->condition)) {
ntq_emit_uniform_if(c, nif);
} else {
@@ -2204,13 +2210,15 @@ ntq_emit_jump(struct v3d_compile *c, nir_jump_instr *jump)
{
switch (jump->type) {
case nir_jump_break:
vir_PF(c, c->execute, V3D_QPU_PF_PUSHZ);
vir_set_pf(vir_MOV_dest(c, vir_nop_reg(), c->execute),
V3D_QPU_PF_PUSHZ);
vir_MOV_cond(c, V3D_QPU_COND_IFA, c->execute,
vir_uniform_ui(c, c->loop_break_block->index));
break;
case nir_jump_continue:
vir_PF(c, c->execute, V3D_QPU_PF_PUSHZ);
vir_set_pf(vir_MOV_dest(c, vir_nop_reg(), c->execute),
V3D_QPU_PF_PUSHZ);
vir_MOV_cond(c, V3D_QPU_COND_IFA, c->execute,
vir_uniform_ui(c, c->loop_cont_block->index));
break;
@@ -2277,7 +2285,7 @@ ntq_emit_loop(struct v3d_compile *c, nir_loop *loop)
c->in_control_flow = true;
bool was_uniform_control_flow = false;
if (c->execute.file == QFILE_NULL) {
if (!vir_in_nonuniform_control_flow(c)) {
c->execute = vir_MOV(c, vir_uniform_ui(c, 0));
was_uniform_control_flow = true;
}
@@ -2299,13 +2307,14 @@ ntq_emit_loop(struct v3d_compile *c, nir_loop *loop)
*
* XXX: Use the .ORZ flags update, instead.
*/
vir_PF(c, vir_XOR(c,
c->execute,
vir_uniform_ui(c, c->loop_cont_block->index)),
V3D_QPU_PF_PUSHZ);
vir_set_pf(vir_XOR_dest(c,
vir_nop_reg(),
c->execute,
vir_uniform_ui(c, c->loop_cont_block->index)),
V3D_QPU_PF_PUSHZ);
vir_MOV_cond(c, V3D_QPU_COND_IFA, c->execute, vir_uniform_ui(c, 0));
vir_PF(c, c->execute, V3D_QPU_PF_PUSHZ);
vir_set_pf(vir_MOV_dest(c, vir_nop_reg(), c->execute), V3D_QPU_PF_PUSHZ);
struct qinst *branch = vir_BRANCH(c, V3D_QPU_BRANCH_COND_ANYA);
/* Pixels that were not dispatched or have been discarded should not
@@ -2471,6 +2480,7 @@ const nir_shader_compiler_options v3d_nir_options = {
.lower_bitfield_reverse = true,
.lower_bit_count = true,
.lower_cs_local_id_from_index = true,
.lower_ffract = true,
.lower_pack_unorm_2x16 = true,
.lower_pack_snorm_2x16 = true,
.lower_pack_unorm_4x8 = true,
@@ -2487,6 +2497,7 @@ const nir_shader_compiler_options v3d_nir_options = {
.lower_fsat = true,
.lower_fsqrt = true,
.lower_ifind_msb = true,
.lower_isign = true,
.lower_ldexp = true,
.lower_mul_high = true,
.lower_wpos_pntc = true,
@@ -2659,5 +2670,15 @@ v3d_nir_to_vir(struct v3d_compile *c)
vir_remove_thrsw(c);
}
if (c->spill_size &&
(V3D_DEBUG & (V3D_DEBUG_VIR |
v3d_debug_flag_for_shader_stage(c->s->info.stage)))) {
fprintf(stderr, "%s prog %d/%d spilled VIR:\n",
vir_get_stage_name(c),
c->program_id, c->variant_id);
vir_dump(c);
fprintf(stderr, "\n");
}
v3d_vir_to_qpu(c, temp_registers);
}

Some files were not shown because too many files have changed in this diff