Compare commits


51 Commits

Author SHA1 Message Date
Dylan Baker
4c8bd415b4 VERSION: bump for 19.3.0 final 2019-12-12 11:21:58 -08:00
Dylan Baker
9e8aaa6f18 docs: add release notes for 19.3.0 2019-12-12 11:21:43 -08:00
Dylan Baker
a857bc66dc Revert "egl: move #include of local headers out of Khronos headers"
This reverts commit 87efb9f3a4.

This is breaking the Qt build, so it needs to go until these symbols can
make their way into upstream Khronos.
2019-12-12 09:24:42 -08:00
Dylan Baker
e0da018907 Revert "egl: avoid local modifications for eglext.h Khronos standard header file"
This reverts commit 2a497735ec.

This patch is built on the previous patch, which needs to be reverted.
2019-12-12 09:23:54 -08:00
Lionel Landwerlin
ce856a7392 anv: fix incorrect VMA alignment for CCS main surfaces
Maybe a finer way of dealing with this requirement would be to increase
the number of pdevice->memory.types[] to add a category for special
alignment cases.

Meanwhile this fixes the problem of CCS surface alignment and it's
probably not going to cause issues given the size of our address
space.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 6af8a4acc4 ("anv: Add aux-map translation for gen12+")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 5fdea9f401)
2019-12-12 09:22:54 -08:00
Samuel Pitoiset
3a58a73661 ac/nir: fix out-of-bound access when loading constants from global
Global load/store instructions can't know if the offset is
out of bounds because they don't use descriptors (no range).

Fix this by clamping the offset for arrays that are indexed
with a non-constant offset that's greater than or equal to the
array size.

This fixes VM faults and GPU hangs with Dead Rising 4.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2148
Fixes: 71a6794200 ("ac/nir: Enable nir_opt_large_constants")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit a0f1a5fa05)
2019-12-12 09:22:54 -08:00
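
The clamping described above can be pictured independently of the LLVM builder
code (the actual hunk is in the ac_nir_to_llvm.c diff further down); a minimal
C sketch of the idea, with invented names, might be:

#include <stdint.h>

/* Minimal sketch only: clamp a non-constant index so that base + offset can
 * never reach past the declared range of the constant data.  The real patch
 * builds the equivalent compare + select with the LLVM IR builder. */
static uint8_t
load_constant_clamped(const uint8_t *constant_data,
                      uint32_t base, uint32_t range, uint32_t offset)
{
   if (offset >= range)                 /* dynamic index past the array */
      offset = range ? range - 1 : 0;   /* clamp instead of faulting */

   return constant_data[base + offset];
}
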
Pierre-Eric Pelloux-Prayer
1452cf672c radeonsi: use gfx9.surf_offset to compute texture offset
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2177
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit ff0f108666)
2019-12-12 09:22:54 -08:00
Mauro Rossi
88b2a8ba3f android: radeonsi: fix build after vl refactoring (v2)
The vl functions moved from radeonsi to gallium/auxiliary/vl have left
the Android build of radeonsi in a broken state.

The libmesa_galliumvl static library is needed to build radeonsi,
the gallium_dri build rules are reworked to avoid multiple symbols,
and the libmesa_galliumvl static dependency is needed in radeonsi.

Here is the changelog:
- android: gallium/auxiliary: add libmesa_galliumvl static
- android: gallium_dri: move libmesa_gallium to static to prevent multiple symbols
- android: radeonsi: fix build after vl refactoring

Fixes the following build error:

external/mesa/src/gallium/drivers/radeonsi/si_uvd.c:47:
error: undefined reference to 'vl_video_buffer_create_as_resource'
clang.real: error: linker command failed with exit code 1 (use -v to see invocation)

Fixes: 86e60bc ("radeonsi: remove si_vid_join_surfaces and use combined planar allocations")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 96aef08dc6)
Conflicts Resolved by Dylan Baker

Conflicts:
	src/gallium/targets/dri/Android.mk

Panfrost is not enabled for android in 19.3, and the series is a bit
bigger than I'd like to pull into the stable branch for a .0 release
2019-12-11 16:41:11 -08:00
Jason Ekstrand
3f50741bc2 anv: Don't leak when set_tiling fails
Fixes: a44744e01d "anv: Require a dedicated allocation for..."
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit 0a36fafa95)
Conflicts resolved by Dylan Baker

Conflicts:
	src/intel/vulkan/anv_device.c
2019-12-11 16:34:38 -08:00
Nanley Chery
58395e5293 iris: Fix import of multi-planar surfaces with modifiers
Multi-planar surfaces are allowed to have modifiers. Don't require
DRM_FORMAT_MOD_INVALID in order to create a surface for each plane
defined by the format.

Fixes: 246eebba4a ("iris: Export and import surfaces with modifiers that have aux data")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 21376cffb3)
2019-12-11 15:49:41 -08:00
James Xiong
ae06960627 iris: try to set the specified tiling when importing a dmabuf
When importing a dmabuf with a specified tiling, the dmabuf user
should always try to set the tiling mode because: 1) the exporter
can set tiling AFTER exporting/importing; 2) a dmabuf could be
exported from a kernel driver other than i915, in which case the
dmabuf user and exporter need to set tiling separately.

This patch fixes a problem when running vkmark under weston with
iris on ICL, where it crashed to the console with the following
assert. i965 doesn't have this problem, as it always tries to set
the specified tiling mode.

weston: ../src/gallium/drivers/iris/iris_resource.c:990: iris_resource_from_handle: Assertion `res->bo->tiling_mode == isl_tiling_to_i915_tiling(res->surf.tiling)' failed.

Signed-off-by: James Xiong <james.xiong@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
(cherry picked from commit b6d45e7f74)
2019-12-11 15:49:35 -08:00
Nanley Chery
7b2ef16086 gallium: Store the image format in winsys_handle
This format will be used to properly handle planar images with modifiers
in iris.

Fixes: 246eebba4a ("iris: Export and import surfaces with modifiers that have aux data")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 51ee8fff9b)
2019-12-11 15:46:50 -08:00
Bas Nieuwenhuizen
d4dad580e5 radv: Fix RGBX Android<->Vulkan format correspondence.
This is correct per the Vulkan spec format equivalence table.

Fixes: f36b52740a "radv/android: Add android hardware buffer queries."
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 2e44bfc14f)
2019-12-11 15:46:20 -08:00
Dylan Baker
4a3b4ccf6b meson/broadcom: libbroadcom_cle also needs zlib
Fixes: 1ae8018a6a
       ("meson: Add support for the vc4 driver.")
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit d0eebda990)
2019-12-11 15:46:15 -08:00
Dylan Baker
a637f36b61 meson/broadcom: libbroadcom_cle needs expat headers
Fixes: 1ae8018a6a
       ("meson: Add support for the vc4 driver.")
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit 85a9698ac3)
2019-12-11 15:46:10 -08:00
Lionel Landwerlin
843629708e anv: fix missing gen12 handling
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 181be14d43 ("anv: Build for gen12")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit dcfe1903c3)
2019-12-10 09:14:38 -08:00
Pierre-Eric Pelloux-Prayer
01d53f7ac0 radeonsi: fix multi plane buffers creation
When using 3 planes, the sequence produces this chain:
  plane0 -> plane2
This commit fixes this to produce:
  plane0 -> plane1 -> plane2

Fixes: 86e60bc265 ("radeonsi: remove si_vid_join_surfaces and use combined planar allocations")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2193
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit e3e91cebcd)
2019-12-10 09:14:34 -08:00
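
A chain like the one described above typically goes wrong when each new plane
is linked to the first plane instead of to the previously created one; a
hypothetical sketch of the corrected chaining loop (names invented for
illustration, not the radeonsi code):

struct plane_resource {
   struct plane_resource *next;
   /* ... per-plane surface data ... */
};

/* Hypothetical sketch: link each newly created plane to the previous one so
 * that three planes yield plane0 -> plane1 -> plane2, not plane0 -> plane2. */
static void
link_planes(struct plane_resource **planes, unsigned num_planes)
{
   struct plane_resource *last = planes[0];

   for (unsigned i = 1; i < num_planes; i++) {
      last->next = planes[i];   /* append to the tail, not to planes[0] */
      last = planes[i];
   }
   last->next = NULL;
}
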
Alyssa Rosenzweig
166a3ae3c8 gallium/util: Support POLYGON in u_stream_outputs_for_vertices
u_decomposed_prims_for_vertices cannot support POLYGON, but POLYGON is
trivial to support as a special case directly (since we already have the
number of vertices).

Fixes aborts in Panfrost in apps using GL_POLYGON.

Fixes: e881aa8c12 ("gallium/util: Add u_stream_outputs_for_vertices helper")
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit a37822f5f7)
2019-12-10 09:14:28 -08:00
Jason Ekstrand
c5e203ff50 anv: Re-emit all compute state on pipeline switch
It's a very odd case to hit in the real world.  However, there are some
CTS tests which switch back and forth between dispatch and clear without
changing the pipeline.

Fixes: bc612536eb "anv: Emit a dummy MEDIA_VFE_STATE before switching..."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
(cherry picked from commit 0f60aa4037)
2019-12-10 09:14:21 -08:00
Fritz Koenig
22d1e495da freedreno: reorder format check
With the addition of the planar formats helper, the
planar formats no longer have a valid block.bits field.
Calling util_format_get_blocksize therefore asserts.

Reorder the check to see if the format is supported
before doing the query to get the blocksize.

Fixes: 20f132e5ef ("gallium/util: add planar format layouts and helpers")

Signed-off-by: Fritz Koenig <frkoenig@google.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
(cherry picked from commit c496d44284)
2019-12-10 09:14:14 -08:00
Nanley Chery
a67289631f gallium/dri2: Fix creation of multi-planar modifier images
The commit noted below assumed and enforced that DRM_MOD_INVALID was the
only valid modifier for multi-planar imported images. Due to that, it
required that modifier on multi-planar images in order to:

   1. Allow multiple planes.
   2. Perform YUV format lowering and extent adjustments.
   3. Use buffer_index to correctly map the given planes.

Fix these issues by removing or updating the code built on that
assumption.

Fixes: 2066966c10 ("gallium/dri2: Support creating multi-planar modifier images")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit d5c857837a)
2019-12-10 09:13:56 -08:00
Timothy Arceri
6adf4fe26d glsl/nir: iterate the system values list when adding varyings
Iterate the system values list when adding varyings to the program
resource list in the NIR linker. This is needed to avoid CTS
regressions when using the NIR to build the GLSL resource list in
an upcoming series. Presumably it also fixes a bug with the current
ARB_gl_spirv support.

Fixes: ffdb44d3a0 ("nir/linker: Add inputs/outputs to the program resource list")

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
(cherry picked from commit 1abca2b3c8)
2019-12-10 09:13:46 -08:00
Ian Romanick
fe136a943d intel/compiler: Fix 'comparison is always true' warning
Without looking at the assembly or something, I'm not sure what the
compiler does here.  The brw_reg_type enum is marked packed, so I'm
guessing that it gets represented as a uint8_t.  That's the only reason
I can think of that comparing with -1 would always be true.

This patch adds the same cast that exists in brw_hw_type_to_reg_type.
It might be better to add a #define outside the enum for
BRW_REGISTER_TYPE_INVALID as (enum brw_reg_type)-1.

src/intel/compiler/brw_eu_compact.c: In function ‘has_immediate’:
src/intel/compiler/brw_eu_compact.c:1515:20: warning: comparison is always true due to limited range of data type [-Wtype-limits]
 1515 |       return *type != -1;
      |                    ^~
src/intel/compiler/brw_eu_compact.c:1518:20: warning: comparison is always true due to limited range of data type [-Wtype-limits]
 1518 |       return *type != -1;
      |                    ^~

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
CID: 1455194
Fixes: 12d3b11908 ("intel/compiler: Add instruction compaction support on Gen12")
Cc: @mattst88
(cherry picked from commit 668635abd2)
2019-12-10 09:13:05 -08:00
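
The warning is easy to reproduce outside Mesa with any packed enum, and the
cast-based fix the message describes looks the same in isolation; a
stand-alone illustration (assuming GCC's __attribute__((packed)) and
-Wtype-limits):

/* Stand-alone illustration, not the Mesa source: a packed enum may be
 * represented as an unsigned 8-bit type, so comparing it with -1 is always
 * true and -Wtype-limits flags it. */
enum reg_type {
   REG_TYPE_A,
   REG_TYPE_B,
} __attribute__((packed));

static int
has_valid_type_bad(enum reg_type t)
{
   return t != -1;                  /* warning: comparison is always true */
}

static int
has_valid_type_fixed(enum reg_type t)
{
   return t != (enum reg_type)-1;   /* cast first, as brw_hw_type_to_reg_type does */
}
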
Rob Clark
b2d5d0aae1 nir/lower_clip: Fix incorrect driver loc for clipdist outputs
Somehow adjusting maxloc based on existing outputs got lost, resulting
in the clipdist varying clobbering the position varying.  This caused a
shader that had no position output in freedreno/ir3, which triggers GPU
hangs in neverball.

Fixes: d0f746b645 ("nir: Save nir_variable pointers in nir_lower_clip_vs rather than locs.")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
(cherry picked from commit 372ed42d22)
2019-12-10 09:13:00 -08:00
Dylan Baker
e8635ce28e cherry-ignore: update for 19.3-rc7 2019-12-04 13:41:07 -08:00
Lionel Landwerlin
fb6db6b5bb intel/perf: fix improper pointer access
This expression was unused by the macro, which is probably why it
didn't register during compilation.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit ddacd3d43b)
2019-12-04 13:41:07 -08:00
Lionel Landwerlin
c90f4e9508 intel/perf: simplify the processing of OA reports
This is a more accurate description of what happens when processing the
OA reports.

Previously we only had a somewhat difficult-to-parse state machine
tracking the context ID.

All we really need to decide whether the delta between 2 reports
(r0 & r1) should be accumulated in the query result is:

   * whether r0 is tagged with the context ID relevant to us

   * if r0 is not tagged with our context ID and r1 is: does r0 have an
     invalid context ID? If not, then we're in a case where i915 has
     resubmitted the same context for execution through the execlist
     submission port

v2: Update comment (Ken)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 8c0b058263)
2019-12-04 13:41:07 -08:00
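
Expressed as code, the two bullet points above reduce to a small predicate; a
hypothetical C sketch (struct, field and constant names invented here, not the
intel/perf API):

#include <stdbool.h>
#include <stdint.h>

#define CTX_ID_INVALID 0xffffffffu   /* placeholder "no context" value */

struct oa_report {
   uint32_t ctx_id;
   /* ... timestamp, counters ... */
};

/* Hypothetical sketch of the rule above: accumulate the delta between r0 and
 * r1 when r0 belongs to our context, or when r1 is ours and r0 still carries
 * a valid context ID (i915 resubmitting the same context through the execlist
 * submission port). */
static bool
should_accumulate(const struct oa_report *r0, const struct oa_report *r1,
                  uint32_t our_ctx_id)
{
   if (r0->ctx_id == our_ctx_id)
      return true;

   return r1->ctx_id == our_ctx_id && r0->ctx_id != CTX_ID_INVALID;
}
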
Lionel Landwerlin
1de3548668 intel/perf: take into account that reports read can be fairly old
If we read the OA reports late enough after the query happens, we can
get a timestamp in the report that is significantly in the past
compared to the start timestamp of the query. The current code must
deal with the wraparound of the timestamp value (every ~6 minutes). So
if the difference is greater than half that wraparound period, consider
that we're probably dealing with an old report, and make the caller
aware it should read more reports when they're available.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit b364e920bf)
2019-12-04 13:41:07 -08:00
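
Because the report timestamp wraps, "significantly in the past" has to be
judged modulo the wrap period; a minimal sketch of the half-period test
described above (the 32-bit timestamp width is an assumption made for
illustration):

#include <stdbool.h>
#include <stdint.h>

/* Minimal sketch: with a wrapping 32-bit timestamp, a modulo delta larger
 * than half the wrap period is more plausibly a report that predates the
 * query start than one from far in the future, so treat it as old and keep
 * reading. */
static bool
report_is_old(uint32_t query_start_ts, uint32_t report_ts)
{
   uint32_t delta = report_ts - query_start_ts;   /* modulo-2^32 arithmetic */

   return delta > UINT32_MAX / 2;
}
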
Lionel Landwerlin
4399795fbd intel/perf: set read buffer len to 0 to identify empty buffer
We always add an empty buffer to the list when creating the query.
Let's set the len appropriately so that we can recognize it when we
read OA reports up to the end of a query.

We were using a 0 timestamp value associated with the empty buffer
and incorrectly assuming this was a valid value. In turn, that led to
not reading enough reports and resulted in deltas being added to our
counter values that should have been discarded because they were
flagged for a different context.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 9d0a5c817c)
2019-12-04 13:41:07 -08:00
Lionel Landwerlin
d362ba77ce intel/perf: fix invalid hw_id in query results
Accumulation happens between 2 reports; it can be between a start/end
report from another context. So only consider updating the hw_id of
the results when it's not already valid and we have a valid value
to put in there.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 41b54b5faf ("i965: move OA accumulation code to intel/perf")
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit acea59dbf8)
2019-12-04 13:41:07 -08:00
Dylan Baker
9b189cb9b1 VERSION: bump version for 19.3-rc6 2019-12-04 13:14:01 -08:00
Daniel Schürmann
15791ca8f9 aco: fix a couple of value numbering issues
Fixes: 3a20ef4a32 'aco: refactor value numbering'

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-04 08:18:46 +00:00
Jason Ekstrand
f0aa6a7535 anv: Set up SBE_SWIZ properly for gl_Viewport
gl_Viewport is also in the VUE header so we need to whack the read
offset to 0 and emit a default (no overrides) SBE_SWIZ entry in that
case as well.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
(cherry picked from commit b1f37688ba)
2019-12-03 10:46:25 -08:00
Jordan Justen
f6ac7d9a5b iris: Allow max dynamic pool size of 2GB for gen12
Reworks:
 * Adjust comment to list the state packets that curro found to be
   affected.

Fixes: 8125d7960b ("intel/dev: Add preliminary device info for Tigerlake")
Cc: 19.3 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
(cherry picked from commit e277009d8d)
2019-12-03 10:46:25 -08:00
Rhys Perry
f9e8f6bad8 nir/lower_io_to_vector: don't create arrays when not needed
Some backends require that there are no array varyings.

If there were no arrays in the input shader, the pass shouldn't have to
create new ones.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2103
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2167
Fixes: bcd14756ee ('nir/lower_io_to_vector: add flat mode')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
(cherry picked from commit 5404b7aaa3)
2019-12-03 10:46:09 -08:00
Rhys Perry
f7d100caad radv: set writes_memory for global memory stores/atomics
Fixes: 13ab63bb62 ('radv: Implement VK_EXT_buffer_device_address.')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 35fab1ba33)
2019-12-03 10:24:23 -08:00
Daniel Schürmann
0fa0b5fc3a aco: don't split live-ranges of linear VGPRs
Fixes: 93c8ebfa78 'aco: Initial commit of independent AMD compiler'

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
(cherry picked from commit 8861a82be7)
2019-12-03 10:24:08 -08:00
Rhys Perry
bf03a4311b aco: add v_nop inbetween exec write and VMEM/DS/FLAT
LLVM and the proprietary compiler seem to do this.

Fixes: b01847bd9 ("aco/gfx10: Fix mitigation of VMEMtoScalarWriteHazard.")
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
(cherry picked from commit a9fc81b098)
2019-12-03 10:24:04 -08:00
Rhys Perry
967043eb68 aco: fix i2i64
Fixes: 93c8ebfa ('aco: Initial commit of independent AMD compiler')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
(cherry picked from commit 11f43caaec)
2019-12-03 10:23:56 -08:00
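
The fix visible in the aco instruction-selection diff further down replaces a
zero-extend with a sign-extend; restated in plain C, the difference is just:

#include <stdint.h>

/* Plain-C restatement of the bug: building the 64-bit value as {src, 0}
 * zero-extends, while i2i64 needs sign extension: the high 32 bits must be
 * a copy of the sign bit (an arithmetic shift of src by 31). */
static uint64_t
i2i64_wrong(uint32_t src)
{
   return (uint64_t)src;                            /* high half forced to 0 */
}

static uint64_t
i2i64_correct(uint32_t src)
{
   uint32_t high = (uint32_t)((int32_t)src >> 31);  /* 0 or 0xffffffff */

   return ((uint64_t)high << 32) | src;
}
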
Rhys Perry
f4a4cce590 aco: propagate p_wqm on an image_sample's coordinate p_create_vector
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2156
Fixes: 93c8ebfa78 ('aco: Initial commit of independent AMD compiler')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
(cherry picked from commit ff70ccad16)
2019-12-03 10:23:51 -08:00
Christian Gmeiner
4f026b2a05 etnaviv: remove dead code
ptiled is always NULL so the if statement is useless.

CoverityID: 1415572
Fixes: b962776530 ("etnaviv: rework compatible render base")
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
(cherry picked from commit 1be220833c)
2019-12-03 10:23:46 -08:00
Jonathan Gray
d32a34a3f3 i965: update Makefile.sources for perf changes
brw_performance_query_metrics.h was removed in
134e750e16 and
brw_performance_query.h was removed in
8ae6667992.

Remove the references to these files from Makefile.sources.

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Fixes: 134e750e16 ("i965: extract performance query metrics")
Fixes: 8ae6667992 ("intel/perf: move query_object into perf")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
(cherry picked from commit 34dda0ca65)
2019-12-03 10:23:40 -08:00
Boris Brezillon
8e3c4caf74 panfrost: Make sure we reset the damage region of RTs at flush time
We must reset the damage info of our render targets here even though a
damage reset normally happens when the DRI layer swaps buffers. That's
because there can be implicit flushes the GL app is not aware of, and
those might impact the damage region: if part of the damaged portion
is drawn during those implicit flushes, you have to reload those areas
before the next draws are pushed, and since the driver can't easily know
what's been modified by the draws it flushed, the easiest solution is
to reload everything.

Reported-by: Carsten Haitzler <raster@rasterman.com>
Fixes: 65ae86b854 ("panfrost: Add support for KHR_partial_update()")
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
(cherry picked from commit c6e2096c47)
2019-12-03 10:23:35 -08:00
Boris Brezillon
e11d9cd9ed gallium: Fix the ->set_damage_region() implementation
The BACK_LEFT attachment can be outdated when the user calls
KHR_partial_update() (->lastStamp != ->texture_stamp), leading to a
damage region update on the wrong pipe_resource object.
When we're in that case, let's delay the ->set_damage_region() call
until the attachments are updated.

Reported-by: Carsten Haitzler <raster@rasterman.com>
Fixes: 492ffbed63 ("st/dri2: Implement DRI2bufferDamageExtension")
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit b196e1a8cf)
2019-12-03 10:23:29 -08:00
Bas Nieuwenhuizen
0ca8b506a4 radv: Fix timeline semaphore refcounting.
Was totally broken ...

Removed two if (point) {} checks because point is always non-NULL and we
were already relying on that for the counting, since we NULL our
references to semaphores without an active point earlier.

Fixes: 4aa75bb3bd "radv: Add wait-before-submit support for timelines."
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2137
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 48fc65413c)
2019-12-03 10:23:24 -08:00
Jonathan Gray
a260645345 winsys/amdgpu: avoid double simple_mtx_unlock()
Calling pthread_mutex_unlock() on an unlocked mutex is documented by POSIX
as undefined behaviour.  On OpenBSD, pthread_mutex_unlock() will call
abort(3) if this happens.

This occurs in amdgpu_winsys_create() after
cb446dc0fa ("winsys/amdgpu: Add amdgpu_screen_winsys").

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Cc: 19.2 19.3 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit 3fe3bde4f2)
2019-12-03 10:23:20 -08:00
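
The hazard is not specific to the winsys; with plain pthreads the rule is
simply that every path through a critical section unlocks exactly once. A
small stand-alone illustration (hypothetical function, not the amdgpu code):

#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t dev_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical illustration: unlocking an already-unlocked mutex is undefined
 * behaviour per POSIX (and abort()s on OpenBSD), so make each path own its
 * single unlock instead of unlocking again after a shared error label. */
static bool
create_screen(bool *out_created)
{
   pthread_mutex_lock(&dev_mutex);

   if (!out_created) {
      pthread_mutex_unlock(&dev_mutex);   /* error path: unlock once here... */
      return false;                       /* ...and return, don't fall through */
   }

   *out_created = true;
   pthread_mutex_unlock(&dev_mutex);      /* success path: the only other unlock */
   return true;
}
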
Bas Nieuwenhuizen
5ba4fb857d radv: Unify max_descriptor_set_size.
They were out of sync. Besides syncing them, let's ensure they never
diverge again.

Fixes: 8d2654a419 "radv: Support VK_EXT_inline_uniform_block."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 4cde0e04e3)
2019-12-03 10:23:16 -08:00
Kenneth Graunke
553de940de drirc: Set vs_position_always_invariant for Shadow of Mordor on Intel
When drawing the main character in Shadow of Mordor, the game appears
to draw Talion with one vertex shader, and the Wraith with another.
If the compiler optimizes those in different ways that lead to slight
imprecisions, the resulting positions may not line up, leading to
Z-fighting as the game decides which of the two is in front.

brw_nir_opt_peephole_ffma looks at usages of multiply adds across the
entire shader, and may make different decisions between the two, leading
to such imprecisions and Z-fighting.  This started happening recently
after a NIR change to eliminate unnecessary MOVs (7025dbe7), but that
change simply exposed the existing problem.

Improves performance on Skylake GT4e by 1.22945% +/- 0.398672% (n=3),
likely due to the fixed rendering.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1985
Fixes: 7025dbe794 ("nir: Skip emitting no-op movs from the builder.")
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit 51cc380894)
2019-12-03 10:23:12 -08:00
Kenneth Graunke
f63c3ecaa0 driconf, glsl: Add a vs_position_always_invariant option
Many applications use multi-pass rendering and require their vertex
shader position to be computed the same way each time.  Optimizations
may consider, say, fusing a multiply-add based on global usage of an
expression in a shader.  But a second shader with the same expression
may have different code, causing that optimization to make the other
choice the second time around.

The correct solution is for applications to mark their VS outputs
'invariant', indicating they need multiple shaders to compute that
output in the same manner.  However, most applications fail to do so.

So, we add a new driconf option - vs_position_always_invariant - which
forces the gl_Position output in vertex shaders to be marked invariant.

Fixes: 7025dbe794 ("nir: Skip emitting no-op movs from the builder.")
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
(cherry picked from commit 9b577f2a88)
2019-12-03 10:23:03 -08:00
Samuel Pitoiset
d438ccdedf radv/gfx10: fix implementation of exclusive scans
This implementation is loosely based on ROCm.
https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/master/ockl/src/wfredscan.cl

This fixes dEQP-VK.subgroups.arithmetic.*.subgroupexclusive* on GFX10.

Fixes: 227c29a80d ("amd/common/gfx10: implement scan & reduce operations")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit c9aa843961)
Conflicts resolved by Dylan Baker
2019-12-03 10:22:47 -08:00
Samuel Pitoiset
19573e4374 radv: fix enabling sample shading with SampleID/SamplePosition
When a fragment shader includes an input variable decorated with
SampleId or SamplePosition, sample shading should be enabled
because minSampleShadingFactor is expected to be 1.0.

Cc: 19.2, 19.3 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 86a5fbfd4a)
2019-11-27 09:47:14 -08:00
58 changed files with 3690 additions and 246 deletions

View File

@@ -1 +1 @@
19.3.0-rc5
19.3.0

View File

@@ -1,6 +1,7 @@
# This is reverted shortly after landing
4432a2d14d80081d062f7939a950d65ea3a16eed
# This was manually backported
# These were manually backported
21be5c8edd3ad156f6cbfbceb96e7939716d9f2c
4b392ced2d744fccffe95490ff57e6b41033c266
b6905438514ae4de0b7f85c861e3d811ddaadda9

3138
docs/relnotes/19.3.0.html Normal file

File diff suppressed because it is too large.

View File

@@ -1362,6 +1362,20 @@ EGLAPI EGLuint64NV EGLAPIENTRY eglGetSystemTimeNV (void);
#define EGL_NATIVE_SURFACE_TIZEN 0x32A1
#endif /* EGL_TIZEN_image_native_surface */
#ifndef EGL_EXT_image_flush_external
#define EGL_EXT_image_flush_external 1
#define EGL_IMAGE_EXTERNAL_FLUSH_EXT 0x32A2
typedef EGLBoolean (EGLAPIENTRYP PFNEGLIMAGEFLUSHEXTERNALEXTPROC) (EGLDisplay dpy, EGLImageKHR image, const EGLAttrib *attrib_list);
typedef EGLBoolean (EGLAPIENTRYP PFNEGLIMAGEINVALIDATEEXTERNALEXTPROC) (EGLDisplay dpy, EGLImageKHR image, const EGLAttrib *attrib_list);
#ifdef EGL_EGLEXT_PROTOTYPES
EGLAPI EGLBoolean EGLAPIENTRY eglImageFlushExternalEXT (EGLDisplay dpy, EGLImageKHR image, const EGLAttrib *attrib_list);
EGLAPI EGLBoolean EGLAPIENTRY eglImageInvalidateExternalEXT (EGLDisplay dpy, EGLImageKHR image, const EGLAttrib *attrib_list);
#endif
#endif /* EGL_EXT_image_flush_external */
#include <EGL/eglmesaext.h>
#include <EGL/eglextchromium.h>
#ifdef __cplusplus
}
#endif

View File

@@ -53,17 +53,6 @@ typedef EGLBoolean (EGLAPIENTRYP PFNEGLGETSYNCVALUESCHROMIUMPROC)
#endif
#endif
#ifndef EGL_EXT_image_flush_external
#define EGL_EXT_image_flush_external 1
#define EGL_IMAGE_EXTERNAL_FLUSH_EXT 0x32A2
typedef EGLBoolean (EGLAPIENTRYP PFNEGLIMAGEFLUSHEXTERNALEXTPROC) (EGLDisplay dpy, EGLImageKHR image, const EGLAttrib *attrib_list);
typedef EGLBoolean (EGLAPIENTRYP PFNEGLIMAGEINVALIDATEEXTERNALEXTPROC) (EGLDisplay dpy, EGLImageKHR image, const EGLAttrib *attrib_list);
#ifdef EGL_EGLEXT_PROTOTYPES
EGLAPI EGLBoolean EGLAPIENTRY eglImageFlushExternalEXT (EGLDisplay dpy, EGLImageKHR image, const EGLAttrib *attrib_list);
EGLAPI EGLBoolean EGLAPIENTRY eglImageInvalidateExternalEXT (EGLDisplay dpy, EGLImageKHR image, const EGLAttrib *attrib_list);
#endif
#endif /* EGL_EXT_image_flush_external */
#ifdef __cplusplus
}
#endif

View File

@@ -392,7 +392,7 @@ void insert_NOPs_gfx8_9(Program* program)
}
}
void handle_instruction_gfx10(NOP_ctx_gfx10 &ctx, aco_ptr<Instruction>& instr,
void handle_instruction_gfx10(Program *program, NOP_ctx_gfx10 &ctx, aco_ptr<Instruction>& instr,
std::vector<aco_ptr<Instruction>>& old_instructions,
std::vector<aco_ptr<Instruction>>& new_instructions)
{
@@ -403,6 +403,9 @@ void handle_instruction_gfx10(NOP_ctx_gfx10 &ctx, aco_ptr<Instruction>& instr,
instr->format == Format::SCRATCH || instr->format == Format::DS) {
/* Remember all SGPRs that are read by the VMEM instruction */
mark_read_regs(instr, ctx.sgprs_read_by_VMEM);
ctx.sgprs_read_by_VMEM.set(exec);
if (program->wave_size == 64)
ctx.sgprs_read_by_VMEM.set(exec_hi);
} else if (instr->isSALU() || instr->format == Format::SMEM) {
/* Check if SALU writes an SGPR that was previously read by the VALU */
if (check_written_regs(instr, ctx.sgprs_read_by_VMEM)) {
@@ -535,7 +538,7 @@ void handle_instruction_gfx10(NOP_ctx_gfx10 &ctx, aco_ptr<Instruction>& instr,
}
}
void handle_block_gfx10(NOP_ctx_gfx10& ctx, Block& block)
void handle_block_gfx10(Program *program, NOP_ctx_gfx10& ctx, Block& block)
{
if (block.instructions.empty())
return;
@@ -544,7 +547,7 @@ void handle_block_gfx10(NOP_ctx_gfx10& ctx, Block& block)
instructions.reserve(block.instructions.size());
for (aco_ptr<Instruction>& instr : block.instructions) {
handle_instruction_gfx10(ctx, instr, block.instructions, instructions);
handle_instruction_gfx10(program, ctx, instr, block.instructions, instructions);
instructions.emplace_back(std::move(instr));
}
@@ -569,7 +572,7 @@ void mitigate_hazards_gfx10(Program *program)
for (unsigned b : program->blocks[idx].linear_preds)
loop_block_ctx.join(all_ctx[b]);
handle_block_gfx10(loop_block_ctx, program->blocks[idx]);
handle_block_gfx10(program, loop_block_ctx, program->blocks[idx]);
/* We only need to continue if the loop header context changed */
if (idx == loop_header_indices.top() && loop_block_ctx == all_ctx[idx])
@@ -584,7 +587,7 @@ void mitigate_hazards_gfx10(Program *program)
for (unsigned b : block.linear_preds)
ctx.join(all_ctx[b]);
handle_block_gfx10(ctx, block);
handle_block_gfx10(program, ctx, block);
}
}

View File

@@ -1976,8 +1976,12 @@ void visit_alu_instr(isel_context *ctx, nir_alu_instr *instr)
}
case nir_op_i2i64: {
Temp src = get_alu_src(ctx, instr->src[0]);
if (instr->src[0].src.ssa->bit_size == 32) {
bld.pseudo(aco_opcode::p_create_vector, Definition(dst), src, Operand(0u));
if (src.regClass() == s1) {
Temp high = bld.sopc(aco_opcode::s_ashr_i32, bld.def(s1, scc), src, Operand(31u));
bld.pseudo(aco_opcode::p_create_vector, Definition(dst), src, high);
} else if (src.regClass() == v1) {
Temp high = bld.vop2(aco_opcode::v_ashrrev_i32, bld.def(v1), Operand(31u), src);
bld.pseudo(aco_opcode::p_create_vector, Definition(dst), src, high);
} else {
fprintf(stderr, "Unimplemented NIR instr bit size: ");
nir_print_instr(&instr->instr, stderr);
@@ -6572,11 +6576,6 @@ void visit_tex(isel_context *ctx, nir_tex_instr *instr)
}
}
if (!(has_ddx && has_ddy) && !has_lod && !level_zero &&
instr->sampler_dim != GLSL_SAMPLER_DIM_MS &&
instr->sampler_dim != GLSL_SAMPLER_DIM_SUBPASS_MS)
coords = emit_wqm(ctx, coords, bld.tmp(coords.regClass()), true);
std::vector<Operand> args;
if (has_offset)
args.emplace_back(Operand(offset));
@@ -6592,7 +6591,7 @@ void visit_tex(isel_context *ctx, nir_tex_instr *instr)
if (has_lod)
args.emplace_back(lod);
Operand arg;
Temp arg;
if (args.size() > 1) {
aco_ptr<Pseudo_instruction> vec{create_instruction<Pseudo_instruction>(aco_opcode::p_create_vector, Format::PSEUDO, args.size(), 1)};
unsigned size = 0;
@@ -6604,12 +6603,20 @@ void visit_tex(isel_context *ctx, nir_tex_instr *instr)
Temp tmp = bld.tmp(rc);
vec->definitions[0] = Definition(tmp);
ctx->block->instructions.emplace_back(std::move(vec));
arg = Operand(tmp);
arg = tmp;
} else {
assert(args[0].isTemp());
arg = Operand(as_vgpr(ctx, args[0].getTemp()));
arg = as_vgpr(ctx, args[0].getTemp());
}
/* we don't need the bias, sample index, compare value or offset to be
* computed in WQM but if the p_create_vector copies the coordinates, then it
* needs to be in WQM */
if (!(has_ddx && has_ddy) && !has_lod && !level_zero &&
instr->sampler_dim != GLSL_SAMPLER_DIM_MS &&
instr->sampler_dim != GLSL_SAMPLER_DIM_SUBPASS_MS)
arg = emit_wqm(ctx, arg, bld.tmp(arg.regClass()), true);
if (instr->sampler_dim == GLSL_SAMPLER_DIM_BUF) {
//FIXME: if (ctx->abi->gfx9_stride_size_workaround) return ac_build_buffer_load_format_gfx9_safe()
@@ -6741,7 +6748,7 @@ void visit_tex(isel_context *ctx, nir_tex_instr *instr)
}
tex.reset(create_instruction<MIMG_instruction>(opcode, Format::MIMG, 3, 1));
tex->operands[0] = arg;
tex->operands[0] = Operand(arg);
tex->operands[1] = Operand(resource);
tex->operands[2] = Operand(sampler);
tex->dim = dim;

View File

@@ -130,8 +130,7 @@ struct InstrPred {
return false;
}
}
if (a->format == Format::PSEUDO_BRANCH)
return false;
if (a->isVOP3()) {
VOP3A_instruction* a3 = static_cast<VOP3A_instruction*>(a);
VOP3A_instruction* b3 = static_cast<VOP3A_instruction*>(b);
@@ -147,7 +146,8 @@ struct InstrPred {
if (a->isDPP()) {
DPP_instruction* aDPP = static_cast<DPP_instruction*>(a);
DPP_instruction* bDPP = static_cast<DPP_instruction*>(b);
return aDPP->dpp_ctrl == bDPP->dpp_ctrl &&
return aDPP->pass_flags == bDPP->pass_flags &&
aDPP->dpp_ctrl == bDPP->dpp_ctrl &&
aDPP->bank_mask == bDPP->bank_mask &&
aDPP->row_mask == bDPP->row_mask &&
aDPP->bound_ctrl == bDPP->bound_ctrl &&
@@ -156,6 +156,7 @@ struct InstrPred {
aDPP->neg[0] == bDPP->neg[0] &&
aDPP->neg[1] == bDPP->neg[1];
}
switch (a->format) {
case Format::VOPC: {
/* Since the results depend on the exec mask, these shouldn't
@@ -191,7 +192,7 @@ struct InstrPred {
/* this is fine since they are only used for vertex input fetches */
MTBUF_instruction* aM = static_cast<MTBUF_instruction *>(a);
MTBUF_instruction* bM = static_cast<MTBUF_instruction *>(b);
return aM->can_reorder == bM->can_reorder &&
return aM->can_reorder && bM->can_reorder &&
aM->barrier == bM->barrier &&
aM->dfmt == bM->dfmt &&
aM->nfmt == bM->nfmt &&
@@ -208,6 +209,10 @@ struct InstrPred {
case Format::FLAT:
case Format::GLOBAL:
case Format::SCRATCH:
case Format::EXP:
case Format::SOPP:
case Format::PSEUDO_BRANCH:
case Format::PSEUDO_BARRIER:
return false;
case Format::DS: {
/* we already handle potential issue with permute/swizzle above */
@@ -276,6 +281,10 @@ void process_block(vn_ctx& ctx, Block& block)
op.setTemp(it->second);
}
if (instr->opcode == aco_opcode::p_discard_if ||
instr->opcode == aco_opcode::p_demote_to_helper)
ctx.exec_id++;
if (instr->definitions.empty()) {
new_instructions.emplace_back(std::move(instr));
continue;
@@ -288,10 +297,6 @@ void process_block(vn_ctx& ctx, Block& block)
ctx.renames[instr->definitions[0].tempId()] = instr->operands[0].getTemp();
}
if (instr->opcode == aco_opcode::p_discard_if ||
instr->opcode == aco_opcode::p_demote_to_helper)
ctx.exec_id++;
instr->pass_flags = ctx.exec_id;
std::pair<expr_set::iterator, bool> res = ctx.expr_values.emplace(instr.get(), block.index);
@@ -303,6 +308,7 @@ void process_block(vn_ctx& ctx, Block& block)
if (dominates(ctx, res.first->second, block.index)) {
for (unsigned i = 0; i < instr->definitions.size(); i++) {
assert(instr->definitions[i].regClass() == orig_instr->definitions[i].regClass());
assert(instr->definitions[i].isTemp());
ctx.renames[instr->definitions[i].tempId()] = orig_instr->definitions[i].getTemp();
}
} else {

View File

@@ -759,11 +759,18 @@ PhysReg get_reg_create_vector(ra_ctx& ctx,
/* count variables to be moved and check war_hint */
bool war_hint = false;
for (unsigned j = reg_lo; j <= reg_hi; j++) {
if (reg_file[j] != 0)
bool linear_vgpr = false;
for (unsigned j = reg_lo; j <= reg_hi && !linear_vgpr; j++) {
if (reg_file[j] != 0) {
k++;
/* we cannot split live ranges of linear vgprs */
if (ctx.assignments[reg_file[j]].second & (1 << 6))
linear_vgpr = true;
}
war_hint |= ctx.war_hint[j];
}
if (linear_vgpr || (war_hint && !best_war_hint))
continue;
/* count operands in wrong positions */
for (unsigned j = 0, offset = 0; j < instr->operands.size(); offset += instr->operands[j].size(), j++) {
@@ -775,7 +782,7 @@ PhysReg get_reg_create_vector(ra_ctx& ctx,
k += instr->operands[j].size();
}
bool aligned = rc == RegClass::v4 && reg_lo % 4 == 0;
if (k > num_moves || (!aligned && k == num_moves) || (war_hint && !best_war_hint))
if (k > num_moves || (!aligned && k == num_moves))
continue;
best_pos = reg_lo;
@@ -961,7 +968,7 @@ void register_allocation(Program *program, std::vector<std::set<Temp>> live_out_
handle_live_in = [&](Temp val, Block *block) -> Temp {
std::vector<unsigned>& preds = val.is_linear() ? block->linear_preds : block->logical_preds;
if (preds.size() == 0 && block->index != 0) {
if (preds.size() == 0 || val.regClass() == val.regClass().as_linear()) {
renames[block->index][val.id()] = val;
return val;
}

View File

@@ -3953,8 +3953,43 @@ ac_build_scan(struct ac_llvm_context *ctx, nir_op op, LLVMValueRef src, LLVMValu
{
LLVMValueRef result, tmp;
if (ctx->chip_class >= GFX10) {
result = inclusive ? src : identity;
if (inclusive) {
result = src;
} else if (ctx->chip_class >= GFX10) {
/* wavefront shift_right by 1 on GFX10 (emulate dpp_wf_sr1) */
LLVMValueRef active, tmp1, tmp2;
LLVMValueRef tid = ac_get_thread_id(ctx);
tmp1 = ac_build_dpp(ctx, identity, src, dpp_row_sr(1), 0xf, 0xf, false);
tmp2 = ac_build_permlane16(ctx, src, (uint64_t)~0, true, false);
if (maxprefix > 32) {
active = LLVMBuildICmp(ctx->builder, LLVMIntEQ, tid,
LLVMConstInt(ctx->i32, 32, false), "");
tmp2 = LLVMBuildSelect(ctx->builder, active,
ac_build_readlane(ctx, src,
LLVMConstInt(ctx->i32, 31, false)),
tmp2, "");
active = LLVMBuildOr(ctx->builder, active,
LLVMBuildICmp(ctx->builder, LLVMIntEQ,
LLVMBuildAnd(ctx->builder, tid,
LLVMConstInt(ctx->i32, 0x1f, false), ""),
LLVMConstInt(ctx->i32, 0x10, false), ""), "");
src = LLVMBuildSelect(ctx->builder, active, tmp2, tmp1, "");
} else if (maxprefix > 16) {
active = LLVMBuildICmp(ctx->builder, LLVMIntEQ, tid,
LLVMConstInt(ctx->i32, 16, false), "");
src = LLVMBuildSelect(ctx->builder, active, tmp2, tmp1, "");
}
result = src;
} else if (ctx->chip_class >= GFX8) {
src = ac_build_dpp(ctx, identity, src, dpp_wf_sr1, 0xf, 0xf, false);
result = src;
} else {
if (!inclusive)
src = ac_build_dpp(ctx, identity, src, dpp_wf_sr1, 0xf, 0xf, false);
@@ -3984,33 +4019,31 @@ ac_build_scan(struct ac_llvm_context *ctx, nir_op op, LLVMValueRef src, LLVMValu
return result;
if (ctx->chip_class >= GFX10) {
/* dpp_row_bcast{15,31} are not supported on gfx10. */
LLVMBuilderRef builder = ctx->builder;
LLVMValueRef tid = ac_get_thread_id(ctx);
LLVMValueRef cc;
/* TODO-GFX10: Can we get better code-gen by putting this into
* a branch so that LLVM generates EXEC mask manipulations? */
if (inclusive)
tmp = result;
else
tmp = ac_build_alu_op(ctx, result, src, op);
tmp = ac_build_permlane16(ctx, tmp, ~(uint64_t)0, true, false);
tmp = ac_build_alu_op(ctx, result, tmp, op);
cc = LLVMBuildAnd(builder, tid, LLVMConstInt(ctx->i32, 16, false), "");
cc = LLVMBuildICmp(builder, LLVMIntNE, cc, ctx->i32_0, "");
result = LLVMBuildSelect(builder, cc, tmp, result, "");
LLVMValueRef active;
tmp = ac_build_permlane16(ctx, result, ~(uint64_t)0, true, false);
active = LLVMBuildICmp(ctx->builder, LLVMIntNE,
LLVMBuildAnd(ctx->builder, tid,
LLVMConstInt(ctx->i32, 16, false), ""),
ctx->i32_0, "");
tmp = LLVMBuildSelect(ctx->builder, active, tmp, identity, "");
result = ac_build_alu_op(ctx, result, tmp, op);
if (maxprefix <= 32)
return result;
if (inclusive)
tmp = result;
else
tmp = ac_build_alu_op(ctx, result, src, op);
tmp = ac_build_readlane(ctx, tmp, LLVMConstInt(ctx->i32, 31, false));
tmp = ac_build_alu_op(ctx, result, tmp, op);
cc = LLVMBuildICmp(builder, LLVMIntUGE, tid,
LLVMConstInt(ctx->i32, 32, false), "");
result = LLVMBuildSelect(builder, cc, tmp, result, "");
tmp = ac_build_readlane(ctx, result, LLVMConstInt(ctx->i32, 31, false));
active = LLVMBuildICmp(ctx->builder, LLVMIntUGE, tid,
LLVMConstInt(ctx->i32, 32, false), "");
tmp = LLVMBuildSelect(ctx->builder, active, tmp, identity, "");
result = ac_build_alu_op(ctx, result, tmp, op);
return result;
}

View File

@@ -3713,11 +3713,21 @@ static void visit_intrinsic(struct ac_nir_context *ctx,
break;
}
case nir_intrinsic_load_constant: {
unsigned base = nir_intrinsic_base(instr);
unsigned range = nir_intrinsic_range(instr);
LLVMValueRef offset = get_src(ctx, instr->src[0]);
LLVMValueRef base = LLVMConstInt(ctx->ac.i32,
nir_intrinsic_base(instr),
false);
offset = LLVMBuildAdd(ctx->ac.builder, offset, base, "");
offset = LLVMBuildAdd(ctx->ac.builder, offset,
LLVMConstInt(ctx->ac.i32, base, false), "");
/* Clamp the offset to avoid out-of-bound access because global
* instructions can't handle them.
*/
LLVMValueRef size = LLVMConstInt(ctx->ac.i32, base + range, false);
LLVMValueRef cond = LLVMBuildICmp(ctx->ac.builder, LLVMIntULT,
offset, size, "");
offset = LLVMBuildSelect(ctx->ac.builder, cond, offset, size, "");
LLVMValueRef ptr = ac_build_gep0(&ctx->ac, ctx->constant_data,
offset);
LLVMTypeRef comp_type =

View File

@@ -392,8 +392,8 @@ vk_format_from_android(unsigned android_format, unsigned android_usage)
{
switch (android_format) {
case AHARDWAREBUFFER_FORMAT_R8G8B8A8_UNORM:
return VK_FORMAT_R8G8B8A8_UNORM;
case AHARDWAREBUFFER_FORMAT_R8G8B8X8_UNORM:
return VK_FORMAT_R8G8B8A8_UNORM;
case AHARDWAREBUFFER_FORMAT_R8G8B8_UNORM:
return VK_FORMAT_R8G8B8_UNORM;
case AHARDWAREBUFFER_FORMAT_R5G6B5_UNORM:

View File

@@ -1097,6 +1097,24 @@ void radv_GetPhysicalDeviceFeatures2(
return radv_GetPhysicalDeviceFeatures(physicalDevice, &pFeatures->features);
}
static size_t
radv_max_descriptor_set_size()
{
/* make sure that the entire descriptor set is addressable with a signed
* 32-bit int. So the sum of all limits scaled by descriptor size has to
* be at most 2 GiB. the combined image & samples object count as one of
* both. This limit is for the pipeline layout, not for the set layout, but
* there is no set limit, so we just set a pipeline limit. I don't think
* any app is going to hit this soon. */
return ((1ull << 31) - 16 * MAX_DYNAMIC_BUFFERS
- MAX_INLINE_UNIFORM_BLOCK_SIZE * MAX_INLINE_UNIFORM_BLOCK_COUNT) /
(32 /* uniform buffer, 32 due to potential space wasted on alignment */ +
32 /* storage buffer, 32 due to potential space wasted on alignment */ +
32 /* sampler, largest when combined with image */ +
64 /* sampled image */ +
64 /* storage image */);
}
void radv_GetPhysicalDeviceProperties(
VkPhysicalDevice physicalDevice,
VkPhysicalDeviceProperties* pProperties)
@@ -1104,18 +1122,7 @@ void radv_GetPhysicalDeviceProperties(
RADV_FROM_HANDLE(radv_physical_device, pdevice, physicalDevice);
VkSampleCountFlags sample_counts = 0xf;
/* make sure that the entire descriptor set is addressable with a signed
* 32-bit int. So the sum of all limits scaled by descriptor size has to
* be at most 2 GiB. the combined image & samples object count as one of
* both. This limit is for the pipeline layout, not for the set layout, but
* there is no set limit, so we just set a pipeline limit. I don't think
* any app is going to hit this soon. */
size_t max_descriptor_set_size = ((1ull << 31) - 16 * MAX_DYNAMIC_BUFFERS) /
(32 /* uniform buffer, 32 due to potential space wasted on alignment */ +
32 /* storage buffer, 32 due to potential space wasted on alignment */ +
32 /* sampler, largest when combined with image */ +
64 /* sampled image */ +
64 /* storage image */);
size_t max_descriptor_set_size = radv_max_descriptor_set_size();
VkPhysicalDeviceLimits limits = {
.maxImageDimension1D = (1 << 14),
@@ -1394,13 +1401,7 @@ void radv_GetPhysicalDeviceProperties2(
properties->robustBufferAccessUpdateAfterBind = false;
properties->quadDivergentImplicitLod = false;
size_t max_descriptor_set_size = ((1ull << 31) - 16 * MAX_DYNAMIC_BUFFERS -
MAX_INLINE_UNIFORM_BLOCK_SIZE * MAX_INLINE_UNIFORM_BLOCK_COUNT) /
(32 /* uniform buffer, 32 due to potential space wasted on alignment */ +
32 /* storage buffer, 32 due to potential space wasted on alignment */ +
32 /* sampler, largest when combined with image */ +
64 /* sampled image */ +
64 /* storage image */);
size_t max_descriptor_set_size = radv_max_descriptor_set_size();
properties->maxPerStageDescriptorUpdateAfterBindSamplers = max_descriptor_set_size;
properties->maxPerStageDescriptorUpdateAfterBindUniformBuffers = max_descriptor_set_size;
properties->maxPerStageDescriptorUpdateAfterBindStorageBuffers = max_descriptor_set_size;
@@ -3855,8 +3856,7 @@ radv_finalize_timelines(struct radv_device *device,
pthread_mutex_lock(&wait_sems[i]->timeline.mutex);
struct radv_timeline_point *point =
radv_timeline_find_point_at_least_locked(device, &wait_sems[i]->timeline, wait_values[i]);
if (point)
--point->wait_count;
point->wait_count -= 2;
pthread_mutex_unlock(&wait_sems[i]->timeline.mutex);
}
}
@@ -3865,11 +3865,9 @@ radv_finalize_timelines(struct radv_device *device,
pthread_mutex_lock(&signal_sems[i]->timeline.mutex);
struct radv_timeline_point *point =
radv_timeline_find_point_at_least_locked(device, &signal_sems[i]->timeline, signal_values[i]);
if (point) {
signal_sems[i]->timeline.highest_submitted =
MAX2(signal_sems[i]->timeline.highest_submitted, point->value);
point->wait_count--;
}
signal_sems[i]->timeline.highest_submitted =
MAX2(signal_sems[i]->timeline.highest_submitted, point->value);
point->wait_count -= 2;
radv_timeline_trigger_waiters_locked(&signal_sems[i]->timeline, processing_list);
pthread_mutex_unlock(&signal_sems[i]->timeline.mutex);
}
@@ -5458,8 +5456,6 @@ radv_timeline_wait_locked(struct radv_device *device,
if (!point)
return VK_SUCCESS;
point->wait_count++;
pthread_mutex_unlock(&timeline->mutex);
bool success = device->ws->wait_syncobj(device->ws, &point->syncobj, 1, true, abs_timeout);

View File

@@ -1101,15 +1101,32 @@ radv_pipeline_init_multisample_state(struct radv_pipeline *pipeline,
int ps_iter_samples = 1;
uint32_t mask = 0xffff;
if (vkms)
if (vkms) {
ms->num_samples = vkms->rasterizationSamples;
else
ms->num_samples = 1;
if (vkms)
ps_iter_samples = radv_pipeline_get_ps_iter_samples(vkms);
if (vkms && !vkms->sampleShadingEnable && pipeline->shaders[MESA_SHADER_FRAGMENT]->info.ps.force_persample) {
ps_iter_samples = ms->num_samples;
/* From the Vulkan 1.1.129 spec, 26.7. Sample Shading:
*
* "Sample shading is enabled for a graphics pipeline:
*
* - If the interface of the fragment shader entry point of the
* graphics pipeline includes an input variable decorated
* with SampleId or SamplePosition. In this case
* minSampleShadingFactor takes the value 1.0.
* - Else if the sampleShadingEnable member of the
* VkPipelineMultisampleStateCreateInfo structure specified
* when creating the graphics pipeline is set to VK_TRUE. In
* this case minSampleShadingFactor takes the value of
* VkPipelineMultisampleStateCreateInfo::minSampleShading.
*
* Otherwise, sample shading is considered disabled."
*/
if (pipeline->shaders[MESA_SHADER_FRAGMENT]->info.ps.force_persample) {
ps_iter_samples = ms->num_samples;
} else {
ps_iter_samples = radv_pipeline_get_ps_iter_samples(vkms);
}
} else {
ms->num_samples = 1;
}
const struct VkPipelineRasterizationStateRasterizationOrderAMD *raster_order =

View File

@@ -151,6 +151,15 @@ set_output_usage_mask(const nir_shader *nir, const nir_intrinsic_instr *instr,
((wrmask >> (i * 4)) & 0xf) << comp;
}
static void
set_writes_memory(const nir_shader *nir, struct radv_shader_info *info)
{
if (nir->info.stage == MESA_SHADER_FRAGMENT)
info->ps.writes_memory = true;
else if (nir->info.stage == MESA_SHADER_GEOMETRY)
info->gs.writes_memory = true;
}
static void
gather_intrinsic_store_deref_info(const nir_shader *nir,
const nir_intrinsic_instr *instr,
@@ -308,10 +317,7 @@ gather_intrinsic_info(const nir_shader *nir, const nir_intrinsic_instr *instr,
instr->intrinsic == nir_intrinsic_image_deref_atomic_xor ||
instr->intrinsic == nir_intrinsic_image_deref_atomic_exchange ||
instr->intrinsic == nir_intrinsic_image_deref_atomic_comp_swap) {
if (nir->info.stage == MESA_SHADER_FRAGMENT)
info->ps.writes_memory = true;
else if (nir->info.stage == MESA_SHADER_GEOMETRY)
info->gs.writes_memory = true;
set_writes_memory(nir, info);
}
break;
}
@@ -326,17 +332,28 @@ gather_intrinsic_info(const nir_shader *nir, const nir_intrinsic_instr *instr,
case nir_intrinsic_ssbo_atomic_xor:
case nir_intrinsic_ssbo_atomic_exchange:
case nir_intrinsic_ssbo_atomic_comp_swap:
if (nir->info.stage == MESA_SHADER_FRAGMENT)
info->ps.writes_memory = true;
else if (nir->info.stage == MESA_SHADER_GEOMETRY)
info->gs.writes_memory = true;
set_writes_memory(nir, info);
break;
case nir_intrinsic_load_deref:
gather_intrinsic_load_deref_info(nir, instr, info);
break;
case nir_intrinsic_store_deref:
gather_intrinsic_store_deref_info(nir, instr, info);
/* fallthrough */
case nir_intrinsic_deref_atomic_add:
case nir_intrinsic_deref_atomic_imin:
case nir_intrinsic_deref_atomic_umin:
case nir_intrinsic_deref_atomic_imax:
case nir_intrinsic_deref_atomic_umax:
case nir_intrinsic_deref_atomic_and:
case nir_intrinsic_deref_atomic_or:
case nir_intrinsic_deref_atomic_xor:
case nir_intrinsic_deref_atomic_exchange:
case nir_intrinsic_deref_atomic_comp_swap: {
if (nir_src_as_deref(instr->src[0])->mode & (nir_var_mem_global | nir_var_mem_ssbo))
set_writes_memory(nir, info);
break;
}
default:
break;
}

View File

@@ -58,6 +58,6 @@ libbroadcom_cle = static_library(
'v3d_decoder.c',
include_directories : [inc_common, inc_broadcom],
c_args : [c_vis_args, no_override_init_args],
dependencies : [dep_libdrm, dep_valgrind],
dependencies : [dep_libdrm, dep_valgrind, dep_expat, dep_zlib],
build_by_default : false,
)

View File

@@ -1435,6 +1435,9 @@ builtin_variable_generator::add_varying(int slot, const glsl_type *type,
void
builtin_variable_generator::generate_varyings()
{
struct gl_shader_compiler_options *options =
&state->ctx->Const.ShaderCompilerOptions[state->stage];
/* gl_Position and gl_PointSize are not visible from fragment shaders. */
if (state->stage != MESA_SHADER_FRAGMENT) {
add_varying(VARYING_SLOT_POS, vec4_t, GLSL_PRECISION_HIGH, "gl_Position");
@@ -1526,6 +1529,9 @@ builtin_variable_generator::generate_varyings()
var->data.sample = fields[i].sample;
var->data.patch = fields[i].patch;
var->init_interface_type(per_vertex_out_type);
var->data.invariant = fields[i].location == VARYING_SLOT_POS &&
options->PositionAlwaysInvariant;
}
}
}

View File

@@ -34,32 +34,11 @@
*/
static bool
add_interface_variables(const struct gl_context *cts,
struct gl_shader_program *prog,
struct set *resource_set,
unsigned stage, GLenum programInterface)
add_vars_from_list(const struct gl_context *ctx,
struct gl_shader_program *prog, struct set *resource_set,
const struct exec_list *var_list, unsigned stage,
GLenum programInterface)
{
const struct exec_list *var_list = NULL;
struct gl_linked_shader *sh = prog->_LinkedShaders[stage];
if (!sh)
return true;
nir_shader *nir = sh->Program->nir;
assert(nir);
switch (programInterface) {
case GL_PROGRAM_INPUT:
var_list = &nir->inputs;
break;
case GL_PROGRAM_OUTPUT:
var_list = &nir->outputs;
break;
default:
assert("!Should not get here");
break;
}
nir_foreach_variable(var, var_list) {
if (var->data.how_declared == nir_var_hidden)
continue;
@@ -108,6 +87,38 @@ add_interface_variables(const struct gl_context *cts,
return true;
}
static bool
add_interface_variables(const struct gl_context *ctx,
struct gl_shader_program *prog,
struct set *resource_set,
unsigned stage, GLenum programInterface)
{
struct gl_linked_shader *sh = prog->_LinkedShaders[stage];
if (!sh)
return true;
nir_shader *nir = sh->Program->nir;
assert(nir);
switch (programInterface) {
case GL_PROGRAM_INPUT: {
bool result = add_vars_from_list(ctx, prog, resource_set,
&nir->inputs, stage, programInterface);
result &= add_vars_from_list(ctx, prog, resource_set, &nir->system_values,
stage, programInterface);
return result;
}
case GL_PROGRAM_OUTPUT:
return add_vars_from_list(ctx, prog, resource_set, &nir->outputs, stage,
programInterface);
default:
assert("!Should not get here");
break;
}
return false;
}
/* TODO: as we keep adding features, this method is becoming more and more
* similar to its GLSL counterpart at linker.cpp. Eventually it would be good
* to check if they could be refactored, and reduce code duplication somehow

View File

@@ -316,6 +316,17 @@ nir_lower_clip_vs(nir_shader *shader, unsigned ucp_enables, bool use_vars,
if (!ucp_enables)
return false;
/* find clipvertex/position outputs: */
nir_foreach_variable(var, &shader->outputs) {
int loc = var->data.driver_location;
/* keep track of last used driver-location.. we'll be
* appending CLIP_DIST0/CLIP_DIST1 after last existing
* output:
*/
maxloc = MAX2(maxloc, loc);
}
nir_builder_init(&b, impl);
/* NIR should ensure that, even in case of loops/if-else, there

View File

@@ -184,7 +184,10 @@ get_flat_type(const nir_shader *shader, nir_variable *old_vars[MAX_SLOTS][4],
if (num_vars <= 1)
return NULL;
return glsl_array_type(glsl_vector_type(base, 4), slots, 0);
if (slots == 1)
return glsl_vector_type(base, 4);
else
return glsl_array_type(glsl_vector_type(base, 4), slots, 0);
}
static bool
@@ -340,6 +343,9 @@ build_array_deref_of_new_var_flat(nir_shader *shader,
deref = nir_build_deref_array(b, deref, index);
}
if (!glsl_type_is_array(deref->type))
return deref;
bool vs_in = shader->info.stage == MESA_SHADER_VERTEX &&
new_var->data.mode == nir_var_shader_in;
return nir_build_deref_array(

View File

@@ -100,8 +100,6 @@ def generateHeader(functions):
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <EGL/eglmesaext.h>
#include <EGL/eglextchromium.h>
#include "glvnd/libeglabi.h"
""".lstrip("\n"))

View File

@@ -33,8 +33,6 @@
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <EGL/eglmesaext.h>
#include <EGL/eglextchromium.h>
#ifdef __cplusplus
extern "C" {

View File

@@ -77,3 +77,14 @@ LOCAL_GENERATED_SOURCES += $(MESA_GEN_NIR_H)
include $(GALLIUM_COMMON_MK)
include $(BUILD_STATIC_LIBRARY)
# Build libmesa_galliumvl used by radeonsi
include $(CLEAR_VARS)
LOCAL_SRC_FILES := \
$(VL_SOURCES)
LOCAL_MODULE := libmesa_galliumvl
include $(GALLIUM_COMMON_MK)
include $(BUILD_STATIC_LIBRARY)

View File

@@ -38,6 +38,7 @@ DRI_CONF_SECTION_END
DRI_CONF_SECTION_MISCELLANEOUS
DRI_CONF_ALWAYS_HAVE_DEPTH_BUFFER("false")
DRI_CONF_GLSL_ZERO_INIT("false")
DRI_CONF_VS_POSITION_ALWAYS_INVARIANT("false")
DRI_CONF_ALLOW_RGB10_CONFIGS("true")
DRI_CONF_ALLOW_FP16_CONFIGS("false")
DRI_CONF_SECTION_END

View File

@@ -338,7 +338,14 @@ u_stream_outputs_for_vertices(enum pipe_prim_type primitive, unsigned nr)
/* Extraneous vertices don't contribute to stream outputs */
u_trim_pipe_prim(primitive, &nr);
/* Consider how many primitives are actually generated */
/* Polygons are special, since they are a single primitive with many
* vertices. In this case, we just have an output for each vertex (after
* trimming) */
if (primitive == PIPE_PRIM_POLYGON)
return nr;
/* Normally, consider how many primitives are actually generated */
unsigned prims = u_decomposed_prims_for_vertices(primitive, nr);
/* One output per vertex after decomposition */

View File

@@ -498,7 +498,6 @@ etna_resource_from_handle(struct pipe_screen *pscreen,
struct etna_resource *rsc;
struct etna_resource_level *level;
struct pipe_resource *prsc;
struct pipe_resource *ptiled = NULL;
DBG("target=%d, format=%s, %ux%ux%u, array_size=%u, last_level=%u, "
"nr_samples=%u, usage=%u, bind=%x, flags=%x",
@@ -572,8 +571,6 @@ etna_resource_from_handle(struct pipe_screen *pscreen,
fail:
etna_resource_destroy(pscreen, prsc);
if (ptiled)
etna_resource_destroy(pscreen, ptiled);
return NULL;
}

View File

@@ -60,9 +60,9 @@ fd4_screen_is_format_supported(struct pipe_screen *pscreen,
}
if ((usage & PIPE_BIND_SAMPLER_VIEW) &&
(fd4_pipe2tex(format) != (enum a4xx_tex_fmt)~0) &&
(target == PIPE_BUFFER ||
util_format_get_blocksize(format) != 12) &&
(fd4_pipe2tex(format) != (enum a4xx_tex_fmt)~0)) {
util_format_get_blocksize(format) != 12)) {
retval |= PIPE_BIND_SAMPLER_VIEW;
}


@@ -76,9 +76,9 @@ fd5_screen_is_format_supported(struct pipe_screen *pscreen,
}
if ((usage & (PIPE_BIND_SAMPLER_VIEW | PIPE_BIND_SHADER_IMAGE)) &&
(fd5_pipe2tex(format) != (enum a5xx_tex_fmt)~0) &&
(target == PIPE_BUFFER ||
util_format_get_blocksize(format) != 12) &&
(fd5_pipe2tex(format) != (enum a5xx_tex_fmt)~0)) {
util_format_get_blocksize(format) != 12)) {
retval |= usage & (PIPE_BIND_SAMPLER_VIEW | PIPE_BIND_SHADER_IMAGE);
}


@@ -82,9 +82,9 @@ fd6_screen_is_format_supported(struct pipe_screen *pscreen,
}
if ((usage & (PIPE_BIND_SAMPLER_VIEW | PIPE_BIND_SHADER_IMAGE)) &&
(fd6_pipe2tex(format) != (enum a6xx_tex_fmt)~0) &&
(target == PIPE_BUFFER ||
util_format_get_blocksize(format) != 12) &&
(fd6_pipe2tex(format) != (enum a6xx_tex_fmt)~0)) {
util_format_get_blocksize(format) != 12)) {
retval |= usage & (PIPE_BIND_SAMPLER_VIEW | PIPE_BIND_SHADER_IMAGE);
}


@@ -1294,7 +1294,8 @@ iris_bo_get_tiling(struct iris_bo *bo, uint32_t *tiling_mode,
}
struct iris_bo *
iris_bo_import_dmabuf(struct iris_bufmgr *bufmgr, int prime_fd)
iris_bo_import_dmabuf(struct iris_bufmgr *bufmgr, int prime_fd,
uint32_t tiling, uint32_t stride)
{
uint32_t handle;
struct iris_bo *bo;
@@ -1345,9 +1346,15 @@ iris_bo_import_dmabuf(struct iris_bufmgr *bufmgr, int prime_fd)
if (gen_ioctl(bufmgr->fd, DRM_IOCTL_I915_GEM_GET_TILING, &get_tiling))
goto err;
bo->tiling_mode = get_tiling.tiling_mode;
bo->swizzle_mode = get_tiling.swizzle_mode;
/* XXX stride is unknown */
if (get_tiling.tiling_mode == tiling || tiling > I915_TILING_LAST) {
bo->tiling_mode = get_tiling.tiling_mode;
bo->swizzle_mode = get_tiling.swizzle_mode;
/* XXX stride is unknown */
} else {
if (bo_set_tiling_internal(bo, tiling, stride)) {
goto err;
}
}
out:
mtx_unlock(&bufmgr->lock);
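To make the new parameters concrete, here is a sketch of the two call patterns implied by this hunk and by the iris_resource_from_handle hunk further down (drawn from the diff itself, not from separate documentation):

/* Modifier known, e.g. I915_FORMAT_MOD_Y_TILED:
 *   iris_bo_import_dmabuf(bufmgr, prime_fd, I915_TILING_Y, stride);
 *   if GET_TILING already reports Y tiling nothing changes; otherwise the
 *   requested tiling and stride are applied via bo_set_tiling_internal().
 * Modifier unknown (DRM_FORMAT_MOD_INVALID):
 *   iris_bo_import_dmabuf(bufmgr, prime_fd, I915_TILING_LAST + 1, stride);
 *   any tiling value above I915_TILING_LAST means "keep whatever the kernel
 *   reports", preserving the old behaviour.
 */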
@@ -1660,6 +1667,7 @@ iris_bufmgr_init(struct gen_device_info *devinfo, int fd, bool bo_reuse)
STATIC_ASSERT(IRIS_MEMZONE_SHADER_START == 0ull);
const uint64_t _4GB = 1ull << 32;
const uint64_t _2GB = 1ul << 31;
/* The STATE_BASE_ADDRESS size field can only hold 1 page shy of 4GB */
const uint64_t _4GB_minus_1 = _4GB - PAGE_SIZE;
@@ -1669,9 +1677,16 @@ iris_bufmgr_init(struct gen_device_info *devinfo, int fd, bool bo_reuse)
util_vma_heap_init(&bufmgr->vma_allocator[IRIS_MEMZONE_SURFACE],
IRIS_MEMZONE_SURFACE_START,
_4GB_minus_1 - IRIS_MAX_BINDERS * IRIS_BINDER_SIZE);
/* TODO: Why does limiting to 2GB help some state items on gen12?
* - CC Viewport Pointer
* - Blend State Pointer
* - Color Calc State Pointer
*/
const uint64_t dynamic_pool_size =
(devinfo->gen >= 12 ? _2GB : _4GB_minus_1) - IRIS_BORDER_COLOR_POOL_SIZE;
util_vma_heap_init(&bufmgr->vma_allocator[IRIS_MEMZONE_DYNAMIC],
IRIS_MEMZONE_DYNAMIC_START + IRIS_BORDER_COLOR_POOL_SIZE,
_4GB_minus_1 - IRIS_BORDER_COLOR_POOL_SIZE);
dynamic_pool_size);
/* Leave the last 4GB out of the high vma range, so that no state
* base address + size can overflow 48 bits.


@@ -352,7 +352,8 @@ int iris_hw_context_set_priority(struct iris_bufmgr *bufmgr,
void iris_destroy_hw_context(struct iris_bufmgr *bufmgr, uint32_t ctx_id);
int iris_bo_export_dmabuf(struct iris_bo *bo, int *prime_fd);
struct iris_bo *iris_bo_import_dmabuf(struct iris_bufmgr *bufmgr, int prime_fd);
struct iris_bo *iris_bo_import_dmabuf(struct iris_bufmgr *bufmgr, int prime_fd,
uint32_t tiling, uint32_t stride);
uint32_t iris_bo_export_gem_handle(struct iris_bo *bo);


@@ -960,12 +960,21 @@ iris_resource_from_handle(struct pipe_screen *pscreen,
struct gen_device_info *devinfo = &screen->devinfo;
struct iris_bufmgr *bufmgr = screen->bufmgr;
struct iris_resource *res = iris_alloc_resource(pscreen, templ);
const struct isl_drm_modifier_info *mod_inf =
isl_drm_modifier_get_info(whandle->modifier);
uint32_t tiling;
if (!res)
return NULL;
switch (whandle->type) {
case WINSYS_HANDLE_TYPE_FD:
res->bo = iris_bo_import_dmabuf(bufmgr, whandle->handle);
if (mod_inf)
tiling = isl_tiling_to_i915_tiling(mod_inf->tiling);
else
tiling = I915_TILING_LAST + 1;
res->bo = iris_bo_import_dmabuf(bufmgr, whandle->handle,
tiling, whandle->stride);
break;
case WINSYS_HANDLE_TYPE_SHARED:
res->bo = iris_bo_gem_create_from_name(bufmgr, "winsys image",
@@ -979,12 +988,14 @@ iris_resource_from_handle(struct pipe_screen *pscreen,
res->offset = whandle->offset;
uint64_t modifier = whandle->modifier;
if (modifier == DRM_FORMAT_MOD_INVALID) {
modifier = tiling_to_modifier(res->bo->tiling_mode);
if (mod_inf == NULL) {
mod_inf =
isl_drm_modifier_get_info(tiling_to_modifier(res->bo->tiling_mode));
}
res->mod_info = isl_drm_modifier_get_info(modifier);
assert(res->mod_info);
assert(mod_inf);
res->external_format = whandle->format;
res->mod_info = mod_inf;
isl_surf_usage_flags_t isl_usage = pipe_bind_to_isl_usage(templ->bind);
@@ -995,7 +1006,8 @@ iris_resource_from_handle(struct pipe_screen *pscreen,
if (templ->target == PIPE_BUFFER) {
res->surf.tiling = ISL_TILING_LINEAR;
} else {
if (whandle->modifier == DRM_FORMAT_MOD_INVALID || whandle->plane == 0) {
/* Create a surface for each plane specified by the external format. */
if (whandle->plane < util_format_get_num_planes(whandle->format)) {
UNUSED const bool isl_surf_created_successfully =
isl_surf_init(&screen->isl_dev, &res->surf,
.dim = target_to_isl_surf_dim(templ->target),
@@ -1173,6 +1185,8 @@ iris_resource_get_handle(struct pipe_screen *pscreen,
whandle->stride = res->surf.row_pitch_B;
bo = res->bo;
}
whandle->format = res->external_format;
whandle->modifier =
res->mod_info ? res->mod_info->modifier
: tiling_to_modifier(res->bo->tiling_mode);


@@ -162,6 +162,13 @@ struct iris_resource {
uint16_t has_hiz;
} aux;
/**
* For external surfaces, this is the format that was used to create or import
* the surface. For internal surfaces, this will always be
* PIPE_FORMAT_NONE.
*/
enum pipe_format external_format;
/**
* For external surfaces, this is the DRM format modifier that was used to
* create or import the surface. For internal surfaces, this will always


@@ -933,6 +933,25 @@ panfrost_batch_submit(struct panfrost_batch *batch)
if (ret)
fprintf(stderr, "panfrost_batch_submit failed: %d\n", ret);
/* We must reset the damage info of our render targets here even
* though a damage reset normally happens when the DRI layer swaps
* buffers. That's because there can be implicit flushes the GL
* app is not aware of, and those might impact the damage region: if
* part of the damaged portion is drawn during those implicit flushes,
* you have to reload those areas before the next draws are pushed, and
* since the driver can't easily know what's been modified by the draws
* it flushed, the easiest solution is to reload everything.
*/
for (unsigned i = 0; i < batch->key.nr_cbufs; i++) {
struct panfrost_resource *res;
if (!batch->key.cbufs[i])
continue;
res = pan_resource(batch->key.cbufs[i]->texture);
panfrost_resource_reset_damage(res);
}
out:
panfrost_freeze_batch(batch);
panfrost_free_batch(batch);


@@ -48,7 +48,7 @@
#include "pan_util.h"
#include "pan_tiling.h"
static void
void
panfrost_resource_reset_damage(struct panfrost_resource *pres)
{
/* We set the damage extent to the full resource size but keep the


@@ -135,6 +135,9 @@ void
panfrost_blit_wallpaper(struct panfrost_context *ctx,
struct pipe_box *box);
void
panfrost_resource_reset_damage(struct panfrost_resource *pres);
void
panfrost_resource_set_damage_region(struct pipe_screen *screen,
struct pipe_resource *res,


@@ -40,7 +40,9 @@ LOCAL_C_INCLUDES := \
$(call generated-sources-dir-for,STATIC_LIBRARIES,libmesa_amd_common,,)/common \
$(call generated-sources-dir-for,STATIC_LIBRARIES,libmesa_nir,,)/nir
LOCAL_STATIC_LIBRARIES := libmesa_amd_common
LOCAL_STATIC_LIBRARIES := \
libmesa_amd_common \
libmesa_galliumvl
LOCAL_SHARED_LIBRARIES := libdrm_radeon
LOCAL_MODULE := libmesa_pipe_radeonsi


@@ -199,7 +199,8 @@ static unsigned si_texture_get_offset(struct si_screen *sscreen,
/* Each texture is an array of slices. Each slice is an array
* of mipmap levels. */
return box->z * tex->surface.u.gfx9.surf_slice_size +
return tex->surface.u.gfx9.surf_offset +
box->z * tex->surface.u.gfx9.surf_slice_size +
tex->surface.u.gfx9.offset[level] +
(box->y / tex->surface.blk_h *
tex->surface.u.gfx9.surf_pitch +
@@ -1721,10 +1722,12 @@ struct pipe_resource *si_texture_create(struct pipe_screen *screen,
tex->plane_index = i;
tex->num_planes = num_planes;
if (!last_plane)
if (!plane0) {
plane0 = last_plane = tex;
else
} else {
last_plane->buffer.b.b.next = &tex->buffer.b.b;
last_plane = tex;
}
}
return (struct pipe_resource *)plane0;
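To see why adding gfx9.surf_offset matters, consider a worked example with invented numbers (purely illustrative): for a plane whose surface starts at surf_offset = 0x20000 within a shared planar allocation, with blk_w = blk_h = 1, bpe = 2, surf_pitch = 1024, level 0 at offset 0 and box = (x=4, y=2, z=0), the mapped offset is 0x20000 + 0 + 0 + (2 * 1024 + 4) * 2 = 0x21008; before the fix the leading 0x20000 was missing, so transfers addressed the wrong region of the buffer.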


@@ -228,6 +228,7 @@ struct st_config_options
bool allow_glsl_builtin_variable_redeclaration;
bool allow_higher_compat_version;
bool glsl_zero_init;
bool vs_position_always_invariant;
bool force_glsl_abs_sqrt;
bool allow_glsl_cross_stage_interpolation_mismatch;
bool allow_glsl_layout_qualifier_on_function_parameters;


@@ -49,6 +49,12 @@ struct winsys_handle
*/
unsigned offset;
/**
* Input to resource_from_handle.
* Output from resource_get_handle.
*/
uint64_t format;
/**
* Input to resource_from_handle.
* Output from resource_get_handle.


@@ -547,6 +547,7 @@ dri2_allocate_textures(struct dri_context *ctx,
whandle.handle = buf->name;
whandle.stride = buf->pitch;
whandle.offset = 0;
whandle.format = format;
whandle.modifier = DRM_FORMAT_MOD_INVALID;
if (screen->can_share_buffer)
whandle.type = WINSYS_HANDLE_TYPE_SHARED;
@@ -777,18 +778,12 @@ dri2_create_image_from_winsys(__DRIscreen *_screen,
for (i = num_handles - 1; i >= 0; i--) {
struct pipe_resource *tex;
if (whandle[i].modifier == DRM_FORMAT_MOD_INVALID) {
templ.width0 = width >> map->planes[i].width_shift;
templ.height0 = height >> map->planes[i].height_shift;
if (is_yuv)
templ.format = dri2_get_pipe_format_for_dri_format(map->planes[i].dri_format);
else
templ.format = map->pipe_format;
} else {
templ.width0 = width;
templ.height0 = height;
templ.width0 = width >> map->planes[i].width_shift;
templ.height0 = height >> map->planes[i].height_shift;
if (is_yuv)
templ.format = dri2_get_pipe_format_for_dri_format(map->planes[i].dri_format);
else
templ.format = map->pipe_format;
}
assert(templ.format != PIPE_FORMAT_NONE);
tex = pscreen->resource_from_handle(pscreen,
@@ -826,6 +821,7 @@ dri2_create_image_from_name(__DRIscreen *_screen,
memset(&whandle, 0, sizeof(whandle));
whandle.type = WINSYS_HANDLE_TYPE_SHARED;
whandle.handle = name;
whandle.format = map->pipe_format;
whandle.modifier = DRM_FORMAT_MOD_INVALID;
whandle.stride = pitch * util_format_get_blocksize(map->pipe_format);
@@ -844,8 +840,13 @@ dri2_create_image_from_name(__DRIscreen *_screen,
}
static unsigned
dri2_get_modifier_num_planes(uint64_t modifier)
dri2_get_modifier_num_planes(uint64_t modifier, int fourcc)
{
const struct dri2_format_mapping *map = dri2_get_mapping_by_fourcc(fourcc);
if (!map)
return 0;
switch (modifier) {
case I915_FORMAT_MOD_Y_TILED_CCS:
return 2;
@@ -867,8 +868,8 @@ dri2_get_modifier_num_planes(uint64_t modifier)
/* FD_FORMAT_MOD_QCOM_TILED is not in drm_fourcc.h */
case I915_FORMAT_MOD_X_TILED:
case I915_FORMAT_MOD_Y_TILED:
return 1;
case DRM_FORMAT_MOD_INVALID:
return map->nplanes;
default:
return 0;
}
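For reference, how the reworked helper behaves for a few inputs (a sketch; the NV12 plane count is an assumption about the fourcc mapping table, the rest follows from the switch above):

/* dri2_get_modifier_num_planes(DRM_FORMAT_MOD_INVALID, DRM_FORMAT_NV12)          -> 2 (map->nplanes)
 * dri2_get_modifier_num_planes(I915_FORMAT_MOD_Y_TILED_CCS, DRM_FORMAT_XRGB8888) -> 2
 * dri2_get_modifier_num_planes(I915_FORMAT_MOD_X_TILED, DRM_FORMAT_XRGB8888)     -> 1
 * dri2_get_modifier_num_planes(DRM_FORMAT_MOD_INVALID, unknown fourcc)           -> 0 (no mapping)
 * dri2_get_modifier_num_planes(unrecognized modifier, DRM_FORMAT_XRGB8888)       -> 0
 */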
@@ -886,15 +887,13 @@ dri2_create_image_from_fd(__DRIscreen *_screen,
__DRIimage *img = NULL;
unsigned err = __DRI_IMAGE_ERROR_SUCCESS;
int i, expected_num_fds;
uint64_t mod_planes = dri2_get_modifier_num_planes(modifier);
int num_handles = dri2_get_modifier_num_planes(modifier, fourcc);
if (!map || (modifier != DRM_FORMAT_MOD_INVALID && mod_planes == 0)) {
if (!map || num_handles == 0) {
err = __DRI_IMAGE_ERROR_BAD_MATCH;
goto exit;
}
int num_handles = mod_planes > 0 ? mod_planes : map->nplanes;
switch (fourcc) {
case DRM_FORMAT_YUYV:
case DRM_FORMAT_UYVY:
@@ -914,7 +913,7 @@ dri2_create_image_from_fd(__DRIscreen *_screen,
for (i = 0; i < num_handles; i++) {
int fdnum = i >= num_fds ? 0 : i;
int index = mod_planes > 0 ? i : map->planes[i].buffer_index;
int index = i >= map->nplanes ? i : map->planes[i].buffer_index;
if (fds[fdnum] < 0) {
err = __DRI_IMAGE_ERROR_BAD_ALLOC;
goto exit;
@@ -924,6 +923,7 @@ dri2_create_image_from_fd(__DRIscreen *_screen,
whandles[i].handle = (unsigned)fds[fdnum];
whandles[i].stride = (unsigned)strides[index];
whandles[i].offset = (unsigned)offsets[index];
whandles[i].format = map->pipe_format;
whandles[i].modifier = modifier;
whandles[i].plane = index;
}
@@ -1314,6 +1314,7 @@ dri2_from_names(__DRIscreen *screen, int width, int height, int format,
whandle.handle = names[0];
whandle.stride = strides[0];
whandle.offset = offsets[0];
whandle.format = map->pipe_format;
whandle.modifier = DRM_FORMAT_MOD_INVALID;
img = dri2_create_image_from_winsys(screen, width, height, map,
@@ -1411,7 +1412,7 @@ dri2_query_dma_buf_format_modifier_attribs(__DRIscreen *_screen,
{
switch (attrib) {
case __DRI_IMAGE_FORMAT_MODIFIER_ATTRIB_PLANE_COUNT: {
uint64_t mod_planes = dri2_get_modifier_num_planes(modifier);
uint64_t mod_planes = dri2_get_modifier_num_planes(modifier, fourcc);
if (mod_planes > 0)
*value = mod_planes;
return mod_planes > 0;
@@ -1879,8 +1880,6 @@ static void
dri2_set_damage_region(__DRIdrawable *dPriv, unsigned int nrects, int *rects)
{
struct dri_drawable *drawable = dri_drawable(dPriv);
struct pipe_resource *resource = drawable->textures[ST_ATTACHMENT_BACK_LEFT];
struct pipe_screen *screen = resource->screen;
struct pipe_box *boxes = NULL;
if (nrects) {
@@ -1894,8 +1893,25 @@ dri2_set_damage_region(__DRIdrawable *dPriv, unsigned int nrects, int *rects)
}
}
screen->set_damage_region(screen, resource, nrects, boxes);
FREE(boxes);
FREE(drawable->damage_rects);
drawable->damage_rects = boxes;
drawable->num_damage_rects = nrects;
/* Only apply the damage region if the BACK_LEFT texture is up-to-date. */
if (drawable->texture_stamp == drawable->dPriv->lastStamp &&
(drawable->texture_mask & (1 << ST_ATTACHMENT_BACK_LEFT))) {
struct pipe_screen *screen = drawable->screen->base.screen;
struct pipe_resource *resource;
if (drawable->stvis.samples > 1)
resource = drawable->msaa_textures[ST_ATTACHMENT_BACK_LEFT];
else
resource = drawable->textures[ST_ATTACHMENT_BACK_LEFT];
screen->set_damage_region(screen, resource,
drawable->num_damage_rects,
drawable->damage_rects);
}
}
static __DRI2bufferDamageExtension dri2BufferDamageExtension = {


@@ -92,6 +92,18 @@ dri_st_framebuffer_validate(struct st_context_iface *stctx,
}
} while (lastStamp != drawable->dPriv->lastStamp);
/* Flush the pending set_damage_region request. */
struct pipe_screen *pscreen = screen->base.screen;
if (new_mask & (1 << ST_ATTACHMENT_BACK_LEFT) &&
pscreen->set_damage_region) {
struct pipe_resource *resource = textures[ST_ATTACHMENT_BACK_LEFT];
pscreen->set_damage_region(pscreen, resource,
drawable->num_damage_rects,
drawable->damage_rects);
}
if (!out)
return true;
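Taken together, the dri2_set_damage_region and dri_st_framebuffer_validate hunks describe a deferral; a rough sketch of the resulting flow, as read from this diff:

/* dri2_set_damage_region(dPriv, nrects, rects)
 *   -> stores the boxes in drawable->damage_rects / num_damage_rects and only
 *      forwards them to pipe_screen::set_damage_region immediately when the
 *      BACK_LEFT attachment is already up to date.
 * dri_st_framebuffer_validate(...)
 *   -> once a new BACK_LEFT texture is validated, replays the stored boxes via
 *      pscreen->set_damage_region(pscreen, resource, num_damage_rects, damage_rects).
 * dri_destroy_buffer(...)
 *   -> frees the stored boxes.
 */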
@@ -197,6 +209,7 @@ dri_destroy_buffer(__DRIdrawable * dPriv)
/* Notify the st manager that this drawable is no longer valid */
stapi->destroy_drawable(stapi, &drawable->base);
FREE(drawable->damage_rects);
FREE(drawable);
}


@@ -52,6 +52,9 @@ struct dri_drawable
unsigned old_w;
unsigned old_h;
struct pipe_box *damage_rects;
unsigned int num_damage_rects;
struct pipe_resource *textures[ST_ATTACHMENT_COUNT];
struct pipe_resource *msaa_textures[ST_ATTACHMENT_COUNT];
unsigned int texture_mask, texture_stamp;


@@ -84,6 +84,8 @@ dri_fill_st_options(struct dri_screen *screen)
options->allow_higher_compat_version =
driQueryOptionb(optionCache, "allow_higher_compat_version");
options->glsl_zero_init = driQueryOptionb(optionCache, "glsl_zero_init");
options->vs_position_always_invariant =
driQueryOptionb(optionCache, "vs_position_always_invariant");
options->force_glsl_abs_sqrt =
driQueryOptionb(optionCache, "force_glsl_abs_sqrt");
options->allow_glsl_cross_stage_interpolation_mismatch =


@@ -57,7 +57,8 @@ endif
LOCAL_STATIC_LIBRARIES += \
libfreedreno_drm \
libfreedreno_ir3 \
libpanfrost_shared \
libmesa_gallium \
libpanfrost_shared
ifeq ($(USE_LIBBACKTRACE),true)
LOCAL_SHARED_LIBRARIES += libbacktrace
@@ -75,7 +76,6 @@ LOCAL_WHOLE_STATIC_LIBRARIES := \
libmesa_nir \
libmesa_dri_common \
libmesa_megadriver_stub \
libmesa_gallium \
libmesa_pipe_loader \
libmesa_util \
libmesa_loader


@@ -326,7 +326,6 @@ amdgpu_winsys_create(int fd, const struct pipe_screen_config *config,
aws = util_hash_table_get(dev_tab, dev);
if (aws) {
pipe_reference(NULL, &aws->reference);
simple_mtx_unlock(&dev_tab_mutex);
/* Release the device handle, because we don't need it anymore.
* This function is returning an existing winsys instance, which


@@ -1512,10 +1512,10 @@ has_immediate(const struct gen_device_info *devinfo, const brw_inst *inst,
{
if (brw_inst_src0_reg_file(devinfo, inst) == BRW_IMMEDIATE_VALUE) {
*type = brw_inst_src0_type(devinfo, inst);
return *type != -1;
return *type != (enum brw_reg_type)-1;
} else if (brw_inst_src1_reg_file(devinfo, inst) == BRW_IMMEDIATE_VALUE) {
*type = brw_inst_src1_type(devinfo, inst);
return *type != -1;
return *type != (enum brw_reg_type)-1;
}
return false;


@@ -71,6 +71,8 @@
#define MAP_READ (1 << 0)
#define MAP_WRITE (1 << 1)
#define OA_REPORT_INVALID_CTX_ID (0xffffffff)
/**
* Periodic OA samples are read() into these buffer structures via the
* i915 perf kernel interface and appended to the
@@ -1137,7 +1139,9 @@ gen_perf_query_result_accumulate(struct gen_perf_query_result *result,
{
int i, idx = 0;
result->hw_id = start[2];
if (result->hw_id == OA_REPORT_INVALID_CTX_ID &&
start[2] != OA_REPORT_INVALID_CTX_ID)
result->hw_id = start[2];
result->reports_accumulated++;
switch (query->oa_format) {
@@ -1175,7 +1179,7 @@ void
gen_perf_query_result_clear(struct gen_perf_query_result *result)
{
memset(result, 0, sizeof(*result));
result->hw_id = 0xffffffff; /* invalid */
result->hw_id = OA_REPORT_INVALID_CTX_ID; /* invalid */
}
static void
@@ -1456,8 +1460,8 @@ get_free_sample_buf(struct gen_perf_context *perf_ctx)
exec_node_init(&buf->link);
buf->refcount = 0;
buf->len = 0;
}
buf->len = 0;
return buf;
}
@@ -1974,7 +1978,8 @@ read_oa_samples_until(struct gen_perf_context *perf_ctx,
exec_list_get_tail(&perf_ctx->sample_buffers);
struct oa_sample_buf *tail_buf =
exec_node_data(struct oa_sample_buf, tail_node, link);
uint32_t last_timestamp = tail_buf->last_timestamp;
uint32_t last_timestamp =
tail_buf->len == 0 ? start_timestamp : tail_buf->last_timestamp;
while (1) {
struct oa_sample_buf *buf = get_free_sample_buf(perf_ctx);
@@ -1989,12 +1994,13 @@ read_oa_samples_until(struct gen_perf_context *perf_ctx,
exec_list_push_tail(&perf_ctx->free_sample_buffers, &buf->link);
if (len < 0) {
if (errno == EAGAIN)
return ((last_timestamp - start_timestamp) >=
if (errno == EAGAIN) {
return ((last_timestamp - start_timestamp) < INT32_MAX &&
(last_timestamp - start_timestamp) >=
(end_timestamp - start_timestamp)) ?
OA_READ_STATUS_FINISHED :
OA_READ_STATUS_UNFINISHED;
else {
} else {
DBG("Error reading i915 perf samples: %m\n");
}
} else
@@ -2210,6 +2216,17 @@ discard_all_queries(struct gen_perf_context *perf_ctx)
}
}
/* Looks for the validity bit of context ID (dword 2) of an OA report. */
static bool
oa_report_ctx_id_valid(const struct gen_device_info *devinfo,
const uint32_t *report)
{
assert(devinfo->gen >= 8);
if (devinfo->gen == 8)
return (report[0] & (1 << 25)) != 0;
return (report[0] & (1 << 16)) != 0;
}
/**
* Accumulate raw OA counter values based on deltas between pairs of
* OA reports.
@@ -2237,7 +2254,7 @@ accumulate_oa_reports(struct gen_perf_context *perf_ctx,
uint32_t *last;
uint32_t *end;
struct exec_node *first_samples_node;
bool in_ctx = true;
bool last_report_ctx_match = true;
int out_duration = 0;
assert(query->oa.map != NULL);
@@ -2266,7 +2283,7 @@ accumulate_oa_reports(struct gen_perf_context *perf_ctx,
first_samples_node = query->oa.samples_head->next;
foreach_list_typed_from(struct oa_sample_buf, buf, link,
&perf_ctx.sample_buffers,
&perf_ctx->sample_buffers,
first_samples_node)
{
int offset = 0;
@@ -2283,6 +2300,7 @@ accumulate_oa_reports(struct gen_perf_context *perf_ctx,
switch (header->type) {
case DRM_I915_PERF_RECORD_SAMPLE: {
uint32_t *report = (uint32_t *)(header + 1);
bool report_ctx_match = true;
bool add = true;
/* Ignore reports that come before the start marker.
@@ -2311,35 +2329,30 @@ accumulate_oa_reports(struct gen_perf_context *perf_ctx,
* of OA counters while any other context is active.
*/
if (devinfo->gen >= 8) {
if (in_ctx && report[2] != query->oa.result.hw_id) {
DBG("i915 perf: Switch AWAY (observed by ID change)\n");
in_ctx = false;
/* Consider that the current report matches our context only if
* the report says the report ID is valid.
*/
report_ctx_match = oa_report_ctx_id_valid(devinfo, report) &&
report[2] == start[2];
if (report_ctx_match)
out_duration = 0;
} else if (in_ctx == false && report[2] == query->oa.result.hw_id) {
DBG("i915 perf: Switch TO\n");
in_ctx = true;
/* From experimentation in IGT, we found that the OA unit
* might label some report as "idle" (using an invalid
* context ID), right after a report for a given context.
* Deltas generated by those reports actually belong to the
* previous context, even though they're not labelled as
* such.
*
* We didn't *really* Switch AWAY in the case that we e.g.
* saw a single periodic report while idle...
*/
if (out_duration >= 1)
add = false;
} else if (in_ctx) {
assert(report[2] == query->oa.result.hw_id);
DBG("i915 perf: Continuation IN\n");
} else {
assert(report[2] != query->oa.result.hw_id);
DBG("i915 perf: Continuation OUT\n");
add = false;
else
out_duration++;
}
/* Only add the delta between <last, report> if the last report
* was clearly identified as our context, or if we have at most
* 1 report without a matching ID.
*
* The OA unit will sometimes label reports with an invalid
* context ID when i915 rewrites the execlist submit register
* with the same context as the one currently running. This
* happens when i915 wants to notify the HW of ringbuffer tail
* register update. We have to consider this report as part of
* our context as the 3d pipeline behind the OACS unit is still
* processing the operations started at the previous execlist
* submission.
*/
add = last_report_ctx_match && out_duration < 2;
}
if (add) {
@@ -2349,6 +2362,7 @@ accumulate_oa_reports(struct gen_perf_context *perf_ctx,
}
last = report;
last_report_ctx_match = report_ctx_match;
break;
}


@@ -345,6 +345,9 @@ VkResult anv_ResetCommandBuffer(
case 11: \
gen11_##func(__VA_ARGS__); \
break; \
case 12: \
gen12_##func(__VA_ARGS__); \
break; \
default: \
assert(!"Unknown hardware generation"); \
}


@@ -3000,6 +3000,14 @@ VkResult anv_DeviceWaitIdle(
bool
anv_vma_alloc(struct anv_device *device, struct anv_bo *bo)
{
const struct anv_physical_device *pdevice = &device->instance->physicalDevice;
const struct gen_device_info *devinfo = &pdevice->info;
/* Gen12 CCS surface addresses need to be 64K aligned. We have no way of
* telling what this allocation is for so pick the largest alignment.
*/
const uint32_t vma_alignment =
devinfo->gen >= 12 ? (64 * 1024) : (4 * 1024);
if (!(bo->flags & EXEC_OBJECT_PINNED))
return true;
@@ -3009,7 +3017,8 @@ anv_vma_alloc(struct anv_device *device, struct anv_bo *bo)
if (bo->flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS &&
device->vma_hi_available >= bo->size) {
uint64_t addr = util_vma_heap_alloc(&device->vma_hi, bo->size, 4096);
uint64_t addr =
util_vma_heap_alloc(&device->vma_hi, bo->size, vma_alignment);
if (addr) {
bo->offset = gen_canonical_address(addr);
assert(addr == gen_48b_address(bo->offset));
@@ -3018,7 +3027,8 @@ anv_vma_alloc(struct anv_device *device, struct anv_bo *bo)
}
if (bo->offset == 0 && device->vma_lo_available >= bo->size) {
uint64_t addr = util_vma_heap_alloc(&device->vma_lo, bo->size, 4096);
uint64_t addr =
util_vma_heap_alloc(&device->vma_lo, bo->size, vma_alignment);
if (addr) {
bo->offset = gen_canonical_address(addr);
assert(addr == gen_48b_address(bo->offset));
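A small numeric sketch of the effect (addresses invented for illustration only): on gen12 every pinned BO now receives a 64 KiB-aligned address, so util_vma_heap_alloc(&device->vma_lo, 4096, 64 * 1024) might return 0x00200000 where the previous 4 KiB-aligned call could have returned 0x00201000; the coarser alignment satisfies the CCS main-surface requirement at the cost of some address-space slack.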
@@ -3267,9 +3277,10 @@ VkResult anv_AllocateMemory(
i915_tiling);
if (ret) {
anv_bo_cache_release(device, &device->bo_cache, mem->bo);
return vk_errorf(device->instance, NULL,
VK_ERROR_OUT_OF_DEVICE_MEMORY,
"failed to set BO tiling: %m");
result = vk_errorf(device->instance, NULL,
VK_ERROR_OUT_OF_DEVICE_MEMORY,
"failed to set BO tiling: %m");
goto fail;
}
}
}


@@ -3893,6 +3893,13 @@ genX(flush_pipeline_select)(struct anv_cmd_buffer *cmd_buffer,
vfe.NumberofURBEntries = 2;
vfe.URBEntryAllocationSize = 2;
}
/* We just emitted a dummy MEDIA_VFE_STATE so now that packet is
* invalid. Set the compute pipeline to dirty to force a re-emit of the
* pipeline in case we get back-to-back dispatch calls with the same
* pipeline and a PIPELINE_SELECT in between.
*/
cmd_buffer->state.compute.pipeline_dirty = true;
}
#endif


@@ -369,8 +369,8 @@ emit_3dstate_sbe(struct anv_pipeline *pipeline)
if (input_index < 0)
continue;
/* gl_Layer is stored in the VUE header */
if (attr == VARYING_SLOT_LAYER) {
/* gl_Viewport and gl_Layer are stored in the VUE header */
if (attr == VARYING_SLOT_VIEWPORT || attr == VARYING_SLOT_LAYER) {
urb_entry_read_offset = 0;
continue;
}


@@ -35,9 +35,7 @@ i965_FILES = \
brw_object_purgeable.c \
brw_pipe_control.c \
brw_pipe_control.h \
brw_performance_query.h \
brw_performance_query.c \
brw_performance_query_metrics.h \
brw_program.c \
brw_program.h \
brw_program_binary.c \


@@ -98,6 +98,7 @@ DRI_CONF_BEGIN
DRI_CONF_SECTION_MISCELLANEOUS
DRI_CONF_GLSL_ZERO_INIT("false")
DRI_CONF_VS_POSITION_ALWAYS_INVARIANT("false")
DRI_CONF_ALLOW_RGB10_CONFIGS("false")
DRI_CONF_ALLOW_RGB565_CONFIGS("true")
DRI_CONF_ALLOW_FP16_CONFIGS("false")
@@ -2798,6 +2799,8 @@ __DRIconfig **intelInitScreen2(__DRIscreen *dri_screen)
screen->compiler->constant_buffer_0_is_relative = devinfo->gen < 8 ||
!(screen->kernel_features & KERNEL_ALLOWS_CONTEXT_ISOLATION);
screen->compiler->glsl_compiler_options[MESA_SHADER_VERTEX].PositionAlwaysInvariant = driQueryOptionb(&screen->optionCache, "vs_position_always_invariant");
screen->compiler->supports_pull_constants = true;
screen->has_exec_fence =


@@ -3193,6 +3193,9 @@ struct gl_shader_compiler_options
/** Clamp UBO and SSBO block indices so they don't go out-of-bounds. */
GLboolean ClampBlockIndicesToArrayBounds;
/** (driconf) Force gl_Position to be considered invariant */
GLboolean PositionAlwaysInvariant;
const struct nir_shader_compiler_options *NirOptions;
};


@@ -754,6 +754,8 @@ st_create_context_priv(struct gl_context *ctx, struct pipe_context *pipe,
ctx->Const.ShaderCompilerOptions[MESA_SHADER_VERTEX].EmitNoSat =
!screen->get_param(screen, PIPE_CAP_VERTEX_SHADER_SATURATE);
ctx->Const.ShaderCompilerOptions[MESA_SHADER_VERTEX].PositionAlwaysInvariant = options->vs_position_always_invariant;
if (ctx->Const.GLSLVersion < 400) {
for (i = 0; i < MESA_SHADER_STAGES; i++)
ctx->Const.ShaderCompilerOptions[i].EmitNoIndirectSampler = true;


@@ -550,4 +550,14 @@ TODO: document the other workarounds.
<option name="gles_emulate_bgra" value="true" />
</application>
</device>
<device driver="i965">
<application name="Middle Earth: Shadow of Mordor" executable="ShadowOfMordor">
<option name="vs_position_always_invariant" value="true" />
</application>
</device>
<device driver="iris">
<application name="Middle Earth: Shadow of Mordor" executable="ShadowOfMordor">
<option name="vs_position_always_invariant" value="true" />
</application>
</device>
</driconf>


@@ -279,6 +279,11 @@ DRI_CONF_OPT_BEGIN_B(glsl_zero_init, def) \
DRI_CONF_DESC(en,gettext("Force uninitialized variables to default to zero")) \
DRI_CONF_OPT_END
#define DRI_CONF_VS_POSITION_ALWAYS_INVARIANT(def) \
DRI_CONF_OPT_BEGIN_B(vs_position_always_invariant, def) \
DRI_CONF_DESC(en,gettext("Force the vertex shader's gl_Position output to be considered 'invariant'")) \
DRI_CONF_OPT_END
#define DRI_CONF_ALLOW_RGB10_CONFIGS(def) \
DRI_CONF_OPT_BEGIN_B(allow_rgb10_configs, def) \
DRI_CONF_DESC(en,gettext("Allow exposure of visuals and fbconfigs with rgb10a2 formats")) \