Compare commits


39 Commits

Author SHA1 Message Date
Eric Engestrom
0a443eb1ad VERSION: bump to release 20.1.9 2020-09-30 20:37:42 +02:00
Eric Engestrom
bc6fd91e68 docs: add release notes for 20.1.9 2020-09-30 20:33:53 +02:00
Connor Abbott
e1f6000b54 nir/lower_io_arrays: Fix xfb_offset bug
I noticed this once I started gathering xfb_info after
nir_lower_io_arrays_to_elements_no_indirect.

Fixes: b2bbd978d0 ("nir: fix lowering arrays to elements for XFB outputs")
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6514>
(cherry picked from commit 5a88db682e)
2020-09-30 11:37:10 +02:00
Erik Faye-Lund
30b256c21e st/mesa: use roundf instead of floorf for lod-bias rounding
There's no good reason not to use a symmetric rounding mode here. This
fixes the following GL CTS case for me:

GTF-GL33.gtf21.GL3Tests.texture_lod_bias.texture_lod_bias_all

Fixes: 132b69c4ed ("st/mesa: round lod_bias to a multiple of 1/256")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6892>
(cherry picked from commit 7685c37bf4)
2020-09-30 11:37:10 +02:00
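
A minimal sketch of the rounding change above, not the actual st/mesa code: floorf() biases every value downward (asymmetric around zero), while roundf() snaps to the nearest 1/256 step in both directions.

 #include <math.h>

 /* Quantize a LOD bias to a multiple of 1/256 (illustration only). */
 static float quantize_lod_bias_floor(float bias) { return floorf(bias * 256.0f) / 256.0f; }
 static float quantize_lod_bias_round(float bias) { return roundf(bias * 256.0f) / 256.0f; }
 /* e.g. bias = -0.001f: the floorf variant yields -1/256, the roundf variant yields 0. */
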
Pierre-Eric Pelloux-Prayer
71b3582ec1 gallium/vl: add chroma_format arg to vl_video_buffer functions
vl_mpeg12_decoder needs to override the chroma_format value so that the
correct size is calculated (chroma_format is used by vl_video_buffer_adjust_size).

I'm not sure why the override is needed, but it is required for correct MPEG decoding.

Fixes: 24f2b0a856 ("gallium/video: remove pipe_video_buffer.chroma_format")
Acked-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6817>
(cherry picked from commit 2584d48b2c)
2020-09-29 22:11:46 +02:00
Pierre-Eric Pelloux-Prayer
fc21ef6b66 gallium/vl: do not call transfer_unmap if transfer is NULL
CC: mesa-stable
Acked-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6817>
(cherry picked from commit b121b1b8b8)
2020-09-29 22:11:42 +02:00
Eric Engestrom
d74c2e743d .pick_status.json: Update to efaea653b5 2020-09-29 22:11:28 +02:00
Eric Engestrom
0dbec6b964 .pick_status.json: Mark 89401e5867 as denominated 2020-09-28 23:04:07 +02:00
Samuel Pitoiset
db4a29d078 spirv: fix emitting switch cases that directly jump to the merge block
As shown in the valid SPIR-V below, if one switch case statement
directly jumps to the merge block, it has no branches at all and
we have to reset the fall variable. Otherwise, it creates an
unintentional fallthrough.

       OpSelectionMerge %97 None
       OpSwitch %96 %97 1 %99 2 %100
%100 = OpLabel
%102 = OpAccessChain %_ptr_StorageBuffer_v4float %86 %uint_0 %uint_37
%103 = OpLoad %v4float %102
%104 = OpBitcast %v4uint %103
%105 = OpCompositeExtract %uint %104 0
%106 = OpShiftLeftLogical %uint %105 %uint_1
       OpBranch %97
 %99 = OpLabel
       OpBranch %97
 %97 = OpLabel
%107 = OpPhi %uint %uint_4 %75 %uint_5 %99 %106 %100

This fixes serious corruption in Horizon Zero Dawn.

v2: Changed the code to skip the entire if-block instead of resetting
    the fallthrough variable.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3460
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6590>
(cherry picked from commit 57fba85da4)
2020-09-28 18:23:20 +02:00
Karol Herbst
4bff9ca691 spirv: extract switch parsing into its own function
v2 (Jason Ekstrand):
 - Construct a list of vtn_case objects

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2401>
(cherry picked from commit 467b90fcc4)
2020-09-28 18:23:20 +02:00
Eric Engestrom
9dcc7d4d41 .pick_status.json: Mark 6b1a56b908 as denominated 2020-09-28 17:00:59 +02:00
Eric Engestrom
3a8ba8ecb3 .pick_status.json: Mark e98c7a6634 as denominated 2020-09-28 17:00:59 +02:00
Eric Engestrom
7e3ed26c28 .pick_status.json: Mark 802d3611dc as denominated 2020-09-28 17:00:59 +02:00
Danylo Piliaiev
79bed11bdd intel/fs: Disable sample mask predication for scratch stores
Scratch stores are lowered to instructions with side effects; however,
they should still be enabled in FS helper invocations, since they are
produced from operations which don't imply side effects.

To fix this, we move the decision of whether sample mask predication
is enabled to the point where the logical BRW instructions are created.

GLSL example of the issue:

 int tmp[1024];
 ...
 do {
   // changes to tmp
 } while (some_condition(tmp))

If `tmp` is lowered to scratch memory, `some_condition` would be
undefined if the scratch write is predicated on the sample mask, making
it possible for the while loop to become infinite and hang the GPU.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3256
Fixes: 53bfcdeecf
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 77486db867)
2020-09-28 17:00:59 +02:00
Dylan Baker
80c6955c23 meson/anv: Use variable that checks for --build-id
Fixes: d1992255bb ("meson: Add build Intel "anv" vulkan driver")

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6819>
(cherry picked from commit 465460943a)
2020-09-27 11:13:04 +02:00
Nanley Chery
02f2b9fa7b blorp: Ensure aligned HIZ_CCS_WT partial clears
Fixes: 5425fcf2cb ("intel/blorp: Satisfy HIZ_CCS fast-clear alignments")
Reported-by: Sagar Ghuge <sagar.ghuge@intel.com>
Tested-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6854>
(cherry picked from commit 7f3e881c6c)
2020-09-27 11:11:33 +02:00
Jason Ekstrand
083b992f9d nir/liveness: Consider if uses in nir_ssa_defs_interfere
Fixes: f86902e75d "nir: Add an SSA-based liveness analysis pass"
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3428
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Yevhenii Kharchenko <yevhenii.kharchenko@globallogic.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6824>
(cherry picked from commit 0206fb3941)
2020-09-27 11:11:31 +02:00
Marek Olšák
520d023bfb radeonsi: fix indirect dispatches with variable block sizes
The block size input was uninitialized.

Fixes: 77c81164bc "radeonsi: support ARB_compute_variable_group_size"

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6782>
(cherry picked from commit 8be46d6558)
2020-09-27 11:10:03 +02:00
Christian Gmeiner
14c7f4740e etnaviv: simplify linear stride implementation
As documented in the galcore kernel driver, "only LOD0 is valid
for this register". This makes sense, as NTE's LINEAR_STRIDE is
only capable of storing one linear stride value per sampler.
This fixes linear textures in sampler slot != 0.

Fixes: 34458c1cf6 ("etnaviv: add linear sampling support")
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Michael Tretter <m.tretter@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3285>
(cherry picked from commit a7e3cc7a0e)
2020-09-27 11:10:02 +02:00
Erik Faye-Lund
53356f8972 mesa: handle GL_FRONT after translating to it
Without this, we end up throwing errors on code along these lines when
rendering using single-buffering:

GLint att;
glGetIntegerv(GL_READ_BUFFER, &att);
glGetFramebufferAttachmentParameteriv(GL_READ_FRAMEBUFFER, att, ...);

This is because we internally translate GL_BACK (which is what
glGetIntegerv returned) to GL_FRONT, which we don't handle in the
Desktop GL case. So let's start handling it.

This fixes the GTF-GL33.gtf21.GL2FixedTests.buffer_color.blend_color
test for me.

Fixes: e6ca6e587e ("mesa: Handle pbuffers in desktop GL framebuffer attachment queries")

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6815>
(cherry picked from commit 9e13a16c97)
2020-09-27 11:09:59 +02:00
Eric Engestrom
7590165899 .pick_status.json: Update to a3543adc26 2020-09-27 11:09:31 +02:00
Danylo Piliaiev
46762687a0 nir/lower_samplers: Clamp out-of-bounds access to array of samplers
Section 5.11 (Out-of-Bounds Accesses) of the GLSL 4.60 spec says:

"In the subsections described above for array, vector, matrix and
 structure accesses, any out-of-bounds access produced undefined
 behavior.... Out-of-bounds reads return undefined values, which
 include values from other variables of the active program or zero."

Robustness extensions suggest returning zero on out-of-bounds
accesses; however, that is not applicable to arrays of samplers,
so just clamp the index.

Otherwise instr->sampler_index or instr->texture_index would be out
of bounds, and they are used as indices into arrays of driver state.

E.g. this fixes a dereference like the following in nir_lower_tex.c:
 if (options->lower_tex_packing[tex->sampler_index] !=

CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6428>
(cherry picked from commit f2b17dec12)
2020-09-23 20:58:12 +02:00
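
A standalone illustration of the clamping policy described above (not the actual NIR pass; the real change is in the corresponding hunk further down this page):

 /* Keep a constant sampler-array index inside the declared bounds so
  * that instr->sampler_index never indexes driver state out of range. */
 static unsigned clamp_sampler_index(unsigned index, unsigned array_size)
 {
    return index < array_size ? index : array_size - 1;
 }
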
Danylo Piliaiev
ef29f3758e nir/large_constants: Eliminate out-of-bounds writes to large constants
Out-of-bounds writes could be eliminated per spec:

Section 5.11 (Out-of-Bounds Accesses) of the GLSL 4.60 spec says:

"In the subsections described above for array, vector, matrix and
 structure accesses, any out-of-bounds access produced undefined
 behavior.... Out-of-bounds writes may be discarded or overwrite
 other variables of the active program."

Fixes: 1235850522
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6428>
(cherry picked from commit 0ba82f78a5)
2020-09-23 20:58:10 +02:00
Danylo Piliaiev
45a937e040 nir/lower_io: Eliminate oob writes and return zero for oob reads
Out-of-bounds writes could be eliminated per spec:

Section 5.11 (Out-of-Bounds Accesses) of the GLSL 4.60 spec says:

 "In the subsections described above for array, vector, matrix and
  structure accesses, any out-of-bounds access produced undefined
  behavior....
  Out-of-bounds writes may be discarded or overwrite
  other variables of the active program.
  Out-of-bounds reads return undefined values, which
  include values from other variables of the active program or zero."

GL_KHR_robustness and GL_ARB_robustness encourage us to return zero
for reads.

Otherwise get_io_offset would return an out-of-bounds offset, which may
result in out-of-bounds loading/storing of inputs/outputs and could
cause issues in drivers down the line.

E.g. this fixes a dereference like the following in brw_nir.c:
 int vue_slot = vue_map->varying_to_slot[intrin->const_index[0]];

CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6428>
(cherry picked from commit 66669eb529)
2020-09-23 20:58:08 +02:00
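
A standalone illustration of that policy on a plain array (not the NIR pass itself; the real change is in the nir_lower_io hunk further down this page): out-of-bounds writes are discarded and out-of-bounds reads return zero.

 static float robust_read(const float *arr, unsigned len, unsigned idx)
 {
    return idx < len ? arr[idx] : 0.0f;   /* oob read -> zero */
 }

 static void robust_write(float *arr, unsigned len, unsigned idx, float value)
 {
    if (idx < len)                        /* oob write -> discarded */
       arr[idx] = value;
 }
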
Bas Nieuwenhuizen
5fedabe34b st/mesa: Deal with empty textures/buffers in semaphore wait/signal.
The actual texture might not have been created yet.

Gitlab: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3257
CC: mesa-stable
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6788>
(cherry picked from commit ade72e677b)
2020-09-23 20:58:06 +02:00
Lionel Landwerlin
a4f2c6face intel/compiler: fixup Gen12 workaround for array sizes
We didn't handle the case of NULL images/textures for which we should
return 0.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 397ff2976b ("intel: Implement Gen12 workaround for array textures of size 1")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3522
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6729>
(cherry picked from commit cc3bf00cc2)
2020-09-23 20:58:04 +02:00
Samuel Pitoiset
077d2a8068 radv: fix transform feedback crashes if pCounterBufferOffsets is NULL
From the Vulkan 1.2.154 spec:
    "If pCounterBufferOffsets is NULL, then it is assumed the
     offsets are zero."

Fix new CTS
dEQP-VK.transform_feedback.simple.backward_dependency_no_offset_array.

CC: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6798>
(cherry picked from commit 2b99e15d0a)
2020-09-23 20:58:02 +02:00
Rhys Perry
78df8e5e38 radv,aco: fix reading primitive ID in FS after TES
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3530
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6760>
(cherry picked from commit 2228835fb5)
2020-09-23 20:58:00 +02:00
Bas Nieuwenhuizen
0f61e68ede ac/surface: Fix depth import on GFX6-GFX8.
Let's just do depth interop imports by convention between radv and
radeonsi for now. The only thing using this should be Vulkan interop
anyway.

CC: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6617>
(cherry picked from commit ecc19e9819)
2020-09-23 20:56:27 +02:00
Jason Ekstrand
11ebe27d97 intel/fs/swsb: SCHEDULING_FENCE only emits SYNC_NOP
It's not really unordered in the sense that it can still stall on
ordered things and we don't need a SYNC_NOP for that because it is a
SYNC_NOP.  However, it also doesn't count when computing instruction
distances.

Fixes: 18e72ee210 "intel/fs: Add FS_OPCODE_SCHEDULING_FENCE"
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6781>
(cherry picked from commit f63ffc18e7)
2020-09-23 20:46:57 +02:00
Jesse Natalie
80da07288b glsl_type: Add packed to structure type comparison for hash map
Fixes: 659f333b3a "glsl: add packed for struct types"
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6767>
(cherry picked from commit 9aa86eb61a)
2020-09-23 20:46:52 +02:00
Pierre-Loup A. Griffais
d99fe9f86f radv: fix vertex buffer null descriptors
Fixes: 0f1ead7b53 "radv: handle NULL vertex bindings"
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6773>
(cherry picked from commit 7b4eaac6a9)
2020-09-23 20:45:18 +02:00
Pierre-Loup A. Griffais
b8534f4771 radv: fix null descriptor for dynamic buffers
Fixes: c1ef225d18 "radv: handle NULL descriptors"
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6772>
(cherry picked from commit ec13622ff4)
2020-09-23 20:45:16 +02:00
Pierre-Eric Pelloux-Prayer
819be690c0 mesa: fix glUniform* when a struct contains a bindless sampler
Small example from #3271:

layout (bindless_sampler) uniform;
struct SamplerSparse {
  sampler2D tex;
  vec4 size;
  [...]
};
uniform SamplerSparse foo;

'foo' will be marked as bindless but we should only take the assign-as-GLuint64 path for 'tex'.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3271
Fixes: 990c8d15ac ("mesa: fix setting uniform variables for bindless samplers/images")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6730>
(cherry picked from commit 090fc593b4)
2020-09-23 20:41:21 +02:00
Eric Engestrom
c2c53b9e63 .pick_status.json: Update to c669db0b50 2020-09-23 20:40:51 +02:00
Rhys Perry
d226595210 radv: initialize with expanded cmask if the destination layout needs it
If radv_layout_can_fast_clear() is false, 028C70_COMPRESSION is unset when
the image is rendered to and CMASK isn't updated. This appears to cause
FMASK to be ignored and the 0th sample to always be used.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3449
Fixes: 7b21ce401f ('radv: disable FMASK compression when drawing with GENERAL layout')

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6745>
(cherry picked from commit 85cc2950a0)
2020-09-17 22:03:29 +02:00
Bas Nieuwenhuizen
c0d443656f amd/common: Cache intra-tile addresses for retile map.
However complicated DCC addressing is, it is still based on tiles.
If we have the intra-tile offsets + tile dimensions we can expand
that to the full image ourselves.

Behavior around ~1080p on a 2500U:

old:
  30-60 ms on every miss

new:
  5 ms initially (miss in the tile cache)
  <0.5 ms afterwards

The most common case is that the tile cache only contains data for
2 tiles, which for Raven/Renoir/Navi14 will be 4 KiB each, so the
size increase is fairly modest.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5865>
(cherry picked from commit a37aeb128d)
2020-09-17 22:03:29 +02:00
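
A minimal sketch of the tile-based expansion described above, assuming power-of-two tile dimensions; it mirrors the ac_compute_retile_tile_addr helper added in the ac_surface hunk further down this page. (x, y) is split into a tile index and an intra-tile coordinate, and the intra-tile DCC offset comes from the cached table.

 #include <stdint.h>

 static uint32_t retile_addr(const uint16_t *tile_offsets,
                             unsigned tile_w_log2, unsigned tile_h_log2,
                             unsigned stride_in_tiles, unsigned x, unsigned y)
 {
    unsigned in_x = x & ((1u << tile_w_log2) - 1);   /* intra-tile coordinates */
    unsigned in_y = y & ((1u << tile_h_log2) - 1);
    unsigned tile = (y >> tile_h_log2) * stride_in_tiles + (x >> tile_w_log2);
    unsigned tile_size_log2 = tile_w_log2 + tile_h_log2;
    /* base of the tile plus the cached offset within it */
    return ((uint32_t)tile << tile_size_log2) +
           tile_offsets[(in_y << tile_w_log2) + in_x];
 }
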
Eric Engestrom
ed94f8f266 .pick_status.json: Update to d74fe47101 2020-09-17 22:03:29 +02:00
Eric Engestrom
e9ec84ad66 docs/relnotes: add sha256 sums to 20.1.8 2020-09-16 19:42:34 +02:00
36 changed files with 4863 additions and 222 deletions

File diff suppressed because it is too large.

@@ -1 +1 @@
20.1.8
20.1.9


@@ -36,7 +36,7 @@ depends on the particular driver being used.
<h2>SHA256 checksum</h2>
<pre>
TBD.
df21351494f7caaec5a3ccc16f14f15512e98d2ecde178bba1d134edc899b961 mesa-20.1.8.tar.xz
</pre>

docs/relnotes/20.1.9.html (new file, 140 lines)

@@ -0,0 +1,140 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<title>Mesa Release Notes</title>
<link rel="stylesheet" type="text/css" href="../mesa.css">
</head>
<body>
<div class="header">
<h1>The Mesa 3D Graphics Library</h1>
</div>
<iframe src="../contents.html"></iframe>
<div class="content">
<h1>Mesa 20.1.9 Release Notes / 2020-09-30</h1>
<p>
Mesa 20.1.9 is a bug fix release which fixes bugs found since the 20.1.8 release.
</p>
<p>
Mesa 20.1.9 implements the OpenGL 4.6 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.6. OpenGL
4.6 is <strong>only</strong> available if requested at context creation.
Compatibility contexts may report a lower version depending on each driver.
</p>
<p>
Mesa 20.1.9 implements the Vulkan 1.2 API, but the version reported by
the apiVersion property of the VkPhysicalDeviceProperties struct
depends on the particular driver being used.
</p>
<h2>SHA256 checksum</h2>
<pre>
TBD.
</pre>
<h2>New features</h2>
<ul>
<li>None</li>
</ul>
<h2>Bug fixes</h2>
<ul>
<li>Horizon Zero Dawn graphics corruption with radv</li>
<li>Running Amber test leads to VK_DEVICE_LOST</li>
<li>[spirv-fuzz] Shader generates a wrong image</li>
<li>anv: dEQP-VK.robustness.robustness2.* failures on gen12</li>
<li>[RADV] Problems reading primitive ID in fragment shader after tessellation</li>
<li>Substance Painter 6.1.3 black glitches on Radeon RX570</li>
<li>vkCmdCopyImage broadcasts subsample 0 of MSAA src into all subsamples of dst on RADV</li>
</ul>
<h2>Changes</h2>
<ul>
<p>Bas Nieuwenhuizen (3):</p>
<li> amd/common: Cache intra-tile addresses for retile map.</li>
<li> ac/surface: Fix depth import on GFX6-GFX8.</li>
<li> st/mesa: Deal with empty textures/buffers in semaphore wait/signal.</li>
<p></p>
<p>Christian Gmeiner (1):</p>
<li> etnaviv: simplify linear stride implementation</li>
<p></p>
<p>Connor Abbott (1):</p>
<li> nir/lower_io_arrays: Fix xfb_offset bug</li>
<p></p>
<p>Danylo Piliaiev (4):</p>
<li> nir/lower_io: Eliminate oob writes and return zero for oob reads</li>
<li> nir/large_constants: Eliminate out-of-bounds writes to large constants</li>
<li> nir/lower_samplers: Clamp out-of-bounds access to array of samplers</li>
<li> intel/fs: Disable sample mask predication for scratch stores</li>
<p></p>
<p>Dylan Baker (1):</p>
<li> meson/anv: Use variable that checks for --build-id</li>
<p></p>
<p>Eric Engestrom (9):</p>
<li> docs/relnotes: add sha256 sums to 20.1.8</li>
<li> .pick_status.json: Update to d74fe47101995d2659b1e59495d2f77b9dc14f3d</li>
<li> .pick_status.json: Update to c669db0b503c10faf2d1c67c9340d7222b4f946e</li>
<li> .pick_status.json: Update to a3543adc2628461818cfa691a7f547af7bc6f0fb</li>
<li> .pick_status.json: Mark 802d3611dcec8102ef75fe2461340c2997af931e as denominated</li>
<li> .pick_status.json: Mark e98c7a66347a05fc166c377ab1abb77955aff775 as denominated</li>
<li> .pick_status.json: Mark 6b1a56b908e702c06f55c63b19b695a47f607456 as denominated</li>
<li> .pick_status.json: Mark 89401e58672e1251b954662f0f776a6e9bce6df8 as denominated</li>
<li> .pick_status.json: Update to efaea653b5766427701817ab06c319902a148ee9</li>
<p></p>
<p>Erik Faye-Lund (2):</p>
<li> mesa: handle GL_FRONT after translating to it</li>
<li> st/mesa: use roundf instead of floorf for lod-bias rounding</li>
<p></p>
<p>Jason Ekstrand (2):</p>
<li> intel/fs/swsb: SCHEDULING_FENCE only emits SYNC_NOP</li>
<li> nir/liveness: Consider if uses in nir_ssa_defs_interfere</li>
<p></p>
<p>Jesse Natalie (1):</p>
<li> glsl_type: Add packed to structure type comparison for hash map</li>
<p></p>
<p>Karol Herbst (1):</p>
<li> spirv: extract switch parsing into its own function</li>
<p></p>
<p>Lionel Landwerlin (1):</p>
<li> intel/compiler: fixup Gen12 workaround for array sizes</li>
<p></p>
<p>Marek Olšák (1):</p>
<li> radeonsi: fix indirect dispatches with variable block sizes</li>
<p></p>
<p>Nanley Chery (1):</p>
<li> blorp: Ensure aligned HIZ_CCS_WT partial clears</li>
<p></p>
<p>Pierre-Eric Pelloux-Prayer (3):</p>
<li> mesa: fix glUniform* when a struct contains a bindless sampler</li>
<li> gallium/vl: do not call transfer_unmap if transfer is NULL</li>
<li> gallium/vl: add chroma_format arg to vl_video_buffer functions</li>
<p></p>
<p>Pierre-Loup A. Griffais (2):</p>
<li> radv: fix null descriptor for dynamic buffers</li>
<li> radv: fix vertex buffer null descriptors</li>
<p></p>
<p>Rhys Perry (2):</p>
<li> radv: initialize with expanded cmask if the destination layout needs it</li>
<li> radv,aco: fix reading primitive ID in FS after TES</li>
<p></p>
<p>Samuel Pitoiset (2):</p>
<li> radv: fix transform feedback crashes if pCounterBufferOffsets is NULL</li>
<li> spirv: fix emitting switch cases that directly jump to the merge block</li>
<p></p>
<p></p>
</ul>
</div>
</body>
</html>


@@ -61,6 +61,7 @@ struct ac_addrlib {
*/
simple_mtx_t dcc_retile_map_lock;
struct hash_table *dcc_retile_maps;
struct hash_table *dcc_retile_tile_indices;
};
struct dcc_retile_map_key {
@@ -89,6 +90,156 @@ static void dcc_retile_map_free(struct hash_entry *entry)
free(entry->data);
}
struct dcc_retile_tile_key {
enum radeon_family family;
unsigned bpp;
unsigned swizzle_mode;
bool rb_aligned;
bool pipe_aligned;
};
struct dcc_retile_tile_data {
unsigned tile_width_log2;
unsigned tile_height_log2;
uint16_t *data;
};
static uint32_t dcc_retile_tile_hash_key(const void *key)
{
return _mesa_hash_data(key, sizeof(struct dcc_retile_tile_key));
}
static bool dcc_retile_tile_keys_equal(const void *a, const void *b)
{
return memcmp(a, b, sizeof(struct dcc_retile_tile_key)) == 0;
}
static void dcc_retile_tile_free(struct hash_entry *entry)
{
free((void*)entry->key);
free(((struct dcc_retile_tile_data*)entry->data)->data);
free(entry->data);
}
/* Assumes dcc_retile_map_lock is taken. */
static const struct dcc_retile_tile_data *
ac_compute_dcc_retile_tile_indices(struct ac_addrlib *addrlib,
const struct radeon_info *info,
unsigned bpp, unsigned swizzle_mode,
bool rb_aligned, bool pipe_aligned)
{
struct dcc_retile_tile_key key = (struct dcc_retile_tile_key) {
.family = info->family,
.bpp = bpp,
.swizzle_mode = swizzle_mode,
.rb_aligned = rb_aligned,
.pipe_aligned = pipe_aligned
};
struct hash_entry *entry = _mesa_hash_table_search(addrlib->dcc_retile_tile_indices, &key);
if (entry)
return entry->data;
ADDR2_COMPUTE_DCCINFO_INPUT din = {0};
ADDR2_COMPUTE_DCCINFO_OUTPUT dout = {0};
din.size = sizeof(ADDR2_COMPUTE_DCCINFO_INPUT);
dout.size = sizeof(ADDR2_COMPUTE_DCCINFO_OUTPUT);
din.dccKeyFlags.pipeAligned = pipe_aligned;
din.dccKeyFlags.rbAligned = rb_aligned;
din.resourceType = ADDR_RSRC_TEX_2D;
din.swizzleMode = swizzle_mode;
din.bpp = bpp;
din.unalignedWidth = 1;
din.unalignedHeight = 1;
din.numSlices = 1;
din.numFrags = 1;
din.numMipLevels = 1;
ADDR_E_RETURNCODE ret = Addr2ComputeDccInfo(addrlib->handle, &din, &dout);
if (ret != ADDR_OK)
return NULL;
ADDR2_COMPUTE_DCC_ADDRFROMCOORD_INPUT addrin = {0};
addrin.size = sizeof(addrin);
addrin.swizzleMode = swizzle_mode;
addrin.resourceType = ADDR_RSRC_TEX_2D;
addrin.bpp = bpp;
addrin.numSlices = 1;
addrin.numMipLevels = 1;
addrin.numFrags = 1;
addrin.pitch = dout.pitch;
addrin.height = dout.height;
addrin.compressBlkWidth = dout.compressBlkWidth;
addrin.compressBlkHeight = dout.compressBlkHeight;
addrin.compressBlkDepth = dout.compressBlkDepth;
addrin.metaBlkWidth = dout.metaBlkWidth;
addrin.metaBlkHeight = dout.metaBlkHeight;
addrin.metaBlkDepth = dout.metaBlkDepth;
addrin.dccKeyFlags.pipeAligned = pipe_aligned;
addrin.dccKeyFlags.rbAligned = rb_aligned;
unsigned w = dout.metaBlkWidth / dout.compressBlkWidth;
unsigned h = dout.metaBlkHeight / dout.compressBlkHeight;
uint16_t *indices = malloc(w * h * sizeof (uint16_t));
if (!indices)
return NULL;
ADDR2_COMPUTE_DCC_ADDRFROMCOORD_OUTPUT addrout = {};
addrout.size = sizeof(addrout);
for (unsigned y = 0; y < h; ++y) {
addrin.y = y * dout.compressBlkHeight;
for (unsigned x = 0; x < w; ++x) {
addrin.x = x * dout.compressBlkWidth;
addrout.addr = 0;
if (Addr2ComputeDccAddrFromCoord(addrlib->handle, &addrin, &addrout) != ADDR_OK) {
free(indices);
return NULL;
}
indices[y * w + x] = addrout.addr;
}
}
struct dcc_retile_tile_data *data = calloc(1, sizeof(*data));
if (!data) {
free(indices);
return NULL;
}
data->tile_width_log2 = util_logbase2(w);
data->tile_height_log2 = util_logbase2(h);
data->data = indices;
struct dcc_retile_tile_key *heap_key = mem_dup(&key, sizeof(key));
if (!heap_key) {
free(data);
free(indices);
return NULL;
}
entry = _mesa_hash_table_insert(addrlib->dcc_retile_tile_indices, heap_key, data);
if (!entry) {
free(heap_key);
free(data);
free(indices);
}
return data;
}
static uint32_t ac_compute_retile_tile_addr(const struct dcc_retile_tile_data *tile,
unsigned stride, unsigned x, unsigned y)
{
unsigned x_mask = (1u << tile->tile_width_log2) - 1;
unsigned y_mask = (1u << tile->tile_height_log2) - 1;
unsigned tile_size_log2 = tile->tile_width_log2 + tile->tile_height_log2;
unsigned base = ((y >> tile->tile_height_log2) * stride + (x >> tile->tile_width_log2)) << tile_size_log2;
unsigned offset_in_tile = tile->data[((y & y_mask) << tile->tile_width_log2) + (x & x_mask)];
return base + offset_in_tile;
}
static uint32_t *ac_compute_dcc_retile_map(struct ac_addrlib *addrlib,
const struct radeon_info *info,
unsigned retile_width, unsigned retile_height,
@@ -120,11 +271,17 @@ static uint32_t *ac_compute_dcc_retile_map(struct ac_addrlib *addrlib,
return map;
}
ADDR2_COMPUTE_DCC_ADDRFROMCOORD_INPUT addrin;
memcpy(&addrin, in, sizeof(*in));
ADDR2_COMPUTE_DCC_ADDRFROMCOORD_OUTPUT addrout = {};
addrout.size = sizeof(addrout);
const struct dcc_retile_tile_data *src_tile =
ac_compute_dcc_retile_tile_indices(addrlib, info, in->bpp,
in->swizzleMode,
rb_aligned, pipe_aligned);
const struct dcc_retile_tile_data *dst_tile =
ac_compute_dcc_retile_tile_indices(addrlib, info, in->bpp,
in->swizzleMode, false, false);
if (!src_tile || !dst_tile) {
simple_mtx_unlock(&addrlib->dcc_retile_map_lock);
return NULL;
}
void *dcc_retile_map = malloc(dcc_retile_map_size);
if (!dcc_retile_map) {
@@ -133,47 +290,27 @@ static uint32_t *ac_compute_dcc_retile_map(struct ac_addrlib *addrlib,
}
unsigned index = 0;
unsigned w = DIV_ROUND_UP(retile_width, in->compressBlkWidth);
unsigned h = DIV_ROUND_UP(retile_height, in->compressBlkHeight);
unsigned src_stride = DIV_ROUND_UP(w, 1u << src_tile->tile_width_log2);
unsigned dst_stride = DIV_ROUND_UP(w, 1u << dst_tile->tile_width_log2);
for (unsigned y = 0; y < retile_height; y += in->compressBlkHeight) {
addrin.y = y;
for (unsigned y = 0; y < h; ++y) {
for (unsigned x = 0; x < w; ++x) {
unsigned src_addr = ac_compute_retile_tile_addr(src_tile, src_stride, x, y);
unsigned dst_addr = ac_compute_retile_tile_addr(dst_tile, dst_stride, x, y);
for (unsigned x = 0; x < retile_width; x += in->compressBlkWidth) {
addrin.x = x;
/* Compute src DCC address */
addrin.dccKeyFlags.pipeAligned = pipe_aligned;
addrin.dccKeyFlags.rbAligned = rb_aligned;
addrout.addr = 0;
if (Addr2ComputeDccAddrFromCoord(addrlib->handle, &addrin, &addrout) != ADDR_OK) {
simple_mtx_unlock(&addrlib->dcc_retile_map_lock);
return NULL;
if (use_uint16) {
((uint16_t*)dcc_retile_map)[2 * index] = src_addr;
((uint16_t*)dcc_retile_map)[2 * index + 1] = dst_addr;
} else {
((uint32_t*)dcc_retile_map)[2 * index] = src_addr;
((uint32_t*)dcc_retile_map)[2 * index + 1] = dst_addr;
}
if (use_uint16)
((uint16_t*)dcc_retile_map)[index * 2] = addrout.addr;
else
((uint32_t*)dcc_retile_map)[index * 2] = addrout.addr;
/* Compute dst DCC address */
addrin.dccKeyFlags.pipeAligned = 0;
addrin.dccKeyFlags.rbAligned = 0;
addrout.addr = 0;
if (Addr2ComputeDccAddrFromCoord(addrlib->handle, &addrin, &addrout) != ADDR_OK) {
simple_mtx_unlock(&addrlib->dcc_retile_map_lock);
return NULL;
}
if (use_uint16)
((uint16_t*)dcc_retile_map)[index * 2 + 1] = addrout.addr;
else
((uint32_t*)dcc_retile_map)[index * 2 + 1] = addrout.addr;
assert(index * 2 + 1 < dcc_retile_num_elements);
index++;
++index;
}
}
/* Fill the remaining pairs with the last one (for the compute shader). */
for (unsigned i = index * 2; i < dcc_retile_num_elements; i++) {
if (use_uint16)
@@ -276,6 +413,8 @@ struct ac_addrlib *ac_addrlib_create(const struct radeon_info *info,
simple_mtx_init(&addrlib->dcc_retile_map_lock, mtx_plain);
addrlib->dcc_retile_maps = _mesa_hash_table_create(NULL, dcc_retile_map_hash_key,
dcc_retile_map_keys_equal);
addrlib->dcc_retile_tile_indices = _mesa_hash_table_create(NULL, dcc_retile_tile_hash_key,
dcc_retile_tile_keys_equal);
return addrlib;
}
@@ -284,6 +423,7 @@ void ac_addrlib_destroy(struct ac_addrlib *addrlib)
AddrDestroy(addrlib->handle);
simple_mtx_destroy(&addrlib->dcc_retile_map_lock);
_mesa_hash_table_destroy(addrlib->dcc_retile_maps, dcc_retile_map_free);
_mesa_hash_table_destroy(addrlib->dcc_retile_tile_indices, dcc_retile_tile_free);
free(addrlib);
}
@@ -872,7 +1012,8 @@ static int gfx6_compute_surface(ADDR_HANDLE addrlib,
/* Set preferred macrotile parameters. This is usually required
* for shared resources. This is for 2D tiling only. */
if (AddrSurfInfoIn.tileMode >= ADDR_TM_2D_TILED_THIN1 &&
if (!(surf->flags & RADEON_SURF_Z_OR_SBUFFER) &&
AddrSurfInfoIn.tileMode >= ADDR_TM_2D_TILED_THIN1 &&
surf->u.legacy.bankw && surf->u.legacy.bankh &&
surf->u.legacy.mtilea && surf->u.legacy.tile_split) {
/* If any of these parameters are incorrect, the calculation


@@ -9753,7 +9753,10 @@ static void create_vs_exports(isel_context *ctx)
if (outinfo->export_prim_id && !(ctx->stage & hw_ngg_gs)) {
ctx->outputs.mask[VARYING_SLOT_PRIMITIVE_ID] |= 0x1;
ctx->outputs.temps[VARYING_SLOT_PRIMITIVE_ID * 4u] = get_arg(ctx, ctx->args->vs_prim_id);
if (ctx->stage & sw_tes)
ctx->outputs.temps[VARYING_SLOT_PRIMITIVE_ID * 4u] = get_arg(ctx, ctx->args->ac.tes_patch_id);
else
ctx->outputs.temps[VARYING_SLOT_PRIMITIVE_ID * 4u] = get_arg(ctx, ctx->args->vs_prim_id);
}
if (ctx->options->key.has_multiview_view_index) {


@@ -2486,8 +2486,10 @@ radv_flush_vertex_descriptors(struct radv_cmd_buffer *cmd_buffer,
uint32_t stride = cmd_buffer->state.pipeline->binding_stride[i];
unsigned num_records;
if (!buffer)
if (!buffer) {
memset(desc, 0, 4 * 4);
continue;
}
va = radv_buffer_get_va(buffer->bo);
@@ -3619,22 +3621,27 @@ void radv_CmdBindDescriptorSets(
assert(dyn_idx < dynamicOffsetCount);
struct radv_descriptor_range *range = set->dynamic_descriptors + j;
uint64_t va = range->va + pDynamicOffsets[dyn_idx];
dst[0] = va;
dst[1] = S_008F04_BASE_ADDRESS_HI(va >> 32);
dst[2] = no_dynamic_bounds ? 0xffffffffu : range->size;
dst[3] = S_008F0C_DST_SEL_X(V_008F0C_SQ_SEL_X) |
S_008F0C_DST_SEL_Y(V_008F0C_SQ_SEL_Y) |
S_008F0C_DST_SEL_Z(V_008F0C_SQ_SEL_Z) |
S_008F0C_DST_SEL_W(V_008F0C_SQ_SEL_W);
if (cmd_buffer->device->physical_device->rad_info.chip_class >= GFX10) {
dst[3] |= S_008F0C_FORMAT(V_008F0C_IMG_FORMAT_32_FLOAT) |
S_008F0C_OOB_SELECT(V_008F0C_OOB_SELECT_RAW) |
S_008F0C_RESOURCE_LEVEL(1);
if (!range->va) {
memset(dst, 0, 4 * 4);
} else {
dst[3] |= S_008F0C_NUM_FORMAT(V_008F0C_BUF_NUM_FORMAT_FLOAT) |
S_008F0C_DATA_FORMAT(V_008F0C_BUF_DATA_FORMAT_32);
uint64_t va = range->va + pDynamicOffsets[dyn_idx];
dst[0] = va;
dst[1] = S_008F04_BASE_ADDRESS_HI(va >> 32);
dst[2] = no_dynamic_bounds ? 0xffffffffu : range->size;
dst[3] = S_008F0C_DST_SEL_X(V_008F0C_SQ_SEL_X) |
S_008F0C_DST_SEL_Y(V_008F0C_SQ_SEL_Y) |
S_008F0C_DST_SEL_Z(V_008F0C_SQ_SEL_Z) |
S_008F0C_DST_SEL_W(V_008F0C_SQ_SEL_W);
if (cmd_buffer->device->physical_device->rad_info.chip_class >= GFX10) {
dst[3] |= S_008F0C_FORMAT(V_008F0C_IMG_FORMAT_32_FLOAT) |
S_008F0C_OOB_SELECT(V_008F0C_OOB_SELECT_RAW) |
S_008F0C_RESOURCE_LEVEL(1);
} else {
dst[3] |= S_008F0C_NUM_FORMAT(V_008F0C_BUF_NUM_FORMAT_FLOAT) |
S_008F0C_DATA_FORMAT(V_008F0C_BUF_DATA_FORMAT_32);
}
}
cmd_buffer->push_constant_stages |=
@@ -5517,8 +5524,16 @@ static void radv_init_color_image_metadata(struct radv_cmd_buffer *cmd_buffer,
if (radv_image_has_cmask(image)) {
uint32_t value = 0xffffffffu; /* Fully expanded mode. */
/* TODO: clarify this. */
if (radv_image_has_fmask(image)) {
/* TODO: clarify why 0xccccccccu is used. */
/* If CMASK isn't updated with the new layout, we should use the
* fully expanded mode so that the image is read correctly if
* CMASK is used (such as when transitioning to a compressed
* layout).
*/
if (radv_image_has_fmask(image) &&
radv_layout_can_fast_clear(image, dst_layout,
dst_render_loop, dst_queue_mask)) {
value = 0xccccccccu;
}
@@ -6163,8 +6178,12 @@ radv_emit_streamout_begin(struct radv_cmd_buffer *cmd_buffer,
/* The array of counter buffers is optional. */
RADV_FROM_HANDLE(radv_buffer, buffer, pCounterBuffers[counter_buffer_idx]);
uint64_t va = radv_buffer_get_va(buffer->bo);
uint64_t counter_buffer_offset = 0;
va += buffer->offset + pCounterBufferOffsets[counter_buffer_idx];
if (pCounterBufferOffsets)
counter_buffer_offset = pCounterBufferOffsets[counter_buffer_idx];
va += buffer->offset + counter_buffer_offset;
/* Append */
radeon_emit(cs, PKT3(PKT3_STRMOUT_BUFFER_UPDATE, 4, 0));
@@ -6227,9 +6246,13 @@ gfx10_emit_streamout_begin(struct radv_cmd_buffer *cmd_buffer,
if (append) {
RADV_FROM_HANDLE(radv_buffer, buffer, pCounterBuffers[counter_buffer_idx]);
uint64_t counter_buffer_offset = 0;
if (pCounterBufferOffsets)
counter_buffer_offset = pCounterBufferOffsets[counter_buffer_idx];
va += radv_buffer_get_va(buffer->bo);
va += buffer->offset + pCounterBufferOffsets[counter_buffer_idx];
va += buffer->offset + counter_buffer_offset;
radv_cs_add_buffer(cmd_buffer->device->ws, cs, buffer->bo);
}
@@ -6292,8 +6315,12 @@ radv_emit_streamout_end(struct radv_cmd_buffer *cmd_buffer,
/* The array of counters buffer is optional. */
RADV_FROM_HANDLE(radv_buffer, buffer, pCounterBuffers[counter_buffer_idx]);
uint64_t va = radv_buffer_get_va(buffer->bo);
uint64_t counter_buffer_offset = 0;
va += buffer->offset + pCounterBufferOffsets[counter_buffer_idx];
if (pCounterBufferOffsets)
counter_buffer_offset = pCounterBufferOffsets[counter_buffer_idx];
va += buffer->offset + counter_buffer_offset;
radeon_emit(cs, PKT3(PKT3_STRMOUT_BUFFER_UPDATE, 4, 0));
radeon_emit(cs, STRMOUT_SELECT_BUFFER(i) |
@@ -6344,8 +6371,12 @@ gfx10_emit_streamout_end(struct radv_cmd_buffer *cmd_buffer,
/* The array of counters buffer is optional. */
RADV_FROM_HANDLE(radv_buffer, buffer, pCounterBuffers[counter_buffer_idx]);
uint64_t va = radv_buffer_get_va(buffer->bo);
uint64_t counter_buffer_offset = 0;
va += buffer->offset + pCounterBufferOffsets[counter_buffer_idx];
if (pCounterBufferOffsets)
counter_buffer_offset = pCounterBufferOffsets[counter_buffer_idx];
va += buffer->offset + counter_buffer_offset;
si_cs_emit_write_event_eop(cs,
cmd_buffer->device->physical_device->rad_info.chip_class,


@@ -928,8 +928,10 @@ static void write_dynamic_buffer_descriptor(struct radv_device *device,
uint64_t va;
unsigned size;
if (!buffer)
if (!buffer) {
range->va = 0;
return;
}
va = radv_buffer_get_va(buffer->bo);
size = buffer_info->range;


@@ -1987,8 +1987,12 @@ handle_vs_outputs_post(struct radv_shader_context *ctx,
outputs[noutput].slot_name = VARYING_SLOT_PRIMITIVE_ID;
outputs[noutput].slot_index = 0;
outputs[noutput].usage_mask = 0x1;
outputs[noutput].values[0] =
ac_get_arg(&ctx->ac, ctx->args->vs_prim_id);
if (ctx->stage == MESA_SHADER_TESS_EVAL)
outputs[noutput].values[0] =
ac_get_arg(&ctx->ac, ctx->args->ac.tes_patch_id);
else
outputs[noutput].values[0] =
ac_get_arg(&ctx->ac, ctx->args->vs_prim_id);
for (unsigned j = 1; j < 4; j++)
outputs[noutput].values[j] = ctx->ac.f32_0;
noutput++;


@@ -1087,6 +1087,9 @@ glsl_type::record_compare(const glsl_type *b, bool match_name,
if (this->interface_row_major != b->interface_row_major)
return false;
if (this->packed != b->packed)
return false;
/* From the GLSL 4.20 specification (Sec 4.2):
*
* "Structures must have the same name, sequence of type names, and


@@ -250,6 +250,15 @@ search_for_use_after_instr(nir_instr *start, nir_ssa_def *def)
return true;
node = node->next;
}
/* If uses are considered to be in the block immediately preceding the if
* so we need to also check the following if condition, if any.
*/
nir_if *following_if = nir_block_get_following_if(start->block);
if (following_if && following_if->condition.is_ssa &&
following_if->condition.ssa == def)
return true;
return false;
}


@@ -649,6 +649,37 @@ nir_lower_io_block(nir_block *block,
mode == nir_var_shader_out ||
var->data.bindless;
if (nir_deref_instr_is_known_out_of_bounds(deref)) {
/* Section 5.11 (Out-of-Bounds Accesses) of the GLSL 4.60 spec says:
*
* In the subsections described above for array, vector, matrix and
* structure accesses, any out-of-bounds access produced undefined
* behavior....
* Out-of-bounds reads return undefined values, which
* include values from other variables of the active program or zero.
* Out-of-bounds writes may be discarded or overwrite
* other variables of the active program.
*
* GL_KHR_robustness and GL_ARB_robustness encourage us to return zero
* for reads.
*
* Otherwise get_io_offset would return out-of-bound offset which may
* result in out-of-bound loading/storing of inputs/outputs,
* that could cause issues in drivers down the line.
*/
if (intrin->intrinsic != nir_intrinsic_store_deref) {
nir_ssa_def *zero =
nir_imm_zero(b, intrin->dest.ssa.num_components,
intrin->dest.ssa.bit_size);
nir_ssa_def_rewrite_uses(&intrin->dest.ssa,
nir_src_for_ssa(zero));
}
nir_instr_remove(&intrin->instr);
progress = true;
continue;
}
offset = get_io_offset(b, deref, per_vertex ? &vertex_index : NULL,
state->type_size, &component_offset,
bindless_type_size);


@@ -61,7 +61,7 @@ get_io_offset(nir_builder *b, nir_deref_instr *deref, nir_variable *var,
unsigned size = glsl_count_attribute_slots((*p)->type, false);
offset += size * index;
xfb_offset += index * glsl_get_component_slots((*p)->type) * 4;
*xfb_offset += index * glsl_get_component_slots((*p)->type) * 4;
unsigned num_elements = glsl_type_is_array((*p)->type) ?
glsl_get_aoa_size((*p)->type) : 1;


@@ -47,7 +47,27 @@ lower_tex_src_to_offset(nir_builder *b,
if (nir_src_is_const(deref->arr.index) && index == NULL) {
/* We're still building a direct index */
base_index += nir_src_as_uint(deref->arr.index) * array_elements;
unsigned index_in_array = nir_src_as_uint(deref->arr.index);
/* Section 5.11 (Out-of-Bounds Accesses) of the GLSL 4.60 spec says:
*
* In the subsections described above for array, vector, matrix and
* structure accesses, any out-of-bounds access produced undefined
* behavior.... Out-of-bounds reads return undefined values, which
* include values from other variables of the active program or zero.
*
* Robustness extensions suggest to return zero on out-of-bounds
* accesses, however it's not applicable to the arrays of samplers,
* so just clamp the index.
*
* Otherwise instr->sampler_index or instr->texture_index would be out
* of bounds, and they are used as an index to arrays of driver state.
*/
if (index_in_array < glsl_array_size(parent->type)) {
base_index += index_in_array * array_elements;
} else {
base_index = glsl_array_size(parent->type) - 1;
}
} else {
if (index == NULL) {
/* We used to be direct but not anymore */


@@ -118,8 +118,11 @@ handle_constant_store(void *mem_ctx, struct var_info *info,
info->constant_data = rzalloc_size(mem_ctx, var_size);
}
char *dst = (char *)info->constant_data +
nir_deref_instr_get_const_offset(deref, size_align);
const unsigned offset = nir_deref_instr_get_const_offset(deref, size_align);
if (offset >= info->constant_data_size)
return;
char *dst = (char *)info->constant_data + offset;
for (unsigned i = 0; i < num_components; i++) {
if (!(writemask & (1 << i)))


@@ -608,6 +608,74 @@ vtn_add_cfg_work_item(struct vtn_builder *b,
list_addtail(&work->link, work_list);
}
/* returns the default block */
static void
vtn_parse_switch(struct vtn_builder *b,
struct vtn_switch *swtch,
const uint32_t *branch,
struct list_head *case_list)
{
const uint32_t *branch_end = branch + (branch[0] >> SpvWordCountShift);
struct vtn_value *sel_val = vtn_untyped_value(b, branch[1]);
vtn_fail_if(!sel_val->type ||
sel_val->type->base_type != vtn_base_type_scalar,
"Selector of OpSwitch must have a type of OpTypeInt");
nir_alu_type sel_type =
nir_get_nir_type_for_glsl_type(sel_val->type->type);
vtn_fail_if(nir_alu_type_get_base_type(sel_type) != nir_type_int &&
nir_alu_type_get_base_type(sel_type) != nir_type_uint,
"Selector of OpSwitch must have a type of OpTypeInt");
struct hash_table *block_to_case = _mesa_pointer_hash_table_create(b);
bool is_default = true;
const unsigned bitsize = nir_alu_type_get_type_size(sel_type);
for (const uint32_t *w = branch + 2; w < branch_end;) {
uint64_t literal = 0;
if (!is_default) {
if (bitsize <= 32) {
literal = *(w++);
} else {
assert(bitsize == 64);
literal = vtn_u64_literal(w);
w += 2;
}
}
struct vtn_block *case_block = vtn_block(b, *(w++));
struct hash_entry *case_entry =
_mesa_hash_table_search(block_to_case, case_block);
struct vtn_case *cse;
if (case_entry) {
cse = case_entry->data;
} else {
cse = rzalloc(b, struct vtn_case);
cse->node.type = vtn_cf_node_type_case;
cse->node.parent = swtch ? &swtch->node : NULL;
cse->block = case_block;
list_inithead(&cse->body);
util_dynarray_init(&cse->values, b);
list_addtail(&cse->node.link, case_list);
_mesa_hash_table_insert(block_to_case, case_block, cse);
}
if (is_default) {
cse->is_default = true;
} else {
util_dynarray_append(&cse->values, uint64_t, literal);
}
is_default = false;
}
_mesa_hash_table_destroy(block_to_case, NULL);
}
/* Processes a block and returns the next block to process or NULL if we've
* reached the end of the construct.
*/
@@ -812,17 +880,6 @@ vtn_process_block(struct vtn_builder *b,
}
case SpvOpSwitch: {
struct vtn_value *sel_val = vtn_untyped_value(b, block->branch[1]);
vtn_fail_if(!sel_val->type ||
sel_val->type->base_type != vtn_base_type_scalar,
"Selector of OpSwitch must have a type of OpTypeInt");
nir_alu_type sel_type =
nir_get_nir_type_for_glsl_type(sel_val->type->type);
vtn_fail_if(nir_alu_type_get_base_type(sel_type) != nir_type_int &&
nir_alu_type_get_base_type(sel_type) != nir_type_uint,
"Selector of OpSwitch must have a type of OpTypeInt");
struct vtn_switch *swtch = rzalloc(b, struct vtn_switch);
swtch->node.type = vtn_cf_node_type_switch;
@@ -843,82 +900,39 @@ vtn_process_block(struct vtn_builder *b,
}
/* First, we go through and record all of the cases. */
const uint32_t *branch_end =
block->branch + (block->branch[0] >> SpvWordCountShift);
vtn_parse_switch(b, swtch, block->branch, &swtch->cases);
struct hash_table *block_to_case = _mesa_pointer_hash_table_create(b);
/* Gather the branch types for the switch */
vtn_foreach_cf_node(case_node, &swtch->cases) {
struct vtn_case *cse = vtn_cf_node_as_case(case_node);
bool is_default = true;
const unsigned bitsize = nir_alu_type_get_type_size(sel_type);
for (const uint32_t *w = block->branch + 2; w < branch_end;) {
uint64_t literal = 0;
if (!is_default) {
if (bitsize <= 32) {
literal = *(w++);
} else {
assert(bitsize == 64);
literal = vtn_u64_literal(w);
w += 2;
}
cse->type = vtn_handle_branch(b, &swtch->node, cse->block);
switch (cse->type) {
case vtn_branch_type_none:
/* This is a "real" cases which has stuff in it */
vtn_fail_if(cse->block->switch_case != NULL,
"OpSwitch has a case which is also in another "
"OpSwitch construct");
cse->block->switch_case = cse;
vtn_add_cfg_work_item(b, work_list, &cse->node,
&cse->body, cse->block);
break;
case vtn_branch_type_switch_break:
case vtn_branch_type_loop_break:
case vtn_branch_type_loop_continue:
/* Switch breaks as well as loop breaks and continues can be
* used to break out of a switch construct or as direct targets
* of the OpSwitch.
*/
break;
default:
vtn_fail("Target of OpSwitch is not a valid structured exit "
"from the switch construct.");
}
struct vtn_block *case_block = vtn_block(b, *(w++));
struct hash_entry *case_entry =
_mesa_hash_table_search(block_to_case, case_block);
struct vtn_case *cse;
if (case_entry) {
cse = case_entry->data;
} else {
cse = rzalloc(b, struct vtn_case);
cse->node.type = vtn_cf_node_type_case;
cse->node.parent = &swtch->node;
list_inithead(&cse->body);
util_dynarray_init(&cse->values, b);
cse->type = vtn_handle_branch(b, &swtch->node, case_block);
switch (cse->type) {
case vtn_branch_type_none:
/* This is a "real" cases which has stuff in it */
vtn_fail_if(case_block->switch_case != NULL,
"OpSwitch has a case which is also in another "
"OpSwitch construct");
case_block->switch_case = cse;
vtn_add_cfg_work_item(b, work_list, &cse->node,
&cse->body, case_block);
break;
case vtn_branch_type_switch_break:
case vtn_branch_type_loop_break:
case vtn_branch_type_loop_continue:
/* Switch breaks as well as loop breaks and continues can be
* used to break out of a switch construct or as direct targets
* of the OpSwitch.
*/
break;
default:
vtn_fail("Target of OpSwitch is not a valid structured exit "
"from the switch construct.");
}
list_addtail(&cse->node.link, &swtch->cases);
_mesa_hash_table_insert(block_to_case, case_block, cse);
}
if (is_default) {
cse->is_default = true;
} else {
util_dynarray_append(&cse->values, uint64_t, literal);
}
is_default = false;
}
_mesa_hash_table_destroy(block_to_case, NULL);
return swtch->break_block;
}
@@ -1271,6 +1285,13 @@ vtn_emit_cf_list(struct vtn_builder *b, struct list_head *cf_list,
vtn_foreach_cf_node(case_node, &vtn_switch->cases) {
struct vtn_case *cse = vtn_cf_node_as_case(case_node);
/* If this case jumps directly to the break block, we don't have
* to handle the case as the body is empty and doesn't fall
* through.
*/
if (cse->block == vtn_switch->break_block)
continue;
/* Figure out the condition */
nir_ssa_def *cond =
vtn_switch_case_condition(b, vtn_switch, sel, cse);


@@ -185,6 +185,8 @@ struct vtn_if {
struct vtn_case {
struct vtn_cf_node node;
struct vtn_block *block;
enum vtn_branch_type type;
struct list_head body;


@@ -769,7 +769,8 @@ vl_mpeg12_end_frame(struct pipe_video_codec *decoder,
vl_vb_unmap(&buf->vertex_stream, dec->context);
dec->context->transfer_unmap(dec->context, buf->tex_transfer);
if (buf->tex_transfer)
dec->context->transfer_unmap(dec->context, buf->tex_transfer);
vb[0] = dec->quads;
vb[1] = dec->pos;
@@ -982,28 +983,28 @@ init_idct(struct vl_mpeg12_decoder *dec, const struct format_config* format_conf
nr_of_idct_render_targets = 1;
formats[0] = formats[1] = formats[2] = format_config->idct_source_format;
assert(pipe_format_to_chroma_format(formats[0]) == dec->base.chroma_format);
memset(&templat, 0, sizeof(templat));
templat.width = dec->base.width / 4;
templat.height = dec->base.height;
dec->idct_source = vl_video_buffer_create_ex
(
dec->context, &templat,
formats, 1, 1, PIPE_USAGE_DEFAULT
formats, 1, 1, PIPE_USAGE_DEFAULT,
PIPE_VIDEO_CHROMA_FORMAT_420
);
if (!dec->idct_source)
goto error_idct_source;
formats[0] = formats[1] = formats[2] = format_config->mc_source_format;
assert(pipe_format_to_chroma_format(formats[0]) == dec->base.chroma_format);
memset(&templat, 0, sizeof(templat));
templat.width = dec->base.width / nr_of_idct_render_targets;
templat.height = dec->base.height / 4;
dec->mc_source = vl_video_buffer_create_ex
(
dec->context, &templat,
formats, nr_of_idct_render_targets, 1, PIPE_USAGE_DEFAULT
formats, nr_of_idct_render_targets, 1, PIPE_USAGE_DEFAULT,
PIPE_VIDEO_CHROMA_FORMAT_420
);
if (!dec->mc_source)
@@ -1054,9 +1055,10 @@ init_mc_source_widthout_idct(struct vl_mpeg12_decoder *dec, const struct format_
dec->mc_source = vl_video_buffer_create_ex
(
dec->context, &templat,
formats, 1, 1, PIPE_USAGE_DEFAULT
formats, 1, 1, PIPE_USAGE_DEFAULT,
PIPE_VIDEO_CHROMA_FORMAT_420
);
return dec->mc_source != NULL;
}


@@ -85,7 +85,8 @@ vl_video_buffer_template(struct pipe_resource *templ,
const struct pipe_video_buffer *tmpl,
enum pipe_format resource_format,
unsigned depth, unsigned array_size,
unsigned usage, unsigned plane)
unsigned usage, unsigned plane,
enum pipe_video_chroma_format chroma_format)
{
assert(0);
}


@@ -352,11 +352,13 @@ vl_vb_unmap(struct vl_vertex_buffer *buffer, struct pipe_context *pipe)
assert(buffer && pipe);
for (i = 0; i < VL_NUM_COMPONENTS; ++i) {
pipe_buffer_unmap(pipe, buffer->ycbcr[i].transfer);
if (buffer->ycbcr[i].transfer)
pipe_buffer_unmap(pipe, buffer->ycbcr[i].transfer);
}
for (i = 0; i < VL_MAX_REF_FRAMES; ++i) {
pipe_buffer_unmap(pipe, buffer->mv[i].transfer);
if (buffer->mv[i].transfer)
pipe_buffer_unmap(pipe, buffer->mv[i].transfer);
}
}


@@ -169,7 +169,8 @@ vl_video_buffer_template(struct pipe_resource *templ,
const struct pipe_video_buffer *tmpl,
enum pipe_format resource_format,
unsigned depth, unsigned array_size,
unsigned usage, unsigned plane)
unsigned usage, unsigned plane,
enum pipe_video_chroma_format chroma_format)
{
unsigned height = tmpl->height;
@@ -188,7 +189,7 @@ vl_video_buffer_template(struct pipe_resource *templ,
templ->usage = usage;
vl_video_buffer_adjust_size(&templ->width0, &height, plane,
pipe_format_to_chroma_format(tmpl->buffer_format), false);
chroma_format, false);
templ->height0 = height;
}
@@ -372,7 +373,8 @@ vl_video_buffer_create(struct pipe_context *pipe,
result = vl_video_buffer_create_ex
(
pipe, &templat, resource_formats,
1, tmpl->interlaced ? 2 : 1, PIPE_USAGE_DEFAULT
1, tmpl->interlaced ? 2 : 1, PIPE_USAGE_DEFAULT,
pipe_format_to_chroma_format(templat.buffer_format)
);
@@ -386,7 +388,8 @@ struct pipe_video_buffer *
vl_video_buffer_create_ex(struct pipe_context *pipe,
const struct pipe_video_buffer *tmpl,
const enum pipe_format resource_formats[VL_NUM_COMPONENTS],
unsigned depth, unsigned array_size, unsigned usage)
unsigned depth, unsigned array_size, unsigned usage,
enum pipe_video_chroma_format chroma_format)
{
struct pipe_resource res_tmpl;
struct pipe_resource *resources[VL_NUM_COMPONENTS];
@@ -396,7 +399,8 @@ vl_video_buffer_create_ex(struct pipe_context *pipe,
memset(resources, 0, sizeof resources);
vl_video_buffer_template(&res_tmpl, tmpl, resource_formats[0], depth, array_size, usage, 0);
vl_video_buffer_template(&res_tmpl, tmpl, resource_formats[0], depth, array_size,
usage, 0, chroma_format);
resources[0] = pipe->screen->resource_create(pipe->screen, &res_tmpl);
if (!resources[0])
goto error;
@@ -406,7 +410,8 @@ vl_video_buffer_create_ex(struct pipe_context *pipe,
return vl_video_buffer_create_ex2(pipe, tmpl, resources);
}
vl_video_buffer_template(&res_tmpl, tmpl, resource_formats[1], depth, array_size, usage, 1);
vl_video_buffer_template(&res_tmpl, tmpl, resource_formats[1], depth, array_size,
usage, 1, chroma_format);
resources[1] = pipe->screen->resource_create(pipe->screen, &res_tmpl);
if (!resources[1])
goto error;
@@ -414,7 +419,8 @@ vl_video_buffer_create_ex(struct pipe_context *pipe,
if (resource_formats[2] == PIPE_FORMAT_NONE)
return vl_video_buffer_create_ex2(pipe, tmpl, resources);
vl_video_buffer_template(&res_tmpl, tmpl, resource_formats[2], depth, array_size, usage, 2);
vl_video_buffer_template(&res_tmpl, tmpl, resource_formats[2], depth, array_size,
usage, 2, chroma_format);
resources[2] = pipe->screen->resource_create(pipe->screen, &res_tmpl);
if (!resources[2])
goto error;


@@ -119,7 +119,8 @@ vl_video_buffer_template(struct pipe_resource *templ,
const struct pipe_video_buffer *templat,
enum pipe_format resource_format,
unsigned depth, unsigned array_size,
unsigned usage, unsigned plane);
unsigned usage, unsigned plane,
enum pipe_video_chroma_format chroma_format);
/**
* creates a video buffer, can be used as a standard implementation for pipe->create_video_buffer
@@ -135,7 +136,8 @@ struct pipe_video_buffer *
vl_video_buffer_create_ex(struct pipe_context *pipe,
const struct pipe_video_buffer *templat,
const enum pipe_format resource_formats[VL_NUM_COMPONENTS],
unsigned depth, unsigned array_size, unsigned usage);
unsigned depth, unsigned array_size, unsigned usage,
enum pipe_video_chroma_format chroma_format);
/**
* even more extended create function, provide the pipe_resource for each plane


@@ -68,7 +68,7 @@ struct etna_sampler_view {
uint32_t TE_SAMPLER_SIZE;
uint32_t TE_SAMPLER_LOG_SIZE;
uint32_t TE_SAMPLER_ASTC0;
uint32_t TE_SAMPLER_LINEAR_STRIDE[VIVS_TE_SAMPLER_LINEAR_STRIDE__LEN];
uint32_t TE_SAMPLER_LINEAR_STRIDE; /* only LOD0 */
struct etna_reloc TE_SAMPLER_LOD_ADDR[VIVS_TE_SAMPLER_LOD_ADDR__LEN];
unsigned min_lod, max_lod; /* 5.5 fixp */
@@ -211,12 +211,11 @@ etna_create_sampler_view_state(struct pipe_context *pctx, struct pipe_resource *
if (res->layout == ETNA_LAYOUT_LINEAR && !util_format_is_compressed(so->format)) {
sv->TE_SAMPLER_CONFIG0 |= VIVS_TE_SAMPLER_CONFIG0_ADDRESSING_MODE(TEXTURE_ADDRESSING_MODE_LINEAR);
for (int lod = 0; lod <= res->base.last_level; ++lod)
sv->TE_SAMPLER_LINEAR_STRIDE[lod] = res->levels[lod].stride;
assert(res->base.last_level == 0);
sv->TE_SAMPLER_LINEAR_STRIDE = res->levels[0].stride;
} else {
sv->TE_SAMPLER_CONFIG0 |= VIVS_TE_SAMPLER_CONFIG0_ADDRESSING_MODE(TEXTURE_ADDRESSING_MODE_TILED);
memset(&sv->TE_SAMPLER_LINEAR_STRIDE, 0, sizeof(sv->TE_SAMPLER_LINEAR_STRIDE));
sv->TE_SAMPLER_LINEAR_STRIDE = 0;
}
sv->TE_SAMPLER_CONFIG1 |= COND(ext, VIVS_TE_SAMPLER_CONFIG1_FORMAT_EXT(format)) |
@@ -406,12 +405,11 @@ etna_emit_texture_state(struct etna_context *ctx)
}
}
if (unlikely(dirty & (ETNA_DIRTY_SAMPLER_VIEWS))) {
for (int y = 0; y < VIVS_TE_SAMPLER_LINEAR_STRIDE__LEN; ++y) {
for (int x = 0; x < VIVS_TE_SAMPLER__LEN; ++x) {
if ((1 << x) & active_samplers) {
struct etna_sampler_view *sv = etna_sampler_view(ctx->sampler_view[x]);
/*02C00*/ EMIT_STATE(TE_SAMPLER_LINEAR_STRIDE(x, y), sv->TE_SAMPLER_LINEAR_STRIDE[y]);
}
/* only LOD0 is valid for this register */
for (int x = 0; x < VIVS_TE_SAMPLER__LEN; ++x) {
if ((1 << x) & active_samplers) {
struct etna_sampler_view *sv = etna_sampler_view(ctx->sampler_view[x]);
/*02C00*/ EMIT_STATE(TE_SAMPLER_LINEAR_STRIDE(0, x), sv->TE_SAMPLER_LINEAR_STRIDE);
}
}
}


@@ -66,6 +66,8 @@ struct pipe_video_buffer *r600_video_buffer_create(struct pipe_context *pipe,
struct pipe_video_buffer template;
struct pipe_resource templ;
unsigned i, array_size;
enum pipe_video_chroma_format chroma_format =
pipe_format_to_chroma_format(tmpl->buffer_format);
assert(pipe);
@@ -77,7 +79,8 @@ struct pipe_video_buffer *r600_video_buffer_create(struct pipe_context *pipe,
template.width = align(tmpl->width, VL_MACROBLOCK_WIDTH);
template.height = align(tmpl->height / array_size, VL_MACROBLOCK_HEIGHT);
vl_video_buffer_template(&templ, &template, resource_formats[0], 1, array_size, PIPE_USAGE_DEFAULT, 0);
vl_video_buffer_template(&templ, &template, resource_formats[0], 1, array_size,
PIPE_USAGE_DEFAULT, 0, chroma_format);
if (ctx->b.chip_class < EVERGREEN || tmpl->interlaced || !R600_UVD_ENABLE_TILING)
templ.bind = PIPE_BIND_LINEAR;
resources[0] = (struct r600_texture *)
@@ -86,7 +89,8 @@ struct pipe_video_buffer *r600_video_buffer_create(struct pipe_context *pipe,
goto error;
if (resource_formats[1] != PIPE_FORMAT_NONE) {
vl_video_buffer_template(&templ, &template, resource_formats[1], 1, array_size, PIPE_USAGE_DEFAULT, 1);
vl_video_buffer_template(&templ, &template, resource_formats[1], 1, array_size,
PIPE_USAGE_DEFAULT, 1, chroma_format);
if (ctx->b.chip_class < EVERGREEN || tmpl->interlaced || !R600_UVD_ENABLE_TILING)
templ.bind = PIPE_BIND_LINEAR;
resources[1] = (struct r600_texture *)
@@ -96,7 +100,8 @@ struct pipe_video_buffer *r600_video_buffer_create(struct pipe_context *pipe,
}
if (resource_formats[2] != PIPE_FORMAT_NONE) {
vl_video_buffer_template(&templ, &template, resource_formats[2], 1, array_size, PIPE_USAGE_DEFAULT, 2);
vl_video_buffer_template(&templ, &template, resource_formats[2], 1, array_size,
PIPE_USAGE_DEFAULT, 2, chroma_format);
if (ctx->b.chip_class < EVERGREEN || tmpl->interlaced || !R600_UVD_ENABLE_TILING)
templ.bind = PIPE_BIND_LINEAR;
resources[2] = (struct r600_texture *)


@@ -677,27 +677,26 @@ static void si_setup_nir_user_data(struct si_context *sctx, const struct pipe_gr
12 * sel->info.uses_grid_size;
unsigned cs_user_data_reg = block_size_reg + 12 * program->reads_variable_block_size;
if (info->indirect) {
if (sel->info.uses_grid_size) {
if (sel->info.uses_grid_size) {
if (info->indirect) {
for (unsigned i = 0; i < 3; ++i) {
si_cp_copy_data(sctx, sctx->gfx_cs, COPY_DATA_REG, NULL, (grid_size_reg >> 2) + i,
COPY_DATA_SRC_MEM, si_resource(info->indirect),
info->indirect_offset + 4 * i);
}
}
} else {
if (sel->info.uses_grid_size) {
} else {
radeon_set_sh_reg_seq(cs, grid_size_reg, 3);
radeon_emit(cs, info->grid[0]);
radeon_emit(cs, info->grid[1]);
radeon_emit(cs, info->grid[2]);
}
if (program->reads_variable_block_size) {
radeon_set_sh_reg_seq(cs, block_size_reg, 3);
radeon_emit(cs, info->block[0]);
radeon_emit(cs, info->block[1]);
radeon_emit(cs, info->block[2]);
}
}
if (program->reads_variable_block_size) {
radeon_set_sh_reg_seq(cs, block_size_reg, 3);
radeon_emit(cs, info->block[0]);
radeon_emit(cs, info->block[1]);
radeon_emit(cs, info->block[2]);
}
if (program->num_cs_user_data_dwords) {


@@ -834,11 +834,12 @@ blorp_can_hiz_clear_depth(const struct gen_device_info *devinfo,
const bool unaligned = (slice_x0 + x0) % 16 || (slice_y0 + y0) % 8 ||
(max_x1_y1 ? haligned_x1 % 16 || valigned_y1 % 8 :
x1 % 16 || y1 % 8);
const bool alignment_used = surf->levels > 1 ||
surf->logical_level0_px.depth > 1 ||
surf->logical_level0_px.array_len > 1;
const bool partial_clear = x0 > 0 || y0 > 0 || !max_x1_y1;
const bool multislice_surf = surf->levels > 1 ||
surf->logical_level0_px.depth > 1 ||
surf->logical_level0_px.array_len > 1;
if (unaligned && alignment_used)
if (unaligned && (partial_clear || multislice_surf))
return false;
}


@@ -901,6 +901,11 @@ enum surface_logical_srcs {
SURFACE_LOGICAL_SRC_IMM_DIMS,
/** Per-opcode immediate argument. For atomics, this is the atomic opcode */
SURFACE_LOGICAL_SRC_IMM_ARG,
/**
* Some instructions with side-effects should not be predicated on
* sample mask, e.g. lowered stores to scratch.
*/
SURFACE_LOGICAL_SRC_ALLOW_SAMPLE_MASK,
SURFACE_LOGICAL_NUM_SRCS
};
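The hunks below thread this new source through every surface message. A compressed sketch of the convention they establish (hypothetical helper, not part of the patch): reads pass 0, writes and atomics pass 1, and scratch accesses always pass 0 so helper invocations keep reading defined values.

#include <stdbool.h>

/* Hypothetical summary of the ALLOW_SAMPLE_MASK immediates assigned below:
 * reads -> 0, writes/atomics -> 1, scratch access -> always 0. */
static unsigned allow_sample_mask_for(bool is_write_or_atomic, bool is_scratch)
{
   return (is_write_or_atomic && !is_scratch) ? 1u : 0u;
}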


@@ -5462,7 +5462,10 @@ lower_surface_logical_send(const fs_builder &bld, fs_inst *inst)
const fs_reg &surface_handle = inst->src[SURFACE_LOGICAL_SRC_SURFACE_HANDLE];
const UNUSED fs_reg &dims = inst->src[SURFACE_LOGICAL_SRC_IMM_DIMS];
const fs_reg &arg = inst->src[SURFACE_LOGICAL_SRC_IMM_ARG];
const fs_reg &allow_sample_mask =
inst->src[SURFACE_LOGICAL_SRC_ALLOW_SAMPLE_MASK];
assert(arg.file == IMM);
assert(allow_sample_mask.file == IMM);
/* We must have exactly one of surface and surface_handle */
assert((surface.file == BAD_FILE) != (surface_handle.file == BAD_FILE));
@@ -5486,8 +5489,9 @@ lower_surface_logical_send(const fs_builder &bld, fs_inst *inst)
surface.ud == GEN8_BTI_STATELESS_NON_COHERENT);
const bool has_side_effects = inst->has_side_effects();
fs_reg sample_mask = has_side_effects ? sample_mask_reg(bld) :
fs_reg(brw_imm_d(0xffff));
fs_reg sample_mask = allow_sample_mask.ud ? sample_mask_reg(bld) :
fs_reg(brw_imm_d(0xffff));
/* From the BDW PRM Volume 7, page 147:
*


@@ -3767,6 +3767,7 @@ fs_visitor::nir_emit_cs_intrinsic(const fs_builder &bld,
srcs[SURFACE_LOGICAL_SRC_SURFACE] = brw_imm_ud(surface);
srcs[SURFACE_LOGICAL_SRC_IMM_DIMS] = brw_imm_ud(1);
srcs[SURFACE_LOGICAL_SRC_IMM_ARG] = brw_imm_ud(1); /* num components */
srcs[SURFACE_LOGICAL_SRC_ALLOW_SAMPLE_MASK] = brw_imm_ud(0);
/* Read the 3 GLuint components of gl_NumWorkGroups */
for (unsigned i = 0; i < 3; i++) {
@@ -3804,6 +3805,7 @@ fs_visitor::nir_emit_cs_intrinsic(const fs_builder &bld,
srcs[SURFACE_LOGICAL_SRC_SURFACE] = brw_imm_ud(GEN7_BTI_SLM);
srcs[SURFACE_LOGICAL_SRC_ADDRESS] = get_nir_src(instr->src[0]);
srcs[SURFACE_LOGICAL_SRC_IMM_DIMS] = brw_imm_ud(1);
srcs[SURFACE_LOGICAL_SRC_ALLOW_SAMPLE_MASK] = brw_imm_ud(0);
/* Make dest unsigned because that's what the temporary will be */
dest.type = brw_reg_type_from_bit_size(bit_size, BRW_REGISTER_TYPE_UD);
@@ -3840,6 +3842,7 @@ fs_visitor::nir_emit_cs_intrinsic(const fs_builder &bld,
srcs[SURFACE_LOGICAL_SRC_SURFACE] = brw_imm_ud(GEN7_BTI_SLM);
srcs[SURFACE_LOGICAL_SRC_ADDRESS] = get_nir_src(instr->src[1]);
srcs[SURFACE_LOGICAL_SRC_IMM_DIMS] = brw_imm_ud(1);
srcs[SURFACE_LOGICAL_SRC_ALLOW_SAMPLE_MASK] = brw_imm_ud(1);
fs_reg data = get_nir_src(instr->src[0]);
data.type = brw_reg_type_from_bit_size(bit_size, BRW_REGISTER_TYPE_UD);
@@ -4123,6 +4126,7 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
if (instr->intrinsic == nir_intrinsic_image_load ||
instr->intrinsic == nir_intrinsic_bindless_image_load) {
srcs[SURFACE_LOGICAL_SRC_IMM_ARG] = brw_imm_ud(instr->num_components);
srcs[SURFACE_LOGICAL_SRC_ALLOW_SAMPLE_MASK] = brw_imm_ud(0);
fs_inst *inst =
bld.emit(SHADER_OPCODE_TYPED_SURFACE_READ_LOGICAL,
dest, srcs, SURFACE_LOGICAL_NUM_SRCS);
@@ -4131,6 +4135,7 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
instr->intrinsic == nir_intrinsic_bindless_image_store) {
srcs[SURFACE_LOGICAL_SRC_IMM_ARG] = brw_imm_ud(instr->num_components);
srcs[SURFACE_LOGICAL_SRC_DATA] = get_nir_src(instr->src[3]);
srcs[SURFACE_LOGICAL_SRC_ALLOW_SAMPLE_MASK] = brw_imm_ud(1);
bld.emit(SHADER_OPCODE_TYPED_SURFACE_WRITE_LOGICAL,
fs_reg(), srcs, SURFACE_LOGICAL_NUM_SRCS);
} else {
@@ -4153,6 +4158,7 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
data = tmp;
}
srcs[SURFACE_LOGICAL_SRC_DATA] = data;
srcs[SURFACE_LOGICAL_SRC_ALLOW_SAMPLE_MASK] = brw_imm_ud(1);
bld.emit(SHADER_OPCODE_TYPED_ATOMIC_LOGICAL,
dest, srcs, SURFACE_LOGICAL_NUM_SRCS);
@@ -4210,6 +4216,7 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
srcs[SURFACE_LOGICAL_SRC_ADDRESS] = get_nir_src(instr->src[1]);
srcs[SURFACE_LOGICAL_SRC_IMM_DIMS] = brw_imm_ud(1);
srcs[SURFACE_LOGICAL_SRC_IMM_ARG] = brw_imm_ud(instr->num_components);
srcs[SURFACE_LOGICAL_SRC_ALLOW_SAMPLE_MASK] = brw_imm_ud(0);
fs_inst *inst =
bld.emit(SHADER_OPCODE_UNTYPED_SURFACE_READ_LOGICAL,
@@ -4229,6 +4236,7 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
srcs[SURFACE_LOGICAL_SRC_DATA] = get_nir_src(instr->src[2]);
srcs[SURFACE_LOGICAL_SRC_IMM_DIMS] = brw_imm_ud(1);
srcs[SURFACE_LOGICAL_SRC_IMM_ARG] = brw_imm_ud(instr->num_components);
srcs[SURFACE_LOGICAL_SRC_ALLOW_SAMPLE_MASK] = brw_imm_ud(1);
bld.emit(SHADER_OPCODE_UNTYPED_SURFACE_WRITE_LOGICAL,
fs_reg(), srcs, SURFACE_LOGICAL_NUM_SRCS);
@@ -4643,6 +4651,7 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
get_nir_ssbo_intrinsic_index(bld, instr);
srcs[SURFACE_LOGICAL_SRC_ADDRESS] = get_nir_src(instr->src[1]);
srcs[SURFACE_LOGICAL_SRC_IMM_DIMS] = brw_imm_ud(1);
srcs[SURFACE_LOGICAL_SRC_ALLOW_SAMPLE_MASK] = brw_imm_ud(0);
/* Make dest unsigned because that's what the temporary will be */
dest.type = brw_reg_type_from_bit_size(bit_size, BRW_REGISTER_TYPE_UD);
@@ -4682,6 +4691,7 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
get_nir_ssbo_intrinsic_index(bld, instr);
srcs[SURFACE_LOGICAL_SRC_ADDRESS] = get_nir_src(instr->src[2]);
srcs[SURFACE_LOGICAL_SRC_IMM_DIMS] = brw_imm_ud(1);
srcs[SURFACE_LOGICAL_SRC_ALLOW_SAMPLE_MASK] = brw_imm_ud(1);
fs_reg data = get_nir_src(instr->src[0]);
data.type = brw_reg_type_from_bit_size(bit_size, BRW_REGISTER_TYPE_UD);
@@ -4820,6 +4830,7 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
srcs[SURFACE_LOGICAL_SRC_IMM_DIMS] = brw_imm_ud(1);
srcs[SURFACE_LOGICAL_SRC_IMM_ARG] = brw_imm_ud(bit_size);
srcs[SURFACE_LOGICAL_SRC_ALLOW_SAMPLE_MASK] = brw_imm_ud(0);
const fs_reg nir_addr = get_nir_src(instr->src[0]);
/* Make dest unsigned because that's what the temporary will be */
@@ -4865,6 +4876,14 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
srcs[SURFACE_LOGICAL_SRC_IMM_DIMS] = brw_imm_ud(1);
srcs[SURFACE_LOGICAL_SRC_IMM_ARG] = brw_imm_ud(bit_size);
/**
* While this instruction has side-effects, it should not be predicated
* on sample mask, because otherwise fs helper invocations would
* load undefined values from scratch memory. And scratch memory
* load-stores are produced from operations without side-effects, thus
* they should not have different behaviour in the helper invocations.
*/
srcs[SURFACE_LOGICAL_SRC_ALLOW_SAMPLE_MASK] = brw_imm_ud(0);
const fs_reg nir_addr = get_nir_src(instr->src[1]);
fs_reg data = get_nir_src(instr->src[0]);
@@ -5316,6 +5335,7 @@ fs_visitor::nir_emit_ssbo_atomic(const fs_builder &bld,
srcs[SURFACE_LOGICAL_SRC_ADDRESS] = get_nir_src(instr->src[1]);
srcs[SURFACE_LOGICAL_SRC_IMM_DIMS] = brw_imm_ud(1);
srcs[SURFACE_LOGICAL_SRC_IMM_ARG] = brw_imm_ud(op);
srcs[SURFACE_LOGICAL_SRC_ALLOW_SAMPLE_MASK] = brw_imm_ud(1);
fs_reg data;
if (op != BRW_AOP_INC && op != BRW_AOP_DEC && op != BRW_AOP_PREDEC)
@@ -5351,6 +5371,7 @@ fs_visitor::nir_emit_ssbo_atomic_float(const fs_builder &bld,
srcs[SURFACE_LOGICAL_SRC_ADDRESS] = get_nir_src(instr->src[1]);
srcs[SURFACE_LOGICAL_SRC_IMM_DIMS] = brw_imm_ud(1);
srcs[SURFACE_LOGICAL_SRC_IMM_ARG] = brw_imm_ud(op);
srcs[SURFACE_LOGICAL_SRC_ALLOW_SAMPLE_MASK] = brw_imm_ud(1);
fs_reg data = get_nir_src(instr->src[2]);
if (op == BRW_AOP_FCMPWR) {
@@ -5379,6 +5400,7 @@ fs_visitor::nir_emit_shared_atomic(const fs_builder &bld,
srcs[SURFACE_LOGICAL_SRC_SURFACE] = brw_imm_ud(GEN7_BTI_SLM);
srcs[SURFACE_LOGICAL_SRC_IMM_DIMS] = brw_imm_ud(1);
srcs[SURFACE_LOGICAL_SRC_IMM_ARG] = brw_imm_ud(op);
srcs[SURFACE_LOGICAL_SRC_ALLOW_SAMPLE_MASK] = brw_imm_ud(1);
fs_reg data;
if (op != BRW_AOP_INC && op != BRW_AOP_DEC && op != BRW_AOP_PREDEC)
@@ -5420,6 +5442,7 @@ fs_visitor::nir_emit_shared_atomic_float(const fs_builder &bld,
srcs[SURFACE_LOGICAL_SRC_SURFACE] = brw_imm_ud(GEN7_BTI_SLM);
srcs[SURFACE_LOGICAL_SRC_IMM_DIMS] = brw_imm_ud(1);
srcs[SURFACE_LOGICAL_SRC_IMM_ARG] = brw_imm_ud(op);
srcs[SURFACE_LOGICAL_SRC_ALLOW_SAMPLE_MASK] = brw_imm_ud(1);
fs_reg data = get_nir_src(instr->src[1]);
if (op == BRW_AOP_FCMPWR) {


@@ -77,6 +77,7 @@ namespace {
case BRW_OPCODE_DO:
case SHADER_OPCODE_UNDEF:
case FS_OPCODE_PLACEHOLDER_HALT:
case FS_OPCODE_SCHEDULING_FENCE:
return 0;
default:
/* Note that the following is inaccurate for virtual instructions


@@ -107,12 +107,29 @@ brw_nir_clamp_image_1d_2d_array_sizes(nir_shader *shader)
b.cursor = nir_after_instr(instr);
nir_ssa_def *components[4];
/* OR all the sizes for all components but the last. */
nir_ssa_def *or_components = nir_imm_int(&b, 0);
for (int i = 0; i < image_size->num_components; i++) {
if (i == (image_size->num_components - 1)) {
components[i] = nir_imax(&b, nir_channel(&b, image_size, i),
nir_imm_int(&b, 1));
nir_ssa_def *null_or_size[2] = {
nir_imm_int(&b, 0),
nir_imax(&b, nir_channel(&b, image_size, i),
nir_imm_int(&b, 1)),
};
nir_ssa_def *vec2_null_or_size = nir_vec(&b, null_or_size, 2);
/* Using the ORed sizes select either the element 0 or 1
* from this vec2. For NULL textures which have a size of
* 0x0x0, we'll select the first element which is 0 and for
* the rest MAX(depth, 1).
*/
components[i] =
nir_vector_extract(&b, vec2_null_or_size,
nir_imin(&b, or_components,
nir_imm_int(&b, 1)));
} else {
components[i] = nir_channel(&b, image_size, i);
or_components = nir_ior(&b, components[i], or_components);
}
}
nir_ssa_def *image_size_replacement =

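A standalone scalar model of what the NIR sequence above computes (hypothetical helper, plain C instead of builder calls): the last component is forced to 0 only when the OR of the other components is 0, which is the case for NULL surfaces reporting a 0x0x0 size.

#include <stdio.h>

/* Hypothetical scalar model: what the last (depth/array) component of the
 * reported image size becomes after the lowering above. */
static int clamp_last_component(int width, int height, int depth)
{
   int or_components = width | height;              /* OR of all but the last */
   int options[2] = { 0, depth > 1 ? depth : 1 };   /* the vec2 in the NIR code */
   int idx = or_components > 1 ? 1 : or_components; /* imin(or_components, 1) */
   return options[idx];
}

int main(void)
{
   printf("%d\n", clamp_last_component(0, 0, 0));   /* NULL surface    -> 0 */
   printf("%d\n", clamp_last_component(64, 64, 1)); /* 2D array, len 1 -> 1 */
   return 0;
}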

@@ -203,7 +203,7 @@ libvulkan_intel = shared_library(
idep_nir, idep_genxml, idep_vulkan_util, idep_mesautil, idep_xmlconfig,
],
c_args : anv_flags,
link_args : ['-Wl,--build-id=sha1', ld_args_bsymbolic, ld_args_gc_sections],
link_args : [ld_args_build_id, ld_args_bsymbolic, ld_args_gc_sections],
install : true,
)


@@ -343,6 +343,7 @@ get_fb0_attachment(struct gl_context *ctx, struct gl_framebuffer *fb,
}
switch (attachment) {
case GL_FRONT:
case GL_FRONT_LEFT:
/* Front buffers can be allocated on the first use, but
* glGetFramebufferAttachmentParameteriv must work even if that


@@ -1043,10 +1043,12 @@ copy_uniforms_to_storage(gl_constant_value *storage,
const unsigned offset, const unsigned components,
enum glsl_base_type basicType)
{
if (!uni->type->is_boolean() && !uni->is_bindless) {
bool copy_as_uint64 = uni->is_bindless &&
(uni->type->is_sampler() || uni->type->is_image());
if (!uni->type->is_boolean() && !copy_as_uint64) {
memcpy(storage, values,
sizeof(storage[0]) * components * count * size_mul);
} else if (uni->is_bindless) {
} else if (copy_as_uint64) {
const union gl_constant_value *src =
(const union gl_constant_value *) values;
GLuint64 *dst = (GLuint64 *)&storage->i;


@@ -132,7 +132,7 @@ st_convert_sampler(const struct st_context *st,
* levels.
*/
sampler->lod_bias = CLAMP(sampler->lod_bias, -16, 16);
sampler->lod_bias = floorf(sampler->lod_bias * 256) / 256;
sampler->lod_bias = roundf(sampler->lod_bias * 256) / 256;
sampler->min_lod = MAX2(msamp->MinLod, 0.0f);
sampler->max_lod = msamp->MaxLod;
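A small standalone illustration of why the symmetric rounding matters once the bias is quantized to 1/256 steps (example values, not taken from the patch):

#include <math.h>
#include <stdio.h>

int main(void)
{
   float bias = -0.0005f; /* an application-supplied LOD bias just below zero */

   /* Old behaviour: floorf rounds toward -inf, so a tiny negative bias
    * snaps to -1/256 while a tiny positive bias snaps to 0. */
   float old_q = floorf(bias * 256) / 256;

   /* New behaviour: roundf is symmetric around zero, so tiny biases of
    * either sign snap to 0. */
   float new_q = roundf(bias * 256) / 256;

   printf("floorf: %.8f  roundf: %.8f\n", old_q, new_q);
   return 0;
}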


@@ -109,7 +109,8 @@ st_server_wait_semaphore(struct gl_context *ctx,
continue;
bufObj = st_buffer_object(bufObjs[i]);
pipe->flush_resource(pipe, bufObj->buffer);
if (bufObj->buffer)
pipe->flush_resource(pipe, bufObj->buffer);
}
for (unsigned i = 0; i < numTextureBarriers; i++) {
@@ -117,7 +118,8 @@ st_server_wait_semaphore(struct gl_context *ctx,
continue;
texObj = st_texture_object(texObjs[i]);
pipe->flush_resource(pipe, texObj->pt);
if (texObj->pt)
pipe->flush_resource(pipe, texObj->pt);
}
}
@@ -141,7 +143,8 @@ st_server_signal_semaphore(struct gl_context *ctx,
continue;
bufObj = st_buffer_object(bufObjs[i]);
pipe->flush_resource(pipe, bufObj->buffer);
if (bufObj->buffer)
pipe->flush_resource(pipe, bufObj->buffer);
}
for (unsigned i = 0; i < numTextureBarriers; i++) {
@@ -149,7 +152,8 @@ st_server_signal_semaphore(struct gl_context *ctx,
continue;
texObj = st_texture_object(texObjs[i]);
pipe->flush_resource(pipe, texObj->pt);
if (texObj->pt)
pipe->flush_resource(pipe, texObj->pt);
}
/* The driver is allowed to flush during fence_server_signal, be prepared */