The kernel driver requires immediate notification using a
BindGBSurface command when a graphics coherent memory resource changes
backing MOB, so that it can start dirty-tracking the new MOB.
Since we always use graphics coherent memory for persistent memory, enqueue
and flush a BindGBSurface commmand at map time rather than at unmap time.
Since we're dealing with persistent memory, It's OK to flush while mapped.
This fixes an issue with gnome-shell / Wayland which uses persistent
memory together with discard maps when we advertise ARB_buffer_storage.
XWayland clients will render incorrectly.
Fixes: 71b43490dd ("svga: Support ARB_buffer_storage")
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4399>
(cherry picked from commit 46fdc288fb)
If we have an allocation that's exactly the block size, we end up
computing a new block size to allocate that's exactly the block size,
add in the header, and then assert fail. When computing the block size,
we need to account for the header.
Fixes: 955127db93 "anv/allocator: Add support for large stream..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4336>
(cherry picked from commit 63bec07e14)
This code produces warnings, so let's fix that. The problem is that
casting a pointer to an integer of non-pointer-size triggers warnings on
MSVC, and on 64-bit Windows unsigned long is 32-bit large.
So let's instead use uintptr_t, which is exactly for these kinds of
things.
While we're at it, let's make the resulting index a plain "unsigned",
which is the type this originated from before we started with this
cast-dance.
Fixes: 1a66ead1c7 ("pipebuffer, winsys/svga: Add functionality to update pb_validate_entry flags")
Reviewed-by: Brian Paul <brianp@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4297>
(cherry picked from commit 079cb4949d)
Unlike other stages TCS outputs not read by the TES cannot always
be demoted to globals e.g. when they are read by other TCS
invocations.
We were not taking these outputs into account when packing which
could result in other outputs being assigned to the same location.
Here we make sure to gather information on these outputs and group
them together when packing.
This fixes rendering issues in QUBE 2 via Proton.
Closes: #2653
Fixes: 26aa460940 ("nir: rewrite varying component packing")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4328>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4328>
(cherry picked from commit b5e00f5c2b)
This has been reported to fix a hang in Shadow of Mordor on Gen12.
One of its compute shaders seems to cause an in-order exec_all
dependency to be merged into an out-of-order SET dependency slot,
which would prevent us from baking the SET dependency into the parent
instruction, leading to an assert failure in emit_inst_dependencies()
(Thanks to Rafael for noticing that). Prevent that by avoiding
combination of in-order dependencies whenever that would cause a SET
dependency to be demoted to a SYNC.NOP instruction.
Fixes: e14529ff32 "intel/fs/gen12: Workaround data coherency issues due to broken NoMask control flow."
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
(cherry picked from commit 36c155a017)
At least GC880 (iMX6S), GC2000 (iMX6Q) blobs do not emit the
PE.ALPHA_COLOR_EXT0 and PE.ALPHA_COLOR_EXT1 into the command
stream. The GCnano (STM32MP1) is not affected by this change
either. This is because neither of these GPUs support the
half-float feature.
Emit PE.ALPHA_COLOR_EXT* in etnaviv only if half-float support
is present in the GPU. This fixes all of the currently failing
dEQPs in this group:
dEQP-GLES2.functional.fragment_ops.blend.*
Fixes: 76adf041f2 ("etnaviv: fix blend color on newer GPUs")
Signed-off-by: Marek Vasut <marex@denx.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4277>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4277>
(cherry picked from commit 9e78f17b74)
Starting with Gen12, we can fast-clear a lot more surface formats and we
are suddenly in the position of having to fast-clear surfaces with
formats with an implicit swizzle such as VK_FORMAT_R4G4B4A4_UNORM_PACK16
which is represented as ISL_FORMAT_A4B4G4R4 with a BGRA swizzle. In
order for blorp to do the fast-clear color conversion for us, it needs
a properly swizzled color.
This fixes the following Vulkan CTS groups on TGL:
- dEQP-VK.pipeline.blend.format.b4g4r4a4_unorm_pack16.*
- dEQP-VK.api.image_clearing.core.clear_color_image.*.b4g4r4a4*
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4218>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4218>
(cherry picked from commit 46187bb54f)
This function has code like:
if (0x7FD <= zExp) {
if ((0x7FD < zExp) ||
((zExp == 0x7FD) &&
(0x001FFFFFu == zFrac0 && 0xFFFFFFFFu == zFrac1) &&
increment)) {
...
return ...;
}
if (zExp < 0) {
I saw that, and I thought, "Uh... what? Dead code?" I thought it was a
bit fishy, so I grabbed the Berkeley SoftFloat Library 3e code, and
there is similar code in softfloat_roundPackToF64
(source/s_roundPackToF64.c), but it has an extra (uint16_t) cast in the
first comparison. This is basicially a shortcut for
if (zExp < 0 || zExp >= 0x7FD) {
So, having the nesting kind of makes sense. On a CPU, nesting the flow
control can be an optimization. On a GPU, it's just fail. Split the
block so that we don't need the uint16_t cast magic.
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:
Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 683638 -> 658127 (-3.73%)
instructions in affected programs: 666839 -> 641328 (-3.83%)
helped: 92
HURT: 0
helped stats (abs) min: 26 max: 2456 x̄: 277.29 x̃: 144
helped stats (rel) min: 3.21% max: 4.22% x̄: 3.79% x̃: 3.90%
95% mean confidence interval for instructions value: -345.84 -208.75
95% mean confidence interval for instructions %-change: -3.86% -3.73%
Instructions are helped.
total cycles in shared programs: 5458858 -> 5344600 (-2.09%)
cycles in affected programs: 5360114 -> 5245856 (-2.13%)
helped: 92
HURT: 0
helped stats (abs) min: 126 max: 10300 x̄: 1241.93 x̃: 655
helped stats (rel) min: 1.71% max: 2.37% x̄: 2.12% x̃: 2.17%
95% mean confidence interval for cycles value: -1539.93 -943.94
95% mean confidence interval for cycles %-change: -2.16% -2.08%
Cycles are helped.
Fixes: f111d72596 ("glsl: Add "built-in" functions to do add(fp64, fp64)")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
(cherry picked from commit bf2eb3e0ee)
fsat is defined as min(max(a, 0.0), 1.0), and IEEE defines both min and
max to return the non-NaN value when one value is NaN. Based on this,
fsat should definitely return 0.0 for NaN.
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:
Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 841666 -> 841647 (<.01%)
instructions in affected programs: 122033 -> 122014 (-0.02%)
helped: 7
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 2.71 x̃: 3
helped stats (rel) min: 0.01% max: 0.02% x̄: 0.02% x̃: 0.01%
95% mean confidence interval for instructions value: -3.74 -1.69
95% mean confidence interval for instructions %-change: -0.02% -0.01%
Instructions are helped.
total cycles in shared programs: 6927246 -> 6926904 (<.01%)
cycles in affected programs: 1038987 -> 1038645 (-0.03%)
helped: 7
HURT: 0
helped stats (abs) min: 18 max: 72 x̄: 48.86 x̃: 54
helped stats (rel) min: 0.03% max: 0.05% x̄: 0.03% x̃: 0.03%
95% mean confidence interval for cycles value: -67.38 -30.33
95% mean confidence interval for cycles %-change: -0.05% -0.02%
Cycles are helped.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: a42163cbbc ("compiler: Add lowering support for 64-bit saturate operations to software")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2585
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
(cherry picked from commit 7673dcbd21)
In Fedora 32 build was failing with meson-0.53.2-1.git88e40c7.fc32
and gcc-10.0.1-0.9.fc32.x86_64.
Worked with meson-0.53.1-1 and same gcc.
/usr/bin/ld: src/gallium/state_trackers/dri/libdri.a(dri2.c.o): in function `dri2_interop_export_object':
/home/airlied/devel/mesa/mesa/build/../src/gallium/state_trackers/dri/dri2.c:1813: undefined reference to `st_finalize_texture'
/usr/bin/ld: src/gallium/state_trackers/dri/libdri.a(dri_screen.c.o): in function `dri_init_screen_helper':
/home/airlied/devel/mesa/mesa/build/../src/gallium/state_trackers/dri/dri_screen.c:580: undefined reference to `st_gl_api_create'
Moving this around seems to fix it.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4220>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4220>
(cherry picked from commit 040ce9a1b3)
Instead of uint64_t. Fixes potentially writing beyond the end of the
handles pointer array on 32-bit architectures (and copying all 0s
instead of the computed pointer values to the array on big endian
ones).
Corresponding compiler warning:
../src/gallium/drivers/llvmpipe/lp_state_cs.c: In function ‘llvmpipe_set_global_binding’:
../src/gallium/drivers/llvmpipe/lp_state_cs.c:1312:12: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
1312 | va = (uint64_t)((char *)lp_res->data + offset);
| ^
Fixes: 264663d55d "gallivm/llvmpipe: add support for global
operations."
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4166>
(cherry picked from commit 106bf59ca9)
There are multiple LLVM passes that very much move the
intrinsic using the descriptor outside of the loop, defeating
the entire point of creating the loop.
Defeat the optimizer by splitting the break into a separate
if-statement and putting an optimization barrier on the bool
in between.
v2: Move from a callback based system to begin/end loop.
This does not make it significantly less intrusive but
is a bit nicer with all the extra struct and callback
stubs.
v3: Deal with non-divergent values in divergent path.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2160
Fixes: 028ce52739 "radv: Add non-uniform indexing lowering."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit b83c9aca4a)
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4171>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4171>
The freedreno gen_header.py script now only works under python3.
It contains a "print()" call which prints a blank line under python3
but prints "()" under python2.7.
However the Android build currently uses python2.
This leads to incorrect code generation and a later build error.
.../STATIC_LIBRARIES/libfreedreno_registers_intermediates/registers/adreno_common.xml.h:163:2: error: expected identifier or '('
()
Fix this by adding MESA_PYTHON3 and using it for the freedreno scripts.
Signed-off-by: Martin Fuzzey <martin.fuzzey@flowbird.group>
(cherry picked from commit cad400a59e)
Signed-off-by: John Stultz <john.stultz@linaro.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4151>
The current workaround for this hardware bug involved marking the ADD
instruction used to initialize the address register as NoMask on
Gen12, which was based on the assumption that the problem was caused
by a hardware bug affecting the application of the execution mask to
the address register write.
However that doesn't seem to be the case: The address register write
was working correctly, the real problem leading to hangs on TGL is
that the indirect addressing logic is unable to deal with garbage
values in the address register (e.g. misaligned offsets), even for
channels which are currently inactive due to non-uniform control flow.
The current workaround isn't able to avoid that situation in general,
since the result of the NoMask ADD instruction for a dead channel is
calculated based on the corresponding (dead) component of the
indirect_byte_offset source, which would still be undefined in the
likely case that the source was initialized under control flow itself.
This would lead to hangs whenever MOV_INDIRECT was used under
non-uniform control flow in some scenarios like a tessellation shader
from GFXBench5/gl_4 (AKA Car Chase) on TGL. In addition I've managed
to reproduce the same issue on earlier platforms by initializing the
whole address register with garbage before the ADD instruction, so
this seems to be a long-standing issue we have avoided mostly by luck.
This patch fixes the problem and applies the workaround to all
platforms, since even when the hardware is able to deal with garbage
address values without hanging there might be a significant
performance cost from reading random GRF registers due to the useless
extra EU cycles spent fetching registers for dead channels and due to
the potential for unintended serialization with respect to other
random instructions that could be executed in parallel, which may have
had a cost of the order of hundreds of cycles in the worst case
scenario.
Fixes: f93dfb509c "intel/fs: Write the address register with NoMask for MOV_INDIRECT"
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 45d4665dc7)
We depend on BLORP to convert the clear color and write it into the
clear color buffer for us. However, we weren't bothering to call blorp
in the case where the state is ISL_AUX_STATE_CLEAR. This leads to the
clear color not getting properly updated if we have back-to-back clears
with different clear colors. Technically, we could go out of our way to
set the clear color directly from iris in this case but this is a case
we're unlikely to see in the wild so let's not bother. This matches
what we already do for color surfaces.
Cc: mesa-stable@lists.freedesktop.org
Reported-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4073>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4073>
(cherry picked from commit 9d07d59842)
This reverts commit f04d7439a0.
It no longer helps performance and the current vbo implementation is
faster anyway.
The app that hit this was a CAD program called Spazio3D. It made pretty
terrible use of the OpenGL API and we sent them some tips for improvements.
I'm assuming they've fixed this by now.
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4052>
(cherry picked from commit df3891e74a)
This reverts commit 35fc7bdf0e.
Unfortunately mentioned commit introduced a memory leak because
`driwindowsMapConfigs` and `createDriMode` functions allocate
small memory portions for each element:
21,576 (232 direct, 21,344 indirect) bytes in 1 blocks are definitely lost in loss record 1,411 of 1,414
at 0x483A7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0x5D4AA09: createDriMode (dri_common.c:291)
by 0x5D4ABF5: driConvertConfigs (dri_common.c:310)
by 0x5D58414: dri3_create_screen (dri3_glx.c:945)
by 0x5D39829: AllocAndFetchScreenConfigs (glxext.c:815)
by 0x5D39C57: __glXInitialize (glxext.c:941)
by 0x5D3290A: GetGLXPrivScreenConfig (glxcmds.c:174)
by 0x5D34F38: glXQueryExtensionsString (glxcmds.c:1307)
by 0x4F83038: glXQueryExtensionsString (in /usr/local/lib/libGL.so.1.7.0)
by 0x4F2EA6B: ??? (in /usr/lib/x86_64-linux-gnu/libwaffle-1.so.0.6.0)
by 0x4F2A0D7: waffle_display_connect (in /usr/lib/x86_64-linux-gnu/libwaffle-1.so.0.6.0)
by 0x498F42A: wfl_checked_display_connect (piglit-util-waffle.h:74)
There is one more thing which disallow us to easily fix it are different element sizes
for instance: `glx_config_create_list` allocates memory just for `glx_config`,
`driwindowsMapConfigs` for `driwindows_config` and
`createDriMode` for `__GLXDRIconfigPrivate`.
Yes it is possible but size of such fix
will be more big and complex than original one.
So it make sense only if the malloc overhead
really is a big problem there.
Acked-by: Eric Engestrom <eric@engestrom.ch>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3406>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3406>
(cherry picked from commit 311c82e192)
This is the same idea as "intel: fix the gen 11 compute shader scratch
IDs".
The number of EUs on TGL is not the same as ICL, but the
MEDIA_VFE_STATE restrictions stay the same, so adapt the code to it.
Also, consider the base configuration instead of what we read from the
Kernel.
According to Mark, this fixes the following piglit tests on TGL:
piglit.spec.arb_compute_shader.execution.shared-atomicmax-uint.tglm64
piglit.spec.arb_compute_shader.execution.shared-atomicmax-int.tglm64
piglit.spec.intel_shader_atomic_float_minmax.execution.shared-atomicmax-float.tglm64
v2: s/ICL+/Gen11+/ (Jason).
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4006>
(cherry picked from commit 9e5ce30da7)
Scratch space allocation is based on the number of threads in the base
configuration, and we only have one base configuration for ICL, with 8
subslices.
This fixes an issue with Aztec on Vulkan in a machine with a
configuration that's not the base. The issue looks like a regression
from b9e93db208, but it seems things are broken since forever, just
not easily reproducible.
v2: Reimplement it using the subslices variable. Don't touch TGL.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4006>
(cherry picked from commit 1efe139cad)
Our current GPGPU_WALKER code only supports up to 64 threads.
On HSW we could use up to 70 and TGL up to 112, but only if the walker
is adjusted so the width does not exceed 64. Work to support this is
in progress.
Previous to this change, we might try to downgrade to SIMD8 if the
SIMD16 shader spilled. Since HSW and TGL have the max number of
threads above 64, we would then try to emit an invalid GPGPU walker
command.
Fixes: 932045061b ("i965/cs: Emit compute shader code and upload programs")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
(cherry picked from commit cf12faef61)
Blender was crashing with a SIGFPE even though the divide by 0
logic was kicking in. I'm not sure why TGSI doesn't get into this state.
The problem was is the numerator was INT_MIN we'd replace the div by
0 with a divide by -1, which is an exception for INT_MIN as INT_MIN/-1
== INT_MAX + 1 (too large for 32-bits). Instead for integer divides
just replace the mask values with 0x7fffffff. Also fix up the
result handling so it aligns with TGSI usage. (gives 0)
Fixes: c717ac1247 ("gallivm/nir: wrap idiv to avoid divide by 0 (v2)")
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3956>
(cherry picked from commit 5370c685da)
Even though the workaround description says:
"all the listed commands are non-pipelined and hence flush caused due
to pipeline mode change must not cause performance issues..."
My understanding is that we still need to have the flushes. Also, the
flushes are required not only to stall the pipeline, but also to clear
caches, so I don't think they can simply be discarded.
Additionally, while doing some testing that increased the number of
surface STATE_BASE_ADDRESS emitted, I got a lot more GPU hangs. Adding
these flushes fixes those hangs.
Fixes: b8fbb39a (iris: Implement Gen12 workaround for non pipelined
state)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3908>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3908>
(cherry picked from commit a70a605ad6)
The __DRI_IMAGE_FORMAT_* part wants to be handled for the *101010
type formats as well. Factor out a common function for that task.
That again makes the piglit egl_ext_device_base test work again
for hardware drivers.
v2: Factor out a common function for that task.
v3: dri2_pbuffer_visuals -> dri2_pbuffer_visuals
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Fixes: 9acb94b623 "egl: Enable 10bpc EGLConfigs for platform_{device,surfaceless}"
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3790>
(cherry picked from commit d32c458de7)
Because we set the needs_data_cache bit from the NIR during compilation,
any time a shader was pulled out of the pipeline cache, we wouldn't set
the bit and the data cache was disabled. Fortunately, on Gen8+, this
bit is ignored because we always use the ALL section in the L3$ config
instead of separate DC and RO sections. On Gen7, however, this meant
that we were basically never running with the data cache enabled and our
compute performance was suffering massively because of it. This commit
improves Geekbench 5 scores on my Haswell GT3 by roughly 330% (no,
that's not a typo).
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3912>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3912>
(cherry picked from commit 5dfd83d7a1)
When a resource is written by a compute shader and then used by a
non-compute stage we sync on last compute job to guarantee that the
resource has been completely written when the next stage reads resources.
In the other cases how flushes are done guarantee the serialization of
the writes and reads.
To reproduce the failure the following tests should be executed in batch
as last test don't fail when run isolated:
KHR-GLES31.core.shader_image_load_store.basic-allFormats-load-fs
KHR-GLES31.core.shader_image_load_store.basic-allFormats-loadStoreComputeStage
KHR-GLES31.core.shader_image_load_store.basic-allTargets-load-cs
KHR-GLES31.core.shader_image_load_store.advanced-sync-vertexArray
v2: Use fence dep instead of bo_wait (Eric Anholt)
v3: Rename struct names (Iago Toral)
Document why is not needed on graphics->compute case. (Iago Toral)
Follow same code pattern of the other update of in_sync_bcl.
v4: Fixed comments style. (Iago Toral)
Fixes KHR-GLES31.core.shader_image_load_store.advanced-sync-vertexArray
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
CC: 19.3 20.0 <mesa-stable@lists.freedesktop.org>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2700>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2700>
(cherry picked from commit 01496e3d1e)
If an attempt to create an shm pixmap in XCreateDrawable fails
then it ends up with the shmid == -1. This means the get image
path needs to fallback so return false in this case to use the
non-shm get image path.
Fixes: 02c3dad0f3 ("Call shmget() with permission 0600 instead of 0777")
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3823>
(cherry picked from commit 246e4aeaef)
When Brian in 02c3dad0f3 restricted
the shm permissions it means we hit the fallback paths in some
scenarios we hadn't before.
When you use Xephyr to xdmcp from one user to another the new perms
stop the X server (running as user a) attaching to the SHM segments
from gnome-shell (running as user b).
In this case however only the GLX side of the code had insight into this,
and the dri could was meant of fall back, and it worked for put image
fine but the get image path was broken, since there was no indication
in the broken case of the need to fallback.
This adds a return type to a new interface member that lets the
caller know it has to fallback.
Fixes: 02c3dad0f3 ("Call shmget() with permission 0600 instead of 0777")
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3823>
(cherry picked from commit 466a0b2e49)
When os_memory_debug.h was promoted to src/util, this source-file on
which it depends on when the debug-flag is set on windows was left
out. So let's move this also.
It doesn't seem there's any way of triggering this issue right now, but
it seems better to correct this to avoid this from biting us in the ass
in the future.
Fixes: 88c4680b5a ("util: promote u_memory to src/util")
Reviewed-by: Dylan Baker <dylan@pnwbakers>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3844>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3844>
(cherry picked from commit 2e3318b151)
The other source of the multiply will be interpreted as a uint32_t in an
XOR instruction. Any source modifiers with either not be interpreted at
all or will be misinterpreted due to the differing types.
If the other operand of the multiplication has a source modifier, just
emit an extra move to resolve the source modifiers.
The negation source modifier problem is difficult to reproduce due to an
algebraic optimization that changes (-a*b) to -(a*b). However, changes
in MR !1359 push the negations back down.
On Gen7+ it might be possible to do slightly better for an abs() source
modifier by using BFI2 as a glorified copysign().
On Gen8+ it might be possible to do slightly better for a neg() source
modifier by emitting (~a ^ b).
There were no shader-db changes on any Intel platform, so I think we can
deal with that problem when it arises.
See also piglit!224.
Fixes: 06d2c11641 ("intel/fs: Add a scale factor to emit_fsign")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3780>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3780>
(cherry picked from commit 273b8cd1ca)
The interpretation of the fields is different depending whether the
instruction is a SEND/MATH or not.
This fixes the disassembly output for non-SEND/MATH instructions that
have both in-order and out-of-order dependencies. Their dependencies
were wrongly represented as `@A $B` when the correct would be `@A
$B.dst`.
Fixes: 6154cdf924 ("intel/eu/gen12: Add auxiliary type to represent SWSB information during codegen.")
Fixes: 83612c0127 ("intel/disasm/gen12: Disassemble software scoreboard information.")
Acked-by: Francisco Jerez <currojerez@riseup.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3660>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3660>
(cherry picked from commit 79788b8f7f)
Together with the fixup_nomask_control_flow() pass introduced in a
previous patch, this implements a less invasive alternative to the
workaround documented in the hardware spec for GEN:BUG:1407528679,
which doesn't involve disabling structured control flow.
Under some conditions Gen12 hardware can end up executing a BB with
all channels disabled, which will lead to the execution of any NoMask
instructions in it, even though any execution-masked instructions will
be correctly shot down. This could break assumptions of the SWSB pass
if the data computed by a NoMask instruction is synchronized against
by using an SWSB annotation baked into a regular execution-masked
instruction, since the first (NoMask) instruction may be executed
redundantly by the hardware, even though the second will correctly be
shot down, potentially leading to a RaW or WaW hazard if a third
instruction subsequently accesses the destination register of the
first instruction.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: 20.0 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit e14529ff32)
Found by inspection. Existing code was trying to avoid assuming that
an SBID had been assigned to the virtual instruction, but
synchronizing the header setup with respect to the previous SIMD16
SEND by using SYNC.ALLRD doesn't really seem possible unless the SEND
instruction had been assigned an SBID. Assert-fail instead if no SBID
has been allocated.
Fixes: 15e3a0d9d2 "intel/eu/gen12: Set SWSB annotations in hand-crafted assembly."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: 20.0 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 4e4e8d793f)
This is a less invasive alternative to the workaround documented in
the hardware spec for GEN:BUG:1407528679, which doesn't involve
disabling structured control flow (it's unlikely that switching to
GOTO/JOIN would have actually fixed the problem anyway).
Under some conditions Gen12 hardware can end up executing a BB with
all channels disabled, which will lead to the execution of any NoMask
instructions in it, even though any execution-masked instructions will
be correctly shot down. This may break assumptions of some NoMask
SEND messages whose descriptor depends on data generated by live
invocations of the shader.
This avoids the problem by predicating certain instructions on an ANY
horizontal predicate that makes sure that their execution is omitted
when all channels of the program are disabled. The shader-db impact
of this patch seems to be minimal:
total instructions in shared programs: 17169833 -> 17169913 (0.00%)
instructions in affected programs: 30663 -> 30743 (0.26%)
helped: 0
HURT: 42
total cycles in shared programs: 336966176 -> 336968568 (0.00%)
cycles in affected programs: 2367290 -> 2369682 (0.10%)
helped: 0
HURT: 13
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: 20.0 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit a8ac0bd759)
We need to pass a width of 32 since the opcode bashes the whole f1.0
register on IVB. This is unlikely to have caused problems since f1.0
is largely unused currently. That's likely to change soon though,
even on platforms other than Gen7.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: 20.0 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit b8b509fb92)
Found by inspection. This seems particularly likely to cause problems
with instructions dependent on the current execution mask like
SHADER_OPCODE_FIND_LIVE_CHANNEL or the FS_OPCODE_LOAD_LIVE_CHANNELS
instruction I'm about to introduce, but one could imagine it leading
to data corruption if CSE ever managed to combine two instructions
before and after the FS_OPCODE_PLACEHOLDER_HALT, since the one before
may not be executed for some channels.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: 20.0 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit c9e33e5cbf)
==7265== 248 (120 direct, 128 indirect) bytes in 1 blocks are definitely lost in loss record 1,438 of 1,465
==7265== at 0x483980B: malloc (vg_replace_malloc.c:309)
==7265== by 0x598A2AB: ralloc_size (ralloc.c:119)
==7265== by 0x598F861: _mesa_set_create (set.c:127)
==7265== by 0x599079D: _mesa_pointer_set_create (set.c:570)
==7265== by 0x58BD7D1: build_program_resource_list(gl_context*, gl_shader_program*, bool) (linker.cpp:4026)
==7265== by 0x548231B: st_link_shader (st_glsl_to_ir.cpp:170)
==7265== by 0x54DA269: _mesa_glsl_link_shader (ir_to_mesa.cpp:3119)
Fixes: a6aedc66 ("st/glsl_to_nir: use nir based program resource list builder")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3574>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3574>
(cherry picked from commit f7d1bf075a)
This patch fix these build errors with GCC 10.
/usr/bin/ld: src/gallium/drivers/panfrost/libpanfrost.a(pan_resource.c.o):src/panfrost/midgard/midgard_compile.h:52: multiple definition of `pan_sysval'; src/gallium/drivers/panfrost/libpanfrost.a(pan_screen.c.o):src/panfrost/midgard/midgard_compile.h:52: first defined here
/usr/bin/ld: src/gallium/drivers/panfrost/libpanfrost.a(pan_resource.c.o):src/panfrost/midgard/midgard_compile.h:68: multiple definition of `pan_special_attributes'; src/gallium/drivers/panfrost/libpanfrost.a(pan_screen.c.o):src/panfrost/midgard/midgard_compile.h:68: first defined here
Fixes: 7e8de5a707 ("panfrost: Implement system values")
Fixes: 306800d747 ("pan/midgard: Lower gl_VertexID/gl_InstanceID to attributes")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3752>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3752>
(cherry picked from commit 63345a3596)
This reverts the functional part of commit
d17ff2f7f1, leaving the unit test for
mesa/pipe agreement on what's an array.
The issue is that the util_channel_desc.shift values on array formats are
not used for bit addressing in memory, they're bit addressing within a
word treating a pixel of the format as a native type, as seen by
llvmpipe's use of the values to do shifts (see
lp_build_unpack_arith_rgba_aos() for example). This means the values are
nonsensical for 3-byte RGB, but then llvmpipe doesn't expose those formats
so it works out.
I still want to clean up our big-endian format handling at some point, but
let's fix the s390x regression first, sort out our format unit tests in
CI, then be able to refactor with confidence.
Fixes: d17ff2f7f1 ("gallium: Fix big-endian addressing of non-bitmask array formats.")
Closes: #2472
Acked-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3721>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3721>
(cherry picked from commit 1886dbfe73)
Use pipe_shader_state_from_tgsi() to set shader state for transformed
shader so that we get all correct data for respective shader state.
This fixes several regressed glretrace, piglit crashes found during merging
upsteam mesa
Fixes: bf12bc2dd7 (draw: add nir info gathering and building support)
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
(cherry picked from commit 144561dc5e)
Since we are now using sparse matrix for format_conversion_table,
we have to make sure we have last entry in table which gives the
sense of required size of format_conversion_table
Fixes: 84db6ba7 ("svga: Drop unsupported formats from the format table")
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
(cherry picked from commit 470e73e7f8)
The previous commit leads to match immed values unexpectedly.
This makes constlen for each shader including bvert wrong.
Also fixes atan2 for mediump deqp tests.
Fixes: cbd1f47433 ("freedreno/ir3: convert back to 32-bit values for half constant registers.")
v2: Move conversion up above fabs/fneg modifier handling as well.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3737>
(cherry picked from commit 260bd32b58)
Since 9254fb4fc7, the pass replaced the SCC clobber with the scalar
identity temporary. Just skip most of the temporary setup, since we don't
need it for gfx10_wave64_bpermute.
Although shuffles are disabled on GFX10, Detroit: Become Human seems to
use them anyway.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Fixes: 9254fb4fc7 ('aco: don't use a scalar
temporary for reductions on GFX10')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3683>
(cherry picked from commit 20eb1acb6f)
The extra bits in CB_SHADER_MASK break dual source blending in
SkQP on a Stoney device. However:
- As far as I can tell, some other dual source blend tests are passing
before and after the change.
- A hacked around skqp passes on my Vega desktop and Raven laptop
- Getting Skqp to give any useful info or to run it outside of Android
on ChromeOS is proving difficult.
I have confirmed 3 strategies that seem to work:
- The old radv behavior of setting CB_SHADER_MASK to 0xF
- AMDVLK: CB_SHADER_MASK = 0xFF, and the 3 RB+ regs
are 0.
- radeonsi: CB_SHADER_MASK = 0xFF, but does not set DISABLE
bits in SX_BLEND_OPT_CONTROL for CB 1-7.
Let us use the radeonsi solution as that solution also seems like the correct
thing to do for holes. I have tested on my Raven laptop that setting the high
surfaces to not disabled and downconvert to 32_R does not imply a performance
penalty.
Fixes: e9316fdfd4 "radv: fix setting CB_SHADER_MASK for dual source blending"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3670>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3670>
(cherry picked from commit 65a6dc5139)
This patch fixes this build error with GCC 10.
/usr/bin/ld: src/gallium/drivers/lima/liblima.a(lima_context.c.o):src/gallium/drivers/lima/lima_util.h:32: multiple definition of `lima_dump_command_stream'; src/gallium/drivers/lima/liblima.a(lima_screen.c.o):src/gallium/drivers/lima/lima_util.h:32: first defined here
/usr/bin/ld: src/gallium/drivers/lima/liblima.a(lima_resource.c.o):src/gallium/drivers/lima/lima_util.h:32: multiple definition of `lima_dump_command_stream'; src/gallium/drivers/lima/liblima.a(lima_screen.c.o):src/gallium/drivers/lima/lima_util.h:32: first defined here
/usr/bin/ld: src/gallium/drivers/lima/liblima.a(lima_draw.c.o):src/gallium/drivers/lima/lima_util.h:32: multiple definition of `lima_dump_command_stream'; src/gallium/drivers/lima/liblima.a(lima_screen.c.o):src/gallium/drivers/lima/lima_util.h:32: first defined here
/usr/bin/ld: src/gallium/drivers/lima/liblima.a(lima_bo.c.o):src/gallium/drivers/lima/lima_util.h:32: multiple definition of `lima_dump_command_stream'; src/gallium/drivers/lima/liblima.a(lima_screen.c.o):src/gallium/drivers/lima/lima_util.h:32: first defined here
/usr/bin/ld: src/gallium/drivers/lima/liblima.a(lima_submit.c.o):src/gallium/drivers/lima/lima_util.h:32: multiple definition of `lima_dump_command_stream'; src/gallium/drivers/lima/liblima.a(lima_screen.c.o):src/gallium/drivers/lima/lima_util.h:32: first defined here
/usr/bin/ld: src/gallium/drivers/lima/liblima.a(lima_util.c.o):src/gallium/drivers/lima/lima_util.h:32: multiple definition of `lima_dump_command_stream'; src/gallium/drivers/lima/liblima.a(lima_screen.c.o):src/gallium/drivers/lima/lima_util.h:32: first defined here
/usr/bin/ld: src/gallium/drivers/lima/liblima.a(lima_texture.c.o):src/gallium/drivers/lima/lima_util.h:32: multiple definition of `lima_dump_command_stream'; src/gallium/drivers/lima/liblima.a(lima_screen.c.o):src/gallium/drivers/lima/lima_util.h:32: first defined here
Fixes: d71cd245d7 ("lima: Rotate dump files after each finished pp frame")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
(cherry picked from commit 02658df152)
Previously, i965/iris tried to reuse the currently programmed URB config
if it was good enough for BLORP, rather than reprogramming it each time.
However, this will make some things harder on Gen12+ and we've not seen
any performance impact from emitting URB more frequently in ANV.
This makes the blorp <-> driver interface a bit simpler on Gen7+ because
now all the driver has to do is to provide the L3$ config rather than
trying to hand off URB re-config to blorp.
Cc: "20.0" mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3454>
(cherry picked from commit 09e4c33085)
In the long term the goal of this script is to nearly completely
automate the process of picking stable nominations, in a well tested
way.
In the short term the goal is to provide a better, faster UI to interact
with stable nominations.
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.