This was introduced in commit 8f848ada8a
but not added to the sources list, which is necessary for it to be
included in release tarballs.
Fixes: 8f848ada8a
("swr/rast: Start refactoring of builder/packetizer.")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is needed to provide vk_android_native_buffer.h for vk_enum_to_str.
v2: - remove accidentally included changes
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
We're going to combine ::mcs_buf and ::hiz_buf in later commits. Once
that happens, this function no longer make sense.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Free the clear_color_bo in addition to freeing the
intel_miptree_aux_buffer which holds the reference to it.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
memcmp returns 0 when both swizzles are the same, which means we don't
need any hardware swizzling. texture_format_needs_swiz should return
true when the return value of the memcmp is non-zero.
Fixes: 751ae6afbe ("etnaviv: add support for swizzled texture formats")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Marek Vasut <marex@denx.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
For subpass attachments we need one more coordinate with
the layer, so make them array types.
This fixes a bunch of CTS fails with RADV.
Fixes: 24fb3e6aa1 ("ac/nir: use ac_build_image_opcode for image intrinsics")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The SI family doesn't support chaining which means the maximum
size in dwords per CS is limited. When that limit was reached
we failed to submit the CS and the application crashed.
This patch allows to submit up to 4 IBs which is currently the
limit, but recent amdgpu supports more than that.
Please note that we can reach the limit of 4 IBs per submit
but currently we can't improve that. The only solution is to
upgrade libdrm. That will be improved later but for now this
should fix crashes on SI or when using RADV_DEBUG=noibs.
Fixes: 36cb5508e8 ("radv/winsys: Fail early on overgrown cs.")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105775
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We can't use any of the existing implementations in u_debug_stack.
Android technically has libunwind, but it's been modified to the point
where it no longer compiles with the Mesa usage. The library is also
not meant to be referenced by vendor libraries. The officially sanctioned
way of obtaining backtraces is through the Android own libbacktrace, a
C++ library. Access it through a separate C++ source file on Android only.
Signed-off-by: Stefan Schake <stschake@gmail.com>
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The fallback path for no libunwind ends up being stubs for Android.
Don't compile them in so we can provide our own implementation.
Signed-off-by: Stefan Schake <stschake@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Ported from RadeonSI.
Local BOs ignore BO priorities, and we don't need those on APUs.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Maintaining two different paths is annoying but this gets
rid of the performance regression introduced by the global
BO list.
We might find a better solution in the future, but for now
just keeps two paths.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
In order to reduce a performance regression introduced by
4b13fe55a4 ("radv: Keep a global BO list for VkMemory."),
we are going to maintain two different paths.
One when VK_EXT_descriptor_indexing is enabled by the
application because we need to have a global BO list, and
one (the old one) when it's not enabled.
With Talos on Polaris, the global BO list reduces performance
by 10% which is too much for me.
This reverts commit ab6cadd3ec.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
All operations with offset_reg at do_vector_read are done
with UD type. So copy propagation was not working through
the generated MOVs:
mov(8) vgrf9:UD, vgrf7:D
This change allows removing the MOV generated for reading the
first components for 16-bit and 64-bit ssbo reads with
non-constant offsets.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The LLVM instruction returns { i32, i1 }, where the i1 indicates success.
We're only interested in the first part, which is the loaded value.
Fixes dEQP-GLES31.functional.compute.shared_var.atomic.compswap.*
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
It looks as if the structure fields array is fully initialized below,
but in fact at least gcc in debug builds will not actually overwrite
the unused bits of bit fields.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
From CL 1.2 Section 5.2.1:
CL_INVALID_VALUE if buffer was created with CL_MEM_HOST_WRITE_ONLY and
flags specify CL_MEM_HOST_READ_ONLY , or if buffer was created with
CL_MEM_HOST_READ_ONLY and flags specify CL_MEM_HOST_WRITE_ONLY , or if
buffer was created with CL_MEM_HOST_NO_ACCESS and flags specify
CL_MEM_HOST_READ_ONLY or CL_MEM_HOST_WRITE_ONLY .
Fixes CL 1.2 CTS test/api get_buffer_info
v2: Correct host_access_flags check (Francisco)
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
base_vertex will be zero for non-indexed calls and in that case we
need vertex_id to be offset by the ‘first’ parameter instead. That is
what we get with first_vertex. This is true for both GL and Vulkan.
The freedreno driver is also setting vertex_id_zero_based on
nir_options. In order to avoid breakage this patch switches the
relevant code to handle SYSTEM_VALUE_FIRST_VERTEX so that it can
retain the same behavior.
v2: change a3xx/fd3_emit.c and a4xx/fd4_emit.c from
SYSTEM_VALUE_BASE_VERTEX to SYSTEM_VALUE_FIRST_VERTEX (Kenneth).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: Rob Clark <robdclark@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
The base vertex in Vulkan is different from GL in that for non-indexed
primitives the value is taken from the firstVertex parameter instead
of being set to zero. This coincides with the new SYSTEM_VALUE_FIRST_VERTEX
instead of BASE_VERTEX.
v2 (idr): Add comment describing why SYSTEM_VALUE_FIRST_VERTEX is used
for SpvBuiltInBaseVertex. Suggested by Jason.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Until we set gl_BaseVertex to zero for non-indexed draw calls
both have an identical value.
The Vertex Elements are kept like that:
* VE 1: <BaseVertex/firstvertex, BaseInstance, VertexID, InstanceID>
* VE 2: <Draw ID, 0, 0, 0>
v2 (idr): Mark nir_intrinsic_load_first_vertex as "unreachable" in
emit_system_values_block and fs_visitor::nir_emit_vs_intrinsic.
This VS system value will contain the value passed as <basevertex> for
indexed draw calls or the value passed as <first> for non-indexed draw
calls. It can be used to calculate the gl_VertexID as
SYSTEM_VALUE_VERTEX_ID_ZERO_BASE plus SYSTEM_VALUE_FIRST_VERTEX.
From the OpenGL 4.6 spec, 10.4 "Drawing Commands Using Vertex Arrays":
- Page 352:
"The index of any element transferred to the GL by DrawArraysOneInstance
is referred to as its vertex ID, and may be read by a vertex shader as
gl_VertexID. The vertex ID of the ith element transferred is first +
i."
- Page 355:
"The index of any element transferred to the GL by
DrawElementsOneInstance is referred to as its vertex ID, and may be read
by a vertex shader as gl_VertexID. The vertex ID of the ith element
transferred is the sum of basevertex and the value stored in the
currently bound element array buffer at offset indices + i."
Currently the gl_VertexID calculation uses SYSTEM_VALUE_BASE_VERTEX but
this will have to change when the value of gl_BaseVertex is
fixed. Currently its value is broken for non-indexed draw calls because
it must be zero but we are setting it to <first>.
v2: use SYSTEM_VALUE_FIRST_VERTEX as name for the value, instead of
SYSTEM_VALUE_BASE_VERTEX_ID (Kenneth).
v3 (idr): Rebase on Rob Clark converting nir_intrinsics.h to be
generated. Reformat commit message to 72 columns.
Reviewed-by: Neil Roberts <nroberts@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I have seen a few applications and games do the dynamic buffer bounds incorrectly, this
make it easier to work around, e.g. for debugging.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
When advertizing this extension, egl_dri2 uses the DRI2_RENDERER_QUERY
extension to query whether an sRGB format is supported. That extension will
query our driver with the BIND flag PIPE_BIND_RENDER_TARGET rather than
PIPE_BIND_DISPLAY_TARGET which is used when building the configs.
We only return the correct value for PIPE_BIND_DISPLAY_TARGET.
The inconsistency causes EGL to crash at surface initialization if sRGB is
not supported. Fix this by supporting both bind flags.
Testing done:
piglit egl_gl_colorspace srgb
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Include llvm/Transforms/Utils.h with the newest LLVM 7
v2: Include with " " rather than < > (Vinson Lee)
v3: Use LLVM_VERSION_MAJOR rather than HAVE_LLVM (George Kyriazis)
Signed-of-by: Mike Lothian <mike@fireburn.co.uk>
Tested-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-By: George Kyriazis <george.kyriazis@intel.com>
This can be enabled with RADV_PERFTEST=dccmsaa.
DCC for MSAA textures is actually not as easy to implement. It
looks like there is some corner cases. I will improve support
incrementally.
Vega support, as well as Polaris improvements, will be added later.
No CTS changes on Polaris using RADV_DEBUG=zerovram and
RADV_PERFTEST=dccmsaa.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Multisampled source images (ie. color attachments) can be now
DCC compressed, so the driver needs to perform a DCC decompression
pass before resolving
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
CMASK is required because it should be cleared to
0xCCCCCCCC for MSAA textures.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
When DCC is enabled with MSAA textures, CMASK should be
cleared to 0xCCCCCCCC.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This fixes some random CTS failures:
dEQP-VK.renderpass.multisample.*.
Performing a fast-clear eliminate is still useless, but it
seems that we need to sync.
Found while running CTS with RADV_DEBUG=zerovram.
Fixes: 56a171a499 ("radv: don't fast-clear eliminate after resolving a subpass with compute")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Might be useful for debugging purposes, especially when we
want to replace a shader on the fly.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The continue means we do alignment differently than during creation,
making the buffer smaller than expected.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Previously we did not care about havin the set storage in order,
but for variable descriptor count we want the highest binding
at the end of the storage.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
With update after bind we can't attach bo's to the command buffer
from the descriptor set anymore, so we have to have a global BO
list.
I am somewhat surprised this works really well even though we have
implicit synchronization in the WSI based on the bo list associations
and with the new behavior every command buffer is associated with
every swapchain image. But I could not find slowdowns in games because
of it.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
brw_bo_alloc may round up our allocation size to the next bucket size.
In this case, we would malloc a shadow buffer that was the original
intended size, but use bo->size (the larger size) for all of our checks.
This could cause us to run off the end of the shadow buffer.
v2: Actually use the new BO size (caught by Lionel)
Reported-by: James Xiong <james.xiong@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c7dcee58b5 (i965: Avoid problems from referencing orphaned BOs after growing.)
This packet causes the no-op IB detection to fail, so the IB is always
submitted. Also fix the no-op IB detection by moving the begin call.
Cc: 18.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This only enables the null and xlib target, so no windows support yet.
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: - gate unit tests on swrast being enabled (Eric A)
v3: - rebase on libtrace being merged with gallium auxiliary
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net> (v2)
This ports glcpp-test.sh and glcpp-test-cr-lf.sh to a python script that
accepts arguments for each line ending type. This should allow for
better reporting to users.
v2: - Use $PYTHON2 to be consistent with other tests in mesa
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch converts optimization-test.sh to python, in this process it
removes external shell dependencies including diff. It replaces the
python script that generates shell scripts with a python library that
generates test cases and runs them using subprocess.
v2: - use $PYTHON2 to be consistent with other tests in mesa
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
This reimplements the test in python with a shell script wrapper that
allows autotools to continue to run the test without realizing that
anything has changed.
Using python has two advantages, first it's portable so this test can be
run on windows as well as Linux since it just requires python, no more
diff, pwd or sh. It's also no longer tied to autotools implementation
details, like the environment variables $srcdir and $abs_builddir,
though the autotools shell wrapper still uses those, which makes it
possible to run the test in meson.
v2: - Use $PYTHON2 in script to be consistent with other scripts in mesa
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Add per-worker thread private data to all shader calls
Add per-worker sampler cache and jit context
Add late LoadTexel JIT support
Add per-worker-thread Sampler / LoadTexel JIT
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Fix issue where temporary allocas were getting hoisted to function entry
unnecessarily. We now explicitly mark temporary allocas and skip hoisting
during the hoist pass. Shuold reduce stack usage.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Needed because some FP paths (namely stipple) use gather intrinsics
that now need to be lowered to x86.
v2: fix typo in commit message
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Enable generalized fetch jit with 8 or 16 wide SIMD target. Still some
work needed to remove some simd8 double pumping for 16-wide target.
Also removed unused non-gather load vertices path.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Abstract usage scenarios for memory accesses into builder_gfx_mem.
Builder_gfx_mem will convert gfxptr_t from 64-bit int to regular pointer
types for use by builder_mem.
v2: reworded commit message; renamed enum more appropriately
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Some more work to do before we can support simultaneous 8-wide and
16-wide and remove the VGATHERPS_16 version.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Small cleanup. Remove convenience types from JitManager and standardize
on the Builder's convenience types.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Add support for providing an emulation callback function for arch/width
combinations that don't map cleanly to an x86 intrinsic.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Move x86 intrinsic lowering to a separate pass. Builder now instantiates
generic intrinsics for features not supported by llvm. The separate x86
lowering pass is responsible for lowering to valid x86 for the target
SIMD architecture. Currently it's a port of existing code to get it
up and running quickly. Will eventually support optimized x86 for AVX,
AVX2 and AVX512.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Removed preprocessor defines from structures passed to LLVM jitted code.
The python scripts do not understand the preprocessor defines and ignores
them. So for fields that are compiled out due to a preprocessor define
the LLVM script accounts for them anyway because it doesn't know what
the defines are set to. The sanitize defines for open source are fine
in that they're safely used.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Added a SWR_SHADER_STATS structure which is passed to each shader. The
stats pass will instrument the shader to populate this.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Fix slow permutes in PA tri lists under SIMD16 emulation on AVX
Added missing permute (interlane, immediate) to SIMDLIB
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Finish up the remaining explicit intrinsic uses. At this point all
explicit Intrinsic::getDeclaration() usage has been replaced with auto
generated macros generated with gen_llvm_ir_macros.py. Going forward,
make sure to only use the intrinsics here, adding new ones as needed.
Next step is to remove all references to x86 intrinsics to keep the
builder target-independent. Any x86 lowering will be handled by a
separate pass.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Add stats for degenerate and backfacing primitive counts
Wire archrast stats for alpha blend and alpha test.
pass value to jitter, upon return have archrast event increment a value
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Stuff parameters into a blend context struct before passing down through
the PFN_BLEND_JIT_FUNC function pointer. Needed for stat changes.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Add assert for correct usage of memory accesses
v2: reworded commit message; renamed enum more appropriately
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
This is for parity with autotools. It names the library
libMesaOpenCL.so.1.0.0 and points mesa.icd to the .1 symlink.
opencl_version now matches configure.ac's OPENCL_VERSION.
Signed-off-by: Jan Alexander Steffens (heftig) <jan.steffens@gmail.com>
Tested-By: Aaron Watry <awatry@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
The SPIR-V spec doesn’t specify a size requirement for these and the
equivalent functions in the GLSL spec have explicit alternatives for
doubles. Refract is a little bit more complicated due to the fact that
the final argument is always supposed to be a scalar 32- or 16- bit
float regardless of the other operands. However in practice it seems
there is a bug in glslang that makes it convert the argument to 64-bit
if you actually try to pass it a 32-bit value while the other
arguments are 64-bit. This adds an optional conversion of the final
argument in order to support any type.
These have been tested against the automatically generated tests of
glsl-4.00/execution/built-in-functions using the ARB_gl_spirv branch
which tests it with quite a large range of combinations.
The issue with glslang has been filed here:
https://github.com/KhronosGroup/glslang/issues/1279
v2: Convert the eta operand of Refract from any size in order to make
it eventually cope with 16-bit floats.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The only change neccessary is to change the type of the constant used
to compare against.
This has been tested against the arb_gpu_shader_fp64/execution/
fs-isinf-dvec tests using the ARB_gl_spirv branch.
v2: Use nir_imm_floatN_t for the constant.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
There is an existing macro that is used to choose between either a
float or a double immediate constant based on the bit size of the
first operand to the builtin. This is now changed to use the new
nir_imm_floatN_t helper function to reduce the number of places that
make this decision.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This lets you easily build float immediates just given the bit size.
If we have this single place here to handle this then it will be
easier to add support for 16-bit floats later.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Otherwise we create unused conditional return flags and things
get unnecessarily ugly fast when lowering nested functions.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The extra params we unused by the drivers that used DrawBuffers.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
fixes warnings like this:
[184/1137] Compiling C++ object 'src/compiler/glsl/glsl@sta/lower_jumps.cpp.o'.
In file included from ../src/mesa/main/mtypes.h:48,
from ../src/compiler/glsl_types.h:149,
from ../src/compiler/glsl/lower_jumps.cpp:59:
../src/compiler/glsl/lower_jumps.cpp: In member function '{anonymous}::block_record {anonymous}::ir_lower_jumps_visitor::visit_block(exec_list*)':
../src/compiler/glsl/list.h:650:17: warning: unnecessary parentheses in declaration of 'node' [-Wparentheses]
for (__type *(__inst) = (__type *)(__list)->head_sentinel.next; \
^
../src/compiler/glsl/lower_jumps.cpp:510:7: note: in expansion of macro 'foreach_in_list'
foreach_in_list(ir_instruction, node, list) {
^~~~~~~~~~~~~~~
Signed-off-by: Marc Dietrich <marvin24@gmx.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
A couple spots were missed for handling of the new INT8/UINT8 base type.
Also de-duplicate get_base_type().. get_scalar_type() had nearly the
same switch statement, with the exception that anything with base_type
that was not scalar would return error_type. So just handle that one
special case in get_scalar_type().
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(This patch doesn't enable the behavior. It will be enabled in a later
commit.)
Draw calls from multiple IBs can be executed in parallel.
v2: do emit partial flushes on SI
v3: invalidate all shader caches at the beginning of IBs
v4: don't call si_emit_cache_flush in si_flush_gfx_cs if not needed,
only do this for flushes invoked internally
v5: empty IBs should wait for idle if the flush requires it
v6: split the commit
If we artificially limit the number of draw calls per IB to 5, we'll get
a lot more IBs, leading to a lot more partial flushes. Let's see how
the removal of partial flushes changes GPU utilization in that scenario:
With partial flushes (time busy):
CP: 99%
SPI: 86%
CB: 73:
Without partial flushes (time busy):
CP: 99%
SPI: 93%
CB: 81%
Tested-by: Benedikt Schemmer <ben@besd.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
ir_binop_gequal needs to be converted to nir_op_sge when native integers
are not supported in the driver.
Otherwise it becomes no different than ir_binop_less after the
conversion.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
That looks useless, and I think radv_handle_image_transition()
will do a fast-clear eliminate because it's called after the
resolve.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Niuwenhuizen <bas@basnieuwenhuizen.nl>
In order to separate initialization from decompression. In the
future, that will allow us to init DCC/FMASK/CMASK in one shot.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Niuwenhuizen <bas@basnieuwenhuizen.nl>
Mostly because DCC implies a fast-clear eliminate and we
should be able to skip some DCC decompressions by setting
a predicate like for CMASK and FMASK.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Niuwenhuizen <bas@basnieuwenhuizen.nl>
When decompressing DCC we don't enable it, so it's useless
to disable it. This reduces the number of prediction packets
sent to the GPU when performing color decompression passes.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Niuwenhuizen <bas@basnieuwenhuizen.nl>
To fix the MSVC build. The build broke because we started to compile
the ddebug code on Windows after the mtypes.h changes. Building ddebug
caused us to also use the u_network.c code for the first time.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Don't include Unix headers or use Unix functions when building with MSVC.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
THIS is a macro in one of the MSVC header files. It's also a token
in the GLSL lexer. This causes a compilation failure with MSVC.
This issue seems to be newly exposed after the recent mtypes.h removal
patches.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
The recent mtypes.h removal patches seems to have exposed a MSVC
issue where 'interface' is defined as a macro in an MSVC header file.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
snprintf is a macro in the MSVC stdio.h header and we needed to
include that header before imports.h where we also defined an
snprintf macro. Otherwise, the MSVC build would fail. The recent
mtypes.h removal patches seems to have exposed this issue.
This patch simply removes our snprintf macro and replaces one use
of it in teximage.c with _mesa_snprintf(). There are other calls
to snprintf() in DRI drivers, but none of them are built on Windows.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
We're not counting correctly with depth & stencil images.
Additionally we need to move an assert that is meant just for color
attachments.
v2: Move an assert() (Reported by Craig)
Change aspect mask checks (Francesco)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: a62a979335 ("anv: enable multiple planes per image/imageView")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105994
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Add this prefix to the env var: "simple," For example:
GALLIUM_HUD=simple,fps
The X coordinates are the same, but the Y coordinates are different, because
there is only text.
'+' happens to behave the same as "\n".
',' happens to behave the same as "\n\n".
so that the draw is started as soon as possible.
v2: only prefetch the API VS and VBO descriptors
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
This code was written with the constant engine in mind.
We can simplify it now.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Current code is returning an INVALID_OPERATION when trying to use
getTextureImage() on a level that has not been explicitly defined.
That is, we define a mipmapped Texture2D with 3 levels, and try to use
GetTextureImage() for the 4th levels, and INVALID_OPERATION is returned.
Nevertheless, such case is not listed as an error in OpenGL 4.6 spec,
section 8.11.4 ("Texture Image Queries"), where all the case errors for
this function are defined. So it seems this is a valid operation.
On the other hand, in section 8.22 ("Texture State and Proxy State") it
states:
"Each initial texture image is null. It has zero width, height, and
depth, internal format RGBA, or R8 for buffer textures, component
sizes set to zero and component types set to NONE, the compressed
flag set to FALSE, a zero compressed size, and the bound buffer
object name is zero."
We can assume that we are reading this initialized empty image when
calling GetTextureImage() with a non defined level.
With this assumption, we will reach one of the other error cases defined
for the functions. In the end this means that we would end up returning
INVALID_VALUE to the caller.
This fixes arb_get_texture_sub_image piglit tests.
v2: just return INVALID_VALUE if there is no defined level (Iago)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
According to OpenGL 4.6 spec, section 8.11.4 ("Texture Image Queries"),
relative to errors for GetTexImage, GetTextureImage, and GetnTexImage:
"An INVALID_OPERATION error is generated by GetTextureImage if the
effective target is TEXTURE_CUBE_MAP or TEXTURE_CUBE_MAP_ARRAY, and
the texture object is not cube complete or cube array complete,
respectively."
This fixes arb_get_texture_sub_image piglit tests.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
According to OpenGL 4.6 spec, section 8.11.4 ("Texture Image Queries"),
relative to errors for GetTextureSubImage() function:
"An INVALID_VALUE error is generated if the effective target is
TEXTURE_1D and either yoffset is not zero, or height is not one.
An INVALID_VALUE error is generated if the effective target is
TEXTURE_1D, TEXTURE_1D_ARRAY, TEXTURE_2D or TEXTURE_RECTANGLE, and
either zoffset is not zero, or depth is not one."
The commit fixes the check for height and depth.
This fixes arb_get_texture_sub_image piglit tests.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
- remove mtypes.h from most header files
- add main/menums.h for often used definitions
- remove main/core.h
v2: fix radv build
Reviewed-by: Brian Paul <brianp@vmware.com>
Pretty straight forward, just pass the divisors through the shader
key and then do a LLVM divide.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
For dcn1 && < 64 bpp displayable surfaces, addrlib only accepts
S swizzles.
At the same time addrlib prefers D swizzles is allowed, so we can
just allow S swizzles as fallback.
Fixes: b64b712558 "ac/surface/gfx9: request desired micro tile mode explicitly"
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is the ABI I'm hoping to stabilize for merging the driver. seqnos
are eliminated, which allows for the GPU scheduler to task-switch between
DRM fds even after submission to the kernel. In/out sync objects are
introduced, to allow the Android fencing extension (not yet implemented,
but should be trivial), and to also allow the driver to tell the kernel to
not start a bin until a previous render is complete.
This should avoid mistakes with not flushing as we change the series of
loads. Already, it fixes a hopefully unreachable case where we were
emitting just the TILE_COORDINATES and not the dummy store that needs to
go with it.
This was dying in the simulator on
GTF-GLES3.gtf.GL3Tests.packed_depth_stencil.packed_depth_stencil_blit.
We'll need to do basically the same thing as Z32F/S8 does in the MSAA
Z24S8 case.
The v3dX(get_internal_type_bpp_for_output_format)() call only handles
color output formats (which overlap in enum numbers with depth output
formats), so for depth we just need to take the normal cpp times the
number of samples.
The current FW has restricted the size to the worse case,
and the new dynamic dpb buffer support is on the way from
firmware side, we will change accordingly.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
num_dcc_levels means that DCC is supported, but this doesn't
mean that it's enabled by the driver. Instead, we should rely
on radv_image_has_dcc().
This fixes some multisample regressions since 0babc8e5d6
("radv: fix picking the method for resolve subpass") on Vega.
This is because the resolve method changed from HW to FS, but
those fails are totally unexpected, so there might some
differences between Polaris and Vega here.
Fixes: 44fcf58744 ("radv: Disable DCC for GENERAL layout and compute transfer dest.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This helper shares common code before resolving using either
a fragment or a compute shader.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The compatibility and core tokens were not added until GLSL 1.50,
for GLSL 1.40 just assume all shaders built with a compat profile
are compat shaders.
Fixes rendering issues in Dawn of War II on radeonsi which has
enabled OpenGL 3.1 compat support.
Fixes: a0c8b49284 "mesa: enable OpenGL 3.1 with ARB_compatibility"
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105807
Passing ctx to compressedteximage_only_format was the only use of the
ctx parameter in _mesa_format_no_online_compression, so that parameter
had to go too.
../../SOURCE/master/src/mesa/main/teximage.c: In function ‘compressedteximage_only_format’:
../../SOURCE/master/src/mesa/main/teximage.c:1355:57: warning: unused parameter ‘ctx’ [-Wunused-parameter]
compressedteximage_only_format(const struct gl_context *ctx, GLenum format)
^~~
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
vulkan/genX_blorp_exec.c:69:1: warning: ‘blorp_get_surface_base_address’ defined but not used [-Wunused-function]
blorp_get_surface_base_address(struct blorp_batch *batch)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from vulkan/genX_blorp_exec.c:35:0:
./blorp/blorp_genX_exec.h:1249:1: warning: ‘blorp_emit_memcpy’ defined but not used [-Wunused-function]
blorp_emit_memcpy(struct blorp_batch *batch,
^~~~~~~~~~~~~~~~~
genX_blorp_exec.c:99:1: warning: ‘blorp_get_surface_base_address’ defined but not used [-Wunused-function]
blorp_get_surface_base_address(struct blorp_batch *batch)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from genX_blorp_exec.c:33:0:
../../../../../src/intel/blorp/blorp_genX_exec.h:1249:1: warning: ‘blorp_emit_memcpy’ defined but not used [-Wunused-function]
blorp_emit_memcpy(struct blorp_batch *batch,
^~~~~~~~~~~~~~~~~
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The matching code doesn't make real use of the return value. The main
function return value is ignored, and while the worker function
propagate its return value, the actual callback never returns false.
v2: Style fixes. (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Only fully-qualified direct derefs, collected in direct_deref_nodes,
are checked for aliasing, so it is already known up front that they
have only array derefs of type direct.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The return value was needed to make use of the old nir_foreach_block
helper, but not needed anymore with the macro version. Then go one
step further and move the foreach directly into the register variable
uses function.
v2: Move foreach to register_variable_uses(). (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
LLVM now emits labels as part of the disassembly string, which is very
useful but breaks the old parsing approach.
Use the semicolon to detect the boundary of instructions instead of going
by line breaks.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This will give us meaningful wave information in the case of a hang where
shaders are still running in an infinite loop.
Note that we call umr multiple times for different sections of the ddebug
hang dump, and so the wave information will not necessarily match up
between sections.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
All the information in vk_android_native_buffer.xml is now in vk.xml.
The only exception is the extension type attribute which we can work
around in the generators while we wait for the XML to be fixed.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
This replaces some "if (...} { }" with "if (...) continue;" to reduce
nesting depth and makes nir_metadata_preserve conditional on progress
for the given impl.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
According to Marek, not enabling it on Stoney has a significant
negative performance impact. (And I guess this might impact
performance on Raven as well)
The register settings are pretty much copied from radeonsi. I did
not put this in the pipeline as that would make the pipeline more
dependent on the format which mean we would have to have more
pipelines for the meta shaders.
v2: Don't clear RB+ regs if not enabled as the CLEAR_STATE packet
does already.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
However, it only fails when running out of memory. Now, if we
are about to check that, we should be consistent and check
the allocation of the worklist as well.
CID: 1433512
Fixes: edb18564c7 nir: Initial implementation of a nir_instr_worklist
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The source and destination image parameters were swapped.
No CTS changes on Polaris10, but I suspect this might
fix something.
Fixes: 2a04f5481d ("radv/meta: select resolve paths")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This enables the tile swizzle for some cases of the displayable micro mode,
and it also fixes an addrlib assertion failure on Vega.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Dylan has kindly stepped up to help with 18.1.0, while I've taken the
liberty to nominate Andres for 18.2.0 ;-)
As always, people are welcome to swap/adjust where needed.
v2: Add Juan for 18.0.x (Juan)
Cc: Andres Gomez <agomez@igalia.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com> (v1)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Earlier commit enforced that we'll bail out if the number of terminators
is different than 2. With that in mind, the assert() will never trigger.
Fixes: 56b867395d ("glsl: fix infinite loop caused by bug in loop
unrolling pass")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Saves about 2% of compile time for F1 2017, as well as reduce code
size of an optimized libvulkan_radeon.so by about 1 KiB.
This still keeps the hashtable, as we also stored blocks in there.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
We don't create variants of the NIR so here we finalise it before
caching to avoid unnecessary processing when restoring it.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This makes it easier to follow the code, and also initialises
dynamic_index which will be useful for adding bindless textures
support.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We neeed to skip the var if its not a uniform here as well as checking
the bindless flag since UBOs can contain bindless samplers.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This sets the LinkShader function for the driver, but for the st we
set it properly with the following call to st_init_program_functions().
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Once we've gotten rid of everything but the main entrypoint, there's no
reason why we should go ahead and lower them all. This is what radv
does and it will make future work easier.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
As we sometimes reset them to -1, -1 does not mean that they are
not written by the secondary command buffer.
Fixes: ad11fc3571 "radv: don't emit unneeded vertex state."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
At one point this kinda worked (or at least didn't cause problems). But
with deref-instructions it results in dangling deref instructions not
being properly removed.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Not using the macro means no nir_validate in debug builds, resulting in
problems showing up only after later passes.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
So it is more clear about when to use nir_instr_rewrite_src()
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We want to hide the internal details of how the miptree's clear color
is calculated.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The parameters for the compute engine are wrong when using
an E8860 on a big endian machine.
To fix this, convert the contents of struct dispatch_packet
to little endian.
This ensures that get_global_id(0) and similar functions
in the OpenCL code get the correct endian values, and
makes my simple OpenCL program work correctly.
Signed-off-by: Bas Vermeulen <bas@daedalean.ai>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Using mesa OpenCL failed on a big endian PowerPC machine because
si_vgt_param_key is using bitfields and a 32 bit int for an
index into an array.
Fix si_vgt_param_key to work correctly on both little endian
and big endian machines.
Signed-off-by: Bas Vermeulen <bas@daedalean.ai>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
When creating a image from a texture, the image's dri_format is
set to the first plane's format, and used to look up for the
fourcc. e.g. for FOURCC_NV12 texture, the dri_format is set to
__DRI_IMAGE_FORMAT_R8, we end up with a wrong entry in function
intel_lookup_fourcc():
{ __DRI_IMAGE_FOURCC_R8, __DRI_IMAGE_COMPONENTS_R, 1,
{ { 0, 0, 0, __DRI_IMAGE_FORMAT_R8, 1 }, } },
instead of the correct one:
{ __DRI_IMAGE_FOURCC_NV12, __DRI_IMAGE_COMPONENTS_Y_UV, 2,
{ { 0, 0, 0, __DRI_IMAGE_FORMAT_R8, 1 },
{ 1, 1, 1, __DRI_IMAGE_FORMAT_GR88, 2 } } },
as a result, a wrong fourcc __DRI_IMAGE_FOURCC_R8 was returned.
To fix this bug, the image inherits the texture's planar_format that
has the original fourcc; Upon querying, if planar_format is set,
return the saved fourcc; Otherwise fall back to the old way.
v3: add a bug description and "cc mesa-stable" tag (Jason)
remove redundant null pointer check (Tapani)
squash 2 patches into one (James)
v2: fall back to intel_lookup_fourcc() when planar_format is NULL
(Dongwon & Matt Roper)
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Xiong, James <james.xiong@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Since 31d91f019b, the makefile tries to
find the file SConstript.spirv instead of SConscript.spirv which breaks
the make dist command.
Reviewed-by: Brian Paul <brianp@vmware.com>
Forgot one check... Too many mistakes for a simple change.
Fixes: f1d7c16e85 ("radv: fix prefetching compute shaders on CIK and older chips")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
v2:
- Provide a correct explanation on the envvars documentation (Ian).
- Provide a more correct explanation on the function comments (Andres).
v3:
- Homogenize documentation and inline comments (Emil).
- Correct a typo (Emil).
Fixes: 2599b92eb9 ("mesa: allow forcing >=3.1 compatibility contexts
with MESA_GL_VERSION_OVERRIDE")
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Ian Romanick <ian.d.romanick@intel.com>
Cc: Eric Engestrom <eric.engestrom@imgtec.com>
Cc: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Currently, any driver that does not support the ARB_compatibility
extension will fail on GL3.1 context creation if the application does
not request the forward-compatiblity flag.
Restore the original check which changes mesa_api to API_OPENGL_CORE,
only when:
- GL3.1 is requested, without the forward-compatiblity flag.
- driver does not support ARB_compatibility - as deduced by
max_gl_compat_version.
Fixes: a0c8b49284 ("mesa: enable OpenGL 3.1 with ARB_compatibility")
v2:
- Improve commit log (Emil).
- Provide a correct explanation on the features documentation (Ian).
Cc: Marek Olšák <marek.olsak@amd.com>
Cc: Ian Romanick <ian.d.romanick@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Eric Engestrom <eric.engestrom@imgtec.com>
Cc: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This way we won't fail when validating just because we may have a non
overriden core version that is lower than the requested one, even when
the compat version is high enough.
For example, running glcts from VK-GL-CTS with i965, this will
succeed:
$ MESA_GL_VERSION_OVERRIDE=4.6 ./glcts --deqp-case=KHR-GL46.info.vendor
While, this will fail:
$ MESA_GL_VERSION_OVERRIDE=4.6COMPAT ./glcts --deqp-case=KHR-GL46.info.vendor
Fixes: 464c56d3d5 ("dri_util: Use
_mesa_override_gl_version_contextless")
Cc: Ian Romanick <ian.d.romanick@intel.com>
Cc: Tapani Pälli <tapani.palli@intel.com>
Cc: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
... to radv_use_tc_compat_htile_for_image(). This function
name makes more sense to me because we want to know if and
only if TC-compat HTILE should be used.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
If the image has FMASK metadata, the number of samples is > 1
because radv_image_can_enable_fmask() handles that already.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
DCC for MSAA textures are currently unsupported but that will
be used later on.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
They will help for DCC MSAA textures and if we support mipmaps
in the future.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Add needed infrastructure to use performance monitor
requests for queries.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Tested-by: Chris Healy <cphealy@gmail.com>
We only support importing YUV as OES external resources.
This will change in the future, but for now this fixes the
advertised capabilities in eglQueryDmaBufModifiersEXT.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
This adds a helper to check if a pipe format is in YUV color space.
Drivers want to know about this, as YUV mostly needs special handling.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
If a fence is created in between nvc0_hw_end_query and
nvc0_hw_query_fifo_wait, the sequence number in nvc0->screen->fence.bo can
be larger than hq->fence->sequence before the semaphore is created,
resulting in the semaphore never being triggered.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
If the fence has not been emitted, hq->fence->sequence would be zero. This
would result in the semaphore never being triggered, blocking all later
commands in the pushbuf.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
[imirkin: use nouveau_fence_emit instead]
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
All consts are now implicitly 2d (they set .Dimension), so trigger
asserts. Also, the texture offset can't handle any sort of 2d indexing.
While this could be tacked on, this seems unnecessary, just move it off
into a separate temp.
Fixes assertion failure in
tests/spec/arb_gpu_shader5/compiler/builtin-functions/fs-gatherOffset-uniform-offset.frag
Note that this was an issue even before the const-always-2d thing, since
there was no detection of when even a proper second dimension was used,
e.g. for UBO or geom/tess inputs.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes a bunch of new CTS pbo tests that use those as an output format,
which the state tracker converts into buffer image writes.
No part of the driver is ready for BGR10A2. It could probably be enabled
on Maxwell+, but seems unnecessary. This error was introduced when
flipping the displayable bit on those formats, which accidentally also
moved the image bit.
Fixes: e1a70aed10 (nv50,nvc0: mark ABGR format as displayable instead of ARGB format)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
tgsi_to_nir emits things with arrays as global vars.. and nir->ir3 does
lower_locals_to_regs. But nothing was lowering global to local, which
breaks compiling tgsi shaders
Signed-off-by: Rob Clark <robdclark@gmail.com>
In the old days (0.42.x), when mesa's meson system was written the
recommendation for handling conditional dependencies was to define them
as empty lists. When meson would evaluate the dependencies of a target
it would recursively flatten all of the arguments, and empty lists would
be removed. There are some problems with this, among them that lists and
dependencies have different methods (namely .found()), so the
recommendation changed to use `dependency('', required : false)` for
such cases. This has the advantage of providing a .found() method, so
there is no need to do things like `dep_foo != [] and dep_foo.found()`,
such a dependency should never exist.
I've tested this with 0.42 (the minimum we claim to support) and 0.45.
On 0.45 this removes warnings about comparing unlike types, such as:
meson.build:1337: WARNING: Trying to compare values of different types
(DependencyHolder, list) using !=.
v2: - Use dependency('', required : false) instead of
declare_dependency(), the later will always report that it is
found, which is not what we want.
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
brw_reg::type is "enum brw_reg_type type:4". For whatever reason, GCC
is treating this as an int instead of an enum. As a result, it doesn't
detect missing switch cases and it doesn't detect that flow can get out
of the switch.
This silences the warning:
src/intel/compiler/brw_reg.h: In function ‘bool brw_regs_negative_equal(const brw_reg*, const brw_reg*)’:
src/intel/compiler/brw_reg.h:305:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
In the emit_copies() function, the use of "newv" and "temp" names made
sense when only copies from temporaries to the new variables were
being done. But now there are other calls to copy with other pairings,
and "temp" doesn't always refer to a temporary created in this
pass. Use the names "dest" and "src" instead.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is unnecessary because we record an error which should
be returned by vkEndCommandBuffer(), and the app shouldn't
submit a command buffer when this happens.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Because the check was moved to radv_emit_prefetch_L2().
Fixes: 4ad7595f35 ("radv: rename radv_emit_prefetch() to radv_emit_prefetch_L2()")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This unnecessary when the precision bit flag is not set, and this
might hurt performance. The Vulkan explains that not setting
VK_QUERY_CONTROL_PRECISE_BIT might be more efficient on some
implementations.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Enable it directly in the preamble, but do not enable line
on Polaris10/11/12 because there is a hw bug.
There is possibly an issue when MSAA is off, but this doesn't
regress any CTS and AMDVLK doesn't have a workaround as well.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
v2 (Jason Ekstrand):
- Return the correct enum values from anv_layout_to_fast_clear_type
v3 (Jason Ekstrand):
- Always return ANV_FAST_CLEAR_NONE and leave doing the right thing for
the patch which adds a modifier which supports fast-clears.
Reviewed-by: Daniel Stone <daniels@collabora.com>
Tested-by: Daniel Stone <daniels@collabora.com>
Acked-by: Nanley Chery <nanley.g.chery@intel.com>
si_get_total_colormask accesses NULL pointer on compute shaders
Fixes crashes on clover
Fixes: 0669dca9c0 ("radeonsi: skip DCC render feedback checking if color writes are disabled")
CC: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Adds a new debug tool to pad each GEM BO allocated with (weak)
pseudo-random noise values which are then checked after each
batchbuffer dispatch to the kernel. This can be quite valuable to
find diffucult to track down heisenberg style bugs.
[scott.d.phillips@intel.com: split to separate tool]
v2: (by Scott D Phillips)
- track gem handles per fd (Kevin)
- remove handles on GEM_CLOSE (Kevin)
- ignore prime handles
- meson & shell script
v3: (by Scott D Phillips)
- don't track prime bos at all (Kevin)
- protect the hash table with a mutex (Kevin)
- hook fds by drm_version.name, not path (Chris Wilson)
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
Reviewed-by: Kevin Rogovin <kevin.rogovin@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
This is not fully tested. Meson can't link LLVM even though automake can.
PATH=/usr/llvm/x86_64-linux-gnu/bin:$PATH meson build/ -Dgallium-va=false \
-Dplatforms=x11,drm -Dgallium-drivers=radeonsi -Ddri-drivers= \
-Dgallium-omx=disabled -Dgallium-xvmc=false -Dgles1=false \
-Dtexture-float=true -Dvulkan-drivers=
src/gallium/auxiliary/libgallium.a(gallivm_lp_bld_misc.cpp.o):
(.data.rel.ro._ZTI26DelegatingJITMemoryManager[_ZTI26DelegatingJITMemoryManager]+0x10):
undefined reference to `typeinfo for llvm::RTDyldMemoryManager'
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Instead of updating the clear color in anv before a resolve, just let
blorp handle that for us during fast clears.
v5: Update comment about HiZ clear color (Jordan).
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
On Gen10+, instead of copying the clear color from the state buffer to
the surface state, just use the address of the state buffer in the
surface state directly. This way we can avoid the copy from state buffer
to surface state.
v4:
- Remove use_clear_address from anv code. (Jason)
- Use the helper to extract clear color from attachment (Jason)
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Extract the code from color_attachment_compute_aux_usage, so we can
later reuse it to update the clear color state buffer.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
On Gen10, when emitting the surface state, use the value stored in the
clear color entry buffer by using a clear color address in the surface
state.
v4: Use the clear color offset from the clear_color_bo, when available.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
On Gen10, whenever we do a fast clear, blorp will update the clear color
state buffer for us, as long as we set the clear color address
correctly.
However, on a hiz clear, if the surface is already on the fast clear
state we skip the actual fast clear operation and, before gen10, only
updated the miptree. On gen10+ we need to update the clear value state
buffer too, since blorp will not be doing a fast clear and updating it
for us.
v4:
- do not use clear_value_size in the for loop
- Get the address of the clear color from the aux buffer or the
clear_color_bo, depending on which one is available.
- let core blorp update the clear color, but also update it when we
skip a fast clear depth.
v5: Better subject (Jordan).
v6: Remove outdated comment (Jason).
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
In a follow up patch, we make use of clear_color_bo, which is in
mt->mcs_buf or mt->hiz_buf. To avoid duplicating more code that does the
same thing on both aux buffers, just use aux_buf already.
v5: Add aux_buf to brw_wm_surface_state too.
v6: Drop aux_surf and use aux_buf->surf instead (Jason).
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Add an extra BO to store clear color when we receive the aux buffer from
the window system. Since we have no control over the aux buffer size in
this case, we need the new BO to store only the clear color.
v5:
- Better subject (Jordan).
- Drop alignment from brw_bo_alloc().
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Similarly to vulkan where we store the clear value in the aux surface,
we can do the same in GL.
v2: Remove unneeded extra function.
v3: Use clear_value_state_size instead of clear_value_size.
v4:
- rename to clear_color_state_size
- store clear_color_bo and clear_color_offset in the aux buf struct
v5: Unreference clear color bo (Jordan)
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
We always want to update the fast clear color during a fast clear on
i965. On anv, we are doing that before a resolve, but by adding support
to blorp, we can do a similar thing and update it during a fast clear
instead.
The goal is to remove some code from anv that does such update, and
centralize everything in blorp, hopefully removing a lot of code
duplication. It also allows us to have a similar behavior on gen < 9 and
gen >= 10.
v5: s/we/we are/ (Jordan)
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
On gen10+, if surface->clear_color_addr is present, use it directly
intead of copying it to the surface state.
v4: Remove redundant #if clause for GEN <= 10 (Jason)
v5: Move flush after the reloc, and keep lower bits (Topi).
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
gen10 can emit the clear color by setting it on a buffer somewhere, and
then adding only the address to the surface state.
This commit add support for that on isl_surf_fill_state, and if that is
requested, skip setting the clear value itself.
v2: Add assert to make sure we are at least on gen10.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The size of the clear color struct (expected by the hardware) is 8
dwords (isl_dev.ss.clear_value_state_size here). But we still need to
track the size of the clear color, used when memcopying it to/from the
state buffer. For that we keep isl_dev.ss.clear_value_size.
v4:
- Add struct to gen11 too (Jason, Jordan)
- Add field for Converted Clear Color to gen11 (Jason)
- Add clear_color_state_offset to differentiate from
clear_value_offset.
- Fix all the places where clear_value_size was used.
v5 (Jason):
- Split genxml changes to another commit.
- Remove unnecessary gen checks.
- Bring back missing offset increment to init_fast_clear_color().
v6 (Jason):
- On init_fast_clear_color, change:
addr.offset += 4 => sdi.Address.offset += i * 4
- Use GEN_GEN instead of GEN_VERSIONx10.
[jordan.l.justen@intel.com: isl_device_init changes]
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
genxml does not support having two address fields with different names
but same position in the state struct. Both "Clear Color Address"
and "Clear Depth Address Low" mean the same thing, only for different
surface types.
To workaround this genxml limitation, rename "Clear Color Address"
to "Clear Value Address" and use it for both color and depth. Do the
same for the high bits.
TODO: add support for multiple addresses at the same position in the
xml.
v2: Combine high and low order bits into a single address field.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Some instructions contain fields that are either an address or a value
of some type based on the content of other fields, such as clear color
values vs address. That works fine if these fields are in the less
significant dword, the lower 32 bits of the address, because they get
OR'ed with the address. But if they are in the higher 32 bits, they get
discarded.
On Gen10 we have fields that share space with the higher 16 bits of the
address too. This commit makes sure those fields don't get discarded.
v5: Remove spurious whitespace (Jason).
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The lower bits seem to have extra fields in every platform but gen8
(even though we don't use them in gen9). So just go ahead and avoid
using them for the address.
v4: Use Jason's suggestion for comment explaining the change.
v5: Fix aux_address comment in anv_private.h (Jason)
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Fix use of alloca() without #include <c99_alloca.h> in 1da345e5
vbo/vbo_context.c: In function '_vbo_draw_indirect':
vbo/vbo_context.c:284:34: error: implicit declaration of function 'alloca' [-Werror=implicit-function-declaration]
struct _mesa_prim *space = alloca(draw_count*sizeof(struct _mesa_prim));
^~~~~~
vbo/vbo_context.c:284:34: warning: initialization makes pointer from integer without a cast [-Wint-conversion]
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
Disabled by default for now, it can be enabled with
RADV_PERFTEST=outoforder.
No CTS regressions on Polaris, and all Vulkan games I tested
look good as well.
Expect small performance improvements for applications where
out-of-order rasterization can be enabled by the driver.
Loosely based on RadeonSI.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
../src/intel/compiler/brw_reg.h: In function ‘bool brw_regs_negative_equal(const brw_reg*, const brw_reg*)’:
../src/intel/compiler/brw_reg.h:305:1: warning: control reaches end of non-void function [-Wreturn-type]
Introduced by 8f83eea71e ("i965: Add negative_equals methods").
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
From the SPIR-V spec, OpTypeImage:
"Depth is whether or not this image is a depth image. (Note that
whether or not depth comparisons are actually done is a property of
the sampling opcode, not of this type declaration.)"
The sampling opcodes that specify depth comparisons are
OpImageSample{Proj}Dref{Explicit,Implicit}Lod, so we should set
is_shadow only for these (we were using the deph property of the
image until now).
v2:
- Do the same for OpImageDrefGather.
- Set is_shadow to false if the sampling opcode is not one of these (Jason)
- Reuse an existing switch statement instead of adding a new one (Jason)
Fixes crashes in:
dEQP-VK.spirv_assembly.instruction.graphics.image_sampler.depth_property.*
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
Otherwise we may end up trying to coalesce in a case such as
ssa_1 = fadd r1, r2
r3.x = fneg(r2);
r3 = vec4(ssa_1, ssa_1.y, ...)
and that would cause us to move the writes to r3 from the vec to the
fadd which would re-order them with respect to the write from the fneg.
In order to solve this, we just don't coalesce if the destination of the
vec is not SSA. We could try to get clever and still coalesce if there
are no writes to the destination of the vec between the vec and the ALU
source. However, since registers only come from phi webs and indirects,
the chances of having a vec with a register destination that is actually
coalescable into its source is very slim.
Shader-db results on Haswell:
total instructions in shared programs: 13657906 -> 13659101 (<.01%)
instructions in affected programs: 149291 -> 150486 (0.80%)
helped: 0
HURT: 592
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105440
Fixes: 2458ea95c5 "nir/lower_vec_to_movs: Coalesce movs on-the-fly when possible"
Reported-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Tested-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If we close the fd before calling DRM_IOCTL_PRIME_FD_TO_HANDLE the kernel
will hit a -EBADF error. Move the close(fd) call to the end of
anv_CreateDmaBufImageINTEL().
Signed-off-by: Kevin Strasser <kevin.strasser@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This fixes a bug in radeonsi where LLVM cannot handle the case where
a break exists but its not the last instruction in the block.
LLVM would fail with:
Terminator found in the middle of a basic block!
LLVM ERROR: Broken function found, compilation aborted!
Fixes: 96fe8834f5 "glsl_to_tgsi: do fewer optimizations with GLSLOptimizeConservatively"
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105317
When running virgl on a GLES host the only sRGB formats that support
rendering is RGBA and RGBX. That pipe format is in the sRGB default
lists that the state tracker uses when mapping mesa formats.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Jakob Bornecrantz <jakob@collabora.com>
Prior to printing a decoded field, print out all dwords that field
belongs to. In particular with address fields spanning multiple
dwords, we want to have all the dwords presented before the field is
decoded to make it easier to read.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
MI_LOAD_REGISTER_IMM can load multiple (register, value) tuples in one
command. In our drivers we only use one tuple at a time, but the
kernel might load more than one at a time.
Instead of making all the tuple part of a group, we leave out the
first tuple (the one we use in the generated packing structures).
This is particularly useful for looking at error stats generated by
the kernel.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
The kernel reports workaround batch buffers, but we're not presenting
them currently. Also they might not be useful for debugging purely
userspace driver issues, when problems arise because of interactions
between kernel & userspace drivers, it's nice to be able to decode
them.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
This variable controls whether we link using the glsl code path or the
spirv path. It's set when we validate that all shaders are glsl or
spirv, but if there are no shaders attached to the program it will
remain unset, resulting in undefined behavior. We want to go down the
glsl path in that case, so initialize to false.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105820
Fixes: 16f6634e7f
("mesa/program: Link SPIR-V shaders using the SPIR-V code-path")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
The driver already supports exporting the Layer and ViewportIndex
built-ins from vertex or tessellation shaders.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Add helpers to get the number of src/dest components for an intrinsic,
and update spots that were open-coding this logic to use the helpers
instead.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since we were MARK flag for both preventing loops, and tracking whether
instructions were used, we could end up in an infinite loop due to
bd2ca2bcdd. Instead invert the logic.. mark all instructions UNUSED
up front and clear the flag as we visit them.
Fixes: bd2ca2bcdd freedreno/ir3: eliminate unused false-deps
Signed-off-by: Rob Clark <robdclark@gmail.com>
This reverts commit 41cf30b8bc.
Commit caused regressions with KHR-GLES3.packed_pixels.* tests.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Suggested-by: Eric Anholt <eric@anholt.net>
When allocating a buffer for DRI2, set the modifier to INVALID to inform
the backend that we have no supplied modifiers and it should do its own
thing. The missed initialisation forced linear, even if the
implementation had made other decisions.
This resulted in VC4 DRI2 clients failing with:
Modifier 0x0 vs. tiling (0x700000000000001) mismatch
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reported-by: Andreas Müller <schnitzeltony@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes: 3f8513172f ("gallium/winsys/drm: introduce modifier field to winsys_handle")
v2: rebased on top of subpass rework.
v3: rebased
v4:
- rebased
- reset pending clear views in one go rather one bit at a time (Caio)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When multiview is active a subpass clear may only clear a subset of the
attachment layers. Other subpasses in the same render pass may also
clear too and we want to honor those clears as well, however, we need to
ensure that we only clear a layer once, on the first subpass that uses
a particular layer (view) of a given attachment.
This means that when we check if a subpass attachment needs to be cleared
we need to check if all the layers used by that subpass (as indicated by
its view_mask) have already been cleared in previous subpasses or not, in
which case, we must clear any pending layers used by the subpass, and only
those pending.
v2:
- track pending clear views in the attachment state (Jason)
- rebased on top of fast-clear rework.
v3:
- rebased on top of subpass rework.
v4: rebased.
v5 (Caio):
- Rebased.
- Initialize pending clear views to only have bits set for layers
that exist.
- Reset pending clear views in one go rather one bit at a time.
- Put "last subpass for this attachment" condition in a separate
function to simplify the conditional that resets pending_clear_aspects.
Fixes:
dEQP-VK.multiview.readback_implicit_clear.*
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
For now we skip SI && HAVE_LLVM < 0x0600 for simplicity. We also skip
setting the more accurate masks for builtin uniforms for now as it
causes some piglit regressions.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This will be shared by the TGSI and NIR backends. For simplicity
we leave the SI LLVM 5.0 and lower work around only in the TGSI
backend.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Buffers can be large, so we probably don't want to make them all 32x
bigger. But they can't be rendered to (at least in GL) so we don't
need this workaround to prevent page faults on mem<->gmem.
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
We could alternatively fall back to using "old style" draw's for
mem<->gmem (ie. what <= a4xx do) when height is not aligned to 32,
but that is somewhat more work (and not really something that could
be applied to stable)
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Fixes an issue that became possible when we started lowering phi webs to
regs (a7ea2b4e) (although was not really seen until we also switched to
using peephole select pass (ec8bc54a) instead of lowering *all* if/else
to select).
If texture coord (or anything else that uses create_collect() to collect
scalar values in a sequence of scalar registers) was consuming a value
produced on either side of an if/else (ie. a phi lowered to nir reg,
which in ir3 is an "array" of length 1) then register allocation would
happen incorrectly and we'd end up sampling from garbage coordinates.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Some instructions require src/dst to be in full or half precision
register depending on src/dst type. So do a better job of propagating
register type.
Signed-off-by: Rob Clark <robdclark@gmail.com>
We'll also need to be able to create a half-precision immediate. So
re-work create_immed(). Prep work for following patch.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Previously false-dependencies would get flagged as used, even if the
only "use" was a false dep to (for example) prevent a load from being
scheduled after a store.
In addition to being pointless instructions, in some cases they can
cause problems. For example, ldg (and similar instructions) depend on
an immed arg getting CP'd into the instruction, but this doesn't happen
if an instruction is otherwise unused. Which can result in undefined
results (overwriting unintended registers).
Signed-off-by: Rob Clark <robdclark@gmail.com>
Extend translate_sge_slt to emit these, in analogous fashion
but using CNDEv.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Add support for:
- PIPE_FORMAT_ETC1_RGB8
- PIPE_FORMAT_DXT1_RGB
- PIPE_FORMAT_DXT1_RGBA
- PIPE_FORMAT_DXT3_RGBA
- PIPE_FORMAT_DXT5_RGBA
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Denormalized texture coordinates are required for text rendering in
GALLIUM_HUD.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Textures will sometimes be updated if texture view state was
un-set, without this change that causes an assertion crash or
segfault.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Compose swizzles using util_format_compose_swizzles instead
of the custom code (which somehow had a bug).
This makes the GL_ALPHA internal format work.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Change use of BLEND_ to BLEND2_,
BLEND_* a3xx_rb_blend_opcode
BLEND2_* is a2xx_rb_blend_opcode
This makes no effective difference as the used enumerant has the same
value (0), but the other enumerants do not match 1-to-1 so this will
avoid future problems.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
The format enumeration comes comes from the yamoto
register headers that are part of the amd-gpu kernel driver.
(see freedreno envytools commit b8fb7978e7ae106d0d11d0b238ab2ba2d4dd9d43)
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Avoid using malloc in the draw path of mesa.
Since the draw_count is a user api input, fall back to malloc if
the amount of consumed stack space may get too high.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Move the files, adapt to the naming scheme in tnl, update callers
and build system.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
The legacy draw paths from back before 2012 contained a gl_vertex_array
array for the inputs to be used for draw. So all draw methods from legacy
drivers and everything that goes through tnl are originally written
for this calling convention. The same goes for tools like t_rebase or
vbo_split*, that even partly still have the original calling convention
with a currently unused such pointer.
Back in 2012 patch 50f7e75
mesa: move gl_client_array*[] from vbo_draw_func into gl_context
introduced Array._DrawArrays, which was something that was IMO aiming for
a similar direction than Array._DrawVAO introduced recently.
Now several tools like t_rebase and vbo_split*, which are mostly used by
tnl based drivers, would need to be converted to use the internal
Array._DrawVAO instead of Array._DrawArrays. The same goes for the driver
backends that use any of these tools.
Alternatively we can reintroduce the gl_vertex_array array in its call
argument list and put these tools finally into the tnl directory.
So this change reintroduces this gl_vertex_array array for the legacy
draw paths that are still required for the tools t_rebase and vbo_split*.
A followup will move vbo_split also into tnl.
Note that none of the affected drivers use the DriverFlags.NewArray
driver bit. So it should be safe to remove this also for the legacy
draw path.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Remove the vbo_indirect_draw_func vbo callback and make the default
implementation use the drivers main draw callback function directly.
This will be needed with the next changes when drivers without own main
drivers DrawIndirect implementation get moved to the main drivers
Draw method.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Let the i965 backend have its own gl_vertex_array array and basically
reimplement the way _vbo_draw works.
Note that brw_draw_indirect_prims calls brw_draw_prims internally
and gets its update to Array._DrawArray by this way.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Let the gallium backend have its own gl_vertex_array array and basically
reimplement the way _vbo_draw works.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Otherwise, any indirect push constant access results in an assertion
failure when we start digging through the channel_sizes array. This
fixes dEQP-VK.pipeline.push_constant.graphics_pipeline.dynamic_index_vert
on Haswell. It should be a harmless no-op for GL since indirect push
constants aren't used there.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: e69e5c7006 "i965/vec4: load dvec3/4 uniforms first in the..."
Because nir_instr_remove is an inline wrapper around nir_instr_remove_v,
the compiler should be able to tell that the return value is unused and
not emit the extra code in most cases.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This allows the WGL_SWAP_INTERVAL env var to override any application
calls to wglSwapIntervalEXT(). Useful for debugging, or to set the
interval to zero to effectively disable the swap interval.
Note: we also rename the previous instance of SVGA_SWAP_INTERVAL to
WGL_SWAP_INTERVAL since this is a WGL feature and not related to the
svga driver.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This fixes a Windows build warning where the prototypes for the ES
function in the header file don't match the prototypes in this file
because the GL_API and GLAPI macros are defined differently.
v2: defined GL_API to KEYWORD1 instead of GLAPI, per Mathias.
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
The MSVC compiler warns when the function parameter types don't
exactly match with respect to enum vs. uint32_t. Use SpvOp everywhere.
Alternately, uint32_t could be used everywhere. There doesn't seem
to be an advantage to one over the other.
Reviewed-by: Neil Roberts <nroberts@igalia.com>
The SCons build broke with commit ba975140d3 because a SPIR-V
function is called from Mesa main. This adds a convenience library for
SPIR-V and adds it to everything that was including nir. It also adds
both nir and spirv to drivers/x11/SConscript.
Also add nir/spirv modules to osmesa and libgl-gdi targets. (Brian Paul)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105817
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
In the BITFIELD_MASK() macro, if b==32 the expression evaluates to
~0u, but the compiler still sees the expression (1 << 32) in the
unused part and issues a warning about integer bitshift overflow.
Fix that by using (b) % 32 to ensure the max shift is 31 bits.
This issue has been present for a while, but shows up much more
often because of the recent VBO changes.
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This assert is hit on hardware which does not expose GL 4.4 or GLES 3.1.
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
Signed-off-by: Jakob Bornecrantz <jakob@collabora.com>
The version passed to QueryVersion requests is the version that the
client supports. We were just passing in whatever version of XCB was
present on the system, which may not be a version that Mesa actually
explicitly supports, e.g. it might bring unwanted semantics.
Set specific protocol versions which we support, and only pass those.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Fixes: 7aeef2d4ef ("dri3: allow building against older xcb (v3)")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This is the main fork of the shader compilation code-path, where a NIR
shader is obtained by calling spirv_to_nir() or glsl_to_nir(),
depending on its nature..
v2: Use 'spirv_data' member from gl_linked_shader to know which method
to call. (Timothy Arceri)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This is basically a wrapper around spirv_to_nir() that includes
arguments setup and post-conversion validation.
v2: * Rebase update (SpirVCapabilities not a pointer anymore,
spirv_to_nir_options added, and others).
* Code-style improvements and remove debug hunk. (Timothy Arceri)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This is the equivalent to link_shaders() from
src/compiler/glsl/linker.cpp, but for SPIR-V programs. It just
creates the program and its gl_linked_shader objects, giving drivers
the opportunity to implement any linking of SPIR-V shaders they choose,
at a later stage.
v2: Bail out if we see more that one shader for the same stage, and
add a corresponding comment. (Timothy Arceri)
v3:
* Adds also a linker error log to the condition above, with a
reference to the specification issue. (Timothy Arceri)
* Squash with the patch adding the function boilerplate (Timothy
Arceri)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This is a reference to the spirv_data object stored in gl_shader, which
stores shader SPIR-V data that is needed during linking too.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
v2:
* Use gl_spirv_validation instead of spirv_to_nir. This method just
validates the shader. The conversion to NIR will happen later,
during linking. (Alejandro Piñeiro)
* Use gl_shader_spirv_data struct to store the SPIR-V data.
(Eduardo Lima)
* Use the 'spirv_data' member to tell if the gl_shader is a SPIR-V
shader, instead of a dedicated flag. (Timothy Arceri)
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
ARB_gl_spirv adds the ability to use SPIR-V binaries, and a new
method, glSpecializeShader. Here we add a new function to do the
validation for this function:
From OpenGL 4.6 spec, section 7.2.1"
"Shader Specialization", error table:
INVALID_VALUE is generated if <pEntryPoint> does not name a valid
entry point for <shader>.
INVALID_VALUE is generated if any element of <pConstantIndex>
refers to a specialization constant that does not exist in the
shader module contained in <shader>.""
v2: rebase update (spirv_to_nir options added, changes on the warning
logging, and others)
v3: include passing options on common initialization, doesn't call
setjmp on common_initialization
v4: (after Jason comments):
* Rename common_initialization to vtn_builder_create
* Move validation method and their helpers to own source file.
* Create own handle_constant_decoration_cb instead of reuse existing one
v5: put vtn_build_create refactoring to their own patch (Jason)
v6: update after vtn_builder_create method renamed, add explanatory
comment, tweak existing comment and commit message (Timothy)
Refactored from spirv_to_nir, in order to be reused later.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
v2: renamed method (from vtn_builder_create), add explanatory comment
(Timothy)
Needed for ARB_gl_spirv. Those are not the same that the Intel vulkan
driver. From the ARB_spirv_extensions spec:
"3. If a new GL extension is added that includes SPIR-V support via
a new SPIR-V extension does it's SPIR-V extension also get
enumerated by the SPIR_V_EXTENSIONS_ARB query?.
RESOLVED. Yes. It's good to include it for consistency. Any SPIR-V
functionality supported beyond the SPIR-V version that is required
for the GL API version should be enumerated."
So in addition to the core SPIR-V support, there is the possibility of
specific GL extensions enabling specific SPIR-V extensions (so
capabilities). That would mean that it is possible that OpenGL and
Vulkan not having the same capabilities supported, even for the same
driver. For this reason it is better to keep them separated.
As an example: at the time of this patch writing Intel vulkan driver
support multiview, but there isn't any OpenGL multiview GL extension
supported.
Note: we initialize SPIR-V capabilities at brwCreateContext instead of
the usual brw_initialize_context_constants because we want to do that
only if the extension is enabled.
v2:
* Rebase update (SpirVCapabilities not a pointer anymore)
* Fill spirv capabilities for OpenGL >= 3.3 (Ian Romanick)
v3:
* Drop multiview support, as i965 doesn't support any multiview GL
extension (Jason)
* Fill spirv capabilities only if the extension is enabled (Jason)
v4: Capabilities are supported only on gen7+. Added comment and assert
(Jason)
Let the lowering in NIR handle it instead.
This hurts one shader that occurs twice in shader-db (SynMark GSCloth)
on IVB and HSW. No other shaders or platforms were affected.
total cycles in shared programs: 253438422 -> 253438426 (0.00%)
cycles in affected programs: 412 -> 416 (0.97%)
helped: 0
HURT: 2
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Antia Puentes <apuentes@igalia.com>
Future changes will add generated files used only from
src/compiler/glsl. These can't be built from Makefile.nir.am, and we
can't move all the rules from Makefile.nir.am to Makefile.spirv.am (and
it would be silly anyway).
v2: Do it for meson too.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (the meson bits)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> (the automake bits)
This slightly simplifies later changes that add more Makefile.*.am
files.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Previously bitset.h would include u_math.h to get bitscan.h. u_math.h
lives in src/gallium/auxiliary/util while both bitset.h and bitscan.h
live in src/util. Having the one file directly include another file
that lives in the same directory makes much more sense.
As a side-effect, several files need to directly include standard header
files that were previously indirectly included.
v2: Fix build break in src/amd/common/ac_nir_to_llvm.c.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Previously size=0, element_size=0 would have been allowed. That
combination can only lead to despair.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
The new name make the zero-input behavior more obvious. The next
patch adds a new function with different zero-input behavior.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
This corrects pkg-config to use the libdrm version (as computed by the
previous patch) instead of using a hardcoded value that may or may not
(probably not) be right.
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Currently each driver specifies it's own version, and core libdrm
specifies a version. In the most common case this is fine, since there
will be exactly one libdrm installed on a system, but if there are more
than one it's possible that mesa will be linked against different
versions of libdrm. There is also the possibility that the current
approach makes the pkg-config files we generate incorrect, since there
could be #defines that use newer features if they're available.
This patch corrects all of that. All of the versions are still set by
driver (along with a default core version). Then all of the drivers that
are enabled have their versions compared and the highest version is
selected, then all libdrm checks are made with that version.
v2: - Reorder the list to have the name first and whether the dependency
is needed second (Eric)
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
in glapi_dispatch.c, as we have for many other GLES functions.
Fixes a cross-compile issue (missing prototype) when GLES support
is disabled.
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Found running "The Witness" in Wine. Without this patch, texture views created
on multi-sample textures would have a GL_TEXTURE_SAMPLES of 0. All things
considered such views actually work surprisingly well, but when combined with
(plain) multi-sample textures in a framebuffer object, the resulting FBO is
incomplete because the sample counts don't match.
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
To fix a regression in:
dEQP-VK.spirv_assembly.instruction.graphics.variable_init.output.struct
And the following regressions (Polaris only):
dEQP-VK.glsl.indexing.varying_array.*
Fixes: f3275ca01c ("ac/nir: only enable used channels when exporting parameters")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
In the absence of a general NIR or VIR-level scheduler, this at least
avoids spilling in
GTF-GLES3.gtf.GL3Tests.uniform_buffer_object.uniform_buffer_object_storage_layouts
I'm disappointed that the compiler didn't warn me about use of
uninitialized uc in these paths. Just use the incoming clear color
instead of the packing temporary if we're doing our own packing.
Fixes GTF-GLES3.gtf.GL3Tests.color_buffer_float.color_buffer_float_clamp_*
We always want A in the A slot in the tile buffer, and any other swapping
should happen elsewhere.
Fixes RGBA4-using cases in fbo-clear-formats and
GTF-GLES3.gtf.GL3Tests.color_buffer_float.color_buffer_float_clamp_fixed.
We can't necessarily finalize the texture at this point if we're rendering
to a texture image whose format is different from the baselevel's format.
This was introduced as a fix for fbo-incomplete-texture-03 in
de414f4915, but the later fix for vmware on
that testcase in 95d5c48f68 made it
unnecessary.
Fixes assertion failures in util_resource_copy_region() in
KHR-GLES3.copy_tex_image_conversions.forbidden.* when trying to finalize
an R8 texture image to the RG8 texture object's pt.
Reviewed-by: Brian Paul <brianp@vmware.com>
On GFX9 whether the buffer size is interpreted as elements or bytes
depends on whether IDXEN is enabled in the instruction. If the index
is a constant zero, LLVM optimizes IDXEN to 0.
Now the size in elements is interpreted in bytes which of course
results in out of bounds accesses.
The correct fix is most likely to disable the LLVM optimization,
but we need something to work with LLVM <= 6.0.
radeonsi does the max between stride and element count on the CPU
but that results in the size intrinsics returning the wrong size
for the buffer. This would cause CTS errors for radv.
v2: Also include the store changes.
Fixes: e38685cc62 'Revert "radv: disable support for VEGA for now."'
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
nir_intrinsics.c existed as a static file until commit 76dfed8ae2 began
generating it as part of the build process. autotools is incapable of
coping, and so a build-tree from before this commit would then fail with
it:
[4]: *** No rule to make target '../../../mesa/src/compiler/nir/nir_intrinsics.c', needed by 'nir/nir_intrinsics.lo'. Stop.
Add a few lines to configure.ac to update the broken build files.
Fixes: 76dfed8ae2 ("nir: mako all the intrinsics")
Otherwise meson won't read the VERSION file and won't set a version.
That means that pkg-config files will have version unset as well.
Fixes: 3e9533d9b8
("meson: Add script to use VERSION file for getting version")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
I assume this was implemented in a previous version of that commit, but
was removed in the version that actually landed.
Fixes: 8430af5ebe "Add support for swrast to the DRM EGL platform"
Cc: Giovanni Campagna <gcampagna@src.gnome.org>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The power-of-two padded size that gets minified is based on level 1's
dimensions, not level 0's, which starts to differ at a width of 9.
Fixes all failures on texelFetch fs sampler2D 1x1x1-64x64x1
bo->align is always 0; there's no need to waste 8 bytes storing it.
Thanks to C99 initializers zeroing fields, we can completely drop the
only read of the field altogether.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Buffers are always page aligned on 965+ hardware; I believe this extra
parameter is a vestige from the Gen2-3 era.
All callers pass 0, and in fact we assert that the alignment is 0 unless
BO_ALLOC_BUSY is set (for some reason). We can just drop the parameter
and set the value to 0 explicitly.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
intel_miptree_create_for_bo does not actually allocate a BO, so
specifying allocation flags accomplishes nothing and is confusing.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is just zero - passing nothing already gives us a post-sync
operation of "nothing".
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
I have no idea why but having dest_components == -1 was causing a memory
leak somewhere. Without this, you can't get through a full shader-db
run without running out of memory.
Reviewed-by: Rob Clark <robdclark@gmail.com>
This was causing us to walk dest_components times over a thing with no
destination. This happened to work because all of the image intrinsics
without a destination also happened to have dest_components == 0. We
shouldn't be reading dest_components if has_dest == false.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
There were two problems, both of which are fixed now:
- The indirect address was not being shifted by 4
- The indirect address was being placed as an argument in the offset case
This fixes some of the new interpolateAt* piglits which now test for
these situations.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
When an if nesting inside anouther if is optimised away we can
end up with a loop terminator and following block that looks like
this:
if ssa_596 {
block block_5:
/* preds: block_4 */
vec1 32 ssa_601 = load_const (0xffffffff /* -nan */)
break
/* succs: block_8 */
} else {
block block_6:
/* preds: block_4 */
/* succs: block_7 */
}
block block_7:
/* preds: block_6 */
vec1 32 ssa_602 = phi block_6: ssa_552
vec1 32 ssa_603 = phi block_6: ssa_553
vec1 32 ssa_604 = iadd ssa_551, ssa_66
The problem is the phis. Loop unrolling expects the last block in
the loop to be empty once we splice the instructions in the last
block into the continue branch. The problem is we cant move phis
so here we lower the phis to regs when preparing the loop for
unrolling. As it could be possible to have multiple additional
blocks/ifs following the terminator we just convert all phis at
the top level of the loop body for simplicity.
We also add some comments to loop_prepare_for_unroll() while we
are here.
Fixes: 51daccb289 "nir: add a loop unrolling pass"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105670
The location was only being incremented the first time we processed a
location. This meant we would incorrectly skip some elements of
an array if the first element was packed and proccessed previously
but other elements were not.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We need to wait until after the writemask is widened before we
adjust it for component packing.
Together with the previous patch this fixes a number of
arb_enhanced_layouts component layout piglit tests.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This fixes tcs/tes varying arrays where we dont lower indirects and
therefore don't split arrays. Here we also fix useagemask for dual
slot doubles.
Fixes a number of arb_tessellation_shader piglit tests.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
How many times did I look at this table without noticing the missing 'G'
in the texture column?
Fixes KHR-GLES3.copy_tex_image_conversions.required.* on 7268.
Apparently it is not happy about things like: .foo = {}
So skip over initializers for empty lists.
Fixes: 76dfed8ae2
Reported-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Note: the file was originally 17.4.0, yet git stuggles to detect the
move :-\
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit dceb1ce807)
I threatened to do this a long time ago.. I probably *should* have done
it a long time ago when there where many fewer intrinsics. But the
system of macro/#include magic for dealing with intrinsics is a bit
annoying, and python has the nice property of optional fxn params,
making it possible to define new intrinsics while ignoring parameters
that are not applicable (and naming optional params). And not having to
specify various array lengths explicitly is nice too.
I think the end result makes it easier to add new intrinsics.
v2: couple small fixes found with a test program to compare the old and
new tables
v3: misc comments, don't rely on capture=true for meson.build, get rid
of system_values table to avoid return value of intrinsic() and
*mostly* remove side-effects, add autotools build support
v4: scons build
Signed-off-by: Rob Clark <robdclark@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
EXT_color_buffer_float spec states:
"An INVALID_OPERATION error is generated ... if the color buffer is
a floating-point format and type is not FLOAT, HALF FLOAT, or
UNSIGNED_INT_10F_11F_11F_REV."
This means that GL_HALF_FLOAT type should be supported when color
buffer has floating-point format.
Fixes Android CTS test android.view.cts.PixelCopyTest.
v2: remove comments of EXT_color_buffer_half_float as
EXT_color_buffer_float can use type GL_HALF_FLOAT
Signed-off-by: Lin Johnson <johnson.lin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This is the actual hardware layout, and we were only swizzling R/B back
around in texturing. Fixes part of
KHR-GLES3.copy_tex_image_conversions.required.cubemap_negx_cubemap_negx in
simulation.
Once we've disabled EZ for some draws, we need to not use EZ on future
draws. Implementing that made implementing the GT/GE direction trivial.
Fixes KHR-GLES3.shaders.fragdepth.compare.no_write on V3D 4.1 simulation.
On 3.x, we just don't flag the primitive as needing TF, but those
primitive bits are now allocated to the new primitive types. Now we need
to actually update the enable flag at draw time.
The next job from this client will turn it back on unless TF gets
disabled, but we don't want the state to leak from this client to another
(which causes GPU hangs).
I had this note to myself, and it turns out that a lot of CTS tests use
XFB with points to get data out without using a fragment shader. Keep
track of two sets of precomputed TF specs (point size in VPM prologue or
not), and switch between them when we enable/disable point size.
The length-1 field only has 4 bits, so we need to generate separate specs
when there's too much TF output per buffer.
Fixes
GTF-GLES3.gtf.GL3Tests.transform_feedback.transform_feedback_builtin_type
and transform_feedback_max_interleaved.
GTF-GLES3.gtf.GL3Tests.instanced_arrays.instanced_arrays_divisor uses -1
as a divisor, so we would overflow to count=0 and upload no data,
triggering the assert below. We want to upload 1 element in this case,
fixing the test on VC5.
v2: Use some more obvious logic, and explain why we don't use the normal
round_up().
Reviewed-by: Brian Paul <brianp@vmware.com>
There's nothing to worry about here -- the A channel just gets dropped by
the blit. This avoids a segfault in the fallback path when copying from a
RGBA16_SINT renderbuffer to a RGB16_SINT destination represented by an
RGBA16_SINT texture (the fallback path tries to get/fetch to float
buffers, but the float pack/unpack functions are NULL for SINT/UINT).
Fixes KHR-GLES3.packed_pixels.pbo_rectangle.rgba16i on VC5.
v2: Extract the logic to a helper function and explain what's going on
better.
v3: const-qualify args
Reviewed-by: Brian Paul <brianp@vmware.com>
Just checking for 2 jumps is not enough to be sure we can do a
complex loop unroll. We need to make sure we also have also found
2 loop terminators.
Without this we were attempting to unroll a loop where the second
jump was nested inside multiple ifs which loop analysis is unable
to detect as a terminator. We ended up splicing out the first
terminator but failed to actually unroll the loop, this resulted
in the creation of a possible infinite loop.
Fixes: 646621c66d "glsl: make loop unrolling more like the nir unrolling path"
Tested-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105670
A recent commit (see below) triggered some cases where conditional
modifier propagation and dead code elimination would cause a MAD
instruction like the following to be generated:
mad.l.f0 null, ...
Matt pointed out that fs_visitor::fixup_3src_null_dest() fixes cases
like this in the scalar backend. This commit basically ports that code
to the vec4 backend.
NOTE: I have sent a couple tests to the piglit list that reproduce this
bug *without* the commit mentioned below. This commit fixes those
tests.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Fixes: ee63933a7 ("nir: Distribute binary operations with constants into bcsel")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105704
No shader-db changes. This source must have been written by a previous
instruction, so it cannot be a uniform or a shader input. However, this
change allows the next commit to help more shaders.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
No shader-db changes. This source must have been written by a previous
instruction, so it cannot be a uniform or a shader input. However, this
change allows the next commit to help about 900 more shaders.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This method is similar to the existing ::equals methods. Instead of
testing that two src_regs are equal to each other, it tests that one is
the negation of the other.
v2: Simplify various checks based on suggestions from Matt. Use
src_reg::type instead of fixed_hw_reg.type in a check. Also suggested
by Matt.
v3: Rebase on 3 years. Fix some problems with negative_equals with VF
constants. Add fs_reg::negative_equals.
v4: Replace the existing default case with BRW_REGISTER_TYPE_UB,
BRW_REGISTER_TYPE_B, and BRW_REGISTER_TYPE_NF. Suggested by Matt.
Expand the FINISHME comment to better explain why it isn't already
finished.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [v3]
Reviewed-by: Matt Turner <mattst88@gmail.com>
Not used in GL but 8 and 16 component vectors exist in OpenCL.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Refactor things so there isn't so much typing involved to add new
things.
Also drops a pointless conditional (out of bounds rows or columns
already returns error_type in all paths.. might as well drop it
rather than make the check more convoluted in the next patch by
adding the vec8/vec16 case).
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The currently we use the singular CHECK_HEADER combined with explicit
append to the DEFINES variable. That is a legacy misnomer, since it
requires us to add $DEFINES to every piece that we build.
Using the plural version of the helper sets the HAVE_ macro for us, plus
ensures it's passed to the compiler - if config.h is available in there
(not in the case of mesa) otherwise on the command line.
In hindsight, we should replace all the AC_CHECK_{FUNC,HEADER} instances
with the plural version (or even the _ONCE suffixed version) and drop
the DEFINES hacks.
Fixes: cbee1bfb34 ("meson/configure: detect endian.h instead of trying
to guess when it's available")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105717
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Clayton Craft <clayton.a.craft@intel.com>
Need to update the tgsi code and st_glsl_to_tgsi code at the same time
to prevent compile break since C++ is much pickier about implicit
enum/unsigned casting.
Bump size of glsl_to_tgsi_instruction::op to 10 bits to be sure to
avoid MSVC signed enum overflow issue. No change in class size.
Reviewed-by: Eric Anholt <eric@anholt.net>
I don't think it actually fixes anything, but that's nice not to have valgrind warnings.
It manifests itself when running the piglit test : glsl-fs-raytrace-bug27060
==2058== Uninitialised byte(s) found during client check request
==2058== at 0xC5BB040: blob_write_bytes (blob.c:152)
==2058== by 0xC595359: write_variable (nir_serialize.c:144)
==2058== by 0xC59560C: write_var_list (nir_serialize.c:192)
==2058== by 0xC5982E4: nir_serialize (nir_serialize.c:1124)
==2058== by 0xC0B729D: brw_program_serialize_nir (brw_program.c:835)
==2058== by 0xC0AB2D6: brw_link_shader (brw_link.cpp:358)
==2058== by 0xC32FE3F: _mesa_glsl_link_shader (ir_to_mesa.cpp:3169)
==2058== by 0xC36C7ED: create_new_program(gl_context*, state_key*) (ff_fragment_shader.cpp:1127)
==2058== by 0xC36C8A6: _mesa_get_fixed_func_fragment_program (ff_fragment_shader.cpp:1157)
==2058== by 0xC1B50AF: update_program (state.c:134)
==2058== by 0xC1B56DF: _mesa_update_state_locked (state.c:352)
==2058== by 0xC1B579A: _mesa_update_state (state.c:386)
==2058== Address 0xf1eab8a is 58 bytes inside a block of size 96 alloc'd
==2058== at 0x4C2CB8F: malloc (vg_replace_malloc.c:299)
==2058== by 0xC0FD306: ralloc_size (ralloc.c:121)
==2058== by 0xC0FD5B1: ralloc_array_size (ralloc.c:208)
==2058== by 0xC452B3B: (anonymous namespace)::nir_visitor::visit(ir_variable*) (glsl_to_nir.cpp:448)
==2058== by 0xC45CE8B: ir_variable::accept(ir_visitor*) (ir.h:428)
==2058== by 0xC46D0B5: visit_exec_list(exec_list*, ir_visitor*) (ir.cpp:1898)
==2058== by 0xC451D2F: glsl_to_nir (glsl_to_nir.cpp:162)
==2058== by 0xC0B5223: brw_create_nir (brw_program.c:79)
==2058== by 0xC0AAB67: brw_link_shader (brw_link.cpp:257)
==2058== by 0xC32FE3F: _mesa_glsl_link_shader (ir_to_mesa.cpp:3169)
==2058== by 0xC36C7ED: create_new_program(gl_context*, state_key*) (ff_fragment_shader.cpp:1127)
==2058== by 0xC36C8A6: _mesa_get_fixed_func_fragment_program (ff_fragment_shader.cpp:1157)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Instead we will re-generate them again on building.
v2: get rid of BUILT_SOURCES (Daniel, Emil)
v3: keep BUILT_SOURCES for egl/Makefile.am (Emil)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The hardware only supports 32-bit depth surfaces, but we can
enable TC-compat HTILE for 16-bit depth surfaces if no Z planes
are compressed.
The main benefit is to reduce the number of depth decompression
passes. Also, we don't need to implement DB->CB copies which is
fine.
This improves Serious Sam 2017 by +4%. Talos and F12017 are also
affected but I don't see a performance difference.
This also improves the shadowmapping Vulkan demo by 10-15%
(FPS is now similar to AMDVLK).
No CTS regressions on Polaris10.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Generated with
git grep -l nir_intrinsic_image | xargs \
sed -i 's/nir_intrinsic_image/nir_intrinsic_image_var/g'
and some manual fixing in nir_intrinsics.h
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Fixes
GTF-GLES3.gtf.GL3Tests.vertex_type_2_10_10_10_rev.vertex_type_2_10_10_10_rev_invalid2,
where we hadn't thrown a GL error as needed in the extension-disabled
case. We want to be exposing the extension anyway.
Our backend needs some sort of vertex position value to emit the scaled
viewport values and such. Fixes potential segfaults in
KHR-GLES3.copy_tex_image_conversions.required.cubemap_negx_cubemap_negx
Some equations of the CNL metrics started to use operators we haven't
defined yet, just add those.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With the introduction of asymmetric slices in CNL, we cannot rely on
the previous SUBSLICE_MASK getparam to tell userspace what subslices
are available.
We introduce a new uAPI in the kernel driver to report exactly what
part of the GPU are fused and require this to be available on Gen10+.
Prior generations can continue to rely on GETPARAM on older kernels.
This patch is quite a lot of code because we have to support lots of
different kernel versions, ranging from not providing any information
(for Haswell on 4.13 through 4.17), to being able to query through
GETPARAM (for gen8/9 on 4.13 through 4.17), to finally requiring 4.17
for Gen10+.
This change stores topology information in a unified way on
brw_context.topology from the various kernel APIs. And then generates
the appropriate values for the equations from that unified topology.
v2: Move slice/subslice masks fields to gen_device_info (Rafael)
v3: Add a gen_device_info_subslice_available() helper (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There are a couple of ways we can get the fusing information from the
kernel :
- Through DRM_I915_GETPARAM with the SLICE_MASK/SUBSLICE_MASK
parameters
- Through the new DRM_IOCTL_I915_QUERY by requesting the
DRM_I915_QUERY_TOPOLOGY_INFO
The second method is more accurate and also gives us the EUs fusing
masks. It's also a requirement for CNL as this platform has asymetric
subslices and the first method SUBSLICE_MASK value is assumed uniform
across slices.
v2: Change gen_device_info_update_from_masks() to generate topology
and call into gen_device_info_update_from_topology (Lionel/Ken)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We want to store values coming from the kernel but as a first step, we
can generate mask values out the numbers already stored in the
gen_device_info masks.
v2: Add a helper to set EU masks (Lionel/Ken)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will be reused to store values reported by the kernel. The main
use case will be for use as the input values of the metric sets
equations for the INTEL_performance_queries extension. By storing this
information in the gen_device_info we make this non GL specific so
this can be reused by Vulkan if we ever have an equivalent extension.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reverts commit cb2ddcefa5.
This causes clang to error out building C++ code. The plan is to fix the
build to work with clang, but in the mean time we'll just revert this
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Acked-by: Eric Engestrom <eric@engestrom.ch>
When registring configurations to the kernel for the first time, we
run into an issue where the id number is not properly set (we're using
the wrong variable). As a result when trying to use that id later on,
we get an error.
This issue manifest itself the first time you use frameretrace after
reboot, subsequent runs are fine.
Fixes: 27ee83eaf7 ("i965: perf: add support for userspace configurations")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Add a new struct kms_sw_plane which delegate a plane and use it
in place of sw_displaytarget. Multiple planes share same underlying
kms_sw_displaytarget.
v2:
- add more check for plane size (Tomasz)
v3:
- split from larger patch (Emil)
v4:
- no change from v3
v5:
- remove mapped field (Tomasz)
v6:
- remove change-id in commit message (Tomasz)
v7:
- add revision history in commit message (Emil)
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Lepton Wu <lepton@chromium.org>
If user calls map twice for kms_sw_displaytarget, the first mapped
buffer could get leaked. Instead of calling mmap every time, just
reuse previous mapping. Since user could map same displaytarget with
different flags, we have to keep two different pointers, one for rw
mapping and one for ro mapping. Also introduce reference count for
mapped buffer so we can unmap them at right time.
v2:
- avoid duplicated mapping and leaked mapping (Tomasz)
v3:
- split from larger patch (Emil)
v4:
- remove munmap from dt_destory (Emil)
v5:
- introduce reference count for mapping (Tomasz)
- add back munmap in dt_destory
v6:
- remove change-id in commit message (Tomasz)
v7:
- remove munmap from dt_destory again (Emil)
- add revision history in commit message (Emil)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
Signed-off-by: Lepton Wu <lepton@chromium.org>
As the other VC4 files do. Otherwise, it won't find nir_builder.h
v2: add path in source code rather changing autotools (Emil)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Disabling fast color clear makes fbo-clearmipmap test render correct
texture in base miplevel. Fast color clear is anyways disabled for
non-base miplevels.
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Rafael ran piglit with the test code enabled and saw no additional GPU
hangs.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On gen11+ AUX_HIZ is not a supported value for surfaces being
sampled by the 3D sampler.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We all know the platform names, and I don't want to update this list
continually.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
commit 03dd9a88b0 introduced per surface
queues, but the display_sync for swrast_commit_backbuffer remained on
the old queue. This is likely to break when dispatching the correct
queue at the top of function (which can't dispatch the sync callback
we're waiting for).
The easiest known reproduction case is running weston-subsurfaces under
weston --use-pixman
Signed-off-by: Derek Foreman <derekf@osg.samsung.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
We're trying to be -Wundef clean so that we can turn it on (and
eventually make it an error).
Note that the OMX code already used `#if ENABLE_ST_OMX_BELLAGIO` instead
of #ifdef; I could've changed these, but the point of -Wundef is to
catch typos, so we might as well make the change the right way.
Fixes: 83d4a5d5ae "st/omx/tizonia: Add H.264 decoder"
Fixes: b2f2236dc5 "st/omx/tizonia: Add H.264 encoder"
Fixes: c62cf1f165 "st/omx/tizonia/h264d: Add EGLImage support"
Cc: Gurkirpal Singh <gurkirpal204@gmail.com>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
and let's make sure `with_gallium_omx` is never 'auto' and can only be
one of [bellagio, tizonia, disabled].
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
The DriverFlags.NewArray bit is already set to NewDriverState in
_mesa_set_draw_vao since we have actually just above changed the VAOs
content. So this can be removed.
The _vbo_update_inputs is called by the vbo...recalculate_inputs being
set through the same mechanism as described above.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
At the current state, _vbo_update_inputs is called from
the draw callback if vbo...recalculate_inputs is set.
But that is now set of the _DrawVAO or its content or the
vertex program mode is changed.
So remove _vbo_update_inputs from the direct dlist draw path.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Now that setting vbo...recalculate_inputs also sets the
DriverFlags.NewArray bits into the NewDriverState setting that from
vbo_bind_arrays is redundant.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
This flag is now set when the actual Array._DrawVAO changes.
So setting this flag is redundant here.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Since arrays also handle the mapping of current values into the
disabled array slots, we need to tell the array update code that
this mapping has changed. Also mark only dirty if it has changed.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Both mean something very similar and are set at the same time now.
For that vbo module to be set from core mesa, implement a public vbo
module method to set that flag. In the longer term the flag should
vanish in favor of a driver flag of the appropriate driver.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Update the VAO internal state on Array._DrawVAO instead of
Array.VAO. Also the VAO internal state update gets triggered now
by a change of Array._DrawVAO instead of the _NEW_ARRAY state flag.
Also no driver looks at any VAO's NewArrays value from within
the Driver.UpdateState callback. So it should be safe to move
this update into the _mesa_set_draw_vao method.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Factor out that common call into the almost single place.
Remove the _mesa_set_drawing_arrays call from vbo_{exec,save}_draw code
paths as the function is now called through vbo_bind_arrays.
Prepare updating the list of struct gl_vertex_array entries via
calling _vbo_update_inputs for being pushed into those drivers that
finally work on that long list of gl_vertex_array pointers.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Move vbo draw functions into struct dd_function_table.
For now just wrap the underlying vbo functions.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
If view mask has only one bit set, view index is effectively a
constant, so doesn't need to be passed to the next stages, just always
set it.
Part of this was in the original patch that added
anv_nir_lower_multiview.c but disabled.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The view_index is encoded in the remainder of dividing instance id by
the number of views in the view mask (n). In the general case (handled
by the else clause), there is a need to map from 0..n-1 into the
number of the view being masked. For that a map is encoded.
In the case only the first n bits in the mask are set, the mapping is
trivial, 0..n-1 already represent what view is being referred to.
That case was in the original patch that added
anv_nir_lower_multiview.c but disabled.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Unfortunately TGSI doesn't record the type of the FS output like GLSL
does, but VC5's TLB writes depend on the output's base type. Just record
the type in the key at variant compile time when we've got a TGSI input
and then fix it up.
Fixes KHR-GLES3.packed_pixels.pbo_rectangle.rgba32i/ui and apparently a
GPU hang that breaks most tests that come after it.
The implementation is inspired by
lower_instructions_visitor::dfrexp_sig_to_arith.
This has been tested against the arb_gpu_shader_fp64/fs-frexp-dvec4
test using the ARB_gl_spirv branch.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
ring_name is "<class_name> + <instance_id>" (e.g. rcs0). So we need to
first compare the class name only, then get the instance id.
Without this, INSTDONE is not being decoded.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Shader-db runtime change avarage of five runs:
Before 125,77 seconds (+/- 0,09%)
After 124,48 seconds (+/- 0,07%)
Tested-by: Dieter Nützel <Dieter at nuetzel-hh.de>
Reviewed-by: Eric Anholt <eric at anholt.net>
Make a simple worklist by basically just wrapping u_vector.
This is intended used in nir_opt_dce to reduce the number of calls
to ralloc, as we are currenlty spamming ralloc quite bad. It should
also give better cache locality and much lower memory usage.
Tested-by: Dieter Nützel <Dieter at nuetzel-hh.de>
Reviewed-by: Eric Anholt <eric at anholt.net>
Different registers are used for execlist submission in gen11, so
also watch those. This code only watches element zero of the
submit queue, which is all aubdump currently writes.
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
The coordinate shaders may now have side effects in the form of transform
feedback.
Part of fixing
GTF-GLES3.gtf.GL3Tests.transform_feedback.transform_feedback_misc
Generalize the code for remove dead loops to also remove dead if
nodes. The conditions are the same in both cases, if the node (and
it's children) don't have side-effects AND the nodes after it don't
use the values produced by the node.
The only difference is when evaluating side effects: loops consider
only return jumps as a side-effect -- they can stop execution of nodes
after it; 'if' nodes outside loops should consider all kinds of
jumps (return, break, continue) since all of them can cause execution
of nodes after it to be skipped.
After this patch, empty ifs (those which both then and else blocks are
empty) will be removed by nir_opt_dead_cf.
It caused no change to shader-db, in part because the removal of empty
ifs is currently covered by nir_opt_peephole_select.
v2: Improve the identification of cases where break/continue can cause
side-effects. (Jason)
v3: Move code comment changes to a different patch. (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
On the CI family, firmware requires the destory command have to be the
last command in the IB, moving feedback command after destroy is causing
issues on CI cards, so we have to keep the previous logic that moves
destroy back to the last command.
But as the original issue fixed previously, with the newer family like Vega10,
feedback command have to be included inside of the task info command along
with destroy command.
Fixes: 6d74cb25("radeon/vce: move destroy command before feedback command")
Signed-off-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Cc: mesa-stable@lists.freedesktop.org
Use get_language_version to calculate default cl standard based on
device capabilities and -cl-std specified in build options.
v5; move dev_clc_version declaration from an earlier patch
v4: Squash the __OPENCL_VERSION__ and CLC language version patches
v3: (Jan) Allow device_version up to 2.2 while device_clc_version
only goes to 2.0
Use get_cl_version to calculate version instead
v2: Split out from the previous patch (Pierre)
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
CC: Jan Vesely <jan.vesely@rutgers.edu>
Used to calculate the default CLC language version based on the --cl-std in build args
and the device capabilities.
According to section 5.8.4.5 of the 2.0 spec, the CL C version is chosen by:
1) If you have -cl-std=CL1.1+ use the version specified
2) If not, use the highest 1.x version that the device supports
Curiously, there is no valid value for -cl-std=CL1.0
Validates requested cl-std against device_clc_version
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
v7: (Pierre) Split cl/clc versions into separate lists and
make more references const.
v6: (Pierre) Add more const and fix some whitespace
v5: (Aaron) Use a collection of cl versions instead of switch cases
Consolidates the string, numeric version, and clc langstandard::kind
v4: (Pierre) Split get_language_version addition and use into separate patches
Squash patches that add the helpers and validate the language standard
v3: Change device_version to device_clc_version
v2: (Pierre) Move create_compiler_instance changes to correct patch
to prevent temporary build breakage.
Convert version_str into unsigned and use it to find language version
Add build_error for unknown language version string
Whitespace fixes
The compiler doesn't notice that the condition for num_layers to be
undefined already defined it above (as our assert checked in a debug
build).
v2: Move the pair of assignments to one outside of the block.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This extension removes the restrictions on minDepth/maxDepth,
minDepthBounds/maxDepthBounds and VkClearDepthStencilValue::depth.
The following CTS tests now pass:
dEQP-VK.glsl.builtin_var.fragdepth.line_list_d32_sfloat_large_depth
dEQP-VK.glsl.builtin_var.fragdepth.point_list_d32_sfloat_large_depth
dEQP-VK.glsl.builtin_var.fragdepth.triangle_list_d32_sfloat_large_depth
dEQP-VK.draw.inverted_depth_ranges.nodepthclamp_depth_range_unrestricted
dEQP-VK.draw.inverted_depth_ranges.depthclamp_depth_range_unrestricted
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This was previously ignored.
Along with the virglrenderer patch, this fixes ~100 dEQP tests:
dEQP-GLES3.functional.texture.filtering.cube.*
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
As of earlier commit, the --header was made a hard requirement when
using --code.
Hence - annotate both as required and drop a few no longer needed
checks.
Fixes: 035cc7a12d ("i965: perf: reduce i965 binary size")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Autotools/android builds generate the header & code files in 2 steps,
but the code generation requires the name of the header file to
include it.
This change generates both files in one command.
Fixes: 035cc7a12d ("i965: perf: reduce i965 binary size")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The have-new-DRI3 codepaths would never actually properly trigger, since
there was a typo in configure.ac which broke the version check. This
went unnoticed but for an error in config.log if you looked closely
enough.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reported-by: Lukas F. Hartmann <lukas@mntmn.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Fixes: 7aeef2d4ef ("dri3: allow building against older xcb (v3)")
Cc: Dave Airlie <airlied@redhat.com>
On all Arm architectures (ARMv7 and below as 'arm', ARMv8 and above as
'aarch64'), only build swrast for DRI drivers. The only classic drivers
which could be used are r200 and NV20 cards, which seems unlikely enough
that it shouldn't be the default.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reported-by: Javier Jardón <jjardon@gnome.org>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Loop was accessing one more than bindingCount elements from
pBindings, accessing uninitialized memory.
Fixes: ddc4069122 ("anv: Implement VK_KHR_maintenance3")
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Performance metric numbers are calculated the following way :
- out of the 256 bytes long OA reports, we accumulate the deltas
into an array of uint64_t
- the equations' generated code reads the accumulated uint64_t
deltas and normalizes them for a particular platform
Our hardware is such that a number of counters in the OA reports
always return the same values (i.e. they're not programmable), and
they return the same values even across generations, and as a result a
number of equations are identical in different metric sets across
different generations.
Up to now we've kept the generated code of the equations separated in
different files (per generation/GT), and didn't apply any
factorization of the common equations. We could have make some
improvement by reusing equations within a given metrics file, but we
can go even further and reuse across generations (i.e. all files).
This change changes the code generation to emit a single file in which
we reuse equations emitted code based on the hash of equations'
strings.
Here are the savings in a meson build :
Before(.old)/after :
$ du -h ./build/src/mesa/drivers/dri/libmesa_dri_drivers.so ./build/src/mesa/drivers/dri/libmesa_dri_drivers.so.old
43M ./build/src/mesa/drivers/dri/libmesa_dri_drivers.so
47M ./build/src/mesa/drivers/dri/libmesa_dri_drivers.so.old
$ size build/src/mesa/drivers/dri/libmesa_dri_drivers.so build/src/mesa/drivers/dri/libmesa_dri_drivers.so.old
text data bss dec hex filename
13054002 409424 671856 14135282 d7aff2 build/src/mesa/drivers/dri/libmesa_dri_drivers.so
14550386 409552 671856 15631794 ee85b2 build/src/mesa/drivers/dri/libmesa_dri_drivers.so.old
As a side comment here is the size of the drivers if we remove all of
the metrics from the build :
$ du -sh build/src/mesa/drivers/dri/libmesa_dri_drivers.so
40M build/src/mesa/drivers/dri/libmesa_dri_drivers.so
v2: Fix an issue with hashing of counter equations (Lionel)
Build system rework (Emil)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (build system part)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The equation code computes a float (percentage) yet the return type
was an uint64_t.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
==15115== 48 bytes in 1 blocks are definitely lost in loss record 16 of 66
==15115== at 0x4C2EC15: realloc (vg_replace_malloc.c:785)
==15115== by 0x8602C3E: _mesa_reserve_parameter_storage (prog_parameter.c:212)
==15115== by 0x8602D1E: _mesa_add_parameter (prog_parameter.c:252)
==15115== by 0x86032C4: _mesa_add_sized_state_reference (prog_parameter.c:384)
==15115== by 0x8603324: _mesa_add_state_reference (prog_parameter.c:409)
Fixes: edded12376 "mesa: rework ParameterList to allow packing"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The previous commit to make DRI3 modifier support optional, breaks with
an updated server and old client.
Make sure we never set multibuffers_available unless we also support it
locally. Make sure we don't call stubs of new-DRI3 functions (or empty
branches) which will never succeed.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fixes: 7aeef2d4ef ("dri3: allow building against older xcb (v3)")
Noticed while passing by. Not sure if it impacts anything, but
likely to impact GFX9 more than anything else since we lower
inputs, outputs and locals there.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
i965 and gallium handle the atomic buffer index differently. It was
just by luck that the single piglit test for this was passing.
For gallium we use the atomic binding so that we match the handling
in st_bind_atomics().
On radeonsi this fixes the CTS test:
KHR-GL43.shader_storage_buffer_object.advanced-write-fragment
It also fixes tressfx hair rendering in Tomb Raider.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This will only ever be used by gallium drivers so it probably doesn't
belong in the nir toolkit. Also we want to pass it some non NIR
things in the following patch.
To avoid regressions we wrap the lowering calls that have been moved
to st_glsl_to_nir with a quick hack so that they are only called for
radeonsi, we will replace the hack with a check for uniform packing
in a following patch.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
These will be used in the following patch to allow copying directly
to the param list when packing is enabled.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Currently everything is padded to 4 components. Making the list
more flexible will allow us to do uniform packing.
V2 (suggestions from Nicolai):
- always pass existing calls to _mesa_add_parameter() true for padd_and_align
- fix bindless param value offsets
- remove left over wip logic from pad and align code
- zero out param value padding
- whitespace fix
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Will be used to determine whether to take packing code paths or not.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Our register spilling support is nice to have since vc4 couldn't at all,
but we're still very restricted due to needing to not spill during a TMU
operation, or during the last segment of the program (which would be nice
to spill a value of, when there's a long-lived value being passed through
with little modification from the start to the end).
We could do better by emitting unspills for the last-segment values just
before the last thrsw, since the last segment is probably not the maximum
interference area.
Fixes GTF uniform_buffer_object_arrays_of_all_valid_basic_types and 3
others.
We have some cases where in subpass we want the layer but having
it be 0 and loaded in the frag shader without the vertex shader
exporting it is fine.
So don't export the layer if we don't have a value to put in it.
Fixes: d4c74aed7a (radv/multiview: mark layer_input if we have input attachments.)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
If a shader only writes to an output via a constant initializer we
need to lower it before we call nir_remove_dead_variables so that
this pass sees the stores from the initializer and doesn't kill the
output.
Fixes test failures in new work-in-progress CTS tests:
dEQP-VK.spirv_assembly.instruction.graphics.variable_init.output.float
This is ported from anv:
99b57daf4a anv/pipeline: lower constant initializers on output variables earlier
from Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
For multiview we need to emit a number of sequential queries
depending on the view mask.
This avoids dEQP-VK.multiview.queries.15 waiting forever
on the CPU for query results that are never coming.
We only really want to emit one query,
and the rest should be blank (amdvlk does the same),
so we emit begin/end pairs for all the others except
the first query.
v2: fix tests
v3: split out patch.
Fixes: dEQP-VK.multiview.queries*
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This just splits out the begin/end query hw emissions,
it makes it easier to add multiview support for queries.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Since the intermediate states of active_stages are not used,
i.e. active_stages is read only after all stages were set into it,
just set its value before compiling the shaders.
This will allow to conditionally run certain passes based on what
other shaders are being used, e.g. a certain pass might only be
applicable to the vertex shader if there's no geometry or tessellation
shader being used.
v2: Use vk_to_mesa_shader_stage. (Lionel)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This change allows the disk shader cache to work with programs loaded
with ProgramBinary. Drivers check for LINKING_SKIPPED, and if set,
then they try to use the shader cache.
Since the program loaded by ProgramBinary is similar to loading the
shader from the disk cache, this is probably more appropriate.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Currently, we only look in the disk shader cache if we see that the
shader program is in the cache during the link step.
If the shader cache entry isn't found during the program link, there
are still some (fairly unlikely) scenarios where later it might be
useful to search the cache for gen binary programs.
1. If the cache evicts the serialized glsl cache, there might still be
valid gen program entries in the disk cache.
2. If two applications are running in parallel, then it is possible
that one may write out the cached gen program item which the other
application can then make use of.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
When the shader cache is used, this can be generated. In fact, the
shader cache uses this sha1 to lookup the serialized GL shader
program.
If a GL shader program is restored with ProgramBinary, the shaders are
not available, and therefore the correct sha1 cannot be generated. If
this is restored, then we can use the shader cache to restore the
binary programs to the program that was loaded with ProgramBinary.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The fragment shader was trying to read this, but nothing
was exporting it from the vertex shader. This handles
it like the prim id export.
Fixes:
dEQP-VK.multiview.secondary_cmd_buffer.*
dEQP-VK.multiview.index.fragment_shader.*
v1.1: updated to use 0x1 (Samuel)
Fixes: e3265c10c8 (radv: Implement multiview draws.)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
An incorrect formula was used to compute bound_samplers_mask_vs.
Since s is above always 8 for vs and the variable is encoded on 8 bits,
it was always 0.
This resulted in commiting the samplers every call when
there was at least one texture read in the vs shader.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
We only get VK_SUCCESS if it was initialized, but apparently my compiler
doesn't track that far.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We only have a cfg != NULL if we went through one of the paths that set
it, but my compiler doesn't figure that out.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 6411defdcd ("intel/cs: Re-run final NIR optimizations for each SIMD size")
This is a legitimate warning: if anv's blorp_alloc_binding_table() throws
an error from anv_cmd_buffer_alloc_blorp_binding_table(), we silently
continue to use this undefined value. The rest of this code doesn't seem
very allocation-error-proof, though, either.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Previously the unit test filled out a minimal devinfo struct. A previous
patch caused the test to begin assert failing because the devinfo was
not complete. Avoid this by using the real mechanism to create devinfo.
Note that we have to drop icl from the table, since we now rely on the
name -> PCI ID translation done by gen_device_name_to_pci_device_id(),
and ICL's PCI IDs are not upstream yet.
Fixes: f89e735719 ("intel/compiler: Check for unsupported register sizes.")
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
I'm not sure everyone wants to be updating their dri3 in a forced
march setting, this allows a nicer approach, esp when you want
to build on distro that aren't brand new.
I'm sure there are plenty of ways this patch could be cleaner,
and I've also not built it against an updated dri3.
For meson I've just left it alone, since if you are using meson
you probably don't mind xcb updates, and if you are using meson
you can fix this better than me.
v3: just don't put a version in for dri3/present without
modifiers, should allow building with 1.11 as well
(feel free to supply meson followups)
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
We're already including it in the meson build. This fixes build issues
on systems which have a drm_fourcc.h that doesn't have modifiers.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Implement the eglSwapinterval for Android platform to
enable the async mode for some GFX benchmarks such as
Daimler C217, CityBench.
Results of the dEQP-EGL.*swap_interval tests
'dEQP-EGL.functional.query_config.get_config_attrib.max_swap_interval'..
'dEQP-EGL.functional.query_config.get_config_attrib.min_swap_interval'..
'dEQP-EGL.functional.choose_config.simple.selection_only.max_swap_interval'..
'dEQP-EGL.functional.choose_config.simple.selection_only.min_swap_interval'..
'dEQP-EGL.functional.choose_config.simple.selection_and_sort.max_swap_interval'..
'dEQP-EGL.functional.choose_config.simple.selection_and_sort.min_swap_interval'..
'dEQP-EGL.functional.negative_api.swap_interval'..
Test run totals:
Passed: 7/7 (100.0%)
Failed: 0/7 (0.0%)
Not supported: 0/7 (0.0%)
Warnings: 0/7 (0.0%)
Signed-off-by: Zhongmin Wu <zhongmin.wu@intel.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
[Emil Velikov: polish inline comment, add dEQP stats, s/dpy/disp/]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Instead of indirectly pulling the wayland headers everywhere, use
forward declarations and #include only as needed.
Should effectively fix build errors like the following:
make[5]: Entering directory
'/.../src/gallium/state_trackers/omx/tizonia'
CC h264dprc.lo
In file included from h264dprc.c:45:0:
.../src/egl/drivers/dri2/egl_dri2.h:47:10: fatal error:
wayland/wayland-egl/wayland-egl-backend.h: No such file or directory
#include "wayland/wayland-egl/wayland-egl-backend.h"
Cc: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Andy Furniss <adf.lists@gmail.com>
During development the version was bumped, yet the comment did not get
an update.
Fixes: c80c08e226 ("vulkan/wsi/x11: Add support for DRI3 v1.2")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
According to OpenGL ES 3.2, section 8.6, CopyTexSubImage* should return
an INVALID_OPERATION if the internalformat of the texture is RGB9_E5.
This fixes
dEQP-GLES31.functional.debug.negative_coverage.*.copytexsubimage2d_texture_internalformat.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This pass moves load UBO operations just before their first use,
loosely based on nir_opt_move_comparisons.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This drops one of the geometry specific user sgprs,
we can work this out at compile time.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This moves the lds_size calcs into the shader so we have all
the size stuff in one file.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This drops the now unneeded scanning and results in favour
of the ones in the info.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Instead of recalculating the value, use the shader calculated value.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This gathers the ls outputs written by the vertex shader,
and the tcs outputs, these are needed to calculate certain
tcs parameters.
These have to be separate for combined gfx9 shaders.
This is a bit pessimistic compared to the nir pass,
as we don't work out the individual slots for tcs outputs,
but I actually thing it should be fine to just mark the whole
thing used here.
v2: move to radv, handle clip dist (Samuel),
handle compacts and patchs properly.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just moves this function to an inline so the shader_info
pass can use it.
v2: use inline (Samuel)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
[123/227] Compiling C object 'src/mesa/drivers/dri/i965/libi965_gen110@sta/genX_blorp_exec.c.o'.
../src/mesa/drivers/dri/i965/genX_blorp_exec.c:99:1: warning: ‘blorp_get_surface_base_address’ defined but not used [-Wunused-function]
blorp_get_surface_base_address(struct blorp_batch *batch)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
[84/227] Compiling C object 'src/intel/vulkan/libanv_gen110@sta/genX_blorp_exec.c.o'.
../src/intel/vulkan/genX_blorp_exec.c:68:1: warning: ‘blorp_get_surface_base_address’ defined but not used [-Wunused-function]
blorp_get_surface_base_address(struct blorp_batch *batch)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
If AMD_shader_info or RADV_TRACE_FILE is used we might need to
keep trace of LLVM IR.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Just to be sure all options are enabled when trying to generate
a hang report.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
So now, during spirv_to_nir, it uses the capability instead of the
extension. Note that we are really doing here is treating
SPV_AMD_gcn_shader as other supported extensions. SPV_AMD_gcn_shader
is not the first SPV extension supported. For example, the capability
draw_parameters infers if the extension SPV_KHR_shader_draw_parameters
is supported or not.
This could be seen as counter-intuitive, and that it would be easier
to define which extensions are supported, and based our checks on
that, but we need to take into account that some capabilities are
optional from core, and others came from new extensions.
Also this commit would make the implementation of ARB_spirv_extensions
easier.
v2: AMD_gcn_shader capability renamed to gcn_shader (Daniel Schürmann)
Reviewed-by: Daniel Schürmann <daniel.schuermann@campus.tu-berlin.de>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We don't need anymore the source and destination's data type, just
their bitsize.
v2:
- Use glsl_get_bit_size () instead (Jason).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
There are some SPIRV opcodes (like UConvert and SConvert) have some
expectations of the output that doesn't depend on the operands
data type. Generalize the solution of all of them.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
sched_yield is used but the include reference on Darwin is missing. This patch
conditionally guards on Darwin/OSX to import sched.h first.
Reviewed-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
The implementation for bootstrapping SWR on Darwin targets is based on the Linux version.
Instead of reading the output of /proc/cpuinfo, sysctlbyname is used to determine the
physical identifiers, processor identifiers, core counts and thread-processor affinities.
With this patch, it is possible to use SWR as an alternate renderer on OSX to softpipe and
llvmpipe.
Reviewed-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
This fixes an illegal command buffer on the host seen with
piglit arb_internalformat_query2-max-dimensions
Signed-off-by: Dave Airlie <airlied@redhat.com>
Add them as usable for textures, so they can be used by
Wayland drm in 10 bpc mode and for X11 compositing under
GLX and EGL. We need these formats to be supported at
least for sampling, otherwise GLX_texture_from_pixmap
and the equivalent EGL image extension won't work with
X11 drawables of depth 30 and just display an all black
window.
Do not expose these formats as renderable, and thereby
not as a fbconfig/EGLConfig/Visual, as NVidia hw does
not support 10 bpc unorm formats without alpha channel.
Tested under X11 + GLX/EGL + DRI2/DRI3 for compositing,
and under Wayland+Weston drm backend with a Tesla and
Pascal gpu.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Walking the whole hash table, inserting entries by hashing them first
is just a really bad idea. We can simply memcpy the whole thing.
V2: Remove leftover creation of acp in two places
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
V2: Don't rzalloc; we are about to rewrite the whole thing (Vladislav)
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This fixes the following build breakage:
make[5]: Entering directory
'/mnt/sdc1/Gits/mesa/src/gallium/state_trackers/omx/tizonia'
CC h264dprc.lo
In file included from h264dprc.c:45:0:
../../../../../src/egl/drivers/dri2/egl_dri2.h:47:10: fatal error:
wayland/wayland-egl/wayland-egl-backend.h: No such file or directory
#include "wayland/wayland-egl/wayland-egl-backend.h"
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
meson got the same fix in 7598dedfde.
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
OpenCL kernels also have int8/uint8.
v2: remove changes in nir_search as Jason posted a patch for that
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
From the spec:
"When copying between compressed and uncompressed formats the
extent members represent the texel dimensions of the source
image and not the destination."
However, as per 7b890a36, we must still use the destination image type
when clamping the extent so that we copy the correct number of layers
for 2D to 3D copies.
Fixes: 7b890a36 "radv: Fix vkCmdCopyImage for 2d slices into 3d Images"
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This fixes CTS:
dEQP-VK.api.device_init.create_device_queue2_unmatched_flags
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@gmail.com>
The code to handle mat multiplication by a scalar tries to pick either
imul or fmul depending on whether the matrix is float or integer.
However it was doing this by checking whether the base type is float.
This was making it choose the int path for doubles (and presumably
float16s).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
af5f2322d0 addressed this for extension commands, but the spec mandates
this behavior also for core API commands. From the Vulkan spec,
Table 2. vkGetDeviceProcAddr behavior:
device pname return
----------------------------------------------------------
(..)
device core device-level command fp
(...)
See that it specifically states "device-level".
Since the vk.xml file doesn't state if core commands are instance or
device level, we identify device level commands as the ones that take a
VkDevice, VkQueue or VkCommandBuffer as their first parameter.
Fixes test failures in new work-in-progress CTS tests.
Also see the public issue:
https://github.com/KhronosGroup/Vulkan-LoaderAndValidationLayers/issues/2323
v2:
- Include reference to github issue (Emil)
- Rebased on top of Vulkan 1.1 changes.
v3:
- Remove the not in the condition and switch the then/else cases (Jason)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The spec is pretty clear that this can be 0, and that it operates
as a reserved binding.
Fixes:
dEQP-VK.binding_model.descriptor_update.empty_descriptor.uniform_buffer
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
sched_yield is used but the include reference on Darwin is missing. This patch
conditionally guards on Darwin/OSX to import sched.h first.
Reviewed-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
Signed-off-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
The implementation for bootstrapping SWR on Darwin targets is based on the Linux version.
Instead of reading the output of /proc/cpuinfo, sysctlbyname is used to determine the
physical identifiers, processor identifiers, core counts and thread-processor affinities.
With this patch, it is possible to use SWR as an alternate renderer on OSX to softpipe and
llvmpipe.
Reviewed-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
Signed-off-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
If a src was referencing the same temp as the dst, the per-component
copy code didn't work.
e.g.
cndge r0.xy, r0.xx, |r2|, r3
got expanded into
mov r12.x, |r2|
cndge r0.x, r0.x, r12, r3
mov r12.y, |r2|
cndge r0.y, r0.x, r12, r3
hence for the second cndge r0.x was mistakenly the previous cndge result.
Fix this by doing all the movs first, so there's no bogus alu.last in between.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=102905
Tested-by: <iive@yahoo.com>
Reviewed-by: Dave Airlie <airlied@gmail.com>
I was going to have to add another parameter to this monster,
so we should just pass the nir_variable in, I can't find any
reason this would be a bad idea.
This needed for the next fix.
Fixes: 94f9591995 (radv/ac: add support for TCS/TES inputs/outputs.)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This seems more correct to me, since if we have an array
of floats they'll be vec4 aligned, and if we do af[2],
we want the const index to increase by 2 slots in the non
compact case.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105464
Fixes: 94f9591995 (radv/ac: add support for TCS/TES inputs/outputs.)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Meson is pretty well tested and works in most configurations now, so we
can remove the warning about it being unsuited for actual use.
It's also worth documenting that meson 0.42.0 or greater is required.
v2: - Minor rewording of supported platforms as suggested by Emil
- Add two missing tags as reported by xmllint --html
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
This is based heavily on 97f10934ed, "ac/nir: Add vote_ieq/vote_feq
lowering pass." from Bas Nieuwenhuizen. This version is a bit more
general since it's in common code. It also properly handles NaN due to
not flipping the comparison for floats.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Meson's compiler.has_header is completely useless, it only checks that a
header exists, not whether it's usable. This creates problems if a
header contains a conditional #error declaration, like so:
> #if __x86_64__
> # error "Doesn't work with x86_64!"
> #endif
Compiler.has_header will return true in this case, even when compiling
for x86_64. This is useless.
Instead, we'll do a compile check so that any #error declarations will
be treated as errors, and compilation will work.
Fixes compilation on x32 architecture.
Gentoo Bugzilla: https://bugs.gentoo.org/show_bug.cgi?id=649746
meson bug: https://github.com/mesonbuild/meson/issues/2246
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
This is a terrible hack but it fixes CTS regressions. It's still
incredibly unclear exactly what is going wrong in the hardware to cause
this to be an issue so this isn't a good fix by any means. However, it
does fix tests so there is that.
Fixes: fb0e9b5197 "i965: Track the depth and render caches separately"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103746
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
It would be nice to support perfmon with simulator, and might be a useful
tool for regression testing performance (since the simulator would be
deterministic).
Now the "ac/nir" prefix will really be the shared code between
RadeonSI and RADV, that might avoid confusions in the future.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Calling __builtin_frame_address with a nonzero argument is unsafe
but is sometimes done for debugging purposes. Since this code is
part of some debug util code I'm assuming that is the case here
and using GCC pragma to silence the warning.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
With some sets of optimization flags, GCC will generate warnings like
this:
src/mesa/main/texparam.c:2327:27: warning: ‘*((void *)&ip+12)’ may be used uninitialized in this function [-Wmaybe-uninitialized]
params[3] = ip[3];
~~^~~
src/mesa/main/texparam.c:2320:16: note: ‘*((void *)&ip+12)’ was declared here
GLint ip[4];
^~
ip is not initialized in cases where a GL error is generated. In these
cases, we should *not* write to the user's buffer, so this is actually a
bug. I wrote a new piglit test gl-3.0-texparameteri to show this bug.
I suspect that Coverity also detected this, but the scan site is
currently down.
Fixes: c2c507786 "main: Added entry points for glGetTextureParameteriv, Iiv, and Iuiv."
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is similar to commit 90633079. libtool links first to system directories
instead of custom locations of libdrm on relinking. Since a more recent libdrm
version than the one provided by the system is often needed when compiling
mesa, make sure this works by putting libdrm in front.
See also: https://bugs.freedesktop.org/show_bug.cgi?id=100259
Signed-off-by: Roman Gilg <subdiff@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The tool accepts the input and output files as arguments.
There's no need for the redirection.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
The tool accepts the input and output files as arguments.
There's no need for the redirection.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
The tool accepts the input and output files as arguments.
There's no need for the redirection.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
The extension was never implemented. Quick search suggests:
- no actual users (on my Arch setup)
- the Nvidia driver does not implement the extension
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
The extension was never implemented. Quick search suggests:
- no actual users (on my Arch setup)
- the Nvidia driver does not implement the extension
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
The extension was never implemented. Quick search suggests:
- no actual users (on my Arch setup)
- the Nvidia driver does not implement the extension
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
The extension was never implemented. Quick search suggests:
- no actual users (on my Arch setup)
- the Nvidia driver does not implement the extension
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
The extension was never implemented. Quick search suggests:
- no actual users (on my Arch setup)
- the Nvidia driver does not implement the extension
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
The extension was never implemented. Quick search suggests:
- no actual users (on my Arch setup)
- the Nvidia driver does not implement the extension
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
The vulkan API is not ideal as it does not allow us have a
shared limit.
Feral needs 15+6 for one of their games, and I'm not a fan
of overcommitting the limits, so increase the number of
dynamic uniform buffers to 16.
CC: <mesa-stable@lists.freedesktop.org>
CC: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This fixes a memory trashing crash (not the test) seen with
dEQP-GLES3.stress.draw.unaligned_data.random.203
on virgl.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is ported from the sb backend, there are some issues with
evergreen stacks on the boundary between entries and ALU_PUSH_BEFORE
instructions.
Whenever we are going to use a push before, we check the stack
usage and if we have to use the workaround, then we switch to
a separate push.
I noticed this problem dealing with some of the soft fp64 shaders,
in nosb mode, they are quite stack happy.
This fixes all the glitches and inconsistencies I've seen with them
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Tested-by: Elie Tournier <elie.tournier@collabora.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fixes following dependency problem:
Native dependency xcb-dri3 found: NO found '1.11' but need: '>= 1.13'
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Fixes: c80c08e226 ("vulkan/wsi/x11: Add support for DRI3 v1.2")
Instead of keeping a copy of the vertex array content in
struct gl_vertex_array only keep pointers to the first order
information originaly in the VAO.
For that represent the current values by struct gl_array_attributes
and struct gl_vertex_buffer_binding.
v2: Change comments.
Remove gl... prefix from variables except in the i965 directory where
it was like that before. Reindent because of that.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
The logic would not work correctly for line lengths smaller than 1.0,
even a degenerated line with length 0 would still produce a fragment
with anyhwere between alpha 0.0 and 0.5.
Reviewed-by: Brian Paul <brianp@vmware.com>
Ken suggested that we might be underallocating scratch space on HD
400. Allocating scratch space as though there was actually 8 EUs
seems to help with a GPU hang seen on synmark CSDof.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested by our OpenCL team.
Fixes: 9c499e6759 "st/mesa: don't invoke st_finalize_texture & st_convert_sampler for TBOs"
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Like the r600 paths to use other custom states, we pass in a couple of
parameters to customize the innards of the blitter. It's up to the caller
to wrap other state necessary for its shaders (for example, constant
buffers for the uniforms the shader uses).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
If PresentCompleteNotify event says the pixmap was presented
with mode PresentCompleteModeSuboptimalCopy, it means the pixmap
could possibly have been flipped instead if allocated with a
different format/modifier.
Signed-off-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Add support for DRI3 v1.1, which allows pixmaps to be backed by
multi-planar buffers, or those with format modifiers. This is both
for allocating render buffers, as well as EGLImage imports from a
native pixmap (EGL_NATIVE_PIXMAP_KHR).
Signed-off-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
When it is detected that a window could have been flipped
but has been copied because of suboptimal format/modifier.
The Vulkan client should then re-create the swapchain.
Signed-off-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Adds support for multiple planes and buffer modifiers.
v4: Rename "has_dri3_v1_1" to "has_dri3_modifiers"
v12: Multi-planar/modifier support is now DRI3 v1.2; also update release
versions
We were dropping the upper 32 bits, which caused assertion failures in
some compute shader piglit tests with radeonsi since the commit below.
Fixes: 752e969703 ("compiler: Add two new system values for subgroups")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is useful for archrast data collection. This greatly speeds up the
post processing script since there is significantly less events generated.
Finally, this is a simpler option to communicate to users than having
them directly adjust MAX_PRIMS_PER_DRAW and MAX_TESS_PRIMS_PER_DRAW.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
In the API event handler we want to share information between the core
layer and the API. Specifically, around associating various ids with
different kinds of events. For example, associate render pass id with
draw ids, or command buffer ids with draw ids.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Vivante hardware supports this just fine. There is no reason why this shouldn't
be advertised as a valid combination.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
This allows the driver to be built on a make distcheck and makes sure
that it properly builds when a distribution tarball is made.
Suggested-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Tegra K1 and later use a GPU that can be driven by the Nouveau driver.
But the GPU is a pure render node and has no display engine, hence the
scanout needs to happen on the Tegra display hardware. The GPU and the
display engine each have a separate DRM device node exposed by the
kernel.
To make the setup appear as a single device, this driver instantiates
a Nouveau screen with each instance of a Tegra screen and forwards GPU
requests to the Nouveau screen. For purposes of scanout it will import
buffers created on the GPU into the display driver. Handles that
userspace requests are those of the display driver so that they can be
used to create framebuffers.
This has been tested with some GBM test programs, as well as kmscube and
weston. All of those run without modifications, but I'm sure there is a
lot that can be improved.
Some fixes contributed by Hector Martin <marcan@marcan.st>.
Changes in v2:
- duplicate file descriptor in winsys to avoid potential issues
- require nouveau when building the tegra driver
- check for nouveau driver name on render node
- remove unneeded dependency on libdrm_tegra
- remove zombie references to libudev
- add missing headers to C_SOURCES variable
- drop unneeded tegra/ prefix for includes
- open device files with O_CLOEXEC
- update copyrights
Changes in v3:
- properly unwrap resources in ->resource_copy_region()
- support vertex buffers passed by user pointer
- allocate custom stream and const uploader
- silence error message on pre-Tegra124
- support X without explicit PRIME
Changes in v4:
- ship Meson build files in distribution tarball
- drop duplicate driver_tegra dependency
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
This adds support for framebuffer modifiers to Nouveau. This will be
used by the Tegra driver to share metadata about the format of buffers
(such as the tiling mode or compression).
Changes in v2:
- remove unused parameters to nouveau_buffer_create()
- move format modifier query code to nvc0 backend
- restrict format modifiers to 2D textures
- implement ->query_dmabuf_modifiers()
Changes in v4:
- add UAPI include path on meson builds
Changes in v5:
- remove unnecessary includes
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Add a new macro that can be used to extract the tiling mode from a
tile_mode value. This is will be used to determine the number of GOBs
used in block linear mode.
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Thierry Reding <treding@nvidia.com>
The existing format modifier definitions were merged prematurely, and
recent work has unveiled that the definitions are suboptimal in several
ways:
- The format specifiers, except for one, are not Tegra specific, but
the names don't reflect that.
- The number space is split into two, reserving 32 bits for some
"parameter" which most of the modifiers are not going to have.
- Symbolic names for the modifiers are not using the standard
DRM_FORMAT_MOD_* prefix, which makes them awkward to use.
- The vendor prefix NV is somewhat ambiguous.
Fortunately, nobody's started using these modifiers, so we can still fix
the above issues. Do so by using the standard prefix. Also, remove TEGRA
from the name of those modifiers that exist on NVIDIA GPUs as well. In
case of the block linear modifiers, make the "parameter" smaller (4
bits, though only 6 values are valid) and don't let that leak into any
of the other modifiers.
Finally, also use the more canonical NVIDIA instead of the ambiguous NV
prefix.
This is based on commit 268892cb63a822315921a8dab48ac3e4abf7dd03 from
Linux v4.16-rc1.
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Avoid a compiler warnings when the val parameter is an expression.
This is based on commit 5843f4e02fbe86a59981e35adc6cabebee46fdc0 from
Linux v4.16-rc1.
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Bit 0 enables VSRC0 (R in low bits, G high) and bit 2 enables
VSRC1 (B in low bits, A high).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
If the api version is too low, the loader clamps the application
requested version to the advertized version, which messes with
which extensions are enabled.
Reviewed-by: Dave Airlie <airlied@redhat.com>
If the previously seen instruction generates more fields than the new
instruction, still allow CSE to happen. This doesn't do much, but it
also enables a couple more shaders in the next patch. It helped quite a
bit in another change series that I have (at least for now) abandoned.
v2: Add some extra comentary about the parameters to instructions_match.
Suggested by Ken.
No changes on Skylake, Broadwell, Iron Lake or GM45.
Ivy Bridge and Haswell had similar results. (Ivy Bridge shown)
total instructions in shared programs: 11780295 -> 11780294 (<.01%)
instructions in affected programs: 302 -> 301 (-0.33%)
helped: 1
HURT: 0
total cycles in shared programs: 257308315 -> 257308313 (<.01%)
cycles in affected programs: 2074 -> 2072 (-0.10%)
helped: 1
HURT: 0
Sandy Bridge
total instructions in shared programs: 10506687 -> 10506686 (<.01%)
instructions in affected programs: 335 -> 334 (-0.30%)
helped: 1
HURT: 0
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2 (idr): Don't allow CSEL with a non-float src2.
v3 (idr): Add CSEL to fs_inst::flags_written. Suggested by Matt.
v4 (idr): Only set BRW_ALIGN_16 on Gen < 10 (suggested by Matt). Don't
reset the access mode afterwards (suggested by Samuel and Matt). Add
support for CSEL not modifying the flags to more places (requested by
Matt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [v3]
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is only required with the latest libdrm.
This fixes 32-bit support with high addresses.
(and possibly 64-bit support too because the high bits need to be masked out)
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
It should be possible to build EGL without GLX, but the meson build
currently doesn't allow that because it too tightly couples glx and dri.
This patch eases dri and glx apart, so that EGL without GLX can be
built.
CC: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
The meson build file for Apple GLX is not listed in the EXTRA_DIST make
variable and therefore isn't shipped as part of the release tarball, so
meson builds from the tarball will fail.
Add the file to EXTRA_DIST to ensure it is included in the tarball.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Null exports should only be needed when no other exports are
emitted. This removes a bunch of 'exp null off, off, off, off done vm'.
Affected games are Dota 2 and Wolfenstein 2, not sure if that
really helps, but code size is decreasing there.
Polaris10:
Totals from affected shaders:
SGPRS: 8216 -> 8216 (0.00 %)
VGPRS: 7072 -> 7072 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 454968 -> 453896 (-0.24 %) bytes
Max Waves: 772 -> 772 (0.00 %)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
With this extension enabled and a server GLX implementation that actually
honors it, Window movement lags considerably on gnome-shell/vmware, so
disable it by default.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Deepak Rawat <drawat@vmware.com>
Drivers on virtual hardware don't want to expose this extension to
GLX compositors, similarly to GLX_OML_sync_control, since that significantly
increases latency.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Deepak Rawat <drawat@vmware.com>
These helpers insert the basic block in the same order as they
appear in NIR making it easier to follow LLVM IR dumps. The helpers
also insert more useful labels onto the blocks.
TGSI use the line number of the corresponding opcode in the TGSI
dump as the label id, here we use the corresponding block index
from NIR.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This re-adds the auto option for omx, without it we default to tizonia
and the build fails almost immediately, this is especially obnoxious
those building a driver that doesn't support the OMX state tracker to
begin with.
v2: - Only define OMX_FOO for auto cases if the dependencies are found.
This fixes building tizonia with auto (Julien, Eric)
CC: Gurkirpal Singh <gurkirpal204@gmail.com>
Fixes: bb5e27fab6
("st/omx/bellagio: Rename st and target directories")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Jon Turney <jon.turney@dronecode.org.uk> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Julien Isorce <julien.isorce@gmail.com>
Tested-by: Karol Herbst <kherbst@redhat.com> (v1)
In contrast to non-aa, where stippling is based on either dx or dy
(depending on if it's a x or y major line), stippling is based on
actual distance with smooth lines, so adjust for this.
(It looks like there's some minor artifacts with mesa demos
line-sample and stippling, it looks like the line endpoints
aren't quite right with aa + stippling - maybe due to the
integer math in the stipple stage, but I can't quite pinpoint it.)
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The motivation actually was to get rid of the additional tex
instruction, since that requires the draw fallback code to intercept
all sampler / view calls (even if the fallback is never hit).
Basically, the idea is to use coverage of the pixel to calculate
the alpha value, and coverage is simply based on the distance
to the center of the line (in both line direction, which is useful
for wide lines, as well as perpendicular to the line).
This is much closer to what hw supporting this natively actually does.
It also fixes an issue with line width not quite being correct, as
well as endpoints getting stretched too far (in line direction) with
wide lines, which is apparent with mesa demo line-sample.
(For llvmpipe, it would probably make sense to do something like this
directly when drawing lines, since rendering two tris is twice as
expensive as a line, but it would need some changes with state
management.)
Since we're no longer relying on mipmapping to get the alpha value,
we also don't need to draw 3 rects (6 tris), one is sufficient.
There's still issues (as before):
- quite sure it's not correct without half_pixel_center, but can't test
this with GL.
- aaline + line stipple is incorrect (evident with line-sample demo).
Looking at the spec the stipple pattern should actually be based on
distance (not just dx or dy for x/y major lines as without aa).
- outputs (other than pos + the one used for line aa) should be
reinterpolated since we actually increase line length by half a pixel
(but there's no tests which would care).
v2: simplify the math (should be equivalent), don't need immediate
v3: use float versions of atan2,cos,sin, minor cleanups
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The old vote_eq implementation supported only booleans, but now
we have to support arbitrary values, so use the read_first_invocation
intrinsic + ballot.
I took this as an opportunity to figure out how easy it was to do this
in nir instead of in the nir_to_llvm pass, and it actually turned out
pretty okay IMO. Only creating the pass is some extra code.
Reviewed-by: Dave Airlie <airlied@redhat.com>
This requires us to bump the subgroup size to 32 for all shader stages
because Vulkan requires that to be a physical device query.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
NIR has code to lower these away for us but we can do significantly
better in many cases with register regioning and SIMD4x2.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This commit adds a helper to the builder for emitting "scan" operations.
Given a binary operation #, a scan takes the vector [a0, a1, ..., aN]
and returns the vector [a0, a0 # a1, ..., a0 # a1 # ... # aN] where each
channel contains the combination of all previous channels. The sequence
of instructions to perform the scan is fairly optimal; a 16-wide scan on
a 32-bit type is only 6 instructions. The subgroup scan and reduction
operations will be implemented in terms of this.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The SPIR-V extension wants us to be able to do an AllEqual on any vector
or scalar type. This has two implications:
1) We need to be able to handle vectors so we switch the vote_eq
intrinsics to be vectorized intrinsics.
2) We need to handle floats which have different behavior with respect
to +-0, NaN, etc. than the integer variant so we need two variants.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Someone can make the lowering optional later if they want something
different for their hardware.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
From the Vulkan 1.1 spec:
"Vulkan 1.0 implementations were required to return
VK_ERROR_INCOMPATIBLE_DRIVER if apiVersion was larger than 1.0.
Implementations that support Vulkan 1.1 or later must not return
VK_ERROR_INCOMPATIBLE_DRIVER for any value of apiVersion."
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This is not strictly necessary since users should not be requesting any
flags that are not valid for the list of enabled features requested and
we already fail if they attempt to use an unsupported feature, however
it is an easy to implement sanity check that would help developes realize
that they are doing things wrong, so we might as well do it.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
From the Vulkan 1.1 spec, VkDeviceQueueInfo2 structure:
"The queue returned by vkGetDeviceQueue2 must have the same flags value
from this structure as that used at device creation time in a
VkDeviceQueueCreateInfo instance. If no matching flags were specified
at device creation time then pQueue will return VK_NULL_HANDLE."
For us this means no flags at all since we don't support any.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This belongs to the protected memory feature but there's nothing about
it that's specific to protected memory.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This is part of the device groups extension/feature but it's a decent
chunk of work in its own right so it's worth breaking into its own
patch. The mechanism we use is fairly straightforward: we just push the
base work group id into the shader and add it to the work group id we
get from dispatch.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This advertises the VK_KHR_shader_draw_parameters functionality as a
"core optimal feature" in Vulkan 1.1.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This requires us to rename any Vulkan API entrypoints which became core
in 1.1 to no longer have the KHR suffix.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
In this case, we say an entrypoint is supported if ANY of the extensions
is supported. This is because, in the XML, entrypoints don't require
extensions so much as extensions require entrypoints.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
The original string map assumed that the mapping from strings to
entrypoints was a bijection. This will not be true the moment we
add entrypoint aliasing. This reworks things to be an arbitrary map
from strings to non-negative signed integers. The old one also had a
potential bug if we ever had a hash collision because it didn't do the
strcmp inside the lookup loop. While we're at it, we break things out
into a helpful class.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Our previous handling of barriers always used the big hammer and didn't
correctly emit memory barriers when specified along with a control
barrier. This commit completely reworks the way we emit barriers to
make things both more precise and more correct.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
When looking up known glsl_type instances in the various hash tables, we
end up leaking the key instances used for the lookup, as the glsl_type
constructor allocates memory on the global mem_ctx. This patch changes
glsl_type to manage its own memory, which fixes the leak and also allows
getting rid of the global mem_ctx and its mutex.
v2: remove lambda usage (Tapani)
(+keep ASSERT_BITFIELD_SIZE, modify dummy ctor to initialize mem_ctx)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104884
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Simon Hausmann <simon.hausmann@qt.io>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This capability allows gl_ViewportIndex and gl_Layer to also be used
as outputs in Vertex and Tesselation shaders.
v2: Make conditional to the capability, add gl_Layer, add tesselation
shaders. (Iago)
v3: Don't export to tesselation control shader.
v4: Add Reviewd-by tag.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Fixes the following building errors:
external/mesa/src/intel/vulkan/anv_device.c:300: error: undefined reference to 'gen_get_pci_device_id_override'
external/mesa/src/intel/vulkan/anv_device.c:312: error: undefined reference to 'gen_get_device_name'
external/mesa/src/intel/vulkan/anv_device.c:313: error: undefined reference to 'gen_get_device_info'
clang.real: error: linker command failed with exit code 1 (use -v to see invocation)
Fixes: 272bef0601 "intel: Split gen_device_info out into libintel_dev"
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This reverts commit 2d36efdb7f.
This raised limit turns out to harmful for more complex shaders,
it causes excessive spilling in some Bioshock Infinite shaders.
The fps for the ssao demo on radv remains unchanged when reverting
this.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
If it's zero but put it in args we still end up consuming a
register for it.
This fixes some spilling in the NIR paths in Dirt Rally that
isn't seen with TGSI.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
These transformations are inexact because section 4.7.1 (Range and
Precision) says:
Operations and built-in functions that operate on a NaN are not
required to return a NaN as the result.
The fmin or fmax might not return NaN in cases where the original
expression would be required to return NaN.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This transformation is inexact because section 4.7.1 (Range and
Precision) says:
Operations and built-in functions that operate on a NaN are not
required to return a NaN as the result.
The fmin or fmax might not return NaN in cases where the original
expression would be required to return NaN.
v2: Reorder operands and mark as inexact. The latter suggested by
Jason.
shader-db results:
Haswell, Broadwell, and Skylake had similar results. (Skylake shown)
total instructions in shared programs: 14514817 -> 14514808 (<.01%)
instructions in affected programs: 229 -> 220 (-3.93%)
helped: 3
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 3.00 x̃: 4
helped stats (rel) min: 2.86% max: 4.12% x̄: 3.70% x̃: 4.12%
total cycles in shared programs: 533145211 -> 533144939 (<.01%)
cycles in affected programs: 37268 -> 36996 (-0.73%)
helped: 8
HURT: 0
helped stats (abs) min: 2 max: 134 x̄: 34.00 x̃: 2
helped stats (rel) min: 0.02% max: 14.22% x̄: 3.53% x̃: 0.05%
Sandy Bridge and Ivy Bridge had similar results. (Ivy Bridge shown)
total cycles in shared programs: 257618409 -> 257618403 (<.01%)
cycles in affected programs: 12582 -> 12576 (-0.05%)
helped: 3
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.05% max: 0.05% x̄: 0.05% x̃: 0.05%
No changes on Iron Lake or GM45.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2: Refactor out screen functions to st/omx
Allows to keep all the code under st/omx (st/omx/tizonia and
st/omx/bellagio).
Reverts targets/omx_bellagio to omx as additions to existing files
is enough to compile for both bellagio and tizonia.
* autotools changes:
--enable-omx -> --enable-omx-bellagio
* meson changes:
-Dgallium-omx=false -> -Dgallium-omx=disabled
-Dgallium-omx=true -> -Dgallium-omx=bellagio
Acked-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Julien Isorce <julien.isorce@gmail.com>
This allows us to generate, for example,
"exp param0 v0, off, off, off" if only the first channel is needed.
Not sure if this improves performance but it's worth trying.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
When the mask is not 0xf we need to update the number of
enabled channels, otherwise the hardware won't emit the
components that are combined.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Currently, it's always 0xf but an upcoming patch will reduce the
number of channels for parameters export.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This adds a missing library to the i965/Android.mk file, and updates
intel/Android.mk to include the new library. Without this, mesa does not
build on Android.
Fixes: 272bef0601 "intel: Split gen_device_info out into
libintel_dev"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On Android surface/swapchain extensions are implemented by the loader. Patch
modifies both anv and radv extension scripts disabling currently exposed
ones. See also earlier commit 9f763c1f9b.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Just like commit 2ffe395 does for radv.
Fixes following dEQP test on i965:
dEQP-VK.api.info.android.no_unknown_extensions
v2: make it !ANDROID since this extension is not about
surfaces/swapchain
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Some state trackers require 128.
(There are no plans to increase PIPE_MAX_SAMPLERS too, since with gl
state tracker it's unlikely more than 32 will be needed, if you need
more use bindless.)
The comment said it will only represent the lowest 32 regs. This was
not entirely true in practice, since at least on x86 you'll get
masked shifts (unless the compiler could recognize it already and toss
it out). It turns out this actually works out alright (presumably
noone uses it for temp regs) when increasing max sampler views, so
make that behavior explicit.
Albeit it feels a bit hacky (but in any case, explicit behavior there
is better than undefined behavior).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Useful for testing API, builtin library, and device completeness of
not-yet-supported versions.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
(v3) Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Jan Vesely <jan.vesely@rutgers.edu>
v4: Remove redundant std::string wrapper around debug_get_option calls
v3: mark CL version overrides as static and const
v2: Make version_string in platform const in case
We'll need to be able to detect device version to define the appropriate
__OPENCL_VERSION__ header.
v2: Rebase after removing the previous patch (Pierre)
- Removed "clover: Add device_clc_version to llvm::create_compiler_instance"
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
We'll be using dev.device_clc_version to select the default language version
soon along with the existing ir_target field.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
v4: Pass the device down instead of device_clc_version as a separate field
v3: Revise to acknowledge that we now have the device in compile/link_program
instead of the string values.
v2: (Pierre) Move changes to create_compiler_instance invocation to correct
patch to prevent temporary build breakage.
(Jan) Use device_clc_version instead of device_version for compile/link
Copying the individual fields from the device when compiling/linking
will lead to an unnecessarily large number of fields getting passed
around.
v3: Rebase on current master
v2: Use device in function args before making additional changes in
following patches
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Currently both users of this would overflow an array when the
input was a dual slot double as they expected the number of
components to be a max of 4.
Since we pass the type we can just let the functions handle
doubles in a way they choose.
Reviewed-by: Dave Airlie <airlied@redhat.com>
All the tess shader and tgsi equivalents are here and it allows
use to use llvm_type_is_64bit() in the following patch without
exposing it externally.
Reviewed-by: Dave Airlie <airlied@redhat.com>
The V3D engine provides several perf counters.
Implement ->get_driver_query_[group_]info() so that these counters are
exposed through the GL_AMD_performance_monitor extension.
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
We want people to be using ISL_FORMAT_*, rather than the genxml format
enumerations. This patch drops 10 separate copies, and drops a bunch
of ugly casting.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
[jordan.l.justen@intel.com: Minor changes for rebase]
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Split out the device info so isl doesn't depend on intel/common. Now
it will depend on the new intel/dev device info lib.
This will allow the decoder in intel/common to use isl, allowing us to
apply Ken's patch that removes the genxml duplication of surface
formats.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Some instructions, assume src and/or dst is half-precision based on a
type field (ie. f32/s32/u32 are full precision but others are half
precision). So add some code to sanity check the src/dst registers to
catch mixups.
Also propagate half-precision flag for SSA sources. The instruction
consuming a SSA value needs to be of the same type as the one producing
it.
This is probably not complete half-precision support, but a useful first
step. We do still need to add support for nir alu instructions for
converting between half/full precision.
Signed-off-by: Rob Clark <robdclark@gmail.com>
It isn't just vertex shaders that need to fixup reg footprint for inputs
populated before shader starts.
This problem showed up with compute shaders. If you have (for example)
a localregid sysval, but only the .x component is used, the hw still
writes the .yz components, which could overflow into other threads
causing corruption. Showed up in cl cts 'basic/test_basic intmath_int'.
But in theory the same problem could crop up elsewhere.
Signed-off-by: Rob Clark <robdclark@gmail.com>
I think this should also always only occur at the end of a BB (by
definition), and the BB successor should be the end block.
Signed-off-by: Rob Clark <robdclark@gmail.com>
glBindBufferRange(..) in vrend_draw_bind_ubo is failing with
more than one uniform block. This is due to improper alignment
of the start of the second block. Let's query the proper
alignment from the driver and pass it back to Mesa.
Let's query for the texture alignment too, even though the Virgl
renderer doesn't call glTexBufferRange yet.
The default values are the widest workable range possible (for example,
GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT on Nvidia is 256).
Fixes:
dEQP-GLES3.functional.ubo.* on Nvidia
Example test:
dEQP-GLES3.functional.ubo.multi_basic_types.single_buffer.shared_vertex
Note: This is based on "virgl: reduce some default capset limits.",
which hasn't landed in Mesa yet but should relatively soon.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Since v2 might take a while to rollout, we should reduce
these inside some gathered minimums and then v2 can increase
them using host values.
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This checks the kernel api is new enough and asks for the
larger caps size since the kernel won't mess it up now.
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Until llvm handles indirects better we will need to use these
workarounds in the radeonsi backend also.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The change tries to catch more opportunities to reuse the same set
of VAO's when building up display lists. Instead of checking the
offset with respect to the beginning of the vertex buffer object
the change tries to apply this same optimization with respect to the
previous display list node.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reduces my build from 1717 warnings to 1547 warnings by silencing 170
instances of things like
In file included from ../../SOURCE/master/src/mesa/main/texcompress_bptc.h:30:0,
from ../../SOURCE/master/src/mesa/main/texcompress_bptc.c:31:
../../SOURCE/master/src/mesa/main/texcompress_bptc.c: In function ‘_mesa_texstore_bptc_rgba_unorm’:
../../SOURCE/master/src/mesa/main/texstore.h:60:14: warning: unused parameter ‘dstFormat’ [-Wunused-parameter]
mesa_format dstFormat, \
^
../../SOURCE/master/src/mesa/main/texcompress_bptc.c:1276:32: note: in expansion of macro ‘TEXSTORE_PARAMS’
_mesa_texstore_bptc_rgba_unorm(TEXSTORE_PARAMS)
^~~~~~~~~~~~~~~
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reduces my build from 2075 warnings to 2023 warnings by silencing 52
instances of things like
src/compiler/nir/nir_constant_expressions.c: In function ‘evaluate_bfi’:
src/compiler/nir/nir_constant_expressions.c:1812:61: warning: unused parameter ‘bit_size’ [-Wunused-parameter]
evaluate_bfi(MAYBE_UNUSED unsigned num_components, unsigned bit_size,
^~~~~~~~
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reduces my build from 6301 warnings to 2075 warnings by silencing 4226
instances of things like
src/mesa/drivers/dri/i965/i965@sta/brw_oa_hsw.c: In function ‘hsw__render_basic__gpu_core_clocks__read’:
src/mesa/drivers/dri/i965/i965@sta/brw_oa_hsw.c:41:62: warning: unused parameter ‘brw’ [-Wunused-parameter]
hsw__render_basic__gpu_core_clocks__read(struct brw_context *brw,
^~~
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reduces my build from 6451 warnings to 6301 warnings by silencing 150
instances of
../../SOURCE/master/src/intel/compiler/brw_inst.h: In function ‘brw_reg_type brw_inst_src1_type(const gen_device_info*, const brw_inst*)’:
../../SOURCE/master/src/intel/compiler/brw_inst.h:802:55: warning: enumeral and non-enumeral type in conditional expression [-Wextra]
unsigned file = __builtin_strcmp("dst", #reg) == 0 ? \
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
BRW_GENERAL_REGISTER_FILE : \
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
brw_inst_##reg##_reg_file(devinfo, inst); \
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../SOURCE/master/src/intel/compiler/brw_inst.h:811:1: note: in expansion of macro ‘REG_TYPE’
REG_TYPE(src1)
^~~~~~~~
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reduces my build from 7119 warnings to 7005 warnings by silencing 114
instances of
In file included from ../../SOURCE/master/src/mesa/drivers/dri/i965/brw_context.h:46:0,
from ../../SOURCE/master/src/mesa/drivers/dri/i965/intel_pixel_read.c:38:
../../SOURCE/master/src/mesa/drivers/dri/i965/brw_bufmgr.h: In function ‘brw_bo_unmap’:
../../SOURCE/master/src/mesa/drivers/dri/i965/brw_bufmgr.h:258:47: warning: unused parameter ‘bo’ [-Wunused-parameter]
static inline int brw_bo_unmap(struct brw_bo *bo) { return 0; }
^~
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
These days, we're just passing a pointer to a prog_data field, which
we already have access to. We can just use it directly.
(In the past, it was a pointer to a separate value.)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This should have no practical impact. For the default uploader, we
don't really care, but for others, we may want to append more data
as the GPU is reading existing data, which means we need async and
persistent flags.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
I'd like to reuse the upload logic for a new program cache, but the
buffers will need to have a different lifetime than the default
uploader, and also some address space restrictions. So, we can't
use a single uploader for both situations - we'll need two of them.
This creates a public 'uploader' structure, and adjusts the interface
to take an uploader rather than always using brw->upload. It should
have no functional change at the moment.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Commit bit in the message descriptor (Bit 13) must be always set
to true in CNL+ for memory fence messages. It also fixes a piglit
GPU hang on cnl+ in simulation environment.
Piglit test: arb_shader_image_load_store-shader-mem-barrier
See HSD ES # 1404612949
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This reverts commit a4031bdfa9. It's
redundant with the sample mask predication done at this point by the
common logical send lowering infrastructure, and rather buggy because
it wasn't applying the correct sample mask in shaders using discard,
since the dispatch mask returned by FS_OPCODE_MOV_DISPATCH_TO_FLAGS
doesn't reflect samples discarded by the shader, so it could have led
to data corruption in fragment shader invocations that execute discard
based on a non-dynamically uniform condition.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The main motivation is to enable HDC surface opcodes on ICL which no
longer allows the sample mask to be provided in a message header, but
this is enabled all the way back to IVB when possible because it
decreases the instruction count of some shaders using HDC messages
significantly, e.g. one of the SynMark2 CSDof compute shaders
decreases instruction count by about 40% due to the removal of header
setup boilerplate which in turn makes a number of send message
payloads more easily CSE-able. Shader-db results on SKL:
total instructions in shared programs: 15325319 -> 15314384 (-0.07%)
instructions in affected programs: 311532 -> 300597 (-3.51%)
helped: 491
HURT: 1
Shader-db results on BDW where the optimization needs to be disabled
in some cases due to hardware restrictions:
total instructions in shared programs: 15604794 -> 15598028 (-0.04%)
instructions in affected programs: 220863 -> 214097 (-3.06%)
helped: 351
HURT: 0
The FPS of SynMark2 CSDof improves by 5.09% ±0.36% (n=10) on my SKL
laptop with this change. According to Eero this improves performance
of the same test by 9% on BYT and by 7-8% on BXT J4205 and on SKL GT2
desktop.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-By: Eero Tamminen <eero.t.tamminen@intel.com>
This makes sure that the header-present bit of the message descriptor
is in sync with the IR instruction fields, which gives the optimizer
more control to avoid the overhead of setting up a message header when
it's possible to do so.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This shouldn't cause any functional change at this point, it changes
SHADER_OPCODE_FIND_LIVE_CHANNEL to use the flag register specified at
the IR level instead of the hard-coded f1.0, now that it can be
represented in backend_instruction::flag_subreg. This will be
necessary for scheduling to behave correctly once more things start
making use of f1.0.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This allows representing conditional mods and predicates on f1.0-f1.1
at the IR level by adding an extra bit to the flag_subreg
backend_instruction field.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
SLM has a chunk of special-purpose memory separate from L3 on ICL+, we
shouldn't allocate a partition for it on L3 anymore.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since geometry shader also consumes prescale constants, the
geometry shader constant buffer will need to be updated when prescale
factor is changed.
Reviewed-by: Brian Paul <brianp@vmware.com>
The earlier Mesa commit 3d06c8afb5 ("st/mesa: don't translate blend
state when it's disabled for a colorbuffer") subtly changed the
details of gallium's per-RT blend state.
In particular, when pipe_rt_blend_state[i].blend_enabled is true,
we have to get the src/dst blend terms from pipe_rt_blend_state[i],
not [0] as before.
We now have to scan the blend targets to find the first one that's
enabled (if any). We have to use the index of that target for getting
the src/dst blend terms. And note that we have to set identical blend
terms for all targets.
This fixes the Piglit fbo-drawbuffers2-blend test. VMware bug 2063493.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
We were calling SVGA3D_vgpu10_DestroyBlendState() when vgpu10 was not
enabled (bs->id==0 by default), resulting in lots of device errors.
Reviewed-by: Neha Bhende<bhenden@vmware.com>
If svga_update_state() fails, we flush the command buffer and retry.
If it fails again, it likely means we were unable to translate a shader
for some reason (uses too many resources, for example). In that case,
let's just skip the draw call. The alternative, just disabling the
shader stage in question, would certainly lead to bad rendering anyway,
and probably device errors.
Fixes failed assertion running Piglit glsl-1.50/execution/
variable-indexing/gs-output-array-vec4-index-wr.shader_test since it
uses too many GS output registers (though the test still fails).
VMware bug 2063492.
v2: also call pipe_debug_message() so apps or apitrace can be notified
when this issue occurs.
v3: use svga_update_state_retry().
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Previous patch missed a "return" when trying to modify the create encoder
function, which made the whole logic fail. Therefore, add the return back.
Fixes: b38b208ff8 "radeonsi:create uvd hevc enc entry"
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
ARM SoCs usually have their DRM/KMS devices on the platform bus, so add
support for this bus in order to allow use of the DRI_PRIME environment
variable with those devices.
While at it, also support the host1x bus, which is effectively the same
but uses an additional layer in the bus hierarchy.
Note that it isn't enough to support the bus that has the rendering GPU
because the loader code will also try to construct an ID path tag for a
scanout-only device if it is the default that is being opened.
The ID path tag for a device can be obtained by running udevadm info on
the device node, as shown in this example on NVIDIA Tegra:
$ udevadm info /dev/dri/card0 | grep ID_PATH_TAG
E: ID_PATH_TAG=platform-50000000_host1x
The corresponding OF_FULLNAME property, from which the ID_PATH_TAG is
constructed, can be found in the sysfs "uevent" attribute for the card0
device's parent:
$ grep OF_FULLNAME /sys/devices/platform/50000000.host1x/drm/uevent
OF_FULLNAME=/host1x@50000000
Similarily, /dev/dri/card1 corresponds to the GPU:
$ udevadm info /dev/dri/card1 | grep ID_PATH_TAG
E: ID_PATH_TAG=platform-57000000_gpu
and:
$ grep OF_FULLNAME /sys/devices/platform/57000000.gpu/uevent
OF_FULLNAME=/gpu@57000000
Changes in v2:
- avoid confusing pre-increment in strdup()
- add examples of tags to commit message
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
The disk cache implementation uses 64-bit atomic operations. For some
architectures, such as 32-bit ARM, GCC will not be able to translate
these operations into atomic, lock-free instructions and will instead
rely on the external atomics library to provide these operations.
Check at configuration time whether or not linking against libatomic
is necessary and if so, create a dependency that can be used while
linking the mesautil library.
This is the meson equivalent of 2ef7f23820 ("configure: check if
-latomic is needed for __atomic_*").
For some background information on this, see:
https://gcc.gnu.org/wiki/Atomic/GCCMM
Changes in v2:
- clarify meaning of lock-free in commit message
- fix build if -latomic is not necessary
Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
This is just useless for two reasons:
1) flush_bits is not set accordingly, so nothing will be flushed
in BeginQuery().
2) we always flush caches in EndCommandBuffer(), so if a reset
is done in a previous command buffer we are safe.
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This allows most GPU objects to use the full 48-bit address space
offered by Gen8+ platforms, rather than being stuck with 32-bit.
This expands the available GPU memory from 4G to 256TB or so.
A few objects - instruction, scratch, and vertex buffers - need to
remain pinned in the low 4GB of the address space for various reasons.
We default everything to 48-bit but disable it in those cases.
Thanks to Jason Ekstrand for blazing this trail in anv first and
finding the nasty undocumented hardware issues. This patch simply
rips off all of his findings.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
This gives the scheduler visibility into the headers which should
improve scheduling. More importantly, however, it lets the scheduler
know that the header gets written. As-is, the scheduler thinks that a
texture instruction only reads it's payload and is unaware that it may
write to the first register so it may reorder it with respect to a read
from that register. This is causing issues in a couple of Dota 2 vertex
shaders.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104923
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Following on from 49879f3778 this makes sure we use the correct
src index.
Fixes cts test:
KHR-GL46.compute_shader.atomic-case3
Reviewed-by: Dave Airlie <airlied@redhat.com>
We only need to check for previously processed location on user
defined varyings as they are the only ones that support component
packing. Therefore a single instance of processed_locs can be
shared by regular varyings and patches.
For simplicity we make processed_locs an array in order to handle
dual source bleanding.
Fixes the follow piglit test on radeonsi:
tests/spec/arb_enhanced_layouts/execution/component-layout/fs-output.shader_test
Reviewed-by: Dave Airlie <airlied@redhat.com>
This speeds up the Sascha Willems multisampling demo by around 25% when
using 8x or 16x MSAA.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
We'll want to re-use the complex resolve predicate computations for MCS
resolves so it's nice to have them as helper functions.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This doesn't actually do anything because att_state->fast_clear is
determined based on the return value of anv_layout_to_fast_clear_type
which currently returns NONE for multisampled images.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This is a bit complicated because we have to get the indirect clear
color in there somehow. In order to not do any more work in the shader
than needed, we set it up as it's own vertex binding which points
directly at the clear color address specified by the client.
Acked-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
There are enough #ifs in there that it's kind-of pointless to duplicate
it for each buffer.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Instead of listing all the UNIX PIPE_OS platforms just use
PIPE_OS_UNIX. Makes BSD sockets available on PIPE_OS_BSD.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Brian Paul <brianp@vmware.com>
OpenBSD, FreeBSD, NetBSD and DragonFlyBSD all have clock_gettime()
so use it when PIPE_OS_BSD is defined.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Brian Paul <brianp@vmware.com>
3 digits versions in LLVM only started from 3.4.1 on.
Hence, even if you can perfectly build with an old LLVM (< 3.4.1) in
the system while not needing LLVM at all (auto), when passing through
the LLVM version detection code, meson will fail when accessing
"_llvm_version[2]" due to:
"Index 2 out of bounds of array of size 2."
v2: Properly compare LLVM version and set patch version to 0
if < 3.4.1 (Eric).
v3: Improve the commit log explanation (Eric).
Cc: Dylan Baker <dylan@pnwbakers.com>
Cc: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
In 16631ca30e we fixed gen9 active components to account for padded
inputs in the URB, which we can have with SSO programs. To do that,
instead of going through the bitfield of inputs (which doesn't include
padding information), we compute the number of inputs from the size
of the URB entry.
Unfortunately, there are some special inputs that are not stored in
the URB and that we also need to account for. These special inputs
are identified and handled during calculate_attr_overrides().
Instead of keeping track of the exact number of inputs, we just
program active components for all possible inputs like we do in
anvil.
This fixes a regression in a WebGL program that uses Point Sprite
functionality (specifically, VARYING_SLOT_PNTC).
v2:
- Add 'Fixes' tag (Mark Janes)
- make no_vue_inputs int instead of uint32_t, and add const qualifier
to num_inputs variable (Ian)
v3:
- Do not try to count inputs correctly, just program all input
slots like we do in anvil (Ken)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105224
Fixes: 16631ca30e (i965/sbe: fix active components for SSO programs with over 16 inputs)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is an optimization which reduces the number of flushes for
small pool buffers.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
var->name could be NULL under ARB_gl_spirv for example. And in any
case, the code is already handing var name being NULL when reading a
variable, so it is consistent to do it writing a variable too.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The introduction of 16-bit types with VK_KHR_16bit_storages implies that
push constant offsets could be multiple of 2-bytes. Some assertions are
updated so offsets should be just multiple of size of the base type but
in some cases we can not assume it as doubles aren't aligned to 8 bytes
in some cases.
For 16-bit types, the push constant offset takes into account the
internal offset in the 32-bit uniform bucket adding 2-bytes when we access
not 32-bit aligned elements. In all 32-bit aligned cases it just becomes 0.
v2: Assert offsets to be aligned to the dest type size. (Jason Ekstrand)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Range in 16-bit push constants load was being calculated
wrongly using 4-bytes per element instead of 2-bytes as it
should be.
v2: Use glsl_get_bit_size instead of if statement
(Jason Ekstrand)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Enables storageBuffer16BitAccess and uniformAndStorageBuffer16BitAccesss
features of VK_KHR_16bit_storage for Gen8+.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Restrict the use of untyped_surface_write with 16-bit pairs in
ssbo to the cases where we can guarantee that offset is multiple
of 4.
Taking into account that VK_KHR_relaxed_block_layout is available
in ANV we can only guarantee that when we have a constant offset
that is multiple of 4. For non constant offsets we will always use
byte_scattered_write.
v2: (Jason Ekstrand)
- Assert offset_reg to be multiple of 4 if it is immediate.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
16-bit load_ubo/ssbo operations that call do_untyped_read_vector don't
guarantee that offsets are multiple of 4-bytes as required by untyped_read
message. This happens for example in the case of f16mat3x3 when then
VK_KHR_relaxed_block_layout is enabled.
Vectors reads when we have non-constant offsets are implemented with
multiple byte_scattered_read messages that not require 32-bit aligned offsets.
Now for all constant offsets we can use the untyped_read_surface message.
In the case of constant offsets not aligned to 32-bits, we calculate a
start offset 32-bit aligned and use the shuffle_32bit_load_result_to_16bit_data
function and the first_component parameter to skip the copy of the unneeded
component.
v2: (Jason Ekstrand)
Use untyped_read_surface messages always we have constant offsets.
v3: (Jason Ekstrand)
Simplify loop for reads with non constant offsets.
Use end - start to calculate the number of 32-bit components to read with
constant offsets.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This helper used to load 16bit components from 32-bits read now allows
skipping components with the new parameter first_component. The semantics
now skip components until we reach the first_component, and then reads the
number of components passed to the function.
All previous uses of the helper are updated to use 0 as first_component.
This will allow read 16-bit components when the first one is not aligned
32-bit. Enabling more usages of untyped_reads with 16-bit types.
v2: (Jason Ektrand)
Change parameters order to first_component, num_components
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The surfaces that backup the GPU buffers have a boundary check that
considers that access to partial dwords are considered out-of-bounds.
For example, buffers with 1,3 16-bit elements has size 2 or 6 and the
last two bytes would always be read as 0 or its writting ignored.
The introduction of 16-bit types implies that we need to align the size
to 4-bytew multiples so that partial dwords could be read/written.
Adding an inconditional +2 size to buffers not being multiple of 2
solves this issue for the general cases of UBO or SSBO.
But, when unsized arrays of 16-bit elements are used it is not possible
to know if the size was padded or not. To solve this issue the
implementation calculates the needed size of the buffer surfaces,
as suggested by Jason:
surface_size = isl_align(buffer_size, 4) +
(isl_align(buffer_size, 4) - buffer_size)
So when we calculate backwards the buffer_size in the backend we
update the resinfo return value with:
buffer_size = (surface_size & ~3) - (surface_size & 3)
It is also exposed this buffer requirements when robust buffer access
is enabled so these buffer sizes recommend being multiple of 4.
v2: (Jason Ekstrand)
Move padding logic fron anv to isl_surface_state.
Move calculus of original size from spirv to driver backend.
v3: (Jason Ekstrand)
Rename some variables and use a similar expresion when calculating.
padding than when obtaining the original buffer size.
Avoid use of unnecesary component call at brw_fs_nir.
v4: (Jason Ekstrand)
Complete comment with buffer size calculus explanation in brw_fs_nir.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Like before use local variables from compile_vertex_list instead.
Remove vertex_size from struct vbo_save_vertex_list.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
The buffer_offset is used in aligned_vertex_buffer_offset.
But now that most of these decisions are done in compile_vertex_list
we can work on local variables instead of struct members in the
display list code. Clean that up and remove buffer_offset.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Replace last use on replay with _vbo_save_get_{min,max}_index. Appart from
that it is not used anymore.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Is not used anymore on replay, move the last use in display list
compilation to the original array in the display list compiler.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Is not used anymore on replay, move the last use in display list
compilation to the original array in the display list compiler.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Since we now store a set of VAOs in the display list, use these object
to get the reference to the VBO in several places.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Use the information already present in the VAO to update the current values
after display list replay. Set GL_OUT_OF_MEMORY on allocation failure
for the current value update storage.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Use the information already present in the VAO to replay a display list
node using immediate mode draw commands. Use a hand full of helper methods
that will be useful for the next patches also.
v2: Insert asserts, constify local variables.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
The master value is now stored inside the VAO already present in
struct vbo_save_vertex_list. Remove the unneeded copy from dlist storage.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
We don't zalloc the physical device so we need to unconditionally set
everything. Crucible helpfully initializes all allocations to 139 so it
was getting true regardless of whether or not the kernel actually
supports context priorities.
Fixes: 6d8ab53303 "anv: implement VK_EXT_global_priority extension"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reverts commit a2c1e48f15.
On BDWGT3e and KBLGT3e systems, this commit regressed the following
tests:
piglit.spec.ext_framebuffer_multisample.accuracy 2 stencil_resolve small depthstencil
piglit.spec.ext_framebuffer_multisample.accuracy 4 stencil_resolve small depthstencil
piglit.spec.ext_framebuffer_multisample.accuracy 6 stencil_resolve small depthstencil
piglit.spec.ext_framebuffer_multisample.accuracy 8 stencil_resolve small depthstencil
piglit.spec.ext_framebuffer_multisample.accuracy all_samples stencil_resolve small depthstencil
This stops a crash when running (still fails):
tests/spec/arb_gpu_shader_fp64/execution/explicit-location-gs-fs-vs.shader_test
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The nir->llvm conversion was using the wrong srcs.
Fixes:
tests/spec/arb_compute_shader/execution/shared-atomics.shader_test
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Gen4 point clipping calls brw_clip_tri_alloc_regs with nr_verts == 0,
which means that c->reg.vertex[] isn't initialized. It then emits MOVs
to stomp components of those uninitialized registers to 0.
This started causing assertions after Matt's recent series, when those
uninitialized registers started getting BRW_REGISTER_TYPE_NF, which
definitely doesn't exist on Gen4-5.
Reviewed-by: Matt Turner <mattst88@gmail.com>
We need to align the size of the slice, not the offset of the next slice.
Fixes KHR-GLES3.texture_repeat_mode.rgba32ui_11x131_2_clamp_to_edge.
Fixes: b4b4ada761 ("broadcom/vc5: Fix layout of 3D textures.")
Before, we were trusting in the hardware to take the intersection
of the viewport clip with the drawing rectangle. Unfortunately,
3DSTATE_DRAWING_RECTANGLE is fairly expensive because it implicitly
does a full pipeline stall. If we're a bit more careful with our
viewport clipping, we can just re-emit it once at context creation
time.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The delayed loading code was fail if we had control flow.
This fixes:
tests/spec/arb_shader_image_load_store/execution/image_checkerboard.shader_test
v2: don't use temp_reg before setting temp_reg up.
Tested-by: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
With the Align16 tests now disabled, we can run the rest of the tests in
ICL mode (and see them pass!)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Align16 is no more. We previously generated an align16 ADD instruction
to calculate DDY:
add(16) g25<1>F -g23<4>.xyxyF g23<4>.zwzwF { align16 1H };
Without align16, we now implement it as:
add(4) g25<1>F -g23<0,2,1>F g23.2<0,2,1>F { align1 1N };
add(4) g25.4<1>F -g23.4<0,2,1>F g23.6<0,2,1>F { align1 1N };
add(4) g26<1>F -g24<0,2,1>F g24.2<0,2,1>F { align1 1N };
add(4) g26.4<1>F -g24.4<0,2,1>F g24.6<0,2,1>F { align1 1N };
where only the first two instructions are needed in SIMD8 mode.
Note: an earlier version of the patch implemented this in two
instructions in SIMD16:
add(8) g25<2>F -g23<4,2,0>F g23.2<4,2,0>F { align1 1N };
add(8) g25.1<2>F -g23.1<4,2,0>F g23.3<4,2,0>F { align1 1N };
but I realized that the channel enable bits will not be correct. If we
knew we were under uniform control flow, we could emit only those two
instructions however.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In a future patch, generate_ddy will want to inspect inst->exec_size.
Change generate_ddx as well for consistency.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The PLN instruction is no more. Its functionality is now implemented
using two MAD instructions with the new native-float type. Instead of
pln(16) r20.0<1>:F r10.4<0;1,0>:F r4.0<8;8,1>:F
we now have
mad(8) acc0<1>:NF r10.7<0;1,0>:F r4.0<8;8,1>:F r10.4<0;1,0>:F
mad(8) r20.0<1>:F acc0<8;8,1>:NF r5.0<8;8,1>:F r10.5<0;1,0>:F
mad(8) acc0<1>:NF r10.7<0;1,0>:F r6.0<8;8,1>:F r10.4<0;1,0>:F
mad(8) r21.0<1>:F acc0<8;8,1>:NF r7.0<8;8,1>:F r10.5<0;1,0>:F
... and in the case of SIMD8 only the first pair of MAD instructions is
used.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If multiple instructions are emitted, special handling of things like
conditional mod and NoDDClr/NoDDChk need to be performed.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This isn't technically broken, but the next patch will make this
function report whether it generated multiple instructions, and that
information will be used to disable the application of conditional mod
by the generic code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This new type exposes the additional precision offered by the
accumulator register and will be used in the next patch to implement the
functionality of the PLN instruction using a pair of MAD instructions.
One weird thing to note: align1 ternary instructions may only have an
accumulator in the dst or src1 normally, but when src0's type is :NF
the accumulator is read.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The hardware register types' encodings have changed on Gen11. Good thing
we have that superfluous looking brw_reg_type abstraction lying around!
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Gen11 does not support DF, Q, UQ types in hardware. As a result, we have
to disable some GL extensions until they can be reimplemented.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
anv_gem_set_context_param is to be used directly instead!
Fixes: 6d8ab53303 "anv: implement VK_EXT_global_priority extension"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fix clipper validMask setting. We don't need to run frustum rejected
primitives through the clipper. Perform frustum culling with only
frustum clip codes. Guardband clip codes cannot be used because they
overlap frustum codes.
Reviewed-By: Bruce Cherniak <bruce.cherniak@intel.com>
Translate is now part of an overloaded LOAD call which required a change to
the code gen to skip the load functions in order to handle them manually
to make them virtual.
Reviewed-By: Bruce Cherniak <bruce.cherniak@intel.com>
- Have the draw type sent to DrawInfoEvent in handlers created in
archrast.cpp. The draw type no longer needs to be sent during during
AR_API_EVENT() call in api.cpp.
- Remove draw type from event defintions in events_private.proto, no
longer needed
Reviewed-By: Bruce Cherniak <bruce.cherniak@intel.com>
Populate pLastIndex, even for the non-indexed case. An zero pLastIndex
can cause the index offsets inside the fetcher to have non-sensical values
that can be either very large positive or very large negative numbers.
Reviewed-By: Bruce Cherniak <bruce.cherniak@intel.com>
We were setting view to NULL if the iteration was larger than i.
But in fact if the view is NULL the code did nothing anyway...
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
v2: add ANV_CONTEXT_REALTIME_PRIORITY (Chris)
use unreachable with unknown priority (Samuel)
v3: add stubs in gem_stubs.c (Emil)
use priority defines from gen_defines.h
v4: cleanup, add anv_gem_set_context_param (Jason)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> (v2)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v2)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v3)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We now have hopefully fixed all bugs regarding high addresses on Vega10 and
Raven. Start to use the high range to make room for SVM in the low
range.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
According to GLSL ES 3.2 spec, see table in 9.2.1 "Linked Shaders"
section, the precision qualifier should match for uniform variables.
This also applies to previous GLSL ES 3.x specs.
This 'if' checks the condition for uniform variables, while for UBOs
it is checked in link_interface_blocks.cpp.
Fixes: b50b82b8a5
("glsl/es31: precision qualifier doesn't need to match in shader interface block members")
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Previously we had a check for 1d of narrow 2D textures, however
narrow 2d textures caused gpu hangs, but it was correct for 1d
textures.
This fixes a bunch of 1D image piglits for me.
Fixes: 7b8e1c089d (r600/texture: drop lowering 1d/2d images to linear.)
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Ideally the st_finalize_texture call would take care of that, but it
doesn't seem to with KHR-GL45.shader_image_size.advanced-nonMS-*. This
assertion makes sure that no such values are passed to the driver.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This was segfaulting:
dEQP-VK.memory.pipeline_barrier.host_write_index_buffer.1024
Fixes: 8de6f79707 (ac/radeonsi: add load_base_vertex() to the abi)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
radeonsi, i965 and anv all treat fdd{x,y} opcodes the same as
fdd{x,y}_coarse by default. The SPIR-V spec lets the implementation
decide how it should be handled and radv was previously going
for the higher quality option. Here we change the shared amd
code to match how nir_op_fdd{x,y} is expected to be handled
by the other NIR drivers.
Fixes piglit test:
./bin/arb_shader_texture_lod-texgrad -auto
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The lowering is incompatible with how the radeonsi backend works.
Fixes piglit test:
./bin/arb_shader_draw_parameters-basevertex vertexid-zerobased -auto
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
test_fuzz_compact_instruction() was attempting to modify the uint64_t
data array of a brw_inst through a pointer to uint32_t, which has
undefined behavior. This was causing the test_eu_compact unit test to
fail mysteriously for me on GCC 7 with some additional
harmless-looking changes I had applied to my tree, which happened to
affect the order instructions are emitted by GCC causing the bit
twiddling to be done after the clear_pad_bits() call which is supposed
to overwrite the same data through a pointer of different type,
leading to data corruption. A similar failure has been reported by
Vinson Lee on the master branch built with GCC 8.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105052
Tested-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
In order to fix a build failure on compilers not implementing
unrestricted unions, which is a C++11 feature.
v2: Provide signed integer comparison and assignment operators instead
of BITSET_WORD ones to avoid spurious ambiguity warnings on
comparisons with a signed integer literal.
Fixes: ba79a90fb5 "glsl: Switch ast_type_qualifier to a 128-bit bitset."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105238
Tested-by: Roland Scheidegger <sroland@vmware.com>
Tested-By: George Kyriazis <george.kyriazis@intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
It's basically just the opposite, and it only makes sense to
round the layer for 2D texture arrays.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The hardware skips over unallocated slots, so we have to make sure those
registers are packed together.
Fixes KHR-GL45.enhanced_layouts.fragment_data_location_api
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Karol Herbst <kherbst@redhat.com>
It looks like we had all the pieces in place for this,
just never tested it and turned it on.
I don't see any CTS regressions and the computeshader
demo runs.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
currently while insterting barriers, writes and reads to FILE_FLAGS aren't
considered. This can lead to WaR hazards in some situations.
With the previous commit fixes shaders with intstructions like this:
mad u32 $r2 $r4 $r11 $r2
mad u32 { $r5 $c0 } $r4 $r10 $r6
mad (SUBOP:1) u32 $r3 $r4 $r10 $r2 $c0
Affects OpenCL CTS tests on Maxwell+:
basic/test_basic intmath_long
basic/test_basic intmath_long2
basic/test_basic intmath_long4
v2: only put barriers on instructions which actually read flags
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
In the sched data calculator we have to track first use of defs by iterating
over all defs of an instruction, not just the first one.
v2: fix minGRP and maxGRP values
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Similar to 90dd6e5 ("Android: egl: add dependency on libnativewindow")
Fixes the following building error:
In file included from out/target/product/x86_64/obj_x86/STATIC_LIBRARIES/libmesa_vulkan_util_intermediates/util/vk_enum_to_str.c:26:
external/mesa/include/vulkan/vk_android_native_buffer.h:22:10: fatal error: 'system/window.h' file not found
^~~~~~~~~~~~~~~~~
1 error generated.
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Similar to 90dd6e5 ("Android: egl: add dependency on libnativewindow")
Fixes the following building errors:
In file included from external/mesa/src/intel/vulkan/gen7_cmd_buffer.c:30:
In file included from external/mesa/src/intel/vulkan/anv_private.h:72:
external/mesa/include/vulkan/vk_android_native_buffer.h:22:10: fatal
error: 'system/window.h' file not found
^~~~~~~~~~~~~~~~~
1 error generated.
...
In file included from external/mesa/src/intel/vulkan/anv_gem.c:32:
In file included from external/mesa/src/intel/vulkan/anv_private.h:72:
external/mesa/include/vulkan/vk_android_native_buffer.h:22:10: fatal
error: 'system/window.h' file not found
^~~~~~~~~~~~~~~~~
1 error generated.
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Building rules are aligned to automake ones
The correct script to build anv_extensions.{c,h} is anv_extensions_gen.py
Generation rules for anv_extensions.c requires --out-c option
Generation rules for anv_extensions.h were missing
Necessary include paths are added to avoid following build errors:
cp: cannot stat '.../gen/STATIC_LIBRARIES/libmesa_vulkan_common_intermediates/vulkan/anv_extensions.c':
No such file or directory
In file included from external/mesa/src/intel/vulkan/anv_gem.c:32:
external/mesa/src/intel/vulkan/anv_private.h:75:10: fatal error: 'anv_extensions.h' file not found
^~~~~~~~~~~~~~~~~~
1 error generated.
In file included from external/mesa/src/intel/vulkan/anv_batch_chain.c:30:
external/mesa/src/intel/vulkan/anv_private.h:75:10: fatal error: 'anv_extensions.h' file not found
^~~~~~~~~~~~~~~~~~
1 error generated.
Fixes: dd088d4bec ("anv/extensions: Generate a header file with extension tables")
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
so that it can be removed and replaced with inline VBO descriptors,
and the pointer can be packed in unused bits of VBO descriptors.
This also removes the pointer from merged TES-GS where it's useless.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
autotools wants to have the BUILT_SOURCES ready as soon as it enters the
directory, even if they are not used. This meant the build failed if
wayland-protocols was not available on the system, even if it was not
enabled.
As BUILT_SOURCES cannot be used in a conditional (cf. 166852ee95), do
the same thing as EGL and manually encode the dependencies in the
Makefile.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Fixes: bfa22266cd ("vulkan/wsi/wayland: Add support for zwp_dmabuf")
Cc: Emil Velikov <emil.velikov@collabora.co.uk>
Reported-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105211
On cayman this was hitting an assert later, which probably wasn't
see on non-cayman due to having the t slot.
Fixes: 9041730d1 (r600: add support for ARB_shader_clock.)
This requires passing an extra argument to the lowering pass because
the KHR_blend_equation_advanced specification doesn't seem to define
any mechanism for the implementation to determine at compile-time
whether coherent blending can ever be used (not even an "#extension
KHR_blend_equation_advanced_coherent" directive seems to be required
in the shader source AFAICT).
In the long run we'll probably want to do state-dependent recompiles
based on the value of ctx->Color.BlendCoherent, but right now there
would be no benefit from that because the only driver that supports
coherent framebuffer fetch is i965 on SKL+ hardware, which are unable
to support the non-coherent path for the moment because of texture
layout issues, so framebuffer fetch coherency is always enabled for
them.
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
This allows the application to request framebuffer fetch coherency
with per-fragment output granularity. Coherent framebuffer fetch
outputs (which is the default if no qualifier is present for
compatibility with older versions of the EXT_shader_framebuffer_fetch
extension) will have ir_variable_data::memory_coherent set to true.
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
At the same point where it is initialized on GL(ES) 3.0+ so we can
implement some common layout qualifier handling in a future commit.
Until now the fb_fetch_output flag would be inherited from the
original implicit gl_LastFragData declaration at a later point in the
AST to GLSL IR translation.
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
This should end the drought of bits in the ast_type_qualifier object.
The bitset_t type works pretty much as a drop-in replacement for the
current uint64_t bitset.
The only catch is that the bitset_t type as defined in the previous
commit doesn't have a trivial constructor (because it has a
user-defined constructor), so it cannot be used as union member
without providing a user-defined constructor for the union (which
causes it in turn to be non-trivially constructible). This annoyance
could be easily addressed in C++11 by declaring the default
constructor of bitset_t to be the implicitly defined one -- IMO one
more reason to drop support for GCC 4.2-4.3.
The other minor change was required because glsl_parser_extras.cpp was
hard-coding the type of bitset temporaries as uint64_t, which (unlike
would have been the case if the uint64_t had been replaced with
e.g. an __int128) would otherwise have caused a build failure, because
the boolean conversion operator of bitset_t is marked explicit (if
C++11 is available), so the bitset won't be silently truncated down to
1 bit in order to use it to initialize the uint64_t temporaries
(yikes).
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
This can be used to specify that a C++ conversion operator is not
meant to be used for implicit conversions, which can lead to
unintended loss of information in some cases. Implemented as a macro
in order to keep old GCC versions happy.
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Desktop GL is now supported, and there is an additional entry-point
for EXT_shader_framebuffer_fetch_non_coherent.
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
The changes I had originally planned for the MESA_shader_framebuffer_fetch
extension have been merged into the EXT spec, there's no point in keeping
MESA_shader_framebuffer_fetch extension enables.
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
This GL entry point was renamed to glFramebufferFetchBarrier() in the
EXT extension on request from Khronos members. Update the Mesa
codebase to match the latest spec.
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
This reverts two bogus and seemingly useless changes from the commits
referenced below, which broke KHR_blend_equation_advanced (and
EXT_shader_framebuffer_fetch_non_coherent which wasn't exposed yet)
for any kind of render target surface that would cause the
get_isl_surf() call in brw_emit_surface_state() to do anything useful
(notice how the result of get_isl_surf() is completely ignored by the
caller right now), as was the case while using those extensions with
1D array or 3D framebuffers in particular.
Fixes: f5859b45b1 "i965/miptree: Switch remaining surfaces to isl"
Fixes: bf24c3539e "i965/miptree: Clean-up unused"
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
TCS_OUT_LAYOUT has 13 unused bits. That's enough for a 32-bit address
aligned to 512KB. Hey, it's a 13-bit pointer!
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
If 32-bit pointers are supported, both pointers can be moved into s[0:1]
and then ESGS has exactly the same user data SGPR declarations as VS.
If 32-bit pointers are not supported, only one pointer can be moved into
s[0:1]. In that case, the 2nd pointer is moved before TCS constants,
so that the location is the same in HS and GS.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
When transitioning to an htile compressed depth format, Set the full
depth range, so later rasterization can pass HiZ. Previously, for depth
only formats, the depth range was set to 0 to 0. This caused unwanted
HiZ rejections with a VK_FORMAT_D16_UNORM depth buffer
(VK_FORMAT_D32_SFLOAT was not affected somehow).
These values are derived from PAL [0], since I can't find the
specification describing the htile values.
[0] 5cba4ecbda/src/core/hw/gfxip/gfx9/gfx9MaskRam.cpp (L1500)
CC: Dave Airlie <airlied@redhat.com>
CC: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Fixes: 5158603182 "radv: Use correct HTILE expanded words."
Similar to cb0d1ba156 ("anv/extensions: Fix VkVersion::c_vk_version for patch == None")
fixes the following building errors:
out/target/product/x86_64/obj_x86/STATIC_LIBRARIES/libmesa_radv_common_intermediates/radv_entrypoints.c:1161:48:
error: use of undeclared identifier 'None'; did you mean 'long'?
return instance && VK_MAKE_VERSION(1, 0, None) <= core_version;
^~~~
long
external/mesa/include/vulkan/vulkan.h:34:43: note: expanded from macro 'VK_MAKE_VERSION'
(((major) << 22) | ((minor) << 12) | (patch))
^
...
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
Fixes: e72ad05c1d ("radv: Return NULL for entrypoints when not supported.")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cube maps are entire miptrees repeated, while 3D textures have each level
have all of its layers next to each other. Fixes tex3d and
tex-miplevel-selection GL2:texture() 3D.
Like for vc4, the new DISPLAY_TARGET flag ended up causing no formats to
match. Just drop the whole retval == usage thing and return early when we
hit a known unsupported case.
Fixes: f7604d8af5 ("st/dri: only expose config formats that are display targets")
Once GBM started looking at the values of the alpha masks, ARGB/ABGR
wouldn't match any more because we had both A and R in the low bits.
Fixes: 2ed344645d ("gbm/dri: Add RGBA masks to GBM format table")
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Daniel Stone <daniels@collabora.com>
This is more accurate with respect to the compatibility profile.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
LLVM requirement was bumped to 4.0.0 with earlier commit.
Hence any code tailored for older versions is now unreachable.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-By: George Kyriazis <george.kyriazis@intel.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
When we set up the shadow resource we were copying the original resource
as the template, including its prsc->next field. When we shadowed the
first YUV plane's resource for linear-to-tiled conversion, we would end up
unbalancing the refcount on the shadow resource's destruction.
TS tiles map to a fixed amount of bytes in the color/depth surface,
so the blocksize of the format needs to be taken into account when
calculating the number of tiles to fill.
The simplest fix is to just use the layer stride, which is the surface
size in bytes.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Some of the 16bit formats misrender with missing tiles with the current
"2" state. As all the previously working formats also work with the "3"
state, just always use that one.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
This feature has caused some trouble already. Add a debug switch to
allow users to quickly check if a specific issue is caused by this
feature.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Currently meson will generate a pkg-config that links to EGL_mesa (or
GLX_mesa), but this isn't correct, it should always link to EGL or GL.
Probably the "right" solution is to have glvnd itself provide the pkg
config files for GL and EGL, but that also means that glvnd needs to
provide many of the header files, which makes it a more involved job.
Fixes: a47c525f32 ("meson: build glx")
Fixes: 035ec7a2bb ("meson: Add support for EGL glvnd")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
dri2_display_destroy() is called when platform specific display
initialisation fails. However, this would typically lead to a
segfault due to the dri2_egl_display vbtl not having been set up.
Fixes: 2db9548296 ("loader_dri3/glx/egl: Optionally use a blit
context for blitting operations")
Signed-off-by: Frank Binns <francisbinns@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
RGB9_E5 should be accepted by RenderbufferStorage if the
EXT_texture_shared_exponent is exposed. It is left to the
implementations to return GL_FRAMEBUFFER_UNSUPPORTED_EXT
when checking the framebuffer completeness if they do not
support rendering in this format.
Discussed in:
https://github.com/KhronosGroup/OpenGL-API/issues/32
This fixes KHR-GL45.internalformat.renderbuffer.rgb9_e5
v2: Added more info to the commit message (Antia)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Antia Puentes <apuentes@igalia.com>
Finally use an internal VAO to execute display list draws. Avoid
duplicate state validation for display list draws. Remove client arrays
previously used exclusively for display lists.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
VAOs will be used in the next change as immutable object across multiple
contexts. Only reference counting may write concurrently on the VAO. So,
make the reference count thread safe for those and only those VAO objects.
v3: Use bool/true/false for gl_vertex_array_object::SharedAndImmutable.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Finally use an internal VAO to execute immediate mode draws. Avoid
duplicate state validation for immediate mode draws. Remove client arrays
previously used exclusively for immediate mode draws.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Correct VBO_MATERIAL_SHIFT value.
The functions will be used next in this series.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
We will need the flush_vertices argument later in this series.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Change vertex_attrib_binding() to _mesa_vertex_attrib_binding(), add a
flush_vertices argument, and make it publicly available.
The function will be needed later in the series.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
We will need the flush_vertices argument later in this series.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Switch over to use the _DrawVAO for all the array type draws.
The _DrawVAO needs to be set before we enter _mesa_update_state, so move
setting the draw method in front of the first call to _mesa_update_state
which is in turn called from the *validate*Draw* calls. Using the
gl_vertex_array_object::_Enabled bitmask, gl_vertex_program_state::_VPMode
and gl_vertex_array_object::_AttributeMapMode we can already set
varying_vp_inputs before we call _mesa_update_state the first time.
Thus remove duplicate state validation.
v2: Update comments.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Provided the _DrawVAO and the derived state that is maintained if we have
the _DrawVAO set, implement a method to incrementally update the array of
gl_vertex_array input pointers.
v2: Add some more comments.
Rename _vbo_array_init to _vbo_init_inputs.
Rename vbo_context::arrays to vbo_context::draw_arrays.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
During the patch series this VAO gets populated with either the currently
bound VAO or an internal VAO that will be used for immediate mode and
dlist rendering.
v2: More comments about the _DrawVAO, filter and enabled mask.
Rename _DrawVAOEnabled to _DrawVAOEnabledAttribs.
v3: Fix and move comment.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
At those places where we used get_vp_mode() use
gl_vertex_program_state::_VPMode instead.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
To get equivalent information than get_vp_mode(), track the vertex
processing mode in a per context variable at
gl_vertex_program_state::_VPMode.
This aims to replace get_vp_mode() as seen in the vbo module.
But instead of the get_vp_mode() implementation which only gives correct
answers past calling _mesa_update_state() this context variable is
immediately tracked when the vertex processing state is modified. The
correctness of this value is asserted on state validation.
With this in place we should be able to untangle the dependency with
varying_vp_inputs and state invalidation.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
We don't want filtering for integer textures, same as depth/stencil.
Fixes: KHR-GL45.direct_state_access.renderbuffers_storage_multisample
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Karol Herbst <kherbst@redhat.com>
We need to mark the range as valid, and validate the resource using a
helper to ensure that the buffer status is marked properly.
Fixes some CTS pipeline stats query tests, and
KHR-GL45.direct_state_access.queries_functional
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Karol Herbst <kherbst@redhat.com>
Two things were off:
- valid range was not updated, which could affect waiting for future
maps
- fencing was done manually instead of using the *_resource_validate
helper, which resulted in a missed dirty buffer flag being set
Fixes: KHR-GL45.direct_state_access.buffers_clear
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Karol Herbst <kherbst@redhat.com>
Somewhere along the way the Makefile changes got lost ...
Fixes: 4db78f3a6b "radv: Put supported extensions in a struct."
Acked-by: Dave Airlie <airlied@redhat.com>
This implements strict checking for the entrypoint ProcAddr
functions.
- InstanceProcAddr with instance = NULL, only returns the 3 allowed
entrypoints.
- DeviceProcAddr does not return any instance entrypoints.
- InstanceProcAddr does not return non-supported or disabled
instance entrypoints.
- DeviceProcAddr does not return non-supported or disabled device
entrypoints.
- InstanceProcAddr still returns non-supported device entrypoints.
Reviewed-by: Dave Airlie <airlied@redhat.com>
The MSVC version we (at VMware) primarily care about from now on is
2015.
See https://ci.appveyor.com/project/jrfonseca/mesa/build/46
We can drop support for building with 2013 in a future commit. I'm not
aware of significant changes in C99/C11 support from MSVC 2013 to 2015,
but there's no point in continuing supporting old MSVC versions when
nobody cares.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Otherwise the code size increases because the original fexp2()
instructions can't be deleted.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This disables persistence accross wavefronts.
F1 2017 and Wolfenstein 2 appear to use some coherent images
but this patch doesn't seem to change anything.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We were only resolving the first.
v2:
- Do not require that the number of layers on dst and src are an
exact match, it is okay if the dst has more layers so long as
it has at least the same that we are going to resolve.
- Do not always resolve array_len layers, we should resolve
only from base_array_layer to array_len.
v3:
- v2 was assuming that array_len represented the total number of
layers in the image, but it represents the number of layers
starting at the base array ayer.
v4:
- The number of layers to resolve should be taken from the
framebuffer (Nanley).
Fixes new CTS tests for multisampled layered rendering:
dEQP-VK.renderpass.multisample_resolve.layers_*
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
The setTexBuffer2 hook from GLX is used to implement glxBindTexImageEXT
which has tighter restrictions than just "it's shared". In particular,
it says that any rendering to the image while it is bound causes the
contents to become undefined.
The GLX_EXT_texture_from_pixmap extension provides us with an acquire
and release in the form of glXBindTexImageEXT and glXReleaseTexImageEXT.
The extension spec says,
"Rendering to the drawable while it is bound to a texture will leave
the contents of the texture in an undefined state. However, no
synchronization between rendering and texturing is done by GLX. It
is the application's responsibility to implement any synchronization
required."
From the EGL 1.4 spec for eglBindTexImage:
"After eglBindTexImage is called, the specified surface is no longer
available for reading or writing. Any read operation, such as
glReadPixels or eglCopyBuffers, which reads values from any of the
surface’s color buffers or ancillary buffers will produce
indeterminate results. In addition, draw operations that are done
to the surface before its color buffer is released from the texture
produce indeterminate results
In other words, between the bind and release calls, we effectively own
those pixels and can assume, so long as we don't crash, that no one else
is reading from/writing to the surface. The GLX and EGL implementations
call the setTexBuffer2 and releaseTexBuffer function pointers that the
driver can hook.
In theory, this means that, between BindTexImage and ReleaseTexImage, we
own the pixels and it should be safe to track aux usage so we
can avoid redundant resolves so long as we start off with the right
assumption at the start of the bind/release pair.
In practice, however, X11 has slightly different expectations. It's
expected that the server may be drawing to the image at the same time as
the compositor is texturing from it. In that case, the worst expected
outcome should be tearing or partial rendering and not random corruption
like we see when rendering races with scanout with CCS. Fortunately,
the GEM rules about texture/render dependencies save us here. If X11
submits work to write to a pixmap after the compositor has submitted
work to texture from it, GEM inserts a dependency between the compositor
and X11. If X11 is using a high-priority context, this will cause the
compositor to get a temporarily boosted priority while the batch from
X11 is waiting on it. This means that we will never have an actual race
between X11 and the compositor so no corruption can happen.
Unfortunately, however, this means that X11 will likely be rendering to it
between the compositor's BindTexImage and ReleaseTexImage calls. If we
want to avoid strange issues, we need to be a bit careful about
resolves because we can't really transition it away from the "default"
aux usage. The only case where this would practically be a problem is
with image_load_store where we have to do a full resolve in order to use
the image via the data port. Even there it would only be a problem if
batches were split such that X11's rendering happens between the resolve
and the use of it as a storage image. However, the chances of this
happening are very slim so we just emit a warning and hope for the best.
This commit adds a new helper intel_miptree_finish_external which resets
all aux state to whatever ISL says is the right worst-case "default" for
the given modifier. It feels a little awkward to call it "finish"
because it's actually an acquire from the perspective of the driver, but
it matches the semantics of the other prepare/finish functions. This
new helper gets called in intelSetTexBuffer2 instead of make_shareable.
We also add an intelReleaseTexBuffer (we passed NULL to releaseTexBuffer
before) and call intel_miptree_prepare_external in it. This probably
does nothing most of the time but it means that the prepare/finish calls
are properly matched.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
The old code made a new miptree that referenced the same BO as the
renderbuffer and just trusted in the memory aliasing to work. There are
only two ways in which the new miptree is liable to differ from the one
in the renderbuffer and neither of them matter:
1) It may have a different target. The only targets that we can ever
see in intelSetTexBuffer2 are GL_TEXTURE_2D and GL_TEXTURE_RECTANGLE
and the difference between the two doesn't matter as far as the
miptree is concerned; genX(update_sampler_state) only looks at the
gl_texture_object and not the miptree when determining whether or
not to use normalized coordinates.
2) It may have a very slightly different format. Again, this doesn't
matter because we've supported texture views for quite some time so
we always look at the gl_texture_object format instead of the
miptree format for hardware setup anyway.
On the other hand, because we were recreating the miptree, we were using
intel_miptree_create_for_bo which doesn't understand modifiers. We
really want this function to work without doing a resolve so long as you
have modifiers so we need to fix that.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
This function is used to determine when we need to re-allocate a
miptree. Since we do nothing different in miptree allocation for
sRGB vs. linear, loosening this should be safe and may lead to less
copying and reallocating in some odd cases.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
We're about to start letting the intel_obj->_Format be the "real"
texture format. For depth/stencil textures, this may be a combined
depth stencil format. For ETC2 on gen7 and earlier, this will be the
actual ETC2 format. This makes a bit more GL sense but means we have to
be careful in state upload.
Reviewed-by: Chad Versace <chadversary@chromium.org>
Both KHR_blend_equation_advanced and ARB_bindless_texture provide
layout qualifiers, and are exposed in compatibility contexts. We
need to parse the layout qualifier as a token in order for those
to work, but forgot to extend this check.
ARB_shader_image_load_store would need a similar treatment, but we
don't expose that in legacy OpenGL contexts.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105161
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Use a helper function for updating the swapchain status. This will be
used later to handle VK_SUBOPTIMAL_KHR, where we need to make a
non-error status stick to the swapchain until recreation. Instead of
direct comparisons to VK_SUCCESS to check for error, test for negative
numbers meaning an error status, and positive numbers indicating
non-error statuses.
v2 (Jason Ekstrand):
- Use a pattern of "return x11_swapchain_result(chain, VK_WHATEVER)"
- Handle wsi_queue_pull returning VK_TIMEOUT
- Call x11_swapchain_result in x11_present_to_x11
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This most likely means we lost our connection to the X server so
OUT_OF_DATE is reasonable. This was also the one case where we pushed a
UINT32_MAX into the queue without setting an error condition.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Daniel Stone <daniels@collabora.com>
zwp_linux_dmabuf_v1 lets us use multi-planar images and buffer
modifiers.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
For a bit there, we had a bug in i965 where it ignored the tiling of the
modifier and used the one from the BO instead. At one point, we though
this was best fixed by setting a tiling from Vulkan. However, we've
decided that i965 was just doing the wrong thing and have fixed it as of
5048572352.
The old assumptions also affected the solution we used for legacy
scanout in Vulkan. Instead of treating it specially, we just treated it
like a modifier like we do in GL. This commit goes back to making it
it's own thing so that it's clear in the driver when we're using
modifiers and when we're using legacy paths.
v2 (Jason Ekstrand):
- Rename legacy_scanout to needs_set_tiling
Reviewed-by: Daniel Stone <daniels@collabora.com>
This involves extending our fake extension a bit to allow for additional
querying and passing of modifier information. The added bits are
intended to look a lot like the draft of VK_EXT_image_drm_format_modifier.
Once the extension gets finalized, we'll simply transition all of the
structs used in wsi_common to the real extension structs.
Reviewed-by: Daniel Stone <daniels@collabora.com>
This was originally intended to make sure the remap location
was not -1. However the code has changed alot since then,
the location is now never set to -1 and we also handle
components meaning this old assert has been doing comparisions
with the pointer to the array of component data.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105183
Fixes assert in the glsl-1.50-gs-max-output-components piglit test.
Note that the double handling will only work for doubles that
don't take up multiple slots i.e. double and dvec2. However
dual slot double handling is an existing bug which is made no
worse by this patch.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Delaying unrolling and allowing NIR to do it instead has been shown
to result in better code in drivers such as i965. shader-db results
appear to show the same is true for radeonsi.
The other advantage is that using NIR unrolling improves compile
times significantly.
Totals from affected shaders:
SGPRS: 9624 -> 10016 (4.07 %)
VGPRS: 6800 -> 6464 (-4.94 %)
Spilled SGPRs: 0 -> 2 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 359176 -> 332264 (-7.49 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 1355 -> 1432 (5.68 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes the following piglit tests:
tests/spec/arb_tessellation_shader/execution/double-array-vs-tcs-tes.shader_test
tests/spec/arb_tessellation_shader/execution/double-vs-tcs-tes.shader_test
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The DRI3 drawable info struct currently stores a boolean for whether the
last completed operation was a flip or not. As we need to track the full
completion mode for handling suboptimal returns, change the 'flipping'
field to the raw present completion mode from the server.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
It's true for depth HiZ clears because we only have HiZ on single-slice
images right now. However, for stencil-only clears there is no such
restriction.
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Based on amdgpu hardware query information to check if UVD hevc enc support
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We have to increase the file index also for 0x10000 not just for values
greater than 0x10000.
Fixes: 37b67db6ae
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This looks like it should be protected by the assume() about
nr_color_regions, but my compiler warns anyway.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: d32956935e ("glsl: Walk a list of ir_dereference_array to mark array elements as accessed")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
My build was producing:
../src/loader/loader.c:121:67: warning: ‘%1u’ directive output may be truncated writing between 1 and 3 bytes into a region of size 2 [-Wformat-truncation=]
and we can avoid this careful calculation by just using asprintf (as we do
elsewhere in the file).
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
They should probably get unit tests implemented, but this cleans up a
bunch of warnings in my build for now.
Fixes: 59f458cd87 ("glsl: Add 16-bit types")
Cc: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This just avoids marking it as a used output if we don't
actually use it.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
dEQP-VK.tessellation.invariance.outer_edge_symmetry.triangles_equal_spacing_ccw
was hitting an llvm assert due to one value being an int and the
other a float.
This just casts both values to integer and fixes the test.
Fixes: dEQP-VK.tessellation.invariance.outer_edge_symmetry.triangles_equal_spacing_ccw
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Instead of having aux usage and ANV_AUX_USAGE_DEFAULT to mean "give me
something reasonable" we now use anv_layout_to_aux_usage whenever a
layout is available. If a layout is available, we ignore the aux_usage
parameter. For the cases where we have an explicit aux usage such as
clears and aux ops, we have a new ANV_IMAGE_LAYOUT_EXPLICIT_AUX layout.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
If we don't have HiZ, then anv_layout_to_aux_usage will return NONE for
both layouts. If the two layouts are the same, they will get the aux
usage. In either case, the code below will give us ISL_AUX_OP_NONE and
we'll return without doing anything.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This is quite a bit cleaner because we now sync the clear values at the
same time as we do the fast clear. For loading the clear values into
the surface state, we now do it once when we handle the LOAD_OP_LOAD
instead of every subpass.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This requires us to ditch the VkAttachmentReference struct in favor of
an anv-specific struct. However, we can now easily identify from just
the subpass attachment what kind of an attachment it is. This will make
iteration over anv_subpass::attachments a little easier in some case.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This moves the decision out of begin_subpass and into BeginRenderPass
like the decision for color clears. We use a similar name for the
function for depth/stencil as for color even though no aux usage is
really getting computed.
v2 (Jason Ekstrand):
- Don't always disable HiZ clears by accident
- Use the initial layout to decide whether to do fast clears
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This is similar to blorp_gen8_hiz_clear_attachments except that it takes
actual images instead of trusting in the already set depth state.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This doesn't really change much now but it will give us more/better
control over clears in the future. The one interesting functional
change here is that we are now re-emitting 3DSTATE_DEPTH_BUFFERS and
friends for each clear. However, this only happens at begin_subpass
time so it shouldn't be substantially more expensive.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This is a bit less awkward than passing in the subpass because it means
we don't have to extract the subpass id from the subpass.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This seems slightly more correct because it means that the flushes
happen before any clears or resolves implied by the subpass transition.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Previously, we just used all the channels regardless of the format.
This is less than ideal because some channels may have undefined values
and this should be ok from the client's perspective. Even though the
driver should do the correct thing regardless of what is in the
undefined value, it makes things less deterministic. In particular, the
driver may choose to fast-clear or not based on undefined values. This
level of nondeterminism is bad.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This fixes a pile of hangs caused by the recent shuffling of resolves
and transitions. The particularly problematic case is when you have at
least three attachments with load ops of CLEAR, LOAD, CLEAR. In this
case, we execute the first CLEAR followed by a MI memcpy to copy the
clear values over for the LOAD followed by a second CLEAR. The MI
commands cause the first CLEAR to hang which causes us to get stuck on
the 3DSTATE_MULTISAMPLE in the second CLEAR.
We also add guards for BLORP to fix the same issue. These shouldn't
actually do anything right now because the only use of indirect clears
in BLORP today is for resolves which are already guarded by a render
cache flush and CS stall. However, this will guard us against potential
issues in the future.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Nanley Chery <nanley.g.chery@intel.com>
Was hitting an assert with vs-varying-array-mat4-index-col-row-wr.shader_test
When eliminating a copy, we were dropping the use_count of the mov that
is skipped, but not increasing the use_count of it's src instruction.
Fixes: 76440fcca9 freedreno/ir3: clean up dangling false-dep's
Signed-off-by: Rob Clark <robdclark@gmail.com>
In the case of NVIDIA hardware, ABGR is displayable but ARGB is not.
Only advertise the one set in the visuals list.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Split up the op properties table into generation-specific bits, and only
use the kepler ones on kepler. Fixes some CTS images tests.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
All this information can be retrieved from the TIC directly. Avoid
having to dip into the constbuf information about the image.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
By default, 3DSTATE_CONSTANT_* Constant Buffer 0 is relative to dynamic
state base address. This makes it unusable for pushing UBOs.
There is a bit in the INSTPM register (or CS_DEBUG_MODE2 on Skylake)
which controls whether buffer 0 is relative to dynamic state base
address, or simply a normal pointer. Setting that gives us full
flexibility. This lets us push up to 4 UBO ranges.
We can't currently write this on Haswell and earlier, and will need
to update the kernel command parser, and then do the whole version
checking song and dance. We also need a brand new kernel that supports
context isolation - on older kernels, newly created contexts inherit
register state from whatever happened to be running. So, setting this
would have catastrophic impact on other drivers such as libva, Beignet,
or older Mesa.
See commit 8ec5a4e4a4 where we did this
once before, but had to revert it in commit 013d331220.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Kernel 4.16 has proper context isolation, which means we can change
the L3 configuration without worrying about that leaking to other
newly created contexts, breaking the assumptions of other userspace.
So, disable our workaround to reprogram it back to the default.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
GP10B requires the use of GP100_COMPUTE_CLASS instead of
GP104_COMPUTE_CLASS as is used for other non-GP100 chips.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The previous commit reworked the checks intel_from_planar() to check the
right individual cases for regular/planar/aux buffers, and do size
checks in all cases.
Unfortunately, the aux size check was broken, and required the aux
surface to be allocated with the correct aux stride, but full image
height (!).
As the ISL aux surface is not recorded in the DRIimage, we cannot easily
access it to check. Instead, store the aux size from when we do have the
ISL surface to hand, and check against that later when we go to access
the aux surface.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Fixes: c2c4e5bae3 ("i965: Fix bugs in intel_from_planar")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
State trackers must use a user buffer or const_uploader,
or set pipe_resource::flags same as const_uploader->flags.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The PIPE_CONTROL command description says:
"Whenever a Binding Table Index (BTI) used by a Render Taget Message
points to a different RENDER_SURFACE_STATE, SW must issue a Render
Target Cache Flush by enabling this bit. When render target flush
is set due to new association of BTI, PS Scoreboard Stall bit must
be set in this packet."
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Meta is awful and we'd like to stop using it. Implementing this using
BLORP allows us to stop trashing a bunch of GL state every time.
This follows the structure of st_generate_mipmap().
compute_num_levels is lifted directly from there.
Improves performance in Gl41HdrBloom by about 11.794% +/- 1.01919% (n=3)
on Kabylake GT2 at 1280x720 (the difference seems much smaller at higher
resolutions).
v2 (idr): Don't try depth or depth-stencil blorp blits on Gen4 or Gen5
because it's not implemented yet.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
I want to use compute_num_levels inside i965. Rather than duplicating
it, move it from mesa/st to core Mesa, and make it non-static.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Consolidate archrst draw events into single draw event with an attribute
that represents the type of draw
- Add handlers for new private proto versions of DrawInstancedEvent,
DrawIndexedInstancedEvent, DrawInstancedSplitEvent, and
DrawIndexedInstancedSplitEvent
- Convert the draw events to generic DrawInfoEvents
- parse_proto_event_fields() replaces 'AR_DRAW_TYPE' as a field type with
'uint32_t'. This draw type is actually an enum, but can be represented
as an unsigned integer.
- is_draw_or_dispatch() recognizes DrawInfoEvent as a draw event
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Convert portions of the C sampler to the rasty SIMD lib.
Also fix SRL call with a non-immediate. Don't count on the compiler
automagically converting an srli call to srl if the shift count isn't
an immediate.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Use a new function to denote that we want to get offset to next component
and hide the fact that GEP is used underneath.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Use llvm intrinsic masked.gather instead of manual unroll for the cases
where we have vector of pointers. Improves llvm IR debug experience by
reducing a ton of IR to a single intrinsic call. Also seems to reduce
overall stack use considerably.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Fix invalid number of attributes passed into tesselation PA.
Needs to take into account any offsets from the shader.
Innocuous issue, but removes an assert firing in debug.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Split into two proto files and modify appropriate build rules for
configure / scons / meson builds.
There are private internal events (proxy) that communicate information
from rasterizer to ArchRast. ArchRast can use these events to calculate
a final answer and then emit other public events which will be saved to
file. Users will use the public proto file and not the private one.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
As the comment notes: linux-dmabuf has nothing to do with wayland-drm,
but we need a single place to build these files we can use from both EGL
and Vulkan, which is guaranteed to be included before both EGL and
Vulkan WSI.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
v2: just tell the compiler to assume the format will always be found, as
it comes from the table itself to begin with. (DanielS)
CID: 1429516
Fixes: d32b23f383 "egl/wayland: Add bpp to visual map"
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Doesn't really change anything to the test though ¯\_(ツ)_/¯
CID: 1429511
Fixes: e8495646af "glsl/tests: changes to test_disk_cache_create test"
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The gallium query types changed, so we need to remap from the
gallium ones to the virgl ones.
Fixes:
dEQP-GLES3.functional.transform_feedback.basic_types*
"This also fixes:
dEQP-GLES3.functional.transform_feedback.array.separate*
dEQP-GLES3.functional.transform_feedback.array_element*
dEQP-GLES3.functional.transform_feedback.interpolation.*
Gallium's p_defines.h and virglrenderer's p_defines.h have diverged
quite a bit, so not including
PIPE_QUERY_OCCLUSION_PREDICATE_CONSERVATIVE there makes sense for now."
- Gurchetan Singh
Fixes: 3f6b3d9db (gallium: add PIPE_QUERY_OCCLUSION_PREDICATE_CONSERVATIVE)
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Tested-by: Gurchetan Singh <gurchetansingh@chromium.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
From PIPE_CONTROL command description in gfxspecs:
"Whenever a Binding Table Index (BTI) used by a Render Taget Message
points to a different RENDER_SURFACE_STATE, SW must issue a Render
Target Cache Flush by enabling this bit. When render target flush
is set due to new association of BTI, PS Scoreboard Stall bit must
be set in this packet."
V2: Move the PIPE_CONTROL to update_renderbuffer_surfaces() in
brw_wm_surface_state.c (Ken).
Fixes a fulsim error and a GPU hang described in below JIRA.
JIRA: MD5-322
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Sampling from hiz is enabled in i965 for GEN9+ but this feature has
been removed from gen11. So, this new flag will be useful to turn
the feature on/off for different gen h/w. It will be used later
in a patch adding device info for gen11.
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
SIMD4x2 dispatch mode has been removed in GEN11. We're not using
it anyways in Mesa. Adding few asserts to make it explicit.
Use GEN_GEN macro in place of devinfo->gen (Ken)
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Nothing is changed here from gen10 to gen11. So, just update
the assert.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
StateCacheInvalidation is required on all gen7+ platforms. We
don't need to update this check for every new gen h/w unless
this requirement is changed. So, dropping the check for latest
gen h/w.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These only existed to avoid making people update libdrm for new uABI
headers. A while ago we imported those headers into the Mesa repo,
so the dependency is gone and these are no longer useful.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
r325155 ("Pass a reference to a module to the bitcode writer.")
and
r325160 ("Pass module reference to CloneModule")
change function interface from pointer to reference.
v2: Fix indentation (tab instead of spaces)
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This needs to link the state tracker with --whole-archive to expose the
right symbols.
v4: - Always add libswdri and libswkmsdri to the link_with list
Fixes: 22a817af8a ("meson: build gallium xvmc state tracker")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This needs to use --whole-archive (link_whole in meson) to properly
expose symbols.
v4: - Always add libswdri and libswkmsdri to link_with list
Fixes: 0ba909f0f1 ("meson: build gallium xa state tracker")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This needs to use --whole-archive (link_whole in meson) to properly
expose symbols.
v4: - Always add libswdri and libswkmsdri to link_with
Fixes: 1d36dc674d ("meson: build gallium omx state tracker")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The state tracker needs to be linked with whole-archive (like
autotools). As a result there are symbols from libswdri and libswkmsdri
that are needed, so link those as well.
v4: - Always add libswdri and libswkmsdri to link_with list
Fixes: 5a785d51a6 ("meson: build gallium va state tracker")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The VDPAU state tracker needs to be linked with whole-archive (autotools
does this). Because we are linking the whole archive we alos need to
link with libswdri and libswkmsdri if those have been enabled.
v4: - Always add libswdri and libswkmsdri to link_with list
Fixes: 68076b8747 ("meson: build gallium vdpau state tracker")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This makes the dependencies easier to manage, since each media target
doesn't need to worry about linking to half a dozen libraries.
Fixes: b1b65397d0 ("meson: Build gallium auxiliary")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This allows these variables to unconditionally included in `link_with`
lists, even if they're not used. This allows deleting duplicated logic
in nearly every gallium target implemented in meson today. This also
removes the now useless `build_by_default` flag from swdri and swkmsdri.
v4: - add this patch
Fixes: 66c94b9313
("meson: build gallium winsys for dri, null, and wrapper")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This reverts commit 10d1b0be8e.
This is unnecessary, the depend_files argument is for adding
dependencies on files that are not part of the input, which is already
done.
cc: Jason Ekstrand <jason.ekstrand@intel.com>
Fixes: 10d1b0be8e
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Move the calls to svga_hwtnl_set_fillmode() and svga_hwtnl_set_flatshade()
out of the two retry_draw_*() functions to the svga_draw_vbo() function.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This fixes a few Piglit transform feedback regressions caused by
commit 7a1401938b.
In that change I moved the moved svga_update_state() into the loops,
after the calls to svga_hwtnl_set_flatshade(). But
svga_hwtnl_set_flatshade() actually depends on some derived shader
state. This patch moves the svga_update_state() call into
svga_draw_vbo() so it's not duplicated in two places.
Fixes: 7a1401938b ("svga: clean up retry_draw_range_elements(),
retry_draw_arrays()")
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
If we fail to compile the normal VS or FS we fall back to a simple/
dummy shader. We need to rescan the the shader to update the shader
info. Otherwise, this can lead to further translations failures
because the shader info doesn't match the actual shader.
Found by adding some extra debug assertions in the state-update code
while debugging something else.
v2: also update shader generic_inputs/outputs, etc. per Charmaine
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
In theory this might lead to corruption if we bind a descriptor
set which is unused, because LLVM is smart and it can re-use
unused user SGPRs. In practice, this doesn't seem to fix
anything.
As a side effect, this will reduce the number of emitted
SH_REG packets.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Although meta shaders don't use any vertex buffers, there is no
behaviour change but I think it's better to do this. Though,
this saves two user SGPRs for push constants inlining or
something else.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It was assumed that fromPlanar() could return NULL to mean
that the planar image is the same as the parent DRI image.
That assumption wasn't made everywhere though.
Let's fix things and make sure that all callers understand
a NULL result
Signed-off-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Daniel Stone <daniels@collabora.com>
The ARB_viewport_array spec says:
"Dependencies
OpenGL 1.0 is required.
OpenGL 3.2 or the EXT_geometry_shader4 or ARB_geometry_shader4 extensions
are required.
This extension is written against the OpenGL 3.2 (Compatibility)
Specification."
As such, we should ignore it for GLES2 contexts.
Fixes:
dEQP-GLES2.functional.state_query.integers.viewport_getinteger
dEQP-GLES2.functional.state_query.integers.viewport_getfloat
on llvmpipe and virgl.
v2: Use _mesa_has_* (Ilia)
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Cc: 17.3 18.0 <mesa-stable@lists.freedesktop.org>
Generators really are never the thing you want. The problem in this case
is that a generator must create a file that contains any file that the
generated target depends on. Since brw_oa.py doesn't generate such a
file the generated sources are not regenerated even if the xml files
they should depend on changes.
While we could change brw_oa.py to write such a file, that's silly, it
depends on itself and the xml file. So we'll just use a custom target
instead, which will have the correct dependency behavior and doesn't
really add that much code.
Fixes: 3218056e0e ("meson: Build i965 and dri stack")
CC: Ian Romanick <idr@freedesktop.org>
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
From Skylake PRM Surface Formats section:
"The surface format for the typed atomic integer operations must
be R32_UINT or R32_SINT."
Fixes an error and a piglit GPU hang in simulation environment.
Piglit test: gl45-imageAtomicExchange-float.shader_test
Suggested-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.co
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "18.0 17.3" <mesa-stable@lists.freedesktop.org>
When initializing mcs, map with MAP_RAW and fill in the linear
map. Removes a place where gtt mapping is used.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In all current uses, the linear surface is only allocated starting
at (xt1, yt1) anyway, so this improves the calling ergonomics.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
TileY's low 6 address bits are: v1 v0 u3 u2 u1 u0
Thus a cache line in the tiled surface is composed of a 2d area of
16x4 bytes of the linear surface.
Add a special case where the area being copied is 4-line aligned
and a multiple of 4-lines so that entire cache lines will be
written at a time.
On Apollolake, this increases tiling throughput to wc maps by
84.0103% +/- 0.862818%
v2: Split [y0, y1) and [y2, y3) loops apart for clarity (Jason Ekstrand)
v3: Don't reset src var (Jason), Ensure y0 <= y1 <= y2 <= y3
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This code to re-set the type of the source and destination is not
necessary since we never manipulate the types. Looks like a
left over from a time where we had to retype to float temporarily
to handle 64-bit inputs.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Divide it by two as we do for other stages. This is because the
component layout qualifier is always in 32-bit units.
Fixes issues in a new CTS test (still WIP):
KHR-GL45.enhanced_layouts.varying_double_components
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This fixes a regression in the broadcast color to all color bufs case.
Fixes: 6c691081a (r600: fixup sparse color exports.)
Signed-off-by: Dave Airlie <airlied@redhat.com>
I found a shader with
DCL TEMP[0], LOCAL
DCL TEMP[1..256], ARRAY(1), LOCAL
DCL TEMP[257..512], ARRAY(2), LOCAL
DCL TEMP[513..768], ARRAY(3), LOCAL
DCL TEMP[769], LOCAL
This would remap badly, as it would add up all the spilled sizes
and subtract it from the temp for 0. If the current temp is less
than the array start break out.
Fixes: 1d871aa6 (r600g: Implement spilling of temp arrays (v2))
Signed-off-by: Dave Airlie <airlied@redhat.com>
Shaders coming from dx10 state trackers have a RET before the END.
And the epilog needs to be placed before the RET (otherwise it will
get ignored).
Hence figure out if a RET is in main, in this case we'll place
the epilog there rather than before the END.
(At a closer look, there actually seem to be problems with control
flow in general with output redirection, that would need another
look. It's enough however to fix draw's aa line emulation in some
internal bug - lines tend to be drawn with trivial shaders, moving
either a constant color or a vertex color directly to the output).
v2: add assert so buggy handling of RET in main is detected
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Nobody queries these and nobody sets them to anything useful,
the docs say TODO.
Drop them until a use appears.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Add support for GL_NUM_SHADING_LANGUAGE_VERSIONS
and glGetStringi for GL_SHADING_LANGUAGE_VERSION
v2:
- Combine similar functionality into
_mesa_get_shading_language_version() function.
- Change GLSL version return mechanism.
v3:
- Add return of empty string for GLSL ver 1.10.
- Move _mesa_get_shading_language_version() function
to src/mesa/main/version.c.
v4:
- Add OpenGL version check.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104915
Signed-off-by: Andriy Khulap <andriy.khulap@globallogic.com>
Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The EXTRA_VERSION_40 predicate is tested as part of
extra_gl40_ARB_sample_shading but there was no switch case for it.
Fixes: 77b440e42d ("mesa: Add new functions and enums required
by GL_ARB_sample_shading")
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
This struct allows us to report:
- accurate max point size/line width.
- accurate texel and texture gather offsets
- vertex/geometry limits.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Let's use the new gl_state_index16 type everywhere and remove
the typecasts.
This helps reduce the size of gl_program_parameter.
Reviewed-by: Brian Paul <brianp@vmware.com>
All drivers convert these to float, so there is no reason to use double.
The piglit test that expects double precision from glGet will be adjusted
not to require it (there is a piglit patch).
gl_context::ViewportArray: 512 -> 384 bytes
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
GL allows doing glTexEnv on 192 texture units, while in reality,
only MaxTextureCoordUnits units are used by fixed-func shaders.
There is a piglit patch that adjusts piglits/texunits to check only
MaxTextureCoordUnits units.
Reviewed-by: Brian Paul <brianp@vmware.com>
We were setting current_pipeline to UINT32_MAX and then calling
cmd_cmd_state_reset which memsets the entire state struct to 0 which
implicitly resets current_pipeline to 3D. I have no idea how this
hasn't caused everything to explode.
Fixes: cd3feea745 "anv/cmd_buffer: Rework anv_cmd_state_reset"
cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The previous code was trying to avoid non-existent layers by taking a
MAX with anv_image_aux_layers. Unfortunately, it wasn't taking into
account that layer_count starts at base_layer which may not be zero.
Instead, we need to subtract base_layer from anv_image_aux_layers with
a guard against roll-over.
Fixes: de3be61801 "anv/cmd_buffer: Rework aux tracking"
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This commit fixes two bugs in intel_from_planar. First, if the planar
format was non-NULL but only had a single plane, we were falling through
to the planar case. If we had a CCS modifier and plane == 1, we would
return NULL instead of the CCS plane. Second, if we did end up in the
planar_format == NULL case and the modifier was DRM_FORMAT_MOD_INVALID,
we would end up segfaulting in isl_drm_modifier_has_aux.
Cc: mesa-stable@lists.freedesktop.org
Fixes: 8f6e54c929
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Even switching the def's condition to be the same chip revision check as
the use, the compiler doesn't figure it out. Just NULL-init it.
Fixes: ec53e52742 ("ac/nir: Add ES output to LDS for GFX9.")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
My gcc doesn't figure out that dims >= 1 (seems reasonable), and doesn't
notice that ddmax is used from the same no_rho_opt as its initialization.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The kernel used to have execbuf parameters to program the INSTPM bit
for whether 3DSTATE_CONSTANT_* should be relative to dynamic state
base address or an absolute address. However, they never worked in
the presence of hardware contexts, so I deleted them a while back.
It doesn't make sense to set this flag, as it doesn't exist anymore.
It also never did anything anyway - the flag is zero, so |'ing it in
did nothing. The default is relative anyway.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The compiler doesn't know that num_visuals > 0.
Fixes: 37a8d907cc ("egl/gbm: Ensure EGLConfigs match GBM surface format")
Reviewed-by: Daniel Stone <daniels@collabora.com>
Flush a resource's previous write_batch synchronously. Because a
resource's associated batches are not updated until after the flush
thread submits rendering to the kernel, this was causing a bit of
confusion in the following loop. This fixes a bug that appeared with
recent stk.
Perhaps we need to re-work things a bit to clear out dependent patches
in the ctx's thread and use a fence to deal with the period between
when a flush is queued and when it is submitted to the kernel. But
this will do until time permits a larger refactor.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Because of loops, we can't schedule all of a block's predecessors first.
Instead just assume that the result consumed in a block was written far
enough away in all paths into a block. And do an intra-block scheduling
pass to figure out if there are any cases where we need to insert extra
nop's. This works out better than always assuming the worst case (ie.
that a value live into a block was written in the last instruction in
the predecessor block).
Signed-off-by: Rob Clark <robdclark@gmail.com>
Normally false-deps are not something to consider, since they mostly
exist for delay-slot related reasons:
* barriers
* ordering writes after read
* SSBO/image access ordering
The exception is a false-dependency on an array store.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Previously we didn't handle flow control in legalize, and instead just
set (ss)(sy) on the first instruction in every block. Which isn't very
clever.
Instead, consider output state of all predecessor blocks, so we only
set a sync bit if needed for any possible path leading into a block.
Because of loops, we can't require that all successor blocks are
legalized before a given block, so instead run in a loop until results
converge.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Maybe there is a better way for this.. where it comes useful is "array"
loads, which end up as a false-dep for a later array store.
If all the uses of an array load are CP'd into their consumer, it still
leaves the dangling array load, leading to funny things like:
mov.u32u32 r5.y, r0.y
mov.u32u32 r5.y, r0.z
Signed-off-by: Rob Clark <robdclark@gmail.com>
Generally seems to do worse on instruction count and register usage,
according to shader-db. But shader-db also doesn't do a very good job
of weighting loop bodies, so that might not be totally valid.
So add an env variable to enable GCM pass for easier experimentation.
Signed-off-by: Rob Clark <robdclark@gmail.com>
There are more useful nir passes added since initial conversion to nir.
But ir3 was never updated to use them.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Agressively lowering all if/else to selects in some extreme cases
results in much higher register pressure. Using peephole select instead
with a modest threshold speeds up alu2 4x!
16 seems like a good limit, low enough to help alu2 but not too low that
it penalizes everything else. With a bit better scheduling of the
instruction that moves a value into a predicate register, we might be
able to lower this limit a bit more in the future, but since we need 6
cycles from the move to predicate register to predicated branch, that
puts some sort of lower bound on how far we can lower this threshold.
Signed-off-by: Rob Clark <robdclark@gmail.com>
nir's from_ssa pass is much better at avoiding inserting extra moves
than our logic is. And lowering phi webs to regs just treats anything
involved in a phi web as an array of length=1. Which with previous
array related fixes in RA/etc ends up working out quite well. This cuts
down on extra instructions and also helps with register pressure.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Makes it easier to compare values seen in-game (where there are many
shaders) to cmdline standalone compiler.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Seems to be there since a3xx, but we always lowered fsat. But we can
shave some instructions, especially in shaders that use lots of
clamp(foo, 0.0, 1.0) by not lowering fsat.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Use livein state of other blocks to extend liverange of arrays when they
are still needed by successor blocks.
Signed-off-by: Rob Clark <robdclark@gmail.com>
When eliminating movs, the instruction that is now directly using the
src of the mov has the same scheduling order constraints as the original
mov instruction.
Signed-off-by: Rob Clark <robdclark@gmail.com>
The number of bits depends on generation. But printing negative values
with a5xx encoding (largest size) but compiling for a3xx or a4xx, would
result in negative values printed as large positive values.
I guess in practice huge negative branch offsets aren't likely (and if
that is the case, the shader is probably too big to grok by reading the
assembly). So just print using smallest bitfield size.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Fix the following:
warning: format ‘%lx’ expects argument of type ‘long unsigned int’, but
argument 3 has type ‘uint64_t {aka long long unsigned int}.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We move the nir check before the shader cache call so that we can
call a nir based caching function in a following patch.
Also with this change we simply check if vertex shaders support
NIR rather than looping over the stages as mixing of shader types
is not supported anyway.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Clover now checks PIPE_SHADER_CAP_SUPPORTED_IRS for native support instead.
This change indirectly enables NIR support for compute shaders
on radeonsi.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
PIPE_SHADER_CAP_PREFERRED_IR was conflicting with PIPE_SHADER_IR_NIR
for compute shaders, so we let clover pick the one it wants to use.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes the following piglit test:
./bin/arb_vertex_attrib_64bit-check-explicit-location -auto -fbo
Where we would end up with the nir such as:
vec1 64 ssa_11 = pack_64_2x32_split ssa_9, ssa_10
vec1 32 ssa_12 = f2f32 ssa_2
And our pack_64_2x32_split nir to llvm code always produces
a 64bit integer as output.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
When we create an EGL window surface on a GBM surface, ensure that the
EGLConfig is compatible with the GBM format, notwithstanding XRGB/ARGB
interchange.
For example, rendering with an XRGB8888 EGLConfig on to an ARGB8888
gbm_surface (and vice-versa) are acceptable, but rendering with an
XRGB2101010 EGLConfig on to an XRGB8888 gbm_surface will now be
rejected.
This was previously allowed through; when 10bpc formats were enabled,
clients which picked a completely random EGL config and hoped/assumed
they were XRGB8888 would break.
If you have bisected a failure to start a GBM/KMS client to this commit,
please look at its EGLConfig selection (e.g. through eglChooseConfigs),
and add an EGL_NATIVE_VISUAL_ID == gbm_surface format match to the
attribs for config selection.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Now that we have mask/channel information in gbm_dri's format conversion
table, we can remove the copy in EGL.
As this table contains more formats (notably including R8 and RG8, which
can be used for BO but not surface allocation), we now compare the masks
of all channels when trying to find a suitable config. Without doing
this, an XRGB8888 EGLConfig would match on an R8 format.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Each Wayland EGLDisplay currently contains a struct with one vector of
modifiers per format, hardcoded in the header. To allow easier support
for more formats, turn this into an array of u_vectors which is opaque
outside of platform_wayland.c.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Both the DRI2 GetBuffersWithFormat interface, and SHM buffer allocation,
had their own format -> bpp lookup tables. Replace these with a lookup
into the visual map.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
When trying to translate between DRIImage format enums and FourCC codes,
use our visual map rather than an open-coded subset.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
When creating a wl_buffer on an upstream Wayland display from an
existing EGLImage, use the dri2_wl_visual map rather than another
hardcoded list of formats.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Extend the visual map from only containing names and bitmasks, to also
carrying the three format enums we need. These are the DRIImage format
tokens for internal allocation, FourCC codes for wl_drm and dmabuf
protocol, and wl_shm codes for swrast drivers.
We will later use these formats to eliminate a bunch of open-coded
conversions.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Widen the channel masks given in the visual table to the full width of
the pixel format, i.e. as many leading zeros as required.
No functional change.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
When 0b2b719121 moved from an if tree to a struct to map between
wl_drm formats and EGLConfigs, it transposed the mapping between XRGB
and ARGB. Luckily, everyone exposes both formats, so this is harmless.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Fixes: 0b2b719121 ("egl/wayland: introduce dri2_wl_add_configs_for_visuals() helper")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Non-MRT cases always translate blend state for 1 color buffer only.
MRT cases only check and translate blend state for enabled color buffers.
This also avoids an assertion failure in translate_blend for:
dEQP-GLES31.functional.draw_buffers_indexed.overwrite_common.common_advanced_blend_eq_buffer_blend_eq
Reviewed-by: Eric Anholt <eric@anholt.net>
Include the Mesa version and detail about the platform.
Signed-off-by: Mark Thompson <sw@jkqxz.net>
Reviewed-by: Christian König <christian.koenig@amd.com>
It is present from libva 2.1 (VAAPI 1.1.0 or higher).
Signed-off-by: Mark Thompson <sw@jkqxz.net>
Reviewed-by: Christian König <christian.koenig@amd.com>
This patch moves disk cache path and index creation back to the
constructor which matches previous behavior. We still allow create
to succeed without path so that cache can be used with callback
functionality.
Fixes: c95d3ed091 "disk cache: create cache even if path creation fails"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Initialize to TGSI_TEXTURE_BUFFER (== 0), same as was done before the
variable type was changed to enum tgsi_texture_type.
Fixes a bunch of piglit failures with radeonsi, e.g.:
gles-3.0-transform-feedback-uniform-buffer-object: ../../../../src/gallium/auxiliary/tgsi/tgsi_util.c:502: tgsi_util_get_texture_coord_dim: Assertion `!"unknown texture target"' failed.
Corresponding compiler warning:
CXX state_tracker/st_glsl_to_tgsi.lo
../../../src/mesa/state_tracker/st_glsl_to_tgsi.cpp: In function ‘pipe_error st_translate_program(gl_context*, uint, ureg_program*, glsl_to_tgsi_visitor*, const gl_program*, GLuint, const ubyte*, const ubyte*, const ubyte*, const ubyte*, const ubyte*, GLuint, const ubyte*, const ubyte*, const ubyte*)’:
../../../src/mesa/state_tracker/st_glsl_to_tgsi.cpp:5992:23: warning: ‘tex_target’ may be used uninitialized in this function [-Wmaybe-uninitialized]
ureg_memory_insn(ureg, inst->op, dst, num_dst, src, num_src,
~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
inst->buffer_access,
~~~~~~~~~~~~~~~~~~~~
tex_target, inst->image_format);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../../src/mesa/state_tracker/st_glsl_to_tgsi.cpp:5866:27: note: ‘tex_target’ was declared here
enum tgsi_texture_type tex_target;
^~~~~~~~~~
Fixes: 9f9ce1625f ("st/mesa: use TGSI enum types in st_glsl_to_tgsi.cpp")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is already handled at link_uniform_blocks, specifically at
process_block_array_leaf.
Additionally, this code was not handling correctly arrays of
arrays. When creating the name of the block to set the binding, it
only took into account the first level, so any attempt to set a
explicit binding on a array of array ubo would trigger an assertion.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Instead of updating all modified gl_vertex_array_object::_VertexArray
entries just update those that are modified and enabled.
Also release buffer object from the _VertexArray that belong
to disabled attributes.
v2: Also set Ptr and Size to zero.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Set the _DrawArray pointer to NULL when calling into the Drivers
Bitmap/CopyPixels/DrawAtlasBitmaps/DrawPixels/DrawTex hooks.
This fixes an assert that gets uncovered when the following
patch gets applied.
v2: Mute from within the state tracker instead of generic mesa.
v3: Avoid evaluating _DrawArrays from within st_validate_state.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
When changing the attribute binding in the VAO we also need to
account for getting rid of non vbo bits from VertexAttribBufferMask.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
When NIR is enabled, TGSI must not be used. When NIR is disabled, TGSI
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Now that we're tracking aux properly per-slice, we can enable this for
applications which actually care.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This commit completely reworks aux tracking. This includes a number of
somewhat distinct changes:
1) Since we are no longer fast-clearing multiple slices, we only need
to track one fast clear color and one fast clear type.
2) We store two bits for fast clear instead of one to let us
distinguish between zero and non-zero fast clear colors. This is
needed so that we can do full resolves when transitioning to
PRESENT_SRC_KHR with gen9 CCS images where we allow zero clear
values in all sorts of places we wouldn't normally.
3) We now track compression state as a boolean separate from fast clear
type and this is tracked on a per-slice granularity.
The previous scheme had some issues when it came to individual slices of
a multi-LOD images. In particular, we only tracked "needs resolve"
per-LOD but you could do a vkCmdPipelineBarrier that would only resolve
a portion of the image and would set "needs resolve" to false anyway.
Also, any transition from an undefined layout would reset the clear
color for the entire LOD regardless of whether or not there was some
clear color on some other slice.
As far as full/partial resolves go, he assumptions of the previous
scheme held because the one case where we do need a full resolve when
CCS_E is enabled is for window-system images. Since we only ever
allowed X-tiled window-system images, CCS was entirely disabled on gen9+
and we never got CCS_E. With the advent of Y-tiled window-system
buffers, we now need to properly support doing a full resolve of images
marked CCS_E.
v2 (Jason Ekstrand):
- Fix an bug in the compressed flag offset calculation
- Treat 3D images as multi-slice for the purposes of resolve tracking
v3 (Jason Ekstrand):
- Set the compressed flag whenever we fast-clear
- Simplify the resolve predicate computation logic
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Even though the blorp pass looks a bit on the sketchy side, the end
result in the Vulkan driver is very nice. Instead of having this weird
case where you do a fast clear and then maybe have to resolve, we just
do the ambiguate and are done with it. The ambiguate does exactly what
we want of setting all the CCS values to 0 which puts it into the
pass-through state.
This should also improve performance a bit in certain cases. For
instance, if we did a transition from UNDEFINED to GENERAL for a surface
that doesn't have CCS enabled all the time, we would end up doing a
fast-clear and then a full resolve which ends up touching every byte in
the main surface as well as the CCS. With the ambiguate pass, that
transition only touches the CCS.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Now that this isn't a multi-case if and it's just the one case, it's a
bit clearer if the condition is just part of the if instead of being
pulled out into a boolean variable.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This pass performs an "ambiguate" operation on a CCS-compressed surface
by manually writing zeros into the CCS. On gen8+, ISL gives us a fairly
detailed notion of how the CCS is laid out so this is fairly simple to
do. On gen7, the CCS tiling is quite crazy but that isn't an issue
because we can only do CCS on single-slice images so we can just blast
over the entire CCS buffer if we want to.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
The current strategy we use for managing resolves has an issues where we
track clear colors and the need for resolves per-LOD but we still allow
resolves of only a subset of the slices in any given LOD and doing so
sets the "needs resolve" flag for that LOD to false while leaving the
remaining layers unresolved. This patch is only the first step and does
not, by itself fix anything. However, it's fairly self-contained and
splitting it out means any performance regressions should bisect to this
nice obvious commit rather than to the giant "rework aux tracking"
commit.
Nanley and I did some testing and none of the applications we tested
even tried to fast-clear anything other than the first slice of an
image. The test was done by adding a printf right before we call
blorp_fast_clear if we were every going to touch any slice other than
the first with a fast-clear. Due to the way the original code was
structured, this would not have included applications which only cleared
a subset of layers. The applications tested were:
* All Sascha Willems demos
* Aztec Ruins
* Dota 2
* The Talos Principle
* Mad Max
* Warhammer 40,000: Dawn of War III
* Serious Sam Fusion 2017: BFE
While not the full list of shipping applications, it's a pretty good
spread and covers most of the engines we've seen running on our driver.
If this is ever shown to be a performance problem in the future, we can
reconsider our strategy.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Currently, this helper does nothing but we call it every place where an
image is written through the render pipeline. This will allow us to
properly mark the aux state so that we can handle resolves correctly.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This moves it to being based on layout_to_aux_usage instead of being
hard-coded based on bits of a priori knowledge of how transitions
interact with layouts. This conceptually simplifies things because
we're now using layout_to_aux_usage and layout_supports_fast_clear to
make resolve decisions so changes to those functions will do what one
expects.
There is a potential bug with window system integration on gen9+ where
we wouldn't do a resolve when transitioning to the PRESENT_SRC layout
because we just assume that everything that handles CCS_E can handle it
all the time. When handing a CCS_E image off to the window system, we
may need to do a full resolve if the window system does not support the
CCS_E modifier. The only reason why this hasn't been a problem yet is
because we don't support modifiers in Vulkan WSI and so we always get X
tiling which implies no CCS on gen9+. This patch doesn't actually fix
that bug yet but it takes us the first step in that direction by making
us actually pick the correct resolve op. In order to handle all of the
cases, we need more detailed aux tracking.
v2 (Jason Ekstrand):
- Make a few more things const
- Use the anv_fast_clear_support enum
v3 (Jason Ekstrand):
- Move an assert and add a better comment
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
If the function gets passed ANV_AUX_USAGE_DEFAULT, it still has the old
behavior of setting ISL_AUX_USAGE_NONE for depth/stencil which is what
we want for blits/copies.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This replaces image_fast_clear and ccs_resolve with two new helpers that
simply perform an isl_aux_op whatever that may be on CCS or MCS. This
is a bit cleaner as it separates performing the aux operation from which
blorp helper we have to call to do it.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Right now, we have different entrypoints and enums in blorp for these
different operations. This provides us a central enum which we can
begin to transition to.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
v2: * Check whether the node src and dst registers are NULL before using
them.
* fix a type in the commit message.
Two cases are handled with this patch:
1. If copy propagation tries to eliminated a move from a relative
array access then it could optimize
MOV R1, ARRAY[RELADDR_1]
MOV R2, ARRAY[RELADDR_2]
OP2 R3, R1 R2
into
OP2 R3, ARRAY[RELADDR_1], ARRAY[RELADDR_2]
which is forbidden, because there is only one address register available.
2. When MULADD(x,a,MUL(x,c)) is handled
MUL TMP, R1, ARRAY[RELADDR_1]
MULLADD R3, R1, ARRAY[RELADDR_2], TMP
by folding this into
ADD TMP, ARRAY[RELADDR_2], ARRAY[RELADDR_1]
MUL R3, R1, TMP
which is also forbidden.
Test for these cases and reject the optimization if a forbidden combination
of relative access would be created.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103142
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
On r600 we use the scratch mem with read/read_ind, in that case
sb should track the rw_gpr as a dst instead of a src.
This stops the whole shader being optimised out.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Spills have to happen after the VLIW bundle currently
processed, so defer emitting the spill op.
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This fixes:
KHR-GL45.texture_gather.swizzle
on cayman and redwood.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
NIR->LLVM should only be a translation pass, and all scan stuff
should be done before.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Add a build option to control building some of the misc tools we
have. Also set the executables to install, presumably you want
that if you're asking for the build.
v2: set 'install:' to the with_tools value, not true (Jordan)
handle 'all' in a the comma list (Dylan)
Add freedreno's tools (Dylan)
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
TGSI_INTERPOLATE_CONSTANT and TGSI_INTERPOLATE_LOC_CENTER have the
value zero so there's no change in behavior. It seems funny to
declare these fs input registers with constant interpolation. But
it looks like ureg_DECL_input_layout() is not called anywhere and
ureg_DECL_input() is only called from
util_make_geometry_passthrough_shader().
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
And add a default switch case to silence a compiler warning.
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
'indirect_params' was a bit vague. Use the names that we use in
gallium's pipe_draw_indirect_info.
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The initial revision of the patch adding loadable configs was testing
the feature's availability by adding a new config successfully and
then removing it.
A second version tested the availability just by exercising the
removal. But some unused code remained.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This helps figuring out potential problems when metrics don't show up
on frameretrace for example.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This ports the texture gather integer workaround from radeonsi.
This fixes:
KHR-GL45.texture_gather.plain-gather-uint/int*
v2: add rect support, fix 2d array shadow
Reviewed-by: Roland Scheidegger <sroland@vmware.com> (on irc)
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is taken from Glenn Kennards scratch series, but separated
out as a cleanup by me.
Reviewed-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The hw gives us coverage for pixel, not for individual fragment shader
invocations, in case execution isn't per pixel (eg, unlike cm, actually
cannot do "real" minSampleShading, it's either per-pixel or per-fragment,
but it doesn't really make a difference here).
Also, with msaa disabled, the hw still gives us a mask corresponding to
the number of samples, where GL requires this to be 1.
Fix this up by masking the sampleMaskIn bits with the bit corresponding to
the sampleID, if we know this shader is always executed at per-sample
granularity. (In case of a per-sample frequency shader and msaa disabled,
the sampleID will always be 0, so this works just fine there.)
Fixing this for the minSampleShading case will need a shader key (radeonsi
uses the prolog part for) (for eg, could get away with a single bit, cm
would need more bits depending on sample/invocation ratio, or read the
bits from a uniform), unless we'd want to always use a sample mask uniform
(which is probably not a good idea, as it would make the ordinary common
msaa case slower for no good reason).
This fixes some parts of piglit arb_sample_shading-samplemask (with fixed
test), in particular those which use a sampleID, still failing others
as expected.
Reviewed-by: Dave Airlie <airlied@redhat.com>
For some reason, we were iterating through the code twice (first just for
instructions needing barycentrics, then for instructions and input dcls).
Move things around slightly so this is no longer necessary.
There also was a unnedeed enabling of the fixed_pt_position_gpr - this is only
needed if the per-sample interpolation comes from an input, not from an
instruction (just move the assert where it belongs) (since the sample id to
sample from comes from a tgsi src in this case, and isn't sampleID).
Otherwise there should be no functional change.
Reviewed-by: Dave Airlie <airlied@redhat.com>
This parameter for _mesa_get_min_incations_per_fragment() was once used
by the intel driver, but it's long gone.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Dave Airlie <airlied@vmware.com>
Ported from the radeonsi GL_AMD_pinned_memory implementation.
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It seems these were missed when struct pipe_context * argument was
added to hud_graph::query_new_value.
Fixes: 3132afdf4c "gallium/hud: pass pipe_context explicitly to most functions"
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Building gallium is faster by 7.5 seconds on a 4core/8thread 3GHz CPU.
(gallium build time is reduced by 15% when building only radeonsi)
Non-recursive makefiles are great!
The array members are have type 'struct gl_buffer_object *'
Found by coverity.
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We need this to handle some oddball dx10 format
(DXGI_FORMAT_R10G10B10_XR_BIAS_A2_UNORM). What you can do with this
format is very limited, hence we don't want to add it as a gallium
format (we could not express the properties of this format as
ordinary format properties neither, so like all special formats
it would need specific code for handling it in any case).
While here, also nuke the array for different shaders for different
writemasks, as it was not actually used (always full masks are
passed in for generating shaders).
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The writemask handling was busted, since writing defaults to output
meant they got overwritten by the tex sampling anyway. Albeit the
affected components were undefined, so maybe with some luck it
still would have worked with some drivers - if not could as well
kill it... (This would have affected u_blitter but not u_blit since
the latter always used xyzw mask.)
Reviewed-by: Brian Paul <brianp@vmware.com>
We don't need the library if we don't build tests, and building
it adds a dependency on gtest which adds a dependency on cxxabi.h.
Fixes: 6569b33b6e "mesa/st/tests: unify MockCodeLine* classes"
Reviewed-By: Gert Wollny <gw.fossdev@gmail.com>
v2: add disk_cache_has_key, disk_cache_put_key support
using blob cache (Nicolai, Jordan)
v3: rename set_cb as put_cb to match existing naming (Timothy)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This patch makes disk_cache initialize path and index lazily so
that we can utilize disk_cache without a path using callback
functionality introduced by next patch.
v2: unmap mmap and destroy queue only if index_mmap exists
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Next patch will allow disk_cache instance to be created without
path set for it, modify some test cases that assume disk_cache
creation to fail with invalid path. Creation should succeed but
simple put/get test fail.
v2: leave tests as is but check that both cache struct exists
and try simple put/get that should fail with invalid path set
(Emil)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Ported from RadeonSI.
Only one F1 2017 shader is affected, code size decreased
from 532 to 488 on both Polaris10 and Vega10.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Although on gen8+ platforms we can in theory use 3DSTATE_VF_SGVS
to put these beyond the last vertex element it seems that we still
need to allocate the SVGS element, otherwise we have observed cases
where we end up reading garbage. Specifically, the CTS test mentioned
below was flaky with a fail rate of ~1% on some gen9+ platforms caused
by reading garbage for the gl_InstanceID value. The flakyness goes
away as soon as we start allocating the SVGS element.
v2:
- Do this for gen8+, not just gen9+, and pull the boolean
outside the #if block (Jason)
Fixes flaky test:
KHR-GL45.vertex_attrib_64bit.limits_test
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104335
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Using 'extern "C"' around includes is always incorrect, as the header may
contain C++ symbols (as it does in this case), which means it cannot use
C linkage. In this case the header has a template in it, which obviously
cannot be linked with C linkage rules.
Fixes: a29ad2b421 ("mesa/tests: Add tests for the generated dispatch table")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Instead just set the proper -I flags and include it from a more standard
path. In this case we'll add -Isrc/mesa (which is common), and #include
main/foo.h.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Since the type is gl_vertex_array. Update comment to explain that
these arrays are only used by the VBO module.
Also rename some local variables in _mesa_update_vao_derived_arrays().
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
Instead of testing for formats==NULL everywhere, just point formats at
a dummy array which will be discarded.
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Clang defines __GNUC__ macro, so one doesn't need to check __clang__
macro in this particular case.
v2: added comment as per Brian Paul's suggestion
Reviewed-by: Brian Paul <brianp@vmware.com>
Use a new helper function, st_access_flags_to_transfer_flags(), to
convert the GL_MAP_x flags to PIPE_TRANSFER_x flags.
We'll be able to use this function in a couple other places.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Split out some of the code into three new helper functions:
buffer_target_to_bind_flags(), storage_flags_to_buffer_flags(),
buffer_usage() to make the code more managable.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
LLVM can't shrink loads.
Polaris10:
Totals from affected shaders:
SGPRS: 62528 -> 59955 (-4.11 %)
VGPRS: 44708 -> 44616 (-0.21 %)
Spilled SGPRs: 16 -> 8 (-50.00 %)
Code Size: 1355504 -> 1355172 (-0.02 %) bytes
Max Waves: 11710 -> 11670 (-0.34 %)
Vega10:
Totals from affected shaders:
SGPRS: 51448 -> 50371 (-2.09 %)
VGPRS: 39140 -> 39048 (-0.24 %)
Spilled SGPRs: 16 -> 16 (0.00 %)
Code Size: 1307188 -> 1304296 (-0.22 %) bytes
Max Waves: 11312 -> 11292 (-0.18 %)
This reduces SGPRs spilling in MadMax, and it also reduces
number of SGPRs in DOW3 and F12017. The number of waves slightly
decreases in F1 but I don't see any performance changes after
benchmarking it. Talos and Serious Sam are not affected because
they don't use any push constants.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This is a very simple pass that just shrinks load_push_constant
intrinsics when some components are unused. For now, it can just
shrink vec4 to vec3, vec3 to vec2 and so on.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This fixes a regression for now, in the future we should gather
the used components properly.
V2: just set for VS and correctly handle doubles
Fixes: be973ed21f "radeonsi: load the right number of components for VS inputs and TBOs"
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
And use it in the enable code path.
Move _mesa_update_attribute_map_mode into its only remaining file.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
The mutex is currently used for reference counting and updating
the minmax index cache.
The change uses atomics directly for reference counting and
the mutex for the minmax cache.
This is safe since the reference count is not modified beside
in _mesa_reference_buffer_object where atomics aim to be used.
While using the minmax cache, the calling code holds a reference
to the buffer object. Thus unreferencing or even referencing the
buffer object does not need to be serialized with accessing
the minmax cache.
The change reduces the time _mesa_reference_buffer_object_ takes
by about a factor of two when looking at perf results for some
of my favorite use cases.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
If we have gaps in the shader mask we have to have 0x1 in them
according to a comment in radeonsi, and this is required to fix
the test at least on cayman.
We also need to record the highest one written to write to the
ps exports reg.
This fixes:
KHR-GL45.enhanced_layouts.fragment_data_location_api
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
If we only get 1,2,3,6 framebuffers we want a sparse target mask.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Since enhanced layouts allows setting specific MRT outputs, we
can get sparse outputs, so we have to calculate the shader
mask earlier.
v1.1: update checks for state update (Roland)
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Set render cond and emit atom.
Fixes:
KHR-GL45.compute_shader.conditional-dispatching
Reviewed-by: Roland Scheidegger <sorland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We need to get the grid sizes earlier to fill in to the const
buffer.
Fixes:
KHR-GL45.compute_shader.built-in-variables
and
KHR-GL45.compute_shader.dispatch-indirect
Reviewed-by: Roland Scheidegger <sorland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This cleans up and fixes the previous fix even more.
Buffers from textures start at max const,
buffers from buffers/images come in from the 168 offset.
This fixes a bunch of:
KHR-GL45.shader_storage_buffer_object*
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
For buffers we want the size in bytes,
For images we want it in elements.
This fixes:
KHR-GL45.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-pad
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
for frag shaders we get a value in the key, I expect I need
to make compute work better
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The compute emission path always emits this currently, and emitting
it on the fragment path breaks the blitter.
This fixes gpu hangs in KHR-GL45.compute_shader.resource-texture
Reviewed-by: Roland Scheidegger <sorland@vmware.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This has 4 srcs.
This fixes:
KHR-GL45.shader_atomic_counter_ops_tests.ShaderAtomicCounterOpsExchangeTestCase
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
With sb enabled on cayman, this was overwriting the proper
cf index value with random ones if the dst gpr was 2 or 3,
only save the value for a MOVA instruction.
Fixes:
KHR-GL45.gpu_shader5.uniform_blocks_array_indexing
(on cayman with sb)
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes a few CTS cases in :
KHR-GL45.texture_view.view_sampling
some multisample cases are still broken, but not sure this is
the same problem.
v2: fix more cases
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
RX550 fails
dEQP-VK.renderpass.suballocation.multisample.d32_sfloat_s8_uint.samples_2
So increase the range of the workaround.
Fixes: f4c534ef6 (radv: don't enable tc compat for d32s8 + 4/8 samples (v1.1))
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fix INVALID_OPERATION caused by BufferData with target
EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD when the buffer size is
not page aligned.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Cc: 17.3 18.0 <mesa-stable@lists.freedesktop.org>
In file included from ../src/gallium/targets/dri/target.c:1:
In file included from ../src/gallium/auxiliary/target-helpers/drm_helper.h:8:
../src/util/xmlpool.h:103:10: fatal error: 'xmlpool/options.h' file not found
See also 26bde1e3.
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
When using print_draw_arrays for debugging, we were printing an "n"
amount of vertex but that meant not to print all the size in the "n"
vertex, depending on the stride used.
Now we print the whole size in the "n" vertex.
Cc: Mathias Fröhlich <mathias.froehlich@web.de>
Cc: Brian Paul <brianp@vmware.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
The loop goes through the list of enabled extensions marking them as
enabled in the list, but this relies on every other extension being
initialized to false by default.
This bug would make us, for example, advertise certain device extension
entry points as available even when the corresponding extensions had
not been enabled.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes: abc62282b5 "anv: Add a per-device table of enabled extensions"
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
The SPIR-V parser splits in/out struct variables and creates
a separate variable for each first-level member of the struct.
When the struct variable has an initializer this means that we also
need to split the initializer.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Otherwise loop unrolling will fail to see the actual cost of
the unrolling operations when the loop body contains 64-bit integer
instructions, and very specially when the divmod64 lowering applies,
since its lowering is quite expensive.
Without this change, some in-development CTS tests for int64
get stuck forever trying to register allocate a shader with
over 50K SSA values. The large number of SSA values is the result
of NIR first unrolling multiple seemingly simple loops that involve
int64 instructions, only to then lower these instructions to produce
a massive pile of code (due to the divmod64 lowering in the unrolled
instructions).
With this change, loop unrolling will see the loops with the int64
code already lowered and will realize that it is too expensive to
unroll.
v2: Run nir_algebraic first so we can hopefully get rid of some of
the int64 instructions before we even attempt to lower them.
Reviewed-by: Matt Turner <mattst88@gmail.com>
CC r600_shader.lo
r600_shader.c: In function ‘egcm_int_to_double’:
r600_shader.c:4543:12: error: ‘ctx’ is a pointer; did you mean to use ‘->’?
if (ctx.bc->chip_class == CAYMAN)
^
->
Fixes: 35b4301577 ("r600/fp64: fix integer->double conversion")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Doing a straight uint/int->fp32->fp64 conversion causes
some precision issues, Roland suggested splitting the
integer into two portions and doing two separate
int->fp32->fp64 conversions then adding the results.
This passes the tests in CTS and piglit.
[airlied: fix cypress conversion opcodes]
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
set suitable defaults for 'dri-drivers', 'gallium-drivers', 'vulkan-drivers'
and 'platforms' options for osx, windows and cygwin, adding cygwin where
appropriate.
v2: error() for unknown OS
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Ken called this out in review, but it seems I forgot to make the change.
I noticed that the control flow annotations in the fragment shader
disassembly of tests/shaders/glsl-fs-loop-continue.shader_test were not
correct, and moving this line to the correct place fixes it.
According with OpenGL GLSL 3.20 spec, section 4.3.9:
"It is a link-time error if any particular shader interface
contains:
- two different blocks, each having no instance name, and each
having a member of the same name, or
- a variable outside a block, and a block with no instance name,
where the variable has the same name as a member in the block."
This fixes a previous commit 9b894c8 ("glsl/linker: link-error using the
same name in unnamed block and outside") that covered this case, but
did not take in account that precision qualifiers are ignored when
comparing blocks with no instance name.
With this commit, the original tests
KHR-GL*.shaders.uniform_block.common.name_matching keep fixed, and also
dEQP-GLES31.functional.shaders.linkage.uniform.block.differing_precision
regression is fixed, which was broken by previous commit.
v2: use helper varibles (Matteo Bruni)
Fixes: 9b894c8 ("glsl/linker: link-error using the same name in unnamed block and outside")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104668
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104777
CC: Mark Janes <mark.a.janes@intel.com>
CC: "18.0" <mesa-stable@lists.freedesktop.org>
Tested-by: Matteo Bruni <matteo.mystral@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
If extensions GL_KHR_texture_compression_astc_hdr or
GL_KHR_texture_compression_astc_sliced_3d are implemented then ASTC
format are supported in CompressedTex*Îmage3D.
Fixes KHR-GLES2.texture_3d.* with this format.
CC: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
build_id_find_nhdr_for_addr() fails to find the build-id if the first LOAD
segment has a virtual address other than 0x0.
For most shared libraries, the first LOAD segment has vaddr=0x0:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x00000000 0x00000000 0x2d2e26 0x2d2e26 R E 0x1000
LOAD 0x2d2e54 0x002d3e54 0x002d3e54 0x2e248 0x2f148 RW 0x1000
However, compiling the Intel Vulkan driver as 32-bit binary on Android produces
the following ELF header with vaddr=0x8000 instead:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000034 0x00008034 0x00008034 0x00100 0x00100 R 0x4
LOAD 0x000000 0x00008000 0x00008000 0x224a04 0x224a04 R E 0x1000
LOAD 0x225710 0x0022e710 0x0022e710 0x25988 0x27364 RW 0x1000
build_id_find_nhdr_callback() compares the address of dli_fbase from dladdr()
and dlpi_addr from dl_iterate_phdr(). With vaddr > 0, these point to a
different memory address, e.g.:
dli_fbase=0xd8395000 (offset 0x8000)
dlpi_addr=0xd838d000
At least on glibc and bionic (Android) dli_fbase refers to the address where
the shared object is mapped into the process space, whereas dlpi_addr is just
the base address for the vaddrs declared in the ELF header.
To compare them correctly, we need to calculate the start of the mapping
by adding the vaddr of the first LOAD segment to the base address.
Note: musl users will need the following patch.
https://git.musl-libc.org/cgit/musl/commit/?id=b3ae7beabb9f0c219bb8a8b63567a01c6530c1ac
Cc: Chad Versace <chadversary@chromium.org>
Cc: <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104642
Fixes: 5c98d38 "util: Query build-id by symbol address, not library name"
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Logics that related to dual instances encode should only be done for
H264, not other codecs.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Add HEVC picture desc, and add codec check when creating and destroying
context.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Move all H264 encode related functions into separate file. Similar to
VAAPI decode side, there will be separate file for each codec on encode
side as well.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Implement encoding of sps, pps, vps, aud, and slice headers for HEVC
based on HEVC specs.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Pass pipe_picture_desc instead of pipe_h264_enc_picture_desc so that
it can be used for different codecs. Add functions to handle picture
parameters that will be used for HEVC encode.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Add vcn encode interface for HEVC, and rename radeon_enc_h264_enc_pic
to radeon_enc_pic since radeon_enc_pic is used by both H264 and HEVC.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Obviously it would be good to have an ADD and a MUL and a signal together,
but we can even potentially have multiple signals merged, as well.
total instructions in shared programs: 100423 -> 97874 (-2.54%)
instructions in affected programs: 78812 -> 76263 (-3.23%)
We emit some MOVs to track lifetimes of payload registers, but we don't
need there to be actual MOV instructions for them.
total instructions in shared programs: 101045 -> 100423 (-0.62%)
instructions in affected programs: 37083 -> 36461 (-1.68%)
If this is an image buffer, we need to calculate the correct resource
id.
Fixes:
KHR-GL45.shader_image_size.*
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes a crash in:
KHR-GL45.texture_cube_map_array.texture_size_compute_sh.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
When the disk shader cache CI testing was enabled, we started noticing
occasional failures on deqp test runs. (Mainly SNB, rarely HSW)
Before this change, when we cleared the (in memory) program cache we
reused the same bo. Since the disk shader cache quickly restores
programs, it appears that this would lead to overwrites of the older
program binaries in the in memory program cache that apparently were
still executing in some cases. If these programs were still executing,
this could cause a GPU hang.
This issue is probably not disk shader cache specific, but may have
been hidden due to the compiler taking time to recompile programs
after the cache was cleared.
v2:
* Don't add `copy` param to brw_cache_new_bo (Ken)
* Call from brw_program_cache_check_size (Ken)
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This should increase performance by reducing SDRAM bank conflicts when
crossing between UIF columns (particularly on power-of-two height
textures).
The uif_xor_disable setup is dropped, since we need to allow XOR on lower
miplevels even when level 0 is XOR. The level 0 force UIF and level 0 XOR
flags should handle setting XOR properly on imported buffers.
The alignment here means that we can't get back the padded height from the
size/stride any more, so it's now a field in the slice as well.
Fixes piglit fbo-generatemipmap-formats RGBA16 NPOT.
The VC5 HW puts A in the low bits and R in the high bits. We can't just
swizzle in the shaders because the blending HW can't pick what channel A
is in, so make a new format to match it.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
swapBytes operates on bytes, not 4-bit channels, so you can't just take
non-swapBytes cases and flip the REV flag.
Avoids piglit texture-packed-formats regressions when enabling the
ABGR4444 format.
Fixes: c5a5c9a7db ("mesa/formats: add new mesa formats and their pack/unpack functions.")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Move generated files from codegen/meson.build to other directories, in order
to satisfy generated include file dependencies
Add correct file lists for architecture-specific libraries.
cc: mesa-stable@lists.freedesktop.org
cc: dylan@pnwbakers.com
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Currently we always check for 3.9.0, which is pretty safe since
everything except radv work with >= 3.9 and 3.9 is pretty old at this
point. However, radv actually requires 4.0, and there is a patch for
radeonsi to do the same.
Fixes: 673dda8330 ("meson: build "radv" vulkan driver for radeon hardware")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Currently there is not a separate option for setting the search path of
DRI drivers in meson, like there is in scons and autotools. This is an
oversight and needs to be fixed. This adds an extra option
`dri-search-path`, which will default to the value of
`dri-drivers-path`, like autotools does.
v2: - Split input list before joining.
v3: - use : instead of ; as the delimiter. The autotools help string
incorrectly says ; but the code uses :
v4: - Take list in pre : delimited form (Ilia)
- Ensure that the dri-search-path is absolute when using
dri_drivers_path
Fixes: db9788420d ("meson: Add support for configuring dri drivers directory.")
Reported-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net> (v2)
Reviewed-by: Eric Engestrom <eric@engestrom.ch> (v3)
On travis, for OSX, python2 from homebrew is pre-installed. per [1]:
python points to the macOS system Python (with no manual PATH modification)
python2 points to Homebrew’s Python 2.7.x (if installed)
python3 points to Homebrew’s Python 3.x (if installed)
pip doesn't exist
pip2 points to Homebrew’s Python 2.7.x’s pip (if installed)
pip3 points to Homebrew’s Python 3.x’s pip (if installed)
We will end up using 'python2' for building mesa.
Just use 'pip2' instead of 'pip', as that seems to work for all platforms on
travis.
[1] https://docs.brew.sh/Homebrew-and-Python.html
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Use a '|' YAML literal block to avoid the convoluted syntax needed to put
the entire conditional on a single line.
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
An additional stub for applegl_create_context() is needed
Cannot test indirect API as it's not built on osx, currently
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
downsize_format_if_needed takes an integer as number of uploads
parameter. Hence, let's do an integer comparation instead of a boolean
check, since that is confusing.
Since we are at it, fix a couple of wrongly tabbed indents.
Cc: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
This only affects drivers that set DriverFlags.NewBlend.
v2: - fix typo advanded -> advanced
- return "enum gl_advanced_blend_mode" from
_mesa_get_advanced_blend_sh_constant
- don't call FLUSH_VERTICES twice
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The old one generates useless instructions in there, found while
comparing geometry shaders between RadeonSI and RADV.
This improves all Vulkan demos that use geometry shaders, +4%
for deferredshadows, +9% for viewportarray, +7% for
geometryshader on Polaris10.
This seems to also improve DOW3 a little bit (+1%).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
I think the cp packets can be made work, but I think it might
need a kernel change, so for now just do the worst thing.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
By the looks of it it seems hemlock is treated separately to cypress, but
certainly it won't need the stack workarounds cedar/redwood (and
seemingly every other eg chip except cypress/juniper) need.
(Discovered by accident.)
Acked-by: Alex Deucher <alexander.deucher@amd.com>
This passes the CTS and piglit tests.
This also disable sb for helper invocations until it doesn't
mess up the VPM flags.
Thanks to Ilia and Glenn for advice, and Roland for working
out the working evergreen path.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
deqp does not allow any KHX extensions, and since deqp is included
in android-cts, android does not allow any khx extensions.
So disable VK_KHX_multiview on android.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
CC: 18.0 <mesa-stable@lists.freedesktop.org>
Using the newly introduced VAO array maps, we can
simplify vbo_bind_vertex_list.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Using the newly introduced VAO array maps, we can
simplify vbo_exec_bind_arrays.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Using the newly introduced VAO state variable, we can
simplify recalculate_input_bindings.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Instead of each context having its own map instance for
this purpose, use a global static const map.
v2: s,unsigned char,GLubyte,g
s,_VP_MODE_MAX,VP_MODE_MAX,g
Change comment style.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Since the first material attribute no longer aliases with
the generic0 attribute, only aliasing between generic0 and
position is left and entirely dependent on the enabled
state of the VAO. So introduce a gl_attribute_map_mode
in the VAO that is used to track how the position
and the generic 0 attribute alias.
Provide a static const array that can be used to
map from vertex program input indices to VERT_ATTRIB_*
indices. The outer dimension of the array is meant to
be indexed directly by the new VAO member variable.
Also provide methods on the VAO to convert bitmasks of
VERT_BIT's from the VAO numbering to the vertex processing
inputs numbering.
v2: s,unsigned char,GLubyte,g
s,_ATTRIBUTE_MAP_MODE_MAX,ATTRIBUTE_MAP_MODE_MAX,g
Change comment style, add comments.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
The materials are now moved to the end of the
generic attributes block to the range 4-15.
Before, the way the position and generic 0 attribute
is handled was dependent on the presence and kind of
the currently attached vertex program. With this
change the way the position attribute and the generic 0
attribute is treated only depends on the enabled
flag of those two arrays.
This will later help to untangle the update dependencies
between enabled arrays and shader inputs.
v2: s,VERT_ATTRIB_MAT_OFFSET,VERT_ATTRIB_MAT0,g
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Instead of just assuming that the material attributes
just overlap with the generic attributes 0-12, give
them symbolic defines so that we can easier move them
to an other range.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
When executing a display list draw, for the offset
list to be correct, the offset computation needs to
accumulate all attribute size values in order.
Specifically, if we are shuffling around the position
and generic0 attributes, we may violate the order or
if we do not walk the generic vbo attributes we may
skip some of the attributes.
Even if this is an unlikely usecase we can fix this use
case by precomputing the offsets on the full attribute list
and store the full offset list in the display list node.
v2: Formatting fix
v3: Rebase
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Once a lp_build_sampler_soa or lp_build_sampler_aos object is created,
it should never be modified. Found by inspection.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
In vbo_draw_indirect_prims() pass the 'indirect_data' argument to
vbo->draw_prims(). All the callers are passing ctx->DrawIndirectBuffer
so this should be no functional change. Add a (temporary) assertion to
be sure.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
And rename indirect_params -> indirect_draw_count_buffer and
indirect_params_offset -> indirect_draw_count_offset to be more
specific.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
This parameter (from the glMultiDrawArraysIndirectCountARB function)
is poorly named. It's an offset into the buffer which contains the
number of primitives to draw.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
Get rid of a bunch of goto spaghetti. Remove unneeded do_retry parameter.
No Piglit changes. Also tested w/ Google Earth and other apps.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
The image_h for the tiling algorithm needs to be the padded-to-a-uifblock
height of the level, not the unpadded height or the height of level 0.
Fixes some cases of KHR-GLES3.texture_repeat_mode.* and
depthstencil-render-miplevels.
I thought I didn't need this because I was doing level-0-always-UIF and
that the pad there would propagate down, but it turns out that for level 1
the padding ends up being chosen by the HW. This brings us closer to
being able to turn on UIF XOR for increased performance, as well.
If we just make another gallium surface for the separate stencil, it's a
lot easier to keep track of which set of fields we're using in RCL setup.
This also incidentally fixes a little bug in setting up the surface's
padded height for separate stencil when the UIF-ness changes at different
levels of Z versus stencil.
This matches the naming of the other hub regs we get, and I don't know for
sure if UIFCFG will be the same register between the hub and the cores on
all versions.
Take into account the resource format, instead of applying a hardcoded
32bpp. This not only over-allocates 16bpp formats, but also results in
a wrong stride being filled into the handle.
Fixes: 848b49b288 ("gallium: add renderonly library")
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Daniel Stone <daniels@collabora.com>
In commit 3f353342a6 (present in 17.3.0)
we started unconditionally using I915_EXEC_NO_RELOC, which was
introduced in Linux v3.9. ChromeOS kernel 3.8 has backported this,
so it should work too.
Running on older kernels would likely result in every single batch
being rejected by the kernel, which is pretty catastrophic. Yet, it
appears that nobody noticed. So, let's just bump the official
requirement and move forward ever so slowly.
Fixes: 3f353342a6 ("i965: Use I915_EXEC_NO_RELOC")
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Only dive into the windows subdir if windows platform is selected.
Signed-off-by: Marc Dietrich <marvin24@gmx.de>
Fixes: 5ef75cb02b "meson: build src/glx/windows"
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
mesa/src/glx/glxcmds.c:1295:21: error: implicit declaration of function 'env_var_as_boolean' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
mesa/src/glx/apple/apple_visual.c:85:28: error: implicit declaration of function 'env_var_as_boolean' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Don't want an overly large numBufferBarriers/numTextureBarriers to blow
up the stack.
v2: handle malloc errors
v3: fix patch
v4: initialize texObjs/bufObjs
Suggested-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
When the application doesn't provide its own pipeline cache,
the driver uses a in-memory cache but it shouldn't insert any
entries when the cache is explicitely disabled by the user.
Found while running my experimental pipeline-db tool with a
ton of shaders, the memory footprint was just huge, and sometimes
the process was even killed...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The Vulkan spec says:
"pipelineBindPoint is a VkPipelineBindPoint indicating whether
the descriptors will be used by graphics pipelines or compute
pipelines. There is a separate set of bind points for each of
graphics and compute, so binding one does not disturb the other."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104732
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
the vpm bit wasn't being applied to the push/pop instructions.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The vtx operations never got translated, so things worked by
0 being equal to 0, translate them so we can use the proper buffer
resinfo code.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
All of the current gallium nir driver use these optimisations but
they do so in their backends. Having these called in the backend
only can cause a number of problems:
- Shader compile times are greater because the opts need to do
significant passes over all shader variants.
- The shader cache is partially defeated due to the significant
optimisation passes over variants.
- We might miss out on nir linking optimisation opportunities.
Adding these passes to st_nir_opts() alleviates these problems.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The emission of vertex attributes corresponding to dvec3 and dvec4
vertex shader input variables was not correct when the <size> passed
to the VertexAttribL* commands was <= 2.
In 61a8a55f55 ("i965/gen8: Fix vertex attrib upload for dvec3/4
shader inputs"), for gen8+ we needed to determine if the attrib was
dual slot to emit 128 or 256-bit, independently of the VAO size.
Similarly, for gen < 8 we also need to determine whether the attrib is
dual slot to force the emission of 256-bits through 2 uploads.
Additionally, we make use of the ISL_FORMAT_R32_FLOAT format in this
second upload to fill these unspecified components with zeros, as we
also do for gen8+.
Fixes the following test on Haswell:
KHR-GL46.vertex_attrib_binding.basic-inputL-case1
v2: Added more inline comments to explain why we are using
ISL_FORMAT_R32_FLOAT and its consequences, as requested by
Alejandro and Antía.
Fixes: 75968a668e ("i965/gen7: expose OpenGL 4.2 on Haswell when
supported")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103006
Cc: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: Juan A. Suarez Romero <jasuarez@igalia.com>
Cc: Antia Puentes <apuentes@igalia.com>
Cc: Rafael Antognolli <rafael.antognolli@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Antia Puentes <apuentes@igalia.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This requires moving the _MaxLevel handling up to the callers. Another
user of intel_finalize_mipmap_tree will be added later that depends on
_MaxLevel not being modified.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
brw_bo_wait_rendering used to take a brw_context pointer for perf_debug
messages about stalls. Chris eliminated that in 833108ac14.
This message about passing NULL to avoid those warnings is no longer
relevant, and just adds confusion. So, drop it.
This reverts commit 513c2263cb.
_mesa_base_fbo_format_ is used to validate the internalformat
passed to RenderbufferStorage, which in the OpenGL 4.6 is said:
"An INVALID_ENUM error is generated if internalformat is not one of the
color-renderable, depth-renderable, or stencil-renderable formats defined
in section 9.4."
RGB9_E5 format is not renderable, as stated in the same specification
(Bug 9338).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104794
Cc: Juan A. Suarez Romero <jasuarez@igalia.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
This can lead to a situation where cache flushes could get conditionally
disabled while still clearing the flush_bits, and thus flushes due to
application pipeline barriers may never get executed.
Fixes: a6c2001ace (radv: add support for cmd predication.)
Signed-off-by: Dave Airlie <airlied@redhat.com>
This reverts part of the patch which introduced the GLenum16 change.
Fixes a conform regression found by Roland.
Fixes: f96a69f916 ("mesa: replace GLenum with GLenum16 in
common structures (v4)")
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
If we ever hit this edge-case, it can theoretically cause problem for
CNL because we could end up changing render targets without re-emitting
3DSTATE_MULTISAMPLE which is part of the pipeline. Just get rid of the
edge case.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
min(a+b, c+d) >= 0 becomes (a+b >= 0 && c+d >= 0).
No shader-db changes, but it does prevent 6 to 12 instruction
regressions in the next patch on all measured Intel platforms.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
v2: Rebase on almost 2 years. Require that one of the arguments to fmin
or fmax be used only once. This prevents some regressions.
shader-db results:
Skylake and Broadwell had similar results. Skylake shown.
total instructions in shared programs: 14526021 -> 14525913 (<.01%)
instructions in affected programs: 4613 -> 4505 (-2.34%)
helped: 31
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 3.48 x̃: 4
helped stats (rel) min: 0.62% max: 6.67% x̄: 3.31% x̃: 2.42%
total cycles in shared programs: 533118710 -> 533118403 (<.01%)
cycles in affected programs: 34334 -> 34027 (-0.89%)
helped: 24
HURT: 0
helped stats (abs) min: 4 max: 24 x̄: 12.79 x̃: 14
helped stats (rel) min: 0.25% max: 2.40% x̄: 1.08% x̃: 1.03%
No changes on GM45, Iron Lake, Sandy Bridge, Ivy Bridge, or Haswell.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
We need this to be able to support the interpolateAt builtins in a
sane way. It also leads to the generation of more optimal code.
The lowering and splitting is made conditional on lower_all_io_to_temps
because vc4 and freedreno both expect these passes to be enabled and
niether support glsl 400 so don't need to deal with the interpolateAt
builtins.
We leave the other stages for now as to avoid regressions. Ideally we
could remove the stage checks and just set the nir options correctly
for each stage. However all gallium drivers currently just use return
the same nir compiler options for all stages, and it's probably more
trouble than its worth to change this.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We need this because we will always copy fs outputs to temps and
split the arrays, but do not want to do either of these with fs
inputs as it is unnessisary and makes handling interpolateAt
builtins difficult.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We need this to be able to support the interpolateAt builtins in a
sane way. It also leads to the generation of more optimal code.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We need to collect this when scanning over the instruction rather
than when scanning over the inputs otherwise we might get confliting
values for inputs that are use by the interpolateAt* builtins.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Also moved everything in a struct and then return the struct from
the helper function, so it is clear in the caller what part of the
pipeline gets modified.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This gives about 2% performance improvement on dota2 for me.
This is mostly a mechanical copy and replacement, but at bind time
we still do:
1) Some stuff that is only based on num_samples changes.
2) Some command buffer state setting.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This fixes the piglit test:
spec/ext_semaphore/api-errors/usigned-byte-i-v-bad-value
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This allows the client to actually query the enums specified in the
ext_external_objects spec.
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This fixes the following piglit tests:
spec/ext_semaphore_fd/api-errors/import-semaphore-fd-bad-enum
spec/ext_memory_object_fd/api-errors/import-memory-fd-bad-enum
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
When calling si_fence_server_sync(), the wait operation is associated
with the next kernel submission. Therefore, any unflushed work
submitted previous to fence_server_sync() will also be affected by
the wait.
To avoid adding the dependency to the unflushed work, we flush before
emitting the fence dependency.
v2: s/semaphore/fence
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Syncobj based waits or signals only happen at submission boundaries. In
order to guarantee that the requested signal event will occur when the
state tracker requested it, we must issue a flush.
v2: s/fence/semaphore for pipe objects
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Hook up importing semaphores of type PIPE_FD_TYPE_SYNCOBJ
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Add the ability to signal a syncobj when a cs completes execution.
v2: corresponding changes for gallium fence->semaphore rename
v3: s/semaphore/fence for pipe objects
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Bits to implement ServerWaitSemaphoreObject/ServerSignalSemaphoreObject
v2:
- corresponding changes for gallium fence->semaphore rename
- flushing moved to mesa/main
v3: s/semaphore/fence for pipe objects
v4: add bitmap flushing
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Memory synchronization is left for a future patch.
v2: flush vertices/bitmaps moved to mesa/main
v3: removed spaces before/after braces
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
EXT_semaphore and EXT_semaphore_fd define no pnames. Therefore there
isn't much to do besides determining the correct error code.
v2: removed useless return
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Used by EXT_semmaphore and EXT_semaphore_fd
v2: Removed unnecessary dummy callback initialization
v3: Fixed attempting to free the DummySemaphoreObject
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Guarded by PIPE_CAP_SEMAPHORE_SIGNAL
v2: corresponding changes for PIPE_CAP_SEMAPHORE_SIGNAL rename
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Calling this function will emit a fence signal operation into the
GPU's command stream.
v2: documentation typos
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Denotes that a fd is backed by a synobj. For example, radv shared
semaphores.
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
An fd can potentially have different types of objects backing it.
Specifying the type helps us make sure we treat the FD correctly.
This is in preparation to allow importing syncobj fence FDs in addition
to native sync FDs.
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This function can get access for a 64-bit dvec4, which means we
have to load 8 components.
This fixes:
R600_DEBUG=nir ./bin/shader_runner generated_tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/fs-abs-dvec4.shader_test -auto
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Some of the enable/disable vertex array functions take a zero-based
generic index, while others take a VERT_ATTRIB_GENERIC0-based value.
Add an assertion to clarify that in one place.
Reviewed-by: Gert Wollny <gw.fossdev@gmail.com>
In creating fixed function vertex shader hash keys do only
care for producing the varying output if fog is enabled and the
varing is consumed in the fragment stage.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Using lower alignment restrictions for the state key fields finally
yields to a smaller hashing state key.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Remove set but not read field from the state key used for hashing
fixed function vertex shaders.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Remove set but not read field from the state key used for hashing
fixed function vertex shaders.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
For the state key for hashing fixed function vertex shaders, the
texgen_enabled field requires only a single bit.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
For the state key for hashing fixed function
vertex shaders, encode the different fog modes, including
if fog is generally enabled or not, into a 2 bit field.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
For the state key for hashing fixed function
vertex shaders, the information is only evaluated
if lighting is generally switched on.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
For the state key for hashing fixed function
vertex shaders, The varying_vp_inputs bitmask already
contains the point size array enabled information.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
If a shader only writes to an output via a constant initializer we
need to lower it before we call nir_remove_dead_variables so that
this pass sees the stores from the initializer and doesn't kill the
output.
Fixes test failures in new work-in-progress CTS tests:
dEQP-VK.spirv_assembly.instruction.graphics.variable_init.output_vert
dEQP-VK.spirv_assembly.instruction.graphics.variable_init.output_frag
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
texture_format_error_check_gles() displays error like "glTexImage%dD".
This patch just replace the %d by the correct dimension.
Signed-off-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Inspired by Marek's earlier patch, but even smaller. Sort fields from
largest to smallest. Use bitfields for more fields (sometimes with an
extra bit for MSVC). Reduce Stride field to GLshort.
Note that some fields cannot be bitfields because they're accessed via
pointers (such as for glEnableClientState(GL_VERTEX_ARRAY) to set the
Enabled field).
Reduces size from 48 to 24 bytes.
Also reduces size of gl_vertex_array_object from 3632 to 2864 bytes.
And add some assertions in init_array().
v2: use s/GLuint/unsigned/, improve commit comments.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Inspired by Marek's earlier patch, but goes a little further.
Sort fields from largest to smallest. Use bitfields.
Reduced from 48 bytes to 32. Also reduces size of gl_vertex_array_object
from 4144 to 3632
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
v2: - fix glGet*
- also use GLenum16 for DrawBuffers
v3: - rebase to top of tree (BrianP) and incorporate Ian's suggestions
v4: - fix a GLenum16 bug in VBO/save code, add some STATIC_ASSERT()s
gl_context = 152432 -> 136840 bytes
vbo_context = 22096 -> 20608 bytes
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
get_value_size() returns -1 for an error. The similar check in
_mesa_GetUnsignedBytei_vEXT() is correct.
Found by chance. There are apparently no Piglit tests which exercise
glGetUnsignedBytei_vEXT() or glGetUnsignedBytevEXT().
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Sometimes rasterization state object could be empty. This is causing
segfault on hw8,9,10 for some traces.
This patch fixes enemy_territory_quake_wars_high,
enemy_territory_quake_wars_low, etqw-demo, lightsmark2008, quake1
glretrace crashes on hw 8,9,10.
Tested with mtt-glretrace and mtt-piglit.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
According to spec, S3TC_DXT1_EXT RGB formats are supposed to be
opaque. Correspoding svga formats are not handling it so explicitly
setting it to 1.0.
This fixes piglit test spec@ext_texture_compression_s3tc@s3tc-targeted
Note: This test is testcase for freedesktop bug 100925
Tested with mtt-piglit and mtt-glretrace on 8,9,10,11 and 15
Reviewed-by: Brian Paul <brianp@vmware.com>
In the register lifetime estimation if the first write is unconditional or
conditional but not within a loop then this is an unconditional dominant
write in the sense of register life time estimation.
Add a test case and record the write accordingly.
Fixes: 807e2539e5 ("mesa/st/glsl_to_tgsi: Add
tracking of ifelse writes in register merging")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104803
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The size/type query is always legal (if we made it that far).
Removing this causes a difference for GL_TEXTURE_BUFFER - the reason is that
these parameters are valid only with GetTexLevelParameter() if gl 3.1 is
supported, but not if only ARB_texture_buffer_object is supported.
However, while the spec says that these queries return "the same information
as querying GetTexLevelParameter" I believe we're not expected to return just
zeros here. By definition, these pnames are always valid (unlike for the
GetTexLevelParameter() function which would return an error without GL 3.1).
The spec is a bit inconsistent there and open to interpretation - while
mentioning the "same information as querying GetTexLevelParameter" is
returned, it also mentions that 0 is returned for size/type if the
target/format is not supported - implying correct results to be returned
if it is supported, regardless that GetTexLevelParameter would return
an error. (Also, the bit about this returning the same as
GetTexLevelParameter also includes querying stencil type, which isn't
even possible with GetTexLevelParameter.)
This breaks some piglit arb_internalformat_query2 tests (which I believe to
be wrong).
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>§
The code just considered all formats as being supported if they were either
a valid fbo or texture format.
This was quite awkward since then the query would return "supported" for
e.g. GL_RGB9E5 or compressed formats and target RENDERBUFFER (albeit the driver
could still refuse it in theory). However, when then querying for instance the
internalformat sizes, it would just return 0 (due to the checks being more
strict there).
It was also a problem for texture buffer targets, which have a more restricted
list of formats which are allowed (and again, it would return supported but
then querying sizes would return 0).
So only take validation of formats into account which make sense for a given
target.
Can also toss out some special checks for rgb9e5 later, since we'd never get
there if it wasn't supported in the first place.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Testing for gles there is just confusing - this is about target being
supported, if it was valid at all was already determined earlier
(in _legal_parameters). It didn't make sense at all in any case, since
it would only have said false there for gles for 2d but not 2d arrays etc.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
This fixes an assert in nir_lower_var_copies() for some bioshock
shaders where an unused clipdistance array has no size.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
c2acf97fcc changed the use of double_inputs_read to be
inconsitent with its previous meaning. Here we re-enable the
gather info code that was removed as the modified code from
c2acf97fcc now uses the double_inputs member rather than
double_inputs_read.
This change allows us to use double_inputs_read with gallium
drivers without impacting double_inputs which is used by i965.
We also make use of the compiler option vs_inputs_dual_locations
to allow for the difference in behaviour between drivers that handle
vs inputs as taking up two locations for doubles, versus those that
treat them as taking a single location.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Allows nir drivers to either use a single or dual locations for
vs double inputs.
i965 uses dual locations for both OpenGL and Vulkan drivers, for
now gallium OpenGL drivers only use a single location.
The following patch will also make use of this option when
calling nir_shader_gather_info().
Reviewed-by: Karol Herbst <kherbst@redhat.com>
First we move double_inputs_read into a vs struct in the union,
double_inputs_read is only used for vs inputs so this will
save space and also allows us to add a new double_inputs field.
We add the new field because c2acf97fcc changed the behaviour
of double_inputs_read, and while it's no longer used to track
actual reads in i965 we do still want to track this for gallium
drivers.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This ports a fix from amdvlk, to fix the sizing for mip levels
when block compressed images are viewed using uncompressed views.
My original fix didn't power the clamping, but it looks like
the clamping is required to stop the sizing going too large.
Fixes:
dEQP-VK.image.texel_view_compatible.graphic.extended*bc*
Doesn't crash DOW3 anymore.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fixes: e38685cc62 'Revert "radv: disable support for VEGA for now."'
Signed-off-by: Dave Airlie <airlied@redhat.com>
It did not signal syncobjs in the fence, and also signalled too early
if there was work on the queue already, as we have to wait till that
work is done.
Fixes: d27aaae4d2 "radv: Add external fence support."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The arrays specified by ctx->Array._DrawArrays are used for all
vertex drawing via vbo_context::draw_prims(). Different arrays are
used for immediate mode, vertex arrays, display lists, etc. Changing
from one to another requires updating derived/driver array state.
Before, we indirectly specifid the arrays with the gl_draw_method values.
Now we just directly specify the arrays instead. This is simpler and
will allow a subsequent display list optimization.
In the future, it might make sense to get rid of ctx->Array._DrawArrays
entirely and just pass the arrays as another parameter to
vbo_context::draw_prims().
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
To match the VERT_ATTRIB_COLOR_INDEX name.
Give a name to the previously anonymous enum of VBO_ATTRIB_x values.
Update the comment on the enum.
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
Instead of NONE/ARB use FF/SHADER. Move the enum declaration to
vbo_private.h where it's used.
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
This change cleans following scary warnings in valgrind output
when disk cache is being written:
==6532== Uninitialised byte(s) found during client check request
==6532== at 0x14423FAD: blob_write_bytes (blob.c:152)
==6532== by 0x144240FB: blob_write_uint32 (blob.c:194)
==6532== by 0x144001A5: write_tex (nir_serialize.c:613)
and later (loads of):
==6532== Use of uninitialised value of size 8
==6532== at 0x62FCD9E: crc32_z (in /usr/lib64/libz.so.1.2.11)
==6532== by 0x13F65014: util_hash_crc32 (crc32.c:127)
==6532== by 0x13F5DABA: cache_put (disk_cache.c:947)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
==2780== 1,024 bytes in 1 blocks are possibly lost in loss record 180 of 205
==2780== at 0x4C31A1E: calloc (vg_replace_malloc.c:711)
==2780== by 0x13F6467E: util_queue_init (u_queue.c:309)
==2780== by 0x13F5C9F6: disk_cache_create (disk_cache.c:369)
==2780== by 0x13F05406: brw_disk_cache_init (brw_disk_cache.c:428)
==2780== by 0x13F01E78: brwCreateContext (brw_context.c:1068)
Fixes: 1a61a8b9a7 ("i965: Initialize disk shader cache if MESA_GLSL_CACHE_DISABLE is false")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
==25481== 576 bytes in 1 blocks are definitely lost in loss record 179 of 208
==25481== at 0x4C2FB6B: malloc (vg_replace_malloc.c:299)
==25481== by 0x1404E2CC: ralloc_size (ralloc.c:121)
==25481== by 0x14119F82: read_and_upload (brw_disk_cache.c:176)
==25481== by 0x1411A5C9: brw_disk_cache_upload_program (brw_disk_cache.c:271)
==25481== by 0x1412FCA4: brw_upload_wm_prog (brw_wm.c:597)
Fixes: 516d50db31 ("i965: add initial implementation of on disk shader cache")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This fixes the scenario where the input is a struct. With this
the Unreal engines Elemental demo now works on radeonsi.
Reviewed-by: Dave Airlie <airlied@redhat.com>
This uses a different shader than radeonsi, as we can't address non-256
aligned ssbos, which the radeonsi code does. This passes some extra
offsets into the shader.
It also contains a set of u64 instruction implementation that may
or may not be complete (at least the u64div is definitely not something
that works outside this use-case). If r600 grows 64-bit integers,
it will use the GLSL lowering for divmod.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
If the images/buffer bindings had a gap, this produced the wrong values,
this should fix that to generate the correct rat mask for mixes of
images/buffers/cbs.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Otherwise, using pkg-config to retrieve flags will fail, e.g.
$ pkg-config gl --cflags
Package libdrm was not found in the pkg-config search path.
Perhaps you should add the directory containing `libdrm.pc'
to the PKG_CONFIG_PATH environment variable
Package 'libdrm', required by 'gl', not found
Fixes: 3218056e0e ("meson: Build i965 and dri stack")
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
The changes in 4.2 haven't impacted any of our CL or state struct entries
that I can see, so I haven't enabled custom compile for doing 4.2 instead
of 4.1.
shader-db doesn't show any regression and 32-bit pointers with byval
are declared as VGPRs for some reason.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Without this we will fail to correctly serialise programs when
using glGetProgramBinary() if the program was retrieved from
the disk cache rather than freshly compiled.
Fixes: c69b0dd681 "st/glsl_to_tgsi: store num_tgsi_tokens in st_*_program"
Reviewed-by: Gert Wollny <gw.fossdev@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104762
I got reviews and fixed the patches locally, but ended up merging the
ones that I sent originally to the list. This patch fixes those
mistakes.
Fixes: 78c125af39
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
These packets were causing GPU hangs when the context was restored,
possibly because they were pointing to BO's that were already
unreferenced. So we tell the hardware to ignore such packets after the
batch buffer ends, since we know those BO's are not around anymore.
This change fixes GPU hangs on CNL. The (partial) solution to this
problem so far was to entirely disable push constants on this platform.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The warning happens on line 2114 for the memcpy(data, p, size) call.
I'm not sure why that generates the warning but not the earlier use
of p in the code.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
We assigned the function that gets the device uuid to the GetDriverUuid
function pointer and the function that gets the driver uuid to the
GetDeviceUuid function pointer inside the state tracker. Exchanged the
pointers.
cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
We need to access the pipeline layout to compute correct dynamic
offsets for dyamic UBO/SSBO descriptors when we emit draw commands.
Instead of taking it from the pipeline object, store the layout
in the command buffer pipeline state.
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The Vulkan spec states that VkPipelineLayout objects must not be
destroyed while any command buffer that uses them is in the recording
state, but it permits them to be destroyed otherwise. This means that
applications are allowed to free pipeline layouts after command recording
is finished even if there are pipeline objects that still exist and were
created with these layouts.
There are two solutions to this, one is to use reference counting on
pipeline layout objects. The other is to avoid holding references to
pipeline layouts where they are not really needed.
This patch takes a step towards the second option by making the
pipeline shader compile code take pipeline layout from the
VkGraphicsPipelineCreateInfo provided rather than the pipeline
object.
A follow-up patch will remove any remaining uses of the layout field
so we can remove it from the pipeline object and avoid the need
for reference counting.
v2: Use ANV_FROM_HANDLE, remove unnecessary braces (Jason)
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The spec states that descriptor set layouts can be destroyed almost
at any time:
"VkDescriptorSetLayout objects may be accessed by commands that
operate on descriptor sets allocated using that layout, and those
descriptor sets must not be updated with vkUpdateDescriptorSets
after the descriptor set layout has been destroyed. Otherwise,
descriptor set layouts can be destroyed any time they are not in
use by an API command."
v2: allocate off the device allocator with DEVICE scope (Jason)
Fixes the following work-in-progress CTS tests:
dEQP-VK.api.descriptor_set.descriptor_set_layout_lifetime.graphics
dEQP-VK.api.descriptor_set.descriptor_set_layout_lifetime.compute
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This attribute is similar to the definition of restrict in
C99 and it might help LLVM.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This allows to reduce the number of dwords that are loaded
with buffer_load_format_xyzw. For example, when the only used
channel is 1, the driver will emit buffer_load_format_x instead.
Shader stats for DOW3 (with some local hacky scripts for SPIRV):
143 shaders in 143 tests
Totals:
SGPRS: 5344 -> 5352 (0.15 %)
VGPRS: 3476 -> 3452 (-0.69 %)
Spilled SGPRs: 30 -> 29 (-3.33 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 269860 -> 269808 (-0.02 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 1267 -> 1272 (0.39 %)
Wait states: 0 -> 0 (0.00 %)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The GPU hangs when the driver forces a PS_PARTIAL_FLUSH after
a dispatch call (and vice versa for graphics). Something has
changed in the kernel driver because it used to work.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
According to LLVM, only pre-GFX9 targets do not flush denorms
for fmin/fmax.
All dEQP-VK.glsl.builtin.precision.* still pass.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Without this, we may end up dereferencing blend before we check for
binding->index != UINT32_MAX. However, Vulkan allows the blend state to
be NULL so long as you don't have any color attachments. This fixes a
segfault when running The Talos Principal.
Fixes: 12f4e00b69
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The values that this function returned were always the values passed
in. The only thing that happened was either an assertion or undefined
results when an unknown value was passed in. This doesn't seem that
useful. Most of nouveau_gldefs.h could be removed in this manner.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
And rename to emit_copy_blit.
v2: sed --in-place -e 's/color_logic_ops/gl_logicop_mode/g' $(grep -lr
color_logic_ops src/) suggested by Brian.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [v1]
And rename to emit_copy_blit.
v2: sed --in-place -e 's/color_logic_ops/gl_logicop_mode/g' $(grep -lr
color_logic_ops src/) suggested by Brian.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [v1]
And delete the resulting dead code. This has only been compile-tested.
v2: sed --in-place -e 's/color_logic_ops/gl_logicop_mode/g' $(grep -lr
color_logic_ops src/) suggested by Brian.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
With the exception of NVIDIA hardware, these are is the values that all
hardware and Gallium want. The remapping is currently implemented in at
least 6 places. This starts the process of consolidating to a single
place.
v2: sed --in-place -e 's/color_logic_ops/gl_logicop_mode/g' $(grep -lr
color_logic_ops src/) suggested by Brian. Added some comments about the
selection of bit patterns for gl_logicop_mode and the GLenums.
Suggested by Nicolai. Folded the GLenum_to_color_logicop macro into its
only users.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com> [v1]
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
-1 is considered an error for EGL_ANDROID_native_fence_sync, so
we need to actually create a sync file.
Fixes: f536f45250 "radeonsi: implement sync_file import/export"
Reviewed-by: Dave Airlie <airlied@redhat.com>
18fde36ced changed the way temporary
registers were allocated in lower_integer_multiplication so that we
allocate regs_written(inst) space and keep the stride of the original
destination register. This was to ensure that any MUL which originally
followed the CHV/BXT integer multiply regioning restrictions would
continue to follow those restrictions even after lowering. This works
fine except that I forgot to reset the register file to VGRF so, even
though they were assigned a number from alloc.allocate(), they had the
wrong register file. This caused some GLES 3.0 CTS tests to start
failing on Sandy Bridge due to attempted reads from the MRF:
ES3-CTS.functional.shaders.precision.int.highp_mul_fragment.snbm64
ES3-CTS.functional.shaders.precision.int.mediump_mul_fragment.snbm64
ES3-CTS.functional.shaders.precision.int.lowp_mul_fragment.snbm64
ES3-CTS.functional.shaders.precision.uint.highp_mul_fragment.snbm64
ES3-CTS.functional.shaders.precision.uint.mediump_mul_fragment.snbm64
ES3-CTS.functional.shaders.precision.uint.lowp_mul_fragment.snbm64
This commit remedies this problem by, instead of copying inst->dst and
overwriting nr, just make a new register and set the region to match
inst->dst.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103626
Fixes: 18fde36ced
Cc: "17.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This seems to be broken, at least the cts tests fail.
This fixes:
dEQP-VK.renderpass.suballocation.multisample.d32_sfloat_s8_uint.samples_4
dEQP-VK.renderpass.suballocation.multisample.d32_sfloat_s8_uint.samples_8
2 samples seems to pass fine, amdvlk doesn't appear to enable TC for
possibly some other reasons here.
This is most likely a hack.
v1.1: add a bit of explaination text. (Samuel)
Fixes: ad3d98da9 (radv: enable tc compatible htile for d32s8 also.)
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Modify DumpToFile to only dump the function, not the entire module.
Reduces file sizes and speeds up the dumping.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Hides console window creation during JIT linker execution in apps that
don't have a console. Remove hooking of CreateProcessInternalA - the
MSFT implementation just turns around and calls CreateProcessInternalW
which, we do hook.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Early Rasterization did not initially work with USE_SIMD16_FRONTEND=0.
Fix it so it works there, too. Please note that the default setting
is USE_SIMD16_FRONTEND=1.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
We actually use some of the types from mtypes.h so include it directly
instead of relying on indirectly including it via bufferobj.h
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
All but two cases of the switch did the same n += InstSize[n[0].opcode]
instruction. Just move it after the switch.
Add some sanity check assertions.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The newest version of WSI Fusion makes several glDrawPixels calls
per frame. By caching more than one image, we get better performance
when panning/zooming the map.
v2: move pixel unpack param checking out of cache search loop, per Roland
v3: also move unpack->BufferObj check out of loop, per Roland.
Noticed while skimming for GLX_ instances in the dri codebase.
Comment is completely off and was in such a state since day 1.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Remove the instances already available in gl.h or glext.h.
Sadly GLclampx is only available in GLES(1) so we need to keep that one.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes: a7cfec3be0 ("vbo: move VBO-private types, prototypes, etc. into
new vbo_private.h header")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Namely extend the EXTRA_DIST list, instead of re-assigning it and bring
back a file dropped by mistake.
Fixes: 436ed65d38 ("autotools: include meson build files in tarball")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
We are not allowed to modify the incoming coords values, or things may
crash (as we may be inside a llvm conditional and the values may be used
in another branch).
I recently broke this when fixing an issue with NaNs and seamless cube
map filtering, and it causes crashes when doing cubemap filtering
if the min and mag filters are different.
Add const to the pointers passed in to prevent this mishap in the future.
Fixes: a485ad0bcd ("gallivm: fix an issue with NaNs with seamless cube filtering")
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Previously, we were handling self-dependencies by marking the render
buffer and then passing disable_aux=true to prepare_texture so that it
would do a resolve. This works but ends us up doing to much resolving
in some cases. Specifically, if we're doing something such as mipmap
generation, this would cause us to resolve all levels of the texture if
even one of them is overlapping.
Instead, this commit makes us wait until we process the framebuffer to
do these resolves and we only resolve the slices needed for rendering.
Doing this resolve puts them into the pass-through state so, even if we
do texture using CCS_E, the CCS data will effectively be ignored and the
real surface contents read.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead of keeping an array of booleans, we now hang onto an array of
isl_aux_usage enums. This means that the thing we are passing from
brw_draw.c to surface state setup is the thing that surface state setup
actually needs instead of an input to compute what it needs.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The only purpose of this function is to disable aux on texture surfaces
when the corresponding renderbuffer has aux disabled. However, the act
of disabling aux on the renderbuffer will cause it to be resolved and
intel_miptree_texture_aux_usage will already check the resolved status
of a texture and return ISL_AUX_USAGE_NONE for it. Even if we used CCS
for it, that wouldn't really be a problem because the CCS will be in the
pass-through state and so it would effectively be ignored.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Only one of the callers of intel_miptree_render_aux_usage actually took
brw->draw_aux_buffer_disabled into account. This was causing us to
ignore draw_aux_buffer_disabled for the intel_miptree_prepare_render.
This isn't a problem because the draw_aux_buffer_disabled entry was set
during texture preparation and we already did the resolve at that time.
However, this also meant that the aux_usage we were passing to
brw_cache_flush_for_render and brw_render_cache_add_bo was wrong so our
automatic cache flushing around aux_usage changes wasn't happening.
This was causing GPU hangs in Oxenfree.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104711
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104411
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104383
Fixes: ea0d2e98ec
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Both callers of intel_miptree_prepare/finish_render have to call
intel_miptree_render_aux_usage anyway for other reasons. They may as
well pass the result in instead of us calling it again.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
As far as I can tell this always just reassigns the same value.
Also as we don't curretly store UniformHash in the shader cache
removing this will help with adding a shader cache to gallium
nir drivers.
Reviewed-by: Rob Clark <robdclark@gmail.com>
When LLVM is built inside of a git repo (even way below, e.g. /usr/ports/.git
exists, and LLVM is built in /usr/ports/devel/llvm50/work), its version
becomes something like 5.0.0git-f8ab206b2176.
New meson versions already handle this, but we support older versions too.
Fixes: 673dda8330 ("meson: build "radv" vulkan driver for radeon hardware")
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
get_pkgconfig_variable('cflags') always returns an empty list, it's a
function for getting *custom* variables.
Meson does not yet support asking for cflags, so explicitly invoke
pkg-config for now.
Fixes: 68076b8747 ("meson: build gallium vdpau state tracker")
Fixes: a817af8a89eb ("meson: build gallium xvmc state tracker")
Fixes: 1d36dc674d ("meson: build gallium omx state tracker")
Fixes: 5a785d51a6 ("meson: build gallium va state tracker")
Reviewed-by: Dylan Baker <dylan.c.baker@intel.com>
Looks like checking both sources was intended, instead of the first one
twice. Found with Coccinelle, coccinellery/xand/xand.cocci semantic patch.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This was just found while reading for other stuff,
src/core/hw/gfxip/gfx6/gfx6DepthStencilView.cpp.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We need to enable the pos float location 2 mode anytime we have
persample not just when forced by the frag shader.
This fixes:
dEQP-VK.pipeline.multisample.min_sample_shading*
Fixes: 58c97a079 (radv: enable location at sample when persample is forced.)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Before we were adding -DHAVE_SSE41 which isn't what the code is
looking for, so some uses of the sse4.1 code were always being
skipped.
v2: Don't add any compile check for the quite old -msse4.1 option (Dylan)
Fixes: 84486f6462 ("meson: Enable SSE4.1 optimizations")
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
With the implementation of the tracking of the registers used in reladdr
asserting that a driver calling merge_register() uses the address register
is no longer needed.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Add a code line type that accepts one layer of indirect addressing and
add tests to check that temporary register access used for indirect
addressing is accounted for in the lifetime estimation.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
So far indirect addressing was not tracked to estimate the temporary
life time, and it was not needed, because code to load the address
registers was always emitted eliminating the reladdr* handles in the
past glsl-to.tgsi stages. Now, with Mareks patch allowing any 1D register
to be used for addressing on some hardware this changed, and
the tracking becomes necessary.
Because the registers have no direct indication on whether the reladdr* was
already loaded into an address register, the temporaries in reladdr* are
always tracked as reads. This may result in a slight over-estimation of the
lifetime in the cases when the load to the address register was emitted.
v2: no changes
v3: Use debug_log variable instead of directly writing to std::err in debugging
output.
v6: fix indention and typos
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Additional tests are added that check the tracking of access to temporaries
in if-else branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Improve the life-time evaluation of temporary registers by also tracking
writes in both if and else branches and in up to 32 nested scopes.
As a result the estimated required register life-times can be further
reduced enabling more registers to be merged.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
* Merge the classes MockCodeLine and MockCodelineWithSwizzle into
one, and refactor tests accordingly.
* Change memory allocations to use ralloc* interface.
v2:
* move the test classes into a conveniance library
* rename the Mock* classes to Fake* since they are not really
Mocks
* Base assertion of correct number of src and dst registers in tests
on what the operatand actually expects
* Fix number of destinations in one test
v6:
* fix local includes using "..." insteadof <...>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Don't allocate a zero-sized array, when no texture offsets are given.
v5: correct spaces and empty lines
Reviewed-by: Brian Paul <brianp@vmware.com>(v4)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Add the equal operator and the "<<" stream write operator for the
st_*_reg classes and the "<<" operator to the instruction class, and
make use of these operators in the debugging output.
v5: Fix empty lines
Reviewed-by: Brian Paul <brianp@vmware.com> (v4)
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Non-VBO sources files sometimes included vbo.h while others included
vbo_context.h. We're moving all public types, functions to the former.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
_vbo_DestroyContext() can be safely called even if there's no VBO
module. Removes a dependency on the vbo_context() function.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The Vulkan spec says:
"pipelineBindPoint is a VkPipelineBindPoint indicating whether the
descriptors will be used by graphics pipelines or compute pipelines.
There is a separate set of bind points for each of graphics and
compute, so binding one does not disturb the other."
Up until now, we've been ignoring the pipeline bind point and had just
one bind point for everything. This commit separates things out into
separate bind points.
Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102897
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
It's now a function which returns the push descriptor set. Since we set
the error on the command buffer, returning the error is a little
redundant. Returning the descriptor set (or NULL on error) is more
convenient.
Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
There are several places where we'd already saved the pipeline off to a
temporary variable but, due to an artifact of history, weren't actually
using that temporary everywhere. No functional change.
Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
This splits anv_cmd_state_reset into separate init and finish functions.
This lets us share init code with cmd_buffer_create. This potentially
fixes subtle bugs where we may have missed some bit of state that needs
to get initialized on command buffer creation.
Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
This is ported from radeonsi and fixes:
dEQP-VK.pipeline.multisample_shader_builtin.sample_mask.bit_*
v2: don't call this path for radeonsi, it does it in the epilog.
use the radeonsi code path.
v3: handle NULL pCreateInfo->pMultisampleState properly (Samuel)
v3.1: set ps_iter_samples default to 1 (Bas)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fixes: bdcbe7c76 (radv: add sample mask input support)
Signed-off-by: Dave Airlie <airlied@redhat.com>
radeonsi has a workaround for this, but it uses a R16A16 format,
which vulkan doesn't have, we could probably come up with a work
around but for now just avoid hw resolves.
Fixes:
dEQP-VK.renderpass.suballocation.multisample.r16g16_*norm*
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fixes: 2a04f5481d (radv/meta: select resolve paths)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From reading AMDVLK it currently never uses hw resolve paths.
This patch takes from radeonsi which doesn't use hw resolve
for integer formats, and does the same for radv.
This fixes:
dEQP-VK.renderpass.suballocation.multisample*uint tests.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fixes: 2a04f5481d (radv/meta: select resolve paths)
Signed-off-by: Dave Airlie <airlied@redhat.com>
Some of the hw resolve passes need the SPI color format setup
correctly.
This fixes lots of 16-bit and 32-bit format tests in
dEQP-VK.renderpass.suballocation.multisample*
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Signed-off-by: Dave Airlie <airlied@redhat.com>
'cleanup' path is dereferencing 'svga' a lot, 'done' is a better choice.
Found by Coccinelle.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Defines
- HAVE_FUNC_ATTRIBUTE_RETURNS_NONNULL
- HAVE_FUNC_ATTRIBUTE_VISIBILITY
were misspelled.
Signed-off-by: Marc Dietrich <marvin24@gmx.de>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Technically, the Vulkan spec requires that we return valid entrypoints
for all core functionality and any available device extensions. This
means that, for gen-specific functions, we need to return a trampoline
which looks at the device and calls the right device function. In 99%
of cases, the loader will do this for us but, aparently, we're supposed
to do it too. It's a tiny increase in binary size for us to carry this
around but really not bad.
Before:
text data bss dec hex filename
3541775 204112 6136 3752023 394057 libvulkan_intel.so
After:
text data bss dec hex filename
3551463 205632 6136 3763231 396c1f libvulkan_intel.so
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
The Vulkan spec annoyingly requires us to track what core version and
what all extensions are enabled and only advertise those entrypoints.
Any call to vkGet*ProcAddr for an entrypoint for an extension the client
has not explicitly enabled is supposed to return NULL.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This lets us move a bunch of stuff out of codegen and back into
anv_device.c which is a bit nicer.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This removes some redundant code between libanv_common, libvulkan_intel,
and libvulkan_intel_test.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
The new anv_extensions_gen.py is the code generator while the old
anv_extensions.py file is purely declarative.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Fixes the following piglit tests:
arb_shader_image_load_store/layer/image3d/layered binding test
arb_shader_image_load_store/max-size/image3d max size test/2048x8x8x1
arb_shader_image_load_store/max-size/image3d max size test/8x2048x8x1
arb_shader_image_load_store/max-size/image3d max size test/8x8x2048x1
arb_shader_image_load_store/semantics/imageload/vertex shader/rgba32f/image3d test
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This makes the following changes to address cleanup issues:
- Error conditions now return NULL instead of calling exit()
- swr_creen is now freed upon error, rather than leak.
- Library handle from dlopen is now closed upon swr_screen destruction
v2: Added additional context in commit msg and remove unnecessary "PUBLIC"
v3: Fix typo in commit message.
Signed-off-by: Chuck Atkins <chuck.atkins@kitware.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: Bruce Cherniak <bruce.cherniak@intel.com>
Cc: Tim Rowley <timothy.o.rowley@intel.com>
cc: mesa-stable@lists.freedesktop.org
Fixes the following piglit test on radeonsi:
./bin/arb_enhanced_layouts-gs-stream-location-aliasing
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
pthread_setname_np was added in glibc 2.12 for the Linux port only, other
ports do not necessarily have it.
Signed-off-by: Jose Fonseca <jfonseca@vmware.com>
Courtesy of clang static analyzer.
I was hunting for potential sources of memory corruption using Mesa with
a GL trace, and happened to find this (unrelated) issue.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
We have to start render targets at binding table index 0 in order to use
headerless FB write messages, and in fact already assume this in a bunch
of places in the code. Let's finish that off, and not bother storing 0
in a struct to pretend to add it in a few places.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
It is never a negative number. Variable is compared against unsigned
values and passed into functions that expect unsigned int.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The ifdef spaghetty in st_vdpau.c is rather confusing and misleading.
Simplily it by introducing a static inline helper noop (when
HAVE_ST_VDPAU is not defined) in the header.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Christian König <christian.koenig@amd.com>
We need this to ensure that GTT maps work on buffers we get from Vulkan
on the off chance that someone does a readpixels or something. Soon, we
will be removing GTT maps from i965 entirely and this can be reverted.
None the less, it's needed for stable.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
This fixes a bug where we were taking the tiling from the BO regardless
of what the modifier said. When we got images in from Vulkan where it
doesn't set the tiling on the BO, we would treat them as linear even
though the modifier expressly said to treat it as Y-tiled.
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
addrlib asserts when that happens, and supporting it is not
required so lets not allow this for now.
It also assert on fmask, but we don't have the number of samples here.
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This gets memcpy'd and written driectly, and due to alignment, this
resulted in uninitialized gaps. This makes those gaps go away.
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
When switching between framebuffers with and without TS, the TS state
needs to be flushed to the command stream even if the derived state
isn't changed.
Fixes: 4ee7c2c284 ("etnaviv: enable TS, but disable autodisable")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
We've been requiring this since GLES 3.0 was introduced, but the GLES 3.2
spec is the one that has "Supporting blending on a per-draw-buffer basis"
in the new features. V3D 3.3 would require lowering blending to shader
code to implement independent blending.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This hasn't been true in 6+ years, if it was even true then. Before
we rewrote the compiler and introduced GLSL IR in 2010-2011, i965 used
to have two compiler backends for WM programs, based on Mesa IR. One
handled flow control and was SIMD8-only, while the other was SIMD16
only and didn't handle flow control. Or something like that.
Even then, this certainly didn't handle vertex shaders, so "all ...
code generation" is a bit strong.
This adds the meson.build, meson_options.txt, and a few scripts that are
used exclusively by the meson build.
v2: - Remove accidentally included changes needed to test make dist with
LLVM > 3.9
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Acked-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
For some reason llvm5 is picky about accepting a void * type in the
case of building an argument list.
Since we don't care about the type (we ignore the argument for now),
pick another pointer type
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Early Rasterization is an optimization for small triangles.
Scientific workloads often contain very small triangles that has non-zero
area and cannot be trivially rejected as falling between pixel centers,
but does not cover any pixel center. Those triangles can be initially
rasterized as early as in binner and rejected if they cover no pixels The
optimization can be disabled in compilation using KNOB_ENABLE_EARLY_RAST
option in knobs.h
The Early Rast is disabled by default.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Flip the switch(es) to enable simd16 vertex shaders:
USE_SIMD16_SHADERS and USE_SIMD16_VS
Both have to be enabled at the same time. Currently, just setting
USE_SIMD16_SHADERS does not work correctly.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Supporting simd16 vertex shaders involves packing the output of the
fetch shader appropriately, especially the vertexID buffers that have to
be formatted in one simd16 register, needed by the VS.
As part of this support, we needed to remove the 2nd JitManager, since it
was not accounting for vector width correctly.
USE_SIMD16_SHADERS is also split into two defines. The additional
one (USE_SIMD16_VS) controls the width of the vertex shader (VS), while
the original one (USE_SIMD16_SHADERS) controls overall front end width.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Add a new define (USE_SIMD16_VS), to denote calling a 16-wide vertex shader.
This is needed because the mesa driver can do 16-wide shaders, but rasty
cannot yet, so we need to distinguish.
Create a new VertexID entry (VertexID16) for the USE_SIMD16_VS case, since
we need to format the vertex id in a way that is digestible by the 16-wide VS
Disabled for now. To be enabled in a future checkin when driver work
is complete.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Add name argument to x86 autogenerated macros.
Add useful variable names for DCL_inputVec implementation.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
- Move debug .ll files to JIT_CACHE_DIR
- Don't link against jitter SRGBLut table, add global data to shader that needs it.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Adds ability to step into jitted llvm IR in Visual Studio.
- Updated llvm type generation script to also generate corresponding debug types.
- New module pass inserts debug metadata into the IR for each function
Disabled by default.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
The user SGPR location can change between pipelines, so we need to
emit it again to the pottentially changed SGPR index.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Growing the batch/state buffer is a lot more dangerous than I thought.
A number of places emit multiple state buffer sections, and then write
data to the returned pointer, or save a pointer to brw->batch.state.bo
and then use it in relocations. If each call can grow, this can result
in stale map references or stale BO pointers. Furthermore, fences refer
to the old batch BO, and that reference needs to continue working.
To avoid these woes, we avoid ever swapping the brw->batch.*.bo pointer,
instead exchanging the brw_bo structures in place. That way, stale BO
references are fine - the GEM handle changes, but the brw_bo pointer
doesn't. We also defer the memcpy until a quiescent point, so callers
can write to the returned pointer - which may be in either BO - and
we'll sort it out and combine the two properly in the end.
v2/v3:
- Handle stale pointers in the shadow copy case, where realloc may or
may not move our shadow copy to a new address.
- Track the partial map explicitly, to avoid problems with buffer reuse
where multiple map modes exist (caught by Chris Wilson).
v4:
- Don't use realloc in the CPU shadow case, it isn't safe.
Fixes: 2dfc119f22 "i965: Grow the batch/state buffers if we need space and can't flush."
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> [v3]
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
'aux' is a very generic name, suggesting it can be a bunch of things.
However, it's always the brw_*_prog_data structure. So, call it that.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part 2 of 2 (part 1 is autoconf changes, part 2 is C++ changes)
When only a single SWR architecture is being used, this allows that
architecture to be builtin rather than as a separate libswrARCH.so that
gets loaded via dlopen. Since there are now several different code
paths for each detected CPU architecture, the log output is also
adjusted to convey where the backend is getting loaded from.
This allows SWR to be used for static mesa builds which are still
important for large HPC environments where shared libraries can impose
unacceptable application startup times as hundreds of thousands of copies
of the libs are loaded from a shared parallel filesystem.
Based on an initial implementation by Tim Rowley.
v2: Refactor repetitive preprocessor checks to reduce code duplication
v3: Formatting changes per Bruce C. Also delay screen creation until end
to avoid leaks when failure conditions are hit.
Signed-off-by: Chuck Atkins <chuck.atkins@kitware.com>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
CC: Tim Rowley <timothy.o.rowley@intel.com>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Part 1 of 2 (part 1 is autoconf changes, part 2 is C++ changes)
When only a single SWR architecture is being used, this allows that
architecture to be builtin rather than as a separate libswrARCH.so that
gets loaded via dlopen. Since there are now several different code
paths for each detected CPU architecture, the log output is also
adjusted to convey where the backend is getting loaded from.
This allows SWR to be used for static mesa builds which are still
important for large HPC environments where shared libraries can impose
unacceptable application startup times as hundreds of thousands of copies
of the libs are loaded from a shared parallel filesystem.
Based on an initial implementation by Tim Rowley.
v2: Fix comment placement pointed out by Bruce C.
Signed-off-by: Chuck Atkins <chuck.atkins@kitware.com>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
CC: Tim Rowley <timothy.o.rowley@intel.com>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
As a followup to the previous patch propagate the change of numSamples
from int to unsigned to gl_config::samples and consequently fix some
-Wsign-compare warnings.
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
According to the ARB_multisample num_samples is a non-negative integer.
Consequently define it as such, fail in glx/choose_visual if a negative
number is given.
v2: split patch into gallium and mesa part
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
vk_error() is a macro that calls __vk_errorf() with instance == NULL.
Then, __vk_errorf() passes a pointer to instance->debug_report_callbacks
to vk_debug_error(), which segfaults as this pointer is invalid but not
NULL.
Fixes: e5b1bd6ab8 "vulkan: move anv VK_EXT_debug_report implementation to common code."
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
When a channel was not set we also did not increase the LDS address,
while that obviously should happen.
The output loading code was inadvertently fixed which resulted in a
mismatch causing the SaschaWillems tessellation demo to result
in corrupt rendering.
Fixes: 7898eb9a60 "ac: rework load_tcs_{inputs,outputs}"
Reviewed-by: Dave Airlie <airlied@redhat.com>
Passes
dEQP-VK.api.smoke.*
dEQP-VK.wsi.android.*
with android-cts-7.1_r12 .
Unlike the initial anv implementation this does
use syncobjs instead of waiting on the CPU.
This is missing meson build coverage for now.
One possible todo is that linux 4.15 now has a
sycall that allows us to export amdgpu fence to
a sync_file, which allows us not to force all
fences and semaphores to use syncobjs. However,
I had trouble with my kernel crashing regularly
with NULL pointers, and I'm not sure how beneficial
it is in the first place given that intel uses
syncobjs for all fences if available.
Reviewed-by: Dave Airlie <airlied@redhat.com>
lp_build_interleave2_half was not doing the right thing for avx512-style
16-wide loads.
This path is hit in the swr driver with a 16-wide vertex shader. It is
called from lp_build_transpose_aos, when doing texel fetches and the
fetched data needs to be transposed to one component per output register.
Special-case the post-load swizzle operations for avx512 16x32 (16-wide
32-bit values) so that we move the xyzw components correctly to the outputs.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The optimization in change 8e4efdc895 ("vbo: optimize some display
list drawing") missed the loopback case. This is used when the
glBegin/End primitive doesn't have a uniform set of vertex attributes.
The new Piglit gl-1.0-dlist-materials test hits this.
So check the aligned_vertex_buffer_offset(list) value and adjust the
buffer offset accordingly.
We also need to remove the 'start == 0' assertion in the loopback
code since it no longer applies.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Currently a couple of gallium targets race with xmlpool_options.h being
generated, don't do that.
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The view index user sgpr wasn't being accounted for properly,
this refactors out the code to decide if it's required and then
uses that info to account for it.
Fixes: 180c1b924e (ac/nir: Add shader support for multiviews.)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Only one piglit test fails,
sso-vs-gs-fs-array-interleave
There are 3 tests using ssbo without checking sizes failing also
but those are test bugs.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The kernel is moving to a $class$instance naming scheme in preparation
for accommodating more rings in the future in a consistent manner. It is
already using the naming scheme internally, and now we are looking at
updating some soft-ABI such as the error state to use the new naming
scheme. This of course means we need to teach aubinator_error_decode how
to map both sets of ring names onto its register maps.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Portal 2 appears to bind RGBA8888_UNORM textures to a sampler2DShadow,
and calls shadow2D() on it. This causes undefined behavior in OpenGL.
Unfortunately, our sampler appears to hang in this scenario, which is
not acceptable. Just give them a null surface instead, which returns
all zeroes.
Fixes GPU hangs in Portal 2 on Kabylake.
Huge thanks to Jason Ekstrand for noticing this crazy behavior while
sifting through crash dumps.
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104487
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
From the Vulkan spec with KHX extensions:
"If queries are used while executing a render pass instance that has
multiview enabled, the query uses N consecutive query indices
in the query pool (starting at query) where N is the number of bits
set in the view mask in the subpass the query is used in.
How the numerical results of the query are distributed among the
queries is implementation-dependent. For example, some implementations
may write each view's results to a distinct query, while other
implementations may write the total result to the first query and write
zero to the other queries. However, the sum of the results in all the
queries must accurately reflect the total result of the query summed
over all views. Applications can sum the results from all the queries to
compute the total result."
In our case we only really emit a single query (in the first query index)
that stores the aggregated result for all views, but we still need to manage
availability for all the other query indices involved, even if we don't
actually use them.
This is relevant when clients call vkGetQueryPoolResults and pass all N
queries to retrieve the results. In that scenario, without this patch,
we will never see queries other than the first being available since we
never emit them.
v2: we need the same treatment for timestamp queries.
v3 (Jason):
- Better an if instead of an early return.
- We can't write to this memory in the CPU, we should use
MI_STORE_DATA_IMM and emit_query_availability (Jason).
v4 (Jason):
- No need to take the value to write as parameter, just hard code it to 0.
Fixes test failures in some work-in-progress CTS multiview+query tests.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Since setup of ALLOW_RGB10_CONFIGS was moved to i965's own
brw_config_options.xml, this was hard-coded to false and
could not be overriden by drirc. Add some parsing into
i965's private screen->optionCache to enable drirc again.
Fixes: b391fb26df ("dri_util: remove ALLOW_RGB10_CONFIGS option (v2)")
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: Marek Olšák <marek.olsak@amd.com>
Cc: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This was handled for VS, but not for GS.
Fixes for gallium drivers using nir:
spec@arb_gpu_shader5@arb_gpu_shader5-xfb-streams-without-invocations
spec@arb_gpu_shader5@arb_gpu_shader5-xfb-streams*
spec@arb_transform_feedback3@arb_transform_feedback3-ext_interleaved_two_bufs_gs*
spec@ext_transform_feedback@geometry-shaders-basic
spec@ext_transform_feedback@* use_gs
spec@glsl-1.50@execution@geometry@primitive-id*
spec@glsl-1.50@execution@geometry@tri-strip-ordering-with-prim-restart gl_triangle_strip *
spec@glsl-1.50@transform-feedback-builtins
spec@glsl-1.50@transform-feedback-type-and-size
v2: don't call st_translate_program_stream_output) for TCS
v3: drop scanning patch outputs as TCS can't output xfb
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Karol Herbst <kherbst@redhat.com>
if no destination:
a) convert _RET instructions to non _RET variants if no dst
b) set src0 to undefined if it's a READ, this should get DCE then.
Acked-By: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The normal ssa renumbering isn't sufficient for LDS queue access,
this uses two stacks, one for the lds queue, and one for the
lds r/w ordering.
The LDS oq values are incremented in their use in a linear
fashion.
The LDS rw values are incremented in their definitions and used
in the next lds operation to ensure reordering doesn't occur.
Acked-By: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
So LDS ops have to be SLOT_X,
and LDS OQ reads have read port restrictions so we try
and force those into only having one per slot and avoiding
bank swizzles.
Acked-By: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This tries to avoid an lds queue read getting scheduled separately
from an lds ret read, the non-sb code uses the same style of hammer,
this isn't foolproof.
We can do better, but it's a bit tricky, as you have to scan ahead
and either schedule more lds oq moves and more lds reads and that
could lead to you running out of space anyways.
Acked-By: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds support for tracking the lds oq read/writes
so can avoid scheduling other things in between.
This patch just adds the tracking and assert to show
problems.
Acked-By: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
You have to schedule LDS_READ_RET _, x and MOV reg, LDS_OQ_A_POP
in the same basic block/clause. This makes sure once we've issues
and MOV we don't add another block until we balance it with an
LDS read.
Acked-By: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We need to convert these to the hw special registers.
Acked-By: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This handles parsing the LDS ops and queue accessess.
Acked-By: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes bad interactions with the LDS special values.
Acked-By: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
For LDS read/write ordering we use the LDS_RW value, reads
will wait on previous writes.
For LDS read/read from LDS queue ordering we use the LDS_OQ
values, we define two for now, though initially we'll just
support OQA.
Also add the check for the lds oq values
Acked-By: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
It's rare to have a final alu clause on normal shaders (exports)
but tess shaders write to LDS as their output, so we see some
alu clauses, and the CF_END get put in the wrong place.
This makes sure to update last_cf correctly.
Acked-By: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds support for GDS ops to sb backend.
This seems to work for atomics and tess factor writes.
Acked-By: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Some tess shaders were doing MOVA_INT _, c0.x on cayman, and then
hitting an assert in sb_bc_finalize.cpp:translate_kcache.
This makes sure the toplevel kcache tracker gets updated,
and the clause gets fixed up.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This field is ignored for tf writes so should be 0.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Looks like the decompress does not handle invalid encodings well,
which happens with random memory. Of course apps should not use it
with random memory, but they are allowed to ....
Fixes: 44fcf58744 "radv: Disable DCC for GENERAL layout and compute transfer dest."
Reviewed-by: Dave Airlie <airlied@redhat.com>
Now that we have two of these, we're duplicating a bunch of this logic.
The next commit will add more logic, which would make the duplication
seem worse.
This ends up setting EXEC_OBJECT_CAPTURE on the batch, which isn't
necessary (it's already captured), but it should be harmless.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Having a boolean for "we're using malloc'd shadow copies for all
buffers" is cleaner than having a cpu_map pointer for each. It was
okay when we had one buffer, but this is more obvious.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Previously the dataflow propagation algorithm would calculate the ACP
live-in and -out sets in a two-pass fixed-point algorithm. The first
pass would update the live-out sets of all basic blocks of the program
based on their live-in sets, while the second pass would update the
live-in sets based on the live-out sets. This is incredibly
inefficient in the typical case where the CFG of the program is
approximately acyclic, because it can take up to 2*n passes for an ACP
entry introduced at the top of the program to reach the bottom (where
n is the number of basic blocks in the program), until which point the
algorithm won't be able to reach a fixed point.
The same effect can be achieved in a single pass by computing the
live-in and -out sets in lock-step, because that makes sure that
processing of any basic block will pick up the updated live-out sets
of the lexically preceding blocks. This gives the dataflow
propagation algorithm effectively O(n) run-time instead of O(n^2) in
the acyclic case.
The time spent in dataflow propagation is reduced by 30x in the
GLES31.functional.ssbo.layout.random.all_shared_buffer.5 dEQP
test-case on my CHV system (the improvement is likely to be of the
same order of magnitude on other platforms). This more than reverses
an apparent run-time regression in this test-case from my previous
copy-propagation undefined-value handling patch, which was ultimately
caused by the additional work introduced in that commit to account for
undefined values being multiplied by a huge quadratic factor.
According to Chad this test was failing on CHV due to a 30s time-out
imposed by the Android CTS (this was the case regardless of my
undefined-value handling patch, even though my patch substantially
exacerbated the issue). On my CHV system this patch reduces the
overall run-time of the test by approximately 12x, getting us to
around 13s, well below the time-out.
v2: Initialize live-out set to the universal set to avoid rather
pessimistic dataflow estimation in shaders with cycles (Addresses
performance regression reported by Eero in GpuTest Piano).
Performance numbers given above still apply. No shader-db changes
with respect to master.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104271
Reported-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Both state->prog->info.inputs_read and state->InputsBound are GLbitfield64
so it seems that the OR of those values should be of the same type.
I'm not sure this fixes any actual issues though.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The values will never be larger than VBO_ATTRIB_MAX (currently 44).
v2: add STATIC_ASSERT to be sure VBO_ATTRIB_MAX can fit in ubyte,
per Emil.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The vbo_save_vertex_list structure records one or more glBegin/End
primitives which all have the same vertex format.
To draw these primitives, we setup the vertex array state, then
issue the drawing command. Before, the 'start' vertex was typically
zero and we used the vertex array pointer to indicate where the
vertex data starts.
This patch checks if the vertex buffer offset is an exact multiple of
the vertex size. If so, that means we can use zero-based vertex array
pointers and use the draw's start value to indicate where the vertex
data starts.
This means a series of display list drawing commands may have
identical vertex array state. This will get filtered out by the
Gallium CSO module so we can issue a tight series of drawing commands
without state changes to the device.
Note that this also works for a series of glCallList commands (not
just one list that contains multiple glBegin/End pairs).
No Piglit or conform changes.
v2: minor fixes suggested by Ian.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Using a plural name makes it easier to see that this is an array and
not a pointer to a single object.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This query shows the ratio of total commands vs. drawing commands sent
to the vgpu device. This gives some idea of how many state changes
are sent per draw call. The closer the ratio is to 1.0, the better.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Evidently, nobody has used PIPE_DRIVER_QUERY_TYPE_FLOAT up to this
point. Adding a driver query of this type which returns the query
value in pipe_query_result::f resulted in garbage output in the HUD.
The problem is the pipe_query_result::f field was being accessed as
through the u64 field and being added to the query_info::results_cumulative
field. This patch checks for PIPE_DRIVER_QUERY_TYPE_FLOAT in a few
places and scales the float by 1000 before converting to uint64_t.
Also, add some comments to explain the query_info::result_index field.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The hud_graph_add_value() function takes a double value, so just pass
the current/critical values as-is since they're doubles.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Set with_dri from with_gallium when DRI GLX is explicitly configured, as
well as when DRI GLX is chosen automatically.
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Symbol rename from dri_* to drm_intel_* introduced a number of compatability
defines within intel_bufmgr.h.
Replace the old function with the new function, consistent with the balance
of this file.
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We still have more work to do but piglit results are looking
pretty good.
At GLSL 1.50 we have 30647/31118 piglit tests passing.
At GLSL 4.50 we have 37927/38551 piglit tests passing.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This shares more code and calls the new shared load_tess_varyings()
abi so that the radeonsi nir path now supports tcs output loads.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The code to load outputs is essentially the same as load inputs
so we make the interface more generic to maximise code sharing.
We will make use of the new support in the following patch.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This resolves a game bug in Dead Island. The game doesn't properly
handle ARB_get_program_binary with 0 supported formats, and ends up
crashing.
This will enable ARB_get_program_binary binary support for any
driver that currently enables the on-disk shader cache.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85564
These will be shared between the on-disk shader cache and
ARB_get_program_binary.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
We can instead just get this from st_*_program.
V2: store tokens to to st_compute_program before attempting to
write to cache (fixes crash).
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
We were using a sequence counter value to wait for a specific NotifyMSC
event. However, we can receive events from other clients as well, which
may already be using higher sequence numbers than us. In that case, we
could stop processing after an event from another client, which could
have been received significantly earlier. This would have multiple
undesirable effects:
* The computed MSC and UST values would be lower than they should be
* We could leave a growing number of NotifyMSC events from ourselves and
other clients in XCB's special event queue
I ran into this with Firefox and Thunderbird, whose VSync threads both
seem to use the same window. The result was sluggish screen updates and
growing memory consumption in one of them.
Fix this by checking the XCB sequence number and MSC value of NotifyMSC
events, instead of using our own sequence number.
v2:
* Use the Present event ID for the sequence parameter of the
PresentNotifyMSC request, as another safeguard against processing
events from other clients
* Rebase on drawable mutex changes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> # v1
This is not hooked up to any messages yet, but useful for e.g.
renderdoc if you add some messages during development.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
For also using it in radv. I moved the remaining stubs back to
anv_device.c as they were just trivial.
This does not move the vk_errorf/anv_perf_warn or the object
type macros, as those depend on anv types and logging.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Lowering these to temps makes a big mess, and results in some
piglit test failures. Also the radeonsi backend (the only backend
to support tess) has support for indirects so there is no need to
lower them anyway.
Fixes the following piglit tests on radeonsi:
tests/spec/arb_tessellation_shader/execution/variable-indexing/tes-input-array-vec3-index-rd.shader_test
tests/spec/arb_tessellation_shader/execution/variable-indexing/tes-input-array-vec4-index-rd.shader_test
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This commit unifies the CCS_E and CCS_D cases. This should fix a couple
of subtle issues. One is that when you use INTEL_DEBUG=norbc to disable
CCS_E, we don't get the sRGB blending workaround. By unifying the code,
we give CCS_D that workaround as well.
The second issue fixed by this refactor is that the blending workaround
was appears to be enabled on all gens but really only applies on gen9.
Due to a happy accident in the way code was laid out, it was only
getting enabled on gen9: gen8 and earlier don't support non-zero-one
clear colors, and gen10 supports sRGB for CCS_E so it got caught in the
format_ccs_e_compat_with_miptree case. This refactor moves it above the
format_ccs_e_compat_with_miptree case so it's an explicit early exit and
makes it explicitly only on gen9.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "17.3" <mesa-stable@lists.freedesktop.org>
This makes sure we flush things out of other caches prior to using a
surface through the render cache. Currently, this is a no-op because GL
won't let you bind anything other than a color surface as color so it
should never end up in the depth cache. However, this does complete the
flush/add_bo pair for regular drawing which will be required for the
next commit.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "17.3" <mesa-stable@lists.freedesktop.org>
Improves performance of SynMark2 OglGSCloth by a further 9.65%±0.59%
due to the reduction in overwraps of the primitive count buffer that
lead to a CPU stall on previous rendering. Cummulative performance
improvement from the series 81.50% ±0.96% (data gathered on VLV).
Tested-By: Eero Tamminen <eero.t.tamminen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This allows us to aggregate the primitive counts of a completed
transform feedback begin/end block lazily, which in the most typical
case (where glDrawTransformFeedback is not used) will allow us to
avoid aggregating the primitive counters on the CPU altogether,
preventing a stall on previous rendering during
glBeginTransformFeedback(), which dramatically improves performance of
applications that rely heavily on transform feedback.
Improves performance of SynMark2 OglGSCloth by 65.52% ±0.25% (data
gathered on VLV).
Tested-By: Eero Tamminen <eero.t.tamminen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
A primitive counter encapsulates a scalar aggregating counter for each
vertex stream along with a section within the primitive tally buffer
which hasn't been read out yet. Defining this as a separate type will
allow us to keep multiple counter objects around for the same
transform feedback object without any code duplication.
Tested-By: Eero Tamminen <eero.t.tamminen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I'm guessing this may have been disable because of missing
component packing support. However recent nir linking changes
required nir based gallium drivers to support component packing
so this should now be ok to enable.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
vsplit_add_cache uses the post-bias index for hashing, but the
vsplit_add_cache_uint/ushort/ubyte ones used the pre-bias index, therefore
the code for handling the special case (because -1 matches the initialization
value of the cache) wasn't actually working.
Commit 78a997f728 actually simplified the
cache logic somewhat, but it looks like this particular problem carried over
(and duplicated to the ushort/ubyte cases, since before only uint needed it).
This could lead to the vsplit cache doing the wrong thing, in particular
later fetch_info might indicate there are 0 values to fetch. This only really
affected edge cases which were bogus to begin with, but it could lead to a
crash with the jit vertex shader, since it cannot handle this case correctly
(the count loop is always executed at least once and we would not allocate
any memory for the shader outputs), so add another assert to catch it there.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This is an optimisation that is recommended by Matt Arsenault,
and used by RadeonSI, but it's not compatible with Vulkan.
Note that AC_FLOAT_MODE_UNSAFE_FP_MATH includes the no signed
zeros flag in LLVM.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
When that debug option is not used, we use the default float mode
because the no signed zeros optimisation is not Vulkan compatible.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This also replaces llvm.AMDGPU.kilp by llvm.AMDGPU.kill with
LLVM < 6. Similar to RadeonSI codepath.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
According with OpenGL GLSL 4.20 spec, section 4.3.9, page 57:
"It is a link-time error if any particular shader interface
contains:
- two different blocks, each having no instance name, and each
having a member of the same name, or
- a variable outside a block, and a block with no instance name,
where the variable has the same name as a member in the block."
This means that it is a link error if for example we have a vertex
shader with the following definition.
"layout(location=0) uniform Data { float a; float b; };"
and a fragment shader with:
"uniform float a;"
As in both cases we refer to both uniforms as "a", and thus using
glGetUniformLocation() wouldn't know which one we mean.
This fixes KHR-GL*.shaders.uniform_block.common.name_matching.
v2: add fixed tests (Tapani)
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
glXGetDriverConfig parameters do not provide a context to dynamically
check for the presence of the function, so the dispatcher directly calls
glXGetDriverConfig, but in non-dri builds dri_glx.c didn't provide
glXGetDriverConfig.
This change make it just return NULL in that case.
Fixes: 84f764a759 "glxglvnddispatch: Add missing dispatch for GetDriverConfig
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This is to fix VA-API issues with GStreamer and MPEG2.
Since gstreamer does not pass quantiser matrices with each frame, invalid
pointers were being passed to the driver. This patch addresses the same.
Signed-off-by: Indrajit Das <indrajit-kumar.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
This can't work for two reasons:
- TESSINNER/TESSOUTER are shader input values, so never translated
to the intrinsic ops
- the shader info pass scans the current stage but we want to know
in TCS, if TES reads the tess factors.
This fixes 6 regressions related to
deqp-vk/tessellation/shader_input_output/tess_level_{inner,outer}_XXX_tes
This reverts commit 5ba1a61648.
Without this initialization the temp registers used in tgsi_declaration
may used random indices, and this may result in failing translation from TGSI
with an error message "GPR limit exceeded", because the random index is greater
then the allowed limit implying that the shader uses more temporary registers then
available.
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
First try with a "soft" depth, to try to schedule sfu instructions
further from their consumers, but fall back to hard depth (which might
result in stalling) if nothing else is avail to schedule.
Previously the consumer of a sfu instruction could end up scheduled
immediately after (since "hard" depth from sfu to consumer would be 0).
This works because legalize pass would insert a (ss) sync bit, but it
is sub-optimal since it would cause a stall.
Instead prioritize other instructions for 4 cycles if they would no
cause a nop to be inserted. This minimizes the stalling. There is a
slight penalty in general to overall # of instructions in shader (since
we could end up needing nop's later due to scheduling the "deeper" sfu
consumer later), but ends up being a wash on register pressure.
Overall this seems to be worth a 10+% gain in fps. Increasing the
"soft" depth of sfu consumer beyond 4 helps a bit in some cases, but 4
seems to be a good trade-off between getting 99% of the gain and not
increasing instruction count of shaders too much.
It's possible a similar approach could help for tex/mem instructions,
but the (sy) sync bit seems to trigger a switch to a different thread-
group to hide memory latency (possibly with some limits depending on
number of registers used?).
Signed-off-by: Rob Clark <robdclark@gmail.com>
If the blit isn't changing format, but is changing tiling, just lie and
call things ARGB (since the exact component order doesn't matter for a
tiling blit).
Signed-off-by: Rob Clark <robdclark@gmail.com>
Overall a nice 5-10% gain for most games. And more for things like
glmark2 texture benchmark.
There are some rough edges. In particular, the hardware seems to only
support tiling or component swap. (Ie. from hw PoV, ARGB/ABGR/RGBA/
BGRA are all the same format but with different component swap.) For
tiled formats, only ARGB is possible. This isn't a big problem for
*sampling* since we also have swizzle state there (and since
util_format_compose_swizzles() already takes into account the component
order, we didn't use COLOR_SWAP for sampling). But it is a problem if
you try to render to a tiled BGRA (for example) surface.
The next patch introduces a workaround for blitter, so we can generate
tiled textures in ABGR/RGBA/BGRA, but that doesn't help the render-
target case. To handle that, I think we'd need to keep track that the
tiled format is different from the linear format, which seems like it
would get extra fun with sampler views/etc.
So for now, disabled by default, enable with FD_MESA_DEBUG=ttile. In
practice it works fine for all the games I've tried, but makes piglit
grumpy.
Signed-off-by: Rob Clark <robdclark@gmail.com>
The rules are sufficiently different for a5xx with tiled textures, so
split this out into something that can be implemented per-generation.
The a5xx specific implementation will come in a later patch.
Signed-off-by: Rob Clark <robdclark@gmail.com>
zlib provides a faster slice-by-4 CRC32 implementation than the
traditional single byte lookup one used by mesa. As most supported
platforms now link zlib unconditionally, we can easily use it.
Improvement for a 1MB buffer (avg MB/s, n=100, zlib 1.2.8):
i5-6600K C2D E4500
mesa zlib mesa zlib
443 1443 225% +/- 2.1% 403 1175 191% +/- 0.9%
It has been verified the calculation results stay the same after this
change.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The next change wants to use some optional zlib functionality, however
not all platforms currently use zlib. Based on earlier Jordan Justen's
patches and their review feedback.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Fixes a number of int64 piglit tests, for example:
generated_tests/spec/arb_gpu_shader_int64/execution/built-in-functions/fs-sign-i64vec2.shader_test
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
V2: just zero-extend the 32-bit value.
Fixes a number of int64 piglet tests, for example:
generated_tests/spec/arb_gpu_shader_int64/execution/conversion/frag-conversion-explicit-bool-int64_t.shader_test
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
assert() is replaced by unreachable(), to avoid following building error:
external/mesa/src/gallium/drivers/radeonsi/si_shader.c:1967:1:
error: control may reach end of non-void function [-Werror,-Wreturn-type]
}
^
1 error generated.
Fixes: c797cd6 ("ac: add load_patch_vertices_in() to the abi")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Fixes a bunch of arb_gpu_shader_fp64 piglit tests for example:
generated_tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/fs-mix-double-double-double.shader_test
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
I had 3.x putting swizzling in the texture state only for 16-bit texture
returns, and in the shader for 32-bit. This may be due to having mixed up
the return channel setup on 3.x back before I had moved it into the
compiler. On 4.x, the non-border-color texwrap tests are passing nicely
with both 16 and 32-bit returns with swizzling in the texture state.
The LDVARY signal now writes an arbitrary register, so I took out the
magic src register file and replaced it with an instruction with LDVARY
set so we have somewhere to hang a QFILE_TEMP destination for register
allocation.
The V3D 3.x series of TMU writes with meaning depending on the texture
type is replaced with writes to specific registers for each texture
argument semantic.
This fills in the delay slots of thread end as much as we can (other than
being cautious about potential TLBZ writes).
In the process, I moved the thread end THRSW instruction creation to the
scheduler. Once we start emitting THRSWs in the shader, we need to
schedule the thread-end one differently from other THRSWs, so having it in
there makes that easy.
Now, instead of a magic write register for VPM stores we have an
instruction to do them (which means no packing of other ALU ops into it),
with the ability to reorder the VPM stores due to the offset being baked
into the instruction.
VPM loads also gain the ability to be reordered by packing the row into
the A argument. They also no longer write to the r3 accumulator, and
instead must be stored to a physical register.
This required moving the register accesses to a separate v3dx file, since
the register definitions for each V3D version collide. It seems that
initializing the v3d_hw from a file dictating 3.3
(v3d_simulator_wrapper.cpp) is safe, though.
The WRTMUC replaces the implicit uniform loads in the first two texture
instructions. LDVPM disappears in favor of an ALU op. LDVARY, LDTMU,
LDTLB, and LDUNIF*RF now write to arbitrary registers, which required
passing the devinfo through to a few more functions.
The TLB load/store path is rebuilt in this version. There is no longer a
single-byte resolved store or the 3-byte extended store. Instead, you get
to always use general loads/stores (which, honestly, was tempting even in
previous versions).
To conditionally compile cl_emit() macros per V3D version, we need it to
expand to whatever V3D we're building for. This required emitting #define
V3D_VERSION 33 in all our currently 3.3-only code.
We try to emit a #error and continue so that you can debug the missing
type at C compile time, but were missing a couple of definitions in that
path (sigh, python).
To avoid compilation warnings and because this helper
shouldn't update anything.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Upon reception of an event that lowered the number of active back buffers,
the code would immediately try to free all back buffers with an id equal to or
higher than the new number of active back buffers.
However, that could lead to an active or to-be-active back buffer being freed,
since the old number of back buffers was used when obtaining an idle back
buffer for use.
This lead to crashes when lowering the number of active back buffers by
transitioning from page-flipping to non-page-flipping presents.
Fix this by computing the number of active back buffers only when trying to
obtain a new back buffer.
Fixes: 15e208c4cc ("loader/dri3: Don't accidently free buffer holding new back content")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104214
Cc: "17.3" <mesa-stable@lists.freedesktop.org>
Tested-by: Andriy.Khulap <andriy.khulap@globallogic.com>
Tested-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
From Vulkan spec:
"descriptorCount is the number of descriptors contained in the binding,
accessed in a shader as an array. If descriptorCount is zero this
binding entry is reserved and the resource must not be accessed from
any stage via this binding within any pipeline using the set layout."
Fixes:
dEQP-VK.binding_model.descriptor_update.empty_descriptor.uniform_buffer
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
ARB_ubo requires 12 UBOs (per stage) at least, but this limit has been
raised by GL 4.3 to 14, so don't advertize GL 4.3 without it (only checking
the vertex stage since all drivers probably have the same limit anyway for
other stages). (piglit has minmax tests for that kind of thing, but they go
only up to 3.3, so this won't really be noticed.)
I think this currently should not affect any driver - r600 until very
recently only supported 12 but now advertizes 14 too.
Reviewed-by: Brian Paul <brianp@vmware.com>
We've seen some problems internally due to macro redefinition.
Fix this by adding HAVE_FUNC_ATTRIBUTE_NORETURN to c99_compat.h,
and defining it for msvc.
And avoid redefinition just in case.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
If the number of instances hasn't changed and we've already
emitted it, don't emit it again.
If the vertex shader is the same and the first_instance, vertex_offset
haven't changed don't emit them again.
This increases the fps in GL_vs_VK -t 1 -m -api vk from around 40
to around 60 here, it may not impact anything else.
Dieter also reported smoketest going from 1060->1200 fps.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Running dota2 since the below commit crashes with an llvm assert.
Trim the vector like the other user. This possible could also be
avoided by not padding inside the load vec3->vec4.
Fixes: 41c36c4549 (amd/common: use ac_build_buffer_load() for emitting UBO loads)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This creates two new internal dependencies, idep_nir_headers and
idep_nir. The former encapsulates the generation of nir_opcodes.h and
nir_builder_opcodes.h and adding src/compiler/nir as an include path.
This ensures that any target that needs nir headers will have the
includes and that the generated headers will be generated before the
target is build. The second, idep_nir, includes the first and
additionally links to libnir.
This is intended to make it easier to avoid race conditions in the build
when using nir, since the number of consumers for libnir and it's
headers are quite high.
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
These were added after adderlib was mesonified, but it still good to use
them instead of open coding them.
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Currently the meosn build has a mix of two styles:
arg : [foo, ...
bar],
and
arg : [
foo, ...,
bar,
]
For consistency let's pick one. I've picked the later style, which I
think is more readable, and is more common in the mesa code base.
v2: - fix commit message
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
We already had to switch all of the W types to UW to prevent issues
with vector immediates on gen10. We may as well use unsigned types
everywhere.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Gen 10 has a strange hardware bug involving V immediates with W types.
It appears that a mov(8) g2<1>W 0x76543210V will actually result in g2
getting the value {3, 2, 1, 0, 3, 2, 1, 0}. In particular, the bottom
four nibbles are repeated instead of the top four being taken. (A mov
of 0x00003210V yields the same result.) This bug does not appear in any
hardware documentation as far as we can tell and the simulator does not
implement the bug either.
Commit 6132992cdb was mostly a no-op
except that it changed the type of the subgroup invocation from UW to W
and caused us to tickle this bug with basically every compute shader
that uses any sort of invocation ID (which is most of them). This is
also potentially an issue for geometry shader input pulls and SampleID
setup. The easy solution is just to change the few places where we use
a vector integer immediate with a W type to use a UW type.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Fixes: 6132992cdb
Without this we end up with the llvm error message:
"Both operands to a binary operator are not of the same type!"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Without this we end up with the llvm error message:
"Both operands to a binary operator are not of the same type!"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Without this we end up with the llvm error message:
"Both operands to a binary operator are not of the same type!"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Per spec:
"Additionally, exporting a fence payload to a handle with copy transference has the same side effects
on the source fence’s payload as executing a fence reset operation. If the fence was using a
temporarily imported payload, the fence’s prior permanent payload will be restored."
And similar for semaphores:
"Additionally, exporting a semaphore payload to a handle with copy transference has the same side
effects on the source semaphore’s payload as executing a semaphore wait operation. If the
semaphore was using a temporarily imported payload, the semaphore’s prior permanent payload
will be restored."
Fixes: 42bc25a79c "radv: Advertise sync fd import and export."
Reviewed-by: Dave Airlie <airlied@redhat.com>
Some cases weren't handled, such as stride 4 which is needed for 64-bit
operations. Presumably fixes the assertion failure mentioned in commit
2d04572038 (Revert "i965/fs: Use align1 mode on ternary instructions
on Gen10+") but who can really say since the commit neglected to list
any of them!
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
After executing a secondary command buffer, we need to update certain
state on the primary command buffer to reflect changes by the secondary.
Otherwise subsequent commands may not have the correct state set.
This fixes various issues (rendering errors, GPU hangs) seen after
executing secondary command buffers in some cases.
v2 (Jason Ekstrand):
- Reset to invalid values instead of pulling from the secondary
- Change the comment to be more descriptive
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
anv_extensions usage from anv_icd was bringing the unwanted dependency
of mako templates for the latter. We don't want that since it will
force the dependency even for distributable tarballs which was not
needed until now.
Jason suggested this approach.
v2: Patch simplification (Jason).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104551
Fixes: 0ab04ba979 ("anv: Use python to generate ICD json files")
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
"The maxDescriptorSet* limit is n times the corresponding
maxPerStageDescriptor* limit, where n is the number of shader stages
supported by the VkPhysicalDevice. If all shader stages are supported,
n = 6 (vertex, tessellation control, tessellation evaluation,
geometry, fragment, compute)."
Fixes:
dEQP-VK.api.info.device.properties
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes the follow test for radeonsi nir:
tests/spec/arb_tessellation_shader/execution/quads.shader_test
Also stops 8 other tests from crashing, they now just fail e.g.
tcs-output-array-float-index-rd-after-barrier.shader_test
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
If they were promoted from inputs/outputs, they could have a
non-zero value left over, which messed with our store handling.
Fixes: 06f05040eb "radv: Link shaders."
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Many of the functions declared in tgsi_build.h return structs (not struct
pointers). Therefore the full struct definitions are needed to avoid
warnings or errors:
In file included from src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp:23:
external/mesa3d/src/gallium/auxiliary/tgsi/tgsi_build.h:47:1: error: 'tgsi_build_header' has C-linkage specified, but returns incomplete type 'struct tgsi_header' which could be incompatible with C [-Werror,-Wreturn-type-c-linkage]
This error shows up on Android builds using clang and -Werror.
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Rob Herring <robh@kernel.org>
BuilderSWR::swr_gs_llvm_fetch_input() (and consequently
swr_gs_llvm_fetch_input()), did not handle the case where
is_vindex_indirect or is_aindex_direct is set.
Implement it, using the code in draw_llvm.c as a guideline.
Fixes the following piglit tests:
dynamic_input_array_index (crash)
gs-input-array-vec4-index-rd
vs-output-array-vec4-index-wr-before-gs
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
This is unused because it's for libGL/libEGL, not drivers.
v2: i965 was wrong, because it used dri_util instead of its own config.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Using a gather for elements less than 32-bits in size can cause
pagefaults when loading the last elements in a page-aligned-sized
buffer.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
If MaxAttribs were ever raised to 32, undefined behavior would occur.
We had already gone to the effort (albeit incorrectly) handle this in
one case, so fix them all.
CID: 1369628
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
If max_index were ever 32, the linker would have marked all 32
locations as invalid instead of marking none of them as invalid. It's
a good thing the maximum value actually set by any driver for
MaxAttribs is 16.
Found by inspection while investigating CID 1369628.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
All cases where the result could be non-visit_continue would have
already returned.
CID: 401351, 1224465, 1224466
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
None of these are necessary because result->type is the only thing used
outside the giant switch-statement.
CID: 1230983, 1230984
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Tested with a modified deferred demo and no regressions in a 1.0.2
mustpass run.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The EXT values are really large, e.g.
VK_DYNAMIC_STATE_DISCARD_RECTANGLE_EXT = 1000099000, so 1 << value
is not going to fit into a 32-bit mask.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
It makes more sense to rely on nir_intrinsic_load_push_constant
instead of the pipeline layout.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
When a TCS is present at link time we know the number of vertices in the
patch and we can lower gl_PatchVerticesIn in the TesEval stage directly
to a constant. We already have a pass for this that we use in the
Vulkan pipeline, so we just reuse that.
Notice that the GLSL linker also implements this optimization, which
we are not removing because other drivers may still depend on it, so
this should only be useful for OpenGL SPIR-V shaders for now.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Intel was the only user and now NIR can do the lowering.
v2: do not try to handle it as a system value directly for the SPIR-V
path. In GL we rather handle it as a uniform like we do for the
GLSL path (Jason).
v3: drop LowerTESPatchVerticesIn as well (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We want this here instead of nir_lower_system_values because for
Vulkan we don't want this lowering to take place.
v2: do not try to handle it as a system value directly for the SPIR-V
path. In GL we rather handle it as a uniform like we do for the
GLSL path (Jason).
v3: do this also for the TessEval stage (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2: do not try to handle it as a system value directly for the SPIR-V
path. In GL we rather handle it as a uniform like we do for the
GLSL path (Jason).
v3:
- Remove the uniform variable, it is alwats -1 now (Jason)
- Also do the lowering for the TessEval stage (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Similar to const buffers. The driver must not emit any tes-related state if tes
is disabled, since the hw slots are all shared by VS, therefore it would
overwrite them (the mesa state tracker might not do this, but it would be
perfectly legal to do so).
Nevertheless I think the dirty state tracking logic in the driver is
fundamentally flawed when tes is disabled/enabled, since it looks to me like
the VS (and TES) state would not get reemitted to the correct slots (if it's
not dirty anyway). Unless I'm missing something...
Theoretically, the overwrite problem could be solved by using non-overlapping
resource slots for TES and VS (since we're not even close to using half the
resource slots), but it wouldn't work for constant buffers nor samplers, and
for VS would still need to propagate changes to both LS and VS, so probably
not a useful idea.
Unfortunately there's zero coverage of this with piglit, since all tessellation
shader tests are just shader_runner tests, which are unsuitable for testing
any kind of state dependency tracking issues (so I can't even quickly hack
something up to proove it and fix it...).
TCS otoh is just fine - like GS it has its own hw slots.
Tested-by: Konstantin Kharlamov <hi-angel@yandex.ru>
Reviewed-by: Dave Airlie <airlied@redhat.com>
With the exception of the default tess levels only ever accessed
by the default tcs shader, the LDS_INFO const buffer was only accessed by vtx
instructions, and not through kcache. No idea why really, but use this to our
advantage by not using a constant buffer slot for it. This just requires us to
throw the default tess levels into the "normal" driver const buffer instead.
Alternatively, could acesss those constants via vtx instructions too, but then
we couldn't use a ordinary ureg prog accessing them as constants and would have
to generate that directly when compiling the default tcs shader. (Another
alternative would be to put all lds info into the ordinary driver const
buffer, albeit we'd maybe need to increase the fixed size as it can't fit
alongside the ucp since vs needs access to the lds info too.)
Tested-by: Konstantin Kharlamov <hi-angel@yandex.ru>
Dave Airlie <airlied@redhat.com>
Contrary to what the comment said, this appears to work just fine on my rv770
(tested with piglit textureSize 140 fs/vs samplerBuffer).
Dave Airlie confirmed it working on cayman too.
I have no clue though if it's actually preferrable to use it (unfortunately
we cannot get rid of the tex constants completely, as we still require them
for cube map txq).
Albeit filling in the format (1 channels or 4?) and the stuff related to mega-
or mini-fetch (what the hell is this...) is just a guess based on other usage
of vtx fetch instructions...
v2: it really needs to be done through texture cache (I botched the
testing because sb optimizations turned it automatically into tc, but
can't rely on it and isn't happening on tes).
Tested-by: Konstantin Kharlamov <hi-angel@yandex.ru>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Ideally we'd support 16 (d3d11 requires 15, and mesa subtracts one for non-ubo
constants), but that's kind of impossible (it would be only doable if either
we'd somehow merge the mesa non-ubo constants with the driver constants, or
only use the driver constants with vtx fetch instead of through the kcache
mechanism - the latter probably wouldn't be too bad).
For now just do as the comment already said, place the gs ring (not really
a const buffer in any case) which is only ever referred to through vc fetch
clauses at index 16. Throw in a couple asserts for good measure to make sure
the hw limit isn't exceeded.
Tested-by: Konstantin Kharlamov <hi-angel@yandex.ru>
Reviewed-by: Dave Airlie <airlied@redhat.com>
We only did this for the other stages, but obviously tess eval/ctrl need it
too.
This fixes the (newly modified) piglit texturing/textureSize test when run
with tes stage and bufferSampler.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Juniper really has a maximum of 4 RBEs (16 pixels). However, predication
always locks up on my HD 5750, and through experiments it looks like if we're
pretending it has a maximum of 8, with 4 disabled, it works correctly.
My conclusion would be that there's a bug (likely firmware, not hw) which
causes the predication logic to try to read 8 results out of the query buffer
instead of just 4, and since of course noone ever writes the upper 4, the
status bit is never set and hence it will wait for it forever.
Ideally this would be fixed in firmware, but I'd guess chances of that
happening are slim.
This will double the size of (occlusion) query result buffers, write the
status bit for the disabled rbs in these buffers, and will also add 8 results
together instead of just 4 when reading them back. The latter is unnecessary,
but it's probably not worth bothering - luckily num_render_backends isn't
used outside of occlusion queries, so don't need separate value for the
"real" maximum.
Also print out the enabled_rb_mask if it changed from the pre-fixed value
(which is already printed out), just in case there's some more problems
with chips which have some rbs disabled...
This fixes all the lockups with piglit nv_conditional_render tests on my
HD 5750 (all pass).
Reviewed-by: Dave Airlie <airlied@redhat.com>
The logic had two fatal flaws which completely killed the default value.
1) drm will overwrite the value anyway even if the chip can't be handled
2) the default value logic is relying on num_render_backends, which was
filled in later.
Luckily noone is relying on it, but it's a bit confusing seeing the chip clock
printed out there (as hex) with R600_DEBUG=info...
(Albeit radeonsi does not appear to fix up the value. If kernels which don't
handle this query are still supported, radeonsi will still end up with a broken
enabled_rb_mask, I have no idea of the potential results of this there.)
Reviewed-by: Dave Airlie <airlied@redhat.com>
For eg/cm, the r600_gb_backend_map will always be 0. This is a bug in
the drm kernel driver, as it just just never fills the information in
(it is now being fixed - the history shows it was being filled in when
the query was brand new but got lost shortly thereafter with backend_map
fixes).
This causes r600_query_hw_prepare_buffer to write the "status bit"
(just the highest bit of the occlusion query result) even for active rbes
(all but the first). This doesn't make much sense, albeit I suppose it's mostly
safe. According to the commit history, it's necessary to set these bits for
inactive rbes since otherwise predication will lock up - presumably the hw just
is waiting for the status bit to appear, which will never happen with inactive
rbes. I'd guess potentially predication could be wrong (due to not waiting for
the actual result if the status bit is already there) if this is set for
active rbes.
Discovered while trying to fix predication lockups on Juniper (needs another
patch).
Reviewed-by: Dave Airlie <airlied@redhat.com>
This fixes the new piglit test.
While here also fix up the logic for early exit of setting up driver consts.
Tested-by: Konstantin Kharlamov <hi-angel@yandex.ru>
Reviewed-by: Reviewed-by: Dave Airlie <airlied@redhat.com>
The offset looks bogus to me. Albeit in the end it doesn't matter, by the
looks of it offsets smaller than 4 get ignored there (not sure of the rules,
I suppose either non-dword aligned offsets never work there or the offset
must be at least aligned to the size of a single element).
Tested-by: Konstantin Kharlamov <hi-angel@yandex.ru>
Reviewed-by: Dave Airlie <airlied@redhat.com>
../../../src/mesa/main/shaderapi.c: In function ‘_mesa_ShaderBinary’:
../../../src/mesa/main/shaderapi.c:2188:9: error: implicit declaration of function ‘alloca’ [-Werror=implicit-function-declaration]
Apparently, Geminilake requires you to whack a chicken bit to select
either compute or tessellation mode for barriers. The recommendation
is to switch between them at PIPELINE_SELECT time.
We may not need to do this all the time, but I don't know that it hurts
either. PIPELINE_SELECT is already a pretty giant stall.
This appears to fix hangs in tessellation control shaders with barriers
on Geminilake. Note that this requires a corresponding kernel change,
drm/i915: Whitelist SLICE_COMMON_ECO_CHICKEN1 on Geminilake.
in order for the register write to actually happen. Without an updated
kernel, this register write will be noop'd and the fix will not work.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Memtrace aubs are similar to classic aubs, with the major
difference being how command submission is serialized (as register
writes instead of a high-level submit message). Some internal
tools generate or consume only memtrace aubs.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
New generated files from:
bb1e6ff161 ("spirv: Add a prepass to set types on vtn_values")
65fc16c974 ("autotools: set XA versions in configure.ac and configure header file")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This has only been compile tested.
v2: - Have a single option for opencl (Eric E)
- fix typo "tgis" -> "tgsi" (Curro)
- Don't add "lib" to pipe loader libraries, which matches the
autotools behavior
v3: - Remove trailing whitespace
- Make PIPE_SEARCH_DIR an absolute path
v4: - add trailing / to LIBCLC defines
Acked-by: Curro Jerez <currojerez@riseup.net>
Tested-by: Jan Vesely <jan.vesely@rutgers.edu>
cc: Aaron Watry <awatry@gmail.com>
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
This enables the SWR driver, but doesn't actually hook it up to any of
the targets yet. I felt like this patch was big and complicated enough
without adding that.
v2: - Fix typo 'delemeited' -> 'delimited' (Eric E)
- Fix type 'errror' -> 'error' (Eric E)
- Use variables to hold files instead of looking above the current
meson build (Eric E)
- Use foreach loops to reduce the number of unique generators
- Add comment about why some generators have names and some are just
added to a list
v3: - Remove trailing whitespace
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
nir_to_llvm_context will always be NULL for radeonsi so we need
work around this.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes the following piglit tests in radeonsi:
vs-tcs-tes-tessinner-tessouter-inputs-quads.shader_test
vs-tcs-tes-tessinner-tessouter-inputs-tris.shader_test
vs-tes-tessinner-tessouter-inputs-quads.shader_test
vs-tes-tessinner-tessouter-inputs-tris.shader_test
v2: make use of si_shader_io_get_unique_index_patch()
via the helper in the previous patch rather than
shader_io_get_unique_index()
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This will be shared by the tgsi and nir backends.
v2: move si_shader_io_get_unique_index_patch() call inside
the helper.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Technically, the GLSLang bug related to this can also affect SSBO writes
where the bool -> uint conversion is missing. However, the only known
shipping application with an old enough version of GLSLang to cause
issues with this is the new DOOM game so we keep the workaround as small
as possible.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104424
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Previously, we were storing a pointer to the vtn_value because we use it
to look up decorations when we create input/output variables. This
works, but it also may be useful to have the id itself so we may as well
store that instead.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Now that higher levels are enforcing decoration sanity, we don't need
the vtn_asserts here. This function *should* be safe but we still want
a few well-placed regular asserts in case something goes awry.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This reworks the error checking on our generic handling of decorations.
The objective is to validate all of the SPIR-V assumptions we make
up-front and convert redundant checks to compiled-out asserts. The most
important part of this is to ensure that member decorations only occur
on OpTypeStruct and that the member is never out-of-bounds. This way
later code can assume that the member is sane and not have to worry
about OOB array access due to a misplaced OpMemberDecorate.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This is a bit simpler since we have fewer enum values in the case. It's
also a bit more efficient because we're making fewer glsl_get_* calls.
While we're at it, add better type validation.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Now that vtn_base_type is a real and full base type, we can switch on
that instead of the GLSL base type which is a lot fewer cases in our
switch.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
VGPR1 = InstanceID / StepRate0; // StepRate0 can be set to 1
Ported from RadeonSI.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
For Vega10 and Raven that need a special workaround for the
scissor bug.
This seems to give a minor boost for Talos and Dota 2, at least.
To reduce the cost of memcmp, the driver checks if it's
really useful to do the comparison.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
VGPR1 is only needed for topology that needs 3 offsets like
triangles or quads.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
../../src/util/.libs/libmesautil.a(libmesautil_la-u_queue.o): In function `u_thread_setname':
/builddir/build/BUILD/mesa-17.3.1/src/util/../../src/util/u_thread.h:66: undefined reference to `pthread_setname_np'
../../src/util/.libs/libmesautil.a(libmesautil_la-u_queue.o): In function `thrd_join':
/builddir/build/BUILD/mesa-17.3.1/src/util/../../include/c11/threads_posix.h:336: undefined reference to `pthread_join'
../../src/util/.libs/libmesautil.a(libmesautil_la-u_queue.o): In function `u_thread_create':
/builddir/build/BUILD/mesa-17.3.1/src/util/../../src/util/u_thread.h:48: undefined reference to `pthread_sigmask'
../../src/util/.libs/libmesautil.a(libmesautil_la-u_queue.o): In function `thrd_create':
/builddir/build/BUILD/mesa-17.3.1/src/util/../../include/c11/threads_posix.h:296: undefined reference to `pthread_create'
../../src/util/.libs/libmesautil.a(libmesautil_la-u_queue.o): In function `u_thread_create':
/builddir/build/BUILD/mesa-17.3.1/src/util/../../src/util/u_thread.h:50: undefined reference to `pthread_sigmask'
/builddir/build/BUILD/mesa-17.3.1/src/util/../../src/util/u_thread.h:50: undefined reference to `pthread_sigmask'
../../src/util/.libs/libmesautil.a(libmesautil_la-u_queue.o): In function `call_once':
/builddir/build/BUILD/mesa-17.3.1/src/util/../../include/c11/threads_posix.h:96: undefined reference to `pthread_once'
../../src/util/.libs/libmesautil.a(libmesautil_la-u_queue.o): In function `u_thread_get_time_nano':
/builddir/build/BUILD/mesa-17.3.1/src/util/../../src/util/u_thread.h:84: undefined reference to `pthread_getcpuclockid'
collect2: error: ld returned 1 exit status
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Igor Gnatenko <ignatenko@redhat.com>
This was never enabled in secondary buffers because hiz_enabled was
never set to true for those.
If the app provides a framebuffer in the inheritance info when beginning
a secondary buffer, we can determine if HiZ is enabled and therefore
allow the PMA optimization to be enabled within the command buffer.
This improves performance by ~13% on an internal benchmark on Skylake.
v2: Use anv_cmd_buffer_get_depth_stencil_view().
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Respect the std430 rules for determining offset and size of struct
members when using a std430 buffer. std140 rules lead to wrong buffer
offsets in that case.
Fixes my test case attached in Bugzilla. No piglit changes.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104492
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
A part of the driver constbuf area is allocated for bindless images. Any
update requires uploading to all driver constbufs. This also extends the
driver constbuf to 64KB, up from 2KB.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This keeps a list of resident textures (per context), and dumps that
list into the active buffer list when submitting. We also treat bindless
texture fetches slightly differently, wrt the meaning of indirect, and
not requiring the SAMPLER file to be used.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
In preparation for bindless images, we have to retrieve the
target/format info from the instruction directly, as there will be no
declaration. Furthermore, for bound images, this information is still
available in the instruction, so we can drop the declaration-based
mechanism entirely.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
I'm fairly sure both of the changed sites are OK as-is, but they're
fragile, so this is just safening them up. Since this is happening
pre-ssa, we don't want to be overwriting values that may potentially get
used later on.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
If we free the bo, then the PTE may get deallocated immediately. We have
to make sure that the submission includes a ref to the old bo so that it
remains mapped for the duration of the command execution.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
intel_batchbuffer_emit_float is dead code, it should go.
intel_batchbuffer_emit_dword only had one user, which had bungled using
them by forgetting to call intel_batchbuffer_require_space first. So it
seems wise to delete these unsafe helpers.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
intel_batchbuffer_emit_dword doesn't reserve space for the DWord it
emits. In the past, we had some reserved batch space to ensure this
worked. With the switch to growing batches, we need to actually request
space so that we grow if necessary.
Fixes: 2c46a67b41 (i965: Delete BATCH_RESERVED handling.)
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The enums are moved to the top and indented like the rest of the file.
Comments are added to split up the function aliases by corresponding
extension. This should make no functional difference.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
If we have a color attachment, but its writes are masked, this would
have still returned true. This is inconsistent with how HasWriteableRT
in 3DSTATE_PS_BLEND is set, which does take the mask into account.
This could lead to PixelShaderHasUAV not being set in 3DSTATE_PS_EXTRA
if the fragment shader does use UAVs, meaning the fragment shader may
not be invoked because HasWriteableRT is false. Specifically, this was
seen to occur when the shader also enables early fragment tests: the
fragment shader was not invoked despite passing depth/stencil.
Fix by taking the color write mask into account in this function. This
is consistent with how things are done on i965.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Similar to RadeonSI.
This fixes:
dEQP-VK.image.texel_view_compatible.graphic.basic.attachment_read.bc*r16g16b16a16_sfloat
dEQP-VK.image.extended_usage_bit.attachment_write.r16_sfloat
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
When allocate_user_sgprs() was called, ctx->stage was actually
unset and 0 is for the vertex shader. This doesn't change
anything for now because of the spill support thing.
Though, the number of user SGPRs has to be fixed for merged
shaders on GFX9. It was broken before anyway.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This might decrease VGPR spilling, because we no longer
have to use v4i32 for 2D fetches when level == 0. We now
use v2i32 for those cases.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Using 4, as it is the default value on mesa. See mesa/main/config.h
and the following commit that introduced the value:
15ac66e331
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
ARB_transform_feedback3 sets a minimum of 1, ARB_gpu_shader5 a minimum
of 4. It shouldn't matter too much, so choosing the later.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Used to handle how many ubo you can define on the context. Minimimum
defined as 36 on ARB_uniform_buffer_object spec, up to 84 on OpenGL
4.6 (12 per stage at each moment).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Every now and then I execute the standalone compiler, get the
non-version error, and need to remember what I'm doing wrong
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This allows us to pass the llvm param directly rather than looking
it up.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Copied from radeonsi.
Putting in the correct metadata flush commands for eventually not
flushing L2 on CB/DB switch.
Does not remove the need for V_028A90_CACHE_FLUSH_AND_INV_TS_EVENT
at the moment.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
These are just shaders reads, so we need to invalidate L1.
Fixes: 6dbb0eaccc "radv: handle subpass cache flushes"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
It makes more sense to move all scan stuff in the same place.
Also, we don't really need to duplicate the uses_primid field
for each stages.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Commit 2f421651ac ("egl: let each platform decided how to handle
LIBGL_ALWAYS_SOFTWARE") broke the build due to copy-n-paste of misnamed
function parameter.:
src/egl/drivers/dri2/platform_android.c:1183:8: error: use of undeclared identifier 'disp'
Rather than just fixing 'disp', rename the function parameter 'dpy' to
'disp' to align with the other EGL platforms' implementations.
Fixes: 2f421651ac ("egl: let each platform decided how to handle LIBGL_ALWAYS_SOFTWARE")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Some later code relies on _Layer to set first/last_layer. Make sure it's
always initialized.
Detected by valgrind's conditional jump/move with uninit value logic.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
If alpha-to-coverage is enabled, we have to compute alpha
even if color writes are disabled.
Signed-off-by: Józef Kucia <joseph.kucia@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
sync_files are in linux since 4.7, while the amdgpu fence_to_handle
ioctl is only in 4.15.
In particular we don't need it for sync_file in radv, because
everything happens via syncobjs, which got support earlier than
fence_to_handle.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
When rasterization is disabled we can have that few.
Fixes: 76603aa90b "radv: Drop the default viewport when 0 viewports are given."
Reviewed-by: Dave Airlie <airlied@redhat.com>
Seems like users are actually hitting 0xFFFFFFFF actually making
things broken for them, and the mad max regression is fixed, so
lets put this in once more.
v2: Use 0xf for depth-only htile. (Dave)
Fixes: af2844116f "radv: Revert HTILE reset word to 0xFFFFFFFF."
Reviewed-by: Dave Airlie <airlied@redhat.com>
We were trying to load/store the logical width/height number of compressed
blocks. As long as the textures were large, single-level, and the
load/store at (0,0), it kind of worked.
I want to do the SETMSF.IFA to discard only if execute == 0 and cond, so
our dest of the PUSHZ needs to be nonzero if execute or !cond are nonzero.
Fixes dEQP-GLES3.functional.shaders.discard.dynamic_loop_dynamic.
Apparently the other funcs will have observable differences when early Z
is enabled.
Fixes (new) simulator assertion failures in
dEQP-GLES3.functional.rasterizer_discard.basic.clear_depth.
For enums we were doubling the underscore if the value had a numeric first
character of its name (which safe_name() adds an underscore to). A little
helper function cleans up the other instance of prefixing while also
fixing this.
This means that with no flatshading we'll emit the single-byte
ZERO_ALL_FLAT_SHADE_FLAGS, and otherwise emit a set of FLAT_SHADE_FLAGS to
get all the bits we need set.
There's a _SET enum in the packet we could use to possibly set entire
ranges of the bitfield without using another packet, but this at least
fixes the conformance failure.
In updating the simulator, behavior changed slightly so that our old code
wasn't getting glxgears's flatshading interpolated right. Emit flat
shading code just like we would for a normal flat-shaded varying, by
passing a flag in the shader key for glShadeModel(GL_FLAT) state and
customizing the color inputs based on that.
It seems that the HW team has decided that it's the only supported mode,
and it's the mode I actually meant to be using but forgot. Our table of
return_32_bit should have matched the default non-OVRTMUOUT behavior, so
this change should be invisible.
However, the change revealed that some my return_size checks for swizzling
were a bit confused in the shadow case, so I had to move them to draw time
once we have both the sampler and the view together.
Fixes assertion failures in the updated simulator, where the non-OVRTMUOUT
support has been removed.
The compiler decides how many LDTMUs we're going to emit, and that must
match the P1 flags. This brings the return channel counting to a single
place (so all that's passed into the compiler is "how many return channels
you may request from this texture's format), and was a necessary step for
shadow samplers once we stop using OVRTMUOUT=0.
This means that we get a single copy of it emitted, instead of once at the
start of each tile (though it's still executed once per tile). Fixes
assertion failures with the updated simulator.
In newer versions they've removed the C interface, so make one here. This
also isolates the Mesa codebase from the simulator codebase, so we don't
have conflicts over things like "unreachable"
As Marek noted, the GL_RGBA + GL_UNSIGNED_INT_2_10_10_10_REV type
combo is also good for readback of BGRX1010102 framebuffers, not
only for BGRA1010102 framebuffers for use with glReadPixels()
under GLES, so add it for the GL_IMPLEMENTATION_COLOR_READ_TYPE_OES
query.
Successfully tested on gallium r600 driver with a (quickly hacked
for RGBA 10 10 10 0) dEQP testcase
dEQP-EGL.functional.wide_color.window_1010102_colorspace_default.
Suggested-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Some clients may not like rgb10 fbconfigs and visuals.
Support driconf option 'allow_rgb10_configs' on gallium
to allow per application enable/disable.
The option defaults to enabled.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Successfully tested under Weston 3.0.
Photometer confirms 10 rgb bits from rendering to display.
v2: Rebased onto master for dri2_teardown_wayland().
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Enables eglCreateImageKHR() with target set to
EGL_NATIVE_PIXMAP_KHR to handle color depth 30
X11 drawables.
Note that in theory the drawable depth 32 case in the
current implementation is ambiguous: A depth 32 drawable
could be of format ARGB8888 or ARGB2101010, therefore an
assignment of __DRI_IMAGE_FORMAT_ARGB8888 for a pixmap of
ARGB2101010 format would be wrong. In practice however, the
X-Server (as of v1.19) does not provide any depth 32 visuals
for ARGB2101010 EGL/GLX configs. Those are associated with
depth 30 visuals without an alpha channel instead. Therefore
the switch-case depth 32 branch is only executed for ARGB8888
pixmaps and we get away with this.
Tested with KDE Plasma 5 under X11, DRI2 and DRI3/Present,
selecting EGL + OpenGL compositing and different fbconfigs
with/without 2 bit alpha channel. glxinfo confirms use of
depth 30 visuals for ARGB2101010 only.
Suggested-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Similar to the matching of 24 bit RGB visuals to 32-bit
RGBA EGLConfigs, so X11 compositors won't alpha-blend any
config with a destination alpha buffer during compositing.
Additionally this fixes failure to select ARGB2101010
configs via eglChooseConfig() with EGL_ALPHA_BITS 2 on
a depth 30 X-Screen. The X-Server doesn't provide any
visuals of depth 32 for ARGB2101010 configs, it only
provides depth 30 visuals. Therefore if we'd only match
ARGB2101010 configs to depth 32 RGBA visuals, we would
not ever get a visual for such a config.
This was apparent in piglit tests for egl configs, which
are fixed by this commit.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
This format + type combo is good for BGRA1010102 framebuffers
for use with glReadPixels() under GLES, so add it for the
GL_IMPLEMENTATION_COLOR_READ_TYPE_OES query.
Allows successful testing of 10 bpc / depth 30 rendering with dEQP test
case dEQP-EGL.functional.wide_color.window_1010102_colorspace_default.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Allows to prevent exposing RGB10 configs and visuals to
clients.
v2: Rename expose_rgb10_configs to allow_rgb10_configs,
as suggested by Emil.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Some clients may not like RGB10X2 and RGB10A2 fbconfigs and
visuals. Add a new driconf option 'allow_rgb10_configs' to
allow per application enable/disable.
The option defaults to enabled.
v2: Rename expose_rgb10_configs to allow_rgb10_configs,
as suggested by Emil. Add comment to option parsing,
to make sure it stays before the ->InitScreen().
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Expose formats which are supported at least back to Gen 5 Ironlake,
possibly further. Allow creation of 10 bpc winsys buffers for drawables.
glxinfo now lists new RGBA 10 10 10 2/0 formats.
v2: Move the BGRA/BGRX1010102 formats before the RGBA/RGBX8888
32 bit formats, as the code comments require. Thanks Emil!
Update num_formats from 3 to 5, to keep the special Android
handling intact.
v3: Use num_formats = ARRAY_SIZE(formats) - 2 as suggested by Tapani,
to only exclude the last 2 Android formats, add Tapani's r-b.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Allow DRI3/Present buffer sharing for 10 bpc buffers.
Otherwise composited desktops under DRI3 will only display
black client areas for redirected windows.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Used to support ARGB2101010 and XRGB2101010
winsys framebuffers / drawables, but added
other 10 bpc fourcc's as well for consistency
with definitions in wayland_drm.h, gbm.h, and
drm_fourcc.h.
v2: Align new defines with tabs instead of spaces, for
consistency with remainder of that block of definitions,
as suggested by Tapani.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Extend intel_miptree_blit() to handle at least
ARGB2101010 -> XRGB2101010, ARGB2101010 -> ARGB2101010,
and XRGB2101010 -> XRGB2101010 via the BLT engine,
but not XRGB2101010 -> ARGB2101010 yet.
This works as tested under Compiz, KDE-5, Gnome-Shell.
v2: Restrict BLT fast path to exclude XRGB2101010 -> ARGB2101010,
as intel_miptree_set_alpha_to_one() isn't ready to set 2 bit
alpha channels to 1.0 yet. However, couldn't find a test case
where this specific blit would be needed, so maybe not much
of a point to improve here.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
The parameters to gen_xmlpool.py are wrong and cause the following
warnings:
Warning: language 'out/target/product/linaro_x86_64/gen/STATIC_LIBRARIES/libmesa_util_intermediates/xmlpool/es/LC_MESSAGES/options.mo' not found.
Warning: language 'out/target/product/linaro_x86_64/gen/STATIC_LIBRARIES/libmesa_util_intermediates/xmlpool/nl/LC_MESSAGES/options.mo' not found.
Warning: language 'out/target/product/linaro_x86_64/gen/STATIC_LIBRARIES/libmesa_util_intermediates/xmlpool/fr/LC_MESSAGES/options.mo' not found.
Warning: language 'out/target/product/linaro_x86_64/gen/STATIC_LIBRARIES/libmesa_util_intermediates/xmlpool/sv/LC_MESSAGES/options.mo' not found.
Warning: language 'external/mesa3d/src/util/xmlpool/t_options.h' not found.
Warning: language 'out/target/product/linaro_x86_64/gen/STATIC_LIBRARIES/libmesa_util_intermediates/xmlpool' not found.
Warning: language 'de' not found.
Warning: language 'es' not found.
Warning: language 'nl' not found.
Warning: language 'fr' not found.
Warning: language 'sv' not found.
The result is English is the only language in options.h. Use "$<"
instead of "$^" because we only need the first dependency (the script),
not all dependencies.
Signed-off-by: Rob Herring <robh@kernel.org>
Older OpenGL defines two equations for converting from signed-normalized
to floating point data. These are:
f = (2c + 1)/(2^b - 1) (equation 2.2)
f = max{c/2^(b-1) - 1), -1.0} (equation 2.3)
Both OpenGL 4.2+ and OpenGL ES 3.0+ mandate that equation 2.3 is to be
used in all scenarios, and remove equation 2.2. DirectX uses equation
2.3 as well. Intel hardware only supports equation 2.3, so Gen7.5+
systems that use the vertex fetcher hardware to do the conversions
always get formula 2.3.
This can make a big difference for 10-10-10-2 formats - the 2-bit value
can represent 0 with equation 2.3, and cannot with equation 2.2.
Ivybridge and older were using equation 2.2 for OpenGL, and 2.3 for ES.
Now that Ivybridge supports OpenGL 4.2, this is wrong - we need to use
the new rules, at least in core profile. That would leave Gen4-6 doing
something different than all other hardware, which seems...lame.
With context version promotion, applications that requested a pre-4.2
context may get promoted to 4.2, and thus get the new rules. Zero cases
have been reported of this being a problem. However, we've received a
report that following the old rules breaks expectations. SuperTuxKart
apparently renders the cars red when following equation 2.2, and works
correctly when following equation 2.3:
https://github.com/supertuxkart/stk-code/issues/2885#issuecomment-353858405
So, this patch deletes the legacy equation 2.2 support entirely, making
all hardware and APIs consistently use the new equation 2.3 rules.
If we ever find an application that truly requires the old formula, then
we'd likely want that application to work on modern hardware, too. We'd
likely restore this support as a driconf option. Until then, drop it.
This commit will regress Piglit's draw-vertices-2101010 test on
pre-Haswell without the corresponding Piglit patch to accept either
formula (commit 35daaa1695ea01eb85bc02f9be9b6ebd1a7113a1):
draw-vertices-2101010: Accept either SNORM conversion formula.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
tl;dr: For many types of GL object, we can *NEVER* use the Gen function.
In OpenGL ES (all versions!) and OpenGL compatibility profile,
applications don't have to call Gen functions. The GL spec is very
clear about how you can mix-and-match generated names and non-generated
names: you can use any name you want for a particular object type until
you call the Gen function for that object type.
Here's the problem scenario:
- Application calls a meta function that generates a name. The first
Gen will probably return 1.
- Application decides to use the same name for an object of the same
type without calling Gen. Many demo programs use names 1, 2, 3,
etc. without calling Gen.
- Application calls the meta function again, and the meta function
replaces the data. The application's data is lost, and the app
fails. Have fun debugging that.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92363
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
All of the callers of _mesa_meta_bind_rb_as_tex_image call
_mesa_meta_setup_sampler shortly after. _mesa_meta_setup_sampler also
binds the texture. This is necessary because not all paths that lead to
_mesa_meta_setup_sampler some through _mesa_meta_bind_rb_as_tex_image.
Rename the function _mesa_meta_texture_object_from_renderbuffer to
reflect its true purpose.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Also, the comment on _mesa_record_error was wrong.
dd_function_table::Error was not called because that function does not
exist.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The in-place resolve probably has some additional restrictions when not
operating on a super tiled surface. Disable it on non-supertiled surfaces
for now to work around a GPU hang.
Fixes: 78ade65956 ("etnaviv: Do GC3000 resolve-in-place when possible")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Overall it does not really help or hurt. The deferred demo gets 1%
improvement and some games a 3% decrease, so I don't think this
should be enabled by default.
But with the code upstream it is easier to experiment with it.
v2: Remove initializing the registers from si_emit_config.
Reviewed-by: Dave Airlie <airlied@redhat.com>
piglit doesn't care, but I'm quite confident that the size actually bound
as range should be reported and not the base size of the resource (and
some quick piglit test hacking confirms this).
Also, the array in the constant buffer looks overallocated by a factor of 4.
For eg, also decrease the size by another factor of 2 by using the same
constant slot for both buffer size (required for txq for TBOs) and the number
of layers for cube arrays, as these are mutually exclusive. Could of course use
some more logic and only actually do this for the samplers/images/buffers where
it's required rather than for all, but ah well...
Reviewed-by: Dave Airlie <airlied@redhat.com>
Those are implemented as texture sampling, so we need to make the
texture TC-compatible too.
Fixes: 34d23e82ca "radv: set some dcc parameters depending on if texture will be sampled"
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Before this DCC was in practice disabled for most games. This
enables practical DCC use. Expect a 5-10% perf increase on a
bunch of games on vega @ 4k.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
If both source and destination are DCC compressed, and their formats
are not compatible, we need to decompress one of them to make
sure we can do reinterpretation (which needs src format == dst format)
.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Apps can use this for render feedback loops, where things are
defined if they render each pixel only once. However, DCC fails
here, as the level of coherence is a block not a pixel, so disable it.
This is also going to help implementing other stuff.
Even if we optimize this later to only happen if there actually is
a loop (if possible at all ...), then the machinery is still useful
to exclude images accessible by the SDMA queue when that is implemented.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
It should already be valid there + the RB will update it during
rendering.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
For fast clear eliminate and decompressions, we always use the most compressed
format.
For clears, the code already creates a renderpass on demand with the exact same
layout as specified.
Otherwise we start distinguishing between GENERAL and TRANSFER_DST_OPTIMAL.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
We do an in place copy where we read compressed and write decompressed.
By doing this in sizes that cover entire DCC blocks and waiting for all
reads in the block before starting to write we avoid corruption.
In the end we clear the DCC metadata to 0xffffffff.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
We don't get a layout when binding to a descriptor set, but can
assume that the LAYOUT is GENERAL.
For DCC stores with the DCC bits set will result in a hang, so
better be safe than sorry.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
This reverts commit 5951578043.
The mentioned commit causes a hang in DoW3 on Vega.
Fixes: 5951578043 "radv/gfx9: fix block compression texture views."
Acked-by: Dave Airlie <airlied@redhat.com>
The SVGA_NEW_FS flag is needed since we now examine the fragment
shader's fs_shadow_compare_units flags. The SVGA_NEW_TEXTURE_FLAGS
flag is not needed since it's only for pre-VGPU10.
No piglit changes. This doesn't fix any known issues but it could
pop up somewhere. Suggested by Charmaine.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This should fix:
dEQP-VK.pipeline.sampler.view_type.*.format.b4g4r4a4_unorm_pack16.address_modes.all_mode_clamp_to_border_opaque_black
and a few others in that area.
Fixes: b11c4a5546 (radv: add texture descriptor/fmask/cmask support for GFX9)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
amdvlk is probably more subtle than this but it never uses
the inv cb/db variants, we fail some CTS tests without this.
Fixes:
dEQP-VK.renderpass.dedicated_allocation.formats.d32_sfloat_s8_uint.input*.
Fixes: c2fbeb7ca0 (radv: add GFX9 cache flushing support.)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (for now :-)
Signed-off-by: Dave Airlie <airlied@redhat.com>
This ports a fix from amdvlk, to fix the sizing for mip levels
when block compressed images are viewed using uncompressed views.
Fixes:
dEQP-VK.image.texel_view_compatible.graphic.extended*bc*
Fixes: e38685cc62 'Revert "radv: disable support for VEGA for now."'
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes some of the broken:
dEQP-VK.synchronization.op.multi_queue.*64x64x8* tests.
Fixes: e38685cc62 'Revert "radv: disable support for VEGA for now."'
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes some of the broken:
dEQP-VK.synchronization.op.multi_queue.*64x64x8* tests.
Fixes: e38685cc62 'Revert "radv: disable support for VEGA for now."'
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes some of the broken:
dEQP-VK.synchronization.op.multi_queue.*64x64x8* tests.
Fixes: e38685cc62 'Revert "radv: disable support for VEGA for now."'
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Previously, we were flagging the instruction state buffer for capture
but not surface state or dynamic state. We want those captured too.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Some older versions of the Vulkan driver didn't properly tag dynamic
state as needing to be captured. Also, this prevents crashes when
looking at dumps on older kernels.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We were walking the sections, printing the batches, and then freeing
them in one pass. If the batch happens to reference any earlier
sections (which it almost certainly will since it's at the end), we will
access freed memory.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This can happen when there's no active fragment shader, such as
when using transform feedback. This wasn't hit by any Piglit test
but is hit by Daniel Rákos' Nature demo. VMware bug 2026189.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Change 59f458cd87 added more enums to glsl_base_type. We
have to bump up the size of the bitfields for fields of this type
for MSVC. Also, add another assertion to catch another place where
this enum bitfield is used.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
It's legal to a pipeline stat query on a compute queue,
but we'd emit the wrong packet here. This should fix it to emit
the correct packet.
Noticed while inspecting the mpv hang.
Fixes: ad61eac250 (radv: factor out eop event writing code. (v2))
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The event emission wasn't sending the correct packet for gfx8 compute
queues, which explains why it works on vega fine.
This fixes the mpv vulkan hang.
Fixes: ad61eac250 (radv: factor out eop event writing code. (v2))
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
These seem mildly unstable on vega, crashing CTS in various fun ways,
and looks like leaking memory.
Disable for now, but leave the option to enable them.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We destroy the pools but don't free the container.
This fixes:
dEQP-VK.wsi.xlib.swapchain.simulate_oom*
Fixes: d50937f137 (vulkan/wsi: Implement prime in a completely generic way)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Framebuffer is from 0,0, not (dst.x, dst.y).
Fixes: 69136f4e63 "radv/meta: add resolve pass using fragment/vertex shaders"
Reviewed-by: Dave Airlie <airlied@redhat.com>
The position start at (dst.x, dst.y), so if we want the source to
start at (src.x, src.y), we have to offset by (src.x-dst.x,src.y-dst.y).
Haven't tested that this fixed anything yet, but found by inspection.
Fixes: 69136f4e63 "radv/meta: add resolve pass using fragment/vertex shaders"
Reviewed-by: Dave Airlie <airlied@redhat.com>
the samples_identical instruction returns 0 if they are differet, so
we have to do the extra work if the result is 0, not if it is != 0.
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Reviewed-by: Dave Airlie <airlied@redhat.com>
My refactor in 47273d7312 missed this early return; because
of it, setting UseFallback one layer above actually prevented the
software path from being used.
Remove this early return and let each platform's dri2_initialize_*()
decide what it can do with the LIBGL_ALWAYS_SOFTWARE restriction.
platform_{surfaceless,x11,wayland} were already handling it themselves.
Fixes: 47273d7312 "egl: set UseFallback if LIBGL_ALWAYS_SOFTWARE is set"
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reported-by: Brendan King <Brendan.King@imgtec.com>
Note: the following happens only when using slibtool.
Since this is a very serious breakage, we will keep the workaround until
a better solution is available.
DRI modules store the address of the dispatch table in a TLS variable,
_glapi_tls_Dispatch.
Changes to the way libEGL is built in d884d8d007 resulted in
it being statically linked against libglapi, and thus containing its own
copy of _glapi_tls_Dispatch. The result was that some applications would
fail to work (e.g. deqp-egl, which dynamically loads libEGL), due to the
DRI module storing the dispatch table address in one copy of
_glapi_tls_Dispatch, and libEGL obtaining the address from another copy
of the variable.
Fixes: d884d8d007 "egl/dri: link directly to libglapi.so"
Signed-off-by: Brendan King <Brendan.King@imgtec.com>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
For copies the texture unit needs to know the depth format so
it can read the htile data properly.
This fixes:
dEQP-VK.renderpass.suballocation.formats.d32_sfloat_s8_uint.load.clear
Fixes: ad3d98da9f (radv: enable tc compatible htile for d32s8 also.)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This needs to correspond to the bit depth of the Z plane.
noticed in passing reading amdvlk.
Fixes: fc6c77e162 (radv: fix TC-compat HTILE with VK_FORMAT_D32_SFLOAT_S8_UINT on Vega)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fixes a crash since the variant object isn't allocated until later
in the function. Not sure how this got through.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
In some cases, We do shadow comparison cases in the fragment shader
instead of with texture sampler state. But when we do so, we must
disable the shadow comparison test in the sampler state. As it
was, we were doing the comparison twice, which resulted in nonsense.
Also, we had the texcoord and texel value swapped in the comparison
instruction.
Fixes about 38 Piglit tex-miplevel-selection tests.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This is a regression since I added cayman atomic support, not sure
it fixes anything, but the shader dumps look better.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is ported from amdvlk which sets the independent 64b blocks
only for image which will sample dcc.
I'm not sure how to port this to radeonsi.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
These are just taken from amdvlk, we probably knew these already,
but may as well port them now.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We need to move this to a separate loop because
nir_compact_varyings() can alter the IR of a previous stage.
Fixes: 6648bd68fd "st/glsl_to_nir: enable NIR link time opts"
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This was already being reported, just missed the docs.
Reported-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Both in setup and arithmetic instructions. Also, remove the useless
new_*_inst() functions, and refactor check_arith_arg(), because it did
two completely different things.
Piglit: spec/ati_fragment_shader/error04-endshader
Signed-off-by: Miklós Máté <mtmkls@gmail.com>
ATI_fs in swrast only had access to texture coordinates if there was a
valid texture bound and texturing was enabled.
Piglit: spec/ati_fragment_shader/render-sources and render-notexture
Signed-off-by: Miklós Máté <mtmkls@gmail.com>
ATI_fs in swrast only had secondary color if GL_COLOR_SUM was enabled.
This patch probably fixes the same issue in r200.
Piglit: spec/ati_fragment_shader/render-sources and render-precedence
Signed-off-by: Miklós Máté <mtmkls@gmail.com>
This patch fixes multiple problems:
- the interpolator check was duplicated
- both had arg instead of argRep
- I split it into color and alpha for better readability and error msg
- the DOT4 check only applies to color instruction according to the spec
- made the DOT4 check fatal, and improved the error msg
Piglit: spec/ati_fragment_shader/error08-secondary
v2: fixed formatting, added spec quotations
Signed-off-by: Miklós Máté <mtmkls@gmail.com>
This fixes crash when:
- first pass begins with alpha inst
- first pass ends with color inst, second pass begins with alpha inst
Also, use the symbolic name instead of a number.
Piglit: spec/ati_fragment_shader/api-alphafirst
v2: fixed formatting
Signed-off-by: Miklós Máté <mtmkls@gmail.com>
This fixes crash in the state tracker.
Piglit: spec/ati_fragment_shader/render-notexture
v2: fixed formatting, moved stuff inside the loop,
moved the fallback later to fix more cases
Signed-off-by: Miklós Máté <mtmkls@gmail.com>
Increase the limit and handle non-square images better.
This makes glxgears 20% faster on APUs, and a little more on dGPUs.
We all use and love glxgears.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
DCC was disabled when the image format is !!supported, which is one ! too many.
Ironically the commit that introduced it was supposed to lead to more DCC use ...
Fixes: 969537d935 "radv: Add support for more DCC compression with VK_KHR_image_format_list."
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fixes running piglits without -fbo. Probably lots of other stuff too.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
This is parallel to the pre-SM50 change which does this. Adjusts the
shuffles / quadops to make the values correct relative to lane 0, and
then splat the results to all lanes for the final move into the target
register.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-By: Karol Herbst <kherbst@redhat.com>
This fixes the layout issue for the blit path as well.
This fixes:
dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.depth_stencil.d32_sfloat_s8_uint_d32_sfloat_s8_uint*
v2: use compatible render passes.
v2.1: use enum
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
If we are doing a general->general transfer with HIZ enabled,
we want to hit the tile surface disable bits in radv_emit_fb_ds_state,
however we never get the current layout to know we are in general
and meta hardcoded the transfer layout which is always tile enabled.
This fixes:
dEQP-VK.api.copy_and_blit.core.image_to_image.all_formats.depth_stencil.d32_sfloat_s8_uint_d32_sfloat_s8_uint.optimal_general
dEQP-VK.api.copy_and_blit.core.image_to_image.all_formats.depth_stencil.d32_sfloat_s8_uint_d32_sfloat_s8_uint.general_general
v2: refactor some shared helpers for blit patches
v3: we only need multiple render passes as they should be compatible.
v3.1: use enum (Bas)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just refactors the gfx9 blit2d pipeline creation
to be less lines of code.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This add support for a 3D image reading path to the blit 2d paths,
like I did for the clear paths.
Fixes: e38685cc62 'Revert "radv: disable support for VEGA for now."'
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Alex Smith <asmith@feralinteractive.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
On GFX9 we must access 3D textures with 3D samplers AFAICS.
This fixes:
dEQP-VK.api.image_clearing.core.clear_color_image.3d.single_layer
on GFX9 for me.
v1.1: fix tex->sampler_dim to dim
v2: send layer in from outside
v3: don't regress on pre-gfx9
Fixes: e38685cc62 'Revert "radv: disable support for VEGA for now."'
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Alex Smith <asmith@feralinteractive.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
looking at traces I noticed we'd set slice_max too large sometimes.
This should fix it.
v2: fix missing - 1
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This should shut up some Valgrind errors during pre-regalloc
scheduling. The errors were harmless since they could only have led
to the estimation of the bank conflict penalty of an instruction
pre-regalloc, which is inaccurate at that point of the program
compilation, but no less accurate than the intended "return 0"
fall-back path. The scheduling pass is normally re-run after regalloc
with a well-defined grf_used value and accurate bank conflict
information.
Fixes: acf98ff933 "intel/fs: Teach instruction scheduler about GRF bank conflict cycles."
Reported-by: Eero Tamminen <eero.t.tamminen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The weight_vector_type constructor was inadvertently assuming C++17
semantics of the new operator applied on a type with alignment
requirement greater than the largest fundamental alignment.
Unfortunately on earlier C++ dialects the implementation was allowed
to raise an allocation failure when the alignment requirement of the
allocated type was unsupported, in an implementation-defined fashion.
It's expected that a C++ implementation recent enough to implement
P0035R4 would have honored allocation requests for such over-aligned
types even if the C++17 dialect wasn't active, which is likely the
reason why this problem wasn't caught by our CI system.
A more elegant fix would involve wrapping the __SSE2__ block in a
'__cpp_aligned_new >= 201606' preprocessor conditional and continue
taking advantage of the language feature, but that would yield lower
compile-time performance on old compilers not implementing it
(e.g. GCC versions older than 7.0).
Fixes: af2c320190 "intel/fs: Implement GRF bank conflict mitigation pass."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104226
Reported-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This reverts commit 9702fac68e, which
hangs vulkancts and crucible on all platforms.
The patch is being reverted because it disables continuous integration
testing. The patch from bug 104359 does not apply to master.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104359
This fixes vmfaults seen on vega with:
dEQP-VK.pipeline.multisample_interpolation.sample_interpolate_at_single_sample_.128_128_1.samples_1
These were caused by the don't allocate cmask but it was just accidental.
The actual problem was the shader was trying to get the sample positions from
a buffer, but the buffer was never getting configured to contain them, as the
previous shader never needed them.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fixes: 1171b304f3 (radv: overhaul fragment shader sample positions.)
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes a varying packing issue when using transform feedback in
GL_SEPARATE_ATTRIBS mode. By time we get to linking, we already
know that the number of feedback attributes is under the
GL_MAX_TRANSFORM_FEEDBACK_SEPARATE_ATTRIBS limit so packing isn't
as critical. In fact, packing/splitting vec3 attributes can cause
trouble because splitting effectively creates another TFB output
which can exceed device limits. So, disable vec3 packing when it's
not needed to avoid that issue.
Fixes the Piglit ext_transform_feedback-separate test on VMware
driver.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The mix of bitwise operators with * and + to compute the packing_class
values was a little weird. Just use bitwise ops instead.
v2: add assertion to make sure interpolation bits fit without collision,
per Timothy. Basically, rewrite function to be simpler.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
When walking over all the cases in a OpSwitch, take in account the bitsize
of the literals to avoid getting wrong cases.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Vulkan spec doesn't specify that VK_REMAINING_ARRAY_LAYERS is allowed
in the passed VkClearRect struct.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Thanks to Karol Herbst for the debugging / tracing work that led to this
change.
Move to using lane 0 as the "work" lane for the texture. It is unclear
why this helps, as that computation should be identical to doing it in
the "correct" lane with the properly adjusted quadops.
In order to be able to use the lane 0 result, we also have to ensure
that lane 0 contains the proper array/indirect/shadow values.
This applies to Fermi and Kepler. Maxwell+ may or may not need fixing,
but that lowering logic is separate.
Fixes KHR-GL45.texture_cube_map_array.sampling
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Apparently gallium's u_blitter wants depth from at least the .z component,
and other swizzling appears to apply on top of that. Fixes
fbo-generatemipmap-formats failures with depth formats.
There may be some more RCL work to be done (I think I need to split my Z/S
stores when doing separate stencil), but this gets piglit's "texwrap
GL_ARB_depth_buffer_float" working.
v2: Unwrap the z32f_wrapper before calling the helper, rather than having
the helper have a callback.
v3: Rebase on Rob Clark's u_transfer_helper instead
For devices (and kernels) which support different priority ringbuffers,
expose context priority support.
Signed-off-by: Rob Clark <robdclark@gmail.com>
We still have gpu hangs on Cannonlake when using push constants, so
disable them for now until we have a proper fix for these hangs.
v2: Add warning message when creating context too.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
This is not because the vertex stage needs some push constants
that other stages need them too. This should reduce the number
of loaded SGPRs in some situations.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
They are dummy objects but the spec requires layout to not be
NULL, this just makes sure we are creating valid pipeline layout
objects. This will allow us to remove some useless checks.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It uses slightly more memory (though still bounded by the number
of mapped ranges), but gives less quadratic behavior.
Cuts 4 minutes from the runtime of the CTS *.sparse.* tests.
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Need to do some gymnastics to copy the parameter from the indirect
parameters buffer to uniform so shader sees the correct base-vertex-id.
Fixes ./bin/arb_draw_indirect-vertexid on a5xx and probably a4xx too.
Signed-off-by: Rob Clark <robdclark@gmail.com>
For dealing with indirect-draw + gl_VertexID, we'll introduce another
case where we need to use CP_MEM_TO_MEM. Rather than adding more
if(a5xx)/else make this a ctx vfunc.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Cmdstream traces from blob make it clear that the blob driver dev's
*think* a5xx has a real (non-zero-based) vtxid. But reality claims
differently.
Fixes ./bin/gl-3.2-basevertex-vertexid and probably others.
This means draw-indirect is going to need some gymnastics to copy
base-vertex into uniform. (a4xx probably needs that too.)
Signed-off-by: Rob Clark <robdclark@gmail.com>
When calculating buffer offsets for client buffers account for info.index_bias.
Fixes the follow piglit tests:
arb_draw_elements_base_vertex-drawelements-user_varrays
arb_draw_elements_base_vertex-negative-index-user_varrays
Reviewed-By: Bruce Cherniak <bruce.cherniak@intel.com>
This wasn't calculating the correct value, this along with
a nir patch fixes a regression in:
dEQP-VK.tessellation.shader_input_output.barrier
Fixes: 043d14db30 (ac/nir: don't write tcs outputs to LDS that aren't read back.)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
If we don't remap and output this code would trample the outputs
read bits.
This fixes a regression in
dEQP-VK.tessellation.shader_input_output.barrier
Fixes: 1c9c42d16b (nir: add varying component packing helpers)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The Talos Principle contains shaders with an OpSelect between two
vectors where the condition is a scalar boolean. This is technically
against the spec bout nir_builder gracefully handles it by splatting
out the condition to all the channels. So long as the condition is a
boolean, just emit a warning instead of failing.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104246
This reverts commit 2294d35b24.
We can't do this without adjusting the input SGPRs/VGPRs logic.
For now, just revert it. I will send a proper solution later.
It fixes a rendering issue in F1 2017 that CTS didn't catch up.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
anv merges the tess info correctly, but radv wasn't doing this.
This fixes hangs in
dEQP-VK.tessellation.winding.default_domain.hlsl_triangles_ccw
Fixes: 60fc0544e0 (radv/pipeline: handle tessellation shader compilation)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is in no way optimal, but there seems to be some problems
mixing at the moment, lots of hangs, it is possible, just need
to figure out more magic.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Basically a clone of util_blitter_blit() but with special handling to
blit PIPE_BUFFER as a PIPE_TEXTURE_1D.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Get rid of "gmem" (ie. tiling) ringbuffer, and just emit setup commands
directly to "draw" ringbuffer for compute (and in future for blits not
using the 3d pipe). This way we can have a simple flat cmdstream buffer
and bypass setup related to 3d pipe.
Signed-off-by: Rob Clark <robdclark@gmail.com>
In the busy && !needs_flush case, we can support a DISCARD_RANGE upload
using a staging buffer. This is a bit different from the case of mid-
batch uploads which require us to shadow the whole resource (because
later draws in an earlier tile happen before earlier draws in a later
tile).
Signed-off-by: Rob Clark <robdclark@gmail.com>
According to the RENDER_SURFACE_STATE internal documentation, the
R32G32B32_FLOAT restriction is marked "IVB" only. We choose to apply
it to Ivybridge and Baytrail, but not Haswell.
Apparently fixes KHR-GL46.texture_size_promotion.functional on Haswell.
Changes these tests from crashing to skipping on Haswell:
- KHR-GL46.direct_state_access.textures_storage_multisample_2d_rgb32f
- KHR-GL46.direct_state_access.textures_storage_multisample_3d_rgb32f
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Create a list in decoder to store all render picture buffer pointers that
currently being used in reference picture lists.
During get message buffer call, check each pointer in render_pic_list[]
within given pic->ref[] list, remove pointer that no longer being used by
pic->ref[]. Then add current render surface pointer to the render_pic_list[]
and assign the associated index to result.curr_idx.
As a result, result.curr_idx will have the correct index to represent the
current render picture, instead of the previous increamenting values.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Create a list in decoder to store all render picture buffer pointers that
currently being used in reference picture lists.
During get message buffer call, check each pointer in render_pic_list[]
within given pic->ref[] list, remove pointer that no longer being used by
pic->ref[]. Then add current render surface pointer to the render_pic_list[]
and assign the associated index to result.curr_idx.
As a result, result.curr_idx will have the correct index to represent the
current render picture, instead of the previous increamenting values.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Vaapi encode interface provides idr frame flags, where omx interface doesn't.
Therefore, change to use picture type to determine idr frame, which will
work for both interfaces.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Vaapi encode interface provides idr frame flags, where omx interface doesn't.
Therefore, change to use picture type to determine idr frame, which will
work for both interfaces.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Replace use of x86 intrinsic with general llvm IR instruction.
Generates the same final assembly.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Add BASE_NUMA_NODE, BASE_CORE, BASE_THREAD parameters to
SwrCreateContext.
Add optional SWR_API_THREADING_INFO parameter to SwrCreateContext to
control reservation of API threads.
Add SwrBindApiThread() function to allow binding of API threads to
reserved HW threads.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Since we have HW contexts on gen4/5, we could take advantage of them, as
done for gen6+ in commit e32cd5ffbb ("i965: Rely on hardware contexts
for query objects on Gen6+."), to only emit a pair of counters at
begin/end queryobj, rather than around every primitive. However, to keep
queryobj working in the meantime as we bringup support for HW ctx on
gen4/5, we can keep using the existing code.
References: e32cd5ffbb ("i965: Rely on hardware contexts for query objects on Gen6+.")
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Add a new helper that drivers can use to emulate various things that
need special handling in particular in transfer_map:
1) z32_s8x24.. gl/gallium treats this as a single buffer with depth
and stencil interleaved but hardware frequently treats this as
separate z32 and s8 buffers. Special pack/unpack handling is
needed in transfer_map/unmap to pack/unpack the exposed buffer
2) fake RGTC.. GPUs designed with GLES in mind, but which can other-
wise do GL3, if native RGTC is not supported it can be emulated
by converting to uncompressed internally, but needs pack/unpack
in transfer_map/unmap
3) MSAA resolves in the transfer_map() case
v2: add MSAA resolve based on Eric's "gallium: Add helpers for MSAA
resolves in pipe_transfer_map()/unmap()." patch; avoid wrapping
pipe_resource, to make it possible for drivers to use both this
and threaded_context.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Following dEQP cases pass:
dEQP-EGL.functional.get_proc_address.extension.gl_ext_disjoint_timer_query
dEQP-EGL.functional.client_extensions.disjoint
Piglit test 'ext_disjoint_timer_query-simple' passes with these changes.
No changes/regression observed in Intel CI.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Patch adds GL_GPU_DISJOINT_EXT and enables to use timer queries when
EXT_disjoint_timer_query is enabled.
v2: enable extension only when EXT_disjoint_timer_query set
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v1)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes almost all of piglit's arb_shader_texture_lod grad tests, except for
the base -texgrad/texgradcube ones which fail on what appear to be
precision problems.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We want the clamping of the coordinate to apply after the offset, so we
need to do math to lower the offset out of the instruction. Fixes texwrap
offset cases for GL_CLAMP with GL_NEAREST on vc5.
Note: I moved the get_texture_size() verbatim, so that it was defined
before use.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The spec says the missing texel (when we wrap around both x and y axis)
should be synthesized as the average of the 3 other texels. For bilinear
filtering however we instead adjusted the filter weights (because, while
the complexity looks similar, there would be 4 times as many color values
to fix up than weights). Obviously this could not work for gather (hence
accurate corner filtering was disabled with gather).
Implement this by just doing it as the spec implies - calculate the 4th
texel as the average of the other 3. With gather of course there's only
one color to worry about, so it's not all that many instructions neither
(albeit surely the whole cube map filtering is hilariously complex).
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Cube texture wrapping is a bit special since the values (post face
projection) always are within [0,1], so we took advantage of that and
omitted some clamps.
However, we can still get NaNs (either because the coords already had NaNs,
or the face projection generated them), and in fact we didn't handle them
quite safely. I've seen -INT_MAX + 1 been propagated through as the final int
coord value, albeit I didn't observe a crash. (Not quite a coincidence, since
any stride mul with -INT_MAX or -INT_MAX+1 will turn up as a small positive
number - nevertheless, I'd rather not try my luck, I'm not entirely sure it
can't really turn up negative neither due to seamless coord swapping, plus
ifloor of a NaN is not guaranteed to return -INT_MAX by any standard. And
we kill off NaNs similarly with ordinary texture wrapping too.)
So kill off the NaNs by using the common max against zero method.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Unfortunately, in aubinator and aubinator_error_decode we don't always
know how many of a given state we have, so we must guess. One day,
we'll come up with a way to annotate the batch to solve this problem.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The shared framework can now do everything that aubinator_error_decode
ever did and more. It's time to make the switch.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Previously, if a group was nested in another group such that it didn't
start on a dword boundary, we would decode it as if it started at the
start of its first dword. This changes things to work even more in
terms of bits so that we can properly decode these structs. This
affects MOCS, attribute swizzles, and several other things.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Use 16_ABGR instead of 32_ABGR if Z isn't written.
Ported from RadeonSI.
No CTS regressions on Polaris.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We should also not load the input SGPRs and VGPRS, but
let's start with this for now.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The number of grid components is always 3 when gl_NumWorkGroups
is declared, because it relies on the number of components of
nir_instrinsic_load_num_work_groups.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
From android cts 8.0_r4, a new test case checks if all the required egl
extensions are exposed. In the current implementation we expose KHR_image
if KHR_image_base and KHR_image_pixmap are supported but KHR_image spec
does not mandate the existence of both the extensions.
This patch preserves the current check and also provides the backend
with an option to expose the KHR_image extension.
Test: run cts -m CtsOpenGLTestCases -t \
android.opengl.cts.OpenGlEsVersionTest#testRequiredEglExtensions
Signed-off-by: Harish Krupo <harish.krupo.kps@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
We never supported it. Missed during copy and pasting.
Fixes: 17201a2eb0 "radv: port to using updated anv entrypoint/extension generator."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The hardware doesn't support this, and isl_surf_get_mcs_surf will fail.
I feel a bit bad replicating this logic, but we want to decide up front.
This fixes the following test when run with --deqp-surface-width=16384:
- GTF-GL46.gtf30.GL3Tests.framebuffer_blit.framebuffer_blit_error_blitframebuffer_multisampled_framebuffers_different_sample_count
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Commit bb1e6ff161 ("spirv: Add a prepass to set types on vtn_values")
added generation of vtn_gather_types.c, but forgot to add it to the
Android build files.
Fixes: bb1e6ff161 ("spirv: Add a prepass to set types on vtn_values")
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Rob Herring <robh@kernel.org>
This patch fixes piglit tex3d-maxsize by correcting 4 things:
The total_size calculation was using 32-bit math, therefore a >4GB
allocation request overflowed and was not returning false (unsupported).
Changed AlignedMalloc arguments from "unsigned int" to size_t, to handle
>4GB allocations.
Added error checking on texture allocations to fail gracefully.
Finally, temporarily decreased supported max texture size from 4GB to 2GB.
The gallivm texture-sampler needs some additional work to correctly handle
larger than 2GB textures (offsets to LLVMBuildGEP are signed).
I'm working on a follow-on patch to allow up to 4GB textures, as this is
useful in HPC visualization applications.
Fixes piglit tex3d-maxsize.
v2: Updated patch description to clarify ">4GB".
Reviewed-By: George Kyriazis <george.kyriazis@intel.com>
Environment variable KNOB_MAX_WORKER_THREADS allows the user to override
default thread creation and thread binding. Previous commit to adjust
linux cpu topology caused setting this KNOB to bind all threads to a single
core.
This patch restores correct functionality of override.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
This test should rely on dispatch.h being generated, but it doesn't.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The sample mask is used even if msaa is not explicity enabled when we
have a framebuffer with multisampled surfaces. That's DX behavior and
what the Radeon drivers do. Not sure about other drivers at this point.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Without this we get the error "FPExt only operates on FP" when
converting the following:
vec1 32 ssa_5 = b2f ssa_4
vec1 64 ssa_6 = f2f64 ssa_5
Which results in:
%44 = and i32 %43, 1065353216
%45 = fpext i32 %44 to double
With this patch we now get:
%44 = and i32 %43, 1065353216
%45 = bitcast i32 %44 to float
%46 = fpext float %45 to double
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
nir_lower_io_to_temporaries() does not support tcs so we cannot
assume there are no indirects here. Also the radeonsi backend
(the only backend to support tess) has support for tcs indirects
so there is no need to lower them anyway.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This patch is purely for readability improvements when programming
the MEDIA_VFE_STATE.
Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
No need to pass a pipe_resource when we can just pass the target.
This makes the function potentially more usable. Rename it too.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This function is only used in two places:
1. VMware driver, but only for HUD reporting
2. st/nine state tracker, used for texture memory accounting
Fixes: a69efa9482 ("util: add new util_resource_size() function in
u_resource.[ch]")
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Pointers with no storage type are converted to inout variables but SSA
values and pointers with a storage type (which turns into a uint or
uvec2) are just input variables.
Previously, we just gave them exactly the same type as the respective
image (which already had a sampler2D or similar type). Now they have
their own base type and a pointer to the vtn_type for the image.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Thanks to Emil's -Wundef, t_dd_dmatmp.h now complains that intel_render.c
is missing a couple `#define`s.
Assigning them to 0 keeps the existing behaviour; I'll let someone else
turn them on if this is the behaviour that was intended.
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Note that gl_shader::CompileStatus will also indicate whether a shader
has been successfully specialized.
v2: Use the 'spirv_data' member of gl_shader to know if it is a SPIR-V
shader, instead of a dedicated flag. (Timothy Arceri)
v3: Use bool instead of GLboolean. (Ian Romanick)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
v2: * Add a gl_shader_spirv_data member to gl_shader, which already
encapsulates a gl_spirv_module where the binary will be saved.
(Eduardo Lima)
* Just use the 'spirv_data' member to know whether a gl_shader has
the SPIR_V_BINARY_ARB state. (Timothy Arceri)
* Remove redundant argument checks. Move extension presence check
to API entry point where the rest of checks are. Retype 'n' and
'length'arguments to use the correct and more standard types.
(Ian Romanick)
* Fix some nitpicks. (Ian Romanick)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is a per-shader structure holding the SPIR-V data associated with the
shader (binary module, specialization constants and entry-point).
This is needed because both gl_shader and gl_linked_shader need to share this
data. Instead of copying the data, we pass a reference to it upon program
linking. That's why it is reference-counted.
This struct is created and associated with the shader upon calling
glShaderBinary(), then subsequently filled up by the call to
glSpecializeShaderARB().
v2: Readability improvements (Ian Romanick)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
v2: * Make the SPIR-V module struct part of a larger gl_shader_spirv_data
struct that will be introduced later, and don't reference it directly
in gl_shader. (Eduardo Lima)
* Readability improvements (Ian Romanick)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
v2: * Add meson build bits (Eric Engestrom)
* Return INVALID_OPERATION error on SpecializeShaderARB (Ian Romanick)
v3: Include boilerplate for the GL 4.6 alias of glSpecializeShaderARB
(Neil Roberts)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Instead of calling vtn_add_case for the default case and then looping,
add an is_default variable and do everything inside the loop. This will
make the next commit easier.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This autogenerated pass will automatically find and set the type field
on all vtn_values. This way we always have the type and can use it for
validation and other checks.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
At the moment, this just lets us drop the const_type for constants and
unify things a bit. Eventually, we will use this to store the types of
all SPIR-V SSA values.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We can write to the same output but in different components, like
in this example:
layout(location = 0, component = 0) out ivec2 dEQP_FragColor_0;
layout(location = 0, component = 2) out ivec2 dEQP_FragColor_1;
Therefore, they are not two different outputs but only one.
Fixes:
dEQP-VK.glsl.440.linkage.varying.component.frag_out.*
v3:
- Remove FRAG_RESULT_MAX.
- Add const and use sizeof (Ian).
- Do three-pass to set properly the locations of fragment
outputs when having arrays (Jason).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Care must be taken that all coords end up correct, the tests are very
sensitive that everything is correctly rounded. This doesn't matter
for bilinear filter (since picking a wrong texel with weight zero is
ok), and we could also switch the per-sample coords mistakenly.
While here, also optimize the coord_mirror helper a bit (we can do the
mirroring directly by exploiting float rounding, no need for fixing up
odd/even manually).
I did not touch the mirror_clamp and mirror_clamp_to_border modes.
In contrast to mirror_clamp_to_edge and mirror_repeat these are legacy
modes. They are specified against old gl rules, which actually does
the mirroring not per sample (so you get swapped order if the coord
is in the mirrored section). I think the idea though is that they should
follow the respecified mirror_clamp_to_edge rules so the order would be
correct.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
To support the reindex intrinsic, we need the result to be
something on which we can adjust the index/address.
Since it is all within a basic block, the compiler should be
able to merge any extra loads.
v2: Change visit_get_buffer_size too.
Reviewed-by: Dave Airlie <airlied@redhat.com>
If the app does not plan to put a buffer or image in it
(why? But it is allowed and CTS does it), they do not need to
allocate it with the deciate allocation struct.
Fixes: a639d40f13 "radv: add support for local bos. (v3)"
Reviewed-by: Dave Airlie <airlied@redhat.com>
Push constants on Intel hardware are significantly more performant than
pull constants. Since most Vulkan applications don't actively use push
constants on Vulkan or at least don't use it heavily, we're pulling way
more than we should be. By enabling pushing chunks of UBOs we can get
rid of a lot of those pulls.
On my SKL GT4e, this improves the performance of Dota 2 and Talos by
around 2.5% and improves Aztec Ruins by around 2%.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
In Vulkan, we don't support classic pull constants and everything the
client asks us to push, we push. However, for pushed UBOs, we still
want to fall back to conventional pulls if we run out of space.
Push constants work in terms of 32-byte chunks so if we want to be able
to push UBOs, every thing needs to be 32-byte aligned. Currently, we
only require 16-byte which is too small.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
In order to do this we have to modify push constant set up to handle
ranges. We also have to tweak the way we handle dirty bits a bit so
that we re-push whenever a descriptor set changes.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
There are several places where we look up opcodes in an array of stages.
Assert that the we don't end up going out-of-bounds.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
We want to call brw_nir_analyze_ubo_ranges immedately after
anv_nir_apply_pipeline_layout and it badly wants constants. We could
run an optimization step and let constant folding do it but that's way
more expensive than needed. It's really easy to just handle constants
in apply_pipeline_layout.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This rewires the logic for assigning uniform locations to work in terms
of "complex alignments". The basic idea is that, as we walk the list of
instructions, we keep track of the alignment and continuity requirements
of each slot and assert that the alignments all match up. We then use
those alignments in the compaction stage to ensure that everything gets
placed at a properly aligned register. The old mechanism handled
alignments by special-casing each of the bit sizes and placing 64-bit
values first followed by 32-bit values.
The old scheme had the advantage of never leaving a hole since all the
64-bit values could be tightly packed and so could the 32-bit values.
However, the new scheme has no type size special cases so it handles not
only 32 and 64-bit types but should gracefully extend to 16 and 8-bit
types as the need arises.
Tested-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The testing for this extension is currently very poor. The CTS tests
only test accessing UBOs and SSBOs at dynamic offsets so none of our
constant-offset paths get triggered at all. Also, there's an assertion
in our handling of nir_intrinsic_load_uniform that offset % 4 == 0 which
is never triggered indicating that nothing every gets loaded from an
offset which is not a dword. Both push constants and the constant
offset pull paths are complex enough, we really don't want to ship
without tests. We'll turn the extension back on once we have decent
tests.
VCE processing IBs starts from session and task info at first level,
other commands processed subsequently. The task info for destroy is
embedded to destroy command, resulting that feedback command is not
properly procoessed. This is causing kernel spin VM fault messages on
Polaris and Vega10 card when running ends at encode application.
The fix is also verified on VCE physical mode card.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Cc: mesa-stable@lists.freedesktop.org
Acked-by: Christian König <christian.koenig@amd.com>
Might be useful to know the VRAM/GTT usage, the number of VRAM
CPU page faults, etc. Nothing is currently using that new
interface, but it's a first step.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
dota2 binds a ton of index buffers but the type is always 16-bit.
Note that we have to invalidate the type when switching from
indexed draws to normal draws.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
dota2 always calls vkResetCommandBuffer() before
vkBeginCommandBuffer() which is quite useless.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
RADV_CMD_BUFFER_STATUS_INVALID is not used for now, but I think
it makes sense to declare it. Could be used later with better
command buffer error handling.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Copied from RadeonSI.
This fixes all CTS
dEQP-VK.renderpass.dedicated_allocation.formats.d32_sfloat_s8_uint.clear.*
And some other ones which use the same format.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The GL_ARB_get_program_binary extension spec says:
"If ProgramBinary fails to load a binary, no error is generated, but
any information about a previous link or load of that program object
is lost."
v2:
* Re-initialize shProg->data after clear. (Jordan)
(Required after 6a72eba755)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
V2 (Timothy Arceri):
- add extra code comment
- stop passing around void *binary and just pass
program_binary_header *hdr instead.
- move to src/mesa/main rather than src/util
V3 (Timothy Arceri):
- Move more code out of the backend and into the common
helpers.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Mesa supports either 0 or 1 formats. If 1 format is supported, it is
GL_PROGRAM_BINARY_FORMAT_MESA as defined in the
GL_MESA_program_binary_formats extension spec.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The ARB_get_program_binary extension requires that uniform values in a
program be restored to their initial value just after linking.
This patch saves off the initial values just after linking. When the
program is restored by glProgramBinary, we can use this to copy the
initial value of uniforms into UniformDataSlots.
V2 (Timothy Arceri):
- Store UniformDataDefaults only when serializing GLSL as this
is what we want for both disk cache and ARB_get_program_binary.
This saves us having to come back later and reset the Uniforms
on program binary restores.
Signed-off-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This will allow us to use the program serialization to implement
ARB_get_program_binary.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This addresses a long-standing back-end compiler bug that could lead
to cross-channel data corruption in loops executed non-uniformly. In
some cases live variables extending through a loop divergence point
(e.g. a non-uniform break) into a convergence point (e.g. the end of
the loop) wouldn't be considered live along all physical control flow
paths the SIMD thread could possibly have taken in between due to some
channels remaining in the loop for additional iterations.
This patch fixes the problem by extending the CFG with physical edges
that don't exist in the idealized non-vectorized program, but
represent valid control flow paths the SIMD EU may take due to the
divergence of logical threads. This makes sense because the i965 IR
is explicitly SIMD, and it's not uncommon for instructions to have an
influence on neighboring channels (e.g. a force_writemask_all header
setup), so the behavior of the SIMD thread as a whole needs to be
considered.
No changes in shader-db.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This makes the dataflow propagation logic of the copy propagation pass
more intelligent in cases where the destination of a copy is known to
be undefined for some incoming CFG edges, building upon the
definedness information provided by the last patch. Helps a few
programs, and avoids a handful shader-db regressions from the next
patch.
shader-db results on ILK:
total instructions in shared programs: 6541547 -> 6541523 (-0.00%)
instructions in affected programs: 360 -> 336 (-6.67%)
helped: 8
HURT: 0
LOST: 0
GAINED: 10
shader-db results on BDW:
total instructions in shared programs: 8174323 -> 8173882 (-0.01%)
instructions in affected programs: 7730 -> 7289 (-5.71%)
helped: 5
HURT: 2
LOST: 0
GAINED: 4
shader-db results on SKL:
total instructions in shared programs: 8185669 -> 8184598 (-0.01%)
instructions in affected programs: 10364 -> 9293 (-10.33%)
helped: 5
HURT: 2
LOST: 0
GAINED: 2
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Currently the liveness analysis pass would extend a live interval up
to the top of the program when no unconditional and complete
definition of the variable is found that dominates all of its uses.
This can lead to a serious performance problem in shaders containing
many partial writes, like scalar arithmetic, FP64 and soon FP16
operations. The number of oversize live intervals in such workloads
can cause the compilation time of the shader to explode because of the
worse than quadratic behavior of the register allocator and scheduler
when running out of registers, and it can also cause the running time
of the shader to explode due to the amount of spilling it leads to,
which is orders of magnitude slower than GRF memory.
This patch fixes it by computing the intersection of our current live
intervals with the subset of the program that can possibly be reached
from any definition of the variable. Extending the storage allocation
of the variable beyond that is pretty useless because its value is
guaranteed to be undefined at a point that cannot be reached from any
definition.
According to Jason, this improves performance of the subgroup Vulkan
CTS tests significantly (e.g. the runtime of the dvec4 broadcast test
improves by nearly 50x).
No significant change in the running time of shader-db (with 5%
statistical significance).
shader-db results on IVB:
total cycles in shared programs: 61108780 -> 60932856 (-0.29%)
cycles in affected programs: 16335482 -> 16159558 (-1.08%)
helped: 5121
HURT: 4347
total spills in shared programs: 1309 -> 1288 (-1.60%)
spills in affected programs: 249 -> 228 (-8.43%)
helped: 3
HURT: 0
total fills in shared programs: 1652 -> 1597 (-3.33%)
fills in affected programs: 262 -> 207 (-20.99%)
helped: 4
HURT: 0
LOST: 2
GAINED: 209
shader-db results on BDW:
total cycles in shared programs: 67617262 -> 67361220 (-0.38%)
cycles in affected programs: 23397142 -> 23141100 (-1.09%)
helped: 8045
HURT: 6488
total spills in shared programs: 1456 -> 1252 (-14.01%)
spills in affected programs: 465 -> 261 (-43.87%)
helped: 3
HURT: 0
total fills in shared programs: 1720 -> 1465 (-14.83%)
fills in affected programs: 471 -> 216 (-54.14%)
helped: 4
HURT: 0
LOST: 2
GAINED: 162
shader-db results on SKL:
total cycles in shared programs: 65436248 -> 65245186 (-0.29%)
cycles in affected programs: 22560936 -> 22369874 (-0.85%)
helped: 8457
HURT: 6247
total spills in shared programs: 437 -> 437 (0.00%)
spills in affected programs: 0 -> 0
helped: 0
HURT: 0
total fills in shared programs: 870 -> 854 (-1.84%)
fills in affected programs: 16 -> 0
helped: 1
HURT: 0
LOST: 0
GAINED: 107
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This should allow the post-RA scheduler to do a slightly better job at
hiding latency in presence of instructions incurring bank conflicts.
The main purpuse of this patch is not to improve performance though,
but to get conflict cycles to show up in shader-db statistics in order
to make sure that regressions in the bank conflict mitigation pass
don't go unnoticed.
Acked-by: Matt Turner <mattst88@gmail.com>
Unnecessary GRF bank conflicts increase the issue time of ternary
instructions (the overwhelmingly most common of which is MAD) by
roughly 50%, leading to reduced ALU throughput. This pass attempts to
minimize the number of bank conflicts by rearranging the layout of the
GRF space post-register allocation. It's in general not possible to
eliminate all of them without introducing extra copies, which are
typically more expensive than the bank conflict itself.
In a shader-db run on SKL this helps roughly 46k shaders:
total conflicts in shared programs: 1008981 -> 600461 (-40.49%)
conflicts in affected programs: 816222 -> 407702 (-50.05%)
helped: 46234
HURT: 72
The running time of shader-db itself on SKL seems to be increased by
roughly 2.52%±1.13% with n=20 due to the additional work done by the
compiler back-end.
On earlier generations the pass is somewhat less effective in relative
terms because the hardware incurs a bank conflict anytime the last two
sources of the instruction are duplicate (e.g. while trying to square
a value using MAD), which is impossible to avoid without introducing
copies. E.g. for a shader-db run on SNB:
total conflicts in shared programs: 944636 -> 623185 (-34.03%)
conflicts in affected programs: 853258 -> 531807 (-37.67%)
helped: 31052
HURT: 19
And on BDW:
total conflicts in shared programs: 1418393 -> 987539 (-30.38%)
conflicts in affected programs: 1179787 -> 748933 (-36.52%)
helped: 47592
HURT: 70
On SKL GT4e this improves performance of GpuTest Volplosion by 3.64%
±0.33% with n=16.
NOTE: This patch intentionally disregards some i965 coding conventions
for the sake of reviewability. This is addressed by the next
squash patch which introduces an amount of (for the most part
boring) boilerplate that might distract reviewers from the
non-trivial algorithmic details of the pass.
The following patch is squashed in:
SQUASH: intel/fs/bank_conflicts: Roll back to the nineties.
Acked-by: Matt Turner <mattst88@gmail.com>
src/gallium/winsys/pl111/drm/libpl111winsys.a(pl111_drm_winsys.c.o): In function `pl111_drm_screen_create':
pl111_drm_winsys.c:(.text+0x33): undefined reference to `vc4_drm_screen_create_renderonly'
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
This patch is ported from RadeonSI and it has two effects.
It fixes a rendering issue which affects F1 2017 and Dawn
of War 3 (Vega only) because LLVM was ending up by generating
the new v_mad_mix_{hi,lo} instructions which appear to be
buggy in some way. Not sure if Mesa is generating something
wrong or if the issue is in LLVM only. Anyway, that explains why
the DOW3 issue can't be reproduced with GL on Vega.
It also improves performance because v_cvt_pkrtz_f16 is faster,
and because I guess the rounding mode behaviour is similar between
GL and VK, we can use it. About performance, it improves Talos
by +3/4% but I don't see any other impacts.
No CTS regressions on Polaris.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
To avoid any vulkan driver to include the GL mtypes.h. Renamed as
eventually this could be used by drivers not using nir.
v2: remove compiler/spirv/spirv.h from mtypes (Alejandro)
v3: added the definition at compiler/shader_info.h (Jason Ekstrand)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
At this point dc_job->cache_item_metadata.keys always equals
NULL, so call to free() is useless
Fixes: b86ecea344 ("util/disk_cache: write cache item metadata to disk")
Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
In that case, nir_eval_const_opcode() will evaluate the conversion
but as it was using destination's bit_size, the resulting
value was just a cast of the source constant value. By passing the
source's bit size, it does the conversion properly.
Fixes:
dEQP-VK.spirv_assembly.instruction.*.opspecconstantop.*convert*
v2:
- Remove invalid conversion op cases.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This helps avoid compiler warningss in the next commit - everything
was initialized, but it wasn't obvious to static analysis.
Suggested-by: Tapani Pälli <tapani.palli@intel.com>
This fixes a crash in:
KHR-GL45.enhanced_layouts.xfb_block_stride
Fixes: 0822517936 "glsl: add helper to process xfb qualifiers during linking"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If an array is accessed within an if block, then currently it is not known
whether the value in the address register is involved in the evaluation of the
if condition, and converting the if condition may actually result in
out-of-bounds array access. Consequently, if blocks that contain indirect array
access should not be converted.
Fixes piglits on r600/BARTS:
spec/glsl-1.10/execution/variable-indexing/
vs-output-array-float-index-wr
vs-output-array-vec3-index-wr
vs-output-array-vec4-index-wr
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104143
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We just pass these in from outside in a constant buffer.
The shader side stores them once they are accessed once.
v2: fix to not use a temp_reg.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This add paths to handle TGSI compute shaders and shader selection.
It also avoids emitting certain things on tgsi paths,
CBs, vertex buffers, config reg init (not required).
v1.1: fix rat mask calc
Signed-off-by: Dave Airlie <airlied@redhat.com>
This appears to cause hangs with compute images. Unless
we can find more specifics, just don't do this for now.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Until now it was part of spirv_to_nir_options. But it will be used on
the implementation of ARB_gl_spirv and ARB_spirv_extensions, and added
to the OpenGL context, as a way to save what SPIR-V capabilities the
current OpenGL implementation supports.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The handle type in the case statement is supposed to be VK_EXTERNAL_-
MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT.
Fixes: ab18e8e59b ("anv: Implement VK_EXT_external_memory_dma_buf")
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The handle type in the case statement is supposed to be VK_EXTERNAL_-
MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT.
Fixes: 546e747867 ("radv: Implement VK_EXT_external_memory_dma_buf")
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
`declare_dependency()` takes `compile_args`, not `c_args`.
It was correct in all the other `declare_dependency()` from that commit.
Fixes: 0bbecc5a85 "meson: define driver dependencies"
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Currently there are no users of these outside of extensions.c.
Provide some information why they exist and how to use them.
Cc: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Together with "radeonsi: fix the R600_RESOURCE_FLAG_UNMAPPABLE check",
this ensures that sparse buffers are placed in VRAM.
Noticed by an assertion that started triggering with commit d4fac1e1d7
("gallium/radeon: enable suballocations for VRAM with no CPU access")
Fixes KHR-GL45.sparse_buffer_tests.BufferStorageTest in debug builds.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
The flag is on the pipe_resource, not the r600_resource.
I don't see an obvious bug related to this, but it could potentially lead
to suboptimal placement of some resources.
Fixes: a41587433c ("gallium/radeon: add R600_RESOURCE_FLAG_UNMAPPABLE")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
SSBO loads were using byte_scattered read messages as they allow
reading 16-bit size components. byte_scattered messages can only
operate one component at a time so we needed to emit as many messages
as components.
But for vec2 and vec4 of 16-bit, being multiple of 32-bit we can use the
untyped_surface_read message to read pairs of 16-bit components using only
one message. Once each pair is read it is unshuffled to return the proper
16-bit components. vec3 case is assimilated to vec4 but the 4th component
is ignored.
16-bit scalars are read using one byte_scattered_read message.
v2: Removed use of stride = 2 on sources (Jason Ekstrand)
Rework optimization using unshuffle 16 reads (Chema Casanova)
v3: Use W and D types insead of HF and F in shuffle to avoid rounding
erros (Jason Ekstrand)
Use untyped_surface_read for 16-bit vec3. (Jason Ekstrand)
v4: Use subscript insead of chaging type and stride (Jason Ekstrand)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Currently, we use byte-scattered write messages for storing 16-bit
into an SSBO. This is because untyped surface messages have a fixed
32-bit size.
This patch optimizes these 16-bit writes by combining 2 values (e.g,
two consecutive components aligned with 32-bits) into a 32-bit register,
packing the two 16-bit words.
16-bit single component values will continue to use byte-scattered
write messages. The same will happens when the first consecutive
component is not aligned 32-bits.
This optimization reduces the number of SEND messages used for storing
16-bit values potentially by 2 or 4, which cuts down execution time
significantly because byte-scattered writes are an expensive
operation as they only write a component for message.
v2: Removed use of stride = 2 on sources (Jason Ekstrand)
Rework optimization using shuffle 16 write and enable writes
of 16bit vec4 with only one message of 32-bits. (Chema Casanova)
v3: - Fix coding style (Eduardo Lima)
- Reorganize code to avoid duplication. (Jason Ekstrand)
- Include new comments to explain the length calculations to
fix alignment issues of components. (Jason Ekstrand)
- Fix issues with writemask yz with 16-bit writes. (Jason Ektrand)
v4: (Jason Ekstrand)
- Reorganize 64-bit ssbo-writes to avoid using slots_per_component.
- Comment about why suffle is needed when using byte_scattered_write.
Signed-off-by: Eduardo Lima <elima@igalia.com>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Enables SPV_KHR_16bit_storage on gen 8+.
VK_KHR_16bit_storage is enabled for SSBO/UBO using the
VK_KHR_get_physical_device_properties2 functionality to expose
if the extension is supported or not.
v2: update due rebase against master (Alejandro)
v3: (Jason Ekstrand)
- Move this patch up in VK_KHR_16bit_storage series enabling only
storageBuffer16BitAccess and uniformAndStorageBuffer16BitAccess.
- Only expose VK_KHR_16bit_storage on Gen8+
v4: (Jason Ekstrand)
- Squash enable SPV_KHR_16bit_storage into VK_KHR_16bit_storage
enablement for SSBO/UBO.
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
load_ubo is using 32-bit loads as uniforms surfaces have a 32-bit
surface format defined. So when reading 16-bit components with the
sampler we need to unshuffle two 16-bit components from each 32-bit
component.
Using the sampler avoids the use of the byte_scattered_read message
that needs one message for each component and is supposed to be
slower.
v2: (Jason Ekstrand)
- Simplify component selection and unshuffling for different bitsizes
- Remove SKL optimization of reading only two 32-bit components when
reading 16-bits types.
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
This helpers are used to load/store 16-bit types from/to 32-bit
components.
The functions shuffle_32bit_load_result_to_16bit_data and
shuffle_16bit_data_for_32bit_write are implemented in a similar
way than the analogous functions for handling 64-bit types.
v1: Explain need of temporary in shuffle operations. (Jason Ekstrand)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Used to enable 16-bit reads at do_untyped_vector_read, that is used on
the following intrinsics:
* nir_intrinsic_load_shared
* nir_intrinsic_load_ssbo
v2: Removed use of stride = 2 on 16-bit sources (Jason Ekstrand)
v3: - Add bitsize to scattered read operation (Jason Ekstrand)
- Remove implementation of 16-bit UBO read from this patch.
- Avoid assertion at opt_algebraic caused by ADD of two IMM with
offset with BRW_REGISTER_TYPE_UD type found on matrix tests.
(Jose Maria Casanova)
v4: (Jason Ekstrand)
- Put if case for 16-bits at the beginning of the if ladder.
- Use type_sz(dest.type) * 8 as bit_size parameter for scattered read.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2: Fix alignment style (Topi Pohjolainen)
(Jason Ekstrand)
- Enable bit_size parameter to scattered messages to enable different
bitsizes byte/word/dword.
- Remove use of brw_send_indirect_scattered_message in favor of
brw_send_indirect_surface_message.
- Move scattered messages to surface messages namespace.
- Assert align1 for scattered messages and assume Gen8+.
- Inline brw_set_dp_byte_scattered_read.
v3: (Jason Ekstrand)
- Use renamed brw_byte_scattered_data_element_from_bit_size method
- Assert scattered read for Gen8+ and Haswell.
- Use conditional expresion at components_read.
- Include comment about params for scattered opcodes.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
While on Untyped Surface messages the bits of the execution mask are
ANDed with the corresponding bits of the Pixel/Sample Mask, that is
not the case for byte scattered writes. That is needed to avoid ssbo
stores writing on helper invocations. So when that can affect, we load
the sample mask, and predicate the send message.
Note: the need for this patch was tested with a custom test. Right now
the 16 bit storage CTS tests doesnt need this path in order to get a
full pass.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We need to rely on byte scattered writes as untyped writes are 32-bit
size. We could try to keep using 32-bit messages when we have two or
four 16-bit elements, but for simplicity sake, we use the same message
for any component number. We revisit this aproach in the follwing
patches.
v2: Removed use of stride = 2 on 16-bit sources (Jason Ekstrand)
v3: (Jason Ekstrand)
- Include bit_size to scattered write message and remove namespace
- specific for scattered messages.
- Move comment to proper place.
- Squashed with i965/fs: Adjust type_size/type_slots on store_ssbo.
(Jose Maria Casanova)
- Take into account that get_nir_src returns now WORD types for
16-bit sources instead of DWORD.
v4: (Jason Ekstrand)
- Rename lenght variable to num_components.
- Include assertions before emit_untyped_write.
- Remove type_slot in favor of num_slot and first_slot.
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2: (Jason Ekstrand)
- Enable bit_size parameter to scattered messages to enable different
bitsizes byte/word/dword.
- Remove use of brw_send_indirect_scattered_message in favor of
brw_send_indirect_surface_message.
- Move scattered messages to surface messages namespace.
- Assert align1 for scattered messages and assume Gen8+.
- Inline brw_set_dp_byte_scattered_write.
v3: - Remove leftover newline (Topi Pohjolainen)
- Rename brw_data_size to brw_scattered_data_element and use
defines instead of an enum (Jason Ekstrand)
- Assert scattered write for Gen8+ and Haswell (Jason Ekstrand)
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Although from SPIR-V point of view, rounding modes are attached to the
operation/destination, on i965 it is a status, so we don't need to
explicitly set the rounding mode if the one we want is already set.
Taking into account that the default mode is RTE, one possible
optimization would be optimize out the first RTE set for each
block. For in order to work, we would need to take into account block
interrelationships. At this point, it is not worth to complicate the
optimization for such small gain.
v2: Use a single SHADER_OPCODE_RND_MODE opcode taking an immediate
with the rounding mode (Curro)
v3: Reset optimization for every block. (Jason Ekstrand)
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
By default we don't set the rounding mode. We only set
round-to-near-even or round-to-zero mode if explicitly set from nir.
v2: Use a single SHADER_OPCODE_RND_MODE opcode taking an immediate
with the rounding mode (Curro)
v3: Use new helper brw_rnd_mode_from_nir_op (Jason Ekstrand)
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Although it is possible to emit them directly as AND/OR on brw_fs_nir,
having a specific opcode makes it easier to remove duplicate settings
later.
v2: (Curro)
- Set thread control to 'switch' when using the control register
- Use a single SHADER_OPCODE_RND_MODE opcode taking an immediate
with the rounding mode.
- Avoid magic numbers setting rounding mode field at control register.
v3: (Curro)
- Remove redundant and add missing whitespace lines.
- Match printing instruction to IR opcode "rnd_mode"
v4: (Topi Pohjolainen)
- Fix code style.
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Control register cr0 in i965 can be used to change the rounding modes
in 32-bit to 16-bit floating-point conversions.
From intel Skylake PRM, vol 07, section "Register and Tegister Regions",
subsection "Control Register" (page 754):
"Subregister cr0.0:ud contains normal operation control fields such as the
floating-point mode ... "
Floating-point Rounding mode is changed at bits 5:4 of cr0.0:
"Rounding Mode. This field specifies the FPU rounding mode. It is
initialized by Thread Dispatch."
00b = Round to Nearest or Even (RTNE)
01b = Round Up, toward +inf (RU)
10b = Round Down, toward -inf (RD)
11b = Round Toward Zero (RTZ)"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Conversions to 16-bit need having aligment between the 16-bit
and 32-bit types. So the conversion operations unpack 16-bit types
to with an stride=2 and then applies a MOV with the conversion.
v2 (Jason Ekstrand):
- Avoid the general use of stride=2 for 16-bit register types.
v3 (Topi Pohjolainen)
- Code style fix
(Jason Ekstrand)
- Now nir_op_f2f16 was renamed to nir_op_f2f16_undef
because conversion to f16 with undefined rounding is explicit
Signed-off-by: Eduardo Lima <elima@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Note that we don't remove the assert at i965/vec4. At this point half
float support is only for the scalar backend.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
These types have similar vec4 sizes as their 32-bit counterparts.
The vec4 backend doesn't support 16-bit types and probably never will,
but this method is called by the scalar backend at
fs_visitor::nir_setup_outputs(), so we still need to provide valid vec4
sizes for 16-bit types. In the future, something different should be
implemented to avoid this dependency.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
SpvOpFConvert now manages the FPRoundingMode decorator for the
returning values enabling the nir_rounding_mode in the conversion
operation to fp16 values.
v2: Fixed breaking of specialization constants. (Jason Ekstrand)
v3: Avoid nir_rounding_mode * casting. (Jason Ekstrand)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2: Added more missing implementations of 16-bit types. (Jason Ekstrand)
v3: Store values in values[0].u16[i] (Jason Ekstrand)
Include switches based on bitsize for 16-bit types
(Chema Casanova)
v4: Coding style fixes (Jason Ekstrand)
Use vtn_u64_literal and u64[0] at 64-bit SpvOpConstant (Jason Ekstrand)
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Eduardo Lima <elima@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
nir_type_conversion enables new operations to handle rounding modes to
convert to fp16 values. Two new opcodes are enabled nir_op_f2f16_rtne
and nir_op_f2f16_rtz.
The undefined behaviour doesn't has any effect and uses the original
nir_op_f2f16 operation.
v2: Indentation fixed (Jason Ekstrand)
v3: Use explicit case for undefined rounding and assert if
rounding mode is used for non 16-bit float conversions
(Jason Ekstrand)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This will include the following NIR ALU opcodes:
* nir_op_i2i16
* nir_op_i2f16
* nir_op_u2u16
* nir_op_u2f16
* nir_op_f2i16
* nir_op_f2u16
* nir_op_f2f16
v2: Remove "from" 16-bit in commit subject (Topi Pohjolainen)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2: Renamed glsl_half_float_type() to glsl_float16_t_type().
(Jason Ekstrand)
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Eduardo Lima <elima@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is basically to avoid "not handle in switch" warnings.
v2: Let the new types hit the assertion instead. (Marek Olšák
and Jason Ekstrand)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Adds new INT16, UINT16 and FLOAT16 base types.
The corresponding GL types for half floats were reused from the
AMD_gpu_shader_half_float extension. The int16 and uint16 types come from
NV_gpu_shader_5 extension.
This adds the builtins and the lexer support.
To avoid a bunch of warnings due to cases not handled in switch, the
new types have been added to a few places using same behavior as
their 32-bit counterparts, except for a few trivial cases where they are
already handled properly. Subsequent patches in this set will provide
correct 16-bit implementations when needed.
v2: * Use FLOAT16 instead of HALF_FLOAT as name of the base type.
* Removed float16_t from builtin types.
* Don't copy 16-bit types as if they were 32-bit values in
copy_constant_to_storage().
* Use get_scalar_type() instead of adding a new custom switch
statement.
(Jason Ekstrand)
v3: Use GL_FLOAT16_NV instead of GL_HALF_FLOAT for consistency
(Ilia Mirkin)
v4: Add missing 16-bit base types support in glsl_to_nir (Eduardo Lima).
v5: Fix coding style (Topi Poholainen).
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Eduardo Lima <elima@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Not to be confused with variablePointersStorageBuffer which is the
subset of VK_KHR_variable_pointers required to enable the extension.
This means we now have "full" support for variable pointers.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
The SPIR-V spec is a bit underspecified when it comes to exactly how
you're allowed to use OpPtrAccessChain and what it means in certain edge
cases. In particular, what if the base pointer of the OpPtrAccessChain
points to the base struct of an SSBO instead of an element in that SSBO.
The original variable pointers implementation in mesa assumed that you
weren't allowed to do an OpPtrAccessChain that adjusted the block index
and asserted such. However, there are some CTS tests that do this and,
if the CTS does it, someone will do it in the wild so we should probably
handle it. With this commit, we significantly reduce our assumptions
and should be able to handle more-or-less anything.
The one assumption we still make for correctness is that if we see an
OpPtrAccessChain on a pointer to a struct decorated block that the block
index should be adjusted. In theory, someone could try to put an array
stride on such a pointer and try to make the SSBO an implicit array of
the base struct and we would not give them what they want. That said,
any index other than 0 would count as an out-of-bounds access which is
invalid.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
This is required for being able to handle OpPtrAccessChain in SPIR-V
where the base type of the incoming pointer requires us to add to the
block index instead of the byte offset.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Before, we always left workgroup variables as shared nir_variables and
let the driver call nir_lower_io. This adds an option to do the
lowering directly in spirv_to_nir. To do this, we implicitly assign the
variables a std430 layout and then treat them like a UBO or SSBO and
immediately lower all the way to an offset.
As a side-effect, the spirv_to_nir pass now handles variable pointers
for workgroup variables.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Up until now, all pointers have been ivec2s. We're about to add support
for pointers to workgroup storage and those are going to be uints.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
There is no good reason why we should have the same logic repeated in
get_vulkan_resource_index and vtn_ssa_offset_pointer_dereference. If
we're a bit more careful about how we do things, we can just use the one
function and get rid of the other entirely. This also makes the push
constant special case a lot more clear.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
This commit moves them both into vtn_variables.c towards the top, makes
them take a vtn_builder, and replaces a hand-rolled instance of
is_external_block with a function call.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
This makes us key off of !offset instead of !block_index. It also puts
the guts inside a switch statement so that we can handle more than just
UBOs and SSBOs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
This parallels what we do for vtn_block_load except that we don't yet
support anything except SSBO loads through this path.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
This is equivalent and means we don't have resource index code scattered
about.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
The layer parameter is signed. Fixes the error message seen when
running the arb_texture_multisample-errors test which checks a
negative layer value.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
After the mesa/st nir linking support, we start to see inputs/outputs
like:
decl_var shader_out INTERP_MODE_NONE float packed:uv (VARYING_SLOT_VAR9.x, 1, 0)
decl_var shader_out INTERP_MODE_NONE float packed:uv@0 (VARYING_SLOT_VAR9.y, 1, 0)
(ie. were location_frac != .x)
Unfortunately I overlooked the addition of the component parameter to
load_input/store_output, so when we started encountering inputs/outputs
with component other than .x, we'd end up loading/storing the wrong
input/output.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Since in the NIR case, driver takes ownership of the NIR shader, we need
to clone what is passed to the driver. Normally this is done as part of
creating the shader variant (where is clone is anyways needed). But
compute shaders have no variants, so we were cloning earlier.
The problem is that after the NIR linking optimizations, we ended up
cloning *before* all the lowering passes where done.
So move this into st_get_cp_variant(), to make compute shaders work more
like other shader stages.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The atomic rat path has a bug in the ssbo path, refactor out the
address calcs from the load/store paths and reuse to fix the bug
in the buffer rat atomic path.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just changes how thread id loading is done, it makes
smaller shaders if we don't use thread id gprs.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Commit 94ca8e04ad ("spirv: Add vtn_fail and vtn_assert helpers") broke
Android builds which have -Werror enabled with the following errors:
external/mesa3d/src/compiler/spirv/spirv_to_nir.c:272:1: error: control may reach end of non-void function [-Werror,-Wreturn-type]
external/mesa3d/src/compiler/spirv/spirv_to_nir.c:810:1: error: control may reach end of non-void function [-Werror,-Wreturn-type]
...
The problem is the noreturn attribute is not enabled and we to define
HAVE_FUNC_ATTRIBUTE_NORETURN.
Auditing src/util/macros.h, we're also missing
HAVE_FUNC_ATTRIBUTE_RETURNS_NONNULL and HAVE_FUNC_ATTRIBUTE_WARN_UNUSED_RESULT,
so add them too.
Fixes: 94ca8e04ad ("spirv: Add vtn_fail and vtn_assert helpers")
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Rob Herring <robh@kernel.org>
libmesa_amd_common static dependency is added in Android build
to avoid the following building errors:
In file included from external/mesa/src/gallium/drivers/radeon/r600_buffer_common.c:24:
In file included from external/mesa/src/gallium/drivers/radeonsi/si_pipe.h:26:
external/mesa/src/gallium/drivers/radeonsi/si_shader.h:138:10: fatal error: 'ac_binary.h' file not found
^~~~~~~~~~~~~
1 error generated.
...
In file included from external/mesa/src/gallium/drivers/radeon/r600_gpu_load.c:34:
In file included from external/mesa/src/gallium/drivers/radeonsi/si_pipe.h:26:
external/mesa/src/gallium/drivers/radeonsi/si_shader.h:138:10: fatal error: 'ac_binary.h' file not found
^~~~~~~~~~~~~
1 error generated.
Fixes: 950221f923 ("radeonsi: remove r600_common_screen")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
On Cayman we don't use the append/consume counters (fglrx doesn't)
and they don't seem to work well with compute shaders.
This just uses GDS instead to do the atomic operations.
v1.1: remove unused line.
v2: use EOS on cayman, it appears to work.
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
v2: - set d3d_drivers_path instead of dri_drivers_path
- Fix nine guard to check for all relavent gallium drivers
- Link with libswdri and libswkmsdri when necessary
- Fix pkg-config generation
- Add missing comma
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
This argument is the wrong approach for handling gallium media state
trackers, since it doesn't allow for an auto option. Instead we'll use
tristates, which do allow for auto.
This option has never been wired to anything anyway.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Which is required for the gallium media state trackers.
v2: - Make symlinks local instead of absolute
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
v2: - put driver_swrast in the right field
- add dep_threads (dep_llvm requires threads, so it masked this
previously)
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
This allow us to encapsulate the compiler and linkage requirements of
each driver in a reusable way. The result will be that each target that
needs a specific driver can simply add `driver_<name>` to its
dependencies line and the necessary libraries and compiler args will be
added. This will allow for a lot of code de-duplication between gallium
targets.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
This is a requirement of the next patch. Since meson does not have
forward declarations, and we're going to define the driver dependencies
in the drivers folder they need to be after the winsys so that the
winsys libs are defined first.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
So that state trackers, targets, and special winsys requirements are all
in a single if statement. This is a cosmetic only cleanup with no
functional changes.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Gen10 doesn't automatically decode the clear color of sRGB buffers. To
get correct rendering, avoid fast-clearing such buffers for now.
The driver now passes the following piglit tests:
* spec@arb_framebuffer_srgb@msaa-fast-clear
* spec@ext_texture_srgb@multisample-fast-clear gl_ext_texture_srgb
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
There are some case where the dri3 loader is covering for underlinkage
for GLX and EGL, provide the linkage that they actually need.
v2: - remove dep_xcb_dri3 from glx. This was an oversight in v1 and is
not needed.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Jon Turney <jon.turney@dronecode.org.uk> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Currently this pkg-config file is only installed if a classic dri driver
is built. This is wrong, it should be installed if any dri driver is
installed, which includes the gallium dri target.
Reported-by: Marc Dietrich <marvin24@gmx.de>
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The reference value in gen_device_info isn't going to be acurate on
Gen10+. We should query it from the kernel, which reads a couple of
register to compute the actual value.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
We cannot figure this value out of the PCI-id anymore. Let's read it
from the kernel (which computes this from a few registers).
When running on a (upcoming) 4.16-rc1+ kernel, this will fixes piglit
tests on CNL :
spec@arb_timer_query@query gl_timestamp
spec@arb_timer_query@timestamp-get
spec@ext_timer_query@time-elapsed
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
The WSI core code does all the hard work. Just add the wrappers and
turn it on.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Now that we have anv_device_init/finish functions, there's no reason to
have the individual driver do any more work than that.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
This drops the unneeded callbacks struct as well as the queue_get_family
callback we were using before we'd pulled QueuePresent inside.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Unfortunately, due to the fact that AcquireNextImage does not take a
queue, the ANV trick for triggering the fence won't work in general. We
leave dealing with the fence up to the caller for now.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
v2 (Jason Ekstrand):
- Rebase
- Alter the names of the helpers to better match the vulkan entrypoints
- Use the helpers in anv
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
This lets us save a QueueSubmit and it also makes prime a lot less
X11-specific. Also, it means we can only wait on the semaphores once
instead of on every blit.
Reviewed-by: Dave Airlie <airlied@redhat.com>
This moves bits out of all four corners (anv, radv, x11, wayland) and
into the wsi common code. We also switch to using an outarray to ensure
we get our return code right.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Now that we're using the same common code as radv, we get prime support
for free. Just enable it.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Neither mesa driver really cares, but we should set it none the less for
the sake of correctness.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
v2 (Jason Ekstrand):
- Better comit message
- Rebase
- Re-indent to follow wsi_common style
- Drop the unneeded _swapchain from the newly added helper
- Make the clone more true to the original (as per the rebase)
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Just check if image has scanout flag set
v2 (Jason Ekstrand):
- Rebase
- Also drop the now unused radv_mem_flag_bits enum
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
This uses the mock extension created in a previous commit to tell the
driver that the image it's just been asked to create is, in fact, a
window system image with whatever assumptions that implies. There was a
lot of redundant code between the two drivers to do basically exactly
the same thing.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
We need it to happen after memory type setup so that we can query memory
types in wsi_device_init.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
This lets us set the BO tiling when we allocate the memory. This is
required for GL to work properly.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
This is a modified version of the patch originally sent by Chad Versace.
The primary difference is that this version claims that OPQAUE_FD and
DMA_BUF are compatible handle types.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
This gives the opportunity to collect some function pointers if we'd
like which will be very useful in future.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
This fixes a potential leak if allocating the swapchain fails. Since
geometry checking and bit-depth fetching is self-contained, it makes
sense to just do it first so we can delete the geometry reply.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
This is used to hold information about the allocated image, rather than
an ever-growing function argument list.
v2 (Jason Ekstrand):
- Rename wsi_image_base to wsi_image
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
This just seems cleaner, and we may expand this in future.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This fixes hangs on GFXBench 5's Aztec Ruins benchmark.
Unfortunately, it regresses OglCSCloth performance by about 10%. There
are some ideas for fixing that.
The Vulkan driver already emits this stall.
Reviewed-by: Matt Turner <mattst88@gmail.com>
We need to be able to emit PIPE_CONTROLs from genX_state_upload.c,
which can't safely include brw_defines.h because it conflicts with
genxml. Move all the PIPE_CONTROL related stuff together into a
separate header.
Reviewed-by: Matt Turner <mattst88@gmail.com>
These helpers are much nicer than just using assert because they don't
kill your process. Instead, it longjmps back to spirv_to_nir(), cleans
up all the temporary memory, and nicely returns NULL. While crashing is
completely OK in the Vulkan world, it's not considered to be quite so
nice in GL. This should help us to make SPIR-V parsing much more
robust. The one downside here is that vtn_assert is not compiled out in
release builds like assert() is so it isn't free.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
This commit reworks the way that logging works in SPIR-V to provide
richer and more detailed logging infrastructure. This commit contains
several improvements over the old mechanism:
1) Log messages are now more detailed. They contain the SPIR-V byte
offset as well as source language information from OpSource and
OpLine.
2) There is now a logging callback mechanism so that errors can get
propagated to the client through debug callbak extensions.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
This simply moves allocating the vtn_builder and initializing it to the
very beginning before we even parse the header.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
The separate stencil buffer was not also getting marked as valid if
written by a draw/clear, resulting in gmem2mem getting skipped. Move
this into fd_batch_resource_used() which also handles the separate
stencil case.
Also fix restore_buffers typo.
Fixes: 4ab6ab8036 freedreno: avoid mem2gmem for invalidated buffers
Signed-off-by: Rob Clark <robdclark@gmail.com>
Otherwise huge amount of spam from instr-a2xx.h.. gcc has no way to know
that freedreno was never built with such an old gcc version to care
about the bugs in old gcc ;-)
Reported-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
[added commit message]
Signed-off-by: Rob Clark <robdclark@gmail.com>
We validate that the interface block array type's definition matches.
However, previously, the function could be called if an non-array
interface block has different type definitions -for example, when the
precision qualifier differs in a GLSL ES shader, we would create two
different types-, and it would return invalid as both definitions are
non-arrays.
We fix this by specifying that at least one definition should be an
array to call the validation.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
They might mismatch due to the two shaders using different GLSL
versions, and that's ok in desktop GL. In ES, precision qualifiers
don't need to match.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
spotExponent and spotCosCutoff were swapped in the
gl_builtin_uniform_element struct.
Now the order matches across gl_builtin_uniform_element,
glsl_struct_field and the spec.
Reviewed-by: Brian Paul <brianp@vmware.com>
GLSL shaders can access the normal scale factor with the built-in
gl_NormalScale. Mesa's modelspace lighting optimization uses a different
normal scale factor than defined in the spec. We have to take care not
to use this factor for gl_NormalScale.
Mesa already defines two seperate states: state.normalScale and
state.internal.normalScale. The first is used by the glsl compiler
while the later is used by the fixed function T&L pipeline. Previously
the only difference was some component swizzling. With this commit
state.normalScale always uses the normal scale factor for eyespace
lighting.
Reviewed-by: Brian Paul <brianp@vmware.com>
This creates a common function that can be shared by the tgsi
and nir backends.
v2: use LLVMBuildBitCast() directly
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This pass is more fully featured, it supports geom and tess shaders.
It also supports interpolation intrinsics.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The gallium glsl->nir pass currently lowers away all indirects on both inputs
and outputs. This fuction allows us to lower vs inputs and fs outputs and also
lower things one stage at a time as we don't need to worry about indirects
on the other side of the shaders interface.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Because NIR can create non vec4 variables when implementing component
packing we need to make sure not to reprocess the same slot again.
Also we can drop the fs_attr_idx counter and just use driver_location.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
NIR component packing will be inserted between these calls and the
calling of st_glsl_to_nir_post_opts().
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
We need to be able to do these NIR opts in the state tracker
rather than the driver in order for the NIR linking opts to
be useful.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
We want to be able to generate NIR then apply NIR optimisations.
Once the optimisations are done we can then apply the new post opt
function which assigns uniforms etc based on the optimised IR.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
v2: update shader info input/output masks when pack components
v3: make sure interpolation loc matches, this is required for the
radeonsi NIR backend.
v4: 33dca36f4f fixed nir_gather_info to update outputs_read
correct, make sure we also adjust this correctly when
packing components.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v3)
V2:
- fix matrix support, non-array matrices were being skipped in v1
v3:
- handle lowering of tcs output loads correctly
- correctly mark indirect locations for either in or out not both
when processing a stage.
- use nir_src_copy() when lowering stores.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
It isn't just load instructions that have write-after-read hazzard.
Fixes stk gaussian blur compute shaders.
Signed-off-by: Rob Clark <robdclark@gmail.com>
In some cases we can end up trying to add a write dependency on ourself,
which shouldn't trigger a flush.
Avoids an extra couple flushes per from in stk.
Signed-off-by: Rob Clark <robdclark@gmail.com>
ctx->last_fence isn't such a terribly clever idea, if batches can be
flushed out of order. Instead, each batch now holds a fence, which is
created before the batch is flushed (useful for next patch), that later
gets populated after the batch is actually flushed.
Signed-off-by: Rob Clark <robdclark@gmail.com>
In transfer_map(), when we need to flush batches that read from a
resource, we should be holding screen->lock to guard against race
conditions. Somehow deferred flush seems to make this existing
race more obvious.
Signed-off-by: Rob Clark <robdclark@gmail.com>
This is a bit more general and lets us pass additional options into the
spirv_to_nir pass beyond what capabilities we support.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Instead of emitting absolutely everything, just emit the few functions
that are actually referenced in some way by the entrypoint. This should
save us quite a bit of time when handed large shader modules containing
many entrypoints.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Since almost all BOs will be in one CL at a time, this cache will almost
always hit except for the first usage of the BO in each CL.
This didn't show up as statistically significant on the minetest trace
(n=340), but if I lop off the throttled lobe of the bimodal distribution,
it very clearly does (0.74731% +/- 0.162093%, n=269).
No significant difference in the minetest replay, but it should reduce
overhead by not requiring that we write quad indices to index buffers that
we repeatedly re-upload (and making the draw packet smaller, as well).
Over the course of the series the actual game seems to be up by 1-2 fps.
Now that there's only one user of it, it's pretty obvious how to avoid
emitting redundant ones. This should save a bunch of kernel validation
overhead.
No statistically sigificant difference on the minetest trace I was looking
at (n=169), but the maximum FPS is up by .3%
Originally there was CL code for handling various relocations back when I
had relocs for the TSDA/TA buffers. Now that the kernel handles those
entirely on its own, I can inline that code into the one place using it.
We failed to take the start into account for how many vertices to draw in
this round, so we would end up decrementing count below 0, which as an
unsigned number meant we would loop until the CLs soon ran out of space.
When I wrote the code I was thinking about how to use the previously
emitted shader state (no index bias baked into the elements) by emitting
up to 65535 and then only re-emitting with bias for the second wround, but
that doesn't work if the start is over 65535. Instead, just delay
emitting shader state until we get into the drawarrays GFXH-515 loop and
always bake the bias in when we're doing the workaround.
When we look up the DRI drawable state we need to associate an fbconfig
with the drawable. With GLX_EXT_no_config_context we can no longer infer
that from the context and must instead query the server.
Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This also fixes a bug, the error path through MakeCurrent didn't
translate the error code by the extension's error base.
Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The dummy vtable has these slots as NULL already, no need to check for
the dummy context explicitly.
Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Aimed to work with Glide, which hasn't been a thing in over 10 years.
There are no drivers that implement it, so annotate it as obsolete
v2: Move the extension to OLD/
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
Reviewed-by: Adam Jackson <ajax@redhat.com> (v1)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The implementation is a simple 'return EGL_FALSE'. Stop pretending and
simply remove it.
Note: the removal of XMesa API is fine, since there hasn't been any
users for it in years.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
No Mesa driver has implemented the extension in ages. Seemingly non Mesa
drivers don't implement it either.
As mentioned by Ian, the extension is effectively superseded by
ARB_vertex_buffer_object.
v2: Move the extension to OLD/
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
Reviewed-by: Adam Jackson <ajax@redhat.com> (v1)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I believe the workaround describes that the MI_LOAD_REGISTER_IMM should
come right after the 3DSTATE_SAMPLE_PATTERN.
This fixes GPU hangs in the i965 initial state batchbuffer when running
some Piglit tests with always_flush_batch=true.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On CNL, we see multiple multisample failures on piglit tests. By
emitting this extra state, though not documented in the bspec, those
failures seem to go away.
This workaround could be removed if we ever find out a better solution,
but it should be good enough for now.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
deviceName string is declared, assigned and freed but actually
never used in dri3_create_screen() function.
Fixes: 2d94601582 ("Add DRI3+Present loader")
Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
"vkAllocateCommandBuffers can be used to create multiple command
buffers. If the creation of any of those command buffers fails, the
implementation must destroy all successfully created command buffer
objects from this command, set all entries of the pCommandBuffers
array to NULL and return the error."
This has been suggested by gabriel@system.is.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It's really annoying and this pollutes the output especially
when a bunch of non-meta shaders are compiled.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This just builds on the image support. Evergreen only has ssbo
for fragment and compute no other stages.
v2: handle images and ssbo in the same shader properly (Ilia)
v3: fix RESQ on buffers,
fix missing atom emit
fix first element offset
use R32 format
write separate buffer rat store path.
(from running deqp gles3.1 tests)
Signed-off-by: Dave Airlie <airlied@redhat.com>
On cayman it appears the cmp component is now in Z.
Fixes:
arb_shader_image_load_store-dead-fragments on cayman.
Signed-off-by: Dave Airlie <airlied@redhat.com>
These didn't handle the TGSI at all properly, this fixes
them to use the common path for 64->32 then adds the 32->int
on at the end.
Fixes:
generated_tests/spec/arb_gpu_shader_fp64/execution/conversion/*
Signed-off-by: Dave Airlie <airlied@redhat.com>
The idea is ported from RadeonSI, but using 512x512 instead of
256x256 seems slightly better. This improves dota2 performance
by +2%.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Fix incomplete check of input params in blorp_surf_convert_to_uncompressed()
which can lead to NULL pointer dereferencing.
Fixes: 5ae8043fed ("intel/blorp: Add an entrypoint for doing
bit-for-bit copies")
Fixes: f395d0abc8 ("intel/blorp: Internally expose
surf_convert_to_uncompressed")
Reviewed-by: Emil Velikov <emli.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
64-bit pull loads are implemented by emitting 2 separate
32-bit pull load messages, where the second message loads from
an offset at +16B.
That addition of 16B to the original offset should not alter the
original offset register used as source for the pull load instruction
though, since the compiler might use that same offset register in other
instructions (for example, for other pull loads in the shader code
that take that same offset as reference).
If the pull load is 32-bit then we only need to emit one message and
we don't need to do offset calculations, but in that case the optimizer
should be able to drop the redundant MOV.
Fixes the following test on Haswell:
KHR-GL45.gpu_shader_fp64.fp64.max_uniform_components
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103007
Prepare for two texture handling paths, the descriptor-based
path will be added in a future commit. These are structured
so that the texture implementation handles its own state
emission.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
This needs to be shared between texture_plain and texture_desc.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
This will be shared with the texture descriptor path.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Need this to efficiently emit texture descriptor invalidations.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Track varying component offset of the point size output, as well as
provide the offset of the point coord input.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Update state objects to add new state, and emit function to emit new
state.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
- This core must load shaders from memory (AFAIK)
- Yet another new location for UNIFORMS
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Update context reset for HALTI3..HALTI5, sorting states for the HALTI
version that has them.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
RS align is not necessary and might even be harmful when using the BLT
engine for blitting.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Add an implemenation of key clear_blit functions using the BLT engine
that replaced the RS on GC7000.
Also set level->size correctly for imported resources. This is important
for the BLT resolve-in-place path to work for them.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Prepare for BLT-based blitting path by moving RS-based
blitting to the RS implementation file, making this
self-contained.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Want to be able to emit state from the texture implementation,
and the blitter implementation.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
When the BLT is involved as source or target, add an extra BLT
enable/disable sequence around the sync sequence.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
The blob does this, as DRAW_INSTANCED can replace fully all the other
draw commands. It is also required to handle integer vertex formats.
The other path is only there for compatibility and might go away (or at
least rot to become buggy due to dis-use) in newer hardware.
As a by-effect this changes the behavior for GC3000-, by no longer using
the index offset for DRAW_INDEXED but instead adding it to INDEX_ADDR.
This should make no difference.
Preparation for GC7000 support.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
This is used by HALTI2+ (GC3000+) when drawing with DRAW_INSTANCED.
It is also necessary when switching between integer and floating point
vertex element formats.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
We're about to add more of them, and need to pass the whole lot of them
around together when growing them. Putting them in a struct makes this
much easier.
brw->batch.batch.bo is a bit of a mouthful, but it's nice to have things
labeled 'batch' and 'state' now that we have multiple buffers.
Fixes: 2dfc119f22 "i965: Grow the batch/state buffers if we need space and can't flush."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103101
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Once we reach the intended size of the buffer (BATCH_SZ or STATE_SZ), we
try and flush. If we're not allowed to flush, we resort to growing the
buffer so that there's space for the data we need to emit.
We accidentally got the threshold wrong. The first non-wrappable call
beyond (e.g.) STATE_SZ would grow the buffer to floor(1.5 * STATE_SZ),
The next call would see we were beyond STATE_SZ and think we needed to
grow a second time - when the buffer was already large enough.
We still want to flush when we hit STATE_SZ, but for growing, we should
use the actual size of the buffer as the threshold. This way, we only
grow when actually necessary.
v2: Simplify the control flow (suggested by Jordan)
Fixes: 2dfc119f22 "i965: Grow the batch/state buffers if we need space and can't flush."
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The original state buffer was marked with EXEC_OBJECT_CAPTURE. When
growing it, we want to preserve that flag so we continue to capture it
in GPU hang reports.
Fixes: 2dfc119f22 "i965: Grow the batch/state buffers if we need space and can't flush."
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The intention here is make the new BO use the same alignment as the old
BO. This isn't strictly necessary, but we would have to update the
'alignment' field in the validation list when swapping it out, and we
don't bother today.
The batch and state buffers use an alignment of 4096, so this should be
equivalent - it's just clearer than cut and pasting a magic constant.
Fixes: 2dfc119f22 "i965: Grow the batch/state buffers if we need space and can't flush."
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
STATE_BASE_ADDRESS specifies a maximum size of the dynamic state
section, beyond which data supposedly reads back as 0. On Gen8+,
we were programming it to the size of the buffer. This worked fine
until we started growing the state buffer in commit 2dfc119f22.
When the state buffer grows, the value in STATE_BASE_ADDRESS becomes
too small, and our state beyond STATE_SZ bytes would read back as 0.
To avoid having to update the value, we program it to MAX_STATE_SIZE.
We used to program the upper bound to the maximum on older hardware
anyway, so programming it too large isn't a big deal.
Bogus SURFACE_STATE can easily lead to GPU hangs and misrendering.
DiRT Rally was hitting the statebuffer growth path, and suffered from
bad texture corruption and GPU hangs (usually around the same time).
This patch fixes both issues.
Fixes: 2dfc119f22 "i965: Grow the batch/state buffers if we need space and can't flush."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103101
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Most files in gallium/radeon now include si_pipe.h.
chip_class and family are now here:
sscreen->info.family
sscreen->info.chip_class
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
If u_endian.h can't determine the endianess, the default behaviour in sha1.c
is to build for big-endian
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Commit 78942e ("mesa: shrink VERT_ATTRIB bitfields to 32 bits") uses
vs_prog_data->vs_inputs as if it were a 32-bit unsigned integer.
But actually it is a 64-bit integer, and as such it is used in other
parts of Mesa code. It is worth to note that bits from the entire range
are used, and not only 32-bits. This is due our implementation for
handling 64-bit dual-slot input attributes, which requires to use a
larger bitfield to manage them.
This commit reverts the changes done in brw_draw_upload.c, keeping the
rest of the changes.
This fixes the following tests:
- KHR-GL45.enhanced_layouts.varying_array_locations
- KHR-GL45.enhanced_layouts.varying_locations
Fixes: 78942e ("mesa: shrink VERT_ATTRIB bitfields to 32 bits")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103942
CC: Marek Olšák <marek.olsak@amd.com>
CC: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
This is more inline with what the functions name suggests it should
do, and makes the code much easier to follow.
This will also make adding uniform packing support much simpler.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Use the destination write mask to determine which values are really to be
read from LDS and load only these.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
This fixes hangs on cayman with
tests/spec/arb_tessellation_shader/execution/trivial-tess-gs_no-gs-inputs.shader_test
This has a single if/else in it, and when this peephole activated,
it would set the jump target to NULL if there was no instruction
after the final POP. This adds a NOP if we get a jump in this case,
and seems to fix the hangs, so we have a valid target for the ELSE
instruction to go to, instead of 0 (which causes infinite loops).
v2: update last_cf correctly. (I had some other patches hide this)
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is cleaner than using a non-standard memclear macro (which does a
memset to 0) and then initializing fields after the fact. We move the
declarations to where we initialized the fields. While we're at it, we
move the declaration of 'ret' that goes with the ioctl, eliminating the
declaration section altogether.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
These were moved to src/intel/common/gen_debug.h, but they are not
common code. They assume that brw_context or gl_context variables
exist, named brw or ctx. That isn't remotely true outside of i965.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We want to program the 3DSTATE_RASTER field to the gl_context value,
not the other way around.
Fixes: 13ac46557a (i965: Port Gen8+ 3DSTATE_RASTER state to genxml.)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
There was a mistake just in those metric sets. We probably didn't
noticed because they're not really interesting for 3D workloads.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This provides a good way to verify we haven't broken using the perf
driver on older kernels (which don't have the oa config loading
mechanism).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This allows us to deploy new configurations without touching the
kernel.
v2: Detect loadable configs without creating one (Chris)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When making configs loadable from userspace in the kernel, we left to
userspace more responsability around programming some registers. In
particular one register we use to set directly in the driver has now
been moved into the configs.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
nir_validate.c's #endif already had the correct NDEBUG comment
Fixes: dcb1acdea0 "nir/validate: Only build in debug mode"
Fixes: 9ff71b649b "i965/nir: Validate that NIR passes call nir_metadata_preserve()"
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fix a bunch of labels indicating when registers were added/removed
and normalize the SI-class GRBM_GFX_INDEX.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This fixes yet another case where DFRACEXP has only one destination. Found
by address sanitizer.
Fixes tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/fs-frexp-dvec4-only-mantissa.shader_test
Fixes: 3b666aa747 ("st/glsl_to_tgsi: fix DFRACEXP with only one destination")
Acked-by: Marek Olšák <marek.olsak@amd.com>
When floating point textures are created on OpenGL ES 2.0, driver
is free to choose used internal format. Mesa makes this decision in
adjust_for_oes_float_texture. Error checking for glTexImage2D properly
checks that sized formats are not used. We use same error checking
path for glTexSubImage2D (since there is lot of overlap), however since
those checks include internalFormat checks, we need to pass original
internalFormat passed by the client. Patch adds oes_float_internal_format
that does reverse adjust_for_oes_float_texture to get that format.
Fixes following test failure:
ES2-CTS.gtf.GL2ExtensionTests.texture_float.texture_float
(when running test with MESA_GLES_VERSION_OVERRIDE=2.0)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103227
Cc: "17.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The only reason why we needed that version was because the Vulkan driver
needed to be able to create the surface states so it could handle
indirect clear colors. Now that blorp handles them natively, there's no
need for the extra entrypoint.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This way uninitialized fields get automatically zeroed and it's safe to
add more fields to blorp_surf.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This doesn't go all the way of avoiding the txf_ms if it's fast-cleared,
however it does at least make us only do it once. This should improve
performance of MSAA resolves in the presence of lots of clear color.
Without the patch, enabling fast-clears in the multisampling Sascha demo
drops the framerate by about 10%. With this patch, enabling fast-clears
increases the demo's framerate by 25%.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
That name is already taken by one of the helpers in blorp_nir_builder.h
and, while we haven't moved the guts of blorp_blit.c there yet, we'd
like to start using some things from that header.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
AOSP master has changed the build default to -Werror making all the
warnings errors. Override that with -Wno-error.
Signed-off-by: Rob Herring <robh@kernel.org>
driver_cache_blob was introduced with the i965 disk cache, it allows
us to simplify the cache a little and possibly offers some minor
speed improvements since we load the GLSL metadata and TGSI from
disk in one pass.
Using driver_cache_blob should also make it straight forward to
implement binary support for ARB_get_program_binary in gallium.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The next commit will reduce the size even more.
v2: typecast to uint64_t manually
v3: add more typecasts, add asserts
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
If the TCS doesn't read back the outputs, no need to store them
to LDS in the first place. (except for tess factors).
This seems to give about 50fps (3290->3330) with tessellation demo.
I haven't tested if it impacts DoW3 at all.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This comes in handy when checking "NV50_PROG_DEBUG=1" outputs with diff!
V2:
- Use environmental variable (Karol Herbst)
V3:
- Use the already populated nv50_ir_prog_info to forward information to the
print pass (Pierre Moreau)
V4:
- get rid of default value in PrintPass constructor
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Memory loads can take offsets, but the SHLADD will often attempt to
consume the offsets too. As there may be multiple memory loads with the
same base but different offsets, those would end up in a SHLADD instead
of the offset of the memory operation.
This moves the pass after we've had a chance to attempt to propagate
immediate adds into the indirect offset.
total instructions in shared programs : 6580681 -> 6567716 (-0.20%)
total gprs used in shared programs : 944261 -> 943375 (-0.09%)
total shared used in shared programs : 0 -> 0 (0.00%)
total local used in shared programs : 15328 -> 15328 (0.00%)
total bytes used in shared programs : 60339896 -> 60221504 (-0.20%)
local shared gpr inst bytes
helped 0 0 555 2698 2698
hurt 0 0 138 336 336
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
When a MERGE operation gets its constraint moves added, it
susbstantially extends live ranges to be reusing an immediate from
earlier in the program (not to mention the silliness of loading an
immediate into a register, and then moving into another register).
We detect these scenarios and insert moves that take the immediate or
constbuf load directly into the register. If it's the last use, then we
can just move that operation to the closer location.
With SM35 (255 regs) we get these results:
total instructions in shared programs : 6583670 -> 6580681 (-0.05%)
total gprs used in shared programs : 950818 -> 944261 (-0.69%)
total shared used in shared programs : 0 -> 0 (0.00%)
total local used in shared programs : 15328 -> 15328 (0.00%)
total bytes used in shared programs : 60367456 -> 60339896 (-0.05%)
local shared gpr inst bytes
helped 0 0 4584 3186 3186
hurt 0 0 55 968 968
I suspect they will be better for SM20 and SM30.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
It's common to use signed int modulo in GLSL. As it happens, the GLSL
specs allow the result to be undefined, but that seems fairly
surprising. It's not that much more effort to get it right, at least for
positive modulo operators.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This is a copy of the a5xx logic. Fails a few tests, but basic
functionality is there.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Unfortunately Adreno A4xx hardware returns incorrect results with the
GATHER4 opcodes. As a result, we have to lower to 4 individual texture
calls (txl since we have to force lod to 0). We achieve this using
offsets, including on cube maps which normally never have offsets.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
GL doesn't have this, but some hardware supports it. This is convenient
for lowering tg4 to plain texture calls, which is necessary on Adreno
A4xx hardware.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
The cache-test test program attempts to create a collision (using key_a
and key_a_collide) by making the first two bytes identical. The idea is
fine -- the shader cache wants to use the first four characters of a
SHA1 hex digest as the index.
The following program
unsigned char array[4] = {1, 2, 3, 4};
int *ptr = (int *)array;
for (int i = 0; i < 4; i++) {
printf("%02x", array[i]);
}
printf("\n");
printf("%08x\n", *ptr);
prints
01020304
04030201
on little endian, and
01020304
01020304
on big endian.
On big endian platforms reading the character array back as an int (as
is done in disk_cache.c) does not yield the same results as reading the
byte array.
To get the first four characters of the SHA1 hex digest when we mask
with CACHE_INDEX_KEY_MASK, we need to byte swap the int on big endian
platforms.
Bugzilla: https://bugs.freedesktop.org/103668
Bugzilla: https://bugs.gentoo.org/637060
Bugzilla: https://bugs.gentoo.org/636326
Fixes: 87ab26b2ab ("glsl: Add initial functions to implement an
on-disk cache")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The code defines a macro blk0(i) based on the preprocessor condition
BYTE_ORDER == LITTLE_ENDIAN. If true, blk0(i) is defined as a byte swap
operation. Unfortunately, if the preprocessor macros used in the test
are no defined, then the comparison becomes 0 == 0 and it evaluates as
true.
Fixes: d1efa09d34 ("util: import sha1 implementation from OpenBSD")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
We don't support ARB_vertex_blend.
Note that the attribute aliasing check for ARB_vertex_program had to be
rewritten.
vbo_context: 20344 -> 20008 bytes
gl_context: 74672 -> 74616 bytes
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is needed for profiling multi-context applications like Chrome.
One context can record queries and another context can draw the HUD.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Core Mesa already handles flushing based on ContextReleaseBehavior,
so the comment is wrong.
Also, old_st is always NULL, because unbind_context always precedes
make_current.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
now you can hack the driver to enable DCC for displayable textures and
Glamor that doesn't enable that by default won't crash anymore.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Ideally we'd be able to get the library filename from libtool, but that
doesn't seem to be a feature...
Use of ${uname} is presumably ok here as we won't be running 'make check' if
we are cross-compiling
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
This fixes two CTS regressions:
- dEQP-VK.api.object_management.alloc_callback_fail_multiple.command_buffer_primary
- dEQP-VK.api.object_management.alloc_callback_fail_multiple.command_buffer_secondary
These two tests are part the mustpass lists, so presumably they
are correct and my change was wrong.
This reverts commit 0f68208f1d.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This happens when all BOs have the RADEON_FLAG_NO_INTERPROCESS_SHARING
(DRM version >= 3.23) flag set. This flag is mainly used for reducing
overhead on the userspace side because we don't have to put those BOs
inside the list.
Though, if the driver tries to create a list with 0 buffers inside it,
libdrm returns -EINVAL and the app just crashes.
This fixes a bunch of CTS dEQP-VK.sparse_resources.* fails (~100).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
When we split an instruction that reads an uniform value
(vstride 0) we need to respect the vstride on the second
half of the instruction (that is, the second half should
read the same region as the first).
We were doing this already, but we didn't account for
stages that have interleaved input attributes which also
have a vstride of 0 and need the same treatment.
Fixes the following on Haswell:
KHR-GL45.enhanced_layouts.varying_locations
KHR-GL45.enhanced_layouts.varying_array_locations
KHR-GL45.enhanced_layouts.varying_structure_locations
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Andres Gomez <agomez@igalia.com>
Vertex buffer legacy state is no longer picked up with new drawing
commands. Change to use different cases depending on the number of
vertex streams in the GPU specs.
This results in slightly more compact state emission as well, on all
vivantes.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
There's been some Haiku-related activity lately, so let's document who
to cc on these patches.
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Alexander von Gluck IV <kallisti5@unixzen.com>
I really intended to set this for all shader stages by
3835009796 but missed it for compute shaders
(because it's in a different source file...).
Reviewed-by: Dave Airlie <airlied@redhat.com>
When the kernel support flagging our BO, let's mark batch &
instruction BOs for capture so then can be included in the error
state.
v2: Only add EXEC_CAPTURE if supported (Kristian)
v3: Fix operator precedence issue (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will allow to set the flags on any anv_bo created/filled from a
state pool or block pool later.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
- fix a number of -Wsign-compare warnings
- fix two warnings for -Woverride-init because TGSI_OPCODE_CEIL == 83, and
the according field was defined two times.
[airlied: don't use -1 with unsigned type,
fix whitespace]
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
So far on pre-cayman chipsets the CF instructions CF_OP_LOOP_END,
CF_OP_CALL_FS, CF_OP_POP, and CF_OP_GDS an extra CF_NOP instruction
was added to add the EOP flag, even though this is not actually
needed, because all these instrutions support the EOP flag.
This patch removes the fixup code, adds setting the EOP flag for the
according instructions as well as others like CF_OP_TEX and CF_OP_VTX,
and adds writing out EOP for this type of instruction in the disassembler.
This also fixes a bug where shaders were created that didn't actually have
the EOP flag set in the last CF instruction, which might have resulted
in GPU lockups.
[airlied: cleaned up a little]
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is still not fully correct (haiku and BSD is notably probably not
correct), but Linux is not regressed and this should be correct for
macOS and Windows.
v2: - set the dri_platform to windows on Cygwin as well (Jon)
v3: - Add a better todo for Hurd (Eric)
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
This is necessary to support operating systems other than the *nix
family (excluding macOS). For Linux nothing has changed, the defaults
are still the same.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
There is one provided unconditionally, and one guarded by platform ==
linux. Remove the unconditional one.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
These are all either done already, or are autotools specific. The
misspelled gallium G3DVL is the autotools specific bit, meson is
handling that via build_by_default.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
This function is required for both the Intel "Anvil" vulkan driver and
the i965 GL driver. Error out if either of those is enabled but this
function isn't found.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
This patch allows building asm for x86 on x86_64 platforms, when the
operating system is the same. Previously cross compile always turned off
assembly. This allows using a cross file to cross compile x86 binaries
on x86_64 with asm.
This could probably be relaxed further thanks to meson's "exe_wrapper",
which is way to specify an emulator or compatibility layer (wine) that
can run the foreign binaries on the build system. Since the meson build
at this point only supports building on Linux I can't test this and I
don't want to write/enable code that cannot even be build tested.
v4: - set condition to build == x86_64 and host == x86 and
build.system == host.system
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
This patch checks for an and then enables sse4.1 optimizations if the
host machine will be x86/x86_64.
v2: - Don't compile code, it's unnecessary since we require a compiler
which always has SSE4.1 (Matt)
v3: - x64 -> x86_64 (Matt)
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
The HW doesn't add the base level anywhere (the min/max lod clamping is
what does base level), so we need to add it manually in this case.
Fixes piglit tex-miplevel-selection *Lod 2D.
After the first output, we were padding by an extra size of the previous
output. Fixes piglit ext_transform_feedback-output-type mat4x3[2] and
friends.
The HW was computing an implicit height for the surface based on the image
size, but that may be smaller than the surface with ARB_fbo mismatched
sizes. In that case, we need to tell it about the pad, either with the
little 4-bit field in the RT config, or the extended field in
CLEAR_COLORS_PART3.
Fixes piglit arb_framebuffer_object-mixed-buffer-sizes.
The HALTI level is an indication of the gross architecture of the GPU.
It determines for significant part what feature level the GPU has, what
state (especially frontend state) is there, and where it is located.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
This intrinsic is produced to load SYSTEM_VALUE_VERTICES_IN, which is
generated to load gl_PatchVerticesIn in the SPIR-V path for both
Vulkan and OpenGL.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This will dump the INTERFACE_DESCRIPTOR_DATA along with the associated
samplers & surfaces.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
The gen had to be changed from 4 to 6 so that we could test MAD, which
is new on Gen6.
mad_imm_float_neg_mov_sat tests the case fixed by the previous commit.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
From section 8.7, page 179 of OpenGL ES 3.2 spec:
An INVALID_OPERATION error is generated by CompressedTexImage3D
if internalformat is one of the the formats in table 8.17 and target
is not TEXTURE_2D_ARRAY, TEXTURE_CUBE_MAP_ARRAY or TEXTURE_3D.
An INVALID_OPERATION error is generated by CompressedTexImage3D if
internalformat is TEXTURE_CUBE_MAP_ARRAY and the “Cube Map Array”
column of table 8.17 is not checked, or if internalformat is
TEXTURE_3D and the “3D Tex.” column of table 8.17 is not checked.
So far it was only considering TEXTURE_2D_ARRAY as valid target. But as
"Cube Map Array" column is checked for all the cases, in practice we can
consider also TEXTURE_CUBE_MAP_ARRAY.
This fixes KHR-GLES32.core.texture_cube_map_array.etc2_texture
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
The blend math gets a bit funky due to inverse blend factors being
in range [0,2] rather than [-1,1], our normalized math can't really
cover this.
src_alpha_saturate blend factor has a similar problem too.
(Note that piglit fbo-blending-formats test is mostly useless for
anything but unorm formats, since not just all src/dst values are
between [0,1], but the tests are crafted in a way that the results
are between [0,1] too.)
v2: some formatting fixes, and fix a fairly obscure (to debug)
issue with alpha-only formats (not related to snorm at all), where
blend optimization would think it could simplify the blend equation
if the blend factors were complementary, however was using the
completely unrelated rgb blend factors instead of the alpha ones...
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reducing Bucket index calculation to O(1).
This algorithm calculates the index using matrix method. Assuming
PAGE_SIZE is 4096, matrix arrangement is as below:
1*4096 2*4096 3*4096 4*4096
5*4096 6*4096 7*4096 8*4096
10*4096 12*4096 14*4096 16*4096
20*4096 24*4096 28*4096 32*4096
... ... ... ...
... ... ... ...
... ... ... max_cache_size
From this matrix its clearly seen that every row follows the below way:
... ... ... n
n+(1/4)n n+(1/2)n n+(3/4)n 2n
Row is calculated as log2(size/PAGE_SIZE) Column is calculated as
converting the difference between the elements to fit into power size of
two and indexing it.
Final Index is (row*4)+(col-1)
Tested with Intel Mesa CI.
Improves performance of 3DMark on BXT by 0.705966% +/- 0.229767% (n=20)
v4: Review comments on style and code comments implemented (Ian).
v3: Review comments implemented (Ian).
v2: Review comments implemented (Jason).
Signed-off-by: Aravindan Muthukumar <aravindan.muthukumar@intel.com>
Signed-off-by: Kedar Karanje <kedar.j.karanje@intel.com>
Reviewed-by: Yogesh Marathe <yogesh.marathe@intel.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Currently the target has a redundant guard, and the state tracker isn't
properly guarded.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This will allow us to simplify some guards within the gallium directory.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Otherwise, the simulator would complain in tex-miplevel-selection that the
min/max clamp was out of order. The actual HW seems to have clamped to
the max anyway.
Widen fetch shader to SIMD16, enable SIMD16 types in the jitter,
and provide utility EXTRACT/INSERT SIMD8 <-> SIMD16 utility functions.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
We could always do the flush asynchronously, but if we're going to wait
for a fence anyway and the driver thread is currently idle, the additional
communication overhead isn't worth it.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This got lost in a rebase but never hurt anything because we happened
to always sync in fence_finish anyway...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
With threaded gallium, the driver may currently be running in another
thread. In that case, we will execute all remaining commands in that
thread instead of syncing, which should be better for cache locality.
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Asynchronous flushes require a proper implementation of
st_server_wait_sync, because we could have the following with
threaded Gallium:
Context 1 app Context 1 driver Context 2
------------- ---------------- ---------
f = glFenceSync
glFlush
<-- app sync --> <-- app sync -->
glWaitSync(f)
.. draw calls ..
pipe_context::flush
for glFenceSync
pipe_context::flush
for glFlush
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The whole point of fence_server_sync is that it can be used to
avoid waiting in the application thread.
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We need to account for SGPR locations in merged shaders.
This case is exercised by KHR-GL45.enhanced_layouts.vertex_attrib_locations
Fixes: 79c2e7388c ("radeonsi/gfx9: use SPI_SHADER_USER_DATA_COMMON")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
As per the spec, the query identified by queryPool and query
must currently be active. Applications have to call vkCmdBeginQuery()
before, and thus the query pool BO will already be in the list.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Similar to how the driver sets the depth clear regs after a
fast depth clear. Most of the time, this will copy a 32-bit reg
instead of a 64-bit reg.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
For the fast path, radv_fill_buffer() ensures that the BO is
already in the list. For the slow path, the depth surface is
part of the framebuffer which means the BO is added to the list
when the framebuffer is emitted.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
aspects can't be zero and there is an assertion that ensures
it's not in emit_clear().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
generate_array_index fails to check whether the target of a subroutine
call exists in the AST, potentially passing around null ir_rvalue
pointers eventuating in abort/segfault.
Fixes: fd01840c0b ("glsl: add AoA support to subroutines")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100438
The original spec I had didn't expose integer textures and suggested that
you use unfiltered floats. Now there are proper formats for them.
Fixes 16- and 32-bit texwrap integer tests in piglit, and
dEQP-GLES3.functional.fbo.completeness.renderable.renderbuffer.color0.rgb10_a2ui.
When we tried to clear color while storing depth, it assertion failed
about basically not having enough information to decide which color RT to
clear. It turns out the STORE_GENERAL picks the buffer according to the
color buffer being stored, or all of them if NONE. If you're doing depth,
it doesn't know which to pick.
The OVERWRITE bit disables destination fetches, which is exactly what
we want when there is no valid color buffer bound.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
The brw_disasm_info header is included by certain tools in order to get
shader assembly from binaries so it's a semi-external header. Including
brw_cfg.h also pulls in brw_shader.h so you end up getting quite a bit
of our back-end compiler internals. Instead, make the couple of forward
declarations we need and make the header more stand-alone. This fixes
the meson build.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 4f82b17287
Almost all of our BO export paths were already properly marked the BO as
external and added it to the handle table. Most export use-cases go
through a prime fd or flink where we have a brw_bo export helper that
does the right thing. The one missing one happens when you call
queryImage and ask for __DRI_IMAGE_ATTRIB_HANDLE. We just grabbed the
gem handle out of the BO (because it's really easy to do that) and
handed it off to the client; what could go wrong? As it turns out, this
path is used by basically every compositor that wants to turn around and
call drmModeAddFB2 on it so it can hand it off to display. The result,
as of 4b1e70cc57, is that we no longer set
MOCS_PTE on those surfaces and the kernel's attempts to disable caching
fail and we scanout gets corruption.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103759
Fixes: 4b1e70cc57
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
To avoid problems with MSVC. And verify size with ASSERT_BITFIELD_SIZE().
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Just comparing glxinfo and features.txt, and it seems features.txt is
fairly out of date. The a5xx specific features (compute/images/atomics/
etc) are recent.
Signed-off-by: Rob Clark <robdclark@gmail.com>
It was the only file named intel_* in the compiler.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The old code used an array to store each "instruction group" (the new,
better name than the old overloaded "annotation"), and required a
memmove() to shift elements over in the array when we needed to split a
group so that we could add an error message. This was confusing and
difficult to get right, not the least of which was because the array
has a tail sentinel not included in .ann_count.
Instead use a linked list, a data structure made for efficient
insertion.
Acked-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I'm going to change the call in a later patch and with the difference in
indentation level it wasn't immediately obvious that the calls were
identical.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Implement encoding of sps, pps, and silce headers using the newly added h.264
header coding descriptors functions based on h.264 specs.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Since bitstream headers, e.g. sps, pps, slice, are encoded in driver side, we
need to add corresponding algorithms that required to generate those headers.
According to h.264 specs, signed/unsigned interger Exp-Golomb-coded syntax
element with left bit first (code_se and code_ue) and unsigned integer using
n bits (code_fixed_bits) descriptors function are needed. Therefore, adding
those algorithms and related variables and output algorithms here.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Implement required ibs and command buffer submission interfaces for vcn encode
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Add a skeleton pipe video interface and encode ib interface for video encode
on vcn hardware. Add function defines and structures for vcn encode. Update
Makefile.sources and meson.build with newly added files.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
pic_order_cnt_type is a required variable when encoding both sps and
slice header, therefore we need to get this value from st, e.g. vaapi
interface, and then pass it to radeon driver for encoding headers.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Different from vce encoding, vcn encoding requires driver side to encode
bitstream header, such as pps, sps and slice header. pic_order_cnt_type
is a required variable when encoding both sps and slice header, therefore
we need to add this new variable here, and hold the value passed from st,
e.g. vaapi interface
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
* Explicitely convert values to int in comparison.
* Remove one MAYBE_UNUSED that is actually not needed.
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Silence warnings by decoration the parameters with UNUSED.
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Use size_t instread of unsigned for new_max. realloc later expects
size_t as parameter anyway.
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
With --disable-debug a parameter is not used. Silence this
warning by fake-using it.
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Annotate the according parameters accordingly.
v2: move UNUSED decoration in front of parameter declaration
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
Annotate the parameters accordingly.
v2: move UNUSED decoration in front of parameter declaration
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
v1: Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
Decorate the parameters accordingly with "UNUSED" or "MAYBE_UNUSED" (for
the param that is used in debug mode, but not in release mode).
v2: move UNUSED decoration in front of parameter declaration
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
Decorate the params accordingly with "UNUSED".
v2: move UNUSED decoration in front of parameter declaration
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
Decorate the params with "UNUSED" accordingly.
v2: move UNUSED decoration in front of parameter declaration
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
Decorate the params accordingly with UNUSED or MAYBE_UNUSED (for params
that are used in debug mode).
v2: move *UNUSED decoration in front of parameter declaration
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
Silence the warning by making the conversion to int explicit.
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Decorate the unused param accordingly with "UNUSED".
v2: move UNUSED decoration in front of parameter declaration
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
Decorate the params accordingly with "UNUSED".
v2: move UNUSED decoration in front of parameter declaration
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
Decorate the params accordingly with "UNUSED".
v2: move UNUSED decoration in front of parameter declaration
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
Decorate the unused params with "UNUSED".
v2: move UNUSED decoration in front of parameter declaration
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
Decorate the unused params with "UNUSED".
v2: move UNUSED decoration in front of parameter declaration
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
Decorate the parameters accordingly with "UNUSED".
v2: move UNUSED decoration in front of parameter declaration
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
This warning was issued only in release mode. Fix it by fake-using the
parameter.
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
v2: move UNUSED decoration in front of parameter declaration
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
u_bit_scan may return -1 that then may be interpreted as (unsigned)-1 in
the following comparison, since num_names is unsigned. Convert the latter to
be int as well.
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
asprintf is decorated with the attrbute "warn_unused_result", and if the
function call fails, the pointer "temp" will be undefined, but since it is
used later it should contain some usable value.
Test return value of asprintf and assign some save value to "temp" if
the call failed.
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
Annotate the unused parameter.
v2: move UNUSED decoration in front of parameter declaration
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
* Annotate three parameters that are not used in release mode.
* explicitely convert an int to unsigned in an ?: construct.
v2: move MAYBE_UNUSED decoration in front of parameter declaration
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
BLIT_ZS mode is used for either combined z24/s8 or z32 in which case
BLIT_S mode is used for separate stencil.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Declare glsl_type::sampled_type as glsl_base_type as we do for the
base_type field. And make base_type a bitfield to save a few bytes.
Update glsl_type constructor to take glsl_base_type instead of unsigned
and pass GLSL_TYPE_VOID instead of zero.
No Piglit regressions with llvmpipe.
v2:
- Declare both base_type and sampled_type as 8-bit fields
- Use the new ASSERT_BITFIELD_SIZE() macro.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
I've noticed at least two places where we store the TGSI opcode in
an unsigned:8 bitfield. We're at 249 opcodes now. If we hit 256 we'll
need to grow those bitfields. Use the new ASSERT_BITFIELD_SIZE() macro
to detect that.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Use the proper enum types for various variables. Makes life in gdb
a little nicer. Note that the size of enum bitfields must be one
larger so the high bit is always zero (for MSVC).
v2: also increase size of image_format bitfield, per Eric Engestrom.
v3: use the new ASSERT_BITFIELD_SIZE() macro
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
For checking that bitfields are large enough to hold the largest
expected value.
v2: move into existing util/macros.h header where STATIC_ASSERT() lives.
v3: add MAYBE_UNUSED to variable declaration
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
There is no need to have these overlap if we support hw atomics.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We want to emit invariant state at the start of a render batch. In the
past, this more or less happened: a new batch flagged BRW_NEW_CONTEXT
(because we don't have hardware contexts), which triggered the
brw_invariant_state atom. So, it would be emitted before any 3D
drawing. (Technically, there might be some BLT commands in the batch
because Gen4-5 have a single combined render/BLT ring, but that should
be harmless).
With the advent of BLORP, this broke. The first item in a batch might
be a BLORP operation, which bypasses the normal draw upload path. So,
we need to ensure invariant state happens first. To do that, we just
upload it when creating a new batch. On Gen6+ we'd need to worry about
whether it's a RENDER or BLT batch, but because we have a combined ring,
this approach should work fine on Gen4-5.
Seems to fix GPU hangs when playing hardware accelerated video with
mpv -hwdec=vaapi on Ironlake.
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103529
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This also enables GL4.2 for gpus with hw fp64 (cayman, cypress)
Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds support for the RESQ opcode with the workaround
required due to hw bugs for buffers and cube arrays.
Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Until we can work further on sb, disable it for images for now.
Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds support to the shader assembler for load/store/atomic
ops on images which are handled via the RAT operations.
Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds the atoms and gallium api implementations,
along with support for compress/decompress paths for
shader images.
Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We need the thread id to use the immediate buffer readback
mechanism, so add support for calculating it.
Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This isn't 100% perfect (fglrx also fails a bunch of those tests)
but implement the start of a memory barrier for image support.
Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In order to image readback we have to execute a MEM_RAT instruction
that needs a buffer to transfer the result into until the shader
can fetch it.
Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This implements proper handling for shaders with side effects.
Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Currently the versions are set in the header, and then sed is used to
extract them, so that autotools can use them elsewhere.
This is odd. Autotools is perfectly capable of configuring the header
with the versions, and then they don't need to be extracted from the
the header. This is cleaner and more obvious.
Tested with make distcheck.
v2: - Split tiny -> patch change
- Drop temporary variables
- change XA_VERSION_* -> XA_*
v3: - Finish splitting the tiny -> patch change
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Emil Velikov <emli.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com> (v2)
Original 965 sets bits 28:27 to 0, while G45 and later set it to 1.
Note that the G45 docs are incorrect in this regard - see the DevCTG+
note in the Ironlake PRMs.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
v2: - Add information about CC, CXX, CFLAGS, and CXXFLAGS (Nicolai)
- Add message at top that meson for mesa is still a work in progress
- Add trailing "/" to directories (Eric E.)
- Fix a number of spelling/grammar/style suggestions from Eric E.
- Make a number of changes as suggested by Emil.
v3: - Fix order of commands in example (Eric E.)
- Add documentation for overriding LLVM version (Eric E.)
v4: - Rebase on master
- update default buildtype
- add note about b_ndebug
- Clarify meson configure a bit
v5: - use <code> for command line arguments (Eric E.)
- Add note about listing options without a build directory
- Minor formatting changes (Eric E.)
- Replace the CC, CFLAGS, etc section with an environment variables
section, which mentions CC, CXX, CFLAGS, CXXFLAGS, LDFLAGS, and
DESTDIR
- Add comment that not using buildtype debug might make debugging
harder
- Add comment that b_ndebug and buildtype are orthogonal
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch> (v3)
Otherwise we'll bail with due to -Werror=implicit-function-declaration.
It went unnoticed since the we had a bug which did consistently set the
compiler flag.
Fixes: ba8a347f93 ("mesa: split extensions overrides and glGetString(GL_EXTENSIONS)")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Analogous to the glGetString() case - report all the
extensions enabled via MESA_EXTENSION_OVERRIDE
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Store pointers to the tokenized strings in the gl_extensions struct.
This way we can reuse them in glGetStringi() while we construct the
really long string only in _mesa_make_extension_string.
Only 16 pointers/strings are stored for now.
v2: Warn only once when we provide more than 16 unk. extensions, rebase
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
If the extra_extensions string is empty there's no need to call
atexit() - there's nothing to free.
v2: Rebase
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
The sorting was originally added to work around broken games (comment
says Quake3 demo) that were copying the extensions list into small
buffer.
Sorting does not solve the problem, since we'll still overflow and cause
corruption/crash.
Better workaround is to actually trim the string ... as done with a
later commit which introduces the MESA_EXTENSION_MAX_YEAR env. variable.
Side note: On my machine, the existing sorting makes no changes to the
extensions string.
Cc: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
We already use it for _mesa_extension_override_enables.
Improve consistency and use it for both extension lists.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The function get_extension_override() returns a copy of a string,
only for it to be copied again ...
Drop the unneeded calloc/strdup/free dance.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
While parsing MESA_EXTENSION_OVERRIDE we keep track of the disabled
extensions, twice - in _mesa_extension_override_disables and
disabled_extensions.
Upon context creation, we use the former to modify the extensions list.
Yet, we still check the updated list against disabled_extensions.
Remove disabled_extensions, it's obsolete.
Cc: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
As of previous commit we removed the extension overrides from this
function.
Thus we no longer need to call it during MakeCurrent, so we can
construct the extensions string when needed - _mesa_GetString.
This commit effectively reverts a879d14ecf ("mesa: initialize extension
string when context is first bound")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Currently we apply the extension overrides and construct the extensions
string upon MakeCurrent.
They are two distinct things, so let's slit the two while pushing the
overrides management _before_ _mesa_compute_version(). This ensures that
the version is updated to reflect the enabled/disabled extensions.
Cc: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Checking the override was useful in the early stages of developing the
extension.
Now that everything is wired, where possible, we can drop the check.
Doing so allows us to simplify some of the related code.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Some platforms are missing a proper teardown function. Add a small TODO
to make it obvious.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
The current "No EGL platform enabled." is misleading and wrong.
We reach said code when $platform is missing.
To make this more obvious and clear provide wrappers in the header
file, making the code a bit easier to follow.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
I'm not sure of the reason for this. I don't see anything like this in
configure.ac
In include/c11/threads.h the cases are:
1) building for Windows -> threads_win32.h
2) HAVE_PTHREAD -> threads_posix.h
3) Not supported on this platform
So not defining HAVE_PTHREAD for anything not Windows just means we can't
build at all.
When we are building for Windows, I'm not sure if dependency('threads')
would ever find anything, or defining HAVE_PTHREAD has any effect, but avoid
defining it there, just in case.
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
...and provide a better citation for the existing one.
v2:
- Apply the workaround to Gen8 too, as intended (caught by Topi).
- Restructure to add bits instead of an extra flush (based on a similar
patch by Rafael Antognolli).
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
We only have a single stencil read mask and write mask. Issue a
warning if different front/back values are used. The Piglit
gl-2.0-two-sided-stencil test hits this.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Sampler TS is an hardware optimization that can be used when rendering
to textures. After rendering to a resource with TS enabled, the
texture unit can use this to bypass lookups to empty tiles. This also
means a resolve-in-place can be avoided to flush the TS.
This commit is also an optimization when not using sampler TS, as
resolve-in-place will now be skipped if a resource has no (valid) TS.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
This is to make sure that the TS is properly flushed to memory before
rendering to a new surface starts.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Sampler TS introduces yet another format enumeration for
renderable+textureable formats. Introduce it into the etnaviv_format
table as another column.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Resources only need a resolve-to-itself if their TS is valid for any
level, not just if it happens to be allocated.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
This reverts two of the vk_error changes:
reporting unsupported format is common,
and testing non-amdgpu drivers and ignoring them is also common.
Fixes: cd64a4f70 (radv: use vk_error() everywhere an error is returned)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
BATCH_RESERVED was deleted in commit 2c46a67b41 (i965: Delete
BATCH_RESERVED handling.) The reserved_space field is dead code, and
the comments aren't useful these days.
Having this separate could potentially make programs that rebind atomics
but no other surfaces ever so slightly faster. But it's a tiny amount
of code to add to the existing UBO/SSBO atom, and very related.
The extra atoms have a cost on every draw call, and so dropping some of
them would be nice. This also reclaims a dirty bit.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We use the same hardware mechanism for both atomic counters and SSBO
atomics, so there's really no benefit to maintaining separate code to
handle each case. Instead, we can just use Rob's shiny new NIR pass to
convert atomic_uints to SSBOs, and delete piles of code.
The ssbo_start section of the binding table becomes a combined ABO and
SSBO section, with ABOs first, then SSBOs.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This fixes the missing AutomaticSize handling in the ABO code, removes
a bunch of duplicated code, and drops an extra layer of wrapping around
brw_emit_buffer_surface_state().
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This should reduce the overhead of adding a BO to the current
list, especially when the list is huge. Also, when a new pipeline
is bound, we only need to update the descriptor, the buffer objects
should already be in the list.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
I don't think we will need a 64-bit unsigned integer for the
dirty flags in the future, and there is still 20 bits left.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Float rts were always set as unorm instead of float.
Not sure of the consequences, but at least it looks like the blend clamp
would have been enabled, which is against the rules (only eg really bothered
to even attempt to specify this correctly, r600 always used clamp anyway).
Albeit r600 (not r700) setup still looks bugged to me due to never setting
BLEND_FLOAT32 which must be set according to docs...
Not sure if the hw really cares, no piglit change (on eg/juniper).
Reviewed-by: Dave Airlie <airlied@redhat.com>
Both r600 and evergreen used the clamped version, whereas cayman used the
ieee one. I don't think there's a valid reason for this discrepancy, so let's
switch to the ieee version for r600 and evergreen too, since we generally
want to stick to ieee arithmetic.
With this, behavior for both rcp and rsq should now be the same for all of
r600, eg, cm, all using ieee versions (albeit note rsq retains the abs
behavior for everybody, which may not be a good idea ultimately).
Reviewed-by: Dave Airlie <airlied@redhat.com>
r600 used the clamped version for rcp, whereas both evergreen and cayman
used the ieee version. I don't know why that discrepancy exists (it does so
since day 1) but there does not seem to be a valid reason for this, so make
it consistent. This seems now safer than before the previous commit (using
the dx10 clamp bit).
Note that rsq still uses clamped version (as before even though the table
may have suggested otherwise for evergreen) for r600/eg, but not for cayman.
Will be changed separately for better regression tracking...
Reviewed-by: Dave Airlie <airlied@redhat.com>
The docs are not very concise in what this really does, however both
Alex Deucher and Nicolai Hähnle suggested this only really affects instructions
using the CLAMP output modifier, and I've confirmed that with the newly
changed piglit isinf_and_isnan test.
So, with this bit set, if an instruction has the CLAMP modifier bit (which
clamps to [0,1]) set, then NaNs will be converted to zero, otherwise the result
will be NaN.
D3D10 would require this, glsl doesn't have modifiers (with mesa
clamp(x,0,1) would get converted to such a modifier) coupled with a
whatever-floats-your-boat specified NaN behavior, but the clamp behavior
should probably always be used (this also matches what a decomposition into
min(1.0, max(x, 0.0)) would do, if min/max also adhere to the ieee spec of
picking the non-nan result).
Some apps may in fact rely on this, as this prevents misrenderings in
This War of Mine since using ieee muls
(ce7a045fee), without having to use clamped
rcp opcode, which would also fix this bug there.
radeonsi also seems to set this bit nowadays if I see that righ (albeit the
llvm amdgpu code comment now says "Make clamp modifier on NaN input returns 0"
instead of "Do not clamp NAN to 0" since it was changed, which also looks
a bit misleading).
v2: set it in all shader stages.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103544
Reviewed-by: Dave Airlie <airlied@redhat.com>
I believe this is the safe thing to do, especially ever since the driver
actually generates NaNs for muls too.
The ISA docs are not very helpful here, however the dx10 versions will pick
a non-nan result over a NaN one (this is also the ieee754 behavior), whereas
the non-dx10 ones will pick the NaN (verified by newly changed piglit
isinf-and-isnan test).
Other "modern" drivers will most likely do the same.
This was shown to make some difference for bug 103544, albeit it is not
required to fix it.
Reviewed-by: Dave Airlie <airlied@redhat.com>
A lot of cubemap array piglits fail, port the texture type
picking code from radeonsi which seems to fix most of them.
For images I will port the rest of the code.
Fixes:
getteximage-depth gl_texture_cube_map_array-*
fbo-generatemipmap-cubemap array
getteximage-targets cube_array
amongst others.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
A couple failures in piglit tests w/ TF or gl_VertexID + indirect draws.
OTOH all the deqp tests (although they don't test those combinations).
I suspect this could be fixed by a firmware update, but I don't think
there is much we can do in mesa for that.
Signed-off-by: Rob Clark <robdclark@gmail.com>
The MOV instruction can extract bytes to words/double words, and
words/double words to quadwords, but not byte to quadwords.
For unsigned byte to quadword, we can read them as words and AND off the
high byte and extract to quadword in one instruction. For signed bytes,
we need to first sign extend to word and the sign extend that word to a
quadword.
Fixes the following test on CHV, BXT, and GLK:
KHR-GL46.shader_ballot_tests.ShaderBallotBitmasks
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103628
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes the following tests on CHV, BXT, and GLK:
KHR-GL46.shader_ballot_tests.ShaderBallotFunctionBallot
dEQP-VK.spirv_assembly.instruction.compute.uconvert.uint32_to_int64
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103115
Already implemented for Gallium drivers.
Useful for gbm_bo_(un)map.
Tests:
By porting wayland/weston/clients/simple-dmabuf-drm.c to GBM.
kmscube --mode=rgba
kmscube --mode=nv12-1img
kmscube --mode=nv12-2img
piglit ext_image_dma_buf_import-refcount -auto
piglit ext_image_dma_buf_import-transcode-nv12-as-r8-gr88 -auto
piglit ext_image_dma_buf_import-sample_rgb -fmt=XR24 -alpha-one -auto
piglit ext_image_dma_buf_import-sample_rgb -fmt=AR24 -auto
piglit ext_image_dma_buf_import-sample_yuv -fmt=NV12 -auto
piglit ext_image_dma_buf_import-sample_yuv -fmt=YU12 -auto
piglit ext_image_dma_buf_import-sample_yuv -fmt=YV12 -auto
v2: add early return if (flag & MAP_INTERNAL_MASK)
v3: take input rect into account and test with kmscube and piglit.
v4: handle wraparound and bo reference.
v5: indent, exclude 0 width and height on the boundary, map bo
independently of the image.
Signed-off-by: Julien Isorce <jisorce@oblong.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
It seems safe and it improves performance by +4% (73->76).
A drirc based solution is not what we want for now, keep it
simple and improve later if it's really needed.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Previously, we just had one hash set for tracking depth and render
caches called brw_context::render_cache. This is less than ideal
because the depth and render caches are separate and we can't track
moves between the depth and the render caches. This limitation led
to some unnecessary flushing around the depth cache. There are cases
(mostly with BLORP) where we can end up touching a depth or stencil
buffer through the render cache. To guard against this, blorp would
unconditionally do a render_cache_set_check_flush on it's destination
which meant that if you did any rendering (including a BLORP operation)
to a given surface and then used it as a blorp destination, you would
end up flushing it out of the render cache before rendering into it.
Things get worse when you dig into the depth/stencil state code for
regular GL draw calls. Because we may end up rendering to a depth
or stencil buffer via BLORP, we did a render_cache_set_check_flush on
all depth and stencil buffers in brw_emit_depthbuffer to ensure that
they got flushed out of the render cache prior to using them for depth
or stencil testing. However, because we also need to track dirtiness
for depth and stencil so that we can implement depth and stencil
texturing correctly, we were adding all depth and stencil buffers to the
render cache set in brw_postdraw_set_buffers_need_resolve. This meant
that, if anything caused 3DSTATE_DEPTH_BUFFER to get re-emitted
(currently _NEW_BUFFERS, BRW_NEW_BATCH, and BRW_NEW_BLORP), we would
almost always do a full pipeline stall and render/depth cache flush.
The root cause of both of these problems is that we can't tell the
difference between the render and depth caches in our tracking. This
commit splits our cache tracking into two sets, one for render and one
for depth, and properly handles transitioning between the two. We still
flush all the caches whenever anything needs to be flushed. The idea is
that if we're going to take the hit of a flush and stall, we may as well
flush everything in the hopes that we can avoid a flush by something
else later.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Right now we just always flush the destination for render and aren't
particularly careful about depth or stencil. Soon, flush_for_render
isn't going to do the same thing as flush_for_depth and we may be doing
a good deal less depth flushing so we should be a bit more precise.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In theory, this will let us track the depth and render caches
separately. Right now, they're just wrappers around
brw_render_cache_set_*
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were already using PTE for all render targets in case one happened to
get scanned out. However, this still wasn't 100% correct because there
are still possibly cases where we may want to texture from an external
buffer even though we don't know the caching mode. This can happen, for
instance, on buffers imported from another GPU via prime.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101691
Cc: "17.3" <mesa-stable@lists.freedesktop.org>
Tested-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a bit more annoying than your average shader - we need to look
at MEDIA_INTERFACE_DESCRIPTOR_LOAD in the batch buffer, then hop over
to the dynamic state buffer to read the INTERFACE_DESCRIPTOR_DATA, then
hop over to the instruction buffer to decode the program.
Now that we store all the buffers before decoding, we can actually do
this fairly easily.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
while loops skip the first field of the instruction/structure, which
is not what the code intended. It works out because the field we're
looking for doesn't happen to be first, but we ought to do it right
regardless.
Found while writing the next patch, where Kernel Start Pointer is
the first field of INTERFACE_DESCRIPTOR_DATA.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This makes aubinator_error_decode's shader dumping work like aubinator.
Instead of printing them after the fact, it prints them right inside the
3DSTATE_VS/HS/DS/GS/PS packet that references them. This saves you the
effort of cross-referencing things and jumping back and forth.
It also reduces a bunch of book-keeping, and eliminates the limitation
that we could only handle 4096 programs. That code was also broken and
failed to print any shaders if there were under 4096 programs.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This lets us complete parsing and storing of each buffer's data before
we begin decoding the batchbuffer. This makes it possible to inspect
the state buffer and program buffer, so we can properly decode any
indirect state or shader programs.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Based on a similar patch to intel_error_decode by Chris Wilson.
While we're de-duplicating the gtt_offset calculation, we can simplify
it to assume two hex digits are there - the kernel has done this since
v4.6, and we already require error states from v4.10.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Also change count from a pointer into a value. We were supposed to
be resetting it to 0 (and failed to), but that's gone since we dropped
the pre-ascii85 handling.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Error state files used to look like:
render ring --- gtt_offset = 0x0e8f6000
00000000 : 69040000
00000004 : 79090000
...
00007ffc : 00000000
--- ringbuffer = 0x00001000
There were thousands of lines between sections. The file format changed
with Kernel 4.10, and now has a single ascii85-encoded line following
each section heading. This is much easier to parse.
There are a bunch of bugs in our handling of the old style format,
where we'd decode the wrong data, at the wrong time. Fixing all of
these is going to be a giant pain. It's also a lot of extra code
complexity. In order to properly decode indirect state, or compute
shaders, we'll also need to parse data in advance of decoding, which
is going to be a giant pain with this ad-hoc "decode everywhere!"
mentality. So, let's just drop support for the older file format.
This unfortunately requires an error state generated by Kernel 4.10 or
later. That's probably not the end of the world, as we encourage users
to upgrade to the latest kernel when encountering GPU hangs anyway. It
might be a giant pain for people with LTS kernels, though...
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
The dashes "---" may occur within an ascii85 block, but only an ascii85
block starts with ':' or '~'.
Ported from Chris Wilson's intel-gpu-tools commit:
bceec7e1d8a160226b783c6344eae8cbf4ece144
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
It's a neat idea, and still useful in some cases, but the intel common
code is used by i965 and anvil only, this is a little clearer.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Using build_by_default : false is convenient for dependencies that can
be pulled in by various diverse components of the build system, the
gallium hardware/software drivers and state trackers do not fit that
description. Instead, these should be guarded using the variable that tracks
whether that driver should be enabled.
This leaves a few helper libraries: trace, rbug, etc, and the generic
winsys bits as `build_by_default : false` because there are a large
number of gallium components that pull them in.
v2: - remove build_by_default from winsys convenience libs as well.
v3: - Always put drivers before winsys for consistency
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v1)
Reviewed-by: Eric Anholt <eric@anholt.net>
This handles the bits >= 32 corner case in bitfieldInsert.
Fixes:
tests/spec/arb_gpu_shader5/execution/built-in-functions/fs-bitfieldInsert.shader_test.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Like
radeonsi: generate GS prolog to (partially) fix triangle strip adjacency rotation
evergreen hw suffers from the same problem, so rotate the
geometry inputs to fix this.
This fixes:
./bin/glsl-1.50-geometry-primitive-types GL_TRIANGLE_STRIP_ADJACENCY
on evergreen.
Signed-off-by: Dave Airlie <airlied@redhat.com>
As per radeonsi, the tess factor components for isolines
are reversed.
Fixes: tests/spec/arb_tessellation_shader/execution/isoline.shader_test
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
r0 in input into vertex shaders contains things like vertexid,
we need to reserve it even if we have no inputs.
This fixes a bunch of tessellation piglits.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The window-system auto-detection code (specifically for glx) relies on
with_any_vk being available. This fixes the Vulkan-only build. Also,
this puts it up near the handling of -Ddri-drivers and -Dgallium-drivers
which seems to make a bit more sense.
Fixes: 118a7f0441 "meson: add support for xlib glx"
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
It should be perfectly valid to build a completely headless Vulkan
driver. We don't need to require a platform.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
The previous iteration algorithm would advance the field pointer right
after we advance the group. This meant that you would end up with
skipping the first field of the group. In the common case, where the
only field is a struct (e.g. 3DSTATE_VERTEX_BUFFERS), it would get
skipped entirely.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
They serve no purpose other than to just fill empty space in the packet
so each dword has something. Just disallowing empty groups is a bit
easier on some of the tools. This does not change the generated packing
headers in any way.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Gather operations in both GLSL and SPIR-V require a sampler. Fixes
gathers returning garbage when using separate texture/samplers (on AMD,
was using an invalid sampler descriptor).
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We should use the result type of the OpSampledImage opcode, rather than
the type of the underlying image/samplers.
This resolves an issue when using separate images and shadow samplers
with glslang. Example:
layout (...) uniform samplerShadow s0;
layout (...) uniform texture2D res0;
...
float result = textureLod(sampler2DShadow(res0, s0), uv, 0);
For this, for the combined OpSampledImage, the type of the base image
was being used (which does not have the Depth flag set, whereas the
result type does), therefore it was not being recognised as a shadow
sampler. This led to the wrong LLVM intrinsics being emitted by RADV.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
It turned out that with recent changes that call into dri3 from glFinish(),
it appears like different thread end up waiting for X events simultaneously,
causing deadlocks since they steal events from eachoter and update the dri3
counters behind eachothers backs.
This patch intends to improve on that. It allows at most one thread at a
time to wait on events for a single drawable. If another thread intends to
do the same, it's put to sleep until the first thread finishes waiting, and
then it rechecks counters and optionally retries the waiting. Threads that
poll for X events never pulls X events off the event queue if there are
other threads waiting for events on that drawable. Counters in the
dri3 drawable structure are protected by a mutex. Finally, the mutex we
introduce is never held while waiting for the X server to avoid
unnecessary stalls.
This does not make dri3 drawables completely thread-safe but at least it's a
first step.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102358
Fixes: d5ba75f888 "st/dri2 Plumb the flush_swapbuffer functionality through to dri3"
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
v2: include the file also in the meson.build (Eric Engestrom).
Fixes: f1e1c60ff6 ("etnaviv: Update from rnndb")
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Patch adds support and capability to match with new surface attribute,
component type. Currently no configs with floating point type are exposed.
With this change, following dEQP test starts to pass:
dEQP-EGL.functional.choose_config.color_component_type_ext.dont_care
dEQP-EGL.functional.choose_config.color_component_type_ext.fixed
dEQP-EGL.functional.choose_config.color_component_type_ext.float
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
This should not be needed, if the allocation fails an error is
returned and the host should handle it.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Instead just dirty RADV_CMD_DIRTY_FRAMEBUFFER and it will be
re-emitted if necessary before the next draw.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Just after the vertex shader.
This seems to give a minor boost for, at least, Serious Sam
Fusion 2017 and Dawn of War 3. I don't see any real impacts
with The Talos Principle.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This is what we do in the condition too, so it makes sense.
v2: Only compute without_array() once (Ilia).
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This register is the same on all gpus so far, so emit it in one
place and also for the pre-gfx9 gpus set the value in the pipeline
creation.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This moves some calculations of register values into the pipeline
construction, it saves looking at outinfo in the cmd buffer emit.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In case an instruction only writes one register, and it is .x, we can
skip the extra level of fanout indirection.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Seems to fix dEQP compute related tests.. and matches what i965 does, so
perhaps there is some assumption that std430 packing is on by default
somewhere in NIR?
Add a new pass that inserts additional dependencies, rather than simply
relying on SSA srcs added in the nir->ir3 frontend. This makes it
easier to deal with barriers, but the additional false deps also lets us
deal properly with ensuring a write depends on all previous reads.
Since conversion to barrier instructions is lossy (ie. just knowing the
instruction doesn't tell us enough about what other instructions the
barrier applies to), use barrier_class/barrier_conflict fields in the
ir3_instruction to retain this information.
This could probably be relaxed somewhat by considering *which* array/
buffer/image variable is being referenced. Ie. a write to buffer A
can overtake a read from buffer B, if B is not coherent. (right?)
Signed-off-by: Rob Clark <robdclark@gmail.com>
I want to add a growable array to ir3_instruction, so we can append
false dependencies for purposes of scheduling barriers, atomics, and
dealing with write after read hazards.
Just code motion preparing for next patch.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Atomic instructions take a different # of src args depending on .g or .l
variant, split these out into different helpers with INSTR*F() helper
macro that lets you specify instruction flag.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Some instructions (like barriers) have no dst, which causes problems
with dereferencing a NULL dst. Flip the logic around to reject opc's
that can't be a type of move first, to filter out those instructions.
Signed-off-by: Rob Clark <robdclark@gmail.com>
User consts and driver consts such as UBO addresses and immediates are
handled the same for all shader stages, so split out a shared helper for
these, to make it easier to add more.
Signed-off-by: Rob Clark <robdclark@gmail.com>
It is unfortunate that image state isn't a real CSO, since (at least for
a4xx/a5xx) it is a combination of sampler and "SSBO" image state, and it
would be useful to pre-compute the state block "register" values rather
than doing it at emit time.
Signed-off-by: Rob Clark <robdclark@gmail.com>
In case the IR is NIR, the driver takes reference to the nir_shader.
Also, because there are no variants, we need to clone the shader,
instead of sharing the reference with gl_program, which would result
in a double free in _mesa_delete_program().
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Having both an ir3_compile (which was really context for compiling a
single shader variant) and ir3_compiler (which is the compiler object
that compiles all variants, ie. basically holds the RA regset) is a
bit confusing.
Signed-off-by: Rob Clark <robdclark@gmail.com>
We renamed "Function Enable" to "Enable", which broke our detection
of whether shaders are enabled or not. So, we'd see a bunch of HS/DS
packets with program offsets of 0, and think that was a valid TCS/TES.
Fixes: c032cae9ff (genxml: Rename "Function Enable" to "Enable".)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
As far as I can tell these fields are only used to query arb
program info and are not related to ATI_fragment_shader.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Miklós Máté <mtmkls@gmail.com>
Android fences can't be deferred, because st/dri calls fence_finish
with ctx = NULL, so the driver can't flush u_threaded_context.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This prevents build failures when libdrm_freedreno is unavailable,
which started happening after the ir3_compiler build was enabled.
(Patch by Rob, commit message by Ken).
Fixes: fecd04a66a ("freedreno/ir3: fix standalone compiler meson build")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The L3 configuration code already considers the TCS and TES programs,
but failed to listen for TCS/TES program changes.
This was somehow missing.
Fixes: e9644cb1f9 ("i965: Consider tessellation in get_pipeline_state_l3_weights.")
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
There is a bunch of churn in the main meson.build so that we can
correctly set the auto tristate of GLX. In particular, don't build
xlib-based glx when dri and gallium are disabled but vulkan is enabled,
in that case just turn glx off.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Because the same generation logic is required by xlib glx and
gallium-xlib glx, it makes sense to pull it out.
v2: - Ensure that libgl is defined before trying to generate a pkgconfig
file with it.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This ensure that it's properly guarded, but also makes the code clearer
by grouping related things together.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This creates a dependency on this header being generated before trying
to compile any of these targets, as well as passing the correct -I to
the compiler to ensure it's included correctly.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
These variables were removed from autotools in 2008 (sha:
80f68e1b6a), but they have lived on here. The Scons build
meanwhile doesn't set a patch/tiny version at all, just major and minor.
This patch removes the unused variables and simply sets the version,
leaving patch/tiny as 0 since that's what the autotools build as been
doing forever. This shouldn't change any behavior.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This seems to be dropped in 222a2fb9 "util: move os_time.[ch] to src/util"
../../../src/util/os_time.c: In function ‘os_time_sleep’:
../../../src/util/os_time.c:104:4: error: implicit declaration of function ‘usleep’ [-Werror=implicit-function-declaration]
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
These flags are set for C sources, but not C++. This causes symbol
visibility leaks from the C++ parts of the Intel compiler.
Fixes: 700bebb958 ("i965: Move the back-end compiler to src/intel/compiler")
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes piglit - egl_khr_fence_sync/android_native tests.
Broken by 884a0b2a9e.
Introduce state-tracker flush flags, analogous to the pipe ones. Use
the former when with stapi->flush().
Fixes: 884a0b2a9e ("st/dri: use stapi flush instead of pipe flush
when creating fences")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Apparently, it doesn't have pthread barriers.
p_config.h (which was originally used to guard this code) uses the
__APPLE__ macro to detect Mac OS.
Fixes: f0d3a4de75 ("util: move pipe_barrier into src/util and rename to util_barrier")
Cc: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
State validation is performed during clear and draw calls. Validation
during clear was still accessing vertex buffer state. When the currently
set vertex buffers are client arrays, this could lead to accessing freed
memory. Such is the case with the VMD application.
Previously, vertex buffer validation depended on a dirty bit or the
draw info indicating an indexed draw. This required special handling for
clears. But, vertex buffer validation still occurred which was unnecessary
and wrong.
Now, only minimal validation is performed during clear, deferring the
remainder to the next draw. And, by setting the dirty bit in swr_draw_vbo
for indexed draws, vertex buffer validation is only dependent upon a
single dirty bit.
This fixes a bug exposed by the VMD application when changing models.
Reviewed-By: George Kyriazis <george.kyriazis@intel.com>
Don't rely on intr->num_components having a valid value. It doesn't
seem to anymore for non-vectorized intrinsics.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Its helper function, anv_surface_get_subresource_layout(), was not very
helpful. So fold it into the main function.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instead of choosing the tiling flags inside make_surface(), which is
called once per aspect in a loop, and which chooses the same tiling for
each aspect, choose the tiling flags exactly once before entering the
aspect loop.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The same local variable, 'plane_format', was returned on success *and*
failure. Be more explicit in distinguishing the two cases: return
'plane_format' on success and return 'unsupported' on failure.
This simplifies the diff in upcoming patches for
VK_EXT_image_drm_format_modifier.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Now that get_image_format_properties() returns the correct
VkFormatFeatureFlags, we can remove the unneeded if-branch and some
local variables.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Now that get_image_format_features() has a VkImageTiling parameter, we
can bypass anv_physical_device_get_format_properties() and call
get_image_format_features() directly.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The name is misleading. It looks like vkGetPhysicalDeviceImageFormatProperties(),
but it actually implement vkGetPhysicalDeviceFormatProperties. Let's
rename it to what it actually does, get_image_format_features(), because it
returns VkFormatFeatureFlags.
For consistency, also rename get_buffer_format_properties() to
get_buffer_format_features().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Teach it to calculate the format features for YCbCr.
The goal (which is completed in this patch) is to incrementally fix
get_image_format_properties() to return a correct result. Previously,
it returned incorrect VkFormatFeatureFlags which the caller needed clean
up.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Teach it to calculate the format features for 3-channel formats.
The goal is to incrementally fix get_image_format_properties() to return
a correct result. Currently, it returns incorrect VkFormatFeatureFlags
which the caller must clean up.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Replace parameters 'enum isl_format' and 'struct anv_format_plane' with
new parameter 'const struct anv_format *'.
The goal is to incrementally fix get_image_format_properties() to return
a correct result. Currently, it returns incorrect VkFormatFeatureFlags
which the caller must clean up.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Teach it to calculate the format features for ASTC.
The goal is to incrementally fix get_image_format_properties() to return
a correct result. Currently, it returns incorrect VkFormatFeatureFlags
which the caller must clean up.
v2: New commit message
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Teach it to calculate the features of depthstencil formats.
The goal is to incrementally fix get_image_format_properties() to return
a correct result. Currently, it returns incorrect VkFormatFeatureFlags
which the caller must clean up.
v2: New commit message
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Some functions have a comment that says "Exactly one bit must be in
'aspect'". So change the type of their 'aspect' parameter from
VkImageAspectFlags to VkImageAspectFlagBits.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Make it a stand-alone function. Pre-patch, for some formats the function
returned incorrect VkFormatFeatureFlags which were cleaned up by the
caller.
This prepares for a cleaner implementation of
VK_EXT_image_drm_format_modifier.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We already have piglit tests testing alpha, luminance, and intensity
formats. They were skipped by piglit until now.
Additionally, I'm enabling one ARB_texture_buffer_range piglit test to run
with the compat profile.
i965 behavior is unchanged except that it doesn't expose TBOs in the Compat
profile. Not sure how that affects the GL version override.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This adds support for the evergreen/cayman atomic counters.
These are implemented using GDS append/consume counters. The values
for each counter are loaded before drawing and saved after each draw
using special CP packets.
v2: move hw atomic assignment into driver.
v3: fix messing up caps (Gert Wollny), only store ranges in driver,
drop buffers.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-By: Gert Wollny <gw.fossdev@gmail.com>
This adds support for creating the hw atomic tgsi from
the glsl codepaths.
v2: drop the atomic index and move to backend.
v3: drop buffer decls. (Marek)
v4: fix off by one (Gert)
v5: fix off by one the other way (Dave)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
HW atomics need to use caps to set some limits, and some
other limits may also need limiting.
This fixes things up to work for evergreen hw, it may need
more changes in the future if other hw wants to use this path.
v1.1: fix indent.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds a new atom that calls the new driver API to
bind buffers containing hw atomics.
v2: fixup bindings for sparse buffers. (mareko/nha)
don't bind buffer atomics when hw atomics are enabled.
use NewAtomicBuffer (mareko)
Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This API binds atomic buffers for all bound shaders (as per the
GL semantics).
This is needed to support cross shader hw atomic counters.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds support for a hw atomic counters to TGSI.
A new register file for storing atomic counters is added,
along with a new atomic counter semantic, along with docs
for both.
v2: drop semantic, move hw counter to backend,
Ilia pointed out SSO would have busted my plan, and he
was right.
v3: drop BUFFER decls. (Marek)
v3.1: minor fixups for whitespace, set ureg error
if we overflow the hw atomic limits. (nha)
v3.2: fix some docs inconsistencies (Ilia)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This looks like an evergreen specific feature, but with atomic
counters AMD have hw specific counters they use instead of operating
on buffers directly. These are separate to the buffer atomics,
so require different limits and code paths.
I've left the CAP for atomic type extensible in case someone
else has a variant on this sort of thing (freedreno maybe?)
and needs to change it.
This adds all the CAPs required to add support for those atomic
counters, along with a related CAP for limiting the number of
output resources.
I'd like to land this and the st patch then I can start to
upstream the evergreen support for these and other GL4.x features.
v2: drop the ATOMIC_COUNTER_MODE cap, just use the return
from the HW counters. If 0 we use the current mode.
v3: fix some rebase errors (Gert Wollny)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
13b303ff92 added the actual enums but
didn't remove the already existing XXXX ones. (And also duplicated
the "fragment" names instead of using the "vertex" names.)
Fixes: 13b303ff92 "docs: Update the list of used MESA GL enums."
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
st_src_reg is a class, not a struct. Simply remove 'struct' to silence
a MSVC compiler warning (class vs. struct mismatch).
Reviewed-by; Charmaine Lee <charmainel@vmware.com>
This allows drivers to be set by OS/arch in a sane manner.
v2: - set _drivers to a list of drivers instead of manually assigning
each with_*
v3: - Use "auto" instead of "default", which matches the value of other
automatically configured options.
- Set vulkan drivers as well
- Add error message if no automatic drivers are known for a given
arch/OS combo
- use not(darwin or windows) instead of (linux or *bsd), which is
probably more accurate (that way Solaris and other *nix systems
aren't excluded)
- rename softpipe to swrast, as swrast is the actual option name
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Similar to what we did for pixel shader threads - see gen_device_info.c.
We don't want to bump the actual Maximum Number of Threads though, so
we adjust it here. For pixel shaders, we don't use max_wm_threads, so
we could just bump it globally.
Supposedly fixes Piglit tests:
arb_gpu_shader_int64/execution/built-in-functions/cs-op-div-i64vec3-int64_t
arb_gpu_shader_int64/execution/built-in-functions/cs-op-div-i64vec4-int64_t
arb_gpu_shader_int64/execution/built-in-functions/cs-op-div-u64vec4-uint64_t
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Meson has up until this point set it's version in the root meson.build
script, while the other build systems read the VERSION file. This is
just "one more thing" to duplicate between meson and every other build
system. This script is a simple "read, strip, print" sort of deal to
allow meson to read the VERSION file.
I chose to implement this in python since python is portable, and to
keep the meson.build script clean. This is also complicated by the fact
that the project() call *must* be the first non-comment,non-blank in the
toplevel meson.build script.
v2: - Move from scripts/ to bin/
- use python explicitly to run the scripts to support windows
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
This patch makes use of the DRM_IOCTL_VC4_GEM_MADVISE ioctl to mark all
BOs placed in the mesa BO cache as purgeable so that the system can
reclaim this memory under memory pressure.
v2:
- Removed BOs from the cache when they've been purged by the kernel
- Check whether the madvise ioctl is supported or not before using it
v3: Don't walk the whole list when we find a busy BO (by anholt, acked by
Boris)
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Taken from drm-next d65d31388a23 ("Merge tag
'drm-misc-next-fixes-2017-11-07' of
git://anongit.freedesktop.org/drm/drm-misc into drm-next")
v2: Add the NOTSUPP definition from the final drm-next version, not the
commit (anholt).
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Somehow on my cross build the -pthread is getting lost. All the other
deps seem to work out fine.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Tested-by: Timothy Arceri <tarceri@itsqueeze.com>
The gallium auxiliary build would link against llvm, for the gallivm code
that it didn't build. This broke the build on my armhf cross, where
libLLVM-3.9.so is not multiarch and thus points to x86-64 libs.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Tested-by: Timothy Arceri <tarceri@itsqueeze.com>
This should be safe as these backends already support the EGL version of
this extension. DRI1 is not affected because it does not support
GLX_ARB_create_context anyway. DRI-Windows is not prepared to implement
this as there's no equivalent WGL extension, and wglCreateContextAttribs
seems to really want the HDC's pixel format to be set.
Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes non-deterministic failures in
dEQP-EGL.functional.sharing.gles2.multithread.simple_egl_sync.images.texture_source.teximage2d_render
and others in dEQP-EGL.functional.sharing.gles2.multithread.*
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The current value was introduced in commit a27180d0d8, which claims
that it represents ~1.11 years. However, it is interpreted in nanoseconds,
so it actually only represents ~9.8 hours. That seems a bit short.
Use the largest value consistent with both int32 and int64. It
corresponds to ~292 years in nanoseconds.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
st_flush should flush state tracker-internal state and the pipe, but
not mesa/main state. Of the four callers:
- glFlush/glFinish already call FLUSH_{VERTICES,STATE}.
- st_vdpau doesn't need to call them.
- st_manager will now call them explicitly.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There may be pending operations (e.g. vertices) that need to be flushed
by the state tracker.
Found by inspection.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This patch has multiple goals:
1. Off-load the writing of records in 'always' mode to another thread
for performance.
2. Allow using ddebug with threaded contexts. This really forces us to
move some of the "after_draw" handling into another thread.
3. Simplify the different modes of ddebug, both in the code and in
the user interface, i.e. GALLIUM_DDEBUG. In particular, there's
no 'pipelined' anymore, since we're always pipelined; and 'noflush'
is replaced by 'flush', since we no longer flush by default.
4. Fix the fences in pipelining mode. They previously relied on writes
via pipe_context::clear_buffer. However, on radeonsi, those could
(quite reasonably) end up in the SDMA buffer. So we use the newly
added PIPE_FLUSH_{TOP,BOTTOM}_OF_PIPE fences instead.
5. Improve pipelined mode overall, using the finer grained information
provided by the new fences.
Overall, the result is that pipelined mode should be more useful, and
using ddebug in default mode is much less invasive, in the sense that
it changes the overall driver behavior less (which is kind of crucial
for a driver debugging tool).
An example of the new hang debug output:
Gallium debugger active.
Hang detection timeout is 1000ms.
GPU hang detected, collecting information...
Draw # driver prev BOP TOP BOP dump file
-------------------------------------------------------------
2 YES YES YES NO /home/nha/ddebug_dumps/shader_runner_19919_00000000
3 YES NO YES NO /home/nha/ddebug_dumps/shader_runner_19919_00000001
4 YES NO YES NO /home/nha/ddebug_dumps/shader_runner_19919_00000002
5 YES NO YES NO /home/nha/ddebug_dumps/shader_runner_19919_00000003
Done.
We can see that there were almost certainly 4 draws in flight when
the hang happened: the top-of-pipe fence was signaled for all 4 draws,
the bottom-of-pipe fence for none of them. In virtually all cases,
we'd expect the first draw in the list to be at fault, but due to the
GPU parallelism, it's possible (though highly unlikely) that one of
the later draws causes a component to get stuck in a way that prevents
the earlier draws from making progress as well.
(In the above example, there were actually only 3 draws truly in flight:
the last draw is a blit that waits for the earlier draws; however, its
top-of-pipe fence is emitted before the cache flush and wait, and so
the fact that the draw hasn't truly started yet can only be seen from a
closer inspection of GPU state.)
Acked-by: Marek Olšák <marek.olsak@amd.com>
v2: use uncached system memory for the fence, and use the CPU to
clear it so we never read garbage when checking the fence
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Queries should still get marked as flushed when flushes are executed
asynchronously in the driver thread.
To this end, the management of the unflushed_queries list is moved into
the driver thread.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This requires out-of-band creation of fences, and will be signaled to
the pipe_context::flush implementation by a special TC_FLUSH_ASYNC flag.
v2:
- remove an incorrect assertion
- handle fence_server_sync for unsubmitted fences by
relying on the improved cs_add_fence_dependency
- only implement asynchronous flushes on amdgpu
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The driver uses (and must use) the flushed flag of queries as a hint that
it does not have to check for synchronization with currently queued up
commands. Deferred flushes do not actually flush queued up commands, so
we must not set the flushed flag for them.
Found by inspection.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The idea is to fix the following interleaving of operations
that can arise from deferred fences:
Thread 1 / Context 1 Thread 2 / Context 2
-------------------- --------------------
f = deferred flush
<------- application-side synchronization ------->
fence_server_sync(f)
...
flush()
flush()
We will now stall in fence_server_sync until the flush of context 1
has completed.
This scenario was unlikely to occur previously, because applications
seem to be doing
Thread 1 / Context 1 Thread 2 / Context 2
-------------------- --------------------
f = glFenceSync()
glFlush()
<------- application-side synchronization ------->
glWaitSync(f)
... and indeed they probably *have* to use this ordering to avoid
deadlocks in the GLX model, where all GL operations conceptually
go through a single connection to the X server. However, it's less
clear whether applications have to do this with other WSI (i.e. EGL).
Besides, even this sequence of GL commands can be translated into
the Gallium-level sequence outlined above when Gallium threading
and asynchronous flushes are used. So it makes sense to be more
robust.
As a side effect, we no longer busy-wait on submission_in_progress.
We won't enable asynchronous flushes on radeon, but add a
cs_add_fence_dependency stub anyway to document the potential
issue.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
These bits are intended to be used by the ddebug hang detection and are
named in analogy to the Vulkan stage bits (and the corresponding Radeon
pipeline event).
Hang detection needs fences on the granularity of individual commands,
which nothing else really covers. The closest alternative would have
been PIPE_QUERY_GPU_FINISHED, but (a) queries are a per-context object
and we really want a per-screen object, (b) queries don't offer a
wait with timeout, and (c) in any case, PIPE_QUERY_GPU_FINISHED is
meant to imply that GPU caches are flushed, which the new bits
explicitly aren't.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
With Gallium threaded contexts, creating shader/compute states is
effectively a screen operation, so we should not use context state.
In particular, this allows us to avoid using the context's LLVM
TargetMachine.
This isn't an issue yet because u_threaded_context filters out non-async
debug callbacks, and we disable threaded contexts for debug contexts.
However, we may want to change that in the future.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Schedule one job for every thread, and wait on a barrier inside the job
execution function.
v2: avoid alloca (fixes Windows build error)
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
The #if guard is probably not 100% equivalent to the previous PIPE_OS
check, but if anything it should be an over-approximation (are there
pthread implementations without barriers?), so people will get either
a good implementation or compile errors that are easy to fix.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Some locking is unfortunately required, because well-formed GL programs
can have multiple threads racing to access the same texture, e.g.: two
threads/contexts rendering from the same texture, or one thread destroying
a context while the other is rendering from or modifying a texture.
Since even the simple mutex caused noticable slowdowns in the piglit
drawoverhead micro-benchmark, this patch uses a slightly more involved
approach to keep locks out of the fast path:
- the initial lookup of sampler views happens without taking a lock
- a per-texture lock is only taken when we have to modify the sampler
view(s)
- since each thread mostly operates only on the entry corresponding to
its context, the main issue is re-allocation of the sampler view array
when it needs to be grown, but the old copy is not freed
Old copies of the sampler views array are kept around in a linked list
until the entire texture object is deleted. The total memory wasted
in this way is roughly equal to the size of the current sampler views
array.
Fixes non-deterministic memory corruption in some
dEQP-EGL.functional.sharing.gles2.multithread.* tests, e.g.
dEQP-EGL.functional.sharing.gles2.multithread.simple.images.texture_source.create_texture_render
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Move the early-out for surface-based textures earlier. This narrows the
scope of the locking added in a follow-up commit.
Fix one remaining case of initializing a surface-based texture
without properly finalizing it.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
r600 expects the context that created the sampler view to still be alive
(there is a per-context list of sampler views).
svga currently bails when the context of destruction is not the same as
creation.
The GL state tracker, which is the only one that runs into the
multi-context subtleties (due to share groups), already guarantees that
sampler views are destroyed before their context of creation is destroyed.
Most drivers are context-agnostic, so the warning message in
pipe_sampler_view_release doesn't really make sense.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We only need the lock to guard changes in the variant linked list. The
actual compilation can happen outside the lock, since we use the ready
fence as a guard.
v2: fix double-unlock
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There's a race condition between si_shader_select_with_key and
si_bind_XX_shader:
Thread 1 Thread 2
-------- --------
si_shader_select_with_key
begin compiling the first
variant
(guarded by sel->mutex)
si_bind_XX_shader
select first_variant by default
as state->current
si_shader_select_with_key
match state->current and early-out
Since thread 2 never takes sel->mutex, it may go on rendering without a
PM4 for that shader, for example.
The solution taken by this patch is to broaden the scope of
shader->optimized_ready to a fence shader->ready that applies to
all shaders. This does not hurt the fast path (if anything it makes
it faster, because we don't explicitly check is_optimized).
It will also allow reducing the scope of sel->mutex locks, but this is
deferred to a later commit for better bisectability.
Fixes dEQP-EGL.functional.sharing.gles2.multithread.simple.buffers.bufferdata_render
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fences are now 4 bytes instead of 96 bytes (on my 64-bit system).
Signaling a fence is a single atomic operation in the fast case plus a
syscall in the slow case.
Testing if a fence is signaled is the same as before (a simple comparison),
but waiting on a fence is now no more expensive than just testing it in
the fast (already signaled) case.
v2:
- style fixes
- use p_atomic_xxx macros with the right barriers
Acked-by: Marek Olšák <marek.olsak@amd.com>
The closest to it in the old-style gcc builtins is __sync_lock_test_and_set,
however, that is only guaranteed to work with values 0 and 1 and only
provides an acquire barrier. I also don't know about other OSes, so we
provide a simple & stupid emulation via p_atomic_cmpxchg.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
According to the GLSL ES 3.20, GLSL 4.50, and GLSL 1.20 specs:
"To force all output variables to be invariant, use the pragma
#pragma STDGL invariant(all)
before all declarations in a shader."
Notably, this is only supposed to affect output variables. Furthermore,
"Only variables output from a shader can be candidates for invariance."
It looks like this has been wrong since we first supported the pragma in
2011 (commit 86b4398cd1).
Fixes dEQP-GLES2.functional.shaders.preprocessor.pragmas.pragma_fragment.
v2: Now that all cases are identical (other than compute shaders, which
have no output variables anyway), we can drop the switch statement
entirely. We also don't need the current_function == NULL check;
this was a hold over from when we had a single var_mode_out for both
function parameters and shader varyings, in the bad old days.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Patch exposes sRGB visuals and adds DRI integer query support for
__DRI2_RENDERER_HAS_FRAMEBUFFER_SRGB. Further changes make sure that
we mark if the app explicitly wanted sRGB and for these framebuffers
we don't turn sRGB off in intel_gles3_srgb_workaround. This way we
keep compatibility for existing applications relying on default sRGB
and ony add more visual support.
With this change, following dEQP tests start to pass:
dEQP-EGL.functional.wide_color.window_8888_colorspace_srgb
dEQP-EGL.functional.wide_color.pbuffer_8888_colorspace_srgb
v2: some code cleanup (Emil Velikov)
update num_formats correctly (reported by deveee@gmail.com)
v3: cleanup, remove redundant is_srgb
rename explicit_srgb as 'need_srgb' to follow style better
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v2)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102264
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102354
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102503
The GL spec will soon be revised to clarify that a buffer binding for
a transform feedback buffer is only required if a variable is actually
defined to use the buffer binding point. Previously a declaration for
the default transform buffer would make it require a binding even if
nothing was declared to use the default buffer.
Affects:
KHR-GL44/45.enhanced_layouts.xfb_stride_of_empty_list
KHR-GL44/45.enhanced_layouts.xfb_stride_of_empty_list_and_api
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: mesa-stable@lists.freedesktop.org
Previously, if we were linking a vec4 VS with a SIMD8/16 FS, we wouldn't
lower indirects on the fragment shader which is wrong. Instead of using
a single indirect mask, take advantage of our new little helper.
Reviewed-by: Timothy Arceri <tarceri at itsqueeze.com>
Cc: mesa-stable@lists.freedesktop.org
Radeonsi also sets this flag. Seems to avoid pulling up the desintation
RT value when the dst blend factor is zero if it's not otherwise being
loaded. Among other things, it allows blending to overwrite infinity/NaN
values in the destination RT.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
I think it's more clear to only call emit_access once. The only
difference between the two calls is the value of size_mul used for the
offset parameter... but you really have to look at it to be sure.
The s/is_64bit/is_double/ change is because there are no int64_t or
uint64_t matrix types.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
I was going to squash this with the previous commit, but there's a lot
of churn in that commit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Without this, the SPIR-V generator has to deal with a bunch of junk
like:
(swiz z (swiz xxx (swiz x (var_ref packed:binormal.z,light_dir))))
It seems better to cull that stuff out than to add code to deal with
it. The problem is the way swizzles to and from scalars have to be
handled in SPIR-V.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
This is derived from tgsi/radeonsi code from the GLSL intrinsics.
This should pre-fix radv for the upcoming spirv patches.
v2: actually use wait_cnt, sleep deprived dad time! (Bas)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
While modern pthread mutexes are very fast, they still incur a call to an
external DSO and overhead of the generality and features of pthread mutexes.
Most mutexes in mesa only needs lock/unlock, and the idea here is that we can
inline the atomic operation and make the fast case just two intructions.
Mutexes are subtle and finicky to implement, so we carefully copy the
implementation from Ulrich Dreppers well-written and well-reviewed paper:
"Futexes Are Tricky"
http://www.akkadia.org/drepper/futex.pdf
We implement "mutex3", which gives us a mutex that has no syscalls on
uncontended lock or unlock. Further, the uncontended case boils down to a
cmpxchg and an untaken branch and the uncontended unlock is just a locked decr
and an untaken branch. We use __builtin_expect() to indicate that contention
is unlikely so that gcc will put the contention code out of the main code
flow.
A fast mutex only supports lock/unlock, can't be recursive or used with
condition variables. We keep the pthread mutex implementation around as
for the few places where we use condition variables or recursive locking.
For platforms or compilers where futex and atomics aren't available,
simple_mtx_t falls back to the pthread mutex.
The pthread mutex lock/unlock overhead shows up on benchmarks for CPU bound
applications. Most CPU bound cases are helped and some of our internal
bind_buffer_object heavy benchmarks gain up to 10%.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
When I introduced gl_shader_program_data one of the intentions was to
fix a bug where a failed linking attempt freed data required by a
currently active program. However I seem to have failed to finish
hooking up the final steps required to have the data hang around.
Here we create a fresh instance of gl_shader_program_data every
time we link. gl_program has a reference to gl_shader_program_data
so it will be freed once the program is no longer active.
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Neil Roberts <nroberts@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102177
This has a bit of a surprising effect:
For the render pipeline, the upload_sampler_state_table atom emits
3DSTATE_BINDING_TABLE_POINTERS_XS. It tries to avoid this for compute:
if (GEN_GEN >= 7 && stage_state->stage != MESA_SHADER_COMPUTE) {
/* Emit a 3DSTATE_SAMPLER_STATE_POINTERS_XS packet. */
genX(emit_sampler_state_pointers_xs)(brw, stage_state);
} ...
However, we were failing to initialize brw->cs.base.stage, so it was
left as 0 (MESA_SHADER_VERTEX), causing this condition to break. We
then emitted 3DSTATE_SAMPLER_STATE_POINTERS_VS in GPGPU mode, when
trying to upload CS samplers. Nothing good can come of this.
Found by inspection while debugging a GPU hang. Jordan believes this
helps the Deus Ex: Mankind Divided benchmark mode's stability when
running with shader cache.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
... as can happen with various types like mat4, or else we'll smash the
stack writing past the end of components_local[].
Fixes: 5a0d3e1129 ("nir: Print the components referenced for split or
packed shader in/outs.")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
So far clover based its test for compiler support on the version of gcc,
while in reality support for c++11 is required. This patch replaces the
version check by the check unified for all modules that require c++11.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Add a check that tests whether the c++ compiler supports c++11, either
by default, by adding the compiler flag -std=c++11, or by adding a
compiler flag that the user has specified via the environment variable
CXX11_CXXFLAGS.
The test only does a very shallow check of c++11 support, i.e. it tests
whether the define __cplusplus >= 201103L to confirm language support
by the compiler, and it checks whether the header <tuple> is available
to test the availability of the c++11 standard library.
A make file conditional HAVE_STD_CXX11 is provided that is used in this
patch to enable the test in st/mesa if C++11 support is available.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102665
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Currently we were overwriting the existing warning flags, instead of
adding new [as applicable].
Fixes c5d2e2d43f ("configure: Test for -Wno-initializer-overrides")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Currently we were overwriting the existing warning flags, instead of
adding new [as applicable].
v2: Add missing space before -Werror (Eric)
Fixes e4b2b69e82 ("configure: Add and use AX_CHECK_COMPILE_FLAG")
Cc: Matt Turner <mattst88@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Targets such as omx and va can work w/o anything X related. Mandate the
xcb* dependencies only when the X11 platform is selected.
Reported-by: Lukas Rusak <lorusak@gmail.com>
Fixes: 63e11ac2b5 ("configure: error out if building VA w/o supported
platform")
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Lukas Rusak <lorusak@gmail.com> (v1)
Currently we error out when building GLVND w/o GLX.
That was the original premice before we had EGL. As the commit says,
that error should be reworked to honour both - do so.
v2: Drop noop *);; (Eric)
Reported-by: Lukas Rusak <lorusak@gmail.com>
Fixes: ce562f9e3f ("EGL: Implement the libglvnd interface for EGL (v3)")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Lukas Rusak <lorusak@gmail.com> (v1)
Android implements the API and does the native damage handling itself.
At the same time it
a) does call the vendor's eglSwapBuffersWithDamageKHR
b) does not implement eglSetDamageRegionKHR
There's something strange happening here. For now simply note about the
'lack' of eglSwapBuffersWithDamageKHR support.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
The function is effectively a direct function call into
libwayland-server.so.
Thus GBM no longer depends on the wayland-drm static library, making the
build more straight forward. And the resulting binary is a bit smaller.
Note: we need to move struct wayland_drm_callbacks further up,
otherwise we'll get an error since the type is incomplete.
v2: Rebase, beef-up commit message, update meson, move struct
wayland_drm_callbacks.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> # meson bit only
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com> # for the rest
Reviewed-by: Dylan Baker <dylan@pnwbakers.com> # meson
Commit 05fc62d89f sets the variable, yet it forgot the update the
existing reference to append (instead of assign).
Thus as-is the expat library was discarded from the link chain when
building with Android.
Fixes: 05fc62d89f ("automake: intel: move expat handling where it's
used")
Cc: Hongxu Jia <hongxu.jia@windriver.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Use $(sysconfdir) instead of hardcoding /etc.
While the OpenCL spec expects the file in /etc, people building their
stack can override that, esp. !Linux users.
Furthermore this removes a fundamental violation, which results in the
system file being overwritten even as one explicitly sets --prefix
and/or DESTDIR.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-By: Aaron Watry <awatry@gmail.com>
Otherwise it will be missing from the release tarball
Fixes: 7f33e94e43 ("amd/addrlib: update to latest version")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
LLVM 6 changed the API on the fast-math-flags:
https://reviews.llvm.org/rL317488
NOTE: This also enables the new flag 'ApproxFunc' to allow for
approximations for library functions (sin, cos, ...). I'm not completly
convinced, that this is something mesa should do.
Signed-off-by: Tobias Droste <tdroste@gmx.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Use the NIR helper rather than the GLSL IR helper to get in/out
masks. This allows us to ignore varyings removed by NIR
optimisations.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
We want to use nir_shader_gather_info() the GLSL IR version might
be including varyings that NIR later eliminates. To do this we
need to generate NIR before we we start using the in/out bitmasks.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Delaying adding built-in uniforms until after we convert to NIR
gives us a better chance to optimise them away. Also NIR allows
us to iterate over the uniforms directly so should be faster.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This uses C++11 initializer lists.
I just overwrote all Mesa files with internal addrlib and discarded
hunks that we should probably keep, but I might have missed something.
The code depending on ADDR_AM_BUILD is removed. We can add it back next
time if needed.
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Gallium disables it by removing the streamout buffers, not by binding a
program that doesn't have TF outputs. Fixes piglit
"ext_transform_feedback2/counting with pause"
The v3d_qpu_writes_r*() were only checking for fixed-function accumulator
writes, not normal ALU writes to those regs.
Fixes fs-discard-exit-2 on simulation (but not HW).
We have to compute the queries in software, so we're counting the
primitives by hand. We still need to make sure to not increment the
PRIMITIVES_EMITTED if we overflowed, but leave that for later.
Most of NIR doesn't allow doing array indexing on a vector (though it
does on a matrix). However, nir_lower_io handles it just fine and this
behavior is needed for shared variables in Vulkan. This commit makes
glsl_get_array_element do something sensible for vector types and makes
nir_validate happy with them.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We were already validating that the parent type goes along with the
child type but we weren't actually validating that the parent type is
reasonable. This fixes that.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The GL_ARB_shader_ballot spec says that gl_SubGroupSizeARB is declared
as a uniform. This means that it cannot change across an invocation
such as a draw call or a compute dispatch. For compute shaders, we're
ok because we only ever use one dispatch size. For fragment, however,
the hardware dynamically chooses between SIMD8 and SIMD16 which violates
the spec. Instead, let's just pick a subgroup size based on the shader
stage. The fixed size we choose for compute shaders is a bit higher
than strictly needed but there's no real harm in that. The advantage is
that, if they do anything interesting with the value, NIR will see it as
an immediate and can optimize better.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Ballot intrinsics return a bitfield of subgroups. In GLSL and some
SPIR-V extensions, they return a uint64_t. In SPV_KHR_shader_ballot,
they return a uvec4. Also, some back-ends would rather pass around
32-bit values because it's easier than messing with 64-bit all the time.
To solve this mess, we make nir_lower_subgroups take a new parameter
called ballot_bit_size and it lowers whichever thing it gets in from the
source language (uint64_t or uvec4) to a scalar with the specified
number of bits. This replaces a chunk of the old lowering code.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The SUBGROUP_*_MASK system values are uint64_t when coming in from GLSL
but uvec4 when coming in from SPIR-V. Lowering based on type allows us
to nicely handle both.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This way they can return either a uvec4 or a uint64_t. At the moment,
this is a no-op since we still always return a uint64_t.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This commit pulls nir_lower_read_invocations_to_scalar along with most
of the guts of nir_opt_intrinsics (which mostly does subgroup lowering)
into a new nir_lower_subgroups pass. There are various other bits of
subgroup lowering that we're going to want to do so it makes a bit more
sense to keep it all together in one pass. We also move it in i965 to
happen after nir_lower_system_values to ensure that because we want to
handle the subgroup mask system value intrinsics here.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The automatic exec size inference can accidentally mess things up if
we're not careful. For instance, if we have
add(4) g38.2<4>D g38.1<8,2,4>D g38.2<8,2,4>D
then the destination register will end up having a width of 2 with a
horizontal stride of 4 and a vertical stride of 8. The EU emit code
sees the width of 2 and decides that we really wanted an exec size of 2
which doesn't do what we wanted.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We have had a feature in codegen for some time that tries to
automatically infer the execution size of an instruction from the width
of its destination. For things such as fixed function GS, clipper, and
SF programs, this is very useful because they tend to have lots of
hand-rolled register setup and trying to specify the exec size all the
time would be prohibitive. For things that come from a higher-level IR,
however, it's easier to just set the right size all the time and the
automatic exec sizes can, in fact, cause problems. This commit makes it
optional while enabling it by default.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Originally we tried to handle this case based on slots_valid. However,
there are a number of ways that this can go wrong. For one, we throw
away any trailing slots which either aren't written or are set to
VARYING_SLOT_PAD. Second, even if PSIZ is a valid slot, we may not
actually write anything there. Between the lot of these, it was
possible to end up in a case where we tried to do a regular URB write
but ended up with a length of 1 which is invalid. This commit moves it
to the end and makes it based on a new boolean flag urb_written.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
Subgroup invocation is computed using a vector immediate and some
dispatch-aware arithmetic. Unfortunately, due to the vector arithmetic,
and the fact that it's frequently read 16-wide, it's not something that
can easily be CSEd by the back-end compiler. There are a few different
possible approaches to this problem:
1) Emit the code to calculate the subgroup invocation on-the-fly and
trust NIR to do the CSE. This is what we were doing.
2) Add a back-end instruction for the subgroup ID. This has the
advantage of helping the back-end compiler with CSE but has the
downside of very poor scheduling for the calculation because it has
to be emitted in the back-end.
3) Emit the calculation at the top of the program and re-use the
result. This gets rid of the CSE problem but comes at the cost of
an extra live register.
This commit switches us from 1) to 3). We choose to store the subgroup
invocation values as a W type to reduce the impact of the extra live
register. Trusting NIR and using 1) was fine but we're soon going to
want to use the subgroup invocation value for other things in the
back-end compiler and this makes it much easier to do without having to
worry about CSE problems.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We're going to want subgroup ID for SPIR-V subgroups eventually anyway.
We really only want to push one and calculate the other from it. It
makes a bit more sense to push the subgroup ID because it's simpler to
calculate and because it's a real API thing. The only advantage to
pushing the base thread ID is to avoid a single SHL in the shader.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
With the advent of SPIR-V subgroup operations, compute shaders will have
to be slightly different depending on the SIMD size at which they
execute. In order to allow us to do dispatch-width specific things in
NIR, we re-run the final NIR stages for each sIMD width.
One side-effect of this change is that we start rallocing fs_visitors
which means we need DECLARE_RALLOC_CXX_OPERATORS.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Previously, brw_nir_lower_intrinsics added the param and then emitted a
load_uniform intrinsic to load it directly. This commit switches things
over to use a specific NIR intrinsic for the thread id. The one thing I
don't like about this approach is that we have to copy thread_local_id
over to the new visitor in import_uniforms.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This isn't often a problem , when we're in a compute shader, we must
push the thread local ID so we decrement the amount of available push
space by 1 and it's no longer even and 64-bit data can, in theory, span
it. By marking those uniforms contiguous, we ensure that they never get
split in half between push and pull constants.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
The only things that adjust fs_visitor::max_dispatch_width are render
target writes which don't happen in compute shaders so they're
pointless.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
It's 8 for everything except compute shaders. For compute shaders,
there's no need to duplicate the computation and it's just a possible
source of error.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Before, we bailing in assign_constant_locations based on the minimum
dispatch size. The more direct thing to do is simply to check for
whether or not we have constant locations and bail if we do. For
nir_setup_uniforms, it's completely safe to do it multiple times because
we just copy a value from the NIR shader.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This is what we really wanted all along. Always retyping to D works
because that's what get_nir_src() always gives us, at least for 32-bit
types. The SPIR-V variants of these operations accept arbitrary types
and we need this if we're going to handle 64 or 16-bit values.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The index is any value provided by the shader and this can be called in
non-uniform control flow so we can't just take component 0. Found by
inspection.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Stop retyping the output of shuffle_64bit_data_for_32bit_write. It's
always BRW_REGISTER_TYPE_D which is perfectly fine for writing out.
Also, when we change get_nir_src to return something with a 64-bit type
for 64-bit values, the retyping will not be at all what we want. Also,
retyping the output based on src.type before we whack it back to 32 bits
is a problem because the output is always 32 bits.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
All callers of this function allocate a fs_reg expressly to pass into
it. It's much easier if we just let the helper allocate the register.
While we're here, we switch it to doing the MOVs with an integer type so
that we don't accidentally canonicalize floats on half of a double.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The swizzles weren't doing any good because swiz is just XYZW. Also, we
were emitting an extra set of MOVs because shuffle_64bit_data_for_32bit
already does a MOV for us. Finally, the temporary was only ever used
inside the inner loop so there's no need for it to actually be an array.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Some hardware (CHV, BXT) have special restrictions on register regions
when doing integer multiplication. We want to respect those when we
lower to DxW multiplication.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
The same workaround we need for 64-bit values on little core also takes
care of the Ivy Bridge problem and does so a bit more efficiently so we
can drop that code while we're here.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
We're not using broadcast for any 32-bit types right now since we mostly
use it for emit_uniformize on 32-bit buffer indices. However, SPIR-V
subgroups are going to need it for 64-bit so let's make it work.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This means we have to drop const from a variable but it also means that
100% of the code which deals with the offset limit is in one place.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
These restrictions effectively already existed due to the way we use
indirect sources but weren't being directly enforced.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
For some reason, the any/all predicates don't work properly with SIMD32.
In particular, it appears that a SEL with a QtrCtrl of 2H doesn't read
the correct subset of the flag register and you end up getting garbage
in the second half. Work around this by using a pair of 1-wide MOVs and
scattering the result. This fixes the any/all instructions for SIMD32.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
The any/all intrinsics return a boolean value so D or UD is the correct
type. Unfortunately, get_nir_dest has the annoying behavior of
returnning a float type by default. This causes format conversion which
gives us -1.0f or 0.0f in the register. If the consumer of the result
does an integer comparison to zero, it will give you the right boolean
value but if we do something more clever based on the 0/~0 assumption
for booleans, this will give the wrong value.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
In fragment shaders f0.1 is used for discards so doing ballot after a
discard can potentially cause the discard to not happen. However, we
don't support SIMD32 fragment shaders yet so this isn't a problem.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
We have ANY/ALL32 predicates and, for the most part, they work just
fine. (See the next commit for more details.) Also, due to the way
that flag registers are handled in hardware, instruction splitting is
able to split the CMP correctly. Specifically, that hardware looks at
the execution group and knows to shift it's flag usage up correctly so a
2H instruction will write to f0.1 instead of f0.0.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Before, we were careful to place the zip after the last of the split
instructions but did unzip on-demand. This changes things so that the
unzips go before all of the split instructions and the unzip comes
explicitly after all the split instructions. As a side-effect of this
change, we now emit the split instruction from highest SIMD group to
lowest instead of low to high. We could have kept the old behavior, but
it shouldn't matter and this made the code easier.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
This makes it far more explicit where we're inserting the instructions
rather than the magic "before and after" stuff that the emit_[un]zip
helpers did based on block and inst.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
Register strides higher than 4 are uncommon but they can happen. For
instance, if you have a 64-bit extract_u8 operation, we turn that into
UB -> UQ MOV with a source stride of 8. Our previous calculation would
try to generate a stride of <32;8,8>:ub which is invalid because the
maximum horizontal stride is 4. To solve this problem, we instead use a
stride of <8;1,0>. As noted in the comment, this does not work as a
destination but that's ok as very few things actually generate that
stride.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
The lod clamping is what limits you between base and last level, and the
base level field is just there to help decide where the min/mag change
happens.
Fixes tex-miplevel-selection GL2:texture()
The ordering of the values was even less obvious than I thought, with both
the mip filter and the min filter being in different bits depending on
whether the mip filter is none.
Fixes piglit fs-textureLod-miplevels.shader_test
The HW doesn't pad the slice's height to make a full 4x4 group of UIF
blocks. We just need to pad to columns, and the start of the next column
appears in the bottom of the previous column's last block.
Fixes piglit fs-textureOffset-2D.
This is so much more pleasant to write than the manual
V3D33_whatever_pack() calls, and will be useful for when we start doing
actual per-V3D compiles.
As with blending, we'll have the bit flagged again when it gets reenabled
in CONFIGURATION_BITS, so there's no need to emit test state if we're not
testing.
Groups containing fields smaller than a byte probably not being decoded
correctly. For example:
<group count="32" start="32" size="4">
<field name="Vertex Element Enables" start="0" end="3" type="uint"/>
</group>
gen_field_iterator_next would properly walk over each element of the
array, incrementing group_iter. However, the code to print the actual
values only considered iter->field->start/end, which are 0 and 3 in the
above example. So it would always fetch bits 3:0 of the current byte,
printing the same value over and over.
Cc: Eric Anholt <eric@anholt.net>
The HW puts the pad bits at the top for DEPTH_COMPONENT24, but we need it
at the bottom for texturing. Using the format with stencil probably means
we won't be able to do Z24 and separate S8, but I wasn't planning on
supporting that anyway.
Fixes hiz-depth-read-fbo-d24-s0
I saw VK_IMAGE_ASPECT_ANY_COLOR_BIT while hacking anv_formats.c and got
confused. "Huh? What extension added that?". No extension defines it;
anv_private.h defines it.
To remove confusion, rename the anv-private VK tokens as if they were
extension tokens with the ANV vendor suffix.
I found only two such tokens:
VK_IMAGE_ASPECT_ANY_COLOR_BIT
VK_IMAGE_ASPECT_PLANES_BITS
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The DummyShader is used by GenFragmentShadersATI() as a placeholder to
mark IDs as allocated. Context cleanup wants to delete everything in
ctx->Shared->ATIShaders, and crashes on these placeholders with this
backtrace:
==15060== Invalid free() / delete / delete[] / realloc()
==15060== at 0x482F478: free (vg_replace_malloc.c:530)
==15060== by 0x57694F4: _mesa_delete_ati_fragment_shader (atifragshader.c:68)
==15060== by 0x58B33AB: delete_fragshader_cb (shared.c:208)
==15060== by 0x5838836: _mesa_HashDeleteAll (hash.c:295)
==15060== by 0x58B365F: free_shared_state (shared.c:377)
==15060== by 0x58B3BC2: _mesa_reference_shared_state (shared.c:469)
==15060== by 0x578687F: _mesa_free_context_data (context.c:1366)
==15060== by 0x595E9EC: st_destroy_context (st_context.c:642)
==15060== by 0x5987057: st_context_destroy (st_manager.c:772)
==15060== by 0x5B018B6: dri_destroy_context (dri_context.c:217)
==15060== by 0x5B006D3: driDestroyContext (dri_util.c:511)
==15060== by 0x4A1CBE6: dri3_destroy_context (dri3_glx.c:170)
==15060== Address 0x7b5dae0 is 0 bytes inside data symbol "DummyShader"
Also, DeleteFragmentShadersATI() should not assert on DummyShader, just
remove the hash entry.
Normally one would define a shader after GenFragmentShadersATI(), and
BindFragmentShaderATI() replaces the placeholder with a real object.
However, the specification doesn't say that one has to define a shader
for each allocated ID.
Signed-off-by: Miklós Máté <mtmkls@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
autotools generates libGLESv1_CM.so.1.0.0, so let's make sure meson
does the same.
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
This `version` field defines the filename for the .so.
The plan .so as well as .so.$major are always symlinks to this.
Unless I'm mistaken, only the major is ever used, so this shouldn't
matter, but for consistency with autotools (and in case it does matter),
let's always have all 3 major.minor.patch components.
(The soname isn't affected, and is always .so.$major)
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
I was hacking something stupid in doom, and hit an assert for the bitcast
following this, it definitely looks like this should be the number of 32-bit
components, not the instr level ones.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Commit 259fc50545 added linker error for
mismatching uniform precision, as required by GLES 3.0 specification and
conformance test-suite.
Several Android applications, including Forge of Empires, have shaders
which violate this rule, on a dead varying that will be eliminated.
The problem affects a big number of applications using Cocos2D engine
and other GLES implementations accept this, this poses a serious
application compatibility issue.
Starting from GLSL ES 3.0, declarations with conflicting precision
qualifiers are explicitly prohibited. However GLSL ES 1.00 does not
clearly specify the behavior, except that
"Uniforms are defined to behave as if they are using the same storage in
the vertex and fragment processors and may be implemented this way.
If uniforms are used in both the vertex and fragment shaders, developers
should be warned if the precisions are different. Conversion of
precision should never be implicit."
The word "used" is not clear in this context and might refer to
1) declared (same as GLES 3.x)
2) referred after post-processing, or
3) linked after all optimizations are done.
Looking at existing applications, 2) or 3) seems to be widely adopted.
To avoid compatibility issues, turn the error into a warning if GLSL ES
version is lower than 3.0 and the data is dead in at least one of the
shaders.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97532
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We can avoid adding the buffer in the non-local case, this will
avoid all the overhead of the indirect call.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The next patch will try and avoid calling the indirect function.
v2: add a missing conversion.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The function that calls us has just added the buffer to the
list already, no need to try and add it again.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
There's no point recalculating these the whole time on descriptor
emission, just store them at pipeline creation.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Hilariously this is a fairly big win. Neil's multi-context-test
improves from ~24 to ~36 fps with llvmpipe on a Core i5-3317U. softpipe
also improves, from about 2.25 to 3.09 fps (when it's that slow, you're
allowed to be that precise).
I'd have added it to swrast classic, but the testcase wants GL 3.0 and
shaders, and that's not a thing classic has, so I figured making it work
on softpipe was crime enough.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
Previously the CreateContext method of __DriverApiRec took a set of
arguments to describe the attribute values from the window system API's
CreateContextAttribs function. As more attributes get added this could
quickly get unworkable and every new attribute needs a modification for
every driver.
To fix that, pass the attribute values in a struct instead. The struct
has a bitmask to specify which members are used. The first three members
(two for the GL version and one for the flags) are always set. If the
bit is not set in the attribute mask then it can be assumed the
attribute has the default value. Drivers will error if unknown bits in
the mask are set.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Neil Roberts <neil@linux.intel.com>
It shouldn't be necessary to flush the context within the driver
implementation because the old context is explicitly flushed in
_mesa_make_current which is called a little further on. It is useful to
only have a single place that flushes when switching contexts to make it
easier to later implement the GL_KHR_context_flush_control extension.
The flush in intelMakeCurrent was added in commit 5505865 to implement
the GLX semantics that the context should be flushed when it is
released. When the commit was made there was no flush in
_mesa_make_current because it was only added later in 93102b4c. I think
that later commit effectively makes the first commit redundant.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Neil Roberts <neil@linux.intel.com>
HALIGN_FOUR/SIXTEEN has no meaning for compressed textures, and we can't
render to them anyway. So use the tightest possible packing. This
avoids bugs with non-power-of-two block sizes.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Add ASTC texture support for hardware that supports this
(currently only GC3000 on i.MX6qp is known to have this).
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Uploaded data must start at (stride * start), because we can't modify
start in all cases. If it's the first allocation, it's also the amount
of memory wasted. If the starting offset is larger than the size of
the upload buffer, the buffer is re-created, used for 1 upload, and then
thrown away. If the upload is small, most of the buffer space is unused
and wasted. Keep doing that and the OOM killer comes. It's actually
pretty quick.
With signed VB offsets, we can set min_out_offset = 0
in u_upload_alloc/u_upload_data.
This fixes OOM situations with SPECviewperf.
Fixes:
make[2]: Leaving directory '/home/local/mesa/mesa-17.4.0-devel/_build/sub/src'
make[2]: *** No rule to make target '../../../src/git_sha1.h.in', needed by 'git_sha1.h'. Stop.
Makefile:660: recipe for target 'all-recursive' failed
Fixes: 16be271c6e "git_sha1_gen: use git_sha1.h.in on all build systems"
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Instead of storing all the pointers and zeroing them all out,
just store a valid bitmask in the state. This also moves
the CmdBindPipeline path down the cpu usage path for the
multithreading demo as it no longer has to traverse MAX_SETS
to find the active descriptor sets.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This isn't required to be cleared, since buffers are only linked
by vertex elements, so if elements are clear then no buffers
should be referenced.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just removes a hole in the cmd_state and packs some bools
together.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
If we allocate attachments in the begin command buffer due to the
render pass continue bit, we were leaking them.
Since renderpasses inside a cmd buffer malloc/free these properly,
and set to NULL, we just need to call free at end.
Fixes a memory leak with multithreading demo.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
uint32_t data[MAX_SETS * 2] = {}; was getting executed before
the exit and took significant amounts of time. By having the
check outside the function, we skip the execution of the clear.
Reviewed-by: Dave Airlie <airlied@redhat.com>
The vram_list linked list resulted in lots of pointer chasing.
Replacing this with an array instead improves descriptor set
allocation CPU usage by 3x at least (when also considering the free),
because it had to iterate through 300-400 sets on average.
Not a huge improvement as the pre-improvement CPU usage was only
about 2.3% in the busiest thread.
Reviewed-by: Dave Airlie <airlied@redhat.com>
In OpenCL/CUDA kernels, shared memory usage can be defined within the
kernel code. Those usage will only be picked up while parsing the
SPIR-V, during the translation phase of the program.
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
This workaround doesn't fix any of the piglit hangs we've seen
on CNL. But it might be fixing something we haven't tested yet.
V2: Remove the bits enabling Float blend optimization. It is
enabled through CACHE_MODE_SS register.
Update the comment.
Move gen10 if block on top of gen9 if block.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This optimization is enabled for previous generations too.
See Mesa commit c17e214a6b
On CNL this bit has been moved to CACHE_MODE_SS register.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This workaround doesn't fix any of the piglit hangs we've seen
on CNL. But it might be fixing something we haven't tested yet.
V2: Add the check for Post Sync Operation.
Update the workaround comment.
Use braces around if-else.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
There are few other (duplicate) workarounds which have similar recommendations:
WaFlushHangWhenNonPipelineStateAndMarkerStalled
WaCSStallBefore3DSamplePattern
WaPipeControlBefore3DStateSamplePattern
WaPipeControlBefore3DStateSamplePattern has some extra recommendations if
driver is using mid batch context restore. Ignoring it for now because We're
not doing mid-batch context restore in Mesa.
This workaround doesn't fix any of the piglit hangs we've seen
on CNL. But it might be fixing something we haven't tested yet.
V2: Use brw_load_register_imm32() to program CACHE_MODE_0.
Get rid of brw_flush_gpu_caches().
V3: Make the workaround helper functions static.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by :Nanley Chery <nanley.g.chery@intel.com>
Fixes reverted patch f03b7c9 by doing VMID reservation per
process and not per context.
Also updates required amdgpu libdrm version since the change
involved interface updates in amdgpu libdrm.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
The dynamic index of a vector (not array!) is lowered to a sequence of
conditional assignments. However, the interpolate_at_* expressions
require that the interpolant is an l-value of a shader input.
So instead of doing conditional assignments of parts of the shader input
and then interpolating that (which is nonsensical), we interpolate the
entire shader input and then do conditional assignments of the interpolated
result.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The intended rule has been clarified in GLSL 4.60, Section 8.13.2
(Interpolation Functions):
"For all of the interpolation functions, interpolant must be an l-value
from an in declaration; this can include a variable, a block or
structure member, an array element, or some combination of these.
Component selection operators (e.g., .xy) may be used when specifying
interpolant."
For members of interface blocks, var->data.must_be_shader_input must be
determined on-the-fly after lowering interface blocks, since we don't want
to disable varying packing for an entire block just because one input in it
is used in interpolateAt*.
v2: keep setting must_be_shader_input in ast_function (Ian)
v3: follow the relaxed rule of GLSL 4.60
v4: only apply the relaxed rules to desktop GL
(the ES WG decided that the relaxed rules may apply in a future version
but not retroactively; see also
dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_centroid.negative.*)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101378
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
We need to validate some structs exist before we dirty the states, and
avoid the problem in some other places.
Fixes: e027935a7 ("st/mesa: don't update unrelated states in non-draw calls such as Clear")
These are produced by nir_lower_bitmap(), adding the missing derefence
would cause other issues that need to be hacked around such as
skipping sampler lowering and uniform location assignment, so this
change seems the correct way to go.
Fixes 194 piglit crashes on radeonsi using NIR.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This add support for the early depth/stencil property found
on image shaders.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds support for emitting RAT instructions to the assembler.
RAT instructions are used to implement image accessors.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds support to the assembler for the mark bit
on the export word1.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just adds support to the assembler for setting the valid
pixel mode on the CF clause.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
These special ALU sources provide the shader engine,
simd and hw wave ids.
These are required for images support.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This should reduce the time where compute units are idle, mainly
for meta operations because they use a bunch of compute shaders.
This seems to have a really minor positive effect for Talos, at least.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It confuses CTS. This pregenerates the heap info into the
physical device, so we can use it for translating contiguous
indices into our "standard" ones.
This also makes the WSI a bit smarter in case the first preferred
heap does not exist.
Reviewed-by: Dave Airlie <airlied@redhat.com>
CC: <mesa-stable@lists.freedesktop.org>
This is just a bad idea and should be avoided. Instead, make the #include
flat and fix the build systems to pass the proper -I flags
v2: - add an inc_wayland_drm instead passing a path to
include_directories (Emil)
- update commit message (Emil)
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
It's inaccurate. Instead, see the copyright and use "git log" and
"git blame" to know the authorship.
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Thanks to the ralloc invariant of "any pointer returned from ralloc can
be used as a context", calling ralloc_size with a size of zero will
cause it to allocate at least a header. If we don't have any push
constants, then NULL is perfectly acceptable (and even preferred).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This reverts commit d364684711.
The commit that bumped the autotools version was reverted, so lets
revert the meson version to match.
fixes: 1f2640bfa9
"Revert "winsys/amdgpu: Add R600_DEBUG flag to reserve VMID per ctx.""
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Prevents an assertion when using GALLIUM_HUD with ioquake3,
when cso_restore_constant_buffer_slot0 restores an empty
constant buffer in slot 0.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Structure code to only flush when we will potentially call cpu_prep. This
prevents spurious flushes in applications that heavily rely on u_uploader.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
GC3000 resolve-in-place assumes that the TS state is configured.
If it is not, this will result in MMU errors. This is especially
apparent when using glGenMipmaps().
Fixes: 78ade65956 ("etnaviv: Do GC3000 resolve-in-place when possible")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Tested-by: Chris Healy <cphealy@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
This should make sure we don't treat exports buffers as local
bos.
Fixes: a639d40f13 (radv: add support for local bos. (v3))
Tested-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
__asm__ is portable, and allows the svga driver to be compiled with the
c99 standard instead of requiring the gnu99 standard.
I have compile tested this with GCC and Clang on Linux.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
If we have more programs than what we can store,
aubinator_error_decode will assert. Instead let's have a rolling
window of programs.
v2: Fix overflowing issues (Eric Engestrom)
v3: Go through programs starting at idx_program (Scott)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
MSVC treats enums as being signed. The 4-bit target field isn't large
enough to correctly store the value 8 (for PIPE_TEXTURE_CUBE_ARRAY).
The bitfield value 0x8 was being interpreted as -8 so matching the
target with PIPE_TEXTURE_CUBE_ARRAY in switch statements, etc. was
failing.
To keep the structure size the same, we reduce the format field from
16 bits to 15. There don't appear to be any other enum bitfields
which need to be adjusted.
This fixes a number of Piglit cube map array tests.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This makes use of ralloc to simplify the destruction. We can also
store instructions in hash tables.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
We used to print invalid data when the last field was being clamped to
32bits due to Dword Length of the whole instruction. Here is an
example where the decoder read part of the next instruction instead of
stopping at the 32bit limit:
0x000ce0b4: 0x10000002: MI_STORE_DATA_IMM
0x000ce0b4: 0x10000002 : Dword 0
DWord Length: 2
Store Qword: 0
Use Global GTT: false
0x000ce0b8: 0x00045010 : Dword 1
Core Mode Enable: 0
Address: 0x00045010
0x000ce0bc: 0x00000000 : Dword 2
0x000ce0c0: 0x00000000 : Dword 3
Immediate Data: 8791026489807077376
With this change we have the proper value :
0x000ce0b4: 0x10000002: MI_STORE_DATA_IMM (4 Dwords)
0x000ce0b4: 0x10000002 : Dword 0
DWord Length: 2
Store Qword: 0
Use Global GTT: false
0x000ce0b8: 0x00045010 : Dword 1
Core Mode Enable: 0
Address: 0x00045010
0x000ce0bc: 0x00000000 : Dword 2
0x000ce0c0: 0x00000000 : Dword 3
Immediate Data: 0
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
Due to the new way we handle fields, we need *not* to forget the first
field when decoding instructions. The issue was that the advance
function was called first and skipped the first field.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
The xml files don't always have fields in order. This might confuse
our parsing of the commands. Let's have the fields in order. To do
this, the easiest way it to use a linked list. It also helps a bit
with the iterator.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
(Apologies for the double negative.)
For now, the shader cache is disabled by default on i965 to allow us
to verify its stability.
In other words, to enable the shader cache on i965, set
MESA_GLSL_CACHE_DISABLE to false or 0. If the variable is unset, then
the shader cache will be disabled.
We use the build-id of i965_dri.so for the timestamp, and the pci
device id for the device name.
v2:
* Simplify code by forcing link to include build id sha. (Matt)
v3:
* Don't use a for loop with snprintf for bin to hex. (Matt)
* Assume fixed length render and timestamp string to further simplify
code.
Cc: Matt Turner <mattst88@gmail.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This would cause the read of the metadata content to fail, which would
prevent the linking from being skipped.
Seen on Rocket League with i965 shader cache.
Fixes: b86ecea344 "util/disk_cache: write cache item metadata to disk"
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes many GL 4.5 CTS blend tests, such as:
* GL45-CTS.blend_equation_advanced.extension_directive_enable
* GL45-CTS.blend_equation_advanced.extension_directive_warn
* GL45-CTS.blend_equation_advanced.blend_all.GL_MULTIPLY_KHR_all_qualifier
* GL45-CTS.blend_equation_advanced.blend_specific.GL_COLORBURN_KHR
v2:
* Directly save the BlendSupport field to avoid potentially including
a pointer in the future in the structure is updated. (tarceri)
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
If the i965 gen program cannot be loaded from the cache, then we
fallback to using a serialized nir program.
This is based on "i965: add cache fallback support" by Timothy Arceri
<timothy.arceri@collabora.com>. Tim's version was written to fallback
to compiling from source, and therefore had to be much more complex.
After Connor and Jason implemented nir serialization, I was able to
rewrite and greatly simplify this patch.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For now this disables the shader cache when transform feedback is
enabled via the GL API as we don't currently allow for it when
generating the sha for the shader.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will be used to disable the shader cache when xfb is enabled
via the api as we don't currently allow for it when generating the
sha for the shader.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This enables the cache on vertex and fragment shaders only.
v2:
* Use MAYBE_UNUSED. (Matt)
[jordan.l.justen@intel.com: reword subject]
[jordan.l.justen@intel.com: *_cached_program => brw_disk_cache_*_program]
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This uses the Mesa disk_cache support to write out the final linked
binary for vertex and fragment shader programs.
This is based off the initial implementation done by Carl Worth. It
has been significantly reworked, first by Tim Arceri, and then by
Jordan Justen.
v2:
* Squash 'i965: add image param shader cache support'
* Squash 'i965: add shader cache support for pull param pointers'
* Sustantially simplified by a rework on top of Jason's 2975e4c56a.
* Rename load_program_data to read_program_data. (Jason)
v3:
* Simplify and align program read/write. (Jason)
v4:
* Don't save prog_data size since we know it from the stage. (Ken)
* Don't save program size, since prog_data includes the size. (Ken)
* Remove `assert` that potentially could be triggered by disk
corruption of the cache entries. (Ken)
* Fix compute shader scratch allocation. (Ken)
* Remove special case mapping for non-LLC. (Ken)
* Remove SET_UPLOAD_PARAMS macro
[jordan.l.justen@intel.com: *_cached_program => brw_disk_cache_*_program]
[jordan.l.justen@intel.com: brw_shader_cache.c => brw_disk_cache.c]
[jordan.l.justen@intel.com: don't map to write program when LLC is present]
[jordan.l.justen@intel.com: set program_written_to_cache on read from cache]
[jordan.l.justen@intel.com: only try cache when status is linking_skipped]
[jordan.l.justen@intel.com: all v2-v4 changes noted above]
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, thread_count was sent in from the stage after some stage
specific calculations. Those stage specific calculations were moved
into brw_alloc_stage_scratch, which will allow the shader cache to
also use the same calculations.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will be used by the on disk shader cache.
v2:
* Set in brw_compile_* rather than brw_codegen_*. (Jason)
Signed-off-by: Timothy Arceri <timothy.arceri@collabora.com>
[jordan.l.justen@intel.com: Only add to brw_stage_prog_data]
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When a program is restored from the shader cache, prog->nir will be
NULL, but prog->info will be restored.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If the shader cache is enabled, after linking the program, we
serialize the program to nir. This will be saved out by the glsl
shader cache support.
Later, if the same program is found in the cache, we can use the nir
for a fallback in the unlikely case that the gen binary program is not
found in the cache.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
These fields can be used to optionally save off a driver blob with the
program metadata. For example, serialized nir, or tgsi.
v3:
* Rename serialized_nir* to driver_cache_blob*. (Tim)
* Free memory. (Jason)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2 (Jason Ekstrand):
- Various whitespace cleanups
- Add helpers for reading/writing objects
- Rework derefs
- [de]serialize nir_shader::num_*
- Fix uses of blob_reserve_bytes
- Use a bitfield struct for packing tex_instr data
v3:
- Zero nir_variable struct on deserialization. (Jordan)
- Allow nir_serialize.h to be included in C++. (Jordan)
- Handle NULL info.name. (Jason)
- Set info.name to NULL when name is NULL. (Jordan)
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
if the driver sets the cap, then use the value it gives us.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Some hw (evergreen) has a limit on how many combined (images/buffers/mrts)
a fragment shader can access.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
It is possible that the optimizer ends up in an infinite loop in
post_scheduler::schedule_alu(), because post_scheduler::prepare_alu_group()
does not find a proper scheduling. This can be deducted from
pending.count() being larger than zero and not getting smaller.
This patch works around this problem by signalling this failure so that the
optimizers bails out and the un-optimized shader is used.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103142
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The shared si_create_shader_selector() code already offsets the mask.
Fixes the following piglit tests:
arb_cull_distance/clip-cull-3.shader_test
arb_cull_distance/clip-cull-4.shader_test
Fixes: 29d7bdd179 (radeonsi: scan NIR shaders to obtain required info)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Previously the values were calculated by just shifting ~0 by the
invocation ID. This would end up including bits that are higher than
gl_SubGroupSizeARB. The corresponding CTS test effectively requires that
these high bits be zero so it was failing. There is a Piglit test as
well but this appears to checking the wrong values so it passes.
For the two greater-than bitmasks, this patch adds an extra mask with
(~0>>(64-gl_SubGroupSizeARB)) to force these bits to zero.
Fixes: KHR-GL45.shader_ballot_tests.ShaderBallotBitmasks
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102680#c3
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Neil Roberts <nroberts@igalia.com>
Only use CCS_E to render to a texture that is CCS_E-compatible with the
original texture's miptree (linear) format. This prevents render
operations from writing data that can't be decoded with the original
miptree format.
On Gen10, with the new CCS_E-enabled formats handled, this enables the
driver to pass the arb_texture_view-rendering-formats piglit test.
v2. Add a TODO for texturing. (Jason)
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
CannonLake additionally supports R11G11B10_FLOAT and four 10-10-10-2
formats with CCS_E. None of these formats fit within the current
blorp_copy framework so disable them until support is added.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
It's supposed to be linked with pthread-stubs (if the platform needs
pthread-stubs). Pthread stubs support isn't (yet) implemented in the
meson build, so add a TODO.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
This allows a user to not care whether they're setting a tristate or a
boolean option, which is a nice user facing feature, and something I've
personally run into.
Suggested-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
If we don't want to use these deps, there's no good reason to search
for them in the first place. This should shave a bit of time for the
initial build.
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
The u_format_other.c users sqrtf, which on some systems require
a math-library. So let's make sure we link with it.
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes regression in:
dEQP-VK.api.object_management.alloc_callback_fail.graphics_pipeline
Fixes: 1e84e53712 "radv: add cache items to in memory cache when reading from disk"
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This patch modifies the ARB_indirect_parameters logic in
brw_draw_prims, so that our implementation isn't affected if
another application attempts to use predicates. Previously we
were using a predicate with a DELTAS_EQUAL comparison operation
and relying on the MI_PREDICATE_DATA register being 0. Our code
to initialize MI_PREDICATE_DATA to 0 was incorrect, so we were
accidentally using whatever value was written there. Because the
kernel does not initialize the MI_PREDICATE_DATA register on
hardware context creation, we might inherit the value from whatever
context was last running on the GPU (likely another process).
The Haswell command parser also does not currently allow us to write
the MI_PREDICATE_DATA register. Rather than fixing this and requiring
an updated kernel, we switch to a different approach which uses a
SRCS_EQUAL predicate that makes no assumptions about the states of any
of the predicate registers.
Fixes Piglit's spec/arb_indirect_parameters/tf-count-arrays test.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103085
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Due to a gaffe on my part, we were re-emitting all binding table entries
on every single draw call. The push_constant_packets atom listens to
BRW_NEW_DRAW_CALL, but skips emitting 3DSTATE_CONSTANT_XS for each stage
unless stage_state->push_constants_dirty is true. However, it flagged
BRW_NEW_SURFACES unconditionally at the end, by mistake.
Instead, it should only flag it if we actually emit 3DSTATE_CONSTANT_XS
for a stage. We can move it a few lines up, inside the loop - the early
continues will skip over it if push constants aren't dirty for a stage.
With INTEL_NO_HW=1 set, improves performance of GFXBench5 gl_driver_2
on Apollolake at 1280x720 by 1.01122% +/- 0.470723% (n=35).
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Groups containing fields smaller than a DWord were not being decoded
correctly. For example:
<group count="32" start="32" size="4">
<field name="Vertex Element Enables" start="0" end="3" type="uint"/>
</group>
gen_field_iterator_next would properly walk over each element of the
array, incrementing group_iter, and calling iter_group_offset_bits()
to advance to the proper DWord. However, the code to print the actual
values only considered iter->field->start/end, which are 0 and 3 in the
above example. So it would always fetch bits 3:0 of the current DWord
when printing values, instead of advancing to each element of the array,
printing bits 0-3, 4-7, 8-11, and so on.
To fix this, we add new iter->start/end tracking, which properly
advances for each instance of a group's field.
Caught by Matt Turner while working on 3DSTATE_VF_COMPONENT_PACKING,
with a patch to convert it to use an array of bitfields (the example
above).
This also fixes the decoding of 3DSTATE_SBE's "Attribute Active
Component Format" fields.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Fixes fbo-blending-formats on RGB8 and 565. We will still need to demote
blending to shader code in the MRT case to fix it in general, but that can
be added when we start doing 32F blending (which also needs to be done in
the shader).
The previous packing I did got us all the R*16F and R*32F formats, where
the pipe format basically matched the TLB's format, but since the clear
color will just be memcpyed to the TLB, we should be looking at its format
for deciding how to pack.
Fixes RGB565, RGB5_A1 and RGBA10 fbo-clear-formats tests and improves
4444.
The output formats are consistent with their channels appearing from low
to high in their name. Textures are interpreted the same way, but their
names may have the channels swapped around. I'm retaining the texture
names so that we are consistent with the documentation, but I want to
leave a warning for others.
In the case of fneg(0.0), we were getting back 0.0 instead of -0.0. We
were also needing an immediate 0 value for ineg, when there's an opcode to
do the job properly.
Fixes fs-floatBitsToInt-neg.shader_test.
We were storing the resolved pixels in all cases, but nr_samples > 0 means
we should be keeping the per-sample values.
We will probably want to change the job structure at some point, as we'll
want to recognize full-buffer resolves and do the resolved store in the
same job as the original rendering, meaning we'll need to track both the
MSAA and single-sample resources in the job. However, this will be enough
to build the rest of the MSAA support.
The HW has no native sampler support for multisample textures, but since
we only need to support txf_ms and the layout is UIF, we just need to
scale up the texcoords and then add in the sample.
This drops the old TEXTURE_MSAA_ADDR special uniform, since we're treating
MSAA textures as textures, rather than basically texbos like VC4 had to.
We just need to multiply width/height by 2 each, and always set them up as
UIF tiling, since that's how the TLB will store them in raw (per-sample)
mode.
We were handing the intra-byte padding fine, but with a 24-bit address
(bottom 8 bits implied 0) we would end up off by 8 bytes in our shift,
impacting vc5's load/store general packets (all other packets we have had
<8 bits of padding).
We only have 2x16 unpacking in our ALUs. To enable this, we also need
lower_fdiv for its new instructions, which had been handled at a higher
level previously.
I already had the texture's wrapping set up to use different behavior for
nearest or linear, so we just needed to saturate the coordinates in linear
mode to get the "proper" blend between the edge and border values.
1D is the exception to "all V3D textures are tiled", since tiling 1D
textures would just waste memory and cache space. This ended up being a
problem once we started actually marking 1D textures as 1D instead of 2D.
Like VC4, we need to at least have one element set up, but unlike VC4 it
seems we don't need to read it to keep the HW happy. Fixes GPU hangs with
glsl-no-vertex-attribs.shader_test.
>From GLSL 4.5 spec, section "7.1 Built-In Language Variables", page 130 of
the PDF states:
"If multiple shaders using members of a built-in block belonging to
the same interface are linked together in the same program, they must
all redeclare the built-in block in the same way, as described in
section 4.3.9 “Interface Blocks” for interface-block matching, or a
link-time error will result."
Fixes:
* GL45-CTS.CommonBugs.CommonBug_PerVertexValidation
v2 (Neil Roberts):
Explicitly look for gl_PerVertex in the symbol tables instead of
waiting to find a variable in the interface.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102677
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Signed-off-by: Neil Roberts <nroberts@igalia.com>
This effectively factorizes a couple of similar routines.
v2 (Neil Roberts): Non-trivial rebase on master
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Signed-off-by: Neil Roberts <nroberts@igalia.com>
Some symbols gathered in the symbols table during parsing are needed
later for the compile and link stages, so they are moved along the
process. Currently, only functions and non-temporary variables are
copied between symbol tables. However, the built-in gl_PerVertex
interface blocks are also needed during the linking stage (the last
step), to match re-declared blocks of inter-stage shaders.
This patch adds a new utility function that will factorize current code
that copies functions and variables between two symbol tables, and in
addition will copy explicitly declared gl_PerVertex blocks too.
The function will be used in a subsequent patch.
v2 (Neil Roberts):
Allow the src symbol table to be NULL and explicitly copy the
gl_PerVertex symbols in case they are not referenced in the exec_list.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Signed-off-by: Neil Roberts <nroberts@igalia.com>
NIR does not have these instructions. TGSI and Mesa IR both implement
them using < and >=, repsectively. Removing them deletes a bunch of
code and means I don't have to add code to the SPIR-V generator for
them.
v2: Rebase on 2+ years of change... and fix a major bug added in the
rebase.
text data bss dec hex filename
8255291 268856 294072 8818219 868e2b 32-bit i965_dri.so before
8254235 268856 294072 8817163 868a0b 32-bit i965_dri.so after
7815339 345592 420592 8581523 82f193 64-bit i965_dri.so before
7813995 345560 420592 8580147 82ec33 64-bit i965_dri.so after
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Without the lexer changes, tests/glslparsertest/glsl2/tex_rect-02.frag
fails. Before this change, the parser would determine that
sampler2DRect is not a valid type because the call to
state->symbols->get_type() in ast_type_specifier::glsl_type() would
return NULL. Since ast_type_specifier::glsl_type() is now going to
return the glsl_type pointer that it received from the lexer, it doesn't
have an opportunity to generate an error.
text data bss dec hex filename
8255243 268856 294072 8818171 868dfb 32-bit i965_dri.so before
8255291 268856 294072 8818219 868e2b 32-bit i965_dri.so after
7815195 345592 420592 8581379 82f103 64-bit i965_dri.so before
7815339 345592 420592 8581523 82f193 64-bit i965_dri.so after
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This allows us to use a single token for every built-in type except void.
text data bss dec hex filename
8275163 269336 294072 8838571 86ddab 32-bit i965_dri.so before
8255243 268856 294072 8818171 868dfb 32-bit i965_dri.so after
7836963 346552 420592 8604107 8349cb 64-bit i965_dri.so before
7815195 345592 420592 8581379 82f103 64-bit i965_dri.so after
Yes, the 64-bit binary shrinks by 21k.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Passing YYSTYPE into classify_identifier enables a later patch.
text data bss dec hex filename
8310339 269336 294072 8873747 876713 32-bit i965_dri.so before
8275163 269336 294072 8838571 86ddab 32-bit i965_dri.so after
7845579 346552 420592 8612723 836b73 64-bit i965_dri.so before
7836963 346552 420592 8604107 8349cb 64-bit i965_dri.so after
Yes, the 64-bit binary shrinks by 8k.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
There are two callers of the constructor, and they are right next to
each other. Move the "#anon_struct" name handling to the parser so that
the conditional can be removed.
I've also deleted part of the comment (about the memory leak) because I
don't think it's quite accurate or relevant.
text data bss dec hex filename
8310399 269336 294072 8873807 87674f 32-bit i965_dri.so before
8310339 269336 294072 8873747 876713 32-bit i965_dri.so after
7845611 346552 420592 8612755 836b93 64-bit i965_dri.so before
7845579 346552 420592 8612723 836b73 64-bit i965_dri.so after
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Having moved gallium_dri.so library to /vendor/lib/dri
also symlinks need to be coherently created using TARGET_OUT_VENDOR instead of TARGET_OUT
or all non Intel drivers will not be loaded with Android N and earlier,
thus causing SurfaceFlinger SIGABRT
(v2) simplification of post install command
Fixes: c3f75d483c ("Android: move libraries to /vendor")
Cc: 17.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com> (v1)
Reviewed-by: Rob Herring <robh@kernel.org> (v1)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
It seems nobody's using the string hashing function. If you try to
pass it directly to the hashtable creation function, you'll get
compiler warning for non matching prototypes. Let's make them match.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Otherwise we will leak them, load duplicates from disk rather
than memory and never write items loaded from disk to the apps
pipeline cache.
Fixes: fd24be134f 'radv: make use of on-disk cache'
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Patch uses mem_ctx for allocation to ensure param array gets freed
later.
==6164== 48 bytes in 1 blocks are definitely lost in loss record 61 of 193
==6164== at 0x4C2EB6B: malloc (vg_replace_malloc.c:299)
==6164== by 0x12E31C6C: ralloc_size (ralloc.c:121)
==6164== by 0x130189F1: fs_visitor::assign_constant_locations() (brw_fs.cpp:2095)
==6164== by 0x13022D32: fs_visitor::optimize() (brw_fs.cpp:5715)
==6164== by 0x13024D5A: fs_visitor::run_fs(bool, bool) (brw_fs.cpp:6229)
==6164== by 0x1302549A: brw_compile_fs (brw_fs.cpp:6570)
==6164== by 0x130C4B07: blorp_compile_fs (blorp.c:194)
==6164== by 0x130D384B: blorp_params_get_clear_kernel (blorp_clear.c:79)
==6164== by 0x130D3C56: blorp_fast_clear (blorp_clear.c:332)
==6164== by 0x12EFA439: do_single_blorp_clear (brw_blorp.c:1261)
==6164== by 0x12EFC4AF: brw_blorp_clear_color (brw_blorp.c:1326)
==6164== by 0x12EFF72B: brw_clear (brw_clear.c:297)
Fixes: 8d90e28839 ("intel/compiler: Allocate pull_param in assign_constant_locations")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
We were dividing by 4 twice. This also papered over a bug where we
were neglecting to clamp the sampler count to the [0, 16] range.
This should have no functional impact, it only affects prefetching.
v2 [Kenneth Graunke]:
- Clamp sampler_count to [0, 16] to avoid overflowing the valid values
for this field. Write a commit message.
Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This avoids recompiles for shaders that don't use explicit derivatives
when ctx->Hint.FragmentShaderDerivative == GL_NICEST.
For example, GFXBench 5 Aztec Ruins sets the GL_NICEST hint before
compiling any shaders, but none of them use dFdx() or dFdy() - only
implicit derivatives. This doesn't eliminate any recompiles, but
does eliminate one of the reasons for doing so.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
i965 turns fddx/fddy into their coarse/fine variants based on the
ctx->Hint.FragmentShaderDerivative setting. It needs to know whether
this can impact a shader in order to better guess NOS settings.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Also, reorder them to match the structure's field order, to make it
easier to check that they're all present.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
This allows an app to query shader statistics and get a disassembly of
a shader. RenderDoc git has support for it, so this allows you to view
shader disassembly from a capture.
When this extension is enabled on a device (or when tracing), we now
disable pipeline caching, since we don't get the shader debug info when
we retrieve cached shaders.
v2: Improvements to resource usage reporting
v3: Disassembly string must be null terminated (string_buffer's length
does not include the terminator)
v4: Fixed LDS reporting. (Bas)
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Following piglits are passing:
- glean@texture_srgb
- spec@ext_texture_srgb@fbo-srgb
- spec@ext_texture_srgb@tex-srgb
- spec@ext_texture_srgb@texwrap formats
- spec@ext_texture_srgb@texwrap formats-s3tc
Btw. this enables GL 2.1 :-)
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Fixes intermittent GPU hangs on Broxton with an Intel internal
test case.
There are plenty of similar fragment shaders in piglit that do
not use any varyings and any uniforms. According to the
documentation special timing is needed between pipeline stages.
Apparently we just don't hit that with piglit. Even with the
failing test case one doesn't always get the hang.
Moreover, according to the error states the hang happens
significantly later than the execution of the problematic shader.
There are multiple render cycles (primitive submissions) in between.
I've also seen error states where the ACTHD points outside the
batch. Almost as if the hardware writes somewhere that gets used
later on. That would also explain why piglit doesn't suffer from
this - most tests kick off one render cycle and any corruption
is left unseen.
v2 (Ken): Instead of enabling push constants, enable one of the
inputs (PSIZ).
v3 (Ken, Jason): Use LAYER instead making vulkan emit_3dstate_sbe()
happy.
Cc: "17.3 17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Culling tris with zero area seems like a great idea, but apparently with
fill mode line (and point) we're supposed to draw them, at least some tests
for some other state tracker complained otherwise.
Such tris also always seem to be back facing (not sure if this can be
inferred from anything, since in a mathematical sense it cannot really be
determined), so make sure to account for this when filling in the face
information.
(For solid tris, this is of course unnecessary, drivers will throw the tris
away later in any case.)
Reviewed-by: Brian Paul <brianp@vmware.com>
This has been tested with the osdemo from mesa-demos
v2: - Add SELinux dependency
- fix typo GALLIUM_LLVM -> GALLIUM_LLVMPIPE
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
This builds the classic (non-gallium) osmesa with meson. This has been
tested with the osdemo application from mesa-demos.
v2: - Remove unrelated change
- Add SELinux dependency to osmesa
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
This makes things much easier to ensure correctness with meson. Tested
with make dist-check and with meson.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
These are used by non-gallium osmesa, so they need to be defined outside
of the gallium subdirectory.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
According to the ARB_ES3_1_compatibility specification,
glGetFramebufferAttachmentParameteriv is supposed to accept BACK,
and it behaves exactly like BACK_LEFT.
Fixes a GL error in GFXBench 5 Aztec Ruins.
Cc: "17.3 17.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
From the spec:
"IMAGE_FORMAT_COMPATIBILITY_TYPE: The matching criteria use for the
resource when used as an image textures is returned in
<params>. This is equivalent to calling GetTexParameter"
So we would need to return None for any target not supported by
GetTexParameter. By mistake, we were using the target check for
GetTexLevelParameter.
v2: fix typo (GetTextParameter vs GetTexParemeter) on comment (Illia Mirkin)
Reviewed-by: Antia Puentes <apuentes@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Meson's vcs_tag() uses the output of `git describe`, eg.
17.3-branchpoint-5-gfbf29c3cd15ae831e249+
Whereas the other build systems used a script that outputs only the sha1
of the HEAD commit, eg.
fbf29c3cd1
Given that this information is used by printing it next to the version
number, there's some redundancy here, and inconsistency between build
systems.
Bring Meson in line by making it use the same script, with the added
advantage of now supporting the MESA_GIT_SHA1_OVERRIDE env var.
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Serious Sam Fusion 2017 uses a huge number of occlusion queries,
and the allocated query pool buffer is greater than 4096 bytes.
This slightly improves performance (tested in Ultra) from
117.2 FPS to 119.7 FPS (~+2%) on my RX480.
This also improves Talos, from 69 FPS to 72/73 FPS (~+5%).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Mesa's DEBUG and assert's NDEBUG are not tied to each other, so we need
to explicitly compile this code out.
Fixes: 3df7892878 "vc4: Drop reloc_count tracking for debug
asserts on non-debug builds."
Cc: Eric Anholt <eric@anholt.net>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Valgrind shows that leak is caused by gen6_upload_push_constant, add
unref push_const_bo per stage to destructor to fix this (like done for
scratch_bo).
==10952== 144 bytes in 1 blocks are definitely lost in loss record 44 of 66
==10952== at 0x4C30A1E: calloc (vg_replace_malloc.c:711)
==10952== by 0x8C02847: bo_alloc_internal.constprop.10 (brw_bufmgr.c:344)
==10952== by 0x8C425C4: intel_upload_space (intel_upload.c:101)
==10952== by 0x8C22ED0: gen6_upload_push_constants (gen6_constant_state.c:154)
v2: remove if conditions, brw_bo_unreference handles NULL (Ken, Emil)
Fixes: 24891d7c05 ("i965: Store per-stage push constant BO pointers.")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
Asserting slot >= 2 made sense when the URB read offset was always 1
(pair of slots). Commit 566a0c43f0 made
it possible to read from the VUE header in slot 0, by adjusting the
offset to be 0. So, this assert is now bogus. Use the one from GL.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Commit 566a0c43f0 started setting the
3DSTATE_SBE bit to override these values with the one calculated there.
So, they're dead. Stop setting them.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
It appears that flushing the DB metadata is actually not sufficient
since the driver uses the new VS blit shaders. This looks quite
strange though, but it seems like we need to flush DB for fixing
the corruption.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102955
Fixes: 69ccb9dae7 (radeonsi: use new VS blit shaders (VS inputs in SGPRs)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This uses the new kernel interfaces for reduced cs overhead,
We only set the local flag for memory allocations that don't have
a dedicated allocation and ones that aren't imports.
v2: add to all the internal buffer creation paths.
v3: missed some command submission paths, handle 0/empty bo lists.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Not all rendering matches the miptree format. We allow rendering to
texture views so there are cases where it may not match. In those
cases, our current scheme of just passing the value of ctx->sRGBEnabled
isn't viable. Instead, just do what we do for texturing and pass the
view format in directly.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: mesa-stable@lists.freedesktop.org
It's rather surprising that we've never actually hit this before.
Aparently, Ian's SPIR-V generator currently claims the Simple when you
don't do anything complex. We really shouldn't assert-fail on it.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: mesa-stable@lists.freedesktop.org
system/window.h is no longer available by default and is part of
libnativewindow, so add it to the shared libraries. It has to be conditional
because the library is only present in O and later.
Really, we should only be depending on vndk/window.h now, but that's only
in O and changing would be pretty invasive.
Signed-off-by: Rob Herring <robh@kernel.org>
See my LLVM patch which fixes the root cause.
Users have to apply this patch and then they have 2 choices:
- Downgrade to LLVM 5.0
- Update to LLVM git after my LLVM patch is pushed.
It won't be possible to use current and earlier development version
of LLVM 6.0.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: 17.3 <mesa-stable@lists.freedesktop.org>
This should be OUT_RELOC() since the operation isn't writing to the
buffer. Technically it doesn't matter much currently, since we'd
anyways to a gmem2mem later. But that will change.
Signed-off-by: Rob Clark <robdclark@gmail.com>
When binding a new pipeline, we applied all dynamic states
without checking if they really need to be re-emitted. This
doesn't seem to be useful for the meta operations because only
the viewports/scissors are updated.
This should reduce the number of commands added to the IB
when a new graphics pipeline is bound.
Also, rename radv_dynamic_state_copy() to radv_bind_dynamic_state()
and set the dirty flags directly there.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The depth bounds test values are either set at pipeline
creation or dynamically using vkCmdSetDepthBounds().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
From the OpenGL 4.6 spec, section 4.4.1 Input Layout Qualifiers, Page 68,
(Location aliasing):
"Further, when location aliasing, the aliases sharing the location
must have the same underlying numerical type (floating-point or
integer)."
The current implementation is too strict, since it checks that the
the base types are an exact match instead.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
v2:
- we only need to validate inputs to the first stage and outputs
from the last stage, everything else has already been validated
during cross_validate_outputs_to_inputs (Timothy).
- Use MAX_VARYING instead of MAX_VARYINGS_INCL_PATCH (Illia)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
For non-SSO programs, we only need to validate outputs, since
the cross validation of outputs to inputs will ensure that we
produce linker errors for invalid inputs too.
Hoever, for the SSO path there is no output to input validation,
so we need to validate inputs explicitly. Generalize the function
so it can handle this as well.
Also, notice that vertex shader inputs and fragment shader outputs
are already validated in assign_attribute_or_color_locations()
for both SSO and non-SSO paths, so we should not try to validate
that here again (in fact, the function would require explicit
paths to handle these two cases properly).
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Currently, we only validate explicit locations for non-SSO programs.
This creates a helper that we can call from both SSO and non-SSO paths
directly, so we can reuse all the logic behind this.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
From ARB_enhanced_layouts:
"[...]when location aliasing, the aliases sharing the location
must have the same underlying numerical type (floating-point or
integer) and the same auxiliary storage and
interpolation qualification.[...]"
Add code to the linker to validate that aliased locations do
have the same aux storage.
Fixes:
KHR-GL45.enhanced_layouts.varying_location_aliasing_with_mixed_auxiliary_storage
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
From ARB_enhanced_layouts:
"[...]when location aliasing, the aliases sharing the location
must have the same underlying numerical type (floating-point or
integer) and the same auxiliary storage and
interpolation qualification.[...]"
Add code to the linker to validate that aliased locations do
have the same interpolation.
Fixes:
KHR-GL45.enhanced_layouts.varying_location_aliasing_with_mixed_interpolation
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The existing code was checking the whole interface variable rather
than its members, which is not what we want: we want to check
aliasing for each member in the interface variable.
Surprisingly, there are piglit tests that verify this and were
passing due to a bug in the existing code: when we were computing
the last component used by an interface variable we would use
the 'vector' path and multiply by vector_elements, which is 0 for
interface variables. This made the loop that checks for aliasing
be a no-op and not add the interface variable to the list of outputs
so then we would fail to link when we did not see a matching output
for the same input in the next stage. Since the tests expect a
linker error to happen, they would pass, but not for the right
reason.
Unfortunately, the current implementation uses ir_variable instances
to keep track of explicit locations. Since we don't have
ir_variables instances for individual interface members, we need
to have a custom struct with the data we need. This struct has
the ir_variable (which for interface members is the whole
interface variable), plus the data that we need to validate for
each aliased location, for now only the base type, which for
interface members we will take from the appropriate field inside
the interface variable.
Later patches will expand this custom struct so we can also check
other requirements for location aliasing, specifically that
we have matching interpolation and auxiliary storage, that once
again, we will take from the appropriate field members for the
interface variables.
v2:
- Use MAX_VARYING instead of MAX_VARYINGS_INCL_PATCH (Illia)
Fixes:
KHR-GL45.enhanced_layouts.varying_block_automatic_member_locations
Fixes (these were passing before but for incorrect reasons):
tests/spec/arb_enhanced_layouts/linker/block-member-locations/named-block-member-location-overlap.shader_test
tests/spec/arb_enhanced_layouts/linker/block-member-locations/named-block-member-mixed-order-overlap.shader_test
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Move the checks for explicit locations to a separate function. We
will use this in a follow-up patch to validate locations for interface
variables where we need to validate each interface member rather than
the interface variable itself.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
We were assuming that if an input has an invalid explicit location it would
fail to link because it would not find the corresponding output, however,
since we look for the matching output by indexing the explicit_locations
array with the input location, we still need to ensure that we don't index
out of bounds.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
There is no reason to block this here, if a driver enables
it, let it handle it.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This probably needs more work but this just add the initial
code to convert gs/tcs/tes nir based shaders in the state tracker.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This commit fixes two issues: First, we were returning false regardless
of whether or not the function made progress. Second, we were calling
nir_metadata_preserve far more often than needed; we only need to call
it once per impl.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We want this to get called before nir_lower_subgroups which is going in
brw_preprocess_nir. Now that nir_lower_wpos_ytransform can handle
system values, this should be safe to do.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We currently have a bug where nir_lower_system_values gets called before
nir_lower_var_copies so it will miss any system value uses which come
from a copy_var intrinsic. Moving it to after brw_preprocess_nir fixes
this problem.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
The PRM says "The execution size must be 1." In 73137997e2, the
execution size was set to 1 when it should have been BRW_EXECUTE_1
(which maps to 0). Later, in dc2d3a7f5c, JMPI was used for
line AA on gen6 and earlier and we started manually stomping the
exeution size to BRW_EXECUTE_1 in the generator. This commit fixes the
original bug and makes brw_JMPI just do the right thing.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 73137997e2
Returns the brw_type for a given ssa.bit_size, and a reference type.
So if bit_size is 64, and the reference type is BRW_REGISTER_TYPE_F,
it returns BRW_REGISTER_TYPE_DF. The same applies if bit_size is 32
and reference type is BRW_REGISTER_TYPE_HF it returns BRW_REGISTER_TYPE_F
v2 (Jason Ekstrand):
- Use better unreachable() messages
- Add Q types
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
In order to implement the ballot intrinsic, we do a MOV from flag
register to some GRF. If that GRF is used in a SEL, cmod propagation
helpfully changes it into a MOV from the flag register with a cmod.
This is perfectly valid but when lower_simd_width comes along, it simply
splits into two instructions which both have conditional modifiers.
This is a problem since we're reading the flag register. This commit
makes us check whether or not flags_written() overlaps with the flag
values that we are reading via the instruction source and, if we have
any interference, will force us to emit a copy of the source.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
The beginning of the end for the shader keys. Not entirely sure
what I'm going to replace them with for the compiler though, so this
is the first step.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
These assertions were revisited a couple of times in the past, and they
still weren't quite right.
The problem I was seeing (with some other state tracker) was a copy between
two 512x512 s3tc textures, but from mip level 0 to mip level 8. Therefore,
the destination has only size 2x2 (not a full block), so the box width/height
was only 2, causing the assertion to trigger for src alignment.
As far as I can tell, such a copy is completely legal, and because a correct
assertion would get ridiculously complicated just get rid of it for good.
Reviewed-by: Brian Paul <brianp@vmware.com>
This way, we know what we're allowed to use (no nested include lists
for instance) and users get immediate feedback when trying to use
unsupported versions, rather than a cryptic crash or things being
silently not built correctly.
Cc: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Libunwind has some issues on some platforms, so let's allow people
who have issues to opt-out. This is similar to what we do in automake,
and the implementation is modelled after our opt-out for valgrind.
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Following test checking entrypoints passes:
dEQP-EGL.functional.get_proc_address.extension.gl_ext_occlusion_query_boolean
Piglit test 'ext_occlusion_query_boolean-any-samples' passes with these changes.
No changes/regression observed in WebGL occlusion tests or Intel CI.
v2: add es2="2.0" for glapi entrypoints, clean up xml
dispatch_sanity changes (fix 'make check')
Signed-off-by: Harish Krupo <harish.krupo.kps@intel.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Some of the checks are valid for generic ES 3.2 as well.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
It's still printed after linking, but it makes more sense to
have SPIRV->NIR->LLVM IR->ASM.
Fixes: f0a2bbd1a4 (radv: move nir print after linking is done)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This is needed for RADV to support explicit component packing.
This is also required to use the new NIR component splitting /
packing passes.
V2:
- add commponent packing support for interpolate_at* intrinsics
- improve store packing support when not all varyings are scalar
as spotted by Bas the store source was incorrectly offset.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
For certain buffer meta ops we can use the CP or a compute shader,
we should use a define to rather than hardcoding 4096, allows
for easier testing and more consistency.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Drivers have supported KHR_no_error for a while. We'd been leaving it
marked as "in progress" because there's a zillion places that could get
slightly more optimized. But, Timothy and Samuel have already done
piles of work, and I think we have a solid implementation at this point.
Let's check it off the list.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This properly sets stage_state->push_constant_dirty = true, so that we
emit 3DSTATE_CONSTANT_XS to disable the constant buffer for the shader
stage. It also sets stage_state->push_const_size = 0.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
We have a gl_program and we want a gl_program. There's no point in
converting to brw_program and back again. This probably made more
sense in the old days before Tim dropped a layer of subclassing.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Compute shaders don't have access to the framebuffer, so there's no
point in worrying whether a texture is bound as a render target.
This saves a bunch of resolves in GFXBench4 Manhattan 3.1, but doesn't
seem to impact performance at all, at least on Apollolake.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
gcc is throwing this warning in my meson build:
../src/intel/compiler/brw_eu_validate.c:50:11: warning
argument 1 null where non-null expected [-Wnonnull]
return memmem(haystack.str, haystack.len,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
needle.str, needle.len) != NULL;
~~~~~~~~~~~~~~~~~~~~~~~
The first check for CONTAINS has a NULL error_msg.str and 0 len. The
glibc implementation will exit without looking at any haystack bytes if
haystack.len < needle.len, so this was safe, but silence the warning
anyway by guarding against implementation variablility.
Fixes: 122ef3799d ("i965: Only insert error message if not already present")
Reviewed-by: Matt Turner <mattst88@gmail.com>
To enable per-context priorities, we need to have per-context pipe's.
Unfortunately we still need to keep the global screen pipe, mostly just
for screen->get_timestamp().
Signed-off-by: Rob Clark <robdclark@gmail.com>
To add context priority support we need to have an fd_pipe per context,
rather than per-screen. Which conflicts with existing ctx->pipe (which
is actually a visibility stream pipe (hw resource). So just rename it.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Instead of plain snprintf(). To fix the MSVC build.
snprintf() is used in various places in Mesa/gallium, but apparently,
not in code built with MSVC.
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
I'm working on radeonsi support in the Chrome OS Android container
(ARC++). Mesa in ARC++ uses autotools instead of Android.mk, but all
the necessary EGL bits are there, so the existing check is too strict.
Signed-off-by: Benjamin Gordon <bmgordon@chromium.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
This restores performance for the drirc workaround, i.e.
KILL_IF does:
visible = src0 >= 0;
kill_flag &= visible; // accumulate kills
amdgcn_kill(wqm_vote(visible)); // kill fully dead quads only
And all helper pixels are killed at the end of the shader:
amdgcn_kill(kill_flag);
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
We now have linking optimisations so we want to delay dumping the
nir until after these are complete.
Fixes: 06f05040eb (radv: Link shaders)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This fixes a regression I introduced refactoring this code,
I managed to invert range twice, I moved the inversion into
the common code, but forgot to stop doing it in the callee.
Fixes: GL45-CTS.multi_bind.dispatch_bind_buffers_base
Fixes: 35ac13ed3 (mesa/bufferobj: consolidate some codepaths between ubo/ssbo/atomics.)
Reported-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The IR is reused in different pipeline combinations so we need
to clone it to avoid link time optimistaions messing up the
original copy.
Fixes: 06f05040eb (radv: Link shaders)
Reviewed-by: Dave Airlie <airlied@redhat.com>
Compiling with MSVC options /we4995 /we4996 (a subset of /sdl) generates
a warning that the gethostbyname() function is deprecated in favor of
getaddrinfo() or GetAddrInfoW(). Replace the call with getaddrinfo().
Untested. There are no callers to u_socket_connect() in Gallium.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This was the actual cause of GPU hangs fixed by 0fdd531457 ("radv:
Fix pipeline cache locking issues"), since multiple threads would end
up trying to create the variants for a single entry.
Now that we're locking around the whole of this function, this isn't
really necessary (we either create all or none of the variants), but
fix this anyway in case things change later.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
CC: 17.3 <mesa-stable@lists.freedesktop.org>
The kernel doesn't initialize the value of the INSTPM or CS_DEBUG_MODE2
registers at context initialization time. Instead, they're inherited
from whatever happened to be running on the GPU prior to first run of a
new context. So, when we started setting these, other contexts in the
system started inheriting our values. Since this controls whether
3DSTATE_CONSTANT_* takes a pointer or an offset, getting the wrong
setting is fatal for almost any process which isn't expecting this.
Unfortunately, VA-API and Beignet don't initialize this (nor does older
Mesa), so they will die horribly if we start doing this. UXA and SNA
don't use any push constants, so they are unaffected.
Until we have some kind of solution to this problem, I'm going to revert
this patch and abandon using the feature for now. It will lead to fewer
pushed UBO ranges on Broadwell+, which may lead to lower performance,
though I don't have any data on the impact.
Cc: "17.3 17.2" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102774
Meson 0.43 added the ability to pass nested lists to
include_directories, so the code that we have works for 0.43, but not
for 0.42. This patch changes the include_directories list to be flat so
it works with 0.42
fixes: 108d257a16 ("meson: build libEGL")
Tested-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
There are two issues with the current implementation. First, it relies
on the layout(local_size_*) happening in the same shader as the main
function, and secondly it doesn't work for variable group sizes.
In both cases, the simplest fix is to move the setup of these derived
values to a later time, similar to how the gl_VertexID workarounds are
done. There already exist system values defined for both of the derived
values, so we use them unconditionally, and lower them after linking is
performed.
While we're at it, we move to using gl_LocalGroupSizeARB instead of
gl_WorkGroupSize for variable group sizes.
Also the dead code elimination avoidance can be removed, since there
can be situations where gl_LocalGroupSizeARB is needed but has not been
inserted for the shader with main function. As a result, the lowering
code has to insert its own copies of the system values if needed.
Reported-by: Stephane Chevigny <stephane.chevigny@polymtl.ca>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103393
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103388">Bug 103388</a> - Linking libcltgsi.la (llvm/codegen/libclllvm_la-common.lo) fails with "error: no match for 'operator-'" with GCC-7, Mesa from Git and current LLVM revisions</li>
</ul>
<h2>Changes</h2>
<p>Andres Gomez (8):</p>
<ul>
<li>cherry-ignore: configure.ac: rework llvm detection and handling</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=97532">Bug 97532</a> - Regression: GLB 2.7 & Glmark-2 GLES versions segfault due to linker precision error (259fc505) on dead variable</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103787">Bug 103787</a> - [BDW,BSW] gpu hang on spec.arb_pipeline_statistics_query.arb_pipeline_statistics_query-comp</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=94739">Bug 94739</a> - Mesa 11.1.2 implementation error: bad format MESA_FORMAT_Z_FLOAT32 in _mesa_unpack_uint_24_8_depth_stencil_row</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101378">Bug 101378</a> - interpolateAtSample check for input parameter is too strict</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102435">Bug 102435</a> - [skl,kbl] [drm] GPU HANG: ecode 9:0:0x86df7cf9, in csgo_linux64 [4947], reason: Hang on rcs, action: reset</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102552">Bug 102552</a> - Null dereference due to not checking return value of util_format_description</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103412">Bug 103412</a> - gallium/wgl: Another fix to context creation without prior SetPixelFormat()</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103616">Bug 103616</a> - Increased difference from reference image in shaders</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103966">Bug 103966</a> - Mesa 17.2.5 implementation error: bad format MESA_FORMAT_Z_FLOAT32 in _mesa_unpack_uint_24_8_depth_stencil_row</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104119">Bug 104119</a> - radv: OpBitFieldInsert produces 0 with a loop counter for Insert</li>
@@ -58,14 +59,187 @@ Note: some of the new features are only available with certain drivers.
<h2>Bug fixes</h2>
<ul>
TBD
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=97532">Bug 97532</a> - Regression: GLB 2.7 & Glmark-2 GLES versions segfault due to linker precision error (259fc505) on dead variable</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101766">Bug 101766</a> - Assertion `!"invalid type"' failed when constant expression involves literal of different type</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101832">Bug 101832</a> - [PATCH][regression][bisect] Xorg fails to start after f50aa21456d82c8cb6fbaa565835f1acc1720a5d</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101851">Bug 101851</a> - [regression] libEGL_common.a undefined reference to '__gxx_personality_v0'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101867">Bug 101867</a> - Launch options window renders black in Feral Games in current Mesa trunk</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101876">Bug 101876</a> - SIGSEGV when launching Steam</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102014">Bug 102014</a> - Mesa git build broken by commit bc7f41e11d325280db12e7b9444501357bc13922</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102015">Bug 102015</a> - [Regression,bisected]: Segfaults with various programs</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102024">Bug 102024</a> - FORMAT_FEATURE_SAMPLED_IMAGE_BIT not supported for D16_UNORM and D32_SFLOAT</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102038">Bug 102038</a> - assertion failure in update_framebuffer_size</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102429">Bug 102429</a> - [regression, SI] Performance decrease in Unigine Valley & Heaven</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102435">Bug 102435</a> - [skl,kbl] [drm] GPU HANG: ecode 9:0:0x86df7cf9, in csgo_linux64 [4947], reason: Hang on rcs, action: reset</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102454">Bug 102454</a> - glibc 2.26 doesn't provide anymore xlocale.h</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102496">Bug 102496</a> - Frontbuffer rendering corruption on mesa master</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102502">Bug 102502</a> - [bisected] Kodi crashes since commit 707d2e8b - gallium: fold u_trim_pipe_prim call from st/mesa to drivers</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102530">Bug 102530</a> - [bisected] Kodi crashes when launching a stream - commit bd2662bf</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102552">Bug 102552</a> - Null dereference due to not checking return value of util_format_description</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102565">Bug 102565</a> - u_debug_stack.c:114: undefined reference to `_Ux86_64_getcontext'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102573">Bug 102573</a> - fails to build on armel</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102665">Bug 102665</a> - test_glsl_to_tgsi_lifetime.cpp:53:67: error: ‘>>’ should be ‘>>’ within a nested template argument list</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102809">Bug 102809</a> - Rust shadows(?) flash random colours</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102844">Bug 102844</a> - memory leak with glDeleteProgram for shader program type GL_COMPUTE_SHADER</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102847">Bug 102847</a> - swr fail to build with llvm-5.0.0</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102852">Bug 102852</a> - Scons: Support the new Scons 3.0.0</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102904">Bug 102904</a> - piglit and gl45 cts linker tests regressed</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102924">Bug 102924</a> - mesa (git version) images too dark</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102940">Bug 102940</a> - Regression: Vulkan KMS rendering crashes since 17.2</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102955">Bug 102955</a> - HyperZ related rendering issue in ARK: Survival Evolved</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103002">Bug 103002</a> - string_buffer_test.cpp:43: error: ISO C++ forbids initialization of member ‘str1’</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103323">Bug 103323</a> - Possible unintended error message in file pixel.c line 286</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103388">Bug 103388</a> - Linking libcltgsi.la (llvm/codegen/libclllvm_la-common.lo) fails with "error: no match for 'operator-'" with GCC-7, Mesa from Git and current LLVM revisions</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103412">Bug 103412</a> - gallium/wgl: Another fix to context creation without prior SetPixelFormat()</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103519">Bug 103519</a> - wayland egl apps crash on start with mesa 17.2</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103529">Bug 103529</a> - [GM45] GPU hang with mpv fullscreen (bisected)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103537">Bug 103537</a> - i965: Shadow of Mordor broken since commit 379b24a40d3d34ffdaaeb1b328f50e28ecb01468 on Haswell</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103544">Bug 103544</a> - Graphical glitches r600 in game this war of mine linux native</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103616">Bug 103616</a> - Increased difference from reference image in shaders</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103787">Bug 103787</a> - [BDW,BSW] gpu hang on spec.arb_pipeline_statistics_query.arb_pipeline_statistics_query-comp</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=94739">Bug 94739</a> - Mesa 11.1.2 implementation error: bad format MESA_FORMAT_Z_FLOAT32 in _mesa_unpack_uint_24_8_depth_stencil_row</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102710">Bug 102710</a> - vkCmdBlitImage with arrayLayers > 1 fails</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103579">Bug 103579</a> - Vertex shader causes compiler to crash in SPIRV-to-NIR</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103966">Bug 103966</a> - Mesa 17.2.5 implementation error: bad format MESA_FORMAT_Z_FLOAT32 in _mesa_unpack_uint_24_8_depth_stencil_row</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104119">Bug 104119</a> - radv: OpBitFieldInsert produces 0 with a loop counter for Insert</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104214">Bug 104214</a> - Dota crashes when switching from game to desktop</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104492">Bug 104492</a> - Compute Shader: Wrong alignment when assigning struct value to structured SSBO</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104551">Bug 104551</a> - Check if Mako templates for Python are installed</li>
</ul>
<h2>Changes</h2>
<p>Alex Smith (3):</p>
<ul>
<li>anv: Add missing unlock in anv_scratch_pool_alloc</li>
<li>anv: Take write mask into account in has_color_buffer_write_enabled</li>
<li>anv: Make sure state on primary is correct after CmdExecuteCommands</li>
</ul>
<p>Andres Gomez (1):</p>
<ul>
<li>anv: Import mako templates only during execution of anv_extensions</li>
</ul>
<p>Bas Nieuwenhuizen (11):</p>
<ul>
<li>radv: Invert condition for all samples identical during resolve.</li>
<li>radv: Flush caches before subpass resolve.</li>
<li>radv: Fix fragment resolve destination offset.</li>
<li>radv: Use correct framebuffer size for partial FS resolves.</li>
<li>radv: Always use fragment resolve if dest uses DCC.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90311">Bug 90311</a> - Fail to build libglx with clang at linking stage</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101442">Bug 101442</a> - Piglit shaders@ssa@fs-if-def-else-break fails with sb but passes with R600_DEBUG=nosb</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102435">Bug 102435</a> - [skl,kbl] [drm] GPU HANG: ecode 9:0:0x86df7cf9, in csgo_linux64 [4947], reason: Hang on rcs, action: reset</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104163">Bug 104163</a> - [GEN9+] 2-3% perf drop in GfxBench Manhattan 3.1 from "i965: Disable regular fast-clears (CCS_D) on gen9+"</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104383">Bug 104383</a> - [KBL] Intel GPU hang with firefox</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104546">Bug 104546</a> - Crash happens when running compute pipeline after calling glxMakeCurrent two times</li>
</ul>
<h2>Changes</h2>
<p>Emil Velikov (2):</p>
<ul>
<li>docs: add sha256 checksums for 17.3.5</li>
<li>Update version to 17.3.6</li>
</ul>
<p>Jason Ekstrand (4):</p>
<ul>
<li>i965/draw: Do resolves properly for textures used by TXF</li>
<li>i965: Replace draw_aux_buffer_disabled with draw_aux_usage</li>
<li>i965/draw: Set NEW_AUX_STATE when draw aux changes</li>
<li>i965: Stop disabling aux during texture preparation</li>
</ul>
<p>Kenneth Graunke (1):</p>
<ul>
<li>i965: Don't disable CCS for RT dependencies when dispatching compute.</li>
</ul>
<p>Topi Pohjolainen (1):</p>
<ul>
<li>i965: Don't try to disable render aux buffers for compute</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105013">Bug 105013</a> - [regression] GLX+VA-API+clutter-gst video playback is corrupt with Mesa 17.3 (but is fine with 17.2)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105029">Bug 105029</a> - simdlib_512_avx512.inl:371:57: error: could not convert ‘_mm512_mask_blend_epi32((__mmask16)(ImmT), a, b)’ from ‘__m512i’ {aka ‘__vector(8) long long int’} to ‘SIMDImpl::SIMD512Impl::Float’</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105098">Bug 105098</a> - [RADV] GPU freeze with simple Vulkan App</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105103">Bug 105103</a> - Wayland master causes Mesa to fail to compile</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104636">Bug 104636</a> - [BSW/HD400] Aztec Ruins GL version GPU hangs</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105290">Bug 105290</a> - [BSW/HD400] SynMark OglCSDof GPU hangs when shaders come from cache</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105464">Bug 105464</a> - Reading per-patch outputs in Tessellation Control Shader returns undefined values</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105670">Bug 105670</a> - [regression][hang] Trine1EE hangs GPU after loading screen on Mesa3D-17.3 and later</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105717">Bug 105717</a> - [bisected] Mesa build tests fails: BIGENDIAN_CPU or LITTLEENDIAN_CPU must be defined</li>
</ul>
<h2>Changes</h2>
<p>Axel Davy (3):</p>
<ul>
<li>st/nine: Fix bad tracking of vs textures for NINESBT_ALL</li>
<li>st/nine: Fixes warning about implicit conversion</li>
<li>st/nine: Fix non inversible matrix check</li>
</ul>
<p>Caio Marcelo de Oliveira Filho (1):</p>
<ul>
<li>anv/pipeline: fail if TCS/TES compile fail</li>
</ul>
<p>Dave Airlie (1):</p>
<ul>
<li>radv: get correct offset into LDS for indexed vars.</li>
</ul>
<p>Derek Foreman (1):</p>
<ul>
<li>egl/wayland: Make swrast display_sync the correct queue</li>
</ul>
<p>Eric Engestrom (1):</p>
<ul>
<li>meson/configure: detect endian.h instead of trying to guess when it's available</li>
</ul>
<p>Ian Romanick (2):</p>
<ul>
<li>mesa: Don't write to user buffer in glGetTexParameterIuiv on error</li>
<li>i965/vec4: Fix null destination register in 3-source instructions</li>
</ul>
<p>Jason Ekstrand (1):</p>
<ul>
<li>i965: Emit texture cache invalidates around blorp_copy</li>
</ul>
<p>Jordan Justen (2):</p>
<ul>
<li>i965: Calculate thread_count in brw_alloc_stage_scratch</li>
<li>i965: Hard code CS scratch_ids_per_subslice for Cherryview</li>
</ul>
<p>Juan A. Suarez Romero (6):</p>
<ul>
<li>docs: add sha256 checksums for 17.3.7</li>
<li>cherry-ignore: ac/nir: pass the nir variable through tcs loading.</li>
<li>cherry-ignore: radv: handle exporting view index to fragment shader. (v1.1)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105317">Bug 105317</a> - The GPU Vega 56 was hang while try to pass #GraphicsFuzz shader15 test</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105440">Bug 105440</a> - GEN7: rendering issue on citra</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105442">Bug 105442</a> - Hang when running nine ff lighting shader with radeonsi</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105994">Bug 105994</a> - surface state leak when creating and destroying image views with aspectMask depth and stencil</li>
</ul>
<h2>Changes</h2>
<p>Andres Gomez (2):</p>
<ul>
<li>dri_util: when overriding, always reset the core version</li>
<li>mesa: adds some comments regarding MESA_GLES_VERSION_OVERRIDE usage</li>
</ul>
<p>Axel Davy (2):</p>
<ul>
<li>st/nine: Declare lighting consts for ff shaders</li>
<li>st/nine: Do not use scratch for face register</li>
</ul>
<p>Bas Nieuwenhuizen (1):</p>
<ul>
<li>ac/nir: Add workaround for GFX9 buffer views.</li>
</ul>
<p>Daniel Stone (1):</p>
<ul>
<li>st/dri: Initialise modifier to INVALID for DRI2</li>
</ul>
<p>Emil Velikov (1):</p>
<ul>
<li>glsl: remove unreachable assert()</li>
</ul>
<p>Eric Engestrom (1):</p>
<ul>
<li>gbm: remove never-implemented function</li>
</ul>
<p>Henri Verbeet (1):</p>
<ul>
<li>mesa: Inherit texture view multi-sample information from the original texture images.</li>
</ul>
<p>Iago Toral Quiroga (1):</p>
<ul>
<li>compiler/spirv: set is_shadow for depth comparitor sampling opcodes</li>
</ul>
<p>Jason Ekstrand (4):</p>
<ul>
<li>nir/vars_to_ssa: Remove copies from the correct set</li>
<li>nir/lower_indirect_derefs: Support interp_var_at intrinsics</li>
<li>intel/vec4: Set channel_sizes for MOV_INDIRECT sources</li>
<li>nir/lower_vec_to_movs: Only coalesce if the vec had a SSA destination</li>
</ul>
<p>Juan A. Suarez Romero (3):</p>
<ul>
<li>docs: add sha256 checksums for 17.3.8</li>
<li>cherry-ignore: Explicit 18.0 only nominations</li>
<li>Update version to 17.3.9</li>
</ul>
<p>Lionel Landwerlin (1):</p>
<ul>
<li>anv: fix number of planes for depth & stencil</li>
</ul>
<p>Marek Olšák (1):</p>
<ul>
<li>mesa: simplify MESA_GL_VERSION_OVERRIDE behavior of API override</li>
</ul>
<p>Samuel Pitoiset (1):</p>
<ul>
<li>radv: fix picking the method for resolve subpass</li>
</ul>
<p>Sergii Romantsov (1):</p>
<ul>
<li>i965: Extend the negative 32-bit deltas to 64-bits</li>
Note: some of the new features are only available with certain drivers.
</p>
<ul>
<li>Disk shader cache support for i965 when MESA_GLSL_CACHE_DISABLE environment variable is set to "0" or "false"</li>
<li>GL_ARB_shader_atomic_counters and GL_ARB_shader_atomic_counter_ops on r600/evergreen+</li>
<li>GL_ARB_shader_image_load_store and GL_ARB_shader_image_size on r600/evergreen+</li>
<li>GL_ARB_shader_storage_buffer_object on r600/evergreen+<li>
<li>GL_ARB_compute_shader on r600/evergreen+<li>
<li>GL_ARB_cull_distance on r600/evergreen+</li>
<li>GL_ARB_enhanced_layouts on r600/evergreen+</li>
<li>GL_ARB_bindless_texture on nvc0/kepler</li>
<li>OpenGL 4.3 on r600/evergreen with hw fp64 support</li>
<li>Support 1 binary format for GL_ARB_get_program_binary on i965.
(For the 18.0 release, 0 formats continue to be supported in
compatibility profiles.)</li>
<li>Cannonlake support on i965 and anv</li>
</ul>
<h2>Bug fixes</h2>
<ul>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=85564">Bug 85564</a> - Dead Island rendering issues</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90311">Bug 90311</a> - Fail to build libglx with clang at linking stage</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92363">Bug 92363</a> - [BSW/BDW] ogles1conform Gets test fails</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=94739">Bug 94739</a> - Mesa 11.1.2 implementation error: bad format MESA_FORMAT_Z_FLOAT32 in _mesa_unpack_uint_24_8_depth_stencil_row</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=97532">Bug 97532</a> - Regression: GLB 2.7 & Glmark-2 GLES versions segfault due to linker precision error (259fc505) on dead variable</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101378">Bug 101378</a> - interpolateAtSample check for input parameter is too strict</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101442">Bug 101442</a> - Piglit shaders@ssa@fs-if-def-else-break fails with sb but passes with R600_DEBUG=nosb</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101560">Bug 101560</a> - SPIR-V OpSwitch with int64 not supported even though shaderInt64 is true</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=101691">Bug 101691</a> - gfx corruption on windowed 3d-apps running on dGPU</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102354">Bug 102354</a> - Mesa 17.2 no longer can give SRGB-capable framebuffer on i965, even though Mesa 17.1.x does.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102358">Bug 102358</a> - WarThunder freezes at start, with activated vsync (vblank_mode=2)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102435">Bug 102435</a> - [skl,kbl] [drm] GPU hang in Valve games based on Source 1</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102503">Bug 102503</a> - Report SRGB framebuffer to SuperTuxKart to workaround SuperTuxKart crash</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102665">Bug 102665</a> - test_glsl_to_tgsi_lifetime.cpp:53:67: error: ‘>>’ should be ‘>>’ within a nested template argument list</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103283">Bug 103283</a> - drm_get_device_name_for_fd is broken on FreeBSD</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103388">Bug 103388</a> - Linking libcltgsi.la (llvm/codegen/libclllvm_la-common.lo) fails with "error: no match for 'operator-'" with GCC-7, Mesa from Git and current LLVM revisions</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103412">Bug 103412</a> - gallium/wgl: Another fix to context creation without prior SetPixelFormat()</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103496">Bug 103496</a> - svga_screen.c:26:46: error: git_sha1.h: No such file or directory</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103513">Bug 103513</a> - [build failure] radv_shader.c:683:2: error: format not a string literal and no format arguments [-Werror=format-security]</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103519">Bug 103519</a> - wayland egl apps crash on start with mesa 17.2</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103529">Bug 103529</a> - [GM45] GPU hang with mpv fullscreen (bisected)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103537">Bug 103537</a> - i965: Shadow of Mordor broken since commit 379b24a40d3d34ffdaaeb1b328f50e28ecb01468 on Haswell</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103544">Bug 103544</a> - Graphical glitches r600 in game this war of mine linux native</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103579">Bug 103579</a> - Vertex shader causes compiler to crash in SPIRV-to-NIR</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103616">Bug 103616</a> - Increased difference from reference image in shaders</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103674">Bug 103674</a> - u_queue.c:173:7: error: implicit declaration of function 'timespec_get' is invalid in C99</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103784">Bug 103784</a> - [bisected] Egl changes breaks all of EGL</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103787">Bug 103787</a> - [BDW,BSW] gpu hang on spec.arb_pipeline_statistics_query.arb_pipeline_statistics_query-comp</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103808">Bug 103808</a> - [radeonsi, bisected] World of Warcraft scribbling all over screen</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103902">Bug 103902</a> - Portal 2 game hangs at startup with latest mesa dev</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103904">Bug 103904</a> - Source engine-based games won't hang at start without R600_DEBUG=vs</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103955">Bug 103955</a> - Using array in structure results in wrong GLSL compilation output</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103966">Bug 103966</a> - Mesa 17.2.5 implementation error: bad format MESA_FORMAT_Z_FLOAT32 in _mesa_unpack_uint_24_8_depth_stencil_row</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=103988">Bug 103988</a> - Intermittent piglit failures with shader cache enabled</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104005">Bug 104005</a> - [sklgt4e] GPU hangs in Car_Chase</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104119">Bug 104119</a> - radv: OpBitFieldInsert produces 0 with a loop counter for Insert</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104141">Bug 104141</a> - include/c11/threads_posix.h:96: undefined reference to `pthread_once'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104163">Bug 104163</a> - [GEN9+] 2-3% perf drop in GfxBench Manhattan 3.1 from "i965: Disable regular fast-clears (CCS_D) on gen9+"</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104199">Bug 104199</a> - [i965 bisected] BIO and EM Vision in >Observer_ is broken since commit af2c320190f3c73180f1610c8df955a7fa2a4d09</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104213">Bug 104213</a> - NULL pointer access crashes on compiling Vulkan compute shaders after "anv: Add support for the variablePointers feature"</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104214">Bug 104214</a> - Dota crashes when switching from game to desktop</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104302">Bug 104302</a> - Wolfenstein 2 (2017) under wine graphical artifacting on RADV</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104331">Bug 104331</a> - [r600g] Ogre demo "TutorialUAV01" crash at r600_decompress_color_images</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104338">Bug 104338</a> - NULL pointer access crash on Sacha Willems' Vulkan raytracing demo after "spirv: Add basic type validation for OpLoad, OpStore, and OpCopyMemory"</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104359">Bug 104359</a> - Mesa freezes in "vtn_cfg_walk_blocks" with Sacha Willems' hdr, parallaxmapping and specializationconstants Vulkan demos</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104381">Bug 104381</a> - swr fails to build since llvm-svn r321257</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104383">Bug 104383</a> - [KBL] Intel GPU hang with firefox</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104492">Bug 104492</a> - Compute Shader: Wrong alignment when assigning struct value to structured SSBO</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104546">Bug 104546</a> - Crash happens when running compute pipeline after calling glxMakeCurrent two times</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104551">Bug 104551</a> - Check if Mako templates for Python are installed</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104625">Bug 104625</a> - semicolon after if</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104636">Bug 104636</a> - [BSW/HD400] Aztec Ruins GL version GPU hangs</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104749">Bug 104749</a> - rasterizer/jitter/JitManager.cpp:252:91: error: no matching function for call to ‘llvm::DIBuilder::createBasicType(const char [8], int, llvm::dwarf::TypeKind)’</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104762">Bug 104762</a> - Various segfaults/problems in qt/plasma</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104777">Bug 104777</a> - Attaching multiple shader objects for the same stage to a GLSL program triggers a linker error</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104884">Bug 104884</a> - memory leak with intel i965 mesa when running android container in Ubuntu</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104905">Bug 104905</a> - SpvOpFOrdEqual doesn't return correct results for NaNs</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=104915">Bug 104915</a> - Indexed SHADING_LANGUAGE_VERSION query not supported</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105013">Bug 105013</a> - [regression] GLX+VA-API+clutter-gst video playback is corrupt with Mesa 17.3 (but is fine with 17.2)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105029">Bug 105029</a> - simdlib_512_avx512.inl:371:57: error: could not convert ‘_mm512_mask_blend_epi32((__mmask16)(ImmT), a, b)’ from ‘__m512i’ {aka ‘__vector(8) long long int’} to ‘SIMDImpl::SIMD512Impl::Float’</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105065">Bug 105065</a> - Qt Programs occasionally fail to render with new Mesa (glGetProgramBinary)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105098">Bug 105098</a> - [RADV] GPU freeze with simple Vulkan App</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105103">Bug 105103</a> - Wayland master causes Mesa to fail to compile</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105290">Bug 105290</a> - [BSW/HD400] SynMark OglCSDof GPU hangs when shaders come from cache</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105292">Bug 105292</a> - vkGetQueryPoolResults returns incorrect query status for large query buffers (bisected)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105436">Bug 105436</a> - Blinking textures in UT2004 [bisected]</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105464">Bug 105464</a> - Reading per-patch outputs in Tessellation Control Shader returns undefined values</li>
</ul>
<h2>Changes</h2>
<ul>
<li>Remove incomplete GLX_MESA_set_3dfx_mode from the Xlib libGL</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=102542">Bug 102542</a> - mesa-17.2.0/src/gallium/state_trackers/nine/nine_ff.c:1938: bad assignment ?</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105317">Bug 105317</a> - The GPU Vega 56 was hang while try to pass #GraphicsFuzz shader15 test</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105440">Bug 105440</a> - GEN7: rendering issue on citra</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105442">Bug 105442</a> - Hang when running nine ff lighting shader with radeonsi</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105567">Bug 105567</a> - meson/ninja: 1. mesa/vdpau incorrect symlinks in DESTDIR and 2. Ddri-drivers-path Dvdpau-libs-path overrides DESTDIR</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105670">Bug 105670</a> - [regression][hang] Trine1EE hangs GPU after loading screen on Mesa3D-17.3 and later</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105717">Bug 105717</a> - [bisected] Mesa build tests fails: BIGENDIAN_CPU or LITTLEENDIAN_CPU must be defined</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105942">Bug 105942</a> - Graphical artefacts after update to mesa 18.0.0-2</li>
</ul>
<h2>Changes</h2>
<p>Andres Gomez (2):</p>
<ul>
<li>dri_util: when overriding, always reset the core version</li>
<li>mesa: adds some comments regarding MESA_GLES_VERSION_OVERRIDE usage</li>
</ul>
<p>Axel Davy (5):</p>
<ul>
<li>st/nine: Fix bad tracking of vs textures for NINESBT_ALL</li>
<li>st/nine: Fixes warning about implicit conversion</li>
<li>st/nine: Fix non inversible matrix check</li>
<li>st/nine: Declare lighting consts for ff shaders</li>
<li>st/nine: Do not use scratch for face register</li>
</ul>
<p>Bas Nieuwenhuizen (3):</p>
<ul>
<li>ac/nir: Add workaround for GFX9 buffer views.</li>
<li>radv: Don't set instance count using predication.</li>
<li>radv: Always reset draw user SGPRs after secondary command buffer.</li>
</ul>
<p>Caio Marcelo de Oliveira Filho (1):</p>
<ul>
<li>anv/pipeline: fail if TCS/TES compile fail</li>
</ul>
<p>Daniel Stone (1):</p>
<ul>
<li>st/dri: Initialise modifier to INVALID for DRI2</li>
</ul>
<p>Derek Foreman (1):</p>
<ul>
<li>egl/wayland: Make swrast display_sync the correct queue</li>
</ul>
<p>Dylan Baker (4):</p>
<ul>
<li>meson: don't use compiler.has_header</li>
<li>autotools: include meson_get_version</li>
<li>meson: Set .so version for xa like autotools does</li>
<li>meson: fix megadriver symlinking</li>
</ul>
<p>Emil Velikov (1):</p>
<ul>
<li>docs: add sha256 checksums for 18.0.0</li>
</ul>
<p>Eric Engestrom (3):</p>
<ul>
<li>meson/configure: detect endian.h instead of trying to guess when it's available</li>
<li>docs: fix 18.0 release note version</li>
<li>gbm: remove never-implemented function</li>
</ul>
<p>Henri Verbeet (1):</p>
<ul>
<li>mesa: Inherit texture view multi-sample information from the original texture images.</li>
</ul>
<p>Iago Toral Quiroga (1):</p>
<ul>
<li>compiler/spirv: set is_shadow for depth comparitor sampling opcodes</li>
</ul>
<p>Ian Romanick (1):</p>
<ul>
<li>i965/vec4: Fix null destination register in 3-source instructions</li>
</ul>
<p>Jason Ekstrand (4):</p>
<ul>
<li>nir/vars_to_ssa: Remove copies from the correct set</li>
<li>nir/lower_indirect_derefs: Support interp_var_at intrinsics</li>
<li>intel/vec4: Set channel_sizes for MOV_INDIRECT sources</li>
<li>nir/lower_vec_to_movs: Only coalesce if the vec had a SSA destination</li>
</ul>
<p>Juan A. Suarez Romero (5):</p>
<ul>
<li>cherry-ignore anv: Be more careful about fast-clear colors</li>
<li>cherry-ignore: ac/shader: fix vertex input with components.</li>
<li>cherry-ignore: radv: handle exporting view index to fragment shader. (v1.1)</li>
# else /* for use with static link lib build of Win32 edition only */
# define GLAPI extern
# endif/* _STATIC_MESA support */
# endif
# if defined(__MINGW32__) && defined(GL_NO_STDCALL) || defined(UNDER_CE) /* The generated DLLs by MingW with STDCALL are not compatible with the ones done by Microsoft's compilers */
description:'comma separated list of window systems to support. wayland, x11, surfaceless, drm, etc.'
value:'auto',
description:'comma separated list of window systems to support. If this is set to auto all platforms applicable to the OS will be enabled.'
)
option(
'dri3',
type:'combo',
value:'auto',
choices:['auto','yes','no'],
choices:['auto','true','false'],
description:'enable support for dri3'
)
option(
'dri-drivers',
type:'string',
value:'i915,i965',
description:'comma separated list of dri drivers to build.'
value:'auto',
description:'comma separated list of dri drivers to build. If this is set to auto all drivers applicable to the target OS/architecture will be built'
)
option(
'dri-drivers-path',
type:'string',
value:'',
description:'Location of dri drivers. Default: $libdir/dri.'
description:'Location to install dri drivers. Default: $libdir/dri.'
)
option(
'dri-search-path',
type:'string',
value:'',
description:'Locations to search for dri drivers, passed as colon separated list. Default: dri-drivers-path.'
)
option(
'gallium-drivers',
type:'string',
value:'pl111,radeonsi,nouveau,swrast,vc4',
description:'comma separated list of gallium drivers to build.'
value:'auto',
description:'comma separated list of gallium drivers to build. If this is set to auto all drivers applicable to the target OS/architecture will be built'
)
option(
'gallium-media',
'gallium-extra-hud',
type:'boolean',
value:false,
description:'Enable HUD block/NIC I/O HUD status support',
)
option(
'gallium-vdpau',
type:'combo',
value:'auto',
choices:['auto','true','false'],
description:'enable gallium vdpau state tracker.',
)
option(
'vdpau-libs-path',
type:'string',
value:'',
description:'comma separated list of gallium media APIs to build (omx,va,vdpau,xvmc).'
description:'path to put vdpau libraries. defaults to $libdir/vdpau.'
)
option(
'gallium-xvmc',
type:'combo',
value:'auto',
choices:['auto','true','false'],
description:'enable gallium xvmc state tracker.',
)
option(
'xvmc-libs-path',
type:'string',
value:'',
description:'path to put xvmc libraries. defaults to $libdir.'
)
option(
'gallium-omx',
type:'combo',
value:'auto',
choices:['auto','disabled','bellagio','tizonia'],
description:'enable gallium omx state tracker.',
)
option(
'omx-libs-path',
type:'string',
value:'',
description:'path to put omx libraries. defaults to omx-bellagio pkg-config pluginsdir.'
)
option(
'gallium-va',
type:'combo',
value:'auto',
choices:['auto','true','false'],
description:'enable gallium va state tracker.',
)
option(
'va-libs-path',
type:'string',
value:'',
description:'path to put va libraries. defaults to $libdir/dri.'
)
option(
'gallium-xa',
type:'combo',
value:'auto',
choices:['auto','true','false'],
description:'enable gallium xa state tracker.',
)
option(
'gallium-nine',
type:'boolean',
value:false,
description:'build gallium "nine" Direct3D 9.x state tracker.',
)
option(
'gallium-opencl',
type:'combo',
choices:['icd','standalone','disabled'],
value:'disabled',
description:'build gallium "clover" OpenCL state tracker.',
)
option(
'd3d-drivers-path',
type:'string',
value:'',
description:'Location of D3D drivers. Default: $libdir/d3d',
)
option(
'vulkan-drivers',
type:'string',
value:'intel,amd',
description:'comma separated list of vulkan drivers to build.'
value:'auto',
description:'comma separated list of vulkan drivers to build. If this is set to auto all drivers applicable to the target OS/architecture will be built'
)
option(
'shader-cache',
@@ -101,7 +185,7 @@ option(
'gbm',
type:'combo',
value:'auto',
choices:['auto','yes','no'],
choices:['auto','true','false'],
description:'Build support for gbm platform'
)
option(
@@ -115,7 +199,7 @@ option(
'egl',
type:'combo',
value:'auto',
choices:['auto','yes','no'],
choices:['auto','true','false'],
description:'Build support for EGL platform'
)
option(
@@ -132,15 +216,31 @@ option(
)
option(
'llvm',
type:'boolean',
value:true,
type:'combo',
value:'auto',
choices:['auto','true','false'],
description:'Build with LLVM support.'
)
option(
'valgrind',
type:'boolean',
value:true,
description:'Build with valgrind support if possible'
type:'combo',
value:'auto',
choices:['auto','true','false'],
description:'Build with valgrind support'
)
option(
'libunwind',
type:'combo',
value:'auto',
choices:['auto','true','false'],
description:'Use libunwind for stack-traces'
)
option(
'lmsensors',
type:'combo',
value:'auto',
choices:['auto','true','false'],
description:'Enable HUD lmsensors support.'
)
option(
'build-tests',
@@ -154,3 +254,35 @@ option(
value:false,
description:'Enable floating point textures and renderbuffers. This option may be patent encumbered, please read docs/patents.txt and consult with your lawyer before turning this on.'
// As in the following HwlComputeSurfaceInfo, thick modes may be degraded to
// thinner modes, we should re-evaluate whether the corresponding
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.