This implied upgrading to the Visual Studio 2019 image, not for VS
itself, but for the newer Python 3.8.5 version it contains, to avoid
UnicodeDecodeError inside modulefinder module when attempting to decode
our UTF-8 encoded Python scripts with cp1252 encoding.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6184>
They're almost entirely split by whether you're uploading user buffer or
from a BO. While I'm rewriting the API, drop the emit_const ->
fdN_emit_const wrapper in favor of a #define before the header and a
little helper for the asserts.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5990>
I got tripped up again with the index vs count vs size fields and I'd
rather we didn't store the redundant info. Settle on immediates_count as
"how many dwords of immediates we have"
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5990>
1) convert to getopt, and drop most variant related options since
they aren't super-useful these days.. and easy enough to add
back if/when needed. (Also, none of the newer shader variant
options where covered before.)
2) covert to dynamically allocated shader/variant, to get things
working again after ir3_shader/_variant converted to ralloc
3) few small cleanups
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6189>
This is a followup of 19db1a540c.
This commit fixed KHR-GL45.texture_view.view_classes on gfx9 but the test
still failed when using AMD_DEBUG=nodma or AMD_DEBUG=nodcc,nodma.
The workaround is now used from si_resource_copy_region so it covers the
previous call site (si_texture_transfer_map) and the sctx->dma_copy
fallback code.
Fixes: 19db1a540c ("radeonsi: add a workaround to fix KHR-GL45.texture_view.view_classes on gfx9")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6115>
The previous implementation kept a hashtable of a Backtrace object per
thread. debug_backtrace_capture is supposed to store a backtrace in
the passed in debug_stack_frame array, but instead overwrote the
per-thread Backtrace object.
This new version works more like the libunwind based capture. We hash
the file and symbol names and store pointers in the debug_stack_frame
struct. This way debug_backtrace_capture doesn't overwrite previous
captures or allocate memory that needs to be freed.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6112>
Quiets this warning:
../../master/src/mapi/glapi/tests/check_table.cpp:576:20: error: non-constant-expression cannot be narrowed from type 'unsigned int' to 'int' in initializer list [-Wc++11-narrowing]
{ "glColor3dv", _O(Color3dv) },
^~~~~~~~~~~~
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6112>
Similar to other skips for texture queries that don't actually sample
the texture and which results are not packed.
We can't use nir_tex_instr_is_query() here to skip the lowering for all
queries since that causes regressions in Piglit. Apparently, we do want
to lower some of the query results. In particularly, the LOD query.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6169>
Trying to figure out how uniforms were working, I found that computerator
had different behavior from our GL fragment shaders. Given that 3xx had
an SP_ bit for this (thanks flto@ for the note), it was a matter of
pasting bits of SP_* setup into computerator until I got the GL behavior.
I named it the same as the a3xx register.
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6179>
We would be making a MOV from a u32, when we should be loading from a
16-bit value. This likely didn't bite us because we only do mediump in FS
and CS so far, and indirect uniforms are usually in a VS (and usually
highp).
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6179>
This moves the semaphore implementation and tu_QueueSubmit to
tu_drm.c, such that that's the only file including xf86drm.h and
msm_drm.h. This way, the entire kernel interface is contained in
tu_drm.c
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5999>
This patch fixes the case where less number of reference surfaces created and destoyed
on need basis. The problem comes when we are refereing old assoiciated data for newly
created target buffer with same address. Here old target buffer destroyed as that
surface is no more used as reference for next frames and when we create a new surface
for the next frame to process we will get the surfaceid and same target address
of destroyed surface.
When new surface/surface->buffer/target ,target->codec is null as we cleared when we
destroy this surface, but per ref_mapping logic, it was taking null associated data
i.e.0 as curr_ref_idx. Hence total reference mapping table goes wrong with wrong data.
Beacuse of this, we have seen corrupted vp9 decoded frames.
Signed-off-by: SureshGuttula <suresh.guttula@amd.corp-partner.google.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5452>
This was meant to handle incoherent accesses by always flushing them,
but it accidentally checked for the coherent variant instead. As a
result e.g. a vkCmdClearImage() followed by a renderpass using the image
didn't get any flushes, resulting in the same sort of corruption seen
with sysmem renderpass clears. This happened to be exposed via some
tests that used multiview.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6009>
Some changes unintendedly slipped into an unrelated commit before it was
merged.
This caused kernel modules to be built and installed in the ramdisk,
which caused some devices to fail to boot due to the ramdisk size limit
being surpassed.
These changes weren't in effect until a subsequent commit triggered a
rebuild of the ramdisks.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Fixes: a9560939e0 ("ci: Build-test Panfrost tools")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6167>
The OpenGL flavor of SPIR-V allows for samplers inside structs. This
means that our simple array-of-array handling isn't sufficient and we
need something substantially more complex for generating NIR types.
Fixes: 14a12b771d "spirv: Rework our handling of images and samplers"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6065>
Example:
../src/freedreno/ir2/instr-a2xx.h:384:1: note: offset of packed bit-field
‘const_index’ has changed in GCC 4.4
384 | } instr_fetch_vtx_t;
It's apparently due to bitfields that would cross the width of their type.
Just expand the types of the affected fields so that the compiler quiets
down.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6165>
2D path is using the same hardware as the 3D path, with the advantage of
separate register state, but here it requires WFI and extra cache flushing
and invalidating, so it should be better to just use the 3D path. There are
also some cases where the 3D path would be much faster, since it can clear
multiple attachments at once.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5775>
One day, we may want copy_prop_vars or other passes to be able to see
through certain types of casts such as when someone casts a uint64_t to
a uvec2. However, for now we should just avoid casts all together.
Fixes: d8e3edb784 "nir/deref: Support casts and ptr_as_array in..."
Tested-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6072>
This manages to make some extra vec operations that would turn into movs
go away.
brw shader-db:
total instructions in shared programs: 3895037 -> 3893221 (-0.05%)
total cycles in shared programs: 113832759 -> 113792154 (-0.04%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6050>
A630 doesn't have the HW format we use to sample stencil, so it needs a
workaround. It also has a bug around the AS_R8G8B8A8 format, which doesn't
work when UBWC is disabled, so use 8_8_8_8_UNORM instead when UBWC is
disabled (using AS_R8G8B8A8 or 8_8_8_8_UNORM should only matter with UBWC)
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5438>
This is mainly for the benefit of automated syncing of changes from mesa
back to envytools, where the same subdir meson.build's are used, but the
toplevel meson.build is different. In the envytools case, we want these
depenendencies to be required.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6154>
Instead of building the adreno/foo.xml headers from the toplevel, split
out a subdir(). This fits better with how meson likes things to be
structured. But it does require fixing a bit about how gen_header.py
resolves imports, ie. it cannot assume the src file is at the root of
the $RNN_PATH.
This is needed for the next patch, to add support for installing the
register database for use with installed tools.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6154>
No need for rnn_path.h, just construct the whole thing in meson and pass
via c_args. Also move where the path is constructed up one level. This
will be needed for syncing back to envytools, where the path will be
different.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6154>
libxml2 can load gzip compressed files, so lets look for these too.
This will be useful for installing the register database so that an
installed cffdump/crashdec can use them. But it isn't too useful
to be able to edit the installed register database, so we can gzip
them to use less disk space.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6154>
When iris imports an I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS surface, it
allocates a buffer to hold the indirect clear color. When the import is
complete, iris_resource::aux::clear_color is set to zero but the
indirect buffer is filled with garbage values. This could break certain
texture view use-cases, so zero the allocated buffer to fix those.
Fixes: c19492bcdb ("iris: Handle importing aux-enabled surfaces on TGL")
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6092>
It's possible an SSA value depends on a register; in this case, chasing
the source would result in a crash as the chase helper in NIR asserts
is_ssa. Instead we should check a priori that all the argments are in
fact SSA, bailing otherwise.
In the piglit shader exhibiting this bug (by looping over the index),
bailing on the ishl instruction is -necessary-. This is not merely us
being cowardly to avoid seeing through the registers; indeed, if we
wrote away the ishl instruction, the shift itself would have to be
stored in a load/store register (r26/r27) which would preclude reading
it in the loop, creating a register allocation failure later in the
compile. So this is the correct solution due to the restricted
semantics.
Closes#3286
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reported-by: Icecream95 <ixn@keemail.me>
Fixes: f5401cb886 ("pan/midgard: Add address analysis framework")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6144>
When the llvm draw engine is used for draw shaders in st_program the driver
may not enable the cap PIPE_CAP_PACKED_UNIFORMS, so uniforms are not
be lowered to UBOs. However, llvm doesn't support nir_load_uniform, so lower
the uniforms to UBO now. The multiplier is set to 16 to be the same like in
the TGSI code path.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5681>
Fixes the following building errors:
clang: error: no such file or directory: 'external/mesa/src/gallium/drivers/freedreno/a2xx/disasm-a2xx.c'
clang: error: no input files
FAILED: out/target/product/x86_64/obj/SHARED_LIBRARIES/gallium_dri_intermediates/LINKED/gallium_dri.so
ld.lld: error: undefined symbol: disasm_a2xx
>>> referenced by ir2_assemble.c:546 (external/mesa/src/gallium/drivers/freedreno/a2xx/ir2_assemble.c:546)
>>> ir2_assemble.o:(assemble) in archive out/target/product/x86_64/obj/STATIC_LIBRARIES/libmesa_pipe_freedreno_intermediates/libmesa_pipe_freedreno.a
clang-9: error: linker command failed with exit code 1 (use -v to see invocation)
Fixes: f39afda1a7 ("freedreno: move a2xx disasm out of gallium")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6151>
Fixes the following building errors:
FAILED: ninja: 'external/mesa/src/freedreno/registers/a2xx.xml',
needed by 'out/target/product/x86_64/gen/STATIC_LIBRARIES/libfreedreno_registers_intermediates/registers/a2xx.xml.h',
missing and no known rule to make it
...
FAILED: ninja: 'external/mesa/src/freedreno/registers/adreno-pm4-pack.xml.h',
needed by 'out/target/product/x86_64/gen/STATIC_LIBRARIES/libfreedreno_registers_intermediates/registers/adreno-pm4-pack.xml.h',
missing and no known rule to make it
Fixes: b721d336da ("freedreno: slurp in rnndb")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6151>
We were open-coding it, and getting variant parsing wrong for things
like "A4XX-" which don't explicitly include the version being
disassembled. Use the rnn function instead. This makes CP_INDIRECT show
up again. Also propagate const-ness to users.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6140>
It's totally not obvious, but this runs extra error checking and is
necessary for correct variant handling, and variant handling will
silently not work if it's not enabled. Add it asm.c even though it's not
strictly necessary, to prevent anyone from missing this in the future.
Missing this really should be an error.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6140>
We'll need this for afuc, and we're currently also open-coding the same
thing in rnnutils. It seems this function was added to decode pm4 packet
names, but it currently has no users, so make it useful for what it
was intended to do.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6140>
`init_haiku()` is called by `eglInitialize()`, which then calls
`_eglComputeVersion()` (without even anything in between). The latter
sets the EGL version based on the extensions supported, and since Haiku
doesn't support any it will end up overwriting the same `1.4` value.
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6131>
Previously we parsed a src non-terminal but did nothing with it. Since
the WAIT instruction is kind of weird, in that you have to give it the
same notification subregister for both destination and source, and it
always has an exec size of 1, let's parse a destination instead of a
source. This way, we can parse a writemask rather than a swizzle in
align16 mode, and easily convert the writemask to a swizzle to create
the source register.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5956>
brw_reg::subnr is in bytes, like the subnr field in the instruction
word, but we disassemble the subregister number in units of the type.
For example g0.3<1>F would have a subnr=12.
These non-terminals produce a brw_reg and feed into other non-terminals
that call brw_reg(), where they are passed the subnr that we set here.
brw_reg()'s subnr parameter is expected to be in terms of the register
type, and it is multiplied by the type size to calculate the subnr in
bytes.
In these non-terminals, we don't know the register type yet, so we
must store the subregister number as it was given to us in the .subnr
field and let the brw_reg() constructor handle the conversion to the
canonical byte-based subnr form when it knows the type.
Before this patch, subregister numbers applied to these registers would
be multiplied with the type size twice.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5956>
Also document some other registers gleaned from looking at the context
switch save/restore routines and fix CP_SDS_REM_SIZE, and make the names
line up with the CP perfcntr names. Note that the CP reads the draw
stream size in CP_SET_BIN_DATA5 using MEM_READ_ADDR, which is probably
why this was mistaken for the draw stream size address.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6123>
Mesa returns a stub function pointer to glAnything for years.
Android framework till API level 30 just uses function pointers
returned from eglGetProcAddress without checking if the underlying
extension is supported. If we return stub pointers for functions
in GL_EXT_debug_marker, Android just uses our stub functions instead
of its own stubs and then fail the dEQP. In the past, the issue
didn't show up because mesa only has limited slots and run out of slots
before Android calls eglGetProcAddress on functions inside
GL_EXT_debug_marker.
Signed-off-by: Lepton Wu <lepton@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5652>
In an effort to simplify MIR by not prepacking instructions, this commit
removes references to `ins->alu.outmod` so that we can later remove the
`ins->alu` field from midgard_instruction.
Every place that was using `ins->alu.outmod` was changed to now use the
generic `ins->outmod` field instead.
We then reconstruct the outmod field right before emission.
Signed-off-by: Italo Nicola <italonicola@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5933>
In an effort to simplify MIR by not prepacking instructions, this commit
removes references to `ins->alu.reg_mode` so that we can later remove
the `ins->alu` field from midgard_instruction.
Every place that was using reg_mode was changed to now use the generic
`ins->src_type` field instead.
We then reconstruct the reg_mode field right before emission.
Signed-off-by: Italo Nicola <italonicola@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5933>
In an effort to simplify MIR by not prepacking instructions, this commit
removes references to `ins->alu.op` so that we can later remove the
`ins->alu` field from midgard_instruction.
Every place that was using ins->op was changed to now use the generic
`ins->op` field instead.
We then reconstruct the `alu.op` field right before emission.
This new field is generic and can contain opcodes for ALU, texture or
load/store instructions. It should be used in conjunction with
`ins->type`, just like the current prepacked `op` field.
Signed-off-by: Italo Nicola <italonicola@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5933>
I had this as abort() in my original implementation since I was doing
drm-shim and my kernel driver in parallel based around using a SW
simulator, and I wanted to always update both, but it means that people's
new feature detection code can easily end up breaing their drm-shim
shader-db runs (such as intel's kernel_has_dynamic_config_support()
checking for -ENOENT instead of -EINVAL for a feature, which showed up on
my personal runner but not fd.o's for reasons I'm unclear on).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5994>
We only need one s_bfe for a conversion with a swizzled source.
shader-db (parallel-rdp, Navi):
Totals from 487 (71.30% of 683) affected shaders:
SpillSGPRs: 3284 -> 3233 (-1.55%); split: -2.71%, +1.16%
SpillVGPRs: 2174 -> 2150 (-1.10%); split: -1.24%, +0.14%
CodeSize: 2497864 -> 2445544 (-2.09%); split: -2.11%, +0.01%
Instrs: 450613 -> 445104 (-1.22%); split: -1.27%, +0.05%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5259>
That's an attempt at replacing the complex __int64_to_float() and
__float_to_int64() implementations found in float64.glsl by a simpler
native NIR equivalent.
Thanks to that, we can have lower those conversion without having to
compile a GLSL shader, which would be quite annoying for OpenCL
kernels.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5588>
Connor recently ran into an issue where the chezas were hanging where his
GPUs weren't, and was blocked on getting some feedback on what was
happening. A devcoredump will help non-cheza-having devs debug (or
hopefully with other intermittent fails).
Closes: #3187
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6036>
This effectively reverts part of 2907faee, which changed dri2_make_current() to
always take a dri2_dpy reference regardless of whether or not a new context or
surface(s) were being bound. This led to a reference count imbalance as there
was no corresponding code added to drop a reference on the dri2_dpy. As a
consequence, any application that called eglInitialize() on a default/native
display after having called eglTerminate() would always get back the old
dri2_dpy, inheriting its previous state.
As the reference count is there to prevent the dri2_dpy from being destroyed
between eglTerminate() and eglInitialize() calls when a context is still bound,
a reference should only be taken when a successful call to
dri2_dpy->core->bindContext() has been made. Fix the issue by restoring the old
reference counting behaviour.
Fixes: 4e8f95f64d ("egl_dri2: Always unbind old contexts")
Fixes: 2907faee7a ("egl/dri2: try to bind old context if bindContext failed")
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Luigi Santivetti <luigi.santivetti@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Nicolas Cortes <nicolas.g.cortes@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3328
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6105>
We were space-leaking iris_compiled_shader objects, leaving them around
basically forever - long after the associated iris_uncompiled_shader was
deleted. Perhaps even more importantly, this left the BO containing the
assembly referenced, meaning those were never reclaimed either. For
long running applications, this can leak quite a bit of memory.
Now, when freeing iris_uncompiled_shader, we hunt down any associated
iris_compiled_shader objects and pitch those (and their BO) as well.
One issue is that the shader variants can still be bound, because we
haven't done a draw that updates the compiled shaders yet. This can
cause issues because state changes want to look at the old program to
know what to flag dirty. It's a bit tricky to get right, so instead
we defer variant deletion until the shaders are properly unbound, by
stashing them on a "dead" list and tidying that each time we try and
delete some shader variants.
This ensures long running programs delete their shaders eventually.
Fixes: ed4ffb9715 ("iris: rework program cache interface")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6075>
Instead of having separate lists of variables, roughly sorted by mode,
use a single list for all shader-level NIR variables. This makes a few
list walks a bit longer here and there but list walks aren't a very
common thing in NIR at all. On the other hand, it makes a lot of things
like validation, printing, etc. way simpler. Also, there are a number
of cases where we move variables from inputs/outputs to globals and this
makes it way easier because we no longer have to move them between
lists. We only have to deal with that if moving them from the shader to
a nir_function_impl.
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5966>
We also add a new list iterator which takes a modes bitfield and
automatically figures out which list to use. In the future, this
iterator will work for multiple modes but today it assumes a single mode
thanks to the behavior of nir_variable_list_for_mode. This also doesn't
work for function_temp variables.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5966>
For the most part, this doesn't actually matter today. We already only
call remove_dead_vars on the lists that are specified in the modes. The
only functional change here is for the uniform, mem_ubo, and mem_ssbo
modes because they share a list. If nir_remove_dead_variables is called
with a mode of nir_var_uniform, it will no longer remove UBOs or SSBOs,
for instance.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5966>
The "Fixes:" tag example has the commit title in double quotes, whereas the
suggested git fixes alias, a couple of lines below, also adds some outer
parenthesis.
Although there doesn't appear to be a consistent format for the "Fixes:" tag,
other than it should be a git commit sha followed by the commit title, the
information in the docs should at least be consistent. As the "Fixes:" tag was
inspired by the Linux kernel, which does have parenthesis, update the example to
match the git fixes output.
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6106>
We have already been using this to describe pm4 packets that are
specific to certain generations.
Maybe we should introduce a new "packet" element type instead. But
first lets just get validation working with what we already use.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6107>
In the schema, boolean means strictly "true" or "false". But rnn
parsing code was looking for "yes"/"1"/"no"/"0". So split the
difference, and add a relaxed boolean type, and update rnn to also
accept "true" or "false".
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6107>
We'll need this for finding where INDIRECT/STRIDE are in
CP_DRAW_INDIRECT_MULTI, since they are in different locations in each
variant.
This is tricky because we need to bubble up success/failure to the upper
levels, and 0 could be a valid offset if the stripe is inside an array
or in a packet. Hence we refactor tryreg to return success/failure
separately, although I stopped short of modifying rnndec_decodereg
itself.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6104>
Most ops use the common handling in emit_alu in order to convert NIR
sources to bifrost sources, but the "special" math op lowering handrolls
the conversion (due to needing to reference the same source multiple
times).
Unfortunately, that handrolled lowering did not consider that there
might be a non-identity swizzle on the source. In this case we would
reference the wrong component of the source and generate garbage.
Fixes all but two of the remaining failures on G31 in:
dEQP-GLES2.functional.shaders.operator.exponential.*highp*
The following tests are still broken due to some other issue:
dEQP-GLES2.functional.shaders.operator.exponential.exp2.highp_float_fragment
dEQP-GLES2.functional.shaders.operator.exponential.exp2.highp_vec2_fragment
Signed-off-by: Chris Forbes <chrisforbes@google.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6108>
Merge the extra tracking that is useful for generating stats from asm
(as opposed to ir), and for guestimating things like inputs and outputs
(mostly useful for r/e) into ir3's version and drop cffdec's version.
There is a small change in disasm output for the decode tools, in that
it no longer prints the used consts, but rather just the max accessed
const. This is the more useful piece of information, and avoids making
the shared regmask type big enough to deal with the const reg file.
Additional error checking for invalid regids causes crashdec to bail
out sooner when decoding memory that *might* hold valid instructions.
Also, crashdec no longer prints stats, because stats aren't very useful
when trying to decode random instruction memory (which might or might
not be valid instructions).
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6070>
To unify the ir3 disasm code, we need to add in the regmask based
register tracking from cffdec's version of the disassembler. Split
out regmask (or at least the part that doesn't depend on ir3) so
it can be shared.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6070>
This also tunes `.freedreno-rules` a bit so that it isn't triggered by
various tools that don't effect the driver build.
The .gitlab-ci directory is kept separate from the toplevel one so that
updates to (for example) reference decode output do not trigger all the
other-driver jobs to run.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6070>
Bifrost's bitwise ops include the shift capability. Previously we had
hardcoded the shift to zero in all cases.
There's room in future to emit slightly better code if a shift and a
bitwise operation can be folded together, but not going after that for
now.
This change also removes the separate BI_SHIFT instruction class as
BI_BITWISE can cover both cases.
Signed-off-by: Chris Forbes <chrisforbes@google.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6091>
For convenience in e68871f6a4 ("spirv: Handle constants and types
before execution modes") we moved all execution mode parsing after the
constants and types, so that those using OpExecutionModeId could be
handled together.
Later in 84781e1f1d ("spirv/nir: keep track of SPV_KHR_float_controls
execution modes") we had to parse certain non-ID execution modes
before handling constants.
Instead of handling just the float controls related execution modes
early, handle all modes that don't need an ID. This is a more
"natural" split and will allow other type handling to rely on
execution mode in the future.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6062>
Found via
dEQP-VK.binding_model.descriptorset_random.sets4.noarray.ubolimitlow.sbolimitlow.sampledimglow.outimgonly.noiub.nouab.frag.ialimitlow.0
Fixes: 159a1300ce ("turnip: input attachment descriptor set rework")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6087>
dEQP-VK.wsi.xlib.surface.query_devgroup_present_modes with prime
fail when 0 rectangles are reported. While I believe that test
tests this unintentionally (trying to test the VK_INCOMPLETE return),
I believe it makes sense to always return a rectangle.
In particular we require the data from the given rectangle for
presentation even if we use prime and given that prime is completely
transparent for the app it still counts as local from the perspective
as the application.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6071>
Vulkan allows fragment shaders to read gl_ClipDistance[], in which case
the SPIR-V compiler will inject a compact array variable at
VARYING_SLOT_CLIP_DIST0. Request the lowering to always work in terms
of a compact array variable so we don't have to care about the API
in use.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6022>
Vulkan allows fragment shaders to read gl_ClipDistance[], in which
case the SPIR-V compiler inserts a single compact array variable for
VARYING_SLOW_CLIP_DIST0 and the lowering should not try to inject
its own variables, but instead work in terms of the existing one.
Vulkan drivers are expected to call this with use_clipdist_array set
to true to be consistent with this setup.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6022>
Fixes some dEQP-VK.renderpass2.* flakes. Valgrind:
Test case 'dEQP-VK.renderpass2.dedicated_allocation.attachment.8.724'..
==754520== Conditional jump or move depends on uninitialised value(s)
==754520== at 0x575B21C: radv_layout_is_htile_compressed (radv_image.c:1690)
==754520== by 0x572F470: radv_handle_depth_image_transition (radv_cmd_buffer.c:5855)
==754520== by 0x572F2F2: radv_handle_image_transition (radv_cmd_buffer.c:6123)
==754520== by 0x572EEC6: radv_handle_subpass_image_transition (radv_cmd_buffer.c:3385)
==754520== by 0x572A104: radv_cmd_buffer_begin_subpass (radv_cmd_buffer.c:4843)
==754520== by 0x572A007: radv_CmdBeginRenderPass (radv_cmd_buffer.c:4913)
==754520== by 0x572A197: radv_CmdBeginRenderPass2 (radv_cmd_buffer.c:4921)
Why false?
A renderloop happens when the same attachment is both used as input
attachment and output (color, ds) attachment in a subpass. Of course
this doesn't happen outside of a renderpass and hence we can initialize
it to false at the start of the renderpass.
Fixes: 66131ceb8b "radv: Pass through render loop detection to internal layout decisions."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3074
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6068>
Note that there are some extra tess fails, but they're probably
unrelated to the actual feature. There were also some xfails that were
created as part of an earlier attempt to enable the feature which were
fixed in the meantime, so remove them.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5738>
I missed this if statement so atomic counters weren't getting bindings
and, when you have more than one of them, that meant they were all
getting combined into one.
Fixes: 3584cb09bc15 "spirv: Give atomic counters their own variable mode"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6060>
Track them via pending_flush_bits. Previously WFI was only tracked in
flush_bits and WAIT_FOR_ME was emitted directly. This means that we don't
emit WAIT_FOR_ME or WAIT_FOR_IDLE if there wasn't a cache flush or other
write by the GPU. Also split up host writes from sysmem writes, as only
the former require WFI/WAIT_FOR_ME.
Along the way, I also realized that we were missing proper handling of
transform feedback counter writes which require WAIT_MEM_WRITES. Plumb
that through as well. And CmdDrawIndirectByteCountEXT needs a
WAIT_FOR_ME as it does not wait for WFI internally.
As an example of what this does, a typical barrier for transform
feedback with srcAccess = VK_TRANSFORM_FEEDBACK_WRITE_COUNTER_BIT_EXT
and dstAccess = VK_ACCESS_INDIRECT_COMMAND_READ_BIT used to emit on
A650:
- WAIT_FOR_IDLE
and now we emit:
- WAIT_MEM_WRITES
- WAIT_FOR_ME
So we've eliminated a useless WFI and added some necessary waits.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6007>
These have an indirect count which is loaded from an iova, and the
minimum is taken between the indirect and direct counts. Note, I also
had to fix gen_header.py to deal with the extra-long names we get.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6007>
Now that we clear the PITCHALIGN" field when filling GMEM input attachment
descriptors, we can get rid of the extra tile width alignment on a630/a640.
With the "block_align_shift" value change, this brings down the default
gmem_align from 16k to 4k on a630/a640 and down from 24k to 12k on a650,
to match the gallium driver.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5528>
v3d_compile is now split out into a helper function that gets called a
second time if compilation fails the first time with the result
reporting the register allocation failed. The second time it is run with
the fallback scheduler to try and increase the chances of successfully
allocating the registers.
v2: Add a performance debug message when using the fallback scheduler.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5953>
Instead of just having a bool status for the failure, there is now an
enum so that the compilation can report a more detailed status.
Currently this is only used to report whether the failure was due to
failed register allocation. The “failed” bool doesn’t seem to actually
have been used anywhere so this doesn’t really change a lot.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5953>
The current scheduling algorithm favors parallelism a bit too
aggressively and sometimes generates shaders that fail register
allocation. This happens even if the threshold is set to zero to force
it to always use the CSR instruction choosing algorithm.
This patch adds an option to use an even more aggressive fallback that
just always picks the instruction with the shortest maximum delay in the
hope that that will generate the least register pressure. The intention
is to use this as a last resort after register allocation fails in order
to at least have a working shader.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5953>
The input primitive ID is read from the VPM in the same memory segment
as the outputs. This means that writing the GS header to VPM location 0
needs to be done after reading the primitive ID. This patch adds a
dependency between the load_primitive_id intrinsic and the store_output
intrinsic for location 0 to stop the scheduler from reordering them.
v2: Use an enum for the dependency class number.
v3: Add "GS" to the class enum name.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5953>
Adds a callback function to nir_schedule_options to give the backend a
chance to add custom dependencies between certain intrinsics. The
callback can assign a class number to the intrinsic and then set a read
or write dependency on that class.
v2: Use a linked-list of schedule nodes for the dependency classes
instead of a fixed-sized array.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5953>
nir_schedule already has a struct for options which makes it more than
just a function declaration. Later patches intend to add more structs to
complement these options. In order to make the code easier to manage,
this moves the nir_scheduler-related parts out of nir.h to their own
header.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5953>
Previously, objects of type OpTypeImage or OpTypeSampler were treated as
vtn_pointers and objects of type OpTypeSampledImage were a special-use
vtn_sampled_image struct. This commit changes that so that all of those
objects are stored in vtn_ssa_values. Each of images, samplers, and
sampled images, are stored as a scalar or vector nir_ssa_def whose
components are NIR deref values. We now use vtn_type_get_nir_type to
re-resolve those as-needed into GLSL sampler types for NIR.
This simplification has a number of benefits:
1. We can git rid of the rest of our special-cases for handling images
and samplers in function arguments. Now that they're treated as
structs at the glsl_type level, the generic paths can handle images
and samplers.
2. We can now construct composite values containing images and samplers
internally. It's unclear from the SPIR-V spec whether or not this
is allowed and it's not a pattern that GLSLang currently generates
thanks to GLSL rules. However, if we do start seeing SPIR-V that
contains such composites, we should now be able to handle it.
3. SPIR-V OpNull and OpUndef instructions can now create samplers,
images, and sampled images. The NIR generated won't likely be fully
valid but, given a NIR pass to do something sensible, it should be a
thing we can compile.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5278>
We're about to make the SPIR-V -> NIR path generate a bit more complex
SSA chains for certain derefs. This will ensure we don't regress anyone
when we start making vec2's of derefs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5278>
There are a few cases, atomic counters being one example, where the type
used by vtn_ssa_value is not the same as the type we want NIR to use in
derefs and variables. To solve this, we add a helper which converts
between the types for us. In the next commit, we'll be adding another
major user of this: images and samplers.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5278>
Previously, we created our vtn_ssa_value in _vtn_variable_load_store
manually as we did the recursive load/store. Instead, we now create the
SSA value before calling into the recursive function. This is a tiny
bit less efficient but it removes a case of hand-rolling vtn_ssa_value
creation. For symmetry, we make _vtn_block_load_store assume the value
is already created. Finally, we remove a trivial hand-rolled case in
vtn_composite_extract.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5278>
For three different functions which create vtn_ssa_values, we had three
completely different implementations. This unifies them all to roughly
the same algorithm. While we're at it, we take advantage of the
nir_build_imm helper to avoid some extra code in vtn_const_ssa_value.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5278>
The w++ is to handle a differences between the KHR extension and Vulkan
1.1 feature where the Vulkan 1.1 instructions take an scope parameter.
While incrementing w technically works, it's really subtle and very easy
to miss when reading the code.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5278>
For some reason, we were doing a signed shift vectors and an unsigned
shift for scalars. We then plug it into i2b so it should make no
difference whatsoever. The fact that we're doing different things for
vectors vs. scalars is bonkers. Let's simplify the code a bit.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5278>
The original implementation of SPV_EXT_descriptor_indexing was extremely
paranoid about the NonUniform qualifier, trying to fetch it from every
possible location and propagate it through access chains etc. However,
the Vulkan spec is quite nice to us on this and has very strict rules
for where the NonUniform decoration has to be placed. For image and
texture operations, we can search for the decoration on the spot when we
process the image or texture op. For pointers, we continue putting it
on the pointer but we don't bother trying to do anything silly like
propagate it through casts.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5278>
For cross-process timelines we have to have a thread to wait
till the requested points become available.
The functions actually dealing with timeline semaphores stubbed out, to
implement in the next patch. As such the thread code shouldn't trigger
yet.
The core idea is that we still use the refcount mechanism that we use with
emulated timelines, though the native timeline syncobj don't participate
in the refcounting. This way we keep the ordering of submission in a queue
as each submission is also blocked by its predecessor.
Where we change behavior is when the number of blockers reaches 0. In the
new code we check if we need to wait for the timeline semaphores to
be available and if so we won't execute the submission immediately but
pass it to the submission thread.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5600>
To facilitate cross-process timeline semaphores we have to deal with
the fact that the syncobj signal operation might be submitted a
small finite time after the wait operation.
For that we start using DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT during
the wait operation so we properly wait instead of returning an error.
Furthermore, to make this effective for syncobjs that get reused we
actually have to reset them after the wait. Otherwise the wait before
submit would get the previous fence instead of waiting for the new
thing to submit.
The obvious choice, resetting the syncobj after the CS submission
has 2 issues though:
1) If the same semaphore is used for wait and signal we can't reset it.
This is solvable by only resetting the semaphores that are not in the
signal list.
2) The submitted work might be complete before we get to resetting the
syncobj. If there is a cycle of submissions that signals it again and
finishes before we get to the reset we are screwed.
Solution:
Copy the fence into a new syncobj and reset the old syncobj before
submission. Yes I know it is more syscalls :( At least I reduced the
alloc/free overhead by keeping a cache of temporary syncobjs.
This also introduces a syncobj_reset_count as we don't want to reset
syncobjs that are part of an emulated timeline semaphore. (yes, if
the kernel supports timeline syncobjs we should use those instead,
but those still need to be implemented and if we depend on them in
this patch ordering dependencies get hard ...)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5600>
If another MR was merged while these were still running for the main
project, the result could be no updated images in the main project
registry (forcing a rebuild of the new images in all forked projects) or
an outdated Mesa website.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6011>
/glcts --deqp-surface-width=1024 --deqp-surface-height=64 --deqp-case=KHR-GL45.texture_view.view_sampling --deqp-surface-type=fbo
was failing but only for width 1024.
The test was filling a 4x4 ms texture, but leaving the viewport set to 1024x64.
This was resulting in this code incorrectly sign extending a value, and passing
it into the mask generator and getting the wrong values. Explicit cast
avoids the sign extension and fixes the above test.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6006>
This avoids some performance regressions on Gen12 platforms caused by
SIMD32 fragment shaders reported in titles like Dota2, TF2, Xonotic,
and GFXBench5 Car Chase and Aztec Ruins.
The most obvious pattern in the regressing shaders I identified among
these workloads is that they all had non-uniform discard statements,
which are handled rather optimistically by the current IR analysis
pass: No penalty is currently applied to the SIMD32 variant of the
shader in the form of differing branching weights like we do for other
control flow instructions in order to account for the greater
likelihood of divergence of a SIMD32 shader.
Simply changing that by giving the same treatment to discard
statements as we give to other branching instructions seemed to hurt
more than it helped on platforms earlier than Gen12, since it reversed
most of the improvement obtained from SIMD32 fragment shaders in
Manhattan for no measurable benefit in other workloads (Manhattan has
a handful of shaders with statically non-uniform discard statements
which actually perform better in SIMD32 mode due to their approximate
dynamic uniformity). For that reason this change is applied to Gen12+
platforms only.
I've been running a number of tests trying to understand the
difference in behavior between Gen12 and earlier platforms, and most
of the evidence I've gathered seems to point at EU fusion being the
culprit: Unlike previous generations, on Gen12 EUs are arranged in
pairs which execute instructions in lockstep, giving an effective warp
size of 64 threads in SIMD32 mode, which seems to increase the
likelihood for control flow divergence in some of the affected shaders
significantly.
Fixes: 188a3659ae "intel/ir: Import shader performance analysis pass."
Reported-by: Caleb Callaway <caleb.callaway@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5910>
tgsi_exec.c uses the generic src load path for indirects, so we don't
actually need addr regs. Saves extra intructions.
shader-db results:
total instructions in shared programs: 3346685 -> 3249052 (-2.92%)
instructions in affected programs: 961832 -> 864199 (-10.15%)
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6018>
This makes us more like other drivers, and avoids having tons of different
names (particularly when you want to dump vs and fs in debugging). In the
process, having a debug flag for vertex shaders just falls out.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6018>
Turning on robust buffer access enables GLES 3.2, also
finished GL 4.3 support.
The post depth coverage fail is expected, it's a test bug
This also introduce a fail in the invalid flag test that I can't reproduce out of CI.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5971>
TGSI expect vec4 of constants for it's current code paths, and when
doing indirect accesses it does the comparison on vec4 indexes,
however NIR does the indexing on packed float indexes.
This also align the compute path with the other shaders, and
should improve robustness (at least under Vulkan)
Fixes:
KHR-NoContext.gl43.robust_buffer_access_behavior.uniform_buffer
v1.1:
rename variable to something more meaningful (Roland)
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5971>
u_screen will return 0 for all of these, which means that this is one
less driver to see in git grep when I'm checking who exposes a cap.
The exception is the texel/gather offsets and stream output
components, which will not be exposed since we don't expose the
corresponding GLSL version.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3298>
When creating a sampler-type, we need to pass the correct vaclue for
the "is_shadow"-parameter to glsl_sampler_type(), otherwise the compiler
backend will have no clue about this being a shadow-sampler.
Fixes: 1c0f92d8a8 ("nir: Create sampler variables in prog_to_nir.")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5986>
We theoretically could push these sizes to the const file
opportunistically, which appears to be what the blob does, but the
maximum number of SSBO's is way too big to do that unconditionally. Just
use resinfo to get the size for now.
Fixes on turnip: dEQP-VK.ssbo.unsized_array_length.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6012>
This change mostly touches error handling code paths, where a
bug was found when the DRI driver failed to bind a new DRI
context. Specifically, the reason for it to fail was the window
system unable (for whatever reason) to provide the DRI drawable
with a buffer. In this instance, Mesa un-does the EGL bindings,
but doesn't restore the old DRI context, hence remaining in a
funny state. It's worth mentioning that despite trying, there
is no guarantee that the old DRI context can be restored,
depending on the runtime.
Before this change, if bindContext() failed then
dri2_make_current() would rebind the old EGL context and
surfaces and return EGL_BAD_MATCH. However, it wouldn't rebind
the DRI context and surfaces, thus leaving it in an
inconsistent and unrecoverable state.
After this change, dri2_make_current() tries to bind the old
DRI context and surfaces when bindContext() failed. If unable
to do so, it leaves EGL and the DRI driver in a consistent
state, it reports an error and returns EGL_BAD_MATCH.
Fixes: 4e8f95f64d ("egl_dri2: Always unbind old contexts")
Signed-off-by: Luigi Santivetti <luigi.santivetti@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5707>
This adds a nir_unsigned_upper_bound() helper which does something similar
to nir_analyze_range() except it tries to obtain the largest possible
value instead of it's relation to zero.
It also adds nir_addition_might_overflow(), which uses this helper to try
to prove that an unsigned addition does not wrap around.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2720>
This (combined with a pass to actually set the corresponding NIR flags)
should help fix a lot of the regressions from the SMEM addition combining
change.
fossil-db (Navi):
Totals from 12 (0.01% of 135946) affected shaders:
CodeSize: 12376 -> 12304 (-0.58%)
Instrs: 2436 -> 2422 (-0.57%)
VMEM: 1105 -> 1096 (-0.81%)
SClause: 133 -> 130 (-2.26%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2720>
SMEM does the addition with 64-bits, not 32. So if the original code
relied on wrapping around (for example, for subtraction), it would break.
Apparently swizzled MUBUF accesses also have issues with combining
additions that could overflow. Normal MUBUF accesses seem fine.
fossil-db (Navi):
Totals from 27219 (20.02% of 135946) affected shaders:
CodeSize: 128303256 -> 131062756 (+2.15%); split: -0.00%, +2.15%
Instrs: 24818911 -> 25280558 (+1.86%); split: -0.01%, +1.87%
VMEM: 162311926 -> 177226874 (+9.19%); split: +9.36%, -0.17%
SMEM: 18182559 -> 20218734 (+11.20%); split: +11.53%, -0.34%
VClause: 423635 -> 424398 (+0.18%); split: -0.02%, +0.20%
SClause: 865384 -> 1104986 (+27.69%); split: -0.00%, +27.69%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2748
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2720>
This should be much better than the previous method that was more
guesswork-based than anything else. It returns a value within 1 of the
blob for every input value I've tested, and it seems like it returns
slightly better (but still legal) answers when it differs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5950>
For implementing panfrost_flush, it suffices to wait on only a single
syncobj, not an entire array of them. This lets us wait on it directly,
without coercing to/from syncfds in the middle (although some complexity
may be added later to support Android winsys).
Further, we should let the fence own the syncobj, tying together the
lifetimes and thus removing the connection between syncobjs and
batch_fence.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5995>
With the current kernel UABI, there is no benefit to explicitly
specifiying dependencies, since the kernel by design adds implicit
dependencies to any referenced BOs. This is something we'd like to
address in the future, but efficient handling with future kernels will
require a tweaked design in userspace as well. So let's do the obvious
thing now, and extend later.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5995>
We don't want these files shared between builds (it'll get blown away by
the next rsync), and NFS will just increase our latency for hitting the
cache.
Drops a630 gles31 run from 11-17 minutes to 5.5. Maximum cache size on a
run I've seen is 153M, which it seems we can easily spare.
Fixes: f97acb4bb4 ("freedreno/ir3: disk-cache support")
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5998>
The intention here was to check "Would the GPU be able to compress
this if we used the PBO-based texture upload path?" Prior to Gen12,
that meant checking for CCS_E. On Gen12, there are a lot more types
of compression, and basic CCS_E was replaced by GEN12_CCS_E, making
this check simply not work, so we'd take the CPU path instead.
Instead, check if it has CCS, and isn't the basic "fast clear" CCS_D.
Fixes: 39f06e2848 ("iris: Implement pipe->texture_subdata directly")
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6005>
I had a stp testcase that was getting its offset wrong, and by twiddling
bits and feeding it to qc disasm, I found that the comment was sort of
right: some the cat6a bits implicated in the old comment do get used, as
the high bits of the cat6c offset. Reallocating those bits also fixes how
we were getting r960.y for r0.y.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5815>
This started with making note of some ldp/stp instructions from the blob
and how we differ from them. In the process of fixing it, I accidentally
modified behavior of other opcodes, and the other instructions listed will
keep us from doing that. I also dropped an old stl test that looks like I
took from after a shader 'end' instruction.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5815>
fd.o has retuned the x86 runners on packet for -j8. Rather than having to
tweak our CI every time fd.o decides to rebalance job concurrency, respect
what the runner admin has chosen for their builds (this will also be
convenient for people with large local runners).
Reviewed-by: Michel Dänzer <michel@daenzer.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5669>
If the SPIR-V had a shared+image memory barrier, we would emit two NIR
barriers: a shared barrier and an image barrier.
Unlike a single barrier, two barriers allows transformations such as:
intrinsic image_deref_store (ssa_27, ssa_33, ssa_34, ssa_32, ssa_25) (1)
intrinsic memory_barrier_shared () ()
intrinsic memory_barrier_image () ()
intrinsic store_shared (ssa_35, ssa_24) (0, 1, 4, 0)
->
intrinsic memory_barrier_shared () ()
intrinsic store_shared (ssa_35, ssa_24) (0, 1, 4, 0)
intrinsic image_deref_store (ssa_27, ssa_33, ssa_34, ssa_32, ssa_25) (1)
intrinsic memory_barrier_image () ()
This commit fixes two dEQP-VK.memory_model.* CTS tests with ACO.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5951>
When the VRAM size is small and the preferred heap only VRAM,
the kernel tries to always honor the requested heap and it does
a ton of evictions which is a disaster for performance.
On APUs, VRAM and GTT have similar performance, so allow the
kernel to choose the best placement (GTT or VRAM) itself.
This gives a huge performance boost with Doom Eternal on RAVEN.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5665>
This fix si_compute_copy_image for yuyv image (so using PIPE_FORMAT_R8G8_R8B8_UNORM).
With this change, the following gst pipeline produce the expected results for various
image sizes (with or without AMD_DEBUG=nodma):
gst-launch-1.0 filesrc location=input.jpg ! jpegparse ! vaapijpegdec ! filesink location=output.yuv
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5841>
Properly handle the difference between split and merged register file
when determining where arrays can fit without conflicting with other
arrays or pre-colored instructions.
1) if not mergedregs, only consider other things with same precision
as potentially conflicting
2) if mergedregs, calculate everything in therms of half-regs and
convert back to fullregs in the end
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5957>
We shouldn't divide-by-two for half-reg arrays. We set the proper node
interference class, based on `arr->half`.
Fixes a RA fail with 16b arrays:
src/freedreno/ir3/ir3_ra.c:633: name_to_array: Assertion `!"invalid array name"' failed.
Caused by use/def iterators returning `arr->length` vreg namess, but
only assigning the array half that many names.
Also, since we are assigning unique vreg names to each array element,
there is no need to try and convert from half-reg to it's conflicting
full reg when pre-coloring the array elements. Getting us closer to
having half-arrays work sanely with split-register-file (a5xx and
earlier).
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5957>
Fixes the following build error:
In file included from external/mesa/src/panfrost/encoder/pan_blit.c:34:
In file included from external/mesa/src/panfrost/encoder/../midgard/midgard_compile.h:27:
external/mesa/src/compiler/nir/nir.h:52:10: fatal error: 'nir_opcodes.h' file not found
^~~~~~~~~~~~~~~
1 error generated.
Fixes: 293f251871 ("panfrost: Use Midgard-specific reloads")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5961>
legalize_block() can get run multiple times, which I didn't notice when
adding fine derivs support. Other instruction clones change things such
that the legalization won't trigger again, but that didn't apply to the
DS.PP legalization. To keep someone else from tripping over this, split
the one-shot legalization out of the iterative sync flag application.
Fixes failures in dEQP-VK.glsl.derivate.dfdxfine.*
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3198
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5699>
However complicated DCC addressing is it is still based on tiles.
If we have the intra-tile offsets + tile dimensions we can expand
that to the full image ourselves.
Behavior around ~1080p on a 2500U:
old:
30-60 ms on every miss
new:
5 ms initally (miss in the tile cache)
<0.5 ms afterwards
The most common case is that the tile cache only contains data for
2 tiles, which for Raven/Renoir/Navi14 will be 4 KiB each, so the
size increase is fairly modest.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5865>
Normally, the linker will pull in any compilation unit (aka .c file) from
a static lib (such as our shared util code) that is depended on by the
code linking against it. Since that code is already compiled, the .text
section is allowed to jump anywhere in .text, and the compiler can't
garbage collect unused functions inside of a compile unit.
Teasing callgraphs apart so that normal compilation-unit-level GCing can
reduce driver size hurts the logical organization of the code and is
difficult. As an example, once I'd split the format pack/unpack tables, I
had to split out util_format_read/write() from util_format.c to avoid
pulling in pack/unpack. But even then it didn't help, because it turns
out turnip's pack calls pull in util_format_bptc.c for bptc packing, but
that file also includes the unpack impls, and those internally call
util_format_unpack, and thus we pulled in all of unpack. Splitting all of
this to separate files makes code harder to find and maintain, and is a
waste of dev time.
By setting these compiler flags, the compiler puts each function and data
symbol in a separate ELF section and the linker can then safely GC unused
text and data sections from a compile unit that gets pulled in. There's a
bit of a space cost due to having those separate sections, but it ends up
being a huge win in disk space on my personal release driver builds:
- i965_dri.so -213k
- x86 gallium dri.so -430k
- libvulkan_intel.so -272k
- aarch64 gallium dri.so -330k
- libvulkan_freedreno.so -783k
No difference on iris drawoverhead -compat -test 1 on my skylake (n=60)
Effect on debugoptimized build times (n=5)
touch nir_lower_io.c build time (bfd) +15.999% +/- 3.80377%
touch freedreno fd6_gmem.c build time (bfd) +13.5294% +/- 4.86363%
touch nir_lower_io.c build time (lld) no change
touch freedreno fd6_gmem.c build time (lld) +2.45375% +/- 2.2383%
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Acked-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5739>
I see no reason to hide this. The small hit in cycle count is offset in
practice by the increase in thread count. So let's ship it and get some
testing.
If this regresses a workload:
1. Open an issue on the tracker and attach an apitrace.
2. In the meantime set PAN_MESA_DEBUG=nofp16 to override.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5960>
I haven't noticed tftp boot issues in last few days, not sure if they
where just a fluke on Mon or if it is somehow related to # of jobs we
run (ie. having more of the c630 runners powered up and running more
of the time).
Let's turn them back on and see what happens.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5952>
warning: converting a packed ‘instr_cf_t’ {aka ‘union <anonymous>’}
pointer (alignment 1) to a ‘uint16_t’ {aka ‘short unsigned int’} pointer
(alignment 2) may result in an unaligned pointer value
[-Waddress-of-packed-member]
We may know that we'll only ever have aligned instr_cf_ts, but gcc
doesn't.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5955>
Based on the colour buffers in use, we need to select a tile size
allowing either 128-bits of storage per pixel or 512-bits. Based on the
size chosen, we scale the offsets into the tilebuffer. Likewise, we need
to calculate offsets based on bpp (with special cases) rather than
picking an average case.
Fixes regressions that otherwise would be caused by the next commit.
v2: Fix colour clears (Icecream95).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5858>
pos offset only applies to the gl_FragPos input, when I refactored
I messed that up, only use pos_offset for the position inputs
and use 0.5 otherwise.
This fixes:
GTF-GL45.gtf30.GL3Tests.fragment_coord_conventions.fragment_coord_conventions_multisample
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5926>
Currently the test crashes with LLVM errors
Stored value type does not match pointer operand type!
store <8 x i32> %s_dst, <8 x i8>* %261
Change the stored type for 8-bit stencil formats.
Fixes:
GTF-GL45.gtf44.GL31Tests.texture_stencil8.texture_stencil8_gl44
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5926>
v2: Be more explicit about sampler types. Prefer the term "load" to
"resolve" to match VK convention. Generate shaders for MRT 8x. Blit
shader generation adds about 6ms to startup cost. We could cache thes.
shaders to disk if we needed to (or indeed, ship binaries).
v3: Fallback on u_blitter on Bifrost so Bifrost continues to work.
KHR_partial_update support is mostly no-oped on Bifrost now, but that's
okay for now - compositors are still functional.
v4: Specialize on multisample state as well to enable reloads of MSAA
textures. This requires 2x the shader variants, so I assume we're up to
12ms startup cost for generation. Annoying. Also fix interactions with
depth- or stencil-only clears of combined depth-stencil surfaces.
v5: Cache to the device (screen) instead of the context, reducing
duplicated work in apps that create many contexts (e.g. Chromium)
v6: Squash in KHR_partial_update cleanup to fix intermediate
regressions on a few tests.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5824>
Some of existing texture levels can be corruted,
after calling 'glTexImage' with param 'level' higher than
max expected value 'floor(log2(max(width, height, depth)))'.
To fix we prevent overwriting image buffer pointer
in 'st_texture_object', if it was already allocated
for multiple mip-levels storage.
Fixes piglit test: 'arb_copy_image add-illegal-levels'
Signed-off-by: Yevhenii Kharchenko <yevhenii.kharchenko@globallogic.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5785>
When generating the VPM write instruction for geometry shader outputs,
emit_store_output_gs ends up adding the base and offset arguments
together with an ADD instruction. The addition was done at the VIR level
after scheduling so it always ends up right next to the corresponding
stvpm instruction. Most of the time the offset is constant but nothing
does any constant folding at the VIR level.
This patch makes it instead fold the addition into the offset at the NIR
level in v3d_nir_lower_io so that the NIR-level constant folding can get
rid of the addition most of the time.
v2: Use nir_iadd_imm to simplify the code. (Eric Anholt)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5825>
GL 4.6 changed error code for when the effective target of the
texture is illegal. Since it's not an illegal enum they modified
it to be an illegal operation.
However the CTS test for this is missing support for two cases,
I'm chasing that up, but I expect this will cause a CTS regression
for anyone who runs this test. I'm leaning on the side of being
compliant rather than passing the test until the test is fixed.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5896>
The fast log2 works better if rho is squared, i.e.
fast_log2(sqrt(2)) == 0.4
0.5 * fast_log2(2) == 0.5
so just square rho, and always divide by 2 afterwards.
Fixes:
GTF-GL45.gtf30.GL3Tests.sgis_texture_lod.sgis_texture_lod_basic_lod_selection
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5820>
The headers hadn't been regenerated from envytools in a long time,
and there were a few minor divergences.
Based on envytools commit c20929ed0f3be18b8419f7332ee22d905feb6589
Among other things, rnndb has changed naming to G80/etc, for now
I've not tackled switching that over and replaced the nvidia
codenames back to the chip ids that mesa uses with the following:
$ sed -i 's/G80_2D/NV50_2D/g' rnndb/graph/g80_2d.xml.h
$ sed -i 's/GF100_2D/NVC0_2D/g' rnndb/graph/g80_2d.xml.h
No other modifications of the headergen'd headers was done, which
was helped by the differing #define's being unutilised presently.
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5920>
The commit below changed the rule such that it accidentally also applied
to the non-MR pipelines created by Marge, resulting in Marge triggering
twice as many jobs as necessary.
Fixes: 549b4a3dd4 "gitlab-ci: Automatically run pipelines for Marge
Bot pre-merge only"
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5898>
Real writeout stores, which break execution, need to be moved to after
dual-source stores, which are just standard register writes.
v2: Don't move stores forward, to avoid moving them to before where
their source is written.
v3: Only reorder past dual-source stores.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5620>
this requires some modifications to the ntv slot remapping, as we definitely
don't want to reserve another 25% of the available slots for the (deprecated)
texcoord varyings
now we remap VARYING_SLOT_TEX[n] to the last available slots, which lets us avoid
needing to do permanent reservation, and we check to make sure that we haven't
seen the corresponding texcoord varying any time we emit a non-texcoord varying in
that slot
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5551>
v2: move declarations into libdrm
v3: fix typos
rework handling of how much memory to reserve
v4: remove unused parameter
unmap cutout on error and when the screen is destroyed
v5: move into screen_create
enable HMM only if CL gets enabled
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5906>
The tu_cs_entry struct doesn't match well what we want for SET_DRAW_STATE
and CP_INDIRECT_BUFFER (requires extra steps to get iova and size), so
start phasing it out.
Additionally, use newly added tu_cs_draw_state where it doesn't require any
effort (it requires a fixed size, but gets rid of the extra end_sub_stream)
Note this also changes the behavior of CmdBindDescriptorSets for compute to
emit directly in cmd->cs instead of doing through a CP_INDIRECT.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5558>
Some testing showed that the DIRTY bit has the desired behavior, so use it
to make things a bit simpler.
Note in CmdBindPipeline, having the TU_CMD_DIRTY_DESCRIPTOR_SETS behind a
if(pipeline->layout->dynamic_offset_count) was wrong.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5558>
v2: move declarations into libdrm
v3: fix typos
rework handling of how much memory to reserve
v4: remove unused parameter
unmap cutout on error and when the screen is destroyed
v5: move into screen_create
enable HMM only if CL gets enabled
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4580>
I'm not convinced we'll actually want to use this, and there may be
another enable bit in SP_UNKNOWN_AB00, but it's nice to at least write
this down in case we want to try using it in the future.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5877>
It turns out that this clears CP_LOAD_STATE6 packets, including
disabling any pending loads for SS6_INDIRECT/SS6_BINDLESS (these loads
don't actually happen until the draw itself, and I'm not sure if they
happen if the state is unused by the shader) and marking constants and
UBO descriptors loaded with SS6_DIRECT as invalid. It's used very
differently from HLSQ_UPDATE_CNTL on a4xx from whence the name came, and
unlike on a4xx it's not readable, so this probably doesn't line up with
HLSQ_UPDATE_CNTL on a4xx.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5877>
Map all formats to a valid ifmt. FMT6_10_10_10_2_UNORM_DEST still
doesn't work on the blitter so keep that one on the u_blitter path.
Fixes
dEQP-GLES31.functional.draw_buffers_indexed.random.max_implementation_draw_buffers.*
with FD_MESA_DEBUG=nogmem.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5717>
We didn't know how to write layer id without GS, since that's the only way
to do it through VK/GL, and the blob didn't implement this clear case (and
failed cases where it was absolutely necessary). However now we know how to
set it after some educated guesses and looking at tess/geom traces, so the
GS path can be dropped.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5790>
v2. Define new helper function to avoid duplicated a pair of function calls.
v3. Move new helper functions to vk_object.h and call them.
v4. Merge 2 commits to use commomn base object type and struct into one.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5539>
as in the comment, while we may want to try verifying that discard will be
the last instruction in a block, it's a bit problematic given that other nir
passes we're doing may insert instructions after a discard as part of e.g.,
nir_opt_dead_cf in the process of removing another block
fixes shaders@glsl-fs-discard-04
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5852>
if the last instruction in a loop's body terminates a block, e.g., from
a nested loop with a jump as its final instruction, then no block will
have been started when returning to the original loop, and there's no need
to emit a branch
fixes shaders@glsl-vs-continue-in-switch-in-do-while
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5852>
inline a query result value to each query object so we can stash the partial
result just before we do a pool reset, which will always happen during the
suspend/resume query mechanism that swaps active queries from a flushed batch
to the next batch
once (or if) the "real" call to fetch query results is called, we can dump the
inlined value into the fetch value and return the full results
fixesmesa/mesa#3000
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5533>
xfb queries allocate vk buffer objects in the underlying driver which
can be deallocated while an xfb query is in-flight if we attempt to
defer it due to the way that gl xfb is translated to vk, so we need to
continue forcing this behavior in that case
for other query types, we can safely defer here until the current batch has
finished rather than block
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5533>
this hooks up query objects to the batches that they're actively running on
(and the related fence) in order to manage the lifetimes of queries more
efficiently by calling vkCmdResetQueryPool only on init and when the query
pool has been completely used up. additionally, this resolves some vk spec
issues related to destroying pools with active queries
note that any time a query pool is completely used up, results are lost,
which is a very slight improvement on the previous abort() that was triggered
in that scenario
ref mesa/mesa#3000
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5533>
Mali (and Vulkan) uses D3D naming conventions for these formats where
Gallium/Mesa uses OpenGL names, but the formats are equivalent. sRGB is
communicated out-of-band on Mali; otherwise, it appears to be a 1:1
mapping.
On supported devices, this exposes GL_EXT_texture_compression_rgtc and
GL_ARB_texture_compression_bptc, so update features.txt
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5856>
../src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp:69:8: warning: type ‘struct opProperties’ violates the C++ One Definition Rule [-Wodr]
69 | struct opProperties
| ^
../src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp:88:8: note: a different type is defined in another translation unit
88 | struct opProperties
| ^
../src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp:77:17: note: the first difference of corresponding definitions is field ‘fShared’
77 | unsigned int fShared : 3;
| ^
../src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp:96:17: note: a field with different name is defined in another translation unit
96 | unsigned int fImmd : 4; // last bit indicates if full immediate is suppoted
nvc0 code also has the same thing.
v2: rename both paths (Karol)
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5873>
We don't use linked texture/samplers. Fixes a bunch of CTS issues which
also seem to fail a bit randomly depending on what tests ran before and
such, so the list is incomplete.
Fixes:
KHR-GL46.texture_gather.*
KHR-GL46.compute_shader.resource-texture
KHR-GL46.multi_bind.dispatch_bind_samplers
KHR-GL46.multi_bind.dispatch_bind_textures
KHR-GL46.shading_language_420pack.binding_sampler_array
KHR-GL46.shading_language_420pack.binding_sampler_single
KHR-GL46.shading_language_420pack.binding_samplers
KHR-GL46.stencil_texturing.functional
KHR-GL46.texture_gather.incomplete-texture-last-comp
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5874>
Whenever a struct type is decorated Block or BufferBlock we turn that
into a GLSL_TYPE_INTERFACE. Since these decorations can end up random
places, we should allow them for constants.
Closes: #3252
Fixes: 9d0ae777dd "spirv: Use interface type for block and buffer..."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5855>
This is required when the shader uses local memory for arrays or spills.
I really dislike how it's done right now, but oh well, it's the same for
other gens.
Fixes CTS tests:
KHR-GL46.shading_language_420pack.binding_image_array
KHR-GL46.shading_language_420pack.length_of_compute_result
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5840>
From the Kaby Lake PRM Vol. 7 "Assigning Conditional Flags":
* Note that the [post condition signal] bits generated at
the output of a compute are before the .sat.
Paragraph about post_zero does not mention saturation, but
testing it on actual GPUs shows that conditional modifiers
are applied after saturation.
* post_zero bit: This bit reflects whether the final
result is zero after all the clamping, normalizing,
or format conversion logic.
For signed types we don't care about saturation: it won't
change the result of conditional modifier.
For floating and unsigned types there two special cases,
when we can remove inst even if scan_inst is saturated: G
and LE. Since conditional modifiers are just comparations
against zero, saturating positive values to the upper
limit never changes the result of comparation.
For negative values:
(sat(x) > 0) == (x > 0) --- false
(sat(x) <= 0) == (x <= 0) --- true
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2610
Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4167>
Even though the boolean can be divergent, the control flow can be (at
least partially) uniform. For example, we don't have to create any
s_andn2_b64/s_and_b64/s_or_b64 instructions with this code:
a = ...
loop {
b = bool_phi a, c
if (uniform)
break
c = ...
}
d = phi c
fossil-db (Navi):
Totals from 5506 (4.05% of 135946) affected shaders:
SGPRs: 605720 -> 604024 (-0.28%)
SpillSGPRs: 52025 -> 51733 (-0.56%)
CodeSize: 65221188 -> 64957808 (-0.40%); split: -0.41%, +0.00%
Instrs: 12637881 -> 12584610 (-0.42%); split: -0.42%, +0.00%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3388>
The compute shader dirtying is a bit wrong here, since we don't
have a second stage like for fragment shaders, so dirty the compute
shader whenever a sampler or image changes, (ssbo/contexts don't
needs this).
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5835>
It's been pointed out to me that determining whether a commit is present
in a stable branch is non-trivial (cherry-picks are a pain to search for)
and the commands are hard to remember, making it too much to ask.
This script aims to solve that problem; at its simplest form, it only
takes a commit and a branch and tells the user whether that commit
predates the branch, was cherry-picked to it, or is not present in any
form in the branch.
$ bin/commit_in_branch.py e58a10af64 fdo/20.1
Commit e58a10af64 is in branch 20.1
$ echo $?
0
$ bin/commit_in_branch.py dd2bd68fa6 fdo/20.1
Commit dd2bd68fa6 was backported to branch 20.1 as commit d043d24654
$ echo $?
0
$ bin/commit_in_branch.py master fdo/20.1
Commit 2fbcfe170bf50fcbcd2fc70a564a4d69096d968c is NOT in branch 20.1
$ echo $?
1
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5306>
With turnip we can have sparse input variables like:
decl_var shader_in INTERP_MODE_NONE float @1 (VERT_ATTRIB_GENERIC1.x, 1, 0)
decl_var shader_in INTERP_MODE_NONE float @2 (VERT_ATTRIB_GENERIC1.y, 1, 0)
decl_var shader_in INTERP_MODE_NONE float @3 (VERT_ATTRIB_GENERIC1.w, 1, 0)
Example of a test fixed:
dEQP-VK.glsl.440.linkage.varying.component.vert_in.vec2.as_float_float_unused
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5818>
If INTEL_DEBUG=no16 is used, then simd16 will not be attempted. This,
in turn prevents simd32 from running, because we attempt to skip
simd32 when simd16 fails to compile.
This change more accurately recognizes when we attempted simd16, but
simd16 failed.
One easy way to cause an issue is to set both no8 and no16. Before
this change, we would be left with no FS program, even though simd32
could still be generated in some cases.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5269>
If no16 was specified, and the shader can't run in simd8 due to the
local_size, then we need to generate a simd32 program.
If both no8 and no16 are specified, then we need to generate a simd32
program.
Rework:
* Drop update of `if` that would have changed `do32` to try simd32
even if simd16 spilled registers. (Caio)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5269>
the new ci-fairy minio on ci-templates can copy
data to/from the MinIO server with much less permissions.
Upgrading mesa to this commit will allow us to restrict the
git-cache bucket permission to only "fetch" objects, i.e.
not allow anybody to walk through the tree of any repo.
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5804>
Mark support for Panfrost with the PAN_MESA_DEBUG=gles3 flag set (which
exposes a few buggier features for GLES 3.0, but we're actually quite
close to conformance. I expect this to become default in a few weeks),
based on what's supported for Mali T860 (our flagship). Less features
are supported on Mali T720 due to h/w limitations, and Bifrost support
is very much still in the pipes but will support all this soon enough.
Closes: #255
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5791>
Otherwise a VS doing the following:
out gl_PerVertex {
vec4 gl_Position;
int gl_ViewportIndex;
};
cannot be compiled because of the following error:
"redeclaration of gl_PerVertex must be a subset of the built-in
members of gl_PerVertex"
v2: add GLSL_PRECISION_HIGH param to add_varying() for "gl_Layer" in
generate_fs_special_vars.
v3: add GLSL_PRECISION_HIGH param to add_varying() for "gl_Layer" in
generate_varyings.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2946
Tested-by: John Galt <johngalt@fake.mail>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v3)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v3)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5167>
EXT_shader_image_load_store says:
The format of the image unit must be in the "1x32" equivalence class
otherwise the atomic operation is invalid.
ARB_shader_image_load_store says:
We will only support 32-bit atomic operations on images
Fixes: fc0a2e5d01 ("glsl: add EXT_shader_image_load_store new image functions")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5688>
It's something that was added to ease development, but that was supposed
to be removed before merging.
It also causes problems when arm-related jobs aren't enabled, as
arm_build is needed by these jobs but in that case isn't there.
Also extend from .ci-run-policy.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5802>
From the Vulkan spec 1.2.146:
"VK_ACCESS_MEMORY_READ_BIT specifies all read accesses. It is
always valid in any access mask, and is treated as equivalent
to setting all READ access flags that are valid where it is
used."
"VK_ACCESS_MEMORY_WRITE_BIT specifies all write accesses.
It is always valid in any access mask, and is treated as
equivalent to setting all WRITE access flags that are valid
where it is used."
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3241
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5807>
Otherwise LLVM does not see the pointers as allowing speculative
loads.
The pipeline-db results are pretty wild, but mostly what is to be
expected from allowing more code movement in LLVM:
Totals from affected shaders:
SGPRS: 157728 -> 168336 (6.73 %)
VGPRS: 158628 -> 158664 (0.02 %)
Spilled SGPRs: 10845 -> 24753 (128.24 %)
Spilled VGPRs: 13 -> 13 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 8 -> 8 (0.00 %) dwords per thread
Code Size: 17189180 -> 17313712 (0.72 %) bytes
LDS: 204 -> 204 (0.00 %) blocks
Max Waves: 5700 -> 5687 (-0.23 %)
Wait states: 0 -> 0 (0.00 %)
This gives some boosts for shaders we can move a descriptor load
outside a loop.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3159>
This updates a3xx/a4xx/a5xx to fix the fetchsize to "PITCHALIGN" (called
"MINLINEOFFSET" by the a3xx docs), and some simplifications to make things
more like a6xx. Also similar simplifications for a2xx layout code.
The pitch can always be determined using a simple calculation from the base
level pitch, so don't pre-calculate a pitch for each mipmap level.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5796>
I want the docs to be discoverable next to the code, and sphinx insists
that all docs are under the top-level docs dir (sigh). We can't symlink
from that dir to .gitlab-ci because windows builds can't do symlinks, so
link back the other direction.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5510>
In the case where SSA use/def chains are broken, NIR prints out a very
cryptic error and then aborts. This abort happens during validation
rather than after the print is complete, hiding any other errors that
may have been found. One might think, "So what? Fix your use/def issue
first." However, what makes this especially bad is that, when use/def
chains are broken, there's usually a much nicer error inline in the
shader that would have been printed had we not aborted early so the
current behavior simply ensures you get the most cryptic error possible
in an already difficult-to-debug case.
While we're at it, we remove the one other case of abort() which is in
the validation of phi instruction sources.
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Tested-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5809>
according to EXT_multiview_draw_buffers, gl_FragColor outputs to all available
render targets when used, so we need to translate this to gl_FragData[PIPE_MAX_COLOR_BUFS]
in order to correctly handle more than one color buffer attachment
this fixes the rest of spec@arb_framebuffer_object tests in piglit
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5687>
ntq_setup_fs_inputs and ntq_setup_gs_inputs sort the inputs according to
the driver location. This input array is then used to calculate the VPM
offset for the outputs in the previous stage. However, it wasn’t taking
into account variables that are packed into a single varying slot. In
that case they would have the same driver_location and are
distinguished by location_frac.
This patch makes it additionally sort by location_frac when the driver
locations are equal. This can happen when the compiler packs varyings
that are sized less than vec4. Without this fix, when the VPM is used to
transmit data free-form between the stages (such as VS->GS) then it
would end up writing to inconsistent locations.
Fixes dEQP tests such as:
dEQP-GLES31.functional.primitive_bounding_box.lines.global_state.
vertex_geometry_fragment.default_framebuffer_bbox_equal
Fixes: 5d578c27ce ("v3d: add initial compiler plumbing for geometry shaders")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5787>
gcc is not smart enough to see that
enum pipe_format dst_fmt;
...
switch (data_size) {
case 16:
dst_fmt = PIPE_FORMAT_R32G32B32A32_UINT;
...
break;
case 12:
/* RGB32 is not a valid RT format. This will be handled by the pushbuf
* uploader.
*/
break;
case 8:
dst_fmt = PIPE_FORMAT_R32G32_UINT;
...
break;
case 4:
dst_fmt = PIPE_FORMAT_R32_UINT;
...
break;
case 2:
dst_fmt = PIPE_FORMAT_R16_UINT;
...
break;
case 1:
dst_fmt = PIPE_FORMAT_R8_UINT;
break;
default:
assert(!"Unsupported element size");
return;
}
...
if (data_size == 12) {
...
return;
}
Does not result in dst_fmt being uninitialized when it is used so
lets just initialise it to silence the warning.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5766>
When executing for a single primitive, the mask has only one active
lane, however the vertex emit emits for all the lanes, pass in
the active mask and write the excess lanes to the overflow slot.
Fixes:
glsl-1.50-gs-max-output -scan 1 20
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5555>
The current code runs primitives per invocation, but the spec wants
invocations per primitive. However it means having to flush
after each invocation to get correct XFB behaviour
Fixes:
GTF-GL41.gtf40.GL3Tests.transform_feedback3.transform_feedback3_geometry_instanced
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5555>
For correct XFB queries all streams must get primitive lengths
recorded. This allocates larger memory for per-stream lengths
and the shader write into them.
Fixes:
GTF-GL41.gtf40.GL3Tests.transform_feedback3.transform_feedback3_streams_queried
GTF-GL41.gtf40.GL3Tests.transform_feedback3.transform_feedback3_streams_overflow
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5555>
There may be non-stream 0 emitted primitives that have to be processed.
Fixes:
KHR-GL42.transform_feedback_overflow_query_ARB.multiple-streams-one-buffer-per-stream
KHR-GL42.transform_feedback_overflow_query_ARB.multiple-streams-multiple-buffers-per-stream
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5555>
It will return the same table every time with no other side effects, so we
want it to be CSEed. Saves 3.5k on my aarch64 GL drivers, almost 9k on
turnip, but weirdly increases my x86 GL driver collection by ~3k.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5728>
Fill the global bo will all possible shaders for 3D clear/blit. Note the
global bo size is still <4k (so this doesn't cost any extra memory), this
saves having to allocate shaders in sub_cs everytime the 3D path is used.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5776>
Place the kernel and ramdisk into a place in the file server so the URL
will only change when the contents also change.
Also put the Mesa build into a separate tarball so the ramdisk's
contents don't change every build.
With proper caching in place, all devices in the same farm need only to
download the mesa tarball once, saving time.
As we switch to MinIO for making kernels and rootfs available to LAVA
devices, we can stop using Docker to distribute them.
Instead, build when needed in separate jobs that push directly to MinIO,
from where LAVA devices can download them.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5515>
If we clear depth only texture via glClearTex(Sub)Image it may cause:
../src/intel/blorp/blorp_genX_exec.h:1554: blorp_emit_surface_states: Assertion `params->depth.enabled || params->stencil.enabled' failed.
due to clear_depth_stencil calling blorp_clear_depth_stencil when
depth is already fast-cleared and there is no stencil.
Fixes piglit test: arb_clear_texture-depth
Fixes: 51638cf18a
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5770>
In some places in SWR cod objects are initialized using
memset/memcpy. This is usually done to enable
allocating those objects in aligned memory.
It generates compilation warnings though,
which are worked around by casting the pointers to void*
before calling memset/memcpy.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5777>
For example only some UCPs may be used by the shader, triggering asserts
that too many consts are being uploaded.
While we're at it, also fix the const size when loading UCPs, since
otherwise it doesn't correspond to what the shader is actually using.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5752>
Don't lower to offsets, instead use nir_lower_explicit_io here and
use actual pointers for UBO's and SSBO's. This makes
KHR_variable_pointers trivial. This also fixes asserts with shared
variables, which are now supposed to be lowered with
nir_lower_explicit_io.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5684>
V3D has a bit to set the line caps to be perpendicular to the line
rather than aligned to the edges of the framebuffer. I don’t know what
the disadvantages are of enabling this, but I noticed by experimentation
that enabling line smoothing on the Intel driver also enables nicer line
caps, so it seems nice to enable it here too.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5624>
When line smoothing is enabled, the driver now increases the width of
the line so that it can add some semi-transparent pixels to either side
of the line. A lowering pass is added which modifies the alpha component
of every write to fragment output 0 so that if the fragment is outside
the width of the line then the alpha is reduced. It additionally
discards fragments that are completely invisible. It might seem bad to
use discard on a tiled renderer but the assumption is that any bad
effects from using discard will also happen anyway because of enabling
alpha blending.
v2: Disable the line smoothing pass entirely when the framebuffer
contains an integer colour output or one with no alpha channel.
Calculate the coverage once upfront and store in a global variable
instead of calculating each time an output write is modified. Also
do the conditional discard once upfront.
v3: Don’t check whether the output buffer has an alpha channel. Only
look at output 0. Use aa_line_width intrinsic instead of calculating
the real line width in the shader. Clamp the coverage as part of the
global variable, not per output write.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5624>
The first intrinsic is intended to expose the value set by glLineWidth
to shaders internally. The second intrinsic exposes the value actually
sent to the hardware. This may be wider than the first one in order to
implement anti-aliasing. These will be used in later patches to
implement a line smoothing lowering pass.
v2: Add a second intrinsic for the expanded line width for
anti-aliasing.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5624>
perf_cfg is enough - it already contains almost all necessary
information and is constructed in a more optimal way (O(n) vs O(n^2)
- it uses hash table to build the unique counter list).
"Almost all", because it doesn't contain OA raw counters, but
we should have not exposed them anyway. Quoting Mark Janes:
"I see no reason to include the OA raw counters in the list that
are provided to the user. They are unusable.
The MDAPI library can be used to configure raw counters in a way
that provides esoteric metrics, but that library is written against
INTEL_performance_query."
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5399>
For some reason, in order to get all tests to pass, pretty much all
hardware (across vendors) has to program in offset_units * 2. This fixes
dEQP-GLES3.functional.polygon_offset.float32_displacement_with_units.
While we're at it, add polygon offset clamp support.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5763>
For turnip, we use the "bindless" model on a6xx. Loads and stores with
the bindless model require a bindless base, which is an immediate field
in the instruction that selects between 5 different 64-bit "bindless
base registers", a 32-bit descriptor index that's added to the base, and
the usual 32-bit offset. The bindless base usually, but not always,
corresponds to the Vulkan descriptor set. We can handle the case where
the base is non-constant by using a bunch of if-statements, to make it a
little easier in core NIR, and this seems to be what Qualcomm's driver
does too. Therefore, the pointer format we need to use in NIR has a vec2
index, for the bindless base and descriptor index. Plumb this format
through core NIR.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5683>
Previously we would match the start of the compatible string, in
a couple of cases, in order to match compatible strings like
"qcom,adreno-630.2". But these cases would always list a more
generic compatible (ie. "qcom,adreno") as a later choice. So if
we parse all the compatible strings, we can do a more precise
exact match.
This avoids us accidentially matching on "qcom,adreno-smmu" and
the hilarity that ensues.
Fixes: 5a13507164 ("freedreno/perfcntrs: add fdperf")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5762>
This was causing vkAcquireNextImageKHR to not signal the fences and
semaphores. In the case where the semaphore was brand new, this could
cause an unsignalled syncobj to be passed into execbuffer2 which it will
reject with -EINVAL leading to VK_ERROR_DEVICE_LOST. Thanks to Henrik
Rydgård who works on the PPSSPP project for helping me figure this out.
Fixes: ca3cfbf6f1 "vk: Add an initial implementation of the actual..."
Fixes: 778b51f491 "vulkan/wsi: Add a hooks for signaling semaphores..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5672>
The previous version assumes tess level outputs will only be written once
in the shader, however its not possible to guarantee that.
It also assumes all invocations will write all the levels, which is also
not guaranteed.
This is required to fix the "tesselation" and "terraintessellation" demos
with turnip.
The comment about nir_lower_io_to_temporaries in lower_tess_ctrl_block is
removed because nir_lower_io_to_temporaries specifically skips TESS_CTRL
shaders so the comment doesn't make sense.
The split load for tess levels workaround is removed, the new version only
has scalar access unless if ever gets vectorized.
This sets NIR_COMPACT_ARRAYS cap to avoid the glsl tess vec lowering with
gallium. It seems this will also disable "LowerCombinedClipCullDistance",
which I'm not sure was needed or not.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5744>
Accidentally broke this when rebasing the offending commit.
My use case with non-zero explicit offset is UV plane of UBWC NV12, and
only the UBWC slice offset is used for the UBWC sampler, so I didn't catch
it immediately.
Fixes: d53dc6c376 ("freedreno/fdl6: rework layout code a bit (reduce linear align to 64 bytes)")
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5761>
Some Piglit tests (rightfully) fail because of min >= max when exposed
to perf counters that do not explicitly define their max value.
Failing tests:
spec/amd_performance_monitor/api/test_counter_info
spec/amd_performance_monitor/vc4/test_counter_info
u32/u64 changes are no-ops.
Fixes: 4cd1cfb983 ("st/mesa: implement GL_AMD_performance_monitor")
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5473>
ir3 already calculates the stride in the tess param bo, so use that instead
of a incorrect calculation. The calculation of per_vertex_output_size /
per_patch_output_size is wrong because it counts dwords instead of bytes,
and what it counts for per_vertex_output_size is a per-patch size because
the glsl type is already an array of # vertex/patch elements.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5743>
XXH32 is doing access through u32 *, and with strict aliasing the compiler
gets to assume that those are independent of the u16 writes we did in
fd6_texture_key setup, and based on various tweaks to the code, would
result in bad hashes computed after inlining. The failure was:
../src/util/hash_table.c:326:_mesa_hash_table_search_pre_hashed: Assertion
`ht->key_hash_function == ((void *)0) || hash == ht->key_hash_function(key)'
failed.)
By setting these two flags, we always take the unaligned,
memcpy-the-32-bit-data path. I believe this should be same perf on x86
(which will happily unaligned load 32 bits in the end), while it will be
slower on arm (where you have to a special unaligned load operation iirc).
This should still be far faster than our old hash.
Fixes: edd62619a1 ("freedreno: replace fnv1a hash function with xxhash")
Acked-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5271>
The old code updated the viewport index on the first vertex in
a primitive, however it was picking the first vertex wrong
when used with geometry shaders.
This code has access to the prim info with the primitive lengths
so instead keep track of when a new primitive starts by tracking
the lengths and updating the viewport index then. The prim info
is only valid after a GS or prim assembly, so enable prim assembly
if a vertex shader ever uses viewport index.
This fixes:
piglit arb_viewport_array-render-viewport-2
KHR-GLES31.core.viewport_array.draw_to_single_layer_with_multiple_viewports,Fail
KHR-GLES31.core.viewport_array.draw_mulitple_viewports_with_single_invocation,Fail
KHR-GLES31.core.viewport_array.draw_multiple_layers,Fail
KHR-GLES31.core.viewport_array.depth_range,Fail
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5489>
* Remove scratch_bo from cmdbuffer, use a device-global bo instead, which
also includes border color (and eventually shaders for 3D blit path)
* Use CP_SET_BIN_DATA5_OFFSET to allow setting VSC buffer addresses only
once at the start of the cmdstream
* Use scratch bo mechanism for a resizable VSC buffer
* Use feedback from "vsc_draw_overflow" and "vsc_prim_overflow" values to
increase the size of VSC buffer when beginning to record a new cmdbuffer
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5570>
Loop through pipes and then loop over the tiles in that pipe instead of
looping over all tiles then having to calculate the pipe # and slot #.
Mainly this avoids the hard to follow "config_get_tile" logic, but should
also be a gain due to better use of cache with the VSC data.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5570>
Compute the tiling config at framebuffer creation time. A framebuffer will b
be re-used multiple times, so this will avoid having to re-calculate the
tiling config every time a command buffer is recorded.
The tiling config already couldn't use the render area's x1/y1 because of
hw binning, this move makes it so the render area isn't used at all for the
tiling config.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5570>
When fetching for cube maps, we need to interpret them as 2d texture
arrays, being the third coordinate the index for the face.
Fixes Vulkan CTS tests like the following using v3dv:
dEQP-VK.binding_model.shader_access.primary_cmd_buf.storage_image.fragment.single_descriptor.cube_base_mip
dEQP-VK.binding_model.shader_access.primary_cmd_buf.storage_image.compute.multiple_descriptor_sets.multiple_contiguous_descriptors.cube_array_base_mip
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5675>
Over the last 7 days, git pulls represented a total of 1.7 TB.
On those 1.7 TB, we can see:
- ~300 GB for the CI farm on hetzner
- ~730 GB for the CI farm on packet.net
- ~680 GB for the rest of the world
We can not really change the rest of the world*, but we can
certainly reduce the egress costs towards our CI farms.
Right now, the gitlab runners are not doing a good job at
caching the git trees for the various jobs we make, and
we end up with a lot of cache-misses. A typical pipeline
ends up with a good 2.8GB of git pull data. (a compressed
archive of the mesa folder accounts for 280MB)
In this patch, we implemented what was suggested in
https://gitlab.com/gitlab-org/gitlab/-/issues/215591#note_334642576
- we host a brand new MinIO server on packet
- jobs can upload files on 2 locations:
* git-cache/<namespace>/<project>/<branch-name>.tar.gz
* artifacts/<namespace>/<project>/<pipeline-id>/
- the authorization is handled by gitlab with short tokens
valid only for the time of the job is running
- whenever a job runs, the runner are configured to execute
(eval) $CI_PRE_CLONE_SCRIPT
- this variable is set globally to download the current cache
from the MinIO packet server, unpack it and replace the
possibly out of date cache found on the runner
- then git fetch is run by the runner, and only the delta
between the upstream tree and the local tree gets pulled.
We can rebuild the git cache in a schedule job (once a day
seems sufficient), and then we can stop the cache miss
entirely.
First results showed that instead of pulling 280MB of data
in my fork, I got a pull of only 250KB. That should help us.
* arguably, there are other farms in the rest of the world, so
hopefully we can change those too.
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Reviewed-by: Peter Hutterer <peter.hutterer@who-t.net>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5428>
Dot product is multiplication followed by addition, and absolute value
does not distribute into addition.
Only vec4 platforms are affected by this change as scalar-only platforms
never have any of the fdot_replicated instructions. In the shader-db
results, below, shaders in MANY different applications are affected.
Trine, Doom3, Enemy Territory: Quake Wars, Counter Strike: Global
Offensive, Mad Max, Metro Last Light, and on and on... I'm really
shocked that there were no test regressions!
All Haswell and earlier platforms had similar results. (Haswell shown)
total instructions in shared programs: 16219743 -> 16219820 (<.01%)
instructions in affected programs: 12171 -> 12248 (0.63%)
helped: 1
HURT: 78
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.78% max: 0.78% x̄: 0.78% x̃: 0.78%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.35% max: 2.38% x̄: 0.91% x̃: 1.06%
95% mean confidence interval for instructions value: 0.92 1.03
95% mean confidence interval for instructions %-change: 0.78% 1.00%
Instructions are HURT.
total cycles in shared programs: 538481383 -> 538491045 (<.01%)
cycles in affected programs: 470796 -> 480458 (2.05%)
helped: 149
HURT: 142
helped stats (abs) min: 1 max: 1338 x̄: 71.13 x̃: 4
helped stats (rel) min: 0.06% max: 40.99% x̄: 2.76% x̃: 0.67%
HURT stats (abs) min: 1 max: 2092 x̄: 142.68 x̃: 12
HURT stats (rel) min: 0.07% max: 55.38% x̄: 5.07% x̃: 1.07%
95% mean confidence interval for cycles value: -5.28 71.69
95% mean confidence interval for cycles %-change: -0.07% 2.19%
Inconclusive result (value mean confidence interval includes 0).
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Fixes: 62795475e8 ("nir/algebraic: Distribute source modifiers into instructions")
Closes: #3129
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5581>
We can end up with bad dependencies with a depth/stencil export. Let's
let the writeout special cases consume these values if possible, using a
move otherwise in which case it won't be used in the other slots anyway.
total instructions in shared programs: 50508 -> 50507 (<.01%)
instructions in affected programs: 12 -> 11 (-8.33%)
helped: 1
HURT: 0
total bundles in shared programs: 25640 -> 25640 (0.00%)
bundles in affected programs: 0 -> 0
helped: 0
HURT: 0
total quadwords in shared programs: 40899 -> 40899 (0.00%)
quadwords in affected programs: 0 -> 0
helped: 0
HURT: 0
total registers in shared programs: 3917 -> 3916 (-0.03%)
registers in affected programs: 3 -> 2 (-33.33%)
helped: 1
HURT: 0
total threads in shared programs: 2455 -> 2455 (0.00%)
threads in affected programs: 0 -> 0
helped: 0
HURT: 0
total spills in shared programs: 168 -> 168 (0.00%)
spills in affected programs: 0 -> 0
helped: 0
HURT: 0
total fills in shared programs: 186 -> 186 (0.00%)
fills in affected programs: 0 -> 0
helped: 0
HURT: 0
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5513>
"when: on_success" meant that that the job wouldn't run automatically
until all jobs of all earlier stages passed, and would be skipped if
any of them failed. But we need to always run this job if any
documentation files were modified.
Fixes: 8e2cb8ef27 "gitlab-ci: Extend .ci-run-policy template for docs
jobs"
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5711>
When set, EXPAND_LINE_WIDTH expands the line width by 1/cos(a),
where a is the minimum angle from horizontal or vertical. This
seems required by OpenGL line rasterization but not by Vulkan.
Similar to what AMDVLK and AMDGPU-PRO do for AA wide lines.
This fixes
dEQP-VK.rasterization.interpolation_multisample_*_bit.*lines_wide.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5698>
This isn't fully free of bugs, but it's good to get CI working,
so fixing those bugs doesn't break anything.
The main buggy areas are missing indirect texture size,
and transform feedback geometry streams.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3778>
We generate bitfields of bits that we want to retain (mask) and bits
that we want to set (brw_mode) in the cr0 register, so the bits we want
to set should be in the set of bits we want to retain.
Also, remove the initialization of mask from
fs_visitor::emit_shader_float_controls_execution_mode since
brw_rnd_mode_from_nir initializes the mask parameter unconditionally.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5566>
Only run the job automatically for Marge Bot, otherwise let it be
triggered manually.
v2:
* Never run this job for the main project, since it's only needed in
pre-merge pipelines.
* Add comment explaining that cases not covered by explicit rules
default to "when: never".
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5469>
This code was broken, but it worked by accident, as the
pad and the edgeflag were reversed, however when Roland
removed the cliptest field back in 2015, he stopped copying
the pad which actually stopped copy the edgeflag.
Fix the function to actually copy the edgeflag.
Fixes: 1b22815af6 ("draw: don't pretend have_clipdist is per-vertex")
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5679>
If we have code like:
('f2f16', ('vec2', ('f2f32', 'a@16'), '#b@32'))
We would like to eliminate the conversions, but the existing rules can't
see into the the (heterogenous) vector. So instead of trying to
eliminate in one pass, we add opts to propagate the f2f16 into the
vector. Even if nothing further happens, this is often a win since then
the created vector is smaller (half2 instead of float2). Hence the above
gets transformed to
('vec2', ('f2f16', ('f2f32', 'a@16')), ('f2f16', '#b@32'))
Then the existing f2f16(f2f32) rule will kick in for the first component
and constant folding will for the second and we'll be left with
('vec2', 'a@16', '#b@16')
...eliminating all conversions.
v2: Predicate on !options->vectorize_vec2_16bit. As discussed, this
optimization helps greatly on true vector architectures (like Midgard)
but wreaks havoc on more modern SIMD-within-a-register architectures
(like Bifrost and modern AMD). So let's predicate on that.
v3: Extend for integers as well and add a comment explaining the
transforms.
Results on Midgard (unfortunately a true SIMD architecture):
total instructions in shared programs: 51359 -> 50963 (-0.77%)
instructions in affected programs: 4523 -> 4127 (-8.76%)
helped: 53
HURT: 0
helped stats (abs) min: 1 max: 86 x̄: 7.47 x̃: 6
helped stats (rel) min: 1.71% max: 28.00% x̄: 9.66% x̃: 7.34%
95% mean confidence interval for instructions value: -10.58 -4.36
95% mean confidence interval for instructions %-change: -11.45% -7.88%
Instructions are helped.
total bundles in shared programs: 25825 -> 25670 (-0.60%)
bundles in affected programs: 2057 -> 1902 (-7.54%)
helped: 53
HURT: 0
helped stats (abs) min: 1 max: 26 x̄: 2.92 x̃: 2
helped stats (rel) min: 2.86% max: 30.00% x̄: 8.64% x̃: 8.33%
95% mean confidence interval for bundles value: -3.93 -1.92
95% mean confidence interval for bundles %-change: -10.69% -6.59%
Bundles are helped.
total quadwords in shared programs: 41359 -> 41055 (-0.74%)
quadwords in affected programs: 3801 -> 3497 (-8.00%)
helped: 57
HURT: 0
helped stats (abs) min: 1 max: 57 x̄: 5.33 x̃: 4
helped stats (rel) min: 1.92% max: 21.05% x̄: 8.22% x̃: 6.67%
95% mean confidence interval for quadwords value: -7.35 -3.32
95% mean confidence interval for quadwords %-change: -9.54% -6.90%
Quadwords are helped.
total registers in shared programs: 3849 -> 3807 (-1.09%)
registers in affected programs: 167 -> 125 (-25.15%)
helped: 32
HURT: 1
helped stats (abs) min: 1 max: 3 x̄: 1.34 x̃: 1
helped stats (rel) min: 20.00% max: 50.00% x̄: 26.35% x̃: 20.00%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 16.67% max: 16.67% x̄: 16.67% x̃: 16.67%
95% mean confidence interval for registers value: -1.54 -1.00
95% mean confidence interval for registers %-change: -29.41% -20.69%
Registers are helped.
total threads in shared programs: 2471 -> 2520 (1.98%)
threads in affected programs: 49 -> 98 (100.00%)
helped: 25
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.96 x̃: 2
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for threads value: 1.88 2.04
95% mean confidence interval for threads %-change: 100.00% 100.00%
Threads are [helped].
total spills in shared programs: 168 -> 168 (0.00%)
spills in affected programs: 0 -> 0
helped: 0
HURT: 0
total fills in shared programs: 186 -> 186 (0.00%)
fills in affected programs: 0 -> 0
helped: 0
HURT: 0
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4999>
Commit e369b8931c ("freedreno: Use explicit *_NONE enum for undefined
formats") only partially converts ~0 to *_NONE enum. It breaks texture
support, and glmark2 texture scene gives a black screen.
Adding the missing conversion of ~0 to *_NONE enum fixes the issue.
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5693>
Phase two of our network reconfiguration is happening this afternoon, so
we need to drop our RK3399 out for a little while. (Part of this
reconfiguration is to shard our devices across networks and racks, so
losing one part of our infrastructure doesn't mean losing any particular
device type.)
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5689>
Build job artifacts capture Meson logs from _build, so we can analyse
what Meson did during configuration, as well as the full output of any
test jobs.
We were previously calling our build directory 'build', which meant it
wouldn't have been captured by the artifacts, and we were also deleting
it to make really sure there was no chance of logs getting captured
either.
Rename the build directory to '_build' to match the others, and don't
delete it either, so we can keep our configure/test logs.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5689>
GLSL shares functionality with ARB_vertex_program but the GLSL
spec defines the gl_LightSource builtin with a member order that
is different from the packing expected in ARB_vertex_program.
This difference introduces a need for specialist lowering code
when handling builtin structs that is not required for normal
uniform structs due to member location mismatches.
Since gl_LightSource can't be redefined it shouldn't matter if
we add the members in the order listed in the spec, just so long
as we add them all. So here we rearrange the definition of the
glsl builtin to reflex our internal layout and that of
ARB_vertex_program. This required for the following patch.
CC: <stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5656>
If the underlying X11 window gets destroyed, the event we're waiting
for may never be delivered, in which case xcb_wait_for_special_event
would hang indefinitely.
Solution:
1. Use xcb_poll_for_special_event to check if an event has arrived yet.
2. If not, Wait up to ~1s for XCB's file descriptor to become readable;
if it does, go back to step 1.
3. If the file descriptor didn't become readable, make a round-trip to
the X server to check that the window still exists. Go back to step
1 if it does, otherwise bail.
Also add an early bail-out when it's known that the window was
destroyed.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/116
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5368>
Before, if one thread ended up waiting in dri3_wait_for_event_locked
and another one in loader_dri3_wait_for_msc at the same time, one thread
could end up processing an event the other thread was waiting for, which
could result in the latter thread waiting longer than necessary
(possibly indefinitely).
Noticed by inspection.
v2:
* Drop xcb_flush call from loader_dri3_wait_for_msc in favour of the one
in dri3_wait_for_event_locked (Kenneth Graunke)
Fixes: 7b0e8264dd "loader/dri3: Try to make sure we only process our
own NotifyMSC events"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5368>
This is not completely tested, but matches the max array pitch allowed by
A6XX_TEX_CONST_9_FLAG_BUFFER_ARRAY_PITCH.
Note this still doesn't allow all image sizes, but it allows 16384x16384
cpp=4 images to work.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5678>
SPI_SHADER_COL_FORMAT allocates export memory and CB_SHADER_MASK
map them to higher MRTs if necessary. The hardware allows to remap
MRTs to avoid holes somehow.
For example, if we have a scenario where MRT0 is unused and only
MRT1 and MRT2 are used, SPI_SHADER_COL_FORMAT is 0x77 and
CB_SHADER_MASK/CB_TARGET_MASK are 0x770 (this assumes
SPI_SHADER_UINT16_ABGR is set).
This allows us to remove one workaround that was added for fixing
GPU hangs with DXVK. I think this is because SPI_SHADER_COL_FORMAT
expects contiguous MRTs to be allocated.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5434>
The code was expecting the lower 32-bits of the 64-bit to be
what it wanted, don't be implicit, pull the value from the union.
This should fix rendering on big endian systems since NIR was
introduced.
Fixes: 44a6b0107b ("gallivm: add nir->llvm translation (v2)")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5677>
resinfo always writes 3 components, which was not being taken into account
Fixes these tests:
dEQP-VK.renderpass.suballocation.attachment_sparse_filling.input_attachment_3
dEQP-VK.renderpass.suballocation.attachment_sparse_filling.input_attachment_7
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5674>
These were accidentally dropped when cleaning up the TOC, making links to
them dead. Because we used plain links, sphinx didn't inform us that
these became dead. Let's restore them.
Fixes: 14f2a81b6f ("docs: drop open-coded toc for articles")
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5671>
So it could be used by both the OpenGL and the Vulkan driver.
In addition to the move, some small changes were needed to be made on
the API. For example, the simulator was receiving v3d_screen on
initialization, and that code setted v3d_screen->sim_file. Now it
returns the new sim_file created.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5666>
tu_ImportFenceFdKHR is used by tu_AcquireImageANDROID, which may or
may not work, but let's at least keep things compiling until somebody
has time to tie up the loose ends on the Android side.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5670>
Chris Wilson noted that u_default_texture_subdata's transfer path
sometimes results in wasteful double copies. This patch is based
on an earlier path he wrote, but updated now that we have staging
blits for busy or compressed textures.
Consider the case of idle, non-CCS-compressed, tiled images:
The transfer-based CPU path has to return a "linear" mapping, so upon
map, it mallocs a temporary buffer. u_default_texture_subdata then
copies the client memory to this malloc'd buffer, and transfer unmap
performs a tiled_memcpy to copy it back into the texture. By writing
a direct texture_subdata() implementation, we're able to directly do
a tiled_memcpy from the client memory into the destination texture,
resulting in only one copy.
For linear buffers, there is no advantage to doing things directly, so
we simply fall back to u_default_texture_subdata()'s transfer path to
avoid replicating those cases.
We still may want to use GPU staging buffers for busy destinations
(to avoid stalls) or CCS-compressed images (to compress the data),
at which point we also fall back to the existing path. We thought
to try and use a tiled temporary, but this didn't appear to help.
Improves performance in x11perf -shmput500 by 1.96x on my Icelake.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2500
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3818>
drmCommandWriteRead gives us a -errno, and we only checked for -1 (-EPERM,
incidentally). All the callers wanted 0 for errors, which they were
getting by the fact that req.value was 0-initialized in our stack
allocation (though this only works as long as the kernel doesn't return an
error after setting req.value to something), and -EPERM not really being
an answer we would expect from an ioctl at this stage in the driver.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2769>
There is only one queue for now, so for non-shared semaphores, the
implementation is basically a no-op. For shared semaphores, this
always uses syncobjs. This depends on syncobj support in the msm
kernel driver.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2769>
Adds a shader disk-cache for ir3 shader variants. Note that builds with
`-Dshader-cache=false` have no-op stubs with `disk_cache_create()` that
returns NULL.
Binning pass variants are serialized together with their draw-pass
counterparts, due to shared const-state.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5372>
For shader-cache, we are going to want to serialize them together.
Which is awkward if the two related variants are not compiled together.
This also decouples allocation and compile, which will simplify adding
shader-cache (which still needs to allocate, but can skip compile).
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5372>
Currently we always do sysmem if there is tess. And for GS, the binning
pass VS ends up identical to the draw pass VS, so no point in compiling
it twice. (For GS what we should do someday is generate a binning pass
GS, and possibly if we can do cross-stage linking opts, an optimized
binning pass VS, but the required outputs would somehow have to end up
in the shader variant key.)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5372>
The next step is to hook this into pscreen->finalize_nir() so it can
come before the state tracker's shader-caching.
Unfortunately we still need to do lower_io after mesa/st, so that is
split out into a post-finalize pass.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5372>
I created a new and cleaner favicon for mesa3d.org, and it seems like a
good idea to use that one for the docs as well.
While we're at it, replace the original PNG with the original SVG asset
the ICO-file was generated from.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5643>
This was already done correctly for the indirect variants, and turnip
was setting the correct value, but it seems freedreno missed the change
in the non-indirect variant. Also, fix a misspelling of "indices" and
add a type to INDX_SIZE.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5644>
This was assuming that unused temporaries are written but never read,
since the NOP register can only be used as a destination register,
but we can end up here also for temporaries that are read once but
never written.
This was found with a graphicsfuzz test that has a switch with
cases that have unreachable discards. In that test, NIR genrates
code like this:
decl_reg vec3 32 r19
...
r20 = mov r19.z
r21 = mov r19.y
r22 = mov r19.x
Where r19.xyz would generate 3 temporary registers that are read but
never written, so we would rewrite them to point to the NOP register
as QPU instruction sources, which is not allowed and would hit an
assert that expect magic reads to be from [r0,r5] only.
Fixes:
dEQP-VK.graphicsfuzz.unreachable-switch-case-with-discards
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5645>
The offset for the VPM write for storing outputs from the geometry
shader isn’t necessarily uniform across all the lanes. This can happen
if some of the lanes don’t emit some of the vertices. In that case the
offset for the subsequent vertices will be different in each lane. In
that case we need to use the stvpmd instruction instead of stvpmv
because it will scatter the values out.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3150
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5621>
This was making it so that the CI would error if the set of files modified
or the pipeline involvd meant the jobs we depend on weren't enabled. It
was just some misplaced debug leftovers of mine.
Fixes: b88c46fa11 ("ci: Add a freedreno a630 tracie run.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5653>
Produced by comparing the traces of:
dEQP-VK.rasterization.culling.front_triangles
dEQP-VK.rasterization.culling.front_triangles_point
dEQP-VK.rasterization.culling.front_triangles_line
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5650>
Builds the renderdoc and apitrace programs so we can replay GL traces on
DUTs.
[Separated out from 5472's commit that also enabled the jobs in LAVA,
dropped unnecessary python packages from arm_build, fixed up arm64_test
build, traces-db in baremetal, new commit message by anholt]
Signed-off-by: Rohan Garg <rohan.garg@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5433>
Note: going by the blob, VFD_INDEX_OFFSET/FD_INSTANCE_START_OFFSET seem
completely unused by indirect draws, so this changes them to only be set
for non-indirect draws (and moves them to the vs_params draw state).
Passes dEQP-VK.draw.shader_draw_parameters.*
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5635>
Fixes a case where you have something like:
aVecOutput.z = aScalarInput;
In particular, skipping over things that are not the first component is
wrong.. in the above case the input we need to precolor is the 3rd
component. But we need to adjust the target register according to the
offset.
Fixes android.hardware.nativehardware.cts.AHardwareBufferNativeTests
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5601>
The previous value was not being cleared, resulting in some dynamic stencil
state failures. Fixes these two tests:
dEQP-VK.dynamic_state.ds_state.stencil_params_advanced
dEQP-VK.dynamic_state.ds_state.stencil_params_basic_1
Fixes: 233610f8cf ("turnip: refactor draw states and dynamic states")
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5586>
In rare cases, we can get into situations where the tracking of read/
written resources triggers a flush of the current batch.
To handle that, (1) take a reference to the current batch, so it doesn't
disappear under us, and (2) check after resource tracking whether the
current batch was flushed. If it is, we have to re-do the resource
tracking, but since we have a fresh batch, it should not get flushed the
second time around.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3160
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5634>
We don't really need the varying remapping, and it seems to somehow
happen twice when shader-cache comes into the picture. But we can
just choose not to have this problem.
Now that everything is using the ir3_point_sprite() helper, we can
flip this pipe cap without it being a massive flag-day.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5595>
When we flip the texcoord patch, we'll setup PNTC input slot in the
pre-built interp stateobj, rather than this being a draw-time (slow-
path) built stateobj. But rather than duplicate more of the slow-
path logic, refactor it out into a helper that is reused in both
cases.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5595>
CreateRenderPass2 is the common path now, it doesn't make sense to have a
create_render_pass_common. Rename it to tu_render_pass_gmem_config and
move logic not related to gmem config out of it.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5451>
This gets rid of the some unnecessary values that were stored in
tu_render_pass for this. It also makes the render_pass_add_implicit_deps
more generic, with very few references to the tu_render_pass.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5451>
It doesn't cut down the code size by much, and might not be the ideal for
performance (unless the compiler is unexpectedly smart), but makes it
easier to maintain (no modifying the same code in two places) and will
allow some simplifications since we wont have to worry about trying to
share code between the two versions.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5451>
If the only two registers available are consecutive and used by killed
operands, both of them will be blocked and fail the edge check.
Totals from 903 (0.66% of 135946) affected shaders:
VGPRs: 30892 -> 30884 (-0.03%)
CodeSize: 1584468 -> 1584044 (-0.03%); split: -0.05%, +0.02%
MaxWaves: 14374 -> 14378 (+0.03%)
Instrs: 306482 -> 306399 (-0.03%); split: -0.06%, +0.03%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5626>
Simplification and alignment with meson's sources generation rules
Changelog:
- move rules from src/gallium/drivers/freedreno/Android.gen.mk to Android.ir3.mk
- simplify LOCAL_GENERATED_SOURCES based on $(ir3_GENERATED_FILES)
- remove includes of src/gallium/drivers/freedreno/Android.gen.mk
- remove src/gallium/drivers/freedreno/Android.gen.mk
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5580>
Changelog:
- Makefile.sources: add ir3_lexer.c and ir3_parser.{c,h} generated sources
- Android.ir3.mk: add the necessary generated sources rules
- Android.ir3.mk: add the necessary include paths
- src/gallium/drivers/freedreno/Android.gen.mk: generate only ir3_nir_{imul,trig}.c for the moment
Fixes the following building error:
target C: libfreedreno_ir3 <= external/mesa/src/freedreno/ir3/ir3_assembler.c
FAILED: out/target/product/x86_64/obj/STATIC_LIBRARIES/libfreedreno_ir3_intermediates/ir3/ir3_assembler.o
...
external/mesa/src/freedreno/ir3/ir3_assembler.c:28:10: fatal error: 'ir3_parser.h' file not found
^~~~~~~~~~~~~~
1 error generated.
Fixes: 1e8808a4a0 ("freedreno/ir3: refactor out helper to compile shader from asm")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5580>
Fixes the following building error:
FAILED: out/target/product/x86_64/obj/SHARED_LIBRARIES/gallium_dri_intermediates/LINKED/gallium_dri.so
...
ld.lld: error: undefined symbol: fdl5_layout
clang-9: error: linker command failed with exit code 1 (use -v to see invocation)
Fixes: a1a739995b ("freedreno/a5xx: Move resource layout to fdl.")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Acked-by: Rob Clark <robdclark@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5580>
The load_per_vertex_{input,output} intrinsics simply mean that they're
reading an arrayed input/output, which have one element per invocation.
Most accesses to those use gl_InvocationID as the subscript. However,
it's totally possible to read any element of the array. For example,
an evaluation shader might read gl_in[2].gl_Position, or a control
shader might read output[0].
For threads processing a single patch, an input/output load is
convergent if and only if both sources (the per-vertex-array subscript
and the offset) are convergent. For threads processing multiple
patches, we continued to mark them divergent.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5613>
layout_resource_for_modifier() needs to handle DRM_FORMAT_MOD_INVALID
as well, since src/gallium/frontends/dri/dri2.c uses this to indicate
"no modifier" when it's called through the older non-modifier entry
points.
This is similar to 334788d4 ("freedreno: allow INVALID modifier") but
for the generic implementation.
Fixes: 98910626 ("freedreno/a6xx: Implement layout for DRM_FORMAT_MOD_QCOM_COMPRESSED")
Closes: #3154
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5611>
We don't want to use it on gen5 and earlier because only RNDD can be
done with a single instruction and we can implement RNDU(x) as -RNDD(-x)
so it's better to just do that when we have the instruction. On gen6
and above, we may as well just use the right instruction.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5596>
Previously all nir_intrinsic_load_uniform that were used as sources were
considered to be dynamically_uniform but when offsets of load_uniform
are indirect it can not be determined.
This fixes artefacts in Google Maps 3D view in V3D.
Fixes: 886d46b089 ("nir: Add a function to determine if a source is dynamically uniform")
Reviewed-by: Neil Roberts <nroberts@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5587>
This cleans up the CmdDraw* functions to be more straightforward. And a few
fixes applied while going through it:
* Fix indirect draw commands not adding the buffer->bo_offset, and ignoring
drawCount/stride parameters (deqp tests not testing indirect draws very
much apparently).
* Fixed a potential issue with RESTART_INDEX + secondary command bufs.
* Add missing logic for 8-bit indices
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
# Conflicts:
# src/freedreno/vulkan/tu_cmd_buffer.c
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5579>
Rework the streamout state and at the same time fix some issues, the
biggest one being to actually use the counter buffers instead of ignoring
them completely.
(note it appears the dEQP tests are bad and able to pass with the previous
broken behavior of not ever reading/writing from the counter buffers)
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5579>
If there are holes between color outputs (e.g. a shader exports MRT1, but
not MRT0), we can remove the holes by moving higher MRTs lower.
The hardware will remap the MRTs to their correct locations if we remove
holes in SPI_SHADER_COL_FORMAT but not CB_SHADER_MASK.
This is a performance optimization, but MRTs with holes are pretty rare,
so there is most likely no effect on any app.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5535>
The original issue asked for all the keys in a single file, but I didn't
do that because it's much easier to manage and verify the keys as
separate files, but sphinx doesn't provide a way to expose a folder so
we'd need to create an index.html and have it list all the keys
manually, which is very error prone.
At this point, we might as well just concatenate the keys and expose
a single file, so let's do that.
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5568>
We are starting to see platforms that don't support the get/set tiling
uAPI. (For example, DG1.)
Additionally on DG1 we shouldn't be using the map_gtt anymore.
Let's add some asserts and make sure we don't take those paths
accidentally.
Rework:
* Jordan: Only apply for DG1, not all gen12
* Rafael: Use has_tiling_uapi
* Jordan: Copy has_tiling_uapi from devinfo
* Jordan: merge in "iris: Rework iris_bo_import_dmabuf() a little."
* Jordan: Continue to call get/set_tiling on modifier path
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4956>
As per discussion on !5059, we don't see any particular reason as to
why MERGEDREGS should be disabled on HS/DS/GS, and none of the dEQP
tests (both VK and GL) fail when MERGEDREGS is enabled. In fact, some
of the VK dEQP tests fail when MERGEDREGS is disabled (e.g. tests
with shaders that employ a0.x). As a result, let's just enable
MERGEDREGS unconditionally on a6xx.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5059>
This commit adds tessellation support for draws. We store the IR3
patch type in tu_pipeline so we can use it in tu_emit_draw_*. We then
convert the IR3 patch type to the native adreno patch type and set
the appropriate reg values.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5059>
To store tess outputs, the HS stg's into two buffers, one for
per-vertex/per-patch output variables (tess_param) and one for
TessLevelInner/Outer (tess_factor). The addresses of these buffers
are uploaded as consts to the HS/DS and the tess_factor iova is
written to REG_A6XX_PC_TESSFACTOR_ADDR. While the sizes of these
buffers are a function of vetex count and patch count, allocation is
relatively straightforward on freedreno- just keep track of the max
required buffer size for the entire batch and allocate before batch
submit.
In Vulkan, however, a given pipeline can be bound multiple times
across any number of command buffers, each drawing with a different
number of vertices. One solution is to track the max buffer size for
the entire command buffer (similar to fd_batch) and on
vkEndCommandBuffer, allocate appropriately sized tess BOs. Since the
tess BOs addresses are emitted as part of the pipeline state setup
(e.g. PKT4 to REG_A6XX_PC_TESSFACTOR_ADDR), we need to create a new
state group independent of a specific pipeline and parameterize its
IB with the command buffer specific tess BO iovas.
Without a larger refactor, the simplest way to do this is just to
emit per-draw call consts and leverage scratch_bo to re-use buffers.
This way we won't have to store and rewrite earlier packets in the
command stream on vkEndCommandBuffer.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5059>
lower_tess_ctrl_block assumes that the gl_TessLevel*
intrinsic_store_outputs have already been collapsed into a single
instruction before the tess lowering step:
store_output ... /* base=0 */ /* wrmask=xyzw */ /* component=0 */
store_output ... /* base=1 */ /* wrmask=xy */ /* component=0 */
While this is true in fd because of st_nir_vectorize_io, we don't do
the same lowering in turnip so each tess level component still has
its own store instruction:
store_output ... /* base=0 */ /* wrmask=x */ /* component=0 */
store_output ... /* base=0 */ /* wrmask=x */ /* component=1 */
store_output ... /* base=0 */ /* wrmask=x */ /* component=2 */
store_output ... /* base=0 */ /* wrmask=x */ /* component=3 */
store_output ... /* base=1 */ /* wrmask=x */ /* component=0 */
store_output ... /* base=1 */ /* wrmask=x */ /* component=1 */
This commit adds a component offset to the tess control lowering. An
alternative is to also perform nir_lower_io_to_vector in turnip, but
ir3 seems to generate the same assembly either way and it's nice to
not have a lowering prereq before tess lowering.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5059>
To enable lowering of tess-related shaders, this commit sets the
tessellation primitive field of the ir3_shader_key. In addition,
this commit sets various tessellation flags for
spirv_to_nir configuration.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5059>
The GLSL to NIR compiler supports the LowerTessLevel flag to convert
gl_TessLevelInner/Outer from their GLSL declarations as arrays of
floats to vec4/vec2s to better match how they are represented in
hardware.
This commit adds the similar support to the SPIR-V to NIR compiler so
turnip can use the same IR3/NIR tess lowering passes as freedreno.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5059>
This commit adds a tess_levels_are_sysvals flag to
spirv_to_nir_options similar to GLSLTessLevelsAsInputs in the GLSL to
NIR compiler options. This will be used by turnip as the tess IR3
lowering pass (ir3_nir_lower_tess) operates on TessLevelInner and
TessLevelOuter in the DS as sysvals.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5059>
Adds PIPE_CAP_PRIMITIVE_RESTART_FIXED_INDEX which is a subset of the
primitive restart cap for when the hardware can only support the fixed
indices specified in GLES.
The switch statements were automatically modified with this command:
find \( \( -name \*.cpp -o -name \*.c \) \! -type l \) \
-exec sed -i -r \
's/^(\s*case\s+PIPE_CAP_PRIMITIVE_RESTART)\s*:.*$/\0\n\1_FIXED_INDEX:/' \
{} \;
v2: Add a note in screen.rst
Reviewed-by: Eric Anholt <eric@anholt.net> (v1)
Reviewed by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5559>
When doing RA we end up with adding ValueDef references to Values across
all over the shader. This is all fine until we remove the Instruction
defining those Values, which happens when spilling values.
Instead of manipulating the values directly we should just track all
merged in defs in a seperate structure and remove stale references when
an instruction gets deleted in the spiller.
fixes following libasan report:
=================================================================
==612087==ERROR: AddressSanitizer: heap-use-after-free on address 0x6150003ea380 at pc 0x7f1d12142fe9 bp 0x7fffca6fd120 sp 0x7fffca6fd110
READ of size 8 at 0x6150003ea380 thread T0
#0 0x7f1d12142fe8 in nv50_ir::ValueDef::get() const ../src/gallium/drivers/nouveau/codegen/nv50_ir.h:648
#1 0x7f1d12143c02 in nv50_ir::Value::getUniqueInsn() const ../src/gallium/drivers/nouveau/codegen/nv50_ir_inlines.h:229
#2 0x7f1d1221530d in nv50_ir::RegAlloc::BuildIntervalsPass::addLiveRange(nv50_ir::Value*, nv50_ir::BasicBlock const*, int) ../src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp:333
#3 0x7f1d1221872e in nv50_ir::RegAlloc::BuildIntervalsPass::visit(nv50_ir::BasicBlock*) ../src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp:686
#4 0x7f1d1215676c in nv50_ir::Pass::doRun(nv50_ir::Function*, bool, bool) ../src/gallium/drivers/nouveau/codegen/nv50_ir_bb.cpp:495
#5 0x7f1d121563ed in nv50_ir::Pass::run(nv50_ir::Function*, bool, bool) ../src/gallium/drivers/nouveau/codegen/nv50_ir_bb.cpp:477
#6 0x7f1d122262b8 in nv50_ir::RegAlloc::execFunc() ../src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp:1910
#7 0x7f1d122256b0 in nv50_ir::RegAlloc::exec() ../src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp:1849
#8 0x7f1d12226f1e in nv50_ir::Program::registerAllocation() ../src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp:1970
#9 0x7f1d1214092a in nv50_ir_generate_code ../src/gallium/drivers/nouveau/codegen/nv50_ir.cpp:1275
#10 0x7f1d1227461b in nvc0_program_translate ../src/gallium/drivers/nouveau/nvc0/nvc0_program.c:634
#11 0x7f1d12294b21 in nvc0_sp_state_create ../src/gallium/drivers/nouveau/nvc0/nvc0_state.c:620
#12 0x7f1d12294d90 in nvc0_fp_state_create ../src/gallium/drivers/nouveau/nvc0/nvc0_state.c:661
#13 0x7f1d12ad4912 in st_create_fp_variant ../src/mesa/state_tracker/st_program.c:1498
#14 0x7f1d12ad4cd5 in st_get_fp_variant ../src/mesa/state_tracker/st_program.c:1525
#15 0x7f1d12ad8252 in st_precompile_shader_variant ../src/mesa/state_tracker/st_program.c:2053
#16 0x7f1d12c9c851 in st_program_string_notify ../src/mesa/state_tracker/st_cb_program.c:185
#17 0x7f1d12d17731 in st_link_tgsi ../src/mesa/state_tracker/st_glsl_to_tgsi.cpp:7441
#18 0x7f1d12cabaf0 in st_link_shader ../src/mesa/state_tracker/st_glsl_to_ir.cpp:175
#19 0x7f1d127c85ca in _mesa_glsl_link_shader ../src/mesa/program/ir_to_mesa.cpp:3186
#20 0x7f1d1252a9f7 in link_program ../src/mesa/main/shaderapi.c:1285
#21 0x7f1d1252a9f7 in link_program_error ../src/mesa/main/shaderapi.c:1384
#22 0x7f1d1252deb3 in _mesa_LinkProgram ../src/mesa/main/shaderapi.c:1876
#23 0x403e13 in main._omp_fn.0 /home/kherbst/git/shader-db/run.c:926
#24 0x7f1d17b8b4b5 in GOMP_parallel (/lib64/libgomp.so.1+0x124b5)
#25 0x4029e4 in main /home/kherbst/git/shader-db/run.c:765
#26 0x7f1d179b51a2 in __libc_start_main ../csu/libc-start.c:308
#27 0x402d1d in _start (/home/kherbst/git/shader-db/run+0x402d1d)
0x6150003ea380 is located 0 bytes inside of 504-byte region [0x6150003ea380,0x6150003ea578)
freed by thread T0 here:
#0 0x7f1d17e5d96f in operator delete(void*) (/usr/lib64/libasan.so.5.0.0+0x11096f)
#1 0x7f1d1214ec0f in __gnu_cxx::new_allocator<nv50_ir::ValueDef>::deallocate(nv50_ir::ValueDef*, unsigned long) /usr/include/c++/9/ext/new_allocator.h:128
#2 0x7f1d1214dc00 in std::allocator_traits<std::allocator<nv50_ir::ValueDef> >::deallocate(std::allocator<nv50_ir::ValueDef>&, nv50_ir::ValueDef*, unsigned long) /usr/include/c++/9/bits/alloc_traits.h:470
#3 0x7f1d1214c5fb in std::_Deque_base<nv50_ir::ValueDef, std::allocator<nv50_ir::ValueDef> >::_M_deallocate_node(nv50_ir::ValueDef*) /usr/include/c++/9/bits/stl_deque.h:624
#4 0x7f1d121498c4 in std::_Deque_base<nv50_ir::ValueDef, std::allocator<nv50_ir::ValueDef> >::_M_destroy_nodes(nv50_ir::ValueDef**, nv50_ir::ValueDef**) /usr/include/c++/9/bits/stl_deque.h:758
#5 0x7f1d1214704d in std::_Deque_base<nv50_ir::ValueDef, std::allocator<nv50_ir::ValueDef> >::~_Deque_base() /usr/include/c++/9/bits/stl_deque.h:680
#6 0x7f1d12145371 in std::deque<nv50_ir::ValueDef, std::allocator<nv50_ir::ValueDef> >::~deque() /usr/include/c++/9/bits/stl_deque.h:1069
#7 0x7f1d1213bc5b in nv50_ir::Instruction::~Instruction() ../src/gallium/drivers/nouveau/codegen/nv50_ir.cpp:615
#8 0x7f1d1213fb2f in nv50_ir::Program::releaseInstruction(nv50_ir::Instruction*) ../src/gallium/drivers/nouveau/codegen/nv50_ir.cpp:1148
#9 0x7f1d122250fb in nv50_ir::SpillCodeInserter::run(std::__cxx11::list<std::pair<nv50_ir::Value*, nv50_ir::Value*>, std::allocator<std::pair<nv50_ir::Value*, nv50_ir::Value*> > > const&) ../src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp:1830
#10 0x7f1d12221445 in nv50_ir::GCRA::allocateRegisters(nv50_ir::ArrayList&) ../src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp:1541
#11 0x7f1d122262e9 in nv50_ir::RegAlloc::execFunc() ../src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp:1913
#12 0x7f1d122256b0 in nv50_ir::RegAlloc::exec() ../src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp:1849
#13 0x7f1d12226f1e in nv50_ir::Program::registerAllocation() ../src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp:1970
#14 0x7f1d1214092a in nv50_ir_generate_code ../src/gallium/drivers/nouveau/codegen/nv50_ir.cpp:1275
#15 0x7f1d1227461b in nvc0_program_translate ../src/gallium/drivers/nouveau/nvc0/nvc0_program.c:634
#16 0x7f1d12294b21 in nvc0_sp_state_create ../src/gallium/drivers/nouveau/nvc0/nvc0_state.c:620
#17 0x7f1d12294d90 in nvc0_fp_state_create ../src/gallium/drivers/nouveau/nvc0/nvc0_state.c:661
#18 0x7f1d12ad4912 in st_create_fp_variant ../src/mesa/state_tracker/st_program.c:1498
#19 0x7f1d12ad4cd5 in st_get_fp_variant ../src/mesa/state_tracker/st_program.c:1525
#20 0x7f1d12ad8252 in st_precompile_shader_variant ../src/mesa/state_tracker/st_program.c:2053
#21 0x7f1d12c9c851 in st_program_string_notify ../src/mesa/state_tracker/st_cb_program.c:185
#22 0x7f1d12d17731 in st_link_tgsi ../src/mesa/state_tracker/st_glsl_to_tgsi.cpp:7441
#23 0x7f1d12cabaf0 in st_link_shader ../src/mesa/state_tracker/st_glsl_to_ir.cpp:175
#24 0x7f1d127c85ca in _mesa_glsl_link_shader ../src/mesa/program/ir_to_mesa.cpp:3186
#25 0x7f1d1252a9f7 in link_program ../src/mesa/main/shaderapi.c:1285
#26 0x7f1d1252a9f7 in link_program_error ../src/mesa/main/shaderapi.c:1384
#27 0x7f1d1252deb3 in _mesa_LinkProgram ../src/mesa/main/shaderapi.c:1876
#28 0x403e13 in main._omp_fn.0 /home/kherbst/git/shader-db/run.c:926
previously allocated by thread T0 here:
#0 0x7f1d17e5c9d7 in operator new(unsigned long) (/usr/lib64/libasan.so.5.0.0+0x10f9d7)
#1 0x7f1d1215046f in __gnu_cxx::new_allocator<nv50_ir::ValueDef>::allocate(unsigned long, void const*) /usr/include/c++/9/ext/new_allocator.h:114
#2 0x7f1d1214ebec in std::allocator_traits<std::allocator<nv50_ir::ValueDef> >::allocate(std::allocator<nv50_ir::ValueDef>&, unsigned long) /usr/include/c++/9/bits/alloc_traits.h:444
#3 0x7f1d1214dbd3 in std::_Deque_base<nv50_ir::ValueDef, std::allocator<nv50_ir::ValueDef> >::_M_allocate_node() /usr/include/c++/9/bits/stl_deque.h:617
#4 0x7f1d1214c464 in std::_Deque_base<nv50_ir::ValueDef, std::allocator<nv50_ir::ValueDef> >::_M_create_nodes(nv50_ir::ValueDef**, nv50_ir::ValueDef**) (/home/kherbst/local/lib64/dri//nouveau_dri.so+0x829464)
#5 0x7f1d121495cd in std::_Deque_base<nv50_ir::ValueDef, std::allocator<nv50_ir::ValueDef> >::_M_initialize_map(unsigned long) /usr/include/c++/9/bits/stl_deque.h:716
#6 0x7f1d12146f7d in std::_Deque_base<nv50_ir::ValueDef, std::allocator<nv50_ir::ValueDef> >::_Deque_base() /usr/include/c++/9/bits/stl_deque.h:507
#7 0x7f1d1214518d in std::deque<nv50_ir::ValueDef, std::allocator<nv50_ir::ValueDef> >::deque() /usr/include/c++/9/bits/stl_deque.h:912
#8 0x7f1d1213b9c9 in nv50_ir::Instruction::Instruction(nv50_ir::Function*, nv50_ir::operation, nv50_ir::DataType) ../src/gallium/drivers/nouveau/codegen/nv50_ir.cpp:605
#9 0x7f1d1224dd44 in nv50_ir::Function::convertToSSA() ../src/gallium/drivers/nouveau/codegen/nv50_ir_ssa.cpp:385
#10 0x7f1d1224d381 in nv50_ir::Program::convertToSSA() ../src/gallium/drivers/nouveau/codegen/nv50_ir_ssa.cpp:310
#11 0x7f1d121407c0 in nv50_ir_generate_code ../src/gallium/drivers/nouveau/codegen/nv50_ir.cpp:1264
#12 0x7f1d1227461b in nvc0_program_translate ../src/gallium/drivers/nouveau/nvc0/nvc0_program.c:634
#13 0x7f1d12294b21 in nvc0_sp_state_create ../src/gallium/drivers/nouveau/nvc0/nvc0_state.c:620
#14 0x7f1d12294d90 in nvc0_fp_state_create ../src/gallium/drivers/nouveau/nvc0/nvc0_state.c:661
#15 0x7f1d12ad4912 in st_create_fp_variant ../src/mesa/state_tracker/st_program.c:1498
#16 0x7f1d12ad4cd5 in st_get_fp_variant ../src/mesa/state_tracker/st_program.c:1525
#17 0x7f1d12ad8252 in st_precompile_shader_variant ../src/mesa/state_tracker/st_program.c:2053
#18 0x7f1d12c9c851 in st_program_string_notify ../src/mesa/state_tracker/st_cb_program.c:185
#19 0x7f1d12d17731 in st_link_tgsi ../src/mesa/state_tracker/st_glsl_to_tgsi.cpp:7441
#20 0x7f1d12cabaf0 in st_link_shader ../src/mesa/state_tracker/st_glsl_to_ir.cpp:175
#21 0x7f1d127c85ca in _mesa_glsl_link_shader ../src/mesa/program/ir_to_mesa.cpp:3186
#22 0x7f1d1252a9f7 in link_program ../src/mesa/main/shaderapi.c:1285
#23 0x7f1d1252a9f7 in link_program_error ../src/mesa/main/shaderapi.c:1384
#24 0x7f1d1252deb3 in _mesa_LinkProgram ../src/mesa/main/shaderapi.c:1876
#25 0x403e13 in main._omp_fn.0 /home/kherbst/git/shader-db/run.c:926
SUMMARY: AddressSanitizer: heap-use-after-free ../src/gallium/drivers/nouveau/codegen/nv50_ir.h:648 in nv50_ir::ValueDef::get() const
Shadow bytes around the buggy address:
0x0c2a80075420: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c2a80075430: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c2a80075440: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c2a80075450: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa
0x0c2a80075460: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c2a80075470:[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c2a80075480: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c2a80075490: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c2a800754a0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
0x0c2a800754b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c2a800754c0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==612087==ABORTING
v2: full rework
v3: manage a full copy instead of recreating new lists on every access
Closes: #3066
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5277>
The scheduler has code to handle hardware that shares the same memory
for inputs and outputs. Seeing as the specific stages that need this is
probably hardware-dependent, this patch makes it a configurable option
instead of hard-coding it to everything but fragment shaders.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5561>
nir_deps_state is a struct used as a closure for calculating the
dependencies. Previously it had two fields copied out of the scoreboard.
The closure is initialised in two seperate places. In order to make it
easier to access other members of the scoreboard in the callbacks in
later patches, the closure now just contains a pointer to the scoreboard
and the two bits of copied information are removed.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5561>
load_per_vertex_input should probably be handled in the same way as a
regular load_input. I think the nir_schedule pass was written before V3D
had geometry shader support, so that is probably why it hasn’t taken
this into account until now.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5561>
wsi_release_display() implements vkReleaseDisplayEXT() which
is supposed to return control to the lessor of an output
upon call.
We need to terminate the wsi->wait_thread when close()'ing
the wsi->fd, otherwise the wait_thread holds another reference
to the wsi->fd, keeping the lease active, and thereby the
leased output blocked, until vkDestroyInstance() is called.
This gives users their GUI back, instead of extended darkness.
Fixes: 352d320a07 ("vulkan: Add EXT_direct_mode_display [v2]")
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5396>
Since binning pass variants share the same const_state with their
draw-pass counterpart, we should re-use the draw-pass variant's ubo
range analysis. So split the two functions of the existing pass
into two parts.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5526>
This serves two purposes, one during ubo range analysis, where we want
to create new ranges, and another during the actual ubo lowering. Split
these in two, with read-only ubo analysis state in the second case, to
prepare to split this pass in two.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5526>
This source-file uses PIPE_OS_WINDOWS to enable the Windows
functionality. But witout including p_config.h, this pre-processor
symbol won't be defined at all.
Let's fix this by adding the missing include, enabling stack-traces on
Windows.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5497>
The Windows implementation of debug_backtrace_capture has a limiation
of max 62 frames in total. Subtract a start-frame of 1 and the wrapping
functions frame, and we land at 60.
So let's lower this number on Windows to avoid triggering an assert.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5497>
Similar to the previous patch, we currently depend on the UNICODE macro
not being set, but it sometimes ends up getting set after all.
Unlike the previous patch, the easier thing to do here, is to lean into
the Unicode wrappers, and use the TEXT()-macro to define a Unicode
or ASCII literal, depending on the setting of the UNICODE macro.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5497>
The GetCommandLine API comes in two versions, GetCommandLineA (which
returns "ANSI" results), and GetCommandLineW which returns UTF-16
("WIDE") results. Then finally, windows.h provides a wrapper-macro that
defines GetCommandLine to either of the two, based on the setting of
the UNICODE macro.
More information about this mechanism can be found here:
https://docs.microsoft.com/en-us/windows/win32/intl/unicode-in-the-windows-api
For some reason, the UNICODE macro is set during build, even if we're
not explicitly setting it. This leads to us trying to cast a UTF-16
result to a char-pointer, which is obviously not going to do the right
thing.
So let's be defensive, and just call GetCommandLineA directly instead.
This avoids us depending on the setting of the UNICODE-macro in the
first place.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5497>
Looks like it fixes some potentially important VK test bugs. But also, it
fixes the GLES31 SSBO layout tests to not be so excessively large, so we
can run them in a reasonable time now. Note that a630 fail list is reset,
since the test list has changed and so we end up with a different subset
of tests being run. Interestingly, in the process the semaphore tests are
now reporting "NotSupported (Exporting and importing semaphore type not
supported at vktSynchronizationSignalOrderTests.cpp:513)" where they
weren't before.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5554>
For texturing and draw calls, HW expects the clear color to be in two
different color spaces after sRGB fast-clears - sRGB in the former and
linear in the latter. Up until now, iris has stored the clear color in
the sRGB color space. Limit the allowable clear colors for sRGB
fast-clears to 0/1 so that both color space requirements are satisfied.
Makes iris pass the sRGB -> sRGB subtest of the fcc-write-after-clear
piglit test on gen9+.
v2:
* Drop iris_context::blend_enables. (Ken)
* Drop some more resolve-related blend-state-tracking code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4972>
For rendering operations, avoid adding or using fast-cleared blocks if
the render format is incompatible with the clear color interpretation.
Note that the clear color is currently interpreted through the
resource's surface format.
Makes iris pass subtests of the fcc-write-after-clear piglit test:
* UNORM -> SNORM, partial block on gen8+.
* linear -> sRGB, partial block on gen9+.
* UNORM -> SNORM, full block on gen12.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4972>
Remove the CCS_D fallback logic so that iris doesn't attempt to use a
non-existent surface state for some renders. Also, add an assertion to
catch the issue.
The fallback in iris_resource_render_aux_usage can lead to this problem
because it doesn't account for the fact that surface states created from
resources with the Y_TILED_CCS modifier may only have CCS_E or NONE as
aux usages (due to iris_resource_create_with_modifiers).
Without this change, the next commit would have triggered the fallback
and regressed the following tests on gen9:
* dEQP-EGL.functional.wide_color.window_888_colorspace_srgb
* dEQP-EGL.functional.wide_color.window_8888_colorspace_srgb
* dEQP-EGL.functional.wide_color.pbuffer_888_colorspace_srgb
* dEQP-EGL.functional.wide_color.pbuffer_8888_colorspace_srgb
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4972>
There's no good reason to have the "which table do I use?" code
duplicated twice. The only advantage to the way we were doing it before
was that we could move the switch statement outside the loop. If this
is ever an actual device initialization perf problem that someone cares
about, we can optimize that when the time comes. For now, the
duplicated cases are simply a platform-enabling pit-fall.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5530>
This commit fills out VkPhysicalDeviceSubgroupProperties if present
in a VkPhysicalDeviceProperties2. The values here are simply pulled
from the blob.
Fixes some flakes in dEQP-VK.subgroups.* since dEQP was reading
uninitialized values of VkPhysicalDeviceSubgroupProperties.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5564>
For shader-cache, we want to not have anything important in `ir3_shader`.
And to have shader variants with lower const size limits (to properly
handle cross-stage limits), we also want variants to be able to have
their own const_state.
But we still need binning pass shaders to align with their draw pass
counterpart so that the same const emit can be used for both passes.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>
Make it an rzalloc'd ptr instead of embedded struct, so it can serve as
the mem ctx for immediates. This gets rid of needing to explicitly free
the immediates, so one less thing to deal with when moving const_state.
(Also, after we move const_state to the shader variant, we won't need
one for binning pass variants)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>
It seems a lot of the lowerings being run the second time were
unnecessary. In addition, when const_state is moved to the variant,
then it will become impossible to know ahead of time whether a variant
needs additional optimizing, which means that ir3_key_lowers_nir() needs
to go away. The new approach should have the same effect, since it skips
running lowerings that are unnecessary and then skips the opt loop if no
optimizations made progress, but it will work better when we move
ir3_nir_analyze_ubo_ranges() to be after variant creation.
The one maybe controversial thing I did is to make
nir_opt_algebraic_late() always happen during variant lowering. I wanted
to avoid code duplication, and it seems to me that we should push the
_late variants as far back as possible so that later opt_algebraic runs
don't miss out on optimization opportunities.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>
The only difference between this and `const_state->num_ubos` was that
the latter is counting # of ubos loaded via `ldg` (based on UBO addrs
in push-consts). But turns out there isn't really any reason to care.
Instead just add an early return in the one code-path that cares about
the number of `ldg` UBOs.
This gets rid of one more thing we need to move from `ir3_shader` to
`ir3_shader_variant`.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>
We are going to want to move this back to the variant, and come up with
a different strategy for binning/nonbinning to share the same constant
layout, in order to implement shader-cache support. (Since then we
can have a mix of dynamically compiled variants and cache hits, so there
is no good place to serialize the const-state.)
To reduce the churn as we re-arrange things, move direct access to the
const-state to a helper fxn. This patch is the boring churny part.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>
If buffer or stride is unaligned we use the same trick as on gfx6:
load 1 byte at a time and recompose the output if needed.
This change fixes lots of deqp/glcts tests:
- dEQP-GLES2.functional.draw.random.1, 10, ...
- dEQP-GLES2.functional.vertex_arrays.multiple_attributes.stride.3_float2_0_float2_0_float2_17, ...
- dEQP-GLES2.functional.vertex_arrays.single_attribute.first.byte_first24_offset1_stride2_quads256, ...
- dEQP-GLES2.functional.vertex_arrays.single_attribute.strides.buffer_0_17_byte2_vec4_dynamic_draw_quads_1, ...
- dEQP-GLES31.functional.draw_indirect.random.14, ...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5502>
The docs say that these registers should only be read with a certain
type, and I'm inclined to believe that the hardware behaves that way,
but it makes the assembler a little more confusing and also confuses the
user of the assembler that some operands don't take types or regions.
Just always requiring regions and types seems like the sensible thing.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5514>
Instead of having these functions sprinkled around the driver (and ending
with a duplicated tu6_compare_func for example), move everything to a
common header (using the previously unused tu_util.h).
Also applied some simplifications: using a cast when the HW enum matches
the VK enum, and using a lookup table when it makes sense (which is IMO
nicer than the switch case way).
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5538>
If on a nested loop
- the outer loop needs WQM but
- the inner loop doesn't need WQM and
- the break condition of the inner loop is computed in the outer loop
then it could happen that we transitioned to Exact before entering the inner loop
which could create an empty exec mask and lead to an infinite loop.
Fixes a GPU hang with RDR2
Cc: 20.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5518>
Since a value of at least "align" is used for nblocks, we might end up
with nblocks greater than the number of GMEM blocks remaining. Check for
this case and bail out, sysmem rendering will be used for such cases.
Fixes some of these tests:
dEQP-VK.pipeline.render_to_image.core.*.huge.*
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5499>
Check pipeline's sampleShadingEnable to enable sample shading.
Also fix behavior of gl_Fragcoord with sample shading.
Fixes at least:
dEQP-VK.pipeline.multisample.min_sample_shading.min_0_5.samples_4.primitive_triangle
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5499>
pMultisampleState needs to be ignored when rasterizerDiscardEnable, so the
current code can crash when trying to load msaa_info->pNext.
At the same time this simplifies tu_pipeline_shader_key_init a bit, by not
calling it for the compute shader case (which doesn't need to set anything
in the key struct).
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5499>
Rather than assuming a6xx+ means mergedregs. We can actually (mostly?)
do splitregs on a6xx as well. And GS/DS/HS currently require it, which
might be papering over a bug, or might be something to do with how
chaining shaders works. At any rate, we should at least be consistent,
and not have the compiler thinking we are doing mergedregs when we are
actually doing splitregs.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5458>
Reduce linear alignment, and rework the layout code a bit.
This rework has a side effect of also increasing the alignment on linear
levels of tiled (non-ubwc) cpp=1 and cpp=2 layouts. Since we should be
UBWC for those cases anyway, its not a big loss.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5013>
"FETCHSIZE" is actually a "minimum pitch" or "pitchalign" value that's
relevant for mipmaps. The 0 value means 64-bytes. Understanding this allows
some simplifications and will make it possible to use less alignment on
linear formats.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5013>
I've been getting a stream of errors from Marge today like:
FAILED: src/gallium/drivers/llvmpipe/lp_test_conv.exe
link @src/gallium/drivers/llvmpipe/lp_test_conv.exe.rsp
LINK : fatal error LNK1102: out of memory
[1080/1141] Compiling C object src/gallium/frontends/wgl/945cc3d@@wgl@sta/stw_getprocaddress.c.obj
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5534>
The idea is to have the canonical source of each of those files
available without having to remember anything, and to be able to update
all the Vulkan files by simply running `bin/khronos-update.py vulkan`.
The script also handles the fact all the EGL/GL/GLES* headers depend on
the KHR header, and the former should not be updated without updating
the latter.
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5177>
Implement GMEM input attachments by using non-bindless texture state which
is emitted at the start of every subpass.
This achieves two things:
* More vulkan-like CmdBindDescriptorSets
* Fixing secondary command buffer input attachments with GMEM
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5446>
This moves some logic out of bind_draw_states, moving towards the eventual
goal of doing very little in bind_draw_states.
Split this out as a separate patch to make the DIRTY_INPUT_ATTACHMENTS more
visible: it can be safely removed because pipelines are subpass specific,
so there will always be a pipeline change to go with the CmdBeginRenderPass
and CmdNextSubpass (the CmdBindPipeline may not be in the subpass, but the
draw that flushes the pipeline update will be).
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5446>
The only two fields were always true, and I don't think we'd ever have
use for them. If we want to disable optimizations then we'd need a
different approach, and I don't even know what include_binning_pass was
for.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5500>
This will be necessary once we start compiling multiple variants due to
different const size limits, and it will also be necessary for properly
implementing the pipeline cache.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5500>
ir3_shader_from_nir() calls ir3_optimize_nir(), which currently sets up
the const state. However, we need to know the number of user consts
reserved by the driver before setting up the const state, which means
that this information needs to be passed into ir3_shader_from_nir()
somehow rather than being set in the shader.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5500>
Only a single shader-db change it looks like, and not even from
scheduling, no fun.
instructions helped: shader31 MESA_SHADER_FRAGMENT: 64 -> 63 (-1.56%)
quadwords helped: shader31 MESA_SHADER_FRAGMENT: 66 -> 65 (-1.52%)
registers HURT: shader31 MESA_SHADER_FRAGMENT: 2 -> 3 (50.00%)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5475>
One of the neat things with Midgard's wacky VLIW... on VADD/SADD this is
(x + x) literally, on VMUL/SMUL/VLUT this is (x * 2.0) where the 2.0 is
exactly representable in FP16 so it fits nicely as an inline constant.
So we don't need to restrict its scheduling.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5475>
The initial support tried to match uniform variables from different
shaders based on the variables pointer. This will obviously never
work, instead here we use the variables name whcih also means we
must disable this optimisation for spirv.
Using the base variable name works because when collecting uniform
references we never iterate past the first array dimension, and
only support resizing 1D arrays (we also don't support resizing
arrays inside structs).
We also drop the resized bool as we can't skip processing the var
just because is was resized in another shader, we must resize
the var in all shaders.
Fixes: a34cc97ca3 ("glsl: when NIR linker enable use it to resize uniform arrays")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3130
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5487>
v2:
* Set pPipeline to NULL in the corresponding
graphics/compute_create_pipeline function.
* Keep current ANV behavior of bailing on the first real error.
v3:
* Don't return early if the pipeline succeeded.
v:4(5?):
* Simplify return conditions.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5136>
In the `try_swap_mad_two_srcs()` case, valid_flags() gets called both
for the src that we want to try to fold, and for the other src that we
are trying to swap to make that possible. It can happen in the 2nd case
that a RELATIV src has already been folded. Since `ssa()` returns non-
null in both the `IR3_REG_SSA` and `IR3_REG_ARRAY` cases (in the later
case, it is the dependent array access that the current instruction
cannot be moved ahead of), we need to explicitly check that the src
reg we are looking at is still an SSA src.
Reported-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5280>
Those are currently hurting Felix' ability to look at the batches.
We can probably detect this in the aubinator but that's a bit more
work than falling back to the previous behavior.
v2: Condition VK_KHR_performance_query to not using this variable (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5391>
virgl vendor id should probably be little more generic now.
I think I picked this becuase the virtio pci id space was under
RH's name and they did pay for it, but at this point I think it's
better to just use something generic.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5483>
The approach taken in this commit only works on drivers that expose
the PIPE_CAP_BUFFER_MAP_PERSISTENT_COHERENT capability. For drivers
that don't, the buffer has been unmapped by the time we get to
hud_draw_colored_prims, leading to crashes.
It's not easy to fix the code, but drivers that do support coherent
mapping will most likely do the right think themseleves, so let's just
go back to using user-buffers here.
This reverts commit 4fe1fd4df4.
Fixes: 4fe1fd4df4 ("gallium/hud: don't use user vertex buffers")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3106
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5417>
It is better to use `nir_intrinsic_dest_components()` which also handles
the case of intrinsics with a fixed number of dest components.
Somehow this starts showing up with a nir_serialize round-trip with
shader-cache. But we really shouldn't have been relying on
`intr->num_components` directly.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5371>
It's tricky to merge XFB-outputs correctly, because we need there to not
be any overlaps when we get to `nir_gather_xfb_info_with_varyings` later
on. We currently trigger an assert there if we end up merging here.
So let's not even try. This is an optimization, and we can optimize this
in safe cases later if needed. For now, let's play it safe.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5329>
The only reason for this dependency was the fd_bo used for the uploaded
shader. But this isn't used by turnip. Now that we've unified the
cleanup path from gallium, it isn't hard to pull the fd_bo upload/free
parts into ir3_gallium.
This cleanup has the added benefit that the shader disk-cache will not
have to deal with it.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5476>
glsl builtins that have no analog in spirv are emitted as regular varyings,
which means they take up a slot.
we need to ensure that there's no conflict between these regular varying
slots (from user-defined varyings) and the glsl translated builtins, so
we do that by "reserving" the max number of varying slots that can be used
by a given stage, then remapping all glsl builtins with no spirv builtin
to a packed layout location that can be consistent across stages
sort of addresses mesa/mesa#3113 except now there's 10 fewer varying slots
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5432>
Instead of requiring an explicit unoptimized move, we can implicitly
colour the blend input intrinsic to r0, where it will be preloaded; this
is a simple task for RA, and does not conflict with anything. If there
are multiple duplicate loads, the latter ones can just be simple moves
which will be copypropped.
We don't need to include a explicit synthetic load, since (scanning
backwards) the read will cause the input to become live at the right
time and the lack of an explicit write will keep it live from the
beginning of the shader. So no need to make it more complicated than it
needs to be.
Saves a cycle in blend shaders.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5449>
ir3_nir_move_varying_inputs is broken when there a load input outside of
the first block which depends on the result of a previous load input.
This simplification/rework avoids the problem, and should also be faster.
Fixes this dEQP-VK test:
dEQP-VK.pipeline.multisample_interpolation.offset_interpolate_at_pixel_center.128_128_1.samples_2
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5465>
Old script created files in the source directory, which is generally
considered bad form.
The rewrite to python instead of duct-taping around in the shell script
goes towards the goal of only having cross-platform python scripts,
which is also harder to make mistakes in than shell scripts.
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5155>
I'm not 100% sure if it feels right to update these. I mean, this keeps
links working as they should, even if exported to something else than
HTML. But it also feels a bit like history revisionism. It's probably
the right thing to do, though.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4630>
This kind of only makes sense once we have a separate home-page. But I
think this is a good way of showing why we should do this; Sphinx
doesn't support pagination, because it's not meant as a general-purpose
website framewrork. And for documentation, pagination is not really
something you need.
There's probably a lot more pages that should be moved into a separate
webpage, similar to this. In general, I think this should be done for
pages that don't relate to the source code too much, e.g isn't needed to
understand the code, or for instance explains how to get the source code.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4630>
Unfortunately, it doesn't seem like there's a way to have sphinx copy
this without moving the files, becasue html_extra_path doesn't copy the
directory itself when given a directory, only files inside and
subdirectories.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4630>
This uses the previously added scripts to convert the documentation to
reStructuredText, which is both easier to read offline, and can be used
to generate modern HTML for online documentation.
No modification to the generated results have been done.
Acked-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4630>
This is just a temporary commit, adding the scripts that performs the
automated conversion of the docs. The next commit contains the results
of the conversion, and the commit following that removes these scripts
again.
To redo the conversion in the next commit, rebase interactively to edit
this commit and delete the next one, and run './update-docs.sh' from the
root directory. Then continue the rebasing, and resolve any conflicts
that might have occurred in the manual fixes on top. Finally, build the
documentation to ensure no further fixups are needed.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4630>
Like for LAVA, make the tradeoff of moving the test scripts and data (55k)
into the artifacts in order to make the per-build jobs not have to pull
down the git tree (hundreds of MB when you don't hit a cached container
for your specific user, which I see happen multiple times a day in my CI
runs).
To do this, we have to be a bit more careful in some places about our
working directory potentially being dirty.
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5393>
We've already uploaded and downloaded them from fd.o and put them in the
rootfs, so we can clean up the extra prep work.
Our test job now extends from .test so that the artifacts' install dir
with all the scripts is extracted. This required moving the dependency on
meson-testing to the x86 test-gl/test-vk job blocks.
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5393>
Tiling is expensive, so this patch converts textures that appear to be
used for streaming to a linear layout.
Performance of mpv is significantly improved, with software-decoded
1080p mp4 playback on RK3288 going from 30fps to 50fps when testing
with `--untimed --no-audio`.
To keep things simple, conversion only happens when updating the whole
texture and no mipmapping is used.
v2: Make it clear that the heuristic doesn't rely on a texture being
uninitialized, since layout switching code can get confusing (Alyssa).
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4628>
Gen9 and Cherryview have the ability to mark texture instructions with
the End-of-thread bit under some conditions, which allows the texture
result to be written to the render target directly, rather than
returning to the EU.
In order to handle overlapping primitives correctly, we have to use the
'sendc' instruction which stalls until other threads potentially writing
to the same locations in the render target are retired. Unfortunately,
this stall happens before the texture is sampled (rather than in
parallel with stall), so for some literal edge cases (like the diagonal
edge between two triangles forming a rectangle) there can be a
performance penalty. As a result, it's probably not a good idea to use
this optimization in general.
I had planned to leave it enabled only for BLORP, where we use rectangle
primitives and are typically clearing/blitting an entire render target
without any overlapping primitives, but I noticed that the optimization
wasn't applied in some normal cases anyway. For example, in the piglit
test tests/shaders/glsl-fs-texture2d-bias.shader_test it is applied to
one BLORP-blit shader but not another due to some kind of mishandling of
register types (the destination register type of the texture operation
is UD while the color source of the render target write is F).
Additionally the instruction scheduler assumed that the combined texture
and render target write operation took 0 cycles, leading to cycle
estimates that are wildly inaccurate. Since the optimization was not
implemented for SIMD32 and our decision whether to use the SIMD32
program is made by comparing the estimated performance with that of the
SIMD16 shader, we wrongly threw out a bunch of SIMD32 programs that are
likely profitable.
total cycles in shared programs: 472807891 -> 473784245 (0.21%)
cycles in affected programs: 108277 -> 1084631 (901.72%)
helped: 0
HURT: 1290
total sends in shared programs: 998955 -> 1000245 (0.13%)
sends in affected programs: 1400 -> 2690 (92.14%)
helped: 0
HURT: 1290
LOST: 0
GAINED: 33
This patch shows no performance changes in Intel's Mesa performance CI.
Given the problems, the lack of evidence that the pass improves
performance, and the fact that the hardware feature was removed from
subsequent GPU generations, I think that the pass is not valuable and
should be removed.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5412>
When you get a filure and go looking in the results, you'll find weird stuff like this in the XML:
Reference images fill undefined pixels with 3x3 grid pattern.
Attachment 0 (p' = p bin boot builds
dEQP-VK.renderpass.suballocation.attachment_allocation.grow_shrink.89.qpa
deqp dev etc home init install lib media mnt proc results root run sbin
set-job-env-vars.sh sys tmp usr var (1, 1, 1, 1) + (-1, -1, -1, 1))
because we were not quoting the line and 'p *' was getting expanded.
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5435>
This fixes cases where the 3D path is used with layered rendering.
Fixes dEQP-VK.renderpass.suballocation.multisample_resolve.layers* failures
Note the blob's 3D fallback path behaves differently, and uses the
framebuffer information to clear each layer individually (changing the MRT
state each time). But that's not possible in all cases, and the blob fails
to clear properly in dEQP-VK.geometry.layered.*.secondary_cmd_buffer cases.
So this clear path is not based on the blob's behavior.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5426>
This refactor simplifies things a bit, and will make it easier to share
some logic with tu_clear_blit (see next patches).
This changes the order in which some things are emitted, and emits less
for disabled shader stages. There's also as extra write to SP_GS_PRIM_SIZE
that is removed.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5426>
This is mainly the "piglit optimization" (ie, since piglit launches an
separate process for for each test). It was never wired up for a6xx,
and makes register class setup unnecessarily complicated. Remove it to
simplify the next patch.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5431>
Even though the clear color BO is bound as a read-only buffer, report
the same caching domain as the main BO in use_surface() (typically
IRIS_DOMAIN_RENDER_WRITE) in order to avoid ping-ponging back and
forth between IRIS_DOMAIN_RENDER_WRITE and IRIS_DOMAIN_OTHER_READ,
which leads to increased stall-at-pixel-scoreboard synchronization
between draw calls.
Fixes a 5%-10% FPS regression in some benchmarks spotted on ICL.
Reported-by: Clayton Craft <clayton.a.craft@intel.com>
Fixes: eb5d1c2722 "iris: Annotate all BO uses with domain and sequence number information."
Closes: #3097
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5411>
I'm not sure why this code was if (0), but if (1) for it fixes
dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_float_color
This test expects +inf to get mapped to 255 and -inf to 0, both values
were ending up at 0.
v2: also enable in the SSE paths
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5379>
This fixes:
dEQP-GLES31.functional.draw_indirect.random.2
which ends up with 3x32-bit USCALED values going down this path
some of which have the top bit set, and end up converted to signed
float instead of unsigned float values.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5379>
Fixes dEQP-GLES31.functional.geometry_shading.emit.line_strip_emit_1_end_1
This test only emits 1 primitive, but the stores don't respect
the current mask, which might only have one lane active, for that single
primitive. Also fix the final emit path to use the emitted_mask
rather than the current execution mask.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5379>
dEQP-GLES2.functional.clipping.triangle_vertex.clip_three.clip_neg_x_neg_z_and_pos_x_pos_z_and_neg_x_neg_y_pos_z
fails pretty strangely (given that we're passing everything else) and
there's an old VK-GL-CTS bug open about this test, and it's suspicious
that all the ARM drivers seem to have trouble with it. I tried dropping
to -O0 on guilding that file in the CTS and it didn't help, though.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5419>
v2:
- add TargetGV100::isBarrierRequired() for OP_BREV
- use NV50_IR_SUBOP_LOP3_LUT() convenience macro where it makes sense
- separated out nir_lower_idiv into its own commit
- make use of the shared function to generate compiler options
- disable lower_fpow, nir's lowering is broken
v3:
- use replaceCvt() instead of custom NEG/ABS/SAT lowering
v4:
- remove WAR from peephole, not needed now we're using replaceCvt()
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5377>
This replaces the existing implementation without adding lowering for
earlier GPUs. The reason for this is because the existing code isn't
at all correct, and it also can't be hit anyway.
Will be required to support SM70 lowering passes.
v2:
- fixup source selection
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5377>
We already use a hack from NVC0LegalizeSSA::handleShift() on GK110 and
newer which encodes SHF into the existing SHL/SHR opcodes, but there's
a couple of problems with it:
- LO/HI are swapped in one of the directions, which is very confusing.
- The initial SM70 code will emit this from NIR->NVIR, and using the
existing encodings will confuse the optimisation passes.
As I want to limit the impact on other GPUs from the initial bring-up
of Volta/Turing, let's add an explicit representation of SHF in the IR.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5377>
Setting the depth-scale to 1 while leaving the depth-translation at 0
means our near-plane is at -1 in OpenGL semantics, which is
out-of-range on some drivers. In particular, Zink has this limitation.
But since we'll only pass a zero z in here anyway, we might as well
multiply it by zero, and get the same result. This avoids the problem.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5408>
This needs to be reworked, but it's a bit messy as we have to store
all the fetch pointers to be added as globals later once gallivm
has been initialised further. For now just refuse to cache shaders
that hit these paths (mainly ETC1 and BPTC).
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5049>
When using cached shaders we have to relink the shader with
external symbols when it's loaded. However the way gallivm does
function calls now hardcodes the function pointer into the shader.
LLVM had a mechanism for doing this properly using global mappings,
this switches the coroutine alloc/free code to use a global mapping.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5049>
When we use an object cache for the MCJIT we can have identical
cache entries from the same shader variant in different shaders,
but the JIT objcache uses the function name to relink things,
so it has to be consistent. Just drop the variants from the
function names.
Note the modules still have the variant info.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5049>
This uses the new nir_intrinsic_store_combined_output_pan intrinsic,
which can write depth, stencil and color in a single instruction. If
there are no color writes, the "depth RT" is written to.
Fixes the dEQP GLES3 depth write tests, as well as the piglit tests
fragdepth_gles2, glsl-1.10-fragdepth and when modified to not rely
on depth/stencil reload, glsl-fs-shader-stencil-export.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5065>
If gsprim_lds_size is larger than target_lds_size then gfx10_ngg_calculate_subgroup_info
will fail.
This commit adds a logic to try the multi-cycling in this case because it's
using less memory.
This fix glsl-1.50-gs-max-output when using NGG.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5401>
In d11e4738a8, we started using a start_offset to allow us to
allocate pools where the base address isn't at the start of the pool.
This is useful for binding table pools which want to be relative to
surface state base address (more or less), among other things. However,
we had a bug where, if you have a negative offset, everything returned
to the pool would end up being returned to the "back" of the pool. This
isn't what we want for binding tables in the softpin world. This was
causing us to never actually re-use any binding table blocks. How this
passed CTS, I have no idea.
Closes: #3100
Fixes: d11e4738a8 "anv/allocator: Add a start_offset to anv_state_pool"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5395>
when shaders are created and destroyed in large numbers, the same pointers
get reused for different shaders, which can lead to bad lookups in the
program_cache hash table.
now each shader tracks its program usage to automatically remove itself from
that program in order to avoid hash collisions
fixesmesa/mesa#3053
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5315>
I'm going to enable the VK CTS on cheza, so swap the deqp we have in the
container. build-deqp-vk already included GLES deqp binaries and data,
and is a newer branch than the last opengl-es-cts tag.
This brings a few things back over from build-deqp-gl for testlog
extraction, and copyes out the GLES mustpass lists.
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5266>
The app is allowed to never bind descriptor sets that are statically
unused by the pipeline, which would've caused a context fault since
CP_LOAD_STATE6 would try to load the descriptors that don't exist. Fix
this by not preloading descriptors from unused descriptor sets. We could
do more fine-grained accounting of which descriptors are used, but this
is enough to fix the problem.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5400>
Unreal Engine 4 has a bug in usage of glDrawRangeElements,
causing it to be called with a number of vertices in place
of "end" parameter (which specifies the maximum array index
contained in indices).
Since there is unknown amount of games affected and we
could not identify that a game is built with UE4 - we are
forced to make a blanket workaround, disregarding max_index
in range calculations. Fortunately all such calls look like:
glDrawRangeElements(GL_TRIANGLES, 0, 3, 3, ...);
So we are able to narrow down this workaround.
This was uncovered after b684030c3a
broke a bunch of UE4 games.
Cc: 20.1 <mesa-stable@lists.freedesktop.org>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2917
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5203>
Replace the various ad-hoc flushes that we've inserted, copied from
freedreno, etc. with a unified system that uses the user-supplied
information via vkCmdPipelineBarrier() and subpass dependencies.
There are a few notable differences in behavior:
- We now move setting RB_CCU_CNTL up a little in the gmem case, but
hopefully that won't matter too much. This matches what the Vulkan blob
does.
- We properly implement delayed setting of events, completing our
implementaton of events.
- Finally, of course, we should be a lot less flush-happy. We won't emit
useless CCU/cache flushes with multiple copies, renderpasses, etc. that
don't depend on each other, and also won't flush/invalidate the cache
around renderpasses unless we actually need to.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4964>
We just dropped the last user which actually cared about the seqno.
This never worked anyway, since the seqno was never reset between
multiple executions of the same command buffer. Turn the part of the
control buffer which used to track the seqno into a dummy dword, and
figure out automatically whether we need to include it. We will
implement seqnos again eventually, with timline semaphores, but that
will likely be totally different.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4964>
The Vulkan blob doesn't do this, and based on my understanding of how
the blob works this is unnecessary. CACHE_FLUSH is already serialized
against all 3d commands so you don't need to wait for rendering commands
to finish before issuing it, and the subsequent wfi + WAIT_FOR_ME will
cause the CP to wait for the CACHE_FLUSH to finish, so there's also no
need to wait for it to complete. The CACHE_INVALIDATE also seems
unnecessary, and also isn't done by the blob.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4964>
The current "ds = state->bs" seems broken, and the "vs = state->bs" is
unnecessary (already set above). Since it was added as part of a GS-related
patch, I think this is what was intended.
Note: tesselation disables GMEM rendering so we shouldn't have to worry
about hs/ds + binning interaction.
Fixes: 0eebedb619 ("freedreno/a6xx: Emit program state for GS")
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5370>
asin(x) is now implemented using a piecewise approximation, which
improves the precision for |x| < 0.5
Previously, we were using a polynomial approximation for both the
asin() and acos() functions. Unfortunately, for asin(), this polynomial
does not have enough precision to satisfy the Vulkan CTS requiremenents,
which define the asin() precision based on the precision of
atan2(x, sqrt(1.0 - x*x)). The piecewise approximation gives the needed
precision in the problematic range.
v2: Skip the piecewise approximation for acos
Closes: #1843
Acked-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3809>
Tessellation shader in llvmpipe depends on llvm coroutines support. So do not
advertise tessellation shader support in llvmpipe if GALLIVM_HAVE_CORO is FALSE.
This fixes assertion in LLVMTokenTypeInContext() running tessellation shader
tests with llvm version < 6.
Fixes: eb522717 "llvmpipe: add support for tessellation shaders"
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5366>
Fixes the following building errors:
external/mesa/src/gallium/drivers/svga/svga_context.c:184: error: undefined reference to 'svga_init_ts_functions'
external/mesa/src/gallium/drivers/svga/svga_context.c💯 error: undefined reference to 'svga_cleanup_tcs_state'
out/target/product/x86_64/obj_x86/STATIC_LIBRARIES/libmesa_pipe_svga_intermediates/libmesa_pipe_svga.a(svga_state.o):svga_state.c:hw_draw_state_sm5: error: undefined reference to 'svga_hw_tes'
out/target/product/x86_64/obj_x86/STATIC_LIBRARIES/libmesa_pipe_svga_intermediates/libmesa_pipe_svga.a(svga_state.o):svga_state.c:hw_draw_state_sm5: error: undefined reference to 'svga_hw_tcs'
Fixes: ccb4ea5a "svga: Add GL4.1(compatibility profile) support in svga driver"
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5364>
The xml.etree.cElementTree module will be removed in Python 3.9. Since
Python 3.3 the xml.etree.cElementTree module has been deprecated, the
xml.etree.ElementTree module uses a fast implementation whenever
available.
Builds using Python 2.7 can still work but with the slower
implementation.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Acked-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5349>
The GLES blob on the p3a limits constlen to 512 between VS and FS across
a6xx gpu ids (615, 630, 640, and 650). Experimentally, exceeding that
limit in any one stage results in rendering corruption or GPU hangs
(though my most detailed testing had a loop limit in a uniform, so that
may the cause of the hang). Clamp the limit we use inside of a shader so
we don't exceed it within a stage.
This commit doesn't resovle limiting inter-stage. Experimentally, I've
found that I can push up to a total of ~768 vec4s between VS and FS on
a630, with or without uniform updates between each draw. We'll need to do
some shader key-based limiting of constlen at draw time to respect that
limit, but that's left for future work, and this commit is enough for the
google earth case that initiated this work.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5273>
It turns out the GL uniforms file is larger than the hardware constant
file, so we need to limit how many UBOs we lower to constbuf loads. To do
actual UBO loads, we'll need to be able to upload UBO 0's pointer or
descriptor.
No difference on nohw 1 UBO update drawoverhead case (n=35).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5273>
For now we never ask to set up UBO 0 as a real UBO, so this doesn't
trigger, but it gets us ready for handling the case where UBO 0 is too big
to be push constants in the HW.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5273>
Comparing a blob trace using the feature to one not, the difference was
pretty obvious and in the spot you'd expect compared to alphaToCoverage.
The SP_ reg didn't have a corresponding bit set, though it also has an
alphaToCoverage.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5343>
The script was failing for me (python 3.8), not sure if this is a recent
python version break or not as I don't know how often people have been
running this script:
Processing ./gen9.xml... Traceback (most recent call last):
File "./gen_sort_tags.py", line 177, in <module>
main()
File "./gen_sort_tags.py", line 170, in main
genxml[:] = enums + sorted_structs.values() + instructions + registers
TypeError: can only concatenate list (not "odict_values") to list
Turning the odict into a list fixes it for me, and the resulting xml
file are identical to before :)
Fixes: 903e142f0d ("genxml: add a sorting script")
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5352>
This is a squash commit of in house performance fixes and misc bug fixes
for GL4.1 support.
Performance fixes:
* started using system memory for constant buffer to gain 3X performance boost with metro redux
Misc bug fixes:
* fixed usage of vertexid in shader
* added empty control point phase in hull shader for zero ouput control point
* misc shader signature fixes
* fixed clip_distance input declaration
* clearing the dirty bit for the surface while using direct map if surface is already flushed
and there is no pending primitive
This patch also uses SVGA_RETRY macro for commands retries. Part of it is already
used in previous patch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Signed-off-by: Neha Bhende <bhenden@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5317>
This patch adds the following tgsi utilities
* tgsi_dynamic_indexing: This utility flattens out the dyanamic indexing of constant buffers
* tgsi_vpos: This utility writes zeros to position at index 0 in vertex shader.
This utility can be used if there is no shader output in vertex shader
* util_make_tess_ctrl_passthrough_shader: This adds passthough tessellation control shader.
Input of passthrough tess ctrl shader is output of vertex shader
and output is input of tessellation eval shader.
If program has tessellation eval shader but no tessellation control shader,
this utility can be used to create passthrough tessellation control shader.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Signed-off-by: Neha Bhende <bhenden@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5317>
Technically we only have to do late-z in the alpha-test or discard case
if depth-write is enabled. If depth write is disabled, the depth read /
test / conditional-write interlock that we need to emulate is not a
problem, so we can still use early-z test.
There is a slightly weird case when there is no zsbuf attachment (see
dEQP-GLES31.functional.fbo.no_attachments.*) where the hw wants us to
use LATE_Z.. not entirely sure if this is an interaction with occlusion
query or just a pecularity of how the hw works when there is no depth
buffer.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5336>
We were remapping the bindings so the HW binding points were consecutive,
which there's no need for. Now that we don't shuffle, we can mostly drop
the dependency on the pipeline for this SDS.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5321>
the virgl CI code was using the noopt path and crashing with a
wierd can't select llvm.coro.subfn.addr error, turns out we have
to call the cleanup pass no matter what.
This enable a lot more virgl gles31 passes, but we have
to disable tessellation shaders as now they executed, they
crash due to missing OES_gpu_shader5, I should try and reenable
them when llvmpipe is further along
Fixes: d32690b43c ("gallivm: add coroutine pass manager support")
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Acked-by: Elie Tournier <elie.tournier@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5320>
The FETCH_CNT field isn't actually the FETCH count. We don't have a
lot of data where it's different from DECODE_CNT, so there's not much
to go by. It could be number of VFD_DEST_CNTL or maybe DECODE_CNT for
binning. For now, setting both to number of DEST_CNTL gets Google
Earth working again.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5324>
We reuse DRM file descriptors internally. Therefore when we export a
GEM handle we must do so in the file descriptor used externally.
This change also fixes a file descriptor leak of the FD given at
screen creation.
v2: Don't bother checking fd equals, they're always different
Fix dmabuf leak
Fix GEM handle leaks by tracking exported handles
v3: Check os_same_file_description error (Michel)
Don't create multiple exports for a given GEM table
v4: Add WARN_ONCE (Ken)
Rename external_fd to winsys_fd
v5: Remove export lock in favor of bufmgr's
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2882
Fixes: 7557f16059 ("iris: share buffer managers accross screens")
Tested-by: Eric Engestrom <eric@engestrom.ch>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4861>
Now that we are doing a better job of managing LRZ, add support for the
EARLY_LRZ_LATE_Z mode. Since we properly disable LRZ write in cases
where we don't know a fragment's z value during the binning pass (or
when blend is enabled in a later draw, meaning we will need the earlier
fragment's color), we can enable a mode that keeps the early-lrz test
when the frag shader has kill/discard. This will only discard geometry
that is definitely not visible.
This is a pretty big win for games/benchmarks that have a lot of frag
shaders with kill/discard. More than 10% gain for gfxbench trex/mh and
40% gain for mh31.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5298>
In particular, properly detect reversal of depth-test direction.
With that we can remove a lot of cases where we were unnecessarily
invalidating LRZ, which was simply papering over the direction-
reversal issue in deqp.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5298>
And document the early-lrz-late-z mode.
Initially I thought this would be two bits to control early-lrz vs
early-z. But having early-z without early-lrz does not make sense,
and the way the values line up makes an enum fit better.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5298>
With UBO access going through LDC, all memory access uses buffer based
io primitives. We can then advertise
PIPE_CAP_ROBUST_BUFFER_ACCESS_BEHAVIOR and
PIPE_CAP_DEVICE_RESET_STATUS_QUERY, which turn on GL_EXT_robustness,
GL_KHR_robust_buffer_access_behavior and GL_KHR_robustness.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5319>
The iris_blorp_exec() hook needs to be executed under a single
indivisible sync region, which means that in cases where we need to
emit a PIPE_CONTROL for a buffer barrier we won't be able to track the
subsequent commands separately from the previous commands, which will
prevent us from optimizing out subsequent PIPE_CONTROLs if we
encounter the same buffers again. In particular I've encountered this
situation in some SynMark test-cases which perform lots of BLORP
operations with the same buffer bound as both source and destination
(in order to generate mipmaps): In such a scenario if the source
requires flushing we'd also end up flushing for the destination
redundantly, even though a single PIPE_CONTROL would have been
sufficient.
This avoids a 4.5% FPS regression in SynMark OglHdrBloom and a 3.5%
FPS regression in SynMark OglMultithread.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
The big-hammer iris_flush_depth_and_render_caches() is largely
redundant whenever a format mismatch is detected from
iris_cache_flush_for_render(). There is no need to kick the depth,
sampler nor constant caches in that case.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
The render cache hash table is now *mostly* redundant with the more
general seqno matrix-based cache tracking mechanism. Most hash table
operations are now gone except for the format mismatch checks done in
iris_cache_flush_for_render(). Redundant code removed as a separate
patch for bisectability.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
Whenever iris_predraw_resolve_inputs() ends up doing a flush or
invalidate, we really want it to be on the same batch which is going
to consume the result. Any resolves should still be performed from
the render batch thanks to the previous patch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
The resolves performed by this function are only expected to work from
the render batch, so make sure we use it independently of the batch
the caller wants to use. This function provides no synchronization
guarantees anyway, the caller is expected to insert any cache flushing
and synchronization required for the resolved surface to be visible to
the target batch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
This takes advantage of the previously introduced cache tracking
infrastructure in order to define a multi-purpose barrier operation
that allows the caller to order memory operations with respect to
previous operations performed on the same buffer from any other cache
domain.
v2: Assorted CPU overhead micro-optimizations (Francisco).
v3: Use C99 designated initializers (Ken).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
This introduces a representation of the cache coherency status of the
GPU at any point in the batch. This is done by defining a matrix C of
synchronization sequence numbers such that at any point of batch
construction, a memory operation from domain i introduced into the
batch is guaranteed to be ordered after any memory operation from
domain j in a previous batch section with seqno n if the following
condition holds:
C_i_j >= n
This allows us to efficiently determine whether additional flushing
and/or invalidation is required in order to access a buffer object
from some arbitrary domain.
Except for batch buffer reset which requires clearing the whole
matrix, all operations on the matrix are either O(n) or O(1) on the
number of caching domains (which is basically constant).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
This is the main performance trade-off of this cache tracking
mechanism: In order for the seqno vector of buffer objects to be
accurate, they need to be marked as used again every time the batch is
split into a new synchronization section if they remain bound to the
pipeline. This can be achieved easily by re-using
iris_restore_render_saved_bos() and iris_restore_compute_saved_bos(),
which currently serve a similar purpose across batch buffer
boundaries.
The impact on Piglit drawoverhead results seems to be within a
standard deviation of the current results.
XXX - It might be possible to completely remove the current
iris_batch::contains_draw flag at a small additional performance
cost.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
The write flag is redundant since it can be inferred easily from the
iris_address::access domain. This allows the iris_address struct to
be laid out more efficiently in memory, leading to a measurable
improvement in several Piglit Drawoverhead test-cases.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
Probably the most annoying patch to review from the whole series --
Mark every buffer object use as accessed through some caching domain
with the sequence number of the current synchronization section of the
batch. The additional argument of iris_use_pinned_bo() makes sure I'd
have gotten a compile error if I had missed any buffer added to the
batch validation list.
There are only a few exceptions where a buffer is left untracked while
adding it to the validation list, justified below:
- Batch buffers: These are strictly read-only for the moment.
- BLORP buffer objects: Their seqnos are bumped manually at the end
of iris_blorp_exec() instead, in order to avoid plumbing domain
information through BLORP address combining.
- Scratch buffers: The contents of these are strictly thread-local.
- Shader images and SSBOs: Accesses of these buffers are explicitly
synchronized at the API level.
v2: Opt out of tracking more aggressively (Ken): In addition to the
above, surface states, binding tables, instructions and most
dynamic states are now left untracked, which means a *lot* more BO
uses marked IRIS_DOMAIN_NONE which need to be reviewed extremely
carefully, since the cache tracker won't be able to provide any
coherency guarantees for them.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
This delimits all batch operations which access memory between
iris_batch_sync_region_start() and iris_batch_sync_region_end() calls.
This makes sure that any buffer objects accessed within the region are
considered in use through the same caching domain until the end of the
region.
Adding any buffer to the batch validation list outside of a sync
region will lead to an assertion failure in a future commit, unless
the caller explicitly opted out of the cache tracking mechanism.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
This introduces some minimalistic infrastructure which will be used in
order to partition the batch into a series of sections, each one with
a unique, monotonically-increasing sequence number. Section
boundaries will typically lie at points in the batch where the
execution and memory coherency status of some previous commands are
known, e.g. at batch buffer boundaries or PIPE_CONTROL commands.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
The purpose of this is to represent the cache coherency state of a
buffer as a vector of integers (AKA seqnos), one for each incoherent
caching domain of the GPU. A seqno will identify a single section of
a batch buffer uniquely across the whole pipe_screen (which means that
there will be no ambiguity about what context a given seqno belongs to
even if there are multiple threads accessing the same buffer in
parallel), and is guaranteed to be allocated in monotonically
increasing order within any given context. The iris_bo_bump_seqno()
helper is provided for marking the last update of a buffer from a
given caching domain in a lockless manner.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
In this case, vs->writes_point_size is true as the VS writes
gl_PointSize, but panfrost_writes_points_size() is false as we are not
drawing points so the hardware doesn't process it. Thus the varying
descriptor is emitted but elements is never written. When the VS runs,
it will attempt to write to elements, a NULL pointer.
The behaviour is architecture-independent. On Midgard, the write
silently fails, hence why this bug was never noticed before. On Bifrost,
this raises an MMU fault.
The fix is to set the format to VARYING_DISCARD to ignore the write.
Noticed on Neverball.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5290>
We're nearly out of dirty bits, and some patches pending review on
GitLab no longer apply due to that. Make room for them by splitting
off shader stage-specific bits into a separate stage_dirty mask.
An alternative would be to split compute-related bits into a separate
mask, but that would prevent the '<< stage' indexing done in various
parts of the driver from working.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5279>
Here we turn on uniform array resizing in the NIR linker and disable
the GLSL IR resizing pass when the NIR linker is enabled.
This will potentially make uniform arrays smaller due to NIR
optimising away more uniform uses.
Shader-db results (SKL):
total instructions in shared programs: 14947192 -> 14944093 (-0.02%)
instructions in affected programs: 138088 -> 134989 (-2.24%)
helped: 822
HURT: 4
total cycles in shared programs: 324868402 -> 324794597 (-0.02%)
cycles in affected programs: 3904170 -> 3830365 (-1.89%)
helped: 2333
HURT: 1485
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4910>
i965 currently uses the NIR uniform linker for spirv support. Until
now the only reason there has been no issue with calling the
lowering pass before the linker is because no garbage collection
is done between the calls.
An upcoming change to the linker will add an optimisation to resize
unform arrays where possible. Because lowering causes the array
defs to no longer be used the new optimisation ends up resizing the
arrays to 0. To fix this we move the lowering call after the
linking calls.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4910>
SPIRV OpControlBarrier can have both a memory and a control barrier
which some hardware can handle with a single instruction. Let's
turn the scoped_memory_barrier into a scoped barrier which can embed
both barrier types. Note that control-only or memory-only barriers can
be supported through this new intrinsic by passing NIR_SCOPE_NONE to the
unused barrier type.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Suggested-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4900>
This is now possible as we do uniform linking via a nir based linker.
Shader-db results for IRIS (SKL):
total instructions in shared programs: 14947192 -> 14946397 (<.01%)
instructions in affected programs: 39498 -> 38703 (-2.01%)
helped: 230
HURT: 18
total cycles in shared programs: 324868402 -> 324847058 (<.01%)
cycles in affected programs: 706701 -> 685357 (-3.02%)
helped: 599
HURT: 449
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4797>
This allows us to do API specific checks before removing variable
without filling nir_remove_dead_variables() with API specific code.
In the following patches we will use this to support the removal
of dead uniforms in GLSL.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4797>
Switches have been rewired, VLANs have been reconfigured, network
elements with non-functional remote management have been removed from
racks and thrown on desks in anger.
This reverts commit ae6e1aee7d1bd49ae494b8a25ca33d092a3a145a.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5301>
GFX6 and GFX7 don't have the ds_bpermute (or permute) instruction,
but we would like to support subgroup shuffle on these old GPUs.
So we introduce a new pseudio instruction which will be lowered
to an "unrolled loop" that emulates bpermute on GFX6 and GFX7
using readlane instructions, while also respecting the exec mask
thanks to v_cmpx.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5223>
Until Meson 0.47, setting `-D arrayoption=` was not the same as setting
`-D arrayoption=[]`; the latter cleared the array, while the former
filled it with an empty string option.
Since Meson 0.47 [1], the former maps to the latter, so empty items can
only be set by explicitly giving an array containing an empty string,
ie. `-D arrayoption="['']"`; however note that this is *not* what we
want in any of the current Mesa code anyway.
This makes the code handling array options a bit more complicated, and
a lot more error-prone, so let's get rid of the confusion by removing
the empty-string option.
[1] f3a8f9c34d
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/386>
This fixes render_components being 0 when mrt_count=8, because shift by 32
is UB and in arm64 it ends up shifting by 0. This fixes tests with 8 MRTs.
Fixes the 3d path sysmem CmdClearAttachments to set RENDER_COMPONENTS, as
it was previously relying on tu6_emit_mrt setting it, but it is now part of
the pipeline state.
Also switch back to the previous behavior of not setting render components
for VK_ATTACHMENT_UNUSED attachments: we don't update the MRT state for
such attachments so we definitely don't want to be trying writing to those.
Fixes: 078aa9df8d ("tu: Move RENDER_COMPONENTS setting to pipeline state")
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5292>
We're reconfiguring our Cambridge office lab networking and physical
setup for more scalability amongst other things. We can still run jobs
on one RK3399 at the peak outage, but we'll lose the T7x0 this morning,
so disable it until it's all back.
T820 is still disabled due to an unrelated BayLibre internal outage.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5296>
To prepare to use meson's builtin feature options in the future, which
are more powerful and provide useful feature for packagers, like the
ability to turn all "automagic" features off, and then explicitly turn
on the ones they want.
This is designed to make the transition softer, since the 'true' and
'false' are still accepted, just with a warning.
Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4740>
Using HALT to immediately jump to the end of the shader is required to
implement GL_EXT_gpu_shader4 and OpenGL 3.0. However, vanilla OpenGL
1.2 doesn't forbid it and it likely makes something somewhere faster.
We should be consistent and implement the same discard behavior on all
hardware if we can.
The rules for HALT on Gen4-5 are a bit different from Gen6+. On the
older hardware, there is no stack for HALT; instead it's up to software
to save and restore mask registers. However, there's no real saving
needed since we only use HALT to jump to the end of the program where
we're about about to do our FB writes. All we need to do is reset AMask
to DMask, the value it was initialized to at the start of the thread.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5244>
bifrost_sampler_descriptor.zero1 is of type uint8_t.
Fix warning reported by Coverity.
Invalid type in argument to printf format specifier (PRINTF_ARGS)
invalid_type: Argument s->zero1 to format specifier %lx was expected to
have type unsigned long but has type unsigned char.
Fixes: 6148d1be4b ("panfrost: Fix size of bifrost sampler descriptor")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5248>
We're not *quite* ready to open the flood gates on Bifrost (a major
blocker is CI, which is itself blocked on the lockdowns - expected to be
resolved in the coming months..)
Nevertheless, let's add a debug option to probe on compatible Bifrost
devices to avoid keeping out-of-tree patches around.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5272>
We used to output a VFD_FETCH entry for each VFD_DECODE, but we can
instead output just one VFD_FETCH per VBO and point multiple
VFD_DECODE entries at the same VFD_FETCH entry. There's typically
fewer VBOs than vertex elements so this is a small win in itselfs, but
more importantly, the VFD_DECODE state now only depends on program
state.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5140>
Now that we have scripts in place to do baremetal testing of cheza, switch
it over. As of this writing, we have 5 chezas for baremetal and 4 for the
old docker CI setup (just 2 fewer than we originally had before this work,
since some had had filesystem failures and I switched those first), and
once we are sure of this we can backport to stable branch CI and move the
rest of them to baremetal.
I've run a lot of jobs through the baremetal scripts as I worked on
sorting out vulkan CTS stability, so I feel good about the stability of
the GLES CTS here.
The options job is now split out to separate jobs, as we don't currently
have a way to stack multiple sets deqp runs with different env vars in a
single baremetal run, and just chaining cros_servo.sh invocations runs
into a lack of cleanup of the serial-watching scripts which we rely on
container exit sorting out for us. This means a little less than 2x the
artifacts downloads we had before for a630 and a few more container
instantiations.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5247>
This will let us:
- deploy kernels for testing code depending on new kernel featuers
- Ensure a pristine state in the HW before starting our tests
- Avoid disk rot on the chezas taking them out (we'd lost 3/9 in a few
months).
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5247>
This is a set of kernel options I've come up with mostly cribbing from
chrome os's kernel config snippet. We also build an lzma kernel, as
uncompressed kernel is big but lzma is the only compression supported by
the bootloader. With that image, we have to pack it into a FIT formatted
image+dtb blob.
CONFIG_SUNRPC_DEBUG is added so that you can set "nfsrootdebug" to figure
out what's going wrong with your nfs mount (mine were "both the tcp and
nfsvers options were required, and don't try to use 'default' as the root
path to defer to DHCP's answer because otherwise you get
/tftpboot/default, just use an empty root path which doesn't prepend
/tftpboot.")
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5247>
It is kinda surprising that
image2 = fromPlanar(image, 2, NULL)
mapImage(..., image2, ...)
does not map the third plane.
This implements that behavior in the case where the DRI frontend
lowers the multi-planar textures.
In the case it doesn't this would need driver support. AFAIU at
least etnaviv is impacted, and while it looks possible, I don't
have the etnaviv knowledge to implement it.
Instead of silently returning weird results (either always plane 0
or possibly something interleaved) this adds an error return on
mapping multi-planar textures otherwise.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5200>
mesa/st is initializing pipe_shader_state for user define shaders.
This patch intialized pipe_shader_state for all passthough
and transform shaders.
This fixes crashes for several opengl apps. Issue is found in vmware
internal testing
Fixes: f01c0565bb ("draw: free the NIR IR.")
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5240>
When you're bringing up a new driver in CI with significant number of
failures (or when a CI run breaks a driver), the QPA extraction can easily
take the whole job timeout as we go about processing each QPA (100 of them
in my early VK CI fails) per unexpected result we're saving (50), which
involves reading and each line of the file in shell. By quickly filtering
out the QPA files not including our test, we can save all that shell
overhead, bringing QPA extract time down to a couple of minutes.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5225>
Making use of the relatively recent FDO_BASE_IMAGE feature of the
templates, the x86_test-base image contents are shared as a separate
layer by the x86_test-gl/vk images (meaning the former only needs to be
downloaded once for either or both of the latter). This should be more
efficient in terms of overall network bandwidth and storage, in
particular if the base image changes less often than the -gl/vk ones.
v2:
* List x86_test-base in needs: along with x86_test-gl/vk (see parent
commit)
* Always put $STABLE/TESTING_EPHEMERAL on separate lines, will make it
easier to add any non-ephemeral packages
Reviewed-by: Eric Anholt <eric@anholt.net> # v1
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5186>
This will make the GL drivers pick the right SIMD variant for a given
group size set during dispatch. The heuristic implemented in
brw_cs_simd_size_for_group_size() is the same as in brw_compile_cs().
The cs_prog_data::simd_size field was removed. The generated SIMD
sizes are marked in a bitmask, which is already used via
brw_cs_simd_size_for_group_size() by the drivers.
When in variable group size, it is OK if larger SIMD shader spill,
since we'd need it for the cases where the smaller one can't hold all
the invocations.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5142>
Move the decision one level up, let brw_compile_*() functions use the
spilling information to decide whether or not a certain width
compilation can spill (passed via run_*() functions).
The min_dispatch_width was used to compare with the dispatch_width and
decide whether "a previous shader is already available, so don't
accept spill".
This is replaced by:
- Not calling run_*() functions if it is know beforehand a smaller width
already spilled -- since the larger width will spill and fail;
- Explicitly passing whether or not a shader is allowed to spill. For
the cases where the smaller width is available and haven't spilled,
the larger width will be compiled but is only useful if it won't
spill.
Moving the decision to this level will be useful later for variable
group size, which is a case where we want all the widths to be allowed
to spill.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5142>
integer and float compare ops cannot take boolean types, so the bit size
of the inputs should be checked here so that we can swap to the logical
equality functions if we're being passed a bool value
resolves tons of validator errors in glsl piglit tests
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5231>
After the recent optimizations in ppir lowering that increase options
for combining, node_to_instr now may have multiple options of nodes to
insert and needs to decide which is better.
For example, if an instruction uses both a varying and a texture, there
are two nodes nodes that can be inserted to the load varying slot in the
same instruction (ld_var and ld_coords). It is much more advantageous to
pipeline the load texture coords since that enables the higher precision
path for texture coordinates. However, with the current recursive
expansion, this cannot be influenced.
This simple ready list implementation in node_to_instr allows it to
choose the next node to expand based on a priority score, rather than
relying on the random order coming from the recursive expansion.
Other than preferring nodes with pipeline output (which covers ld_coords
vs ld_var), nodes using later slots in the pipeline are now expanded
first, allowing node_to_instr to make all of the earlier (pipelineable)
nodes available in the ready list so the best one can be chosen when
picking nodes for the earlier slots.
Fixes: 632a921bd0 lima/ppir: optimize tex loads with single successor
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5092>
We don't force w=1 for Bifrost textures. We already compose this into
the swizzle as necessary, so we can just ignore this field I think. But
let's identify it so we don't forget what it is.
The blob uses it to force w=1 for <= 3-channel formats (0x10), as well
as a flag to swap r/b for BGRA (0x4). There are probably other flags
here but it doesn't.. really matter to us.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5232>
The cu bitmap in amd gpu info structure is
4x4 size array, and it's usually suitable for Vega
ASICs which has 4*2 SE/SH layout.
But for Arcturus, SE/SH layout is changed to 8*1.
To mostly reduce the impact, we make it compatible
with current bitmap array as below:
SE4,SH0 --> cu_bitmap[0][1]
SE5,SH0 --> cu_bitmap[1][1]
SE6,SH0 --> cu_bitmap[2][1]
SE7,SH0 --> cu_bitmap[3][1]
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5212>
This will be used to import images which have different layout from what
turnip uses by default. For example non-UBWC (linear) images from the video
decoder on some hardware have a 512 pitch alignment.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4596>
Currently the nir linker resizes the amount of storage needed to hold
uniform information on the fly while linking. As shaders can contain
thousands of uniforms this can be very slow. For example some Godot
shaders can take 30 seconds to compile on some machines.
In this change we count the amount of storage needed before we start
processing the uniforms. This is what the GLSL IR linker does also.
Fixes: 95f555a93a ("st/glsl_to_nir: make use of nir linker for linking uniforms")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2996
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5137>
Generally we do not completely stop compilation as soon as we see an error,
instead we continue on to attemp to find any futher errors.
This means we shouldn't be checking state->error to see if any error has
happened during the compilation process, doing so was causing
process_parameters() to fail on completely valid functions if there was
any error found in the shader previously. This then caused the valid
functions not to be found because the paramlist was considered empty,
resulting in the compiler spewing out misleading error messages.
Here we simply add the IR error value to the param list when we have
an issue with processing a parameter, this leads to much better error
messaging.
Fixes: 53e4159eaa ("glsl: stop processing function parameters if error happened")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5205>
../src/panfrost/pandecode/decode.c:1176:60: warning: taking address of
packed member of ‘struct mali_framebuffer’ may result in an unaligned
pointer value [-Waddress-of-packed-member]
1176 |
pandecode_midgard_tiler_descriptor(&fb->tiler, fb->width1 + 1,
fb->height1 + 1, is_fragment, true);
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5219>
The nir_intrinsic_load_simd_width_intel is always lowered by the
brw_nir_lower_simd() pass before the emission happens. This is likely
a "leftover" from patch rewriting/squashing that happened when this
intrinsic was added.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5213>
Silence the warning about this always-true comparison.
src/util/softfloat.c:214:42: warning: comparison of constant 32768
with expression of type 'int16_t' (aka 'short') is always false
[-Wtautological-constant-out-of-range-compare]
} else if ((e > 0x1d) || (0x8000 <= m)) {
~~~~~~ ^ ~
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5174>
This makes it a little more explicit that the values line up.
src/freedreno/vulkan/tu_device.c:2209:75: warning: implicit conversion
from enumeration type 'const VkSamplerReductionMode' (aka 'const enum
VkSamplerReductionMode') to different enumeration type 'enum
a6xx_reduction_mode' [-Wenum-conversion]
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5174>
We're hard-coding this value, so let's use the hw enum and avoid a
warning.
src/freedreno/vulkan/tu_clear_blit.c:2091:19: warning: implicit
conversion from enumeration type 'enum VkStencilOp' to different
enumeration type 'enum adreno_stencil_op' [-Wenum-conversion]
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5174>
`layout_gmem()` recalculates the # of bins in x/y dimensions after
aligning the bin width/height to required dimensions. Because of this,
the resulting gmem config could have fewer bins in either dimension.
But the tile/bin layout and the pipe assignment logic were still using
the original values. Which could result in extraneous bins with a
width and/or height of zero.
Because the gmem rendering code uses `gmem->bin_w`/`h` to determine
the number of bins, this could result in some zero size bins being
executed, while later valid bins are skipped. Which can leave un-
rendered portions of the screen (generally lower-right).
To fix this, be sure to use `gmem->bin_w`/`h` rather than the local
variables.
Fixes: 1bd38746d5 ("freedreno/gmem: rework gmem layout algo")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5189>
`gmem_page_align` is generation specific (with the exception of a2xx
which has a different value for fast-clear). So we should override the
value from the captured gmem_key according to the gpu we are emulating
for the purposes of calculating gmem config.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5189>
This adds RB, VFMT and TFMT NONE values for a3xx-a5xx and FMT6_NONE
for a6xx. Use those values instead of open coded (enum xxx) ~0 or
sometimes even ~0, which triggers out-of-enum range warnings.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5173>
The closed GL driver uses resinfo on images with the writeonly flag (using
the texture-path's getsize only for readonly images). The closed vulkan
driver seems to use resinfo regardless. Using resinfo doesn't need any
fixups after the instruction. It also avoids one of the needs for the
TEX_CONST state for the image, which is awkward to set up in the GL
driver.
The new handler goes into ir3_a6xx to be next to the other current image
code, but the a4xx version is left in place because it wants a bunch of
sampler helpers.
Fixes assertion failure in dEQP-VK.image.image_size.buffer.readonly_32.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3501>
Since I'm going to start using the resinfo opcode, make sure we can disasm
the blob's instances of it that I've found. And, since resinfo disasm
will impact ldgb on pre-a6xx, include some of those too.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3501>
We were using the size of the underlying buffer (in R8 elements), while we
need to be using the size of the image view (which may be a subset of the
underlying buffer, and may be in a different format from R8).
This fix means less dereferencing off of the end of shader image views for
buffer images, but more importantly is needed to get the right answer from
resinfo if we are to switch to that.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3501>
This was implemented in version 6 of the VK_ANDROID_native_buffer
extension and we only implement version 5. However, the Android
Vulkan loader only checks whether vkGetInstanceProcAddr for the
function is not NULL.
This all went wrong when we switched to the layer code from ANV.
Because the function may now be different per device, it adds fallback
functions that dispatch to the dispatch table. So if we didn't implement
the function we still returned a pointer to the dispatch function,
which made the Android Vulkan loader believe it was supported.
Dispatch functions:
d555794f30/src/amd/vulkan/radv_entrypoints_gen.py (L328)
Fixes: d555794f30 "radv: update entrypoints generation from ANV"
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2936
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5198>
Taken from EGL-Registry commit 90b78b0662e2f0548cfd1926fb77bf628933541b.
With this update EGL_WL_bind_wayland_display and
EGL_WL_create_wayland_buffer_from_image are now in the registry, so we
don't need to define them in eglmesaext.h anymore.
The eglSwapBufferWithDamage* functions now take a const rects argument.
The eglapi.c function signature is updated accordingly.
Signed-off-by: Simon Ser <contact@emersion.fr>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4953>
From GLSL 4.60.7 spec, section 6.1.2 "Subroutines":
It is a compile-time error if arguments and return type don’t match
between the function and each associated subroutine type.
Before, if subroutine type and implementation function were declared
with types, that could be implicitly converted, it led to a runtime crash.
Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5125>
Fix warning reported by Coverity Scan.
Uninitialized scalar field (UNINIT_CTOR)
uninit_member: Non-static class member m_num_clip_dist is not
initialized in this constructor nor in any functions that it calls.
Fixes: f7df2c57a2 ("r600/sfn: extract class to handle the VS export to different stages")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5180>
Given that a530 doesn't have cpufreq, we really don't have the time to be
running the validator on all of deqp. This also helps explain why I had
to go to such a small fraction on the a3xx gles3 run (which we can now
increase). However, a3xx gles2 seems to be fast enough that we can leave
it enabled and get coverage for older chips.
Because we run more tests now, clear out some stale xfails from the a3xx
list.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5089>
We were doing sed -i /filter/p, which printed everything but printed the
filtered things twice (though they'd only get tested once). Now that the
filter works, run all the UBO tests instead of doing a 1/5 run, revealing
a new failure.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5089>
the current query pool implementation expects queries to be reset at
the time they're initiated, which means queries started at this point
need to also be explicitly reset
the zink_begin_query() function can't be reused here or else the
query will be double-added to the active list, triggering an infinite loop
ref mesa/mesa#3000
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5120>
queries with a valid active_list pointer are likely to still be active,
and vk spec requires them to have completed prior to being destroyed
this isn't completely accurate, as it's currently possible for queries
to remain in the active list while not actually being active, but it
resolves driver crashes that can occur from destroying a stilll-running
query pool object
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5120>
These fields (TILE_ALL and MIN_LAYERSZ) seem to be the same on a5xx as
a6xx, having looked at some UBWC vs non-UBWC texturator cases. Setting
MIN_LAYERSZ does fix the 3D fail we see in the CTS.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5127>
I'm working on fixing the 3D layouts in CI so we can stabilize it, but I
wanted unit tests using the texturator scripts to make sure I don't break
things. This also makes a5xx and a6xx layout easily comparable again.
This is a straightforward move of the code with prsc references replaced
by arguments in the style of fdl6.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5127>
Different variables need to respect different bounds. In general,
16-bytes is okay, but for 4-channel 16-bit vectors, we can't cross 8
byte boundaries (else the swizzles will not be packable after), so we
update LCRA to allow this more general form.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5151>
Now that we don't get scheduled to any 19mhz CPUs, the old GLES3 job went
from 12 minutes of deqp-runner runtime to 54s. Increase how much of the
testsuite we cover in exchange, still keeping the runtime at 3-6 min
(compared to previous 10-17 min). Since the tests we're running changed,
reset the xfails list.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5115>
Issue description from Matt's commit e7c376ad:
"var_range_end(v, n) loops over the n components of variable number v and
finds the maximum value, giving the last use of any component of v.
Therefore it expects v to correspond to the variable associated with the
.x channel of the VGRF.
var_from_reg() however returns the variable for the first channel of the
VGRF, post-swizzle.
So, if the last register had a swizzle with y, z, or w in the swizzle
component, we would read out of bounds. For any other register, we would
read liveness information from the next register.
The fix is to convert the src_reg to a dst_reg in order to call the
dst_reg version of var_from_reg() that doesn't consider the swizzle."
Closes: #3003
Fixes: 48dfb30f ('intel/compiler: Move all live interval analysis results into vec4_live_variables')
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Andrii Simiklit <asimiklit.work@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4941>
Meson 0.55.0 will set the MESON_EXE_WRAPPER environment variable to the
joined version of that wrapper if it is needed. Our tests that take
compiled targets as arguments can use that information to run cross
built binaries, or if there isn't a wrapper and we get an ENOEXEC, we
can skip the tests gracefully.
We try to use mesonlib.split_args, which handles windows arguments
better than python's builtin shlex module, but fall back to that if the
meson module isn't available for some reason.
Cc: 20.0 20.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5103>
This should fix such valgrind warnings:
==37417== Uninitialised byte(s) found during client check request
==37417== at 0x6183471: blob_write_bytes (blob.c:163)
==37417== by 0x629785B: encode_type_to_blob (glsl_types.cpp:2760)
==37417== by 0x61E68D8: write_variable (nir_serialize.c:293)
==37417== by 0x61E6F6A: write_var_list (nir_serialize.c:421)
==37417== by 0x61EBA7A: nir_serialize (nir_serialize.c:2018)
==37417== by 0x5B5E007: serialize_nir_part (brw_program_binary.c:135)
==37417== by 0x5B5E7F3: brw_serialize_program_binary (brw_program_binary.c:299)
==37417== by 0x5FEF5FF: write_program_payload (program_binary.c:177)
==37417== by 0x5FEF7BB: _mesa_get_program_binary_length (program_binary.c:225)
==37417== by 0x5E3D31D: get_programiv (shaderapi.c:912)
==37417== by 0x5E3F730: _mesa_GetProgramiv (shaderapi.c:1827)
==37417== by 0x111DA0: program_binary_save_restore (shader_runner.c:686)
==37417== Address 0x8f59481 is 81 bytes inside a block of size 480 alloc'd
==37417== at 0x483B7F3: malloc (vg_replace_malloc.c:309)
==37417== by 0x618CE67: ralloc_size (ralloc.c:123)
==37417== by 0x618CF35: rzalloc_size (ralloc.c:155)
==37417== by 0x618D245: rzalloc_array_size (ralloc.c:234)
==37417== by 0x629041D: glsl_type::glsl_type(glsl_struct_field const*, unsigned int, glsl_interface_packing, bool, char const*) (glsl_types.cpp:148)
==37417== by 0x6293EC3: glsl_type::get_interface_instance(glsl_struct_field const*, unsigned int, glsl_interface_packing, bool, char const*) (glsl_types.cpp:1271)
==37417== by 0x604C878: (anonymous namespace)::per_vertex_accumulator::construct_interface_instance() const (builtin_variables.cpp:365)
==37417== by 0x6050722: (anonymous namespace)::builtin_variable_generator::generate_varyings() (builtin_variables.cpp:1568)
==37417== by 0x60509CA: _mesa_glsl_initialize_variables(exec_list*, _mesa_glsl_parse_state*) (builtin_variables.cpp:1600)
==37417== by 0x6149AE9: _mesa_ast_to_hir(exec_list*, _mesa_glsl_parse_state*) (ast_to_hir.cpp:131)
==37417== by 0x60706D6: _mesa_glsl_compile_shader (glsl_parser_extras.cpp:2222)
==37417== by 0x5E3DC16: _mesa_compile_shader (shaderapi.c:1211)
==37417== Use of uninitialised value of size 8
==37417== at 0x529AE13: ??? (in /usr/lib/x86_64-linux-gnu/libz.so.1.2.11)
==37417== by 0x6184075: util_hash_crc32 (crc32.c:127)
==37417== by 0x5FEF401: write_program_binary (program_binary.c:95)
==37417== by 0x5FEF8BC: _mesa_get_program_binary (program_binary.c:252)
==37417== by 0x5E40E22: _mesa_GetProgramBinary (shaderapi.c:2411)
==37417== by 0x4914057: stub_glGetProgramBinary (piglit-dispatch-gen.c:24737)
==37417== by 0x111E4A: program_binary_save_restore (shader_runner.c:704)
==37417== by 0x11F765: piglit_display (shader_runner.c:5112)
==37417== by 0x499082F: run_test (piglit_fbo_framework.c:52)
==37417== by 0x4980E89: piglit_gl_test_run (piglit-framework-gl.c:229)
==37417== by 0x110DA9: main (shader_runner.c:72)
v2: - decode_glsl_struct_field_from_blob and
encode_glsl_struct_field should be `static`
( Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> )
v3: - we can get rid of `struct packed_struct_field_flags`
( Tapani Pälli <tapani.palli@intel.com> )
- we can get rid of `unsigned __pad: 15` bitfield
( Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> )
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Andrii Simiklit <asimiklit.work@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5054>
We initially used this debug option to mean "don't bother registering
the OA configuration into the kernel".
This change makes this option suppress any interaction with the
i915/perf interface. This is useful when debugging self modifying
batches with performance queries while running on the intel_mi_runner.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2775>
This has the same kernel requirements are VK_INTEL_performance_query
v2: Fix empty queue submit (Lionel)
v3: Fix autotool build issue (Piotr Byszewski)
v4: Fix Reset & Begin/End in same command buffer, using soft-pin &
relocation on the same buffer won't work currently. This version
uses a somewhat dirty trick in anv_execbuf_add_bo (Piotr Byszewski)
v5: Fix enumeration with null pointers for either pCounters or
pCounterDescriptions (Piotr)
Fix return condition on enumeration (Lionel)
Set counter uuid using sha1 hashes (Lionel)
v6: Fix counters scope, should be COMMAND_KHR not COMMAND_BUFFER_KHR (Lionel)
v7: Rebase (Lionel)
v8: Rework checking for loaded queries (Lionel)
v9: Use new i915-perf interface
v10: Use anv_multialloc (Jason)
v11: Implement perf query passes using self modifying batches (Lionel)
Limit support to softpin/gen8
v12: Remove spurious changes (Jason)
v13: Drop relocs (Jason)
v14: Avoid overwritting .sType in
VkPerformanceCounterKHR/VkPerformanceCounterDescriptionKHR (Lionel)
v15: Don't copy the entire
VkPerformanceCounterKHR/VkPerformanceCounterDescriptionKHR (Jason)
Reuse anv_batch rather than custom packing (Jason)
v16: Fix missing MI_BB_END in reconfiguration batch
Only report the extension with kernel support (perf_version >= 3)
v17: Some cleanup of unused stuff
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2775>
For a future extension we want to be able to list the counters. Our
existing sets counters might contain the same counters multiple times.
This is a side effect of the fixed OA counters in the HW. We track
thoses with a mask so that we know when a counter is available from
multiple metrics.
v2: Use BITFIELD64_BIT() (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2775>
This change adds a call/return execution mode for secondary command
buffer rather than the existing copy into the primary batch mode.
v2: Rework convention to avoid burning an ALU register (Jason)
v3: Use anv_address_add() (Jason)
v4: Move command emissions to anv_batch_chain.c (Jason)
v5: Also move last MI_BBS emission in secondary command buffer to
anv_batch_chain.c (Jason)
v6: Fix end secondary command buffer end (Jason)
v7: Refactor anv_batch_address() to remove additional emit functions
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2775>
This instruction has a group with the same name than another field above :
<field name="Data DWord" start="64" end="95" type="uint"/>
<group count="0" start="96" size="64">
<field name="Register Offset" start="2" end="22" type="offset"/>
<field name="Data DWord" start="32" end="63" type="uint"/>
</group>
The script was replacing the offset of the field first with the second
one in the group.
This change ignore anything a group within an instruction.
v2: Drop unused variable (Rafael)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2775>
We can just set the extent and not bufferRowLength/bufferImageHeight,
and the extent may not be a multiple of the block size if it covers the
entire image. In this case we have to first divide to get the
width/height in terms of blocks, and then multiply by the block size to
get the buffer's pitch and layer size. Multiplying and dividing instead
won't get the correct result when the extent covers the entire image and
isn't a multiple of the block size. This also makes the code easier to
follow because we don't calculate a pitch in non-sensical units (bytes
times the block width) as an intermediate step.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5098>
Previously we only supported BLIT_SRC_BIT and BLIT_DEST_BIT together, so
we didn't have to worry about initializing blit-related fields for
texture-only formats, but it turns out that 2d blits work out just fine
with these formats and we'll need to enable BLIT_SRC_BIT for
texture-only formats due to a Vulkan requirement on compressed formats.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5098>
All it takes are a couple small tweaks to the clone infrastructure to
allow us to use it without any remap table at all. This reduces code
duplication and the chances for bugs that come with it. In particular,
the hand-rolled nir_alu_instr_clone didn't preserve no_[un]signed_wrap,
or source/destination modifiers.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5094>
It sure looks like it should be a Boolean value, but it's not. The
values that we really want for later platforms are either 2 or 3. The
old intel_stub.c in shader-db just always returns 3
(I915_GEM_PPGTT_FULL). This returns the same set of values per platform
that kernel 5.6.13 would.
When using the shim for ICL with i965 driver, this fixes:
i965 requires softpin (Kernel 4.5) on Gen10+.
Fixes: 0f4f1d70bf ("intel: add stub_gpu tool")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5061>
Rather than heuristically guessing what PIPE formats correspond to what
in the hardware, hardcode a table. This is more verbose, but a lot more
obvious -- the previous format support code was a source of endless
silent bugs.
v2: Don't report RGB233 (icecream95). Allow RGB5 for texturing
(icecream95).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5069>
This has the downside of putting block successor validation in two
places that are a bit further apart. However, handling them as a
special case makes the code more confusing than needed. At least two
different people have not noticed that we don't have jump instruction
validation in the last week or two and added it. Being able to search
for validate_jump_instr is useful.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5101>
For 16-bit bank LDS (ie. Kabini/Stoney) we need a slightly different
path. It's completely untested though because I don't have these
chips but according to vkpipeline-db the generated assembly seems fine.
Note that 16-bit I/O is currently only exposed on GFX9+ for both
compiler backends.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4966>
On EGL 1.4, one had to check for the existence of EGL_EXT_platform_base
before querying the eglGetPlatformDisplayEXT() and
eglCreatePlatformWindowSurfaceEXT() symbols, to then use them if the
EGL_EXT_platform_* extension for the given platform was exposed.
Since EGL 1.5, the platform functionality was made core, which means we
can obtain the symbols unconditionally, but we can't know the EGL
version before having created a display, at which point we've already
done a platform selection by passing an EGLNativeDisplay. The
EGL_KHR_platform_* extensions thus are used by clients to know whether
it's safe or not to dlsym() the EGL 1.5 symbols.
This commit adds those extensions when the given platform is enabled.
Acked-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5052>
We create a hash table mapping GPU va's to mmap structures, such that
searching for a mapped address is effectively O(1) rather than O(N) to
the number of mapped entries as with the previous linked list approach.
This is a memory-time tradeoff, but the speed-up is tracing is notable.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5099>
For cases where instructions have a src and/or dst type, validate that
it matches the src/dst register types. And for cases where there are
different opcodes for half vs full, validate that the opcode matches.
Now that we maintain this properly throughout the stages of the ir, we
can drop the fixups from the RA pass.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5048>
When we start doing cp iteratively, we hit the case that we've already
`cmps.s.*` into a `cmps.s.ne p0.x, ...`.. when we try to do that again
we can invert the logic condition. So check specifically the condition
to prevent this.
TODO we could maybe be more clever about this to combine conditions.
But why isn't that happening in nir? For example, see
dEQP-GLES31.functional.ssbo.layout.single_basic_array.packed.bool
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5048>
There can be multiple (for ex.) f32f16's from a single source, in
particular appearing in different blocks. We need to update all uses
of the src which had conversion folded in, not all the uses of the
individual cov. Also, to avoid invalidating the ssa use info that was
gathered at the beginning of the pass, don't actually eliminate the
cov, but instead change it to a simple mov that the cp pass can gobble
up.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5048>
Or do the easy thing and claim we always changed something. It is kinda
hard and not worth the effort to determine for real.
Also rip out unused error handling. This pass should never fail. And
we weren't even actually checking the return.
And while we're at it, switch over to taking the 'struct ir3 ir*`
instead of ctx, to standardize with the other passes.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5048>
The runner is an x86 system, so running the ARM image meant doing
everything at runtime under qemu, and for the xz of the test rootfs that
was quite expensive. Also, we can rebuild x86 images much faster than we
can rebuild arm images for container development, which will help unblock
some of the other feature parity work I have to do versus the old docker
system that cheza is using.
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5033>
We create a stub never-signaled seqno to force the iris_fence to use the
fence fd, but we need to fully initialise the iris_seqno struct so that
the unset pointers are NULL and we do not try to destroy them later.
==38644== Conditional jump or move depends on uninitialised value(s)
==38644== at 0xF7FBFAA: pipe_resource_reference (u_inlines.h:142)
==38644== by 0xF7FC22F: iris_seqno_destroy (iris_seqno.c:38)
==38644== by 0xF7E8930: iris_seqno_reference (iris_seqno.h:89)
==38644== by 0xF7E8BC3: iris_fence_destroy (iris_fence.c:131)
==38644== by 0xF7E8C41: iris_fence_reference (iris_fence.c:143)
==38644== by 0xEF24525: dri2_destroy_fence (dri_helpers.c:176)
==38644== by 0x4865DC2: dri2_egl_unref_sync (egl_dri2.c:3302)
==38644== by 0x48661E8: dri2_destroy_sync (egl_dri2.c:3433)
==38644== by 0x4855BA4: _eglDestroySync (eglapi.c:1952)
==38644== by 0x4855CF5: eglDestroySyncKHR (eglapi.c:1972)
==38644== by 0x402628: test_cleanup (egl_khr_fence_sync.c:232)
==38644== by 0x40421E: test_eglCreateSyncKHR_native_from_fd (egl_khr_fence_sync.c:1521)
Closes: #2909
Fixes: fd1907efb3 ("iris: Convert fences to using lightweight seqno")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5004>
Older kernels did not support `MSM_INFO_GET_IOVA`. But this is only
required for (a) clover (ie. `fd_set_global_binding()`) and drm paths
that are limited to newer kernels. So move the location of the assert
to fix new userspace on old kernels.
Fixes: c9e8df61dc ("freedreno: Initialize the bo's iova at creation time.")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5081>
"softpin" mode was introduced in the same kernel as the 'DUMP' flag. So
if we are using the legacy non-softpin path, clear the dump flag. OTOH
the 'DUMP' flag isn't quite so needed on older kernels, since we would
get all cmdstream, even SDS stateobjs, dumped regardless, as they would
have cmd table entries.
Fixes: b2c23b1e48 ("freedreno: Mark all ringbuffer BOs as to be dumped on crash.")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5081>
v2: Python's nitpick (Andres)
In case of an empty list of traces, the results.yml will contain an empty
curly braces. In YAML, an associative array can also be specified by text
enclosed in curly braces ({}),
This commit also adds the corresponding test to check the behavior of
tracie when no traces are added in the traces.yml file.
Signed-off-by: Pablo Saavedra <psaavedra@igalia.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4916>
test_tracie_skips_traces_without_checksum does the logic previous to
the commit 8546d1dd78. The traces.yml includes
several traces, only the one without checksum is ignored by tracie.
As a complementary action, this change adds an new test
(test_tracie_only_traces_without_checksum) to verify the behavior for
cases where the traces.yml only contains traces without checksum.
Finally, test_tracie_skips_traces_without_checksum is renamed as
test_tracie_traces_with_and_without_checksum
Signed-off-by: Pablo Saavedra <psaavedra@igalia.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4916>
v2: Verbatim translation from the original shell script
Make the corrections visible in explicit commits (Andres)
Remove redundant code (Alexandros)
Code style nitpick (Rohan)
Reimplementation of the tracie's self-tests using a pythonic test suit
(pytest).
The new tracie/test.py module is almost a direct translation of the
tests defined in the tracie/test.sh. This new implementation of the
test provides a more common framework where define the tests.
Also allows a better introspection for the tests results and/or
resulting errors.
This patch also adds python3-pytest as dependency for the built images
and adapts the tracie-runner scripts to run the self-test using pytest.
Signed-off-by: Pablo Saavedra <psaavedra@igalia.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> [v1]
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Rohan Garg <rohan.garg@collabora.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4916>
RESULTS_PATH and RESULTS_PATH, as variables in the module context, are
resolved one single time, only during the first module loading. If the
the Python code in execution changes the current dir at some point,
those paths are not going to be updated anymore keeping the paths
wrongly pointing to the old working dir.
This change modify the definition of those variables to use simply
relative paths.
Signed-off-by: Pablo Saavedra <psaavedra@igalia.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Rohan Garg <rohan.garg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4916>
A3xx apparently has higher alignment requirements than later gens for
indirect const uploads. It also has fewer of them. Add compiler
parameters for both settings, and set accordingly for a3xx and a4xx+.
This fixes all the ubo test failures caused by this optimization.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5077>
This fixes a number of dEQP tests:
dEQP-GLES3.functional.fbo.blit.conversion.r8*
dEQP-GLES3.texture.specification.basic_teximage2d.r8*
and others. The reason why this enum showed up in traces for R8 is that
it was an "upgraded" texture to R8G8.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5073>
Since commit 8b8af6d398 there is a
performance regression in dirt 4 on picasso APUs.
The game ends up feeding a large value into this which overflows on the
conversion to 16bit float. With the old implementation (which now lives
in util_float_to_half_rtz) it would be clamped to inf-1, while the new
one returns inf. This causes a performance hit somehow at some point
down the line.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fixes: 8b8af6d398 "gallium/util: Switch util_float_to_half to _mesa_float_to_half()'s impl."
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5062>
ppir doesn't register successors in other blocks, and causes
ppir_node_has_single_succ to be unreliable as it might return true for
nodes with successors in other blocks.
This is bad for optimization passes that try to pipeline registers or
avoid insertion of movs, as that can generally only be done for nodes
with a single user.
As of now, ppir can't just start adding successors in other blocks as
that breaks the scheduling code.
So this patch is a little hacky but enables pipelining optimizations
during lowering. It can hopefully be removed during future scheduler
rework.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4975>
In some paths where ppir_emit_cf_list is called, compilation errors such
as in unsupported features were not being handled, allowing compilation
to continue and fail at some random point later.
Handle them properly so compilation aborts in the expected way rather
than what may look like a compiler crash/bug.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4975>
All the other files in bin/ are not used by any build system and as such
cannot affect the build.
I've been working on maintainer tools lately and it's frustrating to have
the CI wait for 45 minutes to rebuild everything and not even read/run
the files in the MR when it could've just been merged and moved on to
the next MR 45 minutes ago.
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Acked-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5046>
Add new IRIS_DIRTY_STENCIL_REF dirty flag which would help us to trigger
separate 3DSTATE_WM_DEPTH_STENCIL packet using modify disable fields.
Instead of merging two packets into one in order to build
3DSTATE_WM_DEPTH_STENCIL state, set_stencil_ref can use
IRIS_DIRTY_STENCIL_REF bit and bind_zsa_state can use
IRIS_DIRTY_WN_DEPTH_STENCIL, both could cause packet to happen with
available information using modify disable bits which allow us to
construct packet by ignoring set of fields.
v2: (Kenneth Graunke)
- Fix condition ordering.
- Club GEN cases.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3688>
This implements OpCopyObject as a blind copy and propagates the
access mask properly even if the source object type isn't a SSA
value.
This fixes some recent dEQP-VK.descriptor_indexing.* failures
since CTS changed and now apply nonUniformEXT after constructing
a combined image/sampler.
Original patch is from Jason Ekstrand.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4909>
If we have a separate render resource, it may contain more up-to-date
data than what is available in the base resource, so we need to retarget
the transfer to this resource. As the most likely reason for the
existence of the render resource is a multi-tiled render layout we need
to allow this transfer to go through the resolve/blit copy path, as we
can't de-/tile those layouts in software.
Fixes: b962776530 (etnaviv: rework compatible render base)
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5051>
We will later use the devinfo from iris_bufmgr, where we don't have
access to the screen pointer. And since we are moving it, we can reuse
it in Anv and i965.
v2: return error code and check for it on Anv (Lionel).
v3: Remove anv_gem_get_aperture() from anv_private.h and stubs (Lionel).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5043>
Some apps do set pSourceRect to the full area even if not needed.
Filtering out this case is helpful, as we currently do not handle
properly resizing (pDestRect or window size not of the size of the resource)
when pSourceRect is set. Indeed in this case pSourceRect needs to be modified
before being passed to the presentation backend.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5015>
pSourceRect and pDestRect allows to:
A) display a crop of the backbuffer
B) display the content in a subregion (after an offset)
C) resize the content before displaying
Before this patch, only features A and B were supported.
This patch adds C, but breaks A, which current support
relied on C not being implemented.
I think C is more important than A, and A can be added later.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5015>
This patch caps to 4GB the limit of GPU memory accessible
only for 32bits build.
This would deserve some tests on windows, so we might change that
behaviour in the future. For example, it's possible that
GetAvailableTextureMem is capped to 4GB on 64bits build.
We cap to a bit less than 4GB, which might help
https://github.com/iXit/Mesa-3D/issues/323
In addition, increase from 80% to 95% the allocation limit above
which we fail allocating.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5015>
Ideally apps shouldn't make buggy calls.
Still if some do, let's avoid crashing.
Without this patch, if some calls are invalid,
for example if replaying a trace of a game needing
a lot of VRAM on a card with not much VRAM, it can
crash.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5015>
I don't remember exactly the conditions of the crash,
but I had a trace which was crashing in the gallium driver
before doing any rendering (something about viewports being not initialized).
It's not the first time we hit such a problem, so rather than investigating
that crash, I chose to just initialize every states at device creation.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5015>
This function has been added in glibc 2.25, and the related syscall in
Linux 3.17, in order to avoid requiring the /dev/urandom to exist, and
doing the open()/read()/close() dance on it.
We pass GRND_NONBLOCK so that it doesn’t block if not enough entropy has
been gathered to initialise the /dev/urandom source, and fallback to the
next source in any error case.
Signed-off-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2026>
Device UUID becomes SHA1('freedreno' + gpu_id).
Driver UUID becomes SHA1(mesa-version + git-head-sha1).
v2: Don't use build_id for driver UUID since it generates different
values for vulkan and gl shared objects. (Kristian)
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4847>
Whoops. After fixing dual-source blending, dEQP-VK.pipeline.blend.* all
go from skipped to pass, and fixes a bunch of
dEQP-VK.api.info.format_properties.* tests where blending is required.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5039>
The hardware expects that where MRT0 and MRT1 would normally go are the
dual sources for MRT0, whereas GLSL has an extra "index" parameter that
indicates which source it is. Remap it when handling FS outputs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5039>
...in every file that includes intel_log.h.
In file included from src/intel/vulkan/anv_private.h:93,
from src/intel/vulkan/genX_cmd_buffer.c:27:
src/intel/common/intel_log.h: In function ‘__intel_log_use_args’:
src/intel/common/intel_log.h:75:34: warning: unused parameter ‘format’ [-Wunused-parameter]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4994>
...in every file that includes anv_private.h.
In file included from src/intel/vulkan/genX_cmd_buffer.c:27:
src/intel/vulkan/anv_private.h: In function ‘anv_image_get_clear_color_addr’:
src/intel/vulkan/anv_private.h:3690:57: warning: unused parameter ‘device’ [-Wunused-parameter]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4994>
Since i965 no longer uses this function for blitting color buffers,
there is no driver left that will ever support multisample textures and
use _mesa_meta_BlitFramebuffer.
v2: Delete blit_state::msaa_shaders and enum blit_msaa_shader.
text data bss dec hex filename
12243286 1344936 1290748 14878970 e308fa before/lib64/dri/i965_dri.so
12240398 1344936 1290748 14876082 e2fdb2 after/lib64/dri/i965_dri.so
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/856>
text data bss dec hex filename
12243510 1344936 1290748 14879194 e309da before/lib64/dri/i965_dri.so
12243446 1344936 1290748 14879130 e3099a after/lib64/dri/i965_dri.so
v2: Use _mesa_load_matrix to set the projection matrix instead of
frobbing dirty bits directly. Suggested by Jason. Clean up some
comments in the neighborhood. Since glOrtho isn't called, there's no
point in mentioning it in the comment. Only _mesa_load_identity_matrix
on the projection matrix when it isn't set to an ortho matrix.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/856>
Unlike the existing _math_matrix_ortho, the new _math_float_ortho
function just stores the calculated matrix in an array of floats. It
does not multiply the new matrix with data already stored.
text data bss dec hex filename
12243486 1344936 1290748 14879170 e309c2 before/lib64/dri/i965_dri.so
12243510 1344936 1290748 14879194 e309da after/lib64/dri/i965_dri.so
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/856>
These are basically DSA versions of glLoadIdentity() and glLoadMatrix()
that are available for internal Mesa use.
text data bss dec hex filename
12243574 1344936 1290748 14879258 e30a1a before/lib64/dri/i965_dri.so
12243486 1344936 1290748 14879170 e309c2 after/lib64/dri/i965_dri.so
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/856>
Since i965 no longer uses this function for clearing color buffers,
there is no driver left that will ever support integer textures and use
_mesa_meta_glsl_Clear.
As a side note, the has_integer_textures check was rubbish anyway
because meta always smashes the API to API_OPENGL_COMPAT.
text data bss dec hex filename
12244406 1344936 1290748 14880090 e30d5a before/lib64/dri/i965_dri.so
12243574 1344936 1290748 14879258 e30a1a after/lib64/dri/i965_dri.so
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/856>
text data bss dec hex filename
12244854 1344936 1290748 14880538 e30f1a before/lib64/dri/i965_dri.so
12244406 1344936 1290748 14880090 e30d5a after/lib64/dri/i965_dri.so
v2: Put static on the function definition too. Suggested by Paulo.
v3: Reformat prototype. Suggested by Jason.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> [v2]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/856>
text data bss dec hex filename
12244974 1344936 1290748 14880658 e30f92 before/lib64/dri/i965_dri.so
12244854 1344936 1290748 14880538 e30f1a after/lib64/dri/i965_dri.so
v2: Put static on the function definition too. Suggested by Paulo.
v3: Reformat prototype. Suggested by Jason.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> [v2]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/856>
For the piglit drawoverhead case, 5/18 of the objects' relocs were
duplicated. We can dedupe them at object create time (since objects are
long-lived) and avoid repeated relocation work at emit time.
nohw drawoverhead program statechange throughput 2.34082% +/- 0.645832%
(n=10).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5020>
vkCmdResolveImage doesn't respect render-condition, so let's fall back
to blitter in this case instead.
Fixes: 80d7cc6f129 ("zink: enable conditional rendering if available")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5008>
We don't want LLVM 8 packages to be pulled in from testing though (it
would make installing llvm-8-dev for cross architectures a lot more
complicated), so explicitly select buster-backports for them (they were
already implicitly installed from there before, since they're not
available in buster proper).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4851>
Notable changes:
* No longer generate a separate *-built-by-job-* image tag, instead
store the pipeline/job information as labels in the image.
* Clean up some package information files which were accidentally left
before, possibly resulting in slightly smaller images.
Acked-by: Andres Gomez <agomez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4851>
We will eventually need to build our own LLVM on Windows in order to
build libclc and other bits which are required for the d3d12 build, as
well as to be able to test SPIR-V/OpenCL on llvmpipe.
Start doing this now, building into the base container, and exercise
this by building llvmpipe under Windows.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4946>
'Newer' versions of MSVCRT than 2013 appear to have fixed the bug around expf
precision which caused bb9e8c5090. It's not clear when this was
changed, but at least on Windows 10 machines with Visual Studio 2019,
expf behaves in line with other implementations.
As there is no clear way to test for the version of the VCRT in use,
simply mark this test as expected-pass rather than xfail.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4946>
It seems less brittle to not assume they are in the same order for src
and dst instructions.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Caught by the sanity checking in nir_intrinsic_copy_const_indices()
(which is introduced by the next patch).
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
These intrinsics are supposed to map to the underlying hardware
instructions, which don't have wrmask. We use them when we lower
store_output in the geometry pipeline and since store_output gets
lowered to temps, we always see full wrmasks there.
It saves addressing math, but may cause multiple loads to be done and
bcseled due to NIR not giving us good address alignment information
currently. I don't have any workloads I know of using non-const-uploaded
UBOs, so I don't have perf numbers for it
This makes us match the GLES blob's behavior, and turnip (other than being
bindful).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4858>
With the upcoming LDC usage in the GL driver, we don't want to be
uploading descriptors for every UBO when they aren't actually in use.
Trimming NIR's num_ubos will avoid that, and cleans up num_ubo handling
elsewhere right now.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4858>
The vma_heap allocator was originally designed to prefer high addresses
in order to find bugs in ANV's high address handling. However, there
are cases where you might want the allocator to prefer lower addresses
for some reason. This provides a configure bit for exactly this
purpose.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5019>
The HW requires a log2 width/height of the level 0 meta_* size in the
descriptors, making it pretty clear that UBWC mipmapping is all
power-of-two sized. Fixes a bunch of failures in the upcoming unit UBWC
layout unit tests.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4931>
Using texturator on a P3A at 1024x1024, RG8 has log2w/h of 6x7 instead of
R16I/UI's 6x8. The other blockw/h I verified other than cpp=1
(R8/R8I/R8UI didn't use UBWC) and 32 (would need a bigger type).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4931>
This patch splits the visit_phi() function into
three different ones according to the kind of phi
(merge-node, loop-header or loop-exit) and calls
them when visiting the cf_nodes.
This allows to revisit loops if the loop header's
phis have changed, only.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4062>
Also, rewrite the format decision code so that we correctly decide when
the linear fallback is needed, even if UBWC is disabled. As part of
that, I also moved around some of the code to handle compressed formats
to make sure that copying compressed formats with a linear staging blit
works (this is now possible since we started allowing tiled compressed
textures).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5007>
This is simpler than a full-blown memory reuse mechanism, but is good
enough to make sure that repeatedly doing a copy that requires the
linear staging buffer workaround won't use excessive memory or be slowed
down due to repeated allocations.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5007>
If one VMEM instruction uses a sampler and the other doesn't, we can't do
this optimization.
Totals from 47 (0.04% of 127638) affected shaders:
CodeSize: 271744 -> 271656 (-0.03%); split: -0.04%, +0.01%
Instrs: 52783 -> 52761 (-0.04%); split: -0.05%, +0.01%
Cycles: 5547040 -> 5546952 (-0.00%); split: -0.00%, +0.00%
VMEM: 10022 -> 9887 (-1.35%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4949>
The GC880 on iMX6DL indicates in it's minorFeatures2 register that it
does support SEAMLESS_CUBE_MAP, however when the TE.SAMPLER_CONFIG1
VIVS_TE_SAMPLER_CONFIG1_SEAMLESS_CUBE_MAP bit is set on GC880 on iMX6DL,
the result is corrupted image. In particular, the following ~112 dEQPs
are affected and fail:
dEQP-GLES2.functional.texture.filtering.cube.*
This only happens on MX6DL GC880, MX6Q GC2000 and STM32MP1 GC400(GCnano)
do not report the minorFeatures2 SEAMLESS_CUBE_MAP bit and ignore the
TE_SAMPLER_CONFIG1 VIVS_TE_SAMPLER_CONFIG1_SEAMLESS_CUBE_MAP bit (note
that ss->seamless_cube_map is unconditionally set by mesa at times even
PIPE_CAP_SEAMLESS_CUBE_MAP_PER_TEXTURE returns 0), so there is no visible
problem and there are no failing dEQP tests on the GC2000 and GCnano.
This might imply that the minorFeatures2 SEAMLESS_CUBE_MAP has some
different meaning on GC880 or the SEAMLESS_CUBE_MAP behaves differently
on the GC880.
This patch does not set the SEAMLESS_CUBE_MAP bit on hardware which does
not indicate support for seamless cube map and on GC880, which results
in reduction in failed dEQPs: 635 to 186 on GC880, 274 to 270 on GC2000
and no change on GC400(GCnano).
Fixes: 8dd26fa2f0 ("etnaviv: support GL_ARB_seamless_cubemap_per_texture")
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Marek Vasut <marex@denx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4865>
On a6xx we need a 0,0 based scissor in the binning pass, but can use the
blit-scissor to avoid restore/resolve of untouched pixels, and use the
conditional execution if the IB to bin to skip bins with no geometry
(due to the scissor).
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5021>
Similar to what we do in postsched. It is useful for pre-RA sched to be
a bit aware of things that would cause syncs. In particular for the tex
fetches, since the vecN src/dst tends to limit postsched's ability to
re-order them.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4923>
If an instruction's only use is as an output, and it increases register
pressure, then try to avoid scheduling it until there are no other
options.
A semi-common pattern is `fragcolN.a = 1.0`, this pushes all these
immed loads to the end of the shader.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4923>
Similar to avoidance of `(ss)` syncs, it turns out to be helpful to
avoid `(sy)` syncs as well. This helps us turn an tex, (sy)alu, tex,
(sy)alu sequence into tex, tex, (sy)alu, alu, which is a big win in
gfxbench gl_fill2.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4923>
It seems for short frag shaders, too much prefetch can be detrimental.
I think what we *really* want to do is decide after pre-RA sched, when
we also know about nop's and what the actual ir3 instruction count is.
But that will require re-working how prefetch lowering works. For now
this is a super crude heuristic to attempt to approximate a good
solution.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4923>
We can no longer assume that `state->ranges[0]` is block 0. It *often*
is, but when we encounter a "real" ubo that we lower to `load_uniform`
before a block 0 `load_ubo`, it could end up another entry in the table.
Resulting in the second pass after gathering ubo ranges, not finding a
valid range. Which results in a `load_ubo` for a thing that is not
actually a ubo making it's way into ir3 frontend. Resulting in grabbing
what we think is a ubo address out of some unrelated const register, and
trying to dereference that. Which as you can imagine, fails in amusing
ways.
Fixes: fc850080ee ("ir3: Rewrite UBO push analysis to support bindless")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4954>
Corresponds roughly to what we analyze. Note that "terminate AND
execute" is a contradiction (rather: it's equivalent to just
terminating), hence why there are only three possibilities for the
states of the flags:
.cont = continue, don't execute
.last = don't continue, don't execute
.cont.last = continue and execute
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5014>
This is for an optimization I plan in a following commit. I found I had
to add likely()s to avoid a perf regression from branch prediction.
On the drawoverhead 8 UBOs test, the HW can't quite keep up with the CPU,
but if I set nohw then this change is 1.32023% +/- 0.373053% (n=10).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4996>
This commit fixes a GS regression introduced in !4562 where
ir3's GS lowering pass was moved from common code (ir3_nir) to
freedreno-specific code (ir3_shader). For GS support in turnip, we
need to add the GS lowering pass back in, this time in tu_shader.
As for the nir_gather_info change, the GS lowering pass has always
introduced a discard_if intrinsic into the GS. Previously, we simply
ran nir_shader_gather_info before GS lowering, but now since we lower
the GS before we need to remove the assertion that only a FS can use
the discard_if intrinsic.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4892>
There is an annoying spec corner on Android. Because VkSwapchain is
implemented in the Vulkan loader on Android which may not know about
this extension, we have to handle it as a special case inside the
driver. We only have to do this on Android and only for VkSwapchainKHR.
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4882>
It seems this is actually a "minimum pitch" value. For example
TFETCH6_2_BYTE means a minimum pitch of 128 bytes for mipmap levels.
This fixes breakage with compressed formats. For example this test:
dEQP-VK.pipeline.sampler.view_type.2d.format.eac_r11_snorm_block.mipmap.linear.lod.equal_min_3_max_3
Fixes: a34b3fa198 ("freedreno/fdl: Align after dividing by block size")
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5009>
The nir_analyze_ubo_ranges pass removes all UBO block 0 loads to reverse
what nir_lower_uniforms_to_ubo() had done, and we only upload UBO pointers
to the HW for UBO block 1-N, so let's just fix up the shader state.
Fixes an off by one in const state layout setup, and some really dodgy
register addressing trying to deal with dynamic UBO indices when the UBO
pointers happen to be at the start of the constbuf.
There's no fixes tag, though this fixes a bug from September, because it
would require the num_ubos fix in nir_lower_uniforms_to_ubo.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4992>
The fixed commit was really nice in mostly fixing num_ubos to reflect the
shader after lowering, but for
dEQP-GLES31.functional.compute.basic.ubo_to_ssbo_single_invocation there
are no default uniforms and so we skipped the increment, even though we
shifted the block index up.
Fixes: 4777ee1a62 ("nir: Always create UBO variable when lowering uniforms to ubo")
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4992>
Using non-write flags is pretty dubious -- it means the kernel tracking an
array of read-only consumers of the BO and having exclusive consumers wait
on each reader's fence. It allows multiple readers through dma-bufs to do
work in parallel, but at the cost of kernel CPU time and memory management
of the shared array. Other drivers have dropped this distinction since
dma-buf sharing is usually producer-consumer, not producer-two-consumers,
and the userspace and kernel space tracking is expensive.
For us, this lets us drop the flags passed in for relocs and tracked in
the ringbuffer reloc lists. The end result of the flags reduction work is
drawoverhead uniforms test throughput 2.37195% +/- 0.365579% (n=15)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4967>
It's silly to have all the reloc emitters passing around FD_RELOC_READ
when you have to have it set on all relocs (that don't include WRITE,
which implies read) for the kernel to actually track the fences on the BO.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4967>
../src/mesa/main/glthread_draw.c: In function ‘_mesa_marshal_MultiDrawElementsBaseVertex’:
../src/mesa/main/glthread_draw.c:812:36: error: implicit declaration of function ‘alloca’; did you mean ‘malloc’? [-Werror=implicit-function-declaration]
812 | const GLvoid **out_indices = alloca(sizeof(indices[0]) * draw_count);
| ^~~~~~
| malloc
../src/mesa/main/glthread_draw.c:812:36: error: initialization of ‘const GLvoid **’ {aka ‘const void **’} from ‘int’ makes pointer from integer without a cast [-Werror=int-conversion]
cc1: some warnings being treated as errors
Include c99_alloca.h to portably make the alloca() prototype available.
Fixes: 2840bc30
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4920>
TMUSLOD register is the same that TMUS but having the same effect that
setting disable_autolod on the TMU configuration parameter 2.
So using that register is potentially more efficient, as in several
cases we would be able to skip writing P2.
One case where we can't use it is for texture cube maps, as we need to
use TMUSCM.
v2: don't put a comment in the middle of the conditions (Iago)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4962>
Texture access has three configuration parameters, P0 (texture), P1
(sampler) and P2(lookup). P1 and P2 are optional, but if P2 is needed
(like for example to set the offset for texelFetchOffset), then you
need to set P1.
But until now when setting up P1 we were asking the driver to fill up
the address with the shader state. But in that case we can just fill
that address with the default value NULL.
So let's avoid asking the driver to fill that default values, and do
it directly on the compiler. This is a good-to-have on OpenGL, and
likely would be needed on Vulkan.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4962>
Commit 1bc71e8b65 already did that for
the 3rd offset, but it also needs to do it for the 2nd (to handle 1d
array).
Fixes assertion failures with Vulkan CTS tests using 1darray
targets. Seems that there isn't too many 1darray tests on OpenGL CTS,
and OpenGL-ES don't support 1d arrays, but the same problem could
arise eventually on OpenGL.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4962>
On all Gen7+ platforms except DG1, the URB is a subsection of the
configurable L3 cache, and so the size can vary. The size listed
in the documentation on those platforms is an "example size", picked
by calculating it based on an arbitrarily chosen L3 config.
Hardcoding a value for those platforms provides no value and only
confuses people trying to fill out these tables when doing hardware
enabling. anv and iris never use this field. i965 uses it to
initialize brw->urb.size, but then updates that in update_urb_size()
to be the correct value, so the initial value doesn't matter.
Delete the values for Gen7+ and update the comment accordingly.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4969>
Fixes the following building errors:
In file included from external/mesa/src/gallium/drivers/freedreno/a6xx/fd6_blitter.c:40:
external/mesa/src/gallium/drivers/freedreno/a6xx/fd6_pack.h:42:10: fatal error: 'adreno-pm4-pack.xml.h' file not found
^~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
In file included from external/mesa/src/gallium/drivers/freedreno/a6xx/fd6_blend.c:36:
external/mesa/src/gallium/drivers/freedreno/a6xx/fd6_pack.h:42:10: fatal error: 'adreno-pm4-pack.xml.h' file not found
^~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
In file included from external/mesa/src/gallium/drivers/freedreno/a6xx/fd6_const.c:26:
external/mesa/src/gallium/drivers/freedreno/a6xx/fd6_pack.h:42:10: fatal error: 'adreno-pm4-pack.xml.h' file not found
^~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
Fixes: ee293160 "freedreno/a6xx: add OUT_PKT()"
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4973>
The dependency is required to get the necessary generated headers
Fixes the following building error:
In file included from external/mesa/src/freedreno/drm/msm_bo.c:27:
In file included from external/mesa/src/freedreno/drm/msm_priv.h:30:
In file included from external/mesa/src/freedreno/drm/freedreno_priv.h:51:
external/mesa/src/freedreno/drm/freedreno_ringbuffer.h:35:10: fatal error: 'adreno_common.xml.h' file not found
#include "adreno_common.xml.h"
^~~~~~~~~~~~~~~~~~~~~
1 error generated.
Fixes: 6c688ae8 ("freedreno: Deduplicate ringbuffer macros with computerator/fdperf")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4973>
This is yet another simple optimization that attemts to save the
insertion of an unnecessary mov for a large number of cases.
If the node outputting the condition for select satisfies a few
requirements (which are common in the case of comparison conditions),
it can just be changed to pipeline output and used directly.
In case of difficult corner cases, just fall back to the mov as before.
The sel_cond op is removed as the scheduler can be smart enough to place
nodes that output to ^fmul in the ALU_SCL_MUL slot, and as there can be
alu ops other than just mov.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4632>
It turns out that with more aggressive combining, there can be cases
where the available const slots are not enough for one instruction.
In particular, fcsel can take up to two consts, and a previous alu slot,
such as a comparison condition, might require an additional const.
So add a fallback for it like for uniforms.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4632>
In many cases, it is possible to avoid creating a mov for the store
output node.
Additionally, nodes other than alu, such as load varying, can be valid
store output nodes too.
This is another small optimization, but helps a vast majority of
programs by 1 instruction.
Shaders with discard easily become complicated to handle properly.
Some example issues: ppir has to rely on instruction ordering; or a
node with ssa output could be required both before a discard_if (as a
condition) and after it (as the instruction with the 'stop' bit set).
So don't try to handle them here.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4632>
The previous code assumed that a ppir node would be created for each nir
instr and used that to add it to the list of nodes and verify success.
This didn't make much sense anymore since some emit paths create
multiple nodes anyway, and this didn't allow for an emit call to not
create any new ppir node while still returning success.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4632>
Move the duplicate consts step to a nir pass.
This makes the nir representation closer to what ppir will have in the
result.
Additionally, it handles the case where a const is used multiple times
by a single node (which can happen in instructions like fcsel). The new
implementation will only emit a single load const for that case.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4535>
Move the duplicate uniform and varying steps to a nir pass, along with
some changes in the duplicating strategy.
Node duplication is now done per user of the varying/uniform. This is
inspired by what the offline shader compiler seems to usually do, and as
usual aims to reduce register pressure and better utilize the ld_uni and
ld_var instruction slots.
It is worth noting that due to a bug/feature, ppir was already
duplicating uniforms per successor in ppir_node_add_src even if the
comment indicated it was meant to be per-block.
Additionally, ppir was duplicating load uniform nodes twice for nodes
that use the same uniform in more than one source, resulting in one
unnecessary (and unpipelineable) load. This new implementation in nir
only creates one load in that case.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4535>
Varying loads with a single successor have a high potential to be
combined with its successor node, like ppir does for uniforms, rather
than being in a separate instruction.
Even if ppir becomes capable of combining instructions in a separate
step, combining varying loads during node_to_instr is trivial enough
that it seems to be worth doing it in this stage, and this benefits
pretty much every program that uses varyings.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4535>
Even if a node has pipeline output and a single successor, it is still
valid for that successor to have multiple references to that pipeline
node. A trivial example is add(u.x,u.y) where u is a uniform.
It is even possible for this to occur with consts as operands of fcsel.
So remove uses of ppir_node_get_src_for_pred as that would assume a
single src in the node that uses the pipeline.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4535>
The lod bias register is correctly run through the entire compilation
process, but in the end its allocated register value was never being
added to the instruction.
It seems that most programs were lucky enough that lod bias was assigned
register 0.x so that things worked anyway.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4535>
The current solution for handling registers that live and die within a
single instruction does not handle all cases. In particular, these
intra-instruction use register also conflict with registers that are
part of the live_in set.
Unfortunately, adding them to the live_in set is not an easy solution as
that would cause them to be propagated upwards. So, add a separate set
to handle these registers in the particular instructions, without
propagating them.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4535>
src/panfrost/shared is shared with lima driver, build
bifrost_compiler for lima driver is meaningless and
get link error when only lima driver is enabled.
So only build bifrost_compiler when configued with:
meson -Dtools=panfrost
Fixes: ec2a59cd7a "panfrost: Move non-Gallium files outside of Gallium"
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4960>
v1. Replace the existed bool type with new bitfield and edit register
files to take a mask instead of duplicating codes to do masking.
v2. Use fragcoord_compmask != 0 instead of fragcoord_compmask > 0 since
it represents a bitfield.
Tested with
dEQP-VK.glsl.builtin_var.simple.fragcoord_xyz/w
dEQP-GLES2.functional.shaders.builtin_variable.fragcoord_xyz/w
Closes: #2680
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4723>
This allows a pool's allocations to start somewhere other than the base
address. Our first real use of this will be to use a negative offset
for the binding table pool to make it so that the offset is baked into
the pool and the code in anv_batch_chain.c doesn't have to understand
pool offsetting.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4897>
SCHED_IDLE on linux can lead to extraordinarily long periods of no scheduling
leading to starvation of minimum priority threads for such an extended period
that it can eventually lead to GUI stalls. Switch to renicing the threads to
the lowest priority and use the SCHED_BATCH scheduling policy which is a hint
to the scheduler that this is latency insensitive thread instead. This change
has been confirmed to address unexpected GUI related stalls in mesa
applications across a range of different linux kernels.
Signed-off-by: Con Kolivas <kernel@kolivas.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4912>
This is base on the exported extension strings, with some known-bad
extensions removed. There might be more that should be removed, as their
support isn't per-spec, but this gives us some more information, at
least.
There's also a few features that seems to be trivial to enable, simply
by flipping a cap. But let's document what is expected to work first.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2075
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4963>
We were not consistent with minimums reported in the physical device
properties.
Fixes a few CTS tests :
dEQP-VK.memory.requirements.dedicated_allocation.buffer.regular
dEQP-VK.memory.requirements.extended.buffer.regular
dEQP-VK.memory.requirements.core.buffer.regular
v2: Use define for the limit
v3: Rename define
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: a0de2e0090 ("anv: increase minUniformBufferOffsetAlignment to 64")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4940>
Now that ACO supports all shader stages (the only exception is NGG
GS on Navi10 but it fallbacks to legacy GS) it makes sense to remove
the LLVM version string reported as part of the device name.
The LLVM version string was added in the past for some Feral games
to workaround LLVM issues by detecting the version. With ACO, this
is unecessary because the Mesa version is enough to eventually enable
specific shader workarounds.
When the LLVM version string is missing, it is assumed that an old
LLVM is used and workarounds are automatically applied. The only
Vulkan games that might be affected is Shadow of The Tomb Raider
but the impact should be fairly small.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4911>
LLVMpipe recently got the ability to render to MSAA-surfaces, but in
order for this to work on Windows, we need to allocate a separate MSAA
resource and resolve using a blit before we can display it.
Without this, we end up always displaying the first sample instead of
the resolved result.
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4926>
Apparently this is allowed, and the CTS started doing this more often
recently which resulted in frequent hangs running the entire CTS. I
copied the code to create an empty FS from radv.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4928>
This is currently broken hard, because this code is being used in more
places that it used to be, and fixing that is prohibitively hard right
now.
This is far from ideal, as it leaves the same inconsistency in the
EMBEDDED_DEVICE code-path. But that only used by VMWare, so it's
probably better if they fix it, as they know their requirements better
than we do.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2911
Fixes: 76f79db3f5 ("util: stop including files from mesa/main")
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4919>
DXT1_RGBA and sRGB variants of DXT[135] formats are enabled as
valid format on V3D.
Once all S3TC formats supported by V3C are enabled the following
extensions become exposed by gallium.
* GL_ANGLE_texture_compression_dxt3
* GL_ANGLE_texture_compression_dxt5,
* GL_EXT_texture_compression_dxt1
* GL_EXT_texture_compression_s3tc
* GL_S3_s3tc
* GL_EXT_texture_compression_s3tc_srgb
This enables 206 passing piglit test related to gl_compressed.*s3tc_dxt
Cc: 20.0 20.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4934>
Swizzles were ignoring the W component of the format DXT3_RGBA and
DXT5_RGBA.
This fixes 15 piglit tests:
spec/!opengl 1.1/copyteximage 2d
spec/!opengl 1.2/copyteximage 3d
spec/arb_texture_compression/fbo-generatemipmap-formats/gl_compressed_rgba
spec/arb_texture_compression/fbo-generatemipmap-formats/gl_compressed_rgba npot
spec/arb_texture_compression/texwrap formats bordercolor-swizzled/gl_compressed_rgba, swizzled, border color only
spec/arb_texture_compression/texwrap formats bordercolor/gl_compressed_rgba, border color only
spec/arb_texture_cube_map/copyteximage cube
spec/arb_texture_cube_map/copyteximage cube samples=2
spec/arb_texture_cube_map/copyteximage cube samples=4
spec/arb_texture_rectangle/copyteximage rect
spec/arb_texture_rectangle/copyteximage rect samples=2
spec/arb_texture_rectangle/copyteximage rect samples=4
spec/ext_texture_array/copyteximage 2d_array
spec/ext_texture_array/copyteximage 2d_array samples=2
spec/ext_texture_array/copyteximage 2d_array samples=4
Fixes: 469bbd8387 "broadcom/vc5: Move the formats table to per-V3D-version compile."
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4934>
For compressed formats, we need to align the number of blocks, not the
logical number of pixels in the texture. Only compressed formats have
block width/height > 1, so we can just unconditionally multiply the
alignment by the block width/height.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4868>
robclark found that we needed unique IDs when multiple runners were trying
to report flakes at the same time, but it turns out due to nick limits (16
chars on freenode) we were just getting all the runners appended with
"-142" (or whatever the prefix of the pipelines are these days). And, for
the new flake reporting from baremetal, all the runners ended up being
just "google-freedreno".
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4896>
We were incorrectly taking the merge-request on non-MR pipelines (the
master build after merge) due to a missing '$'. And, for those pipelines,
it would be nice to note whether they're for master or a stable branch.
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4896>
../src/mesa/drivers/dri/i965/brw_wm_surface_state.c:1378:32: runtime error: index 3503345872 out of bounds for type 'uint32_t [149]'
brw_assign_common_binding_table_offsets has the following comment:
"Unused groups are initialized to 0xd0d0d0d0 to make it obvious that they're
unused but also make sure that addition of small offsets to them will
trigger some of our asserts that surface indices are < BRW_MAX_SURFACES."
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4350>
The VRAM size returned to apps is computed as follows:
vram_size = real_hw_vram_size - visible_vram_size.
Visible VRAM buffers should be counted only in the visible VRAM
counter and not twice. Buffers with the NO_CPU_ACCESS flag are
known to not be mappable, so they are counted in the VRAM counter.
Other buffers, with the CPU_ACCESS flag, or without any of both
(imported buffers) are counted in the visible VRAM counter because
they are mappable.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4834>
This uses the templating to generate multisample version of the
tri plane raster functions
This doesn't generate any optimised version for lower plane numbers,
maybe this is worth doing in the future.
v2: drop generating 32-bit msaa (Roland)
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4122>
Extract the final per-sample masks and store to the multisample
color buffers using them.
This retypes the pointer to a uint8_t at entry to make the
GEP simpler, then recasts to the blend type.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4122>
A set of values have to be passed from the early depth test to the
late depth write, when multisampling is enabled, a range of those
values have to be stored between stages, so create storage for them
and pass the values through the storage.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4122>
Start adding support for multisample masks and the depth passes
The depth passes have to run per-sample, this isn't complete support
it adds the loops, and handles the execution masks.
One mask is stored per sample, they are combined post the early Z
pass into a single shader execution mask, and then the resulting
shader execution mask is anded back in for the late Z pass.
Init the vars to NULL to avoid gcc warnings
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4122>
The current depth stencil test code has some optimisations using
the mask when there is only one depth value, multisample requires
per-sample zstencil testing, and for that case just pass in the
mask that needs updating.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4122>
There is a race with u_blitter shaders + pipeline shaders (aaline/aapoint)
where the draw bind can cause a pipeline flush which can use bind_fs_state to
be reenters and llvmpipe->fs gets the wrong value. Fix this by only
setting the llvmpipe->fs value after the draw binding is complete.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4122>
This adds a new callback to get the stride between the per-sample
images, adds a new value for the per-sample index to lookup,
and a flag to use multisampling.
gallivm/sample: add num samples interface for dynamic samplers
This will be used for getting number of samples in jit code.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4122>
robclark noted that the blob wasn't doing range reduction in the mediump
case, and I confirmed it on
dEQP-GLES3.functional.shaders.operator.angle_and_trigonometry.sin.mediump_float_fragment
vs
dEQP-GLES3.functional.shaders.operator.angle_and_trigonometry.sin.highp_float_fragment.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4893>
NIR already optimizes undef usage.
If undef reaches llvm, it's probably because of a broken shader.
In this situation, rather than letting llvm use the undef values
to do more optimization and probably produce incorrect results,
we replace undef values by 0.
"undef" values that are directly used in exports are kept as undef,
because this allows llvm to optimize them away.
This is only enabled for radeonsi.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2689
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4607>
This commit introduces a new way to zero-init variables but keep the
old one to not break any existing behavior.
With this change GLSLZeroInit becomes an integer, with the following
possible values:
- 0: no 0 init
- 1: current behavior
- 2: new behavior. Similar to 1, except ir_var_function_out type are
0 initialized but ir_var_shader_out.
The rationale behind 2 is: zero initializing ir_var_shader_out can
prevent some optimization where out variables are completely eliminated
when not written to.
On the other hand, zero initializing "ir_var_function_out" has no
effect on correct shaders but typically helps shadertoy since the main
function is:
void mainImage(out vec4 fragColor) { ... }
So with this change we're sure that fragColor will always get a value.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4607>
Shared globals and glsl_zero_init can cause linker errors if the
variable is only initialized in 1 place.
This commit adds a flag to variables that have been implicitely
initialized to be able in this situation to keep the explicit
initialization value.
Without this change the global-single-initializer-2-shaders piglit
test fails when using glsl_zero_init.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4607>
The Vulkan spec says:
"shaderInt16 specifies whether 16-bit integers (signed and unsigned)
are supported in shader code. If this feature is not enabled, 16-bit
integer types must not be used in shader code."
I think it's just safe to enable it because 16-bit integers should
be fully supported with LLVM and also with ACO and GFX8+. On GFX8
and earlier generations, throughput of 16-bit int is same as 32-bit
but that should't change anything.
For GFX6-GFX7 ACO support, we have to implement conversions without
SDWA.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4874>
Zink needs to know the sizes of UBOs, and for normal UBOs we get this
from the nir_var_mem_ubo variables. This allows us to treat all of these
the same way.
We're about to need the same information for the in-progress D3D12
driver, so let's do this in a central location instead of in the driver.
This version is also a bit more careful than the Zink version. In
particular, for two reasons:
1. We increase the variable bindings when we adjust the pre-existing
UBOs.
2. We increase shader->info.num_ubos when we insert a new UBO variable.
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4734>
It's not great to use shader_info for this information, because it
might have gone through lowering of uniforms to UBOs, which can change
the number of UBOs. So let's make sure we know the size of the
UniformBlocks array from when the shader was linked instead.
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4734>
Fixes the following undefined symbol building errors:
ld.lld: error: undefined symbol: iris_seqno_init
>>> referenced by iris_batch.c:187 (external/mesa/src/gallium/drivers/iris/iris_batch.c:187)
>>> iris_batch.o:(iris_init_batch) in archive out/target/product/x86_64/obj/STATIC_LIBRARIES/libmesa_pipe_iris_intermediates/libmesa_pipe_iris.a
Fixes: e31b703c ("iris: Place a seqno at the end of every batch")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Acked-by: Tapani Pälli <tapani.palli@intel.com>
These come from the disasm tests, and fix our disasm of blob's
uniform/nonuniform cat6 operands. We also now include human-readable names
for all the modes we know about (though bindless gets distinguished by its
.baseN, like Connor's original disasm).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4857>
With this I also brought in a few new control flow instruction disasm
tests that I'd made back when I wrote the disasm test, but which were too
far from correct to include until now.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4857>
We've already got these duplicated a bunch of places. They should
really probably live in common code. The new versions take two more
arguments:
1. The struct member which gets you from __driver_type to the
vk_object_base. This requires drivers which use this to also use
vk_object_base.
2. The VkObjectType enum which represents that object type.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4690>
As discrete graphics looms, we really need to stop storing CPU data
structures in GPU memory. One of the most egregious instances of this
was VkEvent where we had a CPU data structure living inside a dynamic
state pool allocation.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4690>
From OpenGL 4.6, section 9.4.4 "Effects of Framebuffer Completeness on
Framebuffer Operations", page 332:
"An INVALID_FRAMEBUFFER_OPERATION error is generated by attempts to render
to or read from a framebuffer which is not framebuffer complete.
This error is generated regardless of whether fragments are actually read
from or written to the framebuffer. For example, it is generated when a
rendering command is called and the framebuffer is incomplete, even if
RASTERIZER_DISCARD is enabled."
Signed-off-by: Dmytro Nester <dmytro.nester@globallogic.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4833>
This extends commit 2243f0cd for anv with additional
extensions for Pie and Q versions.
Fixes tests with 9_R11 CTS:
dEQP-VK.api.info.android#no_unknown_extensions
dEQP-VK.api.info.device#extensions.
v2: Use snake_case function name (Jason Ekstrand)
Drop Change-Id in commit (Kristian H. Kristensen)
v3: Resolve meson-clang error for ANDROID_API_LEVEL.
Signed-off-by: Nataraj Deshpande <nataraj.deshpande@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4827>
Current Android dessert versions such as Pie, Q reject
vulkan version > 1.1. Clamp the vulkan versions to 1.1
for platforms running these Android desserts.
Fixes android.graphics.cts.VulkanFeaturesTest and
dEQP-VK.api.info.device#properties.
v2: Limit version with '!ANDROID' (Eric Engestrom and Tapani Pälli)
Signed-off-by: Nataraj Deshpande <nataraj.deshpande@intel.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4781>
The motivating factor is: this lowering may cause
nir_intrinsic_load_local_group_size intrinsics to be added to the
shader, and by moving this around we make possible for the drivers to
lower that intrinsic by themselves.
Iris will do just that in a later patch for implementing variable
group size.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4794>
Intrinsic to get the SIMD width, which not always the same as subgroup
size. Starting with a small scope (Intel), but we can rename it later
to generalize if this turns out useful for other drivers.
Change brw_nir_lower_cs_intrinsics() to use this intrinsic instead of
a width will be passed as argument. The pass also used to optimized
load_subgroup_id for the case that the workgroup fitted into a single
thread (it will be constant zero). This optimization moved together
with lowering of the SIMD.
This is a preparation for letting the drivers call it before the
brw_compile_cs() step.
No shader-db changes in BDW, SKL, ICL and TGL.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4794>
(Co-authored with Chris Wilson.)
Frequently, games create fences and later check them with a timeout of
0 to see if that work has completed yet. They do not want the work to
be flushed immediately upon fence creation.
This is what PIPE_FLUSH_DEFERRED does - it inhibits the flush at fence
creation time, but still guarantees that a flush will occur later on
once fence_finish() is called.
Since syncpts can only occur at batch boundaries, when deferring a
flush, we have to wait for the syncpt at the end of the batch being
constructed. This is later than desired, but safe if blocking. To
avoid extra delays, we additionally insert a PIPE_CONTROL to write an
availability bit at the exact point of the fence. We can poll this
on the CPU, allowing us to check whether the fence has gone by, even
if the batch hasn't completed. It can also let us skip kernel calls.
Improves performance in Bioshock Infinite by 10% on Icelake GT2 on
-ForceCompatLevel=5 settings. Thanks to Felix Degrood and Mark Janes
for helping notice the extraneous stalls and batches, Marek Olšák for
adding deferred flush support to Gallium to solve this issue, and
Chris Wilson for reworking a lot of the internals of this work.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3802>
Receiving a fence_server_sync (iris_fence_await) means that any future
work needs to wait for the fence. But previous work doesn't need to.
So flush it now, to avoid delaying it arbitrarily.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3802>
In the next patch, we will introduce deferred fences where we will need
to flush a fence later. To do this, we need to know which batch requires
flushing, so keep a 1:1 mapping between seqno[] and the associated
batch.
It's also substantially less confusing to have a 1:1 mapping.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3802>
By using the breadcrumbs we inject into the batch, we can build a
lightweight fence - that can be evaluated in userspace without having to
check in the kernel. In order to pass the fences between processes, and
to wait efficiently, we continue to track the syncobj for each batch and
use that as a terminator for the fence, and for passing coarse
scheduling decisions to the kernel on execbuf.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3802>
We can use seqno as a basic for fast userspace fences: where we can
check a value directly to test for fence completion without having to
query using the kernel. To do so we need to write a breadcrumb from the
batch and track those writes as the basis for our lightweight fences.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3802>
Batches are going to have an uploader in the next commit, so destroying
batches will destroy uploaders, which will unmap transfers, which will
return things to the slab allocator. So we need to reorder destroying
the slab allocator to the end to avoid crashing.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3802>
We're going to need it to create a uploader in the batch soon. We still
avoid storing it, to maintain the charade of separation, and make people
think twice about fetching random fields from there and intertwining
things even worse.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3802>
This is just a refcounted wrapper around a drm_syncobj. There is
enough terminology going on in the area of synchronization (sync
objects, sync files, ...) that I'd rather not invent our own.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3802>
Lets us drop some cut and pasted kernel header contents.
Linux 4.7 came out 4 years before we the first officially supported
release of this driver; iris won't run on kernels older than 4.16,
and 4.18.11+ is strongly recommended. So I suspect it's safe to
assume that a kernel header from 4.7 will exist at build time.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3802>
Avoids surprises where you set an OVERRIDE but it gets ignored and the
system PCI ID is used.
Also fixes the bug that the error of invalid platform name being
printed too early, even when the passed platform was a PCI ID (which
is also supported).
For the case where euid != uid, a warning was added but the behavior
wasn't changed: it is still going to fallback to system PCI ID.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4841>
gen12 does away with the single patch dispatch mode for tcs, and
increases some limits so that 8_patch mode can always work. Make the
necessary changes so we don't try to fall back to single patch mode.
Fixes KHR-GL46.tessellation_shader.single.max_patch_vertices and others
Fixes: 44754279ac ("intel/fs/gen12: Use TCS 8_PATCH mode.")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4843>
If you're going out of your way to do per-sample interpolation, you are
almost surely going to be doing so to an MSAA framebuffer. Should reduce
recompiles with MSAA enabled.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>
Now that we normalize our keys fairly well, build a variant at shader
state creation time so that hopefully you don't have to call the compiler
at draw time (as is now the case with glmark2 ES and most of the humus GL
demos).
Fixes: #2782
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>
We can remove a bunch of conditional code at key comparison time by
computing a bitmask of used key bits at ir3_shader creation time. This
also gives us a nice place to put additional key simplification to reduce
how many variants we create (like skipping rastflat if we don't read
colors in the FS, or skipping vclamp_color if we don't write colors).
It does mean walking the whole key to AND it, but the key is just 28 bytes
so far so that seems pretty fine.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>
We weren't filling in the tess mode of the key, or setting has_gs on GS
shaders, resulting in assertion failures when NIR intrinsics didn't get
lowered.
We have to make a guess at prim mode for TCS, but it should be better to
have some shader-db coverage than none, and it will avoid these failures
happening when we start precompiling shaders.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>
Some of the negative API tests make shaders for tess stages that don't do
all the stores they need to. Once we start precompiling (or doing
shader-db of tess), we need to at least not segfault when generating them.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>
We were failing to tell the allocator about the restriction that scalar
texture instructions (allocated as scalar regs) couldn't be allocated such
that the start of the full unwritemasked vector started before r0. There
was a patch in select_reg_callback on a6xx that tried to work around that,
but you could still end up backed into a corner you shouldn't be because
we didn't tell the RA what it needed.
Fixes compiler assertion failures on a300-a400's blit_z shader, used for
Z32F gmem blits.
Looks like as a result we get tighter register allocation but more nops:
instructions in affected programs: 757945 -> 760356 (0.32%)
nops in affected programs: 317983 -> 320468 (0.78%)
non-nops in affected programs: 27525 -> 27451 (-0.27%)
mov in affected programs: 3098 -> 3023 (-2.42%)
dwords in affected programs: 109664 -> 110656 (0.90%)
last-baryf in affected programs: 112701 -> 112847 (0.13%)
full in affected programs: 4326 -> 4011 (-7.28%)
sstall in affected programs: 120550 -> 120836 (0.24%)
(ss) in affected programs: 13939 -> 13918 (-0.15%)
(sy) in affected programs: 3006 -> 2786 (-7.32%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>
When the GS lowering was working on store_output intrinsics, we had to
clean up the split vars to avoid getting confused. Now that we shadow
the output vars instead, there's no confusion and we can drop this
hack.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>
We mostly got away with replacing a store_output with a store_var, but
for complex types like structs, that doesn't work. Once the IO has
been lowered from vars to intrinsic, we've lost the deref chains and
can't properly shadow the outputs.
This commits moves the GS lowering up so we do it before the output
variables get lowered to store_output. This way the pass works much
like nir_lower_io_to_temporaries() and cleanly shadows the outputs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>
This pass lowers per-vertex input intrinsics to load_shared_ir3. This
was open coded in the TCS and GS lowering passes before - this way we
can share it. Furthermore, we'll need to run the rest of the GS
lowering earlier (before lowering IO) so we need to split off this
part that operates on the IO intrinsics first.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4562>
Has been observed that after the job chain has completed, those fields
become populated.
tiler_heap_next_start contains an address inside the tiler heap, a bit
before the value that the GPU writes to tiler_heap_free.
used_hierarchy_mask contains a hex value that corresponds to values
observed as hierarchy masks.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4832>
In detecting the case where we actually do need to re-emit LRZ state
(due to new batch), we were checking `ctx->last.dirty` to detect when
we cannot trust previous state. But this is cleared before we check
it.
Move where it is cleared to the end of the draw_vbo() path.
Fixes: dfa702e94b ("freedreno/a6xx: limit LRZ state emit")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4842>
If use NIR's 1-bit bool representation , we get exactly the bool behavior
the hardware provides: CMPS produces true or false, AND/OR/XOR work as
intended without extra absnegs, and we can pass those half values directly
to other CMPS. We emit an absneg for b2b1 ("turn a memory load into a
1-bit NIR boolean"), but we would have done so for the ir3_n2b() on the
use of that value anyway. The most awkward bit is that inot(a@1) is now a
sub(1, a), but we can encode the 1 as an immediate so it's fine.
No significant changes to GL_TIME_ELAPSED on my set of traces (n=21).
instructions in affected programs: 1570638 -> 1548702 (-1.40%)
nops in affected programs: 624053 -> 611381 (-2.03%)
non-nops in affected programs: 959061 -> 949797 (-0.97%)
mov in affected programs: 5258 -> 5252 (-0.11%)
cov in affected programs: 15099 -> 15902 (5.32%)
dwords in affected programs: 469600 -> 452768 (-3.58%)
last-baryf in affected programs: 162211 -> 154726 (-4.61%)
full in affected programs: 4881 -> 4797 (-1.72%)
sstall in affected programs: 173953 -> 174545 (0.34%)
(ss) in affected programs: 10922 -> 10934 (0.11%)
(sy) in affected programs: 728 -> 745 (2.34%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4518>
In addition to the cleanup, there are these changes in behavior:
- clear_render_target waits for idle after the dispatch and then flushes
L0-L1 caches (this was missing)
- sL0 is no longer invalidated before the dispatch, because src resources
don't use it
- sL0 is no longer invalidated after the dispatch if dst is an image
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4761>
1. glthread has a private upload buffer (as struct gl_buffer_object *)
2. the new function glInternalBufferSubDataCopyMESA is used to execute the copy
(the source buffer parameter type is struct gl_buffer_object * as GLintptr)
Now glthread can handle arbitrary glBufferSubData sizes without syncing.
This is a good exercise for uploading data outside of the driver thread.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4314>
The number of lines in the disassembly of vbo_exec_api.c.o decreased
by 4.5%, which roughly corresponds to a decrease in instructions
for immediate mode thanks to the removal of ctx->vbo_context dereferences.
It increases performance in one Viewperf11 subtest by 2.8%.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4314>
Skip most of _mesa_update_vao_derived_arrays if the VAO is not static.
Drivers need a separate codepath for this.
This increases performance by 7% with glthread and the game "torcs".
The reason is that glthread uploads vertices and sets vertex buffers
every draw call, so the overhead is very noticable. glthread doesn't
hide the overhead, because the driver thread is the busiest thread.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4314>
Drop vfunc callbacks for per-gen packet emit, and instead have a header
that is #include'd once per gen.
We'll end up with multiple copies of some of this, but since we never
have multiple gen's of adreno on a single device, only one copy will be
paged in (and hopefully in the I-cache for hot-paths)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4813>
In order to inline the const emit and drop the per-gen vfuncs to emit
the correct sort of packet, we should consolidate all of the entry-
points to const emit in one object file, otherwise we'll end up with
multiple copies per gen.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4813>
This is one of the hotter pkt7 packets, since it is guaranteed to happen
on every draw. Switch to OUT_PKT() for less driver overhead in the draw
path.
Slight bit of cheating for using CP_DRAW_INDX_OFFSET_0 for the first
dword in all cases. Possibly *gen_header.py* could be more clever
and use typedef's in the cases of bitsets like vgt_draw_initiator.
But this works out because it is always the first dword.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4813>
This gets rid of one lone register we used to emit directly in IB2
whenever blend state changes, at the expense of needing blend state
variants when sample-mask changes. I think typically sample-mask
should not change frequently, so this seems like a fair trade-off.
To further limit the # of variants, we ignore sample-mask bits that
are not relavant for the current # of samples.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4813>
LLVM 11 removes the intrinsic for half to float conversion, so use the fpext
function instead. This function actually works now with half to float, albeit
a quick experiment showed at least the x86 backend cannot lower it itself if
the cpu doesn't support it natively and tries to call external library, which
crashes (and would be very slow anyway as it would be lowered to scalar code),
so for now only use it where we previously used the f16c intrinsic.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2603
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4800>
The SFID field of the SHADER_OPCODE_MEMORY_FENCE and
SHADER_OPCODE_INTERLOCK instructions now indicates the target function
of the memory fence. Account the cycle-count cost to the right shared
unit.
Fixes: f858fa26b4 ("intel/fs,vec4: Pull stall logic for memory fences up into the IR")
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4817>
BITSET_FOR_EACH_SET can walk a sparse set (such as a register class's set
of registers) much faster than just iterating over individual bits.
Improves freedreno startup time (as measured by shader-db ./run
shaders/closed/gputest/triangle on my x86 system) by -4.12679% +/-
1.99006% (n=151)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4537>
Shader-db results on ICL:
total instructions in shared programs: 17133088 -> 17133287 (<.01%)
instructions in affected programs: 61300 -> 61499 (0.32%)
helped: 0
HURT: 199
This means it's likely fixing 199 bugs. :-) All the changed shaders are
in Mad Max. It's surprisingly difficult to get the back-end compiler to
generate a pattern that hits this we don't tend to emit a lot coalescable
MOVs. The pattern in Mad Max that's able to hit is fsign(fsat(x)) under
the right conditions.
Closes: #2820
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4773>
Mapping of graphics kernel buffers is quite costly. Therefore the svga
drm winsys caches all kernel buffer maps. However, that may lead to
less testing coverage of the unmap paths and (possibly) processes running
out of virtual memory space. Introduce a possibility to avoid that caching
by setting the environment variable SVGA_FORCE_KERNEL_UNMAPS to 1.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Matthew McClure <mcclurem@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4804>
The way the new locations are set up has much fewer gaps
between each I/O slot, so this results in a massive reduction
in the LDS usage of tessellation shaders.
Totals (GFX10):
VGPRS: 3976792 -> 3974864 (-0.05 %)
Code Size: 260552784 -> 260532860 (-0.01 %) bytes
LDS: 48723 -> 30179 (-38.06 %) blocks
Max Waves: 1053407 -> 1053583 (0.02 %)
Totals from affected shaders (1407 shaders on GFX10):
SGPRS: 59144 -> 59216 (0.12 %)
VGPRS: 63024 -> 61096 (-3.06 %)
Code Size: 2695508 -> 2675584 (-0.74 %) bytes
LDS: 47109 -> 28565 (-39.36 %) blocks
Max Waves: 12999 -> 13175 (1.35 %)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>
This commit introduces a new function nir_assign_linked_io_var_locations
which is intended to help with assigning driver locations to shaders
during linking, primarily aimed at the VS->TCS->TES->GS stages.
It ensures that the linked shaders have the same driver locations,
and it also packs these as close to each other as possible.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>
Previously these functions needed the bit mask of the TCS outputs
and patch outputs written, and concluded the number of outputs
from that.
Now, they take the number of outputs and patch outputs instead.
This will allow the backend compiler to better optimize the
LDS layout.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>
There's no guarantee that the formats advertised by wl_drm and the formats
advertised by zwp_linux_dmabuf_v1 are the same.
get_back_bo() handles this by falling back from createImageWithModifiers() to
createImage() when there's a wl_drm format but no corresponding linux_dmabuf
format, but create_wl_buffer() unconditionally tries to create a linux_dmabuf
buffer unless DRIimage has DRM_FORMAT_MOD_INVALID.
Fix this by always checking if the DRIimage modifier has been advertised
by zwp_linux_dmabuf_v1, and falling back to wl_drm if not.
If DRM_FORMAT_MOD_INVALID has been advertised then we trust the client
has allocated something appropriate and treat any modifier as matching.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2220
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Simon Ser <contact@emersion.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4294>
This assert is actually correct but due to how android hardware buffer
support is implemented we should remove it, otherwise debug build of
mesa hits the assert with Android CTS tests.
Test creates VkImage with non-external format and sets up
VkExternalMemoryImageCreateInfo to indicate that image *may* be used
with Android hardwarebuffer handle. Then test attempts to get image
memory requirements. Problem with this is that we setup all android
supporting images as having external format and thus hit the assert as
the size has not been set yet. This is not a problem in practice since
android will bind ahw memory with the image later on.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2807
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4762>
In Gen11+, when emitting a fence for both L3 and SLM, the generated
code would look like
SEND, MOV (for stall), SEND, MOV (for stall)
This commit change that so two SENDs are emitted before the MOVs for
stall. This is similar to the approach used in Ivy Bridge for the
render fence.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3278>
Instead of emitting the stall MOV "inside" the
SHADER_OPCODE_MEMORY_FENCE generation, use the scheduling fences when
creating the IR.
For IvyBridge, every (data cache) fence is accompained by a render
cache fence, that now is explicit in the IR, two
SHADER_OPCODE_MEMORY_FENCEs are emitted (with different SFIDs).
Because Begin and End interlock intrinsics are effectively memory
barriers, move its handling alongside the other memory barrier
intrinsics. The SHADER_OPCODE_INTERLOCK is still used to distinguish
if we are going to use a SENDC (for Begin) or regular SEND (for End).
This change is a preparation to allow emitting both SENDs in Gen11+
before we can stall on them.
Shader-db results for IVB (i965):
total instructions in shared programs: 11971190 -> 11971200 (<.01%)
instructions in affected programs: 11482 -> 11492 (0.09%)
helped: 0
HURT: 8
HURT stats (abs) min: 1 max: 3 x̄: 1.25 x̃: 1
HURT stats (rel) min: 0.03% max: 0.50% x̄: 0.14% x̃: 0.10%
95% mean confidence interval for instructions value: 0.66 1.84
95% mean confidence interval for instructions %-change: 0.01% 0.27%
Instructions are HURT.
Unlike the previous code, that used the `mov g1 g2` trick to force
both `g1` and `g2` to stall, the scheduling fence will generate `mov
null g1` and `mov null g2`. During review it was decided it was not
worth keeping the special codepath for the small effect will have.
Shader-db results for HSW (i965), BDW and SKL don't have a change
on instruction count, but do report changes in cycles count, showing
SKL results below
total cycles in shared programs: 341738444 -> 341710570 (<.01%)
cycles in affected programs: 7240002 -> 7212128 (-0.38%)
helped: 46
HURT: 5
helped stats (abs) min: 14 max: 1940 x̄: 676.22 x̃: 154
helped stats (rel) min: <.01% max: 2.62% x̄: 1.28% x̃: 0.95%
HURT stats (abs) min: 2 max: 1768 x̄: 646.40 x̃: 362
HURT stats (rel) min: <.01% max: 0.83% x̄: 0.28% x̃: 0.08%
95% mean confidence interval for cycles value: -777.71 -315.38
95% mean confidence interval for cycles %-change: -1.42% -0.83%
Cycles are helped.
This seems to be the effect of allocating two registers separatedly
instead of a single one with size 2, which causes different register
allocation, affecting the cycle estimates.
while ICL also has not change on instruction count but report changes
negative changes in cycles
total cycles in shared programs: 352665369 -> 352707484 (0.01%)
cycles in affected programs: 9608288 -> 9650403 (0.44%)
helped: 4
HURT: 104
helped stats (abs) min: 24 max: 128 x̄: 88.50 x̃: 101
helped stats (rel) min: <.01% max: 0.85% x̄: 0.46% x̃: 0.49%
HURT stats (abs) min: 2 max: 2016 x̄: 408.36 x̃: 48
HURT stats (rel) min: <.01% max: 3.31% x̄: 0.88% x̃: 0.45%
95% mean confidence interval for cycles value: 256.67 523.24
95% mean confidence interval for cycles %-change: 0.63% 1.03%
Cycles are HURT.
AFAICT this is the result of the case above.
Shader-db results for TGL have similar cycles result as ICL, but also
affect instructions
total instructions in shared programs: 17690586 -> 17690597 (<.01%)
instructions in affected programs: 64617 -> 64628 (0.02%)
helped: 55
HURT: 32
helped stats (abs) min: 1 max: 16 x̄: 4.13 x̃: 3
helped stats (rel) min: 0.05% max: 2.78% x̄: 0.86% x̃: 0.74%
HURT stats (abs) min: 1 max: 65 x̄: 7.44 x̃: 2
HURT stats (rel) min: 0.05% max: 4.58% x̄: 1.13% x̃: 0.69%
95% mean confidence interval for instructions value: -2.03 2.28
95% mean confidence interval for instructions %-change: -0.41% 0.15%
Inconclusive result (value mean confidence interval includes 0).
Now that more is done in the IR, more dependencies are visible and
more SWSB annotations are emitted. Mixed with different register
allocation decisions like above, some shaders will see more `sync
nops` while others able to avoid them.
Most of the new `sync nops` are also redundant and could be dropped,
which will be fixed in a separate change.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3278>
This is the wrong data type, the original field - and the values we're
adding in - are both 64-bit unsigned. Keep the original data type.
Thanks to Dave Airlie for finding this while reading the code.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4802>
The cycle count estimation logic part of the scheduler is now
redundant with the shader performance modeling pass, and the estimates
can be consolidated into the brw::performance analysis result object
instead of being part of the CFG, which guarantees that the estimates
cannot be accessed without previously calling the
performance_analysis::require() method, which makes sure that the
right analysis pass is executed at the right time if we don't already
have up-to-date cached results.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
So we can eventually remove the cycle count estimates from the CFG
data structure and consolidate performance information in the
brw::performance object.
It would be cleaner to pass the brw::performance object directly to
the disassembler but that isn't straightforward since the disassembler
is built as a plain C file unlike the rest of the compiler back-end.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These should be more accurate than the current cycle counts, since
among other things they consider the effect of post-scheduling passes
like the software scoreboard on TGL. In addition it will enable us to
clean up some of the now redundant cycle-count estimation
functionality in the instruction scheduler.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is useful in order to identify codegen issues caused by SIMD32.
It doesn't currently have any effect on compute shaders since SIMD32
dispatch is only enabled for CS when it's strictly necessary to do so
in order to support the workgroup size requested for the shader --
That might change in the future though when we hook up the SIMD32
heuristic to CS compilation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The heuristic enables the SIMD32 fragment shader based on whether the
IR performance modeling pass predicts it to have greater throughput
than the SIMD16 and SIMD8 variants of the same shader. It would be
straightforward to do the same thing in order to control whether
SIMD16 dispatch is enabled, but it's pending additional performance
evaluation.
The INTEL_DEBUG=do32 option is left around in order to force the
SIMD32 shader to be used regardless of the result of the heuristic,
since it's useful as a debugging aid e.g. in order to identify
SIMD32-specific codegen issues which may be masked by the SIMD32
heuristic, or cases where the heuristic is incorrectly disabling
SIMD32 shaders that offer a performance advantage.
Currently this is only enabled on Gen6+, since SIMD32 codegen support
is incomplete on earlier platforms.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This makes brw_compile_fs() look a bit more similar to
brw_compile_cs(). It saves us three v*_shader_stats local variables,
and will save us additional triplicated declarations as we start
tracking IR performance analysis results.
The triplicated cfg pointers are left around because they're set to
NULL to mark specific dispatch modes as disabled (e.g. in order to
enforce hardware restrictions). Doing the same thing with the visitor
pointers would cause data leaks.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This introduces an analysis pass intended to estimate several
performance statistics of the shader, including cycle count latency
and throughput values, based on static modeling. It has instruction
performance information more comprehensive than the current scheduling
pass for all platforms between Gen4-11, and works on both the FS and
VEC4 back-end.
The most immediate purpose of this pass is to implement a heuristic
meant to determine whether using SIMD32 dispatch for a fragment shader
can be expected to help more than it hurts. In addition this will
allow the effect of passes run after scheduling (e.g. the TGL software
scoreboard pass and the VEC4 dependency control pass) to be visible in
shader-db statistics.
But that isn't the end of the story, other potential applications of
this pass (not part of this MR) I've been playing around with are:
- Implement a similar SIMD16 heuristic allowing the identification of
inefficient SIMD16 fragment shaders.
- Implement similar SIMD16 and SIMD32 heuristics for the compute
shader stage -- Currently compute shader builds always use the
SIMD16 shader if available and never use the SIMD32 shader unless
strictly necessary, which is suboptimal under certain conditions.
- Hook up to the instruction scheduler in order to improve the
accuracy of its timing information.
- Use as heuristic in order to drive the selection of scheduling
modes (Matt was experimenting with that).
- Plug to the TGL software scoreboard pass in order to implement a
more effective SBID token allocation algorithm, since in general
the optimal token allocation depends on the timings of all
instructions in the program.
- Use its bottleneck detection functionality in order to implement a
heuristic computing a more optimal bound for the number of fragment
shader threads executed in parallel (by adjusting the
MaximumNumberofThreadsPerPSD control of 3DSTATE_PS).
As a follow-up I'm planning to submit updated timing information for
Gen12 platforms -- Everything else required to support Gen12 like SWSB
handling is already included in this patch, but there were some IP
concerns regarding the TGL timing parameters since they cannot
currently be obtained with the documentation and hardware which is
publicly available. The timing parameters for any previous Gen7-11
platforms can be obtained by anyone by sampling the timestamp register
using e.g. shader_time, though I have some more convenient
instrumentation coming up.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Makes more sense considering SIMD32. Relaxing the assertion in
brw_ir_fs.h will be required in order to avoid assertion failures on
SNB with SIMD32 fragment shaders.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In Gen12 the Poly 0 Info DWORD containing the Viewport Index and
Render Target Index fields were moved from r0.0 to r1.1 in order to
make room for dual-polygon dispatch. The render target message format
was updated to expect that information in the same location, so we
didn't need to make any changes for framebuffer fetch to work with
SIMD8 and SIMD16 dispatch. Unfortunately that won't work with SIMD32,
since the render target message header is assembled from r0 and r2
instead of r1, and the r2 thread payload wasn't updated with an
additional copy of the same information. We need to fix things up
manually instead. This avoids a handful of
EXT_shader_framebuffer_fetch regressions in combination with SIMD32
fragment shaders.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This applies the same work-around I commited as b84fa0b31e
"intel/fs/gen11: Work around dual-source blending hangs in combination
with SIMD32." to Gen12, which seems to suffer from the same hardware
bug found empirically. The failure mode seems to be identical.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The Gen12 docs are rather contradictory regarding the dispatch
configurations supported by the fragment shader -- The same table
present in previous generations seems to imply that only one dispatch
mode can be enabled when doing per-sample shading, but a restriction
documented in the 3DSTATE_PS_BODY page implies the opposite: That
SIMD32 can only be used in combination with some other dispatch mode.
The latter seems to match the behavior of real hardware as I could
tell from my testing: A bunch of multisample test-cases that do
per-sample shading hang if we only provide a SIMD32 shader.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Section 2.2.2 (Data Conversions For State Query Commands) of the
OpenGL 4.5 spec says:
Following these steps, if a value is so large in magnitude that
it cannot be represented by the returned data type, then the
nearest value representable using that type is returned.
The current code doesn't do the correct thing, because it truncates a
long (potentially a 64bit values) to an int.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2828
Fixes: 53c36dfcfe
("replace IROUND with util functions")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4673>
Track how resources are used, ie. which state they may potentially dirty
if the backing bo is changed/reallocated, to optimize rebind_resource().
This will be more important in a later patch when we hook up eviction of
entries in a6xx tex state cache.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4744>
We should only rely on overflow detection for indirect draws, where we
have no other option.
This doesn't use quite the worst-possible-case sizes, which in practice
seem to be ~20x larger than what is required. But instead uses roughly
half of that.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4750>
Example situation:
v1 = {v0.hi, v1.lo}
v0.hi = v1.hi
The 4-byte copy's definition is completely used, but swapping it makes no
sense. We have to split it to generate correct code:
swap(v0.hi, v1.lo)
swap(v0.hi, v1.hi)
Found in dEQP-VK.spirv_assembly.type.vec3.i16.constant_composite_vert
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>
Starting with Gen11, we have two indirect clear colors: An unconverted
float/int version which is us used for rendering and a converted pixel
value version which is used for texturing. Because the one used for
texturing is stored as a single pixel of that color, it works no matter
what format is being used. Because it's a simple HW indirect and
doesn't involve copying surface states around, we can use it in the
sampler without having to worry about surface states having out-of-date
clear values. The result is that we can now allow any clear color when
texturing.
This cuts the number of resolves in a RenderDoc trace of Dota2 by 95%
on Gen11+ (you read that right) and improves perf by 3.5%. It improves
perf in a handful of other workloads by < 1%.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4393>
Previously, we tried to treat color image layouts as a special case
during render passes. This is largely an artifact of history as our
initial understanding of Vulkan placed much more emphasis on render
passes than our current understanding. The only real practical use for
magic layouts in the middle of a render pass, as far as I can tell, is
to allow more clear colors to get passed through to input attachments.
However, most apps aren't very creative with their clear colors and very
few of them (none coming from DXVK) actually use render passes in any
interesting way. Therefore, the risk of being able to pass fewer clear
colors through to input attachments should be minimal.
There are, however, three very big advantages to this change:
1. We are now consistent in our handling of aux usage and layouts
between color and depth/stencil.
2. We are now actually following the layout guidelines from the app and
aren't nearly as likely to see strange behavior due to us overriding
the image layouts manually.
3. It's more obviously correct. While I think our old render pass code
was probably correct, it was full of corner cases and it's very
possible that it was behaving badly in weird ways. This follows the
Vulkan API much more blindly and, as such, is more likely to be
correct and behave the same as other implementations.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4393>
Instead of making it a function that pretends to choose aux usage (which
isn't what it does at all), make it a function which returns whether or
not we want to do a fast clear. This is far more accurate to the
purpose of the function.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4393>
Previously, we bent over backwards to allow non-zero clear colors input
attachments whenever we could. However, very few apps use input
attachments and very few use non-zero clear colors. Getting rid of
support for non-zero clear colors input attachments will allow us to
treat them identically to textures which should help us simplify things
a good bit.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4393>
In order to actually hit this case you have to be using a very odd
color/view combination. The common cases of clear-to-zero and 0/1 clear
colors with an sRGB view don't require any re-interpretation. This is
probably better than always resolving whenever we have a format mismatch
like we are today because that hits the sRGB case every time.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4393>
Instead of allocating surface states for attachments in BeginRenderPass,
we now allocate them in begin_subpass. Also, since we're zeroing
things, we can be a bit cleaner about or implementation and just fill
out all those passes for which we have allocated surface states.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4393>
This commit splits genX(cmd_buffer_setup_attachments)() into three
functions: one which sets up cmd_buffer->state.attachments, one which
allocates surface states, and one which fills out the surface states.
While we're here, we make both functions take the framebuffer (if any)
as an argument instead of pulling it from the command buffer so it's
more clear what things are inputs to the functions. We also make the
render pass and framebuffer parameters const as those are immutable
objects. The only functional change here should be that we now
vk_zalloc the attachments which should be a bit safer.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4393>
The subpass usage flags are supposed to always be one bit and never
multiple bits. However, when adding in TRANSFER_SRC usage for resolve
attachments we were adding it to the subpass bits and not the render
pass bits. This potentially is causing issues where images aren't
getting marked written properly.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4393>
By default, RADV supports overallocation by the sense that it doesn't
reject an allocation if the target heap is full.
With VK_AMD_overallocation_behaviour, apps can disable overallocation
and the driver should account for all allocations explicitly made by
the application, and reject if the heap is full.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4785>
The reason behind this is that FMASK requires CMASK and also that
FMASK for non color attachments looks unnecessary. It's currently
much easier to add this simple check because the driver tries to
always enable DCC first and if we enable FMASK only if CMASK, we
might loose some FMASK compressions.
This helps fixing some new robustness2 tests which fails because
only FMASK is enabled.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4783>
For the DRI2 lowered YUV import separate pipe_resources get created
but in the end the first resource just gets asked for NPLANES.
Since
1) (Almost) everything uses the first resource + a plane index in the
Gallium interface.
2) This mirrors non-imported textures.
lets fix this in the driver.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4779>
On APUs, the memory is unified (all heaps are equally fast) and
apps should count all memory heaps together. But some games like
Id Tech games (Youngblood and such) don't manage memory correctly
on APUs and they spill everything when one VRAM heap is full.
Instead of spilling buffers, they should just allocate new buffers
in the second heap but it seems like these games are confused if
two memory heaps have the DEVICE_LOCAL_BIT set.
This is probably a first step towards better memory management on
APUs but there is still some work to do if we want to run most apps
with a small dedicated VRAM (256MB or so).
This gives a huge boost for Id Tech games on APUs, and doesn't
seem to reduce Feral games performance.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4771>
This lets us get coverage of corner cases of the driver that are tricky to
force a testcase to hit. We don't want to do a full run of the CTS with
each option because that's a lot of runner time, so stack a bunch of
fractional runs in one test job to amortize the test run setup overhead.
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4621>
The HW packet requires padding the number of pointers you emit, and we
would assertion fail about running out of buffer space if the number of
UBOs to be uploaded was odd.
Fixes: b4df115d3f ("freedreno/a6xx: pre-calculate userconst stateobj size")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4621>
This should prevent horrors like Iris has with the delayed calls
to iris_resource_finish_aux_import just because info is not
available at allocation time.
AFAICT all drivers just copy the template except radeonsi/r600
which reset the next pointer.
AFAICT there is also no other place we get a state tracker setting
next ptrs on a resource.
v2: Updated Gallium docs.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3792>
This exposes the logic inside one_time_init() as _mesa_initialize(), so
drivers who needs to use functionality initialized in one_time_init
earlier if they need.
This means we can reliably use the GLSL type-system when compiling
driver built-in shaders.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4765>
We process extension overrides only when we initialize the first
context, which means that unrecognized extensions only appear in the
first context created.
Let's instead store them in a global array, so we can apply them to all
contexts. This has the added benefit of making the initialization of the
first context less special, which allows us to clean up code a bit more.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4765>
This is code Bas has out of tree but I think mesa should be shipping it, and I've improved it.
Initially-written-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
v2: add infinite recursion fix (Bas)
v3: Fix wayland/xcb barrier, whitespace
v4: use a macro for getting apis, shorten some lines, use outarray
v5: rewrite in C, use hash_table/mutex.
v6: use once_init to init the mutex, fix freeing ht
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1766>
We haven't had it enabled due tointermittent failures. Those failures
are, as far as I can tell, due to GPU faults from buffer overflows where a
failing test in a thread stomps an otherwise passing thread's buffers. By
running deqp single-threaded, we can get more consistent failures, at the
cost of needing to do a tiny subset of the tests to keep runtime down.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4685>
In the top of the trunk LLVM (11) llvm/IR/CallSite.h header
has been removed. The file compiles without this include also
for LLVM 8, but I'm not sure about 9, 10, and older versions
so I disable it only for the latest LLVM
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4748>
Top of the trunk LLVM removes old vector type
and introduces two new ones - one for fixed-width
and one for scalable vectors. This commit fixes
compilation issues by switching from LLVMVectorTypeKind
to LLVMFixedVectorTypeKind for new LLVM.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4748>
I spent over an hour trying to debug a problem if a condition on a
variable not being applied. The problem turned out to be
"a(is_not_negative" instead of "a(is_not_negative)". This commit would
have detected that problem and failed to build.
v2: Just add $ to the end of the existing regex, and it will fail to
match a malformed string. Suggested by Jason.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4720>
If storing to an integer for lower bit size (i.e. 16-bit uint to
10-bit uint), we need to clamp to the maximum value not truncate.
Fixes:
dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.color.r16_uint.a2b10g10r10_uint_pack32.optimal_optimal_nearest
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4574>
This fixes two bugs, one in clearing and one in sign extensions
for S8 only types, and enables it for use.
These are useful for vulkan support later.
v2: move casting to same place as Z casting.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4574>
When the depth value was 1.0 and was being converted to Z32_UNORM
the conversion would scale it up to INT32_MAX + 1 which would
cause FPToSI to give incorrect results, changing it to use
FPToUI for the unsigned 32-bit case only fixes it.
Fixes:
GTF-GL45.gtf30.GL3Tests.depth_texture.depth_texture_fbo_clear
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4574>
It's legal to have a stride less than the num of parameters,
in this case no need to try and over map the buffer which asserts
Fixes:
GTF-GL45.gtf43.GL3Tests.multi_draw_indirect.multi_draw_indirect_stride
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4574>
This fixes a few of the image store paths, to do the
correct clamping of unsigned/signed values
Fixes: KHR-GLES31.core.layout_binding.block_layout_binding_block_ComputeShader
KHR-GL45.shader_image_load_store.basic-allFormats-store
KHR-GL46.shader_image_load_store.multiple-uniforms
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4574>
The hardware is capable of automatically filling in certain values in
the VPC without writing them from the last geometry stage, like
gl_PointCoord or gl_PrimitiveID when there is no GS. However, we *do*
have to enable these outputs (i.e. set the VPC_VAR_DISABLE bit to 0) as
VPC_VAR_DISABLE is really about FS inputs rather than VS outputs. To do
this, we move the computation of the enable bits to ir3_link_add(),
which is also a nice refactor anyway. In addition we detect the PrimID
case specifically so that the driver can program the location.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4704>
This patch fixes depth stencil ops on MX6S GC880 and MX6Q GC2000.
The following dEQPs now pass:
dEQP-GLES2.functional.fragment_ops.depth_stencil.stencil_depth_funcs.*
dEQP-GLES2.functional.fragment_ops.depth_stencil.stencil_ops.*
which is roughly 600 fixed dEQP tests.
The problem is that if the front-facing stencil has a value mask 0x00 and
the back-facing stencil has some non-zero value mask, then the stencil part
of the depth stencil buffer is written with 0x00 unconditionally. The blob
replicates the value mask of the back-facing stencil to the value mask of
the front-facing stencil to achieve correct rendering, replicate the same
behavior here.
Signed-off-by: Marek Vasut <marex@denx.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4275>
split_store_data() splits a vector and p_as_uniforms it if needed.
scan_write_mask()/advance_write_mask() are similar to
u_bit_scan_consecutive_range(), but makes it easier to only clear part of
the range and will also give ranges for zero'd bits.
split_buffer_store() is a helper for splitting VMEM/SMEM stores.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4639>
The TCS inputs and outputs must always fit into the LDS,
which implies that their addresses also always fit 24 bits.
On AMD GPUs, 24-bit multiplication is much faster than 32-bit
multiplication, so we can take the opportunity to use that
for TCS I/O instead.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4536>
Previously, it was just unreachable, which means it will generate
invalid shaders when it encounters a situation when it can't allocate
registers for eg. a large load.
This commit makes it slightly easier to notice such problems without
triggering a GPU hang.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4536>
This commit completely rewrites the way we extract a structured CFG from
SPIR-V. The new approach is different in a few ways:
1. It does a breadth-first search instead of depth-first. This means
that we've visited the merge node for a construct before we visit
any of the nodes inside the construct. This makes it easier to
validate things like loop and switch nesting.
2. We record more information in the CFG. Earlier commits added a
parent pointer to vtn_cf_node but we now record all of the merge and
other special blocks for each CFG node. This lets us validate
things more precisely.
3. It makes heavy use of merge blocks for walking the CFG. Previously,
we sort of used them as hints for trying to guess the CFG structure
but things got dicey whenever a merge was missing. We had some
heuristics for how to handle short-circuiting if statements but it
was a bunch of special cases.
Now, we make them a fundamental part of walking the CFG. When we
encounter a control-flow construct, we add the body components of
the construct to the BFS work list and then jump to the merge block
if one exists to continue scanning the current CFG nesting level.
If no merge block exists, we assume that means that control-flow
never re-converges in a normal way and that the only way to get back
to normality is with a direct jump such as a loop break or continue.
This should make things far more robust when trying to deal with the
more creative placement (or lack thereof) of merge instructions.
Reviewed-by: Alan Baker <alanbaker@google.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3820>
Closes: #2760
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4446>
When we originally wrote spirv_to_nir we didn't have a good scalar value
union to handily use so we rolled our own thing for spec constants. Now
that we have nir_const_value, we can use that and simplify a bunch of
the spec constant logic.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4675>
This fixes find_and_update_named_uniform_storage() for subroutines
and also updates num_shader_uniform_components for non opaque
uniforms.
The following patch will ensure this type of bug won't happen again.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4721>
If you have a struct, the var's base driver location is not the last
driver location that will be accessed in that var. We have a shader
struct member with this number for us, already. Fixes overflows in:
dEQP-GLES31.functional.program_interface_query.program_output.type.interface_blocks.out.named_block_explicit_location.struct.mat3x2
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4670>
When I was writing drm-shim, I was focused on the v3d kmsro case -- use my
intel device as the kmsro display device and add on a simulator-based v3d
device that we could render with. But for the noop backends we use for
shader-db, it's a lot more useful to just overwrite the first render node
in the system so that you don't have to pass a -d <how many render nodes I
already have in my system> argument.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4664>
Back in a2xx, HW pitches were in pixels, so storing that was reasonable.
Ever since then, the HW wants pitches in bytes, and we have only one
instance of using pitch in pixels in the code (a3xx sysmem path).
Flip things around so that only a2xx has to worry about the cpp for
looking at pitches.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4558>
Instead of copying the operands of the other p_create_vector and labelling
the definition with label_vec, copy the operands and label it with
label_temp so that it can be copy-propagated.
This was found while removing a redundant copy in load_input_from_temps()
which removed duplicate p_create_vector instructions.
shader-db (Navi):
Totals from 139 (0.11% of 127638) affected shaders:
VGPRs: 8472 -> 7948 (-6.19%)
CodeSize: 514592 -> 512368 (-0.43%)
MaxWaves: 1089 -> 1195 (+9.73%)
Instrs: 100214 -> 99658 (-0.55%)
Cycles: 400856 -> 398632 (-0.55%)
VMEM: 15545 -> 15338 (-1.33%)
Copies: 5140 -> 4584 (-10.82%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4667>
For copies like v[7:8] = v[8:9], what currently happens is:
- do_copy() will skip the second dword
- the uses of the second dword will be reduced to 0
- the copy operation will be removed from the map
and v8 will never be set to v9.
So just decrease the uses of other operations after splitting or removing
the current operation, so: "v8 = v9" will be split off, it's uses reduced
and then the new copy will be done in the next iteration.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4686>
The macro "_WINVER" does nothing, the macro definitions that matter for
windows API version selection are "_WIN32_WINNT" and "WINVER".
The header "sdkddkver.h" (which is included from thousands of
different windows-headers) defines "WINVER" to the same value as
"_WIN32_WINNT" of only the latter is defined, which explains why this
works right now. But we shouldn't depend on that kind of luck, and
instead define the right maco.
Fixes: 3aee462781 ("meson: add windows compiler checks and libraries")
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4681>
nir_opt_load_store_vectorize has an is_strided_vector() function that
looks for types with weird explicit strides. It does so by comparing
the explicit stride against the type-size-derived typical stride.
This had a subtle bug. Simple vector types (vec2/3/4) have no explicit
stride, so glsl_get_explicit_stride() returns 0. This never matches the
typical stride for a vector, so is_strided_vector() would return true
for basically any vector type, causing the vectorizer to bail.
I found this by looking at a compute shader with scalar SSBO loads at
offsets 0x220, 0x224, 0x228, 0x22c. nir_opt_load_store_vectorize would
properly vectorize the first two into a vec2 load, but would refuse to
extend it to a vec3 and ultimately vec4 load because is_strided_vector()
saw a vec2 and freaked out.
Neither ACO nor ANV do load/store vectorization before lowering derefs,
so this shouldn't affect them. However, I'd like to fix this bug to
avoid the trap for anyone who decides to in the future. In a branch
where anv used this lowering, this cut an additional 38% of the send
messages in the shader by properly vectorizing more things.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4255>
In case of transform feedback query, it writes two integer values,
which one is for primitives written and another is for primitives
generated.
To handle this, the second member of the struct slot_value is worth
to be presented not as a padding.
In addition, we also need to modify get/copy_result to access both
values.
This patch is the prep work for the transform feedback query support.
Tested with
dEQP-VK.pipeline.timestamp.*
dEQP-VK.query_pool.occlusion_query.*
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Brian Ho <brian@brkho.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4604>
VEC4_OPCODE_PICK_HIGH_32BIT performs 32-bit UD access on a 64-bit DF
value. abs and negate make sense on DF, but break entirely when
trying to access pieces of the value as unsigned integer dwords.
Fixes an fsign Piglit test on Ivybridge:
tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/vs-sign-neg-abs
It had regressed when I removed nir_lower_to_source_modifiers, as that
caused us to start generating different code which provoked this bug.
Fixes: b7c47c4f7c ("intel/compiler: Drop nir_lower_to_source_mods() and related handling.")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2817
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4691>
Fixes all the ARB_texture_query_lod piglit tests, and needed to get
the Vulkan CTS textureQueryLOD passing with the ongoing Vulkan driver.
Note that LOD Query bit flag became only available on V42 of the hw,
but the v3d40_tex is using V41 as reference. In order to avoid setting
up the infrastructure to support both v41 and v42, we manually set the
bit if the device version is the correct one.
We also fix how the ARB_texture_query_lod (so EXT_texture_query_lod)
is exposed. Before this commit it was always exposed (wrongly as it
was not really supported). Now it is exposed for devinfo.ver >= 42.
v2: move _need_sampler helper to nir.h (Eric Anholt)
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4677>
Configuration Parameter packets 1 and 2 are pointed as optional, but
it is not clearly stated if you can skip only P1 when P2 is needed.
In the practice, it seems that the situation P0 - non-P1 - P2 can
causes problems, and at least on the simulator, it seems that sampler
info are attempted to be accessed. So let's just be conservative, and
only skip P1 configuration if we can skip P2 configuration too.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4677>
TMU configuration parameter 1 configures the sampler for the texture
operation. But there are some texture operations that doesn't need a
sampler. Skipping the configuration could provide a small perf
improvement on OpenGL. On the incoming Vulkan driver, would allow us
to avoid to set up an unneeded sampler.
Note that we still need to add the sampler configuration parameter if
the output is a 32bit, as it is on the sampler where we configure that
info.
Also, note that for images this is done comparing against a unpacked
p1 default. But in order to do that it is needed to go through the
code that fills up the unpacked p1. We can skip that too.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4677>
Passes tests in:
dEQP-VK.pipeline.multisample.sample_locations_ext.*
Note that these tests fail because of gl_PrimitiveID not working correctly:
dEQP-VK.pipeline.multisample.sample_locations_ext.verify_location.*
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4665>
discard is supposed to be a terminator, killing the thread, so that it's
possible to exit main solely by a discard e.g. inside of an infinite
loop. However, it currently isn't treated as a terminator in NIR due to
workarounds turning it into demote (d3d-style kill) and even if that
were fixed, we probably wouldn't want to treat discard_if as a jump
since otherwise the scheduler wouldn't be able to schedule things around
it. So, add this workaround which inserts jump instructions as
necessary to guarantee that the program always terminates.
This fixes a hang in dEQP-VK.graphicsfuzz.while-inside-switch, which
conditionally does a discard inside an infinite loop.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4658>
The first block was being added to the list twice, once here and once in
emit_block(), leading to list corruption and infinite loops when trying
to traverse the list of blocks backwards.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4658>
Since we consume NIR, we get FRAG_RESULT_DEPTH in .x. Something must have
been working out for this code to not be trying to get an undefined value,
but go ahead and drop it now.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4668>
For GLVND reasons the client/platform extensions strings should be
split. While in the non GLVND case they're one big string.
Currently we handle this distinction at run-time for not obvious reason.
Adding additional code and complexity.
Swap those with a few well placed #if USE_LIBGLVND guards.
As a side result this removes a minor memory leak due to the
concatenation in the non GLVND case.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4491>
I think we're unanimous in wanting to drop nir_lower_to_source_mods.
It's a bit of complexity to handle in the backend, but perhaps more
importantly, would be even more complexity to handle in nir_search.
And, it turns out that since we made other compiler improvements in the
last few years, they no longer appear to buy us anything of value.
Summarizing the results from shader-db from this patch:
- Icelake (scalar mode)
Instruction counts:
- 411 helped, 598 hurt (out of 139,470 shaders)
- 99.2% of shaders remain unaffected. The average increase in
instruction count in hurt programs is 1.78 instructions.
- total instructions in shared programs: 17214951 -> 17215206 (<.01%)
- instructions in affected programs: 1143879 -> 1144134 (0.02%)
Cycles:
- 1042 helped, 1357 hurt
- total cycles in shared programs: 365613294 -> 365882263 (0.07%)
- cycles in affected programs: 138155497 -> 138424466 (0.19%)
- Haswell (both scalar and vector modes)
Instruction counts:
- 73 helped, 1680 hurt (out of 139,470 shaders)
- 98.7% of shaders remain unaffected. The average increase in
instruction count in hurt programs is 1.9 instructions.
- total instructions in shared programs: 14199527 -> 14202262 (0.02%)
- instructions in affected programs: 446499 -> 449234 (0.61%)
Cycles:
- 5253 helped, 5559 hurt
- total cycles in shared programs: 359996545 -> 360038731 (0.01%)
- cycles in affected programs: 155897127 -> 155939313 (0.03%)
Given that ~99% of shader-db remains unaffected, and the affected
programs are hurt by about 1-2 instructions - which are all cheap
ALU instructions - this is unlikely to be measurable in terms of
any real performance impact that would affect users.
So, drop them and simplify the backend, and hopefully enable other
future simplifications in NIR.
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4616>
In meson 0.54.0 I fixed the llvm cmake dependency to return "not found"
if shared linking is requested. This means that for 0.54.0 and later we
don't need to do anything, and for earlier versions we only need to
change the logic to force the config-tool method if shared linking is
required.
Fixes: 821cf6942a
("meson: Use cmake to find LLVM when building for window")
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4556>
I'm not really sure where else to put it. Since imports.h only has two
things left in it (neither of which are abstractions for smoothing away
libc differences) I'd like to get them out of there. macros.h is the
only place I can think of to put this macro.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3024>
This adds two new util functions to rounding.h, _mesa_iroundf and
mesa_lround, which are just wrappers around roundf and round, that cast
to int and long int respectively. This is possible since mesa recently
dropped support for VC2013, since 2015 and 2017 support roundf.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3024>
I wanted to look at the effect of a core NIR change on a2xx codegen, but I
don't have any of those boards. This could also prove useful for quickly
sanity-checking the compiler by running shader-db on it -- a2xx fails in a
few ways on glmark2, and a3xx-a5xx fails on glmark2 in a debug_assert
(which we don't have enabled in our dEQP runs).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4652>
After call to nir_shader_gather_info - inputs_read may have changed so
st_nir_assign_vs_in_locations should be called for shader to remain in
sync with vbo state.
Fixes piglit tests:
gl-1.0-fpexceptions
gl-1.1-color-material-unused-normal-array
arb_vertex_program-unused-attributes
regression on several gallium drivers.
Fixes: d684fb37bf
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4645>
This appears to be limited by VPC_CNTL_0::NUMNONPOSVAR, which is an
8-bit bitfield with no possibility for expansion. Also, in practice
we'll be limited by the vertex shader output maximum, which includes
gl_Position, of 128, so that users won't be able to use more than 124
components anyways. Lower it to match the GL blob.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4641>
The extra bit needs to be used when using the maximum of 128 varying
components. I confirmed that PC_PRIMITIVE_CNTL_1 and SP_PRIMITIVE_CNTL
are expanded using a trace of the Vulkan blob with the maximum number of
varyings, and changed the others by analogy.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4641>
Version 4.4 of the GLSL spec changed the definition of noise*() to
always return zero and earlier versions of the spec allowed zero as a
valid implementation.
All drivers, as far as I can tell, unconditionally call lower_noise()
today which turns ir_unop_noise into zero. We've got a 10-year-old
comment in there saying "In the future, ir_unop_noise may be replaced by
a call to a function that implements noise." Well, it's the future now
and we've not yet gotten around to that. In the mean time, the GLSL
spec has made doing so illegal.
To make things worse, we then pretend to handle the opcode in
glsl_to_nir, ir_to_mesa, and st_glsl_to_tgsi even though it should never
get there given the lowering. The lowering in st_glsl_to_tgsi defines
noise*() to be 0.5 which is an illegal implementation of the noise
functions according to pre-4.4 specs. We also have opcodes for this in
NIR which are never used because, again, we always call lower_noise().
Let's just kill the whole opcode and make builtin_builder.cpp build a
bunch of functions that just return zero.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4624>
Consider the following case:
// g119-123 are written somewhere above
mul.sat(16) g67<1>F g6.4<0,1,0>F g125<8,8,1>F
mul.sat(16) g69<1>F g6.5<0,1,0>F g125<8,8,1>F
mul.sat(16) g71<1>F g6.6<0,1,0>F g125<8,8,1>F
mov(16) g119<1>F g67<8,8,1>F
mov(16) g121<1>F g69<8,8,1>F
mov(16) g123<1>F g71<8,8,1>F
We should be able to coalesce it into
mul.sat(16) g119<1>F g6.4<0,1,0>F g125<8,8,1>F
mul.sat(16) g121<1>F g6.5<0,1,0>F g125<8,8,1>F
mul.sat(16) g123<1>F g6.6<0,1,0>F g125<8,8,1>F
What's stopping us is an overly conservative check for writes to the two
registers being coalesced. The check walks over the intersection of
their live ranges and checks for no writes to either one. However,
because the register which starts the live range (the mul.sat in this
case) is inside that intersection, we flag it as a write in the
intersection and don't coalesce. However, this case is safe because the
destination register of the copy is never read after the source is
written.
Shader-db changes on ICL:
total instructions in shared programs: 16043613 -> 16042610 (<.01%)
instructions in affected programs: 43036 -> 42033 (-2.33%)
helped: 226
HURT: 0
helped stats (abs) min: 1 max: 30 x̄: 4.44 x̃: 4
helped stats (rel) min: 0.09% max: 26.67% x̄: 4.89% x̃: 3.43%
95% mean confidence interval for instructions value: -4.86 -4.02
95% mean confidence interval for instructions %-change: -5.57% -4.22%
Instructions are helped.
total cycles in shared programs: 334766372 -> 334710124 (-0.02%)
cycles in affected programs: 617548 -> 561300 (-9.11%)
helped: 214
HURT: 2
helped stats (abs) min: 15 max: 1512 x̄: 263.21 x̃: 212
helped stats (rel) min: 0.30% max: 75.36% x̄: 25.30% x̃: 21.58%
HURT stats (abs) min: 40 max: 40 x̄: 40.00 x̃: 40
HURT stats (rel) min: 0.15% max: 0.15% x̄: 0.15% x̃: 0.15%
95% mean confidence interval for cycles value: -277.91 -242.90
95% mean confidence interval for cycles %-change: -27.58% -22.55%
Cycles are helped.
No spill/fill changes or gained/lost
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4627>
If we have a clip dist float[1] compact followed by a tess factor
float[2] we don't want to overlap them, but the partial check
only happens for non-compact vars.
This fixes some issues seen with my sw vulkan layer with
dEQP-VK.clipping.user_defined.clip_distance*
v2: v1 failed with clip/cull mixtures, since in that
case the cull has a location_frac to follow after the clip
so only reset if we get a location_frac of 0 in a subsequent
clip var
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4635>
In the long term the goal of this script is to nearly completely
automate the process of picking stable nominations, in a well tested
way.
In the short term the goal is to provide a better, faster UI to interact
with stable nominations.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3608>
Use the new DRM_IOCTL_I915_GEM_MMAP_OFFSET ioctl when available.
[jordan.l.justen@intel.com: iris port]
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
v2: Update getparam check (Ken).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1675>
We want to add a new ioctl for mmap'ing buffers, so let's avoid
duplicating that code on both functions by extracting it from them
first.
[jordan.l.justen@intel.com: iris port]
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
v2: Rename helper function names (Ken).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1675>
After the decorations of a variable are evaluated, propagate the
access flag to the associated vtn_pointer. This was done when
creating the pointer but at that point there was no access flags for
the variable.
Inline the pointer creation to make this point clearer, in isolation
the helper made the impression that the value was being propagated.
Issue found by Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4620>
Change brw_memory_fence to return the number of messages emitted, and
use that to update the send_count statistic in code generation.
This will fix the book-keeping for IVB since the memory fences will
result in two SEND messages.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4646>
Meson specifies /EDITANDCONTINUE for MSVC projects when using the debug
build-type. This collides with our across-the-board disabling of
incremental linking.
It's clear that we don't want to do incremental linking for
release-builds; it increase the code-size, and adds some needless jumps
to be able to patch in new code. But for debug-builds this seems like a
good thing; we can now debug and on-the-fly recompile changes if we want
to.
This flag seems to have been simply forwarded from the SCons build
system, where it makes a bit more sense; SCons doesn't really integrate
with visual studio, so you can't properly debug with it. But Meson does,
so let's keep some bells-and-whistles here.
So let's avoid disabling incremental linking for debug-builds. For other
builds we still want to do this, because Meson only disables it
automatically for minsize-builds.
This avoids a boat-loads of warnings on the form:
warning LNK4075: ignoring '/EDITANDCONTINUE' due to '/INCREMENTAL:NO' specification
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4572>
This helps us avoid moving the movs outside if branches when there
src can't be scalarized.
For example it avoids:
vec4 32 ssa_7 = tex ssa_6 (coord), 0 (texture), 0 (sampler),
if ... {
r0 = imov ssa_7.z
r1 = imov ssa_7.y
r2 = imov ssa_7.x
r3 = imov ssa_7.w
...
} else {
...
if ... {
r0 = imov ssa_7.x
r1 = imov ssa_7.w
...
else {
r0 = imov ssa_7.z
r1 = imov ssa_7.y
...
}
r2 = imov ssa_7.x
r3 = imov ssa_7.w
}
...
vec4 32 ssa_36 = vec4 r0, r1, r2, r3
Becoming something like:
vec4 32 ssa_7 = tex ssa_6 (coord), 0 (texture), 0 (sampler),
r0 = imov ssa_7.z
r1 = imov ssa_7.y
r2 = imov ssa_7.x
r3 = imov ssa_7.w
if ... {
...
} else {
if ... {
r0 = imov r2
r1 = imov r3
...
else {
...
}
...
}
While this is has a smaller instruction count it requires more work
for the same result. With more complex examples we can also end up
shuffling the registers around in a way that requires more registers
to use as temps so that we don't overwrite our original values along
the way.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636>
We can't move them later as we could move them into non-uniform
control flow, but moving them earlier should be fine.
This helps avoid a bunch of spilling in unigine shaders due to
moving the tex instructions sources earlier (outside if branches)
but not the instruction itself.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636>
Classically, global code motion is also a dead code pass. However, in
the initial implementation, the decision was made to place every
instruction and let conventional DCE clean up the dead ones. Because
any uses of a dead instruction are unreachable, we have no late block
and the dead instructions are always scheduled early. The problem is
that, because we place the dead instruction early, it pushes the
placement of any dependencies of the dead instruction earlier than they
may need to be placed. In order prevent dead instructions from
affecting the placement of live ones, we need to delete them.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636>
We are about to adjust our instruction block assignment algorithm and we
will want to know the current block that the instruction lives in. In
order to allow for this, we can't overwrite nir_instr::block in the
early scheduling pass.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636>
Chocolatey installs depend on downloading binaries from SourceForge,
which is an unreliable host: container builds often fail because it
cannot pick up winflexbison.
Add a loop to retry chocolatey installs if any installs have failed, and
ensure Python is in the accessible PowerShell path rather than relying
on the path being externally refreshed.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4631>
Inside nested flow control, nir_lower_returns inserts predicated breaks
in the outer block. However, it would omit doing this if the remainder
of the outer block (after the inner block) was empty. This is not
correct in the case of loops, as execution just wraps back around to the
start of the loop, so this change doesn't skip the predication inside
loops.
Fixes: 79dec93ead (nir: Add return lowering pass)
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2724
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4603>
Push constants in particular can get picked up by the hardware at weird
times that happen *before* 3DPRIMITIVE. Therefore, we need to flush
before we emit all our state to ensure that any data they may pick up is
in memory in time. This fixes an app which does vkCmdCopyBuffers
immediately followed by a vkCmdBeginRenderPass and vkCmdDraw which uses
the destination of the copy as a UBO which we push.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4601>
The intersects() function assumes that inside each instruction values
always die before they are defined, so that if the end of one range is
the same instruction as the beginning of the next then they don't
intersect. However, this isn't the case for values that become live at
the beginning of a basic block, which become live *before* the first
instruction, or instructions that die at the end of a basic block which
die after the last instruction.
For example, imagine that we have two values, A which is defined earlier
in the block and B which is defined in the last instruction of the block
and both die at the end of the basic block (e.g. are used in the next
iteration of a loop). We would compute a range for A of, say, (10, 20)
and for B of (20, 20) since each block's end_ip is the same as the ip of
the last instruction, and RA would consider them to not interfere.
There's a similar problem with values that become live at the beginning.
The fix is to offset the block's start_ip and end_ip by one so that they
don't correspond to any actual instruction. One way to think about this
is that we're adding fake instructions at the beginning and end of a
block where values become live & die. We could invert the order, so that
values consumed by each instruction are considered dead at the end of
the previous instruction, but then values that become dead at the
beginning of the basic block would incorrectly have an empty live range,
with a similar problem at the end of the basic block if we try to say
that values are defined at the beginning of the next instruction. So
the extra padding instructions are unavoidable.
This fixes an accidental infinite loop in the shader for
dEQP-VK.spirv_assembly.type.scalar.u32.switch_vert.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4614>
We use vtn_vector_extract to handle vector component level derefs. This
makes us gracefully handle the case where your vector component is OOB
and give you an undef. The SPIR-V working group is still working out
whether or not this is technically legal but it's very little code for
us to handle it so we may as well.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4495>
The versions that take a specific number of operands will do various
fixups depending on the platform and the opcode. However, the version
that takes an array of sources did not. This makes all version operate
similarly.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4582>
This commit fixes performance regressions introduced by e03f965280
in which we started bounds checking our push constants. This added a
LOT of shader code to shaders which use the robustBufferAccess feature
and led to substantial spilling. The checking we just added to the FS
back-end is far more efficient for two reasons:
1. It can be done at a whole register granularity rather than per-
scalar and so we emit one SIMD8 SEL per 32B GRF rather than one
SIMD16 SEL (executed as two SELs) for each component loaded.
2. Because we do it with NoMask instructions, we can do it on whole
pushed GRFs without splatting them out to SIMD8 or SIME16 values.
This means that robust buffer access no longer explodes our register
pressure for no good reason.
As a tiny side-benefit, we're now using can use AND instead of SEL which
means no need for the flag and better scheduling.
Vulkan pipeline database results on ICL:
Instructions in all programs: 293586059 -> 238009118 (-18.9%)
SENDs in all programs: 13568515 -> 13568515 (+0.0%)
Loops in all programs: 149720 -> 149720 (+0.0%)
Cycles in all programs: 88499234498 -> 84348917496 (-4.7%)
Spills in all programs: 1229018 -> 184339 (-85.0%)
Fills in all programs: 1348397 -> 246061 (-81.8%)
This also improves the performance of a few apps:
- Shadow of the Tomb Raider: +4%
- Witcher 3: +3.5%
- UE4 Shooter demo: +2%
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4447>
Even though vkCmdRenderPassBegin() isn't allowed inside a secondary
command buffer, vkCmdDispatch() is, and we emit an IB with compute
dispatches, which means that if the secondary command buffer records a
vkCmdDispatch() then we'll have an IB inside an IB, which is illegal.
Fixes hangs in e.g.
dEQP-VK.api.command_buffers.record_simul_use_secondary_one_primary.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4605>
One notable change is that DRAW_INFO_START_WITH_USER_INDICES is enabled.
An audit of the code indicates that it should work, and a number of
piglit tests exercising glMultiDrawElements continue to function.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4520>
We make some assumptions in opt_large_constants such as the size_align
function returning the obvious sizes for vectors. Now that we've got
the deref_size lying around, we may as well assert it's consistent with
our assumptions. In particular, we now assert that it really claims
booleans are 32-bit. If anyone's driver ever decides to be clever and
change this, we'll now catch the breakage earlier.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4468>
In f1883cc73d we tried to pass through alignments from load_constant
intrinsics when rewriting them to load_ubo in iris. However, those
intrinsics don't have ALIGN_MUL or ALIGN_OFFSET indices. It's easy
enough to add them. We just call the size/align function on the vector
type at the end of our deref chain and use the alignment returned from
there. It's possible we could do better by walking the whole deref
chain but this should be good enough.
Fixes: f1883cc73d "iris: Set alignments on cbuf0 and constant reads"
Closes: #2739
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4468>
flexint.h uses stdint.h if the compiler claims to support C99. MSVC
doesn't support enough of C99 to enable this flag, but it supports
enough to keep flex happy.
Without this, we end up with *both* some flex-specific definitions as
well as our own definitions from mesa-headers, producing a slew of
compiler warnings.
Reviewed-by: Brian Paul <brianp@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4577>
On Windows, main/glheader.h ends up including windows.h which in turn
includes wingdi.h unless the NOGDI macro is defined. And wingdi.h
defines a macro called "ERROR", which we end up redefining below.
To avoid a warning on the redefinition, we can define NOGDI to prevent
wingdi.h from implicitly being included.
Reviewed-by: Brian Paul <brianp@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4577>
Instead of exposing various layout functions, move image-related logic
into tu_image.c and have the image_view pre-fill relevant register values.
This changes the clear/blit code to use image_view.
This will make it much easier to deal with aspect masks, in particular for
planar formats and D32_S8.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4581>
Even though we normally use the CP_BLIT path with resolves that aren't
aligned, there's a special case when we're resolving the entire image
and there's enough padding so that we can still use CP_EVENT_WRITE::BLIT
when the render area isn't aligned. The hardware seems to not like
unaligned scissors when not clearing, and sometimes hangs rather than
silently round the scissor. This causes hangs in e.g.
dEQP-VK.glsl.derivate.dfdx.texture.msaa4.float_highp.
There was some concern that the CP_BLIT path might use this scissor
also, but I confirmed that this isn't the case by setting it to 0 before
resolving and then noting that CP_BLIT still works (but CP_EVENT_WRITE
doesn't). Furthermore, this is actually impossible because of how the 2D
engine is set up: it gets its own pair of register banks, which can be
switched independently of the 3D register banks, so that 2D events
(CP_BLIT) normally aren't synchronized relative to 3D events
(CP_EVENT_WRITE, CP_DRAW_*, and CP_EXEC_CS) and therefore they can't
share any registers except for non-pipelined registers like RB_CCU_CNTL
that don't use the register bank mechanism at all.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4585>
The enum values can be used directly as indices into arrays, simplifying
the code.
This significantly cuts down the number of CPU cycles spent inside
* Addr::V2::Gfx9Lib::HwlComputeDccAddrFromCoord:
+------------------------------------------------------------------------+
|+ +++ + x x xx|
| |_____AM____| |_A__||
+------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 5 14.89 15.44 15.14 15.156 0.24704251
+ 5 8.26 9.96 9.37 9.282 0.6262747
Difference at 95.0% confidence
-5.874 +/- 0.694294
-38.7569% +/- 4.58098%
(Student's t, pooled s = 0.476051)
* Addr::V2::CoordEq::solve:
+------------------------------------------------------------------------+
| + x |
| + + + + x x x x|
||__MA____| |______A__M____||
+------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 5 8.11 9.59 9.21 9.02 0.55605755
+ 5 4.28 5.05 4.48 4.564 0.32867917
Difference at 95.0% confidence
-4.456 +/- 0.666135
-49.4013% +/- 7.38509%
(Student's t, pooled s = 0.456744)
(The measured numbers are the percentages of samples inside the
respective function and its calles for
`perf record --call-graph=fp kitty -e false`, measured on a Lenovo
Thinkpad E595 (Picasso))
v2:
* Add missed 'coords[dim] |= bit << ord;' (Pierre-Eric Pelloux-Prayer)
* Put 'ADDR_ASSERT(dim < DIM_S);' where the code previous had
'ADDR_ASSERT_ALWAYS()' for the s/m dimensions.
* Use 1u for BitsValid (since it's 32-bit unsigned values).
* Use parens in 'BitsValid[dim] & (1u << ord)' for clarity.
Acked-by: Marek Olšák <marek.olsak@amd.com> # v1
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4523>
We were treating count == 0 as the format not being supported at all,
but queryDmaBufModifiers would return false in that case.
Fixes spuriously reporting all formats as unsupported with radeonsi
(which doesn't support modifiers yet), which would e.g. cause mutter
to think the HW cursor format isn't supported and fall back to SW
cursor.
Suggested-by: Daniel Stone <daniels@collabora.com>
Fixes: 4e3a7dcf6e "gallium: enable
EGL_EXT_image_dma_buf_import_modifiers
unconditionally"
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Simon Ser <contact@emersion.fr>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4532>
Otherwise, we might end up with a tiling-capable driver creating a
tiled resource here instead of linear. This is currently possible with
Zink, although we currently force all display-targets to be linear.
But that doesn't seem like a good idea in the long run, so let's loosen
this restriction.
Acked-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2678>
Don't emulate them with uncompressed formats if the host
support them since uncompressed formats like GL_R16 could
be not available on GLES hosts.
Signed-off-by: Lepton Wu <lepton@chromium.org>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
tc_transfer_map maps buffers directly, but the unmap operation is executed
in the driver thread.
When an application does a lot of map/unmap operations, without flushing,
this increase the RAM used (and eventually get the app killed by the oom-killer).
This commit allows tc to keep track of how many bytes were mapped during
the current batch. When this estimation becomes higher than a threshold,
we flush the batch.
See: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2735
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4508>
To workaround a crash with Wolfeinstein Younglood because the
games creates one descriptor with
VK_DESCRIPTOR_TYPE_ACCELERATION_STRUCTURE_NV...
I reported the problem to Machine Games, but still no answer, so
let's remove the unreachable calls (which are technically not
unreachable for buggy apps) to help gamers.
Note that AMDVLK and AMDGPU-PRO don't crash because they ignore
unsupported descriptor types.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4571>
I had missed that LDC actually uses vec4 units for its offset. This
means that we have to create a new instruction, and lower it in
ir3_nir_lower_io_offsets, similar to the existing SSBO instructions.
Unfortunately we can't assume that loads are always vec4-aligned, so we
have to use the alignment information that NIR gives us. Unfortunately,
it's currently woefully inadequate, and will have to be fixed to give us
good codegen in the future.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4568>
This reverts commit a0e57432b7.
It's unclear what caused the test to fail back then. Now it's seems to be
reversed. I tested with a close enough piglit and mesa branch and wasn't
able to reproduce the same test result I've got in some older piglit runs.
Fixes:
dEQP-GLES2.functional.rasterization.primitives.lines_wide
dEQP-GLES2.functional.rasterization.primitives.line_strip_wide
dEQP-GLES2.functional.rasterization.primitives.line_loop_wide
dEQP-GLES2.functional.rasterization.limits.points
dEQP-GLES2.functional.clipping.line.wide_line_z_clip
dEQP-GLES2.functional.clipping.line.wide_line_z_clip_viewport_center
dEQP-GLES2.functional.clipping.line.wide_line_z_clip_viewport_corner
dEQP-GLES2.functional.clipping.line.wide_line_clip
dEQP-GLES2.functional.clipping.line.wide_line_clip_viewport_center
dEQP-GLES2.functional.clipping.line.wide_line_clip_viewport_corner
dEQP-GLES2.functional.clipping.line.wide_line_attrib_clip
dEQP-GLES2.functional.polygon_offset.default_result_depth_clamp
dEQP-GLES2.functional.polygon_offset.default_factor_1_slope
dEQP-GLES2.functional.polygon_offset.fixed16_result_depth_clamp
dEQP-GLES2.functional.polygon_offset.fixed16_factor_1_slope
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4575>
This fixes two bugs: First, if the same block index showed up twice, we
only pick the first one. Second, we weren't multiplying by 32. This
didn't show up in tests because RBA testing is garbage. Found while
looking at shaders from the UE4 Shooter demo.
Fixes: e03f9652 "anv: Bounds-check pushed UBOs when..."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4578>
Iris allocates gem buffers using buckets of allocation sizes that are
page aligned. We always ask for batch buffers of size BATCH_SZ +
BATCH_RESERVED, which is not page aligned: we ask for 65552 bytes,
which ends up in the bucket of size 81920, resulting in 20% unused
space. Adjust things so there is no waste of space: BATCH_SZ +
BATCH_RESERVED is now 65536.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4561>
This decreases the size of the struct on a 64bit machine from 144 to
136. While that's not a lot, this is one of the structs that we're
allocating all the time.
For a full Aztec run on BDW we allocate this struct 3273 times, and we
can have up to 3259 of them live at the same time. So we end up saving
just a little over 6 pages for this benchmark.
Spotted this while trying to add another bool for an unrelated
feature.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4561>
After some experimentation, I believe that GRAS_LAYER_CNTL is
actually just a count register storing the number of layers in the
render target. While debugging cube_array geometry tests, I noticed
that the blob was setting an unknown 0x8 to LAYER_CNTL, so I checked
the value of LAYER_CNTL for various layer sizes:
1: LAYER_CNTL=0
2: LAYER_CNTL=1
3: LAYER_CNTL=2
4: LAYER_CNTL=3
9: LAYER_CNTL=8
256: LAYER_CNTL=255
2000: LAYER_CNTL=1999
Seems like this register just stores a count of the largest layer
that can be written to via gl_Layer. This commit updates the reg
docs, freedreno's gs implementation, and turnip's gs implementation.
Fixes dEQP-VK.geometry.layered.cube_array.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4541>
Previously we were using layout.layer_size for the layer stride, but
in Vulkan, you can alias a 3D image as an array of 2D images via the
VK_IMAGE_CREATE_2D_ARRAY_COMPATIBLE_BIT flag. One reason to use
this behavior is so the geometry shader can write to a specific
depth in a 3D framebuffer with gl_Layer.
Since the 3D image is not a *true* layered image, layer_size is 0.
Instead, we can copy what freedreno does and use the slice size.
Fixes dEQP-VK.geometry.layered.3d.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4541>
Earlier commit added the code unconditionally, since the loader code
itself is already built on macOS.
Although it did not consider the #include mayhem that src/glx is.
In particular, none of the __GLXDRI{screen,context,drawable) are
available for macOS... those are pulled by dri_common.[ch].
Ideally we'll untangle that, but for the time being simply #ifdef out
the include/call.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2726
Fixes: b699d070a6 ("glx: set the loader_logger early and for everyone")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4490>
v2: use static array to keep name -> func mapping
v3: use unordered_map
v4: handle ARM constants
reorder dispatch table
wrap enqueue APIs as the command value differs between khr and arm
v5: move declarations into dispatch.hpp
handle CL_MEM_USES_SVM_POINTER_ARM in clGetMemObjectInfo
v6: breaking long lines
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2076>
all of the functionality can be mapped to malloc/free if the device supports
fine grained system SVM.
v2: fix some API bugs found with the OpenCL CTS
v3: remove validate_even_wait_list
improve implementation of clSetKernelExecInfo
make clEnqueueSVMFree spec compliant
rename can_emulate_non_system_svm to has_system_svm and make it a member method
improve validation in clEnqueueSVMMemFill
handle CL_MEM_USES_SVM_POINTER in clGetMemObjectInfo
v4: break long lines and other minor cosmetic adjustments
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2076>
it is pretty much identical to a clSetKernelArg for a scalar field, except
it is only valid for global and constant memory pointers.
Also the type equals void* on the Host, so we can just check the size of it.
v2: prefer using target_size to extend the pointer value
v3: handle more corner cases in combiation to clSetKernelArg
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Pierre Moreau <dev@pmoreau.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2076>
although most of those are 2.0 core functions, there is
cl_arm_shared_virtual_memory to expose those in a 1.2 context. But we
should be able to expose this extension with 1.1 as well as there is no
technicaly reason why this shouldn't work.
v2: move svm functions into existing files
v3: rename func args to match convention
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Pierre Moreau <dev@pmoreau.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2076>
v_frexp_exp_i16_f16 returns the two's complement for negative
exponents. For example, with 0.333252 it returns 0.666504 for
the mantissa and 65535 for the exponent (-1 in decimal).
RADV/LLVM and AMDVLK do a v_bfe_i32 and AMDGPU-PRO uses SDWA with
the sign extension bit set. The latter is probably what we want to
do in long term but for now RA doesn't support changing non-SDWA
instructions to SDWA if useful/needed.
Fixes dEQP-VK.glsl.builtin.precision_fp16_storage16b.frexp.compute.*.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4546>
GL spec requires user culling, then clipping then face culling.
llvmpipe was doing clipping then user culling then face culling.
Fix the ordering by adding a new user_cull stage that does the user
culling
Fixes piglit clip_cull-4.shader_test
v2: simplify this a lot (Roland)
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4560>
before:
$ file src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_types.py
src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_types.py: Python script text executable, UTF-8 Unicode (with BOM) text
after:
$ file src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_types.py
src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_types.py: Python script text executable, ASCII text
This patch also fixes this build error.
File "src/gallium/drivers/swr/rasterizer/codegen/gen_llvm_types.py", line 1
# Copyright (C) 2014-2018 Intel Corporation. All Rights Reserved.
^
SyntaxError: invalid character in identifier
Fixes: c6e67f5a93 ("gallium/swr: add OpenSWR rasterizer")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4221>
These macros produced a lot of errors with ubsan preventing us from
expanding the ubsan coverage on CIs.
C++ spec has such clause:
"If the prvalue of type "pointer to cv1 B" points to a B that is
actually a subobject of an object of type D, the resulting pointer
points to the enclosing object of type D. Otherwise, the result
of the cast is undefined."
Ubsan error example:
../src/compiler/glsl/builtin_functions.cpp:4945:4: runtime error: downcast of address 0x559b926abb50 which does not point to an object of type 'ir_instruction'
0x559b926abb50: note: object has invalid vptr
9b 55 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 58 ba 6a 92 9b 55 00 00 01 00 00 00
^~~~~~~~~~~~~~~~~~~~~~~
invalid vptr
#0 0x559b914dbe1a in call ../src/compiler/glsl/builtin_functions.cpp:4945
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4129>
Since the alignment is now checked in the validator we must set it.
v2: Use alignement of 4, i.e. dest bit size by eight.
v3: Use alignment 16 (Rhys Perry & Jason Ekstand)
v4: Use nir_intrinsic_set_align to make it clear that align offset is 0
(Jason)
Fixes: e78a7a1825
nir: Assert memory loads are aligned
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4544>
I missed that this had a micro-optimization to assume that there was
only ever one source, which is no longer valid for the bindless model
since we now have a bindless handle source. Remove the optimization to
fix assertion failures with turnip.
Fixes e.g.
dEQP-VK.glsl.texture_functions.query.texturesize.sampler2d_fixed_vertex
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4548>
This fixes an assertion in iris_disk_cache_init() when the initialization
goes through drm_create_adapter(), which lives in d3dadapter9.so.
In this case build_id_find_nhdr_for_addr() fails and returns NULL, since
the shared library does not include a build ID.
The issue can be reproduced with an iris capable GPU and Xnine, while
removing the shader cache prior to launching the application.
Fix this by doing the same as in 29ea92e6a1.
Fixes: 4756864cdc "iris: Start wiring up on-disk shader cache"
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4499>
Saves a bunch of noise for me to sort through in IR3_SHADER_DEBUG=vs,fs
shader-db/run <single shader_test>. I found that we were crashing on
destroy of NULL programs in fd_prog_fini, so I replicated the gpu_id < 300
early exit from fd_prog_init() down to _fini as well.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4538>
When dealing with a regression in libdrm-2.4.101, I masked the package
in Gentoo. In doing so, we discovered that Mesa's dri.pc specifies a
version requirement in dri.pc for >= the version of libdrm Mesa was
built against, thus preventing packages from being rebuilt with the
older version of libdrm installed.
Let's reduce this version requirement to the latest libdrm required by
Mesa instead, since libdrm is backward compatible.
Fixes: a3a16d4aa7 ("meson: use dep_libdrm version for pkg-config")
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4534>
Do a better job of pruning when removing unused instructions, including
cleaning up dangling false-deps.
This introduces a new ssa src pointer iterator, which makes it easy to
clear links without having to think about whether it is a normal ssa
src or a false-dep.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4440>
This replaces the depth-first search scheduler with a more traditional
ready-list scheduler. It primarily tries to reduce register pressure
(number of live values), with the exception of trying to schedule kills
as early as possible. (Earlier iterations of this scheduler had a
tendency to push kills later, and in particular moving texture fetches
which may not be necessary ahead of kills.)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4440>
If the group pass must insert a mov to resolve conflicts, avoid the mov
appearing *after* the meta:collect whose src it is.
The current pre-RA scheduler doesn't really care about the initial
instruction order, but the new one will in some cases.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4440>
SEL_B32 (and presumably B16) checks for 0 or nonzero in the condition
(tested by just stuffing a uniform's value into it), so there's no need to
do ir3_b2n() on it, or any preceding ir3_n2b().
instructions in affected programs: 664444 -> 659927 (-0.68%)
nops in affected programs: 267898 -> 266312 (-0.59%)
non-nops in affected programs: 420260 -> 417329 (-0.70%)
dwords in affected programs: 144032 -> 137568 (-4.49%)
last-baryf in affected programs: 10801 -> 10321 (-4.44%)
full in affected programs: 2003 -> 2002 (-0.05%)
sstall in affected programs: 76670 -> 77405 (0.96%)
(ss) in affected programs: 4515 -> 4525 (0.22%)
(sy) in affected programs: 612 -> 604 (-1.31%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4516>
Many little changes. Almost everything is indentation or removal of
trailing whitespace. Some places move a loop variable declaration to
the loop. Some comments either re-wrapped or converted to single
line.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4512>
So many little changes. Almost everything is indentation or removal of
trailing whitespace. There's a couple places where an "else" is moved
to the previous line with the "}". Some places move a loop variable
declaration to the loop. One change of assert to unreachable.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4512>
For larger render targets or smaller GMEM size, we could end up needing
multiple chunks of tracepoints per batch. But we still want to decode
the traces as single batch. So we need a bit of a state across
process_chunk() calls to accumulate timestamp information.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4510>
The v->binning variant is never added to shader->variants, so just free
each one as we free the nonbinning variant.
Noticed from drm-shim mode running out of open fds, since each bo ends up
with an fd.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4502>
Store Primitive Counts operations write 7 counters in 32-bit words
but also a padding 32-bit with 0. So we need 8 32-bit words instead
of the current 7 allocated.
This was causing an corruption in the next buffer when Transform
Feedback was enabled that were exposed on tests like:
dEQP-GLES3.functional.transform_feedback.*.points.*
This patch fixes 196 tests that were failing when they were run isolated
but they were passing when run using cts-runner.
Fixes: 0f2d1dfe65 ("v3d: use the GPU to record primitives written to transform feedback")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2674
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4501>
PSEUDO instructions are lowered using SDWA, and thus,
cannot take literals and before GFX9 cannot take constants
at all. As the in-register representation differs between
32bit and 16bit floats, we first need to ensure correct
behavior.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4492>
This is a preparation for dropping this field since this value is
expected to be calculated by the drivers now for variable group size
case. And also the field would get in the way of brw_compile_cs
producing multiple SIMD variants (like FS).
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4504>
Move the calculation to helper functions -- similar to what GL already
needs to do.
This is a preparation for dropping this field since this value is
expected to be calculated by the drivers now for variable group size
case. And also the field would get in the way of brw_compile_cs
producing multiple SIMD variants (like FS).
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4504>
Add new builtin parameters that are used to keep track of the group
size. This will be used to implement ARB_compute_variable_group_size.
The compiler will use the maximum group size supported to pick a
suitable SIMD variant. A later improvement will be to keep all SIMD
variants (like FS) so the driver can select the best one at dispatch
time.
When variable workgroup size is used, the small workgroup optimization
is disabled as it we can't prove at compile time that the barriers
won't be needed.
Extracted from original i965 patch with additional changes by
Caio Marcelo de Oliveira Filho.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4504>
The push.total field had three values but only one was directly
used (size). Replace it with a helper function that explicitly takes
the cs_prog_data and the number of threads -- and use that in the
drivers.
This is a preparation for ARB_compute_variable_group_size where the
number of threads (hence the total size for push constants) is not
defined at compile time (not cs_prog_data->threads).
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4504>
Under the bindless model, there are 5 "base" registers programmed with a
64-bit address, and sam/ldib/ldc and so on each specify a base register
and an offset, in units of 16 dwords. The base registers correspond to
descriptor sets in Vulkan. We allocate a buffer at descriptor set
creation time, hopefully outside the main rendering loop, and then
switching descriptor sets is just a matter of programming the base
registers differently. Note, however, that some kinds of descriptors
need to be patched at command recording time, in particular dynamic
UBO's and SSBO's, which need to be patched at CmdBindDescriptorSets
time, and input attachments which need to be patched at draw time based
on the the pipeline that's bound. We reserve the fifth base register
(which seems to be unused by the blob driver) for these, creating a
descriptor set on-the-fly and combining all the dynamic descriptors from
all the different descriptor sets. This way, we never have to copy the
rest of the descriptor set at draw time like the blob seems to do. I
mostly chose to do this because the infrastructure was already there in
the form of dynamic_descriptors, and other drivers (at least radv) don't
cheat either when implementing this.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4358>
In Vulkan, descriptors for samplers, SSBO's, etc. are collected into
descriptor sets, and shaders can use multiple descriptor sets. At
command-recording time, users can swap out only some of the descriptor
sets, and the driver is supposed to do the minimum amount necessary to
update any internal binding tables, knowing that only some of the
descriptors have changed.
With the old binding model, focused on GL, where there are separate
tables for each type of resource, we can do somewhat better than now by
preserving descriptors from lower descriptor sets when switching higher
descriptor sets. However we still have to copy around descriptors before
each draw.
At least for a6xx, qualcomm went further, essentially copying the Vulkan
binding model as an alternate way to load resources. There's an array of
registers (actually an array for compute and one for everything else),
where each register holds a pointer to a descriptor set that can contain
various different descriptor types. The descriptors are padded out to 16
dwords, so that every instruction can use an index instead of a dword
offset. It's called "bindless", I think, because it can also be used to
implement the old GL bindless extensions (presumably it allows more
samplers and textures than the old model).
This commit adds the register and cmdstream parts. Next up will be the
instruction encoding.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4358>
Carve out some space at the beginning for push constants, and push them
directly, rather than remapping them to a UBO and then relying on the
UBO pushing code. Remapping to a UBO is easy now, where there's a single
table of UBO's, but with the bindless model it'll be a lot harder. I
haven't removed all the code to move the remaining UBO's over by 1,
though, because it's going to all get rewritten with bindless anyways.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4358>
An improved hashing greatly reduces the number of collisions,
and thus, increases the speed for lookups in the hash table.
The hash function now uses Murmur3 written by Austin Appleby.
This patch also pre-reserves space for the hashmap to avoid rehashing.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4130>
* Take tile_mode as input directly
* tu6_format_gmem to tu6_base_format, use may not be limited to GMEM
* Add new helpers that will return the correct tile_mode as for image level
as part of the format.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3783>
This gives +8% with Wolfeinstein Youngblood on my Vega64, and
according to someone else, it also improves performance with Doom
2016 and Wolfenstein 2 (and probably other ID Tech games).
This improvement is because Youngblood uses GENERAL for the main
depth-only pass and TC-compat HTILE is now enabled with GENERAL if
we know that we are outside of a render loop. This obviously also
reduces the number of HTILE decompressions from/to GENERAL.
Note that Youngblood violates the Vulkan spec regarding render loops
because they are only allowed with input attachments. Expect possible
rendering issues if apps use render loops with the wrong way (ie.
without input attachmens) because HTILE might not be coherent if
a depth-stencil texture is sampled and rendered in the same draw.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2704
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4391>
This disables shaderFloat16 on GFX8 because only GFX9+ supports
double rate packed math.
This improves consistency regarding other AMD Vulkan drivers and
it makes no sense to enable that feature without packed math.
This also reduces performance with Wolfeinstein Youngblood if
fp16 is forced enabled on GFX8, while it's similar on GFX9.
We might re-introduce that feature in the future with ACO support
if it ends up being faster and correct.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4453>
This replaces emit_vertex with:
if (vertex_count < max_vertices) {
emit_vertex_with_counter vertex_count ...
vertex_count += 1
}
Which is exactly what NIR->LLVM was doing but at NIR level. This
pass is already called by ACO.
pipeline-db changes on GFX10:
Totals from affected shaders:
SGPRS: 1952 -> 1912 (-2.05 %)
VGPRS: 2112 -> 2044 (-3.22 %)
Code Size: 189368 -> 185620 (-1.98 %) bytes
Max Waves: 494 -> 491 (-0.61 %)
No pipeline-db changes on other generations.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4182>
The goal of this function was to return whether a depth-stencil image
has HTILE, in comparison to radv_layout_is_htile_compressed() which
is used to know whether a depth-stencil image has HTILE compressed.
These two functions are actually similar and they have never been
used for what they were supposed to. Remove radv_layout_has_htile()
in favour of radv_layout_is_htile_compressed() for now. If it's
needed in the future, I will re-introduce this concept properly.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4389>
This re-enables the change made in 2f5783bc2b which was
incorrectly disabled by 3e1dd99adc.
For radeonsi, we will prefer the NIR pass as it'll generate better code
(some index calculation and a single load vs. a load, then index
calculation, then another load) and oftentimes NIR optimization can kick
in and make all the access indices constant.
Fixes: 3e1dd99adc ("radeonsi: Remove a bunch of default handling of pipe caps.")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4474>
Fix build error after llvm-11 commit 3a29393b4709 ("Remove
math.h/cmath include from DataTypes.h").
src/gallium/auxiliary/gallivm/lp_bld_format_srgb.c: In function ‘lp_build_linear_to_srgb’:
src/gallium/auxiliary/gallivm/lp_bld_format_srgb.c:194:44: error: implicit declaration of function ‘powf’ [-Werror=implicit-function-declaration]
194 | exp2f_c * powf(coeff_f, 1.0f / exp_f));
| ^~~~
src/gallium/auxiliary/gallivm/lp_bld_format_srgb.c:194:44: warning: incompatible implicit declaration of built-in function ‘powf’
src/gallium/auxiliary/gallivm/lp_bld_format_srgb.c:78:1: note: include ‘<math.h>’ or provide a declaration of ‘powf’
77 | #include "lp_bld_format.h"
+++ |+#include <math.h>
78 |
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4473>
Found from piglit fbo-generatemipmaps failures, then tracked down with the
texturator test. The piece that really revealed things was finding that
1024x1 linear RGBA8 on the older blob drivers would have a pitch of 5120
instead of 4096, and the following levels minified that pitch.
Fixes ~124 piglit tests (~8.5% of piglit failures) on cheza.
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3987>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3987>
Trying to work out texture layout by remembering what things looked like
in texturator is hard. Instead, let's use texture layouts from tracing
the blob as a source of truth to make sure that we pick the same layouts
they do (and don't break known-good ones). More testcases will be added
as I fix layout bugs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3987>
MR pipelines not triggered by Marge Bot can still be triggered manually.
Motivation: The main & forked Mesa project CI pipelines combined are
currently generating over 1 TB of egress traffic per week. ~80% of this
is from pre-merge pipelines. Assuming this corresponds to 4 pre-merge
and one post-merge pipeline per MR on average, this change could
potentially eliminate up to ~60% of the overall traffic (by preventing
3 of the 4 pre-merge pipelines from running automatically).
(Of course, this could be subverted if all jobs of the other pipelines
were triggered manually anyway... In most cases, manually triggering
just a few jobs should suffice)
v2:
* $GITLAB_USER_NAME was the wrong variable, $GITLAB_USER_LOGIN should
do the trick.
Suggested-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4432>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4432>
Identify if view_index is used only for position calculation, and use
Primitive Replication to implement Multiview in Gen12. This feature
allows storing per-view position information in a single execution of
the shader, treating position as an array.
The shader is transformed by adding a for-loop around it, that have an
iteration per active view (in the view_mask). Stores to the position
now store into the position array for the current index in the loop,
and load_view_index() will return the view index corresponding to the
current index in the loop.
The feature is controlled by setting the environment variable
ANV_PRIMITIVE_REPLICATION_MAX_VIEWS, which defaults to 2 if unset.
For pipelines with view counts larger than that, the regular
instancing will be used instead of Primitive Replication. To disable
it completely set the variable to 0.
v2: Don't assume position is set in vertex shader; remove only stores
for position; don't apply optimizations since other passes will
do; clone shader body without extract/reinsert; don't use
last_block (potentially stale). (Jason)
Fix view_index immediate to contain the view index, not its order.
Check for maximum number of views supported.
Add guard for gen12.
v3: Clone the entire shader function and change it before reinsert;
disable optimization when shader has memory writes. (Jason)
Use a single environment variable with _DEBUG on the name.
v4: Change to use new nir_deref_instr.
When removing stores, look for mode nir_var_shader_out instead
of the walking the list of outputs.
Ensure unused derefs are removed in the non-position part of the
shader.
Remove dead control flow when identifying if can use or not
primitive replication.
v5: Consider all the active shaders (including fragment) when deciding
that Primitive Replication can be used.
Change environment variable to ANV_PRIMITIVE_REPLICATION.
Squash the emission of 3DSTATE_PRIMITIVE_REPLICATION into this patch.
Disable Prim Rep in blorp_exec_3d.
v6: Use a loop around the shader, instead of manually unrolling, since
the regular unroll pass will kick in.
Document that we don't expect to see copy_deref or load_deref
involving the position variable.
Recover use_primitive_replication value when loading pipeline from
the cache.
Set VARYING_SLOT_LAYER to 0 in the shader. Earlier versions were
relying on ForceZeroRTAIndexEnable but that might not be
sufficient.
Disable Prim Rep in cmd_buffer_so_memcpy.
v7: Don't use Primitive Replication if position is not set, fallback
to instancing; change environment variable to be
ANV_PRIMITVE_REPLICATION_MAX_VIEWS and default it to 2 based on
experiments.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2313>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2313>
Change brw_compute_vue_map() to also take the number of pos slots. If
more than one slot is used, the VARYING_SLOT_POS is treated as an
array.
When using Primitive Replication, instead of a single position, the
VUE must contain an array of positions. Padding might be
necessary (after clip distance) to ensure rest of attributes start
aligned.
v2: Add note about array in the commit message and assert that
pos_slots >= 1 to make clear 0 is invalid. (Jason)
Move padding to be after the clip distance.
v3: Apply the correct offset when gathering the sources from outputs.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> [v2]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2313>
If a nir_variable is tagged with per_view, it must be an array with
size corresponding to the number of views. For slot-tracking, it is
considered to take just the slot for a single element -- drivers will
take care of expanding this appropriately.
This will be used to implement the ability of having per-view position
in a vertex shader in Intel platforms.
Acked-by: Rafael Antognolli <rafael.antognolli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2313>
Geometry shaders support an invocations parameter up to a limit
defined by maxGeometryShaderInvocations. This was set to 127, but
executing with invocations > 32 causes a crash. As it turns out, the
blob only advertises a max of 32 invocations, so we set that in
turnip as well.
Fixes dEQP-VK.geometry.instanced.draw_*_instances_{127, 64}_geometry_invocations
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4436>
One of the features of geometry shaders is the ability to render to
different layers by assigning to the gl_Layer (Layer in SPIR-V)
builtin.
While have already plumbed the layer regid to the geometry shader,
we also need to GRAS_LAYER_CNTL to actually use layered rendering.
In addition, gmem does not support layered rendering, so we need to
force sysmem.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4436>
This commit updates VFD_CONTROL to use the GS header and primitive
ID sysvals if a geometry shader stage is present in the pipeline.
Like in the case of VPC, the code here is adapted from fd6_program.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4436>
This commit updates tu6_emit_vpc to selectively emit GS-specifc
configuration. Most of this is repurposed from fd6_program.c.
This also refactors `link_geometry_stages` to ir3_nir_lower_tess.c
so it can be shared between fd and tu.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4436>
Like with other shader types, we need to emit the geometry shader
object and the consts it uses. In addition, we need to emit
additional geometry-specific consts that link primitive/vertex stride
between the vs and gs. In conjunction with the gsheader, these are
used by the vs to determine where to stlw outputs and used by the gs
to determine where to ldlw those outputs from.
FD emits these consts in the draw call because in GL, you can mix
and match shaders in different programs. In Vulkan, however, we
compile and link the shaders at pipeline creation, so we can emit
these in the pipeline IB instead.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4436>
The ir3 compiler only lowers the VS and GS for geometry shading if
the corresponding has_gs key is set in the shader key. Without it,
GS-specific intrinsics like load_per_vertex_input won't get lowered
and the GS header will be initialized with invalid values.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4436>
When NGG is used, vertex and tess eval shaders are executed on the
hardware NGG geometry stage. There is a series of steps they
must perform:
* Request GS space using GS_ALLOC_REQ
* Export the primitive
* Finally, export the normal VS outputs
In this commit, two modes are implemented:
* "late" which matches what the RADV LLVM backend currently does
* "early" which is an optimized version as seen in radeonsi
Vulkan doesn't allow the shader to write the edge flags, so we can
currently always use the "early" mode.
Exporting the primitive ID is also supported by having the GS threads
write that into LDS and reading them from LDS in the ES threads.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3576>
We want to execute instructions after s_setprio in the given
priority, so we must prevent the scheduler from scheduling beyond
s_setprio, otherwise some instructions could be executed in a
different priority.
Rename hazard_fail_memtime to hazard_fail_unreorderable and include
s_setprio in the list of unreorderable opcodes.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3576>
Fixes a building error due to compiler/aco_statistics.cpp
missing in src/amd/Makefile.sources
Fixes: b1544352 ("aco: add various compiler statistics")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
We might end in cases where etna_acc_get_query_result(..) gets called
within one draw call (aka before flushing). At this point the status
of the resource was not set but gets used in etna_acc_get_query_result(..)
to handle different wait cases. Fix this issue by calling resource_written(..)
explicitly.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1530>
Simplify the interface a sampler provider needs to implement. The start(..)
and stop(..) functions got called by resume(..) and suspend(..) so lets
get rid of start(..) and stop(..). Also the way we count and use samples
is much easier to follow now.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1530>
The name hw queries was choosen as occlusion queries are 'feeling'
like nothing special. It is possible to interact with them only
via the command stream - unlike perfom queries where some kernel
magic is needed.
Accumulated HW queries is a much better name for this type of queries.
We read some hardware values over some draw calls and need to accumulate
them to get the final result.
This is some prep work for the following perfmon changes.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1530>
We don't really track these as the ir is transformed, but it would be a
useful thing for some passes to have. So add a pass to collect this
information. It uses instr->data (generic per-pass ptr), with the
hashsets hanging under a mem_ctx for easy disposal at the end of the
pass.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4423>
Don't fold conversions into array (incl phi lowered to regs/array).
These aren't SSA. Avoids crashes in particular in frag shaders with
flow control, which would leave a dangling array write disconnect from
the original cov src.
Possibly this could be slightly relaxed, if there is no other consumer
of the src, and it were in the same block. But it would require
updating block->keeps, and taking care of barrier state. Which isn't a
thing the cf pass does currently.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4423>
This commit completely rewrites the way we extract a structured CFG from
SPIR-V. The new approach is different in a few ways:
1. It does a breadth-first search instead of depth-first. This means
that we've visited the merge node for a construct before we visit
any of the nodes inside the construct. This makes it easier to
validate things like loop and switch nesting.
2. We record more information in the CFG. Earlier commits added a
parent pointer to vtn_cf_node but we now record all of the merge and
other special blocks for each CFG node. This lets us validate
things more precisely.
3. It makes heavy use of merge blocks for walking the CFG. Previously,
we sort of used them as hints for trying to guess the CFG structure
but things got dicey whenever a merge was missing. We had some
heuristics for how to handle short-circuiting if statements but it
was a bunch of special cases.
Now, we make them a fundamental part of walking the CFG. When we
encounter a control-flow construct, we add the body components of
the construct to the BFS work list and then jump to the merge block
if one exists to continue scanning the current CFG nesting level.
If no merge block exists, we assume that means that control-flow
never re-converges in a normal way and that the only way to get back
to normality is with a direct jump such as a loop break or continue.
This should make things far more robust when trying to deal with the
more creative placement (or lack thereof) of merge instructions.
Reviewed-by: Alan Baker <alanbaker@google.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3820>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3820>
This commit enables the I/O vectorization pass that was originally
written for ACO for Intel drivers. We enable it for UBOs, SSBOs, global
memory, and SLM. We only enable vectorization for the scalar back-end
because it vec4 makes certain alignment assumptions.
Shader-db results with iris on ICL:
total instructions in shared programs: 16077927 -> 16068236 (-0.06%)
instructions in affected programs: 199839 -> 190148 (-4.85%)
helped: 324
HURT: 0
helped stats (abs) min: 2 max: 458 x̄: 29.91 x̃: 4
helped stats (rel) min: 0.11% max: 38.94% x̄: 4.32% x̃: 1.64%
95% mean confidence interval for instructions value: -37.02 -22.80
95% mean confidence interval for instructions %-change: -5.07% -3.58%
Instructions are helped.
total cycles in shared programs: 336806135 -> 336151501 (-0.19%)
cycles in affected programs: 16009735 -> 15355101 (-4.09%)
helped: 458
HURT: 154
helped stats (abs) min: 1 max: 77812 x̄: 1542.50 x̃: 75
helped stats (rel) min: <.01% max: 34.46% x̄: 5.16% x̃: 2.01%
HURT stats (abs) min: 1 max: 22800 x̄: 336.55 x̃: 20
HURT stats (rel) min: <.01% max: 17.11% x̄: 2.12% x̃: 1.00%
95% mean confidence interval for cycles value: -1596.83 -542.49
95% mean confidence interval for cycles %-change: -3.83% -2.82%
Cycles are helped.
total sends in shared programs: 814177 -> 809049 (-0.63%)
sends in affected programs: 15422 -> 10294 (-33.25%)
helped: 324
HURT: 0
helped stats (abs) min: 1 max: 256 x̄: 15.83 x̃: 2
helped stats (rel) min: 1.33% max: 67.90% x̄: 21.21% x̃: 15.38%
95% mean confidence interval for sends value: -19.67 -11.98
95% mean confidence interval for sends %-change: -23.03% -19.39%
Sends are helped.
LOST: 7
GAINED: 2
Most of the helped shaders were in the following titles:
- Doom
- Deus Ex: Mankind Divided
- Aztec Ruins
- Shadow of Mordor
- DiRT Showdown
- Tomb Raider (Rise, I think)
Five of the lost programs are SIMD16 shaders we lost from dirt showdown.
The other two are compute shaders in Aztec Ruins which switched from
SIMD8 to SIMD16.
Vulkan pipeline-db stats on ICL:
Instructions in all programs: 296780486 -> 293493363 (-1.1%)
Loops in all programs: 149669 -> 149669 (+0.0%)
Cycles in all programs: 90999206722 -> 88513844563 (-2.7%)
Spills in all programs: 1710217 -> 1730691 (+1.2%)
Fills in all programs: 1931235 -> 1958138 (+1.4%)
By far the most help was in the Tomb Raider games. A couple of Batman
games with DXVK were also helped. In Shadow of the Tomb Raider:
Instructions in all programs: 41614336 -> 39408023 (-5.3%)
Loops in all programs: 32200 -> 32200 (+0.0%)
Cycles in all programs: 1875498485 -> 1667034831 (-11.1%)
Spills in all programs: 196307 -> 214945 (+9.5%)
Fills in all programs: 282736 -> 307113 (+8.6%)
Benchmarks of real games I've done on this patch:
- Rise of the Tomb Raider: +3%
- Shadow of the Tomb Raider: +10%
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4367>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4367>
Thanks to the NIR vectorizing pass, we're about to see alignments that
are higher than the bit size. Previously, we could use either and we
just happened to choose alignment (probably the wrong choice) so it's
harmless to switch to detecting based on bit size. This commit changes
things to take both into account which is more accurate to what the
messages we're using do. We also beef up the asserts and make them more
consistent, more accurate, and more complete.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4367>
Immediate offsets are currently collapsed for ldlw, but ldlw does
behave correctly with immediate values. For example,
`ldlw.u32 r0.x, l[4], 1` actually means to use the value of
regid 4 (r1.x) as the offset when we actually want it to use the
imm value of 4 as the offset.
This commit disables copy prop for ldlw offsets so the same
intrinsic gets compiled to:
mov.u32u32 r0.y, 0x00000004
ldlw.u32 r0.x, l[r0.y], 1
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4439>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4439>
Previously, turnip advertised 4-bit subpixel precision when in
practice, a6xx seems to render with 8-bit precision. This caused
dEQP-VK.renderpass2.suballocation.subpass_dependencies.late_fragment_tests.*
to fail because they compare images rendered with turnip against
ones rendered via a software reference implementation parameterized
by turnip's VkPhysicalDeviceLimits.subPixelPrecisionBits value.
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4172>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4172>
v_mad and v_madak are both 64-bit instructions, so it doesn't
increase code size to always apply a 32-bit literal instead of
using v_mad and a sgpr which contains that literal.
Found with some Youngblood shaders but help some other games.
vkpipeline-db (VEGA10):
Totals from affected shaders:
SGPRS: 46168 -> 46016 (-0.33 %)
VGPRS: 45576 -> 45564 (-0.03 %)
Code Size: 5187208 -> 5179584 (-0.15 %) bytes
Max Waves: 3297 -> 3297 (0.00 %)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4410>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4410>
The use of vector.end()[-1] seems to generate warnings in Coverity about
not allowing a negative argument to a parameter. The intention with the
code snippet is just to access the last element of the vector. The
vector.back() call acheives the same thing, is clearer and will
hopefully fix the Coverity warning.
I’m not exactly sure why Coverity thinks the array index can’t be
negative. cplusplus.com says that vector::end() returns a random access
iterator and that the type of the array index operator argument to that
should be the difference type for the container. It then also says that
difference_type for a vector is "a signed integral type".
Reviewed-by: Eric Anholt <eric@anholt.net>
Depending on user's vdpau headers, not all of those defines may exist.
Eventually we may want a private copy of these, but this is simple
enough for now.
Fixes asserts when running vdpauinfo which supports these recently added
formats.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4108>
The algorithm we use for resolving parallel copy instructions plays this
little shell game with the values. The reason for this is that it lets
us handle cases where, for instance we have a -> b and b -> a and we
need to use a temporary to do a swap. One result of this algorithm is
that it tends to emit a lot of mov chains which are typcially really bad
for GPUs where a mov is far from free. For instance, it's likely to
turn this:
r16 = ssa_0; r17 = ssa_0; r18 = ssa_0; r15 = ssa_0
into this:
r15 = mov ssa_0
r18 = mov r15
r17 = mov r18
r16 = mov r17
which, if it's the only thing in a block (this is common for phis) is
impossible for a scheduler to fix because of the dependencies and you
end up with significant stalling. If, on the other hand, we only do the
chaining in the actual case where we need to free up a so that it can be
used as a destination, we can emit this:
r15 = mov ssa_0
r18 = mov ssa_0
r17 = mov ssa_0
r16 = mov ssa_0
which is far nicer to the scheduler. On Intel, our copy propagation
pass will undo the chain for us so this has no shader-db impact.
However, for less intelligent back-ends, it's probably a lot better.
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4412>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4412>
It turns out that every *_TS event, i.e. every event which requires a
seqno pointer, also allows generating an interrupt in the kernel, at
least since a3xx. And furthermore these interrupts are named by the kgsl
kernel driver and already in envytools. Therefore it's possible to map
out what the *_TS events are with 100% certainty, given access to the
hardware, by sending a CP_EVENT_WRITE with bit 31 set, unmasking all
interrupts in the kernel, and logging which ones get hit. I've done this
for a6xx, and I've also looked at the a5xx firmware, and the list of TS
interrupts is the same as a6xx, so I have a pretty good idea of what the
a5xx events are. I also fixed a few related things along the way:
- VIZQUERY_END overlaps with WT_DONE_TS, but VIZQUERY_START was also a
mess, with neither VIZQUERY_START nor HLSQ_FLUSH using variants. I added
what seems like reasonable variants, based on the existing comment
and the fact that HLSQ_FLUSH is only used in Mesa with a3xx and a4xx.
- CACHE_FLUSH_AND_INVALIDATE seems to come straight from R600, and I
have no idea if it's actually valid with a2xx, but given that RB_DONE_TS
exists in the interrupt mask since a3xx, I guessed that RB_DONE_TS
hasn't changed position since then and put it down as a3xx+ and limited
CACHE_FLUSH_AND_INVALIDATE to a2xx. Someone with the relevant hardware
should be able to confirm.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4065>
Our shared runners are set up for concurrent jobs ~= CPUs / 4 (x86) or 8
(ARM). If you use more build processes than that, then jobs may be
fighting each other for shared system resources, possibly to the point of
failure (we've seen one of the runners OOM on some jobs before, though I'm
not sure if this was the cause).
To try to systematically prevent the problem, we make a ninja wrapper in
the containers that passes the -j flags, and set MAKEFLAGS in the
container builds. This doesn't cover make in non-container builds, but I
believe we don't have any of those.
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3782>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3782>
Currently that's the hard-coded maximum in the kernel, even though the
libdrm API allows for more. Latter is done with extendability in mind.
Allocate 64 pointers^Wdevices on stack for now. Making for shorter and
ever-so-slightly faster code.
v2: Use single MAX_DRM_DEVICES #define (Eric)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Adam Jackson <ajax@redhat.com> (v1)
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4084>
This reverts commit 1b87f4058d.
dlclose() of the handle is perfectly reasonable, a follow-up NULL
assignment is missing.
As-is this causes a leak for nearly every platform, since they call
dri2_load_driver* initially, followed by a second swrast fallback call.
Some platforms even loop through the existing drivers probing.
Revert the commit and add the NULL check.
Fixes: 1b87f4058d ("egl/dri2: Don't dlclose() the driver on dri2_load_driver_common failure")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4084>
With earlier commit we've added a generic LIBGL_ALWAYS_SOFTWARE handling
yet did not consider that the existing codebase unconditionally errors
out when set. That was fixed with a latter commit, while the fix itself
added erroneous restriction for egl/drm.
As mentioned in the report - the feature was working for ages. It was a
Gnome developer who added kms_swrast support for gbm in the first place.
Admittedly kms_swrast is somewhat in the middle between traditional
swrast and HW drivers, regardless - reinstate support.
Fixes: 47273d7312 ("egl: set UseFallback if LIBGL_ALWAYS_SOFTWARE is set")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/165
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Acked-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4084>
This is definitely not the first time I've debugged why I'm getting swrast
on a device only to find out I'm not a member of the render node's group.
This does mean that you'll get a warning print even without EGL_LOG_LEVEL
set. This may be an issue if we expect people outside of the DRI node's
group to actually be using swrast instead of getting their permissions
fixed. Right now surfaceless throws a "libEGL warning: No hardware driver
found, falling back to software rendering" in that case anyway, so this is
just more informative.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3703>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3703>
The kernel driver requires immediate notification using a
BindGBSurface command when a graphics coherent memory resource changes
backing MOB, so that it can start dirty-tracking the new MOB.
Since we always use graphics coherent memory for persistent memory, enqueue
and flush a BindGBSurface commmand at map time rather than at unmap time.
Since we're dealing with persistent memory, It's OK to flush while mapped.
This fixes an issue with gnome-shell / Wayland which uses persistent
memory together with discard maps when we advertise ARB_buffer_storage.
XWayland clients will render incorrectly.
Fixes: 71b43490dd ("svga: Support ARB_buffer_storage")
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4399>
We need to lower them to something reasonable (ideally, the modifier
would be attached but we need to do something for the case it's not). We
specifically have to lower pre-sched as well, but we can do the lower
literally at schedule time for now (if this proves annoying, we can move
it earlier, but I want to leave room for modifier-aware copyprop should
that prove interesting).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4396>
If the expression tree that is being replaced has a unary operation at
its root, set the cursor (location where new instructions are inserted)
at the source instruction instead.
This doesn't do much now because there are very few patterns that have a
unary operation as the root. Almost all of the patterns that do have a
unary operation as the root have inot. All of the shaders that are
affected by this commit have expression trees with an inot at the root.
This change prevents some significant, spurious caused by the next
commit. There is further explanation in the large comment added in
the code.
I also considered a couple other options that may still be worth exploring.
1. Add some mark-up to the search pattern to denote where new
instructions should be added. I considered using "@" to denote the
cursor location. For example,
(('fneg', ('fadd@', a, b)), ...)
2. To prevent other kinds of unintended code motion, add the ability to
name expressions in the search pattern so that they can be reused in
the replacement. For example,
(('bcsel', ('ige', ('find_lsb=b', a), 0), ('find_lsb', a), -1), b),
An alternative would be to add some kind of CSE at the time of
inserting the replacements. Create a new instruction, then check to
see if it already exists. That option might be better overall.
Over the years I know Matt has heard me complain, "I added a pattern
that just deleted an instruction, but it added a bunch of spills!" This
was always in large, complex shaders that are very hard to analyze. I
always blamed these cases on the scheduler being dumb. I am now very
suspicious that unintended code motion was the real problem.
All Gen4+ Intel platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 17611405 -> 17611333 (<.01%)
instructions in affected programs: 18613 -> 18541 (-0.39%)
helped: 41
HURT: 13
helped stats (abs) min: 1 max: 18 x̄: 4.46 x̃: 4
helped stats (rel) min: 0.27% max: 5.68% x̄: 1.29% x̃: 1.34%
HURT stats (abs) min: 1 max: 20 x̄: 8.54 x̃: 7
HURT stats (rel) min: 0.30% max: 4.20% x̄: 2.15% x̃: 2.38%
95% mean confidence interval for instructions value: -3.29 0.63
95% mean confidence interval for instructions %-change: -0.95% 0.02%
Inconclusive result (value mean confidence interval includes 0).
total cycles in shared programs: 338366118 -> 338365223 (<.01%)
cycles in affected programs: 257889 -> 256994 (-0.35%)
helped: 42
HURT: 15
helped stats (abs) min: 2 max: 120 x̄: 39.38 x̃: 34
helped stats (rel) min: 0.04% max: 2.55% x̄: 0.86% x̃: 0.76%
HURT stats (abs) min: 6 max: 204 x̄: 50.60 x̃: 34
HURT stats (rel) min: 0.11% max: 4.75% x̄: 1.12% x̃: 0.56%
95% mean confidence interval for cycles value: -30.39 -1.02
95% mean confidence interval for cycles %-change: -0.66% -0.02%
Cycles are helped.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1359>
This change incurs a small amount of hurt now, but it enables a lot of
benefit on vec4 shaders on the next commit. nir_opt_algebraic_late
converts dph, dot3, etc. to dhp_replicated, dot_replicated3, etc. In
the process, it introduces extra moves. If the original NIR contained
vec1 32 ssa_45 = fdot4 ssa_51, ssa_44
vec1 32 ssa_46 = fneg ssa_45
nir_opt_algebraic_late will produce
vec4 32 ssa_18 = fdot_replicated4 ssa_1, ssa_15
vec1 32 ssa_19 = mov ssa_18.x
vec1 32 ssa_17 = fneg ssa_19
The algebraic pass added in the next commit can't see through the move
to know that the fneg applies to a fdot_replicated4.
Haswell, Ivy Bridge, and Sandybridge had similar results. (Haswell shown)
total cycles in shared programs: 187077604 -> 187079858 (<.01%)
cycles in affected programs: 350132 -> 352386 (0.64%)
helped: 174
HURT: 194
helped stats (abs) min: 2 max: 124 x̄: 23.60 x̃: 16
helped stats (rel) min: 0.12% max: 15.88% x̄: 4.98% x̃: 3.86%
HURT stats (abs) min: 2 max: 164 x̄: 32.78 x̃: 16
HURT stats (rel) min: 0.17% max: 22.82% x̄: 6.46% x̃: 0.86%
95% mean confidence interval for cycles value: 2.04 10.21
95% mean confidence interval for cycles %-change: 0.17% 1.93%
Cycles are HURT.
No shader-db changes on any other Intel platform.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1359>
Tell users what the base address of the image needs to be aligned to.
These values are based on experimentation via passing an offset to
vkBindImageMemory with turnip and seeing if tests still pass. Note that
r8g8 is also special in this regard, however it actually has an
increased alignment (in bytes).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4357>
When we originally wrote a bunch of the allocation data structures, we
re-used the GPU memory for CPU-side data structures. It's a bit more
memory efficient and usually ok. However, this has a couple of
problems:
1. It makes it MUCH more likely that the GPU will accidentlly stomp
CPU-side data structures and cause nearly impossible to debug
crashes.
2. With discrete GPUs, the memory will be mapped somehow and that map
may be across the BAR so it could have horribly slow CPU access.
This is bad for our CPU-side data structures.
In the case of anv_state_stream, it also made the data structure
massively more complex than it needed to be.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4336>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4336>
If we have an allocation that's exactly the block size, we end up
computing a new block size to allocate that's exactly the block size,
add in the header, and then assert fail. When computing the block size,
we need to account for the header.
Fixes: 955127db93 "anv/allocator: Add support for large stream..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4336>
With EGL_EXT_image_dma_buf_import, user can import dma_buf
with offset.
This is also used by AOSP GLConsumer::updateTexImage
with HAL_PIXEL_FORMAT_YV12 buffer which store YUV planes in
the same buffer with offset. Render sample from it using
GL_OES_EGL_image_external. This should fix some video
display problem when using MediaCodec soft decoding which
generates HAL_PIXEL_FORMAT_YV12 buffer and render it on
screen.
Test program:
https://github.com/yuq/gfx/tree/master/yuv2rgb/dma-buf
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4362>
Now that its Gallium dependencies have been resolved, we can move this
all out to root. The only nontrivial change here is keeping the
pandecode calls in Gallium-panfrost to avoid creating a circular
dependency between encoder/decoder. This could be solved with a third
drm folder but this seems less intrusive for now and Roman would
probably appreciate if I went longer than 8 hours without breaking the
Android build.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>
'fraghalf' is unused (superceeded by actually lowering output based on
the precision information in nir). And glsl140 support in ir3 is long
past the experimental stage, so the glsl120 option is no longer needed.
So remove them and free up some bits for new things.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4366>
In 87839680c0, a very subtle mistake was made with the CFG walking
recursion. Instead of setting the local has_nested_loop variable when
process child loops, has_nested_loop_out was passed directly into the
process_loop_in_block call. This broke nested loop detection heuristics
and caused loop unrolling to run massively out of control. In
particular, it makes the following CTS test compile virtually forever:
dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.struct_mixed_types.uniform_buffer_block_geom
Fixes: 87839680c0 "nir: Fix breakage of foreach_list_typed_safe..."
Closes: #2710
Reviewed-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4380>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4380>
In trying to track down the new failure in #2670, I found that I could get
the flaky test set down to 4 tests, and dropping any remaining test
wouldn't trigger the failure (a bad 8x4 block in the middle of
dEQP-GLES3.functional.fbo.msaa.4_samples.r16f's render target). Disabling
gmem or bypass didn't help, and adding lots of CCU flushing didn't help.
What did help was disabling blitting, or this memset to initialize the
UBWC area after we (presumably) pull a BO out of the BO cache. My guess
is that the 2D blitter can't handle some rare set of state in the flags
buffer and emits some garbage.
I've run 8 gles3 and 7 gles31 runs with this branch now so hopefully I've got the4 right set of flakes marked for removal.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2670
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4290>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4290>
The batch might not have stage == FD_STAGE_BLIT set because
fd_blitter_pipe_begin was sticking the stage on some random batch (or none
at all) rather than the one that would be used in the meta operation.
What we actually wanted to be looking at was set_active_query_state(),
which is already called by util_blitter and whose state we just needed to
track.
Fixes piglit occlusion_query_meta_no_fragments. I haven't changed
query_hw.c's stage handling to clean the rest up because I don't have a
db410c/db820c at home to iterate over the piglit tests.
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4356>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4356>
We were returning the same kind of result as time_elapsed (an end - start
time in ns), which on a timestamp query is approximately zero since
begin/end are at the same point in time. What we're supposed to return is
a converted-to-ns timestamp based on the GPU clock. Remove the _pause()
function for time_elapsed to reduce the command stream overhead, and just
capture start (which is, unfortunately, going to happen on each tile and
thus the final start value we ready will be the last tile of the frame,
not the first).
Fixes piglit spec/arb_timer_query/query gl_timestamp
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4356>
When we switch batches and start a new draw, we need to cap the queries in
the previous batch and start queries again in the new one.
FD_STAGE_NULL got renamed to 0 so that it would naturally return
!is_active and end the queries at the end of the batch.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4356>
By inserting a b2b1 around the load_ubo, load_input, etc. intrinsics
generated by nir_lower_io, we can ensure that the intrinsic has the
correct destination bit size. Not having the right size can mess up
passes which try to optimize access. In particular, it was causing
brw_nir_analyze_ubo_ranges to ignore load_ubo of booleans which meant
that booleans uniforms weren't getting pushed as push constants. I
don't think this is an actual functional bug anywhere hence no CC to
stable but it may improve perf somewhere.
Shader-db results on ICL with iris:
total instructions in shared programs: 16076707 -> 16075246 (<.01%)
instructions in affected programs: 129034 -> 127573 (-1.13%)
helped: 487
HURT: 0
helped stats (abs) min: 3 max: 3 x̄: 3.00 x̃: 3
helped stats (rel) min: 0.45% max: 3.00% x̄: 1.33% x̃: 1.36%
95% mean confidence interval for instructions value: -3.00 -3.00
95% mean confidence interval for instructions %-change: -1.37% -1.29%
Instructions are helped.
total cycles in shared programs: 338015639 -> 337983311 (<.01%)
cycles in affected programs: 971986 -> 939658 (-3.33%)
helped: 362
HURT: 110
helped stats (abs) min: 1 max: 1664 x̄: 97.37 x̃: 43
helped stats (rel) min: 0.03% max: 36.22% x̄: 5.58% x̃: 2.60%
HURT stats (abs) min: 1 max: 554 x̄: 26.55 x̃: 18
HURT stats (rel) min: 0.03% max: 10.99% x̄: 1.04% x̃: 0.96%
95% mean confidence interval for cycles value: -79.97 -57.01
95% mean confidence interval for cycles %-change: -4.60% -3.47%
Cycles are helped.
total sends in shared programs: 815037 -> 814550 (-0.06%)
sends in affected programs: 5701 -> 5214 (-8.54%)
helped: 487
HURT: 0
LOST: 2
GAINED: 0
The two lost programs were SIMD16 shaders in CS:GO. However, CS:GO was
also one of the most helped programs where it shaves sends off of 134
programs. This seems to reduce GPU core clocks by about 4% on the first
1000 frames of the PTS benchmark.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>
The implementations here just clone i2b32 and i2b1. This means that
b2b32 doesn't technically generate true NIR 0/-1 booleans but it should
be fine as it's only ever generated for shared variable writes which
will always be consumed by something which will then run it through an
i2b again.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>
These exist to convert between different types of boolean values. In
particular, we want to use these for uniform and shared memory
operations where we need to convert to a reasonably sized boolean but we
don't care what its format is so we don't want to make the back-end
insert an actual i2b/b2i. In the case of uniforms, Mesa can tweak the
format of the uniform boolean to whatever the driver wants. In the case
of shared, every value in a shared variable comes from the shader so
it's already in the right boolean format.
The new boolean conversion opcodes get replaced with mov in
lower_bool_to_int/float32 so the back-end will hopefully never see them.
However, while we're in the middle of optimizing our NIR, they let us
have sensible load_uniform/ubo intrinsics and also have the bit size
conversion.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>
This prunes out all targets except libgl-gdi, libgl-xlib, and svga, as
suggested by Marek Olšák.
libgl-xlib will be remove once I have had time to confirm no automated
tests we have rely upon it.
There are also a bunch of Makefile.sources which become orphaned as
result, that are not taken care of in this change.
v2: Prune remainders of swr support.
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4348>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4348>
We know that in this case, the LS and HS invocations are working
on the exact same vertex, so it's safe to skip the LDS.
Totals:
VGPRS: 3960744 -> 3961844 (0.03 %)
Code Size: 254824300 -> 254764624 (-0.02 %) bytes
Max Waves: 1053748 -> 1053574 (-0.02 %)
Totals from affected shaders:
VGPRS: 26152 -> 27252 (4.21 %)
Code Size: 1496600 -> 1436924 (-3.99 %) bytes
Max Waves: 4860 -> 4686 (-3.58 %)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
Clear the workgroup size for all supported shader stages.
Also, unify the workgroup size calculation accross various places.
As a result, insert_waitcnt can use the proper workgroup size
which means that some waits can be dropped from tessellation
shaders. Also, in cases where the previous calculation was wrong,
we now insert s_barrier instructions.
Totals from affected shaders (GFX10):
Code Size: 340116 -> 338484 (-0.48 %) bytes
Fixes: a8d15ab6da
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
The following new fields are added to tess shader info:
* `tcs_cross_invocation_inputs_read`
* `tcs_cross_invocation_outputs_read`
These are I/O masks that are a subset of inputs_read and outputs_read
and they contain which per-vertex inputs and outputs are read
cross-invocation.
Additionall, the following new fields are added to shader_info:
* `inputs_read_indirectly`
* `outputs_accessed_indirectly`
* `patch_inputs_read_indirectly`
* `patch_outputs_accessed_indirectly`
These new fields can be used for optimizing TCS in a back-end compiler.
If you can be sure that the TCS doesn't use cross-invocation inputs
or outputs, you can choose a different strategy for storing VS and TCS
outputs. However, such optimizations might need to be disabled when
the inputs/outputs are accessed indirectly due to backend limitations,
so this information is also collected.
Example: RADV currently has to store all VS and TCS outputs in LDS, but
for shaders when only inputs and/or outputs belonging to the current
invocation ID are used, it could skip storing these in LDS entirely.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
It can be further optimized in the future, but
the new sequence already has a few advantages:
* Uses fewer instructions
* Uses even fewer instructions in wave32 mode
* Doesn't use the VALU at all
Totals from affected shaders (GFX10):
VGPRS: 43504 -> 43496 (-0.02 %)
Code Size: 2436000 -> 2423688 (-0.51 %) bytes
Max Waves: 8704 -> 8705 (0.01 %)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
When TCS has an equal number of input and output, it means that the
number of VS and TCS invocations (LS and HS) are the same; and that
the HS invocations operate on the same vertices as the LS.
When this is the case, this commit removes the else-if between
the merged VS and TCS halves, making it possible to schedule
and optimize the code accross the two halves.
Totals:
SGPRS: 5577367 -> 5581735 (0.08 %)
VGPRS: 3958592 -> 3960752 (0.05 %)
Code Size: 254867144 -> 254838244 (-0.01 %) bytes
Max Waves: 1053887 -> 1053747 (-0.01 %)
Totals from affected shaders:
SGPRS: 29032 -> 33400 (15.05 %)
VGPRS: 35664 -> 37824 (6.06 %)
Code Size: 1979028 -> 1950128 (-1.46 %) bytes
Max Waves: 7310 -> 7170 (-1.92 %)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
Previously the tess factors were loaded individually, but now they can
be loaded using a single LDS load instruction.
Note that the inner and outer tess factors are not yet combined.
Totals (GFX10):
Code Size: 254896008 -> 254879212 (-0.01 %) bytes
Totals from affected shaders (GFX10):
Code Size: 2028352 -> 2011556 (-0.83 %) bytes
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
Some copypasta may have stuck in the code.
This was left on false by mistake.
Totals (GFX10):
Code Size: 254939248 -> 254896008 (-0.02 %) bytes
Totals from affected shaders (GFX10):
VGPRS: 16196 -> 16212 (0.10 %)
Code Size: 1126332 -> 1083092 (-3.84 %) bytes
Max Waves: 2336 -> 2334 (-0.09 %)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
This allows the passes after isel to assume that the exports are
always correct, and also allows to schedule these null exports later.
Additionally, it ensures that the correct exec mask is used for
these exports.
Totals from affected shaders (GFX10):
SGPRS: 84224 -> 84344 (0.14 %)
VGPRS: 23088 -> 23076 (-0.05 %)
Code Size: 882892 -> 894368 (1.30 %) bytes
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
The .fdo.container-ifnot-exists template has been replaced by
.fdo.container-build.
We need to include "debian/" in FDO_REPO_SUFFIX for now, we can drop it
for individual images when their tags are bumped if we want.
Miscellaneous other goodies this gets us:
* The templates now add some labels to images which may be useful for
garbage collecting unused tags in the future.
* The templates now copy the current tag from the main project
registry to the forked project's if it already exists in the latter
but points to a different image hash. This will avoid false failures
(or passes) due to using the wrong image.
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4286>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4286>
Since we are re-assigning the scalars anyways in the second pass, assign
them to the highest free reg in the first pass (rather than lowest) to
allow packing vecN regs as low as possible.
Note this required some changes specifically for tex instructions with a
single component writemask that is not necessarily .x, as previously
these would get assigned in the first RA pass, and since they are still
scalar, we'd end up w/ some r47.* and other similarly way-to-high
assignments after the 2nd pass.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4272>
Using the output of the first pass isn't ideal, as it can bake in the
losses from fragmentation which the scalar pass is intended to fill in.
This gets worse when we start using "vectorish" instructions, due to
higher use of vecN values.
Instead, we can just use the outputs of the liveness analysis to get a
more accurate # of maximum live values at any point.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4272>
Decouple the messy logic of figuring out vreg names defined/used by an
instruction from the logic of what to do about it by introducing
iterators. There is still *some* array vs ssa special casing in
ra_block_compute_live_ranges(), but less than before. And this will
avoid introducing a second copy of the def/use logic in a following
patch which uses the liveranges to calculate the maximum # of live
values (which is the optimal target for max physical register window
to round-robin within).
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4272>
Account for the # of regs an instruction writes, and fix an off-by-one.
(We are about to replace this with calculating the register target using
the live-ranges, but in debugging that it was useful to assert() if it
chose a higher target.)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4272>
This way RA doesn't have to special case it in use/def accounting..
This gets rid of an extra level of split/collect, which shouldn't be
needed. And interferes with scheduler trying to put tex-prefetches
after inputs but before other instructions. (Otherwise it would have
to figure out which split/collects need to go before the tex-prefetch)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4272>
Adds a native build of Mesa using Meson with the Visual Studio 2019
toolchain on a Windows host.
Though Docker is supported on Windows, Docker-in-Docker is not possible,
nor are podman and skopeo available. We handle this by creating the
container from a shell-executor Windows machine, which gives us a native
PowerShell that we can execute Docker from. This attempts to do the same
copy-from-upstream-or-create-if-not-exists optimisation as the
ci-templates do for our Linux builds, albeit open-coded in PowerShell.
The Mesa build itself is executed inside a container, using Meson and
Ninja.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jose Fonseca <jfonseca@vmware.com>
Acked-by: Brian Paul <brianp@vmware.com>
Acked-by: Eric Engestrom <eric@engestrom.ch>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4304>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4304>
Windows provides MAX_PATH rather than PATH_MAX for the maximum allowable
path length. This is not a limit on the length of filename which can
exist on the filesystem, but a length on the length of path which can be
passed to Win32 API calls.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Fixes: f8f1413070 ("util/u_process: add util_get_process_exec_path")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4304>
Unlike other stages TCS outputs not read by the TES cannot always
be demoted to globals e.g. when they are read by other TCS
invocations.
We were not taking these outputs into account when packing which
could result in other outputs being assigned to the same location.
Here we make sure to gather information on these outputs and group
them together when packing.
This fixes rendering issues in QUBE 2 via Proton.
Closes: #2653
Fixes: 26aa460940 ("nir: rewrite varying component packing")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4328>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4328>
This has been reported to fix a hang in Shadow of Mordor on Gen12.
One of its compute shaders seems to cause an in-order exec_all
dependency to be merged into an out-of-order SET dependency slot,
which would prevent us from baking the SET dependency into the parent
instruction, leading to an assert failure in emit_inst_dependencies()
(Thanks to Rafael for noticing that). Prevent that by avoiding
combination of in-order dependencies whenever that would cause a SET
dependency to be demoted to a SYNC.NOP instruction.
Fixes: e14529ff32 "intel/fs/gen12: Workaround data coherency issues due to broken NoMask control flow."
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Currently, we store the kernel and ramdisk for each LAVA job in the
artifacts of the job that built them. Because artifacts are stored in
GCE and LAVA labs aren't, this causes a lot of egress with is expensive.
To avoid this, have runners download most of the data via the (cached)
container images once, and for each job upload the kernel and ramdisk to
a server outside GCE.
Right now we only have Collabora's runner with a local web server, so
jobs that go to Baylibre's lab have been disabled.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4295>
There's some files from the .gitlab-ci directory that are needed in the
test stage and that, because the Mesa repository isn't checked out in
that stage, need to be made available through other means.
Because those files are going to be needed in LAVA devices, place them
ino the tarball containing the built files so it's available to both
gitlab-ci runners and LAVA devices.
Before those files were passed in the artifacts of the Gitlab CI job,
but this commit places them into the built tarball so scripts later in
the pipeline don't need to account for this discrepancy.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4295>
This implementation was removed by 8b8af6d3 ("gallium/util: Switch
util_float_to_half to _mesa_float_to_half()'s impl.")
It was not actually broken, but _mesa_float_to_half() implements
round-to-nearest-even, whereas util_float_to_half() implemented
round-to-zero. So rename it appropriately.
GL actually never cares about rounding (except a broken piglit test),
however d3d10 very much does and requires RTZ for float to half
conversion. Moreover, apparently at least radeon gpus actually always
do RTZ when doing RT writes (and I'd suspect for shader image writes
as well). Hence it seems appropriate to hook up this rtz function to
the format instead. This will cause llvmpipe and softpipe to use rtz
rounding for clears with half float formats, and softpipe would use rtz
behavior for rt writes as well (llvmpipe has that hardcoded), not sure
if "real" hw drivers hit this function for much.
(For shader opcodes would still need to figure out what rounding to use
appropriately, but this is a question for another day.)
Note should probably unify with _mesa_float_to_float16_rtz. Unclear at
this point which one is better, so just restore previous function here.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4312>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4312>
At least GC880 (iMX6S), GC2000 (iMX6Q) blobs do not emit the
PE.ALPHA_COLOR_EXT0 and PE.ALPHA_COLOR_EXT1 into the command
stream. The GCnano (STM32MP1) is not affected by this change
either. This is because neither of these GPUs support the
half-float feature.
Emit PE.ALPHA_COLOR_EXT* in etnaviv only if half-float support
is present in the GPU. This fixes all of the currently failing
dEQPs in this group:
dEQP-GLES2.functional.fragment_ops.blend.*
Fixes: 76adf041f2 ("etnaviv: fix blend color on newer GPUs")
Signed-off-by: Marek Vasut <marex@denx.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4277>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4277>
Similarly to the previous cast; on 64-bit Windows, unsigned long is
32-bit, and casting a pointer to a non-matchin bit-width integer produce
warnings. So let's use uintpre_t for this purpose instead.
Reviewed-by: Brian Paul <brianp@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4297>
This code produces warnings, so let's fix that. The problem is that
casting a pointer to an integer of non-pointer-size triggers warnings on
MSVC, and on 64-bit Windows unsigned long is 32-bit large.
So let's instead use uintptr_t, which is exactly for these kinds of
things.
While we're at it, let's make the resulting index a plain "unsigned",
which is the type this originated from before we started with this
cast-dance.
Fixes: 1a66ead1c7 ("pipebuffer, winsys/svga: Add functionality to update pb_validate_entry flags")
Reviewed-by: Brian Paul <brianp@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4297>
When an ir_call is encountered that invokes a builtin, it will now try
to generate a lowered version of the builtin. This only happens if all
of the arguments to the function are lowerable. Previously the builtin
would be inlined before the lowering pass is invoked and then the
implementation would be lowered as a consequence of the pass. However
this causes problems if the builtin has multiple arguments and the
implementation has operations on only a few of the arguments before
combining it with the others. In that case the entire builtin should
only be lowered if all of the arguments are lower precision. The
previous approach would end up lowering only parts of the
implementation.
The lowered implementations are cached in a hash table in case they can
be reused.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3885>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3885>
Previously, the ir_call functions for builtin functions were replaced
with the inline implementation immediately after being added to the
instruction list. This patch replaces that with a separate pass that
lowers them after the conversion from AST to IR is complete. This will
be useful to be able to insert some handling for the precision lowering
pass before the inlining. This needs to happen because the precision
of the operations in the inlined implementation depends on the highest
precision of all of the arguments to the call.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3885>
If precision lowering happens at GLSL IR, loop_analysis at IR doesn't
work as expected since it can't handle things like:
"(expression bool < (expression float16_t f2fmp (var_ref ndx) ) (constant float16_t (1.000000)) )"
So we'd rather do this optimization at the NIR stage.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3885>
The pass lowers 1-bit booleans produced by NIR to the native bitsize
of the operations that produce them.
v2: change on lower_load_const_instr after upstream changes. Added
TODO2 to explain it, as it was not properly tested yet (see
already existing TODO) (Neil)
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3885>
This works by finding the first rvalue that it can lower using an
ir_rvalue_visitor. In that case it adds a conversion to float16
after each rvalue and a conversion back to float before storing
the assignment.
Also it uses a set to keep track of rvalues that have been
lowred already. The handle_rvalue method of the rvalue visitor doesn’t
provide any way to stop iteration. If we handle a value in
find_precision_visitor we want to be able to stop it from descending into
the lowered rvalue again.
Additionally this pass disallows converting nodes containing non-float.
The can_lower_rvalue function explicitly excludes any branches
that have non-float types except bools. This avoids the need to have
special handling for functions that convert to int or double.
Co-authored-by: Hyunjun Ko <zzoon@igalia.com>
v2. Adds lowering for texture samples
v3. Instead of checking whether each node can be lowered while walking the
tree, a separate tree walk is now done to check all of the nodes in a
single pass. The lowerable nodes are added to a set which is checked
during find_precision_visitor instead of calling can_lower_rvalue.
v4. Move the special case for temporaries to find_lowerable_rvalues. This
needs to be handled while checking for lowerable rvalues so that any
later dereferences of the variable will see the right precision.
v5. Add an override to visit ir_call instructions and apply the same
technique to override the precision of the temporary variable in the
same way as done for builtin temporaries and ir_assignment calls.
v6. Changes the pass so that it doesn’t need to lower an entire subtree in
order do perform a lowering. Instead, certain instructions can be
marked as being indepedent of their child instructions. For example,
this is the case with array dereferences. The precision of the array
index doesn’t have any bearing on whether things using the result of
the array deref can be lowered.
Now, only toplevel lowerable nodes are added to the lowerable_rvalues
instead instead of additionally adding all of the subnodes.
It now also only needs one hash table instead of two.
v7. Don’t try to lower sampler types. Instead, the sample instruction is
now treated as an independent point where the result of the sample can
be used in a lowered section. The precision of the sampler type
determines the precision of the sample instruction. This also means
the coordinates to the sampler can be lowered.
v8. Use f2fmp instead of f2f16.
v9. Disable lowering derivatives calcualtions, which might not work
properly on some hw backends.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3885>
Adds ir_dereference::precision(). For a normal variable dereference,
the precision comes from the variable. For a record member it comes
from the field within the record. For an array it can come from
either, depending on where the underlying array is stored. The method
recursively walks the derefs until it finds one of the first two.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3885>
These are pretty straightforward but there's a lot of details to keep
straight. In the IR, we keep a general logical comparator and types
separately; in the hardware, the type gets fused with a (much more)
limited number of comparators. So there's a fair bit of code here to
account for these differences, fusing in the type information, and
changing up argument order as necessary to make it actually correct.
Anything to save a bit!
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4276>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4276>
We rewrite BIR_INDEX_CONSTANT (and _ZERO) to preassigned constant ports
when assign uniform_const for the bundle. There are a lot of issues
raised here, unfortunately, and the implementation here is woefully
incomplete with a nasty hack for loads... nevertheless, it's somewhere
to start.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4276>
We're not totally sure what's up with this but Connor says if you
violate it Bad Things happen in your shader. I think this might be an
issue affecting early Bifrost (G71, ..?); when we know more we can look
into patching in a fix.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4276>
We split off MOV from FMOV since the canonical move on Bifrost doesn't
accept modifiers. (We can still do fmov, but with something like add-0.)
This will also make copyprop a little nicer, I think. Anyway, the
non-modifier version we can implement as-is for FMA.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4276>
This reworks the data structure a bit and, in my view, simplifies it.
Instead of each node having a header which has the node level in it, we
use the bottom 6 bits of the pointer for that. This requires us to
allocate with the os_malloc/free_aligned helpers (which call into
posix_memalign on Linux) but cache-line aligning our allocations is
actually probably a good thing given that we're doing atomics on them.
The primary advantages to doing this is that it changes the number of
memory accesses per tree level from 2 to 1 when walking the tree because
we no longer have to look at node->level.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4228>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4228>
As soon as I switch to using the allocation helpers in os_memory.h,
these tests start blowing up on the Windows build in GitLab CI. As far
as I can tell, the issue is something with the combination of the debug
allocator in u_debug_memory.c and the mutex implementation in the
version of Wine running in CI. The tests don't fail on real windows nor
do they fail with newer versions of Wine.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4228>
According to the SPIR-V specification, these operations require
integer-types. When bit_size is 1, we use booleans, which makes us emit
illegal code.
So let's fix the emitting to check if the first source is one bit wide.
For inot we can take a short-cut, and check the destination instead.
This doesn't work for ieq and ine, so let's not bother to do this
BINOP_LOG.
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4036>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4036>
This is what the HW provides us. If we need integer pixel centers, we
want the state tracker to do the lowering pass so that it gets to optimize
on the subtract. This is also the shader instructions that the blob is
doing on GLES, and is what Vulkan wants too, as was noted in MR !4172.
shader-db on a630:
total instructions in shared programs: 186689 -> 186168 (-0.28%)
total nops in shared programs: 66253 -> 66139 (-0.17%)
total non-nops in shared programs: 120436 -> 120029 (-0.34%)
total dwords in shared programs: 292192 -> 291168 (-0.35%)
total last-baryf in shared programs: 4810 -> 4734 (-1.58%)
total full in shared programs: 10176 -> 10195 (0.19%)
total constlen in shared programs: 54589 -> 54575 (-0.03%)
total sstall in shared programs: 24582 -> 24802 (0.89%)
total (ss) in shared programs: 3921 -> 3925 (0.10%)
total (sy) in shared programs: 1934 -> 1923 (-0.57%)
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4223>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4223>
In trying a full build under AOSP, I ran into the following error:
In file included from external/mesa3d/src/gallium/drivers/r600/sfn/sfn_nir_lower_fs_out_to_vector.cpp:33:
external/libcxx/include/set:942:26: error: the specified comparator type does not provide a const call operator [-Werror,-Wuser-defined-warnings]
static_assert(sizeof(__diagnose_non_const_comparator<_Key, _Compare>()), "");
^
external/mesa3d/src/gallium/drivers/r600/sfn/sfn_nir_lower_fs_out_to_vector.cpp:78:34: note: in instantiation of template class 'std::__1::multiset<nir_intrinsic_ins
tr *, r600::nir_intrinsic_instr_less, std::__1::allocator<nir_intrinsic_instr *> >' requested here
using InstrSubSet = std::pair<InstrSet::iterator, InstrSet::iterator>;
^
external/libcxx/include/__tree:967:5: note: from 'diagnose_if' attribute on '__diagnose_non_const_comparator<nir_intrinsic_instr *, r600::nir_intrinsic_instr_less>':
_LIBCPP_DIAGNOSE_WARNING(!std::__invokable<_Compare const&, _Tp const&, _Tp const&>::value,
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
external/libcxx/include/__config:1244:21: note: expanded from macro '_LIBCPP_DIAGNOSE_WARNING'
__attribute__((diagnose_if(__VA_ARGS__, "warning")))
^ ~~~~~~~~~~~
1 error generated.
Which is pretty opaque to me, but searching the web suggested
adding a cost, which seems to resovle it.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4175>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4175>
Building with AOSP I'm seeing:
external/mesa3d/src/gallium/drivers/etnaviv/etnaviv_screen.c:245:31: error: signed shift result (0x100000000) requires 34 bits to represent, but 'int' only has 32 bits [-Werror,-Wshift-overflow]
system_memory = 4096 << 20;
system_memory is a uint_64t, so this patch addresses the issue
by casting 4096 to a unint_64t before the shift is done.
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4175>
Previously, we were initializing the CCS to 0xFF for MCS+CCS due to a
misunderstanding of the following lines in the bspec:
The following are the general SW requirements for MCS buffer clear
functionality:
...
- If Software wants to enable Color Compression without Fast
clear, Software needs to initialize MCS with zeros.
- Lossless compression and CCS initialized to all F (using HW
Fast Clear or SW direct Clear) on the same surface is not
supported.
The first line does not refer to the CCS as the comment author supposed
but refers to the MCS as the comment says. It means that if you want to
use MCS compression without a fast-clear, you should initialize the MCS
to 0x00. This is because the value 0x00 in the MCS means "all data is
in plane 0" which is a perfectly valid non-fast-clear initialization.
It's also the value the MCS should be in if you do a RECTLIST slow-clear
where the primitive fully covers each pixel such that the same value is
written to all samples.
The second line in the above quote seems to imply that CCS fast-clear is
incompatible with MCS fast-clear. In particular, MCS+CCS fast-clear
uses a 0xff value in the MCS (like on Gen7-11) and leaves the CCS in
either the compressed or the pass-through state. Therefore, we should
initialize the CCS to 0x00 even for MCS+CCS surfaces.
Reviewed-by: Sagar Ghuge<sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4074>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4074>
There's an easy mapping for this, so let's do it. Note we do this at
schedule-time instead of emit since we'll need to lookahead clause
types. The alternative is a prepass running after schedule but before
codegen, but there's no reason not to just stick it here when we're
preparing bi_clause in the first place.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4242>
This is our first instruction we've emitted, requiring us to pipe
through registes/ports and various details from the IR. It's quite a bit
of code, but overall I'm happy with this structure. With some tedium we
should be able to emit the rest of the ALU ops this way, too.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4242>
Now that we can pack registers given the assigned ports, and we can
assign registers from the indices, the missing link is assigning ports
from the registers, and now finally we get some real data showing up in
a disassembly exercising lots of different code paths.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4242>
Starting with Gen12, we can fast-clear a lot more surface formats and we
are suddenly in the position of having to fast-clear surfaces with
formats with an implicit swizzle such as VK_FORMAT_R4G4B4A4_UNORM_PACK16
which is represented as ISL_FORMAT_A4B4G4R4 with a BGRA swizzle. In
order for blorp to do the fast-clear color conversion for us, it needs
a properly swizzled color.
This fixes the following Vulkan CTS groups on TGL:
- dEQP-VK.pipeline.blend.format.b4g4r4a4_unorm_pack16.*
- dEQP-VK.api.image_clearing.core.clear_color_image.*.b4g4r4a4*
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4218>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4218>
This function has code like:
if (0x7FD <= zExp) {
if ((0x7FD < zExp) ||
((zExp == 0x7FD) &&
(0x001FFFFFu == zFrac0 && 0xFFFFFFFFu == zFrac1) &&
increment)) {
...
return ...;
}
if (zExp < 0) {
I saw that, and I thought, "Uh... what? Dead code?" I thought it was a
bit fishy, so I grabbed the Berkeley SoftFloat Library 3e code, and
there is similar code in softfloat_roundPackToF64
(source/s_roundPackToF64.c), but it has an extra (uint16_t) cast in the
first comparison. This is basicially a shortcut for
if (zExp < 0 || zExp >= 0x7FD) {
So, having the nesting kind of makes sense. On a CPU, nesting the flow
control can be an optimization. On a GPU, it's just fail. Split the
block so that we don't need the uint16_t cast magic.
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:
Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 683638 -> 658127 (-3.73%)
instructions in affected programs: 666839 -> 641328 (-3.83%)
helped: 92
HURT: 0
helped stats (abs) min: 26 max: 2456 x̄: 277.29 x̃: 144
helped stats (rel) min: 3.21% max: 4.22% x̄: 3.79% x̃: 3.90%
95% mean confidence interval for instructions value: -345.84 -208.75
95% mean confidence interval for instructions %-change: -3.86% -3.73%
Instructions are helped.
total cycles in shared programs: 5458858 -> 5344600 (-2.09%)
cycles in affected programs: 5360114 -> 5245856 (-2.13%)
helped: 92
HURT: 0
helped stats (abs) min: 126 max: 10300 x̄: 1241.93 x̃: 655
helped stats (rel) min: 1.71% max: 2.37% x̄: 2.12% x̃: 2.17%
95% mean confidence interval for cycles value: -1539.93 -943.94
95% mean confidence interval for cycles %-change: -2.16% -2.08%
Cycles are helped.
Fixes: f111d72596 ("glsl: Add "built-in" functions to do add(fp64, fp64)")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
The previous two commits were just setting the scene for this change.
The mix(..., __propagateFloat64NaN(a, b), propagate) statements are not
identical in the two halves, but they are equivalent. The first clause
of the mix in the else-branch is trivally ±Inf. The first clause in the
then-branch __packFloat64(aSign, aExp, aFracHi, aFracLo). The
preceeding conditions prove that aExp=0x7ff, aFracHi=0, and aFracLo=0.
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:
Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 819560 -> 787258 (-3.94%)
instructions in affected programs: 757737 -> 725435 (-4.26%)
helped: 74
HURT: 0
helped stats (abs) min: 43 max: 3545 x̄: 436.51 x̃: 296
helped stats (rel) min: 3.54% max: 6.16% x̄: 4.52% x̃: 4.36%
95% mean confidence interval for instructions value: -548.42 -324.61
95% mean confidence interval for instructions %-change: -4.68% -4.37%
Instructions are helped.
total cycles in shared programs: 6817254 -> 6483227 (-4.90%)
cycles in affected programs: 6385272 -> 6051245 (-5.23%)
helped: 74
HURT: 0
helped stats (abs) min: 430 max: 33271 x̄: 4513.88 x̃: 3047
helped stats (rel) min: 4.28% max: 7.45% x̄: 5.48% x̃: 5.31%
95% mean confidence interval for cycles value: -5610.46 -3417.30
95% mean confidence interval for cycles %-change: -5.65% -5.32%
Cycles are helped.
total spills in shared programs: 591 -> 553 (-6.43%)
spills in affected programs: 591 -> 553 (-6.43%)
helped: 1
HURT: 0
total fills in shared programs: 1353 -> 1307 (-3.40%)
fills in affected programs: 1353 -> 1307 (-3.40%)
helped: 1
HURT: 0
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
In one branch we know that expDiff is already positive.
In the other branch we know the expDiff is negative. Previously in that
branch the code was -(expDiff + 1). This is equvialent to (-expDiff) -
1, and since expDiff is negative, abs(expDiff) - 1.
The main purpose of this commit is to prepare for "soft-fp64/fadd: Move
common code out of both branches of an if-statement".
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:
Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 818246 -> 819560 (0.16%)
instructions in affected programs: 756423 -> 757737 (0.17%)
helped: 1
HURT: 73
helped stats (abs) min: 1205 max: 1205 x̄: 1205.00 x̃: 1205
helped stats (rel) min: 1.36% max: 1.36% x̄: 1.36% x̃: 1.36%
HURT stats (abs) min: 2 max: 149 x̄: 34.51 x̃: 27
HURT stats (rel) min: 0.14% max: 1.09% x̄: 0.41% x̃: 0.30%
95% mean confidence interval for instructions value: -16.56 52.07
95% mean confidence interval for instructions %-change: 0.30% 0.47%
Inconclusive result (value mean confidence interval includes 0).
total cycles in shared programs: 6816686 -> 6817254 (<.01%)
cycles in affected programs: 6384704 -> 6385272 (<.01%)
helped: 37
HURT: 37
helped stats (abs) min: 30 max: 5790 x̄: 289.05 x̃: 102
helped stats (rel) min: 0.04% max: 0.86% x̄: 0.29% x̃: 0.31%
HURT stats (abs) min: 2 max: 1020 x̄: 304.41 x̃: 232
HURT stats (rel) min: <.01% max: 1.58% x̄: 0.55% x̃: 0.43%
95% mean confidence interval for cycles value: -165.37 180.72
95% mean confidence interval for cycles %-change: <.01% 0.27%
Inconclusive result (value mean confidence interval includes 0).
total spills in shared programs: 705 -> 591 (-16.17%)
spills in affected programs: 705 -> 591 (-16.17%)
helped: 1
HURT: 0
total fills in shared programs: 1501 -> 1353 (-9.86%)
fills in affected programs: 1501 -> 1353 (-9.86%)
helped: 1
HURT: 0
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
Exchanging aFracHi / bFracHi and aFracLo / bFracLo should not affect the
result of the later call to __add64.
The main purpose of this commit is to prepare for "soft-fp64/fadd: Move
common code out of both branches of an if-statement".
v2: Fix a typo in a comment. Noticed by Matt.
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:
Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 812094 -> 818246 (0.76%)
instructions in affected programs: 750271 -> 756423 (0.82%)
helped: 0
HURT: 74
HURT stats (abs) min: 7 max: 520 x̄: 83.14 x̃: 59
HURT stats (rel) min: 0.52% max: 1.48% x̄: 0.89% x̃: 0.84%
95% mean confidence interval for instructions value: 63.96 102.31
95% mean confidence interval for instructions %-change: 0.83% 0.95%
Instructions are HURT.
total cycles in shared programs: 6797157 -> 6816686 (0.29%)
cycles in affected programs: 6365175 -> 6384704 (0.31%)
helped: 0
HURT: 74
HURT stats (abs) min: 16 max: 1690 x̄: 263.91 x̃: 181
HURT stats (rel) min: 0.14% max: 0.68% x̄: 0.32% x̃: 0.27%
95% mean confidence interval for cycles value: 199.74 328.07
95% mean confidence interval for cycles %-change: 0.29% 0.36%
Cycles are HURT.
total spills in shared programs: 703 -> 705 (0.28%)
spills in affected programs: 703 -> 705 (0.28%)
helped: 0
HURT: 1
total fills in shared programs: 1499 -> 1501 (0.13%)
fills in affected programs: 1499 -> 1501 (0.13%)
helped: 0
HURT: 1
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
The main purpose of this commit is to prepare for "soft-fp64/fadd: Move
common code out of both branches of an if-statement".
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:
Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 812590 -> 812094 (-0.06%)
instructions in affected programs: 672135 -> 671639 (-0.07%)
helped: 57
HURT: 0
helped stats (abs) min: 1 max: 32 x̄: 8.70 x̃: 7
helped stats (rel) min: <.01% max: 0.49% x̄: 0.12% x̃: 0.09%
95% mean confidence interval for instructions value: -10.46 -6.94
95% mean confidence interval for instructions %-change: -0.15% -0.09%
Instructions are helped.
total cycles in shared programs: 6798039 -> 6797157 (-0.01%)
cycles in affected programs: 5810059 -> 5809177 (-0.02%)
helped: 54
HURT: 2
helped stats (abs) min: 2 max: 68 x̄: 16.44 x̃: 12
helped stats (rel) min: <.01% max: 0.12% x̄: 0.03% x̃: 0.02%
HURT stats (abs) min: 2 max: 4 x̄: 3.00 x̃: 3
HURT stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01%
95% mean confidence interval for cycles value: -19.50 -12.00
95% mean confidence interval for cycles %-change: -0.03% -0.02%
Cycles are helped.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
Convert
} else if (...) {
...
} else {
...
}
to
} else {
if (...) {
...
} else {
...
}
}
Not doing this reformatting in the previous commit makes the previous
commit easier to review, and doing it before the next commit makes the
next commit easier to review.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
Previous condition checks already guaranteen that expDiff != 0 and
!(expDiff > 0), so expDiff < 0 is the only option left.
The main purpose of this commit is to prepare for "soft-fp64/fadd: Move
common code out of both branches of an if-statement".
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:
Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 815491 -> 812590 (-0.36%)
instructions in affected programs: 753668 -> 750767 (-0.38%)
helped: 74
HURT: 0
helped stats (abs) min: 3 max: 281 x̄: 39.20 x̃: 25
helped stats (rel) min: 0.29% max: 0.73% x̄: 0.42% x̃: 0.40%
95% mean confidence interval for instructions value: -48.50 -29.91
95% mean confidence interval for instructions %-change: -0.45% -0.40%
Instructions are helped.
total cycles in shared programs: 6813681 -> 6798039 (-0.23%)
cycles in affected programs: 6381699 -> 6366057 (-0.25%)
helped: 74
HURT: 0
helped stats (abs) min: 24 max: 1488 x̄: 211.38 x̃: 149
helped stats (rel) min: 0.20% max: 0.44% x̄: 0.26% x̃: 0.25%
95% mean confidence interval for cycles value: -261.68 -161.08
95% mean confidence interval for cycles %-change: -0.28% -0.25%
Cycles are helped.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
The main purpose of this commit is to prepare for "soft-fp64/fadd: Move
common code out of both branches of an if-statement".
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:
Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 822766 -> 817327 (-0.66%)
instructions in affected programs: 760943 -> 755504 (-0.71%)
helped: 74
HURT: 0
helped stats (abs) min: 8 max: 515 x̄: 73.50 x̃: 51
helped stats (rel) min: 0.58% max: 1.10% x̄: 0.77% x̃: 0.73%
95% mean confidence interval for instructions value: -91.17 -55.83
95% mean confidence interval for instructions %-change: -0.81% -0.74%
Instructions are helped.
total cycles in shared programs: 6816791 -> 6822826 (0.09%)
cycles in affected programs: 6384809 -> 6390844 (0.09%)
helped: 0
HURT: 74
HURT stats (abs) min: 6 max: 1179 x̄: 81.55 x̃: 50
HURT stats (rel) min: 0.02% max: 0.17% x̄: 0.09% x̃: 0.09%
95% mean confidence interval for cycles value: 48.99 114.12
95% mean confidence interval for cycles %-change: 0.09% 0.10%
Cycles are HURT.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
v2: Go to extra effort to avoid flow control inserted to implement
short-circuit evaluation rules.
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:
Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 797779 -> 796849 (-0.12%)
instructions in affected programs: 3499 -> 2569 (-26.58%)
helped: 21
HURT: 0
helped stats (abs) min: 8 max: 112 x̄: 44.29 x̃: 44
helped stats (rel) min: 16.09% max: 33.15% x̄: 25.72% x̃: 24.62%
95% mean confidence interval for instructions value: -55.94 -32.63
95% mean confidence interval for instructions %-change: -28.14% -23.30%
Instructions are helped.
total cycles in shared programs: 6601355 -> 6588351 (-0.20%)
cycles in affected programs: 25376 -> 12372 (-51.25%)
helped: 21
HURT: 0
helped stats (abs) min: 156 max: 1410 x̄: 619.24 x̃: 526
helped stats (rel) min: 42.39% max: 53.98% x̄: 50.12% x̃: 50.75%
95% mean confidence interval for cycles value: -776.58 -461.89
95% mean confidence interval for cycles %-change: -51.57% -48.67%
Cycles are helped.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> [v1]
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
fsat is defined as min(max(a, 0.0), 1.0), and IEEE defines both min and
max to return the non-NaN value when one value is NaN. Based on this,
fsat should definitely return 0.0 for NaN.
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:
Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 841666 -> 841647 (<.01%)
instructions in affected programs: 122033 -> 122014 (-0.02%)
helped: 7
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 2.71 x̃: 3
helped stats (rel) min: 0.01% max: 0.02% x̄: 0.02% x̃: 0.01%
95% mean confidence interval for instructions value: -3.74 -1.69
95% mean confidence interval for instructions %-change: -0.02% -0.01%
Instructions are helped.
total cycles in shared programs: 6927246 -> 6926904 (<.01%)
cycles in affected programs: 1038987 -> 1038645 (-0.03%)
helped: 7
HURT: 0
helped stats (abs) min: 18 max: 72 x̄: 48.86 x̃: 54
helped stats (rel) min: 0.03% max: 0.05% x̄: 0.03% x̃: 0.03%
95% mean confidence interval for cycles value: -67.38 -30.33
95% mean confidence interval for cycles %-change: -0.05% -0.02%
Cycles are helped.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: a42163cbbc ("compiler: Add lowering support for 64-bit saturate operations to software")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2585
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
__fabs64 doesn't do anything special, and the value is still NaN
regardless of the value of the MSB. In a strict sense, it's possible
that both functions should set the "signal" bit.
lts on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:
Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 844558 -> 843005 (-0.18%)
instructions in affected programs: 725975 -> 724422 (-0.21%)
helped: 53
HURT: 4
helped stats (abs) min: 1 max: 313 x̄: 29.87 x̃: 21
helped stats (rel) min: 0.01% max: 0.94% x̄: 0.30% x̃: 0.22%
HURT stats (abs) min: 4 max: 11 x̄: 7.50 x̃: 7
HURT stats (rel) min: 0.03% max: 0.09% x̄: 0.05% x̃: 0.04%
95% mean confidence interval for instructions value: -39.02 -15.47
95% mean confidence interval for instructions %-change: -0.34% -0.21%
Instructions are helped.
total cycles in shared programs: 6962024 -> 6944998 (-0.24%)
cycles in affected programs: 6185470 -> 6168444 (-0.28%)
helped: 59
HURT: 0
helped stats (abs) min: 64 max: 2863 x̄: 288.58 x̃: 208
helped stats (rel) min: 0.11% max: 0.87% x̄: 0.33% x̃: 0.27%
95% mean confidence interval for cycles value: -387.15 -190.00
95% mean confidence interval for cycles %-change: -0.38% -0.28%
Cycles are helped.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
Replace all of the bool(qSign) with qSign != 0u. Remove unnecessary
parenthesis from around most of the existing qSign != 0u.
This dramatically simplifies the next commit.
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:
Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 848109 -> 848106 (<.01%)
instructions in affected programs: 53 -> 50 (-5.66%)
helped: 1
HURT: 0
total cycles in shared programs: 6969145 -> 6969125 (<.01%)
cycles in affected programs: 396 -> 376 (-5.05%)
helped: 1
HURT: 0
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
findMSB returns -1 for an input of zero, so 31 - findMSB(a) is
sufficient on its own.
There's only one user of findMSB in shader-db, and it does not match
this pattern.
TODO: Add a pattern in the backend code generator that emits 31 -
nir_op_ufind_msb(a) as if it were nir_op_uclz. That should save a couple
instructions.
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:
Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 859509 -> 848109 (-1.33%)
instructions in affected programs: 841058 -> 829658 (-1.36%)
helped: 97
HURT: 0
helped stats (abs) min: 3 max: 1161 x̄: 117.53 x̃: 72
helped stats (rel) min: 0.98% max: 6.74% x̄: 1.70% x̃: 1.35%
95% mean confidence interval for instructions value: -147.21 -87.84
95% mean confidence interval for instructions %-change: -1.94% -1.46%
Instructions are helped.
total cycles in shared programs: 7072275 -> 6969145 (-1.46%)
cycles in affected programs: 6955767 -> 6852637 (-1.48%)
helped: 97
HURT: 0
helped stats (abs) min: 32 max: 10900 x̄: 1063.20 x̃: 560
helped stats (rel) min: 1.18% max: 7.58% x̄: 1.84% x̃: 1.45%
95% mean confidence interval for cycles value: -1339.43 -786.96
95% mean confidence interval for cycles %-change: -2.11% -1.57%
Cycles are helped.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
This doesn't help a lot of shaders, but it helps those few a LOT.
This could also be implemented using bcsel. That version is very
slightly worse because the generated SEL instruction wants to have two
immediate sources, so one of them usually needs an extra MOV instruction
to load.
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:
Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 929619 -> 928859 (-0.08%)
instructions in affected programs: 1651 -> 891 (-46.03%)
helped: 8
HURT: 0
helped stats (abs) min: 38 max: 152 x̄: 95.00 x̃: 95
helped stats (rel) min: 42.70% max: 86.36% x̄: 49.88% x̃: 44.66%
95% mean confidence interval for instructions value: -132.97 -57.03
95% mean confidence interval for instructions %-change: -62.28% -37.49%
Instructions are helped.
total cycles in shared programs: 7280180 -> 7272912 (-0.10%)
cycles in affected programs: 12960 -> 5692 (-56.08%)
helped: 8
HURT: 0
helped stats (abs) min: 352 max: 1456 x̄: 908.50 x̃: 910
helped stats (rel) min: 52.45% max: 91.19% x̄: 59.24% x̃: 55.15%
95% mean confidence interval for cycles value: -1274.03 -542.97
95% mean confidence interval for cycles %-change: -70.06% -48.41%
Cycles are helped.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
The pattern is added to opt_algebraic because, for example, comparisons
with constant 0.0 will produce (a1 < 0).
Even with a pass that optimized Boolean expressions, I think this would
be very difficult to automatically recognize and optimize.
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:
Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 933054 -> 929619 (-0.37%)
instructions in affected programs: 784041 -> 780606 (-0.44%)
helped: 59
HURT: 0
helped stats (abs) min: 2 max: 213 x̄: 58.22 x̃: 44
helped stats (rel) min: 0.02% max: 2.51% x̄: 0.72% x̃: 0.46%
95% mean confidence interval for instructions value: -70.80 -45.64
95% mean confidence interval for instructions %-change: -0.92% -0.53%
Instructions are helped.
total cycles in shared programs: 7304712 -> 7280180 (-0.34%)
cycles in affected programs: 7176260 -> 7151728 (-0.34%)
helped: 92
HURT: 0
helped stats (abs) min: 8 max: 1414 x̄: 266.65 x̃: 166
helped stats (rel) min: 0.04% max: 2.34% x̄: 0.43% x̃: 0.22%
95% mean confidence interval for cycles value: -333.05 -200.26
95% mean confidence interval for cycles %-change: -0.54% -0.31%
Cycles are helped.
Regular shader-db changes:
No changes on any Intel platform.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
Fragment shader can write depth and stencil if we set necessary flags
in RSW. In addition to that we need to use special format for Z24S8.
Original format is apparently Z24X8 since we can't sample stencil in GLES2.
This new format also seems to use several components for storing depth
since we saw r != g != b when sampling with this format.
[vasily: - initialize clear->depth to 0xffffff if we reload depth, just
like blob does. Reloading doesn't work otherwise
- use single bitmap for reload type]
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4197>
In Fedora 32 build was failing with meson-0.53.2-1.git88e40c7.fc32
and gcc-10.0.1-0.9.fc32.x86_64.
Worked with meson-0.53.1-1 and same gcc.
/usr/bin/ld: src/gallium/state_trackers/dri/libdri.a(dri2.c.o): in function `dri2_interop_export_object':
/home/airlied/devel/mesa/mesa/build/../src/gallium/state_trackers/dri/dri2.c:1813: undefined reference to `st_finalize_texture'
/usr/bin/ld: src/gallium/state_trackers/dri/libdri.a(dri_screen.c.o): in function `dri_init_screen_helper':
/home/airlied/devel/mesa/mesa/build/../src/gallium/state_trackers/dri/dri_screen.c:580: undefined reference to `st_gl_api_create'
Moving this around seems to fix it.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4220>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4220>
Instead of uint64_t. Fixes potentially writing beyond the end of the
handles pointer array on 32-bit architectures (and copying all 0s
instead of the computed pointer values to the array on big endian
ones).
Corresponding compiler warning:
../src/gallium/drivers/llvmpipe/lp_state_cs.c: In function ‘llvmpipe_set_global_binding’:
../src/gallium/drivers/llvmpipe/lp_state_cs.c:1312:12: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
1312 | va = (uint64_t)((char *)lp_res->data + offset);
| ^
Fixes: 264663d55d "gallivm/llvmpipe: add support for global
operations."
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4166>
Optimize the result of a conditional break/continue. In NIR something
like:
loop {
...
if (cond)
continue;
would get lowered to:
block_0:
...
block_1:
branch_cond !cond block_3
block_2:
branch_uncond block_0
block_3:
...
We recognize the conditional branch skipping over the unconditional
branch, and turn it into:
block_0:
...
block_1:
branch_cond cond block_0
block_2:
block_3:
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4125>
We weren't using this function before. The name is confusing, but it
changes the child while also fixing up the dependence link, if you don't
have access to it already. Or at least, I think that's what the
intention is, and what we'll need to change the branch condition in the
next commit. Adding a dependency between the new and old source doesn't
make any sense for this, and we also need to change the actual source.
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4125>
Apparently needed for Stoney Ridge, Kabini and Mullins APUs.
gfx702 also has 16-bank LDS and https://llvm.org/docs/AMDGPUUsage.html
lists some dGPUs under there. Those GPUs seem to be Hawaii actually
(gfx701) and we don't seem to have gotten any interpolation related bugs
reported with them so far.
The late kill flag was tested by running pipeline-db with
ACO_DEBUG=validatera while setting late kill for SMEM buffer loads,
emit_vop2_instruction() and texture instructions. I also tested with
just setting the flag for v_interp_p1_f32.
As far as I know, the only other thing we have to consider for 16-bank LDS
is something to do with 16-bit interpolation. We don't do that yet.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3914>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3914>
There was a disconnect between anv_nir_compute_push_layout and the code
which sets up the push_ubo_sizes array. The NIR code we emit checks
relative to the start of the bound UBO range so that, if we end up with
a vector which straddles the start of the push range, we can perform the
bounds check without risking overflow issues. The code which sets up
the push_ubo_sizes, on the other hand, assumed it was relative to the
start of the push range. Somehow, this didn't get get caught by any of
the available tests.
Fixes: e03f965280 "anv: Bounds-check pushed UBOs when ..."
Closes: #2623
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4195>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4195>
Emit a single table of all possible Vulkan border colors up front, and
then index into it using the Vulkan enum directly. In fact this seems to
be the entire point of separating out border colors in the first place.
In addition to being simpler and having less CPU overhead, and fixing
cases where more than one sampler uses border color, this paves the way
for bindless samplers because the existing approach isn't great for
bindless.
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4200>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4200>
The lava tags was a python array not it's a gitlab CI string,
slit the string with periods in the jinja2 template to avoid having
the following tags :
tags:
- p
- a
- n
- f
- r
- o
- s
- t
instead of :
tags:
- panfrost
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4057>
Patch changes surface state setup to alloc/fill states for all possible
usages for image resource on gen12. Also predraw and binding table
population is changed to determine correct aux usage with the new
iris_image_view_aux_usage.
v2: alloc always all states independent on current image
aux state on gen >= 12 , code cleanups (Nanley)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4080>
EXPECT_DEATH works by forking the process and letting the forked process
fail with an assertion. This process is evidently incredibly expensive,
taking ~30 seconds to run the whole isl_aux_info_test on a 2.8GHz
Skylake. Annoyingly all of the (expected) assertion failures also leaves
lots of messages in dmesg and potentially generates lots of coredumps.
Instead, avoid the expense of fork/exec by redefining assert() and
unreachable() in the code we're testing to return a unit-test-only
value. With this patch, the test takes ~1ms.
Also, while modifying the EXPECT_EQ() calls, reverse the arguments so
that the expected value comes first, as is intended. Otherwise gtest
failure messages don't make much sense.
Fixes: https://gitlab.freedesktop.org/mesa/mesa/issues/2567
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4174>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4174>
Explicitly pass the active stages and the array (and size) of shaders
to be processed. This will make easy to store only the shaders needed
for each pipeline.
The active stages can be identified by a non-NULL shader in the
shaders array, so stop using it and keep track of the flushed stages
as iteration happens.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4040>
Avoids waste for pipelines that don't use all the shaders, and is
flexible enough to cover cases where there are multiple variants per
shader (e.g. SIMD8/16/32 for fragment shader).
Even though we could pre-calculate the exact size of the array, this
is not a critical path so it is worth preventing the bug that will
likely happen when new variants are added but not accounted for.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4040>
This will avoid generating multiple identical fences in a row.
For Gen11+ we have multiple types of fences (affecting different
variable modes), but is still better to combine them in a single
scoped barrier so that the translation to backend IR have the option
of dispatching both fences in parallel.
This will clean up redundant barriers from various
dEQP-VK.memory_model.* tests.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3224>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3224>
In ISL, usage flags only carry intent and not semantic meaning. We
don't have a bulletproof way in ISL to specify that an image is of
depth/stencil type. The usage flags are great but blorp, for instance,
loves to disrespect them. One proposed solution to this problem is to
add explicit depth/stencil formats which are distinct from the
corresponding color formats.
Fortunately, however, empirical evidence suggests that this bit only
affects the sampler's interpretation of the CCS data. Therefore, we can
set the bit based off of the aux_usage which is now very specific and
does carry semantic meaning. In particular, aux_usage now makes a
distinction between color CCS and depth/stencil CCS which appears to be
exactly what the DepthStencilResource bit is for.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4056>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4056>
Stencil CCS is slightly different from color CCS. Using a color CCS
resolve with stencil CCS doesn't do the right thing and you can't sample
from a stencil CCS image without the DepthStencilResource bit set or you
will get the wrong data. Stencil CCS also has it's own rules such as it
doesn't support fast-clear and has no partial resolve. This seems to
indicate that it should probably be its own isl_aux_usage. Now that
adding new isl_aux_usage values is pretty cheap, let's split stencil CCS
out on its own.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4056>
We also delete the badly named isl_surf_supports_hiz_ccs_wt. The name
is misleading because it doesn't return whether or not the surface
supports HiZ+CCS in write-through mode (any single-sampled HiZ+CCS
capable surface does) but rather a heuristic decision about whether or
not we want to enable write-through mode based on the usage flags in the
isl_surf.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4056>
Previously, we always set the aux_usage to ISL_AUX_USAGE_HIZ_CCS and let
ISL choose write-through based on isl_surf_supports_hiz_ccs_wt. This
commit makes us choose explicitly at surface creation time whether to
use HIZ_CCS or HIZ_CCS_WT based on the same set of conditions. This is
more explicit and should be more robust as it lets us choose WT mode in
one place rather than trusting isl_surf_supports_hiz_ccs_wt to return
the same thing every time.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4056>
This is distinct from ISL_AUX_USAGE_HIZ_CCS in that the HiZ surface
operates in write-through mode which means that the HiZ surface is only
used for depth-testing acceleration and the CCS-compressed main surface
is always valid so we can texture from it.
Separating full HiZ from write-through mode at the isl_aux_usage level
has a couple of advantages:
1. It's more explicit. Instead of write-through mode depending on the
heuristic decision in isl_surf_supports_hiz_ccs_wt, it's now
something that's explicitly requested by the driver. This should be
more robust than hoping isl_surf_supports_hiz_ccs_wt always returns
the same thing every time. If someone (say BLORP) ever drops a
usage flag on the isl_surf, there's a chance it could return a
different value without us noticing leading to corruptions.
2. Because ISL_AUX_USAGE_HIZ_CCS_WT is it's own isl_aux_usage flag, we
can say inside the driver that HIZ_CCS does not support sampling but
HIZ_CCS_WT does. We can also pass HIZ_CCS_WT to isl_surf_fill_state
and it can do some validation for us beyond what we would be able to
do if we conflate HIZ_CCS_WT and CCS_E.
3. In the future, we can add new heuristics to the driver which do
things such as start all depth surfaces (regardless of usage flags)
off in HIZ_CCS and then do a full resolve and drop to HIZ_CCS_WT the
first time it gets used by the sampler. This would potentially let
us enable the faster HIZ_CCS mode even in cases where it technically
comes in through the API as a texture.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4056>
The first check is redundant because the first thing we do in the "emit
the aux surface" section is assert that we actually have an aux_surf.
The second check involves an exclusion list of things which don't have
aux surfaces on Gen12 but an inclusion list is much simpler because it's
just "does it have MCS?".
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4056>
Currently when lowering mod() we add an extra instruction so if
mod(a,b) == b then 0 is returned instead of b, as mathematically
mod(a,b) is in the interval [0, b).
But Vulkan spec has relaxed this restriction, and allows the result to
be in the interval [0, b].
For the OpenGL case, while the spec does not allow this behaviour, due
the allowed precision errors we can end up having the same result, so
from a practical point of view, this behaviour is allowed (see
https://github.com/KhronosGroup/VK-GL-CTS/issues/51).
This commit takes this in account to remove the extra instruction
required to return 0 instead.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4118>
There are multiple LLVM passes that very much move the
intrinsic using the descriptor outside of the loop, defeating
the entire point of creating the loop.
Defeat the optimizer by splitting the break into a separate
if-statement and putting an optimization barrier on the bool
in between.
v2: Move from a callback based system to begin/end loop.
This does not make it significantly less intrusive but
is a bit nicer with all the extra struct and callback
stubs.
v3: Deal with non-divergent values in divergent path.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2160
Fixes: 028ce52739 "radv: Add non-uniform indexing lowering."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4109>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4109>
We model the machine as vector (with restrictions) to natively handle
mixed types and I/O and other goodies. We use LCRA for the heavylifting.
This commit adds only the modeling to feed into LCRA and spit LCRA
solutions back; next commit will integrate it with the IR.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4158>
The issue was messing with liveness analysis... with Midgard we look at
the writemask to decide how the instruction behaves. Here, since our ALU
is scalar (except for subdivision which doesn't have proper writemasks
anyway) we just look at the component count directly -- either 4 for
vector instructions (essentially - for smaller loads we can replicate
manually without much burden), or 1 for scalar.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4158>
1. Implement vkCmdBindTransformFeedbackBuffersEXT,
vkCmdBeginTransformFeedbackEXT and vkCmdEndTransformFeedbackEXT.
- Not handling counter buffers yet.
2. Implement streamout emit function, mostly taken from fd6_emit.c
v2. Replace emit_pkt4 funcs with emit_regs.
v3. Don't copy the state of stream-output from tu_pipeline.
v4. Set zero to VPC_SO_CNTL/VPC_SO_BUF_CNTL in tu6_init_hw.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3942>
Mostly taken from fd6_program.c.
v2. Note that it forces to use full VS instead of binning pass VS if
there's stream output as the binning pass VS will have outputs on
other than position/psize stripped out, which is the same as freedreno.
v3. fix indentation.
v4. Use register index instead of location when setup streamout.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3942>
Define new structures for streamout buffers and state.
Most members of the state struct are taken from freedreno driver.
v2. Use IR3_MAX_SO_* and avoid using magic values.
v3. Remove the state of stream-output in tu_cmd_state and use one in
tu_pipeline and split out reset and enabled fields.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3942>
- Add one member to the existed ir3_stream_output so that we could
assign location information from nir_xfb_info, rather than defining
new struct.
- Redefine maximum of so buffers, streams and outputs, which will be
used for turnip.
- Also enable caps for transform feedback for spirv_to_nir.
v2. Remove redefined maximums and use IR3_MAX_SO_* and add
IR3_MAX_SO_STREAMS.
v3. Remove the newly added location field so that we could keep aligned
with 32 bytes. Instead we create an array mapping between the location
and consecutive index, which is GL driver is doing.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3942>
When creating an egl surface from an ANativeWindow, the window's usage
flags need to be set so that buffers are allocated properly.
Signed-off-by: David Stevens <stevensd@chromium.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lepton Wu <lepton@chromium.org>
This supports powering up the device (using an external tool you
provide based on your particular lab), talking over serial to wait for
the fastboot prompt, and then booting a fastboot image on a target
device.
I was previously relying on LAVA for this, but that ran afoul of
corporate policies related to the AGPL. However, LAVA wasn't doing
too much for us, given that gitlab already has a job scheduler and
tagging and runners. We were spending a lot of engineering on making
the two systems match up, when we can just have gitlab do it directly.
Lightly-reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4076>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4076>
Knowing following:
- CMP writes to flag register the result of
applying cmod to the `src0 - src1`.
After that it stores the same value to dst.
Other instructions first store their result to
dst, and then store cmod(dst) to the flag
register.
- inst is either CMP or MOV
- inst->dst is null
- inst->src[0] overlaps with scan_inst->dst
- inst->src[1] is zero
- scan_inst wrote to a flag register
There can be three possible paths:
- scan_inst is CMP:
Considering that src0 is either 0x0 (false),
or 0xffffffff (true), and src1 is 0x0:
- If inst's cmod is NZ, we can always remove
scan_inst: NZ is invariant for false and true. This
holds even if src0 is NaN: .nz is the only cmod,
that returns true for NaN.
- .g is invariant if src0 has a UD type
- .l is invariant if src0 has a D type
- scan_inst and inst have the same cmod:
If scan_inst is anything than CMP, it already
wrote the appropriate value to the flag register.
- else:
We can change cmod of scan_inst to that of inst,
and remove inst. It is valid as long as we make
sure that no instruction uses the flag register
between scan_inst and inst.
Nine new cmod_propagation unit tests:
- cmp_cmpnz
- cmp_cmpg
- plnnz_cmpnz
- plnnz_cmpz (*)
- plnnz_sel_cmpz
- cmp_cmpg_D
- cmp_cmpg_UD (*)
- cmp_cmpl_D (*)
- cmp_cmpl_UD
(*) this would fail without changes to brw_fs_cmod_propagation.
This fixes optimisation that used to be illegal (see issue #2154)
= Before =
0: linterp.z.f0.0(8) vgrf0:F, g2:F, attr0<0>:F
1: cmp.nz.f0.0(8) null:F, vgrf0:F, 0f
= After =
0: linterp.z.f0.0(8) vgrf0:F, g2:F, attr0<0>:F
Now it is optimised as such (note change of cmod in line 0):
= Before =
0: linterp.z.f0.0(8) vgrf0:F, g2:F, attr0<0>:F
1: cmp.nz.f0.0(8) null:F, vgrf0:F, 0f
= After =
0: linterp.nz.f0.0(8) vgrf0:F, g2:F, attr0<0>:F
No shaderdb changes
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2154
Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3348>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3348>
This optimizes out several superfluous stores of the tess factors,
especially if the shader wrote those outputs multiple times.
Pipeline DB changes on GFX10:
Totals from affected shaders:
SGPRS: 30384 -> 29536 (-2.79 %)
Code Size: 2260720 -> 2214484 (-2.05 %) bytes
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3964>
Tessellation control shaders can have per-vertex inputs,
and both per-vertex and per-patch outputs. TCS can not only store,
but also load their outputs.
The TCS outputs are stored in RING_HS_TESS_OFFCHIP in VMEM, which
is where the TES reads them from. Additionally, the are also stored
in LDS to make sure they can be loaded fast when read by the TCS.
Tessellation factors are always just stored in LDS.
At the end of the shader, the first shader invocation reads these
from LDS and writes them to RING_HS_TESS_FACTOR in VMEM, and
additionally to RING_HS_TESS_OFFCHIP when they are read by
the Tessellation Evaluation Shader.
This implementation matches the memory layouts used by radv_nir_to_llvm.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3964>
Only on supported GPUs at the moment; for older Bifrost that don't
support these, I'm not sure yet where the right place to do the lowering
is. NIR algebraic rules would be "nice" but probably impractical -- but
it wouldn't be hard to do it directly in BIR (as a lowering pass or
alternative implementation).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4139>
We'd really rather not emit extracts. We are approaching on a vector IR
anyway which is annoying but really necessary to handle I/O and fp16
correctly. So let's just go all the way and deal with swizzles and masks
within reason; it'll still be somewhat saner in the long-term.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4139>
In the second (scalar pass) use the information about # of registers
used in the first pass as the target max, and round-robin within that
range. This generally gives the post-RA sched pass more opportunities
to re-order instructions to remove nop's.
Also, we can be a bit clever when assigning dest registers for SFU
instructions, by picking the register used for it's src (if available
and already assigned). This avoids some (ss) syncs caused by write
after read hazards. (Ie. the SFU instruction will read it's own src
before writing dest.)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4071>
Add a parameter so the callback can know which node it is selecting a
register for. And remove the graph parameter, as it is unused by
existing users, and somewhat unnecessary (ie. the callback data could
be used instead).
And add a comment so $future_me remembers how this works.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4071>
Vertex processing workaround flags can be split into
array and current vertex attributes. By that we
can use specific access functions for these different
vertex attribute kinds. This finally obsoletes
some access functions that I introduced last winter
for a smooth transition.
v2: Style fixes.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/308>
The change basically reimplements array setup by walking
the gl_contex::Array._DrawVAO on a per binding sequence.
In this way we can make direct use of the application
provided minimum set of buffer objects and emit fewer relocs.
v2: Rebase onto:
compiler: Move double_inputs to gl_program::DualSlotInputs
v3: Rebase onto introduction of gl_vertex_format
v4: Reorder and extend patch series.
v5: Split out two hunks into seperate patches.
v6: Avoid using GL* types.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/308>
The vertex array object contains 32 vertex arrays. By that we cannot
reference more then these in the vertex shader inputs. So, we can use
the 32 bits u_bit_scan function to iterate the vertex shader inputs
and place an assert that only these are present. Also place an other
assert that the vertex array setup in i965 does not overrun the
enabled array in brw_context::vs::enabled.
v2: Style fixes.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/308>
Avoid looping over all VARYING_SLOT_MAX urb_setup array
entries from genX_upload_sbe. Prepare an array indirection
to the active entries of urb_setup already in the compile
step. On upload only walk the active arrays.
v2: Use uint8_t to store the attribute numbers.
v3: Change loop to build up the array indirection.
v4: Rebase.
v5: Style fix.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/308>
This way we avoid potential state leaks and keep the shader_meta
initialization in once place. The time spent preparing the shader
descriptors should be negligible compared to the time spent pushing
those descriptors to the transient buffer (remember we are writing to
non-cacheable memory here).
Note that we might get back to some sort of shader_meta descriptor
caching at some point if that proves necessary, but now we have those
panfrost_frag_meta_xxx_update() helpers now where xxx maps directly to
a CSO bind, which should ease desc template updates.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4083>
panfrost_shader_state.tripipe is used as a template for shader_meta
desc emission, but shader_meta desc preparation time should be negligible
compared to desc emission time (remember we are writing to non-cacheable
memory here). Let's prepare for generating the the shader_meta desc
entirely at draw time by adding the necessary fields to
panfrost_shader_state.
Note that we might brink back some sort of shader_meta desc caching at
some point, but let's simplify things a bit for now.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4083>
That's part of our attempt to make panfrost_emit_for_draw() a bit more
dry and eventually get rid of it by inlining the code in
panfrost_draw_vbo(). This is just one step in this direction.
Note that we get rid of the panfrost_rasterizer.tiler_gl_enables field
along the way, as setting/clearing those bits at draw time instead of
doing when the state is created should make a huge difference. We might
get back to pre-computed VT descs at some point, but let's keep things
simple for now.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4083>
Move panfrost_attach_vt_framebuffer() to pan_cmdstream.c and change its
name to panfrost_vt_attach_framebuffer() so we can use a consistent
prefix (panfrost_vt_) for all helpers initializing/updating
midgard_payload_vertex_tiler fields.
Note that the function only initializes one VT object now.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4083>
Right now we emit two shader descriptors for the fragment shader, one
when panfrost_patch_shader_state() is called, and the final one
including both the shader_meta and the blend RT descriptors.
The first generated fragment shader descriptor is never used, since the
second one overrides the postfix.shader pointer.
Let's dissociate the state patching logic from the descriptor emission
so we don't upload descriptors that are never used.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4083>
Let's move the viewport descriptor emission logic to a dedicated helper
in order to shrink a bit the panfrost_emit_for_draw().
Note that this helper is placed in a new pan_cmdstream.c file where we
will group all cmdstream related helpers (everything that's related to
HW descriptor initialization emission).
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4083>
Checking vs->writes_point_size is not enough, as we might have a vertex
shader writing point size, but a primitive that's not MALI_POINT. That
currently works because emit_varying_descriptor() is called before the
primitive_size.constant field is update, but let's make the logic more
robust, just in case things are re-ordered at some point.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4083>
The current workaround for this hardware bug involved marking the ADD
instruction used to initialize the address register as NoMask on
Gen12, which was based on the assumption that the problem was caused
by a hardware bug affecting the application of the execution mask to
the address register write.
However that doesn't seem to be the case: The address register write
was working correctly, the real problem leading to hangs on TGL is
that the indirect addressing logic is unable to deal with garbage
values in the address register (e.g. misaligned offsets), even for
channels which are currently inactive due to non-uniform control flow.
The current workaround isn't able to avoid that situation in general,
since the result of the NoMask ADD instruction for a dead channel is
calculated based on the corresponding (dead) component of the
indirect_byte_offset source, which would still be undefined in the
likely case that the source was initialized under control flow itself.
This would lead to hangs whenever MOV_INDIRECT was used under
non-uniform control flow in some scenarios like a tessellation shader
from GFXBench5/gl_4 (AKA Car Chase) on TGL. In addition I've managed
to reproduce the same issue on earlier platforms by initializing the
whole address register with garbage before the ADD instruction, so
this seems to be a long-standing issue we have avoided mostly by luck.
This patch fixes the problem and applies the workaround to all
platforms, since even when the hardware is able to deal with garbage
address values without hanging there might be a significant
performance cost from reading random GRF registers due to the useless
extra EU cycles spent fetching registers for dead channels and due to
the potential for unintended serialization with respect to other
random instructions that could be executed in parallel, which may have
had a cost of the order of hundreds of cycles in the worst case
scenario.
Fixes: f93dfb509c "intel/fs: Write the address register with NoMask for MOV_INDIRECT"
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is the same as ir_unop_f2f16 except that it comes with a promise
that it is safe to optimise it out if the result is immediately
converted back to float32 again. Normally this would be a lossy
operation but it is safe to do if the conversion was generated as part
of the precision lowering pass.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3929>
This way the generated constant folding code doesn't need to
understand fp16. All operations have to be expanded to full float for
evaulation on the CPU, so we might as well do it up front. As far as
GLSL is concerned, fp16 isn't a separate type from float, so
everything we're supposed to support for float we need to do for fp16.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3929>
For now tests only use these drivers:
* llvmpipe
* softpipe
* freedreno
* lima
* etnaviv
* panfrost
So using rules:changes gitlab feature to run the tests when the changes
made are potentially affecting these drivers.
A few notes:
* the following code:
.piglit-test:
extends:
- .test-gl
- .llvmpipe-rules
makes gitlab replace .test-gl "rules:changes" values by the one from
".llvmpipe-rules".
* rules:changes always matches for non-MR new branches so jobs will always be
created (and they'll be run if their dependencies are run). For pushes to
existing branches the files changed by the push are used to match the
rules:changes path.
* the same gitlab feature could be used for some build jobs
Acked-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2569>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2569>
Passing shader_stats to the fs_generator constructor means that the
SIMD8 shader stats from the visitor (such as the scheduler mode) will be
reported out for the SIMD16/SIMD32 versions as well.
As you can see, we are now passing 'shader_stats' and 'stats' to
generate_code(), which is obviously odd looking. Ian rebased and
committed an old patch of mine which added the shader_stats struct on
July 30 in commit dabb5d4bee (i965/fs: Add a shader_stats struct.) and
shortly after on August 12 Jason added the brw_compile_stats struct in
commit 134607760a (intel/compiler: Fill a compiler statistics struct).
I'd like to combine the two, but I'm not sure how. shader_stats is an
input to generate_code() while brw_compile_stats is an output and is
only used by the Vulkan driver. Leave it as is for now...
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4093>
Do the absolute simplest possible thing -- create a clause for every
instruction, and just pick whichever slot we can, nopping the other,
copying whatever constant we have whether it's used or not.
To be clear - this is not to be used in a production compiler. But this
lets actual bundles and clauses show up in the BIR, which unblocks work
on final code generation and packing (which can happen more or less in
parallel to NIR->BIR, optimization, register allocation, and writing an
actual scheduling).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4097>
In the laziest possible way... We can just emit worst case moves which
DCE will eat for breakfast anyway, and inline constants on instructions
where that is supported directly. This approach eliminates a lot of
nasty corner cases that Midgard's crazy cache scheme has hit, at the
expense of slightly more work for DCE (but it's only a single iteration
of an O(N) pass that has to run anyway..)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4097>
Pretty much a copypaste from Midgard except where architectural
decisions diverge around vectorization. On that note, we will need our
own ALU scalarization pass at some point (or rather we'll need to extend
nir_lower_alu_scalar) to allow partial lowering for 8/16-bit ops. I.e.
we'll approximately need to lower
vec4 16 ssa_2 = fadd ssa_0, ssa_1
to
vec2 16 ssa_2 = fadd ssa_0.xy, ssa_1.xy
vec2 16 ssa_3 = fadd ssa_0.zw, ssa_1.zw
vec4 16 ssa_4 = vec4 ssa_2.x, ssa_2.y, ssa_3.x, ssa_4
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4097>
make_surface() contained a giant if-tree for creation of aux surfaces.
Move the if-tree into its own function, add_aux_surface_if_supported().
This will simplify future changes for VK_EXT_image_drm_format_modifier.
This patch merely moves the code verbatim, then extracts duplicate
assertions to the top.
v2: Rename func to add_aux_surface_if_supported [for jekstrand].
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4096>
The function returns true if hardware limitations require the image
plane to use a shadow surface. It replaces equivalent code in
make_surface().
Refactor only. No intended change in behavior.
Why extract this code out of vkCreateImage? If an image requires
a shadow surface, then that may impact its support for DRM format
modifiers, which must be evaluated during
vkGetPhysicalDeviceImageFormatProperties2.
v2:
- Use early return. [for jekstrand]
- Unexport function.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4096>
These block member have been split into individual vars so we need
to set the correct offsets for each member in the new glsl nir
linker. We also take this opportunity to set the correct location
for the variable.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4050>
This defines a new BRW_ANALYSIS object which wraps the register
pressure computation code along with its result. For the rationale
see the previous commits converting the liveness and dominance
analysis passes to the IR analysis framework.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
It makes sense to keep the result of analysis passes independent from
the IR itself. Instead of representing the idom tree as a pointer in
each basic block pointing to its immediate dominator, the whole
dominator tree is represented separately from the IR as an array of
pointers inside the idom_tree object. This has the advantage that
it's no longer possible to use stale dominance results by accident
without having called require() beforehand, which makes sure that the
idom tree is recalculated if necessary.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
This only does half of the work. The actual representation of the
idom tree is left untouched, but the computation algorithm is moved
into a separate analysis result class wrapped in a BRW_ANALYSIS
object, along with the intersect() and dump_domtree() auxiliary
functions in order to keep things tidy.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
This involves wrapping vec4_live_variables in a BRW_ANALYSIS object
and hooking it up to invalidate_analysis() so it's properly
invalidated. Seems like a lot of churn but it's fairly
straightforward. The vec4_visitor invalidate_ and
calculate_live_intervals() methods are no longer necessary after this
change.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
This involves wrapping fs_live_variables in a BRW_ANALYSIS object and
hooking it up to invalidate_analysis() so it's properly invalidated.
Seems like a lot of churn but it's fairly straightforward. The
fs_visitor invalidate_ and calculate_live_intervals() methods are no
longer necessary after this change.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
This could be improved somewhat with additional validation of the
calculated live in/out sets and by checking that the calculated live
intervals are minimal (which isn't strictly necessary to guarantee the
correctness of the program). This should be good enough though to
catch accidental use of stale liveness results due to missing or
incorrect analysis invalidation.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
This could be improved somewhat with additional validation of the
calculated live in/out sets and by checking that the calculated live
intervals are minimal (which isn't strictly necessary to guarantee the
correctness of the program). This should be good enough though to
catch accidental use of stale liveness results due to missing or
incorrect analysis invalidation.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
This removes the dependency of fs_live_variables on fs_visitor. The
IR analysis framework requires the analysis result to be constructible
with a single argument -- The second argument was redundant anyway.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
This makes the structure of the vec4 live intervals calculation more
similar to the FS back-end liveness analysis code. The non-CF-aware
start/end computation is moved into the same pass that calculates the
block-local def/use sets, which saves quite a bit of code, while the
CF-aware start/end computation is moved into a separate
compute_start_end() function as is done in the FS back-end.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
This moves the following methods that are currently defined in
vec4_visitor (even though they are side products of the liveness
analysis computation) and are already implemented in
brw_vec4_live_variables.cpp:
> int var_range_start(unsigned v, unsigned n) const;
> int var_range_end(unsigned v, unsigned n) const;
> bool virtual_grf_interferes(int a, int b) const;
> int *virtual_grf_start;
> int *virtual_grf_end;
It makes sense for them to be part of the vec4_live_variables object,
because they have the same lifetime as other liveness analysis results
and because this will allow some extra validation to happen wherever
they are accessed in order to make sure that we only ever use
up-to-date liveness analysis results.
The naming of the virtual_grf_start/end arrays was rather misleading,
they were indexed by variable rather than by vgrf, this renames them
start/end to match the FS liveness analysis pass. The churn in the
definition of var_range_start/end is just in order to avoid a
collision between the start/end arrays and local variables declared
with the same name.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
This moves the following methods that are currently defined in
fs_visitor (even though they are side products of the liveness
analysis computation) and are already implemented in
brw_fs_live_variables.cpp:
> bool virtual_grf_interferes(int a, int b) const;
> int *virtual_grf_start;
> int *virtual_grf_end;
It makes sense for them to be part of the fs_live_variables object,
because they have the same lifetime as other liveness analysis results
and because this will allow some extra validation to happen wherever
they are accessed in order to make sure that we only ever use
up-to-date liveness analysis results.
This shortens the virtual_grf prefix in order to compensate for the
slightly increased lexical overhead from the live_intervals pointer
dereference.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
Have fun reading through the whole back-end optimizer to verify
whether I've missed any dependency flags -- Or alternatively, just
trust that any mistake here will trigger an assertion failure during
analysis pass validation if it ever poses a problem for the
consistency of any of the analysis passes managed by the framework.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
I've deliberately separated this from the general analysis pass
infrastructure in order to discuss it independently. The dependency
classes defined here refer to state changes of several objects of the
program IR, and are fully orthogonal and expected to change less often
than the set of analysis passes present in the compiler back-end.
The objective is to avoid unnecessary coupling between optimization
and analysis passes in the back-end. By doing things in this way the
set of flags to be passed to invalidate_analysis() can be determined
from knowledge of a single optimization pass and a small set of well
specified dependency classes alone -- IOW there is no need to audit
all analysis passes to find out which ones might be affected by
certain kind of program transformation performed by an optimization
pass, as well as the converse, there is no need to audit all
optimization passes when writing a new analysis pass to find out which
ones can potentially invalidate the result of the analysis.
The set of dependency classes defined here is rather conservative and
mainly based on the requirements of the few analysis passes already
part of the back-end. I've also used them without difficulty with a
few additional analysis passes I've written but haven't yet sent for
review.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
The invalidate_analysis() method knows what analysis passes there are
in the back-end and calls their invalidate() method to report changes
in the IR. For the moment it just calls invalidate_live_intervals()
(which will eventually be fully replaced by this function) if anything
changed.
This makes all optimization passes invalidate DEPENDENCY_EVERYTHING,
which is clearly far from ideal -- The dependency classes passed to
invalidate_analysis() will be refined in a future commit.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
Motivated in detail in the source code. The only piece missing here
from the analysis pass infrastructure is some sort of mechanism to
broadcast changes in the IR to all existing analysis passes, which
will be addressed by a future commit. The analysis_dependency_class
enum might seem a bit silly at this point, more interesting dependency
categories will be defined later on.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
brw_vec4.h (in particular vec4_visitor) is logically a user of the
live variables analysis pass, not the other way around.
brw_vec4_live_variables.h requires the definition of some VEC4 IR data
structures to compile, but those can be obtained directly from
brw_ir_vec4.h without including brw_vec4.h.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
brw_fs.h (in particular fs_visitor) is logically a user of the live
variables analysis pass, not the other way around.
brw_fs_live_variables.h requires the definition of some FS IR data
structures to compile, but those can be obtained directly from
brw_ir_fs.h without including brw_fs.h. The dependency of
fs_live_variables on fs_visitor is rather accidental and will be
removed in a future commit, a forward declaration is enough for the
moment.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
When this commit was originally written, these two structures had the
exact same name. Subsequently in commit 12a8f2616a (intel/compiler:
Fix C++ one definition rule violations) they were renamed.
Original commit message:
> These two structures have exactly the same name which prevents the two
> files from being included at the same time and could cause serious
> trouble in the future if it ever leads to a (silent) violation of the
> C++ one definition rule.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
This pulls out the i965 IR definitions into a separate file and leaves
the top-level backend_shader structure and back-end compiler entry
points in brw_shader.h. The purpose is to keep things tidy and
prevent a nasty circular dependency between brw_cfg.h and
brw_shader.h. The logical dependency between these data structures
looks like:
backend_shader (brw_shader.h) -> cfg_t (brw_cfg.h)
-> bblock_t (brw_cfg.h) -> backend_instruction (brw_shader.h)
This circular header dependency is currently resolved by using forward
declarations of cfg_t/bblock_t in brw_shader.h and having brw_cfg.h
include brw_shader.h, which seems backwards and won't work at all when
the forward declarations of cfg_t/bblock_t are no longer sufficient in
a future commit.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
We depend on BLORP to convert the clear color and write it into the
clear color buffer for us. However, we weren't bothering to call blorp
in the case where the state is ISL_AUX_STATE_CLEAR. This leads to the
clear color not getting properly updated if we have back-to-back clears
with different clear colors. Technically, we could go out of our way to
set the clear color directly from iris in this case but this is a case
we're unlikely to see in the wild so let's not bother. This matches
what we already do for color surfaces.
Cc: mesa-stable@lists.freedesktop.org
Reported-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4073>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4073>
This commit mainly adds basic infrastructure for tracking vertex array
state.
If glthread gets a non-VBO pointer, this commit delays disabling
glthread until glDraw is called. The next will change that to "sync"
instead of "disable".
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3948>
It's straightfoward except that I had to hack the python scripts to add
"marshal_count", which behaves just like "count" except that "variable_param"
is ignored. ("variable_param" changes the behavior of "count", which I don't
want)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3948>
It was diabled because RADV is the only driver that tests Vulkan
and running CTS on my personal machine and without recovery is
not safe enough for CI (too long and too unstable).
Now that we are going to test Fossilize with RADV, it's needed to
build the test image for VK unconditionally. As RADV now supports
creating NULL devices, the fossilize jobs can run everywhere.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3960>
On Gen4 and G45 and earlier, we have to handle weird offsetting to write
to depth and stencil due to a lack of proper depth mipmapping support in
hardware. On Gen6, we have to deal with strange HiZ and stencil
layouts. Prior to Gen9, we also had to do crazy things for stencil
writes because we didn't support GL_ARB_shader_stencil_export and
friends in hardware. However, starting with Gen7 for depth and Gen9 for
stencil, we can easily write out with the "right" hardware. This allows
us to leave HiZ and other compression enabled for blorp_blit() and
blorp_copy() operations.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3717>
So we can debug the IR in memory before code emit has happened. We'd
like to have a complete dump of the IR -- neglecting this with Midgard
was one of those mistakes I've regretted so let's get this right for the
first time around.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4061>
There's a magic bit in preload_regs which controls this. It doesn't
appear to be supported on G71 but it is on G52. I'd guess G72 supports
it too but I don't have a way to check this.
Needless to say, we'll need a quirks database for this.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4061>
This reverts commit f04d7439a0.
It no longer helps performance and the current vbo implementation is
faster anyway.
The app that hit this was a CAD program called Spazio3D. It made pretty
terrible use of the OpenGL API and we sent them some tips for improvements.
I'm assuming they've fixed this by now.
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4052>
Fixes the following building errors:
In file included from external/mesa/src/gallium/drivers/r600/sfn/sfn_debug.cpp:28:
In file included from external/mesa/src/gallium/drivers/r600/sfn/sfn_debug.h:34:
In file included from external/mesa/src/compiler/nir/nir.h:41:
In file included from external/mesa/src/compiler/nir_types.h:36:
external/mesa/src/compiler/glsl_types.h:38:10: fatal error: 'main/config.h' file not found
#include "main/config.h"
^~~~~~~~~~~~~~~
1 error generated.
In file included from external/mesa/src/gallium/drivers/r600/sfn/sfn_debug.cpp:28:
In file included from external/mesa/src/gallium/drivers/r600/sfn/sfn_debug.h:34:
external/mesa/src/compiler/nir/nir.h:50:10: fatal error: 'nir_opcodes.h' file not found
#include "nir_opcodes.h"
^~~~~~~~~~~~~~~
1 error generated.
Fixes: f718ac62 ("r600/sfn: Add a basic nir shader backend")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Fixes the following building errors:
In file included from external/mesa/src/amd/compiler/aco_dead_code_analysis.cpp:25:
In file included from external/mesa/src/amd/compiler/aco_ir.h:33:
In file included from external/mesa/src/compiler/nir/nir.h:40:
external/mesa/src/util/format/u_format.h:33:10: fatal error: 'pipe/p_format.h' file not found
#include "pipe/p_format.h"
^~~~~~~~~~~~~~~~~
...
In file included from external/mesa/src/amd/compiler/aco_dominance.cpp:31:
In file included from external/mesa/src/amd/compiler/aco_ir.h:33:
In file included from external/mesa/src/compiler/nir/nir.h:40:
external/mesa/src/util/format/u_format.h:33:10: fatal error: 'pipe/p_format.h' file not found
#include "pipe/p_format.h"
^~~~~~~~~~~~~~~~~
...
In file included from external/mesa/src/amd/compiler/aco_instruction_selection.cpp:31:
In file included from external/mesa/src/amd/common/ac_shader_util.h:32:
In file included from external/mesa/src/compiler/nir/nir.h:40:
external/mesa/src/util/format/u_format.h:33:10: fatal error: 'pipe/p_format.h' file not found
#include "pipe/p_format.h"
^~~~~~~~~~~~~~~~~
3 errors generated.
Fixes: 8d07d661 ("glsl,nir: Switch the enum representing shader image formats to PIPE_FORMAT.")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
In preparation to having "vk" (Vulkan) along "gl" (OpenGL/ES).
This is so it is clearer which traces belong to which API and also for
the build jobs.
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Since we are at it, replace "cd" with pushd / popd and homogenize how
VK-GL-CTS is built in comparison with other build scripts.
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
The Wayland platform's resize_callback is invoked from libwayland-egl
when wl_egl_window_resize() is called. The resize call is the only place
for the application to insert dx/dy arguments to wl_surface_attach().
When modifying the cursor hotspot (as in wayland/wayland#148), we want
to set dx/dy, but leave the surface size the same. If we get
wl_egl_window_resize() with the same width and height argument as we
already have, we do not need to invalidate our existing drawable.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4030>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4030>
This reverts commit 35fc7bdf0e.
Unfortunately mentioned commit introduced a memory leak because
`driwindowsMapConfigs` and `createDriMode` functions allocate
small memory portions for each element:
21,576 (232 direct, 21,344 indirect) bytes in 1 blocks are definitely lost in loss record 1,411 of 1,414
at 0x483A7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0x5D4AA09: createDriMode (dri_common.c:291)
by 0x5D4ABF5: driConvertConfigs (dri_common.c:310)
by 0x5D58414: dri3_create_screen (dri3_glx.c:945)
by 0x5D39829: AllocAndFetchScreenConfigs (glxext.c:815)
by 0x5D39C57: __glXInitialize (glxext.c:941)
by 0x5D3290A: GetGLXPrivScreenConfig (glxcmds.c:174)
by 0x5D34F38: glXQueryExtensionsString (glxcmds.c:1307)
by 0x4F83038: glXQueryExtensionsString (in /usr/local/lib/libGL.so.1.7.0)
by 0x4F2EA6B: ??? (in /usr/lib/x86_64-linux-gnu/libwaffle-1.so.0.6.0)
by 0x4F2A0D7: waffle_display_connect (in /usr/lib/x86_64-linux-gnu/libwaffle-1.so.0.6.0)
by 0x498F42A: wfl_checked_display_connect (piglit-util-waffle.h:74)
There is one more thing which disallow us to easily fix it are different element sizes
for instance: `glx_config_create_list` allocates memory just for `glx_config`,
`driwindowsMapConfigs` for `driwindows_config` and
`createDriMode` for `__GLXDRIconfigPrivate`.
Yes it is possible but size of such fix
will be more big and complex than original one.
So it make sense only if the malloc overhead
really is a big problem there.
Acked-by: Eric Engestrom <eric@engestrom.ch>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3406>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3406>
imfi.sType was being set to an invalid value, triggering a warning in Clang. The only valid value for imfi.sType is VK_STRUCTURE_TYPE_IMPORT_MEMORY_FD_INFO_KHR which is the value it is being given at initialisation time, a few lines earlier. The incorrect value, VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT, is supposed to be used in imfi.handleType instead - and indeed, handleType *is* being set to this value a few lines later.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4034>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4034>
Avoids the following Android Build System error:
FAILED:
build/make/core/binary.mk:1245: error: external/mesa/src/gallium/auxiliary/Android.mk: libmesa_gallium: Unused source files: tessellator/tessellator.hpp
10:24:30 ckati failed with: exit status 1
Fixes: bd0188f ("gallium/auxiliary: add the microsoft tessellator and a pipe wrapper.")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Acked-by: Dave Airlie <airlied@redhat.com>
GEN:BUG:14010455700 (lineage 1808121037):
"To avoid sporadic corruptions “Set 0x7010[9] when Depth Buffer
Surface Format is D16_UNORM , surface type is not NULL & 1X_MSAA"
Required for fixing ttps://gitlab.freedesktop.org/mesa/mesa/issues/2501.
GEN:BUG:1806527549:
"Set HIZ_CHICKEN (7018h) bit 13 = 1 when depth buffer is D16_UNORM."
This one could fix a GPU hang in some workloads.
v2: Implement WA in isl and add another similar WA (Jason).
v3: Add flushes before changing chicken registers (Jason)
v4: Depth flush and stall + end of pipe sync when changing registers
(Jason).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3801>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3801>
Some lowering passes modify the value of built-in variables in
order for drivers to work properly. However, modifying such values
will also break transform feedback as the captured value won't
match what's expected.
For example, on some hardware, the vertex shaders are expected to
output gl_Position in screen space. However, the transform
feedback captured value is still supposed to be the world-space
coordinates (see nir_lower_viewport_transform).
To fix that, we create a new variable that contains the
pre-transformation value and use it for transform feedback instead
of the built-in one.
Signed-off-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2433>
When varying packing is disabled for transform feedback and a xfb
declaration points to an array element or structure member, the
element/member should be aligned to the start of a slot as well.
If that's not the case, a new varying is created and the
element/member value is copied.
There might a way to further optimize the number of slots allocated
or the number of copies necessary if the performance cost is
problematic. For example, in cases where simply padding the top
level variable might correctly align all the captured values.
Signed-off-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2433>
Some drivers (e.g. Panfrost) don't support packing of varyings when
used for transform feedback. This new constant ensures that any
varying used for xfb is aligned at the start of a slot and won't be
packed with other varyings.
Scenarios where transform feedback declarations are related to an
array element or a struct member will be handled in a subsequent
patch.
Signed-off-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> (Fix order of arguments to varying_matches())
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2433>
Previously, we only added the secondary command buffer's draw and
draw epilogue command streams to the primary command buffer on
vkCmdExecuteCommands. However, we also need to merge the primary cs
for non-draw operations like vkCmdCopyBuffer and vkCmdBeginQuery.
Fixes dEQP-VK.memory.pipeline_barrier.host_write_transfer_src.*
and various other tests in dEQP-VK.api.command_buffers.*.
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3988>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3988>
This is the same idea as "intel: fix the gen 11 compute shader scratch
IDs".
The number of EUs on TGL is not the same as ICL, but the
MEDIA_VFE_STATE restrictions stay the same, so adapt the code to it.
Also, consider the base configuration instead of what we read from the
Kernel.
According to Mark, this fixes the following piglit tests on TGL:
piglit.spec.arb_compute_shader.execution.shared-atomicmax-uint.tglm64
piglit.spec.arb_compute_shader.execution.shared-atomicmax-int.tglm64
piglit.spec.intel_shader_atomic_float_minmax.execution.shared-atomicmax-float.tglm64
v2: s/ICL+/Gen11+/ (Jason).
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4006>
Scratch space allocation is based on the number of threads in the base
configuration, and we only have one base configuration for ICL, with 8
subslices.
This fixes an issue with Aztec on Vulkan in a machine with a
configuration that's not the base. The issue looks like a regression
from b9e93db208, but it seems things are broken since forever, just
not easily reproducible.
v2: Reimplement it using the subslices variable. Don't touch TGL.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4006>
As per
fb9b2a8731,
the compositor may advertise DRM_FORMAT_MOD_INVALID as a supported
modifier. This patch makes mesa recognize this fact and allow
linux_dmabuf usage with the INVALID modifier in this case.
In case the driver doesn't support modifiers, we can still use
linux-dmabuf protocol instead of the legacy wl_drm interface to create
wl_buffers. This will help compositors to handle these buffers better.
In this commit, the INVALID modifier is allowed to be added to the list
of supported modifiers, and create_wl_buffer will be able to use
linux_dmabuf with an INVALID modifier if the compositor advertised it as
supported.
Signed-off-by: Ivan Molodetskikh <yalterz@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2147>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2147>
Now that we have 7 (soon 8) boards available, there's capacity to be
testing GLES 3.0. However, due to (it looks like) buffer overflows in the
driver, we end up with flaky test results: 1/60 jobs spuriously failed,
and another 6/60 jobs reported flakes. At 6 jobs per pipeline, that's way
too high of a failure rate to enable for non-freedreno developers. Leave
the job present but disabled so that we can do manual test runs for
regressions.
Reviewed-by: Daniel Stone <daniels@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3661>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3661>
This should get us better stability of the db410c boards by having a
smaller per-board software stack, with no disks involved (just initramfs).
Additionally, the new cluster is 7 (soon 8) db410cs, while currently the
docker cluster only has 1/4 of its db410cs still running.
Unfortunately, we have to prepare the fastboot boot image during the ARM
drivers build stage, because LAVA relies on publicly available URLs for
the images to load into the bootloaders of the boards, and the only thing
we have for that is gitlab's artifacts.
Note that this testing relies on the boards being freshly flashed with the
linaro v136 firmware to pick up the initramfs size fixes and to stop the
boot at fastboot.
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3661>
Our current GPGPU_WALKER code only supports up to 64 threads.
On HSW we could use up to 70 and TGL up to 112, but only if the walker
is adjusted so the width does not exceed 64. Work to support this is
in progress.
Previous to this change, we might try to downgrade to SIMD8 if the
SIMD16 shader spilled. Since HSW and TGL have the max number of
threads above 64, we would then try to emit an invalid GPGPU walker
command.
Fixes: 932045061b ("i965/cs: Emit compute shader code and upload programs")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
To avoid spurious sync flags, we want to, for a6xx+, operate in terms of
half-regs, with a full precision register testing the corresponding two
half-regs that it conflicts with.
And while we are at it, stop open-coding BITSET
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3989>
These formats are an exception that can't be modeled in the current format
table. Switch to a table with only a single a6xx_format per vk format,
and deal with the exceptions separately (currently the only exception is
10_10_10_2_UNORM which has a different color format).
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3806>
This is the bulk of the llvm shader builders and tessellation
execution code.
TCS uses a coroutine launcher like compute shaders to handle
barriers. It executes 4-wide with one input vertex per lane.
Tessellation happens before the TES is run.
TES is just a 4-wide launcher, one per primitive is executed,
with one lane per tessellation coordinate input.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3841>
This adds the initial draw_tess.h with a define needed
for the interfaces. TCS input array doesn't need to handle
patch inputs so can be smaller.
The TCS context has some dummy values to align the textures/images
properly.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3841>
This adds the same tessellator code that swr uses, swr should
move to using this copy, unfortunately that wasn't trivial on my first
look.
The p_tessellator wrapper wraps it in a form that is a useful interface
for draw.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3841>
This is pretty basic (and a bit crappy at the moment). I think we
might want some sort of overlay in the future and also be able to
trigger captures with F12 or whatever.
To record a capture, set RADV_THREAD_TRACE to something greater than
zero (eg. RADV_THREAD_TRACE=100 will capture frame #100). If the
driver didn't crash (or the GPU didn't hang), the capture file
should be stored in /tmp.
To open that capture, use Radeon GPU Profiler and enjoy your
profiling times with RADV! \o/
Note that thread trace support is quite experimental, only GFX9 is
supported at the moment, and a bunch of useful stuff are still missing
(shader ISA, pipelines info, etc). More is comming soon.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3900>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3900>
SQTT is a hardware block that collects thread trace data (like
wave occupancy, timings, etc) for every draw/dispatch calls.
It's only supported on GFX9 at the moment but I will add other
generations support soon.
This is the first step towards profiling with RADV!
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3900>
Midgard has the ability to calculate addresses as part of the load/store
pipeline. We'd like to make use of this to avoid doing this work on the
ALU pipes. To do so, when emitting globals/SSBOs/shareds, we walk the
tree looking for address arithmetic to try to parse out something the
hardware can work with, letting the original instructions be DCE'd
ideally. This analysis is done at the NIR level to properly account for
some messy details of vectorization which we'd rather not poke at the
backend level. (Originally I wrote this as a MIR pass but I'm fairly
sure it was wrong.)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3978>
I'm working on moving the db410c CI from docker to LAVA, which means we
get to boot a custom kernel. To do that, we need to enable ARCH_QCOM in
the kernel, save the dtb around, and include abootimg in our container so
that we can generate combined kernel/dtb/ramdisk images for fastboot.
LAVA's fastboot support is unable to pack the overlay into an abootimg
image, just a cpio rootfs. We could flash the cpio rootfs after overlay
addition, but that takes 2 minutes to do, and causes wear on the devices.
Instead, we'll bring up the network at boot and use wget to fetch the
overlay. We'll want network support anyway, so that we can transfer the
failure xmls back to the gitlab job's artifacts at some point.
Since the msm GPU and realtek network firmware increase our payload by
3MB, add in firmware compression so that it doesn't waste as much RAM on
devices not using it.
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3928>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3928>
The LLVM libraries were a significant fraction of the entire payload
(55M/250M uncompressed) into the initramfs of the test boards, but
LLVM is only used for the draw module used in select/feedback (which
isn't even tested in CI on ARM yet).
Assume that llvmpipe draw is safe enough for ARM given the coverage on
x86, and disable LLVM for these jobs.
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3928>
This field controls the size of per-thread temporaries (somehow this is
separate from the regular stack for register spilling..), though I'm not
certain on the details. Regardless this value of 0x1e despite being used
in places by the blob seems wrong and is interfering with correct sizing
of the stack.
We don't use non-spilling scratchpad yet, so this is just here to fix
some details of spilling.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3950>
These two cases were flipped from the notes, leading to underestimates
of the padded vertex count, manifesting as visual corruption (random
geometry messed up). This issue was raised when noticing the corruption
went away when dramaticlaly oversizing max_index on an instanced indexed
draw, and then checking that padded_count >= vertex_count -- which
turned out *not* to be the case on certain inputs, a clear issue. Hence
looking into this routine...
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3950>
For index bufer resources (not user index buffers), we're able to cache
results. In practice, the cache works pretty dang well. It's still
important that the min/max computation is efficient (since when the
cache misses it'll run at draw-time and we don't want jank), but this
can eliminate a lot of computations entirely.
We use a custom data structure for caching. Search is O(N) to the size
but sizes are capped so it's effectively O(1). Insertion is O(1) with
automatic oldest eviction, on the assumption that the oldest results are
the least likely to still be useful. We might also experiment with other
heuristics based on actual usage later.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3880>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3880>
Blender was crashing with a SIGFPE even though the divide by 0
logic was kicking in. I'm not sure why TGSI doesn't get into this state.
The problem was is the numerator was INT_MIN we'd replace the div by
0 with a divide by -1, which is an exception for INT_MIN as INT_MIN/-1
== INT_MAX + 1 (too large for 32-bits). Instead for integer divides
just replace the mask values with 0x7fffffff. Also fix up the
result handling so it aligns with TGSI usage. (gives 0)
Fixes: c717ac1247 ("gallivm/nir: wrap idiv to avoid divide by 0 (v2)")
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3956>
Even though the workaround description says:
"all the listed commands are non-pipelined and hence flush caused due
to pipeline mode change must not cause performance issues..."
My understanding is that we still need to have the flushes. Also, the
flushes are required not only to stall the pipeline, but also to clear
caches, so I don't think they can simply be discarded.
Additionally, while doing some testing that increased the number of
surface STATE_BASE_ADDRESS emitted, I got a lot more GPU hangs. Adding
these flushes fixes those hangs.
Fixes: b8fbb39a (iris: Implement Gen12 workaround for non pipelined
state)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3908>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3908>
The __DRI_IMAGE_FORMAT_* part wants to be handled for the *101010
type formats as well. Factor out a common function for that task.
That again makes the piglit egl_ext_device_base test work again
for hardware drivers.
v2: Factor out a common function for that task.
v3: dri2_pbuffer_visuals -> dri2_pbuffer_visuals
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Fixes: 9acb94b623 "egl: Enable 10bpc EGLConfigs for platform_{device,surfaceless}"
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3790>
Added extra argument named 'type' which can be 'bin' (default if
ommited) or 'c_literal' for input type.
Change 'binary-path' argument name to 'input-path'.
v2:
- Use util_dynarray for assembly (Matt Turner)
- Read data in 8 bytes chunk (Matt Turner)
- Fix help option (Akeem Abodunrin)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3952>
Because we set the needs_data_cache bit from the NIR during compilation,
any time a shader was pulled out of the pipeline cache, we wouldn't set
the bit and the data cache was disabled. Fortunately, on Gen8+, this
bit is ignored because we always use the ALL section in the L3$ config
instead of separate DC and RO sections. On Gen7, however, this meant
that we were basically never running with the data cache enabled and our
compute performance was suffering massively because of it. This commit
improves Geekbench 5 scores on my Haswell GT3 by roughly 330% (no,
that's not a typo).
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3912>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3912>
When a resource is written by a compute shader and then used by a
non-compute stage we sync on last compute job to guarantee that the
resource has been completely written when the next stage reads resources.
In the other cases how flushes are done guarantee the serialization of
the writes and reads.
To reproduce the failure the following tests should be executed in batch
as last test don't fail when run isolated:
KHR-GLES31.core.shader_image_load_store.basic-allFormats-load-fs
KHR-GLES31.core.shader_image_load_store.basic-allFormats-loadStoreComputeStage
KHR-GLES31.core.shader_image_load_store.basic-allTargets-load-cs
KHR-GLES31.core.shader_image_load_store.advanced-sync-vertexArray
v2: Use fence dep instead of bo_wait (Eric Anholt)
v3: Rename struct names (Iago Toral)
Document why is not needed on graphics->compute case. (Iago Toral)
Follow same code pattern of the other update of in_sync_bcl.
v4: Fixed comments style. (Iago Toral)
Fixes KHR-GLES31.core.shader_image_load_store.advanced-sync-vertexArray
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
CC: 19.3 20.0 <mesa-stable@lists.freedesktop.org>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2700>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2700>
If an attempt to create an shm pixmap in XCreateDrawable fails
then it ends up with the shmid == -1. This means the get image
path needs to fallback so return false in this case to use the
non-shm get image path.
Fixes: 02c3dad0f3 ("Call shmget() with permission 0600 instead of 0777")
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3823>
When Brian in 02c3dad0f3 restricted
the shm permissions it means we hit the fallback paths in some
scenarios we hadn't before.
When you use Xephyr to xdmcp from one user to another the new perms
stop the X server (running as user a) attaching to the SHM segments
from gnome-shell (running as user b).
In this case however only the GLX side of the code had insight into this,
and the dri could was meant of fall back, and it worked for put image
fine but the get image path was broken, since there was no indication
in the broken case of the need to fallback.
This adds a return type to a new interface member that lets the
caller know it has to fallback.
Fixes: 02c3dad0f3 ("Call shmget() with permission 0600 instead of 0777")
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3823>
This is where the other debug_memory_* functions are declared, so let's
move it here for symmetry.
This allows us to drop an include of u_debug_gallium.h, which makes us
depend on gallium-headers in non-gallium code.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3901>
When a resouce is already invalidated, its hash table entry becomes
useless. In addition, the lima_job_free() function won't remove the hash
table entry for invalidated resource. So the hash entry should be
removed when invalidating the resource, otherwise bogus hash entry might
be left in the table, and when the resource is reused in another job,
the code will find the freed job when invalidating and thus result in crash.
Fixes: c64994433c ("lima: track write submits of context (v3)")
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3917>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3917>
Most of the vars tests already had a local helper, so just drop it in
favor of the one in nir_builder. Remaining two tests changed to use
the helper.
The load_store_vectorizer tests were using the specific memory
barriers, but since scoped barriers are also handled, prefer that.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3913>
This will help upcoming C++ code that will have to combine those two
semantics. In C++ it is not possible to do this without a cast or
adding an operator| to the enum. Since having the short form will
also be convient to C, we picked the former solution.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3913>
Provide a generic interface which manages aux resolves in ISL. The
feature differences between this and what's in iris is:
* Support for media compression. ISL_AUX_USAGE_MC behaves differently
from many other usages of CCS, so it was useful to implement this
support upfront, while designing the interfaces.
* Optimizations for full-surface writes. For example, after a
full-surface write occurs with ISL_AUX_USAGE_CCS_E in the PARTIAL_CLEAR
state, isl_aux_state_transition_write() returns COMPRESSED_NO_CLEAR
instead of COMPRESSED_CLEAR.
A performance suggestion for main-surface-invalidating/replacing writes
is given as a comment instead of adding a boolean to
isl_aux_prepare_access(). This avoids extra validation and should be
simple enough for the caller to handle.
v2. Add assertions. (Jason)
v3. Use switches in 2 more functions. (Jason)
Store aux metadata in a static table. (Jason)
Change prepare and finish function signatures. (Jason)
Keep isl_aux_state_transition_* functions separate.
v4. (Jason)
Assert against resolving in AUX_INVALID.
Rename aux_info struct to aux_usage_info.
Drop the justification for each aux_usage_info field.
Split out the NONE case in write function.
Restructure tests to more easily confirm coverage.
Rename access_compressed field to compressed.
Make write behavior less ambiguous.
v5. (Jason)
Add more detail above WRITES_RESOLVE_AMBIGUATE.
Add ISL_AUX_USAGE_MC to WritesResolveAmbiguate.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2957>
This lowers mediump FS outputs to fp16 in the ir3 backend. For now
this is a modest improvement, which mostly helps us whittle down the
full mediump work. Once the GLSL level support lands, then right hand
side of the store output intrinsics will be fp16 expressions and we'll
cancel out the fp16 -> fp32 -> fp 16 round trip here.
We've had different attempts at implementing this: rewriting stores in
the GLSL IR, lowering GLSL IR outputs to temporaries and inserting
conversions when writing the temporaries to the outputs. In the end,
GLSL ends up getting in the way a lot and doing it at the nir level is
easier and still possible since we have the output var precisions.
This part of the fp16 work is more of a step on the way towards full
fp16 support and will add a few extra conversion instructions:
total instructions in shared programs: 8151 -> 8163 (0.15%)
instructions in affected programs: 1187 -> 1199 (1.01%)
helped: 4
HURT: 10
total nops in shared programs: 3146 -> 3152 (0.19%)
nops in affected programs: 563 -> 569 (1.07%)
helped: 5
HURT: 10
total non-nops in shared programs: 5005 -> 5011 (0.12%)
non-nops in affected programs: 92 -> 98 (6.52%)
helped: 0
HURT: 3
total dwords in shared programs: 12832 -> 12800 (-0.25%)
dwords in affected programs: 96 -> 64 (-33.33%)
helped: 1
HURT: 0
total last-baryf in shared programs: 118 -> 115 (-2.54%)
last-baryf in affected programs: 21 -> 18 (-14.29%)
helped: 1
HURT: 0
total full in shared programs: 424 -> 417 (-1.65%)
full in affected programs: 15 -> 8 (-46.67%)
helped: 7
HURT: 0
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3822>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3822>
So far we only handle full regs of arrays during pre-allocation.
This patch is to handle half regs of arrays and also consider the size
of half regs when finding out conflicts.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3822>
This eliminates conversions between f16 and f32 where possible. We can
always remove an upcast followed by a down cast, that is:
f2f16 ( f2f32 (a) ) -> a
f2fmp ( f2f32 (a) ) -> a
In the other direction, f2f16 loses precision and can't be undone by a
f2f32. However, by definition it's always safe to elminate f2fmp:
f2f32 ( f2fmp (a) ) -> a
v2. [Neil Roberts (nroberts@igalia.com)]
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3822>
This pass tries to fold f2f16 conversion into alu instructions.
This will be useful to help reduce the number of instructions once
mesa starts supporting precision lowering. For example:
add.f r0.w, r0.w, c0.x
cov.f32f16 hr2.x, r0.w
to
add.f hr2.x, r0.w, c0.x
Additionally this pass also tries to fold f2f16 conversion into load_input
instruction:
bary.f r0.x, 3, r0.w
cov.f32f16 hr0.x, r0.x
to
bary.f hr1.x, 3, r0.x
v2: Edit to not fold OPC_MAX_F and OPC_MIN_F, since that's not valid.
v3: Add OPC_ABSNEG_F to the blacklist as well.
v4: Don't remove dead cov instructions, DCE will do that later; don't
iterate through sources when a cov only has one; remove special
handling of IR3_REG_ARRAY and IR3_REG_RELATIV.
v5: Handle folding into u32.u32 movs of floats correctly, don't bail
out on IR3_REG_RELATIV or IR3_REG_ARRAY movs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3822>
This opcode is the same as the f2f16 opcode except that it comes with
a promise that it is safe to optimise it out if the result is
immediately converted back to a 32-bit float again. Normally this
would be a lossy conversion and so it would be visible to the
application, but if the conversion is generated as part of the mediump
lowering process then this removal doesn’t matter. The opcode is
eventually replaced with a regular f2f16 in the late optimisations so
the backends don’t need to handle it.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3822>
When bringing up a new board or starting a new GLES version, we have a lot
of unexpected fails to document, so we need the full list in the log (not
just deqp-runner.sh's head -n 50) so we can populate the xfail list.
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3883>
LAVA finds a '#' early in boot and races to emit its shell commands.
Apparently for the current boards those serial commands end up getting
buffered such that things work out, but for db410c and db820c, the buffer
is lost and LAVA gets stuck waiting for the prompt. By setting a prompt,
we can delay our commands until we're actually supposed to emit them (and
suppress a complaint from the lava dispatcher that we're using a risky
prompt!)
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3883>
Because it's in double-quotes, it will search the current folder before
any search paths. Since nir_builder.h and nir_builtin_builder.h are in
the same folder, this guarantees a correct include. However,
nir/nir_builder.h does not unless the includer's path is set up just
right.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3897>
To facilitate lowering SSBOs to globals, we need a load_ssbo_address
intrinsic. This intrinsic takes an SSBO index and loads the address in
global memory of the SSBO (likely implemented via a uniform in the
driver). In the future, we'll support bounds checking, but at the moment
this is not supported (this pass should only be used for trusted
contexts at the moment, i.e. contexts without robustness extensions).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2753>
Rather than creating partially within the Gallium create function and
monkeypatching on draw time with code split across N different files
with tight Gallium dependencies, let's streamline everything into a
series of maintainable routines in mesa/src/panfrost with no Gallium
dependencies, doing the entire texture creation in one-shot and thus
adding absolutely zero draw-time overhead (since we can allocate a BO
for the descriptor and upload ahead-of-time, so switching textures is as
cheap as switching pointers).
Was this worth it? You know, I'm not sure :|
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3858>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3858>
When os_memory_debug.h was promoted to src/util, this source-file on
which it depends on when the debug-flag is set on windows was left
out. So let's move this also.
It doesn't seem there's any way of triggering this issue right now, but
it seems better to correct this to avoid this from biting us in the ass
in the future.
Fixes: 88c4680b5a ("util: promote u_memory to src/util")
Reviewed-by: Dylan Baker <dylan@pnwbakers>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3844>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3844>
Generating PLB PP stream is expensive. PLB PP stream content depends on
damage, and if damage consists of several rects it's impossible to come
up with a simple key.
Simplify damage to a single bounding box so we have a simple key
and cache PLB PP stream. Cache size is limited to 0.1% of system RAM and
once limit is reached least recently used entries are dropped.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3834>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3834>
The other source of the multiply will be interpreted as a uint32_t in an
XOR instruction. Any source modifiers with either not be interpreted at
all or will be misinterpreted due to the differing types.
If the other operand of the multiplication has a source modifier, just
emit an extra move to resolve the source modifiers.
The negation source modifier problem is difficult to reproduce due to an
algebraic optimization that changes (-a*b) to -(a*b). However, changes
in MR !1359 push the negations back down.
On Gen7+ it might be possible to do slightly better for an abs() source
modifier by using BFI2 as a glorified copysign().
On Gen8+ it might be possible to do slightly better for a neg() source
modifier by emitting (~a ^ b).
There were no shader-db changes on any Intel platform, so I think we can
deal with that problem when it arises.
See also piglit!224.
Fixes: 06d2c11641 ("intel/fs: Add a scale factor to emit_fsign")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3780>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3780>
Quiet warnings when called with a GLubyte:
src/mesa/main/get.c:3215:19: warning: result of comparison of constant 32767 with expression of type 'GLubyte' (aka 'unsigned char') is always false [-Wtautological-constant-out-of-range-compare]
params[0] = INT_TO_FIXED(((GLubyte *) p)[0]);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/mesa/main/get.c:78:38: note: expanded from macro 'INT_TO_FIXED'
~~~ ^ ~~~~~~~~
Delete ENUM_TO_INT64, ENUM_TO_FIXED and BOOLEAN_TO_INT64 which aren't
used.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3866>
Not initializing prim.indexed caused a few thousand failures on Intel
drivers.
I also compared the generated assembly with this change and before
a6d3158909. The code is still somewhat improved, which I am assuming
was the original goal. _mesa_DrawArrays, for example, appears to drop an
instruction or two... though the body of the function is only one byte
shorter.
MR !3591 will eventually delete the uninitialized fields. However, I
believe that explicitly initializing the whole thing is more future
proof. This ensures that if someone adds fields in the future, they
will also be initialized. Once the extra fields are removed, the two
implementations should generate idential code.
Fixes: a6d3158909 ("mesa: don't use memset in glDrawArrays")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3870>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3870>
The FLUSH_VERTICES macro is supposed to be called before the current
context state is changed.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
../src/util/blob.c:166:7: runtime error: null pointer passed as argument 2, which is declared to never be null
#0 0x7fe51bc315df in blob_write_bytes ../src/util/blob.c:166
#1 0x7fe51c7a7b9a in iris_disk_cache_store ../src/gallium/drivers/iris/iris_disk_cache.c:115
#2 0x7fe51c7f444d in iris_compile_fs ../src/gallium/drivers/iris/iris_program.c:1693
#3 0x7fe51c7fdcd9 in iris_create_fs_state ../src/gallium/drivers/iris/iris_program.c:2331
#4 0x7fe519e871a3 in st_create_fp_variant ../src/mesa/state_tracker/st_program.c:1275
#5 0x7fe519e89dd0 in st_get_fp_variant ../src/mesa/state_tracker/st_program.c:1435
#6 0x7fe519ed51e1 in st_update_fp ../src/mesa/state_tracker/st_atom_shader.c:163
#7 0x7fe519eb5d73 in st_validate_state ../src/mesa/state_tracker/st_atom.c:261
#8 0x7fe519e4e0bf in prepare_draw ../src/mesa/state_tracker/st_draw.c:132
#9 0x7fe519e4e76e in st_draw_vbo ../src/mesa/state_tracker/st_draw.c:184
#10 0x7fe51aca5245 in vbo_save_playback_vertex_list ../src/mesa/vbo/vbo_save_draw.c:215
#11 0x7fe51a25b1cc in ext_opcode_execute ../src/mesa/main/dlist.c:1126
#12 0x7fe51a2f8d58 in execute_list ../src/mesa/main/dlist.c:11830
#13 0x7fe51a34b2d0 in _mesa_CallList ../src/mesa/main/dlist.c:14267
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3825>
../src/mesa/drivers/dri/i965/intel_buffer_objects.c:405:4: runtime error: left shift of 255 by 24 places cannot be represented in type 'int'
#0 0x7f9404ac4ae1 in brw_map_buffer_range ../src/mesa/drivers/dri/i965/intel_buffer_objects.c:405
#1 0x7f9405a9cb13 in vbo_save_map_vertex_store ../src/mesa/vbo/vbo_save_api.c:261
#2 0x7f9405b6a89d in vbo_save_NewList ../src/mesa/vbo/vbo_save_api.c:1774
#3 0x7f94051aba3d in _mesa_NewList ../src/mesa/main/dlist.c:14172
../src/gallium/drivers/iris/iris_resource.c:1725:61: runtime error: left shift of 255 by 24 places cannot be represented in type 'int'
#0 0x7fe51c820c8e in iris_map_direct ../src/gallium/drivers/iris/iris_resource.c:1725
#1 0x7fe51c82322c in iris_transfer_map ../src/gallium/drivers/iris/iris_resource.c:1895
#2 0x7fe5202628be in u_transfer_helper_transfer_map ../src/gallium/auxiliary/util/u_transfer_helper.c:243
#3 0x7fe51997c508 in pipe_buffer_map_range ../src/gallium/auxiliary/util/u_inlines.h:344
#4 0x7fe51997ec8d in u_upload_alloc_buffer ../src/gallium/auxiliary/util/u_upload_mgr.c:221
#5 0x7fe51997f24f in u_upload_alloc ../src/gallium/auxiliary/util/u_upload_mgr.c:254
#6 0x7fe51ccf43af in upload_state ../src/gallium/drivers/iris/iris_state.c:323
#7 0x7fe51d06963a in gen9_init_state ../src/gallium/drivers/iris/iris_state.c:7516
#8 0x7fe51c7c2ea0 in iris_create_context ../src/gallium/drivers/iris/iris_context.c:294
#9 0x7fe519dc729b in st_api_create_context ../src/mesa/state_tracker/st_manager.c:921
#10 0x7fe5198c47ea in dri_create_context ../src/gallium/state_trackers/dri/dri_context.c:161
#11 0x7fe519898aac in driCreateContextAttribs ../src/mesa/drivers/dri/common/dri_util.c:475
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3825>
../src/intel/compiler/brw_nir_analyze_ubo_ranges.c:316:4: runtime error: null pointer passed as argument 1, which is declared to never be null
#0 0x7f78f5916611 in brw_nir_analyze_ubo_ranges ../src/intel/compiler/brw_nir_analyze_ubo_ranges.c:316
#1 0x7f78f255c189 in brw_codegen_wm_prog ../src/mesa/drivers/dri/i965/brw_wm.c:97
#2 0x7f78f2565571 in brw_fs_precompile ../src/mesa/drivers/dri/i965/brw_wm.c:608
#3 0x7f78f24edd2c in brw_shader_precompile ../src/mesa/drivers/dri/i965/brw_link.cpp:56
#4 0x7f78f24f3af8 in brw_link_shader ../src/mesa/drivers/dri/i965/brw_link.cpp:381
#5 0x7f78f39a302a in _mesa_glsl_link_shader ../src/mesa/program/ir_to_mesa.cpp:3119
#6 0x7f78f3a43826 in create_new_program ../src/mesa/main/ff_fragment_shader.cpp:1133
#7 0x7f78f3a43d00 in _mesa_get_fixed_func_fragment_program ../src/mesa/main/ff_fragment_shader.cpp:1163
#8 0x7f78f325ddcd in update_program ../src/mesa/main/state.c:134
#9 0x7f78f325fe64 in _mesa_update_state_locked ../src/mesa/main/state.c:360
#10 0x7f78f32600f1 in _mesa_update_state ../src/mesa/main/state.c:394
#11 0x7f78f2b3e587 in clear ../src/mesa/main/clear.c:169
#12 0x7f78f2b3e587 in _mesa_Clear ../src/mesa/main/clear.c:242
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3825>
../src/intel/compiler/brw_nir.c:979:40: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
#0 0x7f78f590d10b in brw_nir_apply_sampler_key ../src/intel/compiler/brw_nir.c:979
#1 0x7f78f590e07b in brw_nir_apply_key ../src/intel/compiler/brw_nir.c:1057
#2 0x7f78f5754b45 in brw_compile_fs ../src/intel/compiler/brw_fs.cpp:8347
#3 0x7f78f255c8e4 in brw_codegen_wm_prog ../src/mesa/drivers/dri/i965/brw_wm.c:123
#4 0x7f78f2565571 in brw_fs_precompile ../src/mesa/drivers/dri/i965/brw_wm.c:608
#5 0x7f78f24edd2c in brw_shader_precompile ../src/mesa/drivers/dri/i965/brw_link.cpp:56
#6 0x7f78f24f3af8 in brw_link_shader ../src/mesa/drivers/dri/i965/brw_link.cpp:381
#7 0x7f78f39a302a in _mesa_glsl_link_shader ../src/mesa/program/ir_to_mesa.cpp:3119
#8 0x7f78f3a43826 in create_new_program ../src/mesa/main/ff_fragment_shader.cpp:1133
#9 0x7f78f3a43d00 in _mesa_get_fixed_func_fragment_program ../src/mesa/main/ff_fragment_shader.cpp:1163
#10 0x7f78f325ddcd in update_program ../src/mesa/main/state.c:134
#11 0x7f78f325fe64 in _mesa_update_state_locked ../src/mesa/main/state.c:360
#12 0x7f78f32600f1 in _mesa_update_state ../src/mesa/main/state.c:394
#13 0x7f78f2b3e587 in clear ../src/mesa/main/clear.c:169
#14 0x7f78f2b3e587 in _mesa_Clear ../src/mesa/main/clear.c:242
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3825>
This is a builtin type (treated as uint, but with special type-aware
decoding) in envytools/cffdump. Lets teach gen_header.py about it and
drop the enum hack in the xml so I don't have to keep deleting the enum
when I sync the xml back to the freedreno envytools tree.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3833>
The interpretation of the fields is different depending whether the
instruction is a SEND/MATH or not.
This fixes the disassembly output for non-SEND/MATH instructions that
have both in-order and out-of-order dependencies. Their dependencies
were wrongly represented as `@A $B` when the correct would be `@A
$B.dst`.
Fixes: 6154cdf924 ("intel/eu/gen12: Add auxiliary type to represent SWSB information during codegen.")
Fixes: 83612c0127 ("intel/disasm/gen12: Disassemble software scoreboard information.")
Acked-by: Francisco Jerez <currojerez@riseup.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3660>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3660>
Rather than manipulating job descriptor headers as fat pointers (slow)
and using fancy manipulation functions for programatically building the
tree in arbitrary orders (slow and complicated) and then having to do a
topological sort at runtime every frame (slow) which requires traversing
said headers in GPU memory (slow!)... we finally know enough about
the hardware to just get things right the first time, or second for
next_job linking. So rip out all that code and replace it with a much
better routine to create, upload, and queue a job all in one (since now
it's the same operation essentially - which is much better for memory
access patterns, by the way) and most everything falls into place
gracefully according to the rules we've set out. Even wallpapering isn't
*so* terrible if you just... move that one little... giant... hack out
of sight... ahem....
panfrost_scoreboard_link_batch is no longer a bottleneck, mostly because
it no longer exists :-)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3836>
The code is about to change so the whole uint-story isn't as true as it
used to be. So let's soften up the semantics a bit here; we only care
about if we're doing a typed ot untyped store here, really.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3763>
Rewriting the variable bindings is nasty and error-prone, and this code
triggered an assert when trying to resolve API bindings into Vulkan
bindings.
This code still needs some tweaks, but this makes things much better,
and fixes a few bugs where we incorrectly accounted for the
array-indexes.
Fixes: 1c3f4c0704 ("zink: fixup sampler-usage")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3826>
clear info is needed when submit flush and may be changed after
framebuffer switch, so we need to move it into submit.
This also fixes 5 dEQP tests as a side effect: clear info is per
submit so clear value when one submit won't affect next submit.
v2:
remove fixed dEQP test from CI list.
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3755>
We need to flush submit which write to the FBO before read it as
texture.
v2:
rename lima_flush_previous_write_submit to
lima_flush_previous_submit_writing_resouce.
v3:
delay add submit to hash_table to lima_update_submit_wb when really
know the render target will be written.
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3755>
lima_submit is created by lima_submit_get() in draw/clear functions
and freed after submit to kernel.
There is a hash map to find the same lima_submit for color/depth
buffer combination if user switch framebuffer w/o flush the command
then switch back again.
v2:
rename lima_flush_submit to lima_flush_submit_accessing_bo.
v3:
delay flush access submit to lima_update_submit_wb when really know
if this submit will write to the target buffer.
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3755>
lima_ctx_buff is used cross function calls in draws. But plbu/vs_cmd
and pp_stack are used immediately after create. And we need to get
rid of "ctx->submit" in the flush code path which exists in
lima_ctx_buff.
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3755>
"ctx->submit" won't be used directly, so remove the usage in
lima_submit.c by directly passing the submit parameter. And
more data in lima_context will be moved to lima_submit, so
"ctx" will be replaced by "submit" in those functions.
v2:
rename lima_submit_flush to lima_do_submit.
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3755>
Prepare for multi submit in which case only know the plb_index when
final flush. Another need for this is proper reload detection: only
when flush stage can we know if need to do reload, because user may
first clear depth, then color individually, so we don't know if need
to pack repload plbu cmd when first clear depth.
v2:
move lima_update_submit_wb before ctx->resolve change for lima_ctx_dirty
to work in lima_submit_wb.
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3755>
We don't have any shared lima_ctx_buff for both GP and PP,
so no need to have these flags.
v2:
still add submit in lima_ctx_buff_va because some bo need
to exist cross flush, so not every submit will call
lima_ctx_buff_alloc.
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3755>
Barriers execute on the texture pipeline on Midgard, so let's
tentatively handle barrier() as conservatively as possible (forcing
memory barriers of both buffers and shared memory). Implementation isn't
quite there yet -- it doesn't look at interactions of adjacent barriers
like it's supposed to -- but the core is there.
Fixes dEQP-GLES31.functional.compute.basic.ssbo_local_barrier_single_invocation
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3835>
Unlike other GPUs, Mali does not have dedicated shared memory for
compute workloads. Instead, we allocate shared memory (backed to RAM),
and the general memory access functions have modes to access shared
memory (essentially, think of these modes as adding this allocates base
+ workgroupid * stride in harder). So let's allocate enough memory
based on the shared_size parameter and supply it.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3835>
(And bifrost_fb_extra to mali_framebuffer_extra, bifrost_render_target
to mali_render_target)
These structures are the norm on midgard t760+, drop the bifrost names,
it's silly... unrelated to the rest of the series but while I'm messing
with pandecode and cleaning up bifrost abstractions, might as well.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3835>
It's a complicated story. But from what I can tell, in GL compute
without barriers, the blob is able to redistribute the workgroups in
various ways (that are not yet understood), whereas with barriers it
cannot redistribute anything, which accounts for erratic workgroup
packing without barriers.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3835>
At this point this simply involves fixing the initialization of the
sample mask flag register to take the right dispatch mask from the
thread payload, and fixing sample_mask_reg() to return f1.1 for the
second half of a SIMD32 thread. This improves Manhattan 3.1
performance by 2.4%±0.31% (N>40) on my ICL with SIMD32 enabled
relative to falling back to SIMD16 for the shaders that use discard.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In SIMD32 programs that don't use discard, the upper 16 bits of the UD
result of sample_mask_reg() don't contain the sample mask of the upper
16 channels as one would expect. Stop pretending we are returning a
valid 32-bit mask.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
FIND_LIVE_CHANNEL was using f1.0-f1.1 as temporary flag register on
Gen7, instead use f0.0-f0.1. In order to avoid collision with the
discard sample mask, move the latter to f1.0-f1.1. This makes room
for keeping track of the sample mask of the second half of SIMD32
programs that use discard.
Note that some MOVs of the sample mask into f1.0 become redundant now
in lower_surface_logical_send() and lower_a64_logical_send().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>x
Use it instead of hard-coding f0.1 for the sample mask of programs
that use discard. This will make the task easier when we replace f0.1
with another flag register location in order to support discard with
SIMD32 shaders.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's only really useful there. This will avoid confusion with another
helper with a similar purpose I'm about to introduce that will be
useful in multiple files from the FS back-end.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The SIMD8 dual-source blending framebuffer write messages seem to have
trouble releasing the pixel scoreboard dependency in SIMD32 dispatch
mode, which leads to hangs. I have a better workaround for this which
doesn't involve disabling SIMD32 when dual-source blending is enabled,
but I'm still investigating some issues with it. Limit the dispatch
width to SIMD16 in such cases for the moment in order to make the CI
happy on ICL with SIMD32 fragment shaders enabled.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Currently the "Source0 Alpha Present to RenderTarget" bit of the RT
write message header is derived from brw_wm_prog_data::replicate_alpha.
However the src0_alpha payload is provided anytime it's specified to
the logical message. This could theoretically lead to an
inconsistency if somebody provided a src0_alpha value while
brw_wm_prog_data::replicate_alpha was false, as I'm planning to do in
a future commit in order to implement a hardware workaround.
Instead calculate the header bit based on whether a src0_alpha value
was provided to the logical message, which guarantees the same
behavior on pre-ICL and ICL+ (the latter used an extended descriptor
bit for this which didn't suffer from the same issue). Remove the
brw_wm_prog_data::replicate_alpha flag.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Together with the fixup_nomask_control_flow() pass introduced in a
previous patch, this implements a less invasive alternative to the
workaround documented in the hardware spec for GEN:BUG:1407528679,
which doesn't involve disabling structured control flow.
Under some conditions Gen12 hardware can end up executing a BB with
all channels disabled, which will lead to the execution of any NoMask
instructions in it, even though any execution-masked instructions will
be correctly shot down. This could break assumptions of the SWSB pass
if the data computed by a NoMask instruction is synchronized against
by using an SWSB annotation baked into a regular execution-masked
instruction, since the first (NoMask) instruction may be executed
redundantly by the hardware, even though the second will correctly be
shot down, potentially leading to a RaW or WaW hazard if a third
instruction subsequently accesses the destination register of the
first instruction.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: 20.0 <mesa-stable@lists.freedesktop.org>
Found by inspection. Existing code was trying to avoid assuming that
an SBID had been assigned to the virtual instruction, but
synchronizing the header setup with respect to the previous SIMD16
SEND by using SYNC.ALLRD doesn't really seem possible unless the SEND
instruction had been assigned an SBID. Assert-fail instead if no SBID
has been allocated.
Fixes: 15e3a0d9d2 "intel/eu/gen12: Set SWSB annotations in hand-crafted assembly."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: 20.0 <mesa-stable@lists.freedesktop.org>
This is a less invasive alternative to the workaround documented in
the hardware spec for GEN:BUG:1407528679, which doesn't involve
disabling structured control flow (it's unlikely that switching to
GOTO/JOIN would have actually fixed the problem anyway).
Under some conditions Gen12 hardware can end up executing a BB with
all channels disabled, which will lead to the execution of any NoMask
instructions in it, even though any execution-masked instructions will
be correctly shot down. This may break assumptions of some NoMask
SEND messages whose descriptor depends on data generated by live
invocations of the shader.
This avoids the problem by predicating certain instructions on an ANY
horizontal predicate that makes sure that their execution is omitted
when all channels of the program are disabled. The shader-db impact
of this patch seems to be minimal:
total instructions in shared programs: 17169833 -> 17169913 (0.00%)
instructions in affected programs: 30663 -> 30743 (0.26%)
helped: 0
HURT: 42
total cycles in shared programs: 336966176 -> 336968568 (0.00%)
cycles in affected programs: 2367290 -> 2369682 (0.10%)
helped: 0
HURT: 13
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: 20.0 <mesa-stable@lists.freedesktop.org>
We need to pass a width of 32 since the opcode bashes the whole f1.0
register on IVB. This is unlikely to have caused problems since f1.0
is largely unused currently. That's likely to change soon though,
even on platforms other than Gen7.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: 20.0 <mesa-stable@lists.freedesktop.org>
Found by inspection. This seems particularly likely to cause problems
with instructions dependent on the current execution mask like
SHADER_OPCODE_FIND_LIVE_CHANNEL or the FS_OPCODE_LOAD_LIVE_CHANNELS
instruction I'm about to introduce, but one could imagine it leading
to data corruption if CSE ever managed to combine two instructions
before and after the FS_OPCODE_PLACEHOLDER_HALT, since the one before
may not be executed for some channels.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: 20.0 <mesa-stable@lists.freedesktop.org>
Gen12 added CCS_E support for A8_UNORM. Intercept A8_UNORM format and
switch to R8_UNORM, as both share the same aux map format encoding so
they are compatible.
Fixes Piglit's ext_framebuffer_multisample-formats all_samples, which
was hitting an assert about A8_UNORM and R8_UINT not being CCS_E
compatible formats.
v2: Add gen check (Kenneth Graunke)
v3: Intercept A8_UNORM and set format to R8_UNORM (Jason Ekstrand)
v4:
- Remove gen check and move block little bit down (Jason Ekstrand)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3719>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3719>
CI didn't run so missed this.
Note previously had :
texfmt = TFMT6_Z24_UNORM_S8_UINT
rbfmt = RB6_Z24_UNORM_S8_UINT_AS_R8G8B8A8
which are both now FMT6_Z24_UNORM_S8_UINT_AS_R8G8B8A8
Fixes: 18786cc7d5 ("freedreno/a6xx: use single format enum")
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3804>
Adding a new CSO proved to be fairly difficult especially because this
extension affect draw/dispatch/blit alike.
Instead this change passes the state of the noop into the entry points
emitting the operations affected.
v2: Fix assert in default pipe caps
v3: Drop whitespace changes (Ken)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2964>
v2: condition the extension on context isolation support from the
kernel (Chris)
v3: (Lionel)
The initial version of this change used a feature of the Gen7+
command parser to turn the primitive instructions into no-ops.
Unfortunately this doesn't play well with how we're using the
hardware outside of the user submitted commands. For example
resolves are implicit operations which should not be turned into
no-ops as part of the previously submitted commands (before
blackhole_render is enabled) might not be disabled. For example
this sequence :
glClear();
glEnable(GL_BLACKHOLE_RENDER_INTEL);
glDrawArrays(...);
glReadPixels(...);
glDisable(GL_BLACKHOLE_RENDER_INTEL);
While clear has been emitted outside the blackhole render, it
should still be resolved properly in the read pixels. Hence we
need to be more selective and only disable user submitted
commands.
This v3 manually turns primitives into MI_NOOP if blackhole render
is enabled. This lets us enable this feature on any platform.
v4: Limit support to gen7.5+ (Lionel)
v5: Enable Gen7.5 support again, requires a kernel update of the
command parser (Lionel)
v6: Disable Gen7.5 again... Kernel devs want these patches landed
before they accept the kernel patches to whitelist INSTPM (Lionel)
v7: Simplify change by never holding noop (there was a shortcoming in the test not considering fast clears)
Only program register using MI_LRI (Lionel)
v8: Switch to software managed blackhole (BDW hangs on compute batches...)
v9: Simplify the noop state tracking (Lionel)
v10: Don't modify flush function (Ken)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v8)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2964>
==7265== 248 (120 direct, 128 indirect) bytes in 1 blocks are definitely lost in loss record 1,438 of 1,465
==7265== at 0x483980B: malloc (vg_replace_malloc.c:309)
==7265== by 0x598A2AB: ralloc_size (ralloc.c:119)
==7265== by 0x598F861: _mesa_set_create (set.c:127)
==7265== by 0x599079D: _mesa_pointer_set_create (set.c:570)
==7265== by 0x58BD7D1: build_program_resource_list(gl_context*, gl_shader_program*, bool) (linker.cpp:4026)
==7265== by 0x548231B: st_link_shader (st_glsl_to_ir.cpp:170)
==7265== by 0x54DA269: _mesa_glsl_link_shader (ir_to_mesa.cpp:3119)
Fixes: a6aedc66 ("st/glsl_to_nir: use nir based program resource list builder")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3574>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3574>
Similar to vkCmdClearAttachments(), we use CP_COND_REG_EXEC to
conditionally execute both the gmem and sysmem paths, except for after
the last subpass where it's known whether we're using sysmem rendering
or not.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3713>
This has only lightly been tested. It passes dEQP-VK.api.smoke.triangle,
so at least we're able to show a triangle. For now, it's just enabled
under a debug flag. In the future we'll probably want some heuristics
like what freedreno has and another debug flag to disable it except when
it's forced.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3713>
We may need shader workarounds for some formats, but for now this seems
to work at least as well as the gmem path for clearing multisample
attachments. And soon we'll start calling this even on the gmem path,
since we leave the final decision of whether to use sysmem or not up
till the end, so we can't have it assert or otherwise working tests
would assert.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3713>
I merely ported a freedreno patch to turnip which
updates some magic regsiter values.
commit ff6e148a3d
Author: Rob Clark <robdclark@chromium.org>
CommitDate: Tue Oct 29 09:19:34 2019 -0700
Subject: freedreno/a6xx: add a618 support
That's all that Rob did for gallium for a618, so I assume that's we need
for turnip also.
Tested manually with:
dEQP-VK.api.image_clearing.core.clear_color_image.2d.linear.single_layer.*
pass 300/555
fail 0/555
skip 255/555
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3743>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3743>
The value of some magic regsiters differ across chipsets. fd6_context
manages the differences by initializing them at runtime. Let's do the
same.
Add to tu_physical_device a subset of those found in fd6_context:
RB_UNKNOWN_8E04_blit
RB_CCU_CNTL_gmem
PC_UNKNOWN_9805
SP_UNKNOWN_A0F8
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3743>
This adds support for OpVariable having an initializer that points to
another variable, rather than a constant. In this case, the variable is
initialized to a pointer to the other variable.
Fixes Vulkan CTS tests:
dEQP-VK.spirv_assembly.instruction.compute.variable_init.private.*
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3047>
Add a pointer_initializer field to nir_variable analogous to
constant_initializer, which can be used to initialize the nir_variable
to a pointer to another nir_variable. Just like the
constant_initializer, the pointer_initializer gets eliminated in the
nir_lower_constant_initializers pass.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3047>
This patch fix these build errors with GCC 10.
/usr/bin/ld: src/gallium/drivers/panfrost/libpanfrost.a(pan_resource.c.o):src/panfrost/midgard/midgard_compile.h:52: multiple definition of `pan_sysval'; src/gallium/drivers/panfrost/libpanfrost.a(pan_screen.c.o):src/panfrost/midgard/midgard_compile.h:52: first defined here
/usr/bin/ld: src/gallium/drivers/panfrost/libpanfrost.a(pan_resource.c.o):src/panfrost/midgard/midgard_compile.h:68: multiple definition of `pan_special_attributes'; src/gallium/drivers/panfrost/libpanfrost.a(pan_screen.c.o):src/panfrost/midgard/midgard_compile.h:68: first defined here
Fixes: 7e8de5a707 ("panfrost: Implement system values")
Fixes: 306800d747 ("pan/midgard: Lower gl_VertexID/gl_InstanceID to attributes")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3752>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3752>
This reverts the functional part of commit
d17ff2f7f1, leaving the unit test for
mesa/pipe agreement on what's an array.
The issue is that the util_channel_desc.shift values on array formats are
not used for bit addressing in memory, they're bit addressing within a
word treating a pixel of the format as a native type, as seen by
llvmpipe's use of the values to do shifts (see
lp_build_unpack_arith_rgba_aos() for example). This means the values are
nonsensical for 3-byte RGB, but then llvmpipe doesn't expose those formats
so it works out.
I still want to clean up our big-endian format handling at some point, but
let's fix the s390x regression first, sort out our format unit tests in
CI, then be able to refactor with confidence.
Fixes: d17ff2f7f1 ("gallium: Fix big-endian addressing of non-bitmask array formats.")
Closes: #2472
Acked-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3721>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3721>
r600 reads vec4 from the UBO, but the offsets in nir are evaluated to the component.
If the offsets are not literal then all non-vec4 reads must resolve the component
after reading a vec4 component (TODO: figure out whether there is a consistent way
to deduct the component that is actually read).
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3225>
This commit adds support for vertex and fragment shaders from NIR, and
support for most TEX and ALU instructions.
Thanks Dave Airlied for adding support for a number of ALU instructions.
v2: fix compilation with gcc-6
v3: rebase: use mesa/core glsl_type_size function
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3225>
Use pipe_shader_state_from_tgsi() to set shader state for transformed
shader so that we get all correct data for respective shader state.
This fixes several regressed glretrace, piglit crashes found during merging
upsteam mesa
Fixes: bf12bc2dd7 (draw: add nir info gathering and building support)
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Since we are now using sparse matrix for format_conversion_table,
we have to make sure we have last entry in table which gives the
sense of required size of format_conversion_table
Fixes: 84db6ba7 ("svga: Drop unsupported formats from the format table")
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
It seems I got this stuff all wrong, and looked at driver_location
rather than the binding. But since we mess with the binding, we need to
adjust things a bit to get things right.
This still isn't great as-is, but it seems to work. In the future, we
should move to having samplers always at bindings 0 and up, and just
update the bindings that are used by either of the stages. But this
band-aid should be OK for now.
This fixes 0AD for me.
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3668>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3668>
The previous commit leads to match immed values unexpectedly.
This makes constlen for each shader including bvert wrong.
Also fixes atan2 for mediump deqp tests.
Fixes: cbd1f47433 ("freedreno/ir3: convert back to 32-bit values for half constant registers.")
v2: Move conversion up above fabs/fneg modifier handling as well.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3737>
The freedreno gen_header.py script now only works under python3.
It contains a "print()" call which prints a blank line under python3
but prints "()" under python2.7.
However the Android build currently uses python2.
This leads to incorrect code generation and a later build error.
.../STATIC_LIBRARIES/libfreedreno_registers_intermediates/registers/adreno_common.xml.h:163:2: error: expected identifier or '('
()
Fix this by adding MESA_PYTHON3 and using it for the freedreno scripts.
Signed-off-by: Martin Fuzzey <martin.fuzzey@flowbird.group>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3736>
On a daily basis I've been having to restart people's a630 jobs in the
front couple of pages of /merge_requests due to spurious failures from our
flaky tests, and fielding reports of spurious fails from other developers,
and babysitting my own marge merges that are failing due to our flakes.
Nobody should have to deal with that, especially not non-freedreno
developers, so just scrape the list of flakes reported to #freedreno-ci
for the last month and ban those tests that have failed more than once
until we have a credible fix.
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3662>
We were generating a packed copy and then memcpying it, but we can just
pack directly to the destination. Change on glmark2 -b build:use-vbo=true
is modest: 1.06328% +/- 0.994771% (n=84) but does remove the function that
was .6% of CPU time.
I'm not doing the equivalent "get" path at this time because softpipe's
texture cache has some clipping issues that get revealed.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3698>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3698>
Since 9254fb4fc7, the pass replaced the SCC clobber with the scalar
identity temporary. Just skip most of the temporary setup, since we don't
need it for gfx10_wave64_bpermute.
Although shuffles are disabled on GFX10, Detroit: Become Human seems to
use them anyway.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Fixes: 9254fb4fc7 ('aco: don't use a scalar
temporary for reductions on GFX10')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3683>
This was adding "#define reserved 2" to genxml includes, which is a
fairly mean lowercase word to redefine. It ends up breaking the build
on Android, which has __u32 reserved fields in headers.
Defining it also has no purpose. Just drop it.
Fixes: 5bea0cf779 ("intel/isl: Move iris's pipe-to-isl format function to isl.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3729>
This function is responsible for completing the layout for an imported
resource with the given modifier. Returns 0 on success or -1 If the
modifier is unsupported, invalid or the input parameters are not
compatible with the modifier.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3704>
This means you can directly use format utils on it without having to have
your own GL enum to number-of-components switch statement (or whatever) in
your vulkan backend.
Thanks to imirkin for fixing up the nouveau driver (and a couple of core
details).
This fixes the computed qualifiers for EXT_shader_image_load_store's
non-integer sizeNxM qualifiers, which we don't have tests for.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v3d)
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3355>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3355>
This will get reused in the shader compiler once we switch it over to pipe
formats instead of GL enums. We can't easily deduplicate i965's
mesa-to-isl mapping because of cases like A32_FLOAT that are mapped
differently.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3355>
We discovered 2 new shader flags used when a fragment shader updates
the depth/stencil value through a ZS writeout. If those flags are not
set, the depth/stencil value stored in the depth/stencil tilebuffer
remain unchanged.
While at it, rename unknown2 into flags_hi and rename flags into
flags_lo.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3697>
Midgard can't write depth and stencil separately. It has to happen in
a single store operation containing both. Let's add a panfrost specific
intrinsic and turn all depth/stencil stores into a packed depth+stencil
one.
Note that this intrinsic is not yet handled in emit_intrinsic(), but
we'll address that later.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3697>
Using LLVM 8 for ppc64el and 7 for s390x (which hits some coroutine
related issues with LLVM 8).
There are some test failures we need to ignore for now. Also, the
timeout needs to be bumped from the default 30s for some tests, because
they can take longer under emulation.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3643>
The motivation for this is that we want to make use of the meson cross
files in this script, which have the ccache compiler paths.
We need to remove the ccache directory at the end, it would just waste
space in the image for no benefit.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3643>
Among other things, this increases robustness when copying a docker
image from the main Mesa project to a forked project, avoiding spurious
image rebuilds from scratch.
Also drop the comment about .gitlab-ci/lava-gitlab-ci.yml, it doesn't
include the templates anymore.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3643>
The extra bits in CB_SHADER_MASK break dual source blending in
SkQP on a Stoney device. However:
- As far as I can tell, some other dual source blend tests are passing
before and after the change.
- A hacked around skqp passes on my Vega desktop and Raven laptop
- Getting Skqp to give any useful info or to run it outside of Android
on ChromeOS is proving difficult.
I have confirmed 3 strategies that seem to work:
- The old radv behavior of setting CB_SHADER_MASK to 0xF
- AMDVLK: CB_SHADER_MASK = 0xFF, and the 3 RB+ regs
are 0.
- radeonsi: CB_SHADER_MASK = 0xFF, but does not set DISABLE
bits in SX_BLEND_OPT_CONTROL for CB 1-7.
Let us use the radeonsi solution as that solution also seems like the correct
thing to do for holes. I have tested on my Raven laptop that setting the high
surfaces to not disabled and downconvert to 32_R does not imply a performance
penalty.
Fixes: e9316fdfd4 "radv: fix setting CB_SHADER_MASK for dual source blending"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3670>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3670>
The previous caller wasn't using it as tiled, just row-at-a-time, and
didn't want the clipping (since copytexsubimage comes in clipped). If
someone wanted these functions again in the future, they should be
rewritten on u_format_pack/unpack.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2744>
OSMesa is weird and you only get the type (byte/ubyte/565/etc.) at
MakeCurrent time, having only a channel order at CreateContext time. The
code was setting up a visual at CreateContext time, and then at
MakeCurrent it would fail to validate against the visual.
Reviewed-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3216>
Clang has a warning about overloading virtuals that triggers when a
derived class defines a virtual function that's an overload of
function in the base class. This kind of thing:
struct chart; // let's pretend this exists
struct Base
{
virtual void* get(char* e);
};
struct Derived: public Base {
virtual void* get(chart* e); // typo, we wanted to override the same function
};
The solution is to use
using Base::get;
to be explicit about the intention to reuse the base class virtual.
We hit this a lot with out glsl ir_hierarchical_visitor visitor
pattern, so let's adds some 'using' to calm down the compiler.
See-also: https://stackoverflow.com/questions/18515183/c-overloaded-virtual-function-warning-by-clang)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3686>
nir_intrinsic_base() is assigned nir_variable.data.driver_location,
which is assigned a unique ID based on the variable position in the
shader variable list. There's no guarantee that this position will
match the RT id we want to pass to emit_fragment_store().
Add a search_var() helper to retrieve a nir_variable based on its driver
location, so we can pass the right RT value to emit_fragment_store().
We also make sure the shader output is color, since emit_fragment_store()
is not ready for depth/stencil stores yet.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3676>
The blob returns 15, the state creation code clamps it to 15, but since
the dawn of time we've returned 4.0. Setting this to 15 also fixes
GTF-GL33.gtf21.GL3Tests.texture_lod_bias.texture_lod_bias_clamp_m_le_M
which is sensitive to these limits.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This patch fixes this build error with GCC 10.
/usr/bin/ld: src/gallium/drivers/lima/liblima.a(lima_context.c.o):src/gallium/drivers/lima/lima_util.h:32: multiple definition of `lima_dump_command_stream'; src/gallium/drivers/lima/liblima.a(lima_screen.c.o):src/gallium/drivers/lima/lima_util.h:32: first defined here
/usr/bin/ld: src/gallium/drivers/lima/liblima.a(lima_resource.c.o):src/gallium/drivers/lima/lima_util.h:32: multiple definition of `lima_dump_command_stream'; src/gallium/drivers/lima/liblima.a(lima_screen.c.o):src/gallium/drivers/lima/lima_util.h:32: first defined here
/usr/bin/ld: src/gallium/drivers/lima/liblima.a(lima_draw.c.o):src/gallium/drivers/lima/lima_util.h:32: multiple definition of `lima_dump_command_stream'; src/gallium/drivers/lima/liblima.a(lima_screen.c.o):src/gallium/drivers/lima/lima_util.h:32: first defined here
/usr/bin/ld: src/gallium/drivers/lima/liblima.a(lima_bo.c.o):src/gallium/drivers/lima/lima_util.h:32: multiple definition of `lima_dump_command_stream'; src/gallium/drivers/lima/liblima.a(lima_screen.c.o):src/gallium/drivers/lima/lima_util.h:32: first defined here
/usr/bin/ld: src/gallium/drivers/lima/liblima.a(lima_submit.c.o):src/gallium/drivers/lima/lima_util.h:32: multiple definition of `lima_dump_command_stream'; src/gallium/drivers/lima/liblima.a(lima_screen.c.o):src/gallium/drivers/lima/lima_util.h:32: first defined here
/usr/bin/ld: src/gallium/drivers/lima/liblima.a(lima_util.c.o):src/gallium/drivers/lima/lima_util.h:32: multiple definition of `lima_dump_command_stream'; src/gallium/drivers/lima/liblima.a(lima_screen.c.o):src/gallium/drivers/lima/lima_util.h:32: first defined here
/usr/bin/ld: src/gallium/drivers/lima/liblima.a(lima_texture.c.o):src/gallium/drivers/lima/lima_util.h:32: multiple definition of `lima_dump_command_stream'; src/gallium/drivers/lima/liblima.a(lima_screen.c.o):src/gallium/drivers/lima/lima_util.h:32: first defined here
Fixes: d71cd245d7 ("lima: Rotate dump files after each finished pp frame")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
In some cases we need to split components out from what was already a
collect. That was making it hard to DCE unused components of the
collect. (Ie. unused components of fragcoord, etc)
So just detect this case and skip the chained collect+split.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
This was somehow working to create the instructions in a random block,
and use the value in other blocks, by dumb luck. But two-pass-RA's
better choice of register assignment causes a couple dEQPs to start
failing without this fix:
dEQP-GLES3.functional.shaders.metamorphic.bubblesort_flag.variant_1
dEQP-GLES3.functional.shaders.metamorphic.bubblesort_flag.variant_2
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
Running the complete deqp_gles2 seems to trigger an overflow in lrz
cmdstream. We skip the blit clear fast-path if there have been any
draws (so mid-batch clears of any attached buffer hit the 3d pipe).
Which means it is safe to simply discard any lrz clear rendering.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
Some of the aspects of tex prefetch are in common with normal tex
instructions, such as having a wrmask to control which components
are written. Add a helper for this.
This should result in actually using the prefetch wrmask to avoid
fetching unneeded components.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
After RA, we can schedule to increase parallelism (reduce nop's) without
worrying about increasing register pressure. This pass lets us cut down
the instruction count ~10%, and prioritize bary.f, kill, etc, which
would tend to increase register pressure if we tried to do that before
RA.
It should be more useful if RA round-robin'd register choices.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
kill (and other cat0/flow instructions) do not have a dst register.
Which was mostly harmless before, other than RA thinking it would need
a free register to write. (But nothing consumed it, so the value would
be immediately dead.) But this would cause more problems with postsched
which would see a bogus dependency.
Also, post-RA sched *does* need to see the dependency on the predicate
register.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
Originally these were nested functions, which worked nicely, giving us
the function of a local macro that was actual 'c' syntax (ie. not token
pasted macro). But these were converted to macros because clang doesn't
let us have nice gcc extensions.
Extract these back out into functions, before adding more things and
making the macros even more cumbersome.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
This way we can deal with it in one place, *after* all the blocks have
been scheduled. Which will simplify life for a post-RA sched pass.
This has the benefit of already taking into account nop's that legalize
has to insert for non-delay related reasons.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3569>
This patch is required to fix 11K+ vulkan CTS failures we were
getting with way_size_per_bank of 4 (see next patch).
Thanks to Sagar Ghuge and Jordan Justen for all the hard work of
debugging and testing.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Sagar Ghuge<sagar.ghuge@intel.com>
The old code would only break at stride boundaries if the stride was
less than 32B; otherwise it would just break every 32B. This commit
makes it break at stride boundaries and 32B boundaries (starting from
the last stride). This makes reading large vertex buffers in aubinator
much nicer.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3642>
GCC 10 added _mm256_storeu2_m128i.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91341
This patch fixes this build error with GCC 10.
In file included from src/gallium/drivers/swr/rasterizer/codegen/gen_knobs.cpp:39:
../src/gallium/drivers/swr/rasterizer/common/os.h:178:20: error: ‘void _mm256_storeu2_m128i(__m128i*, __m128i*, __m256i)’ redeclared inline without ‘gnu_inline’ attribute
178 | static INLINE void _mm256_storeu2_m128i(__m128i* hi, __m128i* lo, __m256i a)
| ^~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-redhat-linux/10/include/immintrin.h:51,
from /usr/lib/gcc/x86_64-redhat-linux/10/include/x86intrin.h:32,
from ../src/gallium/drivers/swr/rasterizer/common/os.h:107,
from src/gallium/drivers/swr/rasterizer/codegen/gen_knobs.cpp:39:
/usr/lib/gcc/x86_64-redhat-linux/10/include/avxintrin.h:1580:1: note: ‘void _mm256_storeu2_m128i(__m128i_u*, __m128i_u*, __m256i)’ previously defined here
1580 | _mm256_storeu2_m128i (__m128i_u *__PH, __m128i_u *__PL, __m256i __A)
| ^~~~~~~~~~~~~~~~~~~~
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3650>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3650>
Previously, i965/iris tried to reuse the currently programmed URB config
if it was good enough for BLORP, rather than reprogramming it each time.
However, this will make some things harder on Gen12+ and we've not seen
any performance impact from emitting URB more frequently in ANV.
This makes the blorp <-> driver interface a bit simpler on Gen7+ because
now all the driver has to do is to provide the L3$ config rather than
trying to hand off URB re-config to blorp.
Cc: "20.0" mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3454>
Previously, calling vkCmdCopyQueryPoolResults with the
VK_QUERY_RESULT_WITH_AVAILABILITY_BIT flag set the query result
field in the buffer to 0 if unavailable and the query result if
available. This was a misunderstanding of the Vulkan spec, and this
commit corrects the behavior to emitting a separate available
result in addition to the query result.
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3560>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3560>
Previously, calling vkGetQueryPoolResults with the
VK_QUERY_RESULT_WITH_AVAILABILITY_BIT flag set the query result
field in *pData to 0 if unavailable and the query result if
available. This was a misunderstanding of the Vulkan spec, and this
commit corrects the behavior to eriting a separate available result
in addition to the query result.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3560>
When setting the vertex buffers, lima calls
util_set_vertex_buffers_mask() to reference and copy buffers. That
function
function adds dst with start_slot internally, so lima should not offset
the destination address again.
This is discovered when comparing with other drivers, and fixed by
removing the extra offset in lima_set_vertex_buffers().
This fixes draws that get translated in u_vbuf, because u_vbuf adds
extra vertex buffers when translating.
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3620>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3620>
The gmem state is split out now, so it does not require synchronization.
But gmem rendering still accesses vsc state from the context.
TODO maybe there is a better way? For gen's that don't do vsc resizing,
this is probably easier.. but for a6xx there isn't really a great
position for more fine grained locking. Maybe it doesn't matter since
in practice the lock shouldn't be contended.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3503>
For the moment, everything is I915_EXEC_RENDER, so this isn't necessary.
But even should that change, I don't think we want to handle multiple
engines in this manner.
Nowadays, we have batch->name (IRIS_BATCH_RENDER, IRIS_BATCH_COMPUTE,
possibly an IRIS_BATCH_BLIT for blorp batches someday), which describes
the functional usage of the batch. We can simply check that and select
an engine for that class of work (assuming there ever is more than one).
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3613>
For all VMEM instructions, the resource constant is now
in operands[0]. For MIMG instructions, the sampler shares
operands[1] with write data in case this instruction writes memory.
Moving the VADDR to be the last operand for MIMG is the first step to
support Navi NSA encoding.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3602>
In a NIR generated using SPIR-V initializers to variables, copy
propagation can end up transforming
vec1 32 ssa_33 = deref_var &@1 (shared mat2x4)
vec1 32 ssa_35 = mov ssa_33
vec1 32 ssa_7 = deref_cast (mat2x4 *)ssa_35 (shared mat2x4) /* ptr_stride=0 */
into
vec1 32 ssa_33 = deref_var &@1 (shared mat2x4)
vec1 32 ssa_7 = deref_cast (mat2x4 *)ssa_33 (shared mat2x4) /* ptr_stride=0 */
Before the optimization, the "head" of a path of deref that uses ssa_7
will be the cast. After, it will be the variable in ssa_33. Since
the types are the same, this is a trivial cast that would be picked up
by nir_opt_deref.
If we need to compare such deref-chain after optimization with another
deref-chain for the same variable, the compare function will get
confused by the cast in the middle.
One alternative would be to add nir_opt_deref to places that compare
derefs, but that might not scale well, so skip the trivial casts when
generating the paths instead.
Motivated by the discussion in
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3047#note_383660.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3420>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3420>
process_block() will use this to determine the register demand of the
before the current instruction. Previously, it was filled with zeroes
which could result in process_block() only using the register demand
of after the current instruction.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3257>
Otherwise, code like this will be broken:
loop {
if (...) {
break;
} else {
break;
}
}
The continue_or_break block doesn't have any logical predecessors but it's
a logical predecessor of the header block. This liveness error breaks the
spiller in init_live_in_vars() (under "keep variables spilled on all
incoming paths") and eventually creates garbage reloads.
Fixes: 93c8ebfa ('aco: Initial commit of independent AMD compiler')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3257>
The operand isn't fixed to exec, which can mess up the spiller. This also
adds a new situation where a phi is needed.
Fixes dEQP-VK.ssbo.layout.random.descriptor_indexing.2 and an assertion
when compiling a Detroit: Become Human shader.
Fixes: 93c8ebfa ('aco: Initial commit of independent AMD compiler')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3257>
A shader might require vgpr spilling but not require sgpr spilling. In
that case, the spiller lowers the sgpr target by 5 which could mean sgpr
spilling is then required. Then the vgpr target has to be lowered to make
space for the linear vgprs. Previously, space wasn't make for the linear
vgprs.
Found while testing the spiller on the pipeline-db with a lowered limit
Fixes: a7ff1bb5b9
('aco: simplify calculation of target register pressure when spilling')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3257>
Because softpin block pools are made up of a set of BOs with different
maps, it was possible for a single state to end up straddling blocks.
To fix this, we pass a contiguous size to anv_block_pool_grow and it
ensures that the next allocation in the pool will have at least that
size.
We also add an assert in anv_block_pool_map to ensure we always get
contiguous maps. Prior to the changes to anv_block_pool_grow, the unit
tests failed with this assert. With this patch, the tests pass.
This was causing problems on Gen12 where we allocate the pages for the
AUX table from the dynamic state pool. The first chunk, which gets
allocated very early in the pool's history, is 1MB which was enough that
it was getting multiple BOs. This caused the gen_aux_map code to write
outside of the map and overwrite the instruction state pool buffer which
lead to GPU hangs.
Fixes: 731c4adcf9 "anv/allocator: Add support for non-userptr"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We intentionally throw away all but one BT block but then we set
cmd_buffer->bt_block to ANV_STATE_NULL instead of the one we hung on to.
This causes the command buffer to immediately re-emit STATE_BASE_ADDRESS
the first time a BT is needed for no good reason.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
When an error condition occurs during tu_create_cmd_buffer(), the
cmd buffer has already been added to a pool, so the cleanup code should
remove it.
Fixes a crash (assert in tu_device::tu_bo_finish()) in dEQP tests:
dEQP-VK.api.object_management.max_concurrent.command_buffer_primary
dEQP-VK.api.object_management.max_concurrent.command_buffer_secondary
due to pool attempting to destroy an invalid command buffer.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3572>
I find warnings to be very disruptive to my workflow (using emacs's "go to
next error" feature), and I periodically have to go clean up other
people's drivers to get back to finding my own warnings in the noise. I
know I'm not the only one doing something like this.
We don't want to enable -Werror by default in builds, since it means that
end users will have builds spuriously fail based on what compiler version
and opt flags they have compared to what the devs are using. However, it
is quite easy to have CI ensure that we at least don't introduce warnings
on the compiler version that it uses.
For now I've just enabled it on meson-i386 to cover a bunch of Mesa core
and get us started on ratcheting up warnings-cleanliness in the tree,
without me having to fix up all the drivers at once.
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3539>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3539>
In f132e0fddf, I attempted to allow BLORP to do CCS_E copies by using
the UNORM formats instead. However, the old BLORP bit-cast code could
only handle RGBA formats and asserted on anything other than UINT
formats. The reason we didn't catch this is because it only comes up on
Gen12 platforms which aren't in our normal CI yet.
Fixes: f132e0fddf "intel/blorp: Add support for CCS_E copies with..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3593>
Currently, fetching the partial results (VK_QUERY_RESULT_PARTIAL_BIT)
of an unavailable occlusion query via vkGetQueryPoolResults can
return invalid values. anv returns slot.end - slot.begin, but in the
case of unavailable queries, slot.end is still at the initial value
of 0. If slot.begin is non-zero, the occlusion count underflows to
a value that is likely outside the acceptable range of the partial
result.
This commit fixes vkGetQueryPoolResults by always returning 0 if the
query is unavailable and the VK_QUERY_RESULT_PARTIAL_BIT is set.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3586>
A cmdstream of size zero is invalid. But this can appear in various
places where we emit a pointer to state. This doesn't show up with
newer kernels (newer than v5.0) which use "softpin", but on earlier
kernels can result in:
[drm:msm_ioctl_gem_submit [msm]] *ERROR* invalid cmdstream size: 0
Since the pointer value doesn't matter in these cases, the easy solution
is just to not emit a cmds table entry in this case.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2805>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2805>
This reverts commit b60f5cbc15.
This fixes dmesg errors and X freezes:
[ 29.543096] amdgpu 0000:0c:00.0: No GEM object associated to handle 0x00000009, can't create framebuffer
[ 29.543103] amdgpu 0000:0c:00.0: No GEM object associated to handle 0x00000009, can't create framebuffer
v2: (Francisco Jerez)
- Drop vec4 changes.
- Handle explicit acc0 operand and implicit one.
- Make sure instruction is SIMD16, prediction is off and default mask
control set to true.
v3: (Francisco Jerez)
- Clear accumulator only when it's written.
- Use BRW_MASK_DISABLE instead of true.
- Use correct width for brw_acc_reg().
- Fix last_inst_offset.
v4: (Francisco Jerez)
- Don't check for last instruction for accummulator write.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3376>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3376>
Until now, embedded constants were printed as all 32 bits integer or
floats, but the compiler can pack constant from different types if
severa instructions with different reg_mode and native type refer to
the constant register. Let's implement something smarter so users don't
have to do a manual conversion when looking at a trace.
Note that 8-bit constants are not decoded yet, as we're not sure how
the writemask is encoded in that case.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3536>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3536>
We were applying row pitch constraint of CCS surfaces to linear
surfaces. But CCS is only supported in linear tiling under some
condition (more on that in the following commit). So let's drop that
requirement for now.
Fixes a bunch of crucible assert where the byte size of a linear image
is expected to be similar to the byte size of buffer for the same
extent in the following category :
func.miptree.r8g8b8a8-unorm.aspect-color.view-2d.*download-copy-with-draw.*
v2: Move restriction to isl_calc_tiled_min_row_pitch()
v3: Move restrinction to isl_calc_row_pitch_alignment() (Jason)
v4: Update message (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 07e16221d9 ("isl: Round up some pitches to 512B for Gen12's CCS")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3551>
Gen12 does not support RENDER_SURFACE_STATE::SurfaceArray = true &&
RENDER_SURFACE_STATE::Depth = 0. SurfaceArray can only be set to true
if Depth >= 1.
We workaround this limitation by adding the max(value, 1) snippet in
the shaders on the 3 components for texture array sizes.
Tested on Gen9 with the following Vulkan CTS tests :
dEQP-VK.image.image_size.2d_array.*
v2: Drop debug print (Tapani)
Switch to GEN:BUG instead of Wa_
v3: Fix dEQP-VK.image.image_size.1d_array.* cases (Lionel)
v4: Fix dEQP-VK.glsl.texture_functions.query.texturesize.* cases
(Missing tex_op handling) (Lionel)
v5: Missing break statement (Lionel)
v6: Fixup comment (Tapani)
v7: Fixup comment again (Tapani)
v8: Don't use sample_dim as index (Jason)
Rename pass
Simplify control flow
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com> (v7)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3362>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3362>
Some of the smaller bit-size formats which support CCS_E don't have a
UINT representative in their compression class. However, we should be
able to use UNORM just fine and still get bit-exact copies. We just
have to do a conversion to/from UNORM when we bitcast.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3554>
Those were renamed/merged some time ago but it turns out that
ppir_op_undef can't be shared.
It was being used for undefined ssa operations and for read-before-write
operations that may happen to e.g. uninitialized registers (non-ssa)
inside a loop.
We really don't want to reserve a register for the undef ssa case, but
we must reserve and allocate register for the unitialized register case
because when it happens inside a loop it may need to hold its value
across iterations.
This dummy node might be eliminated with a code refactor in ppir in case
we are able to emit the write and allocate the ppir_reg before we emit
the read. But a major refactor we need this to keep this code to avoid
apparent regressions with the new liveness analysis implementation.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3502>
nir can output writes to dead registers when expanding vec4 operations
to non-ssa registers. In that case, some components of the vec4 may be
assigned but never read. These are also not currently removed by a nir
dead code elimination pass as they are not ssa.
In order to prevent regalloc from allocating a live register for this
operation, an interference must be assigned to it during liveness
analysis.
This workaround may be removed in the future if the assignments to dead
components can be removed earlier in ppir or nir.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3502>
The previous way we were attempting to handle AUX tables on TGL-LP was
very GL-like. We used the same aux table management code that's shared
with iris and we updated the table on image create/destroy. The problem
with this is that Vulkan allows multiple VkImage objects to be bound to
the same memory location simultaneously and the app can ping-pong back
and forth between them in the same command buffer. Because the AUX
table contains format-specific data, we cannot support this ping-pong
behavior with only CPU updates of the AUX table.
The new mechanism switches things around a bit and instead makes the aux
data part of the BO. At BO creation time, a bit of space is appended to
the end of the BO for AUX data and the AUX table is updated in bulk for
the entire BO. The problem here, of course, is that we can't insert the
format-specific data into the AUX table at BO create time.
Fortunately, Vulkan has a requirement that every TILING_OPTIMAL image
must be initialized prior to use by transitioning the image from
VK_IMAGE_LAYOUT_UNDEFINED to something else. When doing the above
described ping-pong behavior, the app has to do such an initialization
transition every time it corrupts the underlying memory of the VkImage
by using it as something else. We can hook into this initialization and
use it to update the AUX-TT entries from the command streamer. This way
the AUX table gets its format information, apps get aliasing support,
and everyone is happy.
One side-effect of this is that we disallow CCS on shared buffers.
We'll need to fix this for modifiers on the scanout path but that's a
task for another patch. We should be able to do it with dedicated
allocations.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3519>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3519>
All they do now is take a size, align, and flags and figure out which
heap to allocate in. All of the actual code to deal with the BO is in
anv_allocator.c. We want to leave anv_vma_alloc/free in anv_device.c
because it deals with API-exposed heaps so it still makes sense to have
it there.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3519>
This commit moves it in with all the other cache invalidation operations
as if it were done by PIPE_CONTROL even though it's a pair of register
writes. This means we only have to write the GFX_AUX_TABLE_BASE_ADDR
register once at device initialization instead of every invalidate.
Invalidates are now a single LRI instead of two.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3519>
This breaks add_mapping() into three pieces:
1. get_aux_entry() adds AUX-TT pages as needed and returns the
L1 entry index, L1 entry address, and L1 entry map.
2. gen_aux_map_format_bits_for_isl_surf() computes the format-
specific information that goes in the AUX-TT entry.
3. add_mapping() is a lot dumber function that now just adds the
requested mapping with the requested format bits.
This lets us break out some additional helpers in the API which we want
to use for more direct AUX-TT management in ANV.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3519>
Pass down stencil data from the subpass attachment like we do
elsewhere. Only stencil attachments will make use of it.
Fixes warnings like
../src/intel/vulkan/genX_cmd_buffer.c: In function ‘cmd_buffer_begin_subpass’:
../src/intel/vulkan/genX_cmd_buffer.c:4656:41: warning: ‘target_stencil_layout’ may be used uninitialized in this function [-Wmaybe-uninitialized]
4656 | att_state->current_stencil_layout = target_stencil_layout;
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3557>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3557>
Previously, we set aux_usage=ISL_AUX_USAGE_NONE when we really meant
CCS_D. This sort-of made sense before we had anv_layout_to_aux_usage
but now that we have that helper. However, in our more modern aux
tracking model, all aux usage goes through anv_layout_to_* and we're
better off making the meaning of anv_image::planes[]::aux_usage be
AUX_USAGE_NONE if and only if there is no compression.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3556>
The instruction encoding for SENDS changed on Gen12 and it now supports
embedding the entire extended message descriptor in the instruction if
it's an immediate. Stop falling back to doing an indirect SEND just
because we had something in [15:12] of ex_desc.ud.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3547>
This commit makes two changes:
1. We set pending_pipe_bits instead of emitting PIPE_CONTROL directly
for the flush at the end of cmd_buffer_begin_subpass.
2. Because BLORP ops such as vkCmdClearAttachments may come in the
middle of a render pass, we have to also flag the need for a cache
flush after the blorp op.
Fixes: 185630c6bc "anv/blorp: Do the gen11 BTI flush"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3547>
../src/panfrost/pandecode/decode.c: In function ‘pandecode_compute_fbd’:
../src/panfrost/pandecode/decode.c:789:35: warning: taking address of packed member of ‘struct mali_compute_fbd’ may result in an unaligned pointer value [-Waddress-of-packed-member]
789 | pandecode_u32_slide(num, s->unknown ## num, ARRAY_SIZE(s->unknown ## num))
| ~^~~~~~~~~
../src/panfrost/pandecode/decode.c:800:9: note: in expansion of macro ‘SHORT_SLIDE’
800 | SHORT_SLIDE(1);
| ^~~~~~~~~~~
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3543>
It's required to insert 1 wait state if the dst VGPR of any v_interp_*
is followed by a read with v_readfirstlane or v_readlane to fix GPU
hangs on GFX6. Note that v_writelane_* is apparently not affected.
This hazard isn't documented anywhere but AMD confirmed it.
This fixes a GPU hang with the texturemipmapgen Sascha demo on GFX6.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3533>
Unlike on an immidiate-mode renderer, Turnip only renders tiles on
vkCmdEndRenderPass. As such, we need to track all queries that were
active in a given render pass and defer setting the available bit
on those queries until after all tiles have rendered.
This commit adds a draw_epilogue_cs to tu_cmd_buffer that is
executed as an IB at the end of tu_CmdEndRenderPass. We then emit
packets to this command stream that update the availability bit of a
given query in tu_CmdEndQuery.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3279>
Mostly a translation of freedreno's implementation of glEndQuery for
GL_SAMPLES_PASSED query objects with a slight modification to set the
availability bit of the query bo (slot->available) if the query was
not ended inside a render pass.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3279>
General structure is inspired by anv's implementation in genX_query.c.
We define a packed struct that tracks sample count at the beginning of
the query and at the end; the result of the occlusion query is then
slot->end - slot->begin.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3279>
Most places we actually know the usage and can provide it. There are
two exceptions to this:
1. We pass 0 into get_blorp_surf_for_anv_image when we use
ANV_IMAGE_LAYOUT_EXPLICIT_AUX because anv_layout_to_aux_usage is
never actually called so it doesn't matter.
2. We pass 0 into anv_layout_to_aux_usage in transition_color_buffer.
However, the coming commits which will begin using the usage
parameter only care about depth.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2605>
Rather than looking at the aux usage, we look at the isl_aux_state which
provides us with more detailed information. This commit adds a couple
helpers to isl which let us quickly determine if we have valid depth/hiz
on the initial layout and if we need valid depth/hiz for the final
layout.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2605>
When copying to an RGB surface, we treat it as an R only one of three
times the width, which may end up being larger than the maximum size
supported by the hardware and so it hits the shrink path. This forced
both source and destination surfaces to be shrunk, even though it's not
necessary for the former, and may even hit some assertions in some
cases, such as the surface being compressed.
Fixes several tests under dEQP-VK.api.copy_and_blit.core.image_to_image.dimensions.*
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3422>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3422>
This allows removing superfluous s_cselect instructions
that come from turning booleans into 64-bit vector condition.
v2 by Daniel Schürmann:
- Make the code massively simpler
v3 by Timur Kristóf:
- Fix regressions, make it work in wave32 mode
- Eliminate extra moves by not always using the SCC definition
- Use s_absdiff_i32 for uniform XOR
- Skip the transformation for uncommon or invalid instructions
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3450>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3450>
For GS copy shaders, whether we want to do exports is conditional. By
explicitly marking the end blocks, we can mark an IF's then branch as an
export block and ensure that's where the assembler inserts null exports.
v6: only fixup exports in the end block, like before
v8: simplify some code
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2421>
v2: implement GFX10
v3: rebase
v7: rebase after shader args MR
v8: fix gs_vtx_offset usage on GFX9/GFX10
v8: use unreachable() instead of printing intrinsic
v8: rename output_state to ge_output_state
v8: fix formatting around nir_foreach_variable()
v8: rename some helpers in the scheduler
v8: rename p_memory_barrier_all to p_memory_barrier_common
v8: fix assertion comparing ctx.stage against vertex_geometry_gs
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2421>
This is not always ->rgbBits, because there are cases where that could
be 32 but we're (legally) bound to a depth-24 pixmap. The important
thing to have match here is the actual server-side notion of depth. You
can look this up (at modest expense) from the xlib visual info if the
fbconfig has a visual. But it might not, so if not, fetch it (at
slightly greater expense) from XGetGeometry. Do this at GLX drawable
creation so you don't have to do it on the SwapBuffers path.
Apparently this fixes glx/glx-swap-singlebuffer, which is unintentional
but quite pleasant.
Fixes: mesa/mesa#2291
Fixes: 90d58286 ("drisw: Fix and simplify drawable setup")
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3305>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3305>
This gets a lot of the hard code converted over to the new macros,
resulting in (I feel) much more readable code with
LESS_SHOUTING_ABOUT_THE_REG(). I decided to consistently put the reg on
its own line, so that all the register names line up.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3455>
This introduces some minor unpacking of the temporary fd_reg_pair structs
to code that previously was packing a whole register field.
In the pack wrapper in tu_cs.h, I added some explanatory docs, dropped the
relocs handling since we don't need it, and removed the extra regs[] in
the __ONE_REG() macro (which was causing gcc's optimizer to fall on its
face in my release build).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3455>
Sometimes you want to zero out an address by supplying a NULL BO, but
without this we would end up only emitting one dword. Increases size of
fd6_gmem.o by .8%, though it's not clear to me why (no obvious terrible
codegen happening)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3455>
legalize is computing a lot of state that goes in the variant, let's just
store it directly instead of passing pointers around. This leaves
max_bary in place, which is doing some surprising work (overwriting the
original total_in in some cases).
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3494>
A few hash_table users roll their own integer hash functions which
call _mesa_hash_data to perform the hashing which ultimately calls
into XXH32 with a dynamic key length. When using small keys with a
constant size the hash rate can be greatly improved by inlining
XXH32 and providing it a constant key length, see:
https://fastcompression.blogspot.com/2018/03/xxhash-for-small-keys-impressive-power.html
Additionally, this patch removes calls to _mesa_key_hash_string and
makes them instead call _mesa_has_string directly, matching the new
integer hash functions.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3475>
For most key sizes, xxhash outperforms fnv1a's hash rate substantially (bug
2153). In particular, the V3D driver hashes multiple ~200 byte keys as part
of the shader cache lookup which can easily eat up 10-20% of the runtime on
the Raspberry Pi. Swapping over to xxhash drops this to ~1% of the runtime.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3475>
When the amdgpu_screen_winsys uses the same FD as the amdgpu_winsys
(which is always the case for the first amdgpu_screen_winsys), we can
just use bo->u.real.kms_handle.
v2:
* Also only create the kms_handles hash table if the
amdgpu_screen_winsys fd is different from the amdgpu_winsys one.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3202>
The assumption being that KMS handles are only retrieved for relatively
few BOs, so hash tables should be efficient both in terms of performance
and memory consumption.
We use the address of struct amdgpu_winsys_bo as the key and its
kms_handle field (the KMS handle valid for the DRM file descriptor
passed to amdgpu_device_initialize) as the hash value.
v2:
* Add comment above amdgpu_screen_winsys::kms_handles (Pierre-Eric
Pelloux-Prayer)
v3:
* Protect kms_handles hash table with amdgpu_winsys::sws_list_lock
mutex.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3202>
This introduces:
- nir_texop_fragment_mask_fetch (fetch a fragment mask from a
compressed multisampled color surface)
- nir_texop_fragment_fetch (fetch a color fragment for a
particular sample at corresponding fragment mask index).
These two texture operations are necessary for implementing
SPV_AMD_shader_fragment_mask.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3304>
Pretty straightforward: Port texture descriptor code from freedreno, fill
in alignment limits from closed vk, and tu_cmd_buffer.c was already
uploading the texture descriptor.
This doesn't implement storage texel buffers (required in the compute
pipeline) yet, since those will need an IBO descriptor for the store path.
Still, making the load path be connected to the texture descriptor won't
hurt.
Part of #2237
Fixes dEQP-VK.binding_model.shader_access.primary_cmd_buf.uniform_texel_buffer.*
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3522>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3522>
I noticed that we can do better for these kinds of comparisons while
working on the lowering for iadd_sat@64 and isub_sat@64. This
eliminated 11 instruction from the fs-addSaturate-int64.shader_test.
My hope is that this will improve the run-time of int64 tests on Ice
Lake. I have no data to support or refute this.
Unsurprisingly, no changes on shader-db.
v2: Condition the min and max patterns with nir_lower_minmax64.
Suggested by Caio. Very long discussion in the MR. :)
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
Driver supports integer multiplication between a 32-bit integer and a
16-bit integer. If the second operand is 32-bits, the upper 16-bits are
ignored, and the low 16-bits are possibly sign extended as necessary.
Iris will eventually enable this. Not sure about other drivers.
v2: Add default value to u_screen.c. Suggested by Caio.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
v2: Rebase on 272e927d0e ("nir/spirv: initial handling of OpenCL.std
extension opcodes")
v3: Add missing SpvOpUCountTrailingZerosINTEL case to switch in
vtn_handle_body_instruction. Remove stray semicolon in
vtn_nir_alu_op_for_spirv_opcode. Use umin instead of umax for
SpvOpUCountTrailingZerosINTEL "lowering" in vtn_handle_alu.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
v2: Remove smashing type to D for nir_op_irhadd. Caio noticed it was
odd, and removing it fixes an assertion failure in the crucible
func.shader.averageRounded.int64_t test (because the source should be
W).
v3: Emit BRW_OPCODE_MUL directly for nir_op_umul_32x16 and
nir_op_imul_32x16. Suggested by Curro.
v4: Smash types of MUL instruction generated for nir_op_umul_32x16 and
nir_op_imul_32x16. With this change, I get the same assembly now as I
did with v2.
v5: Remove support for pre-Gen7. The integer multiply path was
incorrect, and, since the extension isn't enabled pre-Gen7, there's no
way to test it.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
v2: Add a big comment explaining the [IU]SUB_SAT lowering. Suggested by
Caio.
v3: Use get_fpu_lowered_simd_width in get_lowered_simd_width. Suggested
by Ken on IRC.
v4: Fix a typo in a comment. Noticed by Caio.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
v2: Move the check to fs_visitor::lower_integer_multiplication.
Previously the cases where lowering was skipped, the original
instruction was removed by fs_visitor::lower_integer_multiplication.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
v2: Rebase on 272e927d0e ("nir/spirv: initial handling of OpenCL.std
extension opcodes")
v3: Add a new lower_usub_sat64 flag that only applies to the 64-bit
version of the nir_op_usub_sat instruction.
v4: Also enable the lowering when nir_lower_iadd64 is set.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> [v3]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
v2: Rebase on 272e927d0e ("nir/spirv: initial handling of OpenCL.std
extension opcodes")
v3: Add a new lower_hadd64 flag that only applies to the 64-bit versions
of the instructions.
v4: Also enable the lowering when nir_lower_iadd64 is set.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> [v3]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
uctz isn't added because it will implemented in the GLSL path and the
SPIR-V path using other pre-existing instructions.
v2: Avoid signed integer overflow for uabs_isub(0, INT_MIN). Noticed by
Caio.
v3: Alternate fix for signed integer overflow for abs_sub(0, INT_MIN).
I tried the previous methon in a small test program with -ftrapv, and it
failed.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
v2: Re-write iadd64_saturate and isub64_saturate to avoid undefined
overflow behavior. Also fix copy-and-paste bug in isub64_saturate.
Suggested by Caio.
v3: Avoid signed integer overflow for abs_sub(0, INT_MIN). Noticed by
Caio.
v4: Alternate fix for signed integer overflow for abs_sub(0, INT_MIN).
I tried the previous methon in a small test program with -ftrapv, and it
failed.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/767>
Gen4/5's rounding instructions operate differently than later Gens'.
They all return the floor of the input and the "Round-increment"
conditional modifier answers whether the result should be incremented by
1.0 to get the appropriate result for the operation (and thus its
behavior is determined by the round opcode; e.g., RNDZ vs RNDE).
Since this requires a second instruciton (a predicated ADD) that
consumes the result of the round instruction, the round instruction
cannot write its result directly to the (write-only) message registers.
By emitting the ADD in the generator, the backend thinks it's safe to
store the round's result directly to the message register file.
To avoid this, we move the emission of the ADD instruction to the NIR
translator so that the backend has the information it needs.
I suspect this also fixes code generated for RNDZ.SAT but since
Gen4/5 don't support GLSL 1.30 which adds the trunc() function, I
couldn't write a piglit test to confirm. My thinking is that if x=-0.5:
sat(trunc(-0.5)) = 0.0
But on Gen4/5 where sat(trunc(x)) is implemented as
rndz.r.f0 result, x // result = floor(x)
// set f0 if increment needed
(+f0) add result, result, 1.0 // fixup so result = trunc(x)
then putting saturate on both instructions will give the wrong result.
floor(-0.5) = -1.0
sat(floor(-0.5)) = 0.0
// +1 increment would be needed since floor(-0.5) != trunc(-0.5)
sat(sat(floor(-0.5)) + 1.0) = 1.0
Fixes: 6f394343b1 ("nir/algebraic: i2f(f2i()) -> trunc()")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2355
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3459>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3459>
Each instruction bundle can contain up to 16 constant bytes. The meaning
of those byte is instruction dependent: it depends on the instruction
native type (int, uint or float) and the instruction reg_mode (8, 16, 32
or 64 bit). Those different layouts can be exposed as a union to
facilitate constants manipulation.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3478>
On platforms without mincore(), _eglPointerIsDereferencable()
currently just checks whether p != NULL. This is not sufficient:
In the Wayland platform code (i.e., in get_wl_surface_proxy()),
_eglPointerIsDereferencable() is called on the version field
of `struct wl_egl_window` which is 3 on current versions of
Wayland. This causes a segfault when trying to dereference p.
Fix this behavior by assuming that the first page of the
process is never dereferencable.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3103>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3103>
Specifically, execution size, register file, and register type. I did
not add validation for vertical stride and width because I don't believe
it's possible to have an otherwise valid instruction with an invalid
vertical stride or width, due to all of the other regioning
restrictions.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>
16-bit immediates need to be replicated through the 32-bit immediate
field, so we should never see one that isn't.
This does happen however in the fuzzer unit test, so returning false
allows the fuzzer to reject this case.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>
Previously we were sharing tables between generations that were nearly
identical (i.e., Gen8 3-src adds HF support) and used a small bit of
code to handle the differences. This is kind of a mess if you want to
reject 64-bit types on platforms that don't support 64-bit types, so
split the tables, allowing each generation's table to list exactly what
it supports.
Acked-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>
Two of the tests emit instructions with MRF destinations, and MRFs
aren't present on Gen7+. I think we were just lucky that this didn't
cause a problem earlier since we were running the tests on Gen7-9.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>
Since the platforms don't support align1 3-src instructions, the
contents of these operands are not going to be meaningful. Just don't
print them to avoid hitting some assertions in brw_inst functions.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>
When there's only one hardware thread (i.e. the dispatch width greater
or equal to the workgroup size), there's no need to use a barrier to
ensure all the invocations reach the same point in the shader, because
they are already running lock-step.
Results for SKL running Iris for shader-db tests with compute shaders
total sends in shared programs: 18361 -> 18339 (-0.12%)
sends in affected programs: 904 -> 882 (-2.43%)
helped: 9
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 2.44 x̃: 2
helped stats (rel) min: 0.84% max: 21.43% x̄: 7.82% x̃: 2.67%
95% mean confidence interval for sends value: -3.31 -1.58
95% mean confidence interval for sends %-change: -14.67% -0.97%
Sends are helped.
Shaders from Aztec Ruins, Car Chase, Manhattan and DeusEx are helped.
Results for ICL and TGL are similar to SKL.
Results for BDW are similar to SKL except for DeusEx shader that has a
workgroup size 16 but in BDW picks the SIMD8.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3226>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3226>
When there's only one hardware thread (i.e. the dispatch width greater
or equal to the workgroup size), there's no need to synchronize shared
memory access (SLM) since all the requests from a single thread are
already synchronized. In such case, we just add a scheduling fence.
To be able to identify that case for all platforms, move the handling
of platforms prior to Gen11 (which don't have a separate SLM fence)
after the optimization.
Results for SKL running Iris for shader-db tests with compute shaders
total sends in shared programs: 18395 -> 18361 (-0.18%)
sends in affected programs: 938 -> 904 (-3.62%)
helped: 9
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 3.78 x̃: 4
helped stats (rel) min: 1.56% max: 26.32% x̄: 10.33% x̃: 2.60%
95% mean confidence interval for sends value: -4.85 -2.71
95% mean confidence interval for sends %-change: -19.12% -1.54%
Sends are helped.
Shaders from Aztec Ruins, Car Chase, Manhattan and DeusEx are helped.
Results for ICL and TGL are similar to SKL.
Results for BDW are similar to SKL except for DeusEx shader that has a
workgroup size 16 but in BDW picks the SIMD8.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3226>
Like a SHADER_OPCODE_MEMORY_FENCE but doesn't doesn't generate any
assembly code.
Will be used when the compiler shouldn't reorder certain instructions
but there's no need to generate code for the HW to do it -- as the
ordering will be guaranteed by other means.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3226>
The closed GL driver doesn't use UBWC on any storage images. It does tile
mostly (skipping tiling on writeonly images, it seems), but for freedreno
we've been enabling tiling in all cases and it's fine. We do need to
disable UBWC, as tests fail otherwise and just plugging in the equivalent
UBWC regs like we were setting up a texture isn't enough.
Fixes dEQP-VK.image.atomic_operations.*
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3433>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3433>
So far this doesn't handle the texture state-based storage image access
loads, and doesn't support descriptor arrays (same as SSBOs). The texture
side is more tricky, since we have another remapping table to work around.
This is enough to get some of dEQP-VK.image.atomic_operations.* working.
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3433>
u_screen will return 0 for all of these, which means that this is one
less driver to see in git grep when I'm checking who exposes a cap.
The exception is the texel/gather offsets and stream output
components, which will not be exposed since we don't expose the
corresponding GLSL version.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3493>
Just make it be all SSBOs then all storage images. The remapping table
was there to make it so that the big gap present from gallium's atomic
lowering would get cleaned up, but that's no longer case. The table has
made it very hard to support Vulkan storage images, so it's time for it to
go.
This does mean that an SSBO/IBO that is only loaded (or size-queried) will
now occupy a slot in the table where it wouldn't before. This seems like
a minor cost compared to being able to drop this much logic.
With the remapping table gone, SSBO array handling for turnip just falls
out.
Fixes many array cases of
dEQP-VK.binding_model.shader_access.primary_cmd_buf.storage_buffer.*
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jonathan Marek <jonathan@marek.ca> (turnip)
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3240>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3240>
The arguments passed in were:
- prog->info.num_ssbos
- prog->nir->info.num_ssbos
- arbitrary values for standalone compilers
The num_ssbos should match between the prog's info and prog->nir's info
until this lowering happens.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3240>
We carve out half the SSBO space for atomics, and we were just binding
them way up there. freedreno was then using a remapping table to map the
sparse buffer index back down, since space in the descriptor array is a
shared resource that may limit parallelism. That remapping table
generated inside of the ir3 compiler is getting thoroughly in the way of
implementing vulkan descriptor sets.
We will be able to get rid of the freedreno's remapping table, and
hopefully save shared resources on other hardware, by packing the atomics
tightly above the SSBOs (like i965 does). We already rebind the shader
buffers on program change if either the old or new program has SSBOs or
ABOs, so this doesn't necessarily increase the program state change cost
(the only cost increase I can come up with is if you're using the same
atomic counter without rebinding it across changes of programs with
varying SSBO counts, meaning it would now bounce around index space).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3240>
Gallium arbitrarily (it seems) put atomics below SSBOs, resulting in a
bunch of extra index management, and surprising shader code when you would
see your SSBOs up at index 16. It makes a lot more sense to see atomics
converted to SSBOs appear as magic high numbers.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3240>
There's a lot going on here (it's a ton of commits squashed together
since otherwise this would be impossible to review...)
1. We have a fast path for linear->tiled for whole (aligned) tiles, but we
have to use a slow path for unaligned accesses. We can get a pretty
major win for partial updates by using this slow path simply on the
borders of the update region, and then hit the fast path for the
tile-aligned interior. This does require some shuffling.
2. Mark the LUTs constant, which allows the compiler to inline them,
which pairs well with loop unrolling (eliminating the memory accesses
and just becoming some immediates.. which are not as immediate on
aarch64 as I'd like..)
3. Add fast path for bpp1/2/8/16. These use the same algorithm and we
have native types for them, so may as well get the fast path.
4. Drop generic path for bpp != 1/2/8/16, since these formats are
generally awful and there's no way to tile them efficienctly and
honestly there's not a good reason too either. Lima doesn't support any
of these formats; Panfrost can make the opinionated choice to make them
linear.
5. Specialize the unaligned routines. They don't have to be fully
generic, they just can't assume alignment. So now they should be nearly
as fast as the aligned versions (which get some extra tricks to be even
faster but the difference might be neglible on some workloads).
6. Specialize also for the size of the tile, to allow 4x4 tiling as well
as 16x16 tiling. This allows compressed textures to be efficiently tiled
with the same routines (so we add support for tiling ASTC/ETC textures
while we're at it)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Vasily Khoruzhick <anarsoul@gmail.com> #lima on Mali400
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3414>
Having VERTEX_BUFFER_STATE.BufferSize greater than the size of
a bound vertex buffer allows shader to read uninitialized vertex
attributes from BO, instead of allowing hardware to return zeroes
on out-of-bounds access.
OpenGL spec "6.4 Effects of Accessing Outside Buffer Bounds" says:
"Robust buffer access can be enabled by creating a context with robust access
enabled through the window system binding APIs. When enabled, any command
unable to generate a GL error as described above, such as buffer object accesses
from the active program, will not read or modify memory outside of the data
store of the buffer object and will not result in GL interruption or termination.
Out-of-bounds reads may return values from within the buffer object or zero
values."
Fixes three webgl tests:
conformance/rendering/out-of-bounds-array-buffers.html
conformance2/rendering/out-of-bounds-index-buffers-after-copying.html
conformance2/rendering/element-index-uint.html
See #1996
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3427>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3427>
Amend fails and skips lists basing on lists from Andreas Baierl,
shard mali400 job across two devices since it takes close to 10min
and rename jobs to lima-mali400-test and lima-mali450-test.
Also don't set MESA_GLES_VERSION_OVERRIDE=3.0 for lima since we don't support
GLES 3.0 and lower DEQP_PARALLEL to 3 for jobs on H3.
Keep mali400 jobs disabled atm since they take too much time to complete
and we also get some unexplicable failures in dEQP-GLES2.functional.default_vertex_attrib.*
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3163>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3163>
deqp-runner.sh uses it to determine whether we split job across multiple
devices and if we do what's the node index.
With this change we now can set 'parallel: N' in job description if we want
to split the job.
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3163>
When VK_DESCRIPTOR_TYPE_SAMPLER is provided, it doesn't need to be
counted as a buffer count. Otherwise it leads to mismatch of allocated
buffer size, hitting VK_ERROR_OUT_OF_POOL_MEMORY finally.
Fixes: c39afe68f0
Also fixes amber tests:
./tests/cases/address_modes_float.amber
./tests/cases/address_modes_int.amber
./tests/cases/magfilter_linear.amber
./tests/cases/magfilter_nearest.amber
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Having to always pull the physical device from the instance has been
annoying for almost as long as the driver has existed. It also won't
work in a world where we ever have more than one physical device. This
commit adds a new field called "physical" to anv_device and switches
every location where we use device->instance->physicalDevice to use the
new field instead.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3461>
Only non-indexed triangle lists and strips are supported. This increases
performance if there is something to cull.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
- Use conservative late alloc when the number of CUs <= 6.
- Move the late alloc GS register to the GS shader state, so that it can be
tuned for NGG culling.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
The value is not changed. I just use a different way to compute it.
The value will vary with NGG culling.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
This decreases VGPR usage and will allow us to merge some IF blocks
in shaders.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
We didn't use the correct LDS pointer, though it probably doesn't matter,
because I think that nothing else is using LDS here.
This commit makes it consistent with all other esgs_ring use.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Some formats, in particular YCbCr formats and ASTC have additional
restrictions. We already whack ASTC formats to RGBA32_UINT because the
hardware doesn't allow LINEAR with ASTC. However, we need to fix YCbCr
formats as well because they come with alignment restrictions that we
can't guarantee are satisfied. We're using blorp_copy to do the copies
so we may as well just stomp formats for everything.
Fixes: b24b93d584 "anv: enable VK_KHR_sampler_ycbcr_conversion"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3460>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3460>
This fixes a crash in LZDoom where over 16 shader variants are needed
for a few shaders in some maps, and should also save a few kilobytes
of RAM as most of the time only one or two variants of the 8 previously
allocated are actually needed.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Previously, the array bo_access->readers was only cleared when there
were no unsignaled fences, which in some situations never happened.
That resulted in the array having thousands of NULL pointers, but only
a handful of active readers.
With this patch, all the unsignaled readers are moved to the front of
the array, effectively building a new array only containing the active
readers in-place. This results in the readers array usually only having
a couple of elements.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3419>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3419>
The dl-tag isn't a neat tool for defining sub-headings, it's a semantic
tool for defining definitions and their meaning. Let's insetad use
normal sub-headings instead.
To make the last few paragraphs stand out from the above, let's add a
sub-heading for those as well.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3443>
This lets us move PC_PRIMITIVE_CNTL into the rasterizr stateobj, rather
than unconditionally emitting it directly in the cmdstream on every
draw.
This also starts adding some tracking about previous draw state, so that
following patches can limit some of the register writes we currently
emit on every draw.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3435>
The overhead does seem to matter when you have a high enough # of draw
calls that effect few bins/pixels, because these writes would happen
unconditionally (ie. not part of a state-group).
Possibly we could keep these if we moved them into a state-group so the
register writes would be no-ops on bins with no geometry. OTOH I
usually end up adding in a WFI when using them scratch reg values to
track down a crash. (So add a WFI to mitigate the annoyance of needing
to use a debug build to get scratch regs to locate the position of a
crash/hang in the cmdstream.)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3435>
This involves permuting the registers of barycentric vectors to have
the standard X[0-n] Y[0-n] layout at NIR translation time.
Barycentrics are converted to the format expected by the PLN
instruction in the lower_barycentrics() pass run after the
optimization loop.
Main reason is correctness of SIMD32 fragment shaders. The
shuffle_from_pln_layout() and shuffle_to_pln_layout() helpers used
during NIR translation are busted for SIMD32. This leads to serious
corruption at present with INTEL_DEBUG=do32, especially on Gen11+
where these helpers are hit more frequently due to the lack of a
hardware PLN instruction.
Of course one could have chosen to fix those helpers instead, but
there is another far more subtle issue that was reported during review
of the SIMD32 fragment shader codegen changes: The SIMD splitting pass
currently handles SIMD32 barycentric vectors as if they had the
standard X[0-n] Y[0-n] layout, even though they are interleaved for
the PLN instruction, which causes incorrect execution masks to be
applied to the MOVs unzipping barycentric vectors in cases where a
LINTERP instruction occurs under non-uniform control flow.
I'm not aware of any conformance regressions due to the latter issue
at present, but for our peace of mind let's move the conversion to the
PLN layout into the lower_barycentrics() pass run after
lower_simd_width().
This leads to the following shader-db improvements (including SIMD32
shaders) in combination with the previous back-end preparation changes
-- Without them (especially the copy propagation changes) this would
lead to a massive number of regressions. On ICL:
total instructions in shared programs: 20662316 -> 20466903 (-0.95%)
instructions in affected programs: 10538474 -> 10343061 (-1.85%)
helped: 68775
HURT: 6
total spills in shared programs: 8938 -> 8748 (-2.13%)
spills in affected programs: 376 -> 186 (-50.53%)
helped: 9
HURT: 5
total fills in shared programs: 8965 -> 8663 (-3.37%)
fills in affected programs: 965 -> 663 (-31.30%)
helped: 9
HURT: 6
LOST: 146
GAINED: 43
On SKL:
total instructions in shared programs: 18725867 -> 18614912 (-0.59%)
instructions in affected programs: 3876590 -> 3765635 (-2.86%)
helped: 27492
HURT: 2
LOST: 191
GAINED: 417
On SNB:
total instructions in shared programs: 14573613 -> 13980646 (-4.07%)
instructions in affected programs: 5199074 -> 4606107 (-11.41%)
helped: 29998
HURT: 0
LOST: 21
GAINED: 30
Results are somewhat less impressive but still significant without
SIMD32 fragment shaders enabled. On ICL:
total instructions in shared programs: 16148728 -> 16061659 (-0.54%)
instructions in affected programs: 6114788 -> 6027719 (-1.42%)
helped: 42046
HURT: 6
total spills in shared programs: 8218 -> 8028 (-2.31%)
spills in affected programs: 376 -> 186 (-50.53%)
helped: 9
HURT: 5
total fills in shared programs: 8953 -> 8651 (-3.37%)
fills in affected programs: 965 -> 663 (-31.30%)
helped: 9
HURT: 6
LOST: 0
GAINED: 3
On SKL:
total instructions in shared programs: 14927994 -> 14926738 (-0.01%)
instructions in affected programs: 168850 -> 167594 (-0.74%)
helped: 711
HURT: 2
On SNB:
total instructions in shared programs: 10770538 -> 10734403 (-0.34%)
instructions in affected programs: 2702172 -> 2666037 (-1.34%)
helped: 17818
HURT: 0
All of the hurt shaders are either spilling slightly more or emitting
additional NOP instructions due to the SIMD16 POW workaround for
Gen8-9 combined with differences in scheduling.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The goal is to represent barycentrics with the standard vector layout
during optimization and particularly SIMD lowering. Instead of
emitting the barycentric layout conversions at NIR translation time,
do it later as a lowering pass. For the moment this is only applied
to PI messages, but we'll give the same treatment to LINTERP
instructions too.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We're about to change the layout of barycentric vectors, which will
involve permuting the GRFs of barycentrics fetched from the thread
payload. Make room for this in a function separate from the generic
fetch_payload_reg(), since the permutation will only be applicable to
barycentric vectors. This allows simplifying fetch_payload_reg(),
since there was no need for handling multiple-component payload
registers except for barycentrics.
This causes some minor shader-db noise due to the new helper emitting
a LOAD_PAYLOAD instruction unconditionally, but it will be cleaned up
shortly.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This prevents regressions on SNB due to the redundant MOVs lying
around in cases where fetch_payload_reg() returns a VGRF (currently
only in SIMD32 but soon in pretty much all cases). The MOVs can't be
register-coalesced due to their source being a FIXED_GRF, and they
can't be copy-propagated either due to the unlit centroid workaround
partial writes. They can be copy-propagated just fine into a SEL
instruction though.
On SNB this prevents the following shader-db regressions (including
SIMD32 programs) in combination with the interpolation rework part of
this series:
total instructions in shared programs: 13996898 -> 14001982 (0.04%)
instructions in affected programs: 197461 -> 202545 (2.57%)
helped: 0
HURT: 1251
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is mainly meant to avoid shader-db regressions on SNB as we start
using VGRFs for barycentrics more frequently. Currently the
aligned_pairs_class is only useful in SIMD8 mode, because in SIMD16
mode barycentric vectors are typically 4 GRFs. This is not a problem
on Gen4-5, because on those platforms all VGRF allocations are
pair-aligned in SIMD16 mode. However on Gen6 we end up using either
the fast or the slow path of LINTERP rather non-deterministically
based on the behavior of the register allocator.
Fix it by repurposing aligned_pairs_class to hold PLN-aligned
registers of whatever the natural size of a barycentric vector is in
the current dispatch width.
On SNB this prevents the following shader-db regressions (including
SIMD32 programs) in combination with the interpolation rework part of
this series:
total instructions in shared programs: 13983257 -> 14527274 (3.89%)
instructions in affected programs: 1766255 -> 2310272 (30.80%)
helped: 0
HURT: 11608
LOST: 26
GAINED: 13
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This avoids regressions on SNB due to the bank conflict mitigation
pass moving a VGRF-allocated barycentric vector to a misaligned
location, which would prevent the PLN instruction from being used.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously we would hardcode fs_visitor::delta_xy barycentrics to be
allocated from aligned_pairs_class on hardware with PLN source
alignment restrictions (pre-Gen7). Instead allocate any registers
consumed by LINTERP from aligned_pairs_class, even if some barycentric
vector had ended up in a temporary.
On SNB this prevents the following shader-db regressions (including
SIMD32 programs) in combination with the interpolation rework part of
this series:
total instructions in shared programs: 13983257 -> 14527274 (3.89%)
instructions in affected programs: 1766255 -> 2310272 (30.80%)
helped: 0
HURT: 11608
LOST: 26
GAINED: 13
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is particularly useful in cases where register coalaesce is
unlikely to succeed because the LOAD_PAYLOAD isn't a plain copy --
E.g. when a LOAD_PAYLOAD is shuffling the contents of a barycentric
vector in order to transform it into the PLN layout.
This prevents the following shader-db regressions (including SIMD32
programs) in combination with the interpolation rework part of this
series. On SKL:
total instructions in shared programs: 18596672 -> 18976097 (2.04%)
instructions in affected programs: 7937041 -> 8316466 (4.78%)
helped: 39
HURT: 67427
LOST: 466
GAINED: 220
On SNB:
total instructions in shared programs: 13993866 -> 14202963 (1.49%)
instructions in affected programs: 7611309 -> 7820406 (2.75%)
helped: 624
HURT: 52943
LOST: 6
GAINED: 18
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In cases where a LOAD_PAYLOAD instruction copies a single block of
sequential GRF registers into the destination (see
is_identity_payload()), splitting the block copy into a number of ACP
entries (one for each LOAD_PAYLOAD source) is undesirable, because
that prevents copy propagation into any instructions which read
multiple components at once with the same source (the barycentric
source of the LINTERP instruction is going to be the overwhelmingly
most common example).
Technically it would also be possible to do this for VGRF sources, but
there is little benefit from that since register coalesce already
covers many of those cases -- There is no way for a block of
FIXED_GRFs to be coalesced into a VGRF though.
This prevents the following shader-db regressions (including SIMD32
programs) in combination with the interpolation rework part of this
series. On SKL:
total instructions in shared programs: 18595160 -> 18828562 (1.26%)
instructions in affected programs: 13374946 -> 13608348 (1.75%)
helped: 7
HURT: 108977
total spills in shared programs: 9116 -> 9106 (-0.11%)
spills in affected programs: 404 -> 394 (-2.48%)
helped: 7
HURT: 9
total fills in shared programs: 8994 -> 9176 (2.02%)
fills in affected programs: 898 -> 1080 (20.27%)
helped: 7
HURT: 9
LOST: 469
GAINED: 220
On SNB:
total instructions in shared programs: 13996898 -> 14096222 (0.71%)
instructions in affected programs: 8088546 -> 8187870 (1.23%)
helped: 2
HURT: 66520
total spills in shared programs: 2985 -> 2961 (-0.80%)
spills in affected programs: 632 -> 608 (-3.80%)
helped: 2
HURT: 0
total fills in shared programs: 3144 -> 3128 (-0.51%)
fills in affected programs: 1515 -> 1499 (-1.06%)
helped: 2
HURT: 0
LOST: 0
GAINED: 4
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will be useful for eliminating redundant copies from the FS
thread payload, particularly in SIMD32 programs. For the moment we
only allow FIXED_GRFs with identity strides in order to avoid dealing
with composing the arbitrary bidimensional strides that FIXED_GRF
regions potentially have, which are rarely used at the IR level
anyway.
This enables the following commit allowing block-propagation of
FIXED_GRF LOAD_PAYLOAD copies, and prevents the following shader-db
regressions (including SIMD32 programs) in combination with the
interpolation rework part of this series. On ICL:
total instructions in shared programs: 20484665 -> 20529650 (0.22%)
instructions in affected programs: 6031235 -> 6076220 (0.75%)
helped: 5
HURT: 42073
total spills in shared programs: 8748 -> 8925 (2.02%)
spills in affected programs: 186 -> 363 (95.16%)
helped: 5
HURT: 9
total fills in shared programs: 8663 -> 8960 (3.43%)
fills in affected programs: 647 -> 944 (45.90%)
helped: 5
HURT: 9
On SKL:
total instructions in shared programs: 18937442 -> 19128162 (1.01%)
instructions in affected programs: 8378187 -> 8568907 (2.28%)
helped: 39
HURT: 68176
LOST: 1
GAINED: 4
On SNB:
total instructions in shared programs: 14094685 -> 14243499 (1.06%)
instructions in affected programs: 7751062 -> 7899876 (1.92%)
helped: 623
HURT: 53586
LOST: 7
GAINED: 25
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This involves indexing the ACP tables used internally by
fs_copy_prop_dataflow::setup_initial_values() by reg_space() instead
of register number. Both are nearly equivalent for virtual GRFs
(barring the single bit of entropy lost in the hash), and this makes
handling FIXED_GRFs straightforward.
Because we're only going to support FIXED_GRFs for the source of a
copy, this change is only strictly necessary during the second pass
that checks for source interference, but we also apply the same change
to the first pass for consistency.
Note that this shouldn't change the behavior of the copy propagation
pass until we start inserting FIXED_GRF entries into the ACP. Even
then FIXED_GRF writes are extremely rare so this change will hardly
ever have an effect, but they aren't completely non-existing so we
need to handle them for correctness.
No functional nor shader-db changes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reworks the current fs_inst::is_copy_payload() method into a
number of classification helpers with well-defined semantics. This
will be useful later on in order to optimize LOAD_PAYLOAD instructions
more aggressively in cases where we can determine it's safe to do so.
The closest equivalent of the present fs_inst::is_copy_payload()
method is the is_coalescing_payload() helper introduced here.
No functional nor shader-db changes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In cases where LOAD_PAYLOAD is provided a pair of contiguous registers
as header sources, try to use a single SIMD16 instruction in order to
initialize them. This is unlikely to affect the overall cycle count
of the shader, since the compressed instruction has twice the issue
time, except due to the reduced pressure on the instruction cache.
Main motivation is avoiding instruction-count regressions in
combination with the following copy propagation improvements, which
will allow the SIMD16 g0-1 header setup emitted for framebuffer writes
to be copy-propagated into its LOAD_PAYLOAD, leading to the emission
of two SIMD8 MOV instructions instead of a single SIMD16 MOV.
Reverting this commit on top of the copy propagation changes would
lead to the following shader-db regressions on SKL and other
platforms:
total instructions in shared programs: 14926738 -> 14935415 (0.06%)
instructions in affected programs: 1892445 -> 1901122 (0.46%)
helped: 0
HURT: 8676
Without the following copy propagation changes this doesn't have any
effect on shader-db on Gen7+, because we would typically set up the FB
write header with a separate SIMD16 MOV that isn't currently
copy-propagated into the LOAD_PAYLOAD, so the individual SIMD8 MOVs
result of LOAD_PAYLOAD lowering would get register-coalesced away
under normal circumstances. However that wasn't the case for MRF
LOAD_PAYLOAD destinations on Gen6 and earlier, because register
coalesce only kicks in for GRFs, leaving a number of redundant SIMD8
MOVs lying around. On SNB this leads to the following shader-db
improvements:
total instructions in shared programs: 10770538 -> 10734681 (-0.33%)
instructions in affected programs: 2700655 -> 2664798 (-1.33%)
helped: 17791
HURT: 0
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Images with modifiers come with restrictions:
1. They have to be simple 2D images right now
2. They need to have a sensible format (not compressed, multi-plane, or
non-power-of-two)
3. If a CCS modifier is being requested, they have to actually support
CCS_E and be CCS-compatible with any other formats the client may
wish to use for image views.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3434>
Ensure the hardware cursor is disabled when we set the mode for a
VkDisplayKHR object. The extension doesn't expose any mechanisms to
program the hardware cursor, so we need to ensure it is hidden.
Currently, it seems like X is responsible for disabling the cursor
before handing over the lease. But that seems a little frail, and we
should be disabling the cursor ourselves so it works correctly
independently of how the lease was prepared for us.
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1922>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1922>
When swr driver is in use it print detected architecture
message to std::err. It can be harmfull when swr is using
in multinodes environments.
It can be enabled setting env var SWR_PRINT_INFO to 1.
Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>
The current logic considers that the nir_intrinsic_component(store_intr)
encodes the source components start, but it actually encodes the
destination one. Source component offset adjustment is taken care of in
install_registers_instr(), when offset_swizzle() is called.
This fixes dEQP-GLES2.functional.shaders.random.all_features.fragment.45
when PAN_MESA_DEBUG=deqp (looks like exposing GLES3 features has an
impact on the varyings layout).
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3429>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3429>
The pre-tag right before is a block-level tag, which means it implicitly
terminates the paragraph. So there's no paragraph to close after this.
Instead, move the paragraph-closing before the pre-tag, to explicitly
close the paragraph.
Fixes: 41b3eb08d9 "docs: update meson docs for windows"
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3431>
Fixes the following valgrind error:
Invalid read of size 16
at 0x28F458A1: si_set_sampler_view_desc (in radeonsi_drv_video.so)
by 0x28F4657E: si_set_sampler_views (in radeonsi_drv_video.so)
by 0x28D62BF5: util_compute_blit (in radeonsi_drv_video.so)
by 0x28D3A944: vlVaHandleVAProcPipelineParameterBufferType (in radeonsi_drv_video.so)
by 0x28D34EE1: vlVaRenderPicture (in radeonsi_drv_video.so)
by 0x4B2582B: vaRenderPicture (in libva.so.2.500.0)
Address 0x18142a10 is 0 bytes inside a block of size 48 free'd
at 0x48369AB: free (vg_replace_malloc.c:540)
by 0x28D62D51: util_compute_blit (in radeonsi_drv_video.so)
by 0x28D3A944: vlVaHandleVAProcPipelineParameterBufferType (in radeonsi_drv_video.so)
by 0x28D34EE1: vlVaRenderPicture (in radeonsi_drv_video.so)
by 0x4B2582B: vaRenderPicture (in libva.so.2.500.0)
Block was alloc'd at
at 0x4837B65: calloc (vg_replace_malloc.c:762)
by 0x28EFB2EC: si_create_sampler_state (in radeonsi_drv_video.so)
by 0x28D62C30: util_compute_blit (in radeonsi_drv_video.so)
by 0x28D3A944: vlVaHandleVAProcPipelineParameterBufferType (in radeonsi_drv_video.so)
by 0x28D34EE1: vlVaRenderPicture (in radeonsi_drv_video.so)
by 0x4B2582B: vaRenderPicture (in libva.so.2.500.0)
Fixes: 69430d7e59 ("va: use a compute shader for the blit")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2321
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3428>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3428>
Introduce separate helper functions to set the blendfactor bits.
Lima uses bits 0-2 for the type, bit 3 sets the inverted function
and bit 4 is set if alpha is used.
alpha_src_factor and alpha_dst_factor don't need the alpha bit, so
they are masked with 0xf. There is only place for 4 bits anyway.
If alpha_src_factor is PIPE_BLENDFACTOR_SRC_ALPHA_SATURATE, we need
to change it to PIPE_BLENDFACTOR_ONE first.
This is exactly what the blob does and we pass all
dEQP-GLES2.functional.fragment_ops.blend.* tests now.
Better than the blob btw...
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3411>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3411>
This patch changes lower_to_cssa to be much more conservative
about assumptions which phi operands might interfere.
Previously, this pass wasn't exhaustive and could miss some corner cases.
v2: remove optimizations to find better insertion points as it's hard
to guarantee that they are always correct and have overall no benefit.
Fixes: 0b8216b2cd ('aco: Lower to CSSA')
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3385>
Some applications explicitly call glTex[ture]Parameteri[v] to set
GL_TEXTURE_MAX_LEVEL and GL_TEXTURE_BASE_LEVEL before uploading any
texture data. Core Mesa initializes MaxLevel to 1000, so if it isn't
that, we know they've set it. (We check for < TEXTURE_MAX_LEVELS to
avoid hardcoding that value, however.)
If MaxLevel - BaseLevel > 0, then the app is trying to tell us that
this texture is going to have multiple miplevels. In that case, go
ahead and allocate the space for it.
Avoids many resource_copy_region calls at texture finalization time
in the Civilization VI benchmark.
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3401>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3401>
No stride / no attributes means that nothing is being written to the
buffer. However it might still prevent primitives from being written out
to the other buffers. Disabling it entirely seems to fix it.
Fixes GTF-GL45.gtf30.GL3Tests.transform_feedback.transform_feedback_overflow
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The existing liveness analysis in ppir still ultimately relies on a
single continuous live_in and live_out range per register and was
observed to be the bottleneck for register allocation on complicated
examples with several control flow blocks.
The use of live_in and live_out ranges was fine before ppir got control
flow, but now it ends up creating unnecessary interferences as live_in
and live_out ranges may span across entire blocks after blocks get
placed sequentially.
This new liveness analysis implementation generates a set of live
variables at each program point; before and after each instruction and
beginning and end of each block.
This is a global analysis and propagates the sets of live registers
across blocks independently of their sequence.
The resulting sets optimally represent all variables that cannot share a
register at each program point, so can be directly translated as
interferences to the register allocator.
Special care has to be taken with non-ssa registers. In order to
properly define their live range, their alive components also need to be
tracked. Therefore ppir can't use simple bitsets to keep track of live
registers.
The algorithm uses an auxiliary set data structure to keep track of the
live registers. The initial implementation used only trivial arrays,
however regalloc execution time was then prohibitive (>1minute on
Cortex-A53) on extreme benchmarks with hundreds of instructions,
hundreds of registers and several spilling iterations, mostly due to the
n^2 complexity to generate the interferences from the live sets. Since
the live registers set are only a very sparse subset of all registers at
each instruction, iterating only over this subset allows it to run very
fast again (a couple of seconds for the same benchmark).
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3358>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3358>
There are some cases in shades using control flow where the varying load
is cloned to every block, and then the original node is left orphan.
This is not harmful for program execution, but it complicates analysis
for register allocation as there is now a case of writing to a register
that is never read.
While ppir doesn't have a dead code elimination pass for its own
optimizations and it is not hard to detect when we cloned the last load,
let's remove it early.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3358>
After flushing batches, iris_fence_flush() asks the kernel whether
each batch's last_syncpt has already signalled or not. (The idea is
that either the compute or render batch may not have actually had any
work queued up, so last_syncpt there might have been signalled a long
time ago.) If it's already completed, we don't bother to record it.
A strange corner is the case of repeated flushes. For example, we
might flush for some reason, and hit a glFlush(), and hit SwapBuffers.
It's possible for all the batches to have been flushed previously, -and-
for them to have actually completed. In this case, we'll see that there
are no syncobj's to wait on, and record fence->count == 0.
This works fine internally - fence_finish can see count == 0 and realize
that it doesn't need to wait, for example. But when working with native
FDs, we may be asked to export a fence with count == 0. So we need an
actual synchronization primitive we can hand off. Because all of the
relevant batches had been signalled when creating the fence, we want the
new dummy fence to be signalled as well.
So we just make a signalled syncobj and export it.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Currently the schedule_program implementation being used is picked
at compile time, which on the Android platform means that the
bifrost compiler & scheduler is used for all targets, including
midgard based hardware.
This commit disambiguates between the two schedule_program functions.
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
get_nir_image_intrinsic_image() was incorrectly mutating the value held
by the register which holds the intrinsic's first source (image index).
If this happened to be the register for an SSA def which is also used
elsewhere in the program, this meant that we would clobber that value
in subsequent uses.
Note that this only affects i965, because neither anv nor iris use the
binding table start sections, so nothing is ever added here.
Fixes KHR-GL46.compute_shader.resources-max on i965 with Eric Anholt's
MR !3240 applied. That MR reorders SSBOs and ABOs, so that test uses
image 0 and SSBO 0, causing this code to brilliantly add binding table
index 45 to both the image (correct) and the SSBO (bzzt, wrong!).
Fixes: 09f1de97a7 ("anv,i965: Lower away image derefs in the driver")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3404>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3404>
Vulkan 1.2 introduces some new structures to get the properties and
features of a device from extensions that were promoted to core in 1.1
and 1.2. This commit implements the new property queries and makes all
of the corresponding extension queries map to them.
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Vulkan 1.2 introduces some new structures to get the properties and
features of a device from extensions that were promoted to core in 1.1
and 1.2. This commit implements the new feature queries and makes all
of the corresponding extension queries map to them.
Reviewed-by: Iván Briano <ivan.briano@intel.com>
This is required for the subgroupBroadcastDynamicId feature that was
added in Vulkan 1.2.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
v2 (Jason Ekstrand):
- Add duplicate hooks for both the 1.2 and KHR versions of
vkCmdDraw[Indexed]IndirectCount.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
It doesn't really support any Vulkan properly yet so why not claim 1.2?
This was an easier way of fixing the build than trying to roll it
forward to a later version of ANV's entrypoint generator scripts.
They were causing trouble with Marge Bot: The project settings require
that the pipeline succeeds before a merge request (MR) can be merged,
otherwise Marge doesn't wait for the pipeline to succeed before merging
an MR assigned to her. But Marge can't start manual jobs, so she would
always time out waiting for pipelines with manual jobs.
To avoid this, use these rules:
* Run the pipeline by default for MRs and main project branches changing
any files affecting it.
* For other MRs, run a single dummy job which always succeeds.
* Don't run any jobs for main project branch changes (e.g. from an MR
having been merged) not affecting the pipeline.
* Allow jobs to be started manually on branches of forked projects, as
before.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3361>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3361>
To implement NIR-to-TGSI, we need to be able to get the size of the
uniform variable for the TGSI declaration, not just the
.driver_location. With its location in mesa/st, drivers couldn't link
to it from nir-to-tgsi.
This feels like a common enough function to want, so let's share it in
the core compiler.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3297>
The only bit that gallium varied on was handling of bindless. We can
retain previous behavior for count_attribute_slots() by passing in
"true" (though I suspect this is just giving a silly answer to a silly
question), and delete our recursive function from mesa/st.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3297>
PhysicalStorageBuffer is lowered to nir_var_mem_global, and
SPIR-V 1.5rev1 in section "3.25. Memory Semantics <id>" says
UniformMemory
Apply the memory-ordering constraints to StorageBuffer,
PhysicalStorageBuffer, or Uniform Storage Class memory.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3322>
When possible, get rid of an s_not when all it does is invert the SCC,
and its successor s_cbranch / s_cselect can be inverted instead.
Also modify some parts of instruction_selection to take advantage of
this feature.
Example:
s2: %3900, s1: %3899:scc = s_andn2_b64 %0:exec, %406
s2: %3902 = s_cselect_b64 -1, 0, %3900:scc
s2: %407, s1: %3903:scc = s_not_b64 %3902
s2: %3906, s1: %3905:scc = s_and_b64 %407, %0:exec
p_cbranch_z %3905:scc
Can now be optimized to:
s2: %3900, s1: %3899:scc = s_andn2_b64 %0:exec, %406
p_cbranch_nz %3900:scc
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Previously all booleans needed an s_and with exec when they were turned
into a scalar condition. However, this is not needed for uniform booleans.
v2 by Daniel Schürmann:
- Make the code more readable
v3 by Timur Kristóf:
- Fix regressions, make it work in wave32 mode
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
By adding an extra instruction, we can replace the operands of
the s_cselect_b64, which allows it to get picked up by the
optimizer when it looks for uniform booleans.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
As pointed out by Boris, what we were calling PAN_LINEAR depth textures
was in fact u-interleaved tiled (!), but we never noticed since we
flipped the flag used for sampling, leading to all sorts of fun bugs
when attempting to directly acess depth textures from the CPU. Which
begs the question -- if what we called LINEAR was tiled, how do we
actually render linear depth textures? It turns out the flags for AFBC
form a mali_block_format 2-bit code just like their render-target
counterparts, so we can render to any of the above.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reported-by: Boris Brezillon <boris.brezillon@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3393>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3393>
Fixes: b390ff3517 ("intel/fs: Add support for SLM fence in Gen11")
Fixes: e142061399 ("intel/fs: Implement scoped_memory_barrier")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The assertion was failing.
Fixes: 363b4027fc - radeonsi: put up to 5 VBO descriptors into user SGPRs
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Using needs_vop3 check was flawed because it would only combine the
literal if the first operand is the literal. If the second or third
operand is the literal, then needs_vop3 will be true and the literal will
not be combined.
pipeline-db (Navi):
Totals from affected shaders:
SGPRS: 782051 -> 782051 (0.00 %)
VGPRS: 630048 -> 630048 (0.00 %)
Spilled SGPRs: 195 -> 195 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 54743740 -> 54585548 (-0.29 %) bytes
Max Waves: 67340 -> 67340 (0.00 %)
Instructions: 10182030 -> 10182030 (0.00 %)
pipeline-db (Vega):
Totals from affected shaders:
SGPRS: 701990 -> 699590 (-0.34 %)
VGPRS: 566632 -> 566784 (0.03 %)
Spilled SGPRs: 218 -> 218 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 49173564 -> 49007856 (-0.34 %) bytes
Max Waves: 59650 -> 59612 (-0.06 %)
Instructions: 9315135 -> 9293330 (-0.23 %)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2883>
We could create VALU instructions which read two sgprs, but only if isel
created an instruction which already read one of them.
This change is in a separate patch from the apply_sgprs() rewrite so that
it can be tested if the rewrite affected anything.
pipeline-db (Navi):
Totals from affected shaders:
SGPRS: 216 -> 216 (0.00 %)
VGPRS: 64 -> 64 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 1756 -> 1708 (-2.73 %) bytes
Max Waves: 120 -> 120 (0.00 %)
Instructions: 312 -> 300 (-3.85 %)
pipeline-db (Vega):
Totals from affected shaders:
SGPRS: 216 -> 216 (0.00 %)
VGPRS: 64 -> 64 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 1784 -> 1736 (-2.69 %) bytes
Max Waves: 120 -> 120 (0.00 %)
Instructions: 319 -> 307 (-3.76 %)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2883>
On some platforms (like Win64), unsigned long is 32-bit, so the first
cast doesn't do anything, and the compiler complains about an implicit
cast to a smaller type. So let's cast to an uintptr_t instead first,
as that's large enough on all platforms.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
I get warnings on MSVC for these implicit casts. Let's use explicit
casts instead.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
We currently initialize this float-array with double-literals. Some
compilers generate warnings for this, so let's switch these to
float-literals instead.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Lower 8 bits of unknown_1_3 seems to be min_lod,
rest of 4 bits + miplevels are max_lod and min_mipfilter seems to be
lod bias. All are in fixed format with 4 bit integer and 4 bit fraction,
lod_bias also has sign bit.
Blob also seems to do some magic with lod_bias if min filter is nearest --
it adds 0.5 to lod_bias in this case. Same story when all filters are
nearest and mipmapping is enabled, but in this case it subtracts 1/16
from lod_bias.
Fixes 134 dEQP tests in dEQP-GLES2.functional.texture.*
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3359>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3359>
This stops using driGetRendererString() in favor of a simple snprintf().
This should have the same functionality on 64-bit systems, but drops
a "x86/MMX/SSE2" suffix on 32-bit systems. (People shouldn't be using
the GL_RENDERER string to check for CPU features...)
We also use gen_get_device_name() instead of PCI ID list munging.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3371>
This reverts commit 4cda61f11e for now,
as it appears to break i965 CI (32,000+ failures). Rob and I suspect
we need to do the equivalent of 1c6a2efa06
on i965 - we are doing nir_lower_tex and brw_nir_lower_resources in the
wrong order and that's likely triggering this condition. Once we fix
that, we should put this patch back.
This doesn't change behavior, but makes the code a bit easier to read.
Both values are zero, but I somehow swapped the logical meaning of them
when initializing.
This is needed to implement the EXT_EGL_image_storage spec:
"If <target> is GL_TEXTURE_2D, then the resultant texture must have a
sized internal format which is colorspace and size compatible with the
dma-buf. If the GL is unable to determine such a format, the error
INVALID_OPERATION is generated."
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Check various parts of the EXT_EGL_image_storage spec, and add a
new vfunc for drivers implementing it.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The major differences between EXT_EGL_image_storage and
EGLImageTargetTexture2DOES are:
(1) The texture target is made immutable
(2) EXT_EGL_image_storage supports non-2D targets.
We can reuse EGLImageTargetTexture2D and FreeTextureImageBuffer
for (1) pretty easily. For (2), let's just not support the
complicated targets. Let's reuse aspects of the
EGLImageTargetTexture2DOES implementation.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Now that both GLSL and SPIR-V are adding shared and tcs_patch barriers
(as appropreate) prior to the nir_intrinsic_barrier, we don't need to do
it ourselves in the back-end. This reverts commit
26e950a5de01564e3b5f2148ae994454ae5205fe.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
The GLSL barrier() intrinsic does an implicit shared memory barrier in
compute shaders and an implicit TCS patch output barrier in tessellation
control shaders. We'd like NIR's barrier intrinsic to just be a control
flow barrier and not have memory implications. To satisfy this, we need
to add an extra memory barrier in front of each nir_intrinsic_barrier.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
As per the Vulkan memory model, the proper translation of GLSL barrier()
is an OpControlBarrier with a scope of Workgroup and semantics of
Acquire, Release, and WorkgroupMemory. Older versions of GLSLang gave
an OpControlBarrier with semantics of None so we need to patch it up on
those versions.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
Right now, it's implemented as a no-op for everyone. For most drivers,
it's a switch case in the NIR -> whatever which just breaks. For ir3,
they already have code to delete tessellation barriers so we just add a
case to also delete memory_barrier_tcs_patch.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
This re-enables and fixes support for stencil buffer.
It fixes 365 stencil related deqp tests. All tests that use INCR, INCR_WRAR,
DECR and DECR_WRAP as a stencil op still fail, but they also fail with the
blob, so we may ignore that for now.
We still have dEQP-GLES2.functional.depth_stencil_clear.depth_stencil_masked
failing, which is strange because it's the only one out of the
depth_stencil_clear.* set.
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
This field is for the primitive ID export to the fragment shader.
Ported from RadeonSI.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It can't be enabled for geometry shaders, for NGG streamout and
for vertex shaders that export the primitive ID. NGG passthrough
requires that LDS isn't used.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Per the semi-recently-released NVIDIA docs, when this bit is not
enabled, then the result for RT[0] will be used. So if e.g. only a
single RT is drawn to and it's not RT[2], the results will not be
visible. Fixes
GTF-GL45.gtf33.GL3Tests.explicit_attrib_location.explicit_attrib_location_pipeline
which was failing due to a frag shader outputting only to location=2.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This corresponds to gl_PrimitiveID and gl_Layer. When both of these are
stored in a single AST.64 or AST.128 operation, then it appears as
though the whole store fails. Fixes the recently extended
glsl-1.50-transform-feedback-builtins piglit, and also
gtf30.GL3Tests.transform_feedback.transform_feedback_builtins.
The issue was reproduced on GM107 and GP108 but not GK208 nor GK104.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Perhaps in a future implementation, such events could be passed back to
the driver, or queried directly. However for now, this is required for
GL 4.3 robustness contexts.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The fix was found by Karol Herbst a long time ago, but it was unclear
why it helped or if it would create additional problems. This change
adds a comment that explains what's going on, and in the process also
normalizes the nv50 implementation to match.
The coordinates which are fed to gl_Position map directly to pixel
coordinates, since the viewport transform is disabled. If the
framebuffer is MSAA, then that doesn't affect the pixel coordinates at
all, it's just that each pixel has multiple samples.
Note that this makes it really clear that this approach is inappropriate
for EXT_framebuffer_multisample_blit_scaled, and also the 3d path will
fail terribly for direct copies. Thankfully the 2d path normally takes
care of this.
Fixes KHR-GL43.packed_depth_stencil.blit.depth32f_stencil8 as well as
scaling issues in a number of EXT_framebuffer_multisample-related piglit
tests (although they continue to fail due to inaccuracies).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
lima doesn't support alpha test, flat shading, two-sided color nor
clip planes. We can enable these caps when corresponding hw features
are implemented in the driver.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Fixes some of dEQP-GLES2.functional.polygon_offset.* tests and shadows in Q3A.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Apparently Mali4x0 doesn't do viewport clipping, so anything rendered beyond viewport
is still rendered. Looks like we need to use scissors to do clipping.
Fixes most of dEQP-GLES2.functional.clipping.*, 6 out of 7 remaining failures
fail on blob as well. Remaining [1] fails on many other gallium drivers.
[1] dEQP-GLES2.functional.clipping.triangle_vertex.clip_three.clip_neg_x_neg_z_and_pos_x_pos_z_and_neg_x_neg_y_pos_z
Suggested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Apparently it doesn't depend on primitive type, the value
only depends on whether we specify point size via PLBU command --
bit 12 is set in this case
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
The state value of main_uniform_storage_index will be wrong for
add_parameter() when find_and_update_previous_uniform_storage()
finds a uniform if there is more than 1 uniform used in
multiple shader stages.
The new code is also simpler.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
The output of v_cmp instructions is s1 (a single SGPR) in wave32 mode,
as opposed to s2 (an SGPR-pair) in wave64 mode.
A couple of cases where this should have been fixed were omitted from
the previous patch by mistake.
Fixes: e0bcefc3a0
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
This will be convenient in a later commit enabling SIMD32 fragment
shaders, and happens to fix the calculation for MATH instructions
which is currently inaccurate for SIMD-lowered instructions on Gen4-5
platforms (all of them on Gen4 in SIMD16 mode), since it was based on
the shader's dispatch width rather than on the actual execution size
of the instruction.
This causes some shader-db noise on Gen4 due to the more compact
register allocation interacting with the SEND dependency workarounds,
but otherwise no major changes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The liveness calculation done by the local CSE pass in order to prune
AEB entries whose sources are no longer live is currently inaccurate,
because the live intervals are calculated once at the beginning of the
pass, so they don't take into account any of the copy instructions
inserted by the CSE pass as it makes progress. However the IP counter
used in that calculation is based on the start_ip of the basic block,
which is updated automatically whenever any instructions are inserted
into the CFG. This causes the IP counter and liveness intervals to
get out of sync in programs with multiple basic blocks, causing the
CSE pass to toss AEB entries prematurely, which can lead to missed
optimization opportunities rather non-deterministically.
On BDW this leads to the following shader-db changes:
total instructions in shared programs: 14952488 -> 14951763 (-0.00%)
instructions in affected programs: 45416 -> 44691 (-1.60%)
helped: 40
HURT: 4
total spills in shared programs: 20989 -> 20970 (-0.09%)
spills in affected programs: 103 -> 84 (-18.45%)
helped: 3
HURT: 0
total fills in shared programs: 24981 -> 24926 (-0.22%)
fills in affected programs: 127 -> 72 (-43.31%)
helped: 3
HURT: 0
In addition it avoids a number of regressions in combination with some
of the optimization changes I'm working on for SIMD32, which would
have made CSE more effective... Causing it to be less effective
elsewhere in the program astonishingly.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For uniform sample ID, only the first channel of msg_data will be
initialized. We need to pass that component only to the SEND message
for SIMD lowering to unzip the descriptor source correctly.
Fixes several dozens of conformance test failures with SIMD32 fragment
shaders enabled, including:
dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_sample.dynamic_sample_number.*
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The problem occured when the return payload of a SIMD8 SEND
instruction was re-used as source payload of an EOT SEND message. In
such cases the interference edge added by that workaround between the
payload and grf127_send_hack_node would have no effect, because the
payload would be allocated to a fixed range of registers containing
r127 by the special handling of EOT message payloads in the same
function. This would cause things to blow up if the source payload of
the first SIMD8 message ended up being allocated to a range which
happened to overlap the destination.
Fix it by avoiding r127 altogether in the allocation of EOT message
payloads.
The problem can be reproduced on ICL with the fp-indirections2 Piglit
test-case in combination with the other optimizer changes of this
series.
Fixes: 232ed89802 "i965/fs: Register allocator shoudn't use grf127 for sends dest"
Cc: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Prevents invalid code from being emitted for ROR/ROL instructions in
SIMD32 shaders.
The problem can be reproduced with the following tests while forcing
SIMD32 to be used for fragment shaders:
piglit.shaders.glsl-rotate-left
piglit.shaders.glsl-rotate-right
However the issue could occur in production already with compute
shaders and a workgroup size large enough to trigger SIMD32 dispatch.
Fixes: 83fdec0f0d "intel/compiler: Enable the emission of ROR/ROL instructions"
Cc: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The current implementation was broken for any integers between 2^24
and 2^30 (it would return zero for me on ICL). The reason is that for
such integers we wouldn't take the 'if (0 <= shiftCount)' early return
path, however 'shiftCount + 7' would be positive, leading to a
negative 'count' argument passed to __shift64RightJamming(), which
would give undefined results.
This reworks the affected conversion functions to use either
__shortShift64Left() or __shift64RightJamming() based on the sign of
the final shift count, which should avoid the problem. In addition
this should qualify as a clean-up/optimization -- This implementation
of the conversion functions translates to 7 instructions less than the
original on Intel hardware.
This fixes the 'KHR-GL46.shader_ballot_tests.ShaderBallotFunctionBallot'
conformance tests on soft fp64 hardware with large enough subgroup
size (>16).
Fixes: d5cf6e92b4 "glsl: Add built-in functions to do uint64_to_fp32(uint64_t)"
Fixes: c9d333a6b7 "glsl: Add built-in functions to do int64_to_fp32(int64_t)"
Cc: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
This patch prevents memory leak in get_version function in st_manager.c
This issue was found by valgrind:
16 bytes in 1 blocks are definitely lost in loss record 6 of 1,418
at 0x483CD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0x63D9476: st_init_extensions (st_extensions.c:1679)
by 0x63B803B: get_version (st_manager.c:1271)
by 0x63B8124: st_api_query_versions (st_manager.c:1289)
by 0x63266EF: dri_init_screen_helper (dri_screen.c:583)
by 0x6321B12: dri2_init_screen (dri2.c:2110)
by 0x631AACC: driCreateNewScreen2 (dri_util.c:155)
by 0x5D58192: dri3_create_screen (dri3_glx.c:897)
by 0x5D39829: AllocAndFetchScreenConfigs (glxext.c:815)
by 0x5D39C57: __glXInitialize (glxext.c:941)
by 0x5D3290A: GetGLXPrivScreenConfig (glxcmds.c:174)
by 0x5D34F38: glXQueryExtensionsString (glxcmds.c:1307)
Fixes: eca8032f20 ("gallium: Add ARB_gl_spirv support")
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3345>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3345>
Previously, when cluster_size was set to 0, it always worked as if
the cluster size was 64. This commit fixes it in wave32 mode by
changing to work as if the cluster size was set to 32.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
We did not take into account if name is NULL, so we could dereference
a NULL pointer in strncmp() call.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
We can only sample from 24-bit packed format and can't render into it and
it causes chromium-based browsers to fail when they create FBO with GL_RGB
format. Drop R8G8B8 alltogether so mesa can promote it to RGBX format.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
There's no reason to hand-roll all of the memory re-allocation fall-back
code for compute shaders. It's just duplicated complexity. This also
makes it more clear in flush_compute_state where the
MEDIA_INTERFACE_DESCRIPTOR_LOAD command gets emitted relative to other
packets in the command stream.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Because Gen7 push constants are already relative to dynamic state base
address, they aren't really an address. It's deceptive to return an
address from the helper function. Instead, let's leave it as a
special-case in the gen7-11 helper; we don't need the helper for code
de-duplication for Gen7 anyway.
Fixes: 67d2cb3e93 "anv: Add get_push_range_address() helper"
Closes: #2323
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Add debug flag to disable tiling. Note that it prevents lima from creating
tiled buffers, but it's still able to import them if modifier is specified
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Use linear layout for shared buffers if modifier is not specified
and use linear layout when importing buffers with invalid modifier.
Fixes: 01a451b04d ("lima: handle DRM_FORMAT_MOD_INVALID in resource_from_handle()")
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Setting up transitive conflicts between a full register and its two
half registers (eg r0.x and hr0.x and hr0.y) will make the half
registers conflict. They don't actually conflict and this prevents us
from using both at the same time.
Add and use a new ra helper that sets up transitive conflicts between
a register and its subregisters, except it carefully avoids the
subregister conflict.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
The availability is not written at the location changed in
ee6fbb95a74d...
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: ee6fbb95a7 ("anv: Properly handle host query reset of performance queries")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Use image_load_mip and image_store_mip respectively if the lod
parameter isn't zero.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
SPV_AMD_shader_image_load_store_lod allows to use a lod parameter
with OpImageRead, OpImageWrite and OpImageSparseRead.
According to the specification, this parameter should be a 32-bit
integer. It is initialized to 0 when no lod parameter is found
during SPIR-V->NIR translation.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This is a cleanup but also a fix for commit dd09f1d806. In case of
i965 we did not actually create hash for cached shader programs.
Fixes: dd09f1d806 "mesa/st/i965: add a ProgramResourceHash for quicker resource lookup"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
When decoding using VDPAU, the _MaxLevel value becomes -1 due to
NumLevels being equal to 0 at a certain point, and decoding fails
due to an assertion later on.
Signed-off-by: Thong Thai <thong.thai@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Cc: 19.2 19.3 <mesa-stable@lists.freedesktop.org>
PIPE_CAP_MAX_VERTEX_BUFFERS already sets the maximum vertex_buffer_index.
There's no need to error on num_elements == 0 (if that can even happen).
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Fixes:
dEQP-GLES3.functional.draw.draw_arrays_instanced.*
dEQP-GLES3.functional.draw.draw_elements_instanced.*
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Sometimes using user buffer (not VBO) e.g. glVertexPointer
one thread could free memory before other thread used it.
Instead of copying this memory to driver simplier thing is
to block until draw finish.
Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>
Only the blocks that are reachable are inserted with an end_nop
instruction at the end.
When handling the Phi second pass, if the Phi has a parent block that
does not have an end_nop then it means this block is unreachable, and
thus we can ignore it, as the Phi will never come through it.
Fixes dEQP-VK.graphicsfuzz.uninit-element-cast-in-loop.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
e5167a9276 disabled SDMA for gfx8.
This caused 3 piglit arb_sparse_buffer tests (basic, buffer-data
and commit) to crash on GFX8.
Reported-by: Michel Dänzer <michel@daenzer.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes: e5167a9276 ("radeonsi: disable SDMA on gfx8 to fix corruption on RX 580")
From issue 10 of the OES_EGL_image_external_essl3:
A limited set of use-cases is enabled by making glBindImageTexture
accept external textures. Shaders can access such external textures
using the existing <image2D> sampler type.
Fixes: 02a6d901ee ("mesa: add OES_EGL_image_external_essl3 support")
Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Our barrier instruction does not implicitly do a memory fence but the
GLSL barrier() intrinsic is supposed to. The easiest back-portable
solution is to just add the NIR barriers. We'll sort this out more
properly in later commits.
Cc: mesa-stable@lists.freedesktop.orgCloses: #2138
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes a hang on Raven with Resident Evil 2.
I did not find anything more restricted to fix it:
- Setting persistent_states_per_bin to 1 fixes it too,
but likely does an internal break on any descriptor set changes
too.
- Only breaking the batch when cb_target_mask changes does not fix
it (and looking at AMDVLK comments, I suspect the code in radeonsi
should really be doing a FLUSH_DFSM).
- Always doing a FLUSH_DFSM on shader switch helps, but that is more
often than this and I don't think we should be doing that when DFSM
is disabled.
- Also emitting the existing break on framebuffer change when DFSM is
disabled does not fix the issue.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2315
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
They're not available for Debian buster yet, so we have to use upstream
snapshot packages again.
In contrast to earlier, we now store the LLVM APT repository key in Git
instead of re-downloading it every time.
We need to mindful that we don't clobber the shadow comparator.
Fixes dEQP-GLES3.functional.shaders.texture_functions.texture.sampler2darrayshadow_*
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
The hardware separates face selection and array indexing, it looks like,
whereas Gallium smushes them together with some modulus fun. Let's fix
it so mipmapped 2D arrays work without regressing cubemaps.
Fixes dEQP-GLES3.functional.texture.filtering.2d_array.* among others.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
This is adapted from the GLSL IR code but doesn't need to
iterate over the IR. I believe this also fixes a potential bug in
the GLSL IR code which potentially counts the same output twice.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
This will allow us to do some linking in NIR that was previously
done by the GLSL IR linker. To start with this just has calls for
linking atomics.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
This is pretty much a copy of link_check_atomic_counter_resources()
updated to work with the NIR linker.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
A NIR based glsl linking function will be too different to the
spirv version to bother attempting any sharing. So lets change
the name to be explicit.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
The gl_nir_lower_buffers pass relies on recognizing the same literal
constants as the GLSL compiler so that constant buffer array indices
are constant in nir as well. Without this, get_block_array_index()
would see
vec1 32 ssa_723 = deref_var &const_temp@1 (function_temp int)
vec1 32 ssa_724 = load_const (0x00000001 /* 0.000000 */)
...
vec1 32 ssa_5 = deref_var &const_temp@1 (function_temp int)
vec1 32 ssa_6 = intrinsic load_deref (ssa_5) (0) /* access=0 */
vec1 32 ssa_7 = deref_var &blockB (ssbo BlockB[1])
vec1 32 ssa_8 = deref_array &(*ssa_7)[ssa_6] (ssbo BlockB) /* &blockB[ssa_6] */
instead of a literal 1, and ultimately generate the block name
BlockB[0]. That used to work, since we before the previous commits
we'd compact the block binding points and names. Thus, there would
always be a BlockB[0].
Now, if an entry in a block array isn't used, we don't generate that
block name, which means that if entry 0 isn't used BlockB[0] isn't
present and then get_block_array_index() fails to find the block.
In most cases we would have dealt with this in the call to
st_nir_opts() in st_nir_link_shaders(), but in the num_shaders == 1
case (for example, compute) we would call gl_nir_lower_buffers()
before we lowered GLSL constants. Move that corner case up next to
where we call st_nir_link_shaders() so we call st_nir_opts() at the
same point in the flow for all shaders.
Fixes: dEQP-GLES31.functional.ssbo.layout.random.all_per_block_buffers.18
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
When SSBO array is used with packed layout, both IR tree
and as a result, NIR tree will be incorrect.
In fact, the SSBO dereference indices won't
match the array size in some cases like the following:
"layout(packed, binding=1) buffer SSBO { vec4 a; } ssbo[3];
out vec4 color;
void main() {
color = ssbo[2].a;
}"
After linking the IR and then NIR will have an SSBO array
definition with size 1 but dereference still will have index 2
and linked_shader->Program->sh.ShaderStorageBlocks
will contain just SSBO with name "SSBO[2]"
So this line should be removed at least as a workaround for now
to avoid error like:
Failed to find the block by name "SSBO[0]"
Fixes: 810dde2a "glsl/nir: Add a pass to lower UBO and SSBO access"
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
This is needed to be in agreement with spec requirements:
https://github.com/KhronosGroup/OpenGL-API/issues/46
Piers Daniell:
"We discussed this in the OpenGL/ES working group meeting
and agreed that eliminating unused elements from the interface
block array is not desirable. There is no statement in the spec
that this takes place and it would be highly implementation
dependent if it happens. If the application has an "interface"
in the shader they need to match up with the API it would be
quite confusing to have the binding point get compacted.
So the answer is no, the binding points aren't affected by
unused elements in the interface block array."
v2: - 'original_dim_size' field moved above to keep
the struct packed better on 64-bit
- added a comment for 'total_num_array_elements' field
- fixed a binding point calculations for SSBOs array of arrays
( Ian Romanick <ian.d.romanick@intel.com> )
- fixed binding point calculations for non-packed SSBOs
v3:
- rename 'total_num_array_elements' to 'aoa_size'
( Jason Ekstrand <jason@jlekstrand.net> )
- rename 'boffset' to 'binding_stride'
( Alejandro Piñeiro <apinheiro@igalia.com> )
Fixes: 8cf1333b "glsl: link uniform block arrays of arrays"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109532
Reported-By: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Fritz Koenig <frkoenig@google.com>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Take one step towards sharing code between the LAVA and non-LAVA jobs,
with the goals of reducing maintenance burden and use of computational
resources.
The env var DEQP_NO_SAVE_RESULTS allows us to skip the procesing of the
XML result files, which can take a long time and is not useful in the
LAVA case as we are not uploading artifacts anywhere at the moment.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Same format code as UINT... might be different in how it's fed into a
shader but we'll deal with that when we get there.
Fixes dEQP-GLES3.functional.vertex_arrays.single_attribute.output_types.usigned_int2_10_10_10.components4_vec2_quads1
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Make it a lot more obvious what we're doing and fix more than a few
corner cases in the process.
Fixes
dEQP-GLES3.functional.buffer.map.write.render_as_index_array.pixel*, and
likely others.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
We use the lowering in nir_format_convert. There are native ops for this
so this is far from optimal and not remotely efficient but as with most
blend shader things right now, it's hard enough to get it working, so
let's focus on that for now. We'll make it fast later (once we have
GLES3 stable, we can start optimizing these things).
Fixes dEQP-GLES3.functional.fragment_ops.blend.fbo_srgb.*
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Changes the assert to match the comment above.
This assert was failing in some cases while running darkplaces.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
PP stream terminator size seems to be 4 words, it worked with full PP
stream because we align stream beginning to 32 bytes and BO is
initialized with zeroes. But with partial PP stream it sometimes break
if for new PP stream we reuse BO that has non-zero value at this place.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
We don't need to reload and redraw some tiles if framebuffer was not
cleared and scissor test was enabled for some of draws. This simple
optimization fixes cursor lag in X11
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
This commit postpones PP stream generation till job is submitted.
Doing that this late allows us to skip reloading and redrawing tiles
that were not updated.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Drop assert as it is not necessary and used wrong anyway.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Updated documentation renames "Anisotropic Algorithm" to "LOD Algorithm"
and adds a note for Gen9+ saying "The EWA Algorithm should only be
enabled for Anisotropic Filtering modes." and indicating that the extra
accuracy shouldn't be necessary for other modes, and comes at a cost.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2 (Ken): Handle platforms without sampler support for HiZ
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> [v2 changes]
This eliminates 50% of pixels (2M) rendered for a blit in GS:GO. This
accounts for 3% of pixels rendered in the game. Total GPU clocks for
the first 900 frames of CSGO improves by 1%.
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The shader code required to do this is int(sat(x) * UINT24_MAX) which
isn't really worth all the effort to avoid. Doing the format
conversion, on the other hand, prevents us from sampling with HiZ which
is something that we very much want on gen8-9 where we can.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This moves the descriptor based texture structs and their helpers
into the only user.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
This moves the state based texture structs and their helpers
into the only user.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
There are only 13 bits available to store the line width, hence
it can't be larger than 8191
v2: Add Fixes tag
v3: - Unify value since for all r600 archs (Konstantin Kharlamov)
- Correct the value the line width value is emitted as a 12.4
fixed point value of 1/2 line width on r600-r700 and as
8 * line width on Evergreen and newer.
Fixes: 06bfb2d28f
r600: fork and import gallium/radeon
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Konstantin Kharlamov <hi-angel@yandex.ru>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3286>
The tiled-case is non-sensical for non-base mips, but Vulkan requires
that this function handles it but at the same time does not require
returning anything useful. So we can basically return anything.
Correct tiled pitch and offset are still required for our own WSI and
in the future getting the layouts of images with DRM format modifiers.
Both don't have to deal with images with more than 1 level though.
Fixes: 824bd0830e "radv: return the correct pitch for linear mipmaps on GFX10"
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2301
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2304
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
I copy and pasted some of the boilerplate but never the implementation.
For now, ASTC 5x5 is disabled and faked via uncompressed RGBA; let's
delete these remnants until such a time when we implement it properly.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Intel Gen9 hardware has some nasty restrictions where ASTC 5x5 formats
and color compression can't both live in the sampler cache at the same
time. To properly support it, we have to track which of those exist
in the cache and flush ASTC out or resolve away compression.
As far as I'm aware, very little uses ASTC 5x5 textures, so instead
of replicating all that for iris, we simply turn it off and rely on
the Gallium fallback mechanism to fake it via uncompressed RGBA.
This should avoid GPU hangs any time people use ASTC 5x5 with CCS.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This patch allows us to fake ASTC 5x5 specifically, while leaving the
other ASTC LDR formats with native support. I plan to use this in iris,
at least for the time being.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We already have a helper for this, so let's use that instead of rolling
our own version.
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Tested-by: Paul Cercueil <paul@crapouillou.net>
Etnaviv also does the same thing, so let's try to avoid repetition here,
and use the same for it code as well.
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Tested-by: Paul Cercueil <paul@crapouillou.net>
Not 100% sure if this matches the semantics, but it seems to pass the
tests, so it seems like an improvement.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
According to the description of VkGraphicsPipelineCreateInfo(),
pViewportState, pMultisampleState, pDepthStencilState and
pColorBlendState must be ignored when rasterization is not enabled.
This avoids potentially invalid pointers being dereferenced when
rasterization is disabled. Tested with `demos_x64 VK_Parameter_Zoo`
from Renderdoc repository.
v2: Don't store the `raster_enabled` as part of anv_pipeline, just
query it from the create info. This avoids storing a state that's
only used during pipeline creation. (Jason)
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2258
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Engestrom <eric@engestrom.ch> [v1]
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
For indexed draw number of VS invocations is (ctx->max_index - ctx->min_index + 1),
so we have to use this number when calculating space for varyings, gl_Position and
gl_PointSize.
Fixes dEQP-GLES2.functional.buffer.write.use.index_array.array and
dEQP-GLES2.functional.buffer.write.use.index_array.element_array
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
All VkFoo structs are typedef'd to not need the struct keyword. Leaving
it in there is just extra characters and breaks Vulkan's aliasing when
stuff gets promoted to core versions. It's better to just never use
struct for VkFoo.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Previously, P016 was used for the decoding of 10-bit HEVC/H.265 encoded
videos, which worked fine for mpv and ffmpeg. GStreamer specifically looks
for P010, so this patch sets the default buffer type to P010 for HEVC
decoding.
Signed-off-by: Thong Thai <thong.thai@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3153>
texelFetch is a requirement for OpenGL 3.0, so this gets us a step
closer to GL 3.0 support.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
With VK_AMD_mixed_attachment_samples, the number of depth/stencil
samples isn't always equal to the number of color samples. Adjust
the number of Z samples when it's different but make sure to have
a consistent sample count if there are no depth/stencil attachments.
Also adjust the number of samples used for fragment shaders which is
the number of color samples if mixed attachment samples are used.
Only enabled on GFX8+ because it's untested on previous chips.
All dEQP-VK.pipeline.multisample.mixed_attachment_samples.* now pass.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3018>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3018>
Since 18a8c3f7f1 we don't create a driver CSO if there are any
incompatible elements, so only ask backends to delete it if it exists.
Fixes multiple CTS crashes in V3D.
Fixes: 18a8c3f7f1 ("u_vbuf: Only create driver CSO if no incompatible elements")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
i965 and iris use inputs_read/outputs_written for a shader stage to
determine the layout of input and output storage. Adjacent stages must
agree on the layout, so adjacent input/output bitfields must match.
This patch adds a new nir_shader_compiler_options::unify_interfaces
flag which asks the linker to unify the input/output interfaces between
adjacent stages.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3249>
They need a very particular form; the naive way we did before is not
sufficient in practice, it doesn't look like. So let's follow the rough
structure of the blob's writeout since this is fixed code anyway.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This still may not be perfect (in the sense that legal shaders might
still get cut off) but this fits how writeout is done with both Panfrost
and the blob, so it's good enough for what we need and allows MRT
shaders to be sanely disassembled.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
It's a long story... but we'd try to insert constants that weren't there
and end up clobbering fields in the bundle following the constant
array...
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Blend shader size and location in memory is considerably constrained,
probably to facilitate optimizations (my guess is that blend shaders are
run strictly out of i-cache). We need to pack the blend shaders for each
RT of a single framebuffer together. The easiest way to do this is at
draw time which is not terribly efficient but will hold us over for now.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We don't handle this format yet, but we will soon, and the abort in
pan_pack_color is possible even without exposing the format... Handling
this gracefully might not be required by the spec but let's not crash.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Remove the invert on arguments to branches, and invert the branch
condition instead. This saves one instruction per inverted argument.
Closes#2088
Signed-off-by: Afonso Bordado <afonsobordado@az8.co>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The GCnano has only 4 vertex buffers instead of 16. This information
can be extracted from the GPU status registers and is already stored
in screen->specs.stream_count. Use PIPE_CAP_MAX_VERTEX_BUFFERS to
report this information and permit u_vbuf to reorganize the shaders
to fit.
This fixes the following dEQP on GCnano:
dEQP-GLES2.functional.shaders.conversions.vector_combine.float_float_float_float_to_vec4_vertex
This fixes all the other dEQP-GLES2.functional.shaders.conversions.*
which used to fail on GCnano.
Signed-off-by: Marek Vasut <marex@denx.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3241>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3241>
Fixes 240 failing test cases in dEQP-VK.spirv_assembly which
were failing due to a bad s_ashr_i32 instruction. This commit
fixes the instruction format along with the definitions of the
instruction.
Fixes: 11f43caaec
Cc: 19.3 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Two competing rules for defining u_format_table.c exists,
which is an error.
Additionally the more general rule lacks the inclusion of
format/u_format.csv.
Fixes: 882ca6dfb0 ("util: Move gallium's PIPE_FORMAT utils to /util/format/")
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Since we have a separate blend shader for each render target, let's
simplify this structure and reduce the options memory footprint by 88%
or something goofy like that.
Should also enable separate blending per render target.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We need to actually work out the varying format on demand, rather than
assuming rgba32f.
Fixes dEQP-GLES3.functional.fragment_out.basic.int.*
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tessellation Control and Evaluation shaders are implementing
tessellation and require special handling of their inputs
and outputs.
TCS can write out not only per-vertex, but also per-patch
(per-primitive) attributes and tessellation factor values
that control the tessellator.
TES can read TCS outputs, plus must be feeded with new
system values (tessellation coordinates) that are
outputs of the tessellator fixed function.
TCS can also contain calls to barrier() function (similar
to compute shaders).
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Alok Hota <alok.hota@intel.com>
It has been unused for a while; let's just remove the abstraction.
Technically the hardware does support 32-bit job descriptors, but we
don't and we can't keep them from breaking so let's not pretend they
work.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Suggested-by: Boris Brezillon <boris.brezillon@collabora.com>
There's only one way to encode comparison functions in the command
stream, not two. It's just that the semantics for texture comparisons
are flipped from the semantics of stencil comparison. We can factor out
that flip to common Panfrost code, rather than tying it to a second
Gallium routine.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Somehow we have native hardware for all of these. Suspected by staring
at the bit pattern; confirmed by poking in various texture wrap modes
into the textures mesa demo and seeing what happens.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
It's a relic from before we understood the varying builtins. It should
never actually come up if the builtins are decoded correctly.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
These are conventions by the blob (a convention we happent to follow).
They are not at all intrinsic to the hardware, so now that the
convention is implemented within the Midgard stack, these defines are
wholly unused. Remove them.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Updates radv Makefile.sources and fixes the following building error:
external/mesa/src/amd/vulkan/radv_shader.c:1122:
error: undefined reference to 'radv_declare_shader_args'
Fixes: 3b14336 ("ac/nir, radv, radeonsi: Switch to using ac_shader_args")
Fixes: 66c703b ("radv: Move argument declaration out of nir_to_llvm")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Updates amd Makefile.sources and fixes the following building errors:
external/mesa/src/gallium/drivers/radeonsi/si_compute_prim_discard.c:338: error: undefined reference to 'ac_add_arg'
external/mesa/src/gallium/drivers/radeonsi/si_compute_prim_discard.c:340: error: undefined reference to 'ac_add_arg'
external/mesa/src/gallium/drivers/radeonsi/si_compute_prim_discard.c:341: error: undefined reference to 'ac_add_arg'
external/mesa/src/gallium/drivers/radeonsi/si_compute_prim_discard.c:342: error: undefined reference to 'ac_add_arg'
Fixes: 9885af3 ("ac: Add a shared interface between radv, radeonsi, LLVM and ACO")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
RADV Android build rules are now getting the wrong vk_format.h
from src/vulkan/util include, the simplest way to fix is to add
src/amd/vulkan include prior to src/vulkan/util include
Fixes the following building errors:
out/target/product/x86_64/obj_x86/STATIC_LIBRARIES/libmesa_radv_common_intermediates/vk_format_table.c:39:4:
error: use of undeclared identifier 'VK_FORMAT_LAYOUT_PLAIN'
...
out/target/product/x86_64/obj_x86/STATIC_LIBRARIES/libmesa_radv_common_intermediates/vk_format_table.c:131:8:
error: use of undeclared identifier 'VK_FORMAT_TYPE_UNSIGNED'; did you mean 'UTIL_FORMAT_TYPE_UNSIGNED'?
{VK_FORMAT_TYPE_UNSIGNED, true, false, false, 4, 0}, /* x = a */
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
Fixes: 3a28281 ("util: Add a mapping from VkFormat to PIPE_FORMAT.")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Updates Makefile.sources and fixes the following building error:
In file included from external/mesa/src/vulkan/util/vk_format.c:24:
In file included from external/mesa/src/vulkan/util/vk_format.h:28:
external/mesa/src/util/format/u_format.h:33:10: fatal error: 'pipe/p_format.h' file not found
#include "pipe/p_format.h"
^~~~~~~~~~~~~~~~~
1 error generated.
Fixes: 3a28281 ("util: Add a mapping from VkFormat to PIPE_FORMAT.")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
It shows up as a special (magic?) attribute. We could try to be clever
and only include the extra record if gl_VertexID is actually read, but
honestly that's just extra complexity for no good reason. Might as well
just always include it; this won't be a real bottleneck, I don't think.
Fixes dEQP-GLES3.functional.shaders.builtin_variable.vertex_id.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We have special records for these, put in a fixed location by convention
per the blob.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Just like varyings have special records for point coordinates (etc),
attributes have special records for vertex/instance ID. We can parse
these fairly easily, although they don't line up exactly with normal
attribute records.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Padded counts are numbers of the form:
n = (2k + 1) * 2^s
for k, s integers. Rather than explicitly store k and s separately and
then compute this formula on demand, it's much cleaner to store the
padded number itself, which is what you manipulate most of the time.
When you do need k,s it is easy to factor by noticing the bitwise
representation:
s = ctz(n)
k = n >> (s + 1)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Slight bug with instancing. No harm done but let's get rid of the
pandecode warning, it's just noise.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The algorithm doesn't need to be tangled up in details about the
attribute records themselves. We'll need to compute magic divisors for
gl_InstanceID in a second.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
They don't need them; this will allow us to move the code into encoder/
which in turn will make the messy Gallium code less scary.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Let's follow the naming convention that panfrost command stream code is
organized by command stream structure.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
These show up in some blend shaders. Let's use the shared lowering and
remove our own.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
GL ES 3.0 requires it to be higher, and stuff seems to work just fine.
Fixes: dEQP-GLES3.functional.implementation_limits.max_vertex_output_components
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
We need to reshuffle to sync up the shadow coordinate temporary with the
cubemap coordinate temporary. Once that's in place, it's simple enough
(we load the shadow coordinate into .z like 2D).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
My latest divination spell has uncovered a pattern in the aether.
Although the swizzle is unaligned, its format is otherwise standard.
Document this, removing the old incorrect understanding of the swizzle
(which coincided on common special swizzles only).
Fixes dEQP-GLES3.functional.shaders.texture_functions.texelfetchoffset.sampler2d_fixed_fragment
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We zero the extra components anyway. Fixes
dEQP-GLES3.functional.shaders.texture_functions.texelfetch.sampler2d_fixed_fragment
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We may call it with sentinel values (~0 in particular) corresponding to
unused arguments; ignore these.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
These days `ctx->inputs` is the split scalar input components and
`ir->inputs` is the full vecN. This got fixed in the load_input case,
but the load_interpolated_input case was missed.
Fixes: bdf6b7018c ("freedreno/ir3: re-work shader inputs/outputs")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Due to the succeeding break we would fall into some off-by-one errors.
These should be resolved now.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
At the moment there's no need to actually count these but we do need a
placeholder for report.py to be happy.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We use these prefixes in panfrost shader-db and they need to match for
shader-db to be happpy.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The blob uses COMPUTE jobs for some internal purposes. These are
essentially free but panfrost doesn't use them, so it messes up the
numbering. Just filter them out.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We support the same set of samples for integer color formats as for
non-integer. We've been advertising it wrong since before the initial
Vulkan 1.0 release. :-(
Fixes: d689745303 "vk/0.210.0: Rework device features and limits"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Some of the latest changes are causing the following build error on Android:
```
external/mesa3d/src/gallium/auxiliary/nir/nir_to_tgsi_info.c:403:6:
error: redefinition of 'nir_tgsi_scan_shader'
void nir_tgsi_scan_shader(const struct nir_shader *nir,
^
external/mesa3d/src/gallium/auxiliary/nir/nir_to_tgsi_info.h:37:20:
note: previous definition is here
static inline void nir_tgsi_scan_shader(const struct nir_shader *nir,
^
```
Include nir_to_tgsi_info.c and nir_to_tgsi_info.h into the build
only if LLVM is enabled.
Signed-off-by: Roman Stratiienko <roman.stratiienko@globallogic.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2978>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2978>
We might get asked to pitch the storage on a buffer that already has
no meaningful contents. In this case, the existing buffer is as good
as a new one.
I was passing iris keys to brw_debug_key_recompile, leading to out of
bounds memory reads.
Fixes: 2e654db27a ("iris: Create smaller program keys without legacy features")
Fix build error after llvm-10 commit 5d986953c8b9 ("[IR] Split out
target specific intrinsic enums into separate headers").
../src/gallium/drivers/swr/rasterizer/jitter/functionpasses/lower_x86.cpp:78:37: error: ‘x86_bmi_bextr_32’ is not a member of ‘llvm::Intrinsic’
{"meta.intrinsic.BEXTR_32", Intrinsic::x86_bmi_bextr_32},
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Krzysztof Raszkowski <krzysztof.raszkowski@intel.com>
Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>
The GPU presents the state of the hardware front_face in internal
register 0 (i0), the range of which is 0.0f..1.0f.
This patch assigns the fragment shader input to this internal register.
Moreover, based on the internal front_ccw state, the value of the i0
register is inverted accordingly using SET.EQ/SEQ.NE instruction before
being further processed in the shader. This mimics the operation of the
NIR compiler.
Signed-off-by: Marek Vasut <marex@denx.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2868>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2868>
More vertex buffers are used than the hardware supports. In
principle, we only need to make sure that less vertex buffers are
used, and mark some of the latter vertex buffers as incompatible.
For now, mark all vertex buffers as incompatible.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2807>
This enables Mesa to work with Ingenic SoCs through the use of the
ingenic-drm modesetting driver along with the render-only drivers,
such as Etnaviv on the JZ4770 SoC.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This introduces new vec8 and vec16 instructions (which are the only
instructions taking more than 4 sources), in order to construct 8 and 16
component vectors.
In order to avoid fixing up the non-autogenerated nir_build_alu() sites
and making them pass 16 src args for the benefit of the two instructions
that take more than 4 srcs (ie vec8 and vec16), nir_build_alu() is has
nir_build_alu_tail() split out and re-used by nir_build_alu2() (which is
used for the > 4 src args case).
v2 (Karol Herbst):
use nir_build_alu2 for vec8 and vec16
use python's array multiplication syntax
add nir_op_vec helper
simplify nir_vec
nir_build_alu_tail -> nir_builder_alu_instr_finish_and_insert
use nir_build_alu for opcodes with <= 4 sources
v3 (Karol Herbst):
fix nir_serialize
v4 (Dave Airlie):
fix serialization of glsl_type
handle vec8/16 in lowering of bools
v5 (Karol Herbst):
fix load store vectorizer
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This assert causes testing tools such as shaderdb to abort on some test
cases. This is an unsupported feature and not a compiler bug. The
compilation error is already propagated correctly, so we can remove the
assert to allow testing tools to run to completion.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3176>
ppir has some code that operates on all ppir_src variables, and for that
uses ppir_node_get_src.
lod bias support introduced a separate ppir_src that is inaccessible by
that function, causing it to be missed by the compiler in some routines.
Ultimately this caused, in some cases, a bug in const lowering:
.../pp/lower.c:42: ppir_lower_const: Assertion `src != NULL' failed.
This fix moves the ppir_srcs in ppir_load_texture_node together so they
don't get missed.
Fixes: 721d82cf06 lima/ppir: add lod-bias support
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3185>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3185>
The test here is testing whether either variable is non-zero.
While currently the test works fine, it's fragile. Replace it
with logical OR to avoid the fragility.
Signed-off-by: Marek Vasut <marex@denx.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Currently piglit spec@arb_occlusion_query@occlusion_query_conform
spins for ever as the resource status is never reset. See
etna_hw_get_query_result(..) for more details.
Fixes: 1456aa61cc ("etnaviv: Rework resource status tracking")
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Tested-by: Marek Vasut <marex@denx.de>
Since we are using st_common_variant while creating variant for vertext
program, we can release tokens created in st_create_vp_variant which
are already stored in respective states.
This fix memory leak found with piglit tests
Fixes bc99b22a30 ('st/mesa: use a separate VS variant for the draw module')
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Currently when lowering mod() we add an extra instruction so if
mod(a,b) == b then 0 is returned instead of b, as mathematically
mod(a,b) is in the interval [0, b).
But Vulkan spec has relaxed this restriction, and allows the result to
be in the interval [0, b].
This commit takes this in account to remove the extra instruction
required to return 0 instead.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2922>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2922>
Using a drm syscall layer faking a kernel driver :
==581460== Conditional jump or move depends on uninitialised value(s)
==581460== by 0x48A4C2B: close (drm-hooks.cpp:185)
==581460== by 0x5A815F1: dri3_alloc_render_buffer (loader_dri3_helper.c:1469)
==581460== by 0x5A82050: dri3_get_buffer (loader_dri3_helper.c:1827)
==581460== by 0x5A82662: loader_dri3_get_buffers (loader_dri3_helper.c:2028)
==581460== by 0x6C78109: intel_update_image_buffers (brw_context.c:1870)
==581460== by 0x6C77805: intel_update_renderbuffers (brw_context.c:1499)
==581460== by 0x6C7789D: intel_prepare_render (brw_context.c:1520)
==581460== by 0x6C773D4: intelMakeCurrent (brw_context.c:1341)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 069fdd5f9f ("egl/x11: Support DRI3 v1.1")
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3152>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3152>
And add fields uncovered by looking at the firmware. I think this covers
all the memory, register, and scratch manipulation opcodes that exist on
A6xx, plus one additional nice find for Vulkan and describing a
previously unknown opcode and documenting CP_WAIT_REG_MEM.
Note that the bits for the CP_REG_TO_MEM count, as well as the formula
for computing the actual count for both CP_REG_TO_MEM and CP_MEM_TO_REG,
are changed because the A630 SQE firmware actually does something
different. I haven't investigated older microcodes to see whether this
extends back to A5xx and A4xx, but the only non-A6xx uses of this
field result in the same bit-pattern when using the A6xx bit range and
formula, so it should be safe to change the definition universally.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3116>
For the sake of our testing infrastructure, disable this extension
for TGL until we can sort out a hang in Vulkan CTS.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Existing code was ignoring whether the type of the immediate source
was signed or not. If the source was signed, it would ignore small
negative values but it also would wrongly accept values between
INT16_MAX and UINT16_MAX, causing the atual value to later be
reinterpreted as a negative number (under 16-bits).
Fixes tests/shaders/glsl-mul-const.shader_test in Piglit for older
platforms that don't support MUL with 32x32 types and use vec4.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Existing code was ignoring whether the type of the immediate source
was signed or not. If the source was signed, it would ignore small
negative values but it also would wrongly accept values between
INT16_MAX and UINT16_MAX, causing the atual value to later be
reinterpreted as a negative number (under 16-bits).
Fixes tests/shaders/glsl-mul-const.shader_test in Piglit for platforms
that don't support MUL with 32x32 types, including ICL and TGL.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2186
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
With this patch, GCC generates vectorized code that does the comparisons
without converting the indices to 32-bit first.
This optimization makes the aforementioned function almost twice as fast
for ARM NEON, and should speed up vectorised code on other platforms.
Without vectorisation, the function is still a percent or two faster,
but slightly larger.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3050>
This fixes a bug with NGG that is probably harmless.
Basically, !is_monolithic makes the VS prolog emit
llvm.amdgcn.init.exec.from.input, which sets the EXEC mask to only enable
ES threads. In the NGG non-GS case, the GS threads <= ES threads, so it was
never an issue.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3095>
lower_mul_2x32_64 generates mul_high opcodes, and lower_mul_high is done by
nir_lower_alu, so call nir_lower_alu after nir_opt_algebraic.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
If for some reason the fence associated with an image doesn't signal,
we're likely in a device lost scenario, we should report that error.
We can't really wait for a given amount of time because we could get a
timeout and that is not a valid error to report for vkQueuePresentKHR,
so just wait forever.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/830
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
I'm honestly unsure what this is for, but it's needed on MFBD systems
for unknown reasons, at least when MRT is actually in use and then
sometimes without MRT (it fixes a blend shader issue on T760?)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Visoso <tomeu.vizoso@collabora.com>
Epilogues are special fixed-function blocks, so they need special
handling for liveness analysis to work completely. This in turns fixes
RA issues for many shaders using MRT.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Visoso <tomeu.vizoso@collabora.com>
The flow is considerably more complicated. Instead of one writeout loop
like usual, we have a separate write loop for each render target. This
requires some scheduling shenanigans to get right.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Visoso <tomeu.vizoso@collabora.com>
V3D can do indirect inputs so we don't need it. Also, the lowering
produces horrible if-ladder code that is particularly bad for geometry
shaders where inputs are always arrays and shader bodies usually have
a loop indexing into them.
This fixes a couple of geometry shader tests in CTS that would fail to
register allocate otherwise.
There are no changes in shader-db.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
With geometry shaders the number of emitted primitived is decided
at run time, so we cannot precompute it in the CPU and we need to
use the PRIMITIVE_COUNTS_FEEDBACK commands to have the GPU provide
the number like we do for the number of primitives written to
transform feedback. This may have a performance impact though, since
it requires a sync wait for the draw to complete, so we only do
it when geometry shaders are present.
v2: remove '> 0' comparison for ponter type (Alejandro)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
When geometry shaders write a value to gl_Layer that doesn't correspond to
an existing layer in the target framebuffer the rendering behavior is
undefined according to the spec, however, there are CTS tests that trigger
this scenario on purpose, probably to ensure that nothing terrible happens.
For V3D, this situation is problematic because the binner uses the layer
index to select the offset to write into the tile state data, and we only
allocate tile state for MAX2(num_layers, 1), so we want to make sure we
don't produce values that would lead to out of bounds writes. The simulator
has an assert to catch this, although we haven't observed issues in actual
hardware it is probably best to play safe.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
When doing layered rendering the binning stage will prepare per-tile
lists for each layer in the framebuffer, so we need to make sure
we allocate enough space for them .
We also need to emit the NUMBER_OF_LAYERS packet. This is required
even when the number of layers is only 1, otherwise the simulator
detects buffer overflows in the tile_state BO during some CTS test
cases involving layered FBOs.
When rendering, we need to emit commands for each layer of the
framebuffer separately and make sure we address the correct layers for
each one.
v2: fixed typo in comment (Alejandro)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
For layered rendering we need to emit per layer rendering commands
lists so we we can end up requiring a fairly large buffer for this
if the number of layers is large enough.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
OES_geometry_shader introduced the concept of layered framebuffers.
Removing this assertion gets a bunch of CTS tests to pass. We will
also need layered images to implement layered rendering with geometry
shaders.
v2: fix typo in commit message (Alejandro)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
If we program an output size of 0 the simulator asserts. This was
not a problem until now because our VS would always have to
emit fixed function outputs, however, now that it can be paired
with a GS we can end up with a VS shader that no longer emits
any outputs.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Geometry shaders can output many vertices and thus have higher VPM memory
pressure as a result. It is possible that too wide geometry shader dispatches
exceed the maximum available VPM output allocated, in which case we need
to reduce the dispatch width until we can fit the VPM memory requirements.
Supported dispatch widths for geometry shaders are 16, 8, 4, 1.
There is a limit in the number of VPM output sectors that can be used by a
geometry shader that we can meet by lowering the dispatch width at compile
time, however, at draw time we need to revisit this number and, together with
other elements that can contribute to total VPM memory requirements, decide
on a configuration that can fit the program into the available VPM memory.
Ideally, we also want to aim for not using more than half of the available
memory so we that we can run a pair of bin and render programs in parallel.
v2: fixed language in comment and typo in commit log. (Alejandro)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
According to the documentation, the 1-way dispatch width is only supported
with geometry shaders.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
This is good enough to get basic GS workloads working, later patches will
improve this by adding instancing support, proper SIMD configuration, etc.
Notice that most of the TESSELLATION_GEOMETRY_SHADER_PARAMS fields are only
relevant when tessellation shaders are present. We do not support tessellation
yet, but we still need to fill in these tessellation state with default values
since our packing functions require some of these to have non-zero values.
v2:
- Add a comment in the code explaining why we fill in
tessellation fields (Alejandro)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Every code address starts at bit 3 (addresses must be 64-bit aligned),
with the first 3 bits used to specify threading and NaN propagation
parameters for the shader program.
We generally skip "reserved" bits, however, doing this when the
reserved field is the last in a struct and it is large enough can
make us compute incorrect (smaller) struct sizes which can
lead to corrupt CLs. In particular, the "Tess/Geom Common Params"
struct has a reserved field at the end that is 8-bit, so if we
don't include this we compute a packet size that is 1 byte smaller
than it shold, making the next packet we emit start 1 byte
earlier and therefore leading to incorrect CL data from that point
forward.
The name of one of the fields was not correct.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Most of the relevant work happens in the v3d_nir_lower_io. Since
geometry shaders can write any number of output vertices, this pass
injects a few variables into the shader code to keep track of things
like the number of vertices emitted or the offsets into the VPM
of the current vertex output, etc. This is also where we handle
EmitVertex() and EmitPrimitive() intrinsics.
The geometry shader VPM output layout has a specific structure
with a 32-bit general header, then another 32-bit header slot for
each output vertex, and finally the actual vertex data.
When vertex shaders are paired with geometry shaders we also need
to consider the following:
- Only geometry shaders emit fixed function outputs.
- The coordinate shader used for the vertex stage during binning must
not drop varyings other than those used by transform feedback, since
these may be read by the binning GS.
v2:
- Use MAX3 instead of a chain of MAX2 (Alejandro).
- Make all loop variables unsigned in ntq_setup_gs_inputs (Alejandro)
- Update comment in IO owering so it includes the GS stage (Alejandro)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
While lowering vpm outputs we look for the NIR variables matching
particular store output instructions and we expect to find a match,
so assert on that.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
BEGIN_RING() could decide we can't fit the next packet in the current
cmdstream segment, and grow a new segment. So we need to grab ring->cur
*after* BEGIN_RING(), otherwise we are writing cmdstream past the end of
the previous segment.
Fixes: bdd98b892f ("freedreno: New struct packing macros")
Signed-off-by: Rob Clark <robdclark@chromium.org>
The current strategy using the suballocator with fixed size doesn't
scale and causes some programs with large number of vertices (like some
glmark2 scenes) to crash.
Change it to dynamically allocate a separate bo to accomodate for
arbitrary number of vertices.
This also fixes the buffer read/write flags for gp.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2445>
I feel really bad about this but this one test is flaking. I don't want
to do a mass revert (and bisection is extremely difficult with
nondeterministic/Heisenbugs), but it's Friday night and master needs to
pass. This commit should be reverted asap (once the flake is solved)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This can be used to start/stop statistics capturing from the command
line.
v3:
- Install script (Lionel)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
By default, if an output_file is specified, the overlay layer will start
capturing data immediately. After this commit, when a control socket is
used, the capture starts disabled by default, and is only enabled when a
command ":capture=1;" is received.
when the capture is enabled, we might have already accumulated some
stats. To avoid capturing such noise, we discard and reset the fps and
stats, updating the display and capturing only data from that point on.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Add support for socket from which the overlay layer can receive
commands. This control socket can be useful to allow setting options
once the application is already running. For instance, triggering the
capture of fps data at a certain point.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Add some infrastructure to trace scheduler decisions. The next patch
will add some more traces, just splitting this out to reduce clutter.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Sometimes sched changes that are a win in terms of instruction count
and/or register pressure, are worse in real life, due to keeping varying
storage locked for too long. Add a shader-db stat to give this more
visibility.
Signed-off-by: Rob Clark <robdclark@chromium.org>
We'll need this so we can allocate a stack for the batch large enough
for all the jobs within it.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The size of the scratchpad (as well as some tiler details) depend on the
contents of the batch, so we need to wait to defer filling out the FBD
until after all draws are queued.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The only format that needs swizzle is R8 emulated with L8, so we can get
rid of the SWIZ(X, Y, Z, W) everywhere.
Note: R8G8 also had a swizzle, but it wasn't necessary.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
This supports all sRGB formats, without having them in the format table.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
The Vulkan spec doesn't have any words for vertex attributes alignment.
Fixes a test failure on GFX6 and a GPU hang on GFX10 with:
dEQP-VK.spirv_assembly.instruction.spirv1p4.entrypoint.tess_con_pc_entry_point
vkpipeline-db results on GFX10:
Totals from affected shaders:
SGPRS: 463772 -> 472972 (1.98 %)
VGPRS: 343208 -> 343752 (0.16 %)
Spilled SGPRs: 323 -> 336 (4.02 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 13806200 -> 14164472 (2.60 %) bytes
Max Waves: 84021 -> 83755 (-0.32 %)
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2161
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Our current implementation of performance queries is fairly harsh
because it completely flushes and invalidates the 3d pipeline caches
at the beginning and end of each query. An argument can be made that
this is how performance should be measured but it probably doesn't
reflect what the application is actually doing and the actual cost of
draw calls.
A more appropriate approach is to just stall the pipeline at
scoreboard, so that we measure the effect of a draw call without
having the pipeline in a completely pristine state for every draw
call.
v2: Use end of pipe PIPE_CONTROL instruction for Iris (Ken)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This was initially intended to fix issues with the query timings going
occassionally high.
It turns out there was a bug in the attribution of OA reports to our
context when parsing the OA data. This led to reports flagged with
other context IDs to be included in our queries results.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were passing cl->bo, which is NULL, so v3d_job_add_bo was a no-op.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Partial depth/stencil clear and skipping unused attachments.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
We don't have an entry for cpp 128 in the tile_alignment table, but I don't
think the HW supports this at all (blob driver just doesn't have 8x msaa).
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Use a special format which allows sampling the stencil and set the correct
swizzle.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
We don't have layered rendering and ir3 doesn't support this intrinsic, so
just set it to zero for now.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
It looks like the actual tile alignment requirement is less than 32x32, but
in some cases input attachment texture needs 64 alignment.
Reduced the h alignment to 16 to compensate and it seems to work fine.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Use DIV_ROUND_UP and stop trying to increase the tile_count width/height
once tile_align_w/tile_align_h are reached.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
pColorBlendState is allowed to be NULL if subpass has >0 color attachments
but they are all unused.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Here we use the NIR based builder to add everything to the resource
list execpt for SSO packed varyings. Since the details of those
varyings get lost during packing we leave the special handing to
the GLSL IR pass for now. In order to do this we add some bools
to the build resource list functions.
Using the NIR based resource list builder gets us a step closer to
using a native NIR based linker. It should also be faster than the
GLSL IR builder, one because the NIR optimisations should mean we
add less entries due to better optimisations, and two because nir
gives us better lists to work with and we don't need to walk the
entire IR to find the resources.
Ack-by: Alejandro Piñeiro <apinheiro@igalia.com>
In a following commit we will use a NIR based builder to build the
OpenGL resource list, so we want to delay this call a little.
Ack-by: Alejandro Piñeiro <apinheiro@igalia.com>
This adds support for adding names of varying to the resource list
which is required for us to use this function with the glsl linker.
Support for names is optional for spirv which is why it had not been
added yet.
This is mostly a copy of the GLSL IR code adapted to nir.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
In order to be able to implement a NIR based glsl linker we need to
build the program resource list with NIR. This change delays the
remaping so that a later commit can call the NIR based resource
list builder.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
It's already enabled for all gallium drivers that support GLSL 1.40 or
above and we already support everything in our compiler on SNB+
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It's conceptually independent from the upper part (which is not yet
understood, but for spilling generally remains equal to 0x1e).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Due to this issue we were using 4x the memory we should have for TLS,
which was messing up the size calculations. Oops!
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
I'm not totally sure why this would *break* things, but it's certainly
not necessary and it does break things. Somehow this gives the RA more
freedom, fixing some spill issues.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We would like no_spill decisions to be class-specific -- spilling from
special register to a work register doesn't preclude also spilling that
work register to stack.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This allows us to spill two 128-bit values in the same bundle, since we
have two registers we can spill with. This improves the
register allocation flexibility in programs with heavy spilling, though
unfortunately it isn't sufficient (theoretically, 3.5 128-bit values can
be spilled from 3 vector units and 2 scalar units).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This has been unused since the beginning since it's broken. Let's toss
it so it doesn't get in the way of further fixes. Bigger to fish to fry.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We do need some sort of a cost heuristic, but this one is just causing
spilling to behave worse on shaders I'm looking at, and I don't need
more noise in the spill implementation right now.
Get it working first. We can optimize this later.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Let's not worry about spilling twice in a bundle; that's too
restrictive. We'll need to change the schedule itself -- unfortunately,
this can have second-order effects due to pipeline registers.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Instead of having a giant function for both, split into the two
subtasks so we can handle errors better.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We move it to the register allocator itself. It doesn't belong in
midgard_schedule.c!
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Should be harmless, but UBSAN complains about it and fills the logs with
noise.
../src/mesa/state_tracker/st_manager.c:523:27: runtime error: member access within null pointer of type 'struct st_framebuffer'"}
#0 0xaad4e89c in st_framebuffer_reference ../src/mesa/state_tracker/st_manager.c:523"}
#1 0xaad4e89c in st_api_make_current ../src/mesa/state_tracker/st_manager.c:1091"}
#2 0xaab69e0e in dri_make_current ../src/gallium/state_trackers/dri/dri_context.c:301"}
#3 0xaab48fd2 in driBindContext ../src/mesa/drivers/dri/common/dri_util.c:581"}
#4 0xb682a122 in dri2_make_current ../src/egl/drivers/dri2/egl_dri2.c:1625"}
#5 0xb67f95a4 in eglMakeCurrent ../src/egl/main/eglapi.c:884"}
#6 0x4c2b0e in tcu::surfaceless::EglRenderContext::EglRenderContext(glu::RenderConfig const&, tcu::CommandLine const&) (/deqp/modules/gles2/deqp-gles2+0x29b0e)"}
#7 0x4c3302 in tcu::surfaceless::ContextFactory::createContext(glu::RenderConfig const&, tcu::CommandLine const&, glu::RenderContext const*) const (/deqp/modules/gles2/deqp-gles2+0x2a302)"}
#8 0x73a9b0 in glu::createRenderContext(tcu::Platform&, tcu::CommandLine const&, glu::RenderConfig const&, glu::RenderContext const*) (/deqp/modules/gles2/deqp-gles2+0x2a19b0)"}
#9 0x73ad86 in glu::createDefaultRenderContext(tcu::Platform&, tcu::CommandLine const&, glu::ApiType) (/deqp/modules/gles2/deqp-gles2+0x2a1d86)"}
#10 0x4c6a78 in deqp::gles2::Context::Context(tcu::TestContext&) (/deqp/modules/gles2/deqp-gles2+0x2da78)"}
#11 0x4c3ba0 in deqp::gles2::TestPackage::init() (/deqp/modules/gles2/deqp-gles2+0x2aba0)"}
#12 0x852fd8 in tcu::TestHierarchyIterator::next() (/deqp/modules/gles2/deqp-gles2+0x3b9fd8)"}
#13 0x829660 in tcu::TestSessionExecutor::iterate() (/deqp/modules/gles2/deqp-gles2+0x390660)"}
#14 0x810aac in tcu::App::iterate() (/deqp/modules/gles2/deqp-gles2+0x377aac)"}
#15 0x4c1d4c in main (/deqp/modules/gles2/deqp-gles2+0x28d4c)"}
#16 0xb64b6aa8 in __libc_start_main (/lib/arm-linux-gnueabihf/libc.so.6+0x1aaa8)"}
../src/mesa/state_tracker/st_atom.c:115:8: runtime error: member access within null pointer of type 'struct st_program'"}
#0 0xaae11a58 in check_program_state ../src/mesa/state_tracker/st_atom.c:115"}
#1 0xaae128f6 in st_validate_state ../src/mesa/state_tracker/st_atom.c:192"}
#2 0xaadc58c2 in prepare_draw ../src/mesa/state_tracker/st_draw.c:132"}
#3 0xaadc58c2 in st_draw_vbo ../src/mesa/state_tracker/st_draw.c:184"}
#4 0xabc4f924 in _mesa_validated_drawrangeelements ../src/mesa/main/draw.c:816"}
#5 0xabc50240 in _mesa_DrawElements ../src/mesa/main/draw.c:970"}
#6 0x73ebd2 in glu::CallLogWrapper::glDrawElements(unsigned int, int, unsigned int, void const*) (/deqp/modules/gles2/deqp-gles2+0x2d4bd2)"}
#7 0x6d86b2 in deqp::gls::FragOpInteractionCase::iterate() (/deqp/modules/gles2/deqp-gles2+0x26e6b2)"}
#8 0x494d16 in deqp::gles2::TestCaseWrapper::iterate(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x2ad16)"}
#9 0x7f9cf2 in tcu::TestSessionExecutor::iterateTestCase(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x38fcf2)"}
#10 0x7fa5f0 in tcu::TestSessionExecutor::iterate() (/deqp/modules/gles2/deqp-gles2+0x3905f0)"}
#11 0x7e1aac in tcu::App::iterate() (/deqp/modules/gles2/deqp-gles2+0x377aac)"}
#12 0x492d4c in main (/deqp/modules/gles2/deqp-gles2+0x28d4c)"}
#13 0xb64b9aa8 in __libc_start_main (/lib/arm-linux-gnueabihf/libc.so.6+0x1aaa8)"}
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
UBSAN complained that when alpha was 255 and we shifted it 24 positions
to the left, it didn't fit in a signed int. That's because bitwise
operations automatically promote to signed int.
../src/gallium/drivers/panfrost/pan_job.c:1130:64: runtime error: left shift of 255 by 24 places cannot be represented in type 'int'"}
#0 0xacf953d6 in pan_pack_color ../src/gallium/drivers/panfrost/pan_job.c:1130"}
#1 0xacf953d6 in panfrost_batch_clear ../src/gallium/drivers/panfrost/pan_job.c:1204"}
#2 0xaae3226a in st_Clear ../src/mesa/state_tracker/st_cb_clear.c:513"}
#3 0x4c3d0e in deqp::gles2::TestCaseWrapper::iterate(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x2ad0e)"}
#4 0x828cf2 in tcu::TestSessionExecutor::iterateTestCase(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x38fcf2)"}
#5 0x8295f0 in tcu::TestSessionExecutor::iterate() (/deqp/modules/gles2/deqp-gles2+0x3905f0)"}
#6 0x810aac in tcu::App::iterate() (/deqp/modules/gles2/deqp-gles2+0x377aac)"}
#7 0x4c1d4c in main (/deqp/modules/gles2/deqp-gles2+0x28d4c)"}
#8 0xb64b6aa8 in __libc_start_main (/lib/arm-linux-gnueabihf/libc.so.6+0x1aaa8)"}
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Should be harmless, but UBSAN complains about it and fills the logs with
noise.
../src/gallium/auxiliary/util/u_inlines.h:110:8: runtime error: member access within null pointer of type 'struct pipe_surface'"}
#0 0xaaccf186 in pipe_surface_reference ../src/gallium/auxiliary/util/u_inlines.h:110"}
#1 0xaaccf186 in util_copy_framebuffer_state ../src/gallium/auxiliary/util/u_framebuffer.c:105"}
#2 0xaabfb60e in cso_set_framebuffer ../src/gallium/auxiliary/cso_cache/cso_context.c:723"}
#3 0xaae195ce in st_update_framebuffer_state ../src/mesa/state_tracker/st_atom_framebuffer.c:207"}
#4 0xaae12316 in st_validate_state ../src/mesa/state_tracker/st_atom.c:261"}
#5 0xaae31302 in st_Clear ../src/mesa/state_tracker/st_cb_clear.c:438"}
#6 0x4c3d0e in deqp::gles2::TestCaseWrapper::iterate(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x2ad0e)"}
#7 0x828cf2 in tcu::TestSessionExecutor::iterateTestCase(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x38fcf2)"}
#8 0x8295f0 in tcu::TestSessionExecutor::iterate() (/deqp/modules/gles2/deqp-gles2+0x3905f0)"}
#9 0x810aac in tcu::App::iterate() (/deqp/modules/gles2/deqp-gles2+0x377aac)"}
#10 0x4c1d4c in main (/deqp/modules/gles2/deqp-gles2+0x28d4c)"}
#11 0xb64b6aa8 in __libc_start_main (/lib/arm-linux-gnueabihf/libc.so.6+0x1aaa8)"}
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
It's undefined behavior UBSAN complains about, so fixing this will
reduce the noise a bit.
../src/compiler/nir/nir_clone.c:710:4: runtime error: null pointer passed as argument 2, which is declared to never be null"}
#0 0xac781be4 in clone_function ../src/compiler/nir/nir_clone.c:710"}
#1 0xac781be4 in nir_shader_clone ../src/compiler/nir/nir_clone.c:740"}
#2 0xacf99442 in panfrost_shader_compile ../src/gallium/drivers/panfrost/pan_assemble.c:54"}
#3 0xacf6b268 in panfrost_bind_shader_state ../src/gallium/drivers/panfrost/pan_context.c:1960"}
#4 0xaae326bc in set_fragment_shader ../src/mesa/state_tracker/st_cb_clear.c:135"}
#5 0xaae326bc in clear_with_quad ../src/mesa/state_tracker/st_cb_clear.c:335"}
#6 0xaae326bc in st_Clear ../src/mesa/state_tracker/st_cb_clear.c:518"}
#7 0x494d0e in deqp::gles2::TestCaseWrapper::iterate(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x2ad0e)"}
#8 0x7f9cf2 in tcu::TestSessionExecutor::iterateTestCase(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x38fcf2)"}
#9 0x7fa5f0 in tcu::TestSessionExecutor::iterate() (/deqp/modules/gles2/deqp-gles2+0x3905f0)"}
#10 0x7e1aac in tcu::App::iterate() (/deqp/modules/gles2/deqp-gles2+0x377aac)"}
#11 0x492d4c in main (/deqp/modules/gles2/deqp-gles2+0x28d4c)"}
#12 0xb64b9aa8 in __libc_start_main (/lib/arm-linux-gnueabihf/libc.so.6+0x1aaa8)"}
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
As found by UBSAN, it should be harmless but it's good to remove any UB
so the tool's output is useful.
../src/panfrost/midgard/midgard_schedule.c:1094:9: runtime error: index -1 out of bounds for type 'midgard_instruction *[6]'"}
#0 0xad047872 in schedule_block ../src/panfrost/midgard/midgard_schedule.c:1094"}
#1 0xad04d41a in schedule_program ../src/panfrost/midgard/midgard_schedule.c:1116"}
#2 0xad031f98 in midgard_compile_shader_nir ../src/panfrost/midgard/midgard_compile.c:2588"}
#3 0xacf9874e in panfrost_shader_compile ../src/gallium/drivers/panfrost/pan_assemble.c:68"}
#4 0xacf6b268 in panfrost_bind_shader_state ../src/gallium/drivers/panfrost/pan_context.c:1960"}
#5 0xaae2596e in st_update_fp ../src/mesa/state_tracker/st_atom_shader.c:168"}
#6 0xaae12316 in st_validate_state ../src/mesa/state_tracker/st_atom.c:261"}
#7 0xaadc58c2 in prepare_draw ../src/mesa/state_tracker/st_draw.c:132"}
#8 0xaadc58c2 in st_draw_vbo ../src/mesa/state_tracker/st_draw.c:184"}
#9 0xabc4f924 in _mesa_validated_drawrangeelements ../src/mesa/main/draw.c:816"}
#10 0xabc50240 in _mesa_DrawElements ../src/mesa/main/draw.c:970"}
#11 0x73ebd2 in glu::CallLogWrapper::glDrawElements(unsigned int, int, unsigned int, void const*) (/deqp/modules/gles2/deqp-gles2+0x2d4bd2)"}
#12 0x6d86b2 in deqp::gls::FragOpInteractionCase::iterate() (/deqp/modules/gles2/deqp-gles2+0x26e6b2)"}
#13 0x494d16 in deqp::gles2::TestCaseWrapper::iterate(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x2ad16)"}
#14 0x7f9cf2 in tcu::TestSessionExecutor::iterateTestCase(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x38fcf2)"}
#15 0x7fa5f0 in tcu::TestSessionExecutor::iterate() (/deqp/modules/gles2/deqp-gles2+0x3905f0)"}
#16 0x7e1aac in tcu::App::iterate() (/deqp/modules/gles2/deqp-gles2+0x377aac)"}
#17 0x492d4c in main (/deqp/modules/gles2/deqp-gles2+0x28d4c)"}
#18 0xb64b9aa8 in __libc_start_main (/lib/arm-linux-gnueabihf/libc.so.6+0x1aaa8)"}
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tessellator defines own fmin/fmax functions that conflict
with those defined in cmath header. Need to use legacy math.h
which was originally used in MS code.
Reviewed-by: Krzysztof Raszkowski <krzysztof.raszkowski@intel.com>
Global load/store instructions can't know if the offset is
out-of-bound because they don't use descriptors (no range).
Fix this by clamping the offset for arrays that are indexed
with a non-constant offset that's greater or equal to the array
size.
This fixes VM faults and GPU hangs with Dead Rising 4.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2148
Fixes: 71a6794200 ("ac/nir: Enable nir_opt_large_constants")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Since f9a3d9738b temporary BO_WSI are definitely a thing so we have
an assert wrong.
Take that opportunity to expand a bit on an existing comment.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: f9a3d9738b ("anv: Use BO fences/semaphores for AcquireNextImage")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
We appear to have got lucky that the only type of temporary fence
payload we could have was a syncobj and that would only happen when
the type of the permanent payload was also a syncobj.
This code was broken if that assumption changed and it did in commit
f9a3d9738b.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
This adds nir encoding for these, generating them from libclc
was very expensive, and this is a lot simpler.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
There is an alignment issue doing this the other way, the
spec clearly says vload/store don't require alignment.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Neither Mutter nor KWin's wayland compositors appear to use modifiers.
In the non-modifier case, iris was still trying to use Y-tiling for
scan-out surfaces, leading to this error:
(gnome-shell:7247): mutter-WARNING **: 09:23:47.787: meta_drm_buffer_gbm_new failed: drmModeAddFB failed: Invalid argument
We now fall back to the historical X-tiling for scanout buffers, which
ought to work everyone, at lower performance. To regain that, we need
to ensure modifiers are actually supported in environments people use.
Fixes: fbf3124771 ("iris: Rework tiling/modifiers handling")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
I forgot why that was required, but it still is the correct thing to do.
Hit it at some point when working on implementing more CL features.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Just like we already do in the llvm backend. The current constant buffer code
seems fundamentally flawed and right now we are thinking on how we want to
reimplement all of that.
But until that happens, just treat is as global memory and go on.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The caller is responsible for setting up the ubo_addr_format value as
contrary to shared and global, it's not controlled by the spirv.
Right now clovers implementation of CL constant memory uses a 24/8 bit format
to encode the buffer index and offset, but that code is dead as all backends
treat constants as global memory to workaround annoying issues within OpenCL.
Maybe that will change, maybe not. But just in case somebody wants to look at
it, add a toggle for this inside vtn.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The current code looks like a typo, and fails if the usage_mask
is for a y/z enabled input.
Fixes piglit ext_transform_feedback-immediate-reuse-index-buffer
with llvmpipe/nir
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The previous fix worked when the second channel wasn't exposed, but
a couple of piglit tests have inputs with just the y/z chans, no x/w.
Partly Fixes piglit ext_transform_feedback-immediate-reuse-index-buffer
with llvmpipe/nir
Fixes: 5363cda52b ("gallivm: add swizzle support where one channel isn't defined.")
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Including non-functional changes to get the value from the fd_reg_pair
in places.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
All developers now use gitlab, don't confuse newcomers by suggesting
they might use the mailing list. We want everyone to use gitlab so
that patches get run through basic CI before they are merged.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Vulkan 1.1 requires VK_KHR_external_fence which requires syncobj support
to be actually usable. However, it doesn't strictly require that we
support any external handle types. We should be able to advertise 1.1
even on old kernels that don't have syncobj support.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
When we have syncobj_wait, we can trust in WAIT_FOR_SUBMIT but when we
don't, we only have BO waits and those aren't quite as nice. This
commit adds a flag to _anv_queue_submit to wait for the queue to drain
before returning. This gives us the behavior we need to implement
DeviceWaitIdle.
Fixes: 246261f0ad "anv: prepare the driver for delayed submissions"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Otherwise we may lower some fdot to fdph which is not implemented in pp.
Fixes#2126
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Nir serializes uses nir_ssa_alu_instr_src_components in a few places to
determine how many components a src has, but that's not what this function
returns. It simply returns how many channels are used, which is still fine
for most of the code.
This was breaking code like this:
vec16 32 ssa_1 = intrinsic load_global
vec1 32 ssa_2 = fmax ssa_1.a, ssa_2.b
v2: make the 16bit encoding work for identify swizzles again
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This is correct per the Vulkan spec format equivalence table.
Fixes: f36b52740a "radv/android: Add android hardware buffer queries."
Reviewed-by: Eric Anholt <eric@anholt.net>
Sometimes it's useful to get information about GPU faults in the
console, so it's synchronized with other messages.
This commit will cause Mesa to wait for completion and check if there
are any faults raised by the GPU.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
A lot of the brw_*_prog_key fields are for emulating features on legacy
hardware that iris doesn't support. In particular, all of the texture
swizzle fields take up a lot of space. These dead fields make hashing
the shader keys more expensive than it ought to be.
We introduce iris-specific keys with only the information we need, and
translate them to brw keys when actually compiling new variants. This
way, key comparisons can use the small keys. The size reductions are:
VS: 328 bytes -> 8 bytes
TCS: 312 bytes -> 24 bytes
TES: 304 bytes -> 24 bytes
GS: 284 bytes -> 8 bytes
FS: 304 bytes -> 16 bytes
CS: 280 bytes -> 4 bytes
Scores for the Piglit drawoverhead microbenchmark case with a shader
program change improve by roughly 30%.
Reviewed-by: Eric Anholt <eric@anholt.net>
macOS does not have pthread_getcpuclockid.
src/util/u_thread.h:156:4: error: implicit declaration of function 'pthread_getcpuclockid' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
pthread_getcpuclockid(thread, &cid);
^
Fixes: 4913215d14 ("util/u_thread: don't restrict u_thread_get_time_nano() to __linux__")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2171
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Acked-by: Eric Engestrom <eric@engestrom.ch>
This gets us shared non-UBWC layout code between gallium and turnip.
Until I fix up the rest of gallium to handle UBWC mipmapping, we do the
single-level UBWC setup in gallium as a fixup after layout.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
We pass in all the parameters for setting up the layout, though freedreno
still sets a few of them up early (since it uses layout helpers in making
some decisions about the layout setup parameters that will be cleaned up
once krh's blitter work lands).
This lets us start using some of the fdl_* helpers and have more obviously
matching code between gallium and turnip. We can't yet use the fdl_* UBWC
helpers, since the gallium driver doesn't do UBWC mipmaps (which I'm
working on in another branch).
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
It's the same logic for each of these being emitted, and I was about to
change the rsc->layout.* for UBWC.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
We can just bake the UBWC-goes-first delta into the slices at setup time.
I did have to fix up the resource shadowing swap path to swap the slice
fields, as it was missing and regressed the format reinterpets otherwise.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
i965 wants to use an offset from a base because everything is in a
single buffer whose address may be relocated, and all base addresses
are set to the start of that buffer.
iris wants to use a full 64-bit address, because state lives in separate
buffers which may be in the shader, surface, and dynamic memory zones,
where addresses grow downward from the top of a 4GB zone, So it's very
possible for a 32-bit offset to exist relative to multiple bases,
leading to the wrong state size.
low-level implementation of INTEL-performance-query APIs in
Intel iris driver. Most of functions and procedures defined here
are adopted from i965 driver (brw_performance_query.c)
v2: - replace genX_init_performance_query with
iris_init_perfquery_functions which is gen's version agnositic
- general code clean-up
v3: include gen_perf_gens.h as some of defines were moved to this new
header file
v4: - checking for kernel 4.13+ won't be needed here as Iris won't be
loaded anyway without DRM_SYNCOBJ that is enabled after Kernel
4.13.
- checking whether gen < 8 or is_cherryview won't be required as
well because those cases are screened in iris_screen_create.
v5: remove genX(init_performance_query)
v6: - remove oa_metrics_kernel_support as iris works only with kernel
4.18 and newer.
- use perf functions defined in separate file, iris_perf.h/c
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The configuration of the gen_perf vtable will be the same for
INTEL_performance_query and AMD_performance_monitor.
Initialize the table in a single routine that can be called from both
implementations.
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
new state tracker APIs added for INTEL_performance_query
This extension is enabled if all vendor specific functions for it
exist.
v2: add st_cb_perfquery.* to the list of sources in Makefile
v3: minor code clean-up
v4: - add driver hooks for intel-performance-query apis
- add PIPE level performance counter and type enums that
match to OpenGL enums
- do conversion of pipe_perf_counter_type and
pipe_perf_counter_data_type enums to GL defines in state_tracker
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
TCCNTLREG contains additional L3 cache write merging optimizations.
The default value on my system appears to be:
- URB Partial Write Merging (bit 0)
- L3 Data Partial Write Merging (bit 2)
- TC Disable (bit 3)
Windows drivers appear to set bit 1 as well to enable "Color/Z Partial
Write Merging". This should solve an issue we were seeing where MRT
benchmarks were using substantially more bandwidth than they ought.
However, we have not observed it to cause measurable FPS gains.
It is unclear whether we should be setting bit 0 or bit 3, so for now
we leave those at the hardware default value.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
TCCNTLREG contains additional L3 cache write merging optimizations.
The default value on my system appears to be:
- URB Partial Write Merging (bit 0)
- L3 Data Partial Write Merging (bit 2)
- TC Disable (bit 3)
Windows drivers appear to set bit 1 as well to enable "Color/Z Partial
Write Merging". This should solve an issue we were seeing where MRT
benchmarks were using substantially more bandwidth than they ought.
However, we have not observed it to cause measurable FPS gains.
It is unclear whether we should be setting bit 0 or bit 3, so for now
we leave those at the hardware default value.
Improves performance in Manhattan 3.0 by 6% on ICL 8x8 at a fixed
frequency, according to Felix Degrood. I didn't see any improvements
at out-of-the-box power management settings, however.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
TCCNTLREG contains additional cache programming settings. In
particular, there are several write combining controls we'd like to use.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
This makes simple_mtx_destroy set the counter to an invalid canary
value and then makes lock/unlock assert that the value is legal.
That way, calling lock/unlock after destroy will assert fail,
rather than deadlocking or potentially even working.
This has caught real deadlocks in dEQP multithreaded tests (in st/mesa
shader variant zombie list handling), which have since been fixed.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
In particular, we need to invalidate the LRZ state when we cannot be
confident in what the Z state would be during rendering:
1) depth test modes not supported by LRZ
2) stencil test, which would require full rasterization and stencil
test in the binning pass (whereas LRZ normally just needs to
determine the min and max z value in an 8x8 quad)
Signed-off-by: Rob Clark <robdclark@chromium.org>
This was reverted needlessly because if was part of another series.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-By: Tapani Pälli <tapani.palli@intel.com>
Maybe finer way of dealing with this requirement would be to increase
the number of pdevice->memory.types[] to add a category for special
alignment cases.
Meanwhile this fixes the problem of CCS surface alignment and it's
probably not going to cause issues given the size of our address
space.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 6af8a4acc4 ("anv: Add aux-map translation for gen12+")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This has no real effect other than the names of the temporary files in
the build folder.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
It either clears the whole HTILE buffer or part of it depending
on the HTILE mask parameter.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
I don't think this makes much differences and a potential clear
following the initialization will overwrite HTILE anyways.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
For depth+stencil images, the driver might use an optimized path
if only one aspect is cleared. It either clears the depth or the
stencil part of HTILE. Because the two separate aspects might use
the same HTILE memory we have to synchronize.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
PA_SC_AA_CONFIG might be updated when conversative rasterization is
enabled. Because the driver only re-emits the multisample state if
the number of samples is different, that register value might not
be updated correctly.
Found by inspection, doesn't fix anything known.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The new callback is called right before the flush is done to allow
users of st->flush to do some work after all the previous work has
been flushed.
This will be used by dri_flush in the next commit.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
When using 3 planes, the sequence produces this chain:
plane0 -> plane2
This commit fixes this to produce:
plane0 -> plane1 -> plane2
Fixes: 86e60bc265 ("radeonsi: remove si_vid_join_surfaces and use combined planar allocations")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2193
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
instead of keeping the IR indefinitely in st_vp_variant.
This trivially fixes Selection/Feedback/RasterPos for NIR.
Reviewed-by: Dave Airlie <airlied@redhat.com>
gallivm receives these opcodes anyway because st_draw_feedback.c uses
shaders that were assembled for drivers, not llvmpipe.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
With 781a78 ("mesa: enable ARB_direct_state_access in compat for
GL3.1+), it's possible to have DSA with GL3.1+.
FTL creates a GL3.1 compat context, but fails the
_mesa_has_geometry_shaders(..) check in frame_buffer_texture.
Bump the compat version to pass the check.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We were iterating over the entire 32-entry array each time, when we
can just use a bitset to know that we're only uploading from the first
entry normally.
Knocks ir3_emit_user_consts down from ~.5% of CPU to .1% on WebGL
fishtank.
Reviewed-by: Rob Clark <robdclark@chromium.org>
The default is to not throw GL errors when drawing with mapped
buffers, but we were forcing it on for unclear reasons. Internally we
keep all our buffers mapped anyway, so it should be a no-op other than
reducing CPU overhead (.23% in a perf report for WebGL fishtank)
Reviewed-by: Rob Clark <robdclark@chromium.org>
u_decomposed_prims_for_vertices cannot support POLYGON, but POLYGON is
trivial to support as a special case directly (since we have the number
of vertices directly).
Fixes aborts in Panfrost in apps using GL_POLYGON.
Fixes: e881aa8c12 ("gallium/util: Add u_stream_outputs_for_vertices helper")
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Revewied-by: Eric Anholt <eric@anholt.net>
Also removing the FIXME comments after matching the numbers with
updated documentation.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Assume that resource is tiled if we get DRM_FORMAT_MOD_INVALID
in resource_from_handle() and we don't have RO.
Fixes: 8c12f4e5f2 ("lima: enable tiling")
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Gives a very slight decrease in code size:
Totals from affected shaders:
Code Size: 1708488 -> 1702768 (-0.33 %) bytes
Max Waves: 2858 -> 2855 (-0.10 %)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
This patch also disables AMD_shader_ballot on GFX7 by default if ACO is used.
Note that shader_ballot works correctly, but performance seems inferior.
To enable shader_ballot use RADV_PERFTEST=shader_ballot.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
ACO writes an unused 3rd operand for internal usage
which makes LLVM recoginize it as illegal instruction.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
It's a very odd case to hit in the real world. However, there are some
CTS tests which switch back and forth between dispatch and clear without
changing the pipeline.
Fixes: bc612536eb "anv: Emit a dummy MEDIA_VFE_STATE before switching..."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
When we moved from allocating BOs directly to using the BO cache, we
lost the EXEC_OBJECT_CAPTURE flag on all our state buffers.
Fixes: 3119b96bdf "anv: Allocate block pool BOs from the cache"
Fixes: ee77938733 "anv: Allocate batch and fence buffers from..."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
With the addition of the planar formats helper, the
planar formats no longer have a valid block.bits field.
Calling util_format_get_blocksize therefore asserts.
Reorder the check to see if the format is supported
before doing the query to get the blocksize.
Fixes: 20f132e5ef ("gallium/util: add planar format layouts and helpers")
Signed-off-by: Fritz Koenig <frkoenig@google.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Multi-planar surfaces are allowed to have modifiers. Don't require
DRM_FORMAT_MOD_INVALID in order to create a surface for each plane
defined by the format.
Fixes: 246eebba4a ("iris: Export and import surfaces with modifiers that have aux data")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This format will be used to properly handle planar images with modifiers
in iris.
Fixes: 246eebba4a ("iris: Export and import surfaces with modifiers that have aux data")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The commit noted below assumed and enforced that DRM_MOD_INVALID was the
only valid modifier for multi-planar imported images. Due to that, it
required that modifier on multi-planar images to:
1. Allow multiple planes.
2. Perform YUV format lowering and extent adjustments.
3. Use buffer_index to correctly map the given planes.
Fix these issues by removing or updating the code built on that
assumption.
Fixes: 2066966c10 ("gallium/dri2: Support creating multi-planar modifier images")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead of doing a dummy submit on the command buffer for the fence or a
dummy semaphore and trusting in implicit sync, this commit moves us to
take advantage of implicit sync and just use the WSI image BO as the
fence. Both semaphores and fences require a tiny bit of extra plumbing
to do this but the result is that we can get rid of a bunch of the extra
synchronization we're doing today.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
In 83b943cc2f, we started making all VkDeviceMemory BOs resident all
the time. One unfortunate side-effect of this is that every
vkQueueSubmit sets EXEC_OBJECT_WRITE on every WSI memory object which
means that X server or Wayland compositor, instead of waiting on the
last vkQueueSubmit to actually write the buffer, now waits on the last
vkQueueSubmit to from that driver instance relative to whenever the
compositor's GL driver instance calls execbuf. This potentially leads
to a lot of extra synchronization that we didn't intend to have.
Instead, this commit makes it so that we leave WSI memory objects with
EXEC_OBJECT_ASYNC most of the time and only unset EXEC_OBJECT_ASYNC and
set EXEC_OBJECT_WRITE in the dummy execbuf that we do as part of
vkQueuePresent. This should hopefully result in tighter integration
with the compositor, lower latency, and better performance.
Testing with DOOM 2016, this seems to reduce latency by at least a frame
if not two and makes the game much more responsive. Testing was,
however, subjective, so we don't have any hard data on that.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Otherwise, we're trusting in the execbuf_add_bo which sets
EXEC_OBJECT_WRITE to to always be the first one that gets called. This
is likely true for fences but it seems somewhat fragile.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This lets us treat the implicit synchronization that we need for X11 and
Wayland like a semaphore. Instead of trusting the driver to somehow
figure out when that memory object needs to be signaled, we provide an
explicit point where the driver can set EXEC_OBJECT_WRITE and signal the
dma_fence on the BO. Without this, we have to somehow track inside the
driver when WSI buffers are actually used to avoid extra synchronization
dependencies.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
It's not a tiler specific initialization; it's a generic GPU-side write
primitive that may be used for tiler reset on midgard.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Only Polaris10 is tested at the moment, and I disabled a TON of
tests to keep a CTS run within 5 minutes because my local runner
is a bit slow. A full CTS run takes more than 1h, which means it
will hit the timeout.
RADV CI can only be triggered manually on personal branches to
avoid breaking the world because one runner is definitely not
enough. This will allow us to test it until it's stable enough
to be enabled by default.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Michel Dänzer <mdaenzer@redhat.com>
This requires to bump LLVM to 8 because it's the minimum supported
version by RADV.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Use new rules: instead of only:
For container stage jobs:
* In the main Mesa project, run them by default.
* In merge requests, run them by default if any files affecting pipeline
results are changed.
* In all other cases (in particular branches in personal projects),
don't run them by default but allow triggering them manually.
build & test stage jobs are left at the default (when: on_success), so
they will run automatically once all their dependencies are satisified.
(Using the same rules as above would require these jobs to be manually
triggered as well, which is only possible once all dependency jobs have
passed) Please be considerate of CI runner resources and cancel unneeded
jobs on personal branches with no corresponding merge requests (this can
be done before the jobs start running).
In summary: No more special branch names. Unnecessary job runs are
avoided by default, but jobs which don't run by default can be triggered
manually.
v2:
* Split out LAVA changes to separate commit
* Clarify commit log a little, in particular WRT build/test stage jobs
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> # v1
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> # v1
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> # v1
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Having different policies could have some weird results, e.g. changes
only touching documentation (where the intention is not to run the
pipeline by default) would still create a pipeline with the LAVA jobs
running by default.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Fixes the deqp fails in:
dEQP-VK.pipeline.sampler.*border*
(minus 1d array/d24 cases which fail for other reasons)
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Two things:
* Texture/sampler pointers aligned to the size of texture/sampler state
* Returning errors instead of crashing on OOM
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
To use with texture states that need alignment (texconst, sampler, border)
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Iterate the system values list when adding varyings to the program
resource list in the NIR linker. This is needed to avoid CTS
regressions when using the NIR to build the GLSL resource list in
an upcoming series. Presumably it also fixes a bug with the current
ARB_gl_spirv support.
Fixes: ffdb44d3a0 ("nir/linker: Add inputs/outputs to the program resource list")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
The main and Gallium implementations were recently merged, and the
align2 parameter in the Gallium one is in bits. execmem.c expected
bytes still. This led to every call here asserting.
Fixes: b6fd679a9e("mesa/main/util: moving gallium u_mm to util, remove main/mm")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Tested-by: Clayton Craft <clayton.a.craft@intel.com>
If the runner has a HW device that would be supported, even without
/dev/dri forwarded into the container, it will be enumerated and the tests
on llvmpipe fail with (for example):
libEGL warning: Not allowed to force software rendering when API explicitly selects a hardware device.
libEGL warning: MESA-LOADER: failed to open i965 (search paths /builds/anholt/mesa/install/lib/dri)
Given that we can't necessarily control the DRI devices present on the
runners (particularly for developers bringing their own runners to reduce
the demands on fd.o's shared resources), just skip these tests in CI.
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
The primary difference between the KHR and EXT versions of the extension
is that the KHR provides the address at AllocateMemory time for replay
so we can replay it safely without moving to a sparse address model.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This function has a lot of possible extensions and some of them we can
easily handle on-the-fly so it's easier to just have a loop than to find
each structure manually.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
When a BO is flagged as having a client visible address, we put it in
its own heap. We also support the client explicitly specifying an
address in said heap. If an address collision happens, we return false
from anv_vma_alloc which turns into a VK_ERROR_OUT_OF_DEVICE_MEMORY.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This new function lets you request to remove a specific address range
from the allocator. It returns true on success and leaves the allocator
unmodified and returns false on failure. It doesn't need to return an
offset because, if it succeeds, the offset passed in is the allocated
offset.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We already have a mechanism for specifying that we want a fixed address
provided by the driver internals. We're about to let the client start
specifying addresses in some very special scenarios as well so we want
to pass this through to the allocation function.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Our VMA allocations are really independent from the memory heaps we
expose via the API. The only thing that really matters is the GTT size
so we can make the high heap the right size.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
util_vma_heap_alloc will already return 0 if it doesn't have enough
space. The only thing the vma_*_available tracking was doing was
preventing us from allocating too much on any given heap. Now that
we're tracking that in the heap itself, we can drop these.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We're already tracking the amount of memory used in each heap. This
commit just makes us start rejecting memory allocations if the heap
would grow too large.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
I think the reason why we only do this for primaries is that we didn't
expect to have blorp calls in secondaries. However, you are allowed to
have a full render pass in a secondary command buffer so resolves and
clears can end up in there. We should just always invalidate.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
In ee77938733, we started using the BO cache for anv_bo_pool and
stopped using the bo_flags parameter. However, we never dropped it from
the struct or the init function.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
strip() removes leading and trailing newlines, but leaves newlines
between multiple lines in the string. This could cause failures when
comparing the output of cross-compiled Windows binaries (producing
Windows-style newlines) to the expected output with Unix-style newlines.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
vl functions moved from radeonsi to gallium/auxiliary/vl have left
android build of radeonsi in broken state.
libmesa_galliumvl static is need to build readeonsi,
gallium_dri building rules are reworked to avoid multiple symbols
and libmesa_galliumvl static dependency is needed in radeonsi.
Here is the changelog:
- android: gallium/auxiliary: add libmesa_galliumvl static
- android: gallium_dri: move libmesa_gallium to static to prevent multiple symbols
- android: radeonsi: fix build after vl refactoring
Fixes the following building error:
external/mesa/src/gallium/drivers/radeonsi/si_uvd.c:47:
error: undefined reference to 'vl_video_buffer_create_as_resource'
clang.real: error: linker command failed with exit code 1 (use -v to see invocation)
Fixes: 86e60bc ("radeonsi: remove si_vid_join_surfaces and use combined planar allocations")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Since compute shares the FS state with graphics, we have to re-upload the
pipeline state when switching between compute dispatch and graphics draws.
We could potentially expose graphics and compute as separate queues and
then we wouldn't need pipeline state management, but the closed driver
exposes a single queue and consistency with them is probably good.
So far I'm emitting texture/ibo state as IBs that we jump to. This is
kind of silly when we could just emit it directly in our CS, but that's a
refactor we can do later.
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
I tripped over this during CS enabling when my program BO wasn't set up.
Easier to debug this way than the kernel telling us a 0 handle is invalid.
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
The loop over the pipelines to create (and the failure handling) was
noisy, and the stub for compute setup looked nicer to me.
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
This is enough to pass
dEQP-VK.binding_model.shader_access.primary_cmd_buf.storage_buffer.fragment.single_descriptor.*
with fragmentStoresAndAtomics set, and thus to be able to start working on
compute. I haven't enabled that flag yet, because it also implies image
load/store support, which I haven't filled in.
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
The spec requires unused uniform block to be set as active in the
program resource list. To support this we tell opt dead code not to
remove them. However we can mark them as unused internally and
avoid unnecessarily state changes.
This change is also required for the folowing clean-up patch.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This is where all the other uniform values are populated so it
makes much more sense here. Moving it will also allow us to better
share code between the NIR and GLSL IR resource list builders.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Without looking at the assembly or something, I'm not sure what the
compiler does here. The brw_reg_type enum is marked packed, so I'm
guess that it gets represented as a uint8_t. That's the only reason I
could think that comparing with -1 would be always true.
This patch adds the same cast that exists in brw_hw_type_to_reg_type.
It might be better to add a #define outside the enum for
BRW_REGISTER_TYPE_INVALID as (enum brw_reg_type)-1.
src/intel/compiler/brw_eu_compact.c: In function ‘has_immediate’:
src/intel/compiler/brw_eu_compact.c:1515:20: warning: comparison is always true due to limited range of data type [-Wtype-limits]
1515 | return *type != -1;
| ^~
src/intel/compiler/brw_eu_compact.c:1518:20: warning: comparison is always true due to limited range of data type [-Wtype-limits]
1518 | return *type != -1;
| ^~
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
CID: 1455194
Fixes: 12d3b11908 ("intel/compiler: Add instruction compaction support on Gen12")
Cc: @mattst88
This is for state commands like CmdSetViewport that can be used outside of
a renderpass. Accumulating those into draw_cs outside of the renderpass
should have the desired effect.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Somehow adjusting maxloc based on existing outputs got lost, resulting
in the clipdist varying clobbering the position varying. Causing a
shader that had no position output in freedreno/ir3, which triggers GPU
hangs in neverball.
Fixes: d0f746b645 ("nir: Save nir_variable pointers in nir_lower_clip_vs rather than locs.")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
The logic to ensure VS and BS inputs are aligned wasn't accounting for
unused inputs in VS. This *usually* doesn't happen, but it seems it
can in the case of ARB programs?
Fixes assert:
```
fd6_program_create: Assertion `bs->inputs[i].regid == vs->inputs[i].regid' failed.
```
Fixes: 882d53d8e3 ("freedreno/ir3+a6xx: same VBO state for draw/binning")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Fixes crashes that were unnoticed in CI because debug_assert() was not
enabled (but become real crashes after the next patch):
dEQP-GLES31.functional.shaders.builtin_functions.integer.bitfieldextract.ivec2_highp_geometry
dEQP-GLES31.functional.shaders.builtin_functions.integer.bitfieldextract.ivec2_lowp_geometry
dEQP-GLES31.functional.shaders.builtin_functions.integer.bitfieldextract.ivec2_mediump_geometry
dEQP-GLES31.functional.shaders.builtin_functions.integer.bitfieldextract.uvec2_highp_geometry
dEQP-GLES31.functional.shaders.builtin_functions.integer.bitfieldextract.uvec2_lowp_geometry
dEQP-GLES31.functional.shaders.builtin_functions.integer.bitfieldextract.uvec2_mediump_geometry
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
The following programming note shows up in all 3DSTATE_CONSTANT_*
packets:
"The sum of all four read length fields must be less than or equal to
the size of 64."
The backend compiler should guarantee this for us, so let's just add a
check here.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Use this new instruction introduced in Gen12. The instruction itself is
smaller, and it also allows us to emit a single instruction to all
stages that have the same push constant buffers (e.g. when they don't
have constant buffers).
There's one restriction to use this instruction, though: the length
field is only 5 bits long, so we need to check whether we can use it,
and fallback to the old 3DSTATE_CONSTANT_XS if that field is >= 32.
v2:
- Rebased on top of the lasted changes from Jason.
- Added review suggestions by Caio.
- Removed struct push_bos and merged some code into
anv_nir_compute_push_layout().
v3:
- Remove code churn due to gen8+ workaround in
anv_nir_compute_push_layout(). This code has been removed in an earlier
commit, and implemented in cmd_buffer_emit_push_constant().
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Add a helper function to get the push range address. Once we have a
separate function for emitting gen12 push constants, we can use this
helper and avoid duplicating code.
v3: Do not add range->start to the address in gen7 (Caio).
v4: Do not drop range->start from gen7 (Caio, Jason).
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Store push_ranges in ascending order, and only "shift" them to the end
of the array during state packet emission.
We don't need this workaround with the new 3DSTATE_CONSTANT_ALL packet.
So instead of applying the workaround here just for GEN < 12 (which
requires and extra loop through all the ranges to figure out if we
should shift them or not), we simply move the whole logic to the state
emission code. At that point, in a later commit, we are already looping
through all of the ranges anyway to check which packet we will be using,
so we might as well implement the workaround there, where it is going to
be used.
v3: Move gen8+ workaround to the state emission code (Caio).
v4: Add explanation of why we moved the workaroudn (Caio).
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Use this new instruction introduced in Gen12. The instruction itself is
smaller, and it also allows us to emit a single instruction to all
stages that have the same push constant buffers (e.g. when they don't
have constant buffers).
There's one restriction to use this instruction, though: the length
field is only 5 bits long, so we need to check whether we can use it,
and fallback to the old 3DSTATE_CONSTANT_XS if that field is >= 32.
v2 (Suggestions from Caio):
- use max_length instead of large_buffers.
- remove UNUSED and use #if GEN_GEN >= 12 instead.
- inline "buffers" and drop BITSET_RANGE() usage.
- add assert(n <= max_pointers)
- move emit to outside of the loop.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Split into a function the logic to gather the push constant buffers,
which now stores them in struct push_bos. Another function is added to
emit the packet, using data from the push_bos struct.
This will be useful when adding a new function for emitting push
constants for newer platforms.
v2 (Suggestions from Caio):
- rename 'n' -> 'buffer_count'
- remove large_buffers (for now)
- initialize push_bos
- remove assert
- change for() condition (i <= 3 -> i < 4)
v3:
- Add comment about size limit.
- Rework "shift" logic and 'for' loop.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
In blorp, all the push constants are disabled, so we only need to emit a
single 3DSTATE_CONSTANT_ALL with the bitmask for stage update
appropriately set.
v2: Update comment (Caio).
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
These bits are ignored when clearing so don't bother setting them.
Note: MSAA samples when clearing comes from other registers (tu6_emit_msaa)
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Passes these deqp tests: dEQP-VK.api.image_clearing.core.*attach*single*
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
This makes it easier to find the gmem_offset associated with an attachment.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now that we have tiled format modifier merged into linux we can enable tiling.
That should improve overall performance and also workaround broken mipmapping
for linear textures since now we prefer tiled textures.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Patch adds additional linker check for SSO programs to make sure they
are redeclaring built-in blocks as required by the desktop spec.
This fixes following Piglit tests:
arb_separate_shader_objects/linker/pervertex-*
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Commit also updates the Piglit quick_gl.txt, list modifications happened
due to following Piglit commits: c248bf201,c acff58ca, 5603e2e60.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Previously, it would only work when the ballot size was set to the
lane mask. This patch makes is possible to set the ballot size
to either 32-bit or 64-bit for wave32 mode.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Currently all usages of exec and vcc are hardcoded to use s2 regclass.
This commit makes it possible to use s1 in wave32 mode and
s2 in wave64 mode.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Several places in ACO we use SOP1 or SOP2 instructions to operate over the
exec mask or VCC, and these need to be adapted to the new size in wave32
mode.
This commit adds a way to deal with this problem in aco_builder: the caller
can specify a wave size specific opcode and the builder will translate that
to the correct opcode based on the current wave size.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
This is relevant because in wave32 mode the v_mbcnt_hi_u32_b32
instruction is superfluous.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
And only use --process-isolation false for the quick_gl tests.
This will hopefully avoid variance in the test results that we've been
seeing lately. But even if it doesn't, it should at least help narrow
down the cause of the variance.
Tested-by: Vasily Khoruzhick <anarsoul@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
This is a more accurate description of what happens in processing the
OA reports.
Previously we only had a somewhat difficult to parse state machine
tracking the context ID.
What we really only need to do to decide if the delta between 2
reports (r0 & r1) should be accumulated in the query result is :
* whether the r0 is tagged with the context ID relevant to us
* if r0 is not tagged with our context ID and r1 is: does r0 have a
invalid context id? If not then we're in a case where i915 has
resubmitted the same context for execution through the execlist
submission port
v2: Update comment (Ken)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If we read the OA reports late enough after the query happens, we can
get a timestamp in the report that is significantly in the past
compared to the start timestamp of the query. The current code must
deal with the wraparound of the timestamp value (every ~6 minute). So
consider that if the difference is greater than half that wraparound
period, we're probably dealing with an old report and make the caller
aware it should read more reports when they're available.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We always add an empty buffer in the list when creating the query.
Let's set the len appropriately so that we can recognize it when we
read OA reports up to the end of a query.
We were using an 0 timestamp value associated with the empty buffer
and incorrectly assuming this was a valid value. In turn that led to
not reading enough reports and resulted in deltas added to our counter
values which should have been discarded because those would be flagged
for a different context.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Accumulation happens between 2 reports, it can be between a start/end
report from another context. So only consider updating the hw_id of
the results when it's not already valid and that we have a valid value
to put in there.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 41b54b5faf ("i965: move OA accumulation code to intel/perf")
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
My fix wasn't totally correct as pointed out by Marek.
Ported from RadeonSI.
Fixes: deafe4cc58 ("radv/gfx10: fix primitive indices orientation for NGG GS")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
In d1c4e64a69, we added a parameter to tell the back-end compiler to
ignore the param array and just push however many constants you ask it
to push. Iris doesn't want to push anything so it gives a bogus number
of parameters and trusts the back-end compiler to dead-code all of them.
Now that we can tell the back-end compiler to stop re-arranging things,
delete the hack and enable the new simpler code path.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reject the new formats in swr to prevent crashes because it doesn't
know how to handle the new formats.
Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>
gl_Viewport is also in the VUE header so we need to whack the read
offset to 0 and emit a default (no overrides) SBE_SWIZ entry in that
case as well.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This wires up the front facing value as a sysval, I'd like to
remove the other facing code but I'd need to confirm VMware
don't use it first.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes dEQP-GLES3.functional.primitive_restart.*. Note the 0x18000 value
is accidentally somehow enabling primitive restart for some reason.
I'm not sure where this value came from but let's not.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
The algorithm is as described. Nothing fancy here, just need to add some
new code paths depending on which model we're running on.
Tomeu:
- Also disable tiling when !hierarchy and !vertex_count
- Avoid creating polygon lists smaller than the minimum when
vertex_count > 0 but tile size smaller than 16 byte
- Take into account tile size when calculating polygon list size for
!hierarchy
- Allow 0-sized tiles in a single dimension
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
We've figured out most of the big pieces, and though it looks faintly
like other Midgards, it's much simpler.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Similarly to how it's already done in the compiler, add a way to express
differences between GPU models that need to be taken into account when
assembling the cmdstream.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Since a is non-negative, neither fsqrt nor frsq should return NaN. frsq
should only return Inf when fsqrt returns 0.
The changes are pretty small, but this turns a few hundred hurt shaders
in the next patch into helped shaders.
An alternative to the intBitsToFloat is to import numpy and do
np.finfo(np.float32).max. That's more explicit, but we may also want to
have specific bit encodings of float values later. I could be convinced
either way, but intBitsToFloat(0x7f7fffff) was what I implemented first.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 14661140 -> 14661104 (<.01%)
instructions in affected programs: 7520 -> 7484 (-0.48%)
helped: 36
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.32% max: 0.61% x̄: 0.49% x̃: 0.52%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.52% -0.47%
Instructions are helped.
total cycles in shared programs: 228585416 -> 228584806 (<.01%)
cycles in affected programs: 56321 -> 55711 (-1.08%)
helped: 32
HURT: 0
helped stats (abs) min: 2 max: 98 x̄: 19.06 x̃: 10
helped stats (rel) min: 0.08% max: 6.41% x̄: 1.09% x̃: 0.65%
95% mean confidence interval for cycles value: -28.32 -9.80
95% mean confidence interval for cycles %-change: -1.63% -0.54%
Cycles are helped.
Sandy Bridge
total cycles in shared programs: 152991077 -> 152991075 (<.01%)
cycles in affected programs: 11525 -> 11523 (-0.02%)
helped: 2
HURT: 2
helped stats (abs) min: 2 max: 4 x̄: 3.00 x̃: 3
helped stats (rel) min: 0.07% max: 0.11% x̄: 0.09% x̃: 0.09%
HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel) min: 0.08% max: 0.08% x̄: 0.08% x̃: 0.08%
95% mean confidence interval for cycles value: -5.27 4.27
95% mean confidence interval for cycles %-change: -0.16% 0.15%
Inconclusive result (value mean confidence interval includes 0).
No changes on Iron Lake or GM45.
In many cases, fsat, fneg, fabs, ineg, and iabs will get folded into
another instruction as either source or destination modifiers.
Counting them as instructions means that some if-statements won't get
converted to selects. For example,
vec1 32 ssa_25 = flt32 ssa_0, ssa_23.x
/* succs: block_1 block_2 */
if ssa_25 {
block block_1:
/* preds: block_0 */
vec1 32 ssa_26 = fabs ssa_24
vec1 32 ssa_27 = fneg ssa_26
vec1 32 ssa_28 = fabs ssa_20
vec1 32 ssa_29 = fneg ssa_28
vec1 32 ssa_30 = fmul ssa_27, ssa_29
vec1 32 ssa_31 = fsat ssa_30
/* succs: block_3 */
} else {
block block_2:
/* preds: block_0 */
/* succs: block_3 */
}
block block_3:
/* preds: block_1 block_2 */
block_1 isn't really 6 instructions, but it will be counted that way.
Most callers of the peephole_select pass use either 1 or 8. It's very
easy to blow way past either of these limits with things that are really
only one or two actual instructions.
I also tried some fancier things like making sure the fsat was of
another SSA def from the same block, but the simple test was actually
better.
The i965 back-end SEL peephole pass still helps ~700 shaders in
shader-db with this change.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
All Gen6+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 14743694 -> 14738910 (-0.03%)
instructions in affected programs: 156575 -> 151791 (-3.06%)
helped: 1204
HURT: 0
helped stats (abs) min: 1 max: 27 x̄: 3.97 x̃: 3
helped stats (rel) min: 0.15% max: 19.57% x̄: 5.15% x̃: 4.55%
95% mean confidence interval for instructions value: -4.12 -3.82
95% mean confidence interval for instructions %-change: -5.35% -4.95%
Instructions are helped.
total cycles in shared programs: 231749141 -> 231602916 (-0.06%)
cycles in affected programs: 2818975 -> 2672750 (-5.19%)
helped: 876
HURT: 322
helped stats (abs) min: 2 max: 788 x̄: 180.99 x̃: 220
helped stats (rel) min: <.01% max: 43.82% x̄: 20.75% x̃: 19.44%
HURT stats (abs) min: 1 max: 1188 x̄: 38.27 x̃: 20
HURT stats (rel) min: 0.09% max: 102.67% x̄: 5.17% x̃: 1.70%
95% mean confidence interval for cycles value: -130.47 -113.64
95% mean confidence interval for cycles %-change: -14.85% -12.72%
Cycles are helped.
total sends in shared programs: 730495 -> 730491 (<.01%)
sends in affected programs: 46 -> 42 (-8.70%)
helped: 2
HURT: 0
Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8122757 -> 8122617 (<.01%)
instructions in affected programs: 14716 -> 14576 (-0.95%)
helped: 46
HURT: 1
helped stats (abs) min: 1 max: 8 x̄: 3.07 x̃: 3
helped stats (rel) min: 0.36% max: 10.00% x̄: 2.54% x̃: 1.06%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 1.59% max: 1.59% x̄: 1.59% x̃: 1.59%
95% mean confidence interval for instructions value: -3.42 -2.54
95% mean confidence interval for instructions %-change: -3.28% -1.62%
Instructions are helped.
total cycles in shared programs: 188510100 -> 188509780 (<.01%)
cycles in affected programs: 58994 -> 58674 (-0.54%)
helped: 32
HURT: 1
helped stats (abs) min: 2 max: 96 x̄: 10.06 x̃: 6
helped stats (rel) min: 0.05% max: 15.29% x̄: 1.37% x̃: 0.31%
HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel) min: 0.68% max: 0.68% x̄: 0.68% x̃: 0.68%
95% mean confidence interval for cycles value: -16.34 -3.06
95% mean confidence interval for cycles %-change: -2.46% -0.15%
Cycles are helped.
pthread_getcpuclockid() and clock_gettime() are also available on at least
OpenBSD, FreeBSD, NetBSD, DragonFly, Cygwin.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Enabling this option makes Intel Gen8-11 hardware load the 'iris'
driver by default instead of the older 'i965' driver.
Regardless of how this option is set, users can still override which
driver the loader selects via two methods. The first is to create a
~/.drirc or /etc/drirc file with the following snippet:
<driconf>
<device driver="loader" kernel_driver="i915">
<option name="dri_driver" value="i965" />
</device>
</driconf>
The other option is to set an environment variable:
export MESA_LOADER_DRIVER_OVERRIDE=i965
For now, "prefer_iris" defaults to i965 (the historical choice).
A separate future patch will change the default driver to iris.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1893
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Improves generated code of dEQP-VK.graphicsfuzz.disc-and-add-in-func-in-loop
because a loop exit phi can then be fixed to exec, removing copies and
improving jump threading.
No pipeline-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
ACO considers discards jumps and creates edges in the CFG for them but NIR
does neither of these.
This can be fixed instead by keeping track of whether a side of an IF had
a break/discard, but this doesn't solve the issue with discards affecting
loop exit phis. So this reworks phi handling a bit.
Fixes these tests:
dEQP-VK.graphicsfuzz.disc-and-add-in-func-in-loop
dEQP-VK.graphicsfuzz.loop-call-discard
dEQP-VK.graphicsfuzz.complex-nested-loops-and-call
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Right now there are two copies of mm:
* mesa/main/mm.[ch]
* gallium/auxiliary/util/u_mm.[ch]
At some point they splitted, and from the commit message it was not
clear why it was not possible to have only one copy at a common place.
Taking into account that was several years ago, Im assuming that it
was not possible then.
This change would allow to have one copy of the same code, and also
being able to use that code out of mesa/main or gallium, if needed.
This commit moves u_mm and removes mm, as u_mm has slightly more
changes.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This exposes GL_TDFX_texture_compression_FXT1 support. It's ancient,
only Intel GPUs appear to support it, and I seriously doubt anybody
uses it. But i965 supports it, and it's trivial to do, so we may as
well support it in the new iris driver as well.
Reviewed-by: Eric Anholt <eric@anholt.net>
Eric recently added PIPE_FORMAT_FXT1_RGB[A] as part of his format
unification work. This was really most of the work of implementing
the extension. We just need to handle it in a couple of places and
expose the extension.
v2: Reject the new formats in llvmpipe_is_format_supported to prevent
crashes because it doesn't know how to handle the new formats.
Reviewed-by: Marek Olšák <marek.olsak@amd.com> [v1]
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
This allows this pass to be run multiple times and the results are
just or'ed together.
It fixes on test on llvmpipe nir, and regresses none.
Suggested by Kenneth
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This shouldn't introduce any functional changes for RadeonSI
when NIR is enabled because these operations are already lowered.
pipeline-db (NAVI10/LLVM):
SGPRS: 9043 -> 9051 (0.09 %)
VGPRS: 7272 -> 7292 (0.28 %)
Code Size: 638892 -> 621628 (-2.70 %) bytes
LDS: 1333 -> 1331 (-0.15 %) blocks
Max Waves: 1614 -> 1608 (-0.37 %)
Found this while glancing at some F12019 shaders.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The reference guide is incorrect and SADDR is actually used with FLAT on
GFX10.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
LLVM and the proprietary compiler seem to do this
Fixes: b01847bd9 ("aco/gfx10: Fix mitigation of VMEMtoScalarWriteHazard.")
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
The blob driver does something like this for all vertex formats:
if (normalize) {
if (OPENGL_ES30)
val = VIVS_FE_VERTEX_ELEMENT_CONFIG_NORMALIZE_SIGN_EXTEND;
else
val = VIVS_FE_VERTEX_ELEMENT_CONFIG_NORMALIZE_ON;
} else {
val = VIVS_FE_VERTEX_ELEMENT_CONFIG_NORMALIZE_OFF;
}
As there is no way to get to that information in gallium we always
assume OPENGL_ES30.
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
brw_performance_query_metrics.h was removed in
134e750e16 and
brw_performance_query.h was removed in
8ae6667992
remove reference to these files from Makefile.sources
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Fixes: 134e750e16 ("i965: extract performance query metrics")
Fixes: 8ae6667992 ("intel/perf: move query_object into perf")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
We must reset the damage info of our render targets here even though a
damage reset normally happens when the DRI layer swaps buffers. That's
because there can be implicit flushes the GL app is not aware of, and
those might impact the damage region: if part of the damaged portion
is drawn during those implicit flushes, you have to reload those areas
before next draws are pushed, and since the driver can't easily know
what's been modified by the draws it flushed, the easiest solution is
to reload everything.
Reported-by: Carsten Haitzler <raster@rasterman.com>
Fixes: 65ae86b854 ("panfrost: Add support for KHR_partial_update()")
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
BACK_LEFT attachment can be outdated when the user calls
KHR_partial_update() (->lastStamp != ->texture_stamp), leading to a
damage region update on the wrong pipe_resource object.
Let's delay the ->set_damage_region() call until the attachments are
updated when we're in that case.
Reported-by: Carsten Haitzler <raster@rasterman.com>
Fixes: 492ffbed63 ("st/dri2: Implement DRI2bufferDamageExtension")
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Coverity doesn't know that we always have coordinates if we have lod. To
avoid annoying errors, let's just zero-initialize this.
CoverityID: 1455202
Reviewed-by: Dave Airlie <airlied@redhat.com>
Same story as the previous two commits; these functions dereference the
memory they are pointed at. We can't do that.
CoverityID: 1455180
Reviewed-by: Dave Airlie <airlied@redhat.com>
Similar to the previous commit, pipe_resource_reference also dereference
the memory pointed at. Let's avoid it.
CoverityID: 1455198
Reviewed-by: Dave Airlie <airlied@redhat.com>
zink_render_pass_reference will dereference the memory 'dst' points at,
which can't really go well. All we want to do here is to increase the
reference-count, so let's use a different helper for that instead.
CoverityID: 1455200
Reviewed-by: Dave Airlie <airlied@redhat.com>
destroy_fence doesn't handle NULL-pointers gracefully. So let's avoid
hitting that code-path, by simply returning NULL early here instead.
CoverityID: 1455179
Reviewed-by: Dave Airlie <airlied@redhat.com>
It seems I had some fat fingers when writing this function, and I
accidentally ended up allocating a new query and immediately trying to
delete an uninitialized pool instead of just deleting the pool of the
query that was passed.
CoverityID: 1455196
Reviewed-by: Dave Airlie <airlied@redhat.com>
When I changed to heap-allocated sampler-objects, I missed the code-path
that restores sampler-states after the blitter; it needs an array of
pointers, not an array of VkSampler objects to behave.
This fixes spec@arb_texture_cube_map@copyteximage for me.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 5ea787950f ("zink: heap-allocate samplers objects")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Vulkan only allows power-of-two sample counts. We already kinda checked
for this, but forgot to validate the result in the end. Let's check the
result and error properly.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This drops all the old documentaion around applying for push access.
Also this removes the documentation stating that you can push
directly to mesa rather than using merge requests.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1969
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Was totally broken ...
Removed two if(point) {} because point is always non-NULL and we
were counting on that already for counting, since we NULL our
references to semaphores without active point earlier.
Fixes: 4aa75bb3bd "radv: Add wait-before-submit support for timelines."
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2137
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
pthread_mutex_unlock() when unlocked is documented by posix as
being undefined behaviour. On OpenBSD pthread_mutex_unlock() will call
abort(3) if this happens.
This occurs in amdgpu_winsys_create() after
cb446dc0fa
winsys/amdgpu: Add amdgpu_screen_winsys
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Cc: 19.2 19.3 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
They were out of sync. Besides syncing, lets ensure they never diverge
again.
Fixes: 8d2654a419 "radv: Support VK_EXT_inline_uniform_block."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Split out the logic for exclusive scans into a separate function
that makes clear what it does instead of having this opaque 60
line if.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
To avoid generating invalid LLVM IR when both operands don't have
the same type. This might happen when performing pointer comparisons
with SPIRV 1.4.
Fixes invalid LLVM IR for:
dEQP-VK.spirv_assembly.instruction.spirv1p4.opptrequal.variable_pointers_ssbo_equal
dEQP-VK.spirv_assembly.instruction.spirv1p4.opptrnotequal.variable_pointers_ssbo_not_equal
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This adds the hooks between llvmpipe and the gallivm NIR
code, for compute and fragment shaders.
NIR support is hidden behind LP_DEBUG=nir for now until
all the intergration issues are solved
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This transforms the NIR shaders like the TGSI transforms worked.
v2: fix some nir info requirements, use 32-bit bools
Acked-by: Roland Scheidegger <sroland@vmware.com>
This add the initial implementation of the NIR->LLVM conversion
for llvmpipe NIR support.
v2: lower bool to int32 in nir not llvm
Acked-by: Roland Scheidegger <sroland@vmware.com>
This is a port of the old radeonsi code to be used for llvmpipe NIR support.
Once we remove TGSI support from llvmpipe (I can dream? :-), then
we should be able to refine most of this down and remove it.
v2: port to later radeonsi code for vertex inputs and sampler/io parsing.
Acked-by: Roland Scheidegger <sroland@vmware.com>
When drawing the main character in Shadow of Mordor, the game appears
to draw Talion with one vertex shader, and the Wraith with another.
If the compiler optimizes those in different ways which lead to slight
imprecisions, then the resulting positions may not line up, leading to
Z-fighting occurring as the game decides which of the two are in front.
brw_nir_opt_peephole_ffma looks at usages of multiply adds across the
entire shader, and may make different decisions between the two, leading
to such imprecisions and Z-fighting. This started happening recently
after a NIR change to eliminate unnecessary MOVs (7025dbe7), but that
change simply exposed the existing problem.
Improves performance on Skylake GT4e by 1.22945% +/- 0.398672% (n=3),
likely due to the fixed rendering.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1985
Fixes: 7025dbe794 ("nir: Skip emitting no-op movs from the builder.")
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Many applications use multi-pass rendering and require their vertex
shader position to be computed the same way each time. Optimizations
may consider, say, fusing a multiply-add based on global usage of an
expression in a shader. But a second shader with the same expression
may have different code, causing that optimization to make the other
choice the second time around.
The correct solution is for applications to mark their VS outputs
'invariant', indicating they need multiple shaders to compute that
output in the same manner. However, most applications fail to do so.
So, we add a new driconf option - vs_position_always_invariant - which
forces the gl_Position output in vertex shaders to be marked invariant.
Fixes: 7025dbe794 ("nir: Skip emitting no-op movs from the builder.")
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
They're not implemented, and not critical to bring up immediately. Avoids
failures in the CTS when nothing gets written to the query.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This will make it easier to look at details of failed / skipped tests.
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since we're not reporting test results as JUnit anymore, we can use the
default JSON format.
This affects how test results are summarized, update the reference files
accordingly.
Reviewed-by: Eric Anholt <eric@anholt.net>
It was basically useless in this form, and processing the JUnit data in
the GitLab backend was pretty expensive.
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We were always ensuring a minimum size of 4 bytes for uniforms
for the case where we don't have any, to account for hardware pre-fetching
of the uniform stream, however, pre-fetching could also lead to to out
of bounds reads when have read the last uniform in the stream, so we
probably want to have the extra 4 bytes to prevent the kernel from
observing invalid memory accesses when the uniform stream sits right at
the end of a page.
This seems to fix MMU exceptions reported with a Linux 5.4 kernel.
Credit goes to Phil Elwell for identifying the problem and narrowing
it down to memory accesses in the uniform stream.
Reported-by: Phil Elwell <phil@raspberrypi.org>
Tested-by: Phil Elwell <phil@raspberrypi.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
When a fragment shader includes an input variable decorated with
SampleId or SamplePosition, sample shading should be enabled
because minSampleShadingFactor is expected to be 1.0.
Cc: 19.2, 19.3 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Add missing required bits. Fixes at least:
dEQP-VK.pipeline.render_to_image.dedicated_allocation.1d.small.r16g16_sint_d24_unorm_s8_uint
dEQP-VK.pipeline.render_to_image.dedicated_allocation.2d.mipmap.r16g16_sint_d24_unorm_s8_uint
dEQP-VK.renderpass.dedicated_allocation.attachment.4.401
dEQP-VK.renderpass2.suballocation.formats.r16_uint.load.draw
dEQP-VK.synchronization.op.single_queue.barrier.write_draw_read_copy_image_to_buffer.image_128x128_r16_uint
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Only clang has this argument (at least as of clang 8 and gcc 9), which
errors when using the gcc empty initializer syntax in C:
```C
struct foo f = {};
```
GCC has a warning for this, but only when using -Wpedantic, which is a
lot of noise to lose useful warnings in.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Most of these will never actually be compiled by windows, but in the
interest of being able to make using struct foo = {}; an error and
avoiding breaking windows removing a handful of safe uses seems like a
good trade off.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
The vertex cache uses the full 48-bit address on Gen11+. See the
documentation for 3DSTATE_VERTEX_BUFFERS, which describes the
workaround and lists it as pre-Icelake.
Interestingly, the docs don't mention index buffers as needing a
workaround at all. So either we've been overzealous, or the docs
never got updated to record that. Which begs the question of whether
the issue there was fixed, if there was one...
Cuts 40% of the PIPE_CONTROLs from Civilization VI's benchmark; appears
that it improves performance by about 1-2% on Icelake 8x8 (not frequency
locked).
The slices table and most of the other layout fields in the
freedreno_resource moves into fdl_layout.
v2: Changes by anholt to not have duplicate fields, which was introducing
a surprising behavior change in resource layout (using the
level_linear helper before the setup of the shadowed fields)
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Rob Clark <robdclark@chromium.org>
This gets the worst of the sed required for shared resource layout out of
the way. The texture layout comment is dropped now that we're referencing
the shared header, which has a more complete description.
Acked-by: Rob Clark <robdclark@chromium.org>
This will be used for sharing resource layout code between freedreno and
tu. Mostly copied from a commit by Rob, with a new location and the slice
struct renamed for consistency.
Acked-by: Rob Clark <robdclark@chromium.org>
Multiple places were doing the same thing to get the tile mode of a level,
so refactor it out. This will make the shared resource helper transition
cleaner.
Acked-by: Rob Clark <robdclark@chromium.org>
This factors out a bit of duplicated code, but will also make the shared
resource layout transition process clearer.
Acked-by: Rob Clark <robdclark@chromium.org>
The algebraic pass was exhibiting O(n^2) behavior in
dEQP-GLES2.functional.uniform_api.random.3 and
dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 (along with
other code-generated tests, and likely real-world loop-unroll cases).
In the process of using fmul(b2f(x), b2f(x)) -> b2f(iand(x, y)) to
transform:
result = b2f(a == b);
result *= b2f(c == d);
...
result *= b2f(z == w);
->
temp = (a == b)
temp = temp && (c == d)
...
temp = temp && (z == w)
result = b2f(temp);
nir_opt_algebraic, proceeding bottom-to-top, would match and convert
the top-most fmul(b2f(), b2f()) case each time, leaving the new b2f to
be matched by the next fmul down on the next time algebraic got run by
the optimization loop.
Back in 2016 in 7be8d07732 ("nir: Do opt_algebraic in reverse
order."), Matt changed algebraic to go bottom-to-top so that we would
match the biggest patterns first. This helped his cases, but I
believe introduced this failure mode. Instead of reverting that, now
that we've got the automaton, we can update the automaton's state
recursively and just re-process any instructions whose state has
changed (indicating that they might match new things). There's a
small chance that the state will hash to the same value and miss out
on this round of algebraic, but this seems to be good enough to fix
dEQP.
Effects with NIR_VALIDATE=0 (improvement is better with validation enabled):
Intel shader-db runtime -0.954712% +/- 0.333844% (n=44/46, obvious throttling
outliers removed)
dEQP-GLES2.functional.uniform_api.random.3 runtime
-65.3512% +/- 4.22369% (n=21, was 1.4s)
dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 runtime
-68.8066% +/- 6.49523% (was 4.8s)
v2: Use two worklists, suggested by @cwabbott, to cut out a bunch of
tricky code. Runtime of uniform_api.random.3 down -0.790299% +/-
0.244213% compred to v1.
v3: Re-add the nir_instr_remove() that I accidentally dropped in v2,
fixing infinite loops.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
My motivation was to clarify the changes in the following commit, but
incidentally, it reduces runtime of
dEQP-GLES2.functional.uniform_api.random.3 (an algebraic-heavy
testcase) by -5.39524% +/- 2.21179% (n=15)
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
In order to have nir_opt_algebraic be able to do further algebraic
work on the output of a replacement, we need to maintain the
automaton's state.
Reviewed-by: Eric Anholt <eric@anholt.net>
If not all bits are cleared, then BLT needs to be given the current clear
value and not the new one.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Fixes:
Wolfenstein:Youngblood (w/o shader_ballot)
dEQP-VK.descriptor_indexing.combined_image_sampler_in_loop_with_lod
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
I don't think the bug applies for global/scratch instructions and
load_barycentric_at_sample selection expects this feature to work.
Fixes various dEQP-VK.pipeline.multisample_interpolation.* tests on GFX10.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Commit 0899bf55 made some deqp-gles3 tests related to RGB8 PBOs fail
on R600 because it exposed PIPE_FORMAT_R8G8B8_UNORM and R600 doesn't
propely handle this. Disabling this format also for buffers fixes the
issue.
In addition, disabling also the related RGB8 integer formats for buffers
fixes some deqp-gles3 tests:
dEQP-GLES3.functional.texture.specification.teximage2d_pbo.rgb8ui_cube
dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb8i_2d
dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb8i_cube
dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb8ui_2d
dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb8ui_cube
dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb8i_2d_array
dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb8i_3d
dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb8ui_2d_array
dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb8ui_3d
dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb8i_2d_array
dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb8i_3d
dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb8ui_2d_array
dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb8ui_3d
Fixes: 0899bf55
st/mesa: Map MESA_FORMAT_RGB_UNORM8 <-> PIPE_FORMAT_R8G8B8_UNORM
Closes#2118
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
../src/amd/llvm/ac_llvm_build.c: In function ‘ac_build_canonicalize’:
../src/amd/llvm/ac_llvm_build.c:4567:9: warning: ‘intr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
4567 | return ac_build_intrinsic(ctx, intr, type, params, 1,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4568 | AC_FUNC_ATTR_READNONE);
| ~~~~~~~~~~~~~~~~~~~~~~
../src/amd/llvm/ac_llvm_build.c:4567:9: warning: ‘type’ may be used uninitialized in this function [-Wmaybe-uninitialized]
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
In the case of promoted extensions we can end up with an entrypoint that
we support being an alias of an entrypoint we do not support. For
instance, if an extension gets promoted from EXT to KHR, the EXT entry-
points may be aliases of the KHR ones. We want to leave everything as
EXT until we get around to advertising the KHR so that we don't break
things when we update the XML and headers.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The current code can only handle enum aliases if the original enum is
declared first followed by the alias as we walk the XML in a linear
fashion. This commit allows us to handle aliases where the alias
declaration comes before the thing it's aliasing.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We may have replaced the backing storage for a texture buffer while it
was unbound, at which point iris_rebind_buffer would not have caught it
and updated it. We need to ensure that the current resource's address
matches the one our SURFACE_STATE points at. If not, update addresses
and re-upload the SURFACE_STATE.
Shader images and buffers do not suffer from this problem because we
re-stream the surface state on every set call, since there isn't a
created CSO object for those with a saved SURFACE_STATE. Constant
buffers are also currently re-streamed (we pitch the SURFACE_STATE
on every set_constant_buffer call). Surfaces would need this
treatment (as they're created CSOs) except that we never swap out
their backing storage today (we only do it for buffers), so it's OK
for now.
Fixes misrendering in Unreal 4 demos (Elemental, Matinee Fight Scene).
Huge thanks to Andrii Simiklit for tracking down the problem - it was
quite difficult to find! Also fixes Andrii's new Piglit test for the
bug, 'arb_texture_buffer_object-re-init'.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1365
When replacing the backing storage for texture buffers, image buffers,
and so on, we may need to update the "Surface Base Address" field in
any corresponding SURFACE_STATE. This is easier to accomplish if we
have a copy on the CPU - we can just compare the current field, update
it, and re-upload.
This patch adds a CPU-side copy to the new iris_surface_state wrapper
struct, and reworks allocation and upload to fill things out on the
CPU copy first, then upload that to the GPU when finished.
This will be necessary to fix iris_invalidate_resource bugs shortly.
Technically, we never replace the backing storage for pipe_surfaces
(render targets), so we don't need to make this change there. However,
it's nice to have surfaces, sampler views, and image views handled
similarly. Plus, if we ever wanted to swap out backing storage for
busy textures, we'd need this infrastructure.
v2: Properly free memory (caught by Andrii Simiklit)
Today, we only have a state reference to the GPU buffer containing our
uploaded SURFACE_STATEs. However, we're going to want a CPU-side copy
soon. Making a wrapper struct means we can talk about both together,
and also put both in the field called "surface_state".
We can just compare the VERTEX_BUFFER_STATE address field to the
current BO's address. When calling rebind, we've already updated
the resource to the new buffer, but the state will have the old
address.
Mutating fields of global resources is generally not safe, and the only
reason we were doing it was to avoid passing an extra parameter to
the fill_surface_state helper.
This takes a noticable amount of time in piglit and some tests don't
need it.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
This is similar to a scheduler I've written for vc4 and i965, but this
time written at the NIR level so that hopefully it's reusable. A notable
new feature it has is Goodman/Hsu's heuristic of "once we've started
processing the uses of a value, prioritize processing the rest of their
uses", which should help avoid the heuristic otherwise making such
systematically bad choices around getting texture results consumed.
Results for v3d:
total instructions in shared programs: 6497588 -> 6518242 (0.32%)
total threads in shared programs: 154000 -> 152828 (-0.76%)
total uniforms in shared programs: 2119629 -> 2068681 (-2.40%)
total spills in shared programs: 4984 -> 472 (-90.53%)
total fills in shared programs: 6418 -> 1546 (-75.91%)
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> (v1)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> (v2)
v2: Use the DAG datastructure, fold in the scheduling-for-parallelism
patch, include SSA defs in live values so we can switch to bottom-up
if we want.
v3: Squash in improvements from Alejandro Piñeiro for getting V3D to
successfully register allocate on GLES3.1 dEQP. Make sure that
discards don't move after store_output. Comment spelling fix.
At the same time, update etna_clear_blit_pack_rgba to work with integer
formats.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Use the extended format if an such a format was passed.
v1 -> v2:
- set FORMAT_MASK bit when using ext PE format as suggested
by Wladimir J. van der Laan
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
No functional changes, but it will be used to decompress
separate depth/stencil aspects.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
No functional changes because the aspect mask is still not used
during image transitions but it will be needed for the separate
depth/stencil aspects logic.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
v7: run nir_opt_algebraic
v9: rework the callback function
v9: update alignment on all loads/stores, even if they're not vectorized
v10: add tests for 64-bit offsets
v10: add tests for signed offsets
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v9)
This pass combines intersecting, adjacent and identical loads/stores into
potentially larger ones and will be used by ACO to greatly reduce the
number of memory operations.
v2: handle nir_deref_type_ptr_as_array
v3: assume explicitly laid out types for derefs
v4: create less deref casts
v4: fix shared boolean vectorization
v4: fix copy+paste error in resources_different
v4: fix extract_subvector() to pass
nir_load_store_vectorize_test.ssbo_load_intersecting_32_32_64
v4: rebase
v5: subtract from deref/offset instead of scheduling offset calculations
v5: various non-functional changes/cleanups
v5: require less metadata and preserve more
v5: rebase
v6: cleanup and improve dependency handling
v6: emit less deref casts
v6: pass undef to components not set in the write_mask for new stores
v7: fix 8-bit extract_vector() with 64-bit input
v7: cleanup creation of store write data
v7: update align correctly for when the bit size of load/store increases
v7: rename extract_vector to extract_component and update comment
v8: prevent combining of row-major matrix column acceses
v9: rework process_block() to be able to vectorize more
v9: rework the callback function
v9: update alignment on all loads/stores, even if they're not vectorized
v9: remove entry::store_value, since it will not be updated if it's was
from a vectorized load
v9: fix bug in subtract_deref(), causing artifacts in Dishonored 2
v9: handle nir_intrinsic_scoped_memory_barrier
v10: use nir_ssa_scalar
v10: handle non-32-bit offsets
v10: use signed offsets for comparison
v10: improve create_entry_key_from_offset()
v10: support load_shared/store_shared
v10: remove strip_deref_casts()
v10: don't ever pass NULL to memcmp
v10: remove recursion in gcd()
v10: fix outdated comment
v11: use the new nir_extract_bits()
v12: remove use of nir_src_as_const_value in resources_different
v13: make entry key hash function deterministic
v13: simplify mask_sign_extend()
v14: add comment in hash_entry_key() about hashing pointers
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v9)
It shouldn't matter, but the 1 was leftover from when it was handled
together with workgroup_size and num_work_groups.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
The former was always true and hence dead code. We will want to
explicitly declare the ring offset register with ACO, but we also want
to declare the scratch offset too, and we can't try to disable it since
ACO also supports spilling and the determination of whether spilling has
to happen occurs well after setting up registers. So replace
supports_spill with something that will actually be used for ACO.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Due to how LLVM works we have to make some of the FS inputs become
vectors, and therefore have to split them early so that they don't take
up extra register pressure due to how RA currently works.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
ac_shader_args will be similar to ac_shader_abi, except for being free
from LLVM-specific concepts and therefore capable of being shared
between LLVM and ACO. This will help us accomplish a few different
things:
- Decouple setting up SGPR and VGPR arguments from translating to LLVM,
so that we can reference these arguments in NIR lowering passes, which
will let us lower e.g. descriptor sets in NIR.
- Stop using radv-specific structures for things like determining the
chip generation in ACO.
In the end, we should replace ac_shader_abi with this structure +
driver-specific lowering passes.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
We'll duplicate this in a header file in the next commit, and then
remove the original enum. Just rename it temporarily so that things
keep building.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
GiMark benchmark from GpuTest has such code in VS:
out vec4 lightDir0;
out vec4 lightDir1;
...
lightDir0.xyz = lp0 - vVertex.xyz;
lightDir1.xyz = lp1 - vVertex.xyz;
In FS:
float distSqr = dot(lightDir0, lightDir0);
So due to the usage of uninitialized .w channel in the dot product,
distSqr may become undefined which results in many black dots
in the test on Iris.
In https://www.geeks3d.com/forums/index.php/topic,6242.0.html
developer stated that this benchmark most likely won't be updated.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1919
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes dEQP-VK.compute.builtin_var.local_invocation_index with
RADV_PERFTEST=cswave32.
My initial fix was to lower it but Rhys suggested the shift-right
and it's much better like this.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
They are broken like on GFX6-GFX7. It seems better to disable them
instead of enabling a broken feature.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This fails a couple of piglits due to other bugs in llvmpipe,
but it adds support for the feature properly.
v2: don't reset pipestats, just recalc, fix CI expectation
In order to prevent a potential malicious pipeline tainting our
secure compile process and interfering with successive pipelines
we want to create a fresh fork for each pipeline compile.
Benchmarking has shown that simply forking on each pipeline
creation doubles the total time it takes to compile a fossilize db
collection. So instead here we fork the process at device creation
so that we have a slim copy of the device and then fork this
otherwise idle and untainted process each time we compile a
pipeline. Forking this slim copy of the device results in only a
20% increase in compile time vs a 100% increase.
Fixes: cff53da3 ("radv: enable secure compile support")
This will be used to create a communication pipe between the user
facing device and a freshly forked (per pipeline compile) slim copy
of that device.
We can't use pipe() here because the fork will not be a direct fork
of the user facing process. Instead we use a previously forked
copy of the process that was forked at device creation in order to
reduce the resources required for the fork and avoid performance
issues.
Fixes: cff53da374 ("radv: enable secure compile support")
In the following commits we want to be able to fork an existing lightweight
fork created at device creation time. In order for the user facing process
to communicate with this new fresh fork we create some members here to hold
FIFO file descriptors and a unique id.
Here we also add a new fork enum that we use to tell the lightweight
process to create a fresh fork.
For more information on why we create a fresh fork see the following
commits.
This fixes a build failure on MSVC.
BTW, it looks like clang supports _Pragma() but I don't know if it
understands the "gcc unroll N" directive.
Signed-off-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes build with MinGW, with shared LLVM and lto
/tmp/opengl32.dll.BxiIYm.ltrans59.ltrans.o:<artificial>:(.text+0x1674): undefined reference to `LLVMAddInstructionCombiningPass'
See also scons/llvm.py
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Only NPOT vectors greater than vec4 use the extra uint32.
This is for instructions that share the dest code.
load_const and undef already support 1-16 in the header.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
vec4 scalarized ALUs typically have 4 equal instruction headers, so remove
the last 3.
There are no bits left in the ALU header for more flags, so future
extensions of NIR will have to use something like instr_type == 15
to describe more complex ALU instructions.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
It can be derived from src and var. This frees 10 bits in the header
that will be used later.
"mode" is moved in the structure, because those bits will be used for
something else later.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
- type_cast: deduplicate types if the last one is the same
- derive the type from the parent for other derefs
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
The majority of constants can be packed like this.
v2: - use enum for the packing encoding,
- trim packed_value to 20 bits add 1 bit to last_component,
which simplifies a later commit
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
If the repo continues development, we don't want to accidentally pick
up potentially breaking changes on our next container rebuild.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
xfb + lines/points still flakes too frequently (and the problem isn't
even related to xfb), but we can add the rest back into this mix now.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Extract .qpa for the individual unexpected results and flakes, and
translate to xml, preserved with the artifacts. This allows easy
browsing of the test logs for fails/flakes, for easier debugging.
The # of logs to preserve is capped at 50 to avoid saving 100s of
megabytes of logs in case someone pushes a change that breaks
everything.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
If there are a small number of fails, re-run to determine if they are
flakes, and optionally (if `$FLAKES_CHANNEL` configured) report the
flakes.
This way flakes don't interfere with developers working on other
drivers, but get logged so that the developers working on the flaking
driver can monitor the situation.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Bump cts_runner to pick up the change to preserve .qpa and caselist .txt
files for blocks of tests that contain fails, and preserve the caselist
files. To reproduce fails that depend on order of running tests, these
are useful.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
The log only shows the first 50, but preserve the full list for easier
browsing.
(Also move return of exit code to end which makes later patches in the
series easier)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Update the deqp build to preserve testlog-to-xml and stylesheets, so
deqp runner can extract .qpa for failed/flaked tests, and convert to
xml. With this, will be able to browse output from failed tests
directly from the artifacts.
The main motiviation is to give better visibility into what happens with
flaked tests, when it is difficult/impossible to reproduce the flake
locally (ie. when it happens once out of N million tests). But this
should also make it easier to debug regressions that a MR triggers,
especially when it is on hw that you don't have.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Dolphin: 75 fps -> 88 fps - Super Mario Galaxy
Citra: 81 fps -> 91 fps - A Link Between Worlds
Yuzu: 21 fps -> 27 fps - Super Mario Odyssey
Dolphin still has many syncs because of glFenceSync and glClientWaitSync.
Moving them to the dispatcher thread might yield another speedup.
Yuzu uses a compatible profile by default. This benchmark used the variable
MESA_GL_VERSION_OVERRIDE=4.5FC to overwrite this behavior.
This profilation was done on a mobile i7-8550U CPU with i965.
Signed-off-by: Markus Wick <markus@selfnet.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
For temporary lookups, just allocate out of the NULL ralloc context,
so we don't have to edit the linked list of ralloc children to add it
and then immediately remove it again.
When uploading a new shader, allocate the keybox off the shader, so
if we delete the shader the keybox also goes away. Less manual cleanup.
All of the tables are static const, so they only need to be validated
once. As noted in the previous commit, the compiler should be able to
eliminate all of this code when the assertions would pass. Even with
the help of the previous commit, this does not always occur.
-Og: -95.688 +/- 3.91935 (-24.9562% +/- 1.0222%) N=5
-O1: No difference proven at 95.0% confidence. N=5
-O2: -1.962 +/- 0.85001 (-0.860013% +/- 0.372589%) N=5
Reviewed-by: Eric Anholt <eric@anholt.net>
I was pretty liberal with these assertions when I wrote this code
because I had assumed that GCC would unroll the loops, inline the look ups
of static const arrays with now constant indices, and then elmininate
all the actuall assertions. It seems none of this happens even at -O3.
Adding the pragmas helps encourage loop unrolling at some optimization
levels. I tested by running shader-db with NIR_VALIDATE=false on a Core
i7 Haswell desktop system.
-Og: No difference proven at 95.0% confidence. N=5
-O1: -48.304 +/- 1.221 (-16.3343% +/- 0.412888%) N=5
-O2: -49.94 +/- 1.23521 (-17.9634% +/- 0.444303%) N=5
v2: Add a _Pragma to an inner loop that was accidentally dropped during
a rebase.
Reviewed-by: Eric Anholt <eric@anholt.net>
Varyings are similar to already handled cases. And "glsl_zero_init"
name of the workaround already looks like it should include varyings.
The issue was observed in GiMark subtest from GpuTest.
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Just a bit of cleanup. lower_tex can do this lowering for us, which
should also eliminate some special cases (one less thing to fix if we
ever need texturing in tess/geom/etc, perhaps?)
Closes#2133
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
It seems overkill to me to build scons 7x for every pipeline.
Scons is now build with the oldest llvm version in scons-old-llvm
and with the newest llvm version in scons.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We fetch the info with the new intrinsic and lower with ALU ops for txl
instructions, which seemingly correspond to "TEXGRD" instructions (what
we call textureLod).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
We can stuff this information in as parametrized system values, like we
currently do texture size and SSBO addresses.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
This loads in the <min_lod, max_lod, lod_bias> settings for a given
sampler, which is necessary for lowering clamps/biases on certain
Midgard chips.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Generating a source code with a fixed size leads to issues with plattform dependent types.
We either hard code 4 or 8 bytes there, and both are wrong on the other plattform.
So this patch solves this issue by generating eg sizeof(GLsizeiptr), which is valid both
on 32 and on 64 bit plattforms.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
This enables UBWC for everything except 3D textures.
It breaks many image_to_image copies but those aren't important and it can
be worked around later (image_to_image copy needs to be done in two steps,
decode from the source format and then encode to the destination format).
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fix build error after llvm-10.0 commit 1dfede3122ee ("Move
CodeGenFileType enum to Support/CodeGen.h").
../src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp: In member function ‘void JitManager::DumpAsm(llvm::Function*, const char*)’:
../src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp:428:45: error: ‘CGFT_AssemblyFile’ is not a member of ‘llvm::TargetMachine’
*pMPasses, filestream, nullptr, TargetMachine::CGFT_AssemblyFile);
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>
Do this by repeating processing of loops until no progress is made.
Totals from affected shaders:
SGPRS: 162576 -> 162576 (0.00 %)
VGPRS: 145228 -> 145228 (0.00 %)
Spilled SGPRs: 668 -> 668 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 15778640 -> 15771336 (-0.05 %) bytes
LDS: 146 -> 146 (0.00 %) blocks
Max Waves: 6087 -> 6087 (0.00 %)
v2: use block_kind_loop_header/block_kind_loop_exit to repeat at the end
of loops instead of at each continue
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
When GPU is idle and suspends, the currently selected countables
will all reset to the first one. So periodically restore the selected
countables.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Port from the envytools tree, but converted to use the .c tables for
describing the perfcounter groups/countables, rather than using rnndec
to get this at runtime from the register xml.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Currently this are getting blocked by the kernel.. these counters don't
seem to be the most useful ones, and to use them we'd have to somehow
probe the kernel by submitting cmdstream to write the selector regs and
see if that triggers a GPU fault. So let's just skip them.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
This should eventually be useful for VK_KHR_performance_query as well.
And in the more near term, for fdperf.
Attempt to not break android build is best-effort and untested.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
When we had one gen supporting performance counters, it made sense to
have these builder macros in the .c file with the table. But time has
come to de-duplicate.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
This passes the piglit CL builtin-ulong-clz-1.0.generated.cl
test.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
This adds the option to lower 64-bit ufind_msb opcodes.
v2: use split_x/y removes component loops (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
I set the runners to concurrency=1, so they serve only one gitlab-ci job
at at time. Swap over to using the parallel runner now to keep the
runners busy, more efficiently than spawning many docker containers and
downloading artifacts multiple times, and producing easier-to-understand
results for browsing on the web.
This bumps the a306 runners to 4x parallel instead of 2x like before, but
cheza gles3 drops from 6 to 4. Current rough timings of the jobs (if no
container download):
db410c-gles2: 5:00
a630-gles2: 1:30
a630-gles3: 6:00
a630-gles31: 5:30
a630-gles3 is a bit longer than I like, but it should come back down once
I can sort out the NIR algebraic rewinding.
This was apparently missed in 67b32190f3, which added support
for ARB_shading_language_include to #line, including the 'path'
field for the location.
Fixes crashes in CTS with all drivers as they attempt to access
an uninitialized path string during parsing.
Fixes: 67b32190f3 ("glsl: add ARB_shading_language_include support to #line")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2132
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jose Maria Casanova <jmcasanova@igalia.com>
Use hardcoded /cache/mesa/ccache for the cache, so it will be shared by
all jobs of all Mesa projects running on the same runner host. This
should increase the hit rate and decrease the worst case storage used.
Further benefits of directly using a host-mapped directory:
* Saves up to ~1 minute per job for restoring and saving the cache
contents via the GitLab CI cache mechanism
* Cache contents generated by failed jobs are no longer lost
* Jobs running in parallel on the same runner host can get hits from
each other
Also enable compression, so the default maximum cache size of 5G might
be sufficient.
v2:
* Move CCACHE_DIR variable to the .build-linux template
Suggested-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net> # v1
Building GLVND in meson-main doesn't work because this disables
libEGL and it's needed for running shader-db.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Now that debugoptimized isn't set and that all test jobs depend on
meson-testing, enabling swr shouldn't slowdown the CI.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
For turnip and RADV testing, we will need a debugoptimized build
without UBSAN. This introduces meson-testing which builds only the
things that are needed by the test stage.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
This happens when mesa is built with only swrast. The default
driver being kmsro and the default driconf file being v3d,
it's NULL and then strdup crashes.
This fixes a crash with piglit spec/egl_mesa_query_driver/conformance.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Enough trial and error ... just think even *more* Midgard about where
this field might be!
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
also make 8 and 16 compoments invalid. We will enable that later again
when we actually support it.
v2: fix validation of nir_intrinsic_instr::num_components
correct validation of instr->num_components
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This reverts commit 52c7df1643. The pass,
while clearly useful for some shaders, has at least three bugs that I
was able to find fairly quickly:
1. It doesn't work for type-converting MOVs because f > 0 is not the
same as f2i(f) > 0
2. CSEL is a 3src instruction and only supports one source type; it
doesn't take this into account and tries to create instructions
which do a F compare and a D select. This is especially nasty to
debug because you don't see that in the dumped assembly because we
don't properly assert that types are the same in codegen.
3. While you can handle 2, in theory, by reinterpreting types, you
can't do that in the presence of source modifiers. This pass
doesn't even attempt to detect that.
Those are just the ones I found with the one almost trival shader I was
debugging. There very likely may be more and. Best thing to do for now
is just shut it off until someone has the time to figure out how to do
this properly and write tests to ensure it's correct.
Fixes: 3cb085e6d61a "i965/fs: Merge CMP and SEL into CSEL on Gen8+"
Reviewed-by: Brian Paul <brianp@vmware.com>
This will be useful as a deterministic identifier/index for the variable.
v2: fix comment style
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v1)
Doesn't shrink it (at least, on x86-64) and leaves space for more members.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
v2. [Hyunjun Ko (zzoon@igalia.com)]
Avoid using too much open code like "instr->regs[n]->flags |= FOO"
v3. [Hyunjun Ko (zzoon@igalia.com)]
Remove redundant code for both 16b and 32b operations.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Adds binop_reduce_all_sizes which generates both 1-bit and 32-bit
versions of the reduce operation. This reduces the code duplication a
bit and will make it easier to later add 16-bit versions as well.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Adds binop_compare_all_sizes which generates both 1-bit and 32-bit
versions of the comparison operation. This reduces the code
duplication a bit and will make it easier to later add 16-bit versions
as well.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Most of DEQP-VK.subgroups are skipped because 16-bit float aren't
supported but others pass.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Specifically when we are in non-uniform control flow, as we would need
to set the condition for the last instruction. If (for example) a
image atomic load stores directly their value on a NIR register,
last_inst would be a nop, and would fail when set the condition.
Fixes piglit test:
spec/glsl-es-3.10/execution/cs-ssbo-atomic-if-else-2.shader_test
Fixes: 6281f26f06 ("v3d: Add support for shader_image_load_store.")
v2: (Changes suggested by Eric Anholt)
* Cover all sig.ld* signals, not just ldunif and ldtmu, as all of
them have the same restriction.
* Update comment explaining why we add a MOV in that case
* Tweak commit message.
v3:
* Drop extra set of parens (Eric)
* Add missing ld signal to is_ld_signal to fix shader-db regression.
Reviewed-by: Eric Anholt <eric@anholt.net>
Support cases such as depth-only renders and only set stencil buffers
when needed, to match the blob's behaviour.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The tiler unit in these GPUs is quite different and we haven't reverse
engineered enough of it yet to validate and pretty print it.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Rather than open-coding checks on gpu_id in the compiler, let's track
quirks applying to whatever we're compiling for, to allow us to manage
the complexity of many heterogenous GPUs in the compiler.
It was discovered that a workaround used on T720 is also required on
T820 (and presumably T830), so let's fix this. This will also decrease
friction as we continue improving T720 support.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
This will allow us to continue searching the current path for
relative shader includes.
From the ARB_shading_language_include spec:
"If it is quoted with double quotes in a previously included
string, then the first search point will be the tree location
where the previously included string had been found."
Reviewed-by: Witold Baryluk <witold.baryluk@gmail.com>
If the shader contains an include when need to first run the
preprocessor before deciding if we can skip compilation based
on the shader cache.
Reviewed-by: Witold Baryluk <witold.baryluk@gmail.com>
From the ARB_shading_language_include spec:
"#line must have, after macro substitution, one of the following
forms:
#line <line>
#line <line> <source-string-number>
#line <line> "<path>"
where <line> and <source-string-number> are constant integer
expressions and <path> is a valid string for a path supplied in the
#include directive. After processing this directive (including its
new-line), the implementation will behave as if it is compiling at
line number <line> and source string number <source-string-number>
or <path> path. Subsequent source strings will be numbered
sequentially, until another #line directive overrides that
numbering."
Reviewed-by: Witold Baryluk <witold.baryluk@gmail.com>
The new local function lookup_shader_include() will be used by
glDeleteNamedStringARB() in the following patch.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Witold Baryluk <witold.baryluk@gmail.com>
This will be usefull when implementing glIsNamedStringARB() which
doesn't do error checking, it just returns false for invalid
lookups instead.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Witold Baryluk <witold.baryluk@gmail.com>
When the scratch ringbuffer settings are changed, the shader unit has
to be idle or we will have shaders using old and new settings.
That combination is not supported on the HW (likely the offset is
ringbuffer idx * WAVESIZE * 1024).
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Now that we can (mostly) generate a pipe format for a VkFormat, use that
to answer queries about formats. This will let us refactor the freedreno
format table surface layout code to be shared between gallium and vulkan.
This causes us to expose fewer formats for now (on a 1/100 CTS run I'm
doing, skips go from 3671 to 3835 out of 5145 tests). Fails stay about
the same (478 -> 434, but the run is pretty flaky and we're doing fewer
tests now).
v2: Rebase on master, throw a finishme on missing vk-to-pipe formats that
tu used to support.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> (v1)
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
I'm planning on using this from radv and tu for queries about formats.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
This decreases memory usage, because serialized NIR is more compact.
If shader_has_one_variant is true and the shader is uncached, the first
variant is created from nir_shader, otherwise the first variant and
all other variants are created from serialized NIR.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
a later commit will add back st_vertex_program as a subclass of
st_common_program
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
This matches the uncached codepath.
affected_states was used before initialization, which was technically
a bug, but probably not reproducible due to _NEW_PROGRAM rebinding
everything.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Allows eglmesaext.h to be used in C++ code.
This aligns this file with the rest of EGL.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-By: Tapani Pälli <tapani.palli@intel.com>
Two files exist in that directory:
- vulkan_xlib_randr.h
- vulkan_xlib_xrandr.h
Both were imported in 205c271562 ("vulkan: Update the XML and
headers to 1.1.70") with identical contents (ie. the
VK_EXT_acquire_xlib_display extension), but the former was never
included anywhere and can't be found upstream [1], while the latter is
included in vulkan.h and found upstream.
[1] https://github.com/KhronosGroup/Vulkan-Headers/tree/master/include/vulkan
Fixes: 205c271562 ("vulkan: Update the XML and headers to 1.1.70")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
We could enable it on GFX10 if LLVM wasn't used as a fallback for
unsupported stages. Note that the CTS only tests it if
VK_KHR_shader_float16_int8 is enabled, even though it's not a
requirement.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
The multiplication reduction is larger than it could be, but it should be
easier to implement this way.
No failures with dEQP-VK.subgroups.*int64* except those caused by LLVM
being used for other stages.
v2: don't call setFixed() for v_add carry-out, since setHint sets physReg
v3: add and use emit_vadd32() helper
v4: use num_opcodes instead of last_opcode
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> (v3)
Should make 64-bit integer reductions easier to implement.
v4: use num_opcodes instead of last_opcode
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> (v3)
This extension allows to use subgroup operations with 8 and 16-bits
Untested on GFX6-GFX7, and most of subgroup operations are broken
on GFX10, so don't enable it for now. Not enabled on ACO because
it's still doesn't support 8-bits/16-bits.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It should rely on the source type, not on the return type which
is always a boolean anyways, so vote_feq was never selected. For
OpSubgroupAllEqualKHR it's always an integer comparison.
This fixes some VK_KHR_shader_subgroup_extended_types tests with RADV.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We don't seem to fault any more when running dEQP GLES2, and we don't
scrape serial output any more anyway so no problems should be caused by
that.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Always enabled; this doesn't require any driver work, it's just
core mesa bits.
quick_gl.txt is also updated because previously piglit ext_dsa
tests were skipped.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The spec is unclear on how to handle the buffer argument so we reuse
the logic from the EXT_direct_state_access spec.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We can't simply alias ARB_direct_state_access functions because
those fail if the vao has never been bound before.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The wording in ARB_framebuffer_no_attachments and EXT_direct_state_access
is different.
In the former framebuffer names must have been generated using glGenFramebuffers
before using the named functions.
In the latter framebuffer names have no such constraints, so we can't use
the _mesa_lookup_framebuffer_dsa function.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
All features from the EXT_dsa spec are implemented.
Interactions with other specs:
- GL_AMD_gpu_shader_int64: not needed, since it's not enabled in
compatibility profile.
- GL_ARB_bindless_texture is DONE
"INVALID_OPERATION is generated when calling various functions
to modify the state of a texture object from which handles have
been extracted"
- GL_ARB_buffer_storage/GL_EXT_buffer_storage is DONE (NamedBufferStorageEXT function)
- GL_ARB_texture_storage is DONE (3 TextureStorage*DEXT functions)
- GL_ARB_vertex_attrib_binding is DONE (6 VertexArray* functions)
- GL_EXT_external_buffer is not supported by Mesa
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
It doesn't make sense to have nonlinear layouts for a buffer that can be
accessed as direct memory for a compute kernel. Turn that off so things
work as expected.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We can take the OpenCL kernel inputs and interpret them as uniforms by
simply reusing the Gallium callback.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Chrome OS would like to import and render to any supported format that has
a corresponding display plane format, and this prevents throwing
framebuffer incomplete for FBOs using these textures.
See: crbug.com/949260
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Store a flag stating if there was an implmentation, and use
fxn->impl as a temporary flag between deserializsation stages.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
In d1c4e64a69, we added a parameter to tell the back-end compiler to
ignore the param array and just push however many constants you ask it
to push. I enabled it for iris because this is really what iris wants
but it seems to have caused a number of regressions. Revert to the old
behavior for now.
Fixes: d1c4e64a69 "intel/compiler: Add a flag to avoid compacting..."
Alignment requirements may have changed the horizontal stride already,
so don't set it if not required to avoid breaking said requirements.
Fixes several tests such as
dEQP-VK.subgroups.vote.graphics.subgroupallequal_int8_t
Signed-off-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
RS engine does this already, it is missing for BLT engine. This fixes
cases where a clear isn't immediately at the start of the frame.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
There are PE formats not supported by RS, so we can't have a single
to translate both.
Use RS only for same formats until we have a translate_rs_format and test
the possible different format blits.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
* Removes the incorrect usage of translate_rs_format
* Disables use of BLT engine for different src/dst format
We only really need the BLT engine for tiling/detiling right now, but it
would be nice to support as many blit cases as possible to avoid using PE
for that.
To deal with different formats we need to:
* Have a translate_blt_format which has all supported formats
* Fix the swizzle translation from gallium (current version was wrong)
* Set the src/dst sRGB bits as needed
* Find which type conversions the BLT engine can actually do
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
A security advisory (TALOS-2019-0857/CVE-2019-5068) found that
creating shared memory regions with permission mode 0777 could allow
any user to access that memory. Several Mesa drivers use shared-
memory XImages to implement back buffers for improved performance.
This path changes the shmget() calls to use 0600 (user r/w).
Tested with legacy Xlib driver and llvmpipe.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
If both are zero (the common case), we can emit a null vertex buffer
rather than emitting a vertex buffer with zeros in it. The packing of
the VERTEX_BUFFER_STATE is faster because no relocation is emitted and
we can avoid creating the vertex buffer which means one less
anv_state_stream_alloc.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This is a bit more natural because we're already getting an anv_state
most places in the pipeline. The important part here, however, is that
we're no longer calling anv_block_pool_map on every alloc_binding_table
call. While it's probably pretty cheap, it is potentially a linear walk
over the list of BOs and it was showing up in profiles.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instead of blindly dirtying descriptors and push constants the moment we
see a pipeline change, check to see if it actually changes the bind
layout or push constant layout. This doubles the runtime performance of
one CPU-limited example running with the Dawn WebGPU implementation when
running on my laptop.
NOTE: This effectively reverts beca63c6c0. While it was a nice
optimization, it was based on prog_data and we can't do that anymore
once we start allowing the same binding table to be used with multiple
different pipelines.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instead of dirtying all graphics or all compute based on binding point,
we're now much more careful. We first check to see if the actual
descriptor set changed and then only dirty the stages used by that
descriptor set. For dynamic offsets, we keep a bitfield per-stage of
which offsets are actually used in that stage and we only dirty push
constants and descriptors if that stage has dynamic offsets AND those
offsets actually change.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
It theoretically could be more efficient but the real point here is that
it's no longer really a matter of dealing with special cases and then
the "real" thing. The way we're handling binding tables, it's more of a
multi-step process and a switch is more natural.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This substantially reworks both the state setup side of push constant
handling and the pipeline compile side. The fundamental change here is
that we're no longer respecting the prog_data::param array and instead
are just instructing the back-end compiler to leave the array alone.
This makes the state setup side substantially simpler because we can now
just memcpy the whole block of push constants and don't have to
upload one DWORD at a time.
This also means that we can compute the full push constant layout
up-front and just trust the back-end compiler to not mess with it.
Maybe one day we'll decide that the back-end compiler can do useful
things there again but for now, this is functionally no different from
what we had before this commit and makes the NIR handling cleaner.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This moves the compute stuff into a anv_push_constants::cs sub-struct.
It also moves dynamic offsets into the push constants. This means we
have to duplicate the data per-stage but that doesn't seem like the end
of the world and one day we may wish to make dynamic offsets per-stage
anyway.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
It turns off that emitting push constants is one of the hottest paths in
the driver and ANY work we do there costs us. By pre-computing things a
bit ahead of time, we shave 5% off the runtime of a CPU-limited example
running with the Dawn WebGPU implementation.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The bounds checking is actually less safe than just pushing the data.
If the bounds checking actually ever kicks in and it's not on the last
UBO push range, then the shrinking will cause all subsequent ranges to
be pushed to the wrong place in the GRF. One of the behaviors we
definitely don't want is for OOB UBO access to result in completely
unrelated UBOs returning garbage values. It's safer to just push the
UBOs as-requested. If we're really concerned about robustness, we can
emit shader code to do bounds checking which should be stupid cheap (a
CMP followed by SEL).
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
As of 2d78e55a8c, nir_intrinsic_load_constant with a constant offset
is constant-folded so we should never end up with any that trigger
brw_nir_analyze_ubo_ranges.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This lets us stop tracking the pipeline layout. It also means less
indirection on a very hot path. As an extra bonus, we can make some of
our data structures smaller. No measurable CPU overhead improvement.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
In the early days of the driver we allowed layout to be VK_NULL_HANDLE
and used that for some internal pipelines when we wanted to be lazy.
Vulkan doesn't actually allow NULL layouts, however, so there's no
reason to have this check.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
A 'normal' texture op may be emitted in a vertex shader on T720 but it
still doesn't take any derivatives.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Re-emitting 3DSTATE_CC_STATE_POINTERS after emitting
3DSTATE_BLEND_STATE_POINTERS fixes the shadow flickering in
SuperTuxCart and Tropico 6 which was seen only on Haswell.
The reason for this is unknown and fix was found empirically.
The closest mention in PRM is that it should improve performance.
From the HSW PRM, volume 2b, page 823 (3DSTATE_BLEND_STATE_POINTERS):
"When the BLEND_STATE pointer changes but not the CC_STATE pointer,
driver needs to force a CC_STATE pointer change to improve
blend performance in pixel backend."
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1834
Fixes: eca4a654 ("i965: Disable dual source blending when shader doesn't support it on gen8+")
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This extension adds the device coherent and device uncached memory
types. It's known to be slower than non-device coherent memory but
it might be useful for debugging.
This is only exposed for chips that support L2 uncached.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This simplifies manipulation of the offsets dramatically, fixing some
UBO access related bugs.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Prefetch only supports the basic 2D texture case, checking is_array is
needed because 1d array textures pass the coord num_components==2 test.
Fixes: 2a0d45ae ("freedreno/ir3: Add a NIR pass to select tex instructions eligible for pre-fetch")
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@gmail.com>
This makes the streams more readable and comparable with the blob's parser
as it parses the VS and PLBU stream and shows the currently known values.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
CodeGenFileType moved from ::llvm::TargetMachine in
llvm/Target/TargetMachine.h to ::llvm:: in llvm/Support/CodeGen.h
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
To avoid following building error:
out/target/product/x86_64/obj_x86/STATIC_LIBRARIES/libmesa_util_intermediates/format/u_format_table.c:30:10:
fatal error: 'u_format.h' file not found
^~~~~~~~~~~~
1 error generated.
Fixes: 882ca6d ("util: Move gallium's PIPE_FORMAT utils to /util/format/")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
GEN10_FORMAT_TABLE_INPUTS requires correction of u_format.csv file path
in order to avoid following build error:
ninja: error: 'external/mesa/util/format/u_format.csv',
needed by 'out/target/product/x86_64/gen/STATIC_LIBRARIES/libmesa_pipe_radeonsi_intermediates/radeonsi/gfx10_format_table.h',
missing and no known rule to make it
Fixes: 882ca6d ("util: Move gallium's PIPE_FORMAT utils to /util/format/")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
We had this ad-hoc exact size matching for unsized internalformats,
but st_choose_matching_format() can do exactly what we want. This
means, that, for example, we'll now prefer the matching ordering for
565/565_REV if the driver supports both orders. We also pass
Unpack.SwapBytes through from ChooseTextureFormat so that we can hit
the memcpy path for 8888 formats when that flag is set.
Some interesting format choice changes from this (on softpipe):
intf/form/type before after
----------------------------------------------------
RGBA/RGBA/USHORT: R8G8B8A8_UNORM -> RGBA_UNORM16
RGB/RGBA/8888: X8B8G8R8_UNORM -> R8G8B8X8_UNORM
RGB/ABGR/8888_REV: X8B8G8R8_UNORM -> R8G8B8X8_UNORM
RGBA/RGBA/5551: B5G5R5A1_UNORM -> A1B5G5R5_UNORM
RGBA/RGBA/4444: R8G8B8A8_UNORM -> A4B4G4R4_UNORM
RGBA/GL_RGBA/1010102: R8G8B8A8_UNORM -> A2B10G10R10_UNORM
DEPTH/DEPTH/UINT: Z24X8 -> Z_UNORM32
DEPTH/DEPTH/USHORT: Z24X8 -> Z_UNORM16
v2: Make sure that the baseformat still matches. v1 would pick
MESA_FORMAT_L16_UNORM for RED/LUMINANCE/SHORT, when we clearly
want a red format.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
sRGB vs unorm was the only conflict case being guarded against in this
function. Before the PIPE_FORMAT conversion, we always listed the
unorm before the sRGB in the enums, but PIPE_FORMAT_A8B8G8R8_SRGB
happens to be before _UNORM. We always want the unorm result here.
Fixes: 807a800d8c ("mesa: Redefine MESA_FORMAT_* in terms of PIPE_FORMAT_*.")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We now have a nice helper function for finding those memcpy formats,
without needing to go through each entry of the mesa format table to
see if it happens to match.
While looking at sysprof of a softpipe GLES2 CTS run, we were spending
~8% of the CPU on ChooseTextureFormat. With this, roughly the same
region of the testsuite was .4%.
v2: Add Ken's fix for canonicalizing array formats.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Just return MESA_FORMAT_NONE to avoid triggering unreachable; there's
really no sensible thing to return for this case anyway.
This prevents regressions in the next commit, which makes st/mesa
start using this function to find a reasonable format from GL format
and type enums.
Reviewed-by: Eric Anholt <eric@anholt.net>
Eventually, we will want to combine constants across types, but for now
let's not break the world.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
64-bit ops have their own funky swizzles. Let's pack them, both for
native 64-bit sources as well as extended 32-bit sources.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
State was leaking from previous frames as we weren't updating the
descriptor in all cases.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Tested-by: Andre Heider <a.heider@gmail.com>
On newer GPUs, this is a no-op. On older GPUs, this prevents needless
spilling since texture registers are shared with a subset of work
registers.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Tested-by: Andre Heider <a.heider@gmail.com>
This actually supports more of the extension than the LLVM backend but we
can't enable it because ACO doesn't work with all stages yet.
With more of it enabled, some CTS tests fail because our 64-bit sqrt
is very imprecise. I can't find any precision requirements for it
anywhere, so I'm thinking it might be a CTS issue.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
ACO sets this itself and will have to set it differently in the future to
support shaderDenormFlushToZeroFloat64.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Two benefits:
Most docker image related environment variables can now be defined in
the jobs where they're used instead of globally. The DEBIAN_TAG values
are propagated to other jobs via YAML anchors.
Images on https://gitlab.freedesktop.org/mesa/mesa/container_registry
are now organized in separate repositories with a suffix matching the
name of the job which makes sure the image is there.
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Cleans up .gitlab-ci/ a little, and allows using a single DEBIAN_EXEC
line for all container jobs.
v2:
* Use lava_arm.sh instead of arm_lava.sh for consistency with v2 of the
previous change
Reviewed-by: Eric Anholt <eric@anholt.net> # v1
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
This makes it easier to tell which job is which in a pipeline.
v2:
* Use lava_arm{64,hf} instead of arm{64,hf}_lava to keep these jobs
together in pipeline overviews
Reviewed-by: Eric Anholt <eric@anholt.net> # v1
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Per the spec, the units passed to glPolygonOffset are to be multiplied
by an implementation-defined constant.
On Midgard, this constant seems to be 2.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
To not overwrite the resolve if there is pending clear aspects,
same as color resolves.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
In the case of glibc, pthread_t is internally a pointer. If
lp_rast_destroy() passes a 0-value pthread_t to pthread_join(), the
latter will SEGV dereferencing it.
pthread_create() can fail if either the user's ulimit -u or Linux
kernel's /proc/sys/kernel/threads-max is reached.
Choosing to continue, rather than fail, on theory that it is better to
run with the one main thread, than not run at all.
Keeping as many threads as we got, since lack of threads severely
degrades llvmpipe performance.
Signed-off-by: Nathan Kidd <nkidd@opentext.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
So nir_validate happens properly. Unfortunately this means we have
to play the metadata song and dance, so walk over all impls and say
that we didn't hurt anything.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When demoting it from an output to a global, we need to actually move
it to the correct list. While here, we also refactor so it's clear
we aren't mutating the list while iterating.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2106
Fixes: f9fd04aca1 ("nir: Fix non-determinism in lower_global_vars_to_local")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We were compiling them twice, costing extra build time. Reduces my
ccache-hot clean build time by a second (24.3s to 23.3s, 3 runs each).
The windows args are a little strange -- it's not clear to me that
they're actually used for building these files, but keep them in place
just in case, since we don't have a good windows CI story yet. We
should want them on both gallium and classic regardless: Only osmesa
could be built for windows in classic, and classic OSMesa's scons
build defines these flags too.
Closes: #2052
Acked-by: Dylan Baker <dylan@pnwbakers.com>
As this required use of Python 3.8, mako module also had to be updated.
v2 - Unbind mako module version when using Meson.
Signed-off-by: Prodea Alexandru-Liviu <liviuprodea@yahoo.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
On gen7 and earlier the scratch space size is limited to 12kB.
By enabling this optimization we may easily exceed this limit
without having any fallback.
arb_compute_shader/linker/bug-93840.shader_test crashes with
this lowering on IVB due to exceeding scratch size limit.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2092
Fixes: 69244fc7
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
To make PIPE_FORMATs usable from non-gallium parts of Mesa, I want to
move their helpers out of gallium. Since u_format used
util_copy_rect(), I moved that in there, too.
I've put it in a separate directory in util/ because it's a big chunk
of related code, and it's not clear to me whether we might want it as
a separate library from libmesa_util at some point.
Closes: #1905
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Previously, instruction selection had two kinds of booleans:
1. divergent which was per-lane and stored in s2 (VCC size)
2. uniform which was stored in s1
Additionally, uniform booleans were made per-lane when they resulted
from operations which were supported only by the VALU.
To decide which type was used, we relied on the destination size,
which was not reliable due to the per-lane uniform bools, but it
mostly works on wave64.
However, in wave32 mode (where VCC is also s1) this approach
makes it impossible keep track of which boolean is uniform and
which is divergent.
This commit makes all booleans per-lane.
The resulting excess code size will be taken care of by the optimizer.
v2 (by Daniel Schürmann):
- Better names for some functions
- Use s_andn2_b64 with exec for nir_op_inot
- Simplify code due to using s_and_b64 in bool_to_scalar_condition
v3 (by Timur Kristóf):
- Fix several subgroups regressions
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
ACO's optimizer would try to propagate 64-bit constants, but
does so in such a way that wouldn't work due to how the 64-bit
constants are handled in the IR.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
This patch tries to give instructions with the same execution
mask also the same pass_flags and enables VN for SALU instructions
using exec as Operand.
This patch also adds back VN for VOPC instructions and removes VN for phis.
v2 (by Timur Kristóf):
- Fix some regressions.
v3 (by Daniel Schürmann):
- Fix additional issues
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Using a hash-table walk means that variables will get inserted in
different orders on different runs. Just walk the list of globals
instead, even if some of them can't be turned into locals.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Commit "1c2bf82d24a glsl: disable lower_fragdata_array() for NIR drivers"
disabled the GLSL IR lowering that turned gl_FragData from an array into a
collection of scalar outputs under the assumption that this was already being
handled properly elsewhere, however there are some corner cases where NIR
would fail to do this, leaving gl_FragData[] as an array variable. This can
break backends that assume that all their outputs will be scalar and use the
variable definitions from the shader to do their output setup, such as the
case of V3D.
At least one corner case was found in some Portal shaders from shader-db, where
NIR would optimize out the full body of a fragment shader. In this scenario,
the empty shader would keep the original array definition of gl_FragData[],
causing the backend to assert.
We need to do this late enough for it to be effective, since doing it in
st_nir_preprocess does not fix the original problem.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2091
Fixes: 1c2bf82d ("glsl: disable lower_fragdata_array() for NIR drivers")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This reverts commit 7520478461.
This series caused unexpected flickering artifacts with Iris driver on
Chrome OS and EGL_EXT_image_flush_external spec has not been published
yet.
Acked-by: Eric Engestrom <eric@engestrom.ch>
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
This reverts commit 1d1b457821.
This series caused unexpected flickering artifacts with Iris driver on
Chrome OS and EGL_EXT_image_flush_external spec has not been published
yet.
Acked-by: Eric Engestrom <eric@engestrom.ch>
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
This reverts commit 1d122c104a.
This series caused unexpected flickering artifacts with Iris driver on
Chrome OS and EGL_EXT_image_flush_external spec has not been published
yet.
Acked-by: Eric Engestrom <eric@engestrom.ch>
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
This reverts commit 34b1aa957a.
This series caused unexpected flickering artifacts with Iris driver on
Chrome OS and EGL_EXT_image_flush_external spec has not been published
yet.
Acked-by: Eric Engestrom <eric@engestrom.ch>
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
This reverts commit c1c574fdf1.
This series caused unexpected flickering artifacts with Iris driver on
Chrome OS and EGL_EXT_image_flush_external spec has not been published
yet.
Acked-by: Eric Engestrom <eric@engestrom.ch>
Acked-by: Kristian H. Kristensen <hoegsberg@google.com>
Not much of a difference but slightly better and slightly less
arbitrary.
total instructions in shared programs: 3560 -> 3559 (-0.03%)
instructions in affected programs: 44 -> 43 (-2.27%)
helped: 1
HURT: 0
total bundles in shared programs: 1844 -> 1843 (-0.05%)
bundles in affected programs: 23 -> 22 (-4.35%)
helped: 1
HURT: 0
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
On ICL we have the src1 restriction which is applied through
fix_byte_src() and potentially changes the type of the operands from 8
to 32 bits. When this change happens, we fall into the "else if
(bit_size < 32)" case and miscompute src_type because it takes into
consideration bit_size (8) instead of the adjusted size of temp_op
(32). This results in the shader reading unused memory, giving us
mostly failures, but occasional passes due to whatever was already in
the registers we were reading.
This commit fixes a lot of dEQP subgroup i8vec2 tests on ICL, such as:
dEQP-VK.subgroups.arithmetic.compute.subgroupadd_i8vec2
This can also be verified by simply changing fix_byte_src() to apply
on all platforms.
Fixes: 5847de6e9a ("intel/compiler: don't use byte operands for src1 on ICL")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
This was causing uninitialized value to end up propagated to the
3DSTATE_DEPTH_BOUNDS packet, leading to asserts on packet
building due to the value being greater than 1.
Fixes: 939ddccb7a ("anv: Add support for depth bounds testing.")
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Pretty routine, we do have a hack to force swizzle alignment for !32-bit
for until we implement !32-bit the right way.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This is less complicated than previously thought. Note we have no way of
specifying the work register count for blend shaders; it must be
strictly less than the work register count of the corresponding fragment
shader (which is fine since we force the fragment shader to report a
count of 16 with a blend shader as a major hack until we get register
pressure down for blend shaders).
TODO: pandecode the flags.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Close input end of the pipe after data was written. Without this
fix I have seen a hang in sysfs_uevent_get(.., "OF_FULLNAME")
when key was not found.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We can clear the "needs" flags once we emit a flag. And also, don't
open-code the opcode name.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
For pre-fs-dispatch texture fetch, we need to assign bary_ij to r0.x,
even if it is not used in the shader (ie. only varying use is for tex
coords). But if, for example, gl_FragCoord is used, it could get
assigned on top of bary_ij, resulting in a GPU hang.
The solution to this is two-fold: (1) the inputs/outputs rework has the
benefit of making RA realize bary_ij is a vec2, even if there are no
split/collect instructions (due to no varying fetches in the shader
itself). And (2) extend the live ranges of meta:input instructions to
the first non-input, to prevent RA from assigning the same register to
multiple inputs.
Backport note: because of (1) above, a better solution for 19.3 would be
to revert f30c256ec0.
Fixes: f30c256ec0 ("freedreno/ir3: enable pre-fs texture fetch for a6xx")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
At the ir3 level, we would assume that we could use wrmask to mask
off other components of an instruction returning a vecN when they are
not used. Which would let RA use components not written for other live
values. But this is only true for tex instructions.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Allow inputs/outputs to be vecN (ie. whatever their actual size is), and
use split to get scalar components of inputs, and collect to gather up
scalar components of outputs.
The main motivation is to simplify RA, by only having to consider split/
collect to figure out where values need to land in consecutive scalar
registers, rather than having to also deal with left/right neighbors.
Because of varying packing, and the resulting fractional location
(location_frac), to implement load_input/store_output, it is still
convenient to have a table of scalar inputs/outputs. We move this to
the compile ctx (since it is only needed for nir->ir3).
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
In almost all places, the add_sysval_input() is paired directly with a
create_input(). (The one exception is frag shader ij bary coord, and
this exception will go away in a later patch.)
So go ahead and clean this up before reworking input/output handling.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is a driver-param (loaded from uniform), not a sysval (populated by
hw into a register). So it has no value to having a sysval slot.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We keep kill's alive w/ keeps these days, rather than a fake output.
This condition was left over from prior to that change.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
If I'm going to refactor a bit to use these meta instructions to also
handle input/output, then might as well cleanup the names first.
Nouveau also uses collect/split for names of these meta instructions,
and I like those names better.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This doesn't really work, we can't necessarily just change the outputs
to half-precision like this in anything but simple cases.
Keep the shader key entry around though, eventually with proper mediump
support we could use this with a nir pass to use lower precision frag
shader outputs when the render target format has <= 16b/component.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The instruction has 3 src regs, so `instr->regs[0..3]` are valid, but
`instr->regs[4]` is not.
```
Test case 'dEQP-GLES31.functional.shaders.linkage.es31.tessellation.varying.rules.output_superfluous_declaration'..
==29239== Invalid read of size 8
==29239== at 0x5BE9CDC: emit_cat6 (ir3.c:841)
==29239== by 0x5BEA1BF: ir3_assemble (ir3.c:921)
==29239== by 0x5BDF0A7: ir3_shader_assemble (ir3_shader.c:133)
==29239== by 0x5BDF193: assemble_variant (ir3_shader.c:162)
==29239== by 0x5BDF407: create_variant (ir3_shader.c:215)
==29239== by 0x5BDF4DB: shader_variant (ir3_shader.c:241)
==29239== by 0x5BDF553: ir3_shader_get_variant (ir3_shader.c:257)
==29239== by 0x5BA85F7: ir3_shader_variant (ir3_gallium.c:80)
==29239== by 0x5BA7703: ir3_cache_lookup (ir3_cache.c:96)
==29239== by 0x5B8B8B3: fd6_emit_get_prog (fd6_emit.h:119)
==29239== by 0x5B8C137: fd6_draw_vbo (fd6_draw.c:186)
==29239== by 0x5BB1FBB: fd_draw_vbo (freedreno_draw.c:290)
==29239== Address 0xb97f2d0 is 0 bytes after a block of size 240 alloc'd
==29239== at 0x4848D54: malloc (in /usr/lib/aarch64-linux-gnu/valgrind/vgpreload_memcheck-arm64-linux.so)
==29239== by 0x61BD35B: ralloc_size (ralloc.c:119)
==29239== by 0x61BD41B: rzalloc_size (ralloc.c:151)
==29239== by 0x5BE599B: ir3_alloc (ir3.c:45)
==29239== by 0x5BEA583: instr_create (ir3.c:984)
==29239== by 0x5BEA5DF: ir3_instr_create2 (ir3.c:1000)
==29239== by 0x5BEE317: ir3_STLW (ir3.h:1431)
==29239== by 0x5BF12D3: emit_intrinsic_store_shared_ir3 (ir3_compiler_nir.c:903)
==29239== by 0x5BF418B: emit_intrinsic (ir3_compiler_nir.c:1802)
==29239== by 0x5BF5D07: emit_instr (ir3_compiler_nir.c:2339)
==29239== by 0x5BF603F: emit_block (ir3_compiler_nir.c:2426)
==29239== by 0x5BF624B: emit_cf_list (ir3_compiler_nir.c:2474)
==29239==
```
Probably this only triggers in non-optimized builds?
Fixes: 1f3b52ce50 ("freedreno/a6xx: Add register offset for STG/LDG")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now that we're not using so many job slots, it's easy to get these
jobs run in a reasonable amount of time (gles3 took 10 minutes for 4
cores, and gles31 was 15 minutes for 4 cores).
Acked-by: Michel Dänzer <mdaenzer@redhat.com>
This runner is a little project by Bas, written in C++, that spawns
threads that then loop grabbing chunks of the (randomly shuffled but
consistently so) test list and hand it to a dEQP instance. As the
remaining list gets shorter, so do the chunks, so hopefully the
threads all complete effectively at once. It also handles restarting
after crashes automatically. I've extended the runner a bit to do
what I was doing in the bash scripts before, like the skip list and
expected failures handling. This project should also be a good
baseline for extending to handle retesting of intermittent failures.
By switching to it, we can have the swrast tests just take up one job
slot on the shared runners and keep their allotment of CPUs busy,
instead of taking up job slots with single-threaded dEQP jobs. It
will also let us (eventually, once I reprovision) switch the freedreno
runners over to threading within the job instead of running concurrent
jobs, so that memory scribbles in one pipeline don't affect unrelated
pipelines, and I can experiment with their parallelism (particularly
on a306 where we are frequently backed up) without trashing other
people's jobs.
What we lose in this process is per-test output in the log (not a big
loss, I think, since we summarize fails at the end and reducing log
length keeps chrome from choking on our logs so badly). We also drop
the renderer sanity checking, since it's not saving qpa files for us
to go poke through. Given that all the drivers involved have fail
lists, if we got the wrong renderer somehow, we'd get a job failure
anyway.
v2: Rebase on droppong of the autoscale cluster and the arm64
build/test split. Use a script to deduplicate the cts-runner
build.
v3: Rebase on the amd64 build/test container split.
Acked-by: Daniel Stone <daniels@collabora.com> (v1)
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> (v2)
The bash scripts were using grep in the manner that matches any subset
of the line, but the new CTS runner matches the whole line and I think
that's a pretty good behavior. Given that some of the skip lists
already were written to match the full test name, just make them
consistently do so.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Acked-by: Michel Dänzer <mdaenzer@redhat.com>
This helps cut down our container build time. I've left a few that
we're likely to rev more frequently or I was less confident in
dropping.
v2: Rebase on the build/test container split, now bumps the build
container tag in this commit.
Acked-by: Eric Engestrom <eric.engestrom@intel.com> (v1)
Acked-by: Daniel Stone <daniels@collabora.com> (v1)
We can end up with scenarios where last_fence is associated with a batch
that is flushed through some other path before needs_out_fence_fd gets
set. Resulting in returning a fence that has no backing fd.
The simplest thing is to just skip the optimization to try and avoid
no-op batches when a fence-fd is requested. This should normally be
just once a frame anyways.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
If the data is uniform, then it's really a uniform copy. If the index is
uniform, then it's really a read_invocation.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Seems we can use DPP's row_mask field to get an effect similar to
modifying exec.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Autotools was deprecated for a while and has now been removed, so let's
start using meson here so that we won't have any issues next time we
update libdrm.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
It seems whatever was causing this is no longer an issue. So let's get
rid of the hack here.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
The automatically generated padding in structs contains
undefined values, force pack the structs to eliminate the
padding. Otherwise structs with the same values may generate
different hashes.
Valgrind output:
Conditional jump or move depends on uninitialised value(s)
util_fast_urem32 (fast_urem_by_const.h:71)
hash_table_search (hash_table.c:262)
_mesa_hash_table_search (hash_table.c:296)
anv_pipeline_cache_search_locked (anv_pipeline_cache.c:318)
anv_pipeline_cache_search (anv_pipeline_cache.c:335)
lookup_blorp_shader (anv_blorp.c:38)
blorp_params_get_mcs_partial_resolve_kernel (blorp_clear.c:1112)
blorp_mcs_partial_resolve (blorp_clear.c:1205)
anv_image_mcs_op (anv_blorp.c:1742)
anv_cmd_predicated_mcs_resolve (genX_cmd_buffer.c:774)
transition_color_buffer (genX_cmd_buffer.c:1159)
cmd_buffer_end_subpass (genX_cmd_buffer.c:4840)
Uninitialised value was created by a stack allocation
blorp_params_get_mcs_partial_resolve_kernel (blorp_clear.c:1103)
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Same as was done for the ARM images before.
This should make it less painful to update to newer dEQP / piglit as
well as to make changes to the build/test environment.
Reviewed-by: Eric Anholt <eric@anholt.net>
One job for the quick_gl profile, one for the glslparser & quick_shader
profiles (doing these together takes hardly any more time than
quick_shader alone).
v2:
* Don't break lava tests
v3:
* Remove piglit test artifacts paths:
* Exclude some quick_shader tests again:
- Test whose result flips between pass/fail/skip
- *@vs_in tests, as not the same one of these gets picked every time
v4:
* Do not list passing tests in .gitlab-ci/piglit/*.txt (Eric Anholt)
* Include the test number summary in .gitlab-ci/piglit/*.txt
* Completely disable generating any vs_in tests in the piglit build.
* Remove some more unneded files from the piglit build tree.
* Exclude quick_gl arb_gpu_shader5 tests; they were all skipped anyway,
as llvmpipe doesn't support this extension yet, but occasionally they
would spuriously fail instead.
v5:
* Set LD_LIBRARY_PATH, so we actually test the Mesa build from the
pipeline...
* Verify that wflinfo reports the expected Mesa version
* Pass -noreset to Xvfb
v6:
* Don't use autoscale runners, run piglit with -j4 (Eric Anholt)
Reviewed-by: Eric Anholt <eric@anholt.net>
It's currently only needed for the meson-main and meson-arm64 jobs, not
the other meson build jobs.
Also remove MESON_SHADERDB, just run .gitlab-ci/run-shader-db.sh
directly from the meson-main job.
v2:
* Also run prepare-artifacts.sh in meson-arm64 script
v3:
* Move tarball creation into the new script as well, as it prevented
ccache --show-stats from running in after_script
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> # v1
Reviewed-by: Eric Anholt <eric@anholt.net>
By default, ninja tries to saturate all cores of the runner host
machine, which could overload it due to other jobs running in parallel.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Otherwise, we fail validation and potentially generate invalid code.
Let's fix up the mode of the accesses to the variable.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This fixes crashes with some
dEQP-VK.spirv_assembly.instruction.spirv1p4.* tests.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
A640 seems to work without any other changes (glmark and vkcube).
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Timeline semaphore introduce support for wait before signal behavior,
which means that it is now allowed to call vkQueueSubmit() with wait
semaphores not yet submitted for execution. Our kernel driver requires
all of the wait primitives to be created before calling the execbuf
ioctl. As a result, we must delay submissions in the userspace driver.
This change store the necessary information to be able to delay a
VkSubmitInfo submission to the kernel driver.
v2: Fold count++ into array access (Jason)
Move queue list to another patch (Jason)
v3: Document cleanup of temporary semaphores (Jason)
v4: Track semaphores of SYNC_FD type that needs updating after delayed
submission
v5: Don't forget to update sync_fd in signaled semaphores after
submission (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Delayed submissions required by timeline semaphores mean we need to be
able to update the sync fd backed semaphores in a delayed fashion.
This could mean a race between the application destroying the
semaphore and the submission code trying to update it with the new
sync fd.
This change prepares semaphores to be refcounted, we'll most likely
only take a reference for cases where we signal a sync fd semaphore.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When we will submit to i915 from a submission thread, we won't be able
to directly report the error to the user (in particular through the
debug report callbacks). So prepare 2 paths to report errors device ->
notifying the user immediately, queue -> notifying the user the next
time an entry point is called.
In this change we still report directly for both paths, this will
change in the next commit.
v2: Split NULL batch parameter handling in
anv_queue_submit_simple_batch() in a different commit
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Prepare the queue initialization to take on more responsabilities and
possibly fail.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
In the future we'll have 2 different allocations depending on whether
we're using threaded submission or not.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This doesn't seem to fix anything because those destroy() calls happen
right before the command buffer object & its list of batch_bo is also
destroyed. Still looks a bit cleaner.
v2: Found a second occurence
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
Fixes: 26ba0ad54d ("vk: Re-name command buffer implementation files")
Cc: <mesa-stable@lists.freedesktop.org>
We always close the in_fence at the end the anv_cmd_buffer_execbuf()
so when we take it from the semaphore, let's not forget to invalidate
it.
Note that the code leaks the fence_in if we get any error before
reaching the close(). Let's fix that in another patch or better,
rewrite the whole thing!
v2: drop redundant fd = -1 (Jason)
v3: Update commit message (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The change made in 88d665830f ("mesa: check draw buffer completeness
on glClearBufferfi/glClearBufferiv") correctly updated the state prior
to checking the framebuffer completeness on glClearBufferiv but not in
glClearBufferfi.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Fixes: 88d665830f ("mesa: check draw buffer completeness on glClearBufferfi/glClearBufferiv")
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/2072
Currently the linker do all the work then check for the limits, which
means num_textures and num_images in shader_info may have to store more
than the limit. This breaks down now since shader_info was packed and
doesn't expect to store larger invalid values.
To fix this, pull the check before we set the counts in shader_info.
Add necessary plumbing to make sure we bail once those errors are
found.
Fixes: 84a1a2578d ("compiler: pack shader_info from 160 bytes to 96 bytes")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Currently the linker do all the work then check for the limits, which
means num_ssbos and num_ubos in shader_info may have to store more
than the limit. This breaks down now since shader_info was packed and
doesn't expect to store larger invalid values.
To fix this, pull the check before we set the counts in shader_info.
One drawback of this approach is that for some cases we might not see
the collected errors from various stages, but bail as soon as a stage
breaks the limits.
Fixes: 84a1a2578d ("compiler: pack shader_info from 160 bytes to 96 bytes")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This allows ZSTD instead of ZLIB to be used for compressing the shader
cache.
On a 72 core system emulating skl with a full shader-db (with i965):
ZSTD:
1915.10s user 229.27s system 5150% cpu 41.632 total (cold cache)
225.40s user 10.87s system 3810% cpu 6.201 total (warm cache)
154M (235M on disk)
ZLIB:
2231.33s user 194.24s system 1899% cpu 2:07.72 total (cold cache)
229.15s user 10.63s system 3906% cpu 6.139 total (warm cache)
163M (244M on disk)
Tim Arceri sees (8 core ryzen and a full shader-db):
ZSTD:
2505.22 user 40.50 system 3:18.73 elapsed 1280% CPU (cold cache)
418.71 user 14.93 system 0:46.53 elapsed 931% CPU (warm cache)
454.3 MB (681.7 MB on disk)
ZLIB:
3069.83 user 40.02 system 4:20.13 elapsed 1195% CPU (cold cache)
425.50 user 15.17 system 0:46.80 elapsed 941% CPU (warm cache)
470.3 MB (701.4 MB on disk)
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v1)
Reviewed-by: Eric Anholt <eric@anholt.net>
Move differences in eglextchromium.h header file, then provide the same header than libglvnd-1.2
So program that omit to include eglextchromium.h will fail to build with both mesa and libglvnd headers.
Fixes: a0a8109f "include: add the definition of EGL_EXT_image_flush_external"
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Shader-db results on Kaby Lake:
total instructions in shared programs: 14929212 -> 14880028 (-0.33%)
instructions in affected programs: 72428 -> 23244 (-67.91%)
helped: 6
HURT: 2
helped stats (abs) min: 2165 max: 15981 x̄: 8590.00 x̃: 7624
helped stats (rel) min: 56.06% max: 74.52% x̄: 67.55% x̃: 72.08%
HURT stats (abs) min: 1178 max: 1178 x̄: 1178.00 x̃: 1178
HURT stats (rel) min: 350.60% max: 361.35% x̄: 355.97% x̃: 355.97%
95% mean confidence interval for instructions value: -11947.03 -348.97
95% mean confidence interval for instructions %-change: -125.72% 202.37%
Inconclusive result (%-change mean confidence interval includes 0).
total cycles in shared programs: 368585300 -> 342557344 (-7.06%)
cycles in affected programs: 28144921 -> 2116965 (-92.48%)
helped: 6
HURT: 2
helped stats (abs) min: 1404978 max: 7766106 x̄: 4353922.00 x̃: 3890682
helped stats (rel) min: 82.01% max: 95.57% x̄: 89.95% x̃: 92.28%
HURT stats (abs) min: 47778 max: 47798 x̄: 47788.00 x̃: 47788
HURT stats (rel) min: 278.20% max: 282.98% x̄: 280.59% x̃: 280.59%
95% mean confidence interval for cycles value: -5900438.73 -606550.27
95% mean confidence interval for cycles %-change: -140.79% 146.16%
Inconclusive result (%-change mean confidence interval includes 0).
total spills in shared programs: 9243 -> 8901 (-3.70%)
spills in affected programs: 2718 -> 2376 (-12.58%)
helped: 4
HURT: 4
total fills in shared programs: 21831 -> 10141 (-53.55%)
fills in affected programs: 11804 -> 114 (-99.03%)
helped: 6
HURT: 2
total sends in shared programs: 815912 -> 815912 (0.00%)
sends in affected programs: 0 -> 0
helped: 0
HURT: 0
LOST: 1
GAINED: 3
The helped shaders are all compute shaders in Aztec Ruins. There is
also a compute shader in synmark2 OglCSDof that's helped but it doesn't
show up in above shader-db results because it went from SIMD8 to SIMD16.
That shader improves enough to yield an 15-20% performance boost to the
benchmark as a whole on my KBL laptop. The hurt shaders are a couple
shaders in Kerbal Space Program and a couple in Aztec Ruins.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This commit fills in a number of different pieces:
1. We add support to brw_nir_lower_mem_access_bit_sizes to handle the
new intrinsics. This involves simple plumbing work as well as a
tiny bit of extra logic to always scalarize scratch intrinsics
2. Add code to brw_fs_nir.cpp to turn nir_load/store_scratch intrinsics
into byte/dword scattered read/write messages which use the A32
stateless model.
3. Add code to lower_surface_logical_send to handle dword scattered
messages and the A32 stateless model.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The new helper solves most of the annoying problems with data wrangling
in brw_nir_lower_mem_access_bit_sizes.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This new helper is better than nir_bitcast_vector because it's able to
take a (mostly) arbitrary range from the source vector. The only
requirement is that first_bit has to be aligned to the smaller of the
two bit sizes. It wouldn't be hard to lift that requirement but it's
reasonable for now.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Avoid duplicating some checks and code by making anv_GetDeviceQueue a
subcase of anv_GetDeviceQueue2, like radv does.
Signed-off-by: Ricardo Garcia <rgarcia@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
If we have an accelerated path for a particular framebuffer format,
let's use it to save a bunch of instructions in a blend shader.
[Tomeu: Only use the faster intrinsic on >T760]
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
While most load/store operations on 32-bit/vec4 intriniscally, some are
not and have special type-size-dependent semantics for the mask. We need
to convert into this native format.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
There are two versions of this opcode, depending what version of the ISA
you're using. I'm not sure if there's a semantic difference; I think
there might be some slight subtleties but it's too early to know at this
stage.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
This is a single opcode, at least on newer Midgard chips. It's easier to
have this represented in NIR rather than trying to optimize out the
conversions, so let's add the intrinsic.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
When using packed vulkan-formats on little-endian systems, we need to
swap the components for the gallium formats. And since Zink isn't
big-endian safe yet, little-endian is the only endianess we care about
right now.
This fixes a bunch of piglit tests, amongs others:
- spec@arb_depth_texture@depth-level-clamp
- spec@arb_depth_texture@depthstencil-render-miplevels * d=z24
- spec@arb_depth_texture@fbo-depth-gl_depth_component24-blit
- spec@arb_depth_texture@fbo-depth-gl_depth_component24-copypixels
- spec@arb_depth_texture@fbo-depth-gl_depth_component24-drawpixels
- spec@arb_depth_texture@fbo-depth-gl_depth_component24-readpixels
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 8d46e35d16 ("zink: introduce opengl over vulkan")
The system can be disabling HW acceleration unbeknown to the user,
leading to a long debug session trying to work out which component is
failing. A quick mention that it is the environment override would be
very useful.
v2: Use more generic "CPU renderer" and so try to avoid jargon.
Reviewed-By: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Martin Peres <martin.peres@linux.intel.com>
This commit makes two major changes. First, we add a second case to
OpLoad for sampled images which constructs a vtn_sampled_image and
stashes that rather than stashing a pointer to the combined image
sampler like we do for bare samplers and images. This should be more in
line with how SPIR-V is intended to work and hopefully doesn't cause any
weird problems. The second is a rework of vtn_handle_texture to assume
that everything has an image but not everything has a sampler. We also
add a vtn_fail_if for the case where a texture instructions require a
sampler but none is provided.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This helper makes a duplicate copy of the pointer if any new access
flags are set at this stage. This way we don't end up propagating
access flags further than they actual SPIR-V decorations. In several
instances where we create new pointers, we still call the decoration
helper directly because no copy is needed.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We have types on all vtn_values at this point so there's no reason to
carry the redundant type information.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The instruction count is (mostly) a measure of what optimization passes
can do, while # of nops is more an indication of how effectively the
scheduler is balancing register pressure vs instruction count. So track
these independently.
(There could be opportunities to rematerialize values to reduce register
pressure, swapping some nop's with other alu instructions, so nothing is
truely independent.. but it is still useful to break these stats out.)
Signed-off-by: Rob Clark <robdclark@chromium.org>
The meta PHI instruction was removed long ago. And fanin/fanout
themselves to not contribute actual instructions (at least not by the
time you get to sched, they may prevent copy-propagating away a mov)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Fold it in to writes_gpr() (since a register that does not reference any
registers by definition does not write a register). This lets us avoid
having to handle this case in a few other places.
Signed-off-by: Rob Clark <robdclark@chromium.org>
We did this in some places before, but not consistantly. But it will be
useful for two-pass RA, to identify which registers have already been
assigned.
While we are cleaning this up, use __ssa_src() and new __ssa_dst()
helper more consistently. (If nothing else, this reduces the # of
callers of ir3_reg_create() to audit that we didn't miss something)
Signed-off-by: Rob Clark <robdclark@chromium.org>
The stage specific fields of shader_info are in an union. We've
likely been lucky that this value was either overwritten or ignored by
other stages. The recent change in shader_info layout in commit
84a1a2578d ("compiler: pack shader_info from 160 bytes to 96 bytes")
made this issue visible.
Fixes: cf2257069c ("nir/spirv: Set a default number of invocations for geometry shaders")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Found once I started using the generated unpack code from the Mesa side.
Fixes: 4bbaac3782 ("gallium: Add some more channel orderings of packed formats.")
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Mesa emulates planar format sampling with per-plane samplers. Virgl now
supports this by allowing the plane index to be passed when creating a
sampler view from a planar image. With this change, mesa now passes that
information to virgl.
Signed-off-by: David Stevens <stevensd@chromium.org>
Reviewed-by: Lepton Wu <lepton@chromium.org>
It happens that some games try to access a vertex buffer without
a valid format. This case was incorrectly handled by
ac_get_tbuffer_format which made ACO emit an invalid instruction.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Cc: 19.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The panfrost BO cache can only grow since all newly allocated BOs are
returned to the cache (unless they've been exported).
With the MADVISE ioctl that's not a big issue because the kernel can
come and reclaim this memory, but MADVISE will only be available on 5.4
kernels. This means an app can currently allocate a lot memory without
ever releasing it, leading to some situations where the OOM-killer kicks
in and kills the app (or even worse, kills another process consuming
more memory than the GL app) to get some of this memory back.
Let's try to limit the amount of BOs we keep in the cache by evicting
entries that have not been used for more than one second (if the app
stopped allocating BOs of this size, it's likely to not allocate
similar BOs in a near future).
This solution is based on the VC4/V3D implementation.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We will soon introduce an LRU list to evict BOs that have been unused
for more than 1 second. Let's first move all BO cache fields to a
sub-struct to clarify which fields are used by the BO caching logic.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
There aren't texture pipeline registers anymore; instead, space is
shared with work and ldst registers for output and input respectively.
We need to shift the base registers to represent this correctly.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The meaning of some bits shifts; we need to account for this to print
swizzles sanely.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We were using old style half-registers; let's update that to be
consistent, preparing us for more disassmbler changes in this area.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
When other geometry stages are present, we chose two quads and no
merged regs.
Acked-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
At least the gallium blitter helper will call us to draw with
tessellation shaders set but a non-patch primitive.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
It seems like tiling could work in the Adreno architecture, but we've
only ever seen bypass rendering with tessellation. For now, let's do
that too.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Tessellation needs a couple of buffers that should hold the entire
output from a full VS+TCS draw call.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
We need to select the right primitive type, set a bit to turn on
tessellation and or in the TES output primitive type.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
The tessellation stages need size and stride or the patch layout as
well as locations of attributes in the patch. The tesselation stages
also use two system memory BOs and need the iovas of those.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Similar to GS, the registers are shared and not reinitialized betewen
VS and TCS, so we need to make sure to allocate the same registers for
the system values between stages.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Similar to GS, some inputs are reused when the chsh from VS to TCS or
TES to GS, so we need to make sure we setup the right inputs and make
the shared system values outputs so they don't get clobbered.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
We add two new IR3 specific nir intrinsics that map to the new condend
and endpatch instructions.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Our lowering pass made the z component unused by replacing its uses
by 1 - x - y. The intrinsic implementation then just need to return
the x and y components.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
When we have both TES and GS, the TES needs to chain to the VS with
chmask and chsh GS just like the VS does to either TCS or GS.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
There are two new opcodes in use in tesselation control shaders:
category 0, opcodes 13 and 15. unk13 is a kill type of instruction
that terminates threads where !p0.x and it used to narrow down a patch
wavefront to just thread 0. Then, once thread 0 has written the tess
levels, it issues unk15, which might signal the TE that another patch
has been fully written.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
VS and TCS pass varyings the same way as VS and GS does. TCS then
writes entire patch to a system memory BO and TES eventually reads
back from the BO once the TE starts generating vertices. TES outputs
vertices the same way as VS and GS, except when there's a GS as well,
in which case TES passes varyings to GS same way the VS would.
In addition, the TCS needs a little bit of control flow massaging so
that it only runs for valid invocations needs a couple of unknown
instructions to synchronize with the TE.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Whether we're tessellating and which primitives the TES outputs
affects the entire pipeline so let's add a field to the key to track
that.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
With the imul24 opcode in place, we can now use it for computing local
offsets (ie for ldlw/stlw).
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
These provide the iovas for system memory buffers used for
tessellation as well as a new HW specific system value.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
The gallium helper doesn't like patches and we can't determine how
many primitives it gets tessellated into anyway. On gens where we
have tessellation, we get the prim count from a HW counter so just
skip counting on the CPU.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
These intrinsics take a ivec2 for the 64 bit base address and a
integer offset.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Stages that load inputs with ldlw (TCS, GS) need byte offsets, stages
that load with ldg (TES) need dwords offsets.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
These instructions take a 64 bit iova as two conescutive registers and
a immediate offset. This patch adds support for the offset to be a
single register, which is added to the 64 bit iova.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
What we call eRB6_Z24_UNORM_S8_UINT now is actually
RB6_Z24_UNORM_S8_UINT_AS_R8G8B8A8 and RB6_X8Z24_UNORM is actually
RB6_Z24_UNORM_S8_UINT.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2D array textures and 3D textures are different enum values after all.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
We use one mechanism for (REG_A6XX_RBBM_PRIMCTR_8_LO)
PIPE_QUERY_PRIMITIVES_GENERATED, which counts all primitives that exit
the geometry pipeline, whether or not xfb is on. Then for
PIPE_QUERY_PRIMITIVES_EMITTED, we use the CP_EVENT_WRITE subfunction
that writes out per-stream counts for generated and emitted, but only
when xfb is enabled.
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
In particular, increase the cost of 64-bit integer division.
Fixes huge shaders with dEQP-VK.spirv_assembly.type.scalar.i64.mod_geom
, with ACO used for GS this creates shaders requiring a branch with
>32767 dword offset.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fix memory leak on allocation for lima submit, reported by valgrind.
128 bytes in 1 blocks are definitely lost in loss record 38 of 84
at 0x484A6E8: realloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
by 0x58689C7: util_dynarray_ensure_cap (u_dynarray.h:91)
by 0x5868BBB: util_dynarray_grow_bytes (u_dynarray.h:139)
by 0x5868BBB: lima_submit_add_bo (lima_submit.c:113)
by 0x585D7D3: lima_ctx_buff_va (lima_context.c:57)
by 0x586378F: lima_pack_plbu_cmd (lima_draw.c:802)
by 0x586378F: lima_draw_vbo (lima_draw.c:1351)
by 0x5406A2F: u_vbuf_draw_vbo (u_vbuf.c:1184)
by 0x55D0A57: st_draw_vbo (st_draw.c:268)
by 0x55576CB: _mesa_draw_arrays (draw.c:374)
by 0x55576CB: _mesa_draw_arrays (draw.c:351)
by 0x43610B: Mesh::render_vbo() (mesh.cpp:583)
by 0x415DBB: SceneBuild::draw() (scene-build.cpp:242)
by 0x41131B: MainLoop::draw() (main-loop.cpp:133)
by 0x411947: MainLoop::step() (main-loop.cpp:108)
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Fix memory leak on allocation for nir shader, reported by valgrind.
3,502 (480 direct, 3,022 indirect) bytes in 1 blocks are definitely lost in loss record 77 of 84
at 0x48483F8: malloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
by 0x5750817: ralloc_size (ralloc.c:119)
by 0x5750977: rzalloc_size (ralloc.c:151)
by 0x575C173: nir_shader_create (nir.c:45)
by 0x5763ACB: nir_shader_clone (nir_clone.c:728)
by 0x55D5003: st_create_fp_variant (st_program.c:1242)
by 0x55D789F: st_get_fp_variant (st_program.c:1522)
by 0x55D789F: st_get_fp_variant (st_program.c:1507)
by 0x56400C3: st_update_fp (st_atom_shader.c:163)
by 0x563D333: st_validate_state (st_atom.c:261)
by 0x55D07CB: prepare_draw (st_draw.c:132)
by 0x55D08DF: st_draw_vbo (st_draw.c:184)
by 0x55576CB: _mesa_draw_arrays (draw.c:374)
by 0x55576CB: _mesa_draw_arrays (draw.c:351)
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
We would like to have GL 4.6 Compatibility too.
The extensions don't support compatibility features, so no other changes
are needed.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
All callers other than the unit test just wanted to convert back from
a known-mesa-equivalent format, which is now a no-op.
v2: Fix assertion failure in iris GL startup with BGR565 by continuing
to return MESA_FORMAT_NONE for non-Mesa formats.
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Now that MESA_FORMAT_x is just a PIPE_FORMAT_x define, we can strip
this function down to just the compression fallbacks.
v2: Restore the SRGB format for ASTC SRGB fallback case.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There are various places in Mesa where we would like to be able to
have a shared format enum between Mesa and gallium (NIR compiler's
image formats, for example, or mapping from gallium's formats to
mesa's and vice versa in st_format.c). Rewriting all MESA_FORMAT to
PIPE_FORMAT would be disruptive and possibly more work than it's worth
(And I actually prefer MESA_FORMAT's name scheme), so for now just
make it so that there's one shared set of enum values.
The #defines here were generated by printing out from the
tests/st_format.c round-tripping loop, with the exception of 8888
formats where I hand-edited the #defines to point at the corresponding
gallium packed format define.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
To redefine MESA_FORMAT in terms of PIPE_FORMAT enums, we need to fix
places where we iterated up to MESA_FORMAT_COUNT. I use
_mesa_get_format_name(f) == NULL as the signal that it's not an enum
value with a MESA_FORMAT.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We checked round-tripping of formats without fallbacks, but weren't
setting the compression support flags in the mock context and thus
needed to skip testing those. Just set all the flags and assert that
no fallbacks are triggered, so we get full test coverage.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We have packed formats for RGBA and ABGR already, so we can just
pack/unpack code.
v2: Rebase on endianness macro rename
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
These are the last formats that MESA_FORMAT had and PIPE_FORMAT
didn't. The .csv entries channel sizes and swizzles all came from the
corresponding UNORM format.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is the last unorm format that MESA_FORMAT had and PIPE_FORMAT
didn't. Note that it's an array format on gallium's side as well,
since it's a NPOT pixel size.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This texture compression is exposed by 830 and 915, and to make
MESA_FORMAT match PIPE_FORMAT defines I need a corresponding
PIPE_FORMAT.
v2: Set is_hand_written so we don't try to generate pack/unpack code.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The primitive indices have to be swapped to follow the drawing
order.
This fixes corruption with Overwatch when NGG GS is force enabled.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This reverts commit 4432a2d14d.
Pretty much every SKQP test dies with this assertion:
skqp: ../src/mesa/drivers/dri/i965/brw_program_cache.c:102: hash_key: Assertion `item->key_size % 4 == 0' failed.
The automatically generated padding in structs contains
undefined values, force pack the structs to eliminate the
padding. Otherwise structs with the same values may generate
different hashes.
Valgrind output:
Conditional jump or move depends on uninitialised value(s)
util_fast_urem32 (fast_urem_by_const.h:71)
hash_table_search (hash_table.c:262)
_mesa_hash_table_search (hash_table.c:296)
anv_pipeline_cache_search_locked (anv_pipeline_cache.c:318)
anv_pipeline_cache_search (anv_pipeline_cache.c:335)
lookup_blorp_shader (anv_blorp.c:38)
blorp_params_get_mcs_partial_resolve_kernel (blorp_clear.c:1112)
blorp_mcs_partial_resolve (blorp_clear.c:1205)
anv_image_mcs_op (anv_blorp.c:1742)
anv_cmd_predicated_mcs_resolve (genX_cmd_buffer.c:774)
transition_color_buffer (genX_cmd_buffer.c:1159)
cmd_buffer_end_subpass (genX_cmd_buffer.c:4840)
Uninitialised value was created by a stack allocation
blorp_params_get_mcs_partial_resolve_kernel (blorp_clear.c:1103)
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Only the GL_UNSIGNED_BYTE cases actually work, the rest all fail, but we
should test the working cases to ensure that they continue to work.
Reviewed-by: Brian Paul <brianp@vmware.com>
The workaround got accidentally moved to the wrong place
Fixes: 08d510010b aco: increase accuracy of SGPR limits
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
ctx->pipe_framebuffer contains the last bound FB state, let's release
resources pointed by this FB state when the context is destroyed.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
pipe->stream_uploader has been allocated with u_upload_create_default()
in panfrost_create_context(), let's destroy it in the context destroy
path.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This commit fixes the following warning:
../src/intel/common/gen_decoder.c: In function ‘gen_spec_load_from_path’:
../src/intel/common/gen_decoder.c:741:11: warning: variable ‘len’ set but not used [-Wunused-but-set-variable]
741 | size_t len, filename_len = strlen(path) + 20;
| ^~~
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This commit fixes the following warning:
../src/compiler/nir/nir.c:1827:1: warning: ‘dest_is_ssa’ defined but not used [-Wunused-function]
1827 | dest_is_ssa(nir_dest *dest, void *_state)
| ^~~~~~~~~~~
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This commit fixes the following warning:
../src/compiler/glsl/gl_nir_link_uniforms.c: In function ‘find_and_update_previous_uniform_storage’:
../src/compiler/glsl/gl_nir_link_uniforms.c:166:16: warning: unused variable ‘num_blks’ [-Wunused-variable]
166 | unsigned num_blks = nir_variable_is_in_ubo(var) ?
| ^~~~~~~~
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This fix wrong color when playing video under Android + virgl
configuration.
Fixes: 2decad495f ("gallium/dri2: Support images with multiple planes for modifiers")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Lepton Wu <lepton@chromium.org>
v2: Use ternary to simplify code (Jason)
v3: Reorder switch cases to follow existing section ordering (Nanley)
Add missing comment in cmd_buffer_end_subpass() about new layout (Nanley)
v4: Fix layout comparison for stencil case (Nanley)
Update a few more comments (Nanley)
Move VK_IMAGE_LAYOUT_STENCIL_ATTACHMENT_OPTIMAL_KHR in color
attachment case for future stencil-CCS support (Nanley)
v5: Missed comments update (Nanley)
Updated relnotes.txt (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This reverts commit c9df92bf79.
It turns out that gitlab-runner uses kubernetes all wrong, spawning Pods
and sshing into them to run the script instead of Jobs containing the
script to run. This means that when anything goes wrong with the pod
(autoscale, preemption, VM maintenance, cluster reconfiguration), the job
fails and only sometimes gets handled as a runner system failure. Even
worse, due to bugs in either the runner or k8s itself, some classes of
timeout-related failure end up not being reported as failures, and the job
will incorrectly report success!
Disable using the "autoscale" cluster until we can do something else
(docker-machine instead of k8s, or the custom third-party k8s-native
runner).
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Acked-by: Daniel Stone <daniels@collabora.com>
The image used for test jobs is only about 1/6 as big as before, which
may help avoid some issues with some of the test boards.
Inspired by https://gitlab.freedesktop.org/mesa/mesa/issues/2046 .
v2:
* Leave LIBDRM_VERSION at 2.4.99 (Daniel Stone)
* Delete more build artifacts from dEQP tree (Daniel Stone)
v3:
* Set LD_LIBRARY_PATH for ldd
Acked-by: Daniel Stone <daniels@collabora.com> # v2
Reviewed-by: Eric Anholt <eric@anholt.net> # Except for the ldd line
We don't support nir_texop_txd, which is required by this cap. So let's
disable it for now.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 8d46e35d16 ("zink: introduce opengl over vulkan")
We do not support them yet, so let's not pretend.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 8d46e35d16 ("zink: introduce opengl over vulkan")
There's no good way to know if a texture-view will be created, so we
just have to accept it for all resources.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 8d46e35d16 ("zink: introduce opengl over vulkan")
We should use the format derived from the image-view here, not from the
image itselt. Otherwise, we'll end up with incompatible render-passes.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 8d46e35d16 ("zink: introduce opengl over vulkan")
Use unsigned values otherwise signed extension will produce a 64 bits value where
the 32 left-most bits are 1.
Fixes: 2afeed3010 ("radeonsi: tell the shader disk cache what IR is used")
This extension allows to control the subgroup size by allowing a
varying subgroup size and also specifying a required subgroup size.
This implementation only allows to specify a required subgroup
size for compute shaders because there is some caveats with
other shader stages (eg. NGG with geometry shader). This
basically allows apps to use Wave32 for compute shaders.
This extension is enabled for all chips but only GFX10 supports
Wave32. ACO doesn't support it.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The maximum number of descriptor sets is indeed 32 but without
the sign bit.
The maximum number of bindings for RADV is way larger, keep it
as 32-bit.
Fixes: 96e6ef80d9 ("nir: pack the rest of nir_variable::data")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Now that all environment variables are documented, it would be
appreciated if we can keep this up-to-date.
[skip ci]
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
That means we have only 30 bits for object IDs, because 2 bits are
sometimes used for something else.
This decrease the uncompressed shader size for the biggest Borderlands 2
shader from 33.6 KB to 23.2 KB. (31% decrease)
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This decreases memory usage, because serialized NIR is more compact.
The main shader part is compiled from nir_shader.
Monolithic shader variants are compiled from nir_binary.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
not needed. We also need to free TGSI in the destroy function for the case
when an app is terminated and si_create_compute_state_async is never
executed because of util_queue_drop_job.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
GP handles gl_PointSize similar to gl_Position, i.e. it needs
separate buffer and it has special type in varying descriptors, also
for indexed draw we need to emit special PLBU command to pass
address of gl_PointSize buffer.
Blob also clamps gl_PointSize to 1 .. 100 (as well as line width),
so let's do the same.
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
This was made optional in ff9bf223c2 ("meson: make nm binary optional")
for Windows, but proper windows has been added and `nm` is now only used
on Unix systems.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviwed-by: Dylan Baker <dylan@pnwbakers>
Otherwise if glvnd is not installed systemwide, but only in a prefix,
it's headers wont be found. This happens because if it's headers are in
/usr/include/ then another dependence will provide the necessary -I
arguments and compilation will work.
Fixes: 035ec7a2bb
("meson: Add support for EGL glvnd")
Acked-by: Eric Engestrom <eric@engestrom.ch>
As requested by Tim.
This was generated with:
grep 'PIPE_ARCH_.*_ENDIAN' -rIl | xargs sed -ie 's@PIPE_ARCH_\(.*\)_ENDIAN@UTIL_ARCH_\1_ENDIAN@'g
v2: - add this patch
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
This will allow it to be used as a drop in replacement for
_mesa_little_endian in a number of cases.
v2: - Always define PIPE_ARCH_LITTLE_ENDIAN and PIPE_ARCH_BIG_ENDIAN,
define the one that reflects the host system to 1 and the other to 0
- replace all uses of #ifdef, #ifndef, and #if defined() with #if
and #if ! with PIPE_ARCH_*_ENDIAN
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
_WIN32 is defined by basically all windows compilers (MSVC, ICL, MinGW),
wereas _MSC_VER is not defined by MinGW. Without this change MinGW falls
through and doesn't define PIPE_ARCH at all, and is caught by some extra
code in gallium.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Run only jobs needed for testing on LAVA devices if a branch starts with
lava-ci-.
This allows developers to have faster test cycles as these pipelines
take only a bit above 8 minutes. Also has the advantage of conserving
resources.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
The implementation doesn't share much with get.c because:
* the refactoring needed for get.c to not depend on ctx->Array.VAO would
be quite large
* glGetVertexArray* would still need to filter pname to only accept the one
specified by the spec
* these functions are getter, the implementation is trivial (the complexity
is in the correct filtering of pname input)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Add a single helper dealing with the lookup of both the vao
and the vbo to avoid duplicating this code in all the
glVertexArray* functions.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
ARB_dsa and EXT_dsa slightly differs when an uninitialized VAO
is requested.
In this case ARB_dsa fails while EXT_dsa requires to initialize
the object.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
$PWD doesn't work for variables:, it ended up as "/ccache", always
starting with an empty cache.
v2:
* Use relative path and realpath
v3:
* Use $CI_PROJECT_DIR (Eric Anholt)
* Clear ccache stats in before_script if the cache is in $CI_PROJECT_DIR
Fixes: c9df92bf79 "ci: Switch over to an autoscaling GKE cluster for
builds."
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Fixes a ton of regressions in image load store tests.
Fixes: 4319cc8c0f ("nir: pack nir_variable::data::xfb_*")
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Commit 5847de6e9a implemented a restriction that applies to ICL, but
wrongly marked it as also applying to GLK. Reviewers or MR !1125
pointed this, and the commit history shows removal of GLK to parts of
the patch, but it turns there was still a left-over GLK check in the
code.
This code was breaking some of the i8vec2 tests on GLK, for example:
dEQP-VK.subgroups.arithmetic.compute.subgroupadd_i8vec2
Removing the GLK check solves the issue for GLK. I don't see a reason
on why implementing this restriction would actually break GLK, so
there's still more to investigate here since this bug may be affecting
ICL+, but let's apply the real GLK fix while we analyze and discuss
the other possible issues.
Fixes: 5847de6e9a ("intel/compiler: don't use byte operands for src1
on ICL")
BSpec: 3017
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
This prevents some additional optimizations that would change the
original result. This includes things like (b < a && b < c) => b <
min(a, c) and !(a < b) => b >= a. Both of these optimizations were
specifically observed in the piglit tests added in piglit!160.
This was discovered while investigating
https://gitlab.freedesktop.org/mesa/mesa/issues/1958. However, the
problem in that issue was Chrome or Angle is replacing calls to isnan()
with some stuff that we (correctly) optimize to false. If they had left
the calls to isnan() alone, everything would have just worked.
No shader-db changes on any Intel platform.
I also tried marking the comparison generated by the isnan() function
precise. The precise marker "infects" every computation involved in
calculating the parameter to the isnan() function, and this severely
hurt all of the (few) shaders in shader-db that use isnan().
I also considered adding a new ir_unop_isnan opcode that would implement
the functionality. During GLSL IR-to-NIR translation, the resulting
comparison operation would be marked exact (and the samething would need
to happen in SPIR-V translation).
This approach taken by this patch seemed easier, but we may want to do
the ir_unop_isnan thing anyway.
Fixes: d55835b8bd ("nir/algebraic: Add optimizations for "a == a && a CMP b"")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
We would like to pack not just xyzw swizzles but also efgh swizzles.
This should work for vec4/16-bit. More work will be needed to pack
swizzles for vec8/16-bit and even more work for 8-bit, of course.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Midgard prefetches instructions based on tag (ALU, LD/ST, texture *
size). To do so, the shader descriptor specifies the tag of the first
instruction, all instructions specify the tag of the next linear
instruction is, and all branches explicitly specify the tag of the
branch target.
If you mess this up, you get an INSTR_TYPE_MISMATCH, which unambiguously
refers to this problem, but it's still annoying to try to work out all
the branch targets in your head to debug.
Instead, let's track the tags of various blocks over time, so we can
automatically validate tags of branch targets, to make
INSTR_TYPE_MISMATCH issues immediately obvious in a disassembly.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The host query reset entry point didn't use the availability offset
for performance queries.
To fix this, reorder the availability of performance queries to match
other queries.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 2b5f30b1d9 ("anv: implement VK_INTEL_performance_query")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When importing a dmabuf with a specified tiling, the dmabuf user
should always try to set the tiling mode because: 1) the exporter
can set tiling AFTER exporting/importing. 2) a dmabuf could be
exported from a kernel driver other than i915, in this case the
dmabuf user and exporter need to set tiling separately.
This patch fixes a problem when running vkmark under weston with
iris on ICL, it crashed to console with the following assert. i965
doesn't have this problem as it always tries to set the specified
tiling mode.
weston: ../src/gallium/drivers/iris/iris_resource.c:990: iris_resource_from_handle: Assertion `res->bo->tiling_mode == isl_tiling_to_i915_tiling(res->surf.tiling)' failed.
Signed-off-by: James Xiong <james.xiong@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
In 2ca0d913ea, we began updating cso_fb->layers to the actual layer
count, rather than 0. This fixed cases where we were setting "Force
Zero RTA Index Enable" even when doing layered rendering. Sadly, it
also broke the check entirely: cso_fb->layers is now 1 for non-layered
cases, but the Force Zero RTA Index check was still comparing for 0.
Fixes: 2ca0d913ea ("iris: Fix framebuffer layer count")
Python has the identity operator `is`, and the equality operator `==`.
Using `is` with strings sometimes works in CPython due to optimizations
(they have some kind of cache), but it may not always work.
Fixes: 96c4b135e3
("nir/algebraic: Don't put quotes around floating point literals")
Reviewed-by: Matt Turner <mattst88@gmail.com>
MALI_DEPTH_TEST should only be set when depth->writemask is true,
not when the depth test is enabled. Let's rename the flag and patch
panfrost_bind_depth_stencil_state() to do the right thing.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
If an app first creates a compute pipeline with
VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT set, then re-compile it
without that flag, the driver should re-compile the compute shader.
Otherwise, it will return the unoptimized one.
Fixes: ce188813bf ("radv: add initial support for VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Any BO would work, we don't have any BO types yet anyway. Moreover
lima_submit_add_bo() changes BO flags so they won't match allocation
flags.
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
LIMA_DEBUG=bocache now activates debug prints for BO allocation,
destruction and BO cache.
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Clearly we do want to have fp16 at some point ... but I kind of give up
debugging and it turns out the issues with fp16 support in 'frost are so
deeply rooted that I might as well disable this non-opt and land
LCRA now.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The seccomp filter allows read/write, let us make sure nobody can
do anything with this.
Fixes: cff53da374 "radv: enable secure compile support"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This is incorrect, because polygonMode only applies if the final
primitive type is a polygon; polygonMode doesn't apply to
line-primitives as the comment suggests.
The Vulkan 1.1 spec, section 26.11, "Polygons" defines that polygons are
separate from points and line segments:
" A polygon results from the decomposition of a triangle strip, triangle
fan or a series of independent triangles. Like points and line segments,
polygon rasterization is controlled by several variables in the
VkPipelineRasterizationStateCreateInfo structure. "
Further, section 26.11.2, "Polygon Mode", only define polygonMode to
apply to polygons:
" Possible values of the VkPipelineRasterizationStateCreateInfo::polygonMode
property of the currently active pipeline, specifying the method of
rasterization for polygons, are: "
This seems to clearly define that polygonMode doesn't apply to points
and lines, so let's make sure that we don't early out with the wrong
value.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Rather than having hw-specific swizzles encoded directly in the
instructions, have a unified swizzle arary so we can manipulate swizzles
generically.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We want symmetry between loads and stores, so we add a dummy source. So
we get, e.g.
st_int4 _, val, arg_1, arg_2
ld_int4 dest, _, arg_1, arg_2
Semantically, this dummy source represents the data itself, as if the
load is simply a move. That means it has a swizzle that acts as a
source.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This function was added in 7e414b5864 to work around a defect in
lower_output_reads(). As of the previous commit no NIR driver calls
lower_output_reads().
This change means we don't need the special GLSL IR style
gl_FragData handling for building the resource list in a NIR based
linker.
No shader-db change on SKL i965.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This will allow us to stop lowering gl_FragData in GLSL IR for NIR
drivers which means we won't need the special GLSL IR type
handling for building the resource list in a NIR based linker.
i965 has been doing this since b828f7a27b.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
When doing an indexed draw with index_bias set to a non-zero value (e.g.
by glDrawElementsBaseVertex), the vertex buffer should be offseted by
index_bias vertices.
Add this offset when setting the vertex buffer address.
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Now that we're no longer compacting binding table entries, the only time
they can possibly change is when we actually switch subpasses.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Instead, always emit one entry for every color attachment in the subpass
or one NULL if there are no color attachments. This will let us adjust
an Ice Lake workaround so we don't get a stall on every draw call.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Also, use color_outputs_valid rather than nr_color_outputs since it
should be a bit more accurate.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
The GKE pool we're using is 1-3 32-core VMs, preemptible (to keep
costs down), with 8 jobs concurrent per system. We have plenty of
memory (4G/core), so we run make -j8 to try to keep the cores busy even
when one job is in a single-threaded step (docker image download, git
clone, artifacts processing, etc.) When all jobs are generating work
for all the cores, they'll be scheduled fairly.
The nodes in the pool have 300GB boot disks (over-provisioned in space
to provide enough iops and throughput) mounted to /ccache, and
CACHE_DIR set pointing to them. This means that once a new
autoscaled-up node has run some jobs, it should have a hot ccache from
then on (instead of having to rely on the docker container cache
having our ccache laying around and not getting wiped out by some
other fd.o job). Local SSDs would provide higher performance, but
unfortunately are not supported with the cluster autoscaler.
For now, the softpipe/llvmpipe test runs are still on the shared
runners, until I can get them ported onto Bas's runner so they can be
parallelized in a single job.
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
That's where `xmlpool_options_h` is defined, and this way we can make sure
nobody starts making use of it in the future :)
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
When switching this to dynamic state, I forgot that this also needs to
be emitted when we use a polygon-mode set to lines.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 6d30abb4f1 ("zink: use dynamic state for line-width")
Build failure reported by i965 CI, triggered by building dynamic
pipeloaders with kmsro drivers (besides 'frost). At this point, there's
no reason to actually do that -- mesa CI didn't mind -- but let's not
break the build.
v2: Simplify script. Add extra dependencies for v3d.
Fixes: afb0d08cb0 ("pipe-loader: Default to kmsro if probe fails")
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reported-by: Clayton Craft <clayton.a.craft@intel.com>
Tested-by: Clayton Craft <clayton.a.craft@intel.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Now that we can conveniently map between GEM handles and struct anv_bo
pointers, we can use a simple bitset for residency tracking instead of
the complex hash set. This shaves about 3% off of a CPU-limited example
running with the Dawn WebGPU implementation.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Otherwise relocations just up and crash.
Fixes: a3153162a9 "anv: Delay allocation of relocation lists"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We're about to start needing to lookup BO pointers by GEM handle so we
need access to the device.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
BOs are now only ever allocated through the BO cache so there's no need
to have these exposed.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
While we're here, we get rid of the locking and use a lock-free
algorithm. The chances of spilling contention are low and this is
actually a bit simpler in some ways.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
the ASYNC flag, in particular, has the potential to help performance
because it means less sync tracking in the kernel.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This commit switches block pools over to being allocated from the BO
cache rather than being allocated manually by the block pool.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We're about to start depending on the BO cache in the state and block
pools so we need them properly initialized for the tests to work.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
All block pools are allocated with the same flags. There's no good
reason why it needs to be configurable.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This makes a number of changes to the current API:
1. Everything is renamed to anv_device_* instead of anv_bo_cache_*
because the BO cache is soon going to be the sole BO allocation path
and not some special case to make import/export work.
2. Drop the cache parameter. It's totally redundant with the device
and just annoying to keep typing.
3. Rework flags so that they go the convenient direction for usage in
ANV rather than whichever awkward way the i915 specified it to
maintain backwards compatibility. This also gives us the
opportunity to set some defaults.
4. Add flags for mapping and coherency.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The growing algorithms for the softpin case and the userptr version are
almost entirely different. Having this weird join doesn't make the code
more comprehensible. This rework does a few things:
1. Move the comment about 48-bit addresses to anv_device_init where we
actually unset the EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag.
2. Separate the paths in anv_block_pool_expand_range so it's easier to
see what happens in the two different cases.
3. Use the anv_block_poo::bos array for storing all allocated BOs in
both paths rather than using the cleanup list in both paths. This
lets us make the cleanups array only used for mmaps of the memfd for
the userptr case.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instead of depending on a mutable BO in the state pool for handling
growing state pools, add a concept of "wrapper" BOs which just wrap an
actual BO. This way, the wrapper can exist once for all of time and we
can put it in relocation lists even if the actual BO it references gets
swapped out.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We're not THAT strapped for space that we can't burn one extra bit for
a boolean. If we're really worried about it, we can always shrink the
flags field to 16 bits because the kernel only uses 7 currently.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
It has exactly one caller and we're about to change some of the dynamics
which would make this confusing as a separate function.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We have to go through and rewrite them all anyway so it doesn't do us
any good to put them in the list in anv_reloc_list_add. Also, for state
pools the handles are likely wrong by the time vkQueueSubmit is called.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Previously, we would read the offset from the BO in anv_reloc_list_add
to generate the presumed offset and then again in the caller to compute
the 64-bit address to write into the buffer. However, if the offset
somehow changed between these two points, the presumed offset would no
longer match the written offset. This is unlikely to actually ever be a
problem in practice because the presumed offset gets recorded first and
so if the written address is wrong then the presumed offset is almost
certainly wrong and the relocation will trigger. However, it's much
safer to simply have anv_reloc_list_add return the 64-bit address.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This lets us do less allocation because the anv_bo's are now embedded in
the sparse array and it also allows lock-free translation from GEM
handle to BO which will be useful in future commits.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The runner that submits jobs there is down and will turn some time to
get fixed. Disable them for now to keep the CI green.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Got some int->pointer warnings and 20 is not a valid pointer ....
Fixes: 2e3a635ee6 "radv: Add an early exit in the secure compile if we already have the cache entries."
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Until 8bef4df196 the IR (TGSI or NIR) was used in disk_cache driver_flags.
This commit restores this features to avoid crashing when switching from
one IR to the other.
As radeonsi's default is TGSI, I used "driver_flags & 0x8000000 = 0" for TGSI
to keep the same driver_flags.
Fixes: 8bef4df196 ("radeonsi: add si_debug_options for convenient adding/removing of options")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Currently the Android build system doesn't expose the panfrost
driver.
This patch enables the panfrost driver to be build on for the
Android platform.
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-By: Rohan Garg <rohan.garg@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
nir_lower_point_size.c was not build into the libmesa_nir library for non-meson
builds. However it was included in the meson build.
This patch fixes that.
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Until now this made sense because we always paired vertex shaders
with fragment shaders, but as soon as we implement geometry and
tessellation shaders that will no longer be the case, so rename
this to (num_)used_outputs.
v2: Use 'used_outputs' instead of ns_outputs, which is more explicit (Eric).
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes the following building error:
external/mesa/src/amd/compiler/aco_spill.cpp:1768:
error: undefined reference to 'aco::lower_to_cssa(aco::Program*, aco::live&, radv_nir_compiler_options const*)'
Fixes: 0b8216b ("aco: Lower to CSSA")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
When starting a BLORP operation, we do the BTI-change flush. However,
when ending it and transitioning back to regular drawing, we change the
render target again - without a set_framebuffer_state() call. We need
to do the BTI flush there too. BLORP flags IRIS_DIRTY_RENDER_BUFFER
now, which will cause the next draw to get the BTI flush again.
(explanation of fix by Ken)
Fixes: 2b956a093a ("iris: totally untested icelake support")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This make shader-db's report.py work on Haswell and earlier platforms.
The problem is that the script would detect the "sends" output for
scalar shaders and expect in in vec4 shaders too. When it didn't find
it, the script would fail with:
Traceback (most recent call last):
File "./report.py", line 351, in <module>
main()
File "./report.py", line 182, in main
before_count = before[p][m]
KeyError: 'sends'
Fixes: f192741ddd ("intel/compiler: Report the number of non-spill/fill SEND messages")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
libdrm returns -errno instead of directly the ioctl ret of -1.
Fixes: 1c3cda7d27 "radv: Add syncobj signal/reset/wait to winsys."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Observed an issue when looking at the code generatedy by the
image-vertex-attrib-input-output piglit test. Even though the test
itself worked fine (due to TIC 0 being used for the image), this needs
to be fixed.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Unfortuantely we don't know if a particular load is a real 2d image (as
would be a cube face or 2d array element), or a layer of a 3d image.
Since we pass in the TIC reference, the instruction's type has to match
what's in the TIC (experimentally). In order to properly support
bindless images, this also can't be done by looking at the current
bindings and generating appropriate code.
As a result all plain 2d loads are converted into a pair of 2d/3d loads,
with appropriate predicates to ensure only one of those actually
executes, and the values are all merged in.
This goes somewhat against the current flow, so for GM107 we do the OOB
handling directly in the surface processing logic. Perhaps the other
gens should do something similar, but that is left to another change.
This fixes dEQP tests like image_load_store.3d.*_single_layer and GL-CTS
tests like shader_image_load_store.non-layered_binding without breaking
anything else.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "20.0" <mesa-stable@lists.freedesktop.org>
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.