Since spills and fills use the TMU, special care has to be taken to
avoid putting one between a TMU setup instruction and the corresponding
reads or writes. This change adds logic to move fills up and move spills
down to avoid interrupting such sequences.
This allows compiling 6 more programs from shader-db. Other stats:
total spills in shared programs: 446 -> 446 (0.00%)
spills in affected programs: 0 -> 0
helped: 0
HURT: 0
total fills in shared programs: 606 -> 610 (0.66%)
fills in affected programs: 38 -> 42 (10.53%)
helped: 0
HURT: 2
total instructions in shared programs: 19330 -> 19363 (0.17%)
instructions in affected programs: 3299 -> 3332 (1.00%)
helped: 0
HURT: 5
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6606>
Currently dpb_size for VP9 profile0 and profile2 is same eventhough
for profile2 dpb_size is multiplied by extra 3/2 and we are
seeing VM_L2_PROTECTION_FAULT error and ring vcn_dec timeout because
of less dpb_size for VP9_2.
This patch will correct dpb_size for VP9_2 and fixes the issue.
Signed-off-by: SureshGuttula <suresh.guttula@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7480>
Intel hardware supports 8-bit arithmetic but it's tricky and annoying:
- Byte operations don't actually execute with a byte type. The
execution type for byte operations is actually word. (I don't know
if this has implications for the HW implementation. Probably?)
- Destinations are required to be strided out to at least the
execution type size. This means that B-type operations always have
a stride of at least 2. This means wreaks havoc on the back-end in
multiple ways.
- Thanks to the strided destination, we don't actually save register
space by storing things in bytes. We could, in theory, interleave
two byte values into a single 2B-strided register but that's both a
pain for RA and would lead to piles of false dependencies pre-Gen12
and on Gen12+, we'd need some significant improvements to the SWSB
pass.
- Also thanks to the strided destination, all byte writes are treated
as partial writes by the back-end and we don't know how to copy-prop
them.
- On Gen11, they added a new hardware restriction that byte types
aren't allowed in the 2nd and 3rd sources of instructions. This
means that we have to emit B->W conversions all over to resolve
things. If we emit said conversions in NIR, instead, there's a
chance NIR can get rid of some of them for us.
We can get rid of a lot of this pain by just asking NIR to get rid of
8-bit arithmetic for us. It may lead to a few more conversions in some
cases but having back-end copy-prop actually work is probably a bigger
bonus. There is still a bit we have to handle in the back-end. In
particular, basic MOVs and conversions because 8-bit load/store ops
still require 8-bit types.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7482>
We can't really support these directly on any platform. May as well let
NIR lower them. The NIR lowering is potentially one more instruction
for scan/reduce ops thanks to not being able to do the B->W conversion
as part of SEL_EXEC. For imax/imin exclusive scan, it's yet another
instruction thanks to the extra imax/imin NIR has to insert to deal with
the fact that the first live channel will contain the identity value
which, when signed, will cast wrong. However, it does let us drop some
complexity from our back-end so it's probably worth it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7482>
We forgot to initialize the sample_count member here, leading to it
being undefined. This causes problems on MSVC when compiling in
debug-mode, where we get a run-time error for using an undefined
variable.
To avoid similar problems in the future if more fields are added,
let's initialize the whole struct to zero to start with. This also
allows us to remove a no-longer-needed zero-initialization.
Fixes: cf170616da ("gallium: Add a util_blitter path for using a custom VS and FS.")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7503>
Issue: Encoding parameters not updated after changing FrameRate
Root Cause:
In radeon_enc_begin_frame, there is a parameter need_rate_control
which was enabled only if the bitrate is changed. Due to this the
radeon_enc_rc_layer_init was not updating the encoder parameters with new
framerate, peak_bits_per_picture_integer and avg_target_bits_per_picture
Fix:
Added the condition where we will check if there is a change in
other parameters and enable rate control. Eventually updating the
encoder parameters with new framerate and bitrate.
Signed-off-by: Krunal Patel <krunalkumarmukeshkumar.patel@amd.corp-partner.google.com>
Reviewed-by: Boyuan Zhang boyuan.zhang@amd.com
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7363>
When calling GetPhysicalDeviceProperties2 we were ignoring and logging
the structures for extensions not supported. But for the case of
VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PCI_BUS_INFO_PROPERTIES_EXT we
already know that we are not going to support it, so let's just do
nothing (not even logging) when passed.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7497>
vkGetPhysicalDeviceFeatures2 and vkGetPhysicalDeviceProperties2 are not present on some MoltenVK versions.
VK_KHR_get_physical_device_properties2 exposes the KHR versions of the same functions.
These cannot be used via static linking, so we have to dynamically detect the loader version and then the extension
to work out which pointers to use.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7447>
resource cannot be NULL at this point since it has already been
dereferenced earlier.
Fix defect reported by Coverity Scan.
Dereference before null check (REVERSE_INULL)
check_after_deref: Null-checking resource suggests that it may be
null, but it has already been dereferenced on all paths leading to
the check.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Axel Davy <davyaxel0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7343>
I found the C++ runner hard to develop on, and we had stability issues and
outstanding feature needs that made me want something I felt good about
hacking on. Thus, Rewrite It In Rust of the deqp runner.
The new runner includes:
- Skip lists don't reshuffle the test list.
- Known-flake handling without resorting to skip lists (fixing our main CI
reliability issue on a3xx right now).
- Per-thread Vulkan shader caches should speed up VK CI runtime.
- Tracking of crashes separate from fails (so we can see progress on that
front).
- Logging of deqp stderr spam (particularly assertion failures!) in the CI
log.
- Integrated QPA filtering so we don't have bash perf issues for it.
- Logging of what caselist to go look at for a given error report (in red,
so it's easier to find in your CI log).
- The code is 1/3 unit tests, and easy to extend for more coverage.
- Non-LAVA CI runs create a failures.csv in artifacts that you can check
in as your deqp-*-fails.txt file.
- Test runtime is included in results.csv so you can debug how to speed up
your CI job.
- Pretty summary at the end of the run of slow/flaky/failed tests.
Since this is a new runner with a different RNG, the test groups are
shuffled one more time. This seems to result in some panfrost T720
stability issues (See its new deqp-panfrost-t720-flakes.txt), and one new
flake in freedreno a630.
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7434>
Fixes most subcases of dEQP-VK.image.subresource_layout.3d.* The remaining
failures appear to be in snorm, which 2D also fails on (and the blob
reports as not supported for this test).
We don't currently have these tests in CI, but they'll appear with
1.2.4.0.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7467>
MoltenVK, at least upto 1.2.141, does not render triangle fans. This is reflected in the portability EXTX extension.
This code get the extensions properties and features and then sets the have_triangle_fans.
This extension is not avaiable on all systems, so an amout of the code has to be protected by the define VK_EXTX_PORTABILITY_SUBSET_EXTENSION_NAME.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7457>
Extends the flexability of the device info script.
Allows #if defined()/#endif guards around particular data, for platform or version specific structures.
Allows for feature structures to be retreved without having to have a have_ variable as well. Helps if there is more than one feature in the structure.
These changes help towards allowing the use of the portability set extensions.
Reviewed-by: Hoe Hao Cheng <haochengho12907@gmail.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7457>
There are resources that may have more planes than chained resources. The
frontend has no way of figuring out which (if any) chained resource is the
right one to call resource_get_handle with and until a (now reverted)
change to the dri frontend it just always called with the first resource.
The convention of calling with the first resource of a chain allows the
pipe driver, which has the necessary information of how resources and
planes map to each other for a specific format/modifier combination, to do
the necessary walking. Document this as the official calling convention
of this function.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7419>
The ZSA state is not fully self contained, as other states (mostly
shader using discard or writing depth information) have an influence
on whether we can use early Z test/write.
Rework the ZSA state into a derived state that gets updated whenever
a new ZSA or SHADER state is bound. This way we can automatically
enable/disable early Z as needed.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7396>
The information about a shader using discard/kill is interesting
to other parts of the driver, as depth states need to programmed
differently depending on this. As we don't want to deal with
NIR/TGSI differences in other parts of the driver, track this
usage in the common etna_shader_variant.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7396>
Multiple threads may access st_update_* function at same time. Issues
happen when the threads modify lists managed by shader compiler.
Issues were found with script that runs multithread tests 1000 times in
a row with MESA_GLSL_CACHE_DISABLE=1 set. Problems start when 2
simultaneous st_create_[vp|fp]_variant calls start to compile a new
shader variant for the same program and various nir passes use and
modify same exec_lists.
Example failure:
deqp-egl: ../src/compiler/glsl/list.h:575: exec_list_validate: Assertion `node->next->prev == node' failed.
v2: instead of introducing new mutex, lock shared state
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7418>
Identical to GL_MESA_pack_invert in effect, just need to check for a
different enum value for GLES vs GL. The spec claims that "OpenGL 1.5 or
OpenGL ES 1.0 are required", but ReadPixels isn't a thing for ES1 so we
only enable it for ES2+.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3156>
This newly enables the extension for r100 and vieux. As far as I can
tell, that's safe to do. vieux's ReadPixels is just _mesa_readpixels,
which clearly already handles it correctly because the extension is
enabled for classic swrast. r100 has some custom acceleration paths
before falling down to _mesa_readpixels, which winds its way through to
r100_blit, which already has code to handle pack->Invert being set.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3156>
subpass_idx is of type uint32_t.
Fix defects reported by Coverity Scan.
Macro compares unsigned to 0 (NO_EFFECT)
unsigned_compare: This greater-than-or-equal-to-zero comparison of
an unsigned value is always true. subpass_idx >= 0U.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7325>
We need to consider shader calls as potential writes to their payloads.
For other ray-tracing intrinsics, we may not have a shader payload
pointer and have to treat them more like a barrier. We also need to
ensure that global and SSBO reads/writes aren't propagated across shader
call intrinsics.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6479>
The SPV_KHR_ray_tracing extension adds 6 new storage classes which is a
bit on the ridiculous side. In order to avoid adding that many variable
modes to NIR, we make a few simplifying assumptions:
1. CallableData and RayPayload data actually lives on the stack
somewhere, presumably in the caller's stack. We assume that these
are no different from global variables and use nir_var_shader_temp
for them. We still need a separate storage class for the incoming
variants but only so we can figure out which one the incoming one
is and lower it to something useful.
2. There's no difference between incoming CallableData and RayPaolad
data. We can use a single storage class for both.
3. ShaderRecordBuffer data is just a global memory access. This lets
us avoid NIR variables entirely and just fetch the pointer via the
shader_record_ptr system value and it's accessed using a 64-bit
global memory pointer.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6479>
If we were desperate to reduce bits, we could probably also use
shader_in/out for hit attributes as they really are an output from
intersection shaders and read-only in any-hit and closest-hit shaders.
However, other passes such as nir_gether_info like to assume that
anything with nir_var_shader_in/out is indexed using vec4 locations for
interface matching. It's easier to just add a new variable mode.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6479>
These are a bit more tricky than most because they're matrix system
values. We make the intentional choice here to not bother with allowing
indirect addressing of columns for these. Since they're system values,
they may be magically constructed somehow or come from weird hardware so
it's easier on back-ends to just handle any indirects with bcsel.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6479>
Missing in this commit are NIR intrinsics for the ObjectToWorld and
WorldToObject built-ins. Those are matrices and so they take a bit more
work and justify a separate commit. For now, we add the enums and leave
the SYSTEM_VALUE <-> nir_intrinsic conversion commented out.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6479>
We are limited to 64 threads per dispatched group, regardless of what
num_cs_threads claims, so advertise that limit correctly.
Fixes (on TGL and up):
dEQP-VK.subgroups.size_control.compute.required_subgroup_size_min
and other *.required_subgroup_size_min tests.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7453>
Add missing GetPhysicalDeviceImageFormatProperties2 logic for the extension
and enable it.
Also stop exposing optimal tiling for formats which are linear only, to
simplify dealing with those.
Passes dEQP-VK.drm_format_modifiers.*
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6940>
As it needs firmware to probe, and we cannot bundle it within the kernel
image because it is incompatible with the GPL.
Currently we rebind the driver after boot but that's slow and fragile,
as unloads of DRM drivers aren't generally tested.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7420>
So far the support for R/B swapping in vertex attributes were for the
generic attributes.
But there are cases like glSecondaryColorPointer() supporting BGRA
formats that require the R/B swapping to be also allowed in the
non-generic vertex attributes (in this case, in the COLOR1 attribute).
v2:
- Don't split line (Iago)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7196>
There's already code handling that case and help text also says
it's possible.
Found, because Coverity complained about optarg NULL check,
suggesting optarg can be NULL for other options, where it's not
possible. IOW, false positive lead me to finding an unrelated issue.
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7449>
32bit architectures which have 64bit time_t does not fit the assumption
of time_t being same as system long int
Fixes
error: format specifies type 'long' but the argument has type 'time_t' (aka 'long long') [-Werror,-Wformat]
time.tv_sec);
^~~~~~~~~~~
Signed-off-by: Khem Raj <raj.khem@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2966>
Resources with alpha formats that are mapped to R are fast-cleared with
the wrong clear color.
When such resources with are cleared via iris_clear_texture,
isl_color_value_unpack places channel data in the R channel.
convert_fast_clear_color then overwrites the channel with 0.
To avoid zeroing the clear color, move convert_fast_clear_color to the
other callers of clear_color: iris_clear and iris_clear_render_target.
Enables iris to pass the "A" case of the fcc-clear-tex piglit test.
v2. Rename convert_fast_clear_color to convert_clear_color. (Ken)
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3670
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7345>
Resources with luminance alpha formats that are mapped to RG are
fast-cleared with the wrong clear color.
When such resources with are cleared via iris_clear_texture,
isl_color_value_unpack places channel data in the R and G channels.
convert_fast_clear_color then overwrites the G channel with R.
Delete the clear color override that's specific to luminance alpha
formats.
Enables iris to pass the "LA" case of the fcc-clear-tex piglit test.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7345>
This case hadn't been ported to NIR before, and I missed that when
removing the TGSI path and replacing it with NIR -> NTT for TGSI drivers.
This caused breakage in nv50 on piglit's pbo-teximage.
In the process, the !use_gs gets its layer output fixed to be an int
instead of a vec4, which I suspect would fix validation in that path.
Fixes: 57effa342b ("st/mesa: Drop the TGSI paths for PBOs and use nir-to-tgsi if needed.")
Closes: #3680
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7320>
There are too many algebraic optimizations to be certain that one of them
couldn't create instructions which need lowering. It also creates better
code for some reason.
fossil-db (parallel-rdp, Navi):
Totals from 217 (31.77% of 683) affected shaders:
VGPRs: 7716 -> 7672 (-0.57%)
CodeSize: 1516152 -> 1510688 (-0.36%); split: -0.38%, +0.02%
MaxWaves: 3964 -> 3982 (+0.45%)
Instrs: 269445 -> 268508 (-0.35%); split: -0.36%, +0.02%
Cycles: 37963416 -> 37912592 (-0.13%); split: -0.15%, +0.01%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4791>
The shuffles provided by the SPV_INTEL_subgroups extension generate
bcsel(b, shuffle(x, i), shuffle(y, j))
In the case where x and y are the same, we can turn this into a shuffle
with the bcsel on the index which lets us drop a whole shuffle.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7366>
Most of this is fairly straightforward; we just set all the modes on any
derefs which are generic. The one tricky bit is OpGenericCastToPtrExplicit.
Instead of adding NIR intrinsics to do the cast, we add NIR intrinsics
to do a storage class check and then bcsel based on that.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6332>
All three passes check the variables for complex uses and don't split
them if they have any complex uses. Most of these checks are just early
returns to avoid chasing the deref to the variable and a hash table
lookup if we can quickly determine it has the wrong mode. In a couple
of cases, we need to re-arrange or add other checks to ensure that it's
safe for generic pointers.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6332>
We use the same nir_deref_mode_is_in_set helper that we use in
nir_lower_vars_to_explicit_types for the same reason. If there are any
generic pointers in play, we have to lower all generic pointer modes at
the same time or else we risk types getting out-of-sync.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6332>
We can only lower a deref to SSA in this pass if it's guaranteed to be
nir_var_function_temp. We already flag any variables with complex uses
(i.e. casts) as not being lowerable and refuse to lower any derefs to
them so we don't have to worry about false negatives.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6332>
For non-explicit nir_lower_io, we use nir_deref_mode_is because there's
no way it works for generic pointers. For nir_lower_vars_to_explicit_types,
and nir_lower_explicit_io, we use nir_deref_mode_is_in_set to ensure we
never get type confusion. For generic pointers, this means that they
must be called with the full set of generic pointer modes.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6332>
NIR derefs currently have exactly one variable mode. This is about to
change so we can handle OpenCL generic pointers. In order to transition
safely, we need to audit every deref->mode check. This commit adds a
set of helpers that provide more nuanced mode checks and converts most
of NIR to use them.
For simple cases, we add nir_deref_mode_is and nir_deref_mode_is_one_of
helpers. These can be used in passes which don't have to bother with
generic pointers and just want to know what mode a thing is. If the
pass ever encounters generic pointers in a way that this check would be
unsafe, it will assert-fail to alert developers that they need to think
harder about things and fix the pass.
For more complex passes which require a more nuanced understanding of
modes, we add nir_deref_mode_may_be and nir_deref_mode_must_be helpers
which accurately describe the compiler's best knowledge about the given
deref. Unfortunately, we may not be able to exactly identify the mode
in a generic pointers scenario so we have to be very careful when we use
these. Conversion of these passes is left to later commits.
For the case of mass lowering of a particular mode (nir_lower_explicit_io
is one good example), we add nir_deref_mode_is_in_set. This is also
pretty assert-happy like nir_deref_mode_is but is for a set containment
comparison on deref modes where you expect the deref to either be all-in
or all-out.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6332>
We already have the variable so we know the mode exactly. Just use that
instead of the deref mode. If these paths ever have to handle variable
pointers (not likely since they're OpenGL-specific), we can fix them to
handle crazy deref modes then.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6332>
In split_var_list_structs where we initalize the splitting, we already
use get_complex_used_vars to avoid splitting any variables that have a
complex use. However, we weren't actually handling the complex uses
properly in the case where we can't actually find the variable.
Fixes: f1cb3348f1 "nir/split_vars: Properly bail in the presence of ..."
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6332>
I can't think of any reason why shared and output aren't in this list.
The real thing we're trying to do is avoid premature scalarization
because of a shader or function temporary variable because we might
lower it to something we don't want scalarized later. Also fix the
version we copy+pasted into GCM.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6332>
for render pass information pointers into the command buffer are stored,
but command buffers are immutable content so make sure to use const ptrs
to avoid problems like was seen with clears.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7416>
When the cmd buffer is recorded, we record the attachments state
into it, however we use the pending clear aspects to keep track
of clears, but it should be kept in the state no overwrritten
in the cmd buffer.
Allocate some memory to store this hanging off the state.
This fixes gears and radialblur demos.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7416>
evergreen_emit_atomic_buffer_setup_count is only called if chip >= EVERGREEN
otherwise atomic_used_mask is left uninitialized when unconditionally used by
r600_need_cs_space so it might want more space than needed
fix this by always initializing atomic_used_mask
Fixes: 32529e6084 ("r600/eg: rework atomic counter emission with flushes")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7417>
Detects the MoltenVK layer and extension.
If present, get the ext function pointers and use to enable full swizzeling suport.
Fixes issues with Swizzling behaviour fro MoltenVk is disabled by default and needed to be enable via this API.
This also supplied the ground work to allow IOSurfaces to be used later for surface passing.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7383>
This will create code that is easier to combine into MADs/FMA when the
last component of the vector is 1.0.
nir_opt_algebraic_late has an optimization to do something similar but it
only works for inexact code, if the multiplication-by-1 optimization is
done before it and if the backend enables fuse_ffma.
fossil-db (Navi):
Totals from 4296 (3.75% of 114665) affected shaders:
SGPRs: 283468 -> 283764 (+0.10%); split: -0.02%, +0.12%
VGPRs: 172868 -> 172904 (+0.02%); split: -0.09%, +0.11%
CodeSize: 14045312 -> 14027128 (-0.13%); split: -0.15%, +0.02%
MaxWaves: 59285 -> 59282 (-0.01%); split: +0.04%, -0.05%
Instrs: 2703507 -> 2683187 (-0.75%); split: -0.76%, +0.00%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5631>
This will create code that is easier to combine into MADs/FMA when the
last component is 1.0.
nir_opt_algebraic_late has an optimization to do something similar but it
only works for inexact code, if the multiplication-by-1 optimization is
done before it and if the backend enables fuse_ffma.
fossil-db (Navi):
Totals from 85583 (74.64% of 114665) affected shaders:
SGPRs: 4556060 -> 4558596 (+0.06%); split: -0.07%, +0.12%
VGPRs: 3315060 -> 3312984 (-0.06%); split: -0.23%, +0.17%
SpillSGPRs: 13552 -> 13553 (+0.01%)
CodeSize: 184962756 -> 184431388 (-0.29%); split: -0.32%, +0.03%
MaxWaves: 1208693 -> 1209361 (+0.06%); split: +0.17%, -0.11%
Instrs: 35678819 -> 35361617 (-0.89%); split: -0.91%, +0.02%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5631>
Like with other getters, invalid enum is dealt in find_value by setting
error to GL_INVALID_ENUM and returning INVALID_TYPE which makes
get_value_size return 0.
Fixes false 'implementation errors' seen with Piglit test:
ext_external_objects-memory-object-api-errors
"Mesa 20.3.0-devel implementation error: invalid value type in GetUnsignedBytei_vEXT()
Please report at https://gitlab.freedesktop.org/mesa/mesa/-/issues"
v2: add assert to get_value_size() (Lionel)
Fixes: e064d66020 ("mesa: implement glGetUnsignedByte{v|i_v}")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eleni Maria Stea <estea@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7403>
Something may go wrong during import which leaves pointer to null and
when ctx and it's shared state gets destroyed we will attempt to call
memobj_destroy. Instead of forcing every driver to handle it, add check
here.
Fixes crashes with Piglit test:
ext_external_objects_fd-memory-object-api-errors
Fixes: 99cf910834 ("mesa/st: Actually free the driver part of memory objects on destruction.")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eleni Maria Stea <estea@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7403>
Addresses the following linker error when building for Android:
ld.lld: error: undefined symbol: freedreno_dev_info_init
>>> referenced by freedreno_screen.c:1001 (external/mesa3d/src/gallium/drivers/freedreno/freedreno_screen.c:1001)
>>> freedreno_screen.o:(fd_screen_create) in archive [..]/libmesa_pipe_freedreno_intermediates/libmesa_pipe_freedreno.a
These functions were introduced in a file that was not included in the
Android build yet. Also sort the list of files alphabetically as
requested in an earlier MR.
Fixes: 4a0bdf47e4 ("freedreno: Introduce common device info struct")
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7411>
It is not used right now, so keeping it adds some noise/confusion.
So far configuring Z test are done through the CFG_BITS. See
v3dX(emit_state) at v3dx_emit.c for v3d, and pack_cfg_bits at
v3dv_pipeline.c for v3dv. There flags like z_updates_enable and others
are filled up.
That key field seems like a leftover coming from using vc4 as
reference, as that driver defines and uses a field with name name.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7421>
As a result of this patch, compiler chooses SIMD32 shaders more
frequently.
Current logic is designed to avoid regressions from enabling SIMD32 at
all cost, even though the cases where regression can happen are probably
for smaller draw calls (far away from the camera and though smaller).
In Intel perf CI this patch improves FPS in:
- gfxbench5 alu2: 21.92% (gen9), 23.7% (gen11)
- synmark OglShMapVsm: 3.26% (gen9), 4.52% (gen11)
- gfxbench5 car chase: 1.34% (gen9), 1.32% (gen11)
No observed regressions there.
In my testing, it also improves FPS in:
- The Talos Principle: 2.9% (gen9)
The other 16 games I tested had very minor changes in performance
(2/3 positive, but not significant enough to list here).
Note: this patch harms synmark OglDrvState (which is not in Intel perf
CI) by ~2.9%, but this benchmark renders multiple scenes from other
workloads (including OglShMapVsm, which is helped in standalone mode)
in tiny rectangles. Rendering so small drastically changes branching
statistics, which favors smaller SIMD modes. I assume this matters
only in micro-benchmarks, as in real workloads more expensive (with
more uniform branching behavior) draw calls dominate.
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7137>
Since Zink doesn't use swapchains to create presentable images, drivers
lose the capacity to identify memory allocations for them, which is a problem
when the underlying platform has special requirements for these, such as
needing to allocate them on a particular device. Including this struct in the
pNext chain, which is the same thing that the Mesa Vulkan WSI code does when
allocating memory for swapchain images, gives drivers a chance to identify
and handle these memory allocations properly.
v2: follow Zink's conventions for pNext chains (Mike)
v3: add scanout parameter for VkImage creation (Daniel)
v4: don't add a dependency on vulkan util (Erik)
v5: include vulkan directory for Zink builds
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> (v2)
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7378>
Fix defect reported by Coverity Scan.
Evaluation order violation (EVALUATION_ORDER)
write_write_typo: In resource = resource =
ntt_ureg_src_indirect(c, ureg_src_register(TGSI_FILE_IMAGE, 0U),
instr->src[0]), resource is written twice with the same value.
Fixes: 34cc6a804e ("gallium: Add a nir-to-TGSI pass.")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7326>
this adds a query field to denote the last point at which a query was api started,
which is then used every time we call in to get_query_result as the starting point
this is important when we want to be able to return the same result set multiple
times
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7195>
in normal operation we want to be using INPUT_ASSEMBLY_PRIMITIVES_BIT,
but then when we break out the geometry shaders we actually want to
be using GEOMETRY_SHADER_PRIMITIVES_BIT, which means we need to track
whether a query has a gs active for draws
to do this, we keep a list of all these queries with this type and
iterate over it every draw to flag the gs state of the query that's
being drawn to. this works because our ring buffer of batches will
always wait on a fence after a full cycle, meaning there can only ever
be 4 queries with outstanding results
Fixes: e40a77ea5d ("zink: use right vulkan type for GL_PRIMITIVES_GENERATED queries")
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7195>
After we lowered `return` into `break` - the control flow is changed and
the block with this change has a new successor, which means that in this
new successor phis should have additional source.
Since the instructions that use phis in the successor are predicated -
it's ok for a new phi source to be undef.
If `return` is lowered in a nested loop, `break` is inserted in the outer
loops, so all new blocks with break require the same changes to phis
described above.
Examples of NIR before lowering:
block block_0:
loop {
block block_1:
if ssa_2 {
block block_2:
return
// succs: block_6
} else {
block block_2:
break;
// succs: block_5
}
block block_4:
}
block block_5:
// preds: block_3
vec1 32 ssa_4 = phi block_3: ssa_1
// succs: block_6
block block_6:
Here converting return to break should add block_2 to the phis
of block_5.
block block_0:
loop {
block block_1:
loop {
block block_2:
if ssa_2 {
block block_3:
return
// succs: block_8
} else {
block block_4:
break;
// succs: block_6
}
block block_5:
}
block block_6:
break;
// succs: block_7
}
block block_7:
// preds: block_6
vec1 32 ssa_4 = phi block_6: ssa_1
// succs: block_8
block block_8:
Here converting return to break will insert conditional break in
the outer loop, changing block_6 predcessors.
Cc: <mesa-stable@lists.freedesktop.org>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3322
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3498
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6186>
If a secondary command buffer has occlusion query inheritance then
draw calls recorded in it should update an active occlusion query
counter started in the primary command buffer.
If executing the secondary in a primary required to emit jobs and
not just a branch instruction, then we might need to create a new
job for the primary as well, and in that case we would lose the
occlusion query state, so we need to re-emit it at that point so
any additional draw calls recorded into the secondary that is being
executed continue to update the counter.
Fixes:
dEQP-VK.query_pool.concurrent_queries.secondary_command_buffer
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7373>
And refuse to import image with protected_content enabled.
We don't want a compositor to import an encrypted buffer in a image
without the ProtectedContent attribute enabled, because that will
lead to incorrect display.
Similarly, if the compositor thinks the image is encrypted, we fail
the import if the buffer is not.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5096>
For 3D images we should take the slice to copy from or to, from
the Z coordinate of the corresponding offset, not the base array
layer.
Fixes VK_KHR_maintenance1 tests:
dEQP-VK.api.copy_and_blit.core.image_to_image.3d_images.3d_to_2d_by_slices
dEQP-VK.api.copy_and_blit.core.image_to_image.3d_images.2d_to_3d_by_layers
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7364>
I don't know why these values were introduced for but it seems like
we can optimize this by just doing:
gl_SampleMaskIn[0] = (SampleCoverage & (1 << gl_SampleID))
AMDGPU-PRO and AMDVLK apply the same formula to compute the
sample mask when per-sample shading is enabled.
No fossils-db changes.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7377>
Less than 8-bit formats may pack multiple pixels in a byte along a row,
possibly padding along the edge. We already had one such format
(RGBA4_UNORM), here are the rest.
As far as I can tell, 64-bit formats are purely a theoretical
curiousity. I don't think any implementation actually supports them, do
not use. Might as well complete the list, though.
I'm not actually piping any new formats into Gallium with this commit,
that can come later if someone has a use case.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Suggested-by: Icecream95 <ixn@disroot.org>
Tested-by: Christian Hewitt <christianshewitt@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7352>
Midgard (v4, v5) and Bifrost v6 have swizzles on every pixel format
descriptor, allowing for arbitrary component reordering. With v7,
reordering is limited to a fixed set of common swizzles, which
simplifies the hardware but to some extent limits the formats available.
To handle, we split out the format tables, with the correct table for
the current hardware loaded as dev->formats.
v2: Switch sRGB flag from T/F to S/L per icecream's suggestion
v3: Add back Z16_UNORM formats to fix trace changes.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Christian Hewitt <christianshewitt@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7352>
Instead of matching on the PIPE format directly, match on the internal
format of the tile buffer and pick the pixel format that corresponds to
the internal tile buffer format (which differs from the format written
back to memory in the general case).
We add a number of missing formats to accomodate this, including the
AU/PU variants of each tilebuffer pixel format, where the AU formats use
the extra bits to store extra precision for dithering but the PU formats
simply pad the extra bits with zeroes. For the moment we use AU
everywhere. I'm not sure if there's a cost associated.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Christian Hewitt <christianshewitt@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7352>
Map PIPE formats that are fixed-function blendable to their (internal,
writeback) tuple. Formats which are renderable but require a blend
shadeer will be handled elsewhere to keep this easy to verify.
Notice the subset of SFBD and MFBD color writeback formats used to
identify fixed-function blendable formats are bit compatible, so it
suffices to store only the MFBD variants.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Christian Hewitt <christianshewitt@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7352>
The TX_CTRL register has a bit to enable TS compression, the setting in
TS_SAMPLER_CONFIG is ignored for descriptor based textures. Apparently
256B tile mode already implies enabled compression to the HW, as with
the larger tile mode with compression was working fine, only the 128B
tile mode needs this change to work correctly.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7367>
There is a race where the BO refcount might drop to 0 before the
dmabuf/name import paths had a chance to grab a reference for a
BO found in the handle_table. The easiest solution is to keep the
refcount stable as long as the table_lock is held.
While a more involved scheme of rechecking the refcount before
actually destroying the BO might also work, the bo_del path isn't
called very often, so micro-optimizing a single mutex_lock seems
to be over-engineered, so go for the easy solution.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7367>
(v2) Remove include from Android.common.mk
Avoid adding libsync shared dependency in Android.common.mk
Add libsync shared dependency where needed, for easier tracking
(v1) Fixes the following building errors:
In file included from external/mesa/src/gallium/drivers/freedreno/a3xx/fd3_query.c:27:
In file included from external/mesa/src/gallium/drivers/freedreno/freedreno_query_hw.h:33:
In file included from external/mesa/src/gallium/drivers/freedreno/freedreno_context.h:33:
external/mesa/src/util/libsync.h:48:10: fatal error: 'android/sync.h' file not found
^~~~~~~~~~~~~~~~
1 error generated.
In file included from external/mesa/src/mesa/drivers/dri/i965/brw_sync.c:41:
external/mesa/src/util/libsync.h:48:10: fatal error: 'android/sync.h' file not found
^~~~~~~~~~~~~~~~
1 error generated.
In file included from external/mesa/src/gallium/auxiliary/util/u_tests.c:513:
external/mesa/src/util/libsync.h:48:10: fatal error: 'android/sync.h' file not found
^~~~~~~~~~~~~~~~
1 error generated.
FAILED: out/target/product/x86_64/obj_x86/SHARED_LIBRARIES/i965_dri_intermediates/LINKED/i965_dri.so
...
external/mesa/src/mesa/drivers/dri/i965/brw_sync.c:223: error: undefined reference to 'sync_wait'
external/mesa/src/mesa/drivers/dri/i965/brw_sync.c:287: error: undefined reference to 'sync_wait'
FAILED: out/target/product/x86_64/obj_x86/SHARED_LIBRARIES/gallium_dri_intermediates/LINKED/gallium_dri.so
...
external/mesa/src/util/libsync.h:142: error: undefined reference to 'sync_merge'
external/mesa/src/gallium/drivers/freedreno/freedreno_fence.c:94: error: undefined reference to 'sync_wait'
external/mesa/src/gallium/auxiliary/util/u_tests.c:575: error: undefined reference to 'sync_wait'
Fixes: 27b8887946 ("android: Add pre-4.7 Android kernel compatibility to our libsync header.")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Acked-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Anholt <eric@anholt.net>
Acked-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7306>
R11G11B10_FLOAT and R9G9B9E5_FLOAT are three-component formats, so we
shouldn't use 1 for the alpha component.
We don't know about any test/app getting fixed with this change, but
it is the equivalent to v3dv commit
e07c546763. Vulkan CTS has some tests
that used that format and failed if not using XYZ1.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7365>
Fix the gbm_dri_bo_get_handle_for_plane use case by allowing plane > 0
in dri2_from_planar for images with multiple planes in separate chained
texture resources.
Not all multiplanar resources are chained, though. The iris aux buffer
is a separate plane in the same resource.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7028>
So far for the formats E5B9G9R9_UFLOAT_PACK32 and
B10G11R11_UFLOAT_PACK32 we were using a XYZW swizzle. But from Vulkan
spec those are three-component, without alpha, formats. So we should
use XYZ1 instead, as we were already doing for other three-component
formats.
Curiously the only case where this raised a problem were when using
clamp to border with transparent black. This change allows us to
remove the code that handled only that specific case.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7355>
If the workgroup_size variable is lower than the actual workgroup size,
that means it's possible that ACO won't emit some s_barrier instructions
when in fact it should. This can possibly cause a GPU hang.
This is just for the sake of general correctness, currently this
can't cause a real problem because the maximum vertex count is always
greater than (or equal to) the primitive count in GS, and already
takes into account the number of GS invocations.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7232>
The pack_* instructions are now lowered via nir_lower_alu_to_scalar()
and unpack_* are not lowered anymore.
These bitcasts are no-ops, and lowering prevents
some optimizations like vectorization.
Note: There are still some *_split variations remaining
from different other NIR passes.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6527>
This is another case of a feature that is implemented in the compiler
and that only required that we set the shader key properly from the
pipeline state, which we were already doing.
I verified we pass the tests in dEQP-VK.pipeline.multisample.alpha_to_one.*
(we only support 4x multisampling, so we can only pass a single test there),
however, the tests seem to have a bug by which they always pass, even if
the driver doesn't actually implement alpha to one correctly. I submitted
a fix to Khronos and verified that we also pass the fixed tests (and that
we failed them if we don't actually set te shader key correctly).
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7336>
Partial rollback as GFX9 really requires height = 1 to work.
The two substantial parts of the fix remaining:
1) Deal with views with multiple levels.
2) Limit the expansion to the base mip pitch/height. On GFX9 this
is exactly equal to the surf_pitch that was used before. I've
done some investigation to make sure that on GFX10 this always
results in the right physical layout.
Remaining stupid question is how the actual extents for bounds
checking never end up too low when the size gets clamped, but
this change and the previous change don't change that ...
Fixes: 1fb3e1fb70 "radv: Fix mipmap extent adjustment on GFX9+."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7245>
If the shader does:
loop {
if (divergent)
discard
else
a()
b()
}
then a()'s block will dominate b()'s block in the logical CFG, but not the
linear CFG. This will cause value numbering to try to combine SLAU from
a() and b().
This didn't happen with break/continue because sanitize_if() would move
a() out of the branch. Using sanitize_if() to fix this doesn't look easy,
because discards are not control flow instructions in NIR.
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7216>
Iterate over maps by reference to avoid copies.
Replace find/insert with insert to avoid double search.
Use range-based for loop, avoiding copies by reference. Delete comment.
Erase by iterator instead of key to avoid repeat search.
Iterators unneeded to modify unwaited_instrs. Use range-based for loop.
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7285>
Our blit shader path allocates a descriptor pool to create
combined image sampler descriptors for blit source images. So
far, we had sized this pool statically and the driver would
fail if we ever need to allocate more descriptors than that.
With this change, we switch to using a dynamic allocation
mechanism instead where we allocate as many pools as we need to
meet descriptor set allocation requirements for the command buffer.
Also, every time a new pool needs to be created, we double its
size (up to a limit), so we can start small and avoid wasting
memory for command buffers that only have a small number of blits,
while trying to keep allocation overhead low for command buffers
that record a lot of blits.
v2: use existing framework for automatic destruction of private
driver objects to free allocated pools.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7311>
For us this is mostly handled in the compiler by a NIR lowering so
for the Vulkan driver we only need to make sure that we program our
shader key correctly from the pipeline state, which we were already
doing.
It doesn't look like CTS has any coverage for this yet so it has only
been smoke tested, but it seems to be working correctly, as expected.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7313>
Fix defects reported by Coverity Scan.
Uninitialized scalar field (UNINIT_CTOR)
uninit_member: Non-static class member m_numPkrLog2 is not
initialized in this constructor nor in any functions that it
calls.
uninit_member: Non-static class member m_numSaLog2 is not
initialized in this constructor nor in any functions that it
calls.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7178>
There is an upper bound on # of bits we have to encode bin height on
various gens, which we could exceed with larger GMEM sizes and low
byte/pixel formats.
The max-width limits are initialized based on corresponding bitfield
sizes.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7222>
Split out into helper that can be re-used by gmemtool, to de-duplicate
the limits table. And convert to switch instead of if-else ladder.
A little bit of duplication, but that will no longer be the case with
additional limits added in next patch.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7222>
If the current thread asks for either the current context or the current
dispatch table for a thread that has not yet set any context current, we
currently risk returning the wrong data if there was only a single
thread that had called u_current_init() yet.
So let's first check if the only expected thread-id is the one getting
these, and return NULL and/or __glapi_noop_table instead if not.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7280>
When not using the USE_ELF_TLS code-path, this function is
thread-unsafe, because it returns u_current_table if set without
consulting the ThreadSafe variable in u_current.c.
There's a short period where this can cause problems, if a program uses
multiple threads, but only have made a single context current so far. If
the program issues OpenGL commands from the initialized thread while a
new thread is setting u_current_table to __glapi_noop_table, we will
return the wrong table here.
It doesn't seem right to have two versions of the code that does the
same anyway, so let's use the version that doesn't have this problem
instead.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7280>
This enables GL_EXT_semaphore feature.
v2:
* reversed previous commit that was conditionally setting the signal
fence capability if the syncobj was present
* reversed previous commit that was introducing a bool has_syncobj that
is not necessary anymore
v3:
* changed the signal function to use fence->seqno due to recent changes
to master
v4:
* changed the signal callback to use the new structs of the fences
backend (iris_fine_fence)
v5:
* removed check for ctx == NULL in iris_fence_signal and await functions
as at the time they are called we always have a context
* splitted a line to not exceed width
v6:
* put back the if(ctx) check in iris_fence_await, if this is an error
the fix should be in a different MR
Signed-off-by: Eleni Maria Stea <estea@igalia.com>
Reviewed-by: Rohan Garg <rohan.garg@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7042>
The code does not compile on Windows, so just disable for now. There is
already a pattern to do this for Android.
Stop including expat dependency if building Windows.
Disable WITH_XMLCONFIG if _WIN32 is defined.
Tuck _WIN32 incompatible includes inside WITH_XMLCONFIG.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7249>
On import we don't get a modifier so the modifier in the Gallium
handle should be set to DRM_FORMAT_MOD_INVALID instead of LINEAR.
Otherwise things like screen capture break if the driver actually
starts supporting modifiers.
Acked-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7302>
NIR sources are not depending on LLVM (anymore?) as can be seen in the
equivalent unconditional inclusion of nir/ source files in meson.build.
Symbols in these files are necessary to compile softpipe:
Linking build/linux-x86_64-debug/gallium/targets/libgl-xlib/libGL.so.1.5 ...
/usr/bin/ld: build/linux-x86_64-debug/gallium/drivers/softpipe/libsoftpipe.a(sp_state_shader.os): in function `softpipe_create_shader_state':
src/gallium/drivers/softpipe/sp_state_shader.c:146: undefined reference to `nir_to_tgsi'
/usr/bin/ld: build/linux-x86_64-debug/gallium/drivers/softpipe/libsoftpipe.a(sp_state_shader.os): in function `softpipe_create_compute_state':
src/gallium/drivers/softpipe/sp_state_shader.c:435: undefined reference to `nir_to_tgsi'
Fixes: fa483d8cd1 ("android: gallium/auxiliary: Deduplicate nir_to_tgsi.c inclusion")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3669
Tested-by: Vinson Lee <vlee@freedesktop.org>
Acked-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7250>
Letting the caller zero-initialize the program object is error prone,
not to mention that resources attached to the program might not be freed
by the caller. Let's simplify that by letting the compiler allocate the
panfrost_program object. Those objects should be freed with ralloc_free().
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7206>
Clip & cull distances, which are compact arrays, exposed a lot of holes
because they can take up multiple slots and partially overlap.
I wanted to eliminate our dependence on knowing the layout of the
variables, as this can get complicated with things like partially
overlapping arrays, which can happen with ARB_enhanced_layouts or with
clip/cull distance arrays. This means no longer changing the layout
based on whether the i/o is part of an array or not, and no longer
matching producer <-> consumer based on the variables. At the end of the
day we have to match things based on the user-specified location, so for
simplicity this switches the entire i/o handling to be based off the
user location rather than the driver location. This means that the
primitive map may be a little bigger, but it reduces the complexity
because we never have to build a table mapping user location to driver
location, and it reduces the amount of work done at link time in the SSO
case. It also brings us closer to what the other drivers do.
While here, I also fixed the handling of component qualifiers, which was
another thing broken with clip/cull distances.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6959>
I think the rationale for not setting the size for inputs is that
when passed between geometry stages the clip and cull distances are
supposed to be treated like any other varying. However, this isn't 100%
the case for the FS, since when it's read by the FS it's also used by
the fixed-function stage. In freedreno we setup varying locations when
compiling the FS, and then tack on VS-only outputs like gl_Position at
the end. Furthermore there's code to compact input locations based on
what's actually read. But this compaction can't happen for clip and cull
distances, because then we won't have space for components that are only
read by the clipper. So, we need to know the original number of
components for both arrays. Modify this pass so that we don't have to go
digging around for it ourselves.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6959>
This patch fixes mesa compiler crash in i965 on shaders like the following one:
```
in VS_OUTPUT {
mat4 data;
} vs_output;
out vec4 fs_output;
vec4 convert(in float val) {
return vec4(val);
}
void main()
{
fs_output = vec4(0.0);
for (int a = -1; a < 5; a++) {
for (int b = -1; b < 5; b++) {
fs_output += convert(vs_output.data[b][a]);
}
}
}
```
Section 5.11 (Out-of-Bounds Accesses) of the GLSL 4.60 spec says:
In the subsections described above for array, vector, matrix and
structure accesses, any out-of-bounds access produced undefined
behavior....
Out-of-bounds reads return undefined values, which
include values from other variables of the active program or zero.
Out-of-bounds writes may be discarded or overwrite
other variables of the active program.
GL_KHR_robustness and GL_ARB_robustness encourage us to return zero
for reads.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6560>
This creates a directory and save various logs (dmesg, umr,
pipeline, gpu info, etc) instead of printing stuff to stdout/stderr.
This dumps the following files when a GPU hang is detected:
- dmesg.log
- gpu_info.lo
- options.log
- pipeline.log (shaders including SPIR-V if spirv-dis found)
- registers.log
- trace.log
- vm_fault (if a VM fault is detected)
- umr_ring.log (if UMR found)
- umr_waves.log (if UMR found)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7233>
We don't natively support BGRA format, instead we handle these
as RGBA and we lower the loads to swap components R and B.
However, the driver emits VPM loads based on the size of the
input variables so when we have a vec2 or float BGRA input,
it would only emit VPM loads for components 0 and 1, which is
not correct since we emit a load of component 2 to swap with
component 0.
v2: handle GL legacy vertex inputs gracefully.
Fixes:
dEQP-VK.draw.output_location.array.b8g8r8a8-unorm-highp-output-vec2
dEQP-VK.draw.output_location.array.b8g8r8a8-unorm-mediump-output-vec2
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7271>
There are two bits for the vector size of a vertex input, with the
value 0 meaning 4 components. The CLE decoder seems to try and dump
"4" instead of "0" is to be a bit more user friendly, but it had an
off-by-one error that would cause it to dump "2" instead.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7271>
Fix defects reported by Coverity Scan.
Uninitialized scalar field (UNINIT_CTOR)
uninit_member: Non-static class member buffer_access_type is not
initialized in this constructor nor in any functions that it
calls.
uninit_member: Non-static class member progress is not initialized
in this constructor nor in any functions that it calls.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7243>
Handle compressed stencil buffer transition from one layout to another
on gen12+.
When stencil compression is enabled, we have to initialize buffer via
stencil clear (HZ_OP) before any renderpass.
v2:
- Pass predicate bit false to anv_image_ccs_op (Nanley Chery)
v3:
- update aspect assertion (Nanley Chery)
v4:
- Make state decision based on anv_layout_to_aux_state instated of
anv_layout_to_aux_usage (Sagar Ghuge)
v5:
- No need to handle stencil CCS resolve case (Jason Ekstrand)
- Initialize buffer using HZ_OP (Nanley Chery)
v6: (Nanley Chery)
- Pass correct layer/level count.
- Remove local variable.
v7:
- Skip stencil initialization with HZ_OP packet if followed by fast
clear. (Nanley Chery)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2942>
All these instructions replicate the result of a N-component dot-product
to a vec4. Naming them fdot_replicatedN gives the impression that are
some sort of abstract dot-product that replicates the result to a vecN.
They also deviate from fdph_replicated... which nobody would reasonably
consider naming fdot_replicatedh.
Naming these opcodes fdotN_replicated more closely matches what they
are, and it matches the pattern of fdph_replicated.
I believe that the only reason these opcodes were named this way was
because it simplified the implementation of the binop_reduce function in
nir_opcodes.py. I made some fairly simple changes to that function, and
I think the end result is ok.
The bulk of the changes come from the sed rename:
sed --in-place -e 's/fdot_replicated\([234]\)/fdot\1_replicated/g' \
$(grep -r 'fdot_replicated[234]' src/)
v2: Use a named parameter to binop_reduce instead of using
isinstance(name, str). Suggested by Jason.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5725>
With swap interval 0, i.e. sync-to-vblank disabled.
This can be necessary for unthrottled drawing with Xwayland:
1) One buffer can be scanned out
2) One buffer can be pending in the kernel for a page flip
3) One buffer can be pending in the Wayland compositor
Therefore, with 3 buffers, the frame-rate could be capped much lower
than the throughput the GPU is capable of, in the worst case at the
Wayland compositor refresh rate.
(The native Wayland EGL backend always uses up to 4 buffers)
Leave the maximum number of buffers at 3 for swap interval != 0, it's
sufficient in that case to always be able to queue one frame ahead of
time.
https://gitlab.gnome.org/GNOME/mutter/-/issues/1455https://gitlab.gnome.org/GNOME/mutter/-/issues/1462
Cc: mesa-stable
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7033>
We'd previously take the copy path. If we were actually flipping (in
which case skipped frames are more likely to occur), we'd ping-pong
between a smaller and larger number of back buffers, and frame-rate
could vary / take a dip due to the buffer management overhead.
While I'm not sure this is actually possible to hit at this point, it
definitely will be with the next change.
Cc: mesa-stable
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7033>
Previously, we would always allocate 3 buffers for page flipping. But 2
buffers can suffice for clients which always wait for buffer swaps to
complete before starting a new frame.
Therefore, keep track of the maximum number of buffers separately from
the current number, and only bump the latter if both current buffers are
busy.
Cc: mesa-stable
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7033>
The current blorp API only allows source layers for 3D images to be
integers. That is causing problems with the Vulkan API where we need
to be able to use a 3D layer that could be in between 2 layers.
This change allows a floating point value to be passed for blits and
internally sets up the input parameters to pass floating point values
to kernels.
v2: Use tex op to determinate what types are the coordinates (Jason)
Drop setting params->z (Lionel)
v3: Fix nir_texop_txf_ms_mcs op not considered as having integer coords (Lionel)
v4: Fix incorrect test on nir_texop_txf_ms_mcs (Ivan)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3458
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6909>
If we are blitting to tile boundaries we don't need to emit
tile loads. The exception to this is the case where we are
blitting only a subset of the pixel components in the image
(which we do for single aspect blits of D24S8), since in that
case we need to preserve the components we are not writing.
There is a corner case where some times we create framebuffers
that alias subregions of a larger image. In that case the edge
tiles are not padded and we can't skip the loads.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7247>
This can be useful for debugging or working around bugs such as
Mesa#106 where Wine is expecting to find a visual that isn't
available.
v2:
- split the indirect GL extension override to its own commit
- memset the bitfields to 0 in __glXExtensionsCtrScreen
Reviewed-by: Adam Jackson <ajax@redhat.com>
v3:
- slight rework necessary after splitting the computation of usable
extensions (Ian)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Martin Peres <martin.peres@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7212>
An upcoming commit will require to find an extension in a list. Rather
than duplicating part of the code from set_glx_extension, split it into
find_extension and set_glx_extension.
NOTE: set_glx_extension is a bit of a misnomer since it is also used
for gl extensions. This is why the find function is named
find_extension rather than find_glx_extension.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Martin Peres <martin.peres@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7212>
Fix defects reported by Coverity Scan.
Uninitialized scalar field (UNINIT_CTOR)
uninit_member: Non-static class member found_unsupported_op is not
initialized in this constructor nor in any functions that it calls.
uninit_member: Non-static class member found_expensive_op is not
initialized in this constructor nor in any functions that it calls.
uninit_member: Non-static class member found_dynamic_arrayref is not
initialized in this constructor nor in any functions that it calls.
uninit_member: Non-static class member is_then is not initialized in
this constructor nor in any functions that it calls.
uninit_member: Non-static class member then_cost is not initialized in
this constructor nor in any functions that it calls.
uninit_member: Non-static class member else_cost is not initialized in
this constructor nor in any functions that it calls.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7228>
I actually had never found these, buried under Developer Topics -> Gallium
-> Drivers. Given that driver documentation contains not just gallium
driver documentation but also end-user information, bring it to a much
more prominent location between User Topics and Developer Topics at the
top level.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7174>
The lower_alpha_func is initialized as COMPARE_FUNC_NEVER, and the
nir_lower_alpha_test() only executed if the alpha_func has changed.
This means when user runs an alpha test with GL_NEVER function the
lowering is never executed, and no fragment is discarded.
Rather, we would like to initialize the function as COMPARE_FUNC_ALWAYS,
and run the lowering when the alpha_func is different. If user runs the
alpha test with GL_ALWAYS, indeed the lowering is never executed; but
using GL_ALWAYS makes no fragment is discarded, so it is like not
executing the testing.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7251>
These can catch various common issues in MR settings and Git commit
logs.
The "check mr" job only exists in pre-merge pipelines for MRs, and runs
automatically.
The "check commits" job only exists in pre-merge pipelines for MRs and
in pipelines for forked branches. It runs automatically in the former
case and can be manually triggered in the latter.
v2:
* Use git_archive docker image (Daniel Stone)
* Use a single sanity stage for both jobs (Tomeu Vizoso)
Reviewed-by: Peter Hutterer <peter.hutterer@who-t.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6209>
This is a minor bugfix, in that the prior code required that the server
expose either XFree86-DRI or DRI2 to get the name; Xwayland exposed
neither, just DRI3. Now, for DRI2 and DRI3, we just ask the loader. It
also means we report "swrast" for the driver name when that's what we're
using, which is probably a good thing.
v2: Trust the driver name from the server for DRI2 (Michel Dänzer)
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7219>
If a pass returning boolean progress reports no change, we shouldn't need
to re-validate. If a pass breaks the NIR but also fails to report
progress correctly, it would be up to the next pass to catch that.
This should hopefully help with test timeouts on
KHR-GL33.texture_swizzle.functional since switching softpipe to
nir-to-tgsi and enabling NIR validation in CI (27s to 20s on my system).
Suggested-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7239>
OUT_OF_MEMORY is the only valid error code from this function, but this
error is more of a "things went horribly wrong, you can't talk to the GPU"
case. Set the device to be in error.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7224>
../src/freedreno/decode/cffdec.c: In function ‘reg_disasm_gpuaddr’:
../src/freedreno/decode/cffdec.c:404:29: error: ‘sprintf’ writing a
terminating nul past the end of the destination [-Werror=format-overflow=]
404 | sprintf(filename, "%04d.%s", n++, ext);
../src/freedreno/decode/cffdec.c:404:3: note: ‘sprintf’ output between
9 and 16 bytes into a destination of size 8
404 | sprintf(filename, "%04d.%s", n++, ext);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7224>
Clover uses very specific sizes and alignments for images and samplers
to pass various bits of data. We need to add a new size/align helper
for inputs which matches the standard CL size/align for most types but
also has the right size/align for images and samplers.
v2 (Karol): use sizeof(cl_mem) instead of 8 to fix 32 bit runtimes.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7069>
Now that we have the HDC, using the data cache for UBO pulls seems to
help things quite a bit:
GTA V DXVK 104.0%
Talos Principle GL 102.8%
Rise of Tomb Raider VK 102.8%
Dark Souls 3 DXVK 101.4%
Witcher3 DXVK 101.3%
Bioshock Infinite GL 100.5%
Doom 2016 VK 97.7%
Doom is a bit of a loss but it helps enough other stuff, it's probably
worth the hit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7230>
Both commits add nir_to_tgsi.c to a different variable, causing a
build-time error when compiling in an AOSP tree:
build/make/core/binary.mk:970: error: overriding commands for target `..../obj/STATIC_LIBRARIES/libmesa_gallium_intermediates/nir/nir_to_tgsi.o', previously defined at build/make/core/binary.mk:970
Move all sources into NIR_SOURCES to resolve this issue.
Fixes: d0f8fe5909 ("softpipe: Switch to using NIR as the shader format from mesa/st.")
Fixes: 34cc6a804e ("gallium: Add a nir-to-TGSI pass.")
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7237>
This causes our TGSI to use far more temps, since NTT is currently not
releasing temps from registers. On the other hand, this interpreter is
already spectacularly slow, and if we wanted to go fast we should probably
write a scalar NIR intrepeter.
For now, using NTT means that we test that codepath in preparation for
switching TGSI-consuming HW drivers over, so that we can eventually
garbage collect st_glsl_to_tgsi.
As this is a major restructuring, there are some impacts on piglit:
- Several tests start assert failing about 64-bit NIR registers for temp
arrays not getting split to vec2s:
- fs-frexp-dvec4-variable-index.shader_test
- arb_gpu_shader_fp64/uniform_buffers/{vs,fs,gs}-array-copy.shader_test
- arb_gpu_shader_int64/execution/indirect-array-two-accesses.shader_test
- dEQP-GLES31.functional.primitive_bounding_box.wide_points.global_state.vertex_geometry_fragment.fbo_bbox_larger
starts crashing depending on various bits of state (previous tests run
before it, presence of valgrind, presence of glib's memcheck). Doesn't
seem really NTT-specific, added to flakes list with other GS flakes.
- Almost 200 fp64/int64-related tests start passing, mostly around i/o loayout.
shader-db:
total instructions in shared programs: 3492656 -> 3081674 (-11.77%)
total loops in shared programs: 1418 -> 1387 (-2.19%)
total temps in shared programs: 340041 -> 615527 (81.02%)
total const in shared programs: 3158970 -> 1528630 (-51.61%)
total imm in shared programs: 117586 -> 101349 (-13.81%)
Total CPU time (seconds): 430.36 -> 900.94 (109.35%)
FPS results:
glmark2 texture +7.32484% +/- 3.76528% (n=10)
glmark2 desktop:effect=shadow +20% +/- 0% (n=10)
glmark2 shadow +6.49351% +/- 3.65335% (n=7)
glmark2 conditionals +18.75% +/- 2.74658% (n=9)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3395>
The goal is to replace glsl_to_tgsi.cpp and its supporting code (~10k
LOC). This code ends up being smaller because NIR has many lowering
passes that help it map better to TGSI than GLSL IR does.
As a benefit, this brings NIR optimizations to TGSI-only drivers.
Many of the softpipe shaders I've looked at end up being significantly
shorter. Some potentially relevant changes to TGSI consumers:
- All immediates are now UINT typed. This means they're less legible
in printouts, but means that they get deduplicated better (no more
multiple copies of 0x0!)
- Sampler views are not currently declared.
- nir_registers don't have their live ranges tracked, so TGSI temp usage
may go up with a lot of control flow.
- nir_lower_vec_to_mov naively inserts movs instead of trying to coalesce
the movs with the generators of the ssa values, sometimes increasing
instruction count.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3395>
live_index had two things going on: 0 meant the instr was an undef and
always dead, and otherwise ssa defs had increasing numbers by instruction
order. We already have a field in the instruction for storing instruction
order, and ssa defs don't need that number to be contiguous (if you want a
compact per-ssa-def number, use ssa->index after reindexing).
We don't use ssa->index for this, because reindexing those would change
nir_print, and that would be rude to people trying to track what's
happening in optimization passes.
This openend up a hole in nir_ssa_def, so we move nir_ssa_def->index
toward the end to shrink the struct from 64 bytes to 56.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3395>
I wasted a bunch of time today tracking down a spurious test results
change due to a driver invoking UB by running tests where NIR validation
had failed (instruction reading from components beyond vector size). If
we need to shrink our coverage to get runtimes down, it will still be
better to be catching validation errors in CI.
To keep the test jobs runtime under 10 minutes, I've split a530's gles2 to
two different jobs.
Acked-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7203>
This has several advantages:
- it generates roughly the same NIR for both compiler backends
(this might help for debugging purposes)
- it might allow to move around some NIR pass to improve compile time
- it might help for RadeonSI support
- it improves fossils-db stats for RADV/LLVM (this shouldn't matter
much but it's a win for free)
fossil-db (Navi/LLVM):
Totals from 80732 (59.18% of 136420) affected shaders:
SGPRs: 5390036 -> 5382843 (-0.13%); split: -3.38%, +3.24%
VGPRs: 3910932 -> 3890320 (-0.53%); split: -2.38%, +1.85%
SpillSGPRs: 319212 -> 283149 (-11.30%); split: -17.69%, +6.39%
SpillVGPRs: 14668 -> 14324 (-2.35%); split: -7.53%, +5.18%
CodeSize: 265360860 -> 267572132 (+0.83%); split: -0.47%, +1.30%
Scratch: 5338112 -> 6134784 (+14.92%); split: -2.65%, +17.57%
MaxWaves: 1077230 -> 1086902 (+0.90%); split: +2.79%, -1.90%
No fossils-db changes on RADV/ACO.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7077>
pthread_cond_broadcast man page says this:
"The pthread_cond_broadcast() or pthread_cond_signal() functions may
be called by a thread whether or not it currently owns the mutex that
threads calling pthread_cond_wait() or pthread_cond_timedwait() have
associated with the condition variable during their waits; however,
if predictable scheduling behavior is required, then that mutex shall
be locked by the thread calling pthread_cond_broadcast() or
pthread_cond_signal()."
Found by reading the code.
Compile tested only.
Fixes: da997ebec9 ("vulkan: Add KHR_display extension using DRM [v10]")
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7197>
Fix defect reported by Coverity Scan.
Identical code for different branches (IDENTICAL_BRANCHES)
identical_branches: Ternary expression on condition width has
identical then and else expressions: 32. Should one of the
expressions be modified, or the entire ternary expression
replaced?
Fixes: 8bb1d61f27 ("panfrost: Add panfrost_block_dim helper")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7177>
Thanks to Felix Degrood for discovering that we missed enabling this
additional caching on Tigerlake! Felix also benchmarked the changes.
We now use MOCS 48 (HDC:L1 + L3 + LLC) for render targets, textures,
and pull constant buffers. We leave storage buffers & images, as well
as stateless messages, using the previous MOCS 2 value. We can't use
HDC:L1 with atomics, and we don't know a priori whether storage buffers
will be used with atomics or not. Similarly, the Vulkan buffer device
address feature allows atomics to be performed on buffers via stateless
messages, and we only can control MOCS at the base address level, so
we can't do much there.
This is closer to what the Windows Vulkan and OpenGL drivers do,
though it isn't quite the same - they also disable LLC in some cases,
but we observed this to have noticable performance regressions when
we tried (though a couple titles benefited). We may try experiment
with that in the future.
Improves performance in a number of titles:
- Unreal Engine 4 Shooter Demo [VK]: 11.8%
- Witcher 3 [DXVK]: 3.9%
- Rise of the Tomb Raider [VK]: 1.5%
- Shadow of the Tomb Raider [VK]: 1.0%
- Grand Theft Auto V [DXVK]: 0.8%
We did not observe any performance regressions.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7104>
On Gen12+, we can enable additional caches in certain usage situations.
This routes that decision making to a central place in ISL, based on
surface usage flags, and updates both drivers to use it. (i965 doesn't
need to change because it doesn't support Gen12.)
We continue handling the "external" decision via an anv_mocs() wrapper
for now, since we store that flag in anv_bo, which isl doesn't know
about. (We could introduce an ISL_SURF_USAGE_EXTERNAL, but I'm not
actually sure that would be cleaner.)
This patch should not have any functional nor performance effects, as
we continue selecting the exact same MOCS values for now.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7104>
Most uses of this function deal with destination buffers, but for
copy_buffer_to_image, the buffer is the source, and isn't rendered
to. We should avoid setting ISL_SURF_USAGE_RENDER_TARGET_BIT.
Also, we should avoid setting ISL_SURF_USAGE_TEXTURE_BIT for the
destination, which isn't sampled from.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7104>
Given that we already link to Android's libsync, use it instead of using a
build-time dependency on libdrm for the KGSL path. This also would help
for older kernel compat with KGSL.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6821>
The downstream Android kernels had a different API than was merged
upstream, and libsync on Android abstracts over that for us. Use their
sync_merge() and sync_wait(), at the cost of linking against libsync
(which Android.mk and meson both do).
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6821>
libsync.h is one of the common dependencies on libdrm from drivers that
otherwise don't want libdrm. Given that it's header-only code, just
import it to Mesa instead of forcing library dependencies on our users.
Copied from libdrm commit a84caff71be9 (" intel: Add PCI ID support to RKL
platform")
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6821>
In SDK 25 (Android 7, 2016), the kernel only went up to around 4.4, so
the upstreamed sync API didn't exist, and libsync didn't expose
sync_merge().
In SDK 26 (Android 8, 2017), the kernel is sometimes bumped to 4.9 or
4.14, but libsync has a sync_merge() that operates on either the
downstream or the upstream API.
Since our android-targeting drivers in general use sync objects (requiring
a kernel newer than 7 got) and sync_merge() (suggesting interaction with
external fencing that started in Android 8), make CI build against an SDK
version with the sync API. I think really doing SDK 25 right at this
point would involve backporting mesa with some #ifdefs to not expose sync
APIs that wouldn't work.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6821>
loader_dr3_helper.c uses xcb_xfixes_create_region() that requires dep_xcb_xfixes to link. This is dependent on with_platform_x11 and with_dri3.
But the source meson file does not set this up dependent on with_dri3.
The build was initialsed using platforms=x11 and gallium-drivers=zink,swrast.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7164>
It's a bit hard to exactly map our implementation to the limits
described by Vulkan. The bindless surface handle in the extended
message descriptors is 20 bits and it's an index into the table of
RENDER_SURFACE_STATE structs that starts at bindless surface base
address. This means that we can have at must 1M surface states
allocated at any given time. Since most image views take two
descriptors, this means we have a limit of about 500K image views.
However, since we allocate surface states at vkCreateImageView time,
this means our limit is actually something on the order of 500K image
views allocated at any time. The actual limit describe by Vulkan, on
the other hand, is a limit of how many you can have in a descriptor set.
Assuming anyone using 1M descriptors will be using the same image view
twice a bunch of times (or a bunch of null descriptors), we can safely
advertise a larger limit. 1M is what's required by D3D12, so let's
advertise that.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3335
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7180>
According to the Vulkan specification, the
VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT flag will be ignored if
included in a VkCommandBufferBeginInfo for a primary command buffer.
This also implies pBeginInfo->pInheritanceInfo should not be read even
if the flag is present, and makes it legal to include the flag knowing
it will be ignored.
Signed-off-by: Ricardo Garcia <rgarcia@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7128>
we've transformed this in the vertex output, so we need to undo that here
ideally we'd only be performing this transform once, but that's going to get
complicated later with the halfz extension which requires shader keys on top
of this, so we can get around to simplifying things at that stage
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7139>
When reading from and writing to the same image in a shader the
memory_barrier can possibly be handled by emitting an ack-write and then
wait for the ack when the memory barrier is set.
Not sure whow well this goes with the syncronization across all shader
invocations though.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7142>
Fix defects reported by Coverity Scan.
Uninitialized pointer field (UNINIT_CTOR)
uninit_member: Non-static class member buffer_access_type is not
initialized in this constructor nor in any functions that it
calls.
uninit_member: Non-static class member uniform_block is not
initialized in this constructor nor in any functions that it
calls.
uninit_member: Non-static class member progress is not initialized
in this constructor nor in any functions that it calls.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7120>
Altough the driver isn't expected to receive nir_var_uniform types
from GLSL this happens currently for one of the internal driver shaders.
At vc4_get_yuv_fs at vc4_blit.c there is a "stride" nir_var_uniform
variable that needs to be lowered so the shader can be compiled.
This regression was affecting several piglit tests under
spec/ext_image_dma_buf_import and at least MythTV application.
Fixes: 96d99f2ecc ("vc4: Only call nir_lower_io on shader_in/out")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3536
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Piotr Oniszczuk <piotr.oniszczuk@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7160>
This file gets built per-Gen, so the warnings are repeated a lot.
src/mesa/drivers/dri/i965/genX_state_upload.c: In function ‘vf_invalidate_for_vb_48bit_transitions’:
src/mesa/drivers/dri/i965/genX_state_upload.c:405:60: warning: unused parameter ‘brw’ [-Wunused-parameter]
405 | vf_invalidate_for_vb_48bit_transitions(struct brw_context *brw)
| ~~~~~~~~~~~~~~~~~~~~^~~
src/mesa/drivers/dri/i965/genX_state_upload.c: In function ‘vf_invalidate_for_ib_48bit_transition’:
src/mesa/drivers/dri/i965/genX_state_upload.c:444:59: warning: unused parameter ‘brw’ [-Wunused-parameter]
444 | vf_invalidate_for_ib_48bit_transition(struct brw_context *brw)
| ~~~~~~~~~~~~~~~~~~~~^~~
src/mesa/drivers/dri/i965/genX_state_upload.c: In function ‘gen4_upload_default_color’:
src/mesa/drivers/dri/i965/genX_state_upload.c:4951:40: warning: unused parameter ‘format’ [-Wunused-parameter]
4951 | mesa_format format, GLenum base_format,
| ~~~~~~~~~~~~^~~~~~
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6899>
I considered a couple other options (including adding #if / #endif
around UNUSED and adding an UNUSED_ON_SOME_GEN), but this seemed the
best. There was also at least one other case of having UNUSED on a
paramter that is sometimes unused (params in
blorp_emit_color_calc_state).
This header gets included in a lot of places (esp. in files that get
built per-Gen), so the warnings are repeated a lot.
In file included from src/mesa/drivers/dri/i965/genX_blorp_exec.c:33:
src/intel/blorp/blorp_genX_exec.h: In function ‘emit_urb_config’:
src/intel/blorp/blorp_genX_exec.h:193:48: warning: unused parameter ‘deref_block_size’ [-Wunused-parameter]
193 | enum gen_urb_deref_block_size *deref_block_size)
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~
src/intel/blorp/blorp_genX_exec.h: In function ‘blorp_fill_vertex_buffer_state’:
src/intel/blorp/blorp_genX_exec.h:350:52: warning: unused parameter ‘batch’ [-Wunused-parameter]
350 | blorp_fill_vertex_buffer_state(struct blorp_batch *batch,
| ~~~~~~~~~~~~~~~~~~~~^~~~~
src/intel/blorp/blorp_genX_exec.h: In function ‘blorp_emit_surface_state’:
src/intel/blorp/blorp_genX_exec.h:1403:42: warning: unused parameter ‘aux_op’ [-Wunused-parameter]
1403 | enum isl_aux_op aux_op,
| ~~~~~~~~~~~~~~~~^~~~~~
src/intel/blorp/blorp_genX_exec.h: In function ‘blorp_update_clear_color’:
src/intel/blorp/blorp_genX_exec.h:1867:46: warning: unused parameter ‘batch’ [-Wunused-parameter]
1867 | blorp_update_clear_color(struct blorp_batch *batch,
| ~~~~~~~~~~~~~~~~~~~~^~~~~
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6899>
Also fix the obtuse comment. I had to dig back through the commit logs
to find the real issue. GL_ARB_viewport_array requires geometry
shaders, and in i965 the only way to have that is with a 3.2+ Core
profile context... or use allow_higher_compat_version.
This increases the maximum Compatibility profile version from 4.0 to 4.6
(on supported hardware) when the allow_higher_compat_version option is
used.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7118>
as implied in the surrounding code, left_components can be 0 here, in which
case creating a left swizzle is unnecessary (and triggers an assert)
this moves a failing assert farther down the stack to a more useful location
when trying to pack e.g., struct[3] { dvec3; float; }
ref spec@arb_gpu_shader_fp64@execution@inout@vs-out-fs-in-s1-s2@3-dvec2-float
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7134>
We can copy any value into a 16-bit subregister with a 3 dword
v_pack_b32_f16 on GFX10 or a v_and_b32+v_or_b32 on GFX9.
Because the generated code can depend on the register assignment and to
improve constant propagation, Builder::copy creates a p_create_vector in
the case of sub-dword literals.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7111>
The current logic assumes blend descriptors are always retrieved from
the blend descriptor slots present in the FAU RAM, but this assumption
no longer stands when we add blend shaders to the mix. In that case we
need to use an 'opaque blend' whose descriptor is passed through
embedded constants.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7151>
Commit 67ee9c5f55 added support for
using the `pipe_compute_state::req_local_mem` field, because Clover
can have a run-time specified size that isn't baked into the shaders.
However, it started adding the static size from the shader to the
dynamic state-supplied size. The Mesa state tracker fills out
req_local_mem to prog->Base.info.cs.shared_size, which is exactly
what we fill out prog_data->total_shared to be. Effectively, this
meant that we double-counted the same SLM requirements, doubling
our space requirements.
Fixes a 10% performance regression in Synmark2's OglCSDof test.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7152>
cs_prog_data->slm_size is basically redundant with
prog_data->total_shared, which is the field that we actually use for
controlling the shared local memory size in all drivers. We were
still using it in one place for VK_EXT_pipeline_executable_properties,
but we should just fix that and delete the field.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7152>
Without this, we end up with indirect sampler messages all the time
because we don't propagate the texture/image BTI. This makes debugging
shaders with imageSize or textureSamples in them a pain.
Shader-db results on Ice Lake:
total instructions in shared programs: 19720612 -> 19720564 (<.01%)
instructions in affected programs: 4998 -> 4950 (-0.96%)
helped: 12
HURT: 0
All affected shaders were compute shaders in Deus Ex: Mankind Divided.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6794>
For 8bpp surfaces on TGL, prevent LOD1+ from being fast-cleared. This
will be relevant once ISL starts allowing CCS for 8bpp surfaces with
more than 2 miplevels. I verified the problem behind this restriction
with a modified version of the fbo-clearmipmap piglit test.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7085>
To support Android drivers, we're going to want to be tracking that Mesa's
build succeeds on a real android toolchain. This still uses the android
stubs since these libs aren't in the NDK.
Note that I had to drop the Intel and AMD drivers currently: we don't have
LLVM cross-compiled for Android in this container, and I'm honestly hoping
ACO saves us from that. Intel has dependencies on libexpat, which AOSP
really doesn't want to bring in, and it looks to me like those dependencies
could be optional.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6700>
We already have the targets we care about doing this using
ld_args_gc_sections, and by adding it to project arguments we caused
warnings spam in the android clang build about the compile stage not using
the argument.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6700>
The current scratch mechanism uses an MRF hack where we reserve a few
GRF registers to treat like the MRF and we collect the data into that
MRF region before doing a scratch write. We also use that region for
the header for scratch reads.
This commit changes things and gets rid of the MRF hack. Instead, we
reserve a single register (which RA is free to pick) for the scratch
header and uses split sends for scratch writes to avoid having to do
the copy. This should provide RA with more freedom in the presence of
spilling as well as avoid some unnecessary data moves. In future, the
new GEN9_SCRATCH_HEADER opcode gives us a place where we can do our own
per-thread scratch base address calculations rather than depending on
the scratch base address that gets pushed into g0. Having an opcode for
this lets us do it once at the top of the shader rather than repeating
it at every read/write.
One other noticeable difference is the use of SHADER_OPCODE_SEND. We
can get away with this thanks to the fact that we're now using a set to
track which instructions are generated by spills and don't rely on the
opcodes to find spill/fill instructions. This allows us to avoid adding
more virtual opcodes and let the normal code paths handle things like
scoreboard dependencies between header setup and the SEND. It also
means that post-RA scheduling may be able to space out the header setup
MOV and the SEND for better latency hiding.
Shader-db results on Skylake:
total spills in shared programs: 12137 -> 10604 (-12.63%)
spills in affected programs: 6685 -> 5152 (-22.93%)
helped: 274
HURT: 2
total fills in shared programs: 13065 -> 11515 (-11.86%)
fills in affected programs: 9007 -> 7457 (-17.21%)
helped: 275
HURT: 1
Shader-db results on Ice Lake:
total spills in shared programs: 12482 -> 10953 (-12.25%)
spills in affected programs: 6586 -> 5057 (-23.22%)
helped: 275
HURT: 0
total fills in shared programs: 12819 -> 11234 (-12.36%)
fills in affected programs: 7867 -> 6282 (-20.15%)
helped: 274
HURT: 0
Shader-db results on Tigerlake:
total spills in shared programs: 11689 -> 10233 (-12.46%)
spills in affected programs: 4740 -> 3284 (-30.72%)
helped: 259
HURT: 0
total fills in shared programs: 10840 -> 9443 (-12.89%)
fills in affected programs: 6244 -> 4847 (-22.37%)
helped: 259
HURT: 0
Fossil-db results on Ice Lake:
Spills in all programs: 245249 -> 201633 (-17.8%)
Fills in all programs: 366066 -> 314368 (-14.1%)
More practically, this seems to give about a 0.5-1% perf boost in
Witcher 3 (DXVK) and Shadow of the Tomb Raider (Vulkan native).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>
Starting with e99081e76d, we don't re-construct liveness information
every time we spill a register. Instead, we're very careful to track
which instructions are spill instructions and not contribute those to
the IP count so that we can continue to use the old liveness information
even though instructions have been added. This commit adds an assert
that sanity-checks that we count the same number of instructions as our
liveness information is based on.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>
This opcode is responsible for setting up the buffer base address and
per-thread scratch space fields of a scratch message header. For the
most part, it's a copy of g0 but some messages need us to zero out g0.2
and the bottom bits of g0.5.
This may actually fix a bug when nir_load/store_scratch is used. The
docs say that the DWORD scattered messages respect the per-thread
scratch size specified in gN.3[3:0] in the message header but we've been
leaving it zero. This may mean that we've been ignoring any scratch
reads/writes from a load/store_scratch intrinsic above the 1KB mark.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>
In theory, this fixes a bug where we were dropping the PTSS bound on the
floor. The hardware docs claim that the A32 DWORD and BYTE scattered
read/write messages do a PTSS bounds check. However, in practice, it
seems that the hardware ignores the bounds check so this doesn't
actually matter. I verified this with the following couple of piglit
tests:
https://gitlab.freedesktop.org/mesa/piglit/-/merge_requests/399
In practice, this prevents the next commit from making a subtle
behavioral change.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>
Because we hard-coded the default to vec4, any platform where it doesn't
have a "Dispatch Mode" field gets vec4 by default. This includes Gen11+
where vec4 is no longer a thing. Change the default so it works on
newer hardware.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>
The aliasing we were using was not always correct. Particularly,
for 3D images, the simulator would complain about image strides
not being large enough in some cases.
This patch fixes this by aliasing both src and dst images and
carefully choosing the alias dimensions taking into account the
format chosen for the copy and the ratio of block sizes between
both images.
Playing a bit with the image dimensions used by the relevant CTS
tests we confirmed this works well for all tile layouts (lineartile,
ublinear1/2 and UIF).
This fixes all CTS tests involving 3D image copies from compressed
formats without needing to force UIF layout for all compressed
images (which would actually not work for all image sizes either).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This patch addresses various issues, mostly from secondary command buffers
that recorded pipeline barriers that are not consumed in the secondary itself,
so they need to be applied to jobs that come right after the execution of the
secondary in a primary command buffer.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
If a subpass clears one aspect of Depth/Stencil but loads the other
the clear might get lost. Fix this by emitting the clear as a draw
call instead of relying on the TLB clear.
Fixes:
dEQP-VK.renderpass.suballocation.attachment.3.307
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far V3DV_ENABLE_DEFAULT_PIPELINE_CACHE allowed to configure
pipeline cache to avoid any caching using a pipeline cache.
With this change we can be more detailed. Then envvar is not anymore a
boolean. Allowed values:
* "off": no pipeline cache at all. PipelineCache objects behaves as
no-op objects.
* "no-default-cache": user PipelineCache caches nir/variants, but we
don't provide a default cache in case the user doesn't provide a
PipelineCache object, neither for internal pipelines.
* "full" (default): we provide a default PipelineCache, used when
the user doesn't provide one when creating a Pipeline, and for
internal Pipelines.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We don't want to let the default pipeline cache grow without limit. We
choose a maximum number of entries that should work for all real world
applications. CTS will exceed that limit, but that is okay, as it will
prevent us from running out of memory.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Some shaders that need to spill hundreds of registers can take very long times
to compile as each allocation attempt spills a single register and restarts
the allocation process. We can significantly cut down these times if we allow
the compiler to spill in batches, which should be possible if we are spilling
uniforms, which is in fact the kind of spills that we do first because they
have lower cost than TMU spills.
Doing this could cause us to slightly over spill in some cases (depending on
the chosen batch size) leading to slightly worse performance, so we only
enable this behavior after we have started to spill over a certain threshold,
at which point we assume that performance won't be good and we want to
favor compilation speed instead.
v2:
- Keep it simple and just try to spill a fixed amount of registers in a
batch instead of trying to compute this dynamically based on accumulated
spills and current register pressure. (Eric).
v3:
- Check if the node is valid before doing anything with it.
- Drop the environment variable to select batch size and just fix it to 20.
With this we can take this CTS test from 35 minutes down to about 3 minutes:
dEQP-VK.ssbo.layout.random.all_shared_buffer.5
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We had some code on blit_tfu to hande 3D images but it was wrong. For
example, it executed a copy on the 3D image no matter the depth
component copy needed. This was not detected until vk-gl-cts 1.2.4
introduced more 1D and 3D blitting tests.
Also add checks for rely on blit_shader if needed like when mirroring
on the depth component.
Fixes the following tests:
dEQP-VK.api.copy_and_blit.core.blit_image.simple_tests.mirror_z_3d.nearest
dEQP-VK.api.copy_and_blit.core.blit_image.simple_tests.whole_3d.nearest
dEQP-VK.api.copy_and_blit.dedicated_allocation.blit_image.simple_tests.mirror_z_3d.nearest
dEQP-VK.api.copy_and_blit.dedicated_allocation.blit_image.simple_tests.whole_3d.nearest
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
When sampling the stencil aspect we want to reinterpret the D24S8 format
as RGBA8 and read stencil values from the R component.
Fixes:
dEQP-VK.renderpass.suballocation.formats.d24_unorm_s8_uint.input.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Gets tests like the following one properly skipped:
dEQP-VK.api.copy_and_blit.core.image_to_image.all_formats.color.1d.etc2_r8g8b8a8_unorm_block.etc2_r8g8b8a8_unorm_block.optimal_general
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far we have only been exposing linear for WSI formats and UIF on
everythig else, but we should instead expose linear or UIF based
on whether the underlying format supports any features for the given
layout.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
When negotiating DRM modifiers, applications may use this to validate the
features that are supported with a particular modifier. The WSI code in
Mesa relies on this to validate its modifiers.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
By basing the tex_coord on the max layer, instead of min (similarly to
what we do for mirroring x/y)
Avoid all crashes, and get to Pass most of the following tests:
dEQP-VK.api.copy_and_blit.core.blit_image.simple_tests.mirror_z_3d.*
The only one failing is this one:
dEQP-VK.api.copy_and_blit.core.blit_image.simple_tests.mirror_z_3d.nearest
but looks that the core cause would be different, as there are other
3d nearests tests failing.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Command buffer object destruction callbacks take 64-bit object
handles, but we defined the color clear pipeline callback to take
a 32-bit argument.
Should fix recent crash regressions with some CTS tests on Rpi4.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Subpass color clear pipelines are those used to emit partial attachment
clears as draw calls inside the render pass currently bound by the
application in the command buffer, leading to a huge performance improvement
compared to the case where we emit them in their own render pass.
Unfortunately, because the pipeline references the render pass
object in which it is used and the render pass object is owned by the
application (and can be destroyed at any point), we can't cache these
pipelines (unless we implement a refcounting mechanism or other
similar strategy).
Performance impact looks negligible based on experiments with vkQuake3,
probably because the underlying pipeline cache is preventing the
redundant shader recompiles.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Specifically, we should select the slice to blit from on the source
image to be in the middle of the depth step.
This issue was only raised recently after the CTS improved the 3D
blitting tests.
Fixes:
dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.*.3d.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Originally, copies between buffers and images required a buffer offset
that was a multiple of 4 bytes, however, the spec was later fixed to
relax this rule and only require offsets that had texel alignment.
Our implementation of image to buffer copies using the blit path needs
to bind the destination buffer as a linear image and be able to bind
the requested buffer memory at the required offset, so for that to work
we need to chnage the alignment requirements for linear images to match
the relaxed texel alignment requirement.
Fixes new tests in Vulkan CTS 1.2.4:
dEQP-VK.api.copy_and_blit.core.image_to_buffer.buffer_offset_relaxed
dEQP-VK.api.copy_and_blit.dedicated_allocation.image_to_buffer.buffer_offset_relaxed
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The lowering will get all the interpolateAt() functions from GLSL lowered to
the corresponding intrinsics we have just implemented in the compiler backend,
which was the last piece we needed to enable the feature.
This gets us to pass all the relevant tests in:
dEQP-VK.pipeline.multisample_interpolation.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The option use_interpolated_input_intrinsics will lower these as well
as regular input loads. This is inconvenient for V3D, where we can
produce optimal code for regular input loads based on the input
variable layout qualifiers, so this change adds an option to only
lower instances of interpolateAt().
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
as we have just set proper values for point granularity etc, we can
enable largePoints. With this change tests like this:
dEQP-VK.rasterization.primitive_size.points.point_size_*
goes from Skip to Pass.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
As we are here, we also tweak some line-related limits, as some use
the same value that for point, and in order to use the enum we added
recently at common/v3d_limits.h
Fixes the following test:
dEQP-VK.glsl.builtin_var.simple.pointcoord
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
PTB assumes that instance id to be 0 at start of tile, but hw would
not do that, we need to set it.
This fixes some Vulkan CTS tests that start to fails after some other
tests used an instance id.
So for example, before this commit for the following tests, executed
in that order, we got the following behaviour:
dEQP-VK.pipeline.vertex_input.multiple_attributes.binding_one_to_many.attributes.float.mat2.mat3 => Pass
dEQP-VK.draw.indexed_draw.draw_instanced_indexed_triangle_strip => Pass
dEQP-VK.pipeline.vertex_input.multiple_attributes.binding_one_to_many.attributes.float.mat2.mat3 => Fails
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far we were pre-generating two variants, an all 16 bit return_size
and an all 32-bit return_size, as at pipeline creation time we don't
know the texture format that it would be used finally used.
But it is possible to override or at least refine the 32bit case, as
we know in advance that all shadow textures can (and in fact should)
use return_size 16bit.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
To be used to decide the texture return size. We add it on the
descriptor map because it is the easier place to do so. As we are
lowering the texture accesses we can check instr->is_shadow at that
point. It is true that it is somewhat odd, as so far the descriptor
map was general-descriptor info, but is_shadow is only for
textures. But it doesn't make sense to make an effort now, as it is
possible that we would get more descriptor-specific info on the map on
the future. We can revisit that later.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
There are some potential advantages for that. Even if we are not
taking advantage of them, it would be interesting to be using this
path now, specially as non-deref path could be removed at some point.
Note that instead of returning for both resource_index and
vulkan_descriptor a vec2, we return a scalar for the first one, as it
is what the v3d backend expect (like for get_ssbo_size). For this to
work, we reconfigure the vec2 at vulkan_descriptor using the index and
an unused 0 value.
As far as I see turnip avoids that by lowering too load_ssbo/ubo, so
it just gets the index lowered (that in their case it is a vec3 with a
fixed 0 on the third component), but for now it is easier doing this.
v2: return a single-component for the index, to avoid the backend
needing to handle it (Eric, Jason).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Asking the simulator the total memory it is using, instead of sysinfo
(that returned the host system memory).
Fixes the following CTS tests when using the simulator:
dEQP-VK.memory.allocation.basic.percent_1.forward.count_12
dEQP-VK.memory.allocation.basic.percent_1.reverse.count_12
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Although we don't support texture buffers on the OpenGL driver, we are
already doing that for the Vulkan driver. This would be needed for the
OpenGL driver in any case.
Fixes following tests on v3dv:
dEQP-VK.memory.pipeline_barrier.host_write_uniform_texel_buffer.*
dEQP-VK.memory.pipeline_barrier.transfer_dst_uniform_texel_buffer.*
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far we were using directly the local variable key to do the
insertion, when the hash table expects a permanent address. We add a
key field on all the meta structures (that are already basically a
wrapper over v3dv_pipeline).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were inserting as key directly the local key variable used to
search for entries, but hash_table expect a real pointer. Fixed by
using the array of keys that we already had at v3dv_pipeline.
Fixed failures on the rpi4 like:
dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.color.a1r5g5b5_unorm_pack16.a1r5g5b5_unorm_pack16.general_general_linear
but fwiw, this tests on the simulator, and several other tests on both
the simulator and rpi4, were working just by luck.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
If the framebuffer has no attachments then multisample rasterization
is enabled based on the rasterizationSamples multisample state of
the pipelines. It should be noted that since we don't support
the variableMultisampleRate feature, all pipelines in the same
subpass must have matching number of samples.
V3D requires that we specifically setup our frames to enable
multisampling or not, and we do this when we create jobs inside
a subpass. Since we create the first job for a subpass as soon as
the subpas starts, this is problematic: if we don't have any
attachments, we don't won't enable MSAA at this point, but later
on we might bind an MSAA pipeline, since pipelines can be bound
at any point in the lifespan of a command buffer.
Here, we fix this by testing if the first draw call in a job uses
an MSAA pipeline but the job the was setup to not use MSAA, and in
that case we re-start the job with MSAA enabled.
We also take care of a corner case that seems to be tested by CTS
where a framebuffer with no attachments doesn't bind any pipelines
with MSAA enabled (so according to the Vulkan spec, multisample
rasterization must be disabled) but the fragment shader in use
reads gl_SampleID (which enables per-sample shading). This would
lead to enabling per-sample shading with single-sample rasterization,
which doesn't make sense and makes the simulator complain, so we just
disable per-sample shading in that case.
Fixes:
dEQP-VK.pipeline.multisample.mixed_count.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
According to the spec, if a fragment shader reads gl_SampleID then the
shader must be evaluated per-sample.
Fixes:
dEQP-VK.pipeline.multisample_shader_builtin.write_sample_mask.4_samples
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
For example, regarding gl_SampleID, the GLSL spec states:
"Any static use of this variable in a fragment shader causes the
entire shader to be evaluated per-sample."
So we need to track if the fragment shader does anything that implicitly
enables per-sample shading in the compiler for the driver to
auto-enable sample rate shading if needed.
v2:
- Instead of tracking reads of gl_SampleID, check SYSTEM_BIT_SAMPLE_ID
and SYSTEM_BIT_SAMPLE_POS as well as the sample layout qualifier like
other drivers are doing to activate this behavior (Eric).
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> (v1)
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Now that we added support for texel_buffers, on all the cases that we
were checking for a image_view we end checking for a image_view or
buffer_view, so we stopped to use it. Remove it as it become
superfluous.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
If the formats are not suitable as texture type, then they can't be
used as texel buffers.
Gets tests like the following one:
dEQP-VK.image.load_store.without_format.buffer.r32g32b32_sfloat_minalign_uniform
to be properly skipped (instead of Crash on the simulator)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The TLB multisample resolve feature is only limited to specific format types.
For everything else, including sfloat and integer formats, we need to
fallback to a blit resolve. This needs to be handled both for in-pass
resolves as well as for vkCmdResolveImage.
Because these blits would happen after the tile store operations, we need
to make sure we store the multisampled buffers so we can then read them for
the blit resolve.
Fixes the remaining test failures in:
dEQP-VK.renderpass.suballocation.multisample_resolve.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
As we understand that texture accesses should be aligned to the UIF
block size.
Fixes several of the CTS tests under this pattern:
dEQP-VK.binding_model.shader_access.primary_cmd_buf.uniform_texel_buffer.*.offset_nonzero
dEQP-VK.binding_model.shader_access.primary_cmd_buf.storage_texel_buffer.*.offset_nonzero
Note: for those tests, using a lower value (64) was enough to get them
working, but again, we understand that the real alignment is the UIF
block size.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
There are several definitions for hw limits on v3dv_image that we want
to share, but v3dv_private was already growing bigger and messier.
So let's move them to a specific header. Note that there is already a
broadcom/common/v3d_limits.h. We are not putting them there because
right now they are only used by the Vulkan driver, but are candidates
to be moved.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We didn't need this until now, since this was included with GLES 3.2,
but we need it for Vulkan.
Eric had already done the plumbing for it though, we just need to
actually emit the mask.
Fixes some tests in:
dEQP-VK.renderpass.suballocation.multisample_resolve.*
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This should be able to handle partial copies of multisampled images.
This change extends our blit shader interface to also handle multisampled
destinations so that if the blit destination is a multisampled image,
the blit will rely on sample rate shading to copy all samples from
the source image (which must have a matching number of samples).
I have not found any tests in CTS that do partial copies of
multisampled images, so I tested this with a full multisampled image
copy, using this test:
dEQP-VK.api.copy_and_blit.core.resolve_image.whole_copy_before_resolving.4_bit
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This fallback is required when we have to do partial resolves. It
works the same way as other blit fallbacks for copy operations: it
will bind the source image as a source texture and blit the selected
region to the destination image.
The difference in this case is that the source image is multisampled
and the blit shader needs to fetch and average individual samples for
each texel.
This gets us to pass all the remaining test cases in
dEQP-VK.api.copy_and_blit.core.resolve_image.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
vkCmdCopyImage can be used to copy multisampled images. We can
easily support that on the TLB path, which copies full images.
For partial copies we will need to amend our blit shader path
to support multisampling resolve.
Fixes:
dEQP-VK.api.copy_and_blit.core.resolve_image.whole_copy_before_resolving.4_bit
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far we were just assuming that it would work (so we could try to
access a NULL pointer), and not freeing it properly.
Fixing that was somewhat messy due pipeline_compile_graphics using a
temporary array and then update the internal pipeline stages. As we
are here we just removed the array and simplified
pipeline_compile_graphics code.
Fixes following tests:
dEQP-VK.api.object_management.alloc_callback_fail.graphics_pipeline
dEQP-VK.api.object_management.alloc_callback_fail_multiple.graphics_pipeline
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
In order to reduce the number of shader builds after pipeline creation
(that ideally shouldn't happen) we pre-generate two shader variants at
pipeline creation time. In addition to the default one, that set the
return size for all texture to 16 bit, we build another variant
setting the return size for all textures to 32-bit. cmd buffer selects
the latter if any of the textures requires 32bit.
So we are using an all 16-bit return size or an all 32-bit return size
variants. This could be slightly improved by pre-generating return
size combinations if the texture number is below a threshold. But that
would require more space, and bigger pipeline creation time, so would
need to be evaluated.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far, when checking for a variant fulfilling a specific v3d key, we
were checking the caches, and if that failed, we compiled a new
variant, and update the current variant.
But we could check first if the current variant fullfils that. This
was not really problematic so far, as checking on the caches was fast,
but now that we could be without any kind of shader cache using
V3DV_ENABLE_PIPELINE_CACHE, it is far better to check first current
variant.
Without this vkQuake3 at 720p drops to 1fps when disabling the cache.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
That it would be used as fallback. Three advantages:
* Having a cache for user operations even if the user doesn't
provide it.
* Having a cache for internal operations. v3dv_meta_copy creates
pipelines for some copy path, so it is interesting to have them
cached.
* Testing: so now the pipeline cache is tested by more CTS tests.
As any other pipeline cache, it can be disabled with the
V3DV_ENABLE_PIPELINE_CACHE. It was suggested that would make sense to
have a specific envvar for the default pipeline cache, but for now
just one envvar is enough.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far for private pipelines we were creating dummy shader modules
where we directly provided the nir shader. But for the pipeline cache
we were using the SPIR-V to generate part of the cache key sha1.
The main use case for private pipelines are meta_copy/clear. Those nir
shaders depend on parameters like the format etc, so we use directly
the serialized form of the NIR shader to generate the sha1.
The other case are the no-op fragment shader that we need to provide
if no fragment shader is defined by the user. For that case we can
just use the default shader name, as the no-op shader is always the
same.
This is required as we plan to add a default pipeline cache, that
would include our private shaders too.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This also includes being able to serialize them as part of
GetPipelineCacheData and to deserialize it as part of
CreatePipelineCache.
So now we can also upload the assembly of the variant as part of the
PipelineCache creation.
Note that from all this the tricky part was the prog_data
serialization. v3d_compile allocates and fill a new prog_data, with
rzalloc. Among other things because it also allocates internally the
uniform list. So we needed to replicate that when deserializating the
prog_data. Ideally we would like to avoid that, and allocate as much
resources as possible using vk_alloc, but that would mean a somewhat
deep change on the v3d_compiler, that we want to avoid as much
possible for now.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Heavily based on anv nir caching. One of the bigger difference is that
we don't create the nir shader using a ralloc_context local to the
main compile graphics method. On anv, after compiling the shader, they
discard the nir shader. We need it as we could need it to build shader
variants later.
As anv, we introduce a environment variable to disable the cache:
V3DV_ENABLE_PIPELINE_CACHE
By default is enabled. The main purpose for this envvar is debugging,
in order to provide a easy way to discard a bug on the cache.
It is pending to serialize/deserialize the NIR shaders as part of
GetPipelineCacheData and PipelineCacheCreate. We also plan is to cache
too shader variants. We would do that on following patches.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
And this means providing a proper cache object, and being able to
load/retrieve a cache data with a proper header. Not really caching
anything yet. That would be tackle on following patches.
Note that this no-op cache got all the specific pipeline_cache and
pipeline.cache tests passing on the rpi4.
The following tests are still crashing when using the simulator:
dEQP-VK.synchronization.internally_synchronized_objects.pipeline_cache_compute
dEQP-VK.synchronization.internally_synchronized_objects.pipeline_cache_graphics
But those are an issue of synchronization tests on the simulator, and
not related with the pipeline cache itself. In general synchronization
tests should be tested on the rpi4.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We should always save state on a push before starting a meta operation,
even if we don't have a pipeline, since dynamic state can be set at any
time directly on the command buffer. Similarly, we should always restore
it if the pop after the meta operation signals that it has written any
state, not only if we have a graphics pipeline to restore.
Fixes a rendering artifact in VkQuake.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Since vkCmdClearAttachments executes inside a render pass, we would
benefit from converting it to a draw within the current subpass job to
improve batching and avoid expensive tile load/store operations.
This can dramatically improve performance for applications using this
command, however, we can only use this if we are clearing the base
layers of framebuffer attachments, since otherwise we would need to
use layered rendering, which we don't support yet.
This improves vkQuake3 performance dramatically (almost 100%
performance improvement at 1080p), which calls this twice per frame.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This bit is helpful if we already need to store the buffer, but otherwise
we should not emit the store only to get the clear, we can use the global
clear packet for that and save us an expensive tile buffer store operation.
Also, we have not been using the per-buffer clear bit for depth/stencil
stores since "v3dv: fix depth/stencil clears on hardware", so we should
never been considering clearing needs to flag stores.
This improves vkQuake performance somewhere between 5%-15% by allowing
us to skip the store of the depth attachment.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This gets vkQuake to render correctly, which creates render passes with
stencil load operations even when the depth/stencil attachment format
doesn't have a stencil aspect. While this is a bit weird, it seems to
be allowed by the spec:
"If the format has depth and/or stencil components, loadOp and storeOp
apply only to the depth data, while stencilLoadOp and stencilStoreOp
define how the stencil data is handled."
In our case we were not ignoring it and this was causing that we emitted a
Z buffer load that seemed to clobber the Z clear, preventing all draw calls
from passing the depth test.
While we are at it, also change the depth/stencil store operation (which
was already handling this scenario correctly) to use the format of the
render pass attachment description rather than the underlying image
format.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
If we have a semaphore wait the job cannot be started before the semaphore
has been signaled, so we need to wait before starting the binning stage.
Fixes CTS failures in:
dEQP-VK.synchronization.op.single_queue.binary_semaphore.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far we have been getting away with finishing the current job in the
presence of a pipeline barrier and relying on the RCL serialization,
but of course this is not always enough.
This patch addresses synchronization across different GPU units
(i.e. draw indirect after compute), as well as cases where we need to
sync before binning.
Fixes CTS failures in:
dEQP-VK.synchronization.op.single_queue.barrier.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We will need this in Vulkan to support vertex format
VK_FORMAT_B8G8R8A8_UNORM. The hardware doesn't allow to swizzle
vertex attribute components, so we need to do it in the shader.
v2:
- Use nir_intrinsic_io_semantics() to retrieve the location instead
of looping through the shader input variables (Eric).
- Assert that we only have one component (Eric).
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> (v1)
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The order in which we emit the attributes is relevant, since
GL_SHADER_STATE_ATTRIBUTE_RECORD packets don't include an explicit
attribute index. This means that we need to emit them in driver
location order, since the compiler uses that location to compute
attribute offsets in the VPM.
Fixes ~1300 CTS tests in:
dEQP-VK.pipeline.vertex_input.multiple_attributes.out_of_order.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
For some reason, CTS expects E5B9G9R9 and B10G11R11 with
transparent black border clamping produce alpha 1 instead of 0.
Since border color takes precedence over the texture state swizzle,
the only way to fix this is to lower the texture swizzle in the shader
to set alpha to 1.
Fixes:
dEQP-VK.pipeline.sampler.view_type.*b10g11r11*clamp_to_border_transparent_black
dEQP-VK.pipeline.sampler.view_type.*e5b9g9r9*.clamp_to_border_transparent_black
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
It seems that we only want to set the texture state's depth to the
number of 2D layers divided by 6 when sampling, not wen doing
load/store.
This means that we need to generate two different states and choose
the one to use depending on the descriptor.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
For these we want to divide the number of layers by 6.
Fixes:
dEQP-VK.pipeline.image_view.view_type.cube_array.*
dEQP-VK.pipeline.image.suballocation.sampling_type.combined.view_type.cube_array.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
In OpenGL, unnormalized coordinates are implicit based on the sampler
type (rectangle textures), so the compiler can set the flag when needed.
In Vulkan, however, this is configured explicitly in the sampler object,
so the compiler won't set it and we need to do it manually when we are
writing the P1 uniform.
Fixes:
dEQP-VK.pipeline.sampler.exact_sampling.*.unnormalized_coords
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
When translating combined depth/stencil blits to compatible color blits we
should look at the requested region aspects to decide the color
mask to apply.
Fixes:
dEQP-VK.api.copy_and_blit.*.buffer_to_depthstencil.buffer_offset_d24_unorm_s8_uint_D
dEQP-VK.api.copy_and_blit.*.buffer_to_depthstencil.buffer_offset_d24_unorm_s8_uint_SD
dEQP-VK.api.copy_and_blit.*.buffer_to_depthstencil.buffer_offset_d24_unorm_s8_uint_S_D
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Currently, we end the current job whenever the user emits a
pipeline barrier, but we then expect to have a valid job when
we emit a draw call.
If by the time we have to emit a draw call we don't have a valid
job, we need to create one by resuming execution of the current
subpass.
Fixes some tests in:
dEQP-VK.renderpass.suballocation.attachment_allocation.input_output.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Instead of asserting that users don't try to create images that
would require 4GB+ of memory, error out with the corresponding
OOM error when the user tries to actually allocate the memory
for the image.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The first attribute must be active if using builtins.
This fixes a lot of simulator crashes for vertex input CTS tests.
It should be noted that some of these tests still fail after this
fix though, so there may be some other bug.
Fixes crashes in:
dEQP-VK.pipeline.vertex_input.multiple_attributes.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
In practice we found that we need this for v3d (specifically for cube
map arrays, as they don't support the default value for wrap_i, so a
sampler object is needed to override that value).
It is worth to note that the main reason behind this auxiliar method
was to identify those cases that we didn't have a sampler object
available for Vulkan. So far, we found that we have a sampler object
coming from nir always for that operation.
Fixes cube map array tests like the following:
dEQP-VK.glsl.texture_functions.query.texturequerylod.usamplercubearray_fragment
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This is really similar to the existing lower_tex_src_to_offset, but
for now we prefer to keep them independent, just in case we start to
found specific image use-cases as we advance fixing CTS tests.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The hardware doesn't have unorm/snorm packing variants and we were
already lowering the packing versions of these.
Fixes:
dEQP-VK.glsl.builtin.function.pack_unpack.unpacksnorm2x16_compute
dEQP-VK.glsl.builtin.function.pack_unpack.unpackunorm2x16_compute
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
If the meta operation did not change descriptor state then we should keep it,
not reset it.
Fixes:
dEQP-VK.fragment_operations.early_fragment.early_fragment_tests_stencil
dEQP-VK.fragment_operations.early_fragment.no_early_fragment_tests_stencil
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We are treating them as a special case of texture, so the commit is
mostly about integrating them with the existing
SAMPLER/SAMPLER_IMAGE/COMBINED_IMAGE_SAMPLER infrastructure.
This commit doesn't use in any special way the render pass
information, including the dependencies, so it is possible that we
would need to do something else. But this commit gets several CTS
tests, and two Sascha Willem Vulkan demos, so let's start with this
commit and handle any other use case for following commits.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far we were using nir->data.num_textures to fill the default values
for the textures used on the shader, and set the value for the number
of textures used.
But nir->data.num_textures doesn't take into account input
attachments, even after nir_lower_input_attachments. Although that
could make sense from a general pov, in our case we are treating input
attachments mostly as textures.
This commit count the number of textures interating through the
pipeline combined index map, as it includes both. This also makes the
populate of the shader key for default values more similar to the one
done at cmd_buffer with real values.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far we were not asking the driver for the sampler state if we could
just use the default P1 values. But even if we need to fill P1 (for
example to fill up the output type of the format), if the texture
operation doesn't need a sampler, we can let that field as NULL (so
default values) and avoid calling back the driver for a sampler.
This is not mandatory for OpenGL (as we always have a sampler object),
although still a good to have. For Vulkan this is needed, as we don't
have a sampler object in that case.
v2: reword comment (Eric)
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
When rasterization is disabled there are a number of CreateInfo
structs that should be ignored. We were managing this correctly
for some cases, but not all of them. Specifically, viewport state
must be ignored and we weren't doing that.
Fixes:
dEQP-VK.api.descriptor_set.descriptor_set_layout_lifetime.graphics
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Only free the underlying BO when the exported memory object is freed
to avoid multiple frees of the same memory.
The only exception is winsys BOs where we import a BO created in the
display device into the render device. In this case, we only have one
memory object referencing the BO and we want to destroy it with that
memory object.
Fixes:
dEQP-VK.api.external.memory.dma_buf.*
dEQP-VK.api.external.memory.opaque_fd.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The hardware can't do this, so we need to record a CPU job that will
map the indirect buffer at queue submission time, read the dispatch
parameters and then submit a regular dispatch.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
New jobs need to re-emit all state. Typically, this is achieved
by resetting all dirty state flags when we start a new job, but
for index buffers we were not using a dirty bit because we always
emit them immediately. This patch adds the bit and only tries
to skip index buffer state if the bit is not dirty, which will
ensure that we will always emit it for new jobs.
This fixes a regression in the shadowmapping demo from Sascha Willems
introduced with "v3dv: try harder to skip emission of redundant state".
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Focused specially on the cache, how many BOs and how much bo_size is
store on the cache, freed bos, bos moved to cache etc.
Initially not configured with V3D_DEBUG (like v3d) to avoid a runtime
check on most of v3dv_bo functions.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
V3DV_MAX_BO_CACHE_SIZE can be used to configure it.
So one way to disable the bo cache is setting V3DV_MAX_BO_CACHE_SIZE
to zero. This would still run all the bo_cache size checks, but having
another envvar just to ensure that anything related to the cache is
used seemed like an overkill.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Heavily based on the already existing for the v3d OpenGL driver, but
without references, and with some extra OOM checks (Vulkan CTS has
several OOM tests).
With this commit v3dv_bo_alloc and v3dv_bo_free became frontends to
the bo_cache. The former tries to get a BO from the cache if possible,
and the latter stores the BO on the cache if possible. The former also
adds a new parameter to point if the BO to allocate is private.
As v3d we are only caching private BOs, those created by the driver
for internal use (like CLs, tile_alloc, etc). They are the ones with
the highest change of being reused (for example, CL BOs are always
4KB, so they can always be reused). User-created BOs can have any
size, including some very large ones for buffers and images, which
makes them far less likely to be reused and would add a lot of memory
pressure if we decided to cache them.
In any case, in practice, we found that we could get a performance
improvement by caching also user-created BOs, but that would need more
care and an analysis to decide which ones makes sense. Would also
require to change how the cached BOs are stored by size. Right now
there are an array of list_head, that doesn't work well with big
BOs. If done, that would be handled on a separate commit.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Both the API user and the driver may attempt to map a BO, possibly
only partially and using different ranges. This is a problem because
we only have a single map per BO. Fix this by making sure that when
a BO is mapped, we always map its entire range. This way if a BO
has been mapped before, we know that map is still valid no matter the
region we need to access now.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The blit shader path for buffer to image copies is pretty bad,
since it needs to produce a tiled image from the linear buffer
prior to emitting the blit copy.
This patch adds a new preferential path where we implement the
copy using the CPU, similar to what the GL driver does for
texture uploads. This makes vkQuake2 at least 4x faster when
dynamic lights are enabled (which triggers dynamic texture
updates).
We also tested a GPU path where we use a shader that takes the
linear buffer as a UBO and copies directly from it. This also
shows a clear performance gain, but still worse than the CPU
implementation.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We had done all the plumbing for this but EZ can be disabled in 3 places
and we were never setting the enable bit in the configuration bits packet.
Also, configuration bits must not enable EZ if this has been disabled in
the RCL for the whole frame, which we do if we don't have a depth
attachment at all.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Otherwise cloned BO lists point to the original list objects and not
the cloned ones, and that will confuse anything that tries to iterate over
them, such as list_length(), leading to infinite loops.
Fixes (in debug mode):
dEQP-VK.api.command_buffers.render_pass_continue
In that test we clone a full CL job from a secondary, and without this,
the BO lists in its CL lists will point to the bo_list field in the
original job, leading to an infinite loop as we assert the expected size
of these lists at queue submit time in handle_cl_job.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Asserting on them makes it easier to identify applications and tests that
try to use unimplemented features.
Also, there are some APIs that relate to optional features we don't
(or can't) support, such as sparse, so for these we just provide
the trivial implementation.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This allows us to remove some individual bos for the image and
sampler, used to store the SAMPLER_STATE and TEXTURE_SHADER_STATE. Now
they are prepacked on static memory as part of the vulkan object
struct.
This commit introduces small descriptor structs, used to define what
the bo subregion would contain. It is used mostly to compute offsets
to that specific data, and define the size needed. Having said so, it
would be possible to replace them with some kind of flag (like anv) or
just compute the offset based on the context.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far we were saving all the descriptor info on the host memory. With
this commit we do the equivalent that other mesa vulkan drivers (Anvil
and Turnip) and create a bo on the descriptor pool that would be
suballocated for each descriptor.
This would allow to clean up individual bos from some vulkan objects,
reducing device memory fragmentation, and allowing to avoid to alloc
bos for that info. After all, pre-allocating needed memory is one of
the purposes of the descriptor pool.
This commit introduces all the infrastructure, but doesn't use it for
any descriptor yet, as if no descriptor needed data uploaded to a bo.
The idea to decide which info goes to the descriptor pool bo is info
that we would need to upload to a bo in any case, as it is referenced
as an address by any packet.
We could be more aggressive with that general rule, but that would be
enough for now. If in the future we support
VK_EXT_descriptor_indexing, we probably would need to store more info,
as under that extension, descriptors can be updated after being bound.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were assuming that if the command buffer state doesn't have any
attachments (as per the attachment count) the attachment state array
should not be allocated, however, during meta operations it is
possible that the attachment state grows (since meta operations can
emit render passes of their own). In that case, we would grow the
state for the meta operation but then pop the previous attachment
count and we would leak the state.
An example of that is a secondary command buffer which has no
attachment state by default since it doesn't execute a render pass
begin, but that executes one in a meta operation (for
vkCmdClearAttachments for example).
Fix this by making the attachment count an allocation count instead
and not popping it once we finish a meta operation. Also, always free
the state so long as there is a valid pointer, and assert that the
allocated count is not zero in that case.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The main change we are introducing here is that now we allow secondary
command buffers that execute in a render pass to have a job list with
more than one job.
The main issue with vkCmdClearAttachments is that we currently need
this to spawn multiple jobs to clear multilayered framebuffers, as we
need to setup a different 2D framebuffer for each layer to clear and
therefore emit a different RCL for each. We could avoid this
completely by used layered rendering with the "clear rect" path to
redirect the clear rects to appropriate layers of the primary
framebuffer, however, our hardware only supports layered rendering
with geometry shaders, which we don't support at present.
Because vkCmdClearAttachments relies on having framebuffer state
available (something we would not need if we used the geometry shader
implementation), if this is not available in the secondary we need to
postpone emission of the command until the secondary is executed
inside a primary. We do this by using a new CPU job
V3DV_JOB_TYPE_CPU_CLEAR_ATTACHMENTS that is processed during
vkCmdExecuteCommands by calling vkCmdClearAttachments directly in the
primary.
As a consequence of these changes, it is now possible that a secondary
command buffer that runs inside a render pass have any kind of job in
its job list, including partial CLs that need to be branched to and
full CLs that need to be submitted to the GPU as is, so we introduced
a new GPU job type V3DV_JOB_TYPE_GPU_CL_SECONDARY to identify partial
CLs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Event waits can be safely moved before a render pass start, since
event setting and resetting commands cannot happen inside one. We
don't need to go that far, but we can use this to record the wait
in its own separate job and then execute this job before the
binning commands recorded in the secondary command buffer when
we execute the secondary into a primary.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
There are basically two types of scenarios to consider:
- Secondary command buffers that run inside a render pass.
- Secondary command buffers that run outside a render pass.
For the former we want to record their commands into a binning command
list that we can branch to when executed into a primary command
buffer. This means this kind of command buffers don't spawn new jobs,
just the default one where they record the binning commands which
won't include the frame setup, which will be provided by the primary
they will be executed in.
For the latter we don't require anything special, we just record as
many jobs as we need as usual and link that job list from the primary
job list when executed.
This handles most scenarios except:
- vkCmdWaitForEvents
- VkCmdClearAttachments
Both of these can spawn new jobs inside a render pass, which is not
what we want for secondary command buffers. We will address this is
follow-up patches.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We should always do this. So far we have been getting away with this
because we overallocate at v3dv_job_start_frame, but that won't do
for secondary command buffers for example, it is also unreliable
if CLs grow past that initial allocation.
In the future, we might want to fix our emit macros so they do the
allocation check implicitly, which would simplify the code and would
make this process a lot less error prone.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We are currently able to clear any supported format using the TLB, but
this is more consistent with other parts of the code and is what we want
should we add any formats in the future where we can't get away with
TLB clears.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
During a command buffer reset we call cmd_buffer_init(), which will
add the command buffer to the pool, so make sure we remove it first
and that we use a safe iterator when resetting a pool.
Fixes:
dEQP-VK.api.command_buffers.pool_reset_reuse
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
That it is justconvert VkSpecializationInfo to
nir_spirv_specialization and pass it to spirv_to_nir.
The code is also basically the same used by anv, tu, and radv
Eventually it would make sense to move it to a common place.
Note that we are using calloc there to allocate the temporary
spec_entries. Trying to use vk_alloc2 causes some problems with the
nir_validate.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
In order to properly check (and possibly compile) shader variants we
need a pipeline and a compatible descriptor set. So far we were trying
to do that check as early as possible, so we were trying to do it at
CmdBindPipeline or CmdBindDescriptorSets, and a combination of dirty
flags. This showed to not cover all the corners cases, and made the
code complex, as needed to handle cases where the descriptors were not
yet available, and return early. The latter also meant that we were
running several checks that failed in the middle.
This commit moves the variant check to CmdDraw, when we should have a
pipeline and compatible descriptor sets, and simplifies and makes more
strict the existing code.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This reverts a previous half-attempt at an implementation of events
using a BO to hold the event state, and provides a full
implementation. V3D doesn't have any built-in GPU functionality to
wait on any kind of events, so we need to implement this in the driver
an therefore we no longer need to use a BO for the event state.
Instead, we implement GPU waits by using a CPU job for the wait
operation that spawns a wait thread if the wait operation doesn't have
all its events signaled by the time it is processed. To implement the
semantics of the wait correctly, any jobs in the same command buffer
that come after the wait will not be emitted until the wait thread
completes.
If a submit spawns any wait threads for a command buffer we can't
signal any semaphores for it until all the wait threads complete and
we know that all the jobs for those command buffers have been
submitted. The same applies to the submit fence, if present.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This is generally very difficult to handle properly everywhere, but
at least this is good enough to make the few CTS tests for this happy.
Fixes (on Rpi4):
dEQP-VK.wsi.xlib.swapchain.simulate_oom.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
There is a hw bug by which the only way to clear the depth/stencil
tile buffers is to emit a clear of all tile buffers, so if we have
to do any such clears, we just emit a single clear of all tile
buffers and skip doing any per-buffer clears, even for color buffers,
since they would be redundant.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were declaring the destroy callback function as taking a pointer for the
vulkan object handle and relying on an implicit conversion to the Vulkan
handle type, however that would be incorrect on 32-bit platforms, where
non-dispatchable Vulkan objects (the kind that we may allocate privately during
command buffer recording), are defined as uint64_t, so the signature of the
destry callback type doesn't match the signature of the actual Vulkan
function, leading to bogus results. Fix that by using uint64_t instead.
This fixes compilation warnings and also crashes in some tests when
compiling and executing natively in Rpi4.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were pre-packing the constants from the pipeline state and then
always emitting that at draw time, ignoring dynamic state. This makes
it so we don't prepack at pipeline creation time and we always emit
the correct constants directly the command buffer dynamic state.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We can now implement all depth/stencil blits as compatible color blits,
so let's just have the blit shader interface simply convert any blit with
a depth/stencil format to a compatible color blit (like we were already
doing for s8d24) and get rid of the depth blitting path. This also allows
us to ignore the blit aspect in the blit pipeline cache key.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Now that our blit shader interface supports color writemasks and swizzles,
users can specify depth/stencil blits as color blits properly when needed,
so let's just do instead, which is more straight forward and less error prone.
Because s8d24 to s8d24 blits always require the same conversion to color blit,
the blit shader will handle these automatically, including blits of just one
aspect, however, for scenarios where there are additional semantics to consider
(particularly, copies from/to buffer), it is still up to the blit shader caller
to specify a proper color blit matching the required semantics.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We implement this by blitting the requested region to a linear image
setup to use the buffer memory store at the requested offset.
Because we can't store linear depth/stencil images, we implement
copies of depth/stencil aspects by using a compatible color blit.
To do this, we also need to account for the fact that when we are
copying depth from a d24 format we need to copy them from the MSB
24-bits of each word as provided by the hardware and store them
in the LSB 24-bit of the buffer (as per Vulkan requirements). This
is achieved by expanding our blit interface to also accept a swizzle
to apply to the source texture.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The page alignment requirement is for UIF images only, and for linear
images it is actually useful to use a 4-byte alignment so we can
use them to write images to linear buffers at arbitrary positions, which
we will need when copying subrects of an image to a buffer.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
An image can be suballocated from a larger memory allocation, in which
case we get a memory offset for the start of the bound region at
vkBindImageMemory. Take that offset into account when doing image
addressing calculations.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
With compressed formats we compute the final height/width dividing by
the block sizes. On some cases the block sizes are bigger that the
original values, or it is not a exact division, so we need to round up
the division.
Fixes tests like:
dEQP-VK.texture.compressed.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
As opposed to returning true only if it can handle the case and it
also successfully processes it.
This is because we expect handled cases to be successfully processed
except in abnormal situations such as out-of-memory errors. If an OOM
is the reason a fallback path fails, we don't want to try another path
(which will likely hit an OOM too): we have already recorded the OOM
error in the command buffer and we just want to stop executing the
command, so just flag the case as handled and move on.
Also, if we don't do this, in an OOM scenario we'll likely end up running
out of fallback paths and end up asserting (on debug builds), which makes
some CTS tests unhappy because they expect OOM to be handled more
gracefully, so this allows us to make CTS happy also in debug builds,
which is convenient.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
In "v3dv: implement stencil aspect blits for combined depth/stencil format"
we only implemented the support we needed to implement partial copies.
Copies only allow to copy a single aspect, so we really only supported blitting
either the depth aspect or the stencil aspect, but not both, however,
actual blits (as per vkCmdBlitImage) allow to blit both stencil and depth
aspects at the same time, so this adds support for that.
Finally, this also fixes the fact that we were not really masking color writes
effectively for stencil-only blits, since create_blit_pipeline() would check
the requested aspect to see if it would need to mask writes, but by the time
we called this, we had already switched the aspect to color. The reason this
was not caught before is that for copies this would only mean that when we
copied stencil we would also copy depth, and the image copy CTS tests are
probably copying both aspects anyway.
Fixes:
dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.depth_stencil.*d24_unorm_s8_uint*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The idea is that we also want to use the blit mechanism to implement
the copy, like we do for partial image copies. Unfortunately, we can't
sample from a linear image, so we first need to upload the buffer
contents to a tiled image, and then blit from that image to the
destination, which is not great for performance or memory usage.
In the future, we mihgt be able to do better by using a specialized
shader for these copies that takes a UBO as input instead of a texture.
The shader would then be able to access the linea buffer through the
UBO directly without having to copy the buffer contents to a tiled
image first.
This only supports color images for now, we will add support for
depth/stencil images separately.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Since we don't support rendering to compressed formats, we implement this
by using a compatible format with the same bpp as the compressed format.
However, we need to take into account that the underlying compressed image
bpp is computed for a full block, so when we specify regions on the
compressed image, we need to divide offsets and dimensions by the block
size.
This works well for anything that copies to a compressed format using
the TLB, but it doesn't specifically address copies from compressed
formats to other compatible images. These go through the blit path
and require to copy by blitting (texturing) from the compressed format
to another format. In this case, we choose a comptible format with
the per-texel bpp (not the block bpp) of the compressed format so it
matches the setup for the blit operation.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
To do this we just implement the stencil blit as a masked color bit
with uint8 format. This allows us to support blitting on combined
depth/stencil formats, and therefore, also partial image copies
for these formats.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
For this we use blits with nearest filtering and choose a compatible
format for the render target if the copy format is not renderable.
This works for all supported formats except combined depth/stencil
(for which we don't support blitting for now) and compressed formats.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far we were just asserting or aborting if any of the internal
method used during the pipeline creation failed.
We needed to change the return value of several methods, in order to
bubble up the proper memory allocation error.
Note that as the pipeline creation is complex and splitted in several
methods, if an error happens in the middle, it returns back, and rely
on the higher level to call PipelineDestroy. This method needs to take
into account that some of the resources could have not been allocated
when freeing it.
Also note that v3dv_get_shader_variant is used during the pipeline
bind, as with the new resources bound, we need to check if we need to
recompile a new variant. At that moment we are not creating a new
vulkan object so we can really return a OOM error. For now we just
assert on that case.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This was allocating image views in the stack, which was kind of
hackish, and of course was expecting that allocated Vulkan objects
could be immediately freed after being recorded in the command buffer
which is not always safe to do in the general case (even if it was
here). This makes things more consistent and reliable.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This uses the framework to register private commmand buffer objects
that get freed automatically when the command buffer is destroyed by
the application.
This change also moves the descriptor set pool that the meta blit path
uses to allocate descriptors for the blit source textures, from the
device to the command buffer, so we can have a descriptor pool per
command buffer. This is necessary to ensure correct behavior when
doing multi-threaded command buffer recording (alternatively, we would
have to lock around the descriptor set allocation code, which would be
undesirable).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This allows the driver to register private Vulkan objects it creates as part
of command buffer recording (usually for meta operations) in the command
buffer, so they can be destroyed together with it.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The hardware can't do sampling from raster depth/stencil textures and
1D images are always raster, even if the user requested optimal tiling.
Using an image as the source of a blit is a transfer source operation,
so we can't expose that either, as blitting involves sampling.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
In this case the hardware seems to copy the bits that actually fit
in the destination instead of clamping to the maximum value allowed
by the bit size of the destination components like Vulkan expects.
Fix this by adding code to clamp the color results to the bit size
of the destination components.
It should be noted that this is a general issue with the hardware,
and while we can fix it here for blits done by the driver, user
shaders writing outside the range of the destination bit size will
have the same issue and we probably don't want to add code to clamp
every single render target write in every shader with integer format.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The pipeline stages have a reference to the NIR code produced from
the SPIR-V shader modules, but they never destroy it.
It should also be noted that our coordinate shader stage was sharing
the NIR with the vertex shader stage, which is kind of tricky to handle
and probably very error prone. Just make sure each pipeline stage has
owns it NIR shader and that we always free it when the stage is
destroyed.
Also, for the case of NIR modules created by the driver internally,
we always need to clone them, since the driver will destroy the NIR
as soon as it is done creating pipelines with it. We could also not
clone it and let the pipeline stage take ownership of the NIR code for
NIR modules, but that would be inconsistent with how ownership works for
SPIR-V modules.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This needs to be updated everytime we bind a new pipeline, but we can
bind a pipeline and not have an actual job yet, so we want to postpone
this until we actually need to emit CFG_BITS, during the pre-draw
setup.
Also, rename the update helper to be about the job rather the command
buffer, since it is updating state that we track per job.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were asserting that we had a valid subpass index, but we can have
meta operations that run outside a render pass, such as for blitting.
If we allow this, then we also need to account for the fact that
pipelines can be bound outside a render pass too.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
While very limited in scope, this might be the most efficient way to blit
when applicable. In fact, we might also want to use this for the image copy
commands when possible instead of the TLB.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This is similar to the scenario where we have a submit without
any command buffers, even if we don't have any actual GPU work to do
we still might need to signal fences/semaphored and possibly wait on
previous jobs to finish, so we need to submit something to the kernel
to get all that done right.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The design for queries in Vulkan requires that some commands execute
in the GPU as part of a command buffer. Unfortunately, V3D doesn't
really have supprt for this, which means that we need to execute them
in the CPU but we still need to make it look as if they happened
inside the comamnd buffer from the point of view of the user, which
adds certain hassle.
The above means that in some cases we need to do CPU waits for certain
parts of the command buffer to execute so we can then run the CPU
code. For exmaple, we need to wait before executing a query resets
just in case the GPU is using them, and we have to do a CPU wait wait
for previous GPU jobs to complete before copying query results if the
user has asked us to do that. In the future, we may want to have
submission thread instead so we don't block the main thread in these
scenarios.
Because we now need to execute some tasks in the CPU as part of a
command buffer, this introduces the concept of job types, there is one
type for all GPU jobs, and then we have one type for each kind of job
that needs to execute in the CPU. CPU jobs are executed by the queue
in order just like GPU jobs, only that they are exclusively CPU tasks.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Most of our state doesn't carry over across jobs, so it needs to be re-emitted.
For example, if we have two render passes running back to back using the
same pipeline, the application could decide to only bind the vertex buffer
or/and the pipeline just once, but as soon as we record the second render
pass and create a new job for it we will need to re-emit the shader state
record for it just because it is a new job.
We could probably only do this for jobs inside a render pass, since those
are the only ones that actually draw something and need to care about
dirty state, however, there is no harm in doing this for all jobs, for the
same reason.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were enabling VK_FORMAT_FEATURE_SAMPLED_IMAGE_FILTER_LINEAR_BIT for
any format valid for texturing, but for example, right now we don't
support linear filtering on any depth format.
This is needed to get some hundreds of tests like this:
dEQP-VK.pipeline.sampler.view_type.1d.format.r32g32_sfloat.mag_filter.linear
properly skipped (those were all Crashes with the simulator, and
almost all Fails with the real device).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
When testing if we could merge the new subpass into the previous one
We were taking the subpass index from the state (which isn't updated
to the new subpass until a bit later when the job for the new subpass
has been settled). This means that we were doing the merge checks with
the previous subpass, not the current one.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
There are some texture operations (like mipmap query levels) that
doesn't require a sampler. In fact, you should ignore it. So we need
to take it into account when combining the
indexes. nir_tex_instr_src_index is returning a negative value to
identify that case, but as we are using a uint32_t to pack both values
(for convenience, easy to pack/unpack the hash table key), we just use
a uint value big enough to be a wrong sampler id.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This requires that we emit a specific draw command and that we emit
the base instance if not zero right before the instanced draw call.
Notice that we were already doing this for instanced indexed draw
calls, so here we are only adding this for non-indexed draw calls.
We also need to flag whether the vertex shader reads the base instance
in the shader record (which it will if it reads uses gl_InstanceIndex,
as that is lowered in Vulkan to base_instance + instance_id).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Vulkan lowers gl_InstanceIndex to load_base_instance +
load_instance_id, so we need to implement loading the base instance in
the compiler.
The base instance is set by the BASE_VERTEX_BASE_INSTANCE command
right before the instanced draw call and it is included in the VPM
payload together with the InstanceID and VertexID if this is requested
by the shader record.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
OpenGL doesn't have the concept of individual texture and sampler, so
texture and sampler indexes have the same value. v3d compiler uses
this assumption, so for example, the texture info at the v3d key
include values that you need to use the texture format and the sampler
to fill (like the return_size).
One option would be to adapt the v3d compiler to handle both, but then
we would need to adapt to the lowerings it uses, like nir_lower_tex,
that also take the same assumption.
We deal with this on the Vulkan driver, by reassigning the texture and
sampler index to a combined one. We add a hash table to map the
combined texture idx and sampler idx to this combined idx, and a
simple array to the opposite map. On the driver we work with the
separate indices to fill up the data, while the v3d compiler works
with the combined one.
As mentioned, this is needed to properly fill up the texture return
size, so as we are here, we fix that. This gets tests like the
following working:
dEQP-VK.glsl.texture_gather.basic.2d.depth32f.base_level.level_2
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were using the subpass render target index to index into the framebuffer,
which is not correct, since the framebuffer is defined for the render pass.
We should use the attachment index instead.
Fixes:
dEQP-VK.renderpass.suballocation.attachment_allocation.roll.{40,48}
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were using the subpass render target index to index into the framebuffer,
which is not correct, since the framebuffer is defined for the render pass.
We should use the attachment index instead, which we were already computing
but that we were not actually using for indexing by mistake.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We must update our check for whether the render area is tile-aligned for
each subpass, since the hardware will update tile sizes for each RCL.
Fixes:
dEQP-VK.renderpass.suballocation.attachment_allocation.roll.8
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
While this was already being achieved by the scissort rect set on the
pipeline, we still want to limit the render area to we reduce the tile
coverage of the pass as much as possible and avoid unnecessar
tile load and store operations.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This is the same as the subpass start version, only that it won't
emit subpass clears. This is necessary when resuming a subpass
from a partial clear to make sure we don't try to clear subpass
attachments again.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Since a meta partial clear starts a new render pass, we need to store
all state that can be changed with vkCmdBeginRenderPass.
Also, since the meta clear pipeline sets dynamic state, we also
have to restore that.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
They are bound at the set layout, and cannot be changed. From
VkDescriptorSetLayoutBinding spec:
"pImmutableSamplers affects initialization of samplers. If
descriptorType specifies a VK_DESCRIPTOR_TYPE_SAMPLER or
VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER type descriptor, then
pImmutableSamplers can be used to initialize a set of immutable
samplers. Immutable samplers are permanently bound into the set
layout and must not be changed; updating a
VK_DESCRIPTOR_TYPE_SAMPLER descriptor with immutable samplers is
not allowed and updates to a
VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER descriptor with immutable
samplers does not modify the samplers (the image views are updated,
but the sampler updates are ignored)"
We stored them as part of the set layout. It also means that when we
need the sampler (like for texture operations) we can't just ask for a
descriptor, as it would not have the sampler. A new method is created.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The problem with this is that TLB clears always clear and store full
tiles, so if our render area is not perfectly aligned to tile boundaries
we end up clearing all pixels in tiles that are only partially covered.
In this scenario we have to avoid using TLB clears and instead fallback
to clearing by rendering a scissored quad in the clear color, like we do
for partial clears in vkCmdClearAttachments.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
There are some scenario where this won't happen and don't imply a bug.
For example, if we find a pipeline barrier, we will finish the current
job automatically and won't start a new one. There may be other
scenarios where we may want to do the same.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
And flag dirty scissor state if the render area is constraining the
current clip window, so that we emit a new clip window with the next
draw call.
Also, remove the early emission of a clip window for the render area
if we didn't have any scissor state. TLB clears ignore the clip
window, so this was doing nothing for us.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far we have been caching the first pipeline we produced and always
reusing that, which is obviously incorrect.
This change implements a proper cache and also takes care of releasing
the cached resources when the device is destroyed.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This is achieved by rendering a quad in the clear color for each layer
of each attachment being cleared. Right now we emit each clear in a
separate job with a single attachment framebuffer, but in the future
we may be able to extend the solution to using multiple render targets
and clear multiple attachments with a single job.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
According to the Vulkan 1.0 spec:
"attachmentCount is the number of VkPipelineColorBlendAttachmentState
elements in pAttachments. This value must equal the colorAttachmentCount
for the subpass in which this pipeline is used."
so let's assert exactly that.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
For arrays we are adding one entry on the map per array element. This
makes getting back the descriptor for each array element easier, as
for example, for ubo arrays, each array element can be bound to a
different descriptor buffer.
For samplers arrays this would also make sense.
Fixes crashes on tests like:
dEQP-VK.binding_model.shader_access.primary_cmd_buf.combined_image_sampler_mutable.fragment.descriptor_array.2d
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Right now shader variant update on the cmd_buffer is based on populate
a new key using the descriptor bounds, assuming that we would get one
final descriptor for any usage on the shader. But if the descriptors
are being bound with more that one call to CmdBindDescriptorSet, that
would not be true, as the first calls would not bind all the
descriptors. Right now this was raising an assert.
Now we allow that as possible, and for the case of checking variants,
we just stop it, and we don't clean up the SHADER_VARIANT flag.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Specially after CmdBindDescriptorSets, it is likely that we would need
a new shader variant, like for example if sampler descriptor sets are
bound.
At that moment a new v3d key is populated, using as base the one used
at pipeline creation, so only cmd_buffer depending values are changed.
Then a new variant is requested. Note that internally it is handled
with a cache, so no new compilation will be done if not needed.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far, we were doing the compilation to qpu when the pipeline was
created (as part of vkCreateGraphicsPipeline).
But this would not be correct when some specific descriptors are
involved, like textures. For that case some nir lowerings depend on
the texture format, and that info is not available until the specific
descriptors are bound to the command buffer. In the same way, the same
command buffer with a given pipeline could get their descriptor bound
again.
So it would be needed to support compilation variants of the same
shader. So finally, the v3d_key would work as keys, as the variants
would be tracked with a hash table.
This commit introduces the new structures for that. What we were
building as the final qpu shader would become the initial default
variant for the pipeline. We are also saving the keys used at that
point, to avoid needing to fully regenerate them when a new variant is
created. Not just for performance, but also to avoid needing to track
the graphics pipeline create info structure.
The code to handle updating the current variant would be done on
following commits.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This include SAMPLER, COMBINED_IMAGE_SAMPLER and SAMPLED_IMAGE
descriptors.
In order to support them we do the pre-packing of TEXTURE_SHADER_STATE
and SAMPLER_STATE when Images and Samplers (respectively) are
created. Those packets doesn't need to be tweaked later, so we upload
them to an bo.
A possible improvement of this would be that the descriptor pool
manages a bo for all descriptors, that suballocate for each descriptor
allocated. This is what other drivers do (and as far as I understand,
one of the reasons of having a descriptor pool).
Immutable samplers are not supported, will be handled on a following
patch.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
v3dv_descriptor is going to be expanded with more data, so it doesn't
make sense anymore to handle a fake descriptor for the push
constants. Introducing a new struct, that is just a pair
bo/offset. Initially named v3dv_resource, as it could be the base to
reuse bos for different resources (like assembly bo)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were configuring the TLB to use ABGR1555, but that doesn't really
give us what we want. There were two issues:
* We were using the wrong Texture Data Format and Output Image
Format. In fact those we need to use were not included on the
packet file.
* Even using the correct one, we need to do a RB swap to match the
semantics of the Vulkan format.
This patch fixes both issues. As we are here we keep the formats we
were already used, that would provide support for r5g5b5a1.
So this patch makes tests like the following going from skip to pass:
dEQP-VK.texture.filtering.2d.formats.r5g5b5a1_unorm.nearest
And the following test from fail to pass:
dEQP-VK.texture.filtering.2d.formats.a1r5g5b5_unorm.nearest
Note that the R5G5B5A1_UNORM_PACK16 is not mandatory, but as we
already made the effort to understand them and get them working let's
just keep it on the list
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This fixes multi-layer vkCmdClearAttachments CTS tests. The underlying
problem here is that even though this command runs inside a render pass,
it is implemented as a separate job that emits its own RCL to program
render target color clears, so we should not emit the subpass RCL for it.
Fixes 250+ CTS tests (all but a1r5g5b5) in:
dEQP-VK.api.image_clearing.core.clear_color_attachment.cube_layers.*
dEQP-VK.api.image_clearing.core.clear_color_attachment.multiple_layers.*
dEQP-VK.api.image_clearing.core.clear_depth_stencil_attachment.multiple_layers.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We had changed the interface for job starts so they take the subpass index
rather than a boolean indicating whether the job starts a new subpas, but we
forgot to update this accordingly.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were not considering that the depth of the image is minified according
to its miplevel. For some reason this only seemed to show up for tiled
images.
Fixes (except a1r5g5b5 format):
dEQP-VK.api.image_clearing.core.clear_color_image.3d.optimal.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
If a format is not supported by the TLB, we can still use the TLB path
if we setup the render target using a compatible format. The only caveat
is that for clears we need to pack the clear value using the original
format of the underlying image, not the compatible format.
With this change we get to use the TLB path successfully for all supported
image formats (except a1r5g5b5, at least for now) so long as the region starts
at (0,0), and we only need to consider fallback paths for partial copies
and clears, not because of the format.
This gets us to pass a few extra hundreds of tests in:
dEQP-VK.api.image_clearing.core.clear_color_image.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This is required to pass line rasterization tests in CTS while exposing
at least 4 bits of subpixel precision, which is the minimum required
by the spec. We are currently exposing 6 bits, however, if we select
diamond exit instead of perp end caps rasterization, then even if we
lower subpixel precision bits to 4 bits, we'd still fail one of the tests.
Fixes:
dEQP-VK.rasterization.flatshading.line_strip
dEQP-VK.rasterization.flatshading.lines
dEQP-VK.rasterization.interpolation.basic.line_strip
dEQP-VK.rasterization.interpolation.basic.lines
dEQP-VK.rasterization.interpolation.projected.line_strip
dEQP-VK.rasterization.interpolation.projected.lines
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The name suggests that this method emits the full graphics pipeline,
but that is not the case (ie: scissor is emitted at a different
point).
Right now that method is mostly emitting the gl_shader state plus some
other packets. So we just renamed it to emit_gl_shader_state, and move
the other packet emission to new emission methods.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
First this makes it so we only clear dirty stencil state if we actually
emit the stencil packets. Second, now we check if we need to emit stencil
whenever a new pipeline is bound, since a new pipeline may not change the
dynamic stencil state but might still be changing other aspects of stencil,
which means that even if the dynamic stencil state is not dirty, we might
still need to emit new stencil packets.
This fixes a regression in VkRunner test depth-buffer.vk_shader_test after
we dropped the redundant emission of stencil state, since that redundant
emission was happening unconditionally whenever we had a new pipeline.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The current implementation assumed that we would clear all dirty state
after we have emitted a pipeline, but that is not always true. In
particular, we don't emit blend constants unless we need them, so we
can't clear its dirty bit until we have bound a pipeline that actually
requires them.
The change implemented here has individual emit functions clear pipeline
states they hadle as they emit the corresponding state and we clear
the dirty pipeline bit at the end.
This fixes some CTS pipeline blend tests where we have multiple draws
with blending and only some of them require blend constants. In this case,
the original behavior would clear the blend constants dirty bit on draw
calls that don't actually emit blend constants (because they don't need
them), making the later draw calls that do need them fail because they
don't emit them either (since the previous draw calls cleared the dirty
bit).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Not quite sure why this is required though. Conversion from/to
sRGB happens on tile loads and stores, with the tile buffer
being always linear, so there should be no difference.
Fixes all test failures in:
dEQP-VK.pipeline.blend.format.r8g8b8a8_srgb.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
In this mode, which can be activated with V3D_DEBUG=always_flush like
in the GL driver, we flush every draw call separately. For now this
is useful for debugging, but we can also set the flag internally on
specific jobs when we identify scenarios where we need the same behavior.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
It is valid to submit with an empty list ofcommand buffers, however,
we still need to wait on the pWaitSemaphores provided and only signal
the pSignalSemaphores and fence once we have finished waiting on them
to honor the semantics of the submission.
Because waiting and signaling happens in the kernel, the easiest way
to do this is to submit a trivial no-op job to the GPU. To do this,
we need to refactor some of our code so that code that might have been
operating on a command buffer starts operating on a job instead, so we
can resuse most of our infrastructure to create the no-op job.
Additionally, because no-op jobs are created internally by the driver,
we are responsible for destroying them too. For this, we bind a fence
to each no-op job we submit and we test for completion of in-flight
no-op jobs (and destory them if completed) every time vkQueueSubmit
is called.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
There are a handful of tests that simulate 'out of memory' situations
during swapchain image creation, and these can lead to failed job
allocations when the driver is running on the prime blit path, as that
involves creating a command buffer. The tests expect us to handle this
scenario gracefully and return an appropriate OOM error as a result.
This make sure we don't try to dereference a job if we failed to allocate
it so we don't crash and can return the OOM error gracefully in the
process.
Fixes:
dEQP-VK.wsi.xlib.swapchain.simulate_oom.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Vulkan has Z NDC range in [0, 1], we where using OpenGL's [-1, 1].
Fixes:
dEQP-VK.draw.inverted_depth_ranges.nodepthclamp_deltasmall
dEQP-VK.draw.inverted_depth_ranges.nodepthclamp_deltaone
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
In this scenario we can end up generating a clip window where
the max coordinates are smaller than the min coordinates and the simulator
asserts.
Fixes:
dEQP-VK.draw.scissor.dynamic_scissor_outside_viewport
dEQP-VK.draw.scissor.static_scissor_outside_viewport
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were not doing this right for images created with VK_IMAGE_TILING_LINEAR.
Also, only assign a DRM modifier if the image has been created for WSI.
This fixes a bunch of CTS tests that use copies to linear images to verify
the result of rendering.
Fixes multiple failures in:
dEQP-VK.draw.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This triggers when dumping CLIF because the dump process involves
internally mapping all the BOs. We could unmap them there after we
are done, but there is really no reason why we need to assert on this,
so let's just keep things simple and unmap. If the user is really
double mapping, that should be caught by the validation layers.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
It is possible to specify a depth/stencil clear but then have no
actual depth/stencil attachment used in the subpass. In that case
we are already skipping the store.
Fixes:
dEQP-VK.renderpass.suballocation.unused_clear_attachments.*depthonly_unused
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
For some reason our backend compiler doesn't have an implementation for
usubborrow, only for uaddcarry. We could add it, however, the existing
uaddcarry implementation also seems to fail some of the CTS tests,
which pass if we lower.
Fixes:
dEQP-VK.glsl.builtin.function.integer.uaddcarry.*
dEQP-VK.glsl.builtin.function.integer.usubborrow.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
They still need some review to get some real final values, but what we
had before were somewhat too low. Increasing them a little. This
allows to get some CTS tests from skip to pass, which afais they are
using reasonable values.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This gets us to split matrix vertex inputs with direct access to
vectors, which the backend compiler expects. The GL driver seems to
rely on this too, which is called by the Mesa state tracker.
Cases with indirect cases seem to be handled at link time via
nir_lower_io_arrays_to_elements, which we are already calling.
Fixes crashes in:
dEQP-VK.glsl.conversions.matrix_to_matrix.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Push constant tests were still working without taking this into
account because we can't really fine graine how much we allocate for
them.
For now we only use them to know if we should allocate/fill the ubo
for push constants or not.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Also, Vulkan uses the same provoking vertex rules are Direct3D, which
changes from OpenGL's default. Make sure we honor that.
Fixes:
dEQP-VK.spirv_assembly.instruction.graphics.cross_stage.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We had 4 really small functions for the io lowering (which main
purpose is getting locations assigned).
This commit merge them to 2 slightly bigger functions. It also fix a
typo were a vs lowering was calling nir_assign_io_var_locations using
MESA_SHADER_FRAGMENT.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The main reason is getting nir_lower_io_to_temporaries and
nir_opt_peephole_select to get some switch/ifelse with store outputs
simplified out, as some tests were failing because the v3d compiler
was not able to handle them.
As this needed some extra lowerings to get that working, we are also
revamping the full nir processing.
Heavily based on intel preprocess/optimize nir passes.
As mentioned on some other v3dv commits, some of this work could be
done by adding those passes to the v3d compiler, allowing to avoid
some duplication. But at this point we prefer to keep the v3d compiler
untouched as much as possible. This could be revisited on the future.
We also remove some passes that are unnedeed or we already know are
called by v3d_compiler. Although we try to not add too many passes,
we are already adding passes in advance that we think that would be
useful in the near-term.
Among others, gets the following tests working:
dEQP-VK.binding_model.descriptor_copy.graphics.storage*
dEQP-VK.binding_model.descriptor_copy.graphics.uniform*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This was intended to check that we only got
VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT
on secondary command buffers, but the spec states that this flag
should be ignored for primary command buffers.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were clearing memory to 0 on create and reset, including the
loader data, which is not correct on reset since it would cleat
the loader dispatch table for the command buffer. We should only
clear driver data.
Also, don't use vk_zalloc for the command buffer allocation, since
we are already clearing on reset and we always reset when we begin
recording.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were getting driver location 0 on all output variables, which meant
that our color writes were always being redirected to render target 0.
Fixes dEQP-VK.pipeline.framebuffer_attachment.diff_* which uses multiple
render targets.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This puts all the information required to setup frame tiling into
v3dv_frame_tiling so we no longer need a framebuffer to start a
frame. This makes the code simpler, since frame tiling calculations
happen automatically when we start a new frame and simplifies
the implementation of copy and clear operations that used to
requiere that we setup a fake framebuffer with no actual attachments,
which was a bit of a kludge.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far we have been getting away with computing frame tiling information for
the framebuffer object, but that is not correct, since different subpasses
may access different subsets of the framebuffer, with each requiring a
different configuration because the number of render targets and the maximum
bpp can change for each subpass.
This adds a v3dv_frame_tiling struct to keep the frame tiling information and
rewrites the code to compute this for every new job we start.
Fixes a bunch of tests in dEQP-VK.pipeline.render_to_image.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
When we create a new job for a new subpass, we might have to finish
the current job for the previous subpass, so it is important that we
we don't get ahead of ourselves and increment the current subpass index
in the command buffer state until we are really done finishing the
previous job.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were storing the format of the base miplevel in the image view and
we were typically using that instead of the taking the format from
the appropriate image slice. This was a problem when loading or storing
a miplevel other than the base which happened to have a different format.
This also removed the tiling field from the image view to avoid repeating
the same mistake in the future.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This corresponds to PIPE_FORMAT_B5G6R5_UNORM, which is the format that
is natively supported. Also, we can't swap R/B on 3-channel images!
Also, we should rely on the v3dv format table for this rather than
pipe format descriptions since we specify the expected correct swizzles
there for all supported formats. This, for example, gets us correct
beahvior for things like VK_FORMAT_B4G4R4A4_UNORM_PACK16 without
needing to special case it.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This is actually mandatory for any implementation so there is no
point in not supporting it.
This probably doesn't work yet and we might need to patch the
compiler to emit bounds testing code for TMU accesses.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We don't support 4-bit multisample yet, but we will at some point.
Also, remove point size granularity/range since we were not meeting the
minimum requires, we might want to review that in the future.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
I am not quite certain that this is the way to go though. Here, we are
expecting that the GPU can set/reset the event inside a command buffer
as a 1x1 pixel clear for example, however, there is still the question
of how we get to implement the command buffer wait on an event, since
reading the docs I haven't found any such functionality to be available.
We could think of implementing this by splitting the command buffer
into multiple jobs at the wait command, and then using a separate
thread for job submissions that would poll the event UBO before sending
it to the kernel, but that looks like a bit of a kludge.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were not doing this and that could lead to the kernel refusing the
job if this happened to be gargabe. Make it zero, meaning that we don't
want to keep our bin jobs waiting for anything.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
I accidentally tried to clear D+S of a depth-only image which was not caught
by the validation layers in my environment. This made the simulator crash, but
tracking down the crash to the actual error was not trivial. This should make
it immediately obvious.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
By default they are trivially lowered to load_uniform.
We still need to allocate an UBO for push constants, used for those
that are accessed using a non-const index. This is automatically
handled by the compiler, as it cames back as asking a
QUNIFORM_UBO_ADDR. This is what already does for gallium.
Note that if needing the UBO, we are uploading the full push constant
data. An improvement would be to try to upload only the data that
needs to rely on the UBO (so non-const accesses to uniforms).
Also, the code is not handling getting out of space from the UBO
bo. This would be tackled at a different commit.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
As with vkCmdCopyImageToBuffer, this implements a fast path using the
TLB for the case where we are copying images from offset (0, 0). We
don't have the fallback path for other cases yet.
For this we need to rethink a bit our loads and stores so we can handle
depth/stencil copies correctly. Specifically, we are expected to do
different things depending on whether we are copying to a linear buffer
or to an image.
When copying depth/stencil to a buffer, we can only copy one aspect
at a time and the result should be tightly packed in the destination
buffer.
When copying depth/stencil to an image, we can copy one or both aspects,
and we need to write them in the corresponding aspect of the destination
image.
Because we can't do stores from the Z/S tile buffers in raster format,
we need to do image to buffer copies of these aspects using the a color
tile buffer and a compatible color format. However, when we are copying
to another image, since we need to write the channels in the corresponding
aspect of the destination image, we need to do this using by loading and
storing from an appropriate Z/S tile buffer.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
For some reason we were ignoring the extent to copy that was
passed in the region to copy and instead we were computing a
a region based on the image size and the selected miplevel.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Some of these may need additional work to work for real, but we should
be able to support them.
We also include some formats that are not supported for images, but
that we want to support for buffers, such as R32G32B32 for a vertex
buffer. In the future we might want to expand the format table to
specify which formats are supported for buffers.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
From the Vulkan spec:
"When an image view of a depth/stencil image is used as a
depth/stencil framebuffer attachment, the aspectMask is ignored
and both depth and stencil image subresources are used."
So in that scenario, we ignore the aspect mask on the view and go
check the actual format of the underlying image to decide if we
have depth or depth+stencil aspects.
This gets VkRunner's depth-buffer.shader_test to pass.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
For now this only implements a fast path using the tile buffer, so it
can only be used when clearing full images, but this is good enough
for VkRunner.
The implementation is a bit tricky because this command executes
inside a render pass, and yet, since we are using the tile buffer to
clear, this needs to go in its own job. This means that with this, we
need to be able to split a subpass into multiple jobs which creates
some issues.
For example, certain operations, such as the subpass load operation
(particularly if it is a clear) should only happen on the first job of
the subpass and subsequent jobs in the same subpass should always
load.
Similarly, we should not discard the last store on an attachment
unless we know it is the last job for the last subpass that uses the
attachment.
To handle these cases we add two new flags to the job, one to know if
the job is not the first in a subpass (is_subpass_continue) and
another one to know if a job is the last in a subpass
(is_subpass_finish).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This prevents a crash when the validation layers are enabled for some
of our ubo tests. Again, it seems that if some attachments are
missing, the validation layers remove some of the structs from the
pCreateInfo.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
For that we include the array_index when asking for a ubo/ssbo index
from the descriptor_map.
Until now, array_index was not included, but the descriptor_map took
into account the array_size. This had the advantage that you only need
a entry on the descriptor map, and the index was properly return.
But this make it complex to get back the set, binding and array_index
back from the ubo/ssbo binding. So it was more easy to just add
array_index. Somehow now the "key" on the descriptor map is the
combination of (set, binding, array_index).
Note that this also make sense as the vulkan api identifies each array
index as a descriptor, so for example, from spec,
VkDescriptorSetLayoutBinding:descriptorCount
"descriptorCount is the number of descriptors contained in the
binding, accessed in a shader as an array"
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Focused on getting the basic UBO and SSBO cases implemented. So no
dynamic offset, push contanst, samplers, and so on.
This include a initial implementation for CreatedescriptorPool,
CreateDescriptorSetLayout, AllocateDescriptorSets,
UpdateDescriptorSets, CreatePipelineLayout, and CmdBindDescriptorSets.
Also introduces lowering vulkan intrinsics. For now just
vulkan_resource_index.
We also introduce a descriptor_map, in this case for the ubos and
ssbos, used to assign a index for each set/binding combination, that
would be used when filling back the details of the ubo or ssbo on
other places (like QUNIFORM_UBO_ADDR or QUNIFORM_SSBO_OFFSET).
Note that at this point we don't need a bo for the descriptor pool, so
descriptor sets are not getting a piece of it. That would likely
change as we start to support more descriptor set types.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The general idea is that we always emit from our dynamic state, and when
a particular piece of state is not dynamic, we just set our dynamic state
from the pipeline state, however, the implementation was quite confusing:
the mask of dynamic states flagged states that were not dynamic and some
places woud mix dirty flags and dynamic state flags. We also were not
updating the dynamic state mask in the command buffer, etc.
This patch, hopefully, simplifies all this and makes it less confusing,
starting by making the dynamic state mask flag dynamic states, fixing
the places where we would confuse dirty state flags with dynamic state
flags, making sure that our command buffer state is setup correctly
and that we only emit state when it is actually dirty.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were not computing viewport transform for static viewports provided with
the pipeline state. Also, we were not copying the transform into the command
buffer state when we bound the pipeline.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Just as with color copies, we only support tile buffer copies for now,
so we can only copy regions that start at offset (0,0).
Because the hardware doesn't support storing depth/stencil buffers to
raster format, we need to load and store our data to/from the tile
buffer as a compatible color format instead.
The Vulkan spec also has specific expectations regarding placement
of X8/S8 bits in 24-bit depth formats which don't match the hardware's
so these formats require specific work so we can swizzle channels
to match the Vulkan spec.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This is only used when doing a clif/cle dump, but makes it far easier
to understand.
Most names are the same that the ones used at v3d (CL, tile_alloc,
TDSA), except those that on v3d were labelled as "resource", as right
now we don't have a resource uploader that englobes different
things. In fact, the good thing of not having that uploader is that
individual bos has a more accurate description of their purpose.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Otherwise we would lose updates relevant to subsequent subpasses in
the same renderpass that read or partially write the attachment.
The only scenario where we can safely do this is on the last subpass
that uses the attachment, so long as we don't need to emit the store
for the clear.
This also fixes a bug in the computation of the first subpass that
uses an attachment.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This ignores stencil for now and focuses on depth testing without
support for early depth testing.
To implement this we need to start considering how many of our
framebuffer attachments are color attachments, since some of the
computations we use to determine tile sizes and binning configuration
depend on this.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We don't need to do this, since we are just copying. Also, we are not
swapping on the store, so doing it on the load would be incorrect.
This gets the prime blit present path in WSI common to render R/B
channels correctly after blitting from the BGRA image to the linear
buffer for display output.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
When running on the real hardware we have two devices: the v3d render
node and the vc4 display node. We need the latter to allocate
winsys BOs for v3d to render into. Since exporting these BOs is
a privileged operation, we need to obtain the fd for this device
through the display server. For now we only support doing this through
the XCB DRI3 platform.
Also, do not duplicate or re-open the DRM devices when creating logical
devices. The simulator checks that the file descriptor is exactly
the same we used to initialize it when we created the physical device
and aborts if it sees a different fd number, even if it points to the
same device.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Among other things, gets a constants output from a vs, used as input
to a fs, to get lowered and moved as a load const on the fs.
Heavily based on st_glsl_to_nir, already used by the v3d
driver. Slightly adapted to our needs, but there are still room for
customization.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
If either of bufferRowLength or bufferImageHeight are zero, then that
aspect alone of the image is tighly packed. We were assuming that if
either was zero both aspects were tightly packed, which is wrong.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This is so we can generate entry points for extensions that have been
promoted to core in 1.1.
Entry points for promoted extensions are aliased without the KHR suffix
in the Vulkan API XML, and the entry point generation scripts are designed
to point the dispatch tables to entry points generated from the non-aliased
function names, however, these are not included in the header file without
this change.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This includes:
* Implementation for CmdBindVertexBuffers
* Gather vertex input info during CreateGraphicsPipelines
(pipeline_init) and SHADER_STATE_ATTRIBUTE_RECORD prepacking
* Final emission of such packet during CmdDraw
(cmd_buffer_emit_graphics_pipeline)
Default attributes values will be handled on a following patch.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We can't prepack all the record, as addresses need the job, and
uniforms depend on dynamic value.
Also due cl_emit_with_prepacked and v3dv_pack asserting correct
values, we need to define two values twice, that lead to move
vpm_config to the pipeline. In any case, the latter will be useful
when we start to prepack more stuff.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So we do what we actually state in the comment. Particularly, the load
operation only affects the first subpass that uses the attachment,
after that we always want to load, but we were only doing that for
attachments marked as CLEAR.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Generally, we can do this when they render to the same collection of
attachments and we only need to emit a single RCL for them.
To implement this, we need to track the first subpass that is included
in the job and rewrite our loads and stores in the RCL to refer to that
subpass instead of the current subpass (which would be the last included
in the RCL).
When we merge jobs we also reuse the tile state/alloc BOs and we only
emit the binning setup once.
The environment variable V3DV_NO_MERGE_JOBS can be set to disable
job merging and have each subpass be in a separate job. This can be
useful for debugging issues spawning from incorrect subpass merges.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Binner flushes should be emitted naturally at the end of each draw,
if we are finishing a job and it doesn't have the binner flush, it
probably means that we have bogus emission code somewhere.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
As we make progress towards more complex submissions we will need to split
our command buffers into smaller executable units (jobs) that we can
submit indepdently to the kernel. This will be required to implement
pipeline barriers, split subpasses that have depedencies on previous
subpasses, split render passes that use more than 4 render targets, etc.
For now we keep things simple and we only keep one job as current
recording target in the command buffer, and we generate a new one
with every subpass or with any commands we see outside of a render pass
(only vkCopyImageToBuffer for now). In the future we probably want to
optimize this by merging subpasses into the same job when possible,
etc.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
For now we only support the TLB path, which limits us to copying
regions that start at offset (0,0). In the future, we will need to add
a fallback path that uses blitting to copy regions with an offset.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We will use this when we implement copying images to buffers using the
TLB, where we'll need to setup a framebuffer and tiling configuration
for the TLB store to the destination buffer.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Initial port of the equivalent v3d_write_uniforms, to be used by the
cmd_buffer when emitting the drawing packets.
Initially doesn't include all the quniform types, only those needed by
the initial basic vulkan tests.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Before that commit we were calling get_viewport_xform to get those
values twice (to emit scissor and viewport), and we found that we
would need that info even more times. So let's just compute that info
when setting the viewport, and reuse the values.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Add support for V3D_DEBUG=clif. Useful to compare clif_dumps from
vulkan small-programs and the equivalent opengl ones.
As we are here we expand clif_dump_packet wrapper to use
v3d42_clif_dump_packet if needed, as the vulkan driver would use that
packet version.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Values still doesn't take into account having vertex elements data,
but keeps some of that half-done code in comments. It would be better
to do that when we get an example using it.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Starting with Viewport/Scissor data from VkGraphicsPipelineCreateInfo.
Note that initially this can be somewhat counter-intuitive. What we
are really doing it is filling up the structs with the dynamic stuff
from the pipeline, when such is not defined as dynamic. This is what
anv/radv does, and basically means that we treat both in the same way,
so easier after that.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The basic to get the spirv built to nir, including calling some common
nir passes. Pending deep review if all those are needed or if we miss
some, but for that it would be better to be able to run existing
tests.
Enough to get assembly generated for simple tests.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We would need on OpenGL to update values for all the textures used. On
OpenGL that value can be always took from the context or the nir
shader, but there are cases on Vulkan that it is not the case, or
would force up to recompute it.
Acked-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The CLs in the command buffer will reference BOs allocated by the application
such as images and buffers, that will be destroyed by the application, so
destroying them with the command buffer won't be correct.
For now, let's just assume that the comman buffer only owns the BOs
it explicitly allocates and that other abstractions own their own
BOs and are responsible from freeing them.
In the future, we might consider a ref/unref system similar to v3d's, but
for now this should work.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So we have a lower level representation of a buffer object that we can
manipulate that is not tied to a Vulkan representation of memory. This
will be useful as we start allocating driver internal buffers, such as
command lists.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Right now there is not a big reason/difference to implement the
utilities present at v3d_debug for the vulkan driver, so lets just
reuse it.
The other advantage is that is the debug utilities used by common
parts of the driver, like broadcom/compiler
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This relies heavily in infrastructure taken from the v3d driver. We should
probably look for ways to share the code between both drivers by creating
a surface layout library that we can use from both, or at least moving
parts of the v3d driver to broadcom/common. Specifically:
We take v3d_tiling.c, which requires gallium's pipe_box type for some
helper functions that we don't quite need yet.
We copied and adapted bits of v3d_resource.c into v3dv_image.c, however,
it should be possible to look for ways to reuse the code instead of
duplicating it.
Pre-compute UIF padding into the slice setup. This is different from
what we do in v3d (we do this at cerate_surface time), but it is
more convenient for us to pre-calculate it here for all mipmap
slices.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
This takes a subset of vk_format_info.h from Anvil which has some
Intel specific elements. At some point we might want to discuss
if we want to make the file reusable and move the intel bits to
some other place, but it is not a lot of code and for now this works,
so we keep going.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
For now we are only interested in being able to include the header
file for format definitions, so this is enough. When we start actually
emitting packets we will need to provide proper hooks.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Not an actual implementation since this doesn't initialize any actual
physical devices just yet.
Also, this doesn't check that available decices are really compatible
with the driver for now. This is for convenience, so we can move
past this point even if we are not running on actual hardware.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Initial commit, mostly a import of the minimum from anv/radv to get a
skeleton to start to work with.
In includes:
* meson files
* Copy & adapt entrypoints ane extensions scripts from anv (that were
later used on radv)
This is a firt approach, but is is likely that we can remove/simplify
some things.
v2: fix copyright character at broadcom/vulkan/meson.build (Eric)
v3: no spaces inside arrays (Dylan)
v4: add gnu_symbol_visibility (detected by CI on first Merge attemp)
Reviewed-by: Eric Anholt <eric@anholt.net>
squash! v3dv: add v3d vulkan driver skeleton
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Fix defects reported by Coverity Scan.
Uninitialized pointer field (UNINIT_CTOR)
uninit_member: Non-static class member ctx is not initialized in this constructor nor in any functions that it calls.
uninit_member: Non-static class member prog is not initialized in this constructor nor in any functions that it calls.
uninit_member: Non-static class member shader_program is not initialized in this constructor nor in any functions that it calls.
uninit_member: Non-static class member options is not initialized in this constructor nor in any functions that it calls.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7071>
Fix defects reported by Coverity Scan.
Uninitialized pointer field (UNINIT_CTOR)
uninit_member: Non-static class member field options.info is not
initialized in this constructor nor in any functions that it calls.
uninit_member: Non-static class member field options.values is not
initialized in this constructor nor in any functions that it calls.
uninit_member: Non-static class member field options.tableSize is not
initialized in this constructor nor in any functions that it calls.
Suggested-by: Eric Engestrom <eric@engestrom.ch>
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6929>
We don't support SSBOs yet, as we don't expose the
PIPE_CAP_SHADER_BUFFER_OFFSET_ALIGNMENT-cap. But the gallium
state-tracker's limit-calculation gets confused by this half-way
support, and ends up thinking we can support atomics, which we
don't support yet either.
So let's not confuse the state-tracker here, and let's introduce this
again when SSBOs are actually supported.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7113>
Significantly improves performance of a Control compute shader. Also seems
to increase FPS at the very start of the game by ~9% (RX 580, 1080p,
medium settings, no MSAA).
fossil-db (Navi):
Totals from 315 (0.23% of 135946) affected shaders:
SGPRs: 18296 -> 18336 (+0.22%); split: -0.26%, +0.48%
VGPRs: 11856 -> 11844 (-0.10%); split: -0.81%, +0.71%
CodeSize: 2233800 -> 2457508 (+10.01%)
MaxWaves: 4506 -> 4497 (-0.20%); split: +0.04%, -0.24%
Instrs: 438766 -> 486215 (+10.81%); split: -0.00%, +10.81%
Cycles: 7880180 -> 8963340 (+13.75%); split: -0.00%, +13.75%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6558>
locations are important for these because they provide info about how
many block indices each ubo takes up
UBO arrays have nonzero values here. all non-array UBOs have either 0
for the base or nonzero for an io lowered block at an offset,
but only arrays need to be changed here because they're the only ones
with absolute values, whereas all the others are relative.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6272>
while linking uniforms, we might get a variable which is the only reference
to the ubo (i.e., offset 0), as determined by its type being the UBO's
interface_type, at which point we can assign the previously-gotten
block index to this variable's location
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5831>
With arrays we really have to use the correct size for the base
mipmap to get the right array pitch. In particular, using
surf_pitch results in pitch that is bigger than the base mipmap
and hence results in wrong pitches computed by the HW.
It seems that on GFX9 this has mostly been hidden by the epitch
provided in the descriptor but this is not something we do on
GFX10 anymore.
Now this has some draw-backs:
1. normalized coordinates don't work
2. Bounds checking uses slightly bigger bounds.
2 mostly is not an issue as we still ensure that they're within
the texture memory and not overlapping other layers/mips, but
we can't properly ignore writes.
1 is kinda dead in the water ... On the other hand I'd argue that
using normalized coords & a filter for sampling a block view of
a compressed format is extraordinarily useless.
The old method we employed already had these drawbacks for everything
except the base miplevel of the imageview.
AFAICT this is the same tradeoff AMDVLK makes and no CTS test hits
this. (once it does I think the HW is dead in the water ... Only
workaround I can think of is shader processing which is hard because
we don't know texture formats at compile time.)
I also removed the extra calculations when the image has only 1 mip
level because they ended up being a no-op in that case.
CC: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2292
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2266
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2483
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2906
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3607
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7090>
nir_lower_io_arrays_to_elements lowers arrays or matrices to elements,
which ends up to vectors for matrices, but a bunch of IO optimizations
only work for scalars.
Calling it before lower_io_to_scalar_early allows nir_link_opt_varyings
to remove duplicated inputs and replace constant inputs.
fossils-db (Navi10):
Totals from 294 (0.22% of 136546) affected shaders:
CodeSize: 861356 -> 860224 (-0.13%); split: -0.13%, +0.00%
Instrs: 161972 -> 161832 (-0.09%); split: -0.09%, +0.00%
Cycles: 1185680 -> 1185120 (-0.05%); split: -0.05%, +0.00%
SMEM: 31422 -> 31424 (+0.01%)
Copies: 9065 -> 9068 (+0.03%)
Only Talos and Dark Souls 3 are affected.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7041>
With us not creating a bo_list anymore, there is a problem if we
delete a buffer between enumerating all buffers and doing the submission.
Also changes this to a rwlock given the wider scope of the things under
lock. (especialy some of the syncobj stuff is now under the lock)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7091>
Android porting of gen rules as per 22ffc05266 ("util: Move xxd.py to util")
Fixes the following building error:
ninja: error: 'external/mesa/src/compiler/glsl/xxd.py', needed by 'out/target/product/x86_64/gen/STATIC_LIBRARIES/libmesa_glsl_intermediates/glsl/float64_glsl.h', missing and no known rule to make it
Fixes: 22ffc05266 ("util: Move xxd.py to util")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7087>
Since 1e7d82c881 ("nir/algebraic: always lower idiv
to shifts if bitops are allowed") idiv is lowered and
generates a isign operation.
VC4 HW doesn't support isign and lower_isign wasn't enabled.
Enabling it fixes the regressions caused by this new
optimization on piglit tests shaders/glsl-fs-loop-nested.
Fixes: 1e7d82c881 ("nir/algebraic: always lower idiv to shifts if bitops are allowed")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7089>
Before 8e1b75b330 ("nir/algebraic: optimize iand/ior of (n)eq zero") this
optimization didn't need the use of umax/umin. VC4 HW supports only signed
integer max/min operations.
lower_umin and lower_umax are added to allow enabling previous optimizations
behaviour for this cases.
Fixes: 8e1b75b330 ("nir/algebraic: optimize iand/ior of (n)eq zero")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7083>
- don't test 2 waves/SA
- create the compute shader only once per subtest
- use only 1 TIME_ELAPSED query per subtest
- don't invalidate sL0 (it's not used)
- don't invalidate L2 for L2_LRU to test L2 throughput
- don't flush the CS after every run
- remove unused min/max computation
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7055>
On some systems it is problematic to have the shader cache enabled
by default. This adds a build option to support the disk cache but
keep it disabled unless the environment variable
MESA_GLSL_CACHE_DISABLE=false.
For example, on Chrome OS, Chrome already has it's own shader
disk cache implementation so it disables the mesa feature. Tests
do not want the shader disk cache enabled because it can cause
inconsistent performance results and the default 1GB for the
disk cache could lead to problems that require more effort to
work around. The Mesa shader disk cache is useful for VMs though,
where it is easy to configure the feature with environment
variables. With the current version of Mesa, Chrome OS would need
to have a system-wide environment variable to disable the disk
cache everywhere except where needed. More elegant to just build
Mesa with the cache feature disabled by default.
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6967>
Right now we create shaders that are not attached to any memory
context, leading to memory leaks. Ideally, we should free the NIR
shader as soon as we've turned it into a binary, but there's no
function explicitly destroy a shader. Let's attach those to the blend
state so they get destroyed when this state is freed.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7066>
Merged shaders have a workgroup barrier which makes sure that
the first half is completed in every wave before the 2nd half
is started.
This barrier is located in divergent control flow, so that waves
that don't have any invocations in the 2nd half can finish as early
as possible. This is problematic for NGG GS because it has more
workgroup barriers after the 2nd half.
So, for NGG GS we need to put the barrier outside
control flow because otherwise the waves that have 0 GS threads
won't be able to wait for the waves which have non-zero GS threads.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
We store emitted GS vertices in LDS.
Then, at the end of the shader, the emitted vertices are compacted
and each thread loads a single vertex from LDS in order to export
a primitive as needed, and the vertex attributes.
The reason this is done is because there is an impedance mismatch
between how API GS and the NGG HW works. API GS can emit an arbitrary
number of vertices and primites in each thread, but NGG HW can only
export one vertex per thread.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
After each end_primitive and at the end of the shader before emitting
set_vertex_and_primitive_count, we check if the primitive that is being
emitted has enough vertices or not, and we adjust the vertex and
primitive counters accordingly.
As a result, if the backend uses this option, the backend compiler
will not have to worry about discarding the unneeded vertices
and primitives.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
And document where to find information on qcom gralloc's private handle
layout. I chose not to #include the gralloc_priv because it seems that
there's not much we need yet, and I'm hoping we can avoid the build-time
dependency on the specific platform.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7015>
This reverts commit bcfec61d1e.
The previous patch fixed the underlying issue that the above commit was
actually working around. It turns out that the previously observed
performance regression was due to invalid aux-map entries for
multi-layer HiZ+CCS buffers.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7046>
Fixes rendering corruption in the shadowmappingcascade Sascha Willems
Vulkan demo. To see the corruption, I adjusted the demo options as
follows:
1. Enable "Display depth map"
2. Set "Split lambda" to 0.100
3. Make "Cascade" non-zero.
Fixes: 80ffbe915f ("anv: Add support for HiZ+CCS")
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7046>
Shaders may not use a particular region of a UBO in a given shader (think
UBOs shared between stages, or between shaders), and by just always
extending the existing range for a given UBO, we'd waste bandwidth
uploading it, and also waste our precious const space in storing the
unused data.
Instead, only upload exactly the ranges we can use, and merge ranges when
they're neighbors. We may end up with more upload packets, but the
bandwidth savings is surely going to be worth it (and if find we want a
distance threshold for merging with nearby uploads, that would be easy to
add).
total instructions in shared programs: 9266114 -> 9255092 (-0.12%)
total full in shared programs: 343162 -> 341709 (-0.42%)
total constlen in shared programs: 1454368 -> 1275236 (-12.32%)
total cat6 in shared programs: 93073 -> 82589 (-11.26%)
total (ss) in shared programs: 212402 -> 206404 (-2.82%)
total (sy) in shared programs: 122905 -> 114007 (-7.24%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7036>
In 53bfcdeecf, we added load/store_scratch instructions which deviate
a little bit from most memory load/store instructions in that we can't
use the normal untyped read/write instructions which can read and write
up to a vec4 at a time. Instead, we have to use the DWORD scattered
read/write instructions which are scalar. To handle this, we added code
to brw_nir_lower_mem_access_bit_sizes to cause them to be scalarized.
However, one case was missing: the load-as-larger-vector case. In this
case, we take small bit-sized constant-offset loads replace it with a
32-bit load and shuffle the result around as needed.
For scratch, this case is much trickier to get right because it often
emits vec2 or wider which we would then have to lower again. We did
this for other load and store ops because, for lower bit-sizes we have
to scalarize thanks to the byte scattered read/write instructions being
scalar. However, for scratch we're not losing as much because we can't
vectorize 32-bit loads and stores either. It's easier to just disallow
it whenever we have to scalarize.
Fixes: 53bfcdeecf "intel/fs: Implement the new load/store_scratch..."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6872>
This adds support for building clover/llvmpipe and running the
piglit CL tests on it.
It uses the gl testing container, and add builds the libclc
spirv libraries as part of that which requires the llvm spirv
translator in the build container.
It also builds the llvm spirv translator as part of the build
root and creates a mesa build that builds clover for testing
against it. It uses llvm 10 as the baseline.
This drops bswap as it has an oob memory access with llvmpipe
which cause flaky test results. phatk also seems flaky
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6901>
This commit enables clover support for iris. It is intended as a
compiler developer tool and not as a new OpenCL implementation from
Intel. If you want competent OpenCL, we have a different open-source
driver for that built on our LLVM-based IGC compiler stack. However,
using clover with iris is becoming increasingly useful as a compiler
development tool and I'm getting tired of carrying the patches in a
private branch.
By default, clover will not initialize on iris. To enable clover, set
the IRIS_ENABLE_CLOVER environment variable to "1" or "true". As we've
done with the semi-sketchy platform support in ANV, it dumps a very loud
WARNING to stderr when enabled. Use at your own risk.
NOTE: To anyone intending to benchmark this, the performance is going to
be terrible and that is expected. This is in no way representative of
the Intel/NIR compiler stack. As it currently stands, clover passes
-O0 to clang when compiling OpenCL C to make SPIRV-LLVM-Transator work.
When compiling the SPIR-V, clover currently doesn't run any NIR
optimizations before it lowers memory access so any NIR optimizations
iris attempts to do are severely hampered. One day, clover will get a
NIR optimization loop or the ability to hand things off to the driver
per-lowering but today is not that day.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7047>
To ask to debug a registr allocation failure
(V3D_DEBUG_REGISTER_ALLOCATION seemed too long to me).
When a fallback register allocation algorithm was added, if the
register allocation fails, it only dumpg the current vir with the
register pressure info with the failed fallback. But if we want do
debug the problem, we would be interested on both.
Additionally, it was strange that we got the full vir dump with the
failure even if no debug option was set.
Additionally we add shaderdb like stats for those failures, to make
easier to compare one and the other.
v2: keep a small warning message in case both register allocation
algorithms fails (Neil)
Reviewed-by: Neil Roberts <nroberts@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6999>
This is needed due Vulkan because by spec (31.1. Limit Requirements)
the minimum value for the following limits are the following ones:
maxPerStageDescriptorSampledImages 16
maxPerStageDescriptorStorageImages 4
maxPerStageDescriptorInputAttachments 4
And we are using v3d textures for all of them, so current limit would
not be enough for some cases.
Note that as the current comment explains there is not exactly a HW
limit for it, so we could bump to 32 for example, but let's just be
conservative and ask the minimum required.
It is worth to note that we needed to maintain the same value for the
OpenGL case, as it gets a register allocation failure on some GL
cases. We tried to fix that with small changes on the nir scheduler,
but we found that it would require some non-trivial effort to get it
done (that eventually we would need to).
Fixes tests like:
dEQP-VK.binding_model.descriptorset_random.sets16.constant.ubolimitlow.sbolimitlow.imglimitlow.noiub.uab.comp.noia.0
v2: keep the previous limit for Opengl (Eric)
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6999>
The fixed-function blend logic uses the following equation: A + B x C.
A, B and C are configurable and can be complemented with negation (for
A and B) or inversion (for C) modifiers. Let's rework the blending
code to take that into account.
Note that we need to update the checksum of a few traces because the
equations we use have changed, leading to small deviations on the
final images. Indeed, there are several valid options for a given GL
blend equation, but the operand selection probably has an impact on the
rounding, leading to those mismatch.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6980>
pipe_screens that are created used to be dedublicated directly based on the gpu device file.
If a process created two of them, for example by opening the etnaviv render node twice
and calling `gbm_create_device`, or by opening two card nodes which used kmsro, or any
combination of these things, then for any but the first instance, the gem handles created
for it would be for the first one instead of the intended one, due to them being created
using the same pipe_screen, and buffers allocated for kmsro devices would be allocated using
the wrong file description as well. This can lead to various problems, such as a proccess not
being able to use two cards which use kmsro at the same time, for example.
This patch changes the dedublication to be done based on the gpu device description
rather than based on the gpu device file. This will solve the above mentioned problem,
but there will now be only a few cases in which anything is dedublicated at all.
Signed-off-by: Daniel Abrecht <public@danielabrecht.ch>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6790>
next_regs decoding is wrong for the first and last instructions in a
clause:
- the first instruction has its destination encoded in the second reg
block
- the last instruction has its destination encoded in the first reg block
(things wrap around)
So, only the last instruction should pass first=true when decoding
next_regs. Fix that by passing the is_last_instruction information
instead of is_first_instruction to the disasm helpers.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7040>
We can't read information from the loaded shared object because we have
different objects for Vulkan and OpenGL drivers, but we need to share
the same UUID for both.
Hence let's use SHA1 from the Git commit and package version.
v2: use also package version for the case of building from tarball (Eric)
v3: fix typos in comment (Tapani)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Rohan Garg <rohan.garg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7025>
On Fallout4, enabling HIZ_CCS_WT compression for D16_UNORM format
regress the performance by 2%, in order to avoid that disable
compression via driconf option.
The experiment showed that, running Fallout4 with HIZ performs better
than HIZ_CCS and HIZ_CCS_WT. Reason behind that is the benchmark uses
the depth pass with D16_UNORM surfaces format which fills the L3 cache
and next pass doesn't make use of it where we end up clearing cache.
v2:
- Don't add conditional check in isl (Nanley, Jason)
- Move disable_d16unorm_compression flag to instance (Lionel)
- Use plane_format.isl_format (Nanley)
v3:
- Add more descriptive comment (Marcin Ślusarz)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6734>
I distinctly remember testing this during development, yet clearly deleted
the code.
I pulled the parseValue code back up to the top where it used to be since
we have to use it, rather than just adding a prototype.
Fixes: 8a05d6ffc6 ("driconf: Make the driver's declarations be structs instead of XML.")
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7012>
When support for multi-slice fast-clears was introduced for color
surfaces, an existing optimization for skipping fast-clears was not
updated (this optimization assumed single-slice fast-clears). As a
result, the driver began to skip multi-layer fast-clears if just the
first slice was in the CLEAR state (ignoring the state of the others).
A Civilization VI trace was the only workload I found to make use of
this optimization and it did so for 2D, non-array textures. Therefore,
this fix simply checks that the depth of the clear box is 1. It also
moves the single-slice aux-state query closer to the optimization to
clarify the need for the depth check.
Enables iris to pass a case of the fcc-write-after-clear piglit test,
[fast-clear tracking across layers 0 -> 1 -> (0,1)].
Fixes: 393f659ed8 ("iris: Enable fast clears on other miplevels and layers than 0.")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6973>
a lot of this is just rejiggering code to allow timestamp queries to
take the same codepaths as "normal" queries instead of even more explicitly
special casing everything
key point here is that we need to convert vulkan-level timestamp "ticks" to
nanoseconds like the gallium api expects
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6923>
Things work a little different on Gen7 than they do on Gen8+. In
particular, SOBufferEnable lives in 3DSTATE_STREAMOUT but BufferPitch
lives in 3DSTATE_SO_BUFFER. This leaves us having to marshal data
around a bit more than we did on Gen8. Still, it's not too bad.
Normally, I don't spend much time on Gen7 but XFB just became a hard
requirement for DXVK so it stopped working for all our Haswell users.
Let's get them happily playing their games again. 😸
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3532
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6997>
(v2) bifrost_gen_disasm.c generated source belongs to libpanfrost_bifrost_disasm
Fixes the following build errors, which happen with Android P, but not with Android Q
FAILED: out/target/product/x86_64/obj_x86/SHARED_LIBRARIES/gallium_dri_intermediates/LINKED/gallium_dri.so
...
external/mesa/src/panfrost/bifrost/disassemble.c:678: error: undefined reference to 'bi_disasm_fma'
external/mesa/src/panfrost/bifrost/disassemble.c:679: error: undefined reference to 'bi_disasm_add'
Fixes: 792b51713 ("android: pan/bi: Use new disassembler")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6996>
When a texture is mapped with the DISCARD_WHOLE_RESOURCE flag set,
v3d_map_usage_prep will try to allocate a new buffer for the resource.
Previously, if the resource was used in a bound texture then nothing
would cause it to update the sampler view with the offset for the new
buffer. This commit just adds that in by looking at all sampler views
and calling v3d_create_texture_shader_state_bo for each one that
references this resource.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6914>
This extracts the part of v3d_create_sampler_view that creates and fills
in the buffer for the TEXTURE_SHADER_STATE record into a helper
function. This will be used in a later patch to update the record when
the information changes.
v2: Also put the part that creates the buffer into the helper function
so that it won’t override the contents of an inflight buffer.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6914>
This moves the static function v3d_dirty_tex_state_for_shader_type from
v3dx_state.c to v3d_context.c and adds a declaration for it in the
header so that it can be used as a general utility function in a later
patch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6914>
Fix defects reported by Coverity Scan.
uninit_member: Non-static class member insns is not initialized in this
constructor nor in any functions that it calls.
uninit_member: Non-static class member clipVertexOutput is not
initialized in this constructor nor in any functions that it calls.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6769>
The android platform is not interested in this feature of Mesa. There are
currently workarounds for apps on Android, and no support for it in the
xmlconfig code. Even if there we do need workarounds eventually, we'll
want to bake them in as structs rather than have this awkward external
dependency for parsing user-readable data installed by Mesa for
Mesa-internal details.
This gets rid of the expat dependency in the turnip driver.
Note that rather than have more #ifdefs in the file, I've opted to move
the code to have more logical locations since the structs refactor had
left less-used helpers scattered across the file.
Acked-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6916>
We can generate the XML if anybody actually queries it, but this reduces
the amount of work in driver setup and means that we'll be able to support
driconf option queries on Android without libexpat.
This updates the driconf interface struct version for i965, i915, and
radeon to use the new getXml entrypoint to call the on-demand xml
generation. Note that our loaders (egl, glx) implement the v2 function
interface and don't use .xml when that's set, and the X server doesn't use
this interface at all.
XML generation tested on iris and i965 using adriconf
Acked-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6916>
The only user was radeon/r200, which was using it to have something that
looks a lot like an enum value return a float from the config option.
Just convert that option to a plain float value (for compat with existing
driconfs) with the min and max of its disjoint range as the range. The
driver's option handling code already correctly deals with other values in
the range.
The disjoint range support was a bunch of extra parsing for this dead
driver, and made turning driconf into static structs difficult.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6916>
When I copied and pasted the code from MOV_INDIRECT for handling the
dependency controls, I missed a subtle difference between MOV_INDIRECT
and SHUFFLE. Specifically, MOV_INDIRECT gets lowered to a narrow
instruction on Gen7 by the SIMD width lowering whereas SHUFFLE has to
split it in the generator. Therefore, the check safety check for
whether or not we can use dependency control has to be based on the
lowered width rather than the width of the original instruction.
Fixes: a8ac61b0ee "intel/fs: NoMask initialize the address..."
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3593
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6989>
Fixes the following building errors:
external/mesa/src/intel/vulkan/anv_android.c:627: error: undefined reference to 'mesa_log'
...
external/mesa/src/intel/vulkan/anv_device.c:164: error: undefined reference to 'mesa_log'
Fixes: 13ea7db76 ("mesa: Promote Intel's simple logging façade for Android to util/")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Acked-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6972>
The volatile pattern gives me flaky results for 32-bit builds on
ChromeOS Android. This is because on 32-bit the volatile 64-bit
loads gets split into 2 32-bit loads each.
So if we read the lower dword first and then the upper dword, it
can happen that the upper dword is already changed but the lower
dword isn't yet. In particular for occlusion queries this gives
false readings, as the upper dword commonly only constains the
ready bit.
With the GCC atomic intrinsics we get a call to __atomic_load_8
in libatomic.so which does the right thing.
An alternative fix would be to explicitly split the 32-bit loads
in the right order and do a bunch of retries if things change, though
that gets messy quickly and for 32-bit builds only doesn't feel worth
it that much.
CC: mesa-stable
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6933>
var->data.binding is only set for vulkan drivers (though it also will
get incremented after nir_lower_uniforms_to_ubo), so we have to generate
our own values here.
to do this, we iterate backwards over the ubos to account for the
"first" ubos being at the end of the list, and we must also ensure that we
remap the buffer index correctly based on whether we're running our
nir_lower_uniforms_to_ubo pass
note that running nir_lower_uniforms_to_ubo unconditionally would require
us to add a number of checks in this patch for !shader->num_uniforms in
order to properly adjust to the altered instructions which become 1-indexed
instead of 0-indexed when this pass is run with no uniforms present
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6981>
if we're creating a block containing multiple variables, we want to create
the whole block at once, not just each individual variable, as we have no
way to reference individual variables in vulkan due to the requirement
for VkDescriptorSetLayoutBinding members to have different binding values
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6981>
this extension allows for array nesting with no clear limitations, so we need
to ensure that we use the "unrolled" array size in a couple places in order to
correctly bind and access these types of arrays in shaders
fixes spec@arb_separate_shader_objects@active sampler conflict and others
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6981>
LLVM loves take advantage of the fact that vec3s in OpenCL are 16B
aligned and so it can just read/write them as vec4s. This results in a
LOT of vec4->vec3 casts on loads and stores. One solution to this
problem is to get rid of all vec3 variables.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6871>
LLVM loves take advantage of the fact that vec3s in OpenCL are 16B
aligned so it can just read/write them as vec4s. This is questionably
legal except that it uses a xyz write-mask when it does it. The result
is a LOT of vec4->vec3 casts on loads and stores. This optimization
detects this case as well as other bit-cast cases and rewrites them to
get rid of the cast.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6871>
These are based on the ones which already existed in the load/store
vectorization pass but I made some improvements while moving them. In
particular,
1. They're both faster if the bit sizes are equal
2. The check is faster if old_bit_size > new_bit_size
3. The check now fails if it would use more than NIR_MAX_VEC_COMPONENTS
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6871>
This pass attempts to optimize three broad categories of memcpy:
1. Self-copies: These we can discard out-of-hand.
2. Vector copies: It doesn't matter what the vector size is or if the
source and destination have different vector types, it's still easy
enough to emit a load/store pair.
3. Tightly packed copies: In the case where a type is tightly packed
(no padding bits), we can replace the memcpy with a copy_deref
instruction which the optimizer is far better at handling.
This has proven capable of getting rid of many of the memcpy instances
in some rather gnarly OpenCL C kernels I've been looking at, even after
coming out of LLVM's optimizer.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6871>
In 9f3c595dfc, we attempted to handle casts in opt_find_array_copies
but missed a critical case. In particular, in the case where we begin
finding a copy but then encounter a cast, we need to discard everything
which might alias that cast.
Fixes: 9f3c595dfc "nir/find_array_copies: Handle cast derefs"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6871>
When I originally wrote this code, I forgot to release the views the
code creates, leaking a bit of memory that never gets cleaned up. That's
not great, so let's plug it.
Fixes: e8a40715a8 ("gallium/util: add blitter-support for stencil-fallback")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6960>
v2:
* Use sub_cs when creating the IB in tu6_build_lrz(). (Jonathan Marek)
* Emit tu6_build_lrz() only when pipeline state changes or there is a
clear. (Jonathan Marek)
v3:
* Don't modify tu_pipeline object, track the changes in command buffer
state.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5146>
v2:
* Don't emit tu6_clear_lrz() using a IB but in the command stream
provided. (Jonathan Marek)
* Valid_clear_ib is always false if TU_DEBUG_NOLRZ is set. Remove the
useless condition. (Jonathan Marek)
* Added more comments.
* Use r2d function for blitting LRZ. (Jonathan Marek)
v3:
* Do LRZ tracking in the command buffer state (Connor).
v4:
* Simplify the emission of source setup (Jonathan Marek)
v5:
* Separate LRZ setup in a different function.
* Not hide LRZ setup inside GMEM path (Jonathan Marek)
* Fix iova address emission in tu6_clear_lrz() (Jonathan Marek)
* Add CCU sysmem flushes (Jonathan Marek)
v6:
* Fixed bug related to storing a VkClearValue pointer that could be
out-of-scope when we access to it for emitting LRZ clear.
v7:
* Merge tu6_clear_lrz() and tu6_clear_lrz_setup() into the same
function and emit LRZ clear at the beginning of the renderpass.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5146>
There are depth compare op modes that are not supported by LRZ in the
HW. Also, it is not supported when blend or stencil are enabled.
v2:
* Set pipeline->lrz.write to the same value than depthWriteEnable.
* Improve comment on disabling LRZ write on blend.
* Remove pipeline's lrz invalidation when there is no clear mask in
render pass. It is confusing. (Jonathan Marek)
* Mark the pipeline state as changed.
* Add comment on not using GREATER flag.
v3:
* Replace {rb,gras}_lrz_cntl by flags in struct tu_pipeline.
* Added z_test_enable flag.
v4:
* Created struct tu_lrz_pipeline to avoid modifying immutable objects.
v5:
* Fixed crashes when pDepthStencilState pointer is NULL.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5146>
v2:
- Add missing vulkan subpass support. (Jonathan Marek)
- When creating the BO, mark it as not valid until it is cleared.
- Move LRZ struct to tu_image. (Jonathan Marek)
- Destroy BO when we destroy the image. (Jonathan Marek)
v3:
- Allocate the buffer as part of the image's BO (Connor)
- Moved image's LRZ values to its layout.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5146>
The poweron failure happens before we get to the bootloader
("load_archive: loading locale_en.bin") not after we're trying to boot the
kernel and we're waiting for the deqp run to complete.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6970>
The linker was adding all state vars as uniforms, doubling the storage size
for shaders using only builtin uniforms, which increased CPU overhead for
constant buffer uploads.
When this code was originally ported from the GLSL IR linker we forgot
to exclude builtins because the check was not done in the
add_uniform_to_shader class but rather a check was done when passing
variables to this class for processing.
Fixes: 664e4a610d ("glsl/nir: Fill in the Parameters in NIR linker")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Tested-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6958>
With blob resources, stride doesn't necesarily have to
equal width * bpp. The use case for this a minigbm blob
resource with blob mem BLOB_MEM_HOST3D_GUEST imported into
guest Mesa. In addition, for BLOB_MEM_HOST we can repurpose
the transfer ioctls to also flush caches if need be, so this
seems a good time to fix this issue.
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4821>
We should have GL4.5 with this. Piglit tests should now pass.
In terms of performance, we're between 70% to 80% of host
performance on Iris, based on a apitrace of a 2013 GL4.5
game:
11.204 FPS (guest)
15.947 FPS (host)
This is still better than the status quo, when said game was unplayable
with Virgl due to an inefficient GL4.3 fallback.
TEST=piglit -t arb_buffer_storage all results/ passes
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4821>
A blob resource is a container for:
- VIRTGPU_BLOB_MEM_GUEST: a guest memory allocation
(referred to as a "guest-only blob resource")
- VIRTGPU_BLOB_MEM_HOST3D: a host3d memory allocation
(referred to as a "host-only blob resource")
- VIRTGPU_BLOB_MEM_HOST3D_GUEST: a guest + host3d memory allocation
(referred to as a "default blob resource").
Blob resources can be used to implement new features and fix shortcomings
with the current resource create path. The subsequent patches how
blob resources may be leveraged to implement GL_ARB_buffer_storage
and get GL4.5.
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4821>
This reverts commit 4fb2eddfdf.
This reverts commit 7a1deb16f8.
This reverts commit 2b6a172343.
This reverts commit 5af81393e4.
This reverts commit 87900afe5b.
A couple of problems were discovered after this series was merged that
cause breakage in different configurations:
(1) It seems that using -mf16c also enables AVX, leading to SIGILL on
platforms that do not support AVX.
(2) Since clang only warns about unknown flags, and as I understand
it Meson's handling in cc.has_argument() is broken, the F16C code is
wrongly enabled when clang is used, even for example on ARM, leading
to a compilation error.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3583
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6969>
Android.mk and Makefile.sources are still defining virgl_driinfo.h target
This patch removes the remaining gen rules
Fixes the following building error:
FAILED: out/target/product/x86_64/obj/STATIC_LIBRARIES/libmesa_pipe_virgl_intermediates/virgl/virgl_driinfo.h
...
cp: bad 'out/target/product/x86_64/gen/STATIC_LIBRARIES/libmesa_pipe_virgl_intermediates/virgl/virgl_driinfo.h': No such file or directory
Fixes: 974981c4e6 ("gallium/drm: Make the pipe loader handle the driconf merging.")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Acked-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6880>
Android.mk and Makefile.sources are still defining si_driinfo.h target
This patch removes the remaining gen rules
Fixes the following building error:
FAILED: out/target/product/x86_64/obj/STATIC_LIBRARIES/libmesa_pipe_radeonsi_intermediates/radeonsi/si_driinfo.h
...
cp: bad 'out/target/product/x86_64/gen/STATIC_LIBRARIES/libmesa_pipe_radeonsi_intermediates/radeonsi/si_driinfo.h': No such file or directory
Fixes: 974981c4e6 ("gallium/drm: Make the pipe loader handle the driconf merging.")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6880>
Android.mk and Makefile.sources are still defining iris_driinfo.h target
This patch removes the remaining gen rules
Fixes the following building error:
FAILED: out/target/product/x86_64/obj/STATIC_LIBRARIES/libmesa_pipe_iris_intermediates/iris/iris_driinfo.h
...
cp: bad 'out/target/product/x86_64/gen/STATIC_LIBRARIES/libmesa_pipe_iris_intermediates/iris/iris_driinfo.h': No such file or directory
Fixes: 974981c4e6 ("gallium/drm: Make the pipe loader handle the driconf merging.")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6880>
Note, the aligned versions aren't handled specially yet.
The float16buffer capability is now at least partially supported after
this patch, so move it to be supported when kernels are supported.
v2 (Jason Ekstrand):
- A few cosmetic cleanups around type/base_type
- Rebased on top of the big SPIR-V SSA value rework
- Use the new version of the conversion helpers
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6945>
This adds primarily two passes: One is a lowering pass which turns
these conversion intrinsics into a series of ALU ops. The other is an
optimization pass which attempt to simplify the conversion whenever
possible in the hopes that we can turn it into a "normal" conversion op
which doesn't need special treatment.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6945>
Most of these were originally written by Daniel Stone in the Microsoft
ClOn12 branch, reworked by Jesse Natalie, fixed by Boris Brezillon, and
possibly touched by others along the way. Unfortunately, none of that
is in the commit history thanks to living in the CLOn12 branch.
I ported them to mesa master and further reworked things for better
cosmetics. In particular,
1. They now live in a builder helper rather than in vtn_alu.c.
2. Instead of looping inside each builder helper, we just trust NIR
vector instructions to handle vectors.
3. Lots of re-arranging of the helpers for clarity, better asserting,
and better re-use with the upcoming lowering pass.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6945>
This new intrinsic is capable of handling the full range of conversions
from OpenCL including rounding modes and possible saturation. The
intention is that we'll emit this intrinsic directly from spirv_to_nir
and then lower it to ALU ops later.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6945>
We're about to introduce conversion ops which are going to want two
different types. We may as well just split the one we have rather than
end up with three. There are a couple places where this is mildly
inconvenient but most of the time I find it to actually be nicer.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6945>
Workaround # 22011374674
Applied to i965, iris and anv drivers
No performance impact is observed with WA.
Cc: mesa-stable
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a mostly cosmetic change but there is one subtle functional
issue: If we ever render to a 3D depth image, we are now handling the
base layer and number of layers correctly. I'm not sure rendering to 3D
depth is even allowed but we can theoretically handle it now.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6549>
Now that we're enabling HiZ on multi-layer images, there's no reason why
we can't enable HiZ clears for multi-view. The only reason I can think
of why we didn't before was because no one thought to and the old code
didn't. Enabling this means that an attachment will get HiZ cleared if
and only if att_state->fast_clear.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6549>
So far, the callback to create a resource from a memory object had code
for importing textures only. Modified it to allow importing buffers too.
Fixes the following piglit tests:
- ext_external_objects/vk-buf-exchange
- ext_external_objects/vk-pix-buf-update-errors
- ext_external_objects/vk-vert-buf-update-errors
- ext_external_objects/vk-vert-buf-reuse
v2: Used si_alloc_buffer_struct instead of CALLOC
v3: Fixed indentation issue, removed free in case of unsuccessful
allocation, joined two if conditions together
Signed-off-by: Eleni Maria Stea <estea@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6364>
After disable SDMA on Arcturus(gfx9), dead lock with aux_context_lock is
detected since si_screen_clear_buffer is called recursively before
release lock.
The call trace is:
si_clear_render_target->si_compute_clear_render_target->
si_launch_grid_internal->si_launch_grid->si_emit_cache_flush->
si_prim_discard_signal_next_compute_ib_start->u_suballocator_alloc->
si_resource_create->si_buffer_create->si_alloc_resource->
si_screen_clear_buffer->simple_mtx_lock->
si_sdma_clear_buffer->si_pipe_clear_buffer->
si_clear_buffer->si_compute_do_clear_or_copy->
si_launch_grid_internal->si_launch_grid->si_emit_cache_flush->
si_prim_discard_signal_next_compute_ib_start->u_suballocator_alloc->
si_resource_create->si_buffer_create->si_alloc_resource->
si_screen_clear_buffer->simple_mtx_lock
Fixes: 07a49bf597 "radeonsi: disable SDMA on gfx9"
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6941>
This will merge loads of UBO components together into vec4 loads. At the
same time, it improves the alignment information on our loads, fixing the
regression from the vec3 loads fix.
shader-db results:
total instructions in shared programs: 12829370 -> 8755851 (-31.75%)
total cat6 in shared programs: 145840 -> 97027 (-33.47%)
Overall results from before the vec3 fix:
total instructions in shared programs: 8019997 -> 8755851 (9.18%)
total cat6 in shared programs: 87683 -> 97027 (10.66%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6612>
It turns out I had missed a case in my enumeration of why everything
currently was vec4-aligned.
Fixes a simple testcase of loading from a vec3[2] array in freedreno with
IR3_SHADER_DEBUG=nouboopt.
Initial shader-db results look devastating:
total instructions in shared programs: 8019997 -> 12829370 (59.97%)
total cat6 in shared programs: 87683 -> 145840 (66.33%)
Hopefully this will recover once we introduce the i/o vectorizer, but that
was blocked on getting the vec3 case fixed.
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6612>
nir_lower_to_explicit_io will give us good alignments if we have the
cast's alignment information known, and it's trivial: Just the offset of
the UBO variable that is at the base of the deref. Otherwise, explicit io
assumes the load is aligned just to the size of a scalar value in it.
The change in freedreno is in the noise.
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6612>
The lowering pass may introduce vector bcsels that we need to scalarize
back out. It's unusual to have UBOs and not get any lowered to push
constants, so the flag was usually set anyway.
Fixes: 2b25240993 ("freedreno/ir3: Replace our custom vec4 UBO intrinsic
with the shared lowering.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6612>
When we come in from some other level or from the parent, we need to
ensure that the reach set is in prev_frontier but we also need to
consider the dominance frontier of our level. Otherwise, we may end up
leaving out possible blocks when computing the reach of a level.
Acked-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6750>
There are two issues here:
1. If there are any phi nodes, we'll make complete hash of them. This
isn't likely actually a problem because spirv_to_nir doesn't
generate any actual phi nodes today. However, if we start doing any
other passes before this, we may have a problem.
2. Even without phi nodes, we may still break SSA form. This can
happen if we ever have to stick a block inside a conditional to
satisfy weird CFG constraints. Doing so can cause it to no longer
look like it dominates some of its uses even though, at runtime,
it's guaranteed to be run first.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6750>
This commit adds a number of new validation checks:
1. We now check that every block pointer in the IR points to a block
that actually exists in a block list that's reachable from the
nir_function_impl.
2. We assert that nir_function_impl::body is non-empty
3. We assert that the start block has no predecessors. This is
important because we tend to put run-once code there.
4. We now validate some stuff on the end block.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6750>
In the case where end was a instruction-based cursor, we would mix up
our blocks and end up with block_begin pointing after the second split.
This causes a segfault as the cf_node list walk at the end of the
function never terminates properly. There's also a possibility of
mix-up if begin is an instruction-based cursor which was found by
inspection.
Fixes: fc7f2d2364 "nir/cf: add new control modification API's"
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6866>
This legacy path needs to die. Drivers shouldn't be using it anymore.
While we're here, we also implement the resource_reindex intrinsic which
doesn't come up in most direct-access cases but can depending on how the
OpAccessChains are structured. It comes up all the time in the variable
pointers world.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5275>
This is a bit over-paranoid, and can cause drm device fd leaks if there
is a bo leak somewhere. Which is kind of a worse outcome.
This "fixes" a fd leak seen in:
dEQP-EGL.functional.query_context.get_current_display.*
dEQP-EGL.functional.query_context.get_current_context.*
dEQP-EGL.functional.query_context.get_current_display.*
(Still tracking down the root leak)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6900>
The gles3 and gles31 multisample and 565-no-depth-no-stencil caselists
are also mustpass. And they don't add a significant number of extra
test cases.
The remaining mustpass caselists all involve rotation, which is not
currently supported in the surfaceless deqp build.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6553>
vl_mpeg12_decoder needs to override the chroma_format value to get the
correct size calculated (chroma_format is used by vl_video_buffer_adjust_size).
I'm not sure why it's needed, but this is needed to get correct mpeg decode.
Fixes: 24f2b0a856 ("gallium/video: remove pipe_video_buffer.chroma_format")
Acked-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6817>
In order to do cross-stage linking, we'll need to split out SPIR-V->NIR
and NIR finalization, so that we can do a round of linking in between.
The multiview lowering pass also assumes that it sits between two
optimization loops, which in anv are the pre-linking optimizations and
post-linking finalization.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6515>
Previously there were 20 padding bits after them, which would be left
uninitialized and preserved when writing to the swizzle members. This
could result in two equal viewport states spuriously being considered
different (because memcmp compared the padding bits as well).
Noticed while looking for something else with valgrind:
==801624== Conditional jump or move depends on uninitialised value(s)
==801624== at 0x10B86259: cso_set_viewport (cso_context.c:739)
==801624== by 0x10B862C7: cso_set_viewport_dims (cso_context.c:764)
==801624== by 0x1057E3A1: clear_with_quad (st_cb_clear.c:335)
==801624== by 0x1057E3A1: st_Clear (st_cb_clear.c:545)
==801624== [...]
==801624==
==801624== Conditional jump or move depends on uninitialised value(s)
==801624== at 0x10B885DE: cso_restore_viewport (cso_context.c:777)
==801624== by 0x10B885DE: cso_restore_state (cso_context.c:1710)
==801624== by 0x1057E4CB: clear_with_quad (st_cb_clear.c:364)
==801624== by 0x1057E4CB: st_Clear (st_cb_clear.c:545)
==801624== [...]
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6867>
This isn't strictly necessary for freedreno, since we aren't using it
yet, but I wanted to avoid any problems if we do. If we wanted to handle
this "properly", and handle matrix and array per-view variables, we'd
probably want to encode the "view stride" (number of views per user
location) and base view in the intrinsic, but for now we just don't do
any offsetting and assume the indirect offset is the view.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6514>
Taken mostly directly from the anv pass. A few anv-specific things that
I could leave in anv aren't included. Specifically on turnip we don't
need to set gl_Layer to 0, and we can handle the case where the FS reads
gl_ViewIndex, so that check is moved into anv.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6514>
Set up texture coordinates for the stencil-fallback blit code. This
worked in the orignal NIR code, but accidentally broke when rewriting to
use TGSI, and my test-case had a constant colored stencil buffer.
Fixes: e8a40715a8 ("gallium/util: add blitter-support for stencil-fallback")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6898>
This should have been the destination surface size, not the dimensions
of the source box. These were the same in the test-case I used while
developing this, but this matters for the GTF framebuffer-blit
functional CTS tests.
Fixes: e8a40715a8 ("gallium/util: add blitter-support for stencil-fallback")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6898>
The separate readthedocs documentation is quite pointless these days, as
it's been moved to docs.mesa3d.org, where all other documentation
already is. There's nothing special about this documentation any longer.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6893>
Save bit_offset between iterations fixes for piglit:
* LIBGL_ALWAYS_SOFTWARE=true GALLIUM_DRIVER=softpipe piglit/bin/bptc-float-modes
* LIBGL_ALWAYS_SOFTWARE=true GALLIUM_DRIVER=llvmpipe piglit/bin/bptc-float-modes
Memset to zero in reserved mode for rgba_unorm fixes for VK-GL-CTS with libvulkan_val:
* dEQP-VK.texture.compressed.bc7_unorm_block_2d_pot
* dEQP-VK.texture.compressed.bc7_srgb_block_2d_pot
* dEQP-VK.texture.compressed.bc7_unorm_block_2d_npot
* dEQP-VK.texture.compressed.bc7_srgb_block_2d_npot
Signed-off-by: Denis Pauk <pauk.denis@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6809>
v2: Restore the gen == 10 hunk in brw_compile_vs (around line 2940).
This function is also used for scalar VS compiles. Squash in:
intel/vec4: Reindent after removing Gen8+ support
intel/vec4: Silence unused parameter warning in try_immediate_source
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> [v1]
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6826>
Lowering uniforms to UBOs results in an aditional iadd for the
UBO buffer id evaluation, and for indirect buffers access that
results in an unnecessary op that can be avoided by not lowering
uniforms. There is some code duplication when reading the uniforms
but it saves a whole instruction group per indirect cont buffer
access.
This reverts commit 98eb00face with
some additional fixes.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6879>
I'm bringing up freedreno Vulkan on an Android phone, and my pains are
exactly what Chad said when working on Intel's vulkan for Android in
aa716db0f6 ("intel: Add simple logging façade for Android (v2)"):
On Android, stdio goes to /dev/null. On Android, remote gdb is even
more painful than the usual remote gdb. On Android, nothing works like
you expect and debugging is hell. I need logging.
This patch introduces a small, simple logging API that can easily wrap
Android's API. On non-Android platforms, this logger does nothing
fancy. It follows the time-honored Unix tradition of spewing
everything to stderr with minimal fuss.
My goal here is not perfection. My goal is to make a minimal, clean API,
that people hate merely a little instead of a lot, and that's good
enough to let me bring up Android Vulkan. And it needs to be fast,
which means it must be small. No one wants to their game to miss frames
while aiming a flaming bow into the jaws of an angry robot t-rex, and
thus become t-rex breakfast, because some fool had too much fun desiging
a bloated, ideal logging API.
Compared to trusty fprintf, _mesa_log[ewi]() is actually usable on
Android. Compared to os_log_message(), this has different error levels
and supports format arguments.
The only code change in the move is wrapping flockfile/funlockfile in
!DETECT_OS_WINDOWS, since mingw32 doesn't have it. Windows likely wants
different logging code, anyway.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6806>
We might have several identical options to vectorize an entry with, but
only one might be vectorizable because of writes interfering.
An example of this is a pattern found in some CTS tests:
a = load(0)
b = load(4)
store(0, a)
store(4, b)
a = load(0)
b = load(4)
store(0, a)
store(4, b)
...
It might have attempted to vectorize the first load(0) with the second
load(4) without attempting the second load(4) when the first fails. This
changes vectorize_entries() to continue even if the first try_vectorize()
failed.
fossil-db (Navi):
Totals from 117 (0.09% of 137413) affected shaders:
SGPRs: 7040 -> 7088 (+0.68%)
CodeSize: 276504 -> 276308 (-0.07%); split: -0.08%, +0.01%
Instrs: 51152 -> 51111 (-0.08%); split: -0.09%, +0.01%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5415>
It's easier to contribute to the documentation if we have links to the
document on GitLab. This will allow people to easily edit docs, or to
realize where in the source-tree they are without having to search.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6742>
According to the specs, the initial depth value for a
depth buffer clear is 1. Use 0x00ffffff like the blob does.
We can remove setting this value in lima_clear, because it's
set during job creation now.
Fixes the following dEQP tests:
dEQP-GLES2.functional.fbo.render.shared_depthbuffer.rbo_rgb565_depth_component16
dEQP-GLES2.functional.fbo.render.shared_depthbuffer.rbo_rgb5_a1_depth_component16
dEQP-GLES2.functional.fbo.render.shared_depthbuffer.rbo_rgb4_depth_component16
dEQP-GLES2.functional.fbo.render.shared_depthbuffer.tex2d_rgba_depth_component16
dEQP-GLES2.functional.fbo.render.shared_depthbuffer.tex2d_rgb_depth_component16
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6842>
This works by taking the spirv produced by libclc which contains
a lot of mangled function entrypoints identified with LinkageAttribute decorations.
This patch just sets up clover to load the libclc blob and convert it to
library nir, and support inlining application nir with calls to libclc.
v2: Add a disk cache support for this object, to avoid the spirv parsing
overheads each time. move spirv->nir to lazy instantiation to avoid
the mess with glsl types and constructor ordering.
v3: make disk cache optional
v1-Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6035>
This pass goes through all the functions in the shader, checks
if the matching function is in the clc spir-v library and inlines
the replacement from there it is.
v2 (daniels): Also copy variables from the libclc shader
v3 (jekstrand): Fix progress return, only run variable inlining once
v4 (jekstrand): Have function inlining also copy vars for us
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6035>
Manually balancing the BEGIN/ENDs is a recipe for xml validation failures,
just make the macros do the balancing. The only ugly bit I think is that
enums take a list of DRI_CONF_ENUM() without ','s in between them.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6753>
The reload texture descriptor needs to take care of the mipmap level
and the layer in case of GL_TEXTURE_CUBE_MAP.
glCopyTexSubImage2D triggers the lima_blit function which ends in a draw.
A reload is necessary. The reload texture descriptor is always built with
just one mipmap level, but this needs to be the level we want to reload,
not just 0. We also have to take care of the cubemap face.
This fixes the following dEQP tests:
dEQP-GLES2.functional.texture.specification.basic_copytexsubimage2d.2d_rgb
dEQP-GLES2.functional.texture.specification.basic_copytexsubimage2d.2d_rgba
dEQP-GLES2.functional.texture.specification.basic_copytexsubimage2d.cube_rgb
dEQP-GLES2.functional.texture.specification.basic_copytexsubimage2d.cube_rgba
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6816>
Scratch stores are being lowered to the instructions with side-effects,
however they should be enabled in fs helper invocations, since they
are produced from operations which don't imply side-effects.
To fix this - we move the decision of whether the sample mask predication
is enable to the point where logical brw instructions are created.
GLSL example of the issue:
int tmp[1024];
...
do {
// changes to tmp
} while (some_condition(tmp))
If `tmp` is lowered to scrach memory, `some_condition` would be
undefined if scratch write is predicated on sample mask, making
possible for the while loop to become infinite and hang the GPU.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3256
Fixes: 53bfcdeecf
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6056>
Fix defect reported by Coverity Scan.
Dereference before null check (REVERSE_INULL)
check_after_deref: Null-checking image suggests that it may be
null, but it has already been dereferenced on all paths leading to
the check.
Fixes: ad609bf55a ("frontend/dri: Implement mapping individual planes.")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6807>
This gives me one place to document why it works this way. This does make
the debug message a little less helpful ("etna" instead of "etnaviv" and
"vmwgfx" instead of "svga", but you should only be able to reach this when
doing something like trying the radeon/nouveau vdpau target on the wrong
DRM device for example.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6751>
We can just reuse drm_helper.h, which has either the real code or the stub
for all pipe_screens based on the GALLIUM_* driver defines, and the
dynamic pipe loader's .c build will only define one GALLIUM_* driver
define. The remaining stubs should get GCed by the linker.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6751>
Subtitles are rendering with an upload through a staging texture.
So the sequence is:
1. draw video (with a secure cs)
2. copy staging texture to the real texture (via si_resource_copy_region) in
a non-secure cs.
3. draw video (with a secure cs)
Step 2 and 3 both generates a flush with RADEON_FLUSH_TOGGLE_SECURE_SUBMISSION.
These flushes are executed quite late: right before doing the draw/dispatch,
so maybe the issue here is the handling of dependencies.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6049>
tess_rings must be encrypted when used in a secure job so this commit
introduces a tess_rings_tmz resource.
The cs_preamble_state doesn't contain the tess_rings address anymore since
it can change. The tess_rings related registers go in a separate preamble.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6049>
This commit makes TMZ always allowed instead of being either off or forced-on
with AMD_DEBUG=tmz.
With this change:
- secure job can be used as soon as the application made a tmz allocation. Driver
internal allocations are not enough to enable secure jobs (if tmz is supported
and enabled by the kernel)
- AMD_DEBUG=tmz forces all scanout/depth/stencil buffers to be allocated as TMZ.
This is useful to test app thats don't explicitely support protected content.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6049>
Tag allocations as driver internal.
Some of these allocations will need to be doubled to handle TMZ (one secure bo,
one normal bo) but these allocations shouldn't switch the winsys in "the app
is using TMZ".
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6049>
Without this, we end up throwing errors on code along these lines when
rendering using single-buffering:
GLint att;
glGetIntegerv(GL_READ_BUFFER, &att);
glGetFramebufferAttachmentParameteriv(GL_READ_FRAMEBUFFER, att, ...);
This is because we internally translate GL_BACK (which is what
glGetIntegerv returned) to GL_FRONT, which we don't handle in the
Desktop GL case. So let's start handling it.
This fixes the GLTF-GL33.gtf21.GL2FixedTests.buffer_color.blend_color
test for me.
Fixes: e6ca6e587e ("mesa: Handle pbuffers in desktop GL framebuffer attachment queries")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6815>
Issue (57) for the ARB_uniform_buffer_object spec states:
"The uniform buffer could be larger than the amount of uniform
block(s) data inside it."
This means we need to clamp the uniform buffer size in case it is
bigger than what hardware supports.
Fixes the OpenGL 3.3 renderer of GZDoom.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6835>
Add natively supported vertex buffer formats. If formats are not listed
here as natively supported, mesa triggers a buffer format translation
routine per draw call which can be expensive.
This helps improve performance in some applications.
The 32-bit integer formats were found by trial and error with a script
and checked in particular with piglit test gl-2.0-vertexattribpointer.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6714>
This shader is useful to replicate single bit from a stencil buffer even
when there's no support for PIPE_CAP_SHADER_STENCIL_EXPORT.
This is useful for the D3D12 driver, where the graphics pipeline is the
only way of writing to MSAA stencil-buffers, and not all drivers support
exporting the stencil-value from the shader.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6681>
We need the disk cache enabled in Android to get EGL_ANDROID_blob_cache's
callbacks called, but we don't actually want to store anything on disk.
Fixes "Failed to create //.cache for shader cache (Read-only file
system)---disabling." spam on init.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6762>
Previously, we included the stubs in our driver binaries, so they didn't
call the actual system libraries for these functions. This was enough to
build-test the Android code in CI without even the NDK.
To make NDK-built Mesa drivers useful, we need to link against these
system libraries that aren't present in the NDK. Split the symbols to
separate non-installed shared libraries and link against those, so that
when you drop the resulting .so in your /vendor/lib64/hw/, it just works
out.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6805>
We don't use the format anymore in the backend, except determining the
number of components, and we fallback to 4 there if it's not specified.
So we should be safe to enable this.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6800>
Use the sampler type instead, which was recently plumbed through core
NIR, for load/store and the right type for atomics. This removes the
last hard dependency on the image format.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6800>
It's asserted that the visit_load_input code isn't reached. It also didn't
handle divergent indexing and this situation should have been lowered
anyway.
I think this used to be needed to pass a dEQP-VK.glsl.indexing.* test, but
it doesn't seem needed anymore.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6689>
Section 5.11 (Out-of-Bounds Accesses) of the GLSL 4.60 spec says:
"In the subsections described above for array, vector, matrix and
structure accesses, any out-of-bounds access produced undefined
behavior.... Out-of-bounds reads return undefined values, which
include values from other variables of the active program or zero."
Robustness extensions suggest to return zero on out-of-bounds
accesses, however it's not applicable to the arrays of samplers,
so just clamp the index.
Otherwise instr->sampler_index or instr->texture_index would be out
of bounds, and they are used as an index to arrays of driver state.
E.g. this fixes such dereference:
if (options->lower_tex_packing[tex->sampler_index] !=
in nir_lower_tex.c
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6428>
Out-of-bounds writes could be eliminated per spec:
Section 5.11 (Out-of-Bounds Accesses) of the GLSL 4.60 spec says:
"In the subsections described above for array, vector, matrix and
structure accesses, any out-of-bounds access produced undefined
behavior.... Out-of-bounds writes may be discarded or overwrite
other variables of the active program."
Fixes: 1235850522
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6428>
Out-of-bounds writes could be eliminated per spec:
Section 5.11 (Out-of-Bounds Accesses) of the GLSL 4.60 spec says:
"In the subsections described above for array, vector, matrix and
structure accesses, any out-of-bounds access produced undefined
behavior....
Out-of-bounds writes may be discarded or overwrite
other variables of the active program.
Out-of-bounds reads return undefined values, which
include values from other variables of the active program or zero."
GL_KHR_robustness and GL_ARB_robustness encourage us to return zero
for reads.
Otherwise get_io_offset would return out-of-bound offset which may
result in out-of-bound loading/storing of inputs/outputs,
that could cause issues in drivers down the line.
E.g. this fixes such dereference:
int vue_slot = vue_map->varying_to_slot[intrin->const_index[0]];
in brw_nir.c
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6428>
The crop / conformance window parameters are set by ffmpeg but they only
seem to be made available in packed headers. This commit copies the H265
header parsing code from st/omx (planning in the future to move this
code to a common place to be shared by the different state trackers) in
order to grab the crop parameters
Signed-off-by: Thong Thai <thong.thai@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4184>
Issue: While running the tast for odd resolutions there are green lines
observed in the dumped image. The resolution is 321x241, the extra 1 pixel
data is missing in the image. The reason for this is in the post
processing when we adjust the size of height and width we are
dividing it by 2.
Fix: To resolve this issue we first need to align it to 2 and then divide
by 2 to get the required values. Once we do this we will have proper data
in the dumped image and missing pixel data will be available.
Signed-off-by: Krunal Patel <krunalkumarmukeshkumar.patel@amd.corp-partner.google.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6795>
Panfrost descriptors are big and are usually built from a combination of
sub-descriptors. On top of that, layout of sub-descriptors might vary
depending on the architecture version. Since unions are not really an
option (too complex), here is a thin abstraction layer allowing us to
manipulate aggregates in their packed format. Each aggregate is formed
of one or more sections that are meant to be packed/unpacked/printed
separately. Section overlapping is allowed to facilitate handling of
descriptor variants.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6797>
It's not really unordered in the sense that it can still stall on
ordered things and we don't need a SYNC_NOP for that because it is a
SYNC_NOP. However, it also doesn't count when computing instruction
distances.
Fixes: 18e72ee210 "intel/fs: Add FS_OPCODE_SCHEDULING_FENCE"
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6781>
In CL 1.2, images are required to be either read-only or write-only. We
can always translate the read-only image ops to texture ops. In CL 2.0
(and an extension), the ability is added to have read-write images but
sampling (with a sampler) is only allowed on read-only images. As long
as we only lower read-only images to texture ops, everything should stay
consistent.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6578>
In SPIR-V, the access qualifiers for an image are provided on the image
type. Assuming no one swaps the types around on us (I think that should
be illegal), this means we can reliably fetch the access qualifiers from
the type itself. The ops which don't really have an easy-to-fetch type
are the atomics because they use OpImageTexelPointer. However, those
are only allowed on read/write images and that's the default.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6578>
In some cases when switching shader programs, mesa does not switch the
currently set pipe_constant_buffer, which keeps pointing to the one
previously set.
If the two shader programs have a different number of uniforms, the size
of the constant buffer may be different and this needs to be considered
while generating the next draw command.
This patch fixes the uniform buffer creation in the lima vertex shader
command to avoid an out of bounds memcpy due to a previously set
pipe_constant_buffer.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6701>
The retile maps are a software mechanism and hence very suceptible
to change. As such I'd like to avoid making it part of the cross
driver ABI.
Ideally we'd just use the cached tile info + a shader to avoid these
buffers altogether.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6783>
Adds a shader disk-cache for shader variants. Note that builds with
`-Dshader-cache=false` have no-op stubs with `disk_cache_create()` that
returns NULL.
This shader disk-cache gets used when using NIR only. Helps to save
about 1-2 minutes for a deqp run on gc2000.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6669>
cbuf was changed to unsigned in commit 3fec2f67c3 ("radeonsi:
compact MRTs to save PS export memory space").
Fix defect reported by Coverity Scan.
Macro compares unsigned to 0 (NO_EFFECT)
unsigned_compare: This greater-than-or-equal-to-zero comparison of
an unsigned value is always true. cbuf >= 0U.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6739>
Remove filename ralloc comment. filename is allocated by asprintf.
Clean up disk_cache_get dead code left over from 367ac07efc
("disk_cache: move cache item loading code into
disk_cache_load_item() helper").
Fix defect reported by Coverity Scan.
Logically dead code (DEADCODE)
dead_error_line: Execution cannot reach this statement:
free(filename);
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6738>
Fixes the following building error:
external/mesa/src/panfrost/bifrost/bi_pack.c:409:24: error: implicit declaration of function 'pan_pack_fma_nop_i32' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
return pan_pack_fma_nop_i32(clause, NULL, regs);
^
external/mesa/src/panfrost/bifrost/bi_pack.c:433:32: error: implicit declaration of function 'pan_pack_fma_fadd_f32' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
return pan_pack_fma_fadd_f32(clause, bundle.fma, regs);
^
external/mesa/src/panfrost/bifrost/bi_pack.c:433:32: note: did you mean 'pan_pack_fma_nop_i32'?
external/mesa/src/panfrost/bifrost/bi_pack.c:409:24: note: 'pan_pack_fma_nop_i32' declared here
return pan_pack_fma_nop_i32(clause, NULL, regs);
^
external/mesa/src/panfrost/bifrost/bi_pack.c:435:32: error: implicit declaration of function 'pan_pack_fma_fadd_v2f16' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
return pan_pack_fma_fadd_v2f16(clause, bundle.fma, regs);
^
external/mesa/src/panfrost/bifrost/bi_pack.c:435:32: note: did you mean 'pan_pack_fma_fadd_f32'?
external/mesa/src/panfrost/bifrost/bi_pack.c:433:32: note: 'pan_pack_fma_fadd_f32' declared here
return pan_pack_fma_fadd_f32(clause, bundle.fma, regs);
^
external/mesa/src/panfrost/bifrost/bi_pack.c:442:32: error: implicit declaration of function 'pan_pack_fma_fcmp_f32' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
return pan_pack_fma_fcmp_f32(clause, bundle.fma, regs);
^
external/mesa/src/panfrost/bifrost/bi_pack.c:442:32: note: did you mean 'pan_pack_fma_fadd_f32'?
external/mesa/src/panfrost/bifrost/bi_pack.c:433:32: note: 'pan_pack_fma_fadd_f32' declared here
return pan_pack_fma_fadd_f32(clause, bundle.fma, regs);
^
external/mesa/src/panfrost/bifrost/bi_pack.c:444:32: error: implicit declaration of function 'pan_pack_fma_fcmp_v2f16' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
return pan_pack_fma_fcmp_v2f16(clause, bundle.fma, regs);
^
external/mesa/src/panfrost/bifrost/bi_pack.c:444:32: note: did you mean 'pan_pack_fma_fadd_v2f16'?
external/mesa/src/panfrost/bifrost/bi_pack.c:435:32: note: 'pan_pack_fma_fadd_v2f16' declared here
return pan_pack_fma_fadd_v2f16(clause, bundle.fma, regs);
^
external/mesa/src/panfrost/bifrost/bi_pack.c:449:41: error: implicit declaration of function 'pan_pack_fma_rshift_and_i32' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
pan_pack_fma_rshift_and_i32(clause, bundle.fma, regs) :
^
external/mesa/src/panfrost/bifrost/bi_pack.c:449:41: note: did you mean 'pan_pack_fma_fadd_f32'?
external/mesa/src/panfrost/bifrost/bi_pack.c:433:32: note: 'pan_pack_fma_fadd_f32' declared here
return pan_pack_fma_fadd_f32(clause, bundle.fma, regs);
^
external/mesa/src/panfrost/bifrost/bi_pack.c:450:41: error: implicit declaration of function 'pan_pack_fma_lshift_and_i32' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
pan_pack_fma_lshift_and_i32(clause, bundle.fma, regs);
^
external/mesa/src/panfrost/bifrost/bi_pack.c:450:41: note: did you mean 'pan_pack_fma_rshift_and_i32'?
external/mesa/src/panfrost/bifrost/bi_pack.c:449:41: note: 'pan_pack_fma_rshift_and_i32' declared here
pan_pack_fma_rshift_and_i32(clause, bundle.fma, regs) :
^
external/mesa/src/panfrost/bifrost/bi_pack.c:453:41: error: implicit declaration of function 'pan_pack_fma_rshift_and_v2i16' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
pan_pack_fma_rshift_and_v2i16(clause, bundle.fma, regs) :
^
external/mesa/src/panfrost/bifrost/bi_pack.c:453:41: note: did you mean 'pan_pack_fma_fadd_v2f16'?
external/mesa/src/panfrost/bifrost/bi_pack.c:435:32: note: 'pan_pack_fma_fadd_v2f16' declared here
return pan_pack_fma_fadd_v2f16(clause, bundle.fma, regs);
^
external/mesa/src/panfrost/bifrost/bi_pack.c:454:41: error: implicit declaration of function 'pan_pack_fma_lshift_and_v2i16' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
pan_pack_fma_lshift_and_v2i16(clause, bundle.fma, regs);
^
external/mesa/src/panfrost/bifrost/bi_pack.c:454:41: note: did you mean 'pan_pack_fma_rshift_and_v2i16'?
external/mesa/src/panfrost/bifrost/bi_pack.c:453:41: note: 'pan_pack_fma_rshift_and_v2i16' declared here
pan_pack_fma_rshift_and_v2i16(clause, bundle.fma, regs) :
^
external/mesa/src/panfrost/bifrost/bi_pack.c:458:41: error: implicit declaration of function 'pan_pack_fma_rshift_and_v4i8' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
pan_pack_fma_rshift_and_v4i8(clause, bundle.fma, regs) :
^
external/mesa/src/panfrost/bifrost/bi_pack.c:459:41: error: implicit declaration of function 'pan_pack_fma_lshift_and_v4i8' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
pan_pack_fma_lshift_and_v4i8(clause, bundle.fma, regs);
^
external/mesa/src/panfrost/bifrost/bi_pack.c:459:41: note: did you mean 'pan_pack_fma_rshift_and_v4i8'?
external/mesa/src/panfrost/bifrost/bi_pack.c:458:41: note: 'pan_pack_fma_rshift_and_v4i8' declared here
pan_pack_fma_rshift_and_v4i8(clause, bundle.fma, regs) :
^
external/mesa/src/panfrost/bifrost/bi_pack.c:465:41: error: implicit declaration of function 'pan_pack_fma_rshift_or_i32' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
pan_pack_fma_rshift_or_i32(clause, bundle.fma, regs) :
^
external/mesa/src/panfrost/bifrost/bi_pack.c:465:41: note: did you mean 'pan_pack_fma_nop_i32'?
external/mesa/src/panfrost/bifrost/bi_pack.c:409:24: note: 'pan_pack_fma_nop_i32' declared here
return pan_pack_fma_nop_i32(clause, NULL, regs);
^
external/mesa/src/panfrost/bifrost/bi_pack.c:466:41: error: implicit declaration of function 'pan_pack_fma_lshift_or_i32' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
pan_pack_fma_lshift_or_i32(clause, bundle.fma, regs);
^
external/mesa/src/panfrost/bifrost/bi_pack.c:466:41: note: did you mean 'pan_pack_fma_rshift_or_i32'?
external/mesa/src/panfrost/bifrost/bi_pack.c:465:41: note: 'pan_pack_fma_rshift_or_i32' declared here
pan_pack_fma_rshift_or_i32(clause, bundle.fma, regs) :
^
external/mesa/src/panfrost/bifrost/bi_pack.c:469:41: error: implicit declaration of function 'pan_pack_fma_rshift_or_v2i16' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
pan_pack_fma_rshift_or_v2i16(clause, bundle.fma, regs) :
^
external/mesa/src/panfrost/bifrost/bi_pack.c:470:41: error: implicit declaration of function 'pan_pack_fma_lshift_or_v2i16' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
pan_pack_fma_lshift_or_v2i16(clause, bundle.fma, regs);
^
external/mesa/src/panfrost/bifrost/bi_pack.c:470:41: note: did you mean 'pan_pack_fma_rshift_or_v2i16'?
external/mesa/src/panfrost/bifrost/bi_pack.c:469:41: note: 'pan_pack_fma_rshift_or_v2i16' declared here
pan_pack_fma_rshift_or_v2i16(clause, bundle.fma, regs) :
^
external/mesa/src/panfrost/bifrost/bi_pack.c:474:41: error: implicit declaration of function 'pan_pack_fma_rshift_or_v4i8' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
pan_pack_fma_rshift_or_v4i8(clause, bundle.fma, regs) :
^
external/mesa/src/panfrost/bifrost/bi_pack.c:475:41: error: implicit declaration of function 'pan_pack_fma_lshift_or_v4i8' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
pan_pack_fma_lshift_or_v4i8(clause, bundle.fma, regs);
^
external/mesa/src/panfrost/bifrost/bi_pack.c:475:41: note: did you mean 'pan_pack_fma_rshift_or_v4i8'?
external/mesa/src/panfrost/bifrost/bi_pack.c:474:41: note: 'pan_pack_fma_rshift_or_v4i8' declared here
pan_pack_fma_rshift_or_v4i8(clause, bundle.fma, regs) :
^
external/mesa/src/panfrost/bifrost/bi_pack.c:482:41: error: implicit declaration of function 'pan_pack_fma_rshift_xor_i32' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
pan_pack_fma_rshift_xor_i32(clause, bundle.fma, regs) :
^
external/mesa/src/panfrost/bifrost/bi_pack.c:483:41: error: implicit declaration of function 'pan_pack_fma_lshift_xor_i32' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
pan_pack_fma_lshift_xor_i32(clause, bundle.fma, regs);
^
external/mesa/src/panfrost/bifrost/bi_pack.c:483:41: note: did you mean 'pan_pack_fma_rshift_xor_i32'?
external/mesa/src/panfrost/bifrost/bi_pack.c:482:41: note: 'pan_pack_fma_rshift_xor_i32' declared here
pan_pack_fma_rshift_xor_i32(clause, bundle.fma, regs) :
^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
Fixes: f8fc2105 ("pan/bi: Use new disassembler")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6765>
When handling arrays, range is increased based on the array size minus
one. But if such is zero, it has the effect of reducing the
range. Handle that case by returning the unknown range value.
v2:
* Add missing braces.
* Return unknown range in this case, instead of keeping the initial
range.
v3: Simplify code, using existing "fail" label. (Jason)
Fixes the following using v3dv:
dEQP-VK.graphicsfuzz.cov-simplify-clamp-max-itself
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6737>
On FMA, a carry/borrow is required for iaddc/isubb (whereas the ADD
counterparts don't support carrying/borrowing). The trick is to model
this with an extra dummy (ZERO) argument which is free to encode on FMA,
and in the scheduler, "demote" to the non-carried versions if we want to
schedule to ADD.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6749>
Given a parsed instruction set definition, this script generates
instruction disassembly routines responsible for decoding instruction
words and pretty-printing. Decoding is somewhat complex as with the
previous disassembler but can be automated.
Disssembly is complicated by indirect specifications of instruction
modifiers. These specifiers are given as logic expressions in the XML,
which optimizes for straightforwaard packing but makes disassembly
awkward. Instead of attempting to invert the logic directly, we generate
lookup tables of `modifiers -> encoding` maps which we may invert
directly to produce a lookup table for the `encoding -> modifiers` map
needed for disassembly.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6749>
From the ISA definition, we can generate a function for each instruction
that looks at the bi_instruction in the intermediate representation and
emits a 20- or 23-bit word (for ADD/FMA respectively) containing that
instruction with all of its modifiers.
These will approximate the old packing routines, although the mapping of
bi_instruction to machine instructions will be hardcoded (at least for
now).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6749>
Throughout this series, this XML file will serve as architectural ground
truth. It contains every instruction in the instruction set with all
programmable modifiers, as well as logic for computing derived values
(indirectly specified modifiers) and swapping operands as needed by
numerous encodings. It also allows for multiple encodings per
instruction differentiated by exact bits (a generalization of opcodes),
with different derived fields in each encoding, and logic tests to
select between the encodings at pack time.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6749>
In !6574 I fixed the dates, but I didn't realise there was one too many
releases in the list, as `-rc3` had already been released and `-rc4` was
about to be.
`-rc4` was since released, so the next 20.2 release is now `-rc5`, and
it's slipped another week; let's update the calendar to that.
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6614>
src/gallium/targets/libgl-gdi/libgl_gdi.c:59:16: warning: ‘use_swr’ defined but not used [-Wunused-variable]
59 | static boolean use_swr = FALSE;
| ^~~~~~~
src/gallium/targets/libgl-gdi/libgl_gdi.c:58:16: warning: ‘use_llvmpipe’ defined but not used [-Wunused-variable]
58 | static boolean use_llvmpipe = FALSE;
| ^~~~~~~~~~~~
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6508>
Add an allowlist to make an exception when deriving images from
interlaced buffers. Normally, the function should fail if the surface
needs to be modified to derive the image. But some applications do not
follow the fall-back method of using vaCreateImage + vaPutImage as
mentioned in the VAAPI documentation, so we have to make an exception.
Signed-off-by: Thong Thai <thong.thai@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5942>
Fix defects reported by Coverity Scan.
Uninitialized pointer field (UNINIT_CTOR)
Non-static class member exit is not initialized in this
constructor nor in any functions that it calls
Non-static class member immInsertPos is not initialized in this
constructor nor in any functions that it calls.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6695>
Fix defect reported by Coverity.
Sizeof not portable (SIZEOF_MISMATCH)
suspicious_sizeof: Passing argument vec_size * 8UL /* sizeof
(LLVMValueRef *) */ to function __builtin_alloca and then casting
the return value to LLVMValueRef * is suspicious. In this
particular case sizeof (LLVMValueRef *) happens to be equal to
sizeof (LLVMValueRef), but this is not a portable assumption.
Fixes: ca74603b4f ("ac/llvm: add better code for isign")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6682>
It seems that the mali400 pp is unable to load vec3 unaligned varyings.
This can happen in the current state with mesa if a varying float is put
into the first component of a vec4 and a vec3 is packed right after it.
This would be fine as by default nir would create a vec4 load followed
by a mov with swizzle to realign the components into a vec3.
In lima_nir_split_load_input, this becomes a separate vec3 load
expecting the unaligned load.
Since this can't happen, skip the load input splitting for this special
case.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6507>
Anv doesn't do multi-layer fast-clear tracking, but TGL may add
fast-clears to multiple layers. Disable CCS_E for image arrays on TGL+
until anv gets more clear color tracking abilities.
With this change, anv+TGL now passes:
* dEQP-VK.multiview.readback_implicit_clear.15_15_15_15
* dEQP-VK.multiview.readback_implicit_clear.8_1_1_8
* dEQP-VK.multiview.readback_implicit_clear.1_2_4_8_16_32
* dEQP-VK.multiview.renderpass2.readback_implicit_clear.15_15_15_15
* dEQP-VK.multiview.renderpass2.readback_implicit_clear.8_1_1_8
* dEQP-VK.multiview.renderpass2.readback_implicit_clear.1_2_4_8_16_32
v2. Mention HSD 14010672564. (Sagar)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6528>
syncobj wait takes int64_t timeout and won't clamp it
in kernel code, so we have to pass in INT64_MAX instead
of OS_TIMEOUT_INFINITE which is UINT64_MAX. Otherwise
syncobj wait with OS_TIMEOUT_INFINITE case just return
fail.
Fixes: c638301b42 "radeonsi: fix syncobj wait timeout"
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6676>
Pre-V3D 4.3 hardware has a quirk where it expects XY coordinates in
.8 fixed-point format, but then it will internally round it to .6 fixed-point,
introducing a double rounding. The double rounding can cause very slight
differences in triangle raterization coverage that can actually be noticed by
some CTS tests.
The correct fix for this as recommended by Broadcom is to convert to
.8 fixed-point with ffloor().
Fixes:
dEQP-VK.renderpass.suballocation.subpass_dependencies.late_fragment_tests.*
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6677>
It wouldn't work in any case, as the internal API only stores a signed
int, and GLX_EXT_swap_control_tear will overload the meaning of negative
values so we should avoid ambiguity.
If your application needs a swap interval in excess of ~414.25 days, I'm
very sorry.
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6671>
I find the __glX naming convention distasteful for a bunch of reasons,
not least that I expect "vi -t glXBindTexImageEXT" to take me someplace
useful. The functions we're referencing here should not be exported from
libGL (hence the static / _X_HIDDEN) but that's no reason not to name
them correctly.
This does have one possible, very minor, correct functional change,
glXGetMscRateOML now returns Bool (unsigned int) instead of GLboolean
(unsigned char); if your psABI really only writes to a single byte of
the return register when the return type is char-like, then we probably
would not have returned false when we meant to. At least for amd64 this
does not seem to be an issue; the old code wrote 0 to %eax, the new code
does a zero-extended load from %al to %eax (since the internal function
still returns GLboolean).
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6671>
This exploits a HW optimization for when only the size of a draw state is
changed, to make things simpler and more optimal (assuming a well behaved
user which doesn't unecessarily call CmdBindVertexBuffers many times)
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6665>
These variables seem to be initialized before being used, so this
patch is not fixing any bug, but leaving them unitialized may become
a bug after some refactoring.
These classes were affected: fs_reg_alloc, fs_visitor, fs_generator,
instruction_scheduler.
Found by Coverity.
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6667>
We need to do MSAA clear on the 3d pipe, it seems to not work out
properly on 2d pipe (at least for 4x MSAA and heights that are not
multiple of 16).
This matches what blob and tu seem to do. Fixes the following with
DEQP_CONFIG=rgba8888d24s8ms4
dEQP-GLES31.functional.primitive_bounding_box.depth.builtin_depth.per_primitive_bbox_equal
dEQP-GLES31.functional.primitive_bounding_box.depth.builtin_depth.per_primitive_bbox_larger
dEQP-GLES31.functional.primitive_bounding_box.depth.user_defined_depth.per_primitive_bbox_equal
dEQP-GLES31.functional.primitive_bounding_box.depth.user_defined_depth.per_primitive_bbox_larger
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6649>
Fix defects reported by Coverity Scan.
Resource leak (RESOURCE_LEAK)
leaked_handle: Handle variable fd going out of scope leaks the handle.
Argument cannot be negative (NEGATIVE_RETURNS)
negative_returns: fd is passed to a parameter that cannot be negative.
Fixes: 1ea4ef0d3b ("freedreno: slurp in decode tools")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6642>
Add a function suitable for planar YUV surfaces. For these surfaces,
drivers remap each plane to an RGB-formatted surface. Enable drivers to
pass the plane index and the original YUV format to get the right
aux-map format bits.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6486>
Add support for up to four planes being imported via the
I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS modifier. In the four plane
scenario, two planes are used for the compressed surface and two planes
are used for the compression metadata.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6486>
It's useful for kernel dev to be able throw all of our testing
infrastructure at a risky kernel change, but it's expensive (time and
bandwidth) to roll new containers every time your rev your kernel. Make
it so you can just point the env vars to your personal build you've
uploaded.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6592>
I'd like to see this new non-UAPI feature bake in CI. More importantly,
it may prevent some classes of flakes on cheza by isolating the processes
on the GPU so that a fault in one doesn't stomp over memory in another.
I've also pulled in a fix that etnaviv needed for their upcoming CI.
We add a few more kernel options while uprevving:
- More interconnect drivers for getting good GPU perf
- PRNG so that we don't get late-in-boot complaints about randomness.
- db820c's power domains and ethernet so hopefully we can switch to this
upstream kernel
This seems to slightly change the flakes happening in bypass mode, so add
them to the list.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6592>
aco_instruction_selection_setup.cpp (previously used as a header) has
been split into a header and an implementation file. The latter "only"
implements init_context and setup_isel_context, but since these files
carry a long trail of helper functions, this cleans up the isel header
a lot.
Reduces library size by 3.1% due to more functions being compiled with
static linkage. Makes aco_instruction_selection.cpp compile 3% faster.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6504>
Statically known values were encoded using template parameters previously,
causing specializations for each of the 5 sets of template arguments to be
generated. Since emit_load is not performance critical (the inner loop
never runs more often than twice), it's better for build time to use
runtime arguments everywhere.
Reduces build time of this file by 9% (17.3s -> 15.7s on my machine) and
reduces libaco's size by 2.6%.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6504>
SPIR-V's OpKill is a control-flow instruction but NIR's discard is not.
Therefore, it can be valid SPIR-V to have
if (...) {
foo = /* something */
} else {
discard;
}
use(foo);
without any phi between the definition of foo and its use. This is not
true in NIR, however, because NIR's discard isn't considered
control-flow. Arguably, this is a NIR bug but making discard control-
flow is a very deep change that can have serious ans subtle
side-effects. The easier thing to do is just fix up the SSA in case we
have an OpKill which might have gotten us into the above case.
Fixes dEQP-VK.graphicsfuzz.vectors-and-discard-in-function with the new
NIR dominance validation pass enabled.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5288>
We don't do full dominance validation of SSA values in nir_validate
because it requires generating valid dominance information and, while
that's not extremely expensive, it's probably more than we want to do on
every pass. Also, dominance information is generated through the
metadata system so if we ran it by default in nir_validate, we would get
different beavior of the metadata system based on whether or not you
have a debug build and metadata bugs would be very hard to find.
However, having a pass for it that can be run occasionally, should help
detect and expose bugs. For ease of use, we add a NIR_VALIDATE_SSA_DOMINANCE
environment variable which can be set to manually enable dominance
validation as a standard part of nir_validate.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5288>
freedreno results (note that cat6 is loads from memory as opposed to
pushed constants from the constant file):
total instructions in shared programs: 8044344 -> 8022085 (-0.28%)
total constlen in shared programs: 1411384 -> 1461964 (3.58%)
total cat6 in shared programs: 89983 -> 87065 (-3.24%)
Over the last 3 commits, we increased Manhattan31 performance by ~10%
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6359>
There's no sense in planning out an upload that we won't be able to
actually upload due to the limit. This means that we can keep making
other loads pushable, even after we find one that wouldn't be, and we
don't fill the const file with UBO data for a load we couldn't promote.
total instructions in shared programs: 8096655 -> 8044344 (-0.65%)
total constlen in shared programs: 1447824 -> 1411384 (-2.52%)
total cat6 in shared programs: 97824 -> 89983 (-8.02%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6359>
Now that NIR doesn't lose the original base/range on the
nir_lower_uniforms_to_ubo() path, we get a lot more indirect arrays
uploaded in shader-db.
total instructions in shared programs: 8125988 -> 8103788 (-0.27%)
total constlen in shared programs: 1313096 -> 1448864 (10.34%)
total cat6 in shared programs: 104089 -> 97824 (-6.02%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6359>
For UBO accesses to be the same performance as classic GL default uniform
block uniforms, we need to be able to push them through the same path. On
freedreno, we haven't been uploading UBOs as push constants when they're
used for indirect array access, because we don't know what range of the
UBO is needed for an access.
I believe we won't be able to calculate the range in general in spirv
given casts that can happen, so we define a [0, ~0] range to be "We don't
know anything". We use that at the moment for all UBO loads except for
nir_lower_uniforms_to_ubo, where we now avoid losing the range information
that default uniform block loads come with.
In a departure from other NIR intrinsics with a "base", I didn't make the
base an be something you have to add to the src[1] offset. This keeps us
from needing to modify all drivers (particularly since the base+offset
thing can mean needing to do addition in the backend), makes backend
tracking of ranges easy, and makes the range calculations in
load_store_vectorizer reasonable. However, this could definitely cause
some confusion for people used to the normal NIR base.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6359>
libfreedreno_common static dependency is need for gallium_dri target
Fixes the following building error:
ld.lld: error: undefined symbol: fd_get_device_uuid
>>> referenced by freedreno_screen.c:836 (external/mesa/src/gallium/drivers/freedreno/freedreno_screen.c:836)
>>> freedreno_screen.o:(fd_screen_get_device_uuid) in archive out/target/product/x86_64/obj/STATIC_LIBRARIES/libmesa_pipe_freedreno_intermediates/libmesa_pipe_freedreno.a
ld.lld: error: undefined symbol: fd_get_driver_uuid
>>> referenced by freedreno_screen.c:842 (external/mesa/src/gallium/drivers/freedreno/freedreno_screen.c:842)
>>> freedreno_screen.o:(fd_screen_get_driver_uuid) in archive out/target/product/x86_64/obj/STATIC_LIBRARIES/libmesa_pipe_freedreno_intermediates/libmesa_pipe_freedreno.a
Fixes: e3c39e505 "freedreno: Implement pipe screen's get_device/driver_uuid()"
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6625>
The current align_mul calculation in the unknown-array-index
calculation is
align_mul = MIN3(parent_mul,
min_pow2_divisor(parent_offset),
min_pow2_divisor(stride))
which is certainly correct if parent_offset > 0. However, when
parent_offset = 0, min_pow2_divisor(parent_offset) isn't well-defined
and our calculation for it is 1 << -1 which isn't well-defined. That
said.... it's not actually needed.
The offset to the base of the array is
array_base = parent_mul * k + parent_offset
for some integer k. When we throw in an unknown array index i, we get
elem = parent_mul * k + parent_offset + stride * i.
If we set new_align = MIN2(parent_mul, min_pow2_divisor(stride)), then
both parent_mul and stride are divisible by new_align and
elem = (parent_mul / new_alig) * new_align * k +
(stride / new_align) * new_align * i + parent_offset
= new_align * ((parent_mul / new_alig) * k +
(stride / new_align) * i) + parent_offset
so elem = new_align * j + parent_offset where
j = (parent_mul / new_alig) * k + (stride / new_align) * i.
That's a very long-winded way of saying that we can delete one parameter
from the align_mul calculation and it's still fine. :-)
Fixes: 480329cf8b "nir: Add a helper for getting the alignment of a deref"
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Tested-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6628>
The argument handling of this little tool was pretty rubbish. It had no
help and it required the filename to come first which is just strange.
This reworks it and makes things much nicer. It's still rubbish but at
least there's a chance people can figure out how to use it now.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6607>
ppir_do_one_node_to_instr merges instructions and uses a pipeline reg
to save a reg. It tests if ppir_node_has_single_src_succ, but it should
check if ppir_node_has_single_succ.
The following deqp tests run into this issue because they have a node
with 2 successors in different blocks, where one was merged into the same
instruction and the second one is pointing to a missing predecessor then.
Fixes the following deqp tests:
dEQP-GLES2.functional.shaders.loops.do_while_dynamic_iterations.vector_counter_fragment
dEQP-GLES2.functional.shaders.loops.for_dynamic_iterations.vector_counter_fragment
dEQP-GLES2.functional.shaders.loops.while_dynamic_iterations.vector_counter_fragment
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6555>
Compilers may use vector instructions in calculating
hash values of std::string. This happens usualy when
high optimalization level is enabled. SWR had two
static std::map<std::string, T> variables which
lead to crashes on non-AVX systems during the initialization
of those variables. This commit makes those variables
dynamically allocated and fixes this AVX instruction
leak.
Closes: #3077Closes: #198
Reviewed-by: Krzysztof Raszkowski <krzysztof.raszkowski@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6565>
Android building rules are aligned to meson ones
Fixes the following building error:
FAILED: ninja: 'external/mesa/src/amd/registers/amdgfxregs.json',
needed by 'out/target/product/x86_64/gen/STATIC_LIBRARIES/libmesa_amd_common_intermediates/common/sid_tables.h',
missing and no known rule to make it
Fixes: b7a6333ee ("amd/registers: switch to new generated register definitions")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6618>
As shown in the valid SPIR-V below, if one switch case statement
directly jumps to the merge block, it has no branches at all and
we have to reset the fall variable. Otherwise, it creates an
unintentional fallthrough.
OpSelectionMerge %97 None
OpSwitch %96 %97 1 %99 2 %100
%100 = OpLabel
%102 = OpAccessChain %_ptr_StorageBuffer_v4float %86 %uint_0 %uint_37
%103 = OpLoad %v4float %102
%104 = OpBitcast %v4uint %103
%105 = OpCompositeExtract %uint %104 0
%106 = OpShiftLeftLogical %uint %105 %uint_1
OpBranch %97
%99 = OpLabel
OpBranch %97
%97 = OpLabel
%107 = OpPhi %uint %uint_4 %75 %uint_5 %99 %106 %100
This fixes serious corruption in Horizon Zero Dawn.
v2: Changed the code to skip the entire if-block instead of resetting
the fallthrough variable.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3460
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6590>
brw_compile_gs and brw_compile_tcs are extern C functions, but are
defined inside of brw namespace, which somehow works but confuses
Eclipse CDT's code analysis.
Move these functions out of brw namespace and fix references to
objects from brw namespace.
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6602>
We're casting pointers to const memory to const pointers. MSVC complains
about this with the following warning:
warning C4090: 'initializing': different 'const' qualifiers
In this case, we can easily use both constnesses, because all we do is
read here. So let's avoid the warning by adding another const-keyword.
Fixes: 193765e26b ("nir/lower_goto_if: Sort blocks in select_fork")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6582>
Modeling after what I did for cros_servo_run.py, this gives us easy
support for restarting the test run a530 when we detect a spontaneous
reboot. I had to touch up serial_buffer.py to handle buffering in from a
file instead of a serial device, to support the upcoming etnaviv CI
(tested by running it against a serial log from db410c and seeing it step
to calling "fastboot")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6529>
gitlab CI doesn't include timestamps in its logs by default, but it's
really useful for finding delays in our CI so stuff one in on the lines
coming in from serial and being output to the gitlab log. The artifacts
file is still the raw serial output.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6529>
We were only reading from the CPU serial, not EC, so we'd never notice
these sources of job timeouts. I couldn't find a cleaner solution, so I
spawned two threads to do the blocking reads from our serial line fifos
and merge them together in a single queue to read.
Closes: #3470
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6529>
match() looks for the start of the line to match our regex, while search
just looks for the regex anywhere in the line. I messed this up when
converting our greps in shell to python, which was part of breaking the
POWER_GOOD flake detection. Most of our matches worked, but let's
consistently use this one so we don't mess this up in the future.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6529>
The result of 0xf << 28 is a signed integer and hence overflows into the sign
bit. In practice compilers did the right thing here, since the intent of the
code was unsigned arithmetic anyway.
Cc: mesa-stable
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6568>
The result of 0xf << 28 is a signed integer and hence overflows into the sign
bit. In practice compilers did the right thing here, since the intent of the
code was unsigned arithmetic anyway.
These conditions were observed in:
* dEQP-VK.pipeline.image.suballocation.sampling_type.combined.view_type.1d.format.r4g4b4a4_unorm_pack16.count_8.size.512x1
* dEQP-VK.binding_model.descriptorset_random.sets32.noarray.ubolimitlow.sbolimitlow.sampledimglow.outimgonly.noiub.nouab.frag.ialimithigh.0
Cc: mesa-stable
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6568>
This doesn't really do anything for us today. One day, I suppose we
could use it to do something with wide loads with non-uniform offsets.
The big reason to do this is to get better testing to make sure that NIR
doesn't blow up on the deref paths.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6472>
This commit propagates the alignment information provided either through
the Alignment decoration on pointers or via the alignment mem operands
to OpLoad, OpStore, and OpCopyMemory to the NIR deref chain. It does so
by wrapping the deref in a cast. NIR should be able to clean up most
unnecessary casts only leaving us with the useful alignment information.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6472>
If the deref has no explicit alignment in the chain, we assume component
alignment which is what we currently assume for all derefs today. This
should be correct for all APIs in the sense that we can usually assume
at least component alignment. However, for some APIs such as OpenCL, we
could potentially make larger alignment assumptions. The intention is
that those will be handled via alignment-increasing casts.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6472>
When creating explicit type, the alignment information is lost, thus
forcing explicit type users to recalculate the alignment using the same
size_align() function. Let's add a new field to cache this information.
Only structs, matrices, and vectors have and explicit alignment. Arrays
alignment is implicitly set to its element alignment and matrices are
required to have an alignment that matches that of its vector columns.
the concept of alignment simply doesn't apply to other types.
We make the strategic choice to not allow explicit alignments on
scalars. This is for a couple of reasons:
1. There are no cases today where we use explicit types where we want
any other alignment for scalars than natural alignment.
2. Vectors don't have a component alignment that's separate from the
explicit_alignment so it's impossible to get an explicitly aligned
scalar type which is the component of the explicitly aligned vector
type.
This choice may cause problems if we ever want to use explicitly laid
out types for things like varyings where we sometimes want vec4
alignment of scalars. We can deal with that when the time comes.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6472>
OpenCL doesn't mandate a size and this is consistent with the rest of
the glsl_type system. While we're here, we also clean ::cl_size() up a
bit and use a new explicit_type_scalar_byte_size() helper.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6472>
Instead, we do a limited indirect deref lowering and then use
nir_lower_vars_to_explicit_types and nir_lower_explicit_io to lower it
as if it were SSBO or global memory access. Among other things, this
should enable pointer arithmetic on local variables. Fun!
The only shader-db change from this change on ICL was a few tiny cycle
count changes in 7 Aztec Ruins compute shaders.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5909>
Instead of always lowering everything, we add a threshold such that if
the total indirected array size (AoA size) is above that threshold, it
won't lower. It's assumed that the driver will sort things out somehow
by, for instance, lowering to scratch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5909>
Since everything flows through NIR and we're doing all of our indirect
deref lowering there now, there's no reason to keep making those
decisions in brw_compiler and stuffing them in the GLSL compiler
structs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5909>
Two potential problems, batch re-ordering doesn't really play nicely
with fence_server_sync(), so when we switch away from one batch, detect
the case that we need to sync, and if so flush. The alternative of
trying to track that later batches depend on an earlier batch that had
an in-fence is hairy, and the normal use-case would be to sync at the
beginning of the frame.
But this brings up the second problem, which is that typically we'll get
told to sync on an in-fence before the first draw, which means before
mesa/st flushes down the framebuffer state to the driver. Which means
we don't yet have the correct batch to attach the fence to. So we need
to track the in-fence on the context, and transfer it to the batch
before draws, etc.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6575>
When we have softpin, we know the address of the shader constant data at
shader upload time because it's sitting at the end of the shader. This
commit changes ANV to use patch constants to embed the address in the
shader patch the right address in at upload time. This allows us to
avoid having to set up a UBO binding on-the-fly for shader constants.
This commit uses an A64 message but it's quite possible that we could
also use an A32 message and make the dataport do the 64-bit add for us.
However, load_global is what we have right now so it was easier to just
use that.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6244>
With discrete GPUs, it's going to be possible to have GPUs from two
different hardware generations in the machine at the same time. Global
singletons like this aren't going to fly. Have a struct containing the
pointers which gets initialized once per shader disassemble instead.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6244>
With the kernel timeline sysncobj changes, the kernel submits do
not necessarily happen in global vkQueueSubmit order. Which should
be fine, we added the appropriate waits for that. (See
DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT in the winsys)
However, all kernel submissions take a lock on the bo_list mutex,
and since we do the wait in the winsys, we wait while having the
bo_list mutex held. This means that as soon as a wait and a signal
submission are out of order we have a deadlock on the bo_list mutex
and the wait.
Solution is to use a shared reader lock during the kernel submission,
as we only need read access for the submission.
Fixes: 6bc5ce7a91 "radv: Add timeline syncobj for timeline semaphores."
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3446
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6478>
If decrement == 0 then:
- it isn't safe to access the submission
- even if it is, checking that the result of the atomic_sub is 0
doesn't given an unique owner anymore.
So skip it. The submission always starts out with refcount >= 1,
so first one to decrement to 0 still get dibs on executing it.
Fixes: 4aa75bb3bd "radv: Add wait-before-submit support for timelines."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6478>
WAIT takes a notification register as a destination and a src0 argument.
Since the same notification register is specified in both fields, we
treat it as a special case and disassemble it only once.
If we disassemble it as if it is a source register, its scalar region
will be printed as <0,1,0>. This causes difficulties round-tripping
through the assembler <-> disassembler because that is not an acceptable
destination region. If we instead disassemble the destination, we
instead get a <1> region which is an acceptable and equivalent region
for source and destination.
The test .asm files are regenerated by round-tripping them through the
assembler/disassembler. Note that the <0> region in the tests was a
harmless mistake: the compiler translated it to a <0,1,0> source region
and a <1> destination region, since <0> isn't valid.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6543>
We can get duplicate declarations for an index (for example dvec3 + float
packed into 2 vec4s, the second one won't pack into the first's array
decl), and we'd end up stepping by the wrong amount in GS vtx/prim emit.
Fixes vs-gs-fs-double, sso-vs-gs-fs-array-interleave piglit tests.
Fixes: 49155c3264 ("draw/tgsi: fix geometry shader input/output swizzling")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6567>
For NIR-to-TGSI, we don't want to revectorize 64-bit ops that we split to
scalar beyond vec2 width. We even have some ops that we would rather
retain as scalar due to TGSI opcodes being scalar, or having more unusual
requirements.
This could be used to do the vectorize_vec2_16bit filtering, but that
shader compiler option is also used in algebraic so leave it in place for
now.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6567>
It would be nice if we could do swizzling of an expression on the
replacement side so that we could have a single ieq/ine of the vector
after CSE. However, if you do want vector operations, nir_opt_vectorize()
does just fine.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6567>
Only in pre-merge pipelines for MRs, or in pipelines for forked project
branches.
Having the manual job in post-merge pipelines prevented the pages job
from running automatically as well, which could prevent the public
website from getting updated.
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6534>
When PIPE_CAP_ALPHA_TEST is zero, the driver does not support alpha
testing, so alpha shouldn't be set. In particular, alpha.enable should
be zero, since logically alpha testing is not used in the ZSA CSO when
it's lowered in the fragment shader key.
Fixes failing asserts in kicad, rvgl, etc with Panfrost since 6afd4ad.
(We could remove the assert in panfrost instead, but logically setting
alpha.enabled on top of lowering the shader seems wrong?)
As Erik pointed out, this should improve CSO cache behaviour.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reported-by: Icecream95 <ixn@keemail.me>
Tested-by: Urja Rannikko <urjaman@gmail.com>
Tested-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 6afd4addef ("panfrost: Simplify depth/stencil/alpha")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6523>
Since there are different number of results depending on query types,
this patch removes the result field out of the common struct and defines
query-specific results in each type of query struct.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6299>
Use labels instead of numeric JIP/UIP offsets.
Works for gen6+.
v2:
- Change asm tests to use labels on gen6+
- Remove usage of relative offsets on gen6+
- Consider brw_jump_scale when setting relative offset
- Return error if there is a JIP/UIP label without matching target
- Fix matching of label tokens
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4245>
Since the dest type is retrieved from the SPIR-V return type now,
we have to set it manually for OpFragmentMaskFetchAMD. The result
type must be a 32-bit unsigned integer type scalar.
Fix dEQP-VK.pipeline.multisample.shader_fragment_mask.* with RADV.
Fixes: a196f05fc2 ("nir/vtn: Use return type rather than image type for tex ops")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6533>
Only advertise VK_TIME_DOMAIN_CLOCK_MONOTONIC_RAW_EXT if CLOCK_MONOTONIC_RAW
is defined. Fixes the build on OpenBSD which has CLOCK_MONOTONIC but not
CLOCK_MONOTONIC_RAW.
Fixes: 67a2c1493c ("vulkan: Add VK_EXT_calibrated_timestamps extension (radv and anv) [v5]")
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6517>
Passes piglit depth_clamp, depth-clamp-range,
amd_depth_clamp_separate_range. This is part of enabling GL 3.2 (the
other is bumping PIPE_CAP_GLSL_FEATURE_LEVEL, which I'm hoping to do once
we have the KHR-GL* testing in place).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6544>
There's no need to have separate build scripts here, just choose what the
DEQP_TARGET is for the particular container being built. This brings in a
tremendous number of GLES test fixes that haven't made it into a tagged
gles CTS release.
Closes: #2056
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6526>
The version bump gets us various testcase fixes, mostly to test
requirements). While we're rebuilding the container, copy GL CTS stuff
from build-deqp-gl.sh -- we had already included the glcts binary in our
image, but we had unnecessary other binaries and were missing the mustpass
files (container size stays the same overall). Also pull in all the GLES
mustpass lists, not just the main ones -- Rob wants them to increase our
coverage to match what Android CTS covers.
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6526>
This really shouldn't matter as inputs should have logical pointers.
However, nir_builder defaults to building derefs based on the pointer
size in the shader_info. It's easier for now to just be consistent
everywhere.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6379>
libclang seems to define this on its own for SPIR targets, but the CTS
requires it to be not set if the device doesn't support images.
The SPIRV-LLVM-Translator also requires the spir triple to be set so we
can't really do anything else except to undefine.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstran.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6433>
The immediate case is pretty uncommon to see but it can happen, in
theory. BROADCAST is typically used to uniformize values and those are
usually 32-bit. However, it does come up in some subgroup ops.
Fixes: 49c21802cb "intel/compiler: Split has_64bit_types into float/int"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6211>
Drop unnecessary check which allows enabling of lossless write through
compression (HiZ + CCS) for D16_UNORM format on Gen12+.
We had misleading HSD information previously which used to claim that
compression can not be supported for 16bpp format. Although BSpec does
not have any restriction for D16_UNORM format.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6485>
This implements timeline semaphores using a new type of dma-fence
stored into drm-syncobjs. We use a thread to implement delayed
submissions.
v2: Drop cloning of temporary semaphores and just transfer their ownership (Jason)
Drain queue when dealing with binary semaphore
Ensure we don't submit to the thread as long as we don't need to
v3: Use __u64 not uintptr_t for kernel pointers
Fix commented code for INTEL_DEBUG=bat
Set DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES in timeline fence execbuf extension
Add new anv_queue_set_lost()
Drop multi queue stuff meant for the fake multi queue patch
Rework temporary syncobj handling
Don't use syncobj when not available (DeviceWaitIdle/CreateDevice)
Use ANV_MULTIALLOC
And a few more tweaks...
v4: Drop drained condition helper (Lionel)
Fix missing EXEC_OBJECT_WRITE on BOs we want to wait on (Jason)
v5: Add missing device->lost_reported in _anv_device_report_lost (Lionel)
Fix missing free on submit->simple_bo (Lionel)
Don't drop setting the device in lost state on QueueSubmit error (Jason)
Store submit->fence_bos as an array of uintptr_t (Jason)
v6: condition device->has_thread_submit to i915 & core DRM support (Jason)
v7: Fix submit->in_fence leakage on error (Jason)
Keep dummy semaphore with no thread submission (Jason)
v8: Move ownership of submit->out_fence to submit (Jason)
v9: Don't forget to read the VkFence's syncobj binary payload (Lionel)
v10: Take the mutex lock on anv_gem_close() (Jason/Lionel)
v11: Fix void* -> u64 cast on 32bit (Lionel)
v12: Rebase after BO backed timeline semaphore (Lionel)
v13: Fix missing snippets lost after rebase (Lionel)
v14: Drop update_binary usage (Lionel)
v15: Use ANV_MULTIALLOC (Lionel)
v16: Fix some realloc issues (Ivan)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v8)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2901>
Rework setup_{input,output} to be called during emit_intrinsic, in a way
which allows struct/array/matrix type varyings to work.
This allows turnip to pass dEQP-VK.glsl.linkage.varying.struct.*
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6181>
In FIFO presentation mode we block either on our present-queue or on Present
events after an image was transmitted.
In case we receive completion events without idle events at some point we
exhaust our acquire-queue and can not block anymore on present-queue.
Ensure that the consumer has at least one image to acquire before blocking
again on present-queue. Otherwise wait for one from the X server.
CC: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3344
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Simon Ser <contact@emersion.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6513>
Since OpenCL images don't have types, we can't use the image type here.
Rather than special-casing and only using SPIR-V return type for CL images,
we can just always use the return type to fill out the tex info.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5242>
In 957bbc6ad9 we merged all the per stages allocations of push
constants into a single one. Unfortunately one field remained per
stage.
This fixes the issue by including all the per stage values of the
masked registers for robust buffer access into the push constant data.
v2: Drop unneeded loop (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 957bbc6ad9 ("anv: simplify push constant emissions")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6505>
In nir-to-tgsi, I want to free temps storing SSA values when they go dead,
and NIR liveness has most of the information I need. Hoever, when I reach
the end of a block, I need to free whatever temps were in liveout which
are dead at that point. If liveout is indexed by live_index, then I don't
know the maximum live_index for iterating the live_out bitset, and I also
don't have a way to map that index back to the def->index that my temps
are stored under.
We can use the more typical def->index for these bitsets, which resolves
both of those problems. The only cost is that ssa_undefs don't get merged
into a single bit in the bitfield, but there are generally 1-4 of them in
a shader and we don't track liveness for those anyway so splitting them
apart is fine.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6408>
Debian defaults to bfd, which is comically slow. We can't use lld because
the old version we have in the debian stable we use has various bugs.
This required bumping libwayland, which had multiply-defined symbols
issues in the previous release.
Closes: #3236
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6324>
The state->shader is missing when used outside of nir_print_shader, just
drop these details in that case. We can fix nir_print_instr() to look up
the shader, but let's also make sure that an instr detached from a shader
(such as one you're constructing but haven't yet inserted) still works.
Fixes: 2b1ef5df4e ("nir: print IO semantics (v2)")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6496>
EXT_packed_depth_stencil adds GL_UNSIGNED_INT_24_8_EXT which is an interleaved format,
but vulkan spec states that reading/writing the corresponding format provides only the
D24 component, which requires that we perform separate operations for each component
using separate buffers
fixesmesa/mesa#3031
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6266>
For several schedule restrictions, we are checking if the instruction
is using the vpm. So far it was implemented as being a read or a write
of the vpm. But VPM wait (vpmwt) is not a read or a write (it is a
wait until all pending writes finishes). This is relevant to implement
peripheral accesses restrictions, as for some cases where vpm
read|writes are allowed, vpmwt is not.
Fixes:
dEQP-VK.binding_model.descriptorset_random.sets8.constant.ubolimitlow.sbolimitlow.sampledimglow.outimgtexlow.noiub.nouab.vert.noia.0
On the sim, as it was raising an assert for wrong peripheral access.
v2: simplify v3d_qpu_waits_vpm (Iago)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6498>
As previously shown, it is a mode on top of textureLod. The main gotcha
is the results are swizzled; we reuse the Broadcom lowering for that.
Also, there's a pretty significant erratum affecting gathers of cubemaps
which can be dealt with... eventually.
Fixes:
dEQP-GLES31.functional.texture.gather.basic.2d.*
dEQP-GLES31.functional.texture.gather.basic.2d_array.*
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6516>
By applying `textureGather` to a `sampler2DShadow`, the blob produces
(under the old disassembly):
tex_22.vtx.2d.shadow.cont.last r29, texture0, fsampler0.zwyx, r29,
The op 0x22 is 10|0010 in binary, the old shadow parameter is 1, and old
gather parameter is 0, so we get 0110|0010 in binary, or an op of
textureLod with a mod of 0110 = 6.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6516>
We don't properly support 64-bit vec2 yet for various reasons, and as-is
vectorize will try to create vec4 which we choke on. Since any workloads
relying on 64-bit vector performance are already DOA at this point,
let's just do the conformant thing.
Fixes:
dEQP-GLES31.functional.shaders.builtin_functions.integer.umulextended.uvec2_highp_compute
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6516>
Fix absolute to relative timeout computation.
Add sanity checks to futex_wait()
- handle the NULL timeout pointer case
- avoid negative cases.
From Matthieu Herrb and Scott Cheloha.
Fixes: c91997b6c4 ("util/futex: use futex syscall on OpenBSD")
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Acked-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5630>
Mesa builds with -std=c99 but uses timespec_get() a c11 function.
Build with _ISOC11_SOURCE for c11 visibility when -std is specified.
On linux c11 visibility comes from defining _GNU_SOURCE.
Fixes: e3a8013de8 ("util/u_queue: add util_queue_fence_wait_timeout")
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5630>
Since cbee1bfb34 endian.h is unconditionally
used if available.
glibc has byte order defines with two leading underscores. OpenBSD
has private defines with a single leading underscore in machine/endian.h
and public defines in endian.h with no underscore.
The code under the endian.h block did not check if symbols were
defined before equating them so '#if __BYTE_ORDER == __LITTLE_ENDIAN'
would turn into '#if 0 == 0' which is always true.
Fixes: cbee1bfb34 ("meson/configure: detect endian.h instead of trying to guess when it's available")
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5630>
GLSL lowers packhalf2x16 itself, but for SPIRV we don't have that
option.
For packing when NIR lowers it uses f2f16
and for unpack it needs the casting and f2f32
Fixes:
dEQP-VK.glsl.builtin.function.pack_unpack.packhalf2x16*
dEQP-VK.glsl.builtin.function.pack_unpack.unpackhalf2x16*
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6381>
Need to update the z value after updating the pos at pixel
center, and later reupdate it again, so we can avoid some
LLVM IR values not being dominant issues.
Fixes:
dEQP-VK.renderpass.suballocation.multisample.s8_uint.samples_4
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6381>
ARB_framebuffer_no_attachments + multisampling means blend
can have an effect even outside of colorbufs
Fixes:
dEQP-VK.pipeline.multisample.alpha_to_coverage_no_color_attachment.samples_4.alpha_opaque
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6381>
This was suggested by Roland, and fixes stencil images.
Fixes:
dEQP-VK.renderpass.dedicated_allocation.formats.d24_unorm_s8_uint.*
dEQP-VK.renderpass.dedicated_allocation.formats.d32_sfloat_s8_uint.*
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6381>
Do pretty much what the gallium state tracker does here, and
fill out first/last layer for 3D image views correctly.
Fixes:
dEQP-VK.binding_model.shader_access.primary_cmd_buf.storage_image*3d*
Fixes: b38879f8c5 ("vallium: initial import of the vulkan frontend")
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6381>
Currently llvmpipe calls finish on the context when a shader
variant has to be destroyed just in case the variant is currently
in use by the setup engine.
Fix this by reference counting the shaders, and reference counting
the shader variants.
Whenever a shader is used in the rasteriser backend, it is added
to a reference list and removed when the rasterizer is finished with it.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6341>
I have no idea how this pass ever worked. I guess it worked ok on the
one or two piglit tests but the whole thing seemed very fragile. It
makes a number of undocumented and unasserted assumptions and they
aren't always valid. This rewrite makes a number of changes:
1. It now properly handles the case where the gl_SampleMask write comes
before the gl_FragColor or gl_FragData[0] write.
2. It should early-exit faster because it now looks at bits in
shader_info::outputs_written instead of looking for variables.
3. Instead of the fragile variable lookup where we try to look the
variable up by both location and driver_location and match, we just
use the driver_location calculations used by brw_fs_nir.
4. It asserts that the index parameter to store_output is a constant
instead of silently failing if it isn't.
5. We now actually assert the implicit assumption that the two writes
are in the same block. We go even further and assert that they are
in the last block in the shader.
6. In the case where 3 or fewer components of the output are written,
we explicitly choose to leave the sample mask alone.
Fixes: 7ecfbd4f6d "nir: Add alpha_to_coverage lowering pass"
Closes: #3166
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6233>
Avoid having to mmap() unnecessarily by moving UBWC clear/init to
blitter.
Because we don't have a context when the bo is allocated, we need to
lazily initialize UBWC data, so hook into the resource_written()
tracking to do this. Don't bother with resource_read() because that
would be undefined anyways.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6475>
For prologue's in the nondraw path, we need a "gmem" rb that we can emit
the IB to the prologue before the main part of the batch. This has the
side benefit of cleaning up a bunch of duplicate setup code in a5xx.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6475>
Again, we'd like to keep the routines filling out the postfix together,
and this has a single remaining caller (once for vertex then immediately
for tiler).
By keeping them together we can avoid uploading the shared
memory/framebuffer structures twice in a row, saving a bit of memory in
the process.
We also fix a bug where bit 2 of gl_enables is incorrectly set on
Bifrost.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6476>
Need to signal push constants via a side channel. I tried to disentangle
this code, but there are a number of stacked issues here:
* We need to upload sysvals. Currently we prefix UBO #0 with sysvals,
but this requires a memcpy() of the entire contents of UBO #0. We
could create a synthetic UBO instead with sysvals at the end.
* We want to push uniforms/sysvals. Currently we push UBO #0 as much as
we can, which pushes sysvals automatically by point 1.
* We want to optimize out f2f16(uniform). We don't currently handle
this.
* We want to optimize out uniform-on-uniform/constant operations. Mesa
doesn't currently have good support for this.
The real solution will look something like:
* Create a separate UBO for sysvals.
* Let the compiler allocate push constant space as it sees fit ("copy
word 12:15 of UBO 1 to word 2:3 of push constant space, as fp16").
* Somehow handle uniform folding when NIR gains support.
For now, let's not block the depostfixening.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6476>
This is a bit of everything but overall sets up the draw state.
Translating fairly directly from the header. Main structural change is
breaking out a 2-bit enum for occlusion query mode instead of
maintaining separate booleans for the modes.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6476>
Mali has a special 5:3 encoding representing a subset of the natural
numbers, of the form:
a * (2^b)
for a odd and b natural/zero. It is used for padding out instance sizes,
as well as in attribute records so it's worth representing as a native
type as opposed to having manual packs/unpacks in various places.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6476>
Instead of allocating a push constant buffer per stage from the
dynamic state pool, we can use the same one for all stages.
We can do this because the push constant data is supposed to be
identical of all stages. Even if vkCmdPushConstants() allows to update
chunks of the push constant data differently per stage, this valid
usage guarantees that any chunk of push constant data used be 2
different stages must be identical :
"For each byte in the range specified by offset and size and for
each push constant range that overlaps that byte, stageFlags must
include all stages in that push constant range’s
VkPushConstantRange::stageFlags"
v2: Fix dirtying of stages (Jason)
v3: Move push constant data into base pipeline state struct (Jason)
v4: Remove duplicated field (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6183>
The nir_shader_lower_instructions() is really nice, but it's only for SSA
operations, and sometimes you want something more general. I've put it in
nir_builder.h so it can be inlined and retain the same performance
characteristics we're used to in our lowering passes even in the absence
of LTO.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6412>
The warning is kind of silly:
Test case 'dEQP-GLES2.functional.shaders.indexing.tmp_array.vec3_const_write_static_read_vertex'..
==1874780== Source and destination overlap in memcpy(0xa261690, 0xa261690, 160)
==1874780== at 0x484D498: __GI_memcpy (vg_replace_strmem.c:1037)
==1874780== by 0x596FC07: copy_entry_remove (nir_opt_copy_prop_vars.c:296)
The "memcpy is undefined if they overlap" thing is surely meant to be
"memcpy with *partial* overlap is undefined", but let's keep anyone else
from having to debug this.
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6178>
When calling glWaitSync (fence_server_sync), we added dependencies
in all batches (render and compute) on existing work. Even if
applications don't use compute at all, they theoretically could,
so we record that the compute batch depends on the render batch.
But if the application truly doesn't use compute, or rarely uses
it, we ended up recording dependencies on _all_ previous render
batches, racking up a massive list of syncobjs. Not only is this
pointless, it also meant that we never allowed the kernel to free
the underlying i915_request objects.
There are a number of solutions to this problem, but for now, we
take a simple one: when recording a new syncobj dependency, we
walk the list and see if any of them have already passed. If so,
that dependency has been fulfilled. We no longer need to track it,
and can simply drop it from the list, unreferencing the syncobj.
Android's SurfaceFlinger in particular was hitting this issue,
as it uses glWaitSync, doesn't typically use compute shaders,
and runs for long durations.
Thanks to Yang A Shi <yang.a.shi@intel.com> and
Kefei Yao <kefei.yao@intel.com> for their excellent work in
tracking down this issue!
Fixes: f459c56be6 ("iris: Add fence support using drm_syncobj")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: Yang A Shi <yang.a.shi@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6463>
Swapping the order of the loops makes the logic much easier to follow:
for each point in our fence, if it hasn't gone by, make future work in
all batches depend on it. Both loops are necessary, and now it's
clearer why.
(This doesn't actually fix a bug but needs to be cherry-picked for
the next patch to apply, which does fix a bug.)
Fixes: f459c56be6 ("iris: Add fence support using drm_syncobj")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: Yang A Shi <yang.a.shi@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6463>
The VIVS_PE_DEPTH_CONFIG_DISABLE_ZS in PE_DEPTH_CONFIG caused depth
write hangs on HALTI5.
This is because the 0x11000000 bits in RA have to be toggled on
when setting this bit to zero. This combination will disable
early-z rejection on GC7000L, which was previously done through
a different bit.
Tested only on GC7000L so far.
Signed-off-by: Lukas F. Hartmann <lukas@mntre.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5456>
PTB assumes that base instance to be 0 at start of tile, but hw would
not do that, we need to set it. It is worth to note that the opcode
name is somewhat confusing as what it really sets is the base
instance. We could rename the opcode, but then the name would be
different to the original Broadcom name, so confusing in any case.
This fixes several dEQP-GLES3 and dEQP-GLES31 tests that passes
individually, but started to fail depending on other tests running
before using base instance different to zero.
This is the backport of a Vulkan patch that fixed some Vulkan CTS
tests that start to fails after some other tests used an instance id.
CC: 20.2 20.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6447>
As with a6xx (commits beb02a78, 5785bcc8), the blob doesn't set this flag
for a5xx when fragcoords are used but not proper varyings. See for
example dEQP-GLES2.functional.shaders.builtin_variable.fragcoord_xyz.
The hope was that this would clear up separate_shader fails/flakes like it
helped with a6xx's flakes, but that didn't happen.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6464>
We got another flake, this time on
dEQP-GLES31.functional.compute.shared_var.atomic.compswap.highp_uint,
which blocked !4162 from merging. Mark the rest flaky so we don't have to
keep firefighting one test at a time.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6459>
It seems we do have some limits. Similar to older gens, # of tiles per
pipe cannot be more than 32. But I could not trigger any hangs with 16
or more tiles per pipe in either X or Y direction, so that limit does
not seem to apply.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6461>
Adds shader disk caching for nvc0 to reduce the need to every time compile
shaders. Shaders are saved into disk_shader_cache from nvc0_screen structure.
It serializes the input nv50_ir_prog_info to compute the hash key and
also to do a byte compare between the original nv50_ir_prog_info and the one
saved in the cache. If keys match and also the byte compare returns they
are equal, shaders are same, and the compiled nv50_ir_prog_info_out from the
cache can be used instead of compiling input info.
Seems to be significantly improving loading times, these are the results
from running bunch of shaders:
cache off
real 2m58.574s
user 21m34.018s
sys 0m8.055s
cache on, first run
real 3m32.617s
user 24m52.701s
sys 0m20.400s
cache on, second run
real 0m23.745s
user 2m43.566s
sys 0m4.532s
Signed-off-by: Mark Menzynski <mmenzyns@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4264>
The initial plan was to use this for OpenCL kernels, but back then the
plan was to convert from LLVM to TGSI. As it turns out, we didn't went
that way.
Right now for OpenCL we don't reqiure supporting multiple entry points
inside the same binary and if we want to support it later, we can add
this back.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4264>
We only need to assert this in the `io_lowered` case, which actually
uses num_outputs. This assert also doesn't appear to hold on iris,
where num_outputs is showing up as 0 (because it's likely not yet set).
Fixes assertion failures in edgeflag related tests on iris, which
doesn't use the io_lowered path currently.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3456
Fixes: 484a60d547 ("nir: generate lowered IO in nir_lower_passthrough_edgeflags")
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6450>
When using nir_lower_interpolation, we need to propagate the IO
semantics from the load_interpolated_input to the new
load_fs_input_interp_deltas intrinsics. nir_lower_io assumes
they will be filled out.
This fixes assertions in most tests on iris since commit
01ab308edc, where nir_lower_io
started reading this field.
Fixes: 01ab308edc ("nir: update IO semantics in nir_io_add_const_offset_to_base")
Fixes: 502abfce7f ("nir: save IO semantics in lowered IO intrinsics")
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6450>
These special OPAQUE packs use packed structs in the struct template,
instead of struct templates. The use case is packing nested structs
out-of-band, to fit into the CSO model.
A more conventional GenXML solution would be an overlapping uint, but
this breaks our assumptions about struct packing which are otherwise
correct, so this seemed less intrusive than risk disrupting the main
pack routines.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6440>
This contains a bit of everything, so just XML for this commit. The rest
of the series will be slowly moving over to this representation.
The one noteworthy addition is the rename of "No MSAA" to
"Single-sampled lines". This came about due to a buggy branch that
forgot to set this bit. Ths worked, with the caveat of the following
tests failing with a single-sampled framebuffer:
dEQP-GLES2.functional.rasterization.interpolation.basic.line_loop_wide
dEQP-GLES2.functional.rasterization.interpolation.basic.line_strip_wide
dEQP-GLES2.functional.rasterization.interpolation.basic.lines_wide
dEQP-GLES2.functional.rasterization.interpolation.projected.line_loop_wide
dEQP-GLES2.functional.rasterization.interpolation.projected.line_strip_wide
dEQP-GLES2.functional.rasterization.interpolation.projected.lines_wide
dEQP-GLES2.functional.rasterization.primitives.line_loop
dEQP-GLES2.functional.rasterization.primitives.line_loop_wide
dEQP-GLES2.functional.rasterization.primitives.line_strip
dEQP-GLES2.functional.rasterization.primitives.line_strip_wide
dEQP-GLES2.functional.rasterization.primitives.lines
dEQP-GLES2.functional.rasterization.primitives.lines_wide
That is, this bit controls the behaviour of line rasterization with
multisampling. This is required to implement the divergent behaviours
described in the OpenGL ES 3.2 specification sections 13.6.1 ("Basic
Line Segment Rasterization") and 13.6.4 ("Line Multisample
Rasterization"), where setting this bit corresponds to the former
(single-sampled) behaviour.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6440>
For non-fragment shaders, we can just use the preuploaded BO as-is and
do no packing/copying at draw-time. Fragment shaders are still a bit of
an edge case, in having rasterizer/zsa/blend state in the same
descriptor, but now we can specialize that path further for
fragment-only.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6440>
This isn't as clean as vertex shaders, since some of this state is only
known at draw-time (e.g. some reasons we might disable early-Z). But we
can still pack as much as we can ahead-of-time and then OR together
what's left at draw-time.
Thank you to Kristian for this one crazy trick (blob developers hate
it! er...)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6440>
There's a lot of code here since the meaning of this field changes
depending on shader state. The good news is that our careful handling
allows preload registers to be decoded now, which pandecode could not
previously do. Likewise, the cmdstream code to emit this is now much
more obvious.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6440>
In a long journey to a full XML representation of mali_shader_meta,
let's start with the fourth word, containing some shader properties.
This is a translation from panfrost-job.h, with the exception of
widening the uniform buffer count field [1]
The other noteworthy change is combining the unknown 0x20 flag with the
WRITES_Z flag to form a 2-bit depth source. This papers over the fact
that the blob zeroes this field for non-fragment shaders. Given the
proximity, this is a reasonable guess and avoids an ugly "is_fragment"
bit.
[1] Justified by the increased limit advertised by the Vulkan blob
(maxDescriptorSetUniformBuffers on
https://vulkan.gpuinfo.org/displayreport.php?id=5602#limits). Not
actually supported in Panfrost right now.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6440>
The cases of SFBD blend shaders (which work normally) and MFBD blend
shaders (which need a bizarre errata workaround) are distinct. Let's
seperate them to make each a bit clearer.
Since this field is repurposed on Bifrost, this should fix bugs there
too (although blend shaders on Bifrost are currently todo).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6440>
In exchange for a bit of code duplication, we can streamline both
routines. A huge amount of the descriptor is unused for non-fragment
shaders, and even some of the parts that are used appear to have varying
meanings. Given we want to emit the descriptors atomically, this seems
like a reasonable tradeoff.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6440>
Before this change, internalFormat was defaulted to GL_RGBA (
unsized internal format). Therefore, subsequent glTexSubImage2D
call with type != GL_UNSIGNED_BYTE, GL_UNSIGNED_SHORT_4_4_4_4 or
GL_UNSIGNED_SHORT_5_5_5_1 would give GL_INVALID_OPERATION.
This fixes
android.graphics.cts.BitmapColorSpaceTest#test16bitHardware
android.graphics.cts.ImageDecoderTest#testDecodeBitmap*
android.graphics.cts.BitmapTest#testNdkFormatsHardware
in CtsGraphicsTestCases
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6382>
This is the value exposed by the a3xx-a4xx drivers according to an
official Adreno OpenGL ES Developer guide I found, and also a report I saw
for a5xx while googling. Fixes renderdoc replay of a manhattan31 trace
captured on a Pixel 3a.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6416>
If you have a sequence where there is a single buffer associated with
the current render target, and then you end up shadowing it on the 3d
pipe (u_blitter), because of how we swap the new shadow and rsc before
the back-blit, you could end up confusing things into thinking that
the blitters framebuffer state is the same as the current framebuffer
state.
Re-organizing the sequence to swap after the blit is complicated when
also having to deal with CPU memcpy blit path, and the batch/rsc
accounting. So instead just detect this case and flush if we need to.
Fixes:
dEQP-GLES31.functional.stencil_texturing.render.depth24_stencil8_clear
dEQP-GLES31.functional.stencil_texturing.render.depth24_stencil8_draw
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6434>
This enables drivers and utils to get all IO information from intrinsics,
so that they don't have to walk the complex types of NIR variables to find
out other information about IO intrinsics.
NIR in/out variables can be removed after nir_lower_io. We could remove
the variables in the pass, but for now I just decided to remove
the variables in radeonsi before shaders are returned to st/mesa.
(st/mesa just needs adjustments to work without NIR in/out variables)
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6442>
The raw query is meant to be used with MDAPI [1]. When using this
metric without this library, we usually selected the TestOa metric to
provide some default sensible values (instead of undefined).
Historically this TestOa metric lived in the kernel at ID=1. We
removed all metrics from the kernel in kernel commit 9aba9c188da136
("drm/i915/perf: remove generated code").
This fixes the Mesa code to use a valid metric set ID (1 could work
some of the time, but not guaranteed).
[1] : https://github.com/intel/metrics-discovery
v2: Store fallback metric at init time
v3: Drop TestOa lookout
v4: Skip the existing queries (Marcin)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
CC: <mesa-stable@lists.freedesktop.org>
Tested-by: Marcin Ślusarz <marcin.slusarz@intel.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6438>
This gets us fewer comparisons in the shaders that we need to optimize
back out, and reduces backend code.
total instructions in shared programs: 11547270 -> 7219930 (-37.48%)
total full in shared programs: 334268 -> 319602 (-4.39%)
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6378>
When TRAP_PRESENT is not enabled, all traps and exceptions are ignored.
Only EXCP_EN.mem_viol is currently supported because the other
exceptions have to be tested/validated first.
EXCP_EN.mem_viol is used to detect any sort of invalid memory
access like VM fault. When a memory violation is reported, the
hw jumps to the trap handler.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6384>
A trap handler is used to handle shader exceptions like memory
violations, divide by zero etc. The trap handler shader code will
help to identify the faulty shader/instruction and to report
more information for better debugging.
This has only been tested on GFX8, though it should work on GFX6-GFX7.
It seems we need a different implemenation for GFX9+.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6384>
It's way easier to write a trap handler shader using ACO IR
instead of writing disassembly by hand + clrxasm + copy&paste.
This trap handler is quite simple for now, it just loads a
buffer descriptor from the TMA BO, it saves ttmp0-1 which
contain various info about the faulty instruction, and it
stores some hw registers about the wave/trap status.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6384>
The TBA/TMA scalar registers are only available on GFX6-GFX8.
On GFX9+, TBA/TMA addr are stored in hardware registers and
the number of TTMP scalar registers is thus increased by 4.
Just keep in mind that tba_lo is actually ttmp0. Best would
be to support ttmp registers in RA but that's more complicated.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6384>
Microsoft's DXIL is based on LLVM, which doesn't have an integer ABS
opcode, but instead needs it lowered to NEG + MAX. We need to do this
with an option, to prevent an already existing optimization rule from
undoing this.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5211>
If no options are provided, existing intrinsics are used.
If the lowering pass indicates there should be offsets used for global
invocation ID or work group ID, then those instructions are lowered to
include the offset.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5891>
New intrinsics are added for global invocation IDs and work group IDs to
deal with offsets in both. The only one of these that needs a system value
is global invocation offset, for CL's get_global_offset().
Note that CL requires very large work group sizes, so these intrinsics
are modified to be able to use 64bit values, for 64bit SPIR-V.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5891>
Occasionally something goes weird in the network and a group of chezas
will produce streams of these errors during the tftp process, eventually
timing out after 60 minutes in the job. By the time we notice, the next
jobs seem to go through fine, so watch for them and try rebooting the
cheza to see if that gets our jobs to pass again.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6398>
If we get this error, we can just try rebooting again and see if it comes
up then. The POWER_GOOD failures are clustered in time, but it's better
to retry a few times in a row in one job (which has its own 60min timeout)
than to spuriously fail someone's pipeline.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6398>
This one uses python threads to move some of our logic from shell
pipelines to python, and opens the door to doing better serial output
tracking in the future (the SerialBuffer.lines() method)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6398>
On some malloc implementation, malloc doesn't always align to 16
bytes even on 64 bits system. To make sure ralloc_header always
starts at the wanted alignment, just force the size to be aligned at
the alignment of ralloc_header. This fixes crashed on instruction
like "movaps %xmm0,0x10(%rax)" which requires aligned memory access.
Signed-off-by: Lepton Wu <lepton@chromium.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6314>
With commit 2122b902b8, u_index_translator can return U_TRANSLATE_MEMCPY
for 8-bits indices, and in this case we need to call the translation function
instead of a simple passthrough to the device.
Fixes piglit spec@nv_primitive_restart tests.
Fixes: 2122b902b8 "gallium/indices: don't expand prim-type for 8-bit indices"
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6414>
When we initialize the buffer surface, do not map the existing storage
with DONTBLOCK, leave it as a synchronized map.
This patch also sets the surface rebind flag after it is bound to a
new buffer and sets the surface buffer pointer accordingly.
This fixes display corruption issue seen with running steam.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6415>
Section 5.11 (Out-of-Bounds Accesses) of the GLSL 4.60 spec says:
"In the subsections described above for array, vector, matrix and
structure accesses, any out-of-bounds access produced undefined
behavior.... Out-of-bounds writes may be discarded or overwrite
other variables of the active program."
Fixes crashes when dereferencing gl_ClipDistance and gl_TessLevel*, e.g:
int index = -1;
gl_ClipDistance[index] = -1;
When LowerCombinedClipCullDistance is true.
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6363>
Several optimization paths, including constant folding, can lead to
indexing vector with an out of bounds index.
Out-of-bounds writes could be eliminated per spec:
Section 5.11 (Out-of-Bounds Accesses) of the GLSL 4.60 spec says:
"In the subsections described above for array, vector, matrix and
structure accesses, any out-of-bounds access produced undefined
behavior.... Out-of-bounds writes may be discarded or overwrite
other variables of the active program."
Fixes piglit tests:
spec@glsl-1.20@execution@vector-out-of-bounds-access@fs-vec4-out-of-bounds-1
spec@glsl-1.20@execution@vector-out-of-bounds-access@fs-vec4-out-of-bounds-6
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6363>
This makes sure that we keep executing the tests so that we can get our
alerts in IRC and know whether the tests are still flaking. It also keeps
us from having adjustments to the skip list causing failures/flakes to
move to different tests (as seen with a530 having to move some xfails
around after changing the skip list)
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6392>
So far, we've been putting our known flakes that intermittently fail CI
into the skips list. This has two downsides:
1) You don't know when the flakes stop happening and when to delist them
from skips, unless you go do a bunch of manual runs with the skips list
cleared.
2) If the flake was because the previous test left some broken state in
the HW, you may just move your intermittent to a new test.
With this new path, you can list your flakes in the flakes file to keep
them from erroring out people's pipelines. They still get run and
reported as is.
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6392>
The OpenCL image_width/height/depth functions have variants which can
take an LOD parameter. More importantly, LLVM-SPIRV-Translator always
generates OpImageQuerySizeLod even if the LOD is guaranteed to be zero.
Given that over half the hardware out there has an LOD field for image
size queries (based on a rudimentary scan through their NIR -> whatever
code), we may as well just add the source to the NIR intrinsic. If this
is ever a problem for anyone, the lowering is pretty trivial.
I've also added asserts to everyone's drivers that should alert them if
they ever see an LOD other than zero. This will never happen with GL or
Vulkan so there's no need for panic.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6396>
v2:
a) Move `supported_spirv_verssions()` to `spirv::` (Francisco Jerez)
b) Introduce a `SPV_MAKE_VERSION` macro to avoid making mistakes and
making it more readable than its uint representation.
v3: Replaced an if-statement with a `std::min()` in
`spirv::get_spirv_translator_options()` (Karol Herbst)
v4: Turn `SPV_MAKE_VERSION()` into a function (Francisco Jerez)
Signed-off-by: Pierre Moreau <dev@pmoreau.org>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5038>
SPIRV-LLVM-Translator had API-breaking changes during version 0.2.1.
Since the new API was being used, a higher version has to be requested.
While at it and since SPIRV-LLVM-Translator added a new API to request a
specific SPIR-V version as well as allowed SPIR-V extensions, increase
the version number to the first including those new features.
v2: Require an LLVM version which is compatible with the
SPIRV-LLVM-Translator that was found; that requirement was
previously implicit and would be ensured when that library was
built. Making it explicit will help in cases where multiple versions
of LLVM might be installed.
v3: fix use of undefined _minimum_llvmspirvlib_version_array
v4 (Karol): fix creating the minimum requested version
Simplifing the code
Signed-off-by: Pierre Moreau <dev@pmoreau.org>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Serge Martin <edb@sigluy.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5038>
As the original comment says, we can't really give the user what they
want if there's a timestamp inside a GMEM renderpass, but we can give
them a better approximation of it. At least sysmem renderpasses will now
have an accurate timestamp.
Also, don't emit the WFI if it's not necessary, based on the stage
flags.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5720>
Loads, stores, clears, and resolves now happen per-view. Since we only
support multiview with sysmem rendering, we only implement this for
sysmem clears and resolves.
There aren't any tests that mix multiview and MSAA, so no coverage of
the resolve path.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5720>
This cuts a bunch of vector setup for undef components in the i965 vec4
backend. Noticed while looking into codegen regressions in nir-to-tgsi.
brw results:
total instructions in shared programs: 3893221 -> 3881461 (-0.30%)
total cycles in shared programs: 113792154 -> 113810288 (0.02%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6054>
On glmark2-es2 -bterrain:
594.05KB leaked over 9282 calls from:
panfrost_batch_update_bo_access
at ../src/gallium/drivers/panfrost/pan_job.c:462
in /home/alyssa/rockchip_dri.so
panfrost_batch_add_bo
at ../src/gallium/drivers/panfrost/pan_job.c:560
panfrost_batch_add_bo
at ../src/gallium/drivers/panfrost/pan_job.c:519
in /home/alyssa/rockchip_dri.so
panfrost_batch_add_resource_bos
at ../src/gallium/drivers/panfrost/pan_job.c:569
panfrost_batch_add_fbo_bos
at ../src/gallium/drivers/panfrost/pan_job.c:588
in /home/alyssa/rockchip_dri.so
panfrost_create_batch
at ../src/gallium/drivers/panfrost/pan_job.c:126
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Cc: mesa-stable
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6373>
8.74KB leaked over 52 calls from:
0xffffbb5b9fc3
in ??
_mesa_hash_table_init
at ../src/util/hash_table.c:163
in /home/alyssa/rockchip_dri.so
_mesa_hash_table_create
at ../src/util/hash_table.c:186
_mesa_hash_table_u64_create
at ../src/util/hash_table.c:701
in /home/alyssa/rockchip_dri.so
panfrost_nir_assign_sysvals
at ../src/panfrost/util/pan_sysval.c:130
in /home/alyssa/rockchip_dri.so
midgard_compile_shader_nir
at ../src/panfrost/midgard/midgard_compile.c:2905
in /home/alyssa/rockchip_dri.so
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Cc: mesa-stable
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6373>
Before we drop the reference.
160 calls with 0B peak consumption from:
0xffffbd9d2fc3
in ??
pan_compute_liveness
at ../src/panfrost/util/pan_liveness.c:127
in /home/alyssa/rockchip_dri.so
mir_compute_liveness
at ../src/panfrost/midgard/midgard_liveness.c:55
in /home/alyssa/rockchip_dri.so
midgard_opt_dead_code_eliminate
at ../src/panfrost/midgard/midgard_opt_dce.c:118
in /home/alyssa/rockchip_dri.so
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Cc: mesa-stable
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6373>
No need to put it on the context, we can keep it local in mir_squeeze
and drop when we're done.
15.77KB leaked over 85 calls from:
0xffffaed3bfc3
in ??
_mesa_hash_table_rehash
at ../src/util/hash_table.c:368
in /home/alyssa/rockchip_dri.so
hash_table_insert
at ../src/util/hash_table.c:403
in /home/alyssa/rockchip_dri.so
find_or_allocate_temp
at ../src/panfrost/midgard/mir_squeeze.c:48
in /home/alyssa/rockchip_dri.so
find_or_allocate_temp
at ../src/panfrost/midgard/mir_squeeze.c:35
in /home/alyssa/rockchip_dri.so
mir_squeeze_index
at ../src/panfrost/midgard/mir_squeeze.c:76
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Cc: mesa-stable
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6373>
After we compile from NIR to a native binary, we can throw away the NIR.
17.47KB leaked over 104 calls from:
0xffff87dcafc3
in ??
_mesa_hash_table_init
at ../src/util/hash_table.c:163
in /home/alyssa/rockchip_dri.so
_mesa_hash_table_create
at ../src/util/hash_table.c:186
nir_lower_vars_to_ssa_impl
at ../src/compiler/nir/nir_lower_vars_to_ssa.c:717
in /home/alyssa/rockchip_dri.so
nir_lower_vars_to_ssa
at ../src/compiler/nir/nir_lower_vars_to_ssa.c:817
optimise_nir
at ../src/panfrost/midgard/midgard_compile.c:504
in /home/alyssa/rockchip_dri.so
midgard_compile_shader_nir
at ../src/panfrost/midgard/midgard_compile.c:2895
in /home/alyssa/rockchip_dri.so
panfrost_build_blit_shader
at ../src/panfrost/lib/pan_blit.c:103
in /home/alyssa/rockchip_dri.so
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Cc: mesa-stable
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6373>
Fixes heaptrack leak:
19.37KB leaked over 63 calls from:
0xffff92bbefc3
in ??
nir_alu_instr_create
at ../src/compiler/nir/nir.c:442
in /home/alyssa/rockchip_dri.so
clone_alu
at ../src/compiler/nir/nir_clone.c:277
in /home/alyssa/rockchip_dri.so
clone_instr
at ../src/compiler/nir/nir_clone.c:495
in /home/alyssa/rockchip_dri.so
clone_block
at ../src/compiler/nir/nir_clone.c:544
clone_cf_list
at ../src/compiler/nir/nir_clone.c:594
clone_function_impl
at ../src/compiler/nir/nir_clone.c:672
in /home/alyssa/rockchip_dri.so
nir_shader_clone
at ../src/compiler/nir/nir_clone.c:744
in /home/alyssa/rockchip_dri.so
panfrost_shader_compile
at ../src/gallium/drivers/panfrost/pan_assemble.c:154
in /home/alyssa/rockchip_dri.so
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Cc: mesa-stable
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6373>
Whereas the main batch->pool is CPU read/write, the new
batch->invisible_pool is not. This enables GPU-internal structures that
the CPU must allocate from a pool dynamically but does not read,
corresponding to the BO_INVISIBLE create flag.
The use case is speeding up varying allocation by skipping the
CPU-side mmap/munmap.
We simultaneously half the pool's minimal allocation to avoid negatively
affecting memory usage.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6373>
We were failing to set the "Headerless Message for Preemptable Contexts"
bit in SAMPLER_MODE in the compute context. Other drivers use a single
hardware context, so setting it on the render engine was sufficient to
flip it in both pipelines. But iris uses a separate hardware context
for compute, so we were only getting these set for the render context.
Thanks to Jason Ekstrand for catching this bug.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6380>
We were calling the table-based unpack functions from inside the pack and
unpack table's methods, so if anything included these pack functions (such
as a call to a table-based pack function), you'd pull in all of unpack as
well.
By calling them explicitly, we save some overhead in these functions
(switch statement, address math on the zero x,y arguments) anyway.
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6307>
Eric Anholt identified the issue when merging one of my MRs: the
variable contained words in '`' backticks, which caused them to be
executed by the bare metal runner's shell.
Quote the value printed using bash's shell expansion feature to make
sure anything in the future will be properly quoted.
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6389>
This pass works entirely with variables, all we have to do is ignore any
derefs we see which we can't chase back to the variable. The one
interesting case we have to handle is if we have a complex use of a
deref_var. In that case, we have to flag it non-constant.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6210>
The *2 here would bump into the *2 in regset, causing assertion failures
dumping CS programs. Just set the mergedregs flag on a6xx, and don't
duplicate the mergedregs logic. If you're dealing with new HW where we
don't know if mergedregs is set, you may need to tweak the flag during
disasm setup for the stats to make sense.
Fixes: f7bd3456d7 ("freedreno: deduplicate a3xx+ disasm")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6323>
This matches what ir3.c does in the mergedregs case: just count max full
reg used. This flag is unset so far, but will be soon and keeps our
output comparable between blob and freedreno.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6323>
Previously we were doing this both for draw and binning pass.. resulting
that the stateobj contained first the incorrect state (based on binning
pass shader with varyings DCEd) followed by the correct state.
Also, in the streamout case we should link binning pass VS against the
real FS, rather than the dummy FS, so that OUTLOC's match (since same
streamout stateobj is used for both draw and binning pass).
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6376>
During build on Android 10, build error occurred:
'''
[ 26% 456/1718] Gen Header: libfreedreno_registers_32 <= a3xx.xml.h
FAILED: out/target/product/pinephone/gen/STATIC_LIBRARIES/libfreedreno_registers_intermediates/registers/adreno/a3xx.xml.h
/bin/bash -c "PATH=/usr/bin:\$PATH python3 external/mesa3d/src/freedreno/registers/gen_header.py external/mesa3d/src/freedreno/registers/adreno/a3xx.xml > out/target/product/pinephone/gen/STATIC_LIBRARIES/libfreedreno_registers_intermediates/registers/adreno/a3xx.xml.h"
Traceback (most recent call last):
File "external/mesa3d/src/freedreno/registers/gen_header.py", line 470, in <module>
main()
File "external/mesa3d/src/freedreno/registers/gen_header.py", line 446, in main
xml_file = sys.argv[2]
IndexError: list index out of range
'''
Align build rules with meson fixes it.
Fixes: 62ebd342 ("freedreno/registers: split header build into subdirs")
Signed-off-by: Roman Stratiienko <r.stratiienko@gmail.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6170>
This is a missing part of freedreno EXT_semaphore support,
recently merged as part of series adding EXT_external_objects. This hunk was
actually lost due to a mistake on my side while doing a rebase.
Reviewed-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6383>
This currently covers two situations where it's obvious that
the GPU hung:
1) when wait-of-idle doesn't finish in a finite time
2) when a CS submission is cancelled by the kernel
There is still probably some other situations that aren't yet handled.
According to the Vulkan spec, some operations should return
VK_ERROR_DEVICE_LOST when the corresponding logical device is
known to be lost.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5878>
vulkan has these separate, should be fine for GL as well as
the values will be the same anyways.
Fixes:
dEQP-VK.binding_model.shader_access.secondary_cmd_buf.uniform_texel_buffer.*
dEQP-VK.binding_model.descriptorset_random.sets4.noarray.ubolimitlow.sbolimitlow.sampledimglow.outimgtexlow.noiub.uab.vert.noia*
dEQP-VK.binding_model.descriptor_copy.graphics.uniform_texel_buffer.*
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6339>
The lod query doesn't take a layer, but the code tries to use one,
detect lodq and don't use a layer in those cases.
There appears to be no GL tests for this behaviour, but the vulkan
CTS hits it.
Fixes:
dEQP-VK.glsl.texture_functions.query.texturequerylod.*
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6339>
v2: Use 'prsc' and 'rsc', 'pmemobj' and 'memobj' for consistency with
rest of the code. (Rob Clark)
v3: - Use the existing flag PIPE_BIND_LINEAR instead (Marek Olšák)
- Assert that the resource is not intended for scanout (Rob Clark)
- Use the fd_resource_allocate_and_resolve() helper (Rob Clark)
- Check that bo's resolved size fit into memobj's bo size (Rob Clark)
v4: Don't steal memobj's bo, but share it instead by getting a new
ref. (Rob Clark)
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4565>
v2: Add implementation of fd_memobj_destroy() virtual func, which was newly
added.
v3: The memobj bo must be non-NULL and destroyed as part of memobj
destruction (instead of its reference being stolen). (Rob Clark)
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4565>
If tiling is linear, also adds the corresponding pipe bind flag
to the resource template when creating a new resource from the
memory object.
Modified TexParameteri to update the TextureTiling option only when the
texObj is not immutable (before TexStorageMem*DEXT is called to create
the texture from external memory).
v2: Ensure that memory object is not NULL before setting the
TextureTiling option (fixes TexStorage*D).
v3: Also add flag PIPE_SHARED to bindings (needed for some gallium
drivers).
v4: Use PIPE_BIND_LINEAR instead of adding a new PIPE_RESOURCE_FLAG
(Marek Olšák)
Co-authored-by: Eduardo Lima Mitev <elima@igalia.com>
Co-authored-by: Eleni Maria Stea <estea@igalia.com>
Co-authored-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Rohan Garg <rohan.garg@collabora.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4565>
We really need to update the enum for consistency, but that involves
a bunch of GL & bitfield work which is error-prone, so since this is
a fix for stable lets do the simple things.
Confirmed that nothing in radv/aco/nir/spirv uses MAX_VERT_ATTRIB
except the one thing I bumped.
CC: mesa-stable
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6120>
This uses a new gralloc perform op that returns the buffer info we
need. No need to guess at formats, hard code offsets and recalculate
strides. This also gives us the format modifier as well as aux planes
for compressed RGBA buffers.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6055>
This function wants to create a __DRIimage for an ANativeWindowBuffer,
which is mostly the same logic as when we create an EGLImage for an
ANativeWindowBuffer. Reuse droid_create_image_from_prime_fds().
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6055>
Instead of building up EGL attribute lists and then having to parse
them again, call the DRI driver directly and then use the
dri2_create_image_from_dri() helper to wrap the __DRIimage in an
EGLImage.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6055>
Although it's kind-of similar to "(rptN)" in the shader ISA, I called it
"xmov" to make it clear that it's completely orthogonal to "(rep)",
although you certainly can use both modifiers on the same instruction.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6368>
This struct allows us to pass the configuration as a struct, which can
more easily be extended to take more arguemnts as long as we're careful
about zero-initialization.
We keep the old create-function, but implement it as a wrapper on top so
we don't have to update all existing call-sites right now.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5976>
Expanding the primitive-type has two undesirable effects:
1. It breaks primitive-restart. This is possible to fix by explicitly
handling primitive-restart in more conversion routines. But
u_indices_gen.py is kind of a mess, so it's not trivial as-is.
2. It changes the reported gl_VertexID.
While it might be possible to work around this in each driver, it seems
better to avoid this when we can.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5976>
Due to scheduled physical lab maintenance to prepare for expansion and
better shard our device types across redundant infrastructure, these
device types will be inaccessible for approx. 3 hours. Disable them
until they are back.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6302>
The round-to-nearest-even implementation found in lower_2f() is incorrect
for any value having a significand that is not directly representable
and whose non-representable part lies between 1 and half the minimum
representable value. In this case, the significand is rounded up instead
of being rounded down.
Fixes: 936c58c8fc ("nir: Extend nir_lower_int64() to support i2f/f2i lowering")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Tested-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6290>
For some unfathomable reason, three out of these four tests often time
out when running within CI. On the assumption that there is some
parallelisation badness happening rather than the non-UNIX tests
entering infinite loops, try just marking them as serial-only.
This should have a negligible impact on runtime since they are quick to
execute.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6301>
This is the initial import of the vallium frontend for gallium.
This is only good enough to run the triangle and the gears demo
(wrongly) from Sascha demos.
Improvements are mostly on the llvmpipe side after this.
It contains an implementation of the Vulkan API which is mapped
onto the gallium API, and is suitable only for SOFTWARE drivers.
Command buffers are recordred into malloced memory, then later
they are played back against the gallium API. The command buffers
are mostly just Vulkan API marshalling but in some places the information is
processed before being put into the command buffer (renderpass stuff).
Execution happens on a separate "graphics" thread, againt the gallium API.
There is only a single queue which wraps a single gallium context.
Resources are allocated via the new resource/memory APIs.
Shaders are created via the context and bound/unbound in the
second thread.
(No HW for reasons - memory management, sw paths for lots of paths,
pointless CPU side queue)
v2: drop mesa_icd, drop cpp_args, drop extra flags, change meson config (Eric)
v2.1: use meson-gallium job
meson pieces:
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
overall:
Acked-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6082>
This adds an option to the WSI support for a software path to be
used with the vulkan sw drivers. There is probably some changes
that could be made to improve this and use present, for now
just use put image.
v2: roll out flag across all drivers (Eric)
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6082>
In order to support vulkan over gallium for the sw renderers,
there needs to be a vulkan-like memory allocation API.
It doesn't need to be overly complicated for the needs of the sw
renderers.
The vallium layer will allocate resources and memory separately
and bind them via this API.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6082>
This is the only user of fetch_rgba outside of llvmpipe, and it's in the
fallback path of this fallback path. Looking at an example of these two
functions, b8g8r8a8's unpack_rgba is 2.7x as long as fetch_rgba. It feels
reasonable to sacrifice some perf in this already slow (VBO readback, and
a function pointer call per attribute per vertex) path to reduce our
binary size. And, if I ever finish getting unpack codegen to switch to
rows instead of rects, that factor will go back down.
Saves 40kb of binary on non-llvmpipe gallium drivers.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6305>
A single format either had the float, the sint, or the uint version.
Making the dst be void * lets us store them in the same slot and not have
logic in the callers to call the right one.
-6kb on gallium drivers
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6305>
KHR-GL45.robust_buffer_access_behavior.vertex_buffer_objects
on the CTS 4.6.0 branch and this fixes it.
Roland identified that if the vertex format doesn't contain channels
then we shouldn't be overriding them to 0, so RGB fetch out of bounds
should return 0 for RGB, but the A channel should still be getting back
1.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6287>
Changes are necessary after commit 84ed2d098
("util/format: expose generated format packing functions through a header")
because script for format/u_format_pack.h has different commandline
compared to existing pattern
Generated sources rules are ported from meson
Fixes: 84ed2d098 ("util/format: expose generated format packing functions through a header")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6319>
If there's some reason why this needs to be a tripple loop, I'm not
seeing it. As far as I can tell, all the inner-most loop does is look
for the next remaining block not already in cur_level->blocks. There's
no reason to re-walk the whole set every time just to do that.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2401>
It's really hard to track in this pass which sets are getting ralloc'd
off which other sets. To avoid leaks, just pass a mem_ctx around
everywhere and ralloc almost everything off the one context. We do keep
using recursion a few places where it's crystal clear what the parent
relationship is.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2401>
v2 (Karol):
renamed pathes to paths
use more bool
use _mesa_set_intersects
deduplicated some code
fixed some typos
v3 (Karol):
don't enable structurizer as we do this in vtn now
v4 (Jason):
A few clean-ups due to unstructured NIR changes
v5 (Jason):
Misc whitespace and style cleanups
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Tested-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2401>
These are safe to call on either structured or unstructured NIR but
don't provide the nice ordering guarantees of nir_foreach_block and
friends. While we're here, we use them for a very small selection of
passes which are known to be safe for unstructured control-flow. The
most important such pass is nir_dominance which is required for
structurizing.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2401>
bifrost requires both "pafrost/lib/midgard_pack.h" and "midgard_pack.h" headers
Fixes the following building error:
In file included from external/mesa/src/panfrost/bifrost/cmdline.c:33:
In file included from external/mesa/src/panfrost/bifrost/test/bit.h:31:
external/mesa/src/panfrost/lib/pan_device.h:40:10: fatal error: 'midgard_pack.h' file not found
#include <midgard_pack.h>
^~~~~~~~~~~~~~~~
1 error generated.
Fixes: bce1a7e9 ("android: panfrost: Redirect cmdstream includes through GenXML")
Fixes: 88dc4c21 ("panfrost: Redirect cmdstream includes through GenXML")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6322>
This involves rolling our own int packing functions, because the u_format
versions do clamping which differs from VK spec requirement.
This reduces the size of libvulkan_freedreno.so significantly.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6304>
Calling SET_TILING on a DMA buffer with the gen12 CCS modifier can fail
unnecessarily. The main surface in the BO is Y-tiled, but the CCS portion is
linear and can have a stride that's not a multiple of 128B. Because SET_TILING
is called on the CCS plane with I915_TILING_Y, the ioctl will sometimes reject
the stride.
SET_TILING was originally used in b6d45e7f74 to
fix an assertion failure in iris_resource_from_handle. Assigning the BO's
tiling_mode field is sufficient to avoid the failure.
Fixes: c19492bcdb ("iris: Handle importing aux-enabled surfaces on TGL")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6296>
Some of the generated functions can be useful without going through the
format table (filling border color struct in turnip). By not calling these
functions through the format table, we should eventually be able to garbage
collect the unused packing functions, and also allows LTOs to happen.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6093>
Note we are just handling the index part of the format. This is *not*
the full format, which would include the swizzle (or v7 equivalent) and
the sRGB flag. But in the interest of incremental progress, let's move
this part over first and save on decoding complexity.
To avoid substantial churn from prefixing FORMAT to format names, we
special case the enums to avoid the prefix. This is undesirable but
reduces churn, especially since format handling is slated for an
overhaul soon to accomodate v7
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6195>
Instead of converting back and forth we should stick with fourcc codes
as the canonical layout definition. Furthermore modifiers allow all the
variants of AFBC to be encoded canonically, whereas the previous enum
does not (info about YTR is encoded out of band, for instance).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Icecream95 <ixn@keemail.me>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6159>
It's safe to do the math in 32 bits because they're all local workgroup
calculations. We just need to do a conversion at the end. For a couple
of intrinsics, we just turn them into 32-bit intrinsics and add a u2u64.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6280>
Fix warnings reported by Coverity Scan.
Resource leak (RESOURCE_LEAK)
leaked_storage: Variable bt1 going out of scope leaks the storage
it points to.
leaked_storage: Variable bt2 going out of scope leaks the storage
it points to.
Fixes: d0d14f3f64 ("util: Add unit test for stack backtrace caputure")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6246>
Fixes random failures of dEQP-VK.image.qualifiers.volatile.cube_array.r32i
and similar tests on Vega.
fossil-db (Navi):
Totals from 6 (0.00% of 135946) affected shaders:
VMEM: 1218 -> 1110 (-8.87%); split: +2.46%, -11.33%
SMEM: 174 -> 189 (+8.62%)
Copies: 84 -> 87 (+3.57%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: cd392a10d0 ('radv/aco,aco: use scoped barriers')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6174>
Add tracking for # of instructions per category, similar to the last
patch. Also add a few other shader-db stats that were missing on the
disasm side, to make it easier to compare to shaders from cmdstream
traces.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6243>
in some cases (e.g., zink and d3d12) we only want to split the depth and stencil
buffers when we're mapping them, and we can handle packed buffers in other cases,
so being able to reuse the u_transfer_helper functionality is still desired but
only if we can preserve the underlying buffer the rest of the time
Kenneth Graunke notes during post-review:
Vulkan reads/copies on packed Z24S8 only return depth, so we need to use separate
Z24 and S8 reads and do packing tricks.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5338>
GEN:BUG:1604601757 prevents source modifiers for multiplication of
lower precision integers.
lower_mul_dword_inst() splits 32x32 multiplication into 32x16, and
needs to eliminate source modifiers in this case.
Closes: #3329
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Lowering source modifiers will convert a 16bit source to a 32bit
value. In the case of integer multiplication, this will reverse
previous lowering performed by lower_mul_dword_inst.
Assert to prevent an illegal DxD operation (and GPU hang).
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Fixes the following building errors:
target C: libpanfrost_encoder <= external/mesa/src/panfrost/encoder/pan_attributes.c
...
clang: error: no such file or directory: 'external/mesa/src/panfrost/encoder/pan_attributes.c'
clang: error: no input files
target C: libpanfrost_encoder <= external/mesa/src/panfrost/encoder/pan_afbc.c
...
clang: error: no such file or directory: 'external/mesa/src/panfrost/encoder/pan_afbc.c'
clang: error: no input files
Fixes: 1c62b5528 ("panfrost: Rename encoder/ to lib/")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6261>
We cannot decompress from the compute queue. While I'm pretty sure
VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL is only useful on the
graphics queue, I cannot find a VU that prevents the transition
from happening on another queue, so we need to be careful here.
This patch ensures we do the decompression on the barrier that changes
the queue ownership.
Another problem was that DCC images were considered fast-clearable
when not DCC compressed, which resulted in a mess with concurrent
queue ownership.
Cc: <mesa-stable@lists.freedesktop.org>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3387
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6252>
Currently Linux kernel gathers performance counters at fixed
intervals (~5-10ms), so if application uses AMD_performance_monitor
extension and immediately after glFinish() asks GL driver for HW
performance counter values it might not get any data (values == 0).
Fix this by moving the "read counters from kernel" code from
"is query ready" to "get counter values" callback with a loop around
it. Unfortunately it means that the "read counters from kernel"
code can spin for up to 10ms.
Ideally kernel should gather performance counters whenever we ask
it for counter values, but for now we have deal with what we have.
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5788>
https://github.com/KhronosGroup/EGL-Registry/pull/95 has moved
a couple of extensions defines and functions to the upstream `eglext.h`,
but when 9a74746bd1 sync'ed these files we broke compilation
of apps that require these symbols on systems that don't have the
updated Khronos headers.
On non-GLVND builds, we still provide these headers, so everything's
fine, but on GLVND builds the Khronos headers are external so we need to
make sure we have a libglvnd version that's recent enough.
Fixes: 9a74746bd1 ("EGL: sync headers with Khronos")
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6069>
The code applied the restriction too late, which could overflow LDS size,
which started happening more often after the minimum vertex count was
increased for Sienna.
Incorporate the clamping into the previous code for rounding up the counts.
Now the LDS size can never overflow, but it may use vector lanes less
efficiently (max_gsprims can be decreased more), which will be addressed
in the next commit.
Fixes: 4ecc39e1aa ("radeonsi/gfx10: NGG geometry shader PM4 and upload")
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6137>
now that we're tracking these by shader, we want to ensure that they live through
each render pass successfully if there's no flush regardless of the timing when the
shader objects are destroyed. this becomes useful when we split up shader create and compile
functionality in future patches, at which point program refcounts can be changed
during successive draw calls, potentially resulting in a program being destroyed at that
point when it shouldn't be
with this patch, each shader used by the program gets a reference, with the renderpass
batch itself becoming the owner of the program such that it will be deleted
when the draw state gets invalidated and a new program is created
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5970>
We can go further in tracking what BOs are written to by the driver by
tracking when a buffer in unmapped. A BO could be mmap, written, unmap
and never be written to again. In such case we can just write the BO's
content on the first exec buf after unmap and never write it again.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2201>
@@ -12,11 +12,11 @@ And please remove anything that doesn't apply to keep things readable :)
### System information
Please post `inxi -GSC -xx` output OR fill information below manually
Please post `inxi -GSC -xx` output ([fenced with triple backticks](https://docs.gitlab.com/ee/user/markdown.html#code-spans-and-blocks)) OR fill information below manually
List all the (non platform-specific) symbols exported by the library
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.