This can happen when we record a VkCmdDraw in a secondary buffer that
was created inheriting from the primary buffer, but with the framebuffer
set to NULL in the VkCommandBufferInheritanceInfo.
Vulkan 1.1.81 spec says that "the application must ensure (using scissor
if neccesary) that all rendering is contained in the render area [...]
[which] must be contained within the framebuffer dimesions".
While this should be done by the application, commit 465e5a86 added the
clamp to the framebuffer size, in case of application does not do it.
But this requires to know the framebuffer dimensions.
If we do not have a framebuffer at that moment, the best compromise we
can do is to just apply the scissor as it is, and let the application to
ensure the rendering is contained in the render area.
v2: do not clamp to framebuffer if there isn't a framebuffer
v3 (Jason):
- clamp earlier in the conditional
- clamp to render area if command buffer is primary
v4: clamp also x and y to render area (Jason)
v5: rename used variables (Jason)
Fixes: 465e5a86 ("anv: Clamp scissors to the framebuffer boundary")
CC: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 1ad26f9417)
"The C standard says that compound literals which occur inside of
the body of a function have automatic storage duration associated
with the enclosing block. Older GCC releases were putting such
compound literals into the scope of the whole function, so their
lifetime actually ended at the end of containing function. This
has been fixed in GCC 9. Code that relied on this extended lifetime
needs to be fixed, move the compound literals to whatever scope
they need to accessible in."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109543
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 129a9f4937)
Piglit's vp-max-array test creates a vertex program containing a uniform
array sized to the value of GL_MAX_NATIVE_PROGRAM_PARAMETERS_ARB. Mesa
will then add additional state-var parameters for things like the MVP
matrix.
radeonsi currently exposes a value of 4096, derived from constant buffer
upload size. This means the array will have 4096 elements, and the
extra MVP state-vars would get a prog_src_register::Index of over 4096.
Unfortunately, prog_src_register::Index is a signed 13-bit integer, so
values beyond 4096 end up turning into negative numbers. Negative
source indexes are only valid for relative addressing, so this ends up
generating illegal IR.
In prog_to_nir, this would cause an out of bounds array access.
st_mesa_to_tgsi checks for a negative value, assumes it's bogus,
and remaps it to parameter 0 in order to get something in-range.
This isn't right - instead of reading the MVP matrix, it would read
the first element of the vertex program's large array. But the test
only checks that the program compiles, so we never noticed that it
was broken.
This patch limits the size of the program limits, with the understanding
that we may need to generate additional state-vars internally. i965 has
exposed 1024 for this limit for years, so I don't expect lowering it to
2048 will cause any practical problems for radeonsi or other drivers.
Fixes vp-max-array with prog_to_nir.c.
Cc: "19.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit f45dd6d31b)
For some reason we don't use view volume clipping by default, and use
scissors instead. These scissors were set to an 8k max fb size, while
the driver advertises 16k-sized framebuffers.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit cc79a1483f)
If the driver does not support rendering to these formats but does
support texturing, we can end up in incompatibilities between textures
and renderbuffers that are then copied to.
Fixes KHR-GL45.copy_image.functional on nvc0
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit cbd1ad6165)
Some NVIDIA hardware can accept 128 fragment shader input components,
but only have up to 124 varying-interpolated input components. We add a
new cap to express this cleanly. For most drivers, this will have the
same value as PIPE_SHADER_CAP_MAX_INPUTS for the fragment shader.
Fixes KHR-GL45.limits.max_fragment_input_components
Conflicts resolved by Dylan
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
[imirkin: rebased, improved docs/commit message]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 6010d7b8e8)
Not quite perfect, but at least we don't end up with random values in
the query buffer.
Fixes KHR-GL45.pipeline_statistics_query_tests_ARB.functional_default_qo_values
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 6adb9b38bf)
For the NO_WAIT variants, we would jump into the ALWAYS case for both
nested and inverted occlusion queries. However if the query had
previously completed, the application could reasonably expect that the
render condition would follow that result.
To resolve this, we remove the nesting distinction which unnecessarily
created an imbalance between the regular and inverted cases (since
there's no "zero" condition mode). We also use the proper comparison if
we know that the query has completed (which could happen as a result of
an earlier get_query_result call).
Fixes KHR-GL45.conditional_render_inverted.functional
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit e00799d3dc)
Looks like SUBFM.3D and SUEAU are perfectly capable of dealing with 3d
tiling, they just need the correct inputs. Supply them.
We also have to deal with the case where a 2d "layer" of a 3d image is
bound. In this case, we supply the z coordinate separately to the
shader, which has to optionally treat every 2d case as if it could be a
slice of a 3d texture.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 162352e671)
Atomic operations don't update the local cache, which means that we
would have to issue CCTL operations in order to get the updated values.
When we know that a buffer is primarily used for atomic operations, it's
easier to just avoid the caching at that level entirely.
The same issue persists for non-atomic buffers, which will have to be
fixed separately.
Fixes the failing dEQP-GLES31.functional.atomic_counter.* tests.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 4443b6ddf2)
The hardware does not natively support FIXED and DOUBLE formats. If
those are used in an indirect draw, they have to be converted. Our
conversion tries to be clever about only converting the data that's
needed. However for indirect, that won't work.
Given that DOUBLE or FIXED are highly unlikely to ever be used with
indirect draws, read the indirect buffer on the CPU and issue draws
directly.
Fixes the failing dEQP-GLES31.functional.draw_indirect.random.* tests.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 399215eb7a)
When Mesa is compiled for gallium-xlib using e.g.
./configure --enable-glx=gallium-xlib --disable-dri --disable-gbm
-disable-egl
and is used by an X server (usually remotely via SSH X11 forwarding)
that does not support MIT-SHM such as XMing or MobaXterm, OpenGL
clients report error messages such as
Xlib: extension "MIT-SHM" missing on display "localhost:11.0".
ad infinitum.
The reason is that the code in src/gallium/winsys/sw/xlib uses
MIT-SHM without checking for its existence, unlike the code
in src/glx/drisw_glx.c and src/mesa/drivers/x11/xm_api.c.
I copied the same check using XQueryExtension, and tested with
glxgears on MobaXterm.
This issue was reported before here:
https://lists.freedesktop.org/archives/mesa-users/2016-July/001183.html
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: <mesa-stable@lists.freedesktop.org>
(cherry picked from commit a203eaa4f4)
This reverts commit 378f996771.
This also remove the default true argument from the a2xx nir backend,
which was introduced after this commit. There should be no change in
functionality.
When nir_rematerialize_derefs_in_use_blocks_impl was first written, I
attempted to optimize things a bit by not bothering to re-materialize
the sources of deref instructions figuring that the final caller would
take care of that. However, in the case of more complex deref chains
where the first link or two lives in block A and then another link and
the load/store_deref intrinsic live in block B it doesn't work. The
code in rematerialize_deref_in_block looks at the tail of the chain,
sees that it's already in block B and skips it, not realizing that part
of the chain also lives in block A.
The easy solution here is to just rematerialize deref sources of deref
instructions as well. This may potentially lead to a few more deref
instructions being created by the conditions required for that to
actually happen are fairly unlikely and, thanks to the caching, it's all
linear time regardless.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109603
Fixes: 7d1d1208c2 "nir: Add a small pass to rematerialize derefs per-block"
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
(cherry picked from commit 9e6a6ef0d4)
We used to pre-set a bunch of extra arguments to a texture instruction
in order to force the RA to allocate a register at the boundary of 4.
However with the levelZero optimization, which removes a LOD argument
when it's uniformly equal to zero, we undid that logic by removing an
extra argument. As a result, we could end up with insufficient alignment
on the second wide texture argument.
Instead we switch to a different method of achieving the same result.
The logic runs during the constraint analysis of the RA, and adds unset
sources as necessary right before being merged into a wide argument.
Fixes MISALIGNED_REG errors in Hitman when run with bindless textures
enabled on a GK208.
Fixes: 9145873b15 ("nvc0/ir: use levelZero flag when the lod is set to 0")
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
(cherry picked from commit 5de5beedf2)
We're writing to the bo and the kernel needs to know for
fd_bo_cpu_prep() to work.
Fixes: f93e431272 ("freedreno/a6xx: Enable blitter")
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
(cherry picked from commit 357ea7da51)
For example with VK_EXT_buffer_device_address or
VK_KHR_variable_pointers.
Fixes: a2b5cc3c39 "radv: enable variable pointers"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
(cherry picked from commit 00253ab2c4)
sizeof counts the terminating null character as well, so that also
contributed to the ID computed for the X11 atom. But the convention is
for only the non-null characters to contribute to the atom ID.
Fixes: 2e12fe425f "loader/dri3: Enable adaptive_sync via
_VARIABLE_REFRESH property"
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
(cherry picked from commit c0a540f320)
Transform feedback did not set correct SO_DECL.ComponentMask for
varyings packed in VARYING_SLOT_PSIZ:
gl_Layer - VARYING_SLOT_LAYER in VARYING_SLOT_PSIZ.y
gl_ViewportIndex - VARYING_SLOT_VIEWPORT in VARYING_SLOT_PSIZ.z
gl_PointSize - VARYING_SLOT_PSIZ in VARYING_SLOT_PSIZ.w
Fixes: 36ee2fd61c "anv: Implement the basic form of VK_EXT_transform_feedback"
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry picked from commit 64d3b148fe)
Previously, we only applied the fix to shaders with a dispatch mode of
SIMD8 but the code it relies on for SIMD16 mode only applies to SIMD16
instructions. If you have a SIMD8 instruction in a SIMD16 shader,
neither would trigger and the restriction could still be hit.
Fixes: 232ed89802 "i965/fs: Register allocator shoudn't use grf127..."
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit b4f0d062cd)
We need to initialize all fields in rs->prim explicitly while
creating new rastpos stage.
Fixes: bac8534267 ("st/mesa: allow glDrawElements to work with GL_SELECT
feedback")
v2: Initializing all fields in rs->prim as per Ilia.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
(cherry picked from commit 69d736b17a)
Reported by Coverity: in the case of unsupported modifier request, the
code does not jump to the “fail” label to destroy the acquired resource.
CID: 1435704
Signed-off-by: Ernestas Kulik <ernestas.kulik@gmail.com>
Fixes: 45bb8f2957 ("broadcom: Add V3D 3.3 gallium driver called "vc5", for BCM7268.")
(cherry picked from commit 90458bef54)
This was copy-and-paste fail, that oddly showed up in the CTS's
reinterprets of r32f, rgba8, and srgba8 to rgba8i, but not r32ui and r32i
to rgba8i or reinterprets to other signed int formats.
Fixes: 6281f26f06 ("v3d: Add support for shader_image_load_store.")
(cherry picked from commit ab4d5775b0)
One of the CTS cases tries to invalidate just stencil of packed
depth/stencil, and we incorrectly lost the depth contents.
Fixes dEQP-GLES3.functional.fbo.invalidate.whole.unbind_read_stencil
Fixes: 0c42b5f3cb ("mesa: wire up InvalidateFramebuffer")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
(cherry picked from commit db2ae51121)
This fixes serious stuttering in Shadow Of The Tomb Raider.
Fixes: 50fd253bd6 ("radv/winsys: Add priority handling during submit.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
(cherry picked from commit 9c762c01c8)
Reported by Coverity: in the case where there exist hardware and
non-hardware queries, the code does not jump to err_free_query and leaks
the query.
CID: 1430194
Signed-off-by: Ernestas Kulik <ernestas.kulik@gmail.com>
Fixes: 9ea90ffb98 ("broadcom/vc4: Add support for HW perfmon")
(cherry picked from commit f6e49d5ad0)
Earlier commit addressed 7 of the 8 instances available.
v2: Rebase patch back to master (by anholt)
Cc: Carsten Haitzler (Rasterman) <raster@rasterman.com>
Cc: Eric Anholt <eric@anholt.net>
Fixes: 300d3ae8b1 ("vc4: Declare the cpu pointers as being modified in NEON asm.")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 385843ac3c)
Android.mk and autotools disagree about where generated files should
go, which wasn't a problem until we wanted to build a dist
tarball. This corrects the problme by changing the output and include
paths to be the same on android and autotools (meson already has the
correct include path).
Fixes: 7d7b30835c
("automake: Fix path to generated source")
It is currently impossible to build a dist tarball that works when SWR
requires LLVM 6. To generate the tarball we'd need to configure with
LLVM 6, which is fine. But to build the dist check we need LLVM 7, as
RadeonSI and RadV require that version. Unfortunately the headers
genererated with LLVM 6 don't compile with LLVM 7, the API has changed
between the two versions.
I weighed a couple of options here. One would be to ship an
unbootstrapped tarball generated with meson. This would fix the issue
by not bootstrapping, so whatever version of LLVM used would work
because the SWR headers would be generated at compile
time. Unfortunately this would involve some heavy modifications to the
infastructure used to upload the tarballs, and I've decided not to
persue this.
Use the trick of adding and then subtracting 2**52 (52 is the number of
explicit mantissa bits a double-precision floating-point value has) to
implement round-to-even.
Cuts the number of instructions on SKL of the piglit test
fs-roundEven-double.shader_test from 109 to 21.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This batch->cleared value is only used to decide to use sysmem rendering
or not, so it should include any buffers that are affected by a clear.
This is required because the a2xx fast clear doesn't work with sysmem
rendering. The a22x "normal" clear path doesn't work with sysmem either.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Depth can be used even when there is no restore/resolve of depth. This
happens when the depth buffer is invalidated after rendering to avoid
the resolve operation.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Set dirty bits on invalidate to trigger invalidate logic in fd_draw_vbo.
Also, resource_written for color needs to be after the invalidate logic.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
And before someone actually starts implementing DiscardFramebuffer()
lets rework the interface to something that is actually usable.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This allows freedreno to be aware of the depth invalidate when flushing
batches on flush_resource.
AFAIK, the only other driver which might care about this change is vc4,
where I think it should help by allowing the depth invalidate to work with
GALLIUM_HUD.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Check if a pixel format is supported by the Wayland servers gpu driver
before exposing it to the client via wl_drm, so we avoid reporting formats
to the client which the server gpu can't handle.
Restrict this reporting to the new color depth 30 formats for now, as the
ARGB/XRGB8888 and RGB565 formats are probably supported by every gpu under
the sun.
Atm. this is mostly useful to allow proper PRIME renderoffload for depth
30 formats on the typical Intel iGPU + NVidia dGPU "NVidia Optimus" laptop
combo.
Tested on Intel, AMD, NVidia with single-gpu setup and on a Intel + NVidia
Optimus setup.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Support PRIME render offload between a Wayland server gpu and a Wayland
client gpu with different channel ordering for their color formats,
e.g., between Intel drivers which currently only support ARGB2101010
and XRGB2101010 import/display and nouveau which only supports ABGR2101010
rendering and display on nv-50 and later.
In the wl_visuals table, we also store for each format an alternate
sibling format which stores colors at the same precision, but with
different channel ordering, e.g., ARGB2101010 <-> ABGR2101010.
If a given client-gpu renderable format is not supported by the server
for import, but the alternate format is supported by the server, expose
the client-gpu renderable format as a valid EGLConfig to the client. At
eglSwapBuffers time, during the blitImage() detiling blit from the client
backbuffer to the linear buffer, the client format is converted to the
server supported format. As we have to do a copy for PRIME anyway,
this channel swizzling conversion comes essentially for free.
Note that even if a server gpu in principle does support sampling
from the clients native format, this conversion will be a performance
advantage if it allows to convert to the servers preferred format
for direct scanout, as the Wayland compositor may then be able to
directly page-flip a fullscreen client wl_buffer onto the primary
plane, or onto a hardware overlay plane, avoiding an extra data copy
for desktop composition.
Tested so far under Weston with: nouveau single-gpu, Intel single-gpu,
AMD single-gpu, "Optimus" Intel server iGPU for display + NVidia
client dGPU for rendering.
v2: Implement minor review comments by Eric Engestrom: Add some
comment and assert, and some style fixes for clarity.
No functional change.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Surface reads don't need them because they just have the one address
payload. With surface writes, on the other hand, we can put the address
and the data in the different halves and avoid building the payload all
together.
The decrease in register pressure and added freedom in register
allocation resulting from this change reduces spilling enough to improve
the performance of one customer benchmark by about 2x.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We're about to add some more if cases so let's have the giant re-indent
in it's own patch to make review easier.
Acked-by: Iago Toral Quiroga <itoral@igalia.com>
These have clearly never seen any use.... On gen8, the bottom 4 bits are
missing so we need to shift them off before we call set_bits and shift
again when we get the bits. Found by inspection.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Instead of fetching the information out of the instruction directly,
fetch the descriptor and then pluck the information out of the
descriptor. The current scheme works ok for SEND but with SENDS, it all
falls to pieces because the descriptor is completely shuffled around.
This commit doesn't actually convert everything. One notable exception
is URB messages which don't even use descriptors in emit_urb_WRITE yet.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We want to be able to extract data from descriptors as well as unify a
bit of the descriptor construction.
One of the unifications we do is to unify the read/write and dataport
descriptors. On gen4-5, read/write are substantially different and the
read descriptors change between gen4 and gen4.x. On gen6, they unified
layouts between read, write, and dataport. Then, on gen8, they added
one bit to the message type field but left it reserved MBZ for
read/write messages. This commit chooses to treat that as if they
expanded the field everywhere and just didn't have enough enum values
for read/write to bother with the extra bit.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This commit pulls the surface descriptor helpers out into brw_eu.h and
makes them no longer depend on the codegen infrastructure. This should
allow us to use them directly from the IR code instead of the generator.
This change is unfortunately less mechanical than perhaps one would like
but it should be fairly straightforward.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Instead of magically falling back to SIMD8 for atomics and typed
messages on Ivy Bridge, explicitly figure out the exec size and pass
that into brw_surface_payload_size.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Like all the other sends, it's just mlen * REG_SIZE.
Fixes: 3cbc02e469 "intel: Use TXS for image_size when we have..."
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
If you pass a bool in as the value to set, the C standard says that it
gets converted to an int prior to shifting. If you try to set a bool to
bit 31, this lands you in undefined behavior. It's better just to add
the explicit cast and let the compiler delete it for us.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
There are piles of fields that it doesn't check so using it is a lie.
The only reason why it's not causing problem is because it has exactly
one user which only uses it for MOV instructions (which aren't very
interesting) and only on Sandy Bridge and earlier hardware. Just get
rid of it and inline it in the one place that it's actually used.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Previously we tried to normalize nr_samples to MAX2(1, nr_samples) to
avoid having to deal with 0 vs 1 everywhere. But this causes problems
in mesa/st, for example st_finalize_texture() will think there is a
nr_samples mismatch and recreate the texture. Somehow this manifests
as corrupt x11 font rendering on generations that do not support MSAA
(but apparently works fine on a5xx and a6xx which do support MSAA.)
Fixes: cf0c7258ee freedreno/a5xx: MSAA
Signed-off-by: Rob Clark <robdclark@gmail.com>
Switched to the raw bo list api to avoid having to use 2 arrays for
everything.
This was introduced in libdrm 2.4.97 which we already depend upon.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This has been disabled some months ago because it introduced
rendering issues with Shadow Of Warrier II (DXVK). This game is
no longer affected, I wonder if 824cfc1ee5 ("radv: rework the
TC-compat HTILE hardware bug with COND_EXEC") fixed the problem.
I checked The Forest on my Polaris, and it renders fine too.
According to Phillip, this gives +5.5% with Rise Of The Tomb
Raider and DXVK. This is because DXVK uses 16-bit depth surfaces
while the native port from Feral uses 32-bit depth surfaces.
Unfortunately, Shadow Of The Tomb Raider isn't affected because
it clears each layer of a D16 array texture individually. So it
doesn't hit the fast clear path.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The core Mesa with_asm_arch and USE_ARM_ASM flags are disabled for meson
cross-builds because of the need to run host binaries on the build system.
vc4 doesn't need to do that, so skip with_asm_arch to enable NEON on my
cross-builds.
Fixes: ebcb4c2156 ("meson: Enable VC4's NEON assembly support.")
Otherwise, the compiler is free to reuse the register containing the input
for another call and assume that the value hasn't been modified. Fixes
crashes on texture upload/download with current gcc.
We now have to have a temporary for the cpu2 value, since outputs must be
lvalues.
(commit message by anholt)
Fixes: 4d30024238 ("vc4: Use NEON to speed up utile loads on Pi2.")
This fixes the depth/stencil clear on a20x, and adds a fast clear path.
The fast clear path is only used for a20x, needs performance tests on a22x.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
This allows us to avoid expensive string compares since we already have
a map to the pointers.
These compares were taking ~30 seconds for a single shader compile
in Godot due to it using 64,000+ uniforms.
Fixes: c4cff5f402 ("glsl: add basic support for resource list to shader cache")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109229
Enable using etnaviv for KMS renderonly. This still needs KMS driver
name mapping to kmsro to be used automatically.
Acked-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Rob Herring <robh@kernel.org>
This allows vc4 to initialize on the Adafruit PiTFT 3.5" touchscreen with
the hx8357d tinydrm driver
v2: Whitespace fix noted by Eric Engestrom, update commit message for the
driver being merged.
v3: Rebase on Rob Herring's pipe-loader changes.
Acked-by: Eric Engestrom <eric.engestrom@intel.com> (v1)
Acked-by: Emil Velikov <emil.velikov@collabora.com> (v1)
If we can't find a driver matching by name, then use the kmsro driver.
This removes the need for needing a driver descriptor for every possible
KMS driver.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Anholt <eric@anholt.net>
The vc4 driver can do prime sharing to many different KMS-only devices,
such as the various tinydrm drivers for SPI-attached displays. Rename the
driver away from "pl111" to represent what it will actually support:
various sorts of KMS displays with the renderonly layer used to attach a
GPU.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Instead of using this useless array_params_mask variable.
This should set these two attributes to streamout buffers too.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Use EXT_framebuffer_sRGB to expose EXT_sRGB_write_control on GLES. Remove
the checks for desktion GL in the enable calls, since EXT_framebuffer_sRGB
now also indicates support for switching the linear-sRGB color
space conversion on GLES.
Thanks to Ilia Mirkin for all the helpful discussions that helped to rework
this series.
v2: Fix alphabetical listing of extensions (Tapani Pälli)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
GLES 3.0 does not actually require support for EXT_framebuffer_sRGB, it
only needs support for sRGB attachments to framebuffers and framebuffer
objects as defined in ARB_framebuffer_objects.
v2: Clarify that ARB_framebuffer_objects is needed.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
All drivers that support EXT_framebuffer_sRGB also support EXT_sRGB, but
in order to keep this commit minial, and not to break any drivers both
flags are checked.
v2: - Use only EXT_sRGB (Ilia Mirkin)
- Move adding the flag EXT_sRGB to gl_extensions to a separate patch
v3: use _mesa_has_EXT_framebuffer_sRGB instead of extension flag
The _mesa_has function also checks for the correct versions and
should be preferred over using the flags directly (Erik)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
For GLES sRGB framebuffer attachemnt support is provided in two steps:
sRGB attachments like described in EXT_sRGB (and GLES 3.0) that enable
linear to sRGB color space transformation automatically, and the ability
to switch formats of the render target surface between sRGB and linear
that introduces full support for EXT_framebuffer_sRGB.
Set the according flags to reflect these two levels of sRGB support.
As a difference between desktopm GL and GLES, on desktop GL for a sRGB
framebuffer attachment the linear-sRGB conversion is turned off by default,
and for GLES it is turned on. This needs to be taken into account when
initally creating a surface, i.e. on desktop GL creation of a sRGB surface
is preferred, but on GLES sRGB surfaces are only created when explicitely
requested.
v2: - Use the new CAPS name
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
EXT_sRGB is an (incomplete) GLES extension that provides support for sRGB
framebuffer attachments, hence it can be used to check for this support
as an alternative to EXT_framebuffer_sRGB that provies the same
functionality but also sRGB write control support.
However, since EXT_sRGB is incomplete and superseted by GLES 3.0 it will
not be exposed as an extension.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
v2: - Use the renamed CAPS
- add assetions to make sure that mesa doesn't try to switch
destination surface formats when it is not supported. (Ilia Mirkin)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Add a new cap that indicates whether the drivers supports
enabling/disabling the conversion from linear space to sRGB
for a framebuffer attachment. In Driver terms that this CAP indicates
whether the driver can switcht between a linear and and a sRGB surface
format for draw destinations witout changing the sourface itself.
v2: rename CAP to DEST_SURFACE_SRGB_CONTROL to reflect its
purpouse better (pointed out by Ilia Mirkin)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Under Vulkan, the double vertex attributes take up the same size
regardless of whether they are vertex inputs or any other stage
interface.
Under OpenGL (ARB_gl_spirv), from GLSL 4.60 spec, section 4.3.9
Interface Blocks:
"It is a compile-time error to have an input block in a vertex
shader or an output block in a fragment shader. These uses are
reserved for future use."
So we also don't need to check if it is an vertex input or not, and
use false in any case.
v2: (changes made by Alejandro Piñeiro)
* Update required after "spirv: Handle location decorations on
block interface members" own updates (original patch was sent
several months ago)
* After Neil suggesting it, confirm that this change can be also
done for OpenGL (ARB_gl_spirv). Expand commit message.
v3: update after changing name of main method on a previous patch
Signed-off-by: Neil Roberts <nroberts@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
glsl_count_attribute_slots takes a parameter to specify whether the
type is being used as a vertex input because on GL double attributes
only take up one slot. Vulkan doesn’t make this distinction so this
patch renames the argument to is_gl_vertex_input in order to make it
more clear that it should always be false on Vulkan.
v2: minor variable renaming (s/member/member_type) (Tapani)
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Previously the code was taking any location decoration on the block
and using that to calculate the member locations for all of the
members. I think this was assuming that there would only be one
location decoration for the entire block. According to the Vulkan spec
it is possible to add location decorations to individual members:
“If the structure type is a Block but without a Location, then each
of its members must have a Location decoration. If it is a Block
with a Location decoration, then its members are assigned
consecutive locations in declaration order, starting from the
first member which is initially the Block. Any member with its own
Location decoration is assigned that location. Each remaining
member is assigned the location after the immediately preceding
member in declaration order.”
This patch makes it instead keep track of which members have been
assigned an explicit location. It also has a space to store the
location for the struct as a whole. Once all the decorations have been
processed it iterates over each member to fill in the missing
locations using the rules described above.
So, this commit is needed to get working a case like this, on both
Vulkan and OpenGL using SPIR-V (ARB_gl_spirv):
out block {
layout(location = 2) vec4 c;
layout(location = 3) vec4 d;
layout(location = 0) vec4 a;
layout(location = 1) vec4 b;
} name;
v2: (changes made by Alejandro Piñeiro)
* Update after introducing struct member splitting (See commit b0c643d)
* Update after only exposing interface_type for blocks, not to any struct
* Update after last changes done for xfb support
v3: use "assign" instead of "add" on the new method added (Tapani)
Signed-off-by: Neil Roberts <nroberts@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Defines how sampler (and pixel pipes) needs to access the data
represented with a resource. The used default is mode is
ETNA_ADDRESSING_MODE_TILED.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
The text segment is shared among multiple contexts, while each one has
its own bufctx. So when reallocating the text segment, some contexts may
end up with stale values in their bufctx's. Instead limit the exposure
to the bufctx to within a single draw.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The sampler border color is encoded in the TMU's blending format (half
floats, 32-bit floats, or integers) and must be clamped to the format's
range unorm/snorm/int ranges by the driver. Additionally, the TMU doesn't
know about how we're abusing the swizzle to support BGRA, A, and LA, so we
have to pre-swizzle the border color for those.
We don't really want to spend half a kb on sampler states in most cases,
so skip generating the variants when the border color is unused or is
0,0,0,0.
When the sampler view is in sample-stencil mode, we need to return uint
stencil values. To do that, fill in the format table to return R8I, and
have the sampler view point at the separate stencil buffer.
Fixes dEQP-GLES31.functional.stencil_texturing.format.depth32f_stencil8_2d
Fixes OOMs in the CTS's packed_pixels.varied_rectangle.* tests -- the
series of texture uploads at the start before texturing occurred would end
up all sitting around as cached jobs for reuse. By flushing immediately,
peak active BO usage goes from 150M to 40M.
We could maybe put some limits on how many jobs we keep around, but blits
seem particularly unlikely to get reused for other drawing.
This is the GLES 3.2 minmax, and also what the closed source driver does.
Avoids hitting OOMs in the CTS's
dEQP-GLES3.functional.texture.units.all_units.only_cube.1.
From the vulkan spec 33.3 "Extension Dependencies":
"Any device extension that has an instance extension dependency that is
not enabled by vkCreateInstance is considered to be unsupported, hence
it must not be returned by vkEnumerateDeviceExtensionProperties for any
VkPhysicalDevice child of the instance."
Therefore we need to check whether the instance-level extensions are
actually enabled when deciding to support a device-level extension or
not.
Furthermore, we need to do this for all instance-level extensions of any
(transitive) device-level extension dependency, due to the following
paragraph:
"If an extension is supported (as queried by
vkEnumerateInstanceExtensionProperties or
vkEnumerateDeviceExtensionProperties), then required extensions of that
extension must also be supported for the same instance or physical
device."
Finally, because some of these vulkan extensions may be implicitly
promoted to future vulkan core API versions, we can also satisfy the
dependency if the vulkan API version is high enough.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
From the vulkan spec 3.2 "Instances":
"Providing a NULL VkInstanceCreateInfo::pApplicationInfo or providing an
apiVersion of 0 is equivalent to providing an apiVersion of
VK_MAKE_VERSION(1,0,0)."
Fixes: ffa15861ef "radv: UseEnumerateInstanceVersion for the default version."
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Section 7.6.2.2 (Standard Uniform Block Layout) of the GL spec says:
The base offset of the first member of a structure is taken from the
aligned offset of the structure itself. The base offset of all other
structure members is derived by taking the offset of the last basic
machine unit consumed by the previous member and adding one.
The current code does not reflect this last sentence - it effectively
instead aligns up the next offset up to the alignment of the previous
member. This causes an issue in exactly one case:
layout(std140) uniform block {
layout(offset=0) vec3 var1;
layout(offset=12) float var2;
};
As per section 7.6.2.1 (Uniform Buffer Object Storage) and elsewhere, a
vec3 consumes 3 floats, i.e. 12 basic machine units. Therefore, `var1`
in the example above consumes units 0-11, with 12 being the first
available offset afterwards. However, before this commit, mesa
incorrectly assumes `var2` must start at offset=16 when using explicit
offsets, which results in a compile-time error. Without explicit
offsets, the shaders actually work fine, indicating that mesa is already
correctly aligning these fields internally. (Just not in the code that
handles explicit buffer offset parsing)
This patch should fix piglit tests:
ssbo-explicit-offset-vec3.vert
ubo-explicit-offset-vec3.vert
Signed-off-by: Niklas Haas <git@haasn.xyz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This only implements the actual opcodes and does not implement support
for using them with specialization constants.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We handle forward declarations by creating the pointer type with it's
storage type based on storage class and just waiting to fill out the
actual deref type until we get the OpTypePointer. Because any
composites using the forward declared type only care about the storage
type (i.e. uint64_t, uvec2, etc.) when creating their glsl_type, this
works fine and we can defer the actual deref_type as far as we need.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
This was valid back when the only valid types of pointers were uint32
and uvec2. Now that we're allowing more variety, it could be just about
anything so we'll just drop the assert.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
These correspond roughly to reading/writing OpenCL global pointers. The
idea is that they just take a bare address and load/store from it. Of
course, exactly what this address means is driver-dependent.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
We want to have debug info as well if using
meson's debugoptimized when ndebug is off.
v2: use u_debug functions that do something
even if DEBUG is not set.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Fixes regression caused by
42d672fa6a
st/nine: Bind src not dst in nine_context_box_upload
Before that patch, for user provided textures,
when the texture was destroyed, the safety
check for pending uploads, which according to
the code "Following condition cannot happen currently",
was flushing the queue and thus triggering the upload.
After the patch, the texture destruction was delayed after
the upload. However the user frees the texture buffer,
as it thinks the texture released.
Instead of reverting the faulty patch,
this patch instead flushes the csmt queue right away
after queuing the upload for this type of textures.
This is more future-proof, as we may want to bind the
surface for other reasons in the future.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Cc: 18.3 <mesa-stable@lists.freedesktop.org>
Compilation of user-specified shaders with software fp64 works by
compiling on demand an "fp64-funcs" shader implementing various fp64
operations and then linking it into the "user shader".
In
commit 64b8c86d37
Author: Timothy Arceri <tarceri@itsqueeze.com>
Date: Thu Jan 17 17:16:29 2019 +1100
glsl: be much more aggressive when skipping shader compilation
we changed the behavior of the shader cache to skip compilation earlier
when we get a cache hit.
After the aforementioned commit, compiling a user program using fp64
would store into the cache an entry for the fp64-funcs shader.
Subsequent compilations of uncached user shaders using fp64 would fail
in compile_fp64_funcs() after finding a cache entry for the fp64-funcs,
but being unprepared to read from the cache.
It's unclear to me how to retrieve the cached NIR of the fp64-funcs (if
it even is cached), so just call _mesa_glsl_compile_shader() with
force_recompile=true in order to ensure we generate the fp64-funcs
successfully.
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This allows creating a fd_screen with a renderonly object which will be
used to allocated scanout resources.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
[slight tweak to fix uninitialized 'prsc' in debug print]
Signed-off-by: Rob Clark <robdclark@gmail.com>
Fixes the following piglit test on my VEGA and matches the behaviour in the
tgsi backend.
tests/spec/glsl-1.10/execution/samplers/glsl-fs-shadow2D-clamp-z.shader_test
Fixes: 625dcbbc45 ("amd/common: pass address components individually to ac_build_image_intrinsic")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The util helpers were looking for a non-void channels in a non-mixed
format and returning its snorm/unorm state. However, compressed formats
don't have non-void channels, so they always returned false. V3D wants to
use util_format_is_[su]norm for its border color clamping workarounds, so
fix the functions to return the right answer for these.
This now means that we ignore .is_mixed. I could retain the is_mixed
check, but it doesn't seem like a useful feature -- the only code I could
find that might care is freedreno's blit, which has some notes about how
things are wonky in this area anyway.
Reviewed-by: <Roland Scheidegger sroland@vmware.com>
These tests don't need swrast, so we can always enable them when
build_tests is set. Most of them run to successful completion quickly
(.9s on my SKL).
Reviewed-by: <Roland Scheidegger sroland@vmware.com>
Earlier commit aimed to remove unneeded function declarations. Namely
OpenGL entrypoints which are not applicable for OpenGLES*
Although it did not consider the shared glapi which needs all,
including hidden ones. Resulting in warning/errors like the following
../build/src/mapi/shared-glapi/glapi_mapi_tmp.h:26014:15:
error: no previous prototype for ‘shared_dispatch_stub_1414’ [-Werror=missing-prototypes]
This patch addressed that.
Cc: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reported-by: Eric Anholt <eric@anholt.net>
Fixes: 6148cce388 ("mapi: drop unneeded gl_dispatch_stub declarations")
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Eric Anholt <eric@anholt.net>
1ce5d757d0 dropped this limit.. which is probably the right thing to
do. But it results in an extra tiled->linear blit for glReadPixels()
(ie. dEQP/piglit) which is hitting some intermittent corruption (looks
like cache) on a6xx, causing a lot of spurious fails.
Since we are getting close to 19.0 branchpoint, re-instate this limit
for now, until the blitter problems are resolved.
Fixes: 1ce5d757d0 freedreno: core buffer modifier support
Signed-off-by: Rob Clark <robdclark@gmail.com>
This reverts commits 00ad77b9f6 and
5334dafee2.
This avoids Appveyor build breakage due to Cygwin, but more importantly,
there are several problems with these patches, as highlighted to my
recent mesa-dev mail. So better to revert for now, and pursue Cygwin
support after these have been address.
It should be incremented by one according to
how it is calculated by 'emit_vertex_buffer_state':
"\#if GEN_GEN < 8
.BufferAccessType = step_rate ? INSTANCEDATA : VERTEXDATA,
.InstanceDataStepRate = step_rate,
\#if GEN_GEN >= 5
.EndAddress = ro_bo(bo, end_offset - 1),
\#endif
\#endif"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109449
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This mirrors what autotools does in src/gallium/state_trackers/vdpau/Makefile.am
and src/gallium/targets/vdpau/Makefile.am:
VDPAU_MAJOR = 1
VDPAU_MINOR = 0
libvdpau_gallium_la_LDFLAGS = -version-number $(VDPAU_MAJOR):$(VDPAU_MINOR)
Reported-by: Igor Gnatenko <i.gnatenko.brain@gmail.com>
Fixes: 68076b8747 "meson: build gallium vdpau state tracker"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Accessing bo->map and then pool->center_bo_offset without a lock is
racy. One way of avoiding such race condition is to store the bo->map +
center_bo_offset into pool->map at the time the block pool is growing,
which happens within a lock.
v2: Only set pool->map if not using softpin (Jason).
v3: Move things around and only update center_bo_offset if not using
softpin too (Jason).
Cc: Jason Ekstrand <jason@jlekstrand.net>
Reported-by: Ian Romanick <idr@freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109442
Fixes: fc3f588320
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
ICC tries to be helpful by not erroring when it sees something that it
doesn't understand, which is completely the opposite of helpful. Meson
0.49.0 does much better at handling this by really trying to make ICC
error, but there are some things in mesa that still get ignored until
0.49.1
v2: - Fix id check, which is 'intel' not 'icc'
Cc: 18.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v1)
This is a bit fragile, as the way this "fixes" the check is to move the
one that we know is correct before the one that is incorrectly reported
as working. In meson 0.49.1 (which isn't out yet) this is fixed that the
incorrect check is reported as a failure.
Fixes: e0b037d697
("meson: Build SWR driver")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109129
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
LLVM uses the single instruction "FRINTI" to implement llvm.nearbyint.
Fixes the rounding tests of lp_test_arit.
Bug: https://bugs.gentoo.org/665570
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
NEON (now called ASIMD) is available on all aarch64 CPUs. Our code was
missing an aarch64 path, leading to util_cpu_caps.has_neon always being
false on aarch64.
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes the drisw paths to use the new shm2 interface, so that
we don't trigger the X server overflow checks when the x offset is non-zero.
This just hides the versioning in drisw, and either passes the src_x
or adds the offset fixup for the fallback path.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Adam Jackson <ajax@redhat.com>
This adds a new interface to the swrast interface to fix an shm put image bug.
The current code adds the x,y src offsets into the offset parameters,
however if the x offset is > 0, and the put image copies up to the height
of the image, this can trigger an X server validation check to fail and
the renderering to get BadMatch.
This patch fixes it to pass the x offset coord in as a src x.
We cannot pass the Y coordinate due to the horrible code mangling the
image w/h vs stride in swrastXPutImage.
v2: drop srcx,y from api
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Adam Jackson <ajax@redhat.com>
With the previous scripts API from the following was incorrectly
exported. Drop them from the list, since they're no longer around.
GL_EXT_blend_func_extended
GL_EXT_texture_integer
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Now we use the upstream XML file and a cleaner generator. Thus the
symbols are no longer exported and we can drop them from this list.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
As some point in the past we fixed the scripts so, these are no longer
exported. Drop them from the list.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
This reverts commit a1f5d9412cf7cacb3534635f6c2409fafbe6574e.
We no longer needed to sort - it was meant only to ease compare against
the old generated files.
The output produced functionally identical, with the following changes:
- A cosmetic: swapped ABI compatible types [ GLclampf -> GLfloat, etc ]
- B cosmetic: renamed parameters [ zNear -> n, etc ]
- C dropped extension entrypoints - invalid/incorrect
To make things easier to validate, normalise both old/new headers run
the sed patterns A, B and C to both sets.
A
s/\<GLclampf\>/GLfloat/g; s/\<GLclampx\>/GLfixed/g;
s/\<GLvoid\>/void/g;
B
s/\ \* / */g; s/\<texture\>/target/g;
s/\<plane\>/p/g; s/\<depth\>/d/g; s/\<modeAlpha\>/modeA/g;
s/\<shader\>/program/g; s/\<obj\>/shaders/g; s/\<equation\>/eqn/g;
s/\<param\>/data/g; s/\<params\>/data/g; s/\<buffers\>/buffer/g;
s/\<src\>/mode/g; s/\<count\>/n/g; s/\<zNear\>/n/g; s/\<zFar\>/f/g;
s/\<zfail\>/dpfail/g; s/\<zpass\>/dppass/g; s/\<buf\>/index/g;
s/\<value\>/target/g; s/\<cap\>/target/g; s/\<maskNumber\>/index/g;
s/\<srcRGB\>/sfactorRGB/g; s/\<dstRGB\>/dfactorRGB/g;
s/\<srcAlpha\>/sfactorAlpha/g; s/\<dstAlpha\>/dfactorAlpha/g;
s/\<primitiveMode\>/mode/g; s/\<primcount\>/instancecount/g;
s/\<top\>/t/g; s/\<bottom\>/b/g; s/\<left\>/l/g; s/\<right\>/r/g;
s/\<x\>/v0/g; s/\<y\>/v1/g; s/\<z\>/v2/g; s/\<w\>/v3/g;
s/\<sfactor\>/mode/g; s/\<dfactor\>/dst/g; s/\<attribindex\>/bindingindex/g;
s/\<internalFormat\>/internalformat/g; s/\<bufSize\>/bufsize/g;
C
glMultiDrawArraysEXT
glMultiDrawElementsEXT
glBindFragDataLocationEXT
glGetTexParameterIivEXT
glGetTexParameterIuivEXT
glTexParameterIivEXT
glTexParameterIuivEXT
v2:
- gl_dispatch_stub declarations are addressed with previous patch
- the public_entries table is no longer generated
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
We already do it a few lines above - drop the duplicate.
Note that for consistency sake, we keep the substitution since the GL
API is a mixed bad - some use GLvoid while others a normal void.
We might want to merge this back in GLVND.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
This way we can reuse the latter, which is already present in the
headers that we use. Thus we can drop the manual typedef we generate.
We might want to merge this back in GLVND.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
There is no need for the noop functions, the public_stubs and
public_entries table or table size defines. Remove those.
Pretty much all of this is applicable to GLVND, although it
requires preparatory work.
v2:
- python style fixes (Dylan)
- use "gldispatch" instead of not "glesv1" "glesv2"
- remove the public_entries table/array (Erik)
v3:
- use if == "gldispatch", instead of "in" (Kyle)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> (v2)
The only instance that requires the public_entries table is the
dispatch library - split that into another function.
We have to be careful with when undefining the guard, so split it out.
We might want to merge this back in GLVND.
Minor GLVND cleanup will be needed first.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Strictly speaking we can rework the rest of the code so we do not need
those. That said, this will require a series on it's own so let's carry
this local quirk for now.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Otherwise the incorrect ones will be used, effectively breaking the ABI.
Note: some entries in static_data.py list a suffixed API, while (for ES*
at least) we expect the one w/o suffix.
v2:
- rework path handling (Dylan)
- use else if chain (Erik)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Currently we have over 20 scripts that generate the libGL* dispatch and
various other functionality. More importantly we're using local XML
files instead of the Khronos provides one(s). Resulting in an
increasing complexity of writing, maintaining and bugfixing.
One fairly annoying bug is handling of statically exported symbols.
Today, if we enable a GL extension for GLES1/2, we add a special tag to
the xml. Thus the ES dispatch gets generated, but also since we have no
separate notion of GL/ES1/ES2 static functions it also gets exported
statically.
This commit adds step one towards clearing and simplifying our setup.
It imports the mapi generator from GLVND.
012fe39 ("Remove a couple of duplicate typedefs.")
v2: use local genCommon.py
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
The helper will also be used by the new Khronos gl.xml aware generator.
v2: Move existing one, instead of duplicating it.
v3: Correct genCommon.py references in meson [Erik]
v4: Drop the file from the EGL EXTRA_DIST [Erik]
Suggested-by: Kyle Brenneman <kbrenneman@nvidia.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Currently various parts of mesa use the glapi_table differently.
Some use _glapi_get_proc_offset() to get the offset, while others
directly reference the specific offset via _gloffset_Function.
Add all static entries, to ensure things don't break as we flip to the
upstream XML + new mapi generator.
Note: the offsets are also used for the alias remap table, thus we need
to ensure we honour the correct offsets range or it will break.
Currently this is done via MAX_OFFSETS constant, although a better
solution is in the works.
v2: add FramebufferTexture2DMultisampleEXT
v3: add MAX_OFFSETS guard
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> (v1)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
This reverts commit f1998e15ff.
This changes the ABI, such that glGetnTexImageARB entry-point from the
GLAPI gets removed. Thus accessing many functions by offset (as we do)
will result in getting the wrong one.
Follow-up work will swap the by-offset handling, but for now revert
this patch.
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
These declarations are not used anywhere - be that generated code or
otherwise.
[Emil: format the hunk from Erik into a patch]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
With Windows in mind, using forward slash isn't the right thing to do.
Even if it just works, we might want to fix it.
As here, use __file__ instead of argv[0] and sys.path.insert over
sys.path.append. With the path tweak being reportedly faster.
Suggested-by: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Cheating a tiny bit as these headers aren't in the Khronos repo yet, but
I expect them to be within a couple days.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
emit_uniformize() emits SHADER_OPCODE_FIND_LIVE_CHANNEL with its
flag_subreg set, so that the IR knows which flag is accessed. However
the flag is only used on Gen7 in Align1 mode.
To avoid setting unnecessary bits in the instruction words, get the
information we need and reset the default flag register. This allows
round-tripping through the assembler/disassembler.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This reverts commit ff621a5055.
with default warnings configuration, this commit generates:
../src/egl/main/eglapi.c:2654:1: error: no previous prototype for
‘eglGetDisplayDriverConfig’ [-Werror=missing-prototypes]
When this functionality was added, the PRIMITIVES_GENERATED query was
accidentally omitted. This causes issues for drivers that support
transform feedback."
Fixes: d644698b44 ("gallium: Add the ability to query a single
pipeline statistics counter")
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Sending MRs from the main Mesa repository increase clutter in the
repository, and decrease visibility of project-wide branches. So it's
better if MRs are sent from forks instead.
Let's add a note about this, in case its not obvious to everyone.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Fixes an assert we start hitting with kms/gbm:
#0 0x0000007fbf3d6e3c in raise () from /lib64/libc.so.6
#1 0x0000007fbf3c4a68 in abort () from /lib64/libc.so.6
#2 0x0000007fbf3d04e8 in __assert_fail_base () from /lib64/libc.so.6
#3 0x0000007fbf3d0550 in __assert_fail () from /lib64/libc.so.6
#4 0x0000007fbf5a73c4 in gbm_dri_bo_create (gbm=0x5820f0, width=2160, height=1440, format=875713112, usage=0, modifiers=0x695e00, count=1) at ../src/gbm/backends/dri/gbm_dri.c:1150
#5 0x0000007fbf5a49c4 in gbm_bo_create_with_modifiers (gbm=0x5820f0, width=2160, height=1440, format=875713112, modifiers=0x695e00, count=1) at ../src/gbm/main/gbm.c:491
#6 0x0000007fbbac3d64 in get_back_bo (dri2_surf=0x6f4cc0) at ../src/egl/drivers/dri2/platform_drm.c:258
#7 0x0000007fbbac4318 in dri2_drm_image_get_buffers (driDrawable=0x704490, format=4098, stamp=0x6fc730, loaderPrivate=0x6f4cc0, buffer_mask=1, buffers=0x7fffffe210) at ../src/egl/drivers/dri2/platform_drm.c:409
#8 0x0000007fbf5a5318 in image_get_buffers (driDrawable=0x704490, format=4098, stamp=0x6fc730, loaderPrivate=0x70e150, buffer_mask=1, buffers=0x7fffffe210) at ../src/gbm/backends/dri/gbm_dri.c:135
#9 0x0000007fbe4308c4 in dri_image_drawable_get_buffers (drawable=0x6fc730, images=0x7fffffe210, statts=0x6f2660, statts_count=1) at ../src/gallium/state_trackers/dri/dri2.c:339
#10 0x0000007fbe430c44 in dri2_allocate_textures (ctx=0x614b30, drawable=0x6fc730, statts=0x6f2660, statts_count=1) at ../src/gallium/state_trackers/dri/dri2.c:466
#11 0x0000007fbe435580 in dri_st_framebuffer_validate (stctx=0x714160, stfbi=0x6fc730, statts=0x6f2660, count=1, out=0x7fffffe3b8) at ../src/gallium/state_trackers/dri/dri_drawable.c:85
#12 0x0000007fbe7b2c84 in st_framebuffer_validate (stfb=0x6f2190, st=0x714160) at ../src/mesa/state_tracker/st_manager.c:222
#13 0x0000007fbe7b4884 in st_api_make_current (stapi=0x7fbf0430d8 <st_gl_api>, stctxi=0x714160, stdrawi=0x6fc730, streadi=0x6fc730) at ../src/mesa/state_tracker/st_manager.c:1074
#14 0x0000007fbe434f44 in dri_make_current (cPriv=0x703c20, driDrawPriv=0x704490, driReadPriv=0x704490) at ../src/gallium/state_trackers/dri/dri_context.c:301
#15 0x0000007fbe42c910 in driBindContext (pcp=0x703c20, pdp=0x704490, prp=0x704490) at ../src/mesa/drivers/dri/common/dri_util.c:579
#16 0x0000007fbbabab40 in dri2_make_current (drv=0x69d170, disp=0x69c6e0, dsurf=0x6f4cc0, rsurf=0x6f4cc0, ctx=0x70cb40) at ../src/egl/drivers/dri2/egl_dri2.c:1456
#17 0x0000007fbbaa8ef4 in eglMakeCurrent (dpy=0x69c6e0, draw=0x6f4cc0, read=0x6f4cc0, ctx=0x70cb40) at ../src/egl/main/eglapi.c:862
#18 0x0000007fbf5736ac in InternalMakeCurrentVendor (dpy=dpy@entry=0x614fb0, draw=draw@entry=0x6f4cc0, read=read@entry=0x6f4cc0, context=context@entry=0x70cb40, apiState=apiState@entry=0x6fc940, vendor=0x6975f0) at libegl.c:861
#19 0x0000007fbf573764 in InternalMakeCurrentDispatch (dpy=0x614fb0, draw=0x6f4cc0, read=0x6f4cc0, context=0x70cb40, vendor=0x6975f0) at libegl.c:630
#20 0x0000000000403640 in init_egl (egl=0x5805a8 <gl>, gbm=0x580528 <gbm>, samples=0) at ../common.c:263
#21 0x0000000000403c1c in init_cube_smooth (gbm=0x580528 <gbm>, samples=0) at ../cube-smooth.c:225
#22 0x0000000000408618 in main (argc=1, argv=0x7fffffe8d8) at ../kmscube.c:145
Fixes: 1ce5d757d0 freedreno: core buffer modifier support
Signed-off-by: Rob Clark <robdclark@gmail.com>
It's invalid to emit a ZPASS_DONE event on the compute queue, and
the fence BO is unused on the compute queue (ie. we don't flush
CB or DB caches).
This saves some space in the upload BO.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This parameter is actually useless as the immediate value
can always be zero.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
For example, if a pipeline has two stages VS and FS. And if only
the fragment stage needs dynamic bindings, we shouldn't allocate
an extra user SGPR for the vertex stage. Of course, if the vertex
stage loads constants, it needs an user SGPR.
This should reduce the number of SET_SH_REG packets that are emitted.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
In the Intel backend, it makes the most sense to treat gl_TessLevelInner
and gl_TessLevelOuter as ordinary shader inputs. For Radeon, it makes
more sense to treat them as system values which get special handling.
We already have a compiler option for this, but the Iris driver will
need a capability bit so we can set it appropriately.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
We may have to flush the cache if there are any textures presently bound
that refer to the outgoing framebuffer. This is only checked at
validation time.
Fixes a number of dEQP-GLES3.functional.fbo.color.repeated_clear.sample.*
tests, which would bind a texture, then clear it while the binding was
in effect, and then render to a different texture. This seems legal
under the "no feedback loops" rule.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Normally modifiers take precendence over use flags, as they are more
explicit. But if the driver supports modifiers, but the xserver does
not, then we should fallback to the old mechanism of allocating a buffer
using 'use' flags.
Fixes: 069fdd5f9f
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Fixes a static assertion which broke the build.
Fixes: 3ee240890 "gallium: add SINT formats to have exact counterparts to SNORM formats"
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Neha Bhende<bhenden@vmware.com>
Annoyingly, this requires that we implement integer division on the
command streamer. Fortunately, we're only ever dividing by constants so
we can use the mulh+add+shift trick and it's not as bad as it sounds.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
In order to allow nir_gather_xfb_info to be used on OpenGL,
specifically ARB_gl_spirv.
So, from OpenGL 4.6 spec, section 11.1.2.1, "Output Variables":
"outputs specifying both an *XfbBuffer* and an *Offset* are
captured, while outputs not specifying both of these are not
captured. Values are captured each time the shader writes to such
a decorated object."
This implies that are captured if both are present, and not if one of
those are lacking. Technically, it doesn't explicitly point that
having just one or the other is a mistake. In some cases, glslang is
adding some extra XfbBuffer without XfbOffset around, and mentioning
that technically that is not a bug (see issue#1526)
And for the case of Vulkan, as the same glslang issue mentions, it is
not clear if that should be a mistake or not. But even if it is a
mistake, it is not really needed to be checked on the driver, and we
can let the validation layers to check that.
v2: simplify explicit_xfb_buffer and explicit_offset checks (Jason).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Before, we were double-counting the component slots when we had a dvec3
or dvec4. Instead, just add them in once and manually offset the
recorded output offset.
Fixes: 19064b8c "nir: Add a pass for gathering transform feedback info"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
If we have a transform feedback output like:
float[2] x2_out (VARYING_SLOT_VAR1.x, 0, 0)
which is lowered by nir_lower_io_arrays_to_elements to,
float x2_out (VARYING_SLOT_VAR1.x, 0, 0)
float x2_out@5 (VARYING_SLOT_VAR2.x, 0, 0)
We have to update the destination offset to avoid overwriting
the same value.
v2 (Jason Ekstrand):
- Compute the correct offsets for arrays of vectors and/or doubles
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
When a xfb buffer is explicitely declared on a varying
variable, we shouldn't remove it at link time.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Instead of setting interface_type to whatever the per-vertex type is, we
only set it on blocks. This allows later passes to tell the difference
between variables that are in blocks and those that aren't.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Instead of splitting every per-vertex struct, just split the ones that
are actually blocks. The reason for the split is so that we have
separate variables for separate locations, qualifiers, and builtin
decorations. The vulkan spec only allows these on members of blocks.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
This seems to make the simulator happier. The early return wasn't
really protecting anything and the code that follows will happily
initialize the dummy element to STORE_0 and emit it.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Issue was hit with this configuration:
--disable-{egl,gbm} --with-platform=drm
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Fixes: 3208fd2e46 ("configure: move platform handling further up")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Two cases:
* replacing srcs which refer to MOV instructions
* replacing MOVs used to write to exports
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
If we want to use a scalar instruction with two sources, both sources have
to be in the same register. This covers a common case by inserting a scalar
MOV into a previous instruction with only a vector alu instruction.
A better method would be to have the sources end up in the same register in
the first place, but when one source is a constant this is the only way.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
This patch replaces the a2xx TGSI compiler with a NIR compiler.
It also adds several new features:
-gl_FrontFacing, gl_FragCoord, gl_PointCoord, gl_PointSize
-control flow (including loops)
-texture related features (LOD/bias, cubemaps)
-filling scalar ALU slot when possible
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Take away const qualifier from return type of these functions as
-Wignored-qualifiers points out it is ignored for these cases.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
vtn supports these, so don't squalk if user is happy with enabling
these.
v2: add new members sorted
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Both the Intel and RADV people have been really bad about adding things
to the release notes. We should start actually paying attention.
Acked-by: Tapani Pälli <tapani.palli@intel.com>
It's common in some applications to bind a new graphics pipeline without
ending up changing any context registers.
This has a pipline have two command buffers: one for setting context
registers and one for everything else. The context register command buffer
is only emitted if it differs from the previous pipeline's.
v2: ensure late scissor emission is done when radv_emit_rbplus_state() is
called
v2: make use of cmd_buffer->state.workaround_scissor_bug
v3: rename "workaround_scissor_bug" to
"context_roll_without_scissor_emitted"
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
On a20x, set VGT_VERTEX_REUSE_BLOCK_CNTL to 2 and don't change it. Small
rearrangement on a220 to reduce the size of draw commands.
Only set DEALLOC_CNTL on a20x because the correct a220 value is not known.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Fixes cases where previous viewport values might case gmem2mem to fail.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Doesn't change much, but reduces the size of fd2_emit_state
gmem2mem does not need to change the value: no Z clipping on resolve
mem2gmem now needs to restore the common value after rendering
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Only 3 vertices are used so we can drop the data for vertex 4
It doesn't make sense to have 1.1 for some coordinates, use 1.0 instead
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@gmail.com>
v2: add assert to verify we have at least one valid bit_size
v3: fix use of load_front_face in nir_lower_two_sided_color and tgsi_to_nir
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
With OpenCL some system values match the address bits, but in GLSL we also
have some system values being 64 bit like subgroup masks.
With this it is possible to adjust the builder functions so that depending
on the bit_sizes the correct bit_size is used or an additional argument is
added in case of multiple possible values.
v2: validate dest bit_size
v3: generate hex values in python code
remove useless imports
rename and move bit_sizes
v4: add 1 to legal bit_sizes for front_face
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
fixes a couple of deqp tests (on nvc0 and potential other drivers):
dEQP-GLES3.functional.shaders.invariance.highp.common_subexpression_1
dEQP-GLES3.functional.shaders.invariance.highp.common_subexpression_2
dEQP-GLES3.functional.shaders.invariance.highp.common_subexpression_3
dEQP-GLES3.functional.shaders.invariance.mediump.common_subexpression_1
dEQP-GLES3.functional.shaders.invariance.mediump.common_subexpression_2
dEQP-GLES3.functional.shaders.invariance.mediump.common_subexpression_3
dEQP-GLES3.functional.shaders.invariance.lowp.common_subexpression_1
dEQP-GLES3.functional.shaders.invariance.lowp.common_subexpression_2
dEQP-GLES3.functional.shaders.invariance.lowp.common_subexpression_3
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
A little bit of explanation regarding how vkCmdPipelineBarrier()
works.
v2: Avoid referring to data port cache when it's actually sampler
caches (Jason)
Complete explanation for indirect draws (Jason)
v3: s/samplers/sampler/ (Jason)
s/UBOs/data port/
Add documentation for VK_ACCESS_CONDITIONAL_RENDERING_READ_BIT_EXT (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
In commit 9a7b319903 ("anv/query: flush render target before
copying results") we tracked all the render target writes to apply a
flushes in the vkCopyQueryResults(). But we can narrow this down to
only when we write a buffer (which is the only input of
vkCopyQueryResults).
v2: Drop newer render target write flags introduce by 1952fd8d2c
("anv: Implement VK_EXT_conditional_rendering for gen 7.5+")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Currently we only add a cache key for a shader once it is linked.
However games like Team Fortress 2 compile a whole bunch of shaders
which are never actually linked. These compiled shaders can take
up a bunch of memory.
This patch changes things so that we add the key for the shader to
the cache as soon as it is compiled. This means on a warm cache we
can avoid the wasted memory from these shaders. Worst case scenario
is we need to compile the shaders at link time but this can happen
anyway if the shader has been evicted from the cache.
Reduces memory use in Team Fortress 2 from 1.3GB -> 770MB on a
warm cache from start up to the game menu.
V2: only add key to cache when compilation is successful.
Acked-by: Marek Olšák <marek.olsak@amd.com>
The docs are fairly incomplete and inconsistent about it, but this
seems to be the reason why half-float destinations are required to be
DWORD-aligned on BDW+ projects. This way the regioning lowering pass
will make sure that the destination components of W to HF and HF to W
conversions are aligned like the corresponding conversion operation
with 32-bit execution data type.
Tested-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This builds on the recent interpolate fix by Rhys ee8488ea3b.
This fixes the arb_gpu_shader5 interpolateAt* tests that contain
arrays.
Fixes: ee8488ea3b ("ac/nir,radv,radeonsi/nir: use correct indices for interpolation intrinsics")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The color swap isn't available for tiled formats and it's not needed
either. We pick one channel order and use for all non-linear formats.
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Staging blit downloads would wait on the src resource instead of the
staging resource and didn't make sure to submit the blit batch first.
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Currently we only add a cache key for a shader once it is linked.
However games like Team Fortress 2 compile a whole bunch of shaders
which are never actually linked. These compiled shaders can take
up a bunch of memory.
This patch changes things so that we add the key for the shader to
the cache as soon as it is compiled. This means on a warm cache we
can avoid the wasted memory from these shaders. Worst case scenario
is we need to compile the shaders at link time but this can happen
anyway if the shader has been evicted from the cache.
Reduces memory use in Team Fortress 2 from 1.3GB -> 770MB on a
warm cache from start up to the game menu.
Acked-by: Marek Olšák <marek.olsak@amd.com>
This basically reverts c2bc0aa7b1.
By running the opts we reduce memory using in Team Fortress 2
from 1.5GB -> 1.3GB from start-up to game menu.
This will likely increase Deus Ex start up times as per commit
c2bc0aa7b1. However currently 32bit games like Team Fortress 2
can run out of memory on low memory systems, so that seems more
important.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Passes' function names, separated by comma, listed in NIR_SKIP
environment variable will be skipped in debug mode. The mechanism is
hooked into the _PASS macro, like NIR_PRINT.
The extra macro NIR_SKIP is available as a developer convenience, to
skip at pointer other than the passes entry points.
v2: Fix typo in NIR_SKIP macro. (Bas)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Conditional rendering affects next functions:
- vkCmdDraw, vkCmdDrawIndexed, vkCmdDrawIndirect, vkCmdDrawIndexedIndirect
- vkCmdDrawIndirectCountKHR, vkCmdDrawIndexedIndirectCountKHR
- vkCmdDispatch, vkCmdDispatchIndirect, vkCmdDispatchBase
- vkCmdClearAttachments
Value from conditional buffer is cached into designated register,
MI_PREDICATE is emitted every time conditional rendering is enabled
and command requires it.
v2: by Jason Ekstrand
- Use vk_find_struct_const instead of manually looping
- Move draw count loading to prepare function
- Zero the top 32-bits of MI_ALU_REG15
v3: Apply pipeline flush before accessing conditional buffer
(The issue was found by Samuel Iglesias)
v4: - Remove support of Haswell due to possible hardware bug
- Made TMP_REG_PREDICATE and TMP_REG_DRAW_COUNT defines to
define registers in one place.
v5: thanks to Jason Ekstrand and Lionel Landwerlin
- Workaround the fact that MI_PREDICATE_RESULT is not
accessible on Haswell by manually calculating
MI_PREDICATE_RESULT and re-emitting MI_PREDICATE
when necessary.
v6: suggested by Lionel Landwerlin
- Instead of calculating the result of predicate once - re-emit
MI_PREDICATE to make it easier to investigate error states.
v7: suggested by Jason
- Make anv_pipe_invalidate_bits_for_access_flag add CS_STALL
if VK_ACCESS_CONDITIONAL_RENDERING_READ_BIT is set.
v8: suggested by Lionel
- Precompute conditional predicate's result to
support secondary command buffers.
- Make prepare_for_draw_count_predicate more readable.
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2: by Jason Ekstrand
- Move out of the draw loop population of registers
which aren't changed in it.
- Remove dependency on ALU registers.
- Clarify usage of PIPE_CONTROL
- Without usage of ALU registers patch works for gen7+
v3: set pending_pipe_bits |= ANV_PIPE_RENDER_TARGET_WRITES
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Native file support in command line serialization isn't present in meson
0.49, but will be for 0.49.1 and 0.50
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
In some shaders, you can end up with a stride in the source of a
SHADER_OPCODE_MULH. One way this can happen is if the MULH is acting on
the top bits of a 64-bit value due to 64-bit integer lowering. In this
case, the compiler will produce something like this:
mul(8) acc0<1>UD g5<8,4,2>UD 0x0004UW { align1 1Q };
mach(8) g6<1>UD g5<8,4,2>UD 0x00000004UD { align1 1Q AccWrEnable };
The new region fixup pass looks at the MUL and sees a strided source and
unstrided destination and determines that the sequence is illegal. It
then attempts to fix the illegal stride by replacing the destination of
the MUL with a temporary and emitting a MOV into the accumulator:
mul(8) g9<2>UD g5<8,4,2>UD 0x0004UW { align1 1Q };
mov(8) acc0<1>UD g9<8,4,2>UD { align1 1Q };
mach(8) g6<1>UD g5<8,4,2>UD 0x00000004UD { align1 1Q AccWrEnable };
Unfortunately, this new sequence isn't correct because MOV accesses the
accumulator with a different precision to MUL and, instead of filling
the bottom 32 bits with the source and zeroing the top 32 bits, it
leaves the top 32 (or maybe 31) bits alone and full of garbage. When
the MACH comes along and tries to complete the multiplication, the
result is correct in the bottom 32 bits (which we throw away) and
garbage in the top 32 bits which are actually returned by MACH.
This commit does two things: First, it adds an assert to ensure that we
don't try to rewrite accumulator destinations of MUL instructions so we
can avoid this precision issue. Second, it modifies
required_dst_byte_stride to require a tightly packed stride so that we
fix up the sources instead and the actual code which gets emitted is
this:
mov(8) g9<1>UD g5<8,4,2>UD { align1 1Q };
mul(8) acc0<1>UD g9<8,8,1>UD 0x0004UW { align1 1Q };
mach(8) g6<1>UD g5<8,4,2>UD 0x00000004UD { align1 1Q AccWrEnable };
Fixes: efa4e4bc5f "intel/fs: Introduce regioning lowering pass"
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
For a long time, we based exec sizes on destination register widths.
We've not been doing that since 1ca3a94427 but a few remnants
accidentally remained.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Otherwise writes get propagated across atomics if no barrier is
used. Without barrier writes should still be visible in the same
invocation, so an atomic has to be considered a write.
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes: b3c6146925 "nir: Copy propagation between blocks"
Fixes: 62332d139c "nir: Add a local variable-based copy propagation pass"
Add a test that checks that we can use the extra space allocated for
padding while allocating larger anv_states.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
If softpin is supported, create new BOs for the required size and add the
respective BO maps. The other main change of this commit is that
anv_block_pool_map() now returns the map for the BO that the given
offset is part of. So there's no block_pool->map access anymore (when
softpin is used.
v3:
- set fd to -1 on softpin case (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We are not going to use userptr for anv block pool BOs anymore. However,
so far we have been relying on the fact that userptr BOs are snooped on
non-llc platforms. Let's make sure that the block pool BOs are still
snooped, and we can also remove the clflush'ing that we do on all state
buffers.
And since we plan to remove the flushes, set the anv_bo_pool BOs to
cached (snooped on non-LLC platforms) too. For LLC platforms, they are
all cached by default, so this becomes a no-op.
v5:
- Add snooping to anv_bo_pool BOs too (Jason).
- Remove anv_gem_set_domain.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
It's possible that we still have some space left in the block pool, but
we try to allocate a state larger than that state. This means such state
would start somewhere within the range of the old block_pool, and end
after that range, within the range of the new size.
That's fine when we use userptr, since the memory in the block pool is
CPU mapped continuously. However, by the end of this series, we will
have the block_pool split into different BOs, with different CPU
mapping ranges that are not necessarily continuous. So we must avoid
such case of a given state being part of two different BOs in the block
pool.
This commit solves the issue by detecting that we are growing the
block_pool even though we are not at the end of the range. If that
happens, we don't use the space left at the end of the old size, and
consider it as "padding" that can't be used in the allocation. We update
the size requested from the block pool to take the padding into account,
and return the offset after the padding, which happens to be at the
start of the new address range.
Additionally, we return the amount of padding we used, so the caller
knows that this happens and can return that padding back into a list of
free states, that can be reused later. This way we hopefully don't waste
any space, but also avoid having a state split between two different
BOs.
v3:
- Calculate offset + padding at anv_block_pool_alloc_new (Jason).
v4:
- Remove extra "leftover".
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This commit tries to rework the code that split and returns chunks back
to the state pool, while still keeping the same logic.
The original code would get a chunk larger than we need and split it
into pool->block_size. Then it would return all but the first one, and
would split that first one into alloc_size chunks. Then it would keep
the first one (for the allocation), and return the others back to the
pool.
The new anv_state_pool_return_chunk() function will take a chunk (with
the alloc_size part removed), and a small_size hint. It then splits that
chunk into pool->block_size'd chunks, and if there's some space still
left, split that into small_size chunks. small_size in this case is the
same size as alloc_size.
The idea is to keep the same logic, but make it in a way we can reuse it
to return other chunks to the pool when we are growing the buffer.
v2:
- Include Jason's suggestions to the algorithm that returns chunks.
- Update comments.
v3:
- Disallow returning 0 blocks (Jason).
- fix min_size in the loop (Jason).
- remove temporary variables (Jason)
v4:
- return_chunk() should never return blocks larger than
pool->block_size.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We now have multiple BOs in the block pool, but sometimes we still
reference only the first one in some instructions, and use relative
offsets in others. So we must be sure to add all the BOs from the block
pool to the validation list when submitting commands.
v2:
- Don't add block pool BOs to the dependency list right before
execbuf (Jason)
- Call anv_execbuf_add_bo() to each BO in the block pools (Jason)
- Use anv_execbuf_add_bo_set() to add surface state dependencies to
execbuf.
v3:
- Add comment to the non-softpin case (Jason).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This part of the anv_execbuf_add_bo() code is totally independent of the
BO being added. Let's split it out, so we can reuse it later.
v3: rename to anv_execbuf_add_bo_set (Jason).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
So far we use only one BO (the last one created) in the block pool. When
we switch to not use the userptr API, we will need multiple BOs. So add
code now to store multiple BOs in the block pool.
This has several implications, the main one being that we can't use
pool->map as before. For that reason we update the getter to find which
BO a given offset is part of, and return the respective map.
v3:
- Simplify anv_block_pool_map (Jason).
- Use fixed size array for anv_bo's (Jason)
v4:
- Respect the order (item, container) in anv_block_pool_foreach_bo
(Jason).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Change block_pool->bo to be a pointer, and update its usage everywhere.
This makes it simpler to switch it later to a list of BOs.
v3:
- Use a static "bos" field in the struct, instead of malloc'ing it.
This will be later changed to a fixed length array of BOs.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
After switching to using anv_state_table, there are very few places left
still using pool->map directly. We want to avoid that because it won't
be always the right map once we split it into multiple BOs.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Use anv_state_pool_return_blocks() to return blocks to the pool, instead
of manually pushing them.
v3:
- return blocks from the end of the chunk (Jason).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The use of anv_state_table_add() combined with anv_state_table_push(),
specially when adding a bunch of states to the table, is very verbose.
So we add this helper that makes things easier to digest.
We also already add the anv_state_table member in this commit, so things
can compile properly, even though it's not used.
v2: assert that the states are always aligned to their size (Jason)
v3: Add "table" member to anv_state_pool in this commit.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We will need the anv_block_pool_map to find the map relative to some BO
that is not at the start of the block pool.
v2: just return a pointer instead of a struct (Jason)
v4: Update comment (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Add a structure to hold anv_states. This table will initially be used to
recycle anv_states, instead of relying on a linked list implemented in
GPU memory. Later it could be used so that all anv_states just point to
the content of this struct, instead of making copies of anv_states
everywhere.
One has to call anv_state_table_add(), which returns an index for the
state in the table, and then get a pointer to such index, and finally
fill in the rest of the struct.
TODO:
1) There's a lot of common code between this table backing store
memory and the anv_block_pool buffer, due to how we grow it. I think
it's possible to refactory this and reuse code on both places.
2) Add unit tests.
v3:
- Rename state table memfd (Jason)
- Return VK_ERROR_OUT_OF_HOST_MEMORY on more places (Jason)
- anv_state_table_grow returns VkResult (Jason)
- Rename variables to be more informative (Jason)
- Return errors on state table grow.
- Rename anv_state_table_push/pop to anv_free_list_push2/pop2
This will be renamed again to remove the trailing "2" later.
v4:
- Remove exit(-1) from anv_state_table (Jason).
- Use uint32_t "next" field in anv_free_entry (Jason).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
There were 2 problems with this test.
First it was comparing highest, which was -1, with an uint32_t. So the
current value would never be higher than that, and the assert would
always be false. It just never reached this point because of the next
problem.
It was always looking for the highest value of each thread and storing
it in thread_max. So a test case like this wouldn't work:
[Thread]: [Blocks]
[0]: [0, 32, 64, 96]
[1]: [128, 160, 192, 224]
[2]: [256, 288, 320, 352]
Not only that would skip values and iterate only over thread number 2,
instead of walking through all of them, but thread_max was also
initialized to -1. And then compared to unsigned blocks[i][next[i].
We fix that by getting the smallest value of each thread, and checking
if it is lower than thread_min, which is initialized to INT32_MAX. And
then we end up walking through all the blocks of all threads. We also
change "blocks" to be int32_t instead of uint32_t, since in some places
(alloc_blocks) it was already referenced as int32_t, and that fixes the
comparison to -1.
v2:
- keep highest initialized to -1, and change blocks to be int32_t.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We had defined MAX_IMAGES as 8, which we used to size the array for
image push constant data. The comment there stated that this was for
gen8, but anv_nir_apply_pipeline_layout runs for all gens and writes
that array, asserting that we don't exceed that number of images,
which imposes a limit of MAX_IMAGES on all gens.
Furthermore, despite this, we are exposing up to 64 images per shader
stage on all gens, gen8 included.
This patch lowers the number of images we expose in gen8 to 8 and
keeps 64 images for gen9+ while making sure that only pre-SKL gens
use push constant space to handle images.
v2:
- <= instead of < in the assert (Eric, Lionel)
- Change the way the assertion is written (Eric)
v3:
- Revert the way the assertion is written to the form it had in v1,
the version in v2 was not equivalent and was incorrect. (Lionel)
v4:
- gen9+ doesn't need push constants for images at all (Jason)
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v3)
Fixes following failing vk-gl-cts cases on Linux desktop:
dEQP-VK.api.external.memory.android_hardware_buffer.suballocated.buffer.info
dEQP-VK.api.external.memory.android_hardware_buffer.suballocated.image.info
dEQP-VK.api.external.memory.android_hardware_buffer.dedicated.image.info
dEQP-VK.api.external.memory.android_hardware_buffer.dedicated.buffer.info
Fixes: 517103abf1 "anv/android: add ahardwarebuffer external memory properties"
Reported-by: Juan A. Suarez <jasuarez@igalia.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Noticed while debugging V3D -- the ro->gpu_fd was freshly opened in ro
setup, and it needs to stay open until screen close (since it may be used
by renderonly) and should be the same one used by the vc4 screen.
Fixes: 7029ec05e2 ("gallium: Add renderonly-based support for pl111+vc4.")
The CTS was running out of fds, because of the ro->gpu_fd never being
closed. ro->gpu_fd should match the screen (in case the caller of
v3d_drm_screen_create_renderonly() has a scanout_for_resource() that uses
gpu_fd) and the screen is expected to close its fd at the end, fixing the
resource leak.
Fixes: e113b21cb7 ("v3d: Add renderonly support.")
I had bugs in the old path where I was laying out as tiled (so we'd render
tiled) but then only allocating space in the shared object for linear
rendering. The resource_from_handle makes it so the same layout choices
are made in both the import and export scanout cases. Also, fixes a leak
of the fd that was tripping up the CTS.
Now that we're checking PIPE_BIND_SHARED to choose to use RO, the
DRM_FORMAT_MOD_LINEAR check wasn't needed any more.
Fixes visual corruption and MMU faults in X in renderonly mode.
Fixes: bd09bb1629 ("v3d: SHARED but not necessarily SCANOUT buffers on RO must be linear.")
The current code only strips off arrays and cannot find the type
for images that are struct members.
Instead of trying to get the image type from the variable, we just
get it directly from the deref instruction.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This matches the behaviour of AMDVLK and hides banding.
It is also seems to be allowed by the Vulkan spec.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This improves cache filesystem performance, especially during CI tests
Also updated jitcache magic number due to codegen parameter changes
Removed 2 `if constexpr` to prevent C++17 requirement
Avoids confusion with other defaulted integer parameters
- fixed some unspecified usages
- removed unnecessary includes
- removed unecessary protected access specifier in buckets framework
- added graphics address translation in odd gathers
- added support for unaligned gathers in fetch shader
- changed how 2+ GB offsets are handled to make them compatible with
unaligned offsets
- updated sample from TRTT surfaces correctly
- implemented mapped status return for TRTT surfaces
- implemented per-sample instruction minLod clamp
- updated bilinear filter weight calculation to be closer to D3D specs
- implemented "ReducedTexcoordRange" operation from D3D specs to avoid
loss of precision on high-value normalized coordinates
This was already enabled for gallium based osmesa with gallium drivers
in 9d10581897, so do the same for classic
driver with classic osmesa.
Fixes: cbbd5bb889
("meson: build classic osmesa")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Various recreation scenarios lead to API thread getting stuck in
swr_fence_finish(). This is a multi-context issue, whereby one context
overwrites the fence read-value with a previous sync's lesser value.
The fence sync value is supposed to be always increasing.
In swr_fence_cb(), only update the "read" value if the new value is
greater.
(This may seem like we're not waiting on the other context to finish, but
had we needed for it to finish there would have been a wait prior to
submitting a new sync.)
cc: mesa-stable@lists.freedesktop.org
Intel's blending hardware does not properly return 1.0 for destination
alpha for RGBX formats; it requires the factors to be overridden to
either zero or one. Broadcom vc4 and v3d also could use this override.
While overriding these factors is safe in general, Nouveau and Radeon
would prefer not to. Their blending hardware already returns correct
values for RGB/RGBX formats, and would like to avoid the resulting
per-buffer blending and independent blend factors (rgb != a) since it
can cause additional overhead.
I considered simply handling this in the driver, but it's not as nice.
pipe_blend_state doesn't have any format information, so we'd need the
hardware blend state to depend on both pipe_blend_state and
pipe_framebuffer_state. Furthermore, Intel GPUs don't have a native
RGBX_SNORM format, so I avoid exposing one, which makes Gallium fall
back to RGBA_SNORM. The pipe_surfaces we get in the driver have an RGBA
format, making it impossible to tell that there shouldn't be an alpha
channel. One could argue that st not handling it in that case is a bug.
To work around this, we'd have to expose RGBX pipe formats, mapped to
RGBA hardware formats, and add format swizzling special cases. All
doable, but it ends up being more code than I'd like.
st_atom_blend already has access to the right information and it's
trivial to accomplish there, so we just add a cap bit and do that.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
`with_gallium_icd` is never used throughout the different Meson build
files, whereas `with_opencl_icd` tracks whether or not `gallium-opencl`
was set to "icd".
Fixes: 42ea0631f1
("meson: build clover")
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Gallium historically has treated pipeline statistics queries as a single
query, PIPE_QUERY_PIPELINE_STATISTICS, which returns a block of 11
values. This was originally patterned after the D3D1x API. Much later,
Brian introduced an OpenGL extension that exposed these counters - but
it exposes 11 separate queries, each of which returns a single value.
Today, st/mesa simply queries all 11 values, and returns a single value.
While pipeline statistics counters aren't typically performance
critical, this is still not a great fit. A D3D1x->GL translator might
request all 11 counters by creating 11 separate GL queries...which
Gallium would map to reads of all 11 values each time, resulting in a
total 121 counter reads. That's not ideal.
This patch adds a new cap, PIPE_CAP_QUERY_PIPELINE_STATISTICS_SINGLE,
and corresponding query type PIPE_QUERY_PIPELINE_STATISTICS_SINGLE.
When calling create_query(), q->index should be set to one of the
PIPE_STAT_QUERY_* enums to select a counter. Unlike the block query,
this returns the value in pipe_query_result::u64 (as it's a single
value) instead of the pipe_query_data_pipeline_statistics group.
We update st/mesa to expose ARB_pipeline_statistics_query if either
capability is set, preferring the new SINGLE variant when available.
Thanks to Roland, Ilia, and Marek for helping me sort this out.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This just changes the order of the switch statements, so we only
look at target if the query type is PIPE_QUERY_PIPELINE_STATISTICS.
The next commit will introduce a new SINGLE query type which can be
used for the same GL query types, and it won't want this processing.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Gallium handles pipeline statistics queries as a single query
(PIPE_QUERY_PIPELINE_STATISTICS) which returns a struct with 11 values.
Sometimes it's useful to refer to each of those values individually,
rather than as a group. To avoid hardcoding numbers, we define a new
enum for each value. Here, the name and enum value correspond to the
index in the struct pipe_query_data_pipeline_statistics result.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
A simple Vulkan extension that allows apps to query size and
usage of all exposed memory heaps.
The different usage values are not really accurate because
they are per drm-fd, but they should be close enough.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Looking at -pro we need to enable it for pipelines with just a
GS too.
This seems to reduce the hangs from
https://bugs.freedesktop.org/show_bug.cgi?id=109242 on a RX 550 to
the point where I can't reproduce, after the false start with the
wd_switch_on_eop patch due to flakiness.
(but people are reporting it does not fix the issue completely for
them on polaris 11)
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
We get a payload for the ivec3 workgroup and an int local invocation
index, and we use the core lowering to turn into the global invocation id
and the local invocation id ivec3s.
This is only exposed on V3D 4.1+, because we didn't have the TMU write
operations for images on 3.3 (To do GLES 3.1 there, you have to lower it
to SSBO load/stores, which is a problem to solve later).
This was an arbitrary "we support lots of stuff" value when I started the
driver. However, at 400 we expose OES_gpu_shader5, which claims support
for dynamically indexing samplers, which the driver doesn't do yet.
We've been relying on linking splitting up our varying matrices into
separate vectors, but with SSO that doesn't happen. Supporting matrix
inputs isn't too hard, though.
Otherwise, the simulator raises the GMP interrupt and waits for it to be
handled, and v3d ends up spinning in v3d_hw_tick(). Aborting right when
violation happens gives us a chance to look at the backtrace of whatever
thread triggered the violation.
Fix crashes with
dEQP-VK.spirv_assembly.instruction.compute.workgroup_memory.*16
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Causes hangs on some machines.
What works for dEQP-VK.tessellation.shader_input_output.barrier:
- running num_patches = 6 (which limits LDS to 32 KiB)
- running num_patches = 8, and artificially cutting LDS size at 32 KiB.
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
From @jekstrand's nir-1-bit-bool branch, with improved ior/inot lowering.
ior: fmax instead of fadd allows removing the fsat.
inot: seq(x, 0) can be better than fsub(1, x). On a2xx, it works better
with the scalar instruction set.
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Replace calls to create hash tables and sets that use
_mesa_hash_pointer/_mesa_key_pointer_equal with the helpers
_mesa_pointer_hash_table_create() and _mesa_pointer_set_create().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Eric Engestrom <eric@engestrom.ch>
Replace calls to create hash tables and sets that use
_mesa_hash_pointer/_mesa_key_pointer_equal with the helpers
_mesa_pointer_hash_table_create() and _mesa_pointer_set_create().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Eric Engestrom <eric@engestrom.ch>
These combinations are common enough and deserve a shortcut.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Eric Engestrom <eric@engestrom.ch>
This function is modeled after the aux_op functions except that it has a
lot more parameters because it deals with two images as well as source
and destination regions.
Function's out variable could be an array dereferenced by an array:
func(v[w[i]]);
or something more complicated.
Copy index in any case.
Fixes: 76c27e47b9 ("glsl: Copy function out to temp if we don't directly ref a variable")
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The original idea was that the backend compiler could eliminate
surfaces, so we would have it mark which ones are actually used,
then shrink the binding table accordingly. Unfortunately, it's a
pretty blunt mechanism - it can only prune things from the end,
not the middle - since we decide the layout before we even start
the backend compiler, and only limit the size. It also basically
gives up if it sees indirect array access.
Besides, we do the vast majority of our surface elimination in NIR
anyway, not the backend - and I don't see that trend changing any
time soon. Vulkan abandoned this plan a long time ago, and I don't
use it in Iris, but it's still been kicking around in i965.
I hacked shader-db to print the binding table size in bytes, and
observed no changes with this patch. So, this code appears to do
nothing useful.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
DeprecationWarning: the imp module is deprecated in favour of importlib
Instead of complicated logic, just import the file directly.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Instead of emitting all of the conditions for the cases of a switch
statement up-front, emit them on-the-fly as we emit the code for each
case. The original justification for this was that we were going to
have to build a default case anyway which would need them all. However,
we can just trust CSE to clean up the mess in that case. Emitting each
condition right before the if statement that uses it reduces register
pressure and, in one customer benchmark, reduces spilling and improves
performance by about 2x.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Even though no one's been brave enough to ever use this pass, I like to
keep it functionally working.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instead of applying the workaround universally, detect semi-old GLSLang
via the generator ID and only enable the workaround on old GLSLang.
This isn't nearly as precise as one would like it to be because the
first GLSLang generator id version bump was on October 7, 2017 which is
about 1.5 years after the bug was fixed. However, it at least lets us
disable it for non-GLSLang and for more modern versions.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
A long time in a galaxy far far away, there was a GLSLang bug with how
it handled samplers passed in as function parameters. (The bug can be
found here: https://github.com/KhronosGroup/glslang/issues/179.)
Unfortunately, that version was shipped in several apps and has been
causing heartburn for our SPIR-V parser ever since.
Recent changes to NIR uncovered a moderately old bug in how we work
around this issue. In particular, we ended up with a deref_cast from
uniform to local which is not a no-op cast so nir_opt_deref wasn't
getting rid of the cast. The only reason why it worked before was
because someone just happened to call nir_fixup_deref_modes which
"fixed" the cast (that shouldn't be happening) and then a later round of
copy-prop would get rid of it. The fact that the deref_cast survived
that long without causing trouble for other parts of NIR is a bit
surprising.
Just whacking the mode of the pointer seems to fix it fairly
unobtrusively. Currently, only apps with this bug will have a local
variable containing an image or sampler.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109304
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
If the TCS and TES are linked together, we can simply replace the TES's
gl_PatchVerticesIn system value with a constant, possibly allowing extra
optimization or letting the driver avoid uploading a special value.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
With the new handling of bool types, the conversion to float in glsl_to_nir
should not apply to bool types anymore.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
out_type in the default cast case is always GLSL_TYPE_FLOAT, so we get a
mov otherwise.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
All alu instructions emitted with native_integers=false expect float
(or bool in some cases) constants, so this change is necessary.
This will cause changes with some intrinsics which had integer sources,
such as nir_intrinsic_load_uniform. Apparently it might cause issues with
some opt passes, but perhaps those don't apply in OpenGL ES 2.0 cases?
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The num_components value passed into get_mul_for_src is used to only
compose the parts of the swizzle that we know will be used so we don't
compose invalid swizzle components. However, we had a bug where we
passed the number of components of the add all the way through. For the
given source, we need the number of components read from that source.
In the case where we have a narrow add, say 2 components, that is
sourced from a chain of wider instructions, we may not compose all the
swizzles. All we really need to do is pass through the right number of
components at each level.
Fixes: 2231cf0ba3 "nir: Fix output swizzle in get_mul_for_src"
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
GL_ARB_gl_spirv does not provide a sampler deref for e.g. texelFetch(),
so we can't assume that both are present and identical. Simply lower
each if it is present.
Fixes regressions in GL_ARB_gl_spirv tests since I switched everyone to
using this pass. Thanks to Alejandro Piñeiro for catching these.
Fixes: f003859f97 nir: Make gl_nir_lower_samplers use gl_nir_lower_samplers_as_deref
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Alejandro Piñeiro <apinheiro@igalia.com>
Fixes the following errors:
usage: which [-as] program ...
/Users/travis/.travis/job_stages: line 110: --version: command not found
... caused by the use of an undefined $LLVM_CONFIG
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This will allow drivers to pin shader buffers if necessary.
i965 and anv do not need to do this today, but iris will.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Currently, BLORP expects drivers to provide two functions for dealing
with buffers: blorp_emit_reloc and blorp_surface_reloc. Both record a
relocation and combine the BO address and offset into a full 64-bit
address. Traditionally, blorp_surface_reloc has written that combined
address to an implicitly-known buffer where surface states are stored.
(In contrast, blorp_emit_reloc returns the value.)
The upcoming Iris driver stores surface states in multiple buffers,
which makes it impossible for blorp_surface_reloc to write the combined
address - it only takes an offset, not the actual buffer to write to.
This commit adds a third function, blorp_get_surface_address, which
combines and returns an address, which is then passed to ISL's surface
state fill functions. Softpin-only drivers can return a real address
here and skip writing it in blorp_surface_reloc. Relocation-based
drivers are have options. They can simply return 0 from the new
function, and continue writing the address from blorp_surface_reloc.
Or, they can return a presumed address from blorp_get_surface_address,
and have other relocation processing write the real value later.
For now, i965 and anv simply return 0.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Make sure that the next line starts with spaces so that bullets are
maintained throughout, add `` around a few more special tokens, and fix
SAMPLE_COUNT_TEXTURE -> SAMPLE_COUNT.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
In all GLSL ES versions output variables in fragment shader are allowed
to be invariant.
From Section 4.6.1 ("The Invariant Qualifier") GLSL ES 1.00 spec:
"Only the following variables may be declared as invariant:
...
- Built-in special variables output from the fragment shader."
From Section 4.6.1 ("The Invariant Qualifier") GLSL ES 3.00 spec:
"Only variables output from a shader can be candidates for invariance."
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107842
This adds a second level of caching for the pre-lowered NIR that's only
based off of the shader module, entrypoint and specialization constants.
This is enough for spirv_to_nir as well as our first round of lowering
and optimization. Caching at this level should allow for faster shader
recompiles due to state changes.
The NIR caching does not get serialized to disk via either the
VkPipelineCache serialization mechanism or the transparent on-disk
cache. We could but it's usually not that expensive to fall back to
SPIR-V for the odd cache miss especially if it only happens once for
several misses and it simplifies the cache.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The stuff hashed by anv_pipeline_hash_shader is exactly the inputs to
anv_shader_compile_to_nir so it can be used for NIR caching.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
They have glsl_sampler_dim enum values of 8 and 9 which don't work when
you & them with 0x7. Fortunately, we have plenty of bits.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Thanks to the new NIR load_descriptor intrinsic added by the UBO/SSBO
lowering series, we weren't getting UBO pushing because the UBO range
detection pass couldn't see the constants it needed. This fixes that
problem with a quick round of constant folding. Because we're folding
we no longer need to go out of our way to generate constants when we
lower the vulkan_resource_index intrinsic and we can make it a bit
simpler.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The last round of fixing 3d layer+level layout skipped the tiled case,
since tiled texture support was not in place yet. This finishes the
job.
Signed-off-by: Rob Clark <robdclark@gmail.com>
This is known when the CSO is created, so no need to patch it in later.
Also, it seems like smaller textures where the first level is small
enough to be linear, it seems like we should set linear tile mode.
See: dEQP-GLES3.functional.texture.format.unsized.rgb_unsigned_byte_3d_pot
Signed-off-by: Rob Clark <robdclark@gmail.com>
Previously we'd use format/etc from the primary (z32) buffer for the
stencil (s8), due to confusion about rsc vs psurf. Rework this to drop
extra arg and push down handling of separate stencil case (and make sure
we take the fmt from the right place).
This doesn't completely fix separate-stencil, but at least it avoids the
GPU scribbling over random other cmdstream buffers and causing a bunch
of bogus fails in dEQP.
Signed-off-by: Rob Clark <robdclark@gmail.com>
If nothing else, this will make problems with cmdstream getting blit
over with pixels easier to track down (ie. faults when it first happens
rather than strange failures later from corrupted cmdstream when a
stateobj is later reused).
(NOTE this somewhat depends on the kernel supporting the flag, and the
iommu implementation. But the worst case is just that the cmdstream
ends up writeable as before.)
Signed-off-by: Rob Clark <robdclark@gmail.com>
The check for location aliasing was always asuming output variables
but this validation is also called for input variables.
Fixes: e2abb75b0e ("glsl/linker: validate explicit locations for SSO programs")
Cc: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Patch moves intel_tiled_memcpy[_sse41] libraries to isl, renames some
functions and types and makes the required build system changes for
meson, automake and Android. No functional changes are introduced.
v2: code cleanups, move isl_get_memcpy_type to i965 (Jason)
v3: move isl_mem_copy_fn to priv header, cleanups (Jason, Dylan)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Now that we have software implementations of ARB_gpu_shader_int64 and
ARB_gpu_shader_fp64 we can unconditionally enable these extensions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Shaders containing software implementations of double-precision
operations can be very large such that we cannot stack-allocate
an array of grf_count*16.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Shaders containing software implementations of double-precision
operations can be very large such that we have more the 2^16 virtual
registers during optimization.
Move the 'nr' field to the union containing the immediate storage and
expand it to 32-bits.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The next patch replaces an unsigned bitfield with a plain unsigned,
which triggers gcc to begin warning on signed/unsigned comparisons.
Keeping this patch separate from the actual move allows bisectablity and
generates no additional warnings temporarily.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
A follow on commit will move nr to the same union as the immediate
data, so we should assert these invariants before we overwrite the nr
field.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
A follow on patch will move the 'nr' field to the union containing the
immediate field, so prepare by checking that we're only testing these
assertions if the .file is correct.
The assertions with != ARF were kind of silly to begin with because the
<128 check is specifically only for things in the GRF.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NIR metadata validation verifies that the debug bit was unset (by a call
to nir_metadata_preserve) if a NIR optimization pass made progress on
the shader. With the expectation that the NIR shader consists of only a
single main function, it has been safe to call nir_metadata_preserve()
iff progress was made.
However, most optimization passes calculate progress per-function and
then return the union of those calculations. In the case that an
optimization pass makes progress only on a subset of the functions in
the shader metadata validation will detect the debug bit is still set on
any unchanged functions resulting in a failed assertion.
This patch offers a quick solution (short of a larger scale refactoring
which I do not wish to undertake as part of this series) that simply
unsets the debug bit on unchanged functions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Will be used to communicate that a shader uses 64-bit operations to the
concerned lowering passes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We're going to have multiple functions, so nir_shader_get_entrypoint()
needs to do something a little smarter.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Previously it assumed that only a single function (the entrypoint)
existed and attempted to lower constant initializers of shader outputs
for each function, for instance.
v2: use mix and findMSB to optimise.
v3: [Sagar] Fix zFrac0 == 0u case in __normalizeRoundAndPackFloat64
Signed-off-by: Elie Tournier <elie.tournier@collabora.com>
The following patches will add implementations of various
double-precision operations to this file.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Will be used to convert the .glsl source file containing software fp64
routines to a .h file that can be included while building the compiler.
This commit contains two squashed together: the first from Ian adding
the utility (with the existing title), and the second from Dylan making
the code both python2 and python3 compatible.
This is somewhat modeled after the xxd utility that comes with Vim.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
xxd.py: Make python2 and 3 compatible
This makes use of unicode_literals, so that undecorated strings are
considered text (python2 unicode, python3 str) and not bytes in python2
and text in python3. It makes use of io.open, which provides python2
with python3's open behavior (it's an alias in python3), in particular
support for the 't' and 'b' option. Finally, it decorates all of the
string literals with the 'b' prefix, so that python interprets them as
bytes.
I've removed the stdin and stdout options, as python2 always requires
these to be bytes, but python3 always treats them as text (there is a
way to get at the underlying bytes buffer, but that's even more
complexity), and makes the input files required arguments.
In the meson we use the '@INPUT@' shorthand instead of listing each
input, as meson will expand that to [prog_python, '@INPUT0@', @INPUT1@,
..., @OUTPUT@, ...]
Otherwise we can end up with IR that looks like this:
(
(declare (temporary ) vec4 f@8)
(assign (xyzw) (var_ref f@8) (var_ref f) )
(call f16 ((swiz y (var_ref f@8) )))
(assign (xyzw) (var_ref f) (var_ref f@8) )
))
When we really need:
(declare (temporary ) float inout_tmp)
(assign (x) (var_ref inout_tmp) (swiz y (var_ref f) ))
(call f16 ((var_ref inout_tmp) ))
(assign (y) (var_ref f) (swiz y (swiz xxxx (var_ref inout_tmp) )))
(declare (temporary ) void void_var)
The GLSL IR function inlining code seemed to produce correct code
even without this but we need the correct IR for GLSL IR -> NIR to
be able to understand whats going on.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Based on a patch from Tim Arceri, but I had to substantially rewrite it
as a result of the NIR derefs rework.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These are broken on a future platform, but it turns out we don't need
to fix them, since they're just type-converting moves with strided
source. Kill them.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This legalization pass is meant to handle situations where the source
or destination regioning controls of an instruction are unsupported by
the hardware and need to be lowered away into separate instructions.
This should be more reliable and future-proof than the current
approach of handling CHV/BXT restrictions manually all over the
visitor. The same mechanism is leveraged to lower unsupported type
conversions easily, which obsoletes the lower_conversions pass.
v2: Give conditional modifiers the same treatment as predicates for
SEL instructions in lower_dst_modifiers() (Iago). Special-case a
couple of other instructions with inconsistent conditional mod
semantics in lower_dst_modifiers() (Curro).
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Currently the visitor attempts to enforce the regioning restrictions
that apply to double-precision instructions on CHV/BXT at NIR-to-i965
translation time. It is possible though for the copy propagation pass
to violate this restriction if a strided move is propagated into one
of the affected instructions. I've only reproduced this issue on a
future platform but it could affect CHV/BXT too under the right
conditions.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
I triggered this bug while prototyping code for a future platform on
IVB. Could be a problem today though if a strided move is
copy-propagated into a type-converting move with DF destination.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This seems to be a problem in combination with the lower_regioning
pass introduced by a future commit, which can modify a SIMD-split
instruction causing its execution size to become illegal again. A
subsequent call to lower_simd_width() would hit this bug on a future
platform.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Align16 is no longer a thing, so a new implementation is provided
using Align1 instead. Not all possible swizzles can be represented as
a single Align1 region, but some fast paths are provided for
frequently used swizzles that can be represented efficiently in Align1
mode.
Fixes ~90 subgroup quad swap Vulkan CTS tests.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
lower_integer_multiplication() implements 32x32-bit multiplication on
some platforms by bit-casting one of the 32-bit sources into two
16-bit unsigned integer portions. This can give incorrect results if
the original instruction specified a source modifier. Fix it by
emitting an additional MOV instruction implementing the source
modifiers where necessary.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Looks like it is impossible that 'last' variable is a null
because at least the get_vs_prog_data shouldn't return a null pointer.
So this check is unnecessary starts from commit:
99d497c5b6 "anv/pipeline: Replace get_fs_input_map with ..."
This small issue is found by cppcheck.
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This ensures that during encoding, applications can get
the correct status of the surface before submitting
more operations on the same.
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Indrajit Das <indrajit-kumar.das@amd.com>
This reverts commit f6a6da8131.
With this commit we see massive amounts of asserts triggering
in lp_fence_wait(), assert(f->issued), for instance with libgl_xlib
state tracker and piglit. Not entirely sure if the assert could
just be removed.
We have found some pipe_surface leaks internally.
This is the same code as surface_destroy in radeonsi.
Ideally, surface_destroy would be in pipe_screen.
Cc: 18.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
With Mesa 18.1, commit be973ed21f, si_llvm_load_input_vs()
changed the number of source 32-bit wide dword components
used for fetching vertex attributes into the vertex shader
from a constant 4 to a variable num_channels number, depending
on input data format, with some special case handling for
input data formats like 64-Bit doubles.
In the case of a GL_DOUBLE input data format with one
or two components though, e.g, submitted via ...
a) glTexCoordPointer(1, GL_DOUBLE, 0, buffer);
b) glTexCoordPointer(2, GL_DOUBLE, 0, buffer);
... the input format would be SI_FIX_FETCH_RG_64_FLOAT,
but no special case handling was implemented for that
case, so in the default path the number of 32-bit
dwords would be set to the number of float input components
derived from info->input_usage_mask. This ends with corrupted
input to the vertex shader, because fetching a 64-bit double
from the vbo requires fetching two 32-bit dwords instead of 1,
and fetching a two double input requires 4 dword fetches
instead of 2, so in these cases the vertex shader receives
incomplete/truncated input data:
a) float v = gl_MultiTexCoord0.x; -> v.x is corrupted.
b) vec2 v = gl_MultiTexCoord0.xy; -> v.x is assigned
correctly, but v.y is corrupted.
This happens with the standard TGSI IR compiled shaders.
Under NIR with R600_DEBUG=nir, we got correct behavior
because the current radeonsi nir code always assigns
info->input_usage_mask = TGSI_WRITEMASK_XYZW, thereby
always fetches 4 dwords regardless of what the shader
actually needs.
Fix this by properly assigning 2 or 4 dword fetches for
one or two component GL_DOUBLE input.
Fixes: be973ed21f ("radeonsi: load the right number of
components for VS inputs and TBOs")
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Cc: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Fixes artifacts in World of Warcraft when Multi-sample Alpha-Test is
enabled with DXVK.
It also fixes artifacts with Fallout 4's god rays with DXVK.
Various piglit interpolateAt*() tests under NIR are also fixed.
v2: formatting fix
update commit message to include Fallout 4 and the Fixes tag
Fixes: f4e499ec79 ('radv: add initial non-conformant radv vulkan driver')
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106595
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
The Vulkan spec 1.1.97 says:
"variablePointers specifies whether the implementation supports
the SPIR-V VariablePointers capability. When this feature is
not enabled, shader modules must not declare the
VariablePointers capability."
As the SPIR-V feature is enabled, we should turn on the
extension feature as well.
All dEQP-VK.spirv_assembly.instruction.compute.variable_pointers.*
pass with the khronos internal repo. Note that a bunch of them
fails with the public repo, but it's expected as they violate the
specification.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
From glibc printf(3):
Z A nonstandard synonym for z that predates the appearance of z.
Do not use in new code.
Z may not exist on non-glibc systems. Prefer the standard symbol.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
If there is no last fence, due to no rendering happening yet, just
create a new signaled fence and return it, to match the expectations of
the EGL sync fence API.
Fixes random "Could not create sync fence 0x3003" assertion failures from
Skia on Android, coming from the following code:
https://android.googlesource.com/platform/frameworks/base/+/master/libs/hwui/pipeline/skia/SkiaOpenGLPipeline.cpp#427
Reproducible especially with thread count >= 4.
One could make the driver always keep the reference to the last fence,
but:
- the driver seems to explicitly destroy the fence whenever a rendering
pass completes and changing that would require a significant functional
change to the code. (Specifically, in lp_scene_end_rasterization().)
- it still wouldn't solve the problem of an EGL sync fence being created
and waited on without any rendering happening at all, which is
also likely to happen with Android code pointed to in the commit.
Therefore, the simple approach of always creating a fence is taken,
similarly to other drivers, such as radeonsi.
Tested with piglit llvmpipe suite with no regressions and following
tests fixed:
egl_khr_fence_sync
conformance
eglclientwaitsynckhr_flag_sync_flush
eglclientwaitsynckhr_nonzero_timeout
eglclientwaitsynckhr_zero_timeout
eglcreatesynckhr_default_attributes
eglgetsyncattribkhr_invalid_attrib
eglgetsyncattribkhr_sync_status
v2:
- remove the useless lp_fence_reference() dance (Nicolai),
- explain why creating the dummy fence is the right approach.
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
The binding is checked against the limits later in the function, so we
need to make sure we don't overflow before the check here.
Fixes this valgrind warning (and sometimes segfault):
==1460== Invalid write of size 4
==1460== at 0x74C98DD: ast_declarator_list::hir(exec_list*, _mesa_glsl_parse_state*) (ast_to_hir.cpp:4943)
==1460== by 0x74C054F: _mesa_ast_to_hir(exec_list*, _mesa_glsl_parse_state*) (ast_to_hir.cpp:159)
==1460== by 0x7435C12: _mesa_glsl_compile_shader (glsl_parser_extras.cpp:2130)
in
dEQP-GLES31.functional.debug.negative_coverage.get_error.compute.
exceed_atomic_counters_limit
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
I think this was copy-and-paste mistake -- nir_opt_large_constants was
passing in glsl_get_natural_size_align_bytes() given brw_nir.c's arguments
to the opt pass.
I wanted to reuse this function for handling constant offsets of arrays of
images in V3D.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
V3D returns the texels in a different order in the resulting vec4 from
what GLSL wants, so we need to put in a swizzle. Fixes
dEQP-GLES31.functional.texture.gather.basic.2d.rgba8.base_level.level_1
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This way they can be shared. Build tested with meson, but not too sure
on the autotools stuff though.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Initialize the variable with NULL. Fixes the following
In file included from ../src/compiler/nir/nir_lower_io.c:34:
../src/compiler/nir/nir_lower_io.c: In function ‘nir_lower_explicit_io’:
../src/compiler/nir/nir.h:668:11: warning: ‘addr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
return src;
^~~
../src/compiler/nir/nir_lower_io.c:735:17: note: ‘addr’ was declared here
nir_ssa_def *addr;
^~~~
v2: Avoid using a 'default' case so we get help from the compiler when
new deref types are added. (Lionel)
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
"pad" was missing in Mesa's msm_drm.h. sizeof(drm_msm_gem_info)
remains the same, but now the compiler initializes the field to
zero.
Buffer allocation results in EINVAL without this for me.
Cc: Rob Clark <robdclark@gmail.com>
Cc: Kristian Høgsberg <hoegsberg@gmail.com>
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@gmail.com>
the naming is a bit confusing no matter how you look at it. Within SPIR-V
"global" memory is memory accessible from all threads. glsl "global" memory
normally refers to shader thread private memory declared at global scope. As
we already use "shared" for memory shared across all thrads of a work group
the solution where everybody could be happy with is to rename "global" to
"private" and use "global" later for memory usually stored within system
accessible memory (be it VRAM or system RAM if keeping SVM in mind).
glsl "local" memory is memory only accessible within a function, while SPIR-V
"local" memory is memory accessible within the same workgroup.
v2: rename local to function as well
v3: rename vtn_variable_mode_local as well
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This has never functioned and probably wont ever function, due to the
way gallium media state trackers are architected and the tegra video
decoder is architected.
Cc: Thierry Reding <thierry.reding@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Fixes: 1755f608f5
("tegra: Initial support")
The version exported by LLVM in its CMake configuration files can
include the “svn” suffix when building a development version (for
example “8.0.0svn”). However the exported clang headers are still found
under “lib/clang/8.0.0/”, without the “svn” suffix.
Meson takes care of removing the “svn” suffix from the version when
using the dependency’s `version()` method.
This processing is already performed in “configure.ac” when using
autotools.
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
In the following scenario :
1. Create image format R8G8B8A8_UNORM
2. Create image view format R8G8B8A8_SRGB
3. Clear the view through a sub pass to a particular color
4. Barrier on the image to from color attachment to source transfer
5. Copy the image into a linear buffer to check the content
The step 4 resolving the clear color is unaware of the SRGB format of
the view, because the blorp resolve operations operate on images the
color associated with the resolve will not operate on SRGB format but
UNORM. Leading to the wrong color being written into surfaces.
This change forces a clear color resolve at the end of the render pass
so following resolves won't have to deal with the clear color with a
format that doesn't match the image's format.
On gfxbench vulkan_5_normal 1280x720, this appear to cost us ~0.5fps,
from 49.316 down to 48.949.
v2: Only fast clear resolve when image & view have different formats
(Lionel)
v3: Update warning (Jason)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108911
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
Fixes following errors from valgrind output:
==23388== Conditional jump or move depends on uninitialised value(s)
==23388== at 0x48B4924: loader_dri3_drawable_init (loader_dri3_helper.c:381)
==23388== by 0x48A97D2: dri3_create_drawable (dri3_glx.c:386)
==23388== by 0x489E190: driFetchDrawable (dri_common.c:369)
==23388== by 0x48A9187: dri3_bind_context (dri3_glx.c:195)
==23388== by 0x488B75C: MakeContextCurrent (glxcurrent.c:220)
==23388== by 0x488B8DB: glXMakeCurrent (glxcurrent.c:267)
==23388== by 0x10A987: ??? (in /usr/bin/glxgears)
==23388== by 0x4BEB412: (below main) (in /usr/lib64/libc-2.28.so)
==23388==
==23388== Conditional jump or move depends on uninitialised value(s)
==23388== at 0x48B5A40: loader_dri3_swap_buffers_msc (loader_dri3_helper.c:923)
==23388== by 0x48A9B7E: dri3_swap_buffers (dri3_glx.c:587)
==23388== by 0x4887A81: glXSwapBuffers (glxcmds.c:857)
==23388== by 0x10ADED: ??? (in /usr/bin/glxgears)
==23388== by 0x4BEB412: (below main) (in /usr/lib64/libc-2.28.so)
Fixes: 2e12fe425f "loader/dri3: Enable adaptive_sync via _VARIABLE_REFRESH property"
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
This stores the raster state and calls the correct primconvert interface
using the currently bound raster state.
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
For now, it's hidden behind a cap. Hopefully, we can eventually drop
that along with all the manual offset code in spirv_to_nir.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The choice of whether or not we should use block_load/store isn't a
choice between external and not so much as a choice between deref
instructions and manually calculated offsets. In vtn_pointer_from_ssa,
we guard the index+offset case behind vtn_pointer_uses_ssa_offset and
then branch out from there.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Instead of baking in uvec2 for UBO and SSBO pointers and uint for push
constant and shared memory pointers, make it configurable.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Previously, we hard-coded the rule about workgroup variables and the
builder lower_workgroup_access_to_offsets flag. Instead base it on the
handy helper we have for exactly this sort of thing.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Variable pointers being well-defined across the block boundary requires
a couple of very specific SPIR-V validation rules. Normally, we'd trust
the validator to catch these but since CTS tests have been found in the
wild which violate them, we'll carry our own checks.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This new pass is for lowering explicitly laid out memory coming in from
SPIR-V or a similar source. It's quite a bit more complicated than the
normal lower_io because we have to be able to handle matrices. The
way the stride information is stored for matrices is awkward and dealing
with row-major matrices is especially painful.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This commit adds a new num_components value for intrinsic sources of -1
which means that it consumes everything and the number of components
effectively isn't validated. This is useful for deref sources which
just take the result of the deref and we leave it up to the driver to
decide what that size should be.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We added this assert when first moving derefs over to instructions to
ensure that deref chains could go all the way back to the variables.
Now that we're going to start using derefs for things that we can do
variable pointers on such as UBOs and SSBOs, we need to be able to run
derefs through phi nodes, selects, and basically anything else.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We already detect any incomplete deref chains (where the deref is used
for something other than another deref or a load/store) and flag the
variable as used thanks to deref_used_for_not_store. All that's left to
do is to properly skip casts when cleaning up.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This pass is used when, for instance, we lazily change the mode of
variables rather than replacing the variable with a new one. Since we
only do this in cases where we know we have full deref chains, it's ok
to just skip them in fixup_deref_modes.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The code which constructs deref paths already gives you the path
starting at the nearest deref_cast or deref_var. All we need to do for
casts is handle the case where the start of the path isn't a deref_var.
For ptr_as_array derefs, we just bail if we have any after the
divergence point between the two derefs. We may be able to do better in
the future but this works for now.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
When handling casts, we can't blindly propagate the parent of a cast
into a ptr_as_array deref because doing so might loose the stride
information from the cast. Instead, before we can propagate into
ptr_as_array derefs, we need to check that the cast is a cast of an
array deref and that the stride matches. For other types of derefs, we
can continue to propagate casts as normal because they don't need the
stride. We also add an optimization which can combine a ptr_as_array
deref with it parent if it is also an array deref of some form.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
These correspond directly to SPIR-V's OpPtrAccessChain. As such, they
treat whatever their parent gives them as if it's the first element in
some array and dereferences that array. If the parent is, itself, an
array deref, then the two indices can just be added together to get the
final array deref. However, it can also be used in cases where what you
have is a dereference to some random vec2 value somewhere. In this
case, we require a cast before the ptr_as_array and use the ptr_stride
field in the cast to provide a stride for the ptr_as_array derefs.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We're going to want to do more deref optimizations going forward and
this gives us a central place to do them. Also, cast propagation will
get a bit more complicated with the addition of ptr_as_array derefs.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Instead of just storing the decorations in the vtn_type, propagate them
all the way through to the glsl_type. For array strides, this means we
need to handle them earlier so we break array stride handling into it's
own function and explicitly call it for both pointer and array types.
Due to type deduplication in the SPIR-V, we may have explicit layout
decorations on all sorts of types that don't actually want them. In
order to prevent these leaking into unfortunate places in NIR, we
explicitly strip them off before creating NIR variables and when casting
pointers to non-external memory.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
SPIR-V allows for matrix and array types to be decorated with explicit
byte stride decorations and matrix types to be decorated row- or
column-major. This commit adds support to glsl_type to encode this
information. Because this doesn't work nicely with std430 and std140
alignments, we add asserts to ensure that we don't use any of the std430
or std140 layout functions with explicitly laid out types.
In SPIR-V, the layout information for matrices is applied to the parent
struct member instead of to the matrix type itself. However, this is
gets rather clumsy when you're walking derefs trying to compute offsets
because, the moment you hit a matrix, you have to crawl back the deref
chain and find the struct. Instead, we take the same path here as we've
taken in spirv_to_nir and put the decorations on the matrix type itself.
This also subtly adds support for strided vector types. These don't
come up in SPIR-V directly but you can get one as the result of taking a
column from a row-major matrix or a row from a column-major matrix.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
This is C++ so we can just poke at the fields of glsl_type if we wish
and calling get_instance is way easier and more reliable than handling
each instance separately. While we're at it, we re-arrange the base
type labels to match the enum order and add 8-bit type support.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
It was added in bce6f99875 even though it's completely redundant with
glsl_array_type().
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Previously, NIR had a single nir_var_uniform mode used for atomic
counters, UBOs, samplers, images, and normal uniforms. This commit
splits this into nir_var_uniform and nir_var_ubo where nir_var_uniform
is still a bit of a catch-all but the nir_var_ubo is specific to UBOs.
While we're at it, we also rename shader_storage to ssbo to follow the
convention.
We need this so that we can distinguish between normal uniforms and UBO
access at the deref level without going all the way back variable and
seeing if it has an interface type.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
I have no idea how shader_storage made it into the list of banned
variable modes for stores but it clearly should be allowed. This only
doesn't cause us a problem today because we never actually use derefs on
shader_storage variables.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This doesn't currently change anything because array indices are
required to be 32 bits and all derefs are also 32 bits. However, we
will one day have 64-bit derefs for OpenCL.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We already had code in link_as_ssa to handle bit sizes; we just need to
use it. While we're at it we clean up link_as_ssa a bit and add an
explicit bit_size parameter in preparation for a day when we have derefs
that aren't 32 bit.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This simplifies our deref handling by emitting the actual NIR deref
instructions on-the-fly instead of of building up a deref chain and then
emitting them at the last moment. In order for this to work with the
parts of the compiler that assume they can chase deref chains, we have
to run nir_rematerialize_derefs_in_use_blocks_impl to put the derefs
back in the right places. Otherwise, in cases such as loop continues
where the SPIR-V blocks are not in the same order as the NIR blocks, we
may end up with a deref chain with a parent that does not dominate it's
child and nir_repair_ssa_impl will insert phis in the deref chain.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The loop through instructions doesn't set the cursor for us so unless we
set it somewhere, we may end up emitting instructions in the wrong
place. The only reason why we haven't been bitten by this in the past
is that it only happens in a few variable pointers cases and the CTS
tests for those don't use much control flow so things were getting
emitted in the correct order by accident.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Note that these limits are exact, not a "precision is at least x",
as texel coords also get snapped to a multiple of this step size
before filtering.
This fixes CTS tests
dEQP-VK.texture.explicit_lod.2d.sizes.31x55_nearest_linear_mipmap_nearest_repeat
dEQP-VK.texture.explicit_lod.2d.sizes.57x35_nearest_linear_mipmap_nearest_repeat
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109151
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
These days, we have two sampler lowering passes. The newer one,
gl_nir_lower_samplers_as_deref, is used by radeonsi. It rewrites
variables to drop structures out of sampler deref chains, to make
life simpler. It then sets var->data.binding for non-bindless
sampler and image variables based on the GL uniform storage's
opaque index values.
The older one converts sampler deref chains (nir_tex_src_texture_deref)
to a numerical offset (nir_tex_src_texture_offset). It also stores the
constant-valued portion of that number in tex->texture_index, making
life really simple for drivers that don't support indirects. It too
pokes at GL uniform storage's opaque index values.
Logically, we can do the first pass (simplify derefs, set bindings)
then the second (turn derefs to offsets, set texture_index). This
patch does exactly that, eliminating some redundancy (only one pass
has to poke at GL uniform storage), and gaining proper var->data.binding
values for drivers using the full lowering.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
We recurse to remove structures, and at each step, re-modify the
resulting type for our link in the deref chain. For arrays, the
result of recursion is the new underlying type - so we wrap it with
the array dimensionality again. For structs, we want to simply use
the new underlying type, skipping the struct altogether.
The correct way to do this is to do nothing at all. Previously, we
had reset type to next->type, which is the /old/ field type, not the
new field type we obtained by recursing. This undid our recursive work.
Fixes about 338 tests with nested structs, such as:
dEQP-GLES2.functional.uniform_api.value.initial.get_uniform.nested_structs_arrays.sampler2D_samplerCube_fragment
Note that currently only radeonsi uses this pass, and NIR support is
disabled there by default, so the breakage was likely not seen by most
people. The next commit uses this pass for more drivers, so this fix
prevents regressions from that change.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
unused and gcc complains about strncpy. (from what I can see because
strncpy does not leave a 0 byte on truncate. That said we don't use
it so this does not fix a real bug).
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
We do the ImageFormatProperties check already, and rejecting an usage
flag when both ImageFormatProperties and the WSI (which is Android)
support it is not allowed.
Intel does support storage for some of the support WSI formats, such
as R8G8B8A8_UNORM, and looking at the ISL_SURF_USAGE_DISABLE_AUX_BIT,
the imported images do not have any form of compression that would
prevent this fix.
v2: Also consider STORAGE bit for Gralloc usage bits.
(From Kevin Strasser <kevin.strasser@intel.com>)
Fixes: 053d4c328f "anv: Implement VK_ANDROID_native_buffer (v9)"
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
We started using it in the btoi paths for r32g32b32, and the LLVM IR
checker will complain about it because we end up with intrinsics with
the wrong type extension in the name.
Fixes: 593996bc02 ("radv: implement buffer to image operations for R32G32B32")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Some of the status variables in the compiler are only used in asserts
and thus may be unused in release builds. Annotate them accordingly
to avoid 'unused but set' warnings from the compiler.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Take into account the render target format when checking if the color
mask affects all channels of the RT. This allows to enable full
overwrite in a few cases where a non-alpha format is used.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
The functional change here is moving the nir_lower_io_to_scalar_early()
calls inside st_nir_link_shaders() and moving the st_nir_opts() call
after the call to nir_lower_io_arrays_to_elements().
This fixes a bug with the following piglit test due to the current code
not cleaning up dead code after we lower arrays. This was causing an
assert in the new duplicate varyings link time opt introduced in
70be9afccb.
tests/spec/glsl-1.10/execution/vsfs-unused-array-member.shader_test
Moving the nir_lower_io_to_scalar_early() calls also allows us to tidy
up the code a little and merge some loops.
Reviewed-by: Eric Anholt <eric@anholt.net>
Even without any clever optimization on the unpack operations, this gives
us a useful value for the channels read field, which we can use to avoid
ldtmu instructions to the no-op register.
instructions in affected programs: 890712 -> 881974 (-0.98%)
I've been doing this in the nir-to-vir and nir-to-qir backends of v3d and
vc4, but nir could potentially do some useful stuff for us (like avoiding
unpack/repacks) if we give it the information.
v2: Skip lowering for txs/query_levels
v3: Fix a crash on old-style shadow
v4: Rename to tex_packing, use nir_format_unpack_sint/uint helpers, pack
the enum.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
In 92eb5bbc68 we attempted to avoid copying clear colors whenever
we weren't doing a resolve. However, this broke MSAA resolves because
we need the clear color in the source. This patch makes blorp much more
conservative such that it only avoids the clear color copy if either
aux_usage == NONE or it's explicitly doing a fast-clear.
Fixes: 92eb5bbc68 "intel/blorp: Only copy clear color when doing..."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107728
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
We can pull a whole vector in a single indirect load. This saves a bunch
of round-trips to the TMU, instructions for setting up multiple loads,
references to the UBO base in the uniforms, and apparently manages to
reduce register pressure as well.
instructions in affected programs: 3086665 -> 2454967 (-20.47%)
uniforms in affected programs: 919581 -> 721039 (-21.59%)
threads in affected programs: 1710 -> 3420 (100.00%)
spills in affected programs: 596 -> 522 (-12.42%)
fills in affected programs: 680 -> 562 (-17.35%)
Improves 3dmmes performance by 2.29312% +/- 0.139825% (n=5)
In the process of adding support for SSBOs and CS shared vars, I ended up
needing a helper function for doing TMU general ops. This helper can be
that starting point, and saves us a bunch of round-trips to the TMU by
loading a vector at a time.
I noticed that a VS I was debugging was missing all of its output stores
-- outputs_written was for POS, VAR0, VAR3, while the shader's variables
were POS, VAR9, and VAR12. I'm not sure what outputs_written is supposed
to be doing here, but we can just walk the declared variables and avoid
both this bug and the emission of extra stvpms for less-than-vec4
varyings.
When copy_prop_vars also took care of dead write handling, intrin was
used as part of store_to_entry. Now it isn't, so this assignment
isn't used really used. Add a comment clarifying what happens to
intrin.
Fixes: 4dfa7adc10 "nir: Remove handling of dead writes from copy_prop_vars"
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Even with the previous commit, hangs are still happening. The problem
there is that the VF cache invalidate do happen immediately without
waiting for previous rendering to complete. What happens is that we
invalidate the cache the moment the PIPE_CONTROL is parsed but we
still have old rendering in the pipe which continues to pull data into
the cache with the old high address bits. The later rendering with the
new high address bits then doesn't have the clean cache that it
expects/needs.
v2: Update commit message/explanation with Jason's
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: a363bb2cd0 ("i965: Allocate VMA in userspace for full-PPGTT systems.")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109072
Hopefully we can kick start the revolution and other distros will start
providing them as well :)
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
No functional change as the socket name is the same,
just removing the double definition of the path.
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Signed-off-by: Jakob Bornecrantz <jakob@collabora.com>
A 2d-array texture (for example), should get the # of array elements
from box->depth, rather than depth0 which is minified.
Fixes dEQP-GLES3.functional.shaders.texture_functions.texture.sampler2darray_bias_float_fragment
with tiled textures.
Reported-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
If we hit the memcpy() path for copy_region(), that will try to do a
transfer_map(), which goes badly for blits to/from staging triggered
by transfer_map() or transfer_unmap().
We could possibly add fd_blit2() which has allow_transfer_map param,
and call that for staging blits. But I'm not really sure if trying
the blit via copy_region() is very useful. At least for newer gens
that implement fd_context::blit(), it probably isn't.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Switch over to using fd_context::blit(), in the same way that a5xx does.
The previous patch wires fd_resource_copy_region() up to the blitter so
a6xx no longer needs to bypass the core layer to accelerate this.
Signed-off-by: Rob Clark <robdclark@gmail.com>
First step to unify the way fd5 and fd6 blitter works. Currently a6xx
bypasses the blit API in order to also accelerate resource_copy_region()
But this approach can lead to infinite recursion:
#0 fd_alloc_staging (ctx=0x5555936480, rsc=0x7fac485f90, level=0, box=0x7fbab29220) at ../src/gallium/drivers/freedreno/freedreno_resource.c:291
#1 0x0000007fbdebed04 in fd_resource_transfer_map (pctx=0x5555936480, prsc=0x7fac485f90, level=0, usage=258, box=0x7fbab29220, pptrans=0x7fbab29240) at ../src/gallium/drivers/freedreno/freedreno_resource.c:479
#2 0x0000007fbe5c5068 in u_transfer_helper_transfer_map (pctx=0x5555936480, prsc=0x7fac485f90, level=0, usage=258, box=0x7fbab29220, pptrans=0x7fbab29240) at ../src/gallium/auxiliary/util/u_transfer_helper.c:243
#3 0x0000007fbde2dcb8 in util_resource_copy_region (pipe=0x5555936480, dst=0x7fac485f90, dst_level=0, dst_x=0, dst_y=0, dst_z=0, src=0x7fac47c780, src_level=0, src_box_in=0x7fbab2945c) at ../src/gallium/auxiliary/util/u_surface.c:350
#4 0x0000007fbdf2282c in fd_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47c780, src_level=0, src_box=0x7fbab2945c) at ../src/gallium/drivers/freedreno/freedreno_blitter.c:173
#5 0x0000007fbdf085d4 in fd6_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47c780, src_level=0, src_box=0x7fbab2945c) at ../src/gallium/drivers/freedreno/a6xx/fd6_blitter.c:587
#6 0x0000007fbde2f3d0 in util_try_blit_via_copy_region (ctx=0x5555936480, blit=0x7fbab29430) at ../src/gallium/auxiliary/util/u_surface.c:864
#7 0x0000007fbdec02c4 in fd_blit (pctx=0x5555936480, blit_info=0x7fbab29588) at ../src/gallium/drivers/freedreno/freedreno_resource.c:993
#8 0x0000007fbdf08408 in fd6_blit (pctx=0x5555936480, info=0x7fbab29588) at ../src/gallium/drivers/freedreno/a6xx/fd6_blitter.c:546
#9 0x0000007fbdebdc74 in do_blit (ctx=0x5555936480, blit=0x7fbab29588, fallback=false) at ../src/gallium/drivers/freedreno/freedreno_resource.c:129
#10 0x0000007fbdebe58c in fd_blit_from_staging (ctx=0x5555936480, trans=0x7fac47b7e8) at ../src/gallium/drivers/freedreno/freedreno_resource.c:326
#11 0x0000007fbdebea38 in fd_resource_transfer_unmap (pctx=0x5555936480, ptrans=0x7fac47b7e8) at ../src/gallium/drivers/freedreno/freedreno_resource.c:416
#12 0x0000007fbe5c5c68 in u_transfer_helper_transfer_unmap (pctx=0x5555936480, ptrans=0x7fac47b7e8) at ../src/gallium/auxiliary/util/u_transfer_helper.c:516
#13 0x0000007fbde2de24 in util_resource_copy_region (pipe=0x5555936480, dst=0x7fac485f90, dst_level=0, dst_x=0, dst_y=0, dst_z=0, src=0x7fac47b8e0, src_level=0, src_box_in=0x7fbab2997c) at ../src/gallium/auxiliary/util/u_surface.c:376
#14 0x0000007fbdf2282c in fd_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47b8e0, src_level=0, src_box=0x7fbab2997c) at ../src/gallium/drivers/freedreno/freedreno_blitter.c:173
#15 0x0000007fbdf085d4 in fd6_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47b8e0, src_level=0, src_box=0x7fbab2997c) at ../src/gallium/drivers/freedreno/a6xx/fd6_blitter.c:587
...
Instead rework the API to push the fallback back to core code, so that
we can rework resource_copy_region() to have it's own fallback path,
and then finally convert fd6 over to work in the same way.
This also makes ctx->blit() optional, and cleans up some unnecessary
callers.
Signed-off-by: Rob Clark <robdclark@gmail.com>
For multi-pass rendering, it is common to keep the same depth buffer
from previous pass, to discard geometry that would be hidden by later
draws. In the later passes with depth-test enabled, but depth-write
disabled, there is no reason to do gmem2mem resolve.
TODO probably do something similar for stencil.. although stencil
buffer isn't used as commonly these days
Signed-off-by: Rob Clark <robdclark@gmail.com>
After trying multiple times to merge if-statements with phis
between them I've come to the conclusion that it cannot be done
without regressions. The problem is for some shaders we end up
with a whole bunch of phis for the merged ifs resulting in
increased register pressure.
So this patch just merges ifs that have no phis between them.
This seems to be consistent with what LLVM does so for radeonsi
we only see a change (although its a large change) in a single
shader.
Shader-db results i965 (SKL):
total instructions in shared programs: 13098176 -> 13098152 (<.01%)
instructions in affected programs: 1326 -> 1302 (-1.81%)
helped: 4
HURT: 0
total cycles in shared programs: 332032989 -> 332037583 (<.01%)
cycles in affected programs: 60665 -> 65259 (7.57%)
helped: 0
HURT: 4
The cycles estimates reported by shader-db for i965 seem inaccurate
as the only difference in the final code is the removal of the
redundent condition evaluations and jumps.
Also the biggest code reduction (~7%) for radeonsi was in a tomb
raider tressfx shader but for some reason this does not get merged
for i965.
Shader-db results radeonsi (VEGA):
Totals from affected shaders:
SGPRS: 232 -> 232 (0.00 %)
VGPRS: 164 -> 164 (0.00 %)
Spilled SGPRs: 59 -> 59 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 14584 -> 13520 (-7.30 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 13 -> 13 (0.00 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Before, I had per-stage entryoints with some helpers shared between them.
As I extended for compute shaders and shader-db, it turned out that the
other common code in the middle wanted to be shared too.
Loops will be trickier, since we need some analysis to figure out if the
breaks/continues inside are uniform. Until we get that in NIR, this gets
us some quick wins.
total instructions in shared programs: 6192844 -> 6174162 (-0.30%)
instructions in affected programs: 487781 -> 469099 (-3.83%)
I wanted to reuse the comparison stuff for nir_ifs, but for that I just
want the flags and no destination value. Splitting the conditions from
the destinations ended up cleaning the existing code up, anyway.
We'll still fail at draw time, but this avoids a regression in shader-db
execution once I enable TLB writes in precompiles.
Fixes: b38e4d313f ("v3d: Create a state uploader for packing our shaders together.")
Makes debugging easier when we care about the deref chain and not the
deref instruction itself. To make it take a const pointer, constify
some of the static functions in nir_print.c.
Reviewed-by: Eric Anholt <eric@anholt.net>
Nouveau requires rtti. Often LLVM is configured without rtti, and code
with and without cannot be linked safely. Lets just error out if nouveau
is requested and llvm is built without rtti.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109202
Fixes: c5a97d658e
("meson: fix builds against LLVM built without rtti")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The 16-bit polynomial execution doesn't meet Khronos precision requirements.
Also, the half-float denorm range starts at 2^(-14) and with asin taking input
values in the range [0, 1], polynomial approximations can lead to flushing
relatively easy.
An alternative is to use the atan2 formula to compute asin, which is the
reference taken by Khronos to determine precision requirements, but that
ends up generating too many additional instructions when compared to the
polynomial approximation. Specifically, for the Intel case, doing this
adds +41 instructions to the program for each asin/acos call, which looks
like an undesirable trade off.
So for now we take the easy way out and fallback to using the 32-bit
polynomial approximation, which is better (faster) than the 16-bit atan2
implementation and gives us better precision that matches Khronos
requirements.
v2:
- Fallback to 32-bit using recursion (Jason).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2:
- use nir_fadd_imm and nir_fmul_imm helpers (Jason)
v3:
- since we need to define one for fsub use it for fdiv too (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2:
- fix huge_val for 16-bit, it was mean't to be 2^14 not 10^14.
v3:
- rebase on top of new bool sized opcodes
- use nir_b2f helper
- use nir_fmul_imm helper
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2:
- use nir_fadd_imm and nir_fmul_imm helpers (Jason)
- rebased on top of new sized boolean opcodes
- use nir_b2f helper
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
v2:
- use nir_fmul_imm and nir_fadd_imm helpers (Jason)
v3:
- missed one case where we need to replace nir_imm_float
with nir_imm_floatN_t (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This just cleans things up a little and make things more safe for
derefs.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
The following patches will add support for an additional
optimisation so this function will no longer just optimise varying
constants.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This will help the new opt introduced in the following patches
allowing us to remove extra duplicate varyings.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The following patch will use this with the radeonsi NIR backend
but I've added it to ac so we can use it with RADV in future.
This is a NIR implementation of the tgsi function
tgsi_scan_tess_ctrl().
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The previous code used a do while loop and continues after walking
a nested loop/if-statement. This means we end up evaluating the
last instruction from the nested block against the while condition
and potentially exit early if it matches the exit condition of the
outer block.
Fixes: 386d165d8d ("tgsi/scan: add a new pass that analyzes tess factor writes")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This just happened not to crash/assert because all loops have at
least 1 if-statement and due to a second bug we end up matching
the same ENDIF to exit both the iteration over the if-statment
and the loop.
The second bug is fixed in the following patch.
Fixes: 386d165d8d ("tgsi/scan: add a new pass that analyzes tess factor writes")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There's no way to tell the 3D engine about swizzling on such textures.
While rendering to NPOT ones may be possible, there's no great way to
expose that in gallium, nor would there be any practical benefit.
Fixes the non-compressed-format "copyteximage 3D" failures. Something
odd going on with the compressed formats.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This caused random failures for two conditional rendering tests:
dEQP-VK.conditional_rendering.draw_clear.draw.update_with_rendering_discard
dEQP-VK.conditional_rendering.draw_clear.draw.update_with_rendering_no_discard
These wrote the predicate with the vertex shader, did a barrier and then
started the conditional rendering. However the cache flushes for the barrier
only happen on first draw, so after the predicate has been read.
Fixes: e45ba51ea4 "radv: add support for VK_EXT_conditional_rendering"
Reviewed-by: Dave Airlie <airlied@redhat.com>
Without this, I get the following error when building the tests with
autotools on i686:
---8<---
src/intel/common/gen_clflush.h: In function ‘gen_clflush_range’:
src/intel/common/gen_clflush.h:37:7: warning: implicit declaration of function ‘__builtin_ia32_clflush’; did you mean ‘__builtin_ia32_pause’? [-Wimplicit-function-declaration]
__builtin_ia32_clflush(p);
^~~~~~~~~~~~~~~~~~~~~~
__builtin_ia32_pause
src/intel/common/gen_clflush.h: In function ‘gen_flush_range’:
src/intel/common/gen_clflush.h:45:4: warning: implicit declaration of function ‘__builtin_ia32_mfence’; did you mean ‘__builtin_ia32_fnclex’? [-Wimplicit-function-declaration]
__builtin_ia32_mfence();
^~~~~~~~~~~~~~~~~~~~~
__builtin_ia32_fnclex
---8<---
The erros are generated for each of these files:
- mesa/src/intel/vulkan/tests/state_pool_no_free.c
- mesa/src/intel/vulkan/tests/state_pool.c
- mesa/src/intel/vulkan/tests/block_pool_no_free.c
- mesa/src/intel/vulkan/tests/state_pool_free_list_only.c
This is obviously because gen_clflush.h contains code that uses
intrinsics that are only available with SSE3. Since the driver already
uses SSE3, it seems reasonable to add this to the tests as well.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Eric Engeström <eric@engestrom.ch>
Without this, I get the following error when building the tests using
meson on i686:
---8<---
In file included from ../../../mesa/src/intel/vulkan/anv_private.h:46,
from ../../../mesa/src/intel/vulkan/tests/state_pool_no_free.c:26:
../../../mesa/src/intel/common/gen_clflush.h: In function ‘gen_clflush_range’:
../../../mesa/src/intel/common/gen_clflush.h:37:7: error: implicit declaration of function ‘__builtin_ia32_clflush’; did you mean ‘__builtin_ia32_pause’? [-Werror=implicit-function-declaration]
__builtin_ia32_clflush(p);
^~~~~~~~~~~~~~~~~~~~~~
__builtin_ia32_pause
../../../mesa/src/intel/common/gen_clflush.h: In function ‘gen_flush_range’:
../../../mesa/src/intel/common/gen_clflush.h:45:4: error: implicit declaration of function ‘__builtin_ia32_mfence’; did you mean ‘__builtin_ia32_fnclex’? [-Werror=implicit-function-declaration]
__builtin_ia32_mfence();
^~~~~~~~~~~~~~~~~~~~~
__builtin_ia32_fnclex
---8<---
The errors are generated for each of these files:
- mesa/src/intel/vulkan/tests/state_pool_no_free.c
- mesa/src/intel/vulkan/tests/state_pool.c
- mesa/src/intel/vulkan/tests/block_pool_no_free.c
- mesa/src/intel/vulkan/tests/state_pool_free_list_only.c
This is obviously because gen_clflush.h contains code that uses
intrinsics that are only available with SSE3. Since the driver already
uses SSE3, it seems reasonable to add this to the tests as well.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Engeström <eric@engestrom.ch>
s3tc layouts are a bit finicky - they're packed, but not swizzled.
Adjust logic to allow for that case:
- Don't set a uniform pitch for POT-sized compressed textures
- Adjust define_rect API to be less confused about block sizes
- Only mark a texture as linear if it has a uniform pitch set
This has been tested to fix xonotic (as well as the s3tc-* piglits)
on nv3x and keeps it working on nv4x.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This doesn't matter since all compressed formats supported by this
hardware use square blocks, but best to use the correct helper.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This logic mirrors what we do on nv50. The relatively new
texture_subdata callback can cause this to happen with 3D textures,
which is triggered at least by xonotic, and probably many piglits.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
If the last-active context gets deleted, the pushbuf doesn't have a
bufctx to reference. Then there could be a sequence of binds which would
trigger a reset on that bin before validation was done. Instead we just
pass in the bufctx in question directly.
All other instances of PUSH_RESET happen strictly after a validation is
run.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102349
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The whole user_priv thing is a mess, but as long as it's there, it
basically has to map 1:1 to the cur_ctx. Unfortunately we were setting
user_priv to some context, then that context could get deleted without
any draws/validations in it, leading user_priv to become NULL, with
cur_ctx still pointing at some old context. Then we wouldn't run the
switch logic, which in turn led to a NULL bufctx being dereferenced.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102349
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
We can just look at the MSF flags -- if they're unset, then we're
definitely in a helper invocation. Fixes
dEQP-GLES31.functional.shaders.helper_invocation.* with GLES3.1 enabled.
This is what the GLSL ES 310 spec tells us to do, but apparently the
"gather mode" flag doesn't imply it in the HW. Fixes
dEQP-GLES31.functional.texture.gather.basic.2d.rgba8.filter_mode.min_nearest_mipmap_linear_mag_linear
The greedy comparison folding in bcsel means that we may have left the
original bool-generating NIR ALU instruction dead, but DCE wasn't
eliminating the VIR code for it because of the flags updates.
total instructions in shared programs: 5186024 -> 5100894 (-1.64%)
instructions in affected programs: 1448695 -> 1363565 (-5.88%)
This allows the original shader-db project's run.c runner to parse things
easily, and is probably a good thing to have for GL_ARB_debug_output in
general. I formatted it more like Intel's so I can mostly reuse their
report script.
I've been using my apitrace-based shader-db so far, but it's slow
(apitrace decompression), intrusive (apitrace windows spamming the
screen), and doesn't have much coverage. The original shader-db provides
a lot more coverage and compiles faster, at the expense of not having the
actual runtime variant key. As v3d has a lot less runtime variation than
vc4 did, this tradeoff makes more sense.
It's better to let most applications make use of adaptive sync
by default. Problematic applications can be placed on the blacklist
or the user can manually disable the feature.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
The DDX driver can be notified of adaptive sync suitability by
flagging the application's window with the _VARIABLE_REFRESH property.
This property is set on the first swap the application performs
when adaptive_sync is set to true in the drirc.
It's performed here instead of when the loader is initialized for
two reasons:
(1) The window's drawable can be missing during loader init.
This can be observed during the Unigine Superposition benchmark.
(2) Adaptive sync will only be enabled closer to when the application
actually begins rendering.
If adaptive_sync is false then the _VARIABLE_REFRESH property
is deleted on loader init.
The property is only managed on the glx DRI3 backend for now. This
should cover most common applications and games on modern hardware.
Vulkan support can be implemented in a similar manner but would likely
require splitting the function out into a common helper function.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Applications that don't present at a predictable rate (ie. not games)
shouldn't have adapative sync enabled. This list covers some of the
common desktop compositors, web browsers and video players.
[ Michel Dänzer: Added entry for firefox-esr ]
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
This option lets the user decide whether mesa should notify the
window manager / DDX driver that the current application is adaptive
sync capable.
It's off by default.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Some programs start with the path and command line arguments in
argv[0] (program_invocation_name). Chromium is an example of
an application using mesa that does this.
This tries to query the real path for the symbolic link /proc/self/exe
to find the program name instead. It only uses the realpath if it
was a prefix of the invocation to avoid breaking wine programs.
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
We were leaking surfaces because the references taken in
etna_set_framebuffer_state weren't being released on context destroy.
Instead of just directly releasing those references in
etna_context_destroy, use the util_copy_framebuffer_state helper.
Take the chance to remove the duplicated buffer references in
compiled_framebuffer_state to avoid confusion.
The leak can be reproduced with a client that continuously creates and
destroys contexts.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reported-by: Sjoerd Simons <sjoerd.simons@collabora.co.uk>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Older versions of virglrenderer before 33da7361aec486290df0aec4ad8dfa8ff6adde2c
in vtest mode, misrender gears.
Fixes: 9d81cd8e7c (virgl: Pass resource size and transfer offsets)
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
I know it's not what anyone wants, but how about we start with a
message in the documentation that encourages people to try meson.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Engeström <eric@engestrom.ch>
Note that meson requires python 3, scons requires python 2, and
autotools works with either.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Engeström <eric@engestrom.ch>
ATOMFADD is a little special -- make drivers have to specify it
explicitly.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is supported by at least NVIDIA hardware, and exposeable via GL
extensions.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We need a 64-bit value, otherwise we only handle the low 32, and happen to
sign-extend to claim to write all varying slots if VARYING_SLOT_VAR2 was
used.
Fixes: 4d0b2c7aaa ("ttn: Update shader->info as we generate code.")
Reviewed-by: Rob Clark <robdclark@gmail.com>
We've made the choice not to use fast clears on layer > 0 with
multilayer images. This is partly because we would need to store
multiple clear colors for each layer, making the existing memory
layout, already including aux surfaces, fast clear color, image state,
etc... even more complex.
Partial resolves are the operations transfering the clear colors into
the auxiliary buffers. This operation is currently implemented in
Blorp by loading the clear color from the image's BO, into a shader
that then samples from the auxiliary buffer and writes the color only
if it isn't there already.
The problem here is that because we store only one clear color for all
layers and it is used for partial resolves. If you trigger a partial
clear on a layer > 0, then you're likely to deal with a color that is
not what you actually want. In the particular issues below, we have
multiple layers, each cleared with a different color but the partial
resolve just writes the wrong color into the auxiliary buffers for
layers > 0.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108910
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108911
Cc: mesa-stable@lists.freedesktop.org
100 is too small for some games, which triggers recompilations
every frame. Increase to 1024.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
nine_context_box_upload uploads a ram buffer (from src)
to a pipe_resource (dst).
We already have a refcount on the pipe_resource,
what needs to be protected from release is the ram buffer,
thus a reference to src.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Cc: mesa-stable@lists.freedesktop.org
This enables to match the window size
on resize on all cases, as it only works
currently with presentation buffers.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
This patch introduces a structure to release the
present_handles only when they are fully released
by the server, thus making
"DestroyD3DWindowBuffer" actually release the buffer
right away when called.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
This logic can be re-written as the two cases for 3d (ie. before/after
the miplevel sizes start reducing) vs everything else. I think it is
easier to read this way.
Signed-off-by: Rob Clark <robdclark@gmail.com>
This was a hold-over from the early TGSI days, and mostly not needed
with NIR. This avoids burning an entire 4 consecutive scalar regs
for vec3 outputs, for example. Which fixes a few places that we were
doing worse that we should on register usage.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Fixes the following crash that happened after d6110d4d
The problem happens if we first compile a "vanilla" shader with nothing
lowered in NIR, which perform the final lowering passes on so->shader->
nir (including nir_lower_locals_to_regs()), and then later we have
compile a shader with some lowering. The second time through we would
have already done nir_lower_locals_to_regs().
Arguably this was already a bug, just one we hadn't noticed yet.
Fixes: d6110d4d54 intel/compiler: move nir_lower_bool_to_int32 before nir_lower_locals_to_regs
Signed-off-by: Rob Clark <robdclark@gmail.com>
DrawPixels lowering, for example, adds new varyings that need to be
accounted for in inputs_read. The earlier info gathering at link time
cannot account for this.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The glDrawPixels passthrough vertex shader copies position and texcoord
vertex attributes to varying outputs. It also optionally copies a third
gl_Color attribute, which sometimes is unnecessary. Until now, we've
compiled separate variants of the shader, one of which does this extra
copy, and the other of which doesn't. We have done this since 2007.
But, the vertex shader runs for a whopping four vertices, and so the
cost of a copying a single input to output is likely inconsequential.
In theory, we could bind one fewer vertex element - but we always bind
all three regardless. So, we don't even get that savings.
This patch unifies the two, so we always copy the optional color,
and save having to compile the variant. It also makes the VS input
interface match up with the vertex element state without any dead
(unused) input attributes.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
So that gitlab will render the < and > correctly allowing the tag to be
copy-n-pasted without additional formatting.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Whenever llvm removes an intrinsic (we're using), we're hitting segfaults
due to llvm doing calls to address 0 in the jitted code instead.
However, Jose figured out we can actually detect this with
LLVMGetIntrinsicID(), so use this to abort, so we don't have to wonder
what got broken. (Of course, someone still needs to fix the code to
no longer use this intrinsic.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This intrinsic disppeared with llvm 6.0, using it ends up in segfaults
(due to llvm issuing call to NULL address in the jited shaders).
Add code doing the same thing as the autoupgrade code in llvm so it
can be matched and replaced back with a pavgb.
While here, also improve lp_test_format, so it tests both with and without
cache (as it was, it tested the cache versions only, whereas cache is
actually disabled in llvmpipe, and in any case even with it enabled
vertex and geometry shaders wouldn't use it). (Although at least for
the unorm8 uncached fetch, the code is still quite different to what
llvmpipe is using, since that would use unorm8x16 type, whereas
the test code is using unorm8x4 type, hence disabling some intrinsic
paths.)
Fixes: 6f4083143b ("gallivm: use llvm jit code for decoding s3tc")
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
This commit adds a number of build combinations:
- Gallium Drivers {SWR, RadeonSI, Others)
Each one has different LLVM requirements. Building SWR alone is twice
as slow as all other drivers combined.
- Gallium ST Clover LLVM {5,6,7}
Because C++ API changes all the time. Analogous to above building
Clover takes as much time as building all other ST combined.
- Gallium ST Others
Nouveau is used, instead of i915g since meson has explicit target
tracking. Meaning that a configure error is thrown if we use i915g
with say va, vdpau or others.
Note: LLVM prior to 5.0 is intentionally dropped. If needed we can add
that later.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
This is the supported way to do this, and should be more robust and
reliable.
v2: [Emil]
- enable backslash escapes
- don't hardcode the path
- pass the argument directly to meson
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v1)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
The latter is the default these days and Travis will be removing sudo
soonish.
Flipping to xenial, allows us to remove a bunch of hacks we have. Plus
it prevents us from adding new ones, to workaround what seems like a
gcc/binutils bug. For example (from the upcoming meson build):
FAILED: ccache c++ -o src/gallium/targets/pipe-loader/pipe_r600.so ...
... src/util/libmesa_util.a ... /usr/lib/x86_64-linux-gnu/libz.so ...
src/util/libmesa_util.a(disk_cache.c.o): In function `deflate_and_write_to_disk':
_build/../src/util/disk_cache.c:746: undefined reference to `deflateInit_'
_build/../src/util/disk_cache.c:765: undefined reference to `deflate'
...
As we can see, even though libz.so is explicitly passed after the
object that requires it - the linker still fails to see the symbols.
Avoid all those situations - flip the switch.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Seemingly with LLVM7 and GCC 5.0, the former won't properly advertise
-std=c++11 and the latter will choke.
dd this temporary workaround, otherwise we'll get errors like:
In file included from /usr/include/c++/5/type_traits:35:0,
from /usr/lib/llvm-7/include/llvm/Support/type_traits.h:18,
from /usr/lib/llvm-7/include/llvm/ADT/Optional.h:22,
from /usr/lib/llvm-7/include/llvm/ADT/STLExtras.h:20,
from /usr/lib/llvm-7/include/llvm/ADT/StringRef.h:13,
from /usr/lib/llvm-7/include/llvm/Target/TargetMachine.h:17,
from ../../../src/amd/common/ac_llvm_helper.cpp:36:
/usr/include/c++/5/bits/c++0x_warning.h:32:2: error: #error This file requires compiler and library support for the ISO C++ 2011 standard. This support must be enabled with the -std=c++11 or -std=gnu++11 compiler options.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Swap '..' with the symbolic inc_glx and add glproto as dependency. That
will pull the correct include, effectively fixing the tests on macOS.
Fixes: a47c525f32 ("meson: build glx")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
When producing the final libGL.so/libGLX_mesa.so we only link the local
static helper lib (libglx). Thus there's no reason for the includes.
Fixes: a47c525f32 ("meson: build glx")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
The library itself (libGL) is only built when -Dglx=dri, yet it's
accompanying tests are build even with -Dglx=xlib.
Adjust the guards, so we don't build the tests when they are not
applicable
v2:
- Reword commit message (Dylan)
- Drop build_by_default hunk (Dylan)
Fixes: a47c525f32 ("meson: build glx")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
The gallium drivers do not require a DRI loader. Drop the artificial
and unnecessary restriction.
Fixes: af9d276134 ("meson: build libmesa_gallium")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Currently our is_sha_nomination does:
- folds any whitespace, attempting to extract sha-like information
- checks that at least one of the shas has landed
Split it in two and do sha-like validation first.
This way, commits with mesa-stable and sha nominations will feature the
fixes/revert/etc instead of stable (a) or will be omitted if not
applicable for the respective branch (b).
Misc examples from 18.3
(a)
-[ stable ] 5bc509363b glx: make xf86vidmode mandatory for direct rendering
+[ fixes ] 5bc509363b glx: make xf86vidmode mandatory for direct rendering
(b)
-[ stable ] 9a7b319903 anv/query: flush render target before copying results
CC: Juan A. Suarez <jasuarez@igalia.com>
CC: Dylan Baker <dylan@pnwbakers.com>
CC: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
This lets the driver use pipe_debug_message() for GL_ARB_debug_output.
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This lets the driver use pipe_debug_message() for GL_ARB_debug_output.
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
i915 render nodes refuse the dumb ioctls, so the simulator would crash on
the original non-apitrace shader-db. Replace them with direct i915 calls
if we detect that we're on one of their gem fds.
Because of the many caveats involved, using -Dc_args instead of CFLAGS
is recommended both by meson upstream and by us.
v2: - Fix typo
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v1)
Reviewed-by: Eric Anholt <eric@anholt.net>
We don't ever want to do the fmask lookup on a atomic or
store, the fmask should have been decompressed if the
surface has been moved to IMAGE_LAYOUT.
Original patch by Dave Airlie.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This fixes GPU hangs on GFX9 with
dEQP-VK.memory.external_memory_host.bind_image_memory_and_render.with_zero_offset.*
Copied from RadeonSI.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Exactly what title says, the new addrlib does not allow the above with
certain dimensions that the CTS seems to hit. Work around it by not
allowing the app to render to it via compat with other 128bpp formats
and do not render to it ourselves during copies.
Fixes: 776b911365 "amd/addrlib: update Mesa's copy of addrlib"
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The driver needs to decompress all image layers if a fast
depth/color clear has been performed.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This workaround has been introduced by 135e4d434f for fixing
DXVK GPU hangs with many games. It is no longer needed since
LLVM r345718.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The former expects to see SSA-only things, but the latter injects registers.
The assertions in the lowering where not seeing this because they asserted
on the bit_size values only, not on the is_ssa field, so add that assertion
too.
Fixes: 11dc130779 "nir: Add a bool to int32 lowering pass"
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
A long time ago, when this was first implemented, not having a sampler
bound would cause problems on Fermi. I didn't work out the reasons, but
the solution was simple -- just put the samplers back in.
Since then, regular texturing paths appear to have lost their associated
samplers which required a fuller investigation and fix in nouveau. Now
that this is done, this code should no longer need a sampler state for
fetching texels from a buffer texture.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is (much) faster than using the util fallback.
(Note that there's two methods here, one would use a cache, similar to
the existing code (although the cache was disabled), except the block
decode is done with jit code, the other directly decodes the required
pixels. For now don't use the cache (being direct-mapped is suboptimal,
but it's difficult to come up with something better which doesn't have
too much overhead.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
An earlier patch that introduced the function failed to handle the case
where an image format layout qualifier is not specified, which is allowed
on desktop GL profiles. In these cases, nir_variable's image format is
GL_NONE, and we don't need to print a debug message for those.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Across several projects I've seen new contributors say "I wasn't sure if I
should provide a review tag since I'm not really an expert in this area."
Everyone I know already applies some implicit weighting to reviews from
different people, so encourage participation.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This calls the expensive uif offset function once per utile, but it still
gets us a 212.218% +/- 2.41216% (n=10) win on 1024x1024 glTexImage over
calling it on each pixel.
This lets us store the non-PBO glTexImage data directly into the tiled
image without making an extra untiled memcpy for the gallium transfer.
Improves 1024x1024 TexImage perf by ~19%, mostly from not thrashing around
in the kernel mapping and unmapping the transfer's temporary area.
We're waiting for the jobs-completed count to increment (with wrapping),
not to reach its starting state. This mostly ended up working out because
the next v3d_hw_tick() for a submit CL would end up doing the TFU
operation first, but it did fail when a blit was used for glReadPixels()
at the end of a test.
Fixes: ee0549ff9a ("v3d: Add the V3D TFU submit interface to the simulator.")
In the UAPI, the first BO is the destination, and the one the kernel
should do an exclusive reservation on. Currently we only do exclusive
reservations, anyway. However, in the simulator path I was only copying
back the "destination" BO (actually src in this case), and this caused
regressions once I fixed the simulator to actually complete TFU before
returning (since otherwise, the TFU op would happen at the start of the
next CL submit and the draw would get the right contents).
Fixes: 976ea90bdc ("v3d: Add support for using the TFU to do some blits.")
When copy propagation handles a store/copy, it iterates the current
copy entries to remove aliases, but keeps the "equal" entry (if
exists) to be updated.
The removal step may swap the entries around (to ensure there are no
holes), invalidating previous iteration pointers. The bug was saving
such pointer to use later. Change the code to first perform the
removals and then find the remaining right entry.
This was causing updates to be lost since they were being made to an
entry that was not part of the current copies.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108624
Fixes: b3c6146925 "nir: Copy propagation between blocks"
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes build failure if the LLVM headers aren't in a standard include
directory.
Fixes: ec22dd34c8 "radeonsi: move SI_FORCE_FAMILY functionality to
winsys"
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
When updating a copy entry source value from a "non-SSA" (the data
come from a copy instruction) to a "SSA" (the data or parts of it come
from SSA values), it was possible to hold invalid data in ssa[0]
depending on the writemask. Because the union, ssa[0] could contain a
pointer to a nir_deref_instr left-over from previous non-SSA usage.
Change code to clean up the array before use to avoid invalid data
around.
Fixes: 62332d139c "nir: Add a local variable-based copy propagation pass"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Previously, we ignored the the glUnmap(..) operation and
flushed before we flush the cbuf. Now, let's just flush
the data when we unmap.
Neither method is optimal, for example:
glMapBufferRange(.., 0, 100, GL_MAP_FLUSH_EXPLICIT_BIT)
glFlushMappedBufferRange(.., 25, 30)
glFlushMappedBufferRange(.., 65, 70)
We'll end up flushing 25 --> 70. Maybe we can fix this later.
v2: Add fixme comment in the code (Elie)
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
With commit 89b479, we moved to tracking buffer cleanliness
when binding.
TEST=dEQP-GLES31.functional.image_load_store.buffer.load_store.r32ui
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Remove a level of indirection to make the code more explicit -- should
make it easier to follow what's going on.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is a move towards using composition instead of inheritance for
different query types.
This change weakens out-of-memory error reporting somewhat, though this
should be acceptable since we didn't consistently report such errors in
the first place.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Order-aware scan/reduce can trade-off LDS traffic for external atomics
memory traffic in producer/consumer compute shaders.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The following race condition could occur in the no-timeout case:
API thread Gallium thread Watchdog
---------- -------------- --------
dd_before_draw
u_threaded_context draw
dd_after_draw
add to dctx->records
signal watchdog
dump & destroy record
execute draw
dd_after_draw_async
use-after-free!
Alternatively, the same scenario would assert in a debug build when
destroying the record because record->driver_finished has not signaled.
Fix this and simplify the logic at the same time by
- handing the record pointers off to the watchdog thread *before* each
draw call and
- waiting on the driver_finished fence in the watchdog thread
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This fulfills a requirement for clients that want to utilize same
code path for images with external formats (VK_FORMAT_UNDEFINED) and
"regular" RGBA images where format is known. This is similar to how
OES_EGL_image_external works.
To support this, we allow color conversion samplers for non-YUV
formats but skip setting up conversion when format does not have
can_ycbcr flag set.
v2: add comment and bundle can_ycbcr to the existing break
condition (Lionel)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
If a conversion struct was passed, then initialize view using
format from the conversion structure.
v2: use vk_format directly from the anv_format struct
v3: added some assertions (Lionel)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
If external format is used, we store the external format identifier in
conversion to be used later when creating VkImageView.
v2: rebase to b43f955037 changes
v3: added assert, ignore components when creating external
format conversion (Lionel)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Since we don't know the exact format at creation time, some initialization
is done only when bound with memory in vkBindImageMemory.
v2: demand dedicated allocation in vkGetImageMemoryRequirements2 if
image has external format
v3: refactor prepare_ahw_image, support vkBindImageMemory2,
calculate stride correctly for rgb(x) surfaces, rename as
'resolve_ahw_image'
v4: rebase to b43f955037 changes
v5: add some assertions to verify input correctness (Lionel)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
v2: have separate memory properties for android, set usage
flags for buffers correctly
v3: code cleanup (Jason)
+ limit maxArrayLayers to 1 for AHardwareBuffer based images
v4: rebase to b43f955037 changes
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
v2: add support for non-image buffers (AHARDWAREBUFFER_FORMAT_BLOB)
v3: properly handle usage bits when creating from image
v4: refactor, code cleanup (Jason)
v5: rebase to b43f955037 changes,
initialize bo flags as ANV_BO_EXTERNAL (Lionel)
v6: add assert that anv_bo_cache_import succeeds, add comment
about multi-bo support to clarify current implementation (Lionel)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This makes it cleaner to introduce more cases where we import memory
from different types of external memory buffers.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Use the anv_format address in formats table as implementation-defined
external format identifier for now. When adding YUV format support this
might need to change.
v2: code cleanup (Jason)
v3: set anv_format address as identifier
v4: setup suggestedYcbcrModel and suggested[X|Y]ChromaOffset
as expected for HAL_PIXEL_FORMAT_NV12_Y_TILED_INTEL
v5: set linear tiling for GPU_DATA_BUFFER usage, add comment
about multi-bo support to clarify current implementation (Lionel)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
v2: handle R8G8B8X8 as R8G8B8_UNORM (Jason)
v3: add HAL_PIXEL_FORMAT_NV12_Y_TILED_INTEL, we make it define
for now to avoid direct dependency to minigbm headers
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This will make it possible for next patch to rip
anv_image_create_info out from make_surface function.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The quotation marks around 1.0 cause it to be treated as a string
instead of a floating point value. The generator then treats it as an
arbitrary variable replacement, so any iand involving a ('ineg', ('b2i',
a)) matches.
v2: Remove misleading comment about sized literals (suggested by
Timothy). Add assertion that the name of a varible is entierly
alphabetic (suggested by Jason).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Timothy Arceri <tarceri@itsqueeze.com> [v1]
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> [v1]
Fixes: 6bcd2af086 ("nir/algebraic: Add some optimizations for D3D-style Booleans")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109075
Tested on gen9.
v2: Rename lower_txd_3d_surafaces flag to lower_txd_3d (Jason Ekstrand)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Note that we have an official GL extension number, pick the appropriate
section of the GLX spec to modify, and add changelog.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
This has not even had an attempt at implementation. If you asked for
renderer 0 - which, the spec implies, should always work - then
dri2_convert_glx_attribs would fail, we'd silently fall back to creating
an indirect context, and xserver would also not recognize the attribute
and would throw BadValue at you.
The API would be difficult to use in any case, since there's no way to
enumerate how many renderers the screen has. I'd be tempted to add that
by defining:
glXQueryRendererIntegerMESA(dpy, screen,
/* renderer = */ -1,
0, &value);
to return the number of renderers, but a new entrypoint might be
cleaner. Still, better to not specify it at all than to lie about it.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
In one place we say, if GLES isn't supported then the profile version
will be 0.0. Then later we say, if the GLES profile extension isn't
supported then GLX_RENDERER_OPENGL_ES_PROFILE_VERSION_MESA is not
mentioned in the spec. A strict reading of the latter would mean that
GLX_RENDERER_OPENGL_ES_PROFILE_VERSION_MESA is not a recognized token,
and the query should instead return False.
The implementation does not check for the GLES profile extensions, and
the additional complexity doesn't seem worth it. Removing the
interaction text makes the spec match the implementation.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
emit_intrinsic_store_image() is always using 4 components when
collecting registers for the value. When image has less than
4 components (e.g, r32f, rg32i, etc) this results in extra mov
instructions.
This patch uses the actual number of components from the image format.
For example, in a shader like:
layout (r32f, binding=0) writeonly uniform imageBuffer u_image;
...
void main(void) {
...
imageStore (u_image, some_offset, vec4(1.0));
...
}
instruction count is reduced in at least 3 instructions (note image
format is r32f, 1 component only).
This obviously reduces register pressure as well.
v2: - Added support for image formats from NV_image_format extension
(Ilia Mirkin).
- Return 4 components by default instead of asserting. (Rob Clark).
v3: Added more missing formats (Ilia Mirkin).
v4: Added a debug message for unknown image formats (Rob Clark).
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Instead of going all the way back to the variable, just look at the
deref. The modes are guaranteed to be the same by nir_validate whenever
the variable can be found. This fixes clear_unused_for_modes for
derefs that don't have an accessible variable.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Instead of going all the way back to the variable, just look at the
deref. The modes are guaranteed to be the same by nir_validate whenever
the variable can be found. This fixes apply_barrier_for_modes for
derefs that don't have an accessible variable.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This is instead of looking all the way back to the variable which may
not exist for all derefs. This makes this code properly ignore casts
with modes other than the mode[s] we care about (where casts aren't
allowed).
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This is instead of looking all the way back to the variable which may
not exist for all derefs. This makes this code properly ignore casts
with modes other than the mode[s] we care about (where casts aren't
allowed).
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This is instead of looking all the way back to the variable which may
not exist for all derefs. This makes this code properly ignore casts
with modes other than the mode[s] we care about (where casts aren't
allowed).
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This is instead of looking all the way back to the variable which may
not exist for all derefs. This makes this code properly ignore casts
with modes other than the mode[s] we care about (where casts aren't
allowed).
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
If we can't find the variable from the deref, just assume it isn't
invariant and continue on. This can happen if, for instance, we're
writing to a deref that points into an SSBO.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Each time I have to touch the buffer import/export functions in the dri
state tracker I get lost in the maze of functions converting between
DRI_IMAGE_FOURCC, DRI_IMAGE_FORMAT, DRI_IMAGE_COMPONENTS and pipe format.
Rip it out and replace by a single table, which defines the correspondence
between the different representations.
Also this now stores all the known representations in the __DRIimageRec,
to avoid the loss of information we currently have when importing a buffer
with a fourcc, which doesn't have a corresponding dri format.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Currently all the EGL APIs are missing a way to specify how an imported
dma-buf is intended to be used. Demanding the format to be both usable
for sampling and rendering artificially restricts the list of formats a
driver is able to import.
Looking at how the Intel driver implements those DRI2 image APIs it
doesn't distinguish between render or sampler compatible formats. So
this patch aligns behavior between Intel and Gallium based drivers.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There is no need to do the detour over the resource behind the
surface to get the format. Use the surface format directly.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Old versions of meson returned ppc64le as the cpu_family for little
endian power8 cpus, versions >=0.48 don't do this, so the check wouldn't
work in that case. This generalizes the check to work for both old and
new versions of meson.
Fixes: 34bbb24ce7
("meson: Add support for ppc assembly/optimizations")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Gen9-10 have fewer than 4 subslices per slice, so they need this to be
rounded up. Gen11 isn't documented as needing this hack, and it can
also have more than 4 subslices, so the hack actually can break things.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
On some GPUs, especially older Intel GPUs, some math instructions are
very expensive. On those architectures, don't reduce flow control to a
csel if one of the branches contains one of these expensive math
instructions.
This prevents a bunch of cycle count regressions on pre-Gen6 platforms
with a later patch (intel/compiler: More peephole select for pre-Gen6).
v2: Remove stray #if block. Noticed by Thomas.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
That flow control may be trying to avoid invalid loads. On at least
some platforms, those loads can also be expensive.
No shader-db changes on any Intel platform (even with the later patch
"intel/compiler: More peephole select").
v2: Add a 'indirect_load_ok' flag to nir_opt_peephole_select. Suggested
by Rob. See also the big comment in src/intel/compiler/brw_nir.c.
v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from
nir_lower_io_arrays_to_elements.c).
v4: Fix inverted condition in brw_nir.c. Noticed by Lionel.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
If there is a CMP.NZ that compares a single component (via a .zzzz
swizzle, for example) with 0, it can propagate its conditional modifier
back to a previous CMP that writes only that component. The specific
case that I saw was:
cmp.l.f0(8) g42<1>.xF g61<4>.xF (abs)g18<4>.zF
...
cmp.nz.f0(8) null<1>D g42<4>.xD 0D
In this case we can just delete the second CMP.
No changes on Broadwell or Skylake because they do not use the vec4
backend. Also no changes on GM45 or Iron Lake.
Sandy Bridge, Ivy Bridge, and Haswell had similar results. (Sandy Bridge shown)
total instructions in shared programs: 10856676 -> 10852569 (-0.04%)
instructions in affected programs: 228322 -> 224215 (-1.80%)
helped: 1331
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 3.09 x̃: 4
helped stats (rel) min: 0.11% max: 6.67% x̄: 1.88% x̃: 1.83%
95% mean confidence interval for instructions value: -3.19 -2.99
95% mean confidence interval for instructions %-change: -1.93% -1.83%
Instructions are helped.
total cycles in shared programs: 154788865 -> 154732047 (-0.04%)
cycles in affected programs: 2485892 -> 2429074 (-2.29%)
helped: 1097
HURT: 59
helped stats (abs) min: 2 max: 168 x̄: 51.96 x̃: 64
helped stats (rel) min: 0.12% max: 12.70% x̄: 3.44% x̃: 2.22%
HURT stats (abs) min: 2 max: 16 x̄: 3.02 x̃: 2
HURT stats (rel) min: 0.18% max: 0.83% x̄: 0.64% x̃: 0.71%
95% mean confidence interval for cycles value: -51.04 -47.26
95% mean confidence interval for cycles %-change: -3.40% -3.07%
Cycles are helped.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The (-abs(x) >= 0) => (x == 0) optimization is removed from the vec4 and
scalar parts. In the VS part, adding the new pattern was not
helpful. The pattern that is removed is really old, and it has been
handled by NIR for ages.
All Gen7+ platforms had similar results. (Broadwell shown)
total instructions in shared programs: 14715715 -> 14715709 (<.01%)
instructions in affected programs: 474 -> 468 (-1.27%)
helped: 6
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 1.12% max: 1.35% x̄: 1.28% x̃: 1.35%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -1.40% -1.15%
Instructions are helped.
total cycles in shared programs: 559569911 -> 559569809 (<.01%)
cycles in affected programs: 5963 -> 5861 (-1.71%)
helped: 6
HURT: 0
helped stats (abs) min: 16 max: 18 x̄: 17.00 x̃: 17
helped stats (rel) min: 1.45% max: 1.88% x̄: 1.73% x̃: 1.85%
95% mean confidence interval for cycles value: -18.15 -15.85
95% mean confidence interval for cycles %-change: -1.95% -1.51%
Cycles are helped.
Iron Lake and Sandy Bridge had similar results. (Iron Lake shown)
total instructions in shared programs: 7780915 -> 7780913 (<.01%)
instructions in affected programs: 246 -> 244 (-0.81%)
helped: 2
HURT: 0
total cycles in shared programs: 177876108 -> 177876106 (<.01%)
cycles in affected programs: 3636 -> 3634 (-0.06%)
helped: 1
HURT: 0
GM45
total instructions in shared programs: 4799152 -> 4799151 (<.01%)
instructions in affected programs: 126 -> 125 (-0.79%)
helped: 1
HURT: 0
total cycles in shared programs: 122052654 -> 122052652 (<.01%)
cycles in affected programs: 3640 -> 3638 (-0.05%)
helped: 1
HURT: 0
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
In an instruction sequence like
cmp(8).ge.f0.0 vgrf17:D, vgrf2.xxxx:D, vgrf9.xxxx:D
(+f0.0) sel(8) vgrf1:UD, vgrf8.xyzw:UD, vgrf1.xyzw:UD
The other fields of vgrf17 may be unused, but the CMP still needs to
generate the other flag bits.
To my surprise, nothing in shader-db or any test suite appears to hit
this. However, I have a change to brw_vec4_cmod_propagation that
creates cases where this can happen. This fix prevents a couple dozen
regressions in that patch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 5df88c20 ("i965/vec4: Rewrite dead code elimination to use live in/out.")
We were not using the view mask for depth clears, causing only the
first view to be cleared.
Fixes: 2e86f6b259 "radv: Add multiview clears."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The switch directly after the check has a default case that returns
NULL too, so the effective return value is not changed. Also this
check is wrong once we start dealing with formats introduced by an
extension (e.g. YUV formats).
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
I botched some copy-and-paste and clamped to signed int max instead of
uint max. Fixes KHR-GL46.shader_image_load_store.multiple-uniforms on
skl.
Fixes: d3e046e76c ("nir: Pull some of intel's image load/store format
conversion to nir_format.h")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
I thought the value was correctly propagated, but actually not.
Fixes: 2ac6d55f38 ("radv: bump reported version to 1.1.90")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
nir_sweep already marks all metadata invalid, so it is safe to release
the memory here too.
mean soft fp64 using uint64: 1,342,759,331 => 1,010,670,475
gfxbench5 aztec ruins high 11: 63,555,571 => 61,889,811
deus ex mankind divided 148: 62,845,304 => 62,829,640
deus ex mankind divided 2890: 71,922,686 => 71,922,686
dirt showdown 676: 69,238,607 => 69,238,607
dolphin ubershaders 210: 77,822,072 => 77,822,072
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Found using pahole.
Changes in peak memory usage according to Valgrind massif:
mean soft fp64 using uint64: 1,343,991,403 => 1,342,759,331
gfxbench5 aztec ruins high 11: 63,619,971 => 63,555,571
deus ex mankind divided 148: 62,887,728 => 62,845,304
deus ex mankind divided 2890: 72,399,750 => 71,922,686
dirt showdown 676: 69,464,023 => 69,238,607
dolphin ubershaders 210: 78,359,728 => 77,822,072
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Replace the old array in each value with a hash table in each value.
Changes in peak memory usage according to Valgrind massif:
mean soft fp64 using uint64: 5,499,875,082 => 1,343,991,403
gfxbench5 aztec ruins high 11: 63,619,971 => 63,619,971
deus ex mankind divided 148: 62,887,728 => 62,887,728
deus ex mankind divided 2890: 72,402,222 => 72,399,750
dirt showdown 676: 74,466,431 => 69,464,023
dolphin ubershaders 210: 109,630,376 => 78,359,728
Run-time change for a full run on shader-db on my Haswell desktop (with
-march=native) is 1.22245% +/- 0.463879% (n=11). This is about +2.9
seconds on a 237 second run. The first time I sent this version of this
patch out, the run-time data was quite different. I had misconfigured
the script that ran the test, and none of the tests from higher GLSL
versions were run. These are generally more complex shaders, and they
are more affected by this change.
The previous version of this patch used a single hash table for the
whole phi builder. The mapping was from [value, block] -> def, so a
separate allocation was needed for each [value, block] tuple. There was
quite a bit of per-allocation overhead (due to ralloc), so the patch was
followed by a patch that added the use of the slab allocator. The
results of those two patches was not quite as good:
mean soft fp64 using uint64: 5,499,875,082 => 1,343,991,403
gfxbench5 aztec ruins high 11: 63,619,971 => 63,619,971
deus ex mankind divided 148: 62,887,728 => 62,887,728
deus ex mankind divided 2890: 72,402,222 => 72,402,222 *
dirt showdown 676: 74,466,431 => 72,443,591 *
dolphin ubershaders 210: 109,630,376 => 81,034,320 *
The * denote tests that are better now. In the tests that are the same
in both patches, the "after" peak memory usage was at a different
location. I did not check the local peaks.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
D3D Booleans use a 32-bit 0/-1 representation. Because this previously
matched NIR exactly, we didn't have to really optimize for it. Now that
we have 1-bit Booleans, we need some specific optimizations to chew
through the D3D12-style Booleans.
Shader-db results on Kaby Lake:
total instructions in shared programs: 15136811 -> 14967944 (-1.12%)
instructions in affected programs: 2457021 -> 2288154 (-6.87%)
helped: 8318
HURT: 10
total cycles in shared programs: 373544524 -> 359701825 (-3.71%)
cycles in affected programs: 151029683 -> 137186984 (-9.17%)
helped: 7749
HURT: 682
total loops in shared programs: 4431 -> 4399 (-0.72%)
loops in affected programs: 32 -> 0
helped: 21
HURT: 0
total spills in shared programs: 10290 -> 10051 (-2.32%)
spills in affected programs: 2532 -> 2293 (-9.44%)
helped: 18
HURT: 18
total fills in shared programs: 22203 -> 21732 (-2.12%)
fills in affected programs: 3319 -> 2848 (-14.19%)
helped: 18
HURT: 18
Note that a large chunk of the improvement fixing regressions caused by
switching to 1-bit Booleans. Previously, our ability to optimize D3D
booleans was improved by using the D3D representation directly in NIR.
Now that NIR does 1-bit bools, we need a few more optimizations.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This is a squash of a few distinct changes:
glsl,spirv: Generate 1-bit Booleans
Revert "Use 32-bit opcodes in the NIR producers and optimizations"
Revert "nir/builder: Generate 32-bit bool opcodes transparently"
nir/builder: Generate 1-bit Booleans in nir_build_imm_bool
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We also have to add support for 1-bit integers while we're here so we
get 1-bit variants of iand, ior, and inot.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This commit adds support for 1-bit Booleans and integers. Booleans
obviously take a value of true or false. Because we have to define the
semantics of 1-bit signed and unsigned integers, we define uint1_t to
take values of 0 and 1 and int1_t to take values of 0 and -1. 1-bit
arithmetic is then well-defined in the usual way, just with fewer bits.
The definition of int1_t and uint1_t doesn't usually matter but we do
need something for purposes of constant folding.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This commit contains three related changes. First, we define boolN_t
for N = 8, 16, and 64 and move the definition of boolN_vec to the loop
with the other vec definitions. Second, there's no reason why we need
the != 0 on the source because that happens implicitly when it's
converted to bool. Third, for destinations, we use a signed integer
type and just do -(int)bool_val which will give us the 0/-1 behavior we
want and neatly scales to all bit widths.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This is a squash of a bunch of individual changes:
nir/builder: Generate 32-bit bool opcodes transparently
nir/algebraic: Remap Boolean opcodes to the 32-bit variant
Use 32-bit opcodes in the NIR producers and optimizations
Generated with a little hand-editing and the following sed commands:
sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c
Use 32-bit opcodes in the NIR back-ends
Generated with a little hand-editing and the following sed commands:
sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This was originally added for the out-of-tree Mali driver but I think
we've all agreed it's easy enough for them to just do in their back-end.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Shader-db results on Kaby Lake:
total instructions in shared programs: 15072525 -> 15072525 (0.00%)
instructions in affected programs: 0 -> 0
helped: 0
HURT: 0
This helps prevent regressions in later commits.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Instead of looking at input_sizes[i] which contains the number of
components for each source, we look at the bit size of input_types[i].
This fixes a regression in the 1-bit boolean series though I have no
idea how we haven't seen it before now.
Fixes: 35baee5dce "nir/constant_folding: fix incorrect bit-size check"
Fixes: 9076c4e289 "nir: update opcode definitions for different bit sizes"
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The previous code was creating a boolean by doing an arithmetic right-
shift by 31 which produces a boolean which is true if the argument is
negative. This is the same as the expression r < 0 which is much
simpler and doesn't depend on NIR's representation of booleans.
Reviewed-by: Eric Anholt <eric@anholt.net>
Sadly, the GLX_USE_APPLEGL and GLX_USE_WINDOWSGL cases are not identical
(because GLX_USE_WINDOWSGL uses vtables rather than a maze of ifdefs)
Include <sys/time.h> again, as functions prototyped by it are used in
the GLX_USE_WINDOWSGL path.
Make the include guard around the __glxGetMscRate() definition match the
one at it's declaration again, as it's referenced from dri_common.c
which is built for GLX_USE_WINDOWSGL.
Fixes: a95ec138 ("glx: mandate xf86vidmode only for "drm" dri platforms")
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
These have all been floating in my head, and while I've thought about
encoding them in issues on gitlab once they're enabled, they also make
sense to just have in the area of the code you'll need to work in.
Follows 3954331aff ("vc4: Pull uinfo->data[i] dereference out to the top
of the loop.") which showed a large performance win for vc4, but also
cleans up the code a decent bit.
After generating VIR, we leave c->cursor pointing at the end of the
shader. If the shader had dead code at the end (for example from preamble
instructions in a shader with no side effects), we would assertion fail
that we were leaving the cursor pointing at freed memory. Since anything
following DCE should be setting up a new cursor anyway, just clear the
cursor at the start.
In trying to enable compute shaders, I found that a bunch of deqp-gles31's
compute stuff wanted to interact with indirect dispatch. This was easy to
do on its own.
The thrsw will invalidate rtop, just like accumulators and flags. Caught
by simulator assertions in CS imulextended/umulextended tests.
Fixes: 90269ba353 ("broadcom/vc5: Use THRSW to enable multi-threaded shaders.")
Just like vc4, we have to support linear shared BOs for X11 on arbitrary
displays. When we're faced with a request to texture from one of those,
make a shadow image that we copy using the TFU at the start of the draw
call.
generatemipmap is just filling out the rest of the mipmap that's already
been written (by a mapping or a draw call), so it didn't matter. As I
reuse the TFU code for linear-to-UIF conversions, it'll start mattering.
Otherwise we may race to read old contents. This didn't show up in the
CTS and piglit for me, but it did once I started using the TFU to do
linear->UIF blits for X11.
Fixes: 2ebca177dc ("v3d: Use the TFU to do generatemipmap.")
Same as on nv50, the TXF op always uses the TSC bound to slot 0,
returning blank values if nothing is bound.
An earlier change arranges for the TSC entries list to always have valid
data at entry 0, so here we just make use of it.
Fixes arb_texture_buffer_object-subdata-sync among others.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This was used for implementing FBFETCH. However that uses TXF, which
doesn't do much with a TSC. The only important bit is that sRGB-decoding
works as expected, which we can achieve since all samplers we ever
generate enable sRGB-decoding. Always point to entry 0 in the TSC table,
and ensure that even before it ever gets initialized, the sRGB-decoding
enable bit is set.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
For older gen's fd_wfi() is used to conditionally insert a WFI if there
hasn't already been one since last draw. But this doesn't work out well
with stateobj since the order the stateobj is evaluated might not be
what you expect. (Ie. stateobj might not be evaluated until a later
draw if there is no geometry from the current draw in a given tile.)
Signed-off-by: Rob Clark <robdclark@gmail.com>
Gen9 hardware requires some workarounds to disable preemption depending
on the type of primitive being emitted.
We implement this by adding a function that checks the primitive type
and number of instances right before the 3DPRIMITIVE.
For now, we just ignore blorp. The only primitive it emits is
3DPRIM_RECTLIST, and since it's not listed in the workarounds, we can
safely leave preemption enabled when it happens. Or it will be disabled
by a previous 3DPRIMITIVE, which should be fine too.
v3:
- Apply missing workarounds for instanced rendering and line loop (Ken)
- Move workaround code to brw_draw_single_prim()
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Set bit when initializing context.
v3:
- Always toggle preemption bool to false before enabling it for the
first time, so the state gets emitted (Chris Wilson).
- Emit end of pipe sync with PIPE_CONTROL_RENDER_TARGET_FLUSH (Ken)
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now everything with type 'struct slab_child_pool *' is name pool, and
everything with type 'struct slab_mempool *' is named mempool.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
There is no need to have an extra ctx paramter as all the other
parameters carry all the needed information.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
When we first started using genxml, we decided to represent MOCS as an
actual structure, and pack values. However, in many places, it was more
convenient to use a numeric value rather than treating it as a struct,
so we added secondary setters in a bunch of places as well.
We were not entirely consistent, either. Some places only had one.
Gen6 had both kinds of setters for STATE_BASE_ADDRESS, but newer gens
only had the struct-based setters. The names were sometimes "Constant
Buffer Object Control State" instead of "Memory", making it harder to
find. Many had prefixes like "Vertex Buffer MOCS"...in a vertex buffer
packet...which is a bit redundant.
On modern hardware, MOCS is simply an index into a table, but we were
still carrying around the structure with an "Index to MOCS Table" field,
in addition to the direct numeric setters. This is clunky - we really
just want a number on new hardware.
This patch eliminates the struct-based setters, and makes the numeric
setters be consistently called "MOCS". We leave the struct definition
around on Gen7-8 for reference purposes, but it is unused.
v2: Drop bonus "Depth Buffer MOCS" fields on Gen7.5 and Gen9
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
The pass did not correctly handle loops ending in:
if ssa_7 {
block block_8:
/* preds: block_7 */
continue
/* succs: block_1 */
} else {
block block_9:
/* preds: block_7 */
break
/* succs: block_11 */
}
The break will get eliminated by another opt but if this pass gets
called first (as it does on RADV) we ended up inserting
instructions after the break.
Fixes: 5921a19d4b ("nir: add if opt opt_if_loop_last_continue()")
Reviewed-by: Dave Airlie <airlied@redhat.com>
pctx->resource_copy_region() needs to fall back to sw copy for
non-renderable formats. But previously for things that we could
not use the blitter for, would fall back to 3d. Which won't work
if 3d can't render to the dst format either.
Instead rework things to fallback to fd_resource_copy_region(),
which will try 3d core and then fall back to memcpy().
Fixes (for example) dEQP-GLES3.functional.texture.format.sized.2d.rgb9_e5_pot
Signed-off-by: Rob Clark <robdclark@gmail.com>
Fixes a crash with unsupported formats in dEQP-GLES3.functional.texture.format.sized.2d.rgb9_e5_pot
Also fixes gpu hangs with some formats that are supported, but which we
don't know what internal-format to use for the blitter, for ex
dEQP-GLES3.functional.texture.format.sized.2d_array.rgb10_a2_pot
Signed-off-by: Rob Clark <robdclark@gmail.com>
Fixes a crash in dEQP-GLES3.functional.shaders.fragdepth.compare.fragcoord_z
Fixes: 0d240c2214 freedreno/ir3: don't fetch unused tex components
Signed-off-by: Rob Clark <robdclark@gmail.com>
If we emit shader as a pointer to a GEM object, also set the RELOC_DUMP
flag as a hint to kernel that this is a useful buffer to snapshot for
debug dumps.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Pull in updated UAPI and use kernel API version to enable softpin.
Since MSM_SUBMIT_BO_DUMP flag was added at same time, use that to
signal to kernel that cmdstream buffers are useful to dump for
debugging/cmdstream-traces.
Signed-off-by: Rob Clark <robdclark@gmail.com>
I needed the same function for v3d. This was originally in d3e046e76c
("nir: Pull some of intel's image load/store format conversion to
nir_format.h") before we made am istake about simplifying the function.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This reverts commit 06fbcd2cd5.
nir_pack_half_2x16_split *isn't* vectorizable, it's 1-component only, thus
why we had this split-scalar code in the first place.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This helps a lot when debugging image load/store lowering on large
testcases. Unfortunately the Mesa enum name stuff is under src/mesa and
we can't get at it from the compiler.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We have a NIR path, and V3D doesn't have TGSI input for compute (only what
TTN can handle for the various gallium-internal shaders).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
If we gave this function 0 counter buffers, we'd still try and
access pCounterBuffers[0] as this check was incorrect.
Fixes crash with ext_transform_feedback-pipeline-basic-primgen
on zink on radv.
Fixes: 677b496b6 (radv: fix begin/end transform feedback with 0 counter buffers.)
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The pass should work for all bit sizes but it's less clear that the
extra instructions are worth it on small integers. Also, the hardware
doesn't do mul_high on anything other than 32-bit integers and, absent
any decent mechanism for testing the pass on 8 and 16-bit types, it's
probably best to just leave it disabled for now.
Shader-db results on Sky Lake:
total instructions in shared programs: 15105795 -> 15111403 (0.04%)
instructions in affected programs: 72774 -> 78382 (7.71%)
helped: 0
HURT: 265
Note that hurt here actually means helped because we're getting rid of
integer quotient operations (which are a send on some platforms!) and
replacing them with fairly cheap ALU ops.
Reviewed-by: Ian Romanick ian.d.romanick@intel.com
It's a reasonably well-known fact in the world of compilers that integer
divisions by constants can be replaced by a multiply, an add, and some
shifts. This commit adds such an optimization to NIR for easiest case
of udiv. Other division operations will be added in following commits.
In order to provide some additional driver control, the pass takes a
minimum bit size to optimize.
Reviewed-by: Ian Romanick ian.d.romanick@intel.com
Currently we have the three dri "platforms" - drm, apple and windows.
Since xf86vidmode is a thing only for the drm one, adjust the
preprocessor guards and correctly check for the dependency.
v2: terminate the GLX_USE_WINDOWSGL hunk
Cc: Jon TURNEY <jon.turney@dronecode.org.uk>
Fixes: 5bc509363b ("glx: make xf86vidmode mandatory for direct rendering")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Virglrenderer does the wrong thing when given an instance divisor;
it tries to use the element-index rather than the binding-index as
the argument to glVertexBindingDivisor(). This worked fine as long
as there was a 1:1 relationship between elements and bindings,
which was the case util 19a91841c3 "st/mesa: Use Array._DrawVAO in
st_atom_array.c.".
So let's detect instance divisors, and restore a 1:1 relationship in
that case. This will make old versions of virglrenderer behave
correctly. For newer versions, we can consider making a better
interface, where the instance divisor isn't specified per element,
but rather per binding. But let's save that for another day.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 19a91841c3 "st/mesa: Use Array._DrawVAO in st_atom_array.c."
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Tested-By: Gert Wollny <gert.wollny@collabora.com>
When hile_size is 0, we can't enable HTILE. This doesn't change
anything, except not calling radv_image_alloc_htile().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Feral games aren't affected because they don't decompress DCC.
F1 2018 has one DCC decompression per frame, but I don't see
any performance improvements. This new predicate will be
probably more useful for DCC/MSAA.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
I needed the same functions for v3d. Note that the color value in the
Intel lowering has already been cut down to image.chans num_components.
v2: Drop the half float one, since it was a 1-liner after cleanup.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Most of the bits were constant, but a few were missed. Avoids warnings
from v3d's upcoming static const bits declarations.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Here we rework force_unroll_array_access() so that we can reuse
the induction variable detection in a following patch.
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
This documents a process for using GitLab Merge Requests as an second
way to submit code changes for Mesa. Only one of the two methods is
allowed for each patch series.
We will *not* require all patches to be emailed. Some code changes may
be reviewed and merged without any discussion on the mesa-dev email
list.
v2:
* No longer require email. Allow submitter to choose email or a
GitLab merge request.
* Various feedback from Brian, Daniel, Dylan, Eric, Erik, Jason,
Matt, Michel and Rob.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Rob Clark <robdclark@gmail.com>
This has thrown a few people off recently and it's good to have the
process and all the rational for it documented somewhere. A comment at
the top of nir_inline_functions seems as good a place as any.
Acked-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
After going through the spec changelog, it looks like RADV
is up to date. Note that ANV also reports 1.1.90.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
When I made sure that half-float texture-filtering was required for ES3,
I didn't realize that virgl doesn't report support for this correctly.
This regressed the GLES version available on top of several drivers,
including i965 from 3.2 to 2.0.
This is going to need protocol changes to fix properly, so let's just
restore the previous behavior by enabling floating-point filtering
unconditionally for now.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: fcf9fcee3c "mesa/main: do not require float-texture filtering for es3"
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
The implementation of these opcodes in the generator assumes that their
arguments are packed, and it generates register regions based on that
assumption.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
These are usually used for dealing with sparse resources but there's no
reason why we can't hook them up before we have sparse. We have the
hardware; let's light it up.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We have to lower some shadow instructions because they don't exist in
hardware and we have to lower txb+offset+clamp because the message gets
too big and we run into the sampler message length limit of 11 regs.
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
I don't know if one is better than the other or not but this approach
has the advantage that we never forget to copy information over and
we're not hard-coding quite as many assumptions. It's also a lot
simpler and much less code.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Instead of having to call two different lower_gradient functions based
on whether or not it's a cube, just make lower_gradient handle cubes.
This significantly simplifies some of the logic.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This simple check helps catch bugs early that can end up propagating
into later stages of the compile and triggering strange asserts.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
AoS sampling tries to use integers for coord wrapping when possible,
as it should be faster. However, for AVX, this was suboptimal, because
only floats can use 8x32bit vectors, whereas integers have to be split
into 4x32bit vectors. (I believe part of why it was slower was also
that at least earlier llvm versions had trouble optimizing it properly,
since you can still do simple bit ops with 8x32bit vectors, so a
sequence of int add / and / int add / and with such vectors would
actually end up doing 128bit inserts/extracts between the operations
instead of just doing the cheap 128bit ands.)
Hence, a special float coord wrapping path was added to AoS sampling.
But this path was actually disabled for a long time already, since we
found that just splitting everything before entering the AoS path was
still sligthly faster usually, so none of this float coord wrapping
code was used anymore (AoS sampling code, when avx2 isn't supported,
never sees vectors with length > 4). I thought it might be useful some
day again, but I'm not interested anymore in optimizing for very weird
instruction sets which have support for 256bit vectors for floats but
not for ints, so just drop it.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The commit aa0fed10d3 moved a bunch of Freedreno code to a common
directory. The previous directory had a .dir-locals file for Emacs.
This patch copies it to the new directory as well.
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
LinkedTransformFeedback is normally populated, which had nerf'd varying
packing since the check was introduced.
Fixes: dbd52585fa st/nir: Disable varying packing when doing transform feedback.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The Vulkan working group recently discovered that we made a mistake in
assuming that PCI domains are 16-bit even though they can potentially be
32-bit values. To fix this, the next spec update will change the types
in the VK_EXT_pci_bus_info struct to be 32 bits which will be a
backwards-incompatible change. Normally, Khronos tries very hard to
never make backwards incompatible changes to specs. Hopefully, the
extension is new enough (2 months) that there are no shipping apps which
use the extension so this should be safe.
This commit disables the extension for both anv and radv in mesa and
should be back-ported to 18.3 ASAP so we avoid any potential issues with
new apps running on old drivers. I'll send out a commit (which we can
also back-port to 18.3 if we really care) to re-enable the extension in
both drivers once this week's spec update ships. The one known use of
this extension is internal to mesa and will continue working with the
extension disabled and will naturally update when we get a new header.
Cc: "18.3" <mesa-stable@lists.freedesktop.org>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
There's a few missing and convoluted bits:
- FramebufferTexture2DMultisampleEXT
Missing sanity check, should be desktop="false"
- RenderbufferStorageMultisampleEXT
Missing sanity check, is aliased to RenderbufferStorageMultisample.
Thus it's set only when desktop GL or GLES2 v3.0+, while the extension
is GLES2 2.0+.
If we flip the aliasing we'll break indirect GLX, so loosen the version
to 2.0. Not perfect, yet this is the most sane thing I could think of.
v2: [Emil] Fixup RenderbufferStorageMultisampleEXT, commmit message
Cc: Kristian H. Kristensen <hoegsberg@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108974
Fixes: 1b331ae505 ("mesa: Add core support for EXT_multisampled_render_to_texture{,2}")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Commit b028ce29f0 fixed a typo in
src/freedreno/Makefile.am, but ended up breaking the build for
freedreno. The typo inadvertently made things work, as we were not
supposed to link with libnir or libmesautil to begin with. Those come
in through libmesagallium and the typo prevented the duplicated
linkage.
Fixes: b028ce29f ("freedreno: add the missing _la in libfreedreno_ir3_la")
Cc: Emil Velikov <emil.velikov@collabora.com>
While disassembling the predicate always print flag subregister number
to keep grammar same across the generation for assembler tool.
v2: Combine consecutive format calls (Matt Turner)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
When RepCtrl is set, the swizzle field is ignored by the hardware. In
order to ensure a 1-to-1 correspondence between the human-readable
disassembly and the binary instruction encoding always set the swizzle
to XXXX (all zeros) when it is unused due to RepCtrl
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Just to make it easier to run a nir tests together.
Fixes: a0ae12ca91
("nir/algebraic: Add unit tests for bitsize validation")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Currently we distinguish if the drawable is a window or pixmap by
checking xcb_present_select_input throws an error or not.
Yet, we don't always free the error state returned by xcb.
Cc: Kirill Burtsev <kirill.burtsev@qt.io>
Cc: Boyan Ding <boyan.j.ding@gmail.com>
Fixes: 6bd9ba7d07 ("loader: Add dri3 helper")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
[Emil: add commit message, fixes tag]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Following commits will introduce additional fields such as
guessed_trip_count. Renaming these will help avoid confusion
as our unrolling feature set grows.
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
load_register_imm and load_register_mem take the destination as the
first argument, so I'd like load_register_reg to do the same the sake
of consistency. Otherwise, reading sequences of mixed LRI/LRM/LRR is
needlessly confusing.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
opnd() might delete the passed in instruction, but it's used through
i->srcExists() later in visit
v2: use continue instead return
v3: use brackets for the outer if/else chain
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
this race condition is pretty harmless, but also pretty trivial to fix
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Shaders are usually quite short, and are private to the context. We can
save memory and reduce the work the kernel needs to do at exec time by
packing them together in a stream uploader for long-lived state.
The TFU lets us format raster and SAND images into formats that can be
read by the texture engine, and do mipmap generation.
The UAPI comes from drm-next e69aa5f9b97f ("Merge tag
'drm-misc-next-2018-12-06' of git://anongit.freedesktop.org/drm/drm-misc
into drm-next")
The HW apparently has some issues (or at least a much more complicated VCM
calculation) with non-combined segments, and the closed source driver also
uses combined I/O. Until I get the last CTS failure resolved (which does
look plausibly like some VPM stomping), let's use combined I/O too.
We were exposing ARB_texture_float, but apparently not the OES subset
flag. Fixes regression from GLES3 support to GLES2.
Fixes: fcf9fcee3c ("mesa/main: do not require float-texture filtering
for es3")
In the softpin world, surface state base address may be a fixed 64-bit
address (with no associated BO). It makes sense to store this in the
offset field. But it needs to be the full size.
We also update the clear color address to be consistently uint64_t
everywhere so we can continue passing intel_miptree_get_clear_color
a pointer to the blorp_address's offset field without type mismatches.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Fix an emberrasing memory leak with the non-softpin submit/rb
implementation.
Fixes: f3cc0d2747 freedreno: import libdrm_freedreno + redesign submit
Signed-off-by: Rob Clark <robdclark@gmail.com>
Detect when a component of an (for example) texture fetch is unused and
propagate the updated wrmask back to the parent instruction.
Signed-off-by: Rob Clark <robdclark@gmail.com>
If we have an reloc from stateobjA to stateobjB, we would previously
leave stateobjB's bos out of the submit's bos table. Handle this case
by copying into stateobjA's reloc_bos table.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Linking against LLVM built with BUILD_SHARED_LIBS fails otherwise,
as the component is required for the draw module.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
There is not much to do in freedreno - tile layout and multisample
state for gmem renderings is programmed based on the pfb sample count,
while resolve blits take the destination sample count from the resource.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
In gallium, we model the attachment sample count as a new nr_samples
field in pipe_surface. A driver can indicate support for the extension
using the new pipe cap, PIPE_CAP_MULTISAMPLED_RENDER_TO_TEXTURE.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
This new pipe cap and the new nr_samples field in pipe_surface lets a
state tracker bind a render target with a different sample count than
the resource. This allows for implementing
EXT_multisampled_render_to_texture and
EXT_multisampled_render_to_texture2.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
This also turns on EXT_multisampled_render_to_texture which is a
subset of EXT_multisampled_render_to_texture2, allowing only
COLOR_ATTACHMENT0.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Instead of a single i2b and b2i, we now have i2b32 and b2iN where N is
one if 8, 16, 32, or 64. This leads to having a few more opcodes but
now everything is consistent and booleans aren't a weird special case
anymore.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Suffixes are dropped from a bunch of conversion opcodes when it makes
sense to do so. Others are kept if we really do want the bit-size
restriction.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
All conversion opcodes require a destination size but this makes
constructing certain algebraic expressions rather cumbersome. This
commit adds support to nir_search and nir_algebraic for writing
conversion opcodes without a size. These meta-opcodes match any
conversion of that type regardless of destination size and the size gets
inferred from the sizes of the things being matched or from other
opcodes in the expression.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Instead of using an OrderedDict, just have a (necessarily sorted) array
of transforms and a set of opcodes.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Both of these things are already handled in the Value base class so we
don't need to handle them explicitly in Constant.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
The non-failure path can be tested by just compiling mesa and then
testing it, but the failure paths won't be hit unless you make a mistake,
so it's best to test them with some unit tests.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Before this commit, there were two copies of the algorithm: one in C,
that we would use to figure out what bit-size to give the replacement
expression, and one in Python, that emulated the C one and tried to
prove that the C algorithm would never fail to correctly assign
bit-sizes. That seemed pretty fragile, and likely to fall over if we
make any changes. Furthermore, the C code was really just recomputing
more-or-less the same thing as the Python code every time. Instead, we
can just store the results of the Python algorithm in the C
datastructure, and consult it to compute the bitsize of each value,
moving the "brains" entirely into Python. Since the Python algorithm no
longer has to match C, it's also a lot easier to change it to something
more closely approximating an actual type-inference algorithm. The
algorithm used is based on Hindley-Milner, although deliberately
weakened a little. It's a few more lines than the old one, judging by
the diffstat, but I think it's easier to verify that it's correct while
being as general as possible.
We could split this up into two changes, first making the C code use the
results of the Python code and then rewriting the Python algorithm, but
since the old algorithm never tracked which variable each equivalence
class, it would mean we'd have to add some non-trivial code which would
then get thrown away. I think it's better to see the final state all at
once, although I could also try splitting it up.
v2:
- Replace instances of "== None" and "!= None" with "is None" and
"is not None".
- Rename first_src to first_unsized_src
- Only merge the destination with the first unsized source, since the
sources have already been merged.
- Add a comment explaining what nir_search_value::bit_size now means.
v3:
- Fix one last instance to use "is not" instead of !=
- Don't try to be so clever when choosing which error message to print
based on whether we're in the search or replace expression.
- Fix trailing whitespace.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Nothing to do, the compiler already handles that.
All new dEQP.VK.ubo.* and dEQP.VK.ssbo.* pass, except some
16-bit tests that are quite related to fdo bug #108114.
Only enable the extension on CIK+ because it might not work on SI.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The original code was modifying the global drisw_lf variable, which is bad
when there are multiple contexts in single process, each initialized with
different loader. One may support put_image_shm and the other not.
Since there are currently only two possible combinations, lets create two
global tables, one for each. Lets make them const, since we won't change them
and they can be shared.
This fixes crash in VLC. It used two GL contexts (each in different thread), one
was initialized by its Qt GUI, the other by its video output plugin. The first
one set the put_image_shm=drisw_put_image_shm, the second did not, but
since the same structure was used, the drisw_put_image_shm was used too. Then
it crashed because the second loader did not have putImageShm set.
Downstream bug:
https://bugzilla.opensuse.org/show_bug.cgi?id=1113533
v2: Added Fixes and described the VLC bug.
Fixes: 63c427fa71 ("drisw: use putImageShm if available")
Signed-off-by: Michal Srb <msrb@suse.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
In case we are unlucky if the low part is 0xffffffff.
Fixes: 5d6a560a29 ("radv: do not use the availability bit for timestamp queries")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
If the driver used a compute shader for resetting a query pool,
it should be completed when caches are flushed.
This might reduce the number of stalls if operations are done
between vkCmdResetQueryPool() and vkCmdBeginQuery()
(or vkCmdWriteTimestamp()).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Alex Smith <asmith@feralinteractive.com>
After investigating on this, it appears that COND_WRITE doesn't
work correctly in some situations. I don't know exactly why does
it fail to update DB_Z_INFO.ZRANGE_PRECISION, but as AMDVLK
also uses COND_EXEC I think there is a reason.
Now the driver stores a new metadata value in order to reflect
the last fast depth clear state. If a TC-compat HTILE is fast cleared
with 0.0f, we have to update ZRANGE_PRECISION to 0 in order to
work around that hardware bug.
This fixes rendering issues with The Forest and DXVK and doesn't
seem to introduce any regressions.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108914
Fixes: 68dead112e ("radv: update the ZRANGE_PRECISION value for the TC-compat bug")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Technically speaking, this validation was incorrect, because GL_RGB565
is only supported in OpenGL ES 1.x if OES_framebuffer_object is
supported. This couldn't lead to any real incorrect behavior, because
all drivers support OES_framebuffer_object. But let's keep the code
self-documenting, by correcting the check as per the spec.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
For format fallbacks like ETC and ASTC, switching between sRGB and linear
decoding is undefined, or at least is not bit-exact. Same as
EXT_texture_sRGB_decode on GLES.
There are no piglit or dEQP regresssions.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
1. tools/i965_disasm.c:58:4: warning:
ignoring return value of ‘fread’,
declared with attribute warn_unused_result
fread(assembly, *end, 1, fp);
v2: Fixed incorrect return value check.
( Eric Engestrom <eric.engestrom@intel.com> )
v3: Zero size file check placed before fread with exit()
( Eric Engestrom <eric.engestrom@intel.com> )
v4: - Title is changed.
- The 'size' variable was moved to top of a function scope.
- The assertion was replaced by the proper error handling.
- The error message on a caller side was fixed.
( Eric Engestrom <eric.engestrom@intel.com> )
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
../src/intel/tools/aubinator_error_decode.c: In function ‘instdone_register_for_ring’:
../src/intel/tools/aubinator_error_decode.c:177:4: warning: enumeration value ‘I915_ENGINE_CLASS_INVALID’ not handled in switch [-Wswitch]
switch (class) {
^~~~~~
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
It doesn't seem like the exact number has too much effect on the
performaince in "teximage". However setting it to just about anything
prevents some OOMs from getting hit. These values are not well-tuned,
but don't seem too bad.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
All TXF operations implicitly use sampler 0, and fail if it's not bound
to anything. This does not happen in LINKED_TSC mode, but we don't
currently use this.
We ensure that TSC entry at id 0 has the SRGB conversion bit enabled
(and all samplers we normally generate will too). Then when the TSC at
*slot* 0 (not to be confused with entry 0 in the global TSC table) is
unbound, we bind it to entry 0. This way, TXF operations are not
dependent on there being a regular sampler bound there.
Fixes arb_texture_buffer_object-subdata-sync among others. (TBO's are
particularly susceptible to this as they don't bind a sampler.)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This fixes some crucible 3d miptree tests I've been working on
when executed using the compute shader path.
Fixes: d08f267814 (radv/gfx9: fix 3d image to image transfers on compute queues.)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
These days we don't always allocate scanout compatible textures anymore.
That does mean we have to fix the radv android WSI though.
Fixes: b1444c9ccb "radv: Implement VK_ANDROID_native_buffer."
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This extension is not properly tested (testing for
GL_ARB_fragment_shader_interlock is not sufficient), and since this was
noted in review on August 28th no tests have been sent.
Revert "i965: Add INTEL_fragment_shader_ordering support."
Revert "mesa: Add GL/GLSL plumbing for INTEL_fragment_shader_ordering"
This reverts commit 03ecec9ed2.
This reverts commit 119435c877.
Cc: mesa-stable@lists.freedesktop.org
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Eric Anholt <eric@anholt.net>
The OpenGL ES 3.0 specification, table 3.13 lists half-float textures as
filterable, but not float textures. So we shouldn't depend on
ARB_float_texture, which requires full filtering support for both.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
sRGB textures is a requirement for OpenGL ES 3.0, so let's make sure
we don't incorrectly enable a too high version.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
OpenGL ES 3.0 require this functionality, so we should also test for it
to avoid incorrectly exposing a too high GLES version.
On desktop, this has been required since all the way back in OpenGL 1.2
anyway.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
On OpenGL ES 2.0, there's separate extensions adding support for
half-float and float textures. So we need to validate the enums
separately as well.
This also prevents these enums from incorrectly being allowed on
OpenGL ES 1.x, where there's no extension that enables this in the
first place.
While we're at it, remove the pointless default-case, and the seemingly
stale fallthrough comment.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
ctx->Extensions.EXT_texture_sRGB_R8 is set regardless of the API
that's used, so checking for those direcly will always allow the
enums from this extensions when they are supported by the driver.
There's no extension adding support for this on OpenGL ES before
version 3.0, so let's tighten the check.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
ctx->Extensions.EXT_texture_sRGB is set regardless of the API that's
used, so checking for those direcly will always allow the enums from
this extensions when they are supported by the driver.
There's no extension adding support for this on OpenGL ES before
version 3.0, so let's tighten the check.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
ctx->Extensions.EXT_texture_snorm is set regardless of the API
that's used, so checking for those direcly will always allow the
enums from this extensions when they are supported by the driver.
There's no extension adding support for this on OpenGL ES before
version 3.0, so let's tighten the check.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
ctx->Extensions.OES_texture_float is set regardless of the API
that's used, so checking for those direcly will always allow the
enums from this extensions when they are supported by the driver.
There's no extension enabling floating-point textures for OpenGL
ES 1.x, so we shouldn't allow those enums there.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
ctx->Extensions.EXT_texture_type_2_10_10_10_REV is set regardless of
the API that's used, so checking for those direcly will always enable
extensions when they are supported by the driver.
There's no corresponding extension for OpenGL ES 1.x/2.0, so we
shouldn't allow these enums there.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This extension requies OpenGL, and shouldn't be available on OpenGL ES.
So let's not allow the enums from it either.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
ctx->Extensions.EXT_texture_shared_exponent is set regardless of the
API that's used, so checking for those direcly will always allow the
enums from this extensions when they are supported by the driver.
We also need to make sure this is enabled on OpenGL ES 3. Because the
check is repeated, let's introduce a helper.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
EXT_packed_float isn't supported on OpenGL ES, we shouldn't allow
these enums there, before OpenGL ES 3.0 which also introduce support
for these enums.
Since this check is repeated a lot, let's make a helper for this.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
EXT_packed_float isn't supported on OpenGL ES, we shouldn't allow
these enums there, before OpenGL ES 3.0 which also introduce support
for these enums.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Floating-point depth buffers are only supported on OpenGL 3.0, OpenGL ES
3.0, or if ARB_depth_buffer_float is supported. Because we checked a
driver capability rather than using an extension-check helper, we ended
up incorrectly allowing this on OpenGL ES 1.x and 2.x.
Since this logic is repeated, let's make a helper for it.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Integer textures shouldn't be implicitly exposed on OpenGL ES 1.x and
2.x, but because the code checked against a driver-capability rather
than using an extension-check helper, we ended up accidentally allowing
these enums on older versions when the driver supports it.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
ARB_texture_rgb10_a2ui isn't supported on OpenGL ES, we shouldn't expose
it there even if the driver supports it.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
ctx->Extensions.ARB_texture_stencil8 is set regardless of the API
that's used, so checking for those direcly will always allow the
enums from this extensions when they are supported by the driver.
So let's instead check for both ARB_texture_stencil8 and
OES_texture_stencil8, so we support depth textures on OpenGL and
OpenGL ES 2.0+. There's no extension enabling stencil-textures for
OpenGL ES 1.x, so we shouldn't allow those enums there.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
ctx->Extensions.ARB_depth_texture is set regardless of the API that's
used, so checking for those direcly will always allow the enums from
this extensions when they are supported by the driver.
So let's instead check for both ARB_depth_texture and OES_depth_texture,
so we support depth textures on OpenGL and OpenGL ES 2.0+. There's no
extension enabling depth-textures for OpenGL ES 1.x, so we shouldn't
allow those enums there.
This fixes oes_packed_depth_stencil-depth-stencil-texture_gles1 on i965
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
ctx->Extensions.KHR_texture_compression_astc_ldr is set regardless of
the API that's used, so checking for those direcly will always enable
extensions when they are supported by the driver.
But there's no extension enabling ASTC for OpenGL ES 1.x, so we
shouldn't allow those enums there.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
ctx->Extensions.ARB_ES3_compatibility is set regardless of the API
that's used, so checking for those direcly will always enable
extensions when they are supported by the driver.
But there's no extension enabling ETC2 for OpenGL ES 1.x, so we
shouldn't allow those enums there.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There's no extension enabling S3TC formats on OpenGL ES 1.x, so we
shouldn't allow these even if the driver can support it. So let's check
for EXT_texture_compression_s3tc instead of ANGLE_texture_compression_dxt,
which is supported on all other OpenGL variations.
We also need to use _mesa_has_EXT_texture_compression_s3tc() instead of
checking the driver cap directly, otherwise we end up enabling this on
OpenGL ES 1.x, as the API isn't checked.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
_mesa_has_FOO_bar() knows about the APIs these extensions should be
supported under, so let's use that to simplify these checks a bit.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This makes the logic a little bit easier to follow, and reduce a bit of
repetition.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This makes the logic a little bit easier to follow; this is *either*
about ES2 compatibility *or* about gles. GL_RGB565 was added already in
OpenGL ES 1.0.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Using the _mesa_has_FOO_bar helpers is generally more safe and should
generally be prefered over checking driver-caps like this code did,
because the _mesa_has_FOO_bar helpers also verify the API type and
version.
This shouldn't have any practical effect here, as this function only
gets called for OpenGL ES 3.x right now. But if this was to change in
the future, this makes the function behave a lot more predictable.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
S3_s3tc is the extension that enables this functionality on desktop, so
let's check for that one. The _mesa_has_S3_s3tc() helper already
verifies the API according to the extension-table.
As for the second hunk, we currently already only expose
EXT_texture_compression_s3tc on desktop so by using the helper instead,
we get rid of this detail here, and once we enable it for GLES we'll
automaticall get the interaction right.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
_mesa_es3_error_check_format_and_type isn't specific to OpenGL ES 3.x,
it applies to all versions of OpenGL ES. So let's rename it to reflect
this.
While we're at it, let's also rename a helper function it uses similarly.
As the helper is static, we can also remove the namespacing-prefix from
the name.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
All other _mesa_has_foo functions return bool rather than GLboolean, so
let's follow that style here as well.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The new approach is that samplers don't get unbound even if they won't be used
in a draw and we should just leave them be as well.
Fixes a regression in multiple windows games using gallium nine and nouveau.
v2: adjust num_samplers to keep track of the highest sampler bound
v3: rework how to set the new value of num_samplers
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106577
Fixes: 4d6fab245e
"cso: don't track the number of sampler states bound"
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Fixes -Wstringop-truncation compiler warnings.
See f836d799f9 "intel/decoder: use snprintf(..., "%s", ...) instead of strncpy"
Signed-off-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Axel Davy <davyaxel0@gmail.com>
Android has cpufeatures library but pinning of threads is not supported
PIPE_OS_LINUX code path causes build error due to sched_getcpu() unavailable
thus we need to avoid setting HAVE_SCHED_GETCPU for Android
Fixes: 48f2160 ("st/mesa: regularly re-pin driver threads to the CCX where the app thread is")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This patch fixes this build error.
CC tests/xvmc_bench.o
In file included from tests/xvmc_bench.c:35:
tests/testlib.h:38:10: fatal error: 'X11/Xlib.h' file not found
^~~~~~~~~~~~
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
We can mark the buffer unclean if it's ever bound as a TBO,
SSBO, ABO, or image.
This improves
dEQP-GLES3.performance.buffer.data_upload.function_call.map_buffer_range.new_specified_buffer.flag_write_full.stream_draw
from 9.58 MB/s to 451.17 MB/s.
v2: Track buffer cleanliness as a function of bindings (Ilia).
v3: virgl_modify_clean --> virgl_dirty_res (Erik)
Tested-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
We flush everytime the command buffer (16 kB) is full, which is
quite costly.
This improves
dEQP-GLES3.performance.buffer.data_upload.function_call.buffer_data.new_buffer.usage_stream_draw
from 111.16 MB/s to 1930.36 MB/s.
In addition, I made the benchmark produce buffers from 0 --> VIRGL_MAX_CMDBUF_DWORDS * 4,
and tried ((VIRGL_MAX_CMDBUF_DWORDS * 4) / 2), ((VIRGL_MAX_CMDBUF_DWORDS * 4) / 4), etc.
I didn't notice any clear differences, so let's just go with the most obvious
heuristic.
Tested-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Tested running WebGL aquarium on Nvidia host (10,000 fishes)
This moves us from 7 fps to 9 fps. After quadrupling, performance
gains diminish.
v2: Remove change ID (Erik)
Tested-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Pipeline state pending bits should be taken into account when copying
results.
In the particular bug below, the results of the
vkCmdCopyQueryPoolResults() command was being overwritten by the
preceding vkCmdCopyBuffer() with a same destination buffer. This is
because we copy the buffers using the 3D pipeline whereas we copy the
query results using the command streamer. Those pieces of HW work in
parallel and the results are somewhat undefined.
v2: Unconditionally flush the pipeline before copying the results
(Jason)
v3: Wrap & expressions (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108894
Cc: mesa-stable@lists.freedesktop.org
The calculated length of a line may be infinite, if the coords we
get are bogus. This leads to an infinite loop in line stippling.
To prevent this test for this explicitly (although technically
on at least x86 sse it would actually work without the explicit
test, as long as we use the int-converted length value).
While here also get rid of some always-true condition.
Note this does not actually solve the root cause, which is that
the coords we receive are bogus after clipping. This seems a difficult
problem to solve. One issue is that due to float arithmetic, clip w
may become 0 after clipping if the incoming geometry is
"sufficiently degenerate", hence x/y/z ndc (and window) coords will
be all inf (or nan). Even with w not quite 0, I believe it's possible
we produce values which are actually outside the view volume.
(Also, x=y=z=w=0 coords in clipspace would be not considered subject
to clipping, and similarly result in all NaN coords.) We just hope for
now other draw stages (and rasterizers) can handle those relatively
safely (llvmpipe itself should be sort of robust against this, certainly
converstion to fixed point will produce garbage, it might fail a couple
assertions but should neither hang nor crash otherwise).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Update to the internal master as of 2018-11-15.
This has a lot of gratuitous whitespace change, but on the plus
side it's built using the same tooling that's used for AMDVLK,
which should help going forward.
Our choices here are simply redundant as long as sin.flags is set
correctly.
(v2:
- remove unused function parameter)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
If all layers are bound we can perform a fast color or depth clear
instead of iterating over all layers. This has the advantage
to avoid trashing the framebuffer for nothing if you we end up by
doing a fast clear when calling radv_clear_image_layer(), and
clearing all layers in one shot is obviously faster.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fix build error.
CXXLD pipe_msm.la
../../../../src/gallium/drivers/freedreno/.libs/libfreedreno.a(freedreno_batch.o): In function `batch_init':
src/gallium/drivers/freedreno/freedreno_batch.c:54: undefined reference to `fd_device_version'
src/gallium/drivers/freedreno/freedreno_batch.c:59: undefined reference to `fd_submit_new'
src/gallium/drivers/freedreno/freedreno_batch.c:61: undefined reference to `fd_submit_new_ringbuffer'
src/gallium/drivers/freedreno/freedreno_batch.c:64: undefined reference to `fd_submit_new_ringbuffer'
src/gallium/drivers/freedreno/freedreno_batch.c:66: undefined reference to `fd_submit_new_ringbuffer'
src/gallium/drivers/freedreno/freedreno_batch.c:70: undefined reference to `fd_submit_new_ringbuffer'
Fixes: b4476138d5 ("freedreno: move drm to common location")
Fixes: aa0fed10d3 ("freedreno: move ir3 to common location")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Sadly, the 3 games I tested (DeusEx:MD, DiRT Rally, DOTA 2) are unaffected
by the overallocation, because I guess their buffers don't fall into
the small range below a power-of-two size.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
- the slab buffer size increased from 128 KB to 2 MB (PTE fragment size)
- the max suballocated buffer size increased from 64 KB to 256 KB,
this increases memory usage because it wastes memory
- the number of suballocators increased from 1 to 3 and they are layered
on top of each other to minimize unused space in slabs
The final increase in memory usage is:
DeusEx:MD: 1.8%
DOTA 2: 1.75%
DiRT Rally: 0.2%
The kernel driver will also receive fewer buffers.
Vulkan and Gallium don't use Mesa's gl_program data structure, so they
can't poke at 'prog'. But we can simply use the copy of the shader info
stored with the NIR shader, which is guaranteed to exist.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Otherwise, I get this error:
main/egldevice.h:54:13: error: ‘NULL’ undeclared (first use in this function)
dev = NULL;
^~~~
with this config:
./autogen.sh --enable-gles1 --enable-gles2 --with-platforms='surfaceless' --disable-glx
--with-dri-drivers="i965" --with-gallium-drivers="" --enable-gbm
v3: Use stddef.h (Matt)
v4: Modify commit message (Eric)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
The common truncf(x + 0.5) fails for the floating-point value just less
than 0.5 (nextafterf(0.5, 0.0)). nextafterf(0.5, 0.0) + 0.5, after
rounding is 1.0, thus truncf does not produce the desired value.
The solution is to add nextafterf(0.5, 0.0) instead of 0.5 before
truncating. This works for all values.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Introduce a new driver-private transfer flag RADEON_TRANSFER_TEMPORARY
that specifies whether the caller will use buffer_unmap or not. The
default behavior is set to permanent maps, because that's what drivers
do for Gallium buffer maps.
This should eliminate the need for hacks in libdrm. Assertions are added
to catch when the buffer_unmap calls don't match the (temporary)
buffer_map calls.
I did my best to update r600 for consistency (r300 needs no changes
because it never calls buffer_unmap), even though the radeon winsys
ignores the new flag.
As an added bonus, this should actually improve the performance of
the normal fast path, because we no longer call into libdrm at all
after the first map, and there's one less atomic in the winsys itself
(there are now no atomics left in the UNSYNCHRONIZED fast path).
Cc: Leo Liu <leo.liu@amd.com>
v2:
- remove comment about visible VRAM (Marek)
- don't rely on amdgpu_bo_cpu_map doing an atomic write
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Originally the driver reported GL_FRAMEBUFFER_UNSUPPORTED in all cases,
adding more specific error messages was not correct and broke many tests.
Mostly revert this and only report GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT
for MESA_FORMAT_R_SRGB8.
Fixes: ebcde34545
i965: be more specific about FBO completeness errors
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108805
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The format is emulated by using ISL_FORMAT_L8_SRGB, therefore we need to
force swizzles for the GBA channels. However, doing this only based on the
data type GL_RED breaks other formats, therefore, test specifically for the
format.
Fixes: c5363869d4
i965: Force zero swizzles for unused components in GL_RED and GL_RG
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
vtest doesn't implement the according API and would segfault:
Program received signal SIGSEGV, Segmentation fault.
#0 0x0000000000000000 in ?? ()
#1 in virgl_fence_server_sync at
src/gallium/drivers/virgl/virgl_context.c:1049
#2 in st_server_wait_sync at
src/mesa/state_tracker/st_cb_syncobj.c:155
so just don't do the call when the function pointers are not set.
Fixes dEQP:
dEQP-GLES3.functional.fence_sync.wait_sync_smalldraw
dEQP-GLES3.functional.fence_sync.wait_sync_largedraw
Fixes: d1a1c21e76
virgl: native fence fd support
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Robert Foss <robert.foss@collabora.com>
Avoids:
Conditional jump or move depends on uninitialised value(s)
at 0x9E2B39F: virgl_vtest_winsys_resource_cache_create (virgl_vtest_winsys.c:379)
by 0x9E2725F: virgl_buffer_create (virgl_buffer.c:169)
by 0x9E246D5: virgl_resource_create (virgl_resource.c:60)
by 0xA0C1B9F: bufferobj_data (st_cb_bufferobjects.c:344)
by 0xA0C1B9F: st_bufferobj_data (st_cb_bufferobjects.c:390)
by 0x9F4ACE3: vbo_use_buffer_objects (vbo_exec_api.c:1136)
by 0xA0C68C3: st_create_context_priv (st_context.c:416)
by 0xA0C707A: st_create_context (st_context.c:598)
by 0x9F81C6B: st_api_create_context (st_manager.c:918)
by 0x9BBE591: dri_create_context (dri_context.c:161)
by 0x9BB6931: driCreateContextAttribs (dri_util.c:473)
by 0x4E97A44: drisw_create_context_attribs (drisw_glx.c:630)
by 0x4E7C591: glXCreateContextAttribsARB (create_context.c:78)
Uninitialised value was created by a stack allocation
at 0x9E2B249: virgl_vtest_winsys_resource_cache_create (virgl_vtest_winsys.c:342)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Robert Foss <robert.foss@collabora.com>
We normally call with stderr which is unbuffered, so this won't affect
that, but it does let me call nir_print_shader(nir, fopen("log", "w+"))
from gdb and actually get the whole shader in my file.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
I've been using this with the kmsro series to test v3d on VKMS without my
old KMS hack in the v3d kernel driver. KMSRO still needs some cleanup,
but v3d RO support seems reasonable.
Improves performance in Talos by about 15% (and significant improvements
in RotR and possibly other but did not bench with final patch) on
kernel 4.19 and earlier.
On 4.20+ a similar effect comes from
433ca054949a "drm/amdgpu: try allocating VRAM as power of two"
v2: Do not impact the alignment of the physical memory.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
CC: <mesa-stable@lists.freedesktop.org>
Since 1285f71d3e landed, it needs to provide apps with proper sample
position for MSAA.
Currently no way to query this to hw, these are taken from blob driver.
Fixes: dEQP-GLES31.functional.texture.multisample.samples_#.sample_position
Signed-off-by: Rob Clark <robdclark@gmail.com>
We set equiv bit in SP_FS_CTRL_REG0. Somehow the hw doesn't hang with
this mismatched config, but does run slower. It is faster with either
neither bit set, or both bits set, but both is the fastest of the three
configurations. Worth a bit over 10% gain in glmark2.
Spotted-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Rob Clark <robdclark@gmail.com>
blip_fp uses GENERIC as input, so blit_vp should match for linking
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Adds all missing texture related logic. For everything to work it also
needs changes to ir2/fd2_program, which are part of the ir2 update patch.
Note: it needs rnndb update
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
[remove stray patch]
Signed-off-by: Rob Clark <robdclark@gmail.com>
200: 256KiB GMEM A200 (imx53)
201: 128KiB GMEM A200 (imx51)
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Rob Clark <robdclark@gmail.com>
On older gens, the CLIP_ADJ bitfields were actually 3.6 fixed point.
Which might make more sense. Although this formula comes up with values
pretty close to what blob does for various viewport sizes (for at least
a5xx and a6xx), and seems to work.
Signed-off-by: Rob Clark <robdclark@gmail.com>
f6131d4ec7 had the side effect of enabling LRZ w/ 32b depth buffers.
But there are some bugs with this, which aren't fully understood yet,
so for now just skip LRZ w/ z32..
Fixes: f6131d4ec7 freedreno/a6xx: Clear z32 and separate stencil with blitter
Signed-off-by: Rob Clark <robdclark@gmail.com>
We generate an IB to clear the gmem at flush time and jump to it
before rendering each tile. This lets us get rid of the command stream
patching for gmem offsets.
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
For RGB surfaces (for example) we don't really care that the colormask
is 0x7 instead of 0xf. This should not trigger clear_with_quad()
slowpath.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
If we can't clear all the buffers with pctx->clear() (say, for example,
because of ColorMask), push the buffers we *can* clear with pctx->clear()
first. Tilers want to see clears coming before draws to enable fast-
paths, and clearing one of the attachments with a quad-draw first
confuses that logic.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Move (most of) the ir3 compiler to src/freedreno/ir3 so that it can be
re-used by some future vulkan driver. The parts that are gallium
specific have been refactored out and remain in the gallium driver.
Getting the move done now so that it can happen before further
refactoring to support a6xx specific instructions.
NOTE also removes ir3_cmdline compiler tool from autotools build since
that was easier than fixing it and I normally use meson build. Waiting
patiently for the day that we can remove *everything* from the autotools
build.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Split the parts that are gallium specific into ir3_gallium so the rest
can move to a common location outside of gallium.
Signed-off-by: Rob Clark <robdclark@gmail.com>
A bit annoying to have to copy into our own struct. But this is
something the compiler really needs to know, at least on earlier
generations where streamout is implemented in shader.
Signed-off-by: Rob Clark <robdclark@gmail.com>
So I can drop env2u() helper from freedreno_util.h and get rid of one
small ir3 dependency on gallium/freedreno
Signed-off-by: Rob Clark <robdclark@gmail.com>
Just massive search/replace for the most part.
Step towards removing ir3 dependency on disasm.h which is shared by
a2xx. One step closer to being able to move ir3 out of gallium.
Signed-off-by: Rob Clark <robdclark@gmail.com>
So that we can re-use at least parts of it for vulkan driver, and so
that we can move ir3 to a common location (which uses fd_bo to allocate
storage for shaders)
Signed-off-by: Rob Clark <robdclark@gmail.com>
Prep work to move drm to a common location.
Slightly hacky, but the softpin debug flag is only temporary.
Signed-off-by: Rob Clark <robdclark@gmail.com>
The compiler doesn't know that ny != 0, so x might be uninitialized for
the printf at the end.
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
L3 allocation table in h/w specification recommends using 4 KB
granularity for programming allocation fields in L3CNTLREG.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
L3 allocation table in h/w specification recommends using 4 KB
granularity for programming allocation fields in L3CNTLREG.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Use L3 configuration specified in h/w specification.
V2: Drop configs which do under allocation of l3 cache.
Bump up the comment above table.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Scons and autotools don't define it, and as of last commit nothing
uses it.
`VERSION` is also a generic enough name that something somewhere will
eventually clash, and we don't want to repeat the LLVM `DEBUG` fiasco.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Everything else uses PACKAGE_VERSION, so let's be consistent, and
VERSION and PACKAGE_VERSION are currently defined to be the same in
meson and android, while VERSION is undefined in autotools and scons.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Per chapter 3.2 "Instances":
> Providing a NULL VkInstanceCreateInfo::pApplicationInfo or providing
> an apiVersion of 0 is equivalent to providing an apiVersion of
> VK_MAKE_VERSION(1,0,0).
Reported-by: Niklas Haas <git@haasn.xyz>
Fixes: 8c048af589 "anv: Copy the appliation info into the instance"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
If glGetTexImage or glGetnTexImage is called with a level that doesn't
exist, we get an error message on this form:
Mesa: User error: GL_INVALID_VALUE in glGetTexImage(depth = 0)
This is clearly nonsensical, because these APIs don't even have a
depth-parameter. The reason is that get_texture_image_dims() return
all-zero dimensions for non-existent texture-images, and we go on to
validate these dimensions as if they were user-input, because
glGetTextureSubImage requires checking.
So let's split this logic in two, so glGetTextureSubImage can have
stricter input-validation. All arguments that are no longer validated
are generated internally by mesa, so there's no use in validating them.
Fixes: 42891dbaa1 "gettextsubimage: verify zoffset and depth are correct"
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
This check is the only part of dimensions_error_check that isn't about
error-checking the offset and size arguments of
glGet[Compressed]TextureSubImage(), so it doesn't really belong in here.
This doesn't make a difference right now, apart for changing the
presedence of this error. But it will make a difference for the next
patch, where we no longer call this method from the non-sub tex-image
getters.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
This error checking is the same for teximage and texsubimage getters, so
let's factor it out to its own function.
This will be useful when getteximage and gettexsubimage gets their own
error checking routines a bit later.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
This will be useful when we split error-checking for getteximage and
gettexsubimage later.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
The explanation quotes the spec on the following wording to justify the
error:
"An INVALID_VALUE error is generated if xoffset + width is greater than
the texture’s width, yoffset + height is greater than the texture’s
height, or zoffset + depth is greater than the texture’s depth."
However, this shouldn't generate an error in the case where *all three*
of width, xoffset and the texture's width are zero. In this case, we end
up generating an unspecified error.
So let's remove this check, and instead make sure that we consider this
as an empty texture.
So let's not generate an error, there's non mandated in the spec in
xoffset/yoffset/zoffset = 0 case. We already avoid doing any work in
this case, because of the final, non-error generating check in this
function.
Fixes: b37b35a5d2 "getteximage: assume texture image is empty for non defined levels"
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
This function has been core since OpenGL 4.3, so naming the
implementation and reporting erros using an ARB-suffix can be
confusing.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
When a shader program is de-serialized the gl_shader_program passed in
may actually still hold memory allocations for the transform feedback
varyings. If that is the case, free the varying names and reallocate
the new storage for the names array.
This fixes a memory leak:
Direct leak of 48 byte(s) in 6 object(s) allocated from:
in malloc (/usr/lib64/gcc/x86_64-pc-linux-gnu/7.3.0/libasan.so+0xdb880)
in transform_feedback_varyings ../../samba/mesa/src/mesa/main/transformfeedback.c:875
in _mesa_TransformFeedbackVaryings ../../samba/mesa/src/mesa/main/transformfeedback.c:985
...
Indirect leak of 42 byte(s) in 6 object(s) allocated from:
in __interceptor_strdup (/usr/lib64/gcc/x86_64-pc-linux-gnu/7.3.0/libasan.so+0x761c8)
in transform_feedback_varyings ../../samba/mesa/src/mesa/main/transformfeedback.c:887
in _mesa_TransformFeedbackVaryings ../../samba/mesa/src/mesa/main/transformfeedback.c:985
Fixes: ab2643e4b0
glsl: serialize data from glTransformFeedbackVaryings
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
We used the layer count which results in an off by one error.
Not sure this really affects anything.
Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver"
Reviewed-by: Dave Airlie <airlied@redhat.com>
Use VAO binding information in feedback rendering. In theory
it should reduce the amount of buffer objects scheduled for rendering.
Feedback rendering is implemented in a crude way anyhow, so I do not
expect much gain here. But for the sake of code reuse we should
use the same code for the same task. And finally if feeback rendering
may get improved the array setup is already well done there.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
The change removes the reference that is held on the entries of the
vbuffers[] array. The new code does not do that anymore as following
the code into draw_set_vertex_buffers() the draw context holds an
other reference as long as it is reset down the function again.
So it should be already by that argument save to remove that
additional reference count.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Factor out vertex array setup routines from the array state atom.
The factored functions will be used in feedback rendering in the
next change.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
In st_atom_array, we only need to unmap the upload buffer that
was actually used.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
In st_atom_array, we only need to care for unmapping the upload buffer
if we actually used it.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
dnz flag only applies for multiplications (e.g. to make 0 * Infinity
becomes 0 instead of NaN). Once we optimize a MAD into an ADD, the dnz
flag no longer makes sense, and upsets the GM107 emitter (since it looks
at the ftz and dnz flags together).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
We don't need to flush anything before these two commands as well.
This is because they have to be externally synchronized, so the
app should have called CmdPipelineBarrier() prior to that and the
driver should have flushed the caches.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The rules encoded in this code also applies to OpenGL ES 3.0 and up,
but the per-enum validation has already been taught about these rules.
So let's get rid of this duplicate, narrow version of the validation.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
ctx->Extensions.ARB_timer_query is set based on the driver-
capabilities, not based on the context type. We need to check
against _mesa_has_ARB_timer_query(ctx) instead to figure out
if the extension is really supported. We also need to check for
EXT_disjoint_timer_query for GLES-support.
This shouln't have any functional effect, as this entry-point is only
valid on desktop GL, or on GLES with EXT_disjoint_timer_query in the
first place. But if this gets added to the core of a future version
of ES, this should be a step in the right direction.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
ctx->Extensions.ARB_query_buffer_object is set based on the driver-
capabilities, not based on the context type. We need to check against
_mesa_has_ARB_query_buffer_object(ctx) instead to figure out if the
extension is really supported.
This turns attempts to read queries into buffer objects on ES 3 into
errors, as required by the spec.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
ctx->Extensions.ARB_transform_feedback_overflow_query is set based on
the driver-capabilities, not based on the context type. We need to
check against _mesa_has_RB_transform_feedback_overflow_query(ctx)
instead to figure out if the extension is really supported.
This turns usage of GL_TRANSFORM_FEEDBACK_STREAM_OVERFLOW and
GL_TRANSFORM_FEEDBACK_OVERFLOW into errors on ES 3, as required by the
spec.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
ctx->Extensions.EXT_transform_feedback is set based on the driver-
capabilities, not based on the context type. We need to check against
_mesa_has_EXT_transform_feedback(ctx) instead to figure out if the
extension is really supported. We also need to check for
OES_geometry_shader.
This turns usage of GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN into an
error on ES 2, as well as usage of GL_PRIMITIVES_GENERATED on ES 3, both
as required by the spec.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
ctx->Extensions.EXT_timer_query is set based on the driver-
capabilities, not based on the context type. We need to check against
_mesa_has_EXT_timer_query(ctx) instead to figure out if the extension
is really supported. We also need to check for
EXT_disjoint_timer_query, which enables the same functionality for ES.
This turns usage of GL_TIME_ELAPSED into an error on ES 3, as is
required by the spec.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
ctx->Extensions.ARB_ES3_compatibility is set based on the driver-
capabilities, not based on the context type. We need to check against
_mesa_has_ARB_ES3_compatibility(ctx) instead to figure out if the
extension is really supported.
In addition, EXT_occlusion_query_boolean should also allow this
behavior.
This shouldn't cause any functional change, as all drivers that support
ES3_compatibility should in practice enable either ES3_compatibility or
EXT_occlusion_query_boolean under all APIs that export this symbol.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
ctx->Extensions.ARB_occlusion_query2 is set based on the driver-
capabilities, not based on the context type. We need to check against
_mesa_has_ARB_occlusion_query2(ctx) instead to figure out if the
extension is really supported.
In addition, EXT_occlusion_query_boolean should also allow this
behavior.
This shouldn't cause any functional change, as all drivers that support
ARB_occlusion_query2 should in practice enable either
ARB_occlusion_query2 or EXT_occlusion_query_boolean under all APIs that
export this symbol.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
ctx->Extensions.ARB_occlusion_query is set based on the driver-
capabilities, not based on the context type. We need to check against
_mesa_has_ARB_occlusion_query(ctx) instead to figure out if the
extension is really supported. We also need to check for
ARB_occlusion_query2, as ARB_occlusion_query isn't available in core
contexts.
This turns usage of GL_SAMPLES_PASSED into an error on ES 3, as is
required by the spec.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The _mesa_has_ARB_pipeline_statistics_query(ctx)-helper will already
check the GLES-version according to the extension-table, so if this
extension would ever be back-ported to ES, we only need to update the
table to support this.
This shouln't have any functional effect.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
These enums all have the same values as their non-prefixed versions, and
there's several aliases for some of them. So let's switch to the
non-prefixed versions for simplicity.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
According to the extension spec, this was initially released in 2011,
so let's set this to the correct value.
The value of 2001 could be a copy-paste mistake, as ARB_occlusion_query
which this is based on was released then.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
EXT_occlusion_query_boolean require support for GL_ANY_SAMPLES_PASSED,
which ARB_occlusion_query doesn't supply. We need ARB_occlusion_query2
for this instead.
This is still not 100% accurate, as we also require support for the
GL_SAMPLES_PASSED_CONSERVATIVE target, which isn't guaranteed by either
ARB_occlusion_query nor ARB_occlusion_query2. But it should be trivial
to implement for any driver supporting ARB_occlusion_query2, as it can
simply be implemented as GL_ANY_SAMPLES_PASSED.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Fixes issues with following SkQP tests:
unitTest_VulkanHardwareBuffer_Vulkan_EGL_Syncs
unitTest_VulkanHardwareBuffer_Vulkan_Vulkan_Syncs
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Instead of taking a whole pipeline (which could be anything!), just take
a physical device and robust_buffer_access boolean. This makes it
easier to verify that only the things in the hash actually affect
pipeline compilation.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
It affects apply_pipeline_layout. Shaders compiled with the wrong value
will work but they may not be robust as requested by the app.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Our compile already splits UBO loads into scalars and the untyped
surface read messages we use for SSBO reads and writes only require
dword alignment.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
'post_flush' is only set to NULL for the normal clear path
(ie. only vkCmdClearColorImage() and vkCmdClearDepthStencilImage()
are affected commands).
Because these two operations have to be externally synchronized
with VK_PIPELINE_STAGE_TRANSFER_BIT and VK_ACCESS_TRANSFER_WRITE_BIT,
it's useless to set those flags internallY.
VK_PIPELINE_STAGE_TRANSFER_BIT will wait for compute to be idle,
while VK_ACCESS_TRANSFER_WRITE_BIT will invalidate both L1 vector
caches and L2. RADV_CMD_FLAG_WRITEBACK_GLOBAL_L2 will be superseded
by RADV_CMD_FLAG_INV_GLOBAL_L2.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Since apps also have to follow the ImageFormatProperties query,
we can disallow formats that don't allow image stores (for AMD
that would be SRGB formats).
Note that this only affects anything if the app actually decides
to use the flag.
Had someone ask for this on IRC and at least on the AMD side we
can support it.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
thread_submit can be useful even without DRI_PRIME,
as it can help avoid missed pageflips.
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
The path allowing triple buffering behaviour wasn't implemented
yet for thread_submit
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
This fixes two memory leaks reported by ASAN:
Direct leak of 248 byte(s) in 1 object(s) allocated from:
in malloc (/usr/lib64/gcc/x86_64-pc-linux-gnu/7.3.0/libasan.so+0xdb880)
in r600_alloc_buffer_struct ../../samba/mesa/src/gallium/drivers/r600/r600_buffer_common.c:578
in r600_buffer_create ../../samba/mesa/src/gallium/drivers/r600/r600_buffer_common.c:600
in r600_resource_create_common ../../samba/mesa/src/gallium/drivers/r600/r600_pipe_common.c:1265
in r600_resource_create ../../samba/mesa/src/gallium/drivers/r600/r600_pipe.c:725
in pipe_buffer_create ../../samba/mesa/src/gallium/auxiliary/util/u_inlines.h:291
in update_gs_block_state ../../samba/mesa/src/gallium/drivers/r600/r600_state_common.c:1482
Direct leak of 248 byte(s) in 1 object(s) allocated from:
in malloc (/usr/lib64/gcc/x86_64-pc-linux-gnu/7.3.0/libasan.so+0xdb880)
in r600_alloc_buffer_struct ../../samba/mesa/src/gallium/drivers/r600/r600_buffer_common.c:578
in r600_buffer_create ../../samba/mesa/src/gallium/drivers/r600/r600_buffer_common.c:600
in r600_resource_create_common ../../samba/mesa/src/gallium/drivers/r600/r600_pipe_common.c:1265
in r600_resource_create ../../samba/mesa/src/gallium/drivers/r600/r600_pipe.c:722
in pipe_buffer_create ../../samba/mesa/src/gallium/auxiliary/util/u_inlines.h:291
in update_gs_block_state ../../samba/mesa/src/gallium/drivers/r600/r600_state_common.c:1489
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Fixes: 1371d65a7f
r600g: initial support for geometry shaders on evergreen (v2)
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
CP DMA can only be busy when the driver copies buffers. The
only affected Vulkan commands are vkCmdCopyBuffer() and
vkCmdUpdateBuffer() (because we fallback to a copy depending on
a threshold). Clear operations are currently not concerned
because the driver always syncs after the last DMA operation.
Per the spec, these two operations have to be externally
synchronized with VK_PIPELINE_STAGE_TRANSFER_BIT.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Unnecessary as they allow the app to call vkCmdPipelineBarrier()
inside the render pass.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This reverts commit 1f29f4db1e.
For this to work the compiler must ensure that it never puts
the values that arrive to this helper into unsigned variables
at any point in its processing, since that would not apply sign
extension to the value and it would break the expectations here.
Unfortunately, we use uint64_t extensively to pass and copy
things around, so some times we get to this helper with values
that are not properly sign extended to 64-bit. Here is an example
for an 8-bit value that comes from a switch case:
(gdb) p /x x
$1 = 0xffffffd6
The value seems to have been sign extended to 32-bit at some point
getting proper sign extension, but then copied into a uint64_t
which wont' apply sign extension, breaking the expectations of
the assertion.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
With the current VAO layout we do not need to make these
fields a bitfield. We get a tight struct layout with this change
for VAO attributes.
v2: Change unsigned char -> GLubyte.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Factor out struct gl_vertex_format from array attributes.
The data type is supposed to describe the type of a vertex
element. At this current stage the data type is only used
with the VAO, but actually is useful in various other places.
Due to the bitfields being used, special care needs to be
taken for the glGet code paths.
v2: Change unsigned char -> GLubyte.
Use struct assignment for struct gl_vertex_format.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Instead of open coding the size computation, use the
already available gl_array_attribute::_ElementSize value.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Instead of open coding the size computation, use the
already available gl_array_attribute::_ElementSize value.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Use GL_UNSIGNED_BYTE as initialization data type
for the edge flag vertex attribute array. The same datatype
is used in the glEdgeFlagPointer function when setting the
array pointer.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
For enabling or disabling VAO arrays it is now possible to
change a set of arrays with a single call without the need to
iterate the attributes.
Make use of this technique in the vao module.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Instead of using gl_array_attributes::Enabled use the
much more compact representation stored in
gl_vertex_array_object::Enabled using the corresponding bits.
Keep the glGet changes in a seperate patch at least for review.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Instead of using gl_array_attributes::Enabled use the
much more compact representation stored in
gl_vertex_array_object::Enabled using the corresponding bits.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Mark the up to now derived bitfield value now as primary
value by removing the underscore.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
radeonsi has 3 driver threads (glthread, gallium, winsys), other drivers
may have 2 (glthread, gallium), so it makes sense to pin them to a random
CCX and keep that irrespective of the app thread.
Reviewed-by: Dave Airlie <airlied@redhat.com>
This is used when glthread is disabled.
Mesa pretty much chases the app thread on the CPU.
The performance is the same as pinning the app thread.
Reviewed-by: Dave Airlie <airlied@redhat.com>
This moves nir_shader_clone() to the driver-specific compile function,
rather than the shared src/intel/compiler code. This allows i965 to do
key-specific passes before calling brw_compile_*. Vulkan should not
need this cloning as it doesn't compile multiple variants.
We do need to continue cloning in the compute shader code because we
lower various things in NIR based on the SIMD width.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
We won't have a var to load from, so don't try to the processing
required if we don't need it.
This avoids crashes in:
dEQP-VK.spirv_assembly.instruction.compute.variable_pointers.compute.workgroup_two_buffers
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
For variable pointers we really don't want to case the pointers to int
without a good reason, just add a wrapper for bcsel loading and result
storing.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Meson test has a concepts of suites, which allow tests to be grouped
together. This allows for a subtest of tests to be run only (say only
the tests for nir). A test can be added to more than one suite, but for
the most part I've only added a test to a single suite, though I've
added a compiler group that includes nir, glsl, and glcpp tests.
To use this you'll need to invoke meson test directly, instead of ninja
test (which always runs all targets). it can be invoked as:
`meson test -C builddir --suite $suitename` (meson test has addition
options that are pretty useful).
Tested-By: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Currently we detect the module and if missing, the glXGetMsc* API is
effectively a stub, always returning false.
This is what effectively has been happening with our meson build :-(
Thus users have no chance of using it - they cannot even distinguish
if the failure is due to a misconfigured build.
There's no reason for keeping xf86vidmode optional - it has been
available in all distributions for years.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Fixes: a47c525f32 "meson: build glx"
Fixes an assertion in SoTTR.
Fixes: dd0172e865 ("radv: Use structured intrinsics instead of indexing workaround for GFX9.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The existing backend code assumed that if VARYING_SLOT_CLIP_DIST0
was written, then VARYING_SLOT_CLIP_DIST1 would be as well. That's
true with the current lowering, but not necessary if there are 4 or
fewer clip distances. Separate out the checks to allow this.
The new NIR-based lowering will trigger this case, which would have
caused backend validation errors (src is null) without this patch.
Reviewed-by: Eric Anholt <eric@anholt.net>
The way nir_lower_clip_vs() works with store_output intrinsics makes a
ton of assumptions about the driver_location field.
In i965 and iris, I'd rather do this lowering early and work with
variables. v3d may want to switch to that as well, and ir3 could too,
but I'm not sure exactly what would need updating. For now, handle
both methods.
Reviewed-by: Eric Anholt <eric@anholt.net>
I posted a load of hacks before to do this, Jason suggested this,
just check the deref mode, not the variable mode and delay getting
the variable until we know the type.
avoids crashes when derefing shared memory pointers.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
../src/intel/compiler/brw_fs_nir.cpp:3534:46: warning: comparison of integer expressions of different signedness: ‘unsigned int’ and ‘int’ [-Wsign-compare]
assert(nir_intrinsic_write_mask(instr) ==
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
(1 << instr->num_components) - 1);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This was caused by 6339aba775 which added these completely valid
checks. However clang likes to complain about signedness mismatches.
Fixes: 6339aba775 "intel/compiler: Lower SSBO and shared..."
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
It's not at all intel-specific; the formula is dictated by OpenGL and
Vulkan. The only intel-specific thing is that we need the lowering. As
a nice side-effect, the new version is variable-group-size ready.
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Fixes: d971a4230d "loader: Factor out the common driver
opening logic from each loader."
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
This allows to fast clear the depth part (or the stencil part)
of a depth+stencil surface when HTILE is enabled. I didn't test
on GFX8, so it's disabled currently.
This gives a very nice boost, for example when clearing the depth
aspect of a 4096x4096 D32_SFLOAT_S8_UINT image (18x faster).
BEFORE: 235 us
AFTER: 13 us
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This path will be eventually improved later but as it's only
used on SI (or with RADV_DEBUG=noibs), I'm not sure if that
matters much.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The chained submission is the fastest path and it should now
be used more often than before. This removes some EOP events.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
At least GC2000 seems to push some dirt from the PE color cache into
the last bound render target when drawing depth only. Newer cores
seem to behave properly and don't do this, but I have found no way
to fix it on GC2000. Flushes and stalls don't seem to make any
difference.
In order to stop the core from pushing the dirt into a precious real
render target, plug in dummy buffer when rendering without a color
buffer.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
DCC and FMASK also imply a fast-clear eliminate, so it should be
safe to reset the predicate unconditionally. We still only skip
FMASK or CMASK decompressions for now.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We read 4 values out of sample_locs_8x, so make sure the array is
big enough.
Fixes: ac76aeef20 ("radeonsi: switch back to standard DX sample positions")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
With 5d517a streamout info is only attached to the shader for which the
transform feedback is actually recorded, but the driver set the context info
with each state submitted, thereby always using the info data that was
attached to the vertex shader.
Pass the streamout stride info to the context only from the shader that
actually has outputs. (Thanks to Marek Olšák for pointing me in the right
direction)
Fixes regresion with: dEQP-GLES31.functional.tessellation.invariance.*
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108734
Fixes: 5d517a599b
st/mesa: Don't record garbage streamout information in the non-SSO case.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
FRAMEBUFFER_INCOMPLETE_DIMENSIONS is not supported for GLES 3.0 and later and
not defined for Desktop OpenGL. Instead use FRAMEBUFFER_UNSUPPORTED like it
was done before.
Thanks to Iago Toral and Andrey Simiklit for pointing out the problem and the
details.
Fixes: ebcde34545
i965: be more specific about FBO completeness errors
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The structure qdws is not allocated at this point, nor is the
file descriptor set to it's member. Use the fd directly instead.
Fixes: d1a1c21e76
virgl: native fence fd support
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Emulate MESA_FORMAT_R_SRGB8 by using L8_UNORM_SRGB. This is possible
because component swizzling is handled based on the mesa format and,
hence, the a r001 swizzling can be used to correct the components.
Enables and makes pass (tested on Kabylake)
dEQP-GLES31.functional.srgb_texture_decode.skip_decode.sr8.*
dEQP-GLES31.functional.texture.filtering.cube_array.formats.sr8*
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
The driver was returning GL_FRAMEBUFFER_UNSUPPORTED for all cases of an
incomplete fbo, be a bit more specific about this following the description
of glCheckFramebufferStatus.
This helps to keeps dEQP happy when adding EXT_texture_sRGB_R8 support.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
On nv50, certain operations must happen on regs below 64, due to
encoding requirements. First of all, we add infrastructure to enforce
this. Secondly we change the spill order to first spill RIG nodes that
are unconstrained, followed by ones that are.
This makes the gamecube logo shadertoy compile properly. Curiously, if
we adjust the spill order so that we first spill the constrained RIG
nodes instead, the RA also succeeds. However it seems more logical to
first spill the unconstrained ones.
While we're at it, drop the nv50 max register to reserve r127 as the
zero register of last resort (r63 is preferred).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Karol Herbst <kherbst@redhat.com>
Instead of the size restriction existing in two places, and potentially
being applied twice, we move this together. Ops with 16-bit register
addresses can only take a short reg, and ops with immediates can only
take a short reg.
Of course we leave the immediate 0 in place since we know that it will
be replaced by r63/r127 down the line, so don't treat zeroes as an
immediate.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
We removed the op from the BB, but it was still listed in its sources'
uses. This could trip up some logic down the line which analyzes all the
uses of an l-value, e.g. spilling.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Previously we would print errors on the console like:
libEGL debug: EGL user error 0x3001 (EGL_NOT_INITIALIZED) in eglInitialize
When we had everything we needed for:
libEGL debug: EGL user error 0x3001 (EGL_NOT_INITIALIZED) in eglInitialize: DRI2: failed to find EGLDevice
(for a gbm error in my case)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
I copied the code from egl_dri2.c, but the functionality was equivalent
between all the loaders other than their particular environment variables.
v2: Drop the logging function equivalent to loader_default_logger()
(requested by Eric, Emil). Move the SCons workaround across. Drop
the now-unused driGetDriverExtensions() declaration that was lost in a
rebase.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v1)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v1)
I need other types from the header now, and "gl.h is big" is not a good
reason to duplicate definitions.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The only thing you do with a dri driver handle is get the extensions
pointer, so just fold it in to simplify the callers.
v2: Add the declaration of driGetDriverExtensions() that got lost in a
rebase.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v1)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v1)
You can tell by "Mesa/configs/default" how old this is. Your build system
really has to provide the DEFAULT_DRIVER_DIR, or other loaders will break.
v2: Move the bad (non-prefix-dependent) define to the SConscript to avoid
breaking it.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v1)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v1)
After doing a bunch of benchmarks, primitive binning helps
some games like The Talos Principle (+5%) or Serious Sam 2017
(+3%). For other titles, either it doesn't change anything or
it hurts very few (less than 1%).
This only affects GFX9.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We can only start parsing commands from the head pointer. This was
working fine up to now because we only dealt with a "made up" ring
buffer (generated by aub_write) which always had its head at 0.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Toni Lönnberg <toni.lonnberg@intel.com>
According to the EGL_EXT_image_dma_buf_import spec, creating an EGL
image with a DRM format not supported should yield the BAD_MATCH
error :
"
* If <target> is EGL_LINUX_DMA_BUF_EXT, and the EGL_LINUX_DRM_FOURCC_EXT
attribute is set to a format not supported by the EGL, EGL_BAD_MATCH
is generated.
"
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 20de7f9f22 ("egl/dri2: support for creating images out of dma buffers")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
gbm_bo_create() was presumably meant to originally accept gbm_bo_format
enums, but it's accepted GBM_FORMAT_* tokens since the dawn of time.
This is good, since gbm_bo_format is rarely used and covers a lot less
ground than GBM_FORMAT_*.
Change the documentation to refer to both; this involves removing a 'see
also' for gbm_bo_format, since we can't also use \sa to refer to a
family of anonymous #defines.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reported-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
This reverts commit 647c2b90e9. There was
one recently-introduced bug in ac for dvec3 loads, but the other test
failures were actually bugs in the tests. See
9429e621c4
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The handles exported need to be on the KMS device's fd, anything else is
failure. Also, this code is assuming that the scanout resource has been
created already, so assert it.
The DRI3 create_with_modifiers paths don't set tmpl.bind to SCANOUT or
SHARED, with the theory that given that you've got modifiers, that's all
you need. However, we were looking at the tmpl.bind for setting up the
KMS handle in the renderonly case, so we'd end up trying to use vc4's
handle on the hx8357d fd.
Fixes: 84ed8b67c5 ("vc4: Set shareable BOs as T tiled if possible")
Handle all cases in calculation of layers count for isl_view
taking into account texture view and image unit.
st_convert_image was taken as a reference.
When u->Layered is true the whole level is taken with respect to
image view. In other case only one layer is taken.
v3: (Józef Kucia and Ilia Mirkin)
- Rewrote patch by taking st_convert_image as a reference
- Removed now unused get_image_num_layers function
- Changed commit message
v4: (Jason Ekstrand)
- Added assert
Fixes: 5a8c8903
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107856
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We have a bunch of code to do this in the back-end compiler but it's
fairly specific to typed surface messages and the way we emit them.
This breaks it out into NIR were it's easier to do things a bit more
generally. It also means we can easily share the code between the vec4
and FS back-ends if we wish.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This also changes spirv_to_nir and glsl_to_nir to set them. The one
place that doesn't set them is shared memory access lowering in
nir_lower_io. That will have to be updated before any consumers of it
can effectively use these new alignments.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
The new helpers can generate any pack/unpack operation including those
for which we do not have specific opcodes and they express a bitcast in
terms of these pack/unpack operations. In particular, the new helpers
properly handle 8-bit types.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
The pattern of adding or multiplying an integer by an immediate is
fairly common especially in deref chain handling. This adds a helper
for it and uses it a few places. The advantage to the helper is that
it automatically handles bit sizes for you.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
This assert won't catch all mistakes with this helper but it will at
least ensure that the top bits are all zero or all one which should help
catch bugs.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Section 3.7 (Identifiers) of the GLSL spec says:
However, as noted in the specification, there are some cases where
previously declared variables can be redeclared to change or add
some property, and predeclared "gl_" names are allowed to be
redeclared in a shader only for these specific purposes. More
generally, it is an error to redeclare a variable, including those
starting "gl_".
This patch should fix piglit tests:
clip-distance-redeclare-without-inout.frag
clip-distance-redeclare-without-inout.vert
However, this causes a regression in
clip-distance-out-values.shader_test. A fix for that test has been sent
to the piglit list for review:
https://patchwork.freedesktop.org/patch/255201/
As far as I understood following mailing thread:
https://lists.freedesktop.org/archives/piglit/2013-October/007935.html
looks like we have accepted to remove an ability to change qualifiers
but have not done it yet. Unless I missed something)
v2 (idr): Move 'earlier->data.mode != var->data.mode' test much earlier
in the function. Add special handling for gl_LastFragData.
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
I screwed up a rebase over a refactor and didn't notice locally because
the uncommitted refactor hid the issue.
Fixes: c973364967 "egl: add missing glvnd entrypoint for EGL_ANDROID_blob_cache"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Both BRW_SFID_SAMPLER and GEN6_SFID_DATAPORT_SAMPLER_CACHE are getting
disassembled as "sampler", which is misleading for assembler tool.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Fixes dEQP-EGL.functional.get_proc_address.extension.egl_android_blob_cache
on builds with glvnd enabled.
Fixes: 6f5b57093b "egl: add support for EGL_ANDROID_blob_cache"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Currently we detect when a breaking commit:
- has landed in stable, and
- is referenced by a untagged fix in master
Yet we did not consider the case of breaking commit:
- prior to the branchpoint, and
- is referenced by a untagged fix in master
Addressing the latter is extremely slow, due to the size of the lookup.
That said, we can trivially use the existing is_sha_nomination() helper
to catch reverts.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Currently we match on:
- any arbitrary length of,
- any a-z A-Z and 0-9 characters
At the same time, a commit sha consists of lowercase hexadecimal
numbers. Any sha shorter than 8 characters is ambiguous - in some cases
even 11+ are required.
So change the pattern to a-f0-9 and adjust the length to 8-40.
As we're here we could use a single grep, instead of the grep/sed combo.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Having a separate script to handle the fixes tag, brings a number of
issues, so let's fold it in get-pick-list.sh.
v2:
- pass the sha as argument to the function
- Keep original sed pattern
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
As the comment in get-typod-pick-list.sh says, there's little point in
having a duplicate file.
Add the new pattern + tag to get-pick-list.sh and nuke this file.
v2:
- pass the sha as argument to the function
- grep -q instead of using a variable (Eric)
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
With later commits we'll fold all the different scripts into one.
Add the explicit prefix, so that we know the origin of the nomination
v2:
- pass the sha as argument to the function
- swap $tag = none for an else statment (Juan)
- grep -q instead of using a variable (Eric)
- print the tag and commit oneline separately (Eric)
v3:
- drop unused "tag=none" assignment (Juan)
- typo nomination
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v2)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
A while back we agreed that having a live/staging branch is beneficial.
Sadly we forgot to document that, so here is my first attempt.
Document the caveat that the branch history is not stable.
CC: Andres Gomez <agomez@igalia.com>
CC: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
etna_resource_alloc and etna_resource_from_handle currently use different checks.
This leads to
etna_resource_from_handle:492: target=2, format=PIPE_FORMAT_B8G8R8X8_UNORM, 1080x1920x1, array_size=1, last_level=0, nr_samples=0, usage=0, bind=8000a, flags=0
etna_resource_from_handle:541: BO stride 4320 is too small for RS engine width padding (4352, format PIPE_FORMAT_B8G8R8X8_UNORM)
since etna_resource_from_handle wants to be aligned to a 16 byte
boundary while the etna_resource_alloc does not.
Adjust the two checks by using a common function.
Broken by baff59ebf0
Signed-off-by: Guido Günther <guido.gunther@puri.sm>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
1. nir/nir_lower_vars_to_ssa.c:691:21: warning:
unused variable ‘var’
nir_variable *var = path->path[0]->var;
v2: Changes for some part of 'may be used uninitialized'
warnings were removed, seems like it is a compiler issue.
( Eric Engestrom <eric.engestrom@intel.com> )
Possible like this one:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46684
This issue is flagged as duplicate but an
original one is not closed yet.
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Bump minor to signal support for new formats and higher precision
solid pictures.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Support Component Alpha for those composite operations that do not require
per-channel alpha blending.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
In the case when we had both source and mask samplers, transformations were
typically not applied correctly.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
The only solid fill picture type we supported only had 8 bit color
channels. Add a new solid picture type that supports float channels.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Remove unused and obsolete code for gradients and component-alpha
Support solid source- and mask pictures using a variable number
of samplers in the composite pipeline rather than the fixed number
we used before.
Tested using rendercheck for XA.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Some hardware supports source mods only for float operations. Make it
possible to skip lowering to source mods in these cases.
v2: use option flags instead of a boolean (Jason Ekstrand)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
this helps reduce the overall code changes when a bit_size parameter is
added to nir_load_system_value
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
this allows to replace some nir_load_system_value calls with the specific
system value constructor
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
This shouldn't make any difference but I feel uneasy to use the
expanded aspects that do not represent the image in its entirety. If
we ever change the implementation of the anv_image_aspect_to_plane()
helper, this is safer.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
To play around with debugging, we might want to disable one or the
other component. Having 0s as default values makes this work.
Otherwise we might have NULL components, leading to crashes.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Those empty variables in the !wayland case are useless and running that
meson.build with them breaks the build:
[287/850] Generating wayland-drm-client-protocol.h with a custom command.
FAILED: src/egl/wayland/wayland-drm/wayland-drm-client-protocol.h
client-header ../src/egl/wayland/wayland-drm/wayland-drm.xml src/egl/wayland/wayland-drm/wayland-drm-client-protocol.h
/bin/sh: client-header: command not found
ninja: build stopped: subcommand failed.
Fixes: d1992255bb "meson: Add build Intel "anv" vulkan driver"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
`inc_wayland_drm` is only used if wayland is built, and it's already
added in that case a few lines below.
Fixes: a29869e872 "gbm: Don't traverse backwards for includes"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
These files are close to 4 years out of date; a lot's changed since.
Let's just check in a recently-regenerated version.
Changes generated by running `ninja xmlpool-{pot,update-po,gmo}`.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Instructions meant for the render engine now have a definition specifying that
so that can differentiate instructions meant for different engines due to shared
opcodes.
v2: Divided into individual patches for each gen
v3: Added additional engine definitions.
v4: Added missing engine definition to MI_TOPOLOGY_FILTER.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instructions meant for the render engine now have a definition specifying that
so that can differentiate instructions meant for different engines due to shared
opcodes.
v2: Divided into individual patches for each gen
v3: Added additional engine definitions.
v4: Added missing engine definition to MI_TOPOLOGY_FILTER.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instructions meant for the render engine now have a definition specifying that
so that can differentiate instructions meant for different engines due to shared
opcodes.
v2: Divided into individual patches for each gen
v3: Added additional engine definitions.
v4: Added more missing engine definitions.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instructions meant for the render engine now have a definition specifying that
so that can differentiate instructions meant for different engines due to shared
opcodes.
v2: Divided into individual patches for each gen
v3: Added additional engine definitions.
v4: Added missing engine tag for MI_TOPOLOGY_FILTER and MI_LOAD_URB_MEM.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instructions meant for the render engine now have a definition specifying that
so that can differentiate instructions meant for different engines due to shared
opcodes.
v2: Divided into individual patches for each gen
v3: Added additional engine definitions.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instructions meant for the render engine now have a definition specifying that
so that can differentiate instructions meant for different engines due to shared
opcodes.
v2: Divided into individual patches for each gen
v3: Added additional engine definitions.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instructions meant for the render engine now have a definition specifying that
so that can differentiate instructions meant for different engines due to shared
opcodes.
v2: Divided into individual patches for each gen
v3: Added additional engine definitions
v4: Added missing engine to MEDIA_GATEWAY_STATE
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instructions meant for the render engine now have a definition specifying that
so that can differentiate instructions meant for different engines due to shared
opcodes.
v2: Divided into individual patches for each gen
v3: Added additional engine definitions.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instructions meant for the render engine now have a definition specifying that
so that can differentiate instructions meant for different engines due to shared
opcodes.
v2: Divided into individual patches for each gen
v3: Added addition engine definitions.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Instructions meant for the render engine now have a definition specifying that
so that can differentiate instructions meant for different engines due to shared
opcodes.
v2: Divided into individual patches for each gen
v3: Added additional engine definitions.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The engine to which the batch was sent to is now set to the decoder context when
decoding the batch. This is needed so that we can distinguish between
instructions as the render and video pipe share some of the instruction opcodes.
v2: The engine is now in the decoder context and the batch decoder uses a local
function for finding the instruction for an engine.
v3: Spec uses engine_mask now instead of engine, replaced engine class enums
with the definitions from UAPI.
v4: Fix up aubinator_viewer (Lionel)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Removed the gen_engine enum and changed the involved functions to use the
drm_i915_gem_engine_class enum from UAPI instead.
v3: Wrong engine was being used for blocks in video ring
v4: Fixed aubinator_viewer.cpp
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Preliminary work for adding handling of different pipes to gen_decoder. Each
instruction needs to have a definition describing which engine it is meant for.
If left undefined, by default, the instruction is defined for all engines.
v2: Changed to use the engine class definitions from UAPI
v3: Changed I915_ENGINE_CLASS_TO_MASK to use BITSET_BIT, change engine to
engine_mask, added check for incorrect engine and added the possibility to
define an instruction to multiple engines using the "|" as a delimiter in the
engine attribute.
v4: Fixed the memory leak.
v5: Removed an unnecessary ralloc_free().
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
On the host VREND_DEBUG=guestallow must be set to let the guest override
the debug flags.
v2: Send flag string instead of flags, this avoids the need to keep
the flags in sync.
v3: Only request host logging if the host actually understands the command
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Transform feedback objects may hold a pointer to a shader program, and
at least in Gallium, this must be a valid pointer until
ctx->Driver.EndTransformFeedback in glEndTransformFeedback has been called
- which is conform with the spec that any program that is part of a
current rendering state should only be flagged for deletion by glDeleteProgram.
This was not handled properly for the transform feedback objects so that
a call sequence
glUseProgram(x)
glBeginTransformFreedback(...)
glPauseTransformFeedback(...)
glDeleteProgram(x)
glEndTransformFeedback(...)
would result in a use after free bug. With this patch the transform
feedback object also updates the reference count to the used program
thereby keeping the program valid as long as the transform feedback
objects links to it.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108713
Fixes: 654587696b
mesa: add end_transform_feedback() helper
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
If the local work group size is variable it won't be available
at compile time so we can't lower it in nir_lower_system_values().
Signed-off-by: Plamena Manolova <plamena.n.manolova@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Prior to this patch sizeof(linear_header) was 20 bytes in a
non-debug build on 32-bit platforms. We do some pointer arithmetic to
calculate the next available location with
ptr = (linear_size_chunk *)((char *)&latest[1] + latest->offset);
in linear_alloc_child(). The &latest[1] adds 20 bytes, so an allocation
would only be 4-byte aligned.
On 32-bit SPARC a 'sttw' instruction (which stores a consecutive pair of
4-byte registers to memory) requires an 8-byte aligned address. Such an
instruction is used to store to an 8-byte integer type, like intmax_t
which is used in glcpp's expression_value_t struct.
As a result of the 4-byte alignment returned by linear_alloc_child() we
would generate a SIGBUS (unaligned exception) on SPARC.
According to the GNU libc manual malloc() always returns memory that has
at least an alignment of 8-bytes [1]. I think our allocator should do
the same.
So, simple fix with two parts:
(1) Increase SUBALLOC_ALIGNMENT to 8 unconditionally.
(2) Mark linear_header with an aligned attribute, which will cause
its sizeof to be rounded up to that alignment. (We already do
this for ralloc_header)
With this done, all Mesa's unit tests now pass on SPARC.
[1] https://www.gnu.org/software/libc/manual/html_node/Aligned-Memory-Blocks.html
Fixes: 47e1758692 ("glcpp: use the linear allocator for most objects")
Bug: https://bugs.gentoo.org/636326
Reviewed-by: Eric Anholt <eric@anholt.net>
For example the following type of thing is seen in TCS from
a number of Vulkan and DXVK games:
vec1 32 ssa_557 = deref_var &oPatch (shader_out float)
vec1 32 ssa_558 = intrinsic load_deref (ssa_557) ()
vec1 32 ssa_559 = deref_var &oPatch@42 (shader_out float)
vec1 32 ssa_560 = intrinsic load_deref (ssa_559) ()
vec1 32 ssa_561 = deref_var &oPatch@43 (shader_out float)
vec1 32 ssa_562 = intrinsic load_deref (ssa_561) ()
intrinsic store_deref (ssa_557, ssa_558) (1) /* wrmask=x */
intrinsic store_deref (ssa_559, ssa_560) (1) /* wrmask=x */
intrinsic store_deref (ssa_561, ssa_562) (1) /* wrmask=x */
No shader-db changes on i965 (SKL).
vkpipeline-db results RADV (VEGA):
Totals from affected shaders:
SGPRS: 7832 -> 7728 (-1.33 %)
VGPRS: 6476 -> 6740 (4.08 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 469572 -> 456596 (-2.76 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 989 -> 960 (-2.93 %)
Wait states: 0 -> 0 (0.00 %)
The Max Waves and VGPRS changes here are misleading. What is
happening is a bunch of TCS outputs are being optimised away as
they are now recognised as unused. This results in more varyings
being compacted via nir_compact_varyings() which can result in
more register pressure when they are not packed in an optimal way.
This is an existing problem independent of this patch. I've run
some benchmarks and haven't noticed any performance regressions
in affected games.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Previously the debug would be:
libEGL debug: No DRI config supports native format 0x20203852
libEGL debug: No DRI config supports native format 0x38385247
but
libEGL debug: No DRI config supports native format R8
libEGL debug: No DRI config supports native format GR88
is a lot easier to understand.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
This requires that the caller make a little (stack) allocation to store
the string.
v2: Use gbm_format_canonicalize (suggested by Daniel)
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
There are two problems:
1) the extra underscore in MISSING_64BIT_ATOMICS
2) we should link with libatomic if the previous test decided we needed
it
Fixes: d1992255bb
("meson: Add build Intel "anv" vulkan driver")
Reviewed-and-Tested-by: Matt Turner <mattst88@gmail.com>
regs is only set and used on x86; on other platforms (like ARM), this
code causes a trivial warning, solved by moving the regs declaration to
the architecture-dependent usage.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
meson does this for you with its warn levels, so we don't need to set
it ourselves.
Fixes: d1992255bb
("meson: Add build Intel "anv" vulkan driver")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
We're about to introduce AYUV support which provides its own alpha
channel. So give alpha as a parameter and set it to 1 on exising
formats.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Only needed when the pipeline actually uses tessellation. I don't
think that changes anything, except improving readability.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This can occur during payload setup of SIMD-split send message
instructions, which can lead to the emission of header setup
instructions with a non-zero channel group and fixed SIMD width. Such
instructions could end up using undefined channel enable signals
except they don't care since they're always marked force_writemask_all.
Not known to affect correctness of any workload at this point, but it
would be trivial to back-port to stable if something comes up.
Reported-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Sagar Ghuge <sagar.ghuge@intel.com>
Just break out of the loop instead, it does the same thing.
Signed-off-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Axel Davy <davyaxel0@gmail.com>
This fixes various crashes and hangs when using nine's 'thread_submit'
feature.
On 64bit, the thread function's data argument would just be NULL.
On 32bit, the data argument would be garbage depending on the compiler
flags (in my case -march>=core2).
Fixes: f3fa7e3068 ("st/nine: Use WINE thread for threadpool")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Axel Davy <davyaxel0@gmail.com>
It seems I missed some details when exposing NV_conditional_render
on GLES; this fixes up "make check".
Fixes: 5213be9fab ("mesa: expose NV_conditional_render on GLES")
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-and-Tested-by: Eric Engestrom <eric.engestrom@intel.com>
Helpful for debugging compiler backend problems: this allows us to
easily retrieve the LLVM IR from RenderDoc.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The extension spec has been updated to include GLES 2 support, so let's
enable it there.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
nir_alu_type_get_type_size takes a type as parameter and we were
passing a bit-size instead, which did what we wanted by accident,
since a bit-size of zero matches nir_type_invalid, which has a
size of 0 too.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
SIMD16 instructions need to have additional interferences to prevent
source / destination hazards when the source and destination registers
are off by one register.
While we already have code to handle this, it was only running for SIMD16
dispatches, however, we can have SIDM16 instructions in a SIMD8 dispatch.
An example of this are pull constant loads since commit b56fa830c6,
but there are more cases.
This fixes a number of CTS test failures found in work-in-progress
tests that were hitting this situation for 16-wide pull constants
in a SIMD8 program.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Because we only have one file_max for the (2d) gs input file, the value
actually represents the max of attrib and vertex index (although I'm
not entirely sure if we really want the max, since the max valid value
of the vertex dimension can be easily deduced from the input primitive).
Thus in cases where the number of inputs is higher than the number of
vertices per prim, we did not properly clamp the vertex index, which
would result in out-of-bound fetches, potentially causing segfaults
(the segfaults seemed actually difficult to trigger, but valgrind
certainly wasn't happy). This might have happened even if the shader
did not actually try to fetch bogus vertices, if the fetching happened
in non-active conditional clauses.
To fix simply use the correct max vertex index value (derived from
the input prim type) instead when clamping for this case.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Fixes Skqp's unitTest_EGLImageTest test.
For Intel platforms, we support external textures only for EGLImages
created with EGL_EXT_image_dma_buf_import. This restriction seems to
be Intel specific and not present for other platforms.
While running SKQP test - unitTest_EGLImageTest, GL_INVALID is sent
to the test because of this restriction.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105301
Signed-off-by: Aditya Swarup <aditya.swarup@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Use #pragma warning(off) and #pragma warning(on) to disable or enable
all warnings. This is a big hammer. If we ever need a smaller hammer,
we can enhance this functionality.
There is one lame thing about this. Because we parse everything, create
an AST, then convert the AST to GLSL IR, we have to treat the #pragma
like a statment. This means that you can't do something like
' void
' #pragma warning(off)
' __foo
' #pragma warning(on)
' (float param0);
Fixing that would, as far as I can tell, require a huge amount of work.
I did try just handling the #pragma during parsing (like we do for
state for the whole shader.
v2: Fix the #pragma lines in the commit message that git-commit ate.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
As of this commit, all uses of const sources either go through a
nir_src_as_<type> helper which handles bit sizes correctly or else are
accompanied by a nir_src_bit_size() == 32 assertion to assert that we
have the size we think we have.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
As of this commit, all uses of const sources either go through a
nir_src_as_<type> helper which handles bit sizes correctly or else are
accompanied by a nir_src_bit_size() == 32 assertion to assert that we
have the size we think we have.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Everywhere we handle SSBO intrinsics, we have exactly the same pattern
for computing the index so we may as well make a helper for it. We also
add a get_nir_src_imm to vec4 and use it for SSBO offsets.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
HTILE is supported on these chips, not sure how I missed that.
This restores using PFP_SYNC_ME when LOAD_CONTEXT_REG is not used.
Fixes: f425d9ee74 ("radv: use LOAD_CONTEXT_REG when loading fast clear values")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This avoids syncing the Micro Engine. This is only supported
for VI+ currently. There is probably a way for using
LOAD_CONTEXT_REG on previous chips but that could be done later.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
GLXCreate{,New}Context, like most X resource creation requests, does not
emit a reply and therefore is emitted into the X stream asynchronously.
However, unlike most resource creation requests, the GLXContext we
return is a handle to library state instead of an XID. So if context
creation fails for any reason - say, the server doesn't support indirect
contexts - then we will fail in strange places for strange reasons.
We could make every GLX entrypoint robust against half-created contexts,
or we could just verify that context creation worked. Reuse the
__glXIsDirect code to do this, as a cheap way of verifying that the
XID is real.
glXCreateContextAttribsARB solves this by using the _checked version of
the xcb command, so effectively this change makes the classic context
creation paths as robust as CreateContextAttribs.
v2: Better use of Bool, check that error != NULL first (Olivier Fourdan)
Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
In function 'uint8_t nv50_ir::getTEXSMask(uint8_t)':
warning: control reaches end of non-void function [-Wreturn-type]
Reported-by: Moiman@freenode
Fixes: f821e80213
"gm107/ir: use scalar tex instructions where possible"
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
TEXS, TLD4 and TLD4S are variants of tex instructions which are more
scalar, which gives RA more freedom and is less likely to insert silly
MOVs to satisfy quad registers.
shader-db changes:
total instructions in shared programs : 7687265 -> 7614782 (-0.94%)
total gprs used in shared programs : 803620 -> 798045 (-0.69%)
total shared used in shared programs : 639636 -> 639636 (0.00%)
total local used in shared programs : 24648 -> 24648 (0.00%)
total bytes used in shared programs : 82103400 -> 81330696 (-0.94%)
local shared gpr inst bytes
helped 0 0 3648 10647 10647
hurt 0 0 464 205 205
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The biggest change here is the rename of VK_NVX_ray_tracing to
VK_NV_ray_tracing and the total removal of VK_KHR_mir_surface.
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Enables on R600 and makes pass:
dEQP-GLES31.functional.srgb_texture_decode.skip_decode.sr8.*
dEQP-GLES31.functional.texture.filtering.cube_array.formats.sr8*
v2: remove chunk for dri/radeon (Emil)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Allocating through Gralloc implies buffers are going to be used
outside the driver. We have special MOCS settings for external BOs and
we probably want to use them here too.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: a1220e7311 ("anv/android: Set the BO flags in bo_cache_import (v2)")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This reduces the amount of #ifdef ANDROID we'll have to have inside
the driver. Potentially offering better coverage of the android
extensions.
v2: Move anv_android.h include before anv_entrypoints.h (Tapani)
Fix autotools android build (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
At higher resolutions with the addition of MSAA, the number of tiles
can increase to the point where we use more than one VSC pipe per
tile. Which would cause us to calculate an out-of-bounds offset for
VSC_SIZE_ADDRESS. So don't try to be clever, just always put it at
a fixed offset assuming the max 32 VSC pipes in use.
Signed-off-by: Rob Clark <robdclark@gmail.com>
After commit a9fb331ea ("wayland/egl: update surface size on window
resize"), the surface size is updated as soon as the resize is done, and
`update_buffers()` would resize only if the surface size differs from
the attached size.
However, in the case of swrast, there is no resize callback and the
attached size is updated in `dri2_wl_swrast_commit_backbuffer()` prior
to the `swrast_update_buffers()` so the attached size is always up to
date when it reaches `swrast_update_buffers()` and the surface is never
resized.
This can be observed with "totem" using the GDK backend on Wayland (the
default) when running on software rendering:
$ LIBGL_ALWAYS_SOFTWARE=true CLUTTER_BACKEND=gdk totem
Resizing the window would leave the EGL surface size unchanged.
To avoid the issue, partially revert the part of commit a9fb331ea for
`swrast_update_buffers()` and resize on the win size and not the
attached size.
Fixes: a9fb331ea - wayland/egl: update surface size on window resize
Signed-off-by: Olivier Fourdan <ofourdan@redhat.com>
CC: Daniel Stone <daniel@fooishbar.org>
CC: Juan A. Suarez Romero <jasuarez@igalia.com>
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
If the user provides an invalid display or device the ToVendor lookup
will fail.
In this case, the local [Mesa vendor] error code will be set. Thus on
sequential eglGetError(), the error will be EGL_SUCCESS.
To be more specific, GLVND remembers the last vendor and calls back
into it's eglGetError, although there's no guarantee to ever have had
one.
v2:
- Add _eglError call, so the debug callback is executed (Kyle)
- Drop XXX comment.
Piglit: tests/egl/spec/egl_ext_device_query
Fixes: ce562f9e3f ("EGL: Implement the libglvnd interface for EGL (v3)")
Cc: Eric Engestrom <eric@engestrom.ch>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kyle Brenneman <kbrenneman@nvidia.com>
eglQueryDevicesEXT (unlike the other three functions) does not depend
on the display. It is implemented in GLVND, which calls into each
driver collecting the list of devices and presenting it to the user.
For the other entrypoints, GLVND acts as pass through stub calling into
the vendor library. The vendor implementation calls back into GLVND to
get the vendor dispatch. Then the driver proceeds to call itself via
the said dispatch.
This design makes is possible to keep using "old" GLVND with newer
vendor drivers. Since effectively all the extension code is within the
latter itself.
Without said entrypoints, any user will outright crash - as reported in
the bug report.
Note: there's a follow-up fix needed to our GLVND code, to make piglit
happy.
v2: add some beefy documentation in the commit message.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108635
Fixes: 7552fcb7b9 ("egl: add base EGL_EXT_device_base implementation")
Reported-by: kyle.devir@mykolab.com
Cc: kyle.devir@mykolab.com
Acked-by: Eric Engestrom <eric@engestrom.ch>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Emil Velikov <emil.velikov@collabora.com>
We can only map at page aligned offsets. We got that wrong with buffer
size where (size % 4096) != 0 (anv has a WA buffer of 1024).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The spec says:
"vkCmdCopyQueryPoolResults is considered to be a transfer
operation, and its writes to buffer memory must be synchronized
using VK_PIPELINE_STAGE_TRANSFER_BIT and VK_ACCESS_TRANSFER_WRITE_BIT
before using the results."
VK_PIPELINE_STAGE_TRANSFER_BIT will wait for compute to be idle,
while VK_ACCESS_TRANSFER_WRITE_BIT will invalidate both L1 vector
caches and L2. So, it's useless to set those flags internally.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
In the non-SSO case, where multiple shader stages are linked together,
we were recording garbage pipe_stream_output_info structures for all
but the last enabled geometry-processing stage.
Specifically, we were using the gl_transform_feedback_info from
shader_program->last_vert_prog (the stage whose outputs will be
recorded)...but were pairing it with the output varying mappings
from the current shader stage. For example, a program with a VS and
GS, the VS's pipe_shader_state would have a pipe_stream_output_info
based on the GS transform feedback info, but the VS output mapping.
This generally worked out okay because only the pipe_stream_output_info
for the last stage really matters - the others can be ignored. However,
we'd like to avoid confusing the pipe driver. In particular, my new
driver translates the stream out information to hardware packets at
bind_{vs,tes,gs}_state() time...and was hitting asserts about garbage
varyings that didn't exist.
This patch changes st/mesa to record a blank pipe_stream_output_info
with num_outputs = 0 for all stages prior to last_vert_prog. The last
one is captured as normal.
(In the fully-SSO case, nothing should change - each program contains
a single shader stage, so last_vert_prog *is* the current shader.)
Tested with llvmpipe (piglit's gpu profile), and freedreno (a3xx,
gpu profile with -t transform.feedback). Fixes several hundred CTS
tests on my new driver.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
There are some cases where the VS is the only stage enabled, it uses the
entire URB, and the URB is large enough that placing later stages after
the VS exceeds the number of bits for "URB Starting Address".
For example, on Icelake GT2, "varying-packing-simple mat2x4 array" from
Piglit is getting a starting offset of 128 for the GS/HS/DS. But the
field is only large enough to hold an offset of 127.
i965 doesn't hit any genxml assertions because it's still using the old
OUT_BATCH mechanism. 128 << GEN7_URB_STARTING_ADDRESS_SHIFT (57) == 0,
with the extra bit falling off the end. So we place the disabled stage
at the beginning of the URB (overlapping with push constants). This is
likely okay since it's a zero size region (0 entries).
It seems like the Vulkan driver might hit this assertion, however, and
the situation seems harmless. To work around this, always place
disabled stages at the start of the URB, so the last enabled stage can
fill the remaining space without overflowing the field.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
libmesa_git_sha1 whole static dependency is added to get git_sha1.h header
and avoid following building error:
external/mesa/src/amd/vulkan/radv_device.c:46:10:
fatal error: 'git_sha1.h' file not found
^
1 error generated.
Fixes: 9d40ec2cf6 ("radv: Add support for VK_KHR_driver_properties.")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
In some cases (not building with llvm, which automatically pulls in
pthreads) nine needs to be directly linked with pthreads. Fixes building
on x86 (32 bit) without llvm.
Distro bug: https://bugs.gentoo.org/670094
Fixes: 6b4c7047d5
("meson: build gallium nine state_tracker")
Tested-by: Rafal Lalik <rafallalik@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
WA_1606682166:
Incorrect TDL's SSP address shift in SARB for 16:6 & 18:8 modes.
Disable the Sampler state prefetch functionality in the SARB by
programming 0xB000[30] to '1'. This is to be done at boot time and
the feature must remain disabled permanently.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In the same spirit as commit a5889d70f2
"i965/icl: Disable binding table prefetching". Fixes some 110+
intermittent piglit failures with tex-miplevel-selection variants.
WA_1606682166:
Incorrect TDL's SSP address shift in SARB for 16:6 & 18:8 modes.
Disable the Sampler state prefetch functionality in the SARB by
programming 0xB000[30] to '1'. This is to be done at boot time and
the feature must remain disabled permanently.
Anuj: Set SamplerCount = 0 for vs, gs, hs, ds and wm units as well.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We cannot use nir_build_alu() to create the new alu as it has no
way to know how many components of the src we will use. This
results in it guessing the max number of components from one of
its inputs.
Fixes the following CTS tests:
dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order_frag
dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order_geom
dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order_tessc
dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order_vert
Fixes: 2975422ceb ("nir: propagates if condition evaluation down some alu chains")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
To avoid build error in u_debug_stack_android.cpp
due to now missing u_debug.h header:
external/mesa/src/gallium/auxiliary/util/u_debug_stack_android.cpp:26:10:
fatal error: 'u_debug.h' file not found
#include "u_debug.h"
^
1 error generated.
Fixes: 37db383abb ("util: Move u_debug to utils")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
The bind flags defined by mesa/gallium might not always be in sync
with the ones copied to virglrenderer/gallium. Therefore, use the
flags defined in virgl like it is done for all the other calls to
create resources.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This only adds support on the Gallium core level, for the drivers
it is likely that additional changes are needed to support the
new texture format and thereby enabling the extension.
Enables on softpipe and makes pass:
dEQP-GLES31.functional.srgb_texture_decode.skip_decode.sr8.*
v2: - add include for getting GL_SR8_EXT
v4: - since the extension is not required don't bother providing
a fallback (Ilia Mirkin)
- split patch (2/2) to separate Gallium and mesa/st parts
(Roland Scheidegger)
- trim commit message to only contain the history of the patch
relevant to this part
v5: - don't include GLES headers (required enum has been added to glheader.h)
(Ilia Mirkin)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This format is needed to support EXT_texture_sRGB_R8. THe patch adds a new
format enum, the format entries in Gallium and and svga, the mapping between
sRGB and linear formats, and tests.
v2: - add mapping to linear format for PIPE_FORMATR_R8_SRGB
v3: - Add texture format to svga format table since otherwise building
mesa will fail when this driver is enabled. It was not tested
whether the extension actually works.
v4: - svga: remove the SVGA specific format definitions and table entries
and only add correct the location of PIPE_FORMAT_R8_SRGB in the
format_conversion_table (Ilia Mirkin)
- Split patch (1/2) to separate Gallium part and mesa/st part.
(Roland Scheidegger)
- Trim the commit message to only contain the relevant parts from the
split.
v5: - svga: correct location of PIPE_FORMAT_SRGB_R8 (Ilia Mirkin)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
v2: - fix format definition line
- disable for desktop GL
- don't add GL_R8_EXT to glext.h since it is already in
GLES2/gl2ext.h in glext.h and include this header where needed
(all Emil)
v3: - swrast: Fill the function table for sRGB_R8
The size of the function table is checked at compile time and must
correspond to the number of mesa texture formats.
dri/swrast being gles-2.0 doesn't support the extension though
v4: - correct format layout comment (Ilia Mirkin)
- correct logic for accepting GL_RED only textures (in part Ilia Mirkin)
EXT_texture_sRGB_R8 requires OpenGL ES 3.0 which includes
ARB_texture_rg/EXT_texture_rg, so one only must check for the first
when SR8_EXT is really requested.
v5: - add define for GL_ES8_XT to glheader.h and don't include GLES
headers (Ilia Mirkin)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The GLSL 4.6 specification (section 4.1.14. "Implicit Conversions")
says:
"There are no implicit array or structure conversions. For
example, an array of int cannot be implicitly converted to an
array of float."
So let's add a check in place when assigning array initializers to
implicitly sized arrays, to avoid incorrectly allowing code on the
form:
int[] foo = float[](1.0, 2.0, 3.0)
This fixes the following dEQP test-cases:
- dEQP-GLES31.functional.shaders.implicit_conversions.es31.invalid.arrays.int_to_float_vertex
- dEQP-GLES31.functional.shaders.implicit_conversions.es31.invalid.arrays.int_to_float_fragment
- dEQP-GLES31.functional.shaders.implicit_conversions.es31.invalid.arrays.int_to_uint_vertex
- dEQP-GLES31.functional.shaders.implicit_conversions.es31.invalid.arrays.int_to_uint_fragment
- dEQP-GLES31.functional.shaders.implicit_conversions.es31.invalid.arrays.uint_to_float_vertex
- dEQP-GLES31.functional.shaders.implicit_conversions.es31.invalid.arrays.uint_to_float_fragment
- dEQP-GLES31.functional.shaders.implicit_conversions.es32.invalid.arrays.int_to_float_vertex
- dEQP-GLES31.functional.shaders.implicit_conversions.es32.invalid.arrays.int_to_float_fragment
- dEQP-GLES31.functional.shaders.implicit_conversions.es32.invalid.arrays.int_to_uint_vertex
- dEQP-GLES31.functional.shaders.implicit_conversions.es32.invalid.arrays.int_to_uint_fragment
- dEQP-GLES31.functional.shaders.implicit_conversions.es32.invalid.arrays.uint_to_float_vertex
- dEQP-GLES31.functional.shaders.implicit_conversions.es32.invalid.arrays.uint_to_float_fragment
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
EXT_shader_implicit_conversions adds support for implicit conversions
for GLES 3.1 and above.
This is essentially a subset of ARB_gpu_shader5, and augments
OES_gpu_shader5.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
In GLES, we currently either need an exact match with a local function,
or an exact match with a builtin.
However, if we add support for implicit conversions for GLES shaders,
we also need to fall back to a non-exact match in the case where there
were no builtin match either.
Luckily, we already have a variable ready with this, so let's just
return it if the builtin-search failed.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This makes the code a bit easier to read, as well as reduces repetition,
especially when we add support for EXT_shader_implicit_conversions.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This makes the code a bit easier to read, as well as will reduce
repetition when we add support for EXT_shader_implicit_conversions.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
If the user gives 0 counterBuffers then the driver should still
enable transform feedback on all targets. This changes the
driver to always enable xfb, and use counter buffers where
one is defined for the target in question.
Fixes: b4eb029062 (radv: implement VK_EXT_transform_feedback)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
In order to handle pause/resume properly, the offset should
be added to the buffer binding not to the begin/end paths.
v2: don't add offset to size
Fixes ext_transform_feedback-alignment* under zink
Fixes: b4eb029062 (radv: implement VK_EXT_transform_feedback)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Since 0c1dd9dee0 ("broadcom/vc4: Allow importing linear BOs with
arbitrary offset/stride."), we have the vc4-side BO properly laid out
(assuming it's linear) in the winsys BO so that we can skip this extra
copy.
The recompile reduction is nice, but this also makes it so that a straight
texture copy could get optimized some day to not unpack/repack the f16
values.
If the caller has passed in a stride for (linear) BO import, we should use
that stride when rendering to the BO (or, if we some day support texturing
from linear-imported BOs, when doing the linear-to-UIF shadow copy). This
lets us remove the extra stride-changing relayout in the simulator.
This came from vc4, where we had a file format for GPU hangs. I don't
have one of those for V3D, and I probably won't ever have the simulator
side produce dumps even if I do.
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=105731">Bug 105731</a> - linker error "fragment shader input ... has no matching output in the previous stage" when previous stage's output declaration in a separate shader object</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107511">Bug 107511</a> - KHR/khrplatform.h not always installed when needed</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107626">Bug 107626</a> - [SNB] The graphical corruption and GPU hang occur sometimes on the piglit test "arb_texture_multisample-large-float-texture" with parameter --fp16</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107626">Bug 107626</a> - [SNB] The graphical corruption and GPU hang occur sometimes on the piglit test "arb_texture_multisample-large-float-texture" with parameter --fp16</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=107856">Bug 107856</a> - i965 incorrectly calculates the number of layers for texture views (assert)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=106577">Bug 106577</a> - broken rendering with nine and nouveau (GM107)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=108245">Bug 108245</a> - RADV/Vega: Low mip levels of large BCn textures get corrupted by vkCmdCopyBufferToImage</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=108311">Bug 108311</a> - Query buffer object support is broken on r600.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=108894">Bug 108894</a> - [anv] vkCmdCopyBuffer() and vkCmdCopyQueryPoolResults() write-after-write hazard</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=108909">Bug 108909</a> - Vkd3d test failure test_resolve_non_issued_query_data()</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=108914">Bug 108914</a> - blocky shadow artifacts in The Forest with DXVK, RADV_DEBUG=nohiz fixes this</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=108925">Bug 108925</a> - vkCmdCopyQueryPoolResults(VK_QUERY_RESULT_WAIT_BIT) for timestamps with large query count hangs</li>
</ul>
<h2>Changes</h2>
<p>Alex Smith (1):</p>
<ul>
<li>radv: Flush before vkCmdWriteTimestamp() if needed</li>
</ul>
<p>Bas Nieuwenhuizen (4):</p>
<ul>
<li>radv: Align large buffers to the fragment size.</li>
<li>radv: Clamp gfx9 image view extents to the allocated image extents.</li>
<li>radv/android: Mark android WSI image as shareable.</li>
<li>radv/android: Use buffer metadata to determine scanout compat.</li>
</ul>
<p>Dave Airlie (2):</p>
<ul>
<li>r600: make suballocator 256-bytes align</li>
<li>radv: use 3d shader for gfx9 copies if dst is 3d</li>
</ul>
<p>Emil Velikov (2):</p>
<ul>
<li>egl/wayland: bail out when drmGetMagic fails</li>
<li>egl/wayland: plug memory leak in drm_handle_device()</li>
</ul>
<p>Eric Anholt (3):</p>
<ul>
<li>v3d: Fix a leak of the transfer helper on screen destroy.</li>
<li>vc4: Fix a leak of the transfer helper on screen destroy.</li>
<li>v3d: Fix a leak of the disassembled instruction string during debug dumps.</li>
</ul>
<p>Eric Engestrom (3):</p>
<ul>
<li>anv: correctly use vulkan 1.0 by default</li>
<li>wsi/display: fix mem leak when freeing swapchains</li>
<li>vulkan/wsi: fix s/,/;/ typo</li>
</ul>
<p>Gurchetan Singh (3):</p>
<ul>
<li>virgl: quadruple command buffer size</li>
<li>virgl: avoid large inline transfers</li>
<li>virgl: don't mark buffers as unclean after a write</li>
Mesa 19.0.0 is a new development release. People who are concerned
with stability and reliability should stick with a previous release or
wait for Mesa 19.0.1.
</p>
<p>
Mesa 19.0.0 implements the OpenGL 4.5 API, but the version reported by
glGetString(GL_VERSION) or glGetIntegerv(GL_MAJOR_VERSION) /
glGetIntegerv(GL_MINOR_VERSION) depends on the particular driver being used.
Some drivers don't support all the features required in OpenGL 4.5. OpenGL
4.5 is <strong>only</strong> available if requested at context creation.
Compatibility contexts may report a lower version depending on each driver.
</p>
<h2>SHA256 checksums</h2>
<pre>
TBD.
</pre>
<h2>New features</h2>
<ul>
<li>GL_AMD_texture_texture4 on all GL 4.0 drivers.</li>
<li>GL_EXT_shader_implicit_conversions on all drivers (ES extension).</li>
<li>GL_EXT_texture_compression_bptc on all GL 4.0 drivers (ES extension).</li>
<li>GL_EXT_texture_compression_rgtc on all GL 3.0 drivers (ES extension).</li>
<li>GL_EXT_render_snorm on gallium drivers (ES extension).</li>
<li>GL_EXT_texture_view on drivers supporting texture views (ES extension).</li>
<li>GL_OES_texture_view on drivers supporting texture views (ES extension).</li>
<li>GL_NV_shader_atomic_float on nvc0 (Fermi/Kepler only).</li>
<li>Shader-based software implementations of GL_ARB_gpu_shader_fp64, GL_ARB_gpu_shader_int64, GL_ARB_vertex_attrib_64bit, and GL_ARB_shader_ballot on i965.</li>
<li>VK_ANDROID_external_memory_android_hardware_buffer on Intel</li>
<li>Fixed and re-exposed VK_EXT_pci_bus_info on Intel and RADV</li>
<li>VK_EXT_scalar_block_layout on Intel and RADV</li>
File diff suppressed because it is too large
Load Diff
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.